English Pages 375 [367] Year 2020
Studies in Computational Intelligence 864
Ricardo Jardim-Goncalves Vassil Sgurev Vladimir Jotsov Janusz Kacprzyk Editors
Intelligent Systems: Theory, Research and Innovation in Applications
Studies in Computational Intelligence Volume 864
Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. The books of this series are submitted to indexing to Web of Science, EI-Compendex, DBLP, SCOPUS, Google Scholar and Springerlink.
More information about this series at http://www.springer.com/series/7092
Editors
Ricardo Jardim-Goncalves, Department of Electrical Engineering and Computers, Faculdade de Ciências e Tecnologia, UNINOVA-CTS Centre of Technology and Systems, Universidade Nova de Lisboa, Caparica, Portugal
Vassil Sgurev, Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria
Vladimir Jotsov, University of Library Studies and Information Technologies, Sofia, Bulgaria
Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland
ISSN 1860-949X ISSN 1860-9503 (electronic) Studies in Computational Intelligence ISBN 978-3-030-38703-7 ISBN 978-3-030-38704-4 (eBook) https://doi.org/10.1007/978-3-030-38704-4 © Springer Nature Switzerland AG 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Contemporary intelligent systems are penetrating all aspects of future emerging technologies. Autonomous agents, machine learning, evolutionary computation, fuzzy sets, and data science theory and applications are among the most active research fields. This volume gives fourteen examples of recent advances in this rapidly changing area.

The first chapter “Interpretable Convolutional Neural Networks Using a Rule-Based Framework for Classification” is written by Zhen Xi and George Panoutsos. Fuzzy logic-based rules are applied to the layers of a convolutional neural network (CNN). The proposed hybrid NN structure helps towards building more complex deep learning structures with fuzzy logic-based interpretability. It is shown that the proposed learning structure maintains a good level of forecasting/prediction accuracy compared to other contemporary CNN deep learning structures. This research offers valuable insights for contemporary machine learning methods in data-driven modelling applications.

Chapter “A Domain Specific ESA Method for Semantic Text Matching”, written by Luca Mazzola, Patrick Siegfried, Andreas Waldis, Florian Stalder, Alexander Denzler and Michael Kaufmann, aims at the creation of an education recommender-type expert system based on skills, capabilities and areas of expertise extracted from curricula vitae or personal preferences. Semantic text matching, domain-specific filtering and explicit semantic analysis are applied in this case. A system prototype is discussed and compared to other semantically characterizing systems and concept extraction applications. The adoption of a granular approach is expected in future research. The authors apply the prototype in a use case demonstrating the potential to semantically match similar CVs and education descriptions.

Chapter “Smart Manufacturing Systems: A Game Theory-Based Approach” by Dorothea Schwung, Jan Niclas Reimann, Andreas Schwung and Steven X. Ding presents a novel approach for self-optimization and learning using multi-player, game theory-based learning restricted in time. Multi-agent scenarios and self-learning in a flexible manufacturing approach are used for the coordination of learning agents. A supervisory self-learning module is developed, aiming at the optimization of predefined quality functions using a distributed game theory-based algorithm. As a result, a modern plug-and-play control of highly flexible, modular manufacturing units is applied for energy optimization in production environments.

In chapter “Ensembles of Cluster Validation Indices for Label Noise Filtering”, the authors Jan Kohstall, Veselka Boeva, Lars Lundberg and Milena Angelova consider cluster validation measures and their use for identifying mislabeled instances or class outliers prior to training in supervised learning problems. An extended overview of cluster analysis methods, and especially of the validation of clustering results, is given in section “Cluster Validation Techniques”. Then an ensemble technique, entitled CVI-based Outlier Filtering, is elaborated, aiming at the identification and elimination of mislabeled instances from the training set. Other classification-oriented advantages are presented. The added value of this approach in comparison with the logical and rank-based ensemble solutions is demonstrated in the application part of the chapter.

In chapter “Interpretation, Modeling, and Visualization of Crowdsourced Road Condition Data”, the authors Pekka Sillberg, Mika Saari, Jere Grönman, Petri Rantanen and Markku Kuusisto introduce a combination of models for data gathering and analysis of the gathered data, enabling effective processing of large data sets. A prototype system with a Web user interface is presented to illustrate road condition data problems and solutions. Crowdsourcing, data collection by sensors and other practical problems of data processing on road networks are researched in depth, discussed, interpreted and visualized in this chapter.
Chapter “A New Network Flow Platform for Building Artificial Neural Networks” is written by Vassil Sgurev, Stanislav Drangajov and Vladimir Jotsov. It is shown how to migrate from the classical platform for building multilayer artificial neural networks (ANNs) to a new platform based on generalized network flows with gains and losses on directed graphs. It is shown that the network flow ANNs have a more general network structure than the multilayer ANNs and, consequently, all results obtained through the multilayer ANN are a part of the new network flow platform. A number of advantages of this new platform are pointed out. The possibility of effective training and recognition is proven through rigorous procedures for the network flow platform, without the heuristic algorithms for approximate solutions that are characteristic of the multilayer ANNs. The considered method is aimed at processing big data/knowledge sets and is becoming widely used in many contemporary technologies.

In the next chapter “Empowering SMEs with Cyber-Physical Production Systems: From Modelling a Polishing Process of Cutlery Production to CPPS Experimentation”, the authors José Ferreira, Fábio Lopes, Guy Doumeingts, João Sousa, João P. Mendonça, Carlos Agostinho and Ricardo Jardim-Goncalves claim that better tools are needed to guarantee process control, surveillance and maintenance in manufacturing small and medium enterprises (SMEs). These intelligent systems involve transdisciplinary approaches to guarantee interaction and behavioural fluidity between hardware and software components. An architecture is proposed, based on the process modelling and process simulation of the different stages of an existing SME factory production, allowing real-time information, through IoT data collection, to feed different production improvement modules. Current practices and existing problems are analysed in depth in this chapter. The discussed cyber-physical systems in manufacturing are the backbone of contemporary and emerging Industry 4.0 technologies. Model-driven paradigms are applied in this direction. Simulation results and prospective innovations are presented in the chapter.

Chapter “Intelligent Approach for Analysis of 3D Digitalization of Planer Objects for Visually Impaired People” by Dimitar Karastoyanov, Lyubka Doukovska, Galia Angelova and Ivan Yatchev is dedicated to resolving the problems of persons with visual impairments who have difficulties working with graphical computer interfaces. The chapter presents an approach for providing visual Braille services by 3D digitization of planar or spatial objects. The elaborated technology for graphical Braille display is based on the use of linear electromagnetic micro-drives. An “InterCriteria” decision-making approach is developed and applied in the application part of this chapter.

Chapter “Semantically Enriched Multi-level Sequential Pattern Mining for Exploring Heterogeneous Event Log Data”, written by Pierre Dagnely, Tom Ruette, Tom Tourwé and Elena Tsiporkova, presents research aimed at contemporary applications in the field of the Internet of things (IoT).
To exploit the event log data, the authors propose an explorative methodology that overcomes two main constraints, i.e. (1) the rampant variability in event labelling, leading to data model and semantic heterogeneities, and (2) the unavailability of a clear methodology to traverse the immense amount of generated event sequences. Ontologies are applied for better data understanding.

In chapter “One Class Classification Based Anomaly Detection for Marine Engines”, the authors Edward Smart, Neil Grice, Hongjie Ma, David Garrity and David Brown present a method that uses non-invasive engine monitoring and does not require training on faulty data. Support vector novelty detection and K-means clustering are combined in one-class classifiers for anomaly detection in marine engines. Strong test-engine results are presented in the application sections of this chapter. The results are significant for developers of predictive monitoring systems.

Chapter “Enhanced Methodologies in Photovoltaic Production with Energy Storage Systems Integrating Multi-cell Lithium-Ion Batteries” is written by J. B. L. Fermeiro, J. A. N. Pombo, R. L. Velho, G. Calvinho, M. R. C. Rosário and S. J. P. S. Mariano. An extended overview of Energy Storage System (ESS) applications is given in this chapter. The authors aim to increase the efficiency of photovoltaic (PV) production. To this end, a maximum power point tracking (MPPT) algorithm based on particle swarm optimization (PSO) is proposed. Additionally, a new charging algorithm is developed that uses the parameters of the battery pack in real time, extending the battery lifespan. The results of the PSO-based MPPT algorithm demonstrate the excellent performance of the controller. The charging efficiency comparison reveals impressive results. The authors achieve an optimized PV production and at the same time increase the ESS effectiveness and efficiency.

Chapter “Mobility in the Era of Digitalization: Thinking Mobility as a Service (MaaS)” by Luís Barreto, António Amaral and Sara Baltazar presents the main issues and characteristics that any future Mobility-as-a-Service (MaaS) system should consider. Smart MaaS systems should allow a more convenient provision of sustainable, versatile and attractive mobility services. The authors foresee one important line of action: to pursue within the European Union the elaboration of concrete legislation that propels the full integration of transnational mobility services, driving the development of a multidisciplinary and fully integrated European MaaS system.

Chapter “Fuzzy Modelling Methodologies Based on OKID/ERA Algorithm Applied to Quadrotor Aerial Robots” by Jorge Sampaio Silveira Júnior and Edson Bruno Marques Costa proposes the use of state-space Takagi-Sugeno (TS) fuzzy models to tackle quadrotor aerial robot complexities, aiming at automatic estimation of parameters. Two fuzzy modelling methodologies based on observer/Kalman filter identification (OKID) and the eigensystem realization algorithm (ERA) are proposed. The key difference between the two methods is that the first considers the initial samples of the experimental data, while the second disregards them. Estimation of fuzzy Markov parameters of the system from the state observer is also discussed.
The applications of the fuzzy C-means algorithm to both methods did not show satisfactory results.

In chapter “A Generic Architecture for Cyber-Physical-Social Space Applications”, the authors Stanimir Stoyanov, Todorka Glushkova, Asya Stoyanova-Doycheva, Jordan Todorov and Asya Toskova present a reference architecture called virtual physical space. An architecture to implement an intelligent personal touristic guide is also considered. An original combination of ambient-based modelling and deep learning methods is presented in this chapter of the book. As a result, an intelligent personal assistant is designed to support tourists.

In chapter “Analysis of Data Exchanges, Towards a Tooled Approach for Data Interoperability Assessment”, a wide range of selected intelligent methods and systems is presented in this book. The authors use different theoretic results and technological innovations, and they report various advances, problems and perspectives, but one common conclusion should be stated: nowadays, the intelligent element is the standard for most advanced technologies and innovations. In this direction, most groups should resolve a set of controversies aiming at the elaboration of data-driven, secure, computationally efficient and sufficiently universal systems in challenging fields like intelligent transportation, Internet of things, Industry 4.0 and smart systems. Most of the emerging intelligent systems use deep modelling possibilities aiming at semi-autonomous and autonomous control. Cybersecurity issues occur in Web-based applications: their importance is rapidly rising, and they cannot be resolved without innovative intelligent research.

Ricardo Jardim-Goncalves, Caparica, Portugal
Vassil Sgurev, Sofia, Bulgaria
Vladimir Jotsov, Sofia, Bulgaria
Janusz Kacprzyk, Warsaw, Poland
Contents
Interpretable Convolutional Neural Networks Using a Rule-Based Framework for Classification
Zhen Xi and George Panoutsos

A Domain Specific ESA Method for Semantic Text Matching
Luca Mazzola, Patrick Siegfried, Andreas Waldis, Florian Stalder, Alexander Denzler and Michael Kaufmann

Smart Manufacturing Systems: A Game Theory based Approach
Dorothea Schwung, Jan Niclas Reimann, Andreas Schwung and Steven X. Ding

Ensembles of Cluster Validation Indices for Label Noise Filtering
Jan Kohstall, Veselka Boeva, Lars Lundberg and Milena Angelova

Interpretation, Modeling, and Visualization of Crowdsourced Road Condition Data
Pekka Sillberg, Mika Saari, Jere Grönman, Petri Rantanen and Markku Kuusisto

A New Network Flow Platform for Building Artificial Neural Networks
Vassil Sgurev, Stanislav Drangajov and Vladimir Jotsov

Empowering SMEs with Cyber-Physical Production Systems: From Modelling a Polishing Process of Cutlery Production to CPPS Experimentation
José Ferreira, Fábio Lopes, Guy Doumeingts, João Sousa, João P. Mendonça, Carlos Agostinho and Ricardo Jardim-Goncalves

Intelligent Approach for Analysis of 3D Digitalization of Planer Objects for Visually Impaired People
Dimitar Karastoyanov, Lyubka Doukovska, Galia Angelova and Ivan Yatchev

Semantically Enriched Multi-level Sequential Pattern Mining for Exploring Heterogeneous Event Log Data
Pierre Dagnely, Tom Ruette, Tom Tourwé and Elena Tsiporkova

One Class Classification Based Anomaly Detection for Marine Engines
Edward Smart, Neil Grice, Hongjie Ma, David Garrity and David Brown

Enhanced Methodologies in Photovoltaic Production with Energy Storage Systems Integrating Multi-cell Lithium-Ion Batteries
J. B. L. Fermeiro, J. A. N. Pombo, R. L. Velho, G. Calvinho, M. R. C. Rosário and S. J. P. S. Mariano

Mobility in the Era of Digitalization: Thinking Mobility as a Service (MaaS)
Luís Barreto, António Amaral and Sara Baltazar

Fuzzy Modelling Methodologies Based on OKID/ERA Algorithm Applied to Quadrotor Aerial Robots
Jorge Sampaio Silveira Júnior and Edson Bruno Marques Costa

A Generic Architecture for Cyber-Physical-Social Space Applications
Stanimir Stoyanov, Todorka Glushkova, Asya Stoyanova-Doycheva, Jordan Todorov and Asya Toskova

Analysis of Data Exchanges, Towards a Tooled Approach for Data Interoperability Assessment
Nawel Amokrane, Jannik Laval, Philippe Lanco, Mustapha Derras and Nejib Moala
Interpretable Convolutional Neural Networks Using a Rule-Based Framework for Classification
Zhen Xi and George Panoutsos

Abstract A convolutional neural network (CNN) learning structure is proposed, with added interpretability-oriented layers in the form of Fuzzy Logic-based rules. This is achieved by creating a classification layer based on a Neural-Fuzzy classifier, and integrating it into the overall learning mechanism within the deep learning structure. Using this new structure, one can extract linguistic Fuzzy Logic-based rules from the deep learning structure directly, and link this information to input features, which enhances the interpretability of the overall system. The classification layer is realised via a Radial Basis Function (RBF) Neural-Network, which is a direct equivalent of a class of Fuzzy Logic-based systems. In this work, the development of the RBF neural-fuzzy system and its integration into the deep-learning CNN is presented. The proposed hybrid CNN RBF-NF structure can form a fundamental building block towards building more complex deep-learning structures with Fuzzy Logic-based interpretability. Using simulation results on benchmark data (MNIST handwritten digits and MNIST Fashion) we show that the proposed learning structure maintains a good level of forecasting/prediction accuracy compared to CNN deep learning structures. Crucially, we also demonstrate in both cases the resulting interpretability, in the form of linguistic rules that link the classification decisions to the input feature space.

Keywords Deep learning · Convolutional neural networks · Fuzzy logic · Interpretable machine learning
Z. Xi (B) · G. Panoutsos Department of Automatic Control and Systems Engineering, University of Sheffield, Mappin Street, Sheffield S1 3JD, UK e-mail: [email protected] G. Panoutsos e-mail: [email protected] © Springer Nature Switzerland AG 2020 R. Jardim-Goncalves et al. (eds.), Intelligent Systems: Theory, Research and Innovation in Applications, Studies in Computational Intelligence 864, https://doi.org/10.1007/978-3-030-38704-4_1
1 Introduction

In data-driven modelling systems and methods, machine learning has received considerable attention, in particular in the past decade, where advances in the availability of computing power have enabled more applications. Machine learning (ML) focuses on applied maths and computing algorithms for creating ‘computational machines’ that can learn to imitate system behaviours automatically [1]. As a subarea of Artificial Intelligence (AI), ML can also be used to construct computer systems and algorithms that improve performance based on what has already been experienced (empirical-based, learning from examples) [1, 2]. ML has emerged as a popular method for process modelling, and is also used in natural language processing, speech recognition, computer vision, robot control, and other applications [1–3]. Unlike traditional system modelling methods (physics-based, numerical etc.), machine learning does not require a dynamic process model but sufficient data, including input data and output data of a specific system; hence a class of machine learning algorithms can be considered as data-driven modelling methods that are able to capture static or dynamic process behaviour in areas such as manufacturing and biomedical systems, among others. Gong et al. introduced a way to analyse time series signals and to create a human body model using CNNs [4]. Segreto et al. evaluated the correlation between wavelet-processed time series signals and the machining conditions using neural networks [5]. Based on the type of modelling structures used, machine learning can be broadly viewed in two parts with, to a certain extent, unclear boundaries: statistical modelling and learning, and neural and other hybrid network structures [2]. In deep learning in particular [2], convolutional neural networks (CNNs) have been widely used [6–8]. CNNs are a kind of feed-forward neural network that uses convolutional kernels to process data in multiple arrays. Multiple arrays could be in the form of variable data modalities: 1D for time-domain signals, 2D for images, and 3D for videos [3]. Using CNN deep learning structures has been very successful for a certain class of applications; for example, Szegedy et al. showed that a deep enough network can classify ImageNet [9] efficiently [6], and He et al. provided a model structure to build deep neural networks without considerable gradient loss [10]. Simonyan and Zisserman showed that CNNs could be designed as even ‘deeper’ structures, and perform even better in ImageNet classification problems [11]. Deep CNN networks, however, lack any significant interpretation and act as ‘black boxes’ that predict/classify data well; this is understandable given their deep structure and overall inherent complexity. There is an opportunity to use the paradigm of Fuzzy Logic (FL) theory to add linguistic interpretability to neural-based structures [12–14], and to achieve the same effect on deep learning structures. Successful implementation would be beneficial to a variety of problems, in particular in cases where there is a need for human-machine interaction, such as in decision support systems for critical applications (healthcare, biomedical, high-value manufacturing etc.). For example, one could use such a framework to provide linguistic interpretation to classification tasks performed by deep learning networks.
There are existing attempts in the literature to combine FL with deep learning. Muniategui et al. designed a system for spot welding monitoring [15]. In this approach the authors use the deep learning network only as a method for data pre-processing, followed by the FL classifier as a separate process step. In an attempt to reduce data size without affecting monitoring performance, a system based on deep learning and FL classification was introduced. Using a deep convolutional auto-encoder, an image could be compressed from a resolution of 120 × 120 to 15 × 15 without affecting the overall performance of the fuzzy classification methodology. Deng et al. introduced a FL-based deep neural network (FDNN) which extracts information from both neural representation and FL simultaneously [16]. It was shown that the FDNN has higher classification accuracy than approaches that use NN-based and FL-based networks separately and then fuse the results from the two kinds of networks. The current gap in the research literature is that deep learning methodologies, when combined with FL, are not integrated together as a single system. In this research work, a CNN-based deep learning structure is used as the fundamental building block of a data-driven classification network. For the first time in the literature, a FL-based layer (in the form of a hybrid Neural-Fuzzy network) is introduced as an integral part of the overall CNN structure, which acts as the main classification layer (fully connected) of the deep learning structure. Consequently, one could extract linguistic rules in the form of a FL rule-base directly from the deep learning structure. Via simulation results based on a popular benchmark problem/dataset we show that the proposed network structure performs comparably to CNN-based structures: there is a measurable but not significant loss of performance from introducing the FL layer as part of the deep learning structure.
2 Background Theory and Methods

In this section we cover the fundamental background theory and methods used to create the proposed CNN-FL structure.

2.1 Radial Basis Function Neural-Fuzzy Network

RBF networks were formulated in [17] as a learning network structure. RBFs can also be used efficiently as kernel functions for a variety of machine learning methodologies, for example in Support Vector Machines (SVMs) to solve non-linear classification problems [18]. Similar to SVM, RBF networks could be implemented as FL-based systems [12]. In this section, for the benefit of the reader, the RBF-NF network is summarised (Fig. 1), and its relevance to the deep learning structure is shown, while full details of the fundamental RBF network as a data-driven model can be found in [12, 19].
Fig. 1 RBF network structure (input layer with inputs x_1, …, x_j, …, x_m; RBF layer with units m_1, …, m_i, …, m_n; output layer with z_1, …, z_i, …, z_n and output y)
Equation (1) represents a multiple-input and single-output (MISO) FL system with $m$ system inputs and $p$ rules, where $\mu_{ij}(x_j)$, defined in (2), is the Gaussian membership function of input $x_j$ belonging to the $i$-th rule, and $c_{ij}$ and $\sigma_{ij}$ are the centre and width of the Gaussian membership function respectively [12]. The overall function $z(x)$ could be adjusted to represent one of the following three forms of FL-based systems:
• Singleton;
• Mamdani;
• Takagi-Sugeno.
In the proposed work, the overall system function $z(x)$ will be considered as a Singleton model. Figure 1 depicts the structure of the RBF network, where $x_n$ are the system’s inputs, $m_n$ the membership function of each rule-input combination, $z_n$ the Takagi-Sugeno polynomial function for each rule, and $y$ the overall output of the system. Hence the output function takes the mathematical form shown in (3).
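$$y = \sum_{i=1}^{p} z_i \left[ \frac{\prod_{j=1}^{m} \mu_{ij}(x_j)}{\sum_{i=1}^{p} \prod_{j=1}^{m} \mu_{ij}(x_j)} \right] = \sum_{i=1}^{p} z_i \, g_i(x), \tag{1}$$

$$\mu_{ij}(x_j) = \exp\left( -\frac{(x_j - c_{ij})^2}{\sigma_{ij}^2} \right), \tag{2}$$

$$z_i = \sum_{i=1}^{p} b_i x_i. \tag{3}$$

Equation (2) could be expressed in vector form, as follows (which is also the expression for a RBF in i dimensions):

$$m_i(x) = \exp\left( -\lVert x - c_i \rVert^2 / \sigma_i^2 \right), \tag{4}$$

thus this FL system could be written as:

$$y = \sum_{i=1}^{p} z_i \, m_i(x) \Big/ \sum_{i=1}^{p} m_i(x) \tag{5}$$

$$\phantom{y} = \sum_{i=1}^{p} z_i \, g_i(x), \tag{6}$$

where

$$g_i = \frac{\prod_{j=1}^{m} \mu_{ij}(x_j)}{\sum_{i=1}^{p} \prod_{j=1}^{m} \mu_{ij}(x_j)} = m_i(x) \Big/ \sum_{i=1}^{p} m_i(x). \tag{7}$$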
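As an illustration of Eqs. (4) to (7), the following is a minimal NumPy sketch of the normalised RBF neural-fuzzy output for a Singleton model. The function name `rbf_nf_forward`, the array shapes and the two-rule example values are assumptions made for illustration only, and a single width per rule is assumed, as in Eq. (4).

```python
import numpy as np

def rbf_nf_forward(x, centres, widths, z):
    """Normalised RBF neural-fuzzy output (illustrative sketch of Eqs. (4)-(7)).

    x:       input vector, shape (m,)
    centres: rule centres c_i, shape (p, m)
    widths:  rule widths sigma_i, shape (p,)  (assumed: one width per rule)
    z:       singleton consequents z_i, shape (p,)
    """
    # Eq. (4): firing strength of each rule, m_i(x) = exp(-||x - c_i||^2 / sigma_i^2)
    sq_dist = np.sum((x - centres) ** 2, axis=1)
    m = np.exp(-sq_dist / widths ** 2)
    # Eq. (7): normalised firing strengths g_i(x), which sum to one
    g = m / np.sum(m)
    # Eq. (6): output as the g-weighted sum of the consequents
    return float(np.dot(z, g)), g

# Hypothetical two-rule, two-input system (values chosen for illustration only)
centres = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.array([1.0, 1.0])
z = np.array([0.0, 1.0])

# The input lies midway between the two centres, so both rules fire equally
y, g = rbf_nf_forward(np.array([0.5, 0.5]), centres, widths, z)
```

Because the normalised strengths sum to one, the output is always a convex combination of the consequents, which is what allows each rule's contribution to a decision to be read off directly.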
2.2 A Convolutional Neural Network

A CNN structure for image classification would contain several layers, grouped in a way to perform specific tasks. Figure 2 demonstrates a typical CNN architecture. The first few layers would be multiple pairs of convolution layers and pooling layers. The size of the convolution windows can differ, which ensures that convolution layers can extract features at different scales. The pooling layers sub-sample features into a smaller size, where a max pooling method is generally used. Then, fully connected layers would also be used, in which neurons are fully connected to all outputs from the previous layer. These layers also convert the data structure from a multiple-layer structure to a vector form. Rectified linear units (ReLUs) would normally be the activation function of the convolution layers as well as of the fully connected layers, as these provide non-linear properties to those layers and are also convenient for the calculation of the error backpropagation [20]. To avoid exploding and vanishing gradients in deep networks, batch normalisation can also be applied in every layer [21]. CNNs are not convex functions, which means parametric optimisation for CNNs is challenging; hence numerous optimisation strategies have been developed [22], such as stochastic gradient descent (SGD), Nesterov momentum [23], and adaptive subgradient (Adagrad) methods [24].

Fig. 2 Representative CNN structure

Figure 3 depicts the overall structure of the CNN model. This model was designed to use 28 × 28 pixel grey-scale images as input. After two convolutional layers, a max pooling layer was added. The dropout layers were applied to avoid overfitting. The Flatten layer was added to convert the data structure into vectors, and the two Dense layers are fully connected layers. All activation functions in this model were ReLUs. The loss function of this model was the cross entropy loss function, which is widely used in CNNs [6, 7]. In the proposed research work, the adaptive subgradient method was applied to perform the learning task, to take advantage of its fast convergence properties. In order to achieve a good balance between training speed and avoidance of overfitting, the batch size was chosen as 128. Table 1 shows the architecture of the designed CNN.
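To make the adaptive subgradient (Adagrad) update mentioned above concrete, the following minimal sketch applies it to a toy quadratic loss rather than to a CNN. The learning rate, the `eps` stabiliser, the loss function and all names are assumptions chosen for illustration; they are not the settings used in this work.

```python
import numpy as np

def adagrad_step(theta, grad, accum, lr=0.1, eps=1e-8):
    """One Adagrad update: each parameter's step shrinks as its squared gradients accumulate."""
    accum = accum + grad ** 2
    theta = theta - lr * grad / (np.sqrt(accum) + eps)
    return theta, accum

def loss(theta):
    # Toy quadratic loss (assumed for illustration); its gradient is simply theta
    return 0.5 * np.sum(theta ** 2)

theta = np.array([1.0, -2.0])
accum = np.zeros_like(theta)
initial_loss = loss(theta)

for _ in range(200):
    grad = theta  # gradient of the toy loss at the current parameters
    theta, accum = adagrad_step(theta, grad, accum)

final_loss = loss(theta)
```

The per-parameter scaling means that parameters with consistently large gradients take smaller steps, which is the property usually associated with Adagrad's fast early convergence.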
3 Proposed CNN-FL Modelling Structure

Adding interpretability features to deep learning structures could benefit certain applications of deep learning, where interpretability can provide additional functionality. An example is advanced manufacturing systems, where understanding and modelling images and videos of complex processes are critical tasks. A process model (or classifier) based on CNNs could be developed to take advantage of processing data in array form [3], which has already been proven to be very effective [6, 25] in a number of applications. We propose to achieve enhanced interpretability in
Interpretable Convolutional Neural Networks Using …
Fig. 3 Basic CNN layered structure
a CNN deep learning structure by performing the final classification task using a Fuzzy Logic-based structure. In this section, we describe the integration of a Radial-Basis-Function Neural-Fuzzy layer into the deep learning structure, which provides the mechanism to extract a linguistic rule base from the CNN.
Table 1 Basic CNN architecture

Type            Patch size/stride   Output size     Parameters
Convolution     3 × 3/0             26 × 26 × 32    320
Convolution     3 × 3/0             24 × 24 × 64    18,496
Maxpooling      2 × 2/0             12 × 12 × 64    0
Dropout (25%)   –                   12 × 12 × 64    0
Flatten         –                   9216            0
Linear          –                   64              589,888
Dropout (50%)   –                   128             0
Linear          –                   10              1290
Softmax         –                   10              0
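The output sizes and parameter counts in Table 1 can be checked with simple arithmetic. The sketch below assumes unpadded 3 × 3 convolutions with unit stride on a single-channel 28 × 28 input, non-overlapping 2 × 2 pooling, and one bias per output unit; these conventions are inferred from the numbers in the table rather than stated in the text, and the helper names are illustrative.

```python
def conv_out(size, kernel):
    """Spatial size after an unpadded, stride-1 convolution."""
    return size - kernel + 1

def conv_params(kernel, c_in, c_out):
    """Convolution weights plus one bias per output channel."""
    return kernel * kernel * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Fully connected weights plus one bias per output unit."""
    return n_in * n_out + n_out

s1 = conv_out(28, 3)          # 26
s2 = conv_out(s1, 3)          # 24
s3 = s2 // 2                  # 12 after 2 x 2 max pooling
flat = s3 * s3 * 64           # 9216

print(conv_params(3, 1, 32))    # 320
print(conv_params(3, 32, 64))   # 18496
print(dense_params(flat, 64))   # 589888
print(dense_params(128, 10))    # 1290
```

Under these assumptions, the final count of 1290 corresponds to a linear layer with 128 inputs and 10 outputs, matching the 128 listed in the output-size column.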
3.1 Convolutional Neural Network with an RBF Fuzzy Logic Rule-Base Classification Layer

In this section, the main CNN structure is detailed, and it is shown how the RBF-NF layer is integrated into the overall network structure and learning methodology. In [3], LeCun et al. state that the convolution layers of CNNs are used to extract features at different scales. In this research work, a deep learning network is proposed that includes a convolution layered structure and, for the first time in the literature, an FL layer (RBF) to perform the classification task. The extra layer proposed here is an RBF layer that maintains the rule base of the system. To defuzzify the FL statements into crisp classification labels, a normalised exponential function (softmax) is used. Due to the addition of the FL layer, one has to consider the credit assignment and error backpropagation for these layers, which is not a trivial task. Figure 4 depicts the architecture of the FL RBF-CNN, and Table 2 shows its parameter settings.

Similar to FL RBF networks, FL RBF-CNNs are sensitive to the initial conditions (initial model structure and parameters) of the RBF and defuzzification layers, and the initial parameters cannot be determined in advance, since the features from the CNN layers have not been extracted before training. Therefore, one has to establish some initial conditions for the FL rule base for successful model training. The overall training relies on a squared error loss function and is performed as follows. The model can be separated into two main parts: the CNN layers as a feature extractor and the FL layers as a classifier. As shown in Table 2, the first 7 layers of the FL RBF-CNN model are identical to the reference CNN model, while the last two layers are replaced by an RBF layer and a defuzzification layer. Following from Eqs. (4) and (7), the activation function becomes:
$$ m_j^l = \exp\left( -\left\| x^{l-1} - c_i \right\|^2 / \sigma_i^2 \right), \tag{8} $$
Fig. 4 FL RBF-CNN layered structure
Table 2 FL RBF-CNN architecture

Type            Patch size/stride   Output size     Parameters
Convolution     3 × 3/0             26 × 26 × 32    320
Convolution     3 × 3/0             24 × 24 × 64    18,496
Maxpooling      2 × 2/0             12 × 12 × 64    0
Dropout (25%)   –                   12 × 12 × 64    0
Flatten         –                   9216            0
Linear          –                   Feature size    9216 × Feature size
Dropout (50%)   –                   Feature size    0
RBF             –                   Rule size       2 × Rule size
Defuzzy         –                   1               Rule numbers
$$ g_j^l = m_j^l \Big/ \sum_{j=1}^{p} m_j^l, \tag{9} $$

therefore,

$$ g_j^l = s\left( -\left\| x^{l-1} - c_i \right\|^2 / \sigma_i^2 \right), \tag{10} $$

where s(x) is a softmax function. In the defuzzification layer, using $g_k^l = g_j^{l-1}$, there would be

$$ y^l = z^l \cdot g^l. \tag{11} $$
Note that the outputs of an RBF layer are continuous floating-point numbers rather than discrete integers. Rounding the output of this layer to the nearest integer (based on a predetermined threshold) provides the integer class.
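Equations (8) to (11) can be traced numerically with a small sketch. Plain Python lists stand in for tensors, and the centres, widths, and singleton weights `z` below are made-up values chosen only to make the arithmetic easy to follow; they are not parameters from the trained model.

```python
import math

def rbf_memberships(x, centres, sigmas):
    """Eq. (8): exp(-||x - c||^2 / sigma^2), one value per rule."""
    out = []
    for c, s in zip(centres, sigmas):
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        out.append(math.exp(-sq_dist / s ** 2))
    return out

def normalise(m):
    """Eq. (9): divide each membership by the sum over all rules."""
    total = sum(m)
    return [v / total for v in m]

def defuzzify(g, z):
    """Eq. (11): weighted sum of the normalised firing strengths."""
    return sum(gi * zi for gi, zi in zip(g, z))

x = [0.0, 0.0]                       # feature vector from the previous layer
centres = [[0.0, 0.0], [1.0, 0.0]]   # hypothetical rule centres
sigmas = [1.0, 1.0]                  # hypothetical rule widths
z = [0.0, 1.0]                       # hypothetical singleton outputs

g = normalise(rbf_memberships(x, centres, sigmas))
y = defuzzify(g, z)                  # continuous output, here 1 / (1 + e)
label = round(y)                     # nearest-integer class label
```

Applying the exponential and then normalising is exactly a softmax over the negative scaled distances, which is why Eq. (10) can be written with s(x).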
3.2 Identification of Interpretable Features

The input space for the rule-based structure, in our case the fully connected RBF layer, is a flat vector of weights, as shown in the CNN structure depicted in Fig. 4. To enhance the interpretability of the antecedent part of the ‘IF…THEN…’ Fuzzy Logic rules, we propose to ‘track back’ the weights of the flat input vector of the fully connected layer in order to reveal the relevant features of the input image. Effectively, we propose to use this mechanism to associate relevant rules with features in the image’s feature space, so that the user can appreciate which rules are responsible for each classification decision, and what the relevant input space is for each rule in the feature space. This is achieved as follows:
For any CNN model, as defined in the structure shown in Table 2, the final layer acts linearly [25]. Therefore, to track back the weights of the input space of the fully connected layer, one can follow this process:

• Calculate a mask layer using a least mean square solution. The input is a vector of selected features, and the output is a vector of 9216 points.
• Reshape the 9216-point vector as a tensor a with dimensions 12 × 12 × 64.
• Use tensor a as the mask. Calculate a track-back maxpooling layer result using $M_{trackback} = M \circ a$.
• Reveal feature ‘heat maps’ using the mask $M_{trackback}$.

This procedure could be applied to any CNN model of the proposed structure, regardless of whether it has an FL-based RBF layer. The advantage of using the FL-based RBF layer is that one can now link linguistic rules to the input feature space using the process described above. As the FL-RBF layer consists of multiple fuzzy rules, the relative importance of each rule could be estimated using a Fuzzy Logic entropy measure. In the proposed framework, a non-probabilistic entropy function could be used [26]:

$$ H = -K \sum_{i=1}^{n} \left( \mu_i \log(\mu_i) + (1 - \mu_i) \log(1 - \mu_i) \right), \tag{12} $$

where K is a positive constant (usually equal to 1/N for normalisation), and $\mu_i$ is the i-th membership degree. By using Fuzzy Entropy to identify the most ‘active’ rules for a given prediction, and by identifying the Membership Functions of each rule with the highest relevance to the input vector (membership degree), a framework can be established to directly link rules to input space features; this is demonstrated in the Simulation Results in Sect. 4.
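As a worked illustration of Eq. (12), the sketch below evaluates the entropy for a list of membership degrees. It assumes natural logarithms, K equal to one over the number of membership degrees, and the usual convention 0 · log 0 = 0; none of these choices is fixed by the text, and the function name is illustrative.

```python
import math

def fuzzy_entropy(mu, K=None):
    """Non-probabilistic entropy of membership degrees mu (each in [0, 1])."""
    if K is None:
        K = 1.0 / len(mu)          # assumed normalisation constant

    def term(p):
        # Convention: p * log(p) is taken as 0 when p is 0.
        return p * math.log(p) if p > 0.0 else 0.0

    return -K * sum(term(m) + term(1.0 - m) for m in mu)

crisp = fuzzy_entropy([0.0, 1.0, 1.0])   # fully crisp memberships
vague = fuzzy_entropy([0.5, 0.5, 0.5])   # maximally ambiguous memberships
```

Under these assumptions, crisp memberships give zero entropy and memberships of 0.5 give the maximum value, log 2.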
4 Simulation Results

In this section, the proposed RBF-CNN modelling framework is tested against two popular benchmark datasets to assess its learning and recall performance, and to demonstrate the developed linguistic interpretability. In both benchmark case studies (MNIST character recognition and MNIST Fashion), Adadelta [27] was used for optimisation, with an adaptive learning rate during training. Early stopping was used to avoid overfitting.
Fig. 5 Several examples from the MNIST characters dataset
4.1 Case Study: MNIST

The Modified National Institute of Standards and Technology (MNIST) database, which was introduced in [8], was chosen as a case study. The MNIST database is a labelled dataset of handwritten digits containing 60,000 training images and 10,000 testing images; examples are shown in Fig. 5. The training images were further split randomly into two parts: a 50,000-sample training set and a 10,000-sample validation set (used to avoid overfitting).
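The random 50,000/10,000 split can be sketched as follows. The seed and the function name are assumptions for illustration; the text does not say how the split was drawn.

```python
import random

def split_indices(n_total=60000, n_val=10000, seed=0):
    """Shuffle sample indices and hold out n_val of them for validation."""
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)   # seeded so the split is reproducible
    return idx[n_val:], idx[:n_val]    # (training indices, validation indices)

train_idx, val_idx = split_indices()
```

Splitting by index rather than copying the images keeps the sketch independent of how the dataset is stored.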
4.1.1 Modelling Performance
Simulation results were created to assess the performance of the developed deep learning structure. This is done in two parts. First, the learning performance on a popular benchmark dataset is assessed by comparing the proposed learning structure against a classical, state-of-the-art CNN structure. Second, the robustness of the learning ability of the proposed system is assessed by successively reducing the number of features and evaluating the learning and recall performance. The presented results include the mean classification accuracy as well as the standard deviation in each case.

Each set of simulation results shows the loss function during training and validation as well as the classification accuracy for training and validation. This is presented for a number of rules in the rule base of the Fuzzy Logic-based classification layer (varying from 3 rules, the simplest, to 15 rules, the most complex). The learning model makes use of an adaptive learning rate method to optimise the model weights. The model is trained for up to 50 epochs, with an early stopping criterion that halts training if the validation performance has not improved within a window of 15 epochs. After stopping, the model weights that resulted in the smallest validation loss are stored for the subsequent steps. As shown in Fig. 6, the training of this network with 64 features converges within the first 30 epochs. The mean training accuracy (over 10 repeats) of this model was 99.16%, and both the validation and test accuracy of this model are around 97.5%, which is
[Plot: loss and accuracy in training and validation with 64 features. Axes: Epoch (0 to 30), Accuracy, Loss. Curves: accumulative training accuracy, overall training accuracy, validation accuracy, training loss, validation loss.]
Fig. 6 CNN model with 64 features, training and validation performance (average of 10 simulations)
comparable with other state-of-the-art CNN classification structures. As an example comparison, LeNet-5 [8], which has a similar structure, achieves an accuracy of 99%. A higher test classification accuracy (99.77%) is achieved in [28]; however, this is achieved with a significantly more complex structure. One can therefore conclude that the proposed structure does not sacrifice significant performance in this case study, despite the much simpler overall structure, which aims at enhancing the interpretability of the model rather than its accuracy.

The performance of the FL RBF-CNN is further assessed by reducing the number of classification features from 64 to 32 and finally to 16. The same algorithmic approach was followed as in the training of the baseline model. Tables 3, 4, and 5 were generated from the raw simulation results (10 repeats per training case). Each of these three tables reports the average accuracy and its standard deviation for the training, validation, and test cases, and every feature case was trained with 3 to 15 rules; a reference CNN result (labelled REF) is listed for comparison. As shown in Table 3, the mean accuracy reaches its best performance when the number of Fuzzy Logic rules equals 7. However, despite its good performance, a model with 64 features may not be very interpretable; hence models with 32 and 16 features were also simulated to ‘stress-test’ the performance of the proposed structure. When the number of classification features decreases, the number of neurons in the last fully connected layers is also reduced. A reduced classification power is therefore expected, due to the fewer model parameters available to capture the classification problem.

Table 3 Accuracy mean and standard deviation of the model using 64 features and the MNIST characters dataset

Rule   Training              Validation            Test
       Mean (%)   Std. (%)   Mean (%)   Std. (%)   Mean (%)   Std. (%)
3      98.80      0.25       97.23      0.27       97.01      0.17
5      98.83      0.60       97.32      0.56       97.09      0.53
7      99.16      0.18       97.80      0.27       97.52      0.20
9      99.05      0.27       97.79      0.34       97.52      0.27
11     98.93      0.46       97.63      0.40       97.30      0.39
13     98.07      2.90       96.82      2.97       96.56      2.86
15     97.89      2.42       96.81      2.44       96.37      2.40
REF    99.74      0.06       99.03      0.06       99.05      0.07

Table 4 Accuracy mean and standard deviation of the model using 32 features and the MNIST characters dataset

Rule   Training              Validation            Test
       Mean (%)   Std. (%)   Mean (%)   Std. (%)   Mean (%)   Std. (%)
3      91.09      3.94       89.91      3.56       89.65      3.80
5      95.60      3.17       94.38      2.89       94.25      2.98
7      95.12      4.05       94.11      3.61       93.89      3.66
9      94.40      3.84       93.46      3.64       93.19      3.67
11     96.00      3.35       94.73      3.31       94.54      3.23
13     97.12      2.52       95.88      2.20       95.77      2.26
15     95.19      3.54       94.31      3.26       93.92      3.29
REF    99.54      0.07       98.84      0.08       98.88      0.07

In general, the classification accuracy is reduced, as demonstrated in Tables 4 and 5. In the case of 32 features, the test accuracy of 95.77% reached with 13 rules (training accuracy 97.12%) could be considered acceptable; however, the test accuracy of 78.57% in the case with 16 features using 7 rules demonstrates that there is a significant performance loss when the number of features is significantly lower.
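The stopping rule used in this section (at most 50 epochs, a 15-epoch improvement window, and keeping the weights with the lowest validation loss) can be sketched as below. The callables `train_one_epoch` and `validation_loss` are hypothetical placeholders for the actual training code, which is not given in the text.

```python
def train_with_early_stopping(train_one_epoch, validation_loss,
                              max_epochs=50, patience=15):
    """Return (best_epoch, best_loss, best_weights) under early stopping."""
    best_epoch, best_loss, best_weights = 0, float("inf"), None
    for epoch in range(1, max_epochs + 1):
        weights = train_one_epoch(epoch)      # hypothetical: returns current weights
        loss = validation_loss(weights)       # hypothetical: loss on the validation set
        if loss < best_loss:
            best_epoch, best_loss, best_weights = epoch, loss, weights
        elif epoch - best_epoch >= patience:
            break                             # no improvement within the window
    return best_epoch, best_loss, best_weights

# Toy run: the "weights" are just the epoch number, and the validation loss
# is lowest at epoch 20, so training stops at epoch 35 and epoch 20 is kept.
result = train_with_early_stopping(lambda e: e, lambda w: abs(w - 20))
```

Returning the best weights rather than the last ones mirrors the statement that the weights with the smallest validation loss are stored for the subsequent steps.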
4.1.2 FL-based Interpretability: MNIST Characters
In some application areas, the interpretability of models could be key to understanding the underlying processes, for example in complex manufacturing processes, when trying to understand the conditions causing faults and defects. With the fully connected layer of the proposed CNN structure being a Fuzzy Logic-based layer, one can enhance the interpretability of the classification task by extracting Fuzzy Logic linguistic rules directly from the classification layer. Such
Table 5 Accuracy mean and standard deviation of the model using 16 features and the MNIST characters dataset

Rule   Training              Validation            Test
       Mean (%)   Std. (%)   Mean (%)   Std. (%)   Mean (%)   Std. (%)
3      72.58      8.64       72.27      8.08       71.97      8.59
5      74.22      11.24      73.93      10.12      73.58      11.11
7      79.21      8.19       78.78      7.27       78.57      7.72
9      77.22      8.62       76.90      7.88       76.70      8.39
11     72.56      7.09       72.64      6.05       71.96      7.04
13     76.74      7.86       76.38      7.15       76.17      7.34
15     75.85      6.58       75.68      5.65       75.27      6.43
REF    98.94      0.10       98.40      0.13       98.37      0.12
information can be, for example, further used to aid decision making, or to assist the creation of human-machine interfaces. Figure 7, as an example, depicts two different rules from the rule base of the 32-feature FL RBF-CNN model; just four inputs (features) and one output (classification weight) are shown for simplicity. Rule 1 for example, translates into the following Singleton-based Fuzzy rule:
[Figure 7 panels: membership functions A1, B1, C1, D1 with output O1 (rule 1), and A2, B2, C2, D2 with output O2 (rule 2).]
Fig. 7 Example of two FL rules, of the FL RBF-CNN model with 32 features
‘IF Feature 1 is A1, and Feature 2 is B1, and Feature 3 is C1, and ... etc., THEN the Output class is O1.’ (13)
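To make the reading of rule (13) concrete, the sketch below evaluates one rule whose antecedent sets are Gaussian membership functions. It assumes a single width shared by all features within the rule, in which case the product of the per-feature memberships (‘Feature 1 is A1 and Feature 2 is B1 ...’) equals the multivariate RBF activation of Eq. (8). The centres, width, and input values are made up for illustration.

```python
import math

def membership(x, centre, sigma):
    """Gaussian fuzzy set for a single feature."""
    return math.exp(-((x - centre) ** 2) / sigma ** 2)

def firing_strength(features, centres, sigma):
    """AND of the antecedents, taken as the product of memberships."""
    strength = 1.0
    for x, c in zip(features, centres):
        strength *= membership(x, c, sigma)
    return strength

features = [0.2, 0.7, 0.5, 0.9]      # hypothetical feature values
centres = [0.3, 0.6, 0.5, 0.8]       # hypothetical centres of A1, B1, C1, D1
sigma = 0.5                          # hypothetical shared width

product_form = firing_strength(features, centres, sigma)
sq_dist = sum((x - c) ** 2 for x, c in zip(features, centres))
rbf_form = math.exp(-sq_dist / sigma ** 2)   # Eq. (8) evaluated directly
```

The two values agree up to floating-point error, which is what allows a single RBF unit to be read as a conjunction of per-feature linguistic terms.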
During feature extraction and classification, a trained CNN structure entails a set of image-like matrices, which can be visualised using the method described in Sect. 3. Figure 8 shows such images for an input digit ‘0’ (Fig. 8a). Figure 8b is extracted from the first CNN layer and outlines the outer round feature of the ‘0’. Figure 8c depicts more abstract features, including the outer and inner edges, and Fig. 8d includes similar features to Fig. 8c, albeit with a lower definition. Using the methodology outlined in Sect. 3.2, feature maps linked to specific Fuzzy Logic rules can be obtained. Figure 9 shows three fuzzy rules corresponding, in the same sequence, to the cases in Fig. 10. Figure 10 depicts three feature maps corresponding to the relevant FL rules, which were identified by fuzzy entropy. For a given input vector, using the linguistic FL rules and the corresponding image-based features, it is possible to appreciate why a particular class has been predicted. This may
(a) The input image for the prediction
(b) The output of the first CNN layer
(c) The output of the second CNN layer
(d) The output of the maxpooling layer
Fig. 8 Features extracted by FL RBF-CNN during prediction for a sample in MNIST characters dataset
Fig. 9 Three fuzzy rules, of the FL RBF-CNN model with 32 features for the MNIST digit
be trivial for the character recognition case study, however it can be extremely important when investigating problems in manufacturing, biomedical systems etc. when trying to understand the feature space for a particular prediction.
4.2 Case Study: Fashion MNIST

The Fashion-MNIST dataset was introduced to replace MNIST as a new benchmark dataset [29] for machine learning. It contains 60,000 training images and 10,000 testing images, all 28-by-28 greyscale images labelled into 10 classes. For comparative analysis purposes, the training regime in this case study is the same as the one applied in the MNIST character recognition case, i.e. a 10,000-sample validation set is selected randomly from the training dataset and used to avoid overfitting.

4.2.1 Modelling Performance
The same performance assessment is used as in Sect. 4.1. Tables 6, 7, and 8 were likewise generated from 10 repeated simulations. Similar to the MNIST case, the FL RBF-CNN model would achieve best performance when 5–7 rules are used. As shown in Table 6, the mean test accuracy fluctuated around 84.0% from 3 to 13 rules. In the case of 32 features (Table 7), the test accuracy of 80.03% reached with 5 rules could be considered acceptable for a model without specific tuning; however, the test accuracy of 62.69% in the case of 16 features using 13 rules (Table 8) demonstrates that there is a significant performance loss when the number of features is significantly lower.
Fig. 10 Example of three ‘heat maps’ sorted by fuzzy entropy of the MNIST digit
(a) Features extracted by rule one
(b) Features extracted by rule two
(c) Features extracted by rule three
Table 6 Accuracy mean and standard deviation of the model using 64 features and the Fashion-MNIST dataset

Rule   Training              Validation            Test
       Mean (%)   Std. (%)   Mean (%)   Std. (%)   Mean (%)   Std. (%)
3      90.71      1.64       84.84      1.07       83.52      1.15
5      89.04      2.06       85.58      1.08       83.41      1.41
7      90.55      1.67       85.88      1.40       83.64      1.44
9      91.14      1.51       86.31      0.61       84.47      0.96
11     90.55      4.36       86.02      2.47       83.77      3.02
13     92.05      1.87       86.60      0.67       84.70      1.07
15     85.84      6.33       83.02      4.40       80.67      4.76
REF    97.55      0.31       92.88      0.25       92.34      0.13
Table 7 Accuracy mean and standard deviation of the model using 32 features and the Fashion-MNIST dataset

Rule   Training              Validation            Test
       Mean (%)   Std. (%)   Mean (%)   Std. (%)   Mean (%)   Std. (%)
3      82.39      8.24       79.57      7.82       77.18      7.61
5      85.76      3.80       81.55      2.80       80.03      2.80
7      84.78      4.11       81.98      2.42       79.76      3.22
9      83.59      5.11       80.75      3.93       78.83      3.99
11     82.56      4.96       79.75      3.87       77.87      3.69
13     79.73      5.67       77.98      4.17       75.62      4.69
15     81.99      5.26       79.20      3.82       77.49      4.03
REF    95.83      0.18       92.44      0.32       91.77      0.12
Table 8 Accuracy mean and standard deviation of the model using 16 features and the Fashion-MNIST dataset

Rule   Training              Validation            Test
       Mean (%)   Std. (%)   Mean (%)   Std. (%)   Mean (%)   Std. (%)
3      61.42      4.14       59.81      4.16       58.65      4.21
5      61.95      4.50       60.57      3.95       59.08      4.05
7      63.01      5.44       61.58      5.14       60.64      4.90
9      64.30      4.70       62.63      4.13       61.84      4.13
11     63.12      4.68       62.27      4.35       60.82      4.36
13     65.19      2.89       63.99      2.36       62.69      2.66
15     60.68      6.17       59.63      5.54       58.86      5.77
REF    93.96      0.34       91.26      0.32       90.73      0.23
4.2.2 FL-based Interpretability: Fashion MNIST
A sneaker is used here as an example. Figure 11 shows the relevant features obtained within this particular prediction. Figure 11b shows the outline shape of the shoe, and Fig. 11c, d show more abstract features, linked to the relevant FL rules. Similar to the MNIST benchmark case, Fig. 12 shows three fuzzy rules that can be used to track back to feature maps, and Fig. 13 contains the three feature maps tracked back from these three fuzzy rules, respectively.
(a) The input image for the prediction
(b) The output of the first CNN layer
(c) The output of the second CNN layer
(d) The output of the maxpooling layer
Fig. 11 Features extracted by FL RBF-CNN during prediction for a sample in Fashion MNIST
Fig. 12 Three fuzzy rules, of the FL RBF-CNN model with 32 features for the Fashion MNIST image
5 Conclusion

In this research work, an interpretability-oriented deep learning network is presented, based on a CNN structure combined with a Fuzzy Logic structure that performs the classification task and also provides the capability to linguistically interpret the structure’s rule base. By combining the feature extraction property of CNNs with the classification and interpretability abilities of FL-based systems, an FL RBF-CNN was developed. The proposed structure relies on a Radial Basis Function realisation of the Neural-Fuzzy network, which is integrated into the CNN structure via an adaptive subgradient method for the credit assignment and error backpropagation. A systematic algorithmic process is also developed to assign features to specific FL linguistic rules, and to identify such rules using an entropy function. The combination of the new modelling structure with the rule identification and linking to the input feature space yields a methodology that can be used to provide linguistic interpretability to a deep learning structure. We demonstrate, via two case studies (MNIST characters and MNIST fashion), that there is no significant predictive performance loss, provided enough features are used, and that the rule-to-feature maps can be used to provide interpretability for a given classification.
Fig. 13 Example of three ‘heat maps’ sorted by fuzzy entropy of the Fashion MNIST image
(a) Features extracted by rule one
(b) Features extracted by rule two
(c) Features extracted by rule three
References

1. R. Schapire, Computer Science 511 Theoretical Machine Learning (Computer Science Department, Princeton University, Princeton, 2008)
2. M.I. Jordan, T.M. Mitchell, Machine learning: trends, perspectives, and prospects. Science 349(6245), 255–260 (2015)
3. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521(7553), 436–444 (2015)
4. J. Gong, M.D. Goldman, J. Lach, Deep motion: a deep convolutional neural network on inertial body sensors for gait assessment in multiple sclerosis, in 2016 IEEE Wireless Health, WH 2016 (2016), pp. 164–171
5. T. Segreto, A. Caggiano, S. Karam, R. Teti, Vibration sensor monitoring of nickel-titanium alloy turning for machinability evaluation. Sensors (Switzerland) 17(12) (2017)
6. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2015), pp. 1–9
7. A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in Advances in Neural Information Processing Systems 25, ed. by F. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger (Curran Associates, Inc., 2012), pp. 1097–1105
8. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)
9. J. Deng, W. Dong, R. Socher, L. Li, ImageNet: A large-scale hierarchical image database, in 2009 IEEE Conference on Computer Vision and Pattern Recognition (2009), pp. 248–255
10. K. He, X. Zhang, S. Ren, J. Sun, Deep Residual Learning for Image Recognition (2015). CoRR abs/1512.03385
11. K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition. CoRR (2014)
12. G. Panoutsos, M. Mahfouf, A neural-fuzzy modelling framework based on granular computing: concepts and applications. Fuzzy Sets Syst. 161(21), 2808–2830 (2010)
13. G. Panoutsos, M. Mahfouf, G.H. Mills, B.H. Brown, A generic framework for enhancing the interpretability of granular computing-based information, in 2010 5th IEEE International Conference Intelligent Systems. IEEE (2010), pp. 19–24
14. R.P. Paiva, A. Dourado, Interpretability and learning in neuro-fuzzy systems. Fuzzy Sets Syst. 147(1), 17–38 (2004)
15. A. Muniategui, B. Hériz, L. Eciolaza, M. Ayuso, A. Iturrioz, I. Quintana, P. Álvarez, Spot welding monitoring system based on fuzzy classification and deep learning, in 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (2017), pp. 1–6
16. Y. Deng, Z. Ren, Y. Kong, F. Bao, Q. Dai, A hierarchical fused fuzzy deep neural network for data classification. IEEE Trans. Fuzzy Syst. 25(4), 1006–1012 (2017)
17. D.S. Broomhead, D. Lowe, Radial Basis Functions, Multi-variable Functional Interpolation and Adaptive Networks. Technical report, Royal Signals and Radar Establishment Malvern (United Kingdom) (1988)
18. K. Muller, S. Mika, G. Ratsch, K. Tsuda, B. Scholkopf, An introduction to Kernel-based learning algorithms. IEEE Trans. Neural Netw. 12(2), 181–201 (2001)
19. M.Y. Chen, D.A. Linkens, A systematic neuro-fuzzy modeling framework with application to material property prediction. IEEE Trans. Syst. Man Cybern. B Cybern. 31(5), 781–90 (2001)
20. X. Glorot, A. Bordes, Y. Bengio, Deep sparse rectifier neural networks, in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (2011), pp. 315–323
21. S. Ioffe, C. Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (2015). CoRR abs/1502.03167
22. T. Schaul, I. Antonoglou, D. Silver, Unit Tests for Stochastic Optimization (2013). arXiv:1312.6055 [cs]
23. Y. Bengio, N. Boulanger-Lewandowski, R. Pascanu, Advances in optimizing recurrent networks, in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (2013), pp. 8624–8628
24. J. Duchi, E. Hazan, Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. 12, 2121–2159 (2011)
25. N. Srivastava, G.E. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
26. S. Al-sharhan, F. Karray, W. Gueaieb, O. Basir, Fuzzy entropy: a brief survey, in 10th IEEE International Conference on Fuzzy Systems (Cat. No.01CH37297), vol. 3, vol. 2 (2001), pp. 1135–1139
27. M.D. Zeiler, ADADELTA: An Adaptive Learning Rate Method (2012). CoRR abs/1212.5701
28. D. Cireşan, U. Meier, J. Schmidhuber, Multi-column Deep Neural Networks for Image Classification (2012). arXiv:1202.2745 [cs]
29. H. Xiao, K. Rasul, R. Vollgraf, Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms (2017). CoRR abs/1708.07747
A Domain Specific ESA Method for Semantic Text Matching

Luca Mazzola, Patrick Siegfried, Andreas Waldis, Florian Stalder, Alexander Denzler and Michael Kaufmann
Abstract An approach to semantic text similarity matching is the concept-based characterization of entities and themes that can be automatically extracted from content. This is useful for building an effective recommender system on top of similarity measures and for document retrieval and ranking. In this work, our research goal is to create an expert system for education recommendation, based on the skills, capabilities, and areas of expertise present in someone’s curriculum vitae and on personal preferences. This form of semantic text matching needs to take into account all the personal educational experiences (formal, informal, and on-the-job), but also work-related know-how, to create a concept-based profile of the person. This will allow a reasoned matching process from CVs and career vision to descriptions of education programs. Taking inspiration from explicit semantic analysis (ESA), we developed a domain-specific approach to semantically characterize short texts and to compare their content for semantic similarity. Through an enriching and a filtering process, we transform the general-purpose German Wikipedia into a domain-specific model for our task. The domain is also defined through a German knowledge base, or vocabulary, of descriptions of educational experiences and job offers. Initial testing with a small set of documents demonstrated that our approach

L. Mazzola · P. Siegfried · A. Waldis · F. Stalder · A. Denzler · M. Kaufmann (B)
School of Information Technology, Lucerne University of Applied Sciences, 6343 Rotkreuz, Switzerland

© Springer Nature Switzerland AG 2020
R. Jardim-Goncalves et al. (eds.), Intelligent Systems: Theory, Research and Innovation in Applications, Studies in Computational Intelligence 864, https://doi.org/10.1007/978-3-030-38704-4_2
L. Mazzola et al.
covers the main requirements and can match semantically similar text content. This is applied in a use case and led to the implementation of an education recommender system prototype.

Keywords Semantic text matching · Document similarity · Concept extraction · Explicit semantic analysis · Domain-specific semantic model
1 Introduction

Human consulting is expensive and time consuming. In the area of HR consulting, giving advice on possible job placements and possible further education could be automated. The vision is that users can upload their CV and their career goal, and an expert system recommends the best possible option. One of the issues in building an effective recommender system for job placement and further education programs is the difficulty of automatically identifying the skills, capabilities, and areas of expertise that a person has. This is even more difficult when the person, on top of the mix of formal, informal, and on-the-job educational experiences, also has work-related know-how. In a research project partially financed by the Innovation and Technology commission (CTI) of the Swiss Confederation, we identified a possible technical solution for this problem. The state of the art already offers an extensive body of approaches, but none of them is well tailored to our problem. In fact, the problem is characterized by the following main aspects: (a) the need to analyze unstructured and semi-structured documents, (b) the commitment to extracting a semantic signature for a given document, (c) the obligation to treat documents written in German, since most documents in Switzerland are written in this language, (d) the usage of semantic concepts also in German, (e) the capability of running analysis on multi-parted sets, finding ranked assignments for comparisons, and (f) the capacity to run with minimal human intervention, towards a fully automated approach. For these reasons, we performed research on a new approach to extract concepts and skills from text using a domain-specific ESA space, which is described in this work.

The rest of the paper is organized as follows: Sect. 2 presents a very brief overview of related work; our approach is then described in Sect. 3, covering the functional requirements, the design of the system, and the data source characterization. Section 4 reports the requirement validation, from the tuning of the parameters to the experimental settings. In Sect. 5, an initial evaluation with a business-case-oriented test bed is provided. Two use cases with different objectives
are reported in Sect. 6, demonstrating the applicability of our approach to specific instances of real problems. The conclusions (Sect. 7) recapitulate our contribution for a solution to this problem stressing also some future work we intend to address in the next step of this research project.
2 Related Work

Our proposed solution was inspired by numerous existing approaches and systems. In the domain of document indexing, comparison and most-similar retrieval, the work of Alvarez and Bast [1] offers a good review, in particular with respect to word embeddings and document similarity computations. Another very influential article, by Egozi et al. [2], supports a concept-based information retrieval pathway; it provided us with the idea of the map model called ESA (explicit semantic analysis) and also suggested some measures and metrics for the implementation. A subsequent work by Song and Roth [3] suggested filtering the model matrix and densifying sparse vectors for similarity computation whenever the input is a short text. The idea of starting from the best crowd-based information source, Wikipedia, was supported by the work of Gabrilovich and Markovitch [4], who described their approach for computing semantic relatedness using Wikipedia-based explicit semantic analysis. This also fits our need for a German-specific knowledge base, as Wikipedia publicly provides separate dumps for each language.

Recently, a work from two LinkedIn employees [5] showed a different approach to matching profiles and jobs with perceived good matches, using two steps for text comprehension: relying on the set of skills S existing in the users' profiles, the job description is mapped by a neural network (Long Short Term Memory) into an implicit vector space and then transformed into an explicit set of related skills ∈ S using a linear transformation with a multiplicative matrix W. Since embedding is a key feature, we also analyzed the work of Pagliardini et al. [6], which focuses on the unsupervised learning of sentence embeddings using compositional n-gram features, and we relied on one of our previous works [7] to extract the candidate concepts from the domain. Another possibility for achieving this task would have been to adopt the EmbedRank of Bennani et al. [8], who suggest an unsupervised key-phrase extraction using sentence embeddings. Focusing on information granulation for fuzzy logic and rough set applications could also be beneficial for this objective [9], together with its underlying contributions to interpretability [10].
3 The Approach

The general objective of this work is to design, implement and evaluate a data-based system that is able to compare the education steps and experiences of a person
L. Mazzola et al.
(generally known as Curriculum Vitae, or CV for short) in terms of keywords with possible education programs, and to semantically match them for recommendation. This means extracting the major points from a CV. To this end, the initial prototype was devoted to analyzing a single document and returning the extracted signature for use by a human operator. This approach is useful for direct consumption by a human expert, but suboptimal for more abstract tasks such as direct document comparison, similarity extraction or document matching. There is therefore a need for a novel type of solution that satisfies all the imposed requirements, which are specified in the next section.
3.1 Functional Requirements

Given the objective and the state of the art described, we started by eliciting requirements through direct discussions with experts: employees of a business partner who manually assess CVs and suggest personalized further educational steps on a daily basis. From these interactions and the related iterative refinement, a common set of needs emerged as functional requirements for achieving goals present in their day-to-day practice. Matching this candidate set with the business requirements expressed by the project partner, we eventually identified a core group of considerations, stated in the following list:

1. develop a metric for comparing documents or short texts based on common attributes' sets
2. compare two given documents:
   2.1 identify similarities between two education-related documents
   2.2 extract the capabilities, skills, and areas of expertise common to two (or more) documents
3. compare a given document against a set:
   3.1 assign the most relevant related job posting to a given CV
   3.2 find the closest education program to a CV based on a common skill-set
   3.3 find CVs similar to a given one in terms of capabilities, skills, and areas of expertise

We also identified some additional nice-to-have capabilities:

(a) the use of a granular approach [11] for semi-structured documents, to improve their concept-based signature
(b) the capability of using different knowledge metrics (such as presence, direct count, count balanced against frequency, and normalized count balanced against frequency) for considering the keyword occurrences in documents [12]
(c) the usage of different distance metrics (such as cosine distance/similarity and multi-dimensional Euclidean distance) for comparing vector entries in the knowledge matrix, also called “semantic distance” measures [13].
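The two distance metrics named in item (c) can be illustrated with a minimal sketch. It assumes that each document or concept is already available as a dense list of weights over the same stem dimensions; the function names are illustrative and are not taken from the project code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Multi-dimensional Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Cosine similarity ignores the length of the vectors, which may help when documents of very different sizes are compared, whereas the Euclidean distance is sensitive to it.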
3.2 System Design

The system is designed to create a matrix representing the relationship between sets of keywords and concepts. Following the ESA approach [4], we define concepts by using the German version of Wikipedia (called DEWiki in the rest of the paper). This means that we consider every page existing in this source as a concept, using the page title as its identifier and the text body (except the metadata part) as its description. The definition of a valid concept is in itself a research subject, and we built upon our previous work on concept extraction from unstructured text [7], adopting the same approach.

Figure 1 presents the two processes of enriching and domain specific filtering that constitute our pipeline from the source dump to the knowledge matrix. Enriching is the process used to extract the complete set of valid pages, meaning all pages with valid content (e.g., excluding disambiguation pages), enriched by simulating actual content for the redirect pages. In particular, the filtering process eliminates a page whenever at least one of the following conditions holds:

• the title is entirely numerical (consisting only of a single number)
• the length of the title is equal to one
• it is a disambiguation page (no actual content, only pointers to the different meanings of the term)
• the page is associated with geo-tagging metadata
• the page text starts with a redirect or a forward link.

Domain specific filtering refers to our intuition that, instead of using a generic, transverse knowledge base, we would like a more focused and specific model that only covers the concepts relevant for our application domain. Nevertheless, in order not to lose too much coverage, we allow a redirected page if at least one of its incoming links is part of our domain.
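The five exclusion conditions listed above can be expressed as a single predicate. The following is only a sketch: the function name, its arguments, and the redirect markers are assumptions made for illustration (the paper does not describe how disambiguation pages or geo-tags are detected, so they are passed in as flags).

```python
import re

# Assumed wikitext markers for redirect pages (English and German keyword).
REDIRECT_PREFIXES = ("#redirect", "#weiterleitung")


def is_excluded(title, text, is_disambiguation=False, has_geo_tag=False):
    """Return True if a page matches at least one exclusion condition."""
    title = title.strip()
    if re.fullmatch(r"\d+", title):      # title is a single number
        return True
    if len(title) == 1:                  # one-character title
        return True
    if is_disambiguation:                # only pointers, no actual content
        return True
    if has_geo_tag:                      # page carries geo-tagging metadata
        return True
    if text.lstrip().lower().startswith(REDIRECT_PREFIXES):
        return True                      # text starts with a redirect link
    return False
```

In this sketch redirect pages are simply dropped; the enrichment step that simulates content for them is a separate process and is not modelled here.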
This process preserves all Wikipedia pages that are part of the set generated by computing all valid n-grams (from a vocabulary of education descriptions) for the domain specific texts (without including any punctuation). After these two steps, the dataset is ready to be transformed into the knowledge source. Through the use of statistical approaches, the enriched and filtered list of Wikipedia pages is transformed into a bidimensional matrix, whose dimensions are the stems of the words in the page content (columns) and the page names, considered as concepts (rows). Here a stem is the base form of a word, identified by removing derived or inflected variations. The content of the matrix in the centre of Fig. 2 represents the importance of each dimension for characterising a concept. We envisioned four different metrics for the creation of this space: BINARY (presence or absence of the stem in the page), Term Frequency (TF, the number of appearances), Term Frequency - Inverse Document Frequency (TFIDF, the frequency scaled by the selectivity of the stem), and its variation named TFIDF-NORMALISED (with a normalisation obtained by dividing the TF-IDF value by the sum of the elements in each row, to give values between 0 and 1). Eventually, we adopted the last one, balancing the frequency of the stem within the document (the TF part), its
Fig. 1 The semantic matrix building process, with the two processes of enriching and domain specific filtering
specificity to the current document (as the inverse of the stem distribution amongst all the documents, the IDF part), and normalising the value to represent the relative importance of each single stem for the given concept. The resulting matrix is our knowledge base, where for each Wikipedia article relevant for our domain there is a distribution of stems, after filtering out those that are too frequent or too infrequent. Thus, every concept is represented as a vector in this knowledge space, and every short text can be transformed into such a vector and compared to the Wikipedia concepts. It is important to note that the matrix is transposed with respect to a standard ESA model. This means that the vector space is constructed starting from stems and not from Wikipedia articles (concepts). This difference also affects the function used for computing the similarity between documents, as each of them is represented by a vector in this stem space. Consequently, the similarity of a document to a concept can be measured by the vector distance between its stem vector and the stem vector of the concept. Accordingly, it is possible to produce a ranking of concepts for any arbitrary text document, and it
Fig. 2 The matrix and the two additional data structures used to store the knowledge base for our analysis. On the rows, there are the concepts (∼800 K) known by our system, derived from the titles of the corresponding Wikipedia entries. The columns refer to the basic stems (∼45 K) found in the full text of the Wikipedia entries for the analysis. Each cell of the matrix stores the weight of that component in the vector representation of the concept. The two accessory multidimensional arrays maintain information about the relative position and the accumulated value of each element in the distribution, respectively for the DEWiki and for the domain knowledge base. Compared to the ESA approach of Egozi et al. [2], our novel ESA matrix is transposed, having stems as dimensions, which makes it possible to position and compare not just single words in a vector space, but whole text documents as sets of words
is possible to compare the similarity of two documents by measuring the aggregated distance of their stem vectors. As represented in Fig. 2, additional supporting data structures are maintained in order to restrict the columns and rows taken into account in the actual computations. These consist of two bidimensional arrays that describe the relative position and the cumulated value of each element in the distribution, respectively in the DEWiki and in the domain. Thanks to this supplementary information, it is possible to filter out stems and concepts that are too widespread or too specific, allowing the algorithm to be fine-tuned at run-time. Figure 3 represents the five-step pipeline for computing the similarity of two documents, as implemented in the project demonstrator. It relies on the data structure shown in Fig. 2, here represented as the “ESA space”. The input documents (Doc_A and Doc_B) are parsed in Step 1 to extract the contained stems, using the function stemsExtractor(Doc_x). This creates a ranked stem vector (stems_x), using the TF measure. Using the domain specific and the wiki vocabularies, they are filtered in Step 2, as shown in the figure by the tick and x
Fig. 3 The pipeline for the similarity adopted in the demonstrator is organised in 5 steps as follows. Step 1: starting from two documents (A and B, on the top of the figure), the stems are extracted from the document text (stemsExtractor(X) → stems). Step 2: these sets (Stems_A and Stems_B) are then filtered, using the domain specific and the wiki vocabularies. This is the meaning of the approval or reject symbol on the side of each stem. Step 3: to deal with the potentially very long list of stems, and also to take into account the different lengths of the analysed documents, a (soft or hard) threshold is applied. Step 4: the resulting set of stems is then transformed into the most relevant set of concepts (ESA(stems) → concepts) by using the calculated ESA matrix, giving Concepts_A and Concepts_B. Step 5: the lists of concepts are compared to compute a similarity index, after a common threshold is applied to limit the input (sim(Concepts_A, Concepts_B) → [0, 1])
symbols on the side of each entry. A (soft or hard) limiting threshold is then applied to each filtered vector in Step 3, in order to deal with the potentially very long list of stems and to compensate for the potentially different lengths of the analysed documents. The filtered and limited stem set of each document is used in Step 4 as input for computing the relevant concepts over our calculated ESA matrix, through a mapping function (ESA(stems) → concepts): this generates a ranked list of relevant concepts for each document (Concepts_A and Concepts_B). Finally, in Step 5 we again limit the number of concepts (either in a “soft” approach, by accepting the concepts that account for a given percentage of the initial information, or in a “hard” way, by limiting the absolute number of concepts allowed in the vector). The final result is the similarity measure in the unit range (sim(Concepts_A, Concepts_B) → [0, 1]), which is computed as the weighted ratio of the common concepts over the full set of concepts.

Data Sources Characterization

The main data source is a dump of the German version of Wikipedia (DEWiki), taken in March 2018 and composed of ∼2.5 million pages. For the definition of the domain extension, we used three main data sources. The first one is a set of ∼27,000 CVs. The second one, describing publicly available educational experiences in Switzerland, sums up to ∼1100 entries (around 300 vocational trainings, called “Lehre” in German, and 800 higher education descriptions). The third and last source consists of ∼30,000 open job postings. After enriching the initial candidate set of more than 2 million pages, we have more than 3 million valid entries, thanks to the removal of 253,061 irrelevant disambiguation pages and the addition of 1,443,110 “virtual” entities, derived from redirect links to 757,908 valid pages.
On this initial candidate set of pages, we apply the filtering process to restrict it to the entries relevant for our domain, reducing the number of considered concepts to 39,797. To do this, we create two lists of stems and their occurrences, one for the wiki and one for the domain specific documents. We then use both lists (wiki_limits and domain_limits) to filter the stems in the ESA matrix. Consequently, the set of stems is reduced: the one included in the full enriched dataset has a dimension of ∼870 K, which shrinks to ∼66 K after the filtering process. These constitute the full set of dimensions. To define the additional data structures used in the filtering process at run-time, we computed the individual and cumulated frequencies of the stems and concepts in the reference model produced by the filtering process. As an example, Table 1 reports the top 10% of the distribution of the stems. English-based stems are shown in italics, revealing the contamination from other languages. This can be problematic, since stop-word removal and stemming are language dependent. Nevertheless, as we demonstrate later in one experiment, it can still be possible to compare documents formulated in different languages, under the condition that the domain specific vocabulary is identical. Unfortunately, in our current approach, this is not a generalised result.
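The individual and cumulated frequencies mentioned above, and their use for discarding too frequent or too infrequent stems, can be sketched as follows. This is an illustrative reading only: the function names and the interpretation of the limits as a (top, bottom) window on the cumulated share are assumptions, not the project's actual implementation.

```python
def cumulated_distribution(counts):
    """Return (stem, share, cumulated_share) rows, most frequent first."""
    total = sum(counts.values())
    rows, running = [], 0
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for stem, n in ordered:
        running += n
        rows.append((stem, n / total, running / total))
    return rows


def filter_by_limits(counts, top, bottom):
    """Keep stems whose cumulated share lies in the window (top, bottom].

    Stems in the first `top` share of the distribution are treated as too
    frequent, and stems beyond `bottom` as too infrequent (hypothetical
    semantics for a top and a bottom filtering level).
    """
    return [stem for stem, _, cum in cumulated_distribution(counts)
            if top < cum <= bottom]
```

The rows returned by `cumulated_distribution` have the same shape as the columns of Table 1 (stem, share, cumulated share), expressed as fractions rather than percentages.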
Table 1 Top 10% of the stem distribution in the considered dataset

Stem          Number   Percent (%)   Cumulated (%)
gut           16,169   0.43          0.43
ch            15,870   0.42          0.86
ag            15,725   0.42          1.28
team          15,709   0.42          1.70
sowi          14,444   0.39          2.08
aufgab        13,569   0.36          2.45
bewerb        13,225   0.35          2.80
erfahr        12,880   0.34          3.15
profil        12,422   0.33          3.48
person        11,519   0.31          3.79
freu          11,422   0.31          4.09
arbeit        11,140   0.30          4.39
bereich       10,926   0.29          4.68
deutsch       10,711   0.29          4.97
such          10,523   0.28          5.25
biet          10,447   0.28          5.53
mail          10,435   0.28          5.81
of            10,352   0.28          6.09
ausbild       9668     0.26          6.34
per           9643     0.26          6.60
mitarbeit     9607     0.26          6.86
gern          9451     0.25          7.11
abgeschloss   9294     0.25          7.36
vollstand     9126     0.24          7.60
verfug        8923     0.24          7.84
kenntnis      8889     0.24          8.08
hoh           8831     0.24          8.32
kund          8454     0.23          8.54
tatig         8397     0.22          8.77
kontakt       8336     0.22          8.99
weit          8238     0.22          9.21
vorteil       8193     0.22          9.43
unterstutz    7999     0.21          9.65
berufserfahr  7813     0.21          9.85
jahr          7776     0.21          10.06
4 Implementation

To apply the document matching method described in this paper, we implemented a recommender system that matches education descriptions to CVs and descriptions of a professional vision, based on proximity in the ESA space. To allow easier interaction with the demonstrator, a very simple HTML-based GUI was developed, taking advantage of the REST approach adopted in the development of the software solution, as shown in Fig. 4. The demonstrator computes the similarity between the (CV + Vision) text and each of the available education experiences. In order to provide a fast and reactive interface, the concept set of each available education experience is precomputed and stored instead of being computed at run-time. In this particular case, the profile used is an example of a software developer, whereas the vision expresses an interest in extending the knowledge in the direction of Big Data, Machine Learning and Artificial Intelligence. As a result, all the proposed education experiences include both aspects, although to different degrees. They range from Machine Learning (its principles, its practice, and its role as an element of a more general Data Science approach) to a specific ML solution (TensorFlow), passing through Deep Learning and case studies. The industry partner reportedly found these results very interesting and well aligned with what a human expert would suggest for the same input. This implicitly supports the approach, even though we do not yet have a structured evaluation of the result quality.
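The precomputation strategy described above can be sketched as a small index that stores one concept signature per education experience and only processes the (CV + Vision) text at query time. Everything here is hypothetical: the class name, the `extract_concepts` callable (standing in for the ESA pipeline), and the Jaccard overlap (standing in for the system's own similarity measure).

```python
def jaccard(a, b):
    """Stand-in similarity: share of common items over all items."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class EducationIndex:
    """Stores precomputed concept signatures for education experiences."""

    def __init__(self, extract_concepts, similarity=jaccard):
        self._extract = extract_concepts
        self._similarity = similarity
        self._signatures = {}

    def add(self, name, text):
        # Computed once, when the education description is registered.
        self._signatures[name] = self._extract(text)

    def recommend(self, cv_text, vision_text, top_n=5):
        # Only the query text is processed at run-time.
        query = self._extract(cv_text + " " + vision_text)
        scored = [(name, self._similarity(query, signature))
                  for name, signature in self._signatures.items()]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_n]


def toy_extract(text):
    """Toy extractor used only for illustration: lower-cased word set."""
    return set(text.lower().split())
```

With `index = EducationIndex(toy_extract)`, descriptions are registered through `index.add(name, text)` and a ranked list is obtained with `index.recommend(cv_text, vision_text)`.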
Fig. 4 A simple interface developed to allow testing by the industry partner. The interface accepts a Curriculum Vitae at the bottom left and a Vision text at the top of the same column. It then computes the most similar education experiences for the combination of these two elements. The column on the right reports the results, in descending order of importance
5 Evaluation

To satisfy requirement (R1), we developed a metric for comparing two documents. We use the balanced weight of the common concepts describing the two documents with respect to the average weight of the total set of concepts. This allows us to consider the concepts used as well as their relative pertinence to each document. With respect to the requirement of comparing two documents (R2), we measured the capabilities of our approach on some examples. The same approach serves both subgoals: for (R2.1), the ordered list of common concepts represents a solution, whereas the level of relevance of these concepts indicates the capabilities, skills and areas of expertise underlying the reported similarity level, thus satisfying (R2.2). Requirement (R3) is a generalization of the previous category, with the additional demand of considering a bigger set of documents for comparison. Despite the similarity of the internal approach required to satisfy R3, this is computationally a more challenging problem, and we developed an additional set of functions to run, compare and rank the results of individual comparisons. Each subcategory of this requirement is distinguished by the type of documents used for the comparison (R3.1: CV → Jobs, R3.2: CV → Education, and R3.3: CV → CVs), but the algorithm that provides the results is substantially identical.
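One possible reading of the metric for (R1), together with the ranked list of common concepts used for (R2.1) and (R2.2), is sketched below. The exact weighting used in the system is not spelled out in the text, so the formula here (combined weight of the common concepts divided by the combined weight of all concepts) is an assumption, as are the function names. Each document is represented as a dictionary mapping concept names to non-negative weights.

```python
def weighted_overlap(concepts_a, concepts_b):
    """Weight of the common concepts over the weight of all concepts.

    Returns a value in [0, 1]: 0 when no concept is shared, 1 when both
    documents are described by exactly the same concepts.
    """
    total = sum(concepts_a.values()) + sum(concepts_b.values())
    if total == 0:
        return 0.0
    common = concepts_a.keys() & concepts_b.keys()
    shared = sum(concepts_a[c] + concepts_b[c] for c in common)
    return shared / total


def common_concepts(concepts_a, concepts_b):
    """Common concepts, ordered by decreasing combined weight."""
    common = concepts_a.keys() & concepts_b.keys()
    return sorted(common,
                  key=lambda c: concepts_a[c] + concepts_b[c],
                  reverse=True)
```

Requirement (R3) then amounts to applying `weighted_overlap` between one document and every element of a set and sorting the results.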
5.1 Parameters Tuning

As the system has multiple parameters to control its behavior, we ran a multi-parametrized analysis to discover the best configuration. One problem is the limited size of the available test set, since preparing the dataset and the human-expert-based assignment is a time-consuming activity. Despite the risk of overfitting on the available cases, we considered this analysis useful. For this, we developed a piece of code to generate discrete variations of the set of parameters, and we used each of these parameter sets to find the best (most related) assignment for each document. In order to compare the results, we used a transformation matrix to generate a mono-dimensional measure from the assignment results. Table 2 presents the multipliers used. For the top-K documents in the ordered result set, the number of entries in common with the human-proposed solution is counted, and this number is then multiplied by the value present in the matrix to give one component of the global summation. In this way, we are able to directly compare runs based on different parameter sets. The set of parameters controlling our system is as follows:

• wiki_limits controls the rows used, by restricting too frequent or too infrequent entries using the first additional multidimensional array of cumulated frequencies in Fig. 2,
Table 2 Transformation for a mono-dimensional quality measure

Rank     #1     #2     #3
Top-1    2
Top-2    1/2    3/2
Top-3    1/3    3/3    5/3
Top-5    1/5    3/5    5/5
Top-10   1/10   3/10   5/10
meaning that it is computed with reference to the DEWiki. It is composed of a top and a bottom filtering level.
• domain_limits also controls the rows to be considered in the computation, based on the cumulated frequencies in the domain corpus. It is based on the second additional multidimensional array in Fig. 2.
• top_stems indicates the maximum number of vector components that can be used at run-time to characterize a concept. It dynamically restricts the columns considered for comparisons, by ranked absolute filtering.
• concept_limitation_method controls the way concept limitation is done: it takes a value in the set {HARD, SOFT}. The first value instructs the system to use an absolute number, whereas the second tells it to conserve a certain information percentage. The value to use is given, respectively, by the following parameters:
  – top_concepts is the absolute number of top ranked concepts to use, normally between 25 and 1000.
  – top_soft_concepts is the cumulated information percentage that the considered top ranked concepts hold. It normally ranges between 0.05 and 0.30.
• matrix_method is the method used to compute each cell value in Fig. 2. Currently we have implemented an initial set {BINARY, TF, TFIDF, TFIDF_NORMALIZED}. For the experiments in the current publication we adopted the last value.
• comparing_method is the method used for measuring the distance (dissimilarity) between two or more documents in the restricted vector space. Currently we have implemented only a metric that represents the cosine distance (COSINE).

In addition to these parameters, which affect the behavior of the algorithm, we have some configuration entries that only affect the presentation of the results. The main ones are:

• poss_level instructs the system on which final value to consider as the similarity threshold between an uncertain (under the given value) and a possible (over it) similarity level. Usually set to 0.10.
• prob_level indicates the second threshold, which distinguishes between possibly (under it) and probably (over it) similar documents. One candidate value from our experiments seems to be 0.25.
• debug controls the amount of information about the computation that the algorithm emits. It can be one of {True, False}.
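Two of the parameters above, matrix_method and concept_limitation_method, can be illustrated with a short sketch. It is written under explicit assumptions: pages are given as lists of stems, the IDF term is taken as log(N / document frequency) (the paper does not state which variant is used), the normalisation divides each weight by the sum of its row as described in Sect. 3.2, and the SOFT limitation keeps concepts until the requested share of the total weight is reached. The function names are illustrative.

```python
import math
from collections import Counter

METHODS = {"BINARY", "TF", "TFIDF", "TFIDF_NORMALIZED"}


def build_matrix(pages, method="TFIDF_NORMALIZED"):
    """pages: {concept: [stem, ...]} -> {concept: {stem: weight}}."""
    if method not in METHODS:
        raise ValueError("unknown matrix_method: " + method)
    n_pages = len(pages)
    doc_freq = Counter()
    for stems in pages.values():
        doc_freq.update(set(stems))          # pages containing each stem
    matrix = {}
    for concept, stems in pages.items():
        tf = Counter(stems)
        if method == "BINARY":
            row = {stem: 1.0 for stem in tf}
        elif method == "TF":
            row = {stem: float(count) for stem, count in tf.items()}
        else:
            # Assumed IDF variant: log(N / df).
            row = {stem: count * math.log(n_pages / doc_freq[stem])
                   for stem, count in tf.items()}
            total = sum(row.values())
            if method == "TFIDF_NORMALIZED" and total > 0:
                row = {stem: w / total for stem, w in row.items()}
        matrix[concept] = row
    return matrix


def limit_concepts(ranked, method="HARD", top_concepts=25,
                   top_soft_concepts=0.30):
    """ranked: [(concept, weight), ...] sorted by decreasing weight."""
    if method == "HARD":
        return ranked[:top_concepts]         # absolute number of concepts
    if method == "SOFT":
        total = sum(weight for _, weight in ranked)
        kept, running = [], 0.0
        for concept, weight in ranked:
            if total == 0 or running / total >= top_soft_concepts:
                break                        # requested share reached
            kept.append((concept, weight))
            running += weight
        return kept
    raise ValueError("unknown concept_limitation_method: " + method)
```

With the normalised variant every non-empty row sums to 1, so the weights of different concepts remain comparable regardless of the page length.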
5.2 Demonstration of Semantic Relatedness

To demonstrate that our solution produces semantically related results, we created a test case composed of 17 CVs and 44 different educational experience descriptions, indicated by the business partner. As preparation, they also provided us with the three best assignments, as the gold standard. We then ran multiple bipartite analyses with different parameter sets, creating ranked association sets, and measured their quality based on the weights presented in Table 2. The reference is the expected quality value for a purely random distribution without repetition of 44 elements for the considered top-k sets, with expected value E[Q] ≈ 0.32. On our set of 27 different runs we observed a quality in the range [3.96–10.39], with an average Q ≈ 6.62 and a dispersion, measured as standard deviation, of σ[Q] ≈ 1.68. This supports our hypothesis that our approach (the model and its usage in the system) provides some knowledge.

Additionally, a human-based evaluation was performed, as we wanted an estimate of the utility and effectiveness of our approach in supporting human reasoning. An expert from the business domain ranked five selected entries. We selected one entry we considered very successful (CV9), one with intermediate results (CV11), and three entries with poorer assignments (one with at least one match in the top-10 and two without any). The analytical data (matches and the corresponding score based on Table 2) are reported in Table 3. There, the second, third and fourth columns give the position of each match in the descending-ordered candidate list, whereas the fifth column encodes the quality score (Q) achieved by that configuration. Finally, the last column provides the evaluation assigned by the human expert to the specific arrangement of choices, here called Stars by analogy with a rating system.
Table 3 The manual evaluation of an initial test case subset. For each of the 5 CVs, the 3 proposed assignments are evaluated against their position in the ex-ante human ranking. The last column presents the evaluation attached ex-post to this sequence of assignments by the same human expert

CV ID   Opt #1   Opt #2   Opt #3   Quality   Stars
CV3     >10      >10      >10      0         3
CV6     >10      >10      >10      0         2
CV9     1        2        5        6.7       4
CV11    5        6        >10      0.6       2
CV16    10       >10      >10      0.1       1
The range is [0–4], with the highest value representing the best distribution of options. The selected set of five CVs achieves an average value of 2.4, with values ranging from 1 to 4. From a very initial analysis of the ratings given, it is possible to note a high correlation between our quality measure and the stars-based expert rating. Interestingly, and in contrast with our expectation, the two worst cases for our quality measure are rated with 2 and 3, indicating an acceptable to good utility in the human judgment: we currently have no clear explanation for this fact, and we need more experimental results to test any hypothesis.
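The relation between the Quality and Stars columns of Table 3 can be quantified, for instance, with a Pearson correlation coefficient. The choice of this coefficient is an assumption made here for illustration (the text does not say how the correlation was assessed), and with only five data points the result is indicative at best.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Quality and Stars columns of Table 3 (CV3, CV6, CV9, CV11, CV16).
quality = [0.0, 0.0, 6.7, 0.6, 0.1]
stars = [3, 2, 4, 2, 1]

r = pearson(quality, stars)
print(round(r, 2))
```

A rank-based coefficient could be a reasonable alternative, since the Stars values are ordinal.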
6 Use Case

6.1 The Initial Testing

After the quantitative and qualitative evaluation of semantic relatedness, we identified an initial small set of documents to be used for running an experimental use case in semantic text matching. They are as follows:

• Doc1: Description of the federal capacity certificate for car mechatronics engineer [Automobil Mechatroniker EZF]
• Doc2: Job offer for a software developer [Software Entwickler]
• Doc3: Description of the Bachelor of Science in Medical Informatics at the Berner Fachhochschule [Bcs. MedizinInformatiker/in BFH]
• Doc4: Job offer for a car mechatronics specialist [Automechatroniker @ Renault dealer]
• Doc5: Research group “Data Intelligence Team” at the HSLU - School of Information Technology
• Doc6: Job offer as a general purpose nurse [Dipl. Pflegefachperson HF/FH 80–100% (Privatabteilung)]
• Doc7: Description of the general information of the Lucerne cantonal hospital on the website [Luzerner Kantonspital]
• Doc8: The page “about us” of the Zug cantonal hospital website [Zuger Kantonspital]
• Doc9: The news on the portal 20Minuten (http://www.20min.ch) about the technical issues VISA experienced in Europe on 01 June 2018 [Visa hat technische Probleme in ganz Europa]
• Doc10: The news on the portal 20Minuten about the acquisition of Monsanto by Bayer on 07 June 2018 [Bayer übernimmt Monsanto für 63 Milliarden]

The set of ten documents was designed to have some clear correlations, but also to test the performance of the system on general-purpose records such as the last two entries (news). From every document we extracted a weighted sequence of the top K concepts, which we considered as its semantic signature. The summarized result of the computation is shown in Table 4, where each cell represents the similarity measure between a pair of documents in the selected set. To support this interpretation, we compute the differentials with respect to each row using the relative similarity measures from Table 4, following the formula $V_y = \operatorname{avg}_x V_{xy}$ (coherently, the same is valid for the columns, based on the formula $V_x = \operatorname{avg}_y V_{xy}$), giving us two transposed matrices. These matrices encode the relative distance of each other document from the average one. One of them is represented in Table 5; the transposed one is omitted. In this table, the different
Table 4 The similarity measure (cosine distance of stem vectors) amongst all the 10 documents in the test-case. Diagonals are not considered as they would always achieve the maximal score (1). Bigger values represent higher semantic signature similarities for the two documents affected. The last elements (line and column) represent the averages, respectively for row and column

Score   Doc1    Doc2    Doc3    Doc4    Doc5    Doc6    Doc7    Doc8    Doc9    Doc10   Vy
Doc1    –       0.160   0.153   0.478   0.106   0.202   0.117   0.146   0.114   0.174   0.183
Doc2    0.160   –       0.285   0.227   0.341   0.157   0.183   0.269   0.238   0.213   0.230
Doc3    0.153   0.285   –       0.186   0.235   0.369   0.360   0.367   0.265   0.176   0.266
Doc4    0.478   0.227   0.186   –       0.201   0.144   0.183   0.231   0.233   0.342   0.247
Doc5    0.106   0.341   0.235   0.201   –       0.126   0.178   0.258   0.252   0.200   0.211
Doc6    0.202   0.157   0.369   0.144   0.126   –       0.432   0.420   0.221   0.148   0.247
Doc7    0.117   0.183   0.360   0.183   0.178   0.432   –       0.447   0.283   0.201   0.266
Doc8    0.146   0.269   0.367   0.231   0.258   0.420   0.447   –       0.345   0.262   0.305
Doc9    0.114   0.238   0.265   0.233   0.252   0.221   0.283   0.345   –       0.302   0.250
Doc10   0.174   0.213   0.176   0.342   0.200   0.148   0.201   0.262   0.302   –       0.224
Vx      0.183   0.230   0.266   0.247   0.211   0.247   0.266   0.305   0.250   0.224   –
Table 5 The differential of each similarity value from Table 4 with respect to the row average: Delta1: $\Delta^{1}_{xy} = V_{xy} - V_y = V_{xy} - \operatorname{avg}_x V_{xy}$

Δor     Doc1     Doc2     Doc3     Doc4     Doc5     Doc6     Doc7     Doc8     Doc9     Doc10
Doc1    –        -0.023   -0.030   0.295    -0.077   0.019    -0.066   -0.037   -0.069   -0.009
Doc2    -0.070   –        0.055    -0.003   0.111    -0.073   -0.047   0.039    0.008    -0.017
Doc3    -0.113   0.019    –        -0.080   -0.031   0.103    0.094    0.101    -0.001   -0.090
Doc4    0.231    -0.020   -0.061   –        -0.046   -0.103   -0.064   -0.016   -0.014   0.095
Doc5    -0.105   0.130    0.024    -0.010   –        -0.085   -0.033   0.047    0.041    -0.011
Doc6    -0.045   -0.090   0.122    -0.103   -0.121   –        0.185    0.173    -0.026   -0.099
Doc7    -0.148   -0.082   0.095    -0.082   -0.087   0.167    –        0.182    0.018    -0.064
Doc8    -0.159   -0.036   0.062    -0.074   -0.047   0.115    0.142    –        0.040    -0.043
Doc9    -0.136   -0.012   0.015    -0.017   0.002    -0.029   0.033    0.095    –        0.052
Doc10   -0.050   -0.011   -0.048   0.118    -0.024   -0.076   -0.023   0.038    0.078    –
STD     ±0.074   ±0.040   ±0.051   ±0.090   ±0.044   ±0.086   ±0.079   ±0.061   ±0.032   ±0.047
A Domain Specific ESA Method for Semantic Text Matching
Table 6 The final result of our experiment over the designed test-case with 10 documents: based on the simple summation of values in Table 5 and its transposed ($R_{xy} = \Delta^{1}_{xy} + \Delta^{2}_{xy} = \Delta^{1}_{xy} + \Delta^{1}_{yx}$), the final R measure is computed. The final similarity level is encoded by the different gradations of red. Higher saturation suggest a semantic closeness

R      Doc1    Doc2    Doc3    Doc4    Doc5    Doc6    Doc7    Doc8    Doc9    Doc10
Doc1   –       –0.094  –0.144  0.525   –0.182  –0.026  –0.214  –0.196  –0.206  –0.060
Doc2   –0.094  –       0.073   –0.024  0.241   –0.163  –0.129  0.003   –0.005  –0.029
Doc3   –0.144  0.073   –       –0.141  –0.007  0.225   0.189   0.163   0.013   –0.138
Doc4   0.525   –0.024  –0.141  –       –0.056  –0.206  –0.146  –0.090  –0.032  0.213
Doc5   –0.182  0.241   –0.007  –0.056  –       –0.205  –0.120  0.000   0.043   –0.035
Doc6   –0.026  –0.163  0.225   –0.206  –0.205  –       0.353   0.288   –0.055  –0.175
Doc7   –0.214  –0.129  0.189   –0.146  –0.120  0.353   –       0.324   0.051   –0.087
Doc8   –0.196  0.003   0.163   –0.090  0.000   0.288   0.324   –       0.135   –0.005
Doc9   –0.206  –0.005  0.013   –0.032  0.043   –0.055  0.051   0.135   –       0.129
Doc10  –0.060  –0.029  –0.138  0.213   –0.035  –0.175  –0.087  –0.005  0.129   –
Best   Doc4    Doc5    Doc6    Doc1    Doc2    Doc7    Doc6    Doc7    Doc8    Doc4
AVG    –0.066  –0.014  0.026   0.005   –0.036  0.004   0.024   0.069   0.008   –0.021
STD    ±0.142  ±0.082  ±0.121  ±0.162  ±0.093  ±0.190  ±0.182  ±0.141  ±0.073  ±0.089
gradation of yellow in the standard deviation bottom field represents the polarization of the set of results for each given entry in the set. Higher measures in this field intuitively suggest a better comprehension and differentiation of the peculiarities of a specific element with respect to the others in the set. For a global view, we summed up (cell-wise) the symmetric elements, creating the final object shown in Table 6. For example, R : Doc2_5 and R : Doc5_2 are both filled with the sum of Δor : Doc2_5 = 0.111 and Δor : Doc5_2 = 0.130, giving a value of 0.241. In this matrix the most significant similarity indications are highlighted with a red background, whose tone intensity positively correlates with their strength while considering the average and standard deviation of all the delta-based similarity metrics reported for the specific document. The 11th row represents, for each column (document), the best candidate for semantic matching. The highlighting color used here indicates the “natural” clusters that emerge from the document thematic matching process. It is interesting to note that, because the tint of the highlighting is defined on a column-based analysis, the same value can present different intensities, such as for W3_6 and W6_3 .
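The two-step computation behind Tables 5 and 6 can be sketched in a few lines. This is only a minimal illustration, assuming a symmetric similarity matrix whose diagonal is ignored; the function name `delta_scores` and the toy matrix are hypothetical and are not taken from the original system.

```python
def delta_scores(sim):
    """Return (avg, delta, r) for a square, symmetric similarity matrix.

    The diagonal of `sim` is ignored, as in Table 4.
    """
    n = len(sim)
    # Average similarity of each document to all the other documents.
    avg = [sum(sim[i][j] for j in range(n) if j != i) / (n - 1)
           for i in range(n)]
    # delta[x][y]: similarity of the pair minus the average of document y.
    delta = [[None if x == y else sim[x][y] - avg[y] for y in range(n)]
             for x in range(n)]
    # r[x][y]: sum of the two symmetric deltas, so r is symmetric again.
    r = [[None if x == y else delta[x][y] + delta[y][x] for y in range(n)]
         for x in range(n)]
    return avg, delta, r


# Toy 3-document matrix (hypothetical values, not taken from Table 4).
toy = [[1.0, 0.2, 0.4],
       [0.2, 1.0, 0.6],
       [0.4, 0.6, 1.0]]
avg, delta, r = delta_scores(toy)
```

Applying the same arithmetic by hand to the rounded values of Table 4 for the pair Doc2/Doc5 gives (0.341 - 0.230) + (0.341 - 0.211) = 0.241, which matches the entry reported in Table 6.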
6.1.1 Considerations on the Initial Testing
From the analysis of the results, we believe we can clearly identify some strong similarities, roughly corresponding to the darkest red-highlighted cells in Table 6:
• Doc1 and Doc4 are very similar, as they both describe the profession of car mechatronics engineer, even though from two different points of view (the first as a capacity certificate, whereas the latter as a job offer),
L. Mazzola et al.
• Doc2 and Doc5 are quite similar, as they are both strictly related to computer science subareas: one presenting a software developer vacancy on a well-known online job platform, the other characterizing the research topics and projects carried out in the “Data Intelligence” team at HSLU-Informatik,
• Doc3 is fairly comparable to Doc6, as they partially reproduce the first case (even if in this case the domain is health-related); here a good case is represented by the similarity also with Doc7 and Doc8, which describe hospital profiles and offers.
• Doc6, Doc7, and Doc8 constitute a reasonably related cluster, as they are all about health aspects and operations/services offered in the health domain. Here again, the relative relatedness of Doc4 is present.

Finally, Doc9 and Doc10, which are not specific to the domain used for building the system model, are included in the evaluation to showcase the effect of noise: no clear similarity emerges, but the effects of similar structure and common delimiter elements take a preponderant role, suggesting a similarity between the two, as also shown in Fig. 5.
6.2 An Additional Experiment

For this experiment, we retrieved existing job and course descriptions from online sources and used them without any preprocessing stage. This kept to a minimum the overfitting of the test cases to our corpus and our methodology. We offer in this paper the description and the output of two main cases: the first one with two course descriptions in German for two somewhat related professional areas (which usually adopt different vocabularies), and the second one with a very short professional outline in German and a longer job opening in French. In this second case, the capability of our approach to rely on the domain-specific terminology is supported by the very specialist area the opening refers to. We could not present any example using CVs, due to the private nature of the information they contain and the new EU regulation on data privacy (EU GDPR).

Comparing documents in the same language

This experiment tests the capability of our “ESA space” to abstract from the specific stems used (for example, different registers or writing styles/habits) in favour of the meaning conveyed by the specific choice of words. The first document describes the official federal professional certification competencies for a “Custodian”:

DOC1 : Hauswart/-in mit eidg. FA Hauswart/-in mit eidg. FA Hauswartinnen und Hauswarte sind ausgewiesene Führungs- und Fachspezialisten. Für grössere bzw. komplexere Arbeiten beauftragen sie nach Rücksprache mit der vorgesetzten Stelle spezialisierte externe
Betriebe und begleiten die Ausführung. Sie verfügen über grundlegende administrative und rechtliche Kenntnisse. Sie sind zuständig für die Umsetzung der ökologischen und sicherheitstechnischen Richtlinien. Als Bindeglied oder Vermittler zwischen Nutzern, Kunden, Mietern und Liegenschaftsbesitzern leisten Hauswartinnen und Hauswarte einen wichtigen Beitrag für die Gesellschaft. EBZ Erwachsenenbildungszentrum Solothurn-Grenchen 91

The second document illustrates (on a more abstract level) a course for a “responsible for maintenance” of real estate:

DOC2 : Sachbearbeiter/in Liegenschaftenunterhalt KS/HEV Nachhaltigkeit ist eines der Schlagwörter, wenn es um Immobilien und deren Unterhalt geht. Dies erfordert ein fundiertes Verständnis für den Lebenszyklus einer Immobilie. Möchten Sie den Unterhalt Ihrer eigenen Liegenschaft optimieren oder werden Sie in Ihrem beruflichen Umfeld immer wieder mit diesem Thema konfrontiert? Interessieren Sie sich für die Bausubstanz und deren Alterung, um grössere Ersatzinvestitionen frühzeitig planen zu können? Wollen Sie beim Kauf einer Liegenschaft wissen, worauf Sie bezüglich der Bausubstanz achten müssen und die Notwendigkeit von anstehenden Unterhaltsarbeiten erkennen sowie die zu erwartenden Kosten abschätzen können? Baufachleute vermitteln Ihnen einen umfassenden Überblick über die Bauteile einer Liegenschaft und deren Lebensdauer, Bauerneuerungsaufgaben und damit verbundene vertragliche Bindungen. Zudem erhalten Sie wertvolle Tipps für die Praxis mit auf den Weg. KS Kaderschulen 201

The individual sets of stems produced by Step 3 are given in Table 7: the only common stem between Doc1 and Doc2 is gross, there represented in bold. To transform these stems into a similarity score, we can apply the same function used on the concept vectors (Sim(X, Y) → [0, 1]): the resulting score is 0.001494.
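As an illustration of how a single shared stem can yield a score close to zero, the sketch below computes a cosine similarity over sparse weighted stem vectors, the kind of measure named in the caption of Table 4. The stem weights are hypothetical (the weighting scheme of the model is not reproduced here), so the value obtained is not expected to match the published one.

```python
import math


def cosine_similarity(x, y):
    """Cosine similarity of two sparse vectors given as {stem: weight} dicts."""
    dot = sum(weight * y[stem] for stem, weight in x.items() if stem in y)
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_x * norm_y)


# Hypothetical weights: one low-weight shared stem among unshared ones.
doc_a = {"gross": 0.1, "hauswart": 2.0, "miet": 1.5}
doc_b = {"gross": 0.1, "liegenschaft": 2.5, "unterhalt": 1.8}
score = cosine_similarity(doc_a, doc_b)
```

With non-negative weights the result always lies in [0, 1], which is consistent with the range stated for Sim(X, Y).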
Instead, the third column represents the concepts extracted after Step 4 and limited by applying the threshold (only a part of the set is shown, as the full set of 150 concepts would have been too long to include). Computing the similarity measure for these sets yields a value of 0.261876. This clearly shows a case where a comparison in the space of stems provides a very low (almost non-existent) similarity between the two documents, whereas their projection into the concept space is able to extract a higher-level meaning and re-balance the computed similarity level.

Comparing documents in different languages (special case)

In the second case, we show the capability of comparing two documents written in different languages, under the assumption that the domain-specific vocabulary is partially language-independent. The first document is a one-line short collection of terminology in German associated with the operation of cutting with a milling machine:
Table 7 An experiment to demonstrate the capabilities of the transformation space stem → concept to overcome the choices of words (stems) in favour of the underlying semantics

Doc 1
stem: [‘richtlini’, ‘fuhrung’, ‘gesellschaft’, ‘wichtig’, ‘beitrag’, ‘verfug’, ‘bzw’, ‘nutz’, ‘solothurn’, ‘vorgesetzt’, ‘fa’, ‘bindeglied’, ‘extern’, ‘spezialisiert’, ‘grench’, ‘stell’, ‘ausgewies’, ‘gross’, ‘rucksprach’, ‘sicherheitstechn’, ‘beauftrag’, ‘fachspezialist’, ‘leist’, ‘vermittl’, ‘miet’, ‘arbeit’, ‘kenntnis’, ‘kund’, ‘zustand’, ‘umsetz’, ‘okolog’, ‘komplex’, ‘rechtlich’, ‘administrativ’, ‘betrieb’, ‘eidg’, ‘grundleg’, ‘ausfuhr’, ‘hauswart’, ‘begleit’]
concept: [‘Alkoholkrankheit’, ‘Bindella’, ‘Seele’,..., ‘Leasing’, ‘Geldwäsche’, ‘Social Media’, ‘Immobilientreuhänder’,..., ‘Designmanagement’, ‘Partner Privatbank Zürich’, ‘Denkmalpflege’, ‘Bilanz’,..., ‘Grünes Gebäude’, ‘Hermeneutik’, ‘Immobilienmarkt’, ‘Immobilie’,..., ‘Energiemanagement’, ‘Depression’, ‘Angela Merkel’, ‘Star Trek: Der erste Kontakt’, ‘Tierversuch’, ‘Ethik’, ‘Immobilienmakler’, ‘Coaching’, ‘Design’, ‘Bauherrenberatung’,..., ‘Netzwerk-Marketing’, ‘Werbung’, ‘Ganztagsschule’, ‘Facilitymanagement’, ‘Haus- und Familienarbeit’,..., ‘Wohnungsbau’, ‘Experiments’, ‘Behinderung’, ‘Personalentwicklung’, ‘Due-Diligence-Prüfung’, ‘Energieeinsparung’, ‘Rehaklinik Hasliberg’, ‘Bauökonomie’,...]

Doc 2
stem: [‘worauf’, ‘gross’, ‘kauf’, ‘erhalt’, ‘kaderschul’, ‘vermitteln’, ‘fruhzeit’, ‘sowi’, ‘nachhalt’, ‘praxis’, ‘vertrag’, ‘liegenschaft’, ‘optimi’, ‘notwend’, ‘verbund’, ‘zud’, ‘beim’, ‘kost’, ‘baufachleut’, ‘imm’, ‘beruf’, ‘umfass’, ‘erfordert’, ‘unterhalt’, ‘der’, ‘acht’, ‘unterhaltsarbeit’, ‘interessi’, ‘bauteil’, ‘schlagwort’, ‘fundiert’, ‘bindung’, ‘erwart’, ‘umfeld’, ‘sachbearbeiterin’, ‘thema’, ‘ansteh’, ‘mocht’, ‘bausubstanz’, ‘konfrontiert’, ‘verstandnis’, ‘lebensdau’, ‘bezug’, ‘abschatz’, ‘ks’, ‘muss’, ‘lebenszyklus’, ‘erkenn’, ‘immobili’, ‘wertvoll’, ‘wiss’, ‘tipps’, ‘eig’, ‘uberblick’, ‘plan’, ‘geht’]
concept: [‘Religion’, ‘Verordnung (EG) Nr. 1907/2006 (REACH)’,..., ‘Infrastrukturmanagement’, ‘Fachlaufbahn’, ‘Bankbetriebslehre’, ‘Due-Diligence-Prüfung’, ‘Bildungsberatung’,..., ‘Facilitymanagement’, ‘Concierge’, ‘Motivation’, ‘Versicherer’, ‘Schulleitung’, ‘Personalentwicklung’, ‘Teambildung’,..., ‘Designmanagement’, ‘Lean Management’,..., ‘Immobilienmarkt’,..., ‘Projektkommunikation’, ‘Energieeinsparung’, ‘Schweizerischer Städteverband’,..., ‘Apotheke’, ‘Bauherrenberatung’,..., ‘ZEWO’, ‘Mediation’, ‘Leasing’, ‘Der Process’,...]
DOC3 : CNC Dreher cnc dreher cnc turner cnc dreher cnc fräsercnc dreher

Instead, the other document in this set describes a job offer for a “milling machine with automatic control operator”, in particular for the task of setting up and regulating such machines, and it is formulated in French:

DOC4 : Régleur CNC régleur cnc ok job sa cnc machine operator régleur cnc l’un de nos clients, une entreprise du jura bernois active dans le développement de systèmes automatisés, cherche de manière temporaire pour renforcer son équipe un/e :régleur cnc h/fvotre mission:vous êtes en charge de la préparation du travail et des mises en train de machines de transfert cnc. vous garantissez le suivi de production de composants horlogers tout en contrôlant la qualité de celles-ci à l’aide d’outils de contrôle nouvelle génération.votre profil:·vous avez effectué un cfc de mécanicien de précision ou équivalent·vous avez de bonnes connaissances de la programmation et du réglage sur machines cnc· vous avez idéalement de l’expérience dans l’usinage de composants horlogers·vous êtes autonome, précis, polyvalent et curieuxintéressé(e)? dans ce cas frédéric maugeon se réjouit à l’avance de la réception de votre dossier complet. merci de transmettre votre candidature en cliquant sur “postuler”

The individual sets of stems produced by Step 3 are given in Table 8: the only common stem between Doc3 and Doc4 is cnc, again represented in bold. However, due to the minimal number of stems in the Doc3 vector, this single element is already able to produce a similarity measure that is not close to zero, namely 0.208347. Given the specificity of the vocabulary adopted by these two documents, their projection into the semantic (ESA) space is able to stress the similarity of the underlying concepts, producing a similarity value in Step 5 of 0.771173.
It is noteworthy that many of the retrieved concepts common to this set are indeed strictly related to the milling machine world. They are definitely more semantically oriented than in the previous example, which included a significant portion of more generic concepts in the vector intersection.
7 Conclusions

In this work, we presented an ESA-inspired, domain-specific approach to semantically characterizing documents and comparing them for similarities. After clarifying the usage context and the functional requirements, we described the creation of a model that sits at the core of our proposal. The peculiarities of our approach are the enriching and filtering processes, which allow starting from a general-purpose
Table 8 The following is an experiment to demonstrate the capabilities of the computing similarities in documents about very specialised domains, regardless of the language in which they are formulated

Doc 3
stem: [‘cnc’, ‘turn’, ‘dreh’]
concept: [‘Schraube’,..., ‘Palettenwechsler’, ‘Steuerungstechnik’, ‘Polymechaniker’, ‘Rundschleifmaschine’,..., ‘IEEE 1284’, ‘Schnelle Produktentwicklung’, ‘Kinästhetik’, ‘Präzision’,..., ‘Roto Frank’, ‘Drehen (Verfahren)’, ‘Fanuc’,..., ‘Drehbank’, ‘Drechsler’, ‘Tischler’, ‘Zerspanungsmechaniker’, ‘CNC-Fachkraft’, ‘CAD’, ‘Werkzeugmechaniker’,..., ‘CNC-Drehmaschine’, ‘Häner’, ‘Fitting’, ‘AutoCAD’,..., ‘DMG Mori K.K.’,..., ‘Werkzeugmaschinenfabrik Arno Krebs’, ‘Sinumerik’, ‘Feldbus’, ‘Arbeitsumgebung’, ‘CNC-Maschine’,..., ‘Produktionswirtschaft’,..., ‘Drehmaschine’, ‘Metallurgie’, ‘Formenbau’,...]

Doc 4
stem: [‘job’, ‘train’, ‘cas’, ‘production’, ‘client’, ‘outil’, ‘contrôl’, ‘ci’, ‘jura’, ‘développement’, ‘travail’, ‘équip’, ‘effectué’, ‘aid’, ‘cnc’, ‘précis’, ‘précision’, ‘cfc’, ‘avanc’, ‘nouvell’, ‘autonom’, ‘réception’, ‘machin’, ‘charg’, ‘régleur’, ‘bernois’, ‘manièr’, ‘programmation’, ‘mis’, ‘ok’, ‘mécanici’, ‘bonn’, ‘qualité’, ‘merci’, ‘candidatur’, ‘cherch’, ‘dossi’, ‘cell’, ‘tout’, ‘temporair’, ‘horlog’, ‘frédéric’, ‘polyvalent’, ‘complet’, ‘connaissanc’, ‘entrepris’, ‘activ’, ‘systèm’, ‘operator’, ‘suivi’, ‘préparation’]
concept: [‘Werkzeugschleifen’, ‘Swatch Group’, ‘Cadwork’, ‘Landwirtschaftsschule’, ‘Arbeitsumgebung’,..., ‘Zerspanungsmechaniker’,..., ‘CAD’, ‘CNC-Fachkraft’,..., ‘Fitting’,..., ‘Schraube’,..., ‘Häner’, ‘Polymechaniker’, ‘Bearbeitungszentrum’,..., ‘Metallurgie’, ‘Formenbau’, ‘Prototypes’, ‘Thermografie’, ‘Präzision’, ‘Werkzeugmaschinenfabrik Arno Krebs’, ‘Machine to Machine’,..., ‘Fanuc’,..., ‘Produktionstechnik’, ‘DMG Mori K.K.’, ‘LNS SA’, ‘Drawing Interchange Format’, ‘Digitalisierung’, ‘Schnelle Produktentwicklung’, ‘Tebis’,..., ‘Drehmaschine’,..., ‘Computer-aided manufacturing’, ‘Produktionswirtschaft’, ‘Fräsmaschine’, ‘IEEE 1284’, ‘Feldbus’, ‘Maschinelles Lernen’,..., ‘Drehen (Verfahren)’, ‘CNC-Drehmaschine’,..., ‘AutoCAD’, ‘DMG Mori Aktiengesellschaft’,..., ‘Werkzeugmechaniker’, ‘Senkerodieren’,...]
corpus of documents and creating a domain-specific model. This computation happens at the system initialization stage, offering a model ready to use at run-time. To improve the performance, we designed additional data structures and parameters to allow a more fine-grained adjustment for each execution. On top of the model, we designed functions and metrics for seamless document characterization and similarity scoring.

The challenge of the ESA approach proposed in [2] is the aggregation of vector representations from single words to whole documents, as the document is the unit in our application domain. To solve this issue, we contribute a new ESA approach with a transposed vector space consisting of stems, representing Wikipedia text concepts as points in this space. This allows positioning arbitrary text documents in this space and comparing their similarity to Wikipedia entries and to all other text documents using vector distance. Our conclusion is that, even though this method is not directly applicable to concept extraction like traditional ESA, it produces meaningful results for semantic document matching based on similarity if the sets of concepts similar to the two texts are compared.

We applied our approach to curricula vitae, defining our domain through a German knowledge base of descriptions of educational experiences and job offers. We initially demonstrated statistically that the produced results are semantically related, based on a quality mono-dimensional measure transformation of the results. From this we can conclude that some semantics is captured by our approach. Furthermore, we designed a small set of 10 documents for a use case, divided into 3 clusters, with 2 unrelated elements. Similar documents were grouped by the algorithm, which thus demonstrated its potential for semantic text matching, starting from heterogeneous sources.
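The idea of the transposed space summarized above can be sketched as follows: concepts are points in stem space, a document is mapped to the concepts it is close to, and two documents are compared through these concept signatures. This is a toy illustration under stated assumptions, not the authors' implementation: the three-concept model, all weights, the threshold value and the function names `concept_signature` and `document_similarity` are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine similarity of two sparse vectors given as {key: weight} dicts."""
    dot = sum(w * y[k] for k, w in x.items() if k in y)
    norm_x = math.sqrt(sum(w * w for w in x.values()))
    norm_y = math.sqrt(sum(w * w for w in y.values()))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


def concept_signature(doc, concept_space, threshold=0.1):
    """Map a stem vector to {concept: similarity} for concepts above a threshold."""
    signature = {}
    for concept, stems in concept_space.items():
        sim = cosine(doc, stems)
        if sim >= threshold:
            signature[concept] = sim
    return signature


def document_similarity(doc_a, doc_b, concept_space, threshold=0.1):
    """Compare two documents through their concept signatures."""
    sig_a = concept_signature(doc_a, concept_space, threshold)
    sig_b = concept_signature(doc_b, concept_space, threshold)
    return cosine(sig_a, sig_b)


# Hypothetical toy model: three concepts described by weighted stems.
concept_space = {
    "CNC-Maschine": {"cnc": 1.0, "machin": 0.5, "dreh": 0.5},
    "Drehen (Verfahren)": {"dreh": 1.0, "turn": 0.5},
    "Immobilie": {"liegenschaft": 1.0},
}
doc_de = {"cnc": 1.0, "dreh": 1.0}
doc_fr = {"cnc": 1.0, "machin": 1.0, "régleur": 1.0}

stem_level = cosine(doc_de, doc_fr)
concept_level = document_similarity(doc_de, doc_fr, concept_space)
```

In this toy setting the concept-level score is higher than the stem-level one, which mirrors qualitatively the behaviour reported for the two experiments in Sect. 6.2.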
Through our contribution, we show that restricting the knowledge base for the ESA space to a specific domain and filtering too common or too infrequent elements from both dimensions of the model seem to improve the capability of recognizing semantic relationships amongst documents, by reducing the noise affecting the system. Figure 5 shows the dendrogram (hierarchical tree) produced by the normalization of the distance matrix using the complete approach, to balance the clusters by reducing the summation of the inter-cluster distance.

The major limit of our approach is its language dependency, because the model is produced on a specific language-based jargon. Unfortunately, this is currently a structural limit, since we developed our model on the German language, which is the most prominent language used in Switzerland. The job offers and the educational experiences are specific to Switzerland and described in the same language. We do not expect any major issues (except the potential lack of data) in repeating the full process using sources in different languages.

Currently, this prototype is being compared with manually annotated CVs in order to assess its stability (absence of macroscopic false positives) and also to verify its usefulness (in terms of the additional enrichment it can produce with respect to the information a human operator produces in a typical iteration). No structural result is available yet in this respect, as the testing is still in an initial phase.
Fig. 5 Right: The heatmap of the document distances, for the use case in Table 6. Color saturation positively correlates with the score. Left: The corresponding dendrogram: here the clusters are highlighted by the use of different colours. We predefined the presence of exactly 4 clusters
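A grouping of this kind can be sketched with a small agglomerative procedure. The sketch below assumes that the "complete approach" refers to complete-linkage clustering (the distance between two clusters is the largest pairwise distance between their members) and stops when a predefined number of clusters is reached. The function name and the toy distance matrix are hypothetical; how the R measure is turned into a distance is not specified here.

```python
def complete_linkage(dist, n_clusters):
    """Agglomerative clustering with complete linkage.

    dist: square symmetric matrix of pairwise distances (list of lists).
    Returns a list of clusters, each a list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: the farthest pair defines the distance.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the two closest clusters.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy example: five points on a line, distance = absolute difference.
points = [0, 1, 5, 6, 20]
dist = [[abs(p - q) for q in points] for p in points]
groups = complete_linkage(dist, n_clusters=3)
```

For larger document sets, a library routine would normally be preferred over this quadratic-per-merge loop; the sketch only makes the merging rule explicit.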
Despite the promising results, we would like to improve the system and extend the testing, in particular with respect to:
1. implementing a quantitative benchmarking of the document matching method based on several gold standards,
2. adopting a granular approach: we expect to improve the document characterization through its concept-based signature, in particular considering that curricula vitae are intrinsically already semi-structured documents,
3. developing customizable metrics for stem weighting in the domain-specific model, allowing the selection at runtime of which one to adopt for a specific run,
4. envisioning different distance metrics for comparing vector entries in the knowledge matrix, in order to stress distinctive aspects of our model vector space,
5. estimating the effects of the parameter choice on the output, in order to identify optimal parameter sets,
6. devising an approach to deal with multiple languages. Switzerland is a multi-lingual entity, so this is definitely interesting in itself, but also with a view to comparing documents written in different languages or considering entries with sections in various languages. An idea we are assessing is to create different ESA models, each one starting from a dump in the relevant language, and then somehow relate them using the metadata stating the equivalence of pages in different languages (normally present in Wikipedia as “Languages” in the bottom left of a page).
Some of these aspects will be researched in the next project steps, together with the concurrent semi-automatic creation of a lightweight ontology for the concepts existing in our domain.

Acknowledgements The research leading to this work was partially financed by the KTI/Innosuisse Swiss federal agency, through a competitive call. The financed project KTI-Nr. 27104.1 is called CVCube: digitale Aus- und Weiterbildungsberatung mittels Bildungsgraphen. The authors would like to thank the business project partner for the fruitful discussions and for allowing us to use the examples in this publication. We would like to thank Benjamin Haymond for his very helpful and precise revision and language editing support of this manuscript.
References
1. J.E. Alvarez, H. Bast, A review of word embedding and document similarity algorithms applied to academic text, in Bachelor’s Thesis, University of Freiburg (2017). https://pdfs.semanticscholar.org/0502/05c30069de7df8164f2e4a368e6fa2b804d9.pdf
2. O. Egozi, S. Markovitch, E. Gabrilovich, Concept-based information retrieval using explicit semantic analysis. ACM Trans. Inf. Syst. (TOIS) 29(2), 8 (2011)
3. Y. Song, D. Roth, Unsupervised sparse vector densification for short text similarity, in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (2015), pp. 1275–1280
4. E. Gabrilovich, S. Markovitch, Computing semantic relatedness using Wikipedia-based explicit semantic analysis, in IJCAI, vol. 7 (2007), pp. 1606–1611
5. D. Bogdanova, M. Yazdani, SESA: Supervised Explicit Semantic Analysis. arXiv preprint arXiv:1708.03246 (2017)
6. M. Pagliardini, P. Gupta, M. Jaggi, Unsupervised Learning of Sentence Embeddings Using Compositional n-gram Features. arXiv preprint arXiv:1703.02507 (2017)
7. A. Waldis, L. Mazzola, M. Kaufmann, Concept extraction with convolutional neural networks, in Proceedings of the 7th International Conference on Data Science, Technology and Applications (DATA 2018), vol. 1 (2018), pp. 118–129
8. K. Bennani-Smires, C. Musat, M. Jaggi, A. Hossmann, M. Baeriswyl, EmbedRank: Unsupervised Keyphrase Extraction Using Sentence Embeddings. arXiv preprint arXiv:1801.04470 (2018)
9. Y. Yao et al., Granular computing: basic issues and possible solutions, in Proceedings of the 5th Joint Conference on Information Sciences, vol. 1 (2000), pp. 186–189
10. C. Mencar, Theory of Fuzzy Information Granulation: Contributions to Interpretability Issues (University of Bari, 2005), pp. 3–8
11. M.M. Gupta, R.K. Ragade, R.R. Yager, Advances in Fuzzy Set Theory and Applications (North-Holland Publishing Company, 1979)
12. G. Salton, C. Buckley, Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 24(5), 513–523 (1988)
13. K. Lund, C. Burgess, Producing high-dimensional semantic spaces from lexical co-occurrence. Behav. Res. Methods Instrum. Comput. 28(2), 203–208 (1996)
Smart Manufacturing Systems: A Game Theory based Approach
Dorothea Schwung, Jan Niclas Reimann, Andreas Schwung and Steven X. Ding
Abstract This paper presents a novel approach for self-optimization and learning as well as plug-and-play control of highly flexible, modular manufacturing units. The approach is inspired by recent encouraging results of game theory (GT) based learning in computer and control science. However, instead of representing the entire control behavior as a strategic game, which might result in long training times and huge data set requirements, we restrict the learning process to the supervisor level by defining appropriate parameters from the basic control level (BCL) to be learned by learning agents. To this end, we define a set of interface parameters to the BCL, programmed in IEC 61131 compatible code, which will be used for learning. Typical control parameters include switching thresholds, timing parameters and transition conditions. These parameters will then be considered as players in a multi-player game, resulting in a distributed optimization approach. We apply the approach to a laboratory testbed consisting of different production modules, which underlines the efficiency improvements for manufacturing units. In addition, plug-and-produce control is enabled by the approach, as different configurations of production modules can efficiently be put in operation by re-learning the parameter sets.
D. Schwung (B) · J. N. Reimann · A. Schwung
South Westphalia University of Applied Sciences, Soest, Germany
S. X. Ding
University of Duisburg-Essen, Duisburg, Germany
© Springer Nature Switzerland AG 2020 R. Jardim-Goncalves et al. (eds.), Intelligent Systems: Theory, Research and Innovation in Applications, Studies in Computational Intelligence 864, https://doi.org/10.1007/978-3-030-38704-4_3
D. Schwung et al.
1 Introduction

The digitalization of the industrial sector gives rise to ever more complex demands on state-of-the-art manufacturing processes. One of the most prominent potentials offered by flexible manufacturing is lot size 1 production of individualized products. However, this requires highly flexible production processes that adapt fast and reliably in order to stay competitive. To this end, highly automated production environments are required, offering full plug-and-work functionality [1, 2], fast reconfiguration [3] and fast adaptation to different production goals. A crucial requirement for such systems is a modular system structure on the hardware as well as on the software level, which however poses challenges for their control. Hence, decentralized and distributed control and communication systems in a multi-agent system (MAS) setting [4] have to be developed. In addition, due to the changing production environments and requirements as well as the demand for efficient production, self-learning capabilities have to be included in the control system.

Self-learning can be implemented using different approaches for machine learning, namely unsupervised, supervised and reinforcement approaches, see [5, 6] for recent overviews. Also, approaches based on game theory have been developed, though mostly in a static setting. Essentially, GT is concerned with representing the optimization process in the form of a game with multiple players [7]. Each player continuously takes actions in an environment while observing the other players’ actions and reacting in a way that optimizes an overall objective function. Consequently, optimization in a game setting is inherently distributed, which makes it suitable for large technical systems including distributed production environments.
In [8], the application of dynamic potential games to the management of distributed communication networks is presented, while [9] applies games to energy-efficient power control and subcarrier allocation in networks; see also the overview in [10] for further references in this field. Applications of game theory based control have been presented in [11, 12]. Smart grid optimization using state based potential games has been reported in [13], while supply chain coordination using cooperative game theory is presented in [14]. The modeling of driver and vehicle interactions is reported in [15]. However, successful implementations of GT based approaches in manufacturing domains are rare and often restricted to scheduling problems [16], mainly due to various obstacles inherent to GT which prevent a broader application. First, GT approaches often suffer from relatively long training times, and the training has to be executed while running the game, i.e. by interacting with the production system. Second, GT requires some exploration of the environment to be successful in finding optimal policies. However, exploration in industrial processes is naturally limited, on the one hand by safety constraints and on the other hand by production requirements if learning is implemented online while manufacturing continues. Consequently, applications of games in industrial settings have not been reported to date.

In this paper we present a novel approach for a decentralized control design of modular production units, thereby avoiding the above mentioned drawbacks. To this end, we restrict the self-learning capabilities to the supervisory control level
(SCL) implemented by distributed learning modules. In particular, we assume that each production module’s basic control functions, such as the logic of sequential control, are implemented using IEC 61131 compatible program code, e.g. in the form of functions, function blocks or sequential function charts (SFC). The task of the distributed learning modules is the learning of optimal control parameters and optimal control sequences, including optimal timing, by means of a multi-agent GT architecture. Hence, each control parameter serves as a player in the multi-agent game setting.

We apply our approach to a laboratory scale testbed representing parts of a typical bulk material production process. Experiments on a detailed simulation environment as well as at the testbed show promising results and a great potential for improvements of production efficiency in terms of energy consumption as well as throughput times. Furthermore, in combination with the service-oriented architecture (SOA) proposed previously [17], fast reconfiguration and fast re-adaptation of the control system in case of module exchange or new module integration are obtained, allowing for plug-and-produce control. Additionally, the proposed approach allows for implementation in existing production installations by means of appropriately configured add-on learning modules.

A preliminary version of this work has been presented in [18], where, however, we presented a similar approach of self-learning on the SCL using centralized actor-critic reinforcement learning [19, 20]. The approach presented here, using GT, provides a much simpler learning approach which operates in a fully distributed manner.

The paper is organized as follows. In Sect. 2 we state the underlying problem. In Sect. 3 we present the GT-based distributed supervisory control approach. Section 4 presents results obtained on the simulation environment, while Sect. 5 gives conclusions and an outlook on future work.
2 Problem Statement

In this section, we introduce the challenges at hand and formally state the underlying problem. We consider system structures as depicted in Fig. 1. As shown, we consider modular production systems consisting of several potentially different subsystems. We assume that each module is interchangeable within the considered manufacturing process, such that each module can have an arbitrary position in the production and module schedule. Correspondingly, we assume that each module is equipped with its own control system responsible for the local control functions of that module. In addition, we assume that the modules are connected via suitable communication interfaces such as Fieldbus, Ethernet or wireless communication. Furthermore, we assume that the modules can be uniquely identified within the production environment, e.g. by using identification techniques such as RFID or bar code. Note that the above stated system structure is quite common in industrial environments. Examples can be found in state-of-the-art manufacturing processes as well as in the process industry.
D. Schwung et al.
Fig. 1 Considered system structure consisting of several subsystems with their own PLC-based control system
After describing the considered system setup, we will now state the considered problem: Consider the distributed system $S$ with $l$ subsystems $S_i$ as illustrated in Fig. 1, where each subsystem is assumed to contain $n_i$ different control-relevant parameters. In addition, we define a number of production goals $m_i$ for each module, indicated by $e_{i,j}(n_i)$ with $i = 1, \ldots, l$, $j = 1, \ldots, m_i$. Then, find the optimal production control over a given production episode $t = 0, \ldots, T$ by optimizing
$$e^*(t) = \max_{n_i} \sum_{t=0}^{T} \sum_{i,j} e_{i,j}(t). \tag{1}$$
Some remarks on the previously defined problem are in order. The production goals $e_{i,j}(n_i)$ can be arbitrarily defined based on the given process objectives and available process measurements. Examples include product concentrations in chemical plants, mass flows in bulk good plants or processing times in manufacturing plants. Note that not every module necessarily has such production goals; instead, a production goal might be defined as the output of a sequence of subsequent modules. The length of the considered production episode is closely related to the production goals, as they become available only after some processing time. Hence, the episode should be at least as long as the processing times. The number of samples per episode should be chosen such that the important dynamics of the process parameters are represented. The control-relevant parameters depend on the design of the BCL and have to be defined in an application-oriented manner. We will discuss this in more detail in Sect. 3.4.
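To make the double sum in Eq. (1) concrete, the following minimal sketch evaluates the episode objective for sampled goal values and picks the best parameter from a finite candidate set. The names `episode_objective`, `best_parameter` and `toy_plant` are hypothetical and not part of the original work; in particular, `toy_plant` is an invented stand-in for a plant simulation, and the brute-force search only illustrates the meaning of the maximization, not the learning approach of Sect. 3.

```python
from typing import Callable, Iterable, Sequence

# episode[t] holds the goal values e_{i,j}(t) of all (module, goal) pairs at sample t
Episode = Sequence[Sequence[float]]


def episode_objective(episode: Episode) -> float:
    """Double sum of Eq. (1): over samples t = 0..T and over all goals (i, j)."""
    return sum(sum(sample) for sample in episode)


def best_parameter(candidates: Iterable[float],
                   simulate: Callable[[float], Episode]) -> float:
    """Brute-force the maximization in Eq. (1) over a finite candidate set."""
    return max(candidates, key=lambda n: episode_objective(simulate(n)))


def toy_plant(threshold: float) -> Episode:
    """Hypothetical plant model (assumption): three samples, two goals."""
    quality = 1.0 - (threshold - 0.5) ** 2  # goal that peaks at threshold 0.5
    throughput = 0.5                        # goal that ignores the parameter
    return [[quality, throughput] for _ in range(3)]
```

With the toy model, `best_parameter([0.0, 0.5, 1.0], toy_plant)` selects the threshold at which the summed goals are largest.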
Smart Manufacturing Systems: A Game Theory based Approach
3 Game Theory based Self-learning in Flexible Manufacturing
In this section we present our approach for game theory based self-learning in flexible manufacturing systems. After a short introduction to GT, we describe the BCL and define typical optimization parameters to be learned for production optimization. The self-learning agents are described thereafter together with a scheme for the coordination of the learning agents.
3.1 Introduction to Dynamic Games
Let $\mathcal{N} = \{1, \ldots, N\}$ be the set of players and let $\mathcal{X} \subseteq \mathbb{R}^S$ denote the set of states of the game, where the dimensionality of the state set can be different from the number of players. At every time step $t$, the state vector of the game is represented by $x_t = (x_t^k)_{k=1}^{S} \in \mathcal{X}$. Every player $i \in \mathcal{N}$ can be influenced only by a subset of states $\mathcal{X}_i \subseteq \mathcal{X}$. We define the partition of the state space $\mathcal{X}$ in the component domain, i.e. we define $\mathcal{X}(i) \subseteq \{1, \ldots, S\}$ as the subset of indexes of state-vector components that influence player $i$, and $x_t^i = (x_t^k)_{k \in \mathcal{X}(i)}$ indicates the value of the state vector for player $i$ at time $t$. We also define $x_t^{-i} = (x_t^k)_{k \notin \mathcal{X}(i)} \in \mathcal{X}_{-i}$ for the vector of components that do not influence player $i$.
Let $\mathcal{U} \subseteq \mathbb{R}^Q$ denote the set of actions of all players, and let $\mathcal{U}^i \subseteq \mathbb{R}^{Q_i}$ represent the subset of actions of player $i$. We denote by $u_t^i \in \mathcal{U}^i$ the action vector of player $i$ at time $t$, such that the vector $u_t = (u_t^1, \ldots, u_t^N)$ concatenates the action vectors of all players. We also define $u_t^{-i} = (u_t^1, \ldots, u_t^{i-1}, u_t^{i+1}, \ldots, u_t^N)$ as the vector of all players' actions except that of player $i$. Hence, by slightly abusing notation, we can rewrite $u_t = (u_t^i, u_t^{-i})$.
The state transitions are determined by $f: \mathcal{X} \times \mathcal{U} \times \mathbb{N} \to \mathcal{X}$, such that the nonstationary Markovian dynamic equation of the game is $x_{t+1} = f(x_t, u_t)$, which can be split among components: $x_{t+1}^k = f^k(x_t, u_t)$ for $k = 1, \ldots, S$. The dynamic is Markovian because the state transition to $x_{t+1}$ depends only on the current state-action pair $(x_t, u_t)$, rather than on the whole history of state-action pairs. For each player we define a utility function $\pi^i: \mathcal{X}_i \times \mathcal{U} \times \mathbb{N} \to \mathbb{R}$, such that, at every time $t$, each player receives a utility value $\pi^i(x_t^i, u_t^i, u_t^{-i})$. The aim of player $i$ is to find the sequence of actions $\{u_0^i, \ldots, u_t^i, \ldots\}$ that maximizes its long-term cumulative utility, given the other players' sequence of actions $\{u_0^{-i}, \ldots, u_t^{-i}, \ldots\}$. Thus, a discrete-time infinite-horizon noncooperative, nonstationary Markovian dynamic game can be represented as a set of $N$ coupled optimal control problems [8]:
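$$\max_{\{u_t^i\} \in \prod_{t=0}^{\infty} \mathcal{U}^i} \; \sum_{t=0}^{\infty} \beta^t \pi^i(x_t^i, u_t^i, u_t^{-i}), \quad \forall i \in \mathcal{N} \tag{2}$$
$$\text{s.t.} \quad x_{t+1} = f(x_t, u_t), \quad x_0 \text{ given} \tag{3}$$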
with $0 < \beta < 1$ the discount factor bounding the cumulative utility. The above problem is infinite-horizon because the reward is accumulated over infinite time steps. The solution to the problem is given by the Nash equilibrium (NE) of the game, which is a sequence of actions $\{u_t^*\}_{t=0}^{\infty}$ satisfying the following condition for every player $i \in \mathcal{N}$:
$$\sum_{t=0}^{\infty} \beta^t \pi^i(x_t^i, u_t^{*i}, u_t^{*-i}) \geq \sum_{t=0}^{\infty} \beta^t \pi^i(x_t^i, u_t^i, u_t^{*-i}). \tag{4}$$
To solve the above problem, i.e. to find the NE of the dynamic game, different approaches exist. In general, such a game can be recast as a set of coupled optimal control problems [8], which are, however, difficult to solve. By introducing the class of dynamic potential games (DPG), the problem can be relaxed to a single multivariate optimal control problem (MOCP), which is simpler to solve but still requires a considerable amount of computation, making it hard to implement in real-time and resource-constrained environments. In [21], binary log-linear learning (BLLL) [22] has been introduced for state-based potential games, a related approach aiming to maximize the immediate reward. BLLL is considerably less complex to implement than MOCP solvers and hence better suited for PLC implementation.
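As a small illustration of the discounted cumulative utility and of the equilibrium condition in Eq. (4), the sketch below checks unilateral deviations in a toy two-player game. It is deliberately restricted, and these restrictions are assumptions of the example rather than part of the formulation above: the game is stateless, each player holds one constant action over a truncated horizon, and the `coordination` utility is a hypothetical stand-in.

```python
from typing import Callable, Sequence, Tuple

Profile = Tuple[int, ...]  # one (constant) action per player


def discounted_utility(stage_utilities: Sequence[float], beta: float) -> float:
    """Truncated version of the sum in Eq. (2): sum_t beta^t * pi_t."""
    return sum(beta ** t * u for t, u in enumerate(stage_utilities))


def is_nash(profile: Profile,
            action_sets: Sequence[Sequence[int]],
            utility: Callable[[int, Profile], float],
            beta: float = 0.9,
            horizon: int = 50) -> bool:
    """Check Eq. (4) for constant action sequences: no player gains by deviating alone."""
    for i, actions in enumerate(action_sets):
        base = discounted_utility([utility(i, profile)] * horizon, beta)
        for a in actions:
            deviation = profile[:i] + (a,) + profile[i + 1:]
            value = discounted_utility([utility(i, deviation)] * horizon, beta)
            if value > base + 1e-12:
                return False
    return True


def coordination(i: int, profile: Profile) -> float:
    """Hypothetical stage utility: both players are rewarded for matching actions."""
    return 1.0 if profile[0] == profile[1] else 0.0
```

In this toy game both matching profiles, `(0, 0)` and `(1, 1)`, pass the check, while a mismatched profile such as `(0, 1)` does not, because either player can improve its utility by switching.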
3.2 Learning Algorithm
In this section, we present a simple yet efficient learning algorithm for dynamic games, inspired by the idea of BLLL. We assume that the system dynamics are unknown, with only the initial state $x_0$ given. Hence, the learning algorithm has to infer optimal actions for each player from interacting with the environment. In each episode, each player provides an action vector from its current policy to the environment. At the end of each episode $e$, the utilities of each player $\pi_e^i(x_t^i, u_t^i, u_t^{-i})$ are calculated. Furthermore, the mean of the individual utilities
$$\mu_e = \frac{1}{N} \sum_{i=1}^{N} \pi_e^i(x_t^i, u_t^i, u_t^{-i}) \tag{5}$$
is calculated as a measure of the collective performance of the individual actions. Furthermore, we define the change of the utility mean $\Delta\mu = \mu_e - \mu_{e-1}$ in the last episode and its direction $\delta\mu$ as
$$\delta\mu = \begin{cases} \operatorname{sgn}(\Delta\mu), & \text{if } \Delta\mu \neq 0 \\ -1, & \text{otherwise.} \end{cases} \tag{6}$$
Similar to BLLL, within one update step, only one action is updated. This update is done by means of a mixture of exploration and exploitation of the action space. More specifically, we first select the player $i$ and its action $u^{i,j}$ to be updated with equal probability. The chosen action is either set to a permissible random value or updated, depending on the exploration rate $\epsilon$, which decays over time. The update of action $u^{i,j}$ depends on $\delta\mu$, the action change rate $\alpha_c$ and the maximum allowed action change rate $\alpha_{mc}$ and is defined as
$$u_{new}^{i,j} = \begin{cases} u_{old}^{i,j} + \min(\delta\mu \cdot \alpha_c \cdot \operatorname{sgn}(\Delta u^{i,j}),\, 0), & \text{if } u_{old}^{i,j} \geq 1 - \alpha_c \\ u_{old}^{i,j} + \max(\delta\mu \cdot \alpha_c \cdot \operatorname{sgn}(\Delta u^{i,j}),\, 0), & \text{if } u_{old}^{i,j} \leq \alpha_c \\ u_{old}^{i,j} + \max(\min(\delta\mu \cdot \alpha_c \cdot \operatorname{sgn}(\Delta u^{i,j}),\, \alpha_{mc}),\, -\alpha_{mc}), & \text{otherwise} \end{cases} \tag{7}$$
Note that the first two cases ensure that the action cannot increase if $u_{old}^{i,j} \geq 1 - \alpha_c$ and cannot decrease if $u_{old}^{i,j} \leq \alpha_c$, respectively. Otherwise, the magnitude of the action change is limited to $\alpha_{mc}$. The term $\delta\mu \cdot \operatorname{sgn}(\Delta u^{i,j})$ measures the quality of the last action change. The pseudocode of the algorithm is given as follows:
Data: Environment (simulation model), hyperparameters
Result: Set of optimization parameters
Logical randomization of all parameters;
for $e \in [1, e_{\max}]$ do
    Simulate episode;
    Calculate $\pi_e^i \; \forall i \in \mathcal{N}$;
    Calculate $\mu_e$;
    if $e > 1$ then
        Calculate $\Delta\mu$;
        Calculate $\delta\mu$;
    end
    Sample random action of random player;
    Store player's action setting;
    if Exploring then
        Set selected action to random value;
    else
        Update selected player's action;
    end
    Calculate and store action change $\Delta u^{i,j} = u_{new}^{i,j} - u_{old}^{i,j}$;
    Update $\alpha_c \leftarrow \alpha_c \cdot d_\alpha$;
    Update $\epsilon \leftarrow \epsilon \cdot d_\epsilon$;
end
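The pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: actions are assumed to be normalized to $[0, 1]$, the initial randomization is assumed to be uniform, the last change $\Delta u^{i,j}$ is assumed to be stored per action, $\delta\mu$ is assumed to be $-1$ before the first comparison is possible, and `toy_simulate` is a hypothetical stand-in for the simulation model.

```python
import random
from typing import Callable, List, Sequence

Actions = List[List[float]]  # actions[i][j] = j-th action of player i, in [0, 1]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def update_action(u_old: float, delta_mu: int, last_change: float,
                  alpha_c: float, alpha_mc: float) -> float:
    """Bounded exploitation step following the three cases of Eq. (7)."""
    step = delta_mu * alpha_c * sign(last_change)
    if u_old >= 1 - alpha_c:      # near the upper bound: never increase
        return u_old + min(step, 0.0)
    if u_old <= alpha_c:          # near the lower bound: never decrease
        return u_old + max(step, 0.0)
    return u_old + max(min(step, alpha_mc), -alpha_mc)


def learn(simulate: Callable[[Actions], Sequence[float]], n_actions: Sequence[int],
          e_max: int, eps: float, d_eps: float, alpha_c: float, d_alpha: float,
          alpha_mc: float, seed: int = 0) -> Actions:
    rng = random.Random(seed)
    actions = [[rng.random() for _ in range(n)] for n in n_actions]  # assumed uniform init
    last_change = [[0.0] * n for n in n_actions]
    mu_prev, delta_mu = None, -1  # delta_mu = -1 until two episodes exist (assumption)
    for _ in range(e_max):
        utilities = simulate(actions)              # one utility per player
        mu = sum(utilities) / len(utilities)       # Eq. (5)
        if mu_prev is not None:
            d_mu = mu - mu_prev
            delta_mu = sign(d_mu) if d_mu != 0 else -1   # Eq. (6)
        mu_prev = mu
        i = rng.randrange(len(actions))            # random player
        j = rng.randrange(len(actions[i]))         # random action of that player
        u_old = actions[i][j]
        if rng.random() < eps:                     # exploring
            u_new = rng.random()
        else:                                      # exploiting
            u_new = update_action(u_old, delta_mu, last_change[i][j], alpha_c, alpha_mc)
        actions[i][j] = u_new
        last_change[i][j] = u_new - u_old
        alpha_c *= d_alpha
        eps *= d_eps
    return actions


def toy_simulate(actions: Actions) -> List[float]:
    """Hypothetical environment: each player is best off with all actions at 0.5."""
    return [-sum((u - 0.5) ** 2 for u in player) for player in actions]
```

A call such as `learn(toy_simulate, [2, 3], 200, 0.5, 0.99, 0.1, 0.999, 0.05)` returns one list of normalized actions per player. Note that an action whose stored change is zero is not moved by the exploitation step, so in this sketch exploration is what first sets an action in motion.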
Fig. 2 GT-framework for the PLC-controlled system
3.3 Learning System Overview
The system setup developed for solving the problem stated in Sect. 2 is presented in Fig. 2. As shown, the general system structure follows the basic ideas of GT, where a learning agent acts on an environment by using an appropriate action set and observes the current state of the system together with the reward evaluating the system performance. However, in contrast to standard GT frameworks, where the environment is represented by the uncontrolled system, in this work the environment is represented by the controlled system. Thereby, the BCL is implemented using standard PLC programming following the specification of IEC 61131-3, IEC 61499 or IEC 61512, respectively. The GT-framework is implemented in the supervisory control layer (SCL). Consequently, the action set of the learner does not consist of actuation signals like motor or valve control signals, but of parameters of the BCL, ranging from controller parameters and threshold values in logic control to timing and transition conditions in sequence control. In this regard, the approach can also be interpreted as a form of meta learning, discussed recently in fairly different scenarios [23].
Note that the proposed framework allows for an upgrade of existing PLC programs by the addition of possibly distributed GT-functions. Furthermore, the available baseline program in the BCL assures safe operation also during exploratory learning behavior. Hence, safety constraints do not have to be dealt with during GT, which makes the implementation easier.
3.4 Basic Control Layer
The function of the BCL comprises all necessary control functions required to steer the operation of the system. This includes logic and sequential control functions and standardized function blocks, e.g. for motors, valves and controllers, as well as timer and counter functions. In particular, all safety-relevant control functions are implemented in the BCL. Hence, all low-level control functions are realized by the BCL. The connection to the supervisory self-learning level shall be accomplished by appropriately defined action interfaces to allow for a hierarchical decomposition of the control layers. To this end, we consider the pertinent standards for PLC programming, like IEC 61131 [24] and IEC 61499 [25], to define such interfaces. As will be illustrated in the following, action interfaces shall be defined independently according to the application at hand and on different levels of the BCL, i.e. on the control logic, on a function block (FB) basis as well as on higher logic levels like changes in sequential function charts, switching of control modes or service requests in service-oriented architectures (SOA) [26, 27].
According to the IEC 61131 framework, different possibilities exist for incorporating actions from the supervisory level, depending especially on the considered programming language. Most of these possibilities are related to the FB concept of IEC 61131. More specifically, we propose to define action inputs for those FBs which are employed for optimization-relevant parts of the control logic. Note that for standard FBs, as defined by the standard, existing inputs can be employed as action inputs. Typical examples range from the PT-input of timer FBs to set or reset inputs of bistable elements to slope parameters of ramp FBs, to name a few. Note that the action inputs can be either binary or continuous variables. Another available input to be defined as action input is the EN-input, which allows freezing the execution of the corresponding FB. Additional actions can be defined in the logic control part, i.e. the part connecting the FBs. In particular, threshold parameters for interlocks or switching conditions can be integrated as binary or continuous actions.
In addition to logic control, sequential control functions represented as sequential function charts (SFC) are often employed, forming a higher level from the program structure point of view. Different possibilities exist for defining the action interface. The first follows a similar approach to actions in FBs, in that parameters of the SFC are defined as actions. This includes transition conditions, actions associated with a step and action qualifiers. The other approach is based on a more structural evaluation of the SFC, as illustrated in Fig. 3. As can be seen, actions are associated with either different sequential orders or even the interchange of SFCs or parts of them. This latter approach, together with the GT-framework, also allows for automatic and adaptable reconfiguration and reorganization of production environments by learning to exchange different SFC structures. Following similar ideas as with the action-based reorganization of SFCs, we can also define actions for switching between different control modes. A possible scenario is the optimized switching between operation modes, e.g. from start-up to normal operation. Furthermore, an extension to the recently proposed SOAs is possible by defining actions in the form of service requests. In this case, the system learns an optimal orchestration of service requests.
With regard to SOAs and distributed architectures, using the implementation paradigms of the recently developed IEC 61499 offers some advantages [28]. Hence, we have a closer look at how actions can be defined in the IEC 61499 programming domain. In particular, function blocks (FB) as the basic element of the IEC 61499 programming paradigm, as well as the event-based execution of the FB and the available management commands, are of interest. The ability to create, delete and reset FB instances offers the possibility to extend the learning process, i.e. to create new logic based on the GT-process. In addition, the event-based execution of FBs allows for additional action interfaces. In [27], such a reconfiguration of control code is proposed on the service level, i.e. by exchanging corresponding services. From this point of view, the GT-framework can be seen as operating on the service orchestration level, extending the learning process to the design of new control code. We will present an implementation example for the action definition in our application example in Sect. 4.2.
Fig. 3 GT in SFC for convertible production environments
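One way to picture such an action interface is a thin mapping from a normalized action of the learner to a concrete FB input. The sketch below is purely illustrative: the class `ActionInterface`, the parameter names `timer_pt_s` and `fb_enable`, and the value ranges are hypothetical and do not stem from IEC 61131, IEC 61499 or the original work. It assumes that learner actions lie in $[0, 1]$ and that each BCL parameter has a fixed permissible range.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ActionInterface:
    """Maps a normalized action in [0, 1] to the value written to an FB input."""
    name: str             # hypothetical name of the BCL parameter
    low: float            # lower end of the permissible range
    high: float           # upper end of the permissible range
    binary: bool = False  # True for inputs such as EN or set/reset

    def to_bcl(self, action: float) -> Union[float, bool]:
        a = min(max(action, 0.0), 1.0)  # clamp so the BCL never sees out-of-range values
        if self.binary:
            return a >= 0.5
        return self.low + a * (self.high - self.low)


# Hypothetical examples: a timer preset (PT) in seconds and an enable (EN) input
timer_preset = ActionInterface("timer_pt_s", low=0.5, high=10.0)
fb_enable = ActionInterface("fb_enable", low=0.0, high=1.0, binary=True)
```

Here `timer_preset.to_bcl(0.5)` yields the midpoint of the permissible range, and `fb_enable.to_bcl(0.7)` yields a binary enable signal. Clamping in the interface keeps every learned value inside the range that the baseline program was designed for.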
3.5 Supervisory Self-learning Modules
The goal of the supervisory self-learning module is the optimization of predefined quality functions using a distributed game theory based algorithm as presented in Sect. 3.2. To finally set up the GT-problem to be solved by the presented architecture, we have to define the states and actions, the utility function as well as different learning parameters. In the previous section, we already presented a wide range of possibilities to define the actions for the GT-framework. Additionally, the states of the system have to be identified in a generalizable form for application scenarios. Basically, states are related to the available sensory equipment indicating a certain state of production, e.g. the product output of a production module. However, as we define the actions not only on the basic control level, the definition of states for e.g. reconfiguration tasks in SFCs or for SOAs has to be reevaluated. More specifically, in addition to states generated by the physical system in the form of measurements of physical variables, states representing the actual conditions of the control system, i.e. cyber states, also have to be defined. Such cyber states can include control-related as well as communication-related states. A typical control-related cyber state, used if SFCs have to be structurally optimized, is the current step of the sequence together with coded information about the current structure of the sequence. An example of communication-oriented states is the orchestration of communicating production modules in the form of communication graphs. Such states are particularly useful in SOAs, where the service exchange highly depends on the communication between modules.
The role of the utility function is that of a quality function representing the objectives for the optimization of the production environment. Depending on the given optimization objective, the utility can take different strategic goals into account. Typical performance indicators are the energy consumption of the overall plant as well as of certain actuators; process-optimization-related parameters like throughput time, wait and idle times, storage requirements or reconfiguration, setup and changeover times of machines; or quality-related objectives like production rejects or amount of reworking. In application scenarios, a combination of different objectives is used.
Besides the definition of the physical objectives, the timing of utilities has to be taken into account. In production scenarios, we distinguish between periodic and aperiodic, i.e. continuously operated, processes. This influences on the one hand the utility definition and on the other hand the length of the game. For periodic processes, the game can be stopped if the goal has been reached successfully or after a maximum time duration, and a utility is given at the end of the game. For continuously operated tasks, the game is played infinitely long. Hence, a continuous utility, allocated at each time step, has to be defined.
4 Results
In this section we apply the developed approach to a detailed simulation model of the laboratory testbed. We highlight aspects of the actual implementation for energy optimization in production environments and present results.
4.1 Laboratory Testbed
We start with a short explanation of the laboratory-scale testbed, which is illustrated in Fig. 4 [29]. The testbed consists of four modules for processing bulk good material. Modules 1 and 2 represent typical supply, buffer and transportation stations. Module 1 consists of a container and a continuously controlled belt conveyor from which the bulk good is carried to a mini hopper, which is the interface to module 2. Module 2 consists of a vacuum pump, a buffer container and a vibration conveyor. The vacuum pump itself transports the material from module 1 into an internal container. The material is then released to the buffer container by a pneumatically actuated flap and then charged to the vibration conveyor. The product is then conveyed to a mini hopper, which is the interface to the next module. Module 3 is the dosing module, consisting of a second vacuum pump with a similar setup as in module 2 and the dosing unit. This unit consists of a buffer container with a weighing system and a continuously controlled rotary feeder. The dosed material is finally transported to module 4 using a third vacuum pump. Module 4 is a filling unit where the product is filled into transport containers. Note that each module can have an arbitrary position in the production sequence or can be completely removed.
Each of the four modules is equipped with its own PLC-based control system using a Siemens ET200SP. The four control systems communicate with each other via Profinet. Each module is equipped with a set of sensors to monitor the module's state, allowing for discrete and/or continuous control of each module. Additionally, each module is equipped with an RFID reader and an RFID tag, which allows each station to identify the subsequent module in the production flow. Furthermore, the individual IP address of each module is coded on the tag.
Fig. 4 Schematic of the laboratory scale testbed
4.2 Implementation
For the application of the proposed GT-framework to the bulk good laboratory system, we consider the optimization of the energy consumption of the whole plant while simultaneously meeting the given process constraints. More specifically, we require the system to perform different dosing operations, namely continuous dosing and batch dosing, the latter being a periodic task. The optimization goal is to minimize the energy consumption of the dosing module as well as of the two supplier modules, while meeting the dosing requirements as well as additional constraints, particularly the avoidance of overflow of any of the buffer containers and hoppers. To start off, we first define the states, utilities and learning episode. After that, we give some details about the implementation of the BCL and determine the action set based on the BCL. Finally, we discuss the actual parameter settings of the applied GT-algorithm.
For the state variables we choose the filling levels of each buffer container and mini hopper. The level sensors work differently: the sensors of the buffer containers provide a continuous measurement, while the sensors of the mini hoppers are simple level switches with three states (full, medium, empty). The utilities are defined for each station as follows:
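$$r_{E,s} = -\eta_E \int P_s \, dt, \tag{8}$$
$$r_{T,s} = \eta_T \int \dot{V}_{T,s} \, dt, \tag{9}$$
$$r_{W,s} = -\eta_W \int \dot{V}_{W,s} \, dt, \tag{10}$$
$$r_s = r_{E,s} + r_{T,s} + r_{W,s}, \tag{11}$$
$$R_t = \sum_s r_s, \tag{12}$$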
where $s$ denotes the station, $\eta_E$, $\eta_T$ and $\eta_W$ are the utility weights for energy being used, transported good and wasted good, respectively, $P_s$ is the consumed electrical power and $\dot{V}_{T,s}$ and $\dot{V}_{W,s}$ are the actual mass flow between the stations and the wasted mass flow due to overflow. Hence, the final utility weights the energy consumption and the exchanged bulk good and additionally penalizes possible buffer overflows. The learning episode comprises a fixed duration of system simulation, in our case 100 s, with varying initial conditions depending on the state of the system. That is, the episode duration can be seen as a sliding episode window and hence can be used for both episodic and continuous operation.
For the implementation of the BCL, we follow the SOA-based approach presented in [17], which is based on the exchange of service requests and service provision by interconnected modules. The conditions to provide services and to request services are implemented in the BCL of each module and depend mainly on the level status of the buffer and hopper, respectively. More precisely, a service request is issued if the level in the buffer is below a certain threshold, and a service is provided if enough material is available, again indicated by a threshold value. So far, all these thresholds have been kept fixed during operation and, consequently, the orchestration of services in the system has not been optimized. Hence, we choose to employ these thresholds as the actions of our proposed supervisory controller. Additionally, we define the switching thresholds of each conveyor as actions. An overview of the actions and corresponding explanations is given in Table 1. A graphical representation of the parameter relations for each module is provided in Fig. 5. By means of the states, actions and utilities defined in this way, we are able to set up the GT algorithm as detailed in Sect. 3.2. The meta-parameters of the GT-based learning algorithm along with the weight parameters are given in Table 2.
Table 1 Definition of optimization parameters, i.e. actions of the game
Parameters of players    Explanation
$k_{1,A}$                Minimum fill-level of Buffer A to start conveyor belt
$k_{2,A}$                Minimum fill-level of Hopper A to start conveyor belt
$k_{3,A}$                Maximum fill-level of Hopper A to start conveyor belt
$k_{4,A}$                Drive speed of the conveyor belt at $k_{2,A}$
$k_{5,A}$                Drive speed of the conveyor belt at $k_{3,A}$
$k_{1,B}$                Minimum fill-level of Hopper A to start vacuum pump
$k_{2,B}$                Level difference threshold value to $k_{1,B}$ to stop vacuum pump
$k_{3,B}$                Maximum fill-level of Buffer B to start vacuum pump
$k_{4,B}$                Level difference threshold value to $k_{3,B}$ to stop vacuum pump
$k_{1,C}$                Minimum fill-level of Buffer B to start vibration conveyor
$k_{2,C}$                Maximum fill-level of Hopper B to start vibration conveyor
$k_{3,C}$                Run time of vibration conveyor during episode
$k_{1,D}$                Minimum fill-level of Hopper B to start vacuum pump
$k_{2,D}$                Level difference threshold value to $k_{1,D}$ to stop vacuum pump
$k_{3,D}$                Maximum fill-level of Buffer C to start vacuum pump
$k_{4,D}$                Level difference threshold value to $k_{3,D}$ to stop vacuum pump
$k_{1,E}$                Minimum fill-level of Buffer C to start rotary feeder
$k_{2,E}$                Minimum fill-level of Hopper C to start rotary feeder
$k_{3,E}$                Maximum fill-level of Hopper C to start rotary feeder
$k_{4,E}$                Drive speed of the rotary feeder at $k_{2,A}$
$k_{5,E}$                Drive speed of the rotary feeder at $k_{3,A}$
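The station utilities of Eqs. (8)-(12) can be sketched numerically as follows. This is a minimal example under assumptions that are not stated in the text: power and mass flows are available as equally spaced samples, the integrals are approximated by a rectangle rule with step `dt`, and the function and argument names are hypothetical.

```python
from typing import Sequence, Tuple

# One station: (power samples, transported flow samples, wasted flow samples)
Station = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def integrate(samples: Sequence[float], dt: float) -> float:
    """Rectangle-rule approximation of the time integral of a sampled signal."""
    return sum(samples) * dt


def station_utility(power: Sequence[float], transported: Sequence[float],
                    wasted: Sequence[float], dt: float,
                    eta_e: float, eta_t: float, eta_w: float) -> float:
    r_e = -eta_e * integrate(power, dt)        # Eq. (8): energy is penalized
    r_t = eta_t * integrate(transported, dt)   # Eq. (9): transported good is rewarded
    r_w = -eta_w * integrate(wasted, dt)       # Eq. (10): overflow is penalized
    return r_e + r_t + r_w                     # Eq. (11)


def total_utility(stations: Sequence[Station], dt: float,
                  eta_e: float, eta_t: float, eta_w: float) -> float:
    """Eq. (12): sum of the station utilities."""
    return sum(station_utility(p, t, w, dt, eta_e, eta_t, eta_w)
               for p, t, w in stations)
```

For instance, a station that draws a constant power of 1 and transports a constant flow of 2 over two samples with `dt = 0.5` and weights 10, 3 and 10 obtains a utility of $-10 + 6 - 0 = -4$ in the units implied by the chosen weights.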
Fig. 5 Relations between the parameters $k_{i,X}$ of each player $X$. The first row illustrates the parameters of conveyor belt $k_{i,A}$ and rotary feeder $k_{i,E}$, the second row illustrates the parameters $k_{i,B}$, $k_{i,D}$ of both vacuum pumps, the third row illustrates the parameters of vibration conveyor $k_{i,C}$
Table 2 Weight parameters and meta-parameters for the implementation example
Hyperparameter     Description                                                Example
$\eta_E$           Utility weight for energy consumption                      10
$\eta_T$           Utility weight for transported good                        3
$\eta_W$           Utility weight for wasted good                             10
$e_{\max}$         Maximum number of episodes                                 20,000
$\epsilon_2$       Exploration rate after $e_{\max}/2$ episodes               0.05
$\alpha_{pc,2}$    Relative action change rate after $e_{\max}/2$ episodes    0.25
$d_\epsilon$       Exploration rate decay                                     $\sqrt[e_{\max}/2]{\epsilon_2}$
$d_\alpha$         Action change decay                                        $\sqrt[e_{\max}/2]{\alpha_{pc,2}}$
$\alpha_{pc}$      Initial action change rate                                 $1/\sum_i \eta_i \approx 0.043$
$\alpha_{smc}$     Maximum action change rate                                 0.2
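The relation between the per-episode decay factors and the values prescribed after $e_{\max}/2$ episodes can be checked with a short calculation. The sketch rests on assumptions that should be verified against the original table: the exploration rate is assumed to start at 1 and to be multiplied by a constant factor each episode, the relative action change rate is assumed to be measured against its initial value, and the initial action change rate is read as the reciprocal of the summed utility weights. The helper `decay_factor` is hypothetical.

```python
def decay_factor(target: float, episodes: int, start: float = 1.0) -> float:
    """Constant factor d with start * d**episodes == target."""
    return (target / start) ** (1.0 / episodes)


E_MAX = 20_000
HALF = E_MAX // 2

d_eps = decay_factor(0.05, HALF)    # exploration rate reaches 0.05 after e_max / 2
d_alpha = decay_factor(0.25, HALF)  # action change rate drops to 25 % after e_max / 2
alpha_pc = 1.0 / (10 + 3 + 10)      # reciprocal of the summed utility weights
```

Both factors are only slightly below 1, so the rates shrink slowly per episode and reach the prescribed values halfway through training. Under the stated reading, `alpha_pc` agrees with the value of about 0.043 listed in the table.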
4.3 Experiments
In what follows, we show experimental results of the implementation of the presented framework with a focus on energy optimization in a production scenario. To this end, we use a detailed simulation model of the bulk good system (BGS). This allows for more efficient experiments in terms of learning speed and adjustment of appropriate learning parameters compared to experiments on the real plant.
In Fig. 6 (left), we show the learning behavior of the GT learning algorithm applied to the bulk good system. As can be seen, the overall potential strongly increases after a short exploration phase and finally converges. During convergence, we observe some coordination between the individual utilities until they end up in the final setting. The energy utility first increases with increasing transport utility; however, once the transport utility has reached its maximum, it slightly decreases and ends up at the overall best operating point. The overflow penalty finally converges to a no-overflow condition.
The progress of the optimization parameters is shown in Fig. 7. We note that the values are each normalized to the interval [0, 1]. In general, the convergence of all parameters can be observed. Furthermore, some parameters converge fast, while others tend to vary strongly over the course of training, which might indicate that these parameters are irrelevant with regard to the optimization objectives.
Fig. 6 Progress of the overall and individual utilities functions (left) and the ratio of transported material per energy consumed (right) during learning
Fig. 7 Development of optimization parameters during learning
The final parameter settings obtained after training are reasonable from the point of view of system operation. In particular, the actuators (players) of the supply stations, i.e. modules 1 and 2, exhibit a reasonable behavior which assures that the buffer in front of the rotary feeder is constantly filled. Hence, the rotary feeder is able to meet the demand correspondingly. Also, the switch-on durations of the vibration conveyor and vacuum pump as well as the conveyor and feeder speeds are learned in such a way that the throughput of each actuator is nearly identical and corresponds to the given demand. Interestingly, we can also observe from the parameters $k_{i,A}$ that the learning algorithm keeps the conveyor belt on continuously once a relatively low minimum fill level is reached. This seems counterintuitive with regard to energy consumption. However, the first module is crucial for the successful operation of the whole plant, as material which is kept back in the module might cause subsequent stations to run empty.
5 Conclusions
In this paper we presented a novel approach for optimization in large-scale distributed production environments. The approach is based on a formulation of the optimization problem in the form of a distributed game acting on a basic control layer programmed as standard PLC code. To this end, we employ a simple best-response learning algorithm to optimize suitably defined learning parameters used in PLC function blocks of the basic control layer. The algorithm operates in a fully distributed setting, allowing for an easy implementation in the basic control layer as individual function blocks. This allows for an application not only to newly developed control software but also to plants already in operation, as plug-in modules, without the requirement to change the actual control software. Furthermore, as the search space for optimal parameters is restricted by definition, unsafe operation due to inappropriate parameter settings is avoided. Also, the resulting control policy will not deviate much from the original policy, as only parameters are adjusted. Correspondingly, in contrast to approaches learning from scratch, such as reinforcement learning, where long training times are required until the policies finally converge, this approach can be run in real time with ensured baseline performance. The approach is applied to the energy optimization of a laboratory bulk good testbed with very promising results.
In future research, a PLC-ready implementation of the approach will be provided, making it applicable for real-time application in industrial environments. Furthermore, we will explore more advanced game-theoretical environments, especially dynamic as well as state-based potential games, as a framework for the development of more advanced learning algorithms. Also, the development of adaptable and learning-enabled PLC code opens directions for future research. Besides game-theoretic approaches, reinforcement learning based approaches incorporating the already available PLC code will also be investigated. Furthermore, a coupling of game-theoretic concepts with RL shall be studied.
References 1. J. Pfrommer, D. Stogl, K. Aleksandrov, S.E Navarro, B. Hein, J. Beyerer, Plug & produce by modelling skills and service-oriented orchestration of reconfigurable manufacturing systems. at-Automatisierungstechnik 63(10), pp. 790–800 (2015) 2. M. Schleipen, A. Lüder, O. Sauer, H. Flatt, J. Jasperneite, Requirements and concept for plug-and-work-adaptivity in the context of industry 4.0. at-Automatisierungstechnik 63(10), 801–820 (2015) 3. R.W. Brennan, P. Vrba, P. Tichy, A. Zoitl, C. Sünder, T. Strasser, V. Marik, Developments in dynamic and intelligent reconfiguration of industrial automation. Comput. Ind. 59(6), 533–547 (2008) 4. P. Leitao, A. Walter Colombo, S. Karnouskos, Industrial automation based on cyber-physical systems technologies: prototype implementations and challenges, Comput. Ind. 81, 11–25 (2016) 5. J. Wang, Y. Ma, L. Zhang, R.X. Gao, D. Wu, Deep learning for smart manufacturing: methods and applications. J. Manufact. Syst. (in Press) (2018)
Smart Manufacturing Systems: A Game Theory based Approach
69
6. T. Wuest, D. Weimer, C. Irgens, K.-D. Thoben, Machine learning in manufacturing: advantages, challenges, and applications. Prod. Manufact. Res. 4(1), 23–45 (2016)
7. D. Bauso, Game Theory with Engineering Applications (Society for Industrial and Applied Mathematics, Philadelphia, 2016)
8. S. Zazo, S. Valcarcel Macua, M. Sánchez-Fernández, J. Zazo, Dynamic potential games with constraints: fundamentals and applications in communications. IEEE Trans. Ind. Electron. Mag. 3(4), 49–55 (2015)
9. S. Buzzi, G. Colavolpe, D. Saturnino, A. Zappone, Potential games for energy-efficient power control and subcarrier allocation in uplink multicell OFDMA systems. IEEE J. Sel. Top. Sig. Process. 6(2), 89–103 (2012)
10. K. Yamamoto, A comprehensive survey of potential game approaches to wireless networks. IEICE Trans. Commun. E98 B(9), 1804–1823 (2015)
11. J.R. Marden, S.D. Ruben, L.Y. Pao, A model-free approach to wind farm control using game theoretic methods. IEEE Trans. Control Syst. Technol. 21(4), 1207–1214 (2013)
12. Q. Zhu, T. Basar, Game-theoretic methods for robustness, security, and resilience of cyberphysical control systems. IEEE Control Syst. Mag. 35(1), 46–65 (2015)
13. Y. Liang, F. Liu, S. Mei, Distributed real-time economic dispatch in smart grids: a state-based potential game approach. IEEE Trans. Smart Grids 21(4), 1207–1214 (2018)
14. Y. Zhao, S. Wang, T.C.E. Cheng, X. Yang, Z. Huang, Coordination of supply chains by option contracts: a cooperative game theory approach. Eur. J. Oper. Res. 207(2), 668–675 (2010)
15. N. Li, D.W. Oyler, M. Zhang, Y. Yildiz, I. Kolmanovsky, A.R. Girard, Game theoretic modeling of driver and vehicle interactions for verification and validation of autonomous vehicle control systems. IEEE Trans. Control Syst. Technol. 21(4), 1207–1214 (2019)
16. G. Zhou, P. Jiang, G.Q. Huang, A game-theory approach for job scheduling in networked manufacturing. Int. J. Adv. Manufact. Technol. 41(9–10), 972–985 (2009)
17. A. Schwung, A. Elbel, D. Schwung, System reconfiguration of modular production units using a SOA-based control structure, in Proceedings of the 15th International Conference on Industrial Informatics (INDIN 2017), Emden, Germany (2017)
18. D. Schwung, J.N. Reimann, A. Schwung, S.X. Ding, Self learning in flexible manufacturing units: a reinforcement learning approach, in Proceedings of the 9th International Conference on Intelligent Systems, Funchal, Portugal (2018)
19. R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (MIT Press, Cambridge, 1998)
20. V. Mnih, A.P. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley, D. Silver, K. Kavukcuoglu, Asynchronous methods for deep reinforcement learning, in Proceedings of The 33rd International Conference on Machine Learning, PMLR, 48 (2016), pp. 1928–1937
21. J.R. Marden, State based potential games. Automatica 48, 3075–3088 (2012)
22. J.R. Marden, J.S. Shamma, Revisiting log-linear learning: asynchrony, completeness and payoff-based implementation. Games Econ. Behav. 75, 788–808 (2012)
23. C. Lemke, M. Budka, B. Gabrys, Metalearning: a survey of trends and technologies. Artif. Intell. Rev. 44(1), 117–130 (2015)
24. Programmable controllers part 3: programming languages. International Standard IEC 61131-3, 2nd edn. (2003)
25. Function blocks, International Standard IEC 61499, 1st edn. (2005)
26. T. Cucinotta, A. Mancina, G.F. Anastasi, G. Lipari, L. Mangeruca, R. Checcozzo, F. Rusina, A real-time service-oriented architecture for industrial automation. IEEE Trans. Ind. Inform. 5(3), 267–277 (2009)
27. W. Dai, V. Vyatkin, J.H. Christensen, V.N. Dubinin, Bridging service-oriented architecture and IEC 61499 for flexibility and interoperability. IEEE Trans. Ind. Inf. 11(3), 771–781 (2015)
28. A. Zoitl, T. Strasser, C. Sünder, T. Baier, Is IEC 61499 in harmony with IEC 61131-3? IEEE Ind. Electron. Mag. 3(4), 49–55 (2009)
29. D. Schwung, T. Kempe, A. Schwung, Self-optimization of energy consumption in complex bulk good processes using reinforcement learning, in Proceedings of the 15th International Conference on Industrial Informatics (INDIN 2017), Emden, Germany (2017)
Ensembles of Cluster Validation Indices for Label Noise Filtering
Jan Kohstall, Veselka Boeva, Lars Lundberg and Milena Angelova
Abstract Cluster validation measures are designed to find the partitioning that best fits the underlying data. In this study, we show that these measures can be used for identifying mislabeled instances or class outliers prior to training in supervised learning problems. We introduce an ensemble technique, entitled CVI-based Outlier Filtering, which identifies and eliminates mislabeled instances from the training set, and then builds a classification hypothesis from the set of remaining instances. Our approach assigns to each instance in the training set several cluster validation scores representing its potential of being a class outlier with respect to the clustering properties that the used validation measures assess. In this respect, the proposed approach may be referred to as a multi-criteria outlier filtering measure. In this work, we specifically study and evaluate value-based ensembles of cluster validation indices. The added value of this approach in comparison to the logical and rank-based ensemble solutions is discussed and further demonstrated.
J. Kohstall
acs Plus GmbH, Berlin, Germany

V. Boeva (corresponding author) · L. Lundberg
Blekinge Institute of Technology, Karlskrona, Sweden

M. Angelova
Technical University of Sofia, Plovdiv, Bulgaria

© Springer Nature Switzerland AG 2020
R. Jardim-Goncalves et al. (eds.), Intelligent Systems: Theory, Research and Innovation in Applications, Studies in Computational Intelligence 864, https://doi.org/10.1007/978-3-030-38704-4_4

1 Introduction
Supervised learning algorithms are used to generate classifiers [23]. For this machine learning task, the main idea is to apply a learning algorithm to detect patterns in a data set (inputs) that are associated with known class labels (outputs) in order to automatically create a generalization, i.e., a classifier. Under the assumption that the known data properly represent the complete problem studied, it is further assumed that the generated classifier will be able to predict the classes of new data instances. However, real-world data sets contain noise and outliers arising from various errors. When the data are modelled using machine learning algorithms, the presence of label noise and outliers can affect the model that is generated. Improving how learning algorithms handle noise and outliers can produce better models.

Outlier mining is the process of finding unexpected events and exceptions in the data. There is a large body of work on outlier detection, including statistical methods [26], rule creation [22], and clustering techniques [7]. Conventional outlier mining methods find exceptions or rare cases with respect to the whole data set. In this paper, we introduce a novel outlier filtering technique that is close to class outlier detection approaches, which find suspicious instances by taking the class label into account [16–18, 33]. Such filtering approaches are also referred to as label noise cleansing [12]. The proposed ensemble approach, called Cluster Validation Index (CVI)-based Outlier Filtering, applies cluster validation measures to identify mislabeled instances or class outliers. We remove these instances prior to training and study how this affects the performance of the machine learning algorithm.

Cluster validation measures are usually used for evaluating and interpreting clustering solutions in unsupervised learning. However, we apply these well-known and well-established measures in a different context: we use them for detecting mislabeled instances or class outliers in training sets in supervised learning scenarios.
In supervised learning the clusters (in the form of classes) are known, and if there exists a strong relation among the instances of these clusters, the classes of new instances can be accurately predicted. The intuition behind our approach is that instances in the training set that are not strongly connected to their clusters are mislabeled instances or class outliers and should be removed prior to training to improve the classification performance of the classifier. Our idea can also be considered in the context of the cluster assumption, a notion originating from semi-supervised learning. The cluster assumption states that two points which are in the same cluster (i.e., which are linked by a high-density path) are likely to have the same label [9]. This means that, by applying internal cluster validation measures to the classes of the training set, we measure the degree of violation of the cluster assumption without explicitly computing clusters.

Our approach assigns several cluster validation scores to each instance in the training set, thus representing the instance's potential of being a class outlier with respect to the clustering properties that the used validation measures assess. In this respect, the proposed approach may be referred to as multi-criteria outlier filtering. The approach uses a combination of different cluster validation indices in order to reflect different aspects of the clustering model determined by the labeled instances of the training set. In order to utilize the scores from every cluster validation measure, it must be possible to combine the scores produced by different cluster validation indices into a single overall score representing the trade-off among the estimations of the involved measures. This requires cluster validation index scores that range over the same value interval, i.e., score normalization is needed.
In the current study, we discuss different normalization techniques and specifically consider normalization aspects related to the cluster validation measures used in our experiments. We evaluate the effects of mining class outliers for five commonly used learning algorithms on eight data sets from the UCI data repository using three different cluster validation measures (Silhouette Index (SI), Connectivity (Co) and Average Intracluster Gap (IC-av)). In the current experimental setup the used cluster validation indices are combined by logical operators: ∨ (OR) and ∧ (AND); mean operators: M (the arithmetic mean), G (the geometric mean) and H (the harmonic mean); and the median, i.e., six different experimental scenarios are conducted. In addition, we study two different data setups: the original data sets versus the data sets with injected noise.

Notice that the results presented in this article are an extension of the study published in [5]. In the current work we propose and evaluate a value-based ensemble of cluster validation indices. The added value of the proposed approach in comparison to the solution published in [5] is discussed and demonstrated in the current paper. Compared to the previous paper, the bibliography and the related work section have been extended with more recent works on the studied problem.

The rest of the paper is organized as follows. Section 2 reviews related works. Section 3 discusses the cluster validation measures and describes the proposed class outlier filtering approach. Section 4 presents the evaluation of the proposed approach and discusses the obtained results. Section 5 is devoted to conclusions and future work.
2 Related Work
A number of methods have been developed that treat individual instances in a data set differently during training in order to focus on the most informative ones. For example, an automated method that orders the instances in a data set by complexity, based on their likelihood of being misclassified in supervised classification problems, is presented in [39]. The underlying assumption of this method is that instances with a high likelihood of being misclassified represent more complex concepts in a data set. The authors have shown that focusing on the simpler instances during training significantly increases the generalization accuracy. Identifying and removing noisy instances and outliers from a data set prior to training generally results in an increase in classification accuracy on non-filtered test data [8, 13, 38].

Conventional outlier detection methods find exceptions or rare cases in a data set irrespective of the class label of these cases, whereas class outlier detection approaches find suspicious instances by taking the class label into account [16–18, 33]. Papadimitriou and Faloutsos [33] propose a solution to the problem in which two classes of objects are given and it is necessary to find those objects which deviate with respect to the other class. Those points are called cross-outliers, and the problem is referred to as cross-outlier detection.
He et al. [16] try to find meaningful outliers using an approach called Semantic Outlier Factor (SOF). The approach is based on applying a clustering algorithm to a data set with a class label; it is expected that the instances in every output cluster should have the same class label. However, this is not always true. A semantic outlier is defined as a data point which behaves differently compared to other data points in the same class, but looks normal with respect to data points in another class. He et al. [17] further define a general framework based on the contributions presented in [16, 33] by proposing a practical solution and extending existing outlier detection algorithms. The generalization considers not only outliers that deviate with respect to their own class, but also outliers that deviate with respect to other classes. Hewahi and Saad [18] introduce a novel definition of class outlier and a new method for mining class outliers based on a distance-based approach and nearest neighbors. This method is called the CODB algorithm and is based on the concept of Class Outlier Factor (COF), which represents the degree of being a class outlier for a data object. The key factors in computing COF for an instance are the probability of the instance's class among its neighbors, the deviation of the instance from the instances of the same class, and the distance between the instance and its k nearest neighbors.

An interesting local outlier detection approach is proposed in [7]. It assigns to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in the sense that the degree depends on how isolated the object is with respect to the surrounding neighborhood. Closely related to class outlier mining is noise reduction [37, 40], which attempts to identify and remove mislabeled instances. For example, Brodley and Friedl [8] attempt to identify mislabeled instances using an ensemble of classifiers.
Rather than determining if an instance is mislabeled, the approach introduced in [38] filters instances that should be misclassified. A different approach by Zeng and Martinez [45] uses multi-layer perceptrons that change the class label of suspected outliers, assuming that the wrong label was assigned to that instance. A comprehensive survey on the different types of label noise, their consequences and the algorithms that consider label noise is presented by Frénay and Verleysen in [12]. In addition, a review of some typical problems associated with high-dimensional data, and of outlier detection specialized for high-dimensional data, is given in [47].

In this paper, we propose an ensemble approach that applies cluster validation measures to identify mislabeled instances or class outliers. Ensemble methods are frequently used in statistics and machine learning to obtain better classification performance. Ensembles unite different learners into a combined one. Each learner makes a hypothesis and then, after training, a decision is reached by majority voting. For the success of an ensemble, it is necessary that all the learners are independent of each other. The most commonly used forms of ensembles are bagging [6] and boosting [35]. Ensembles are widely used in data mining problems such as classification and clustering [43, 44]. The technique of ensembles is also applied in data cleansing methods. For example, Brodley and Friedl [8] attempt to identify mislabeled instances by using an ensemble of classifiers. A similar approach is proposed in [21], where a majority vote is used to remove instances identified as noise.

Ensemble approaches are also applied for outlier detection. Aggarwal has published a position paper describing different ensemble methods [1]. He categorizes them into model-centered ensembles and data-centered ensembles. Model-centered approaches use different models built on the same data. Data-centered ensembles, on the other hand, explore different parts, samples or functions of the data. Zimek et al. [46] also focus on ensembles for outlier detection and discuss how to combine different models. A different approach for mining outliers with ensembles is presented by Nguyen et al. [32]. The authors create ensembles by repeatedly assigning a random detector to a random subspace of the data. To form the ensemble out of the detectors, they are applied to unlabeled training data, which allows the adjustment of the weights for each detector.
3 Methods and Technical Solutions
3.1 Cluster Validation Techniques
One of the most important issues in cluster analysis is the validation of clustering results. Essentially, cluster validation techniques are designed to find the partitioning that best fits the underlying data, and should therefore be regarded as a key tool in the interpretation of clustering results. The data mining literature provides a range of different cluster validation measures, which are broadly divided into two major categories: external and internal [19]. External validation measures have the benefit of providing an independent assessment of clustering quality, since they evaluate the clustering result with respect to a pre-specified structure. However, previous knowledge about the data is rarely available. Internal validation techniques, on the other hand, avoid the need for such additional knowledge, but have the problem that they need to base their validation on the same information used to derive the clusters themselves.

Internal measures can be split with respect to the specific clustering property they reflect and assess in order to find an optimal clustering scheme: compactness, separation, connectedness, and stability of the cluster partitions. Compactness evaluates the cluster homogeneity, which is related to the closeness within a given cluster. Separation demonstrates the opposite trend by assessing the degree of separation between individual groups. The third type of internal validation measure, connectedness, quantifies to what extent the nearest neighboring data items are placed into the same cluster. Stability measures evaluate the consistency of a given clustering partition by clustering the data using all but one experimental condition. The remaining condition is subsequently used to assess the predictive power of the resulting clusters by measuring the within-cluster similarity in the removed experiment.
In [29], Liu et al. focus on internal clustering validation and present a study of eleven widely used internal cluster validation measures for crisp clustering. The results of their study indicate that these existing measures have certain limitations in different application scenarios. As an alternative, Liu et al. propose a new internal cluster validation measure, which is based on the notion of nearest neighbors. Bayá and Granitto also introduce a new validation index, called Average Intracluster Gap (IC-av) [2], which aims at solving some deficiencies of previous validation methods. This measure is based on graph concepts and has been designed to find arbitrarily shaped clusters by exploiting the spatial layout (minimum spanning tree) of the patterns and their clustering label. The introduced index estimates cluster tightness, but instead of assuming a spherical shape, it assumes that clusters are connected structures with arbitrary shape. A detailed and comparative overview of different types of validation measures can be found in [14, 42].

Since the aforementioned criteria are inherently related in the context of both classification and clustering problems, some approaches have been presented which try to evaluate several of these criteria or to improve one criterion by optimizing another. For example, Lavesson and Davidsson [28] empirically analyze a practical multi-criteria measure based on estimating the accuracy, similarity, and complexity properties of classification models. In a recent work, Jaskowiak et al. have also proposed a method for combining internal cluster validation measures into ensembles, which show superior performance when compared to any single ensemble member [20].
In the context of the presented study, using an ensemble of several internal cluster validation measures to analyze the labeled instances prior to training may be referred to as a multi-criteria outlier filtering measure, which tries to find a trade-off between the evaluation performance of the combined measures.

In [10], Davidsson introduces an approach for controlling the generalization during model generation on (training) data to ensure that instances of an unknown class/type would not be classified as belonging to known (learned) clusters or classes. He introduces a confidence area for each cluster, which encompasses the known instances of the cluster, e.g., the instances used to generate the cluster, as well as an additional area outside of the cluster, the size of which is determined by a confidence factor. Thus, given a certain confidence factor, an instance may be regarded as belonging to a specific cluster even if it is outside of the cluster, since it is inside the outer area. However, if the instance were outside the outer area, e.g., if the confidence factor were higher, the instance would be categorized as being of unrecognized type, i.e., not belonging to any of the generated clusters.

One can find some similarity between the approach of Davidsson [10] described above and the class outlier filtering algorithm proposed in this study. Namely, the proposed approach will identify an instance as a class outlier if it cannot be regarded as belonging to the known instances of a cluster (class) with respect to the used cluster validation index. In the considered context, a confidence factor can be introduced by giving a threshold determining the maximum value of the validation index above (or below) which the instances are considered as not belonging to the clusters. In the context of the cluster assumption, discussed in the introduction, this threshold can also be interpreted as the degree to which the cluster assumption is allowed to be violated.
According to Bezdek and Pal [3], a possible approach to bypass the selection of a single cluster validity criterion is to rely on multiple criteria in order to obtain more robust evaluations. Thus, in the experimental setup of this work, we combine three internal validation measures into an ensemble for analyzing the labeled instances prior to training in supervised classification problems in order to identify mislabeled data items. Based on the above-mentioned categorization, we have selected one validation measure for assessing the compactness and separation properties of a partitioning, the Silhouette Index (SI); one for assessing connectedness, Connectivity (Co); and one for assessing tightness and dealing with arbitrarily shaped clusters, the Average Intracluster Gap (IC-av).

The Silhouette Index, presented by Rousseeuw in [34], is a cluster validation index that is used to judge the quality of any clustering solution $C = \{C_1, C_2, \ldots, C_k\}$. Suppose $a_i$ represents the average distance of object $i$ from the other objects of its assigned cluster, and $b_i$ represents the minimum of the average distances of object $i$ from the objects of the other clusters. Then the Silhouette Index of object $i$ can be calculated by

\[ s(i) = \frac{b_i - a_i}{\max\{a_i, b_i\}}. \tag{1} \]

The overall Silhouette Index for a clustering solution $C$ of $m$ objects is defined as

\[ s(C) = \frac{1}{m} \sum_{i=1}^{m} \frac{b_i - a_i}{\max\{a_i, b_i\}}. \tag{2} \]
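As an illustration of Eq. (1) in the supervised setting, the following minimal sketch computes a per-instance Silhouette score with the class labels playing the role of clusters. The function name `silhouette_per_instance`, the Euclidean metric and the toy data are assumptions made for this example only; the sketch also assumes at least two classes and no class consisting solely of duplicate points.

```python
import numpy as np

def silhouette_per_instance(X, y):
    """Per-instance Silhouette s(i), Eq. (1), using class labels as clusters."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances (the metric is an assumption of this sketch).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    labels = np.unique(y)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = (y == y[i])
        own[i] = False  # exclude the object itself when computing a_i
        if not own.any():
            continue  # singleton class: s(i) is left at 0 by convention
        a = D[i, own].mean()
        # b_i: smallest average distance to the members of any other class
        b = min(D[i, y == c].mean() for c in labels if c != y[i])
        s[i] = (b - a) / max(a, b)
    return s

# Toy data: two 1-D classes; the last point carries label 0 but lies in class 1.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [5.05]])
y = np.array([0, 0, 0, 1, 1, 1, 0])
scores = silhouette_per_instance(X, y)
suspect = int(np.argmin(scores))  # instance with the lowest Silhouette score
```

In this toy setting the deliberately mislabeled point receives a strongly negative score, which is the behaviour the filtering step relies on.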
The values of the Silhouette Index vary from -1 to 1, and higher values indicate better clustering results. Since the Silhouette Index compares the distance of an instance to its own cluster with its distance to the nearest other cluster, it assesses the separation between clusters. Evidently, instances which are assigned to a wrong cluster (class) can be detected by this measure. The same applies when mislabeled instances form a small group, i.e., the Silhouette Index is robust against such a scenario. On the other hand, instances which are far away from their cluster but not close to any other cluster will not be detected as outliers by this measure.

Connectivity captures the degree to which objects are connected within a cluster by keeping track of whether the neighboring objects are put into the same cluster [15]. Define $m_{ij}$ as the $j$th nearest neighbor of object $i$, and let $\chi_{i m_{ij}}$ be zero if $i$ and $m_{ij}$ are in the same cluster and $1/j$ otherwise. Then, for a particular clustering solution $C = \{C_1, \ldots, C_k\}$ of $m$ objects and a neighborhood size $n_r$, the Connectivity is defined as

\[ Co(C) = \sum_{i=1}^{m} \sum_{j=1}^{n_r} \chi_{i m_{ij}}. \tag{3} \]
Per object, the Connectivity takes a value between zero and $\sum_{j=1}^{n_r} 1/j$, and it should be minimized. Evidently, the Connectivity of object $i$ can be calculated by

\[ Co(i) = \sum_{j=1}^{n_r} \chi_{i m_{ij}}. \tag{4} \]
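The per-object Connectivity of Eq. (4) can be sketched as follows, again treating class labels as clusters. The function name `connectivity_per_instance`, the parameter name `n_neighbors` (standing in for the neighborhood size), the Euclidean metric and the toy data are assumptions of this example.

```python
import numpy as np

def connectivity_per_instance(X, y, n_neighbors=3):
    """Per-instance Connectivity Co(i), Eq. (4), using class labels as clusters."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances (the metric is an assumption of this sketch).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(D, np.inf)  # an object is not its own neighbor
    # Indices of the n_neighbors nearest neighbors, closest first.
    nn = np.argsort(D, axis=1, kind="stable")[:, :n_neighbors]
    penalties = 1.0 / np.arange(1, n_neighbors + 1)  # 1/j for the j-th neighbor
    mismatch = (y[nn] != y[:, None])  # True where a neighbor has another label
    return (mismatch * penalties).sum(axis=1)

# Toy data: two 1-D classes; the last point carries label 0 but lies in class 1.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [5.05]])
y = np.array([0, 0, 0, 1, 1, 1, 0])
co = connectivity_per_instance(X, y, n_neighbors=3)
suspect = int(np.argmax(co))  # higher Co means weaker connection to the class
```

Note that correctly labeled points lying next to the mislabeled one also receive a non-zero penalty, so a threshold on Co(i) has to be chosen with some care.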
The Connectivity can easily capture class label noise. However, it might not be very robust in detecting groups of outliers, because the hyperparameter $n_r$ fixes the neighborhood size in advance. For instance, if the number of outliers grouped together equals or is close to $n_r$, the Connectivity does not recognize those instances as outliers, since their respective neighbors are in the same cluster. This scenario can be identified by the Average Intracluster Gap (IC-av), which estimates cluster tightness. Instead of assuming a spherical shape, it considers clusters as connected structures with arbitrary shape [2]. Consequently, instances that are remote from the majority of the instances in their cluster (class) are treated as outliers. For a particular clustering solution $C = \{C_1, C_2, \ldots, C_k\}$, the IC-av is defined as

\[ IC\text{-}av(C) = \sum_{r=1}^{k} \frac{1}{n_r^2} \sum_{i, j \in C_r} d_{MED}(i, j), \tag{5} \]
where $n_r$ is the number of objects in cluster $C_r$ ($r = 1, 2, \ldots, k$) and $d_{MED}(i, j)$ is the maximum edge distance, which represents the longest edge in the path joining objects $i$ and $j$ in the minimum spanning tree (MST) built on the clustered set of objects. Note that here $n_r$ denotes the cluster size, not the neighborhood size used in Eqs. (3) and (4). Formally, if the instances $i$ and $j$ are in the same cluster, they are connected via a unique path $P_{ij}$ in the MST, and $P_{ij}$ consists of edges. If $i \neq j$, the length of the path is greater than zero. The maximum edge distance is then given by

\[ d_{MED}(i, j) = \max \{ e \in P_{ij} \}, \quad i, j \in C_r, \tag{6} \]

with the maximum taken over the lengths of the edges $e$ on $P_{ij}$.
The IC-av index of object $i$, which is in cluster $C_r$, can be calculated by

\[ IC\text{-}av(i) = \frac{1}{n_r^2} \sum_{j \in C_r} d_{MED}(i, j). \tag{7} \]
The IC-av takes values between the average and the longest edge length in the MST and should be minimized. In order to adapt the IC-av for detecting outliers or label noise, the MST is calculated separately for each cluster. In this way, class outliers can be clearly identified, since they have a larger distance to the majority of the instances in the considered class.
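A minimal sketch of the per-object IC-av of Eq. (7), with one MST per class, is given below. It relies on a property of Kruskal's construction: when an edge joins two components, it is the longest edge on the MST path between every pair of objects split across them. The function names, the Euclidean metric and the toy data are assumptions of this example and not necessarily the implementation used in the study.

```python
import numpy as np

def max_edge_distances(X):
    """d_MED(i, j) for all pairs: longest edge on the MST path joining i and j."""
    n = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    iu, ju = np.triu_indices(n, k=1)
    order = np.argsort(D[iu, ju], kind="stable")  # candidate edges, shortest first
    members = {i: [i] for i in range(n)}  # component id -> member indices
    comp = list(range(n))                 # object index -> component id
    d_med = np.zeros((n, n))
    for e in order:
        a, b = comp[iu[e]], comp[ju[e]]
        if a == b:
            continue  # edge would close a cycle, so it is not part of the MST
        w = D[iu[e], ju[e]]
        # Every pair split across the two merged components has this edge
        # as the longest edge on its MST path.
        for p in members[a]:
            for q in members[b]:
                d_med[p, q] = d_med[q, p] = w
        for q in members[b]:
            comp[q] = a
        members[a].extend(members.pop(b))
    return d_med

def ic_av_per_instance(X, y):
    """Per-instance IC-av, Eq. (7), with the MST built separately per class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = np.zeros(len(X))
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        d_med = max_edge_distances(X[idx])
        scores[idx] = d_med.sum(axis=1) / len(idx) ** 2
    return scores

# Toy data: two 1-D classes; the last point carries label 0 but lies in class 1.
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2], [5.05]])
y = np.array([0, 0, 0, 1, 1, 1, 0])
ic = ic_av_per_instance(X, y)
suspect = int(np.argmax(ic))  # higher IC-av means the instance is more remote
```

The nested loops make this sketch quadratic in the class size, which is acceptable for illustration but would need a more careful implementation for large classes.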
3.2 Ensembles of Cluster Validation Indices for Label Noise Filtering
In this study, we propose a class outlier filtering technique, entitled Cluster Validation Index (CVI)-based Outlier Filtering, that combines several cluster validation measures in order to build an ensemble for mining label noise. We have validated this idea by using the three internal validation measures discussed in the foregoing section: Silhouette Index (SI), Connectivity (Co) and Average Intracluster Gap (IC-av). Figure 1 shows a hypothetical 2-dimensional data set with two classes (circle and square) and three outliers (the two filled circles and the filled square). If we apply SI for assessing the instances of this data set, instance 2 will be recognized as an outlier, while instance 1 will be removed if Connectivity is used. However, instance 3 will not be considered an outlier with respect to either SI or Connectivity. This instance would be filtered out as an outlier by the IC-av measure, which estimates cluster tightness. The choice of cluster validation measure is therefore crucial for the performance of the proposed outlier mining technique.

According to [3], a possible approach to bypass the selection of a single cluster validity criterion is to rely on multiple criteria in order to obtain more robust evaluations. In a recent work, Jaskowiak et al. also propose a method for combining internal cluster validation measures into ensembles, which show superior performance when compared to any single ensemble member [20]. Consequently, a rather straightforward solution to the problem described above is to use different cluster validation measures in order to find some complementarity among the clustering properties they assess. In this way, different aspects of the clustering model determined by the known class labels will be reflected in the filtering phase. In the current work we have studied and evaluated three different ensemble approaches for combining the discussed cluster validation indices.

Fig. 1 A hypothetical 2-dimensional data set
3.2.1 Logical Ensembles
The selected cluster validation measures can be combined by logical operators: ∨ (OR) and ∧ (AND). In this way, it is possible to find the intersection or union of the set of noisy instances identified by the different cluster validation measures and filter out the corresponding instances. Obviously, when the ∧ operator is applied only instances which are considered outliers by all involved cluster validation measures are removed. On the other hand, by using the ∨ operator we identify the instances that are considered as outliers by at least one of the involved cluster validation measures. This idea has been initially studied and validated in [5].
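The two logical combinations can be sketched with boolean masks. The per-measure outlier flags below are hypothetical values chosen for illustration; in practice they would come from thresholding the individual index scores.

```python
import numpy as np

# Hypothetical per-measure outlier flags for six training instances
# (True = the measure considers the instance a class outlier).
flags = {
    "SI":    np.array([False, True, False, True,  False, False]),
    "Co":    np.array([False, True, True,  False, False, False]),
    "IC-av": np.array([False, True, False, False, True,  False]),
}
stacked = np.vstack(list(flags.values()))

and_mask = stacked.all(axis=0)  # outlier according to every measure
or_mask = stacked.any(axis=0)   # outlier according to at least one measure

# Indices of the instances kept for training under each ensemble.
keep_and = np.flatnonzero(~and_mask)
keep_or = np.flatnonzero(~or_mask)
```

With these flags the AND ensemble removes only the instance flagged by all three measures, whereas the OR ensemble removes four of the six instances, which illustrates how much more aggressive the union is.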
3.2.2 Rank-Based Ensembles
Cluster validation measures are usually used for evaluating and interpreting clustering solutions. A cluster validation measure can assign a single score to each instance of a given data set, which is usually interpreted as the degree to which the instance satisfies (or violates) the cluster property estimated by this measure. However, the scores generated by the different cluster validation measures can be in different ranges (see Sect. 3.1). This makes it difficult to compare and aggregate scores generated by different cluster validation measures into a single overall score. One way to handle this is by ranking the data instances with respect to the produced cluster validation index (CVI) scores. In this way, a ranking of the instances can be produced for each used cluster validation measure, and the instances' rankings can further be compared among the different cluster validation measures.

In the context of cluster validation, ensemble approaches have been studied in [20, 43]. The authors discuss several different ways of combining cluster validation indices, for example, the Scaled Footrule Aggregation [11], which attempts to approximate an optimal aggregation. An alternative solution, entitled Robust Rank Aggregation, has been introduced in [24]. It focuses on the combination of ranked lists which contain noise and outliers. Rank-based ensemble techniques have also been used for finding outliers in [30, 31].

In [4], we have studied the rank-based approaches and have shown that they offer new perspectives for detecting label noise compared to the logical ensembles. Namely, we have studied a median-based ensemble which combines cluster validation indices by calculating the overall rank as the median of the ranks produced by the used measures. The median ensemble has been shown to produce more stable results than the ones generated by the previously discussed logical ensembles. In particular, when more than two different cluster validation measures are combined, this can significantly improve the outlier detection.
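A median-based rank ensemble can be sketched as follows. The helper `outlier_ranks`, the tie-breaking by position and the score values are assumptions of this example; the orientation of each measure (low SI, high Co and high IC-av indicate outliers) follows Sect. 3.1.

```python
import numpy as np

def outlier_ranks(scores, higher_is_outlier):
    """Rank 1 = most outlier-like instance; ties are broken by position."""
    scores = np.asarray(scores, dtype=float)
    key = -scores if higher_is_outlier else scores
    order = np.argsort(key, kind="stable")
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)
    return ranks

# Hypothetical index scores for five instances.
si = [0.8, -0.6, 0.7, 0.1, 0.9]  # low SI suggests an outlier
co = [0.0, 1.8, 0.5, 0.3, 0.0]   # high Co suggests an outlier
ic = [0.2, 0.9, 0.1, 0.6, 0.3]   # high IC-av suggests an outlier

ranks = np.vstack([
    outlier_ranks(si, higher_is_outlier=False),
    outlier_ranks(co, higher_is_outlier=True),
    outlier_ranks(ic, higher_is_outlier=True),
])
median_rank = np.median(ranks, axis=0)  # overall rank per instance
```

Because only the order of the scores enters the ensemble, no normalization is needed here, but the magnitudes of the scores are discarded.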
3.2.3 Value-Based Ensembles
As discussed in the previous section, the rank-based integration of cluster validation indices has an added value in comparison to the logical ensembles, since it makes it possible to generate a single overall ranking of the data instances with respect to their outlierness. However, the scores generated by the different cluster validation measures are not normalized and therefore cannot be aggregated into a single overall score that can be interpreted as the degree of outlierness of the instance under consideration. This limits the applicability of rank-based approaches. In [36], Schubert et al. have discussed further problems of rank-based ensemble approaches. Namely, these approaches allow arbitrary scores to be compared with each other independently of their value range. However, outlier detection does not really benefit from such an approach, mainly because outlier detection is a rather imbalanced problem, i.e., the key objects of interest are rare. Moreover, according to the authors [36], the ranking is only significant for the top objects, while the remaining inliers do not vary much at all. In addition, the scores themselves often convey a meaning and might even indicate that there are no outliers at all.

Notice that the above discussion about non-value-based ensembles is also valid when we filter out label noise by combining several cluster validation measures. Evidently, in order to utilize the scores from every cluster validation measure (criterion), it is necessary to build a value-based ensemble. Such an ensemble must be able to combine the scores produced by different cluster validation indices into a single overall score representing the trade-off among the estimations of the involved measures. In most cases this requires cluster validation index scores that range over the same value interval, i.e., score normalization is needed.
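Once the scores share a common range, a value-based ensemble reduces to an aggregation over the measures. The sketch below applies the arithmetic, geometric and harmonic means and the median to hypothetical scores that are assumed to be already normalized to (0, 1], with low values indicating outliers; the cut-off value is likewise an assumption of this example.

```python
import numpy as np

# Hypothetical normalized scores in (0, 1]; rows = measures, columns = instances.
# Low values are taken to indicate outliers.
scores = np.array([
    [0.9, 0.1, 0.8, 0.5],  # SI
    [1.0, 0.2, 0.4, 0.5],  # Co
    [0.8, 0.4, 0.9, 0.5],  # IC-av
])

arithmetic = scores.mean(axis=0)
# The geometric and harmonic means require strictly positive scores.
geometric = np.exp(np.log(scores).mean(axis=0))
harmonic = scores.shape[0] / (1.0 / scores).sum(axis=0)
median = np.median(scores, axis=0)

threshold = 0.3  # hypothetical cut-off below which an instance is filtered out
outliers_m = np.flatnonzero(arithmetic < threshold)
```

For positive scores the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean, so for a fixed threshold the harmonic ensemble filters at least as many instances as the other two.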
3.2.4
Cluster Validation Index Normalization
In order to combine different cluster validation measures in a value-based way, it is necessary first to bring all the scores into the same range; otherwise the combined measure would be biased towards the method with the largest scale. In addition, the characteristics of a score need to be taken into consideration. A score which is open-ended needs to be treated differently than a score with a pre-defined range. Furthermore, it must be taken into account what each measure interprets as an inlier and how label noise is treated. In [46], Zimek et al. consider the challenges and issues with normalization of different outlier scores and identify future challenges concerning the combination of outlier scores. In an earlier work, methods for normalizing different outlier scores were proposed [25]. The authors suggest regularization as an additional step before normalization. In this way, a common inlier value for regularized scores is defined. They suggest applying different regularization techniques (baseline regularization, linear inversion or logarithmic inversion) depending on the score properties. Several possibilities for normalization are further discussed, e.g., linear transformation, which does not add any contrast to the data, or Gaussian scaling and Gamma scaling, which seem better suited for normalizing LOF [7]. The aim of the normalization is to place the scores of the considered cluster validation measure in the interval [0, 1]. In this context a normalized score close to 0 is more likely to represent an outlier, while a value close to 1 indicates an inlier. The following conditions need to be taken into account when normalization is conducted:
• Stable ranking for potential outliers, e.g., an outlier should still be visible after the normalization based on its rank.
• The normalized scores should not be biased by the properties of the data set, e.g. scaling.
• Normalized scores should be comparable. Thus, all normalized scores should have a similar weight distribution.
• The worst performing instance does not necessarily have to be an outlier.
In order to satisfy all the above conditions, the input values need to be normalized by themselves. In this work, normalization is achieved by applying a linear transformation. We define $S_{N_i}$ as the normalized score of $s_i$. $S_{N_i}$ is calculated as follows:
$$S_{N_i} = \frac{s_i - S_{\min}}{S_{\max} - S_{\min}}, \tag{8}$$
where $S_{\min}$ and $S_{\max}$ are the minimum and maximum, respectively. According to the last listed condition, the worst performing instance does not necessarily have to be an outlier. This needs to be taken into consideration when calculating $S_{\max}$ and $S_{\min}$. They must be defined independently of the given scores for the cluster validation criteria and must not be taken to be the best and the worst instance scores. This allows instances that are at the decision boundary to be kept as instances with lower scores and not as outliers. For cluster validation scores with an inverse outlier definition, $S_{\max}$ and $S_{\min}$ have to be switched in Eq. 8. In the remainder of this section, we discuss the specific normalization aspects for the three cluster validation measures (Silhouette Index, Connectivity and Average Intracluster Distance) introduced in Sect. 2 and used in our experiments.
The absolute minimal and maximal values for the Silhouette Index (see Sect. 2) are easily identifiable from the definition. The overall maximum is 1, while the minimum, which defines an outlier, is -1. However, for most of the studied real-world data sets the extreme values of -1 and 1 are not reached. This is demonstrated in Fig. 2, which gives an overview of all SI scores generated by the data sets used in our experiments. Evidently, if we normalized with these extreme values (-1 and 1), the SI scores would be distributed only in the middle of the available score range. In order to avoid this, we take into account the complete distribution of SI scores from all evaluated data sets. The values of the data points at 5% and at 95% of all calculated SI scores are taken as minimum and maximum. In this way, a robust minimum and maximum are chosen and the normalization is stable, spreading the data points equally for most data sets. However, there is still the possibility that some data sets lean more towards the lower or upper threshold. Therefore further robustness measures need to be applied. For example, if more than 10% of the data points are below the minimum threshold, the minimum for the normalization of this data set is switched to the value of the instance at 10%. The same applies to the upper bound. Due to the maximum and minimum values chosen in this way, some scores might be above 1 or below 0. Since the respective instances are clearly either inliers or outliers, their scores are set to 1 or 0, respectively. Figure 3 compares the original SI scores with the normalized ones on the Iris data set. As one can notice, the shape of the score curves for the three Iris classes is almost the same, since the score orientation and range are similar. Obviously, the requirement that the ranking must stay stable for outliers is taken into account.
Fig. 2 Overview over all SI scores generated by the evaluated data sets
The smallest possible score value for the Connectivity measure is zero. In contrast to the Silhouette Index, this lowest value does not define an outlier, but an inlier. The highest possible value of Connectivity depends on the neighborhood size $n_r$. Evidently, the higher the $n_r$, the more likely an instance with a high Connectivity score is to be recognized as an outlier. Let us denote by $Co_{\max}$ the maximal Connectivity value. This value is achieved when all neighbors are distributed in different classes. Then, for m objects and a neighborhood size $n_r$, $Co_{\max}$ is calculated as
$$Co_{\max} = \sum_{i=1}^{n_r} \frac{1}{i} \tag{9}$$
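The normalization rules described above (Eq. 8 with externally fixed bounds, clipping to [0, 1], swapped bounds for inversely oriented scores, percentile-based bounds for SI, and Eq. 9 for Connectivity) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the nearest-rank percentile lookup and the exact way the 10% fallback is applied are assumptions made here.

```python
from typing import List, Sequence, Tuple


def linear_normalize(scores: Sequence[float], s_min: float, s_max: float,
                     invert: bool = False) -> List[float]:
    """Eq. 8 with bounds supplied from outside; results are clipped to [0, 1].

    invert=True swaps the bounds, for measures where a high raw score
    indicates an outlier (e.g. Connectivity).
    """
    if s_min == s_max:
        raise ValueError("s_min and s_max must differ")
    lo, hi = (s_max, s_min) if invert else (s_min, s_max)
    return [min(1.0, max(0.0, (s - lo) / (hi - lo))) for s in scores]


def value_at(sorted_scores: Sequence[float], q: float) -> float:
    """Value at fraction q of a sorted list (lookup convention assumed here)."""
    idx = min(len(sorted_scores) - 1, max(0, int(q * len(sorted_scores))))
    return sorted_scores[idx]


def si_bounds(pooled_scores: Sequence[float], dataset_scores: Sequence[float],
              low_q: float = 0.05, high_q: float = 0.95,
              max_share: float = 0.10) -> Tuple[float, float]:
    """Bounds from the 5%/95% points of the pooled SI scores, with a fallback
    to the data set's own 10%/90% points when too many scores fall outside."""
    pooled, own = sorted(pooled_scores), sorted(dataset_scores)
    s_min, s_max = value_at(pooled, low_q), value_at(pooled, high_q)
    n = len(own)
    if sum(s < s_min for s in own) > max_share * n:
        s_min = value_at(own, max_share)
    if sum(s > s_max for s in own) > max_share * n:
        s_max = value_at(own, 1.0 - max_share)
    return s_min, s_max


def connectivity_max(n_r: int) -> float:
    """Eq. 9: the largest Connectivity value for neighborhood size n_r."""
    return sum(1.0 / i for i in range(1, n_r + 1))
```

For Connectivity, a call such as `linear_normalize(co_scores, 0.0, connectivity_max(10), invert=True)` maps a raw score of 0 to 1 (inlier) and a raw score of $Co_{\max}$ to 0 (outlier); `co_scores` is a hypothetical list of per-instance scores.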
J. Kohstall et al.
Fig. 3 SI original scores (left) versus the normalized SI scores (right) generated on Iris data set
Fig. 4 Connectivity original scores (left) versus the normalized Connectivity scores (right) generated on Iris data set (nr = 10)
$Co_{\max}$ and 0 are used as maximum and minimum for the normalization, respectively. In comparison to the Silhouette Index the data points are more spread out, as can be seen in Fig. 4. This depends on the factor $n_r$, which is a hyperparameter and has to be determined for every data set individually. For example, choosing too small an $n_r$ would lead to many values that are close to one, i.e. a heavy-tailed distribution. A higher $n_r$ makes it more difficult for instances to be complete inliers or complete outliers, which leads to more values in the middle of the score range. As one can notice in Fig. 4, many instances have a normalized score of one. This imbalance does not affect the functionality of Connectivity, as the main focus is on instances in the lower score range.
In the case of the Average Intracluster Distance (IC-av) the normalization is done on a cluster basis. This is because the distances between the clusters might be different, and using the longest edge from a very sparse cluster on a dense cluster that only has a few outliers might affect the comparability of IC-av with other cluster validation criteria. The maximum is found by approximation. The edges of the minimum spanning tree (MST) are sorted by their length. In order to avoid taking the edge of an outlier which is very far away from all instances of the cluster, the ratio of each edge to the longest edge in the cluster is calculated. For normalization we select the edge whose length is 90% or less of the longest edge. The length of this edge is considered as the maximum edge. The minimum is calculated as the mean of all edges, since that is the minimal possible IC-av score. This allows for a robust normalization. Note that the longest and the shortest edge of a small cluster might have the same length. In this case, the normalized score is set to one, since it cannot be determined whether the instance is an outlier or an inlier. In this way, the class will not be removed when filtering out outliers. The effect of the cluster-based normalization of the IC-av measure can be seen in Fig. 5. The normalization takes the cluster density into account. However, differently from the Silhouette Index, IC-av gives the highest scores to the instances in Class 2 (Virginica).
Fig. 5 IC-av original scores (left) versus the normalized IC-av scores (right) generated on Iris data set
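The cluster-wise choice of IC-av bounds can be illustrated with a small sketch. It is one possible reading of the description above, not the authors' code: `mst_edge_lengths` and `icav_bounds` are hypothetical names, Euclidean distance is assumed, the "edge which is 90% or less of the longest edge" is taken to be the longest such edge, and falling back to the longest edge when no edge qualifies is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def mst_edge_lengths(points: Sequence[Sequence[float]]) -> List[float]:
    """Edge lengths of a Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    if n < 2:
        return []
    best = [math.inf] * n        # cheapest known link from each point to the tree
    in_tree = [False] * n
    best[0] = 0.0                # start the tree at the first point
    lengths: List[float] = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:             # the start point contributes no edge
            lengths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return lengths


def icav_bounds(cluster_points: Sequence[Sequence[float]],
                ratio: float = 0.9) -> Tuple[float, float]:
    """Return (minimum, maximum) for normalizing IC-av within one cluster."""
    edges = sorted(mst_edge_lengths(cluster_points))
    if not edges:
        raise ValueError("a cluster needs at least two points")
    longest = edges[-1]
    capped = [e for e in edges if e <= ratio * longest]
    s_max = capped[-1] if capped else longest   # fallback is an assumption
    s_min = sum(edges) / len(edges)             # mean of all MST edges
    return s_min, s_max
```

When all edges of a small cluster have the same length, both returned values coincide, which corresponds to the case in which the text sets the normalized score to one. With a few very long edges the mean can exceed the selected maximum; the text does not say how this case is handled, so the sketch simply returns both values.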
3.2.5
Integration of Cluster Validation Indices
There are several ways to combine the scores generated by a few different cluster validation indices to build an ensemble. For example, one of the following aggregation operators can be used: the arithmetic mean $M = \frac{1}{n}\sum_{i=1}^{n} x_i$, the geometric mean $G = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$ and the harmonic mean $H = n / \left(\sum_{i=1}^{n} 1/x_i\right)$. The choice of aggregation operator, however, is crucial. Some aggregation operators can lead to a significant loss of information since their values can be greatly influenced by extreme scores (e.g., M), while others penalize low-scoring outliers too much (e.g., G and H). A more robust aggregation solution is proposed in [41]. The authors introduce a nonparametric recursive aggregation process called Multilayer Aggregation (MLA). This approach allows combining a few different aggregation operators and finding a trade-off between their conflicting behaviour. For example, MLA can be used to build an ensemble of cluster validation indices by combining the mean operators M, G and H introduced above. For instance, the initial aggregation of the normalized cluster validation scores with these operators produces three new values. These values can then be combined again with the same aggregation operators, and so on, until the difference between the maximum and minimum values is small enough to stop further aggregation. It has been proven in [41] that the MLA process is convergent.
In the current work, we have studied and evaluated four different value-based ensembles. Namely, the normalized cluster validation scores have been aggregated by applying M, G, H and the median, respectively. We have not integrated the cluster validation index scores by MLA, since in the current experimental setup we use only three cluster validation measures. We plan to study the MLA-based ensemble in our future work, where the aim will be to pursue further enhancement and validation of the proposed outlier filtering approach by applying additional cluster validation measures.
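The operators M, G and H and the recursive MLA idea described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure of [41]: the stopping tolerance `eps`, the layer limit, returning the mean of the final layer, and mapping a zero score to a harmonic mean of 0 are all choices made here.

```python
import math
from statistics import median
from typing import Callable, Sequence


def arithmetic_mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def geometric_mean(xs: Sequence[float]) -> float:
    # Assumes non-negative scores, e.g. scores normalized to [0, 1].
    return math.prod(xs) ** (1.0 / len(xs))


def harmonic_mean(xs: Sequence[float]) -> float:
    # Assumes non-negative scores; a zero score is mapped to 0 (the limit
    # value), which is a convention chosen for this sketch.
    if any(x == 0 for x in xs):
        return 0.0
    return len(xs) / sum(1.0 / x for x in xs)


def multilayer_aggregate(
    scores: Sequence[float],
    operators: Sequence[Callable[[Sequence[float]], float]] = (
        arithmetic_mean, geometric_mean, harmonic_mean),
    eps: float = 1e-9,
    max_layers: int = 100,
) -> float:
    """Re-apply the operators to their own outputs until the values agree."""
    values = list(scores)
    for _ in range(max_layers):
        values = [op(values) for op in operators]
        if max(values) - min(values) < eps:
            break
    return arithmetic_mean(values)


# Normalized SI, Connectivity and IC-av scores of one hypothetical instance.
instance_scores = [0.2, 0.5, 0.9]
combined = {
    "M": arithmetic_mean(instance_scores),
    "G": geometric_mean(instance_scores),
    "H": harmonic_mean(instance_scores),
    "median": median(instance_scores),
    "MLA": multilayer_aggregate(instance_scores),
}
```

For positive scores the three means always satisfy H ≤ G ≤ M, so a single low score pulls H and G down more strongly than M, which is the penalizing behaviour mentioned above.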
4 Evaluation and Results
4.1 Experimental Setup
In [5], we initially compared the three selected cluster validation indices (see Sect. 3.1) with each other in order to evaluate their performance in detecting mislabeled instances. Their performance was further studied and compared to that of the logical-based ensembles discussed in Sect. 3.2.1. In [5], we also benchmarked our class outlier filtering approach against the LOF method [7], which is widely used as a baseline algorithm. In the current work we focus on the evaluation and comparison of the CVI ensemble techniques discussed in the foregoing section. More specifically, we study six CVI-based ensembles: union (OR), intersection (AND), the value-based median and the three mean operators discussed in Sect. 3.2.5. We study how the filtering of mislabeled instances affects the classification accuracy on eight data sets from the UCI data repository for five learning algorithms trained with and without filtering. The algorithms used are: 1-nearest neighbor (1-NN), 5-nearest neighbor (5-NN), Support Vector Machine (SVM), Gaussian Naïve Bayes (GNB) and Decision Tree (CART). No parameter optimization has been performed on any of the algorithms. In order to evaluate the detection of outliers, we have injected noise into the data sets used. This has been achieved by randomly flipping 10% of the labels. We further compare the performance of the different CVI ensemble techniques in identifying the instances with flipped labels as outliers. Notice that when evaluating the accuracy obtained by removing potentially misclassified instances prior to training, the original data sets without injected noise have been used.
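The noise injection step can be illustrated with a short sketch. The text only states that 10% of the labels are flipped at random; the function name, the seeding, and the choice to replace each selected label with a different class drawn uniformly are assumptions of this example.

```python
import random
from typing import List, Sequence, Tuple


def inject_label_noise(labels: Sequence[int], rate: float = 0.10,
                       seed: int = 0) -> Tuple[List[int], List[int]]:
    """Flip a fraction `rate` of the labels to a different, randomly chosen class.

    Returns the noisy labels and the sorted indices of the flipped instances,
    so that detected outliers can later be compared against the injected noise.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    if len(classes) < 2:
        raise ValueError("at least two classes are needed to flip labels")
    n_flip = round(rate * len(labels))
    flipped = sorted(rng.sample(range(len(labels)), n_flip))
    noisy = list(labels)
    for i in flipped:
        noisy[i] = rng.choice([c for c in classes if c != labels[i]])
    return noisy, flipped


# Example with three balanced classes of 50 instances each (as in Iris).
clean = [0] * 50 + [1] * 50 + [2] * 50
noisy, flipped = inject_label_noise(clean, rate=0.10, seed=1)
```

Keeping the flipped indices is what makes it possible to report, for each ensemble, which share of the injected noise was filtered out.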
Each outlier filtering method has been evaluated using 5 by 10-fold cross-validation (running 10-fold cross-validation 5 times, each time with a different seed to partition the data). In each iteration we obtain a training set and a test set. The filtering is performed only on the training set. Each learning algorithm is then trained on the filtered training set. The unfiltered test set is used to evaluate the models. The results of the cross-validations are averaged. We have studied six different experimental scenarios. Initially, SI, Connectivity and IC-av scores are calculated for each instance of the considered data sets. Then the calculated scores are normalized by applying the techniques introduced in Sect. 3.2.4. Assume that T ∈ [0, 1) is a predefined cut-off threshold. All instances with normalized scores below or equal to T will be treated as label noise outliers. Next, for each instance of the considered data sets, the normalized cluster validation scores are combined by applying M, G, H and the median. The instances in each data set are ranked in decreasing order with respect to the combined scores. In this way four different rankings are assigned to the instances of each data set. Then, using the given cut-off threshold T, the bottom-ranked instances can be identified and filtered out from the training set as outliers for each separate ranking. In addition, the instances of the non-filtered training data sets can be ranked in decreasing order based on the assigned normalized cluster validation scores separately for each single measure (SI, Co and IC-av), and then the outliers with respect to the given cut-off threshold can be identified and filtered out separately for each measure. The sets of filtered-out instances produced separately for each measure can further be combined by using union (OR) or intersection (AND). Evidently, for each considered data set six different ensemble outlier filtering techniques are applied and evaluated.
For each experimental scenario we have tested the following cut-off thresholds: 0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3. Note that in the following tables and figures we use a cut-off threshold of -1 to denote the case of a non-filtered training set.
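The six filtering scenarios can be summarized in a compact sketch: four value-based ensembles that threshold an aggregated score, and two logical ensembles that combine the per-measure outlier sets. The function names and the dictionary layout are hypothetical, and the rule "score ≤ T" is taken directly from the text above; how ties and ranks are handled in the actual implementation is not specified there.

```python
from statistics import median
from typing import Callable, Dict, List, Sequence, Set


def outliers_by_threshold(scores: Sequence[float], threshold: float) -> Set[int]:
    """Indices of instances whose normalized score is <= the cut-off T."""
    return {i for i, s in enumerate(scores) if s <= threshold}


def value_based_filter(per_measure: Dict[str, Sequence[float]],
                       aggregate: Callable[[List[float]], float],
                       threshold: float) -> Set[int]:
    """Aggregate the normalized scores per instance, then apply the cut-off."""
    combined = [aggregate(list(row)) for row in zip(*per_measure.values())]
    return outliers_by_threshold(combined, threshold)


def logical_filter(per_measure: Dict[str, Sequence[float]],
                   threshold: float, mode: str = "union") -> Set[int]:
    """Apply the cut-off per measure, then combine the sets with OR or AND."""
    if mode not in ("union", "intersection"):
        raise ValueError("mode must be 'union' or 'intersection'")
    sets = [outliers_by_threshold(s, threshold) for s in per_measure.values()]
    return set.union(*sets) if mode == "union" else set.intersection(*sets)


# Hypothetical normalized scores of four training instances.
scores = {
    "SI": [0.0, 0.9, 0.2, 0.6],
    "Co": [0.1, 1.0, 0.8, 0.05],
    "IC-av": [0.05, 0.7, 0.9, 0.7],
}
by_median = value_based_filter(scores, median, threshold=0.1)
by_union = logical_filter(scores, threshold=0.1, mode="union")
by_intersection = logical_filter(scores, threshold=0.1, mode="intersection")
```

In this toy example the union removes every instance flagged by at least one measure, while the intersection and the median only remove the instance on which the measures agree, which mirrors the difference in the number of filtered-out instances between these ensembles.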
4.2 Implementation and Availability
The proposed CVI-based Outlier Filtering algorithm has been implemented in Python 3.6. The data sets used are publicly available from the UCI repository (https://archive.ics.uci.edu/ml/index.php). The selected data sets are: digits, ecoli, iris, yeast, breast cancer, abalone, white wine quality and red wine quality. Additional information about the data sets can be found in Table 1. In our experiments we have used three different cluster validation measures: Silhouette Index, Connectivity and IC-av. The Silhouette Index implementation is taken from the Python library scikit-learn. The IC-av index has been implemented in Python according to the description given in Sect. 3 (see Eq. 6), while Connectivity has been coded following its R script definition. The neighborhood size used for Connectivity is 10, the default value of its R implementation. We have used the F-measure to evaluate the accuracy of the learning algorithms used in our experiments. The F-measure is the harmonic mean of the precision and recall values for each cluster [27]. The scikit-learn implementation of the F-measure (micro-averaged F1) has been used. The executable of the CVI-based Outlier Filtering algorithm, the data sets used and the experimental results are available at GitLab (https://gitlab.com/machine_learning_vm/outliers).
Table 1 List of data sets
Data set             #Instances  #Attributes  #Classes  Data type
Digits               1797        64           10        Numeric
Ecoli                336         8            8         Numeric
Iris                 150         4            3         Numeric
Yeast                1484        8            10        Numeric
White wine quality   4898        11           7         Numeric
Red wine quality     1599        11           6         Numeric
Breast cancer        286         9            2         Categorical
Abalone              4153        10           19        Mixed
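As a reading aid for the micro-averaged F1 mentioned in Sect. 4.2, the following sketch computes it from scratch by pooling true positives, false positives and false negatives over all classes. It is a plain-Python illustration, not the scikit-learn code used in the experiments. For single-label multi-class problems every misclassification counts once as a false positive and once as a false negative, so the micro-averaged F1 coincides with the ordinary accuracy.

```python
from typing import Hashable, Sequence


def micro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Micro-averaged F1: pool TP, FP and FN over all classes, then combine."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1
            elif p == c:
                fp += 1          # predicted c, but the true class differs
            elif t == c:
                fn += 1          # true class c was missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Small hypothetical example with three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
```

This equivalence is why the values reported in the result tables can be read as classification accuracies.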
4.3 Results and Discussion
As already mentioned above, we conduct and study six different experimental scenarios. We have evaluated the instances of each data set listed in Table 1 with respect to the six ensemble techniques discussed in Sect. 4.1 for five different learning algorithms. In addition, we have studied two different data setups: the original data sets versus the data sets with injected noise. In order to facilitate the interpretation of the results, in the following figures and tables we present only selected examples from the data sets and learning algorithms. All the results generated on the data sets by applying the selected learning algorithms can be found at GitLab (https://gitlab.com/machine_learning_vm/outliers). Table 2 shows the classification accuracy values produced by the six studied scenarios on the ecoli data set, while Table 3 presents the values for the corresponding six scenarios on the ecoli data set with injected noise. Notice that the highest classification accuracy improvement on both data sets is obtained for CART at a cut-off threshold of 0.3, using the median in the first data setup and the intersection in the second one. Evidently, removing outliers is particularly beneficial for the CART algorithm, which is known to be sensitive to outliers. In both data setups SVM shows only a modest improvement due to outlier filtering, i.e. SVM is more robust to outliers than the CART algorithm. In addition, one can observe that the improvement gained in both data setups by the 1-NN learning algorithm is higher than the one gained by the 5-NN algorithm. The 5-NN learning algorithm is naturally more robust
Table 2 The classification accuracy for the five considered learning algorithms using the studied ensemble outlier filtering methods on ecoli data set
Ensemble   LA     −1     0      0.05   0.1    0.15   0.2    0.25   0.3
Arith      1-NN   0.819  0.823  0.840  0.841  0.841  0.841  0.850  0.849
Arith      5-NN   0.863  0.863  0.865  0.865  0.865  0.865  0.864  0.864
Arith      SVM    0.878  0.879  0.878  0.879  0.879  0.879  0.875  0.875
Arith      GNB    0.814  0.811  0.830  0.838  0.838  0.838  0.836  0.835
Arith      CART   0.755  0.756  0.712  0.870  0.870  0.870  0.875  0.875
Geom       1-NN   0.819  0.857  0.857  0.857  0.861  0.861  0.861  0.862
Geom       5-NN   0.863  0.864  0.864  0.864  0.864  0.864  0.867  0.871
Geom       SVM    0.878  0.873  0.873  0.873  0.875  0.875  0.876  0.876
Geom       GNB    0.814  0.831  0.831  0.831  0.829  0.829  0.830  0.843
Geom       CART   0.755  0.867  0.867  0.867  0.878  0.878  0.879  0.878
Harm       1-NN   0.819  0.819  0.857  0.861  0.860  0.864  0.864  0.867
Harm       5-NN   0.863  0.863  0.863  0.867  0.869  0.874  0.874  0.880
Harm       SVM    0.878  0.878  0.875  0.876  0.881  0.877  0.877  0.872
Harm       GNB    0.814  0.814  0.829  0.830  0.844  0.848  0.848  0.846
Harm       CART   0.755  0.755  0.870  0.879  0.880  0.881  0.881  0.882
Median     1-NN   0.819  0.836  0.848  0.851  0.854  0.861  0.869  0.869
Median     5-NN   0.863  0.864  0.864  0.865  0.871  0.871  0.875  0.875
Median     SVM    0.878  0.878  0.877  0.877  0.879  0.876  0.873  0.873
Median     GNB    0.814  0.827  0.836  0.840  0.841  0.844  0.854  0.854
Median     CART   0.755  0.858  0.872  0.880  0.881  0.882  0.885  0.885
Union      1-NN   0.819  0.852  0.853  0.858  0.861  0.852  0.853  0.854
Union      5-NN   0.863  0.863  0.868  0.871  0.879  0.870  0.869  0.870
Union      SVM    0.878  0.877  0.880  0.881  0.873  0.866  0.852  0.846
Union      GNB    0.814  0.824  0.826  0.839  0.839  0.829  0.845  0.834
Union      CART   0.755  0.867  0.872  0.880  0.878  0.872  0.869  0.868
Intersect  1-NN   0.819  0.831  0.852  0.856  0.860  0.867  0.872  0.876
Intersect  5-NN   0.863  0.862  0.865  0.866  0.872  0.872  0.875  0.875
Intersect  SVM    0.878  0.880  0.880  0.880  0.880  0.879  0.874  0.879
Intersect  GNB    0.814  0.822  0.840  0.846  0.846  0.849  0.850  0.848
Intersect  CART   0.755  0.757  0.738  0.746  0.740  0.741  0.883  0.883
against label noise and therefore it will not benefit much from label noise filtering. On the other hand, the 1-NN algorithm is more sensitive to noise. Therefore we can see a bigger improvement in classification accuracy due to filtering for 1-NN than for 5-NN. The GNB algorithm has a comparatively high improvement in classification accuracy for both data sets. This means that removing outliers is also beneficial for GNB. We believe that the reason for this is that the Gaussian curve that is used in the GNB algorithm is very sensitive to outliers. It is not surprising that the improvement
Table 3 The classification accuracy for the five considered learning algorithms using the studied ensemble outlier filtering methods on ecoli data set with injected noise
Ensemble   LA     −1     0      0.05   0.1    0.15   0.2    0.25   0.3
Arith      1-NN   0.711  0.727  0.746  0.757  0.763  0.768  0.784  0.790
Arith      5-NN   0.804  0.811  0.814  0.818  0.819  0.819  0.818  0.820
Arith      SVM    0.809  0.812  0.817  0.815  0.816  0.815  0.809  0.813
Arith      GNB    0.693  0.718  0.728  0.730  0.734  0.741  0.757  0.766
Arith      CART   0.607  0.621  0.653  0.753  0.796  0.794  0.795  0.798
Geom       1-NN   0.711  0.792  0.792  0.796  0.800  0.800  0.800  0.800
Geom       5-NN   0.804  0.805  0.805  0.807  0.807  0.807  0.807  0.806
Geom       SVM    0.809  0.804  0.804  0.804  0.805  0.805  0.805  0.802
Geom       GNB    0.693  0.761  0.761  0.767  0.767  0.767  0.767  0.776
Geom       CART   0.607  0.783  0.783  0.786  0.786  0.786  0.786  0.803
Harm       1-NN   0.711  0.711  0.800  0.799  0.790  0.796  0.797  0.797
Harm       5-NN   0.804  0.804  0.807  0.803  0.800  0.802  0.809  0.809
Harm       SVM    0.809  0.809  0.802  0.804  0.801  0.799  0.803  0.805
Harm       GNB    0.693  0.693  0.761  0.764  0.763  0.768  0.776  0.781
Harm       CART   0.607  0.607  0.785  0.785  0.786  0.792  0.804  0.802
Median     1-NN   0.711  0.752  0.776  0.781  0.794  0.806  0.816  0.819
Median     5-NN   0.804  0.808  0.818  0.818  0.820  0.823  0.825  0.825
Median     SVM    0.809  0.819  0.810  0.811  0.816  0.813  0.811  0.812
Median     GNB    0.693  0.743  0.750  0.756  0.758  0.786  0.785  0.789
Median     CART   0.607  0.640  0.659  0.757  0.801  0.807  0.816  0.816
Union      1-NN   0.711  0.798  0.794  0.789  0.792  0.798  0.804  0.805
Union      5-NN   0.804  0.814  0.807  0.811  0.812  0.819  0.820  0.816
Union      SVM    0.809  0.817  0.813  0.816  0.812  0.809  0.806  0.798
Union      GNB    0.693  0.770  0.755  0.763  0.771  0.773  0.775  0.773
Union      CART   0.607  0.711  0.702  0.769  0.796  0.803  0.809  0.806
Intersect  1-NN   0.711  0.740  0.781  0.792  0.807  0.817  0.822  0.824
Intersect  5-NN   0.804  0.809  0.820  0.821  0.824  0.824  0.825  0.825
Intersect  SVM    0.809  0.811  0.809  0.814  0.809  0.813  0.812  0.811
Intersect  GNB    0.693  0.726  0.765  0.772  0.782  0.788  0.788  0.788
Intersect  CART   0.607  0.634  0.656  0.664  0.780  0.791  0.821  0.821
in classification accuracy is significantly larger for all studied learning algorithms in the second data setup (a data set with injected noise) than in the original data set. This is observed for all used data sets. Finally, it is interesting to mention that the median is involved in generating the highest results for two learning algorithms in the first data setup (Table 2) and for four algorithms in the second setup (Table 3). We will discuss and explain the reasons behind this observation below in this section.
Table 4 shows the number of filtered-out instances for the ecoli data set with injected noise. As expected, the accuracy increases as more label noise instances are removed. In addition, when more instances are removed in total the accuracy does not decrease but stays stable, as the highest accuracies are achieved in most cases at a cut-off threshold of 0.3. The difference in the performance of the six studied ensembles is also visible with respect to the number of filtered-out instances. The arithmetic mean, the median and the intersection filter out fewer instances while achieving a similar percentage of detected noise compared to the geometric mean, the harmonic mean and the union. Evidently, the former three ensembles are better at identifying label noise instances than the latter three. Figure 6 benchmarks the classification accuracy of four of the six studied ensemble techniques (all the value-based ones) against the accuracy produced by the single cluster validation indices. In order to outline the main observations about the value-based ensembles, we have chosen to plot the results for two data sets and two learning algorithms. The arithmetic mean shows a very stable performance independently of the cluster validation indices used. This is well demonstrated in the case of the breast cancer data set, where the best performance is achieved at a cut-off threshold of 0.25, while for the same threshold the SI measure shows a very low accuracy. The harmonic and geometric means are more dependent on the behaviour of the single cluster validation measures. For example, they profit especially from the higher performance of the corresponding measures, as can be seen in the case of the ecoli data set. However, in the case of the breast cancer data set they show lower results due to the modest performance of the single cluster validation indices.
In summary, the performance of the arithmetic mean is less dependent on the performance of the cluster validation indices used than that of the geometric and harmonic means. In addition, as one can notice in both plots of Fig. 6, the median usually behaves similarly to the best performing single measure. In order to compare the performance of the different CVI-based ensembles we have calculated the receiver operator curves (ROC) on the data sets with injected label noise. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR). In the considered context, a true positive means that an instance with a flipped label has been correctly identified. A false positive, on the other hand,
Table 4 The number of filtered-out instances at the corresponding cut-off thresholds for ecoli data set with injected noise. The numbers in parentheses show the proportion of the filtered-out injected noise
Ensemble   #Noise  0          0.05       0.1        0.15       0.2        0.25       0.3
Arith      31      7 (0.23)   16 (0.42)  19 (0.45)  25 (0.58)  27 (0.61)  33 (0.74)  35 (0.77)
Geom       31      44 (0.77)  44 (0.77)  45 (0.77)  47 (0.81)  47 (0.81)  47 (0.81)  55 (0.87)
Harm       31      0 (0.00)   46 (0.77)  48 (0.81)  50 (0.81)  53 (0.84)  57 (0.87)  58 (0.87)
Median     31      20 (0.55)  28 (0.71)  30 (0.71)  34 (0.71)  39 (0.81)  43 (0.81)  45 (0.87)
Union      31      44 (0.77)  49 (0.81)  54 (0.84)  60 (0.87)  65 (0.87)  66 (0.87)  75 (0.87)
Intersect  31      7 (0.22)   15 (0.42)  16 (0.42)  18 (0.42)  19 (0.45)  21 (0.45)  21 (0.45)
Fig. 6 Receiver operator curves on data sets with 10% injected noise
is an inlier which has been considered an outlier. Therefore, the TPR is defined as the number of removed outliers (true positives) divided by the total number of outliers, while the FPR is calculated in the same way for the remaining instances. In other words, in the case of noise removal, if an instance that is an outlier is removed, then it is a true positive; on the other hand, if an inlier is removed, then this increases the false positive rate. Evidently, the optimal result is obtained when first all flipped instances are correctly identified and only then all the remaining instances. In this case the ROC curve passes through the top left corner and the area under the curve (AUC) reaches its maximum value. Figure 7 shows the comparison of the ROC curves produced by the six studied ensemble techniques for four data sets with injected noise: breast cancer, yeast, digits and ecoli. As can be seen, the outlier detection approaches based on logical operations perform worse than the rank-based or score-based ensembles. This holds especially when the cluster validation scores are combined by using the intersection. The approach of combining the ranks of the instances via the rank-based median shows promising results, especially for detecting the clearly mislabeled instances. For such instances the performance tends to be the best on most evaluated data sets. This is due to the fact that the cluster validation indices used agree on those instances being outliers. In the case of less clear outliers the different cluster validation indices tend to agree less, which causes more inliers to be detected as outliers. This can be overcome by involving more, and alternative, cluster validation measures that evaluate different aspects of the clustering solutions and in that way achieve a more balanced result. On most data sets the three value-based approaches and the median show a very similar performance, which tends to be better than that of the logical-based ensembles (see Fig. 7).
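The TPR and FPR definitions given above translate directly into a small routine that sweeps over the instances from the lowest (most outlying) score to the highest and records one ROC point per removed instance. This is a sketch under simplifying assumptions: the names are hypothetical, tied scores are not grouped, and the AUC is obtained with the trapezoidal rule.

```python
from typing import List, Sequence, Set, Tuple


def roc_points(scores: Sequence[float],
               noisy_idx: Set[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by removing instances in order of increasing score.

    `scores` are combined normalized scores (low = outlier) and `noisy_idx`
    holds the indices of the instances whose labels were flipped.
    """
    n_pos = len(noisy_idx)
    n_neg = len(scores) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both flipped and clean instances are required")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i in order:
        if i in noisy_idx:
            tp += 1              # a flipped instance was removed
        else:
            fp += 1              # a clean instance was removed
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical example: the two flipped instances have the lowest scores.
example_scores = [0.0, 0.1, 0.8, 0.9]
perfect = auc(roc_points(example_scores, {0, 1}))
worst = auc(roc_points(example_scores, {2, 3}))
```

When all flipped instances are removed before any clean one, the curve runs through the top left corner and the area equals one; in the reversed case the area is zero.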
In addition, the rank-based median demonstrates a better performance especially on more complex data sets (e.g., see yeast data set). Figure 8 presents, for each ensemble method, the average of classification accuracy for all five learning algorithms on three data sets: abalone, red wine quality and white wine quality. It benchmarks the results generated on the original data sets (the left column) against those obtained on the corresponding data sets with injected noise (the right column). As it can be seen the ensemble techniques show different
performance on the different data sets. For example, the arithmetic mean and the median produce some of the highest results on the abalone data set, but not on the red wine quality data set, where the union and the harmonic mean perform better. The best performing ensemble can also differ between the original data set and the corresponding one with injected noise, e.g., this is the case for the white wine quality data set: the arithmetic mean, the intersection and the median have the highest accuracy values in the first data set, versus the union in the second one, for cut-off threshold values below 0.15. Interestingly, the union and the harmonic mean show a higher improvement of the classification accuracy than the other ensembles for both red wine quality data sets (the original one and the one with injected noise) and cut-off threshold values below 0.25. Generally, in the case of the union, with the increasing amount of filtered-out data the outlier sets become more disjoint, which results in a bigger set of removed outliers, especially for cut-off threshold values above 0.1. This can be noticed in all six plots of Fig. 8, where the classification accuracy of the union degrades when the number of filtered-out instances increases. In contrast, the results produced by the intersection are not influenced so strongly by an increase of the cut-off threshold value. However, it also has the worst performance in most of the studied scenarios (see Fig. 7). This is due to the fact that the individual cluster validation measures consider different points as outliers and the intersection
Fig. 7 Receiver operator curves on data sets with 10% injected noise
Fig. 8 Average of classification accuracy over all learning algorithms for the six studied ensemble methods
only filters out points that are considered outliers by all cluster validation measures. In addition, it is interesting to notice the big difference in the performance of the three mean operators, especially in the case of the red wine quality data set. This illustrates very well the discussion of the mean operators’ behaviour in Sect. 3.2.5. The main conclusion from the above discussion is that the performance of a CVI-based outlier filtering ensemble can be improved by customizing it to the data set under consideration. This can be done by selecting the proper combination of cluster validation measures and ensemble techniques through an initial study of the performance of the involved individual cluster validation measures and their combinations on this data set.
5 Conclusions and Future Work
In this work, we have proposed an extension of our CVI-based outlier filtering approach. We have been interested in defining an outlier scoring method that assigns a degree of outlierness to each instance in the training set. Instead of directly combining the cluster validation measure scores by applying a logical or a rank-based ensemble, we have introduced a preprocessing score normalization step. Furthermore, we have studied and evaluated different score-based ensemble methods for combining the cluster validation index scores. The accuracy improvement obtained by removing outliers and label noise in this guided way has been evaluated on eight data sets from the UCI repository for five different learning algorithms. The obtained results demonstrate that the proposed approach is a robust outlier filtering technique that is able to improve the classification accuracy of the learning algorithms. Our approach allows designing a label noise mining measure that is customized and specially suited for the machine learning task under consideration. Namely, we can initially study and select a proper combination of cluster validation measures that reflects the specific properties of the involved data and learning algorithms.
For future work, the aim is to pursue further enhancement and validation of the proposed outlier filtering approach by applying alternative cluster validation measures on a wider variety of data sets and learning algorithms. In addition, we plan to study different normalization techniques for cluster validation indices, as well as different ensemble methods. A unified robust CVI-outlier score would allow the application of the proposed approach in different application domains and scenarios for detecting and treating outliers, and we will study these opportunities further.
Acknowledgements This work is part of the research project “Scalable resource efficient systems for big data analytics” funded by the Knowledge Foundation (grant: 20140032) in Sweden.
J. Kohstall et al.
Ensembles of Cluster Validation Indices for Label Noise Filtering
Interpretation, Modeling, and Visualization of Crowdsourced Road Condition Data
Pekka Sillberg, Mika Saari, Jere Grönman, Petri Rantanen and Markku Kuusisto
Abstract Nowadays almost everyone has a mobile phone and even the most basic smartphones often come embedded with a variety of sensors. These sensors, in combination with a large user base, offer huge potential in the realization of crowdsourcing applications. The crowdsourcing aspect is of interest especially in situations where users’ everyday actions can generate data usable in more complex scenarios. The research goal in this paper is to introduce a combination of models for data gathering and analysis of the gathered data, enabling effective data processing of large data sets. Both models are applied and tested in the developed prototype system. In addition, the paper presents the test setup and results of the study, including a description of the web user interface used to illustrate road condition data. The data were collected by a group of users driving on roads in western Finland. Finally, it provides a discussion on the challenges faced in the implementation of the prototype system and a look at the problems related to the analysis of the collected data. In general, the collected data were discovered to be more useful in the assessment of the overall condition of roads, and less useful for finding specific problematic spots on roads, such as potholes.
Keywords Models · Data gathering · Data analysis · Visualization · Sensors · Mobile devices
P. Sillberg (B) · M. Saari · J. Grönman · P. Rantanen · M. Kuusisto, Faculty of Information Technology and Communication Sciences, Tampere University, Pori, Finland. e-mail: [email protected]
© Springer Nature Switzerland AG 2020. R. Jardim-Goncalves et al. (eds.), Intelligent Systems: Theory, Research and Innovation in Applications, Studies in Computational Intelligence 864, https://doi.org/10.1007/978-3-030-38704-4_5
1 Introduction It is important to keep road networks in good condition. These days, technology and mobile devices in particular enable the automation of environmental observation [1, 2]. Mobile phones can be deployed for purposes for which they were not originally designed. In addition, applications that combine road maintenance and mobile devices have already been developed [3]. In Finland, there has been a similar
study on how to utilize mobile phones for collecting road condition information [4]. In the study, bus companies tested mobile phone software that sends real-time weather condition data to road maintainers in wintertime. Nevertheless, traditional road condition monitoring requires manual effort: driving on the roads and checking their condition, observing traffic cameras, and investigating reports and complaints received from road users. Automation of the monitoring process, for example by utilizing crowdsourcing, could provide a more cost-efficient solution.
Data gathering is an important part of research related to the Internet of Things (IoT) [5]. In this research, the focus of data gathering has been redirected toward a Wireless Sensor Network (WSN) [6] type of solution. Previously, we have studied technologies related to applications that automate environmental observations utilizing mobile devices. In a recent research study [7], we introduced two cases: the tracking and photographing of bus stops, and the tracking and photographing of recycling areas. The first case used mobile phones and the second used a Raspberry Pi embedded system. Our other study [8] facilitated the utilization of information gathered from road users. As part of the research work, a mobile application was developed for gathering crowdsourced data. The gathered data per se are not very usable and therefore some kind of processing is necessary. Ma et al. discussed IoT data management in their paper [9] and focused on handling data in different layers of WSN. Also, they discussed data handling challenges, approaches, and opportunities. In this study we use our previously introduced Faucet-Sink-Drain model [10]. In this model the data processing and data sources are combined in a controlled and systematic way. This paper is an extension of Sillberg et al. [11], where the focus was on introducing the prototype system.
In this extension paper, more emphasis is placed on the models behind the prototype system. We have developed a mobile application for sensing road surface anomalies (called ShockApplication). The purpose of this application is to sense the vibration of a mobile phone installed in a car. The application was tested by gathering data on real-life scenarios. The data were stored in a cloud service. In addition, we present methods that utilize the free map services available on the Internet for visualization of the data. The research goal in this paper is to combine models of (1) data gathering and (2) analysis of the gathered data that enables effective data processing of large data sets. Both models were applied and tested in the developed prototype system. Our previous studies related to the models are presented in Sect. 3, where the data gathering model and the modifications made for this study are introduced in Sect. 3.1. Data processing produces useful information for the user. Section 3.2 describes the processing model used in the prototype system. This model is designed as a general-purpose tool for systematic control and analysis of big data. With the use of these fundamentally simple models it is possible to create practical and interoperable applications. The rest of this paper is structured as follows. In Sect. 2, we introduce the related research on crowdsourcing efforts in the collection of road condition data. Section 4 integrates the models presented in Sect. 3. In Sect. 5, we present the test setup and results. Section 6 includes a discussion and suggestions for future research on the topic and finally, the study is summarized in Sect. 7.
2 Background Nowadays almost everyone has a mobile phone and even the most basic smartphones often come embedded with a variety of sensors [2]. This opens up the possibility of crowdsourcing through the use of mobile phones. The term crowdsourcing was defined by Howe [12] in 2006. When several users use their devices for gathering data for a specific purpose, it can be considered a crowdsourcing activity. The idea of utilizing crowdsourcing as a model for problem solving was introduced in [13]. Furthermore, crowdsourcing can be used to support software engineering activities (e.g., software development). This matter has been widely dealt with in survey [14]. There have been several studies on using a mobile phone to detect road surface anomalies. One piece of research [15] presented an extensive collection of related studies. Further, the research introduced an algorithm for detecting road anomalies by using an accelerometer and a Global Positioning System (GPS) integrated into a mobile phone. The application was described as easy-to-use and developed for crowdsourcing, but the crowdsourcing aspects were not elaborated. The tests were performed with six different cars at slow speeds (20 and 40 km/h). The route used in the test was set up within a campus area. The research paper did not discuss the visualization aspect nor the application itself and focused primarily on the algorithm that was presented. The research presented in [16, 17] was aimed at finding particular holes in a certain road. Alqudah and Sababha [16] used a gyroscope instead of an accelerometer and looked for spikes in the data. The other information logged was sampling time, speed, and GPS locations. The test was conducted on a route that was about four kilometers long and the test was repeated five times to ensure consistency and repeatability. 
The crowdsourcing aspect was not mentioned and, according to the paper, the data were collected “through a common repository.” The research [17] presented an Android application for detecting potholes, but did not provide much detail on the technical implementation. There are several studies where the research was performed in a real-life scenario using taxis [18, 19] or buses [20]. In study [18], the data were gathered by seven taxis in the Boston area. The data collection devices were embedded computers running on a Linux-based operating system. In study [19], the data were gathered by 100 taxis in the Shenzhen urban region. The devices consisted of a microcontroller (MCU), a GPS module, a three-axis accelerometer, and a GSM module. The devices were mounted inside the cars and sent the data to servers over a wireless connection. The main idea of the research [18] was to collect data and then train a detector based on the peak X and Z accelerations and instantaneous velocity of the vehicle. The result reported in the paper was that over 90% of the potholes reported by the system were real potholes or other road anomalies. The crowdsourcing aspect was not mentioned, and the visualization was limited to showing a set of detections on a map. In study [20], the data were gathered by phones installed in buses. The data were projected on a map, but the amount of data collected (100 MB/week) and how this would affect a larger crowd were not discussed.
3 Two-Phased Model of Data Processing The research goal in this paper is a combination of models for (1) data gathering and (2) analysis of the gathered data which enables effective data processing of large data sets. Both models were applied and tested in the developed prototype system. With the use of these fundamentally simple models, it is possible to create highly practical and interoperable applications that can improve the overall quality of software. The data gathering model and the modifications made for this study are introduced in Sect. 3.1. The model is one type of Wireless Sensor Network (WSN) solution. In addition, the usage of the model in our previous research is introduced. Section 3.2 describes the processing model used in the prototype system. The processing model is designed as a general-purpose tool for systematic control and analysis of big data. However, the model is very flexible and should fit a wide range of applications.
3.1 Data Gathering Data gathering is an important part of research on the Internet of Things (IoT). In this research, the focus of data gathering has been redirected toward the WSN type of solution. Because we use mobile phones as sensor nodes, it could be categorized as a mobile sensor network. The advantages of a mobile sensor network have been discussed by Dyo [21]. In addition, Leppänen et al. [22] discuss using mobile phones as sensor nodes in data collection and data processing. A survey conducted in 2002 compiled the basic features of sensor networks [23]. In this study, we used the previously presented data gathering model. This model was introduced by Saari et al. [24] and it has three main parts: sensor node, master node, and cloud. The sensor node sends data to the master node. The master node collects and saves data, but does not process the data significantly. The master node sends data to the cloud service which stores the data. The data gathering model includes the following WSN features presented in [23]:
• Sensor nodes can be used for continuous sensing: when using a mobile phone as a sensor node, this is enabled by dedicated software.
• The mobile phone includes the basic components of a sensor node: sensing unit, processing unit, transceiver unit, and power unit.
• A sensor network is composed of a large number of sensor nodes: the prototype design presented in this study does not limit the number of mobile phones used.
• The network: mobile phones have the communication network provided by telecommunications companies.
The model has been tested with an off-the-shelf credit card sized computer and other instruments [24–26]. The data collector service [25] used a BeagleBone Black computer and sensors. The embedded Linux controlled sensor network [24] used
Arduino boards and sensors for the sensor nodes and an Intel Galileo Computer for the master node. Communication between sensor nodes and master nodes was handled with ZigBee expansion boards. The third study [26] used the model to test a low-energy algorithm for sensor data transmission from sensor nodes to master node. Figure 1 shows the modified data gathering model. The present study differs from previous research in that we used mobile phones for data gathering, which caused changes to the data gathering model. Another difference from the previous model [24] is that the sensor nodes and master nodes are combined into one entity. This was due to the use of mobile phones as sensor devices. The mobile phone includes the necessary sensors, data storage, and communication channels for this prototype system. In addition, the mobile phones use the Android operating system (OS), which has enough capabilities to gather and store data. The communication protocols are also supported by the OS. We developed the testing software during this research. This software, called the ShockApplication, and its properties are described later in Sect. 5.1. The usage of mobile phones enabled the crowdsourcing idea. The developed ShockApplication can be installed on all modern Android phones. The user has an identification mark which helps to order the data points in the cloud. The data are stored in a cloud service.
Fig. 1 The modified data gathering model
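The combined sensor and master node described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `CombinedNode`, the `Measurement` record, and the `fake_upload` stand-in for the cloud service are hypothetical names, not part of the ShockApplication.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Measurement:
    user_id: str      # identification mark of the user
    timestamp: float  # seconds since some reference time
    values: dict      # raw sensor readings, stored without processing


@dataclass
class CombinedNode:
    """Sensor node and master node merged into one device (hypothetical sketch)."""
    user_id: str
    upload: Callable[[List[Measurement]], bool]  # returns True on success
    buffer: List[Measurement] = field(default_factory=list)

    def sense(self, timestamp: float, values: dict) -> None:
        # Sensor-node role: take a reading.
        # Master-node role: save it locally without significant processing.
        self.buffer.append(Measurement(self.user_id, timestamp, dict(values)))

    def flush(self) -> int:
        # Master-node role: forward the stored data to the cloud service.
        # The buffer is kept if the upload fails, so nothing is lost.
        if not self.buffer:
            return 0
        batch = list(self.buffer)
        if self.upload(batch):
            self.buffer.clear()
            return len(batch)
        return 0


cloud_store: List[Measurement] = []  # stand-in for the cloud service


def fake_upload(batch: List[Measurement]) -> bool:
    cloud_store.extend(batch)
    return True


node = CombinedNode(user_id="user", upload=fake_upload)
node.sense(0.0, {"x": 0.1, "y": 0.0, "z": 9.8})
node.sense(1.0, {"x": 0.2, "y": 0.1, "z": 9.7})
sent = node.flush()
```

Keeping the buffer until the upload succeeds is a design choice of this sketch; the source does not state how the prototype handles failed transmissions.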
3.2 Data Processing: Manageable Data Sources For the data processing part, the Faucet-Sink-Drain model introduced in [10] is applied to the system architecture. The ultimate goal of the model is to enable realization of a framework that is able to manage data and data sources in a controlled and systematic way [10]. In this study, the model was applied to the prototype system, but the implementation of the framework was not carried out. This prototype is
the first instance of the model in a real-world use case and will help in the further evaluation and development of the model. The model considers that data processing can be modeled with a water piping apparatus consisting of five components: faucets, streams, sink, sieves, and drains [10]. The data flow through the model as many times as is deemed necessary to achieve the desired information. At each new cycle, a new set of faucets, sieves, and drains is created, which generates new streams to be stored in the sink [10]. The components of the Faucet-Sink-Drain model are shown in Fig. 2. The faucet is the source of the data (e.g., original source or processed source). The running water (i.e., strings of numbers and characters) represents the data streams, and the sink is used for storing the data. The sieve is a filter component with the capability of selecting and processing any chunk of any given data stream. The drain is a piping system to transfer data to other locations. The drain may also be utilized for removal of excess data [10]. The Faucet-Sink-Drain model, by design, does not specify how the data are gathered into it. As shown in Fig. 2, the initial data simply appear in the model by means of the attached faucet (or faucets). The gap can be filled by utilizing models that are stronger in this respect, such as the data gathering model described in Sect. 3.1.
Fig. 2 Abstract data processing model [10]
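One cycle of the Faucet-Sink-Drain model can be illustrated with a minimal Python sketch. The framework itself was not implemented in the study, so everything here (the `faucet` generator, the `Sink` class, its method names, and the sample records) is a hypothetical rendering of the five components, not the authors' design.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, float]


def faucet(source: Iterable[Record]) -> Iterator[Record]:
    """Faucet: emits a data stream from an original or a processed source."""
    for record in source:
        yield record


class Sink:
    """Sink: stores named streams; sieves and drains are attached to it."""

    def __init__(self) -> None:
        self.streams: Dict[str, List[Record]] = {}

    def store(self, name: str, stream: Iterable[Record]) -> None:
        self.streams[name] = list(stream)

    def sieve(self, name: str, keep: Callable[[Record], bool]) -> Iterator[Record]:
        # Sieve: selects the chunk of a stored stream that passes the filter.
        return (r for r in self.streams[name] if keep(r))

    def drain(self, name: str) -> List[Record]:
        # Drain: moves a stream out of the sink (transfer or removal).
        return self.streams.pop(name)


raw = [
    {"speed": 0.5, "shock": 0.1},
    {"speed": 12.0, "shock": 2.4},
    {"speed": 20.0, "shock": 0.3},
]

sink = Sink()
sink.store("raw", faucet(raw))                       # first cycle: original source
moving = sink.sieve("raw", lambda r: r["speed"] >= 1.0)
sink.store("moving", faucet(moving))                 # second cycle: processed source
removed = sink.drain("raw")                          # excess data leave the sink
```

The sieved stream is fed back through a new faucet, which mirrors the statement that each cycle creates new faucets, sieves, and drains whose streams are stored in the sink again.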
4 Integration of the Models in the Prototype System The models used lay out the basis for measurement and data analysis. By following them, it is then possible to implement the artifacts of the prototype system. The implemented prototype system has five identifiable high level tasks:
1. Acquisition: The data are gathered by a mobile device, which acts as a combined sensor-master node as it is capable enough for both of those tasks.
2. Storage: The cloud service receives and parses the data (communicated by the master node). Parsing of the data is the first task to be done on the system before the received data can be fully utilized. After parsing is finished, the service can then proceed by storing and/or by further processing the data.
3. Identification and Filtering: The data will be identified and filtered when the service receives an HTTP GET query on its REST (Representational State Transfer) interface. The selection is based on the rules that are passed in the request as parameters.
4. Processing: The selected data are processed further by the rules given out by the program.
5. Visualization: The data provided by the service are finally visualized in a client’s user interface, e.g., web browser.
The data gathering is performed by a mobile phone by utilizing several of its available sensors. Secondly, the collected data are communicated to the cloud service where storage, selection, and further processing of the data are implemented. Once the data have been processed the last time, they are ready to be presented to the user, for example, to be visualized in a web browser or provided to another service through a machine-to-machine (M2M) interface. Figure 3 shows the deployment diagram of the implemented system. It also depicts where the aforementioned tasks are carried out. These tasks can also be identified from the incorporated models, the Data Gathering model and the Faucet-Sink-Drain model.
The first task, data acquisition, corresponds to the whole data gathering model and also to the combination of the (leftmost) faucet and stream icons in Fig. 2. The storage task matches the sink icon in Fig. 2. The (rightmost) sieve in Fig. 2 represents the third task, identification and filtering, whereas the combination of the (rightmost) drain and faucet represents the processing task. The final step, visualization, is said to be handled by the sink as it is “used to store and display data” [10]. However, the visualization step could begin as early as when a data stream has emerged from a faucet and could last until the moment the data have finally been drained out from the sink.
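The identification and filtering task, where the selection rules arrive as parameters of an HTTP GET query, could look roughly like the following Python sketch. The request path and the parameter names (`user_id`, `min_speed`, `min_level`) are assumptions made for illustration; the source does not document the actual REST interface of the service.

```python
from typing import Dict, List
from urllib.parse import parse_qs, urlsplit


def select_measurements(url: str, stored: List[Dict]) -> List[Dict]:
    """Apply the rules passed as query parameters to the stored measurements."""
    params = parse_qs(urlsplit(url).query)  # maps each name to a list of values
    user = params.get("user_id", [None])[0]
    min_speed = float(params.get("min_speed", ["0"])[0])
    min_level = int(params.get("min_level", ["0"])[0])
    return [
        m for m in stored
        if (user is None or m["user_id"] == user)
        and m["speed"] >= min_speed
        and m["level"] >= min_level
    ]


# Hypothetical stored measurements (speed in m/s, level = shock level).
stored = [
    {"user_id": "user", "speed": 0.4, "level": 3, "lat": 61.48, "lon": 21.80},
    {"user_id": "user", "speed": 15.0, "level": 2, "lat": 61.49, "lon": 21.81},
    {"user_id": "other", "speed": 22.0, "level": 4, "lat": 61.13, "lon": 21.51},
    {"user_id": "user", "speed": 18.0, "level": 0, "lat": 61.50, "lon": 21.82},
]

selected = select_measurements(
    "/measurements?user_id=user&min_speed=1.0&min_level=2", stored
)
```

In this example only the second record is selected; the returned list is what the processing and visualization tasks would receive next.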
Fig. 3 System deployment diagram
Fig. 4 High-level diagram of the test setup
5 Testing The high-level description of our testing setup is illustrated in Fig. 4. The purpose was to gather data from mobile devices, primarily smartphones, that could be used to
detect the surface condition of the road being driven on. These data could be further refined into more specific data, such as reports of bumps on the road, uneven road surfaces, roadworks, and so on. The traffic signs visualize the possible roadside conditions that users might be interested in. The data are sent to a central service and can be later browsed using a user interface running in a web browser.
In our case, the users travelled by car. In principle, other road users such as cyclists or motorcyclists could be included, but in the scope of this study, only passenger car users were considered.
5.1 Setup Existing studies often assume that the device is firmly attached in a specific place inside the vehicle, and in a specific way, but for crowdsourcing purposes this is not a feasible scenario. It should be possible to attach the device in a way that is the most convenient for the user, and in an optimal scenario the device could also be kept, for example, inside the pockets of the user. In our benchmarks, the device holder was not limited although we presumed that the devices were placed in a fairly stable location, and did not move about the vehicle in an unpredictable fashion (e.g., sliding along the dashboard). In addition to the attachment of the device, several other factors (e.g., suspension, tires, vehicle load, and weight) may affect the sensor reading. It can be challenging to implement measurement of these factors in crowdsourcing scenarios. Due to these limitations, we decided to focus on sensors available in commonly used mobile devices. The testing software itself was a simple Android application, usable on any reasonably recent Android phone. Most of the newer smartphones generally contain all the necessary sensors required in our use case. The application consists of a single main view, shown on the left side of Fig. 5. In our case, the user only needs to input his/her credentials (in the example, “user”) and use the start and stop buttons to control when the sensors are active. 
The user interface also contains a few convenience functions: the possibility to attempt manual transmission of all collected data; a count, which shows the total number of measurements (a single measurement contains all sensor data collected at a particular point in time, in the example pictures taken from an Android emulator the value is shown simply as “0”); the option to create all measurements as “public”, which means that any logged-in user can see the travelled route and the collected measurements; the option to save the updated settings, mainly authentication details; and two debug options that the users do not generally need to use. The software will automatically select between the linear accelerometer (which is used, if available) and the basic accelerometer. If the device is set on a stable surface the linear accelerometer should show zero for all axes and the accelerometer should show gravity, but in practice the devices showed slight variances from the expected values. The “show systematic error” option can be used to show the currently measured values and to select whether the systematic error should be removed from the values before sending the results to the service. The “print log” can be used to show a debug log of the events (such as errors) detected since application startup. It would have also been a minor matter to simply hide the debug options from the user interface, but as the primary purpose of the application was to collect data and this version of the application would not be made available for
public download and installation (e.g., in an application store), there was no specific need to polish the user interface. Thus, the users were simply instructed to input their credentials and use the start and stop buttons, and to ignore the other options. The sensor measurements are collected by an Android foreground service, which runs as a background process. After the service has been started, the main application can be freely closed and the statistics of the collected data (number of measurements) can be seen in the Android’s pull-down menu, which is visible on the right side of Fig. 5. In the trial, the users kept the sensors on while driving (i.e., when “participating” in the trial) and off at other times. In addition to changing the user credentials, no further configuration was required by the users. The application was used to measure accelerometer data (X, Y, and Z acceleration), direction, speed, location (GPS coordinates), and timestamps. The collected information was automatically sent to the service at pre-defined intervals (every 30 min) by the background process. In addition, gyroscope and rotation data were stored on-device in an SQLite database for possible future debugging or testing purposes (e.g., for detecting braking or acceleration events, or the orientation of the device in general), but these data were not synchronized with the service.
Fig. 5 The Android test client
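The removal of the systematic error mentioned in the description of the user interface can be sketched as follows. The source does not describe how the application estimates the error, so this example assumes the simplest approach: the bias is the mean reading of a linear accelerometer while the device rests on a stable surface (where the expected value is zero on all axes), and it is subtracted from later samples. The function names and sample values are hypothetical.

```python
from statistics import fmean
from typing import Iterable, Tuple

Sample = Tuple[float, float, float]  # (x, y, z) acceleration in m/s^2


def estimate_bias(resting_samples: Iterable[Sample]) -> Sample:
    """Mean reading per axis while the device lies still on a stable surface."""
    xs, ys, zs = zip(*resting_samples)
    return (fmean(xs), fmean(ys), fmean(zs))


def remove_bias(sample: Sample, bias: Sample) -> Sample:
    """Subtract the estimated systematic error from one measurement."""
    return tuple(v - b for v, b in zip(sample, bias))


# Readings taken at rest deviate slightly from the expected zero.
resting = [(0.02, -0.01, 0.05), (0.04, -0.03, 0.03)]
bias = estimate_bias(resting)

# A later measurement, corrected before it would be sent to the service.
corrected = remove_bias((0.53, 0.18, -0.06), bias)
```

For the basic accelerometer the same idea applies, except that the expected resting value includes gravity rather than zero, which this sketch does not model.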
For practical reasons (e.g., limitations in the available server capacity), the user trial was not open to an unlimited number of users. A total of ten users participated in the trial, of which half were university personnel and the other half volunteers from the staff of the City of Pori and from a company participating in our research project. The users either used their own smartphones or borrowed one from the university. The users’ choice of car was not limited, but as the users generally drove their own cars, the cars driven turned out to be smaller personal cars. A couple of users reported driving two different cars, so the number of cars was slightly higher than the number of users. The routes driven were a mixed set of commuting, work-related trips, and leisure. According to the users, the majority of the driving consisted of trips between home and work. This can also be seen in the collected data, as the same (identical) routes were driven on a daily basis. Most of the driving was concentrated around the cities of Pori and Rauma, located on the west coast of Finland. Additional driving was done around the city of Tampere, which is located further inland, including the highway connecting Pori to Tampere. The distances were approximately 110 km between Pori and Tampere and 50 km between Pori and Rauma. Pori and Rauma are slightly smaller cities (with populations of about 85,000 and 40,000, respectively) whereas Tampere is the third largest city in Finland (with a population of about 232,000), although in the case of Tampere the routes driven were located mostly outside the city center. The routes are also illustrated in Fig. 6 (Sect. 5.3). The total duration of the testing period was about three months (from March 2018 to June 2018).
Fig. 6 Visualization of routes driven
P. Sillberg et al.
5.2 Results

The number of data points can be seen in Table 1, where the count and percentage figures of the data are grouped by different shock levels. The shock levels are arbitrary levels used for breaking down the data from the accelerometer readings. The first row (LN/A) indicates the data points where the test device did not calculate the shock level. The highest level (L4) represents the most intense values reported by the accelerometer. The levels can be recalculated afterwards for each device if needed. The shock levels are further discussed in Sect. 5.3.

The data point count on the left side of Table 1 includes all data regardless of the speed, and the right side omits speeds below 1 m/s. We have arbitrarily chosen 1 m/s to be the lowest speed recorded and taken into account in our test. This prevents the device from collecting data when the vehicle ought to be stationary, and helps to reduce the amount of unnecessary data. In the further analysis of the data, only the pre-calculated shock level data where the speed is at least 1 m/s are included (nLEVEL = 145,106). This represents approximately 30 percent of the total data collected. No further data have been eliminated from this data set. The relative percentage figures for each level in nLEVEL are L0 = 67.8, L1 = 29.0, L2 = 2.35, L3 = 0.62, and L4 = 0.25.

Tables 2 and 3 illustrate how the speed affects the measured shock intensity in the collected data. Rows 1–5 display the data of each individual level, while the last row (L0–4) indicates the summarized information including each level. Table 2 indicates the average speed (vAVG) and the standard deviation (vSTD) in each group. The average speed is quite similar on each level, while the standard deviation is only slightly lower on levels L0 and L1 than on the others. Additionally, the average speed and standard deviation of all data points (i.e., data with and without shock levels) were 68.0 and 23.4 km/h, respectively.
The respective values for data points without a shock level were 69.2 and 21.6 km/h. The average speed and standard deviation information alone seem to suggest that the reported shock levels occur around a speed of 65 km/h.

Table 1 Breakdown of shock data points

| Shock level | n (v ≥ 0 m/s) | % (v ≥ 0 m/s) | n (v ≥ 1 m/s) | % (v ≥ 1 m/s) |
|---|---|---|---|---|
| LN/A | 334,730 | 69.3 | 312,334 | 68.3 |
| L0 | 98,367 | 20.4 | 98,320 | 21.5 |
| L1 | 45,083 | 9.34 | 42,101 | 9.20 |
| L2 | 3,419 | 0.71 | 3,413 | 0.75 |
| L3 | 904 | 0.19 | 904 | 0.20 |
| L4 | 368 | 0.08 | 368 | 0.08 |
| Total count | 482,871 | 100 | 457,440 | 100 |
| Total count with level | 148,141 | 30.7 | 145,106 | 31.7 |

Table 2 Average speed per shock level

| Shock level | vAVG (km/h) | vSTD (km/h) |
|---|---|---|
| L0 | 63.7 | 27.2 |
| L1 | 70.5 | 24.4 |
| L2 | 64.3 | 30.2 |
| L3 | 59.6 | 32.4 |
| L4 | 55.0 | 32.3 |
| L0–4 | 65.6 | 26.8 |

Table 3 Distribution of data points per shock level: data point distribution based on speed (%), right-open intervals (km/h)

| Shock level | [3.6, 20) | [20, 40) | [40, 60) | [60, 80) | [80, 100) | [100, 120) |
|---|---|---|---|---|---|---|
| L0 | 76.9 | 70.8 | 78.2 | 68.2 | 60.8 | 63.0 |
| L1 | 17.8 | 25.5 | 19.1 | 29.5 | 35.9 | 32.9 |
| L2 | 3.50 | 2.51 | 1.90 | 1.74 | 2.53 | 2.87 |
| L3 | 1.28 | 0.76 | 0.53 | 0.43 | 0.54 | 0.91 |
| L4 | 0.56 | 0.40 | 0.25 | 0.14 | 0.20 | 0.29 |
| L0–4 | 7.83 | 13.6 | 14.8 | 22.1 | 36.7 | 4.99 |

However, when the data are further divided into speed-based intervals, the average speeds can be seen to be slightly higher, and about two-fifths of the data points are located above the 80 km/h limit. Based on the data, it can be observed that algorithms used for detecting vibrations and road condition anomalies should cover at least the common urban area speed limits (from 40 to 60 km/h) and preferably up to highway speeds (from 80 to 100 km/h). In the area around the city of Pori, lower speeds were less represented than higher speeds. Thus, algorithms developed only for slower speeds would not be feasible for practical implementations.

Table 3 displays the distribution of data points belonging to a given speed interval. There are six right-open intervals starting from 3.6 km/h (i.e., 1 m/s), and ending at 120 km/h. The last row (L0–4) indicates the percentage share of data in each speed interval of all data points. The bulk of the data belongs to the lowest level. The lowest level (L0) appears to be over-represented in the lowest three speed intervals (3.6–60 km/h), whereas a small amount of the percentage share seems to have shifted from the lowest level (L0) to the next level (L1) in the last two speed intervals (80–120 km/h). It seems logical that higher speeds (i.e., greater energy) create more variance in the vibration detected by the sensor, but on the other hand, levels L2, L3, and L4 appear slightly less often at higher speeds. It can only be speculated whether the reason is, for example, the better overall condition of roads with higher
speed limits, or the fact that the phone/sensor is simply not able to record everything because it is not necessarily mounted in the car securely. Speeds above 120 km/h account for a negligible number of data points (38 in total), so they are not shown in Table 3. Almost three-fifths (58.8%) of the data points are distributed between 60 and 100 km/h. This phenomenon can be explained by two facts. First, the data collection was conducted mostly on longer distance journeys on the highways between major cities, corresponding to higher speed limits and a longer time spent on the road. Second, heavy traffic in the tested area is not commonly observed. More detailed information may be retrievable if the data are observed on the user/device level rather than on the global level. In the future, it might also be worthwhile recalculating the data in four levels instead of five to obtain a clearer distinction between “good road condition” data and “bad road condition” data. Currently, levels L0 and L1 seem to overlap, and contain both data types.
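The percentages quoted in this section can be re-derived from the counts in Table 1 and the bottom row of Table 3. The following short check is only an illustrative sketch; the variable names are hypothetical and the numbers are copied from the tables, so the results should agree with the text to within rounding.

```python
# Counts per shock level for v >= 1 m/s, copied from Table 1.
counts = {"L0": 98_320, "L1": 42_101, "L2": 3_413, "L3": 904, "L4": 368}
n_level = sum(counts.values())   # data points that have a shock level
total_all = 482_871              # all data points, regardless of speed

# Relative share of each level within n_level, in percent.
shares = {level: 100 * n / n_level for level, n in counts.items()}
share_of_total = 100 * n_level / total_all

# Share of data per speed interval (bottom row of Table 3), in percent.
interval_share = {
    "[3.6, 20)": 7.83, "[20, 40)": 13.6, "[40, 60)": 14.8,
    "[60, 80)": 22.1, "[80, 100)": 36.7, "[100, 120)": 4.99,
}
between_60_and_100 = interval_share["[60, 80)"] + interval_share["[80, 100)"]
above_80 = interval_share["[80, 100)"] + interval_share["[100, 120)"]

for level, pct in shares.items():
    print(f"{level}: {pct:.2f} %")
print(f"levelled data as share of all data: {share_of_total:.1f} %")
print(f"60-100 km/h: {between_60_and_100:.1f} %, above 80 km/h: {above_80:.1f} %")
```

The sum of the two middle-to-high intervals reproduces the "almost three-fifths" figure, and the two highest intervals together give the "about two-fifths" share mentioned earlier.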
5.3 Visualization

Five levels (0–4) were used for describing the detected condition of the road. The number of levels has no specific meaning, and a different number of levels could be chosen for coarser or more fine-grained results. The levels are dynamically calculated per device, with level L0 being the “normal” of the device and L4 being the most extreme. In the current version of our application, the calculations do not take speed into consideration, even though speed does have an effect on the intensity of the measured values (e.g., variance). An exception to this is the exclusion of very low speed values (e.g., < 1 m/s), which could be caused by the user temporarily leaving the vehicle to walk about, or could be erroneous values caused by GPS inaccuracies when the vehicle is not in fact moving. In any case, even without utilizing the velocity data, the measured levels seem to correspond fairly accurately to the overall road conditions. Still, improved analysis of speed data could perhaps be used to further increase the accuracy of the level calculations.

In our case, the levels can be calculated either from the long-term data collected on the device (or from the data stored for testing purposes on the server), or by using a smaller data set, such as the data collected within the last 30 min. Ultimately, we decided to use smaller data sets when calculating the levels and showing the visualization on the map. The primary purpose of this was to minimize the effects caused by the user’s change of vehicle as well as the cases where the user kept their device in a different holder or location on different trips. The test users also reported a few times when they had accidentally dropped the device, or the device had come loose from its holder. The former cases were fairly easy to recognize based on the reported, much higher than normal, acceleration values, but the latter cases tend to be erroneously detected as road condition problems.
In any case, the calculated levels should be fairly comparable regardless of the devices used, even when the individual values reported by the accelerometers are not. Unfortunately, the (rare) cases in which a user frequently changes vehicles remain a problem for detection. This problem would also be present if data were to be collected from, for example, public transportation utilizing the user’s mobile devices.

The level markers and their use are illustrated in Figs. 6, 7 and 8. Figure 6 shows a map using OpenStreetMap, whereas Figs. 7 and 8 use Google Maps. The OpenStreetMap implementation is slightly newer, but the features of both implementations are basically the same. One exception is the Street View functionality shown in Fig. 8, which is available only when using Google Maps. Both implementations also utilize the same underlying Representational State Transfer (REST) Application Programming Interfaces (APIs) provided by the cloud service.

Fig. 7 Visualization of the route between the cities of Pori and Tampere

Fig. 8 Visualization in Google Maps Street View

The routes driven by the users are visualized in Fig. 6. The shock levels are illustrated by five colors (green, yellow, orange, red, and black, with green being the best road condition and black the worst). The areas on the map are: the cities of Pori (top left), Rauma (bottom left), and Tampere (right). The various markers are also of slightly different sizes, with the green “good condition” markers being the smallest and the black “bad condition” markers being the largest. This is in order to make the “bad condition” markers easier to spot among the data, which largely consist of green markers.

The user interface contains basic features for filtering data: viewing data from only a single user; excluding undesired shock levels; calculating highlights; selecting a specific date or time to observe; selecting the area to view; and the possibility to limit the number of level markers by only returning an average or median of the reported values within a certain area. The exclusion of undesired shock levels and the calculation of highlights are illustrated in Fig. 7. The upper part of the figure shows basically the “raw data” selected from an area, in this case from a route between the cities of Pori and Tampere. In the lower part, the individual markers are removed and only the calculated highlights (exclamation marks) can be seen. The highlights represent an area where the measurements contain a large number of certain types of shock levels. The highlights can be calculated for any level, but naturally, are more useful for spotting places where there is a high concentration of “bad condition” markers. It would also be possible to show any combination of level markers with the highlights, e.g., red or black markers without green, yellow, and orange markers. Finally, Fig. 8 shows the shock level markers in the Street View application.
The Street View photos are not always up-to-date so the feature cannot be used as such to validate the results, but it can be used to give a quick look at an area. In this case, the cause of several orange, red, and black—“bad condition”—markers can be seen to be the bumps located on the entrance and exit sections of a bridge located on the highway.
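The per-device level calculation described at the start of this section can be illustrated with a small sketch. The text does not specify how the levels are derived, so everything below is an assumption made for illustration only: the function name `shock_levels`, the use of the mean and standard deviation of a recent window as the device's "normal", and the threshold multipliers are all hypothetical. Only the 1 m/s speed cut-off and the five levels come from the text.

```python
from statistics import mean, pstdev

MIN_SPEED_MS = 1.0  # from the text: samples below 1 m/s are excluded


def shock_levels(samples, multipliers=(1.0, 2.0, 3.0, 4.0)):
    """Assign a level 0-4 (or None) to each (speed_ms, acceleration) sample.

    `samples` is one device's recent window (e.g., the last 30 min).
    The `multipliers` are hypothetical: a sample's level is the number of
    thresholds (mean + m * std of the window) that its magnitude exceeds.
    """
    moving = [abs(a) for v, a in samples if v >= MIN_SPEED_MS]
    if len(moving) < 2:
        return [None] * len(samples)
    baseline, spread = mean(moving), pstdev(moving)

    levels = []
    for speed, accel in samples:
        if speed < MIN_SPEED_MS:
            levels.append(None)  # corresponds to the "no level" row, LN/A
            continue
        deviation = abs(accel) - baseline
        level = sum(1 for m in multipliers if spread > 0 and deviation > m * spread)
        levels.append(level)
    return levels


# Fifty smooth samples, one strong jolt, and one sample taken while stationary.
window = [(10.0, 0.1)] * 50 + [(10.0, 5.0)] + [(0.5, 9.0)]
levels = shock_levels(window)
print(levels[0], levels[50], levels[51])
```

Because the baseline is recomputed from each window, a change of vehicle or phone holder shifts the thresholds along with the data, which is the motivation the text gives for using smaller data sets.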
6 Discussion

The basic programming task of creating a simple application for tracking the user’s location and gathering data from the basic sensors embedded in a mobile device is, in general, a straightforward process. Nevertheless, a practical implementation can pose both expected and unexpected challenges.
6.1 Technical Difficulties

We chose to use the Android platform because the authors had previous experience in Android programming. Unfortunately, Android devices have hardware differences, which can affect the functionality of the application. In our case, there were two major issues. First, one of the older devices we used in our benchmarks lacked support for a linear acceleration sensor, despite including a basic accelerometer. In practice, this means that all measured acceleration values included a gravity component without an easy or automated means of filtering the output. Filtering can be especially difficult on older models that do not contain proper rotation sensors that could be used to detect the orientation of the device. Second, as it turned out, devices from different manufacturers and even different device models from the same manufacturer had variations in the reported accelerometer values, making direct comparison of values between devices challenging at best. Larger bumps are visible from the results regardless of the device, but smaller road surface features can become lost due to the device inaccuracies.

In practice, differences in the devices required the calculation of a “normal” for each device, against which variations in the data would be compared. Calculating a universal normal usable for all devices and users would probably be very difficult, if not entirely impossible. In any case, in laboratory conditions or in a controlled environment finding this normal is not a huge problem, but where a large crowdsourcing user and device base is concerned, finding the normal for each device can be a challenge. Additionally, the vehicle the user is driving can have a major impact on the detected values; after all, car manufacturers generally prefer to provide a smooth ride for the driver, and on the other hand, a car with poor suspension or tires can cause data variations that can be difficult to filter out.
This also means that, if the user drives multiple vehicles, there should be a way for the application to either detect the vehicle used or adapt to the altered conditions. In principle, the collected data could be analyzed to determine the device’s normal, for example, if known “good condition” roads have been driven on. In practice, the data amounts (and the required server and network capacity) can be too large for this approach to be feasible. A better option would be to analyze the data on-device and have the devices send only the variances that exceed the calculated threshold values (i.e., detected potholes, roads of poor quality).
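For devices that report only raw accelerometer values, one generic way to approximate the gravity component is an exponential low-pass filter, with the remainder treated as linear acceleration. The sketch below illustrates that general technique; it is not something the authors report using, the function name `remove_gravity` and the smoothing factor are assumptions, and whether the approximation is accurate enough for road condition detection is left open.

```python
def remove_gravity(readings, alpha=0.8):
    """Estimate linear acceleration from raw (x, y, z) accelerometer readings.

    A low-pass filter tracks the slowly changing gravity vector, and the
    difference between the raw reading and that estimate is returned.
    `alpha` is a hypothetical smoothing factor (closer to 1 = slower tracking).
    """
    if not readings:
        return []
    gravity = list(readings[0])  # start from the first reading to avoid a long warm-up
    linear = []
    for sample in readings:
        gravity = [alpha * g + (1 - alpha) * s for g, s in zip(gravity, sample)]
        linear.append(tuple(s - g for s, g in zip(sample, gravity)))
    return linear


# A phone lying flat (gravity on the z axis) that then registers one jolt.
raw = [(0.0, 0.0, 9.81)] * 20 + [(0.0, 0.0, 14.81)]
filtered = remove_gravity(raw)
print(filtered[0], filtered[-1])
```

In this example the constant readings filter to approximately zero, while the 5 m/s² jolt is attenuated to about 4 m/s² because part of it leaks into the gravity estimate. This is one of the trade-offs of filtering without orientation sensors.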
6.2 Interpretation of the Data

When the collected data set is examined, data variance is visible in the known and expected places. These include, among others, known roadworks, speed bumps, and bridge ramps; in other words, spots that the drivers cannot avoid are easy to see in the collected data. Unfortunately, the same cannot be said about potholes or other larger but generally more infrequent road condition issues, which are not
always detected. We did not perform extensive studies to discover the driving habits of the users participating in our trial, although a quick interview revealed (perhaps unsurprisingly) that the drivers had tried to avoid driving into potholes.

In the initial phase of data analysis, validating the findings proved troublesome. As the drivers could drive along any road they wished, we did not have a clear idea of which of the roads driven were in bad shape or where on the road the bumps were located, nor was there any conclusive database available of speed bumps or other purpose-built road features that could be accidentally identified as road surface problems. Driving to the location of each detected bump for validation purposes in the case of a larger data set would be quite impractical. To get a basic idea of where the “bumpy” roads were located, the preliminary results were shared with the department of the City of Pori responsible for road maintenance and compared with their data. The data collected by the city are based on complaints received from road users or reported by the city maintenance personnel driving on the city roads. Thus, maintaining the data requires a lot of manual labor and the data are not always up-to-date. Nevertheless, this did give us some insight into the known conditions of the roads around the city. Furthermore, the discussion with the maintenance department gave a clear indication that an automated method for the collection of road condition data around the city would be a great help for the people responsible for road maintenance. Moreover, collecting a sufficiently large data set with a very large user base could ultimately help in finding individual road problems as drivers would, for example, accidentally drive into potholes, but in our trials identifying specific road problems turned out to be quite challenging.
On the other hand, the results showed, in a more general fashion, which of the driven roads were in the worst condition, and furthermore, which parts of a single road were in worse condition than the road on average. Both findings can be used for assessing road conditions, and with a much larger data set, even individual bumps could perhaps be more reliably detected. A larger database is also advantageous in the elimination of unwanted data caused by individual random events—such as the user moving or tapping the phone during driving, sudden braking events or accidents—which could be erroneously detected as road condition problems. On the other hand, larger sets increase computing resource requirements and challenges in managing the data. In fact, even the amount of data collected in our user trials can be problematic. One of the main challenges is the visualization of large data sets. For testing and validation purposes, all data generated by the mobile devices were stored on our server. Storing the “good condition” data can also help to map the roads the users have driven on as opposed to only reporting detected variations from the normal. Unfortunately, serializing the data—using JavaScript Object Notation (JSON) or Extensible Markup Language (XML)—and showing the measurements on a map in a web browser may be quite resource-intensive. Even when measurements are combined and indexed on the server to reduce the amount of transferred data, there can still be thousands of markers to be drawn on the map, especially if “good condition” data are included. Showing multiple roads in a large area simultaneously on a map can be a good method from a visualization point of view, but it can also make the web user interface sluggish or slow to load. For reference, loading and
showing the map visible in Fig. 6 consisting of 100,000 measurement markers takes approximately 3–4 min, which is not an entirely impractical length of time for constructing the visualization, but can be an annoying delay when performing repeated work on the data set. Constructing visualizations with smaller data sets (e.g., less than 10,000 data points), depending on the chosen filter settings, takes anything from a couple of seconds to almost half a minute.
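Reducing the number of markers by returning a median per area, and flagging areas with many "bad condition" markers as highlights, can be sketched as a simple grid aggregation. This is an illustrative assumption about how such a reduction could work, not the service's actual implementation; the function name `aggregate`, the cell size, and the highlight thresholds are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median


def aggregate(markers, cell_deg=0.001, bad_level=3, min_bad=2):
    """Group (lat, lon, level) markers into grid cells.

    Returns one summary per cell: the number of markers, the median level,
    and a highlight flag set when at least `min_bad` markers in the cell
    have a level of `bad_level` or higher. All parameters are hypothetical.
    """
    cells = defaultdict(list)
    for lat, lon, level in markers:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append(level)

    summary = {}
    for key, levels in cells.items():
        bad = sum(1 for level in levels if level >= bad_level)
        summary[key] = {
            "count": len(levels),
            "median_level": median(levels),
            "highlight": bad >= min_bad,
        }
    return summary


markers = [
    (61.4851, 21.7971, 0), (61.4852, 21.7972, 0), (61.4853, 21.7973, 1),
    (61.4951, 21.8071, 3), (61.4952, 21.8072, 4), (61.4953, 21.8073, 0),
]
summary = aggregate(markers)
for cell, info in summary.items():
    print(cell, info)
```

Here six markers collapse into two cells, so only two markers (one of them a highlight) would need to be serialized and drawn. A coarser `cell_deg` trades spatial detail for fewer markers.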
6.3 Future Studies

One possible future action could be to open up the collected data for further analysis by other researchers. In general, the data are relatively easy to anonymize and do not contain any hard-coded user details. A method of generating anonymous data is also an advantage if a larger, more public user trial is to be performed in the future.

Running the trials with a larger user base would be one possible course of future action, although acquiring sufficient server resources for a wide-scale user trial could pose a challenge. A less resource-intensive option could be to collect data for a longer period on a specific set of roads with the goal of discovering whether a gradual worsening of road conditions can be detected or how the results differ between winter and summer. Our current trials were run in spring and summer, and it is unknown how winter conditions would affect the results. Furthermore, the roads driven on were primarily paved, and gravel roads were not included in the analysis of the data.

In addition, the increase in the number of dashboard cameras installed in vehicles, and the decrease in the prices of 360-degree cameras could provide an interesting aspect for data collection. The utilization of cameras could also make data validation easier during the trial phase, as there would be no need to go and check the detected road condition problems locally, or to use Google Street View or similar applications that may contain outdated images.

The Faucet-Sink-Drain model was used for the first time in an actual use case, and it could prove useful in other applications as well. However, the model requires more research and development to fully unlock its potential. Also, the framework [10] that is based on the model would require an actual implementation before more conclusions can be drawn about the model’s usefulness. Data security is an important factor that has not been addressed in this study.
The prototype has basic user identification with username and password, but this was not used for filtering input data. Issues of data security, privacy, and anonymization of data need to be solved before commercialization.
7 Summary

This paper introduced a study that utilized data collected by sensors (primarily an accelerometer and GPS) embedded in smartphones for detecting the condition of road surfaces. The data were obtained from a group of users driving on paved roads in western Finland. Furthermore, the test setup was described including a discussion on the challenges faced. This paper showed how to combine a data gathering model and a data analysis model. Both of the models were applied and tested in the developed prototype system. The results achieved from the trial period showed that even though the chosen methods could, in principle, find individual road surface problems (such as potholes), the results were more useful in the assessment of the overall condition of the road. In addition, the paper presented methods for visualizing road condition data collected from test users.
References

1. M. Krommyda, E. Sdongos, S. Tamascelli, A. Tsertou, G. Latsa, A. Amditis, Towards citizen-powered cyberworlds for environmental monitoring, in 2018 International Conference on Cyberworlds (CW) (2018), pp. 454–457
2. K.I. Satoto, E.D. Widianto, S. Sumardi, Environmental health monitoring with smartphone application, in 2018 5th International Conference on Information Technology, Computer, and Electrical Engineering (ICITACEE) (2018), pp. 281–286
3. P. Pyykonen, J. Laitinen, J. Viitanen, P. Eloranta, T. Korhonen, IoT for intelligent traffic system, in 2013 IEEE 9th International Conference on Intelligent Computer Communication and Processing (ICCP) (2013), pp. 175–179
4. Yle Uutiset, Lapin Ely lupaa vähemmän lunta ja polanteita – Bussinkuljettajat keräävät tietoa Lapin teiden kunnosta. https://yle.fi/uutiset/3-9277596 (2016). Retrieved 27th June 2018
5. O. Vermesan, P. Friess, P. Guillemin, S. Gusmeroli, H. Sundmaeker, A. Bassi, I. Jubert, M. Mazura, M. Harrison, M. Eisenhauer, P. Doody, Internet of Things Strategic Research Roadmap. http://www.internet-of-things.no/pdf/IoT_Cluster_Strategic_Research_Agenda_2011.pdf (2009). Retrieved 23rd Mar. 2019
6. A. Hać, Wireless Sensor Network Designs (Wiley, Chichester, 2003)
7. J. Grönman, P. Rantanen, M. Saari, P. Sillberg, H. Jaakkola, Lessons learned from developing prototypes for customer complaint validation, in Software Quality Analysis, Monitoring, Improvement, and Applications (SQAMIA), Serbia (August 2018)
8. P. Rantanen, P. Sillberg, J. Soini, Towards the utilization of crowdsourcing in traffic condition reporting, in 2017 40th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Croatia (2017), pp. 985–990
9. M. Ma, P. Wang, C.-H. Chu, Data management for internet of things: challenges, approaches and opportunities, in 2013 IEEE International Conference on Green Computing and Communications and IEEE Internet of Things and IEEE Cyber, Physical and Social Computing (2013), pp. 1144–1151
10. P. Sillberg, Toward manageable data sources. Inf. Modell. Knowl. Bases XXX. Front. Artif. Intell. Appl. 312, 101–111 (2019) (IOS Press)
11. P. Sillberg, J. Grönman, P. Rantanen, M. Saari, M. Kuusisto, Challenges in the interpretation of crowdsourced road condition data, in International Conference on Intelligent Systems (IS) (2018)
12. J. Howe, The rise of crowdsourcing. https://www.wired.com/2006/06/crowds (2006). Retrieved 27th June 2018
13. D.C. Brabham, Crowdsourcing as a model for problem solving. Convergence: Int. J. Res. New Media Technol. 14(1), 75–90 (2008)
14. K. Mao, L. Capra, M. Harman, Y. Jia, A Survey of the Use of Crowdsourcing in Software Engineering. Technical Report RN/15/01, Department of Computer Science, University College London (2015)
15. C.-W. Yi, Y.-T. Chuang, C.-S. Nian, Toward crowdsourcing-based road pavement monitoring by mobile sensing technologies. IEEE Trans. Intell. Transp. Syst. 16(4), 1905–1917 (2015)
16. Y.A. Alqudah, B.H. Sababha, On the analysis of road surface conditions using embedded smartphone sensors, in 2017 8th International Conference on Information and Communication Systems (ICICS), Jordan (2017), pp. 177–181
17. F. Carrera, S. Guerin, J.B. Thorp, By the people, for the people: the crowdsourcing of “STREETBUMP”: an automatic pothole mapping app. Int. Arch. Photogrammetry Remote Sens. Spat. Inf. Sci. (ISPRS), XL-4/W1 (4W1), 19–23 (2013)
18. J. Eriksson, L. Girod, B. Hull, R. Newton, S. Madden, H. Balakrishnan, The pothole patrol, in Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services (MobiSys ’08), Colorado, USA (2008), p. 29
19. K. Chen, M. Lu, G. Tan, J. Wu, CRSM: crowdsourcing based road surface monitoring, in 2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing, China (2013), pp. 2151–2158
20. G. Alessandroni, L. Klopfenstein, S. Delpriori, M. Dromedari, G. Luchetti, B. Paolini, A. Seraghiti, E. Lattanzi, V. Freschi, A. Carini, A. Bogliolo, SmartRoadSense: collaborative road surface condition monitoring, in The Eighth International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies (UBICOMM), Italy (2014)
21. V. Dyo, Middleware design for integration of sensor network and mobile devices, in Proceedings of the 2nd International Doctoral Symposium on Middleware (DSM ’05), New York, USA. ACM Press (2005), pp. 1–5
22. T. Leppanen, M. Perttunen, J. Riekki, P. Kaipio, Sensor network architecture for cooperative traffic applications, in 2010 6th International Conference on Wireless and Mobile Communications (2010), pp. 400–403
23. I.F. Akyildiz, W. Su, Y. Sankarasubramaniam, E. Cayirci, Wireless sensor networks: a survey. Comput. Netw. 38(4), 393–422 (2002)
24. M. Saari, A.M. Baharudin, P. Sillberg, P. Rantanen, J. Soini, Embedded Linux controlled sensor network, in 2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (2016), pp. 1185–1189
25. M. Saari, P. Sillberg, P. Rantanen, J. Soini, H. Fukai, Data collector service: practical approach with embedded Linux, in 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (2015), pp. 1037–1041
26. A.M. Baharudin, M. Saari, P. Sillberg, P. Rantanen, J. Soini, T. Kuroda, Low-energy algorithm for self-controlled wireless sensor nodes, in 2016 International Conference on Wireless Networks and Mobile Communications (WINCOM) (2016), pp. 42–46
A New Network Flow Platform for Building Artificial Neural Networks

Vassil Sgurev, Stanislav Drangajov and Vladimir Jotsov
Abstract This work presents a number of results related to the proposed transition from the currently widespread platform for building multilayer ANNs to a new platform based on generalized network flows with gains and losses on directed graphs. It is shown that network flow ANNs have a more general network structure than multilayer ANNs and, consequently, that all results obtained with multilayer ANNs are also covered by the new network flow platform. A number of advantages of the new platform are pointed out. The generalized network flow with gains and losses is used as its basis, and on this ground a mathematical model of an ANN is proposed. A number of results are obtained for network flow ANNs as corollaries of the Ford-Fulkerson mincut-maxflow theorem, namely the existence of an upper bound vmax on the flow from sources to consumers and of a lower bound cmin on the capacity over all possible cuts, as well as the equality between the maximal flow and the minimal cut. A way of building additional linear constraints between the different signals into the ANN is pointed out. The optimal coefficients are determined through the corresponding optimization procedures of network flow programming, which have polynomial computational complexity. The possibility of effective training and recognition is proven through rigorous procedures for the network flow platform, without the heuristic algorithms for approximate solutions that are characteristic of multilayer ANNs.

Keywords Artificial neural networks · Multilayer neural networks · Network flow neural networks · Mincut-maxflow theorem · Network optimization procedures
V. Sgurev · S. Drangajov
Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria

V. Jotsov (B)
University of Library Studies and Information Technologies, Sofia, Bulgaria

© Springer Nature Switzerland AG 2020
R. Jardim-Goncalves et al. (eds.), Intelligent Systems: Theory, Research and Innovation in Applications, Studies in Computational Intelligence 864, https://doi.org/10.1007/978-3-030-38704-4_6
V. Sgurev et al.
1 Introduction

Artificial Neural Networks (ANNs) are widespread nowadays and find many applications in all spheres of society: scientific, material, economic, educational, spiritual, etc. This is mostly due to the fact that, in some areas, they yield results that would be very hard to achieve by other methods. This scientific field was left in oblivion for a long time, but it has undergone a real renaissance in recent decades. Serious researchers still have no illusions that the difference between natural neural networks and their artificial analogues, as we know them at present, may suddenly disappear; the distance is still huge. It is true, however, that ANNs reach approximate solutions that are close enough to the optimal ones, which is very important in itself and, moreover, in many application areas.

Neural networks have passed through different stages of development, which are inseparably connected with the advances in computers and information technologies. One should start from the model of the artificial neuron, proposed in 1965 by Rosenblatt [1] and further developed by Minsky [2]. The enthusiasm aroused by these achievements was very soon followed by disappointment and declining research activity in this scientific field. Interest in ANNs received a new impulse at the beginning of the 1980s [3], and research and applications have been on the rise ever since. Interest in ANNs has grown even further in the last two decades, after the appearance of the new Deep Learning technology [4, 5]. It is aimed at processing large arrays of data and knowledge and is becoming widely used in medicine and health care, police investigations, the judicial system, large commercial structures, etc.
2 Necessity of Changing the Platform at Building ANN

One and the same theoretical basis has been used from the creation of ANNs until now. A combination of signals of different values is fed to the input of the network, and a combination of signals of desired values is obtained at the output. To attain this effect, the network is trained by a series of experiments. As a result, internal network coefficients are selected that guarantee a good enough mapping between the combination of input signals and the desired combination of output signal values. In this case, the ANN functions as a specific, self-teaching computer that has memory and can solve a wide range of specific problems, in many cases more effectively than general-purpose computers. To realize these features, multilayer ANNs are used, in which the artificial neurons are located in hierarchically arranged layers that regulate the connections between the neurons of one layer and the neurons of another layer, as well as the connections between neurons of the same layer.
A New Network Flow Platform for Building …
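[Figure: (a) artificial neuron with inputs x1, ..., xn, coefficients w1, ..., wn, sum N and output F(N); (b) two-layer network with inputs x1, ..., xm, coefficients k1,1, ..., km,p and outputs y1, ..., yp]

Fig. 1 Artificial neuron and a two-layer ANN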
The scheme of an artificial neuron with activation function F and coefficients (w1, w2, ..., wn) used for teaching is shown in Fig. 1a. A two-layer artificial neural network is illustrated in Fig. 1b. All algorithms known up to now for teaching and pattern recognition have been built on this multilayer network structure. On the other hand, there is no scientifically grounded proof that the natural networks of neurons in the human brain are ordered in layers and function in a way similar to ANNs. On the contrary, it seems for now that the neural network in the brain has a most general structure, with the most diverse connections between neurons. It is known from graph theory that imposing additional constraints, such as layers and the connections between them, results in lower efficiency in optimization, structural and other aspects than in the most general network. There are no fundamental difficulties in carrying out teaching and recognition procedures analogously in networks of general character (with no layers), and even more effectively than in multilayer networks [6]. All this leads to the idea of building ANNs on the basis of flows on graphs (network flows), which are most general structures that do not use “multilayerness”. This was proposed in [7]. In such a formulation, the artificial neurons are considered as vertices (nodes) of the network, and signals are interpreted as network flow functions on the separate arcs, each connecting two vertices of the graph. A survey of the possible classes of network flows shows that the generalized network flow with gains and losses is the most appropriate for artificial neural networks. In it, each arc has a coefficient that increases or diminishes the flow value on that arc. In addition, flow may be inserted into or extracted from each separate vertex through the incoming or outgoing arcs of the network.
A platform for constructing ANNs on the basis of a generalized network flow with gains and losses is proposed in the present work. The aim is for the network flow ANNs obtained in this way to solve problems with algorithmic computational complexity no worse, and if possible better, than that of multilayer ANNs.
V. Sgurev et al.
As the network flow models [8–10] have diverse and well-elaborated optimization procedures of polynomial computational complexity, the proposed platform would allow a wider scope of problems to be solved than is possible nowadays. Passing to network-flow models of ANNs is in essence a transition from discrete (multilayer) to discrete-continuous (network-flow) models. This transition involves difficulties, but they are surmountable. The following sections give a formal description of the generalized network flow with gains and losses, its adaptation to ANNs, and the solution of some specific ANN problems. The results of a numerical example interpreting an ANN through a generalized network flow with gains and losses are also described.
3 Generalized Network Flows with Gains and Losses

A characteristic feature of this class of flows is that the arc flow may have one value at the starting vertex of the arc and a different value at the end vertex of the same arc. It follows that, in the most general case, the amount of flow from all source vertices may not be equal to the amount of flow in the consumer vertices [11, 12]. The formalization of this class of network flows [8–10] is based on the following notation:

$G(X, U)$: directed graph with a set of vertices $X$ and a set of arcs $U$;

$$X = \{x_i / i \in I\}; \tag{1}$$

$I$: set of indices of all vertices from $X$;

$$U = \{x_{ij} / (i, j) \in I\}; \tag{2}$$

$I$: set of the pairs of indices of all arcs of the graph;

$\Gamma_i^{1}$: direct mapping of each vertex $x_i$;

$$\Gamma_i^{1} = \{j / x_{ij} \in U\}; \tag{3}$$

$\Gamma_i^{-1}$: reverse mapping of vertex $x_i$;

$$\Gamma_i^{-1} = \{j / x_{ji} \in U\}; \tag{4}$$

$|X| = n$: number of elements in the set of vertices $X$;
$|U| = m$: number of elements in the set of arcs $U$;
$f_{ij} = f(x_i, x_j)$: arc flow function on arc $x_{ij}$ with initial vertex $x_i$ and end vertex $x_j$;
$f_i^{1} = \sum_{j \in \Gamma_i^{1}} f_{ij}$: the aggregate flow on all arcs outgoing from $x_i$;
$f_i^{-1} = \sum_{j \in \Gamma_i^{-1}} f_{ji}$: the aggregate flow on all arcs incoming into $x_i$;
$S \subseteq X$: set of vertices that are flow sources;
$T \subseteq X;\ T \cap S = \emptyset$: set of vertices that are flow consumers;
$\emptyset$: the empty set;
$v^{s} = \sum_{i \in S} v_i^{s}$: total amount of flow from all sources;
$v^{t} = \sum_{i \in T} v_i^{t}$: total amount of flow into all consumers;
$v_i^{s}$: value of the flow from source $x_i \in S$;
$v_i^{t}$: value of the flow into consumer $x_i \in T$;
$c_{ij}$: upper bound of the flow capacity on arc $x_{ij}$;
$g_{ij} > 0;\ (i, j) \in I$: arc coefficient for arc $x_{ij}$, showing by what value the flow $f_{ij}$ from the initial vertex $x_i$ is changed at the end vertex $x_j$ of the arc, i.e.

$$g_{ij} f_{ij} \begin{cases} > f_{ij}, & \text{if } g_{ij} > 1; \\ = f_{ij}, & \text{if } g_{ij} = 1; \\ < f_{ij}, & \text{if } g_{ij} < 1; \end{cases} \tag{5}$$

$a_{ij}$: arc assessment for arc $x_{ij}$;

$$L = \sum_{(i, j) \in I} a_{ij} f_{ij} \tag{6}$$

is the objective function. It follows from the notation introduced that:

$$f_{ij} = g_{ij} f_i^{1}; \tag{7}$$

$$g_{ij} = \frac{f_{ij}}{f_i^{1}}. \tag{8}$$

On the basis of the notation introduced, the generalized network flow with gains and losses can be described in general through the following relations:

$$\sum_{j \in \Gamma_i^{1}} g_{ij} f_{ij} - \sum_{j \in \Gamma_i^{-1}} g_{ji} f_{ji} = \begin{cases} v_i^{s}, & \text{if } x_i \in S; \\ 0, & \text{if } x_i \notin S \cup T; \\ -v_i^{t}, & \text{if } x_i \in T; \end{cases} \tag{9}$$

$$f_{ij} \le c_{ij};\ (i, j) \in I; \tag{10}$$

$$f_{ij} \ge 0;\ (i, j) \in I. \tag{11}$$
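As a rough illustration of how relations (9) to (11) can be checked for a candidate flow, the following Python sketch evaluates the left-hand side of (9) at every vertex and tests the bounds (10) and (11). The three-vertex network, its numbers and the function names are hypothetical and serve only as an example; relation (9) is applied literally as written above.

```python
from collections import defaultdict


def conservation_lhs(flows, gains):
    """Left-hand side of relation (9) for every vertex touched by an arc.

    flows and gains map an arc (i, j) to f_ij and g_ij respectively.
    """
    lhs = defaultdict(float)
    for (i, j), f in flows.items():
        weighted = gains[(i, j)] * f
        lhs[i] += weighted  # arc x_ij leaves vertex i
        lhs[j] -= weighted  # arc x_ij enters vertex j
    return dict(lhs)


def is_feasible(flows, gains, capacities, supply, demand, tol=1e-9):
    """Check relations (9), (10) and (11) for a candidate flow."""
    for arc, f in flows.items():
        if f < -tol or f > capacities[arc] + tol:  # (11) and (10)
            return False
    for vertex, value in conservation_lhs(flows, gains).items():
        expected = supply.get(vertex, 0.0) - demand.get(vertex, 0.0)
        if abs(value - expected) > tol:  # (9)
            return False
    return True


# Hypothetical data: x1 is a source, x3 a consumer, x2 an intermediate vertex.
flows = {(1, 2): 4.0, (2, 3): 2.0}
gains = {(1, 2): 0.5, (2, 3): 1.0}
capacities = {(1, 2): 5.0, (2, 3): 3.0}
supply = {1: 2.0}  # v_1^s
demand = {3: 2.0}  # v_3^t

print(conservation_lhs(flows, gains))
print(is_feasible(flows, gains, capacities, supply, demand))
```

Keeping the bound check separate from the balance check makes it easy to see which of the three relations a rejected flow violates.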
It follows from relations (5) to (11) that, for different values of the arc coefficients, the flow in the sources S might not be equal to the flow in the consumers T, i.e. the following relations are possible:

$$v^{s} > v^{t}, \quad \text{or} \quad v^{s} = v^{t}, \quad \text{or} \quad v^{s} < v^{t}; \tag{12}$$

i.e. amplification or attenuation of the flow. The relations for the maximal flow and the minimal cut are also valid for the generalized network flow with gains and losses. Relations (6) and (9) to (11) make it possible to solve different mathematical programming problems that emerge from practical tasks. The next section shows how the described class of generalized network flows with gains and losses may be applied to the description of Artificial Neural Networks (ANN).
4 Artificial Neural Network as a Generalized Network Flow with Gains and Losses

As seen from Fig. 1a, a set of signals $\{x_i\}$, multiplied by the respective coefficients $\{w_i\}$, is fed to the input of the artificial neuron and produces the total sum $N$, for which the following may be written:

$$N = \sum_{i=1}^{n} w_i x_i \tag{13}$$

This sum is subjected to the activation function F, and a signal F(N) comes out at its output, which is fed to those neurons that are connected with the artificial neuron being considered. If the neuron attenuates or amplifies the sum of signals N beyond the range of the activation function F, then it is expedient to introduce correcting signals $\{N'_i\}$ in order to achieve a balance of signals in the respective neuron, such that:

$$-\overline{N}_i \le N'_i \le \overline{N}_i; \tag{14}$$

where $-\overline{N}_i$ and $\overline{N}_i$ are the bounds of the correcting signal $N'_i$. If the activation function is assumed to be linear, then

$$F(N) = kN; \tag{15}$$

where $k$ is the coefficient of linearity. Therefore, the signal emitted after the activation function is equal to

$$N'' = N' + kN = N' + k\sum_{i=1}^{n} w_i x_i \tag{16}$$

Using its program, the artificial neuron assigns new coefficients $w'_{i'}$ for which

$$x'_{i'} = w'_{i'} N''; \tag{17}$$

where $x'_{i'}$ is the signal from the artificial neuron being examined to the neuron of index $i'$. In this formalization, the balance of signals in the artificial neuron is sought after the summation of the signals from (13) and the correction signal $N'$, and before the transformation of $N$ into $N''$ through the activation function F. Then the following equality for the conservation of signals is valid [7]:

$$\sum_{i=1}^{n} w_i x_i + N' = k\sum_{i=1}^{n} w_i x_i \tag{18}$$
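The behaviour of a single neuron with a linear activation function, as in relations (13), (15) and (16), can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the symmetric bound used for the check of relation (14), and the sample weights and inputs are assumptions and do not come from the chapter.

```python
def weighted_sum(weights, inputs):
    """Total input N of relation (13)."""
    return sum(w * x for w, x in zip(weights, inputs))


def emitted_signal(weights, inputs, k, correction=0.0, bound=None):
    """Signal emitted after a linear activation F(N) = k * N.

    `correction` plays the role of the correcting signal; `bound` is a
    hypothetical symmetric limit used to check relation (14).
    """
    if bound is not None and abs(correction) > bound:
        raise ValueError("correcting signal violates relation (14)")
    n = weighted_sum(weights, inputs)  # relation (13)
    return correction + k * n          # relations (15) and (16)


# Hypothetical sample data.
weights = [0.5, 0.25, 0.25]
inputs = [2.0, 4.0, 4.0]

print(weighted_sum(weights, inputs))
print(emitted_signal(weights, inputs, k=2.0, correction=0.5, bound=1.0))
```

With these sample values the weighted sum is 3.0 and the emitted signal is 6.5; a correcting signal outside the assumed bound raises an error instead of being silently clipped.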
The artificial neuron fixes the results of training and recognition through the new coefficients. The artificial neural network is built when, on the basis of relations (13) to (18), the separate artificial neurons are united in a given way into a common neural network. With appropriate training, the network, after given signals are fed to its input, generates output results close enough to the ones sought. The network functions like a specialized neural computer for approximate solutions. Depending on the number of layers, the problems being solved, and the algorithms used, the neural network may achieve results close enough to the optimal. This is mainly carried out by changing the coefficients $\{w_i\}$. Relations (13) to (18) show how the artificial neuron from Fig. 1a operates. They make it possible to interpret this neuron adequately through relations (6) and (9)–(11), which describe the generalized network flow with gains and losses. The following assumptions will be used:
(a) For each artificial neuron of index $i$, let the right-hand side of (13) be substituted by $f_i^{-1}$, i.e.

$$N = \sum_{i=1}^{n} w_i x_i \rightarrow f_i^{-1} = \sum_{j \in \Gamma_i^{-1}} f_{ji}; \tag{19}$$

(b) The right-hand side of (16) is substituted by $f_{i,0} + f_i^{1}$, i.e.

$$N' + k\sum_{i=1}^{n} w_i x_i \rightarrow f_{i,0} + \sum_{j \in \Gamma_i^{1}} f_{ij}; \tag{20}$$

(c) Flows $\{v_i^{s}\}$ in the sources S are assumed to be input signals to the neuron and $\{v_i^{t}\}$ its output signals;

(d) If the product $w_i x_i$ is an upper bounded non-negative function, then relations (19) and (20) lead to inequalities (10) and (11).

Then equality (18) turns into the following equation, specific to the conservation of the generalized network flow with gains and losses:

$$\sum_{j \in \Gamma_i^{1}} g_{ij} f_{ij} + f_{i,0} - k_i \sum_{j \in \Gamma_i^{-1}} g_{ji} f_{ji} = \begin{cases} v_i^{s}, & \text{if } x_i \in S; \\ 0, & \text{if } x_i \notin S \cup T; \\ -v_i^{t}, & \text{if } x_i \in T; \end{cases} \tag{21}$$
This equality, together with inequalities (10) and (11) and the objective function (6), defines a specific generalized network flow with gains and losses that is identical to the multilayer artificial neural network. The proportionality factors $\{k_i\}$ are in the most general case different for the distinct artificial neurons of indices $i \in I$. If the activation function is non-linear, then the conservation equation should be modified accordingly. The formalism elaborated for generalized network flows makes it possible, as in the multilayer ANN, to solve a variety of training and recognition problems. The values of the expected input signals are fixed when the ANN is taught. In the network flow ANNs these are the parameters $\{v_i^{s} / i \in S\}$. The desired output signals are also known; in this interpretation they are $\{v_i^{t} / i \in T\}$. It is necessary to find network flow values $\{g_{ij} / (i, j) \in I\}$ that are relevant to the teaching process. If the realization of each signal (network flow function) $\{f_{ij} / (i, j) \in I\}$ has a corresponding value $a_{ij}$, then the optimal solution of the problem thus defined may be obtained by solving the following linear network flow programming problem:

$$L = \sum_{(i, j) \in I} a_{ij} f_{ij} \rightarrow \min(\max); \tag{22}$$

subject to constraints (10), (11), and (21). From the optimal solutions obtained for $f_{ij}$, the optimal values of $\{g_{ij} / (i, j) \in I\}$ necessary for the training process may be obtained by Formula (8). Thus, unlike the multilayer ANN, in which a number of intuitive heuristic algorithms are used that do not guarantee optimality of the solutions, the network flow model (8), (10), (11), and (21) achieves proven optimal solutions with polynomial computational complexity. Cases are possible in which the selected network and the values $\{v_i / i \in S \cup T\}$ fixed on it do not lead to a feasible solution. Then the complexity of the network should be increased through a greater number of vertices and arcs until an optimal solution is obtained. It is also important that a number of methods and algorithms have been developed for network flows that take the specific features of the network structure into account. This results in procedures of polynomial complexity that are considerably more effective than the general procedures of linear and discrete programming. The multilayer ANNs widely used nowadays have a specific structure that is a special case of the general structure of network flows. Therefore, all results obtained through multilayer ANNs are preserved in the network-flow interpretation of ANNs, which adds a number of new results that cannot be achieved through multilayer ANNs. Some additional capabilities of network flows that extend the options for investigation and application of ANNs are presented in the next section. These features are not available in multilayer ANNs.
5 Maximal Flow, Minimal Cut and Additional Linear Constraints in Network-Flow Artificial Neural Networks

The signals flowing from one neuron to another in any natural biological network are not infinitely large; they have limited upper bounds. The lower bounds are zero, corresponding to the absence of a signal, since a signal cannot be negative. These requirements are well reflected in relations (10) and (11) for the network flow interpretation of ANNs. This makes it possible to transfer a number of results from network flow theory to ANNs. A fundamental role among them is played by the widely known Ford-Fulkerson theorem [8, 9] for the maximal flow and the minimal cut, abbreviated as the mincut-maxflow theorem. A ‘cut’ means a set of arcs that blocks all paths from sources S to consumers T; in each path one and only one arc of the cut is encountered [10]. Each cut $(X_0, \bar{X}_0)$ between S and T meets the following requirements [13, 14]:

$$X_0 \subset X;\quad S \subseteq X_0;\quad \bar{X}_0 = X \setminus X_0;$$

$$(X_0, \bar{X}_0) = \{x_{ij} / x_{ij} \in U;\ x_i \in X_0;\ x_j \in \bar{X}_0\};$$

$$(\bar{X}_0, X_0) = \{x_{ij} / x_{ij} \in U;\ x_i \in \bar{X}_0;\ x_j \in X_0\};$$

where $(\bar{X}_0, X_0)$ is a reverse cut.

By $f(X_0, \bar{X}_0)$ and $c(X_0, \bar{X}_0)$ are denoted, respectively, the sum of the network flow and the sum of the capacities on the cut $(X_0, \bar{X}_0)$, which are defined in the following way:

$$f(X_0, \bar{X}_0) = \sum_{x_{ij} \in (X_0, \bar{X}_0)} f(x_{ij}) = \sum_{(i, j) \in I(X_0)} f_{ij};$$

$$f(\bar{X}_0, X_0) = \sum_{x_{ij} \in (\bar{X}_0, X_0)} f(x_{ij}) = \sum_{(i, j) \in I(\bar{X}_0)} f_{ij};$$

$$c(X_0, \bar{X}_0) = \sum_{x_{ij} \in (X_0, \bar{X}_0)} c(x_{ij}) = \sum_{(i, j) \in I(X_0)} c_{ij};$$

$$c(\bar{X}_0, X_0) = \sum_{x_{ij} \in (\bar{X}_0, X_0)} c(x_{ij}) = \sum_{(i, j) \in I(\bar{X}_0)} c_{ij};$$

where

$$I(X_0) = \{(i, j) / (i, j) \in I;\ x_{ij} \in (X_0, \bar{X}_0)\};$$

$$I(\bar{X}_0) = \{(i, j) / (i, j) \in I;\ x_{ij} \in (\bar{X}_0, X_0)\}.$$

The set of all possible cuts between sources S and consumers T will be denoted by R. Then the maximal flow and the minimal cut between S and T are defined in the following way:

$$v_{max} = f(X_0, \bar{X}_0)_{max} = \max_{(X_0, \bar{X}_0) \in R} f(X_0, \bar{X}_0); \tag{23}$$

$$c_{min} = c(X_0, \bar{X}_0)_{min} = \min_{(X_0, \bar{X}_0) \in R} c(X_0, \bar{X}_0). \tag{24}$$

It follows from relations (10) and (11) that a maximal flow always exists and that it is the greatest possible of all flows between S and T. A minimal cut also always exists, even at zero values of $\{c_{ij}\}$. Then, according to the fundamental Ford-Fulkerson mincut-maxflow theorem, one may write:

$$v_{max} = f(X_0, \bar{X}_0) - f(\bar{X}_0, X_0) = c(X_0, \bar{X}_0); \tag{25}$$

$$f(\bar{X}_0, X_0) = 0. \tag{26}$$
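To make the mincut-maxflow relation concrete, the sketch below computes a maximal flow and a minimal cut with the Edmonds-Karp variant of the Ford-Fulkerson method. It is a simplified illustration, not the procedure used in this chapter: it assumes an ordinary flow network (all arc coefficients equal to 1, zero lower bounds, no limit on what the consumers can absorb), and the function name and the auxiliary vertices `_s` and `_t` are hypothetical. The arc capacities are those of the numerical example in Sect. 6.

```python
from collections import deque


def max_flow_min_cut(capacity, sources, sinks):
    """Edmonds-Karp on an ordinary flow network (assumption: all g_ij = 1)."""
    src, snk = "_s", "_t"  # hypothetical super-source and super-sink
    residual = {}

    def add_arc(u, v, c):
        residual.setdefault(u, {})
        residual.setdefault(v, {})
        residual[u][v] = residual[u].get(v, 0) + c
        residual[v].setdefault(u, 0)  # reverse arc for cancelling flow

    for (u, v), c in capacity.items():
        add_arc(u, v, c)
    big = sum(capacity.values()) + 1  # effectively unlimited
    for s in sources:
        add_arc(src, s, big)
    for t in sinks:
        add_arc(t, snk, big)

    def reachable_from_source():
        parent = {src: None}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    value = 0
    while True:
        parent = reachable_from_source()
        if snk not in parent:
            break
        path, v = [], snk
        while parent[v] is not None:  # walk back along the shortest path
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][w] for u, w in path)
        for u, w in path:
            residual[u][w] -= push
            residual[w][u] += push
        value += push

    side = set(parent)  # vertices still reachable: the set X_0
    cut = sorted((u, v) for (u, v) in capacity if u in side and v not in side)
    return value, cut


capacity = {
    (1, 3): 2, (1, 4): 5, (2, 3): 1, (2, 5): 4, (3, 4): 2, (3, 6): 3,
    (4, 6): 1, (4, 7): 6, (5, 6): 3, (5, 8): 2, (5, 9): 2, (6, 7): 3,
    (6, 8): 4,
}
value, cut = max_flow_min_cut(capacity, sources=[1, 2], sinks=[7, 8, 9])
print(value, cut)
```

Under these assumptions the value of the maximal flow equals the total capacity of the returned cut, in line with relation (25).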
This means that:

(a) The sum of the signals from the neurons of S to the neurons of T does not exceed $v_{max}$;

(b) A minimal cut $c(X_0, \bar{X}_0)_{min}$ always exists between neurons S and T in the neural network, no matter how complex it may be, which imposes an upper bound on the signals from S to T. Moreover,

$$v_{max} = f(X_0, \bar{X}_0)_{max} = c(X_0, \bar{X}_0)_{min}.$$

Under such a formalization, no path exists from any neuron of S to any neuron of T that does not contain an arc of the minimal cut. A number of other results from network flow structures can also be transferred to ANNs, which allows the latter to be understood more deeply, without unnecessary heuristic approaches and assumptions. The network flow interpretation of ANNs makes it possible to introduce additional linear constraints, which might be used in natural neural networks [15]. An acceptable hypothesis is that natural neural networks are realized in such a way that relations between the separate signals exist. It is easiest to suppose that they are linear relations of the following type: for each $r \in P$

$$\sum_{(i, j) \in I_r} b_{ij}^{r} f_{ij} \le D_r; \tag{27}$$

where
$I_r \subseteq I$: set of pairs of indices of arcs included in the constraint of index $r \in P$;
$P = \{1, 2, \ldots, g\}$: set of constraint indices, with $|P| < m$, where $|P|$ is the number of elements in the set $P$;
$b_{ij}^{r} \ge 0$: non-negative coefficient corresponding to arc $x_{ij}$ entering into the constraint of index $r \in P$;
$D_r \ge 0$: non-negative coefficient on the right-hand side of constraint $r \in P$.

When the optimal realization of the generalized network flow is determined, the additional constraints (27) should be added to constraints (10), (11), and (21). Determining the optimal generalized network flow is then reduced to the following linear programming problem: (10), (11), (21), (22), and (27). Exactly how these additional linear constraints would look could be clarified only after much more significant research results and knowledge of natural neural networks.
6 Numerical Experiment with a Network Flow Artificial Neural Network

A numerical example is described that illustrates the possible use of generalized network flows with gains and losses for the interpretation of artificial neural networks. The network in Fig. 2 consists of nine vertices and thirteen arcs. In the present example it is assumed that all linear coefficients satisfy $k_i = 1$, $i \in I$, and that the functions $f_{i,0}$ are zero, i.e. $f_{i,0} = 0$ for each $i \in I$. It follows from Fig. 2 that there are two sources in the network, $S = \{x_1, x_2\}$, and three consumers, $T = \{x_7, x_8, x_9\}$. Flows in the sources and consumers have the following values: $v_1 = 7$; $v_2 = 5$; $v_7 = 6$; $v_8 = 4$; $v_9 = 2$; which means that the sums of flows in the sources and in the consumers are equal, i.e.

$$\sum_{x_i \in S} v_i^{s} = \sum_{x_i \in T} v_i^{t} = 12.$$

[Figure: network with vertices x1 to x9; source values 7 and 5 at x1 and x2; consumer values −6, −4 and −2 at x7, x8 and x9; arc labels of the form 2 (0.66)]

Fig. 2 A generalized network with gains and losses as an ANN
Arc assessments for each arc of the network are assumed to be of unit value, i.e. $a_{ij} = 1$ for each $(i, j) \in I$. The upper bounds of the arc capacities are, respectively: $c_{1,3} = 2$; $c_{1,4} = 5$; $c_{2,3} = 1$; $c_{2,5} = 4$; $c_{3,4} = 2$; $c_{3,6} = 3$; $c_{4,6} = 1$; $c_{4,7} = 6$; $c_{5,6} = 3$; $c_{5,8} = 2$; $c_{5,9} = 2$; $c_{6,7} = 3$; $c_{6,8} = 4$. The objective function (22) is of the following kind:

$$L = f_{1,3} + f_{1,4} + f_{2,3} + f_{2,5} + f_{3,4} + f_{3,6} + f_{4,6} + f_{4,7} + f_{5,6} + f_{5,8} + f_{5,9} + f_{6,7} + f_{6,8} \rightarrow \min.$$

Equalities and inequalities (10), (11), (21), and (22) for the example being considered are of the following kind:

1. $f_{1,3} + f_{1,4} = 7$;
2. $f_{2,3} + f_{2,5} = 5$;
3. $f_{3,4} + f_{3,6} - f_{1,3} - f_{2,3} = 0$;
4. $f_{4,6} + f_{4,7} - f_{1,4} - f_{3,4} = 0$;
5. $f_{5,6} + f_{5,8} + f_{5,9} - f_{2,5} = 0$;
6. $f_{6,7} + f_{6,8} - f_{3,6} - f_{4,6} - f_{5,6} = 0$;
7. $-f_{4,7} - f_{6,7} = -6$;
8. $-f_{5,8} - f_{6,8} = -4$;
9. $-f_{5,9} = -2$;
10. $1 \le f_{1,3} \le 2$;
11. $1 \le f_{1,4} \le 5$;
12. $1 \le f_{2,3} \le 1$;
13. $1 \le f_{2,5} \le 4$;
14. $1 \le f_{3,4} \le 2$;
15. $1 \le f_{3,6} \le 3$;
16. $1 \le f_{4,6} \le 1$;
17. $1 \le f_{4,7} \le 6$;
18. $1 \le f_{5,6} \le 3$;
19. $1 \le f_{5,8} \le 2$;
20. $1 \le f_{5,9} \le 2$;
21. $1 \le f_{6,7} \le 3$;
22. $1 \le f_{6,8} \le 4$.
The free web linear programming solver http://www.phpsimplex.com was used to solve the optimization problem defined above. The following values of the arc flow functions were obtained: $f_{1,3} = 2$; $f_{1,4} = 5$; $f_{2,3} = 1$; $f_{2,5} = 4$; $f_{3,4} = 1$; $f_{3,6} = 2$; $f_{4,6} = 1$; $f_{4,7} = 5$; $f_{5,6} = 1$; $f_{5,8} = 1$; $f_{5,9} = 2$; $f_{6,7} = 1$; $f_{6,8} = 3$. On the basis of the optimal solutions obtained for $f_{i,j}$, the following optimal values of the coefficients $g_{i,j}$ may be defined through Formulae (7) and (8):

$f_1^{1} = 7$; $g_{1,3} = 0.3$; $g_{1,4} = 0.71$;
$f_2^{1} = 5$; $g_{2,3} = 0.2$; $g_{2,5} = 0.8$;
$f_3^{1} = 3$; $g_{3,4} = 0.33$; $g_{3,6} = 0.66$;
$f_4^{1} = 6$; $g_{4,6} = 0.17$; $g_{4,7} = 0.83$;
$f_5^{1} = 4$; $g_{5,6} = 0.25$; $g_{5,8} = 0.25$; $g_{5,9} = 0.5$;
$f_6^{1} = 4$; $g_{6,7} = 0.25$; $g_{6,8} = 0.75$.
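The step from arc flows to arc coefficients via Formula (8) can be reproduced with a short Python sketch. The arc flows are the values reported above; the function names are hypothetical, and exact fractions are used so that the rounding to two decimals happens only when the coefficients are printed.

```python
from collections import defaultdict
from fractions import Fraction

# Arc flows f_ij reported for the numerical example.
flows = {
    (1, 3): 2, (1, 4): 5, (2, 3): 1, (2, 5): 4, (3, 4): 1, (3, 6): 2,
    (4, 6): 1, (4, 7): 5, (5, 6): 1, (5, 8): 1, (5, 9): 2, (6, 7): 1,
    (6, 8): 3,
}


def outgoing_totals(flows):
    """Aggregate outgoing flow f_i^1 of every vertex that has outgoing arcs."""
    totals = defaultdict(int)
    for (i, _j), f in flows.items():
        totals[i] += f
    return dict(totals)


def arc_coefficients(flows):
    """Formula (8): g_ij = f_ij / f_i^1, kept as exact fractions."""
    totals = outgoing_totals(flows)
    return {(i, j): Fraction(f, totals[i]) for (i, j), f in flows.items()}


totals = outgoing_totals(flows)
g = arc_coefficients(flows)
for (i, j), value in sorted(g.items()):
    print(f"g[{i},{j}] = {float(value):.2f}   (f_{i}^1 = {totals[i]})")
```

By construction the coefficients of the arcs leaving one vertex sum to 1, so small differences from hand-rounded values (for example 2/7 against 0.3) come only from rounding.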
The optimal values obtained for the arc flow functions are shown above each arc in Fig. 2. The optimal coefficients $g_{i,j}$ are shown next to them, in brackets. Comparison of the optimal values of $f_{i,j}$ with the upper bounds of the capacities $c_{i,j}$ of the corresponding arcs shows that there are six saturated arcs, on which the arc flow equals the upper bound of the capacity of the same arc. Two of these arcs, namely $x_{4,6}$ and $x_{5,9}$, are isolated, and the remaining four saturated arcs, $x_{1,3}$, $x_{1,4}$, $x_{2,3}$, and $x_{2,5}$, make up the following minimal cut:

$$(X_0, \bar{X}_0) = \{x_{1,3}, x_{1,4}, x_{2,3}, x_{2,5}\};\quad (\bar{X}_0, X_0) = \emptyset;$$

where $X_0 = \{x_1, x_2\}$; $\bar{X}_0 = \{x_3, x_4, x_5, x_6, x_7, x_8, x_9\}$; $X_0 \cap \bar{X}_0 = \emptyset$; $X_0 \cup \bar{X}_0 = X$; $\bar{X}_0 = X \setminus X_0$.

The mincut-maxflow theorem holds in the example being considered, namely:

$$v_{max} = f(X_0, \bar{X}_0) - f(\bar{X}_0, X_0) = c(X_0, \bar{X}_0) \quad \text{and} \quad f(\bar{X}_0, X_0) = 0.$$

The maximal flow is equal to

$$v_{max} = f_{1,3} + f_{1,4} + f_{2,3} + f_{2,5} = 2 + 5 + 1 + 4 = 12.$$
On the other hand,

$$v_{max} = v_1^{s} + v_2^{s} = 7 + 5 = 12.$$

The minimal cut $(X_0, \bar{X}_0)$ is shown by a thick dashed line in Fig. 2. It has the following capacity:

$$c(X_0, \bar{X}_0) = c_{1,3} + c_{1,4} + c_{2,3} + c_{2,5} = 2 + 5 + 1 + 4 = 12.$$

The total sum of signals passing through the network flow ANN is upper bounded and is equal to 12. If it needs to be greater, the capacities of the arcs of the minimal cut $(X_0, \bar{X}_0)$ should be increased.

An approach is now described for using the trained network flow ANN through the calculated coefficients $\{g_{i,j} / (i, j) \in I\}$. Let new values be assigned to the input signals: $v_1^{s} = 6$ and $v_2^{s} = 5$, so that $v_1^{s}$ differs from its previous value. It is assumed that the capacities remain as already defined. The output signals $v_7^{t}$, $v_8^{t}$, $v_9^{t}$ are considered unknown variables to be determined. A numerical example of using an ANN trained by the network flow optimization procedure will be described. According to the flow conservation equation, and taking into account the assumption $f_{i,0} = 0$ and $k_i = 1$ for each $i \in I$, one may write

$$\sum_{j \in \Gamma_i^{-1}} f_{ji} = f_i^{-1} = f_i^{1}.$$

Let the values of the left-hand side of the relation above be already known. Then new values of $\{f_{ij} / j \in \Gamma_i^{1}\}$ are defined through Eq. (7) on the basis of the computed $\{g_{i,j} / (i, j) \in I\}$. This procedure may be described in the following way:

(a) Assume that $v_1^{s}$ and $v_2^{s}$ are equal to $f_1^{1}$ and $f_2^{1}$;

(b) Values $f_i^{1}$ for which the values of $\{f_{ji} / j \in \Gamma_i^{-1}\}$ are already known are sequentially found, and new values of $\{f_{ij} / j \in \Gamma_i^{1}\}$ are defined;

(c) Applying the steps above sequentially, one arrives at the output signals $\{v_i^{t} / x_i \in T\}$, by which the level of training of the network can be judged.
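Steps (a) to (c) amount to a forward pass through the network, which can be sketched in Python as follows. The sketch is illustrative only: the function name is hypothetical, it assumes that every arc runs from a lower to a higher vertex index (true for the nine-vertex example), and it uses the unrounded coefficients given by Formula (8), so its outputs can differ somewhat from values computed by hand with coefficients rounded to two decimals.

```python
from fractions import Fraction

# Unrounded coefficients g_ij = f_ij / f_i^1 from the optimization solution.
coefficients = {
    (1, 3): Fraction(2, 7), (1, 4): Fraction(5, 7),
    (2, 3): Fraction(1, 5), (2, 5): Fraction(4, 5),
    (3, 4): Fraction(1, 3), (3, 6): Fraction(2, 3),
    (4, 6): Fraction(1, 6), (4, 7): Fraction(5, 6),
    (5, 6): Fraction(1, 4), (5, 8): Fraction(1, 4), (5, 9): Fraction(1, 2),
    (6, 7): Fraction(1, 4), (6, 8): Fraction(3, 4),
}


def propagate(coefficients, inputs):
    """Forward pass using f_ij = g_ij * f_i^1 (Eq. 7) with f_i^1 = f_i^-1.

    Assumption: arcs go from lower to higher vertex index, so processing
    them in sorted order guarantees that all inflow of a vertex is known
    before its outflow is split.
    """
    total = dict(inputs)  # step (a): source signals become f_i^1
    arc_flow = {}
    for (i, j) in sorted(coefficients):
        f = coefficients[(i, j)] * total.get(i, 0)  # step (b)
        arc_flow[(i, j)] = f
        total[j] = total.get(j, 0) + f
    return arc_flow, total


arc_flow, total = propagate(coefficients, {1: 6, 2: 5})
for sink in (7, 8, 9):  # step (c): read the output signals
    print(f"v_{sink}^t = {float(total[sink]):.2f}")
```

Because the coefficients leaving each vertex sum to 1, the outputs of this sketch always add up to the total input, here 11.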
Thus, by executing the three steps of the procedure described above in sequence, starting from the input signals and using the coefficients $\{g_{i,j} / (i, j) \in I\}$ from the optimization solution, the following results are obtained:

$f_1^{1} = 6$; $f_{1,3} = 1.8$; $f_{1,4} = 4.26$;
$f_2^{1} = 5$; $f_{2,3} = 1$; $f_{2,5} = 4$;
$f_3^{1} = 2.8$; $f_{3,4} = 0.92$; $f_{3,6} = 1.85$;
$f_4^{1} = 6$; $f_{4,6} = 0.88$; $f_{4,7} = 4.4$;
$f_5^{1} = 4$; $f_{5,6} = 1$; $f_{5,8} = 1$; $f_{5,9} = 2$;
$f_6^{1} = 4$; $f_{6,7} = 0.93$; $f_{6,8} = 2.80$.

The new output signals obtained have the following values: $v_7^{t} = 5.33$, $v_8^{t} = 3.80$, and $v_9^{t} = 2$. The values of the newly obtained arc flow functions $\{f_{i,j} / (i, j) \in I\}$ are shown in Fig. 3 above the arcs, and the values of the output signals are shown next to vertices $x_7$, $x_8$, and $x_9$. Comparing the results from Figs. 2 and 3, one can ascertain that the network flow ANN gives satisfactory results in trainability. When the input signals are altered by one unit (Fig. 3), one of the output signals, at $x_9$, has not changed, while the other two, at $x_7$ and $x_8$, have decreased in total by 0.87. This is the case when a very simplified network with n = 9 and m = 13 is used for illustration.

[Figure: the network of Fig. 2 with the new arc flows and the output values −5.33, −3.80 and −2 at x7, x8 and x9]

Fig. 3 New output values after training

If a more complex network were used and the activation mechanisms were actuated through the parameters $f_{i,0}$ and $\{k_i\}$, then better results in trainability might be achieved. It follows from the numerical example presented that the network flow platform of ANNs is competitive with the widespread platform of multilayer ANNs.
7 Conclusion

1. A new network flow platform for building artificial neural networks (ANN) is developed in the present work, which is substantially different from the multilayer ANNs widely used nowadays. The latter may be considered a specific part of a network flow structure, and the results obtained with them are preserved in the new platform. The necessity of replacing the current multilayer platform with a new network flow platform for building ANNs is pointed out.
2. The generalized network flow with gains and losses is shown to be the most appropriate of all network flows for the interpretation of ANNs. The formalization of ANNs is accomplished on the basis of this generalized network flow with gains and losses.
3. A number of results are obtained in the framework of the network flow interpretation of ANNs, related to the maximal generalized network flow with gains and losses, the minimal cut, and the application of the mincut-maxflow theorem. Cases of introducing additional linear constraints into ANNs are considered, which connect the realization of separate signals from one neuron to another.
4. It is proven that, in the network flow interpretation of ANNs, the use of specific, highly effective network optimization algorithms of polynomial complexity provides exact optimal solutions. In this way the heuristic procedures of multilayer ANNs, which lead to approximate solutions, are avoided.
5. A numerical example is presented that demonstrates the possibility in principle of realizing ANNs on the basis of a generalized network flow with gains and losses and confirms the results achieved.
References

1. F. Rosenblatt, Principles of Neurodynamics (Spartan Books, N.Y., 1965)
2. M. Minsky, S. Papert, Perceptrons (MIT Press, MA, 1969)
3. T. Sejnowski, C. Rosenberg, Parallel networks that pronounce English text. Complex Syst. 1, 145–168 (1987)
4. D. Graupe, Principles of Artificial Neural Networks, 3rd edn. Advanced series in circuits and systems, vol. 7 (World Scientific, 2013). ISBN 9814522740, 9789814522748
5. R.J. Schalkoff, Artificial Neural Networks (McGraw-Hill Higher Education, 1997). ISBN: 007057118X
6. D. Kwon, Intelligent machines that learn like children. Sci Am (2018). https://www.scientificamerican.com/article/intelligent-machines-that-learn-like-children/
7. V. Sgurev, Artificial Neural Networks as a Network Flow Capacities. Comptes Rendus de l’Academie Bulgare des Sciences, Tome 71(9) (2017)
8. L.R. Ford, D.R. Fulkerson, Flows in Networks (Princeton University Press, 1962)
9. N. Christofides, Graph Theory: An Algorithmic Approach (Academic Press, 1986)
10. P.A. Jensen, W.J.P. Barnes, Network Flow Programming (Krieger Pub Co, 1987). ISBN-13: 9780894642104, ISBN-10: 0894642103. https://www.amazon.com/Network-Flow-ProgrammingPaul-Jensen/dp/0894642103
11. D. Dai, W. Tan, H. Zhan, Understanding the Feedforward Artificial Neural Network Model From the Perspective of Network Flow (Cornell University Library, 2017). https://arxiv.org/abs/1704.08068
12. V. Sgurev, S. Drangajov, V. Jotsov, Network flow interpretation of artificial neural networks, in Proceedings of the 9th International Conference on Intelligent Systems, IS’18, Madeira Island, Portugal, IEEEXplore (2019), pp. 494–498. ISBN: 978-1-5386-7097-2, ISSN: 1541-1672. https://doi.org/10.1109/is.2018.8710524
13. R.K. Ahuja, T.L. Magnanti, J.B. Orlin, Network Flows: Theory, Algorithms, and Applications (Pearson, 2013). ISBN 1292042702, 9781292042701
14. V. Sgurev, Network Flows with General Constraints (Publishing House of the Bulgarian Academy of Sciences, Sofia, 1991). (in Bulgarian)
15. V. Sgurev, St. Drangajov, Intelligent control of flows with risks on a network, in Proceedings of the 7th IEEE International Conference Intelligent Systems, IS’14, 24–26 Sept 2014, Warsaw, Poland. Tools, Architectures, Systems, Applications, vol. 2. Advances in Intelligent Systems and Computing, vol. 323 (Springer International Publishing, Switzerland, 2014), pp. 27–35. https://doi.org/10.1007/978-3-319-11310-4
Empowering SMEs with Cyber-Physical Production Systems: From Modelling a Polishing Process of Cutlery Production to CPPS Experimentation

José Ferreira, Fábio Lopes, Guy Doumeingts, João Sousa, João P. Mendonça, Carlos Agostinho and Ricardo Jardim-Goncalves

Abstract As technology evolves, the contemporary technological paradigm of manufacturing Small and Medium Enterprises (SMEs) has been changing and gaining traction to accommodate its ever-growing needs for adaptation to modern industrial processes and standards. Their willingness to integrate Cyber-Physical Systems (CPS) modules in their manufacturing processes and to implement Cyber-Physical Production Systems (CPPS) is firmly based on the perception that value added services result from the technological evolution and, in the future, better tools are expected to guarantee process control, surveillance and maintenance. These Intelligent Systems involve transdisciplinary approaches to guarantee interaction and behavioural fluidity between hardware and software components, which often leads to complexity in the coordination of these components and processes. The main objective of this work is to study these aspects and to contribute data regarding the applicability of CPS components in the current SME environment with the intent of improving the performance of manufacturing processes. To accomplish this, an architecture is proposed, which is based on the process modelling and process simulation of the different stages of an existing SME factory production, allowing real-time information, through IoT data collection, to feed different mechanisms of production improvement modules such as planning, scheduling and monitoring. Since SMEs are the most active and common company profile in the northern part of Portugal, it seemed beneficial to take part in this environment by implementing the solution in a promising cutlery producing SME, with the objective of investigating and validating the applicability of the BEinCPPS components, within the proposed architecture, to improve the performance of the company's industrial processes.

Keywords Process modelling · Process simulation · Cyber-physical systems · Data acquisition · Sensors · Interoperability · Industry 4.0

J. Ferreira · F. Lopes · C. Agostinho · R. Jardim-Goncalves: Centre of Technology and Systems, CTS, UNINOVA, DEE/FCT, Universidade NOVA de Lisboa, Lisbon, Portugal
G. Doumeingts: Interop Vlab, Brussels, Belgium
J. Sousa · J. P. Mendonça (corresponding author): MEtRICs Research Centre, University of Minho, Guimarães, Portugal

© Springer Nature Switzerland AG 2020. R. Jardim-Goncalves et al. (eds.), Intelligent Systems: Theory, Research and Innovation in Applications, Studies in Computational Intelligence 864, https://doi.org/10.1007/978-3-030-38704-4_7
1 Introduction

The term cyber-physical systems (CPS) refers to a new generation of intelligent systems with integrated computational, networking and physical capabilities [1] that aims to fill the gap between the cyber world, where data is exchanged and transformed, and the physical world in which we live. The ability to interact with, and expand the capabilities of, the physical world through computation, communication and control is a key enabler for future technology developments. The emerging CPS are set to enable a modern grand vision for society-level services that transcend space and time at scales never possible before [2]. Contemporary opportunities and research challenges include the design and development of next-generation airplanes and space vehicles, hybrid gas-electric vehicles, fully autonomous urban driving and prostheses that allow brain signals to control physical objects.

In CPS, embedded computers and networks monitor and control the physical processes, usually with feedback loops where physical processes affect computations and vice versa [3]. This means many things happen at once. Physical processes are compositions of many activities, unlike software processes, which are deeply rooted in sequential steps. In [4], computer science is described as “procedural epistemology,” knowledge through procedure. In the physical world, by contrast, processes are rarely procedural. Physical processes are compositions of many parallel processes. Measuring and controlling the dynamics of these processes, by orchestrating actions that influence them, are the main tasks of existing embedded systems. Consequently, concurrency is intrinsic in CPS. Many of the technical challenges in designing and analysing embedded software stem from the need to bridge an inherently sequential semantics with an intrinsically concurrent physical world [3].
In the architecture of the experiment presented in this work, which is detailed further in later sections, the design time and runtime processes are distinctly separated. The design time includes the process modelling, using the GRAI method [5], and process simulation, accounting for the full manufacturing processes that the system is
set to achieve. The runtime includes the execution of such models, through Business Process Modelling Notation (BPMN) [6], to follow the shop floor production in real-time and provide a global view of the production system, opening opportunities for the creation of “to be” scenarios. This is possible by including IoT (Internet of Things) agents, responsible for the data acquisition and pre-processing, and by including all available data from the existing ERP (Enterprise Resource Planning) systems and the incoming production orders for the factory. This data is fed into a process orchestrator, which oversees the coordination of all runtime modules and processes and forwards the data to the data consumers, i.e. the company managers and the process improvement modules such as a scheduler and a capacity planner. The selection of these modules for the improvement of the manufacturing process is directly correlated with the needs of the use-case scenario. The use-case scenario is based on CRISTEMA, a young cutlery manufacturing SME located in the north of Portugal. The cutlery industry is one of the main industries in the region, being the main metallurgical industry in Guimarães, along with textile industries, which represent 70% of the companies, and footwear production [7]. In the year 2011, the cutlery sector reported almost 40 million euros in sales (Internal, EU and non-EU) [8]. CRISTEMA has more than twenty years of history, but fork production only started in 2001. Since then, it has been (along with the complete cutlery set) the main objective of the company, and future investments will support cutlery production in order to improve performance. The make-to-stock strategy has been in place since the beginning in order to serve each customer rapidly, and it continues to be one of the key features of CRISTEMA when compared to the competition.
This specific sector has seen fairly linear technological progress since the mid-80s and 90s, when production lines were essentially based on human work. Since then, the industry has undergone a bold transformation in labour-intensive tasks through automation and robotization of its production lines. More recently, in-house machine and tool development has made a great contribution to the production scenario, enabling some flexibility of the production process, an increase in the quality of final products and diversification of the catalogue. Furthermore, the cutlery industry has been paying attention to recent developments in Industry 4.0 initiatives, IoT and CPPS, in order to stay competitive, modernise its production lines and make its processes more efficient, providing production agility, increased production status awareness and delivery time reliability. This experiment (analysis of the problem, requirements, solution and deployment) is performed in the frame of CPMSinCPPS (Cutting-edge Process Modelling and Simulation). This experiment was selected in the second open call of the European project BEinCPPS (Business Experiments in Cyber-Physical Production Systems), to carry out innovative Modelling & Simulation (M&S) experiments in CPPS-based manufacturing systems, using components of the project reference architecture. The deployed CPMSinCPPS project solution is currently still being tested at CRISTEMA, with focus on the BEinCPPS components. The implementation had a particular emphasis on the polishing process of cutlery, since the polishing activity is, currently, the bottleneck of the overall production process. This happens due to
high setup times and low processing capacity of the machine: a single polishing machine serves different production lines and requires different time-consuming configurations (sometimes lasting for hours) for polishing different pieces of cutlery. The technical evaluation, indicators of the platform and the technical lessons learnt are described in this work. Overall, the technical evaluation shows the almost total fulfilment of the user requirements, strong learnability, understandability, user attraction, and efficiency of the BEinCPPS components. The technical lessons learnt are related to integration as a major obstacle, modelling as a key learning, reusing past knowledge as a best practice and further development of the BEinCPPS components as next steps. The business evaluation of the platform is also presented in this work, including the Business Performance Indicators that show improvements regarding the defined Business Objectives (BO), the recommendations and the results of the business questionnaire.
1.1 CPMSinCPPS Vision The main objective of the BEinCPPS CPMSinCPPS experiment is to investigate the applicability of the BEinCPPS components in the environment of SMEs (Small and Medium-sized Enterprises) for improving the performance of their manufacturing processes. BEinCPPS envisages the full adoption of cyber-physical production systems and related business models by European manufacturing SMEs by 2020 [9]. Hence, it promotes a series of business experiments that seek to integrate and experiment with a Future Internet (FI) based machine-factory [10]. The industrial use-case for this work is a young SME focused on cutlery manufacturing that is seeking to move towards an operational CPS-based production process with the support of BEinCPPS modelling and simulation components for the design and execution of its processes. Its willingness to accommodate the integration of CPS modules in its manufacturing process and to adopt CPPS is firmly based on the perception that value-added services will result from the technological evolution and from the development of better tools that are expected to guarantee better conditions. For these purposes, which are set to accomplish the project requirements and also to provide significant scientific content to this experimentation, this work presents the advantages of the CPMSinCPPS solution and the involved BEinCPPS components, with focus on the different stages of process modelling and simulation (separating Design Time and Runtime) and on the analysis of the industry requirements and functionalities. It also presents the architecture that comprises the different modules of the proposed CPMSinCPPS solution and, finally, the detailed experimentation of this work in the use-case scenario.
During the exposition of this information, relevant business indicators for industrial relevance and the different impacts, for the contemporary technological paradigm, are presented.
1.2 Industrial Relevance and Potential Impact The production of cutlery is a complex process involving different steps, which are composed of both manual and automated tasks. With the complexity inherent in accomplishing such tasks, the technology that seeks to optimize them gains room to grow. The company’s objective is to position itself as an important player in its sector, by creating a solid brand and a reference in the national and international market, whether as a key integrator of the value chain or as a key brand, as it is strategically positioning itself in the medium/long term. The company believes that the introduction of BEinCPPS will allow:
• To improve the flexibility of the manufacturing system by analysing several production scenarios;
• The decrease of costs of the manufacturing process by optimizing the production plan;
• The improvement of the CPPS capability by analysing multiple usage scenarios;
• Real-time tracking and optimization of the production and maintenance plans.
The expected impact of the project is presented in Table 1.

Table 1 Expected impact
Business impact
  Cost: 10% savings in production costs through the optimisation of the planning
  Delivery time: Improvement of the planning of manufacturing tasks based on optimization in order to meet the delivery time of the customers (customer satisfaction)
Technical impact
  Equipment performance: Improvement of the equipment performance through a better coordination of the manufacturing activities and the maintenance activities
  Productivity: Boost productivity with the optimisation of the process planning
  Quality: The improvement of the quality procedures through the modelling of the processes
Innovation impact
  To improve the efficiency of the CPPS in the production lifecycle
  To convince the companies of the sector about the benefits of modelling and simulation in order to improve the efficiency of the CPPS
1.3 Current Practices and Existing Problems Currently, most of the scheduling and management of ongoing operations is organized manually close to the work cells, using paper-based documentation. The polishing schedule shows, for each day, the orders, the products, the quantities and the estimated production times. Despite the paper-based polishing schedule, the Production Orders are digitalized into the production software. The orders provide the details of the order itself (e.g. ID, article, product name, etc.) and the production activities (e.g. cutting, polishing, stamping etc.) that must be performed in a specific sequence. Barcodes are used to provide a unique identifier for each activity in each production order. The current approach lacks the flexibility demanded by the increasing pace of the business, mainly because the operators have to perform manual monitoring, planning and scheduling activities. Quality standards are usually materialized in a jig (a model) for every product that passes through the cutting, teeth trimming, stamping and polishing activities. When a suspicion of an anomaly arises, a reference model is used to check for defects. When the packaging activity finishes, an operator counts every defective part and registers the values. These defective parts are kept in a container that goes through every activity area, so each operator can place in it the defective parts collected at their station. The issues, weaknesses and points to improve in the production systems prior to the implementation of the BEinCPPS components were identified by the analysis of the current practices and through several brainstorming sessions among the CPMSinCPPS technical experts and the CRISTEMA production stakeholders, mainly the production managers and the maintenance personnel. The identified points to improve are further detailed in Table 2.
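The production order described above (order details, an ordered list of activities, and one barcode per activity) can be sketched as a small data structure. This is only an illustrative sketch: the class names, the barcode scheme "<order id>-<step index>" and the sample order are assumptions made here, not the format used by the CRISTEMA production software.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Activity:
    name: str      # e.g. "cutting", "stamping", "polishing"
    barcode: str   # unique per activity and per production order
    done: bool = False


@dataclass
class ProductionOrder:
    order_id: str
    article: str
    product_name: str
    activities: List[Activity] = field(default_factory=list)

    def next_activity(self) -> Optional[Activity]:
        """Return the first unfinished activity (the sequence is fixed)."""
        for activity in self.activities:
            if not activity.done:
                return activity
        return None

    def scan(self, barcode: str) -> bool:
        """Mark a step as done only if the barcode matches the next expected step."""
        expected = self.next_activity()
        if expected is None or expected.barcode != barcode:
            return False
        expected.done = True
        return True


def make_order(order_id, article, product_name, steps):
    # Hypothetical barcode scheme: "<order id>-<two-digit step index>".
    activities = [Activity(name, f"{order_id}-{i:02d}")
                  for i, name in enumerate(steps, start=1)]
    return ProductionOrder(order_id, article, product_name, activities)


# Hypothetical sample order with three activities in sequence.
order = make_order("PO-001", "A-17", "Dinner fork",
                   ["cutting", "stamping", "polishing"])
```

Scanning a barcode out of sequence is rejected, which mirrors the requirement that activities are performed in a specific order.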
Table 2 Points to improve
Points to improve in physical sub-system
  • Machine setups are very laborious and time-consuming activities, and often lead to delays in the production. CRISTEMA must optimize the machine setup process, without disregarding the setup quality, or at least reduce the number of times it has to be performed in order to be competitive
Points to improve in decisional sub-system
  • The writing process of the Production Sequence (leading to the Production Order) relies mainly on the experience of the Production Responsible, who has experience about the tools, the machines, the resources capacity and the processes required for manufacturing the different cutlery products. Since no information is “stored” in the company, the execution of the Production Sequence relies mainly on one person
  • The Production Plan is the sequence of Production Orders. This document is written by the Production Responsible based on order priorities (includes internal orders and priority orders), unscheduled orders, orders that were not performed during the previous weeks, setup times and machine availability. This process is performed without the assistance of an IT tool and is based on manual calculations, the experience of the Production Responsible and the feedback from the operators. This is very prone to errors and lacks optimization due to the daily changes required each week, the number of different possible combinations of sequences and resulting delivery times. Production re-scheduling decisions usually result in unnecessary additional time, with all associated costs and/or delays affecting the delivery. Additionally, uncertainty is created even when small changes are introduced, resulting in high stress levels in the workshop and further plan modifications to minimize other order re-scheduling impacts
  • The Maintenance activities in CRISTEMA have low integration with Production, mainly due to the lack of feedback about machine usage time (present and future), machine availability (due to machine failure) and the necessary preventive maintenance requirements
Points to improve in information sub-system
  • CRISTEMA mainly uses custom-built and “simple” IT solutions. One of the drawbacks is the low integration between software, modules and machines. For example, the production software only has the possibility of creating Production Orders and assigning an operator, but does not have the possibility to predict the delivery time of the Production Order using that specific sequence of activities. Additionally, it does not allow simulating different Production Order sequences in order to estimate and improve the delivery time. When re-scheduling is required, delivery time improvements and delays cannot be quantified since it is based on hand calculations, which are also time consuming
  • The production planning software does not take into account the preventive maintenance schedules, so a machine can be assigned to be active during the preventive maintenance scheduled stoppage time

1.4 Cyber-Physical Systems in Manufacturing In the paradigm of the manufacturing industry, the adoption of CPS is defined as CPPS and the goal is to interconnect available digital technologies with physical resources associated with manufacturing processes [11]. This idea is part of Industry 4.0, the fourth Industrial Revolution, a broad concept and a new trend in manufacturing [12, 13]. The term was first introduced in 2011, in Germany, at the Hannover Fair, as “Industrie 4.0” [14, 15], and is mainly represented by CPS, IoT and Cloud Computing [16]. The development of these concepts lies in the necessity to respond to the diversification of consumer needs, the irregular demand fluctuations and the small-quantity batch productions [17]. To tackle the typical manufacturing method of mass production, some degree of innovation, in the form of flexible and predictive production, must be considered [18]. Thus, a CPS includes embedded systems (such as equipment and buildings), logistics, coordination, internet services and management processes [19]. A CPPS varies from a CPS in the sense that it is a manufacturing-centred version of a CPS that fuses computer science, ICT (Information and Communications Technology) and manufacturing-science technology [20]. Both these concepts are extremely promising technologies of Industrie 4.0 and essential components of a smart factory [21]. Their implementation aids in various decision-making processes by predicting the future based on past and present situations [22, 23]. Currently, works like [21] shed light on the benefits of a CPPS regarding the improvement of processes with the use of real-time information. This information, acquired from the factory shop floor, is typically pre-processed in a module with Big Data (or any module capable of handling the data context) configurations. The information that is deemed viable, or as requested, is then used for the coordination and quality/productivity detection. With these parameters, the data can be compared against the Key Performance Indicators (KPIs). These indicators are built in parallel, simulated and used by the coordination sub-module to handle the creation or re-configuration of the manufacturing schedule and make other important decisions about the manufacturing process. All these processes can usually be visualized in a dashboard with a graphical user interface, to provide feedback to, and collect inputs from, the factory’s personnel. The solution presented in this experimentation has a specific focus, without losing the sense of the complete architecture, on bringing the scheduling process to the factory’s shop floor, in order to respond to time constraints and unforeseen variability of the manufacturing process. The objective of this is to quickly provide an adaptation that will better suit the necessities and the production priorities of a contemporary SME.
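The comparison of shop-floor data against KPIs mentioned above can be illustrated with a minimal sketch. The KPI name (first-pass yield), the target value and the piece counts below are hypothetical, and the rule "re-schedule when any KPI misses its target" is an assumption made here for illustration; it is not the logic of the coordination sub-module described in [21].

```python
def kpi_report(measured, targets):
    """Return {kpi_name: (value, target, ok)}.

    Every KPI in this sketch is assumed to be 'higher is better'.
    """
    return {name: (measured[name], target, measured[name] >= target)
            for name, target in targets.items()}


def needs_rescheduling(report):
    """Flag a re-scheduling request if at least one KPI misses its target."""
    return any(not ok for _, _, ok in report.values())


# Hypothetical piece counts reported by the shop floor for one shift.
counts = {"polished": 940, "repolished": 40, "wasted": 20}
total = sum(counts.values())

# First-pass yield: share of pieces that were polished correctly the first time.
measured = {"first_pass_yield": counts["polished"] / total}
targets = {"first_pass_yield": 0.95}  # assumed target

report = kpi_report(measured, targets)
```

With these sample numbers the yield is 940 out of 1000 pieces, which is below the assumed target, so `needs_rescheduling(report)` returns `True`.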
2 Model-Driven Paradigm for the CPPS In this experiment, in order to guide the development process and to select the appropriate modelling solutions (from business to technical levels), the Model Driven Service Engineering Architecture (MDSEA) will be followed [24, 25]. The MDSEA architecture takes into consideration the various aspects of the manufacturing system, not only the real-world business approaches, the technical parameters and the human interaction, but also the life cycle of the system, the product and the services:
– Business System Model (BSM): At this level, the models represent the real world (user point of view). BSM describes the process of the manufacturing system at a conceptual level, independent of any specifications concerning human or technical resources.
– Technology Independent Model (TIM): This model level is obtained by extracting and transforming the concepts identified at the previous level (BSM) in three domains: IT (IT components/artefacts), Physical Means (Machines or tools development) and Human/Organization (competences/skills and departments).
– Technology Specific Model (TSM): The TSM is the detailed level of modelling. Detailed specifications should be defined here depending on the specific technologies in order to develop or provide software, recruit or train personnel, or purchase machines or means of production.
The MDSEA architecture (Fig. 1) is supported by a modelling tool called Manufacturing System Toolbox (MSTB), initially developed in the frame of the MSEE European project [26]. The tool can be used by enterprises willing to develop a new service/system or improve an existing one, within a single enterprise or in a supply chain [27]. MSTB is considered an intuitive tool that can be used by the end-user after an initial training and does not require the permanent support of a consultant.
Fig. 1 MDSEA architecture [28]
A business process can be seen as a set of internal activities performed to serve a customer [29]. It is characterized by being a purposeful activity, carried out collaboratively by a group that often crosses functional boundaries, and it is invariably driven by outside agents or customers [30]. This means that, to accomplish a business process, especially in manufacturing, it is necessary to involve several partners, or user profiles, and manage knowledge across different boundaries of the enterprise [31], much like in the Liquid-Sensing Enterprise (LSE) [32]. To better align the implementation and support of a process lifecycle, a separation of requirement concerns is in order, starting from the business goals, down to the consequent physical means to fulfil them [33]. This can be easily accomplished if a model driven approach is applied. Thus, instead of writing the code directly, such an approach enables services to be first modelled with a high level of abstraction in a context-independent way. The main advantages of applying model driven approaches are the improvement of the portability, interoperability and reusability through the architectural separation of concerns [34]. The work presented here was inspired by [33], which adapted the model driven concept to manufacturing services development, with the definition of the MDSEA concept. It followed the Model Driven Architecture (MDA) and Model Driven Interoperability (MDI) principles [35], supporting the modelling stage and guiding the transformation from the business requirements (BSM) into detailed specification of components that need to be implemented (TSM). This approach proposes that each model, retrieved by the model transformation from an upper-level model, should use a dedicated service modelling language, which represents the system containing the level of description needed.
MDSEA was the chosen method because it is already oriented to the development of services for business processes and identifies the concepts of IT, Physical Mean and Human, used to describe the processes. For such an approach to be successful, the three levels of abstraction provided by the MDSEA architecture (Fig. 1) were adapted, with some degree of specification, to correlate with the objectives of this work’s experimentation, as follows (and as summarized in Fig. 2).
Business System Model (BSM) At this level, the business case of the use-case scenario is specified at a conceptual level, with no regard for specifications. It provides an abstraction level that envisages meta-information about the components (actors, resources, etc.) and the activities and the world in which they are active [e.g. “scheduling maintenance” is from the Digital World (DW) and “configuring machine” is from the Real World (RW)]. In the scope of this work and based on the GRAI integrated methodology [36], the manufacturing system is decomposed in three sub-systems (Fig. 2): controlled (physical) system, control (decisional) system and the information system. In a brief description, the controlled system transforms the input (materials and information) into outputs (new information, products or services) to be delivered to the customers. The control system manages the mentioned transformation based on the received feedback, from the controlled system, and delivers actions or adjustments accordingly. The information system includes information from the physical system and from the customers, suppliers and other
Fig. 2 Manufacturing sub-systems [28]
stakeholders, and manages the information flow between the control and controlled systems. These systems are modelled using the formalisms/languages supported by the MSTB tool. It allows the creation of the physical system using the Extended Actigram Star (EA*), to be later transformed, using model transformation, to a BPMN diagram at TIM level. This diagram, similarly to the described physical sub-system, includes the processes that comprise the manufacturing activities of the company to transform the inputs (materials and information) into outputs (new information, products or services). The needed decisional modelling is made using GRAI grids and GRAI nets, and the information system diagrams are described and issued as UML (Unified Modelling Language).
Technology Independent Model (TIM) The transformation at the BSM level produces a BPMN diagram containing processes that are refined at this level. The information that allows this refinement is previously retrieved from the factory and is based on the IT components, machines, tools, and human/organisational competences and skills. The events, activities (tasks), gateways and flows (from the diagram) are thoroughly specified to match the necessities and logical continuity of the manufacturing process. The services to be provided are specified and aligned with the diagram process that is deployed in the next level.
Technology Specific Model (TSM) The specifications come into play at this level, depending on the technologies needed for the experimentation. The services provided by the different technologies are aligned and connected to (or integrated into) the diagram process. The diagram process (BPMN) is deployed in an engine that allows the process execution, considering the instantiation and parameterization of the identified activities with the services needed.
The parametrization takes place in the engine, the Activiti tool (in the case of this experimentation) and, after that, the process is ready for execution, with the intent of following the use-case manufacturing process in real-time (Fig. 3). The usage of model driven approaches applied to process modelling is not a novelty per se. Several related works can be found in existing literature. In the work
Fig. 3 MDSEA process design methodology for CPMSinCPPS
presented in [37], the authors propose a methodology for classifying and specializing generic business processes. With that methodology, the authors aim to derive, from a catalogue of generic processes and process specialization operators, an enterprise-specific process, which corresponds closely to MDA’s Computation Independent Models (CIM). In [38], the authors propose a framework for Inter-Organizational Business Processes based on MDA that considers three levels in a top-down manner: business (organizational), conceptual (logic) and technical (execution). Other relevant works are [39–41]. Based on the validation of the presented applications of MDA techniques in process modelling, the authors consider that the experimentation design benefits from the methodology behind MDA and MDSEA, in order to accelerate the transition of the traditional enterprise to the “internet-friendly” and context-aware organization that is envisaged in the BEinCPPS project.
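The TSM-level step described earlier in this section, in which the deployed BPMN process is instantiated and parameterized in the Activiti engine, can be illustrated by building the request body for starting a process instance. This is a sketch under assumptions: it follows the body shape documented for the Activiti REST resource `runtime/process-instances` (a `processDefinitionKey` plus a list of name/value variables), which should be checked against the deployed Activiti version; the process key `polishingProcess` and the variable names are hypothetical, and no request is actually sent.

```python
import json


def start_process_payload(process_key, variables):
    """Build the JSON body for starting a BPMN process instance.

    process_key: key of the deployed process definition (hypothetical here).
    variables:   plain dict, converted to the name/value list form.
    """
    return {
        "processDefinitionKey": process_key,
        "variables": [{"name": name, "value": value}
                      for name, value in variables.items()],
    }


# Hypothetical parameterization of a polishing process for one production order.
payload = start_process_payload(
    "polishingProcess",
    {"orderId": "PO-001", "productType": "fork"},
)

# Serialized body, as it would be sent in an HTTP POST by a client.
body = json.dumps(payload)
```

Keeping the payload construction separate from the HTTP call makes the parameterization easy to test without a running engine.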
3 CPMSinCPPS Development Framework The solution presented in this work is divided into three different layers, as can be seen in Fig. 4. The design layer is where the user’s knowledge gains shape and results in a representation of the factory’s production processes. The runtime layer is where the execution of the modelled processes occurs, enabling the monitoring of the production. The business layer includes the process requirements and the end-users. In this specific case, the presented process is a polishing activity, relevant for the use-case scenario presented further in this work (Sect. 5). The polishing machine is relevant for the company due to the lack of automatic scheduling (the previous schedules
Fig. 4 CPMSinCPPS architecture
were made manually), the need for long periods of maintenance or setup (to accommodate different kinds of products) and the fact that all of the company’s products must pass through this machine, creating a production bottleneck. In Fig. 5, the flow for the process development is presented. Following the MDSEA approach described above, it begins with the definition of the business case. After this, the specification of services is possible, along with the creation of the Actigram model (BSM). In this model, the company is defined, along with the resources, services and processes. The manufacturing process is specified (at a business level) and the process to be monitored is created and transformed into BPMN. Through that, the service implementation and BPMN process refinement (TIM) is possible, to allow further parameterization and deployment of the process (TSM) in a BPMN engine, which, in the case of this experimentation, is the Activiti tool. The information that feeds the process engine is retrieved from the factory shop floor, using the sensors from the IoT devices. The solution provides relevant feedback about the polishing activity and, based on the production orders (and the type of products to be produced) that must be completed, creates an optimized schedule for a time frame of one week, which results in an optimization of the production. The architecture presented in Fig. 4 translates into two different stages (Fig. 6): the design time and runtime.
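To illustrate why the product type matters when scheduling the polishing machine, the following sketch sequences a week of orders so that orders of the same product type run back to back, which reduces the number of setups. It is a simple grouping heuristic written for illustration, not the optimization method of the CPMSinCPPS Scheduler module; the 2-hour setup time, the order data and the field names are all assumptions. The heuristic deliberately ignores lateness, which a real scheduler would have to trade off against setups.

```python
from itertools import groupby

SETUP_HOURS = 2.0  # assumed changeover time whenever the product type changes


def sequence_orders(orders):
    """Group orders by product type; run groups by earliest due day."""
    by_type = sorted(orders, key=lambda o: o["type"])
    groups = [list(g) for _, g in groupby(by_type, key=lambda o: o["type"])]
    groups.sort(key=lambda g: min(o["due_day"] for o in g))
    return [o for g in groups for o in sorted(g, key=lambda o: o["due_day"])]


def count_setups(sequence):
    """One setup is needed each time the product type differs from the previous order."""
    setups, previous = 0, None
    for o in sequence:
        if o["type"] != previous:
            setups += 1
            previous = o["type"]
    return setups


def total_hours(sequence):
    """Processing hours plus setup hours for the given sequence."""
    return sum(o["hours"] for o in sequence) + count_setups(sequence) * SETUP_HOURS


# Hypothetical production orders for one week.
orders = [
    {"id": "PO-1", "type": "fork", "hours": 6, "due_day": 2},
    {"id": "PO-2", "type": "knife", "hours": 4, "due_day": 1},
    {"id": "PO-3", "type": "fork", "hours": 5, "due_day": 4},
    {"id": "PO-4", "type": "knife", "hours": 3, "due_day": 3},
]
plan = sequence_orders(orders)
```

For this sample, running the orders as received needs four setups (26 machine hours in total), while the grouped plan needs two (22 hours).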
Fig. 5 Flow of the process development
[Figure content: two panels, “Design Time” and “Runtime”. Design time: Process Modeller Module (SLM Toolbox, SimStudio, Activiti Editor) used by the Business User “Company Manager”, the System Engineer “Production Manager” and the Technical User “Developer” across the BSM, TIM and TSM levels. Runtime: Data Collector Module, FIWARE Context Broker, Scheduler Module (Schedule Optimizer), Capacity Planning Module (Capacity Calculator), Activiti Engine as process orchestrator, operator interfaces to start and conclude polishing, shop floor and ERP, connected through REST requests and JSON data.]
Fig. 6 Design time and runtime procedures in CPMSinCPPS
3.1 Design Time The design time is where the user’s knowledge gains shape and results in a representation of the factory’s production processes. This knowledge is provided by the different types of users (of the different roles in the company) that have their own understanding of the factory and production stage that they’re responsible for. It’s when these different perceptions come together, that the modelling activity can bring
real value and feedback to enable better strategic planning and provide better integration, coordination and control during the overall process of the production execution. With the intent of creating a valid and viable proof of concept, the initial model resulted in a BPMN diagram that serves the purpose of providing the different production steps to be modelled and simulated during the execution of the factory’s production. The model created within MSTB, as can be seen in Fig. 7, is exported to Activiti, a tool for executing business processes included in the BEinCPPS project. In the Activiti editor, the exported simplified version of the BPMN process (Fig. 8) needs to be refined to allow it to be executed properly. Several fields, inside the tool,
Fig. 7 MSTB diagram for polishing machine
Fig. 8 BPMN process in Activiti
determine some aspects of its behaviour and allow additional logic to be inserted, through a JAR (Java Archive) file, to be carried out during the process, e.g. e-mail notifications about the status of a certain machine. After the BPMN process is ready, the design time phase is finished. The modelled process can be executed along with the factory’s production, and the Activiti tool engine functions as process orchestrator, coordinating the software and hardware components of the architecture. This is possible due to an IoT device deployed on the polishing machine. With this, the company administrator is able to get information, in real-time, such as when the activities are completed, the number of polished, re-polished and wasted cutlery pieces and the production/setup times for the machines.
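The kind of information listed above (counts of polished, re-polished and wasted pieces, plus production and setup times) can be derived from a stream of timestamped machine events. The sketch below shows one way to do this aggregation; the event names, the tuple format and the sample data are hypothetical and do not reflect the actual messages produced by the IoT device on the polishing machine.

```python
from collections import Counter

PIECE_EVENTS = ("polished", "repolished", "wasted")


def summarise(events):
    """Aggregate (timestamp_seconds, kind) events.

    Returns (piece counts, seconds spent in 'setup' and 'production').
    """
    counts = Counter()
    seconds = {"setup": 0.0, "production": 0.0}
    phase, phase_start = None, None
    for ts, kind in events:
        if kind in PIECE_EVENTS:
            counts[kind] += 1
        elif kind in ("setup_start", "production_start"):
            # Remember which phase is running and when it began.
            phase, phase_start = kind.split("_")[0], ts
        elif kind == "phase_end" and phase is not None:
            seconds[phase] += ts - phase_start
            phase = None
    return dict(counts), seconds


# Hypothetical event stream: 30 min of setup followed by 30 min of production.
events = [
    (0, "setup_start"), (1800, "phase_end"),
    (1800, "production_start"),
    (1900, "polished"), (2000, "polished"),
    (2100, "repolished"), (2200, "wasted"),
    (3600, "phase_end"),
]
counts, seconds = summarise(events)
```

The resulting dictionaries are the sort of values that could then be forwarded to the process orchestrator or shown to the company administrator.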
3.2 Runtime The runtime phase consists, as mentioned previously, in following the real-time production process. This is possible with the usage of the Activiti engine and the data fed to it by the IoT node deployed in the polishing machine. This IoT device is integrated in the developed data collector module, which is detailed further in this work. To avoid any complication or waste of time in the production process, an additional tool was developed to be used by the factory workers. It consists of a very simple and intuitive user interface that allows the operator of a certain machine to carry out their work without significant inconveniences while providing relevant real-time information. This interface runs on an Android mobile device (tablet) placed on the machine and allows the operator to input information, such as choosing the next activity to be performed by the machine or marking when a production step begins or ends (Fig. 9), and to check the production schedule for the machine, as illustrated in Fig. 10. With the information readily available in the Activiti engine, decisions can be made based on real-time data. This data is used by the Scheduler and Capacity
Fig. 9 User interface screen for confirming the end of a production step (unloading the machine)
Empowering SMEs with Cyber-Physical Production Systems …
Fig. 10 User interface screen for checking the machine’s production schedule (allows confirming or changing the next activity to be performed)
Planner modules (explained in the next chapters) with the objective of optimizing the polishing process. Following this brief description of the solution, the next chapter explains the relevant components of the CPMSinCPPS architecture in more detail.
4 CPMSinCPPS Architecture
Figure 4, presented earlier, depicts the different layers, modules and components that make up the solution. The following subchapters give a functional description of the main architectural modules that aim to validate the process optimization provided by this experiment.
4.1 Process Modeller Module
Process modelling techniques usually provide simplified representations of complex processes and clarify the key elements of the modelled object. In manufacturing, they have proven very useful for understanding the overall organization of the production process. As already introduced, the CPMSinCPPS Process Modeller
Fig. 11 Key elements of the Process Modeller module (final prototype)
is a key toolset that supports users in the design and tactical planning of CRISTEMA’s production process. The usage of modelling solutions should follow a structural architecture and a methodology that cover different points of view (e.g. business or technical, static or dynamic) and consider end-users at different layers of the enterprise. For this reason, the Process Modeller has been implemented following the principles of MDSEA, envisaging modelling at the business level (BSM), the technology-independent level (TIM) and the technology-specific level (TSM) [24]. However, the CPMSinCPPS Process Modeller (Fig. 11) is not only about modelling and understanding the organization and its production process. It supports the design layer of the architecture and enables the application of the modules for simulation purposes, creating a feedback loop that gives decision makers insight into what might happen if the process changes (Table 3).
4.2 Simulation
To observe and verify the dynamics of the behaviour specified in EA* and BPMN models, a simulation model has been introduced [42]. For that purpose, the MSTB has been extended with a DEVS model editor developed in conformity with the specifications presented in [43]. This extension allows the generation of a DEVS coupled model through the template instantiation of DEVS atomic models and their coupling. The tool can run a simulation to observe the evolution of performance indicators, such as the time needed to complete a service process. For pedagogic purposes, it can also show a step-by-step animation, starting from the first active state in the models and continuing until the last active ones are reached. The animation of the DEVS diagrams is based on the results obtained from the simulation. It indicates the active states and the models that represent BPMN activities with associated resources. The step-by-step animation is visualized through timed colour changes, as indicated in Fig. 12. The extended MSTB provides a graphical wizard for the creation of new diagrams. The user is able to create DEVS diagrams in two ways: either to start from
Table 3 Functionalities/components of the modeller
| Functionality | Level | Description | Component |
| --- | --- | --- | --- |
| Process modelling and design | Business level (BSM) | Modelling of business processes in Extended Actigram Star (EA*) language | MSTB |
| Decision modelling | Business level (BSM) | Modelling of the decision structure in GRAI GRID | MSTB |
| Model transformation | BSM → TIM | Transformation of EA* models to initial BPMN models | MSTB |
| Process modelling and design | Technology-independent level (TIM) | Enrichment and further detail of the business processes in BPMN language | MSTB |
| Data modelling | Technology-independent level (TIM) | Representation of the information subsystem using UML class diagrams | MSTB |
| Model transformation | TIM → TSM | Transformation of BPMN models to Activiti compliant models | MSTB |
| Model transformation | TIM → TSM | Transformation of BPMN models to DEVS models for simulation | MSTB |
| Process modelling and design | Technology-specific level (TSM) | Enrichment and parameterization of the business processes, preparing them for their execution by an orchestrator (BPMN language) | ACTIVITI Editor |
| Process simulation | Technology-specific level (TSM) | Simulation and testing of process workflows and events | SimStudio |
scratch, or to generate a new diagram, either from BPMN through the transformation process or from an existing EA* model.
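As a rough illustration of what a DEVS atomic model for a BPMN activity involves, the following sketch uses plain Python rather than MSTB or SimStudio. The class name `ActivityModel`, the two states and the durations are assumptions made for illustration only; the coupling is reduced to a fixed sequence of activities, which is far simpler than a full DEVS coordinator.

```python
import math


class ActivityModel:
    """Hypothetical DEVS-style atomic model of one BPMN activity."""

    def __init__(self, name, duration):
        self.name = name
        self.duration = duration  # processing time of the activity
        self.state = "idle"       # either "idle" or "active"

    def time_advance(self):
        # ta(s): how long the model stays in its state without input
        return self.duration if self.state == "active" else math.inf

    def external_transition(self):
        # delta_ext: an incoming token activates the activity
        self.state = "active"

    def output(self):
        # lambda(s): emitted just before the internal transition
        return f"{self.name} done"

    def internal_transition(self):
        # delta_int: the activity finishes and becomes idle again
        self.state = "idle"


def simulate_sequence(models):
    """Run the activities one after another; return (total_time, trace)."""
    clock = 0.0
    trace = []
    for model in models:
        model.external_transition()       # token arrives
        clock += model.time_advance()     # wait for ta(s)
        trace.append((clock, model.output()))
        model.internal_transition()
    return clock, trace


# Illustrative durations in minutes (not measured data).
models = [
    ActivityModel("setup", 15.0),
    ActivityModel("polishing", 40.0),
    ActivityModel("unloading", 5.0),
]
total, trace = simulate_sequence(models)
print(total)
for time, event in trace:
    print(time, event)
```

The total time returned corresponds to the kind of performance indicator mentioned above (time to complete a process), and the trace is the sort of information a step-by-step animation could replay.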
Fig. 12 User interface for DEVS M&S at TIM level in MSTB
4.3 Capacity Planner Module
This section presents the Capacity Planner module, which responds to the business requirements of the use-case scenario. The purpose of Capacity Planning is to check the adequacy of working capacities in relation to the workload resulting from the production plan (based on the production orders) and to propose corrective/preventive actions in case of over- or underload. In addition, Capacity Planning increases awareness of the production status and, consequently, the agility of production. For instance, in the case of an urgent or unexpected order, the Production Supervisor will be able to evaluate the feasibility of that order and its expected delivery date. The Data Collector module is envisaged to provide data on the production status in order to update the Capacity Planning. The general capacity planning concepts and functionalities that allow the implementation of the module (Fig. 13) in CPMSinCPPS are presented in this sub-section.
Fig. 13 Key elements of the capacity planner module (final prototype)
Routing File and Operations
The routing file describes the sequence of production activities. In the CPMSinCPPS use-case, the “rough routing” is globally represented in the Process Models (results from the Process Modeller module), while the “detailed routing” is defined for each production order and is available in the Production Software.
Workload
There are different ways to express the workload. The most natural one is to express it as a quantity of products. However, this is not possible when several product references are produced by the same resource, so a common unit must be defined. The most usual unit is time, in which case the workload is expressed as occupation time. The total occupation time must be considered: production time, set-up time, etc. For a period (t), the workload (L) is the sum of the loads generated by each product passing through the considered resource during the considered period:
$$L_t = \sum_{p} q_{pt} \, u_p + s_t \qquad (1)$$
where
$L_t$ is the workload during period $t$,
$q_{pt}$ is the quantity of product $p$ manufactured by the resource during period $t$,
$u_p$ is the unit time of product $p$,
$s_t$ is the set-up time during period $t$.
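Reading Eq. (1) as a sum over the products handled by the resource, as the preceding sentence describes, the workload of one period can be computed as in the sketch below. The function name, the dictionary layout and the figures are illustrative assumptions, not data from the use-case.

```python
def workload(quantities, unit_times, setup_time):
    """Workload L_t of one resource for one period, in time units.

    quantities: {product: q_pt}, units produced during the period
    unit_times: {product: u_p}, time needed per unit of each product
    setup_time: s_t, total set-up time during the period
    """
    production_time = sum(q * unit_times[p] for p, q in quantities.items())
    return production_time + setup_time


# Hypothetical figures in minutes (not CRISTEMA data).
quantities = {"fork": 200, "spoon": 160}
unit_times = {"fork": 0.5, "spoon": 0.25}
load = workload(quantities, unit_times, setup_time=30)
print(load)  # 200*0.5 + 160*0.25 + 30 minutes
```

Expressing every product in occupation time is what makes the loads of different product references addable on the same machine.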
In the CPMSinCPPS use-case, the workload is calculated based on the data provided by the Production Software, mainly indicating the active orders, quantities to produce, related production activities and operation times (start and end time) for each activity.
Capacity
In the CPMSinCPPS use-case, the capacity is defined by the Production Supervisor based on his knowledge of the production machinery, working days, and operator/machine schedules. The resources are the production machines and the capacity is expressed in a time unit.
Occupancy Rate
The occupancy rate is the result of comparing the workload with the capacity. It is expressed as a percentage for each period (t):
$$\text{Occupancy Rate}(t)\ (\%) = \frac{\text{Workload}(t)}{\text{Capacity}(t)} \times 100 \qquad (2)$$
In the CPMSinCPPS use-case, based on the Occupancy Rate (Fig. 14), the Production Supervisor can envisage corrective/preventive actions. There are several ways to deal with an overload. One is to move to another period all manageable events that decrease the capacity (vacation, preventive maintenance, etc.). Another is to take advantage of underloads to absorb overloads (i.e. load smoothing).
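A minimal sketch of Eq. (2), applied to a few periods to spot overloads and underloads, is given below. The period labels and the hourly figures are hypothetical; in the use-case the capacity would come from the Production Supervisor and the workload from the Production Software.

```python
def occupancy_rate(workload, capacity):
    """Occupancy rate in percent for one period, following Eq. (2)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return workload / capacity * 100


# Hypothetical weekly figures in hours: period -> (workload, capacity)
periods = {"W1": (50.0, 40.0), "W2": (30.0, 40.0), "W3": (40.0, 40.0)}

rates = {name: occupancy_rate(w, c) for name, (w, c) in periods.items()}
overloaded = [name for name, rate in rates.items() if rate > 100]
underloaded = [name for name, rate in rates.items() if rate < 100]

print(rates)
print("overloaded:", overloaded, "underloaded:", underloaded)
```

With these figures, the spare capacity of the underloaded week is a candidate for absorbing the excess of the overloaded one, which is the load smoothing idea mentioned above.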
Fig. 14 Capacity planning and occupancy rate
To build the Capacity Planner, CRISTEMA supplied a portion of the history of production orders, including the operation times of the four main production activities related to the use-case (i.e. cutting, teeth trimming, stamping, and polishing). Figure 15 illustrates the Capacity Planning of the cutting activity in June 2017. In this figure, taking production order n° 3366 as an example, the cutting activity began on June 1st at 00:00 (= 8 a.m.) and finished at 2:53. The working time (2:53) represents the occupation of the cutting machine. For the moment, the Capacity Planning is built with Excel functions, due to the lack of a suitable open source tool (this is under investigation). In fact, open tools such as frePPLe exist but, despite the advantages of this tool and due
Fig. 15 Example of capacity planning for the cutting activity (in June 2017) in the frame of the use-case
to a multiplicity of modules (ERP), the tool does not meet the needs of the CPMSinCPPS end-user. Therefore, considering the importance of this module, which is related to several requirements, the search for a tool is ongoing and, if necessary, a custom development will be performed.
4.4 Scheduler Module
This module is a tool that addresses the typical scheduling problem, i.e. an optimization problem that involves assigning resources to perform a set of tasks at specific times. In this solution, the scheduler module automates the creation of an optimized schedule for a specific machine, which helps the company’s Production Responsible to set the sequence of production orders for the workweek. It is important to mention that, in the scope of this use-case, resource optimization is not considered, since only one polishing machine is available. To better determine the sequence of activities, some rules were defined, prioritizing the urgent activities and considering the different types (or families) of products. The types, in this case, are related to the shape of the cutlery, and each type requires a specific re-configuration of the polishing machine. To optimize the timeline, the scheduling tool groups together as many activities of the same type as possible, to avoid machine preparation/re-configuration and thus save time. Through its integration with the process orchestrator, this tool maximizes the efficiency of the polishing process by swiftly re-generating the schedule to accommodate necessary readjustments in the face of delays or maintenance needs.
The implementation of the scheduler module can be divided into three subcomponents: an optimizer, a front-end application and an API (Application Programming Interface). The optimizer is the core engine of the scheduler and is responsible for generating the optimized schedule, based on the provided inputs and applicable constraints, using strategies such as maximizing machine usage and minimizing setup time between orders. The front-end application acts as the dashboard for the scheduler, where the end-user can view the entire schedule and the status of the polishing machine for each product (Fig. 16). The possible states for each product are Pending, Progress or Completed. The schedule, presented as a Gantt chart, provides information about the start and end times of each polishing activity. The API is a REST API for interaction with the orchestrator module, which comprises the Activiti engine. It provides the resulting schedule based on inputs such as an Excel file with new production orders, the completion of an activity (the API marks it as complete and responds with the next activity for the user interface to display) or the need for maintenance (a maintenance activity is added to the schedule in the next feasible timeframe).
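The two sequencing rules described above (urgent activities first, then activities of the same product type grouped together) can be illustrated with a simple greedy heuristic. This is only a sketch under assumed data: the dictionary keys `id`, `type` and `urgent` and the function names are hypothetical, and the actual optimizer of the scheduler module is not reproduced here.

```python
def build_sequence(orders, current_type=None):
    """Order polishing jobs: urgent first, then grouped by product type.

    orders: list of dicts with the hypothetical keys "id", "type", "urgent".
    current_type: product type the machine is currently configured for.
    """
    urgent = [o for o in orders if o["urgent"]]
    normal = [o for o in orders if not o["urgent"]]
    # Within each block, jobs matching the configured type come first and
    # the remaining jobs are grouped by type (list.sort is stable).
    urgent.sort(key=lambda o: (o["type"] != current_type, o["type"]))
    last_type = urgent[-1]["type"] if urgent else current_type
    normal.sort(key=lambda o: (o["type"] != last_type, o["type"]))
    return urgent + normal


def count_setups(sequence, current_type=None):
    """Count the machine re-configurations needed to run a sequence."""
    setups, last = 0, current_type
    for order in sequence:
        if order["type"] != last:
            setups += 1
            last = order["type"]
    return setups


orders = [
    {"id": 1, "type": "spoon", "urgent": False},
    {"id": 2, "type": "fork", "urgent": False},
    {"id": 3, "type": "spoon", "urgent": True},
    {"id": 4, "type": "fork", "urgent": False},
    {"id": 5, "type": "spoon", "urgent": False},
]
sequence = build_sequence(orders)
print([o["id"] for o in sequence])
print(count_setups(orders), "setups before,", count_setups(sequence), "after")
```

Re-running `build_sequence` with the remaining orders after a delay or a maintenance request mirrors, in a very simplified way, the schedule re-generation triggered through the REST API.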
Fig. 16 Front-end application for the scheduler module
4.5 Process Orchestrator
The purpose of the Process Orchestrator (see Figs. 4 and 6) is to standardize, unify and automate the production processes of the factory, by aligning the various services available to perform the polishing process. The Process Orchestrator runs on the Activiti Engine and orchestrates the data received through the Data Collector Module services, which are used to decide the next steps. It also connects the Scheduler Module and the Capacity Planner Module, enabling them to function properly and, ultimately, to achieve the objectives of this architecture. The Orchestrator also coordinates the information that is presented to the company managers, through user interfaces or alerts (e-mails).
4.6 Data Collector Module
The data collector module integrates the FIWARE context broker and deploys a set of IoT nodes on the polishing machine. The objective is to increase the CPPS capabilities by collecting production data and providing it to modules such as the scheduler and the capacity planner through the process orchestrator, which can then relay feedback to the company and production managers. The IoT is an emerging technological concept that aims to combine consumer products, industrial components and other everyday objects with Internet connectivity and powerful data analytic capabilities that can transform the way we work and live [44]. Combined with CPPS, the objective of the IoT is to optimize functionalities, as every piece of extractable information becomes a means of analysing and computing the
functioning process, to enhance its context functioning and to provide new purposes that emerge when a certain object is connected to an intelligent network [45]. Data collection is an important step in establishing an efficient and dynamic industrial solution based on an intelligent system in contemporary IoT environments. Its main function is to fill the gap between devices and information systems. It comprises the acquisition (heavily based on the low-layer devices of the network), processing and management of raw field data retrieved from the physical world, turning it into usable information on which applications can base decisions.
The objective of the data collection module, in this work, is to comply with the factory’s business requirements by retrieving data from the factory shop floor and production systems, processing it and providing relevant information to the higher-level modules of the architecture. This data collection process uses the existing systems in CRISTEMA’s factory production process, performing basic ETL (Extract, Transform, Load), with additional middleware processes in the cloud and on-site hardware (sensors and nodes) for better data harmonization, filtering and extraction into the internal data structure of the CPMSinCPPS architecture. The CPMSinCPPS middleware technology is located in the cloud. This approach is known as cloud computing: the information goes directly from the IoT devices and nodes into the cloud for processing. It is less expensive, suitable for contemporary SMEs, and allows more complex processes, but requires the allocation of some processing time in the data servers/consumers.
As mentioned before, the data collector module uses field data retrieved from the deployment environment (order status and polishing functioning) to provide structured planning and scheduling for the polishing machine. Following the architecture presented in Fig. 4 and the implementation of the technologies presented in Fig. 17, the field data retrieved from the IoT devices, which have Internet capability, is forwarded to the FIWARE Context Broker submodule using the lightweight Pub/Sub messaging services of Mosquitto (M2M open source software for the MQTT (www.mqtt.org) connectivity protocol). This messaging is itself handled by the CPPS Publishing Services (IoT Agent), which acts as a gateway between devices communicating over the MQTT protocol and NGSI brokers, in order to respect the structure already defined in the Orion Context Broker and in the relevant NoSQL MongoDB databases holding the collected information (in JSON format). FIWARE Orion is responsible for the lifecycle of context information, including storage, updates, queries, registrations and subscriptions of CRISTEMA’s devices. From this context information, it is possible to easily retrieve and infer production data (e.g. setup and idle times, production cycles and maintenance requirements).
The information about production orders retrieved from CRISTEMA’s existing Excel spreadsheets of the Legacy ERP also plays an important role in this data collection process, providing the necessary information for other CPMSinCPPS modules. This part of the data collector module runs in parallel to the devices/FIWARE process. A Legacy Systems Importer provides the data structuring to feed the scheduler module, which among other fields needs to use the priority and product ID fields of
Fig. 17 Architecture implementation and technologies used
the production order, along with the hardware restrictions of the polishing machine, to optimize the scheduling of the available production orders. The general communication process for the legacy data is relatively simple: the legacy systems importer simply adapts the data for the scheduler. The process for the field data originating from the IoT devices has a more intricate communication methodology and is presented in Fig. 18. In both cases, the data collector feeds data into the architecture’s process orchestrator (which uses the Activiti engine).
This communication process provides the interoperability that is essential to the functioning of the data collector module, relying on the CPPS Publishing Services (IoT Agent) capabilities of device management, configuration exchange, information forwarding and data structuring. Initially, the CPPS Publishing Services (IoT Agent) defines the structure for the data in the Orion Context Broker and then performs discovery of the existing sensors, subscribing to MQTT topics that allow communication with the devices. The sensors, connected to their respective nodes and pre-configured for communication, subscribe to the CPPS Publishing Services (IoT Agent) topics and reply with their identification. After that, the CPPS Publishing Services (IoT Agent) sends the full data collection configuration to the devices, which then begin to publish the acquired field data (following the parameters of the configuration). This data is processed in the CPPS Publishing Services (IoT Agent) and sent to the Orion Context Broker. In Orion, the information is stored in MongoDB NoSQL databases for further querying by the process orchestrator (Activiti engine).
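The processing step performed between the MQTT broker and the context broker can be pictured as a translation from a device message into an NGSI v2 style entity, in which each attribute carries a `value` and a `type`. The sketch below is an assumption-laden illustration: the topic layout `<site>/<device_id>/attrs`, the entity type `PolishingMachine`, the identifier scheme and the attribute names are all hypothetical and are not taken from the CPPS Publishing Services implementation.

```python
import json


def to_ngsi_entity(topic, payload, entity_type="PolishingMachine"):
    """Map one MQTT message to an NGSI v2 style entity (normalized form).

    Assumed (hypothetical) topic layout: "<site>/<device_id>/attrs".
    payload: JSON object string with the measured attributes.
    """
    parts = topic.strip("/").split("/")
    if len(parts) != 3 or parts[2] != "attrs":
        raise ValueError(f"unexpected topic: {topic}")
    device_id = parts[1]
    measures = json.loads(payload)

    entity = {
        "id": f"urn:ngsi-ld:{entity_type}:{device_id}",  # illustrative id scheme
        "type": entity_type,
    }
    for name, value in measures.items():
        # bool must be tested before int, since bool is a subclass of int
        if isinstance(value, bool):
            attr_type = "Boolean"
        elif isinstance(value, (int, float)):
            attr_type = "Number"
        else:
            attr_type = "Text"
        entity[name] = {"value": value, "type": attr_type}
    return entity


message = '{"cycles": 120, "running": true, "status": "polishing"}'
entity = to_ngsi_entity("cristema/polisher01/attrs", message)
print(json.dumps(entity, indent=2))
```

A JSON document of this shape is the kind of context information a broker such as Orion keeps per device, which the orchestrator can later query for cycles or status.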
Fig. 18 IoT data collection communication flow
5 CPMSinCPPS Use-Case
The site for the deployment and experimentation is the CRISTEMA factory, in the north of Portugal. The production of cutlery involves different processes, but the experiment prototype focuses only on the polishing process (Fig. 19). As in the previous chapter, the constraints related to design time and runtime are explained here. The technologies that allowed the implementation of this architecture are presented in Fig. 17.
Fig. 19 CRISTEMA polishing machine
5.1 Design Time
The design time phase is focused on the representation and formalisation of the knowledge that is inside the users’ minds. Each type of user (see Fig. 4) has their own understanding of the factory and production system, and only when these are put together can the modelling activity bring real value to this stage. This phase includes process modelling, using GRAI, and process simulation, considering the full manufacturing processes that the company is set to achieve. Following the flow of actions explained before, the Company Manager and the Production Manager need to start by defining the overall process using the MSTB and the Extended Actigram Star (EA*) language. To accomplish this, the components (actors, resources, etc.) and activities are first defined at a conceptual level, without regard for their specifications (Fig. 20). Based on the GRAI integrated methodology, the factory’s system is modelled using the formalisms supported by the MSTB tool. The physical system is created in the EA* format (Figs. 21 and 22), to be later converted, using model transformation, into a BPMN diagram (Fig. 23).
Regarding the focus of this work, the polishing machine process describes the functioning of the polishing machine (Fig. 19) while allowing information about the state of production to be gathered. This process, first developed in MSTB and transformed into BPMN (Fig. 23), is exported to be parameterized in the Activiti tool (Fig. 8). This parameterization allows the execution of the diagram in the Activiti engine and, ultimately, the manufacturing process to be followed in real time. The Developer takes up the modelling activity at this stage, using and extending the information already contained in the process, and specifying two BPMN processes that serve as the monitoring and control workflow for the runtime prototype: the polishing process, and the “master” process, which coordinates the runtime loop.
Fig. 20 CRISTEMA’s components and activities, defined in MSTB
Fig. 21 EA* model that represents the overview of the manufacturing process
Fig. 22 EA* model of the activities in the production
Fig. 23 BPMN process obtained using model transformation of the EA* models
There are four participants in this process (see Fig. 4): the Operator, the CPPS Publishing Services (IoT Agent), the Polishing Machine and the Production Controller (or Manager). The operator only has user tasks, which require user feedback or approval when finished. These tasks are related to quality checks, transportation, and machine setup and loading. The “polishing machine” participant represents the operations that need to be modelled but do not require input from a user (manual tasks), such as the polishing and re-polishing processes. The production manager oversees production within the factory and receives notifications about the end of production orders. The CPPS Publishing Services (IoT Agent) represents the service tasks related to calls to other modules and the calculation of times and cycles, usually done by calling a Java class. The polishing process (Fig. 24) starts with the setup of the machine, first with the
Fig. 24 Polishing process model
recording of the initial setup time, then the machine setup itself and finally the calculation of the setup time. After that, the machine is loaded by an operator and the polishing is performed. During this task, the production counter is checked in order to determine whether the production is finished. Each cycle corresponds to a number of products, and the total number of products to polish is provided by the production order before this process starts. When the counter becomes equal to or greater than the total number of products of the production order, the production duration is calculated and the counter is reset. After the production, the operator may perform a random quality check and, if the piece of cutlery is not well polished, a re-polishing activity is performed. The number of cycles in this re-polishing process is stored in order to inform the production manager. After this, the current date is compared with the end-of-production date predicted in the schedule, to determine whether there was a delay. If so, the scheduler is called to reorganize the schedule for the polishing machine and provide the updated version. A notification of the end of production is then sent, followed by the user tasks of unloading the machine and transporting the products to the warehouse, which are carried out by the operator.
The “master” process (Fig. 25) coordinates the runtime loop. It starts with importing the production orders, which provides the scheduler with information about the production orders available. After this is done, the scheduler API is called and provides the list of production orders to complete. The list is then checked and the next production order to perform (with production order ID, estimated end time of the activity, number of products, etc.) is presented to the user. If there are no activities available, the process ends.
The user confirms whether the next activity should be performed right away (the activity is selected automatically by the rules of the scheduler but can be changed by the user if needed). If the user chooses to change the activity, the user must provide its details so that they can be sent back to the scheduler when the polishing ends, in order to acknowledge that specific task as finished and exclude it from further schedules.
Fig. 25 “Master” process model
The type of activity is verified next, to determine whether it is a maintenance or a polishing activity. If it is a maintenance, the counters and flags for the machine maintenance are reset, and the scheduler is called again (providing it with information about the activity performed). If it is a polishing activity, the BPMN polishing process presented previously is called. When the polishing process ends, the maintenance counter is checked. If it is above 10,000 cycles, the maintenance flag is verified. This prevents the BPMN process from getting stuck sending the maintenance notification when the maintenance is not carried out right away due to optimization or user demands. If the flag is false, the notification is sent to the scheduler and the flag is set to true (meaning the machine needs maintenance soon; the flag only returns to false after that maintenance occurs). If the flag is true, a maintenance was already flagged and acknowledged by the scheduler, and the process simply continues. After this, new production orders are checked. If any exist, the user must attach the corresponding spreadsheet and import it into the scheduler. If no new production order exists, the process continues with another call to the scheduler in order to proceed with the production orders already scheduled.
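The counter-and-flag logic described above can be sketched as follows. The 10,000-cycle threshold comes from the text; the class name, method names and the `notify` callback are hypothetical stand-ins for the BPMN service tasks and the call to the scheduler.

```python
MAINTENANCE_THRESHOLD = 10_000  # cycles, as stated in the text


class MaintenanceTracker:
    """Hypothetical helper mirroring the maintenance counter and flag."""

    def __init__(self):
        self.cycles = 0
        self.flagged = False  # True once the scheduler has been notified

    def after_polishing(self, cycles_done, notify):
        """Call when a polishing activity ends."""
        self.cycles += cycles_done
        if self.cycles > MAINTENANCE_THRESHOLD and not self.flagged:
            notify()             # ask the scheduler to plan a maintenance
            self.flagged = True  # avoids repeating the notification

    def after_maintenance(self):
        """Call when a maintenance activity has been performed."""
        self.cycles = 0
        self.flagged = False


sent = []
tracker = MaintenanceTracker()
for done in (6_000, 5_000, 2_000):
    tracker.after_polishing(done, notify=lambda: sent.append("maintenance"))
print(len(sent), tracker.cycles, tracker.flagged)

tracker.after_maintenance()
print(tracker.cycles, tracker.flagged)
```

Only the second polishing activity triggers a notification, even though the counter stays above the threshold afterwards, which is the behaviour the flag is meant to guarantee.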
5.2 Runtime
Once the BPMN for the main process has been explained and saved, the app generated by Activiti becomes available by following the deployment guidelines. From the initial menu in the Activiti web interface, the CPMSinCPPS app should be visible. To run it, the app should be selected and, in the next menu, the “Start a new process and then track its progress” option should be chosen. The following menu presents the option to “Start process”, which executes the BPMN process (Fig. 26). The progress of, and interaction with, the process tasks can be tracked on the Activiti platform. Once polishing begins and the machine is running, the Data Collector module starts receiving data from the IoT installation in place (Fig. 27). The Activiti tool, Mosquitto (MQTT Broker), CPPS Publishing Services (IoT Agent), FIWARE Orion and the related databases are all implemented and running on an accessible cloud server (as multi-container Docker applications), within the network of UNINOVA.
Fig. 26 Activiti process execution and interaction
Fig. 27 CRISTEMA’s IoT installation
6 Lessons Learnt and Recommendations
Several experiments were conducted during the project. Awareness of the production system was improved for the use-case, using MSTB modelling and the GRAI method. Several improvements were introduced (not related to BEinCPPS) and are being tested within the company, mostly related to procedures and team building, which were only possible with an improved awareness of the production system. The ability to produce automatic Production Order sequences for the polishing activity improved the company’s awareness of the process constraints and, above all, provided the means for an agile response to modifications of the Production Plan. Experimenting with the user interface provided feedback from the operators on what kind of information would improve their performance. The ability to monitor the polishing machine’s production status provides the user with preventive maintenance alerts and enables an additional integration possibility with the maintenance software currently in use. Machine status monitoring also improved awareness of the polishing activity by sending status messages via e-mail to the Production Responsible. Further experimentation will show additional points of improvement.
The main problems during experimentation were the integration of BEinCPPS components with the ERP system and the lack of integration with proprietary machines. The first problem occurred due to the absence of assistance from the ERP software company. The second results from the complexity of the machines used, which required further studies and assistance from a specialized company. During the development of the platform, some unexpected problems appeared. The first was the integration between the Scheduler, ACTIVITI and the CPPS Publishing Services, where data had to be passed between the different technologies and the messages had to be standardized to ensure interoperability. The second was the need to improve the user interface, which depends on BPMN, since BPMN orchestrates the process. To solve this, the best approach was studied and a connection from a webpage to ACTIVITI was developed, thus achieving better usability for the machine operator.
Several opportunities emerged during the experimentation, mostly related to procedures and team building, which were only possible with an improved awareness of the production system. Process awareness and the control of production were improved, since it is now possible to know the production state in real time, thanks to the e-mails with the status of each activity. CRISTEMA considers that the goals were fully achieved, since the Business Requirements were mostly fulfilled. Experimenting with CPPS provided the opportunity to further improve communication between different levels and to identify further possible improvements in the production system related to BEinCPPS components, starting with their integration with the current ERP system, through the inclusion of an IT partner company, to provide a better workflow. During the development of the platform, some ideas were successfully concluded, but others were difficult to implement, making it necessary to adapt them in order to achieve the desired goal. The first unsuccessful attempt was the involvement
of the ERP software company and of the machine suppliers, whose machines could not be integrated with our solution; such an integration could have made the system more reliable and faster. Despite this difficulty, the experiment enabled the improvement of the tool and demonstrated that it is a good solution for the company even without the connection to the ERP software. This was demonstrated with the use of BPMN and the GRAI method, providing a global view of the production system, connecting with the shop floor through the use of sensors and creating opportunities for the creation of “to-be” scenarios.
Based on the BEinCPPS experimentations, knowledge of BPMN, the GRAI method and EAStar is required of the RTOs in order to successfully model the production system. Managers/professionals also require knowledge of the GRAI method and EAStar. Fortunately, the learning process was eased by the Research and Technology Organisations (RTOs) for this use-case. Linear programming, IT and electronics skills are required of the RTOs in order to develop the BEinCPPS components, and generic computer skills are required of the actors interacting with the BEinCPPS components, mainly for dealing with the simple interface. For the technology providers, electronics and IT skills are required.
7 Final Considerations
In this work, the final prototype of the CPMSinCPPS experiment was presented, along with an evaluation of the results obtained. This experiment was selected in the second open call of the BEinCPPS project to carry out innovative Modelling & Simulation (M&S) experiments in CPPS-based manufacturing systems, based on the components/artefacts of the project reference architecture. The experiment was conducted on a sector of CRISTEMA’s production line, related to the cutlery industry, with particular emphasis on the cutlery polishing process (a bottleneck of the overall production, due to high setup times and low processing capacity). The approach was based on a CPPS composed of several collaborative entities that connected the polishing machine and sensors into a network environment. This allows the monitoring and control of the production in real time, giving the company the ability to forecast maintenance, to know the production status and to be aware of production planning. To reach the final prototype, the processes were organized in design-time and runtime phases, embedding the set of components developed to satisfy the requirements. The main outcomes can be summarized as follows:
• The final prototype of the CPMSinCPPS solution deployed and available for the use-case experimentation;
• Elaboration of the business process models;
• Preparation of production data needed for the prototype experimentation;
• Final CPMSinCPPS architecture available;
• Implementation of the Process Modeller module, customizing the MSTB to ensure proper interoperability with the remaining components;
• Implementation of the Data Collector module, integrating the FIWARE context broker and deploying a set of IoT nodes in the polishing machine;
• Implementation of the Scheduler module.
The Process Modeller, the Data Collector and the Scheduler have been the principal components involved in the intelligent system of this experiment. The team is still looking for a third-party solution that could provide the full functionality of the Capacity Planner. For the purposes of the prototype, it has been implemented as a set of Excel functions.
For the technical evaluation, the Business Requirements were satisfied at 83% (the target was 75%), and the technical indicators (learnability, comprehensibility, user attraction and efficiency) reached a score of 4 out of 5, with one Technical Indicator at 5. The technical lessons learnt are related to:
• Integration between the existing ERP and the BEinCPPS solutions as a major obstacle, together with the data needed to represent the Production Orders and the grouping of these Production Orders to optimise the performance of the last activity (polishing);
• Modelling as a key learning facilitator, allowing end users to understand the search for solutions and to reuse past knowledge as a best practice, with further development of the BEinCPPS components as the next step.
Concerning Best Practices, some results and methodologies from BEinCPPS and other projects were used. The implemented solution does not distract the operators or disrupt the established procedures. It was perceived that the use of modelling facilitates the search for Best Practices. The measured Business Performance Indicators show improvements with respect to the Business Objectives. Optimizing the introduction of a new production order into an already established production order sequence was not possible before the experiment.
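As a purely illustrative sketch of the last point, introducing a new production order into an established sequence can be framed as choosing the position that adds the least setup time at the polishing machine. The function names, the notion of a product "family" and the setup times below are assumptions made for this example; they are not taken from the CPMSinCPPS Scheduler.

```python
def setup_time(prev_family, next_family, same=5, different=30):
    """Hypothetical setup time (minutes) when switching between product families."""
    return same if prev_family == next_family else different


def best_insertion(sequence, new_family):
    """Return (position, added_setup) for the cheapest place to insert a new order.

    `sequence` is the established list of order families, in polishing order.
    """
    best_pos, best_cost = None, None
    for pos in range(len(sequence) + 1):
        prev = sequence[pos - 1] if pos > 0 else None
        nxt = sequence[pos] if pos < len(sequence) else None
        added = 0
        if prev is not None:
            added += setup_time(prev, new_family)
        if nxt is not None:
            added += setup_time(new_family, nxt)
        if prev is not None and nxt is not None:
            # The direct changeover prev -> nxt no longer happens.
            added -= setup_time(prev, nxt)
        if best_cost is None or added < best_cost:
            best_pos, best_cost = pos, added
    return best_pos, best_cost


if __name__ == "__main__":
    position, cost = best_insertion(["knife", "knife", "fork", "fork"], "fork")
    print(position, cost)
```

A real scheduler would also have to respect due dates and machine availability; this sketch only shows why grouping orders of the same family reduces setup time at the bottleneck.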
At the end of this experimentation work, a face-to-face meeting with the development team was organised at CRISTEMA, and the management of the company expressed satisfaction with the work performed.
7.1 Future Work
The improvements and next steps have been clearly identified, and a list of improvement actions has been established. The Capacity Planner module is expected to reach full functionality or, if that proves impractical, to be bought on the market; the difficulty is finding a tool suitable for an SME. Capacity planning is recognized as being of great
importance for improving planning decisions at the technical level, and the information it produces also facilitates the improvement of the scheduling. In terms of simulation, only the transformation to DEVS has been implemented; this output still has to be executed to ensure proper validation. Another item of future work is to develop a mobile application (instead of the web-based interface) to make the operator’s work easier and the system more user-friendly. For the Scheduler, the next steps are to solve some bugs, to improve the accuracy of time constraints (e.g. a better capability to detect holidays and weekends) and to improve the scheduling algorithm so that it deals with more specific situations (e.g. unforeseen shortage of materials or new priorities). For the Process Orchestrator, future work also starts with resolving minor bugs and improving the overall polishing process, to make it more efficient and to help the operator identify the current production step, allowing better production feedback. This can be achieved by adding more sensors to the production line, avoiding manual inputs (on the interface) by the operator.
Acknowledgements The research leading to these results has received funding from the EC H2020 Program under grant agreement n°. BEinCPPS 680633 (http://www.beincpps.eu/). The authors are also thankful to all CRISTEMA collaborators, namely CEO João Fertuzinhos.
References
1. R. Baheti, H. Gill, Cyber-physical systems. Impact Control Technol. 1, 161–166 (2011)
2. L. Sha, S. Gopalakrishnan, X. Liu, Q. Wang, in Cyber-Physical Systems: A New Frontier. 2008 IEEE International Conference on Sensor Networks, Ubiquitous, Trusted Computing (SUTC 2008) (2008), pp. 1–9
3. E.A. Lee, S.A. Seshia, Introduction to Embedded Systems: A Cyber-Physical Systems Approach (2017)
4. H. Abelson, G.J. Sussman, Structure and Interpretation of Computer Programs (McGraw-Hill Book Company, 1996)
5. D. Chen, G. Doumeingts, The GRAI-GIM reference model, architecture and methodology, in Architectures for Enterprise Integration (1996)
6. INCOSE, Systems Engineering Vision 2020, in System of Engineering Vision 2020 (2007), p. 32
7. I. Loureiro, E. Pereira, N. Costa, P. Ribeiro, P. Arezes, Global city: Index for industry sustainable development, in Advances in Intelligent Systems and Computing (2018)
8. Instituto Nacional de Estatística, Estatísticas da Produção Industrial. Lisboa (2012)
9. BEinCPPS, BEinCPPS (2017). [Online]. Available: http://www.beincpps.eu/
10. O. Vermesan, P. Friess, Digitising the Industry - Internet of Things Connecting the Physical, Digital and Virtual Worlds (2016), pp. 153–183
11. J. Lee, B. Bagheri, H.A. Kao, A cyber-physical systems architecture for industry 4.0-based manufacturing systems. Manuf. Lett. (2015)
12. H. Lasi, P. Fettke, H.G. Kemper, T. Feld, M. Hoffmann, Industry 4.0. Bus. Inf. Syst. Eng. (2014)
13. H. Kagermann, W. Wahlster, J. Helbig, Securing the future of German manufacturing industry recommendations-recommendations for implementing the strategic initiative INDUSTRIE 4.0. Final Report Ind. 4.0 WG (2013)
14. B. Vogel-Heuser, D. Hess, Guest editorial industry 4.0-prerequisites and visions. IEEE Trans. Autom. Sci. Eng. (2016)
15. L.D. Xu, E.L. Xu, L. Li, Industry 4.0: state of the art and future trends. Int. J. Prod. Res. (2018)
16. M. Hermann, T. Pentek, B. Otto, in Design Principles for Industrie 4.0 Scenarios. Proceedings of the Annual Hawaii International Conference on System Sciences (2016)
17. Z.M. Bi, S.Y.T. Lang, W. Shen, L. Wang, Reconfigurable manufacturing systems: the state of the art. Int. J. Prod. Res. (2008)
18. S. Wang, J. Wan, D. Zhang, D. Li, C. Zhang, Towards smart factory for industry 4.0: a self-organized multi-agent system with big data based feedback and coordination. Comput. Netw. (2016)
19. E. Geisberger, H. Manfred BroySeeger, L. Tönskötter, Agenda CPS - Integrierte Forschungsagenda Cyber-Physical Systems. acatech Stud. (2012)
20. E.A. Lee, in Cyber Physical Systems: Design Challenges. Proceedings - 11th IEEE Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2008 (2008)
21. J.H. Lee, S.D. Noh, H.J. Kim, Y.S. Kang, Implementation of cyber-physical production systems for quality prediction and operation control in metal casting. Sensors (Switzerland) (2018)
22. J. Lee, E. Lapira, B. Bagheri, H.A. Kao, Recent advances and trends in predictive manufacturing systems in big data environment. Manuf. Lett. (2013)
23. J. Lee, E. Lapira, S. Yang, A. Kao, in Predictive Manufacturing System - Trends of Next-Generation Production Systems. IFAC Proceedings Volumes (IFAC-PapersOnline) (2013)
24. Y. Ducq, C. Agostinho, D. Chen, G. Zacharewicz, R. Jardim-Goncalves, G. Doumeingts, Generic methodology for service engineering based on service modelling and model transformation, in Manufacturing Service Ecosystem: Achievements of the European 7th Framework (2014)
25. Y. Ducq, D. Chen, G. Doumeingts, A contribution of system theory to sustainable enterprise interoperability science base. Comput. Ind. (2012)
26. MSEE, Manufacturing Service Ecosystem (2014). [Online]. Available: http://interop-vlab.eu/msee/
27. Y. Ducq, An architecture for service modelling in servitization context: MDSEA, in Enterprise Interoperability: Research and Applications in the Service-Oriented Ecosystem (2013)
28. I-VLab, MDSEA Architecture (2015)
29. I. Jacobson, M. Ericsson, A. Jacobson, The Object Advantage: Business Process Reengineering with Object Technology (ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 1994)
30. M.A. Ould, M.A. Ould, Business Processes: Modelling and Analysis for Re-engineering and Improvement, vol. 598 (Wiley Chichester, 1995)
31. M. Zdravkovic, H. Panetto, M. Trajanovic, in Semantic Interoperability for Dynamic Product-Service. International Conference on Information Systems and Technology (ICIST 2013) (2013)
32. C. Agostinho, R. Jardim-Goncalves, Sustaining interoperability of networked liquid-sensing enterprises: A complex systems perspective. Ann. Rev. Control 39, 128–143 (2015)
33. Y. Ducq, D. Chen, T. Alix, Principles of Servitization and Definition of an Architecture for Model Driven Service System Engineering, in Enterprise Interoperability, vol. 122, ed. by M. van Sinderen, P. Johnson, X. Xu, G. Doumeingts (Springer, Berlin, Heidelberg, 2012), pp. 117–128
34. R. Grangel, M. Bigand, J.-P. Bourey, in A UML Profile as Support for Transformation of Business Process Models at Enterprise Level. 1st International Working Model Driven Interoperability Sustainable Information System MDISIS 2008 - Held Conjunction with CAiSE 2008 Conference, vol. 340, pp. 73–87 (2008)
35. Y. Lemrabet, M. Bigand, D. Clin, N. Benkeltoum, J.-P. Bourey, in Model Driven Interoperability in Practice: Preliminary Evidences and Issues from an Industrial Project. Proceedings of the First International Workshop on Model-Driven Interoperability (2010), pp. 3–9
36. P. Girard, G. Doumeingts, Modelling the engineering design system to improve performance. Comput. Ind. Eng. (2004)
37. H. Mili, M. Frendi, G.B. Jaoude, L. Martin, G. Tremblay, in Classifying Business Processes for Domain Engineering. Proceedings of International Conference Tools with Artificial Intelligence, ICTAI (2006)
38. K. Bouchbout, Z. Alimazighi, in Inter-Organizational Business Processes Modelling Framework. CEUR Workshop Proceedings (2011)
39. K. Bouchbout, J. Akoka, Z. Alimazighi, An MDA-based framework for collaborative business process modelling. Bus. Process Manag. J. (2012)
40. A. Rodríguez, I.G.R. de Guzmán, E. Fernández-Medina, M. Piattini, Semi-formal transformation of secure business processes into analysis class and use case models: An MDA approach. Inf. Softw. Technol. (2010)
41. A. Rodríguez, E. Fernández-Medina, M. Piattini, in Towards Obtaining Analysis-Level Class and Use Case Diagrams from Business Process Models. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008)
42. H. Bazoun, J. Ribault, G. Zacharewicz, Y. Ducq, H. Boyé, SLMToolBox: enterprise service process modelling and simulation by coupling DEVS and services workflow. Int. J. Simul. Process Model. (2016)
43. B.P. Zeigler, H. Praehofer, T.G. Kim, Theory of Modeling and Simulation (2000)
44. K. Rose, S. Eldridge, L. Chapin, in The Internet of Things: An Overview. Understanding the Issues and Challenges of a More Connected World. Internet Soc. (2015), p. 80
45. F. Lopes, J. Ferreira, R. Jardim-Goncalves, C. Agostinho, in Semantic Maps for IoT Network Reorganization in face of Sensor Malfunctioning
Intelligent Approach for Analysis of 3D Digitalization of Planer Objects for Visually Impaired People
Dimitar Karastoyanov, Lyubka Doukovska, Galia Angelova and Ivan Yatchev
Abstract Persons with visual impairments have difficulty working with graphical computer interfaces, such as Windows icons. They also lack access to objects of cultural and historical heritage such as paintings, tapestries, icons, etc. The chapter presents an approach for providing visual Braille services by 3D digitization of planar or spatial objects using a graphical Braille display and/or tactile graphical tiles, supplemented with Braille annotations. A solution involving 3D touch by tactile reading of exhibits, in a version accessible and affordable to visually impaired people, is presented. Different existing devices, called Braille displays, are described. Our technology for a graphical Braille display, based on linear electromagnetic micro drives, is shown. The presented ‘InterCriteria’ decision making approach is based on two fundamental concepts: intuitionistic fuzzy sets and index matrices. A comparative analysis of hybrid electromagnetic systems based on the InterCriteria Analysis approach is given.
Keywords Intelligent systems · Braille display · Cultural and historical heritage · Visually impaired people · InterCriteria analysis
D. Karastoyanov · L. Doukovska (B) · G. Angelova
Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Acad. Georgi Bonchev Str., bl. 2, Sofia 1113, Bulgaria
I. Yatchev
Technical University, Sveti Kliment Ohridski Blvd., 8, Sofia 1756, Bulgaria
© Springer Nature Switzerland AG 2020 R. Jardim-Goncalves et al. (eds.), Intelligent Systems: Theory, Research and Innovation in Applications, Studies in Computational Intelligence 864, https://doi.org/10.1007/978-3-030-38704-4_8
1 Introduction
The media industry now provides entertainment, information, event organization and advertising that target only the majority of society. Marginalized segments of society, such as blind people, are excluded from content production and content access. They are consequently less motivated to consume content or, in the case of physical disabilities, sometimes completely excluded. Making content and media seamlessly accessible to the marginalized segment of society is by now a crucial point on the agenda of the European content/media/service industry. Besides achieving a bigger share of the available market, it is essential to broadly establish innovative accessibility solutions that offer an increasingly personalized user experience in content and media consumption. Additionally, (real-time) context information is rarely taken into account for the adaptation and personalization of media and content.
Visually impaired people have difficulty working with graphical computer interfaces (Windows icons) and understanding pictures or other 2D objects of cultural-historical heritage. We divide the assistive interfaces for unsighted or blind people into two groups: for text and for graphics visualization. In each of these groups there are subgroups, depending on the method used to move the elements that present text or graphics in a touchable way. People with visual impairment need a new concept of working with a computer. This disadvantage can be eliminated by using graphical tactile displays [1–3]. A device called a Braille display can help these people. The device visualizes the Braille alphabet and graphics, as well as pictures, for visually impaired people. At present, no such device has been developed for graphical purposes. In the presented work, a prototype of the graphical screen is presented. It will allow blind users to “see” the screen and give feedback according to the resolution and the “pixel” density of the screen.
The density is very important and helps to produce a detailed image. On the other hand, it must match the user’s capabilities and the device’s possibilities. During the last decade, a number of touch screen devices for visually impaired users have been announced. In particular, devices for refreshable visualization with pins are still at early development phases, mostly with small displays for just one-finger touch. Three popular examples are:
Graphiti: an Interactive Tactile Graphics Display based on Tactuator technology [4]. It enables non-visual access to any form of graphical information, such as maps, charts, drawings, flowcharts and images, through an array of 60 × 40 moving pins. Each pin is independently addressable and can be set to “blink” individually.
BlindPAD: an interactive tactile tablet that can display maps, graphics and other symbols in real time [5]. The tablet has 12 × 16 independently controlled pins. It provides graphical information such as geometrical figures, maps or artwork. The tablet has a holding force of 200 mN. It is based on special and expensive materials.
HyperBraille: in our opinion, the best tactile graphics display at present [6]. It is a graphics-enabled laptop with a 9000-pin matrix that appeared in 2008. The display is touch-sensitive, so the user can locate the cursor position. According to [7], in
2015 the model had a pin matrix of 7200 dots, arranged in a 120 × 60 array. HyperFlat is another version of HyperBraille.
There are various existing Braille terminals and screens, as well as conceptual models. A comparative analysis of the devices has been made according to several parameters and criteria. Nowadays, a number of devices have been developed to help visually impaired people. They can be divided into several major groups: Braille matrices; Braille terminals; Braille printers; Braille mice; Braille keyboards. Depending on the functions that such a terminal offers, prices reach up to several thousand dollars [1]. One of the devices available on the market [1] is a Braille terminal. It consists of 40 or 80 cells with 2 × 3 needles in each cell, presenting 40 or 80 Braille symbols. The movement of each needle (pin) is electro-mechanical. Another way of moving the pins uses the piezoelectric effect: the change in the size of a crystal is transformed into vertical mechanical movement. Each pin has its own crystal, which makes the design technologically very difficult [2, 3, 8]. Another device is the Bulgarian Braille tablet Blitab. It visualizes text and consists of two main functional parts: a Braille tactile screen with a 14-line changing surface and a touch screen for navigation and application usage [9]. The ZIXEL technology uses miniature piezoelectric linear drive motors. These ultrasonic motors create high force and speed with few built-in parts. A linear stack of ten miniature 2.8 mm cubes is mounted over the shaft of the actuator [10]. There are also interesting conceptual models that aim to display not only Braille symbols but also graphics; they have not been realized yet. An example is DrawBraille, which has 5 rows, each containing 7 cells of 6 dots each. Users enter each character by touching the corresponding squares that represent the Braille character [11]. A refreshable Braille screen was announced in 2016.
Researchers from the University of Michigan are working on a Braille tablet that, thanks to the use of microfluidics, could display complex information such as graphs and charts while remaining mobile. The Braille display uses air or liquid bubbles to create the raised dots on the screen [12].
Facing the above-mentioned challenges, we introduce here a new approach, concept and technical solutions. They will provide a system for the production of novel content and for access to content and services in a hyper-personalized mode, targeting historical, artistic and cultural heritage. Specifically, the Visual Braille Service (VBS) is a solution involving 3D touch by tactile reading of exhibits. It transposes the depicted information into a version accessible and affordable to visually impaired people, targeting on the one hand galleries and museums as content precursors and, on the other hand, public authorities, private companies and local associations for remote access to the content and exhibitions. Partially sighted or blind people cannot enjoy all the treasures that cultural heritage offers. The proposed service will produce tactile versions of a selected subset of masterpieces in order to experiment with and validate the solution. The proposed service allows, for the first time, visually impaired and blind people to “virtually” visit ordinary museums and exhibitions from their homes using 3D printers. Images will use
Braille Agenda and Braille Annotations over the tiles. Artifacts (sculptures, architectural objects) will be produced as 3D figures, and their content will be recognizable through the Braille Agenda and Braille Annotations on the figures. The VBS application/service can provide recorded verbal notifications on a smartphone about new content available through official museum and gallery databases. The backend services can aggregate and adapt the content in line with the user preferences. The VBS will also be available to exhibition organizers, who will be able to order images/artifacts and print them for their exhibitions/galleries. In the future, the service will serve a broader audience, e.g. the general public sharing images with blind people. The service can easily be evaluated quantitatively on the basis of the number of users.
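To make the role of pin density in the displays surveyed above more concrete, the following minimal sketch reduces a grayscale image to a matrix of raised and lowered pins by block averaging and thresholding. The function name, the threshold value and the convention that dark pixels raise a pin are assumptions made for this illustration; they do not describe the firmware of any of the cited devices.

```python
def to_pin_matrix(image, rows, cols, threshold=128):
    """Map a grayscale image (list of rows, values 0..255) to rows x cols pins.

    Each pin covers one rectangular block of pixels; the pin is raised (True)
    when the mean brightness of its block is below `threshold`.
    """
    height, width = len(image), len(image[0])
    pins = []
    for r in range(rows):
        y0 = r * height // rows
        y1 = max(y0 + 1, (r + 1) * height // rows)
        pin_row = []
        for c in range(cols):
            x0 = c * width // cols
            x1 = max(x0 + 1, (c + 1) * width // cols)
            block = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            mean = sum(block) / len(block)
            pin_row.append(mean < threshold)  # dark area -> raised pin
        pins.append(pin_row)
    return pins


if __name__ == "__main__":
    # 4 x 4 test image: dark left half, bright right half.
    image = [[0, 0, 255, 255] for _ in range(4)]
    for row in to_pin_matrix(image, rows=2, cols=2):
        print("".join("#" if up else "." for up in row))
```

Calling the same function with `rows=40, cols=60` or `rows=60, cols=120` shows how much detail survives at the resolutions quoted for the 60 × 40 and 120 × 60 pin arrays.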
2 Concept and Methodology
The development is dedicated to consolidating media and content services with new ways of interaction and advanced context models, in order to substantially improve accessibility and enhance the personalized experience in content consumption. Figure 1 illustrates the general concept pursued in the paper. It covers three beneficiary groups as “content consumers”, which are representative of disadvantaged user groups within the marginalized segment of society. The underlying concept and approach for the realization of the Visual Braille Services (VBS) and the integration of the system are described hereafter. Experiences involving 3D touch have recently been investigated [13] and examined to study how people actually approach touch interaction, share experiences, learn from one another, and refine best practices that enhance accessibility and inclusion. 3D models can be profitably used with a variety of goals:
• to reconstruct original artifacts: 3D models of artifacts that have been destroyed can be built from fragments and pictures of documents or other sources;
• to replicate: originals that are too fragile to move can be duplicated and replicated;
Fig. 1 The general concept
• to interpret: reconstructing and then analyzing objects, pieces of architecture and maps of places helps us understand the past;
• to investigate: various items can be built so as to compare artifacts from different sources;
• to share: geographically distributed people can share the content.
The solution we are proposing can be applied to any gallery, exhibition and museum, and is not limited to paintings, mosaics, tapestries and general textiles, manuscripts and miniatures. In fact, it can be applied to any 2D artwork. It can also be extended to 3D cases, for sculptures and architecture. In what follows, two emblematic scenarios, related respectively to 2D and 3D cases, are described in some detail.
Scenario 1: Tactile Images
Preliminary statements. The original picture is digitized (see Fig. 2), modified, adapted, semantically segmented, reconstructed as a 3D model and finally 3D printed, to be read in “Visual Braille”. The tactile image includes a strip with the title and a key in Braille alphabet, so that each image segment, marked with Braille symbols, can be read and interpreted. It is also associated with Braille text summarizing the content, figures and scenes of the original.
Scenario 2: 3D Artifacts
Reading actions and scenes. In the museum, the visitor can approach the tactile image, read the general description in Braille and the key listing the characters and semantic scene components (identified by Braille characters), and enjoy the exploration of the tactile version, in which the original is turned into full/empty contours. A first case study [14] empowered visually impaired people and enabled them to enjoy the visit by inspecting with their
Fig. 2 Battle of Pavia in visual Braille scenario
Fig. 3 3D artifacts as MEDIARTI content
fingertips the tactile image that depicts the beginning of the battle and the attack of the French gendarmerie led by Francis I.
Preliminary statements. “A great machine almost in the manner of a triumphal arch, with two large paintings, one in front and one behind, and in smaller pieces about thirty stories full of many figures all very well done”. This is how Giorgio Vasari, in his “Lives”, describes the magnificent altar that he designed for the church of Santa Croce in Bosco Marengo, on commission of Pope Pius V. Unfortunately, in 1710 the altar was dismantled and replaced by the current marble one, and the paintings were mostly scattered. The right side of Fig. 3 shows the 3D reconstruction of the machine made in 2004 by the Computer Visual Lab. of Pavia University [14], on the occasion of the fifth centenary of the birth of the pope. This model was conceived on the basis of Vasari’s project in the Louvre and an altarpiece that shows the machine in the background. In the 2015 edition, Vasari’s machine was reproduced for tactile fruition by blind people and those affected by visual impairment [15]. “It is forbidden not to touch” is the exhibition motto, not only for the disabled but for as many users as possible, according to the design-for-all philosophy.
Our solution. The process is implemented as a sequence of four phases, as illustrated in Fig. 4:
• Selection and acquisition, in digital format, of the masterpiece subset suitable to validate the approach (including the definition of quality acquisition parameters, such as spatial and chromatic resolutions, camera position, etc.);
• Definition of a general framework for the conceptual segmentation of a piece of art into salient components suitable to communicate content and action in the scene;
Fig. 4 Visual Braille service
definition of a strategy to select the artifact’s size considering the original content and the handling accessibility;
• Definition of a general framework for mnemonic labeling (in Braille alphabet) of each salient segment, the modalities for representing them (empty and full parts), the level of their contour detail and the label density (all based on their relevance to the scene);
• 3D printing strategy and tile distribution and cutting. The mismatch between the actual size of the work of art and that of the tiles allowed in the 3D printer (usually
smaller) must be addressed with a suitable strategy that avoids cutting salient parts, such as faces, or creating overlaps of extended segment contours.
The Visual Braille Services. The microservice building blocks revolve around the common goal of providing content to consumers; they can be categorized according to a general content lifecycle:
• Capture. Content capture microservices, at least one for each business service, digitize analog content (drawings, tales, …) or access external content.
• Storage. Content storage microservices store and retrieve the digital content produced by the capture microservices.
• Model. Content modeling microservices provide the tools to apply user models to digital content in order to transform it in a way suitable for consumption by end users (e.g. 3D printing models from works of art).
• Deliver. Content delivery microservices provide the backend for the interaction points, or the physical artifact in the case of 3D printing.
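The tile-cutting constraint mentioned in the 3D printing phase above can be sketched along one axis as follows. This is a minimal greedy illustration, assuming that salient parts are given as intervals along the axis and that a cut may touch the border of a salient part but not cross it; the function name and the interval representation are hypothetical and not part of the described service.

```python
def choose_cuts(length, max_tile, salient):
    """Return cut positions along one axis of the artwork.

    Tiles are at most `max_tile` wide, and no cut falls strictly inside a
    salient interval (start, end), e.g. the horizontal extent of a face.
    """
    cuts = []
    pos = 0
    while length - pos > max_tile:
        cut = pos + max_tile  # widest tile the printer allows
        moved = True
        while moved:
            moved = False
            for start, end in salient:
                if start < cut < end:
                    cut = start  # pull the cut back to the segment border
                    moved = True
        if cut <= pos:
            raise ValueError("a salient segment is wider than one tile")
        cuts.append(cut)
        pos = cut
    return cuts


if __name__ == "__main__":
    # 100 cm wide artwork, 30 cm tiles, two salient parts to keep whole.
    print(choose_cuts(100, 30, [(25, 35), (55, 58)]))
```

The same routine would be applied independently to the vertical axis; a salient part wider than a tile cannot be kept whole, which is why the sketch raises an error rather than silently cutting it.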
3 Ambition and Innovation Potential
Visual Braille. The proposed solution of art for visually impaired and blind people changes the current scenario considerably, for the reasons illustrated by the following three points:
• Blind and visually impaired people can access the world’s visual culture by participating in museums and exhibitions in the world at large. They can use a guide or a companion audio narrative on smaller reproductions of works of art (suited to low-vision people), or hands-on activities on the entire surface of the picture reproduction (for blind persons).
• It exploits a “two-part system” composed of the Braille alphabet and pattern morphology; the success of the approach is due to these complementary components. Visually impaired people are acquainted with both features, and tactile diagrams use a lexicon of standardized patterns, enabling the reader to acquire a tactile vocabulary after a short training phase.
• Tactile diagrams, and tactile graphics embossers in general, consist of two levels (monochromatic in nature); our solution exploits at least three spatial levels, which can be easily detected by visually impaired people. Increasing the number of levels allows a rich tactile experience, while solutions like bas-reliefs are suitable only for very simple objects. The proposed solution is therefore suited to fully describing compositional and stylistic morphological details.
Visual Braille Services. The proposed solution is very important in terms of social connectivity, cohesion and integration of visually impaired and blind people, who are currently excluded from access to art in museums and exhibitions.
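By means of tactile versions of paintings, mosaics, tapestries, manuscripts, miniatures, etc., visually impaired and blind people can experience the magnificence of masterpieces that would otherwise be precluded to them. Museums and exhibitions can therefore address a new class of visitors who could not be considered in the past, thus extending their population segments. It should be stressed that the proposed solution can be applied not only to 2D artworks but can also be extended to 3D cases, for sculptures and architecture. It is also possible to perform direct 3D scanning of objects of cultural-historical heritage (sculptures, architectural objects, etc.) using a mobile 3D scanner and special 3D software. The 3D models can be printed in 3D and in color, so that sighted users can observe the heritage objects without risk to the original. Blind or visually impaired people can also “imagine” the objects when they touch the figures. In particular, 3D models can be profitably used for a variety of goals, such as:
• to reconstruct 3D artifacts that have been destroyed and that can be rebuilt from fragments, pictures in documents or other sources;
• to replicate originals that are too fragile to move and can be duplicated for commercial purposes;
• to reconstruct and then analyze objects, pieces of architecture and maps;
• to compare artifacts from different sources;
• to group geographically distributed parts of objects.
The innovation potential also consists in the extensive use of full-color 3D printers and, for blind people, of less expensive monochromatic 3D printers. Although 3D printing is becoming increasingly common nowadays, here the production of tactile “objects” for visually impaired and blind people will be carried out on a relatively large scale, and in a context only marginally considered to date.
The innovations will have real and tangible impacts across a number of sectors: the media and content market, the cultural and creative industry, medicine and health. The cultural and creative industry, with institutions such as museums, galleries, and fashion and trend media, has an interest in switching from trivial content to content that gives an additional dimension and meaning. When artists, scientists and technologists collaborate, they create hybrid cultures where their different and compatible domains of knowledge meet [16], leading to good practices and excellent results in the US [17] and the EU [18]. The stakeholders for VBS are:
• Painting museums: the VBSs could be very useful to museums as marketing instruments. They capture the segment of low-vision people and are high-tech, innovative instruments that position the museum as an institution attentive to social needs and disadvantages, spread a positive attitude toward the museum, and create attention and notoriety. Wikipedia lists more than 200 main painting museums in the EU28. The number may not seem high, but these are the main institutions in the painting field, with large budgets, and probably the best targets for the first stage of the diffusion process.
• Libraries: benefits similar to those for museums, but potentially with a more “social” and distributed approach, given the capillary diffusion of libraries in almost all European countries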
The innovations will have real and tangible impacts across a number of sectors: the media and content market, the cultural and creative industry, medicine and health. The cultural and creative industry, with institutions such as museums, galleries, and fashion and trend media, is shifting its interest from trivial content to content that gives additional dimension and meaning. When artists, scientists and technologists collaborate, they create hybrid cultures where their different and compatible domains of knowledge meet [16], leading to good practices and excellent results in the US [17] and the EU [18]. The stakeholders for Visual Braille Services (VBS) are:

• Painting museums: VBS could be very useful for museums as a marketing instrument, because it reaches the segment of low-vision people and is a high-tech, innovative instrument. It positions the museum as an institution that is attentive to social needs and disadvantages, spreads a positive attitude toward the museum, and creates attention and visibility. Wikipedia lists more than 200 main painting museums in the EU28. This number is not very high, but these are the main institutions in the painting field, with large budgets, and probably the best candidates for the first stage of the diffusion process.
• Libraries: benefits similar to those for museums, but potentially with a more "social" and distributed approach, given the capillary diffusion of libraries in almost all European countries.
• Educational institutions, particularly in the area of arts. Apart from the previous considerations, such a possibility has never existed before in human history. Creative production is opened to a new "kind" of people, with a different sensibility towards color and painting, and the creative results could be highly innovative and surprising.
• Associations of blind and low-vision people, which could offer one more service to their members. Their number is not high, but their territorial diffusion is important, because every local branch of an association can be a buyer of this service. Potential numbers are in the range of many thousands to tens of thousands.
• Local public administrations: VBS can be a tool to position the administration as caring for social needs, at a small expense. The number of local public administrations in Europe is in the order of tens of thousands (there are more than 8 thousand municipalities in Italy alone; the CEMR, Council of European Municipalities and Regions, groups 150,000 local governments in 41 European countries).

Other environmental and socially important impacts are:

Social skills' empowerment: digital tools will allow end users (people with visual impairments) to access domains apparently inaccessible to them (e.g. art).
Support for disengaged and disadvantaged users: it is essential to help people at risk of social exclusion reduce such risk factors early in their development.
Designing and developing content and services focusing on Braille visualization of images/artifacts: the approach will explore how digital tools can play a major role in enhancing social engagement training methods and strategies.

In 2016 tactile plates were produced for the painting "Christ and the Samaritan Woman at the Well" by Annibale Carracci.
Each figure has contours, different levels, and contains a semantic annotation with Braille symbols; the annotation is explained in the associated legend. A short text in Braille provides additional information (Fig. 5). In 2017 a 3D copy was produced of the sculptures above the portal of the Certosa Cathedral near Pavia, Italy. The group of sculptures was scanned using a 3D scanner. After that, the colors of the costumes, chosen according to the epoch, were added (Fig. 6).
4 New Graphical Braille Display

The patented Braille display has a simple structure and an easy conversion technology, with improved static, dynamic and energy performance. It also applies a common link between all moving parts, extended tactile feedback and a highly efficient start-up with low energy consumption [19–23]. The dynamic screen is a matrix of linear electromagnetic micro drives. When a signal is sent to the device, the pixels (pins) display the image on the screen.
Fig. 5 The painting “Christ and the Samaritan Woman at the Well”, Brera Gallery, Milan, Italy: (a) the original and the tactile copy; (b) the tactile tile of the painting
Paintings and artifacts from museums can be digitized and used as images, so that visually impaired users will be able to “see” these historic objects in a touchable way. The display is a matrix comprising a base with fixed electromagnets arranged on it. The display element (Fig. 7) includes an outer cylindrical magnetic core, in which a winding with a magnetic core closing the cylindrical magnetic core at the top side and a winding with a magnetic core closing it at the bottom side are placed. The magnetic cores have axial holes. In the space between the windings a movable non-magnetic cylindrical body is placed, carrying an axially magnetized cylindrical permanent magnet and a non-magnetic needle that passes axially through the permanent magnet and the axial holes of the magnetic cores. On the top side of the permanent magnet a ferromagnetic disc is arranged having an axial hole, and on its underside a ferromagnetic disc is arranged
Fig. 6 The portal of the Certosa Cathedral, Italy: (a) the original; (b) the copy
Fig. 7 Principal geometry of the permanent magnet linear actuator: 1—needle (shaft); 2—upper core; 3—outer core; 4—upper coil; 5—upper ferromagnetic disc; 6—non-magnetic bush; 7— permanent magnet; 8—lower ferromagnetic disc; 9—lower coil; 10—lower core; 11—needle (shaft)
having an axial hole, wherein the upper disc and the upper magnetic core have cylindrical poles and the lower magnetic core and the lower disc have conical poles. Above the electromagnets a lattice is placed with openings through which the needles pass [19–23].

The actuator is a linear electromagnetic micro drive (see Fig. 7). The mover is a permanent magnet whose magnetization direction is along the axis of rotational symmetry. The upper and lower coils are connected in series, in such a way that the fluxes they create are in opposite directions in the mover zone. Thus, by choosing the proper polarity of the power supply, the mover moves in the desired direction. For example, to move the mover upwards, the upper coil has to be supplied so that it creates an air gap magnetic flux in the same direction as the flux created by the permanent magnet. The lower coil then creates a magnetic flux in the direction opposite to that of the permanent magnet, and upward motion is observed. To move the mover downwards, the lower coil should be supplied so that its flux is in the same direction as the flux of the permanent magnet; the upper coil then creates a magnetic flux in the opposite direction. A non-magnetic shaft is used to fix the moving part to the Braille dot. Additional construction
variants of the actuator have also been considered, in which two small ferromagnetic discs are placed on both sides (upper and lower) of the moving permanent magnet [22, 23]. This actuator is also energy efficient, as energy is used only for changing the position of the moving part from lower to upper and vice versa. No energy is used at either the lower or the upper position. At these positions the mover is kept fixed by the force of the permanent magnet (it sticks to the core).

When a short voltage pulse with positive polarity is transmitted to the coils, the non-magnetic needle, rigidly connected to the non-magnetic body carrying the permanent magnet with the ferromagnetic discs, moves upwards until the ferromagnetic disc reaches the pole of the upper magnetic core. It is kept in this top position after the pulse decays, due to the attractive forces between the permanent magnet and the magnetic core, and the needle protrudes above the lattice with holes of the Braille display. When a short voltage pulse with opposite polarity is transmitted to the coils, the non-magnetic needle moves downwards until the ferromagnetic disc reaches the pole of the lower magnetic core. It remains in that lower position after the pulse decays, due to the forces of attraction between the permanent magnet and the magnetic core, and the needle does not protrude above the lattice with holes of the Braille display [19].

All electromagnets are controlled simultaneously and in a synchronized manner, so as to obtain a general image with needles (text or graphics) on the entire matrix. By touch, the visually impaired person feels only those needles that protrude above the lattice with holes of the Braille display, i.e. those whose permanent magnets are in the upper position.
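The latching behaviour described above can be illustrated with a small simulation. This is a minimal sketch, not the authors' control electronics or firmware: the class name `BistablePinMatrix`, the method `show` and the pulse counter are hypothetical. The model only assumes what the text states, namely that a pin changes position on a short pulse of the appropriate polarity and then holds its position without consuming energy. Sending pulses only to the pins that must change is an additional assumption of the sketch.

```python
class BistablePinMatrix:
    """Toy model of a matrix of latching pins (hypothetical, for illustration)."""

    def __init__(self, rows, cols):
        # False = pin held in the lower position,
        # True  = pin protrudes above the lattice.
        self.state = [[False] * cols for _ in range(rows)]
        self.pulses_sent = 0  # switching pulses only; holding costs nothing

    def show(self, image):
        """Drive the matrix to `image` (a list of rows of booleans).

        Returns a list of (row, col, polarity) pulses, with polarity +1 for
        "move up" and -1 for "move down". Pins that are already in the
        requested position receive no pulse, because the permanent magnet
        keeps them latched.
        """
        pulses = []
        for r, row in enumerate(image):
            for c, want_up in enumerate(row):
                if self.state[r][c] != want_up:
                    pulses.append((r, c, +1 if want_up else -1))
                    self.state[r][c] = want_up
        self.pulses_sent += len(pulses)
        return pulses


matrix = BistablePinMatrix(2, 3)
first = matrix.show([[True, False, True],
                     [False, False, True]])
second = matrix.show([[True, True, False],
                      [False, False, True]])
print(first)   # three "up" pulses
print(second)  # one "up" and one "down" pulse; unchanged pins get none
```

In this model the energy cost of an image update scales with the number of pins that change, not with the size of the matrix, which is the property the text attributes to the permanent magnet latching.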
The electromagnets can be placed in one line in the matrix, or in two, three or more lines, side by side with an offset along two axes (x and y) and with different lengths of the movable needles along the third axis (z), so that they overlap and occupy less space in the matrix, while the tips of the needles lie in one plane, the plane of the lattice with openings of the Braille display [20–22].

Static force characteristics are obtained for different construction parameters of the actuator. The outer diameter of the core is varied. The air gap between the upper and lower core, the length of the permanent magnet and the coil height have been varied too [19]. In Fig. 8, the force-displacement characteristics are given for varied values of the permanent magnet height (hm), the coil height (hw), the magnetomotive force (Iw) and the apparent current density in the coils (J). The polarity and value of the supply current of the coils are denoted by c1 and c2: “c1 = −1, c2 = 1” corresponds to supply for upward motion; “c1 = 1, c2 = −1” to downward motion; and “c1 = 0, c2 = 0” to no current in the coils, i.e. the force is due only to the permanent magnet. In this case Iw = 180 A and J = 20 A/mm².

The magnetic field of the construction variant of the permanent magnet linear actuator with two ferromagnetic discs on both sides of the permanent magnet is analysed with the help of the finite element method [19]. The program FEMM has been used, and additional scripts in the Lua language were developed for faster computation. The field is analysed as an axisymmetric one due to the rotational symmetry of the actuator. The weighted stress tensor approach has been used for evaluating the electromagnetic force on the mover (Fig. 9).
Fig. 8 The force-displacement characteristics
The optimality criterion is the minimal magnetomotive force NI of the coils. The optimization factors are geometrical parameters (height of the permanent magnet, height of the ferromagnetic discs and height of the coils). The optimization is carried out subject to constraints on the minimal electromagnetic force acting on the mover, the minimal starting force and the overall outer diameter of the actuator [19]. Minimization of the magnetomotive force NI corresponds directly to minimization of the energy consumption. The lower bounds for the dimensions are imposed by the manufacturing limits, and the upper bound for the current density is determined by the thermal balance of the actuator. The results of the optimization are as follows: NIopt = 79.28 A, hwopt = 5 mm, hmopt = 2.51 mm, hdopt = 1.44 mm, Jopt = 19.8 A/mm². The force-displacement characteristics of the optimal actuator are shown in Fig. 10. In Fig. 11, the magnetic field distribution of the optimal actuator is given for two cases.
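The optimization problem described above can be written compactly as follows. This is a hedged restatement of the text, not a formulation taken from [19]: the symbols $F_{\min}$, $F_{\mathrm{start},\min}$, $D_{\max}$, $J_{\max}$ and the lower bounds $h^{\min}$ are placeholders for limits whose numerical values are not given here, and treating the outer diameter as an upper limit is an assumption.

```latex
\begin{aligned}
\min_{h_m,\,h_d,\,h_w}\quad & NI \\
\text{subject to}\quad
 & F(h_m, h_d, h_w, NI) \ge F_{\min}, \\
 & F_{\mathrm{start}}(h_m, h_d, h_w, NI) \ge F_{\mathrm{start},\min}, \\
 & D_{\mathrm{outer}} \le D_{\max}, \\
 & J \le J_{\max}, \\
 & h_m \ge h_m^{\min},\quad h_d \ge h_d^{\min},\quad h_w \ge h_w^{\min}.
\end{aligned}
```

Here $h_m$, $h_d$ and $h_w$ are the heights of the permanent magnet, the ferromagnetic discs and the coils, and $F$ is the electromagnetic force on the mover evaluated from the finite element model.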
Fig. 9 Typical flux lines distribution for two different mover positions
Fig. 10 F/D characteristics with and without current in the coils
The next figures present different areas of application. For simple simulations of drawings at low resolution, a small Braille screen is sufficient (Fig. 12). Figure 13 shows an image of a cultural-historical object on a high-resolution display. The problems that may occur are mobility (due to the size), power supply, price, etc. The idea is for high-resolution displays to be used by museums and other facilities that many people can visit and explore.
Fig. 11 The magnetic field with and without current in the coils
Fig. 12 Simulation of simple painting with low-resolution screen: (a) drawn image; (b) simulation
5 InterCriteria Decision Making Approach

The presented multicriteria decision making approach is based on two fundamental concepts: intuitionistic fuzzy sets and index matrices. It is called the ‘InterCriteria decision making approach’ and is presented in [24]. Intuitionistic fuzzy sets, defined by Atanassov [25–27], extend the concept of fuzzy sets as defined by Zadeh [28], which are characterized by a function $\mu_A(x)$ defining the membership of an element $x$ to the set $A$, evaluated in the interval [0; 1]. The
Fig. 13 High-resolution display: (a) 3D model; (b) tactile simulation
difference between fuzzy sets and intuitionistic fuzzy sets (IFSs) is the presence of a second function $\nu_A(x)$ defining the non-membership of the element $x$ to the set $A$, where $0 \le \mu_A(x) \le 1$, $0 \le \nu_A(x) \le 1$ and $0 \le \mu_A(x) + \nu_A(x) \le 1$. The IFS itself is formally denoted by $A = \{\langle x, \mu_A(x), \nu_A(x)\rangle \mid x \in E\}$. Comparison between elements of any two IFSs, say $A$ and $B$, involves pairwise comparisons between their respective elements' degrees of membership and non-membership to both sets.

The second concept on which the proposed method relies is the index matrix, a matrix which features two index sets. The theory behind index matrices is described in [29]. We start with the index matrix $M$ whose index sets consist of $m$ rows $\{C_1, \ldots, C_m\}$ and $n$ columns $\{O_1, \ldots, O_n\}$:
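$$
M = \begin{array}{c|ccccccc}
 & O_1 & \cdots & O_k & \cdots & O_l & \cdots & O_n \\ \hline
C_1 & a_{C_1,O_1} & \cdots & a_{C_1,O_k} & \cdots & a_{C_1,O_l} & \cdots & a_{C_1,O_n} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots & \ddots & \vdots \\
C_i & a_{C_i,O_1} & \cdots & a_{C_i,O_k} & \cdots & a_{C_i,O_l} & \cdots & a_{C_i,O_n} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots & \ddots & \vdots \\
C_j & a_{C_j,O_1} & \cdots & a_{C_j,O_k} & \cdots & a_{C_j,O_l} & \cdots & a_{C_j,O_n} \\
\vdots & \vdots & \ddots & \vdots & \ddots & \vdots & \ddots & \vdots \\
C_m & a_{C_m,O_1} & \cdots & a_{C_m,O_k} & \cdots & a_{C_m,O_l} & \cdots & a_{C_m,O_n}
\end{array},
$$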
where for every $p, q$ ($1 \le p \le m$, $1 \le q \le n$), $C_p$ is a criterion (in our case, one of the device parameters considered in Sect. 6), $O_q$ is an evaluated object, and $a_{C_p,O_q}$ is the evaluation of the $q$th object against the $p$th criterion. It is defined as a real number or another object that is comparable according to a relation $R$ with all the other elements of the index matrix $M$, so that for each $i, j, k$ the relation $R(a_{C_k,O_i}, a_{C_k,O_j})$ holds. The relation $R$ has a dual relation $\bar{R}$, which is true in the cases when $R$ is false, and vice versa.

For the needs of our decision making method, pairwise comparisons between every two different criteria are made along all evaluated objects. During the comparison, one counter is maintained for the number of times the relation $R$ holds, and another counter for the dual relation.

Let $S^{\mu}_{k,l}$ be the number of cases in which the relations $R(a_{C_k,O_i}, a_{C_k,O_j})$ and $R(a_{C_l,O_i}, a_{C_l,O_j})$ are simultaneously satisfied. Let also $S^{\nu}_{k,l}$ be the number of cases in which the relation $R(a_{C_k,O_i}, a_{C_k,O_j})$ and its dual $\bar{R}(a_{C_l,O_i}, a_{C_l,O_j})$ are simultaneously satisfied. As the total number of pairwise comparisons between the objects is $n(n-1)/2$, the following inequalities hold:

$$0 \le S^{\mu}_{k,l} + S^{\nu}_{k,l} \le \frac{n(n-1)}{2}.$$
For every $k, l$ such that $1 \le k \le l \le m$, and for $n \ge 2$, two numbers are defined:

$$\mu_{C_k,C_l} = 2\,\frac{S^{\mu}_{k,l}}{n(n-1)}, \qquad \nu_{C_k,C_l} = 2\,\frac{S^{\nu}_{k,l}}{n(n-1)}.$$
The pair constructed from these two numbers plays the role of the intuitionistic fuzzy evaluation of the relations that can be established between any two criteria $C_k$ and $C_l$. In this way the index matrix $M$ that relates evaluated objects with evaluating criteria can be transformed to another index matrix $M^*$ that gives the relations among the criteria:
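$$
M^* = \begin{array}{c|ccc}
 & C_1 & \cdots & C_m \\ \hline
C_1 & \langle \mu_{C_1,C_1}, \nu_{C_1,C_1} \rangle & \cdots & \langle \mu_{C_1,C_m}, \nu_{C_1,C_m} \rangle \\
\vdots & \vdots & \ddots & \vdots \\
C_m & \langle \mu_{C_m,C_1}, \nu_{C_m,C_1} \rangle & \cdots & \langle \mu_{C_m,C_m}, \nu_{C_m,C_m} \rangle
\end{array}.
$$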
The final step of the algorithm is to determine the degrees of correlation between the criteria, depending on the user's choice of thresholds for $\mu$ and $\nu$. We call these correlations between the criteria ‘positive consonance’, ‘negative consonance’ or ‘dissonance’. Let $\alpha, \beta \in [0; 1]$ be given, so that $\alpha + \beta \le 1$. We say that criteria $C_k$ and $C_l$ are in:

• $(\alpha, \beta)$-positive consonance, if $\mu_{C_k,C_l} > \alpha$ and $\nu_{C_k,C_l} < \beta$;
• $(\alpha, \beta)$-negative consonance, if $\mu_{C_k,C_l} < \beta$ and $\nu_{C_k,C_l} > \alpha$;
• $(\alpha, \beta)$-dissonance, otherwise.

Obviously, the larger $\alpha$ and/or the smaller $\beta$, the fewer criteria may be simultaneously connected by the relation of $(\alpha, \beta)$-positive consonance. For practical purposes, the result carries the most information when either the positive or the negative consonance is as large as possible, while the cases of dissonance are less informative and can be skipped.
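The procedure of this section can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation used for the results reported later: the function names `intercriteria` and `consonance`, the example matrix and the thresholds are hypothetical. The relation $R$ is taken to be ">" on real numbers and its dual to be "<", each pair of objects is counted once, and ties count towards neither counter.

```python
from itertools import combinations


def _sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def intercriteria(M):
    """Return (mu, nu) for an index matrix M given as a list of rows.

    M[k][i] is the evaluation of object i against criterion k.
    mu[k][l] and nu[k][l] are the membership and non-membership parts
    of the intuitionistic fuzzy pair for criteria k and l.
    """
    m, n = len(M), len(M[0])
    if n < 2:
        raise ValueError("at least two objects are needed")
    pairs = list(combinations(range(n), 2))  # the n(n-1)/2 object pairs
    total = len(pairs)
    mu = [[0.0] * m for _ in range(m)]
    nu = [[0.0] * m for _ in range(m)]
    for k in range(m):
        for l in range(m):
            s_mu = s_nu = 0
            for i, j in pairs:
                dk = _sign(M[k][i] - M[k][j])
                dl = _sign(M[l][i] - M[l][j])
                if dk * dl > 0:    # both criteria order the pair the same way
                    s_mu += 1
                elif dk * dl < 0:  # the criteria order the pair oppositely
                    s_nu += 1
            mu[k][l] = s_mu / total
            nu[k][l] = s_nu / total
    return mu, nu


def consonance(mu_kl, nu_kl, alpha, beta):
    """Classify one pair of criteria using the (alpha, beta) thresholds."""
    if mu_kl > alpha and nu_kl < beta:
        return "positive consonance"
    if mu_kl < beta and nu_kl > alpha:
        return "negative consonance"
    return "dissonance"


# Hypothetical data: 4 criteria (rows) evaluated on 4 objects (columns).
M = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],
    [1, 3, 2, 4],
]
mu, nu = intercriteria(M)
print(consonance(mu[0][1], nu[0][1], 0.75, 0.25))
print(consonance(mu[0][2], nu[0][2], 0.75, 0.25))
print(consonance(mu[0][3], nu[0][3], 0.90, 0.10))
```

With four objects there are six pairwise comparisons, so every entry of `mu` and `nu` is a multiple of 1/6. Criteria 0 and 1 rank all objects identically, criteria 0 and 2 rank them in exactly opposite order, and criteria 0 and 3 agree on five of the six pairs, which falls short of the stricter thresholds used in the last call.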
6 Experimental Results

In this paper, based on experimental research, the values of seven parameters of the devices for refreshable visualization have been obtained:

1. Number of Pins direction X;
2. Number of Pins direction Z;
3. Total Pins;
4. Refresh time (ms);
5. Holding Force or Tactile Force (mN);
6. Dot Spacing (mm);
7. Vertical Travel (movement) (mm).
The parameter “Number of Pins direction X” gives the resolution of the screen in the horizontal direction. Together with the “Number of Pins direction Z” parameter, it determines the portrait or landscape orientation of the device. Together with the “Dot Spacing” parameter, it determines the physical dimensions of the Braille screen.

The parameter “Number of Pins direction Z” gives the resolution of the screen in the vertical direction. Together with the “Number of Pins direction X” parameter, it determines the portrait or landscape orientation of the device. Together with the “Dot Spacing” parameter, it determines the physical dimensions of the Braille screen.

The parameter “Total Pins” gives the general graphical capabilities of the Braille screen. For example, with a small number of pins that are managed not individually,
but in groups of 6, the device can only represent character information, i.e. letters and numbers. With more than 300 individually managed pins, we can talk about pseudo-graphics. A total of 5000 individually managed pins is required for the full presentation of graphical information (paintings, icons, tapestries, photographs, etc.).

The parameter “Refresh time” indicates the frequency at which the control signal must be refreshed to keep the pin in position. In the absence of other means to maintain a position, this is not energy efficient. Some devices put the respective pin in position with a one-time impulse and hold it by other means, such as permanent magnets. In any case, the parameter “Refresh time” also indirectly shows the frequency at which the Braille screen image can change.

The parameter “Holding Force or Tactile Force” shows the strength with which the pin resists the finger pressure of the visually impaired person touching the screen. Depending on the selected way of maintaining the position of the pins (refresh or permanent magnet), this parameter indirectly shows the power of the corresponding permanent magnet, electromagnet or other holding element.

The parameter “Dot Spacing” shows the distance between the pins. Together with the number of pins in both directions, it gives the Braille screen size. It also indirectly shows the accuracy with which the graphical information is rendered.

The parameter “Vertical Travel (movement)” shows the height of the pins above the Braille screen in the working position. Some more expensive devices offer several levels of elevation to display relief, but this is still a concept. It should be noted that the fingers of visually impaired people are very sensitive and recognize pins even at low heights.
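The relations between these parameters can be made concrete with a short sketch. It is only an illustration: the function names and the example device (96 by 64 pins with 2.5 mm dot spacing) are hypothetical and do not describe any device analysed in this chapter. The screen size is approximated as pin count times dot spacing, and the text does not say how to classify an individually managed screen with 300 pins or fewer, so the sketch falls back to "character" in that case.

```python
def screen_size_mm(pins_x, pins_z, dot_spacing_mm):
    """Approximate screen size as pin count times dot spacing per direction."""
    return pins_x * dot_spacing_mm, pins_z * dot_spacing_mm


def orientation(pins_x, pins_z):
    """Portrait or landscape orientation from the two pin counts."""
    if pins_x > pins_z:
        return "landscape"
    if pins_x < pins_z:
        return "portrait"
    return "square"


def capability(total_pins, individually_managed):
    """Rough graphical capability, using the thresholds given in the text."""
    if not individually_managed:
        return "character"        # pins driven in groups of 6: letters, numbers
    if total_pins >= 5000:
        return "full graphics"
    if total_pins > 300:
        return "pseudo-graphic"
    return "character"            # assumption: not specified in the text


# Hypothetical device, for illustration only.
pins_x, pins_z, spacing = 96, 64, 2.5
print(screen_size_mm(pins_x, pins_z, spacing))   # (240.0, 160.0)
print(orientation(pins_x, pins_z))               # landscape
print(capability(pins_x * pins_z, True))         # full graphics
```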
In this paper the parameters of the devices for refreshable visualization have been analysed in detail by applying the multicriteria decision making method, the InterCriteria approach. The achieved results are presented in Tables 1 and 2.

Table 1 Membership pairs of the intuitionistic fuzzy InterCriteria correlations

μ    1      2      3      4      5      6      7
1    1.000  0.833  1.000  0.500  0.667  0.167  0.333
2    0.833  1.000  0.833  0.333  0.833  0.333  0.500
3    1.000  0.833  1.000  0.500  0.667  0.167  0.333
4    0.500  0.333  0.500  1.000  0.500  0.167  0.167
5    0.667  0.833  0.667  0.500  1.000  0.500  0.667
6    0.167  0.333  0.167  0.167  0.500  1.000  0.833
7    0.333  0.500  0.333  0.167  0.667  0.833  1.000

Table 2 Non-membership pairs of the intuitionistic fuzzy InterCriteria correlations

ν    1      2      3      4      5      6      7
1    0.000  0.167  0.000  0.167  0.167  0.667  0.500
2    0.167  0.000  0.167  0.333  0.000  0.500  0.333
3    0.000  0.167  0.000  0.167  0.167  0.667  0.500
4    0.167  0.333  0.167  0.000  0.333  0.667  0.667
5    0.167  0.000  0.167  0.333  0.000  0.500  0.333
6    0.667  0.500  0.667  0.667  0.500  0.000  0.167
7    0.500  0.333  0.500  0.667  0.333  0.167  0.000

The results show a strong relation between the parameter pairs:

• 1 (‘Number of Pins direction X’) − 3 (‘Total Pins’);
• 2 (‘Number of Pins direction Z’) − 5 (‘Holding Force or Tactile Force’);
• 1 (‘Number of Pins direction X’) − 2 (‘Number of Pins direction Z’);
• 2 (‘Number of Pins direction Z’) − 3 (‘Total Pins’);
• 6 (‘Dot Spacing’) − 7 (‘Vertical Travel (movement)’);
• 1 (‘Number of Pins direction X’) − 5 (‘Holding Force or Tactile Force’);
• 3 (‘Total Pins’) − 5 (‘Holding Force or Tactile Force’);
• 5 (‘Holding Force or Tactile Force’) − 7 (‘Vertical Travel (movement)’);
• 1 (‘Number of Pins direction X’) − 4 (‘Refresh time’);
• 3 (‘Total Pins’) − 4 (‘Refresh time’).

Part of these relations is due to the specific physical properties of the devices for refreshable visualization, which confirms the reliability of the proposed InterCriteria decision making approach. The benefit is that the approach finds strong dependencies as well as those that are less visible. The geometrical visualisation of the InterCriteria correlations for the devices for refreshable visualization on the intuitionistic fuzzy interpretational triangle is shown in Fig. 14.
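The ordering of the listed pairs can be reproduced from the values in Tables 1 and 2. The sketch below is an illustration rather than the authors' procedure: the function name `ranked_pairs` is hypothetical, and sorting by decreasing membership and then by increasing non-membership is an assumption. Under that assumption, the first ten pairs of the ranking coincide with the ten pairs listed in the text.

```python
# Values copied from Table 1 (membership) and Table 2 (non-membership).
MU = [
    [1.000, 0.833, 1.000, 0.500, 0.667, 0.167, 0.333],
    [0.833, 1.000, 0.833, 0.333, 0.833, 0.333, 0.500],
    [1.000, 0.833, 1.000, 0.500, 0.667, 0.167, 0.333],
    [0.500, 0.333, 0.500, 1.000, 0.500, 0.167, 0.167],
    [0.667, 0.833, 0.667, 0.500, 1.000, 0.500, 0.667],
    [0.167, 0.333, 0.167, 0.167, 0.500, 1.000, 0.833],
    [0.333, 0.500, 0.333, 0.167, 0.667, 0.833, 1.000],
]
NU = [
    [0.000, 0.167, 0.000, 0.167, 0.167, 0.667, 0.500],
    [0.167, 0.000, 0.167, 0.333, 0.000, 0.500, 0.333],
    [0.000, 0.167, 0.000, 0.167, 0.167, 0.667, 0.500],
    [0.167, 0.333, 0.167, 0.000, 0.333, 0.667, 0.667],
    [0.167, 0.000, 0.167, 0.333, 0.000, 0.500, 0.333],
    [0.667, 0.500, 0.667, 0.667, 0.500, 0.000, 0.167],
    [0.500, 0.333, 0.500, 0.667, 0.333, 0.167, 0.000],
]


def ranked_pairs(mu, nu):
    """List each pair of distinct criteria once, strongest relation first.

    Pairs are sorted by decreasing mu and, for equal mu, by increasing nu
    (an assumed tie-breaking rule). Criteria are numbered from 1.
    """
    m = len(mu)
    pairs = [
        (k + 1, l + 1, mu[k][l], nu[k][l])
        for k in range(m)
        for l in range(k + 1, m)
    ]
    pairs.sort(key=lambda p: (-p[2], p[3]))
    return pairs


ranking = ranked_pairs(MU, NU)
for k, l, mu_kl, nu_kl in ranking[:10]:
    print(f"{k} - {l}: mu = {mu_kl:.3f}, nu = {nu_kl:.3f}")
```

For every pair the sum of the two values does not exceed 1 (up to the rounding of the table entries); the remainder is the degree of uncertainty that places the pair inside the interpretational triangle rather than on its hypotenuse.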
7 Conclusion

There is still no device that can provide a detailed picture or graphic for visually impaired people. The developed test device of a Braille display, based on synchronously and simultaneously controlled linear electromagnetic micro drives, shows that the patented technology is simple, appropriate and energy efficient. This paper shows that objects of cultural and historical heritage can be recreated through 3D tactile tiles for planar objects and 3D printed models of spatial objects. The presented approach enables visually impaired people to “see” and perceive objects of cultural and historical heritage by touching the contours of the depicted figures and reading the Braille annotation.
Fig. 14 Intuitionistic fuzzy interpretational triangle with results of the InterCriteria analysis
The presented development demonstrates the application of an original multicriteria decision making approach, the InterCriteria approach, which focuses on the analysis of the relations between the criteria and thereby supports better production quality of the devices.

Acknowledgements The development described in this chapter is supported by the Bulgarian National Science Fund under Grants № DN 17/21-2017 and № DN 17/13-2017.
References

1. B. Thylefors, A global initiative for the elimination of avoidable blindness. Am. J. Ophthalmol. (1998)
2. H. Ando, T. Miki, M. Inami, T. Maeda, SmartFinger: Nail-Mounted Tactile Display, ACM SIGGRAPH. Retrieved 13 Nov 2004
3. A Step toward the Light, LLP-LdV-TOI-2007-TR-067 (2007)
4. Orbit Research, Orbit Reader 20: Revolutionary Technology, A Breakthrough in Affordability, Highest Quality Braille at the Lowest Price. http://www.orbitresearch.com/, last accessed 2018/02/28
5. Personal Assistive Device for BLIND and visually impaired people (BlindPAD), FP7-ICT2013-10 Project, Grant No 611621, Final Project Report, v. 5 Oct 2017, https://www.blindpad.eu/, last accessed 2018/02/28
6. J. Bornschein, D. Prescher, G. Weber, Inclusive production of tactile graphics. INTERACT 1(2015), 80–88 (2015)
7. S. O’Modhrain, N. Giudice, J. Gardner, G. Legge, Designing media for visually-impaired users of refreshable touch displays: possibilities and pitfalls. IEEE Trans. Haptics 8(3), 248–257 (2015)
8. S. Levinson, Mathematical Models for Speech Technology (Wiley Ltd., 2005). ISBN 0-47084407-8
9. Blitab (Tablet for Blind, last visited Apr 2018). http://assistivetechnologyblog.com/2017/01/blitab-worlds-first-tactile-tablet-for-the-blind.html
10. A. Sharma, R. Motiyar, Zixel: A 2.5-D Graphical Tactile Display System, SIGGRAPH, Asia (2011)
11. Phone Draw Braille (last visited Apr 2018). http://www.yankodesign.com/2012/02/20/theultimate-braille-phone/
12. Refreshable Braille Screen (last visited Apr 2018). https://liliputing.com/2016/01/refreshablebraille-displays-could-allow-the-blind-to-read-graphics.html
13. T. Guerreiro, K. Montague, J. Guerreiro, R. Nunes, H. Nicolau, D. Gonçalves, in Blind People Interacting with Large Touch Surfaces: Strategies for One-handed and Two-handed Exploration. Proceedings of the International Conference on Interactive Tabletops & Surfaces (2015), pp. 25–34
14. V. Cantoni, D. Karastoyanov, P. Moskoni, A. Setti, “1525–2015. Pavia, The Battle, The Future. Nothing was the Same Again” Milan EXPO and Pavia Castle Exhibition 13/6-29/11/2015
15. Abilitando, Where Technology Meets Disability, Series of Meetings Around the Relationship Between New Technologies, Disabilities and Integration in School and Work (2015)
16. Manifesto for Agile Software Development. http://agilemanifesto.org/
17. European digital art and science network, Austria. http://www.aec.at/artandscience/residencies/
18. R. Malina, C. Strohecker, C. LaFayette, Steps to an Ecology of Networked Knowledge and Innovation Enabling New Forms of Collaboration among Sciences, Engineering, Arts and Design, on behalf of SEAD network contributors, 2013
19. D. Karastoyanov, I. Yatchev, I. Balabozov, in Innovative Graphical Braille Screen for Visually Impaired People. Approaches and Solutions in Advanced Intelligent Systems (Springer International Publishing, Switzerland, 2016). ISBN: 978-3-319-322-06, ISSN: 1860-949X. https://doi.org/10.1007/978-3-319-32207-0, p. 648, pp. 219–240
20. D. Karastoyanov, Braille screen, Bulgarian Patent № 66520, 2016
21. D. Karastoyanov, S. Simeonov, Braille display, Bulgarian Patent № 66527, 2016
22. D. Karastoyanov, I. Yatchev, K. Hinov, T. Rachev, Braille screen, Bulgarian Patent № 66562, 2017
23. D. Karastoyanov, I. Yatchev, K. Hinov, Y. Balabosov, Braille Display, WIPO PCT Patent Application, № PCT/BG2014/00038, 10.01.2014
24. K. Atanassov, D. Mavrov, V. Atanassova, in InterCriteria Decision Making. A New Approach for Multicriteria Decision Making, Based on Index Matrices and Intuitionistic Fuzzy Sets. Proceedings of the 12th International Workshop on Intuitionistic Fuzzy Sets and Generalized Nets, Warsaw, Poland, 2013
25. K. Atanassov, Intuitionistic Fuzzy Sets (VII ITKR’s Session, Sofia, 1983). (in Bulgarian)
26. K. Atanassov, Intuitionistic fuzzy sets. Fuzzy Sets Syst. 20(1), 87–96 (1986)
27. K. Atanassov, Intuitionistic Fuzzy Sets: Theory and Applications (Physica-Verlag, Heidelberg, 1999)
28. L.A. Zadeh, Fuzzy sets. Inf. Control 8, 333–353 (1965)
29. K. Atanassov, Generalized Nets (World Scientific, Singapore, 1991)
Semantically Enriched Multi-level Sequential Pattern Mining for Exploring Heterogeneous Event Log Data

Pierre Dagnely, Tom Ruette, Tom Tourwé and Elena Tsiporkova
Abstract Photovoltaic (PV) event log data are typically underexploited, mainly because of the heterogeneity of the events. To unlock these data, we propose an explorative methodology that overcomes two main constraints: (1) the rampant variability in event labelling, and (2) the unavailability of a clear methodology to traverse the amount of generated event sequences. With respect to the former constraint, we propose to integrate heterogeneous event logs from PV plants with a semantic model of the events. However, since different manufacturers report events at different levels of granularity, and since the finest granularity may sometimes not be the right level of detail for exploitable insights, we propose to explore PV event logs with Multi-level Sequential Pattern Mining. On the basis of patterns that are retrieved across taxonomic levels, several event-related processes can be optimized, e.g. by predicting PV inverter failures. The methodology is validated on real-life data from two PV plants.

Keywords Semantic integration · Ontology model · SPARQL · Multilevel sequential patterns · Photovoltaic plants
P. Dagnely (B) · T. Ruette · T. Tourwé · E. Tsiporkova
Sirris Elucidata Innovation Lab, Bd. A. Reyerslaan 80, 1030 Bruxelles, Belgium
© Springer Nature Switzerland AG 2020. R. Jardim-Goncalves et al. (eds.), Intelligent Systems: Theory, Research and Innovation in Applications, Studies in Computational Intelligence 864, https://doi.org/10.1007/978-3-030-38704-4_9

1 Introduction

With the ubiquity of solar energy, a multitude of PV plants are installed around the world. The parallel development of the Internet of Things facilitates the continuous remote monitoring of the operation of these plants. Inverters are the main devices in a PV plant, collecting and aggregating the electricity produced by the panels and subsequently preparing it for the grid. Therefore, they are closely monitored by monitoring systems (MS). The log data generated by these MSs contain, among others, error messages, e.g. “PV over-voltage”, and are often merely used to raise an alarm when a specific event occurs.

In other domains, such as data center monitoring or complex machine maintenance, log data are being considered for early fault detection purposes. Initial studies from these domains, discussed in Sect. 3, show that log files can be successfully used to find recurring sequences of events that lead to errors, making it possible to predict errors so that pre-emptive counter-measures can be taken.

To exploit the event log data, we propose an explorative methodology that overcomes two main constraints, i.e. (1) the rampant variability in event labelling, leading to data model and semantic heterogeneities, and (2) the unavailability of a clear methodology to traverse the immense amount of generated event sequences.

The optimal exploitation of log data in the PV domain is mostly hampered by the heterogeneity of the event labels generated by the different inverters. The event labels differ across inverter manufacturers, models, series and data loggers, as shown in Table 1. There is currently no finalized and widely accepted standard for PV event logs, despite the SunSpec initiative [2], and the event labels consequently depend on the inverter-MS pair. A survey of 1000 plants with event log data in the portfolio under study shows that 46% of the plants monitored by our industrial partner consist of at least two different inverter models. Given this variability, event logs are not commensurable, and standard event log data analyses will not be optimal unless the log files are made uniform. This creates two problems: (1) data model heterogeneity, i.e. the heterogeneity of the event labels generated by the assets, and (2) semantic heterogeneity, i.e. the heterogeneity in terms of which events are reported by the assets.

Table 1 Event concepts related to DC over-voltage and their labels in two inverter models
Manufacturer name: Fronius; Danfoss
Inverter model: IGPlus150-3/150V-3; Danfoss_TLX15/TLX_15_PRO
MS model: SOLARLOG, SSG; SOLARLOG, SKYLOG/SSG
Event concepts: Intermediate circuit over-voltage; PV over-voltage
Event labels as listed: 1071; 1071; 1283 or 1284; 209 or 210; 258
1.1 Data Model Heterogeneity Problem
Applying pattern mining to a dataset generated by a heterogeneous plant or set of plants, i.e. with various inverter types, would generate incomplete or false patterns. Since pattern mining returns sequences of events, i.e. sequences of labels, it relies on consistent labelling. As a concrete PV example, in a plant composed of Danfoss and Fronius inverters, a pattern leading to event label 1010 could mix sequences leading to General grid error (event 1010 of Danfoss) with sequences leading to Grid voltage too high (event 1010 of Fronius). Similarly, an event may have two different labels depending on the type of inverter reporting it. Pattern mining will then treat these as two different events and may miss patterns leading to the event. Any pattern mining method therefore needs to be applied to an integrated dataset with consistent labelling. Unfortunately, industrial portfolios are often heterogeneous, so data integration is required before applying pattern mining to the event logs they generate.
The alternative, looking for patterns separately in each sub-dataset generated by homogeneous assets, is usually impractical. This approach is simpler, as only the results need to be integrated to compare the various patterns retrieved. However, when pattern mining is applied only to subsets of the data, it may operate on biased datasets and return incorrect results. It may also miss important patterns that occur only a few times in each sub-dataset. This approach only makes sense for assets with very different behaviors, when no patterns shared by all asset types are expected. However, the assets of an industrial portfolio usually behave similarly. Therefore, pattern mining should usually be applied to the whole heterogeneous dataset, and hence requires data integration.
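The label collision described in this subsection can be illustrated with a minimal Python sketch. The two meanings of label 1010 are taken from the example above; the log entries and the dictionary-based mapping are hypothetical and only serve the illustration.

```python
from collections import Counter

# Raw log entries as (manufacturer, event label). The two meanings of
# label 1010 come from the text; the entries themselves are made up.
raw_log = [
    ("Danfoss", "1010"),
    ("Fronius", "1010"),
    ("Danfoss", "1010"),
]

# Hypothetical mapping from (manufacturer, label) to a shared event concept.
LABEL_TO_CONCEPT = {
    ("Danfoss", "1010"): "General grid error",
    ("Fronius", "1010"): "Grid voltage too high",
}

def integrate(log):
    """Replace each manufacturer-specific label by its shared concept."""
    return [LABEL_TO_CONCEPT[(maker, label)] for maker, label in log]

# Counting raw labels conflates two different events under "1010".
raw_counts = Counter(label for _, label in raw_log)
# Counting after integration keeps the two events apart.
integrated_counts = Counter(integrate(raw_log))
```

On the raw labels, all three entries are counted as the same event, whereas after integration the Danfoss and Fronius events are counted separately.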
1.2 Semantic Heterogeneity Problem
The second main problem when applying pattern mining to industrial event logs is semantic heterogeneity. This problem is linked to the wide variability of the events. As each manufacturer defines its own monitoring system, it also defines which events will be reported by its devices. Therefore, two devices from distinct manufacturers may not report events at the same conceptual level. A concrete example of the complications that emerge from this lack of standardization is the event concept Internal temperature of the inverter is too high reported by Danfoss inverters, which does not specify where the over-temperature occurred. Fronius inverters are more precise and identify the component suffering from the over-temperature with events such as Over-temperature buck converter or Over-temperature cooling element. Pattern mining will therefore consider these events as different although they are conceptually similar. Some patterns leading to temperature errors could be found infrequent if these events are considered distinct, whereas they would be found frequent, and hence retrieved by the algorithm, if these events were considered similar.
Therefore, even on integrated datasets, the semantic heterogeneity will still impact the results of pattern mining. Applying standard pattern mining to a multi-level dataset (assuming that the data model heterogeneity has been solved beforehand) will lose much information. The patterns returned will be correct and accurate but not as complete as they could be. Taking the multi-level aspect into account allows an adapted pattern mining algorithm to retrieve more relevant patterns at the same support threshold, i.e. the minimum number of occurrences required for a pattern to be considered.
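The effect of the conceptual level on the support can be sketched as follows. The three temperature events are those named above; the parent concept name "Over-temperature", the sequences and the threshold are assumptions made for the sake of the example.

```python
# Hypothetical parent concept for the three temperature events named in the text.
PARENT = {
    "Internal temperature of the inverter is too high": "Over-temperature",
    "Over-temperature buck converter": "Over-temperature",
    "Over-temperature cooling element": "Over-temperature",
}

# Made-up event sequences, one per inverter and day.
sequences = [
    ["Over-temperature buck converter"],
    ["Over-temperature cooling element"],
    ["Internal temperature of the inverter is too high"],
    ["GridFail"],
]

def support(event, seqs, generalise=False):
    """Fraction of sequences containing the event, optionally after roll-up."""
    hits = 0
    for seq in seqs:
        items = [PARENT.get(e, e) for e in seq] if generalise else seq
        if event in items:
            hits += 1
    return hits / len(seqs)

MIN_SUPPORT = 0.5  # illustrative threshold
flat = support("Over-temperature buck converter", sequences)
rolled = support("Over-temperature", sequences, generalise=True)
```

With these toy sequences, each specific temperature event stays below the threshold, while the generalised concept exceeds it.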
1.3 Approach
In this paper, we therefore demonstrate the potential of exploiting uniformised event log files of PV plants to find recurring sequences of events. By doing so, we address the two problems identified above. First, we tackle the issue of the heterogeneous log data, which we propose to approach by semantically modeling the events in an ontology and then using semantic integration techniques to uniformise the event logs. Second, we offer an exploration methodology for the uniformised event logs by searching for recurring sequences of events. As an added value of using an ontology for the semantic integration, event sequences can be found not only across devices and PV plants, but also across hierarchies of events, by applying a Multilevel Sequential Pattern Mining (MLSPM) algorithm [13]. The paper is organised as follows. First, the relevant literature about semantic integration and MLSPM is discussed in Sects. 2 and 3. Then, in Sect. 4, we explain the motivation for our research and apply both methods to the PV domain by developing a case study on a real-world dataset provided by our industrial partner. Finally, we conclude the paper with a discussion and an outlook for further research in Sect. 5.
2 Semantic Integration
Integrating heterogeneous, yet complementary data is a long-standing problem for which, according to [8], three main approaches have been developed: data fusion [8], record linkage [8] and schema mapping, which creates a mediated schema, i.e. a schema representing all of the available data, and then defines mappings between this mediated schema and the local schemas of the data sources. One successful schema mapping method is semantic integration. This method is widely used in the bio-medical domain, with encouraging results [14], e.g. to integrate biological information such as genome data [14]. In semantic integration, the role of the mediated (global) schema is taken up by an ontology, i.e. a formal naming and definition of the types, properties and interrelationships of the entities in a specific domain. An ontology is composed of several components:
• Concepts: Domain-related entities, e.g. an inverter or the ambient temperature. An instantiation of a concept is called an individual, e.g. the individual inverter_1 instantiates the concept Inverter and represents a specific inverter.
• Hierarchy: Concepts can be arranged hierarchically to define sub- and super-concepts, e.g. DC over-voltage is a super-concept of the sub-concept PV over-voltage. Hierarchies are a powerful way to manipulate the data at different levels of granularity, e.g. a query for all instances of a DC over-voltage fault will return all instantiations of the sub-concepts.
• Relations: There are two types of relations. First, object properties link two instantiations of concepts, e.g. is component of is a relation which links inverter_1 to monitoring_system_1. Second, data properties link an instantiation of a concept to a literal, i.e. a secondary concept which is not represented as a concept in the ontology but through a simple format such as string or integer, e.g. has ID is a relation which links the individual inverter_1 to the literal “34521”.
In this study, the ontology models the information flow in terms of event logs generated during the operation of a PV plant. As far as we know, there are no ontologies that cover PV event log entries. There are ontologies that describe a generic model of events, beyond the PV domain, such as the one created by Shaw et al. [19]. However, we are interested in the specific PV events with their specific relations and classifications, and we therefore require a tailored ontology. Existing ontologies of the PV domain focus on production efficiency rather than on the modelling of event concepts. As an example, Abanda et al. [1] defined an ontology describing all the components of a PV plant and their production parameters, such as CO2 saving or sun availability. Since none of these previous efforts has the level of detail that is required for semantically integrating PV plant event logs, we propose a new ontology (see also [6]).
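The ontology components listed above (concepts, hierarchy, object and data properties) can be mimicked with plain Python data structures. This is only an illustrative sketch, not the ontology itself: the concept names and inverter_1 come from the text, whereas the individuals fault_1 and fault_2 and the helper functions are hypothetical.

```python
# Concept hierarchy: sub-concept -> super-concept (names from the text).
SUBCONCEPT_OF = {
    "PV over-voltage": "DC over-voltage",
    "Intermediate circuit over-voltage": "DC over-voltage",
}

# Individuals and the concept they instantiate (fault_* are hypothetical).
INSTANCE_OF = {
    "inverter_1": "Inverter",
    "fault_1": "PV over-voltage",
    "fault_2": "Intermediate circuit over-voltage",
}

# An object property links two individuals; a data property links an
# individual to a literal.
OBJECT_PROPERTIES = [("inverter_1", "is component of", "monitoring_system_1")]
DATA_PROPERTIES = [("inverter_1", "has ID", "34521")]

def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCONCEPT_OF.get(concept)
    return False

def instances_of(concept):
    """All individuals of `concept` or of any of its sub-concepts."""
    return sorted(i for i, c in INSTANCE_OF.items() if is_a(c, concept))
```

Querying `instances_of("DC over-voltage")` returns the individuals of both sub-concepts, which is the behaviour described for hierarchical queries.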
Once the ontology, i.e. the mediated schema, is defined, the mapping with all the local schemas can be expressed through mapping rules. For example, in the PV domain, the local schemas are the specific event labels used by each manufacturer, and the mapping rules establish the relation between an event label (as sent by an inverter-MS pair, e.g. “258”) and an Event (as defined conceptually in the ontology, e.g. PV over-voltage). The mapping rules can be expressed using SPARQL (SPARQL Protocol and RDF Query Language), OWL (Web Ontology Language) constraints [3] or SWRL [3]. SPARQL has been shown to be the most efficient of these approaches by Boran et al. [3]. SPARQL is the preferred query language in the semantic domain, mainly through the SELECT query type, which retrieves information. However, a specific type of query can also insert new relations into a database. Such queries can be used to insert the relation between an event label and an Event in the database.
3 Multilevel Sequential Pattern Mining
Sequential Pattern Mining (SPM) algorithms are particularly useful to gain insight into the operational process of a PV plant. By uncovering recurring patterns, i.e. sequences of PV events, it is possible to gain insight into normal and anomalous behavior. Next to the strong explorative potential of such patterns, exploitation possibilities can also be envisaged, e.g. predicting failures ahead of time by early detection of event sequences that were known to lead to outages in the past. With such forecasting information available, maintenance teams can be dispatched pro-actively or plant operation can be adjusted.
Mainly two types of algorithms are used for SPM. On the one hand, there are the Candidate Generation (CG) algorithms, which include e.g. GSP [15] and SPADE [15]. On the other hand, there are the Frequent Pattern Growth algorithms, such as PrefixSpan [15] or BIDE [15]. Only a few algorithms are unique in their design, such as CloSpan [15] or DISC [15]. All these SPM algorithms have been applied successfully in many cases, such as critical event prediction [4] or the prediction of the next prescribed medication [22].
Many different variants of the standard SPM algorithms exist. However, the most suitable flavour of SPM to find patterns in items for which hierarchical knowledge is available is Multi-level SPM (MLSPM). These algorithms can look for patterns at different levels of conceptual detail, by using a taxonomy of the concepts behind the items in a sequence. For instance, MLSPM can retrieve patterns containing DC over-voltage faults and/or their lower-level instantiations such as PV over-voltage or Intermediate circuit over-voltage. This clearly enriches the bird’s-eye view of the PV behaviour, which should be advantageous for domain experts. Another difference of MLSPM algorithms lies in how they check whether a pattern reaches the support threshold. MLSPM counts the support, i.e. the number of occurrences, of the pattern itself (as SPM does), and adds the support of any of its more specific representations, i.e. conceptually lower-level versions of the pattern. An important consequence is that, since general, i.e. conceptually higher-level, patterns tend to have higher support values, they can be retrieved with a higher support threshold. For instance, the event DC over-voltage could be retained while patterns containing PV over-voltage or Intermediate circuit over-voltage would have too low a support.
The first MLSPM algorithm to be found in the literature is multi-level GSP, developed by Srikant et al. [21]. It adds all the ancestors of each element to the sequence before applying GSP. This algorithm has some drawbacks, among which the long computation time, the heavy memory requirements and the necessity to post-process the patterns to remove redundancy. Hierarchical Generalized Sequence Patterns (hGSP) is an inter-level SPM algorithm based on GSP, developed by Sebek et al. [18]. In [11], an algorithm is proposed that handles multi-levelled properties for sequences prior to and separately from the testing and counting steps of candidate sequences. Chen et al. [5] defined an algorithm using a numerical re-encoding of the taxonomy, defining e.g. a 3-level taxonomy with 1∗∗ as root, 11∗, 12∗, 13∗ as second
level and 111, 112, 113 as children of 11∗. By renaming the sequences using this encoding, it is easy to check whether a pattern matches a sequence. For instance, 111, 258 matches 1∗∗, 2∗∗, which can be easily verified by looking only at the first digit of 111 and 258. This re-encoding makes it easy to compute the support of a pattern without using the taxonomy anymore. An adapted APriori-based algorithm is then used to find multi-level patterns. One drawback of this method, mentioned by Lianglei et al. [13], is its inability to deal with large taxonomies. They took the example of the node 111, which can be e.g. the first node of the third level or the eleventh node of the second level. Therefore, they proposed PMSM, a modification of this algorithm [13], in which prime numbers are used for the numerical re-encoding (a more complete explanation of this algorithm can be found below). Plantevit et al. [16] introduced M3SP, an SPM algorithm that combines multi-level and multi-dimensional approaches (a flavor of SPM that takes other parameters into account, such as the location or the size of a plant). Egho et al. [10] applied such an algorithm to mine real-world data of healthcare trajectories in France, and found potentially interesting patterns for healthcare specialists. Nonetheless, as far as we know, SPM has not yet been applied to the sequences of events generated by the monitoring devices in PV plants.
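The wildcard matching behind the digit-based re-encoding of Chen et al. [5] discussed above can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the published algorithm: codes are handled as strings, the ASCII character * stands for the wildcard ∗, and a pattern is taken to match a sequence when its items occur in order, not necessarily contiguously.

```python
def matches(pattern_code, item_code):
    """True if an encoded item falls under an encoded, possibly general, node.

    '*' is a wildcard digit, so '1**' covers every 3-digit code starting with '1'.
    """
    return len(pattern_code) == len(item_code) and all(
        p == "*" or p == d for p, d in zip(pattern_code, item_code)
    )

def sequence_matches(pattern, sequence):
    """True if the pattern items are matched in order by items of the sequence."""
    remaining = iter(sequence)  # shared iterator enforces the ordering
    return all(any(matches(p, item) for item in remaining) for p in pattern)
```

For instance, `sequence_matches(["1**", "2**"], ["111", "258"])` holds, which corresponds to the example given in the text.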
4 Case Study
In this section, we first describe the available data. Then, the ontology for modeling the PV event workflow and its implementation for the integration of the data are presented. Finally, the interaction between the hierarchically structured ontology and a multi-level sequential pattern mining algorithm is explained.
4.1 Business Goals and Experimental Design
The two main constraints that hamper the use of event log data in PV monitoring are (1) the unavailability of a clear methodology to traverse the immense amount of generated event sequences, and (2) the rampant variability in event labelling. The suitability of our MLSPM approach to deal with those (real-world) constraints and its explorative potential are therefore demonstrated in a well-designed experimental context, composed of two comparison studies:
1. The general advantages and drawbacks of MLSPM (PMSM) are showcased by comparing its pattern mining output and computational performance with those of a regular SPM (APriori) algorithm. For this purpose, both algorithms have been applied to event log data originating from a single PV plant consisting of a homogeneous set of device types, as regular SPM cannot deal with heterogeneous data.
2. To illustrate the essential feature of our approach, i.e. its ability to deal with multi-source heterogeneous data, two separate datasets have been composed. Subsequently, the pattern
mining outputs of MLSPM, applied separately to those two datasets, have been compared with the patterns retrieved with the same algorithm when the two datasets are merged into a single dataset. The merged dataset is no longer homogeneous, since the chosen PV plants are composed of different device types.
4.2 Data Understanding
Our benchmark corpus consists of 1 year of data from two different PV plants, Plant A and Plant B, with respectively 33 and 26 inverters, resulting in two datasets of respectively 251,401 and 113,593 events with 26 and 24 distinct types of events. They are provided by our industrial partner 3E, which is active in the PV plant monitoring domain through its Software-as-a-Service SynaptiQ. These plants use distinct inverters from big players, so that the data are representative, and are close to each other, to avoid a large influence of meteorological variables. As we are interested in daily patterns, the two datasets have been segmented into sequences that represent the events reported by a single inverter during a single day. The day sequences vary greatly in length, ranging from 1 to 1242 events. The support of a pattern is calculated by counting the number of day sequences in which the pattern occurs, divided by the total number of sequences in the dataset.
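The segmentation into day sequences and the support computation described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the event tuple layout, the function names and the toy events are assumptions, and a pattern is assumed to occur in a sequence when its items appear in order, not necessarily contiguously.

```python
from collections import defaultdict
from datetime import datetime

def day_sequences(events):
    """Group (timestamp, inverter_id, label) events per inverter and per day."""
    groups = defaultdict(list)
    for ts, inverter, label in sorted(events):  # chronological order
        groups[(inverter, ts.date())].append(label)
    return list(groups.values())

def contains(sequence, pattern):
    """True if `pattern` occurs in `sequence` as an ordered subsequence."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)

def support(pattern, sequences):
    """Share of day sequences in which the pattern occurs."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)

# Toy events with made-up labels "A" and "B".
events = [
    (datetime(2015, 6, 1, 8, 0), "inv_1", "A"),
    (datetime(2015, 6, 1, 9, 0), "inv_1", "B"),
    (datetime(2015, 6, 1, 8, 30), "inv_2", "B"),
    (datetime(2015, 6, 2, 8, 0), "inv_1", "B"),
    (datetime(2015, 6, 2, 9, 0), "inv_1", "A"),
]
sequences = day_sequences(events)
```

In this toy example, three day sequences are produced and the pattern A followed by B occurs in one of them.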
4.3 Integrating PV Events with an Ontology
As there is no ontology that models PV events, we developed a suitable ontology on the basis of the rich domain knowledge available at 3E and the manufacturer documentation per inverter. In what follows, we first describe the concepts in the ontology, their hierarchical status and the relations among them in a more formal way. Then, we provide an example to make things more tangible. Finally, we discuss the queries that drive the semantic integration functionality. For a more detailed discussion of this ontology, we refer to [6].
Concepts
In this subsection, we introduce the concepts in the ontology. To find an optimal level of conceptualisation, we relied on the input of the 3E domain experts. They specified that concepts in the ontology should capture all abstract (e.g. timestamps) and concrete (e.g. inverters) objects at a PV plant. In the remainder of the text, concepts will be presented in small capitals. Most of the concepts are conceptually attributes of other concepts, e.g. the concept Inverter type is an attribute of the concept Inverter. The main concepts with selected attribute concepts (if applicable) are presented here:
• Equipment, which represents any device in or involved in the PV plant. A related attribute concept is Equipment fault, which may be instantiated as a fault label such as Intermediate circuit over-voltage.
• Inverter, which represents a specific device of a PV plant, the inverter. Related attribute concepts are (1) the Inverter type, e.g. WRTP4Q39, and (2) the Inverter status, e.g. on or off.
• Monitoring system, which represents a specific device of a PV plant, the monitoring system. A related attribute concept is the Monitoring system type, e.g. SOLARLOG.
• Power station, which represents a full PV plant.
• Power station component, which represents any sub-component of a PV plant or, transitively, a sub-component of a sub-component of a PV plant.
• Interaction Event, which represents an event, i.e. a phenomenon that occurs or is detected in a PV plant, such as Equipment fault (see below) or Maintenance activity. A related attribute concept is Event label, which identifies an event relative to the inverter and MS that detected it.
• Equipment fault, which represents any event occurring in or around the PV plant. Equipment fault has two sub-concepts: Grid fault and Power station fault.
• State, which represents the specific event describing the state in which the PV plant is. For instance, the PV plant can be in the state Running, Stop, Freeze, Error or Recovery. These statuses are therefore grouped in two sub-categories: (1) NotOperatingState, which contains all the statuses where the PV plant does not produce electricity, and (2) OperatingState, which contains all the “regular” statuses, i.e. the statuses where the PV plant produces electricity.
Hierarchies
We defined our proposed hierarchy of Event concepts on the basis of specialized documentation from the manufacturers (mainly [7, 12, 17, 20]) and on the knowledge of domain experts from 3E. This hierarchy contains 289 concepts with 10 levels. Of the 289 concepts, 253 belong to the Equipment fault hierarchy. The most important part of the ontology is the 6-level hierarchy of 253 concepts of PV events. The hierarchy contains all the events reported by the main manufacturers (SMA, Power One, Danfoss and IG plus). It should therefore offer a complete overview of the possible events. For the sake of readability, this section only covers a small subset of the Equipment fault hierarchy. The events can be divided into two main categories:
• Grid fault, which represents the faults occurring in the grid, i.e. in the connection between the PV plant and the “outside world”. They are not faults occurring in the PV plant per se, but they impact the behavior of the PV plant as it cannot send the electricity produced to the grid. The sub-concepts are:
– GridEventDetected, which represents an unspecified event occurring in the grid
– GridFail, which indicates that the grid does not operate
– IslandingFault, which represents an islanding fault
– WaitingGridVoltage, which indicates that PV inverters are waiting to know the grid voltage in order to adapt the voltage of the current sent from the PV plant to the grid
– ACVoltageOutOfRangeFault, which indicates that the voltage of the grid current is outside the boundaries accepted by the PV plant
– FrequencyOutOfRangeFault, which indicates that the frequency of the grid current is outside the boundaries accepted by the PV plant
– GridImpendanceOutOfRange, which indicates that the impedance of the grid current is outside the boundaries accepted by the PV plant
– GridRelayFault, which indicates that the relay between the grid and the PV inverters failed
– GridInstallationFault, which indicates that the installation of the connection between the grid and the PV inverter has not been done properly
– InverterBrieflyDisconnectsOfGrid, which indicates that the connection between the grid and the PV inverters has briefly been compromised.
Only ACVoltageOutOfRangeFault and FrequencyOutOfRangeFault have sub-concepts, representing the various possibilities of “out of range”, e.g. above a certain threshold, over frequency, rapid fall and rise of the frequency, over frequency in a sub-component, ...
• Power station fault, which represents the events occurring in the PV plant itself. It has 9 sub-concepts, some of which have a sub-hierarchy of up to 5 levels. The sub-concepts are:
– ConfigurationFault, which represents all the events related to a wrong installation of any component of the PV plant (except the components linked to the grid, for which the events are in the Grid fault sub-hierarchy)
– FaultDueToTemperature, which represents all the events related to external temperatures being outside the operating conditions of the PV plant
– GroundFault, which represents the ground faults
– HardwareFault, which represents all the events linked to the failure of a specific component of the PV plant
– VoltageFault, which represents all the events related to a voltage issue in the current generated by a PV module
– CommunicationFault, which represents the events linked to software failures when the communication between components of the PV plant is not functioning. For instance, two sub-components of the inverter do not communicate, which makes it impossible for the inverter to calibrate itself correctly.
– MemoryLossFault, which represents another type of software failure, when the memory of the inverter software fails
– TestFault, which represents the events related to the automatic tests performed by the inverter, e.g. in the morning the inverter tests whether there is enough sun to start producing electricity
– DisplayError, which indicates that the on-site display of the inverter is not working. This event is the only one without a sub-hierarchy.
Relations
Concepts can also be linked through relations, the majority of which are between a concept and its attributes. As an example, consider the relations Inverter has status Status and Equipment has fault Equipment fault. However, other relations, defined in close collaboration with domain experts, convey more knowledge about the behavior of a PV plant. In total, 32 relations are defined in the ontology. The main (object) properties are:
• is component of, which links a sub-device to its device, e.g. Inverter is component of Power station.
• invokes, which links an Event to the Monitoring system and Inverter which detected it, e.g. Monitoring system invokes Event.
• reports, which defines what is reported by an Event, e.g. Event reports Equipment fault or Event reports Maintenance activity.
• hasManufacturer, which defines the InverterManufacturer of an Inverter.
• hasStatus, which defines the State of an Inverter, e.g. Running, Stop or Freeze.
• hasTilt, which defines the InverterTilt, i.e. the orientation, of an Inverter.
• hasFault, which defines which EquipmentFault occurs in which Equipment.
• hasBeginning, which defines the installation date of the PowerStation.
Relations are defined between concepts and link individuals belonging to those concepts, or an individual to a literal. An individual is for example inverter_1, which is one specific Inverter and which is linked to the individual monitoring_system_1 through the is component of relation. inverter_1 is also linked to the literal “34521” through the has ID relation. Data are therefore seen as tuples, i.e. as sets of two individuals linked by one relation.
Mapping Rules
On the basis of the above concepts, hierarchies and relations, semantic integration of event logs is possible through the addition of SPARQL queries to the ontology. SPARQL queries make the relation between an event label (as sent literally by an inverter-MS pair, e.g. “223”) and an Event (as defined conceptually in the ontology, e.g. Islanding fault) explicit. To make the mapping, three parameters need to be resolved: (1) the inverter type, (2) the MS type, and (3) the event label. On the basis of these parameters, the precise Event can be identified in the ontology. For example, a domain expert knows that the event label “258” reported by an Inverter of type TLX15 through the Monitoring System SSG is a PV over-voltage fault. This knowledge is expressed in a mapping rule. A graphical representation of the mapping rule can be found in Fig. 1. It relies on a subset of the ontology represented in Fig. 2. The concepts and relations in black represent the asserted data, i.e. the data stored in the PV event log. The concepts and
relations in light grey are inferred, i.e. they are the result of the mapping rule. In this example, the event “258” monitored by an SSG MS on a TLX15 inverter is inferred to be a PV over-voltage fault. We have used INSERT SPARQL queries to express the mapping rules. These queries, as shown in Fig. 3, are composed of two parts: (1) the INSERT part, which contains the relation to create (the light grey part of Fig. 1), and (2) the WHERE part, which contains the condition under which the rule applies (the black part of Fig. 1). The semantic integration has been implemented in Java, using the Apache Jena library, and includes these steps:
1. The log data are saved in a triplestore, i.e. a semantic database (we have chosen TDB, the built-in triplestore of Jena). This kind of database first loads the ontology and is then able to store the data according to the concepts and relations defined in the ontology, as shown in Fig. 4. SPARQL queries can then efficiently be applied
Fig. 1 Graphical representation of an example of the event mapping rule, based on the inverter-MS pair models
Fig. 2 Main concepts and relations of the ontology (only a small section of the Equipment fault hierarchy is shown) related to the mapping rules
PREFIX sir: <http://www.sirris.be/SolarPanel.owl#>
insert {
  ?fault a sir:IslandingFault .
}
where {
  ?event sir:hasLabel sir:258 .
  ?logger sir:hasModel sir:ssg .
  ?logger sir:invokes ?event .
  ?inverter sir:hasModel sir:TLX15 .
  ?inverter sir:invokes ?event .
  ?event sir:reports ?fault .
}
Fig. 3 Example of a SPARQL query used for the mapping
Fig. 4 Representation of a log entry, based on the ontology. Hexagons represent individuals, i.e. values of object properties, while squares represent literals, i.e. values of data properties (indicated by arrows with non-filled tips)
<InverterModel>TLX15</InverterModel>
<MSModel>SSG</MSModel>
<Label>258</Label>
<Event></Event>
...
Fig. 5 XML file for a domain expert to make the link between an event label and an event concept. In this case, the domain expert would add “PV over-voltage fault” in the “Event” node
to the data through built-in functions. An alternative is to use a non-semantic database, such as a relational database, together with a semantic layer, e.g. RDB2RDF or Virtuoso. The layer then maps, in real time, between the rows of the tables and their semantic representation, in both directions.
2. Writing the SPARQL queries is cumbersome, as each inverter-MS pair can detect hundreds of different events. Moreover, the mapping has to be done manually, as only a domain expert can link a specific error to its conceptual representation in the ontology. However, this task can be eased by preprocessing the data and creating a simple XML file to fill out instead of writing raw SPARQL queries, as shown in Fig. 5. The XML element contains the model of the inverter-MS pair, the event label and an empty element where the domain expert can indicate the corresponding ontology concept, i.e. the only missing mapping information. Once the XML elements are completed, they are automatically parsed and converted into SPARQL queries (Fig. 3).
3. Once created, the mapping rules can be applied to any triplestore with a SPARQL engine. After the SPARQL engine has run, the triplestore contains the integrated data.
4. After the mapping step, the integrated data can be retrieved or exported through further SPARQL queries.
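The conversion of a completed XML element into a mapping query (step 2) could look like the following Python sketch. It is not the Java/Jena implementation used in the study. The wrapper element `<Mapping>`, the helper names, the lower-casing of the MS model and the assumption that the expert enters the local name of the ontology class are all hypothetical; the property names and the example values mirror the query of Fig. 3.

```python
import xml.etree.ElementTree as ET

PREFIX = "PREFIX sir: <http://www.sirris.be/SolarPanel.owl#>"

# <Mapping> is a hypothetical wrapper element; its four children follow Fig. 5.
# Here the expert has already filled in <Event> with an ontology class name.
FILLED_XML = """
<Mapping>
  <InverterModel>TLX15</InverterModel>
  <MSModel>SSG</MSModel>
  <Label>258</Label>
  <Event>IslandingFault</Event>
</Mapping>
"""

def text_of(mapping, tag):
    """Return the stripped text of a child element, or '' if empty or missing."""
    return (mapping.findtext(tag) or "").strip()

def to_sparql(mapping):
    """Build an INSERT query shaped like the one in Fig. 3."""
    event = text_of(mapping, "Event")
    if not event:
        raise ValueError("the Event element has not been filled in yet")
    # Lower-casing the MS model mirrors sir:ssg in Fig. 3 (an assumption).
    logger = text_of(mapping, "MSModel").lower()
    return "\n".join([
        PREFIX,
        "insert { ?fault a sir:%s . }" % event,
        "where {",
        "  ?event sir:hasLabel sir:%s ." % text_of(mapping, "Label"),
        "  ?logger sir:hasModel sir:%s ." % logger,
        "  ?logger sir:invokes ?event .",
        "  ?inverter sir:hasModel sir:%s ." % text_of(mapping, "InverterModel"),
        "  ?inverter sir:invokes ?event .",
        "  ?event sir:reports ?fault .",
        "}",
    ])

query = to_sparql(ET.fromstring(FILLED_XML.strip()))
```

Elements whose Event node is still empty are rejected, so that no mapping rule is generated before the domain expert has provided the missing information.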
4.4 Finding Recurring Multi-level Event Sequences
For this study, the Prime encoding based Multi-level Sequential patterns Mining (PMSM) algorithm of Lianglei et al. [13] has been selected, as it is well adapted to large taxonomies. The taxonomy we use in this paper, i.e. the sub-tree of our ontology concerning events, consists of 253 concepts and 6 levels. PMSM relies on the APriori approach of Candidate Generation. It starts by finding all concepts (events) above the support threshold, which can be seen as frequent patterns of size 1. Then it iterates, generating candidate patterns of size n by combining the frequent patterns of size n − 1 with each other (see example below). Subsequently, only the frequent patterns of size n are kept. The algorithm stops when no new frequent patterns can be found. The candidate generation follows these rules: (1) the combination of the frequent events a1 and a2 yields a1 a2 and a2 a1; (2) the combination of the patterns a1 a2 a3 and a2 a3 a4 is the pattern a1 a2 a3 a4. Only overlapping patterns can be combined to obtain patterns of size n + 1.
As its name suggests, PMSM uses a prime-number-based encoding of the taxonomy. Each node is assigned the product of the number of its parent and a prime number not used in any upper level or by any left node, i.e. any node at the same level already treated by the algorithm. For instance, the root node is 1 and its two children are thus respectively 2 (1 × 2) and 3 (1 × 3). If node 2 has two children, they are encoded as 10 (2 × 5) and 14 (2 × 7). Due to the properties of prime numbers, this encoding makes it easy to check ancestors: a is an ancestor of b ⇔ b mod a = 0. By renaming the sequences using these prime-based numbers, it is easy to verify whether a node or a pattern is the ancestor of another node or pattern without
referring to the taxonomy anymore. This simplifies the computation of the support of a pattern, as it corresponds to the sum of its number of occurrences and the number of occurrences of its children, i.e. its more specific instantiations. The algorithm is implemented in Python as follows: 1. As the taxonomy is implicitly present in the proposed ontology, we parse the ontology with the Python RDFLib library and apply a script that converts the taxonomy into a key-value store, with the node name as key and the node (prime-based) number as value. The link between the ontology and the MLSPM algorithm relies on this step. 2. Using the key-value store, we convert the sequences of events into their numerical encoding. 3. The PMSM algorithm, which we implemented based on [13], takes the prime-number-encoded sequences and the desired support threshold and derives the patterns, grouped by length.
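The encoding, the ancestor test and the candidate join described in this section can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy taxonomy (its shape and the choice of concept names) is an assumption, the taxonomy is given as a plain dictionary rather than parsed from the ontology, and a pattern is assumed to match a sequence when its events occur in order, with gaps allowed.

```python
from collections import deque
from itertools import count


def primes():
    """Yield 2, 3, 5, 7, ... by trial division (adequate for a few hundred nodes)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def encode_taxonomy(children, root):
    """Breadth-first encoding: code(node) = code(parent) * next unused prime."""
    fresh = primes()
    codes = {root: 1}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            codes[child] = codes[node] * next(fresh)
            queue.append(child)
    return codes


def is_ancestor(a, b):
    """True if code a is an ancestor of (or equal to) code b."""
    return b % a == 0


def contains(sequence, pattern):
    """True if the pattern occurs in order in the sequence (gaps allowed, assumed)."""
    position = 0
    for code in sequence:
        if position < len(pattern) and is_ancestor(pattern[position], code):
            position += 1
    return position == len(pattern)


def support(sequences, pattern):
    """Number of sequences containing the pattern or a more specific form of it."""
    return sum(contains(seq, pattern) for seq in sequences)


def join(p, q):
    """Combine two size-n patterns overlapping on n-1 events into one of size n+1."""
    return p + q[-1:] if p[1:] == q[:-1] else None


# Toy taxonomy (assumed shape), encoded as parent -> list of children.
TOY = {
    "Event": ["VoltageFault", "OperatingState"],
    "VoltageFault": ["InputUnderVoltage", "BulkOvervoltage"],
    "OperatingState": ["Running", "Waiting"],
}
codes = encode_taxonomy(TOY, "Event")

logs = [["Running", "InputUnderVoltage"],
        ["Waiting", "BulkOvervoltage"],
        ["Running", "Waiting"]]
encoded = [[codes[event] for event in seq] for seq in logs]

high_level = [codes["OperatingState"], codes["VoltageFault"]]
print(codes)
print(support(encoded, high_level))
```

In this toy run the high-level pattern is matched by the first two sequences although they share no leaf event, which is the effect the multi-level support computation is meant to achieve.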
4.5 Discussion To showcase the explorative possibilities of our proposed approach, two comparisons have been made: (1) a comparison between PMSM and a standard SPM algorithm, performed on a single plant, showcasing the general advantages and drawbacks of MLSPM (PMSM); (2) a comparison between the results of PMSM when applied separately to each of the two datasets in our benchmark corpus and those obtained when it is applied to an integrated dataset combining data from two different PV plants, showcasing the advantage of integrating a heterogeneous fleet of PV plants. Both comparisons are evaluated in terms of the number of (insightful) patterns found.
4.5.1 PMSM Versus APriori
In order to gain better insight into the advantages and drawbacks of our approach, we have compared PMSM to APriori. As APriori is the underlying method used by PMSM, all differences between them can be attributed to the multi-level properties of PMSM. Both algorithms have been applied to the dataset of Plant A, which suffers from frequent voltage failures. Figure 6 contains a selection of patterns found by APriori, while Fig. 7 contains patterns found by PMSM, both at the same support threshold. These selected patterns are not necessarily relevant for domain experts, but they make it easy to grasp the advantages and drawbacks of PMSM. The main advantage of PMSM is its ability to find patterns across different conceptual levels. Let us consider the first PMSM pattern of Fig. 7. According to this pattern, a VoltageFault is regularly preceded by two sensor events, but between them, the inverter reports its Operating state as e.g. Running or Starting. This pattern has been successfully detected, even though the concepts are at a different level
in the ontology hierarchy. Such patterns across hierarchical levels give a bird’s-eye view of the behaviour of the plant, which APriori is not able to achieve as it only considers leaf events. The ability of PMSM to discover patterns across different hierarchical levels also offers the capacity to return patterns at any desired conceptual level. This capability is illustrated with the two under-voltage faults, Input under-voltage and Input voltage under threshold, between which the conceptual difference is minimal. Therefore, domain experts are more interested in higher level patterns, i.e. patterns involving DCUnderVoltageFault, their upper class. Such a pattern is found by PMSM, as illustrated by the second pattern of Fig. 7, which indicates that DCUnderVoltageFault is usually preceded by sensor checks. APriori, as it only finds patterns at the leaf level, is only able to retrieve the link between under-voltage and sensor checks at the leaf level (InputUnderVoltage and InputVoltageUnderThreshold), as shown by the first and second patterns of Fig. 6. Consequently, in order to merge these clearly redundant patterns, a heavy post-processing method, probably involving the proposed ontology, is needed. In addition, as PMSM considers higher level events that mathematically have a higher support (their support is the sum of the supports of the leaf events they cover), it finds more insightful patterns than APriori at a given threshold. For example, the third pattern of Fig. 7 shows the link between VoltageFault (a high level event) and BulkOvervoltage, a link that was not retrieved by APriori at that support threshold. However, as shown in Table 2, finding more patterns comes at a price. Namely, as the pattern length increases, an exponential increase of both the computation time and the number of patterns found can be observed, which is much steeper than the increase one sees for APriori.
This is logical, since PMSM will find patterns across all levels of the conceptual hierarchy. Therefore, the number of patterns becomes too large to be readily usable for (manual) exploration purposes, and thus search-space reducing strategies are important. Our main post-processing strategy significantly reduces this number of patterns by discarding patterns that reside at a higher level in the taxonomy if they are also represented by a pattern that resides at a lower level in the taxonomy. Domain experts
1 [SensorTestAndMeasuringRiso, SensorTestAndMeasuringRiso, Running, InputVoltageUnderThreshold]
2 [SensorTestAndMeasuringRiso, SensorTestAndMeasuringRiso, Running, InputUnderVoltage]
Fig. 6 Selected patterns found by APriori with support threshold set to 15 on plant A
1 ['SensorTestAndMeasuringRiso', 'OperatingState', 'SensorTestAndMeasuringRiso', 'VoltageFault']
2 ['SensorTestAndMeasuringRiso', 'Waiting', 'SensorTestAndMeasuringRiso', 'DCUnderVoltageFault']
3 ['BulkOvervoltage', 'Waiting', 'SensorTestAndMeasuringRiso', 'VoltageFault']
4 ['BulkOvervoltage', 'EquipmentFault', 'EquipmentFault', 'EquipmentFault']
Fig. 7 Selected patterns found by PMSM with support threshold set to 15 on plant A
Table 2 Computation time and number of patterns returned by PMSM and APriori, for a pattern length ranging from 1 event to 4 events, various levels of support threshold and various datasets for one month of data
Algorithm and dataset       Sup. thres.   Measure         Length 1   Length 2   Length 3   Length 4
PMSM on plant A             30%           Comp. time      145 s      77 s       9 m        58 m
                                          # of patterns   19         283        3,157      28,149
PMSM on plant A             15%           Comp. time      114 s      174 s      36 m       5 h 46 m
                                          # of patterns   23         365        4,550      47,795
PMSM on plant A             2%            Comp. time      156        137 s      77 m       14 h 52 m
                                          # of patterns   26         572        9,843      139,549
APriori on plant A          15%           Comp. time      1 s        1 s        1 s        1 s
                                          # of patterns   10         32         48         37
PMSM on plant B             15%           Comp. time      54 s       57 s       9 m        47 m
                                          # of patterns   19         241        2,030      10,734
PMSM on integrated data     15%           Comp. time      178 s      10 m       39 m       2 h 30 m
                                          # of patterns   27         310        2,395      12,815
can deduce the high level patterns when they see the corresponding low level pattern. In addition, patterns not found by APriori are still retrieved, i.e. high-level patterns for which the corresponding leaf level patterns are not retrieved by APriori (and PMSM). Therefore, the meaningless high level patterns, such as the fourth pattern of Fig. 7, which contains the root event three times, are discarded while the insightful patterns are kept. This method significantly reduces the number of patterns. For PMSM with a support threshold of 15%, only 921 insightful patterns of size 4 are returned (out of 47,795 patterns).
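With prime-encoded patterns, this post-processing step can be expressed compactly. The sketch below is one possible reading of the strategy, not the authors' code: a pattern is dropped when another found pattern of the same length is a strictly more specific version of it, position by position. The numeric codes are assumed values from a toy taxonomy (root 1, OperatingState 3, VoltageFault 2, Running 33, InputUnderVoltage 10, BulkOvervoltage 14).

```python
def generalises(high, low):
    """True if `high` is a strictly more general version of `low`."""
    return (len(high) == len(low)
            and high != low
            and all(lo % hi == 0 for hi, lo in zip(high, low)))


def keep_most_specific(patterns):
    """Drop every pattern that is already represented by a more specific one."""
    return [p for p in patterns
            if not any(generalises(p, q) for q in patterns)]


# Assumed toy codes: OperatingState=3, VoltageFault=2, Running=33,
# InputUnderVoltage=10, BulkOvervoltage=14.
found = [
    (3, 2),    # OperatingState -> VoltageFault (covered by the next pattern)
    (33, 10),  # Running -> InputUnderVoltage
    (3, 14),   # OperatingState -> BulkOvervoltage (no leaf-level counterpart)
]
print(keep_most_specific(found))
```

The third pattern survives because no more specific pattern was found for it, which mirrors the statement that high-level patterns without a retrieved leaf-level counterpart are kept. The pairwise comparison is quadratic in the number of patterns, so for tens of thousands of patterns a grouping or indexing step would probably be needed.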
4.5.2 PMSM Power on Heterogeneous Data
In order to obtain better insight into the advantages and drawbacks of our approach, we compared the patterns found by PMSM when applied separately to the dataset from each of the two plants with those found when it is applied to the integrated data of these plants. On plant B, PMSM only retrieved meaningless patterns, consisting only of a succession of inverter statuses. On plant A, PMSM also found a few meaningful patterns involving InputUnderVoltage, InputVoltageUnderThreshold or BulkOvervoltage (in addition to the meaningless patterns). However, when applying PMSM to the integrated dataset (at the same support threshold), more insightful patterns were discovered. For instance, Fig. 8 shows some of the retrieved patterns. The first pattern retrieves the link between sensor check and under-voltage fault already mentioned. The two other patterns were not found in the separate datasets. The second pattern indicates that temperature faults are usually followed by various grid faults. The last
1 ['SensorTestAndMeasuringRiso', 'Waiting', 'InputUnderVoltage', 'Stop']
2 ['SpecificFaultDueToTemperature', 'GridFault', 'GridFrequencyRapidFallOrRise', 'GridImpendanceOutOfRange']
3 ['GridFrequencyRapidFallOrRise', 'NoCommunicationWithEEPROM', 'GridFrequencyRapidFallOrRise', 'NoCommunicationWithEEPROM']
Fig. 8 Selected patterns found by PMSM with support threshold set to 15 on the integrated dataset
pattern indicates that GridFrequencyRapidFallOrRise events are usually interleaved with NoCommunicationWithEEPROM events, i.e. communication problems with the inverter’s memory. Those new patterns reflect general PV plant behaviour, i.e. patterns widespread amongst PV plants, whereas patterns found on a single plant may only reflect the behaviour of that specific plant. For example, the pattern involving BulkOvervoltage in Fig. 7 is specific to plant A and is not retrieved in the integrated dataset. To sum up, our approach is able to manage data integration across PV plants and derives additional insightful patterns of event sequences, which were originally not discovered in the single plant datasets. In addition, PMSM finds more insightful patterns than APriori at a given threshold, offers a bird’s-eye view of the plant behaviour, and allows domain experts to easily find patterns at the desired conceptual level. A possible drawback of our approach is the extended computation time needed to find the patterns. However, pattern discovery is rarely employed at run time, but rather during the batch processing phase of hypothesis building and modelling.
5 Conclusion In this paper, we proposed a novel methodology enabling the application of MLSPM together with a semantic model/ontology. Our methodology first integrates the heterogeneous event logs from PV plants by means of a semantic model. Moreover, since different manufacturers and models report events at different conceptual levels of granularity, and since the finest granularity may sometimes not be the right level of detail for exploitable insights, our methodology then explores the PV event logs further with MLSPM. It offers valuable help for the exploration of plant behaviour and can be leveraged to build a predictive maintenance system. Scientifically, our paper contributes to the research on MLSPM by combining SPM with ontologies that extend the search space for the SPM algorithm by facilitating the mining of integrated heterogeneous datasets as encountered in industrial contexts. From an industrial applicability perspective, our work illustrates that the vast amount of logged PV events contains exploitable and insightful information that can be unlocked with the right explorative methodology.
We identify the following trajectories as further research possibilities: 1. In order to tackle the extended computation time and the required post-processing step mentioned above, we plan to investigate pre-processing steps that reduce the search space of the dataset, for instance by pre-selecting event sequences that end in a specific error. 2. Other post-processing methods could be considered, e.g. formal concept analysis (FCA), as applied by Egho et al. [9]. They ran M3SP [16] down to a very low support threshold and applied FCA to retrieve interesting patterns that can be infrequent, by means of a lattice-based classification of the patterns. 3. More optimized MLSPM algorithms could also be applied, especially multi-level and multi-dimensional algorithms, such as M3SP [16] or MMISP [10]. They could leverage our data by taking into account additional information, such as plant size.
Acknowledgements This work was subsidised by the Region of Bruxelles-Capitale - Innoviris.
References
1. F.H. Abanda, J.H.M. Tah, D. Duce, PV-TONS: a photovoltaic technology ontology system for the design of PV-systems. Eng. Appl. Artif. Intell. 26(4), 1399–1412 (2013). https://doi.org/10.1016/j.engappai.2012.10.010
2. J. Blair, J. Nunneley, K. Lambert, P. Adamosky, R. Petterson, L. Linse, B. Randle, B. Fox, A. Parker, T. Tansy, in SunSpec Alliance Interoperability Specification-Common Models (2013)
3. A. Boran, I. Bedini, C.J. Matheus, P.F. Patel-Schneider, S. Bischof, An empirical analysis of semantic techniques applied to a network management classification problem, in Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology, vol. 01 (IEEE Computer Society, 2012), pp. 90–96
4. J. Chen, R. Kumar, Pattern mining for predicting critical events from sequential event data log, in Proceedings of the 2014 International Workshop on Discrete Event Systems, Paris-Cachan, France (2014)
5. Y.L. Chen, T.C.K. Huang, A novel knowledge discovering model for mining fuzzy multi-level sequential patterns in sequence databases. Data Knowl. Eng. 66(3), 349–367 (2008). https://doi.org/10.1016/j.datak.2008.04.005
6. P. Dagnely, E. Tsiporkova, T. Tourwe, T. Ruette, K. De Brabandere, F. Assiandi, A semantic model of events for integrating photovoltaic monitoring data, in 2015 IEEE 13th International Conference on Industrial Informatics (INDIN) (2015), pp. 24–30. https://doi.org/10.1109/INDIN.2015.7281705
7. Danfoss, in TLX Reference Manual, L00410320-07_02 (2012)
8. X.L. Dong, D. Srivastava, Big data integration, in 2013 IEEE 29th International Conference on Data Engineering (ICDE) (IEEE, 2013), pp. 1245–1248
9. E. Egho, N. Jay, C. Raïssi, A. Napoli, A FCA-based analysis of sequential care trajectories, in The Eighth International Conference on Concept Lattices and Their Applications-CLA 2011 (2011)
10. E. Egho, C. Raïssi, N. Jay, A. Napoli, Mining heterogeneous multidimensional sequential patterns, in ECAI 2014: 21st European Conference on Artificial Intelligence, vol. 263 (IOS Press, 2014), p. 279
11. N.G. Ghanbari, M.R. Gholamian, A novel algorithm for extracting knowledge based on mining multi-level sequential patterns. Int. J. Bus. Syst. Res. 6(3), 269–278 (2012)
12. IGPlus, in Fronius IG Plus 25 V/30 V/35 V/50 V/55 V/60 V 70 V/80 V/100 V/120 V/150 V: Operating Instructions (2012)
13. S. Lianglei, L. Yun, Y. Jiang, Multi-level sequential pattern mining based on prime encoding. Phys. Proc. 24, 1749–1756 (2012). https://doi.org/10.1016/j.phpro.2012.02.258
14. I. Merelli, H. Pérez-Sánchez, S. Gesing, D. D’Agostino, Managing, analysing, and integrating big data in medical bioinformatics: open problems and future perspectives, in BioMed Research International, 2014 (2014)
15. C.H. Mooney, J.F. Roddick, Sequential pattern mining-approaches and algorithms. ACM Comput. Surv. (CSUR) 45(2), 19 (2013)
16. M. Plantevit, A. Laurent, D. Laurent, M. Teisseire, Y.W. Choong, Mining multidimensional and multilevel sequential patterns. ACM Trans. Knowl. Discov. Data 4(1), 1–37 (2010). https://doi.org/10.1145/1644873.1644877
17. PowerOne, in “Aurora Photovoltaic Inverters” Installation and Operator’s Manual (2010)
18. M. Šebek, M. Hlosta, J. Zendulka, T. Hruška, MLSP: mining hierarchically-closed multi-level sequential patterns, in Advanced Data Mining and Applications, pp. 157–168 (Springer, 2013)
19. R. Shaw, R. Troncy, L. Hardman, Lode: linking open descriptions of events, in The Semantic Web, pp. 153–167 (Springer, 2009)
20. SMA, in PV Inverter “SUNNY Tripower 8000TL/10000TL/12000TL/15000TL/17000TL” Installation Manual (2012)
21. R. Srikant, R. Agrawal, in Mining Sequential Patterns: Generalizations and Performance Improvements (Springer, 1996)
22. A.P. Wright, A.T. Wright, A.B. McCoy, D.F. Sittig, The use of sequential pattern mining to predict next prescribed medications. J. Biomed. Inf. 53, 73–80 (2015)
One Class Classification Based Anomaly Detection for Marine Engines Edward Smart, Neil Grice, Hongjie Ma, David Garrity and David Brown
Abstract Although extensive research has been undertaken to detect faults in marine diesel engines, significant challenges still remain, such as the need for non-invasive monitoring methods and the need to obtain rare and expensive datasets of multiple faults on which machine learning algorithms can be trained. This paper presents a method that uses non-invasive engine monitoring methods (vibration sensors) and does not require training on faulty data. Significantly, the one class classification algorithms used were tested on a very large number (12) of actual diesel engine faults chosen by diesel engine experts and maritime engineers, which is rare in this field. The results show that by learning on only easily obtainable healthy data samples, all of these faults, including big end bearing wear and ‘top end’ cylinder leakage, can be detected with minimal false positives (best balanced error rate of 0.15%) regardless of engine load. These results were achieved on a test engine and the method was also applied to an operational vehicle/passenger ferry engine, where it was able to detect a fault on one of the cylinders that was confirmed by the vessel’s engineering staff. Additionally, it was also able to confirm that a sensor fault occurred. Significantly, it highlights how the ‘healthiness’ of an engine can be assessed and monitored over time, whereby any changes in this health score can be noted and appropriate action taken during scheduled maintenance periods before a serious fault develops.
Keywords Condition monitoring · Fault diagnosis · Support vector machines
E. Smart (B) · H. Ma · D. Brown School of Energy and Electronic Engineering, University of Portsmouth, Portsmouth, UK e-mail: [email protected] N. Grice NGnuity Limited, Wareham, UK D. Garrity STS Defence Limited, Gosport, UK © Springer Nature Switzerland AG 2020 R. Jardim-Goncalves et al. (eds.), Intelligent Systems: Theory, Research and Innovation in Applications, Studies in Computational Intelligence 864, https://doi.org/10.1007/978-3-030-38704-4_10
1 Introduction Around 90% [1] of the world’s trade is carried by the merchant fleets of over 150 nations, totalling over 50,000 in number [2]. The shipping industry has low margins, so operators are keen to reduce costs as much as possible and keep overheads low. Significant research has been carried out to try to optimise various parts of the operation, such as routing, weather and logistics; however, a key aspect is the availability of the vessel for operations. The engines are of vital importance to the vessel, providing the means for propulsion as well as the main source of power for climate control (crew and cargo), heating and lighting. Unlike the aerospace or the automotive industries, the marine industry has been relatively slow to adopt offline, and particularly online, condition monitoring technologies to manage its engines. Given the logistics involved in cargo loading and unloading and the limited availability of harbour berths of the necessary size, it is critical that vessels are able to leave and arrive at the scheduled time. If a vessel at sea develops a fault that requires immediate attention, there are a number of potential costs:
• Diversion of vessel to a different port with fuel costs of around £1500 per nautical mile
• Costs of unloading cargo at an unexpected port
• Repairs involving 7 day dry dock costs for engine overhaul and hull survey. Typical costs are £750,000 in Europe
• Costs to travel to new port for cargo loading or unloading to recover the passage plan
• Post repair port state control costs
• Increased insurance
• Fines if incident led to the blocking of a channel or led to other vessels being delayed
• Potential removal from Port State Control ‘white’ lists.
Table 1 shows illustrative example repair costs for engine failure for common problems. Even for minor engine problems, the costs and duration of repairs are significant.
One of the most visible breakdowns occurred on the cruise ship Carnival Splendour on 8th November 2010. Number five diesel generator suffered a mechanical fault that led to the discharge of engine lubricants and fuel. The resulting fire disabled all main engines and meant that the vessel had to be towed into San Diego with air conditioning and refrigeration systems disabled. The total cost of this incident, including full compensation for stranded passengers and the loss of earnings from the cancellation of 11 future cruises reached £39 million. Methods such as thermography [3], tribology [4], monitoring acoustic emissions, visual inspection and cylinder pressure analysis have been used for engine monitoring but one of the most common methods is vibration monitoring. This is because an engine consists of many rotating and reciprocating parts that are often visible in a vibration signal. Additionally, it is non-intrusive, which is a significant advantage over cylinder pressure analysis. Hand held vibration units are often deployed and an
Table 1 Illustrative example costs of engine failure
Action                                             Costs (GBP)            Duration
Replacement of piston and piston liner             £100,000               2 days in water
Replacement of crankshaft                          £500,000               10 days in dry dock
Replacement of crankshaft and 6 cylinder engine    £1,000,000             12 weeks in dry dock
Annealing of crankshaft and machine on site        £150,000               10 days in dry dock
Replacement of turbocharger                        £150,000               2 days in water
Replacement of fans                                £15,000                1 day in water
Replacement of gearbox                             £230,000 (£360,000)    3 days in water (in dry dock)
engineer may test key engine components once a month to look for excess vibration. The disadvantage with this approach is that it only captures a brief snapshot of the engine behaviour and fault indicators may be missed. Online vibration monitoring allows for 24/7 data capture. Sensor technology, communication methods, processing power and cheap data storage all permit sensor data to be captured at a rapid rate but the key challenge remains in terms of how to analyse it. Marine engineers are in short supply and so it can be difficult to manually analyse the data collected, particularly to establish trend analyses.
2 Background In this section, the available literature on diesel engine condition monitoring is assessed. One common approach is oil and lubrication analysis [5]. Oil comes into contact with the majority of key engine components, so an analysis of its composition can provide an indication as to which components are experiencing degradation. Typical methods include ferrography and spectrography. Cylinder pressure analysis [6, 7] has been used successfully to detect faults on marine diesel engines. Pressure sensors can be fitted to each cylinder to identify the peak pressure and also the crank angle at which it occurs. It can provide direct insight into the combustion process as the firing point and angle can be detected, which directly relates to the efficiency of the engine. A disadvantage of this approach is that it is invasive and the sensors themselves are unreliable in the hostile environment of a cylinder head, especially when continuous condition monitoring is desired. Instantaneous angular speed (IAS) monitoring [8, 9] has been used to deduce information about cylinder pressure and performance, as there is a detectable flywheel speed increase when the cylinder fires and a decrease when it compresses. Statistical methods have been employed to extract features from the measured speed.
Acoustic emissions [10, 11] are another popular method but are challenging to implement, as they require highly specialised sensors and substantial signal processing. Additionally, the signals themselves are typically weaker than vibration signals. Exhaust gas monitoring [12–14] has been used to detect faults through physical measurements of the emitted particles, or from a chemical analysis of the contents of the exhaust gases. One of the most common approaches to diesel engine fault detection is vibration monitoring [15, 16]. This is achieved through the analysis of vibration signals to understand the condition of the engine. It is a popular approach because the vibrations of the engine are theoretically well understood and different engine states result in changes to the vibration signals that can be detected, particularly in terms of the firing sequence. Typical methods for vibration analysis include:
• Time Domain Features: these include statistical measures such as root mean square (RMS), crest factor and kurtosis.
• Frequency Analysis: methods include fast Fourier transforms, spectrum analysis, envelope analysis and cepstrum analysis.
• Time-Frequency Analysis: methods include Wigner-Ville distribution analysis, wavelets and wavelet packet transforms.
Liu et al. [17] used the Wigner-Ville distribution to analyze vibration signals on a 6 cylinder diesel engine with 5 faults with a total recognition rate of 95%, highlighting the capabilities of time-frequency methods. However, the proposed approach did need to be trained on fault examples before being able to recognize them. Jin et al. [18] compare all three of the methods listed above and found that a combination of them offered a 100% total recognition rate, although just three faults were considered and load was kept constant. Zeng et al. [19] use multiple vibration sensors on a test engine and show that detection performance is better collectively than with individual sensors.
They use the one class classification method ‘Support Vector Data Description’, originally developed by Tax [20], though in a multi-class classification role to classify known faults, as opposed to a one class classification scenario in which fault types are unknown. The disadvantage of this approach is that it requires accurate samples of known faults for training, which are expensive to obtain. Delvecchio et al. [21] provide a comprehensive review of vibration and acoustic related monitoring of internal combustion engines. Their paper highlights that the discrete wavelet transform has been very valuable in extracting fault signs during the processing of vibration signals, and the present paper seeks to advance the application of wavelets in this area by applying wavelet packet transforms (WPT). WPT has advantages over the discrete wavelet transform as it has better resolution in the higher as well as the lower frequencies. It has also been used successfully in other applications, such as fault detection of induction motors, which requires analysis of non-stationary signals [22]. One class classification techniques are ideally suited to condition based maintenance, but there are clear challenges in applying them as only one well sampled class is available for learning. Khan and Madden [23] and Tax [24] present detailed reviews of these techniques. Wei et al. [25] use a kernel regression based anomaly
detection method to detect faults on low speed diesel engines. The method successfully detects an anomaly but it is not tested on common engine problems such as fuel starvation, air filter blockages, worn piston skirts etc. Kondo et al. [26] look at anomaly detection using vibration sensors on railway traction machines. Although faults were simulated, the method reported a high AUC. Wang et al. [27] used the support vector data description on roller bearings with strong results. Lazakis et al. [28] used anomaly detection methods to achieve good results on marine engine data points such as exhaust temperatures and fuel inlet temperatures. Li et al. [29] used anomaly detection successfully on vibration data, highlighting its application for electro-mechanical systems. Tax divides one class classification algorithms into three types: density based methods, distance based methods and reconstruction based methods. It is clear that a key challenge in diesel engine monitoring lies in the development of a practical system for accurate fault detection with minimal false positives. Data for all types of faults is highly unlikely to be obtainable, so this paper aims to fill this gap and show how engine faults can be detected using only healthy samples.
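The time-domain features named earlier in this section (RMS, crest factor and kurtosis) are simple to compute from a block of accelerometer samples. The sketch below is a generic illustration rather than part of the method presented in this paper; the function name `time_domain_features` and the synthetic sine test signal are assumptions, and kurtosis is taken as the non-excess form (fourth central moment divided by the squared variance).

```python
import math


def time_domain_features(samples):
    """Return RMS, crest factor and (non-excess) kurtosis of a sample block."""
    n = len(samples)
    mean = sum(samples) / n
    rms = math.sqrt(sum(x * x for x in samples) / n)
    peak = max(abs(x) for x in samples)
    m2 = sum((x - mean) ** 2 for x in samples) / n  # variance
    m4 = sum((x - mean) ** 4 for x in samples) / n  # fourth central moment
    return {
        "rms": rms,
        "crest_factor": peak / rms,   # undefined for an all-zero block
        "kurtosis": m4 / m2 ** 2,
    }


# Synthetic stand-in for a vibration record: five full periods of a unit sine.
signal = [math.sin(2 * math.pi * 5 * i / 1000) for i in range(1000)]
features = time_domain_features(signal)

# For a pure sine these should be close to 0.707, 1.414 and 1.5.
print(features)
```

Impulsive faults tend to raise the crest factor and kurtosis well above these sine-wave values, which is why such statistics are commonly used as simple fault indicators.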
3 Design of the Test Engine Facility The key factor in choosing a suitable test engine for the facility was size, particularly as marine engines can be very large and occupy large amounts of space. The engine selected was a refurbished 3 cylinder Kubota D905 engine that used to provide lighting for motorway maintenance teams in the UK. The engine speed was between 1550 and 1650 RPM and the engine had a maximum load of 6 kW, which could be simulated in increments of 0.5 kW using storage heaters. Figure 1 shows the test engine laboratory with the engine and attached generator, all supported on a metal frame on a concrete floor, designed to remove excess vibration. The Prosig data collection unit is attached to the wall and the magnetic accelerometers are attached to this unit via armoured cables. The accelerometers chosen were the Monitran MTN/1100SC constant current accelerometers. Data was collected using the Prosig 9200 data acquisition unit, which has 16 channels, a 24-bit ADC and a sampling rate of up to 32 kHz (when using just 6 channels). Six magnetic accelerometers were attached to the engine (see Figs. 2 and 3), with two sensors on each of the x, y and z axes. A top dead center sensor was placed on the flywheel so that the number of rotations could accurately be recorded. Figure 2 shows the sensors that are attached to the left side of the engine relative to the view of the engine in Fig. 1. Figure 3 shows the sensors that are attached to the right side of the engine relative to the view of the engine in Fig. 1.
Fig. 1 View of the test engine laboratory
Fig. 2 View of the left side of the engine with attached sensors
Fig. 3 View of the right side of the engine with attached sensors
The advantage of having a custom test facility was that a wide variety of faults could be imposed on the engine. After consultation with diesel experts and marine engineers, a wide range of faults to simulate were selected and can be found in Table 2. Throughout this paper, for ease of reference, faults are referred to by their fault code.
4 Feature Selection When constructing classifiers, particularly one class classifiers where data from one class is usually non-existent, it is important to select appropriate features for learning. Vibration signals can be analysed using a variety of methods such as (short) fast Fourier transforms (FFT), cepstrum analysis and wavelet analysis. For constant speed applications, where frequencies do not vary over time, FFT is particularly suited to extracting fault signs. However, for marine engines, which have varying speed profiles, particularly in rough seas, wavelet analysis is more suitable. For this application, wavelet packet transforms were selected due to their ability to extract useful information at high as well as low frequencies.
Table 2 List of engine states simulated
Fault code   Description                                Fault simulated
F0.0         Baseline mapping                           Normal operation
F1.1         Loose engine mount (front right)           Loose engine mounts
F1.2         Loose engine mount (front left)            Loose engine mounts
F1.3         Loose engine mount (front both)            Loose engine mounts
F2.1         Leaking exhaust valve cyl. 3               ‘Top end’ cylinder leakage
F2.2         Inlet tappet valve clearance cyl. 3        Push rod wear
F3.1         Fuel starvation cyl. 3                     No fuel flow/incomplete combustion
F4.1         Big end bearing slack cyl. 3               Big end bearing wear
F5.1         Piston ring gap increase                   Worn piston rings (blow by)
F6.1         Worn piston skirt on anti/thrust faces     Bore wear
F7.1         Air filter blockage (45% total area)       Air filter clogging
F7.2         Air filter blockage (90% total area)       Air filter clogging
F8.1         Top ring-land increase gap                 Top ring-land wear (ring twist/ring flutter)
Wavelets [30, 31] are a powerful tool for analysing stationary and non-stationary transient signals. They feature the dilation property, which allows them to adjust the width of the frequency band and the location of its central frequency so that they can automatically focus on the positions of high and low frequency changes. Gaeid and Ping [32] provide a good review of wavelets and their applicability to fault detection. For any signal $x(t) \in L^2(\mathbb{R})$, where $\mathbb{R}$ is the set of real numbers and $t$ is time, the continuous wavelet transform $W(\alpha, \beta)$ is given by the convolution of the signal with a scaled, conjugated wavelet, where $*$ denotes the complex conjugate; namely
\[ W(\alpha, \beta) = \alpha^{-1/2} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - \beta}{\alpha}\right) dt. \tag{1} \]
The term $W(\alpha, \beta)$ indicates how similar the wavelet and signal are through the scale (or pseudo frequency) parameter $\alpha$ and time shift parameter $\beta$. It shows that wavelets are a time-frequency analysis tool. To choose the scale and time shift parameters, it is noted that only dyadic scales can be used without information loss, leading to the discrete wavelet transform, given by
\[ \psi_{m,n}(t) = 2^{-m/2}\, \psi\!\left(2^{-m} t - n\right) \tag{2} \]
where $\alpha = 2^m$ and $\beta = n 2^m$. These discrete wavelets also form an orthonormal basis. Wavelet analysis can then be performed via a low-pass filter $h(n)$ relating to the scaling function $\phi(t)$ and a high-pass wavelet filter $g(n)$ that is related to the wavelet function $\psi(t)$:
\[ h(n) = 2^{-1/2} \langle \phi(t), \phi(2t - n) \rangle \tag{3} \]
\[ g(n) = 2^{-1/2} \langle \varphi(t), \varphi(2t - n) \rangle. \tag{4} \]
In decomposition of the signal $x(t)$ (see Fig. 4), the application of the low and high pass filters leads to two vectors $cA_1$ (approximation coefficients) and $cD_1$ (detail coefficients). In wavelet transform decomposition, this step is repeated on the approximation vector to achieve the required depth of decomposition. The symbol $\downarrow 2$ denotes down-sampling (omitting the odd indexed elements of the filter). In the reconstruction step, low and high-pass reconstruction filters are convolved with $cA_1$ and $cD_1$ respectively, resulting in $A_1$ (approximation signal) and $D_1$ (detail signal). This is possible because $x = A_1 + D_1$. Furthermore, these reconstruction signals satisfy
\[ A_{j-1} = A_j + D_j \tag{5} \]
\[ x = A_j + \sum_{i \le j} D_i \tag{6} \]
for positive integers $i$ and $j$. If $T$ is the sampling rate and $N_t$ is the length of $x(t)$ then each vector $A_j$ contains roughly $N_t / 2^j$ data points. The $j$th decomposition provides information about a frequency band $[0, T/2^{j+1}]$.
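The additive relations between approximation and detail signals can be checked numerically. The following is a minimal sketch that assumes the Haar wavelet purely for brevity (the case studies use Db5); the function names `haar_step` and `haar_reconstruct` are hypothetical helpers, not part of any toolbox.

```python
import math

def haar_step(x):
    """One level of Haar analysis: filter, then keep every second sample."""
    s = 1.0 / math.sqrt(2.0)
    cA = [s * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]
    cD = [s * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]
    return cA, cD

def haar_reconstruct(cA, cD):
    """Inverse of haar_step: rebuild a signal twice as long as cA."""
    s = 1.0 / math.sqrt(2.0)
    x = []
    for a, d in zip(cA, cD):
        x.extend([s * (a + d), s * (a - d)])
    return x

x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]

# Level 1: coefficients, then approximation and detail signals.
cA1, cD1 = haar_step(x)
zeros1 = [0.0] * len(cA1)
A1 = haar_reconstruct(cA1, zeros1)
D1 = haar_reconstruct(zeros1, cD1)
recon = [a + d for a, d in zip(A1, D1)]          # should equal x

# Level 2: decompose the level-1 approximation coefficients again.
cA2, cD2 = haar_step(cA1)
zeros2 = [0.0] * len(cA2)
A2 = haar_reconstruct(haar_reconstruct(cA2, zeros2), zeros1)
D2 = haar_reconstruct(haar_reconstruct(zeros2, cD2), zeros1)
```

In this sketch `recon` reproduces `x`, and `A1` equals `A2 + D2` element by element, which is the pattern the two relations describe. Each level halves the number of coefficients, in line with the rough count of $N_t / 2^j$ points.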
Fig. 4 Wavelet decomposition to depth 3
Wavelet packet transforms [31] are a generalisation of the wavelet transform. Define two functions
\[ W_0(t) = \phi(t) \tag{7} \]
\[ W_1(t) = \psi(t) \tag{8} \]
where $\phi(t)$ is the scaling function and $\psi(t)$ is the wavelet function. In the orthogonal case, for $m = 0, 1, 2, \ldots$, functions $W_m(t)$ are obtained by
\[ W_{2m}(t) = 2 \sum_{n=0}^{2N-1} h(n)\, W_m(2t - n) \tag{9} \]
\[ W_{2m+1}(t) = 2 \sum_{n=0}^{2N-1} g(n)\, W_m(2t - n) \tag{10} \]
\[ W_{j,m,n}(t) = 2^{-j/2}\, W_m\!\left(2^{-j} t - n\right) \tag{11} \]
where $j$ is a scale parameter and $n$ is a time localisation parameter. The functions $W_{j,m,n}$ are called wavelet packet atoms. The difference between this method and wavelet transforms is that both the details and the approximations are further decomposed, thus giving a wavelet packet tree (see Fig. 5). Each decomposition contains a set of nodes, indexed by integer pairs $(i, j)$ where $j$ is the node depth and $i$ is the node position at that depth for $i = 0, 1, \ldots, 2^j - 1$.
Fig. 5 Wavelet decomposition to depth 3
The signal energy for sub-signal node $B_i^j$ ($i = 0, 1, \ldots, 2^j - 1$) (i.e. the approximation and detail) at depth $j$ is given by
\[ E_i = 100\, \frac{\sum_{k=1}^{M} \left(B_i^j(k)\right)^2}{\sum_{i=0}^{2^j - 1} \sum_{k=1}^{M} \left(B_i^j(k)\right)^2} \tag{12} \]
where the numerator is the energy for a given node, the denominator is the energy of the whole signal and M is the number of sampling points.
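The percentage energy feature can be illustrated with a small wavelet packet tree. This is a sketch under stated assumptions: it uses the Haar filters instead of Db5, it computes energies from the node coefficients (for an orthonormal transform these match the energies of the reconstructed sub-signals), and the helper names are hypothetical. Nodes are returned in natural (filter) order, not in order of increasing frequency.

```python
import math

def haar_split(x):
    """Split a vector into low-pass and high-pass halves (Haar, down-sampled by 2)."""
    s = 1.0 / math.sqrt(2.0)
    low = [s * (x[2 * k] + x[2 * k + 1]) for k in range(len(x) // 2)]
    high = [s * (x[2 * k] - x[2 * k + 1]) for k in range(len(x) // 2)]
    return low, high

def packet_nodes(x, depth):
    """Return the 2**depth coefficient vectors at the given depth of the packet tree."""
    nodes = [list(x)]
    for _ in range(depth):
        next_level = []
        for node in nodes:
            # Unlike the plain wavelet transform, BOTH halves are split again.
            low, high = haar_split(node)
            next_level.extend([low, high])
        nodes = next_level
    return nodes

def node_energies(nodes):
    """Percentage of the total energy held by each node."""
    energy = [sum(c * c for c in node) for node in nodes]
    total = sum(energy)
    return [100.0 * e / total for e in energy]

# Synthetic test signal: a strong low-frequency tone plus a weaker high-frequency tone.
x = [math.sin(2 * math.pi * 3 * n / 64) + 0.5 * math.sin(2 * math.pi * 20 * n / 64)
     for n in range(64)]
E = node_energies(packet_nodes(x, 3))
```

At depth 3 the list `E` has 8 entries that sum to 100, giving one feature per node. At depths 5 to 9, as used in the case studies, the same procedure would give 32 to 512 features per analysed frame.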
5 One Class Classification Methods
One class classifiers are designed for classification problems where data for one of the classes is either non-existent or poorly sampled compared to the other class. For many industrial applications, particularly for fault or novelty detection problems, this condition is often true. Traditional two class or multi-class classification algorithms will struggle to achieve high accuracy because of the large class imbalance between healthy and faulty samples. Therefore, even though in this paper the fault class is well sampled with examples of each fault type, one class classification algorithms have been chosen because of their applicability to real world situations. There are three broad types of one class classification methods [24]: density methods, boundary methods and reconstruction methods. Density methods focus on estimating the density of the training set, often by making assumptions regarding the density model (e.g. Gaussian or Poisson). Boundary methods focus on estimating a boundary (based on distance) between the healthy class and the faulty class. This approach can have advantages over density based methods if the amount of data is insufficient to accurately estimate the density of the healthy class. It also has the additional advantage that the distance metric can be used as an estimate of severity. Reconstruction methods make assumptions about how the data has been generated and then select a model to fit the data. In this paper, one method from each of these three broad types was selected. The density method selected was Parzen windows, the boundary method selected was the one class Support Vector Machine and the reconstruction method selected was k-means.
5.1 Parzen Windows Density Estimation
This method is an extension of the mixture of Gaussians model. The density estimated is a mixture of (usually Gaussian) kernels centered on the individual training objects with (often) diagonal covariance matrices $\Sigma_i = hI$. It takes the form
\[ p_p(x) = \frac{1}{N} \sum_i p_N(x; x_i, hI). \tag{13} \]
The equal width h in each feature direction means that the Parzen density estimator [33] assumes equally weighted features and so will be sensitive to scaling. The free parameter h is optimized using the maximum likelihood solution. Since there is just one parameter, the data model is very weak and so success depends entirely on a representative training set. The training time is very small but the testing time is rather expensive, especially with large feature sets in high dimensional spaces. This is because all the training objects have to be stored and during testing, distances to all the training objects must be computed and then sorted.
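A minimal sketch of the Parzen estimate with isotropic Gaussian kernels is given below. It assumes covariance $hI$ for every kernel and a hand-picked value of $h$; it does not reproduce the maximum likelihood optimisation of $h$ or the dd_tools implementation, and the function name is hypothetical.

```python
import math

def parzen_density(x, train, h):
    """Average of Gaussian kernels with covariance h*I centred on each training object."""
    d = len(x)
    norm = (2.0 * math.pi * h) ** (-d / 2.0)   # Gaussian normalising constant
    total = 0.0
    for xi in train:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
        total += norm * math.exp(-sq_dist / (2.0 * h))
    return total / len(train)

# Hypothetical healthy training objects in a 2-D feature space.
train = [(0.0, 0.0), (0.1, -0.1), (-0.1, 0.1), (0.05, 0.0)]

p_near = parzen_density((0.0, 0.0), train, h=0.05)   # close to the training data
p_far = parzen_density((2.0, 2.0), train, h=0.05)    # far from the training data
```

Thresholding the density then separates the two cases: `p_near` is large while `p_far` is effectively zero. The loop over `train` also makes the testing cost visible, since every evaluation touches all stored training objects.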
5.2 Support Vector Novelty Detection
One Class Support Vector Machine (OCSVM) [34, 35] is a useful novelty detection method based on the support vector machine [36]. Consider 'normal' training data $x_1, x_2, \ldots, x_l \in \mathbb{R}^n$. Let $\varphi$ be the mapping $\varphi : \mathbb{R}^n \to F$ into some dot product feature space $F$. Let $k(x, y) = (\varphi(x) \cdot \varphi(y))$ be a positive definite kernel which operates on the mapping $\varphi$. In this paper, the kernel used is the Gaussian kernel, $k(x, y) = \exp\left(-\|x - y\|^2 / 2\sigma^2\right)$, as it suppresses growing distances in larger feature spaces. Here, $\sigma$ is the width parameter associated with the Gaussian kernel. The data is mapped into the feature space via the kernel function and is separated from the origin with maximum margin. The decision function is found by minimising the weighted sum of the support vector regulariser and the empirical error term depending on a margin variable $\rho$ and individual error terms $\xi_i$,
\[ \min_{w \in F,\ \xi \in \mathbb{R}^l,\ \rho \in \mathbb{R}} \ \frac{1}{2}\|w\|^2 + \frac{1}{\nu l} \sum_{i=1}^{l} \xi_i - \rho \quad \text{subject to} \quad (w \cdot \varphi(x_i)) \ge \rho - \xi_i,\ \ \xi_i \ge 0, \tag{14} \]
where $w$ is a weight vector in $F$ and $\nu$ is the fraction of the training set to be regarded as outliers. Introducing Lagrange multipliers $\alpha_i, \beta_i \ge 0$ for the constraints and setting the derivatives of the Lagrangian with respect to $w$, $\xi$ and $\rho$ equal to zero leads to
\[ w = \sum_{i=1}^{l} \alpha_i \varphi(x_i), \tag{15} \]
\[ \sum_{i=1}^{l} \alpha_i = 1, \tag{16} \]
\[ \alpha_i + \beta_i = \frac{1}{\nu l}. \tag{17} \]
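The dual problem is formulated to give
\[ \min_{\alpha \in \mathbb{R}^l} \ \sum_{i,j=1}^{l} \alpha_i \alpha_j k(x_i, x_j) \quad \text{subject to} \quad \sum_{i=1}^{l} \alpha_i = 1,\ \ 0 \le \alpha_i \le \frac{1}{\nu l}. \tag{18} \]
Solutions for the dual problem yield parameters $w_0$, $\rho_0$ where
\[ w_0 = \sum_{i=1}^{N_s} \alpha_i \varphi(s_i), \tag{19} \]
\[ \rho_0 = \frac{1}{N_s} \sum_{j=1}^{N_s} \sum_{i=1}^{N_s} \alpha_i k(s_i, s_j). \tag{20} \]
Here, $N_s$ is the number of support vectors and $s_i$ denotes a support vector. The decision function is given by
\[ f(x) = \operatorname{sgn}\left((w_0 \cdot \varphi(x)) - \rho_0\right) \tag{21} \]
\[ \phantom{f(x)} = \operatorname{sgn}\left(\sum_{i=1}^{N_s} \alpha_i k(s_i, x) - \rho_0\right). \tag{22} \]
The 'abnormality' detection function is then given by
\[ g(x) = \rho_0 - \sum_{i=1}^{N_s} \alpha_i k(s_i, x). \tag{23} \]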
The user has to choose the appropriate kernel, with its associated parameters, for the problem. However, rather than choosing an error penalty C as in the classical SVM method, one chooses a value for ν, which is the fraction of the training set to be classified as outliers. The software used for this classifier is LIBSVM for Matlab version 3.21 [37].
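To make the role of the support vectors concrete, the sketch below evaluates the abnormality function for a Gaussian kernel. It does not solve the dual problem: the support vectors and multipliers are hypothetical values chosen for illustration (they only satisfy the constraint that the multipliers sum to one), and the offset is assumed to be estimated by averaging the kernel expansion over the support vectors. It is not the LIBSVM implementation.

```python
import math

def gaussian_kernel(x, y, sigma):
    """k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def offset(support_vectors, alphas, sigma):
    """Assumed estimate of rho_0: the kernel expansion averaged over the support vectors."""
    n = len(support_vectors)
    return sum(
        alphas[i] * gaussian_kernel(support_vectors[i], support_vectors[j], sigma)
        for j in range(n) for i in range(n)
    ) / n

def abnormality(x, support_vectors, alphas, rho, sigma):
    """g(x) = rho_0 - sum_i alpha_i k(s_i, x); positive values indicate novelty."""
    expansion = sum(a * gaussian_kernel(s, x, sigma)
                    for a, s in zip(alphas, support_vectors))
    return rho - expansion

# Hypothetical support vectors and multipliers (NOT the output of a trained model).
support_vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
alphas = [0.4, 0.3, 0.3]
sigma = 1.0

rho = offset(support_vectors, alphas, sigma)
g_inside = abnormality((0.3, 0.3), support_vectors, alphas, rho, sigma)
g_outside = abnormality((5.0, 5.0), support_vectors, alphas, rho, sigma)
```

In this toy setting `g_inside` is negative (the point is accepted as normal) and `g_outside` is positive and close to `rho`, because the Gaussian kernel decays to zero far from every support vector. The magnitude of the score is what allows it to be read as a severity indicator.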
5.3 K-Means
K-Means clustering is a data reconstruction method [38]. It assumes the data is clustered and that it can be characterized by a few prototype vectors denoted by $\mu_k$. Euclidean distance is used to measure the distance between target objects and the prototypes. The placing of the prototypes is done by optimizing the following error
\[ \varepsilon_{k\text{-}m} = \sum_i \left( \min_k \|x_i - \mu_k\|^2 \right). \]
A key advantage of the K-Means algorithm is that all the distances to the prototypes are averaged, making the method more robust against remote outliers. The implementation used is that which is found in dd_tools [39].
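The reconstruction error and its use as a novelty score can be sketched as follows. This is an illustrative version of the standard Lloyd iteration, not the dd_tools routine; the data, the initial prototypes and the helper names are hypothetical.

```python
def sq_dist(x, mu):
    """Squared Euclidean distance between an object and a prototype."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def reconstruction_error(data, prototypes):
    """Sum over objects of the squared distance to the nearest prototype."""
    return sum(min(sq_dist(x, mu) for mu in prototypes) for x in data)

def lloyd_step(data, prototypes):
    """Assign each object to its nearest prototype, then move prototypes to group means."""
    groups = [[] for _ in prototypes]
    for x in data:
        nearest = min(range(len(prototypes)), key=lambda k: sq_dist(x, prototypes[k]))
        groups[nearest].append(x)
    updated = []
    for mu, members in zip(prototypes, groups):
        if members:
            updated.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(len(mu))))
        else:
            updated.append(mu)   # keep an empty cluster's prototype unchanged
    return updated

# Hypothetical healthy training data forming two clusters.
healthy = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
prototypes = [(1.0, 1.0), (4.0, 4.0)]

error_before = reconstruction_error(healthy, prototypes)
for _ in range(10):
    prototypes = lloyd_step(healthy, prototypes)
error_after = reconstruction_error(healthy, prototypes)

# Novelty score: squared distance from a test object to its nearest prototype.
score_near = min(sq_dist((0.1, 0.0), mu) for mu in prototypes)
score_far = min(sq_dist((2.5, 2.5), mu) for mu in prototypes)
```

After the iterations the error is lower than at initialisation, and the object lying between the clusters receives a much larger score than the one close to a prototype. The result depends on the number of prototypes chosen, which is the sensitivity to k discussed for the ferry engine.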
6 Case Studies: Method and Results
In this section, two sets of experiments are presented. The first tests the one class classification algorithms on vibration data obtained from the 3 cylinder land based test engine (see Fig. 1). The second tests the approach on an operational 8 cylinder ferry engine to further demonstrate its applicability to real world working environments. The aim of these experiments was to explore the capabilities of one class novelty detection methods to accurately detect faults imposed on the engine. In particular, the aim was to test whether accurate fault detection could take place with minimal false positives, whilst training the algorithms on healthy data only. For these case studies Matlab 2017a was used as the development environment. The implementation used for the one class Parzen Windows and the K-Means algorithms is found in dd_tools [39]. The implementation of the one class SVM is found in LIBSVM for Matlab version 3.21 [37].
6.1 Error Metric
One class classification methods are designed for use when there is a large imbalance between the healthy and faulty classes and, as such, traditional metrics such as classification accuracy are not suitable. One of the most commonly used metrics is the Balanced Error Rate (BER). For a dataset with H healthy samples and F unhealthy samples, for a one class classifier, let the number of false positives be given by FP and the number of false negatives be given by FN. Then the BER is defined as
\[ \mathrm{BER} = \frac{1}{2}\left(\frac{FP}{H} + \frac{FN}{F}\right). \tag{24} \]
A BER of 50% is equivalent to chance and a BER of 0% is a perfect result so strong classifiers will have a small BER.
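A short worked example shows why the BER is preferred over plain accuracy under class imbalance. The counts below are hypothetical and chosen only to illustrate the formula.

```python
def balanced_error_rate(fp, fn, healthy, faulty):
    """BER = 0.5 * (FP / H + FN / F), returned as a fraction between 0 and 1."""
    return 0.5 * (fp / healthy + fn / faulty)

def accuracy(fp, fn, healthy, faulty):
    """Plain classification accuracy, for comparison."""
    return 1.0 - (fp + fn) / (healthy + faulty)

# Hypothetical imbalanced test set: 1000 healthy samples, 50 faulty samples.
H, F = 1000, 50

# A useless classifier that labels everything as healthy misses every fault.
ber_trivial = balanced_error_rate(fp=0, fn=50, healthy=H, faulty=F)
acc_trivial = accuracy(fp=0, fn=50, healthy=H, faulty=F)

# A reasonable classifier with a few errors in each class.
ber_good = balanced_error_rate(fp=20, fn=1, healthy=H, faulty=F)
```

The trivial classifier scores a BER of 0.5 (chance) even though its accuracy is above 95%, whereas the second classifier scores 0.02. Multiplying by 100 gives the percentages reported in the results tables.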
Table 3 List of algorithm parameters and their ranges
Parameter | Values | Methods
Fraction of training set to be rejected (ν) | 0.01, 0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5 | All
SVM kernel width | 0.01, 0.1, 1 | OCSVM
Length of Parzen hypercube (h) | 0.5, 1, 5, 10, 15 | Parzen windows
Number of cluster centers | 1, 2, 3, . . ., 18, 19, 20 | K-Means
6.2 Case Study 1: Test Engine
The authors had full access to the test engine and, as such, were able to acquire a comprehensive amount of data across all the engine states listed in Table 2. For each engine state, at each of the following loads (0 kW, No Load; 3 kW, Half Load; 6 kW, Full Load), 3 sets of 60 s of vibration data were captured (sampled at 2.4 kHz). The method employed was as follows:
• For all data collected, using a Hamming window of length 1000 with a 50% overlap, extract the signal energies using wavelet packet transforms at depths 5–9 using the Db5 wavelet.
• For fault codes F1–F8 (i.e. the faulty data), the data forms the faulty test set.
• For the baseline data (fault code F0.0), the data is used to form the healthy training and testing data sets. The training set is created by taking at random 2 out of the 3 sets for each load. The healthy testing set consists of the remaining set from each load.
• Perform 5-fold cross validation on the training set using a grid search on the training parameters listed in Table 3.
• Test the models using the healthy and faulty testing sets.
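The framing step of the method above can be sketched as follows. The sketch assumes the symmetric Hamming window definition and that only complete frames are kept; the synthetic 40 Hz tone stands in for a recorded vibration channel, and the function names are hypothetical.

```python
import math

def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]

def frame_starts(num_samples, length=1000, overlap=0.5):
    """Start index of every complete frame for the given length and overlap."""
    hop = int(length * (1.0 - overlap))
    return list(range(0, num_samples - length + 1, hop))

fs = 2400                      # sampling rate in Hz
duration = 60                  # seconds per recording
num_samples = fs * duration    # 144000 samples per recording

# Synthetic stand-in for one vibration channel (assumption: a single 40 Hz tone).
signal = [math.sin(2.0 * math.pi * 40.0 * n / fs) for n in range(num_samples)]

window = hamming(1000)
starts = frame_starts(num_samples)

# Apply the window to the first frame; each windowed frame would then be passed
# to the wavelet packet energy computation.
first_frame = [w * s for w, s in zip(window, signal[starts[0]:starts[0] + 1000])]
```

Under these assumptions each 60 s recording yields 287 frames of 1000 samples, with consecutive frames starting 500 samples apart. Each frame then contributes one feature vector per sensor and decomposition depth.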
6.2.1 Test Engine Results
In this section, the results of classification across all one class learning methods are compared. Table 4 shows the mean and standard deviation of the best classification across the 5-fold cross validation for all 3 methods at all wavelet decomposition depths. It shows clearly that, despite only being trained on healthy data, the one class classifiers have successfully detected faults of 12 different types with minimal false positives. Each classifier was able to accurately detect all the faults, regardless of load, with a maximum of 4.54% error on the healthy test set over all classifiers. Figure 6 shows the health score plot for the K-Means and OCSVM classifiers for each of the engine states listed in Table 2. Nearly all of the healthy data (fault code F0.0) samples have a positive score, whilst all the faulty samples have clearly negative health scores. Differences in the scores for individual engine states are often attributable to load variations. Significantly, the results show that one class classifiers trained on healthy data containing different engine loads are able to accurately detect faults on engines operating at different loads. Figure 7 shows the health score output for the Parzen Windows classifier. Whilst it is accurate in terms of identifying the differences between healthy and faulty samples, the flatness of the health score for fault states gives no information about fault severity.
Table 4 Best mean (Std) classification results for all methods for all wavelet depths
Wavelet depth | Parzen mean (Std) % BER | K-means mean (Std) % BER | OCSVM mean (Std) % BER
5 | 0.84 (0.07) | 0.14 (0.05) | 2.02 (0.27)
6 | 0.71 (0.08) | 0.15 (0.04) | 1.84 (0.28)
7 | 1.31 (0.10) | 0.50 (0.30) | 1.97 (0.15)
8 | 0.88 (0.08) | 0.47 (0.18) | 1.53 (0.17)
9 | 2.27 (0.29) | 0.56 (0.23) | 1.45 (0.01)
Fig. 6 Health score plot for the K-Means and OCSVM classifiers
6.2.2 Key Frequencies
The low BER values highlight that the chosen features were able to provide sufficiently good discrimination to determine the differences between the healthy and faulty samples. Table 5 shows the key fault frequency indicators for sensor 6.
Fig. 7 Health score plot for the Parzen windows classifier
Table 5 Key fault indicator frequencies for each fault type
Fault code | Key frequencies (Hz)
F1.1 | 1153, 1188
F1.2 | 59, 525
F1.3 | 35, 525
F2.1 | 21, 59
F2.2 | 482, 525
F3.1 | 117, 513
F4.1 | 21, 59
F5.1 | 59, 525
F6.1 | 59, 513
F7.1 | 513, 525
F7.2 | 155, 455
F8.1 | 21, 59
These were chosen by taking the top two frequencies with the highest F-score [40] across all loads. The cylinder firing frequency for this engine is approximately 40 Hz and for several faults, the 0.5 (21 Hz) or 1.5 (60 Hz) harmonics of this frequency are prominent, which is consistent with the engine architecture.
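The quoted firing frequency can be checked with simple arithmetic. The sketch assumes a four-stroke cycle (each cylinder fires once every two crankshaft revolutions) and a nominal speed of 1600 RPM, the midpoint of the operating range reported for the test engine; the function name is hypothetical.

```python
def firing_frequency_hz(rpm, cylinders, strokes=4):
    """Engine firing frequency: revolutions per second times firings per revolution."""
    revs_per_cycle = strokes / 2.0          # a four-stroke cycle spans two revolutions
    return rpm / 60.0 * cylinders / revs_per_cycle

nominal = firing_frequency_hz(1600, 3)      # nominal speed (assumed midpoint of range)
low = firing_frequency_hz(1550, 3)          # lower end of the reported speed range
high = firing_frequency_hz(1650, 3)         # upper end of the reported speed range

half_order = 0.5 * nominal                  # 0.5 harmonic of the firing frequency
one_and_half_order = 1.5 * nominal          # 1.5 harmonic of the firing frequency
```

At the nominal speed this gives 40 Hz, with the 0.5 and 1.5 harmonics at 20 Hz and 60 Hz. These sit close to the 21 Hz and 59 Hz entries in Table 5, given the speed variation between 1550 and 1650 RPM and the finite width of the wavelet packet frequency bands.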
Fig. 8 Monitran magnetic vibration sensor attached to cylinder head
6.3 Case Study 2: Ferry Engine
The algorithms were deployed on an engine on a passenger and vehicle ferry that travels over 45,000 miles per annum on regular sailings. The engine in question is a Stork-Wartsila 8FHD 240G, capable of developing 1360 kW at 750 RPM. It is a turbocharged and intercooled 8 cylinder 4 stroke diesel engine. The engine was fitted with the same Monitran MTN/1100SC constant current accelerometers, with one sensor placed on each of the 8 cylinder heads (see Fig. 8). Vibration data, sampled at 2.4 kHz, was collected from each sensor 24/7 during the period 13th February 2016 to 8th November 2016. Data collection was occasionally interrupted due to scheduled engine maintenance, data acquisition unit failure, bad weather or national holidays. As a point of interest, to give some indication of the size comparison, this ferry engine produces a power output that is over 200 times greater than the 6 kW maximum load of the Kubota engine. Based on the results of the Kubota engine, the one class SVM algorithm was selected. The Parzen Windows classifier was rejected because its output was mostly flat and it offered little ability to indicate severity. Although the K-Means classifier offered the best performance on the test engine, its performance varied significantly depending on the choice of k. Given the difficulty in determining the value of k, the one class SVM was used to compute the health score for this engine.
6.3.1 Ferry Results
Figure 9 shows the normal operation of the boat in terms of engine speed and demonstrates how the vessel goes back and forth between two ports at set engine speeds. When the vessel is in port, the engine operates at idle speed and when the vessel is sailing between ports, the engine can operate at either intermediate or full speed.
Fig. 9 Plot of engine speed during normal vessel operation
The analysis of the ferry data posed a different challenge as, unlike the test engine, there were no neatly labelled healthy or faulty data sets. However, the ferry engine had received significant maintenance during January and early February 2016, before returning to service in mid-February, and the engineers onboard asserted that the engine was healthy. Initially the data from cylinder 6 was unusable due to a faulty sensor. A one class support vector machine was trained on data from each cylinder collected from 25th to 28th February. Figure 10 shows the health score plot for cylinder 6 from 13th February through to 8th November. Data from the faulty sensor on cylinder 6 (13th–24th February) is clearly identified as abnormal (see Fig. 10, left-hand side), which shows that the algorithm can identify monitoring equipment faults. Cylinder 6 started showing a gradual reduction in health score from mid-March and, on 21st March, the vessel's Chief Engineer confirmed that cylinder 6 was experiencing reduced cylinder pressure, though not to the point that the vessel's engine could not operate. During the middle of May, the health score dropped significantly for cylinder 6. On 22nd June, the Chief Engineer reported that the forward exhaust temperature for cylinder 6 (see Fig. 11) was significantly lower than that of the other cylinders. As the engine was old and due to be fully replaced during winter 2016, the decision was made to manage the fault rather than repair it.
Fig. 10 Health score plot for Ferry engine cylinder 6
Fig. 11 Low forward exhaust temperature for Ferry engine cylinder 6
7 Conclusions
The results show that not only can one class classification algorithms be used to detect marine diesel engine faults but that their performance, as shown by the BER metric, is very strong. The algorithms have successfully been able to detect a comprehensive variety of diesel engine faults with minimal false positives. The impact of this is significant, as developers of predictive monitoring systems for diesel engines no longer need to acquire expensive and hard to obtain data sets of engines in different fault conditions over different loads to develop effective monitoring systems. Additionally, this is one of the first papers to present algorithms that have been tested against a comprehensive list of diesel engine faults rather than just one or two. Furthermore, the faulty data was obtained when the engine was running at different loads. This data set was constructed under the guidance of a mechanical engineer, an engineering superintendent for a tug company and a senior researcher with over 30 years' experience in diesel engine operation and fault analysis.
A key result of this work is that it was achieved using non-invasive monitoring methods. With traditional methods such as cylinder pressure analysis requiring the time consuming and costly fitting of pressure sensors, and tribology requiring expensive experts and time consuming methods, our proposed method is quicker and cheaper, as magnetic vibration sensors can be fitted and attached to data acquisition units very quickly. In the event of engine repair, the monitoring system can be removed quickly and replaced rapidly, unlike a pressure sensor based system. Moreover, the output of the one class support vector machine can be used as a measure for the severity/healthiness of the condition of the engine. Although this metric has been used in other applications as a measure of severity, this is one of the first times it has been used for marine diesel engines. Future work will see this approach tested on larger static diesel engines as well as engines on marine vessels such as ferries and tankers.
Acknowledgements The authors would like to thank and acknowledge financial support from Innovate UK (formerly known as the Technology Strategy Board) under grant number 295158. Additionally, the authors would like to thank Martin Gregory for his work on the test engine design and the construction of the faults imposed on it.
References 1. Shipping and world trade. http://bit.ly/18ur7fz (2017). Accessed 7 Feb 2017 2. Statistica, Number of ships in the world merchant fleet 2016. http://bit.ly/2kmcjbc, 2017. Accessed 7 Feb 2017 3. S. Bagavathiappan, B.B. Lahiri, T. Saravanan, J. Philip, T. Jayakumar, Infrared thermography for condition monitoring—a review. Infrared Phys. Technol. 60, 35–55 (2013) 4. X. Yan, C. Sheng, J. Zhao, K. Yang, Z. Li, Study of on-line condition monitoring and fault feature extraction for marine diesel engines based on tribological information. Proc. Inst. Mech. Eng. Part O J. Risk Reliab. 229(4), 291–300 (2015) 5. D. Boullosa, J.L. Larrabe, A. Lopez, M.A. Gomez, Monitoring through T2 hotelling of cylinder lubrication process of marine diesel engine. Appl. Therm. Eng. 110, 32–38 (2017) 6. D.T. Hountalas, R.G. Papagiannakis, G. Zovanos, A. Antonopoulos, Comparative evaluation of various methodologies to account for the effect of load variation during cylinder pressure measurement of large scale two-stroke diesel engines. Appl. Energy 113, 1027–1042 (2014) 7. D. Watzenig, M.S. Sommer, G. Steiner, Model-based condition and state monitoring of large marine diesel engines, in Diesel Engine-Combustion, Emissions and Condition Monitoring (Intech, 2013), pp. 217–230 8. T.R. Lin, A.C.C. Tan, L. Ma, J. Mathew, Condition monitoring and fault diagnosis of diesel engines using instantaneous angular speed analysis. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. 229(2), 304–315 (2015) 9. Y. Yuan, X. Yan, K. Wang, C. Yuan, A new remote intelligent diagnosis system for marine diesel engines based on an improved multi-kernel algorithm. Proc. Inst. Mech. Eng. Part O J. Risk Reliab. 229(6), 604–611 (2015) 10. B. Dykas, J. Harris, Acoustic emission characteristics of a single cylinder diesel generator at various loads and with a failing injector. Mech. Syst. Sig. Process. 93, 397–414 (2017)
11. W. Wu, T.R. Lin, A.C.C. Tan, Normalization and source separation of acoustic emission signals for condition monitoring and fault detection of multi-cylinder diesel engines. Mech. Syst. Sig. Proces. 64, 479–497 (2015) 12. O.C. Basurko, Z. Uriondo, Condition-based maintenance for medium speed diesel engines used in vessels in operation. Appl. Therm. Eng. 80, 404–412 (2015) 13. C.P. Cho, Y.D. Pyo, J.Y. Jang, G.C. Kim, Y.J. Shin, Nox reduction and N2 O emissions in a diesel engine exhaust using Fe-zeolite and vanadium based SCR catalysts. Appl. Therm. Eng. 110, 18–24 (2017) 14. J. Kowalski, B. Krawczyk, M. Wo´zniak, Fault diagnosis of marine 4-stroke diesel engines using a one-vs-one extreme learning ensemble. Eng. Appl. Artif. Intell. 57, 134–141 (2017) 15. K. Jafarian, M. Mobin, R. Jafari-Marandi, E. Rabiei, Misfire and valve clearance faults detection in the combustion engines based on a multi-sensor vibration signal monitoring. Measurement 128, 527–536 (2018) 16. J. Porteiro, J. Collazo, D. Patiño, J. Luis Míguez, Diesel engine condition monitoring using a multi-net neural network system with nonintrusive sensors. Appl. Therm. Eng. 31(17), 4097– 4105 (2011) 17. Y. Liu, J. Zhang, L. Ma, A fault diagnosis approach for diesel engines based on self-adaptive WVD, improved FCBF and PECOC-RVM. Neurocomputing 177, 600–611 (2016) 18. C. Jin, W. Zhao, Z. Liu, J. Lee, X. He (2014) A vibration-based approach for diesel engine fault diagnosis, in 2014 IEEE Conference on Prognostics and Health Management (PHM) (IEEE, 2014), pp. 1–9 19. R. Zeng, L. Zhang, J. Mei, H. Shen, H. Zhao, Fault detection in an engine by fusing information from multivibration sensors. Int. J. Distrib. Sens. Netw. 13(7), 1550147717719057 (2017) 20. D.M.J. Tax, R.P.W. Duin, Support vector data description. Mach. Learn. 54(1), 45–66 (2004) 21. S. Delvecchio, P. Bonfiglio, F. Pompoli, Vibro-acoustic condition monitoring of internal combustion engines: a critical review of existing techniques. Mech. Syst. 
Sig. Process. 99, 661–683 (2018) 22. P. Konar, P. Chattopadhyay, Multi-class fault diagnosis of induction motor using hilbert and wavelet transform. Appl. Soft Comput. 30, 341–352 (2015) 23. S.S. Khan, M.G. Madden, One-class classification: taxonomy of study and review of techniques. Knowl. Eng. Rev. 29(3), 345–374 (2014) 24. D.M.J. Tax, One class classification. Ph.D. thesis (Technische Universiteit Delft, 2001) 25. M. Wei, B. Qiu, Y. Jiang, and Xiao He. Multi-sensor monitoring based on-line diesel engine anomaly detection with baseline deviation, in Prognostics and System Health Management Conference (PHM-Chengdu), 2016 (IEEE, 2016), pp. 1–5 26. M. Kondo, S. Manabe, T. Takashige, H. Kanno, Traction diesel engine anomaly detection using vibration analysis in octave bands. Quart. Rep. RTRI 57(2), 105–111 (2016) 27. S. Wang, Y. Jianbo, E. Lapira, J. Lee, A modified support vector data description based novelty detection approach for machinery components. Appl. Soft Comput. 13(2), 1193–1205 (2013) 28. I. Lazakis, C. Gkerekos, G. Theotokatos, Investigating an SVM-driven, one-class approach to estimating ship systems condition. Ships Offshore Struct., 1–10 (2018) 29. D. Li, S. Liu, H. Zhang, Negative selection algorithm with constant detectors for anomaly detection. Appl. Soft Comput. 36, 618–632 (2015) 30. S. Mallat. A Wavelet Tour of Signal Processing: The Sparse Way. 3rd edn (Academic Press, 2008) 31. D.K. Ruch, P.J. Van Fleet, Wavelet Theory: An Elementary Approach with Applications (Wiley, 2011) 32. K.S. Gaeid, H.W. Ping, Wavelet fault diagnosis and tolerant of induction motor: a review. Int. J. Phys. Sci. 6(3), 358–376 (2011) 33. E. Parzen, On estimation of a probability density function and mode. Ann. Math. Stat. 33(3), 1065–1076 (1962) 34. B. Schölkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, R.C. Williamson, Estimating the support of a high-dimensional distribution. Neural Comput. 13, 1443–1471 (2001)
35. B. Schölkopf, R.C. Williamson, A.J. Smola, J. Shawe-Taylor, J.C. Platt, Support vector method for novelty detection, in Advances in Neural Information Processing Systems, vol. 12, ed. by S.A. Solla, T.K. Leen, K.-R. Müller (MIT Press, Cambridge, MA, 2000), pp. 582–588 36. B.E. Boser, I.M. Guyon, V.N. Vapnik, A training algorithm for optimal margin classifiers, in COLT’92: Proceedings of the Fifth Annual Workshop on Computational Learning Theory (ACM, New York, NY, USA, 1992), pp. 144–152 37. C.-C. Chang, C.-J. Lin, LIBSVM: A Library for Support Vector Machines (2001). Software available at http://bit.ly/2vzezCN 38. C.M. Bishop, Neural Networks for Pattern Recognition (Oxford University Press, USA, 1995) 39. D.M.J. Tax, Ddtools, the data description toolbox for matlab. Version 2.1.2, June 2015 40. Y.W. Chen, C.J. Lin, Combining SVMS with various feature selection strategies. Stud. Fuzziness Soft. Comput. 207, 315 (2006)
Enhanced Methodologies in Photovoltaic Production with Energy Storage Systems Integrating Multi-cell Lithium-Ion Batteries
J. B. L. Fermeiro, J. A. N. Pombo, R. L. Velho, G. Calvinho, M. R. C. Rosário and S. J. P. S. Mariano
Abstract The world's increasing energy demand and emerging environmental concerns are encouraging the search for clean energy solutions. Renewable sources are free, clean and virtually limitless, and for those reasons they present great potential. This chapter addresses two concerns related to photovoltaic (PV) production with an energy storage system integrating multi-cell Lithium-ion batteries. To increase the efficiency of PV production, a Maximum Power Point Tracking (MPPT) method is proposed based on the particle swarm optimization (PSO) algorithm. The proposed PSO-based MPPT is able to avoid oscillations around the maximum power point (MPP) and convergence to a local maximum under partial shading conditions. It also exhibits excellent tracking under rapid variation in the environmental conditions (irradiance and temperature). Additionally, a new charging method was developed based on the parameters of the battery pack in real time, extending the battery lifespan and improving the capacity usage and the performance of the Energy Storage System (ESS). The proposed Lithium Ion (Li-ion) battery charging method analyses at each moment the difference between the desired voltage and the mean voltage of the cells, the temperature of the pack and the difference of voltages between cells. Based on the obtained information, the algorithm calculates the charging current through trilinear interpolation. It should also be noted that the proposed charging method combines a balancing method and a state of charge determination method based on the Coulomb counting method, which represents an innovation when compared to the existing methods in the literature. The experimental results of both methods demonstrate excellent performance, making it possible, on the one hand, to achieve an optimized PV production and, on the other hand, to make the ESS more effective and efficient.
J. B. L. Fermeiro (B) · J. A. N. Pombo · R. L. Velho · G. Calvinho · M. R. C. Rosário · S. J. P. S. Mariano Department of Electromechanical Engineering, Universidade da Beira Interior, Calçada Fonte do Lameiro, Covilhã, Portugal e-mail: [email protected] J. B. L. Fermeiro · M. R. C. Rosário · S. J. P. S. Mariano Instituto de Telecomunicações, Covilhã, Portugal © Springer Nature Switzerland AG 2020 R. Jardim-Goncalves et al. (eds.), Intelligent Systems: Theory, Research and Innovation in Applications, Studies in Computational Intelligence 864, https://doi.org/10.1007/978-3-030-38704-4_11
Keywords Photovoltaic MPPT · Particle swarm optimization · Partial shading conditions · Energy storage systems · Battery cell charging · Battery cell balancing
1 Introduction
The increasing demand for electric energy worldwide, along with the urgent need to reduce pollution and carbon emissions, has become one of the most important challenges of recent years. In this context of sustainable development, there is now a clear focus on decentralized production of electricity, based on a mix of technologies with renewable sources, because they are free, clean and virtually limitless. Among the various renewable energy sources, solar energy is seen as an attractive source for microgrids and remote systems, due to its advantages, notably low operating cost and low maintenance cost. The purpose of any solar energy production is to obtain the maximum power available at every instant. A photovoltaic (PV) panel has a non-linear power output which depends mainly on irradiance and temperature, and it presents more than one local optimum under partial shading conditions. For these reasons, and to increase the efficiency of PV production, it is necessary to use a controller that tracks the maximum power available at every instant. Many examples of Maximum Power Point Tracking (MPPT) methods can be found in the literature, differing in complexity, speed of convergence, required sensors, cost, efficiency, hardware implementation and other aspects [1–4]. These can be roughly divided into two main groups, intelligent and non-intelligent. A very popular non-intelligent MPPT is the perturb and observe (P&O) method [5, 6], which has two classic configurations. In the first configuration, there is no feedback (open loop) and the algorithm consists of perturbing the duty cycle of the converter and taking periodic samples of voltage and current values. After this, the algorithm changes the modulation index (duty cycle) in a way that allows the PV panel to work at its Maximum Power Point (MPP).
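The open-loop configuration just described can be sketched in a few lines. The power curve below is a hypothetical single-peak stand-in for a converter and PV panel (a real panel would be sampled through voltage and current measurements), and the step size and starting duty cycle are arbitrary choices made for illustration.

```python
def pv_power(duty):
    """Hypothetical single-peak power curve with its maximum (100 W) at duty = 0.6."""
    return 100.0 - 400.0 * (duty - 0.6) ** 2

def perturb_and_observe(duty, step, iterations):
    """Fixed-step P&O: keep perturbing in the same direction while power rises."""
    direction = 1.0
    last_power = pv_power(duty)
    history = [duty]
    for _ in range(iterations):
        # Perturb the duty cycle, keeping it within the valid range [0, 1].
        duty = min(1.0, max(0.0, duty + direction * step))
        power = pv_power(duty)
        if power < last_power:
            direction = -direction      # the last perturbation reduced power: reverse
        last_power = power
        history.append(duty)
    return history

history = perturb_and_observe(duty=0.2, step=0.02, iterations=60)
tail = history[-10:]                    # behaviour once the peak has been reached
```

In this toy setting the duty cycle climbs to the peak and then keeps stepping back and forth around it, so the last entries of `history` stay within one step of 0.6 but never settle. A larger `step` would reach the peak in fewer iterations at the cost of a wider oscillation.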
In the second configuration there is a feedback (closed loop) of either the voltage or the current of the PV panel, and the algorithm consists of perturbing the PV reference and comparing the value of power before and after that perturbation. In both configurations, when the MPP is reached the algorithm oscillates around it. The amplitude of the perturbation is fixed and predetermined, and the oscillation around the MPP is determined by this value. A small amplitude value results in small oscillation but higher convergence time whereas a large amplitude results in high oscillations around the MPP and a faster convergence. This conflict can be resolved with variable perturbation amplitude, starting with a reasonably high value and decreasing it as the algorithm approaches the MPP. Another well-known non-intelligent MPPT is the incremental conductance method [7]. This is similar to the P&O method, with the exception that it compares the instantaneous and the incremental conductances. Although this is a technique known as more efficient than the P&O method, because it can determine the MPP, in practice, the MPP determination
Enhanced Methodologies in Photovoltaic Production …
is difficult to achieve due to several factors such as noise or the analog-to-digital conversion of the voltage and current, making it oscillate around the MPP as well. All the above-mentioned methods have the advantages of easy implementation and low computational cost, but they also have some drawbacks, such as oscillations around the MPP and poor tracking under rapidly changing irradiation levels. Also, they can be trapped in a local maximum under partial shading conditions. To overcome these disadvantages, alternative methods based on artificial intelligence have been developed, such as neural networks, fuzzy logic and metaheuristic-based MPPT algorithms. Although neural networks and fuzzy logic are both highly powerful and dynamic techniques, they have a high computational cost. Therefore, metaheuristic-based MPPT algorithms have emerged as an alternative approach because of their ability to solve complex non-linear and dynamic problems. Thus, several metaheuristic-based MPPT methods have been proposed, such as Genetic Algorithm (GA), Cuckoo Search (CS), Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO) and many others [8–12]. Renewable energy sources, due to their intrinsic characteristics (such as variability and unpredictability), represent a huge challenge in maintaining the balance of the electrical system to guarantee stability and reliability. To mitigate this drawback, from a wide range of possible and viable solutions, Energy Storage Systems (ESS) have been recognized as one of the most promising approaches, making renewable energy penetration more viable without compromising efficiency, quality and stability. In particular, the distributed use of ESS ensures stability and, above all, quality in the electrical system, and it enables each consumer/producer to act as a player that controls its own facilities in order to manage its resources and consumption. 
Among the wide diversity of technologically mature ESS, Li-ion batteries have become popular because they exhibit high efficiency, do not suffer from memory effect, and have greater longevity and a high energy density [13]. Despite all the improvements and technological advances, Li-ion batteries still face some challenges and concerns that need to be addressed. A key concern is the charging approach, which is essential to improve efficiency and charging time. There are several charging methods in the literature. These can be divided into three main groups: Constant Current/Constant Voltage (CC/CV), Multistage and Pulse Charging. The most used in industry and most frequently found in the literature is the Constant Current/Constant Voltage method. It has three phases. The first phase, called trickle charge (TC), charges the Li-ion battery with a reduced current if the voltage is below the minimum value. Once the minimum value is exceeded, the Li-ion battery is charged with a constant current (CC) until the voltage reaches a predefined value. When this value is reached, the algorithm starts the last phase, called constant voltage (CV), imposing on the Li-ion battery a constant voltage and allowing the current to gradually reduce until the minimum charging current or the charging time limit is reached [14, 15]. Aiming to reduce the charging time, many variations of this method can be found, for example, the double-loop control charger (DL–CC/CV) [15], the boost charger (BC–CC/CV) [15, 16], the fuzzy-logic control (FL–CC/CV) [15], the grey-predicted control (GP–CC/CV) [15, 17, 18] and the phase-locked loop control (PLL–CC/CV) [15]. The Multistage method consists of
J. B. L. Fermeiro et al.
dividing the charging time into periods with different current levels. In each period, the cell is charged with a lower current than in the previous one [15]. However, this type of methodology raises three questions: the transition criteria between current levels; the current values of each level; and the number of current levels. In the literature, the most frequently found transition criterion states that the transition occurs when the cell voltage reaches its maximum value. To optimize the current values, several optimization algorithms are used, for example the PSO algorithm and fuzzy logic [19, 20]. In [21] a study is performed on the optimum number of current levels, where the results indicate five current levels as the optimal value. In [22] the authors propose a new criterion of transition between current levels based on the error between the desired voltage and the voltages of the cells. This presents an improvement in charging time when compared to the traditional algorithm. Another charging method found in the literature is the Pulse Charging Method. This type of method can be divided into two different approaches, Variable Frequency Pulse Charge (VFPC) and Variable Duty Pulse Current (VDPC). The basic idea of the VFPC is to optimize the frequency of the current pulse so as to minimize the impedance of the cell (best electrochemical reaction of the battery) and consequently maximize the transfer of energy. VDPC maximizes energy transfer through two approaches: varying the pulse amplitude while fixing the pulse width, or vice versa [15, 23]. There are several variants of this method. For example, in [24] a VFPC charging method is proposed that aims to find the optimal charging frequency, i.e. the frequency for which the internal impedance of the cell is minimum, maximizing the transferred energy. A similar approach, but based on the VDPC charging method, is proposed in [25]. 
In [26] the authors propose a sinusoidal charging current and search for the optimal charging frequency that minimizes the cell's internal impedance, resulting in the optimal charging current. Finally, in [27] the authors propose an online tracking algorithm to locate and track the optimal charging frequency for common batteries in real time under any condition. Another key concern related to ESS is the need to interconnect multiple Li-ion batteries in series (a pack) to obtain the required voltage levels. Coupled with the existence of intrinsic and extrinsic differences between cells, this results in a lack of uniformity that reduces usable capacity and consequently the performance of the pack. The intrinsic (internal) non-homogeneity between cells is due mainly to small variations in the construction process, such as different capacities, volume, internal impedance and different rates of self-discharge, characteristics that worsen with usage and battery age. The main extrinsic (external) factor is temperature non-homogeneity along the battery pack, which leads to different rates of self-discharge and a consequent decline in performance [28]. The literature contains several balancing methods, divided into active and passive. The active methods are based on the transfer of energy from the cells with higher voltage to the cells with lower voltage, and can be unidirectional or bidirectional depending on the topology. They come in many different topologies, which can be split into two groups: those based on capacitors [29–32] and those based on converters. This second group can be further divided into isolated [32–35] and non-isolated [31, 35, 36]. The passive balancing methods are based on the dissipation of energy in the form of heat.
There are different methods in the literature, such as fixed shunt resistor, shunt resistor, complete shunting and switched shunt resistor [31, 32, 35, 36]. The switched shunt resistor topology is widely used because of its simplicity, cost, efficiency, volume, weight, robustness and reliability. Several of these concerns have been addressed in the past years to enhance both photovoltaic energy production and ESS performance [37–42]. This chapter focuses on two particular concerns, explaining how the problems can be addressed, the solution proposed by the authors and the obtained results. Firstly, to increase the efficiency of PV production, an MPPT method based on the particle swarm optimization algorithm is proposed, avoiding the oscillations around the MPP, the poor tracking under rapidly changing irradiation levels and the premature convergence to a local maximum under partial shading conditions. Secondly, a Li-ion battery charging method is proposed, combining charging and balancing methodologies for large Li-ion battery packs with a high number of cells. The proposed charging method calculates the optimal value of the charging current based on the pack condition at each instant. For this, the pack temperature, the imbalance between cells and the difference between the desired voltage and the voltage of the cells are analysed in real time. This chapter is organized as follows: Sect. 2 describes all hardware related to the PSO-based MPPT and the Li-ion battery charging methods; Sect. 3 explains the design and implementation of both algorithms; Sect. 4 shows the experimental results of the proposed methodologies; Sect. 5 concludes the chapter and discusses the results of the implemented methods.
2 Hardware Description The hardware developed to test and implement the PSO-based MPPT and the Li-ion battery charging methods includes the following functional blocks: Power source unit; Processing and control system; Power circuit unit; Acquisition unit; Load unit and Battery pack unit, illustrated in Fig. 1.
2.1 Power Source Unit The power supply used was the 2.6 kW MAGNA-POWER programmable DC source SL500-5.2. Its PPPE software allows the simulation of a PV panel, considering different solar irradiance and temperature levels defined by the user. It can also be programmed with I-V curves according to specific needs, which facilitates testing MPPT algorithms. After the profile of the PV panel is set, it can be transferred to the programmable DC source through serial communication.
Fig. 1 Functional block representation of the hardware used (blocks: Power Source Unit, Power Circuit Unit, Load Unit, Acquisition Unit with AD7367, Processing & Control System, Battery Pack Unit with ISL94212; links: SPI, RS232)
2.2 Processing and Control System This system comprises MathWorks Matlab® software (main processing unit) and the Texas Instruments microcontroller TMS320F28069 (auxiliary processing unit). The main and auxiliary processing units communicate via serial communication (RS232). The auxiliary processing unit is a 32-bit floating-point microcontroller able to perform complex mathematical tasks. It has a clock frequency of 90 MHz with 100 KB of RAM, 2 KB of ROM and 256 KB of Flash memory. It also has 16 PWM channels and 16 ADC channels with 12-bit resolution and a minimum conversion time of 289 ns. It supports a wide range of communication protocols such as Inter-Integrated Circuit (I2C), Controller Area Network (CAN) and Serial Peripheral Interface (SPI).
2.3 Battery Pack Unit The battery pack unit was assembled with 24 SAMSUNG ICR18650-26H batteries with 2600 mAh capacity [43] in series, corresponding to a 24S1P configuration, illustrated in Fig. 2. This configuration results in a nominal energy of approximately 230 Wh.
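As a quick plausibility check of the pack rating, the stored energy of a series string can be estimated as number of cells × nominal cell voltage × capacity. The short sketch below assumes a nominal cell voltage of 3.7 V, which is not stated in the text, and the function name is purely illustrative.

```python
def pack_energy_wh(cells_in_series, nominal_cell_voltage_v, capacity_ah):
    """Nominal energy (Wh) of a series string with a single parallel branch."""
    return cells_in_series * nominal_cell_voltage_v * capacity_ah


# 24S1P pack of 2600 mAh cells; 3.7 V per cell is an assumed nominal value.
energy_wh = pack_energy_wh(24, 3.7, 2.6)
print(f"{energy_wh:.1f} Wh")
```

Under that assumption the estimate is about 231 Wh, in line with the approximately 230 Wh quoted for the pack.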
Fig. 2 Implemented battery pack unit assembled in a 24S1P configuration
2.4 Acquisition System Unit The acquisition unit associated with the MPPT system was based on the AD7367 ADC, a 14-bit converter with 4 channels of simultaneous conversion. It can be configured to operate with various conversion ranges, providing high precision and flexibility, with conversion times lower than 1.25 µs and a transmission rate of 500 kbps. Choosing an external ADC allows for greater resolution and flexibility. This ADC was configured to operate with a conversion range of 0–10 V, and it communicates with the auxiliary processing unit through SPI. The voltage sensor used was the CYHVS025A, with a transformation ratio of 2500:1000, a primary nominal current of ±10 mA and a secondary nominal current of ±25 mA. It was assembled so that the input range is ±250 V. The current sensor used was the Hall effect current sensor CYHCS-B1-25. This sensor offers excellent accuracy, good linearity and a maximum nominal current of 25 A. The acquisition and balancing unit associated with the battery pack unit was implemented based on the ISL94212. This device is capable of monitoring 12 cells in series and can be connected in a chain of up to 14 devices (168 cells in series and n in parallel). This unit also performs extensive diagnostic functions such as cell overvoltage and undervoltage detection, over-temperature indication and many others. The ISL94212 also allows the implementation of low-cost balancing methodologies with external circuits. Figure 3 illustrates in detail the external balancing circuit used, known as the passive switched shunt resistor method. One limitation of the ISL94212 is that each device supports only 4 temperature sensors. For that reason, an auxiliary circuit capable of monitoring 12 temperature sensors (1 sensor per 2 cells) was connected to the auxiliary processing unit. 
The temperature value was computed through the Steinhart-Hart equation and the sensor used was Semitec's 10 kΩ Ultimate Thinness NTC thermistor.
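The Steinhart-Hart conversion from thermistor resistance to temperature can be sketched as follows. The coefficients are generic values often quoted for 10 kΩ NTC parts and are an assumption for illustration only; they are not taken from the Semitec datasheet, and the function name is hypothetical.

```python
import math

# Generic coefficients often quoted for 10 kOhm NTC thermistors.
# Assumed for illustration; not the values of the sensor used in the chapter.
SH_A = 1.129148e-3
SH_B = 2.34125e-4
SH_C = 8.76741e-8


def ntc_temperature_c(resistance_ohm, a=SH_A, b=SH_B, c=SH_C):
    """Steinhart-Hart: 1/T = a + b*ln(R) + c*ln(R)^3, with T in kelvin."""
    ln_r = math.log(resistance_ohm)
    inv_t = a + b * ln_r + c * ln_r ** 3
    return 1.0 / inv_t - 273.15  # convert kelvin to degrees Celsius
```

With these assumed coefficients, a reading of 10 kΩ maps to roughly 25 °C, and lower resistances map to higher temperatures, as expected for an NTC device.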
Fig. 3 External balancing circuit (passive switched shunt resistor method)
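2.5 Power Circuit Unit A non-isolated DC-DC boost converter was implemented, able to generate an output voltage equal to or higher than the input voltage (step up) [44]. The electric circuit of the converter is presented in Fig. 4. The converter operating in continuous conduction mode has two distinct stages, as a function of the modulation index (d). In the first stage [0 ≤ t ≤ dTs] the MOSFET conducts and no current flows through the diode. In the second stage [dTs ≤ t ≤ Ts] the MOSFET is off and the diode is forward biased. The differential equations that characterize these two conduction stages (considering ideal semiconductors), where the state vector is given by \(x = [i_L \;\; v_c]^T\), are the following:
\[
\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix}
= \underbrace{\begin{bmatrix} -\dfrac{r_L}{L} & 0 \\ 0 & -\dfrac{1}{C(R+r_c)} \end{bmatrix}}_{a_1}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
+ \underbrace{\begin{bmatrix} \dfrac{1}{L} \\ 0 \end{bmatrix}}_{b_1} u,
\qquad
y = \underbrace{\begin{bmatrix} 0 & \dfrac{R}{R+r_c} \end{bmatrix}}_{c_1}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\tag{1}
\]
Fig. 4 Non-isolated DC-DC boost converter circuit (components: VDC, L, rL, S, D, C, rC, R)
\[
\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix}
= \underbrace{\begin{bmatrix} -\dfrac{R(r_c+r_L)+r_c r_L}{L(R+r_c)} & -\dfrac{R}{L(R+r_c)} \\ \dfrac{R}{C(R+r_c)} & -\dfrac{1}{C(R+r_c)} \end{bmatrix}}_{a_2}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
+ \underbrace{\begin{bmatrix} \dfrac{1}{L} \\ 0 \end{bmatrix}}_{b_2} u,
\qquad
y = \underbrace{\begin{bmatrix} \dfrac{R r_c}{R+r_c} & \dfrac{R}{R+r_c} \end{bmatrix}}_{c_2}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
\tag{2}
\]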
The mathematical model of the converter can be obtained as a weighted arithmetic mean of the models of both conduction stages, with the modulation index as the weight, thus:
\[
\dot{x} = \underbrace{\left(a_1 d + a_2 (1-d)\right)}_{A} x + \underbrace{\left(b_1 d + b_2 (1-d)\right)}_{B} u,
\qquad
y = \underbrace{\left(c_1 d + c_2 (1-d)\right)}_{C} x
\tag{3}
\]
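The steady-state regime of the system (3), obtained by setting \(\dot{x} = 0\), is given by:
\[
\begin{cases} Ax + Bu = 0 \\ y = Cx \end{cases}
\;\Rightarrow\;
\begin{cases} x = -A^{-1} B u \\ y = Cx \end{cases}
\tag{4}
\]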
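The averaging and steady-state steps of Eqs. (1) to (4) can be checked numerically with a small sketch. The component values used in the usage note (L = 1 mH, C = 100 µF, R = 100 Ω) are hypothetical and serve only to exercise the formulas; the function names are illustrative.

```python
def stage_matrices(R, L, C, r_L, r_c):
    """Return (a1, b1, c1) and (a2, b2, c2) as written in Eqs. (1) and (2)."""
    s = R + r_c
    a1 = [[-r_L / L, 0.0], [0.0, -1.0 / (C * s)]]
    a2 = [[-(R * (r_c + r_L) + r_c * r_L) / (L * s), -R / (L * s)],
          [R / (C * s), -1.0 / (C * s)]]
    b1 = [1.0 / L, 0.0]
    b2 = [1.0 / L, 0.0]
    c1 = [0.0, R / s]
    c2 = [R * r_c / s, R / s]
    return (a1, b1, c1), (a2, b2, c2)


def steady_state_gain(d, R, L, C, r_L=0.0, r_c=0.0):
    """Gain y/u from Eq. (4): y = -C A^{-1} B u, using the averaged model (3)."""
    (a1, b1, c1), (a2, b2, c2) = stage_matrices(R, L, C, r_L, r_c)

    def mix(p, q):
        # Weighted mean of the two stages with the modulation index as weight.
        return d * p + (1.0 - d) * q

    A = [[mix(a1[i][j], a2[i][j]) for j in range(2)] for i in range(2)]
    B = [mix(b1[i], b2[i]) for i in range(2)]
    Cm = [mix(c1[i], c2[i]) for i in range(2)]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    # x = -A^{-1} B for a unit input, using the closed-form 2x2 inverse.
    x1 = -(A[1][1] * B[0] - A[0][1] * B[1]) / det
    x2 = -(-A[1][0] * B[0] + A[0][0] * B[1]) / det
    return Cm[0] * x1 + Cm[1] * x2
```

For example, `steady_state_gain(0.5, 100.0, 1e-3, 1e-4)` evaluates the lossless converter at d = 0.5 and returns a gain of 2, which matches the ideal relation 1/(1 − d).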
Solving Eq. (4), we get the gain (G) of the converter:
\[
G = -\frac{R (R + r_c)(d - 1)}{R (r_c + r_L) + r_c r_L + R^2 (1 - 2d) + d R (d R - r_c)}
\tag{5}
\]
where d is the modulation index. Assuming an ideal converter, i.e., considering \(r_c\) and \(r_L\) equal to zero, we get:
\[
G = \frac{1}{1-d}
\tag{6}
\]
Figure 5 shows the gain variation as a function of the modulation index and the ratio R/rL. This is a non-inverting configuration and it displays a non-linear curve for a modulation index greater than 0.5 and a mostly linear tendency for a modulation index less than 0.5. A relevant aspect that must be taken into consideration when implementing MPPT algorithms is never to design the converter to operate with a modulation index close to that of its maximum gain. This is because beyond the maximum gain point the gain decreases as the modulation index increases, i.e. if a higher output voltage is required, increasing the modulation index might move the converter into the descending part of the curve, therefore decreasing the output voltage instead.
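To illustrate why the modulation index should stay below the maximum-gain point, the following sketch evaluates Eq. (5) and scans the modulation index for the largest gain. The grid resolution and the example values (R = 100 Ω, rL = 1 Ω, rc = 0) are assumptions chosen for illustration, and the function names are hypothetical.

```python
def boost_gain(d, R, r_L=0.0, r_c=0.0):
    """Steady-state voltage gain of the boost converter, as in Eq. (5)."""
    num = -R * (R + r_c) * (d - 1.0)
    den = (R * (r_c + r_L) + r_c * r_L
           + R ** 2 * (1.0 - 2.0 * d) + d * R * (d * R - r_c))
    return num / den


def peak_gain(R, r_L, r_c=0.0, steps=1000):
    """Scan d over [0, 1) and return (d, gain) at the largest gain found."""
    grid = [k / steps for k in range(steps)]
    best_d = max(grid, key=lambda d: boost_gain(d, R, r_L, r_c))
    return best_d, boost_gain(best_d, R, r_L, r_c)
```

With R/rL = 100 and rc = 0, `peak_gain(100.0, 1.0)` places the maximum near d = 0.9 with a gain of about 5; beyond that point a larger modulation index lowers the output voltage.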
Fig. 5 Gain variation as a function of the modulation index for ratios between R and rL (curves: ideal, R/rL = 900, R/rL = 450, R/rL = 225, R/rL = 100; axes: gain versus modulation index)
3 Proposed Methodologies 3.1 PSO-Based MPPT Method The Particle Swarm Optimization (PSO) algorithm is a bio-inspired approach based on the social behaviour of animals. The algorithm starts with a random population of particles, where each particle represents a possible solution. Every particle has its own velocity, which is adjusted by the update Eq. (7), which considers the history of individual and collective experiences, that is, the experience of each individual particle and the experience of the collective. The main idea is to move the particles in such a way that they explore the search space for an optimal solution. The algorithm evaluates the performance of each particle at every iteration via a fitness function and changes its velocity in the direction of its own best performance until then (xPbest) and the best performance of all particles (xGbest). Each particle velocity is calculated with Eq. (7), and the new particle position is determined by Eq. (8).
\[
v^{i}_{k+1,D} = w \, v^{i}_{k,D} + r_1 C_1 \left(x^{i}_{Pbest,D} - x^{i}_{k,D}\right) + r_2 C_2 \left(x_{Gbest,D} - x^{i}_{k,D}\right)
\tag{7}
\]
\[
x^{i}_{k+1,D} = x^{i}_{k,D} + v^{i}_{k+1,D}
\tag{8}
\]
with i the particle number, k the iteration number, D the dimension index, r1 and r2 random numbers in [0, 1], C1 and C2 positive acceleration constants and w the inertia weight. PSO is a flexible and robust approach, fit to deal with the non-linear characteristic of PV panels. This type of metaheuristic algorithm, when applied to the MPPT problem, is able to mitigate the oscillations around the MPP, allows good tracking under rapidly changing irradiation levels and avoids the local maxima trapping problem under partial shading conditions [41].
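The velocity and position updates of Eqs. (7) and (8) can be sketched for a one-dimensional search space as below. The values of w, C1 and C2, the clamping of positions to the search interval and the immediate update of the global best inside an iteration are assumptions made for this illustration; the chapter does not state them.

```python
import random


def pso_update(x, v, x_pbest, x_gbest, w, c1, c2, r1, r2):
    """Apply Eqs. (7) and (8) to one particle in one dimension."""
    v_next = w * v + r1 * c1 * (x_pbest - x) + r2 * c2 * (x_gbest - x)
    x_next = x + v_next
    return x_next, v_next


def pso_maximize(fitness, n_particles=3, iterations=30, w=0.4, c1=1.2, c2=1.6,
                 lo=0.0, hi=1.0, seed=0):
    """Small PSO loop; parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vs = [0.0] * n_particles
    pbest = list(xs)
    pbest_fit = [fitness(x) for x in xs]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g], pbest_fit[g]
    for _ in range(iterations):
        for i in range(n_particles):
            x, v = pso_update(xs[i], vs[i], pbest[i], gbest, w, c1, c2,
                              rng.random(), rng.random())
            xs[i] = min(max(x, lo), hi)  # keep the particle in the search space
            vs[i] = v
            f = fitness(xs[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = xs[i], f
            if f > gbest_fit:
                gbest, gbest_fit = xs[i], f
    return gbest, gbest_fit
```

A call such as `pso_maximize(lambda x: -(x - 0.6) ** 2)` searches the interval [0, 1] for the maximizer of a simple concave test function; in an MPPT setting the fitness would instead be the measured PV power at a given modulation index.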
3.1.1 PSO-Based MPPT Method Implementation
The PSO-based MPPT algorithm was implemented in Code Composer Studio® v6 in the C programming language. Figure 6 presents the flowchart of the implemented algorithm. It has a main function where all peripherals are configured, such as the EPWM module and the SPI module. All the interrupt routines are configured there as well, and the variables are initialized.
Fig. 6 PSO-based MPPT algorithm flowchart (main routine: peripherals initialization, interrupts initialization, variables and Vref initialization, infinite loop; CNVST interrupt: clear GPIO6, delay 0.2 ms, set GPIO6; Read interrupt: get V and I via SPI, calculate Vav and Iav; PSO algorithm interrupt)
Fig. 7 Temporal diagram of the ADC (signals: CONVST, BUSY, CS, SCLK; the conversion time is marked)
The CNVST interrupt routine is responsible for triggering the ADC. GPIO6 drives the conversion start signal (CONVST) of the ADC, which is active low, i.e. the conversion is initiated on the falling edge of CONVST (Fig. 7). The READ interrupt routine is triggered on the falling edge of GPIO12, which is connected to the BUSY output signal of the ADC. When this signal goes low, it indicates that the conversion is complete and data is ready to be transmitted to the auxiliary processing unit. In this interrupt routine, the values of voltage and current are first obtained via SPI communication. Afterwards, the mean values are computed to be used in the PSO algorithm interrupt routine. The DC-DC converter switching frequency was set to 25 kHz and both the CNVST and READ interrupt routines are executed at 12.5 kHz, as illustrated in Fig. 8. The PSO algorithm interrupt routine is executed at 2 Hz, due to constraints of the SL500-5.2 programmable DC source. This limitation prevents the routine from being executed at a higher frequency, as would be desirable (15–100 Hz). The PSO algorithm interrupt routine is responsible for executing the PSO-based MPPT algorithm, as illustrated in Fig. 9. The algorithm has a star topology, with a population of three particles, where every particle can communicate with all the others. First, the
Fig. 8 CPU usage for the implemented interrupt routines (timing of the PWM module at 25 kHz, the CNVST interrupt with GPIO6 and the Read interrupt with GPIO12 at 12.5 kHz, and the PSO algorithm interrupt)
Fig. 9 PSO interrupt routine flowchart (blocks: Begin; particle positioning; count++; d = part^count; P^count_k = Vav·Iav; test count = 4; compute v^i and part^i = part^i + v^i; count = 0; tests |v^i| ≤ 0.1 and |P^i_k − P^i_{k−1}| ≥ 1)
algorithm places the particles randomly in the search space (the converter modulation index, d) and then the performance of each particle is evaluated through the output power of the PV panel (fitness function). Finally, the position of every particle is updated through Eqs. (7) and (8) and the process is repeated. As the particles approach the MPP, the velocity of each particle converges to zero. If the velocity magnitude of every particle is below 0.1 and the power difference between two consecutive iterations is greater than 1 W, the process is restarted.
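The restart condition described above can be written as a small predicate. This is a sketch under stated assumptions: the comparisons are taken as inclusive, the power change is checked per particle, and the function and parameter names are hypothetical.

```python
def should_restart(velocities, powers_now, powers_prev, v_tol=0.1, p_tol=1.0):
    """Return True when the swarm has settled but the measured power changed.

    velocities  : current velocity of each particle (modulation index units)
    powers_now  : power measured for each particle in this iteration (W)
    powers_prev : power measured for each particle in the previous iteration (W)
    """
    settled = all(abs(v) <= v_tol for v in velocities)
    power_changed = any(abs(p - q) >= p_tol
                        for p, q in zip(powers_now, powers_prev))
    return settled and power_changed
```

When the particles have converged and the power is stable, the predicate stays false and the converter keeps its operating point; a step in irradiance changes the measured power at the same modulation index and triggers a new particle positioning.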
3.2 Li-Ion Battery Charging Method The proposed Li-ion battery charging method combines charging and balancing methods for large Li-ion battery packs with a high number of cells, which is a very complex process. The proposed charging method calculates the optimal value of the charging current based on the pack condition at each instant and not on pre-established conditions [42]. For this, the pack temperature, the imbalance between cells and the difference between the desired voltage and the voltage of the cells are analysed. The algorithm operates in real time, adjusting the charging profile according to the state of the batteries throughout the charging process. This is done using a Lookup Table (LUT) constructed with charge current values conditioned by three parameters related to the state of the battery pack (the difference between the desired voltage and the mean voltage, the difference between cells and the temperature of the pack). At each moment, the appropriate charge current value is calculated by trilinear interpolation
Fig. 10 Representation of the implemented lookup table (sections showing the charging current as a function of the difference to 4.2 V, the difference between cells and the temperature)
considering the three parameters related to the state of the battery pack mentioned above. One of the advantages of this methodology is that the table can be constructed taking into account several assumptions, for example, giving priority to the charging time or to the lifespan of the battery pack; that is, the user can choose the charging profile. The lifespan of the battery pack is mainly affected by high temperatures, high currents, overcharging and deep discharges. Figure 10 illustrates several sections of the table, where it is possible to visualize the relationship between the various parameters. During the construction of the LUT it was necessary to take into account several premises, such as:
– The temperature influences the charging current only when it reaches high values. Temperatures up to 38 °C are considered safe and therefore there is no need to reduce the charging current. Temperatures above 38 °C are considered risky and the current is reduced in order to decrease the temperature. If the pack temperature reaches 45 °C, the charging is interrupted for safety reasons [43];
– The difference between cells has a more significant impact on the choice of the charging current value. Generally, a small difference between cells corresponds to a small influence on the charging current value. The degree of influence of a large difference between cells depends on the difference between the desired voltage and the mean voltage of the cells. The higher the difference to 4.2 V (desired voltage), the smaller the influence on the charging current value, i.e. when the mean voltage of the cells approaches the desired voltage, the current suffers a more significant reduction. This behaviour is justified because balancing is much more effective at the end of the charging process. Generally, the charging current is reduced throughout the
charging process and at the end the balancing current is higher than the charging current, which allows for a more efficient imbalance elimination.
– The last parameter analysed is the difference between the desired voltage and the mean voltage of the cells. If any cell voltage is below the lower voltage limit (3.2 V) the current is severely reduced; this is called the trickle charge phase. From this limit on, the charging current starts with a high value and is gradually reduced if the imbalance between cells is low and the temperature is below 38 °C. However, if these conditions are not met, the current will suffer a reduction, as explained in the points above.
Two stopping criteria were selected: expression (11) represents a safety condition, to prevent the voltage of the cells from exceeding the maximum voltage of 4.25 V, and (12) represents an acceptance condition, which determines if the charging is completed successfully.
\[
V_{i,k} \le 4.25
\tag{11}
\]
\[
\frac{1}{N}\sum_{i=1}^{N} V_{i,k} \ge d_v
\;\wedge\;
\frac{1}{N}\sum_{i=1}^{N} V_{i,k} - 0.01 \le V_{i,k} \le \frac{1}{N}\sum_{i=1}^{N} V_{i,k} + 0.01
\tag{12}
\]
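The two stopping criteria can be expressed as simple predicates over the list of cell voltages. This is a minimal sketch with hypothetical function names; the thresholds are those of expressions (11) and (12).

```python
def safety_ok(cell_voltages, v_max=4.25):
    """Expression (11): no cell may exceed the maximum voltage."""
    return all(v <= v_max for v in cell_voltages)


def charge_complete(cell_voltages, desired_v=4.2, band=0.01):
    """Expression (12): the mean reaches the desired voltage and every cell
    lies within +/- band volts of the mean."""
    mean_v = sum(cell_voltages) / len(cell_voltages)
    balanced = all(mean_v - band <= v <= mean_v + band for v in cell_voltages)
    return mean_v >= desired_v and balanced
```

A charger loop would typically stop immediately when `safety_ok` becomes false and finish normally once `charge_complete` becomes true.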
The proposed balancing method is performed through a statistical analysis. This analysis considers the individual voltage of the cells (Vi,k), the mean of the voltages (ξk) and the difference between this mean and the desired voltage (φk). Criteria have been established that define the voltage limit value above which the cells go into balancing. This limit value becomes more restrictive as the mean voltage of the cells approaches the desired value, since balancing is more effective near the top of charge (top balancing). These criteria are represented by the first four conditions of expression (13). Two additional criteria were created, the last two conditions in expression (13), in order to increase the efficiency of the charging process. The first criterion disables balancing when the difference between the desired voltage and the mean voltage of the cells is small and the imbalance between cells is within the stopping criteria (Ψk < 0.01 V). The second criterion (last condition) sets the cells that have a voltage equal to or greater than the desired voltage (4.2 V) for balancing.
\[
\begin{cases}
V_{i,k} \ge \xi_k + \delta_k & \text{if } \varphi_k \ge 0.2 \\
V_{i,k} \ge \xi_k + \delta_k \ast 0.8 & \text{if } 0.2 > \varphi_k \ge 0.1 \\
V_{i,k} \ge \xi_k + \delta_k \ast 0.6 & \text{if } 0.1 > \varphi_k \ge 0.05 \\
V_{i,k} \ge \xi_k + \delta_k \ast 0.4 & \text{if } 0.05 > \varphi_k \ge 0.01 \\
V_{i,k} \ge \xi_k + \delta_k \ast 0.4 & \text{if } \varphi_k < 0.01 \wedge \Psi_k > 0.01 \\
V_{i,k} \ge 4.2 &
\end{cases}
\tag{13}
\]
where
Vi,k Voltage in cell i at instant k;
dv Desired voltage (4.2 V);
N Number of cells;
ξk Mean voltage in the cells at instant k;
δk Standard deviation of the cell voltages at instant k;
φk Difference between desired voltage and mean of voltages in cells at instant k;
Ψk Voltage difference among cells at instant k.
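The selection of cells for balancing according to expression (13) can be sketched as follows. Several details are assumptions made for illustration: the standard deviation is computed as a population standard deviation, Ψk is taken as the difference between the highest and lowest cell voltage, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def cells_to_balance(cell_voltages, desired_v=4.2):
    """Return the indices of the cells that satisfy one of the conditions in (13)."""
    xi = mean(cell_voltages)                         # mean voltage
    delta = pstdev(cell_voltages)                    # standard deviation (assumed population form)
    phi = desired_v - xi                             # distance of the mean to the desired voltage
    psi = max(cell_voltages) - min(cell_voltages)    # imbalance (assumed max minus min)

    if phi >= 0.2:
        factor = 1.0
    elif phi >= 0.1:
        factor = 0.8
    elif phi >= 0.05:
        factor = 0.6
    elif phi >= 0.01:
        factor = 0.4
    elif psi > 0.01:
        factor = 0.4
    else:
        factor = None  # statistical balancing disabled near the end of charge

    selected = []
    for i, v in enumerate(cell_voltages):
        above_limit = factor is not None and v >= xi + factor * delta
        if above_limit or v >= desired_v:
            selected.append(i)
    return selected
```

For instance, with three cells at 3.60 V and one at 3.70 V only the last cell exceeds the statistical limit and is selected, while a nearly full and well-balanced pack selects only cells that have already reached 4.2 V.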
Given the constructed LUT, the stopping criteria and the balancing orders, the charging current value is determined with trilinear interpolation for values not pre-established in the table. This determines the value of the charging current through the three parameters referred to previously.
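Trilinear interpolation over a three-dimensional lookup table can be sketched as below. The axes and current values in the usage note are hypothetical placeholders, not the table used by the authors, and the clamping of out-of-range inputs to the table edges is an assumption.

```python
from bisect import bisect_right


def _bracket(axis, value):
    """Return (lower index, fractional position) of value on a sorted axis.
    Values outside the axis are clamped to its ends."""
    value = min(max(value, axis[0]), axis[-1])
    i = min(bisect_right(axis, value) - 1, len(axis) - 2)
    t = (value - axis[i]) / (axis[i + 1] - axis[i])
    return i, t


def trilinear(lut, ax_x, ax_y, ax_z, x, y, z):
    """Interpolate lut[i][j][k], defined on the grid ax_x x ax_y x ax_z."""
    i, tx = _bracket(ax_x, x)
    j, ty = _bracket(ax_y, y)
    k, tz = _bracket(ax_z, z)
    result = 0.0
    # Weighted sum over the 8 corners of the enclosing grid cell.
    for di, wx in ((0, 1.0 - tx), (1, tx)):
        for dj, wy in ((0, 1.0 - ty), (1, ty)):
            for dk, wz in ((0, 1.0 - tz), (1, tz)):
                result += wx * wy * wz * lut[i + di][j + dj][k + dk]
    return result


# Hypothetical 2 x 2 x 2 table of charging currents (A), indexed by
# [difference to 4.2 V][difference between cells][temperature].
DV_AXIS = [0.0, 1.0]       # V
IMB_AXIS = [0.0, 0.3]      # V
TEMP_AXIS = [38.0, 45.0]   # degrees Celsius
LUT = [[[0.5, 0.0], [0.1, 0.0]],
       [[2.5, 0.0], [2.0, 0.0]]]
```

With this placeholder table, `trilinear(LUT, DV_AXIS, IMB_AXIS, TEMP_AXIS, 1.0, 0.0, 30.0)` clamps the temperature to the 38 °C edge and returns the corner value of 2.5 A, while a query in the middle of the cell returns the average of the eight corners.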
3.2.1 Li-Ion Battery Charging Method Implementation
The algorithm was implemented in Texas Instruments Code Composer Studio® v6 in the C language. Figure 11 shows the auxiliary processing unit CPU usage. The code was divided into two segments in order to make the analysis of the implemented algorithm clearer and simpler. In a first phase, all variables are initialized and the peripherals and all interrupts used are configured. The timer interrupt was set to trigger the segments on a time base of 5 s. Code segments A and B are each executed with a sampling time of 70 s, 35 s out of phase with each other. Code segment A is responsible for: • acquiring cell voltages and temperatures; • designating the cells to be set to balance;
Fig. 11 CPU usage of the auxiliary processing unit (timer period 5 s; code segments A and B each executed every 70 s, offset by 35 s; balancing time 60 s)
• calculating the charging current. The balancing and the charging current calculations are done according to Eqs. (11), (12) and (13), described in the previous subsection. Code segment B is responsible for calculating the state of charge of the battery pack under operation. To estimate the state of charge (SOC) of the battery pack during the discharging period, the traditional Coulomb counting method, widely found in the literature, was used [45]. For the charging period, however, it is necessary to consider the energy dissipated in the balancing process. For this reason, a variant of the Coulomb counting method was developed. In this way, it is possible to estimate the SOC more accurately during both the discharge and charge periods of the battery pack. The estimation of SOC in the discharge period is represented by the following equations:
\[
SOC_{i,k} = SOC_{i,k} - \sum_{i=1}^{N} I_{discharge} \ast T_s \ast C
\tag{14}
\]
\[
SOC_{p,k} = \sum_{i=1}^{N} \frac{SOC_{i,k}}{N}
\tag{15}
\]
where
SOCi,k SOC of the cell i, at the instant k;
SOCp,k SOC of the pack p, at instant k;
Idischarge Discharge current;
Ts Sampling time;
C Number of coulombs per second (1 C per second);
N Number of cells.
In the charging period, the estimation of the SOC is more complex, due to balancing. In this case it is necessary to subtract the coulombs dissipated when balancing occurs. That is, when a cell is set to balance, a voltage reading is performed during this period (60 s) to calculate the balancing current. Thus, the traditional Coulomb counting method is performed, and the value associated with balancing is subtracted from the cell SOC value. This process is represented by the following equations:
\[
SOC_{i,k} = \sum_{i=1}^{N} I_{charge_k} \ast T_s \ast C - \sum_{i=1}^{N} \frac{V_{b_{i,k}}}{R_b} \ast T_b \ast C
\tag{16}
\]
\[
SOC_{p,k} = \sum_{i=1}^{N} \frac{SOC_{i,k}}{N}
\tag{17}
\]
where
SOCi,k SOC of the cell i, at the instant k;
SOCp,k SOC of the pack p, at the instant k;
Ichargek Charge current, at the instant k;
Vbi,k Voltage during balancing process of cell i, at the instant k;
N Number of cells;
Ts Sampling time;
Rb Value of balancing resistor;
Tb Balancing time.
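The Coulomb counting variant can be sketched by tracking the charge of each cell in coulombs and subtracting what is bled through the balancing resistor. This is an illustrative interpretation: expressing the pack SOC as a fraction of cell capacity, the 40 Ω resistor in the usage note and the function names are assumptions, not values from the chapter.

```python
def discharge_step(charge_c, i_discharge_a, ts_s):
    """Traditional Coulomb counting: remove I*Ts coulombs from every cell."""
    return [q - i_discharge_a * ts_s for q in charge_c]


def charge_step(charge_c, i_charge_a, ts_s, v_bal, r_bal_ohm, tb_s):
    """Add I*Ts coulombs to every cell, minus the balancing losses.

    v_bal[i] is the voltage read on cell i while it is balancing,
    or None when the cell is not set to balance in this period.
    """
    updated = []
    for q, vb in zip(charge_c, v_bal):
        bled = 0.0 if vb is None else (vb / r_bal_ohm) * tb_s
        updated.append(q + i_charge_a * ts_s - bled)
    return updated


def pack_soc(charge_c, capacity_c):
    """Mean of the per-cell SOC, each expressed as a fraction of capacity."""
    return sum(q / capacity_c for q in charge_c) / len(charge_c)
```

As an example, two half-charged 2600 mAh cells (4680 C each) charged at 1 A for 70 s, with the second cell balancing at 4.0 V through an assumed 40 Ω resistor for 60 s, end the period at 4750 C and 4744 C respectively.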
The algorithm cyclically executes the two previously presented segments until the stop criteria are met.
4 Experimental Results

4.1 PSO-Based MPPT Method Results

To validate the performance of the PSO-based MPPT method, experimental tests were performed under uniform irradiance conditions. Figure 12a–c illustrates the output waveforms for the voltage, current and power of the PV system, respectively. The experimental results show the excellent performance of the algorithm, with a convergence time of 10 samples (equivalent to 5 s), and once the MPP is reached there are no oscillations around it. Figure 12d illustrates the evolution of the voltage, current and power errors, demonstrating once again the excellent performance of the controller.

Fig. 12 Performance results of the PSO-based MPPT method under uniform irradiance conditions. a Voltage output waveform. b Current output waveform. c Power output of the PV system. d Evolution of the error for voltage, current and power

The second experimental test was performed under non-uniform irradiance conditions. Figure 13a–c presents the results obtained for the voltage, current and output power of the PV system, respectively, when subject to fast irradiance transitions. Once again, the PSO-based MPPT method shows excellent performance, with an average convergence time of 10 samples (equivalent to 5 s) and no oscillations once the MPP is reached. Another important point is the tracking capability of the algorithm: when a variation in the environment conditions (irradiance) occurs, the algorithm restarts if the difference between two consecutive iterations is greater than 1 W and the velocity of every particle is within a range below 0.1. The evolution of the voltage, current and power errors is illustrated in Fig. 13d, demonstrating once again the excellent performance of the controller.

Fig. 13 Performance results of the PSO-based MPPT method under non-uniform irradiance conditions. a Voltage output waveform. b Current output waveform. c Power output of the PV system. d Evolution of the error for voltage, current and power
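The restart condition described for the tracking capability can be written as a small predicate. The sketch below is an interpretation under stated assumptions: the "difference between two consecutive iterations" is taken to be the difference in measured power, "velocity within a range below 0.1" is read as an absolute value below 0.1, and the function and parameter names are hypothetical.

```python
def should_restart(p_previous, p_current, velocities,
                   power_threshold=1.0, velocity_threshold=0.1):
    """Return True when the swarm has settled but the measured power jumped.

    p_previous, p_current -- power in watts at two consecutive iterations
    velocities            -- current velocity of every particle
    power_threshold       -- 1 W, as quoted in the text
    velocity_threshold    -- 0.1, as quoted in the text (absolute value assumed)
    """
    swarm_settled = all(abs(v) < velocity_threshold for v in velocities)
    power_changed = abs(p_current - p_previous) > power_threshold
    return swarm_settled and power_changed


# A settled swarm sees the power drop from 80 W to 60 W: restart the search.
restart = should_restart(80.0, 60.0, [0.01, -0.02, 0.05])
```

Requiring both conditions avoids restarting while the particles are still exploring, when power differences between iterations are expected.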
4.2 Li-Ion Charging Method Results

To validate the performance of the proposed method, batteries with reduced charge/discharge cycles were used under different imbalance conditions. Two case studies are presented: Case 1 illustrates a charge where there was a slight imbalance between cell voltages, whereas Case 2 represents a charge in which there was a greater imbalance between cell voltages. Figure 14 shows the resulting current in two complete charging processes, together with the parameters used by the proposed charging method (the difference between the desired voltage and the mean cell voltage, the imbalance between cells and the temperature of the pack). The charging current depends on the real-time conditions of the battery pack, and not on pre-established conditions.

Fig. 14 Charging current and table parameters, for two cases of study

At the beginning of the charging process for Case 1, the charging current was penalised due to the initial cell imbalance of 0.11 V. This imbalance was diminished to 0.0375 V at t = 0.078 h, leading to an increase in charging current up to 2.105 A. From this point on, the charging current shows a decreasing behaviour, mainly influenced by the difference between the desired voltage and the mean cell voltage. This occurred because the imbalance between cells was small and the temperature did not reach risk values, having a gradient of 10.47 °C corresponding to the maximum value of 33.43 °C.

When compared to Case 1, the charging current in Case 2 shows a more gradual decreasing behaviour. This is due to the presence of greater imbalances along the charging, which were only minimized at the end. In this case the temperature also does not influence the current, since it only reaches the maximum value of 31.62 °C, corresponding to a gradient of 8.71 °C. As shown previously, the temperature has a higher gradient in Case 1 because the charging current was higher throughout the process.

The first graph in Fig. 15 illustrates the behaviour of the individual cell voltages throughout the charging (Case 1), where we can see a very small voltage imbalance between cells, with the charging finishing within the established stopping criteria (ξk = 4.20 V and voltage deviation φk = 0.0086 V). The second graph illustrates the number of balanced cells over the entire charging process. It is clear that the balancing criteria favoured top balancing, which, as explained before, is more effective. The charging process ends when the stopping criteria have been fulfilled. In this case, a large number of cells (21 cells) were set to balance at the end of the charging. This happened because, although these cells had already reached the desired voltage, the stopping criteria were still not fulfilled (ξk = 4.199 V and Ψk = 0.0121 V). The cell balancing was disabled at t = 2.46 h since the balancing criteria were fulfilled, i.e., φk < 0.01 ∧ Ψk < 0.01. However, the charging process continued for 2.3 min until the stopping criteria were met (ξk = 4.20 V ∧ φk = 0.0086 V).

Fig. 15 The voltage of each cell along the charge and number of cells balancing during charge, for case study 1

Figure 16 shows the individual cell voltage profiles and the number of balanced cells throughout the charging process (Case 2). Despite the high imbalance between cells throughout charging (bottom graph), the established stopping criteria (ξk = 4.20 V and φk = 0.01 V) were met. Once again, the method favoured top balancing, as expected. The results in both cases show that the proposed method is able to handle large imbalances well, although it becomes less efficient regarding charging time than in a situation of small imbalances.

Fig. 16 The voltage of each cell along the charge and number of cells balancing during charge, for case study 2

Finally, Fig. 17 shows the evolution of the SOC throughout the charging, as well as the profile of the mean voltage versus SOC, for both case studies. In Case 1 the estimated SOC was 0.992, corresponding to an error of 0.08%. In the second case the estimated SOC was 0.999, corresponding to an error of 0.01%.
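The balancing and stopping criteria quoted in this subsection can be summarised in a short Python sketch. The thresholds (0.01 for φk and Ψk, 4.20 V for ξk) are taken from the values reported in the text; the exact definitions of ξk, φk and Ψk are given elsewhere in the chapter, so they are passed in here as already computed values. The function names and the use of "greater or equal" and "less or equal" comparisons are assumptions.

```python
def balancing_enabled(phi_k, psi_k, limit=0.01):
    """Balancing stays active until both deviations fall below the limit."""
    return not (phi_k < limit and psi_k < limit)


def charge_complete(xi_k, phi_k, target_voltage=4.20, deviation_limit=0.01):
    """Assumed stopping rule: target voltage reached and deviation within the limit."""
    return xi_k >= target_voltage and phi_k <= deviation_limit


# Values reported for Case 1 shortly before the end of the charge.
still_balancing = balancing_enabled(phi_k=0.0086, psi_k=0.0121)
finished_early = charge_complete(xi_k=4.199, phi_k=0.0086)
finished = charge_complete(xi_k=4.20, phi_k=0.0086)
```

Keeping the two checks separate mirrors the behaviour described for Case 1, where balancing was disabled first and charging continued for a short time until the stopping criteria were met.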
Fig. 17 SOC estimation during charge and profile of mean cell voltage during charge, for both cases of study

4.2.1 Charging Efficiency Comparison

For comparison purposes, the traditional Multistage method (with 5 current levels) was implemented with the proposed balancing method for the same conditions. Figure 18 shows the charging current profile and the parameters along the charging. The first graph shows that, in the first transition, the method jumps from the first current level to the fourth. This occurred because the cell voltages exceeded the desired voltage (4.2 V) and remained above this value for 3.5 min. At the end of this time, the current is at the fourth level (the sampling time is 70 s). This phenomenon occurs because the cells have a relatively small number of cycles and thus a low internal resistance, with no rapid variations in voltage. Another relevant factor is the temperature of the pack. Due to the prolonged period with high current, the temperature reaches high values (39.86 °C), corresponding to a gradient of 17.07 °C. This represents a risk factor, since the temperature increases to high values without the algorithm taking this into account. Moreover, the imbalance between cells was greatly reduced throughout the charging.

Fig. 18 Charging current and table parameters, for traditional multistage

Figure 19 shows the individual cell voltages and the number of cells with active balancing during the entire charging process. The charging ended successfully when the stop criteria were met (voltage deviation φk = 0.01 V), with some voltage imbalances between cells. Many cells are set to balance at t = 1 h because at this point their voltages exceeded 4.2 V. This causes a short period of overcharging (exceeding 4.2 V), which results in battery damage and decreases the lifespan of the battery pack. At the top of the charge, the balancing is disabled by the balancing criteria (φk < 0.01 ∧ Ψk < 0.01). The charging proceeded until the stopping criteria were reached.

Fig. 19 The voltage of each cell along the charge and number of cells balancing during charge for traditional multistage

Figure 20 shows the SOC profile during charging and the mean cell voltage versus SOC. For the traditional Multistage method, the same SOC estimation method was used. The SOC estimated for the first case was 1.005, corresponding to an error of 0.05%. When compared with the cases presented previously, this shows a greater slope until 1 h of charge, due to the prolonged higher current. After 1 h of charging (mean voltage at 4.2 V) the SOC was 0.9, increasing slowly to 1 because of the low current values.

Fig. 20 SOC estimation during charge and profile of cell mean voltages during charge, for traditional multistage

Table 1 Comparison between the proposed algorithm and the traditional multistage method

| Method | Charging time (h) | Imbalance between cells at the end of charging (V) | Temperature increase (°C) |
|---|---|---|---|
| Traditional multistage with 5 current levels | 2.5 | 0.01 | 17.07 |
| Proposed algorithm | 2.48 | 0.0086 | 10.47 |

The results of the traditional Multistage method were compared with those of the proposed method for Case 1, as shown in Table 1. This table presents the various parameters analysed during charging. Although the proposed method achieved a faster charging time and a lower imbalance between cells at the end of the charge, the most relevant difference was the temperature variation, with a reduction of 38.6% in favour of the proposed method when compared to the Multistage method. Finally, it is important to mention that the high temperature and the maximum voltage overshoot of the Multistage method will damage the batteries and reduce the lifespan of the battery pack.
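The relative differences behind the comparison in Table 1 can be reproduced with a few lines of Python. The helper name is hypothetical; the input values are the ones listed in Table 1.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` is lower than `baseline`."""
    return (baseline - proposed) / baseline * 100.0


# Table 1 values: traditional multistage (baseline) versus proposed algorithm.
temperature_gain = relative_reduction(17.07, 10.47)   # temperature increase, in degrees C
time_gain = relative_reduction(2.5, 2.48)             # charging time, in hours
imbalance_gain = relative_reduction(0.01, 0.0086)     # final imbalance, in volts
```

With the tabulated values, the temperature increase is roughly 38.66% lower for the proposed algorithm, in line with the 38.6% quoted in the text, while the charging time differs by less than 1% and the final imbalance by about 14%.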
5 Conclusion

This chapter discussed concerns related to PV production with an energy storage system integrating multi-cell Lithium-ion batteries.

With the intent of increasing the efficiency of PV production, an MPPT method was proposed based on a classic PSO algorithm with a population of three particles in a star topology. The proposed PSO-based MPPT was tested under uniform and non-uniform environment conditions. From the experimental results we can conclude that this type of MPPT algorithm, based on meta-heuristics, exhibits excellent performance: it presents no oscillations once the MPP is reached, avoids convergence to a local maximum under partial shading conditions and has an excellent tracking capability under rapid variations in environment conditions, particularly irradiance and temperature.

To increase the participation of renewable energy sources in the energy matrix, ESS has been recognized as one of the most promising approaches. Despite all the technological advances and research, ESS still presents some concerns. One of the most important is the charging process; for this reason, this chapter presented a new charging method based on the parameters of the battery pack in real time. The charging current is determined according to the state of the battery pack using a lookup table, where it is conditioned by three parameters: the difference between the desired voltage and the mean cell voltage, the voltage imbalance between cells, and the temperature of the pack. When compared with a classic method (Multistage), the proposed Li-ion battery charging method presented a similar charging time but a significant reduction of 38.6% in the temperature gradient. Another important feature of this method is the ability to perform balancing, which is essential to ensure the safe use of the ESS. A passive balancing methodology was implemented, achieving a low imbalance between cells with both methods (proposed and Multistage) and demonstrating the effectiveness of the balancing process. Additionally, a new SOC estimation methodology was developed, based on the Coulomb counting method, to account for the energy dissipated by the balancing process. This SOC estimation method presented negligible errors at the end of the charge.

The experimental results demonstrated excellent performance, making it possible to achieve an optimized PV production and, at the same time, to increase the effectiveness and efficiency of the ESS.
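As a compact reminder of the search strategy summarised above, the following Python sketch runs a classic PSO with three particles in a star topology, where every particle is attracted to the single best position found by the swarm. It is only an illustration under assumptions: the particle position is taken to be a voltage reference, the inertia and acceleration coefficients are hypothetical, and the quadratic power curve is a toy stand-in for a PV array, not measured data.

```python
import random


def toy_pv_power(v):
    """Hypothetical single-peak power curve with its maximum at 40 V."""
    return max(0.0, 90.0 - 0.1 * (v - 40.0) ** 2)


def pso_mppt(power_at, v_min, v_max, n_particles=3, iterations=30,
             w=0.4, c1=1.2, c2=1.6, seed=0):
    """Classic PSO with a global best shared by all particles (star topology)."""
    rng = random.Random(seed)
    # Spread the initial voltage references evenly over the search range.
    pos = [v_min + (v_max - v_min) * (k + 1) / (n_particles + 1)
           for k in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]
    pbest_p = [power_at(v) for v in pos]
    g = max(range(n_particles), key=lambda k: pbest_p[k])
    gbest, gbest_p = pbest[g], pbest_p[g]
    for _ in range(iterations):
        for k in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[k] = (w * vel[k]
                      + c1 * r1 * (pbest[k] - pos[k])    # pull towards own best
                      + c2 * r2 * (gbest - pos[k]))      # pull towards swarm best
            pos[k] = min(v_max, max(v_min, pos[k] + vel[k]))
            p = power_at(pos[k])
            if p > pbest_p[k]:
                pbest[k], pbest_p[k] = pos[k], p
            if p > gbest_p:
                gbest, gbest_p = pos[k], p
    return gbest, gbest_p


v_best, p_best = pso_mppt(toy_pv_power, v_min=0.0, v_max=60.0)
```

In a real converter each call to `power_at` would correspond to applying a reference and measuring voltage and current, which is why a small population keeps the convergence time short.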
References

1. B. Subudhi, R. Pradhan, A comparative study on maximum power point tracking techniques for photovoltaic power systems. IEEE Trans. Sustain. Energy 4, 89–98 (2013)
2. T. Esram, P.L. Chapman, Comparison of photovoltaic array maximum power point tracking techniques. IEEE Trans. Energy Convers. 22, 439–449 (2007)
3. A. Anurag, S. Bal, S. Sourav, M. Nanda, A review of maximum power-point tracking techniques for photovoltaic systems. Int. J. Sustain. Energy 35, 478–501 (2016)
4. A.N.A. Ali, M.H. Saied, M.Z. Mostafa, T.M. Abdel-Moneim, A survey of maximum PPT techniques of PV systems, in 2012 IEEE Energytech (2012), pp. 1–17
5. D. Sera, L. Mathe, T. Kerekes, S.V. Spataru, R. Teodorescu, On the perturb-and-observe and incremental conductance MPPT methods for PV systems. IEEE J. Photovoltaics 3, 1070–1078 (2013)
6. F. Aashoor, F. Robinson, A variable step size perturb and observe algorithm for photovoltaic maximum power point tracking, in 2012 47th International Universities Power Engineering Conference (UPEC) (2012), pp. 1–6
7. M.A. Abdourraziq, M. Maaroufi, M. Ouassaid, A new variable step size INC MPPT method for PV systems, in 2014 International Conference on Multimedia Computing and Systems (ICMCS) (2014), pp. 1563–1568
8. S. Hadji, J.P. Gaubert, F. Krim, Real-time genetic algorithms-based MPPT: study and comparison (theoretical an experimental) with conventional methods. Energies 11 (2018)
9. B.R. Peng, K.C. Ho, Y.H. Liu, A novel and fast MPPT method suitable for both fast changing and partially shaded conditions. IEEE Trans. Ind. Electron. 65, 3240–3251 (2018)
10. H. Chaieb, A. Sakly, A novel MPPT method for photovoltaic application under partial shaded conditions. Sol. Energy 159, 291–299 (2018)
11. S. Titri, C. Larbes, K.Y. Toumi, K. Benatchba, A new MPPT controller based on the Ant colony optimization algorithm for photovoltaic systems under partial shading conditions. Appl. Soft Comput. 58, 465–479 (2017)
12. K. Sundareswaran, S. Peddapati, S. Palani, MPPT of PV systems under partial shaded conditions through a colony of flashing fireflies. IEEE Trans. Energy Convers. 29, 463–472 (2014)
13. S.M.A.S. Bukhari, J. Maqsood, M.Q. Baig, S. Ashraf, T.A. Khan, Comparison of characteristics - lead acid, nickel based, lead crystal and lithium based batteries, in 2015 17th Uksim-Amss International Conference on Computer Modelling and Simulation (Uksim) (2015), pp. 444–450
14. A.A. Hussein, I. Batarseh, A review of charging algorithms for nickel and lithium battery chargers. IEEE Trans. Veh. Technol. 60, 830–838 (2011)
15. S. Weixiang, V. Thanh Tu, A. Kapoor, Charging Algorithms of Lithium-Ion Batteries: An Overview (2012), pp. 1567–1572
16. P.H.L. Notten, J.H.G.O.H. Veld, J.R.G. van Beek, Boostcharging Li-ion batteries: a challenging new charging concept. J. Power Sour. 145, 89–94 (2005)
17. L.R. Chen, R.C. Hsu, C.S. Liu, H.Y. Yang, N.Y. Chu, A grey-predicted Li-ion battery charge system, in Iecon 2004 - 30th Annual Conference of IEEE Industrial Electronics Society, vol. 1 (2004), pp. 502–507
18. L.R. Chen, R.C. Hsu, C.S. Liu, A design of a grey-predicted li-ion battery charge system. IEEE Trans. Ind. Electron. 55, 3692–3701 (2008)
19. C.L. Liu, S.C. Wang, S.S. Chiang, Y.H. Liu, C.H. Ho, PSO-based fuzzy logic optimization of dual performance characteristic indices for fast charging of lithium-ion batteries, in 2013 IEEE 10th International Conference on Power Electronics and Drive Systems (IEEE Peds 2013) (2013), pp. 474–479
20. S.C. Wang, Y.H. Liu, A PSO-based fuzzy-controlled searching for the optimal charge pattern of Li-Ion batteries. IEEE Trans. Ind. Electron. 62, 2983–2993 (2015)
21. L.R. Dung, J.H. Yen, ILP-based algorithm for lithium-ion battery charging profile, in IEEE International Symposium on Industrial Electronics (ISIE 2010) (2010), pp. 2286–2291
22. R. Velho, M. Beirao, M.D. Calado, J. Pombo, J. Fermeiro, S. Mariano, Management system for large li-ion battery packs with a new adaptive multistage charging method. Energies 10 (2017)
23. M.D. Yin, J. Cho, D. Park, Pulse-based fast battery IoT charger using dynamic frequency and duty control techniques based on multi-sensing of polarization curve. Energies 9 (2016)
24. L.R. Chen, A design of an optimal battery pulse charge system by frequency-varied technique. IEEE Trans. Ind. Electron. 54, 398–405 (2007)
25. L.R. Chen, Design of duty-varied voltage pulse charger for improving li-ion battery-charging response. IEEE Trans. Ind. Electron. 56, 480–487 (2009)
26. L.R. Chen, S.L. Wu, D.T. Shieh, T.R. Chen, Sinusoidal-ripple-current charging strategy and optimal charging frequency study for li-ion batteries. IEEE Trans. Ind. Electron. 60, 88–97 (2013)
27. A.A. Hussein, A.A. Fardoun, S.S. Stephen, An online frequency tracking algorithm using terminal voltage spectroscopy for battery optimal charging. IEEE Trans. Sustain. Energy 7, 32–40 (2016)
28. J. Kim, J. Shin, C. Chun, B.H. Cho, Stable configuration of a li-ion series battery pack based on a screening process for improved voltage/SOC balancing. IEEE Trans. Power Electron. 27, 411–424 (2012)
29. M.Y. Kim, C.H. Kim, J.H. Kim, G.W. Moon, A chain structure of switched capacitor for improved cell balancing speed of lithium-ion batteries. IEEE Trans. Ind. Electron. 61, 3989–3999 (2014)
30. Y.M. Ye, K.W.E. Cheng, Modeling and analysis of series-parallel switched-capacitor voltage equalizer for battery/supercapacitor strings. IEEE J. Emerg. Sel. Top. Power Electron. 3, 977–983 (2015)
31. J. Gallardo-Lozano, E. Romero-Cadaval, M.I. Milanes-Montero, M.A. Guerrero-Martinez, Battery equalization active methods. J. Power Sour. 246, 934–949 (2014)
32. J. Cao, N. Schofield, A. Emadi, Battery balancing methods: a comprehensive review, in 2008 IEEE Vehicle Power and Propulsion Conference (2008), pp. 1–6
33. M.M.U. Rehman, M. Evzelman, K. Hathaway, R. Zane, G.L. Plett, K. Smith, E. Wood, D. Maksimovic, Modular approach for continuous cell-level balancing to improve performance of large battery packs, in 2014 IEEE Energy Conversion Congress and Exposition (Ecce) (2014), pp. 4327–4334
34. M.M.U. Rehman, F. Zhang, M. Evzelman, R. Zane, D. Maksimovic, Control of a series-input, parallel-output cell balancing system for electric vehicle battery packs, in 2015 IEEE 16th Workshop on Control and Modeling for Power Electronics (COMPEL) (2015)
35. M. Daowd, N. Omar, P. Van den Bossche, J. Van Mierlo, A review of passive and active battery balancing based on MATLAB/Simulink. Int. Rev. Electr. Eng.-IREE 6, 2974–2989 (2011)
36. J. Qi, D.D.C. Lu, Review of battery cell balancing techniques, in 2014 Australasian Universities Power Engineering Conference (AUPEC) (2014)
37. M.F.N. Tajuddin, M.S. Arif, S.M. Ayob, Z. Salam, Perturbative methods for maximum power point tracking (MPPT) of photovoltaic (PV) systems: a review (vol. 39, pg 1153, 2015). Int. J. Energy Res. 39, 1720–1720 (2015)
38. N. Karami, N. Moubayed, R. Outbib, General review and classification of different MPPT Techniques. Renew. Sustain. Energy Rev. 68, 1–18 (2017)
39. Y.J. Zheng, M.G. Ouyang, L.G. Lu, J.Q. Li, X.B. Han, L.F. Xu, On-line equalization for lithium-ion battery packs based on charging cell voltages: Part 1. Equalization based on remaining charging capacity estimation. J. Power Sour. 247, 676–686 (2014)
40. M.M. Hoque, M.A. Hannan, A. Mohamed, Voltage equalization control algorithm for monitoring and balancing of series connected lithium-ion battery. J. Renew. Sustain. Energy 8 (2016)
41. G. Calvinho, J. Pombo, S. Mariano, M.d.R. Calado, Design and implementation of MPPT system based on PSO algorithm, in Presented at the 9th International Conference on Intelligent Systems IS’18, Madeira Island, Portugal (2018)
42. R.L. Velho, J.A.N. Pombo, J.B.L. Fermeiro, M.R.A. Calado, S.J.P.S. Mariano, Lookup table based intelligent charging and balancing algorithm for Li-ion Battery packs, in 9th International Conference on Intelligent Systems IS’18, Madeira Island, Portugal (2018)
43. E.B. Disivion, Specification of Product for Lithium-Ion Rechargeable Cell Model: ICR1865026H, Samsung SDI co. Ltd. (2011)
44. J. Sun, D.M. Mitchell, M.F. Greuel, P.T. Krein, R.M. Bass, Averaged modeling of PWM converters operating in discontinuous conduction mode. IEEE Trans. Power Electron. 16, 482–492 (2001)
45. W.-Y. Chang, The state of charge estimating methods for battery: a review. ISRN Appl. Math. 2013 (2013)
Mobility in the Era of Digitalization: Thinking Mobility as a Service (MaaS)

Luís Barreto, António Amaral and Sara Baltazar
Abstract The planning and design of sustainable and smart cities (the cities of the future) should properly address the challenges that arise from the everyday growth of the urban population. Mobility is an important issue for social inclusion and for the sustainable development of such cities. Thus, future mobility will have an increased importance in the planning and design of the cities of tomorrow. A key component of any future mobility and its metabolism is what is known as Mobility as a Service (MaaS), which represents emerging opportunities for any type or mode of transportation in future cities. Through an empirical and explorative research methodology, this chapter presents the main issues and characteristics that any future MaaS should consider. In conclusion, some features and trends are presented that should be considered in the development of future MaaS systems, allowing a more convenient provision of sustainable, versatile and attractive mobility services.

Keywords Mobility as a service (MaaS) · MaaS systems · Cities of the future · Sustainable and smart cities · Smart mobility
L. Barreto (B) · S. Baltazar
Escola Superior de Ciências Empresariais, Instituto Politécnico Viana do Castelo, Valença, Portugal

A. Amaral
Escola Superior de Tecnologia e Gestão, Instituto Politécnico do, Felgueiras, Portugal

© Springer Nature Switzerland AG 2020
R. Jardim-Goncalves et al. (eds.), Intelligent Systems: Theory, Research and Innovation in Applications, Studies in Computational Intelligence 864, https://doi.org/10.1007/978-3-030-38704-4_12

1 Introduction

People increasingly try to work and live in urban or suburban areas, and this tendency seems to be growing over time. The population of cities grows considerably every year, which is turning them into an essential element of the economic and social development of any country. However, because of the high concentration of people, cities also contribute to the emergence of complex problems that need to be properly handled, namely intrinsic problems such as the pronounced reliance on cars [54] and, consequently, the aggravation of traffic congestion and noise levels, which increase the environmental damage. To solve such problems, new mobility solutions must be adopted, always keeping in mind that inter-modal usage is a crucial factor in people’s daily life and hence in the triple bottom line dimensions of cities: economic, social and environmental.

Recent research in the transportation field focuses on the following aspects: transport decarbonization [71], the reduction of CO2 emissions, the overload of traffic infrastructures, the shortage of parking areas, the impact on traffic flows and the necessity of stimulating sustainable transport systems that can handle the higher demand. Trilles et al. [64] highlight the need for public authorities to evolve their urban management policies, since there is a growing demand for efficiency, sustainable development, quality of life and effective management of resources. The design of an optimized transport network, more extended and with a frequency adjusted to user demand for collective transport, new bus corridors and more flexible mobility paradigms that encourage equity and the universality of mobility services in cities are also important factors to be considered. Many approaches have been tried, such as car sharing with priority lanes for shared vehicles (Seattle case), charges for driving into or entering the city centre (London/Amsterdam cases), introducing cycle lanes or improving public transport (PT) options [37].
Alongside these cases, however, some new problems are emerging, such as additional public expenditure, changes in the daily routine of drivers and new problems that become a reality from the commuter’s point of view. It is only possible to take advantage of inter-modal services, using both private transport and PT, if there is a behavioural change in the population, which needs to discard the idea of owning a vehicle, and if stakeholders such as academics, urban planners, and policy and decision makers work in a collaborative and effective way towards the transition to future mobility digitalization, with its implications supported by public and integrated policies [15]. The ever-growing use of new technologies in city management makes it easy to integrate new services, such as new transport services, which can evolve and make people aware of the possibility of giving up private cars in favour of a PT offer that is easier, healthier and economically viable. Therefore, one key aspect that needs to be properly addressed in future mobility planning and design is the accurate development of the Mobility as a Service (MaaS) paradigm as an integrated, simple to use and collaborative system.

This chapter is structured as follows. In the next section, the mobility concept in an urban context is outlined, together with the PT and inter-modality ideas. In Sect. 3, the Cities of the Future concept is presented, as well as the Sustainable and Smart cities themes. Section 4 addresses the global Mobility as a Service concept and its related notions, through an empirical methodology, and in particular the technological tools and innovative solutions used in Europe. Section 5 discusses a future mobility approach, and brief conclusions and future work possibilities are outlined in Sect. 6.
2 Mobility

In recent years, urban and suburban development processes have become more widespread, aiming to promote a better quality of life for the population and taking into consideration the challenge of being socially inclusive. However, for certain groups of the population, several constraints can be explained by or related to transport [56], specifically vehicle emissions, congestion and car dependency, which jointly conspire to reduce the intended quality of life in many cities [29]. The rapid growth of such areas has generated an increasing need for the mobility of people and goods, predominantly by road and in fuel-powered motor vehicles. This situation is becoming unsustainable in environmental, economic and financial terms, and public policy measures must be taken to reverse this global tendency or to reduce its impacts. New procedures and planning tools have been developed to deal with mobility problems under this new paradigm, in which public transportation, traffic circulation and urban planning activities must be considered together in a combined approach [56], known as a Sustainable Urban Mobility Plan (SUMP).
2.1 Urban Mobility

Several urban problems, such as traffic accidents, congestion, noise, air pollution and mobility constraints for certain groups of the population, are caused by or related to transport [56]; specifically, vehicle emissions, congestion and ever-increasing car dependency conspire to reduce quality of life in many cities [29]. Transport impacts go beyond urban boundaries, with social, economic, political and environmental effects [56]. As cities are dominant centres of production and consumption, most transport, both passenger and freight, starts and ends in urban areas and often bypasses several urban areas on its way [34]. The car is the most frequently used mode of transport and also the most polluting per kilometre covered [45]. Thus, strategies need to be developed and implemented to promote local PT, cycling and walking, making those areas more attractive and thereby reducing car use.
2.2 Public Transport (PT)

The planning of PT systems plays a critical role in improving accessibility for all users, namely by keeping average trip lengths below the thresholds required for maximum use of the walking and cycling modes. The main advantages for each type of stakeholder are [18]:
L. Barreto et al.
• Travellers have a general view of the available transport services; when these are available in a centralized system, the information received is personalized according to a profile that is dynamically adapted (a traveller can change the plan of an ongoing trip without having to follow the initial plan);
• Transport providers can adapt and/or create new services and business models that address the population’s needs, taking into account accessibility and social equity;
• Authorities can enhance the management of public infrastructures, e.g. during emergency situations, and can design effective inclusion policies.

In recent times, many transport agencies have been investing in PT systems in order to transform them into an integrated system. Nevertheless, the understanding of public transport users’ perceptions, and of how these align with policy makers’ perceptions of an integrated system, is limited [10].
2.3 Inter-modal Mobility

The promotion of inter-modal transport has been one of the European concerns [17]. The development of inter-modal transport systems should be a major priority, in order to enhance mobility and accessibility while, at the same time, reducing congestion, road accidents and pollution in cities [2] and supporting their overall development. Inter-modal urban mobility can be viewed as the integrated combination of its components and is regarded as the key trigger for sustainable mobility. As the implementation of inter-modal mobility concepts grows in our cities, the interaction among shared vehicles will grow [31]. Inter-modality raises several challenges, such as technological ones and the need for extensive coordination among a variety of transport operators so that they can be covered by the same ticketing system (e.g. integration of long-distance and urban offers) [17]. Those challenges will be present in all cities in the near future.
3 Cities of the Future

Cities are growing at an impressive rate and, as their population increases, they need to be built faster and more effectively to ensure a smooth and effective response to the challenges mentioned above. Growing cities are not only dense in terms of land use; they are also socially and economically diverse, which makes them challenging to govern. While cities have provided economic opportunities to migrants, they have also faced increased social segregation and acute shortages of physical and social infrastructures.
Mobility in the Era of Digitalization: Thinking Mobility …
Cities of today must:
• Plan their sustainable and resilient future;
• Balance economic and social development, as well as environmental protection;
• Design solutions adapted to their local context and enhance their character [70].

In the last decades, the concepts of sustainable cities and sustainable mobility have gained political significance, in particular at the European level [50], based on a balance between social, environmental and economic dimensions. This places the urban development paradigm between environmental crises and worsening social decay, which cause ecological and social scarcity and put future life at risk [7].
3.1 Sustainable Cities

The term sustainability can be described as a state in which society does not destabilise the natural and social systems [7], avoiding mass consumption and preventing environmental pollution, while meeting the population’s actual needs and promoting its quality of life. One way to guarantee this is by achieving proper mobility, i.e. having the possibility to choose between various transport modes with regard to their sustainability. Thus, a vital strategy is to develop an appropriate sustainable urban mobility plan and to understand the complex interaction between residence, workplace and mobility choices [30], together with a mobility service that best serves the citizens [21], meets their expectations [10] and improves their quality of life. It should be noted that one of the aims of the European Union (EU) is to establish a sustainable mobility system [50] that is deeply connected to Smart Mobility [39].
3.2 Smart Cities

During the past two decades, the smart city concept was ambiguous and generated recurrent doubt regarding its feasibility and overall potential [3]. Smart cities are networked places where deploying Information and Communications Technologies (ICT) in each activity of the city clearly improves its standard of living [32]. The smart city concept was established together with the Internet of Things (IoT), supporting city operations intelligently [57], with a special focus on combining sustainability with quality of life in the cities of the future. Nowadays, citizens’ contexts of work, socialization and personal development are more and more inter-connected with technological concepts. Thus, technology could lead to the implementation of sustainable measures and policies [4], making them more visible and demonstrating their benefits. A smart city, in its broad sense, comprises various dimensions and embraces many challenges, such as economic competitiveness, inter-modality, social inclusion, sustainability and the viability of its diverse systems (transport, waste, water, information, energy), together with the related data processing and data security. Therefore, data needs to be collected, stored, treated, shared and made available in a secure and efficient way.
4 Mobility as a Service (MaaS)

Mobility is an essential part of modern society and a prerequisite for autonomous living [61]. A change in urban mobility based on the use of the private car is imperative, and technology is surely an important element of this ambition, allowing the sustainable and smart city concepts to be brought together. Mobility as a Service (MaaS) is an approach that connects new players and generates new business concepts. The MaaS concept/paradigm embraces a new model for mobility that unites all modes of transport and mobility services in a one-stop-shop package [1], with the intent to reduce traffic volumes, emissions and congestion in urban areas and to increase efficiency in rural areas, as well as to create a cost-effective, time-efficient and organized transport system. MaaS thereby becomes a flexible, sustainable and powerful alternative to the use of the private car [1]. MaaS comprises three principal characteristics [42]:
• On-demand transport: MaaS satisfies the user’s requirements for a variety of transport services. A MaaS system aims to identify the most suitable means of transport for the user, which can be PT, taxi, car rental or bike sharing, among other possibilities;
• A subscription service: MaaS users can use a pay-as-you-go subscription for an agreed period of time. Thus, users do not need to buy travel tickets or sign up for separate transport accounts;
• Potential to create new markets: MaaS can be used to gain greater visibility of, and insight into, on-demand travel data patterns and dynamics. This allows the creation of new sales channels by detecting unexploited customer demand, as well as the simplification of the user’s account and payment management.
Beyond multimodal and sustainable mobility services, addressing customers’ transport needs is another key MaaS feature, which can be achieved by integrating planning and payment on a one-stop-shop principle [1]. This provides the flexibility desired in today’s on-demand environments, allowing commuters to build their own journeys at their own convenience and consequently offering a tailored, hyper-convenient mobility solution and a concrete market option [2], with the promising perspective of changing the role of the private car [26, 42], especially when trips are short [25]. The development of MaaS can have different impacts on traditional PT [60]. Hensher [22] argues that in Australia, if PT is not at the centre of MaaS and conveniently complemented with on-demand transport covering the first to the last mile, MaaS will not be able to succeed. The integration of transport, namely of the public types, can be promoted with an open and extensible mobility platform that integrates all the different types of transport in mega-cities for the benefit of all stakeholders (Fig. 1).

Fig. 1 MaaS overall essential components

Since the MaaS paradigm is a holistic approach and a still-emerging concept [16], it is possible that not all MaaS actors and their different points of view have been considered, e.g.: the user must be the direct beneficiary of an attractive, cost-effective and convenient mobility service; the system can be tailored to understand users’ behaviours, needs and patterns, providing a more environmentally friendly mobility solution; the operator, while aiming for economic profitability, can contribute experience and knowledge of the challenges it faces daily; and academia can play an important role in the research and dissemination of the best and most innovative practices worldwide.
4.1 Regulation

MaaS regulation factors, such as society’s laws, regional differences, market size, current market shares, active stakeholders and key actors in the market [16], form the base of MaaS models and will directly affect the application of MaaS. The success of MaaS will depend on changes to existing regulation and on the introduction of new legal platforms [48], allowing new systems to be developed [23] and paving the way towards a smarter and more sustainable mobility system [17].
4.2 Application

Demand for mobility and prevailing habits may differ with the region and the geographical context being considered; therefore, MaaS should be tested and evaluated in various environments, from urban to rural contexts [1]. In urban areas, MaaS is more related to improving the quality and efficiency of the transport system, promoting a change in the demand for transport infrastructures, such as parking places, and in city planning [1]. In rural areas, MaaS may facilitate a transfer of mobility services, e.g. by allowing more demand-responsive, flexible and sharing-based transport services to be defined, adapted to the rural context [5].
4.3 Security

While some inter-modal offers exist on the market, transport companies do not necessarily have incentives to share valuable, business-relevant data with third parties, including their competitors [17]. As mobility systems become increasingly sophisticated, complex and inter-connected networks [9] increase the security challenges that need to be properly considered and handled. Across the several dimensions and layers that coexist in a smart city, different types of information are constantly being generated: data related to users, providers and authorities (Fig. 1). When a user shares personal information or registers, he or she provides the information needed to create a profile. This profile supplies key data that the city can later collect to identify the user’s behaviours and mobility patterns (needs, interests and possible choices); however, the user is also sharing that information with all the other stakeholders. Thus, an important security challenge for intelligent mobility systems and services is how to handle all the information about the citizens and the city itself, guaranteeing the protection, confidentiality and privacy of any personal information without compromising the operational efficiency of the system. Another challenge concerns the integrity of these data, together with the assurance of the availability of the applications and systems, regarding the integration of mobility technological platforms.
4.4 Integration

An integrated PT system has five main attributes [10]:
1. network integration;
2. fare integration;
3. information integration;
4. physical integration of stations; and
5. coordinated schedules.

In order to achieve fully integrated transport solutions, all key actors (policy-makers, urban and mobility planners, city stakeholders and platform operators [27]) must collaborate and cooperate [28] to obtain a multi-modal and integrated approach, customized to their specific local context [2], considering both urban and rural environments [25], and enabling technology to encourage inter-modality in urban transport [2]. Thus, a core point of any smart mobility system is the development of an integrated mobility technological platform.
4.5 Ticketing

The promotion of integrated ticketing has been on the European policy agenda in recent years [17]. The main aim of an integrated ticketing system is to improve the service quality for (potential) PT users and therefore to encourage the use of tangible alternatives to the private car [50]. Integrated tickets are not smart per se; combining the services of several operators on one single ticket can likewise be paper-based. One of the main advantages of smart ticketing is that tickets can be sold and stored in an electronic device, such as a smart card or mobile phone. An integrated e-ticketing scheme has long been on the agenda of EU transport policy, but there is not yet a wider-scale application available, since its implementation is a complex process that requires the synchronized activity of a large number of heterogeneous actors. In Europe, most countries have an e-ticketing system at least in their capital [50]. For example, PT in the Netherlands mostly uses an integrated ticketing system (OV-chip card), which can be used for all types of transport modalities [55].
4.6 APP’s/Platforms

Presently, using a mobility solution requires access to many different platforms, each with a specific focus [49]. Mobility platforms represent an element of any inter-modal transport [17]. An integrated platform must include registration procedures and travel packages, inter-modal journey planning, booking options, smart ticketing and payment functions, so that the entire chain of transport can be managed by a centralized platform [2] (as in the MaaS-London proposal). To better address the mobility needs of travellers, information from different transport providers (e.g. buses, metro, ambulances, taxis) needs to be handled and made available in a central system [18]. A mobility platform assistance system should support users with urban trip-planning, offer more inter-modal options than comparable approaches [36], and take into consideration the dynamic interactions between platform owners, service providers and consumers [38]. It is possible to distinguish two forms of mobility platforms: platforms that focus on the provision of raw data through open interfaces, and platforms that provide digital mobility services [49].

The following presents a study focused on the platforms that provide digital mobility services. In this study, about 60 APP’s/platforms were reviewed. Table 1 presents those that offer MaaS integrated platform systems, with information in English and covering the European territory, a country, or a specific city or cities, taking into consideration the European cohesion space. The sample included pilot projects as well as operational and concluded projects, and even projects still in the planning phase.
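As an illustration of the integrated platform functions described in this section (registration, inter-modal journey planning, booking and a single ticket covering the whole chain of transport), the following is a minimal Python sketch. It is not taken from any of the reviewed platforms: the class names `Leg`, `Ticket` and `MobilityPlatform`, the operator names and the prices are all hypothetical, and the planner simply searches for the chain with the fewest legs, which is only one of many possible planning criteria.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Leg:
    """One leg of a journey offered by a single (hypothetical) provider."""
    provider: str
    mode: str          # e.g. "bus", "bike sharing"
    origin: str
    destination: str
    price_eur: float


@dataclass
class Ticket:
    """A single ticket covering every leg of a booked journey."""
    user_id: str
    legs: List[Leg]

    @property
    def total_price_eur(self) -> float:
        return round(sum(leg.price_eur for leg in self.legs), 2)


class MobilityPlatform:
    """Toy centralized platform: registration, planning and booking."""

    def __init__(self) -> None:
        self._users: Set[str] = set()
        self._offers: List[Leg] = []

    def register_user(self, user_id: str) -> None:
        self._users.add(user_id)

    def publish_offer(self, leg: Leg) -> None:
        # Providers feed their services into the central system.
        self._offers.append(leg)

    def plan(self, origin: str, destination: str) -> List[Leg]:
        """Breadth-first search for the chain of offers with the fewest legs.

        Returns an empty list when no chain connects the two places.
        """
        queue = deque([(origin, [])])
        visited = {origin}
        while queue:
            place, path = queue.popleft()
            if place == destination:
                return path
            for leg in self._offers:
                if leg.origin == place and leg.destination not in visited:
                    visited.add(leg.destination)
                    queue.append((leg.destination, path + [leg]))
        return []

    def book(self, user_id: str, legs: List[Leg]) -> Ticket:
        """Issue one ticket for the whole journey of a registered user."""
        if user_id not in self._users:
            raise PermissionError("user must register before booking")
        if not legs:
            raise ValueError("no journey to book")
        return Ticket(user_id, list(legs))
```

In this sketch a traveller would register once, call `plan` with an origin and a destination, and pass the result to `book`, receiving a single `Ticket` whose price sums the legs of all operators involved. Real platforms would add payment, real-time data and fare rules, none of which are modelled here.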
Table 1 APP’s/Platforms and its characteristics Platforms Region Transport modes Beamrz [6]
Eindhoven
PT (train, bus) Car Scooter Bike PT Car Bike Walking PT Car Scooter Bike PT (train, bus) Plane
Citymapper [11]
Europe
Comtrade Digital Platform [14]
Ljubljana
GoEuro [19]
Europe
Google Maps [20]
Europe
PT Car Bike Walking Plane
IMAb [24]
Berlin
Kyyti [33]
Turku
Qixxit [51]
Germany
PT Car Bike Walking PT (train, bus) Car Bike PT (train, bus) Plane
MaaS Pilot Ghenta [62]
Ghent
PT
Mobilleo [40]
UK
PT (train) Plane
Services
Customisation
Taxi
Beamrz community
Mapping Free mobile APP Desktop website Taxi Car sharing Car rental Shuttle
Car sharing Bike sharing Ride sharing Chauffeur Car sharing Car rental Taxi Car rental Ride sharing Taxi Car sharing Bike sharing Taxi Parking Booking Hotels Restaurants
Regular and electric cars
Find the fastest, cheapest and best travel options Schedule and budget Mapping Satellite imagery 360◦ panoramic views of streets Real-time traffic conditions Route planning Total cost Trip time Environmental footprint Urban transit
Duration and price information Carbon footprint Mobility budget Smartphone APP Business MaaS Payment
(continued)
Table 1 (continued) Platforms Region
Transport modes
Services
Customisation Real time information Smartphone APP Urban transit Real time traffic information Payment
Mobility 2.0 services [13]
Palma
PT (bus) Bike
Parking
MOBiNET [41]
Europe
Car
MyWayb [43]
Barcelona, Berlin, Trikala
PT Car Walking
NaviGoGo: Pick& Mixb [44]
Fife, Dundee
PT Bike Walking
NordwestMobil [46]
Northwestern Switzerland
PT Car Bike Walking
Optimile [47]
Belgium, Lithuania, Italy, Norway, Poland, The Netherlands, Switzerland Lyon
PT Car Bike Walking
Taxi Car sharing Parking Booking Carpooling Car sharing Bike sharing Parking Booking Taxi Car clubs Bike sharing Bike rental Booking Taxi Uber Carpooling Car sharing Bike sharing Parking Taxi Electrical stations Bike sharing Booking
Optymod [12]
RACC Trips [52] Barcelona
PT Car Bike Walking Plane
Parking
PT
Taxi Car sharing Moto sharing Bike sharing Parking Auto repair
Real time information
Young people (16–25 year old) Payment Personalised journey planner Local event notifications route planning
Trip planning Electrical vehicles Ticketing Payment Urban transit Traffic prediction Road features Info on transport around user’s position Real time data
(continued)
Table 1 (continued) Platforms Region Rome2rio [53]
Europe
Smile [59]
Austria
TransitApp [63]
Europe
Transport modes
Services
Customisation
PT Car Bike Walking Plane PT Car and e-car Bike and e-bike Walking
Car rental Bike sharing Hotels Booking Attractions Taxi Car sharing Parking Booking Bike sharing Bike rental Charging stations Taxi Uber Car sharing Bike sharing On-demand Taxi Car sharing Shuttle Bike sharing
Journey planning Price options Ticketing
PT
TripGo APP [58] Europe
PT Motorbike
UBIGO a [66]
Stockholm
PT Bike
UbiGo/Go: Smarta [65]
Gothenburg
PT Bike
Ustra [67]
Hanover
PT
Taxi Car sharing Car rental On-demand Booking Taxi Car sharing Car rental Bike sharing Taxi Car sharing Booking
Mobility profile Ticketing Payment Billing CO2 emissions Real time data Real time data Walking options
Real time data Trips planning Calendar synchronization Service alerts GPS locations Flexible monthly payment, for each house hold
24/7 manned support service
Real time data Routing Registration Ticketing (continued)
Table 1 (continued) Platforms Region
Transport modes
Services
Customisation
Taxi Car sharing Car rental Bike sharing Booking Taxi Car sharing Bike sharing Parking Booking
Real time data Payment: monthly packages or pay-as-you-go Ticketing Real time data Payment
Whim [35]
Helsinki, West Midlands
PT Car
WienMobil [69]
Vienna
PT Walking
a Pilot; b Trial
Table 1 presents the selected platforms in alphabetical order, with the transport modes available, the associated services and the type of customization:
• Transport modes: all the modes considered in the combination proposed by the APP/platform;
• Services: designated services provided as transport modes or extra services;
• Customization: particularities and personal services offered in the APP/platform.
The PT mentioned in Table 1 includes bus, subway, tram, train and ferry, i.e. whatever transport is available in the region and is considered a transport mode. It is important to note that some APP’s/platforms were not considered in this review because they are not available in English, e.g. Green Class CFF E-Car, Mobility-S, Switchh, tim and tropKM, among others. The cases selected and presented in Table 1 focus on MaaS APP’s/platforms in which PT is the base offer, integrated with other types of transport modes. In the particular case of business MaaS, more services are offered but, due to their specifications, full integration with PT is not guaranteed.

Some features of the APP’s/platforms presented in the table are described below, according to the respective cited sources:
• Beamrz: Provides access to all forms of road transport;
• Citymapper: Integrates data for all urban modes of transport; it started in London, the second city was NYC, and it now covers cities on every inhabited continent except Africa;
• GoEuro: A travel platform for any town or village in Europe;
• IMA: Is stated to contribute to more efficient and sustainable mobility behaviour of citizens and tourists, increasing their quality of life;
• Qixxit: The APP redirects ticket booking to participating suppliers’ websites;
• MaaS Pilot Ghent: A three-month pilot in which a hundred participants receive a mobility budget to spend on alternative mobility modes;
• Mobilleo: The first Mobility as a Service technology platform dedicated to business;
• Mobility 2.0 services: Uses social media to foster communication between customers, with a distinct focus on sustainable mobility options;
• MyWay: Offers the solution closest to each traveller’s vision, according to personal needs and preferences, contributing to the sustainability of urban transport [8];
• NaviGoGo: Pick&Mix: Scotland’s first MaaS web application;
• Optimile: Provides a cloud-based mobility platform that functions as a white-label platform and can therefore be tailored to any company’s needs and corporate identity;
• Rome2rio: This multimodal transport search engine was launched in April 2011;
• Smile: More than 1000 persons tested the pilot;
• TransitApp: It is functional in over 125 cities worldwide;
• Whim: The world’s first MaaS operator.

This empirical research provides an overview of the existing APP’s/platforms, their upgrades, and their main characteristics, functionalities and services. It outlines a future path towards the integration and cooperation of all transport modes in a single system.
4.7 Extra Services

Extra services are offered to complete the MaaS offer and also to create targeted options according to the region and the users. Thus, a MaaS service can include different types of service agreements [16]. Extra services can contribute to the development of the APP’s/platforms, in terms of both the services offered and the number of users. If the APP/platform gives users access to extra services that match their needs and profile, it will become considerably more useful. MaaS could increase an area’s attractiveness and enable its accessibility [1], hence its wide potential, e.g. for tourism promotion.
4.8 Cooperative Services

A MaaS service owned or commissioned by a corporation and focused on its workers’ commuting to and from a work site or campus can be identified as a Corporate MaaS (CMaaS) [23]. A CMaaS is a MaaS service that integrates several transport modes on a digital platform where users can register (assuming that all employees are already registered), and that can also provide a “one-stop shop” from a user-centred perspective [23]. By encouraging shared transportation on home-work-home trips, at least for part of the trip, a CMaaS contributes to changing travel behaviour: it reduces car usage and congestion and promotes the use of different vehicles, towards the adoption of more sustainable mobility practices. This argument follows the trend of migration from car ownership to usership [68].
5 Future MaaS: Some Considerations

Future mobility will gain increasing importance in the way the urban metabolism is designed and planned. The way cities are shaped by the concepts of smart, sustainable and resilient will completely alter living standards in urban, suburban and rural places. MaaS encourages transportation to become universally affordable and inclusive, environmentally friendly, personalized and flexible enough to deal with the complexities and dynamics inherent to mobility systems, with the aim that these personalized solutions contribute to the sustainability of the transport system in all its dimensions. Artificial Intelligence (AI) technology will be ubiquitous in any type of mobility solution. It will be designed to provide a personalized travel experience, easing the interaction between users and APP/technological support systems in an integrated and simpler manner. Natural language recognition of voice input will help users plan their own travel in real time, with all the peculiarities and vicissitudes that any daily journey may have, ensuring that the users’ preferred settings are properly considered when selecting the best mobility options at any time, without compromising personal privacy or other important data whose disclosure might cause embarrassment or inappropriate/non-authorized usage. In the cities of the future, the amount of information available will increase immensely, especially through technologies like sensor-ingested maps, which send real-time incident data and other road attributes to the vehicles themselves, moving towards breakthroughs in autonomy and connectivity, as well as providing key information about cost, time, alternative transport types and the amount of CO2 emitted, among others.
Therefore, it will be extremely important to select the mobility pattern according to the user’s personal wishes, physical condition, environmental friendliness, and health and age restrictions. Among the requirements of the cities of tomorrow, it will also be of extreme importance to work on the inter-connectivity between cities and rural places, as well as between countries. A higher level of MaaS might need to be considered to meet this necessity and urgency, so that foreign countries with different mobility ecosystems can be easily connected in an integrated way.
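To make the idea of selecting a mobility option from cost, time and CO2 information more concrete, the following is a minimal Python sketch of one possible approach: a weighted sum over min-max normalised criteria. This scoring scheme is an assumption chosen for illustration, not a method described in this chapter; the names `Option` and `rank_options`, the weights and all numeric values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Option:
    """One candidate way of making a trip (illustrative values only)."""
    name: str
    cost_eur: float
    time_min: float
    co2_g: float


CRITERIA = ("cost_eur", "time_min", "co2_g")


def rank_options(options: List[Option], weights: Dict[str, float]) -> List[Option]:
    """Order options from best to worst for the given user weights.

    Each criterion is min-max normalised to [0, 1] across the options,
    then combined as a weighted sum; a lower score is better.
    """
    def normalised(option: Option, criterion: str) -> float:
        values = [getattr(o, criterion) for o in options]
        low, high = min(values), max(values)
        if high == low:
            return 0.0  # criterion does not discriminate between options
        return (getattr(option, criterion) - low) / (high - low)

    def score(option: Option) -> float:
        return sum(weights.get(c, 0.0) * normalised(option, c) for c in CRITERIA)

    return sorted(options, key=score)


if __name__ == "__main__":
    trip = [
        Option("car", cost_eur=6.0, time_min=20, co2_g=3000),
        Option("bus", cost_eur=2.0, time_min=35, co2_g=900),
        Option("bike sharing", cost_eur=1.0, time_min=45, co2_g=0),
    ]
    eco_user = {"cost_eur": 0.2, "time_min": 0.2, "co2_g": 0.6}
    print([o.name for o in rank_options(trip, eco_user)])
```

Hard restrictions such as physical condition, health or age would more naturally be applied as a filter that removes unsuitable options before ranking; that step is left out of this sketch.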
6 Conclusion

It is almost unthinkable to discuss the future of cities and their sustainability without properly addressing mobility issues. There are multiple challenges to be overcome in the MaaS concept, and the technological ones must not be overlooked. It is clear that technology will have a leading role in the design, planning and operability of all the mobility systems of tomorrow. The way users interact with those systems will be critical to achieving the “right” combination of mobility solutions, centred on the PT offer and supported by the availability of services that bring the right balance to the system, among other dimensions such as social inclusion and the universality of the whole system. The sustainability concept, together with the digital transformation of society, is a tipping point towards the identification of new directions and possibilities that were impossible to consider in the past. The personalization of mobility, supported by the availability of several services, will profoundly change the way users/citizens live and interact in cities. The consideration of mobility APP’s/platforms is critical to the success of these measures. Notwithstanding, a cultural shift is required to increase the awareness and understanding of personal mobility choices, as well as of their impacts and implications for citizens’ quality of life, environment, costs, health and happiness, among others. The identification of case studies that can be used as benchmarks for defining the functionalities/characteristics that mobility APP’s/platforms should deploy is of capital importance. The examples studied point to a critical tendency that has been gaining visibility and worldwide importance for the future of mobility.
The MaaS concept/paradigm is leading the establishment of the mobility milestones, properly adapted to the requirements of smart cities’ mobility patterns, the sustainability of all systems, and the full integration and commitment of all the stakeholders involved. As future work, we foresee one important line of action: to pursue, within the European Union, the elaboration of concrete legislation that propels the full integration of transnational mobility services, driving the development of a multidisciplinary and fully integrated European MaaS system.

Acknowledgements This work was supported by FEDER (Fundo Europeu de Desenvolvimento Regional) through Programa Operacional Competitividade e Internacionalização, in the scope of the Project "ALTO MINHO. SMOB - Mobilidade Sustentável para o Alto Minho", Ref. POCI-010145-FEDER-024043.
References

1. A. Aapaoja, J. Eckhardt, L. Nykänen, J. Sochor, MaaS service combinations for different geographical areas, in Intelligent Transport System World Congress (Montreal, 2017)
2. G. Ambrosino, J.D. Nelson, M. Boero, I. Pettinelli, Enabling intermodal urban transport through complementary services: From flexible mobility services to the shared use mobility—Agency workshop 4. Developing inter-modal transport systems. Res. Transp. Econ. 59, 179–184 (2016)
3. L. Anthopoulos, Smart utopia VS smart reality: Learning by experience from 10 smart city cases. Cities 63, 128–148 (2017)
4. D. Banister, The sustainable mobility paradigm. Transp. Policy 15(2), 73–80 (2008)
5. L. Barreto, A. Amaral, S. Baltazar, Mobility as a Service (MaaS) in rural regions: An overview, in 2018 International Conference on Intelligent Systems (IS) (Madeira Island, 2018)
6. BEAMRZ for business, Let’s go beaming! (2017), https://www.beamrz.com/. Accessed 24 July 2018
7. S.E. Bibri, J. Krogstie, Smart sustainable cities of the future: An extensive interdisciplinary literature review. Sustain. Cities Soc. 31, 183–212 (2017)
8. M. Boero, M. Garré, J. Fernandez, S. Persi, D. Quesada, M. Jakob, MyWay personal mobility: From journey planners to mobility resource management. Transp. Res. Procedia 14, 1154–1163 (2016)
9. Catapult—Transport Systems, Cyber security and intelligent mobility. Tech. Rep. (2016)
10. S. Chowdhury, Y. Hadas, V.A. Gonzalez, B. Schot, Public transport users’ and policy makers’ perceptions of integrated public transport systems. Transp. Policy 61, 75–83 (2018)
11. Citymapper, Making cities usable (2018), https://citymapper.com/. Accessed 27 July 2018
12. CityWay, OPTYMOD (2018), https://itunes.apple.com/pt/app/optymodlyon/id1039416275?mt=8. Accessed 24 July 2018
13. CIVITAS Initiative, Mobility 2.0 services (2013), http://civitas.eu/content/mobility-20services. Accessed 24 July 2018
14. Comtrade, Comtrade—Digital services (2018), https://comtradedigital.com/mobile-appdevelopment/. Accessed 24 July 2018
15. I. Docherty, G. Marsden, J. Anable, The governance of smart mobility. Transp. Res. Part A Policy Pract. 115, 114–125 (2017)
16. J. Eckhardt, A. Aapaoja, L. Nykänen, J. Sochor, Mobility as a Service business and operator models, in 12th ITS European Congress (2017), p. 12
17. M. Finger, N. Bert, D. Kupfer, Mobility-as-a-Service: from the Helsinki experiment to a European model?, Robert Schuman Centre for Advanced Studies, Tech. Rep., Mar 2015
18. T. Fontes, J. Correia, J.P. de Sousa, J.F. De Sousa, T. Galvão, A multi-user integrated platform for supporting the design and management of urban mobility systems, in 20th EURO Working Group on Transportation Meeting, EWGT 2017 (Elsevier B.V., 2017). Transp. Res. Procedia 27, 35–42
19. GoEuro, Trains, Buses and Flights (2018), https://www.goeuro.co.uk/. Accessed 30 July 2018
20. Google, Google Maps (2018), https://www.google.com/maps. Accessed 27 July 2018
21. D.A. Hensher, Future bus transport contracts under a mobility as a service (MaaS) regime in the digital age: Are they likely to change? Transp. Res. Part A Policy Pract. 98, 86–96 (2017)
22. D.A. Hensher, Tackling road congestion: What might it look like in the future under a collaborative and connected mobility model? Transp. Policy 66, A1–A8 (2018)
23. M. Hesselgren, M. Sjöman, A. Pernestål, Understanding user practices in mobility service systems: Results from studying large scale corporate MaaS in practice. Travel. Behav. Soc. (2019)
24. IMA Project, Ima—Intermodal mobility assistance (2018), http://ima.dai-labor.de/. Accessed 30 July 2018
25. S. Ison, L. Sagaris, Workshop 4 report: Developing inter-modal transport systems. Res. Transp. Econ. 59, 175–178 (2016)
26. P. Jittrapirom, V. Caiati, A.M. Feneri, S. Ebrahimigharehbaghi, M.J.A. González, J. Narayan, Mobility as a Service: A critical review of definitions, assessments of schemes, and key challenges. Urban Plan. 2(2), 13–25 (2017)
27. P. Jittrapirom, V. Marchau, R. van der Heijden, H. Meurs, Dynamic adaptive policymaking for implementing Mobility-as-a Service (MaaS). Res. Transp. Bus. Manag. 27, 46–55 (2018)
28. I.C.M. Karlsson, J. Sochor, H. Strömberg, Developing the ’Service’ in mobility as a service: Experiences from a field trial of an innovative travel brokerage. Transp. Res. Procedia 14, 3265–3273 (2016)
292
L. Barreto et al.
29. C. Kennedy, E. Miller, A. Shalaby, H. Maclean, J. Coleman, The four pillars of sustainable urban transportation. Transp. Rev. 25(4), 393–414 (2005) 30. J. Kinigadner, F. Wenner, M. Bentlage, S. Klug, G. Wulfhorst, A. Thierstein, Future perspectives for the munich metropolitan region—An integrated mobility approach, in International Scientific Conference on Mobility and Transport Transforming Urban Mobility, mobil.TUM (Elsevier, 2016). Transp. Res. Procedia 19, 94–108 31. M. Kuemmerling, C. Heilmann, G. Meixner, Towards seamless mobility: Individual mobility profiles to ease the use of shared vehicles, in: 12th IFAC Symposium on Analysis, Design, and Evaluation of Human-Machine Systems (IFAC, 2013), pp. 450–454 32. R.K.R. Kummitha, N. Crutzen, How do we understand smart cities? An evolutionary perspective. Cities 67, 43–52 (2017) 33. Kyyti Group Ltd., Kyyti—Makes daily travel easy (2017), https://www.kyyti.com/english. html. Accessed 24 July 2018 34. M. Lindholm, S. Behrends, Challenges in urban freight transport planning-a review in the baltic sea region. J. Transp. Geogr. 22, 129–136 (2012) 35. MaaS Global Oy, Travel smarter (2018), https://whimapp.com/. Accessed 24 July 2018 36. N. Masuch, M. Lützenberger, J. Keiser, An open extensible platform for intermodal mobility assistance, in Proceedings of 4th International Conference on Ambient Systems, Networks and Technologies (Elsevier, 2013). Procedia Comput. Sci. 19, 396–403 37. R. McCall, V. Koenig, M. Kracheel, Using gamification and metaphor to design a mobility platform for commuters. Int. J. Mob. Hum. Comput. Interact. 5(1), 1–15 (2013) 38. H. Meurs, H. Timmermans, Mobility as a Service as a multi-sided market: Challenges for modeling, in Proceedings of Transportation Research Board 96th Annual Meeting, vol. 17 (2017), p. 00830 39. F.E. Misso, F.E. Pani, S. Porru, C. Repetto, U. Sansone, Opportunities and boundaries of transport network telematics. Interreg-Central. Eu (2017) 40. 
Mobilleo, Mobility as a Service for business (2017), https://www.mobilleo.com/. Accessed 30 July 2018 41. MOBiNET, MOBiNET—Internet of mobility (2017), http://mobinet.eu/. Accessed 26 July 2018 42. C. Mulley, J.D. Nelson, S. Wright, Community transport meets mobility as a service: On the road to a new a flexible future. Res. Transp. Econ. 1–9 (2018) 43. MyWay, My life my way (2018), http://app.myway-project.eu/. Accessed 30 July 2018 44. Navigogo, Plan, book and pay for your journeys (2017), https://navigogo.co.uk/. Accessed 24 July 2018 45. J.P. Nicolas, P. Pochet, H. Poimboeuf, Towards sustainable mobility indicators: application to the lyons conurbation. Transp. Policy 10(3), 197–208 (2003) 46. NordwestMobil, NWMobil (2016), https://androidappsapk.co/detail-nordwestmobil/. Accessed 24 July 2018 47. OPTIMILE, Conquer your mobility market with the MaaS Plus platform (2018), https://www. optimile.eu/product/mobility-as-a-service/. Accessed 27 July 2018 48. J.P. Ovaska, Emergence of mobility market platforms—Case: Mobility as a Service in Finland. Master’s thesis, Aalto University—School of Business, 2017 49. Pflügler C, Schreieck M, Hernandez G, Wiesche M, Krcmar H, A concept for the architecture of an open platform for modular mobility services in the smart city, in International Scientific Conference on Mobility and Transport Transforming Urban Mobility, mobil.TUM 2016 (Elsevier, 2016). Procedia TR 19, 199–206 50. M. Puhe, Integrated urban e-ticketing schemes—Conflicting objectives of corresponding stakeholders, in Mobil. TUM 2014 Sustainable Mobility in Metropolitan Regions (Elsevier B.V., 2014). Transp. Res. Procedia, 4, 494–504 51. Qixxit, Where to next? Book trains, buses and flights (2018), https://www.qixxit.com/en/. Accessed 27 July 2018 52. RACC Automvil Club, One app. All mobility options at your fingertips (2018), https://www. racc.es/microsites/application-racc-trips. Accessed 24 July 2018
Mobility in the Era of Digitalization: Thinking Mobility …
293
53. Rome2rio Pty Ltd., Discover how to get anywhere by plane, train, bus, ferry and car (2018), https://www.rome2rio.com/. Accessed 27 July 2018 54. S. Sarasini, O. Langeland, Providing alternatives to the private car: the dynamics of business model innovation, in 1st International Conference of Mobility as a Service (ICoMaaS) (Tampere, 2017), pp. 21–42 55. R. Schakenbos, L.L. Paix, S. Nijenstein, K.T. Geurs, Valuation of a transfer in a multimodal public transport trip. Transp. Policy 46, 72–81 (2016) 56. A.N.R. da Silva, M.A.N. de Azevedo Filho, M.H. Macêdo, J.A. Sorratini, A.F. da Silva, J.P. Lima, A.M.G.S. Pinheiro, A comparative evaluation of mobility conditions in selected cities of the five brazilian regions. Transp. Policy 37, 147–156 (2015) 57. B.N. Silva, M. Khan, K. Han, Towards sustainable smart cities: A review of trends, architectures, components, and open challenges in smart cities. Sustain. Cities Soc. 38(August), 697–713 (2018) 58. SKEDGO, Plan your trips with any combination of transport modes, in real-time (2018), https:// skedgo.com/home/tripgo/. Accessed 09 Aug 2018 59. Smile Mobility, Smile-simply mobile (2015), http://smile-einfachmobil.at/index_en.html. Accessed 12 Dec 2018 60. G. Smith, J. Sochor, I.C.A. Karlsson, Mobility as a Service: Development scenarios and implications for public transport. Res. Transp. Econ. (2018) 61. M. Stein, J. Meurer, A. Boden, V. Wulf, Mobility in later life—Appropriation of an integrated transportation platform, in Conference on Human Factors in Computing Systems—CHI’17, Denver, CO, USA, 6–11 May 2017, pp. 5716–5729 62. The Network for Sustainable Mobility, Mobility as a service (2018), http://www.idm.ugent. be/maas/. Accessed 24 July 2018 63. transit, Go Touy own way (2018), https://transitapp.com/. Accessed 24 July 2018 64. S. Trilles, A. Calia, Ó. Belmonte, J. Torres-Sospedra, R. Montoliu, J. Huerta, Deployment of an open sensorized platform in a smart city context. Futur. Gener. Comput. Syst. 
76, 221–233 (2017) 65. UbiGo, Attractive and sustainable urban mobilit (2013a), http://ubigo.se/. Accessed 25 July 2018 66. UbiGo, Attractive and sustainable urban mobility (2013b), http://ubigo.se/las-mer/aboutenglish/. Accessed 24 July 2018 67. Ustra Hannoversche Verkehrsbetriebe Aktiengesellschaft, Here is where the journey starts (2018), https://shop.uestra.de/index.php/. Accessed 24 July 2018 68. J.M.L. Varela, Y. Susilo, D. Jonsson, User attitudes towards a corporate Mobility as a Service. Transportation Submitted (2018) 69. Wiener Linien, The city at your fingertips with a single app (2018), https://www.wienerlinien.at/ eportal3/ep/channelView.do/pageTypeId/66533/channelId/-3600061. Accessed 24 July 2018 70. World Economic Forum—Committed to Improving the State of the World, Inspiring Future Cities & Urban Services—Shaping the Future of Urban Development and Services Initiative (Tech. Rep, April, PwC, 2016) 71. J. Zawieska, J. Pieriegud, Smart city as a tool for sustainable mobility and transport decarbonisation. Transp. Policy 63, 39–50 (2018)
Fuzzy Modelling Methodologies Based on OKID/ERA Algorithm Applied to Quadrotor Aerial Robots
Jorge Sampaio Silveira Júnior and Edson Bruno Marques Costa
Abstract In this paper, we propose a state-space Takagi-Sugeno (TS) fuzzy model to tackle the complexities of quadrotor aerial robots: highly nonlinear dynamics, coupled multivariable inputs and outputs, parametric uncertainty, natural instability and underactuation. To estimate the fuzzy model parameters automatically from a multivariable input and output dataset, two fuzzy modelling methodologies based on Observer/Kalman Filter Identification (OKID) and the Eigensystem Realization Algorithm (ERA) are proposed. In both methods, the nonlinear fuzzy sets of the antecedent space are obtained by a fuzzy clustering algorithm; in this paper we adopt the Fuzzy C-Means algorithm. The two methods differ in how the fuzzy Markov parameters are obtained: one is based on pulse-response histories and the other on an Observer/Kalman filter. From the fuzzy Markov parameters, the Fuzzy ERA algorithm estimates the discrete state-space functions of each submodel. Identification results for a quadrotor aerial robot, the Parrot AR.Drone 2.0, are presented, demonstrating the efficiency and applicability of these methodologies.
Keywords Fuzzy Modelling · Pulse response histories · Observer/Kalman Filter · Nonlinear systems · Quadrotor aerial robots
J. Sampaio Silveira Júnior (B) · E. B. Marques Costa
Instituto Federal de Educação Ciência e Tecnologia do Maranhão, Imperatriz, Maranhão, Brazil
e-mail: [email protected]
E. B. Marques Costa
e-mail: [email protected]
© Springer Nature Switzerland AG 2020. R. Jardim-Goncalves et al. (eds.), Intelligent Systems: Theory, Research and Innovation in Applications, Studies in Computational Intelligence 864, https://doi.org/10.1007/978-3-030-38704-4_13

1 Introduction
Aerial robots or unmanned aerial vehicles have brought new challenges, greater demands and new directions of research, due to their increasing applicability (demanding navigation in new complex environments), new legal and regulatory
questions (especially stricter security requirements), the demand for reduced energy consumption, and the emergence of new paradigms, such as smart sensing, Internet of Things, cloud computing, Industry 4.0, big data and cyber physical systems [1–3]. This scenario adds new complexities to aerial robotics and calls for new aerial robot modelling methodologies that address the following complexities: highly nonlinear dynamics, coupled multivariable inputs and outputs, parametric uncertainty, natural instability and underactuation, and proneness to failures and disturbances, among others [4]. Over time, several nonlinear representations have been presented to improve the modelling efficiency of this kind of system, such as Volterra series [5, 6], Hammerstein and Wiener models [7, 8], nonlinear autoregressive models (NARX) [9, 10], neural networks [11, 12] and fuzzy systems [13, 14]. The main advantage of TS fuzzy models over other nonlinear model structures is that a TS fuzzy model can incorporate classic linear modelling representations, such as state-space, transfer functions, ARX, etc. The rich classical theories associated with these representations, such as stability and convergence analysis and frequency response, among others, can therefore be extended to a nonlinear approach using the TS fuzzy structure [15]. In other words, a TS fuzzy model, given its overall structure (briefly: fuzzification, fuzzy rule base, fuzzy inference machine, and defuzzification of outputs), is globally nonlinear even though it combines several linear local submodels (of the consequent part) through fuzzy activation degrees. We therefore propose a TS fuzzy model structure whose rule-base consequents are discrete state-space functions to represent aerial robots, given the complexities of such systems.
The advantages of the TS fuzzy model for representing aerial robots are briefly presented below [16–18]:
• It is nonlinear: the TS fuzzy inference system is in principle a universal function approximator [19, 20], since the membership functions are in general nonlinear functions of the antecedent variables, and the consequent submodels are smoothly connected by these membership functions. Thus, a TS fuzzy model is a global nonlinear model consisting of a set of linear local models that are smoothly connected by fuzzy membership functions;
• It is based on approximate reasoning/uncertainty: uncertainties in the TS fuzzy system are associated with the antecedent functions of the rule base;
• It can have a multivariable structure: in the structure adopted in the present work, the multiple variables that define the operating regions of the aerial robot are fuzzified in the antecedent of the rule base, and the consequent defines state-space submodels whose inputs and outputs are the same as those of the real system;
• Integration with classical identification techniques: TS fuzzy models can be combined with representations by state-space structures, transfer functions, or some functional expression of any system variables. Therefore, classic identification techniques such as Least Squares, Recursive Least Squares and the Eigensystem Realization Algorithm, among others, can be adapted to the fuzzy context.
Given this structure, the next challenge is to obtain the model parameters autonomously from input and output data using modern algorithms. In the literature, some methods have been proposed that use the ERA algorithm in a fuzzy context to obtain the consequent parameters of the discrete state-space TS fuzzy model. In [21], an online evolving TS fuzzy state-space model identification approach for multivariable dynamic systems is proposed. This methodology is based on the computation of Markov parameters from experimental data, and the resulting model is capable of adapting its structure according to the data. In [22], a system identification methodology based on an Evolving Fuzzy Kalman Filter, applied to state and parameter estimation in an oscillator system, is proposed. Finally, in [23], the goal is to develop a method based on fuzzy logic that dynamically sets the control parameters of an unmanned aerial vehicle depending on the characteristics of the path to be followed. The main disadvantage of the methodologies in [21–23] is the lack of an inherent stability guarantee. These methodologies are recursive or online formulations, which makes it difficult to analyze and guarantee the stability of the identified model (an a priori stability condition would have to be established), mainly for control applications. In this context, we propose a batch or offline formulation for aerial robot applications, motivated by safety requirements (aerial robots are naturally unstable systems) and by the need to provide a stability guarantee. Stability is checked after the parameter estimation procedure. To estimate the fuzzy model parameters automatically from a multivariable input and output dataset, we propose two fuzzy modelling methodologies based on the Eigensystem Realization Algorithm (ERA). In both methods, the nonlinear fuzzy sets of the antecedent space are obtained by a fuzzy clustering algorithm; in this paper we adopt the Fuzzy C-Means algorithm.
The two methods differ in how the fuzzy Markov parameters are obtained: one is based on pulse-response histories and the other on an Observer/Kalman filter. From the fuzzy Markov parameters, the Fuzzy ERA algorithm estimates the discrete state-space functions of each submodel. In summary, the main contributions of this paper are:
• A multivariable minimum-order fuzzy model, that is, the model with the smallest order among all models that could be obtained from the same data set, which allows transparency and interpretability of the identified TS fuzzy state-space model;
• The identified TS fuzzy model can be seen as the decomposition of a nonlinear system into a collection of local state-space submodels, varying according to the input space defined by the linguistic variables of the antecedent over a convex polytopic region in the output space defined by the state-space model. This is useful for robust modelling and optimal control design of uncertain and nonlinear systems with time-varying parameters;
• Two new methods for the identification of state-space fuzzy models based on the calculation of Markov parameters from experimental data.
2 Takagi-Sugeno Fuzzy Batch Modelling
The structure of the Takagi-Sugeno (TS) fuzzy model is described by a set of fuzzy rules, as shown below [24]:
$$
R^{i} = \text{IF } z_k^1 \text{ is } F_1^{1,2,\ldots,c} \text{ AND } z_k^2 \text{ is } F_2^{1,2,\ldots,c} \text{ AND } \cdots \text{ AND } z_k^p \text{ is } F_p^{1,2,\ldots,c}\ \text{ THEN }
\begin{cases}
x_{k+1}^i = A^i x_k^i + B^i u_k \\
y_k^i = C^i x_k^i + D^i u_k
\end{cases}
\tag{1}
$$
where $R^i$ denotes the $i$-th fuzzy rule ($i = 1, 2, \ldots, L$), $z_k = [z_k^1, z_k^2, \ldots, z_k^p]$ are the antecedent variables at the $k$-th time instant, and $F_j^c$ is the $c$-th fuzzy set of the $j$-th antecedent parameter ($j = 1, 2, \ldots, p$). In the consequent part, the state matrix $A^i \in \mathbb{R}^{n \times n}$, the input matrix $B^i \in \mathbb{R}^{n \times r}$, the output matrix $C^i \in \mathbb{R}^{m \times n}$ and the direct transition matrix $D^i \in \mathbb{R}^{m \times r}$ are the parameters of the $i$-th submodel of order $n$, with $r$ inputs and $m$ outputs; $x_k^i \in \mathbb{R}^n$ is the state vector of the $i$-th submodel, $y_k^i \in \mathbb{R}^m$ is the output vector of the $i$-th submodel and $u_k \in \mathbb{R}^r$ is the input vector of the system. Let $\mu^i_{F_j^c}(z_k^j) : U_{z^j} \to [0, 1]$ ($j = 1, 2, \ldots, p$) be the activation degree associated with the $k$-th sample of the linguistic variable $z_k^j$, in a universe of discourse $U_{z^j}$ partitioned by the fuzzy sets $F_j^c$, which represent its linguistic terms. The activation degree of the $i$-th fuzzy rule is then given by:
$$
h_k^i = \mu^i_{F_1^c}(z_k^1) \star \mu^i_{F_2^c}(z_k^2) \star \cdots \star \mu^i_{F_p^c}(z_k^p)
\tag{2}
$$
where the operator $\star$ denotes the T-norm between the related membership functions. The normalized activation degree of the $i$-th rule is given by:
$$
\gamma^i(z_k) = \frac{h_k^i}{\sum_{i=1}^{L} h_k^i}
\tag{3}
$$
The output of the TS fuzzy model is given by:
$$
\begin{cases}
\tilde{x}_{k+1} = \sum_{i=1}^{L} \gamma^i(z_k)\, x_{k+1}^i \\
\tilde{y}_k = \sum_{i=1}^{L} \gamma^i(z_k)\, y_k^i
\end{cases}
\tag{4}
$$
Thus, substituting Eq. (1) into Eq. (4) gives:
$$
\begin{cases}
\tilde{x}_{k+1} = \sum_{i=1}^{L} A^i \gamma^i(z_k)\, x_k^i + \sum_{i=1}^{L} B^i \gamma^i(z_k)\, u_k \\
\tilde{y}_k = \sum_{i=1}^{L} C^i \gamma^i(z_k)\, x_k^i + \sum_{i=1}^{L} D^i \gamma^i(z_k)\, u_k
\end{cases}
\tag{5}
$$
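A minimal numerical sketch of Eqs. (2) to (5), assuming the product is used as the T-norm (the text does not fix a particular T-norm) and that every submodel keeps its own state vector. The function names and array shapes are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def normalized_activation(memberships):
    """memberships: array (L, p), row i holding the p membership degrees of rule i.
    Returns the normalized activation degrees gamma, shape (L,)."""
    h = np.prod(memberships, axis=1)   # Eq. (2), product T-norm (assumption)
    return h / np.sum(h)               # Eq. (3)

def ts_step(gamma, A, B, C, D, x, u):
    """One step of Eq. (5).
    A: (L, n, n), B: (L, n, r), C: (L, m, n), D: (L, m, r),
    x: (L, n) local states, u: (r,) input."""
    x_local_next = np.einsum("ijk,ik->ij", A, x) + B @ u   # Eq. (1), all rules
    y_local = np.einsum("ijk,ik->ij", C, x) + D @ u
    x_next = gamma @ x_local_next      # blended state, Eq. (4)
    y = gamma @ y_local                # blended output, Eq. (4)
    return x_next, y, x_local_next
```

The local next states are returned alongside the blended ones so that a simulation loop can propagate each submodel separately, which is one possible reading of the per-rule states in Eq. (1).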
2.1 Fuzzy C-Means Clustering Algorithm
In the antecedent of the fuzzy rules, clustering algorithms can be used to classify a data set and organize it according to similar characteristics. Consider a data set $Z = \{z_k \mid k = 1, 2, \ldots, N\}$, with $N$ observations of the plant and $p$ measured variables, where each observation is a $p$-dimensional column vector $z_k = [z_k^1, z_k^2, \ldots, z_k^p]^T$, i.e. [25]:
$$
Z = \begin{bmatrix}
z_1^1 & z_2^1 & \cdots & z_N^1 \\
z_1^2 & z_2^2 & \cdots & z_N^2 \\
\vdots & \vdots & \ddots & \vdots \\
z_1^p & z_2^p & \cdots & z_N^p
\end{bmatrix}
\tag{6}
$$
The grouped data set can be processed recursively or in batch mode. The first is characterized by processing the data online at any time, which makes it more suitable for real-time applications. The second is characterized by processing the entire data set offline. In this work, the Fuzzy C-Means (FCM) algorithm is applied in batch mode [26]. The main objective of the FCM is to find a membership matrix
$$
U = [\mu^1; \mu^2; \cdots; \mu^c],
\tag{7}
$$
with $U \in \mathbb{R}^{c \times N}$, where $c$ is the number of clusters and $N$ is the number of data points, and a centers matrix
$$
V = [v^1; v^2; \cdots; v^c],
\tag{8}
$$
with $v^i \in \mathbb{R}^p$ and $p$ the dimensionality of a data point $z_k$, such that [27]:
$$
J_\eta(Z, U, V) = \sum_{k=1}^{N} \sum_{i=1}^{c} \left(\mu^i(z_k)\right)^{\eta} \left\| z_k - v^i \right\|^2
\tag{9}
$$
where $J_\eta$ is the objective function to be minimized, $\mu^i(z_k)$ is the membership degree of the $k$-th data point in the $i$-th cluster, $\eta \in (1, \infty)$ is a weighting constant that controls the degree of fuzzy overlap and $\| z_k - v^i \|$ is the Euclidean distance between $z_k$ and $v^i$. Assuming that $\| z_k - v^i \| \neq 0$, $\forall\, 1 \leq k \leq N$ and $\forall\, 1 \leq i \leq c$, then $(U, V)$ is a local minimum for $J_\eta$ only if:
$$
\mu^i(z_k) = \left[ \sum_{j=1}^{c} \left( \frac{\| z_k - v^i \|}{\| z_k - v^j \|} \right)^{\frac{2}{\eta - 1}} \right]^{-1}
\tag{10}
$$
where
$$
v^i = \frac{\sum_{k=1}^{N} \left(\mu^i(z_k)\right)^{\eta} z_k}{\sum_{k=1}^{N} \left(\mu^i(z_k)\right)^{\eta}}
\tag{11}
$$
The FCM algorithm iterates in order to reduce the objective function defined in Eq. (9) as far as possible, until either Eq. (12) or Eq. (13) is satisfied:
$$
\left\| U^{(l+1)} - U^{(l)} \right\| < \xi
\tag{12}
$$
$$
\left| J_\eta^{(l+1)} - J_\eta^{(l)} \right| < \xi
\tag{13}
$$
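The batch FCM iteration described by Eqs. (9) to (12) can be sketched as follows. This is an illustrative implementation under stated assumptions: the random initialization, the `max_iter` cap, the small distance floor that avoids division by zero, and the function name are not taken from the text, and only the stopping rule of Eq. (12) is used.

```python
import numpy as np

def fuzzy_c_means(Z, c, eta=2.0, xi=1e-5, max_iter=300, seed=0):
    """Z: data matrix of shape (p, N) as in Eq. (6).
    Returns the membership matrix U (c, N) and the centers V (c, p)."""
    rng = np.random.default_rng(seed)
    N = Z.shape[1]
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)          # each column sums to one
    for _ in range(max_iter):
        W = U ** eta
        V = (W @ Z.T) / W.sum(axis=1, keepdims=True)            # Eq. (11)
        # dist[i, k] = Euclidean distance between z_k and center v^i
        dist = np.linalg.norm(Z.T[None, :, :] - V[:, None, :], axis=2)
        dist = np.fmax(dist, 1e-12)            # guard against zero distance
        inv = dist ** (-2.0 / (eta - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)            # Eq. (10)
        converged = np.linalg.norm(U_new - U) < xi              # Eq. (12)
        U = U_new
        if converged:
            break
    return U, V
```

A call such as `fuzzy_c_means(Z, c=2, eta=1.25, xi=1e-5)` mirrors the parameter values reported later for the identification stage; the returned centers correspond to the memberships of the second-to-last update, which differ from the final ones by less than `xi` once the loop has converged.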
s as [22]:
$$
\bar{Y}_k = \bar{\vartheta}_k \varphi_k
\tag{40}
$$
which, through batch least squares, becomes
$$
\bar{\vartheta} = \bar{Y} \Gamma^i \varphi^T \left[ \varphi \Gamma^i \varphi^T \right]^{-1}
\tag{41}
$$
where $\bar{Y} = [y_{s+1}, y_{s+2}, \ldots, y_k]$ is the output matrix, $\bar{\vartheta}_k = [\vartheta_k^1, \vartheta_k^2, \ldots, \vartheta_k^L]$ is the vector with the fuzzy Markov parameters of the observer from all linear local models, $\varphi = [\varphi_{s+1}, \varphi_{s+2}, \ldots, \varphi_k]$ is the fuzzy regressors matrix and $\Gamma^i$ is the fuzzy diagonal weighting matrix of the $i$-th rule, defined in Eq. (24). The fuzzy Markov parameters of the system for each linear local model are obtained by solving the following equations:
$$
M_k^i = \bar{M}_k^{i\,(1)} + \bar{M}_k^{i\,(2)} M_0^i + \sum_{j=0}^{k-1} \bar{M}_j^{i\,(2)} M_{k-j-1}^i, \quad \text{for } k = 1, 2, \ldots, s
\tag{42}
$$
$$
M_k^i = -\sum_{j=0}^{k-1} \bar{M}_j^{i\,(2)} M_{k-j-1}^i, \quad \text{for } k > s
\tag{43}
$$
where $\bar{M}_k^i = [\bar{M}_k^{i\,(1)}, -\bar{M}_k^{i\,(2)}]$. Thus, $\bar{M}_k^{i\,(1)} \in \mathbb{R}^{m \times r}$ and $\bar{M}_k^{i\,(2)} \in \mathbb{R}^{m \times m}$ are partitions of the matrix $\bar{M}_k^i$ used to find the fuzzy Markov parameters of the system from the fuzzy Markov parameters of the observer. Therefore, the fuzzy Markov parameters of the system can be defined as follows:
$$
M_0^i = D^i
\tag{44}
$$
$$
M_j^i = C^i (A^i)^{j-1} B^i, \quad j = 1, 2, \ldots, s
\tag{45}
$$
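The batch least-squares step of Eq. (41) can be illustrated with a short sketch. It assumes that the weighting matrix of a rule is diagonal with that rule's normalized activation degrees on its diagonal; Eq. (24) is not reproduced in this section, so this choice, the function name and the argument layout are assumptions made only for illustration.

```python
import numpy as np

def weighted_batch_ls(Y, Phi, gamma):
    """Weighted batch least squares for one rule.
    Y: (m, N) outputs, Phi: (q, N) regressors,
    gamma: (N,) weights of the rule (assumed diagonal of the weighting matrix).
    Returns theta of shape (m, q) with
    theta = Y G Phi^T (Phi G Phi^T)^{-1}, G = diag(gamma)."""
    PhiG = Phi * gamma                 # scales column k by gamma[k]: Phi @ diag(gamma)
    # Solve (Phi G Phi^T) theta^T = Phi G Y^T instead of forming an inverse.
    return np.linalg.solve(PhiG @ Phi.T, PhiG @ Y.T).T
```

Solving the linear system rather than inverting the Gram matrix explicitly is a numerical design choice of this sketch; it returns the same estimate whenever the weighted Gram matrix is well conditioned.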
2.4 Fuzzy Eigensystem Realization Algorithm
The Fuzzy Eigensystem Realization Algorithm (Fuzzy ERA) can be applied whether the fuzzy Markov parameters of the system are estimated through pulse-response histories or through a state observer, since both methods share the same purpose: obtaining the matrices $A^i$, $B^i$ and $C^i$ of the system under analysis. To find these matrices for each linear local model, i.e. each rule, the generalized Hankel matrix $H_0^i \in \mathbb{R}^{\rho m \times \sigma r}$ is built from the fuzzy Markov parameters of the system as follows:
$$
H_0^i = \begin{bmatrix}
M_1^i & M_2^i & \cdots & M_\sigma^i \\
M_2^i & M_3^i & \cdots & M_{\sigma+1}^i \\
\vdots & \vdots & \ddots & \vdots \\
M_\rho^i & M_{\rho+1}^i & \cdots & M_{\rho+\sigma-1}^i
\end{bmatrix}
\tag{46}
$$
where $\sigma$ and $\rho$ set the dimensions of the matrix and thus indirectly delimit the number of fuzzy Markov parameters of the system that are used. Substituting Eq. (45) into Eq. (46) gives:
$$
H_0^i = \begin{bmatrix}
C^i B^i & C^i A^i B^i & \cdots & C^i (A^i)^{\sigma-1} B^i \\
C^i A^i B^i & C^i (A^i)^2 B^i & \cdots & C^i (A^i)^{\sigma} B^i \\
\vdots & \vdots & \ddots & \vdots \\
C^i (A^i)^{\rho-1} B^i & C^i (A^i)^{\rho} B^i & \cdots & C^i (A^i)^{\rho+\sigma-2} B^i
\end{bmatrix}
\tag{47}
$$
The generalized Hankel matrix can be rewritten in terms of the observability matrix $P_\rho^i$ and the controllability matrix $Q_\sigma^i$, as follows:
$$
H_0^i = \begin{bmatrix}
C^i \\
C^i A^i \\
\vdots \\
C^i (A^i)^{\rho-1}
\end{bmatrix}
\begin{bmatrix} B^i, & A^i B^i, & \ldots, & (A^i)^{\sigma-1} B^i \end{bmatrix}
\tag{48}
$$
$$
H_0^i = P_\rho^i Q_\sigma^i
\tag{49}
$$
The number $n$ of nonzero singular values of $H_0^i$ is equal to the ranks of $P_\rho^i$ and $Q_\sigma^i$, and it is the maximum order of an observable and controllable system corresponding to the fuzzy Markov parameters in Eq. (46). The singular value decomposition of $H_0^i$ gives:
$$
H_0^i = U^i \Sigma^i (V^i)^T
\tag{50}
$$
where the columns of the matrices $U^i$ and $V^i$ are orthonormal and $\Sigma^i$ is a rectangular matrix
$$
\Sigma^i = \begin{bmatrix} \Sigma_n^i & 0 \\ 0 & 0 \end{bmatrix}
\tag{51}
$$
with
$$
\Sigma_n^i = \begin{bmatrix}
\sigma_1^i & 0 & \cdots & 0 \\
0 & \sigma_2^i & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_n^i
\end{bmatrix}
\tag{52}
$$
In Eq. (52), $\sigma_1^i > \sigma_2^i > \cdots > \sigma_n^i > 0$ are the $n$ most significant singular values of $H_0^i$; the singular values beyond order $n$ are considered less significant. Defining $U_n^i$ and $V_n^i$ as the first $n$ columns of $U^i$ and $V^i$, respectively, the matrix $H_0^i$ becomes:
$$
H_0^i = U_n^i \Sigma_n^i (V_n^i)^T
\tag{53}
$$
Examining Eqs. (48), (49) and (53) gives:
$$
H_0^i = \left[ U_n^i (\Sigma_n^i)^{1/2} \right] \left[ (\Sigma_n^i)^{1/2} (V_n^i)^T \right] \approx P_\rho^i Q_\sigma^i
\tag{54}
$$
The approximation in Eq. (54) is useful in cases where there is noise and there are very small singular values. To compute the matrix $A^i$, $H_0^i$ is shifted by one block, as below:
$$
H_1^i = \begin{bmatrix}
C^i A^i B^i & \cdots & C^i (A^i)^{\sigma} B^i \\
C^i (A^i)^2 B^i & \cdots & C^i (A^i)^{\sigma+1} B^i \\
\vdots & \ddots & \vdots \\
C^i (A^i)^{\rho} B^i & \cdots & C^i (A^i)^{\rho+\sigma-1} B^i
\end{bmatrix}
\tag{55}
$$
$$
H_1^i = P_\rho^i A^i Q_\sigma^i = U_n^i (\Sigma_n^i)^{1/2} A^i (\Sigma_n^i)^{1/2} (V_n^i)^T
\tag{56}
$$
Solving Eq. (56) for $A^i$ gives:
$$
A^i = (\Sigma_n^i)^{-1/2} (U_n^i)^T H_1^i V_n^i (\Sigma_n^i)^{-1/2}
\tag{57}
$$
Fig. 1 The quadrotor aerial robot Parrot AR.Drone 2.0: a aerial view; b the robot as a black-box system
The matrices $B^i$ and $C^i$ are obtained through Eqs. (48) and (54):
$$
B^i = \text{first } r \text{ columns of } Q_\sigma^i = (\Sigma_n^i)^{1/2} (V_n^i)^T
\tag{58}
$$
$$
C^i = \text{first } m \text{ rows of } P_\rho^i = U_n^i (\Sigma_n^i)^{1/2}
\tag{59}
$$
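The ERA steps of Eqs. (46) to (59) for a single rule can be sketched as below. This is a hedged illustration rather than the authors' implementation: the function names, the argument layout and the helper that generates Markov parameters from known matrices (Eqs. (44) and (45)) are assumptions introduced for the example.

```python
import numpy as np

def markov_parameters(A, B, C, D, count):
    """Return [M_0, ..., M_count] with M_0 = D and M_j = C A^(j-1) B."""
    M, AjB = [D], B
    for _ in range(count):
        M.append(C @ AjB)
        AjB = A @ AjB
    return M

def era(M, rho, sigma, n):
    """M: list of system Markov parameters with len(M) >= rho + sigma + 1.
    Returns (A, B, C, D, singular_values) of an order-n realization."""
    m, r = M[0].shape

    def hankel(shift):
        # Block (i, j) holds M_{i+j+1+shift}: shift=0 gives H_0, shift=1 gives H_1.
        return np.block([[M[i + j + 1 + shift] for j in range(sigma)]
                         for i in range(rho)])

    H0, H1 = hankel(0), hankel(1)
    U, sv, Vt = np.linalg.svd(H0, full_matrices=False)          # Eq. (50)
    Un, Vnt = U[:, :n], Vt[:n, :]
    s_half = np.sqrt(sv[:n])
    P = Un * s_half                    # U_n Sigma_n^{1/2}, Eq. (54)
    Q = s_half[:, None] * Vnt          # Sigma_n^{1/2} V_n^T, Eq. (54)
    A = (Un / s_half).T @ H1 @ (Vnt.T / s_half)                 # Eq. (57)
    B = Q[:, :r]                       # first r columns, Eq. (58)
    C = P[:m, :]                       # first m rows, Eq. (59)
    return A, B, C, M[0], sv
```

On noise-free Markov parameters generated from a controllable and observable system, the realization returned by `era` reproduces the same Markov parameters, although its state coordinates generally differ from those of the generating system.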
3 Implementation of the Proposed Methodologies to Parrot AR.Drone
In order to evaluate the performance, efficiency and applicability of the proposed methodologies, the identification and validation stages for the quadrotor aerial robot Parrot AR.Drone 2.0, shown in Fig. 1, are presented. The robot is treated as a black-box system: the laws and physical parameters of this type of system are unknown, so the model has to be obtained from experimental input and output data. In this figure, $\phi_r$ and $\theta_r$ represent the angular rates of roll and pitch,$^1$ in rad · s$^{-1}$, $v_s$ is the vertical reference speed, in m · s$^{-1}$, and $v_x$, $v_y$ and $v_z$ are linear velocities, in m · s$^{-1}$, associated with lateral movement (right and left), longitudinal movement (forward and backward) and vertical movement (up and down), respectively, with real values obtained from the robot states.
$^1$ In this context, roll and pitch are Euler angles (together with the yaw angle) that represent the rotational movements of an object around the axes of the Cartesian coordinate system.
For the estimation of the proposed models, a toolbox called AR Drone Simulink Development-Kit V1.1, developed by David Sanabria, was used [30]. Based on this toolbox, two experiments were performed to collect the input and output data: one for the identification stage, with N = 3000 sample data, and another for the validation stage, with N = 1500 sample data [31].
3.1 Identification Stage Results
In order to implement the two proposed methodologies, the following parameters of the Fuzzy C-Means clustering algorithm were considered: number of clusters $c = 2$, weighting constant $\eta = 1.25$ and specified minimum tolerance $\xi = 10^{-5}$. The fuzzy rule base used to model the quadrotor aerial robot, which is a multivariable nonlinear system with fast dynamics, is defined as follows:
$$
R^{i} = \text{IF } z_k^1 \text{ is } F_1^{1,2} \text{ AND } z_k^2 \text{ is } F_2^{1,2} \text{ AND } z_k^3 \text{ is } F_3^{1,2}\ \text{ THEN }
\begin{cases}
x_{k+1}^i = A^i x_k^i + B^i u_k \\
y_k^i = C^i x_k^i + D^i u_k
\end{cases}
\tag{60}
$$
where $i = 1, 2, \ldots, 8$ rules, $z_k^1 = [u_{1,k}]^T$, $z_k^2 = [u_{2,k}]^T$ and $z_k^3 = [u_{3,k}]^T$. The fuzzy sets of the first cluster are "Z" type membership functions given by
$$
F_p^1 \big|_{p=1,2,3} =
\begin{cases}
1, & \text{if } u_{p,k} \leq a \\
1 - 2\left( \dfrac{u_{p,k} - a}{b - a} \right)^2, & \text{if } a \leq u_{p,k} \leq \dfrac{a+b}{2} \\
2\left( \dfrac{u_{p,k} - b}{b - a} \right)^2, & \text{if } \dfrac{a+b}{2} \leq u_{p,k} \leq b \\
0, & \text{if } u_{p,k} \geq b
\end{cases}
\tag{61}
$$
and the fuzzy sets of the second cluster are "S" type membership functions given by
$$
F_p^2 \big|_{p=1,2,3} =
\begin{cases}
0, & \text{if } u_{p,k} \leq a \\
2\left( \dfrac{u_{p,k} - a}{b - a} \right)^2, & \text{if } a \leq u_{p,k} \leq \dfrac{a+b}{2} \\
1 - 2\left( \dfrac{u_{p,k} - b}{b - a} \right)^2, & \text{if } \dfrac{a+b}{2} \leq u_{p,k} \leq b \\
1, & \text{if } u_{p,k} \geq b
\end{cases}
\tag{62}
$$
where $a$ and $b$ are constants representing the extremes of the sloped portion of the curves.
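The "Z" and "S" type membership functions can be sketched in their standard form as follows. The function names are illustrative, and the values a = -1 and b = 1 used in the usage note are assumptions chosen only to match the plotted input range, not values confirmed by the text.

```python
def z_mf(u, a, b):
    """Z-shaped membership function: 1 below a, 0 above b."""
    mid = (a + b) / 2.0
    if u <= a:
        return 1.0
    if u <= mid:
        return 1.0 - 2.0 * ((u - a) / (b - a)) ** 2
    if u <= b:
        return 2.0 * ((u - b) / (b - a)) ** 2
    return 0.0

def s_mf(u, a, b):
    """S-shaped membership function: 0 below a, 1 above b."""
    mid = (a + b) / 2.0
    if u <= a:
        return 0.0
    if u <= mid:
        return 2.0 * ((u - a) / (b - a)) ** 2
    if u <= b:
        return 1.0 - 2.0 * ((u - b) / (b - a)) ** 2
    return 1.0
```

With the same `a` and `b`, the two functions sum to one for every input, for example `z_mf(0.5, -1.0, 1.0) + s_mf(0.5, -1.0, 1.0)`, which is the complementarity property mentioned for the fuzzy sets of each linguistic variable.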
The fuzzy sets of the rule base of the Takagi-Sugeno fuzzy model were estimated automatically by the FCM algorithm, resulting in two fuzzy sets associated with each of the three linguistic variables, for a total of $c^3 = 8$ rules. Figure 2 shows the fuzzy sets estimated via FCM and the sets optimized through successive fine tunings (with final values of a = 1 and b = 1) in order to reduce the error between the fuzzy model and the real aerial robot. The optimized fuzzy sets have a higher degree of fuzzification than the identified ones, and both are complementary. After obtaining the fuzzy sets (membership functions) associated with each linguistic variable, the normalized activation degree is calculated from Eqs. (2) and (3). Successive tests were then performed by varying the main parameters of the OKID/ERA algorithm, that is, the variables s, ρ and σ, in order to find the best values by trial and error. Inspection of the resulting fuzzy Markov parameters of the system indicates that the number of parameters with the most significant singular values should be around ρ + σ, as shown in Eqs. (46) and (55). Through this manual tuning, the following values were adopted for the fuzzy identification with the two proposed methods: s = 1, ρ = 25 and σ = 25. The first ten singular values of the Hankel matrix for all fuzzy rules (submodels), shown in Fig. 3, reveal only three significant singular values. Therefore, the chosen order of the system is n = 3.
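The order selection described above is done by visual inspection of the singular values. As a hedged illustration only, the same decision could be automated by counting the singular values above a fraction of the largest one; the relative threshold `rel_tol`, the function name and the sample values in the usage note are assumptions, not quantities from the text.

```python
def select_order(singular_values, rel_tol=0.05):
    """Count the singular values larger than rel_tol times the largest one."""
    s_max = max(singular_values)
    return sum(1 for s in singular_values if s > rel_tol * s_max)
```

For a hypothetical spectrum such as `[1.9, 1.4, 1.2, 0.02, 0.01, 0.005]`, the call `select_order(...)` counts three dominant values, which corresponds to the kind of gap that motivates choosing n = 3.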
[Figure 2: membership degrees U1, U2 and U3 plotted against the inputs u1, u2 and u3 over the range −1 to 1, in panels (a) and (b)]
Fig. 2 Membership functions: a obtained through FCM algorithm; b optimized through fine tunings
[Figure 3: singular values diag(Σ) of the Hankel matrix plotted for Rule 1 to Rule 8, in panels (a) and (b)]
Fig. 3 Singular values of the Hankel matrix for each submodel of the quadrotor aerial robot: a method 1; b method 2
Computing the matrices $A^i$, $B^i$, $C^i$ and $D^i$ with the two identification methods presented showed that, for each $i$-th rule ($i = 1, 2, \ldots, 8$), the matrices obtained by both methods have elements with the same values after rounding. This means that, for some systems, the two methods can be equivalent, regardless of whether these systems have decoupled or coupled data. Thus, the fuzzy rule base of Eq. (60), updated with the computed state-space matrices, is detailed in Eqs. (63)–(70).
$$
R^{1} = \text{IF } z_k^1 \text{ is } F_1^1 \text{ AND } z_k^2 \text{ is } F_2^1 \text{ AND } z_k^3 \text{ is } F_3^1\ \text{ THEN }
\begin{cases}
x_{k+1}^1 = \begin{bmatrix} 0.9661 & -0.0009 & -0.0177 \\ 0.0006 & 0.9752 & -0.0106 \\ 0.0037 & 0.0001 & 0.9076 \end{bmatrix} x_k^1 + \begin{bmatrix} -0.0029 & -0.3814 & -0.0114 \\ 0.2784 & -0.0018 & -0.013 \\ 0.0027 & 0.0015 & -0.3128 \end{bmatrix} u_k \\[6pt]
y_k^1 = \begin{bmatrix} 0.3813 & 0.0051 & 0.0694 \\ -0.0046 & 0.2786 & 0.0318 \\ 0.0082 & -0.0002 & -0.3043 \end{bmatrix} x_k^1 + \begin{bmatrix} 0.0709 & 0.2036 & 0.0389 \\ -0.0195 & -0.0134 & 0.0022 \\ 0.005 & 0.008 & -0.0075 \end{bmatrix} u_k
\end{cases}
\tag{63}
$$
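$$
R^{2}:\ \text{IF } z_{k1} \text{ is } F_{11} \text{ AND } z_{k2} \text{ is } F_{21} \text{ AND } z_{k3} \text{ is } F_{32} \text{ THEN}
\left\{
\begin{aligned}
x^{2}_{k+1} &= \begin{bmatrix} 0.9666 & 0.0014 & -0.009 \\ -0.0004 & 0.9741 & 0.0195 \\ 0.0009 & -0.0017 & 0.9029 \end{bmatrix} x^{2}_{k} + \begin{bmatrix} 0.0075 & 0.3799 & -0.0065 \\ -0.2823 & 0.0044 & 0.0273 \\ -0.0049 & 0.0021 & -0.3116 \end{bmatrix} u_{k} \\
y^{2}_{k} &= \begin{bmatrix} -0.38 & 0.0009 & -0.036 \\ 0.0038 & -0.2834 & 0.0603 \\ 0.0039 & -0.0012 & -0.3044 \end{bmatrix} x^{2}_{k} + \begin{bmatrix} 0.0409 & 0.1706 & -0.0353 \\ -0.0255 & -0.0084 & 0.0141 \\ 0.0027 & 0.0026 & -0.0066 \end{bmatrix} u_{k}
\end{aligned}
\right.
\tag{64}
$$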
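$$
R^{3}:\ \text{IF } z_{k1} \text{ is } F_{11} \text{ AND } z_{k2} \text{ is } F_{22} \text{ AND } z_{k3} \text{ is } F_{31} \text{ THEN}
\left\{
\begin{aligned}
x^{3}_{k+1} &= \begin{bmatrix} 0.9666 & 0.0014 & -0.009 \\ -0.0004 & 0.9741 & 0.0195 \\ 0.0009 & -0.0017 & 0.9029 \end{bmatrix} x^{3}_{k} + \begin{bmatrix} 0.0074 & 0.3799 & -0.0064 \\ -0.2823 & 0.0044 & 0.0273 \\ -0.0049 & 0.0021 & -0.3116 \end{bmatrix} u_{k} \\
y^{3}_{k} &= \begin{bmatrix} -0.3799 & 0.0009 & -0.0358 \\ 0.0037 & -0.2834 & 0.0603 \\ 0.0039 & -0.0012 & -0.3045 \end{bmatrix} x^{3}_{k} + \begin{bmatrix} 0.041 & 0.1706 & -0.0351 \\ -0.0255 & -0.0084 & 0.0141 \\ 0.0028 & 0.0026 & -0.0066 \end{bmatrix} u_{k}
\end{aligned}
\right.
\tag{65}
$$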
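$$
R^{4}:\ \text{IF } z_{k1} \text{ is } F_{11} \text{ AND } z_{k2} \text{ is } F_{22} \text{ AND } z_{k3} \text{ is } F_{32} \text{ THEN}
\left\{
\begin{aligned}
x^{4}_{k+1} &= \begin{bmatrix} 0.9654 & 0.0012 & -0.0229 \\ -0.0002 & 0.9751 & 0.0103 \\ 0.0029 & -0.002 & 0.9029 \end{bmatrix} x^{4}_{k} + \begin{bmatrix} -0.0005 & 0.388 & -0.0289 \\ -0.2791 & -0.0009 & 0.0118 \\ -0.003 & -0.0077 & -0.3132 \end{bmatrix} u_{k} \\
y^{4}_{k} &= \begin{bmatrix} -0.3888 & 0.0054 & -0.0889 \\ -0.0033 & -0.2792 & 0.0325 \\ 0.0052 & -0.004 & -0.3 \end{bmatrix} x^{4}_{k} + \begin{bmatrix} -0.0029 & 0.1846 & -0.0743 \\ -0.0139 & -0.0077 & 0.0084 \\ -0.0034 & 0.0005 & 0.001 \end{bmatrix} u_{k}
\end{aligned}
\right.
\tag{66}
$$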
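$$
R^{5}:\ \text{IF } z_{k1} \text{ is } F_{12} \text{ AND } z_{k2} \text{ is } F_{21} \text{ AND } z_{k3} \text{ is } F_{31} \text{ THEN}
\left\{
\begin{aligned}
x^{5}_{k+1} &= \begin{bmatrix} 0.9666 & 0.0014 & -0.0089 \\ -0.0004 & 0.9741 & 0.0195 \\ 0.0009 & -0.0017 & 0.9029 \end{bmatrix} x^{5}_{k} + \begin{bmatrix} 0.0074 & 0.3799 & -0.0064 \\ -0.2823 & 0.0044 & 0.0273 \\ -0.0049 & 0.0021 & -0.3116 \end{bmatrix} u_{k} \\
y^{5}_{k} &= \begin{bmatrix} -0.3799 & 0.0009 & -0.0357 \\ 0.0037 & -0.2834 & 0.0603 \\ 0.0039 & -0.0012 & -0.3045 \end{bmatrix} x^{5}_{k} + \begin{bmatrix} 0.041 & 0.1706 & -0.035 \\ -0.0255 & -0.0084 & 0.0141 \\ 0.0028 & 0.0026 & -0.0066 \end{bmatrix} u_{k}
\end{aligned}
\right.
\tag{67}
$$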
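$$
R^{6}:\ \text{IF } z_{k1} \text{ is } F_{12} \text{ AND } z_{k2} \text{ is } F_{21} \text{ AND } z_{k3} \text{ is } F_{32} \text{ THEN}
\left\{
\begin{aligned}
x^{6}_{k+1} &= \begin{bmatrix} 0.9654 & 0.0012 & -0.0229 \\ -0.0002 & 0.9751 & 0.0103 \\ 0.0029 & -0.002 & 0.9029 \end{bmatrix} x^{6}_{k} + \begin{bmatrix} -0.0005 & 0.388 & -0.0289 \\ -0.2791 & -0.0009 & 0.0118 \\ -0.003 & -0.0077 & -0.3132 \end{bmatrix} u_{k} \\
y^{6}_{k} &= \begin{bmatrix} -0.3888 & 0.0054 & -0.089 \\ -0.0033 & -0.2792 & 0.0325 \\ 0.0052 & -0.004 & -0.3 \end{bmatrix} x^{6}_{k} + \begin{bmatrix} -0.0028 & 0.1846 & -0.0743 \\ -0.0139 & -0.0077 & 0.0084 \\ -0.0034 & 0.0005 & 0.001 \end{bmatrix} u_{k}
\end{aligned}
\right.
\tag{68}
$$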
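$$
R^{7}:\ \text{IF } z_{k1} \text{ is } F_{12} \text{ AND } z_{k2} \text{ is } F_{22} \text{ AND } z_{k3} \text{ is } F_{31} \text{ THEN}
\left\{
\begin{aligned}
x^{7}_{k+1} &= \begin{bmatrix} 0.9654 & 0.0012 & -0.0229 \\ -0.0002 & 0.9751 & 0.0103 \\ 0.0029 & -0.002 & 0.9029 \end{bmatrix} x^{7}_{k} + \begin{bmatrix} -0.0005 & 0.388 & -0.0289 \\ -0.2791 & -0.0009 & 0.0118 \\ -0.003 & -0.0077 & -0.3132 \end{bmatrix} u_{k} \\
y^{7}_{k} &= \begin{bmatrix} -0.3888 & 0.0053 & -0.089 \\ -0.0032 & -0.2792 & 0.0325 \\ 0.0052 & -0.004 & -0.3 \end{bmatrix} x^{7}_{k} + \begin{bmatrix} -0.0026 & 0.1846 & -0.0744 \\ -0.014 & -0.0077 & 0.0084 \\ -0.0034 & 0.0004 & 0.001 \end{bmatrix} u_{k}
\end{aligned}
\right.
\tag{69}
$$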
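$$
R^{8}:\ \text{IF } z_{k1} \text{ is } F_{12} \text{ AND } z_{k2} \text{ is } F_{22} \text{ AND } z_{k3} \text{ is } F_{32} \text{ THEN}
\left\{
\begin{aligned}
x^{8}_{k+1} &= \begin{bmatrix} 0.9663 & 0.0004 & -0.0152 \\ -0.0013 & 0.9746 & 0.0092 \\ 0.0015 & -0.0014 & 0.9018 \end{bmatrix} x^{8}_{k} + \begin{bmatrix} -0.0262 & 0.3856 & -0.0179 \\ -0.2802 & -0.0149 & 0.0093 \\ -0.0002 & -0.0028 & -0.3142 \end{bmatrix} u_{k} \\
y^{8}_{k} &= \begin{bmatrix} -0.386 & 0.0171 & -0.0603 \\ -0.0256 & -0.2801 & 0.0278 \\ 0.0019 & -0.0043 & -0.3077 \end{bmatrix} x^{8}_{k} + \begin{bmatrix} -0.1065 & 0.2289 & -0.0258 \\ -0.0141 & 0.0023 & 0.0021 \\ -0.0041 & 0.0067 & -0.0012 \end{bmatrix} u_{k}
\end{aligned}
\right.
\tag{70}
$$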
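To illustrate how such a rule base might be evaluated, the following minimal Python sketch advances the local state-space submodels of rules 1 and 2 by one step and blends their outputs. It assumes the usual Takagi-Sugeno inference in which the global output is the sum of the local outputs weighted by normalized activation degrees; the function names (`submodel_step`, `ts_step`) and the activation degrees used in the example are hypothetical, and the membership functions of Fig. 2 are not reproduced here.

```python
# Local submodels for rules 1 and 2 (matrices copied from the two rule consequents).
A1 = [[0.9661, -0.0009, -0.0177], [0.0006, 0.9752, -0.0106], [0.0037, 0.0001, 0.9076]]
B1 = [[-0.0029, -0.3814, -0.0114], [0.2784, -0.0018, -0.013], [0.0027, 0.0015, -0.3128]]
C1 = [[0.3813, 0.0051, 0.0694], [-0.0046, 0.2786, 0.0318], [0.0082, -0.0002, -0.3043]]
D1 = [[0.0709, 0.2036, 0.0389], [-0.0195, -0.0134, 0.0022], [0.005, 0.008, -0.0075]]

A2 = [[0.9666, 0.0014, -0.009], [-0.0004, 0.9741, 0.0195], [0.0009, -0.0017, 0.9029]]
B2 = [[0.0075, 0.3799, -0.0065], [-0.2823, 0.0044, 0.0273], [-0.0049, 0.0021, -0.3116]]
C2 = [[-0.38, 0.0009, -0.036], [0.0038, -0.2834, 0.0603], [0.0039, -0.0012, -0.3044]]
D2 = [[0.0409, 0.1706, -0.0353], [-0.0255, -0.0084, 0.0141], [0.0027, 0.0026, -0.0066]]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def vec_add(a, b):
    """Element-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def submodel_step(model, x, u):
    """One step of a local linear model: returns (x_next, y)."""
    A, B, C, D = model
    x_next = vec_add(matvec(A, x), matvec(B, u))
    y = vec_add(matvec(C, x), matvec(D, u))
    return x_next, y


def ts_step(models, states, u, degrees):
    """Advance every local model and blend the outputs.

    Assumption: the global output is the sum of local outputs weighted
    by the activation degrees normalized to sum to one.
    """
    total = sum(degrees)
    weights = [g / total for g in degrees]
    y = [0.0] * len(models[0][2])  # one entry per row of C
    next_states = []
    for model, x, w in zip(models, states, weights):
        x_next, y_local = submodel_step(model, x, u)
        next_states.append(x_next)
        y = [acc + w * yi for acc, yi in zip(y, y_local)]
    return next_states, y


# Example: zero initial states, unit input on the first channel,
# and equal (hypothetical) activation of both rules.
models = [(A1, B1, C1, D1), (A2, B2, C2, D2)]
states = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
next_states, y = ts_step(models, states, u=[1.0, 0.0, 0.0], degrees=[1.0, 1.0])
```

Each rule keeps its own state vector, mirroring the per-rule states in the rule consequents; a full simulation would loop `ts_step` over the input samples and recompute the activation degrees from the antecedent variables at every step.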
3.2 Validation Stage Results

To evaluate the Takagi-Sugeno fuzzy models obtained in the identification stage, some performance metrics are considered. One of them is the Root Mean Square Error (RMSE), which measures the average magnitude of the deviation between the model predictions and the measured signal over time; the closer it is to zero, the better the model. It is given by:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{k=1}^{N} \left( y_k - \tilde{y}_k \right)^2}
\tag{71}
$$

where $y_k$ is the real output of the system and $\tilde{y}_k$ is the estimated output at each sample $k = 1, 2, \ldots, N$. Another important metric is the Variance Accounted For (VAF), which expresses, as a percentage, how much of the variance of the real signal is explained by the estimated signal. VAF is given by:

$$
\mathrm{VAF}(\%) = \left( 1 - \frac{\operatorname{var}(y - \tilde{y})}{\operatorname{var}(y)} \right) \times 100
\tag{72}
$$

where $y$ is the real output of the system and $\tilde{y}$ is the estimated output. The real outputs of the aerial robot are compared with the outputs of the fuzzy model in Fig. 4 for method 1 and in Fig. 5 for method 2. Finally, Table 1 uses the RMSE and VAF metrics to validate the proposed methodology, comparing the identified and the optimized fuzzy models (according to the membership functions of Fig. 2) for the two identification methods described above. As seen in Table 1, the model identified directly with the Fuzzy C-Means algorithm did not give satisfactory results for either identification method. The RMSE should be close to zero (minimum error), and the VAF should be close to 100% (minimum residual variance). This justifies the fine tuning of the membership functions, as evidenced in the same table by the performance metrics of the optimized model.

Fig. 4 Validation of method 1 through the comparison between the real outputs of the aerial robot and the outputs of the obtained fuzzy model (panels: v_x, v_y and v_z in m/s versus sample; real output and estimated output)

Fig. 5 Validation of method 2 through the comparison between the real outputs of the aerial robot and the outputs of the obtained fuzzy model (panels: v_x, v_y and v_z in m/s versus sample; real output and estimated output)

Table 1 Validation of the fuzzy model obtained through performance metrics

| Model | Output | Method 1 RMSE | Method 1 VAF (%) | Method 2 RMSE | Method 2 VAF (%) |
|---|---|---|---|---|---|
| Identified | y1 | $1.8427 \times 10^{111}$ | $-4.9819 \times 10^{224}$ | $4.3476 \times 10^{36}$ | $-2.7369 \times 10^{75}$ |
| Identified | y2 | $2.3192 \times 10^{111}$ | $-6.2571 \times 10^{224}$ | $7.9768 \times 10^{36}$ | $-7.3054 \times 10^{75}$ |
| Identified | y3 | $6.0407 \times 10^{111}$ | $-3.3496 \times 10^{226}$ | $1.0377 \times 10^{36}$ | $-9.7560 \times 10^{74}$ |
| Optimized | y1 | 0.1775 | 95.3720 | 0.1775 | 95.3720 |
| Optimized | y2 | 0.1008 | 98.8377 | 0.1008 | 98.8377 |
| Optimized | y3 | 0.0596 | 96.7525 | 0.0596 | 96.7525 |
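The RMSE and VAF definitions of Eqs. (71) and (72) can be sketched in a few lines of Python. This is only an illustrative implementation, not the authors' code; the function names and the sample data are hypothetical. The population variance is used, which does not affect VAF because the same normalization appears in the numerator and the denominator.

```python
import math


def rmse(y, y_hat):
    """Root mean square error between real and estimated outputs."""
    n = len(y)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / n)


def variance(x):
    """Population variance of a sequence."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)


def vaf(y, y_hat):
    """Variance Accounted For, in percent."""
    residual = [a - b for a, b in zip(y, y_hat)]
    return (1.0 - variance(residual) / variance(y)) * 100.0


# Hypothetical data: a real signal and an estimate with small alternating errors.
y_real = [1.0, 2.0, 3.0, 4.0]
y_est = [1.1, 1.9, 3.1, 3.9]
error = rmse(y_real, y_est)      # approximately 0.1
explained = vaf(y_real, y_est)   # approximately 99.2

# A constant offset leaves VAF at 100 but is visible in the RMSE.
y_shifted = [v + 1.0 for v in y_real]
offset_error = rmse(y_real, y_shifted)
offset_vaf = vaf(y_real, y_shifted)
```

The last two lines show why the two metrics are reported together: VAF depends only on the variance of the residual, so a constant bias does not lower it, whereas the RMSE does penalize the bias.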
4 Final Remarks

4.1 Conclusions

In this paper, two fuzzy modelling methodologies based on the OKID/ERA algorithm were applied to quadrotor aerial robots. These methods can handle the complexities of quadrotor aerial robots, such as highly nonlinear dynamics, coupled multivariable inputs and outputs, parametric uncertainty, fast dynamics, and a naturally unstable and underactuated system. The proposed methods performed well in a case study on the identification of the Parrot AR.Drone 2.0 aerial robot. When the membership functions obtained with the Fuzzy C-Means clustering algorithm were used directly, it was not possible to obtain a good model, as evidenced by the RMSE and VAF values of the identified model in Table 1. It was therefore crucial to optimize these membership functions through successive fine adjustments (or other optimization techniques). After this optimization, the two proposed identification methods were effective in obtaining a fuzzy model very close to the real system. The two methods gave the same results, indicating that they are equivalent for the analyzed system, although they serve different purposes. The first method, which estimates the rule consequents from the impulse response, is the simpler case: the MIMO system can be controlled with a set of techniques designed for SISO systems, and the multistep prediction can be used directly to set up a model predictive controller, reducing the computational cost of controller design. The second method, which estimates the rule consequents through a state observer, is more generic and complex; it is more suitable for MIMO systems and can easily be used for optimal and robust control design. The key difference between the two methods is that the first one considers the initial samples of the experimental data, while the second one disregards them. In addition, the obtained fuzzy model is of minimal order, supporting the idea that a nonlinear system can be decomposed into a set of local linear submodels in state space.
4.2 Future Works

• Design of TS fuzzy control based on the models proposed in this paper;
• Adaptation of the methodologies studied in this article to the recursive context;
• Design of TS fuzzy predictive control in state space;
• Analysis of other fuzzy clustering methods in order to reduce the need for optimization or manual adjustments.
Acknowledgements This work was financed by Fundação de Amparo à Pesquisa e ao Desenvolvimento Científico e Tecnológico do Maranhão (FAPEMA) under the UNIVERSAL-01298/17 and TIAC-06615/16 projects and was supported by Instituto Federal de Educação, Ciência e Tecnologia do Maranhão (IFMA). The authors would also like to thank Prof. Luís Miguel Magalhães Torres and Prof. Selmo Eduardo Rodrigues Júnior for their important contributions to this work.
References

1. S. Gupte, P.I.T. Mohandas, J.M. Conrad, A survey of quadrotor unmanned aerial vehicles, in 2012 Proceedings of IEEE Southeastcon (IEEE, 2012), pp. 1–6
2. S.-J. Chung, A.A. Paranjape, P. Dames, S. Shen, V. Kumar, A survey on aerial swarm robotics. IEEE Trans. Robot. 34(4), 837–855 (2018)
3. J.S. Silveira Júnior, E.B.M. Costa, L.M.M. Torres, Multivariable fuzzy identification of unmanned aerial vehicles, in XXII Congresso Brasileiro de Automática (CBA 2018) (João Pessoa, Brasil, 2018), pp. 1–8
4. J.S. Silveira Júnior, E.B.M. Costa, Data-driven fuzzy modelling methodologies for multivariable nonlinear systems, in IEEE International Conference on Intelligent Systems (IS’18) (Funchal, Portugal, 2018), pp. 1–7
5. Y.B. Dou, M. Xu, Nonlinear aerodynamics reduced-order model based on multi-input Volterra series, in Material and Manufacturing Technology IV, volume 748 of Advanced Materials Research (Trans Tech Publications, 2013), pp. 421–426
6. S. Solodusha, K. Suslov, D. Gerasimov, A new algorithm for construction of quadratic Volterra model for a non-stationary dynamic system. IFAC-PapersOnLine 48(11), 982–987 (2015)
7. Z. Wang, Z. Zhang, K. Zhou, Precision tracking control of piezoelectric actuator using a Hammerstein-based dynamic hysteresis model, in 2016 35th Chinese Control Conference (CCC) (2016), pp. 796–801
8. J. Kou, W. Zhang, M. Yin, Novel Wiener models with a time-delayed nonlinear block and their identification. Nonlinear Dyn. 85(4), 2389–2404 (2016)
9. H.K. Sahoo, P.K. Dash, N.P. Rath, NARX model based nonlinear dynamic system identification using low complexity neural networks and robust H∞ filter. Appl. Soft Comput. 13(7), 3324–3334 (2013)
10. H. Liu, X. Song, Nonlinear system identification based on NARX network, in 2015 10th Asian Control Conference (ASCC) (2015), pp. 1–6
11. T. Xiang, F. Jiang, Q. Hao, W. Cong, Adaptive flight control for quadrotor UAVs with dynamic inversion and neural networks, in 2016 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI) (2016), pp. 174–179
12. Q. Ma, S. Qin, T. Jin, Complex Zhang neural networks for complex-variable dynamic quadratic programming. Neurocomputing 330, 56–69 (2019)
13. S. Zaidi, A. Kroll, NOE TS fuzzy modelling of nonlinear dynamic systems with uncertainties using symbolic interval-valued data. Appl. Soft Comput. 57, 353–362 (2017)
14. M. Sun, J. Liu, H. Wang, X. Nian, H. Xiong, Robust fuzzy tracking control of a quad-rotor unmanned aerial vehicle based on sector linearization and interval matrix approaches. ISA Trans. 80, 336–349 (2018)
15. E.B.M. Costa, G.L.O. Serra, Optimal recursive fuzzy model identification approach based on particle swarm optimization, in 2015 IEEE 24th International Symposium on Industrial Electronics (ISIE) (IEEE, 2015), pp. 100–105
16. G. Feng, A survey on analysis and design of model-based fuzzy control systems. IEEE Trans. Fuzzy Syst. 14(5), 676–697 (2006)
17. T. Takagi, M. Sugeno, Fuzzy identification of systems and its applications to modeling and control, in Readings in Fuzzy Sets for Intelligent Systems (Elsevier, 1993), pp. 387–403
18. E.B.M. Costa, G.L.O. Serra, Robust Takagi-Sugeno fuzzy control for systems with static nonlinearity and time-varying delay, in 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (2015), pp. 1–8
19. F. Sun, N. Zhao, Universal approximation for Takagi-Sugeno fuzzy systems using dynamically constructive method-SISO cases, in 2007 IEEE 22nd International Symposium on Intelligent Control (2007), pp. 150–155
20. K. Zeng, N.-Y. Zhang, W.-L. Xu, A comparative study on sufficient conditions for Takagi-Sugeno fuzzy systems as universal approximators. IEEE Trans. Fuzzy Syst. 8(6), 773–780 (2000)
21. L.M.M. Torres, G.L.O. Serra, State-space recursive fuzzy modeling approach based on evolving data clustering. J. Control Autom. Electr. Syst. 29(4), 426–440 (2018)
22. D.S. Pires, G.L.O. Serra, Fuzzy Kalman filter modeling based on evolving clustering of experimental data, in 2018 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (2018), pp. 1–6
23. P. Garcia-Aunon, M.S. Peñas, J.M.C. García, Parameter selection based on fuzzy logic to improve UAV path-following algorithms. J. Appl. Logic 24, 62–75 (2017)
24. G. Serra, C. Bottura, An IV-QR algorithm for neuro-fuzzy multivariable online identification. IEEE Trans. Fuzzy Syst. 15(2), 200–210 (2007)
25. R. Babuška, Fuzzy Modeling for Control. International Series in Intelligent Technologies (Kluwer Academic Publishers, 1998)
26. J. Bezdek, R. Erlich, W. Full, FCM: the fuzzy c-means clustering algorithm. Comput. Geosci. 10(2–3), 191–203 (1984)
27. L.-X. Wang, A Course in Fuzzy Systems and Control (Prentice-Hall Press, 1999)
28. J.N. Juang, Applied System Identification (Prentice-Hall Inc., Upper Saddle River, 1994)
29. J.N. Juang, M. Phan, L.G. Horta, R.W. Longman, Identification of observer/Kalman filter Markov parameters: theory and experiments. J. Guidance Control Dyn. 16, 320–329 (1993)
30. D. Sanabria, AR Drone Simulink Development-Kit V1.1 - File Exchange - MATLAB Central. Available at: http://bit.ly/AD2Toolbox (2014). Last accessed on 09 Jan. 2019
31. J.S. Silveira Júnior, ARDrone2Data. Available at: http://bit.ly/ARDrone2Data (2019). Last accessed on 23 Jan. 2019
A Generic Architecture for Cyber-Physical-Social Space Applications

Stanimir Stoyanov, Todorka Glushkova, Asya Stoyanova-Doycheva, Jordan Todorov and Asya Toskova
Abstract This paper briefly presents a reference architecture called Virtual Physical Space. The purpose of the architecture is to adapt to the development of various Cyber-Physical-Social applications. In the paper, the basic components of the space are described in more detail. Adapting the proposed architecture to implement an intelligent personal touristic guide is also considered.

Keywords Cyber-Physical-Social system · Virtual physical space · Ambient-based modeling · Events · Deep learning · Touristic guide
S. Stoyanov (corresponding author) · T. Glushkova · A. Stoyanova-Doycheva · J. Todorov · A. Toskova
University of Plovdiv “Paisii Hilendarski”, 24 Tzar Asen Str, 4000 Plovdiv, Bulgaria
e-mail: [email protected]

© Springer Nature Switzerland AG 2020
R. Jardim-Goncalves et al. (eds.), Intelligent Systems: Theory, Research and Innovation in Applications, Studies in Computational Intelligence 864, https://doi.org/10.1007/978-3-030-38704-4_14

1 Introduction

Cyber-Physical Spaces (CPS) and the Internet of Things (IoT) are closely related concepts. Despite certain differences, both integrate the virtual and physical worlds. Placing the person (the user) at the center of such spaces turns them into Cyber-Physical-Social Spaces (CPSS). From a software architecture point of view, a CPSS includes a variety of components designed to provide effective support to different user groups, taking into account changes in the physical environment. Effective software models for building a CPSS are those that support the creation of distributed, autonomous, context-aware, intelligent software.

To support e-learning at the Faculty of Mathematics and Informatics at the University of Plovdiv, the DeLC (Distributed eLearning Center) environment has been used for years [1, 2]. Although DeLC was a successful project for applying information and communication technologies in education, one of its major drawbacks is the lack of close and natural integration of its virtual environment with the physical world where the real learning process takes place [3]. The CPSS and IoT paradigms reveal entirely new opportunities for taking into account the needs of disabled people, in our case disabled learners. For these reasons, in recent years, the DeLC environment has been transformed into a Virtual Education Space (VES) that operates as an IoT ecosystem [4, 5]. A component for presenting the location of objects of interest in the physical world was implemented using a formal ambient-oriented approach. The use of this component to assist disadvantaged learners has been demonstrated [6].

Summing up the experience of constructing VES, we began developing a reference architecture known as Virtual Physical Space (ViPS) [7] that could be adapted to various CPSS-like applications such as a smart seaside city [8] or a smart tourist guide [9]. When implementing a CPSS application, a major challenge is the virtualization of the “things” from the physical world that are of interest to us. Moreover, related events and their temporal and spatial aspects have to be taken into account. To present the spatial aspects of “things”, we have decided to use an ambient-oriented modeling approach. In this paper, we present the AmbiNet environment that supports the chosen approach.

The rest of the paper is organized as follows: a short review of related works is given in Sect. 2, followed by an overall description of the ViPS architecture in Sect. 3. The next three sections present the main components of the space, namely an event model, a referent personal assistant, and an analytical subspace. Section 7 demonstrates the adaptation of ViPS to implement a touristic guide assistant. Finally, Sect. 8 concludes the paper.
2 Related Works

In the specialized literature, CPSS systems are most often considered in the context of smart cities [10]. A smart city uses knowledge or rules mined from Internet of Things sensor data to promote the development of the city. This brings new opportunities and challenges, such as low-latency and real-time operation. A smart city integrates a distributed sensor network, government initiatives on openly available city-centric data, and citizens sharing and exchanging city-related messages on social networks [11]. A comprehensive analysis of the smart city concept and of existing technologies and platforms is presented in [12], which proposes a model for intelligent urban architecture and also identifies some weaknesses in the application of this architecture. An agricultural cyber-physical-social system (CPSS) serving agricultural production management is presented in [13], with a case study on a solar greenhouse. IoT sensors in the physical world accumulate a tremendous amount of data that, together with the data provided by users through their digital devices and social networks, can enable real-time analysis and informed decisions [14]. Extracting knowledge from data, typically through large-scale data analysis techniques, can help to build a virtual picture of life dynamics, which can enable intelligent applications and services to guide decision-making [15]. An essential aspect of any real system is the efficient use of energy resources. In [16], a multi-agent energy optimization system is presented. The multi-agent system is seen as a modeling tool that is easily extended to the fields of energy, smart cities, smart agriculture and smart education systems, and that can be further developed using the IoT paradigm.
3 ViPS in a Nutshell

ViPS is built as a reference architecture that can be adapted to different CPSS-like applications. For this purpose, the main issues to be addressed are the following:

• Users are in the focus of attention;
• Virtualization of the physical “things”;
• Integration of the virtual and physical worlds.

The concept of an IoT ecosystem lies in the notion of “things”, the basic building blocks that provide a connection between the physical world and the digital world of the Internet. To operate as part of such an ecosystem, a thing has to expose observational, sensory, acting, computational and processing capabilities. All these capabilities specify it as an autonomous, proactive entity that can share knowledge and information with other surrounding things for planning and decision-making in order to reach personal or shared aims. Furthermore, “things” need to be able to communicate seamlessly in a network. In this way, they form a hierarchical network structure of “things”.

In line with this concept, the ViPS architecture (Fig. 1) provides virtualization of real objects that can be adapted for a particular domain. In the context of software technologies, this means creating digitized versions of physical real objects that can be specified and digitally interpreted. In this aspect, the architecture reflects and represents in the digital world an essentially identical model of the real physical world, in which processes, users and knowledge of the area of interest, as well as the interaction between them, are implemented in a dynamic, personalized and context-aware manner.
Fig. 1 Architecture of ViPS
In practice, the virtualization of “things” is supported by the ViPS middleware. The modeling of “things” takes into account factors such as events, time, space, and location. The Analytical Subspace provides means for the preparation of domain-specific analysis, supported by three modeling components. The Event Engine models and interprets various types of events and their arguments (identification, conditions for occurrence and completion) representative of the field of interest. TNet (Temporal Net) provides an opportunity to present and work with temporal aspects of things, events and locations. In AmbiNet (Ambient Net), the spatial characteristics of the “things” and events can be modeled as ambients. The work with these structures is supported by specialized interpreters implemented as operative assistants based on the formal specifications Interval Temporal Logics (ITL [17]), Calculus of Context-aware Ambients (CCA [18]) and Event Model (EM [19]). Inferences and conclusions made by model interpretation use background knowledge and domain documents stored in the Digital Libraries Subspace, implemented as open digital repositories. The OntoNet layer is a hierarchy of ontologies that represents the essential features of things. Furthermore, the relationships between the “things” are specified in OntoNet.

Due to the inherent complexity of an IoT ecosystem, it is difficult for users to work with it. Besides, access to ViPS is controlled and personalized. Additionally, in ViPS, as a CPSS ecosystem, users are in the focus of attention. Personal assistants, operating on behalf of users and aware of their needs, prepare scenarios for the execution of users’ requests and manage and control their execution by interacting with the ViPS middleware. From an architectural point of view, this is reflected in the most essential components of the space, the Personal Assistants (PAs). PAs are closely related to the event model of the space.
A PA initiates the implementation of various scenarios in ViPS. Therefore, the PA lifecycle can be presented as a sequence of events related to the management of the user’s “personal timetable”, known as the PST (Personal Schedule Table). Furthermore, the events can be used for the specification of policies that in turn present operative scenarios for the space. A PA is implemented as a rational agent with a BDI-like architecture consisting of two basic components: Deliberation and Means-Ends Reasoning [20]. During deliberation, the PA forms its current goal, while in the second component a plan for achieving the goal is prepared. Usually, the plan is launched when a corresponding event occurs, i.e. by means of the Event Engine. However, the PA also has to be able to operate in an “overtaking action” mode, i.e. it has an early warning system, in which certain operations must be performed before the event occurs. The PA therefore cannot rely on the Event Engine in the timeframe before the event occurs; in that timeframe, the PA is supported by the AmbiNet and TNet components. The interaction between the PA and the three supporting components is implemented by specialized interaction protocols.

The operative assistants, implemented as rational intelligent agents, provide access to the resources of both subspaces and accomplish interactions with personal assistants and web applications. They are architectural components suitable for providing the necessary dynamism, flexibility, and intelligence. However, they are unsuitable for delivering the necessary business functionality in the space. For this reason, assistants work closely with services or micro-services. In Sect. 7, a service implementing a model for deep learning is explained in more detail. Furthermore, the operative agents interact with the guards to supply ViPS with data from the physical world. Guards operate as a smart interface between the virtual and the physical world. They provide data about the state of the physical world, which are transferred to the virtual environment of the space (the two subspaces). Multiple IoT Nodes integrated in the architecture of the guards provide access to sensors and actuators of the “things” located in the physical world. The sets of sensors and actuators are configured in accordance with the application. The communication in the guard system operates as a combination of a personal network (e.g. LoRa) and the Internet.

The public information resources of the space are openly accessible through appropriate web applications that are usually implemented specifically for the particular domain. Usually, when developing CPSS applications, only selected ViPS components are adapted to the particular domain. Subsequently, these new components are archived and saved in separate application libraries (DoLs) and become part of ViPS. In this way, it is possible to extend ViPS with each new application. In the following sections, the basic components of ViPS are presented in more detail.
4 Event Model of ViPS

The notion of “event” is of great importance for systems operating with occurrences. The earliest version of the event model defines the “event” concept as a basic principle and structure for organizing access to, and synchronizing, the various dynamic systems within ViPS. From the options suggested in [21], we can assume the following: “something that happens or is regarded as happening; an occurrence, especially one of some importance; the outcome, issue, or result of anything; something that occurs in a certain place during a particular interval of time”. Considering the broad range of definitions the term “event” can have, for the purposes of ViPS it is regarded as a phenomenon occurring (or accepted as having occurred) within a particular location and time interval, whose effect influences the operation of ViPS. In other words, it is agreed that there are many kinds of events, but only those affecting ViPS are acknowledged and taken into account. Since events can occur both in the physical and in the virtual world, we can differentiate them based on this criterion. Apart from that, there needs to be a symbolic representation in the virtual world (including the virtualization of physical events). Finding a suitable representation for this purpose is a challenging task.

The large variety of events and their characteristics, as well as the fact that events depend on different factors such as the domain of interest, context, and granularity, allows the use of alternative approaches to define them. One option is the “bottom up” approach, which aims for a simpler representation, where some events are defined as more complex structures built out of other elementary events. Another possibility is the reverse, “top down” approach, which attempts to characterize an event in more detail by using different attributes. We have considered two commonly preferred approaches to represent events:

• Atomic events: on a lower level, the events are represented as atomic structures without parameters. Complex events are built using the atomic ones.
• Attributed events: they occupy a higher abstraction level, where every event is characterized by different attributes.

While the existing event representation approaches usually favor one of the two aforementioned methods, the Space model relies on a hybrid approach that allows different components to operate and work with different event aspects. We accept that $E$ is the set of events happening within ViPS and that $e$ is an event such that $e = \langle d, y, a \rangle$, with the founding characteristics being a fictive identifier $d$, an event type $y$, and attributes $a$. We call an event $e'$ attributing if $e' \in a(e)$. The actual event representation can be given as a recursive structure; in that case, $e$ is an attributed event. Let $e', e \in E$ be two events such that $e'$ is attributing and $e$ is attributed. We define the following two operations:

• $e' \uparrow e$ (fire): an occurrence of $e'$ causes the happening of $e$;
• $e' \downarrow e$ (kill): an occurrence of $e'$ terminates the event $e$.

For two events $e', e \in E$, we define the following terms:

• $e' \parallel e$ (independent events): the event $e'$ does not “know” about the event $e$;
• $e' \to e$ (dependent events): the event $e'$ is a premise of the event $e$; in other words, they are causally linked.

The event model supports the classification $E = BE \cup SE \cup DE$, where:

• $BE$ is the set of basic events, actual (time(Date, Hour), location);
• $SE$ is the set of system events;
• $DE$ is the set of domain-dependent events.

Alternatively, at the most elementary level, every event can be regarded as a set of attributes.
Time ($t$) and location ($l$) can form the basic spatiotemporal identity of the event, but their use is optional. As mentioned earlier, there is an attribute (payload) section that may contain zero or more predefined attributes as well as free-form ones. This section can also be treated as a set, termed $P$. For instance, an event $E_x$ with some attributes can be written as $E_x = \{d, t, l, p_1, p_2, \ldots, p_n\}$. The payload section permits an unlimited number of scalar values along with the recursive inclusion of other event definitions, called sub-events, which must represent proper subsets:

$$
P = \{p_1, p_2, \ldots, p_n, \{d_n, t_n, l_n, \{p_n, \ldots\}\}, \ldots\}, \qquad P \subset E_x \lor P = \emptyset
$$
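The attributed, recursive event structure described above can be sketched as a small Python data class. This is only an illustration under stated assumptions: the authors report a Java implementation, and the names used here (`Event`, `Category`, `sub_events`, `is_attributed_by`) and the sample events are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Category(Enum):
    """The three event classes of the model: basic, system, domain."""
    BASIC = "BE"
    SYSTEM = "SE"
    DOMAIN = "DE"


@dataclass
class Event:
    d: str                       # identifier
    y: str                       # event type
    category: Category
    t: Optional[str] = None      # time (optional)
    l: Optional[str] = None      # location (optional)
    payload: dict = field(default_factory=dict)  # attributes p1..pn; values may be Events

    def sub_events(self):
        """Events included directly in the payload of this event."""
        return [v for v in self.payload.values() if isinstance(v, Event)]

    def is_attributed_by(self, other):
        """True if `other` appears directly among this event's attributes."""
        return any(other is sub for sub in self.sub_events())


# Hypothetical example: a domain event that carries a basic event as a sub-event.
tick = Event(d="e1", y="clock", category=Category.BASIC, t="10:00")
lecture = Event(
    d="e2",
    y="lecture_start",
    category=Category.DOMAIN,
    l="room 1",
    payload={"priority": 3, "trigger": tick},
)
```

Here `tick` plays the role of an attributing event and `lecture` that of an attributed one; the optional `t` and `l` fields mirror the statement that time and location may be omitted.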
The aggregation of all definitions and their concrete values provides a uniquely identified complex event $C$, preceded by other events that become sub-events in its definition: $C = (A \cup (B \cup (\ldots \cup Z)\ldots))$. The model additionally limits the possible complexity by allowing only basic or system events as sub-events. Furthermore, basic primitives are provided through which comparison operations between simple or complex events can be executed. This is done by specifying the individual members with respect to which the comparison is made. For example, a simple comparison of two concrete events $X$ and $Y$ that have a common integer field $u$ designating priority can be made by referring to this field (by its alias defined in the model) and to the expected outcome $M$ of the comparison:

$$
X = \{u \in \mathbb{Z}\}, \quad Y = \{u \in \mathbb{Z}\}, \quad M = \{0\},
$$
$$
Z = X \times Y = \{(x, y) \mid x \in X,\ y \in Y\}, \quad S = \{-1, 0, 1\},
$$
$$
f : Z \to A, \quad A = \{a \in S\}, \quad |A| = 1,
$$

where

$$
f(x, y) =
\begin{cases}
-1, & x < y \\
0, & x = y \\
1, & x > y
\end{cases}
$$

The result can be compared directly with the expected outcome, which indicates whether the desired conditions are met. For complex comparisons that include sub-events, $M$ has to be adjusted to match the structure of the source data, while $f$ is applied seamlessly to the members that need to be compared. Multiple events or their particular attributes can be combined through logical operations such as conjunction, disjunction, and negation. The above formulation does not show the handling of special situations that may occur due to the architectural and implementation specifics of the model, although they are handled in its implementation in the Java programming language. Such cases are, for instance, broken reflexivity from $X$ to $Y$ or from $Y$ to $X$, and different and/or incomparable data types. These cases are resolved in the model by assigning them special finite values: NOTCOMPARED = 2, UNKNOWN = 4, INCOMPARABLE = 8.
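A minimal Python sketch of the comparison primitive follows. The three-way result and the special values 2, 4 and 8 come from the text; everything else is an assumption made for illustration. In particular, events are simplified to dictionaries, the function names are hypothetical, and the mapping of each special value to a concrete situation (no field designated, field missing, types that cannot be ordered) is a guess, since the text does not specify it.

```python
# Special finite values named in the text.
NOTCOMPARED = 2
UNKNOWN = 4
INCOMPARABLE = 8


def compare_field(x, y, alias):
    """Compare two events (here: dicts) on the field named `alias`.

    Returns -1, 0 or 1 as in f(x, y), or one of the special values.
    Assumed mapping: no alias given -> NOTCOMPARED, field missing in
    either event -> UNKNOWN, values that cannot be ordered -> INCOMPARABLE.
    """
    if not alias:
        return NOTCOMPARED
    if alias not in x or alias not in y:
        return UNKNOWN
    a, b = x[alias], y[alias]
    if type(a) is not type(b):
        return INCOMPARABLE
    try:
        if a == b:
            return 0
        if a < b:
            return -1
        if a > b:
            return 1
    except TypeError:
        pass
    return INCOMPARABLE


def meets(x, y, alias, expected):
    """Check the comparison result against the expected outcome set M."""
    return compare_field(x, y, alias) in expected


# Hypothetical events sharing an integer priority field u.
event_x = {"d": "e1", "u": 2}
event_y = {"d": "e2", "u": 2}
same_priority = meets(event_x, event_y, "u", {0})
```

With `expected = {0}` the check succeeds only when both priorities are equal, which corresponds to $M = \{0\}$ in the formulation above; passing `{-1, 0}` would express "not greater than".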
Additionally, the event model defines the basic categories and a hierarchy of events. Logically, the domain events take the highest place in the hierarchy, followed by the system events and, lastly, the basic events. The programmatic implementation, in contrast, proceeds in nearly the reverse order, relying on object-oriented inheritance. The different events and event categories retain specific properties and data organization structures. Mechanisms for serialization and transportation via different message broker systems complement the model, forming the first version of the event engine of ViPS. This first version, Event Engine 1.0, is an object-oriented implementation of the event model and engine used to standardize the representation and manipulation of events whose occurrence affects the Space’s domain and operation. Although in its current form the model is flexible enough and relatively easy to integrate with most of the Space’s components, most units operating on a higher abstraction level can benefit from a similar implementation based on the idea of
human practical reasoning, implemented as rational agents and an underlying group of behaviors. Most of the components operating in ViPS are implemented as rational agents. While they can directly use the developed event model to communicate through events, the object-oriented nature of the engine, and its event distribution mechanisms in particular, shifts their behavior from an “agent” nature toward a “service-based” one. As a result, the agents are obliged to query, at some interval, the message broker system that delivers events through the event engine. If new events are available, the agent needs to decide which to ignore, analyze, or react to, and perhaps send back results by generating another event and forwarding it to the broker. All of these steps need to be developed separately for each agent, taking into account that they have to be multiplexed with the inter-agent communication that happens all the time [22]. The possibility for important events to interrupt the agent’s communication, or vice versa, also has to be considered. Implementing these requirements is usually a time-consuming process that is not straightforward and requires careful planning of the agent’s architecture and apportioning of its computational resources. The second version of the engine, Event Engine 2.0, builds on its predecessor: it keeps the existing representation model while improving the event distribution mechanism, making it more natural for use by rational agents and thus reducing their development time and complexity. The implementation is again based on the Java programming language and relies in particular on the Java Agent Development Framework [23]. The final result is a library containing a set of behaviors, configuration utilities, communication protocols, an ontology, and a default realization of a rational agent with event brokering functionality.
Event Engine 2.0 supports proactive management of events: instead of making every agent check for new events, the engine reverses the process by making new events inform the interested parties of their existence. In the most direct form of this idea, each new event is represented by its own agent, which announces its existence to the other agents in the system. As the number of events grows, however, this approach becomes inefficient even in environments highly optimized to support many agent instances. Instead, a single agent was designed to represent a whole category of events. Furthermore, the event model permits event categories to inherit from each other, which allows the communication traffic to be filtered and reduced to only those categories in which a particular party is interested. The more general the selected category, the more events are received.
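The category-based filtering described above can be sketched in a few lines of Python. This is only an illustration of the push model with inherited categories, not the JADE-based library itself: the class `CategoryRegistry`, its methods, and the category names are hypothetical, and plain callbacks stand in for agents.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Optional

Handler = Callable[[str, dict], None]


class CategoryRegistry:
    """Event categories with single inheritance and per-category subscribers."""

    def __init__(self) -> None:
        self._parent: Dict[str, Optional[str]] = {}
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def add_category(self, name: str, parent: Optional[str] = None) -> None:
        if parent is not None and parent not in self._parent:
            raise KeyError(f"unknown parent category: {parent}")
        self._parent[name] = parent

    def subscribe(self, category: str, handler: Handler) -> None:
        if category not in self._parent:
            raise KeyError(f"unknown category: {category}")
        self._handlers[category].append(handler)

    def publish(self, category: str, event: dict) -> int:
        """Push the event to subscribers of its category and of every ancestor."""
        if category not in self._parent:
            raise KeyError(f"unknown category: {category}")
        delivered = 0
        current: Optional[str] = category
        while current is not None:
            for handler in self._handlers[current]:
                handler(category, event)
                delivered += 1
            current = self._parent[current]
        return delivered


# Hypothetical categories: "lecture" and "exam" both inherit from "domain".
registry = CategoryRegistry()
registry.add_category("domain")
registry.add_category("lecture", parent="domain")
registry.add_category("exam", parent="domain")
registry.subscribe("domain", lambda cat, ev: print("general subscriber got", cat))
registry.subscribe("exam", lambda cat, ev: print("exam subscriber got", cat))
registry.publish("lecture", {"room": "101"})
registry.publish("exam", {"room": "202"})
```

A subscriber to the general category receives both events, while the subscriber to the narrower category receives only one, which mirrors the trade-off stated in the text.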
5 A Referent Personal Assistant
The CPSS paradigm puts users in the spotlight. Therefore, from an architectural point of view, a central component of ViPS is a PA directly dedicated to assisting users. Generally, a PA operating in a specific ViPS adaptation must perform two main tasks:
• Assistance: operational support to the user in her/his everyday activities;
• Prevention: helping to carry out the necessary anticipatory and preventive actions. Prevention aims to provide optimal conditions for the user to participate in (or to benefit from) an upcoming event, including conditions related to movement in the physical space and over time.
In ViPS, PAs operate exclusively in the virtual space, taking into account the physical world in which the user resides. PAs are closely related to the event model. The states (situations) in which the user should be assisted are presented as domain events. The architecture of a PA is presented in Fig. 2. PAs are also implemented as rational BDI agents. Accordingly, the life cycle of a PA consists of two phases: deliberation and planning. The Deliberator is the component that generates an actual goal based on the mental states of the agent. Usually, the Deliberator interacts with the Event Engine. The Planner is responsible for preparing the plan that needs to be completed to reach the goal. Typically, the Planner interacts with AmbiNet and TNet to complete the planning activities. The operative assistants also support the implementation of plans and, if necessary, liaise with the Digital Libraries Subspace. The Initializer generates the profile of the user to be assisted by the PA. In some cases, it can interact with the InHouse system of the actual domain. For example, when adapted to support students, the Initializer interacts with the university system.
The PST (Personal Schedule Table) is a fundamental structure in the PA architecture, which acts as a “timetable” of the user’s support. The entries in the table represent the domain events in which the PA will assist the user:
PST = (index, option, epsilon, delta)
where:
• index: entry points in the PST. The index uniquely identifies each record in the table. Base events are allowed as indexes, i.e. index = {date, time, location};
Fig. 2 Architecture of a PA
• option: options describe the corresponding domain events. For example, in an e-learning domain, option = {lecture, exam, lab, …};
• epsilon: a preventive interval of the corresponding option; epsilon plays the role of a metric to determine the intentions (goals) of the PA in prevention mode;
• delta: an event interval of the corresponding option; delta determines the user’s support during the actual domain event and plays the role of a metric to determine the intentions (goals) of the PA in event mode.
The PA and the other ViPS components are implemented as intelligent agents; Interaction Protocols (IPs) are specified to ensure the interaction between them.
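One possible reading of the PST structure is sketched below in Python. The text does not state how epsilon and delta are anchored in time, so the sketch assumes that the preventive interval epsilon immediately precedes the event start given by the index and that the event interval delta begins at that start. The names `PstEntry`, `mode_at`, and `active_entries`, as well as the sample schedule, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class PstEntry:
    date_time: datetime   # index: date and time of the base event
    location: str         # index: location of the base event
    option: str           # domain event, e.g. "lecture", "exam", "lab"
    epsilon: timedelta    # preventive interval (assumed to end at date_time)
    delta: timedelta      # event interval (assumed to start at date_time)

    def mode_at(self, now: datetime) -> Optional[str]:
        """Return "prevention", "event", or None for the given moment."""
        if self.date_time - self.epsilon <= now < self.date_time:
            return "prevention"
        if self.date_time <= now < self.date_time + self.delta:
            return "event"
        return None


def active_entries(pst: List[PstEntry], now: datetime) -> List[Tuple[PstEntry, str]]:
    """Entries that currently require the PA's attention, ordered by start time."""
    result = []
    for entry in sorted(pst, key=lambda e: e.date_time):
        mode = entry.mode_at(now)
        if mode is not None:
            result.append((entry, mode))
    return result


# Hypothetical schedule of a student.
pst = [
    PstEntry(datetime(2020, 3, 2, 10, 0), "Room 101", "lecture",
             timedelta(minutes=30), timedelta(minutes=90)),
    PstEntry(datetime(2020, 3, 2, 14, 0), "Lab 3", "lab",
             timedelta(minutes=15), timedelta(minutes=45)),
]
for entry, mode in active_entries(pst, datetime(2020, 3, 2, 9, 45)):
    print(entry.option, mode)
```

Under these assumptions the mode returned for an entry is what would drive the choice of goals in the deliberation phase: preventive goals before the event and supporting goals during it.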
6 Analytical Subspace
According to the reference architecture, the Analytical Subspace has two main functions. The first is to make it possible to model “things” and processes by taking into account factors such as events, time, space, and location. The second is to provide tools for analysis, statistics, and suggestions for improving processes in the specific domain. The Analytical Subspace prepares the necessary analytical information using the TNet and AmbiNet components.
6.1 AmbiNet
AmbiNet is a network structure modeling the corresponding domain (a smart city, tourist area, university campus, etc.). AmbiNet implements its functionalities through a network structure of “ambients”. An ambient is an entity that has the following characteristics [24]:
– Restriction: a bounded location in which the considered computation happens.
– Inclusion: an ambient can be included in other ambients; ambient hierarchies can be created.
– Mobility: an ambient can be moved from one location to another as a whole, together with its sub-ambient structure.
An ambient can be presented as a structure with the following elements:
– Identifier (name): a mandatory element which, besides identification, also serves to control access.
– A corresponding set of local agents (threads, processes): computational processes that work directly in the ambient and, in a sense, control it.
– A number of sub-ambients, each of which has an identifier and its own sub-ambient structure. This makes it possible to build complex ambient structures recursively.
Every ambient can communicate with the ambients around it by exchanging messages. Messaging is done through a handshaking process. The notation “::” is a symbol for sibling ambients; “↑” and “↓” are symbols for parent and child; “⟨⟩” means sending and “()” receiving a message. An ambient can be mobile, i.e., it can move within the surrounding environment. In CCA, there are two movement capabilities, in and out, which allow an ambient to move from one location to another. Ambients model physical, abstract, or virtual objects of interest together with their spatial, temporal, and event attributes. The network can be parameterized to the dynamically changing physical or virtual environment through data obtained from the IoT nodes. Additionally, it can be adapted to different domains of interest. All of this gives us a reason to use ambient modeling for the processes run by the RPA.
For the purposes of this study, we use the Calculus of Context-aware Ambients (CCA), a process calculus that builds on the basic formalism of the π-calculus [Milner]. In CCA, four syntactic categories can be distinguished: processes P, capabilities M, locations α, and context expressions k. For the present study, we will look at two types of ambients: abstract ambients (e.g. different kinds of assistants) and ambients that virtualize objects from the real world. Depending on its location and momentary state, an ambient of the second group may be static or dynamic.
– A static ambient has a constant location in the physical world or in the modeled virtual reality; examples are hospitals, schools, universities, museums, etc. Because of their properties, the ambients can form a hierarchical structure.
Thus a museum represents a whole static sub-structure made up of different expositions.
– A dynamic ambient has a variable location in the current context; examples are cars, tourists, students, wheelchairs, etc. Dynamic ambients can also have a hierarchical structure and can move out of or into other static or dynamic ambients. When they change their location, they move together with all of their “ambient-children”.
Let A be the set of ambients in the Space. An ambient a = ⟨name, location, type, parent, P(a), attr⟩ ∈ A, where:
– name: the ambient identifier;
– location: the location of the ambient in the physical world; for an abstract ambient, location = null;
– type: the type of the ambient (abstract, static, dynamic);
– parent: the parent ambient; if none exists, then parent = null;
– P(a): the set of ambient processes;
– attr: a set of additional ambient attributes.
An ambient a1 is a child of ambient a if its parent = a. We denote this by a = parent(a1); the representation of the ambient hierarchy may thus be a recursive structure. Let a1, a ∈ A. We define the following two operations:
– a1 ↓ a (in): ambient a1 enters the boundaries of ambient a and becomes its child. Then a = parent(a1);
– a1 ↑ a (out): ambient a1 leaves the boundaries of ambient a and becomes its sibling. Then parent(a) = parent(a1).
Let us consider ambients a1, a2 ∈ A. They can communicate with each other by sending and receiving messages as follows:
a1 → a2, if a2 = parent(a1) or a1 = parent(a2) or parent(a1) = parent(a2)    (1)
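The two movement operations and the communication condition (1) can be sketched in Python as follows. This is an illustration of the definitions only, not the AmbiNet implementation or the ccaPL language: the class `Ambient`, the functions `can_send` and `relay_path`, and the location strings are hypothetical, and the ambient processes P(a) and attributes are omitted. The sample ambients follow the museum scenario that the text discusses next.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(eq=False)  # identity-based equality keeps instances hashable
class Ambient:
    name: str
    kind: str                           # "abstract", "static" or "dynamic"
    location: Optional[str] = None      # None for abstract ambients
    parent: Optional["Ambient"] = None

    def move_in(self, target: "Ambient") -> None:
        """in: enter the boundaries of target and become its child."""
        self.parent = target
        self.location = target.location

    def move_out(self) -> None:
        """out: leave the current parent and become its sibling."""
        if self.parent is None:
            raise ValueError(f"{self.name} has no parent to leave")
        self.parent = self.parent.parent


def can_send(a1: Ambient, a2: Ambient) -> bool:
    """Condition (1): parent, child, or sibling relationship."""
    if a1 is a2:
        return False
    return a2 is a1.parent or a1 is a2.parent or a1.parent is a2.parent


def relay_path(src: Ambient, dst: Ambient,
               ambients: Sequence[Ambient]) -> Optional[List[Ambient]]:
    """Shortest chain of ambients in which each hop satisfies condition (1)."""
    previous: Dict[Ambient, Optional[Ambient]] = {src: None}
    queue = deque([src])
    while queue:
        current: Optional[Ambient] = queue.popleft()
        if current is dst:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for other in ambients:
            if other not in previous and can_send(current, other):
                previous[other] = current
                queue.append(other)
    return None


museum = Ambient("Museum", "static", "loc_museum")
exposition = Ambient("Exposition", "static", "loc_hall", parent=museum)
lift = Ambient("Lift", "dynamic", "loc_lift", parent=museum)
wheelchair = Ambient("Wheelchair", "dynamic", "loc_entrance", parent=museum)

wheelchair.move_in(lift)                      # the wheelchair enters the lift
print(can_send(wheelchair, exposition))       # no direct relationship any more
route = relay_path(wheelchair, exposition, [museum, exposition, lift, wheelchair])
print([a.name for a in route])                # relayed through the lift
```

The breadth-first search is one simple way to find a relay chain; the text itself only states that messages are retransmitted through the parent ambients in the hierarchy.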
Abstract ambients are mobile in the virtual space, but their physical location can be “none”. For example, for a1, a2, a3 and a4 ∈ A:
– a1: the Museum;
– a2: a smart wheelchair of a disabled tourist;
– a3: an Embroidery Exposition in the Museum;
– a4: an elevator that can move the tourist wheelchair to the desired exposition.
Ambients a1 and a3 are static; a2 and a4 are dynamic; a1 has no parent in the current context (parent(a1) = null); a2, a3 and a4 are children of a1, i.e. a1 = parent(a2) = parent(a3) = parent(a4). The location of dynamic ambients may change. It is captured by appropriate sensors and transmitted dynamically. Let the tourist wheelchair go into the elevator. Then Lift = parent(a2) and loc_a2 = loc_a4, so recursively Museum = parent(parent(a2)). If ambient a2 wishes to send a message to ambient a3, this cannot be done directly because a2 and a3 do not fulfill any of the conditions in (1). The message can only be delivered by retransmission through the parent ambients in the hierarchy. In this case a2 → a4, because a4 = parent(a2); then a4 → a3, since parent(a4) = parent(a3).
AmbiNet is supported by the following implementation and development tools (Fig. 3):
– AmbiNet Editor: an editor for visual modeling of test scenarios. The editor makes it possible to present and describe the required ambients with their hierarchical structure, location, types, and attributes, and to introduce their processes and capabilities, which, at certain values of the parameters arriving dynamically from the IoT nodes, trigger actions and/or send messages to the other ambients according to the current scenario;
– AmbiNet ccaPL interpreter: an interpreter of the ccaPL language, which is based on the Calculus of Context-aware Ambients (CCA). Once the structure has been described in the AmbiNet Editor, a simulation of the current scenario is run and visualized either in a console (Console) or by a visual Animator.
Fig. 3 AmbiNet components
– AmbiNet Route Generator: this component generates routes (or plans) based on user-defined criteria. A route is a sequence of nodes (doors, elevators, staircases, etc.) that are modeled as ambients with their current location. Various path search algorithms can be used to generate routes;
– AmbiNet Route Optimizer: it optimizes the routes produced by the AmbiNet Route Generator, taking into account the current state of the participating objects;
– AmbiNet Repository: a library of already completed routes and plans.
6.2 TNet
Interval Temporal Logic (ITL) is a flexible notation for defining and modeling time-dependent processes. Unlike most temporal logics, ITL is suitable for presenting and modeling both sequential and parallel processes and compositions. For the needs of the developed reference CPSS space, an agent-oriented interpreter, AjTempura [25], was developed on the basis of the ITL-based Tempura interpretive mechanism. The interpreter executes a subset of ITL; it uses the ITL syntax and follows its basic philosophy of perceiving time as a finite sequence of states, with each state being matched with significant (and available) variables representing the attributes of interest. The operative assistant SA_TNet collects data and constructs an ITL statement or formula reflecting the state of the environment (ViPS), which is sent for analysis to the TNet Analyser Module. After the entered values have been synchronized and verified by the TNet Procedure & Check Module, it is sent for processing to the
Fig. 4 TNet components
TNet Manager. After the context-dependent information has been processed by the TNet Interpreter, it is provided dynamically to the ANet environment for route generation and optimization and for action plans. In order to verify the results obtained, a visual simulator was created, which automatically generates test ITL formulas and statements. The resulting conclusions and analyses are stored in a TNet Repository, where they can be used as additional heuristic knowledge needed by AmbiNet to generate and optimize the search for appropriate user-specific plans and solutions (Fig. 4).
The interaction between AmbiNet and TNet in the Analytical Subspace can be viewed in different ways. Let us turn our attention to the preventive planning phase. At this stage, the RPA must prepare a plan to achieve a specific goal by interacting with the Analytical Subspace. Preparing the preventive plan usually uses the interface to the physical world. In particular, the plan is generated using AmbiNet working in a modeling mode, i.e. supported by AmbiNet planning. In general, the plan is presented in the form of a possible route, which should ensure that the user is present in the necessary place in time to participate in the upcoming event. To explain this, we will look at a typical scenario for helping tourists with motor disabilities. For this group of users, the physical environment plays a fundamental role, and the convergence between the physical and the virtual space is of particular importance. Let us assume that the user, through his/her personal travel assistant, identifies the tourist objects he/she wishes to visit and sends a request to the Analytical Subspace to generate an appropriate route. Since both TNet and AmbiNet need up-to-date information about the real world, they communicate with the Guards system and receive the current values of the monitored parameters.
AmbiNet uses this data to generate a graph of available and connected ambients that are important points for the movement of the disabled tourist, such as ramps, escalators, elevators, etc. The AmbiNet Route Generator needs this information to generate possible routes. In parallel, this data is used by TNet to analyze the current state and the logical rules, which are sent periodically to AmbiNet as additional analytical knowledge of the
Fig. 5 Interaction between the ViPS components
current context (Fig. 5). For example, if TNet provides information that some of the areas that are important for the wheelchair movement at this time of day are usually overloaded, AmbiNet can take it into account when generating and optimizing the right route.
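The interplay described above, in which the Guards report availability and TNet contributes knowledge about overloaded areas, can be sketched as a shortest-path search with penalties. The text only says that various path search algorithms can be used; Dijkstra's algorithm is chosen here as one such option and is not claimed to be the one used in AmbiNet. The function `best_route`, the graph, the travel times, and the penalty values are all hypothetical.

```python
import heapq
from typing import Dict, FrozenSet, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: travel time in minutes}


def best_route(graph: Graph, start: str, goal: str,
               unavailable: FrozenSet[str] = frozenset(),
               congestion: Optional[Dict[str, float]] = None
               ) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra search that skips unavailable nodes and penalizes congested ones."""
    congestion = congestion or {}
    queue: List[Tuple[float, str, List[str]]] = [(0.0, start, [start])]
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, minutes in graph.get(node, {}).items():
            if nxt in unavailable or nxt in settled:
                continue
            extra = congestion.get(nxt, 0.0)   # penalty assumed to come from TNet
            heapq.heappush(queue, (cost + minutes + extra, nxt, path + [nxt]))
    return None


# Hypothetical graph of connected ambients with travel times in minutes.
graph: Graph = {
    "hotel": {"ramp": 5, "side_door": 8},
    "ramp": {"bus_stop": 5, "promenade": 5},
    "side_door": {"promenade": 4},
    "bus_stop": {"museum": 10},
    "promenade": {"museum": 25},
}

print(best_route(graph, "hotel", "museum"))
print(best_route(graph, "hotel", "museum", congestion={"bus_stop": 20}))
print(best_route(graph, "hotel", "museum", unavailable=frozenset({"ramp"})))
```

Modeling congestion as an additive penalty keeps the edge weights non-negative, so the search remains valid; a real optimizer could weight the penalty by time of day, as the example in the text suggests.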
7 A Touristic Guide Assistant
We will demonstrate the adaptation of ViPS for an application called the Tourist Guide. The Tourist Guide (TG) is implemented as a personal assistant that supports tourists on their routes. It surveys the users to elicit their wishes and interests regarding different cultural and historical objects. It takes into consideration the user’s location, the location of the preferred objects, the working hours of museums, and so on. Taking these conditions into account, the TG can generate two types of tourist routes:
• A virtual route: in this case the TG generates a route that is not bound by real-world conditions. The user can see all the cultural or historical objects at any time. The TG shows the user information, pictures, or video about the objects. For this task the TG uses CHH-OntoNet;
• A real route: in this case the TG generates a tourist route taking into account all the constraints and conditions of the real world, the places where the cultural and historical objects are located, and the location of the user. The TG shows information about the objects when the user is near them. For this task it uses the GPS of the user’s mobile device and CHH-OntoNet.
The Tourist Guide works on both stationary and mobile devices. It can inform the user about local attractions, recommend different destinations, offer travel options, and provide detailed information about places that the user plans to visit or has already seen.
Each object has two types of presentation on the guide’s server:
• A cultural and historical object (CHH object): depending on the nature of the presentation, it includes different features in accordance with the CCO standard.
• An ambient: it characterizes the location and condition, as a physical feature in a real location (area), of a separate CHH object or of a group of CHH objects designated as an exposition.
7.1 TG Architecture
The architecture of the TG is described in more detail in [26]; here it is presented briefly. The TG is a multi-agent system including two basic components (Fig. 6):
• A back-end component: it consists of different modules distributed in two layers, a knowledge base and operational assistants, which gather information for the client’s needs and assemble it into an appropriate cultural and historical route;
Fig. 6 General architecture of the TG
• A front-end component: it consists of an intelligent assistant that takes care of presenting the route and object information on the client’s mobile device, using the information generated by the operational assistants in the back-end layer.
The active modules of the back-end component are the following assistants:
• QGA (Questioner Generation Assistant): this operational assistant generates and conducts a survey with the tourist to identify his or her preferences, wishes, and available time. The survey results are used to generate a tourist profile.
• KGA (Knowledge Generation Assistant): using the tourist profile, the assistant selects the elements of the primary route. The primary route elements are expositions or separate CHH objects.
• CCAA (Calculus of Context-aware Ambients Assistant): it generates a final route by completing the primary route with additional information such as the location and status of the expositions (or individual objects), the working hours, etc. The assistant uses the ambient presentation of the CHH objects included in the primary route. In fact, the final route is a set of possible sequences for viewing the objects.
The TGA (Touristic Guide Assistant) operates in the front-end component and performs the following basic functions:
• Serving as the tourist’s GUI: the tourist can only communicate with the guide through this assistant. It is responsible for the proper visualization, on the client’s mobile device, of the information received from the operational assistants. It visualizes the questions that the QGA generates and returns the received answers to it. It is also responsible for visualizing the information about the various cultural and historical objects and the route generated by the CCAA.
• Establishing the tourist’s location: it uses the GPS capabilities of the client’s mobile device to determine his or her position.
• Life cycle management: it prepares a “schedule” for visiting the cultural objects and monitors its observance.
7.2 CHH-OntoNet
CHH-OntoNet (Cultural and Historical Heritage Ontology Network) is a specific adaptation of OntoNet to serve the Tourist Guide Assistant. CHH-OntoNet, the server component of the tourist guide, is a repository implemented as a network of ontologies (Fig. 7). It has two main functionalities:
• Defining expositions of interest to the tourist;
• Preparing information about the objects included in the offered tourist route.
The ontologies are developed in accordance with the Cataloging Cultural Objects (CCO) [27] standard, which contributes to the easy and convenient dissemination and
Fig. 7 Linked ontologies in the domain of the cultural and historical heritage of Bulgaria
sharing of data between different systems, communities, and institutions. Currently, ten ontologies have been developed. All but one of them represent different aspects of the CHH objects; the remaining one, titled Meta-ontology, contains information about the others. The purpose of the Meta-ontology is to support working with the ontology network, especially when the survey is created. The ontologies that describe CHH objects are Costumes, Expositions, Museums, Objects, Materials, Locations, Folklore Regions, Agents, and Subjects. The division of the knowledge into multiple separate ontologies is important for two reasons. First, it makes it easier and more convenient to follow the requirements of the CCO standard. Second, it is an effective way of distributing the maintenance and editing of the knowledge in the ontologies. Also, adding new knowledge to one or more ontologies is relatively easy. Currently, the CHH objects stored in the ontologies are traditional Bulgarian costumes, along with information about the expositions and museums in Bulgaria where they are kept. An important property of the developed ontologies is that all of them are interrelated. For example, Costumes is one of the main ontologies and contains descriptions of the Bulgarian costumes. It references knowledge from other ontologies: clothes as a type of object (along with their basic characteristics) are described in the Objects ontology. The Objects ontology, in turn, uses concepts about the materials needed for creating an object. These terms are defined and described in the Materials ontology. The descriptions of the folklore regions of Bulgaria, represented in the Folklore Regions ontology, use knowledge from the
Locations ontology (cities within the regions, mountains, rivers, plains, and other geographical locations). The museums, with their architecture and history, are described in the Museums ontology. They are regarded as objects of cultural and historical heritage but also as the places where the expositions or costumes are preserved and can be seen. Accordingly, some of the knowledge needed to describe the museums is also defined in other ontologies; for example, the materials used in the construction of a museum are contained in the Materials ontology. Expositions, with their characteristics and the objects included in them, are described in the Expositions ontology. Museums and expositions, as objects, were created by someone, i.e. knowledge from the Agents ontology is used and, by analogy with costumes, they are also defined in the Objects ontology. The Subjects ontology contains knowledge about the historical period of the described objects, which is used in the Costumes, Museums, and Expositions ontologies.
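The cross-references between the ontologies described above can be written down as a small dependency graph. The Python sketch below records only the references that the text states explicitly and computes which ontologies must be consulted, directly or indirectly, to describe an object from a given ontology. The dictionary layout and the function `required_ontologies` are hypothetical and say nothing about how CHH-OntoNet is actually stored.

```python
from typing import Dict, Set

# Direct references between ontologies, as stated in the text.
REFERENCES: Dict[str, Set[str]] = {
    "Costumes": {"Objects", "Subjects"},
    "Objects": {"Materials"},
    "Folklore Regions": {"Locations"},
    "Museums": {"Materials", "Agents", "Objects", "Subjects"},
    "Expositions": {"Agents", "Objects", "Subjects"},
    "Materials": set(),
    "Locations": set(),
    "Agents": set(),
    "Subjects": set(),
}


def required_ontologies(name: str, references: Dict[str, Set[str]]) -> Set[str]:
    """All ontologies reachable from `name` by following references."""
    needed: Set[str] = set()
    stack = [name]
    while stack:
        current = stack.pop()
        for ref in references.get(current, ()):
            if ref not in needed:
                needed.add(ref)
                stack.append(ref)
    needed.discard(name)   # an ontology does not count as its own dependency
    return needed


print(sorted(required_ontologies("Costumes", REFERENCES)))
print(sorted(required_ontologies("Folklore Regions", REFERENCES)))
```

For Costumes the result includes Materials even though Costumes does not reference it directly, because the Objects ontology does; this transitive view is the kind of bookkeeping a meta-level description of the network would need to support.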
7.3 Recognition of Folklore Elements Service (RFES)
A possible extension of the tourist guide is to train it on the sights seen by the user. By pointing the phone camera at materials or objects, the user can receive information about them if the guide is trained to recognize them. Part of the Bulgarian cultural and historical heritage are the national costumes and garments, blankets, coverings, and tunics, decorated with many embroideries and pieces of fancywork. In every region of Bulgaria, the needlework has a specific character, determined by the combination of colors, stitches, symbols, ornaments, sewn beads, coins, tassels, and sequins. These features can identify the place where an embroidery was made. By using machine learning methods, in particular neural network algorithms, the tourist guide can learn to recognize embroideries, to determine whether they are Bulgarian, and to classify them by their area of workmanship. This classification would help it to extract information from the related ontologies and to enrich and expand the user’s knowledge, so that it becomes a kind of personal guide. Traditional Bulgarian embroideries are extremely varied in terms of ornaments, shape, coloring, and symbolism (Fig. 8). The main elements of the embroideries are Slavic, but all the cultures with which Bulgaria has come into contact during its historical development have influenced the needlework. Nevertheless, the embroidery has preserved the specifics of Bulgarian culture and has its own character. The distinctive elements of the embroideries are the symbols they represent, the motifs with which the ornaments are made, the colors of the threads, the types of material, the way of embroidering, the arrangement of the ornaments, etc.
The symbols embedded in the ornaments of the Bulgarian embroideries perform various functions: protective (from diseases and dangers), curative, fertility-related, identificational (birthplace, marital status, social status), and decorative. The most commonly embroidered symbols are “mother goddess”, “tree of life”, “elbetica”, “kanatica”, “rosetta”, “swastika”, “star”, “circle”, “cross”, “rhombus”, etc. (Fig. 9). Some of them are not unique to Bulgaria, because they originate from the
Fig. 8 Map of the Bulgarian embroideries (author: Iren Yamami)
Fig. 9 Kinds of kanatica: 1. wedding, 2. family, 3. kin, 4. nation
Neolithic period. This is why they are also characteristic of many other cultures, such as all the Slavic ones, the peoples of Eastern Europe, the Scandinavian lands, the Eastern Mediterranean, and others. One and the same ornamentation can be accomplished in several ways: with geometric, floral (plants), zoomorphic (animals), or anthropomorphic (human) motifs. The floral motifs are the most widespread and are characteristic of almost all of Bulgaria. A distinctive feature of the Bulgarian embroidery is its polychromy. It includes all the colors of the rainbow; the predominant color is red (although in some places it is displaced by darker tones), and blue and green are mainly used for additional accents. The needlework is made with approximately 20 different stitches. Each region is characterized by a certain combination of several of them, the most common being the slanting and the crossed stitch. Another peculiarity of the embroidery is the size of the stitch, which can be large or small. Ornaments may be symmetrical or asymmetrical, straight or inclined, framed or not, arranged in a line, in squares, or without a particular arrangement. Their elements can be large, small, or mixed. Bulgarian ethnographers have identified several geographic regions that differ in the specific features of their embroideries. However, folklorists disagree on their description. A classification by region is possible, but it requires collecting and interpreting a large number of embroideries (pictures) from all over the country, as well as embroideries that are close in a cultural and historical sense but are made outside the borders of Bulgaria. An embroidery can be formally represented as a vector E whose parameters are the ornaments that make it up: $E = [O_1, O_2, \ldots, O_q]$, $q \ge 1$. An ornament O is a vector of several symbols: $O = [S_1, S_2, \ldots, S_r]$, $r \ge 1$.
A symbol S is a vector with four parameters: value $k \in K$, motif $m \in M$, stitch $b \in B$, and color $c \in C$, where the stitch and the color are those predominant in the symbol. Thus $S = [k, m, b, c]$, where:
• K = {rhombus, elbetica, …, other},
• M = {geometric, floral, zoomorphic, anthropomorphic},
• B = {slanting, crossed, …, other},
• C = {red, brown, black, …}.
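The nesting of symbols into ornaments and of ornaments into an embroidery can be illustrated by a small Python sketch that encodes each symbol S = [k, m, b, c] as four integer indices. The vocabularies below contain only the members named in the text (the elided members are not filled in), and the sample embroidery is invented purely for illustration. The function names `encode_symbol` and `encode_embroidery` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Vocabularies restricted to the members named in the text; the original
# sets are longer, and their elided members are deliberately left out.
K = ["rhombus", "elbetica", "other"]                          # symbol value
M = ["geometric", "floral", "zoomorphic", "anthropomorphic"]  # motif
B = ["slanting", "crossed", "other"]                          # predominant stitch
C = ["red", "brown", "black"]                                 # predominant color

Symbol = Tuple[str, str, str, str]


def encode_symbol(symbol: Symbol) -> List[int]:
    """Encode S = [k, m, b, c] as indices into K, M, B and C."""
    k, m, b, c = symbol
    return [K.index(k), M.index(m), B.index(b), C.index(c)]


def encode_embroidery(ornaments: Sequence[Sequence[Symbol]]) -> List[List[List[int]]]:
    """Encode E as q ornaments, each holding r symbols of four parameters."""
    return [[encode_symbol(s) for s in ornament] for ornament in ornaments]


# An invented embroidery with q = 2 ornaments of r = 2 symbols each.
embroidery = [
    [("rhombus", "geometric", "crossed", "red"),
     ("elbetica", "geometric", "slanting", "black")],
    [("other", "floral", "crossed", "red"),
     ("rhombus", "floral", "other", "brown")],
]
encoded = encode_embroidery(embroidery)
print(encoded)
```

When every ornament holds the same number of symbols, as here, the nested list is a regular q × r × 4 array; ornaments of different lengths would need padding before being treated as a tensor.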
By combining the formal representations of all elements, we obtain a 3D tensor (a matrix of vectors) describing the embroidery:
$$E = [[S_1, S_2, \ldots, S_r]_1, [S_1, S_2, \ldots, S_r]_2, \ldots, [S_1, S_2, \ldots, S_r]_q]$$
or
$$E = [[[k, m, b, c]_1, \ldots, [k, m, b, c]_r]_1, [[k, m, b, c]_1, \ldots, [k, m, b, c]_r]_2, \ldots, [[k, m, b, c]_1, \ldots, [k, m, b, c]_r]_q].$$
The task of the tourist guide can be divided into three separate parts: recognition of an image as an embroidery, identification of the embroidery as Bulgarian, and its classification according to the area of workmanship. All three tasks can be solved with methods based on neural networks. In the tourist guide, the RFES is implemented as a Hopfield network [28]. One of the most effective networks for recognizing images is the convolutional neural network. However, it requires substantial computing resources, training time, and a great deal of data (photos of embroideries). A much simpler option that can solve the first task, recognizing the image as an embroidery, is the Hopfield associative memory.
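To show how a Hopfield associative memory recalls a stored pattern from a distorted input, a minimal pure-Python sketch follows. It uses the standard Hebbian learning rule and asynchronous updates on bipolar vectors; it is not the RFES implementation, and the tiny 8-element patterns merely stand in for binarized embroidery images. The function names and the patterns are hypothetical.

```python
from typing import List, Sequence

Pattern = List[int]  # bipolar "pixels": +1 or -1


def train(patterns: Sequence[Pattern]) -> List[List[float]]:
    """Hebbian learning: sum of outer products with a zero diagonal."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j] / n
    return weights


def recall(weights: List[List[float]], probe: Pattern, max_sweeps: int = 20) -> Pattern:
    """Asynchronous updates in a fixed order until no unit changes."""
    state = list(probe)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            new_value = 1 if field >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


def is_known(state: Pattern, patterns: Sequence[Pattern]) -> bool:
    """The probe is accepted when recall settles on one of the stored patterns."""
    return any(state == list(p) for p in patterns)


# Two hypothetical stored patterns (mutually orthogonal).
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train(stored)

noisy = [-1, 1, 1, 1, -1, -1, -1, -1]     # first pattern with one element flipped
restored = recall(weights, noisy)
print(restored, is_known(restored, stored))
```

The capacity of such a memory is limited to a small fraction of the number of units, which is consistent with the text proposing it only for the simplest of the three tasks.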
7.4 TG AmbiNet
Let us take a look at the following scenario, in which a tourist in a wheelchair, after communicating with the tourist QGA and KGA assistants, wishes to visit an exhibition of national embroideries in a nearby museum. The tourist is in a hotel and can use several devices that are important for the wheelchair: the hotel lift (lift1), the ramp to the street and to the bus station (ramp1), the ramp to the museum (ramp2), and the elevator of the museum (lift2). He/she can also go to the museum by bus or by using the pedestrian promenade (Fig. 10).
Fig. 10 A simple situation in the described scenario [diagram linking Hotel, Lift1, Ramp1, BusStop1, BusStop2, Ramp2, Lift2 and Museum, with links labeled 5 min or 25 min]
All the important objects from the physical world needed for the generation of routes are represented as the following Ambients:
• a1: the lift in the hotel
• a2: the hotel of the tourist
• a3: the smart wheelchair of the tourist
• a4: the ramp between the hotel and the street
• a5: the ramp to the museum
• a6: the lift in the museum
• a7: the exposition in the museum
• a8: the bus
• a9: the bus stop 1
• a10: the bus stop 2.
All physical objects are provided with sensors that transmit information about the availability and connectivity of the object. In addition to the physical Ambients in the scenario, several abstract Ambients are also included:
• a11: the personal touristic assistant
• a12: the Analytical Subspace
• a13: AmbiNet
• a14: the Analytical Subspace
• a15: the Route Generator
• a16: the Route Optimizer
• a17: the Guard Assistants.
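During the pre-planning period, the PA sends a request to the AS to generate an appropriate route. The AS asks the GA for information on the activity, accessibility, and connectivity of the important physical objects, and transmits the request to the ANet. In communication with the TNet, the internal ambient RG generates a list of routes and transmits it to the RO to find an optimal route. This route is returned to the tourist's PA for execution. The processes of the main participating Ambients and the communication between them are presented below:
\[ P_{PTG} \triangleq \left( \begin{array}{l} \mathit{ANet} :: \langle \mathit{PA}_i, \mathit{getRoute}, \mathit{Hotel}, \mathit{Exposition} \rangle . 0 \;\mid \\ \mathit{ANet} :: (\mathit{Route}) . \mathit{WCh} :: \langle \mathit{Route} \rangle . 0 \end{array} \right) \]
\[ P_{ANet} \triangleq \left( \begin{array}{l} !\,\mathit{PA} :: (\mathit{PA}_i, \mathit{getRoute}, \mathit{Hotel}, \mathit{Exposition}) . \mathit{GA} :: \langle \mathit{PA}_i, \\ \quad \mathit{getStatusIoT\_Hotel\_Lift1\_Ramp1\_BS1\_Bus\_BS2\_Ramp2\_Museum\_Lift2\_Exp} \rangle . 0 \;\mid \\ \mathit{GA} :: (\mathit{PA}_i, \mathit{StatusIoT}) . \mathit{RG} \downarrow \langle \mathit{PA}_i, \mathit{getRoute}, \mathit{Hotel}, \mathit{Exposition}, \mathit{StatusIoT} \rangle . 0 \;\mid \\ \mathit{RG} \downarrow (\mathit{PA}_i, \mathit{Route}) . \mathit{PA} :: \langle \mathit{Route} \rangle . 0 \end{array} \right) \]
\[ P_{GA} \triangleq \left( \begin{array}{l} \mathit{ANet} :: (\mathit{PA}_i, \mathit{getStatusIoT\_Hotel\_Lift1\_Ramp1\_BS1\_Bus\_BS2\_Ramp2\_Museum\_Lift2\_Exp}) . \\ \quad \mathit{IoT} :: \langle \mathit{PA}_i, \mathit{getStatus} \rangle . 0 \;\mid \\ \mathit{ANet} :: \langle \mathit{PA}_i, \mathit{StatusIoT} \rangle . 0 \end{array} \right) \]
\[ P_{RG} \triangleq \left( \begin{array}{l} \mathit{ANet} \uparrow (\mathit{PA}_i, \mathit{getRoute}, \mathit{Hotel}, \mathit{Exposition}, \mathit{StatusIoT}) . \\ \quad \mathit{RO} :: \langle \mathit{PA}_i, \mathit{getOptimalRoute}, \mathit{ListRoutes} \rangle . 0 \;\mid \\ \mathit{RO} :: (\mathit{PA}_i, \mathit{Route}) . \mathit{ANet} \uparrow \langle \mathit{PA}_i, \mathit{Route} \rangle . 0 \end{array} \right) \]
\[ P_{RO} \triangleq \mathit{RG} :: (\mathit{PA}_i, \mathit{getOptimalRoute}, \mathit{ListRoutes}) . \mathit{RG} :: \langle \mathit{PA}_i, \mathit{Route} \rangle . 0 \]
Figure 11 shows the animated AmbiNet simulator of the presented example.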
Fig. 11 Animated AmbiNet ccaPL simulator
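The division of work between the Route Generator (RG) and the Route Optimizer (RO) can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the graph topology and the assignment of travel times to links are guessed from the scenario description, and the function names are hypothetical. The sketch is not the ccaPL model itself.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

# ASSUMED links and travel times in minutes (illustrative only).
EDGES: Dict[Tuple[str, str], int] = {
    ("Hotel", "Lift1"): 5,
    ("Lift1", "Ramp1"): 5,
    ("Ramp1", "Ramp2"): 25,       # pedestrian promenade
    ("Ramp1", "BusStop1"): 5,
    ("BusStop1", "BusStop2"): 5,  # bus ride
    ("BusStop2", "Ramp2"): 5,
    ("Ramp2", "Lift2"): 5,
    ("Lift2", "Museum"): 5,
}

Route = Tuple[int, List[str]]  # (total minutes, visited objects)


def neighbours(node: str, available: Set[str]) -> Iterator[Tuple[str, int]]:
    """Yield reachable neighbours whose sensors report them as available."""
    for (a, b), minutes in EDGES.items():
        if a == node and b in available:
            yield b, minutes
        elif b == node and a in available:
            yield a, minutes


def generate_routes(start: str, goal: str, available: Set[str]) -> List[Route]:
    """Route Generator: enumerate all simple paths through available objects."""
    routes: List[Route] = []

    def walk(node: str, path: List[str], minutes: int) -> None:
        if node == goal:
            routes.append((minutes, path))
            return
        for nxt, cost in neighbours(node, available):
            if nxt not in path:
                walk(nxt, path + [nxt], minutes + cost)

    if start in available and goal in available:
        walk(start, [start], 0)
    return routes


def optimal_route(routes: List[Route]) -> Optional[Route]:
    """Route Optimizer: pick the route with the smallest total time, if any."""
    return min(routes, default=None)
```

Passing the set of currently available objects mirrors the status information that the Guard Assistants collect from the sensors: removing a lift or a bus stop from the set removes every route that depends on it.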
8 Conclusion and Future Directions
This paper presents a generic architecture for CPSS applications known as ViPS. Recently, we have developed an intelligent personal assistant designed to support tourists who want to be acquainted with the cultural and historical heritage of Bulgaria. With this app we wish to demonstrate the capabilities of ViPS to be adapted to various domains. Our future plans are in several directions. The first one is to complete the implementation of the Touristic Guide Assistant; in particular, we intend to extend and refine the assistant’s graphical user interface using virtual and cross reality. A subsequent adaptation domain would be intelligent agriculture. In cooperation with two agricultural institutes in the region of Plovdiv, we are building an infrastructure known as “Agriculture 2.0—Plovdiv”, which is based on ViPS.
Acknowledgements The authors wish to acknowledge the partial support of the MES by the Grant No. D01-221/03.12.2018 for NCDSC—part of the Bulgarian National Roadmap on RIs.
References
1. S. Stoyanov, I. Ganchev, I. Popchev, M. O’Droma, From CBT to e-Learning. J. Inf. Technol. Control. No. 4, 2–10 Year III (2005), ISSN 1312-2622
2. S. Stoyanov, I. Popchev, E. Doychev, D. Mitev, V. Valkanov, A. Stoyanova-Doycheva, V. Valkanova, I. Minov, DeLC Educational Portal, Cybernetics and Information Technologies (CIT), vol. 10, No 3, (Bulgarian Academy of Sciences, 2010), pp. 49–69
3. S. Stoyanov, D. Orozova, I. Popchev, Virtual Education Space: Present and Future, Jubilee scientific conference with international participation, (BFU, Sept 20–21, 2016), pp. 410–418
4. S. Stoyanov, Context-Aware and adaptable eLearning systems. Internal report, software technology research laboratory, (De Montfort University, Leicester, UK, Aug 2012)
5. S. Stoyanov, I. Popchev, E-Learning infrastructures. Technosphere J. BAS. 4(30), 38–45 (2015), ISSN 1313-3861
6. T. Glushkova, S. Stoyanov, I. Popchev, S. Cheresharov, Ambient-Oriented Modeling in a Virtual Educational Space, Comptes rendus de l’Académie bulgare des Sciences, Tome. 71(3), pp. 398–406 (2018)
7. S. Stoyanov, A. Stoyanova-Doycheva, T. Glushkova, E. Doychev, Virtual Physical Space: An architecture supporting internet of things applications. XX-th International Symposium on Electrical Apparatus and Technologies SIELA, (Bourgas, Bulgaria, 2018), 3–6 June
8. S. Stoyanov, D. Orozova, I. Popchev, Internet of things water monitoring for a smart seaside city. XX-th International Symposium on Electrical Apparatus and Technologies SIELA, (Bourgas, Bulgaria, 2018), 3–6 June
9. T. Glushkova, M. Miteva, A. Stoyanova-Doycheva, V. Ivanova, S. Stoyanov, Implementation of a personal internet of thing tourist guide. Am. J. Comput., Commun. Control. 5(2), 39–51 June (2018), ISSN: 2375-3943
10. S. De, Y. Zhou, I. L Abad, K. Moessner, Cyber–physical–social frameworks for urban big data systems: A Survey. Appl. Sci. (2017)
11. P. Wang, L. T. Yang, J. Li, An edge cloud-assisted CPSS framework for smart city. IEEE Cloud Comput. 5(5), 37–46, Sept 2018, https://doi.org/10.1109/MCC.2018.053711665
12. P. Chamoso, A. González-Briones, S. Rodríguez, J. M. Corchado, Tendencies of technologies and platforms in smart cities: A state-of-the-art review. Wirel. Commun. Mob. Comput. (2018)
13. M. Kang, X.R. Fan, J. Hua, H. Wang, X. Wang, F.Y. Wang, Managing traditional solar greenhouse with CPSS: A just-for-fit philosophy. IEEE Trans. Cybern. 48(12), 3371–3380 (2018). https://doi.org/10.1109/TCYB.2018.2858264
14. B. Guo, Z. Wang, Z. Yu, Y. Wang, N. Yen, R. Huang, X. Zhou, Mobile crowd sensing and computing: The review of an emerging human-powered sensing paradigm. ACM Comput. Surv. 48, 1–31 (2015)
15. Y. Zhou, S. De, K. Moessner, RealWorld city event extraction from twitter data streams. Proced. Comput. Sci. 98, 443–448 (2016)
16. A. González-Briones, F. De La Prieta, M. Mohamad, S. Omatu, J. Corchado, Multi-agent systems applications in energy optimization problems: A state-of-the-art review. Energies 11(8), 1928 (2018)
17. B. Moszkowski, Compositional reasoning using interval temporal logic and tempura, Lecture Notes in Computer Science, vol. 1536, (Springer, 1998), pp. 439–464
18. F. Siewe, H. Zedan, A. Cau, The calculus of context-aware ambients. J. Comput. Syst. Sci. 2010
19. Z. Guglev, S. Stoyanov, Hybrid approach for manipulation of events in the Virtual Referent Space. International Scientific Conference “Blue Economy and Blue Development”, (BFU, Burgas, 2018), 1–2 June
20. M. Wooldridge, An Introduction to multiagent systems, (Wiley, 2009)
21. Dictionary.com, “Definition of event”. Online Available. http://www.dictionary.com/browse/event?s=t
22. Telecom Italia Lab, “JADE v4.5.0 API”. Online Available. http://jade.tilab.com/doc/api/index.html
23. F. Luigi Bellifemine, G. Caire, G. Dominic, Developing multi-agent systems with JADE, (John Wiley, Inc., 2007)
24. L. Cardelli, A. Gordon, Mobile ambients. Theoret. Comput. Sci. 240, 177–213 (2000)
25. V. Valkanov, A. Stoyanova-Doycheva, E. Doychev, S. Stoyanov, I. Popchev, I. Radeva, AjTempura: First software prototype of C3A model. IEEE Conference on Intelligent Systems, vol. 1, pp. 427–435 (2014)
26. T. Glushkova, M. Miteva, A. Stoyanova-Doycheva, V. Ivanova, S. Stoyanov, Implementation of a personal internet of thing tourist guide. Am. J. Comput., Commun. Control. 5(2), 39–51 (June 2018). ISSN: 2375-3943
27. CCO Commons, http://cco.vrafoundation.org/index.php/aboutindex/who_is_using_cco/
28. J.J. Hopfield, Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79(8), 2554–2558 (1982)
Analysis of Data Exchanges, Towards a Tooled Approach for Data Interoperability Assessment
Nawel Amokrane, Jannik Laval, Philippe Lanco, Mustapha Derras and Nejib Moala
Abstract Data interoperability implies setting up data exchanges among collaborating intra- and inter-enterprise Information Systems (IS). The multiplicity of these exchanges generates complexity and creates control needs that can be handled by establishing monitoring and analysis systems. With the aim of assessing the level of data interoperability among information systems, we present in this work an analysis system that exploits services provided by RabbitMQ, a messaging-based communication means, in order to collect information about IS interactions. We propose a Messaging Metamodel that aggregates the information collected from RabbitMQ. It provides a single point of control and makes it possible to derive indicators about potential interoperability problems. This work also presents the use of Moose, a software analysis platform, to implement indicators, queries and visualizations related to data interoperability. An industrial case study of interactions among existing systems is presented to showcase the feasibility and the interest of our approach.
Keywords Data interoperability · Interoperability assessment · Data visualization · Monitoring · Message brokers
N. Amokrane (B) · P. Lanco: Berger-Levrault, Lyon, France
J. Laval · N. Moala: DISP Lab EA4570, University Lumière Lyon 2, Lyon, Bron, France
M. Derras: Berger-Levrault, Labège, France
© Springer Nature Switzerland AG 2020 R. Jardim-Goncalves et al. (eds.), Intelligent Systems: Theory, Research and Innovation in Applications, Studies in Computational Intelligence 864, https://doi.org/10.1007/978-3-030-38704-4_15
N. Amokrane et al.
1 Introduction
When put into production, information systems (IS) are mostly part of distributed networks in which exchanging data is indispensable to fulfilling their mission. If these systems are able to share and exchange information without depending on a particular actor and can function independently of each other, we can qualify them as interoperable systems [1]. Interoperability is a property defined as “the ability of two or more systems or components to exchange information and to use the information that has been exchanged” [2]. When established among communicating IS, it increases the productivity and efficiency of inter- and intra-enterprise processes. This is technically enabled by automating information exchanges through shared exchange formats and appropriate communication protocols. In order to achieve data interoperability [3], Berger-Levrault¹ (BL) implements such information exchanges within its IS ecosystems by setting up middleware to mediate among heterogeneous systems. This makes it possible to build flexible, scalable and loosely coupled architectures, such as service-oriented and event-driven architectures (SOA, EDA). These architectures rely on several communication means to convey data across the network; in this work we focus on messaging mechanisms. The latter are used to send messages among interoperable systems, from a source (publisher) to one or more recipients (consumers), by using specific routings within a push mechanism. In this context, and aiming to fully harness its integration architectures and build robust interactions, Berger-Levrault has developed BL-MOM (Message Oriented Middleware), an in-house framework. It establishes routes between communicating programs following, in a first phase, the AMQP protocol [4]. The exchange of messages is handled according to publish/subscribe or asynchronous request/reply patterns [5], which allows the systems to be loosely coupled.
BL-MOM uses RabbitMQ,² a reliable open source communication mediator, as the underlying message broker and provides helpers that facilitate the creation of message schemas, publishers, consumers and messaging operations over this broker. BL-MOM has been developed as a first step towards building a robust interoperability framework that handles scalable and configurable architectures. BL-MOM represents the baseline infrastructure for messaging. It implements configurations that guarantee service availability, delivery and persistence of messages. However, other features are needed to build a larger-scope sustainable interoperability framework [6], such as:
• Facilitate intelligent auto-configuration in order to integrate new interoperability interactions and data exchanges
• Sustain several communication protocols
• Fluently support multiple deployment modes and environments
• Provide data transformations to adapt to common exchange formats
• Fully secure the architecture components and the exchanged data
• Provide means of control, assessment and maintenance of interoperability interactions.
These features are essential to establish scalable, maintainable and secure architectures in a constantly changing environment with multiple interactions. In this work, we contribute to the last feature by building means of control over the message interactions that take place. We propose to supervise the underlying communication means, RabbitMQ, which implements the messaging actions. We use the information collected about the existing data exchanges to offer indicators, queries and visualizations that provide a single point of control over the existing architecture configurations and the messaging behavior. We thereby contribute to assessing the effective interoperability of the communicating systems. In the remainder of this chapter, Sect. 2 defines the messaging-related terms used. Section 3 explains our stance regarding the use of data exchange analysis for interoperability assessment. We then present the motivation behind this work in Sect. 4, along with related work. Section 5 presents the proposed analysis system, its underlying metamodel and a subset of the reported queries and indicators that enable the evaluation of data interoperability. Section 6 describes implementation elements and Sect. 7 presents results illustrated by an industrial case study. Section 8 concludes this chapter and opens perspectives.
¹ Software provider specialized in the fields of education, health, sanitary, social and territorial management.
² https://www.rabbitmq.com/.
2 Terms and Vocabulary
In this work, we use some specific terms related to IS messaging and to the RabbitMQ implementation of the AMQP protocol [4]. We define them in this section.
• Connection: a TCP network connection between an application and the RabbitMQ broker
• Channel: a stream of communications between two AMQP peers
• Message: a message is composed of a header and a body. The header contains the properties of the message presented in a specific type of format. The body, or payload, is the transiting application data, also presented in a specific type of format
• Exchange: a named entity that receives messages from producers and routes them to queues
• Queue: a named entity that holds messages and delivers them to consumers
• Routing key: a virtual address that an exchange may use to route messages towards queues
• Publisher: a client application that publishes messages to exchanges
• Consumer: a client application that requests messages from queues.
The links between some of these elements are depicted in Fig. 1.
Fig. 1 AMQP exchange and queuing system [4]
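The relationship between exchanges, bindings, routing keys and queues can be illustrated with a toy in-memory model. This is a sketch for explanation only: the `Broker` class and its methods are hypothetical, they model only exact-match (direct) routing, and they are not the RabbitMQ client API.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class Broker:
    """Toy in-memory model of direct-exchange routing (not the RabbitMQ API)."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[dict]] = {}
        # exchange name -> list of (routing key, queue name) bindings
        self.bindings: Dict[str, List[Tuple[str, str]]] = {}

    def bind(self, exchange: str, routing_key: str, queue: str) -> None:
        """Declare the queue if needed and bind it to the exchange."""
        self.queues.setdefault(queue, deque())
        self.bindings.setdefault(exchange, []).append((routing_key, queue))

    def publish(self, exchange: str, routing_key: str, body: str,
                headers: Optional[dict] = None) -> int:
        """Route a message to every queue bound with a matching key.

        Returns the number of queues that received the message.
        """
        message = {"headers": headers or {}, "body": body}
        targets = [q for key, q in self.bindings.get(exchange, []) if key == routing_key]
        for queue in targets:
            self.queues[queue].append(message)
        return len(targets)

    def consume(self, queue: str) -> Optional[dict]:
        """Hand the oldest pending message of a queue to a consumer, if any."""
        pending = self.queues[queue]
        return pending.popleft() if pending else None
```

A publisher never addresses a queue directly: it only names an exchange and a routing key, which is what keeps publishers and consumers loosely coupled. A message published with a key that matches no binding reaches no queue.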
3 Data Exchanges Analysis for Data Interoperability Assessment
We aim in this work to evaluate and maintain effective interoperability, that is to say, interoperability that is already established among interacting systems [7] and that we assess once the communication has started or is completed. Furthermore, among the interoperability concerns (data, services, processes and business) [3], we address data interoperability in this work, meaning that we consider the aspects that make systems with different data models work together. As explained above, the heterogeneity of systems is handled by setting up middleware such as BL-MOM, which relies on messaging mechanisms. The latter make it possible to build connectors that route data from publishers to consumers regardless of their technology, data models, operating systems or database management systems, as long as they share a common exchange format and are able to publish or use received data. We argue that such middleware infrastructures convey enough information about the network’s architecture and interactions to enable the assessment of the level of data interoperability. We demonstrate in this work that the behavior of data exchanges among communicating IS gives an indication of their data interoperability level. The analysis of existing data exchanges allows us to provide elements that highlight interoperability problems and their potential causes (Table 1). Moreover, the multiplicity of the established data interoperability exchanges generates complexity that must be handled. Besides, information systems are dynamic and evolve in changing environments. Interfaces and exchange formats that are valid today can be obsolete tomorrow. It is therefore important to supervise the network’s interactions in order to anticipate, or react promptly to, potential interoperability dysfunctions. We propose to continuously assess the architecture related to data exchanges and the existing interactions in order to provide indications about effective data interoperability.
Table 1 Interoperability requirements, related interoperability problems, messaging indicators and potential causes of the problems

Interoperability requirement [21]: “Partners have the required permissions to carry out data exchanges”
  Interoperability problem: Missing permission to exchange data
    Messaging indicators: Presence of the user and the required permissions on the related virtual host; Validity of user credentials
    Potential causes: Deleted user; Deleted virtual host; Not communicated permissions update; Not communicated credentials update; Error in publishing or consuming properties attribution

Interoperability requirement [21]: “Data is exchanged among partners”
  Interoperability problem: Missing publisher
    Messaging indicators: Presence of the publisher’s related exchanges; Intervals between publications for inactivity periods of publishers
    Potential causes: Application shutdown; Problem of missing permission to exchange data
  Interoperability problem: Missing consumer
    Messaging indicators: Presence of the consumer’s related queues; Intervals between connections for inactivity periods of consumers
    Potential causes: Application shutdown; Problem of missing permission to exchange data
  Interoperability problem: Missing interaction
    Messaging indicators: Date of last published message on the concerned exchanges; Date of last consumed messages on the concerned queues; Date of last connection of the consumer
    Potential causes: Unnecessity/obsolescence of the interaction due to an architecture configuration change or a process change; Error in publishing or consuming properties attribution; Missing binding between the related exchanges and queues; Problem of missing publisher; Problem of missing consumer; Problem of invalid exchange format

Interoperability requirement [21]: “Received data is conform to required data”
  Interoperability problem: Invalid exchange format
    Messaging indicators: Rejected messages on the concerned queues; Exchange format of last accepted message on the concerned queues
    Potential causes: Incompatible exchange formats; Not communicated evolution of the exchange format

Interoperability requirement [21]: “Exchanged data is available”
  Interoperability problem: Unavailable expected data
    Messaging indicators: Lost messages on the concerned channels (filtering by known message elements)
    Potential causes: Problem of missing interaction; Problem of missing permission to exchange data

Interoperability requirement [21]: “The exchanged data is only accessible by authorized entities”
  Interoperability problem: Data leak
    Messaging indicators: Presence of unauthorized consumers; Authentication attempts
    Potential causes: Credential leak or error in credential attribution to consumers; Error in consuming properties attribution

Interoperability requirement [21]: “Effective exploitation time of data is less or equal to defined exploitation time”
  Interoperability problem: Data transmission delay
    Messaging indicators: Time to consume, time to acknowledge (filtering messages by known message elements)
  Interoperability problem: Response delay
    Messaging indicators: Time to respond (filtering messages by known message elements)
  Potential causes (both problems): Local bugs; Problem of missing consumer; Problem of missing interaction

4 Motivation and Related Work
Interoperability assessment of enterprise processes and systems evaluates the ability to undertake common activities or exchange data. Several interoperability assessment frameworks and approaches have been proposed since the emergence of the concept of interoperability [2], as developing and improving interoperability implies that the level of interoperability can be assessed and that causes of failure can be identified and analyzed [8]. Interoperability has become an important asset for enterprises and is now proposed as a key performance indicator in business process performance management systems [9]. Maturity models [10–15] were the first interoperability assessment approaches to be used. They make it possible to evaluate interoperability against a set of predefined levels and provide recommendations for moving from one level to the next in order to achieve a required level or reach full maturity. However, maturity models do not precisely indicate the causes of non-interoperability and mainly focus on general qualitative notions [16]. Other methods provide a quantitative measurement of the level of interoperability. Daclin et al. [17] propose an approach to determine the degree of interoperability using yes-or-no questions about the conceptual, organizational and technological compatibility of systems. Interoperability performance is also considered in terms of cost, time and quality of interoperation. Ford et al. [18] provide an interoperability score through a measure of operational effectiveness; the score takes into consideration whether a third-party intervention is needed to allow the interoperation. These approaches provide the fundamental concepts that allow formalizing and evaluating interoperability by indicating whether interoperability problems exist or not. Based on these concepts, and in order to precisely locate interoperability problems among collaborative processes, Mallek et al. [19] define a set of interoperability requirements that should be verified in order to achieve interoperability. The requirements are structured in terms of compatibility, reversibility and interoperation properties defined for each interoperability concern (data, services, processes and business). This formalization allows us to select the appropriate interoperability requirements to be checked.
According to our analysis objectives, we propose to focus on data-related requirements to help evaluate effective data interoperability by analyzing existing data exchanges among interoperable systems. This allows us to precisely detect interoperability problems and thereby identify their potential causes. We use the selected data interoperability requirements to analyze existing data exchanges. This is performed, as explained in Sect. 3, by first supervising and monitoring the underlying message broker that is used as a communication channel among the interoperable systems. When considering existing open source and commercial integration frameworks (Apache Camel³,⁴, NServiceBus⁵) that provide monitoring consoles for data exchanges, we notice that they mostly focus on low-level monitoring information such as the frequency of messages, technical performance indicators or memory usage. Besides, these frameworks only provide partial solutions with respect to Berger-Levrault’s goals for a sustainable interoperability framework. RabbitMQ itself offers a management console with information related to the structure of the messaging system and the status of messages [20]. It presents lists of existing resources (channels, exchanges, queues, etc.), their content, characteristics and a set of statistics. It is, for example, possible to access queues and check the pending messages. Based on our experience, the RabbitMQ console can be used for real-time monitoring of the messaging system and is suitable for specific queries where the maintainer knows the queues or exchanges that must be tracked. However, it does not allow advanced querying and filtering over the resources and the transiting messages, which is especially needed in the case of multiple interoperability exchanges with thousands of messages. Keeping track of in-transit messages is, for instance, not possible, as consumed messages are no longer presented in the management console. Besides, messaging resources such as exchanges, queues and their bindings are volatile and can be deleted when the consumer disconnects, and the console does not provide a visualization of the history of existing resources. It is, for example, not possible to identify all the consumers that a resource has had. As the behavior of data exchanges among communicating applications gives an indication of their data interoperability level, a monitoring system should provide elements that help maintain a good level of interoperability. The lack of the above-mentioned control elements, however, complicates the analysis and diagnosis of interoperability problems such as the inactivity of publishers and consumers, the invalidity of exchange formats or the unavailability of data: it is difficult for maintainers to identify the context and the origin of a problem based only on the RabbitMQ management console. We therefore propose to take advantage of other RabbitMQ services, such as message and event tracing, and to combine them with information provided by BL-MOM in order to perform advanced monitoring and querying. This would provide indicators that highlight interoperability problems, thereby improving the determination of maintenance actions.
³ http://sksamuel.github.io/camelwatch.
⁴ http://rhq-project.github.io/rhq/.
⁵ https://particular.net/nservicebus.
5 Messaging Analysis System for Data Interoperability Assessment
In order to assess effective data interoperability, we propose to analyze messaging-based data exchanges and their related architecture by supervising and monitoring RabbitMQ, the underlying message broker used among communicating systems. We query RabbitMQ and the log services it provides and aggregate the collected information into a common metamodel, the Messaging Metamodel. In the following, we present the metamodel elements, depicted in Fig. 2. We then detail how these elements are collected and what information can be reported, which makes it possible to provide high-level indicators that facilitate interoperability assessment and maintenance.
Fig. 2 Messaging metamodel
5.1 Messaging Metamodel
The Messaging Metamodel describes the messaging structure implemented through a message queuing and exchange system following the AMQP protocol. The metamodel also integrates dynamic aspects: the lifecycle of architecture components is captured by recording creation and deletion dates and timestamps. Messaging concepts are linked with business-related concepts, which provides the business context of the interactions. As presented in Fig. 2, each message carries application data within the payload, where the data is formatted according to an exchange format. It is published to an exchange and then routed to zero or more queues according to routing keys that are defined via bindings. The latter can be set up between an exchange and a queue or among exchanges. The architecture components represent publishers or consumers; they are linked to resources (exchanges and queues) through connection channels. The publisher and consumer clients connect to the broker with user credentials, and every user has specific permissions. User authentications to access the broker are traced along with their success state. A node is a RabbitMQ server; several nodes can be established to handle large-scale, geographically distributed architectures. RabbitMQ can also function in cluster mode, load balancing one broker instance over several nodes. Within a node, RabbitMQ separates groups of resources with virtual hosts. The latter provide logical grouping and isolation of resources that also share a common authentication. The metamodel aggregates information from several sources. In the following, we present each source and the information it conveys:
• Message traces provided by the RabbitMQ tracing plug-in.⁶ These make it possible to identify, for each message:
  – The RabbitMQ node it transits through, along with the connection and virtual host information
  – The exchange it was published in or consumed from, the queues it is routed to or the queue it is consumed from, and the related routing keys
  – The user publishing or consuming the message
  – Its timestamp, type (published/received) and delivery mode (persistent or not).
• Current configuration of the broker in terms of exchanges, queues, their bindings, and existing users and permissions. This is audited through the RabbitMQ REST management API.⁷
• History of events of creation and deletion of resources, virtual hosts, users and permissions; creation and closing of connections and channels; and user authentication attempts. This is provided by the RabbitMQ Event Exchange plug-in.⁸
• Contextual elements about the characteristics of the communicating applications, provided by BL-MOM.
⁶ https://www.rabbitmq.com/firehose.html.
⁷ https://pulse.mozilla.org/api.
⁸ https://www.rabbitmq.com/event-exchange.html.
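A few of the metamodel elements described above can be sketched as data classes. This is an illustrative reading of the prose, not the authors' metamodel from Fig. 2: all class and field names are assumptions, and only a subset of the concepts (resources with a lifecycle, bindings, message traces) is covered.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Resource:
    """A broker resource with its lifecycle (field names are assumptions)."""
    name: str
    vhost: str
    created_at: datetime
    deleted_at: Optional[datetime] = None

    def alive_at(self, moment: datetime) -> bool:
        """True if the resource existed at the given moment."""
        return self.created_at <= moment and (
            self.deleted_at is None or moment < self.deleted_at
        )


@dataclass
class Exchange(Resource):
    pass


@dataclass
class Queue(Resource):
    pass


@dataclass
class Binding:
    source: Exchange
    destination: Resource  # a queue, or another exchange
    routing_key: str


@dataclass
class MessageTrace:
    message_id: str
    timestamp: datetime
    kind: str                # "published" or "received"
    exchange: str
    routing_keys: List[str]
    queue: Optional[str]     # set for received messages
    user: str
    node: str
    vhost: str
    exchange_format: Optional[str] = None  # business context, e.g. supplied by the middleware
```

Keeping creation and deletion timestamps on every resource is what allows a past configuration to be reconstructed: filtering resources with `alive_at` for a given moment yields the exchanges and queues that existed then.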
5.2 Analysis Queries
Combining these sources gives us a single point of control and lets us perform advanced queries that facilitate interoperability maintenance actions. Here is a subset of the queries and indicators that can now be reported:
i. Business level queries. BL-MOM enriches message traces with high level information, i.e. information with an identifiable business meaning, such as publisher and consumer application identifiers, tenants, topics and exchange formats. This makes it possible, for instance, to take into account the tenant identifier of the application that publishes or consumes messages in the case of a multi-tenant application. Similarly, the topics to which messages have been published can be specified when a publish/subscribe messaging pattern is used. Such information helps maintainers specify the context of the interactions.
ii. Messages filtering. Messages can be filtered according to one or a combination of their characteristics, such as identifier, timestamp, exchange, queue, publisher, consumer, related user, state, size, encoding or exchange format. This allows more precise searches in message traces. For example, filtering the messages that transit between a publisher and a consumer within a time slot makes it possible to inspect the behavior of the interaction during a period in which a breakdown occurred. We can also check whether a message is duplicated, redelivered or rejected.
iii. Security checks. Supervision of the security of message exchanges can be improved by checking several elements:
• User authentication timestamps and the success or failure of each authentication. This contributes to the detection of violation attempts.
• The set of users, and their corresponding consumers, that have similar permissions on a set of resources, in order to prevent potential data leaks.
• Payload content, when this is allowed, to check whether confidential elements are encrypted.
iv. Architecture evolution. The ability to track resource creation and deletion makes it possible to depict current and past RabbitMQ messaging configurations in terms of exchanges, queues and their bindings. Furthermore, relying on the business information provided by BL-MOM, we can identify the communicating applications even if their technical identifiers have changed at the broker level. We can accordingly provide a larger scope visualization of the architecture components and their interactions, along with their evolution over time.
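To make the filtering queries of item ii more concrete, the following Python sketch filters traced messages by time slot, publisher and consumer, and lists identifiers that were published more than once. It is only an illustration under stated assumptions: the prototype itself is built on Moose, and the `TracedMessage` record, its field names, the state values and the application names are hypothetical, not taken from the messaging metamodel.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class TracedMessage:
    """Hypothetical flat view of one traced message (illustrative field names)."""
    message_id: str
    timestamp: datetime
    publisher: str
    consumer: str
    state: str  # assumed values: "published", "redelivered", "rejected", ...


def filter_messages(messages: Iterable[TracedMessage],
                    start: datetime,
                    end: datetime,
                    publisher: Optional[str] = None,
                    consumer: Optional[str] = None) -> List[TracedMessage]:
    """Keep messages traced in [start, end) that match the optional criteria."""
    return [
        m for m in messages
        if start <= m.timestamp < end
        and (publisher is None or m.publisher == publisher)
        and (consumer is None or m.consumer == consumer)
    ]


def duplicated_ids(messages: Iterable[TracedMessage]) -> List[str]:
    """Return identifiers that appear more than once in the 'published' state."""
    counts = Counter(m.message_id for m in messages if m.state == "published")
    return sorted(mid for mid, n in counts.items() if n > 1)


# Made-up sample data, for illustration only.
messages = [
    TracedMessage("m1", datetime(2020, 1, 6, 9, 0), "app-A", "app-B", "published"),
    TracedMessage("m2", datetime(2020, 1, 6, 9, 30), "app-A", "app-C", "published"),
    TracedMessage("m1", datetime(2020, 1, 6, 9, 45), "app-A", "app-B", "published"),
    TracedMessage("m3", datetime(2020, 1, 6, 11, 0), "app-A", "app-B", "rejected"),
]

slot = filter_messages(messages,
                       start=datetime(2020, 1, 6, 9, 0),
                       end=datetime(2020, 1, 6, 10, 0),
                       publisher="app-A",
                       consumer="app-B")
print([m.message_id for m in slot])
print(duplicated_ids(messages))
```

Treating the time slot as a half-open interval avoids counting a message twice when adjacent slots are inspected one after the other.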
5.3 Indicators for Messaging Based Interoperability Requirements
The proposed indicators and queries are used to precisely detect and locate interoperability problems in order to maintain a good level of interoperability. To define what a good level of interoperability is, we rely on interoperability requirements, i.e. what should be undertaken and maintained when establishing interoperability interactions among communicating systems. We rely on the compatibility and interoperation requirements defined by Mallek [21]. Compatibility requirements are considered invariable throughout the collaboration; they represent abilities or characteristics that partners must satisfy before the collaboration becomes effective. Interoperation requirements, in contrast, vary during the collaboration and are related to the performance of the interaction. Both types of requirements are defined for each interoperability concern (data, services, processes and business). In line with our analysis objectives, we focus on data related requirements. As our focus is on messaging mechanisms as a way to convey data among interoperable systems, we instantiate the proposed requirements by referencing messaging paradigm concepts, defined in the messaging metamodel, in order to match locally identifiable elements. In this way we help to precisely locate problems among the messaging components that are used to guarantee data interoperability. In the following, we present a messaging interpretation of the data related interoperability requirements.
Compatibility requirements are verified by Berger-Levrault prior to the establishment of interoperability exchanges for intra- or inter-enterprise collaborations. Prior to each collaboration:
• “Partners have the required permissions to carry out data exchanges”: in order to set up the data exchanges, publishers and consumers are provided with access credentials so that they can publish to and consume from a given broker node. The related node’s host and port are provided along with the user’s name and password and the virtual host that the user has permission to write to or read from.
• “Partners provide permissions for data updates”: in an automated process, consuming data may imply modifications to a partner’s own data. This is why partners agree on the actions to be performed when exploiting the provided data and give their authorization regarding the implications that these actions have on their data.
• “Exchanged data is formalized and unambiguous”: the publisher’s and consumer’s development teams design (together or by accepting given choices) a shared exchange format. The syntax and semantics of the data are defined, so that the partners share a common, unambiguous understanding of the exchanged data and its format.
• “Confidential data is secured”: partners agree on means of securing confidential shared data. They decide, for example, to encrypt all or parts of the message’s payload and share decoding keys in order to be able to exploit the conveyed data.
Interoperation requirements are related to elements that can vary or be altered during interactions. They must therefore be supervised. To do so, we inspect the behavior of the data exchanges that take place, as depicted through the population of the messaging metamodel. We present in Table 1 how data interoperability problems can be highlighted by providing indicators or possible queries that precisely define the context of the problem. A data leak can, for instance, be detected by the presence of unauthorized consumers, or be prevented by supervising authentication attempts. Interoperability problems represent situations where the requirements are not verified. Along with the indicators that reveal these problems, we provide potential causes and point out existing correlations between interoperability problems (by referencing other problems in the causes). If, for example, partners show no interaction at data level over a certain period (requirement “Data is exchanged among partners” not verified), this can indicate that the interaction is no longer needed or has become obsolete because of a configuration change (consuming on another channel) or a change at the process level. It can also indicate a permission problem or the invalidity of the exchange format. Maintainers inspect the proposed messaging indicators to determine maintenance actions, taking the identified potential causes into account.
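Two of the indicators mentioned above, the presence of unauthorized consumers and the absence of interaction over a period, can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in the prototype: the data structures, queue names, application names and the seven-day threshold are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set, Tuple


def unauthorized_consumers(observed: Iterable[Tuple[str, str]],
                           expected: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (queue, consumer) pairs seen in the traces but not declared as expected.

    observed: (queue, consumer) pairs extracted from the traces.
    expected: for each queue, the set of consumers that are allowed to read it.
    """
    return sorted(
        (queue, consumer)
        for queue, consumer in set(observed)
        if consumer not in expected.get(queue, set())
    )


def idle_interactions(last_activity: Dict[Tuple[str, str], datetime],
                      now: datetime,
                      max_idle: timedelta) -> List[Tuple[str, str]]:
    """Return (publisher, consumer) pairs with no traced message for more than max_idle."""
    return sorted(pair for pair, seen in last_activity.items() if now - seen > max_idle)


# Made-up sample data, for illustration only.
expected = {
    "q.user-access": {"provisioning-app"},
    "q.user-rights": {"auth-app"},
}
observed = [
    ("q.user-access", "provisioning-app"),
    ("q.user-access", "logger-app"),  # not declared for this queue
    ("q.user-rights", "auth-app"),
]
last_activity = {
    ("console-app", "provisioning-app"): datetime(2020, 1, 6, 16, 40),
    ("console-app", "auth-app"): datetime(2019, 12, 1, 10, 0),
}

leaks = unauthorized_consumers(observed, expected)
idle = idle_interactions(last_activity,
                         now=datetime(2020, 1, 7, 9, 0),
                         max_idle=timedelta(days=7))
print(leaks)
print(idle)
```

A consumer flagged by the first check is only a candidate problem. As in the text, the maintainer still has to decide whether it reflects a data leak or an intended subscription that was never documented.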
6 Implementation
We implemented a prototype tool⁹ as a first step towards developing the proposed messaging analysis system. It makes it possible to visualize some of the behaviors of data interactions. The prototype aims to help maintainers understand the status of each message and gives indications that help analyze interoperability problems. The prototype is implemented on top of Moose, a Smalltalk based open source software and data analysis platform [22], and is based on an extension of its metamodel. The latter is independent from source code [23]. We tested our approach on data generated by a RabbitMQ implementation, which is explained later in a case study. The prototype demonstrates the feasibility and the value of our approach. Its implementation is composed of several parts: the implementation of the previously presented Messaging Metamodel, the development of importers to populate instantiations of the metamodel, and the development of two visualizations for reporting indicators.
9 https://github.com/janniklaval/pulse.
6.1 Implementation of the Metamodel
The metamodel is specific to RabbitMQ’s implementation of the AMQP protocol. RabbitMQ supports other protocols such as MQTT, REST and STOMP. The metamodel is thus a first step towards the definition of a higher-level model that can integrate several communication means and protocols. The implementation of the metamodel is an extension of Moose. The latter also includes a family of metamodels that can be customized for various aspects of code representation (static, dynamic, history…). The Moose core describes the static structure of software systems, particularly object-oriented software systems.¹⁰ Extending Moose is a major asset for our implementation approach, as it allows us to rely on and reuse the powerful tools and analyses developed within Moose [25]. To complete the implementation of the metamodel, we developed importers to populate the model. We created three types of importers that collect information from different sources (as explained in Sect. 5.1): parsers for message trace files, a consumer module that is subscribed to the broker to get event logs, and REST requests that query the RabbitMQ management API in order to fully populate the proposed metamodel. The following case study focuses on one importer, the trace file parser, and its use to build the messaging model.
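As an illustration of how an importer can populate a model of exchanges, queues and bindings, the Python sketch below builds a small in-memory model from a list of binding descriptions. It is not the Moose implementation. The input dictionaries use the field names that, to our understanding, the RabbitMQ management API returns for bindings (`source`, `destination`, `destination_type`, `routing_key`); this should be checked against the API documentation of the deployed version. The class names and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Queue:
    name: str


@dataclass
class Exchange:
    name: str
    # (routing key, bound queue) pairs
    bindings: List[Tuple[str, Queue]] = field(default_factory=list)


@dataclass
class MessagingModel:
    """Hypothetical, much reduced stand-in for the messaging metamodel."""
    exchanges: Dict[str, Exchange] = field(default_factory=dict)
    queues: Dict[str, Queue] = field(default_factory=dict)


def import_bindings(model: MessagingModel, bindings: Iterable[dict]) -> MessagingModel:
    """Add exchange-to-queue bindings to the model, creating entities on first sight."""
    for b in bindings:
        # Skip exchange-to-exchange bindings and the implicit default exchange (empty source).
        if b.get("destination_type") != "queue" or not b.get("source"):
            continue
        exchange = model.exchanges.setdefault(b["source"], Exchange(b["source"]))
        queue = model.queues.setdefault(b["destination"], Queue(b["destination"]))
        exchange.bindings.append((b.get("routing_key", ""), queue))
    return model


# Made-up sample shaped like a decoded JSON list of bindings.
sample = [
    {"source": "", "destination": "q.user-access",
     "destination_type": "queue", "routing_key": "q.user-access"},
    {"source": "x.user-access", "destination": "q.user-access",
     "destination_type": "queue", "routing_key": "user.#"},
    {"source": "x.user-access", "destination": "q.audit",
     "destination_type": "queue", "routing_key": "#"},
    {"source": "x.packages", "destination": "x.archive",
     "destination_type": "exchange", "routing_key": "#"},
]

model = import_bindings(MessagingModel(), sample)
for exchange in model.exchanges.values():
    for key, queue in exchange.bindings:
        print(f"{exchange.name} --[{key}]--> {queue.name}")
```

Keeping entities in dictionaries keyed by name means that running the importer on several sources in turn (traces, event logs, REST responses) enriches the same objects instead of creating duplicates.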
6.2 Using Moose as a Platform
Moose is used in our implementation as a complete platform. It provides the possibility to build parsers, data queries, data browsers and data visualizations by extending its built-in services. Accordingly, Moose allows us to respond in an agile way to Berger-Levrault’s industrial needs by completely integrating the messaging metamodel into its environment. We seamlessly benefit from the provided services for data browsing and visualization. Once our metamodel is implemented and declared, we can, for instance, browse data with the generic browser. Further developments are needed to implement the required parsers, queries and data visualizations. Figures 3 and 4, presented in the following section, show two examples of visualizations built with Moose.
7 Case Study
We use here a case of existing interactions among BL applications to showcase some analysis services of the implemented prototype.
10 See [24] and http://www.moosetechnology.org/docs/famix.
Fig. 3 Visualization of transiting messages in a queue
Fig. 4 Cartography of publishers and subscribers of traced interactions
Berger-Levrault provides its clients with a software as a service console (SAAS-Console) that allows secure access to its cloud deployed software via an SSO (Single Sign-On) mechanism. The SAAS-Console is also an administration console with which clients can autonomously manage the accounts and access rights related to the software they use. The SAAS-Console exchanges data with several applications (CRM, authentication modules and business applications). We consider here the ones that use RabbitMQ for data transmission: (i) BL-Auth, a separate authentication module, to which it sends user account rights and from which it receives credential updates; (ii) BL-Socle, for the automation of provisioning, to which it sends information regarding the software packages to be deployed for the clients and their assigned user access accounts. The interactions are set up with BL-MOM and the exchanged messages transit through a RabbitMQ node.
7.1 Goals
In this first case study, we provide analysis elements in response to some of the interoperability assessment needs expressed by Berger-Levrault:
• Having a clear visualization of the traces of the messages transiting in each queue, together with their characteristics. This is particularly needed because transiting messages are volatile in the RabbitMQ management console. The visualization can be used to check the state of messages during defined periods of time.
• Having a global view of the structure of messaging architectures and of the paths that messages take. This is to verify the presence and activity of publishers and consumers and the expected interactions between them.
• Ensuring that the subscriptions are correct and that there are no data leaks.
7.2 Used Data
Berger-Levrault provided trace files from the RabbitMQ node used for the data exchanges described above among the SAAS-Console, BL-Auth and BL-Socle. BL-MOM controls the messaging actions, which makes it possible to have logs with business level information. These log files are activated for the elements that we want to trace on the broker node. The log files are JSON encoded and contain a JSON entity for each action on messages. In this case study, we exploit only these log files, without extracting information from the other sources (e.g. the REST API). For the sake of confidentiality, we do not trace the payload of the messages, which contains the transiting application data.
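A trace file parser of this kind can be sketched in Python as follows. The sketch assumes, for illustration only, that each line of the log holds one JSON entity and that entries carry an ISO 8601 `timestamp` field and possibly a `payload` field. The exact layout and field names of the Berger-Levrault trace files are not given here, so all names in the sample are hypothetical.

```python
import io
import json
from datetime import datetime
from typing import Iterable, List


def parse_trace(lines: Iterable[str]) -> List[dict]:
    """Parse a trace log with one JSON entity per line (assumed layout).

    Blank or malformed lines and entries without a timestamp are skipped.
    Any 'payload' field is dropped so that application data is never kept.
    """
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed line: ignore it rather than abort the import
        if not isinstance(entry, dict) or "timestamp" not in entry:
            continue
        entry.pop("payload", None)
        entry["timestamp"] = datetime.fromisoformat(entry["timestamp"])
        records.append(entry)
    return records


# Made-up log content with hypothetical field names.
SAMPLE = """\
{"timestamp": "2020-01-06T09:15:00", "type": "published", "exchange": "x.user-access", "publisher": "console-app", "payload": "c2VjcmV0"}

not-json
{"timestamp": "2020-01-06T09:15:01", "type": "received", "queue": "q.user-access", "consumer": "provisioning-app"}
"""

records = parse_trace(io.StringIO(SAMPLE))
for record in records:
    print(record["timestamp"].isoformat(), record["type"])
```

Because the function accepts any iterable of lines, the same code works on an open file object, e.g. `parse_trace(open(path, encoding="utf-8"))`, where `path` is the location of a trace file.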
7.3 Results
For the first need, we propose to represent the messages collected from the traces in a browsable, histogram-type visualization. It depicts the messages routed to a specified queue during a chosen period, grouped into equal time slots. Figure 3 shows the resulting visualization for the messages published into one of the queues to which the SAAS-Console messages are routed and which is consumed by BL-Socle. The queue is bound to a topic exchange that handles messages related to user access accounts, and the visualization shows the rate of published messages traced during working hours. We can see that there was more activity on the SAAS-Console regarding the creation and updating of user accounts and access rights at the end of the day. Moreover, in order to inspect the messages, we extended this visualization (which the RabbitMQ console also provides, but in a static way) by offering the possibility to browse into each entry to view its list of messages and to expand each message in order to check its definition, state and characteristics.
In answer to the other expressed needs, we provide, based on the trace files, a cartography that offers an overall view of the components of the current messaging architecture. The visualization exposes the internal broker architecture, in terms of the exchanges to which the messages are published and the queues to which they are routed, along with business information identifying the producers and consumers of the messages. An example is presented in Fig. 4. The visualization succeeds in showing the existing interactions between: (i) BL-Auth as publisher and the SAAS-Console as consumer; (ii) the SAAS-Console as publisher and BL-Socle and BL-Auth as consumers. It displays the underlying RabbitMQ architecture elements:
• The use of four topic exchanges.
• The queues to which the exchanges are bound. We notice that the user access related exchange (consoleSaasUserAccess) is also bound to a queue consumed by a logger application.
Such a visualization offers a structured representation of a technical architecture that can be restricted to a given business context. It facilitates the inspection of the existing interactions among publishers and consumers and the verification that they correspond to the intended design. It also makes it possible to detect unauthorized consumers in order to check for data leaks. The activity of interactions among publishers and consumers can be inspected by checking their last publication and consumption. Furthermore, in order to show the evolution of the architecture, we need to collect and integrate data from event logs to reconstruct the architecture at a selected point in time.
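The grouping of traced messages into equal time slots, which underlies the histogram of Fig. 3, can be sketched in Python as follows. The sketch only illustrates the binning step and prints a text rendering; the actual visualization is built with Moose. The function name, the two-hour slot and the sample timestamps are made up for the example.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def slot_counts(timestamps: Iterable[datetime],
                start: datetime,
                end: datetime,
                slot: timedelta) -> List[int]:
    """Count timestamps falling in [start, end), grouped into slots of equal length."""
    span = end - start
    n_slots = -(-span // slot)  # ceiling division: the last slot may be partial
    counts = [0] * n_slots
    for t in timestamps:
        if start <= t < end:
            counts[(t - start) // slot] += 1
    return counts


# Made-up publication times for one queue during one working day.
published = [
    datetime(2020, 1, 6, 9, 15),
    datetime(2020, 1, 6, 10, 30),
    datetime(2020, 1, 6, 15, 5),
    datetime(2020, 1, 6, 16, 10),
    datetime(2020, 1, 6, 16, 40),
    datetime(2020, 1, 6, 18, 0),  # outside the inspected period, ignored
]

day_start = datetime(2020, 1, 6, 9, 0)
counts = slot_counts(published, day_start, datetime(2020, 1, 6, 17, 0), timedelta(hours=2))

for i, n in enumerate(counts):
    slot_start = day_start + i * timedelta(hours=2)
    print(f"{slot_start:%H:%M} {'#' * n} ({n})")
```

Keeping the list of messages per slot, instead of only a count, would support the browsing behavior described above, where each histogram entry can be opened to inspect its messages.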
8 Conclusion and Future Work
In this work, we presented a contribution to data interoperability assessment, performed through the analysis of messaging-based exchanges among communicating information systems. The proposed system exploits services from RabbitMQ, the message broker that is used to carry out interoperability interactions. We proposed a messaging metamodel used to gather and aggregate information collected from several sources. The population of the metamodel makes it possible to provide contextual indicators that are used to assess data related interoperability requirements and to precisely locate interoperability problems. Highlighting problems related to the effective ability to interoperate and interchange data helps maintain a good level of data interoperability. We implemented the metamodel and analysis functions by extending the metamodel of Moose, taking advantage of its built-in services. We developed importers and parsers to collect data from different sources and populate the metamodel. In answer to some of Berger-Levrault’s analysis needs, we used the structured instantiated model to build two data visualizations that depict several indicators. Further developments are planned: data visualizations that take into account the dynamic aspects of the messaging configurations, and queries that help indicate interoperability dysfunctions. A representation of messages by type can be proposed to visualize rejected messages. The evolution of the architecture over time can also be used to highlight idle interoperability interactions. Indicators can also be used to alert administrators with real time notifications about failed operations or about rates that exceed accepted thresholds.
References
1. D. Chen, G. Doumeingts, F. Vernadat, Architectures for enterprise integration and interoperability: past, present and future. Comput. Ind. 59(7), 647–659 (2008)
2. A. Geraci, F. Katki, L. McMonegal, B. Meyer, J. Lane, P. Wilson, J. Radatz, M. Yee, H. Porteous, F. Springsteel, IEEE Standard Computer Dictionary: Compilation of IEEE Standard Computer Glossaries (IEEE Press, 1991)
3. D. Chen, N. Daclin, et al., Framework for enterprise interoperability, in Proceedings of IFAC Workshop EI2N (2006), pp. 77–88
4. S. Vinoski, Advanced message queuing protocol. IEEE Internet Comput. 10(6) (2006)
5. G. Hohpe, B. Woolf, Enterprise Integration Patterns: Designing, Building, and Deploying Messaging Solutions (Addison-Wesley Professional, 2004)
6. C. Agostinho, Y. Ducq, G. Zacharewicz, J. Sarraipa, F. Lampathaki, R. Poler, R. Jardim-Goncalves, Towards a sustainable interoperability in networked enterprise information systems: trends of knowledge and model-driven technology. Comput. Ind. 79, 64–76 (2016)
7. C. Cornu, V. Chapurlat, J.-M. Quiot, F. Irigoin, Interoperability assessment in the deployment of technical processes in industry. IFAC Proc. Vol. 45(6), 1246–1251 (2012)
8. D. Chen, B. Vallespir, N. Daclin, An approach for enterprise interoperability measurement, in MoDISE-EUS (2008), pp. 1–12
9. M.-J. Verdecho, J.-J. Alfaro-Saiz, R. Rodriguez, Integrating business process interoperability into an inter-enterprise performance management system, in Proceedings of the 9th International Conference on Interoperability for Enterprise Systems and Applications (I-ESA), Berlin, Germany, 2018
10. C.A.W. Group, et al., Levels of Information Systems Interoperability (LISI), US DoD (1998)
11. T. Clark, R. Jones, Organisational interoperability maturity model for c2, in Proceedings of the 1999 Command and Control Research and Technology Symposium (Citeseer, 1999)
12. A. Tolk, J.A. Muguira, The levels of conceptual interoperability model, in Proceedings of the 2003 Fall Simulation Interoperability Workshop, vol. 7 (Citeseer, 2003), pp. 1–11
13. W. Guédria, D. Chen, Y. Naudet, A maturity model for enterprise interoperability, in OTM Confederated International Conferences “On the Move to Meaningful Internet Systems” (Springer, 2009), pp. 216–225
14. I.M. de Soria, J. Alonso, L. Orue-Echevarria, M. Vergara, Developing an enterprise collaboration maturity model: research challenges and future directions, in 2009 IEEE International Technology Management Conference (ICE) (IEEE, 2009), pp. 1–8
15. G. Kingston, S. Fewell, W. Richer, An Organisational Interoperability Agility Model. Technical Report, Defence Science and Technology Organisation, Canberra, Australia, 2005
16. S. Mallek, N. Daclin, V. Chapurlat, Towards a conceptualization of interoperability requirements, in Enterprise Interoperability IV (Springer, 2010), pp. 439–448
17. N. Daclin, D. Chen, B. Vallespir, Methodology for enterprise interoperability. IFAC Proc. Vol. 41(2), 12873–12878 (2008)
18. T. Ford, J. Colombi, S. Graham, D. Jacques, The Interoperability Score. Technical Report, Air Force Institute of Technology, Wright-Patterson AFB, OH, 2007
19. S. Mallek, N. Daclin, V. Chapurlat, The application of interoperability requirement specification and verification to collaborative processes in industry. Comput. Ind. 63(7), 643–658 (2012)
20. D. Dossot, RabbitMQ Essentials (Packt Publishing Ltd, 2014)
21. S. Mallek, Contribution au développement de l’interopérabilité en entreprise: vers une approche anticipative de détection de problèmes d’interopérabilité dans des processus collaboratifs. Ph.D. dissertation, Ecole des Mines d’Alès, 2011
22. S. Ducasse, T. Gîrba, M. Lanza, S. Demeyer, Moose: a collaborative and extensible reengineering environment, in Tools for Software Maintenance and Reengineering. Series RCOST/Software Technology Series (Franco Angeli, Milano, 2005), pp. 55–71. [Online]. Available: http://scg.unibe.ch/archive/papers/Duca05aMooseBookChapter.pdf
23. S. Ducasse, N. Anquetil, M.U. Bhatti, A. Cavalcante Hora, J. Laval, T. Girba, MSE and FAMIX 3.0: An Interexchange Format and Source Code Model Family. Research Report, Nov 2011. [Online]. Available: https://hal.inria.fr/hal-00646884
24. S. Demeyer, S. Tichelaar, S. Ducasse, FAMIX 2.1 - The FAMOOS Information Exchange Model. Technical Report, University of Bern, 2001
25. S. Ducasse, T. Gîrba, A. Kuhn, L. Renggli, Meta-environment and executable metalanguage using smalltalk: an experience report. J. Softw. Syst. Model. (SOSYM) 8(1), 5–19 (2009). [Online]. Available: http://scg.unibe.ch/archive/drafts/Duca08a-SosymExecutableMetaLanguage.pdf