Fuzzy Cognitive Maps: Best Practices and Modern Methods
Edited by Philippe J. Giabbanelli and Gonzalo Nápoles
Editors
Philippe J. Giabbanelli, Department of Computer Science and Software Engineering, Miami University, Oxford, OH, USA
Gonzalo Nápoles, Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, Noord-Brabant, The Netherlands
ISBN 978-3-031-48962-4
ISBN 978-3-031-48963-1 (eBook)
https://doi.org/10.1007/978-3-031-48963-1

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Paper in this product is recyclable.
To Dr. Vijay K. Mago, whose patience for teaching Fuzzy Cognitive Maps has borne unexpected fruits over ten years. To Prof. Dr. Koen Vanhoof, Prof. Dr. Rafael Bello and Dr. Maikel León for their support and encouragement.
Foreword
Cognitive maps (CMs) and fuzzy cognitive maps (FCMs) have assumed a prominent and visible position in the arena of intelligent systems. Over the decades, we have been witnessing a growing level of interest in these architectures. This is not surprising at all, given that such graph-oriented models come with a great deal of simplicity, high interpretability, useful knowledge representation capacity, and associated learning mechanisms. In a nutshell, modeling complex phenomena by identifying their underlying concepts and the linkages among them arises as a natural and convincing way of coping with the complexities of real-world phenomena. FCMs deliver an efficient modeling vehicle. The resulting graphs are easily interpretable: concepts are mapped on their nodes and relationships are described by the edges of the graphs, annotated by +1 or −1 to denote the excitatory or inhibitory nature of linkages. The further accommodation of a gradual quantification of dependencies, as advocated in FCMs, has strengthened the expressive power of the maps.

Professors Philippe Giabbanelli and Gonzalo Nápoles, who are outstanding experts in the area, have delivered an important, timely, much-needed and unique textbook on fuzzy cognitive maps. The authoritative and lucid treatment of the subject matter, through a systematic exposure of the key design and analysis topics, is the key facet of this treatise. The material is structured into ten chapters organized into several logically arranged parts. Introductory material covered in Chap. 1 brings forward basic concepts, motivation, and well-thought-out illustrative material. Elicitation processes of cognitive maps are covered in Chap. 2 by elaborating on interesting aspects of structured interviews and mechanisms of interaction with the users. Simulations with FCMs, including hybrid approaches (Chaps. 3 and 4), deliver ways in which the behavior of FCMs is revealed and analyzed, and its results deployed. A thorough discussion (Chap. 5) offers an in-depth view of the graph-oriented aspects of FCMs, which are quantified by invoking measures characterizing the sparsity and centrality of the underlying graph structure. Extensions displayed in Chap. 6 navigate the reader through various ways of handling inherent uncertainty factors by looking at interval-valued characterizations of the maps, including the uncertainty of the time component. Learning is the key content of Chaps. 7–10; different learning schemes based on data and accuracy issues are presented in depth. Finally, an application classification case study (Chap. 9), along with practical ramifications and various implementation environments, is thoroughly discussed. The exercises given at the end of each chapter are a useful addition to the overall material. Some problems are open-ended, as this may trigger more interest in pursuing further lines of independent research.

Indisputably, the book will appeal to readers who are new to the area of FCMs. However, it will be of equal interest to the research community of those who are familiar with the ideas of the maps but wish to become fully updated on new trends and enjoy the systematic and thorough analysis and design practices that have accumulated in the subject area in recent years. New inspiring insights and a systematic exposure of the material are equally valuable to newcomers and well-established researchers. Undoubtedly, all of them could identify some intriguing and promising directions for their future research pursuits. The text is self-contained, which offers an evident advantage when studying the material. The book also sets up a coherent terminology. This is of particular importance, as the existing literature often comes with incoherent and confusing naming and notation.

In sum, this treatise is an outstanding contribution to the body of knowledge on the subject of fuzzy cognitive maps, for which the research community has been waiting for a long time. I am confident that the book will play a highly instrumental role in broadening the working knowledge of the fundamentals and practice of fuzzy cognitive maps.

Edmonton, Canada
September 2023
Witold Pedrycz
Contents
1. Defining and Using Fuzzy Cognitive Mapping
   Philippe J. Giabbanelli, C. B. Knox, Kelsi Furman, Antonie Jetter, and Steven Gray
2. Creating an FCM with Participants in an Interview or Workshop Setting
   C. B. Knox, Kelsi Furman, Antonie Jetter, Steven Gray, and Philippe J. Giabbanelli
3. Principles of Simulations with FCMs
   Gonzalo Nápoles and Philippe J. Giabbanelli
4. Hybrid Simulations
   Philippe J. Giabbanelli
5. Analysis of Fuzzy Cognitive Maps
   Ryan Schuerkamp and Philippe J. Giabbanelli
6. Extensions of Fuzzy Cognitive Maps
   Ryan Schuerkamp and Philippe J. Giabbanelli
7. Creating FCM Models from Quantitative Data with Evolutionary Algorithms
   David Bernard and Philippe J. Giabbanelli
8. Advanced Learning Algorithm to Create FCM Models from Quantitative Data
   Agnieszka Jastrzębska and Gonzalo Nápoles
9. Introduction to Fuzzy Cognitive Map-Based Classification
   Agnieszka Jastrzębska and Gonzalo Nápoles
10. Addressing Accuracy Issues of Fuzzy Cognitive Map-Based Classifiers
    Gonzalo Nápoles and Agnieszka Jastrzębska
Index
Contributors
David Bernard, CNRS UMR5505, IRIT, Artificial and Natural Intelligence Toulouse Institute, University Toulouse Capitole, Toulouse, France
Kelsi Furman, Smithsonian Environmental Research Center, Edgewater, MD, USA
Philippe J. Giabbanelli, Department of Computer Science and Software Engineering, Miami University, Oxford, OH, USA
Steven Gray, Department of Community Sustainability, Michigan State University, East Lansing, MI, USA
Agnieszka Jastrzębska, Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
Antonie Jetter, Department of Engineering & Technology Management, Portland State University, Portland, OR, USA
C. B. Knox, School for Environment and Sustainability, University of Michigan, Ann Arbor, MI, USA
Gonzalo Nápoles, Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, The Netherlands
Ryan Schuerkamp, Department of Computer Science and Software Engineering, Miami University, Oxford, OH, USA
Acronyms
ABM: Agent-Based Modeling. An individual-level simulation technique that explicitly represents each entity, interactions between entities, and interactions between entities and their environment. In the context of this book, it is a technique that can be combined with Fuzzy Cognitive Maps. See Chap. 4 for examples.
FCM: Fuzzy Cognitive Map. This is the focus of the book. See Chap. 1 for definitions.
GA: Genetic Algorithms. In the context of this book, it is a technique that can automatically create Fuzzy Cognitive Maps from data. See Chaps. 7, 9 and 10 for examples.
ML: Machine Learning. In the context of this book, it serves to either create Fuzzy Cognitive Maps from data or use them as classification models.
Chapter 1
Defining and Using Fuzzy Cognitive Mapping
Philippe J. Giabbanelli, C. B. Knox, Kelsi Furman, Antonie Jetter, and Steven Gray
Abstract This chapter lays the foundations for the book by answering two essential questions: what are Fuzzy Cognitive Maps, and why do we use them? We show that there are three different definitions, depending on the focus of a study: mental models that are aligned with how knowledge is stored in human memory, mathematical objects akin to recurrent neural networks but with meaningful concepts, or discrete simulation models capable of performing what-if scenarios. Since these definitions are tightly coupled with applications, we then propose a typology of tasks, including the use of FCM to support learning or as expert systems. This chapter will guide readers in identifying the strand of literature most relevant to their own purpose.
1.1 Introduction

To those who live in it, a house is a home. Rooms have names such as a 'family room' or a 'living room' and are associated with familiar activities. Visuals of the house are images shared on social media. For the treasurer, a home is a parcel characterized by appraised and taxable values. Rooms are numbers, and visuals are blueprints. Both views of the system coexist and are true for those involved.

In the same manner, there are multiple definitions for Fuzzy Cognitive Maps (FCMs). Domain experts and individuals who live in a system build FCMs because it is an intuitive technique, producing visuals such as node-and-link diagrams (Fig. 1.1a) where concepts have clearly interpretable meanings. From a technical standpoint, FCMs are matrices of numbers (Fig. 1.1b), which represent weights that can be updated via simulations or optimized to achieve a desired system outcome. These definitions coexist in the literature, depending on whether a study applies the technique to build an FCM (e.g., participatory modeling) or improves on the technique.

Fig. 1.1 An FCM can be represented in several ways. When working with participants, node-and-link diagrams (a) contribute to the transparency of the model. For analytical purposes or computations, data may be stored in matrices (b)

Our book gathers practitioners, simulationists, and mathematicians under one roof to study Fuzzy Cognitive Mapping. It is thus important to be aware of the different equivalent definitions, as each field can benefit from the others. Practitioners may build an FCM that does not converge, thus yielding inconclusive results for their interventions; by knowing about convergence from a mathematical standpoint, they can find solutions and initiate fruitful partnerships. Simulationists may propose new mechanisms for social simulations where individuals' mental models are expressed as FCMs, but evaluating their case study will require real-world data. By identifying the characteristics of relevant studies, they can create meaningful case studies that offer new opportunities for practitioners.

Since defining FCMs depends on how we intend to use them, this introductory chapter covers these intertwined notions. We start with the intuitive notion of an FCM as a graphical model in participatory modeling, then we treat FCMs as mathematical objects specified by equations, and finally we focus on FCMs as simulation tools that operate through algorithms. We emphasize that these definitions are equivalent and merely seek to emphasize different aspects, often found in separate strands of the literature. Building on these definitions, we introduce a typology of tasks (i.e., the reasons for which we create FCMs) that distinguishes between FCMs as mental models that seek to externalize the qualitative views of individuals and FCMs as machine learning tools whose accuracy is evaluated using numerical data. Readers interested in mental models are encouraged to continue with the next chapter, which details how to externalize individual perspectives through workshops or interviews. Readers who focus on machine learning will find the last four chapters instructive, through their focus on evolutionary algorithms and classification.
1.2 Three Equivalent Definitions

1.2.1 FCMs as Mental Models

The practice of capturing human knowledge as FCMs is grounded theoretically in constructivist psychology and the idea that individuals organize knowledge and information into mental representations which they use to understand the world around them [7, 15, 44]. These internal abstractions of the real world, known as an individual's "mental model," can be externally represented as cognitive maps. Political scientist Robert Axelrod introduced cognitive mapping in 1976 as a method for mapping the reasoning of decision-makers about social and political systems [1, 21]. Kosko built on this earlier work and developed "Fuzzy" Cognitive Maps. Cognitive maps and FCMs both capture the structure of complex systems through concepts and the causal connections between them: connections have directions (indicating a sequence of causation), a positive or negative sign (indicating the nature of the effect, namely an increase or decrease), and can have a weight (indicating the strength of the increase or decrease, e.g., strong, medium, weak). However, Bart Kosko expanded on Axelrod's earlier work and provided a solution for dynamic inference, making it possible to ask "what if?" questions about how concepts in a system react to change. Kosko envisioned FCMs as a novel method for expert system design that can incorporate knowledge from different experts and that can be updated easily. On this basis, an FCM in the context of mental models can be defined as follows [16]:

Definition 1.1 An FCM is a semi-quantitative¹ representation of individual and/or group knowledge structures consisting of variables and their causal relationships, which are directional and weighted. An FCM uses common language to help individuals interpret and express the complexity of their environment and experiences by combining their knowledge, preferences and values with quantitative estimations of the perceived relationships between variables within a context.

¹ The semi-quantitative aspect emphasizes that the values of variables and causal relationships are only supposed to be interpreted in relative terms. Although inputs and outputs are normalized, the model is not designed to provide physical units. For example, we may see that a concept such as 'forest fires' increases more under scenario A than scenario B, but we do not forecast the exact number of trees burnt or the rate of heat transfer per unit length of the fire line ('fireline intensity'). FCMs thus support a "semi-quantitative assessment of the outcome of hypothetical adaptation options" [36]. The semi-quantitative aspect is discussed in detail by Kok [20].
Researchers who use FCMs as mental models often emphasize that they provide a "graphical representation of the system" [37]. It is indeed commonplace in applied research to see a visualization of an FCM in the form of a node-and-link diagram, particularly using the format produced by MentalModeler; see for example Fig. 1 in [35] or Fig. 3 in [31]. While the use of easily interpretable diagrams is an undeniable benefit of the technique when working with people, we caution against defining FCMs as graphical representations: saying that an object can be represented as a picture does not mean that it is a picture. Three arguments are important in this regard.

First, defining FCMs as graphical representations would be misleading: a 'graphical representation' is normally the role of a conceptual model such as cognitive maps or causal maps. Such representations only capture the structure of a system, hence they can tell which concepts may be impacted by an intervention but they would be incapable of quantifying the impact. FCMs extend such representations² through dynamic inference mechanisms, which means that numbers can be updated so that we can estimate by how much some concepts would change. Second, a visual may accompany an FCM for the sake of illustration, but it is not required. An FCM could also be provided as an Excel spreadsheet or in the form of computer code, which demonstrates that it exists without needing a graphical representation. Third, an FCM is a unique model of a system: if we are presented with two FCMs, we can examine whether they are the same. In contrast, graphical representations of FCMs are not unique. For example, Fig. 1.2b is the same FCM as in Fig. 1.1a, with concepts simply moved around.

² Readers familiar with System Dynamics (SD) will find the same situation in this field: SD extends the conceptual model of causal loop diagrams by adding the ability to compute numbers over time. An SD model may be represented by a visual (stock and flow diagram) for convenience, but it is also specified by a system of differential equations.

Background: Representing learned knowledge with maps

Two arguments underpin the externalization of an individual's mental model as a map and, by extension, as an FCM. First, there is the notion that knowledge is acquired and stored in a structure [16]. Constructivist psychology proposes a learning theory in which individuals actively build their own understanding of the world by creating mental systems that organize and interpret their experiences. The well-known cognitive stage theory of Piaget also emphasized that there are logical structures in learning, which develop as a result of assimilation (fitting new experiences with existing concepts) and accommodation (fitting concepts with new experiences) [17]. Psychological concepts such as cognitive map, schema, abstract task structure, or categorization are also increasingly studied in neurophysiological research with a focus on mechanisms of neural representations of knowledge [19]. Second, there is the postulate that maps are aligned with the knowledge structures held by individuals. This alignment is important as it makes knowledge elicitation easier, opens access to associated knowledge, and supports the comparison of knowledge structures among individuals [25]. The knowledge structures of interest for model building are held in the semantic memory of individuals, which stores their conceptualization of the world as part of their long-term memory. Semantic memory is an abstract structure that provides functional relationships between objects, hence many methods aim at constructing networks of relationships [13, pp. 24–28]. Given the explosion of cognitive network science in the 2010s, several retrieval processes have been proposed to quantify how the activation of a concept would propagate to its neighbors within semantic memory [22].
1.2.2 FCMs as Mathematical Objects

A mathematical approach to FCMs emphasizes their similarities with neural networks [29, 39]. Since readers may have gained familiarity with Recurrent Neural Networks (RNNs) given the growth of AI and deep learning, it is important to clarify the similarities and differences with FCMs. Both RNNs and FCMs are dynamical systems consisting of neural entities, which can be activated (or 'fire') and then propagate a signal to their neighbors. The propagation can eventually come back to the same neuron, hence the notion of recurrence. Consequently, such systems often iterate until they have stabilized (i.e., converged to a fixed-point attractor) or reached a maximal number of iterations/cycles, rather than running for a set period of time. Indeed, RNNs and FCMs are both driven by iterations rather than time. These models can be trained from data and they are used in machine learning for their ability to predict patterns such as categorical outcomes ('classifiers').

Several key differences must be noted between RNNs and FCMs. First, an FCM is designed to be interpretable (Fig. 1.2) because its neural entities have been ascribed clear meanings via a label (e.g., 'economic stimulus', 'density of juvenile pikes'). Since the concepts are interpretable, their activation values are also interpretable (e.g., 'high economic stimulus', 'low density of juvenile pikes'). Second, the number of such entities is usually small enough to read all of them, as FCMs rarely exceed 100 concepts. In contrast, an RNN is a more black-box model where neurons are not designed to be individually interpreted; rather, they are collectively organized in layers designed to accomplish certain tasks. While RNNs can have a very large number of neurons (e.g., millions or billions) and thus may require significant compute resources, the relatively small size of FCMs allows them to be computed quickly on ubiquitous devices such as personal laptops. Third, while RNNs are trained from large datasets, an FCM can be developed from data sources of different types and sizes: it may be obtained through qualitative data originating from workshops with participants (see next chapter), tweaked with a small quantitative dataset, or entirely derived from a larger quantitative dataset (see last three chapters).

From a mathematical perspective, we define an FCM as follows [11, 30]:

Definition 1.2 An FCM is a nonlinear dynamical system consisting of meaningful neural entities ('concepts'). Its structure is a fuzzy signed digraph with feedback, where concepts and edges have values. FCMs are knowledge-based recurrent neural networks, and as such, they perform an iterative reasoning process where concepts recurrently interact by updating their activation values from an initial condition.

The neural entities of an FCM are updated at each iteration. The update can be summarized as follows: "the activation degree of each map neuron is given by the value of the transformed weighted sum that this processing unit receives from connected neurons on the causal network" [29]. There are two important elements to unpack in this process (Fig. 1.3): the weighted sum, and the transformation. To express them mathematically, we briefly introduce the following notation. At the $t$-th iteration, the activation vector (i.e., the value of all nodes in the FCM) is denoted by $A^{(t)} = (a_1^{(t)}, \ldots, a_i^{(t)}, \ldots, a_N^{(t)})$, where $N$ is the total number of nodes. For example, given Fig. 1.1, we would have the vector (0.3, 0.15, 0.25, 0.45) because the value of the first neuron is 0.3, the next neuron is 0.15, and so on. The weight matrix is denoted as $W_{N \times N}$. This matrix is shown in Fig. 1.1b and its content is fixed for a given FCM. Positive weights in the matrix such as $w_{2,1}$ specify a causal increase (if $i$ increases then $j$ increases) and negative weights such as $w_{3,4}$ specify a causal decrease (if $i$ increases then $j$ decreases). Given this notation, the update of an FCM can be specified as follows:

$$A^{(t+1)} = A^{(t)} \cdot W = f\left(\sum_{j=1}^{N} a_j^{(t)} \times w_{j,i}\right) \qquad (1.1)$$
Fig. 1.2 A Recurrent Neural Network (a) can have a large number of neurons, which are not individually interpretable. A Fuzzy Cognitive Map (b) typically has a small number of neurons, whose meaning is expressed via a label. Visualizations of an FCM usually include information on the edge weights, either by stating them exactly (as shown here) or by using edge colors to indicate values (e.g., red for negative and green for positive, hue for intensity)
Fig. 1.3 The first computational model of a neuron (McCulloch-Pitts) introduced a two-step process: aggregation followed by an activation function. For FCMs, the aggregation involves a weighted sum of incoming nodes and their edge weights (a). There are many possible choices of activation function, among which the sigmoid or hyperbolic tangent (b)
The weighted sum $\sum_{j=1}^{N} a_j^{(t)} \times w_{j,i}$ is obtained by adding the value of each incoming neighbor multiplied by the edge's weight. For instance, in Fig. 1.1, assume that we wish to compute the next value of $c_2$. The influence provided by neighbors would be $a_1^{(t)} \times w_{1,2} + a_4^{(t)} \times w_{4,2} = 0.3 \times 0.7 + 0.45 \times 0.45 = 0.4125$. If a modeler does not wish the value of a concept to be retained at the next time step (i.e., the matrix has only 0's on the diagonal), then the value of $c_2$ would be erased and replaced by the new one. Conversely, if a modeler wishes to introduce memory in the system (e.g., $w_{2,2} = 1$) then we would include the value of $c_2$ in the sum. This weighted sum may go out of the range required for a node, hence an activation function $f$ maps the result onto a desired range.

Equation 1.1 is applied as long as the system has not stabilized (given a user-specified threshold $\epsilon$) and we have performed fewer than a user-specified maximum number of iterations $t_{max}$ (to prevent a chaotic attractor). A modeler may require that all neurons must stabilize, or that only a subset of neurons $O$ (selected as the outputs of the system) need to stabilize. Consequently, the overall behavior of the system is specified as follows:

$$\text{Apply Eq. 1.1 while } \begin{cases} \exists i \in O \text{ such that } |a_i^{(t+1)} - a_i^{(t)}| > \epsilon, \text{ and} \\ t < t_{max} \end{cases} \qquad (1.2)$$

The mathematical literature on FCMs always contains update functions involving neighbors' node values and edge values, activation functions (also called 'clipping' or 'transfer functions'), and iterations. However, the literature uses different notations for these operations, may use variations of the update equation depending on the system modeled, and even defines FCMs by arranging components differently.³ When readers come across a paper from a new research group, we thus suggest that they familiarize themselves with the notation used by identifying the symbols that denote familiar concepts: node values, edge values, activation functions.

³ Mathematicians have proposed different ways to package the various components of an FCM into a single object, ranging from 3-tuples [11] to 6-tuples [42]. For example, we can use a 3-tuple stating that $F^{(t)} = (A^{(t)}, W, f)$ defines an FCM at iteration $t$ [11]. We could also use a 4-tuple by separating the neurons (and their labels) from their values [27]. We could also replace $f$ by the entire Eq. 1.1, which would allow different FCMs to be governed by different update mechanisms [14].
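To make Eq. 1.1 concrete, the following sketch reproduces the update computation with NumPy (anticipating Exercise 4 at the end of this chapter). Only the activation vector and the two weights quoted above, $w_{1,2} = 0.7$ and $w_{4,2} = 0.45$, come from the running example of Fig. 1.1; the remaining entries of the weight matrix are placeholders set to zero, and a sigmoid is assumed for the activation function $f$.

```python
import numpy as np

# Activation function f: a sigmoid mapping any real value into (0, 1).
def sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + np.exp(-lam * x))

# Activation vector A^(t) from Fig. 1.1: one value per concept c1..c4.
A = np.array([0.3, 0.15, 0.25, 0.45])

# Weight matrix W, where W[j, i] holds the edge from concept j+1 to concept
# i+1 (0-based indexing). Only w_{1,2} and w_{4,2} are taken from the text;
# the other entries are placeholders set to zero for this illustration.
W = np.zeros((4, 4))
W[0, 1] = 0.7    # w_{1,2}
W[3, 1] = 0.45   # w_{4,2}

# Weighted sum received by c2: a1*w_{1,2} + a4*w_{4,2}
incoming_c2 = A @ W[:, 1]
print(incoming_c2)    # ~0.4125, matching the hand computation above

# Full synchronous update of Eq. 1.1: A^(t+1) = f(A^(t) . W)
A_next = sigmoid(A @ W)
```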
1.2.3 FCMs as Simulation Tools

As explained by Swarup, "a common (mis)conception is that the goal of modeling and simulation efforts should be to design the perfect simulation" [40]. Creating the perfect model is an unattainable goal, as it would require perfect measurements (which may no longer reflect reality by the time we are done measuring) or unlimited resources to create and run a model as complex as reality. Instead of chasing dreams of ideal models [12], the field of Modeling & Simulation focuses on methods that create computational abstractions of a problem that are adequate, that is, fit for purpose. Although there are many reasons to engage in a modeling exercise (e.g., guiding data collection, learning as a team) [10], the purpose of the model itself can be broadly classified as explanatory or predictive.

An explanatory model proposes an 'explanation', hence it is a theory-building approach that focuses on why a certain phenomenon occurs. Such models show that applying transparent rules (i.e., the theory) from an initial state can successfully reproduce the desired consequences. For example, Schelling's model of segregation started with a mixed population, used the theory that individuals relocate when a few of their neighbors no longer look like them, and successfully reproduced the segregation patterns observed in some contemporary societies. Such a model does not claim that its theory is the only valid one: rather, it establishes one potential causal chain [9].

A predictive model may be misunderstood as a crystal ball that tells us the future. In this respect, several analyses have shown that many models fail at predicting the future [9], and COVID-19 models are providing abundant cautionary tales of such failures [6]. Rather, we emphasize the role of predictive models for interventions: such models can serve as virtual laboratories to test scenarios and evaluate them with respect to a baseline case (e.g., business as usual). There is no claim that the baseline case accurately predicts the future of the world: the emphasis is instead on the ability to identify the best scenario. For example, a COVID-19 model may project twice as many infections as in reality, but it can still serve its purpose if it helps us to decide between vaccination or no vaccination.

In a simulation context, FCMs are primarily used as predictive models for interventions. For instance, if we wish to promote sustainable food consumption [26], an FCM can project the effects of various interventions (e.g., information campaigns, small-scale farming) onto variables of interest (e.g., improving customers' attitude, addressing supply chain issues). Because an FCM has no notion of physical time and all variables are scaled to the same interval, the model does not predict point estimates such as "the environment will improve by 5.36% by June 1st 2061". Rather, the model may tell us that one intervention reduces supply chain issues more than another intervention. Models do not make decisions, but people do. FCMs thus serve the role of decision-support systems, and their transparency helps stakeholders understand how the result was obtained, which is important to secure buy-in for the implementation of interventions.

Fig. 1.4 Organization of methods within Modeling & Simulation. Readers interested in the differences between aggregate/individual or qualitative/quantitative models can find detailed explanations in [5]

As an aggregate-level model (Fig. 1.4), an FCM tracks an overall system instead of the interactions between entities of the system. For example, an FCM can model fisheries by having one concept for the number of adult fish, another for juvenile fish, and a third for fishermen. In contrast, an individual-level model could explicitly track each entity by simulating the actions of each fish and fisherman. The abovementioned characteristics help to situate FCMs among other aggregate modeling techniques. In particular, FCMs are based on discrete iterations or 'time ticks' (i.e., the value of each concept at $t = 0$, $t = 1$, and so on) that do not correspond to physical time, in contrast with System Dynamics where rates are expressed in physical units. FCMs are deterministic: given the initial values of each concept, the same output will be obtained regardless of the number of simulations. Other models may be stochastic as they employ probabilities, which creates the possibility of a different outcome for the same initial case. Given this context, FCMs can be defined in M&S as follows:

Definition 1.3 An FCM is an aggregate simulation model consisting of two components. The structure is a directed, labeled, weighted graph. Nodes and edges have values in a bounded range. A synchronous⁴ and deterministic update changes the nodes' values over discrete iterations based on (i) their current value, (ii) the value of their incident nodes,⁵ (iii) the weight of the incident edges, and (iv) an activation function that keeps the result within the desired bounded range.

In the same way as a mathematical definition is precise thanks to the use of equations, a computational definition clarifies processes by expressing them through algorithms. Algorithm 1 defines the update for the concept values of an FCM and can be understood as follows [45]: the update is applied iteratively (lines 3–4) until one of two stopping conditions is met. The desirable condition is that stabilization has occurred, as target nodes of interest (i.e., outputs of the system) change between two consecutive steps by less than a user-defined threshold (lines 5–6). However, depending on conditions such as the choice of activation function $f$, the system may oscillate or enter a chaotic attractor. The second condition thus sets a hard limit on the maximum number of iterations (line 2), which is akin to setting 'emergency brakes' on the execution of an algorithm.
Algorithm 1: SimulateFCM
Input: input vector $A$ of $N$ concepts, adjacency matrix $W$, activation function $f$, max. iterations $t_{max}$, output set $O$, threshold $\epsilon$
1: $output^{0} \leftarrow A$
2: for $step \in \{1, \ldots, t_{max}\}$ do
3:     for $i \in \{1, \ldots, N\}$ do
4:         $output_i^{step} \leftarrow f\left(output_i^{step-1} + \sum_{j=1}^{N} output_j^{step-1} \times W_{j,i}\right)$
5:     if $\forall o \in O, \; |output_o^{step} - output_o^{step-1}| < \epsilon$ then
6:         return $output$
7: return $output$

⁴ A synchronous update means that the value of all concepts at step $t$ is based exclusively on the values of these concepts at step $t - 1$. In other words, all concepts change values together when the iteration is complete. The alternative in the M&S literature is an asynchronous update, which can be performed in many ways, such as visiting the concepts in random order and updating them as we go. Asynchronous updates can be used for natural systems since their behavior is not governed by a 'global clock', but this is explored in individual M&S paradigms [23] rather than in aggregate approaches such as FCMs. While asynchronous concept activation in FCMs can be seen in some studies, it is because a machine learning method is built on top of the FCM and modifies its operations [33].

⁵ From a network standpoint, 'incident nodes' can also be called 'incoming neighbors'. This should not be mistaken for a distance-based notion of neighbors, as found in clustering algorithms. Rather, a neighbor within a network setting is a node that shares an edge.
1.3 A Typology of Uses

Over the last 30 years, Fuzzy Cognitive Maps (FCMs) have been used to develop semi-quantitative, structured representations of knowledge about complex and dynamic systems. Because of their flexibility, FCMs can be constructed and analyzed in a variety of ways and for different purposes. For example, to represent system knowledge embedded in digital data sources, FCMs can be constructed using Natural Language Processing, a subfield of AI. To capture system knowledge from people directly, FCMs can be based on narrative interviews or focus groups, or developed by participants themselves, who represent their own understanding. FCMs can be analyzed to address a range of research questions and to obtain "systems-based" insights about diverse topics. However, many published results provide little insight into the core assumptions and methodological choices of the authors. We identify core approaches to FCM projects, which can be differentiated by whether FCMs are meant to represent "objective" or "subjective" reality, whether the FCM represents specialized expertise versus the shared knowledge of many, and whether it serves as a boundary object to promote social learning (with an emphasis on process) or the FCM model itself is the most important outcome. In this section, we address the theoretical, conceptual and methodological issues involved with these choices, with the goal of better aligning the disparate fields that use FCMs, including AI, Cognitive Science, Social Science, and Network Science.

Ever since the early days of FCM research, FCM practice has had one of several motivations, with implications for the type of knowledge that is captured. Similar to Axelrod's research on political elites, FCMs are often used to understand how a specific person or group of people thinks about a problem and, accordingly, what they might prefer or do. Aligned with Kosko's work, FCMs are also used to represent valuable (and often scarce) expert knowledge, held by single experts or aggregated from many, as a model, so that it becomes available to non-experts. Additionally, FCM models are used as a boundary object that facilitates knowledge exchange, meaning making, and ultimately social learning. Table 1.1 brings these perspectives together.
1.3.1 FCMs as Expert Systems

While expertise is often meant to be reserved for people with high levels of specialized training, vast experience in similar tasks, and tacit knowledge, expert knowledge in FCM projects is more broadly defined to also include traditional or local experts, whose lived experiences are unique and offer important information about the complex problem or system researchers seek to represent. In some cases, expertise is sought to create a single, "true" representation of the system of interest, with the goal of making it available for further research, analysis, and decision making. For example, Aminpour et al. [4] collected FCMs from different fishery stakeholders, including avid recreational anglers, commercial fishermen and fishery policy-makers of the striped bass fishery off the coast of Massachusetts, to better understand how warming ocean waters might impact the ecological and economic dimensions of the fishery. In other cases, expertise relates to specific ways of thinking, interpretations, goals, or value systems that apply to a particular setting or community. In these cases, expert insights provide a lens on a complex system that is subjective and different from other groups, but useful because these differences matter for decision making, planning, or foresight. For example, Nyaki et al. [32] compared FCMs of the bushmeat trade in Tanzania from several different villages to better understand social, economic and ecological dimensions across different locations. Table 1.2 provides an example breakdown of the different tasks involved in creating FCMs as expert systems and time requirements for these tasks.

Tips

If you wish to gather human data to create FCMs as expert systems, then we recommend planning the data collection process beforehand. The next chapter details the key considerations for data collection. The data collection methodology should meet the analysis requirements, not the other way around. For example, it can be tempting to compare different data sets on similar topics or include pilot data, but if the interview protocols are different then the two might not be theoretically sound to compare. Considerations such as including definitions of some predefined concepts, starting with a base map, or reorganizing the order of questions can result in significantly different FCMs.
Table 1.1 Common uses of collecting FCMs for different research purposes or to better understand different problems

Type of application              | Goal                                                       | Knowledge
Expert system (true expert)      | Approximate a representation of reality                    | Objective
Expert system (how people think) | What people know that might influence behaviors/decisions  | Subjective
Collective intelligence          | Approximate a representation of reality                    | Objective
Boundary object                  | What knowledge in a group is shared or not shared          | Subjective
Prediction model                 | Machine learning tasks (e.g., classification, regression)  | Objective
Table 1.2 Example tasks for an FCM study and time estimates they generally take to complete

Stage           | Task                               | Estimated time required
Research design | Protocol development               | 2–10 h, depending on familiarity with FCM and study context
Research design | Participant elicitation            | 1–5 h, depending on existing social capital, access to and ease of reaching participants
Data collection | Interviews/group modeling sessions | 1–2 h per individual interview; 1–5 h per group session
Data analysis   | Data cleaning                      | 3–5 h for concept standardization; ~1.5x recorded time for transcribing with AI programs, longer by hand
Data analysis   | Map analysis                       | 1–2 h for simple structural metrics (e.g., density, size); 4–5 h for more complex metrics, given existing code (e.g., micromotifs, clustering); 30+ h for more complex analysis incl. detailed qualitative analysis, depending on the number of maps collected
Data analysis   | Narrative analysis                 | Highly variable, depending on type and level of analysis. For example, hand coding for qualitative content analysis takes 0.5x–1x recorded time per session
Products        | Writing papers or summary reports  | 20–40 h
Products        | Dissemination                      | 10+ h; particularly with community- and stakeholder-engaged research, we highly recommend public-facing result dissemination events and other accessible products
1.3.2 FCMs in Collective Intelligence

If one accepts that different experts and groups of experts hold relevant but partial (and thus ultimately subjective) knowledge, the logical next step is to overcome the limitations of partial expertise, represented by different FCMs, by merging them into a single aggregate FCM model that better approximates the real world. For example, Aminpour et al. [3] integrated different angler stakeholders' FCMs of freshwater lakes that, when combined, represented a singular model that was similar to the model of scientific experts of freshwater ecology. The idea that a better knowledge base is achieved collectively rather than individually is one of the tenets of collective intelligence. This is closely related to the notion of the 'wisdom of crowds', which sets several criteria (e.g., diversity of opinion) under which a crowd can make better decisions than any individual member of the group [2].

Tips

To claim that the aggregated model represents and integrates diverse sources of knowledge, there needs to be sufficient representation from each stakeholder group in a domain. Care should be taken not to over- or underweight opinions or perspectives by having imbalanced representations. Some groups will be harder to reach, which either needs to be considered in the recruitment stage or addressed in the analysis. This is particularly relevant for engaged FCM research where the results will directly impact decision-making. Whose voices are being privileged? Who isn't in the room and might not be represented? Why weren't they invited?
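In code, merging individual maps into an aggregate FCM is often done at the level of the weight matrices. One commonly used scheme, assuming the concepts have first been standardized so that position i means the same concept in every matrix, is to average the weights across participants; the sketch below is a minimal illustration of this idea rather than the specific aggregation procedure of the studies cited above.

```python
import numpy as np

def aggregate_fcms(weight_matrices):
    """Average a list of N x N weight matrices (one per participant) into a
    single aggregate matrix. Assumes concept standardization was performed
    beforehand so that row/column i denotes the same concept everywhere."""
    stacked = np.stack(weight_matrices)   # shape: (participants, N, N)
    return stacked.mean(axis=0)

# Two hypothetical participants' views of the same 3-concept system.
p1 = np.array([[0.0, 0.8, 0.0], [0.0, 0.0, -0.5], [0.3, 0.0, 0.0]])
p2 = np.array([[0.0, 0.4, 0.0], [0.0, 0.0,  0.0], [0.5, 0.0, 0.0]])
group_map = aggregate_fcms([p1, p2])      # e.g., edge c1 -> c2 becomes 0.6
```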
1.3.3 FCMs as Boundary Objects to Support Learning

Some studies that create FCMs with people are less interested in the benefits of the resulting model and instead derive benefit from the process of modeling. They value FCMs because of their ability to serve as an external representation of implicit knowledge that facilitates discourse and discussion not likely to occur without an artifact that can be reviewed, interpreted, and improved [43]. For example, Huang et al. [18] found that, when working with novice community groups and comparing FCMs over time, citizens tended toward consensus on a water quality issue when afforded the tools of model-based reasoning and deliberation, with the FCM used as a boundary object.
1.3.4 FCMs as Prediction Models

The intrinsic interpretability of FCMs has motivated researchers to use them in machine learning tasks such as classification [41], regression [38], and time series forecasting [28], where historical data is available. In regression and classification tasks, we need to divide the variables describing the problem into inputs and outputs. Therefore, the goal consists of creating an FCM model able to compute the right values for the output variables using the values of input variables as the initial conditions to start the reasoning process. Naturally, the concepts denoting output variables do not receive any initial activation value since their values will be computed through the reasoning process. In time series forecasting tasks, one might distinguish between input and output variables or assume that all variables are inputs and outputs at the same time. What is common in all these models is that hidden neurons are not allowed, as opposed to traditional (deep) neural networks. Moreover, we need a supervised learning algorithm to build the weight matrix from available historical data. In this process, domain experts are typically not involved in the network construction due to the complexity of estimating weights that map the inputs into the desired outputs. Note that FCM models devoted to prediction tasks tend to be larger. However, experts could define design constraints to produce meaningful weights for the problem domain (e.g., the weights in the main diagonal of the weight matrix must be zero). The fact that experts can inject knowledge into the model by defining these constraints or portions of the weight matrix allows for hybrid intelligence (also referred to as human-in-the-loop). This feature is difficult to obtain with other machine learning models, since they compute the parameters defining the models from the available data and nothing else.
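To illustrate this prediction setting, the hypothetical sketch below separates concepts into inputs and outputs: input concepts receive the feature values as initial conditions, output concepts start without activation, and the predicted class is read from the output concepts once the reasoning process ends. It reuses the simulate_fcm function sketched earlier in this chapter; in practice, the weight matrix would be produced by a supervised learning algorithm rather than written by hand.

```python
import math

# Hypothetical 4-concept FCM classifier: concepts 0-1 are inputs (features),
# concepts 2-3 are outputs (one per class). These weights are made up for
# illustration; a supervised algorithm would learn them from historical data.
W = [[0, 0,  0.9, -0.4],
     [0, 0, -0.6,  0.8],
     [0, 0,  0.0,  0.0],
     [0, 0,  0.0,  0.0]]

features = [0.7, 0.2]
initial = features + [0.0, 0.0]   # output concepts get no initial activation

state = simulate_fcm(A=initial, W=W, f=math.tanh,
                     t_max=50, O=[2, 3], epsilon=1e-5)

# Predicted class: the output concept with the highest final activation.
outputs = state[2:]
predicted_class = outputs.index(max(outputs))   # 0 or 1
```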
Exercises

1. For the sake of simplicity, we provided three different but compatible definitions of FCMs by examining one application domain at a time: participatory modeling, mathematics, Modeling and Simulation (M&S). In practice, researchers may operate in multiple domains together. For example, researchers stated that FCMs "constitute a neuro-fuzzy modeling methodology that can simulate complex systems" [34]. Which two perspectives are jointly used in this definition?
2. (a) Contrast Fuzzy Cognitive Mapping with System Dynamics (SD) by explaining which parts of their definitions are (i) shared and (ii) different. (b) Read the tutorial on SD by Crielaard et al. [8]. Which parts of the process are similar to how FCMs are built, as explained in Chap. 2?
3. After reading the overview by Liu [24], contrast Fuzzy Cognitive Mapping with Bayesian Networks.
4. Implement Algorithm 1 in native Python (i.e., using built-in lists and/or dictionaries), and in an optimized Python version using the Numpy library.
5. Find one article that uses Fuzzy Cognitive Maps in each of the following journals: Applied Soft Computing, Environmental Modelling & Software, Expert Systems with Applications, Information Sciences, Marine Policy, Neural Computing and Applications, Proceedings of the National Academy of Sciences. Which of the three definitions from this chapter (Sect. 1.2) is most related to the article that you found? Which of the uses (Sect. 1.3) is most salient in the article?
References

1. J. Aguilar, A survey about fuzzy cognitive maps papers. Int. J. Comput. Cognit. 3(2), 27–33 (2005)
2. P. Aminpour, S.A. Gray, A.J. Jetter, P.J. Giabbanelli, Is the crowd wise enough to capture systems complexities? An exploration of wisdom of crowds using Fuzzy Cognitive Maps, in Proceedings of the 9th International Congress on Environmental Modelling and Software (iEMSs), section on Integrated Social, Economic, Ecological, and Infrastructural Modeling (2018)
3. P. Aminpour, S.A. Gray, A.J. Jetter, J.E. Introne, A. Singer, R. Arlinghaus, Wisdom of stakeholder crowds in complex social-ecological systems. Nat. Sustain. 3(3), 191–199 (2020)
4. P. Aminpour, S.A. Gray, A. Singer et al., The diversity bonus in pooling local knowledge about complex problems. Proc. Natl. Acad. Sci. 118(5), e2016887118 (2021)
5. J. Badham, A compendium of modelling techniques (2010). https://pure.qub.ac.uk/en/publications/a-compendium-of-modelling-techniques
6. V. Chin, N.I. Samia, R. Marchant et al., A case study in model failure? COVID-19 daily deaths and ICU bed utilisation predictions in New York state. Eur. J. Epidemiol. 35, 733–742 (2020)
7. K.J.W. Craik, The Nature of Explanation, vol. 445 (CUP Archive, 1967)
8. L. Crielaard, J.F. Uleman, B.D.L. Châtel et al., Refining the causal loop diagram: a tutorial for maximizing the contribution of domain expertise in computational system dynamics modeling. Psychol. Methods (2022)
9. B. Edmonds, Different modelling purposes, in Simulating Social Complexity: A Handbook (2017), pp. 39–58
10. J.M. Epstein, Why model? J. Artif. Soc. Soc. Simul. 11(4), 12 (2008)
11. P.J. Giabbanelli, M. Fattoruso, M.L. Norman, Cofluences: simulating the spread of social influences via a hybrid agent-based/fuzzy cognitive maps architecture, in Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (2019), pp. 71–82
12. P.J. Giabbanelli, A.A. Voinov, B. Castellani, P. Törnberg, Ideal, best, and emerging practices in creating artificial societies, in 2019 Spring Simulation Conference (SpringSim) (IEEE, 2019), pp. 1–12
13. P.J. Giabbanelli, Computational models of chronic diseases: understanding and leveraging complexity. Doctoral Thesis, Simon Fraser University (2014)
14. R. Gras, D. Devaurs, A. Wozniak, A. Aspinall, An individual-based evolving predator-prey ecosystem simulation using a fuzzy cognitive map as the behavior model. Artif. Life 15(4), 423–463 (2009)
15. S.A. Gray, S.R.J. Gray, J.L. De Kok et al., Using fuzzy cognitive mapping as a participatory approach to analyze change, preferred states, and perceived resilience of social-ecological systems. Ecol. Soc. 20(2) (2015)
16. S.A. Gray, E. Zanre, S.R.J. Gray, Fuzzy cognitive maps as representations of mental models and group beliefs, in Fuzzy Cognitive Maps for Applied Sciences and Engineering: From Fundamentals to Extensions and Learning Algorithms (Springer, 2013), pp. 29–48
17. H. Heft, Environment, cognition, and culture: reconsidering the cognitive map. J. Environ. Psychol. 33, 14–25 (2013)
18. J. Huang, C.E. Hmelo-Silver, R. Jordan et al., Scientific discourse of citizen scientists: models as a boundary object for collaborative problem solving. Comput. Hum. Behav. 87, 480–492 (2018)
19. K.M. Igarashi, J.Y. Lee, H. Jun, Reconciling neuronal representations of schema, abstract task structure, and categorization under cognitive maps in the entorhinal-hippocampal-frontal circuits. Curr. Opin. Neurobiol. 77, 102641 (2022)
20. K. Kok, The potential of fuzzy cognitive maps for semi-quantitative scenario development, with an example from Brazil. Global Environ. Change 19(1), 122–133 (2009)
21. B. Kosko, Fuzzy cognitive maps. Int. J. Man-Machine Stud. 24(1), 65–75 (1986)
22. A.A. Kumar, M. Steyvers, D.A. Balota, A critical review of network-based and distributional approaches to semantic memory structure and processes. Top. Cognit. Sci. 14(1), 54–77 (2022)
23. J. Li, T. Köster, P.J. Giabbanelli, Design and evaluation of update schemes to optimize asynchronous cellular automata with random or cyclic orders, in 2021 IEEE/ACM 25th International Symposium on Distributed Simulation and Real Time Applications (DS-RT) (IEEE, 2021), pp. 1–8
24. Z.-Q. Liu, Causation, Bayesian networks, and cognitive maps. Acta Automatica Sinica 27(4), 552–566 (2001)
25. M.D. McNeese, P.J. Ayoub, Concept mapping in the analysis and design of cognitive systems: a historical review, in Applied Concept Mapping: Capturing, Analyzing, and Organizing Knowledge, vol. 47 (2011)
26. P. Morone, P.M. Falcone, A. Lopolito, How to promote a new and sustainable food consumption model: a fuzzy cognitive map study. J. Clean. Prod. 208, 563–574 (2019)
27. G. Nápoles, I. Grau, R. Bello, M. León, K. Vanhoof, E. Papageorgiou, A computational tool for simulation and learning of fuzzy cognitive maps, in 2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (IEEE, 2015), pp. 1–8
28. G. Nápoles, I. Grau, A. Jastrzębska, Y. Salgueiro, Long short-term cognitive networks. Neural Comput. Appl. 34(19), 16959–16971 (2022)
29. G. Nápoles, M. León Espinosa, I. Grau, K. Vanhoof, R. Bello, Fuzzy cognitive maps based models for pattern classification: advances and challenges, in Soft Computing Based Optimization and Decision Models: To Commemorate the 65th Birthday of Professor José Luis "Curro" Verdegay (2018), pp. 83–98
30. G. Nápoles, E. Papageorgiou, R. Bello, K. Vanhoof, On the convergence of sigmoid fuzzy cognitive maps. Inf. Sci. 349, 154–171 (2016)
31. M.A. Nozari, A.S. Ghadikolaei, K. Govindan, V. Akbari, Analysis of the sharing economy effect on sustainability in the transportation sector using fuzzy cognitive mapping. J. Clean. Prod. 311, 127331 (2021)
32. A. Nyaki, S.A. Gray, C.A. Lepczyk, J.C. Skibins, D. Rentsch, Local-scale dynamics and local drivers of bushmeat trade. Conserv. Biol. 28(5), 1403–1414 (2014)
33. E.I. Papageorgiou, P.P. Spyridonos, D.T. Glotsos, C.D. Stylios, P. Ravazoula, G.N. Nikiforidis, P.P. Groumpos, Brain tumor characterization using the soft computing technique of fuzzy cognitive maps. Appl. Soft Comput. 8(1), 820–828 (2008)
34. Y.G. Petalas, K.E. Parsopoulos, M.N. Vrahatis, Improving fuzzy cognitive maps learning through memetic particle swarm optimization. Soft Comput. 13, 77–94 (2009)
35. N. Rahimi, A.J. Jetter, C.M. Weber, K. Wild, Soft data analytics with fuzzy cognitive maps: modeling health technology adoption by elderly women, in Advanced Data Analytics in Health (2018), pp. 59–74
36. D. Reckien, Weather extremes and street life in India–implications of fuzzy cognitive mapping as a new tool for semi-quantitative impact assessment and ranking of adaptation measures. Global Environ. Change 26, 1–13 (2014)
37. R.C. Rooney, J. Daniel, M. Mallory et al., Fuzzy cognitive mapping as a tool to assess the relative cumulative effects of environmental stressors on an arctic seabird population to identify conservation action and research priorities. Ecol. Sol. Evid. 4(2), e12241 (2023)
38. A.P. Rotshtein, D.I. Katielnikov, Fuzzy cognitive map vs regression. Cybernet. Syst. Anal. 57, 605–616 (2021)
39. C.D. Stylios, P.P. Groumpos et al., Mathematical formulation of fuzzy cognitive maps, in Proceedings of the 7th Mediterranean Conference on Control and Automation, vol. 2014 (Mediterranean Control Association, Nicosia, Cyprus, 1999), pp. 2251–2261
40. S. Swarup, Adequacy: what makes a simulation good enough? in 2019 Spring Simulation Conference (SpringSim) (IEEE, 2019), pp. 1–12
41. P. Szwed, Classification and feature transformation with fuzzy cognitive maps. Appl. Soft Comput. 105, 107271 (2021)
42. J. Tisseau, M. Parenthoën, C. Buche, P. Reignier, Comportements perceptifs d'acteurs virtuels autonomes. Technique et Science Informatiques 24, 1259–1293 (2005)
43. M. van Vliet, K. Kok, T. Veldkamp, Linking stakeholders and modellers in scenario studies: the use of Fuzzy Cognitive Maps as a communication and learning tool. Futures 42(1), 1–14 (2010)
44. A. Voinov, K. Jenni, S. Gray, N. Kolagani et al., Tools and methods in participatory modeling: selecting the right tool for the job. Environ. Model. Softw. 109, 232–255 (2018)
45. M.K. Wozniak, S. Mkhitaryan, P.J. Giabbanelli, Automatic generation of individual fuzzy cognitive maps from longitudinal data, in International Conference on Computational Science (Springer, 2022), pp. 312–325
Chapter 2
Creating an FCM with Participants in an Interview or Workshop Setting

C. B. Knox, Kelsi Furman, Antonie Jetter, Steven Gray, and Philippe J. Giabbanelli
Research design for FCM studies involving people is not always linear. Several decision factors can take priority over one another, such as identifying whose knowledge to represent, how the data will be collected, and the best process for representing that knowledge. In this chapter, we focus on matching the study goals or research questions with the context of the study, who the participants are, and how (or whether) you have access to them to collect FCMs. The chapter starts by examining four significant choices for study design: (1) individual versus group modeling, (2) participant versus facilitator mapping, (3) hand-drawn models versus modeling software, and (4) pre-defined versus open-ended concepts. We explain the advantages and disadvantages of common methods in the field for each of these aspects of study design, such that researchers can identify suitable methods for their application context. Then, we cover other considerations for data collection, such as participant recruitment and best practices for facilitation. Upon completion of this chapter, readers will be able to draft a data collection plan and put it into practice. The hands-on exercises included at the end of this chapter provide scenarios to practice facilitation skills.

C. B. Knox (B) School for Environment and Sustainability, University of Michigan, Ann Arbor, MI, USA e-mail: [email protected]
K. Furman Smithsonian Environmental Research Center, Edgewater, MD, USA e-mail: [email protected]
A. Jetter Department of Engineering & Technology Management, Portland State University, Portland, OR, USA e-mail: [email protected]
S. Gray Department of Community Sustainability, Michigan State University, East Lansing, MI, USA e-mail: [email protected]
P. J. Giabbanelli Department of Computer Science and Software Engineering, Miami University, Oxford, OH, USA e-mail: [email protected]
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 P. J. Giabbanelli and G. Nápoles (eds.), Fuzzy Cognitive Maps, https://doi.org/10.1007/978-3-031-48963-1_2
2.1 Decision Factors

The FCM data collection protocol should fundamentally be built around the study goals or research questions, which can influence the protocol in several ways beyond guiding which questions are asked and how. For example, the previous chapter showed that the goal of the study may be to capture subjective knowledge (what someone thinks they know) or objective knowledge (what someone actually knows, which is generally the method behind "expert modeling"). Depending on the specific purpose of a research application, the method and facilitation can be adapted to capture cognition from individuals by having participants create their own FCMs with limited or no input from researchers. Alternatively, if the goal is to accurately model a real-world system, a more structured, collaborative modeling session, or integrating multiple individual models, would be more appropriate.

While the overarching research questions should guide the protocol, it is possible that other decision factors such as access to participants might take precedence. In particular, a study might want to capture collective knowledge, but participants can only be reached individually. This may be the case, for instance, if participants live far from one another, scheduling conflicts emerge, or language barriers exist. For instance, logistical difficulties for a project in French Guiana resulted in contacting some participants by phone or videoconference, while others engaged in face-to-face interviews [47].

Tips: spatial versus non-spatial systems
Face-to-face interviews are common for socio-ecological research questions regarding specific areas, and participants often include residents interested in the environmental management of their region [29]. Specific groups may include fishermen, farmers, residents, or local government officials [7]. In contrast, research questions regarding the functioning of non-spatial systems (e.g., suicide [22], obesity [24]) may draw on national expert groups who can only be reached individually and remotely.

Another limitation might be internet access, for example when working with participants living in remote areas, like rural farmers [27]. This can prevent the use of online tools such as MentalModeler, thus shifting data collection to other platforms or to hand-drawn models. This was the case for Furman et al. [19] while examining the impacts of Hurricane Irma on the community of Key West, Florida, as models were collected by conducting "intercept" surveys within the community in a post-disaster setting. Several additional examples are discussed in Table 2.1.

While there may be flexibility in either the protocol or the focus of the research to find a theoretically grounded way to select FCM as the study method, it is important to keep in mind that it may not be the appropriate choice. Ultimately, it is up to the researchers to decide the appropriateness of their chosen methodology given the theory, goals, and the specific stakeholders they might be working with.
Table 2.1 Answer first: what is the most limiting or important part of the study? This table provides examples of how such decision factors can influence study design. Three example scenarios are considered: (A) the study goals or research questions (example: the goal of the study is to capture mental models of an ill-understood system); (B) the context of the study (example: participants have monthly meetings to discuss the system being studied); and (C) the identity of participants and accessing them (example: the context is finding solutions to a problem with decentralized decision-makers).

Individual modeling versus group modeling
(A) To capture mental models, individual modeling is recommended to ensure participants have time and space to express their knowledge and perspective.
(B) Group modeling sessions would be convenient for both participants and the modeling team. Group modeling can also serve as a co-learning experience for the participants.
(C) With decentralized decision-makers, it might be more convenient for participants to do individual models. It could also be useful to avoid negative group dynamics, particularly if the problem is contentious.

Facilitator mapping versus participant mapping
(A) To model cognition, the appropriate method would be participant mapping, or a hybrid approach where the facilitator recommends a structure based on answers.
(B) Either method could be used in this context. That decision should be informed by the experience of participants, time or resource constraints of the project, and the specific research questions.
(C) As the study goal is to accurately model the system, facilitator mapping may be more useful. Depending on participant experience with modelling or the study design, participant mapping could also be effective.

Hand-drawn models versus modeling software
(A) Hand-drawn models may be useful in this context, as they allow participants more flexibility in visualizing how they understand the system. For some, using software will be a significant barrier.
(B) Technological constraints may limit data collection to hand-drawn models, but generally using modeling software with groups is preferred so the model can be projected or screen-shared for all to see and collaborate.
(C) Either method could be used. Modeling by hand or via software may be advantageous given the physical space where data is being collected, participants' technological literacy, resources for data analysis, etc.

Pre-defined concepts versus open-ended concepts
(A) To avoid biasing participants to discuss particular concepts, using open-ended concepts would more accurately capture a mental model. For ill-understood systems, pre-determining concepts can be impossible.
(B) Depending on the research questions, either could be appropriate. Alternatively, a hybrid approach can be used where the model is anchored by some core concepts, but participants can freely add more concepts.
(C) Open-ended concepts avoid biases, but comparing models becomes resource-intensive as equivalent terms must be identified, and a facilitator is required to ensure that the same topics are discussed across participants. Pre-defined concepts can be less resource-intensive for such comparisons and anchor participants in the same topics.
Some aspects of the protocol design might be facilitated or limited by the decision factors discussed above, but often there are also choices that can be freely made by the researchers. In this section, we explain four key choices when designing an FCM study with people and explore the trade-offs of each.
2.1.1 Individual Versus Group Modeling

One way to gather FCM data is to interview participants individually, generally with structured or semi-structured interviews that begin with a limited number of central concepts related to the area of study. Individual mapping can be a suitable method for deep diving into topics, specific kinds of expertise, and an individual's perspective on a topic. Individual interviewing provides more time for each participant to share their perspective and for their voice to be heard, as they have the time and space to share their knowledge. Interviewing study participants individually can also help reduce some biases, such as the tendency of people to under-report attitudes and behaviors they perceive to be socially undesirable [31], since the social pressures that may exist during group modeling are removed. However, individual interviews are more time-intensive than group modeling sessions when collecting data from the same number of people.

When data are not collected synchronously, studies can also have individuals create models on their own, given instructions on how to create an FCM, and send the researchers the model file or upload it as part of a survey. A low-tech option illustrated in Aminpour et al. [2] consisted of (i) providing participants with standardized cards including concepts, arrows and degrees of influence; (ii) instructing them in-person on how to create a map; and (iii) asking them to create an FCM of a lake ecosystem. This approach was applied to over 200 recreational fishermen. A more technology-focused option taken by Rooney et al. [43] (i) provided an instructional manual and accompanying video on how to build the FCM with yEd; (ii) gave a graphml file with the key concepts that participants would directly edit; and (iii) asked participants to send back their edited file.

Reusable Resource: Short Online Manual
Researchers who wish to send a short online manual to participants so they independently create their map with MentalModeler can reuse the instructions in pp. 3–5 of the Appendix provided at https://www.pnas.org/doi/10.1073/pnas.2016887118 (bottom of the page). The instructions contain links to two Youtube videos, lasting about 15 min in total.

In contrast to individual FCM development, group modeling sessions provide opportunities for social connection, communication, and co-learning. This aspect is frequently featured in participatory studies on FCM, which seek to "investigate the current problem in an interdisciplinary and participatory manner and [reflect] on
how this approach can foster co-learning, deliberation and shared understandings on emerging global wicked problems" [44]. However, negative group dynamics can occur, which could lead to biased or inaccurate model creation. For example, power dynamics can be tricky to manage, and the model may over-represent one individual's view at the cost of contributions from others. Facilitators thus need to establish strategies that will mitigate the power asymmetries of different constituencies. In addition, some participants in group modeling spaces may be less likely to speak up, share their true thoughts, or might disengage if the topic shifts to something they do not care about. These potential challenges have been discussed in the broader literature on participatory modeling [26, 30].

Tips: the meaning of 'workshops'
While a workshop generally means that several participants can attend, it does not always imply that they will create an FCM together. For instance, Eriksson et al. used workshops in the manner of 'drop-in sessions' via Zoom, allowing participants to attend any one of 10 identical sessions to hear an overview of the project, receive guidance on creating FCMs online, and then create their individual maps [14]. For this reason, we recommend terms such as 'group modeling sessions' to indicate that participants worked together during a workshop.

Whether models are collected individually or in groups, the same basic steps apply (Procedure 1 and Fig. 2.1). First, define the main concepts of interest. These concepts will anchor the model. It is common that only a handful of such concepts are used. For instance, Nyaki et al. [38] were interested in understanding the drivers of the bushmeat trade in Tanzania, and group modeling began simply with two concepts: "bushmeat hunting" and "zebra and wildebeest populations". The main concepts may be identified by asking a simple question: what are the main measurable outcomes of the model? If an FCM serves to test the effect of interventions on whether a place qualifies as a 'smart city', then 'smart city' would be the main concept [16]. If the goal of the FCM is to measure how many individuals have suicidal thoughts, make attempts, or die, then there will be three core concepts of 'suicide ideation', 'suicide attempt', and 'suicide fatality' [22].

Second, build the structure by identifying additional concepts that connect to the core, along with their relationships and strengths. This can start by asking simple questions about the relationships between these starter concepts (as hunting increases, does it have a strong negative impact on the wildlife populations?). Then, participants can identify more concepts related to one core concept, such as demand for food and cultural practices, among others. The process continues by asking about the other concept, that is, what influences wildlife populations. These questions would trigger a series of ecological factors, which are represented as new concepts in the FCM. The process continues by considering factors further away (e.g., at distance 2 from the core, then 3, then 4, etc.). For example, availability of other food sources may decrease food demand, which in turn reduces hunting, while rainfall influences migration patterns, which in turn influence wildlife populations.
Fig. 2.1 Example of a map built by applying Procedure 1 (below) up to two steps from the anchor nodes. Note that the impact of 'religion' is both positive and negative, hence follow-up questions should disentangle the effect. The participant may then identify two pathways: conflict of identity with religion (e.g., for LGBTQ individuals), which increases suicidal thoughts, and religiosity, which decreases ideation
Procedure: Create an FCM via a discussion
A model can be created like an 'onion': identify the core concepts and connections among them (1st layer), then discuss how direct drivers connect to core concepts and among themselves (2nd layer), etc. In computer science, this is known as a Breadth-First Search algorithm; a minimal sketch of this traversal is shown after the procedure. A systematic building process has thus been defined as follows [42]:
1. Define the main concepts of interest. These are often the main 1–3 measurable model outcomes.
2. Identify relationships between the concepts.
3. For each concept, ask what drives it (whether by increasing or by decreasing it). If participants believe that a concept may both increase and decrease another, it is likely that several pathways are involved, hence the direct relationship may need to be refined by identifying intermediate concepts.
4. Repeat the process from step #2 until either (i) participants do not have concepts or relationships to add, or (ii) facilitators note that answers are tangential to the purpose of the model.
If the process is used directly by participants, they may need a visual support to track what they have built. If the process is performed by facilitators, then they can occasionally summarize what the participant has said before asking if additional constructs come to mind.
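As a minimal illustration of this breadth-first building process, the sketch below grows an edge list layer by layer from the anchor concepts. The elicit_drivers stub and its canned answers are purely hypothetical stand-ins for a participant's responses (they reuse the bushmeat example above); weights on [−1, 1] are an assumption for illustration.

```python
# A minimal sketch of the 'onion' building process as a breadth-first traversal.
from collections import deque

def elicit_drivers(concept):
    # Stand-in for asking a participant "what increases or decreases this concept?"
    # Returns a list of (driver, weight) pairs; replace with real interview input.
    example_answers = {
        "bushmeat hunting": [("demand for food", 0.7), ("cultural practices", 0.4)],
        "wildlife populations": [("rainfall", 0.5), ("bushmeat hunting", -0.8)],
        "demand for food": [("availability of other food sources", -0.6)],
    }
    return example_answers.get(concept, [])

def build_fcm(core_concepts, max_depth=3):
    edges = {}                       # (source, target) -> weight
    visited = set(core_concepts)
    queue = deque((c, 0) for c in core_concepts)
    while queue:
        concept, depth = queue.popleft()
        if depth >= max_depth:       # stop when answers become tangential
            continue
        for driver, weight in elicit_drivers(concept):
            edges[(driver, concept)] = weight
            if driver not in visited:  # each new concept forms the next 'layer'
                visited.add(driver)
                queue.append((driver, depth + 1))
    return edges

print(build_fcm(["bushmeat hunting", "wildlife populations"]))
```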
2.1.2 Facilitator Versus Participant Mapping

Mapping can either be done by the facilitator, with varying levels of participant involvement as the participant is asked questions, or by the participant themselves. We start by covering approaches in which participants are directly involved, either by building the map themselves or through interviews with facilitators. We then discuss methods that emphasize scalability by deploying questionnaires.
2.1.2.1 Working Directly with Participants: Interviews or Participant-Led Mapping
Participants with prior experience with system modeling or systems thinking might be able to quickly pick up the process and generate their own FCMs. However, this is not guaranteed. Some participants may not be fully aware of the factors that drive their decisions, and such knowledge gaps can result in models that are not fit for purpose (e.g., they omit concepts that would be critical to examine the impact of an intervention on a system). In such instances, thoughtful sampling designs in which a diversity of knowledge is represented (see Footnote 1), and the selection of an appropriate data analysis technique, particularly when aggregating models, can help alleviate these issues.

One approach for facilitator mapping is a semi-structured interview in which a participant identifies concepts and relationships, which are immediately translated into an FCM by the facilitator. Interview questions are frequently open-ended and designed to extract valued system components and their corresponding interactions (see Sect. 2.3 for details). The facilitator should be familiar with structuring answers into an FCM, but does not need to be familiar with the application field. A facilitator can be particularly useful in situations where the participant may not be able to create a map themselves, for example in the case of technological barriers or virtual meetings. This reduces the load on the participant, who can thus focus on active mapping participation and use the FCM as a boundary object in the interview. However, this approach shifts the workload from the participant to the facilitator, who needs to quickly create the map as the conversation unfolds. The cognitive load of juggling tasks (asking questions, interpreting model structure, creating the model, developing follow-up questions, keeping the conversation on track, etc.) can be challenging for a single person. While not always possible due to resource constraints like time and labor, we recommend two facilitators for facilitator-assisted mapping, with one individual handling map generation/creation and the other running the interview. An alternative is for the facilitator to later create or polish the map based on a recorded interview.
Footnote 1: Past studies have examined the impact of knowledge diversity on the structure of the resulting maps. In particular, they found that variables that are essential to steer the system towards a desired outcome were present in most of the participants' FCMs, but there can be significant variability between these core variables and the rest of the system [50].
The most common error we have experienced with the facilitator mapping approach is incorrect signage on relationships (indicating a positive relationship when a participant intended a negative polarity). To avoid this error, the facilitator should not assume that a sign is 'obvious', which may be tempting if the facilitator happens to be familiar with the application domain. Rather, the facilitator should discuss the preferred state of model concepts with the participant by explicitly asking if they want a system component to increase or decrease [25].

Ultimately, the decision between (a specific type of) facilitator mapping and participant mapping comes down to whichever approach is best suited to the research question(s), study context, and resources. Some studies have engaged in mixed methods, by first performing interviews to 'activate memories through eliciting narratives of actual experiences' and then asking participants to draw their own maps [45]. While such variation exists, we view the following as best practices for collecting individual interviews:
1. Begin all interviews with clearly defined concepts. This reduces variability and potential misunderstandings.
2. With the participants' consent, record the interview. This will help later on when structuring the FCM.
3. Involve two interviewers, where one listens and the other takes notes. This seeks to reduce interpretation risks. One protocol is that a researcher builds the FCM in front of the participant while the second interviewer listens in and asks follow-up questions as appropriate. Another protocol is that a researcher takes just enough notes to ask follow-up questions but does not build the FCM yet, while the second interviewer listens in to help with structuring the FCM after all interviews are complete (this will involve replaying all recorded interviews).
4. Perform validation, both during the interview (e.g., to check the interpretation of concepts and relationships so far) and after the interview (to review the FCM).
Although best practice #3 seeks to reduce interpretation issues by interviewers, the use of such facilitators adds a layer of potential biases in externalizing mental models from participants. The facilitation process is thus subject to additional validation.
2.1.2.2 Working Indirectly with Participants: Questionnaires
The techniques mentioned above can produce rich information by revealing concepts and their relationships, but they may not scale easily. For example, interviewing a large number of participants may not be feasible. Even when maps are produced by participants themselves, modelers may have to spend time resolving differences across these maps to combine them (e.g., different words with equivalent meanings). When modelers need to either scale up participation or move quickly on creating an FCM for which conceptual models already exist, they can turn to creating a survey
and administering it online (see Footnote 2). This option requires, at a minimum, a pre-determined concept list created before the survey is designed. The list can be produced either through open-ended pilot questions answered by participants or from a literature review. For example, Aminpour et al. [1] used a series of "known" concepts from the literature about coastal shorelines to first ask participants which ones were important for coastal ecosystem dynamics. Once the concepts are collected, relationships are then assessed by asking, "does component A in the system have an impact on component B in the system?". This question can be answered with either a categorical scale (e.g., a seven-point Likert scale of strong negative, moderate negative, weak negative, none, weak positive, moderate positive, and strong positive) or a numerical scale (e.g., 0–10). However, this pairwise comparison process can be mentally draining for systems larger than just a few nodes (see Footnote 3).

Two options have been proposed to deal with this problem. One option is to pre-determine not only the concept list but also the relationships of interest (Fig. 2.2), thus significantly reducing the number of questions for participants [21]. This is possible when there are existing conceptual models for the application domain, and modelers may still include open-ended questions to identify relationships or concepts that participants recommend adding. Another option is to ask participants to enter the weights of relationships in a matrix (e.g., as an Excel spreadsheet) instead of addressing each relationship through a dedicated question. We do not recommend this option, as it can lead to the creation of unusually dense FCMs where participants see every concept as potentially related to everything else [51].

Since questionnaires can be sent to a large pool of respondents and results are automatically organized, they can generate a much larger sample size than the interview techniques previously discussed, with the potential to produce FCMs from hundreds if not thousands of participants. However, these FCMs are highly limited and prone to error if the participant does not understand the task. We thus recommend including "red herring" concepts (e.g., what impact does increasing "bananas" have on the "amount of sunlight"?). Including such illogical relationships in surveys allows measurement error in incorrect relationships to be identified and ultimately omitted from the final analysis, which is especially helpful when collecting hundreds of FCMs from participants through surveys.
Footnote 2: This is a much less frequently used option to create FCMs with participants; hence, fewer case studies will be available to readers interested in this option, by contrast with interviews or direct mapping by participants.
Footnote 3: For an FCM of n nodes, each one could theoretically have an edge pointing to all other nodes. There are thus n × (n − 1) potential relationships to assess. For just a small map of 10 concepts, that would already mean asking 90 questions of participants, without even accounting for duplicate questions or "red herring" concepts that serve as quality control.
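To illustrate the scale of the pairwise questioning described in Footnote 3, the short sketch below generates every ordered question for a hypothetical concept list. The concept names are illustrative, and "bananas" is a hypothetical red-herring concept included as a quality-control check, as suggested above.

```python
# A small sketch of generating the full pairwise question list for a survey.
from itertools import permutations

concepts = ["hunting", "wildlife populations", "food demand", "rainfall"]
red_herrings = ["bananas"]  # illogical concept used to detect careless answers

pairs = list(permutations(concepts + red_herrings, 2))
for a, b in pairs:
    print(f"Does '{a}' have an impact on '{b}' in the system?")

# n items yield n * (n - 1) ordered pairs: here, 5 * 4 = 20 questions.
assert len(pairs) == 20
```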
Fig. 2.2 Example of a survey question asking participants to identify the weights of pre-identified relationships [21]. 'Unsure' allows a participant to state that the relationship exists, but that the weight is unclear. 'Non-existent' allows a participant to voice that the relationship should not exist. These options are important and are treated differently when gathering all responses ('unsure' is skipped, and 'non-existent' is treated as a 0). In this instance, we used Google Sheets to create the form, which automatically assembles answers in a spreadsheet (below) that can be downloaded as an Excel file or CSV file and passed on to other packages such as FCMpy to create the FCM [35]
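As a concrete companion to that workflow, the hedged sketch below averages downloaded responses into edge weights, applying the handling described in the caption ('unsure' skipped, 'non-existent' treated as 0). The CSV column names and the linguistic-to-numeric mapping are assumptions for illustration; they are not the format produced by Google Sheets or expected by FCMpy.

```python
# A hedged sketch of averaging survey answers into edge weights.
import csv
from collections import defaultdict

SCALE = {  # illustrative mapping of linguistic answers to numbers on [-1, 1]
    "strong negative": -0.75, "moderate negative": -0.5, "weak negative": -0.25,
    "non-existent": 0.0,
    "weak positive": 0.25, "moderate positive": 0.5, "strong positive": 0.75,
}

def aggregate(path):
    sums, counts = defaultdict(float), defaultdict(int)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):  # expected columns: source, target, answer
            answer = row["answer"].strip().lower()
            if answer == "unsure":     # respondent has no opinion on the weight
                continue
            edge = (row["source"], row["target"])
            sums[edge] += SCALE[answer]
            counts[edge] += 1
    return {edge: sums[edge] / counts[edge] for edge in sums}

# weights = aggregate("responses.csv")  # averaged weight per relationship
```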
2.1.3 Hand-Drawn Models Versus Modeling Software

Another consideration during protocol development is the medium used to collect data. Models can either be hand-drawn during interviews or built in modeling software (see Footnote 4). Models are often hand-drawn during in-person interviews when field sites are not conducive to computer use, such as in rural areas. As an example, a seminal study used drawings when engaging with participants in the Uluabat Lake watershed in Turkey, who were primarily farmers, hunters, or fishermen [40]. In addition, software such as MentalModeler requires internet access, which can be a limiting factor for the interviewee. Hand-drawn models can also be more accessible for participants with technological access issues or lower technological literacy. Scholars have also argued that hand-drawn sessions can promote engagement [5]:

At the building stage, the main decision is whether to use pen and paper (and probably post-it notes), or to use software straight away. Using software can seem like an efficiency saving for a map which you will need to digitise at some point; however, it comes at a big potential cost of engagement and inclusiveness when building maps in a group. Using software excludes people who are not confident using computers or unfamiliar software, and if the facilitator operates the software alone, this makes them a bottleneck on the process.
Note that hand-drawn maps will require significant data processing efforts from the modeling team, since they will have to be digitized (e.g., into Microsoft Excel or a CSV file). The use of modeling software thus also decreases labor during data processing, and allows an immediate analysis or scenario testing with participants. Given the rise of remote work and meetings during the COVID-19 pandemic, maps can also be collected in person or virtually, which can inform the choice to hand-draw or to use software to generate maps. There are trade-offs to each approach; personal connections and rapport can be easier to form in person. Participant recruitment may also be easier, as a wider range of techniques, including in-person intercepts, can be utilized. Further details on participant recruitment are discussed below in Sect. 2.2.2. Both enhanced personal connection and recruitment are important considerations for the inclusion of traditionally underrepresented groups, as they can build trust and increase willingness to participate. Furthermore, in-person data collection can be easier for participants with limited access to resources such as computers or the internet, or with other technological barriers. For example, work focusing on the experiences and knowledge of elders might be better suited to in-person modeling. Conversely, virtual modeling sessions can be easier when using modeling software, as the modeler can share their screen. Virtual meeting platforms also have built-in recording technology, making it easy to capture audio and video. Finally, virtual meetings are flexible, so researchers can reach participants in different physical locations or with time constraints.

Footnote 4: Both MentalModeler and FCM Expert have a Graphical User Interface that allows participants to directly create concepts, link them, and set weights. The same software can then be used to test scenarios. If testing scenarios is not immediately needed, then participants could use any software that supports the creation of directed, labeled networks. The files can then be imported into a wider range of FCM-focused libraries, either in Java via JFCM [9] (https://sourceforge.net/projects/jfcm) or in Python through FCMpy [35] (https://pypi.org/project/fcmpy/).
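Digitization itself is mechanical once a hand-drawn map has been transcribed; below is a minimal sketch of one way to do it. The (source, target, weight) edge-list format and the example edges are illustrative assumptions, not a standard used by the tools above.

```python
# A minimal sketch of digitizing a transcribed hand-drawn map into a weight matrix.
import numpy as np

edge_list = [  # transcribed by hand from the paper map (hypothetical edges)
    ("food demand", "hunting", 0.6),
    ("hunting", "wildlife populations", -0.8),
    ("rainfall", "wildlife populations", 0.5),
]

concepts = sorted({c for source, target, _ in edge_list for c in (source, target)})
index = {c: i for i, c in enumerate(concepts)}
W = np.zeros((len(concepts), len(concepts)))
for source, target, weight in edge_list:
    W[index[source], index[target]] = weight

print(concepts)  # row/column order of the matrix
print(W)
```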
Procedure: Create an FCM via pen-and-paper
An unstructured procedure is to simply take a piece of paper, then add and link concepts as needed. While this is simple to explain, it can lead to a number of problems. Participants may later need to connect two concepts that are far apart, thus drawing meandering links around the map. Participants may forget that they already have a certain concept, because it is buried in a busy area of the map, and hence make a duplicate (possibly with a different name). The resulting map may be messy, and difficult to digitize. Consequently, the literature has often involved a structured approach that first elicits all concepts, and only then adds edges [28]. In a workshop setting, we can decompose the elicitation of concepts into several phases: (i) generate candidate concepts on post-it notes, (ii) organize them in clusters (and optionally sort them by importance within each cluster), (iii) identify duplicates or less important concepts and hence trim the list, then (iv) arrange the remaining post-it notes and draw the connections. The procedure is the same as in applied concept mapping [36], the only difference being that step (iv) requires specifying each connection as increasing/decreasing with a given strength. Figure 3 in [46] illustrates the creation of the post-its (i) and their organization into clusters (ii). When post-it notes are not an option and/or time is very limited, a simplified approach was proposed by O'Garra et al., who asked participants to write all concepts on the left side of a large 11" x 17" sheet of paper. Once all concepts were identified, participants would redraw their concepts in the center of the same sheet, this time connecting them with edges. Reusable instructions are available as Appendix 1 in [39]. Although we recommend this structured approach, other procedures have been used in the literature. Note that low-technology sticky notes can be replaced with magnetic paper used on a magnetic dry erase board, which allows participants to relocate concepts and keep their models clean by erasing edges [6].
2.1.4 Pre-defined, Open-Ended Concepts or Hybrid Approach

An FCM protocol can consist of a pre-defined concept approach, an open-ended concept approach, or a hybrid approach combining the two. When using pre-defined concepts, interviewees are presented with a fixed set of concepts at the start of the modeling exercise. Conversely, an open-ended approach directs participants to bring their own concepts into the model, based on what they value or perceive to be important in their system, after the initial concepts are assigned. In a hybrid approach, participants are both given a set of pre-defined concepts and encouraged to bring their own (additional) concepts into the conversation.
Fig. 2.3 To identify whether terms used by different stakeholders have an equivalent meaning, it is useful to view them in context. For instance, seeing the terms within their respective maps can reveal that they share several causes and effects, thus raising the possibility that they are equivalent. Conversely, some terms may appear similar, but situating them within their maps can show that they are used in a very different manner hence they should not be combined. In this example, we used several displays to visualize multiple maps at once and combine equivalent terms [13]
There are several tradeoffs to take into account when selecting a pre-defined or open-ended concept approach. One key tradeoff to consider is where time and labor are spent. For pre-defined concepts, labor is front-loaded to develop the concepts, which then saves considerable time after data collection when aggregating or comparing models. For instance, a concept list can be built in a workshop [10] or selected from the literature and validated with participants via a survey [17]. With open-ended concepts, the labor is back-loaded, with potentially considerable amounts of time being spent on concept standardization (Fig. 2.3) before model analysis can begin. For example, "bad weather", "good weather", "amount of rain", and "drought" may all have the same impact on the system being modeled and could all be reduced qualitatively to "amount of rain annually". In one project, we had individual maps from 22 participants, resulting in 361 concepts. We found that 134 concepts were duplicates (37.1%) while 227 were unique (62.9%). Identifying these 134 concepts took multiple rounds, requiring a subject-matter expert alongside a modeler [18].

Oftentimes dealing with open-ended concepts requires qualitative binning to conduct quantitative analyses, which can be complex and time consuming, particularly if apparent commonalities fail to emerge. For instance, in a fisheries study, fishers may add their primary and secondary target species (e.g., red snapper, striped bass) to their
individual maps during data collection, and the analyst may then bin those individual species into "primary target species" and "secondary target species" during model aggregation. The opposite is also true, where participants may use the same term but mean multiple different things. For example, reviews of audio recordings indicated that "Fishery" was used to mean four distinct concepts, depending on how participants used the term in the FCM [48] (Table 2.2). To the extent possible, we recommend having a word bank of commonly used terms that might appear during the interview to reduce post-analysis labor requirements (a minimal sketch of this binning step is given at the end of this subsection).

During the FCM protocol development, it is important to assess what knowledge is known and what knowledge the protocol is being designed to collect. The pre-defined concept approach allows researchers to assess relationships between specific system components of interest; researchers need to know which concepts are important to the system before going into the conversations. In situations with emergent problems or ill-understood systems, having pre-defined concepts might be challenging. Facilitators must also ensure that such concepts have consistent meaning across participants. Creating concept definitions can be necessary, further increasing labor on the front end of the project. In addition, having pre-defined concepts can limit the data collected or how the conversation can develop, as participants may identify novel concepts yet would not be able to add them. This may affect key elements of the system and incorrectly identify the components most central to the individual.

An open-ended approach is much more flexible, as any concept can be introduced to the model, but it can be intimidating for some participants (i.e., the infinite possibilities of a blank page, known as "blank page syndrome"). Furthermore, an open-ended approach runs the risk of getting off-track without proper facilitation. Details on facilitation considerations and approaches are discussed below in Sect. 2.2.3. Time spent modeling with participants can be a limiting resource in many studies, so ensuring only relevant information is captured is important.

A hybrid approach, in which a combination of pre-defined and open-ended concepts is embedded into the model-building protocol, can be useful to alleviate many of these challenges. In this approach, the researchers pre-determine a set of fixed concepts based on their research question while also allowing participants to bring in their own concepts that best represent their system as they perceive it. By starting the interview with core, commonly defined concepts, participants get familiar with the FCM approach and the "blank page syndrome" can be avoided. Participants can then build upon the pre-defined concepts in any direction they wish. While a hybrid approach can account for the data collection trade-offs of both methods, it can require more time and labor for protocol development and data analysis.
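As promised above, here is a minimal sketch of scripting the binning step once a word bank exists. The synonym mapping, concept names, and averaging rule are illustrative assumptions rather than the procedure used in [18] or [48]; in practice the bank is built and reviewed with a subject-matter expert.

```python
# A minimal sketch of qualitative binning with a hypothetical word bank.
WORD_BANK = {
    "bad weather": "amount of rain annually",
    "good weather": "amount of rain annually",
    "amount of rain": "amount of rain annually",
    "drought": "amount of rain annually",
    "red snapper": "primary target species",
    "striped bass": "secondary target species",
}

def standardize(edges):
    """Relabel concepts in a {(source, target): weight} map and merge duplicates."""
    merged = {}
    for (source, target), weight in edges.items():
        key = (WORD_BANK.get(source, source), WORD_BANK.get(target, target))
        merged.setdefault(key, []).append(weight)
    # average the weights when several raw edges collapse onto the same binned edge
    return {key: sum(ws) / len(ws) for key, ws in merged.items()}

raw = {("amount of rain", "crop yield"): 0.6, ("good weather", "crop yield"): 0.7}
print(standardize(raw))  # {('amount of rain annually', 'crop yield'): 0.65}
```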
Table 2.2 Qualitative analysis of transcribed FCM interviews demonstrates variation in the definitions of concepts mentioned by stakeholders [48]. For each node, the table lists the measure descriptions given by participants and the frequency with which each stakeholder group used them (Com Fish = commercial fishery, Rec Fish = recreational fishery, S = scientist, M = managers, NGO = non-governmental organization, T = tourism; # means 'number of'). Nodes are grouped into four categories: 'Fishery, economy' (e.g., age structure, alternative employment opportunities, average age of fishers, fisheries administration, fishery as main income, producer organization, artisanal fishery, economic situation of the fishery, fishery as side income), 'Politics, management' (e.g., Common Fisheries Policy (CFP), fisheries policy, Marine Stewardship Council, maximum sustainable yield, presumed stock size, regulations, politics), 'Tourism, recreation' (e.g., gastronomy), and 'Other (social)' (e.g., community and nature experience, nature citizenship, tradition and custom, nature conservation). Measures for the same node ranged from counts (e.g., '#implemented measures', '#regulations') to amounts and strengths (e.g., 'amount of sales', 'strength of lobbying'); see [48] for the full cross-tabulation by stakeholder group.
2.2 Data Collection

2.2.1 Creating a Parsimonious Model and Weighting Connections

The FCM protocol can work to quantify (through qualitative reasoning) the perceived interactions between the system components that comprise a problem space or complex system to be modeled. Participants will be asked to identify important relationships that they perceive between concepts in the FCM. The interviewer will pose the question, "does component A in the system have an impact on component B in the system?" If they do see an impact, interviewees are asked whether the corresponding relationship is positive or negative [5]. It is therefore important to specify with the participant that the terms positive and negative represent how an increase in component A would either increase (if the relationship is positive) or decrease (if the relationship is negative) component B. Often, participants will think of the relationship signs in a "normative" sense, meaning that a positive relationship is good and a negative relationship is bad [5]. However, this is not always the case, and it depends on the preferred state of each individual concept.

Keep in mind that only salient relationships between concepts should be represented, in order to get a parsimonious model that captures important causal relationships and avoids "everything being connected to everything in theory". FCM researchers should try to represent relationships elicited from participants and not simply connect all components to everything else. These relationships usually emerge in normal narrative conversations via the interview process.

Finally, interviewees are asked to assign weights to the causal links drawn between the components. At this stage, it is important to specify the meaning of the relationship weights with the participants. Weights are not meant to represent uncertainty in the linkage, but rather the relative impact that component A has on component B, in relation to the other linkages in the system. Participants may choose from a reduced set of weights (strong, moderate, weak) or a full set of weights (very low, low, medium, high, very high). 'Zero' or 'none' may be included so that participants can express that a relation does not exist [37]. These linguistic variables capture the 'grade of influence' of one concept on another, and they are associated with a fuzzy membership function (see Footnote 7). Using Likert-type scales in everyday language makes it simple for participants to describe the strength of a relationship. There are also studies in which participants rated the degree of influence using a numerical scale from −3 (strong negative) to +3 (strong positive) [17]; the numbers are then scaled instead of using fuzzy logic.
Footnote 7: Triangular membership functions are commonly used, as illustrated by Fig. 5 in [33] or Fig. 2 in [37]. An alternative is a trapezoidal membership function, as shown by Fig. 3 in [32]. Libraries such as FCMpy can take in data with linguistic variables (e.g., 'very low') and automatically associate them with a fuzzy membership function, so that modelers do not need to define such functions themselves.
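To make the footnote concrete, the sketch below implements triangular membership functions with centroid defuzzification. The breakpoints assigned to each linguistic term are illustrative assumptions, not the exact functions used by FCMpy or in the cited figures.

```python
# A minimal sketch of triangular membership functions with centroid defuzzification.
def triangular(a, b, c):
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

TERMS = {  # hypothetical positive-side terms on [0, 1]
    "very low":  triangular(0.0, 0.1, 0.3),
    "low":       triangular(0.1, 0.3, 0.5),
    "medium":    triangular(0.3, 0.5, 0.7),
    "high":      triangular(0.5, 0.7, 0.9),
    "very high": triangular(0.7, 0.9, 1.0),
}

def defuzzify(term, steps=1000):
    mu = TERMS[term]
    xs = [i / steps for i in range(steps + 1)]
    num = sum(x * mu(x) for x in xs)
    den = sum(mu(x) for x in xs)
    return num / den  # centroid of the membership function

print(defuzzify("high"))  # roughly 0.7
```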
2.2.2 Participant Recruitment

Creating a sample design for participant recruitment is an important step in the early stages of an FCM study. Once the stakeholder group(s) or "experts" have been established, key considerations include spatial representation; representation of different demographic groups and socioeconomic backgrounds; and consideration of who has and has not traditionally been involved in research studies and/or decision-making processes. When recruiting from a variety of stakeholder groups for a comparative study, it is important to evaluate each of these considerations across and within groups to ensure a representative and balanced sample.

Producing a robust sample can often be challenging, particularly when producing FCMs in novel or poorly understood situations. A common limitation in FCM studies is the ability to capture a wide audience and variety of stakeholders, because determining components and relationships often takes long periods of time [25]. Due to this time cost and the burden placed on interviewees, capturing a full range of socioeconomic backgrounds can be challenging. Furthermore, a general lack of trust, historical under-representation, and lack of inclusivity can decrease willingness to participate among low-income individuals and people of color across human subject research. Access to resources such as computers and the internet can also limit the participation of lower-income community members when conducting virtual studies, which are becoming increasingly popular in the post-COVID-19 era.

One approach to mitigate these constraints and incentivize participation is compensation, when funds allow (see Footnote 8). Examples of frequently used compensation are honorariums, gift cards, or raffle entries. For example, when participants were asked to develop their own FCMs independently with online instructions, Aminpour et al. [3] compensated each one with $50. In other studies, where participants were interviewed, the same amount of compensation was used, estimating that participants would spend around 1.5 h in interviews and follow-up validation requests. Deploying a mixed-method approach to recruitment can also help mitigate these challenges, given differing communication preferences, time limitations, and varying comfort levels in naming referrals.

Footnote 8: Researchers should account for the potential impact that compensation may have on the ethical clearance of their study by the relevant Institutional Review Board. For instance, in some institutions, rules for exemption include 'no payments or benefits in kind' alongside: no deception in study design, participants are not vulnerable, questions are neither sensitive nor offensive, no risks to physical or mental health, and confidentiality and anonymity are guaranteed [41].

Several approaches to participant recruitment can be deployed. Purposive sampling is often utilized to gather FCMs from participants with particular knowledge pertaining to the project research questions [20]. Projects aiming to gather knowledge from subject experts such as researchers or managers might utilize purposive sampling, extracting participant information from, for example, government, non-profit organization, or academic institution websites. Projects aiming to recruit specific stakeholder groups might use other public data sources for participant recruitment. Examples include license databases, residential addresses through local government
online resources, and phonebooks, to name a few. Community partners such as NGOs may also maintain relevant lists, as exemplified in a project on farm households [4]. When applying purposive sampling, we recommend disclosing (i) the criteria upon which individuals were identified, (ii) how many invitations were sent, and (iii) how many were accepted [43].

When working in a small spatial area, in-person intercept surveying (e.g., approaching beachgoers, or charter boat captains at their places of work or leisure) can be useful for recruitment. This approach often works best when targeting a stakeholder group conducting a specific activity in a public area. This technique is frequently deployed for creating FCMs with fishers, intercepting shore recreational or subsistence fishers at public fishing piers, and charter, commercial, or other recreational boat fishers at public docks. Other examples of potential intercept survey areas include tourism operator booths or shops, public beaches, or offices open to the public, such as government or non-profit office locations, depending on the research questions at hand and the corresponding stakeholders of interest.

Often, a combination of purposive sampling and snowball sampling (also referred to as chain referral sampling) is used [34, 49]. Snowball sampling is a recruitment technique in which interviewees refer potential new study participants. When using this technique, it is important to have several starting points established to ensure a diverse sample. In some cases, participants were explicitly asked for the names of potential participants who would think differently about the issue of interest than they do [29]. The study sample is often considered complete, and recruitment ends, when a saturation point has been met, meaning no new concepts are being brought in during map creation.
2.2.3 Facilitation Considerations

Working with people always presents challenges that can generally be managed through a combination of intentional protocol development and practice as a facilitator. It is important to develop rapport with participants and build a space for open and honest communication. Some examples of intentional protocol development we have found successful are: (1) beginning sessions with introductions from both the facilitator(s) and participant(s) to build familiarity, so everyone is aware of the roles and backgrounds of those in the room; (2) setting norms and guidelines for discussion, such as speaking from an "I" perspective or addressing ideas and not people; and (3) using time at the end of the session for reflection and open comments, while acknowledging that a majority of mental and emotional processing will happen outside of the session.

Some participants will take to system mapping and be excited to explain their knowledge in terms of directed and weighted connections, while others prefer to share stories and narratives and mostly ignore the mapping process. We recommend first trying to re-engage the participant with the map, especially if participant-generated mapping is central to the protocol. However, it may be appropriate to transition to more facilitator-generated mapping to ensure a robust and nuanced conversation with
participants who find it difficult to translate their experiences into an FCM. If there is some flexibility in the protocol, it can be very impactful to adapt to different styles of interviewees and their preferences for engaging with the FCM, while maintaining high data fidelity. This might look like giving a participant with less interest in the hands-on modeling process more space to talk, with less frequent check-ins to validate model structure than a participant who is eager to see the model develop. Facilitation skills and adaptation can be vital to smoothly move the conversation along and to respect where participants are, while still productively capturing knowledge and understanding about a system. For example, we might offer low-stakes opportunities to directly engage with the FCM at the beginning of the modeling session to gauge interest and shape the rest of the conversation. In our work, we had participants draw their own connections on a simple traffic example to become comfortable with expressing the direction, polarity, and strength of connections. We also started the data collection portion of the interview by having participants assign weights to three existing connections, to make the start of mapping less intimidating. Regardless, the data collection methodology must align with the theory backing the study and the analysis planned. How much flexibility and adaptability facilitators may exert should be established before interviews are conducted.

As with all research involving human subjects, researchers and facilitators of FCM collection need to be mindful of risks to participants. Depending on the subject matter being discussed, conversations can be emotionally charged and run the risk of emotional or psychological harm. Being aware of a participant's relationship to the topic can inform how mapping is carried out. For example, creating an FCM about gun violence in a neighborhood with a journalist versus a community member would require facilitation differences, such as acknowledging the difficulty of the topic upfront, taking time and space to process emotions, and checking in with participants to minimize risks of retraumatization.

Researchers themselves are never a truly neutral party, and it has been our experience that the best practice is to actively acknowledge the relationships between facilitators, participants, and the socio-ecological systems captured in FCMs. Social identities like race, ethnicity, gender, and sexuality can deeply influence how participants perceive a facilitator and thus what they are willing to share. In addition, attributes beyond identity, such as how a facilitator dresses, their title, or how old they seem, can affect the data collection. These influences are unavoidable, so researchers should be intentional when selecting and training facilitators depending on the research context and participant populations. While the influence of facilitator positionality and social identity on data collection may vary with the subject matter, it is always recommended to explicitly consider the implications of these factors. In Table 2.3 we describe three examples of how modeling sessions may be influenced and recommend some considerations.
Table 2.3 Some considerations of the social aspects of group modeling sessions that might influence the development of an FCM

Example scenario 1: A white researcher is entering a predominantly Black community to discuss the effects of structural racism on their public education system. In individual interviews, the facilitator notices that some community members gave surface-level or very neutral information and retreated when nudged to share their true thoughts. In an anonymous feedback survey, participants stated that they felt uncomfortable because the facilitator is not only an outsider to their geographic community, but also a white academic, which is a group that has historically made significant contributions to systemic oppression.
Recommendation: Consider specifically seeking a researcher or facilitator from the geographical area and/or with a similar social identity to the study population. It can also be powerful to partner with a community-based organization to co-facilitate the interviews. This can take some additional resources to train partners, but ensures that the research team includes local knowledge and experience from the start. It can also be useful to leverage the existing networks of partners for participant elicitation, which closes the research-practice gap.

Example scenario 2: The facilitator in charge of a modeling workshop on domestic abuse prevention is a man, and most of the participants are women. He noticed during the workshop that several participants paused in their discussion to apologize to him for generalizing or to say things like "well I know you're not like that." The facilitator knows that any statements about men and the patriarchy are not personal attacks; he just wants to make sure that his identity doesn't get in the way of a meaningful conversation or make a participant feel like they have to censor themselves.
Recommendation: Have the facilitator address the intersection of social identity and the modeling topic at the start of the session and assure participants that they can speak freely. Specifically, they could establish the discussion norm that everyone is assumed to be motivated by good intent and that any harmful statements are not on purpose.

Example scenario 3: A cisgender facilitator is running a focus group on barriers to transgender healthcare. After asking participants to explain a few terms, she finds that she is getting less engagement from participants. Towards the end there are long silences after her questions, and a participant shares that they think it would take a lot of additional effort to get the facilitator to understand their experiences, and that she might not ever really "get it".
Recommendation: Ensure that the facilitator has prepared beforehand in order to have a deep and nuanced conversation about the topic. This includes knowing jargon, cultural practices, physical locations, and other system-specific knowledge. While the facilitator does not need to be an expert, showing a lack of what participants consider basic knowledge can create distance and make the facilitator seem less credible. Facilitator training (through academic and non-academic media, attending community meetings, conducting pilot or scoping interviews, etc.) can narrow the gaps but, in some cases, it is highly recommended to have a facilitator with a similar social identity/life experience to establish rapport and relatability.
2.3 Conclusions

One major benefit of FCM as a method is its flexibility as a social science research tool; however, such flexibility also requires researchers to make research design decisions which are not always straightforward. We suggest that researchers think about how to align both the qualitative and quantitative strengths of FCM with the overall purpose or goals for why the study is being conducted in the first place. Whose knowledge is being collected and for what purpose? How are predetermined concepts defined? How can data collection be standardized to reduce measurement error while acknowledging that, if facilitated, the mental models of research participants are communicated qualitatively and processed through the facilitator's mental model before ultimately being represented? If participants are creating their own FCMs, are the interpretations of the instructions understood uniformly to the extent possible? These questions are not easily answered by anyone other than the researcher(s) designing the study. As is the case with all research, such questions should be asked at the start of the study to minimize measurement error, to ensure that any conclusions are theoretically grounded in previous research, and to ensure that the purpose of using FCM as a knowledge capture technique makes sense for the research questions being asked.
Exercises

1. Consider a problem on which you have accrued experience. For example, it could be a hobby such as fishing, cooking, or hiking. Thinking of yourself as a participant, create an FCM for this problem. This would include creating concepts, connecting them with edges, and specifying the type (positive/negative) and strength (very low, low, medium, high, very high) of each edge. Aim for at least 7 different concepts.
2. Perform exercise #1 above with a peer, so that you each have created an FCM on the same problem. Then, assemble your individual maps into a group-level map. Explain which concepts you have combined, and why. Also explain whether certain concepts could be eliminated, and the rationale for doing so.
3. Reuse the FCM that you created in exercise #1. Create a Google Sheets survey (see Fig. 2.2) to obtain the edge weights and send it to multiple participants.
4. Download the supplementary materials of [23], located at https://www.mdpi.com/2079-8954/9/2/23#app1-systems-09-00023. Open activity S1, read the case on the first page, and complete the three questions located on the second page.
5. Discuss whether we should set the list of terms for participants prior to a workshop, during the workshop, or a combination of the two. Your discussion should clearly explain the pros and cons of each approach.
6. Why could it be counter-productive to assemble all terms in a circle and ask participants to connect them? (Hint: the main issue is about organizing terms in a circular manner.)
7. Should an FCM include every relation possible (even when the evidence is limited), or should it focus on covering only certain relations?
8. Consider each of the following three situations. For each one, explain (i) how you will recruit participants, and (ii) how you will create the FCMs with them.
   a. Fishermen at a lake located in a remote, rural location. The problem has been well-studied before and there is abundant literature to draw upon on fishing in such lakes, albeit not in this specific region.
   b. A mix of subject-matter experts from regional institutions and residents of the region. This is an emerging problem, with limited literature available.
   c. Academic experts located at several institutions nationwide. The experts are familiar with the literature, having written many of the articles themselves.
References

1. P. Aminpour, S.A. Gray, M.W. Beck et al., Urbanized knowledge syndrome–erosion of diversity and systems thinking in urbanites' mental models. npj Urban Sustain. 2(1), 11 (2022)
2. P. Aminpour, S.A. Gray, A.J. Jetter et al., Wisdom of stakeholder crowds in complex social-ecological systems. Nat. Sustain. 3(3), 191–199 (2020)
3. P. Aminpour, S.A. Gray, A. Singer et al., The diversity bonus in pooling local knowledge about complex problems. Proc. Natl. Acad. Sci. 118(5), e2016887118 (2021)
4. S. Aravindakshan, T.J. Krupnik, S. Shahrin et al., Socio-cognitive constraints and opportunities for sustainable intensification in South Asia: insights from fuzzy cognitive mapping in coastal Bangladesh. Environ. Dev. Sustain. 23(11), 16588–16616 (2021)
5. P. Barbrook-Johnson, A.S. Penn, Fuzzy cognitive mapping, in Systems Mapping: How to Build and Use Causal Models of Systems (Springer, 2022), pp. 79–95
6. C.J. Bardenhagen, P.H. Howard, S.A. Gray, Farmer mental models of biological pest control: associations with adoption of conservation practices in blueberry and cherry orchards. Front. Sustain. Food Syst. 4, 54 (2020)
7. C. Bosma, K. Glenk, P. Novo, How do individuals and groups perceive wetland functioning? Fuzzy cognitive mapping of wetland perceptions in Uganda. Land Use Policy 60, 181–196 (2017)
8. A.M. Cafer, K. Gordon, G. Mann, K. Kaiser, Fuzzy cognitive mapping and photovoice: a pilot of a novel participatory methodology for addressing equity in community resilience research. Local Dev. Soc. 4(1), 212–228 (2023)
9. D. De Franciscis, JFCM: a Java library for fuzzy cognitive maps, in Fuzzy Cognitive Maps for Applied Sciences and Engineering: From Fundamentals to Extensions and Learning Algorithms (2014), pp. 199–220
10. C.E. de Jong, K. Kok, Ambiguity in social ecological system understanding: advancing modelling of stakeholder perceptions of climate change adaptation in Kenya. Environ. Modell. Softw. 141, 105054 (2021)
11. T. Devisscher, E. Boyd, Y. Malhi, Anticipating future risk in social-ecological systems using fuzzy cognitive mapping: the case of wildfire in the Chiquitania, Bolivia. Ecol. Soc. 21(4) (2016)
12. G. Dove, S.J. Abildgaard, M.M. Biskjær, et al.
13. L. Drasic, P.J. Giabbanelli, Exploring the interactions between physical well-being and obesity. Can. J. Diabetes 39, S12–S13 (2015)
14. M. Eriksson, M. Safeeq, T. Pathak, B.N. Egoh, R. Bales, Using stakeholder-based fuzzy cognitive mapping to assess benefits of restoration in wildfire-vulnerable forests. Restor. Ecol. 31(4), e13766 (2023)
15. F.A.F. Ferreira, M.S. Jalali, Identifying key determinants of housing sales and time-on-the-market (TOM) using fuzzy cognitive mapping. Int. J. Strateg. Prop. Manag. 19(3), 235–244 (2015)
16. H.S. Firmansyah, S.H. Supangkat, A.A. Arman, P.J. Giabbanelli, Identifying the components and interrelationships of smart cities in Indonesia: supporting policymaking via fuzzy cognitive systems. IEEE Access 7, 46136–46151 (2019)
17. K. Fonseca, E. Espitia, L. Breuer, A. Correa, Using fuzzy cognitive maps to promote nature-based solutions for water quality improvement in developing-country communities. J. Cleaner Prod. 377, 134246 (2022)
18. A.J. Freund, P.J. Giabbanelli, Automatically combining conceptual models using semantic and structural information, in 2021 Annual Modeling and Simulation Conference (ANNSIM) (IEEE, 2021), pp. 1–12
19. K.L. Furman, P. Aminpour, S.A. Gray, S.B. Scyphers, Mental models for assessing coastal social-ecological systems following disasters. Marine Policy 125, 104334 (2021)
20. S. Galehbakhtiari et al., A hermeneutic phenomenological study of online community participation: applications of fuzzy cognitive maps. Comput. Hum. Behav. 48, 637–643 (2015)
21. P.J. Giabbanelli, Modelling the spatial and social dynamics of insurgency. Sec. Inf. 3, 1–15 (2014)
22. P.J. Giabbanelli, K.L. Rice, M.C. Galgoczy et al., Pathways to suicide or collections of vicious cycles? Understanding the complexity of suicide through causal mapping. Soc. Netw. Anal. Min. 12(1), 60 (2022)
23. P.J. Giabbanelli, A.A. Tawfik, How perspectives of a system change based on exposure to positive or negative evidence. Systems 9(2), 23 (2021)
24. P.J. Giabbanelli, T. Torsney-Weir, V.K. Mago, A fuzzy cognitive map of the psychosocial determinants of obesity. Appl. Soft Comput. 12(12), 3711–3724 (2012)
25. S.A. Gray, S. Gray, J.L. De Kok et al., Using fuzzy cognitive mapping as a participatory approach to analyze change, preferred states, and perceived resilience of social-ecological systems. Ecol. Soc. 20(2) (2015)
26. J. Halbe, Participatory Modelling in Sustainability Transitions Research (Routledge, 2019), pp. 182–206
27. J. Halbrendt, S.A. Gray, S. Crow et al., Differences in farmer and expert beliefs and the perceived impacts of conservation agriculture. Global Environ. Change 28, 50–62 (2014)
28. Z. Irani, A.M. Sharif, H. Lee et al., Managing food security through food waste and loss: small data to big data. Comput. Oper. Res. 98, 367–383 (2018)
29. D.N. Johnson, C.J. van Riper, W.P. Stewart et al., Elucidating social-ecological perceptions of a protected area system in interior Alaska: a fuzzy cognitive mapping approach. Ecol. Soc. 27(3) (2022)
30. R. Jordan, S. Gray, M. Zellner et al., Twelve questions for the participatory modeling community. Earth's Future 6(8), 1046–1057 (2018)
31. C.A. Latkin, C. Edwards, M.A. Davey-Rothwell, K.E. Tobin, The relationship between social desirability bias and self-reports of health, substance use, and social network factors among urban substance users in Baltimore, Maryland. Addict. Behav. 73, 133–136 (2017)
32. V.K. Mago, R. Mehta, R. Woolrych, E.I. Papageorgiou, Supporting meningitis diagnosis amongst infants and children through the use of fuzzy cognitive mapping. BMC Med. Inf. Decis. Making 12, 1–12 (2012)
33. V.K. Mago, H.K. Morden, C. Fritz et al., Analyzing the impact of social factors on homelessness: a fuzzy cognitive map approach. BMC Med. Inf. Decis. Making 13(1), 1–19 (2013)
34. P. Martinez, M. Blanco, B. Castro-Campos, The water-energy-food nexus: a fuzzy-cognitive mapping approach to support nexus-compliant policies in Andalusia (Spain). Water 10(5), 664 (2018)
35. S. Mkhitaryan, P.J. Giabbanelli, M.K. Wozniak, G. Nápoles, N. De Vries, R. Crutzen, FCMpy: a Python module for constructing and analyzing fuzzy cognitive maps. PeerJ Comput. Sci. 8, e1078 (2022)
36. B. Moon, R.R. Hoffman, J. Novak, A. Canas, Applied Concept Mapping: Capturing, Analyzing, and Organizing Knowledge (CRC Press, 2011)
37. M. Nikravesh, J. Kacprzyk, L.A. Zadeh et al., Fuzzy cognitive maps structure for medical decision support systems. Forging New Front.: Fuzzy Pioneers II, 151–174 (2008)
38. A. Nyaki, S.A. Gray, C.A. Lepczyk, J.C. Skibins, D. Rentsch, Local-scale dynamics and local drivers of bushmeat trade. Conserv. Biol. 28(5), 1403–1414 (2014)
39. T. O'Garra, D. Reckien, S. Pfirman et al., Impact of gameplay vs. reading on mental models of social-ecological systems: a fuzzy cognitive mapping approach. Ecol. Soc. 26(2) (2021)
40. U. Özesmi, S. Özesmi, A participatory approach to ecosystem conservation: fuzzy cognitive maps and stakeholder group analysis in Uluabat Lake, Turkey. Environ. Manag. 31, 0518–0531 (2003)
41. A.S. Penn, C.J.K. Knight, D.J.B. Lloyd et al., Participatory development and analysis of a fuzzy cognitive map of the establishment of a bio-based economy in the Humber region. PloS One 8(11), e78319 (2013)
42. T. Reddy, P.J. Giabbanelli, V.K. Mago, The artificial facilitator: guiding participants in developing causal maps using voice-activated technologies, in Augmented Cognition: 13th International Conference, AC 2019, Held as Part of the 21st HCI International Conference, HCII 2019, Orlando, FL, USA, July 26–31, 2019, Proceedings 21 (Springer, 2019), pp. 111–129
43. R.C. Rooney, J. Daniel, M. Mallory et al., Fuzzy cognitive mapping as a tool to assess the relative cumulative effects of environmental stressors on an arctic seabird population to identify conservation action and research priorities. Ecol. Sol. Evid. 4(2), e12241 (2023)
44. V.M. Salberg, A.M. Booth, S. Jahren, P. Novo, Assessing fuzzy cognitive mapping as a participatory and interdisciplinary approach to explore marine microfiber pollution. Marine Pollut. Bull. 179, 113713 (2022)
45. S. Samarasinghe, G. Strickert, Mixed-method integration and advances in fuzzy cognitive maps for computational policy simulations for natural hazard mitigation. Environ. Modell. Softw. 39, 188–200 (2013)
46. F.R.R.L. Santos, F.A.F. Ferreira, I. Meidute-Kavaliauskiene, Perceived key determinants of payment instrument usage: a fuzzy cognitive mapping-based approach 3, 950–968 (2018)
47. P. Scemama, E. Regnier, F. Blanchard, O. Thebaud, Ecosystem services assessment for the conservation of mangroves in French Guiana using fuzzy cognitive mapping. Front. Forests Global Change 4, 769182 (2022)
48. H. Schwermer, P. Aminpour, C. Reza et al., Modeling and understanding social-ecological knowledge diversity. Conserv. Sci. Pract. 3(5), e396 (2021)
49. M.N. Tabar, R. Andam, H. Bahrololoum et al., Study of football social responsibility in Iran with fuzzy cognitive mapping approach. Sport Soc. 25(5), 982–999 (2022)
50. J.M. Vasslides, O.P. Jensen, Fuzzy cognitive mapping in support of integrated ecosystem assessments: developing a shared conceptual model among stakeholders. J. Environ. Manag. 166, 348–356 (2016)
51. M.K. Wozniak, S. Mkhitaryan, P.J. Giabbanelli, Automatic generation of individual fuzzy cognitive maps from longitudinal data, in International Conference on Computational Science (Springer, 2022), pp. 312–325
Chapter 3
Principles of Simulations with FCMs Gonzalo Nápoles and Philippe J. Giabbanelli
Abstract In Chap. 1, we defined FCMs as mathematical objects and explained that they could be used to model complex systems and simulate what-if scenarios. Although some software hides these aspects away, modelers who seek to build FCMs from the ground up are faced with numerous questions: how do I choose an update equation or activation function? How do I know whether my FCM has stabilized, and what influences these dynamics? In this chapter, we provide practical guidance on these core questions. By completing this chapter, readers will be able to (i) identify options and choose a solution for each aspect of the design; (ii) relate choices such as the update equation or activation function to the dynamics of the FCM; (iii) apply these concepts by programming an FCM in native Python.
3.1 Introduction: Revisiting the Reasoning Mechanism

Recall from Chap. 1 that FCM-based models are recurrent neural networks. This means that the activation values of neural concepts are iteratively updated (starting with an initial stimulus) until a stopping criterion is reached. We introduced Eq. (1.1) in Sect. 1.2.2 for the update of an FCM as follows:

$$A^{(t+1)} = f\left(A^{(t)} \cdot W\right), \qquad a_i^{(t+1)} = f\left(\sum_{j=1}^{N} a_j^{(t)} \times w_{ji}\right),$$
where $A^{(t)}$ denotes the activation vector (i.e., the values of neural concepts) at iteration $t$, $N$ represents the number of concepts, $f(\cdot)$ is the activation function, and $W_{N \times N}$ is the matrix of edge weights connecting the concepts.
In practice, modelers can choose among different variations of this equation. All variations use the following when updating the concepts: a sum of incoming nodes weighted by their incoming edges, and an activation function. These are necessary components, since the value of a node depends on the information that it receives, and it must be contained within a desired interval. The reasoning process of all variations is additive, because the neurons' activation values are determined from successive addition and multiplication operations. The first variation considers that the new value of a concept $a_i^{(t)}$ depends only on its connected concepts, hence it does not account for its own current value $a_i^{(t-1)}$:

$$a_i^{(t)} = f\left(\sum_{j=1,\, j \neq i}^{N} w_{ji}\, a_j^{(t-1)}\right) \qquad (3.1)$$
Notice that the constraint $i \neq j$ implies that a concept cannot be considered a cause of itself.¹ In the network, this means that nodes do not have self-loops. It also prevents a round-about way for a node's current value to impact its next value by setting $w_{ii}$ to 1. The next variant of this reasoning mechanism includes the previous concept's activation value in the calculations. More explicitly, the concept's activation value in the current iteration will be given by (i) the activation values of connected concepts and (ii) the concept's activation value in the previous iteration. This reasoning mechanism can be formalized as follows:

$$a_i^{(t)} = f\left(\sum_{j=1,\, j \neq i}^{N} w_{ji}\, a_j^{(t-1)} + a_i^{(t-1)}\right) \qquad (3.2)$$
Background Information: Naming the equations There is no universal convention on the name of these variations. Since the first variation is the oldest, it is occasionally known as ‘type I’. The second variation came next, hence it was called ‘type II’ [21]. The issue is that ‘type I’ already has several different meanings in various application domains. In statistics, we have type I errors and type II errors. Modelers have used ‘type I’ and ‘type II’ to refer to FCM environments with delay or uncertainty, respectively [16]. In an application such as obesity research, we distinguish ‘type I’ diabetes (autoimmune condition in which the pancreas does not make insulin) from ‘type II’ diabetes (excessive body fat leads to insulin resistance). If we analyzed a model applying the first variation on the autoimmune condition in an environment with high uncertainty, we would assess type I errors in a type I FCM for type I diabetes in a type II environment—thus leading to confusion. When teaching the fundamentals of FCMs, we have thus also named the first variation as ‘without memory’ (since a node does not use its past value) and the second variation as ‘with memory’.
¹ An empirical study examined the effects of lifting the constraint $i \neq j$ and found that it did not necessarily improve the quality of the model, as measured by predictive performances [17].
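Both variations are easy to express in code. The sketch below is a minimal illustration, assuming NumPy and a weight matrix with a zero diagonal so that the constraint $i \neq j$ is enforced by the matrix itself; the function names are of our choosing rather than a standard API:

import numpy as np

def update_without_memory(A, W, f):
    # Eq. (3.1): the new value of concept i is the activation of the
    # weighted sum of its incoming concepts; a zero diagonal in W
    # enforces the constraint i != j (no self-loops)
    return f(A @ W)

def update_with_memory(A, W, f):
    # Eq. (3.2): same as above, but the concept's own previous value
    # is added before applying the activation function
    return f(A @ W + A)

# Example with two concepts and the hyperbolic tangent as f
W = np.array([[0.0, 0.6],
              [-0.4, 0.0]])
A = np.array([0.5, 0.2])
print(update_without_memory(A, W, np.tanh))  # approx. [-0.0799, 0.2913]
print(update_with_memory(A, W, np.tanh))     # approx. [ 0.3969, 0.4621]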
In either variation, the process can be summarized as follows. Firstly, each neuron is activated using a certain initial activation value, which belongs to the allowed activation interval. The expert determines the initial activation values and which neurons to activate. Secondly, we perform reasoning such that the output of neurons in the previous iteration (after being processed by the activation function) is used as the input of the following iteration. If we arrange all activation values in a vector, then the inference process will consist of chained vector-matrix multiplications. Overall, the stimuli concern the IF part of the simulation, while the outputs (after a fixed number of iterations) concern the WHAT part. Note that in either variation, the state of the FCM at iteration $t$ depends only on its state at $t-1$. This means that the dynamics of an FCM are first-order. There could be situations in which the state at $t$ should depend on both $t-1$ and $t-2$ (second order), or even further. Higher-order FCMs have thus been proposed [20] and they continue to be used, particularly for time series forecasting [6]. However, they lead to significant changes in the update equation (hence also affecting the dynamics of the system), the structure of the model, or the construction process. Consequently, they are extensions rather than variations, and we invite readers interested in extensions to consult Chap. 6. In addition, in Chap. 10 we will introduce yet another variation that was originally used in prediction settings.
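To make the chained vector-matrix multiplications concrete, we can continue the two-concept example from the sketch above (variation without memory, $f = \tanh$). The output of the first iteration, $A^{(1)} \approx (-0.0799,\ 0.2913)$, becomes the input of the second:

$$A^{(1)} \cdot W \approx \big(0.2913 \times (-0.4),\ -0.0799 \times 0.6\big) = (-0.1165,\ -0.0479),$$
$$A^{(2)} = f\big(A^{(1)} \cdot W\big) \approx (-0.1160,\ -0.0479).$$

Each iteration is thus one multiplication by $W$ followed by one application of $f$, exactly as in the recurrent neural network view.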
3.2 Activation Functions

The activation function $f(\cdot)$ is a necessary component of every neural reasoning model. One of the reasons is that weights have a linear nature, so non-linear activation functions transform linear systems into non-linear ones. In an FCM, the weighted sum of incoming nodes' values could lead a given node to have a value outside of the required interval, hence the function $f(\cdot)$ is required to keep the activation values in the allowed activation interval. While there are many choices for the activation function, allowed choices must be monotonic, non-decreasing, and non-linear (Fig. 3.1). The most widely used activation functions for fuzzy cognitive modeling are the bivalent, trivalent, sigmoid and hyperbolic tangent. The formulas of these functions and some relevant properties are given below.

Background Information: Non-linear systems
A non-linear system is a system in which the changes in the outputs are not proportional to the changes in the inputs. Non-linear models are widespread because most systems are inherently non-linear. For example, behavior change is based on non-linear dynamics [18]. Imagine that your partner nags you to start exercising or stop smoking. You will not engage twice as much in the new behavior if your partner tells you about it twice as much. Rather, you ignore suggestions until a point, then you finally give in and change behavior.
Fig. 3.1 The function depicted here would be unusable as an activation function, for at least three reasons. First, it should be non-decreasing. Because the function violates this requirement in the negative range, the inputs $a < b$ are incorrectly mapped to $f(a) > f(b)$. For example, we may have had low fishing intensity and high fish stock as inputs, but we produce medium fishing intensity and low fish stock as outputs. Second, it should be monotonic. This is violated in the positive range, where the order of $c$ and $d$ is not preserved. Finally, its main purpose is to keep concept values within a target range such as [0, 1]. Hence it cannot go up or down indefinitely, as suggested by this graph; it must be strictly contained within the range, for instance by plateauing or oscillating within it
The bivalent function (see Eq. (3.3)) is a discrete function that only produces binary responses, leading to a finite number of states. Binary FCM-based models can only represent an increase of a concept or represent a stable concept, but lack the capability of representing a decrease of a concept.

$$f(x) = \begin{cases} 1 & , x > 0 \\ 0 & , x \leq 0 \end{cases} \qquad (3.3)$$
The trivalent function (see Eq. (3.4)) is another discrete activation function that produces a finite number of different states. Trivalent FCMs can represent an increase or decrease of a concept and also represent a stable concept. However, they cannot represent the degree of an increase or a decrease of a concept.

$$f(x) = \begin{cases} -1 & , x < 0 \\ 0 & , x = 0 \\ 1 & , x > 0 \end{cases} \qquad (3.4)$$
The sigmoid function (see Eq. (3.5) and Fig. 1.3) is a continuous function that can produce an infinite number of different states. In this function, $\lambda > 0$ and $h \in \mathbb{R}$ are two user-specified parameters controlling the function slope and offset, respectively. In sigmoid FCM models, small changes to the initial activation values can lead to a dramatic change to the final state of the neurons.
Fig. 3.2 The state space of an FCM with three concepts is limited to corners of the hypercube for a bivalent function (a), has additional possible states with a trivalent function (b), or could be anywhere within the hypercube (including corners and edges) for sigmoid and hyperbolic functions (c)
$$f(x) = \frac{1}{1 + e^{-\lambda(x-h)}} \qquad (3.5)$$
Caution: activation functions with parameters
Properly setting the parameters of an activation function (e.g., $\lambda$ and $h$ for the sigmoid) can prevent problems in the dynamics of an FCM, such as when concept values are propelled towards saturation at their minimal/maximal values [10]. Alternatively, a third variant of the update rule can be used, as shown in [3].
The hyperbolic function (see Eq. (3.6) and Fig. 1.3) is a continuous activation function that can also produce infinite states. Sigmoid and hyperbolic FCM models can be used for modeling both qualitative and quantitative scenarios.

$$f(x) = \frac{e^{2x} - 1}{e^{2x} + 1} \qquad (3.6)$$
Activation functions play a paramount role in the behavior of the FCM model, and might have serious implications on the simulation results. As exemplified in Fig. 3.2 for three concepts, the choice of a function drives the state space of the FCM. There is no 'one size fits all' function, as the appropriate selection depends on the problem being modeled and on convergence (discussed in the next section). For example, it is common that we want to model concepts that can range from a minimum of 0 (absence) up to a maximum of 1 (presence), rather than take negative values [8]. This can be achieved by the sigmoid function, whereas a hyperbolic tangent would have a range between $-1$ and 1.
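For reference, the four functions above can be implemented in a few vectorized lines. This is a sketch using NumPy, where the parameter names lam and h mirror $\lambda$ and $h$ in Eq. (3.5):

import numpy as np

def bivalent(x):
    # Eq. (3.3): binary output, hence a finite number of states
    return np.where(np.asarray(x) > 0, 1.0, 0.0)

def trivalent(x):
    # Eq. (3.4): np.sign returns -1, 0, or 1, matching the three cases
    return np.sign(x)

def sigmoid(x, lam=1.0, h=0.0):
    # Eq. (3.5): continuous output in (0, 1); lam controls the slope
    # and h the offset
    return 1.0 / (1.0 + np.exp(-lam * (np.asarray(x) - h)))

def hyperbolic(x):
    # Eq. (3.6): algebraically equal to tanh(x), with output in (-1, 1)
    return np.tanh(x)

Being vectorized, each function can be applied to an entire activation vector at once, which is how activation functions are used in the reasoning rule.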
3.3 Convergence: A Mathematical Perspective

Convergence plays an important role when designing and simulating FCM-based models. In mathematics, convergence can be understood as the property (exhibited by certain infinite series and functions) of closely approaching a limit as the function argument increases or decreases. In our context, we say that an FCM-based model has converged (to a fixed-point attractor) if, after performing a large enough number of iterations defined beforehand by the domain experts, the neurons continue to produce the same activation values over and over again.

There are two important observations regarding the convergence of FCMs. First, convergence is not always necessary. Rather, the need for convergence depends on the problem being modeled. If the real-world phenomenon should eventually stabilize, then convergence would be desired to replicate an important pattern. But if the target phenomenon is unstable and the model is stable, this discrepancy can indicate a faulty model. For example, suppose we are modeling the EUR-USD exchange rate within a certain market. If the FCM converges, we predict a constant exchange rate after a certain number of iterations. Second, convergence is difficult to control when FCMs use continuous activation functions, which allow for any state (Fig. 3.2c). The FCM may converge when we need it to produce different values, or behave seemingly erratically when we hope for convergence.

Let us further elaborate on this problem. As mentioned, FCMs are recurrent neural networks that produce a new activation vector in each iteration. This procedure is repeated until either the system stabilizes or meets a predefined stop criterion (e.g., reaching a maximum number of iterations). The former implies that a fixed point was discovered, whereas the latter suggests that the FCM responses are either cyclic or chaotic. These situations are formalized below.

• Fixed point ($\exists t_\alpha \in \{1, 2, \ldots, T-1\} : a_i^{(t+1)} = a_i^{(t)},\ \forall t \geq t_\alpha$): the FCM produces the same state vector after the $t_\alpha$-th iteration. This implies that $a_i^{(t_\alpha)} = a_i^{(t_\alpha+1)} = a_i^{(t_\alpha+2)} = \cdots = a_i^{(T)}$;
• Limit cycle ($\exists t_\alpha, P \in \{1, 2, \ldots, T-1\} : a_i^{(t+P)} = a_i^{(t)},\ \forall t \geq t_\alpha$): the FCM periodically produces the same state vector after the $t_\alpha$-th iteration. This implies that $a_i^{(t_\alpha)} = a_i^{(t_\alpha+P)} = a_i^{(t_\alpha+2P)} = \cdots = a_i^{(t_\alpha+jP)}$ where $t_\alpha + jP \leq T$, with $j \in \{1, 2, \ldots, T-1\}$;
• Chaos: the FCM produces a different state vector at each iteration. In other words, the system is neither stable nor cyclic.

If the model is able to converge, then the system will produce the same output towards the end, and therefore the activation degree of concepts will remain unaltered (or subject to infinitesimal changes). In contrast, a cyclic FCM produces dissimilar responses with the exception of a few states that are periodically produced. The last scenario concerns chaotic configurations in which the network yields different state vectors as more iterations are performed. Figure 3.3 illustrates the behaviors of these configurations for a single neural concept. The $x$-axis denotes the number of iterations while the $y$-axis denotes the neuron's activation value.
Fig. 3.3 Examples of the dynamic states of a neural concept over 100 iterations: (a) stable neural concept, (b) cyclic neural concept, (c) chaotic neural concept
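These three regimes can also be detected numerically. The sketch below is our own helper (the name classify_dynamics is illustrative): it compares the final state vector against earlier ones, which is a practical heuristic rather than a strict application of the definitions above.

import numpy as np

def classify_dynamics(states, tol=1e-5):
    # states: list of activation vectors produced by a simulation.
    # A match one step back indicates a fixed point, a match P steps
    # back a limit cycle of period P, and no match suggests chaos.
    last = np.asarray(states[-1])
    for t in range(len(states) - 2, -1, -1):
        if np.allclose(states[t], last, atol=tol):
            period = len(states) - 1 - t
            if period == 1:
                return 'fixed point'
            return f'limit cycle (period {period})'
    return 'chaos'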
In the presence of chaotic or cyclic situations, the reasoning rule stops once a maximal number of iterations $T$ is reached. At that point, the state vector is calculated from the last response. In some decision-making scenarios, such an output might be deemed unreliable due to the lack of stability. The convergence issues in FCM-based systems are mostly related to (i) the pattern encoded in the weight matrix, (ii) the strategy for updating the concepts' values, and (iii) the non-decreasing activation function used in the reasoning rule. For example, a symmetric zero-diagonal matrix often leads to improved convergence features [15]. However, symmetric matrices are not useful when we want to represent relationships where the cause and effect cannot be exchanged.
Before concluding this section, let us discuss the link between the model convergence and the activation function. According to the literature [22], discrete activation functions (that produce a finite number of states) will never lead to chaotic outputs. This happens because FCM models are deterministic systems: if they reach a state they have previously visited, the system will enter a closed orbit which will always repeat itself. Therefore, bivalent and trivalent FCM models will either converge to a fixed-point attractor or show cyclic patterns (with an exponential period in the worst case), but they will never produce chaos.
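The closed-orbit argument can be checked computationally: a bivalent FCM with $N$ concepts has at most $2^N$ distinct state vectors, so by determinism a state must repeat within $2^N + 1$ iterations. A small sketch follows (the function name find_orbit is ours; the bivalent function follows Eq. (3.3)):

import numpy as np

def bivalent(x):
    return np.where(np.asarray(x) > 0, 1.0, 0.0)

def find_orbit(W, A):
    # Iterate until a state repeats; by determinism, the trajectory
    # has then entered a closed orbit that repeats forever
    seen = {}
    state, t = tuple(bivalent(A)), 0
    while state not in seen:
        seen[state] = t
        state = tuple(bivalent(np.matmul(np.array(state), W)))
        t += 1
    return seen[state], t - seen[state]  # (first visit, period)

A returned period of 1 corresponds to a fixed-point attractor; any larger period is a limit cycle.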
3.4 Convergence: A Simulation Approach

Intuitively, simulation studies really need their model to stabilize because that is when results are considered 'final' and thus can be communicated to stakeholders. An FCM that does not stabilize would not be delivering results, and the model would not allow its users to make any conclusion. For example, users may want to know whether a new approach to fishery management would yield a sufficient stock of fish; if the answer oscillates between 'a lot of fish' and 'not much fish' then the model is not fit for purpose. These practical needs for stabilization can be difficult to satisfy under the rigorous definition presented in the previous section, which required all concepts to stop changing (or be within the infinitesimal changes attributable to the small errors inherent to finite-precision number representations). Consequently, simulation studies frequently operate under a more lax and varying notion of convergence by relaxing one or both of the core tenets of convergence: it may be sufficient for some concepts to stabilize, and we may consider that they are stable enough under a certain lens. These two aspects are discussed in the paragraphs below, respectively.

We draw a distinction between global stabilization, where all concepts stabilize (per the previous section), and partial stabilization, when only some concepts must stabilize [23]. In partial stabilization, there must be a clear rationale to select the subset of concepts that must stabilize to stop the entire model. For instance, consider that a model has five concepts, among which the 'stock of fish' that interests users. It would be methodologically flawed to pick some other concept to decide that the model has stabilized, and then tell users about the 'stock of fish' since its value would not be stable. In partial stabilization, we are concerned with the outputs of interest for the model [11]. For example, if we make an FCM to monitor whether a patient is obese, then we end the simulation once the value of 'obesity' is stable. If users seek more detailed answers that would also include concepts such as 'mental health' and 'physical health', then all three of these concepts must be stable.

The variety of methods used to conclude that some concepts have stabilized echoes the diversity of visual and statistical approaches in other fields of simulation.²
Fig. 3.4 Outputs may appear stable on a convergence plot at some resolution, but zooming in can reveal an oscillatory behavior at a more fine-grained resolution. The technique consisting of a visual inspection thus implicitly involves a tolerance threshold based on the resolution
Visual methods consist of looking for a flat line on a time series of concepts' values [9, 19], also called a 'convergence plot'. Such an approach is built into the FCM R package [2] or the FCM Expert software [13, 14]. While this approach has the benefit of looking at multiple consecutive states, it leaves it to the modeler to decide that stability has been obtained. In addition, what may appear to be a flat line when zoomed out may actually reveal oscillations at a more fine-grained resolution (Fig. 3.4). Other methods automatically determine stability by requiring that two consecutive values change by less than a user-defined error margin ($\epsilon$) [12]. The value of $\epsilon$ is not provided in all studies where it is used, but some examples include 0.001 on all factors in [8], 0.001 on the one core output in [4], or 0 on the two core outputs in [5]. Note that the smaller the threshold, the more precise the convergence.

3.5 A Detailed Example in Python

In this section, we will implement the FCM model in Fig. 3.5 using Python. While programming in Python is not a learning goal of this book, implementing this simple model will help us fully understand the theoretical ideas discussed in this chapter. This can be done in five steps, as detailed below.

Step 1. We need to create the model, which translates into creating the causal weight matrix that defines the interaction between the problem variables. Self-loops are not allowed in this problem.
3.5 A Detailed Example in Python In this section, we will implement the FCM model in Fig. 3.5 using Python. While programming in Python is not a learning goal of this book, implementing this simple model will help us understand fully the theoretical ideas discussed in this chapter. This can be done in five steps, as detailed below. Step 1. We need to create the model, which translates into creating the causal weight matrix that defines the interaction between the problem variables. Self-loops are not allowed in this problem. personnel has arrived and the line is in full operation. Once the warm-up period is established, observations within this period are tossed and observations after this period are analyzed. Multiple methods have been proposed to determine the warm-up length [1, 7]. These methods may not all be directly applicable to FCMs. The main reason is that steady state simulation studies consider that the system will continue running after its warm-up period, whereas in an FCM we stop the simulation once we reach a stable state. Notice that FCMs are closed systems that operate in a deterministic way, producing the same values after convergence.
Fig. 3.5 FCM-based system comprised of three neuronal concepts modeling the interrelations in a simplified food chain: predator → prey (−1.0), prey → predator (+1.0), prey → grass (−1.0), and grass → prey (+1.0)
# Libraries used throughout the example
import numpy as np
import matplotlib.pyplot as plt

W = np.array([[0, -1, 0],
              [1, 0, -1],
              [0, 1, 0]])
names = ['predator', 'prey', 'grass']

Step 2. Next, we implement the reasoning mechanism used to update the activation values of concepts at each iteration. We need to define an initial activation vector A containing the activation values of some (or all) concepts.

def reasoning(W, A, T=50):
    states = [A]
    for t in range(T):
        # One iteration: multiply the activation vector by the weight
        # matrix, then apply the activation function
        A = hyperbolic(np.matmul(A, W))
        states.append(A)
    return states, A

def sigmoid(X, l=1.0, h=0.0):
    return 1.0 / (1.0 + np.exp(-l * (X - h)))

def hyperbolic(X):
    return np.tanh(X)

In short, the reasoning mechanism of an FCM-based model consists of iteratively multiplying the activation vector by the weight matrix. After each multiplication, we transform the output vector using the activation function. The reasoning function receives the weight matrix, the initial activation vector, and the number of iterations. The sequence of states of the model is accumulated (via the append function) for the convenience of visualization, as shown in Step 4. Note that we use the hyperbolic tangent function as the activation function to develop this example. This means that we can expect the activation values to be in the $[-1, 1]$ interval. The shape of this function was shown in Chap. 1 (Fig. 1.3).

Step 3. The following step is devoted to the network activation. This means that we need to define the initial activation vector $A^{(0)}$ to feed the FCM model and trigger the reasoning mechanism (using a certain activation function). Such a vector is normally defined by the model user who assigns values to the input variables while the output
Fig. 3.6 Activation values of concepts in the prey-predator model using the hyperbolic tangent activation function for $T = 20$ iterations
ones remain inactive. Again, the user determines which concepts are regarded as inputs and outputs. In this example, we have configured the initial activation vector such that we have a medium number of predators, prey and grass. However, we did not make any kind of distinction between the concepts. This means that all concepts are regarded as both inputs and outputs.

A = np.array([0.5, 0.5, 0.5])

Step 4. Once we have defined the initial activation vector, we can trigger the reasoning mechanism. The code below does that while plotting the activation values of neural concepts in each iteration.

states, final = reasoning(W, A)
fig = plt.plot(np.array(states))
plt.xlabel('iteration number', fontsize=14)
plt.ylabel('activation value', fontsize=14)
plt.legend(fig, names)
plt.show()  # display the convergence plot

Step 5. The final step is dedicated to interpreting results, exploring the convergence and assessing the soundness of the simulation. If necessary, we can change the parametric settings (the number of iterations or the activation function) and repeat the simulations until we obtain reasonable results (Fig. 3.6). In this example, we investigate what would happen if we have a medium number of predators, prey and grass, as the initial activation vector encodes. This is an example of a what-if question, which demonstrates the usefulness of FCMs for simulating complex systems. We can see that both prey and grass decrease as the number of predators increases. The former is expected, while the latter is surprising. When the number of predators decreases in the system, the number of prey and the amount of grass increase. Again, the former is logical, but the latter apparently is not: it implies that carnivorous predators have suddenly switched to eating grass. But the reality is that there is an intermediate component: the prey. The dynamic of this
system is as follows. Prey are attracted to the region when there is enough grass, and the number of prey will attract predators. This happens with a time lag, with the amount of grass being the triggering component.

Let us repeat the what-if simulations using the same weight matrix and initial conditions but using the sigmoid activation function, which we defined in Step 2 and illustrated in Chap. 1 (Fig. 1.3). We might expect that results will be similar, as both activation functions are non-linear and produce infinite states. The only difference is that the hyperbolic tangent function produces values in the $[-1, 1]$ interval, while the sigmoid function produces values in the $[0, 1]$ interval. The simulations below show how important this difference might be in some problems. Apparently, we have gotten a better result since our model converged to a fixed-point attractor. And this seems to be one of those problems where producing consistent results is important. However, modelers should not just be content with the existence of a stable outcome: they should always examine the dynamics of the output with respect to the problem at hand, particularly by involving subject matter experts. Figure 3.7 shows that there will always be more predators than prey, which is not realistic. Another aspect also raises a red flag: the results will not change even if we use different initial activation vectors. Regardless of the what-if question, the system produces the same output. This means that the FCM converged to a unique fixed-point attractor. Since the same output is produced regardless of the inputs, the only solution would be to change the model.

Let us summarize our simulations. On the one hand, the hyperbolic FCM model could not reach an equilibrium state in which the neurons' activation values no longer change, at least for the provided initial activation values. Note that in cyclic FCM models like this one, the final activation vector provides little useful information. However, the model aptly describes what is supposed to happen in reality, as this system is indeed oscillatory. On the other hand, the sigmoid FCM model was able to converge to a fixed point. Unfortunately, it can be verified that such an equilibrium point is unique, rendering the what-if simulations unusable since all initial conditions
Fig. 3.7 Activation values of concepts in the prey-predator model using the sigmoid activation function for $T = 20$ iterations
will lead to the same output. There are specific situations in which unique fixed-point attractors are desired. For example, if we want to design an FCM-based controller, we might want to reach the same goal regardless of the initial conditions. But this is certainly not what what-if simulations are about.
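Comparisons like the one above are easier when the activation function and the stopping rule are parameters rather than hard-coded. The sketch below is our own generalization of the reasoning function from Step 2 (the name reasoning_until_stable is illustrative); it applies the error-margin criterion $\epsilon$ from Sect. 3.4 and reports whether a fixed point was reached:

def reasoning_until_stable(W, A, f, eps=0.001, T=50):
    # Generalizes reasoning() from Step 2: f is now a parameter, and
    # the loop stops early when no concept changes by more than eps
    states = [A]
    for t in range(T):
        A_next = f(np.matmul(A, W))
        states.append(A_next)
        if np.max(np.abs(A_next - A)) < eps:
            return states, A_next, True   # converged to a fixed point
        A = A_next
    return states, A, False               # cyclic or chaotic within T

# Hyperbolic tangent run (oscillates) versus sigmoid run (converges)
A0 = np.array([0.5, 0.5, 0.5])
_, _, stable_tanh = reasoning_until_stable(W, A0, hyperbolic)
_, _, stable_sig = reasoning_until_stable(W, A0, sigmoid)

Restricting the comparison to a subset of concept indices would implement the partial stabilization discussed in Sect. 3.4.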
3.6 Exercises

1. Fill in the blank space: "The reasoning mechanism of Fuzzy Cognitive Maps is the process of __________________."
   a. updating the activation values of concepts in each iteration.
   b. iteratively multiplying the activation vector by the weight matrix.
   c. doing all of the things mentioned in the other answer options.
2. What happens with the concepts of a Fuzzy Cognitive Map that converges to a unique fixed point? Select the correct choice below.
   a. They produce the same output values regardless of the initial conditions.
   b. They produce different output values that depend on the initial conditions.
   c. They produce different output values for the same initial conditions.
3. Given the FCM in Fig. 3.8, suppose there are some design constraints that will be used to assign the correct activation functions to some concepts. Which of the following statements are true?
   a. C3 represents a continuous variable.
Fig. 3.8 Complex system concerning a civil engineering model described by seven neural concepts: C1 (number of people in a city), C2 (migration into city), C3 (modernization), C4 (amount of garbage), C5 (sanitation facilities), C6 (number of diseases per 1K residents), and C7 (bacteria per area)
   b. C4 can only be activated when C1 is active.
   c. C5 represents a variable having two states.
4. Given the FCM in Fig. 3.8, which activation function setting satisfies all design constraints?
   a. The sigmoid function for C3, the hyperbolic tangent function for C4, and the bivalent function for C5.
   b. The hyperbolic tangent function for C3, the sigmoid function for C4, and the bivalent function for C5.
   c. The hyperbolic tangent function for C3, the bivalent function for C4, and the sigmoid function for C5.
References

1. C.S.M. Currie, Analysing output from stochastic computer simulations: an overview, in Computer Simulation Validation: Fundamental Concepts, Methodological Frameworks, and Philosophical Perspectives (2019), pp. 339–353
2. Z. Dikopoulou, E. Papageorgiou, A. Jetter, D. Bochtis, Open source tool in R language to estimate the inference of the fuzzy cognitive map in environmental decision making (2018)
3. G. Felix, G. Nápoles, R. Falcon, W. Froelich, K. Vanhoof, R. Bello, A review on methods and software for fuzzy cognitive maps. Artif. Intell. Rev. 52, 1707–1737 (2019)
4. H.S. Firmansyah, S.H. Supangkat, A.A. Arman, P.J. Giabbanelli, Identifying the components and interrelationships of smart cities in Indonesia: supporting policymaking via fuzzy cognitive systems. IEEE Access 7, 46136–46151 (2019)
5. P.J. Giabbanelli, T. Torsney-Weir, V.K. Mago, A fuzzy cognitive map of the psychosocial determinants of obesity. Appl. Soft Comput. 12(12), 3711–3724 (2012)
6. A.A. Harmati, L.T. Kóczy, Some dynamical properties of higher-order fuzzy cognitive maps. Comput. Intell. Math. Tack. Complex Prob. 3, 149–156 (2022)
7. K. Hoad, S. Robinson, R. Davies, Automating warm-up length estimation. J. Oper. Res. Soc. 61(9), 1389–1403 (2010)
8. W. Hoyos, J. Aguilar, M. Toro, A clinical decision-support system for dengue based on fuzzy cognitive maps. Health Care Manag. Sci. 25(4), 666–681 (2022)
9. K. Kok, The potential of fuzzy cognitive maps for semi-quantitative scenario development, with an example from Brazil. Glob. Environ. Chang. 19(1), 122–133 (2009)
10. T. Koutsellis, G. Xexakis, K. Koasidis, A. Nikas, H. Doukas, Parameter analysis for sigmoid and hyperbolic transfer functions of fuzzy cognitive maps. Oper. Res. Int. J. 22(5), 5733–5763 (2022)
11. E.A. Lavin, P.J. Giabbanelli, Analyzing and simplifying model uncertainty in fuzzy cognitive maps, in 2017 Winter Simulation Conference (WSC) (IEEE, 2017), pp. 1868–1879
12. V.K. Mago, H.K. Morden, C. Fritz, T. Wu, S. Namazi, P. Geranmayeh, R. Chattopadhyay, V. Dabbaghian, Analyzing the impact of social factors on homelessness: a fuzzy cognitive map approach. BMC Med. Inform. Decis. Mak. 13(1), 1–19 (2013)
13. G. Nápoles, M.L. Espinosa, I. Grau, K. Vanhoof, FCM Expert: software tool for scenario analysis and pattern classification based on fuzzy cognitive maps. Int. J. Artif. Intell. Tools 27(07), 1860010 (2018)
14. G. Nápoles, M. Leon, I. Grau, K. Vanhoof, Fuzzy cognitive maps tool for scenario analysis and pattern classification, in 2017 IEEE 29th International Conference on Tools with Artificial Intelligence (ICTAI) (2017), pp. 644–651
15. G. Nápoles, E. Papageorgiou, R. Bello, K. Vanhoof, On the convergence of sigmoid fuzzy cognitive maps. Inf. Sci. 349–350, 154–171 (2016)
16. E.I. Papageorgiou, J.L. Salmeron, Methods and algorithms for fuzzy cognitive map-based modeling, in Fuzzy Cognitive Maps for Applied Sciences and Engineering: From Fundamentals to Extensions and Learning Algorithms (Springer, 2013), pp. 1–28
17. G.A. Papakostas, D.E. Koulouriotis, Classifying Patterns Using Fuzzy Cognitive Maps (Springer, Berlin, Heidelberg, 2010), pp. 291–306
18. K. Resnicow, R. Vaughan, A chaotic view of behavior change: a quantum leap for health promotion. Int. J. Behav. Nutr. Phys. Act. 3, 1–7 (2006)
19. A.M. Sharif, Z. Irani, Exploring fuzzy cognitive mapping for IS evaluation. Eur. J. Oper. Res. 173(3), 1175–1187 (2006)
20. W. Stach, L. Kurgan, W. Pedrycz, Higher-order fuzzy cognitive maps, in NAFIPS 2006-2006 Annual Meeting of the North American Fuzzy Information Processing Society (IEEE, 2006), pp. 166–171
21. C.D. Stylios, P.P. Groumpos, Mathematical formulation of fuzzy cognitive maps, in Proceedings of the 7th Mediterranean Conference on Control and Automation, vol. 2014 (Mediterranean Control Association, Nicosia, Cyprus, 1999), pp. 2251–2261
22. A.K. Tsadiras, Comparing the inference capabilities of binary, trivalent and sigmoid fuzzy cognitive maps. Inf. Sci. 178(20), 3880–3894 (2008)
23. W. Xiaojie, L. Chao, L. Chen, The feedback stabilization of finite-state fuzzy cognitive maps. Trans. Inst. Meas. Control. 44(13), 2485–2499 (2022)
Chapter 4
Hybrid Simulations Philippe J. Giabbanelli
Abstract A Fuzzy Cognitive Map can serve to externalize the mental model of an individual or group. However, mental models do not directly communicate, in the same way as two brains lying on a table cannot interact. While FCMs can be quickly developed, they cannot capture how processes differ over space or across time. These limitations can be addressed either through extensions (Chap. 6) or by combining FCMs with other techniques. In this chapter, we focus on hybrid simulations, which involve combining FCMs with other simulation techniques such as cellular automata, complex networks, or agent-based models (ABMs). Hybrid ABM/FCM simulations are our focal point, as they have received the most attention in the literature. We explain how they can be used to simulate interactions between individuals, provide time estimates, and account for spatial differences. We provide a five-step process to build such models and exemplify it on a case study, supported with Python code. This chapter equips readers with fundamental simulation concepts, exemplifies practical applications that motivate ABM/FCM simulations, and provides a reusable process to create such simulations.
4.1 Introduction

Each tool has its limitations, hence combinations of tools can become appealing. A synergistic combination is not merely a matter of convenience: it has to enable new features. For example, combining a washer and dryer into a single machine is only convenient to save space, since it performs the same functions as the individual tools. In contrast, a self-heating food packaging allows for the possibility of cooking food while away from home, thanks to the interaction between two components: an inner box chamber with food and an outer chamber with chemicals that act as heating agent once the user releases the membrane between the chambers. In this section, we examine synergistic combinations of FCMs with other simulation techniques by starting with an examination of the rationale for such combinations, then we focus
on combinations involving Agent-Based Modeling (Sect. 4.2), and provide guidance on their design (Sects. 4.3–4.4) and performances (Sect. 4.5). As a modeling technique, an FCM has the following strengths and weaknesses:

• It can represent the mental model of individuals or groups, but it cannot represent how mental models would influence each other. Intuitively, we can think of a mental model as a 'brain'. An FCM is an efficient approach to elicit brains from people, but it does not tell us how two people will interact: putting two brains next to each other on a table will not get them to do anything together since they lack an interface (e.g., bodies and sensory perceptions).
• It updates the system based on iterations, either until stabilization or once a maximum number of iterations is reached. This is appropriate when time is unimportant. For example, if an FCM represents individuals' views of a system, then it does not really matter whether an individual arrives at a policy decision within 10 ms or 30 ms. However, the lack of time is a problem for decision-support systems that care about when a certain effect will be obtained, or in which interventions are inherently time-based. For example, an intervention can consist of a high early investment which gradually decreases over time in the hope that individuals will have changed into a new and sustainable behavior.
• It is a flexible approach to transparently represent any kind of concept, hence spatial notions (e.g., amount of rain) can be easily combined with human behaviors (e.g., decision to irrigate the fields). But in reality, spaces are not always homogeneous, hence the values of some concepts depend on the location (e.g., there may be more rain before the mountain than after). This is important when model users ask about where the effects will be more salient, or whether some areas will see benefits at the detriment of others.

As we saw in Chap. 1 (Fig. 4.1), simulation techniques can be broadly classified into aggregate and individual. FCMs belong to the aggregate category, because a concept can represent a group of entities. For example, we use an FCM concept 'amount of fish' instead of modeling each fish. Since individual-level techniques represent each entity, they have ways of specifying interactions between entities. These techniques are also commonly time-based, and some allow for spatial heterogeneity. There has thus been considerable research interest in combining FCMs with an individual-level simulation technique. Mago and colleagues combined FCMs with cellular automata¹ [36], we proposed a combination of FCMs with complex networks [19], and dozens of studies have combined FCMs with Agent-Based Modeling (ABM) [13].
¹ A Cellular Automaton (CA) consists of a set of cells which are arranged in a regular pattern, such as a grid or hexagonal tiling (for a 2D space). Each cell has a state. States are updated over discrete time steps based on time (e.g., an infected cell dies after 4 steps), probability (e.g., chance of recovering), and neighboring cells (e.g., transmitting an infection). While this approach may not be suitable for social systems since people have varying numbers of social contacts, it is a popular representation of physical systems where dynamics are based on proximity.
Background Information: Hybrid simulation versus Hybrid modeling When multiple simulation techniques interact to form a single model, we have a hybrid simulation. In other words, two or more simulation techniques co-exist. A study would not be a hybrid simulation if it involved several techniques one after the other, such as using FCMs to represent stakeholder knowledge and then moving onto Agent-Based Modeling as the sole simulation engine [14]. We also emphasize that a hybrid simulation is strictly within simulation techniques, hence it is different from hybrid modeling, where simulation approaches are combined with techniques from other fields such as machine learning [39] as shown in the last three chapters of this book.
These combinations may operate very differently, as revealed by how the two techniques are interfaced. For example, when combining FCMs with Cellular Automata (CA), there may be one instance of each model, and we switch between them as needed (one FCM/one CA). Alternatively, there could be one FCM for each cell of a cellular automaton (one CA/many FCMs), to represent local-level decision making. Or there could be one cellular automaton for every spatial concept of the FCM (one FCM/many CA), as a means to provide spatial heterogeneity. The same variety exists when combining FCMs with Agent-Based Modeling: we could have one instance of each model (one FCM/one ABM), or an FCM to represent the brain within each agent (one ABM/many FCMs), or an ABM within each FCM concept that involves interactions (one FCM/many ABMs) as in Fig. 4.1. The next section focuses on using an FCM within each agent, as it has been the most widely studied hybrid simulation. From here on, we refer to this option as a 'hybrid ABM/FCM simulation'; a minimal code sketch of this embedding is shown below.
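The sketch below is a minimal Python illustration of the 'one ABM/many FCMs' option; the class and method names (Agent, receive_influence, deliberate) are of our own choosing rather than the API of any published framework, and it assumes the update rule without memory with a hyperbolic tangent activation:

import numpy as np

class Agent:
    # Minimal 'one FCM per agent' embedding: the agent's mental model
    # is its own weight matrix W and concept vector A
    def __init__(self, W, A):
        self.W = np.asarray(W, dtype=float)
        self.A = np.asarray(A, dtype=float)

    def receive_influence(self, neighbor, concept, strength=0.1):
        # Social influence nudges a concept value towards the
        # neighbor's value; the FCM structure (W) is left untouched
        self.A[concept] += strength * (neighbor.A[concept] - self.A[concept])

    def deliberate(self, T=20):
        # Let the agent's mental model settle before it acts
        for _ in range(T):
            self.A = np.tanh(np.matmul(self.A, self.W))
        return self.A

# Two agents with different 'brains' (hypothetical weights)
alice = Agent(W=[[0, 0.5], [-0.3, 0]], A=[0.2, 0.8])
bob = Agent(W=[[0, 0.7], [0.1, 0]], A=[0.6, 0.1])
alice.receive_influence(bob, concept=0)
alice.deliberate()

Note that social influence only changes concept values while leaving each agent's weight matrix untouched, mirroring the practice, discussed in the next section, of changing values rather than the structure of mental models.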
4.2 Rationale for a Hybrid ABM/FCM Simulation

A hybrid ABM/FCM simulation embeds one FCM within each agent, to serve as its decision-making module. Agent-Based Modeling provides the time-based aspect, interacting agents, and interactions between agents and the environment. Fuzzy Cognitive Mapping contributes to creating transparent and potentially varied decision-making models, which address some of the main limitations of traditionally developed ABMs where agents are often (i) equipped with the same ruleset² which was (ii) created and calibrated by modelers rather than transparently elicited from participants (see background information below).

² It is common to specify the behavior of all agents from one category using the same ruleset. For example, a model can capture the vaccine behavior of all farmers as a function of their profile, perceived utility for a certain action, and the perceived norms from their social network. We may still observe heterogeneous actions, because agents have different profiles (i.e., features) or varied positions within their social network and physical environment [48]. However, the implicit assumption is that agents all apply the same reasoning process given a set of inputs. In reality, individuals may have the same set of features and receive the same information, but they could decide differently by following different rulesets or making different errors, a defining characteristic of being human [6]. Experimental research also shows that the different rules perceived by individuals are more closely aligned with the heterogeneity of behavioral outcome than their features [21].
Fig. 4.1 One FCM (a) can be combined with several ABMs to provide details on specific aspects of the model. An ABM can be used to explicitly model the interactions between fish, birds, and fishermen (b), while another ABM can detail the spatial heterogeneity of fish within their habitat (c). Results of each ABM would be aggregated and passed back to the FCM
In line with the analogy in the previous section, we can view ABMs as providing bodies that can move and interact within an environment, while FCMs give them different brains that originate from real-world individuals. A hybrid ABM/FCM can be used with participants, with the aim of "quickly developing the system's rule in a participatory way from FCM while obtaining temporal and spatial explicitness from ABM" [23]. However, a hybrid ABM/FCM does not have to be employed only in participatory settings: for example, it may reuse previously developed FCMs, or derive them from data using machine learning techniques.

Background Information: Three broad schools to develop ABMs
Each broad school to develop ABMs focuses on one aspect: 'keep it simple stupid' (KISS) strives to create parsimonious models, only adding when strictly necessary; 'keep it descriptive stupid' (KIDS) uses data and analyses to decide where a model may be simplified; and 'keep it a learning tool' (KILT) emphasizes the involvement of stakeholders [10]. In KISS and KIDS, modelers create a ruleset based on the evidence base (e.g., quantitative data, existing conceptual models and theories, interviews). Our point (ii) is mostly a reflection of practices in KISS and KIDS. In KILT, co-design approaches develop ABMs by eliciting rules from participants. For instance, "participatory agent-based simulation sessions have been successfully used as an experimental framework to extract interaction patterns in negotiated (written) elements between participants" [34]. An example of a co-construction process with farmers, students, and researchers can be found in [46].
A hybrid ABM/FCM simulation can be used for different research purposes. One line of inquiry focuses on how influences between agents permeate through their mental models. In this situation, FCMs act as a filter to social influences: concept values can change under an external stimulus, and the model will iterate until it stabilizes at new values. Intuitively, this represents that agents are influenced by each other, but the effects of this influence are mediated by the individual context represented by the FCM. Although humans who interact may change the structure of their mental models, prior studies chose to preserve the transparency of FCMs by only allowing values to change [37]. This paradigm can be applied to socio-environmental cases where resource management is a multi-actor dynamic decision-making process. A sample study could examine how stakeholder perspectives change during participatory decision-making processes. Several of the ABM/FCM frameworks emphasize this purpose, such as the Multi-Agent based MObile Negotiation framework (MAMON) in NetLogo [40, 41] or the HYbrid Fuzzy-Agents Simulator (HYFAS) in Java [18]. Another line of inquiry is to develop artificial life systems, in which we observe the evolution of the behavioral traits of agents (as captured by their FCMs) over generations
based on mechanisms such as selection. In this situation, the structure of each FCM is also allowed to change, since traits can appear or disappear. EcoSim supports this paradigm and is designed to accommodate the large populations (e.g., hundreds of thousands of agents) needed in studies on artificial life [31].
4.3 Main Steps to Design a Hybrid ABM/FCM Simulation

In this subsection, we focus on the most widely studied architecture, in which a hybrid ABM/FCM simulates how influences between agents permeate through their mental models. Such simulation models are designed in five phases³ (Fig. 4.2), which cover the initialization of the model (phases 1-3), the update (phase 4), and the end of a simulation run (phase 5).

Phase 1: We create the structure of the agent population, that is, the social network. We must decide how many agents we want and which pairs will interact. Modelers can create this network either by using empirical data or (more commonly) by employing a generator. The generator can use transparent mechanisms to create a network, which allows modelers to manipulate the structure of the network and assess its impact on simulation outcomes. For example, the Barabasi-Albert scale-free network generator is built on the notion that 'the rich get richer': as individuals are added one at a time to a population, they are each time more likely to form a tie with people who are already well-connected. Several reviews provide pointers to classic network generators [2, 11], while newer options also embed agents in a geographically-explicit environment [29]. Alternatively, we can use a black-box machine learning approach for population synthesis, which is broader than creating the social network as it may also include agent characteristics, location, or routine activities. This approach involves machine learning techniques such as generative adversarial networks⁴ [32, 33].

Phase 2: We assign the FCMs to agents. Since an FCM represents a mental model or 'brain', we need to identify which agents get which brain. There are three broad options. First, as seen in early studies, there could be a single FCM structure for all agents: everyone gets the same FCM concepts and causal weights. Agents will still vary in concept values which, together with their network and environmental heterogeneity, can produce different behaviors and give the appearance of diversity. Second, as exemplified by more recent frameworks (Fig. 4.3), there may be several FCMs but fewer than the number of agents. These FCMs are thus used as archetypes and assigned to groups of agents. These individual
³ Our earlier discussion on building such models proposed three phases [20]. Based on ten years of experience, we divided some of these phases to provide better guidance to modelers in this book.
⁴ The use of such deep neural networks for population synthesis in ABMs is an emerging research area, and its application to hybrid ABM/FCM models is an open research question.
Fig. 4.2 The design of any simulation requires the specification of three parts: initializing the model (how do we start), update rules (how do we progress), and halting conditions (when do we end). These parts are consecutively addressed through five steps in the development of a hybrid ABM/FCM simulation
FCMs can be obtained from individual participants⁵ or aggregated from subgroups within a study (e.g., an archetype FCM for farmers, fishermen, or managers). Finally, there may be as many FCMs as the number of agents, which would require the use of a generator for FCMs. This can be accomplished using genetic algorithms [7], which we discuss in Chap. 7. Note that the assignment of FCMs to agents has received less attention than other steps of the model building process, hence readers may consider innovating to address the specific needs of their problem and/or look for new research in this area.

Phase 3: We set starting values for the concepts of each FCM. This can be achieved by drawing a value for each concept from a distribution (Figs. 4.4a and 4.5a) calibrated from the target population and normalized within the range of values for the FCM (e.g., 0-1). In this case, modelers must be cautious about potential dependencies/correlations between their concepts. For example, independently initializing height and weight may result in individuals with implausible profiles. Alternatively, if the synthetic generator used in phase 1 already assigned a value to the agents' features, then the value may be normalized and reused for the agents' concepts.

Caution: Dependencies
The links in the FCM can inform the identification of dependencies. For instance, if concept a has incoming edges e_{c,a} and e_{b,a}, then the value that initializes a should depend on the values of b and c. However, an FCM cannot directly be used as a dependency graph because it may have loops (e.g., e_{c,a}, e_{b,a}, e_{a,b}), so modelers must decide the order in which concepts are initialized. Concepts with dependencies can be initialized together with a multivariate distribution, as it provides multiple values simultaneously. However, finding a multivariate distribution that flexibly matches each concept can be difficult [9, 30] (e.g., uniform for a but lognormal for b and normal for c), and calibrating such distributions can require advanced statistical skills. There is thus growing research in using deep learning to obtain input data for a simulation model [38].
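As a concrete illustration of the caution above, the sketch below jointly initializes two correlated concepts from a bivariate normal distribution and normalizes them to the FCM's 0-1 range; the concept names (Height, Weight), the moments, the correlation, and the bounds are all hypothetical values for illustration, assuming FCM is a NetworkX DiGraph whose nodes carry a 'val' attribute.

import numpy as np

# Hypothetical moments: height in cm, weight in kg, correlation of 0.6
mean = [170.0, 75.0]
cov = [[10.0**2, 0.6 * 10.0 * 15.0],
       [0.6 * 10.0 * 15.0, 15.0**2]]
height, weight = np.random.multivariate_normal(mean, cov)

def normalize(x, lo, hi):
    # Clamp and rescale a raw value into the FCM's [0, 1] concept range
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)

FCM.nodes["Height"]["val"] = normalize(height, 140, 200)  # hypothetical bounds
FCM.nodes["Weight"]["val"] = normalize(weight, 40, 150)   # hypothetical bounds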
Phase 4: We create interactions. When we interact with an individual, we are not aware of their complete mental model. Rather, we can only observe their actions, which may impact some parts of our mental model. To generalize, some concepts from the influencing agent are observable by its peers, who may modify the values of some of their concepts to account for the social interaction. Within each FCM, we identify which subset of concepts can influence other agents or be influenced by other agents (Fig. 4.4b). The modeler expresses how influencing concepts from neighbors will cause an influence in peers (Fig. 4.5b). For example, a peer could copy its neighbors, or align its values in the direction of the average among neighbors when the discrepancy is too high (since it causes discomfort and thus creates an incentive to blend with the group).

⁵ Several studies are following best practices for replicability by disclosing the FCMs of all individual participants [3, 47], which can thus be reused for simulation purposes.
Fig. 4.3 The CoFluences graphical user interface includes a step to assign FCMs to agents, where users can choose the proportion of agents that get each FCM. The randomized assignment can be done once and reused for every simulation, or re-assigned at every run
Fig. 4.4 Phases in the design of a hybrid ABM/FCM model include (a) initializing FCM concept values, (b) defining social influences between agents as matches between their FCMs, and (c) configuring the simulation run, such as the total number of steps. A simulation step consists of pushing influences from agents onto their neighbors (d) and then having each agent stabilize its FCM independently (e)
A simulation then consists of applying the influences between connected peers and waiting for each one to stabilize its FCM in reaction to social stimuli (Fig. 4.4c). In this setting, stabilization is always expected. For instance, consider an FCM with 5 factors (Fig. 4.4d): obesity is not contagious, so it is not directly subject to influences, and neither is diabetes or socio-economic status. Two individuals may exercise together or discuss certain topics that raise their awareness; hence the exercise level of an individual may be aligned with its group, and its awareness of diabetes goes up a little based on peers' awareness, or more if peers have diabetes. As people interact, the awareness of diabetes may initially increase significantly (Fig. 4.4e), in the same way as a topic is on our mind just after a conversation, but this level can decrease once the individual independently reflects based on its FCM.

Caution: Simultaneous changes in a concept value
There are two types of changes. An absolute change happens when a value is erased and replaced, such as when an individual copies its peers' behavior. A relative change expresses how a concept can increase or decrease from its current value, for example as individuals try to be more like their peers. A concept can be subject to several relative influences: our awareness of diabetes can increase a bit based on the awareness of peers, and increase even further if peers have diabetes. However, there can only be one absolute influence at a time, since it would erase any other change (Fig. 4.5b). This is a key design rule in the composability of changes [18].

Phase 5: We define halting conditions (i.e., when the simulation stops). A simulation must eventually end, based on conditions set by the user. Such conditions can be time-based (e.g., run for one virtual year) or feature-based (e.g., if all agents are infected or the average level of exercise crosses a threshold). These conditions should be explicitly aligned with the purpose of the model. For example, if a model is intended to help users contrast the effects of various behavior change interventions, then simulations should stop neither at two weeks (not enough time for a change to be observed) nor after 30 years.
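As a minimal sketch of phase 5, the loop below combines a time-based and a feature-based halting condition; the helpers and thresholds are hypothetical stand-ins for the update logic described above.

MAX_STEPS = 365            # time-based condition: one virtual year of daily steps (assumed)
EXERCISE_THRESHOLD = 0.7   # feature-based condition: target average exercise level (assumed)

def step_simulation(agents):
    pass  # hypothetical stand-in for phase 4: interactions, then FCM stabilization

def average_exercise(agents):
    return 0.5  # hypothetical stand-in returning the population's average exercise level

agents, step = [], 0
while step < MAX_STEPS and average_exercise(agents) < EXERCISE_THRESHOLD:
    step_simulation(agents)
    step += 1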
Caution: Temporal conflicts between the ABM and FCMs
The temporal component comes from the ABM alone, as modelers define the time step/granularity at which agents interact: is phase 4 performed every day, week, or month? However, there can be compatibility issues between the ABM and FCMs if an FCM includes actions that take longer than the time step of the ABM. For instance, the FCM in Fig. 4.4e represents the effect of exercise on obesity, which is slow. If a step of the ABM was set to one day, then the FCM could not represent what happens within that day alone, since its dynamics unfold over much longer periods.
Fig. 4.5 The value of a concept can be initialized from a univariate distribution, such as a normal, lognormal, or uniform distribution (a). This is appropriate when a concept value is assumed to be independent of the values of other concepts. Otherwise, we either use multivariate distributions or initialize values in a certain order, based on each other. Interactions are based on transforming the influencing concepts of peers into an impact on influenced concepts (b). This example shows incompatible transformations, as 'pressure to be thin' is modified both by an absolute change (copy from neighbor) and a relative change
4.4 Example of Study Design and Python Implementation

In this section, we use a prior study [24] and the library cuda-hybrid to illustrate the main modeling choices with respect to the five phases in the previous section, and their translation into a Python implementation. Readers who wish to access all files and run the complete code to replicate the study can use our online tutorial,⁶ which also covers interventions and data visualization.

The goal of a model is not usually to replicate the complexity of the world in minute detail, such that we could see in a computerized form what we already witness in reality. Rather, a model serves a specific set of objectives, which impact its design, data collection, and simulations. In our guiding example, the ABM/FCM seeks to evaluate how the spread of knowledge regarding weight management can impact the level of obesity in the Canadian population. Obesity is the result of a complex system involving environmental factors (e.g., access to fresh fruit and vegetables, safe places to exercise, exposure to fast-food outlets), physiological factors (e.g., ability to exercise, genetics), social factors (e.g., cultural norms), and psychological factors (e.g., stress, body image). Representing all of this complexity could steer modelers towards an ABM/FCM that tracks how agents move in their neighborhoods, where they shop, what they eat, and how it increases adipose tissues in certain areas of the body based on physiological factors (e.g., cortisol from stress). However, before incorporating an aspect into a model, practitioners should ensure that the work involved is worth it given the expected improvement in the ability of the model to address its objective. In our example, spatial aspects are unnecessary to evaluate the spread of knowledge. For instance, the guiding question does not ask how outcomes depend on the area (e.g., which Canadian province, or urban/rural divides) and does not seek to make any spatial intervention (e.g., an increase in bike lanes or sidewalks). As there is no need to include the physical environment, the population generation in phase 1 focuses on creating a social network of agents.

Background Information: Cost/benefit tradeoff of complex models
The benefits of adding to a model can be evaluated in multiple ways, whose importance is determined by the stakeholders [1]. A model could have less uncertainty (e.g., as quantified by reduced confidence intervals) or a higher average accuracy. Details could enable the evaluation of model outcomes across sub-populations (e.g., by location, race and ethnicity, age), which helps to determine the fairness and equity of outcomes. A more comprehensive model could enable users to incorporate additional questions of interest, instead of developing a separate model. Achieving these benefits can incur various costs, such as an increase in uncertainty (adding details without enough supporting data) or computational time to perform simulations [28],
⁶ The step-by-step tutorial created by Kim Ha and Kareem Ghumrawi is available at https://cuda-hybrid.github.io/tutorials/index.html.
as well as increases in the time required to create the model (which translates into financial costs when modelers are on contract).
While there are synthetic populations for ABMs in Canada, they often focus on socio-demographic attributes and geographical locations [45]. In our case, we instead need to generate social ties, and we do not require embedding agents in a location. Although social network generators are available for other target populations [50], there is currently none that is specific to Canada. We thus employ general-purpose social network generators. There is uncertainty about the social ties, as we do not have country-wide data on how Canadians interact with each other. Several broad hypotheses include a small-world effect (i.e., people form groups and some people belong to multiple groups) and a scale-free effect (i.e., most people have few connections, a few people have orders of magnitude more connections). In order to reflect the uncertainty in social ties onto simulation outcomes, we employ two generators: a small-world generator and a scale-free generator. Using libraries such as NetworkX, a population can be simply instantiated as follows:

import networkx as nx

# The small-world generator has three parameters; the first one is the population size
G = nx.watts_strogatz_graph(2412, 11, 0.05)
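The scale-free counterpart can be instantiated in one line as well; the parameter m = 5 below is an assumption chosen so that the Barabási-Albert network has roughly the same number of ties as the small-world network above.

# Each new node attaches to 5 existing nodes, preferentially to well-connected ones
G_sf = nx.barabasi_albert_graph(2412, 5)
print(G_sf.number_of_edges())  # roughly comparable to the small-world network's tie count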
Although Canada has almost 40 million inhabitants, a model intended for the whole country would not simulate 40 million agents. The virtual population size has to be sufficiently large so that results are stable and can be computed within the resources available. An arbitrarily small population (e.g., 100 or 1,000 agents) may show exceedingly wide fluctuations in the average level of obesity across simulation runs.⁷ One approach is thus to start with a small population and increase it⁸ until simulation outputs achieve a desired confidence interval.

In phase 2, we assign an FCM to each agent. The choice of approach proceeds by elimination given the data available. We do not have individual-level longitudinal data on a nationally representative sample of the population regarding obesity and weight-related factors, hence we rule out the assignment of a unique FCM to each agent. Although there are FCMs on obesity and weight-related factors across different sub-populations [27], the extremely imbalanced coverage prevents their use to create FCM archetypes that can be assigned to groups of agents. Indeed, the data only provides

⁷ To illustrate the impact of the number of virtual agents on the stability of simulation outcomes, see Fig. 9 in [26].
⁸ When using multiple generators in a single study, they must be calibrated to produce similar populations. Otherwise, the variance in the simulation outcomes would not be attributable only to how people connect, but also to the number of people and connections. Generators may not be able to create populations of any desired size due to their mechanisms. For example, some generators can create a larger population by copying and linking multiple instances of a smaller population [5]. In our case, we used 2,412 agents because this population size can be generated both by the small-world generator and the scale-free generator. The other parameter values (11, 0.05) were determined to provide an equivalent number of social ties in both generated networks.
two sub-populations, one representing 5% of the population (Indigenous) and the remainder accounting for 95%. Consequently, we assign the same FCM [25] (Fig. 4.6) to all agents, while noting this as a model limitation since the effect of knowledge diversity is unaccounted for in the model.

def create_obesity_FCM():
    """ to complete in the next code snippet """

for agent in G.nodes():
    G.nodes[agent]["FCM"] = create_obesity_FCM()
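Before moving to phase 3, the 'start small and grow' approach to population sizing mentioned earlier can be operationalized as sketched below; run_replications is a hypothetical helper standing in for building the network and running the hybrid model, and the half-width target of 0.01 is an arbitrary choice.

import numpy as np

def ci_half_width(samples, z=1.96):
    # 95% confidence-interval half-width of the mean across replications
    return z * np.std(samples, ddof=1) / np.sqrt(len(samples))

def run_replications(population_size, runs):
    # Hypothetical stand-in: returns the average obesity level of each run;
    # fluctuations shrink as the population grows
    return np.random.normal(0.25, 5.0 / np.sqrt(population_size), size=runs)

size = 500
while ci_half_width(run_replications(size, runs=30)) >= 0.01:
    size *= 2  # grow the population until outputs are stable enough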
Initializing the FCMs in phase 3 consists of identifying relevant data (i.e., sufficiently recent and applicable to Canada) for as many core concepts as possible. This is a significant data collection effort,⁹ which can include several data sources and needs to account for potential dependencies between concepts. In this example, we use four data sources (see Table 1 in [24]), including the National Population Health Survey (NPHS) and the Canadian Community Health Survey (CCHS). This allows us to initialize several concepts: for example, age determines depression and stress in NPHS, and it determines physical health in CCHS (Fig. 4.6). Other concepts stem from the evidence available in peer-reviewed articles. For instance, the Canadian population is mostly sedentary, hence the distribution of exercise is skewed towards low values. Society tends to strongly believe that fatness is an undesirable trait and that weight is under one's own responsibility (despite evidence for the role of the environment and policies), hence these values are high. Finally, some concepts are initialized based on modeling assumptions. Since people tend to gain weight over time, it follows that the energy derived from food must slightly exceed the energy spent on physical activity. We thus calibrate food intake to be above physical activity, such that we replicate the average weight gain of the population. The corresponding code involves several data files for empirical distributions (available online) alongside distributions built into the numpy library:

import numpy as np

AGE_RELATED = np.loadtxt('age_related.txt')
INCOME = np.loadtxt('income.txt')

def create_obesity_FCM():
    FCM = nx.read_edgelist('obesity.txt', nodetype=str,
                           data=(('weight', float),),
                           create_using=nx.DiGraph())
    FCM.nodes["Fatness_as_Negative"]["val"] = 0.8
    FCM.nodes["Belief_Person_Responsible"]["val"] = 0.8
⁹ National statistics can be a valuable source to jointly initialize demographic factors such as age, gender, race and ethnicity. Several constructs are expressed in other data sources as a function of such demographic factors. Note that sources should provide a representative sample for the target population of the model. Since data sources will impact the simulation outcomes, characteristics and limitations of the data sources (e.g., sample size, use of sample weights) should be disclosed as part of a model's description.
Fig. 4.6 The concepts and relationships in the FCM were created in a prior participatory study [25], with the exception of 'Knowledge', which is added for the ABM/FCM hybrid simulation. Concepts are initialized in three steps, starting with a group of six concepts initialized independently from each other
    # Exercise has an inverse gaussian distribution with mu of 1 and lambda of 0.01
    FCM.nodes["Exercise"]["val"] = np.random.wald(1, 0.01)
    # Food_Intake is around 0.0027 more than Exercise
    FCM.nodes["Food_Intake"]["val"] = FCM.nodes["Exercise"]["val"] + 30.0 / 11000.0
    # Knowledge has an inverse gaussian distribution with mu of 1 and lambda of 0.01
    FCM.nodes["Knowledge"]["val"] = np.random.wald(1, 0.01)
    # Income is randomly assigned based on the distribution from income.txt
    FCM.nodes["Income"]["val"] = np.random.choice(INCOME[:, 0], p=INCOME[:, 1] / 100)
    # Age-related concepts. Each age has a different distribution for Stress, Depression, etc.
    # This information is stored in age_related.txt
    ageInd = int(np.random.choice(AGE_RELATED[:, 0], p=AGE_RELATED[:, 1] / 100) - 18)
    FCM.nodes["Age"]["val"] = AGE_RELATED[ageInd][2]
    # Stress is 1 with probability propStressed
    propStressed = AGE_RELATED[ageInd][4] / 100
    FCM.nodes["Stress"]["val"] = np.random.choice([0, 1], p=[1 - propStressed, propStressed])
    propDepressed = AGE_RELATED[ageInd][3] / 100
    FCM.nodes["Depression"]["val"] = np.random.choice([0, 1], p=[1 - propDepressed, propDepressed])
    FCM.nodes["Antidepressants"]["val"] = 0
    if FCM.nodes["Depression"]["val"] == 1:
        propAnti = AGE_RELATED[ageInd][5] / 100
        FCM.nodes["Antidepressants"]["val"] = np.random.choice([0, 1], p=[1 - propAnti, propAnti])
    propHealth = AGE_RELATED[ageInd][9] / 100
    FCM.nodes["Physical_Health"]["val"] = np.random.choice([0, 1], p=[1 - propHealth, propHealth])
    # Obesity takes the values [0, 0.5, 1] for normal, medium, obese
    probObesity = AGE_RELATED[ageInd][6:9] / 100
    FCM.nodes["Obesity"]["val"] = np.random.choice([0, 0.5, 1], p=probObesity)
    # Weight_Discrimination is the same as Obesity
    FCM.nodes["Weight_Discrimination"]["val"] = FCM.nodes["Obesity"]["val"]
    return FCM
Observation: Adapting an FCM for a hybrid ABM/FCM simulation An FCM is normally the (by)product of its own study. So far, no FCM has been developed specifically for an ABM/FCM simulation. As a result, hybrid simulations need to reuse and adapt FCMs. A simple adaptation is the addition of an intervention concept, such as ‘Knowledge’ in our guiding example. We caution against adding other concepts, as it would jeopardize the validity of the FCM. For example, if the FCM is obtained from a participatory study and modelers start adding
78
P. J. Giabbanelli
concepts themselves, then we cannot claim anymore that the FCM corresponds to a real-world individual. Another form of adaptation is motivated by the unavailability of data. When an FCM is developed to externalize a mental model, a participant can share any concepts that they view as relevant. Some of these concepts may not be measurable, and others may not have any representative data for a target population. This becomes an issue when an FCM is reused as the mental model of virtual agents. Anytime a concept without supporting data is used, it adds uncertainty and calls for additional sensitivity analyses (i.e., examining the simulation output as a function of possible values for unknown variables), which can become computationally costly. Consequently, an FCM can be reduced based on data availability. This is an optimization problem in which we seek to minimize uncertainty while keeping the model fit for purpose [16]. Although the consequences of tinkering with an FCM to change the knowledge encoded have not been studied in detail, the interest in knowledge editing stemming from language models provides new avenues to define edit methods and evaluate their effects [12].
In phase 4, we start by identifying the influencing and influenced concepts. The focus of the study is on the spread of knowledge, so we select 'Knowledge' as both the influencing and the influenced concept. We note that individuals do partake in other shared activities (e.g., exercising, eating), so the model is a simplification of social interactions. Next, we must decide how peers' knowledge impacts an agent, that is, create a transformation function. Obesity models have used a threshold-and-impact transformation, such that an agent's value would change (impact) if it is significantly above/below (threshold) the average value of peers. However, this transformation introduces two unknowns into the model.¹⁰ In this case, we consider that an individual does not speak to the whole group at once, but instead interacts with peers one at a time. Peers can convey good or bad advice (e.g., incorrectly portraying some foods as 'healthy' or 'causing weight loss'), so knowledge can increase or decrease. Individuals do not always change behavior after hearing advice, and they may not be so credulous as to accept any advice either. Consequently, we treat the transformation
¹⁰ The amount of social pressure necessary to change an individual's behavior, and the extent to which it changes, are both difficult to observe in reality, hence data may not be available. Observational studies may provide upper and/or lower bounds on changes, for example by documenting how an individual adapts to certain situations (e.g., how much more people eat when they are in a group). Unknown parameters can thus be calibrated to ensure that the model's output falls within a plausible range. If the model is not particularly sensitive to the parameters' values, a plausible output may be obtained for many parameter values and modelers will have to choose them. A model may also exhibit a phase transition, such that an implausible outcome appears once a certain parameter value is reached, which helps to narrow down possible values (see Fig. 6 in [22]).
as a probabilistic event¹¹ whereby there is a probability p of accepting good advice and thus 1 - p of accepting bad advice:

def knowledge_influence(influenced, influencing, prob):
    choices = np.array([influenced, influencing])
    weights = [1.0, 0.0]
    if influencing > influenced:
        weights = [1.0 - prob, prob]
    elif influencing < influenced:
        weights = [prob, 1.0 - prob]
    return np.random.choice(choices, p=weights)
Having defined what happens when two individuals interact, we now specify that each agent will interact with one randomly chosen peer:

def obesity_interact(hm, p):
    if hm.ABM_adj.shape[0] [...]

[...]

    if influencing > influenced:
        if randoms[agent] < prob:
            node_future_val[agent][knowledgeIdx] = influencing
    elif influencing < influenced:
        if randoms[agent] < 1 - prob:
            node_future_val[agent][knowledgeIdx] = influencing
¹² Mass-market GPUs such as the GeForce GTX 1660 or GeForce GTX 1080 provide 1408 and 2560 CUDA cores, respectively. A workstation-grade GPU such as a Quadro RTX A5000 has 8192 cores.
Fig. 4.8 A naive serial implementation (left) loops through each agent to process its social interactions, then loops through each FCM to update them. A parallel implementation (right) divides the population to perform interactions. Once the last interaction is done, we update as many FCMs in a single step as can fit on the GPU
import math
from numba import cuda

def obesity_interact(hm, p):
    TPB = (1024, 1)
    blockspergrid_x = math.ceil(hm.ABM_adj.shape[0] / TPB[0])
    BPG = (blockspergrid_x, 1)
    # Pre-select the neighbor that each agent will interact with
    neighbors = np.zeros(len(hm.neighbors))
    for i in range(len(neighbors)):
        neighbors[i] = np.random.choice(hm.neighbors[i])
    neighbors = np.array(neighbors, dtype=int)
    agents = hm.ABM_adj.shape[0]
    cu_neighbors = cuda.to_device(neighbors)
    knowledgeIdx = hm.fcm_labels["Knowledge"]
    randoms = np.random.uniform(size=len(neighbors))
    cu_randoms = cuda.to_device(randoms)
    f[TPB, BPG](agents, cu_neighbors, hm.node_val, hm.node_future_val,
                knowledgeIdx, p, cu_randoms)

print(hm.run_parallel(["Obesity", "Physical_Health"], [.05, .05], 10,
                      obesity_interact, [hm, p], 20)["Obesity"])
4.6 Exercises

1. Our human minds are sometimes too busy to pay attention to everything that we see. The information that we observe is also modified through the filter of our biases. Finally, our memories can be distorted over time, particularly as we try to adhere to our existing narratives for our own comfort.
   a. Which human errors do you see as the most important ones to include in a hybrid ABM/FCM simulation? You can provide a real-life example to explain your choices.
   b. Choose one error and detail how the model building process (Sect. 4.3) should be modified to account for this error.
   c. Given the error that you chose above, include it in the obesity study by modifying the Python code from Sect. 4.4. Remember that the code is available at https://cuda-hybrid.github.io/tutorials/index.html, so you can start by copying it from the website rather than the book.
   d. Not everyone makes the same type of mistakes, or to the same amplitude. An error can thus be modeled through various parameters, such as the percentage of people who make it, or the severity of the error. Using at least one parameter to define the error that interests you, show how various levels of this error affect the simulation outcomes.
2. The virtual agents of an ABM can be equipped with an FCM (as discussed in this chapter) or other types of models, such as artificial neural networks. These alternatives would still result in agents with different rulesets. Given the alternatives detailed in [42], give two examples of situations in which we should use hybrid ABM/FCM simulations, and two examples in which we should use another hybrid instead. For each of the four situations, clearly convey which characteristics of the situation resulted in your modeling choice.
3. Although our focus was on the combination of FCMs with individual-level simulation techniques, many other combinations have been proposed. FCMs have been combined with aggregate-level simulation techniques such as Bayesian networks [4, 49] or structural equation models [35]. FCMs have also been used alongside non-simulation methods, such as neural networks [44]. Choose one of these approaches and:
   a. Explain the design process. Clearly decompose the steps as in Sect. 4.3 and provide an overview figure to illustrate them.
   b. Write an implementation in the programming language of your choice on a toy model.
4. The speedup achieved via parallel computing (Sect. 4.5) can be evaluated in several ways:
   a. with respect to the serial implementation. We know that parallelism is faster; the question is how much faster and under which circumstances. Using the code for our sample study on obesity, show the speed-up that you achieve on
your own computer between serial and parallel approaches. (Hint: use the %timeit command in Python to measure how long each version of the code takes.)
   b. with respect to an ideal implementation. The ratio between effective speedup and ideal speedup is the parallelization efficiency. How would you construct an ideal implementation?
   c. with respect to various environments. GPUs are only one of several hardware options that support parallelism, alongside CPUs and FPGAs [51]. There are also many parallel programming frameworks to scale Python code, such as Charm4Py or mpi4py [15]. Pick one hardware and programming framework of your choice and explain how it could be used to scale up hybrid ABM/FCM simulations.
References

1. M. Bicket, I. Christie, N. Gilbert, D. Hills, A. Penn, H. Wilkinson, Supplementary guide: handling complexity in policy evaluation, in H.M. Treasury, Magenta Book (2020)
2. F. Amblard, A. Bouadjio-Boulic, C. Sureda Gutiérrez, B. Gaudou, Which models are used in social simulation to generate social networks? A review of 17 years of publications in JASSS, in 2015 Winter Simulation Conference (WSC) (IEEE, 2015), pp. 4021–4032
3. P. Aminpour, S.A. Gray, A. Singer, S.B. Scyphers, A.J. Jetter, R. Jordan, R. Murphy Jr., J.H. Grabowski, The diversity bonus in pooling local knowledge about complex problems. Proc. Natl. Acad. Sci. 118(5), e2016887118 (2021)
4. A. Azar, K.M. Dolatabad, A method for modelling operational risk with fuzzy cognitive maps and Bayesian belief networks. Expert Syst. Appl. 115, 607–617 (2019)
5. L. Barriere, F. Comellas, C. Dalfo, M.A. Fiol, Deterministic hierarchical networks. J. Phys. A: Math. Theor. 49(22), 225202 (2016)
6. J.T. Beerman, G.G. Beaumont, P.J. Giabbanelli, A scoping review of three dimensions for long-term COVID-19 vaccination models: hybrid immunity, individual drivers of vaccinal choice, and human errors. Vaccines 10(10), 1716 (2022)
7. D. Bernard, S. Cussat-Blanc, P.J. Giabbanelli, Fast generation of heterogeneous mental models from longitudinal data by combining genetic algorithms and fuzzy cognitive maps, in Proceedings of the 56th Hawaii International Conference on System Sciences (2023), pp. 1570–1579
8. P. Bhattacharya, J. Chen, S. Hoops et al., Data-driven scalable pipeline using national agent-based models for real-time pandemic response and decision support. Int. J. High Perform. Comput. Appl. 37(1), 4–27 (2023)
9. B. Biller, C. Gunes, Introduction to simulation input modeling, in Proceedings of the 2010 Winter Simulation Conference (IEEE, 2010), pp. 49–58
10. K. Chapuis, M.-P. Bonnet, N. da Hora et al., Support local empowerment using various modeling approaches and model purposes: a practical and theoretical point of view, in Advances in Social Simulation: Proceedings of the 16th Social Simulation Conference, 20–24 September 2021 (Springer, 2022), pp. 79–90
11. K. Chapuis, P. Taillandier, A. Drogoul, Generation of synthetic populations in social simulations: a review of methods and practices. J. Artif. Soc. Soc. Simul. 25(2) (2022)
12. R. Cohen, E. Biran, O. Yoran, A. Globerson, M. Geva, Evaluating the ripple effects of knowledge editing in language models (2023). arXiv:2307.12976
13. C.W.H. Davis, P.J. Giabbanelli, A.J. Jetter, The intersection of agent based models and fuzzy cognitive maps: a review of an emerging hybrid modeling practice, in 2019 Winter Simulation Conference (WSC) (IEEE, 2019), pp. 1292–1303
14. S. Elsawah, J.H.A. Guillaume, T. Filatova, J. Rook, A.J. Jakeman, A methodology for eliciting, representing, and analysing stakeholder knowledge for decision making on complex socio-ecological systems: from cognitive maps to agent-based models. J. Environ. Manag. 151, 500–516 (2015)
15. Z. Fink, S. Liu, J. Choi, M. Diener, L.V. Kale, Performance evaluation of Python parallel programming models: Charm4Py and mpi4py, in 2021 IEEE/ACM 6th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2) (IEEE, 2021), pp. 38–44
16. A.J. Freund, P.J. Giabbanelli, The necessity and difficulty of navigating uncertainty to develop an individual-level computational model, in International Conference on Computational Science (Springer, 2021), pp. 407–421
17. K. Ghumrawi, K. Ha, J. Beerman, J.-D. Rudie, P.J. Giabbanelli, Software technology to develop large-scale self-adaptive systems: accelerating agent-based models and fuzzy cognitive maps via CUDA (2023), pp. 6863–6872
18. P.J. Giabbanelli, M. Fattoruso, M.L. Norman, CoFluences: simulating the spread of social influences via a hybrid agent-based/fuzzy cognitive maps architecture, in Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (2019), pp. 71–82
19. P.J. Giabbanelli, A novel framework for complex networks and chronic diseases, in Complex Networks (Springer, 2013), pp. 207–215
20. P.J. Giabbanelli, Modelling the spatial and social dynamics of insurgency. Secur. Inform. 3, 1–15 (2014)
21. P.J. Giabbanelli, Analyzing the complexity of behavioural factors influencing weight in adults, in Advanced Data Analytics in Health (2018), pp. 163–181
22. P.J. Giabbanelli, A. Alimadad, V. Dabbaghian, D.T. Finegood, Modeling the influence of social networks and environment on energy balance and obesity. J. Comput. Sci. 3(1–2), 17–27 (2012)
23. P.J. Giabbanelli, S.A. Gray, P. Aminpour, Combining fuzzy cognitive maps with agent-based modeling: frameworks and pitfalls of a powerful hybrid modeling approach to understand human-environment interactions. Environ. Model. Softw. 95, 320–325 (2017)
24. P.J. Giabbanelli, P.J. Jackson, D.T. Finegood, Modelling the joint effect of social determinants and peers on obesity among Canadian adults, in Theories and Simulations of Complex Social Systems (Springer, 2013), pp. 145–160
25. P.J. Giabbanelli, T. Torsney-Weir, V.K. Mago, A fuzzy cognitive map of the psychosocial determinants of obesity. Appl. Soft Comput. 12(12), 3711–3724 (2012)
26. M. Gibson, J. Portugal Pereira, R. Slade, J. Rogelj, Agent-based modelling of future dairy and plant-based milk consumption for UK climate targets. J. Artif. Soc. Soc. Simul. 25(2) (2022)
27. B.G. Giles, C.S. Findlay, G. Haas, B. LaFrance, W. Laughing, S. Pembleton, Integrating conventional science and aboriginal perspectives on diabetes using fuzzy cognitive maps. Soc. Sci. Med. 64(3), 562–576 (2007)
28. C. Helgeson, V. Srikrishnan, K. Keller, N. Tuana, Why simpler computer simulation models can be epistemically better for informing decisions. Philos. Sci. 88(2), 213–233 (2021)
29. N. Jiang, A.T. Crooks, H. Kavak, A. Burger, W.G. Kennedy, A method to create a synthetic population with social networks for geographically-explicit agent-based models. Comput. Urban Sci. 2(1), 7 (2022)
30. M.E. Johnson, Multivariate Statistical Simulation: A Guide to Selecting and Generating Continuous Multivariate Distributions, vol. 192 (Wiley, 1987)
31. M. Khater, E. Salehi, R. Gras, The emergence of new genes in EcoSim and its effect on fitness, in Simulated Evolution and Learning: 9th International Conference, SEAL 2012, Hanoi, Vietnam, December 16–19, 2012, Proceedings (Springer, 2012), pp. 52–61
32. E.-J. Kim, P. Bansal, A deep generative model for feasible and diverse population synthesis. Transp. Res. Part C: Emerg. Technol. 148, 104053 (2023)
33. S. Kotnana, D. Han, T. Anderson, A. Züfle, H. Kavak, Using generative adversarial networks to assist synthetic population creation for simulations, in 2022 Annual Modeling and Simulation Conference (ANNSIM) (IEEE, 2022), pp. 1–12
34. C. Le Page, A. Perrotton, KILT: a modelling approach based on participatory agent-based simulation of stylized socio-ecosystems to stimulate social learning with local stakeholders, in Multi-Agent Based Simulation XVIII: International Workshop, MABS 2017, São Paulo, Brazil, May 8–12, 2017, Revised Selected Papers (Springer, 2018), pp. 156–169
35. L. Luo, L. Zhang, Q. He, Linking project complexity to project success: a hybrid SEM-FCM method. Eng. Constr. Archit. Manag. 27(9), 2591–2614 (2020)
36. V.K. Mago, L. Bakker, E.I. Papageorgiou, A. Alimadad, P. Borwein, V. Dabbaghian, Fuzzy cognitive maps and cellular automata: an evolutionary approach for social systems modelling. Appl. Soft Comput. 12(12), 3771–3784 (2012)
37. S. Mkhitaryan, P.J. Giabbanelli, How modeling methods for fuzzy cognitive mapping can benefit from psychology research, in 2021 Winter Simulation Conference (WSC) (IEEE, 2021), pp. 1–12
38. J.A. Barra Montevechi, A. Teberga Campos, G. Teodoro Gabriel, C.H. dos Santos, Input data modeling: an approach using generative adversarial networks, in 2021 Winter Simulation Conference (WSC) (IEEE, 2021), pp. 1–12
39. N. Mustafee, A. Harper, B.S. Onggo, Hybrid modelling and simulation (M&S): driving innovation in the theory and practice of M&S, in 2020 Winter Simulation Conference (WSC) (IEEE, 2020), pp. 3140–3151
40. T. Nacházel, Optimization of decision-making in artificial life model based on fuzzy cognitive maps, in 2015 International Conference on Intelligent Environments (IEEE, 2015), pp. 136–139
41. T. Nacházel, Fuzzy cognitive maps for decision-making in dynamic environments. Genet. Program. Evolvable Mach. 22(1), 101–135 (2021)
42. A. Negahban, P.J. Giabbanelli, Hybrid agent-based simulation of adoption behavior and social interactions: alternatives, opportunities, and pitfalls. IEEE Trans. Comput. Soc. Syst. 9(3), 770–780 (2021)
43. H.R. Parry, M. Bithell, Large scale agent-based modelling: a review and guidelines for model scaling, in Agent-Based Models of Geographical Systems (2011), pp. 271–308
44. K. Poczeta, E.I. Papageorgiou, Energy use forecasting with the use of a nested structure based on fuzzy cognitive maps and artificial neural networks. Energies 15(20), 7542 (2022)
45. M. Prédhumeau, E. Manley, A synthetic population for agent-based modelling in Canada. Sci. Data 10(1), 148 (2023)
46. A.G. Lima Resque, E. Perrier, E. Coudel, L. Galvão, J.V. Fontes, R. Carneiro, L. Navegantes, C. Le Page, Discussing ecosystem services in management of agroecosystems: a role playing game in the eastern Brazilian Amazon. Agrofor. Syst. 1–15 (2021)
47. R.C. Rooney, J. Daniel, M. Mallory et al., Fuzzy cognitive mapping as a tool to assess the relative cumulative effects of environmental stressors on an arctic seabird population to identify conservation action and research priorities. Ecol. Solut. Evid. 4(2), e12241 (2023)
48. J. Sok, E.A.J. Fischer, Farmers' heterogeneous motives, voluntary vaccination and disease spread: an agent-based model. Eur. Rev. Agric. Econ. 47(3), 1201–1222 (2020)
49. Y.Y. Wee, W.P. Cheah, S.C. Tan, K.K. Wee, A method for root cause analysis with a Bayesian belief network and fuzzy cognitive map. Expert Syst. Appl. 42(1), 468–487 (2015)
50. C. Zhuge, C. Shao, B. Wei, An agent-based spatial urban social network generator: a case study of Beijing, China. J. Comput. Sci. 29, 46–58 (2018)
51. A.N. Ziogas, T. Schneider, T. Ben-Nun et al., Productivity, portability, performance: data-centric Python, in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (2021), pp. 1–13
Chapter 5
Analysis of Fuzzy Cognitive Maps

Ryan Schuerkamp and Philippe J. Giabbanelli
Abstract Structural analysis of Fuzzy Cognitive Maps leveraging techniques from network science and graph theory can answer several questions about the system of interest without performing simulations. This chapter focuses on widely used methods to answer two questions about FCMs and applies them to a guiding example from a real-world case study. First, what are the important concepts? We introduce transmitter, receiver, and ordinary concepts and five concept centrality measures (degree, betweenness, closeness, eigenvector, and Katz) to determine the critical concepts. Second, is the FCM facilitation (i.e., construction) process good? We define commonly used metrics (e.g., receiver-transmitter ratio and density) to assess the quality of FCM facilitation and support model comparison. Readers should be able to confidently analyze and compare FCMs after reading this chapter and completing its exercises.
5.1 Why Analyze Fuzzy Cognitive Maps?

Because FCMs are represented as graphs/networks, they can answer numerous questions about a system of interest without performing simulations, using methods from network science [23] and graph theory. For example, given an FCM that depicts how diabetes is shaped by social norms and environmental factors [13], we could identify the concepts with the most relationships without performing simulations; however, identifying the long-term effects of smoking more on diabetes would require simulations. Analyzing and comparing FCMs is a complex task with many options and the subject of research articles [29, 32]. We focus the chapter and its exercises on a subset of commonly employed techniques and apply them to an example FCM (Fig. 5.1) derived from [13] to answer three pertinent questions for FCMs:
Fig. 5.1 An example FCM derived from [13]. Edge weights are omitted from figures for simplicity but are shown in the lower subfigure of Fig. 2 in [13]
1. What are the important concepts in the FCM? Identifying the critical concepts in an FCM can reveal potential leverage points if we seek to steer the system in a particular direction through interventions.
2. Is the facilitation process good? FCMs are often built by participants, or groups of participants, with different sources and levels of knowledge; differentiating between the quality of FCMs is widely done in practice and can help compare the perspectives of groups of stakeholders on a system. Thus, it is important to assess the quality of the facilitation process to ensure the participants are fairly compared.
3. How similar are FCMs? Directly comparing FCMs can help identify the similarities and differences between FCMs and the perspectives of the participants who built them.

The remainder of this chapter is organized as follows. Section 5.2 introduces two techniques to determine the important concepts in the FCM. In Sect. 5.3, we define metrics commonly used to compare FCMs and assess quality. Section 5.4 briefly recaps the chapter, and Sect. 5.5 provides exercises to reinforce readers' comprehension and familiarize them with techniques used to compare FCMs. Code for the guiding example and its analysis is available at https://doi.org/10.5281/zenodo.8235905, and Table 5.1 summarizes when to use each of the measures covered in the chapter that are implemented by Python's NetworkX [15], a popular network science library.
Table 5.1 Popular metrics to analyze FCMs implemented in NetworkX [15], the function to use in NetworkX, and when to use the metric and function:
- Degree centrality (degree_centrality): find important concepts within a neighborhood.
- Betweenness centrality (betweenness_centrality): identify concepts critical in spreading effects and interventions.
- Closeness centrality (closeness_centrality): discover concepts that quickly spread effects, or the optimal location in a network.
- Eigenvector centrality and Katz centrality (eigenvector_centrality, katz_centrality): detect the important concepts in the system based on the importance of their neighbors.
- Number of concepts and number of relationships (len(graph.nodes()), len(graph.edges())): assess the size of the FCM.
- Average distance, i.e., average shortest path length (average_shortest_path_length): determine how fast information will spread throughout the entire FCM.
- Average clustering coefficient (average_clustering): evaluate on average how close a node and its neighbors are to forming a complete graph.
- Density (density): judge how close the FCM is to a complete graph as a whole.
- Feedback loops (simple_cycles): appraise the complexity of the FCM.
5.2 What Are the Important Concepts?

Identifying important concepts in FCMs is a critical and widely conducted task because it highlights the concepts driving the behavior of the system and potential points of intervention. Two techniques are often employed to discover the crucial concepts. First, each concept is often categorized as a transmitter, receiver, or ordinary concept. Section 5.2.1 introduces this approach and applies it to this chapter's guiding example. Second, the importance of concepts is ranked according to various centrality measures, each of which defines concept importance differently. We introduce a subset of centrality measures in Sect. 5.2.2 and apply them to the guiding example.
5.2.1 Transmitter, Receiver, and Ordinary Concepts

Classifying concepts as transmitters, receivers, or ordinary concepts identifies the role of each in the system of interest. Transmitters only have outgoing relationships: they spread information and influence but are not affected by any other concept in the system. On the other hand, receivers only have incoming relationships; thus, they do not exert influence, but other concepts influence them. Ordinary concepts have both incoming and outgoing relationships. In the cause-effect perspective, ordinary concepts are mediating factors, receivers are outcomes, and transmitters are inputs (i.e., stressors in health assessment) [13]. In the guiding example, there are 14 transmitters (e.g., Alcohol), zero receivers, and four ordinary concepts (e.g., Diabetes) (Fig. 5.2), indicating that numerous inputs act on fewer outcomes.

The classification of concepts as transmitters, receivers, or ordinary concepts is frequently used to analyze FCMs in real-world case studies [13, 20, 26]. For example, when comparing FCMs representing the scientific community's and Canadian Aboriginal communities' perspectives on diabetes [13], modelers found the scientific community's FCM had many transmitters (72% of concepts), few ordinary concepts (28%), and no receivers, depicting the system as depending on a large collection of inputs. In contrast, the Canadian Aboriginal communities' FCM had 31% of its
Fig. 5.2 Transmitters are shown in pink/red, and ordinary concepts are shown in blue
concepts classified as transmitters and 69% as ordinary concepts, indicating a more intertwined system.
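This classification follows directly from the in- and out-degrees of a directed graph; below is a minimal sketch with NetworkX, assuming the FCM is loaded as a DiGraph.

import networkx as nx

def classify_concepts(G):
    roles = {}
    for concept in G.nodes():
        has_in = G.in_degree(concept) > 0
        has_out = G.out_degree(concept) > 0
        if has_out and not has_in:
            roles[concept] = "transmitter"  # only outgoing relationships
        elif has_in and not has_out:
            roles[concept] = "receiver"     # only incoming relationships
        else:
            roles[concept] = "ordinary"     # both (isolated concepts also fall here)
    return roles

# Toy example: Alcohol is a transmitter; Diabetes and Body weight are ordinary
G = nx.DiGraph([("Alcohol", "Diabetes"), ("Diabetes", "Body weight"),
                ("Body weight", "Diabetes")])
print(classify_concepts(G))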
5.2.2 Centrality Measures

Centrality measures induce a ranking of concepts,¹ answering the question of what are the most important concepts. Each centrality measure defines importance differently. There are hundreds of centrality measures; some would take a long time to compute [10], and many others are highly correlated, such that the information they present becomes redundant [25]. We focus this subsection on five centrality measures commonly applied to FCMs. We apply each measure to the guiding example and provide their results in Table 5.2.

Consider an FCM representing a road system where roads are relationships weighted by their average daily traffic, intersections are concepts, and a relationship from one intersection to another indicates the number of vehicles that travel the road between the intersections on an average day. Centralities can answer four critical questions. First, what intersection is the busiest (i.e., the one with the highest daily average traffic, combining incoming and outgoing traffic)? Degree centrality can rank intersections by their cumulative traffic (i.e., immediate impacts). Second, if we closed an intersection, which one would disrupt traffic the most and the least? We can use betweenness centrality to identify the impact of closing intersections (i.e., removing concepts). Third, if a company is building a gas station, where should they construct it so it is nearest to all intersections? Closeness centrality can identify the optimal location. Finally, what intersection is the most important based on traffic flow across the road system (i.e., what concept is most important across the system based on its relationships and their importance)? Eigenvector and Katz centrality take a recursive approach in which the importance of a concept depends on the importance of the concepts that connect to it.

This subsection is organized as follows. Section 5.2.2.1 introduces degree centrality, Sect. 5.2.2.2 covers betweenness, Sect. 5.2.2.3 presents closeness, and Sects. 5.2.2.4 and 5.2.2.5 introduce eigenvector and Katz.
5.2.2.1 Indegree, Outdegree, and Degree Centrality

Indegree measures how much a concept is directly influenced by others and is defined for a concept as the sum of absolute values of its incoming edge weights:

$d_{in}(c_i) = \sum_{j=1}^{N} |w_{ji}|$  (5.1)
1 We cannot directly compare two concepts based on their centrality values (e.g., a centrality of five does not mean the concept is five times as important as a concept with a centrality of one) [18].
Table 5.2 The five centrality measures applied to the guiding example

Degree: 1: Diabetes; 2: Physical Activity; 3: High Birth Weight; 4: Body weight; 5: Traditional lifestyle; 6: Smoking, GDM, Education, Employment, Traditional Diet; 7: Alcohol, Breast-Feeding, LBW, Income, Married v single, Health Status, Self-efficacy, TV watching.
Betweenness: 1: Physical Activity; 2: Diabetes; 3: Body weight; 4: all remaining concepts (tied).
Closeness: 1: Diabetes; 2: High Birth Weight; 3: Body weight; 4: Physical Activity; 5: all remaining concepts (tied).
Eigenvector: 1: Diabetes; 2: High Birth Weight; 3: Physical Activity; 4: Body weight; 5: all remaining concepts (tied).
Katz: 1: Diabetes; 2: High Birth Weight; 3: Physical Activity; 4: Body weight; 5: all remaining concepts (tied).
A transmitter has an indegree of zero, and the indegree can rank the receivers most influenced by other concepts. Similarly, outdegree measures how much a concept influences others and is defined as the sum of absolute values of its outgoing edge weights:

$d_{out}(c_i) = \sum_{j=1}^{N} |w_{ij}|$  (5.2)
A receiver would have an outdegree of zero. Degree centrality describes the importance of a concept within its immediate neighbors (i.e., adjacent concepts) and is defined as the sum of the concept’s indegree and outdegree:
$cen_D(c_i) = \sum_{j=1}^{N} |w_{ji}| + \sum_{j=1}^{N} |w_{ij}|.$  (5.3)
Thus, concepts will have a high degree centrality if they are directly involved in several strong relationships and a low degree centrality if they are involved in only a few weak relationships. In the guiding example, diabetes is the most important concept according to degree centrality (Fig. 5.3a), indicating it has the most immediate effects within its neighborhood (i.e., the concepts it is directly connected to). Indegree, outdegree, and degree centrality are easy to compute and employed by numerous case studies [13, 21, 26] to determine concepts with the greatest immediate effects. For example, degree centrality indicated that lake pollution is the most critical concept for ecosystem conservation in a Turkish lake [26], suggesting interventions such as educational programs should target lake pollution to improve conservation efforts.
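A minimal sketch of Eqs. 5.1-5.3 follows. Note the absolute values: NetworkX's built-in degree(weight="weight") sums signed weights, so negative FCM relationships must be handled explicitly. The graph and helper names below are illustrative assumptions, not part of any case study.

```python
import networkx as nx

# Hypothetical three-concept FCM with a negative relationship.
fcm = nx.DiGraph()
fcm.add_weighted_edges_from([("A", "B", 0.8), ("B", "A", -0.4), ("C", "B", 0.5)])

def indegree(g, c):           # Eq. 5.1: sum of |incoming weights|
    return sum(abs(d["weight"]) for _, _, d in g.in_edges(c, data=True))

def outdegree(g, c):          # Eq. 5.2: sum of |outgoing weights|
    return sum(abs(d["weight"]) for _, _, d in g.out_edges(c, data=True))

def degree_centrality(g, c):  # Eq. 5.3: indegree + outdegree
    return indegree(g, c) + outdegree(g, c)

print({c: degree_centrality(fcm, c) for c in fcm})
# {'A': 1.2, 'B': 1.7, 'C': 0.5}
```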
5.2.2.2 Betweenness Centrality
Betweenness centrality reflects how important a concept is in spreading information. It answers the question: when every concept tries to reach every other concept most efficiently, which concepts are visited the most along the way? Formally, it assesses concept importance as the fraction of shortest paths that flow through it:

$cen_B(c_i) = \sum_{s,t=1,\ s \neq t \neq c_i}^{N} \frac{\sigma_{st}(c_i)}{\sigma_{st}},$  (5.4)
where $\sigma_{st}$ is the number of shortest paths from a source concept $s$ to a target concept $t$, and $\sigma_{st}(c_i)$ is the number of those paths that involve $c_i$. Betweenness centrality assumes the shortest path is always taken and that exchanges between concepts are equal. A concept with high betweenness influences the flow of information between concepts, and it may take longer for information to flow without it. Physical activity has the highest betweenness centrality in the guiding example (Fig. 5.3b). Modelers have applied betweenness centrality to FCMs [2, 8], but its use is not as prevalent as degree centrality's. For example, modelers have used betweenness centrality to identify the concepts critical for spreading effects and interventions (e.g., information or initiatives) between disconnected groups, supporting the management of Utah's Bonneville Salt Flats [2].
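The sketch below illustrates betweenness centrality with NetworkX on a hypothetical FCM. Because nx.betweenness_centrality treats edge weights as distances, we assume one common convention, distance = 1/|w|, so that stronger relationships count as shorter paths; other conventions (or simply ignoring weights) are equally defensible.

```python
import networkx as nx

# Hypothetical FCM; A reaches C either directly (weak) or through B (strong).
fcm = nx.DiGraph()
fcm.add_weighted_edges_from([("A", "B", 0.8), ("B", "C", 0.5), ("A", "C", 0.1)])

# Convert strengths to distances so stronger ties mean shorter paths.
for u, v, d in fcm.edges(data=True):
    d["dist"] = 1.0 / abs(d["weight"])

print(nx.betweenness_centrality(fcm, weight="dist"))
# {'A': 0.0, 'B': 0.5, 'C': 0.0}: B carries the A -> C shortest path
```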
5.2.2.3 Closeness Centrality
Closeness centrality defines concept importance based on optimal location in physical space. Thus, it is most appropriate for spatial networks, where concepts map
to positions in space, and the goal is to optimize the placement of something (e.g., determining where to build a hospital to serve a population). Therefore, applying it to FCMs does not align with its original goal, but it is still commonly done in practice [2, 21, 24, 33]. Mathematically, closeness centrality is the reciprocal of the average shortest path length from the concept to all other concepts:

$cen_C(c_i) = \frac{N-1}{\sum_{s=1,\ s \neq c_i}^{N} d(s, c_i)},$  (5.5)
where $d(s, c_i)$ is the distance (i.e., the length of the shortest path) between concepts $s$ and $c_i$. However, this definition is problematic when the graph is not connected (i.e., there is not a path between every pair of concepts), so a different formulation
was proposed for when there is more than one connected component [31]. A concept with a high closeness centrality has a lower average distance to other concepts. Diabetes has the highest closeness centrality in our example (Fig. 5.3c), implying its effects may quickly spread through the FCM. Closeness centrality is frequently used in FCM case studies [2, 21, 33]. For example, closeness centrality identified economic stagnation as the most central concept for wind energy deployment in Iran [33], so changes in economic stagnation may rapidly affect the system.
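A minimal NetworkX sketch of closeness centrality follows, using hop-count distances on a hypothetical FCM. NetworkX applies the correction for disconnected graphs cited above [31] by default, and on directed graphs it measures incoming distances; reverse the graph first if outgoing distances are wanted.

```python
import networkx as nx

# Hypothetical FCM: a 3-cycle plus a transmitter D feeding into it.
fcm = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")])

# Incoming hop-count distances; the Wasserman-Faust correction [31] is applied
# by default when some concepts cannot reach others.
print(nx.closeness_centrality(fcm))   # e.g., A scores 0.75 here
```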
5.2.2.4 Eigenvector Centrality
Eigenvector centrality determines the importance of a concept by accounting for the importance of the concepts it is related to (i.e., a concept is more important if it is connected to other important concepts). Specifically, the eigenvector centrality of a concept $c_i$ is proportional to the sum of the centralities of its neighbors (i.e., the concepts it is related to) and is given by the $i$th entry of the vector $\mathbf{x}$ defined as follows:

$\mathbf{W}\mathbf{x} = \kappa\mathbf{x},$  (5.6)
where $\kappa$ is the largest eigenvalue of the weight matrix $\mathbf{W}$ and $\mathbf{x}$ is the corresponding eigenvector. Eigenvector centrality does not work for acyclic networks (i.e., FCMs without a feedback loop) because all concepts will have a centrality of zero [23], but this generally does not affect FCMs, as feedback loops are typically present. According to eigenvector centrality, diabetes is the most important concept (Fig. 5.3d). Eigenvector centrality has been applied to numerous FCM case studies to rank concepts' importance [1, 2, 5, 22, 33] and identify which ones best spread effects system-wide in the long term [33].
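The sketch below computes Eq. 5.6 with NetworkX on a hypothetical FCM that contains a feedback loop (as noted above, eigenvector centrality degenerates without one). We assume the common convention of taking absolute values of the weights, since negative entries can prevent a meaningful dominant eigenvector.

```python
import networkx as nx

# Hypothetical FCM with a feedback loop between A and B; C hangs off B.
fcm = nx.DiGraph()
fcm.add_weighted_edges_from([("A", "B", 0.6), ("B", "A", 0.7), ("B", "C", -0.5)])

# Use |w| so the dominant eigenvector stays non-negative (one common convention).
for u, v, d in fcm.edges(data=True):
    d["absw"] = abs(d["weight"])

print(nx.eigenvector_centrality_numpy(fcm, weight="absw"))
```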
5.2.2.5 Katz Centrality
Katz centrality is a variant of eigenvector centrality that initially gives each concept a small base importance/centrality to overcome the previously mentioned limitations of eigenvector centrality. Similarly, the Katz centrality of a concept $c_i$ is defined by the $i$th entry of the vector $\mathbf{x}$ defined as follows:

$\mathbf{x} = \alpha\mathbf{W}\mathbf{x} + \beta\mathbf{1},$  (5.7)
where $\mathbf{1}$ is a vector of all ones and $\alpha$ and $\beta$ are positive constants, with $\beta$ specifying the initial centrality value for each concept and $\alpha$ balancing the eigenvector centrality against the initial centrality value $\beta$. Thus, a concept with zero eigenvector centrality now has an initial Katz centrality that it can pass to its neighbors. The choice of $\beta$ is unimportant because the absolute magnitude of centrality scores does not matter, so a value of $\beta = 1$ is often used [23]. $\alpha$ has a greater effect on Katz centrality. When $\alpha = 0$, only the constant term $\beta$ survives, so all concepts have the same centrality. For Katz centrality to converge, we must satisfy the inequality $\alpha \le \frac{1}{\kappa}$, where $\kappa$ is the largest eigenvalue of the weight matrix $\mathbf{W}$. Using a value close to $\frac{1}{\kappa}$ places the maximum weight on the eigenvector centrality and less on the constant term. In the guiding example, Katz centrality ranked diabetes as the most important concept (Fig. 5.3e). Additionally, it has been applied to FCMs, finding that aggregate FCMs that agree on Katz centrality tend to agree on simulation outcomes; however, it does not work as well for individual maps because Katz centrality relies on feedback, which is less present in individual FCMs due to fewer relationships [19].

Fig. 5.3 The five centrality metrics applied to the guiding example: (a) degree centrality, (b) betweenness centrality, (c) closeness centrality, (d) eigenvector centrality, and (e) Katz centrality
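A sketch of Eq. 5.7 follows, choosing $\alpha$ just below $1/\kappa$ as recommended above; the FCM, the 0.9 safety factor, and the variable names are illustrative assumptions.

```python
import networkx as nx
import numpy as np

# Hypothetical FCM with a feedback loop between A and B.
fcm = nx.DiGraph()
fcm.add_weighted_edges_from([("A", "B", 0.6), ("B", "A", 0.7), ("B", "C", 0.5)])

W = nx.to_numpy_array(fcm)                  # weight matrix
kappa = max(abs(np.linalg.eigvals(W)))      # largest eigenvalue magnitude
alpha = 0.9 / kappa                         # stay safely below 1/kappa

print(nx.katz_centrality(fcm, alpha=alpha, beta=1.0, weight="weight"))
```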
5.3 Validating the Facilitation Process

Determining whether the FCM facilitation process was good is a critical task: a poorly elicited FCM may result in a lack of trust in the model or require further data gathering or participant input. We emphasize that such an assessment concerns the FCM structure only, since convergence and prediction capabilities are not assessed. Also, assessing the quality of different FCMs can aid in comparing the knowledge and perspectives of distinct groups of participants. For example, one may expect the scientific community and their FCM to contain more specific knowledge on diabetes than the FCM produced by a non-expert Canadian Aboriginal population [13], and an aggregate FCM produced by ecologists to have more information on a fish ecosystem than one made by fishermen [19]. This subsection provides a subset of metrics to assess model quality and enable comparison. Section 5.3.1 introduces the importance of the number of concepts and relationships in the model, Sect. 5.3.2 the receiver-transmitter ratio, Sect. 5.3.3 average distance (i.e., average shortest path length) and diameter, Sect. 5.3.4 the clustering coefficient, Sect. 5.3.5 density, and Sect. 5.3.6 feedback loops.
5.3.1 Number of Concepts and Relationships

Calculating the number of concepts and relationships in an FCM is often a first step to compare FCMs [2, 13]. The number of concepts is an easy way to evaluate the size of the FCM, and the number of relationships is one way to assess the complexity of the FCM, with more connections implying a more complex model [2]. However, the number of new concepts and relationships tends to saturate as more participants are added [26]. Calculating the number of relationships can be more relevant for comparing FCMs because some modelers create a fixed set of concepts for participants to include in their FCM [33]. In the guiding example, there are 18 concepts and 29 relationships.
5.3.2 Receiver-Transmitter Ratio

After classifying each concept as a receiver, transmitter, or ordinary concept, the receiver-transmitter ratio is often calculated as $R/T$, where $R$ is the number of receivers in the FCM and $T$ is the number of transmitters. A large receiver-transmitter ratio implies the FCM considers numerous outcomes of the system that result from few inputs and can indicate the map is complex [6, 26]. In contrast, a low receiver-transmitter ratio indicates that many inputs act on few outcomes, which can be indicative of an overly simplified system in which causal relationships are not fully elaborated [6, 13, 26]. In the guiding example, there are 14 transmitters (e.g., Alcohol), zero receivers, and four ordinary concepts (e.g., Diabetes) (Fig. 5.2). Thus, the receiver-transmitter ratio is $0/14 = 0$, implying that causal relationships may not be sufficiently elaborated and the FCM is potentially an oversimplification. The receiver-transmitter ratio is frequently used to analyze FCMs in real-world case studies [13, 20, 26].
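The ratio is straightforward to compute once concepts are classified, as in the minimal sketch below on a hypothetical three-edge FCM.

```python
import networkx as nx

# Hypothetical FCM with two transmitters, one ordinary concept, one receiver.
fcm = nx.DiGraph([("Alcohol", "Diabetes"), ("Smoking", "Diabetes"),
                  ("Diabetes", "Complications")])

receivers    = [c for c in fcm if fcm.in_degree(c) > 0 and fcm.out_degree(c) == 0]
transmitters = [c for c in fcm if fcm.out_degree(c) > 0 and fcm.in_degree(c) == 0]
print(len(receivers) / len(transmitters))  # R/T = 1/2 = 0.5
```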
5.3.3 Metrics Based on Shortest Paths

Recall the distance $d(c_i, c_j)$ between two concepts $c_i$ and $c_j$ is the length of the shortest path between them (i.e., the fewest number of relationships between them). This is unrelated to Euclidean distance and does not depend on any drawing of the graph: it is entirely a reflection of the structure of the graph and the sequence of edges that most efficiently goes from one node to another. The distance is used to compute two model-level measures: average distance and diameter. First, the average distance (a.k.a. average shortest path length) is defined as follows:

$l = \frac{1}{N \times (N-1)} \sum_{i \neq j}^{N} d(c_i, c_j),$  (5.8)
where $d(c_i, c_j) = 0$ if $c_j$ cannot be reached from $c_i$. A small average distance implies that information and impacts spread fast throughout the model, so a change in one concept will quickly affect others. The average distance has been applied to FCM case studies [33], and the average distance for the guiding example is 0.258 (it is less than one because the distance accounts for the relationship weights). Second, the diameter is the largest distance in the model (i.e., the fewest number of relationships between the two most distant concepts):

$\delta = \max_{c_i, c_j} \{d(c_i, c_j)\}.$  (5.9)
The diameter indicates the maximum number of FCM inference iterations it would take for a change in one of the most distant concepts to affect the other. Additionally,
it intuitively represents how much the FCM spans the problem space. If the diameter is exceedingly small relative to the number of nodes, it can indicate that causal relationships may not be well elaborated [11]. The guiding example has a diameter of three. There are numerous distances of three, but one is from income to high birth weight (the path goes from income to physical activity to diabetes to high birth weight). Thus, a change in income will take three iterations to impact high birth weight.
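The sketch below computes Eqs. 5.8-5.9 on a hypothetical FCM using hop-count distances, treating unreachable pairs as distance 0 as Eq. 5.8 specifies; note that nx.average_shortest_path_length would instead raise an error on graphs that are not strongly connected, and that the guiding example's value of 0.258 additionally accounted for weights.

```python
import networkx as nx

# Hypothetical FCM: a 3-cycle plus a transmitter D.
fcm = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")])
lengths = dict(nx.all_pairs_shortest_path_length(fcm))

N = fcm.number_of_nodes()
dists = [lengths[u].get(v, 0)              # unreachable pairs count as 0 (Eq. 5.8)
         for u in fcm for v in fcm if u != v]
print(sum(dists) / (N * (N - 1)))          # average distance l (Eq. 5.8): 1.25
print(max(dists))                          # diameter (Eq. 5.9): 3
```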
5.3.4 Clustering Coefficient

As a local measure for a given node, the clustering coefficient measures the density of a node's neighborhood. If all of its adjacent nodes are also connected, then the node has a high local clustering coefficient. The value can be averaged across the nodes, leading to a global clustering coefficient over the whole model. For FCMs, both the unweighted and weighted clustering coefficients may be of interest, and both are implemented in popular network science libraries such as Python's NetworkX [15] (Table 5.1). There are several ways to define the clustering coefficient for weighted graphs like the FCM [27] but only one way for unweighted graphs. For the sake of simplicity, we define the unweighted clustering coefficient for a concept as follows:

$C_i = \frac{2 \times m_i}{d_i(d_i - 1)},$  (5.10)
where $m_i$ is the number of relationships among the neighbors of the $i$th concept, and $d_i$ is the degree of the concept. A concept with a high clustering coefficient is closer to forming a complete graph with its neighbors. The average clustering coefficient provides a measure of the overall level of clustering in the network and is defined as follows:

$C = \frac{1}{N} \sum_{i=1}^{N} C_i.$  (5.11)
A high average clustering coefficient could indicate that participants did not selectively add relationships, suggesting an FCM with excessive relationships. The average clustering coefficient has been applied to case studies [33], and the guiding example has an average clustering coefficient of 0.26, implying the scientific community was reasonably selective in including relationships.
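For computation, NetworkX provides clustering coefficients directly, as sketched below on a hypothetical FCM; note that its directed generalization differs slightly from the simplified unweighted definition of Eq. 5.10, which is one of the several definitions mentioned above.

```python
import networkx as nx

# Hypothetical FCM: A, B, C form a tightly knit directed triangle.
fcm = nx.DiGraph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "A")])

print(nx.clustering(fcm))          # per-concept coefficient (cf. Eq. 5.10)
print(nx.average_clustering(fcm))  # model-level average (cf. Eq. 5.11)
```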
5.3.5 Density

Similar to the clustering coefficient, density provides a measure of how similar the FCM is to a complete graph. For the type I FCM (i.e., the previous value of a concept does not affect its current value), density is defined as follows:

$D = \frac{m}{N(N-1)},$  (5.12)
where $m$ is the number of relationships in the FCM. Similarly, for the type II FCM (i.e., the previous value affects the current value), density is defined as follows:

$D = \frac{m}{N^2}.$  (5.13)
A low value for density is desirable for FCMs because it indicates participants were selective in including relationships, and we expect the density to decrease as the number of concepts or the diameter increases [11]. Density has been applied to numerous case studies [13, 26, 33], and the guiding example has a density of 0.095, indicating the scientific community was selective in including relationships in their FCM. We emphasize that this interpretation of density is within the context of FCMs built from participants. If FCMs are built from data as classifiers, then dense FCMs are often needed to produce accurate predictions.
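Both density variants are one-liners, as sketched below on a hypothetical FCM; nx.density implements the type I formula.

```python
import networkx as nx

# Hypothetical FCM with 3 concepts and 3 relationships.
fcm = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "A")])
m, N = fcm.number_of_edges(), fcm.number_of_nodes()

print(m / (N * (N - 1)))  # type I density (Eq. 5.12), same as nx.density(fcm): 0.5
print(m / N**2)           # type II density (Eq. 5.13): 0.333...
```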
5.3.6 Feedback Loops

Feedback loops characterize complex systems across fields and make the dynamic behavior of a system and its final state difficult to predict [4, 30], and their absence may imply insufficient participant input [9]. A feedback loop occurs when you start at a concept and end back at it by traversing its relationships. In the guiding example, there is a relationship from diabetes to high birth weight and another from high birth weight to diabetes, so there is a feedback loop between the two. FCM case studies frequently identify feedback loops [7, 16], and there are several algorithms to find feedback loops (i.e., cycles) in graphs [14, 17, 28], which we omit for the sake of brevity. There are two types of feedback loops: positive/reinforcing and negative/balancing. A positive feedback loop contains an even number of negative relationships and reinforces system behavior (Fig. 5.4a), and having many positive feedback loops in an FCM may increase system deviation and instability [7]. FCMs with only positive feedback loops are said to show structural balance [3]. On the other hand, a negative feedback loop contains an odd number of negative relationships and decreases the effect of change (Fig. 5.4b). If there is a mix of positive and negative feedback loops, it is difficult to predict their collective effect [7]. Thus, identifying feedback loops and whether they are positive or negative is important for FCMs.
Fig. 5.4 An example positive/reinforcing (a) and negative/balancing (b) feedback loop derived from [11], where green edges are positive relationships and red edges are negative
In the guiding example, there is only one feedback loop, and it is negative: the relationship from diabetes to high birth weight is positive with a weight of 0.19, and the relationship from high birth weight to diabetes is negative with a weight of -0.01, so the loop contains an odd number of negative relationships.
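The sketch below finds feedback loops with nx.simple_cycles, which implements the kind of cycle enumeration cited above (Johnson's algorithm [17]), and classifies each loop by the parity of its negative relationships; the two-concept FCM mirrors the guiding example's loop, with the remaining concepts omitted.

```python
import networkx as nx

# Two-concept loop mirroring the guiding example's diabetes <-> high birth weight.
fcm = nx.DiGraph()
fcm.add_weighted_edges_from([("Diabetes", "High Birth Weight", 0.19),
                             ("High Birth Weight", "Diabetes", -0.01)])

for cycle in nx.simple_cycles(fcm):                    # Johnson's algorithm [17]
    edges = list(zip(cycle, cycle[1:] + cycle[:1]))    # close the loop
    negatives = sum(1 for u, v in edges if fcm[u][v]["weight"] < 0)
    kind = "negative/balancing" if negatives % 2 else "positive/reinforcing"
    print(cycle, kind)  # one loop, classified as negative/balancing
```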
5.4 Conclusion

This chapter provided a subset of possible measures and metrics to identify important concepts in an FCM and to assess whether the FCM is a good model, supporting modelers in determining the concepts driving the system of interest, potential intervention points, and the quality of an FCM. Its exercises guide readers in comparing FCMs and applying the techniques discussed to real-world case studies. After reading this chapter and completing its exercises, readers should be able to confidently analyze FCMs in the real world and learn additional techniques to support further analysis as needed from the rich existing literature on network science [23].
5.5 Exercises

The following exercises are classified by level of difficulty, as indicated by the number of * symbols.

1. Read [13] and pay close attention to the comparison between the scientific knowledge (SK) and traditional knowledge (TK) FCMs.
   a. * Apply three of the five centralities covered in Sect. 5.2.2 to the full FCM in Fig. 2 in [13]. To quickly implement the full FCM, modify the code that creates the simplified FCM (i.e., the guiding example) in the Building the FCM section of the provided Jupyter notebook (https://doi.org/10.5281/zenodo.8235905).
   b. * How do the five most important concepts according to the selected centralities differ from the simplified model in Fig. 5.2, which we used as the guiding example? Refer to Table 5.2 for the concept rankings by centralities in the simplified model.
   c. ** Justify your selection of the three centralities. What information do they neglect?
   d. *** How would the concept importances differ if you used the remaining centralities? Verify your hypothesis.
2. *** Apply your selected three centralities to the TK FCM as well. Additionally, apply average distance to, and determine the number of positive and negative feedback loops in, the SK and TK FCMs. Compare the results of the centralities, average distances, and number of feedback loops and interpret the differences. How do your findings compare to those of the paper?
3. ** How are eigenvector and Katz centrality different? State the importance of this difference and include an example application area where it would matter.
4. *** Are the average clustering coefficient and density correlated? If you believe they are correlated, state if they are positively or negatively correlated. Test your hypothesis by increasing and decreasing the number of relationships in the full FCM from exercise 1 and recording the density and average clustering coefficient for each number of relationships. We suggest trying at least 10 different numbers of relationships, with some lower numbers (e.g., 10) and higher ones (e.g., 100).
5. Read [12] and pay close attention to the sections on graph edit distances, graph kernels, and graph embeddings.
   a. * What arguments must users provide to use graph edit distances, graph kernels, and graph embeddings to compare FCMs? Examine how the required arguments for the three differ.
   b. ** How are graph kernels and embeddings different? Include how each quantifies the differences between graphs.
   c. *** Use graph edit distance, graph kernels, and graph embeddings to compare the full FCM from exercise 1 (the full FCM in Fig. 2 in [13]) to the guiding example. For the sake of simplicity, use the number of concepts and relationships as the two features for graph kernels and embeddings.
6. ** Read [29] and select one of the 37 metrics and measures they covered that this chapter did not. What information does it capture, and how does it complement the methods covered in this chapter?
References

1. J. Alonso-Garcia, F. Pablo-Martí, E. Nunez-Barriopedro, Omnichannel management in B2B. Complexity-based model. Empirical evidence from a panel of experts based on fuzzy cognitive maps. Ind. Market. Manag. 95, 99–113 (2021)
2. M.P. Blacketer, M.T. Brownlee, E.D. Baldwin, B.B. Bowen, Fuzzy cognitive maps of social-ecological complexity: applying Mental Modeler to the Bonneville Salt Flats. Ecol. Complex. 47, 100950 (2021)
3. D. Cartwright, F. Harary, Structural balance: a generalization of Heider's theory. Psychol. Rev. 63(5), 277 (1956)
4. J.P. Carvalho, On the semantics and the use of fuzzy cognitive maps and dynamic cognitive maps in social sciences. Fuzzy Sets Syst. 214, 6–19 (2013). Soft Computing in the Humanities and Social Sciences
5. Y. Choi, H. Lee, Z. Irani, Big data-driven fuzzy cognitive map for prioritising IT service procurement in the public sector. Ann. Oper. Res. 270(1), 75–104 (2018)
6. C. Eden, F. Ackermann, S. Cropper, The analysis of cause maps. J. Manag. Stud. 29(3), 309–324 (1992)
7. C. Enrique Peláez, J.B. Bowles, Using fuzzy cognitive maps as a system model for failure modes and effects analysis. Inf. Sci. 88(1), 177–199 (1996)
8. B. Felekoğlu, A. Baykasoğlu, et al., A FCM-based systematic approach for building and analyzing problem solving networks in open innovation. J. Multiple-Valued Logic & Soft Comput. 34 (2020)
9. H.S. Firmansyah, S.H. Supangkat, A.A. Arman, P.J. Giabbanelli, Identifying the components and interrelationships of smart cities in Indonesia: supporting policymaking via fuzzy cognitive systems. IEEE Access 7, 46136–46151 (2019)
10. A.J. Freund, P.J. Giabbanelli, An experimental study on the scalability of recent node centrality metrics in sparse complex networks. Front. Big Data 5 (2022)
11. P.J. Giabbanelli, K.L. Rice, M.C. Galgoczy, N. Nataraj, M.M. Brown, C.R. Harper, M.D. Nguyen, R. Foy, Pathways to suicide or collections of vicious cycles? Understanding the complexity of suicide through causal mapping. Soc. Netw. Anal. Mining 12(1), 60 (2022)
12. P.J. Giabbanelli, A.A. Tawfik, V.K. Gupta, Learning analytics to support teachers' assessment of problem solving: a novel application for machine learning and graph algorithms. Utilizing Learning Analytics to Support Study Success (2019), pp. 175–199
13. B.G. Giles, C.S. Findlay, G. Haas, B. LaFrance, W. Laughing, S. Pembleton, Integrating conventional science and aboriginal perspectives on diabetes using fuzzy cognitive maps. Soc. Sci. & Med. 64(3), 562–576 (2007)
14. A. Gupta, T. Suzumura, Finding all bounded-length simple cycles in a directed graph (2021)
15. A.A. Hagberg, D.A. Schult, P.J. Swart, Exploring network structure, dynamics, and function using NetworkX, in Proceedings of the 7th Python in Science Conference, ed. by G. Varoquaux, T. Vaught, J. Millman (Pasadena, CA, USA, 2008), pp. 11–15
16. A. Jetter, W. Schweinfort, Building scenarios with fuzzy cognitive maps: an exploratory study of solar energy. Futures 43(1), 52–66 (2011)
17. D.B. Johnson, Finding all the elementary circuits of a directed graph. SIAM J. Comput. 4(1), 77–84 (1975)
18. D. Koschützki, K.A. Lehmann, L. Peeters, S. Richter, D. Tenfelde-Podehl, O. Zlotowski, Centrality Indices (Springer, Berlin, Heidelberg, 2005), pp. 16–61
19. E.A. Lavin, P.J. Giabbanelli, A.T. Stefanik, S.A. Gray, R. Arlinghaus, Should we simulate mental models to assess whether they agree?, in Proceedings of the Annual Simulation Symposium, ANSS '18 (Society for Computer Simulation International, San Diego, CA, USA, 2018)
20. M.A. Levy, M.N. Lubell, N. McRoberts, The structure of mental models of sustainable agriculture. Nat. Sustain. 1(8), 413–420 (2018)
21. V.K. Mago, H.K. Morden, C. Fritz, T. Wu, S. Namazi, P. Geranmayeh, R. Chattopadhyay, V. Dabbaghian, Analyzing the impact of social factors on homelessness: a fuzzy cognitive map approach. BMC Med. Inf. Decis. Mak. 13(1), 94 (2013)
22. F.J. Navarro-Meneses, Unraveling the airline value creation network with fuzzy cognitive maps. Int. J. Eng. Bus. Manag. 14, 18479790221124640 (2022)
23. M. Newman, Networks (Oxford University Press, 2018)
24. M. Obiedat, S. Samarasinghe, A novel semi-quantitative fuzzy cognitive map model for complex systems for addressing challenging participatory real life problems. Appl. Soft Comput. 48, 91–110 (2016)
25. S. Oldham, B. Fulcher, L. Parkes, A. Arnatkevičiūtė, C. Suo, A. Fornito, Consistency and differences between centrality measures across distinct classes of networks. PLOS ONE 14(7), 1–23 (2019)
26. U. Özesmi, S. Özesmi, A participatory approach to ecosystem conservation: fuzzy cognitive maps and stakeholder group analysis in Uluabat Lake, Turkey. Environ. Manag. 31(4), 0518–0531 (2003)
27. J. Saramäki, M. Kivelä, J.-P. Onnela, K. Kaski, J. Kertész, Generalizations of the clustering coefficient to weighted complex networks. Phys. Rev. E 75(2), 027105 (2007)
28. J.L. Szwarcfiter, P.E. Lauer, A search strategy for the elementary cycles of a directed graph. BIT Numer. Math. 16(2), 192–204 (1976)
29. D.E. Tchupo, G.A. Macht, Comparing fuzzy cognitive maps: methods and their applications in team communication. Int. J. Ind. Ergon. 92, 103344 (2022)
30. R. Thomas, Logical analysis of systems comprising feedback loops. J. Theor. Biol. 73(4), 631–656 (1978)
31. S. Wasserman, K. Faust, Social Network Analysis: Methods and Applications (Cambridge University Press, 1994)
32. B.S. Yoon, A.J. Jetter, Comparative analysis for fuzzy cognitive mapping, in 2016 Portland International Conference on Management of Engineering and Technology (PICMET) (2016), pp. 1897–1908
33. S.G. Zare, M. Alipour, M. Hafezi, R.A. Stewart, A. Rahman, Examining wind energy deployment pathways in complex macro-economic and political settings using a fuzzy cognitive map-based method. Energy 238, 121673 (2022)
Chapter 6
Extensions of Fuzzy Cognitive Maps Ryan Schuerkamp and Philippe J. Giabbanelli
Abstract Fuzzy Cognitive Maps (FCMs) are interpretable simulation models capable of representing complex systems; however, they have numerous limitations. They can only represent causal relationships, have a limited representation of uncertainty, and cannot capture nonlinear relationships, time delays/lags, or conditional relationships. Thus, several extensions of FCMs have been proposed. We organize various use cases, additional features, and added requirements of extensions of FCMs and identify candidates to support extension selection given a particular scenario. We examine how to build Interval-Valued FCMs (IVFCMs), Time-Interval FCMs (TIFCMs), and Extended-FCMs (E-FCMs), how they operate, and the additional features they offer to introduce a subset of extensions. We comment on three trends of applying and developing extensions and suggest two skills for modelers to effectively use extensions. Finally, we provide exercises to solidify the readers’ understanding of extensions. After reading this chapter and completing its problems, readers should understand why we extend FCMs and be able to compare extensions and their additional capabilities. Moreover, they should be able to select and apply an extension for numerous distinct use cases.
6.1 Why Do We Extend Fuzzy Cognitive Maps?

Fuzzy Cognitive Maps (FCMs) are robust simulation models capable of representing a system of interest and assessing its long-term behavior. They focus on the essence of the problem, concepts and their static causal relationships, to simplify modeling. However, this simplified representation of reality may be insufficient to fulfill the model's needs and support stakeholders. We examine three limitations of the FCM through our guiding example of modeling suicide attempts (Fig. 1.2 from Chap. 1).
First, the relationship between trauma and suicide attempts is likely highly nonlinear. A large increase in trauma may be required to cause an individual to begin attempting suicide when they have not before, whereas a slight increase in trauma may result in a more drastic increase in suicide attempts if an individual is already attempting. The FCM cannot represent this nonlinear relationship. Second, the relationship between financial hardship and suicide attempts may be highly uncertain; its effects on suicide attempts likely vary substantially from person to person. The FCM cannot fully represent this variable relationship because it assigns one value to each edge weight. Finally, economic stimulus does not immediately reduce financial hardship; there is a delay between policy implementation and its effects rippling through the economy before ultimately reducing individual financial hardship. The FCM cannot represent time delays/lags for relationships.

In this chapter, we focus on the limited representation of uncertainty and the inability to represent nonlinear relationships and time lags/delays because numerous extensions address each limitation. However, several extensions address other limitations. For example, FCMs can only represent causal relationships, and Rule-Based FCMs enable representing various additional relationships (e.g., influence, probabilistic, possibilistic) [2]. Additionally, each concept in the FCM uses the same inference equation, motivating the FCM with Needs, Activities, and States (FCMNAS), which classifies each concept as a need, activity, or state, with the class affecting the inference equation used [13].

The number of existing extensions makes selecting an optimal extension for a given use case challenging, and the growing body of extensions will exacerbate this challenge. A comprehensive review of extensions is beyond the scope of this chapter and covered in previous reviews [15, 18]. However, we examine when to use a subset of extensions based on existing case studies (see Fig. 6.1) and the limitations they address (see Fig. 6.2). First, we categorize each extension's use case as being for physical systems, where there is a ground truth, or social systems, where there is not. Some extensions are designed specifically for physical systems because weights and additional parameters for abstract qualitative systems cannot be evaluated [12]. For example, a concept value indicating how much of a lesson a student has understood is unitless and cannot be interpreted or measured because there is no unit of measurement, whereas physical force can be measured [12]. After making this distinction, we assign an extension to an application area based on one of its case studies. Some extensions are applied to several case studies, but we use one case study to determine the extension's application area for simplicity (see Fig. 6.1).

Fig. 6.1 The use cases of extensions segmented by physical or social systems and application domains. Note, we discuss the three extensions with a black border in detail. Abbreviations and references for each extension from left to right are as follows: High-Order FCM (HFCM) [10], Expanded FCM [12], Deep FCM (DFCM) [22], Unsupervised Dynamic FCM (UDFCM) [9], Generalized Logistic Function (GLF) [8], Time-Delay Mining Fuzzy Time Cognitive Map (TM-FTCM) [6], Intuitionistic FCM (iFCM-II) [14], Extended-FCM (E-FCM) [3], Neutrosophic Cognitive Map (NCM) Extension [1, 7], Fuzzy Grey Cognitive Maps (FGCM) [17], Interval-Valued FCM (IVFCM) [5], FCM for Discrete Random Variables (FCM4DRV) [20], Time-Interval FCM (TI-FCM) [11], Ensemble IVFCM [21], and High-Order Intuitionistic FCM (IFCMR) [23]

Second, we categorize extensions based on whether they address nonlinearity, time delays/lags, uncertainty, or a combination. Then, we classify them as being data or participant-driven. Data-driven extensions require real-world datasets and frequently leverage learning algorithms, whereas participant-driven extensions utilize participant knowledge of the system. Finally, when there are multiple extensions under the same branch, we differentiate among the extensions according to their further capabilities (see Fig. 6.2).

Fig. 6.2 The additional features offered by each extension and whether they require data or participant input. Note, we discuss the three extensions with a black border in detail. The abbreviations and references for each extension are given in the caption to Fig. 6.1

If modelers have a given use case in mind and are unsure what additional capabilities they need, we suggest they use Fig. 6.1 to identify a similar use case and
extensions applied to it. Then, they can use Fig. 6.2 to identify the additional features and needs of each extension and select one that they have the resources to implement (either participants or data) with the most appropriate added features. If modelers already know the increased functionality they need, we suggest they use Fig. 6.2 to identify extensions that offer it. Moreover, selecting the simplest extension required and avoiding unnecessary additional features is ideal because extensions with greater functionality often demand more data, participant input, or work. Overall, selecting an extension for a use case is not an exact science; several approaches may perform well, as in machine learning. For example, support vector machines, decision trees, and neural networks may have similar classification accuracies, and the best model will depend on the dataset and task. When classifying whether an individual has a disease on a large dataset, support vector machines could have an accuracy of 84%, decision trees 82%, and neural networks 85%. Working with a smaller dataset of whether a customer will purchase a product could result in support vector machines having an accuracy of 74%, decision trees 73%, and neural networks 70%. Thus, the trained neural network would be the best model for classifying whether someone is infected, and a support vector machine would be the best for predicting whether customers will purchase the product. Overall, there are two critical skills modelers should develop to effectively use extensions. First, modelers should learn to identify and select extensions suitable for
their needs and available resources. As shown by the numerous extensions addressing the same application domain (see Fig. 6.1) and offering similar functionality for comparable resources (see Fig. 6.2), this is often easier said than done. As a result, it is also critical to make an informed selection of an extension when there are multiple viable candidates. Additionally, if no extension satisfies the use case or available resources, being able to select another extension or the base FCM and thoroughly justify the decision is an important skill. Second, understanding how extensions operate and how to build them is essential, particularly for the extension selected. If the requirements for a model are not fully understood, a modeler may lack the necessary data to build it or may not elicit sufficient information from participants, preventing its use. Additionally, a robust understanding of how the extension operates is crucial because code is often unavailable, so modelers may have to develop their own implementation. In conclusion, developing the ability to select, understand, and implement an extension is crucial to effectively leveraging extensions. The remainder of this chapter is organized as follows. Section 6.2 introduces Interval-Valued Fuzzy Cognitive Maps, Sect. 6.3 Time-Interval Fuzzy Cognitive Maps, and Sect. 6.4 Extended-Fuzzy Cognitive Maps. We introduce each of the
three extensions in detail, including their inference mechanisms (see Fig. 6.3), to depict how extensions can modify the FCM. We select one extension addressing uncertainty, one addressing time delays/lags, and one addressing nonlinearity to cover each additional need in our guiding example of suicide attempts, and provide an instance of each added feature. Section 6.5 outlines three trends for extensions of FCMs and their implications for modelers. Finally, Sect. 6.6 provides exercises to reinforce the content of the chapter and help modelers develop the skills required to effectively select and apply an extension.

Fig. 6.3 Example IVFCM, TI-FCM, and E-FCM for our guiding example and one step of inference. Note, the underlined fourth decimal place indicates it is rounded
6.2 Interval-Valued Fuzzy Cognitive Maps

Determining precise concept values and weights in FCMs can be difficult because of uncertainties in the system of interest, noise in the data, disagreement among participants, or linguistic uncertainties [4]. Interval-Valued Fuzzy Cognitive Maps (IVFCMs) replace crisp concept values and weights with intervals to better represent and express uncertainty, provide participants with additional freedom, and effectively model highly uncertain systems [5]. Specifically, they use interval-valued fuzzy sets (IVFSs) because of their ability to express uncertainty stemming from numerous sources, including model accuracy, limited data, and different sources of knowledge [16]. IVFCM construction closely resembles constructing FCMs leveraging participant knowledge; however, they elicit linguistic input for both the lower and upper bounds of each concept or weight instead of just one value as in the FCM.

Background Information

An IVFS $x$ has a lower and an upper bound, respectively denoted as $x^L$ and $x^U$. The degree of uncertainty of $x$ is the distance between these two bounds: $\pi(x) = x^U - x^L$. To perform inference, it is necessary to define addition and multiplication on two IVFSs $A = [x^L, x^U]$ and $B = [y^L, y^U]$ as follows:

$A \oplus B = \left[\min(x^L + y^U, x^U + y^L),\ x^U + y^U\right],$  (6.1)

$A \otimes B = \left[x^L y^L,\ \max(x^L y^U, x^U y^L)\right].$  (6.2)
The IVFCM reformulates the type II FCM inference equation (introduced as Eq. 3.2 in Chap. 3) as follows to give a value for the $j$th concept at iteration $t+1$:

$[a^L, a^U]_j^{(t+1)} = f\left(\left(\bigoplus_{i=1,\ i \neq j}^{N} [w^L, w^U]_{ij} \otimes [a^L, a^U]_i^{(t)}\right) \oplus [a^L, a^U]_j^{(t)}\right),$  (6.3)

where $[w^L, w^U]_{ij}$ is the weight between concepts $i$ and $j$, $[a^L, a^U]_i^{(t)}$ is the value of the $i$th concept at the $t$th iteration, and the addition and multiplication operators are respectively replaced by Eqs. 6.1 and 6.2.
6.2.1 Example Interval-Valued Fuzzy Cognitive Map Inference

We work through one iteration of IVFCM inference (see Fig. 6.3a) for our guiding example given concepts trauma $c_1$, suicide attempts $c_2$, economic stimulus $c_3$, and financial hardship $c_4$, initial activation vector $\mathbf{A}^{(0)}_{1\times 4} = ([0.2, 0.4], [0.1, 0.2], [0.1, 0.4], [0.4, 0.5])$, and weights $w_{1,2} = [0.6, 0.8]$, $w_{2,1} = [0.7, 0.8]$, $w_{4,2} = [0.2, 0.7]$, and $w_{3,4} = [-0.6, -0.5]$. We observe that the activation value of economic stimulus will only be affected by its previous value and the clipping function because it lacks incoming relationships. First, we update $c_1$ as follows:

$a_1^{(1)} = f\left(\left(a_2^{(0)} \otimes w_{2,1}\right) \oplus a_1^{(0)}\right) = f\left(([0.1, 0.2] \otimes [0.7, 0.8]) \oplus [0.2, 0.4]\right).$

By IVFS multiplication (see Eq. 6.2), $[0.1, 0.2] \otimes [0.7, 0.8] = [0.1 \times 0.7, \max(0.1 \times 0.8, 0.2 \times 0.7)] = [0.07, \max(0.08, 0.14)] = [0.07, 0.14]$. Thus, we have $f([0.07, 0.14] \oplus [0.2, 0.4])$. By IVFS addition (see Eq. 6.1), $[0.07, 0.14] \oplus [0.2, 0.4] = [\min(0.07 + 0.4, 0.14 + 0.2), 0.14 + 0.4] = [\min(0.47, 0.34), 0.54] = [0.34, 0.54]$, producing $f([0.34, 0.54])$. The IVFCM usually uses the sigmoid function as its activation function $f(x)$ [5], like the FCM. For this example, we assume the function slope $\lambda = 1$ and the function offset $h = 0$. For the final step, we apply the sigmoid function to both the lower and upper values of the interval: $f(0.34) = \frac{1}{1+e^{-0.34}} = 0.5842$ and $f(0.54) = \frac{1}{1+e^{-0.54}} = 0.6318$. Note, we underline the fourth decimal place to indicate it is rounded. This produces the final value $a_1^{(1)} = [0.5842, 0.6318]$.

Second, we update $c_2$ as follows:

$a_2^{(1)} = f\left(\left(a_1^{(0)} \otimes w_{1,2}\right) \oplus \left(a_4^{(0)} \otimes w_{4,2}\right) \oplus a_2^{(0)}\right) = f\left(([0.2, 0.4] \otimes [0.6, 0.8]) \oplus ([0.4, 0.5] \otimes [0.2, 0.7]) \oplus [0.1, 0.2]\right).$

By IVFS multiplication (see Eq. 6.2), $[0.2, 0.4] \otimes [0.6, 0.8] = [0.2 \times 0.6, \max(0.2 \times 0.8, 0.4 \times 0.6)] = [0.12, \max(0.16, 0.24)] = [0.12, 0.24]$. Similarly, $[0.4, 0.5] \otimes [0.2, 0.7] = [0.4 \times 0.2, \max(0.4 \times 0.7, 0.5 \times 0.2)] = [0.08, \max(0.28, 0.1)] = [0.08, 0.28]$. Thus, we now have $f([0.12, 0.24] \oplus [0.08, 0.28] \oplus [0.1, 0.2])$. By IVFS addition (see Eq. 6.1), $[0.12, 0.24] \oplus [0.08, 0.28] = [\min(0.12 + 0.28, 0.24 + 0.08), 0.24 + 0.28] = [\min(0.4, 0.32), 0.52] = [0.32, 0.52]$. Then, $[0.32, 0.52] \oplus [0.1, 0.2] = [\min(0.32 + 0.2, 0.52 + 0.1), 0.52 + 0.2] = [\min(0.52, 0.62), 0.72] = [0.52, 0.72]$. As a result, we have $f([0.52, 0.72])$. By definition of the sigmoid function above, $f(0.52) = \frac{1}{1+e^{-0.52}} = 0.6271$ and $f(0.72) = \frac{1}{1+e^{-0.72}} = 0.6726$. This produces the final value $a_2^{(1)} = [0.6271, 0.6726]$.

Third, we update $c_3$ as follows: $f([0.1, 0.4])$. By definition of the sigmoid function, $f(0.1) = \frac{1}{1+e^{-0.1}} = 0.5250$ and $f(0.4) = \frac{1}{1+e^{-0.4}} = 0.5987$. This produces the final value $a_3^{(1)} = [0.5250, 0.5987]$.

Finally, we update $c_4$ as follows:

$a_4^{(1)} = f\left(\left(a_3^{(0)} \otimes w_{3,4}\right) \oplus a_4^{(0)}\right) = f\left(([0.1, 0.4] \otimes [-0.6, -0.5]) \oplus [0.4, 0.5]\right).$

By IVFS multiplication (see Eq. 6.2), $[0.1, 0.4] \otimes [-0.6, -0.5] = [0.1 \times -0.6, \max(0.1 \times -0.5, 0.4 \times -0.6)] = [-0.06, \max(-0.05, -0.24)] = [-0.06, -0.05]$. Thus, we have $f([-0.06, -0.05] \oplus [0.4, 0.5])$. By IVFS addition (see Eq. 6.1), $[-0.06, -0.05] \oplus [0.4, 0.5] = [\min(-0.06 + 0.5, -0.05 + 0.4), -0.05 + 0.5] = [\min(0.44, 0.35), 0.45] = [0.35, 0.45]$, producing $f([0.35, 0.45])$. By definition of the sigmoid function above, $f(0.35) = \frac{1}{1+e^{-0.35}} = 0.5866$ and $f(0.45) = \frac{1}{1+e^{-0.45}} = 0.6106$, producing the final value $a_4^{(1)} = [0.5866, 0.6106]$ and completing iteration one of the IVFCM.
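To make the interval arithmetic concrete, here is a minimal Python sketch of one IVFCM inference step (Eqs. 6.1-6.3); the helper names (iv_add, iv_mul, sig) and data structures are our own, not from the IVFCM paper [5], and the sketch reproduces the interval values derived above.

```python
import math

def iv_add(a, b):   # Eq. 6.1, on intervals (lower, upper)
    return (min(a[0] + b[1], a[1] + b[0]), a[1] + b[1])

def iv_mul(a, b):   # Eq. 6.2
    return (a[0] * b[0], max(a[0] * b[1], a[1] * b[0]))

def sig(x, lam=1.0, h=0.0):  # sigmoid with slope 1, offset 0, as in the text
    return 1.0 / (1.0 + math.exp(-lam * (x - h)))

state = {1: (0.2, 0.4), 2: (0.1, 0.2), 3: (0.1, 0.4), 4: (0.4, 0.5)}
weights = {(1, 2): (0.6, 0.8), (2, 1): (0.7, 0.8),
           (4, 2): (0.2, 0.7), (3, 4): (-0.6, -0.5)}

new_state = {}
for j in state:
    total = state[j]                       # self-memory term of Eq. 6.3
    for (i, k), w in weights.items():
        if k == j:
            total = iv_add(iv_mul(state[i], w), total)
    new_state[j] = (sig(total[0]), sig(total[1]))

print(new_state)  # c1 -> (0.5842..., 0.6318...), matching the text
```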
6.3 Time-Interval Fuzzy Cognitive Maps

The FCM cannot represent time delays/lags, and weights are always active. Real-world relationships may have time delays/lags, and some relationships do not operate constantly but only every set number of iterations (e.g., earnings reports could come out every three months). Allowing relationships to switch on and off every iteration can help simulate oscillations and cycles in the real world (e.g., busy periods compared to slow periods for companies). Time-Interval Fuzzy Cognitive Maps (TI-FCMs) introduce time delays/lags and allow relationships to oscillate on and off [11]. Their construction procedure has three additional requirements in comparison to the FCM's. First, participants must select a unit time for iterations (e.g., one iteration $t$ could correspond to one year). Second, participants must define delays for each relationship in the unit time. Finally, participants must state how often each relationship is active using the unit time (e.g., a relationship could operate every three iterations/years). TI-FCMs introduce three $N \times N$ matrices, where $N$ is the number of concepts. First, a matrix $\mathbf{D}$ contains the time delays/lags for each relationship. Note, if $d_{ij} = 0$, there is no delay/lag for the relationship between the $i$th and $j$th concepts. Second, a matrix $\mathbf{F}$ represents how often each relationship is active. If the relationship between the $i$th and $j$th concepts is always active as in the FCM, $f_{ij} = 1$. If the relationship is never active or has no effect (i.e., $w_{ij} = 0$), $f_{ij} = 0$. Third, the matrix $\mathbf{IM}^{(t)}$ indicates whether each relationship is active at iteration $t$, where $im_{ij}^{(t)} = 1$ if the relationship between the $i$th and $j$th concepts is active and $im_{ij}^{(t)} = 0$ if not. For example, if $d_{ij} = 6$ and $f_{ij} = 3$, then $im_{ij}^{(t)} = 1$ for iterations $8, 11, 14, \ldots, 5 + 3 \times (t+1)$, where $0 \le t \le T$. In general, $im_{ij}^{(t)} = 1$ for iterations $-1 + d_{ij} + f_{ij} \times (t+1)$, where $0 \le t \le T$ and $f_{ij} \ge 1$. Otherwise, $im_{ij}^{(t)} = 0$. Note, we start at iteration 0, whereas the TI-FCM paper starts at iteration 1, so we subtract one to calculate the active iterations. The TI-FCM reformulates the type II FCM inference equation (i.e., ones on the diagonal elements of $\mathbf{W}$) (Eq. 3.2 in Chap. 3) as follows:

$\mathbf{A}^{(t+1)} = f\left(\mathbf{A}^{(t)} \times \left(\mathbf{W} \times \mathbf{IM}^{(t)}\right)\right).$  (6.4)
6.3.1 Example Time-Interval Fuzzy Cognitive Map Inference

We work through an iteration of inference for the TI-FCM on the guiding example (see Fig. 6.3b) with an initial state vector $\mathbf{A}^{(0)}_{1\times 4} = (0.3, 0.15, 0.25, 0.45)$ and weight matrix $\mathbf{W}$, delay/lag matrix $\mathbf{D}$, and frequency matrix $\mathbf{F}$ as follows:

$\mathbf{W}_{4\times 4} = \begin{pmatrix} 1 & 0.7 & 0 & 0 \\ 0.75 & 1 & 0 & 0 \\ 0 & 0 & 1 & -0.55 \\ 0 & 0.45 & 0 & 1 \end{pmatrix};\quad \mathbf{D}_{4\times 4} = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix};\quad \mathbf{F}_{4\times 4} = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \end{pmatrix}.$

To calculate $\mathbf{IM}^{(0)}_{4\times 4}$, we must consider the three cases present given our delay matrix $\mathbf{D}$ and frequency matrix $\mathbf{F}$ and use the general formula: $im_{ij}^{(t)} = 1$ for iterations $-1 + d_{ij} + f_{ij} \times (t+1)$, where $0 \le t \le T$ and $f_{ij} \ge 1$, and $im_{ij}^{(t)} = 0$ otherwise. First, we have $d_{ij} = 0$, meaning there is no delay for the relationship, and $f_{ij} = 1$, meaning the relationship is always active. Thus, our active iterations are given by $-1 + 0 + 1 \times (t+1)$, and using iteration 0 (i.e., $t = 0$) produces $-1 + 0 + 1 \times (0+1) = 0$, so $im_{ij}^{(0)} = 1$. This occurs for all edges with non-zero weight (e.g., $w_{1,2}$) other than the edge from economic stimulus to financial hardship because it has a delay of two (i.e., $d_{3,4} = 2$). Second, for the edge from economic stimulus to financial hardship, $d_{3,4} = 2$ and $f_{3,4} = 1$, meaning it has a delay of two and is always active. Using the general formula for active iterations, we get $-1 + 2 + 1 \times (t+1)$, and substituting $t = 0$ produces $-1 + 2 + 1 \times (0+1) = 2$. Thus, the edge from economic stimulus to financial hardship is first active in the third iteration $t = 2$, and $im_{3,4}^{(0)} = 0$. Third, we have $d_{ij} = 0$ and $f_{ij} = 0$ for all edges with weight 0 (i.e., $w_{ij} = 0$), meaning there is no delay, but the edge is never active. Thus, $im_{ij}^{(t)} = 0$ for all $t$, including iteration 0. Finally, our indicator matrix $\mathbf{IM}^{(0)}$ can be given as follows:

$\mathbf{IM}^{(0)}_{4\times 4} = \begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}.$

Thus,

$\mathbf{A}^{(1)} = f\left(\mathbf{A}^{(0)} \times \left(\mathbf{W} \times \mathbf{IM}^{(0)}\right)\right) = f\left((0.3\ \ 0.15\ \ 0.25\ \ 0.45) \times \begin{pmatrix} 1.7 & 1.7 & 0 & 0 \\ 1.75 & 1.75 & 0 & 0 \\ 0 & -0.55 & 1 & -0.55 \\ 0.45 & 1.45 & 0 & 1 \end{pmatrix}\right) = f\left((0.975\ \ 1.2875\ \ 0.25\ \ 0.3125)\right).$

To produce the final result, we must apply the sigmoid function to each element. Like above, we assume the function slope $\lambda = 1$ and the function offset $h = 0$. This produces the final result

$\mathbf{A}^{(1)} = (0.7261\ \ 0.7837\ \ 0.5622\ \ 0.5775).$
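A minimal Python sketch of one TI-FCM step (Eq. 6.4) follows, building the indicator matrix from $\mathbf{D}$ and $\mathbf{F}$ exactly as derived above; the helper names are our own, and the sketch reproduces the worked example's result.

```python
import numpy as np

sig = lambda x: 1.0 / (1.0 + np.exp(-x))  # slope 1, offset 0, as in the text

A0 = np.array([0.3, 0.15, 0.25, 0.45])
W = np.array([[1.0, 0.70, 0, 0],
              [0.75, 1.0, 0, 0],
              [0, 0, 1.0, -0.55],
              [0, 0.45, 0, 1.0]])
D = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 2], [0, 0, 0, 0]])
F = np.array([[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 1, 0, 1]])

def indicator(t):
    """Build IM(t): edge (i,j) is active at iterations -1 + d_ij + f_ij*(k+1), k >= 0."""
    IM = np.zeros_like(W)
    for i in range(4):
        for j in range(4):
            d, f = D[i, j], F[i, j]
            if f >= 1 and (t + 1 - d) >= f and (t + 1 - d) % f == 0:
                IM[i, j] = 1
    return IM

A1 = sig(A0 @ (W @ indicator(0)))  # Eq. 6.4 for t = 0
print(np.round(A1, 4))             # [0.7261 0.7837 0.5622 0.5775]
```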
6.4 Extended-Fuzzy Cognitive Maps

FCMs cannot represent time delays/lags or nonlinear relationships. Moreover, they cannot represent conditional relationships (i.e., the AND logical operator). For example, the joint effect of experiencing financial hardship and being physically abused on suicide attempts may be greater than the combined effect of both in isolation. Extended-FCMs (E-FCMs) address these three limitations [3] by replacing the crisp weights of the FCM with weight functions and introducing time delays. The E-FCM construction process closely resembles the FCM's, but participants must identify if each relationship is nonlinear, delayed, or conditional. If it is nonlinear, they must select an appropriate weight function. Similarly, if there is a time delay, they must determine how long the delay is. Note, there can still be crisp weights as in the FCM because constants are considered functions. E-FCMs reformulate the total input to the $j$th concept as follows:

$net_j = \sum_{i=1}^{N} w_{ij}\left(a_i^{(t - delay_{ij})}\right) \times a_i^{(t - delay_{ij})},$  (6.5)

where $w_{ij}$ is a weight function instead of a crisp weight as in the FCM, and $delay_{ij}$ represents the time delay for the relationship. Thus, the inference equation of the FCM may be reformulated; we give the type I E-FCM equation below:

$a_j^{(t+1)} = f\left(\sum_{i=1}^{N} w_{ij}\left(a_i^{(t - delay_{ij})}\right) \times a_i^{(t - delay_{ij})}\right).$  (6.6)
6.4.1 Example Extended-Fuzzy Cognitive Map Inference

We work through an iteration of inference for the E-FCM on the guiding example (see Fig. 6.3c) with an initial state vector $\mathbf{A}^{(0)}_{1\times 4} = (0.3, 0.15, 0.25, 0.45)$ and weights $w_{1,2}\left(a_1^{(t)}\right) = \frac{1}{1 + e^{-a_1^{(t)}}}$, $w_{2,1} = 0.75$, $w_{3,4} = -0.55$, and $w_{4,2} = 0.45$. We use the sigmoid function with slope $\lambda = 1$ and offset $h = 0$ to represent the nonlinear relationship between trauma and suicide attempts. Additionally, we assign a delay to the relationship between economic stimulus and financial hardship, $delay_{3,4} = 2$, and set all other delays to 0. We do not consider conditional weights in this example. First, we update $c_1$ as follows (see Eq. 6.6):

$a_1^{(1)} = f\left(w_{2,1}\left(a_2^{(0-0)}\right) \times a_2^{(0-0)}\right) = f(0.75 \times 0.15) = f(0.1125) = \frac{1}{1 + e^{-0.1125}} = 0.5281.$

Second, we update $c_2$:

$a_2^{(1)} = f\left(w_{1,2}\left(a_1^{(0-0)}\right) \times a_1^{(0-0)} + w_{4,2}\left(a_4^{(0-0)}\right) \times a_4^{(0-0)}\right) = f\left(\frac{1}{1 + e^{-a_1^{(0)}}} \times a_1^{(0)} + w_{4,2} \times a_4^{(0)}\right) = f\left(\frac{1}{1 + e^{-0.3}} \times 0.3 + 0.45 \times 0.45\right) = f(0.5744 \times 0.3 + 0.2025) = f(0.17232 + 0.2025) = f(0.37482) = \frac{1}{1 + e^{-0.37482}} = 0.5926.$

Third, we update $c_3$:

$a_3^{(1)} = f\left(\sum_{i=1}^{4} w_{i3}\left(a_i^{(t - delay_{i3})}\right) \times a_i^{(t - delay_{i3})}\right) = f(0) = \frac{1}{1 + e^{-0}} = 0.5.$

Note, because the type I equation is used, $c_3$ is unaffected by its previous value, and it has no incoming edges, so the total input to $c_3$ is 0. Finally, we update $c_4$:

$a_4^{(1)} = f\left(w_{3,4}\left(a_3^{(0 - delay_{3,4})}\right) \times a_3^{(0 - delay_{3,4})}\right) = f\left(-0.55 \times a_3^{(-2)}\right).$

$a_3^{(-2)}$ is required by the inference equation because $delay_{3,4} = 2$, but $a_3^{(0)}$ is the first value. Thus, we consider $a_3^{(-2)} = 0$:

$a_4^{(1)} = f(-0.55 \times 0) = f(0) = \frac{1}{1 + e^{-0}} = 0.5.$
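A minimal Python sketch of type I E-FCM inference (Eq. 6.6) follows, representing crisp weights as constant functions, the nonlinear weight $w_{1,2}$ as a sigmoid of its source concept, and pre-initial activations as 0; the names and data structures are our own, and the sketch reproduces the worked example.

```python
import math

sig = lambda x: 1.0 / (1.0 + math.exp(-x))  # slope 1, offset 0

history = [[0.3, 0.15, 0.25, 0.45]]   # history[t][i-1] = a_i^(t)
weights = {(1, 2): sig,               # nonlinear weight function of the source
           (2, 1): lambda a: 0.75,    # crisp weights as constant functions
           (3, 4): lambda a: -0.55,
           (4, 2): lambda a: 0.45}
delays = {(3, 4): 2}                  # all other delays are 0

def activation(i, t):
    return history[t][i - 1] if t >= 0 else 0.0  # a_i^(t) = 0 before iteration 0

def contribution(i, j, t):            # one term of Eq. 6.5
    a = activation(i, t - delays.get((i, j), 0))
    return weights[(i, j)](a) * a

def step(t):                          # Eq. 6.6 for every concept
    history.append([sig(sum(contribution(i, j, t)
                            for i in range(1, 5) if (i, j) in weights))
                    for j in range(1, 5)])

step(0)
print([round(a, 4) for a in history[1]])  # [0.5281, 0.5926, 0.5, 0.5]
```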
6.5 Trends and Future of Extensions of Fuzzy Cognitive Maps

There are three main trends regarding extensions of FCMs. First, several extensions have been proposed and covered in reviews [15, 18], and dozens have since been developed, raising the risk of redundant extensions that reinvent the motivation and rationale of an existing extension. For example, the FGCM [17] and the IVFCM [5] both represent additional uncertainty over the FCM by leveraging intervals. Thus, as more extensions are developed, we may eventually reach saturation for some limitations, where additional extensions closely resemble existing ones and utilize similar approaches. This may make extension selection more difficult due to the increased number and similarity of extensions. Previous work created transformations between extensions to enable interoperability among extensions and the FCM, the reuse of expert knowledge, and the comparison of multiple extensions on a given case, which may help with extension selection [19].

Second, recent extensions have focused on specific application domains and use cases, whereas more established extensions sought to be broadly applicable. For example, the E-FCM [3], proposed in 1992, was designed to allow the FCM to represent nonlinearity, time delays/lags, and conditional weights without a specific application domain in mind, whereas the DFCM [22] (published in 2020) focuses on multivariate time series forecasting. The more specific a model is, the better it may perform for the desired use case; however, its use cases are more limited. Even if recent extensions do not focus on a specific use case or domain, they may distinguish between physical or social systems due to the availability of ground truth in physical systems [12], which social systems lack. Given the increasing specialization of proposed extensions, modelers should search for recent extensions that may better support their specific use case because a specialized model may perform better than a more general one.

Third, extensions are increasingly developed to effectively utilize participant input or real-world data, similar to the distinction between physical and social systems. Physical systems often have a ground truth and plenty of data to build an FCM, whereas social systems often lack real-world data, requiring participant knowledge to model. Building an FCM leveraging participant knowledge or historical data are two drastically different processes, so numerous extensions focus on each. Extensions tailored to participants often elicit additional input from the participants, which is frequently accomplished by leveraging linguistic terms as in the FCM. For example, the IVFCM [5] extracts lower and upper interval values, and the TI-FCM [11] requests a unit time, time delays, and frequencies. However, increasing the number of required inputs increases participant workload and the knowledge necessary to contribute to the system, both of which may restrict the set of possible participants. For example, building an E-FCM [3] requires determining if a relationship is nonlinear, has a delay, or is part of a conditional relationship. If a relationship is nonlinear, an appropriate function must be determined, and a specific delay must be specified if one exists. These additional requirements may challenge knowledgeable participants,
especially those without a technical background. Thus, extensions have been developed to leverage historical data to automatically extract and represent information that participants may struggle with, particularly for complicated physical systems with an abundance of data. For example, the TM-FTCM [6] leverages historical data to extract time delays. The growing divide between participant-driven and data-driven extensions may ease model selection and improve model performance, provided that a use case has sufficient data or participant input.
6.6 Exercises

The following exercises are classified by level of difficulty, as indicated by the number of * symbols.

1. * You have a dataset containing the spread of a disease over time. Subject matter experts urge you to consider that there is an incubation period after an individual is infected, so they cannot spread the infection immediately. What extension should you use? Justify your selection.

2. ** Create a specific use case for the IVFCM, TI-FCM, and E-FCM. For each use case, discuss why the extension you propose is the best choice of the three.

3. Consider the TI-FCM and E-FCM.
a. * What is one use case where the TI-FCM is more appropriate than the E-FCM? Describe why it is better suited.
b. ** How do their representations of time delays/lags differ?

4. *** In the programming language of your choice, implement inference for the TI-FCM. Confirm that your results for the first iteration align with the example in Sect. 6.3.1.

5. Read the iFCM-II paper [14].
a. * How are interval-valued fuzzy sets and intuitionistic fuzzy sets different? Include the information each represents in your answer.
b. ** In both the iFCM-II and IVFCM, what happens if we are completely certain about a concept value or weight? Investigate and discuss what happens to uncertainty in the IVFCM and hesitancy in the iFCM-II.
c. *** How does the participant input required to build the iFCM-II compare to the requirements of the IVFCM? In your answer, discuss if there are scenarios where using one is preferred.

6. Read the FGCM paper [17].
a. * Are FGCMs and FCMs compatible? In your answer, state if one is a special case of the other, and if so, when it is a special case.
b. ** After reading about the IVFCM, iFCM-II, and FGCM, what is a common approach for representing additional uncertainty in FCMs? Discuss if you think more extensions use this approach or different ones.
c. *** How are the IVFCM, iFCM-II, and FGCM similar and different? Describe how these similarities and differences affect what model you would use.

7. *** Do you think many extensions address multiple limitations? Justify your answer and state the effects of an extension addressing multiple limitations.
References

1. S.H.S. Al-Subhi, I.P. Pupo, R.G. Vacacela, P.Y.P. Pérez, M.Y.L. Vázquez, A new neutrosophic cognitive map with neutrosophic sets on connections, application in project management. Infinite Study (2018)
2. J.P. Carvalho, J.A. Tomè, Rule based fuzzy cognitive maps-qualitative systems dynamics, in PeachFuzz 2000. 19th International Conference of the North American Fuzzy Information Processing Society-NAFIPS (Cat. No. 00TH8500) (IEEE, 2000), pp. 407–411
3. M. Hagiwara, Extended fuzzy cognitive maps, in [1992 Proceedings] IEEE International Conference on Fuzzy Systems (1992), pp. 795–801
4. H. Hagras, C. Wagner, Towards the wide spread use of type-2 fuzzy logic systems in real world applications. IEEE Comput. Intell. Mag. 7(3), 14–24 (2012)
5. P. Hajek, O. Prochazka, Interval-valued fuzzy cognitive maps for supporting business decisions, in 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (IEEE, 2016), pp. 531–536
6. C. Jiang, D. Wang, C. Gong, G. Zhang, W. Gu, L. Yang, X. Ding, Prediction of key parameters of coal gasification process based on time delay mining fuzzy time cognitive maps, in 2021 IEEE International Conference on Real-time Computing and Robotics (RCAR) (IEEE, 2021), pp. 189–193
7. W.V. Kandasamy, F. Smarandache, Fuzzy cognitive maps and neutrosophic cognitive maps. Infinite Study (2003)
8. M.K. Ketipi, D.E. Koulouriotis, E.G. Karakasis, G.A. Papakostas, V.D. Tourassis, A flexible nonlinear approach to represent cause-effect relationships in FCMs. Appl. Soft Comput. 12(12), 3757–3770 (2012)
9. B. Liu, W. Fan, T. Xiao, Unsupervised dynamic fuzzy cognitive map. Tsinghua Sci. Technol. 20(3), 285–292 (2015)
10. W. Lu, J. Yang, X. Liu, W. Pedrycz, The modeling and prediction of time series based on synergy of high-order fuzzy cognitive map and fuzzy c-means clustering. Knowl.-Based Syst. 70, 242–255 (2014)
11. A. Minzoni, E. Mounoud, V.A. Niskanen, A case study on time-interval fuzzy cognitive maps in a complex organization, in 2017 8th IEEE International Conference on Cognitive Infocommunications (CogInfoCom) (IEEE, 2017), pp. 000027–000032
12. O. Motlagh, T.S. Hong, S.M. Homayouni, G. Grozev, E.I. Papageorgiou, Development of application-specific adjacency models using fuzzy cognitive map. J. Comput. Appl. Math. 270, 178–187 (2014)
13. T. Nachazel, Fuzzy cognitive maps for decision-making in dynamic environments. Genetic Program. Evolvable Mach. 22(1), 101–135 (2021)
14. E.I. Papageorgiou, D.K. Iakovidis, Intuitionistic fuzzy cognitive maps. IEEE Trans. Fuzzy Syst. 21(2), 342–354 (2012)
15. E.I. Papageorgiou, J.L. Salmeron, A review of fuzzy cognitive maps research during the last decade. IEEE Trans. Fuzzy Syst. 21(1), 66–79 (2012)
16. W. Pedrycz, W. Homenda, From fuzzy cognitive maps to granular cognitive maps. IEEE Trans. Fuzzy Syst. 22(4), 859–869 (2013)
17. J.L. Salmeron, Modelling grey uncertainty with fuzzy grey cognitive maps. Expert Syst. Appl. 37(12), 7581–7588 (2010)
18. R. Schuerkamp, P.J. Giabbanelli, Extensions of fuzzy cognitive maps: a systematic review. ACM Comput. Surv. (2023)
19. R. Schuerkamp, P.J. Giabbanelli, N. Daclin, Facilitating the interoperability and reuse of extensions of fuzzy cognitive maps, in 2023 Annual Modeling and Simulation Conference (ANNSIM) (IEEE, 2023), pp. 708–719
20. P. Szwed, Combining fuzzy cognitive maps and discrete random variables, in Artificial Intelligence and Soft Computing: 15th International Conference, ICAISC 2016, Zakopane, Poland, June 12–16, 2016, Proceedings, Part I 15 (Springer, 2016), pp. 344–355
21. J. Wang, Q. Guo, Ensemble interval-valued fuzzy cognitive maps. IEEE Access 6, 38356–38366 (2018)
22. J. Wang, Z. Peng, X. Wang, C. Li, J. Wu, Deep fuzzy cognitive maps for interpretable multivariate time series prediction. IEEE Trans. Fuzzy Syst. 29(9), 2647–2660 (2020)
23. Y. Zhang, J. Qin, P. Shi, Y. Kang, High-order intuitionistic fuzzy cognitive map based on evidential reasoning theory. IEEE Trans. Fuzzy Syst. 27(1), 16–30 (2018)
Chapter 7
Creating FCM Models from Quantitative Data with Evolutionary Algorithms

David Bernard and Philippe J. Giabbanelli
Abstract The weights of an FCM can be adjusted or entirely learned from data, which addresses limitations when experts are either unsure or unavailable. In this chapter, we show how evolutionary algorithms can perform this optimization process. Evolutionary algorithms start with a random solution and improve it by repeatedly applying operators such as mutation, crossover, and selection. The chapter defines and exemplifies these operations in Python. When there is only one candidate solution at a time, we use single-individual algorithms. In contrast, when there are several candidates, we use population-based algorithms. In this chapter, we focus on the use of population-based algorithms to optimize FCMs, which we demonstrate via two popular solutions: genetic algorithms and CMA-ES. This chapter shows readers how to apply population-based algorithms on FCMs via reusable code, while highlighting some of the key modeling choices.
7.1 Introduction

In Chap. 2, we detailed how FCMs could be obtained through qualitative data in participatory settings, for example through one-on-one interviews or workshops. However, participants may not always be available, depending on considerations such as the application domain or the project timeline. In addition, they may not be entirely sure about some causal mechanisms. The use of quantitative data can address these limitations by allowing modelers to create an FCM entirely from data, or to fine-tune an existing FCM to improve its ability at replicating observed patterns. This chapter demonstrates the use of Evolutionary Algorithms, while noting the existence
Fig. 7.1 An FCM is formed of three components. When we learn the ‘structure’ of an FCM from data using evolutionary algorithms, we focus on the adjacency matrix. We assume that the user has chosen the inference function and provided input vectors
of many other learning algorithms both in historical studies [12] and among recent works [9, 11, 15, 19]. The emphasis is on optimizing the structure of the model. We assume that the user provides the input vectors (Fig. 7.1), either directly by asking questions or by providing a repository of cases.¹ Said otherwise, the input vector represents the question that a user wishes to ask an FCM (e.g., what happens to sustainability if regulations are decreased?), hence we focus on optimizing the answer rather than the question. We also assume that the FCM inference mechanism and activation functions have been set, as they are driven by choices from the modeler (should a concept retain memory, be driven solely by its neighbors, or a balance of the two?) and adjusted separately (see Sect. 3.2 on setting the parameters of an activation function). The focus is thus on optimizing the adjacency matrix given a dataset that contains at least the input vectors and desired outputs.

¹ When we build machine learning models such as classifiers or regressors, the intention is to apply them to new instances. It could be a single instance, for which the values have been defined by the user to represent a situation of interest. Alternatively, we can provide a set of instances in one batch, such as a CSV file containing instances to classify.

The broad steps in applying such algorithms to optimize an adjacency matrix are intuitively summarized in Fig. 7.2. To begin, we initialize a population consisting of many matrices with random weights. Each matrix defines an FCM. We then evaluate the quality of each matrix (i.e., its fitness) by running each FCM on the input case(s) provided by the user and measuring how the simulated output differs from the expected outcome in the data. Then, we select the best matrices (i.e., with the highest fitness) and use them to create another generation of the population. The creation process may include a crossover, through which values belonging to different matrices are copied, and a mutation, which changes some of the values. The process then repeats by evaluating the new generation, selecting the best matrices, and creating the next generation. The process stops when we exceed either a target fitness (which
Fig. 7.2 Main steps of the optimization process with evolutionary algorithms. This schema is only intended as an intuitive overview. The process is more rigorously depicted as the reader progresses through the chapter
means that we successfully created an FCM) or a number of population generations. The process is customized to a modeler's needs in two critical places: constraints can be imposed on the matrices (e.g., must be sparse, cannot change existing expert weights by more than a user-defined margin), and the fitness evaluation may be based on a subset of important concepts, the trajectory of values over the iterations, or only the final values.

In Sect. 7.2, we explain how the adjacency matrix of an FCM can be represented as a one-dimensional genome subject to constraints. Then, we show how to evaluate the fitness of such a genome in Sect. 7.3. The process of repeatedly generating and evaluating populations is explained in Sect. 7.4 using Genetic Algorithms (GAs). We illustrate the results produced by the best genome in Sect. 7.5. Finally, we use the state-of-the-art CMA-ES optimizer in Sect. 7.6 to obtain the best results.
7.2 Representing the Genome

7.2.1 Transformations Between Vector and Matrix

The optimization algorithms operate on a vector, that is, a one-dimensional array of numbers. We call it a genome, as it represents the content of an individual solution composed of chromosomes (the edge weights). The adjacency matrix of an FCM is represented using two dimensions. To convert a matrix into a vector, we flatten it. When the vector is transformed into a matrix, we reshape it (Fig. 7.3a). For example,
Fig. 7.3 We use a genome on the optimization side and an adjacency matrix on the FCM side, hence transformations are necessary between these representations
the following Python code creates a matrix of 25 consecutive numbers (0, 1, ..., 24) and flattens it:

import numpy as np

matrix = np.arange(25, dtype=float).reshape(5, 5)
matrixFlat = np.array(matrix.flat)
As shown below, flattening the representation will organize the numbers with a single command:

array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12.,
       13., 14., 15., 16., 17., 18., 19., 20., 21., 22., 23., 24.])
This array can be turned back into an adjacency matrix by calling

matrixFlat.reshape(5, 5)
The reshaped data is organized into a matrix:

array([[ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.],
       [20., 21., 22., 23., 24.]])
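The flatten/reshape pair is lossless because entry (i, j) of an n × n matrix maps to position i·n + j of the vector. A quick check on the example above (the indices are chosen for illustration only):

i, j, n = 3, 2, 5
matrixFlat[i*n + j] == matrix[i, j]   # True: flat position 17 holds the value 17.0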
7.2.2 Constraints

If we do not specify constraints, then all numbers in the genome could be changed until the corresponding FCM produces satisfactory results. A model can be right but for the wrong reasons. For example, a learned FCM could consider that eating reduces weight and that exercising increases weight, and simulated individuals may still have the expected weight as the two errors balance out. At a minimum, we must ensure that there are no self-loops in the FCM, hence we do not want to optimize the weights on the diagonal of the matrix. Ideally, available theories and participants' feedback can guide the search for a good FCM by giving us the expected directionality of an effect (does eating reduce or increase weight?) or even a range of plausible values.

The notion of causality
When an FCM is built with participants, there is a clear notion of causality. For example, participants may state that as food intake increases, there is a medium impact on weight gain. However, weights optimized from data do not have the same causal connotation. Such weights aim to reproduce a target pattern, and the optimization algorithm has no notion of real-world chains of events. Consequently, a model optimized entirely from data may not convey an authentic causal meaning. This limitation can be addressed when constraints are enforced on the weights, but we still advise a verification with experts.
To impose simple structural constraints, we use a mask that only lets the algorithms modify specific values (Fig. 7.3b). For example, this can prevent the optimization of weights from self-loops, or tampering with causal weights for which experts expressed high certainty. To create a mask, we make a matrix of the same size as the adjacency matrix, fill elements that cannot be optimized with 0's, and use the other elements to obtain a subset of the adjacency matrix. The following Python code creates a mask that avoids self-loops:

a = np.ones((5, 5))
np.fill_diagonal(a, 0)
mask = np.nonzero(a)
matrix[mask]
As a result, we have a smaller genome in which the diagonal values (0, 6, 12, 18, 24) have been eliminated:

array([ 1.,  2.,  3.,  4.,  5.,  7.,  8.,  9., 10., 11., 13., 14., 15.,
       16., 17., 19., 20., 21., 22., 23.])
Remember from the introduction that we need to transform the genome into the adjacency matrix, so that we can run the FCM and evaluate the genome’s fitness. If a mask was used for the genome, the transformation also needs to use this mask
instead of reshaping as shown in Sect. 7.2.1. In the following example, we assume that a genome of 20 random weights is intended to encode an adjacency matrix of 5 × 5, where a mask prevents the use of weights on the diagonal. To transform the 1D array of the genome into the 2D array of the matrix, we build a matrix of the expected size, fill it with 0's (including on the diagonal), and then insert the genome's weights based on the mask:

genome = np.random.uniform(-1.0, 1.0, 5*4)
print("genome:")
display(genome)

matrix = np.zeros((5, 5))
matrix[mask] = genome
print("\nweights matrix:")
display(matrix)
The output below shows the 1D array of the genome and the 2D array of the adjacency matrix:

genome:
array([ 0.64..,  0.26..,  0.55..,  0.94..,  0.46..,  0.59.., -0.71..,
        0.33.., -0.59..,  0.19..,  0.80..,  0.17..,  0.10..,  0.70..,
       -0.25..,  0.80..,  0.59.., -0.82.., -0.87..,  0.45..])

weights matrix:
array([[ 0.    ,  0.64..,  0.26..,  0.55..,  0.94..],
       [ 0.46..,  0.    ,  0.59.., -0.71..,  0.33..],
       [-0.59..,  0.19..,  0.    ,  0.80..,  0.17..],
       [ 0.10..,  0.70.., -0.25..,  0.    ,  0.80..],
       [ 0.59.., -0.82.., -0.87..,  0.45..,  0.    ]])
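Since this genome-to-matrix conversion recurs (it is needed again in the evaluation function of Sect. 7.3 and in the analysis of Sect. 7.5), it can be wrapped in a small helper. This is a convenience sketch of ours rather than code from the chapter:

def genome_to_matrix(genome, mask, nConcepts):
    """Rebuild the adjacency matrix from a genome; masked-out entries stay at 0."""
    m = np.zeros((nConcepts, nConcepts))
    m[mask] = genome
    return m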
7.3 Evaluation

Assessing the quality of a genome consists of measuring the fitness obtained through its associated FCM. As a guiding example, we use the sigmoid function and the update equation that accounts for a concept's previous state (Eq. 3.2 in Chap. 3). The function below applies the update equation for a set number of steps,² while noting that a more complete production code would also stop if certain nodes have stabilized.

def simulateFCM(concepts, weightsMatrix, nsteps):
    # the concepts variable stores the current state of each node
    concepts = np.copy(concepts)

    valsThroughoutSteps = np.zeros((nsteps, concepts.shape[0]))
    for j in range(nsteps):
        # calculates node values at step t+1
        concepts = 1 / (1 + np.exp(-(concepts + weightsMatrix @ concepts)))  # @ is dot product
        # archiving node values
        valsThroughoutSteps[j] = concepts
    return valsThroughoutSteps

² The vector-matrix multiplication in the reasoning rule is reversed to align it with longitudinal data where concepts are typically organized by rows and their values by columns.

To evaluate a genome, we need the mask (to re-create the adjacency matrix) and data (an initial state t = 0 and a final expected value). There are multiple ways to quantify the difference between the simulated data and the expected data. In this example, we use the Mean Squared Error (MSE) across all concept values as follows:

def evaluate(genome, dataToFit, concepts, mask):
    nConcepts = concepts.shape[0]
    numberofsteps = dataToFit.shape[0]
    # genome to matrix conversion
    weightsMatrix = np.zeros((nConcepts, nConcepts))
    weightsMatrix[mask] = genome
    # get the simulation output
    dataFCM = simulateFCM(concepts, weightsMatrix, numberofsteps)
    # return fitness as the MSE
    return np.mean((dataToFit - dataFCM)**2)

To demonstrate the application of this code, we construct a short case study consisting of a longitudinal dataset with four concepts, whose values have been tracked over four steps. We use a mask to prevent the creation of weights on the diagonal, create a random genome, and show its fitness on the same scale as the nodes' values (which here ranges from 0 to 1):

questionnaire = {
    'Suicide ideation': [0, 0.5, 0.6, 0.65],
    'Suicide attempt': [0, 0, 0.3, 0.3],
    'Trauma': [0.6, 0.6, 0.7, 0.6],
    'Mental healthcare': [0, 0, 0, 0.5]
}
# the dictionary is turned into a DataFrame and a NumPy array; this conversion is implied
# by the later use of questionnaire.columns and questionnaireNumpy in this chapter
import pandas as pd
questionnaire = pd.DataFrame(questionnaire)
questionnaireNumpy = questionnaire.to_numpy()

# mask creation
a = np.ones((questionnaire.shape[1], questionnaire.shape[1]))
np.fill_diagonal(a, 0)
mask = np.nonzero(a)

# random genome generation
genome = np.random.uniform(-1.0, 1.0, len(mask[0]))

# genome evaluation
evaluate(genome, questionnaireNumpy[1:], questionnaireNumpy[0], mask)
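To make the fitness measure concrete, here is a small worked computation on invented numbers (two steps, two concepts; the values are purely illustrative):

expected = np.array([[0.5, 0.0], [0.6, 0.3]])   # data to fit
simulated = np.array([[0.4, 0.1], [0.5, 0.5]])  # FCM output
# squared errors are 0.01, 0.01, 0.01 and 0.04, so the MSE is 0.07/4 = 0.0175
mse = np.mean((expected - simulated)**2)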
Since the MSE measures errors, a lower value indicates a better fitness. The code above yields varying fitness values given the randomness, but on average the error would be relatively high for an example of this size. This is expected, since we created a single random genome to illustrate the evaluation process and we have not
applied optimizations yet. As shown in the next section, when we start using genetic algorithms, the error decreases (i.e., the fitness improves).
7.4 Genetic Algorithms

The Distributed Evolutionary Algorithms in Python (DEAP) library provides access to a comprehensive toolbox that readily implements genetic algorithms. To instantiate a tool, we need to import the necessary packages, define the domain of values for fitness and for edge weights, express how to initialize a genome, and set a genome size. This code is reusable by modelers and proceeds as follows:

from deap import base
from deap import creator
from deap import tools
from deap import algorithms
import random

# we are in a mono-objective context (MSE to be minimized) so weights = -1
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
# our individuals (genomes) are represented in the form of a list
creator.create("Individual", list, fitness=creator.FitnessMin)

MIN_BOUND = -1  # we define the weight bounds [-1, 1]
MAX_BOUND = 1

genomeSize = len(mask[0])
toolbox = base.Toolbox()
# define genome, individual, population
toolbox.register("genome", np.random.uniform, MIN_BOUND, MAX_BOUND, genomeSize)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.genome)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
7 Creating FCM Models from Quantitative Data …
129
Fig. 7.4 Crossover (also known as ‘recombination’) is a genetic operator that takes as input two parents and generates new genomes. A crossover is a point randomly picked along the genome of the parents (a). When two one-point crossovers are performed (with different points), we get a two-point crossover (b). The process can be repeated for a .k-point crossover
“whose main feature is that the fitter individuals in the population produce more offspring with smaller mutations and, conversely, the unfitter individuals produce fewer offspring with larger mutations” [4]. The emphasis is thus on the interplay of fitness and mutation, which eliminates the need for crossover. mutating In a uniform mutation, we replace values of the genome at a given probability (.indpb) by a value uniformly drawn within a specific range. Other mutations may include additional features, such as a parameter controlling the extant to which the mutated value (or ‘mutant’) should resemble the parent or can deviate from it. selecting One approach is the tournament selection, in which a random subset of genomes competes in a tournament and the winner (with best fitness) is retained. The level of selection pressure can be controlled through the size of the subset. Intuitively, being the best out of three genomes is not much pressure compared to having to be the best out of 10. For complementary readings on selection processes in genetic algorithms, we refer the reader to several classic studies [3, 7] and more recent reviews [16]. Note that new processes for selection continue to be proposed in the literature [10]. As the process is repeatedly applied, we track the best genomes. In the DEAP library, the record is performed through a ‘hall of fame’. In the following code example, we only request to track the very best genome, but we note that this algorithm can produce several solutions of similar fitness values:
130
1 2 3
D. Bernard and P. J. Giabbanelli
i n i t C o n c e p t V a l u e s = q u e s t i o n n a i r e N u m p y [0] d a t a T o F i t = q u e s t i o n n a i r e N u m p y [1:] toolbox . register (" evaluate " , lambda genome : [ evaluate ( genome , dataToFit , i n i t C o n c e p t V a l u e s , mask ) ])
4 5
6 7 8 9 10
# the u n i f o r m m u t a t i o n f u n c t i o n ( for f l o a t s ) d o e s n ’ t e x i s t in DEAP , so we c r e a t e it def m u t U n i f o r m ( individual , low , up , indpb ) : for i in range ( len ( i n d i v i d u a l ) ) : if r a n d o m . r a n d o m () < i n d p b : i n d i v i d u a l [ i ] = r a n d o m . u n i f o r m ( low , up ) r e t u r n individual ,
11 12 13
14
t o o l b o x . r e g i s t e r ( " mate " , tools . c x O n e P o i n t ) t o o l b o x . r e g i s t e r ( " m u t a t e " , mutUniform , low = MIN_BOUND , up = MAX_BOUND , indpb =2.0/ g e n o m e S i z e ) toolbox . register (" select " , tools . selTournament , tournsize =10)
15 16 17
18
# c r e a t i o n of a hall of fame of size 1 # a r c h i v e s the x best i n d i v i d u a l s ( we ’ re only i n t e r e s t e d in the best one here ) hof = tools . H a l l O f F a m e (1)
Having defined the genome and the optimization process, we are now ready to run the algorithm. At this stage, we set the population size (i.e., the number of genomes at each generation), start the process (i.e., the main loop), and define stopping criteria. Setting adequate stopping criteria is particularly important when comparing algorithms [1]. For instance, some algorithms can perform more evaluations in a single generation, hence setting a maximum number of generations as stopping criterion could lead to biased performance estimates [14]. There are two broad types of criteria. We can have set targets, such as a maximum number of generations or a desired fitness level. Alternatively, we can employ an adaptive target to monitor whether the solution is unlikely to get better. This approach can be further subdivided into two variants [1]. The genotypical approach monitors whether the genomes have converged in terms of content, which happens when a certain percentage of genomes share the same values on the same genome locations. The phenotypical approach checks whether the genomes have converged in terms of performance, as defined by a fitness level that has not improved for more than a given number of generations. We can employ multiple stopping criteria in one problem. The following example uses both a set target with a fixed number of iterations and a phenotypical approach when the average fitness shows little improvement:
NGEN = 10000
history = []

popSize = 500  # number of individuals in the population
pop = toolbox.population(n=popSize)

fitnesses = map(toolbox.evaluate, pop)
for ind, fit in zip(pop, fitnesses):
    ind.fitness.values = fit

hof.update(pop)  # update hall of fame from the population

for gen in range(NGEN):
    # generate offspring from pop, with cxpb the crossover probability, and mutpb the mutation probability
    offspring = algorithms.varAnd(pop, toolbox, cxpb=0.2, mutpb=0.8)

    # evaluate the offspring
    fitnesses = toolbox.map(toolbox.evaluate, offspring)
    for ind, fit in zip(offspring, fitnesses):
        ind.fitness.values = fit

    # update hall of fame from the offspring
    hof.update(offspring)
    history.append(hof[0].fitness.values[0])
    print(gen, hof[0].fitness.values[0])

    # stop optimization if during 500 generations the MSE has not improved by more than 0.001
    if gen > 501:
        if (history[-500] - history[-1] < 0.001):
            break
    pop = toolbox.select(pop + offspring, popSize)  # select the new population
For concision, we only show the first few results produced by the algorithm as follows, while noting that a complete analysis is performed in the next subsection:

0 0.04172689129563648
1 0.039064868585156204
2 0.0370178945862414
3 0.03570346476234482
4 0.034502705615025504
5 0.0333912054693354
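The stopping rule in the listing above is phenotypical (fitness-based). A genotypical criterion could be sketched as follows; this is an illustration of ours, and the tolerance and the 95% threshold are arbitrary choices:

def genotypes_converged(pop, best, tol=1e-3, share=0.95):
    # declare convergence when most genomes match the best genome gene-by-gene
    best = np.array(best)
    matches = sum(np.all(np.abs(np.array(ind) - best) < tol) for ind in pop)
    return matches / len(pop) >= share

# usage inside the loop, e.g.: if genotypes_converged(pop, hof[0]): break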
7.5 Analysis

Once the optimization process is conducted, users may want to see how the process performed (e.g., to check that the maximum number of generations was chosen reasonably), obtain its best model, and contrast the simulated outputs from that
model with real-world data. We address these common analytical tasks in the present section.

A common insight on performance is to monitor the fitness as a function of the number of generations. The necessary information was recorded in the variable history from the previous code listing, to which we appended the fitness value at each generation. The information can thus be plotted via the following code:

import matplotlib.pyplot as plt

plt.plot(history)
# plt.yscale('log')
plt.xlabel('generation')
plt.ylabel('MSE')
plt.show()
Figure 7.5 below shows that the initial level of error decreased very quickly and then stabilized, hence there was no apparent need for more than 500 generations. To obtain the best adjacency matrix, remember that (1) the 'hall of fame' tracks the best genome(s) in the DEAP library and (2) a genome can be transformed into an adjacency matrix by reshaping and using a mask. We thus create an adjacency matrix of the appropriate size, populate it with 0's, and replace some of these values with the content of the genome via the mask:

weightsMatrix = np.zeros((questionnaireNumpy.shape[1], questionnaireNumpy.shape[1]))
weightsMatrix[mask] = hof[0]
Fig. 7.5 Visual inspection of the performances of the genetic algorithm, as plotted in Python
Fig. 7.6 Data points from a questionnaire (orange) used as ground truth are compared with simulated data produced by an FCM (blue) built from the best genome in the previous section
In this example, one matrix can have the following values (which have been truncated for readability):

array([[ 0.    ,  0.20..,  0.18.., -0.89..],
       [-0.52..,  0.    , -0.99.., -0.99..],
       [-0.99..,  0.77..,  0.    ,  0.57..],
       [-0.52.., -0.99.., -0.99..,  0.    ]])
Finally, we wish to observe the simulations from this model with regard to real-world data. This means that we will start the FCM with the same node input values as in the data, run the simulation, and compare the simulated values of each node with the real-world values. One way to present this information is to have a time series (values over time) for each node of the FCM. When there are many nodes, this information can be packed into a set of plots, such as a set of 2 × 3 time series for six nodes. In the code shown below, we extract the data for plotting from both the simulation and the ground truth dataset, and then we use the Seaborn visualization library to create the set of time series shown in Fig. 7.6. We emphasize that the value of the guiding example resides in demonstrating the key steps of genetic algorithms, rather than in obtaining the best model; a state-of-the-art approach is shown in the next section. In this example, we observe that the model closely follows real-world data for one node ('suicide ideation') but struggles for two other nodes ('suicide attempt', 'mental healthcare'). Note that modelers should not seek to obtain accurate results on all nodes: we recommend focusing on closely matching the most important nodes, whose outcomes are closely monitored by model users. The fitness measure (e.g., MSE) can be weighted to favor solutions that fit on the most important nodes.
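As a sketch of such a weighted variant (the importance values are hypothetical and would be chosen with model users):

# inside evaluate(), replace the plain MSE with a weighted variant:
importance = np.array([1.0, 0.25, 0.5, 0.25])  # hypothetical per-concept weights
return np.mean(importance * (dataToFit - dataFCM)**2)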
import seaborn as sns

nSteps = questionnaireNumpy.shape[0]

result = np.copy(questionnaireNumpy)
result[1:] = simulateFCM(initConceptValues, weightsMatrix, nSteps - 1)
result = pd.DataFrame(result, columns=questionnaire.columns)
result['step'] = list(range(nSteps))
result['from'] = 'FCM'
data = pd.DataFrame(questionnaireNumpy, columns=questionnaire.columns)
data['step'] = list(range(nSteps))
data['from'] = 'questionnaire'
result = pd.concat([result, data], ignore_index=True).melt(id_vars=['step', 'from'])

g = sns.FacetGrid(data=result, col='variable', col_wrap=4, hue='from')
g.map(sns.lineplot, 'step', 'value')
g.map(sns.scatterplot, 'step', 'value')
g.set(ylim=(0, 1.05))
g.add_legend()
for item, ax in g.axes_dict.items():
    ax.set_title(item)
plt.show()
7.6 CMA-ES

CMA-ES (Covariance Matrix Adaptation Evolution Strategy) is an evolutionary algorithm [8] governed by three parameters. The centroid and the step-size parameter σ are used to initialize the evolutionary strategy (Fig. 7.7). σ can intuitively be understood as the spread around the centroid. λ is the number of evaluations (or 'offspring') at each generation. The user does not have to set λ, as its default value already accounts for the complexity of the optimization problem. Consequently, CMA-ES is considered 'almost' parameter-free, in contrast with other evolutionary algorithms. To begin using CMA-ES, we import the relevant packages from the DEAP library:

from deap import base
from deap import cma
from deap import creator
from deap import tools

# we are in a mono-objective context (MSE to be minimized) so weights = -1
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
# our individuals (genomes) are represented in the form of a list
creator.create("Individual", list, fitness=creator.FitnessMin)
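As an aside (a note of ours, not from the chapter): CMA-ES implementations, including DEAP's, commonly default to $\lambda = 4 + \lfloor 3 \ln n \rfloor$ for a problem of dimension $n$. For the running example with $n = 20$ learnable weights, this gives $\lambda = 4 + \lfloor 3 \times 2.996 \rfloor = 4 + 8 = 12$ offspring per generation, although the code below overrides the default with lambda_ = popSize.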
When optimizing the weights of an FCM, the search space must be within the [−1, 1] interval. While CMA-ES does not directly allow limiting the search space, it can be forced to remain within a desired domain. Any weight outside the desired search space will be repositioned to the nearest boundary of the search space, and a penalty (as a function of the distance to the nearest boundary) will be added to the result produced by the evaluation function. This mechanism ensures that genomes outside the search space yield a lower fitness than the nearest genome within the space. The following functions are thus added:
Fig. 7.7 Since we are looking for edge weights in the interval [−1, 1], we start the centroid in the middle of this search space at (0, 0). We need a σ that is neither too large nor too small, so that we do not miss points beyond the search space. In this example, we set σ = 1 as shown by the magenta circle in the middle of the top-left figure (generation 0). Colors express the fitness of a solution, ranging from low fitness (blue) to medium (yellow) and high (red). As the algorithm is applied, the parameters shift the solution toward areas of higher fitness

MIN_BOUND = -1  # weight bounds in [-1, 1]
MAX_BOUND = 1

def distance(feasible_ind, original_ind):
    """distance function to the feasibility region"""
    return sum((f - o)**2 for f, o in zip(feasible_ind, original_ind))

def closest_feasible(individual):
    """A function returning a valid individual from an invalid one."""
    feasible_ind = np.array(individual)
    feasible_ind = np.maximum(MIN_BOUND, feasible_ind)
    feasible_ind = np.minimum(MAX_BOUND, feasible_ind)
    return feasible_ind

def valid(individual):
    """Determines if the individual is valid or not."""
    inp = np.array(individual)
    if any(inp < MIN_BOUND) or any(inp > MAX_BOUND):
        return False
    return True
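As a concrete illustration of this penalty mechanism (the numbers are invented for the example): if one gene takes the value 1.3, closest_feasible clips it to 1.0 and distance returns $(1.3 - 1.0)^2 = 0.09$; with the penalty factor of 1.0e+6 used when decorating the evaluation function below, about $9 \times 10^4$ is added to the MSE, making out-of-bounds genomes uncompetitive.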
We then configure the evaluation function. The evaluation function from the DEAP library accepts a single parameter: the genome. We thus wrap it in a lambda function, which allows us to define the parameters involved in the evaluation of all genomes. Since the DEAP library is flexibly designed to support multi-objective optimization but we wish to optimize a single measure (the fitness as quantified by MSE), we configure the evaluation to return a list containing only the MSE.

toolbox = base.Toolbox()
initConceptValues = questionnaireNumpy[0]
dataToFit = questionnaireNumpy[1:]
toolbox.register("evaluate", lambda genome: [evaluate(genome, dataToFit, initConceptValues, mask)])
toolbox.decorate("evaluate", tools.ClosestValidPenalty(valid, closest_feasible, 1.0e+6, distance))

We initialize the usual parameters of genome size and population size, and define the CMA-ES specific parameters consisting of the centroid, σ, and λ. As shown in Fig. 7.7, we set the centroid to (0, 0) since it is in the middle of the search space of edge weights defined in [−1, 1]. As explained in the previous section, the 'hall of fame' records the best genome.

N = len(mask[0])  # genome size
# number of individuals in the population
popSize = 500

# definition of the CMA-ES strategy
strategy = cma.Strategy(centroid=[0.0]*N, sigma=0.25, lambda_=popSize)
toolbox.register("generate", strategy.generate, creator.Individual)
toolbox.register("update", strategy.update)

# tracks the best individual genome
hof = tools.HallOfFame(1)
7 Creating FCM Models from Quantitative Data …
1 2
137
NGEN = 10000 h i s t o r y = []
3 4 5
6
7
for gen in range ( NGEN ) : p o p u l a t i o n = t o o l b o x . g e n e r a t e () # g e n e r a t e new population # If CMA - ES c o n v e r g e s to a s i n g l e solution , the c o v a r i a n c e m a t r i x may p r o d u c e i n d i v i d u a l s with nan . If so , o p t i m i z a t i o n is c o m p l e t e . if np . any ( np . isnan ( p o p u l a t i o n ) ) : break
8 9 10
11 12
# E v a l u a t e the i n d i v i d u a l s f i t n e s s e s = list ( t o o l b o x . map ( t o o l b o x . evaluate , population )) for ind , fit in zip ( population , f i t n e s s e s ) : ind . f i t n e s s . v a l u e s = fit
13 14 15 16 17
# update hall of fame from the p o p u l a t i o n hof . u p d a t e ( p o p u l a t i o n ) h i s t o r y . a p p e n d ( hof [0]. f i t n e s s . values [0]) print ( gen , hof [0]. f i t n e s s . values [0])
18 19 20 21
if gen > 501: # stop c o n d i t i o n if ( h i s t o r y [ -500] - h i s t o r y [ -1] < 0 . 0 0 1 ) : break
22 23
24
# U p d a t e the c u r r e n t c o v a r i a n c e m a t r i x s t r a t e g y from the population . toolbox . update ( population )
In this case, the best genomes obtained by CMA-ES and the genetic algorithm fit real-world data in the same way. This exemplifies that the optimization algorithms do not necessarily yield a better outcome: the optimal solution was identified by either method, as the case study is relatively simple. However, CMA-ES finds it faster. It is thus important to keep in mind the quality of the solution and the compute time necessary to achieve it. In our prior works, we found that CMA-ES was 15 times faster to create FCMs than a genetic algorithm [2, 17]. As the case study was more complex, there was also a notable difference between the fitness obtained by CMA-ES and the genetic algorithm. In other words, CMA-ES was able to obtain a higher fit while performing fewer evaluations. The quality of a solution can be nuanced based on: sample scaling While we exemplified key concepts to generate one FCM, it may be of interest to create multiple FCMs. For example, this allows to equip each agent of an ABM with its own decision-making module, as explained in Chap. 4. Generally, we expect a linear scaling with the number of FCMs to generate. If we have to create two FCMs, it should take about twice as long as making one. problem hardness Some real-world trajectories may be easier to follow than others (Fig. 7.6). Intuitively, a ‘harder’ instance translates to a genome with a lower fit. We can thus categorize instances based on the quartile of their fitness, and study various measures (e.g., compute time) as a function of the fitness quartile.
138
D. Bernard and P. J. Giabbanelli
measurement scale The data could be as minimal as an initial case and a final value. However, data could also be longitudinal, such as time series [18]. If an FCM has to replicate real-world data on more measurements, it results in more calculations (and likely a lower fit). If modelers are interested in the scaling of their solution with respect to measurements but they do not have many data points, one approach is to build a synthetic benchmark by adding points between the existing ones (i.e., interpolation), as shown in [2]. measurement noise Measurements are not perfect, and the level of noise (or variability) could have an impact onto the quality of a solution. To study the impact of noise onto the results, we need a controllable level of noise. This is not directly possible by using one dataset, since it has a set amount of noise. In the same manner as when we study measurement scaling, we can study the impact of measurement noise by adding synthetic points with a controlled amount of noise (e.g., defined as the deviation from the interpolation of existing data).
Exercises 1. This chapter focused on population-based algorithms, such as genetic algorithms. However, there also exist single-individual algorithms, such as simulated annealing. After reading [6], compare these two methods in the context of optimizing Fuzzy Cognitive Maps. 2. Another optimization approach is to use swarm intelligence, which does not have the detailed evolutionary mechanisms shown in this chapter. Early works have particle swarm optimization (PSO) for FCMs. Contrast the search strategy of PSO as explained in [13] with the process used by CMA-ES. 3. While the algorithms explained in this chapter were rooted in evolution (sometimes in a metaphorical way), other algorithms were inspired by the behavior of animals. In particular, the Artificial Bee Colony (ABC) algorithm or Ants Colony Optimisation (ACO) mimic how the foraging behavior of individuals is geared favors a group-level optimization. Both of these solutions have individuals looking for food sources and signaling it to others, either through direct communication (for bees) or indirectly via the environment (using pheromones for ants). Using either the ABC algorithm in [20] or the ACO algorithm in [5], explain how the algorithm works in the context of FCMs. 4. In Sect. 7.3, we defined a case study via a questionnaire consisting of four concepts, recorded over four measurements. Using the following questionnaire instead, repeat our procedure and visualize how the best genomes (from both CMA-ES and the genetic algorithm) are approximating the data.
7 Creating FCM Models from Quantitative Data …
139
questionnaire = { ’ O b e s i t y ’ :[0.2 ,0.35 ,0.5] , ’ P h y s i c a l a c t i v i t y ’ :[0.2 ,0.15 ,0.1] , ’ F o o d i n t a k e ’ :[0.5 ,0.6 ,0.65] , ’ Stress ’ :[0.4 ,0.35 ,0.35] }
5. Consider the more detailed questionnaire below. As shown in [2], create a function that can add new data points by interpolation between existing ones. Then, examine how the quality and compute time of CMA-ES depend on the number of data points. questionnaire = { ’ P e r c e i v e d _ I n t a k e ’ : [0.8 , 0.6 , 0.4] , ’ A t t i t u d e _ H e a l t h y _ R ’ : [0.8 , 0.8 , 1.0] , ’ A t t i t u d e _ C o s t _ R ’ : [0.6 , 0.6 , 0.6] , ’ S e l f _ E f f i c a c y _ C a n _ E a t ’ : [1.0 , 0.8 , 1.0] , ’ S e l f _ E f f i c a c y _ D i f f i c u l t _ R ’ : [0.8 , 0.8 , 1.0] , ’ S o c i a l _ N o r m s _ N o r m a t i v e ’ : [0.8 , 0.8 , 0.8] , ’ S o c i a l _ N o r m s _ M o d e l i n g ’ : [0.6 , 0.4 , 0.8] , ’ I n t e n t i o n ’ : [1.0 , 0.8 , 0.8] , ’ P l a n n i n g _ A c t i o n _ W h e n ’ : [0.5 , 0.5 , 1.0] , ’ P l a n n i n g _ A c t i o n _ W h i c h ’ : [0.5 , 0.5 , 1.0] , ’ P l a n n i n g _ A c t i o n _ H o w _ M a n y ’ : [0.5 , 0.5 , 1.0] , ’ P l a n n i n g _ C o p i n g _ D i f f i c u l t _ I n t e r f e r e s ’ : [0.5 , 0.5 , 1.0] , ’ P l a n n i n g _ C o p i n g _ D i f f i c u l t ’ : [0.5 , 0.5 , 1.0] , ’ F r u i t _ a t _ H o m e _ R ’ : [1.0 , 1.0 , 1.0] , ’ F r u i t _ a t _ H o m e _ L o c a t i o n _ R ’ : [1.0 , 1.0 , 1.0] , ’ P i e c e s _ D a y ’ : [ 0 . 3 4 4 8 2 7 5 8 6 2 0 6 8 9 6 6 , 0.3125 , 0.12244897959183673]}
References 1. I. Abu Doush, M. El-Abd, A.I. Hammouri, M.Q. Bataineh, The effect of different stopping criteria on multi-objective optimization algorithms. Neural Comput. Appl. 1–31 (2021) 2. D. Bernard, S. Cussat-Blanc, P.J. Giabbanelli, Fast generation of heterogeneous mental models from longitudinal data by combining genetic algorithms and fuzzy cognitive maps, in Proceedings of the 56th Hawaii International Conference on System Sciences (2023), pp. 1570–1579 3. T. Blickle, L. Thiele, A comparison of selection schemes used in evolutionary algorithms. Evolut. Comput. 4(4), 361–394 (1996) 4. N. Brouwer, D. Dijkzeul, L. Koppenhol, I. Pijning, D. Van den Berg, Survivor selection in a crossoverless evolutionary algorithm, in Proceedings of the Genetic and Evolutionary Computation Conference Companion (2022), pp. 1631–1639 5. Y. Chen, L. Mazlack, L. Lu, Learning fuzzy cognitive maps from data by ant colony optimization, in Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation (2012), pp. 9–16 6. M. Ghazanfari, S. Alizadeh, M. Fathian, D.E. Koulouriotis, Comparing simulated annealing and genetic algorithm in learning fcm. Appl. Math. Comput. 192(1), 56–68 (2007) 7. D.E. Goldberg, K. Deb, A comparative analysis of selection schemes used in genetic algorithms, in Foundations of Genetic Algorithms, vol. 1 (Elsevier, 1991), pp. 69–93
140
D. Bernard and P. J. Giabbanelli
8. N. Hansen, S.D. Müller, P. Koumoutsakos, Reducing the time complexity of the derandomized evolution strategy with covariance matrix adaptation (cma-es). Evolut. Comput. 11(1), 1–18 (2003) 9. W. Hoyos, J. Aguilar, M. Toro, Federated learning approaches for fuzzy cognitive maps to support clinical decision-making in dengue. Eng. Appl. Artif. Intell. 123, 106371 (2023) 10. A. Hussain, S. Riaz, M.S. Amjad, E. Ul Haq, Genetic algorithm with a new round-robin based tournament selection: statistical properties analysis. Plos One 17(9), e0274456 (2022) 11. W. Liang, Y. Zhang, X. Liu, H. Yin, J. Wang, Y. Yang, Towards improved multifactorial particle swarm optimization learning of fuzzy cognitive maps: a case study on air quality prediction. Appl. Soft Comput. 130, 109708 (2022) 12. E.I. Papageorgiou, Learning algorithms for fuzzy cognitive maps’a review study. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 42(2), 150–163 (2011) 13. K.E. Parsopoulos, E.I. Papageorgiou, P.P. Groumpos, M.N. Vrahatis, A first study of fuzzy cognitive maps learning using particle swarm optimization, in The 2003 Congress on Evolutionary Computation, 2003. CEC’03, vol. 2 (IEEE, 2003), pp. 1440–1447 ˇ 14. M. Ravber, S.-H. Liu, M. Mernik, M. Crepinšek, Maximum number of generations as a stopping criterion considered harmful. Appl. Soft Comput. 128, 109478 (2022) 15. J.L. Salmeron, T. Mansouri, M.R. Sadeghi Moghadam, A. Mardani, Learning fuzzy cognitive maps with modified asexual reproduction optimisation algorithm. Knowl.-Based Syst. 163, 723–735 (2019) 16. A. Shukla, H.M. Pandey, D. Mehrotra, Comparative review of selection techniques in genetic algorithm, in 2015 International Conference on Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE) (IEEE, 2015), pp. 515–519 17. M.K. Wozniak, S. Mkhitaryan, P.J. Giabbanelli, Automatic generation of individual fuzzy cognitive maps from longitudinal data, in International Conference on Computational Science (Springer, 2022), pp. 312–325 18. K. Wu, J. Liu, Learning large-scale fuzzy cognitive maps under limited resources. Eng. Appl. Artif. Intell. 116, 105376 (2022) 19. Z. Yang, J. Liu, Learning of fuzzy cognitive maps using a niching-based multi-modal multiagent genetic algorithm. Appl. Soft Comput. 74, 356–367 (2019) 20. E. Yesil, C. Ozturk, M.F. Dodurka, A. Sakalli, Fuzzy cognitive maps learning using artificial bee colony optimization, in 2013 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (IEEE, 2013), pp. 1–8
Chapter 8
Advanced Learning Algorithm to Create FCM Models From Quantitative Data Agnieszka Jastrze˛bska and Gonzalo Nápoles
Abstract This chapter describes an FCM model for decision-making and prediction problems where concepts are split into inputs and outputs. A key property of this model relies on its hybrid nature, where experts are expected to define the weights of some relationships while others are learned from the data. The learning procedure does not alter the weights defined by the experts, thus enabling hybrid reasoning. The learning algorithm computes the learnable weights from historical data using the Moore-Penrose inverse, a mathematically solid and fast operation. In this regard, we develop a toy example that illustrates how to use this algorithm to solve a prediction problem using an existing Python implementation. Supplementary aspects addressed in this chapter revolve around optimizing the network topology. As such, we describe two methods to identify and eliminate redundant relationships connecting inputs with outputs without significantly harming the simulation results. The chapter also conducts a parameter sensitivity analysis and compares the Moore-Penrose inversebased learning method with metaheuristic-based methods. After reading this chapter, we expect the reader can master the foundations of these algorithms and apply them to datasets describing multi-output prediction problems.
8.1 Introduction Let us assume we want to model a complex system where the input and output variables have already been identified. Economics is a prominent example of a domain where such models are much needed. A viable option to create our model is to rely on the knowledge of human experts, as discussed in Chaps. 1 and 2. For example, we A. Jastrze˛bska (B) Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland e-mail: [email protected] G. Nápoles Department of Cognitive Science and Artificial Intelligence, Tilburg University, Tilburg, The Netherlands e-mail: [email protected] © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 P. J. Giabbanelli and G. Nápoles (eds.), Fuzzy Cognitive Maps, https://doi.org/10.1007/978-3-031-48963-1_8
141
142
A. Jastrze˛bska and G. Nápoles
could ask them to use their expertise to define relationships between certain variables that ought to be present in the model. Overall, the knowledge provided by experts is a precious insight that we wish to capture, and often, it makes sense to prioritize human-provided information over information acquired through automatic learning procedures. However, modeling complex problems requires interdisciplinary teams of subject-matter experts that are forced to define a considerable number of parameters with a high degree of precision. Therefore, we need methods able to reason over pieces of knowledge provided by human experts and evidence-based knowledge mined from the available historical data [10]. In some cases, FCMs can be built in purely data-driven processing pipelines. It means that a model can be constructed without any human intervention. As such, concepts and weights characterizing the relationships can be mined automatically from the data. Such a model construction scenario is attractive for applications where expert knowledge is not available. However, it is not leveraging all the advantages offered by cognitive modeling. Moreover, fusing expert knowledge concerning modeled phenomena with knowledge mined automatically from the data takes the best of these two worlds. A field that benefits from this approach is social sciences, in which some phenomena that should be present in a model are not measured with empirical data suitable for automatic processing. Besides the inability to leverage human expert knowledge in the FCM training procedure, purely data-driven approaches typically share other weaknesses. First and foremost, the weights resulting from these supervised learning algorithms can hardly be considered causal. Instead, they act as coefficients in a regression model where input concepts act as regressors and the output ones act as dependent variables. Any attempt to provide a causal meaning to these coefficient-like weights should be derived from controlled experiments and the results validated by human experts. That is why a hybrid approach involving expert knowledge would help align the FCM model with the physical system being modeled. The second limitation concerns the usage of metaheuristic-based learning algorithms to compute the weights from the available data. Firstly, these algorithms often produce very dense networks that are difficult to understand and mine. The literature reports some successful learning algorithms that produce sparse weight matrices, such as the methods developed by Wu and Liu [23] or Chi and Liu [3]. Secondly, metaheuristic-based learning algorithms become quite slow when having too many instances and variables [19] while often being sensitive to the parametric settings [1]. An approach that answers the need for fast FCM training was proposed by Nápoles, Jastrze˛bska, Mosquera, Vanhoof, and Homenda [14]. This method uses a mathematical approach to compute the weight matrix in a single iteration while allowing the experts to define the weights defining the relationships between the input variables. Moreover, the algorithm is coupled with a new reasoning rule that helps prevent convergence to the often problematic unique fixed-point attractors. Besides describing this method in detail, this chapter will elaborate on two approaches to calibrate the model after we suppress connections from the network.
8.2 Hybrid Fuzzy Cognitive Map Model

Let us discuss the architecture of an FCM model for decision-making composed of two blocks. The inner block concerns the input concepts and the relationships between them. In this part, the expert is expected to define weights in the $[-1, 1]$ interval characterizing the relationships between input concepts. Although an input concept can be related to any other concept in the network, the expert might decide to define a subset of all possible relationships for the sake of simplicity. The outer block concerns the relationships between input and output concepts. These relationships are not defined by the expert, but computed from the historical data using a learning algorithm. Figure 8.1 shows an example involving five variables where three are deemed inputs while the others are regarded as outputs. The weight matrix of this FCM model is denoted as $W$, and it is composed of two sub-matrices $W^I$ and $W^O$. $W^I$ contains information concerning relationships ideally defined by the experts. $W^O$ collects relationships that will be learned from historical data. $W^I$ remains fixed in the learning procedure.

It should be highlighted that neural concepts are split into inputs and outputs. While input concepts are initialized with the normalized values that input variables take, output concepts will not be initialized since their values will result from the reasoning mechanism. As such, the inner block captures the dynamic properties of the system being modeled as defined by the relationships between the input concepts and their activation values. In contrast, the outer block focuses on computing the output produced by the FCM model for a given input.
Fig. 8.1 Hybrid FCM model for a decision-making or prediction problem with three inputs ($x_1$, $x_2$, $x_3$) and two outputs ($y_1$, $y_2$). The inner block contains the input concepts, while the outer block contains the output concepts
Important

In some fields, the inputs of a system are considered entities that remain constant during reasoning. In our hybrid FCM model, input concepts can be influenced by other input concepts. Moreover, it is assumed that output concepts do not influence other concepts, regardless of whether these represent input or output variables.
Overall, this model is considered hybrid because it fuses both the expert knowledge and the patterns extracted from the historical data through a learning algorithm. The strategy in which we involve human experts in defining the parameters of machine learning models is called human-in-the-loop.

Another aspect that deserves attention is the reasoning rule used to perform inference and its convergence behavior. If the inner block converges to a unique fixed-point attractor, it will feed the outer block with the same state vector. This happens because an FCM model that converges to a unique fixed point will produce the same output regardless of the initial activation values used to start the reasoning process. Therefore, if this situation comes to light, the outer block will also produce the same values for any possible initial activation vector.

One approach to tackle this issue would be to stop the reasoning process when we have reasons to believe that the network is converging to a unique fixed-point attractor. However, this state must be detected after inspecting the network's outputs for a sizable batch of initial activation vectors, which might be computationally demanding. Another approach relies on the quasi-nonlinear reasoning rule introduced in [12], which can be deemed a generalization of the classic reasoning rule discussed in Chap. 1 (Eq. 1.1). The intuition of the quasi-nonlinear reasoning rule is that a concept's activation value in the current iteration is given by the weighted aggregation between the concept's activation value in the previous iteration and its initial activation value. In that way, the concepts' current states will directly depend on the initial conditions, thus preventing convergence to a unique fixed-point attractor. Eq. (8.1) formalizes this rule, which expresses that the concept's activation values in the $t$-th iteration depend on the initial conditions:

$$a_{ki}^{(t)} = \phi \cdot f_i\left(\sum_{j=1}^{N} w_{ji}\, a_{kj}^{(t-1)}\right) + (1-\phi) \cdot a_{ki}^{(0)}, \quad i \neq j, \qquad (8.1)$$
where $N$ is the number of concepts, $k$ denotes the index of the activation vector used to initialize the concepts, and $w_{ji}$ is the weight connecting $c_j$ with $c_i$. In the inner block, $w_{ji} \in W^I$; in the outer block, $w_{ji} \in W^O$. In this model, the nonlinearity coefficient $\phi \in [0, 1]$ determines what share of the current output is contributed by the previous state and the initial conditions. In this way, the signal may be strengthened by the initial activation values in the intermediate stages of model iterations. The linear component ensures that, in each iteration, we add a term resulting from scaling the
initial activation vector. Note that setting $\phi$ to 1.0 results in the classical reasoning rule, while setting it to 0.0 implies no recurrence. This quasi-nonlinear reasoning rule allows controlling the network convergence to a great extent. Nápoles et al. [11] proved that $0 < \phi < 1$ guarantees that the fixed-point attractor does not exist. Such a theorem holds for any continuous activation function and weight matrix, regardless of whether the weights between concepts are causal or considered regression-like coefficients. Setting $\phi = 1$ ensures that the model will converge to a unique fixed point provided that some analytical conditions related to the eigenvalues of $W$ are fulfilled. The reader is referred to [11] for further details about this reasoning rule and the related proofs.

Finally, the activation function $f_i(\cdot)$ in Eq. (8.1) is used to keep the activation value of the $i$-th concept in the activation interval. Note that this formalism allows having a custom activation function for each neural concept in the network. The practical usability of this feature will become apparent in the next sections when we present the weight normalization procedure and the network optimization methods. For the sake of convenience, let us adopt a simplified variant of the well-known generalized sigmoid function, which is formalized as follows:

$$f_i(x) = l_i + \frac{u_i - l_i}{1 + e^{-\lambda_i (x - h_i)}}, \qquad (8.2)$$
where $\lambda_i > 0$ and $h_i \in \mathbb{R}$ are parameters that define the shape of the sigmoid transfer function: $\lambda_i$ controls the slope and $h_i$ stands for the offset, while $l_i$ and $u_i$ are two parameters used to enforce domain-specific constraints over each neural concept's minimal and maximal activation value, respectively.
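To make the reasoning rule concrete, the snippet below is a minimal numpy sketch of one simulation using the quasi-nonlinear rule in Eq. (8.1) and the generalized sigmoid in Eq. (8.2). It is illustrative rather than the reference implementation of [14]; the function names are ours, and it assumes that self-connections (the diagonal of the weight matrix) are zero so that the condition $i \neq j$ holds.

```python
import numpy as np

def generalized_sigmoid(x, lam=1.0, h=0.0, l=0.0, u=1.0):
    # Simplified generalized sigmoid, Eq. (8.2)
    return l + (u - l) / (1.0 + np.exp(-lam * (x - h)))

def quasi_nonlinear_reasoning(W, a0, T=10, phi=0.5):
    # W[j, i] is the weight from concept c_j to c_i (zero diagonal assumed);
    # a0 is the initial activation vector of the k-th instance
    a = a0.copy()
    for _ in range(T):
        raw = a @ W  # raw activation: sum_j w_ji * a_j for every concept c_i
        a = phi * generalized_sigmoid(raw) + (1.0 - phi) * a0
    return a
```

Setting phi=1.0 recovers the classical rule, while phi=0.0 removes the recurrence entirely, as discussed above.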
8.3 Training the Hybrid FCM Model

In this section, we will discuss how to prepare the available historical data and formalize the learning task devoted to computing the weights that connect the inner and outer blocks in a supervised manner. Let us denote the training dataset $(X|Y)$ as the concatenation of two matrices. In this formalization, $X$ is a $K \times R$ matrix such that $K$ indicates the number of instances (also referred to as observations or examples). Each instance is described by $R$ input variables that take values in the $[0, 1]$ interval. More specifically, $X$ gathers the values used to activate the input neural concepts, which is required before performing the reasoning process. The input matrix is depicted below such that $x_{ki}$ gives the value of the $i$-th input variable according to the $k$-th instance:

$$X = \begin{pmatrix} x_{11} & \cdots & x_{1i} & \cdots & x_{1R} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{k1} & \cdots & x_{ki} & \cdots & x_{kR} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ x_{K1} & \cdots & x_{Ki} & \cdots & x_{KR} \end{pmatrix}$$
The expected (actual) values for the output concepts are gathered in a separate $K \times M$ matrix denoted as $Y$. In this matrix, $M$ gives the number of output variables, while $K$ is the number of instances in the training dataset, which must match for both matrices. The output matrix is depicted below such that $y_{ki}$ gives the value of the $i$-th output variable according to the $k$-th instance:

$$Y = \begin{pmatrix} y_{11} & \cdots & y_{1i} & \cdots & y_{1M} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ y_{k1} & \cdots & y_{ki} & \cdots & y_{kM} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ y_{K1} & \cdots & y_{Ki} & \cdots & y_{KM} \end{pmatrix}$$
The learning process of this FCM model narrows down to computing $W^O$ using the training dataset $[X, Y]$. In this regard, Nápoles et al. [13] developed a fast, deterministic, and accurate learning rule based on the Moore-Penrose inverse (described below). It solves the least squares problem, which consists of minimizing the sum of the squares of the differences between the expected outputs and the predicted ones. Note that, similarly to metaheuristic-based learning algorithms, this algorithm is not able to produce authentic causal relationships.

Background Information

In mathematics, the Moore-Penrose inverse $H^+$ of a matrix $H$ is a popular generalization of the inverse. It allows approximating the inverse of non-square matrices. The Moore-Penrose inverse $H^+$ satisfies the following conditions: (1) $H H^+ H = H$, (2) $H^+ H H^+ = H^+$, (3) $(H H^+)^T = H H^+$, and (4) $(H^+ H)^T = H^+ H$. The listed conditions are referred to as the Moore-Penrose conditions.
The first step toward computing the weights connecting the input and output neurons is to capture the system semantics according to the inputs and the weight matrix $W^I$. Thus, let $\Psi^{(T)}(X)$ denote the $K \times R$ matrix obtained after performing $T$ iterations of the FCM inference process on the input matrix $X$, that is:

$$\Psi^{(T)}(X) = \begin{pmatrix} a_{11}^{(T)} & \cdots & a_{1i}^{(T)} & \cdots & a_{1R}^{(T)} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ a_{k1}^{(T)} & \cdots & a_{ki}^{(T)} & \cdots & a_{kR}^{(T)} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ a_{K1}^{(T)} & \cdots & a_{Ki}^{(T)} & \cdots & a_{KR}^{(T)} \end{pmatrix}$$
Finally, the learning procedure shown in Eq. (8.3) computes $W^O$ without the need to perform multiple reasoning passes or iterations,

$$W^O = \left(\Psi^{(T)}(X)\right)^{\ddagger} F^{-}(Y), \qquad (8.3)$$
such that $(\cdot)^{\ddagger}$ stands for the Moore-Penrose (MP) inverse of a given matrix, and

$$F^{-}(Y) = \begin{pmatrix} f_1^{-}(y_{11}) & \cdots & f_i^{-}(y_{1i}) & \cdots & f_M^{-}(y_{1M}) \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ f_1^{-}(y_{k1}) & \cdots & f_i^{-}(y_{ki}) & \cdots & f_M^{-}(y_{kM}) \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ f_1^{-}(y_{K1}) & \cdots & f_i^{-}(y_{Ki}) & \cdots & f_M^{-}(y_{KM}) \end{pmatrix}$$

represents a $K \times M$ matrix containing the inverse of the transfer functions attached to the output concepts, which is given below:

$$f_i^{-1}(y) = h_i + \frac{1}{\lambda_i} \ln\left(\frac{y - l_i}{u_i - y}\right). \qquad (8.4)$$

Note that this expression is undefined when $\frac{y - l_i}{u_i - y} \le 0$ because the natural logarithm is undefined in the domain of real numbers when its argument is negative or zero. Moreover, setting $\lambda_i$ to zero will also make this expression undefined due to division by zero. To avoid both of these situations, we need to ensure that $y$ is within the $(l_i, u_i)$ interval and $\lambda_i$ is greater than zero.
Background Information

We can employ an orthogonal projection to obtain the MP inverse. If a matrix $H$ has linearly independent columns ($H^T H$ is nonsingular), then $H^{\ddagger} = (H^T H)^{-1} H^T$. In contrast, if $H$ has linearly independent rows ($H H^T$ is nonsingular), then $H^{\ddagger} = H^T (H H^T)^{-1}$. The former denotes a left inverse because $H^{\ddagger} H = I$, while the latter comprises a right inverse because $H H^{\ddagger} = I$.
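In practice, the whole learning step collapses to a couple of matrix operations. The sketch below is a minimal illustration (not the fcm-mp implementation itself) of Eqs. (8.3) and (8.4) using numpy's pseudo-inverse; the clipping constant eps is our own assumption to keep the targets strictly inside $(l_i, u_i)$.

```python
import numpy as np

def inverse_sigmoid(Y, lam=1.0, h=0.0, l=0.0, u=1.0, eps=1e-6):
    # Inverse of the generalized sigmoid, Eq. (8.4);
    # Y is clipped into (l, u) so the logarithm stays defined
    Y = np.clip(Y, l + eps, u - eps)
    return h + np.log((Y - l) / (u - Y)) / lam

def learn_outer_weights(Psi, Y):
    # One-shot computation of W^O via Eq. (8.3):
    # Psi is the K x R matrix of final input activations,
    # Y is the K x M matrix of expected outputs
    return np.linalg.pinv(Psi) @ inverse_sigmoid(Y)
```

Because np.linalg.pinv is deterministic, repeated runs on the same data yield the same weights, which is one of the advantages discussed later in this chapter.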
A side-effect of applying the MP inverse is that the produced weights are in the real domain. Thus, to maintain consistency with the FCM formalism, we supplement the relationship learning step with a normalization procedure that ensures that weights stay in the $[-1, 1]$ interval required by the FCM formalism. Let $y_{ki}$ denote the output generated by the $i$-th output concept for a given initial condition. This output value can be computed as follows:

$$y_{ki} = l_i + \frac{u_i - l_i}{1 + e^{\theta_{ki}^{(t+1)}}}, \qquad (8.5)$$

where

$$\theta_{ki}^{(t)} = -\lambda_i \left( w_{1i}\, a_{k1}^{(t)} + \cdots + w_{ji}\, a_{kj}^{(t)} + \cdots + w_{Ri}\, a_{kR}^{(t)} - h_i \right).$$
The normalization of the weight $w_{ji}$ to the $[-1, 1]$ interval is obtained through a two-step algebraic transformation. It consists of dividing $w_{ji}, \forall j$, and $h_i$ by $\phi_i = \max_j\{|w_{ji}|\}$ while simultaneously multiplying $\lambda_i$ by $\phi_i$ to compensate for the change. Such a transformation can be written as follows:

$$\theta_{ki}^{(t)} = -\lambda_i \phi_i \left( \frac{w_{1i}}{\phi_i}\, a_{k1}^{(t)} + \cdots + \frac{w_{ji}}{\phi_i}\, a_{kj}^{(t)} + \cdots + \frac{w_{Ri}}{\phi_i}\, a_{kR}^{(t)} - \frac{h_i}{\phi_i} \right).$$

If we make $w'_{ji} = \frac{w_{ji}}{\phi_i}$, $\lambda'_i = \lambda_i \phi_i$, and $h'_i = \frac{h_i}{\phi_i}$, then $\theta_{ki}^{(t)}$ is given by

$$\theta_{ki}^{(t)} = -\lambda'_i \left( w'_{1i}\, a_{k1}^{(t)} + \cdots + w'_{ji}\, a_{kj}^{(t)} + \cdots + w'_{Ri}\, a_{kR}^{(t)} - h'_i \right)$$

such that $w'_{ji} \in [-1, 1]$ as desired. Notice that such a normalization is possible when operating with a parametrized activation function that tolerates modifications to its parameters to compensate for the alterations to the network.
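The following snippet sketches this normalization for a single output concept; it is a direct transcription of the transformation above (our own helper, not part of any particular library), where w holds the incoming weights of the $i$-th output concept.

```python
import numpy as np

def normalize_column(w, lam, h):
    # Rescale the incoming weights of one output concept into [-1, 1],
    # compensating with the sigmoid parameters so the output is unchanged
    phi_i = np.max(np.abs(w))
    if phi_i <= 1.0:
        return w, lam, h          # already within [-1, 1]; nothing to do
    return w / phi_i, lam * phi_i, h / phi_i
```

Note that the compensation only works because the generalized sigmoid is parametrized; a fixed activation function would not tolerate this rescaling.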
Caution

It was already established that an FCM model using the quasi-nonlinear reasoning rule introduced in Eq. (8.1) will not converge to a unique fixed-point attractor when $0 < \phi < 1$, as mathematically proved in [11]. However, if we insist on using the traditional approach where $\phi = 1$, then the learning algorithm might fail due to an overflow in the calculations related to the sigmoid function. This scenario emerges, for example, when the FCM model converges to the same or nearly the same fixed-point attractor for all possible initial activation vectors used to perform reasoning. As a result, $\Psi^{(T)}(X)$ will have the same or nearly the same rows, which forces the algorithm to compute very large weights in an attempt to solve the problem. As an implementation trick, we could clip very large weights to prevent overflow situations at the expense of sacrificing performance.
8.4 Optimizing the Hybrid FCM Model

As mentioned earlier, a common downside of many FCM learning methods is that they focus only on minimizing a specific fitness function, which, depending on the particular data analysis task, helps produce more accurate predictions. The issue of model transparency is usually ignored. At the same time, FCMs should illustrate the relationships between their neural concepts. Thus, a natural requirement is that we want the models to be as understandable as possible. While reducing the number of neural concepts is sometimes a viable strategy, another approach is eliminating unimportant relationships from a trained model.
8.4.1 Detecting Superfluous Relationships

To obtain a procedure devoted to eliminating unimportant relationships from a trained model, one shall settle on a measure to detect which relationships could be removed from the network without largely harming its performance. The most straightforward solution, discussed, for instance, by Homenda et al. [6], is to remove relationships that have a small absolute value. This strategy, however, suffers from one conceptual drawback. It may turn out that weights with high absolute values get activated to a small degree, while weights with small absolute values are paired with high activation values. Thus, a better strategy when selecting candidate relationships for removal is to consider both the absolute weight value and the activation values paired with that weight during the training procedure. In this manner, we avoid removing potentially important relationships from the network.

The primary challenge of deriving a procedure to detect superfluous relationships is that the activation values paired with a given relationship will likely change from iteration to iteration and from one instance to another. This holds even if we fix the number of iterations. Nevertheless, we can formulate the probability for the weight $w_{ji}$ to be unimportant as depicted below:

$$p(w_{ji}) = 1 - |w_{ji}|\, E^{(t)}[c_j], \qquad (8.6)$$
such that $E^{(t)}[c_j]$ represents the expected activation value of the neural concept $c_j$ in the $t$-th iteration. The probability density of concept $c_j$ is not given in an analytic form since it is problem-dependent. Therefore, we shall estimate it, for instance, using the kernel density estimation method [17].
Background Information

A kernel density estimator is a non-parametric estimator designed to determine the density of the distribution of a random variable based on a finite data sample, i.e., a set of observations measured in a certain experiment. The random variable can be univariate or multivariate. Since it is a non-parametric method, no a priori information about the type of the distribution is required.
Since we do not know the real distribution of the concepts' activation values, we must use a generic probability density function. For the sake of simplicity, let us use a Gaussian kernel. Eq. (8.7) shows how to compute the expected activation value for the $j$-th neural concept from the available data:

$$E^{(t)}[c_j] = \sum_{k=1}^{K} a_{kj}^{(t)}\, d_j(a_{kj}^{(t)}), \qquad (8.7)$$
where $d_j(\cdot)$ is the density function obtained with the Gaussian kernel, which gives statistical information about the concept's activation values.

Background Information

The Gaussian probability density function of a normally distributed random variable with expected value $\mu$ and variance $\sigma^2$ is defined as

$$d(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{1}{2}\frac{(x - \mu)^2}{\sigma^2}\right). \qquad (8.8)$$

The $\sigma$ parameter determines the width of the Gaussian kernel. The term $\frac{1}{\sigma\sqrt{2\pi}}$ is the normalization constant, which is needed because the integral over the exponential function is not unity. A kernel defined this way is a normalized kernel, which means that its integral over the domain is unity for every $\sigma$.
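As an illustration, the sketch below estimates $E^{(t)}[c_j]$ with scipy's Gaussian kernel density estimator and plugs it into Eq. (8.6). The function names are ours; a_j is assumed to hold the $K$ activation values of concept $c_j$ in the $t$-th iteration.

```python
import numpy as np
from scipy.stats import gaussian_kde

def expected_activation(a_j):
    # E[c_j] from Eq. (8.7): weight each observed activation
    # by its Gaussian-kernel density estimate
    density = gaussian_kde(a_j)
    return float(np.sum(a_j * density(a_j)))

def unimportance(w_ji, a_j):
    # Probability that the weight w_ji is superfluous, Eq. (8.6)
    return 1.0 - abs(w_ji) * expected_activation(a_j)
```

Weights with the largest unimportance scores become the first candidates for removal, after which the model is recalibrated as described next.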
8.4.2 Calibrating the Sigmoid Offset

After dropping superfluous relationships, we should calibrate the FCM model by recomputing the retained weights or the sigmoid function parameters. The network calibration is needed to compensate for the removal of relationships that might have an effect, however small, on the model's outputs.
In the first method, we calibrate the offset value that controls the shifting of the sigmoid function along the x-axis. Let us focus on the weights in $W^O$, estimated from historical data, and assume that the $y_{ki}$ value is outputted by the $i$-th neural concept. Let us rewrite Eq. (8.5) as follows:

$$y_{ki} = l_i + \frac{u_i - l_i}{1 + e^{-\lambda_i \bar{a}_{ki}^{(t)}}}, \qquad (8.9)$$
where $\bar{a}_{ki}^{(t)}$ denotes the raw activation value of neuron $c_i$, just before applying the sigmoid function. In addition, we shall note that

$$\bar{a}_{ki}^{(t)} = w_{1i}\, a_{k1}^{(t)} + \cdots + w_{ji}\, a_{kj}^{(t)} + \cdots + w_{Ri}\, a_{kR}^{(t)} - h_i$$

is equivalent to

$$\bar{a}_{ki}^{(t)} = w_{1i}\, a_{k1}^{(t)} + \cdots + w_{Ri}\, a_{kR}^{(t)} - \left(h_i - w_{ji}\, a_{kj}^{(t)}\right).$$

By substituting $h_i$ with the exact value given by $(h_i - w_{ji}\, a_{kj}^{(t)})$, it becomes clear that the sigmoid offset depends on the activation value of the $j$-th neuron when initialized with the $k$-th training instance. While this derivation is correct, it certainly increases the complexity of the calibration method. Thus, we can simplify this formulation by approximating the term $a_{kj}^{(t)}$ with the expected value of the $j$-th neural concept for the available training data. As such, it holds that

$$\bar{a}_{ki}^{(t)} \approx w_{1i}\, a_{k1}^{(t)} + \cdots + w_{Ri}\, a_{kR}^{(t)} - \underbrace{\left(h_i - w_{ji}\, E^{(t)}[c_j]\right)}_{h'_i}.$$
Figure 8.2 shows the effect of calibrating the sigmoid function offset for a fixed slope value. In this example, we observe that removing a likely superfluous weight caused the offset value to increase from $h = 0$ to $h = 1$. The consequence of this increase is that the shape of the $i$-th sigmoid function changes.
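The offset update itself is a one-liner; the helper below (our own naming) simply absorbs the removed relationship into the offset, as derived above.

```python
def calibrate_offset(h_i, w_ji, E_cj):
    # New offset after removing w_ji: h_i' = h_i - w_ji * E[c_j],
    # where E_cj approximates the expected activation of concept c_j
    return h_i - w_ji * E_cj
```

Because the expected value replaces instance-specific activations, a small estimation error is introduced, which is quantified in the example of Sect. 8.5.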
8.4.3 Calibrating the Weights

The second calibration method focuses on adjusting the values of the weights retained in the simplified model (after having discarded the likely superfluous relationships). The intuition of this method consists in distributing the information of the removed relationships among those to be retained. To that end, let us write the formula describing the raw activation value of the $i$-th concept:

$$\bar{a}_{ki}^{(t)} = w_{1i}\, a_{k1}^{(t)} + \cdots + w_{ji}\, a_{kj}^{(t)} + \cdots + w_{Ri}\, a_{kR}^{(t)}.$$
Fig. 8.2 Offset parameter calibration after removing a likely superfluous relationship from the FCM model (sigmoid curves with $\lambda = 1, h = 0.0$ and $\lambda = 1, h = 2.5$)
We wish to distribute $w_{ji}\, a_{kj}^{(t)}$ among the remaining $R - 1$ relationships impacting the raw activation value of the $c_i$ concept. This can be done as follows:

$$\bar{a}_{ki}^{(t)} = \left(w_{1i}\, a_{k1}^{(t)} + \frac{w_{ji}\, a_{kj}^{(t)}}{R-1}\right) + \cdots + \left(w_{Ri}\, a_{kR}^{(t)} + \frac{w_{ji}\, a_{kj}^{(t)}}{R-1}\right).$$

Now, this previous formula is equivalent to

$$\bar{a}_{ki}^{(t)} = a_{k1}^{(t)} \left(w_{1i} + \frac{w_{ji}\, a_{kj}^{(t)}}{(R-1)\, a_{k1}^{(t)}}\right) + \cdots + a_{kR}^{(t)} \left(w_{Ri} + \frac{w_{ji}\, a_{kj}^{(t)}}{(R-1)\, a_{kR}^{(t)}}\right).$$

Since the activation values of concepts change with each instance, we replace them with their expected activation values at the expense of introducing some estimation errors. As a result, the new weights are obtained as follows:

$$\bar{a}_{ki}^{(t)} \approx a_{k1}^{(t)} \underbrace{\left(w_{1i} + \frac{\xi(w_{ji})}{E^{(t)}[c_1]}\right)}_{w'_{1i}} + \cdots + a_{kR}^{(t)} \underbrace{\left(w_{Ri} + \frac{\xi(w_{ji})}{E^{(t)}[c_R]}\right)}_{w'_{Ri}},$$

such that

$$\xi(w_{ji}) = \frac{w_{ji}\, E^{(t)}[c_j]}{R-1}.$$

Notice that this calibration procedure calculates the weights based on the computation of $R - 1$ neural concepts' expected values, which might increase the errors in the network's calibration process.
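The redistribution can be sketched in a few lines of numpy; again, this is our own illustrative helper rather than the library routine, and it assumes that all expected activation values in E are nonzero.

```python
import numpy as np

def calibrate_weights(w, j, E):
    # Distribute the removed weight w[j] among the remaining R-1
    # relationships feeding the same output concept (Sect. 8.4.3)
    R = len(w)
    xi = w[j] * E[j] / (R - 1)    # xi(w_ji) from the text
    w_new = w + xi / E            # w'_ri = w_ri + xi / E[c_r]
    return np.delete(w_new, j)    # drop the removed relationship
```

Compared with the offset calibration, this method touches $R - 1$ parameters instead of one, so its estimation errors can accumulate, as noted above.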
Table 8.1 Data concerning a multi-output decision-making problem

Instance   x1     x2     x3     y1     y2
1          0.37   0.95   0.73   0.35   0.47
2          0.60   0.16   0.16   0.37   0.43
3          0.06   0.87   0.60   0.42   0.50
4          0.71   0.02   0.97   0.26   0.48
5          0.83   0.21   0.18   0.33   0.40

(x1, x2, x3 are the inputs; y1, y2 are the outputs.)
8.5 How to Use These Algorithms in Practice?

In this section, we will illustrate how to use these algorithms using a publicly available Python implementation and a toy example concerning decision-making. To install the required package, we can just call the command pip install fcm-mp from the command line or a Jupyter Notebook.

Let us assume that we want to solve a decision-making problem involving three input variables ($x_1$, $x_2$ and $x_3$), two output variables ($y_1$ and $y_2$) and five problem instances. Table 8.1 shows the dataset where variables take values in the $(0, 1)$ interval, which can be seen as a single matrix $(X|Y)$. Such a decision-making problem requires solving a multi-output prediction problem that estimates the values of the output variables from the input ones for each instance.

The first step towards solving this problem using the FCM model introduced in this chapter is to split the dataset into two separate matrices $X$ and $Y$ and express them as numpy matrices as shown in the code snippet below.

```python
import numpy as np

# This matrix contains the data concerning the input variables
X = np.array([[0.37, 0.95, 0.73],
              [0.60, 0.16, 0.16],
              [0.06, 0.87, 0.60],
              [0.71, 0.02, 0.97],
              [0.83, 0.21, 0.18]])

# This matrix contains the data concerning the output variables
Y = np.array([[0.35, 0.47],
              [0.37, 0.43],
              [0.42, 0.50],
              [0.26, 0.48],
              [0.33, 0.40]])
```
The next step consists of defining a weight matrix $W^I$ characterizing the interaction between the input variables. Ideally, this matrix should be provided by human experts during a knowledge engineering process. To develop our example, we will use an arbitrary weight matrix defined below.

```python
# It characterizes the relationships between input variables
Wi = np.array([[ 0.00, -1.00, -0.27],
               [-0.50,  0.00,  0.15],
               [-0.20,  0.23,  0.00]])
```
Now, we are ready to build the FCM model. Besides the weight matrix defining the interaction between the input variables, we can specify the number of iterations $T$ to be performed during reasoning, the nonlinearity coefficient $\phi$ in Eq. (8.1), and the initial slope $\lambda$ and offset $h$ of the sigmoid function in Eq. (8.2). Note that all concepts will use the same sigmoid function parameters at first, but they will be modified by the normalization and calibration procedures.

In practical terms, building the FCM model means (i) performing the reasoning process $T$ times using each row of $X$ as an initial activation vector together with the $W^I$ matrix, (ii) solving the least squares error problem to compute $W^O$, and (iii) normalizing the computed weights to express them in the $[-1, 1]$ interval. Both the object initialization with the desired parameters and the model construction can be done using the following piece of code:

```python
from fcm.FCM_MP import FCM_MP

# We first define parameters and then build the model
model = FCM_MP(T=10, phi=0.5, slope=1.0, offset=0.0)
model.fit(X, Y)
```
The normalized weight matrix and the corrected sigmoid function parameters can be retrieved using the attributes model.W, model.slope and model.offset, respectively. In our example, the corrected slope is 1.0 and the corrected offset is 0.0, suggesting that the weights produced by the algorithm were already in the $[-1, 1]$ interval. The weight matrix is given as follows:

$$W^O = \begin{pmatrix} -0.83 & -0.75 \\ 0.5 & -0.26 \\ -0.91 & 0.56 \end{pmatrix}.$$

Finally, we can contrast the predictions made by the model with the ground truth. To obtain the predictions for the training data $X$, we can call the model.predict(X) function, which results in the following matrix:

$$\hat{Y} = \begin{pmatrix} 0.37 & 0.47 \\ 0.36 & 0.43 \\ 0.41 & 0.49 \\ 0.26 & 0.48 \\ 0.34 & 0.41 \end{pmatrix}.$$
As we can see, the predictions computed by the FCM model are reasonably close to the ground truth $Y$. If we want to quantify how the predictions differ from the ground truth, we can compute a performance metric for regression problems such as the Root Mean Square Error (RMSE), as shown below.

```python
Y_hat = model.predict(X)  # predictions for the training data

rmse = np.sqrt(np.mean((Y - Y_hat)**2))
print(np.round(rmse, 4))
# RMSE = 0.0088
```
Let us inspect the probabilities of weights being superfluous. We are interested in analyzing the weights connecting the inner block with the outer one since weights within the inner block are normally given by experts. As explained, these probabilities combine the expected values of neural concepts and the absolute values of learned weights. This can be done using the following snippet:

```python
prob_matrix = np.round(model.weight_probabilities(), 2)
print(prob_matrix)
```

The call to the model.weight_probabilities() function gives us the matrix depicted below, which suggests that $w_{11} = -0.26$ is the weakest weight in the matrix connecting the inner block with the outer one. In this example, the weight with the largest probability coincides with the weight having the smallest absolute value since the concepts' expected values are similar.

$$Pr = \begin{pmatrix} 0.57 & 0.62 \\ 0.78 & 0.89 \\ 0.52 & 0.70 \end{pmatrix}$$

By calling model.remove_and_calibrate(i=1, j=1), we can remove the desired weight $w_{11} = -0.26$ from the network and calibrate the sigmoid function parameters to reduce the effect of such a modification. The function returns the updated offset value, which is $h = 1.11$ in our example. To inspect the effect of suppressing this weight on the model's predictions, we can call the model.predict(X) function again, which produces the following output:

$$\hat{Y} = \begin{pmatrix} 0.37 & 0.52 \\ 0.36 & 0.44 \\ 0.41 & 0.54 \\ 0.26 & 0.49 \\ 0.34 & 0.42 \end{pmatrix}$$

It can be observed that the predictions worsened slightly, resulting in an RMSE value equal to 0.022 despite the calibration. This happens because the expected value might differ notably from the concept's values in each instance. Actually, if we replace the expected values with the true concepts' values in each case, there will be no induced error. Likewise, weights very close to zero will report negligible induced errors after being removed from the model.
8.6 Applying the FCM Model to Real-World Data

In this section, we conduct some empirical studies devoted to the efficacy and efficiency of the algorithms discussed in this chapter. With this goal in mind, we resorted to 35 publicly available datasets described by numerical features. The properties of these datasets are listed in Table 8.2. It should be mentioned that these datasets are typically used to test classification algorithms. To transform them into decision-making problems, we removed their class labels and arbitrarily split their features into input and output variables. Furthermore, all feature values were min-max normalized to maintain coherence with the FCM formalism.

For each of these datasets, we built an FCM model with $R = N - 1$ input concepts and $M = 1$ output concept. For consistency in the experiment, the output concept is assumed to be the last feature in the dataset being processed. Since knowledge to be provided by human experts is not available for these datasets, we computed $W^I$ as the coefficients of a regression model with the form $a_i^{(0)} = w_{ji}\, a_j^{(0)} + b$, where the bias term is zeroed for convenience. This is done as follows:

$$w_{ji} = \frac{K \sum_k x_{ki}\, x_{kj} - \sum_k x_{ki} \sum_k x_{kj}}{K \left(\sum_k x_{kj}^2\right) - \left(\sum_k x_{kj}\right)^2}, \quad w_{ji} \in W^I, \qquad (8.10)$$

where $x_{ki}$ is the value of the $i$-th feature according to the $k$-th instance, whereas $K$ is the number of instances in the training set.

In order to measure the quality of the predictions, we compute the Mean Squared Error (MSE) between the expected outputs and those produced by the FCM model for a given instance. This measure is defined as follows:

$$MSE = \frac{1}{K \cdot M} \sum_{k=1}^{K} \sum_{i=1}^{M} \left(y_{ki} - \hat{y}_{ki}\right)^2, \qquad (8.11)$$

where $y_{ki}$ is the ground-truth value for the $i$-th output concept according to the $k$-th instance, while $\hat{y}_{ki}$ is the predicted output. To obtain more consistent results in our numerical simulations, we report the average MSE after performing 10-fold cross-validation.

Background Information

The $k$-fold cross-validation method is a validation technique in statistics and machine learning that relies on partitioning a sample of data into complementary subsets. In this procedure, the available data is split into $k$ smaller sets, which are referred to as folds. Then, a model is created using the training data, which is composed of $k - 1$ folds. The trained model is tested on the remaining part of the data. The procedure is repeated $k$ times so that each fold is used for testing purposes. Finally, we average the outcomes of the performance measure computed on the test set in each repetition to estimate the model's predictive performance.
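The following sketch shows how these two pieces could be implemented; it is our own illustration (using scikit-learn's KFold and assuming the FCM_MP API from Sect. 8.5), not the code used to generate the reported results.

```python
import numpy as np
from sklearn.model_selection import KFold
from fcm.FCM_MP import FCM_MP

def regression_weight(x_i, x_j):
    # Simple-regression slope from Eq. (8.10), used to fill W^I
    K = len(x_i)
    num = K * np.sum(x_i * x_j) - np.sum(x_i) * np.sum(x_j)
    den = K * np.sum(x_j ** 2) - np.sum(x_j) ** 2
    return num / den

def mse(Y, Y_hat):
    # Mean Squared Error over all instances and outputs, Eq. (8.11)
    return np.mean((Y - Y_hat) ** 2)

def cross_validated_mse(X, Y, n_splits=10):
    # Average test MSE over a 10-fold cross-validation
    scores = []
    for train, test in KFold(n_splits=n_splits, shuffle=True).split(X):
        model = FCM_MP(T=10, phi=0.5, slope=1.0, offset=0.0)
        model.fit(X[train], Y[train])
        scores.append(mse(Y[test], model.predict(X[test])))
    return np.mean(scores)
```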
Table 8.2 Datasets used during the numerical simulations

ID  Dataset             Instances  Attributes  Noisy
1   acute-inflammation  120        6           No
2   acute-nephritis     120        6           No
3   appendicitis        106        7           No
4   balance-noise       625        4           Yes
5   balance-scale       625        4           No
6   blood               748        4           No
7   echocardiogram      131        11          No
8   ecoli               336        7           No
9   glass               214        9           No
10  glass-10an-nn       214        9           Yes
11  glass-20an-nn       214        9           Yes
12  glass-5an-nn        214        9           Yes
13  haberman            306        3           No
14  hayes-roth          160        4           No
15  heart-5an-nn        270        13          Yes
16  heart-statlog       270        13          No
17  iris                150        4           No
18  iris-10an-nn        150        4           Yes
19  iris-20an-nn        150        4           Yes
20  iris-5an-nn         150        4           Yes
21  liver-disorders     345        6           No
22  monk-2              432        6           No
23  new-thyroid         215        5           No
24  parkinsons          195        22          No
25  pima                768        8           No
26  pima-10an-nn        768        8           Yes
27  pima-20an-nn        768        8           Yes
28  pima-5an-nn         768        8           Yes
29  planning            182        12          No
30  saheart             462        9           No
31  tae                 151        5           No
32  vertebral2          310        6           No
33  vertebral3          310        6           No
34  wine                178        13          No
35  wine-5an-nn         178        13          Yes
Fig. 8.3 Average MSE for different values of the sigmoid function parameters. The plot concerns the 35 datasets involved in the experiment
8.6.1 Sensitivity to the Sigmoid Function Parameters

Although the MP-based learning algorithm does not involve hyper-parameters, the hybrid FCM model uses a parametrized activation function. The sigmoid function's slope and offset control its shape and often play a pivotal role in the model's behavior. Therefore, it makes sense to study the impact of these parameters on the predictions produced by the model for the selected datasets. Aiming at performing such a sensitivity analysis, we train the FCM model on each dataset while varying the values of the $\lambda$ and $h$ parameters. For simplicity, it was assumed that the same parameter values are paired with each neural concept.

Figure 8.3 visualizes the average MSE over all datasets involved in the experiment for various combinations of the $\lambda$ and $h$ parameters. The sensitivity analysis results show that the smallest prediction errors are obtained when $\lambda = h = 1$. Conversely, setting $\lambda$ to 5 and $h$ to 0.8 resulted in the largest prediction errors overall. Such behavior matches our expectations since the sigmoid function will likely behave like a binary step function when the parameters are large. Thus, the resulting model fails to reproduce small changes in the data in successive iterations of the simulation process.
8.6.2 Comparison with Other Learning Approaches

The following experiment compares the deterministic MP-based learning rule with classical metaheuristic optimizers. We have employed popular evolutionary and population-based optimization methods to produce FCM models and compared their average MSE. We have tested the following algorithms: Global-best Particle Swarm
Fig. 8.4 Average MSE of the FCM model using different optimization methods (lower values are preferred)
Optimization (PSO) [4] with $c_1 = c_2 = 2.05$ and a constriction coefficient equal to 0.72984, Real-Coded Genetic Algorithms (RCGA) [9] with crossover and mutation probabilities equal to 0.95 and 0.05, respectively, and Differential Evolution (DE) [21] with a crossover probability equal to 0.9 and a differential weight equal to 2.0. The number of individuals in each population was set to 200. Also, the maximum number of allowed iterations was set to 200, but we terminated the optimization procedure if 20 consecutive iterations did not significantly reduce the fitness function value. Each experiment was repeated 10 times to mitigate the possible influence of randomness in the optimization process of these algorithms.

Figure 8.4 visualizes the test MSE values averaged over the 35 datasets. The simulation results using the real-world datasets show that the MP-based learning rule achieved smaller prediction errors compared to the metaheuristic-based optimization strategies. The errors obtained with the FCM models trained with the metaheuristic optimizers are quite similar, with DE and PSO reporting the best and worst results within this group, respectively.
Tips

Let us assume we repeat an experiment on various datasets using different machine learning algorithms. We quantify the algorithms' performance using some commonly accepted measures. Even though we obtain different numeric values of these statistics for the various models, we still have to validate whether these differences are statistically significant. This can be done using the three-step methodology below (a code sketch follows the list).
1. First, we use Friedman's test, which is a non-parametric statistical test used to compare multiple related samples (i.e., the performance of our algorithms on the datasets). The null hypothesis $H_0$ is that the algorithms result in negligible differences in prediction performance. The alternative hypothesis $H_1$ is that the algorithms result in significantly different processing qualities.

2. Next, we use the Wilcoxon signed-rank test to conduct a paired analysis of the algorithms' performance. The null hypothesis $H_0$ states that there are no significant differences in the algorithms' performance across datasets, while the alternative hypothesis $H_1$ reports significant differences. By analyzing the ranked differences for all relevant pairs, the test determines whether there is evidence to reject the null hypothesis and concludes that certain algorithms perform significantly better or worse than others.

3. Finally, we use the Bonferroni-Holm post-hoc procedure to adjust the $p$-values produced by the Wilcoxon signed-rank test. It aims to control the family-wise error rate when performing a pairwise analysis by adjusting the $p$-values obtained in each comparison. The adjusted $p$-values are compared to the desired significance level to determine statistically significant results, reducing the likelihood of making false positive errors across multiple tests.
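The sketch below illustrates this three-step methodology with scipy; it is a simplified illustration (the Holm adjustment here omits the usual monotonicity enforcement for brevity), and the dictionary layout of the inputs is our own assumption.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

def compare_algorithms(errors, control="MP", alpha=0.05):
    # errors maps an algorithm name to its per-dataset error values
    # Step 1: Friedman test over all algorithms
    _, p_friedman = friedmanchisquare(*errors.values())
    print(f"Friedman p-value: {p_friedman:.2e}")

    # Step 2: Wilcoxon signed-rank tests against the control method
    others = [name for name in errors if name != control]
    p_values = [wilcoxon(errors[control], errors[name]).pvalue
                for name in others]

    # Step 3: Holm-Bonferroni correction of the pairwise p-values
    m = len(p_values)
    for rank, idx in enumerate(np.argsort(p_values)):
        adjusted = min(1.0, (m - rank) * p_values[idx])
        verdict = "rejected" if adjusted < alpha else "not rejected"
        print(f"{others[idx]}: adjusted p = {adjusted:.2e} ({verdict})")
```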
To determine whether the observed differences in performance are statistically significant, we will use the Friedman two-way analysis of variance. The $p$-value $= 1.72\mathrm{E}{-11} < 0.05$ suggests rejecting the null hypothesis $H_0$ for a confidence level of 95%. This result can be interpreted as an indicator of statistically significant differences between at least two learning algorithms in the tested group. However, we need a pairwise analysis to contrast the MP-based learning rule against the metaheuristic-based learning algorithms. Table 8.3 displays the $p$-value reported by the Wilcoxon signed-rank test for each pairwise comparison involving the MP learning rule, the negative ($R^-$) and positive ($R^+$) ranks, the corrected $p$-values as computed by Holm's post-hoc method, and whether the null hypothesis is rejected or not for a 95% confidence level. Note that the Holm-Bonferroni method is used to control the family-wise error rate resulting from multiple comparisons.

The $p$-values from Holm's post-hoc method suggest rejecting the null hypotheses in all cases. This result confirms that the differences between the MP-based learning rule and the metaheuristic-based methods are statistically significant. In addition,
Table 8.3 Pairwise analysis using MP as the control method

Algorithm  p-value   R−   R+   Holm      Hypothesis
RCGA       2.16E-8   32   3    2.19E-8   Rejected
PSO        2.65E-8   32   3    2.68E-8   Rejected
DE         3.53E-6   33   2    3.27E-6   Rejected
Fig. 8.5 Average training time (in seconds) required by each optimizer algorithm to build the FCM models (lower values are preferred)
such differences translate into the control method producing smaller prediction errors since this algorithm reports fewer negative ranks.

Other attractive features of the MP-based learning rule discussed in this chapter are its deterministic nature, the absence of sensitive hyper-parameters to be fine-tuned, and its computational efficiency. The latter property is illustrated in Fig. 8.5, which reports the training time (in seconds) of each learning algorithm averaged over the 35 datasets used for simulation. The experiments were run on a MacBook Pro with a 1.4 GHz Quad-Core Intel Core i5 processor. The results show that the average training time of the MP-based learning rule is close to zero since it was able to produce the weights in milliseconds. In contrast, the metaheuristic-based optimizers needed much more time to solve the same optimization problem, with RCGA and PSO being the fastest and DE being the slowest. Admittedly, these increased training times might result from different computational implementations or even the number of processes running on the computer at the time of the experiments. However, the fact that these population-based metaheuristics rely on multiple agents and need to perform multiple iterations explains their poor scalability and computational efficiency.
8.7 Further Readings

Sometimes, we have an initial weight matrix characterizing the causal relationships in an FCM model, and we desire to adapt these weights to a given initial activation vector. To accomplish that, an ample group of researchers resorts to Hebbian learning, which is a generalization of Hebb's rule defined as $\Delta w_i = \eta x_i y$, where $\Delta w_i$ is the
change of the $i$-th weight, $\eta$ is the learning rate, $x_i$ is the $i$-th input, while $y$ is the output. When using Hebbian learning, we are given a predefined weight matrix, which is incrementally adapted by optimizing an objective function. Stula et al. [22], Papageorgiou et al. [15], Salmeron and Palos-Sanchez [18], and Stach et al. [20] are among the authors who studied methods that can be generally described as extensions of the Hebbian learning strategy for FCM models.

While Hebbian learning might seem well-suited for refining FCM models resulting from participatory modeling, we strongly discourage practitioners from using Hebbian-based algorithms for learning FCM models. On the one hand, these algorithms often rely on a single instance to adapt the weights, thus making them unable to generalize beyond that sample. The generalization issues of these algorithms were exposed by Papakostas et al. [16] in a pattern classification context. On the other hand, Salmeron and Palos-Sanchez [18] conducted an interesting study showing that FCM models trained using Hebbian learning tend to increase uncertainty. In practice, these algorithms produce FCM models that fail to converge, even though convergence is often a desired feature in decision-making settings. The reader is referred to [16, 18] to dive into the limitations of Hebbian-based algorithms.

Several metaheuristic-based learning algorithms have been developed as an alternative to Hebbian learning in recent years. They address sensible limitations of classic population-based approaches such as their poor scalability, strong dependence on the parametric setting, and slow training. For example, a challenging issue of these algorithms is that they produce dense FCM models when trained on large-scale data. Some application areas of such a learning scheme include gene regulatory network reconstruction and community modeling. Wu et al. [24, 25] elaborated on novel learning methods able to produce sparse weight matrices. In the extreme scenario, we may also need to employ an online learning scheme, as advocated by Wu et al. [26], to handle data so large that it does not fit in computer memory.

An expert-knowledge-preserving FCM training system can alternatively be framed as a fuzzy decision-making system. Such an approach instantiated with the DEMATEL (Decision-Making Trial & Evaluation Laboratory) methodology was brought forward by Mazzuto et al. [8], while Hajek and Froelich [5] integrated TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) to learn FCMs. A simplified variant of such an approach can be based on a collection of fuzzy rules, as presented by Amirkhani et al. [2] or Lee et al. [7]. These approaches utilize reasoning mechanisms to train a model, which, on the one hand, increases transparency and allows explicit external knowledge formulation but, on the other hand, limits the practical capabilities of these models.
8.8 Exercises

The following exercises are based on the example presented in Sect. 8.5 and involve modifying the algorithm's parametric settings.
1. Build a new FCM model where the nonlinearity coefficient is $\phi = 1.0$ and refit the model on the training data. Why do the results worsen?

2. Build and refit the FCM model again with $\phi = 1.0$ and $T = 100$. Why can we affirm that the resulting model is not useful for prediction?

3. Change the nonlinearity coefficient from 0 to 1 with a step of 0.1 and compute the prediction error for each configuration. Which coefficient value reported the lowest prediction error in your experiments?
References

1. S. Ahmadi, N. Forouzideh, S. Alizadeh, E.I. Papageorgiou, Learning fuzzy cognitive maps using imperialist competitive algorithm. Neural Comput. Appl. 26(6), 1333–1354 (2015)
2. A. Amirkhani, M.R. Mosavi, K. Mohammadi, E.I. Papageorgiou, A novel hybrid method based on fuzzy cognitive maps and fuzzy clustering algorithms for grading celiac disease. Neural Comput. Appl. 30(5), 1573–1588 (2018)
3. Y. Chi, J. Liu, Learning of fuzzy cognitive maps with varying densities using a multiobjective evolutionary algorithm. IEEE Trans. Fuzzy Syst. 24(1), 71–81 (2016)
4. M. Clerc, J. Kennedy, The particle swarm - explosion, stability, and convergence in a multidimensional complex space. IEEE Trans. Evolut. Comput. 6, 58–73 (2002)
5. P. Hajek, W. Froelich, Integrating TOPSIS with interval-valued intuitionistic fuzzy cognitive maps for effective group decision making. Inf. Sci. 485, 394–412 (2019)
6. W. Homenda, A. Jastrze˛bska, W. Pedrycz, Time series modeling with fuzzy cognitive maps: simplification strategies, in Computer Information Systems and Industrial Management, ed. by K. Saeed, V. Snášel (Springer, Berlin, 2014), pp. 409–420
7. S. Lee, S.-U. Cheon, J. Yang, Development of a fuzzy rule-based decision-making system for evaluating the lifetime of a rubber fender. Qual. Reliab. Eng. Int. 31(5), 811–828 (2015)
8. G. Mazzuto, C. Stylios, M. Bevilacqua, Hybrid decision support system based on DEMATEL and fuzzy cognitive maps. IFAC-PapersOnLine 51(11), 1636–1642 (2018). 16th IFAC Symposium on Information Control Problems in Manufacturing INCOM 2018
9. Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs (Springer Science & Business Media, 2013)
10. S. Mkhitaryan, P.J. Giabbanelli, N.K. de Vries, R. Crutzen, Dealing with complexity: how to use a hybrid approach to incorporate complexity in health behavior interventions. Intell.-Based Med. 3–4, 100008 (2020)
11. G. Nápoles, I. Grau, L. Concepción, L. Koutsoviti Koumeri, J.P. Papa, Modeling implicit bias with fuzzy cognitive maps. Neurocomputing 481, 33–45 (2022)
12. G. Nápoles, Y. Salgueiro, I. Grau, M.L. Espinosa, Recurrence-aware long-term cognitive network for explainable pattern classification. IEEE Trans. Cybern. 1–12 (2022)
13. G. Nápoles, J.L. Salmeron, W. Froelich et al., Fuzzy cognitive modeling: theoretical and practical considerations, in Intelligent Decision Technologies 2019, ed. by I. Czarnowski, R.J. Howlett, L.C. Jain (Springer, 2020), pp. 77–87
14. G. Nápoles, A. Jastrze˛bska, C. Mosquera, K. Vanhoof, W. Homenda, Deterministic learning of hybrid fuzzy cognitive maps and network reduction approaches. Neural Netw. 124, 258–268 (2020)
15. E.I. Papageorgiou, C. Stylios, P. Groumpos, Active Hebbian learning algorithm to train fuzzy cognitive maps. Int. J. Approx. Reas. 37(3), 219–249 (2004)
16. G.A. Papakostas, D.E. Koulouriotis, A.S. Polydoros, V.D. Tourassis, Towards Hebbian learning of fuzzy cognitive maps in pattern classification problems. Expert Syst. Appl. 39(12), 10620–10629 (2012)
17. M. Rosenblatt, Remarks on some nonparametric estimates of a density function. Ann. Math. Stat. 27(3), 832–837 (1956)
18. J.L. Salmeron, P.R. Palos-Sanchez, Uncertainty propagation in fuzzy grey cognitive maps with Hebbian-like learning algorithms. IEEE Trans. Cybern. 49(1), 211–220 (2019)
19. J.L. Salmeron, W. Froelich, Dynamic optimization of fuzzy cognitive maps for time series forecasting. Knowl.-Based Syst. 105, 29–37 (2016)
20. W. Stach, L. Kurgan, W. Pedrycz, Data-driven nonlinear Hebbian learning method for fuzzy cognitive maps, in 2008 IEEE International Conference on Fuzzy Systems (IEEE World Congress on Computational Intelligence) (2008), pp. 1975–1981
21. R. Storn, K. Price, Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. J. Global Optim. 11(4), 341–359 (1997)
22. M. Stula, J. Maras, S. Mladenovic, Continuously self-adjusting fuzzy cognitive map with semi-autonomous concepts. Neurocomputing 232, 34–51 (2017). Advances in Fuzzy Cognitive Maps Theory
23. K. Wu, J. Liu, Robust learning of large-scale fuzzy cognitive maps via the lasso from noisy time series. Knowl.-Based Syst. 113, 23–38 (2016)
24. K. Wu, J. Liu, Learning large-scale fuzzy cognitive maps based on compressed sensing and application in reconstructing gene regulatory networks. IEEE Trans. Fuzzy Syst. 25(6), 1546–1560 (2017)
25. K. Wu, J. Liu, X. Hao, P. Liu, F. Shen, An evolutionary multiobjective framework for complex network reconstruction using community structure. IEEE Trans. Evol. Comput. 25(2), 247–261 (2021)
26. K. Wu, J. Liu, X. Hao, P. Liu, F. Shen, Online fuzzy cognitive map learning. IEEE Trans. Fuzzy Syst. 29(7), 1885–1898 (2021)
Chapter 9
Introduction to Fuzzy Cognitive Map-Based Classification

Agnieszka Jastrze˛bska and Gonzalo Nápoles
Abstract In this chapter, we elaborate on the construction of an FCM-based classifier for tabular data classification. The pipeline comprises exploratory data analysis, preliminary input processing, classification mechanism construction, and quality evaluation. The specifics of how to adapt an FCM to this task are discussed. We use a two-block FCM architecture: one block is specific to the input, and the second is used for class label generation. We have as many inputs as features and as many outputs as classes, such that the weights are learned using Genetic Algorithms. The procedure is illustrated with a case study where we process a dataset named "wine". The overall quality of a basic FCM-based classifier is shown, and the behavior of feature-related activation values is studied. The chapter contains complete Python code for the elementary FCM-based classifier. The reader may conveniently follow and replicate the discussed experiment. Therefore, this chapter is specifically dedicated to those who wish to get well-acquainted with the elementary FCM-based classification model. The secondary goal of this chapter is to introduce notions essential to tabular data classification. These notions are utilized in the next chapters devoted to more advanced data classification models.
9.1 Introduction

Pattern classification aims to build a model that can automatically assign new observations to predefined categories based on their characteristics or features. This is typically done using statistical or machine learning techniques, where the classifier learns to identify patterns or relationships between the input features and the output
categories from a training set of labeled examples. The ultimate goal is to achieve accurate and reliable classification performance on unseen data. This generalization property of a classifier indicates the usability of the model derived from the training data. For the sake of consistency in the terminology used in this chapter, we will refer to observations as instances, to the variables describing these observations as features, and to the predefined categories as decision classes or labels. Example classification problems include:

• recognizing if a person is healthy or ill (two classes: healthy and ill),
• recognizing handwritten digits (ten classes, one for each digit),
• distinguishing between fraudulent and legitimate behavior of a subject (two classes: legitimate and fraud),
• recognizing phonemes in some recordings in English (44 classes, because there are 44 English phonemes).

Pattern recognition is a supervised machine learning task since the classification model is derived from a set of input-output instances that serve as learning examples.1 The traditional scenario assumes that each problem instance is associated with a single decision class. Furthermore, the decision classes are assumed to remain unchanged in the entire processing pipeline. The model that performs the classification task is referred to as a classifier. These classifiers are built using a sample of annotated data called training data. Annotations may be attached manually or automatically. There are vast differences in the formats of training data and annotations since classification is a much-needed task for various real-world problems. For example:

• In the case of a classifier that aids a medical diagnosis, the training set could be automatically generated based on electronic medical records. Features could include measurements (e.g., blood pressure) and past diagnoses. The class label could indicate whether the patient was discharged early ('Yes') or not ('No'). The classifier would then be used to predict whether a new patient would be suitable for an early discharge based on the observed measurements.
• In the case of a training set for handwritten digit recognition, it would contain images with cut-out handwritten digits. A human input would be needed to create a dictionary, which would link each image with a class label.

A pivotal component in the classification task is the nature of the data used to build the models. We may name four essential categories of data types:

• tabular data—data that can be represented using the structured format of a table, in which each row corresponds to a single instance (one object) and each column corresponds to one descriptor (feature) of an instance,
1 This contrasts with other pattern recognition tasks such as clustering, which are unsupervised because the task is performed solely by looking at the features, in the absence of labels.
• visual data—data that comes in the form of an image denoted as a matrix with pixels stored using a selected color representation scheme,
• audio data—data that comes in the form of an audio recording; the file is saved using some audio format suitable to represent a sound wave,
• sequential data—data that comes in the form of a sequence of signals; it can be a sequence of numbers, symbols, video frames, observations, etc.

In this chapter, we will elaborate on the issues related to building FCM-based classifiers from historical data. The chapter exemplifies, step by step, a classifier construction procedure in which we address all essential elements of a pattern classification pipeline. Theoretical analyses are supported with a toy case study of data classification using an FCM-based classifier.
9.2 Preliminaries

This section presents preliminary notions relevant to the domain of pattern classification. We pay particular attention to the potential issues that may arise when we move from theory to practice in classifier training.
9.2.1 Notions of Classification and Features

In this chapter, we focus on tabular data where problem instances are described by features. Let us denote features as $x_1, x_2, \ldots, x_N$, where $N$ denotes the number of problem variables or features. The domain of $x$ is denoted as $Dom(x)$ and determines what values the feature can assume. We may distinguish two primary families of features: qualitative (categorical) and quantitative (numerical), as shown in Fig. 9.1. Qualitative features can be further divided into nominal and ordinal. Nominal features do not have any natural order. For example, blood type (A, B, AB, O) does not have
Fig. 9.1 Taxonomy of features: categorical (qualitative) features are nominal (e.g., blood type) or ordinal (e.g., clothing size), while numerical (quantitative) features are discrete (e.g., number of gears) or continuous (e.g., temperature)
any natural order. Ordinal features induce a ranking among the values. For example, clothing sizes (S, M, L) can easily be ranked. Quantitative features can be either continuous or discrete. An example of a discrete feature is the number of gears in a car. An example of a continuous feature is temperature.

Therefore, let $\mathcal{X} = Dom(x_1, x_2, \ldots, x_N)$ be the feature space and $\mathcal{Y} = Dom(y)$ be the label space, since we can see the decision feature as a special, distinguished feature that is always categorical in classification problems. If we are concerned with a two-class problem, we say that the problem is a binary classification problem. If we have more than two classes, we say that it is a multi-class classification problem. We also distinguish the case when we ought to recognize a single class. This scenario is called one-class classification. Mathematically speaking, building a classifier requires constructing a mapping $f$ that performs the following operation:

$$f : \mathcal{X} \to \mathcal{Y}, \qquad (9.1)$$
such that we predict $y$ based on the feature set $x_1, x_2, \ldots, x_N$ describing the problem. The stronger the classifier, the fewer mistakes it makes when predicting the class label. The true mapping is unknown; hence, the goal of a classification algorithm is to produce a candidate model that approximates this target function. The candidate models are known as hypotheses (or hypothesis functions), as they produce a candidate mapping from the features onto the target class.

Background Information

Pattern classification involves a specific data processing pipeline. The classifier will be trained using a chunk of the available information called a training set. A training set should contain a representative sample of observations from each class. We envision the classifier "seeing" samples from each class and learning to distinguish between them. However, the efficiency of a classifier should not be evaluated on the samples it has seen during the training procedure. We wish to know the classifier's efficiency "in action", when it deals with previously unseen samples. Thus, when constructing a pattern classification model, we set aside a subset of the available samples. We call this subset a hold-out set or a test set. Evaluation of classifier efficiency should first and foremost concern its performance on the test set.
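In Python, this hold-out split is commonly produced as follows; this is a generic illustration (scikit-learn's train_test_split, with variable names of our choosing), not code from this chapter's case study.

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the labeled data as the test set; stratifying on y
# keeps the class proportions similar in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
```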
9.2.2 Preliminary Processing

There are certain details that may require our attention before building the classifier. The first is the encoding of nominal features. Let us assume that we are processing a feature called "pet" whose domain is the set {cat, dog, rabbit, parrot}. As mentioned, there is no natural order between these elements. Encoding these
Table 9.1 One-hot encoded features related to favorite pets

Animal   x_cat   x_dog   x_rabbit   x_parrot
Cat      1       0       0          0
Dog      0       1       0          0
Rabbit   0       0       1          0
Parrot   0       0       0          1
features as consecutive integers (cat—0, dog—1, rabbit—2, parrot—3) is counterintuitive, since it creates a situation where the difference between cat and dog has a smaller absolute value than that between cat and parrot. It could lead a classifier to make nonsensical rules, such as estimating whether an input pet is 'greater than' a cat. To avoid this problem, we can use one-hot encoding for the feature. This technique uses a vector-based binary representation where the presence of the encoded value is marked as 1, and its absence as 0. It means that the nominal feature is represented using a set of several binary features. The one-hot encoding of the different pets used in the example above is outlined in Table 9.1. In other words, the nominal feature "Animal", denoted as x in this example, is encoded by four binary features: x_cat, x_dog, x_rabbit, and x_parrot. Note that the number of problem features increases after merging the original numerical features and the transformed features that emerged from the one-hot encoding procedure. This process is repeated for all nominal features in the dataset. In the case of ordinal features, we simply replace their values with consecutive integer numbers (e.g., clothing sizes S, M, and L can become 0, 1, and 2).

The second standard preliminary processing element concerns feature normalization. It aims at expressing all problem features using an analogous range of values. When different features are brought into alignment, they are easier to compare. Furthermore, some popular classifiers like regression models train more efficiently when feature values are normalized. A very popular normalization scheme translates all features to the [0, 1] interval. The most straightforward method to achieve it is by applying the min-max normalization formula:

    x' = (x − x_min) / (x_max − x_min)    (9.2)
Naturally, we may use any fixed range [a, b]. In this case, we apply Eq. (9.3):

    x' = a + (x − x_min)(b − a) / (x_max − x_min)    (9.3)
An alternative to fixed-range normalization is feature standardization, which is based on the feature mean value μ and standard deviation σ:

    x' = (x − μ) / σ    (9.4)
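To make the encoding and normalization steps above concrete, the following minimal sketch applies one-hot encoding and min-max scaling with pandas and scikit-learn. The toy DataFrames and column names are hypothetical, and the sketch already follows the rule stressed in the Important box below: the scaler is fitted on the training data only.

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical frames with one nominal and one numerical feature
df_train = pd.DataFrame({"pet": ["cat", "dog", "rabbit"], "age": [2.0, 5.0, 1.0]})
df_test = pd.DataFrame({"pet": ["dog", "parrot"], "age": [3.0, 4.0]})

# One-hot encode the nominal feature (cf. Table 9.1)
df_train = pd.get_dummies(df_train, columns=["pet"], prefix="x")
df_test = pd.get_dummies(df_test, columns=["pet"], prefix="x")
# Align test columns with the training columns, filling absent dummies with 0
df_test = df_test.reindex(columns=df_train.columns, fill_value=0)

# Min-max normalization (Eq. (9.2)); parameters come from the training set
scaler = MinMaxScaler()
df_train[["age"]] = scaler.fit_transform(df_train[["age"]])
df_test[["age"]] = scaler.transform(df_test[["age"]])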
Another feature-level issue concerns missing values. There may be many reasons for this situation; for example, a missing feature value might appear when some measurement instrument failed to record it. In some cases, missing feature values are supposed to be missing, for example, when a feature is related to a concept that should be absent for some types of instances. An example could be the "fuel tank capacity" feature for instances corresponding to electric cars. In some cases, we can safely fill in zeros for missing values. Yet sometimes, we need to program additional logic to handle such cases. One choice is to remove instances with missing values. This solution, in many cases, is not acceptable as it may reduce the dataset too much. A commonly accepted alternative is to insert some value in place of the missing one. This procedure is called missing data imputation, which can be performed using a value based on the feature alone (a univariate approach such as mean, median, or mode) or based on other feature values (a multivariate approach).

Important
All parameters used to preprocess the data are determined from the training data alone! These include the parameters used for normalization (see Eqs. (9.2)–(9.4)) as well as the values to be imputed. The hold-out set must be adjusted accordingly, but using the same parameters determined from the training set.

Now, imagine a training set contains 100 instances out of which 99 belong to the class "healthy" and one to the class "sick". If we build a constant classifier that always predicts that a patient is healthy, it will report impressive results on the training data. However, it will lack usability in practice. This highlights the class imbalance problem, as several classification algorithms aim to maximize the accuracy on the most prevalent outcome to the detriment of less frequent outcomes.

Background Information
Imbalanced pattern classification refers to problems where the distribution of instances among the decision classes in a training set is significantly uneven. There is no formal threshold on the disproportion beyond which a dataset is considered imbalanced; disproportions can be small or large. Therefore, we should always report the imbalance ratio when operating with real-world datasets. Class imbalance can affect one or more pairs of classes. The decision class with the fewest instances is called the minority class, while the decision class with the most instances is called the majority class.
Class imbalance is easy to detect. All we need to do is compute a frequency table for the dependent variable. Based on its content, we verify whether or not the problem we are dealing with is imbalanced. If it turns out that the problem is imbalanced, we shall consider training dataset transformations that would mitigate this problem. For example, we could oversample the minority class and/or undersample the majority class to obtain better-balanced classes; a minimal detection sketch is given below.
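The detection step is essentially a one-liner with pandas; the Series name y_train is an assumption borrowed from the case study in Sect. 9.4.

# Relative class frequencies: a split such as 0.99/0.01 signals severe
# imbalance, while roughly equal shares indicate a balanced problem.
counts = y_train.value_counts(normalize=True)
print(counts)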
9.2.3 Performance Metrics

When it comes to the quality of classification itself, we may measure it using a range of metrics. An ample number of these metrics are derived from the values present in the confusion matrix, which looks as follows:

                             predicted outcome
                             positive                negative
actual value   positive      True Positives (TP)     False Negatives (FN)
               negative      False Positives (FP)    True Negatives (TN)
The problem at hand distinguishes between instances from the positive class and the negative class. Based on these values, we compute other metrics. The most elementary one is classification accuracy, expressed as the number of correctly classified instances divided by all instances:

    accuracy = (TP + TN) / (TP + TN + FP + FN)    (9.5)
Evaluating classification quality using the training set gives us information on how well the classifier adapted to the data it saw while being adjusted. Poor scores on the training set are a sign that the training process did not succeed. Unfortunately, encouraging quality scores on the training set do not mean we can rest. The next step we should execute is quality evaluation on the hold-out set (in other words, the test set). High quality scores on the training set and low scores on the hold-out set indicate that the model adapted too much to the data it has seen and has poor generalization ability to new, previously unseen instances (i.e., overfitting). When dealing with imbalanced classes, it is often necessary to quantify individual class recognition. In this regard, two very popular measures shall be mentioned: precision and recall. Precision is given as follows:

    precision = TP / (TP + FP)    (9.6)
while recall is given as

    recall = TP / (TP + FN)    (9.7)
Another measure that is often used is the F1-score, which is computed as the harmonic mean of precision and recall:

    F1 = 2TP / (2TP + FP + FN) = 2 · (precision · recall) / (precision + recall)    (9.8)
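Eqs. (9.5)–(9.8) are easy to verify numerically. The following sketch uses hypothetical confusion-matrix counts; only the formulas themselves come from the text.

# Hypothetical confusion-matrix counts for a binary problem
TP, FN, FP, TN = 40, 5, 3, 52

accuracy = (TP + TN) / (TP + TN + FP + FN)          # Eq. (9.5)
precision = TP / (TP + FP)                          # Eq. (9.6)
recall = TP / (TP + FN)                             # Eq. (9.7)
f1 = 2 * precision * recall / (precision + recall)  # Eq. (9.8)

print(accuracy, precision, recall, f1)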
We often receive a single dataset, which we must partition into training and test sets. Therefore, one shall consider maintaining appropriate proportions of instances in the training set. To do that, we can perform stratified sampling with respect to the class labels. In this manner, we ensure that all classes have adequate representation in the training set. There is no universal rule on what size our training set should be; it all depends on the difficulty of the problem at hand and the quality of the instances. Nonetheless, whenever our procedure includes a non-deterministic element (and sampling is non-deterministic), we must remember that quality evaluation should be adapted so that we can measure the variability of the outcomes. In most scenarios, non-determinism arises on two occasions. Firstly, it may appear at the stage of training and test set creation. Secondly, it may be present at the moment of classifier training, which often involves some sort of sampling.

Quality validation for real-world applications
In real-world data analysis schemes, quality evaluation goes beyond the training and hold-out sets. In these settings, data processing typically involves several algorithms that together process the incoming data. Imagine an automatic parking-fee computation system that recognizes license plates to calculate the time lapse between the entry and exit times of a parking lot. In this scenario, we first evaluate classifier quality on the training and hold-out sets. However, these measurements take place in a lab environment: they are highly controlled experiments that may not account for all the nuances of the real world. Thus, we ought to perform quality measurements after we deploy the system in a real-world environment. This would typically be done for one or more designated sites called pilot locations. It may turn out that measurements for pilot locations reveal flaws in our system. For instance, it may turn out that the model is incapable of recognizing symbols when light conditions are poor or when it is raining. In this case, the model would be revised.
9.3 The FCM-Based Classification Model

In this section, we present an FCM-based classification model using the reasoning mechanism presented in the previous chapter. Moreover, we explain how to compute the weight matrix using Genetic Algorithms.
9.3.1 Basic FCM Architecture for Data Classification

When building an FCM-based model for tabular data classification, we distinguish neurons responsible for two tasks. The first group of neurons processes the inputs, while the second is responsible for output generation. This two-block architecture for data classification is referred to as the class-per-output model. In this architecture, input concepts represent the features describing the classification problem, whereas output concepts denote the decision classes. Therefore, there will be as many input concepts as features and as many output concepts as decision classes. Secondly, we create the inner neural processing block, where input concepts are connected to each other to create a feedback loop that processes the relationships between the inputted feature values. Finally, we create the outer neural processing block, where the input concepts are connected with the output ones.

Remark
In the class-per-output architecture, the inner block transforms the input by correlating the input features, thus extracting high-level features. Notice that the number of high-level features will not be smaller than the number of features in the input space, but they are potentially more informative. The reader is referred to [20] for a richer discussion on classification and feature transformation with FCM models. The outer block is responsible for the decision-making process, which is class label assignment. Therefore, it connects the high-level features with the decision classes.
Figure 9.2 illustrates the class-per-output architecture for a problem having three features and two decision classes. The last block determines the class labels from the final activation values of the neurons in the outer block. In order to understand the decision-making process of FCM-based classifiers, we need to discuss how to derive abstract hidden layers from the temporal states produced by the network. Notice that this intuition is also present in other recurrent neural networks, where the data is processed in a windowed form and hidden layers aim to capture time-related patterns. Hidden layers in a recurrent neural network are generated by a procedure called unfolding.
Fig. 9.2 Class-per-output architecture for a classification problem with three input neurons (x_1, x_2, x_3) and two decision neurons (y_1, y_2); the inputs feed the inner block, which connects to the outer block, followed by a labeling step
Background Information Unfolding refers to the process of converting a sequence of temporal states into a sequence of abstract hidden layers. Such a sequence determines the length of a temporal pattern that the network will capture. In FCM-based models, we can interpret the activation vector in each iteration as an abstract hidden layer resulting from unfolding the recurrent neural architecture.
As introduced in Chap. 1 and detailed in Chap. 3 (Sect. 3.2), FCMs perform several iterations such that the model's temporal states take the form A^(t) = (a_1^(t), ..., a_i^(t), ..., a_N^(t)), where a_i^(t) denotes the value of the activation forwarded to the input of the i-th neural concept. Activation values are in the [0, 1] range. More explicitly, the model processes the input in an iterative manner by applying the following formula:

    A^(t+1) = f(A^(t) W^I),    (9.9)

where W^I is the weight matrix of size N × N characterizing the interaction between neurons in the inner block. Moreover, f is an activation function (such as the sigmoid function) that clips the values into the desired interval. Figure 9.3 visualizes the inner working of a traditional FCM-based classifier using the class-per-output architecture. After initializing the input neurons with the normalized feature values, we perform reasoning for a fixed number of iterations. In this figure, circular nodes denote the states generated within the inner layer, the squared node denotes the output layer, while A^(t) denotes the activation vector in the t-th iteration. The reader can notice that the class assignment relies on A^(T), the
Fig. 9.3 Decision model of traditional FCM-based classifiers: the states A^(0), A^(1), ..., A^(T) are produced by repeatedly applying the weight matrix, and the final state is mapped to the decision D
inner block matrix W^I, and the weight matrix W^O that connects the inner block processing the inputs with the outer block producing the numerical outputs.

The decision model of an FCM-based classifier refers to how the decision classes are derived from the concepts' activation values after the reasoning process finishes. Unsurprisingly, the correctness of the classification procedure is closely related to how an FCM finishes its computations. The decision class is computed from the final activation values of the output concepts; more specifically, the decision class is given by the output concept having the largest final activation value. In the elementary class-per-output model, we often assume that the activation values converge (ideally, to a different fixed point for each decision class) and indicate which class label should be assigned to a sample fed to the map input. There is a risk that the outputs do not converge. In practice, this would manifest in different class assignment decisions depending on the maximum number of iterations performed (T). In the worst-case scenario, the map converges to a unique fixed-point attractor, which would lead to a useless model that always returns the same class label regardless of the data fed to its input. The risk of encountering the two behaviors mentioned above affects the robustness of this classification model. As discussed in the previous chapter, the quasi-nonlinear reasoning mechanism is an effective way to avoid unique fixed-point attractors. In its matrix form, this reasoning rule can be expressed as follows:

    A^(t+1) = φ · f(A^(t) W) + (1 − φ) · A^(0),    (9.10)

such that

    f(x) = 1 / (1 + e^(−λ·(x−h))).    (9.11)

Note that setting φ < 1 will suffice to ensure that the decision classes computed by the class-per-output architecture depend on the initial conditions encoding the instance being classified. Of course, there might be situations where the network does not converge to a unique fixed-point attractor even if φ = 1, particularly when performing a limited number of iterations during reasoning.
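Although a full classifier implementation follows in Sect. 9.4.2, the update in Eqs. (9.10)–(9.11) fits in a few lines of NumPy. This is a minimal sketch; the function name and the default parameter values are ours.

import numpy as np

def quasi_nonlinear_step(A_t, A_0, W, phi, lam=1.0, h=0.0):
    # Eq. (9.11): sigmoid with slope lam and offset h
    f = 1.0 / (1.0 + np.exp(-lam * ((A_t @ W) - h)))
    # Eq. (9.10): blend the nonlinear update with the initial state
    return phi * f + (1 - phi) * A_0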
Fig. 9.4 Illustration of a chromosome encoding the W^I and W^O matrices. The flattened part corresponding to the W^I matrix is visualized in blue, while the part corresponding to the W^O matrix is pictured in orange
9.3.2 Genetic Algorithm-Based Optimization

The weight matrix is the essence of an FCM-based classifier and must be constructed automatically from the available historical data (see Chap. 7). In the class-per-output architecture, the weight matrix W is split into two sub-matrices, W^I and W^O. The former gathers the weights in the inner block, while the latter gathers the weights connecting the inner block to the outer block. In contrast to the model discussed in the previous chapter, where only W^O was computed from data, the model in this section requires both matrices to be estimated from the training data.

During the learning process, we impose certain quality requirements on how good a map should be. Typically, this is expressed as an error between the responses computed by the FCM-based classifier and the ground-truth target values. Naturally, the smaller the error, the better. We are aware that we will not be able to construct a perfect model due to objective limitations such as insufficient training samples or their intrinsic quality. However, we aim to construct a model that performs well and does not overfit the training set. A prominent family of optimization algorithms capable of delivering a solution with the desired properties is based on heuristic rules that mimic how problems are solved in natural systems.

In this section, we will use Genetic Algorithms (GA) to compute the weight matrices W^I and W^O. The first step to discuss is the way of representing both matrices in the chromosomes, which will be composed of real-coded genes. Each candidate solution will represent flattened versions of these matrices concatenated next to each other, as illustrated in Fig. 9.4 (a small decoding sketch is given at the end of this subsection). Since W^I is an N × N matrix and W^O is an N × M matrix, the number of weights to be estimated from data is N² + N·M. Notice that we do not enforce the constraint that a concept cannot influence itself in the inner block. This is inspired by the lack of causal meaning of weights learned from historical data without human intervention.

The algorithm iteratively modifies chromosomes using selection, mutation, and crossover operators (which were introduced in Chap. 7). The procedure terminates when preset criteria are satisfied: reaching a maximal number of iterations, detecting that the fittest individual has not changed notably over several subsequent iterations, or finding that the fittest individual provides a 'good enough' solution. By its nature, this algorithm (like other similar heuristics) does not guarantee an optimal solution but a good enough model. Heuristic FCM optimization is a commonplace approach, and different authors have discussed the advantages of various population-based and evolutionary algorithms. For instance, Liang et al. [12] elaborated on the use of Particle Swarm Optimization, Ighravwe and Mashao [8] demonstrated the use of Differential Evolution, Wu and Liu [22] addressed the use of memetic algorithms, while Wang et al. [21] showed how a many-task evolutionary algorithm can help optimize large FCMs. A review of heuristic-based optimization methods for FCMs was conducted by Schuerkamp and Giabbanelli [17].

Note that particular implementations of GA differ in how they execute some actions. For example, regarding the selection operator that chooses which individuals are carried over to the next generation, we often see the following solutions: roulette wheel selection,² rank selection,³ and tournament selection.⁴ By analogy, there are some typical variants one may choose from when implementing the crossover. In particular, the method for selecting the pair of individuals that will serve as the base for offspring generation needs to be decided upon; similarly to the selection operator, we have roulette, tournament, and rank-based strategies. Mutation typically takes place after selection and crossover are complete.

Another relevant aspect to be defined is the fitness function, which quantifies the quality of each solution. Overall, it quantifies the extent to which a candidate solution encodes matrices leading to accurate predictions. Since fitness functions are typically oriented toward maximization, we can measure the similarity between the predictions made by the FCM-based classifier and the ground truth. This can be done by using any of the performance metrics introduced in Sect. 9.2.3 or simply relying on the similarities between the numerical predictions and the expected one-hot encoded outputs. This chapter uses the latter approach, where the fitness function is calculated as the exponential of the negated Euclidean distance between the model's outputs and the ground-truth outputs, as formalized below:

    F = exp(−∑_{i=1}^{K} ||ŷ_i − y_i||),    (9.12)

where K is the number of instances and M is the number of classes (ŷ_i and y_i denote the predicted and expected M-dimensional output vectors of the i-th instance), while ||·|| denotes the Euclidean distance. Notice that the target/output data is a matrix because decision classes must be one-hot encoded beforehand. Moreover, the fitness function will always produce values in the [0, 1] interval. The fitness function formalized in Eq. (9.12) relies on the "distance" between the desired and predicted values: the obtained distance is negated, and the exponential function is applied to the result to obtain a suitable fitness value.
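As a minimal sketch of the chromosome layout in Fig. 9.4, the following lines show how a flat, real-coded gene vector can be split back into W^I and W^O; the sizes N = 3 and M = 2 are hypothetical.

import numpy as np

N, M = 3, 2  # hypothetical sizes: 3 features, 2 decision classes
chromosome = np.random.uniform(-1.0, 1.0, N*N + N*M)  # real-coded genes

# The first N*N genes encode the inner matrix, the last N*M the outer one
W_I = chromosome[:N*N].reshape(N, N)
W_O = chromosome[N*N:].reshape(N, M)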
² Roulette wheel selection assumes that the probability of selecting an individual is proportional to its fitness.
³ Rank selection assumes that the probability of selecting an individual depends on its fitness rank within the population.
⁴ Tournament selection involves running several trials called tournaments among a few individuals chosen randomly from the population. The winner of each tournament is the individual with the highest fitness.
9.3.3 How Does the Model Classify New Instances?

Let us give an example of an already-trained FCM and trace the computations for a hypothetical instance passed to its input. The FCM is composed of two blocks: inner and outer. Assume that we have already trained it, and the weight matrices for the inner and outer blocks are given as:

    W^I = [ −0.01  −0.01  −0.14
             0.02   0.19  −0.68
            −0.62  −0.40   0.69 ]

    W^O = [  0.64  −0.32
             0.31  −0.73
            −0.58   1.00 ]

Let us assume the sigmoid function parameters are arbitrarily fixed as λ = 7.03 for the slope and h = 0 for the offset, the nonlinearity coefficient is φ = 0.8, and T = 10 is the number of iterations of the reasoning process. Based on the dimensions of matrices W^I and W^O, one may immediately notice that this is a case with three input variables, since W^I is a 3 × 3 matrix. Similarly, we can conclude that there are two classes, since W^O is a 3 × 2 matrix. All values in this section are reported with two-decimal precision. Moreover, let us assume that the problem instance to be classified is encoded as the following vector used to feed the FCM-based classifier:

    A^(0) = (0.37, 0.95, 0.73).
The first step toward classifying the instance relies on applying Eq. (9.10). Let us trace the computations done in the first iteration:

    A^(0) W^I = (0.37, 0.95, 0.73) × [ −0.01  −0.01  −0.14
                                        0.02   0.19  −0.68
                                       −0.62  −0.40   0.69 ] = (−0.44, −0.12, −0.19)

The above computation performs the A^(0) W^I operation per Eq. (9.10). The values in the resulting raw activation vector are then transformed using the sigmoid function:

    f(−0.44, −0.12, −0.19) = (0.04, 0.31, 0.20).
Finally, we multiply that vector by the nonlinearity coefficient φ and add the vector (1 − φ) · A^(0). This gives us the activation vector:

    A^(1) = (0.11, 0.44, 0.31).
We then repeat that procedure recurrently to obtain the remaining intermediate activation vectors, which are given as follows:

    A^(2) = (0.24, 0.54, 0.41)
    A^(3) = (0.19, 0.51, 0.39)
    A^(4) = (0.20, 0.51, 0.41)
    A^(5) = (0.19, 0.50, 0.42)
    A^(6) = (0.19, 0.49, 0.44)
    A^(7) = (0.18, 0.48, 0.47)
    A^(8) = (0.17, 0.46, 0.51)
    A^(9) = (0.15, 0.43, 0.57)
    A^(10) = (0.14, 0.40, 0.66).
The activation vector A^(10) is deemed the output of the inner block, since we stop at T = 10 iterations. According to the logic illustrated in Fig. 9.3, we need to multiply that vector by the weight matrix W^O that connects the inner and outer blocks:

    A^(10) W^O = (0.14, 0.40, 0.66) · [  0.64  −0.32
                                         0.31  −0.73
                                        −0.58   1.00 ] = (−0.17, 0.32).

We can apply the sigmoid function to the vector outputted by the FCM-based classifier to obtain values in the [0, 1] interval:

    f(−0.17, 0.32) = (0.23, 0.91).
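The whole worked example can be reproduced with a short NumPy sketch (the variable names are ours); the printed vectors should match the values above up to rounding.

import numpy as np

W_I = np.array([[-0.01, -0.01, -0.14],
                [ 0.02,  0.19, -0.68],
                [-0.62, -0.40,  0.69]])
W_O = np.array([[ 0.64, -0.32],
                [ 0.31, -0.73],
                [-0.58,  1.00]])

lam, h, phi, T = 7.03, 0.0, 0.8, 10
f = lambda x: 1.0 / (1.0 + np.exp(-lam * (x - h)))  # Eq. (9.11)

A0 = np.array([0.37, 0.95, 0.73])
A = A0
for t in range(T):
    A = phi * f(A @ W_I) + (1 - phi) * A0  # Eq. (9.10)

print(np.round(A, 2))           # inner-block output, approx. (0.14, 0.40, 0.66)
print(np.round(f(A @ W_O), 2))  # final output, approx. (0.23, 0.91)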
In the resulting vector, the activation value in the second dimension (0.91) is higher than the value in the first dimension (0.23). These computations indicate that the instance should be assigned the second class.

Bottom-line classification quality
We can compute the bottom-line performance quality expressed in terms of the accuracy measure. Bottom-line accuracy is a concept adhering to model validity evaluation. For balanced classification problems, the bottom-line accuracy is given by 1/M, where M denotes the number of decision classes. This performance value
corresponds to the accuracy of a random class assignment procedure. We naturally wish to build models that are better than a random guess. For instance, if there are two decision classes, then the bottom line would be a 50-50 guess; hence, a classifier should report accuracy values higher than 50%.
9.4 Classification Toy Case Study

In this section, we showcase an experiment aimed at classifying a relatively simple, publicly available dataset named "wine". This section contains the complete code for the elementary FCM-based classifier.
9.4.1 Data Description

The dataset concerns the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents (denoting features) found in each of the three types of wines (denoting the decision classes). The number of instances for these classes is 59, 71 and 48, respectively. The class distribution indicates that the dataset is fairly balanced, so we can use the accuracy performance measure. To load this dataset, one may take advantage of the code snippet below.

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# The wine dataset is loaded
# Independent variables are kept in X
# Dependent variable is kept in y
X, y = load_wine(return_X_y=True, as_frame=True)

# The data is split into train/test sets
# Train set will contain 70% of samples
# (passing stratify=y would additionally enforce the stratified
# split described below)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42)
The features are all real-valued and listed as follows: alcohol (f1), malic acid (f2), ash (f3), alcalinity of ash (f4), magnesium (f5), total phenols (f6), flavanoids (f7), nonflavanoid phenols (f8), proanthocyanins (f9), colour intensity (f10), hue (f11), od280/od315 of diluted wines (f12), and proline (f13). The downloaded data consists of 178 rows, each describing a problem instance (one object, one wine that was tested). There is no predefined training/test set split, so we need to establish one on our own. Let us arbitrarily decide on a random
stratified training/test set split in proportions 70/30, which gives us 124 instances in the training set and 54 instances in the test set. Table 9.2 outlines basic statistics for each problem feature using the portion of data regarded as the training set. The min/max values will later be used to normalize the data into the [0, 1] interval. Moreover, we report the arithmetic average (mean), the standard deviation, and the lower quartile (Q1), median (Q2), and upper quartile (Q3). Q1 is also called the 25th percentile, meaning that 25% of the values lie below this threshold; Q2 is also called the 50th percentile and Q3 the 75th percentile, and their interpretation is analogous to Q1. Since this problem involves three decision classes that are fairly balanced, the bottom-line accuracy is approximately 33%. Any classifier reporting lower prediction rates would be regarded as worse than a random decision.

The data will be transformed using the methodology outlined in Sect. 9.2. Since all features are numerical, we only need to normalize the features according to their minimum and maximum values (see Table 9.2) observed in the training data. A code snippet implementing the normalization step is given below.
from sklearn.preprocessing import MinMaxScaler

# The data is scaled to a desired range.
# Here, we scale to the default [0, 1] range.
scaler = MinMaxScaler()
scaler.fit(X_train)
scaled_X_train = scaler.transform(X_train)
scaled_X_test = scaler.transform(X_test)
Table 9.2 Descriptive statistics of the independent variables in the randomly drawn 70% of instances of the wine dataset that constitute the training data

Feature                        Mean     St. dev.  Min      Q1       Q2       Q3       Max
Alcohol                        12.96    0.84      11.03    12.29    12.95    13.70    14.83
Malic acid                     2.40     1.10      0.89     1.64     1.90     3.17     5.80
Ash                            2.37     0.27      1.36     2.22     2.36     2.52     3.23
Alcalinity of ash              19.50    3.48      10.60    17.35    19.50    21.50    30.00
Magnesium                      100.88   15.37     70.00    88.00    98.00    108.50   162.00
Total phenols                  2.27     0.65      0.98     1.70     2.25     2.80     3.88
Flavanoids                     2.01     1.01      0.34     1.17     2.13     2.81     5.08
Nonflavanoid phenols           0.36     0.13      0.13     0.27     0.34     0.45     0.66
Proanthocyanins                1.59     0.58      0.42     1.25     1.56     1.95     3.58
Color intensity                4.97     2.19      1.74     3.17     4.55     6.11     10.80
Hue                            0.96     0.23      0.48     0.78     0.98     1.12     1.71
od280/od315 of diluted wines   2.60     0.73      1.27     1.83     2.78     3.18     4.00
Proline                        737.27   304.71    278.00   498.75   666.00   924.25   1547.00
9.4.2 Classifier Implementation

This subsection is devoted to implementing the FCM-based classifier using a class-per-output architecture, including the learning algorithm that computes the weight matrices. We provide it in the form of a Python class, whose stub is presented in the code snippet below. The stub does not contain the bodies of the functions, as they will be provided later in this section; this division was dictated by the need to improve the layout and make the narration easier.
import numpy as np
import pandas as pd
import random
import pygad
from sklearn.base import BaseEstimator, ClassifierMixin


class FCM_GA(BaseEstimator, ClassifierMixin):

    def __init__(self, T=5, phi=0.8, slope=1.0, offset=0.0):
        """
        Initializes the FCM_GA object.

        Parameters:
        - T: Number of iterations in the reasoning process.
        - phi: Nonlinearity coefficient used during reasoning.
        - slope: Slope parameter for the sigmoid function.
        - offset: Offset parameter for the sigmoid function.
        """
        self.T = T
        self.phi = phi
        self.slope = slope
        self.offset = offset

        self.W_inner = None
        self.W_outer = None

    def sigmoid(self, X):
        pass  # Body of the "sigmoid" function comes here.

    def reasoning(self, X):
        pass  # Body of the "reasoning" function comes here.

    def fit(self, X_train, Y_train):
        pass  # Body of the "fit" function comes here.

    def predict(self, X_test):
        pass  # Body of the "predict" function comes here.

    def classify(self, X_test):
        pass  # Body of the "classify" function comes here.

    def normalize(self):
        pass  # Body of the "normalize" function comes here.
In the class constructor, we initialize the number of iterations T, the nonlinearity parameter phi (φ) of the quasi-nonlinear reasoning, the sigmoid function slope (λ), and the sigmoid function offset (h). The roles of these variables are duly commented in the provided code for guidance. Furthermore, the member variables W_inner and W_outer denote the W^I and W^O matrices. Since these matrices are the target of the GA-based learning algorithm, they are initialized with null values (set to None in the provided code). As a reminder, W_inner will contain the weights connecting the neurons in the inner block (see Fig. 9.2). The algorithm will create W_inner from the training data with dimensions N × N, where N is the number of features. Similarly, W_outer will contain the weights related to output generation in the outer block (see Fig. 9.2). The training procedure will derive W_outer from the training data with dimensions N × M, where M is the number of decision classes. The implementation given in this chapter is supplemented with suggestions for default parameter values; one may change them to see their influence on the performance. Some experiments with these parameters will also be addressed later in this chapter.

Let us then proceed with the description of the functions needed to implement the reasoning process in the inner block. The first small building block is a simple sigmoid function used as an activation function.
def sigmoid(self, X):
    """
    Sigmoid activation function.

    Parameters:
    - X: Input values.

    Returns:
    - Result of the sigmoid function applied to the input values.
    """
    return 1.0 / (1 + np.exp(-self.slope * (X - self.offset)))
The sigmoid function will be used as a building block of the formalism that realizes the FCM operations. These activities are coded in a function called reasoning, whose body is given in the code snippet below.
def reasoning(self, X):
    """
    Perform the FCM reasoning process.

    Parameters:
    - X: Input values.

    Returns:
    - Result of the reasoning process.
    """
    X0 = X
    for t in range(1, self.T + 1):
        X = self.phi * self.sigmoid(np.dot(X, self.W_inner)) \
            + (1 - self.phi) * X0
    return X
It is worth noticing that the reasoning function implements the reasoning formula given in Eq. (9.10). This formula was presented in Chap. 8 and then re-introduced in Sect. 9.3 using its matrix form. In the following code snippet, we present the logic of the model fitting step. It aims to create the weight matrices and involves a nested function that implements the fitness function for the GA-based optimization procedure.
def fit(self, X_train, Y_train):
    """
    Fit the FCM-GA model to the training data.

    Parameters:
    - X_train: Input training data.
    - Y_train: Output training data.
    """
    random.seed(42)
    n_inputs = X_train.shape[1]
    n_outputs = Y_train.shape[1]

    self.W_inner = np.zeros((n_inputs, n_inputs))
    self.W_outer = np.zeros((n_inputs, n_outputs))

    def fitness_function(ga_instance, solution, solution_idx):
        """
        The fitness function is calculated as the exponential of the
        negative Euclidean distance between the predicted output and
        the actual output.
        """
        X = solution.flatten()
        self.W_inner = X[:n_inputs**2].reshape((n_inputs, n_inputs))
        self.W_outer = X[-n_inputs*n_outputs:].reshape((n_inputs, n_outputs))
        return np.exp(-np.linalg.norm(self.predict(X_train) - Y_train))

    # A class-per-output architecture is assumed
    num_genes = n_inputs**2 + n_inputs*n_outputs

    # Execute the GA-based learning
    ga_instance = pygad.GA(sol_per_pop=50,
                           num_generations=100,
                           num_parents_mating=10,
                           fitness_func=fitness_function,
                           mutation_by_replacement=False,
                           crossover_probability=0.9,
                           mutation_probability=0.1,
                           num_genes=num_genes,
                           init_range_low=-1.0,
                           init_range_high=1.0,
                           random_seed=42)

    ga_instance.run()
    solution, _, _ = ga_instance.best_solution()

    # Store the best solution in the W_inner and W_outer matrices
    X = solution.flatten()
    self.W_inner = X[:n_inputs**2].reshape((n_inputs, n_inputs))
    self.W_outer = X[-n_inputs*n_outputs:].reshape((n_inputs, n_outputs))
In the code above, we call the GA-based learning method. The GA implementation used in this example is taken from the pygad package⁵ and can be installed by calling pip install pygad from the command line. The initial domain of the weight matrices is restricted to the [−1, 1] interval using the parameters init_range_low and init_range_high. However, the algorithm might still generate off-range values. Note that we fixed some parameters, such as the number of generations and chromosomes, for simplicity. The fitness function, coded as a nested function, implements Eq. (9.12). After the algorithm has finished, we retrieve the best solution and store it as the content of the W_inner and W_outer matrices.

Next, we give the code snippet for the predict function. It calls the reasoning function to process a given instance. Note the use of the sigmoid activation function, which is applied in the final steps of the computations.
⁵ https://pygad.readthedocs.io/en/latest/pygad.html
def predict(self, X_test):
    """
    Predict the output for the given test data.

    Parameters:
    - X_test: Input test data.

    Returns:
    - Predicted output.
    """
    H = self.reasoning(X_test)
    return self.sigmoid(np.dot(H, self.W_outer))
The normalize function implements a normalization procedure for situations where the GA-based learning method produces weights outside the [−1, 1] interval. This is done by dividing all weights by the maximal absolute weight and then calibrating the parameters that govern the sigmoid function.
def normalize(self):
    """
    Normalize the weights and parameters. This function ensures
    that the weights are in the [-1, 1] interval.
    """
    W = np.hstack((self.W_inner, self.W_outer))
    mx = np.max(np.abs(W))

    if mx > 1:
        self.W_inner /= mx
        self.W_outer /= mx
        self.slope *= mx
        self.offset /= mx
Since the model itself is used for data classification, in the next code snippet we present the steps leading to class label assignment.
def classify(self, X_test):
    """
    Classify the input test data.

    Parameters:
    - X_test: Input test data.

    Returns:
    - Binary classification output.
    """
    self.normalize()
    Y = self.predict(X_test)
    return (Y == np.max(Y, axis=1, keepdims=True)).astype(int)
As illustrated in Sect. 9.3.3, in the class-per-output architecture, the index of the output neuron with the highest activation value after reasoning is used to label the instance being classified. Moreover, since the output data is expected to be one-hot encoded, it is convenient to return a boolean mask indicating the class to be produced for each instance. Notice that the whole procedure can operate in a mini-batch setting where multiple instances are processed simultaneously.
9.4.3 Classification—Overall Quality

Now, we proceed to construct the FCM-based classifier for the wine dataset using our implementation. We need it to outperform the bottom-line accuracy, since a useful classifier should not be outperformed by a random decision. At the same time, an accuracy value just above 33% would not be outstanding in most pattern classification fields due to the large number of misclassified instances it implies. To run the code described in the previous subsection, we can use the snippet provided below. It also computes the accuracy for the test set to explore the model's performance on unseen data.
from sklearn.metrics import accuracy_score

# Represent class labels as one-hot-encoded vectors.
one_hot_encoded_y_train = pd.get_dummies(y_train)

# Initialize the model and run its fitting procedure.
model = FCM_GA(T=10, phi=0.8)
model.fit(scaled_X_train, one_hot_encoded_y_train)

# Apply the trained model to predict class labels for test data.
predict_test = model.classify(scaled_X_test)
pred_label_test = np.argmax(predict_test, axis=1)

# Compute test set accuracy
accuracy = accuracy_score(y_true=y_test, y_pred=pred_label_test)
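The chapter reports the resulting confusion matrices graphically in Fig. 9.5, but they can also be obtained numerically. A minimal sketch building on the variables above, using scikit-learn's confusion_matrix:

from sklearn.metrics import confusion_matrix

# Training-set confusion matrix
pred_label_train = np.argmax(model.classify(scaled_X_train), axis=1)
print(confusion_matrix(y_train, pred_label_train))

# Test-set confusion matrix
print(confusion_matrix(y_test, pred_label_test))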
The parameters used in the experiments are as follows: the number of chromosomes in the population is set to 50, the number of generations is set to 100, the number of solutions used as parents is set to 10, and the crossover and mutation probabilities are set to 0.9 and 0.1, respectively. Finally, the number of iterations T is set to 10 and the nonlinearity coefficient φ is set to 0.8.

We executed the FCM-based classifier and obtained the confusion matrices for the training and test sets depicted in Fig. 9.5. We can use these values to calculate the accuracy for both the training and test data. This can be done by adding up the values in the main diagonal of each matrix and dividing by the number of
Fig. 9.5 Confusion matrices obtained with the FCM-based classifier using a fixed parametric setting for the training (left) and test (right) sets
instances in each dataset. Therefore, the training accuracy is (40 + 49 + 32)/124 ≈ 97.58%, while the test accuracy is (19 + 19 + 13)/54 ≈ 94.44%. Based on both accuracy values, we can conclude that the FCM-based classifier trained with the GA clearly outperformed the bottom line, while misclassifying only six instances overall (three in each set).

Running a single experiment is insufficient, since meta-heuristics often use random numbers to explore the search space and the performance can change if we use different data partitions. To alleviate this issue, we repeated the procedure 10 times⁶ and obtained an average test accuracy of 93.33% with a standard deviation of 4.07%, while the best achieved accuracy was 98.15%. The overall performance indicates that the model is a fairly good classifier, as the average accuracy suggests that in roughly 9 out of 10 cases we will make a correct prediction. The standard deviation informs us about the scale of the expected differences between subsequent runs of the procedure, and it is relatively high. These differences are first and foremost due to the different contents of the training set, which is drawn randomly from the entire dataset. To achieve a different draw, one shall use different seeds for the random number generator (see the random_state parameter in the first code snippet in Sect. 9.4.1, devoted to loading the data).

An essential parameter of the studied model is φ, which controls the shares that the previous state and the initial state of the model contribute to the current output. Setting φ = 1 results in the classical FCM formalism, while setting φ = 0 implies no recurrence at all. The practical impact of this parameter on the activation values is given in Fig. 9.6; the experiments presented in this figure concerned the case of T = 10 iterations of the reasoning process (a sketch of the underlying φ sweep is given below). Figure 9.6 clearly indicates that φ = 1 is the worst parametric setting. Let us recall that, in this setting, the signal is not being strengthened by the original feature values, which is expected to negatively influence the outcome of the classification procedure.
⁶ During a peer-review process, authors may need to justify the number of repeats. It can be calculated using Confidence Interval methods, for example by aiming for a 95% CI [19].
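A minimal sketch of the φ sweep behind Fig. 9.6, reusing the variables from the previous snippets (the exact accuracies obtained will depend on the environment):

for phi in [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]:
    model = FCM_GA(T=10, phi=phi)
    model.fit(scaled_X_train, one_hot_encoded_y_train)
    pred = np.argmax(model.classify(scaled_X_test), axis=1)
    print(phi, accuracy_score(y_true=y_test, y_pred=pred))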
Fig. 9.6 Confusion matrices and accuracy values for the test set when using various φ values in the quasi-nonlinear reasoning rule: (a) φ = 0.0, accuracy = 96.30%; (b) φ = 0.2, accuracy = 98.15%; (c) φ = 0.4, accuracy = 96.30%; (d) φ = 0.6, accuracy = 94.44%; (e) φ = 0.8, accuracy = 96.30%; (f) φ = 1.0, accuracy = 85.19%
However, the results also show that the FCM-based classifier does not converge to a unique fixed-point attractor; had that situation emerged, all instances would have been placed along the same column of the confusion matrix.
9.5 Further Readings

The literature includes several attempts to improve the performance of FCM-based classifiers. These approaches focus on how the dynamic model is built and how the class labels are derived. For example, Froelich [5] presented a deterministic procedure to find the optimal partition of the activation space of decision concepts; the intuition is that some decision classes would benefit from larger decision intervals. Following another research direction, Karatzinis and Boutalis [10] introduced functional weights, a promising approach towards increasing the generalization ability of FCM models devoted to pattern classification. Conversely, Sabahi and Stanfield [16] proposed to train the weight matrix separately from the bias to improve classification accuracy. The choice of fitness function was also discussed in the literature; for example, when it comes to data classification, Amirkhani et al. [2] proposed to optimize different metrics to boost classification accuracy.

Other studies combined the traditional formalism with other data mining algorithms to improve the effectiveness of FCM-based classifiers. Lefebre-Lobaina and García [11] presented a fusion of fuzzy clustering with association rule mining techniques to design an FCM-based classifier. In short, they used the fuzzy c-means algorithm to transform numerical features into categorical ones. Subsequently, the data is represented in the form of a table describing its properties in a more general manner. The authors establish the causal relationships between concepts by relying on support and confidence measures for fuzzy sets. Nápoles et al. [14] also used clustering to obtain an FCM-based classifier devoted to multi-label classification. Their approach utilizes the notion of granularity, which is interpreted as a data property referring to how precisely (vs. how generally) the data is represented.

There were also several attempts at fusing techniques from the deep learning domain with FCM models. An example of such a fusion was delivered by Mansouri and Vadera [13], who extended Long Short-Term Memory-like behavior to FCMs. Karatzinis et al. [9] proposed a model that contains elements known from Convolutional Neural Networks (CNNs). Hilal et al. [7] also developed an FCM with a bird swarm optimizer that contains a feature extraction step similar to the one used in CNNs. Furthermore, Wu et al. [23] proposed a system that combines FCM-like processing with a sparse autoencoder-based feature extraction block. Several authors also elaborated on transfer learning techniques from deep neural networks to FCMs, such as the approach in [1].

It shall also be mentioned that there is a distinct group of papers on FCMs that focus on feature extraction for specific data types. In particular, speech and text recognition tasks require special attention, since their success largely relies on the
feature extraction. In this domain, some insights may be derived from the works of Engome Tchupo et al. [4] or Zhang et al. [24].

The literature reports many FCM-based classifiers that use granular computing principles. This data processing paradigm refers to approaches to data staging and model construction that pay particular attention to specificity, understood as how precisely raw data is represented. One may intuitively grasp the two most extreme cases. In the first, specificity (also referred to as granularity) is extremely low, and the data is represented in a manner so generic that the instances are indistinguishable. In the second, raw data is represented without any generalization, leading to overfitted models in which establishing similarities between instances is not possible. One of these granular FCM-based classifiers is the Fuzzy-Rough Cognitive Network [15]. In this classifier, concepts denote fuzzy-rough regions that allow formalizing the causal relationships between them through well-defined construction rules; therefore, fine-tuning the weight matrix is not required. Moreover, the initial activation values are defined using the inclusion degree of the instance's similarity class into the different fuzzy-rough regions. Harmati [6] and Concepción et al. [3] independently studied the mathematical properties of this model, which allowed a better understanding of its building blocks and the optimization of its topology. Sovatzidi et al. [18] studied an FCM-based classification model dedicated to images that utilizes semantic granules to represent image characteristics.
9.6 Exercises

The following exercises are based on the example presented in Sect. 9.4 and involve modifying the algorithm's parametric settings.

1. Build and refit the FCM-based classifier using a larger number of iterations (for example, T = 100) while retaining the configuration of the remaining parameters. Did the test accuracy value change significantly?
2. Build and refit the FCM-based classifier using a large number of iterations and different phi values. Which setting reported the best and worst results?
3. Repeat the experiment using different slope and offset values. Some suggestions include slope ∈ {2.5, 3.0} and offset ∈ {−1.0, 1.0}. Which setting reported the best and worst results in this experiment?
References

1. T.G. Altundogan, M. Karakose, A new deep neural network based dynamic fuzzy cognitive map weight updating approach, in 2019 International Artificial Intelligence and Data Processing Symposium (IDAP) (2019), pp. 1–6
2. A. Amirkhani, M. Kolahdoozi, E.I. Papageorgiou, M.R. Mosavi, Classifying Mammography Images by Using Fuzzy Cognitive Maps and a New Segmentation Algorithm (Springer International Publishing, Cham, 2018), pp. 99–116
3. L. Concepción, G. Nápoles, I. Grau, W. Pedrycz, Fuzzy-rough cognitive networks: theoretical analysis and simpler models. IEEE Trans. Cybern. 52(5), 2994–3005 (2022)
4. D. Engome Tchupo, J.H. Kim, G.A. Macht, Fuzzy cognitive maps (FCMs) for the analysis of team communication. Appl. Ergon. 83, 102979 (2020)
5. W. Froelich, Towards improving the efficiency of the fuzzy cognitive map classifier. Neurocomputing 232, 83–93 (2017). Advances in Fuzzy Cognitive Maps Theory
6. I.A. Harmati, Dynamics of fuzzy-rough cognitive networks. Symmetry 13(5) (2021)
7. A.M. Hilal, H. Alsolai, F.N. Al-Wesabi, M.K. Nour, A. Motwakel, A. Kumar, I. Yaseen, A.S. Zamani, D. Oliva, Fuzzy cognitive maps with bird swarm intelligence optimization-based remote sensing image classification. Comput. Intell. Neurosci. (2022)
8. D.E. Ighravwe, D. Mashao, Development of a differential evolution-based fuzzy cognitive maps for data breach in the health-care sector, in 2019 IEEE AFRICON (2019), pp. 1–5
9. G.D. Karatzinis, N.A. Apostolikas, Y.S. Boutalis, G.A. Papakostas, Fuzzy cognitive networks in diverse applications using hybrid representative structures. Int. J. Fuzzy Syst. 1–21 (2023)
10. G.D. Karatzinis, Y.S. Boutalis, Fuzzy cognitive networks with functional weights for time series and pattern recognition applications. Appl. Soft Comput. 106, 107415 (2021)
11. J.A. Lefebre-Lobaina, M. García, Data driven fuzzy cognitive map for classification problems, in Progress in Artificial Intelligence and Pattern Recognition, ed. by Y.H. Heredia, V.M. Núñez, J.R. Shulcloper (Springer International Publishing, Cham, 2021), pp. 249–259
12. W. Liang, Y. Zhang, X. Liu, H. Yin, J. Wang, Y. Yang, Towards improved multifactorial particle swarm optimization learning of fuzzy cognitive maps: a case study on air quality prediction. Appl. Soft Comput. 130, 109708 (2022)
13. T. Mansouri, S. Vadera, Explainable fault prediction using learning fuzzy cognitive maps. Expert Syst. e13316 (2023)
14. G. Nápoles, R. Falcon, E. Papageorgiou, R. Bello, K. Vanhoof, Partitive granular cognitive maps to graded multilabel classification, in 2016 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (2016), pp. 1363–1370
15. G. Nápoles, C. Mosquera, R. Falcon, I. Grau, R. Bello, K. Vanhoof, Fuzzy-rough cognitive networks. Neural Netw. 97, 19–27 (2018)
16. S. Sabahi, P.M. Stanfield, Neural network based fuzzy cognitive map. Expert Syst. Appl. 204, 117567 (2022)
17. R. Schuerkamp, P.J. Giabbanelli, Extensions of fuzzy cognitive maps: a systematic review. ACM Comput. Surv. (2023)
18. G. Sovatzidi, M.D. Vasilakakis, D.K. Iakovidis, Fuzzy cognitive maps for interpretable image-based classification, in 2022 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (2022), pp. 1–6
19. S. Robinson, Simulation: The Practice of Model Development and Use (2004)
20. P. Szwed, Classification and feature transformation with fuzzy cognitive maps. Appl. Soft Comput. 105, 107271 (2021)
21. C. Wang, J. Liu, K. Wu, C. Ying, Learning large-scale fuzzy cognitive maps using an evolutionary many-task algorithm. Appl. Soft Comput. 108, 107441 (2021)
22. K. Wu, J. Liu, Learning large-scale fuzzy cognitive maps under limited resources. Eng. Appl. Artif. Intell. 116, 105376 (2022)
23. K. Wu, K. Yuan, Y. Teng, J. Liu, L. Jiao, Broad fuzzy cognitive map systems for time series classification. Appl. Soft Comput. 128, 109458 (2022)
24. W. Zhang, X. Zhang, Y. Sun, A new fuzzy cognitive map learning algorithm for speech emotion recognition. Math. Probl. Eng. 2017, 1–12 (2017)
Chapter 10
Addressing Accuracy Issues of Fuzzy Cognitive Map-Based Classifiers

Gonzalo Nápoles and Agnieszka Jastrze˛bska
Abstract The chapter presents an FCM-based model for pattern classification termed Long-Term Cognitive Network (LTCN). This model uses the class-per-output architecture discussed in the previous chapter and the quasi-nonlinear reasoning rule to avoid the unique fixed-point attractor. To improve its prediction capabilities, the LTCN-based classifier suppresses the constraint that weights must be in the [–1, 1] interval while using all temporal states produced by the network in the classification process. As for the tuning aspect, this classifier is equipped with two versions of the Moore-Penrose learning algorithm. Besides presenting the mathematical formalism of this model and its ensuing learning algorithm, we will develop an example that shows the steps required to solve classification problems using an existing Python implementation. The chapter also elaborates on a measure that estimates the role of each concept in the classification process and presents simulation results using real-world datasets. After reading this chapter, the reader will have acquired a solid understanding of the fundamentals of these algorithms and will be able to apply them to real-world pattern classification datasets.
10.1 Introduction

The previous chapter presented an evolutionary approach for learning FCM-based classifiers that avoided the issues posed by unique fixed-point attractors by using a quasi-nonlinear reasoning rule. However, there are two aspects that deserve further attention: the classifier's predictive power and its interpretability.
Fig. 10.1 Average covering of 200 synthetically generated FCM models
Let us first discuss the former issue and its possible causes. Unfortunately, FCM-based classifiers often underperform compared to traditional methods devoted to structured pattern classification. After excluding the unique fixed-point attractor, which is the primary cause of issues when solving prediction problems with FCM models, this poor performance can be attributed to two primary reasons. Firstly, the network topology depends on the problem domain, and hidden concepts are not allowed. That means that the number of concepts in the network is given by the number of input variables and decision classes. Secondly, the activation values of concepts and the weights are confined to a closed interval, thus limiting the classifier's approximation capabilities. A special situation related to weights and activation values being bounded emerges when using the sigmoid transfer function. Concepción et al. [6] mathematically proved that the state space of sigmoid FCM models shrinks as more iterations are performed. This result indicates that the feasible activation space containing the concepts' activation values might be smaller than the $[0, 1]$ interval. Let us illustrate this point with the aid of some simulations using synthetically generated FCM models. Figure 10.1 shows the average covering of 200 synthetically generated FCM models that converged to a fixed point. In a nutshell, the covering of a neural concept quantifies the proportion of the activation space that is reachable. For example, a covering value of 0.5 means that the concept's activation values will reach at most 50% of the whole activation space. In this experiment, the number of neural concepts was varied from 5 to 30 and the weights were uniformly distributed in the $[-1, 1]$ interval. The network connectivity was also randomly varied from 10% to 100% to simulate different network configurations and densities. The message that this simulation unmistakably conveys is that obtaining high covering values is only possible with large and densely connected networks, which is not always feasible due to design constraints.
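A minimal sketch of how such a covering experiment can be reproduced is shown below. The random generator, the number of initial activation vectors, and the use of the activation range at the final iteration as a covering proxy are assumptions, not the exact protocol behind Fig. 10.1.

```python
import numpy as np

def average_covering(n_concepts, density, T=100, n_inputs=500, seed=None):
    """Estimate the average covering of a random sigmoid FCM.

    The covering of a concept is approximated by the range of activation
    values it reaches at the last iteration across many random inputs.
    """
    rng = np.random.default_rng(seed)
    # Weights uniformly distributed in [-1, 1]; a mask enforces the connectivity
    W = rng.uniform(-1.0, 1.0, size=(n_concepts, n_concepts))
    W *= rng.random((n_concepts, n_concepts)) < density

    A = rng.random((n_inputs, n_concepts))   # random initial activation vectors
    for _ in range(T):                       # classic sigmoid FCM reasoning
        A = 1.0 / (1.0 + np.exp(-(A @ W)))
    # Proportion of the [0, 1] activation space reached per concept, averaged
    return (A.max(axis=0) - A.min(axis=0)).mean()

# Example: 10 concepts, 50% connectivity, averaged over 200 random models
print(np.mean([average_covering(10, 0.5) for _ in range(200)]))
```

Running this sketch for increasing numbers of concepts and densities reproduces the qualitative trend discussed above: small, sparse networks leave most of the activation space unreachable.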
In terms of interpretability, FCMs provide notable advantages over other machine learning models, as stressed by Apostolopoulos and Groumpos [1], Sovatzidi, Vasilakakis, and Iakovidis [19] and Jastrzębska and Cisłak [10]. This property refers to the visual depiction of the cognitive network as a signed digraph composed of meaningful concepts and their dependencies. Because of this, FCM-based classifiers can be counted among Explainable Artificial Intelligence methods. However, an important distinction should be made about the transparency of the dynamic and static models attached to FCM-based classifiers. While the static model refers to the graph structure, the dynamic model refers to the recurrent reasoning mechanism. The former is considered a white box since both neural concepts and weights have a specific meaning for the problem domain. In contrast, the dynamic model (everything that happens from the moment we feed the FCM-based classifier with a given input until we produce a decision class) is deemed a black box. This chapter will cover theoretical and practical aspects of an FCM-inspired classifier proposed by Nápoles, Salgueiro, Grau, and Leon-Espinosa [14] that attempts to overcome the accuracy issues of traditional FCM classifiers. Moreover, we will discuss a procedure to estimate the relevance of neural concepts in the classification process to enhance the model's interpretability.
10.2 Long-Term Cognitive Network-Based Classifier

This section elaborates on the different building blocks devoted to increasing the performance of FCM models used in classification tasks. These building blocks cover a generalization of the classic FCM formalism, a recurrence-aware reasoning rule, and a learning algorithm that uses the Moore-Penrose inverse.
10.2.1 Generalizing the Traditional FCM Formalism

The first challenge to be resolved concerns the limited approximation capabilities of FCM-based classifiers. While we are not allowed to add explicit hidden neurons and layers, as happens with other neural models, nothing prevents us from having more expressive weights. This can be done by assuming that weights computed from historical data using a learning algorithm do not necessarily have a causal meaning and can take any real values instead of being confined to the $[-1, 1]$ interval. Note that not having weights denoting causal relationships does not hinder the network's interpretability since these weights can be understood as coefficients in the regression model $a_i^{(t+1)} = f\big(\sum_j w_{ji} a_j^{(t)}\big)$. Another modification that will enhance the FCM's prediction power consists of loosening the constraint that a concept cannot be considered a cause of itself. Having an explicit auto-regressive component therefore allows a concept's activation value to be determined by the other connected concepts and its own previous activation value.
The Long-Term Cognitive Networks (LTCNs) [15] implement these modifications, thus leading to a generalization of the classic FCM formalism. In a nutshell, the main advantage of this model is that it allows expanding the covering of its neural concepts, which translates into improved prediction capabilities. Note that this model by itself does not prevent the network from converging to a unique fixed-point attractor. Therefore, an LTCN-based classifier will benefit from being coupled with the quasi-nonlinear reasoning mechanism presented in Chap. 8.

Background Information
The LTCN model was originally aimed at capturing long-term dependencies between input and output variables, specifically in scenarios requiring multiple outputs simultaneously. In that model, it was assumed that experts provided the whole weight matrix characterizing the interaction between the neural concepts. This model used a nonsynaptic learning algorithm to improve the network's accuracy by fine-tuning the sigmoid function parameters attached to each neural concept instead of the weights, as usually happens when training neural systems. This strategy is comparable to the calibration method presented in Chap. 8, which recomputed the sigmoid function offset after removing some relationships from the network.
Similarly to the previous chapter, we can rely on the class-per-output architecture to design an LTCN-based classifier. As a reminder, this architecture has two neural blocks: the inner and outer blocks. The former contains neurons that represent input variables connected with each other, while the latter contains the output neurons that receive incoming connections from the inner block.
10.2.2 Recurrence-Aware Decision Model

While the quasi-nonlinear reasoning rule ensures that the unique fixed-point attractor will not appear, the mathematical analysis in [12] does not ensure that the network will converge. That means that cyclic and chaotic behaviors could still be observed. Moreover, the FCM-based classifier discussed in the previous chapter does not use the information concerning the temporal states (i.e., intermediate activation vectors) to determine the decision class to be assigned to each instance. Instead, decisions are made from the last concepts' activation vectors after the reasoning process is stopped, allowing for a limited number of weights to be computed by the learning algorithm. Actually, in the traditional view, the number of parameters to be estimated is given by the number of problem features multiplied by the number of decision classes. Let us suppose we additionally connect the temporal states produced by the model with the output concepts. In that case, we can account for the convergence of the network while increasing the number of learnable parameters.
Fig. 10.2 Recurrence-aware decision model of LTCN-based classifiers
Figure 10.2 outlines the recurrence-aware decision model that exploits the information of the network's temporal states. This model involves four types of weights characterizing the interactions between neural concepts. While $\mathbf{W}$ defines the relationships between input concepts in the inner block, $\mathbf{R}$ establishes direct relationships between the temporal states produced during reasoning and the output concepts. The reader will notice that this model incorporates bias concepts that are permanently activated (denoted with the gray nodes). Therefore, $\mathbf{B}$ and $\mathbf{Q}$ gather the bias weights attached to the temporal states and the output concepts, respectively. We would like to draw attention to the $\mathbf{R}^{(t)}$ matrices connecting the temporal states with the output concepts. On the one hand, they allow using the trajectory towards the final state induced by the initial activation vector as additional information to produce the decision class. That is why the decision model is "recurrence aware". On the other hand, the number of additional parameters to be estimated is given by $N \times T \times M$, thus increasing the classifier's generalization capabilities. Note that connections in $\mathbf{R}$ are not visualized in the graph because they depend on the iteration number. That is why they have an associated superscript denoting the iteration of the temporal state being connected with the outer block.

Background Information
It is worth mentioning that the notion of forwarding a raw (or slightly altered) input signal in a manner that skips a few layers in a neural model is the core idea behind the extremely popular and effective neural architecture called a ResNet (Residual Neural Network) introduced by He et al. [8]. ResNet allows so-called skip connections or shortcuts that omit some layers.
Let us discuss how to use these weight matrices to perform reasoning. To lighten the notation as much as possible, we will express the equations in their matrix form.
Therefore, we should first elaborate on the shape of these matrices and the information they convey. $\mathbf{W}$ is an $N \times N$ matrix and contains the weights between the neurons in the inner block, while $\mathbf{B}$ denotes a bias matrix of size $1 \times N$. While the bias term increases the network's approximation power, its interpretation differs from the other weight terms. However, one can inspect these terms to determine how much the predictions rely on the problem features. If the bias term is zero, then we can claim that the concept's activation value exclusively depends on the connected concepts. If the bias is different from zero, then it captures the inherent part of the concept's activation value that the connected concepts cannot explain. These two matrices will be automatically learned from data in an unsupervised manner (i.e., without using the information of labels attached to each problem instance). $\mathbf{R}$ is an $N(T+1) \times M$ matrix that implements the recurrence-aware mechanism. $\mathbf{Q}$ holds bias values for each decision neural concept and, thus, is of size $1 \times M$. These matrices will also be automatically learned from the training data but using a supervised approach (i.e., using the information of both problem features and labels). Note that it is assumed that all recurrence-aware weight matrices resulting from the learning process will be gathered in a single matrix of size $N(T+1) \times M$. Therefore, it is convenient to gather all temporal states produced by the network in a single matrix of size $1 \times N(T+1)$. Mathematically, this can be done by concatenating the temporal states horizontally as follows:

$$\mathbf{H}_k^{(T)} = \left( \mathbf{A}^{(0)} \,\middle|\, \mathbf{A}^{(1)} \,\middle|\, \mathbf{A}^{(2)} \,\middle|\, \dots \,\middle|\, \mathbf{A}^{(T-1)} \,\middle|\, \mathbf{A}^{(T)} \right). \tag{10.1}$$

Due to the particularities of this LTCN-based classifier, we will have different reasoning rules in the inner and outer blocks.

• Reasoning in the inner block. It uses the weight matrices $\mathbf{W}$ and $\mathbf{B}$ to implement the quasi-nonlinear reasoning rule that includes a bias term,

$$\mathbf{A}^{(t)} = \phi\, f\!\left(\mathbf{A}^{(t-1)} \mathbf{W} + \mathbf{B}\right) + (1 - \phi)\, \mathbf{A}^{(0)}. \tag{10.2}$$

• Reasoning in the outer block. It uses the weight matrices $\mathbf{R}$ and $\mathbf{Q}$ to implement the recurrence-aware mechanism that includes a bias term,

$$\mathbf{Y} = f\!\left(\mathbf{H}_k^{(T)} \mathbf{R} + \mathbf{Q}\right). \tag{10.3}$$

Although the previous mathematical formalization of the LTCN-based classifier's reasoning focuses on processing a single input at a time, the matrix form allows processing batches of instances without any further modification. That feature makes both the reasoning and learning steps quite efficient while reducing the implementation hassle that iterative procedures typically require.
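The matrix-form reasoning of Eqs. (10.1)–(10.3) can be sketched in a few lines of NumPy. The function below is an illustration that assumes the weight matrices have already been learned; it is not the actual ltcn package code.

```python
import numpy as np

def ltcn_reason(X, W, B, R, Q, T=10, phi=0.9):
    """Sketch of LTCN reasoning for a batch X of shape (K, N).

    W: (N, N) inner weights, B: (1, N) inner biases,
    R: (N*(T+1), M) recurrence-aware weights, Q: (1, M) output biases.
    """
    f = lambda z: 1.0 / (1.0 + np.exp(-z))  # sigmoid transfer function
    A, states = X, [X]                      # A^(0) and the list of temporal states
    for _ in range(T):
        # Quasi-nonlinear reasoning rule with bias term, Eq. (10.2)
        A = phi * f(A @ W + B) + (1.0 - phi) * X
        states.append(A)
    H = np.hstack(states)                   # H^(T) of size (K, N*(T+1)), Eq. (10.1)
    return f(H @ R + Q)                     # outer-block reasoning, Eq. (10.3)
```

Because every operation is a matrix product or an element-wise map, the same code handles a single instance or a whole batch, which is precisely the efficiency argument made above.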
10.2.3 Learning Algorithm for LTCN-Based Classifiers

Having outlined the architecture and reasoning issues, let us discuss how to compute the $\mathbf{W}$ and $\mathbf{B}$ matrices in the inner block and the $\mathbf{R}$ and $\mathbf{Q}$ matrices arriving at the outer block. Similarly to the previous chapter, we assume that the training data is split into two matrices $\mathbf{X}$ and $\mathbf{Y}$. The former is of dimension $K \times N$ and contains the problem instances described by their features. The latter is of dimension $K \times M$ and contains the one-hot encoded expected outputs.

Let us start by addressing the inner block. The matrices in this block are computed using an unsupervised learning approach that exploits the information stored in the problem features while excluding the class labels from this analysis. Overall, the goal is to learn the relationships characterizing the interaction between the features and the bias term. To compute the $i$-th column of the matrix $\mathbf{W}$ and the corresponding bias value, we will use the following formula:

$$\begin{pmatrix} b_i \\ \mathbf{W}_i \end{pmatrix} = \left(\mathbf{L}^{\top}\mathbf{L}\right)^{-1} \mathbf{L}^{\top} \mathbf{X}_i, \tag{10.4}$$

where $\mathbf{X}_i$ represents the $i$-th column of the training set. $\mathbf{L}$ is a $K \times (N+1)$ matrix obtained after replacing the $i$-th column of the matrix $\mathbf{X}$ with zeros and concatenating a $K \times 1$ column vector full of ones, where $K$ is the number of training instances. We would like to point the reader back to Fig. 10.2, where the first block is depicted with the use of circles and the matrices involved in the computations are noted on the edges. Notice that most programming languages allow parallelizing this operation such that all columns are computed simultaneously.

The matrices connecting the inner and outer blocks are computed using a supervised learning approach. These weights connect the problem features and the temporal states produced by the inner block with the decision classes; they are gathered in the weight matrix $\mathbf{R}$ and the bias terms $\mathbf{Q}$ associated with the output concepts. Equation (10.5) displays how to execute this procedure in a single step,

$$\begin{pmatrix} \mathbf{R} \\ \mathbf{Q} \end{pmatrix} = \left(\mathbf{H}^{(T)} \,\middle|\, \mathbf{1}\right)^{\ddagger} \mathbf{Y}, \tag{10.5}$$

where the $\mathbf{1}$ symbol denotes a $K \times 1$ matrix filled with ones and $(\cdot)^{\ddagger}$ represents the Moore-Penrose (MP) inverse [16], which is a convenient choice to solve the least squares problem when a matrix is not invertible [3]. We refer the reader back to Fig. 10.2, where we have visualized where the $\mathbf{R}$ and $\mathbf{Q}$ matrices are located within the classifier's decision process.

Note that the more iterations the LTCN-based classifier performs, the larger the number of parameters to be estimated by the supervised learning algorithm. This is a pivotal difference from other FCM-based models, where performing more iterations often damages their predictive performance due to convergence issues. However, the number-of-iterations parameter should be configured wisely as it can still negatively impact the algorithm's efficiency.
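A minimal NumPy sketch of both learning steps is shown below. It is an illustration of Eqs. (10.4) and (10.5) rather than the actual package code; the position of the bias entry in the stacked solution is an implementation assumption, and np.linalg.pinv stands in for the MP inverse.

```python
import numpy as np

def learn_inner_block(X):
    """Unsupervised estimation of W and B, one least-squares fit per feature (Eq. 10.4)."""
    K, N = X.shape
    W, B = np.zeros((N, N)), np.zeros((1, N))
    for i in range(N):
        L = X.copy()
        L[:, i] = 0.0                                  # replace the i-th column with zeros
        L = np.hstack([L, np.ones((K, 1))])            # concatenate the ones column
        sol, *_ = np.linalg.lstsq(L, X[:, i], rcond=None)
        W[:, i], B[0, i] = sol[:-1], sol[-1]           # split weights and bias
    return W, B

def learn_outer_block(H, Y):
    """Supervised estimation of R and Q via the Moore-Penrose inverse (Eq. 10.5)."""
    Phi = np.hstack([H, np.ones((H.shape[0], 1))])     # Phi = (H^(T) | 1)
    sol = np.linalg.pinv(Phi) @ Y                      # MP inverse solves the LSQ problem
    return sol[:-1], sol[-1:]                          # R: N(T+1) x M, Q: 1 x M
```

As noted above, the per-column loop in the inner block is embarrassingly parallel, so a vectorized or multi-threaded variant is straightforward.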
! Caution:
While convergence to a unique fixed point will not bring any overflow issues when evaluating exponential-based activation functions, the algorithm's efficiency might be affected if we perform too many iterations. For example, every iteration the model performs after convergence concatenates a repeated state vector (or state matrix if we are operating with batches) to the $\mathbf{H}^{(T)}$ matrix. These repeated states bring no additional information that can be exploited to increase the model's discriminatory capabilities. Instead, they make the learning algorithm unnecessarily slower and potentially unstable from a numerical viewpoint.
Another issue that might arise when the LTCN-based classifier performs too many iterations is overfitting. This situation typically emerges when machine learning models are over-parametrized (i.e., there are too many learnable parameters to be estimated using the available training data). Overfitted models tend to perform very well on the training data and poorly on the test data. One popular strategy when training deep neural networks is to enforce a regularization strategy that forces them to learn general patterns rather than specific ones involving noise.

Background Information
Regularization is a technique in machine learning aimed at preventing overfitting by adding a penalty term to the optimization process that computes a model's learnable parameters. Tikhonov regularization, also known as ridge regression or $\ell_2$ regularization, is one such method. It introduces a penalty term proportional to the square of the Euclidean norm of the model's parameters. This penalty term encourages smaller parameter values, effectively shrinking the parameter estimates toward zero and reducing the impact of individual features. By controlling the regularization strength through a user-defined hyperparameter $\alpha \geq 0$, this regularization technique strikes a balance between fitting the training data and controlling the magnitude of the parameters, resulting in more robust models.
Equation (10.6) displays another learning rule that incorporates a ridge regularization term to control overfitting when estimating the $\mathbf{R}$ and $\mathbf{Q}$ matrices,

$$\begin{pmatrix} \mathbf{R} \\ \mathbf{Q} \end{pmatrix} = \left(\boldsymbol{\Phi}^{\top}\boldsymbol{\Phi} + \alpha\mathbf{D}\right)^{-} \boldsymbol{\Phi}^{\top}\mathbf{Y}, \tag{10.6}$$

where $\boldsymbol{\Phi} = \left(\mathbf{H}^{(T)} \,\middle|\, \mathbf{1}\right)$ such that $\mathbf{1}$ represents a $K \times 1$ matrix filled with ones, $\mathbf{D}$ is the diagonal matrix of $\boldsymbol{\Phi}^{\top}\boldsymbol{\Phi}$, while $(\cdot)^{-}$ and $(\cdot)^{\top}$ denote the matrix inverse and transpose operations, respectively. As mentioned, $\alpha \geq 0$ is a hyperparameter that indicates how
much regularization will be applied. The larger the value of this hyperparameter, the more regularized the model will be. However, excessively large penalty values might cause the model to underfit the training data.
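A hedged sketch of the regularized rule in Eq. (10.6) follows; using np.linalg.solve is one way to apply the inverse when $\alpha > 0$ leaves the regularized Gram matrix well-conditioned.

```python
import numpy as np

def learn_outer_block_ridge(H, Y, alpha=1.0E-4):
    """Ridge-regularized estimation of R and Q (Eq. 10.6)."""
    Phi = np.hstack([H, np.ones((H.shape[0], 1))])   # Phi = (H^(T) | 1)
    G = Phi.T @ Phi                                  # Gram matrix Phi' Phi
    D = np.diag(np.diag(G))                          # diagonal matrix of Phi' Phi
    sol = np.linalg.solve(G + alpha * D, Phi.T @ Y)  # (Phi' Phi + alpha D)^- Phi' Y
    return sol[:-1], sol[-1:]                        # R and Q
```

Scaling the penalty by the diagonal of the Gram matrix, as the text describes, makes the regularization strength comparable across columns with different magnitudes.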
10.3 Model-Dependent Feature Importance Measure

After training, the LTCN-based classifier is ready to predict the corresponding class labels for unseen instances. Furthermore, the static model learned from data can be analyzed to determine the most important concepts in the decision-making process. This can be done by inspecting the relationships that connect the concepts. We emphasize that the LTCN-based classifier does not contain hidden neurons that might hinder understanding how the model makes decisions, thus facilitating its analysis.

In this section, we will use a centrality-based scoring function to compute concept relevance. It is deemed a model-specific post-hoc explanation method since it uses the network's inner knowledge structures. The first neural block allows each neural concept to be directly linked with others in the input layer. This is perhaps the most exciting part of the LTCN model from the perspective of a person interpreting it. Naturally, the human capacity to grasp and understand relationships deteriorates as the number of problem variables increases. Therefore, when it comes to human understanding, the more straightforward the model, the easier it is to interpret. At the same time, reducing the number of problem variables induces a risk of lowering the model's prediction capabilities. We seek a method to evaluate the variables so that, if we decide to remove some, we can point toward the worst ones according to the assumed criterion.

Nápoles et al. [14] developed a scoring function to estimate the relevance of problem features encoded as neural concepts in LTCN-based classifiers. This model-specific post-hoc method relies on the degree centrality of concepts in the classifier's static model to determine influential neural entities. To compute the relevance score $\Omega$ of neural concepts in the model, we consider the weights in both sub-networks of the LTCN-based classifier. Weights related to biases are not considered since there is no direct mapping between them and the input. Equation (10.7) shows the centrality-based score function for the $i$-th input concept in an LTCN-based classifier with $M$ output concepts,

$$\Omega(c_i) = \sum_{j=1}^{N} |w_{ij}| + \sum_{j=1}^{M} \sum_{t=0}^{T} \left|r_{ij}^{(t)}\right|, \quad w_{ij} \in \mathbf{W},\ r_{ij}^{(t)} \in \mathbf{R}. \tag{10.7}$$
By evaluating the concepts, we assess features. We assume that significant weights are the ones with high absolute values. This notion is relatively straightforward and was also empirically shown in the previous chapter. The proposed scoring function allows comparing the variables describing the data through the learned weights. The score heavily relies on the values considered
during the recurrence-aware reasoning step. Therefore, it captures, to some extent, the dynamic behavior of the model. Nonetheless, complete information about the dynamic aspect of the model is absent because the relevance score does not consider the model's activation values and convergence status.
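Equation (10.7) reduces to a couple of array reductions over the learned weight matrices, as sketched below. The reshape assumes that the rows of R stack the N concepts of each temporal state in order, matching the concatenation in Eq. (10.1); this layout is an assumption of the sketch.

```python
import numpy as np

def relevance_scores(W, R, T):
    """Centrality-based relevance of each input concept (Eq. 10.7)."""
    N = W.shape[0]
    inner = np.abs(W).sum(axis=1)   # sum over j of |w_ij| in the inner block
    # R has shape (N*(T+1), M); grouping rows per temporal state yields
    # (T+1, N, M), so summing over iterations and outputs scores each concept
    outer = np.abs(R).reshape(T + 1, N, -1).sum(axis=(0, 2))
    return inner + outer

# Features can then be ranked via np.argsort(-relevance_scores(W, R, T))
```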
10.4 How to Use the LTCN-Based Classifier in Practice?

In this section, we will describe how to apply the LTCN-based classifier to a real-world classification dataset using a publicly available Python implementation. To install the required package, we can just call the command pip install ltcn from the command line or a Jupyter Notebook. To develop our example, we will use the Iris flower dataset, which is a well-known and frequently used benchmark dataset in machine learning and pattern recognition. It consists of measurements from 150 instances of Iris flowers, each represented by four features: sepal length, sepal width, petal length, and petal width. The dataset is labeled with three different species of Iris flowers: Setosa, Versicolor, and Virginica. Importantly, there are 50 instances for each class, making it a perfectly balanced problem. Table 10.1 shows an excerpt of this dataset.

Let us load the data, already available in the sklearn library, one-hot encode the decision classes, and create a k-fold cross-validation object to inspect the model's generalization capability. In short, this capability refers to a machine learning model's ability to perform well on unseen data, thus measuring how effectively the model can learn patterns and make accurate predictions beyond the training data. A model with good generalization capability exhibits a balance between fitting the training data well and avoiding overfitting, which occurs when the model memorizes the training examples but fails to generalize to new instances.
Table 10.1 Randomly selected instances from the Iris Flower dataset

Sepal length | Sepal width | Petal length | Petal width | Setosa | Versicolor | Virginica
5.1 | 3.5 | 1.4 | 0.2 | • | |
4.9 | 3.1 | 1.5 | 0.1 | • | |
6.3 | 2.9 | 5.6 | 1.8 | | | •
5.8 | 2.7 | 4.1 | 1.0 | | • |
5.0 | 3.4 | 1.6 | 0.4 | • | |
6.4 | 2.7 | 5.3 | 1.9 | | | •
Background Information
Stratified k-fold cross-validation is a technique used to evaluate classifiers. It divides the data into k equally sized folds and ensures that each fold contains a proportional number of instances from each decision class. This approach is particularly useful when working with imbalanced datasets, where one or more classes may have much fewer instances than the others. By stratifying the available data, we can ensure that the model is trained on a representative sample of each class, which can improve its performance and reduce the risk of overfitting.
The following code snippet implements the necessary steps to load and prepare the data used to create the LTCN-based classifier.

```python
import numpy as np
import pandas as pd
from sklearn import datasets

# The Iris dataset is loaded as a Pandas DataFrame
iris = datasets.load_iris(as_frame=True)

# We retrieve the X values as they are
X = iris.data.values
# We one-hot-encode the target values
Y = pd.get_dummies(iris.target).values
```
Let us break down this code snippet into two separate steps to understand the functions used to prepare the data for learning.

• Firstly, the Iris dataset is loaded using the load_iris function from datasets. By setting as_frame=True, the dataset is loaded as a Pandas DataFrame, allowing for easier data handling and analysis.
• Secondly, the input features (X) are retrieved from the loaded dataset using the data.values attribute of the Iris DataFrame. These values represent the measurements of sepal length, sepal width, petal length, and petal width for each iris sample. The target values (Y) are one-hot-encoded using the get_dummies function from Pandas. This transformation converts the categorical target variable (the Iris species) into a binary representation, where each class has a separate column with a binary value indicating its presence.

The following step consists of creating a stratified 5-fold cross-validation object. This is done as follows:

```python
from sklearn.model_selection import StratifiedKFold

# We create a 5-fold cross-validation object
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
```
Again, let us break down this piece of code to understand the role of these functions, their parameters, and their values.
• Firstly, a 5-fold cross-validation object is created using the StratifiedKFold class from sklearn.model_selection. Stratified k-fold cross-validation ensures that the class distribution is preserved in each fold, which is important for maintaining representative training and test subsets.
• Secondly, the split function of the cross-validation object will be called to generate the partitions used during cross-validation. This function takes the input features (X) and the target labels as arguments and yields the train/test indexes for each of the specified folds.

We are ready to use the partitions created by the split function to build the classifier and test its performance on unseen data. We will use Cohen's kappa score to measure performance, although the accuracy metric is a viable option since the decision classes are perfectly balanced. The snippet below keeps its line numbers because the exercises in Sect. 10.8 refer to them; it assumes that cohen_kappa_score has been imported from sklearn.metrics and LTCN from the installed ltcn package.
```python
 1  scores = []
 2  for train_index, test_index in skf.split(X, iris.target):
 3
 4      # We retrieve the split's training and test data
 5      X_train, X_test = X[train_index], X[test_index]
 6      Y_train, Y_test = Y[train_index], Y[test_index]
 7
 8      # We create the classifier using the training data
 9      model = LTCN(T=10, phi=0.9, function='sigmoid', method='ridge', alpha=1.0E-4)
10      model.fit(X_train, Y_train)
11
12      # We predict one-hot encoded classes for the test data
13      Y_pred = model.predict(X_test)
14
15      # We compute the Cohen's kappa score for the test data
16      scores.append(cohen_kappa_score(np.argmax(Y_test, axis=1),
17                                      np.argmax(Y_pred, axis=1)))
18
19  # We print the average Cohen's kappa score
20  print(sum(scores) / len(scores))  # 0.96
```
In the code snippet, a loop is performed using the cross-validation splits obtained from the skf.split function. In each iteration, the training and test subsets to be used in that fold are extracted from X and Y. Then the LTCN-based classifier is configured as follows: the number of iterations is set to 10, the nonlinearity coefficient is set to 0.9, and the sigmoid function is used as the activation function. Moreover, we use the regularized learning rule shown in Eq. (10.6) such that the penalty parameter is set to 1.E-04. The classifier is fitted by calling the fit function using the problem features (X_train) and decision classes (Y_train).
After training is done, the predict function is used to assign class labels to the test data (X_test). This function produces a value for each decision class such that the largest value marks the class label index to be returned. Such indexes can be obtained with the argmax function. After running this operation on both the predicted and ground-truth data, the resulting indexes are compared to evaluate the model's performance. As mentioned, we will use Cohen's kappa score as the performance metric. Finally, the average kappa score across all folds is computed, providing an estimation of the model's performance on unseen data. As shown in the last line, the average Cohen's kappa score is 0.96, which is aligned with the results reported in the literature for this classification dataset.
10.5 Empirical Evaluation of the LTCN Classifier

In this section, we will explore the performance of the LTCN-based classifier in a series of empirical experiments involving real-world classification datasets. Moreover, we conduct a hyperparameter sensitivity analysis.
10.5.1 Pattern Classification Datasets

The experiments involve the 30 classification problems outlined in Table 10.2. For convenience, all datasets have numerical features and no missing values. Features were normalized using the min-max method, excluding the boundaries to avoid issues with the inverse of the sigmoid function. A closer inspection of Table 10.2 reveals that there are both small and relatively large datasets. For example, while vehicle0, vehicle1, vehicle2, and vehicle3 have 846 instances, two classes, and 18 features, plant-shape, plant-margin, and plant-texture have 1,600 instances, 64 features, and 100 classes. Furthermore, we tested the model on heavily imbalanced datasets. For example, the winequality-white dataset contains samples belonging to seven classes, and the imbalance ratio between the two border classes is 440:1.
10.5.2 Does the LTCN Classifier Outperform the FCM Classifier?

Let us start the empirical analysis by contrasting the predictive behavior of the classic FCM-GA classifier presented in Chap. 9 using $\phi = 1.0$ with two LTCN-based classifiers using $\phi = 1.0$ and $\phi = 0.0$, which are extreme cases. $\phi = 1.0$ ensures that the model will converge to a unique fixed-point attractor, while $\phi = 0.0$ suppresses the network's recurrence, thus leading to a logistic regression model. In all cases,
Table 10.2 Real-world classification datasets used for simulation purposes

ID | Name | Instances | Features | Classes | Imbalance
D1 | banana | 5,300 | 2 | 2 | No
D2 | bank | 4,520 | 16 | 2 | 7:1
D3 | cardiotocography-10 | 2,126 | 19 | 10 | 11:1
D4 | cardiotocography-3 | 2,126 | 35 | 3 | 10:1
D5 | mfeat-factors | 2,000 | 216 | 10 | No
D6 | mfeat-fourier | 2,000 | 77 | 10 | No
D7 | mfeat-karhunen | 2,000 | 64 | 10 | No
D8 | mfeat-morphological | 2,000 | 6 | 10 | No
D9 | mfeat-pixel | 2,000 | 240 | 10 | No
D10 | mfeat-zernike | 2,000 | 25 | 10 | No
D11 | musk2 | 6,598 | 164 | 2 | No
D12 | optdigits | 5,620 | 64 | 10 | No
D13 | page-blocks | 5,473 | 10 | 5 | 175:1
D14 | pendigits | 10,992 | 13 | 10 | No
D15 | plant-margin | 1,600 | 64 | 100 | No
D16 | plant-shape | 1,600 | 64 | 100 | No
D17 | plant-texture | 1,600 | 64 | 100 | No
D18 | segment | 2,301 | 19 | 7 | No
D19 | spambase | 846 | 18 | 2 | No
D20 | vehicle | 846 | 18 | 4 | No
D21 | vehicle0 | 846 | 18 | 2 | No
D22 | vehicle1 | 846 | 18 | 2 | No
D23 | vehicle2 | 846 | 18 | 2 | No
D24 | vehicle3 | 846 | 18 | 2 | No
D25 | waveform | 5,000 | 40 | 3 | No
D26 | winequality-red | 1,599 | 11 | 6 | 68:1
D27 | winequality-white | 4,898 | 11 | 7 | 440:1
D28 | yeast | 1,484 | 8 | 10 | 93:1
D29 | yeast1 | 1,484 | 8 | 2 | No
D30 | yeast3 | 1,484 | 8 | 2 | 8:1
the number of iterations $T$ is set to 20 and the sigmoid function is employed to clip the concepts' activation values. Figure 10.3 displays a radar plot with the Cohen's kappa scores for all classification problems and the three classification models being compared. In this figure, the preferred classifier is the one reporting the largest area described by the Cohen's kappa scores across all datasets. Figure 10.3 shows that the classic FCM classifier performs worse than the logistic regression model. Such a result is explained by three aspects. Firstly, the generalization capability of the classic FCM formalism is quite limited since weights
Fig. 10.3 Cohen’s kappa scores obtained by FCM and LTCN classifiers for each dataset (the larger the area, the batter)
and activation values are bounded in an interval. Secondly, the number of learnable parameters is strictly conditioned by the number of problem features. Thirdly, the model converges to a unique fixed point in most cases, which causes the classifier to recognize a single decision class. That is why the kappa score is often zero. In contrast, the LTCN-based classifier does not seem affected by the unique fixed-point attractors while outperforming the logistic regression model.
10.5.3 Hyperparameter Sensitivity Analysis

A critical aspect of any classifier is understanding the effect of its hyperparameters on its performance. In the case of the LTCN-based classifier, a relevant hyperparameter is the nonlinearity coefficient, which controls the balance between the features extracted by the inner neural block and the initial conditions used to feed the network. Another hyperparameter that might impact the algorithm's efficiency is the number of iterations in the recurrent reasoning mechanism. Next, we will study the classifier's performance when varying the values of the nonlinearity coefficient (from 0.0 to 1.0) and the maximum number of iterations (from 5 to 20).

Figure 10.4 presents the Cohen's kappa scores for different parametric settings and selected datasets showing interesting patterns. The results of this analysis indicate that the parametric setting $\phi = 1.0$ does not necessarily imply that we will obtain the best-performing classifier. The choice of the optimal nonlinearity coefficient value is dataset-dependent. However, we can notice that values slightly smaller than one often lead to decent Cohen's kappa scores. As for the number of iterations, the results indicate that Cohen's kappa does not increase
Fig. 10.4 Cohen's kappa scores obtained by the LTCN classifier for selected datasets (a D4, b D18, c D25, d D26, e D28, f D30) when varying the nonlinearity coefficient and the number of iterations
much after performing five iterations. This behavior is explained by the fact that the model often converges to a fixed-point attractor, which might or might not be unique, as that depends on the nonlinearity coefficient value.
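A sensitivity sweep of this kind can be scripted in a few lines, as sketched below. It assumes that X, Y, and the label vector y are prepared as in Sect. 10.4 and that LTCN has been imported from the ltcn package; the grid values mirror the ranges analyzed in the text, while the remaining settings are carried over from the earlier example rather than prescribed here.

```python
import itertools
import numpy as np
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import StratifiedKFold

# Grids mirroring the analysis: nonlinearity coefficient and iteration budget
phis = [round(0.1 * i, 1) for i in range(11)]   # 0.0, 0.1, ..., 1.0
iterations = [5, 10, 15, 20]
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

results = {}
for phi, T in itertools.product(phis, iterations):
    scores = []
    for train_index, test_index in skf.split(X, y):
        # LTCN is assumed to be imported from the ltcn package (see Sect. 10.4)
        model = LTCN(T=T, phi=phi, function='sigmoid',
                     method='ridge', alpha=1.0E-4)
        model.fit(X[train_index], Y[train_index])
        Y_pred = model.predict(X[test_index])
        scores.append(cohen_kappa_score(np.argmax(Y[test_index], axis=1),
                                        np.argmax(Y_pred, axis=1)))
    results[(phi, T)] = np.mean(scores)

# Report the (phi, T) pair with the largest average kappa on this dataset
print(max(results, key=results.get), max(results.values()))
```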
10.5.4 Comparison of LTCN with State-of-the-Art Classifiers

As the next step in our empirical study, we will contrast the performance of the LTCN-based classifier against state-of-the-art classifiers. The selected algorithms are SVM (Support Vector Machines), LR (Logistic Regression), DT (Decision Trees), RF (Random Forests), MLP (Multilayer Perceptron), RIPPER (Repeated Incremental Pruning to Produce Error Reduction) as described in [2], SOM (Self-Organizing Maps), GAM (Generalized Additive Models) as described in [22], LightGBM (Light Gradient Boosting Machine), HyRS (Hybrid Rule Set) as described in [21], and FRCN (Fuzzy-Rough Cognitive Networks) as described in [13]. The latter is particularly pertinent since it refers to a granular FCM-based classifier that detects consistent/inconsistent patterns and exploits them toward performing classification. Noticeably, its learning algorithm focuses on fine-tuning the initial activation values of concepts instead of the weights between concepts.

In this experiment, we report the Cohen's kappa score obtained by each classifier averaged over the 30 datasets. Moreover, we include the accuracy measure to illustrate how it can lead to over-optimistic performance scores, and the training time needed to process a single fold. We performed stratified 5-fold nested cross-validation (i.e., with hyperparameter tuning using the grid search method) to promote that each model reaches optimal performance. The reader is referred to [14] for more details about the tuned hyperparameters and their values.

Background Information
The k-fold nested cross-validation employs a two-level cross-validation approach involving outer and inner cross-validation procedures controlled by two loops. In the outer loop, the data is divided into k folds such that k iterations are performed. Each iteration holds out one fold as a test set, while the remaining k−1 folds are combined and used for model construction and validation. The inner loop is performed within each outer loop iteration and focuses on tuning the model's hyperparameters. It divides the training data in the outer cross-validation's current iteration into another set of k−1 folds for training and one fold for validation, enabling the selection of optimal hyperparameters using a grid search procedure. The grid search involves creating a grid with all possible hyperparameter combinations, fitting a model for each combination on the training set, and evaluating all these models on the validation set. This process is repeated for all folds in the inner cross-validation, resulting in an average performance score for each combination. The hyperparameter combination with the highest average performance score is selected, and a final model is fitted on the full training set from the outer loop using the best-performing hyperparameter
setting. Finally, this optimized model is evaluated on the test set held out in the current outer loop iteration. It is crucial to emphasize that the test set should never be used for model construction or hyperparameter tuning.
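The procedure described in the box can be sketched for the LTCN-based classifier as follows. The hyperparameter grid below is hypothetical (the tuned values are reported in [14]), and X, Y, and the label vector y are assumed to be prepared as in Sect. 10.4, with LTCN imported from the ltcn package.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import StratifiedKFold

# Hypothetical grid; the actual tuned hyperparameters are reported in [14]
grid = [{'phi': phi, 'alpha': alpha}
        for phi in (0.7, 0.8, 0.9, 1.0) for alpha in (1.0E-4, 1.0E-2, 1.0)]

def fold_kappa(params, X_tr, Y_tr, X_te, Y_te):
    # LTCN is assumed to be imported from the ltcn package (see Sect. 10.4)
    model = LTCN(T=10, function='sigmoid', method='ridge', **params)
    model.fit(X_tr, Y_tr)
    Y_pred = model.predict(X_te)
    return cohen_kappa_score(np.argmax(Y_te, axis=1), np.argmax(Y_pred, axis=1))

outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

outer_scores = []
for tr, te in outer.split(X, y):
    # Inner loop: average validation kappa of every grid point on the training fold
    best = max(grid, key=lambda p: np.mean(
        [fold_kappa(p, X[tr][itr], Y[tr][itr], X[tr][ite], Y[tr][ite])
         for itr, ite in inner.split(X[tr], y[tr])]))
    # Refit on the full outer training fold with the selected setting
    outer_scores.append(fold_kappa(best, X[tr], Y[tr], X[te], Y[te]))

print(np.mean(outer_scores))  # performance estimate on held-out data
```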
Table 10.3 displays the average accuracy, Cohen's kappa score, and training time for each classifier on the test data after performing nested 5-fold cross-validation. The results indicate that LTCN, SVM, RF, and LightGBM are the best-performing algorithms as they report the largest Cohen's kappa scores. Other neural classifiers like MLP and FRCN also perform well. In terms of training time, DT and LR are the fastest algorithms, while SOM and RIPPER are the slowest. As expected, the classifiers providing interpretability features (SOM, DT, LR, and RIPPER) perform poorly on the test data compared to the black boxes (SVM, RF, and LightGBM). The dissimilarities between the accuracy and Cohen's kappa scores are also noticeable, which advocates against using the accuracy score for assessing the classifiers' performance.

Overall, the poor performance of SOM, DT, LR, and RIPPER evidences the challenges imposed by the trade-off between interpretability and accuracy in machine learning, where the complexity of deep and machine learning models conflicts with their transparency and understandability. While the literature reports several agnostic post-hoc explanation methods for black-box models, these methods might lead to misleading explanations. Furthermore, post-hoc methods often produce different explanations for the same problem instances. The LTCN-based classifier equipped with its model-dependent post-hoc method provides a good balance between interpretability and accuracy. This claim is supported by the fact that the LTCN-based classifier performs similarly to black boxes while being intrinsically interpretable to a large extent.
Table 10.3 Average accuracy, Cohen's kappa and training time reported by each classifier after performing nested 5-fold cross-validation

Classifier | Accuracy | Cohen's kappa | Training time
SVM | 0.8733 | 0.7671 | 0.21
LR | 0.8139 | 0.6256 | 0.09
DT | 0.7686 | 0.6404 | 0.01
RF | 0.8625 | 0.7522 | 0.53
MLP | 0.8503 | 0.7278 | 0.19
RIPPER | 0.9369 | 0.6669 | 1.95
SOM | 0.5362 | 0.3376 | 2.98
LTCN | 0.8651 | 0.7613 | 0.14
10.6 Illustrative Case Study: Phishing Dataset

Next, let us present a case study concerning a binary classification problem devoted to distinguishing between phishing and legitimate web pages [5]. This study aims to demonstrate the inner workings of the LTCN classifier and the model-specific post-hoc method to compute feature importance. The dataset contains 10,000 instances described by 48 features and is perfectly balanced since 5,000 instances are labeled as phishing and the remaining 5,000 instances as safe.

To solve this problem, we will use the LTCN-based classifier trained with the MP method and $\phi = 0.8$, while the number of iterations is set to 20. It should be noted that this number of iterations might be larger than needed. However, we wanted to test and visualize what happens when the recurrent reasoning mechanism performs more iterations than necessary. Such a study seems relevant since the optimal number of iterations is problem-dependent and unknown in advance. The stratified 5-fold nested cross-validation reports an accuracy of 96%, which can be seen as a quality measure of the classifier's inner knowledge structures.

Figure 10.5 displays the weights connecting the recurrent sub-network with the decision-making layer. Interestingly, the behavior of the weights reflects that the network converged to a fixed point after fewer than 15 iterations. Consequently, the learning algorithm will continue to produce the same weights after convergence. Figure 10.6 shows the distribution of the learned weights that connect the neural blocks of the LTCN-based classifier. One may notice that the weights produced by
Fig. 10.5 Behavior of learned LTCN weights
Fig. 10.6 Histogram of learned LTCN weights
the Moore-Penrose method follow a zero-mean normal distribution with a reasonable standard deviation. Such a distribution translates into the weight matrix being relatively sparse. This effect is highly desirable since near-zero weights carry little information and could be skipped when interpreting the model.

Remark: Notice that the weights do not change significantly after 12 iterations and that a large number of these weights are close to zero, thus indicating that the model is converging to a fixed-point attractor. We also know that the fixed-point attractor cannot be unique because the nonlinearity parameter used to run this experiment is strictly less than one. Overall, the simulations indicate that the LTCN-based classifier's learning algorithm is quite robust to performing an unnecessarily large number of iterations, although doing so does increase the training time.
The last step in our analysis concerns the model-specific post-hoc method. This can be done by exploring the learned weights to derive feature importance, as formalized in Eq. (10.7). The relevance score uses the absolute values of all weights linked to a given neural concept to determine its role in the decision-making process. The weights are those that connect the input concepts within the inner block and those that connect the inner and outer blocks. Figure 10.7 portrays the relevance scores for the ten most important features in the phishing dataset as extracted by the post-hoc method from the trained LTCN-based classifier.

The relevance scores indicate that the number of dash symbols (F5) is a pivotal feature in recognizing malicious web pages. Other features that play an important
Fig. 10.7 Relevance scores for the 10 most important features in the phishing dataset (the larger the score, the more relevant the feature)
role in the classification process include the number of dash symbols in the host name (F6), the number of query components (F11), and whether HTTPS is obfuscated in the host name (F20). Typically, the outcome of such a post-hoc procedure should be forwarded to domain experts to validate its correctness.

Determining the importance of problem features using the discussed model-specific post-hoc method involves two related limitations. On the one hand, this method does not account for the recurrent nature that governs the reasoning of LTCN-based classifiers. While the static model of LTCN-based classifiers is defined solely by their weights, their dynamic model is influenced by factors such as initial conditions, activation functions, convergence properties, and reasoning rules. On the other hand, the post-hoc method assumes that input concepts will retain their semantics during the whole reasoning process. However, whether an input concept will continue representing a given problem feature during reasoning depends on the meaning of the relationships in the first neural block.
10.7 Further Readings

Recent literature has showcased interesting applications and theoretical developments in the field of pattern classification using FCM-based classifiers. For example, Sovatzidi et al. [19] introduced an approach to image classification, leveraging FCMs in conjunction with high-level features extracted from a Convolutional Neural Network. Cárdenas et al. [4] trained a model specifically tailored for arrhythmia classification utilizing ECG data. Homenda et al. [9] delved into time series classification, combining fuzzy clustering techniques for automatic concept extraction and similarity-based comparison of FCM models.

In an effort to enhance the accuracy of FCM-based classifiers, Yu et al. [24] integrated the capsule network into FCM reasoning, resulting in the development of novel inference rules. Their methodology involved metaheuristics for weight learning, optimizing a cross-entropy objective function with additional constraints to balance interpretability and classification performance. Frias et al. [7] followed a different research avenue concerning symbolic FCM-based classifiers, which offer heightened interpretability through the symbolic representation of both weights and concept activation values.

Another angle that deserves a separate note is the resemblance between the recurrence-aware decision model discussed in this chapter and the High-Order Fuzzy Cognitive Map (HoFCM) model in Stach et al. [20]. The latter attempted to overcome the limitations of traditional FCMs in capturing the behaviors of complex systems due to their limited first-order dynamics (i.e., a concept's state in a given iteration depends solely on its state in the preceding iteration). In the HoFCM model, however, a concept's state in the $t$-th iteration is determined from its activation values in the previous $K$ iterations (from $t-1$ down to $t-K$) and also from the past activation values of the connected concepts. The reader is referred to [11, 17, 18, 23, 25] to delve into the usage of this model for forecasting time series.
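For illustration, a $K$-th order update of this kind can be written as follows; the exact aggregation used in [20] may include additional transformations of the past states, so this should be read as an illustrative form rather than the paper's precise definition:

$$a_i^{(t+1)} = f\!\left(\sum_{k=0}^{K-1} \sum_{j=1}^{N} w_{ji}^{(k)}\, a_j^{(t-k)}\right),$$

where $w_{ji}^{(k)}$ weighs the influence that the activation of concept $c_j$, observed $k$ iterations in the past, exerts on concept $c_i$.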
The main difference between the recurrence-aware decision model and the HoFCM model is that the latter passes all aggregated values through the activation function, which is often linked to the FCM’s convergence issues.
10.8 Exercises

The following exercises are based on the example presented in Sect. 10.4 and involve modifying the algorithm's parametric settings.

1. Build and refit the LTCN model using the following regularization penalties (denoted by the alpha parameter): 1.E-2, 1.E-1, 0.5 and 1.0. Compute the prediction error for each configuration and compare the results.
2. Build and refit the LTCN model using the parameter method='inverse' instead of method='ridge'. Do the results change?
3. Replace line 9 with the call to the FCM-based classifier introduced in the previous chapter. Make sure to use the same number of iterations and nonlinearity coefficient. Which algorithm performed the best?
References

1. I.D. Apostolopoulos, P.P. Groumpos, Fuzzy cognitive maps: their role in explainable artificial intelligence. Appl. Sci. 13(6) (2023)
2. S. Asadi, Evolutionary fuzzification of RIPPER for regression: case study of stock prediction. Neurocomputing 331, 121–137 (2019)
3. S.L. Campbell, C.D. Meyer, Generalized Inverses of Linear Transformations (Society for Industrial and Applied Mathematics, 2009)
4. O.A. Cárdenas, L.M. Flores Nava, F.G. Castaneda, J.A. Moreno Cadenas, ECG arrhythmia classification based on fuzzy cognitive maps, in 2019 16th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE) (2019), pp. 1–4
5. K.L. Chiew, C.L. Tan, K.S. Wong, K.S.C. Yong, W.K. Tiong, A new hybrid ensemble feature selection framework for machine learning-based phishing detection system. Inf. Sci. 484, 153–166 (2019)
6. L. Concepción, G. Nápoles, R. Falcon, K. Vanhoof, R. Bello, Unveiling the dynamic behavior of fuzzy cognitive maps. IEEE Trans. Fuzzy Syst. 29(5), 1252–1261 (2021)
7. M. Frias, G. Nápoles, Y. Filiberto, R. Bello, K. Vanhoof, A preliminary study on symbolic fuzzy cognitive maps for pattern classification, in Applied Computer Sciences in Engineering, ed. by J.C. Figueroa-García, M. Duarte-González, S. Jaramillo-Isaza, et al. (Springer International Publishing, 2019), pp. 285–295
8. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 770–778
9. W. Homenda, A. Jastrzębska, Time-series classification using fuzzy cognitive maps. IEEE Trans. Fuzzy Syst. 28(7), 1383–1394 (2020)
10. A. Jastrzębska, A. Cisłak, Interpretation-aware cognitive map construction for time series modeling. Fuzzy Sets Syst. 361, 33–55 (2019)
11. Z. Liu, J. Liu, A robust time series prediction method based on empirical mode decomposition and high-order fuzzy cognitive maps. Knowl.-Based Syst. 203, 106105 (2020)
12. G. Nápoles, I. Grau, L. Concepcion, L. Koutsoviti Koumeri, J.P. Papa, Modeling implicit bias with fuzzy cognitive maps. Neurocomputing 481, 33–45 (2022)
13. G. Nápoles, C. Mosquera, R. Falcon, I. Grau, R. Bello, K. Vanhoof, Fuzzy-rough cognitive networks. Neural Netw. 97, 19–27 (2018)
14. G. Nápoles, Y. Salgueiro, I. Grau, M.L. Espinosa, Recurrence-aware long-term cognitive network for explainable pattern classification. IEEE Trans. Cybern. 1–12 (2022)
15. G. Nápoles, F. Vanhoenshoven, R. Falcon, K. Vanhoof, Nonsynaptic error backpropagation in long-term cognitive networks. IEEE Trans. Neural Netw. Learn. Syst. 31(3), 865–875 (2020)
16. R. Penrose, A generalized inverse for matrices. Math. Proc. Camb. Philos. Soc. 51(3), 406–413 (1955)
17. B. Qiao, J. Liu, P. Wu, Y. Teng, Wind power forecasting based on variational mode decomposition and high-order fuzzy cognitive maps. Appl. Soft Comput. 129, 109586 (2022)
18. F. Shen, J. Liu, K. Wu, Multivariate time series forecasting based on elastic net and high-order fuzzy cognitive maps: a case study on human action prediction through EEG signals. IEEE Trans. Fuzzy Syst. 29(8), 2336–2348 (2021)
19. G. Sovatzidi, M.D. Vasilakakis, D.K. Iakovidis, Automatic fuzzy graph construction for interpretable image classification, in 2022 IEEE International Conference on Image Processing (ICIP) (2022), pp. 3743–3747
20. W. Stach, L. Kurgan, W. Pedrycz, Higher-order fuzzy cognitive maps, in NAFIPS 2006 – 2006 Annual Meeting of the North American Fuzzy Information Processing Society (2006), pp. 166–171
21. T. Wang, Gaining free or low-cost interpretability with interpretable partial substitute, in Proceedings of the 36th International Conference on Machine Learning, Proceedings of Machine Learning Research, vol. 97, ed. by K. Chaudhuri, R. Salakhutdinov (PMLR, 2019), pp. 6505–6514
22. S.N. Wood, N. Pya, B. Safken, Smoothing parameter and model selection for general smooth models. J. Am. Stat. Assoc. 111(516), 1548–1563 (2016)
23. K. Wu, J. Liu, P. Liu, S. Yang, Time series prediction using sparse autoencoder and high-order fuzzy cognitive maps. IEEE Trans. Fuzzy Syst. 28(12), 3110–3121 (2020)
24. T. Yu, Q. Gan, G. Feng, G. Han, A new fuzzy cognitive maps classifier based on capsule network. Knowl.-Based Syst. 250, 108950 (2022)
25. K. Yuan, J. Liu, S. Yang, K. Wu, F. Shen, Time series forecasting based on kernel mapping and high-order fuzzy cognitive maps. Knowl.-Based Syst. 206, 106359 (2020)
Index
A
Absolute change, 71 · Access to participants, 20 · Accuracy, 171 · Activation function, 47 · Agent-based modeling, 63 · Aggregate-level model, 9, 62 · Assigning FCMs to agents, 68 · Average distance, 98

B
Betweenness centrality, 91 · Binary classification, 168 · Bivalent, 47 · Bonferroni-Holm post-hoc procedure, 160

C
Cellular automaton, 62 · Centrality, 91 · Chaotic, 50 · Classification, 166 · Class imbalance, 170 · Closeness centrality, 91 · Clustering, 166 · Clustering coefficient, 99 · CMA-ES, 134 · Cognitive map, 3 · Cohen's kappa score, 210 · Collective intelligence, 13 · Composability of changes, 71 · Constraint, 125 · Constructivist psychology, 4 · Convergence, 50, 145, 200 · Convergence plot, 53 · Crossover, 128 · CUDA cores, 81 · CUDA-Hybrid, 79 · Cyclic, 50

D
Data collection, 29 · Data processing, 168 · Decision class, 166 · Deep FCM (DFCM), 107 · Degree centrality, 91 · Delay, 106 · Dependencies, 68 · Deterministic, 9 · Diameter, 98 · Differential Evolution (DE), 159 · Distributed Evolutionary Algorithms in Python (DEAP), 128 · Dynamical system, 5

E
Eigenvector centrality, 91 · Elicitation, 23 · Eliminating relationships, 149 · Ensemble IVFCM, 107 · Evolutionary algorithm, 134 · Expanded FCM, 107 · Expert systems, 11 · Explanatory model, 8 · Extended-FCM (E-FCM), 107

F
Facilitation skills, 39 · Facilitator, 26 · FCM for Discrete Random Variables (FCM4DRV), 107 · FCMpy, 36 · Feature normalization, 169 · Features, 166 · Feedback loop, 100 · Fitness, 126, 149 · Fixed-point, 50 · F1-score, 172 · Friedman's test, 160 · Fuzzy Grey Cognitive Maps (FGCM), 107 · Fuzzy membership function, 36 · Fuzzy-rough cognitive networks, 209

G
Gaussian probability density function, 150 · Generalized Logistic Function (GLF), 107 · Genetic algorithms, 128 · Genome, 123 · Genotypical, 130 · Global stabilization, 52 · Graph theory, 87 · Group knowledge, 3 · Group modeling, 22

H
Halting conditions, 71 · Hand-drawn model, 29 · Hidden layer, 174 · High-Order FCM (HFCM), 107 · High-Order Intuitionistic FCM, 107 · Hold-out set, 168 · Human-in-the-loop, 144 · Hybrid modeling, 63 · Hybrid simulation, 63 · Hyperbolic tangent, 47 · Hyperparameter tuning, 210

I
In-degree, 91 · Instances, 166 · Interpretability, 5 · Interval-Valued FCM (IVFCM), 107 · Interventions, 8 · Intuitionistic FCM (iFCM-II), 107 · Iteration, 47

K
Katz centrality, 91 · Keep It a Learning Tool (KILT), 65 · Keep It Descriptive Stupid (KIDS), 65 · Keep It Simple Stupid (KISS), 65 · Kernel density estimator, 150 · k-fold nested cross-validation, 209 · Knowledge representation, 11

L
Label, 166 · Linguistic variable, 36 · Long-Term Cognitive Network (LTCN), 193

M
Majority class, 170 · Mask, 125 · Memory, 46 · Mental model, 3 · MentalModeler, 4, 20 · Minority class, 170 · Missing data imputation, 170 · Mixing, 129 · Modeling software, 29 · Moore-Penrose inverse, 146, 212 · Multi-class classification, 168 · Multi-output prediction, 153 · Multivariate distribution, 68, 170 · Mutation, 129

N
Network generator, 66 · Network science, 87 · NetworkX, 99 · Neural networks, 5 · Neuron, 47 · Neutrosophic Cognitive Map (NCM), 107 · Nonlinear relation, 106 · Non-linear system, 47 · Normalization, 148

O
One-class classification, 168 · One-hot encoding, 169 · Open-ended concepts, 30 · Ordinary concept, 89 · Orthogonal projection, 147 · Overfitting, 171 · Oversample, 171

P
Parallelism, 81 · Partial stabilization, 52 · Participant-led mapping, 25 · Participant recruitment, 37 · Particle Swarm Optimization (PSO), 138, 159 · Pen-and-paper, 30 · Phenotypical, 130 · Population synthesis, 66 · Post-it notes, 30 · Precision, 171 · Pre-defined concepts, 30 · Predictive model, 8 · Purposive sampling, 37

Q
Quasi-nonlinear reasoning rule, 144 · Questionnaire, 26

R
Real-Coded Genetic Algorithms (RCGA), 159 · Recall, 171 · Receiver concept, 89 · Receiver-transmitter ratio, 98 · Regularization, 200 · Relative change, 71 · Relevance scores, 212 · Replicability, 68 · Representative sample, 75 · Risks to participants, 39 · Ruleset, 63

S
Sampling design, 25 · Scalability, 137 · Selection, 129 · Semantic memory, 4 · Semi-structured interview, 25 · Sensitivity analysis, 207 · Sigmoid, 47, 150 · Simulated annealing, 138 · Simulation, 8 · Snowball sampling, 38 · Sparse weight matrices, 142 · Standardized concepts, 22 · State space, 49 · Steady state simulation, 52 · Stochastic, 10

T
Tabular data, 166 · Temporal conflicts, 71 · Test set, 168 · Time-Delay Mining Fuzzy Time Cognitive Map (TM-FTCM), 107 · Time-Interval FCM (TI-FCM), 107 · Tradeoff of complex models, 73 · Training data, 166 · Training set, 168 · Transmitter concept, 89 · Trivalent, 47 · Type I FCM, 46 · Type II FCM, 46

U
Uncertainty, 106 · Undersample, 171 · Unfolding, 174 · Univariate distribution, 170 · Unsupervised Dynamic FCM (UDFCM), 107 · Update, 6, 45, 144

V
Validation, 97

W
Warm-up period, 52 · Wilcoxon signed-rank test, 160