English Pages 178 [175] Year 2023
Studies in Computational Intelligence 1095
Massimo Buscema · Masoud Asadi-Zeydabadi · Giulia Massini · Weldon A. Lodwick · Marco Breda · Riccardo Petritoli · Francis Newman · Francesca Della Torre
The Topological Weighted Centroid: A New Vision of Geographic Profiling Theory and Applications
Studies in Computational Intelligence Volume 1095
Series Editor Janusz Kacprzyk, Polish Academy of Sciences, Warsaw, Poland
The series “Studies in Computational Intelligence” (SCI) publishes new developments and advances in the various areas of computational intelligence—quickly and with a high quality. The intent is to cover the theory, applications, and design methods of computational intelligence, as embedded in the fields of engineering, computer science, physics and life sciences, as well as the methodologies behind them. The series contains monographs, lecture notes and edited volumes in computational intelligence spanning the areas of neural networks, connectionist systems, genetic algorithms, evolutionary computation, artificial intelligence, cellular automata, self-organizing systems, soft computing, fuzzy systems, and hybrid intelligent systems. Of particular value to both the contributors and the readership are the short publication timeframe and the world-wide distribution, which enable both wide and rapid dissemination of research output. Indexed by SCOPUS, DBLP, WTI Frankfurt eG, zbMATH, SCImago. All books published in the series are submitted for consideration in Web of Science.
Massimo Buscema Semeion Centro Ricerche di Scienze della Comunicazione Rome, Italy
Masoud Asadi-Zeydabadi Department of Physics University of Colorado Denver Denver, CO, USA
Giulia Massini Semeion Centro Ricerche di Scienze della Comunicazione Rome, Italy
Weldon A. Lodwick Department of Mathematical and Statistical Sciences University of Colorado Denver Denver, CO, USA
Marco Breda Semeion Centro Ricerche di Scienze della Comunicazione Rome, Italy
Francis Newman Department of Radiation Oncology University of Colorado Denver Denver, CO, USA
Riccardo Petritoli Semeion Centro Ricerche di Scienze della Comunicazione Rome, Italy
Francesca Della Torre Semeion Centro Ricerche di Scienze della Comunicazione Rome, Italy
ISSN 1860-949X ISSN 1860-9503 (electronic) Studies in Computational Intelligence ISBN 978-3-031-28900-2 ISBN 978-3-031-28901-9 (eBook) https://doi.org/10.1007/978-3-031-28901-9 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This monograph describes in detail the foundations and applications of a new approach to geographic profiling called the Topological Weighted Centroid. Both example applications and small, simple illustrative examples are provided. This monograph is designed for those interested in the dynamics of geographical processes such as criminal or terrorist activity, the evolution of a disease in time and space, archeological dynamics, environmental issues, and political or cultural processes. The basic underlying principles center around looking at geographical relationships from a free energy and entropy point of view. One of the algorithms couples the analysis with graphs and minimal spanning trees, and another couples the geographical analysis with an artificial adaptive system (Auto-Contractive Maps). Associated with the algorithms presented are C and Matlab codes. The Matlab code is found in the Appendix. The C code can be downloaded from https://github.com/SemeionResearch/TWC.
Massimo Buscema, Rome, Italy
Masoud Asadi-Zeydabadi, Denver, USA
Giulia Massini, Rome, Italy
Weldon A. Lodwick, Denver, USA
Marco Breda, Rome, Italy
Riccardo Petritoli, Rome, Italy
Francis Newman, Denver, USA
Francesca Della Torre, Rome, Italy
Contents
1 Introduction
Reference
2 Precursor to TWC
2.1 Theory of Location and Geographic Profiling—Geometric Systems
2.2 Topological Systems
2.3 Comments
References
3 The Theory of TWC
3.1 General Introduction to TWC Theory
3.1.1 The Four Types of TWC
3.1.2 How the TWC Data Is Organized
3.2 Theoretical Components of TWC
3.2.1 Distance
3.2.2 Attraction Strength
3.2.3 TWC Trajectories
3.2.4 Entropy
3.2.5 Free Energy
3.2.6 Alpha Star
3.3 TWC Algorithms
3.4 TWC-Alpha
3.4.1 TWC Alpha Point
3.4.2 Alpha Map/Scalar Field
3.5 Self-Topological Weighted Centroid (STWC)
3.5.1 Beta Star
3.6 TWC-Beta
3.6.1 Beta Map
3.7 TWC-Gamma
3.7.1 Gamma Paths
3.7.2 Gamma Map
3.8 TWC-Theta
3.8.1 Theta Paths
3.8.2 Theta Distances
3.8.3 Nonlinear Minimum Spanning Tree
3.8.4 Theta Map
3.8.5 Theta Transition Matrix via Markov Chains
3.8.6 Discrete Time Markov Chain (DTMC)
3.9 TWC—Iota
3.9.1 Iota Projected Points
3.9.2 Meta-Distance in TWC—Iota
3.9.3 Meta-Clusters
3.9.4 Fuzzy Membership to the Meta-Clusters
3.9.5 Iota Map
3.9.6 Meta-Beta
3.10 Summary
References
4 Illustrative Examples
4.1 Two Simple Example Data Sets
4.2 TWC Maps
4.2.1 TWC-Alpha
4.2.2 TWC-Beta
4.2.3 TWC-Gamma
4.2.4 TWC-Theta
4.3 Summary
5 Advanced TWC Topics
5.1 TWC Windowing
5.2 Semantic TWC
5.2.1 Introduction
5.2.2 Theory and Equations
5.2.3 A Comparison Between Semantic and Syntactic TWC (50 Terrorist Attacks in Afghanistan in 2009)
5.2.4 Summary
References
6 TWC Applications
6.1 Categories of TWC Applications
6.2 Example Applications
6.2.1 Crime: Seven Robberies in Denver, Colorado, USA, December 2010
6.2.2 Crime—Italian Unibomber, Veneto, Italy, 1991–2006
6.2.3 Crime—24 Drug Arrests—Denver, CO, USA, 2010–11
6.2.4 Cocaine Trafficking in London, 2006: A Real Case
6.2.5 Epidemics—Ebola Virus Outbreak of 2014
6.2.6 Epidemics—Dengue Fever in Brazil 2001
6.2.7 Epidemics: COVID-19 USA—2021 Analysis
6.2.8 Disease Dynamics—The German HUS in May 2011
6.2.9 Disease Dynamics: Listeria USA 2011
6.2.10 Disease Dynamics: Hawaii, 2010 Food Poisoning
6.2.11 Environment—Fire, Italy, 2013
6.3 A Brief Comparison Between the Geometric and Topological Approaches
6.3.1 German HUS (May 2011)
6.3.2 Oahu Food Poisoning Outbreak 2010
6.3.3 Ebola Epidemic (Africa 2014–2018)
6.4 Why TWC is Powerful Compared to Some Other Algorithms: The Russian Flu in Sweden (1889–1890)
6.4.1 The TWC Analysis of the Swedish Spanish Flu Outbreak
6.4.2 Comparison TWC with Other Algorithms
6.5 Strengths and Weakness of TWC
References
7 Appendices—MATLAB Programs
7.1 TWC-Alpha
7.2 TWC-Beta
7.3 TWC-Gamma
7.4 TWC-Theta
7.5 TWC-Theta Minimal Spanning Tree
Index
Chapter 1
Introduction
Topological Weighted Centroid (TWC) is a set of algorithms that takes as input a distribution of locations whose two-dimensional (latitude/longitude) geographical coordinates are known and extracts patterns and dynamics, that is, information. In particular, TWC imputes the past “source” that caused the current distribution of data, as well as its future configurations. By “source” we mean the location, with respect to minimal energy, from which the given distribution of the input data of the geographical event in question would be attained. TWC is an intelligent geographical information system method that has been outlined in the literature; see, for example, [1]. This data-driven approach was developed on the basis of statistical thermodynamics, diffusion theory, and the optimization of free energy and entropy. Generally speaking, the TWC algorithms identify the center locations from which the impetus of the current configuration of data arose or will arise. These locations are calculated as past sites, present sites, near-future sites, and longer-term future sites. This monograph uses the word “phenomenon” to denote various events that occur across geographical space (disease, terror attacks, robberies, drug distribution, and so on).
There are four sets of TWC algorithms [1]: (1) TWC-Original, (2) TWC-Frequency, (3) TWC-Semantic, and (4) TWC-Windowing. Each of the four types of TWC has five algorithms that can be applied: TWC-Alpha, TWC-Beta, TWC-Gamma, TWC-Theta, and TWC-Iota.
This monograph deals with maps defined by a discrete set of points in the two-dimensional space of real numbers $\mathbb{R}^2$, arranged as a rectangular grid. The number of points in the grid is determined by the desired or required resolution of the analysis. There is also a given distribution of points, the input data, which occupies a subset of the grid points, most often a proper subset of the set of all grid points of the rectangular area of interest. Square areas are chosen to simplify the subscript notation. Thus we define two sets of points as follows.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. Buscema et al., The Topological Weighted Centroid: A New Vision of Geographic Profiling, Studies in Computational Intelligence 1095, https://doi.org/10.1007/978-3-031-28901-9_1
Definition 1 Given a two-dimensional rectangular geographic region, a map at a given $M \times N$ resolution is the set of $MN$ points on the resulting grid. Its elements are called grid points:
$$\{(x_i, y_j) : 1 \le i \le M,\ 1 \le j \le N\}. \tag{1.1}$$
Definition 2 Given a map, a subset $U$ of the map consisting of $K \le MN$ points, called a set of distribution points, assigned points, or data points (the input data set), is the set $U = \{(x_k, y_l)\}$ of grid points with $k \in \{1, \ldots, M\}$, $l \in \{1, \ldots, N\}$, and most often $K < MN$.
[...] $> B$, $0$ otherwise
M = total number of distribution points,
n = index of each distribution point,
i, j = indices of each grid point,
X, Y = coordinates of each grid point,
x, y = coordinates of each distribution point,
B = buffer zone,
f, g = parameters that need to be fixed according to the data,
p = parameter, determined experimentally, for $p_{ij}$ to be a probability.
2. The Canter algorithm (see [5, 12]). This algorithm is also very well known and widely used in the geographic profiling literature. Its basic equation is very simple:
$$p_{ij} = \sum_{n=1}^{M} \alpha\, e^{-\beta \sqrt{(X_i - x_n)^2 + (Y_j - y_n)^2}} \tag{2.2}$$
where α, β = free parameters adapted to the data set.
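As a rough illustration of Eq. (2.2), the following Python sketch evaluates a Canter-style exponential decay score on a small rectangular grid. The function name `canter_surface`, the sample coordinates, and the values α = 1 and β = 0.5 are illustrative assumptions and do not come from the text. The second distance term is taken at the grid coordinate $Y_j$, so that $p_{ij}$ depends on both grid indices.

```python
import math

def canter_surface(grid_x, grid_y, events, alpha=1.0, beta=0.5):
    """Return p[i][j] = sum over events of alpha * exp(-beta * Euclidean distance)."""
    surface = []
    for X in grid_x:
        row = []
        for Y in grid_y:
            score = 0.0
            for (x, y) in events:
                score += alpha * math.exp(-beta * math.hypot(X - x, Y - y))
            row.append(score)
        surface.append(row)
    return surface

# Hypothetical data: three event locations on a 5 x 5 grid.
events = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0)]
grid_x = [0.0, 1.0, 2.0, 3.0, 4.0]
grid_y = [0.0, 1.0, 2.0, 3.0, 4.0]
p = canter_surface(grid_x, grid_y, events)

# Indices (i, j) of the grid cell with the highest score.
best = max(
    ((i, j) for i in range(len(grid_x)) for j in range(len(grid_y))),
    key=lambda ij: p[ij[0]][ij[1]],
)
print("highest-scoring grid cell:", best)
```

Larger values of β make the score fall off faster with distance, so the surface concentrates more tightly around the events. In practice both parameters would be fitted to the data set, as the text notes.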
2.2 Topological Systems
A new approach that has appeared relatively recently is the topological approach. It shares many characteristics of the geometrical approach, and some of the differential-algebraic approach as well. However, the new approach makes it possible to estimate both the location of the outbreak point and the dynamic unfolding of the phenomenon by working only on the spatial distribution of the events, without any reference to the chronology and frequency of the observations. That is, the topological approach is based purely on the spatial characteristics of the phenomenon. Moreover, unlike the two previous approaches, the topological one works on the basis of an optimization principle: it adaptively explores the fitness landscape with respect to certain quantities (free energy and entropy) and determines the optimal solution accordingly. The aim of this section is to introduce the theoretical foundations of this topological approach in a systematic manner, and to illustrate its potential through its application to a few real case studies.
Since the topological approach does not need the actual sequence and frequency of observations to derive the structural features of the underlying process and to predict its future unfolding, it can be applied not only to other kinds of diffusion phenomena, as was the case for the geographical profiling approach, but also to cases where the spatial distribution of the events was not the outcome of a diffusion process in the proper sense of the word. For instance, the topological method can be applied to characterize the structural features of the distribution of geographical locations of points that describe a certain type of human activity (urban settlements, social and economic activities, and so on). Such distributions were not actually shaped by the diffusion of a phenomenon, or more generally by the action of a well-recognizable agent, but they nevertheless yield a spatial pattern that is observationally indistinguishable from one generated by an epidemic or by an agent-driven process. We speak in this case of a pseudo-diffusion pattern or, if we take the processes of the phenomenon as the conceptual benchmark, of a pseudo-phenomenon pattern. It will be shown that the topological approach to spatial distributions of human activities such as epidemic patterns allows us to gain considerable insight into their structural characteristics, as well as to effectively forecast their future unfolding. The topological approach, therefore, has a vast range of applications to diverse classes of problems where the spatial distribution has key relevance.
2.3 Comments
Understanding the spatial and temporal structure of geographic events is a theme of major concern in scientific as well as policy-making institutions. Properly tracking events that occur in geographic regions is a formidable challenge that calls for a comprehensive, multidisciplinary approach and for state-of-the-art mathematical tools able to forecast the spatial diffusion patterns of geographic events. Their time dynamics is essential for effective action. A mistaken action based on a geographic analysis is generally very costly financially and may result in the loss of life.
Classical ways to approach these issues have a differential-algebraic base: the already available data are used to calibrate a set of differential equations that provide the best possible prediction of the further unfolding of the geographic events. We can therefore speak of a differential-algebraic approach to the mathematical study of geographic events. The key to this modelling strategy is to capture the structure of the interactions among the main variables involved in the process, and to set the free parameters so that the observed dynamic pattern is reproduced as accurately as possible, which offers the best available guarantee of future predictive accuracy. That is, the traditional geographic models, for example models for epidemics or the predator-prey models of biomathematics, are mathematical models that rely on the correct encoding of the cause-effect relationships of the events in question (disease spread, for example), and data are used to fit the model accordingly. Despite its intuitive appeal and its wide adoption, this approach has important limitations.
1. The interaction among variables needs to be modelled a priori, on the basis of previous data generated by the same process.
2. Large sets of temporal and spatial data are needed in order to effectively calibrate the model parameters.
3. The number of free parameters needed to attain a reasonable data fit could be relatively large.
4. Validation of the model is generally an interpolation based on data from the same epidemic, or an extrapolation based on data from similar epidemics.
5. Even when the dynamical modelling provides useful cause-effect relations, calibrating the main parameters and estimating connection strengths between variables are difficult tasks, and parameters typically change in time as the phenomenon evolves.
A more systematic quantitative comparison between TWC and the other algorithms of geographic profiling may be found in, for example, [13]. A brief comparison between the geometric methods of geographic profiling and the topological methods of TWC is given in Sect. 6.3, after the example applications. The comparisons use only TWC-Alpha and TWC-Beta, since the other TWC quantities have no counterparts in the classic geographic profiling methods.
References
1. P.L. Brantingham, P.J. Brantingham, Environmental Criminology (Waveland Press Inc, Prospect Heights IL, 1981)
2. D.K. Rossmo, Geographic Profiling (CRC Press, Boca Raton, FL, 2000), p. 2000
3. N. Levine and Associates, CrimeStat III—A Spatial Statistical Program for the Analysis of Crime Incident Locations (The National Institute of Justice, Washington DC, 2004), pp. 101–102
4. M. O’Leary, A new mathematical technique for geographic profiling, in The NIJ Conference, Washington DC, 17–19 June 2006
5. D. Canter, Mapping Murder: The Secrets of Geographic Profiling (Virgin Publishing, London, 2007)
6. S.C. Le Comber, D.K. Rossmo, A.N. Hassan, D.O. Fuller, J.C. Beier, Geographic profiling as a novel spatial tool for targeting infectious disease control. Int. J. Health Geograph. 10, Num. 35 (8 pages) (2011)
7. S.C. Le Comber, B. Nicholls, D.K. Rossmo, P.A. Racey, Geographic profiling and animal foraging. J. Theor. Biol. 240, 233–240 (2006)
8. P.M. Barone, R.M. Di Maggio, S. Mesturini, Materials for the study of the locus operandi in the search for missing persons in Italy, Forensic Sciences Research 2020 preprint, pp. 1–11 (published online February 23, 2021) (2020). https://doi.org/10.1080/20961790.2020.1854501
9. V. Berezowski, D. MacGregor, J. Ellis, I. Moffat, X. Mallett, More than an offender location tool: geographic profiling and body deposition sites. J. Police Crim. Psychol. (2021). https://doi.org/10.1007/s11896-021-09475-6
10. S. Curtis-Ham, W. Bernasco, O.N. Medvedev, D.L.L. Polasche, A national examination of the spatial extent and similarity of offenders’ activity spaces using police data. ISPRS Int. J. Geo-Inf. 10(2), 47 (2021). https://doi.org/10.3390/ijgi10020047
11. G. Hajela, M. Chawla, A. Rasool, A clustering based hotspot identification approach for crime prediction. Procedia Comput. Sci. 167, 1462–1470 (2020)
12. D. Canter, T. Coffey, M. Huntley, C. Missen, Predicting serial killers’ home base using a decision support system. J. Quant. Criminol. 16 Num. 4, 457–478 (2000)
13. M. Buscema, E. Grossi, A. Bronstein, W. Lodwick, M.A. Zeydabadi, R. Benzi, F. Newman, A new algorithm for identifying possible epidemic sources with application to the German Escherichia Coli Outbreak. ISPRS Int. J. Geo-Inf. 2, 155–200 (2013). https://doi.org/10.3390/ijgi2010155
Chapter 3
The Theory of TWC
3.1 General Introduction to TWC Theory
TWC theory consists of several components. This chapter reviews the key elements that make up the theory and the constituent algorithms that are used in TWC. The theoretical components used in the various TWC algorithms include (1) pseudo-distances, (2) attraction strength, (3) entropy, and (4) free energy. The development of these components is followed by the theory of the TWC algorithms. The main goal of the topological approach is to extract from the geographical data the dynamics and the patterns implicit in the data and to make them explicit and visible. Figure 3.1 shows the main divisions of the TWC theory. Each division is further characterized by different strategic quantities and outputs. The theoretical aspects of the five TWC algorithms depicted in Fig. 3.1 can be summarized as follows.
1. TWC-Alpha represents a spatial estimate of the point or area where the process under examination “originated” (the outbreak), where “originated” refers to the source of its impetus from an energy and entropy point of view. In certain contexts “originated” means where the process started; in other contexts it means where the process would locate itself in the most efficient way with respect to energy expenditure. That is, the outbreak points can also be interpreted as the points in which the impetus of the process under examination is concentrated. This estimate is provided both at the point level (TWC alpha point) and at the “heat” map level (Alpha map).
2. TWC-Beta represents the current likely distribution of the process under consideration in the short term. The heat map of Beta also represents the places where other hidden events of the same type could have occurred.
3. TWC-Gamma represents the likely future evolution of the TWC-Beta distribution, considering the system’s self-organizing properties.
4. TWC-Theta represents a further level of evolution over time, developed from TWC-Gamma, in which the “communication” and interaction between the observed events stabilize and become highly organized with the minimum energy required. TWC-Theta also provides a directed weighted graph in which a hypothesized flow of communication among the points is represented as connections.
5. TWC-Iota provides a “heat” map representing the infectivity rate of the area. It is also hypothesized to indicate a point where the process under consideration will terminate, as will be described below. Iota can also represent the beginning and end of the process associated with the given set of data points.
3.1.1 The Four Types of TWC
There are four types of TWC, and each type generates five types of maps. These four types are briefly described next; Chapter 5 describes them in more detail. Three of the types (original, frequency, and semantic) deal with the entire input data set that TWC processes. The fourth, TWC-Windowing, is a process by which subsets of the overall region, called windows, are processed and then “tiled” to focus the analysis of the phenomenon on those subsets.
1. TWC-Original: The data occurrence at a geographical location is taken as one event regardless of how many occurrences there are at that location. The phenomenon is analyzed from the point of view of a single occurrence at each of the data point locations.
2. TWC-Frequency: The data at geographical locations take into account the number of occurrences of the phenomenon being analyzed.
3. TWC-Semantic: The data at geographical locations carry attributes in addition to the location data (latitude/longitude), and these attributes are included in the analysis. An artificial adaptive system is added to either the semantic TWC-Original or the semantic TWC-Frequency.
4. TWC-Windowing: Subregions of the entire area are the input data and are then “tiled” together.
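One way to picture the difference between the TWC-Original and TWC-Frequency inputs is the following Python sketch. It assumes that raw events arrive as repeated (longitude, latitude) pairs. The helper names and the coordinates are hypothetical and are not taken from the TWC code.

```python
from collections import Counter

def original_input(events):
    """TWC-Original view: each distinct location counts once (first-seen order)."""
    return list(dict.fromkeys(events))

def frequency_input(events):
    """TWC-Frequency view: each distinct location with its number of occurrences."""
    return Counter(events)

# Hypothetical raw events: one location is reported three times.
events = [(12.49, 41.90), (12.49, 41.90), (-104.99, 39.74), (12.49, 41.90)]

print(original_input(events))
print(frequency_input(events))
```

Under this reading, the two types share the same set of locations and differ only in whether the occurrence counts are passed on to the analysis.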
3.1.2 How the TWC Data Is Organized
TWC considers three types of data sets:
1. Syntactic data
$$\text{Data} = \{P(x_i, y_i)\}_{i=1}^{N}, \tag{3.1}$$
where $N$ is the number of data points ($N \ge 3$), and $x_i$ and $y_i$ are the longitude and latitude of the $i$th point;
2. Time series data
$$\text{Data} = \left\{\left\{P(x_{i,t}, y_{i,t})\right\}_{i=1}^{N}\right\}_{t=1}^{T}, \tag{3.2}$$
where Eq. (3.1) is modified to include an index associated with time, $T$ is the number of temporal data points, and $x_{i,t}$ and $y_{i,t}$ are the longitude and latitude of point $i$ at time step $t$;
3. Semantic data
$$\text{Data} = \left\{\left\{P\left(x_{i,t}, y_{i,t}, \{a_{i,k}\}_{k=1}^{M}\right)\right\}_{i=1}^{N}\right\}_{t=1}^{T}, \tag{3.3}$$
where each data point is characterized by a set of specific features in addition to the coordinates of Eq. (3.1), $a_{i,k}$ is the $k$th attribute associated with point $i$, and $M$ is the number of attributes (features) for each data point.
Fig. 3.1 TWC algorithms
Figure 3.1 shows the main components of the TWC theory. Each algorithm is further characterized by different strategic quantities and outputs that are outlined next.
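The three data layouts of Eqs. (3.1) to (3.3) could be held in memory as sketched below in Python. This is only one possible arrangement: the class names, the field names, and the choice of numeric attributes are hypothetical, and the sample values are invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Coord = Tuple[float, float]  # (longitude, latitude)

@dataclass
class SyntacticData:
    """Eq. (3.1): N points, one coordinate pair each."""
    points: List[Coord]

@dataclass
class TimeSeriesData:
    """Eq. (3.2): steps[t][i] is the position of point i at time step t."""
    steps: List[List[Coord]]

@dataclass
class SemanticData:
    """Eq. (3.3): time-indexed positions plus M attributes a[i][k] per point."""
    steps: List[List[Coord]]
    attributes: List[List[float]]

    def __post_init__(self):
        n = len(self.attributes)
        if any(len(step) != n for step in self.steps):
            raise ValueError("every time step needs one position per point")
        if len({len(a) for a in self.attributes}) > 1:
            raise ValueError("every point needs the same number of attributes")

# Hypothetical sample: N = 3 points, T = 2 time steps, M = 2 attributes.
syntactic = SyntacticData(points=[(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
series = TimeSeriesData(steps=[syntactic.points, [(0.1, 0.0), (1.0, 0.2), (0.0, 1.1)]])
semantic = SemanticData(steps=series.steps, attributes=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

The attributes are stored per point rather than per time step because $a_{i,k}$ in Eq. (3.3) carries no time index.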
3.2 Theoretical Components of TWC
TWC input data points are assumed to have the following two characteristics:
1. Each point of the distribution represents a discrete manifestation of the same process;
2. The distribution of points is statistically representative of the process being analyzed.
3.2.1 Distance
The TWC approach assumes that the relationships among the data points are related to “distances” among the points, where distance is not simply the Euclidean distance but a distance related to energy or “attraction”. These energy distances result in the subsequent distribution of the data points in the data set and in the different types of TWC maps (alpha, beta, gamma, theta, iota). This section presents the concept of distance as it is used in the various TWC algorithms.
The Concept of Pseudo-Distance and its Properties
This section introduces the concept called pseudo-distance (or, equivalently, indirect distance). In TWC, “distances” between two points $i$ and $j$ involve average distances between the two designated points and all other points. These concepts are based on the traditional mathematical definition of distance.
Definition 3.1 A function $d : X \times X \to [0, \infty)$ is said to be a distance (in a metric space) if it satisfies the following properties.
1. $d(x, y) \ge 0$, (3.4)
2. $d(x, y) = 0$ if and only if $x = y$, (3.5)
3. $d(x, y) = d(y, x)$, (3.6)
4. $d(x, y) \le d(x, z) + d(z, y)$. (3.7)
Definition 3.2 The Euclidean distance between two points with two coordinates $P_i = (x_i, y_i)$, $P_j = (x_j, y_j) \in \mathbb{R}^2$ is

$$d(P_i, P_j) = d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}. \tag{3.8}$$
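Definition 3.3 Given data, that is, a subset of $N \ge 3$ points in the Euclidean space $\mathbb{R}^2$, the pseudo-distance between points $P_i$ and $P_j$, with $i \ne j$, $P_i, P_j \in \mathrm{Data}$, is defined to be the function $\bar d(\cdot, \cdot) : \mathrm{Data} \times \mathrm{Data} \to [0, \infty)$ such that

$$\bar d(P_i, P_j) = \bar d(i, j) = \bar d_{ij} = \begin{cases} \dfrac{1}{N-2} \displaystyle\sum_{k \ne i,\ k \ne j}^{N} d_{ik}, & i \ne j, \\[2mm] 0, & \text{for } i = j, \end{cases} \tag{3.9}$$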
where $d_{ij}$ is the Euclidean distance between points $P_i$ and $P_j$. This concept can be generalized to $\mathbb{R}^n$ in a straightforward manner, though we restrict ourselves to $\mathbb{R}^2$. To calculate the pseudo-distance, we can also write it as follows.

$$\bar d_{ij} = \frac{1}{N-2} \sum_{k \ne i,\ k \ne j}^{N} d_{ik} = \frac{1}{N-2} \left( \sum_{k=1}^{N} d_{ik} - d_{ij} - d_{ii} \right) = \frac{1}{N-2} \left( \sum_{k=1}^{N} d_{ik} - d_{ij} \right).$$
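The symbolic calculation of the pseudo-distance for i = 2 from a data set of 5 points is as follows.

$$\begin{aligned}
\bar d_{21} &= \frac{1}{5-2} \sum_{k \ne 2,\ k \ne 1}^{5} d_{2k} = \frac{1}{3}\,(d_{23} + d_{24} + d_{25}), \\
\bar d_{22} &= 0, \\
\bar d_{23} &= \frac{1}{5-2} \sum_{k \ne 2,\ k \ne 3}^{5} d_{2k} = \frac{1}{3}\,(d_{21} + d_{24} + d_{25}), \\
\bar d_{24} &= \frac{1}{5-2} \sum_{k \ne 2,\ k \ne 4}^{5} d_{2k} = \frac{1}{3}\,(d_{21} + d_{23} + d_{25}), \\
\bar d_{25} &= \frac{1}{5-2} \sum_{k \ne 2,\ k \ne 5}^{5} d_{2k} = \frac{1}{3}\,(d_{21} + d_{23} + d_{24}).
\end{aligned}$$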
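The definition in Eq. (3.9) and the five-point calculation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the five sample points are assumptions introduced here, not part of the TWC software.

```python
import math

def euclidean_matrix(points):
    """Return the N x N matrix of Euclidean distances d_ij (Eq. 3.8)."""
    return [[math.dist(p, q) for q in points] for p in points]

def pseudo_distance_matrix(points):
    """Return the N x N matrix of pseudo-distances (Eq. 3.9); needs N >= 3."""
    n = len(points)
    if n < 3:
        raise ValueError("the pseudo-distance needs at least 3 points")
    d = euclidean_matrix(points)
    row_sums = [sum(row) for row in d]
    # For i != j: (sum_k d_ik - d_ij) / (N - 2); the diagonal is 0 by definition.
    return [[0.0 if i == j else (row_sums[i] - d[i][j]) / (n - 2)
             for j in range(n)] for i in range(n)]

# Five hypothetical points, chosen only for illustration.
pts = [(0, 0), (3, 0), (0, 4), (3, 4), (6, 8)]
d = euclidean_matrix(pts)
pd = pseudo_distance_matrix(pts)

# Pseudo-distance from point 2 to point 1 (0-based indices 1 and 0):
# the mean of d_23, d_24 and d_25, as in the symbolic calculation.
expected = (d[1][2] + d[1][3] + d[1][4]) / 3
print(pd[1][0], expected)
```

The sketch uses the rewritten form of the pseudo-distance (row sum minus $d_{ij}$), so each entry costs constant time once the row sums are known.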
Comment The more a point is relatively close to most of the others, the less it contributes to the pseudo-distance. The pseudo-distance can be regarded as a system of distances because it takes into account not only the individual pairs but also their relationship to all the points. Moreover, given N points, this measure is a redistribution of the Euclidean distance among the N points. Euclidean distance is not a “systemic” measure in the following sense: if one of the given points changes its position, then only its distance from the other points changes, while the distances among the other points do not change. The pseudo-distance, instead, is a systemic measure. If one of the N given points changes its position, all the pseudo-distances among the other N − 1 points are rearranged, in order to guarantee that the N points continue to form a “cluster”. Generally speaking, the points that are more isolated from the others reduce their distance, while the points much closer to each other are pushed apart.

A pseudo-distance is not, strictly speaking, a “distance”, because it does not adhere to all the axioms of a distance. For example, unless it is set to zero by convention as in Eq. (3.9), the pseudo-distance from A to A is not zero but the average of the distances from A to the other points. It is not symmetric, because the pseudo-distance from A to B is in general different from that from B to A. Moreover, the triangle inequality does not hold in general. All the same, the pseudo-distance presents some interesting features.

1. It is able to mimic the spatial behavior of biological systems such as groups of animals and also humans, which tend spontaneously to rearrange their reciprocal distances in order to maintain their group cohesion, especially in critical situations. If an individual member moves away from the group, all members reposition themselves accordingly in order to ensure that the group remains compact. This collective behavior is partially present in the plant world.
2. Given N points, the sum of all the differences (their “deltas”) between the Euclidean distance and the pseudo-distance among these points is zero. Thus, the amount of energy between the distance matrices remains constant (energy is conserved). That is, the matrix of the pseudo-distances is a weighted redistribution of the matrix of the Euclidean distances. The main “goal” of this redistribution is to maintain the cohesion of the given points as a cluster, with the given points considered as a system.

Table 12 Points shows a possible distribution of N points (N = 12). Table ED, Table PD and Table PDED below show, respectively, the Euclidean distances among them, the pseudo-distances, and the matrix of their differences. This 12-point example is used in association with our development of the theoretical aspects of TWC.
Later, two other examples of 10 points and 11 points are given in Chap. 4.
Table 12 Points

Data   City        x (longitude)   y (latitude)
P1     Volterra    10.8662         43.3981
P2     Perugia     12.3908         43.1076
P3     Cerveteri   12.0974         41.9963
P4     Tarquinia   11.7575         42.4235
P5     Populonia   10.4917         42.9895
P6     Chiusi      11.9448         43.0153
P7     Arezzo      11.8792         43.4606
P8     Vetulonia   10.9717         42.8572
P9     Bolsena     11.9861         42.6438
P10    Roselle     11.1385         42.808
P11    Veio        12.4631         42.1121
P12    Vulci       11.6311         42.4235
Fig. 3.2 Map of the distribution of 12 points
The term distance has the modifiers “pseudo” and “indirect” because, as we will see in Example 3.4, Eq. (3.9) may not satisfy two of the four properties necessary to be a distance. However, it is useful to understand the effect the pseudo-distance produces on the points of the space. As we mentioned above, the more a point is relatively close to most others, the less it contributes to the pseudo-distance. A point that is relatively close to many other points may therefore be close to another one in terms of pseudo-distance even if the two are physically distant in terms of Euclidean distance in the original space. Vice versa, a point that is physically close to another, but relatively distant from most others, may have a large pseudo-distance from the other despite being physically close to it in the original space. Therefore, application of the pseudo-distance expands or contracts the space between two points depending on their relative position in the global distribution. The pseudo-distance between two points is therefore based on the global distribution of points, and not on the individual properties of the two points, as occurs with the Euclidean distance, which considers only the two individual points and not the whole data set (Figs. 3.2, 3.3, 3.4 and 3.5).
Fig. 3.3 Table ED: the euclidean distance among the 12 points
Fig. 3.4 Table PD: the pseudo-distance among the 12 points
Fig. 3.5 Table PDED: difference among pseudo-distance and euclidean distance matrices
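Property 3.1 The pseudo-distance between i and j is generally different from that between j and i. In fact,

\begin{align}
\bar d_{ij} - \bar d_{ji} &= \frac{1}{N-2}\left(\sum_{k}^{N} d_{ik} - d_{ij}\right) - \frac{1}{N-2}\left(\sum_{k}^{N} d_{jk} - d_{ji}\right) \tag{3.10} \\
&= \frac{1}{N-2}\left(\sum_{k}^{N} d_{ik} - \sum_{k}^{N} d_{jk}\right) + \frac{d_{ji} - d_{ij}}{N-2} \tag{3.11} \\
&= \frac{1}{N-2}\left(\sum_{k}^{N} d_{ik} - \sum_{k}^{N} d_{jk}\right). \tag{3.12}
\end{align}

Since, in general,

$$\sum_{k}^{N} d_{ik} \ne \sum_{k}^{N} d_{jk},$$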
one can see from (3.10) that the pseudo-distance is not always symmetric (violating the third property of distance, Eq. (3.6)).

Property 3.2 The pseudo-distance may not satisfy the triangle inequality (3.7). In fact, using (3.10), we have

$$\bar d_{ij} + \bar d_{ik} = \frac{1}{N-2}\left(\sum_{m}^{N} d_{im} - d_{ij}\right) + \frac{1}{N-2}\left(\sum_{m}^{N} d_{im} - d_{ik}\right) = \frac{1}{N-2}\left(2\sum_{m}^{N} d_{im} - d_{ij} - d_{ik}\right), \tag{3.13}$$

and

$$\bar d_{jk} = \frac{1}{N-2}\left(\sum_{m=1}^{N} d_{jm} - d_{jk}\right). \tag{3.14}$$
Thus, from Eqs. (3.13) and (3.14),

$$\begin{aligned}
\bar d_{ij} + \bar d_{ik} - \bar d_{jk} &= \frac{1}{N-2}\left(2\sum_{m}^{N} d_{im} - d_{ij} - d_{ik}\right) - \frac{1}{N-2}\left(\sum_{m=1}^{N} d_{jm} - d_{jk}\right) \\
&= \frac{1}{N-2}\left[2\sum_{m}^{N} d_{im} - \sum_{m=1}^{N} d_{jm} - \left(d_{ij} + d_{ik} - d_{jk}\right)\right].
\end{aligned} \tag{3.15}$$

From (3.15), it can be seen that the triangle inequality can be violated. The triangle inequality requires $\bar d_{ij} + \bar d_{ik} \ge \bar d_{jk}$, that is, $\bar d_{ij} + \bar d_{ik} - \bar d_{jk} \ge 0$, which means that

$$\frac{1}{N-2}\left[2\sum_{m}^{N} d_{im} - \sum_{m=1}^{N} d_{jm} - \left(d_{ij} + d_{ik} - d_{jk}\right)\right] \ge 0,$$

$$2\sum_{m}^{N} d_{im} - \sum_{m=1}^{N} d_{jm} \ge d_{ij} + d_{ik} - d_{jk}. \tag{3.16}$$

The right side of (3.16) is always non-negative, since it expresses the Euclidean triangle inequality, whereas the left side could be negative. An example where this occurs is given next.
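Example 3.4 Suppose we have four points,

$$P_1 = (0, 1),\quad P_2 = (a, 0),\quad P_3 = (0, 2),\quad \text{and}\quad P_4 = (0, 0). \tag{3.17}$$

Then

$$\begin{aligned}
\bar d_{12} + \bar d_{13} - \bar d_{23} &= \frac{1}{2}(d_{13} + d_{14}) + \frac{1}{2}(d_{12} + d_{14}) - \frac{1}{2}(d_{21} + d_{24}) \\
&= \frac{1}{2} d_{13} + d_{14} - \frac{1}{2} d_{24} \\
&= \frac{3}{2} - \frac{a}{2}.
\end{aligned}$$

The triangle inequality states that a distance has the property that $\bar d_{12} + \bar d_{13} \ge \bar d_{23}$. However, if a > 3, for example a = 4, the triangle inequality is violated. Moreover,

$$\bar d_{24} - \bar d_{42} = \frac{1}{2}(d_{21} + d_{23}) - \frac{1}{2}(d_{41} + d_{43}) = \frac{1}{2}\sqrt{a^2 + 1} + \frac{1}{2}\sqrt{a^2 + 4} - \frac{1}{2}(1 + 2).$$

The difference $\bar d_{24} - \bar d_{42} = 0$ if a = 0. Otherwise, $\bar d_{24} - \bar d_{42} > 0$. For example, if a = 5,

$$\bar d_{24} - \bar d_{42} = \frac{1}{2}\sqrt{26} + \frac{1}{2}\sqrt{29} - \frac{3}{2} = 3.7421\ldots \ne 0.$$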
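The two violations in Example 3.4 can be checked numerically. The following is a minimal sketch, assuming the helper names `pseudo_distance` and `example_points`, which are introduced here only for illustration.

```python
import math

def pseudo_distance(points, i, j):
    """Pseudo-distance from point i to point j (0-based indices), Eq. (3.9)."""
    if i == j:
        return 0.0
    others = [k for k in range(len(points)) if k not in (i, j)]
    return sum(math.dist(points[i], points[k]) for k in others) / (len(points) - 2)

def example_points(a):
    """The four points P1, P2, P3, P4 of (3.17) for a given value of a."""
    return [(0, 1), (a, 0), (0, 2), (0, 0)]

# Triangle inequality: the gap equals 3/2 - a/2, which is negative for a > 3.
pts4 = example_points(4)
triangle_gap = (pseudo_distance(pts4, 0, 1) + pseudo_distance(pts4, 0, 2)
                - pseudo_distance(pts4, 1, 2))

# Symmetry: the pseudo-distance from P2 to P4 differs from that from P4 to P2.
pts5 = example_points(5)
asymmetry = pseudo_distance(pts5, 1, 3) - pseudo_distance(pts5, 3, 1)

print(triangle_gap, asymmetry)
```

A negative `triangle_gap` and a non-zero `asymmetry` correspond to the failure of properties (3.7) and (3.6), respectively.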
Despite not being a proper distance, as we mentioned in our comments above, the pseudo-distance possesses interesting features from the viewpoint of our topological approach.

1. The pseudo-distance between two points depends on the distances of each point from all the others, which is a “holistic” characteristic of a data set of geographic entities.
2. Although the pseudo-distance alters the notions of closeness and distance in the Euclidean sense, the sum of all the pseudo-distances across all points equals that of the classic Euclidean distances between the same points. This is a conservation property.

This latter characteristic is proved next.

Proposition 3.5 Let $M$ be the square matrix of distances of N points, that is,

$$(M)_{ij} = d_{ij}, \quad i, j = 1, \ldots, N,$$
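and let $\bar M$ be the matrix of pseudo-distances, $(\bar M)_{ij} = \bar d_{ij}$, and $\Delta d_{ij} = d_{ij} - \bar d_{ij}$. Then

$$\sum_{i,j}^{N} \Delta d_{ij} = 0.$$

Proof Given $i \ne j$,

$$\begin{aligned}
\Delta d_{ij} = d_{ij} - \bar d_{ij} &= d_{ij} - \frac{1}{N-2}\left(\sum_{k}^{N} d_{ik} - d_{ij}\right) \\
&= \frac{N-1}{N-2}\, d_{ij} - \frac{1}{N-2}\sum_{k}^{N} d_{ik}.
\end{aligned}$$

Since $\Delta d_{ii} = d_{ii} - \bar d_{ii} = 0$,

$$\begin{aligned}
\sum_{j=1}^{N} \Delta d_{ij} &= \Delta d_{ii} + \sum_{j=1,\ j \ne i}^{N} \Delta d_{ij} \\
&= \sum_{j \ne i}^{N} \left(\frac{N-1}{N-2}\, d_{ij} - \frac{1}{N-2}\sum_{k}^{N} d_{ik}\right) \\
&= \frac{N-1}{N-2}\sum_{j \ne i}^{N} d_{ij} - \frac{1}{N-2}\sum_{j \ne i}^{N}\sum_{k}^{N} d_{ik} \\
&= \frac{N-1}{N-2}\sum_{j \ne i}^{N} d_{ij} - \frac{N-1}{N-2}\sum_{k}^{N} d_{ik} = 0.
\end{aligned}$$

Thus,

$$\sum_{i=1}^{N}\sum_{j=1}^{N} \Delta d_{ij} = \sum_{i=1}^{N} 0 = 0$$

and the proposition holds.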
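We can calculate the summation over the rows and columns of the matrix $\Delta d_{ij}$ as follows.

$$\begin{aligned}
\mathrm{Sum}(\mathrm{Row}_i) = \sum_{j=1}^{N} \Delta d_{ij} &= \Delta d_{ii} + \sum_{j=1,\ j \ne i}^{N} \Delta d_{ij} \\
&= \frac{N-1}{N-2}\sum_{j=1}^{N} d_{ij} - \frac{1}{N-2}\sum_{j=1,\ j \ne i}^{N}\ \sum_{k=1}^{N} d_{ik} \\
&= \frac{N-1}{N-2}\sum_{j=1}^{N} d_{ij} - \frac{N-1}{N-2}\sum_{j=1}^{N} d_{ij} = 0.
\end{aligned} \tag{3.18}$$

Likewise,

$$\begin{aligned}
\mathrm{Sum}(\mathrm{Col}_j) = \sum_{i=1}^{N} \Delta d_{ij} &= \Delta d_{jj} + \sum_{i=1,\ i \ne j}^{N} \Delta d_{ij} \\
&= \frac{N-1}{N-2}\sum_{i=1}^{N} d_{ij} - \frac{1}{N-2}\sum_{i=1,\ i \ne j}^{N}\ \sum_{k=1}^{N} d_{ik},
\end{aligned} \tag{3.19}$$

which is not in general zero, but as we have shown, the summation of the elements of the matrix is zero, that is,

$$\sum_{i,j=1}^{N} \Delta d_{ij} = 0.$$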
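The row, column and total sums of Eqs. (3.18) and (3.19) can be verified numerically. The sketch below is an illustration only: the function name `difference_matrix` and the five sample points are assumptions made here.

```python
import math

def difference_matrix(points):
    """Return Delta d_ij = d_ij - pseudo-distance_ij for all pairs of points."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    row_sums = [sum(row) for row in d]
    return [[0.0 if i == j else d[i][j] - (row_sums[i] - d[i][j]) / (n - 2)
             for j in range(n)] for i in range(n)]

# Five hypothetical points: two far from the remaining three.
pts = [(2, 11), (2, 13), (35, 14), (28, 14), (30, 15)]
delta = difference_matrix(pts)

row_totals = [sum(row) for row in delta]                    # Eq. (3.18)
col_totals = [sum(delta[i][j] for i in range(len(pts)))     # Eq. (3.19)
              for j in range(len(pts))]
grand_total = sum(row_totals)

print(row_totals)   # each entry is zero up to rounding
print(col_totals)   # not zero in general
print(grand_total)  # zero up to rounding
```

Every row of the difference matrix sums to zero, while the columns do not, which reflects the asymmetry of the pseudo-distance.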
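Example 3.6 Consider the points given by data set (3.17). The Euclidean distances are

$$\begin{aligned}
& d_{11} = 0,\quad d_{12} = d_{21} = \sqrt{a^2+1},\quad d_{13} = d_{31} = 1,\quad d_{14} = d_{41} = 1, \\
& d_{22} = 0,\quad d_{23} = d_{32} = \sqrt{a^2+4},\quad d_{24} = d_{42} = a, \\
& d_{33} = 0,\quad d_{34} = d_{43} = 2,\quad d_{44} = 0,
\end{aligned}$$

$$\begin{aligned}
\sum_{ij} d_{ij} &= \left(\sqrt{a^2+1} + 1 + 1\right) + \left(\sqrt{a^2+1} + \sqrt{a^2+4} + a\right) + \left(1 + \sqrt{a^2+4} + 2\right) + (1 + a + 2) \\
&= 2a + 2\sqrt{a^2+1} + 2\sqrt{a^2+4} + 8,
\end{aligned}$$

$$D = [d_{ij}] = \begin{bmatrix}
0 & \sqrt{a^2+1} & 1 & 1 \\
\sqrt{a^2+1} & 0 & \sqrt{a^2+4} & a \\
1 & \sqrt{a^2+4} & 0 & 2 \\
1 & a & 2 & 0
\end{bmatrix}.$$

The pseudo-distances are

$$\begin{aligned}
& \bar d_{11} = 0,\quad \bar d_{12} = \tfrac{1}{2}(d_{13} + d_{14}) = \tfrac{1}{2}(1 + 1),\quad \bar d_{13} = \tfrac{1}{2}(d_{12} + d_{14}) = \tfrac{1}{2}\left(\sqrt{a^2+1} + 1\right),\quad \bar d_{14} = \tfrac{1}{2}(d_{12} + d_{13}) = \tfrac{1}{2}\left(\sqrt{a^2+1} + 1\right), \\
& \bar d_{21} = \tfrac{1}{2}(d_{23} + d_{24}) = \tfrac{1}{2}\left(\sqrt{a^2+4} + a\right),\quad \bar d_{22} = 0,\quad \bar d_{23} = \tfrac{1}{2}(d_{21} + d_{24}) = \tfrac{1}{2}\left(\sqrt{a^2+1} + a\right),\quad \bar d_{24} = \tfrac{1}{2}(d_{21} + d_{23}) = \tfrac{1}{2}\left(\sqrt{a^2+1} + \sqrt{a^2+4}\right), \\
& \bar d_{31} = \tfrac{1}{2}(d_{32} + d_{34}) = \tfrac{1}{2}\left(\sqrt{a^2+4} + 2\right),\quad \bar d_{32} = \tfrac{1}{2}(d_{31} + d_{34}) = \tfrac{1}{2}(1 + 2),\quad \bar d_{33} = 0,\quad \bar d_{34} = \tfrac{1}{2}(d_{31} + d_{32}) = \tfrac{1}{2}\left(1 + \sqrt{a^2+4}\right), \\
& \bar d_{41} = \tfrac{1}{2}(d_{42} + d_{43}) = \tfrac{1}{2}(a + 2),\quad \bar d_{42} = \tfrac{1}{2}(d_{41} + d_{43}) = \tfrac{1}{2}(1 + 2),\quad \bar d_{43} = \tfrac{1}{2}(d_{41} + d_{42}) = \tfrac{1}{2}(1 + a),\quad \bar d_{44} = 0,
\end{aligned}$$

$$\begin{aligned}
\sum_{ij} \bar d_{ij} &= \tfrac{1}{2}(1 + 1) + \tfrac{1}{2}\left(\sqrt{a^2+1} + 1\right) + \tfrac{1}{2}\left(\sqrt{a^2+1} + 1\right) \\
&\quad + \tfrac{1}{2}\left(\sqrt{a^2+4} + a\right) + \tfrac{1}{2}\left(\sqrt{a^2+1} + a\right) + \tfrac{1}{2}\left(\sqrt{a^2+1} + \sqrt{a^2+4}\right) \\
&\quad + \tfrac{1}{2}\left(\sqrt{a^2+4} + 2\right) + \tfrac{1}{2}(1 + 2) + \tfrac{1}{2}\left(1 + \sqrt{a^2+4}\right) \\
&\quad + \tfrac{1}{2}(a + 2) + \tfrac{1}{2}(1 + 2) + \tfrac{1}{2}(1 + a) \\
&= 2a + 2\sqrt{a^2+1} + 2\sqrt{a^2+4} + 8 = \sum_{ij} d_{ij},
\end{aligned}$$

$$\bar D = [\bar d_{ij}] = \begin{bmatrix}
0 & 1 & \frac{1}{2} + \frac{\sqrt{a^2+1}}{2} & \frac{1}{2} + \frac{\sqrt{a^2+1}}{2} \\
\frac{\sqrt{a^2+4}}{2} + \frac{a}{2} & 0 & \frac{\sqrt{a^2+1}}{2} + \frac{a}{2} & \frac{\sqrt{a^2+1}}{2} + \frac{\sqrt{a^2+4}}{2} \\
\frac{\sqrt{a^2+4}}{2} + 1 & \frac{3}{2} & 0 & \frac{\sqrt{a^2+4}}{2} + \frac{1}{2} \\
\frac{a}{2} + 1 & \frac{3}{2} & \frac{a}{2} + \frac{1}{2} & 0
\end{bmatrix}.$$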
Example 3.7 (Distance and Pseudo-distance Matrices) Suppose we are given the table of 10 points below.

Data    x    y
P1      2   11
P2      2   13
P3     35   14
P4     28   14
P5     30   15
P6     34   10
P7     35   12
P8     33   16
P9     31   10
P10    28   12

Table of 10 Points

The distance matrix for these 10 points is as follows.

d_ij     j=1    j=2    j=3    j=4    j=5    j=6    j=7    j=8    j=9   j=10
i=1     0.00   2.00  33.14  26.17  28.28  32.02  33.02  31.40  29.02  26.02
i=2     2.00   0.00  33.02  26.02  28.07  32.14  33.02  31.14  29.15  26.02
i=3    33.14  33.02   0.00   7.00   5.10   4.12   2.00   2.83   5.66   7.28
i=4    26.17  26.02   7.00   0.00   2.24   7.21   7.28   5.39   5.00   2.00
i=5    28.28  28.07   5.10   2.24   0.00   6.40   5.83   3.16   5.10   3.61
i=6    32.02  32.14   4.12   7.21   6.40   0.00   2.24   6.08   3.00   6.32
i=7    33.02  33.02   2.00   7.28   5.83   2.24   0.00   4.47   4.47   7.00
i=8    31.40  31.14   2.83   5.39   3.16   6.08   4.47   0.00   6.32   6.40
i=9    29.02  29.15   5.66   5.00   5.10   3.00   4.47   6.32   0.00   3.61
i=10   26.02  26.02   7.28   2.00   3.61   6.32   7.00   6.40   3.61   0.00
The pseudo-distance matrix for these 10 points is as follows.

d̄_ij     j=1    j=2    j=3    j=4    j=5    j=6    j=7    j=8    j=9   j=10
i=1     0.00  29.88  25.99  26.86  26.60  26.13  26.01  26.21  26.51  26.88
i=2    29.82   0.00  25.95  26.82  26.56  26.05  25.95  26.18  26.43  26.82
i=3     8.38   8.39   0.00  11.64  11.88  12.00  12.27  12.16  11.81  11.61
i=4     7.77   7.79  10.16   0.00  10.76  10.14  10.13  10.36  10.41  10.79
i=5     7.44   7.47  10.34  10.69   0.00  10.17  10.25  10.58  10.34  10.52
i=6     8.44   8.42  11.93  11.54  11.64   0.00  12.16  11.68  12.07  11.65
i=7     8.29   8.29  12.17  11.51  11.69  12.14   0.00  11.86  11.86  11.54
i=8     8.23   8.26  11.80  11.48  11.76  11.39  11.59   0.00  11.36  11.35
i=9     7.79   7.77  10.71  10.79  10.78  11.04  10.86  10.63   0.00  10.97
i=10    7.78   7.78  10.12  10.78  10.58  10.24  10.16  10.23  10.58   0.00
The next table has the differences between the distance and pseudo-distance matrices, $\Delta M_{N \times N} = M_{N \times N} - \bar M_{N \times N}$.

ΔM       j=1     j=2     j=3     j=4     j=5     j=6     j=7     j=8     j=9    j=10   sum
i=1     0.00  −27.88    7.15   −0.69    1.69    5.88    7.01    5.19    2.51   −0.86     0
i=2   −27.82    0.00    7.07   −0.80    1.51    6.09    7.07    4.97    2.73   −0.80     0
i=3    24.76   24.62    0.00   −4.64   −6.78   −7.88  −10.27   −9.34   −6.15   −4.33     0
i=4    18.41   18.23   −3.16    0.00   −8.52   −2.93   −2.85   −4.98   −5.41   −8.79     0
i=5    20.85   20.61   −5.24   −8.46    0.00   −3.77   −4.41   −7.42   −5.24   −6.92     0
i=6    23.58   23.72   −7.80   −4.33   −5.24    0.00   −9.93   −5.60   −9.07   −5.33     0
i=7    24.73   24.73  −10.17   −4.23   −5.86   −9.90    0.00   −7.38   −7.38   −4.54     0
i=8    23.18   22.89   −8.97   −6.09   −8.59   −5.31   −7.12    0.00   −5.04   −4.95     0
i=9    21.23   21.38   −5.05   −5.79   −5.68   −8.04   −6.39   −4.30    0.00   −7.36     0
i=10   18.24   18.24   −2.84   −8.78   −6.98   −3.92   −3.16   −3.83   −6.98    0.00     0
Remark 3.8 The negative sign in an entry of the difference matrix shows the points for which the space is “expanding”, while the points with positive entries show the points for which the space is “contracting”. Moreover, when the pseudo-distance matrix is treated as a distance matrix, and the pseudo-distance matrix of this distance matrix is calculated, that is the meta-distance, the result defines a global attractor distribution. Thus, the meta-distance is considered as being an attractor.
3.2.2 Attraction Strength

What TWC terms attraction strength is an exponential that decays as the distance (pseudo-distance) becomes larger. It quantifies the attraction strength of each given point with respect to all others, where strengths decay exponentially with the indirect distance. Historically, the concept of strength of attraction has always been linked to objects with a mass. This section proposes a definition that extends this concept to objects (occurrences of a disease, a robbery, and so on) with no associated mass. For TWC we define attraction as linked with the concept of pseudo-distances.

Definition 3.9 (Attraction Strength) Given N points in a 2-dimensional space, the non-parametric attraction strength $w_i$ that a point i exerts on all the others is given by

$$w_i = \frac{1}{N-1} \sum_{j \ne i}^{N} e^{-\frac{\bar d_{ij}}{D}},$$

where $\bar d_{ij}$ is the pseudo-distance between points i and j and $D = \max_{i,j} \bar d_{ij}$. The parametric (with parameter $\alpha_n$) strength of attraction is given by

$$w_i(\alpha_n) = \frac{1}{N-1} \sum_{j=1,\ j \ne i}^{N} e^{-\frac{\bar d_{ij}}{D}\,\alpha_n}, \tag{3.20}$$

and $\alpha_n \ge 0$ corresponds to the algorithm being used; it is the inverse of the simulated temperature of the entire system of points at any step. For TWC Semantic,

$$w_i(\alpha_n) = \frac{1}{N-1} \sum_{j \ne i}^{N} e^{-\frac{\bar d_{ij} + d_{ij}^{[CM]}}{D}\,\alpha_n},$$

where n is a natural number, $\alpha_n$ modulates the attraction strength of points over time according to the rule $\alpha_{n+1} = \alpha_n + \epsilon$, and the $d_{ij}^{[CM]}$ arise from the contractive map algorithm, ACM (see [1, 2]). A self-attraction strength is an attraction strength where the point itself is not excluded from the attraction, and in this case there are N points, not N − 1. The parameter is then denoted by $\beta$ ($\beta_{n+1} = \beta_n + \epsilon$), so that

$$w_i(\beta_n) = \frac{1}{N} \sum_{j=1}^{N} e^{-\frac{\bar d_{ij}}{D}\,\beta_n},$$

and the Semantic TWC for the self-attraction case is

$$Sw_i(\beta_n) = \frac{1}{N} \sum_{j=1}^{N} e^{-\frac{\bar d_{ij} + d_{ij}^{[CM]}}{D}\,\beta_n}.$$
Remark 3.10 The stronger the attraction strength is at one point, the more such a point “communicates” with the others, that is, the more its location becomes “meaningful” once put in relation to all the other points. The non-parametric version is used to calculate the “absolute” strength value of each point. The parametric version is used in the calculation of TWC alpha trajectories (see below). The “self” version is used to identify the beta star parameter. That is, Eq. (3.20) quantifies the attraction strength of each given point with respect to all others, where strengths decay exponentially with respect to the pseudo-distance. If each point of the data set has an associated energy in the plane, then the stronger the attraction strength at one point, the more such a point influences each of the other points.
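The non-parametric, parametric and “self” versions of the attraction strength can be sketched as one function. This is a minimal sketch under stated assumptions: the function names and the `include_self` flag are introduced here for illustration, the sample points are those of (3.17) with a = 4, and the Semantic variant is omitted because it needs the contractive map distances $d_{ij}^{[CM]}$.

```python
import math

def pseudo_distance_matrix(points):
    """Pseudo-distances of Eq. (3.9), with a zero diagonal."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    return [[0.0 if i == j else (sum(d[i]) - d[i][j]) / (n - 2)
             for j in range(n)] for i in range(n)]

def attraction_strengths(points, alpha=1.0, include_self=False):
    """Attraction strength of each point, Eq. (3.20).

    alpha=1 reproduces the non-parametric form; include_self=True gives the
    self-attraction variant (sum over all j, divided by N instead of N - 1).
    """
    n = len(points)
    pd = pseudo_distance_matrix(points)
    big_d = max(max(row) for row in pd)  # D = max pseudo-distance
    weights = []
    for i in range(n):
        js = [j for j in range(n) if include_self or j != i]
        total = sum(math.exp(-pd[i][j] / big_d * alpha) for j in js)
        weights.append(total / len(js))
    return weights

pts = [(0, 1), (4, 0), (0, 2), (0, 0)]
w_flat = attraction_strengths(pts, alpha=0.0)          # all weights equal 1
w_plain = attraction_strengths(pts)                    # non-parametric
w_self = attraction_strengths(pts, include_self=True)  # self-attraction
print(w_flat, w_plain, w_self)
```

At alpha = 0 every exponential equals 1, so all points carry the same weight, which is the infinite-temperature case.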
3.2.3 TWC Trajectories

What TWC terms the (discrete) trajectory is the path connecting the center of mass with the “optimal” points calculated by the five TWC algorithms (alpha, beta, gamma, theta, iota). The set of points (x, y) that, once drawn on a Cartesian plane, starts at the center of mass of the assigned (given data) points and arrives at the point called TWC alpha or beta,

$$TWC_x(\alpha_n) = \frac{1}{\sum_{i=1}^{N} w_i(\alpha_n)} \sum_{i=1}^{N} w_i(\alpha_n)\, x_i, \tag{3.21}$$

$$TWC_y(\alpha_n) = \frac{1}{\sum_{i=1}^{N} w_i(\alpha_n)} \sum_{i=1}^{N} w_i(\alpha_n)\, y_i, \tag{3.22}$$

forms a trajectory. That is, the trajectory is constructed by weighting the coordinates of the assigned points with the attraction strength of the individual points as the alpha parameter changes. The alpha parameter simulates, as in thermodynamic systems, the inverse of the temperature.

Remark 3.11 The alpha (beta, gamma, theta, iota) path is the vector of points starting from the center of mass and arriving at the alpha point. The more nonlinear the alpha path is, the more complex is the distribution of the assigned (input data) points. In this case it means that the assigned points interact strongly, generating many side effects. That is, any minimal displacement of some of the assigned points may generate a strong rearrangement of all the others. Each point of the alpha path that is closer to the alpha point represents a place where there is a reduction of the entropy of the distances among the assigned points. The Euclidean distance between any component of the alpha path vector and the next component represents the instantaneous “velocity” of an abstract object in that position along the path. The alpha path, in short, should represent the ideal, safest path from the center of mass to the alpha point for a prey that tries to control, in the best way, the “attacks” coming from the assigned points. Figure 3.6 shows the alpha path for the 12-point example.
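Equations (3.21) and (3.22) translate directly into a weighted average that is re-evaluated for each value of the parameter. The following is only a sketch: the function names, the step `eps = 0.1`, the number of steps and the sample points are assumptions chosen here for illustration.

```python
import math

def attraction_strengths(points, alpha):
    """Parametric attraction strengths w_i(alpha), Eq. (3.20)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    pd = [[0.0 if i == j else (sum(d[i]) - d[i][j]) / (n - 2)
           for j in range(n)] for i in range(n)]
    big_d = max(max(row) for row in pd)
    return [sum(math.exp(-pd[i][j] / big_d * alpha)
                for j in range(n) if j != i) / (n - 1) for i in range(n)]

def twc_point(points, alpha):
    """Weighted centroid (TWC_x, TWC_y) for one value of alpha."""
    w = attraction_strengths(points, alpha)
    total = sum(w)
    return (sum(wi * x for wi, (x, _) in zip(w, points)) / total,
            sum(wi * y for wi, (_, y) in zip(w, points)) / total)

def twc_trajectory(points, steps, eps=0.1):
    """Points of the path for alpha_n = n * eps, n = 0, ..., steps."""
    return [twc_point(points, n * eps) for n in range(steps + 1)]

pts = [(0, 1), (4, 0), (0, 2), (0, 0)]
path = twc_trajectory(pts, steps=50)
center_of_mass = (sum(x for x, _ in pts) / len(pts),
                  sum(y for _, y in pts) / len(pts))
print(path[0], center_of_mass, path[-1])
```

The first point of the path coincides with the center of mass, because all weights are equal when the parameter is zero.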
Fig. 3.6 The alpha path from the center of mass to the alpha point
3.2.4 Entropy

The information theory concept of entropy is redefined in the context of a distribution of points. Entropy measures the organization of a system. However, information in the context of geographical assigned points needs to be defined first.

Definition 3.12 (Information) The information contained in an event E emitted from a source is calculated as

$$I(E) = -\log_b P(E), \tag{3.23}$$

where b is the base of the logarithm and P(E) is the probability of event E. Usually b = 2.

Remark 3.13 The main idea behind Definition 3.12 is that the less likely an event is, the more information it carries when it occurs. Moreover, the place from which our information will emanate is our topological weighted centroid. A natural consequence, once the information of an event has been defined, is that it is possible to define the entropy of the system.

Definition 3.14 (Shannon Entropy) The Shannon entropy of a source X is defined as the expected value of the information,

$$H(X) = E\left(-\log_b P(X)\right) = -\sum_{i=1}^{N} P_i(X) \log_b P_i(X). \tag{3.24}$$
The last equality of Eq. (3.24) holds if X is a discrete random variable. The objective at this point is to find a probability associated with the points of the starting data set so that we can define an analogous measure of entropy for our case. To do so, the weights (attraction strengths) $w_i(\alpha_n)$ need to be normalized so that they are akin to probabilities,

$$p_i(\alpha_n) = \frac{w_i(\alpha_n)}{\sum_{i=1}^{N} w_i(\alpha_n)}.$$

For TWC, entropy H is defined as follows.

Definition 3.15 (TWC Entropy) The TWC entropy for each iteration of TWC($\alpha_n$), that is, for each n, is defined, according to the Shannon entropy, as

$$H(\alpha_n) = -\sum_{i=1}^{N} p_i(\alpha_n) \log_2\left(p_i(\alpha_n)\right), \tag{3.25}$$

where $\alpha_n$ is our parameter.

Remark 3.16 The value of the entropy at the center of mass is $H(\alpha_0) = H(0) = \log_2 N$, so that the entropy is maximum there. The value of the entropy decreases monotonically from the center of mass until it attains a minimal value in the limit $\alpha_n \to \infty$. This is the TWC point that corresponds to where the summation of $\bar d_{ij}$ is minimal. This implies that, with each iteration, it is possible to calculate the probability associated with each $w_i(\alpha_n)$ and consequently the information of each one by means of Eq. (3.23).
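The normalization of the weights and Eq. (3.25) can be sketched as follows. The function names and the sample points are assumptions made here for illustration; the sketch only evaluates the entropy at chosen values of the parameter and does not test the monotonic behavior stated in Remark 3.16.

```python
import math

def attraction_strengths(points, alpha):
    """Parametric attraction strengths w_i(alpha), Eq. (3.20)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    pd = [[0.0 if i == j else (sum(d[i]) - d[i][j]) / (n - 2)
           for j in range(n)] for i in range(n)]
    big_d = max(max(row) for row in pd)
    return [sum(math.exp(-pd[i][j] / big_d * alpha)
                for j in range(n) if j != i) / (n - 1) for i in range(n)]

def twc_entropy(points, alpha):
    """Shannon entropy (base 2) of the normalized weights, Eq. (3.25)."""
    w = attraction_strengths(points, alpha)
    total = sum(w)
    probs = [wi / total for wi in w]
    return -sum(p * math.log2(p) for p in probs)

pts = [(0, 1), (4, 0), (0, 2), (0, 0)]
h_start = twc_entropy(pts, 0.0)   # equals log2(N) at the center of mass
h_later = twc_entropy(pts, 5.0)
print(h_start, h_later)
```

Since all weights are equal at alpha = 0, the normalized weights are uniform and the entropy takes its largest possible value, $\log_2 N$.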
3.2.5 Free Energy

This section introduces the concept of “free energy” from physics. However, we adapt this concept to the case of a distribution of points. In physics, free energy, F, is obtained from the internal energy, U, minus the product of temperature and entropy, TS,

$$F = U - TS. \tag{3.26}$$

Free energy is a measure of the amount of energy available to do work. Whenever an energy change takes place, the total amount of energy stays the same, in a conservative system. As the entropy increases, the disorder energy increases and free energy decreases. The concepts and laws coming from physics, in this context free energy and entropy, are useful concepts for understanding our approach. We use the concept of a thermodynamical system as a function of T, where $T = 1/\alpha_n$, with $\alpha_n$ our parameter, to calculate the free energy in the context of a distribution of points.
TWC uses the concept of free energy in the context of geometric distributions of events in a time/space geographic domain, akin to the concept of free energy in thermodynamics. This is done as follows.

Definition 3.17 Given

$$Z(\alpha_n) = \sum_{i=1}^{N} w_i(\alpha_n),$$

the free energy of the system is

$$F(\alpha_n) = -\frac{\ln Z(\alpha_n)}{\alpha_n}. \tag{3.27}$$
Remark 3.18 The parameter alpha plays the role of the inverse of temperature. The singular points of free energy at the high temperature phase, where $T = 1/\alpha_n$, is close to the center of mass (is close to the low temperature phase, or is close to the maximum density), and are separated by a region where the free energy is maximum at least at one point. The more a point is relatively close to most of the others, the less it contributes to the distance. It can be defined as a system distance because it takes into account not only the individual pairs but the set of all the points (this insight is due to Roberto Benzi).
3.2.6 Alpha Star

There is a (unique) point $\alpha^*$ of maximum free energy,

$$F(\alpha^*) = \max_{\alpha_n} \{F(\alpha_n)\}. \tag{3.28}$$

Theorem 3.19 $\max_{\alpha_n} \{F(\alpha_n)\}$ exists and is unique, that is, $\alpha^*$ such that $F(\alpha^*) = \max_{\alpha_n} \{F(\alpha_n)\}$ is unique.

Proof Change the notation from $F(\alpha_n) = -\frac{\ln Z(\alpha_n)}{\alpha_n}$ to $F(x) = -\frac{\ln Z(x)}{x}$. Then

$$F(x) = -\frac{\ln Z(x)}{x} = -\frac{\ln \sum w_i(x)}{x} = -\frac{\ln \sum \frac{1}{N-1} \sum_{j \ne i}^{N} e^{-\frac{\bar d_{ij}}{D} x}}{x}.$$
Now, x > 0 so that F(x) is continuous. By construction, the sequence $\alpha_n$, our x, is an increasing sequence that has an upper limit. Thus $x \in [\alpha_0, A]$, $\alpha_0 > 0$. This means that F(x) has a maximum in $[\alpha_0, A]$ (since F is continuous). Thus, an optimum exists. Next let us look at uniqueness.

$$F'(x) = -\left[\frac{1}{\sum w_i(x)} \sum w_i'(x)\, x^{-1} - \ln\left(\sum w_i(x)\right) x^{-2}\right].$$

But

$$w_i'(x) = -\frac{\bar d_{ij}}{D}\, w_i(x),$$

so that

$$\begin{aligned}
F'(x) &= -\left[\frac{1}{\sum w_i(x)} \left(-\frac{\bar d_{ij}}{D} \sum w_i(x)\right) x^{-1} - \ln\left(\sum w_i(x)\right) x^{-2}\right] \\
&= \frac{1}{\sum w_i(x)}\, \frac{\bar d_{ij}}{D} \sum w_i(x)\, x^{-1} + \ln\left(\sum w_i(x)\right) x^{-2}.
\end{aligned}$$

This means that $F'(x) = 0$ for

$$\frac{1}{\sum w_i(x)}\, \frac{\bar d_{ij}}{D} \sum w_i(x)\, x^{-1} = -\ln\left(\sum w_i(x)\right) x^{-2},$$

$$\frac{1}{\sum w_i(x)}\, \frac{\bar d_{ij}}{D} \sum w_i(x) = F(x).$$

This is a fixed point problem. A necessary condition is that $|F'(x)| \le 1$. This condition means that

$$\left|\frac{1}{\sum w_i(x)}\, \frac{\bar d_{ij}}{D} \sum w_i(x)\, x^{-1} + \ln\left(\sum w_i(x)\right) x^{-2}\right| \le 1,$$

$$\left|\frac{1}{\sum w_i(x)}\, \frac{\bar d_{ij}}{D} \sum w_i(x) + \ln\left(\sum w_i(x)\right) x^{-1}\right| \le x,$$

$$\left|\frac{1}{\sum w_i(x)}\, \frac{\bar d_{ij}}{D} \sum w_i(x) - F(x)\right| \le x,$$

so that

$$-x \le \frac{1}{\sum w_i(x)}\, \frac{\bar d_{ij}}{D} \sum w_i(x) - F(x) \le x$$

is our fixed point condition. Now

$$\frac{\min \bar d_{ij}}{D} \le \frac{1}{\sum w_i(x)}\, \frac{\bar d_{ij}}{D} \sum w_i(x) \le 1,$$

so that

$$\frac{\min \bar d_{ij}}{D} - F(x) \le \frac{1}{\sum w_i(x)}\, \frac{\bar d_{ij}}{D} \sum w_i(x) - F(x) \le 1 - F(x).$$

The condition then is that

$$-x \le \frac{\min \bar d_{ij}}{D} - F(x) \le 1 - F(x) \le x,$$

$$-x + F(x) \le \frac{\min \bar d_{ij}}{D} \le 1 \le x + F(x). \tag{3.29}$$

This expression is dominated by x, and x increases, so that Eq. (3.29) holds. In fact it holds with strict inequality, so that the fixed point (or maximum) is unique. The maximum value of the free energy over the iterations therefore represents a unique point. After the point associated with $\alpha^*$, the entropy and free energy begin to decrease together, while the temperature monotonically decreases.
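In practice, the maximizer of Eq. (3.28) can be located by evaluating the free energy of Eq. (3.27) along the discrete sequence of parameter values. The sketch below is a plain grid search under assumptions made here: the step `eps = 0.1`, the upper limit `n_max`, the function names and the sample points are illustrative, and the value at zero is skipped because Eq. (3.27) divides by the parameter.

```python
import math

def attraction_strengths(points, alpha):
    """Parametric attraction strengths w_i(alpha), Eq. (3.20)."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    pd = [[0.0 if i == j else (sum(d[i]) - d[i][j]) / (n - 2)
           for j in range(n)] for i in range(n)]
    big_d = max(max(row) for row in pd)
    return [sum(math.exp(-pd[i][j] / big_d * alpha)
                for j in range(n) if j != i) / (n - 1) for i in range(n)]

def free_energy(points, alpha):
    """F(alpha) = -ln Z(alpha) / alpha with Z the sum of the weights."""
    z = sum(attraction_strengths(points, alpha))
    return -math.log(z) / alpha

def alpha_star(points, eps=0.1, n_max=500):
    """Grid value alpha_n = n * eps (n = 1..n_max) with the largest F."""
    alphas = [n * eps for n in range(1, n_max + 1)]
    return max(alphas, key=lambda a: free_energy(points, a))

pts = [(0, 1), (4, 0), (0, 2), (0, 0)]
best = alpha_star(pts)
print(best, free_energy(pts, best))
```

A grid search returns only the best value on the chosen grid, so the result depends on the step and on the upper limit of the sequence.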
3.3 TWC Algorithms

The five TWC algorithms depict, in the form of a map, what we call a scalar field. In general, the scalar field or TWC map is as follows.
Definition 3.20 (Scalar Field) A scalar field or TWC map defined on a region of space U is a function $s : U \to X$, with $X \subseteq \mathbb{R}$, that associates each point of $U \subset \mathbb{R}^n$ (in our case a point is $(x, y) \in \mathbb{R}^2$, the 2-D case) with a scalar value, a real number. The real number will be different for each of our five TWC types.

The five types of TWC algorithms use the components discussed in the previous sections. However, each type has its own approach to the implementation of these components. TWC theory is based on the definition of a different type of “centroid”. This point is no longer only a function of the Euclidean distances of the points but also depends on their pseudo-distances and is modulated by a control parameter $\alpha_n$ ($\beta$, $\gamma$). By considering distances in the modified space, that is, the space in which pseudo-distances have modified the relationships between points by expanding or contracting the Euclidean distance, it is possible to take into account the structure of the distribution of points as a whole and in particular its varying local levels of organization. These represent, from the TWC point of view, a key piece of information used in inferring the temporal unfolding of the process represented by the way the points are geographically distributed. The topological weighted centroid is defined next, where the parameter $\alpha_n$ could also be $\beta$, $\gamma$.

Definition 3.21 (Topological Weighted Centroid) Given $D = \{(x_i, y_i)\}_{i=1}^{N} \subset \mathbb{R}^2$, the topological weighted centroid (TWC) of the points belonging to D for any specific value of the control parameter $\alpha_n$ is given by the point $TWC(\alpha_n) = \left(TWC_x(\alpha_n), TWC_y(\alpha_n)\right) \in \mathbb{R}^2$, where the coordinates are:

$$TWC_x(\alpha_n) = \frac{\sum_{i=1}^{N} w_i(\alpha_n)\, x_i}{\sum_{i=1}^{N} w_i(\alpha_n)}, \tag{3.30}$$

$$TWC_y(\alpha_n) = \frac{\sum_{i=1}^{N} w_i(\alpha_n)\, y_i}{\sum_{i=1}^{N} w_i(\alpha_n)}. \tag{3.31}$$
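Remark 3.22 For the TWC algorithms, $\alpha_0 = 0$, $\alpha_{n+1} = \alpha_n + 0.1$ ($\alpha$ being a generic parameter that could also be $\beta$, $\gamma$), where these values are used in the implementation of all the algorithms.

Proposition 3.23 Given the $TWC(\alpha_n)$ for a set D of N points in $\mathbb{R}^2$, then,

$$\begin{aligned}
TWC(\alpha_0) &= \left(\frac{1}{N}\sum_{i=1}^{N} x_i,\ \frac{1}{N}\sum_{i=1}^{N} y_i\right), \\
\lim_{n \to \infty} TWC(\alpha_n) &= \frac{1}{2R}\left(\sum_{k=1}^{R} \left(x_{i_k} + x_{j_k}\right),\ \sum_{k=1}^{R} \left(y_{i_k} + y_{j_k}\right)\right),
\end{aligned} \tag{3.32}$$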
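where $x_{i_k}, x_{j_k}, y_{i_k}, y_{j_k}$, with $(i_k, j_k) = \arg\min \bar d_{ij}$, $i, j \in \{1, \ldots, N\}$, identify the points between which the pseudo-distance is minimal. There could be more than one such minimal pseudo-distance.

Proof (a) If n = 0, $\alpha_0 = 0$, and

$$\begin{aligned}
TWC_x(0) = \frac{\sum_{i=1}^{N} w_i(0)\, x_i}{\sum_{i=1}^{N} w_i(0)}
&= \frac{\frac{1}{N-1}\sum_{i=1}^{N} \sum_{j \ne i,\ j=1}^{N} e^{-\frac{\bar d_{ij}}{D}\, 0}\, x_i}{\frac{1}{N-1}\sum_{i=1}^{N} \sum_{j \ne i,\ j=1}^{N} e^{-\frac{\bar d_{ij}}{D}\, 0}} \\
&= \frac{\frac{1}{N-1}\sum_{i=1}^{N} x_i \sum_{j \ne i,\ j=1}^{N} 1}{\frac{1}{N-1}\sum_{i=1}^{N} \sum_{j \ne i,\ j=1}^{N} 1}
= \frac{\sum_{i=1}^{N} x_i}{N}
= \frac{1}{N}\sum_{i=1}^{N} x_i.
\end{aligned}$$

Similarly,

$$TWC_y(0) = \frac{1}{N}\sum_{i=1}^{N} y_i.$$

This is the center of mass. That is, the algorithm starts at the center of mass.

(b) For TWC, $\alpha_n$, n > 0, the case of the x coordinate will be proved. The y coordinate is done in the same way.

$$TWC_x(\alpha_n) = \frac{\sum_{i=1}^{N} w_i(\alpha_n)\, x_i}{\sum_{i=1}^{N} w_i(\alpha_n)}
= \frac{\frac{1}{N-1}\sum_{i=1}^{N} \sum_{j \ne i,\ j=1}^{N} e^{-\frac{\bar d_{ij}}{D}\,\alpha_n}\, x_i}{\frac{1}{N-1}\sum_{i=1}^{N} \sum_{j \ne i,\ j=1}^{N} e^{-\frac{\bar d_{ij}}{D}\,\alpha_n}}.$$

Since the free energy identifies the optimal $\alpha$, at the optimal alpha, $\alpha^*$, we have

$$TWC_x(\alpha^*) = \frac{\sum_{i=1}^{N} w_i(\alpha^*)\, x_i}{\sum_{i=1}^{N} w_i(\alpha^*)}
= \frac{\frac{1}{N-1}\sum_{i=1}^{N} \sum_{j \ne i,\ j=1}^{N} e^{-\frac{\bar d_{ij}}{D}\,\alpha^*}\, x_i}{\frac{1}{N-1}\sum_{i=1}^{N} \sum_{j \ne i,\ j=1}^{N} e^{-\frac{\bar d_{ij}}{D}\,\alpha^*}},$$

which is the second of the above equations.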
The TWC weighted centroid at the zeroth iteration coincides with the classic center of mass

$$CM = \left(\frac{1}{N}\sum_{i=1}^{N} x_i,\ \frac{1}{N}\sum_{i=1}^{N} y_i\right),$$

whose coordinates are nothing more than the average of the coordinates of all the points of the given set. As $\alpha_n$ increases monotonically by a small quantity $\epsilon$, $\alpha_{n+1} = \alpha_n + \epsilon$ (we use $\epsilon = \frac{1}{10}$, as mentioned above), the centroid moves away from the classic center of mass. Definition 3.21 considers a centroid point for each iteration of the algorithm, that is, for each element of the alpha sequence. This means that, considering very small values of $\epsilon > 0$, when $\alpha_n$ varies, a trajectory will be drawn. Since what we are looking for is only one point that can constitute the centroid, not merely topologically but weighted by the intrinsic semantics of the points of D, it is necessary to determine the optimal alpha value. In order to do this, some objective functions will be defined in the following sections. To this end, the two concepts previously defined, entropy and free energy, in the context of a geographic distribution of points, are used.
3.4 TWC-Alpha

The computation of the TWC-Alpha map uses the concept of alpha star. Recall that

$$F(\alpha^*) = \max_{\alpha_n} \{F(\alpha_n)\}.$$
It is this alpha star that is used.
3.4.1 TWC Alpha Point

$$TWC_x(\alpha^*) = \frac{1}{\sum_{i=1}^{N} w_i(\alpha^*)} \sum_{i=1}^{N} w_i(\alpha^*)\, x_i, \tag{3.33}$$

$$TWC_y(\alpha^*) = \frac{1}{\sum_{i=1}^{N} w_i(\alpha^*)} \sum_{i=1}^{N} w_i(\alpha^*)\, y_i. \tag{3.34}$$
Remark 3.24 The TWC alpha point is the point where the free energy is maximal and, as shown experimentally, the entropy of the system is minimal. It is therefore a point of maximum organization, and for this reason it is taken as the possible origin (impetus) of the system’s dynamics. In a certain sense, if a musician were placed at each assigned point, the alpha point would be the place from which one could best listen to the “harmony” produced by all the “musicians”. The alpha point thus detects the presumed outbreak source (impetus) of the distribution given by the input data set (the assigned points). Mathematically speaking, it is the opposite of the center of mass (centroid). Given N points, the center of mass is the place from which all the points appear most similar to each other. Recall that the center of mass is based on the concept of average, and the average minimizes the sum of the squared distances of all the points from the average itself. Consequently, the entropy of the distances at this place is maximal. The alpha point works in the opposite way: it is the place from which all the given points appear most different. For the same reason the alpha point is the place where the entropy is minimal and the free energy is maximal, that is, the place from which the given points show the maximal order. A biological example can help. Let us imagine that the N given points represent the places from which a predator can come. Let us imagine also that a prey, knowing the positions of these points, needs to choose the safest place to rest. Under these conditions, the alpha point is, for the prey, the place from which it can control all the given points with the minimum of energy, because from this position they appear different from each other.
Furthermore, the alpha point is, for the prey, the place where it keeps the maximum free energy it can spend. The inverse logic can also be applied to this example. When we have N points where an unknown process is manifest, the alpha point can be said to be the most probable place from which the process itself originated (outbreak or impetus point). The farther the alpha point is from the center of mass, the more organized the distribution of the assigned points is, and consequently the more “nonintuitive” the estimation of the outbreak or impetus point is when looking at the assigned point (input
data) distribution. Figure 3.6 shows the N assigned points (N = 12) and the positions of the alpha point and the center of mass.
3.4.2 Alpha Map/Scalar Field
The Alpha Map measures the degree of “activation” of different regions of the space in terms of their proximity, with respect to pseudo-distances, to the path leading to alpha star. The Alpha Map therefore estimates the short-term unfolding of the process by means of the observed distribution of past events. The associated equations are as follows (Fig. 3.7).
\[ m_{kj} = \sqrt{\left(x_k - TWC_x(\alpha_j)\right)^2 + \left(y_k - TWC_y(\alpha_j)\right)^2}. \]
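For Semantic TWC the equations change as follows, where we repeat the Euclidean distance under another variable name since it will be rescaled accordingly.
\[ d_{kj} = \sqrt{\left(x_k - TWC_x(\alpha_j)\right)^2 + \left(y_k - TWC_y(\alpha_j)\right)^2}, \]
\[ \mathrm{Mean}D_i^{[CM]} = \frac{1}{N-1} \sum_{n}^{N} d_{kn}^{[CM]}, \qquad \mathrm{Mean}D_i^{[Euclidean]} = \frac{1}{N-1} \sum_{n}^{N} d_{kn}^{[Euclidean]}, \]
\[ m_{kj} = d_{kj}\, \frac{\frac{1}{N} \sum_{n}^{N} \mathrm{Mean}D_i^{[CM]}}{\frac{1}{N} \sum_{n}^{N} \mathrm{Mean}D_i^{[Euclidean]}}. \]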
Fig. 3.7 Center of mass, path, distance
\[ a_k = \frac{1}{N} \sum_{j=0}^{N-1} e^{-\frac{m_{kj}}{D}\,\beta^*}, \]
where β∗ (beta star) is defined below, k is the index of each point of the plane, and j is the index of each point of the TWC trajectory. Remark 3.25 The Alpha Map represents how close every point of the grid in which the process occurs is to a “safe path”, its alpha path. Because the grid points cover the entire region, the Alpha Map is a scalar field that may be represented with a “hot map”, whose colors vary from dark red (grid points closer to the alpha path) to dark blue (grid points far away from the alpha path). If the resulting Alpha Map is divided into intervals (bins) according to their color value, then it is possible to calculate the probability that each area of the map is selected in a random search. The lower this probability is for the dark red area of the map, the more probable it is that the outbreak point lies at the location identified by the alpha point. In brief, the dark red area of the Alpha Map indicates the average error of the position of the alpha point and identifies the zone where the alpha point can be found. Because the alpha quantities detect the presumed origin or impetus of the distribution of the assigned points, all the quantities linked to alpha (alpha point, alpha path and Alpha Map) work as inferences about the past of the process represented by the assigned or input data points. Figure 3.8 shows the Alpha Map of the 12 point example, divided into 20 intervals (bins).
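As an illustration of how such a scalar field can be evaluated, the following is a minimal sketch and not the authors' implementation. It assumes that the exponent of the activation reads −(m_kj / D)·β∗, that plain Euclidean distances stand in for the pseudo-distances, and that the average runs over the points of the path. The function name `path_activation_map` and the toy coordinates are hypothetical.

```python
import math

def path_activation_map(grid_points, path_points, d_max, beta_star):
    """Mean activation of each grid point with respect to a discrete path.

    Assumed reading: a_k = (1/N) * sum_j exp(-(m_kj / D) * beta_star),
    where m_kj is the Euclidean distance from grid point k to path point j.
    """
    field = []
    for gx, gy in grid_points:
        total = 0.0
        for px, py in path_points:
            m_kj = math.hypot(gx - px, gy - py)
            total += math.exp(-(m_kj / d_max) * beta_star)
        field.append(total / len(path_points))
    return field

# Hypothetical toy data: a three-point path and two grid points.
alpha_path = [(0.0, 0.0), (0.5, 0.2), (1.0, 0.5)]
grid = [(0.5, 0.2), (5.0, 5.0)]  # one point on the path, one far from it
activation = path_activation_map(grid, alpha_path, d_max=10.0, beta_star=3.0)
```

Under these assumptions the grid point lying on the path receives the larger activation, which is the behavior the "dark red" regions of the map are meant to express.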
Fig. 3.8 Alpha map of the 12 point example
3.5 Self-Topological Weighted Centroid (STWC)
The self-topological weighted centroid points are pairs of coordinates (x, y) that form a trajectory similar to that of TWC-Alpha, but computed with the self-attraction strength, as defined below by Eqs. (3.35) and (3.36). Drawn in a Cartesian plane, these points determine a trajectory whose starting point is the center of mass of the assigned points. The “optimal” point, the beta star (formally defined below), is the point of maximum distance reached before the trajectory turns back towards the center of mass. That is, the trajectory starts from the center of mass and stops when the maximum distance between the starting point and the point calculated by the iteration is attained. Experiments indicate that after a certain number of iterations, not known a priori, the trajectory curves and tends to return to the starting point, since the strength of each point on itself becomes dominant with respect to the others. The associated equations are as follows.
\[ STWC_x(\beta_n) = \frac{1}{\sum_{i=1}^{N} w_i(\beta_n)} \sum_{i=1}^{N} w_i(\beta_n)\, x_i, \tag{3.35} \]
\[ STWC_y(\beta_n) = \frac{1}{\sum_{i=1}^{N} w_i(\beta_n)} \sum_{i=1}^{N} w_i(\beta_n)\, y_i. \tag{3.36} \]
3.5.1 Beta Star
The beta star is the optimum control parameter of the STWC. It corresponds to the point where the trajectory stops. That is,
\[ \beta^* = \arg\max_{\beta_n} \operatorname{dist}\left(STWC(\beta_n),\ \mathrm{CenterMass}\right). \tag{3.37} \]
The beta star parameter is of fundamental importance in the calculation of scalar fields (heat maps). Remark 3.26 Given a distribution of N points in a plane, the optimal beta parameter (β ∗ ) is the width of the bell curve preserving the interaction among the points. Under this optimal width each point tends to interact mainly with itself. This interaction is simulated considering, for each assigned point, a function of its self attraction from the other points, including itself. This quantity, weighted by the x and y of each point, is repeated in a loop, increasing the beta parameter monotonically. At each cycle a new pair of coordinates is generated (a new beta point). When the beta parameter is β0 the beta point is the same as the center of mass. Increasing the beta parameter,
Fig. 3.9 Beta star for the 12 point example
the beta point moves away from the center of mass. At a specific value of the beta parameter (depending on the distribution of the assigned points), however, the distance between the beta point and the center of mass stops increasing; for bigger values of the beta parameter the beta point moves back towards the center of mass until it coincides with it again. When increasing the beta parameter no longer increases the distance between the beta point and the center of mass, the interaction of each point with itself is becoming bigger than the interaction of that point with the others. Consequently, this value of the beta parameter, called Beta Star, is the limit beyond which the N assigned points stop influencing the system of points. Figure 3.9 illustrates the optimal beta parameter.
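The stopping rule described in Remark 3.26 can be sketched as a simple scan over increasing values of the beta parameter. This is only a sketch: the weights w_i(β) are defined elsewhere in the chapter, so the trajectory is passed in as a function, and the names `find_beta_star` and `toy_trajectory` are hypothetical. The toy trajectory is an artificial curve that leaves the origin and then returns, chosen only so that the example can be run.

```python
import math

def find_beta_star(point_at, center, betas):
    """Scan increasing betas; stop when the distance from the center first decreases.

    point_at(beta) must return the (x, y) beta point for that parameter value.
    Returns the beta of maximum distance and that distance.
    """
    best_beta, best_dist = betas[0], -1.0
    for beta in betas:
        x, y = point_at(beta)
        dist = math.hypot(x - center[0], y - center[1])
        if dist < best_dist:
            break  # the trajectory has started to come back
        best_beta, best_dist = beta, dist
    return best_beta, best_dist

def toy_trajectory(beta):
    # Artificial stand-in for (STWC_x, STWC_y): farthest from the origin at beta = 1.
    return (beta * math.exp(-beta), 0.0)

betas = [i / 10 for i in range(0, 31)]
beta_star, max_dist = find_beta_star(toy_trajectory, (0.0, 0.0), betas)
```

In an actual computation `point_at` would evaluate Eqs. (3.35) and (3.36) for the given β, and the step of the scan controls how finely β∗ is resolved.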
3.6 TWC-Beta
The Beta Map defines the probability of observing new points (events) that belong to the same distribution as the input data. The inputs are the coordinates of the points in space at which the scalar field is calculated (the grid points) and the coordinates of the assigned/given data points. In the associated calculations the index i relates to the assigned or given data points and the index j relates to the grid points. The matrix of beta points is n_ij, where
\[ n_{ij} = \sqrt{\left(x_i - x_j\right)^2 + \left(y_i - y_j\right)^2}. \tag{3.38} \]
For the Semantic TWC this matrix is changed as follows. Here we repeat the distance between any grid point and the assigned point using another label since the beta points given by 3.38 will be modified.
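\[ d_{ij} = \sqrt{\left(x_i - x_j\right)^2 + \left(y_i - y_j\right)^2}, \]
\[ \mathrm{Mean}D_i^{[CM]} = \frac{1}{N-1} \sum_{n}^{N} d_{kn}^{[CM]}, \qquad \mathrm{Mean}D_i^{[Euclidean]} = \frac{1}{N-1} \sum_{n}^{N} d_{kn}^{[Euclidean]}, \]
\[ n_{ij} = d_{ij}\, \frac{\frac{1}{N} \sum_{n}^{N} \mathrm{Mean}D_i^{[CM]}}{\frac{1}{N} \sum_{n}^{N} \mathrm{Mean}D_i^{[Euclidean]}}, \]
\[ b_k = \frac{1}{N} \sum_{j=0}^{N-1} e^{-\frac{m_{kj}}{D}\,\beta^*}, \]
where β∗ is defined in Eq. (3.37), k is the index of each point of the plane, and j is the index of each point of the TWC trajectory.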
3.6.1 Beta Map
The Beta Map defines the probability of observing new points (events) that belong to the same distribution, that is, to the same data generating process. The Beta Map can therefore be regarded as a snapshot of a probability density function which is implicit in the structure of the data set. The output of the Beta Map is a K × K matrix, n_ij, 1 ≤ i, j ≤ K, so that in the end we have an activation value in the interval [0, 1] for each point of the grid. Remark 3.27 The beta map represents a distribution function of other points, beyond the assigned points, at the time when the assigned points were collected. The beta map is a scalar field that may be visualized as a “hot map”: the darker red an area is, the higher the probability of finding other points of the same process there. In epidemic analysis the red areas can point out new areas of interest where new cases can be found. The Beta map is regulated by three factors:
a. The beta star parameter, which tunes the influence that each one of the assigned points can have on the others;
b. The pseudo-distance;
c. The attraction strength (w_i) that each point exercises in its neighboring area.
A biological example might help. Imagine that the N given points represent the places from which a predator can come. The beta map can be interpreted as representing the areas where the probability of finding other predators is higher. In other words, the Beta map is a map of the probability of finding new points at the present time. Figure 3.10 shows the beta map of the twelve point example.
Fig. 3.10 Beta map for the 12 point example
3.7 TWC-Gamma The gamma algorithm further probes into future configurations of the underlying structure of the process being studied.
3.7.1 Gamma Paths
Gamma paths are trajectories which, considering the attraction strengths exerted by each assigned point, connect the center of mass to each of them. Gamma paths consist of a finite number of points that varies from trajectory to trajectory. The control parameter is γ_i, one for each of the source points. The input data consist of the coordinates of the assigned points as well as the matrix of pseudo-distances. The output consists of N different discrete trajectories, where N is the number of assigned points as before, connecting the center of mass to each point. The trajectory is calculated as follows.
\[ p_{ij}(\gamma_i(t)) = e^{-\frac{d_{ij}}{D}\,\gamma_i(t)}. \]
The Semantic TWC gamma changes the above to
\[ p_{ij}(\gamma_i(t)) = e^{-\frac{d_{ij}^{[CM]} + d_{ij}}{D}\,\gamma_i(t)}. \]
The trajectories are made up of the points
\[ TWC_x(\gamma_i(t)) = \frac{1}{\sum_{j=1}^{N} p_{ij}(\gamma_i(t))} \sum_{j=1}^{N} p_{ij}(\gamma_i(t))\, x_j, \]
\[ TWC_y(\gamma_i(t)) = \frac{1}{\sum_{j=1}^{N} p_{ij}(\gamma_i(t))} \sum_{j=1}^{N} p_{ij}(\gamma_i(t))\, y_j, \]
where D is the maximum of the pseudo-distances and γ_i is increased by a fixed step at each iteration, γ_i(t + 1) = γ_i(t) + Δγ. The main idea behind the creation of gamma trajectories is that at an early stage of the process, the points have somehow begun to “communicate” with each other. Since the system is not yet organized or optimized in principle, the “messages” sent take on random trajectories, thus favoring the passage through the center of mass. For this reason, it has been hypothesized that “communication” in the near future will take place using the center of mass as an intermediary. Remark 3.28 The gamma scalar field may be thought of as a further qualification of the beta field, which transforms attraction strengths into intensities of network interaction. In fact, it may be regarded as a “prediction” of the evolution of the beta field, when the points start communicating in a non-organized way. The gamma map requires that a simulated exchange of “messages” among the assigned points be introduced. For this reason the gamma map may be considered the immediate future of the beta map, and it is consequently hypothesized to have a predictive nature. In a situation where no point “knows” the position of the others, in order to optimize the path of its messages to the first neighbors, the center of mass tends to become the first addressee of all the points. Consequently, it is useful to assume that in this phase the center of mass works as the central hub through which all the messages are distributed from each point to any other point. The path linking each point to the center of mass can be simulated taking into account all the other paths, in order to generate a systemic picture where each path is influenced by all the others. Figure 3.11 shows the gamma paths between the points of the twelve point example and the center of mass.
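The construction of a single gamma path can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: Euclidean distances are used in place of the pseudo-distances, the exponent is read as −(d_ij / D)·γ, the weighted mean is taken over the coordinates of the points j, and the schedule of γ values is arbitrary. The names `gamma_path`, `points` and `gammas` are hypothetical.

```python
import math

def gamma_path(points, i, gammas):
    """Discrete trajectory for source point i, one (x, y) per gamma value.

    For each gamma: p_ij = exp(-(d_ij / D) * gamma), and the path point is the
    p-weighted mean of all assigned points.
    """
    n = len(points)
    dist = [[math.hypot(px - qx, py - qy) for (qx, qy) in points]
            for (px, py) in points]
    d_max = max(max(row) for row in dist)  # D: largest pairwise distance
    path = []
    for gamma in gammas:
        p = [math.exp(-(dist[i][j] / d_max) * gamma) for j in range(n)]
        total = sum(p)
        x = sum(p[j] * points[j][0] for j in range(n)) / total
        y = sum(p[j] * points[j][1] for j in range(n)) / total
        path.append((x, y))
    return path

# Hypothetical toy data: four assigned points at the corners of a rectangle.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
gammas = [0.5 * t for t in range(0, 201)]  # gamma from 0 to 100 in steps of 0.5
path_to_first = gamma_path(points, 0, gammas)
```

With γ = 0 all weights are equal and the path starts at the center of mass; as γ grows, the weight of point i on itself dominates and the path approaches point i, which is the behavior the text describes for a gamma path.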
Fig. 3.11 Gamma path of the 12 point example
3.7.2 Gamma Map
The gamma map is the scalar field generated by calculating the distance of each grid point of the box from all the gamma paths. The gamma map, or gamma scalar field, describes the extent to which each point of the space is activated by its closeness to any of the points belonging to any of the TWC(γ_i(t)) trajectories. The input to this part of the algorithm consists of all the gamma paths, that is, the N vectors of coordinates
\[ TWC(\gamma_i(t)) = \left(TWC_x(\gamma_i(t)),\ TWC_y(\gamma_i(t))\right), \qquad t = 1, 2, \ldots, Q_i. \]
A space U, usually a square, is the subset of the plane on which the scalar field is defined. The distances are calculated as follows.
\[ m_{k\gamma_i(t)} = \sqrt{\left(x_k - TWC_x(\gamma_i(t))\right)^2 + \left(y_k - TWC_y(\gamma_i(t))\right)^2}. \]
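For Semantic TWC the change is
\[ d_{k\gamma_i(t)} = \sqrt{\left(x_k - TWC_x(\gamma_i(t))\right)^2 + \left(y_k - TWC_y(\gamma_i(t))\right)^2}, \]
\[ \mathrm{Mean}D_i^{[CM]} = \frac{1}{N-1} \sum_{n}^{N} d_{kn}^{[CM]}, \qquad \mathrm{Mean}D_i^{[Euclidean]} = \frac{1}{N-1} \sum_{n}^{N} d_{kn}^{[Euclidean]}, \]
\[ m_{k\gamma_i(t)} = d_{k\gamma_i(t)}\, \frac{\frac{1}{N} \sum_{n}^{N} \mathrm{Mean}D_i^{[CM]}}{\frac{1}{N} \sum_{n}^{N} \mathrm{Mean}D_i^{[Euclidean]}}, \]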
Fig. 3.12 Gamma map for the 12 point example
\[ c_k = \frac{1}{\sum_{i=1}^{N} Q_i} \sum_{i=1}^{N} \sum_{j=0}^{V-1} e^{-\frac{m_{k\gamma_i(t)}}{D}\,\beta^*}, \]
where the points (xk , yk ) are points of U and Q i is the number of points making up the ith trajectory. The gamma map for the 12 point example is given next in Fig. 3.12.
3.8 TWC-Theta TWC—Theta consists of the theta path, theta distances, and uses the minimal spanning tree to depict the connections between the assigned points. These connections can be interpreted as the possible pathways of the underlying process being modeled. We begin with the theta path.
3.8.1 Theta Paths
The theta paths are the N(N − 1)/2 paths that directly connect each pair of points of the data set, built from the gamma paths according to their nonlinear distances from the center of mass. In this way, for each possible interaction between any pair of points, we can determine, on the basis of the curvature of the corresponding nonlinear
edge, which spatial direction exerts the most significant interference on that interaction. The inputs are the gamma paths and the coordinates of the assigned points. The output is the set of N(N − 1)/2 trajectories connecting each assigned point with all the others, that is, a complete regular graph with nonlinear edges. These are calculated as follows.
\[ TWC\_\theta_{(ij)x}(t) = x_j + \left[TWC_{ix}(\gamma_i(t)) - TWC_{jx}(\gamma_i(t))\right], \tag{3.39} \]
\[ TWC\_\theta_{(ij)y}(t) = y_j + \left[TWC_{iy}(\gamma_i(t)) - TWC_{jy}(\gamma_i(t))\right]. \tag{3.40} \]
After an initial random interaction, it is assumed that the points start to “communicate” (be related or connect) in a more direct and organized way. For this reason, based on the gamma trajectories, the process is thought to evolve towards a further future that directly connects each point with every other one without the need to pass through the center of mass.
3.8.2 Theta Distances
The theta distances consist of a matrix that contains all the nonlinear distances between all the assigned points. It quantifies all the nonlinear connections, that is, “interactions”, between the points of the spatial distribution. The input to the algorithm is the set of theta paths. The result is an N × N matrix, where N is the number of the assigned points. The algorithm calculates this in the following way.
\[ \theta_{ij} = \sum_{t=1}^{Z_{ij}-1} \operatorname{dist}\left(TWC\_\theta_{(ij)x}(t),\ TWC\_\theta_{(ij)y}(t)\right), \qquad Z_{ij} = \min\left\{Q_i, Q_j\right\}, \]
where Q_i, Q_j are the numbers of points of the gamma paths and dist(_, _) is the usual Euclidean distance between two consecutive points of the same theta path, those defined by Eqs. (3.39) and (3.40). The theta distance arises from the need to measure distances between points not only in Euclidean terms but also taking into account past history and the way the points interact. It is therefore based on the length of the theta trajectory that joins a pair of points, knowing that this trajectory takes into account the gamma paths.
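To make Eqs. (3.39) and (3.40) and the theta distance concrete, here is a small sketch. It assumes that the two gamma paths are given as lists of (x, y) points of possibly different lengths, truncated to the shorter one, and that the theta distance is the sum of the Euclidean distances between consecutive points of the resulting theta path. The function names and the straight-line toy paths are hypothetical.

```python
import math

def theta_path(path_i, path_j, point_j):
    """Theta path from point j towards point i, following Eqs. (3.39)-(3.40).

    Each theta point is point j shifted by the difference between the two
    gamma paths at the same step t.
    """
    z = min(len(path_i), len(path_j))
    return [(point_j[0] + path_i[t][0] - path_j[t][0],
             point_j[1] + path_i[t][1] - path_j[t][1]) for t in range(z)]

def theta_distance(path):
    """Length of a discrete path: sum of distances between consecutive points."""
    return sum(math.dist(path[t], path[t + 1]) for t in range(len(path) - 1))

# Hypothetical toy gamma paths, both starting at a center of mass placed at the origin.
gamma_i = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # ends at point i = (2, 0)
gamma_j = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]  # ends at point j = (0, 2)
theta_ij = theta_path(gamma_i, gamma_j, (0.0, 2.0))
length_ij = theta_distance(theta_ij)
```

Because both gamma paths start at the center of mass and end at their own assigned point, the theta path in this sketch starts at point j and ends at point i. For straight gamma paths it reduces to the straight segment between the two points; curvature in the gamma paths is what makes the theta distance differ from the Euclidean one.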
3.8.3 Nonlinear Minimum Spanning Tree
The nonlinear minimum spanning tree (NL-MST) represents the minimum-energy tree configuration that connects all the points of the distribution, pruning away all the edges that are not indispensable to maintain the connectedness among points by means of the most relevant interactions. The tree requires the theta distances and the Euclidean distances as input and produces the NL-MST with nonlinear edges. The calculation proceeds as follows.
1. The MST is calculated from the Euclidean distance matrix using the well-known Kruskal algorithm (see [3]).
2. The list of linear edges chosen by the algorithm is selected.
3. The linear edges are replaced by the nonlinear theta trajectories.
Thus, TWC uses two associated algorithms that are typically used in the graph-theoretic and in the artificial adaptive systems settings (see [2]). In the context of TWC, the MST is used in conjunction with TWC-Theta to generate the graph between the nodes (geographic locations of the given data) that indicates the paths of greatest significance between them. Recall that the MST is an acyclic subset T of the set E of all edges between the geographical locations given by the data, which connects all of the vertices in the graph and whose total weight is minimal, where the total weight in our case is given by
\[ D(T_r) = \sum_{k=0}^{N_v - 1} \hat{d}_{i_{r_k} j_{r_k}}, \tag{3.41} \]
where N_v is the number of vertices (geographic locations given by the data) and T_r is called the rth tree. The MST is the tree T_{r∗} whose weighted sum of edges attains the minimum value, that is,
\[ MST = T_{r^*} = \min_r \left\{ D(T_r) \right\}. \tag{3.42} \]
Kruskal [3] found an algorithm to determine the MST of any undirected graph in a quadratic number of steps for the worst case. Obviously, the Kruskal algorithm generates one of the possible MSTs but not all of them. The connections for the 12-Point example are depicted in Fig. 3.13. The NL-MST for the 12-Point example is depicted in Fig. 3.14.
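Step 1 of the procedure above can be sketched with a standard union-find version of the Kruskal algorithm, applied to the complete graph of Euclidean distances. This is a generic textbook formulation offered for illustration; the function name `kruskal_mst` and the four toy points are hypothetical, and steps 2 and 3 (replacing each selected edge by its theta trajectory) are not shown.

```python
import math

def kruskal_mst(n_vertices, edges):
    """Kruskal's algorithm. edges is a list of (weight, u, v); returns the tree edges."""
    parent = list(range(n_vertices))

    def find(u):
        # Find the root of u's component, compressing the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two components, so no cycle
            parent[root_u] = root_v
            tree.append((weight, u, v))
    return tree

# Hypothetical toy data: complete graph on four points with Euclidean weights.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
edges = [(math.dist(points[u], points[v]), u, v)
         for u in range(len(points)) for v in range(u + 1, len(points))]
mst = kruskal_mst(len(points), edges)
total_weight = sum(weight for weight, _, _ in mst)
```

The list of (u, v) pairs in `mst` is the list of linear edges referred to in step 2; in the NL-MST each of them would then be drawn as the corresponding theta path.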
3.8.4 Theta Map The theta map that is calculated contains the future of the gamma map. What is required for the algorithm is the NL-MST, and a space U , usually a square and
Fig. 3.13 Nonlinear connection for the 12-point example
Fig. 3.14 NL-MST for the 12-point example
subset of the plane, on which to define the scalar field. The beta star and D, the maximum of pseudo-distances, are also needed. The calculations are as follows.
\[ I_{k\theta_z(t)} = \sqrt{\left(x_k - DirectPath\_\theta_{zx}(t)\right)^2 + \left(y_k - DirectPath\_\theta_{zy}(t)\right)^2} \]
and for Semantic TWC we have
\[ d_{k\theta_z(t)} = \sqrt{\left(x_k - DirectPath\_\theta_{zx}(t)\right)^2 + \left(y_k - DirectPath\_\theta_{zy}(t)\right)^2}. \]
Fig. 3.15 Theta map for the 12-point example
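\[ \mathrm{Mean}D_i^{[CM]} = \frac{1}{N-1} \sum_{n}^{N} d_{kn}^{[CM]}, \qquad \mathrm{Mean}D_i^{[Euclidean]} = \frac{1}{N-1} \sum_{n}^{N} d_{kn}^{[Euclidean]}, \]
\[ I_{k\theta_z(t)} = d_{k\theta_z(t)}\, \frac{\frac{1}{N} \sum_{n}^{N} \mathrm{Mean}D_i^{[CM]}}{\frac{1}{N} \sum_{n}^{N} \mathrm{Mean}D_i^{[Euclidean]}}, \]
\[ e_k = \frac{1}{\sum_{z=1}^{N-1} P_z} \sum_{z=1}^{N-1} \sum_{j=0}^{P_z} e^{-\frac{I_{k\theta_z(t)}}{D}\,\beta^*}, \]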
where the points (x_k, y_k) belong to U and P_z is the number of points composing the edge z of the NL-MST, taken from DirectPath_θ_z. The Theta Map for the 12-Point example is depicted in Fig. 3.15.
3.8.5 Theta Transition Matrix via Markov Chains
The theta transition matrix via a Markov chain (see [4]) is a probability matrix based upon the theta distance matrix. The calculation of the transition matrix is obtained as follows.
\[ \tilde{\theta}_{ij} = \begin{cases} \max{}_\theta - \theta_{ij} & \text{if } i \neq j, \\ 1 & \text{if } i = j, \end{cases} \qquad P_{ij} = \frac{\tilde{\theta}_{ij}}{\sum_{k=1}^{N} \tilde{\theta}_{ik}}, \]
where max_θ is the maximum theta distance among the points. The transition matrix estimates the probability that any given point in the spatial distribution “communicates” (relates, connects, interacts) with another one as the effect of some random external shock (for instance, a local shortage of some key resource for quasi-epidemic processes, or the occurrence of a mutation of the infective agent in an epidemic process). The closer two points in the spatial distribution are (according to theta), the more likely it is that the random shock will force an interaction between them. This assumption can be taken as the basis for building an explicit diffusion model for our spatial distribution.
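The row normalization behind the transition matrix can be sketched as follows. The treatment of the diagonal is an assumption of this sketch (it is exposed as the hypothetical parameter `diagonal`), as are the function name and the toy theta distances; only the two ideas stated in the text are used, namely that closer points (in theta distance) get larger weights and that each row is normalized to sum to one.

```python
def theta_transition_matrix(theta, diagonal=1.0):
    """Row-stochastic matrix built from a symmetric theta distance matrix.

    Off-diagonal weight: max_theta - theta[i][j], so closer points weigh more.
    Diagonal weight: the value of `diagonal` (an assumption of this sketch).
    """
    n = len(theta)
    max_theta = max(theta[i][j] for i in range(n) for j in range(n) if i != j)
    matrix = []
    for i in range(n):
        row = [diagonal if i == j else max_theta - theta[i][j] for j in range(n)]
        row_sum = sum(row)
        matrix.append([value / row_sum for value in row])
    return matrix

# Hypothetical theta distances among three points.
theta = [[0.0, 1.0, 3.0],
         [1.0, 0.0, 2.0],
         [3.0, 2.0, 0.0]]
P = theta_transition_matrix(theta)
```

In this toy case the first point is three times closer to the second point than to the third, so P[0][1] is larger than P[0][2]; the pair at maximum theta distance receives probability zero.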
3.8.6 Discrete Time Markov Chain (DTMC)
The DTMC is an explicit dynamical model that allows the computation of the attractor of the dynamics. It uses the theta transition matrix as input and produces a graph of attractors. The steps of the DTMC algorithm are as follows.
1. The number of calls to be made is fixed.
2. One of the N assigned points is randomly extracted.
3. Another point and a pseudo-random number are randomly extracted.
(a) If the pseudo-random number is greater than the transition probability, the algorithm returns to step 2.
(b) If the pseudo-random number is less than the transition probability, the transition takes place and the algorithm proceeds from step 3, taking the last extracted point as its first point.
4. A statistic is compiled of the most frequent transitions and of the attractor points, that is, the points from which one never manages to exit.
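The steps above can be sketched as a short simulation. This is one possible reading of the procedure, not the authors' implementation: every extraction of a candidate point counts as one call, the candidate is drawn uniformly among the other points, and only the transition counts of step 4 are collected (the detection of attractor points is left out). The names `simulate_dtmc`, `toy_P` and the seed are hypothetical.

```python
import random
from collections import Counter

def simulate_dtmc(P, n_calls, seed=0):
    """Count accepted transitions (i, j) over n_calls candidate extractions."""
    rng = random.Random(seed)
    n = len(P)
    transitions = Counter()
    current = rng.randrange(n)  # step 2: random starting point
    for _ in range(n_calls):    # step 1: fixed number of calls
        # Step 3: extract another point and a pseudo-random number.
        candidate = rng.choice([j for j in range(n) if j != current])
        if rng.random() < P[current][candidate]:
            transitions[(current, candidate)] += 1  # step 3(b): transition accepted
            current = candidate
        else:
            current = rng.randrange(n)              # step 3(a): back to step 2
    return transitions

# Hypothetical two-point chain in which every proposed transition is accepted.
toy_P = [[0.0, 1.0],
         [1.0, 0.0]]
counts = simulate_dtmc(toy_P, n_calls=1000, seed=1)
```

`counts.most_common()` then lists the most frequent transitions, which is the statistic referred to in step 4.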
3.9 TWC—Iota The final algorithm of the TWC suite of algorithms to be discussed is TWC - Iota which is presented next.
3.9.1 Iota Projected Points
Each originally assigned point is projected onto the map. Each new point denotes the possible position of a new point in the future. To compute the iota projected points, the matrix of meta-distances selected according to one of the previous criteria and a multi-dimensional scaling (MDS) algorithm are needed. The output is a set of N new points, where (x, y) are the coordinates of each one. The selected matrix is projected into a 2-dimensional space by means of the MDS algorithm. If the matrix is N × N, then N new points corresponding to those (meta) distances will be obtained. Clearly the positions of the projected points depend on the selected matrix. What is outputted are the (x, y) coordinates of meta-clusters.
3.9.2 Meta-Distance in TWC-Iota
The general topic of meta-distance has been previously mentioned. Recall that a meta-distance matrix associated with N points is the distance matrix of their distances. The input is an N × N distance matrix D = [d_ij], where d_ij^{[0]} = d_ij is the Euclidean distance from i to j. Then,
\[ d_{ij}^{[1]} = \sqrt{\sum_{k=1}^{N} \left(d_{ik} - d_{jk}\right)^2}. \]
The idea is quite simple. The meta-distance measures how different the distances from point i to all the other points are from the distances from point j to all the other points.
Iterative Meta-Distance Matrices
The iterative meta-distance matrix is
\[ d_{ij}^{[t+1]} = \sqrt{\sum_{k=1}^{N} \left(d_{ik}^{[t]} - d_{jk}^{[t]}\right)^2}, \qquad t = 0, 1, \ldots, n. \]
This calculation of meta-distance will diverge. Thus, a convergent iterative meta-distance is given next. We start with the iterative meta-distance
\[ D_{ij}^{[t+1]} = \sqrt{\sum_{k=1}^{N} \left(d_{ik}^{[t]} - d_{jk}^{[t]}\right)^2}, \qquad t = 0, 1, \ldots, n, \tag{3.43} \]
\[ d_{ij}^{[t+1]} = f\left(D_{ij}^{[t+1]}\right). \tag{3.44} \]
For the Semantic TWC, Eq. (3.44) is modified to be
\[ d_{ij}^{[t+1]} = \frac{1}{2} \left[ f\left(D_{ij}^{[t+1]}\right) + 1 - w_{ij}^{[CM]} \right], \qquad 0 < w_{ij}^{[CM]} \leq 1. \]
Here D_ij^{[0]} is the usual Euclidean distance matrix and
\[ f\left(D_{ij}^{[t]}\right) = scale \cdot D_{ij}^{[t]} + offset. \]
Note that f is a function that linearly scales the input into the range [0, 1] for
\[ scale = \frac{1}{\max D_{ij}^{[t]} - \min D_{ij}^{[t]}}, \qquad offset = -\frac{\min D_{ij}^{[t]}}{\max D_{ij}^{[t]} - \min D_{ij}^{[t]}}. \]
The sequence will converge since we are iterating on numbers between 0 and 1 (at zero and one it stays at zero and one). Thus, one can stop the iteration when
\[ \sum_{i,j} \left| d_{ij}^{[t+1]} - d_{ij}^{[t]} \right| < \varepsilon, \]
where ε is small. Remark 3.29 Given a distance matrix, it is always possible to consider the rows (or columns) as vectors and calculate again a distance matrix of the same size. When the distance matrices are iteratively scaled into the range [0, 1], convergence is ensured. Moreover, it is experimentally verified that clusters are created in the matrix while the system evolves towards convergence.
Meta-Distance Matrix Selection
The iterative converging method for meta-distances produces a finite family of matrices. Thus, it is necessary to establish a criterion for selecting one. This is discussed next. The input is a family of meta-distance matrices, and the output is a single meta-distance matrix. The first meta-distance matrix of the family is denoted D_ij^{[1]} and the last is denoted D_ij^{[K]}. What we call the delta delta J matrix criterion is given as follows.
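1. For t = 2, ..., K do steps 2-7.
2. \( S^{[t]} = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} D_{ij}^{[t]} \),
3. \( \Delta S^{[t]} = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \left( D_{ij}^{[t]} - D_{ij}^{[t-1]} \right) \),
4. \( \delta = 1 \) if \( \Delta S^{[t]} > 0 \), and \( \delta = -1 \) if \( \Delta S^{[t]} < 0 \),
5. \( J^{[t]} = \log\left( \frac{S^{[t]}}{\left| \Delta S^{[t]} \right|} \right) \delta \),
6. \( \Delta J^{[t,t-1]} = J^{[t]} - J^{[t-1]} \),
7. \( \Delta\Delta J^{[t,t-1,t-2]} = \Delta J^{[t,t-1]} - \Delta J^{[t-1,t-2]} \),
8. \( t^* = \arg\max_t \Delta\Delta J^{[t,t-1,t-2]} \),
9. \( M_{ij} = D_{ij}^{[t^*]} \).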
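The convergent iteration of Eqs. (3.43) and (3.44), which produces the family of matrices the criterion above selects from, can be sketched as follows. This is a minimal illustration under stated assumptions: the rescaling uses the minimum and maximum over the whole matrix, a constant matrix is mapped to zeros, and the tolerance, the iteration cap and all function names are hypothetical choices of this sketch. The selection criterion itself is not implemented here.

```python
import math

def euclidean_matrix(points):
    return [[math.dist(p, q) for q in points] for p in points]

def meta_distance(d):
    """Distance between rows i and j of a distance matrix, as in Eq. (3.43)."""
    n = len(d)
    return [[math.sqrt(sum((d[i][k] - d[j][k]) ** 2 for k in range(n)))
             for j in range(n)] for i in range(n)]

def rescale(matrix):
    """Linear scaling of all entries into [0, 1], the role of f in Eq. (3.44)."""
    flat = [value for row in matrix for value in row]
    low, high = min(flat), max(flat)
    if high == low:
        return [[0.0 for _ in row] for row in matrix]
    return [[(value - low) / (high - low) for value in row] for row in matrix]

def iterate_meta_distance(points, eps=1e-9, max_iter=200):
    """Return the family of rescaled meta-distance matrices, one per iteration."""
    d = euclidean_matrix(points)  # D^[0]: the usual Euclidean distance matrix
    family = []
    for _ in range(max_iter):
        new_d = rescale(meta_distance(d))
        change = sum(abs(new_d[i][j] - d[i][j])
                     for i in range(len(d)) for j in range(len(d)))
        family.append(new_d)
        d = new_d
        if change < eps:  # stop when the matrix no longer changes
            break
    return family

# Hypothetical toy data: five points in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (6.0, 5.0), (7.0, 5.0)]
family = iterate_meta_distance(points)
```

Each matrix in `family` keeps a zero diagonal and has all its entries in [0, 1], which is what keeps the iteration bounded; the family would then be passed to the selection criterion.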
3.9.3 Meta-Clusters
Once the new points have been projected, one can determine the main clusters of the distribution. These points correspond to the strategic areas for the diffusion of the process that is being analyzed. The algorithm needs the coordinates of the new projected points and a clustering algorithm. The output consists of the (x, y) coordinates of the meta-clusters. The clustering algorithm is applied to the new points. The centroids are obtained as follows.
• The k-Means algorithm is run for each k from 2 to N, where N is the number of points.
• For each run of the k-Means algorithm, the Davies-Bouldin index [5] is computed.
• The run with the best index is chosen, which defines an optimal k, denoted k∗.
• The centroid of each of the k∗ clusters is taken as one of the meta-clusters.
Remark 3.30 It is possible to interpret the points corresponding to the meta-clusters as strategic points for the diffusion of the studied process. The iota points represent where the N assigned points tend to go when, according to their distribution, they move in order to cluster. In two dimensions the iota points tend to converge to their vanishing points, of which there are two, so that step by step all the assigned points collapse into two clusters. In the context of epidemics, the vanishing points might therefore indicate where the last cases of the epidemic will be located. Consequently, the migration of the assigned points toward their vanishing points describes a vector field. At the end
Fig. 3.16 Locations of the 12-point example
Fig. 3.17 Migration of the 12-point example
of this process, each one of the assigned points will show many replicas of itself, each one in a different position from its original position. We call the entire population of these points the iota points. It is important to note that the iota points can lie outside the convex hull of the assigned points. This feature is fundamental in order to understand the capability of the iota points to also detect new areas of interest outside the original convex hull of the assigned points. Figure 3.16 shows the N assigned points (N = 12) of the 12-Point example, and Fig. 3.18 shows the iota points generated from them during their migration (Fig. 3.17). Remark 3.31 The iota algorithm is based on the iterative calculation of the distances among the assigned points. At each iteration a new meta-distance (distance of a distance) is computed. Because the distance in two dimensions is calculated as the hypotenuse of a right triangle, at each iteration the distance should increase. However, if after each iteration the new distance matrix is linearly scaled between 0 and 1, then after a finite number of iterations the iota algorithm converges to a fixed
Fig. 3.18 Clusters for the 12-point example
Fig. 3.19 Final two clusters for the 12-point example
point, generating two big clusters into which the assigned points collapse. Figure 3.19 shows the final step of this process, when all the assigned points have collapsed into two clusters. The iota algorithm ends its loop of iterations when the matrix of distances at iteration (n) is equivalent to the matrix of distances at iteration (n − 1); that is, it does not change anymore. The positions of the replicas of the assigned points are not considered significant only when the series converges: the iota algorithm considers a distribution of replicas of the assigned points salient when the summation of the delta delta of the distances at iteration (n) and iteration (n − 1) is the biggest. This happens when the biggest change of regime occurs between a matrix of distances and the previous one. Figure 3.20 shows the dynamics of the deltas (the values of the increment of changes) after each iteration. For our 12-Point example, this occurs at iteration #14, where there is a peak that represents the most salient distribution of replicas of the assigned points. Figure 3.21 shows how each assigned point is projected into its replica at iteration #14. Finally, Fig. 3.22 shows the two centroids of the clusters generated by all the replicas of the assigned points.
Fig. 3.20 Dynamics of the delta delta matrices for the 12-point example
Fig. 3.21 Salient Iota replica (Red) of each assigned point (Blue) at iteration 14
Fig. 3.22 Assigned points (Blue), salient clusters (Red), iteration 14
Membership Values of the Two Clusters

Name      Fuzzy Clustering Membership Value
Cluster #1
ID_2      0.7948
ID_3      0.7771
ID_4      0.7513
ID_6      0.7522
ID_7      0.6248
ID_9      0.8576
ID_11     0.8249
ID_12     0.6965
Cluster #2
ID_1      0.7275
ID_5      0.8798
ID_8      0.6456
ID_10     0.5616
3.9.4 Fuzzy Membership to the Meta-Clusters
Through the clustering algorithm each point is assigned to one of the meta-clusters. In addition to this assignment, the coordinates of the meta-clusters are used to calculate a (fuzzy) membership value for each of the points with respect to each of the meta-clusters. The calculation is as follows.
\[ \mu_{c_k}(p_i) = 1 - \frac{d_{ic_k}}{\sum_{j}^{G} d_{ic_j}}, \]
where μ_{c_k}(p_i) is the membership of the ith point with respect to the kth cluster, d_{ic_k} is the distance between the ith point and the centroid of the kth cluster, d_{ic_j} is the distance between the ith point and the centroid of the jth cluster, and G is the number of clusters. Remark 3.32 Through the fuzzy membership function it is possible to evaluate quantitatively how much each point belongs to one of the two clusters. This procedure is general, but it is usually computed to know how typical each projected point is of its centroid.
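The membership formula can be checked on a small case. The sketch below assumes Euclidean distances between a point and the cluster centroids; the function name `fuzzy_memberships` and the coordinates are hypothetical.

```python
import math

def fuzzy_memberships(point, centroids):
    """Membership of a point in each cluster: 1 - d_k / (sum of all d_j)."""
    distances = [math.dist(point, centroid) for centroid in centroids]
    total = sum(distances)
    return [1.0 - d / total for d in distances]

# Hypothetical example: a point at distance 1 from one centroid and 3 from the other.
memberships = fuzzy_memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

Here the memberships are 0.75 and 0.25. With two clusters the two values always sum to one, and a point belongs to the cluster of its nearest centroid with a membership above 0.5, in line with the values reported in the table of membership values.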
Iota points and iota clusters can be considered to represent the virtual dynamics that the assigned points could manifest when a perturbation is activated at the center of mass of the distribution of points. A biological example of these dynamics is a herd of herbivores (assigned points) whose center is suddenly perturbed, so that the herd is pushed to split into two clusters. The centroids of these new clusters (iota clusters) are the best positions for predators to attack the herd after the perturbation.
3.9.5 Iota Map The iota map is generated by the distance of each generic point of the two-dimensional space from each projected point of the meta-distance algorithm, weighted by the distance that each projected point has from its original point. The farther away a projected point is from its original point, the stronger the weight. The coordinates of assigned points (PS ), the coordinates of projected points (PP ), the coordinates of the points in space U, from which one wants to calculate the scalar field (Pg ), and the maximum of pseudo-distances are needed as inputs for this algorithm. What this produces is a scalar field that is represented by a heat map. The calculations are done as follows.
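$$Md_k = \sqrt{\left(PS_k(x) - PP_k(x)\right)^2 + \left(PS_k(y) - PP_k(y)\right)^2},$$
$$Dw_i = \sum_{k=1}^{N} \frac{\sqrt{\left(PP_k(x) - Pg_i(x)\right)^2 + \left(PP_k(y) - Pg_i(y)\right)^2}}{Max\,D}\left(1 - \frac{Md_k}{\sum_{j=1}^{N} Md_j}\right),$$
$$A_i = e^{-Dw_i},$$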
where Max D is the maximum distance among the assigned points.
Remark 3.33 TWC-Iota takes into account how each point is projected on the map. For this reason, in the case of processes related to epidemics, it is interpreted as the level of infectivity of the different areas.
The iota map represents the main area of clustering of the assigned points. In order to understand the meaning of the iota map, it is important to consider two distances:
• The distance between each assigned point and its projection into an iota point;
• The distance between each iota projection point and any generic grid point.
Figure 3.23 shows these key distances. The iota map is the result of the distance of each grid point from all iota points, where each iota point is weighted by the distance between the assigned point and its projection.
Fig. 3.23 Iota map of two distance measures
The iota map appears as a “hot map” whose dark red area is far away from all the assigned points. The iota map points out a new zone from which the assigned points tend to come or toward which they tend to go. In practice, the iota algorithm can detect a critical piece of information about the past or about the future of the system of points. Whether it is read forward or backward depends on the previous history of the assigned points. Figure 3.24 shows the iota map.
3.9.6 Meta-Beta Once the plane has been repopulated with new possible future points, it is possible to reapply the algorithm to obtain the TWC-Beta in order to have the estimate of the current distribution, in a hypothetical future. The inputs of this algorithm are the coordinates of the points in the space U from which one wants to calculate the scalar field, the coordinates of assigned points, the coordinates of the projected points (hereafter both are considered as new assigned points), the maximum Euclidean distance among the new assigned points, the vector of non-parametric attraction strength computed considering the new assigned points, and the beta star relevant to the new assigned points. The output is a K × K matrix where K × K is the number of points of the space U . Therefore, in the end one will get an activation value in [0, 1] for each point in the considered space. The calculations are as follows.
Fig. 3.24 Iota map for the 12-point example
$$n_{kj} = \sqrt{(x_k - x_j)^2 + (y_k - y_j)^2},$$
$$b_k = \frac{1}{N}\sum_{j=1}^{N} w_j\, e^{-\frac{n_{kj}}{D}\beta^{*}}$$
where k is the index for each point of the plane and j is the index for each new assigned point. Remark 3.34 Thus, meta-beta allows a projection of the possible future evolution of the process. Clearly, in addition to the TWC Meta-Beta, it is possible to replicate all the other types of the TWC theory (Meta-Gamma, Meta-Theta, Meta-Iota).
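The activation of one grid point is a weighted average of exponentially decaying contributions from the new assigned points. The sketch below is a hedged Python illustration of that formula (the Appendix listings are in MATLAB); the function name `beta_field`, the two sample points, their weights, and the values of β* and D are hypothetical and chosen only to make the example run.

```python
import math


def beta_field(grid_points, assigned_points, weights, beta_star, max_dist):
    """Evaluate b_k = (1/N) * sum_j w_j * exp(-n_kj * beta_star / D) on a grid."""
    n = len(assigned_points)
    field = []
    for gx, gy in grid_points:
        total = 0.0
        for (px, py), w in zip(assigned_points, weights):
            n_kj = math.hypot(gx - px, gy - py)  # grid-to-point Euclidean distance
            total += w * math.exp(-n_kj * beta_star / max_dist)
        field.append(total / n)
    return field


# Hypothetical inputs: two assigned points with equal attraction strength.
assigned = [(0.0, 0.0), (4.0, 0.0)]
weights = [0.5, 0.5]
grid = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
field = beta_field(grid, assigned, weights, beta_star=1.0, max_dist=4.0)
print(field)
```

As long as the weights lie in [0, 1], every value of the field also lies in [0, 1], in line with the activation range stated above.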
3.10 Summary
A brief written and visual summary of the concepts developed in this chapter is useful at this point. TWC is an adaptive algorithm that generates a set of quantities able to transform a static distribution of points in the planar space into a system of points with a past and a future history. The following is a brief list of these quantities.
1. Pseudo Distance (metric of the system of points)
2. Alpha Point, Alpha Path and Alpha Map (the past of the system of points)
3. Beta Parameter and Beta Map (the present of the system of points)
4. Gamma Paths and Gamma Map (the immediate future of the system of points)
5. Theta Nonlinear MST and Theta Map (the medium future of the system of points)
6. Iota Points, Iota Clusters and Iota Map (the final future or first origin of the system of points)
Figure 3.25 synthesizes these items. Figure 3.26 shows the different maps produced by TWC for the point distribution analyzed in the 12-Point Example.
Fig. 3.25 A synthesis of TWC items
Fig. 3.26 TWC maps
References
1. M. Buscema, M. Breda, E. Grossi, L. Catzola, P.L. Sacco, Semantics of point spaces through the topological weighted centroid and other mathematical quantities: theory and applications, Chapter 4, in Data Mining Applications Using Artificial Adaptive Systems, ed. by W.J. Tastle (Springer Science+Business Media, New York, 2013), pp. 75–139
2. M. Buscema, W. Lodwick, M. Breda, G. Massini, F. Newman, M.A. Zeydabadi, Artificial Adaptive Systems Using Auto Contractive Maps: Theory, Applications and Extensions (Springer, 2018). https://doi.org/10.1007/978-3-319-75049-1
3. J.B. Kruskal, On the shortest spanning subtree of a graph and the traveling salesman problem. Proc. Am. Math. Soc. 7(1), 48–50 (1956)
4. G. Bolch, S. Greiner, H. de Meer, K.S. Trivedi, Queueing Networks and Markov Chains, 2nd edn. (Wiley, New York, 2006)
5. D.L. Davies, D.W. Bouldin, A cluster separation measure. IEEE Trans. Pattern Anal. Mach. Intell. 1(2), 224–227 (1979)
Chapter 4
Illustrative Examples
Two simple examples are presented next to further illustrate the TWC algorithms and to give a sense of how they analyze geographic data. The example of Chap. 3 came from an application, whereas the following two examples are made up. The first example consists of 10 points, two of which are “outliers”. The second example consists of 11 points with a more random distribution. We apply the equations developed in Chap. 3 to these two examples using MATLAB; the code can be found in the Appendix.
4.1 Two Simple Example Data Sets
The two example data sets are presented next, followed by their corresponding TWC maps.
Example 4.1 Example data set 1 of 10 points (with two “outliers”)
Data   P1   P2   P3   P4   P5   P6   P7   P8   P9   P10
x       2    2   35   28   30   34   35   33   31    28
y      11   13   14   14   15   10   12   16   10    12     (4.1)
is depicted in Fig. 4.1, which shows the coordinates of the points.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. Buscema et al., The Topological Weighted Centroid: A New Vision of Geographic Profiling, Studies in Computational Intelligence 1095, https://doi.org/10.1007/978-3-031-28901-9_4
Fig. 4.1 10 point illustrative Example 4.1
Example 4.2 Example data set 2 of 11 points
Data   P1   P2   P3   P4   P5   P6   P7   P8   P9   P10   P11
x       3   13    2    6   11    5    9    7    4    15     8
y      14   25    8   17   23    1   12   33   28     8    18     (4.2)
is depicted in Fig. 4.2, which shows the coordinates of the points.
Fig. 4.2 11 point illustrative Example 4.2
Recall that the attraction force is defined as
$$w_i = \frac{1}{N-1}\sum_{j \ne i}^{N} e^{-\frac{d_{ij}}{D}\alpha}$$
which, for the 10-point example, gives the following TWC-Alpha values:
Data      P1      P2      P3      P4      P5      P6      P7      P8      P9     P10
x          2       2      35      28      30      34      35      33      31      28
y         11      13      14      14      15      10      12      16      10      12
w_i   0.4084  0.4091  0.6900  0.7206  0.7221  0.6915  0.6921  0.6974  0.7127  0.7207
Table: Attraction strength of the points of data set (4.1)     (4.3)
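The attraction strength of each point can be computed directly from the pairwise distances. The following Python sketch (used here instead of the MATLAB of the Appendix) evaluates the formula above for the 10 points of Example 4.1, assuming Euclidean distances d_ij and D equal to the largest pairwise distance. The function name `attraction_strengths` and the value α = 1 are assumptions of this illustration; the text does not state the α behind the tabulated values, so the sketch is not expected to reproduce them digit for digit.

```python
import math


def attraction_strengths(points, alpha):
    """Return w_i = (1/(N-1)) * sum_{j != i} exp(-d_ij * alpha / D) for every point."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    d_max = max(max(row) for row in d)  # D: largest pairwise distance
    return [
        sum(math.exp(-d[i][j] * alpha / d_max) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Data set (4.1); alpha = 1.0 is an assumed value for illustration only.
points = [(2, 11), (2, 13), (35, 14), (28, 14), (30, 15),
          (34, 10), (35, 12), (33, 16), (31, 10), (28, 12)]
w = attraction_strengths(points, alpha=1.0)
print([round(v, 4) for v in w])
```

Even with an assumed α, the qualitative pattern of the table is visible: the two outliers P1 and P2 receive the smallest attraction strength.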
Fig. 4.3 Attraction strength field of 10 points
Figure 4.3 depicts the 10-point example as an inverted attractive force field. The attractive force map for the 11-point example is produced in the same way.
4.2 TWC Maps
This section examines, in greater detail and with respect to the two example data sets of 10 and 11 points, four of the five TWC algorithms (TWC-Alpha, TWC-Beta, TWC-Gamma, and TWC-Theta) together with their associated scalar fields and maps; the TWC-Iota analysis is omitted. As we have already mentioned, TWC is a whole theory composed of several elements. Each of the elements is indispensable for a general understanding of the process.
Figure 3.25 depicted the five TWC algorithms, four of which we discuss next with respect to the 10-point and 11-point simple examples.
Remark 4.1 The MATLAB code used to generate the maps and graphs associated with this section can be found in the Appendix.
4.2.1 TWC-Alpha
The value of α at which the free energy attains its maximum corresponds to the α at which the points of the distribution manifest the maximum level of attraction strength among them. Figure 4.4 shows the movement of the alpha point from the center of mass to the position of maximum attraction for Example 4.1, defined over U = [0, 60] × [0, 36] ⊂ ℝ². Intuitively, the movement makes perfect sense. As can be seen from Fig. 4.4, the algorithm moves the alpha point toward a location where it can “hear” sounds from each of the data points in the most effective way. Since this is a “patterned” set of points, the movement is also “patterned”. In particular, there is a top/bottom symmetry, so the movement is along a straight line.
Fig. 4.4 TWC alpha point movement Example 4.1
The alpha point is located where the free energy is maximal. One can perhaps detect a faint “turning back” of the path in Fig. 4.4; the alpha point corresponds to where the path turns back. For Example 4.2, Fig. 4.5 depicts the movement that identifies the alpha point. Since the points are random in the grid, the alpha point, from which all of the sounds coming from the data points can be “heard” most clearly, would be close to the center of mass of the given data points. A zoom of the subtle movement from the center of mass to the alpha point is depicted in Fig. 4.6. There is only a small change from the center of mass since the points are more or less randomly located on the grid. That is, the center of mass is relatively near the location from which the sounds emanating from the 11 points can all be heard. The alpha point actually moves a slight distance and stops as soon as the free energy decreases (which also corresponds to the minimal entropy), at the first turning point. We let the algorithm continue for illustrative purposes, so that this location, which is almost at the center of gravity (right endpoint), can be better seen.
Fig. 4.5 TWC alpha point movement Example 4.2
Fig. 4.6 Zoom of the movement of the alpha point—Example 4.2
Fig. 4.7 TWC-Alpha Example 4.1
The maps (the “heat” maps) associated with the alpha point for Examples 4.1 and 4.2 are presented next. For Example 4.1, since the points form a relatively patterned set (symmetric about a horizontal line), with the locations of the “sounds” more or less circular plus two outliers, we see a somewhat patterned distribution a little off center of the circle of points, as can be verified in Fig. 4.7. TWC-Alpha for Example 4.2, since it consists of a random set of points, has its alpha point tightly located (see Fig. 4.8). That is, there is more of a single location from which the sounds coming from the given data points can be heard, and the center of mass is roughly where all sounds are best heard.
Fig. 4.8 TWC-Alpha Example 4.2
4.2.2 TWC-Beta The TWC-Beta maps were generated in several formats. Recall that TWC-Beta is associated with where points in the short term future will locate themselves. It is the first prediction of the movement of the data points in the short term. Notice that the TWC-Beta algorithm is beginning to “hedge” its bets. Using the analogy of predator/prey, if the configuration of prey is Example 4.1, the predator would locate itself at the alpha point depicted by Fig. 4.7. As time goes on, the predator might consider the outliers and spend some of its energy according to the TWC-Beta map depicted by Fig. 4.9. Another view of the TWC-Beta is Fig. 4.10.
Fig. 4.9 TWC-Beta Example 4.1
Fig. 4.10 TWC-Beta Example 4.1 View 2
Fig. 4.11 TWC-Beta Example 4.2
The TWC-Beta map for the Example 4.2 data, Fig. 4.11, being more of a random distribution of points, does not change much compared with its TWC-Alpha map. This is to be expected since a short term future configuration would be roughly the same as the past configuration.
4.2.3 TWC-Gamma
The TWC-Gamma further projects the dynamics of the data set. Compared with TWC-Alpha and TWC-Beta, the configuration seems to focus on the set of original individual data points. Figure 4.12 illustrates TWC-Gamma for Example 4.1. In the predator/prey analogy, the predator focuses, in the near future, on each of the prey. Another view of this configuration is Fig. 4.13. Associated with TWC-Gamma, in the predator/prey analogy, are the paths that the predator would follow to attack the prey. It is clear from Fig. 4.14 that the paths are what would be expected from such a configuration. The TWC-Gamma map for Example 4.2, Fig. 4.15, shows a similar concentration of “energy” around the original set of individual data points. Another view of this configuration is given by Fig. 4.16. The associated gamma path for Example 4.2 is illustrated in Fig. 4.17. Note that the paths run most strongly in the vertical rather than the horizontal direction, indicating paths that a predator would most likely follow in pursuit of prey, or pathways a disease outbreak might follow.
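The gamma paths discussed above can be sketched as follows. This Python illustration mirrors the update used in the TWC-Gamma listing of the Appendix, where each point i follows its own weighted centroid with weights exp(−d_ij γ / D) as γ grows; the function name `gamma_paths` and the Python translation are assumptions of this sketch, and the default step values are the example values quoted in the listing.

```python
import math


def gamma_paths(points, d_gamma=0.01, n_steps=500):
    """Return, for each point i, the list of its weighted centroids as gamma grows."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    d_max = max(max(row) for row in d)  # D: largest pairwise distance
    paths = [[] for _ in range(n)]
    for step in range(n_steps):
        gamma = step * d_gamma
        for i in range(n):
            # Weight of point j as seen from point i at this value of gamma.
            p = [math.exp(-d[i][j] * gamma / d_max) for j in range(n)]
            z = sum(p)
            x = sum(pj * q[0] for pj, q in zip(p, points)) / z
            y = sum(pj * q[1] for pj, q in zip(p, points)) / z
            paths[i].append((x, y))
    return paths


# Data set (4.1) of Example 4.1.
points = [(2, 11), (2, 13), (35, 14), (28, 14), (30, 15),
          (34, 10), (35, 12), (33, 16), (31, 10), (28, 12)]
paths = gamma_paths(points)
print(paths[0][0], paths[0][-1])
```

At γ = 0 all weights are equal, so every path starts at the center of mass; as γ increases, the weights of distant points decay and each path drifts toward its own data point and its near neighbors.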
4.2.4 TWC-Theta TWC-Theta produces a longer term projection of the evolution of the dynamics of the data set. For Example 4.1, the configuration represented as a graph looks at how the points organize themselves over a longer period of time (see Fig. 4.18). The links between the points are as might be expected from this data set. The undirected graph of weights for Example 4.1 is depicted by Fig. 4.19.
Fig. 4.12 TWC-Gamma Example 4.1
Fig. 4.13 TWC-Gamma Example 4.1 View 2
Fig. 4.14 TWC-Gamma paths Example 4.1
Fig. 4.15 TWC-Gamma Example 4.2
Fig. 4.16 TWC-Gamma Example 4.2 view 2
Fig. 4.17 TWC-Gamma path Example 4.2
Fig. 4.18 TWC-Theta Example 4.1
Fig. 4.19 TWC-Theta MST graph of weights Example 4.1
Fig. 4.20 TWC-Theta Example 4.2
For Example 4.2, the pattern is roughly a connected graph of the given data points, as they are more or less randomly distributed. The graph associated with TWC-Theta for Example 4.2 is illustrated by Fig. 4.20. The MST weighted undirected graph for Example 4.2 is similar to that of Example 4.1 but quite “busy”, so it is not included.
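To make the step from a matrix of theta weights to an MST graph concrete, the following Python sketch builds an undirected edge list by keeping the smaller of the two directed weights for each pair and then extracts a minimum spanning tree with Kruskal's algorithm. The Appendix relies on MATLAB's built-in `minspantree` instead; the function name `kruskal_mst` and the 4 × 4 matrix `theta` are hypothetical and serve only as an illustration.

```python
def kruskal_mst(n_nodes, edges):
    """Return the edges (i, j, weight) of a minimum spanning tree."""
    parent = list(range(n_nodes))

    def find(a):
        # Follow parent links to the root, compressing the path on the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for i, j, w in sorted(edges, key=lambda e: e[2]):
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two different components
            parent[ri] = rj
            tree.append((i, j, w))
    return tree


# Hypothetical directed theta weights between four points.
theta = [
    [0.0, 1.0, 4.0, 3.0],
    [1.5, 0.0, 2.0, 5.0],
    [4.0, 2.5, 0.0, 1.2],
    [3.5, 5.0, 1.0, 0.0],
]
n = len(theta)
# Undirected weight of a pair: the smaller of the two directed values.
edges = [(i, j, min(theta[i][j], theta[j][i]))
         for i in range(n) for j in range(i + 1, n)]
tree = kruskal_mst(n, edges)
print(tree)
```

A spanning tree over N points always has N − 1 edges, which is why the MST graph stays readable even when the full weighted graph is “busy”.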
4.3 Summary
The two simple examples illustrate the way that the TWC algorithms work on these two data sets. The evolution of the points from the point of view of free energy and entropy can be seen to be compatible with how these two principles work.
Chapter 5
Advanced TWC Topics
The previous chapters mentioned two types of analyses, the inclusion of semantic data associated with spatial data and the inclusion of local effects when they are significant so that these are not “washed out” by the global effects. This chapter begins with the development of how TWC balances local effects with the global ones. Then, this is followed by the discussion of how to incorporate what we term “semantic” data associated with geographic (location) data.
5.1 TWC Windowing
The TWC-Windowing algorithm proceeds as follows. For each point of the data set selected a priori, neighboring points of the data set define its window. The size of the window can be small or large. The five TWC algorithms can then be applied in each window. After all the analyses (TWC-Alpha, Beta, Gamma, Theta, Iota) on each window are complete, all TWC maps are superimposed as a single map. The window maps, depending on their size, might have a large or small number of overlaps. This makes the superimposed TWC a network. Basically, each window around a point can be interpreted as a local TWC analysis of that point with its associated group of points, while the superimposed TWC network over all data points is its global analysis. TWC-Windowing selects an a priori integer K < N for segmenting (windowing) the N data points. The number of points in each window is P = N/K. In practice K [...]

[...]
    if d_TWC_CM(n) >= d_TWC_CM(n-1) & d_TWC_CM(n) >= d_TWC_CM(n+1)
        n_bet=n;
    end
end
d_TWC_CM_bet=max(sqrt((TWC_bet_x-CM_x).^2+(TWC_bet_y-CM_y).^2));
d_TWC_CM(n_bet);
bet_str=beta(n_bet);
%%%%%%%%%%%%%%%%
del=input('del= '); % For example del=1;
x_p=min(x)-30*del:del:max(x)+30*del;
y_p=min(y)-30*del:del:max(y)+30*del;
% pairwise distances between grid nodes
for k1=1:length(x_p)
    for j1=1:length(y_p)
        for k2=1:length(x_p)
            for j2=1:length(y_p)
                n(k1,j1,k2,j2)=sqrt((x_p(k1)-x_p(k2))^2+(y_p(j1)-y_p(j2))^2);
            end
        end
    end
end
% attraction strength of each data point
for i=1:N
    F(i)=0;
    for j=1:N
        F(i)=F(i)+exp(-d_bar(i,j)/D)/(N-1);
    end
end
% beta scalar field on the grid
for k1=1:length(x_p)
    for j1=1:length(y_p)
        b(k1,j1)=0;
        for i=1:N
            n(k1,j1,i)=sqrt((x_p(k1)-x(i))^2+(y_p(j1)-y(i))^2);
            b(k1,j1)=b(k1,j1)+F(i)*exp(-n(k1,j1,i)*bet_str/D)/N;
        end
    end
end
%%%%%%%%%%%%%%%%%%%%%
for i=1:length(x_p)
    for j=1:length(y_p)
        xx(i,j)=x_p(i);
        yy(i,j)=y_p(j);
    end
end
figure(3)
contourf(xx,yy,b),title('TWC\beta')
figure(4)
surface(x_p,y_p,b'),title('TWC\beta')
figure(5)
surf(x_p,y_p,b'),title('TWC\beta')
7.3 TWC-Gamma
%TWC_gamma_Code
clear all
% A is matrix of x and y coordinates.
A = importdata('xy.dat');
x=A(:,1);
y=A(:,2);
figure(1)
plot(x,y,'*b'),xlabel('x'),ylabel('y'), axis equal
hold on
N=length(x);
%dij
for i=1:N
    for j=1:N
        d(i,j)=sqrt((x(i)-x(j))^2+(y(i)-y(j))^2);
    end
end
D=max(max(d));
del_gam=input('del_gam= '); %For example: del_gam=0.01;
n_max=input('n_max= '); %For example: n_max=500;
for n=1:n_max
    gamma(n)=(n-1)*del_gam;
    for i=1:N
        TWC_gam_x(i,n)=0;
        TWC_gam_y(i,n)=0;
        Z(i,n)=0;
        for j=1:N
            P_gam(i,j,n)=exp(-d(i,j)*gamma(n)/D);
            TWC_gam_x(i,n)=TWC_gam_x(i,n)+P_gam(i,j,n)*x(j);
            TWC_gam_y(i,n)=TWC_gam_y(i,n)+P_gam(i,j,n)*y(j);
            Z(i,n)=Z(i,n)+P_gam(i,j,n);
        end
        TWC_gam_x(i,n)=TWC_gam_x(i,n)/Z(i,n);
        TWC_gam_y(i,n)=TWC_gam_y(i,n)/Z(i,n);
        figure(1)
        plot(TWC_gam_x(i,1),TWC_gam_y(i,1),'*r',TWC_gam_x(i,n),TWC_gam_y(i,n),'.g'),grid on
        drawnow
    end
end
7.4 TWC-Theta
%TWC_theta_Code
clear all
% A is matrix of x and y coordinates.
A = importdata('xy.dat');
x=A(:,1);
y=A(:,2);
figure(1)
plot(x,y,'*b'),xlabel('x'),ylabel('y'), axis equal
hold on
N=length(x);
%dij
for i=1:N
    for j=1:N
        d(i,j)=sqrt((x(i)-x(j))^2+(y(i)-y(j))^2);
    end
end
Dmax=max(max(d));
for i=1:N
    for j=1:N
        d_bar(i,j)=0;
        if i~=j
            d_bar(i,j)=sum(d(i,:))-d(i,j);
        end
    end
end
d_bar=d_bar/(N-2); % mean distance from point i to the other points, excluding j
del_d_n=d-d_bar;
D=max(max(d_bar));
%w
del_beta=input('del_beta= '); %For example: del_beta=0.01;
n_max_b=input('n_max_b= '); %For example: n_max_b=10000;
%w(alpha)
for n=1:n_max_b
    beta(n)=(n-1)*del_beta;
    for i=1:N
        w_beta(n,i)=0;
        for j=1:N
            w_beta(n,i)=w_beta(n,i)+exp(-d_bar(i,j)/D*beta(n));
        end
    end
end
w_beta=w_beta/(N-1);
%w_beta=w_beta/N;
%TWC_beta
for n=1:n_max_b
    TWC_bet_x(n)=0;
    TWC_bet_y(n)=0;
    Sum_w_beta(n)=0;
    for i=1:N
        TWC_bet_x(n)=TWC_bet_x(n)+w_beta(n,i)*x(i);
        TWC_bet_y(n)=TWC_bet_y(n)+w_beta(n,i)*y(i);
        Sum_w_beta(n)=Sum_w_beta(n)+w_beta(n,i);
    end
    TWC_bet_x(n)=TWC_bet_x(n)/Sum_w_beta(n);
    TWC_bet_y(n)=TWC_bet_y(n)/Sum_w_beta(n);
    CM_x=TWC_bet_x(1);
    CM_y=TWC_bet_y(1);
    d_TWC_CM(n)=sqrt((TWC_bet_x(n)-CM_x)^2+(TWC_bet_y(n)-CM_y)^2);
    figure(1)
    plot(TWC_bet_x(n),TWC_bet_y(n),'.r')
    drawnow
end
% beta star: local maximum of the distance from the center of mass
for n=2:n_max_b-1 % upper bound n_max_b-1 keeps d_TWC_CM(n+1) in range
    if d_TWC_CM(n) >= d_TWC_CM(n-1) & d_TWC_CM(n) >= d_TWC_CM(n+1)
        n_bet=n;
    end
end
d_TWC_CM_bet=max(sqrt((TWC_bet_x-CM_x).^2+(TWC_bet_y-CM_y).^2));
d_TWC_CM(n_bet);
bet_str=beta(n_bet);
%%%%%%%%%%%%%%%%
del_gam=input('del_gam= '); %For example: del_gam=0.01;
n_max=input('n_max= '); %For example: n_max=10000;
for n=1:n_max
    gamma(n)=(n-1)*del_gam;
    for i=1:N
        TWC_gam_x(i,n)=0;
        TWC_gam_y(i,n)=0;
        Z(i,n)=0;
        for j=1:N
            P_gam(i,j,n)=exp(-d(i,j)*gamma(n)/Dmax);
            TWC_gam_x(i,n)=TWC_gam_x(i,n)+P_gam(i,j,n)*x(j);
            TWC_gam_y(i,n)=TWC_gam_y(i,n)+P_gam(i,j,n)*y(j);
            Z(i,n)=Z(i,n)+P_gam(i,j,n);
        end
        TWC_gam_x(i,n)=TWC_gam_x(i,n)/Z(i,n);
        TWC_gam_y(i,n)=TWC_gam_y(i,n)/Z(i,n);
        figure(1)
        plot(TWC_gam_x(i,1),TWC_gam_y(i,1),'*r',TWC_gam_x(i,n),TWC_gam_y(i,n),'.g'),grid on
        drawnow
    end
end
%%%%%%%%%%%
del=input('del= '); %For example: del=0.1;
x_p=min(x)-5*del:del:max(x)+20*del;
y_p=min(y)-5*del:del:max(y)+20*del;
% gamma scalar field on the grid
for i=1:length(x_p)
    for j=1:length(y_p)
        c(i,j)=0;
        for n=1:N
            for t=1:n_max
                m(i,j,n,t)=sqrt((x_p(i)-TWC_gam_x(n,t))^2+(y_p(j)-TWC_gam_y(n,t))^2);
                c(i,j)=c(i,j)+exp(-m(i,j,n,t)*bet_str/Dmax)/n_max;
            end
        end
    end
end
for i=1:length(x_p)
    for j=1:length(y_p)
        xx(i,j)=x_p(i);
        yy(i,j)=y_p(j);
    end
end
figure(2)
contourf(xx,yy,c),title('TWC\gamma')
figure(3)
surface(x_p,y_p,c'),title('TWC\gamma')
figure(4)
surf(x_p,y_p,c'),title('TWC\gamma')
t_max=n_max
i_thet=1;
TWC_ijt_thet_x=[];
TWC_ijt_thet_y=[];
for t=1:t_max
    TWC_ij_thet_x=[];
    TWC_ij_thet_y=[];
    for i=1:N
        for j=1:N
            TWC_thet_x(i,j,t)=x(j)+TWC_gam_x(i,t)-TWC_gam_x(j,t);
            TWC_thet_y(i,j,t)=y(j)+TWC_gam_y(i,t)-TWC_gam_y(j,t);
        end
    end
end
for i=1:N
    for j=1:N
        thet_ij(i,j)=0;
        for t=1:t_max-1
            % accumulated step length of the theta path, as in the listing of Sect. 7.5
            thet_ij(i,j)=thet_ij(i,j)+sqrt((TWC_thet_x(i,j,t+1)-TWC_thet_x(i,j,t))^2+(TWC_thet_y(i,j,t+1)-TWC_thet_y(i,j,t))^2);
        end
    end
end
max_thet=max(max(thet_ij));
thet_ij_tilde=max_thet-thet_ij;
for i=1:N
    sum_thet(i)=0;
    for k=1:N
        sum_thet(i)=sum_thet(i)+thet_ij_tilde(i,k);
    end
end
% row-normalized transition probabilities
for i=1:N
    for j=1:N
        P_thet(i,j)=thet_ij_tilde(i,j)/sum_thet(i);
    end
end
7.5 TWC-Theta Minimal Spanning Tree
%TWC_theta_MST_Code
clear all
% A is matrix of x and y coordinates.
A = importdata('xy.dat');
x=A(:,1);
y=A(:,2);
figure(1)
plot(x,y,'*b'),xlabel('x'),ylabel('y'), axis equal
hold on
N=length(x);
%dij
for i=1:N
    for j=1:N
        d(i,j)=sqrt((x(i)-x(j))^2+(y(i)-y(j))^2);
    end
end
Dmax=max(max(d));
for i=1:N
    for j=1:N
        d_bar(i,j)=0;
        if i~=j
            d_bar(i,j)=sum(d(i,:))-d(i,j);
        end
    end
end
d_bar=d_bar/(N-2); % mean distance from point i to the other points, excluding j
del_d_n=d-d_bar;
D=max(max(d_bar));
%w
del_beta=input('del_beta= '); %For example: del_beta=0.01
n_max_b=input('n_max_b= '); %For example: n_max_b=10000
%w(alpha)
for n=1:n_max_b
    beta(n)=(n-1)*del_beta;
    for i=1:N
        w_beta(n,i)=0;
        for j=1:N
            w_beta(n,i)=w_beta(n,i)+exp(-d_bar(i,j)/D*beta(n));
        end
    end
end
w_beta=w_beta/(N-1);
%w_beta=w_beta/N;
%TWC_beta
for n=1:n_max_b
    TWC_bet_x(n)=0;
    TWC_bet_y(n)=0;
    Sum_w_beta(n)=0;
    for i=1:N
        TWC_bet_x(n)=TWC_bet_x(n)+w_beta(n,i)*x(i);
        TWC_bet_y(n)=TWC_bet_y(n)+w_beta(n,i)*y(i);
        Sum_w_beta(n)=Sum_w_beta(n)+w_beta(n,i);
    end
    TWC_bet_x(n)=TWC_bet_x(n)/Sum_w_beta(n);
    TWC_bet_y(n)=TWC_bet_y(n)/Sum_w_beta(n);
    CM_x=TWC_bet_x(1);
    CM_y=TWC_bet_y(1);
    d_TWC_CM(n)=sqrt((TWC_bet_x(n)-CM_x)^2+(TWC_bet_y(n)-CM_y)^2);
    figure(1)
    plot(TWC_bet_x(n),TWC_bet_y(n),'.r'), drawnow
end
% beta star: local maximum of the distance from the center of mass
for n=2:n_max_b-1 % upper bound n_max_b-1 keeps d_TWC_CM(n+1) in range
    if d_TWC_CM(n) >= d_TWC_CM(n-1) & d_TWC_CM(n) >= d_TWC_CM(n+1)
        n_bet=n;
    end
end
d_TWC_CM_bet=max(sqrt((TWC_bet_x-CM_x).^2+(TWC_bet_y-CM_y).^2));
d_TWC_CM(n_bet);
bet_str=beta(n_bet);
%%%%%%%%%%%%%%%%
del_gam=input('del_gam= '); %For example: del_gam=0.001
n_max=input('n_max= '); %For example: n_max=10000
for n=1:n_max
    gamma(n)=(n-1)*del_gam;
    for i=1:N
        TWC_gam_x(i,n)=0;
        TWC_gam_y(i,n)=0;
        Z(i,n)=0;
        for j=1:N
            P_gam(i,j,n)=exp(-d(i,j)*gamma(n)/Dmax);
            TWC_gam_x(i,n)=TWC_gam_x(i,n)+P_gam(i,j,n)*x(j);
            TWC_gam_y(i,n)=TWC_gam_y(i,n)+P_gam(i,j,n)*y(j);
            Z(i,n)=Z(i,n)+P_gam(i,j,n);
        end
        TWC_gam_x(i,n)=TWC_gam_x(i,n)/Z(i,n);
        TWC_gam_y(i,n)=TWC_gam_y(i,n)/Z(i,n);
        figure(1)
        plot(TWC_gam_x(i,1),TWC_gam_y(i,1),'*r',TWC_gam_x(i,n),TWC_gam_y(i,n),'.g'),grid on
        drawnow
    end
end
%%%%%%%%%%%
del=input('del= '); %For example: del=0.1
x_p=min(x)-5*del:del:max(x)+20*del;
y_p=min(y)-5*del:del:max(y)+20*del;
% gamma scalar field on the grid
for i=1:length(x_p)
    for j=1:length(y_p)
        c(i,j)=0;
        for n=1:N
            for t=1:n_max
                m(i,j,n,t)=sqrt((x_p(i)-TWC_gam_x(n,t))^2+(y_p(j)-TWC_gam_y(n,t))^2);
                c(i,j)=c(i,j)+exp(-m(i,j,n,t)*bet_str/Dmax)/n_max;
            end
        end
    end
end
for i=1:length(x_p)
    for j=1:length(y_p)
        xx(i,j)=x_p(i);
        yy(i,j)=y_p(j);
    end
end
figure(2)
contourf(xx,yy,c),title('TWC\gamma')
figure(3)
surface(x_p,y_p,c'),title('TWC\gamma')
figure(4)
surf(x_p,y_p,c'),title('TWC\gamma')
t_max=n_max
i_thet=1;
TWC_ijt_thet_x=[];
TWC_ijt_thet_y=[];
for t=1:t_max
    TWC_ij_thet_x=[];
    TWC_ij_thet_y=[];
    for i=1:N
        for j=1:N
            TWC_thet_x(i,j,t)=x(j)+TWC_gam_x(i,t)-TWC_gam_x(j,t);
            TWC_thet_y(i,j,t)=y(j)+TWC_gam_y(i,t)-TWC_gam_y(j,t);
            TWC_theta_x(i_thet)=TWC_thet_x(i,j,t);
            TWC_theta_y(i_thet)=TWC_thet_y(i,j,t);
            i_thet=i_thet+1;
            TWC_ij_thet_x=[TWC_ij_thet_x,TWC_thet_x(i,j,t)];
            TWC_ij_thet_y=[TWC_ij_thet_y,TWC_thet_y(i,j,t)];
        end
    end
    TWC_ijt_thet_x=[TWC_ijt_thet_x;TWC_ij_thet_x];
    TWC_ijt_thet_y=[TWC_ijt_thet_y;TWC_ij_thet_y];
end
figure(5)
plot(x,y,'*b',TWC_theta_x,TWC_theta_y,'.r'),xlabel('x'),ylabel('y'),title('TWC\theta')
G_thet=[];
for i=1:N
    for j=1:N
        theta(i,j)=0;
        for t=1:t_max-1
            if i~=j
                % accumulate the length of the theta path between i and j
                theta(i,j)=theta(i,j)+sqrt((TWC_thet_x(i,j,t+1)-TWC_thet_x(i,j,t))^2+(TWC_thet_y(i,j,t+1)-TWC_thet_y(i,j,t))^2);
                G_thet=[G_thet;i j theta(i,j)];
            end
        end
    end
end
G_thet_MST=[];
for i=1:N
    for j=i+1:N % each unordered pair once, no self-loops
        % undirected weight: the smaller of the two directed theta values
        theta_MST(i,j)=min(theta(i,j),theta(j,i));
        G_thet_MST=[G_thet_MST;i j theta_MST(i,j)];
    end
end
G_MST = graph(G_thet_MST(:,1)',G_thet_MST(:,2)',G_thet_MST(:,3)');
G_MST.Edges
figure(6)
p = plot(G_MST,'EdgeLabel',G_MST.Edges.Weight);
tree = minspantree(G_MST);
tree.Edges
highlight(p,tree)
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2023 M. Buscema et al., The Topological Weighted Centroid: A New Vision of Geographic Profiling, Studies in Computational Intelligence 1095, https://doi.org/10.1007/978-3-031-28901-9