English Pages 128 [126] Year 2022
Automation, Collaboration, & E-Services
Xin W. Chen
Network Science Models for Data Analytics Automation Theories and Applications
Automation, Collaboration, & E-Services Volume 9
Series Editor Shimon Y. Nof, PRISM Center, Grissom, Purdue University, West Lafayette, IN, USA
The Automation, Collaboration, & E-Services series (ACES) publishes new developments and advances in the fields of Automation, collaboration and e-services; rapidly and informally but with a high quality. It captures the scientific and engineering theories and techniques addressing challenges of the megatrends of automation, and collaboration. These trends, defining the scope of the ACES Series, are evident with wireless communication, Internetworking, multi-agent systems, sensor networks, cyber-physical collaborative systems, interactive-collaborative devices, and social robotics – all enabled by collaborative e-Services. Within the scope of the series are monographs, lecture notes, selected contributions from specialized conferences and workshops.
More information about this series at https://link.springer.com/bookseries/8393
Xin W. Chen Department of Industrial Engineering Southern Illinois University Edwardsville Edwardsville, IL, USA
ISSN 2193-472X ISSN 2193-4738 (electronic) Automation, Collaboration, & E-Services ISBN 978-3-030-96469-6 ISBN 978-3-030-96470-2 (eBook) https://doi.org/10.1007/978-3-030-96470-2 © The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Contents
1 Network Science Models
  1.1 Network Science Models
  1.2 Network Metrics
  1.3 Node and Link Metrics
  1.4 Phase Transition in Networks
  1.5 Challenges in Network Modeling of Complex Systems
  References
2 Interdependent Critical Infrastructures
  2.1 Expert System Modeling and Analysis of Human and Social Resilience
  2.2 Interdependencies and Critical Components
  2.3 Networked Constraints
  2.4 Distributed Collaborative Decision Making
  2.5 An Example of Three ICIs
    2.5.1 Smart Electric Power Grid
    2.5.2 Multimodal Transportation Network
    2.5.3 Water-Related Infrastructures
  2.6 Conclusions and Recommendations
  References
3 Public Health
  3.1 Process and Tools for Modeling Public Health Networks
    3.1.1 Design of Network Metrics
    3.1.2 Complex Network Theory and Structural Search
  3.2 Conflict and Error Detection in Health Insurance Claims
    3.2.1 Management of Health Insurance Claim Denials
    3.2.2 A Case Study of Insurance Claim Denial
    3.2.3 Modeling Conflicts and Errors in Health Insurance Claims
  References
4 Smart and Autonomous Power Grid
  4.1 Smart and Autonomous High-Power Converter
  4.2 Weather-Proof Smart Grid
  4.3 Integration of High-Power Converters and Weather-Proof Smart Grid
  References
5 Water Distribution Systems
  5.1 Design and Maintenance of WDS
  5.2 Data and Web Mining of the WDS in Washington D.C., U.S.: A Case Study
    5.2.1 Water Supply in D.C.
    5.2.2 The D.C. WDS
  References
6 Transportation Systems
  6.1 Connected Autonomous Vehicles
    6.1.1 Centralized Network Control with Decentralized Collision Avoidance
    6.1.2 Centralized Control with Backup Decentralized Control
    6.1.3 Mixed Centralized and Decentralized Control
    6.1.4 Mostly Decentralized Control with Centralized Scheduling
    6.1.5 Fully Decentralized Control
  6.2 Freight Transportation Using Railroad Networks
    6.2.1 Scheduling of Railcars
    6.2.2 Impact of Weather on Railroad Operations
    6.2.3 Railroad Transportation of Hazardous Materials
  6.3 Airport Traffic Control
    6.3.1 Congestion and Choke Points
    6.3.2 Airport Capacity Utilization, Delays, and Congestion
    6.3.3 Methods of Airport Traffic Control and Runway Traffic Optimization
  References
7 Adaptive Algorithms for Knowledge Acquisition Over Complex Networks
  7.1 Strengths and Limitations of Knowledge Management and Web Mining
  7.2 Complex Networks
  7.3 Network-Adaptive Algorithms
  7.4 Development of Knowledge Acquisition Algorithms
  References
Index
Chapter 1
Network Science Models
The structure and dynamics of complex systems such as supply chains, health networks, terrorist organizations, and smart grids are ever-changing. Network science models provide a set of mathematical models that are used in conjunction with statistical tools to identify the structures of complex systems. Some well-known network models include Weibull distributions with parameters $\alpha$ and $\beta$, power law distributions with a parameter $\gamma$, and random networks with a parameter $\theta$, the average node degree. Statistical tools (e.g., the cumulative squared error and the Kolmogorov–Smirnov test) are used to determine the best network model for a complex system. After a network model of a complex system is identified, the model can be used to generate statistically different systems to determine system properties and dynamics. The network model can also be studied analytically to determine the intrinsic relationship between entities in a system and system properties and dynamics.
The basic entities in a network model of a complex system are nodes and links. Nodes in a system perform certain functions. Links represent relationships and interactions between nodes and the flow of other entities from one node to another. Links may be undirected or directed; a directed link is an arc. Links may have a capacity limit or a weight, or may be subject to other constraints. Network metrics for a network model are defined and computed using properties of nodes and links.
Network metrics measure system properties and dynamics, which are intertwined through the network model of the system. For example, systems described by a random network have the property of a phase transition, or bond percolation. When a system’s average degree, represented by the parameter $\theta$, increases from a value less than one to a value greater than one, the system changes from a mostly disconnected network to a connected network. The system dynamics change as the network metric (i.e., the average degree $\theta$) changes from a value less than one to a value greater than one.
In addition to network metrics, a system requires metrics for individual entities. There are at least two types of individual metrics, node metrics and link metrics, which describe the properties of nodes and links, respectively. For example, the node degree ($\theta$) is the number of links a node has in a system. It is a node metric and indicates how well the node is connected to other nodes. As another example, the conductivity is a link metric and has a value between zero and one. The larger the conductivity of a link, the more important the link is. If the conductivity of a link between two nodes is one, the two nodes become disconnected if the link is broken or removed. If the conductivity of a link between two nodes is close to zero, the connectivity between the two nodes is not affected if the link is broken or removed.
The rest of this chapter discusses the important aspects of network science models. The discussion begins with network models for complex systems, then covers network, node, and link metrics, and finally illustrates challenges in network science modeling of complex systems.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022. X. W. Chen, Network Science Models for Data Analytics Automation, Automation, Collaboration, & E-Services 9, https://doi.org/10.1007/978-3-030-96470-2_1
1.1 Network Science Models
Several network science models, including the Erdős–Rényi random network [13–15], scale-free network [1], Bose–Einstein condensation network [6, 7], and preferential attachment [4], are applied to describe energy grids [4], air traffic [10], drug injectors [30], and other complex systems.
The classic model of random networks was first discussed in the early 1950s [28] and was rediscovered and analyzed in a series of papers published in the late 1950s and early 1960s [13–15]. Most random network models assume (a) undirected links; (b) at most one link between any two nodes; and (c) no link from a node to itself. In a random network with $n$ nodes and a probability $Pr$ to connect any pair of nodes, the maximum number of links in the network is $\frac{n(n-1)}{2}$. The probability that a node has degree $\theta$ is $\binom{n-1}{\theta} Pr^{\theta} (1 - Pr)^{n-1-\theta}$, which is also the fraction of nodes in the network that have degree $\theta$. The mean degree is $\bar{\theta} = (n-1)Pr$. The random network is homogeneous in the sense that most nodes in the network have approximately the same number of links. A typical example of the random network is a highway system [5, 20]. If a city is a node and the highway between two cities is a link, most cities in the system connect to approximately the same number of highways.
The small-world network (e.g., [29]) has a higher clustering coefficient than the random network, which means that there is a heightened probability that two nodes will be connected directly to one another if they have another neighboring node in common. There is little evidence that any known real-world system is substantially similar to the small-world network [24]. A common characteristic of both random and small-world networks is that the probability of having a highly connected node (i.e., a large $\theta$) decreases exponentially with $\theta$; nodes with a large number of links are practically absent [4].
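The random-network quantities above can be checked numerically. The following Python sketch, with function names and parameter values chosen here purely for illustration, evaluates the binomial degree distribution and confirms that its probabilities sum to one and that its mean equals $(n-1)Pr$.

```python
from math import comb


def max_links(n: int) -> int:
    """Largest possible number of links among n nodes: n(n-1)/2."""
    return n * (n - 1) // 2


def degree_probability(n: int, pr: float, theta: int) -> float:
    """Binomial probability that a node has degree theta in a random network."""
    return comb(n - 1, theta) * pr**theta * (1 - pr) ** (n - 1 - theta)


def mean_degree(n: int, pr: float) -> float:
    """Mean degree (n-1) * Pr of the random network."""
    return (n - 1) * pr


# Hypothetical values, chosen only for illustration.
n, pr = 10, 0.3
distribution = [degree_probability(n, pr, theta) for theta in range(n)]
expected_degree = sum(theta * p for theta, p in enumerate(distribution))
```

Summing `theta * p` over the distribution reproduces `mean_degree(n, pr)` up to floating-point error, which is a quick consistency check on the formula.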
The most studied network model is the scale-free network, which has a degree distribution described by a power law. The scale-free network captures the topology of many real-world systems [1, 2, 4, 9, 25]. In scale-free networks, the probability $Pr_\theta$ that a node has degree $\theta$ follows a power law distribution, $Pr_\theta \propto \theta^{-\gamma}$, where $\gamma$ is between 2.1 and 4 for real-world systems [4]. Compared to the nodes in random and small-world networks, nodes with a large $\theta$ have a better chance to occur in scale-free networks. Despite the popularity of the scale-free network, many complex systems cannot be described by a power law [26].
The Bose–Einstein condensation network [7] was discovered in an effort to model the competitive nature of networks. Many real-world scale-free networks are formed following the rule that older nodes have a higher probability of obtaining links. This is true in many real-world systems. For example, an old business has more customers than a new business. This rule is not always true, however. Motivated by many real-world systems, a fitness model [6] was proposed to assign a fitness parameter $\xi_i$ to each node $i$. A node with a large $\xi_i$ has a higher probability of obtaining links. $\xi_i$ is chosen from a statistical distribution $\rho(\xi)$. When $\rho(\xi)$ follows certain distributions (e.g., $\rho(\xi) = (\lambda + 1)(1 - \xi)^{\lambda}$, $\lambda > 1$), a Bose–Einstein condensation network forms. The Bose–Einstein network shows the “winner-takes-all” phenomenon observed in competitive systems. This means that the fittest node acquires a finite fraction of the total links (about 80% [7]) in the network. The fraction is independent of the order of the network (i.e., the total number of nodes). In contrast, the fittest node’s share of all links decreases to zero in the scale-free network. The fittest node corresponds to the lowest energy level in the Bose gas. That the fittest node acquires most links corresponds to the phenomenon that many particles are at the lowest energy level when the temperature is close to absolute zero. Bose–Einstein condensation was predicted in 1924 by Bose and Einstein and was created in 1995 by Cornell and Wieman, who received the Nobel Prize in Physics 2001 together with Ketterle for their work on Bose–Einstein condensation.
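To illustrate why highly connected nodes are far more likely under a power law than in a random network, the sketch below compares the probability of a degree of at least 20 under both models. Everything specific here is an assumption made for illustration: $n = 1000$, $\gamma = 2.5$ (inside the 2.1 to 4 range quoted above), a power law truncated to degrees 1 through $n-1$, and a random network whose $Pr$ is set so that both models share the same mean degree.

```python
from math import comb


def binomial_tail(n: int, pr: float, threshold: int) -> float:
    """P(degree >= threshold) in a random network with n nodes."""
    return sum(
        comb(n - 1, k) * pr**k * (1 - pr) ** (n - 1 - k)
        for k in range(threshold, n)
    )


def power_law_pmf(gamma: float, theta_max: int) -> list:
    """Normalized power law on degrees 1..theta_max (index 0 holds degree 1)."""
    weights = [k ** -gamma for k in range(1, theta_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


n = 1000          # hypothetical network order
gamma = 2.5       # hypothetical exponent
pmf = power_law_pmf(gamma, n - 1)
power_law_mean = sum(k * p for k, p in enumerate(pmf, start=1))

# Match the mean degree of the random network to that of the power law.
pr = power_law_mean / (n - 1)

power_law_tail = sum(pmf[19:])            # degrees 20 and above
random_tail = binomial_tail(n, pr, 20)    # same event in the random network
```

Under these assumptions the power-law tail is many orders of magnitude larger than the binomial tail, which is the qualitative contrast the text describes.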
There are many other network science models. For example, the 9/11 terrorist network (total 63 nodes [22]) is best described by a Weibull distribution with $\alpha = 4.73$ and $\beta = 1.17$ (Fig. 1.1a [18]) according to the cumulative squared error (0.029 for the power law distribution with $\gamma = 1.01$, and 0.007 for the Weibull distribution) and the Kolmogorov–Smirnov test (0.416 for the power law distribution and 0.0909 for the Weibull distribution). The analysis of the electrical power grid of the western United States (total 4941 nodes [29]) using maximum likelihood estimation indicates that the power grid is best described by a Weibull distribution with $\alpha = 5.64$ and $\beta = 2.03$ (Fig. 1.1b).
Fig. 1.1 Network models of two systems: (a) 9/11 terrorist network; (b) electrical power grid of the western United States
In Erdős–Rényi random networks, the probability of having a highly connected node (i.e., a large degree $\theta$) decreases exponentially with the degree; nodes with a large number of links are absent. Nodes with a large degree have a better chance to occur in scale-free networks than in Erdős–Rényi random networks; this is one of the reasons why scale-free networks are popular. Weibull networks can have a scale, as is evident in the electrical power grid of the western United States, and still allow the existence of nodes with a large degree.
Several other network models, including binomial, negative binomial, exponential, Poisson, and splines, may be applied to model complex systems. Maximum likelihood estimation and the expectation maximization algorithm [18] can be used to estimate the parameters in these network models.
Data for complex systems are available from various sources, and network science models can be used to model their structures. Examples of data sources include: (a) electrical power grids (e.g., the IEEE 57-bus and 118-bus networks (http://www.ee.washington.edu/research/pstca/) and the electrical power grid of the western United States with 4941 nodes [29]), (b) social networks (e.g., an acquaintance network of 293 drug injectors on the streets of Hartford, CT, USA [30]), (c) supply chains and transportation networks (e.g., the global network of cargo ports [21]), (d) information and communication networks (e.g., the Internet and World Wide Web [1]), and (e) criminal networks (e.g., the 9/11 terrorist network [22]).
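The model comparison described above can be sketched with a one-sample Kolmogorov–Smirnov distance. The text does not state how its reported values were computed, so this is only a generic illustration under several assumptions: the Weibull parameter $\alpha$ is taken to be the scale and $\beta$ the shape, the degree list and all candidate parameters are hypothetical, and integer degrees are compared directly against continuous distribution functions.

```python
import math


def weibull_cdf(x: float, alpha: float, beta: float) -> float:
    """Weibull CDF, ASSUMING alpha is the scale and beta is the shape."""
    return 1.0 - math.exp(-((x / alpha) ** beta)) if x > 0 else 0.0


def ks_statistic(sample, cdf) -> float:
    """Largest gap between the empirical CDF of the sample and a model CDF."""
    xs = sorted(sample)
    n = len(xs)
    distance = 0.0
    for idx, x in enumerate(xs):
        model = cdf(x)
        # Compare the model with the empirical CDF just after and just before x.
        distance = max(distance, (idx + 1) / n - model, model - idx / n)
    return distance


def best_model(sample, candidates):
    """Return the (name, distance) pair with the smallest KS distance."""
    scores = {name: ks_statistic(sample, cdf) for name, cdf in candidates.items()}
    return min(scores.items(), key=lambda item: item[1])


# Hypothetical node degrees and hypothetical candidate parameters.
degrees = [1, 1, 2, 2, 2, 3, 3, 4, 5, 8]
candidates = {
    "weibull": lambda x: weibull_cdf(x, alpha=3.5, beta=1.5),
    "exponential": lambda x: 1.0 - math.exp(-x / 3.1) if x > 0 else 0.0,
}
name, distance = best_model(degrees, candidates)
```

A smaller distance indicates a closer fit, which mirrors how the lower Kolmogorov–Smirnov value favors the Weibull distribution in the 9/11 network example.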
1.2 Network Metrics
Network metrics are defined over the entire network and describe system properties and dynamics. There are a variety of centrality-based network metrics [16]. This section discusses four practical network metrics: speed, scale, conductance, and homogeneity. Without loss of generality, a link is assumed to be undirected, so entities can flow in both directions between the two nodes that the link connects. In cases where directional arcs are used to model connections between nodes (e.g., electricity flow along one direction in a power grid), the metrics can be modified accordingly.
Diffusion speed, $SP$, indicates how fast information, diseases, products, or services move in a network. Equation (1.1) defines $SP$.
$$SP = \frac{2\sum_{i>j} \frac{1}{d_{ij}}}{n(n-1)} \tag{1.1}$$
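As a rough illustration of Eq. (1.1), the following Python sketch computes $SP$ using a breadth-first search for the geodesic distances, counting $1/d_{ij}$ as zero for pairs with no path. The function names and the adjacency-set representation are assumptions made here. The five-node chain 1-2-3-4-5 is likewise an assumption: it is inferred from the $SP$ values quoted in the Fig. 1.2 example (0.833, 0.333, and 0), not read from the figure itself.

```python
from collections import deque
from itertools import combinations


def geodesic_distances(adj, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diffusion_speed(adj):
    """SP of Eq. (1.1); unreachable pairs contribute 1/d = 0."""
    n = len(adj)
    if n < 2:
        return 0.0
    total = 0.0
    for i in adj:
        for j, d in geodesic_distances(adj, i).items():
            if j > i:  # count each unordered pair once
                total += 1.0 / d
    return 2.0 * total / (n * (n - 1))


def remove_nodes(adj, removed):
    """Copy of the network without the removed nodes and their links."""
    removed = set(removed)
    return {
        u: {v for v in nbrs if v not in removed}
        for u, nbrs in adj.items()
        if u not in removed
    }


# Assumed layout: a chain inferred from the SP values in the text.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
impact = {
    pair: diffusion_speed(remove_nodes(chain, pair))
    for pair in combinations(chain, 2)
}
worst_pair = min(impact, key=impact.get)
```

Enumerating all ten pairs in this way is feasible only for small networks; the number of combinations grows quickly with the order of the network.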
$n$ is the total number of nodes in a network (i.e., the order of a system). $d_{ij}$ is the geodesic distance between nodes $i$ and $j$ in the network. The geodesic distance $d_{ij}$ is the length of the shortest path(s) between $i$ and $j$; $1 \le d_{ij} \le n - 1$ if there exists a path between $i$ and $j$. $d_{ij} = 1$ if $i$ and $j$ are neighbors and connected by one link. If $i$ and $j$ are disconnected and there exists no path between $i$ and $j$, $\frac{1}{d_{ij}} = 0$. In other words, $d_{ij}$ is $\infty$ if $i$ and $j$ are disconnected, and therefore $\frac{1}{d_{ij}} = 0$. $0 \le SP \le 1$. $SP = 0$ if all nodes are disconnected and there are no links in a network. $SP = 1$ if the network is a clique in which each node is a neighbor of all other nodes in the network. $SP$ is the average of the inverse of the geodesic distance. The larger the $SP$, the higher the diffusion speed of the network.
Fig. 1.2 A network of five nodes
Suppose two nodes are removed from a network of five nodes (Fig. 1.2) and the impact of the removal on the network’s diffusion speed is evaluated. The five squares in Fig. 1.2 represent five nodes. The numbers in the squares are the indices of nodes. The lines that connect nodes are links. There are a total of $\binom{5}{2} = 10$ different ways in which two nodes can be removed from the network. $SP_{\setminus(1\wedge2)\vee(4\wedge5)\vee(1\wedge5)} = \frac{1/1+1/2+1/1}{3} = 0.833$ when nodes 1 and 2, 4 and 5, or 1 and 5 are cut from the network; $SP_{\setminus(1\wedge3)\vee(1\wedge4)\vee(3\wedge5)\vee(2\wedge5)\vee(2\wedge3)\vee(3\wedge4)} = \frac{1/1}{3} = 0.333$ when nodes 1 and 3, 1 and 4, 3 and 5, 2 and 5, 2 and 3, or 3 and 4 are cut from the network; $SP_{\setminus(2\wedge4)} = \frac{0}{3} = 0$ when nodes 2 and 4 are cut from the network. Removing nodes 2 and 4 minimizes the network diffusion speed.
Diffusion scale, $SC$, of a network indicates the scale of attacks, spread of diseases, or delivery range of products and services. Figure 1.3 shows two networks, both of which have a total of 10 nodes. Using Eq. (1.1), $SP = \frac{2\sum_{i>j} \frac{1}{d_{ij}}}{10 \times 9} = 0.333$ for network 1.3(a) and $SP = \frac{2\sum_{i>j} \frac{1}{d_{ij}}}{10 \times 9} = 0.303$ for network 1.3(b); network 1.3(a) has a higher diffusion speed than network 1.3(b). Network 1.3(b) can, however, deliver entities to a maximum of seven nodes, whereas network 1.3(a) can deliver entities to a maximum of five nodes, if one node in either network initiates the deliveries. Network 1.3(b) is more capable than network 1.3(a) with respect to the scale of deliveries.
Fig. 1.3 Two networks, (a) and (b), with different diffusion speeds and scales
A network of $n$ nodes in total consists of $m$ components, $m \ge 1$. A component is a set of nodes and links that are connected to each other but are disconnected from other nodes or links in the same network. Let $k$ be the index of components in a network, $k = 1, 2, \ldots, m$. Suppose there is sufficient time for a network to deliver entities to any node that is connected to the source node with the information, products, or services. Equation (1.2) calculates the expected time-insensitive diffusion scale $SC$.
$$SC = \frac{1}{n-1}\sum_{k=1}^{m}\left[\frac{n_k}{n} \times (n_k - 1)\right] = \frac{1}{n(n-1)}\left[\sum_{k=1}^{m} n_k^2 - n\right] \tag{1.2}$$
$n_k$ in Eq. (1.2) is the number of nodes in component $k$. $\frac{n_k}{n}$ is the probability that the source node belongs to component $k$. $0 \le SC \le 1$. $SC = 0$ if the network has no links and consists of $n$ disconnected components, each of which has one node. $SC = 1$ if all nodes in the network are connected to each other directly or indirectly; the network has only one component. $SC$ for networks 1.3(a) and 1.3(b) in Fig. 1.3 are 0.44 and 0.53, respectively. On average, network 1.3(b) can deliver entities on a larger scale than network 1.3(a).
Suppose diffusion time ($DT$) is limited and there are $DT$ time units for the delivery of entities in a network. Further, it is assumed that it takes $t$ time units to deliver entities between any two directly linked nodes. $DT$ is determined by operations requirements, and $t$ is deterministic and determined by system properties. Equation (1.3) calculates the time-sensitive expected diffusion scale with deterministic diffusion times.
$$SC(t) = \frac{1}{n-1}\left\{\left[\sum_{k=1}^{m} \frac{n_k}{n} \times \frac{\sum_{i \in k}\left(1 + \sum_{j \in k, i \ne j} q_{ij}\right)}{n_k}\right] - 1\right\} = \frac{1}{n(n-1)}\sum_{k=1}^{m}\left[\sum_{i \in k}\left(\sum_{j \in k, i \ne j} q_{ij}\right)\right], \quad q_{ij} = \begin{cases} 1, & \text{if } d_{ij} \le \frac{DT}{t} \\ 0, & \text{otherwise} \end{cases}, \quad i \ne j \tag{1.3}$$
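Equations (1.2) and (1.3) might be sketched in Python as follows. The function names are assumptions, and the link layouts are hypothetical: the two example networks are built from chains whose component sizes (5 and 5, and 7 and 3) are chosen only to be consistent with the $SC$ values of 0.44 and 0.53 reported for Fig. 1.3, since the actual links in that figure are not recoverable from the text.

```python
from collections import deque


def geodesic_distances(adj, source):
    """Breadth-first search: hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def component_sizes(adj):
    """Number of nodes n_k in each component of the network."""
    seen, sizes = set(), []
    for node in adj:
        if node not in seen:
            members = geodesic_distances(adj, node)
            seen.update(members)
            sizes.append(len(members))
    return sizes


def diffusion_scale(adj):
    """Time-insensitive SC of Eq. (1.2)."""
    n = len(adj)
    return (sum(size**2 for size in component_sizes(adj)) - n) / (n * (n - 1))


def time_sensitive_scale(adj, dt, t):
    """SC(t) of Eq. (1.3): q_ij = 1 when d_ij <= DT / t."""
    n = len(adj)
    reached = sum(
        1
        for i in adj
        for j, d in geodesic_distances(adj, i).items()
        if j != i and d * t <= dt
    )
    return reached / (n * (n - 1))


def path(nodes):
    """Adjacency sets of a chain through the given nodes (illustrative layout)."""
    adj = {u: set() for u in nodes}
    for u, v in zip(nodes, nodes[1:]):
        adj[u].add(v)
        adj[v].add(u)
    return adj


# Hypothetical layouts with component sizes 5 + 5 and 7 + 3.
network_a = {**path(list(range(0, 5))), **path(list(range(5, 10)))}
network_b = {**path(list(range(0, 7))), **path(list(range(7, 10)))}
```

When `dt` is large enough for every node to reach its whole component, `time_sensitive_scale` returns the same value as `diffusion_scale`, which matches the relationship between $SC(t)$ and $SC$ stated in the text.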
$q_{ij}$ in Eq. (1.3) indicates whether node $i$ can deliver entities to node $j$ in component $k$. If $d_{ij} \le \frac{DT}{t}$, then $j$ receives entities from $i$ within $DT$ time units and $q_{ij} = 1$; otherwise $q_{ij} = 0$. $1 + \sum_{j \in k, i \ne j} q_{ij}$ is the total number of nodes that receive entities from $i$, including node $i$. $\frac{\sum_{i \in k}\left(1 + \sum_{j \in k, i \ne j} q_{ij}\right)}{n_k}$ is the average number of nodes in component $k$ that receive entities if one node in $k$ is the source node that initiates the delivery. If $d_{ij} \le \frac{DT}{t}$ for $\forall i, j$, $i \ne j$, $SC = SC(t)$. If the diffusion time between two directly linked nodes is a random variable $T$ described by a probability density function $f(T)$, Eq. (1.3) can be used to calculate the time-sensitive expected diffusion scale with stochastic diffusion times by replacing $t$ with the random variable $T$.
In addition to speed and scale, network conductance is used to measure system resilience. In a network where all nodes are connected directly or indirectly, $n$ is the total number of nodes and $e$ is the total number of links. A set of $\nu$ nodes, $0 < \nu < n$, or a set of $\eta$ links, $0 < \eta < e$, may be cut or removed from the network. Two network conductance metrics, node-based network conductance ($NC$) and link-based network conductance ($LC$), describe system resilience to failures and attacks. Equation (1.4) defines the $NC$. $0 \le NC \le 1$. $NC = 0$ when removing one node can disconnect the entire network (i.e., all nodes become disconnected). $NC = 1$ indicates the network is most resilient to node failures or attacks. For example, a clique has $NC = 1$.
$$NC = \frac{n-1}{n-2}\left[\min_{\nu}\left(\frac{\nu}{n - \nu - \max_k n_k + 1}\right) - \frac{1}{n-1}\right] \tag{1.4}$$
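Equation (1.4) can be evaluated by brute force on small networks. The Python sketch below is only an illustration under assumptions made here: the function names are hypothetical, every set of $\nu$ nodes with $0 < \nu < n$ is enumerated (which is exponential in $n$), and the network must have at least three nodes. It reproduces the two boundary cases stated in the text, $NC = 1$ for a clique and $NC = 0$ when removing one node disconnects all others, using a hypothetical 10-node star for the latter.

```python
from itertools import combinations


def largest_component_size(adj, removed):
    """Size of the largest component after deleting the nodes in `removed`."""
    remaining = set(adj) - set(removed)
    best = 0
    while remaining:
        stack = [remaining.pop()]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v in remaining:
                    remaining.remove(v)
                    stack.append(v)
        best = max(best, size)
    return best


def node_conductance(adj):
    """NC of Eq. (1.4), minimizing over every removable node set (n >= 3)."""
    n = len(adj)
    min_ratio = float("inf")
    for nu in range(1, n):
        for removed in combinations(adj, nu):
            largest = largest_component_size(adj, removed)
            # Nodes cut off from the largest remaining component, plus one.
            min_ratio = min(min_ratio, nu / (n - nu - largest + 1))
    return (n - 1) / (n - 2) * (min_ratio - 1 / (n - 1))


# Hypothetical test networks: a 10-node star and a 5-node clique.
star = {0: set(range(1, 10)), **{leaf: {0} for leaf in range(1, 10)}}
clique = {u: set(range(5)) - {u} for u in range(5)}
nc_star = node_conductance(star)
nc_clique = node_conductance(clique)
```

The same arithmetic is consistent with the value quoted for network 1.4(b): removing one node leaves components of five and four nodes out of nine, so the ratio is $1/5$ and $\frac{9}{8}\left(\frac{1}{5} - \frac{1}{9}\right) = 0.1$.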
$NC$ takes the minimum over all sets of $\nu$ nodes in a network of $n$ nodes. Suppose $\nu$ nodes are removed from a network of $n$ nodes. There are $n - \nu$ remaining nodes. The component with the maximum number of nodes among the remaining $n - \nu$ nodes has $\max_k n_k$ nodes, which is subtracted from $n - \nu$. $n - \nu - \max_k n_k$ is the number of nodes that are disconnected from the network due to $\nu$ nodes being removed. The ratio, $\frac{\nu}{n - \nu - \max_k n_k + 1}$, indicates how resilient a system is. A larger ratio means that a large number of nodes need to be removed to disconnect a few nodes, and the system is more resilient to node failures or attacks. $NC$ is a worst-case metric since it takes the minimum of the ratio $\frac{\nu}{n - \nu - \max_k n_k + 1}$. Adding 1 to the denominator ensures that the denominator is not equal to zero.
Figure 1.4 has two networks, (a) and (b), each of which has 10 nodes. Each node in either network has a degree of three (i.e., $\theta = 3$). In other words, each node has three links that connect the node to three other nodes. $NC = 1$ for network 1.4(a) when one node is removed and the rest of the network remains as a component. $NC = 0.1$ for network 1.4(b) when one node (either one at the center of the network) is removed and the network is left with two components, a larger component with five nodes and a smaller component with four nodes. Network 1.4(a) is more resilient than network 1.4(b) when nodes fail.
Fig. 1.4 Two networks, (a) and (b), with different node-based and link-based network conductance
The $LC$ is defined in Eq. (1.5) [12, 27]. $0 \le LC \le 1$. $LC = 0$ if removing a link cuts a network in half, thereby disconnecting half of the nodes in the network. $LC = 1$ indicates the network is most resilient to link failures or attacks. For example, a clique has $LC = 1$. $LC$ takes the minimum over all sets of $\eta$ links in a network of $e$ links. Suppose $\eta$ links are cut and, as a result, the network has a component with the maximum number of nodes, $\max_k n_k$; the remaining $n - \max_k n_k$ nodes are disconnected from the network. $LC$ is a worst-case metric that determines the minimum ratio of $\eta$ divided by $n - \max_k n_k + 1$. A larger $LC$ indicates that a larger number of links need to be cut to disconnect a few nodes, and the system is more resilient to link failures or attacks. $LC$ of network 1.4(a) is 1 and $LC$ of network 1.4(b) is 0. Network 1.4(a) is more resilient than network 1.4(b) in terms of link failures or attacks.
$$LC = \frac{n+2}{n}\left[\min_{\eta}\left(\frac{\eta}{n - \max_k n_k + 1}\right) - \frac{2}{n+2}\right] \tag{1.5}$$
Both $NC$ in Eq. (1.4) and $LC$ in Eq. (1.5) are used to evaluate system resilience. The two networks in Fig. 1.4 have homogeneous degrees; all nodes have the same degree. A higher $NC$ indicates a higher $LC$, and vice versa. If a homogeneous system is resilient to link failures or attacks, it is also resilient to node failures or attacks, and vice versa. When a system has non-homogeneous degrees, the $NC$ and $LC$ may be quite different. Figure 1.5 is a network with 10 nodes. The center node has nine links whereas the other nine nodes each have one link. The network is highly non-homogeneous. Its $NC = 0$ whereas $LC = 0.4$. This network is resilient to link failures or attacks; cutting one link only disconnects one node. It is vulnerable to node failures or attacks, however. In the worst case, the entire network is disconnected when the center node is removed.
Fig. 1.5 A network with non-homogeneous degrees
The examples of network conductance indicate that there is a need to quantify a system’s homogeneity. Suppose node $i$ diffuses entities to other nodes in a network with $n$ nodes; the average geodesic distance from $i$ to other nodes is $\frac{\sum_{j=1, j \ne i}^{n} d_{ij}}{n-1}$. Further, suppose all nodes, $1, 2, \ldots, n$, have an equal probability $\frac{1}{n}$ to be the source node that diffuses entities to other nodes. The average geodesic distance of the network is $\frac{\sum_{i=1}^{n}\sum_{j=i+1}^{n} d_{ij}}{n(n-1)/2}$ for $n > 1$. The average geodesic distance indicates the average number of steps it takes to diffuse entities from the source node to other nodes. If we let $d_{ij} = 0$ when $i$ and $j$ are not connected through any path, the average geodesic distance is between 0 and $n - 1$. The average geodesic distance of a clique is 1. The average geodesic distance of a network without any link is 0. Both a clique and a network without any link are homogeneous networks. Each node in a clique has $n - 1$ links whereas each node in a network without any link has zero links.
Let $LGD$ represent the largest geodesic distance in a network. Equation (1.6) defines homogeneity ($HG$), which is the ratio of the average geodesic distance divided by the $LGD$. $0 < HG \le 1$. A larger $HG$ indicates the network is more homogeneous. For example, a clique has $HG = 1$. When a network does not have any link, $LGD = 0$ and $\sum_{i=1}^{n}\sum_{j=i+1}^{n} d_{ij} = 0$. It is defined that $HG = 1$. A network without any link is the most homogeneous because each node has the same number of links (zero links).
Fig. 1.6 Two networks, (a) and (b), with different homogeneity
$$HG = \frac{\sum_{i=1}^{n}\sum_{j=i+1}^{n} d_{ij}}{\frac{n(n-1)LGD}{2}} = \frac{2\sum_{i=1}^{n}\sum_{j=i+1}^{n} d_{ij}}{n(n-1)LGD} \tag{1.6}$$
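Before the homogeneity results are discussed further, the link-based conductance of Eq. (1.5) can be checked numerically in the same brute-force spirit as the node-based metric. This Python sketch is an illustration only: the function names are assumptions, every set of $\eta$ links with $0 < \eta < e$ is enumerated (exponential in $e$), and the 10-node star is built here to match the description of Fig. 1.5, for which the text reports $LC = 0.4$.

```python
from itertools import combinations


def largest_component_without(adj, cut):
    """Size of the largest component when the links in `cut` are ignored."""
    remaining = set(adj)
    best = 0
    while remaining:
        stack = [remaining.pop()]
        size = 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v in remaining and frozenset((u, v)) not in cut:
                    remaining.remove(v)
                    stack.append(v)
        best = max(best, size)
    return best


def link_conductance(adj):
    """LC of Eq. (1.5), minimizing over every removable link set (e >= 2)."""
    n = len(adj)
    links = list({frozenset((u, v)) for u in adj for v in adj[u]})
    min_ratio = float("inf")
    for eta in range(1, len(links)):
        for cut in combinations(links, eta):
            largest = largest_component_without(adj, set(cut))
            min_ratio = min(min_ratio, eta / (n - largest + 1))
    return (n + 2) / n * (min_ratio - 2 / (n + 2))


# A 10-node star as described for Fig. 1.5, and a hypothetical 4-node clique.
star = {0: set(range(1, 10)), **{leaf: {0} for leaf in range(1, 10)}}
clique = {u: set(range(4)) - {u} for u in range(4)}
lc_star = link_conductance(star)
lc_clique = link_conductance(clique)
```

For the star, cutting a single link isolates one leaf, giving a minimum ratio of $1/2$ and $LC = \frac{12}{10}\left(\frac{1}{2} - \frac{2}{12}\right) = 0.4$, in agreement with the value in the text.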
Table 1.1 $HG$ of networks with $n \ge 3$
$n_f$ and $LGD$ | $HG$ | $HG$: $n \to \infty$
$n_f = n$ | $1$ | $1$
$1 \le n_f \le n - 2$ | $0.5 < HG \le 1 - \frac{n_f(2n - n_f - 1)}{2n(n-1)}$ | $0.5 < HG < 1$
$(n_f = 0) \cap (n \ge 4) \cap (LGD = n - 1)$ | $\frac{1}{3} + \frac{2}{3(n-1)}$ | $\frac{1}{3} < HG \le \frac{5}{9}$
$(n_f = 0) \cap (1 \le LGD \le n - 2)$ | $\frac{(LGD+1)(LGD+2)}{3n(n-1)} \le HG \le 1 - \frac{2(LGD+1)(LGD-1)}{3n(n-1)}$ | $0 < HG < 1$
Figure 1.6 shows two networks, (a) and (b). Figure 1.6a has five nodes and Fig. 1.6b has seven nodes. Both networks have the same average geodesic distance of two. For example, the average geodesic distance of 1.6(a) is calculated as follows: (1+2+3+4)+(1+2+3)+(1+2)+1 = 2. The LG D of network 1.6(a) is 4 and the LG D 5(5−1) 2
of network 6(b) is 3. For network 1.6(a), H G = 24 = 0.5. For network 1.6(b), H G = 23 ≈ 0.667. Network 1.6(b) is more homogeneous than network 1.6(a). Network 1.6(a) has a chain structure; two nodes at the end have one link whereas three nodes in the middle have two links. Network 1.6(b) has a loop structure, and each node has two links. The H G of network 1.6(b) is less than one although the network seems homogeneous. Network 1.6(b) is not the most homogeneous network (e.g., a clique). The geodesic distance between a node in network 1.6(b) and other nodes is 1, 2, or 3. The H G not only measures the evenness among nodes (e.g., same number of links for all nodes), but also the evenness among links (e.g., same geodesic distances). Let n f represent the number of fully connected nodes in a network of n nodes. A fully connected node has n − 1 links that connect the node to all other nodes in the network (i.e., θ = n − 1). Table 1.1 summarizes the analytical results about H G [11]. Table 1.1 does not include the trivial cases that n = 1 and n = 2. When LG D = 0, H G = 1 by definition for a network without any link. This is not included in Table 1.1. Table 1.1 indicates that H G is largely determined by n f and LG D. When n f = n, the network is a clique and H G = 1; the network is the most homogeneous and resilient. When 1 ≤ n f ≤ n − 2, the network has at least one node that is fully connected; the network is relatively resilient and H G > 0.5. It is noted that n f = n − 1. If n f = n − 1, the only other node must have n − 1 links and therefore n f = n. When n f = 0, the value of H G depends on LG D. When LG D = n − 1, the network has a chain structure and is less homogeneous; H G ≤ 59 . It is noted that when n f = 0 and LG D = n − 1, n = 3. If n = 3, LG D = n − 1 indicates that n f = 1, which contradicts to the condition that n f = 0. When n f = 0 and 1 ≤ LG D ≤ n − 2, the network may have a variety of structures and H G can vary between 0 and 1. 
In particular, a larger LGD indicates a less homogeneous network whose homogeneity has less variation (a smaller range for HG). A smaller LGD indicates that a network’s homogeneity can vary broadly (a larger range for HG) depending on its structure.
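The values quoted for the two networks in Fig. 1.6 can be checked numerically. The sketch below assumes that HG is the ratio of the average geodesic distance to the LGD, as the worked values 2/4 and 2/3 suggest; the function names and the `chain` and `loop` helpers are illustrative and are not taken from the text.

```python
from collections import deque

def geodesic_distances(adj, source):
    """Breadth-first search distances from source in an unweighted network."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def homogeneity(adj):
    """Return (average geodesic distance, LGD, HG) for a connected network.

    Assumption: HG = average geodesic distance / LGD.
    """
    nodes = sorted(adj)
    pair_distances = []
    for u in nodes:
        dist = geodesic_distances(adj, u)
        # Count each unordered pair of nodes once.
        pair_distances += [dist[v] for v in nodes if v > u]
    average = sum(pair_distances) / len(pair_distances)
    lgd = max(pair_distances)
    return average, lgd, average / lgd

def chain(n):
    """Chain of n nodes, as in Fig. 1.6a."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

def loop(n):
    """Loop of n nodes, as in Fig. 1.6b."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

chain_avg, chain_lgd, chain_hg = homogeneity(chain(5))
loop_avg, loop_lgd, loop_hg = homogeneity(loop(7))
print(chain_avg, chain_lgd, chain_hg)
print(loop_avg, loop_lgd, round(loop_hg, 3))
```

Both networks return an average geodesic distance of two, and the different LGD values account for the different HG values.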
1.3 Node and Link Metrics

In addition to network metrics, there are many node and link metrics [18] that can be used to model systems and analyze system properties and dynamics. For instance, if three nodes might be attacked and fail in a network of 100 nodes, there are a total of \(\binom{100}{3} = 161{,}700\) different combinations. To determine the impact of such failures or attacks, a network metric can be calculated repeatedly for 161,700 networks, each of which has 97 nodes. If the total number of nodes in the network increases from 100 to 1000, there are \(\binom{1000}{3} = 166{,}167{,}000\) different ways that three nodes might be attacked and fail. Compared to the network of 100 nodes, the number of combinations increases more than 1000 times when the number of nodes increases 10 times. Using network metrics is computationally infeasible for the analysis of large systems.

A node metric is calculated for a node and indicates the node’s properties such as diffusion speed and scale. Similarly, a link metric is calculated for a link and indicates the link’s properties. The number of times a node metric or link metric is calculated is at most the number of nodes or links in a network, respectively. Using node and link metrics can improve the computational efficiency of the analysis of network science models.

Several node metrics (e.g., [8, 16]) may be used to characterize node properties. In addition to the degree of a node, four other node metrics are defined in Eqs. (1.7)–(1.10). The betweenness of a node \(i\) (\(BN_i\), Eq. (1.7)) measures the frequency with which the node falls on the shortest paths connecting pairs of other nodes. Let \(j'\) be another index of nodes in addition to \(i\) and \(j\). Let \(g_{jj'}\) be the total number of shortest paths connecting nodes \(j\) and \(j'\), and let \(g_{jj'}(i)\) be the number of shortest paths that connect \(j\) and \(j'\) and contain \(i\).

\[ BN_i = \sum_{j=1}^{n} \sum_{j'=j+1}^{n} \frac{g_{jj'}(i)}{g_{jj'}} \tag{1.7} \]
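Equation (1.7) can be evaluated by counting shortest paths with a breadth-first search from every node. The following is a minimal sketch rather than an efficient implementation; it assumes an undirected, unweighted network stored as an adjacency dictionary, skips pairs that include node i itself (following the phrase "pairs of other nodes"), and uses illustrative function names.

```python
from collections import deque

def shortest_path_counts(adj, source):
    """Return (dist, sigma): geodesic distances from source and the number
    of shortest paths from source to every reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                # Every shortest path to u extends to a shortest path to v.
                sigma[v] += sigma[u]
    return dist, sigma

def node_betweenness(adj, i):
    """BN_i of Eq. (1.7), summed over unordered pairs of nodes other than i."""
    nodes = sorted(adj)
    info = {s: shortest_path_counts(adj, s) for s in nodes}
    dist_i, sigma_i = info[i]
    total = 0.0
    for index, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[index + 1:]:
            if i in (s, t) or t not in dist_s or i not in dist_s:
                continue
            # i lies on a shortest s-t path when the distances add up.
            if dist_s[i] + dist_i[t] == dist_s[t]:
                total += sigma_s[i] * sigma_i[t] / sigma_s[t]
    return total

chain5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
bn_chain = [node_betweenness(chain5, i) for i in range(5)]
bn_square_0 = node_betweenness(square, 0)
print(bn_chain, bn_square_0)
```

In the five-node chain the middle node lies on the shortest paths of four pairs, while in the four-node loop each node carries one of the two shortest paths between its two neighbors.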
The closeness of a node measures the degree to which the node is close to all other nodes in the system. The closeness metric can be calculated in two different ways: the reciprocal closeness, \(RC_i\), and the complementary closeness, \(CC_i\). \(RC_i\) (Eq. (1.8)) is the sum of the reciprocals of the geodesic distances \(d_{ij}\), \(d_{ij} > 0\), between \(i\) and other nodes. \(CC_i\) (Eq. (1.9)) is the same as \(RC_i\) if a network has only one (connected) component. If a network has multiple components, \(RC_i\) cannot be calculated. \(CC_i\) is calculated by constructing a complementary network of the original. A complementary network is constructed by removing all links in the original network and adding a link between any two nodes in the original network that are not connected by a link. \(d^{C}_{ij}\) is the geodesic distance between \(i\) and \(j\) in the complementary network. \(n_i\) is the number of nodes in the component to which \(i\) belongs. \(\frac{n-1}{\sum_{j=1}^{n} d_{ij}}\) is the closeness centrality of \(i\).

\[ RC_i = \sum_{j=1}^{n} \frac{1}{d_{ij}} \tag{1.8} \]

\[ CC_i = \left(1 - \frac{n - n_i}{\sum_{j=1}^{n} d^{C}_{ij}}\right) \frac{n-1}{\sum_{j=1}^{n} d_{ij}} \tag{1.9} \]
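As a small illustration of Eq. (1.8) and of the closeness centrality \((n-1)/\sum_j d_{ij}\), the sketch below evaluates both for a five-node chain. It covers only the single-component case, raises an error otherwise, and does not attempt the complementary closeness; the function names are illustrative.

```python
from collections import deque

def geodesic_distances(adj, source):
    """Breadth-first search distances from source in an unweighted network."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def _distances_in_connected_network(adj, i):
    dist = geodesic_distances(adj, i)
    if len(dist) != len(adj):
        raise ValueError("the network has more than one component")
    return dist

def reciprocal_closeness(adj, i):
    """RC_i of Eq. (1.8): sum of 1/d_ij over all other nodes j."""
    dist = _distances_in_connected_network(adj, i)
    return sum(1.0 / d for j, d in dist.items() if j != i)

def closeness_centrality(adj, i):
    """(n - 1) divided by the sum of geodesic distances from i."""
    dist = _distances_in_connected_network(adj, i)
    return (len(adj) - 1) / sum(dist.values())

chain5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
rc_end = reciprocal_closeness(chain5, 0)        # 1 + 1/2 + 1/3 + 1/4
rc_middle = reciprocal_closeness(chain5, 2)     # 1 + 1 + 1/2 + 1/2
closeness_end = closeness_centrality(chain5, 0) # 4 / (1 + 2 + 3 + 4)
print(rc_end, rc_middle, closeness_end)
```

The middle node of the chain has a larger reciprocal closeness than an end node, which matches the intuition that it is closer to the rest of the network.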
The eigenvector centrality (Eq. (1.10)), \(EV_i\), identifies the importance of node \(i\) according to the number and quality of its connections. A node with a large eigenvector centrality is important because it is connected either to more nodes or to nodes that have more links. \(\lambda\) is the largest eigenvalue of the adjacency matrix \(A\) of a network. \(A_{ij}\) is the \(ij\)th element of \(A\); \(A_{ij} = 1\) if \(i\) and \(j\) are neighbors and \(A_{ij} = 0\) otherwise. \(EV = [E_1, E_2, \ldots, E_n]^T\) is the eigenvector corresponding to \(\lambda\).

\[ EV_i = \frac{1}{\lambda} \sum_{j=1}^{n} A_{ij} E_j \tag{1.10} \]
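One common way to obtain \(\lambda\) and the eigenvector numerically is power iteration. The sketch below is an assumption about how Eq. (1.10) might be computed, not the procedure used in the text. It iterates with \(A + I\) instead of \(A\), which keeps the same eigenvectors but avoids the oscillation that plain power iteration shows on bipartite networks such as chains; the function name and tolerances are illustrative.

```python
import math

def eigenvector_centrality(adj_matrix, max_iter=10000, tol=1e-12):
    """Return (lam, ev): largest eigenvalue of A and its unit-length
    eigenvector, for a connected undirected network."""
    n = len(adj_matrix)
    x = [1.0] * n
    for _ in range(max_iter):
        # Multiply by (A + I); the shift leaves the eigenvectors unchanged.
        y = [x[i] + sum(adj_matrix[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    # Rayleigh quotient x^T A x gives the eigenvalue of A (x has unit length).
    ax = [sum(adj_matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
    lam = sum(a * b for a, b in zip(ax, x))
    return lam, x

# Three-node chain 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
lam, ev = eigenvector_centrality(A)
# Right-hand side of Eq. (1.10) for every node.
rhs = [sum(A[i][j] * ev[j] for j in range(3)) / lam for i in range(3)]
print(lam, ev)
```

For this chain the middle node has the largest centrality, and each entry of the returned vector reproduces itself through Eq. (1.10).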
Some of these node metrics may be revised to measure link properties. For example, the node betweenness defined in Eq. (1.7) can be revised to measure link betweenness \(LB_l\), where \(l\) is the index of links. Equation (1.11) defines \(LB_l\). \(g_{jj'}\) in Eq. (1.11) is the total number of shortest paths connecting nodes \(j\) and \(j'\), and \(g_{jj'}(l)\) is the number of shortest paths that connect \(j\) and \(j'\) and contain link \(l\).

\[ LB_l = \sum_{j=1}^{n} \sum_{j'=j+1}^{n} \frac{g_{jj'}(l)}{g_{jj'}} \tag{1.11} \]
1.4 Phase Transition in Networks

System properties and dynamics depend on a variety of underlying network properties. One of the most important network properties is phase transition, which indicates a transition between a fragmented network and a network dominated by a large component. A network may comprise one or more components. Nodes in the same component are connected to each other directly through links or indirectly through other nodes and links. Nodes from different components are disconnected from each other. The number of nodes in a network (\(n\)) or a component (\(n_k\)) is the order of the network or component. The largest component order in a network often indicates the network’s ability to perform tasks. For example, each computer in a local area network is a node; the largest component order indicates the largest number of computers that can collaborate to perform tasks. As another example, a customer in a water distribution system is a node; the largest component order indicates how many customers have access to water.

A large component that dominates a network has the largest component order with at least \(n^{2/3}\) nodes of the network [17, 18]. The order of the large component describes the scale of collaboration and determines system properties. A phase transition may be caused by changes in the network structure such as the average degree, \(\theta\). A phase transition reveals a definitive and transformative relationship between network structure and system properties. There is a phase transition or bond percolation [3, 23, 24, 28] from a fragmented random network for the average degree \(\theta \le 1\) to a random network dominated by a large component for the average degree \(\theta > 1\). When the average degree \(\theta \le 1\), the number of nodes to which there exist paths from an arbitrary node is negligible compared to the total number of nodes \(n\) in the network if \(n\) is large. When the average degree \(\theta > 1\), the percentage of nodes to which there exist paths from an arbitrary node increases rapidly, starting with slope two. For instance, the percentage is 80% when \(\theta = 2\). In the language of bond percolation, a spanning cluster forms when the average degree \(\theta > 1\).

Except for the random network, the phase transition has not been clearly defined for other types of networks. The connectivity, especially how a fragmented network transitions to a well-connected network and vice versa, and under what conditions, is of great importance and requires further investigation. The phase transition or bond percolation of a network significantly affects the system dynamics and properties.
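The figures quoted above (about 80% at an average degree of two, and an initial slope of two) can be reproduced from the self-consistency equation \(S = 1 - e^{-\theta S}\) for the fraction \(S\) of nodes in the large component of a random network. This equation is a standard large-\(n\) result for Erdős–Rényi networks and is not stated in the text, so the sketch below should be read as an illustration under that assumption; the function name is hypothetical.

```python
import math

def large_component_fraction(theta, tol=1e-12):
    """Solve S = 1 - exp(-theta * S) for the nonzero root by bisection.

    Returns 0.0 when theta <= 1, where no large component forms.
    Intended for theta modestly above 1 or larger.
    """
    if theta <= 1.0:
        return 0.0

    def f(s):
        # f(s) = s - 1 + exp(-theta * s), written with expm1 for accuracy.
        return s + math.expm1(-theta * s)

    low, high = 1e-9, 1.0  # f(low) < 0 < f(high) when theta > 1
    while high - low > tol:
        mid = (low + high) / 2.0
        if f(mid) < 0.0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

fraction_at_two = large_component_fraction(2.0)
fraction_below = large_component_fraction(0.8)
initial_slope = large_component_fraction(1.01) / 0.01
print(round(fraction_at_two, 3), fraction_below, round(initial_slope, 2))
```

The fraction at \(\theta = 2\) is close to 0.8, and the ratio computed just above \(\theta = 1\) approaches two, consistent with the description of the transition.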
1.5 Challenges in Network Modeling of Complex Systems

The study of network properties (e.g., phase transition) is a typical approach to understanding system properties and dynamics. The system is modeled as networks whose properties are analyzed to infer system properties and dynamics. In general, there are three approaches to the analysis of systems. First, experiments may be conducted on real-world systems to observe their properties. For instance, Albert et al. [2] studied the Internet and World Wide Web for error and attack tolerance. Results of these studies are limited to specific systems and may not be generalized to others. In the second approach, a network growth model (e.g., preferential attachment [4]) is developed for a system. The growth model is used to generate networks that emulate the system. Experiments are then conducted on these “grown” networks to analyze the system. This approach is essentially the same as the first approach except that the systems being studied are not “naturally grown,” which brings uncertainty to the validity of analyses.
The third approach is to analyze systems using the network model, which is a mathematical or statistical function that describes the topology or structure of a system. Section 1.1 discusses several network models. There are two different ways to analyze systems using network models:

1. Derive analytical results from the network model. For example, Erdős and Rényi [13–15] showed that there is a phase transition from a fragmented random network with the average degree \(\theta \le 1\) to a random network dominated by a large component with the average degree \(\theta > 1\). A random network is described using a degree distribution \(\binom{n-1}{\theta} Pr^{\theta} (1 - Pr)^{n-1-\theta}\), where \(n\) is the total number of nodes, \(\theta\) is node degree, and \(Pr\) is the probability that a link between two nodes exists. The average degree equals \((n - 1)Pr\). Analytical results unambiguously depict a system’s properties, but can be difficult to derive; and
2. Conduct simulation experiments to analyze systems. Since closed-form analytical results are seldom available, simulation experiments are frequently used to examine system properties. The first step in simulation is to generate networks using the network model. For example, the degree distribution may be used to generate networks [19]. Networks generated in simulation are expected to be similar to the original system from which the network model is extracted.
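Both routes can be illustrated for the random network. The sketch below evaluates the binomial degree distribution given above and then generates one network by creating each link independently with probability \(Pr\), which is the usual construction of an Erdős–Rényi network; it is not the generation procedure of [19]. The function names, the seed, and the choice \(n = 1000\) with an average degree of two are illustrative assumptions.

```python
import math
import random
from collections import deque

def degree_pmf(n, pr, theta):
    """Probability that a node has degree theta in a random network."""
    return math.comb(n - 1, theta) * pr**theta * (1 - pr)**(n - 1 - theta)

def random_network(n, pr, rng):
    """Create each of the n(n-1)/2 possible links with probability pr."""
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < pr:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def largest_component_order(adj):
    """Order of the largest component, found by breadth-first search."""
    seen = set()
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        order = 0
        while queue:
            u = queue.popleft()
            order += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, order)
    return best

n = 1000
pr = 2 / (n - 1)  # average degree (n - 1) * pr = 2
pmf = [degree_pmf(n, pr, k) for k in range(n)]
analytical_mean = sum(k * p for k, p in enumerate(pmf))

network = random_network(n, pr, random.Random(7))
simulated_mean = sum(len(neighbors) for neighbors in network.values()) / n
largest = largest_component_order(network)
print(analytical_mean, simulated_mean, largest)
```

The analytical mean degree is fixed by the distribution, whereas the simulated mean degree and the largest component order vary from one generated network to the next, which is one reason simulated networks may differ from the system they are meant to represent.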
There are several challenges in using network science models to study complex real-world systems. The first challenge is to identify appropriate network growth models or network models that describe a system. A systematic approach is necessary for the validation of a network model for a system. The second challenge is to discover network properties and infer system dynamics and properties. Analytical results such as a phase transition in random networks are difficult to obtain. Experiments are frequently used to derive network properties. This leads to the third challenge of network science modeling. In practice, simulated networks are quite different from the underlying system upon which the network model is developed. For example, Jahanpour and Chen [19] used four different degree distributions to model the power grid of the western United States, which includes 4941 nodes that form a connected network. A node in the grid represents a station, a substation, or a transformer. A total of 800 simulated networks were generated using the four different network models. None of the 800 simulated networks is a connected network.

An important objective of data analytics is to extract information from systems. As the amount of data increases, automating data analytics to extract real-time information becomes necessary. Network modeling of complex systems is often the first step to enable data analytics automation. The study of network science models began in the 1950s and had a few breakthroughs. Additional development to address ongoing challenges will advance data analytics automation.
References

1. Albert R, Jeong H, Barabási AL (1999) Internet: diameter of the World-Wide Web. Nature 401(6749):130–131
2. Albert R, Jeong H, Barabási AL (2000) Error and attack tolerance of complex networks. Nature 406(6794):378–382
3. Angeles Serrano M, De Los Rios P (2007) Interfaces and the edge percolation map of random directed networks. Phys Rev E Stat Nonlinear Soft Matter Phys 76(5):56–121
4. Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
5. Barabási AL (2002) Linked: the new science of networks. Perseus Publishing, Cambridge, MA
6. Bianconi G, Barabási AL (2001) Competition and multiscaling in evolving networks. Europhys Lett 54(4):436–442
7. Bianconi G, Barabási AL (2001) Bose-Einstein condensation in complex networks. Phys Rev Lett 86(24):5632–5635
8. Borgatti SP (2006) Identifying sets of key players in a social network. Comput Math Organ Theory 12(14):21–34
9. Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener J (2000) Graph structure in the Web. Comput Netw 33(1):309–320
10. Chen XW, Landry SJ, Nof SY (2011) A framework of enroute air traffic conflict detection and resolution through complex network analysis. Comput Ind 62:787–794
11. Chen X (2016) Critical nodes identification in complex systems. Complex Intell Syst 1:37–56
12. Chung FRK (1997) Spectral graph theory. American Mathematical Society
13. Erdős P, Rényi A (1959) On random graphs. Publicationes Mathematicae Debrecen 6:290–291 (see Newman et al. 2006)
14. Erdős P, Rényi A (1960) On the evolution of random graphs. Magyar Tud Akad Mat Kutato Int Kozl 5:17–61 (see Newman et al. 2006)
15. Erdős P, Rényi A (1961) On the strength of connectedness of a random graph. Acta Mathematica Academiae Scientiarum Hungaricae 12:261–267 (see Newman et al. 2006)
16. Freeman LC (1978) Centrality in social network conceptual clarification. Soc Netw 1:215–239
17. Jackson M (2008) Social and economic networks. Princeton University Press, New Jersey
18. Jahanpour E, Chen X (2013) Analysis of complex network performance and heuristic node removal strategies. Commun Nonlinear Sci Numer Simul 18:3458–3468
19. Jahanpour E, Chen X (2014) Improving accuracy of complex network modeling using expectation-maximization. Discontin Nonlinearity Complex 3:169–221
20. Jeong H (2003) Complex scale-free networks. Physica A 321:226–237
21. Kaluza P, Kölzsch A, Gastner MT, Blasius B (2010) The complex network of global cargo ship movements. J R Soc Interface 7:1093–1103
22. Krebs V (2002) Mapping networks of terrorist cells. Connect 24(3):43–52
23. Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256
24. Newman MEJ, Barabási AL, Watts DJ (2006) The structure and dynamics of networks. Princeton University Press, Princeton, NJ
25. De Solla Price DJ (1965) Networks of scientific papers. Science 149:510–515 (see Newman et al. 2006)
26. Shalizi C (2006) Methods and techniques of complex systems science: an overview, Chapter 1. In: Deisboeck TS, Yasha Kresh J (eds) Complex systems science in biomedicine. Springer, NY, pp 33–114
27. Sinclair A, Jerrum M (1989) Approximate counting, uniform generation and rapidly mixing Markov chains. Inf Comput 82(1):93–133
28. Solomonoff R, Rapoport A (1951) Connectivity of random nets. Bull Math Biophys 13:107–117 (see Newman et al. 2006)
29. Watts DJ, Strogatz SH (1998) Collective dynamics of ‘small-world’ networks. Nature 393(6684):440–442
30. Weeks MR, Clair S, Borgatti SP, Radda K (2002) Social networks of drug users in high-risk sites: finding the connections. AIDS Behav 6(2):193–206
Chapter 2
Interdependent Critical Infrastructures
Critical infrastructures such as highways, power grids, storm water systems, dams, and locks are ubiquitous in our daily life. An infrastructure’s resilience to failures and attacks is of great importance to the safety, security, and well-being of society. Many interdependent critical infrastructures (ICIs) are vulnerable to disruptions caused by natural disasters and terrorist violence. ICIs can be modeled as interconnected networks of nodes and links, and their collective resilience is the ability to continue functioning when certain nodes and links are disabled by failures or attacks. A resilient ICI has better performance and ICIs must collaborate to improve their resilience. Answers to four questions define the resilience of ICIs:

1. How are disruptions of an infrastructure related to each other and what is their compounding effect? A wide range of accidents and incidents may cause disruptions to an infrastructure. The first step of analyzing ICIs’ collective resilience is to understand each infrastructure’s output, possible disruptions and their interdependencies, and the potential impact of disruptions and their compounding effect.
2. What protection measures may reduce the impact of disruptions and how do these measures interplay in response to disruptions? Since disruptions may be interdependent and have a compounding effect on an infrastructure’s performance, it is imperative to determine how different protection and preparedness measures respond to disruptions and the optimal level of investment in protection that maximizes the infrastructure’s performance. Insufficient investment in protection may lead to catastrophic events whereas overly high investment is not sustainable in the long term.
3. How are infrastructures interdependent on each other and what are critical components that must be protected to optimize ICIs’ collective performance? Critical infrastructures share or are connected through common features and functions. For instance, they are located in the same geographic region or depend on each other to function. These features and functions lead to interdependencies among infrastructures and may be extracted to analyze their collective resilience. One approach is to overlay ICIs and identify components whose failures have the most significant impact on the performance of ICIs. These critical components are identified based on their structural importance, likelihood of failures, and impact on other components in ICIs.
4. How do ICIs collaborate to improve their resilience? Collective resilience of ICIs is a complex phenomenon and may be improved through a multifaceted approach. Operations and management of ICIs are influenced by the interplay of historical, economic, and societal factors. Many ICIs operate independently from each other although they may be geographically or functionally interdependent. A collaboration protocol enables distributed yet collaborative decision making and allows ICIs to collaborate and improve their collective resilience to failures and attacks. Collaborative decision making is necessary not only for ICIs that traditionally operate independently but are forced to respond to disruptions collectively, but also for ICIs that are subject to uncertainties in their routine operations.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 X. W. Chen, Network Science Models for Data Analytics Automation, Automation, Collaboration, & E-Services 9, https://doi.org/10.1007/978-3-030-96470-2_2

Fig. 2.1 Three steps of determining resilience of ICIs
The data-analytics process (Fig. 2.1) of developing answers to the four questions starts with an expert system that captures human and social resilience and interdependencies of ICIs. The expert system obtains knowledge from stakeholders in government agencies, industry, and academia. The second step is to model and investigate interdependencies among ICIs using portfolio theory, geographic information systems (GIS) analytics, and network science models. The third step applies distributed and collaborative decision making to improve the ICIs’ collective resilience.
Natural disasters have a dual-tiered impact on local populations. First, there is a direct impact on residential and commercial activity as physical structures are hit by severe wind, rain, or ground disasters [4, 23, 49]. People are displaced and head to nearby areas with better employment opportunities [7, 38]. At the same time, those who remain behind during rebuilding and reclamation efforts may earn more. For example, individuals who remained in disaster-impacted counties in southeastern US states realized wage gains of 2–10% depending on the industrial sector [5, 6, 22, 50]. Similar results were found for suburban St. Louis after the 1993 Great Flood [55]. The second major impact is an indirect one. As an infrastructure is damaged, local governments are called into action to restore electricity, repair damaged roadways, and ensure safe drinking water. But the loss of economic activity can have major repercussions on taxable revenues for those local governments [4, 46]. The lack of resources drastically slows down recovery efforts. This is especially a major problem in regions that do not face frequent disasters and thus suffer from a lack of readiness [1].

In addition to natural disasters, many manmade disasters have profound social impact [33]. For example, the Colorado gold mine spill [53] triggered an uncontrolled rapid release of approximately three million gallons of acid mine water and contaminated downstream water flows all the way to the State of Utah over a 10-day period. In Flint, Michigan, 8000 children have been exposed to lead and are at risk for lifelong health problems as a result [41]. The impact of infrastructure disruptions on humans and society must be included in the analysis of ICIs’ resilience.

The next four sections describe the three steps of the data-analytics process in Fig. 2.1.
2.1 Expert System Modeling and Analysis of Human and Social Resilience

The analysis of ICIs first collects the geospatial data of each ICI and conducts initial analysis, and then gathers expert knowledge on the characteristics of each interdependent network. Local experts from each interdependent network may be recruited to join an advisory focus group to aid in the creation of critical information about ICIs, including infrastructure output, possible protection and preparedness measures, impact of potential disruptions, and interdependencies among ICIs. The impact of potential disruptions includes both explicit items (e.g., property damage and traffic delay cost) and often implicit and/or long-term effects such as labor market change, taxation, and relocation, which are part of human and social resilience. This expert systems approach facilitates the inclusion of implicit system knowledge which cannot be accessed through document analyses or standard geospatial data. For example, experts may have informal knowledge based on experience working in the region that would not be identified through geographic relationships within and between interdependent networks. Knowledge gathered from experts can be incorporated into an expert systems model [47], which may be developed using statistical sampling or likelihood rules for predictive purposes. The model minimizes the demands placed on GIS systems and can be assessed for accuracy when presence data become available. Expert systems have been successful in public interest research projects (e.g., [44, 45]).

From a human and social resilience perspective, an expert system captures the impact of a disruption to a critical component of the local infrastructure (and any interdependent links across infrastructures) such that the following can be identified: (a) the amount of allocated resources from local, state, and federal government agencies, and the speed by which those resources are distributed when a disruption occurs; (b) the number of displaced individuals, both temporary and long term; (c) the total value of damages, both insured and uninsured; and (d) the speed with which local commercial areas can reopen. This information is used to assess the net economic impact of a disaster on a local area in one of two ways. The first is a simple difference-in-difference (DD) estimation, in which a single suitable control is identified and serves to highlight the true effectiveness of disaster-response policies. The second extends the findings beyond a localized setting through a generalized difference-in-difference (GDD) estimation [6], in which multiple controls and treatments are used. In the GDD, members of the control groups in one period may become treatment groups in a subsequent period. While the GDD approach provides better overall policy implications, either the DD or the GDD is a suitable option both for linking assorted disruptions to infrastructures and for isolating the overall net impact of any one disruptive event.
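The simple DD estimation described above can be sketched with a single treated area and a single control area. The wage figures below are hypothetical and serve only to show the arithmetic; a practical analysis would typically use a regression with additional controls, and the GDD with multiple controls and treatments is not shown.

```python
from statistics import mean

def difference_in_difference(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical weekly wages before and after a disaster.
treated_pre = [500, 520, 480]    # disaster-impacted area, before
treated_post = [540, 560, 520]   # disaster-impacted area, after
control_pre = [510, 490, 500]    # comparable unaffected area, before
control_post = [520, 500, 510]   # comparable unaffected area, after

dd_estimate = difference_in_difference(
    treated_pre, treated_post, control_pre, control_post
)
print(dd_estimate)
```

In this illustration wages rise by 40 in the impacted area and by 10 in the control area, so the estimated net effect attributed to the disaster is 30.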
2.2 Interdependencies and Critical Components

Interdependencies between ICIs are analyzed in two parallel approaches that converge and determine the optimal level and allocation of protection resources to maximize ICIs’ collective performance. Interdependencies among disruptions and protection are investigated using portfolio theory to identify the optimal level of protection for each individual ICI. Meanwhile, interdependencies among infrastructures are analyzed using GIS tools and networked constraints to identify critical components across multiple ICIs. Outcomes of both approaches are integrated and further analyzed to determine the level of protection for critical components and how to optimize ICIs’ collective performance.

The general principle of portfolio theory is that an investor can maximize outcome from a mix of uncorrelated investments that match the investor’s risk propensity [15, 36]. For ICIs, different levels of protection may be applied to reduce the impact of exogenous (e.g., flooding) and endogenous (e.g., scheduled road maintenance) events. Let Prot be the amount of investment in protection and preparedness as a portion of an infrastructure’s output. For example, the California Independent System Operator (ISO) delivers over 260 million megawatt-hours of electricity each year [8] with the average price around $45 per megawatt-hour [9], which is equivalent to $11.7 billion in revenue. If Prot = 0.001, the amount of investment in protection and preparedness is $11.7 million. 0 < Prot ≤ 1. Let Perf be the performance of protection and preparedness (e.g., providing backup power for traffic lights). Perf is the portion of output that continues functioning given the impact of disruptions and the level of protection and preparedness. Let IMP be the impact of a disruption as a portion of an infrastructure’s output. 0 < IMP ≤ 1. Equation (2.1) calculates the performance of protection and preparedness. Eff is the effectiveness of protection and preparedness actions. Eff > 0. The larger Eff is, the more effective the protection and preparedness measures of an infrastructure are. Eff = 1 indicates that the portion of output protected from a disruption is the same as the portion of output invested in protection and preparedness. Eff > 1 indicates that the portion of output protected from a disruption is greater than the portion of output invested in protection and preparedness. 0 < Eff < 1 indicates that the investment is greater than the outcome and protection and preparedness measures are less effective.

\[ \mathit{Perf} = (1 - \mathit{Prot}) \left(1 - \frac{\mathit{IMP}}{\mathit{Prot} \times \mathit{Eff}^{\alpha}}\right) \tag{2.1} \]
α in Eq. (2.1) indicates the criticality of protection and preparedness, which is directly related to the criticality of the infrastructure. The value of α determines whether the effect of protection and preparedness measures may be multiplied or may diminish in response to a disruption. For example, power grids are prone to failures in winter storms. There are a few protection and preparedness measures, for example, using underground power lines (costly but effective) or increasing the number of maintenance personnel who can respond and repair fallen power lines. In a major storm, e.g., the January 2016 storm along the East Coast which caused at least 240,000 homes and businesses to lose power [57], use of underground power lines would significantly reduce the damage and protect most of the power grid output. α for underground power lines is greater than one. Additional maintenance personnel would not be a critical protection measure in a major storm; their effect diminishes because of road and weather conditions and α is likely to be less than one. α may change in response to different disruptions. For example, in a moderate storm in which maintenance personnel can quickly reach various locations, having additional maintenance personnel would be a critical protection measure and α is expected to be around one.

\(1 - \frac{\mathit{IMP}}{\mathit{Prot} \times \mathit{Eff}^{\alpha}}\) is the portion of output protected from a disruption whose potential impact is represented by IMP. To maximize Perf, the first-order derivative with respect to Prot (Eq. (2.2)) is calculated. The level of protection and preparedness that maximizes Perf is \(\mathit{Prot}^{*} = \sqrt{\mathit{IMP} / \mathit{Eff}^{\alpha}}\), which is the solution to Eq. (2.2) when it is set equal to zero. The second-order derivative with respect to Prot, \(\frac{d^{2} \mathit{Perf}}{d \mathit{Prot}^{2}} = -\frac{2\,\mathit{IMP}}{\mathit{Prot}^{3} \mathit{Eff}^{\alpha}} < 0\) when \(\mathit{Prot} = \mathit{Prot}^{*}\). The maximum infrastructure performance is \(\left(1 - \sqrt{\mathit{IMP} / \mathit{Eff}^{\alpha}}\right)^{2}\).

\[ \frac{d \mathit{Perf}}{d \mathit{Prot}} = \frac{\mathit{IMP}}{\mathit{Prot}^{2} \times \mathit{Eff}^{\alpha}} - 1 = 0 \tag{2.2} \]
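The optimum implied by Eqs. (2.1) and (2.2) can be checked numerically. In the sketch below the parameter values (IMP = 0.04, Eff = 4, α = 1) are illustrative assumptions chosen so that the optimal protection level is 0.1; the function names are hypothetical.

```python
import math

def performance(prot, imp, eff, alpha):
    """Perf of Eq. (2.1) for a single disruption."""
    return (1.0 - prot) * (1.0 - imp / (prot * eff**alpha))

def optimal_protection(imp, eff, alpha):
    """Prot* obtained by setting Eq. (2.2) equal to zero."""
    return math.sqrt(imp / eff**alpha)

imp, eff, alpha = 0.04, 4.0, 1.0   # illustrative values only
prot_star = optimal_protection(imp, eff, alpha)
best_perf = performance(prot_star, imp, eff, alpha)

# Brute-force check on a grid of protection levels in (0, 1).
grid = [k / 10000 for k in range(1, 10000)]
grid_best = max(grid, key=lambda p: performance(p, imp, eff, alpha))
print(prot_star, best_perf, grid_best)
```

With these values the closed-form optimum and the grid search agree, and the maximum performance equals \((1 - \mathit{Prot}^{*})^{2}\).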
Many disruptions are interdependent and may occur simultaneously; their compounding impact on an infrastructure needs to be studied holistically. For example, rainstorms may cause flooding, such as the flooding that occurred southwest of St. Louis, Missouri in December 2015 and January 2016. Heavy rain caused flooding along the Meramec River, where the river reached more than 4 feet above the previous record at some spots [3]. On the other hand, protection and preparedness measures may protect an infrastructure from multiple disruptions. Let \(i\) be the index of potential impact of disruptions, \(i = 1, \ldots, m\). \(\mathit{IMP}_i\) is the collective impact of a combination of interdependent disruptions. Let \(\mathit{Prot}_i\) be a set of protection and preparedness measures which may be applied to protect an infrastructure from \(\mathit{IMP}_i\). The measures in \(\mathit{Prot}_i\) are interdependent because of their compounding effect on an infrastructure. For example, a backup generator in a power grid not only provides electricity when the main generators stop working due to natural disasters, but also provides additional electricity during peak times. Equation (2.3) calculates the performance of an infrastructure given multiple sets of disruptions and their corresponding protection and preparedness measures.

\[ \mathit{Perf} = \left(1 - \sum_{i} \mathit{Prot}_i\right) \prod_{i} \left(1 - \frac{\mathit{IMP}_i}{\mathit{Prot}_i \times \mathit{Eff}_i^{\alpha_i}}\right) \tag{2.3} \]
If a set of disruptions causes a cascading failure (i.e., \(1 - \frac{\mathit{IMP}_i}{\mathit{Prot}_i \times \mathit{Eff}_i^{\alpha_i}} = 0\)), the infrastructure loses all its output and Perf = 0. In this case, allocating resources to protect the infrastructure from other disruptions does not maximize its performance since Perf = 0. The goal is to optimally allocate protection resources to prevent cascading failures from occurring and to maximize the infrastructure’s performance. In Eq. (2.3), Perf is a function of \(m\) variables \(\mathit{Prot}_1, \mathit{Prot}_2, \ldots, \mathit{Prot}_m\). The first-order partial derivative with respect to each variable is calculated and set equal to zero to find the optimal level of protection and preparedness that maximizes Perf. For example, Eq. (2.4) is the first-order partial derivative with respect to \(\mathit{Prot}_1\). Assuming that cascading failures are prevented by optimally allocating protection resources, i.e., \(1 - \frac{\mathit{IMP}_i}{\mathit{Prot}_i \times \mathit{Eff}_i^{\alpha_i}} \ne 0\) for all \(i\), Eq. (2.4) is set equal to zero and rewritten as Eq. (2.5). Finding the optimal level of protection and preparedness is therefore equivalent to solving the system of \(m\) nonlinear equations in Eq. (2.6).

\[ \frac{\partial \mathit{Perf}}{\partial \mathit{Prot}_1} = \frac{\mathit{IMP}_1}{\mathit{Prot}_1^{2} \times \mathit{Eff}_1^{\alpha_1}} \left[1 - \left(\sum_{i, i \ne 1} \mathit{Prot}_i + \frac{\mathit{Prot}_1^{2} \times \mathit{Eff}_1^{\alpha_1}}{\mathit{IMP}_1}\right)\right] \prod_{i, i \ne 1} \left(1 - \frac{\mathit{IMP}_i}{\mathit{Prot}_i \times \mathit{Eff}_i^{\alpha_i}}\right) \tag{2.4} \]

\[ \sum_{i, i \ne 1} \mathit{Prot}_i + \frac{\mathit{Prot}_1^{2} \times \mathit{Eff}_1^{\alpha_1}}{\mathit{IMP}_1} = 1 \tag{2.5} \]

\[ \sum_{i, i \ne j} \mathit{Prot}_i + \frac{\mathit{Prot}_j^{2} \times \mathit{Eff}_j^{\alpha_j}}{\mathit{IMP}_j} = 1, \quad j = 1, 2, \ldots, m \tag{2.6} \]
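For small \(m\), the system in Eq. (2.6) can be solved numerically. The sketch below rearranges each equation as \(\mathit{Prot}_j = \sqrt{(1 - \sum_{i \ne j} \mathit{Prot}_i)\,\mathit{IMP}_j / \mathit{Eff}_j^{\alpha_j}}\) and iterates until the values stop changing. This fixed-point scheme is an assumption made for illustration and is not guaranteed to converge for arbitrary parameters; the two-disruption parameter values and the function names are likewise hypothetical.

```python
from math import prod

def performance(prot, imp, eff, alpha):
    """Perf of Eq. (2.3) for m sets of disruptions."""
    protected = prod(
        1.0 - imp[i] / (prot[i] * eff[i] ** alpha[i]) for i in range(len(prot))
    )
    return (1.0 - sum(prot)) * protected

def solve_protection(imp, eff, alpha, start=0.1, tol=1e-14, max_iter=1000):
    """Fixed-point iteration on Eq. (2.6), updating one Prot_j at a time."""
    m = len(imp)
    prot = [start] * m
    for _ in range(max_iter):
        largest_change = 0.0
        for j in range(m):
            others = sum(prot) - prot[j]
            new_value = ((1.0 - others) * imp[j] / eff[j] ** alpha[j]) ** 0.5
            largest_change = max(largest_change, abs(new_value - prot[j]))
            prot[j] = new_value
        if largest_change < tol:
            break
    return prot

# Illustrative parameters for m = 2 sets of disruptions.
imp = [0.04, 0.02]
eff = [4.0, 1.0]
alpha = [1.0, 1.0]

prot = solve_protection(imp, eff, alpha)
# Left-hand side of Eq. (2.6) minus one, for each j.
residuals = [
    sum(prot) - prot[j] + prot[j] ** 2 * eff[j] ** alpha[j] / imp[j] - 1.0
    for j in range(len(prot))
]
best_perf = performance(prot, imp, eff, alpha)
print(prot, residuals, best_perf)
```

The residuals are close to zero at the returned allocation, and evaluating Eq. (2.3) at nearby allocations gives a lower performance, which is consistent with a local maximum.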
For many ICIs, the total amount of investment in protection and preparedness is fixed, i.e., $\mathit{Prot}_{total}$, the portion of an infrastructure's output invested in protection and preparedness, is a constant, where $\mathit{Prot}_{total} = \sum_i \mathit{Prot}_i$. Equation (2.6) may be rewritten as Eq. (2.7), and Eq. (2.8) is the solution to Eq. (2.7). The second order partial derivatives are calculated to obtain the Hessian matrix. For example, Eq. (2.9) is the second order partial derivative with respect to $\mathit{Prot}_1$ and a component in the Hessian matrix. If the Hessian is negative definite at the solution (Eq. (2.8)), then $\mathit{Perf}$ attains a local maximum.
\[
\mathit{Prot}_{total} - \mathit{Prot}_j + \frac{\mathit{Prot}_j^2 \times \mathit{Eff}_j^{\alpha_j}}{\mathit{IMP}_j} = 1, \quad j = 1, 2, \ldots, m \tag{2.7}
\]
\[
\mathit{Prot}_j = \frac{\mathit{IMP}_j}{2 \mathit{Eff}_j^{\alpha_j}} + \sqrt{ \frac{\mathit{IMP}_j^2}{4 \mathit{Eff}_j^{2\alpha_j}} + \frac{\mathit{IMP}_j}{\mathit{Eff}_j^{\alpha_j}} \left( 1 - \mathit{Prot}_{total} \right) }, \quad j = 1, 2, \ldots, m \tag{2.8}
\]
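\[
\frac{\partial^2 \mathit{Perf}}{\partial \mathit{Prot}_1^2} = \frac{\partial}{\partial \mathit{Prot}_1} \left\{ \left[ 1 - \left( \sum_{i,\, i \neq 1} \mathit{Prot}_i + \frac{\mathit{Prot}_1^2 \times \mathit{Eff}_1^{\alpha_1}}{\mathit{IMP}_1} \right) \right] \prod_{i,\, i \neq 1} \left( 1 - \frac{\mathit{IMP}_i}{\mathit{Prot}_i \times \mathit{Eff}_i^{\alpha_i}} \right) \right\} = - \frac{2 \mathit{Prot}_1 \times \mathit{Eff}_1^{\alpha_1}}{\mathit{IMP}_1} \prod_{i,\, i \neq 1} \left( 1 - \frac{\mathit{IMP}_i}{\mathit{Prot}_i \times \mathit{Eff}_i^{\alpha_i}} \right) \tag{2.9}
\]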
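The closed-form solution in Eq. (2.8) can be checked numerically. The sketch below evaluates Eq. (2.8) for a few protection measures and substitutes the result back into Eq. (2.7). It treats $\mathit{Prot}_{total}$ as a given constant, as Eq. (2.7) does. All parameter values and function names are hypothetical and serve only to exercise the formulas.

```python
import math


def prot_eq_2_8(imp, eff, alpha, prot_total):
    """Evaluate Eq. (2.8) for one protection measure j."""
    k = imp / eff ** alpha  # IMP_j / Eff_j^alpha_j
    disc = k * k / 4.0 + k * (1.0 - prot_total)  # term under the square root
    if disc < 0.0:
        raise ValueError("no real solution for these parameters")
    return k / 2.0 + math.sqrt(disc)


def residual_eq_2_7(prot, imp, eff, alpha, prot_total):
    """Left-hand side of Eq. (2.7) minus 1; zero when Eq. (2.7) holds."""
    return prot_total - prot + prot ** 2 * eff ** alpha / imp - 1.0


# Hypothetical (IMP_j, Eff_j, alpha_j) triples and a hypothetical fixed budget.
measures = [(0.02, 2.0, 1.0), (0.05, 1.5, 2.0), (0.01, 3.0, 0.5)]
prot_total = 0.3

solutions = [prot_eq_2_8(imp, eff, a, prot_total) for imp, eff, a in measures]
residuals = [
    residual_eq_2_7(p, imp, eff, a, prot_total)
    for p, (imp, eff, a) in zip(solutions, measures)
]
for p, r in zip(solutions, residuals):
    print(f"Prot_j = {p:.6f}, residual of Eq. (2.7) = {r:.2e}")
```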
2.3 Networked Constraints
In addition to interdependencies among disruptions and protection measures, a major challenge in modeling ICIs is to capture and analyze interdependencies among infrastructures. Networked constraints are used to represent sophisticated relations in domain knowledge from multiple ICIs as large-scale interacting constraints. Data collected from ICIs are overlaid in ArcGIS to represent and visualize networked constraints. Advanced algorithms must be developed to reason on multi-scale data extracted from ICIs and identify critical components by utilizing the topological information in the structure. Networked constraints represent knowledge and relations. They were originally designed as representations for solving constraint satisfaction problems [34]. Constraint networks are capable of representing both quantitative and qualitative knowledge
Fig. 2.2 Physical process layer of a smart grid represented in ArcGIS
as continuous and binary variables and constraints. If the structure of interactive constraints is correctly analyzed, critical components in multiple ICIs can be tracked and identified. To date, preliminary development of disruption prevention and detection in complex networks has successfully demonstrated the use of advanced constraint networks, which have single, deterministic links, and inclusive, exclusive, or independent relations between constraints [11]. Generic and non-deterministic relations need to be explored for dynamic representation of knowledge through constraints. Each constraint in a constraint network is a node in ICIs and stores variables representing quantitative and qualitative knowledge in an ICI. The non-deterministic relations between constraints enable dynamic knowledge integration and updating through contemporizing the structure and weight of links. Constraint networks support reasoning in multiple granularities. Most engineered infrastructures can be viewed as having a multi-layered architecture as the infrastructures allocate different responsibilities into different layers. A normal web system, for example, has at least presentation, application, and data layers. In many scientific fields, the behavior of subjects can be represented in different models when observation granularity varies. The multi-granularity feature also enables multi-scale data analysis of ICIs. ICIs (e.g., smart grid) can be represented by a special case of constraint networks (e.g.,
complex centralized network [12]). Figure 2.2 shows the physical process layer of a smart grid represented in ArcGIS [21]. By using interdependencies, shared variables, and interactions between constraints, constraint networks can be constructed to allow reasoning with different scales of data collected from multiple ICIs and to identify critical components. Many resilience metrics (e.g., [13, 18]) and frameworks (e.g., [32]) may be used to model ICIs. For example, link conductance (see Chap. 1 and [14, 19, 20, 37, 48]) is a metric that measures infrastructure resilience when links fail or are cut. For some of those resilience metrics, polynomial-time algorithms have been developed to optimize resilience by either maximizing or minimizing the metrics [13, 19, 37].
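To make the idea of a cut-based resilience metric concrete, the following sketch computes the conductance of a node subset using the standard graph-theoretic definition: the number of links crossing the cut divided by the smaller of the two side volumes (sums of node degrees). This definition is an assumption made for illustration; the link conductance metric of Chap. 1 may be defined differently. The toy network is hypothetical.

```python
def conductance(adj, subset):
    """Conductance of `subset` in an undirected graph given as {node: set(neighbors)}."""
    s = set(subset)
    # Links with exactly one endpoint inside the subset.
    cut = sum(1 for u in s for v in adj[u] if v not in s)
    vol_s = sum(len(adj[u]) for u in s)
    vol_rest = sum(len(adj[u]) for u in adj if u not in s)
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


# Hypothetical toy network: two triangles joined by the single link c-d.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}
phi = conductance(adj, {"a", "b", "c"})
print("conductance of {a, b, c}:", phi)
```

A low value for a subset indicates that cutting only a few links would separate it from the rest of the network, which is why such cuts are natural candidates for reinforcement.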
2.4 Distributed Collaborative Decision Making
Many ICIs do not collaborate although they are interdependent. For example, wind farms in neighboring communities often operate independently. There is a lack of incentives and mechanisms for ICIs to form long-term, beneficial collaboration during routine operations and in response to disruptions. Figure 2.3 shows a collaboration protocol that guides ICIs to collaborate and improve their collective resilience.
Fig. 2.3 Collaboration protocol for ICIs community
Stakeholders responsible for each ICI determine the level of investment in protection and preparedness and how and which components in the ICI are protected. An ICI may decide to join or remain in the community if collaborating with other ICIs improves its resilience. On the other hand, an ICI that is already part of the community may decide to leave the community if collaborating with other ICIs is detrimental to its performance. Each ICI makes its decisions based on infrastructure output, possible disruptions, and effectiveness and criticality of protection measures. These distributed decisions may be able to maximize the performance of an individual ICI given its limited protection resources but are unlikely to be able to maximize the overall performance of multiple ICIs. The collaboration protocol enables distributed yet collaborative decision making. An ICI has the incentive to join the ICIs community because its performance and resilience to disruptions may be further improved with resources from other ICIs in the community. When an ICI requests to join the community, it submits to the ICIs community information about its output, resources, and estimated performance. The ICIs community integrates information from multiple ICIs and identifies the optimal level of protection and preparedness for each ICI to maximize their collective performance. In addition, the ICIs community determines critical components across multiple ICIs and allocates resources to protect certain critical components to optimize the effect of protection. When an ICI requests to leave the community, a similar evaluation process is conducted to advise the ICI and the rest of the community about the impact. The community can preemptively request an ICI to leave the community if the ICI does not have an acceptable level of performance and is unable to improve it. The community can also invite an ICI to join. 
The collaboration protocol must support distributed, collaborative decision making in response to disruptions. Several disruption scenarios, including storm, flooding, earthquake, tornado, and their interactions and uncertainties are the input to a high-level simulation model, which aims to (a) validate that collaboration among ICIs improves their collective performance; and (b) conduct sensitivity analysis to determine how the optimal level of protection and preparedness may change if certain parameters vary. The simulation model may be fine-tuned in iterations until the model output is consistent with the domain experts’ prediction. The output of the simulation model is used to form recommendations for stakeholders on how they might collaborate to improve ICIs’ collective resilience. An application example of three ICIs, including an electric power grid, a multimodal transportation network, and water-related infrastructures is illustrated in the next section.
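The join, leave, and removal rules of the collaboration protocol described in this section can be sketched as two small decision functions, one for an individual ICI and one for the community. The performance estimates, the acceptance threshold, the invitation rule, and all names are hypothetical; the protocol in Fig. 2.3 is richer than this sketch.

```python
from dataclasses import dataclass


@dataclass
class ICI:
    name: str
    perf_alone: float          # estimated performance without collaboration
    perf_in_community: float   # estimated performance when collaborating
    can_improve: bool = True   # whether the ICI is able to raise its performance
    member: bool = False       # whether the ICI is currently in the community


def ici_decision(ici):
    """Decision of an individual ICI, based on its own estimated performance."""
    if ici.perf_in_community > ici.perf_alone:
        return "remain" if ici.member else "request to join"
    if ici.perf_in_community < ici.perf_alone:
        return "request to leave" if ici.member else "stay out"
    return "no change"


def community_decision(ici, min_acceptable_perf):
    """Decision of the community about one ICI."""
    if ici.member:
        # Removal only when performance is unacceptable and cannot be improved.
        if ici.perf_in_community < min_acceptable_perf and not ici.can_improve:
            return "request ICI to leave"
        return "keep"
    # Assumed invitation rule: invite non-members expected to perform acceptably.
    if ici.perf_in_community >= min_acceptable_perf:
        return "invite"
    return "no action"


grid = ICI("power grid", perf_alone=0.70, perf_in_community=0.82)
water = ICI("water system", perf_alone=0.60, perf_in_community=0.40,
            can_improve=False, member=True)
print(ici_decision(grid), "|", community_decision(grid, 0.5))
print(ici_decision(water), "|", community_decision(water, 0.5))
```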
2.5 An Example of Three ICIs
Many ICIs, such as multimodal transportation networks and electric power grids, are highly connected; they grow larger and become more connected with time. Failures in one component may have a significant impact on other components across a large
geographic region. Sustainability and resilience depend on social, economic, environmental, and geographical factors. Natural and manmade disasters often occur unexpectedly and are difficult to predict. Multiple ICIs may fail simultaneously due to tornadoes, floods, earthquakes, oil spills, terrorist attacks, and other disasters. Numerous accidents and incidents occur frequently and may cause catastrophic events if appropriate protection measures are not implemented. The following subsections describe three ICIs and their interdependencies.
2.5.1 Smart Electric Power Grid
The North American Power Grid is arguably the most complex of the critical infrastructures. The energy delivery infrastructure is central to the smooth operation of many economic, social, and societal aspects of day-to-day life, and to other critical infrastructures. Widespread unplanned power outages known as blackouts best exemplify this fact. Some recent and prominent examples include the 2003 Northeast blackout [51], the 2011 San Diego blackout [40], and the 2012 Indian blackout [39]. The quantitative measure of “smoothness” is formally termed power grid reliability. Unlike other critical infrastructures, however, the national power grid is not owned, operated, or controlled by a single authoritative entity, even at geographically local levels. In the U.S., there are seven Regional Transmission Organization (RTO) and Independent System Operator (ISO) entities (Fig. 2.4), over 2000 public and investor-owned utilities, nearly 900 cooperatives, over 150 power marketers, nine
Fig. 2.4 North American Electric Reliability Corporation (NERC) regions and balancing authorities as of July 12, 2012 [2]
federal power agencies [2], and over 100 balancing authorities spread across eight electric reliability regions. A widespread grid overhaul to improve its reliability is a complex undertaking due to many logistical hurdles. Energy demand is balanced against its generation on a real-time basis; unlike water or gas, bulk electricity cannot be generated, stored, or released on demand in any practical and feasible way. Instead, energy generation is scheduled on a daily/hourly basis using historical usage statistics. Sudden and substantial deviations from planned/scheduled energy usage can force the power grid to operate under stressed boundary conditions. There are inevitable interdependencies between the power grid and other critical infrastructures that have many energy-dependent systems and components. Lighting systems, movable bridges across waterways, and traffic control systems (color lights, timers, and monitoring and sensing equipment) in transportation networks, as well as drainage pumps and other electricity-driven control systems in storm water systems, all depend on a reliable supply of electric energy for proper operation. Conversely, a paralyzed transportation system prevents necessary raw materials (e.g., coal) from being shipped to the power plant and further delays or stops energy production. Some of the most critical components in these ICIs must be reinforced. The primary power grid data needs are those indicative of what affects the reliability of the power grid and what is affected by a lack of reliability of the power grid. A broad list of desired power grid data is as follows:
1. Power grid structural data: geographical location of substations, transmission and distribution lines, network topologies, critical transmission corridors, critical distribution corridors, and equipment age such as age of transformers and utility poles.
2. Power grid operational data: load characteristics of transmission and distribution lines, number of customers served per substation, and generation, transmission, and distribution capabilities.
3. Power grid contingent operation data: backup generation assets and their capacities, energy imports and exports, generation dependencies (transported coal, hydro reservoir location, cooling water needs for nuclear), backup equipment inventories, repair and response capabilities, and manned/unmanned assets and their remote operational capabilities.
4. Power grid related environmental and historic data: line failure statistics due to snow accumulation and recurring natural disasters such as hurricanes and tornadoes, and past equipment failures due to rapid freezing.
Utility companies, however, often operate under strict federal and state regulatory constraints, restricting them from releasing most types of power grid related data. Explicit collection of certain data may be difficult. As a supplemental tool, implicit data collection may be conducted to obtain publicly available data such as GIS data, maps, satellite imagery, and public archives to derive power grid data. Example resources include the Platts Database Suite [43], the Department of Energy (DOE) Energy Storage Database [17], KONECT [31], and MAPSearch Electric Power GIS Data [35]. Data from some of these databases, such as the Platts Database Suite, may be purchased.
2.5.2 Multimodal Transportation Network
The Department of Homeland Security (DHS) identifies seven key modes of transportation in the U.S. [16]. Three of these modes, highway infrastructure and motor carrier (hereafter roadway network), maritime transportation system (hereafter waterway network), and freight rail (hereafter railway network), are of great importance to the safety and security of society because of their roles in delivering essential goods and would likely play a significant role in providing relief following manmade and natural disasters. Disruptions of any one of these networks could have significant local economic consequences. For example, the I-35W bridge collapse in Minneapolis, MN was estimated to have resulted in an economic loss of $71,000–220,000 per day due to travel delay [56]. As another example, the effects of an unscheduled 180-day closure of the Chickamauga Lock in east Tennessee were estimated to cost over $2,000,000 annually in traffic delays [10]. Systematically, each of these ground-based transit types consists of a collection of intramodal links and nodes. The roadway network in the U.S. includes 3.9 million miles of roadway links [54]. These links are divided into seven primary functional classes: Interstate, Other Freeway/Expressway, Other Principal Arterial, Minor Arterial, Major Collector, Minor Collector, and Local. Intramodal nodes in this network include bridges, overpasses, culvert crossings, tunnels, junctions, tolls, and rail crossings. The inland waterway network in the U.S. includes 25,000 miles of navigable waterways [16]. Intramodal nodes in this network include locks, dams, and portages. Intermodal nodes in this network include inland waterway ports. The infrastructure of this network is publicly and privately owned. Unlike the roadway network, the majority of the railway network in the U.S.
is made up of seven competitive Class I carriers including BNSF, Canadian National, Canadian Pacific, CSX, Kansas City Southern, Norfolk Southern, and Union Pacific. Combined, these carriers have over 95,000 miles of railways. Intermodal nodes in railway networks include transloading facilities (hereafter, intermodal facilities). These facilities allow for the loading and unloading of railcars. Bridges and transportation management centers are also essential to the efficient operation of several modes of transportation. Transportation management centers help operations engineers monitor travel quality and coordinate response as necessary. These centers, as well as their field equipment, are vulnerable to power outages. For example, during a regional power outage a center might remain operational on back-up power, but its field devices might be inoperable without power. A multimodal transportation network can be created in ArcGIS by conflating links and nodes from the roadway network, waterway network, and railway network into one multimodal transportation network. Spatially referenced data on the links and nodes of the transportation sector are publicly available through numerous sources, including local, state, and federal agencies, private corporations, and universities. As an example, potential sources of geospatial information for the State of Illinois, U.S. are:
1. Illinois Roadway Information System (IRIS) [27]
2. Illinois Structure Information System (ISIS) [28]
3. Illinois Railroad Information System (IRRIS) [29]
4. National Waterway Network (NWN) [52]
5. Center for Transportation Analysis Rail Network (CTARN) [42]
6. 2011 Illinois Department of Transportation (IDOT) Orthophotography [26].
The IRIS dataset includes geospatially referenced data on all roads in Illinois. The IRRIS includes information on the location of all railroad crossings in Illinois but does not include data on the location of the actual railroad tracks and facilities, which can be extracted from the CTARN for Class I carriers’ tracks and facilities in Illinois. Specific transloading facilities can also be found in trade publications for each Class I carrier. The ISIS dataset includes information on all structural nodes of the IRIS network, including bridges, tunnels, overpasses, and culverts. Information on the inland waterway network, including the location of locks, dams, and ports, can be extracted from the NWN. One of the potential issues with developing this intermodal transportation network is aligning components spatially, which may become intractable for data from multiple different sources. There might be missing or hidden data; the latter need to be queried prior to use. These issues can be addressed by verifying the accuracy of data and editing geospatial information in the GIS. Missing data can be manually created using the 2011 IDOT Orthophotography for Illinois, which has a spatial resolution of 6 × 6 in.
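The conflation step can be illustrated outside a GIS with a minimal sketch: links from three modal networks are merged into one graph, and intermodal nodes (for example a port or an intermodal facility) are the nodes that appear in more than one mode. All node names and links below are hypothetical; real data would come from sources such as IRIS, NWN, and CTARN, and the spatial alignment problem discussed above is not modeled here.

```python
from collections import defaultdict, deque


def conflate(*modal_links):
    """Merge (mode, [(u, v), ...]) link lists into one undirected multimodal graph."""
    adj = defaultdict(set)       # node -> neighboring nodes
    modes_at = defaultdict(set)  # node -> modes that touch it
    for mode, links in modal_links:
        for u, v in links:
            adj[u].add(v)
            adj[v].add(u)
            modes_at[u].add(mode)
            modes_at[v].add(mode)
    return adj, modes_at


def reachable(adj, source, blocked=frozenset()):
    """Breadth-first search: nodes reachable from `source`, skipping `blocked` nodes."""
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen and v not in blocked:
                seen.add(v)
                queue.append(v)
    return seen


# Hypothetical links for each mode.
road = ("road", [("junction_1", "bridge_1"), ("bridge_1", "port_A")])
water = ("water", [("port_A", "lock_1"), ("lock_1", "port_B")])
rail = ("rail", [("port_B", "facility_1"), ("facility_1", "rail_yard_1")])

adj, modes_at = conflate(road, water, rail)
intermodal = sorted(n for n, modes in modes_at.items() if len(modes) > 1)
print("intermodal nodes:", intermodal)
print("rail yard reachable from junction_1:",
      "rail_yard_1" in reachable(adj, "junction_1"))
print("still reachable if lock_1 fails:",
      "rail_yard_1" in reachable(adj, "junction_1", blocked={"lock_1"}))
```

In this toy network the lock is a single point of failure between the road and rail modes, which is the kind of critical component the chapter aims to identify.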
2.5.3 Water-Related Infrastructures
Many areas around the world are at the confluence of rivers, oceans, or lakes. The geographical relationship with high order streams makes water-related infrastructures critical to the resilience of ICIs. Along waterways are numerous locks, dams, levees, and portages that play a critical role in the mitigation of storm water. Managing storm water is essential for avoiding flooded transportation links. In particular, culverts are used to enable the flow of storm water from one side of a road/rail to another. Engineers design these structures with the capacity for differing levels of runoff, depending on the importance of the transportation link and agency policies. For example, culverts designed for IDOT in the State of Illinois, U.S. must be able to convey storm water from a 100-year flood [30]. Culverts designed in the past or for other public agencies might not have considered such high flood levels. Additionally, low-lying interstate highways are sometimes below the water table and require continuous pump control to maintain a dry roadway. Thus, during flood events, the performance of storm water systems is critical to the operation of many transportation links. Moreover, the reliability of the electricity grid has a direct impact on the ability of storm water pumping stations to operate.
To support the maintenance and management of storm water infrastructure, many public agencies have inventories that list all pumping stations and all culverts over a minimum size. Agencies can integrate this information with existing asset management systems or GIS databases. For some water-related infrastructures, this maintenance and management is part of a broader transportation system. Flooding risk is based on a variety of environmental conditions, including elevation above water, distance from water, flow of water (pools and sinks), and the permeability of a given area. The Federal Emergency Management Agency (FEMA) has a collection of high-quality risk assessment maps for flood-prone areas, which is continuously updated by the FEMA Flood Map Service Center. In addition, high resolution digital elevation models, high resolution aerial photographs, and high-quality stream data are available for analysis from the United States Geological Survey (USGS) High Resolution Orthoimagery dataset and the USGS National Hydrography Dataset 24k stream network. These resources can be used to supplement the FEMA data and to customize the modeling efforts. The location and details of storm water facilities are recorded in an inventory by local agencies and can be geocoded using ArcGIS [24, 25]. Chapter 5 describes the modeling and analysis of water distribution systems.
2.6 Conclusions and Recommendations
The collective resilience of ICIs requires (a) qualitative and quantitative analysis of human and social resilience of ICIs; (b) optimal allocation of protection resources to maximize the expected performance of ICIs; (c) critical components identified at the interface of multiple ICIs; (d) a collaboration protocol that enables distributed, collaborative decision making to improve ICIs’ collective resilience; and (e) an interdisciplinary team of experts from engineering, computer science, geography, and economics whose complementary skills enable the integration of system modeling, network science, and data analytics related to critical infrastructures and other large and complex spatial systems. In the long term, sustainable ICIs rely on a community partnership with a region’s water and power distributors and entities that support and manage various infrastructure components. Information technologies are at the forefront of this partnership and enable real-time monitoring of ICIs and collaboration between stakeholders.
References
1. Abramson D, Redlener I (2012) Hurricane Sandy: lessons learned again. Disaster Med Public Health Prep 6(4):328–329
2. American Public Power Association (APPA) (2013) Statistics from the 2012–2013 public power annual directory and statistical report
3. Associated Press (2016) Towns South of St. Louis prepare for Mississippi River floods. http://www.foxnews.com/us/2016/01/01/towns-south-st-louis-prepare-for-mississippiriver-floods.html. Accessed Jan 2021
4. Belasen AR, Dai C (2014) When oceans attack: assessing the impact of hurricanes on localized taxable sales. Ann Reg Sci 52(2):325–342
5. Belasen AR, Polachek SW (2008) How hurricanes affect wages and employment in local labor markets. Am Econ Rev 98(2):49–53
6. Belasen AR, Polachek SW (2009) How disasters affect local labor markets: the effects of hurricanes in Florida. J Hum Resour 41(1):251–276
7. Belasen AR, Polachek SW (2013) Migration as a result of environmental disasters. In: Constant AF, Zimmermann KF (eds) International handbook on the economics of migration. Edward Elgar, Cheltenham, UK, and Northampton, USA, pp 309–330 (chapter 17)
8. California ISO (2013) Company information and facts
9. California ISO (2013) Annual report on market issues & performance
10. Chatterjee A, Wegmann F, Jackson N, Everett J, Bray L (2001) Effect of increased truck traffic from Chickamauga Lock closure. Transp Res Rec 1763(1):80–84
11. Chen XW, Nof SY (2012) Constraint-based conflict and error management. Eng Opt 44(7):821–841
12. Chen XW, Nof SY (2012) Conflict and error prevention and detection in complex networks. Automatica 48(5):770–778
13. Chen X (2016) Critical nodes identification in complex systems. Complex Intell Syst. https://doi.org/10.1007/s40747-016-0006-8
14. Chung FRK (1997) Spectral graph theory. American Mathematical Society
15. Cole S (2010) The regional portfolio of disruptions, protection, and disasters. Ann Reg Sci 44:251–272
16. DHS (2013) Transportation systems sector. http://www.dhs.gov/transportation-systems-sector. Accessed Feb 2021
17. DOE Energy Storage Database (2016) http://www.energystorageexchange.org/projects. Accessed Feb 2021
18. Eisenberg DA, Park J, Bates ME, Fox-Lent C, Seager TP, Linkov I (2015) Resilience metrics: lessons learned from military doctrines. Solut J 5
19. Ercal G, Matta J (2013) Resilience notions for scale-free networks. Complex Adapt Syst 510–515
20. Ercal G (2014) On vertex attack tolerance of regular graphs. CoRR, abs/1409.2172. http://arxiv.org/abs/1409.2172
21. ESRI (2013) ArcGIS® schematics: dealing with connectivity. ESRI White Paper, 17
22. Ewing BT, Kruse JB (2001) Hurricane Bertha and unemployment: a case study of Wilmington, NC. In: Proceedings of the Americas conference on wind engineering
23. Guimaraes P, Hefner FL, Woodward DP (1993) Wealth and income effects of natural disasters: an econometric analysis of Hurricane Hugo. Rev Reg Stud 23:97–114
24. Hu S, Kysor T (2010) Modeling our environment: from cartographic to photo-realistic. In: Proceedings of the joint symposium of ISPRS technical commission IV & AutoCarto
25. Hu S (2015) The use of GIS, remote sensing and virtual reality in flooding hazardous modeling, assessment and visualization. In: Tobin GA, Montz BE (eds) Evolving approaches to understanding natural hazards. Cambridge Scholars Publishing, Newcastle-Upon-Tyne, pp 230–239
26. IDOT (2011) 2011 Illinois Department of Transportation Orthophotography. http://www.isgs.uiuc.edu/nsdihome/webdocs/idot2011/. Accessed Feb 2021
27. IDOT (2013) Illinois Roadway Information System. http://gis.dot.illinois.gov/gist2/. Accessed Feb 2021
28. IDOT (2013) Illinois Structure Information System. http://gis.dot.illinois.gov/gist2/. Accessed Feb 2021
29. IDOT (2013) Illinois Railroad Information System. http://gis.dot.illinois.gov/gist2/. Accessed Feb 2021
30. IDOT (2015) IDOT bureau of design and environment manual, Chapter 40, Section 3.04. http://www.idot.illinois.gov/doing-business/procurements/engineering-architectural-professional-services/Consultants-Resources/index. Accessed Oct 2020
31. KONECT Networks (2016) http://konect.uni-koblenz.de/networks/opsahl-powergrid. Accessed Feb 2021
32. Larkin S, Fox-Lent C, Eisenberg DA, Trump BD, Wallace S, Chadderton C, Linkov I (2015) Benchmarking agency and organizational practices in resilience decision making. Environ Syst Decis 35
33. Laska S, Morrow BH (2006) Social vulnerabilities and Hurricane Katrina: an unnatural disaster in New Orleans. Mar Technol Soc J 40(4):16–26
34. Lecoutre C (2009) Constraint networks: techniques and algorithms. Wiley, Hoboken, NJ
35. MAPSearch Electric Power GIS Data (2016) http://www.mapsearch.com/gis-asset-data/electric-power-gis-data.html. Accessed Feb 2021
36. Markowitz H (1959) Portfolio selection: the efficient diversification of investments. Wiley, New York
37. Matta J, Borwey J, Ercal G (2014) Comparative resilience notions and vertex attack tolerance of scale-free networks. CoRR, abs/1404.0103. http://arxiv.org/abs/1404.0103
38. McIntosh MF (2008) Measuring the labor market impacts of Hurricane Katrina migration: evidence from Houston, Texas. Am Econ Rev 98(2):54–57
39. National Post (2012) Pictures of chaos as massive India blackout leaves 670 million without power. http://news.nationalpost.com/2012/07/31/pictures-of-chaos-as-massive-indiablackout-leaves-670-million-without-power/. Accessed Feb 2021
40. National University System Institute for Policy Research (2011) Preliminary report on the San Diego blackout economic impact. http://www.nusinstitute.org/Research/Briefs/PreliminaryReport-on-SanDiego-Blackout-Economic-Impact.html. Accessed Feb 2021
41. Nelson L (2016) The Flint water crisis, explained. http://www.vox.com/2016/2/15/10991626/flint-water-crisis. Accessed Feb 2021
42. Oak Ridge National Laboratory (2016) Center for Transportation Analysis Rail Network. http://www-cta.ornl.gov/transnet/RailRoads.html. Accessed Feb 2021
43. Platts Database Suite (2016) http://www.platts.com/products/world-electric-power-plants-database. Accessed Feb 2021
44. Ridgley M, Fernandes L (1999) Multiple criteria analysis integrates economic, ecological and social objectives for coral reef managers. Coral Reefs 18(4):393–402
45. Ripy J, Grossardt T, Shouse M, Mink P, Bailey K, Shields C (2014) Expert systems archaeological predictive model. Transp Res Rec: J Transp Res Board 2403:37–44
46. Rose A, Benavides J, Chang SE, Szczesniak P, Lim D (1997) The regional economic impact of an earthquake: direct and indirect effects of electricity lifeline disruptions. J Reg Sci 37(3):437–458
47. Shouse M, Blandford B (2015) Expert systems model for Kentucky Arrow Darter habitat in the upper Kentucky River Basin. Pap Appl Geogr 4:383–390
48. Sinclair A, Jerrum M (1989) Approximate counting, uniform generation and rapidly mixing Markov chains. Inf Comput 82(1):93–133
49. Skidmore M, Toya H (2002) Do natural disasters promote long-run growth? Econ Inq 40:664–687
50. Strobl E (2008) The economic growth impact of hurricanes: evidence from US coastal counties. IZA Discussion Paper No. 3619
51. U.S.-Canada Power System Outage Task Force (2004) Final report on the August 14th blackout in the United States and Canada: causes and recommendations. https://reports.energy.gov/BlackoutFinal-Web.pdf. Accessed Feb 2021
52. U.S. Army Corps of Engineers (2014) National waterway network. http://www.navigationdatacenter.us/data/datanwn.htm. Accessed Feb 2021
53. U.S. Department of the Interior (2015) Technical evaluation of the Gold King Mine incident
54. U.S. Department of Transportation Federal Highway Administration (2013) Highway functional classification concepts, criteria, and procedures, FHWA-PL-13-026
55. Xiao Y, Feser E (2014) The unemployment impact of the 1993 US Midwest flood: a quasi-experimental structural break point analysis. Environ Hazards 13(2):93–113
56. Xie F, Levinson D (2011) Evaluating the effects of the I-35W bridge collapse on road-users in the twin cities metropolitan region. Transp Plan Technol 34(7):691–703
57. Zoroya G (2016) Blizzard knocks out power to hundreds of thousands along East Coast. USA Today. http://www.usatoday.com/story/news/nation/2016/01/23/blizzard-knocksout-power-more-than-100000-new-jersey/79222284/. Accessed Jan 2021
Chapter 3
Public Health
As the world becomes more and more connected, critical decisions have an increasing impact on public health. To prevent the spread of diseases, members of a population are immunized or quarantined. Ideally, we would like to provide vaccines to everyone and quarantine each person who is infected by a transmissible disease. In reality, however, resources such as the time available to respond to a pandemic or contain contagious diseases are limited; decisions must be made to determine which members of a network are isolated or treated to minimize the spread of diseases. The study of networks helps to solve complex problems in public health. As healthcare costs keep increasing, decision makers often need to select a subset of population members to immunize or quarantine in order to optimally contain an epidemic of infectious diseases such as sexually transmitted, airborne, and blood-borne diseases. The impact of these critical decisions on public health is profound. In physics and engineering, there has been extensive research on developing mathematical models to describe networks. Some network models (e.g., [5, 10]) were successfully applied to describe energy grids, air traffic, and groups of terrorists and drug injectors. In social science, many metrics including centrality [18], complement metric [13], and reciprocal metric [6] were developed to describe network properties. There is a strong need to establish an analytical framework for decision making in public health under resource constraints. The analytical model integrates network science and social science and enables algorithms to maximize the effect of network separation through isolating or removing critical nodes and links. First, it is imperative to understand the structures and properties of networks in public health. Networks have different structures which can be described by mathematical models.
Network properties are determined by the network structure and reflected in network metrics, which can guide node and link isolation or removal for maximum separation effect. Secondly, it is necessary to have computationally feasible and effective algorithms that identify nodes and links for removal or isolation. Many networks, especially those in public health, have large order, and brute-force simulation is computationally infeasible. The key is to design effective algorithms that pinpoint the nodes and links for isolation or removal within a reasonable amount of time. Thirdly, the analytical model must be verified and validated with networks in public health. Network data from various sources are available for verification and validation. For instance, an acquaintance network of 293 drug injectors on the streets of Hartford, CT [36] is available for analysis. The objective is to examine whether the analytical model can effectively reduce the speed and scale of disease diffusion in a network.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 X. W. Chen, Network Science Models for Data Analytics Automation, Automation, Collaboration, & E-Services 9, https://doi.org/10.1007/978-3-030-96470-2_3
3.1 Process and Tools for Modeling Public Health Networks
There are in general four steps in modeling a public health network:
1. Design network metrics to unequivocally assess the speed and scale of disease diffusion in a network. A common misunderstanding is that a network with larger centrality diffuses diseases faster. Most centrality metrics, including Freeman’s three centralities [18] and the complement metric [13], cannot measure the speed and scale of disease diffusion in a network. For each of these metrics, we can find counterexamples in which a network with larger centrality diffuses diseases more slowly than a network with smaller centrality. Borgatti [6] correctly pointed out that these metrics were not developed with a network’s capability for disease diffusion in mind and proposed the reciprocal metric, which aims to be a robust metric for both separated and connected networks. The network metrics illustrated in Chap. 1 can be customized and used to assess the speed and scale of disease diffusion in a network.
2. Model structures of networks in public health. Many real-world networks can be described with a mathematical model that specifies their degree distribution. Several well-known models for complex networks, such as random and scale-free networks, and statistical analysis tools, such as goodness-of-fit analysis, can be used to identify network structures. The models are used to generate statistically different networks to test various algorithms and help determine which nodes and links in the network should be quarantined or removed.
3. Design node and link metrics and structural search algorithms. Network metrics may be used to compare the speed and scale of disease diffusion in a network before and after isolating or removing certain nodes and links. These comparisons require that all combinations of nodes and links be evaluated. In practice, it is not computationally feasible to calculate a network metric for all combinations. Node and link metrics must be designed to enable the structural search of nodes and links for isolation or removal. A node or link metric is calculated for each node or link, respectively. The structural search algorithm then selects the nodes and links to isolate or remove based on the node or link metric and on the available resources.
4. Verify and validate the analytical framework with networks in public health.
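To give a sense of scale for step 3, the following minimal sketch counts how many distinct sets of nodes would have to be evaluated if every combination were compared. It uses the order of the Hartford acquaintance network mentioned above (293 nodes); the function name and the budget values are illustrative assumptions, and link combinations, which would add further candidates, are ignored.

```python
import math


def count_candidate_sets(order: int, budget: int) -> int:
    """Number of distinct sets of `budget` nodes in a network with `order` nodes."""
    return math.comb(order, budget)


if __name__ == "__main__":
    order = 293  # order of the Hartford acquaintance network cited in the text
    for budget in (1, 2, 5, 10):  # hypothetical numbers of nodes that can be isolated
        print(budget, count_candidate_sets(order, budget))
```

Even for a budget of 10 nodes the count exceeds 10^17, and each candidate set would require a full recalculation of the network metric, which is why per-node and per-link metrics are needed.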
The analytical framework for decision making in public health under resource constraints integrates knowledge in three areas. First, in social science, a variety of network metrics were developed to describe properties of a network. Social scientists also led the effort in quantifying networks with adjacency matrices and network visualization. These are appropriate tools to generate and visualize statistically different networks with various orders to test the structural search algorithms. Secondly, complex network theory describes networks with mathematical models and reveals structural properties through analytical and simulation studies. Using mathematical models to describe networks has many advantages. These models reveal structural properties of a network and provide a direction for the design of node and link metrics for structural search, which determines which nodes and links need to be isolated or removed. Thirdly, optimization tools and structural search work in tandem to support informed decision making. Optimization tools are used to generate statistically different networks to test the structural search algorithms. For instance, to generate a scale-free network, nodes must be paired up according to the power law distribution. This is a typical integer linear programming (ILP) problem. Structural search is a process to search networks for useful structures such as elite groups and portals [15]. The useful structure we intend to search for in the context of public health is a group of nodes or links whose isolation or removal minimizes the speed and scale of disease diffusion in a network. The next two subsections discuss the first three steps in modeling a public health network.
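As a rough illustration of pairing nodes according to a power-law degree distribution, the sketch below samples a degree for each node and then realizes the degree sequence as a simple graph. It does not use the ILP formulation mentioned above; it substitutes the Havel-Hakimi construction, a standard greedy method for realizing a degree sequence. The function names, the exponent `gamma = 2.5`, the retry limit, and the seed are all assumptions made for illustration.

```python
import random


def sample_power_law_degrees(n, gamma, k_max, rng):
    """Draw n degrees from P(k) proportional to k**(-gamma), k = 1..k_max."""
    degrees = range(1, k_max + 1)
    weights = [k ** -gamma for k in degrees]
    return rng.choices(degrees, weights=weights, k=n)


def havel_hakimi(degrees):
    """Return the links of a simple graph with the given degrees, or None if none exists."""
    remaining = dict(enumerate(degrees))  # node -> number of links still to place
    links = []
    while remaining:
        node = max(remaining, key=remaining.get)
        d = remaining.pop(node)
        if d == 0:
            break  # every node still in `remaining` also needs no more links
        partners = sorted(remaining, key=remaining.get, reverse=True)[:d]
        if len(partners) < d or remaining[partners[-1]] == 0:
            return None  # the degree sequence cannot be realized
        for partner in partners:
            remaining[partner] -= 1
            links.append((node, partner))
    return links


def generate_network(n, gamma=2.5, seed=0, max_tries=1000):
    """Sample degree sequences until one can be realized as a simple graph."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        degrees = sample_power_law_degrees(n, gamma, n - 1, rng)
        links = havel_hakimi(degrees)
        if links is not None:
            return degrees, links
    raise RuntimeError("no realizable degree sequence found")


if __name__ == "__main__":
    degrees, links = generate_network(50, seed=1)
    print(len(links), "links; largest degree", max(degrees))
```

The Havel-Hakimi construction is deterministic and tends to connect high-degree nodes to each other, so it yields one particular realization rather than a random sample of networks with that degree sequence. Generating statistically different test networks would require a different pairing step, such as the optimization approach described in the text.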
3.1.1 Design of Network Metrics
Several network metrics were developed [18] to describe network centrality. The degree centrality describes a network’s communication load. The betweenness centrality describes a network’s control of information. The closeness centrality describes a network’s communication capability. Centralities were not developed with the consideration of diffusion speed and scale. Among the three metrics, the closeness centrality is the closest to the network metric desirable for public health, but it does not appropriately measure the speed of disease diffusion in a network. In addition, the closeness centrality cannot be calculated for a network with multiple components because nodes from different components are disconnected. To overcome this difficulty, the complement centrality [13] was developed for both networks with one component and networks with multiple disconnected components. The complement centrality is based on Freeman’s closeness centrality, and the two are the same for a connected network. The complement centrality therefore cannot accurately measure the speed and scale of disease diffusion in a network either.

Figure 3.1 shows three networks, each of which has five nodes. The network in Fig. 3.1a has five links. The network in Fig. 3.1b has four links. The network in Fig. 3.1c has three links.

Fig. 3.1 Network metrics of three networks: (a), (b), and (c), each with nodes labeled 1 to 5

To compare the speed of disease diffusion, we notice that there are $\binom{5}{2} = 10$ pairs of nodes in each of the three networks. The speed of disease diffusion in a network is determined by how fast diseases transmit between each pair of nodes. Table 3.1 shows the 10 pairs of nodes and how many steps it takes to transmit diseases between the nodes in each pair, which is a direct measurement of how fast diseases diffuse. The more steps it takes, the slower diseases diffuse.

Table 3.1 Disease diffusion in three networks in Fig. 3.1 (number of steps between each pair of nodes)
Network  (1, 2)  (1, 3)  (1, 4)  (1, 5)  (2, 3)  (2, 4)  (2, 5)  (3, 4)  (3, 5)  (4, 5)
3.1(a)   1       2       2       1       1       2       2       1       2       1
3.1(b)   1       2       3       4       1       2       3       1       2       1
3.1(c)   1       ∞       ∞       ∞       ∞       ∞       ∞       1       2       1

Between networks 3.1(a) and 3.1(b), the number of steps is the same for seven out of 10 pairs. For the other three pairs, (1, 4), (1, 5), and (2, 5), network 3.1(b) takes more steps than network 3.1(a). It is clear that network 3.1(a) diffuses diseases faster than network 3.1(b). Network 3.1(c) has two components; it has the lowest speed of disease diffusion.

The network metric that is the closest to describing diffusion speed is the reciprocal metric [6], which can be calculated for both connected and separated networks. The reciprocal metric is defined as $1 - \frac{2\sum_{i>j} 1/d_{ij}}{n(n-1)}$, where $n$ is the total number of nodes in a network and $d_{ij}$ is the geodesic distance between nodes $i$ and $j$; $1/d_{ij} = 0$ if nodes $i$ and $j$ are disconnected. One issue with the reciprocal metric is that the smaller its value, the higher the diffusion speed of the network. For example, the values of the reciprocal metric for networks 3.1(a), 3.1(b), and 3.1(c) are 0.250, 0.358, and 0.650, respectively. These values indicate that network 3.1(a) has the highest diffusion speed whereas network 3.1(c) has the lowest diffusion speed. To correct this misalignment between the diffusion speed and the values of the network metric, the diffusion speed, SP, defined in Eq. (1.1) in Chap. 1 is used to measure diffusion speed. The SP is simply one minus the reciprocal metric. The values of SP for networks 3.1(a), 3.1(b), and 3.1(c) are therefore 0.750, 0.642, and 0.350, respectively. In addition to the SP, several other network metrics defined in Chap. 1, such as SC, NC, LC, and HG, can be used to describe network properties in public health.
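The values quoted above can be checked with a short script. The sketch below is a minimal illustration rather than the author's implementation: it infers the links of the three networks from the step counts in Table 3.1 (a five-node ring, a five-node path, and two separate components), computes geodesic distances by breadth-first search, and evaluates SP as one minus the reciprocal metric. Function and variable names are assumptions.

```python
from collections import deque
from itertools import combinations


def geodesic_distances(nodes, links):
    """Breadth-first search from every node; unreachable pairs are left out."""
    adjacency = {v: set() for v in nodes}
    for u, v in links:
        adjacency[u].add(v)
        adjacency[v].add(u)
    distance = {}
    for source in nodes:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        for target, d in seen.items():
            distance[(source, target)] = d
    return distance


def diffusion_speed(nodes, links):
    """SP = 2 * sum over pairs of 1/d_ij, divided by n(n-1); 1/d_ij = 0 if disconnected."""
    n = len(nodes)
    distance = geodesic_distances(nodes, links)
    total = sum(1.0 / distance[pair]
                for pair in combinations(nodes, 2) if pair in distance)
    return 2.0 * total / (n * (n - 1))


def reciprocal_metric(nodes, links):
    return 1.0 - diffusion_speed(nodes, links)


NODES = [1, 2, 3, 4, 5]
# Links inferred from the one-step pairs in Table 3.1.
NETWORKS = {
    "3.1(a)": [(1, 2), (1, 5), (2, 3), (3, 4), (4, 5)],
    "3.1(b)": [(1, 2), (2, 3), (3, 4), (4, 5)],
    "3.1(c)": [(1, 2), (3, 4), (4, 5)],
}

if __name__ == "__main__":
    for name, links in NETWORKS.items():
        print(name,
              f"reciprocal={reciprocal_metric(NODES, links):.3f}",
              f"SP={diffusion_speed(NODES, links):.3f}")
```

For network 3.1(b), for instance, the sum of reciprocal distances is 4(1) + 3(1/2) + 2(1/3) + 1/4 ≈ 6.417, so SP ≈ 2(6.417)/20 ≈ 0.642 and the reciprocal metric is about 0.358, in agreement with the text.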
3.1.2 Complex Network Theory and Structural Search
Complex network theory provides a list of candidate mathematical models which can be used in conjunction with statistical tools to identify structures of networks in public health. The models that describe networks in public health are utilized to generate statistically different networks with various orders to test the structural search algorithms. Several models can be analyzed to identify network structures in public health. Section 1.1 in Chap. 1 and Sect. 7.2 in Chap. 7 provide an overview of network models.

Structural search is a process to search networks for useful structures and was applied to find elite groups and portals [15]. The motivation for applying structural search is to design a computationally feasible algorithm to identify nodes and links in networks for isolation or removal. In theory, network metrics that reflect the speed and scale of disease diffusion are sufficient to determine which nodes and links should be isolated or removed. In practice, however, network metrics cannot be used because of their high computational requirement. The node and link metrics illustrated in Sect. 1.3 in Chap. 1 can be used in structural search.
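A minimal sketch of a metric-driven search is given below. It rests on several assumptions: node degree stands in for the node metrics of Sect. 1.3, which are not reproduced here; isolating a node removes its links but keeps the node in the network; and the search is a simple greedy loop limited by a resource budget. It is meant only to show the shape of such an algorithm, not the structural search algorithm of this book.

```python
from collections import deque


def diffusion_speed(nodes, links):
    """SP = 2 * sum_{i>j} (1/d_ij) / (n(n-1)), with 1/d_ij = 0 for disconnected pairs."""
    adjacency = {v: set() for v in nodes}
    for u, v in links:
        adjacency[u].add(v)
        adjacency[v].add(u)
    total = 0.0  # summed over ordered pairs, so every pair is counted twice
    for source in nodes:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in distance:
                    distance[w] = distance[u] + 1
                    queue.append(w)
        total += sum(1.0 / d for d in distance.values() if d > 0)
    n = len(nodes)
    return total / (n * (n - 1))


def degree(nodes, links):
    """Stand-in node metric: number of links attached to each node."""
    counts = {v: 0 for v in nodes}
    for u, v in links:
        counts[u] += 1
        counts[v] += 1
    return counts


def greedy_isolation(nodes, links, budget):
    """Isolate `budget` nodes, each time picking the node with the largest metric."""
    remaining = list(links)
    isolated = []
    for _ in range(budget):
        metric = degree(nodes, remaining)
        candidates = [v for v in nodes if v not in isolated]
        target = max(candidates, key=metric.get)
        isolated.append(target)
        remaining = [(u, v) for u, v in remaining if target not in (u, v)]
    return isolated, remaining


if __name__ == "__main__":
    nodes = [1, 2, 3, 4, 5]
    path = [(1, 2), (2, 3), (3, 4), (4, 5)]  # network 3.1(b)
    isolated, remaining = greedy_isolation(nodes, path, budget=1)
    print(isolated, diffusion_speed(nodes, path), diffusion_speed(nodes, remaining))
```

On the five-node path of Fig. 3.1b with a budget of one node, this degree-based choice isolates node 2 and lowers SP to 0.25, whereas isolating node 3 would lower it to 0.20. The gap shows why the choice of node metric matters and why the design of node and link metrics is treated as a step of its own.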
3.2 Conflict and Error Detection in Health Insurance Claims
Health insurance claim denials represent a major problem for public health in the U.S. The Health Insurance Association of America estimated in 2003 that 14% of all insurance claims are denied [29]. The denial rate is difficult to verify since the reporting requirements of denials vary by state and insurers interpret the requirements differently. Table 3.2 summarizes denial rates in four states [35].

Table 3.2 Private insurance denial rate across states [35]
State        Private insurance denial rate (%)  Private insurance denial rate range  Number of insurance companies
Ohio         11                                 –                                    All
Connecticut  21                                 4–29%                                21
Maryland     16                                 –                                    41
California   24                                 6–40%                                6

The U.S. healthcare system is dependent on health insurance [9]. Healthcare providers receive compensation for their services to the patients in an indirect manner. Most patients pay annual fees to an insurance company, which reimburses payments for approved healthcare services provided by healthcare providers. About 70–80% of U.S. hospitals’ operating revenues are reimbursed by third-party healthcare insurance companies (e.g., Blue Cross and Blue Shield Association, UnitedHealth Group, Medicare, and Medicaid). The insurance claim process is prone to conflicts and errors, which lead to insurance claim denials and cause significant revenue loss for healthcare providers. The revenue function in healthcare is different from other industries in several ways: (a) most payments are made by a third-party and not by the customer (patient); (b) identical services may have significantly different payments depending on the payer; and (c) the payment process is complex and depends on rules and guidelines set by payers. The rules and guidelines not only vary by payers but also by state. Figure 3.2 shows an example of an insurance claim process. The players in the insurance claim process include healthcare providers, insurance company, and patients. The process is case-specific and time-sensitive. An increasing number of health insurance claim denials are due to medical necessity, insurance coverage, billing, and coding errors. A health insurance denial occurs when the insurance company (the payer) rejects the claim payment for the healthcare service. These denials can be divided into hard and soft denials. Hard denials are unrecoverable and considered by healthcare providers as loss or written-off revenue. Soft denials are recoverable and can be paid if the healthcare provider takes the right follow-up actions. It is estimated that 90% of health insurance denials are preventable and 67% of denied claims are recoverable [7]. 
Medical necessity and coverage denials contribute to a significant percentage of total claim denials. They are also difficult to prevent and have the lowest recovery rate [30]. Managing health insurance denials has become one of the most debated and pervasive issues in public health. As a result of health insurance denials, healthcare organizations lose millions of dollars every year. It is estimated that on average providers lose 5% of gross revenue due to billing inconsistencies, which can translate to millions of dollars for a hospital [19]. From 2000 to 2006, 42 acute care hospitals filed for bankruptcy, and 28 of those were forced to close soon after [25]. Hospitals across the U.S. face rising costs and decreasing insurance payments. The American Hospital Association (AHA) reported a rapid growth in healthcare costs from about $200 billion to over $2 trillion in the past thirty years [1].
3.2.1 Management of Health Insurance Claim Denials
Reducing insurance denials provides the largest opportunity for revenue growth in healthcare organizations [12]. Tools for claim processing aim to minimize processing errors. For example, probabilistic inductive learning can be used for reengineering claim processing. This approach assists in reducing cost and predicting and enhancing quality and case outcomes in managed care practices [4]. Data mining is also used to predict and prevent errors in health insurance claims processing [24]. Classification methods are useful for insurance fraud detection [31]. A method using co-occurrence [34] can detect fraud and errors in health insurance claims. Statistical methods can be used to analyze insurance claim denials. For example, Baptist Health System in San Antonio, Texas applied statistical methods to determine if denial rates are increasing or decreasing [19]. Health insurance denials can also be reduced using computerized collaborative billing [26], where a networked relational database was developed for trauma surgeons to help reduce trauma payment denials.

Fig. 3.2 An example of health insurance claim process
Besides the technical and data-intensive approaches, managerial approaches are used by healthcare providers to reduce insurance denials and improve revenue. One effective approach is Six Sigma, which reduces the volume of insurance denials and ultimately increases revenue. Stanly Regional Medical Center in Albemarle, North Carolina applied Six Sigma in 2007 and achieved savings of over $1,600,000 per year [22]. Mercy Medical Center in Des Moines, Iowa applied Six Sigma in 2005 and reduced outpatient claim denials and write-offs by about $350,000 per year. They also reduced the number of personnel hours required to fix claim errors by 62%. Hospital case management can also be used to reduce health insurance denials [14]. Case management, which is based on resource management, allows healthcare providers to reverse mounting losses caused by payer denials, increase patient access to care, and improve efficiency in care delivery. Medical necessity denials contribute to a significant percentage of total denials. An outpatient order entry system was developed and implemented to support medical necessity by utilizing electronically accessible patient history, provider information, and clinic-related diagnoses [17]. An order entry system helps healthcare providers select compliant justifications for tests and procedures. A method for determining medical necessity for six procedures based on a modified Delphi panel process was also developed [21]. The panel was composed of practicing clinicians who were recognized leaders in their fields. The proposed method tested the medical need for a procedure using ratings for appropriateness and necessity and provided guidelines for the identification of unmet needs. Other managerial approaches for optimizing revenue by reducing medical necessity claim denials [8] include examining local medical review policies, enhancing physician communications, training patient access staff, and establishing proper billing processes.
A strategic approach for medical necessity denials management [30] suggested close cooperation among case management, discharge planning, patient accounts, admissions, physician leadership, payer contracting, and information technology. Auditing systems that are based on rule extraction and case extraction have real-life applications in software security assurance [20], insurance fraud detection [32], internal auditing of banks [27], customer service management [2], and insurance claim denials [3]. A mathematical model for hospital profit planning under Medicare reimbursement was developed [28]. The model used nonlinear programming for pricing and cost allocation to aid hospital administrators in maximizing profits and meeting profit goals. Researchers also conducted a comparison of different Medicare reimbursement policies under optimal hospital pricing [16]. The study used a nonlinear hospital pricing model to determine the effect of profit-maximization and satisfying behaviors of three reimbursement policies on the levels of hospital profit received and on the respective implications for private payers and Medicare. The impact of an outcome-oriented reimbursement policy on clinic, patients, and pharmaceutical firms was mathematically modeled and analyzed [33]. Since many health insurance claim denials are unintentional and caused by conflicts and errors occurring at different steps of the claim life cycle, a conflict and error detection model can be developed to identify conflicts and errors and prevent them from occurring. Conflict and error detection models were developed for collaborative networks such as a production network [11, 23, 37, 38]. The models are used to identify production conflicts and errors within production networks, and these models can be applied in service industries such as healthcare networks, transportation networks, and energy and water distribution systems.
3.2.2 A Case Study of Insurance Claim Denial
Figure 3.3 shows an example of an outpatient who is scheduled for a spinal surgery with a primary doctor at a local hospital. Prior to the surgery, the insurance company authorizes reimbursement, although authorization does not guarantee payment. Due to the surgery’s unpredictable circumstances, a slightly different surgery is performed. The surgery performed has approximately the same cost as the authorized surgery, but with different procedural terminology codes. The doctor records the surgery performed in the medical records, which creates a conflict between the claim filed and the authorization received. The procedure is coded and passed through claims editing without any problems. The claim editing process uses a claim scrubber to check if required fields for insurance are filled but does not check for conflicts and errors in completed fields. If all fields are complete, the claim is assumed to be valid and is then sent to the insurance company. The insurance company receives the claim and promptly denies it. Because the procedure performed was not authorized prior to completion, the insurance company does not reimburse the hospital. To resolve the issue, the hospital financial office contacts the doctor, who is not able to offer much assistance because he does receive reimbursement for the procedure from the hospital. The hospital financial office has worked for over a year on trying to receive reimbursement from the insurance company with no success. The more time passes, the more likely it is that the cost of the procedure, $13,000, will be lost. This is not the first time that the local hospital has dealt with the same procedure, and previous experience produced the opposite outcome. The difference with this claim is that the insurance provider is not local, and the appeal must go through an unfamiliar company.

Fig. 3.3 A case study of denial of insurance claim

This case study shows the complexity involved in filing insurance claims, even for an outpatient who is only in the hospital for a day. Each minor detail affects a claim. The insurance company expects a claim to cover every detail of a patient’s medical treatment and uses tremendous resources to investigate every claim. The burden of rectifying denials is often left to healthcare providers. If an error or conflict occurs with the procedure or in the billing, the provider (e.g., a hospital) is expected to absorb the cost, which further highlights the importance and broader impacts of streamlining health insurance claims in public health.
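The gap described in this case study, a scrubber that checks whether fields are filled but not whether they are consistent with the authorization, can be illustrated with a short sketch. Everything in it is hypothetical: the field names, the placeholder procedure codes `CODE-A` and `CODE-B`, and the two functions are assumptions made for illustration and do not describe any real claim-scrubbing product.

```python
# Hypothetical set of fields a claim scrubber requires to be filled.
REQUIRED_FIELDS = ("patient_id", "provider_id", "procedure_code", "amount")


def scrub(claim):
    """Completeness check only: return the required fields that are missing or empty."""
    return [name for name in REQUIRED_FIELDS if claim.get(name) in (None, "")]


def conflicts_with_authorization(claim, authorization):
    """Consistency check: compare the claim with the authorization on file."""
    conflicts = []
    if claim["procedure_code"] != authorization["procedure_code"]:
        conflicts.append(
            "procedure_code: claim has {!r}, authorization has {!r}".format(
                claim["procedure_code"], authorization["procedure_code"]))
    return conflicts


# Placeholder codes; the amount is the procedure cost quoted in the case study.
authorization = {"patient_id": "P1", "procedure_code": "CODE-A"}
claim = {"patient_id": "P1", "provider_id": "H1",
         "procedure_code": "CODE-B", "amount": 13000}

if __name__ == "__main__":
    print("missing fields:", scrub(claim))
    print("conflicts:", conflicts_with_authorization(claim, authorization))
```

The claim passes the completeness check because every required field is filled, yet the consistency check reports the mismatch between the procedure performed and the procedure authorized, which is the conflict that led to the denial in the case study.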
3.2.3 Modeling Conflicts and Errors in Health Insurance Claims
A conflict is an inconsistency between multiple entities such as healthcare providers, patients, insurers, claims, and diseases. Each entity has properties. A property may be a code of an examination, the dollar amount of a claim, the age of a patient, the time of the healthcare service, and others. A conflict occurs when properties of multiple entities are inconsistent. An error occurs when an entity does not satisfy a constraint. For example, an error occurs when a claim is not submitted by a deadline. The claim is an entity, and the deadline is a constraint. A constraint defines the expected activity of an entity. The key difference between an error and a conflict is that an error involves only one entity, and a conflict is between multiple entities. This difference is critical because how we correct an error is different from how we resolve a conflict. Figure 3.4 shows examples of errors and conflicts. Both the healthcare provider and the insurance company in Fig. 3.4 have seven entities. The entities that belong to the healthcare provider are numbered from 1 to 7, and the entities that belong to the insurance company are labeled from A to G.

Fig. 3.4 Examples of errors and conflicts (healthcare provider entities 1 to 7 and insurance company entities A to G, linked by claim submission/resubmission and by the claim decision, reimbursement or denial)

The healthcare system in the U.S. is prone to conflicts and errors on a regular basis. Most patient treatments consist of several services, which are performed by multiple healthcare providers, including doctors, nurses, technicians, and administrators. In addition, there are information editing and coding systems throughout the healthcare process. Conflicts and errors occur at each level of the insurance claim network. A conflict and error prevention and detection model (CEPDM) helps to identify causes of conflicts and errors. The CEPDM prevents and detects conflicts and errors through a rule-based mechanism, network-adaptive search, and auditing protocols. The CEPDM also offers rules and guidelines on how conflicts and errors are analyzed and interpreted. The rule-based mechanism compares the healthcare services and information provided in the claim, such as patient information, diagnosis, and treatment, with rules automatically generated from historical information. Conflicts and errors are identified when any of the predefined rules is violated. The network-adaptive search generates a network of constraints based on each claim, which are compared to other claims to identify conflicts and errors. The sequence of constraints and claims selected for comparison is dynamic and changes based on network topology, constraint relationships, and claim relationships. The auditing protocols evaluate the necessity of procedures and treatments based on clinical standards, insurance (coverage) policies/guidelines, and medical coding. A comprehensive and updated database is important to support the auditing protocols.
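The distinction between an error (one entity violating a constraint) and a conflict (inconsistent properties across entities) can be expressed in a few lines. The sketch below is an assumption-laden illustration, not the CEPDM itself: the `Entity` class, the property names, the dates, and the ages are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entity:
    name: str
    properties: dict


def find_errors(entity, constraints):
    """An error involves one entity: list the constraints that the entity violates."""
    return [label for label, is_satisfied in constraints if not is_satisfied(entity)]


def find_conflicts(first, second, shared_properties):
    """A conflict involves several entities: list shared properties whose values differ."""
    return [name for name in shared_properties
            if name in first.properties and name in second.properties
            and first.properties[name] != second.properties[name]]


# Hypothetical constraint: the claim must be submitted by its deadline.
CLAIM_CONSTRAINTS = [
    ("submitted by deadline",
     lambda e: e.properties["submitted_on"] <= e.properties["deadline"]),
]

claim = Entity("claim", {"submitted_on": date(2021, 3, 15),
                         "deadline": date(2021, 3, 1)})
provider_record = Entity("provider record", {"patient_age": 54})
insurer_record = Entity("insurer record", {"patient_age": 45})

if __name__ == "__main__":
    print("errors:", find_errors(claim, CLAIM_CONSTRAINTS))
    print("conflicts:", find_conflicts(provider_record, insurer_record, ["patient_age"]))
```

Keeping the two checks separate mirrors the point made in this section that an error and a conflict are handled differently: an error points to one entity to correct, while a conflict points to two or more entities whose records must be reconciled.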
References
1. American Hospital Association (AHA) (2010) Trends affecting hospitals and health systems. Available at: http://www.aha.org/aha/research-and-trends/chartbook/index.html. Accessed Aug 2020
2. An L, Yan J, Tong L (2005) An integrated rule-based and case-based reasoning system for customer service management. In: Proceedings of the IEEE international conference on e-business engineering (ICEBE’05)
3. Aqlan F, Yoon SW, Khasawneh MT, Srihari K (2011) A three-stage auditing system framework to minimize medical necessity claim denials. In: Proceedings of industrial engineering research conference
4. Arunasalam RG, Richie JT, Egan W, Gur-Ali O, Wallace WA (1999) Reengineering claims processing using probabilistic inductive learning. IEEE Trans Eng Manag 46(3):335–345
5. Barabasi AL, Albert R (1999) Emergence of scaling in random networks. Science 286:509–512
6. Borgatti SP (2006) Identifying sets of key players in a social network. Comput Math Organ Theory 12:21–34
7. California Medical Association (CMA) (2006) Time management and administrative simplification: running your practice efficiently and effectively. Available at: http://www.cmanet.org/bestpractices/best_practices_3.pdf. Accessed Dec 2020
8. Carter D (2002) Optimizing revenue by reducing medical necessity claims denials—Medicare/Medicaid. Healthc Financ Manag 6(10):88–94
9. Casto A, Layman E (2006) Principles of healthcare reimbursement. American Health Information Management Association, Chicago, IL
10. Chen XW, Landry SJ, Nof SY (2011) A framework of enroute air traffic conflict detection and resolution through complex network analysis. Comput Ind 62:787–794
11. Chen XW, Nof SY (2012) Conflict and error prevention and detection in complex networks. Automatica 48:770–778
12. Citerone B, Phillips B (2004) Two numbers you need to know. Healthc Financ Manag 46–49
13. Cornwell B (2005) A complement-derived centrality index for disconnected graphs. Connections 26(2):70–81
14. Daniels S (1999) Using hospital-based case management to reduce payer denials. Healthc Financ Manag 53(5):37–39
15. Dawande M, Mookerjee V, Sriskandarajah C, Zhu Y (2012) Structural search and optimization in social networks. INFORMS J Comput 24(4):611–623
16. Dittman DA, Morey RC (1984) A comparison of alternative Medicare reimbursement policies under optimal hospital pricing. Health Serv Res 18(2):137–164
17. FitzHenry F, Kiepek W, Shultz E, Byrd J, Doran J, Miller R (2002) Implementing outpatient order entry to support medical necessity using the patient’s electronic past medical history. In: AMIA 2002 annual fall symposium, pp 250–254
18. Freeman LC (1979) Centrality in social networks: conceptual clarification. Soc Netw 1:215–239
19. Goding R (2007) Insurance claims: a study in denials—the Baptist Health System. Project report, Army-Baylor University, Sam Houston, TX, USA
20. Jang C, Kim J, Jang H, Park H, Jang B, Kim B, Choi E (2009) Rule-based auditing system for software security assurance. In: ICUFN 2009. First international conference, vol l(1), pp 198–202
21. Kahan K, Bernstein S, Leape L, Hilborne L, Park R, Parker L, Kamberg C, Brook R (1994) Measuring the necessity of medical procedures. Med Care 32(4):357–365
22. Kennedy B (2007) A Six Sigma approach to denials management. Society for Health Systems 2009 Conference
23. Ko HS, Yoon SW, Nof SY (2011) Intelligent alert systems for error and conflict detection in supply networks. In: Proceedings of international federation of automation control
24. Kumar M, Ghani R, Mei Z (2010) Data mining to predict and prevent errors in health insurance claims processing. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining
25. Landry AY, Landry RJ (2009) Factors associated with hospital bankruptcies: a political and economic framework... with commentary by Nowak MC. J Healthc Manag 54(4):252–272
26. Lawrence R, Kimberly D, Geoffrey S, Thomas E, Victoria T, Richard G (2003) Reducing trauma payment denials with computerized collaborative billing. J Trauma-Injury Infect Crit Care 55(4):762–770
27. Lee GH (2008) Rule-based and case-based reasoning approach for internal audit of bank. Knowl-Based Syst 21(2):140–147
28. Morey RC, Dittman DA (1984) Hospital profit planning under Medicare reimbursement. Oper Res 32(2):250–269
29. Niedzwiecki MH (2006) The revenue cycle: what it is, how it works, and how to enhance it. AORN J 84(4):577–601
30. Olaniyan O, Brown IL, Williams K (2009) Managing medical necessity and notification denials. Healthc Financ Manag 63(8):62–67
31. Peng Y, Kou G, Sabatka A, Matza J, Chen Z, Khazanchi D, Shi Y (2007) Application of classification methods to individual disability income insurance fraud detection. In: Shi Y, van Albada GD, Dongarra J, Sloot P (eds) ICCS2007. LNCS, vol 4489. Springer, Berlin, Heidelberg, pp 852–858
32. Schiller J (2006) The impact of insurance fraud detection systems. J Risk Insur 73(3):421–438
33. So K, Tang C (2000) Modeling the impact of an outcome-oriented reimbursement policy on clinic, patients, and pharmaceutical firms. Manag Sci 46(7):875–892
34. Tyler M, Saifee M, Rahman S, Pathria A, Allmon A (2009) Healthcare insurance claim fraud and error detection using co-occurrence. United States Patent Application, Pub. No. US2009/0094064
35. U.S. Government Accountability Office (GAO) (2011) Private health insurance: data on application and coverage denials
36. Weeks MR, Clair S, Borgatti SP, Radda K (2002) Social networks of drug users in high-risk sites: finding the connections. AIDS Behav 6(2):193–206
37. Yang CL, Nof SY (2004) Design of a task planning conflict and error detection system with active protocols and agents. PRISM Research Memorandum No. 2004-P1, School of Industrial Engineering, Purdue University
38. Yang CL, Chen X, Nof SY (2005) Design of a production conflict and error detection model with active protocols and agents. In: 18th international conference on production research
Chapter 4
Smart and Autonomous Power Grid
The electrical power grid is a critical infrastructure and the backbone of modern civilization. The energy delivery infrastructure is central to the smooth operation of many economic, social, and societal aspects of day-to-day life and other critical infrastructures. Widespread unplanned power outages known as blackouts exemplify the importance of the power grid. Recent blackout examples include the 2003 Northeast blackout [16], the 2011 San Diego blackout [12], and the 2012 Indian blackout [11]. Unlike other critical infrastructures, the U.S. power grid is not owned, operated, or controlled by a single authoritative entity. The U.S. has seven Regional Transmission Organization (RTO) and Independent System Operator (ISO) entities, over 2000 public and investor-owned utilities, nearly 900 cooperatives, over 150 power marketers, nine federal power agencies [1], and over 100 balancing authorities spread across eight electric reliability regions. In recent years, distributed power generation systems (DPGS) such as wind and solar energy conversion systems that can trim energy bills and environmental pollution have increased their share in energy supply. More households are installing grid-connected wind turbines and photovoltaic (PV) panels. In the U.S., newly installed PV systems have had a compound annual growth of nearly 60% since 2010. Since 2008, solar power installations in the U.S. have grown 17-fold from 1.2 gigawatts (GW) to an estimated 32 GW, which is enough capacity to power the equivalent of 5.7 million average American homes. There is also a huge growth of installed wind energy conversion systems. For example, as of 2015, Denmark generates 40% of its electricity from wind. At least 83 other countries in the world use wind energy to supply the power grid. In 2014, global wind energy capacity expanded 16% to 369.553 GW. In the U.S., wind energy has increased rapidly over the last decade.
By the end of 2015, the overall installed wind energy generation capacity in the U.S. had reached 75 GW, exceeded only by China and the European Union. Wind power’s largest growth in capacity was in 2012, when 11,895 MW of wind energy conversion systems were installed, representing 26.5% of newly installed
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 X. W. Chen, Network Science Models for Data Analytics Automation, Automation, Collaboration, & E-Services 9, https://doi.org/10.1007/978-3-030-96470-2_4
power capacity. The U.S. wind industry has had an average annual growth of 25.8% over the last ten years. Many U.S. states, such as California, Hawaii, Texas, and Minnesota, have set ambitious plans to use more renewable energy resources. In 2015, the State of Hawaii set a goal of obtaining 100% of its electricity generation from renewable energy resources by 2045. Meanwhile, to reduce carbon-dioxide emissions and the greenhouse effect, there is a growing trend of retiring traditional fossil-fuel-based electricity generation capacity. For example, France aims to shut down all coal power plants by 2023. There is, however, a potential downside to integrating these new renewable energy resources into the existing grid. Most DPGS are set up to disconnect from the grid whenever they detect a significant fault. If a single household DPGS trips offline, the fault is limited to its owner. If hundreds or thousands of them trip simultaneously, which has become a reality as more DPGS are set up and join the grid, they upset the network’s delicate power balance, affect overall power system stability, and can turn an otherwise small disturbance into a blackout throughout a large region. The electrical power generated by DPGS is injected into the distribution lines. This requires power converters to convert the direct current (DC) output of solar PV panels, or the alternating current (AC) output of wind turbine generators, to standard utility power (e.g., 60 Hz in the U.S.). Commercial power converters can supply AC at the right voltage and frequency to the utility grid, but they are not capable of responding effectively to failures in DPGS. Smart and autonomous power converters can sense the state of a power grid using phasor measurement units, or synchrophasors, and prevent DPGS from going offline when they do not have to.
With smart and autonomous power converters, DPGS can help make the power grid more stable by preventing the sudden deterioration in voltage or frequency that would otherwise occur when a multitude of DPGS fail at the same time. As an example, suppose a coal-fired plant unexpectedly goes offline or a transmission line from the plant suddenly goes down. The loss might cause the grid voltage to dip by up to 10%. To restore the voltage to normal, technicians must quickly bring additional generation online from backup capacity, or they must reduce demand through an automated mechanism known as demand response, which enlists customers to reduce their electricity consumption when the electricity price is high or when grid reliability is threatened. An undesirable outcome is to lose more generators, but that often happens when a host of traditional power converters switch off en masse in response to the voltage drop. Converters disconnect to make sure that distributed solar PV and wind energy conversion systems do not inadvertently energize a downed power line. In the first few seconds of the voltage drop, however, it is critical that DPGS keep producing power in the section of the grid that has not been damaged. A smart power converter can “ride through” voltage or frequency drops and other transient disturbances.
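The ride-through idea can be illustrated with a minimal Python sketch. It is not the control logic of any real converter: the class `RideThroughPolicy`, the function `stay_connected`, and all threshold values are hypothetical and chosen only to show how a converter might distinguish a shallow, brief dip (such as the 10% dip mentioned above) from a fault that warrants disconnection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RideThroughPolicy:
    """Hypothetical ride-through settings (illustrative values only)."""
    normal_voltage_pu: float = 0.95  # at or above this, operation is normal
    min_voltage_pu: float = 0.85     # below this, the dip is treated as a fault
    max_dip_seconds: float = 3.0     # longest dip the converter rides through


def stay_connected(voltage_pu: float, dip_seconds: float,
                   policy: RideThroughPolicy = RideThroughPolicy()) -> bool:
    """Return True if the converter should keep feeding the grid."""
    if voltage_pu >= policy.normal_voltage_pu:
        return True  # no dip in progress
    shallow = voltage_pu >= policy.min_voltage_pu
    brief = dip_seconds <= policy.max_dip_seconds
    return shallow and brief  # ride through only shallow, brief dips


if __name__ == "__main__":
    # A 10% dip (0.90 per unit) lasting one second versus ten seconds.
    print(stay_connected(0.90, 1.0))
    print(stay_connected(0.90, 10.0))
```

A traditional converter corresponds to a policy that disconnects on any dip; the sketch differs only in tolerating dips that stay inside the assumed voltage and time limits.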
Fig. 4.1 Electrical schematic of a single dead-zone oscillator
4.1 Smart and Autonomous High-Power Converter
A smart grid is a connection of heterogeneous energy sources, such as solar PV arrays, fuel cells, wind energy conversion systems, and energy storage devices, which are interfaced with an electric distribution network that can be connected to and isolated from the traditional AC utility power grid. The control challenge is to regulate the amplitude and frequency of the inverters’ terminal voltage such that high-quality electricity delivery can be guaranteed. The ubiquitous existing control method for power converters is droop control, which linearly trades off the inverter-voltage amplitude and frequency with active and reactive power output [2, 3, 14, 18]. Apart from droop control, a pioneering time-domain control method [4–6] is virtual oscillator control (VOC; [8, 13, 15, 17]). The key idea behind VOC is to leverage the properties of oscillators. VOC inverters are regulated to mimic the dynamics of nonlinear dead-zone limit-cycle oscillators. The term “virtual oscillator” refers to the nonlinear oscillator that is programmed on a digital microprocessor. VOC presents appealing circuit-level (power inverter) and system-level (electrical grid) advantages. At the system level, synchronization emerges in connected electrical networks of VOC inverters, and primary-level voltage and frequency regulation is achieved in a decentralized fashion. At the circuit level, each inverter rapidly stabilizes arbitrary initial conditions and load transients to a stable limit cycle. VOC is fundamentally different from droop control: VOC acts on instantaneous time-domain signals, whereas droop control is based on electrical phasor quantities. VOC subsumes droop control in the sinusoidal steady state. Figure 4.1 shows the electrical schematic of the nonlinear dead-zone oscillator. Figure 4.2 is the diagram of the single-phase Space-Vector Pulse Width Modulation based VOC (SVPWM-VOC) smart power converter.
Figure 4.3 is the three-phase SVPWM-VOC converter.
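For comparison with VOC, the linear trade-off used by conventional droop control can be sketched in a few lines of Python. The sketch assumes the common form for inductive networks, in which frequency droops with active power and voltage amplitude droops with reactive power. The function name `droop_setpoints`, the nominal values, and the droop gains `m_p` and `n_q` are hypothetical and not taken from the cited references.

```python
def droop_setpoints(p_watts, q_vars, *, p_ref=0.0, q_ref=0.0,
                    f_nom=60.0, v_nom=120.0, m_p=1e-4, n_q=1e-3):
    """Return (frequency in Hz, voltage amplitude in V) under linear droop.

    All default values are illustrative assumptions:
      m_p : frequency droop gain in Hz per watt
      n_q : voltage droop gain in volts per var
    """
    frequency = f_nom - m_p * (p_watts - p_ref)  # more active power, lower frequency
    voltage = v_nom - n_q * (q_vars - q_ref)     # more reactive power, lower voltage
    return frequency, voltage


if __name__ == "__main__":
    # An inverter delivering 1 kW and 500 var with the assumed gains.
    print(droop_setpoints(1000.0, 500.0))
```

Because droop acts on averaged power measurements (phasor quantities), it responds more slowly than a controller acting on instantaneous signals, which is the contrast with VOC drawn in the text.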
4.2 Weather-Proof Smart Grid
An emerging feature of smart grid is weather-dependent power generation. The amount of power generated by solar PV arrays depends on the amount of light that the cells are exposed to. The amount of power generated by wind turbines depends
Fig. 4.2 Single-phase SVPWM-VOC converter
Fig. 4.3 Three-phase SVPWM-VOC converter
on the speed and duration of wind. There has been a sustained increase of solar and wind energy production distributed throughout the grid. Compared to a centralized power generation system, distributed power flowing into the grid is more difficult to control. The U.S. National Weather Service provides real time information indicating areas under tornado warnings and watches. These areas are identified in multiple formats, including the counties affected and geometries of affected areas. A database can be developed to monitor data release, identify affected generators, and issue grid control advisories [7]. For wind power generators, the most likely control is to shut
down wind turbines in severe weather conditions. For solar PV arrays, severe weather conditions indicate a significant decline in solar energy output. The U.S. National Weather Service also produces real-time, post-processed radar images of rain and cloud density roughly every ten minutes. The database for grid control can acquire the images as they are released and process them to identify areas of high cloud density and provide grid control advisories. Sequences of images can be used to create near-term predictions of cloud density, which are used to estimate solar energy output [9, 10]. In addition to radar images, the National Oceanic and Atmospheric Administration (NOAA) produces visible and infrared satellite images of cloud cover every half hour. The database can incorporate these images to help provide grid control advisories.
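The step of identifying generators inside a warning area can be sketched as a point-in-polygon test. The Python example below is a minimal illustration that uses the standard ray-casting method. The generator names, coordinates, and warning polygon are hypothetical, and a production system would more likely rely on a spatial database or a geometry library.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside the polygon?

    polygon is a list of (lon, lat) vertices in order around the boundary.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # one more crossing to the right of the point
    return inside


def affected_generators(generators, warning_polygon):
    """Return names of generators located inside the warning polygon."""
    return [name for name, (lon, lat) in generators.items()
            if point_in_polygon(lon, lat, warning_polygon)]


if __name__ == "__main__":
    # Hypothetical warning area and generator locations (lon, lat).
    warning = [(-90.0, 38.0), (-89.0, 38.0), (-89.0, 39.0), (-90.0, 39.0)]
    fleet = {"wind_A": (-89.5, 38.5), "solar_B": (-88.0, 38.5)}
    print(affected_generators(fleet, warning))
```

The returned list could then drive an advisory, for example shutting down the listed wind turbines or lowering the expected output of the listed PV arrays.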
4.3 Integration of High-Power Converters and Weather-Proof Smart Grid
The sustainability and resilience of the power grid depend on weather conditions and on economic and geographical factors. Natural and manmade disasters often occur unexpectedly and are difficult to predict. Multiple power converters may fail simultaneously due to tornadoes, floods, earthquakes, terrorist attacks, and other disasters. Accidents and incidents occur frequently and may cause catastrophic events if appropriate protection measures are not implemented. There are three steps to achieve a weather-proof smart grid through automation and data analytics. The first step is to develop prototype smart and autonomous power converters for a micro-grid. These prototypes are implemented to stabilize a relatively small and isolated electricity network, the micro-grid. A micro-grid can be connected to or disconnected from the larger utility power grid. When the micro-grid is isolated from the larger utility power grid, it generates power from its own generation units. In this case, it can be challenging to maintain stability because a micro-grid has a relatively small total load. The addition of a new load, such as a central air-conditioning unit, can cause a large disturbance to the micro-grid. Multiple load types and several types of small autonomous power converters should be tested. The second step is to develop full-scale smart and autonomous power converters and implement them in a utility-scale grid. Tests are conducted on a large collection of residential-scale inverters and loads connected to a simulated power grid. The inverters and loads interact realistically with the simulated power grid, allowing a range of scenarios to be run to determine whether the grid voltage and frequency stay within the desired normal operating conditions. The third step is to make off-the-shelf products of the smart and autonomous power converters.
The products should be compatible with the existing droop control widely used in the power grid. As more and more renewable energy is supplied to the utility power grid, this smart and autonomous space-vector pulse-width-modulated power converter scheme can help maintain a stable power grid.
References
1. American Public Power Association (APPA) (2013) Statistics from the 2012–2013 Public Power Annual Directory and Statistical Report
2. Bidram A, Davoudi A (2012) Hierarchical structure of microgrids control system. IEEE Trans Smart Grid 3(4):1963–1976
3. Chandorkar MC, Divan DM, Adapa R (1993) Control of parallel connected inverters in standalone AC supply systems. IEEE Trans Ind Appl 29:136–143
4. Johnson BB, Dhople SV, Hamadeh AO, Krein PT (2014) Synchronization of nonlinear oscillators in an LTI electrical power network. IEEE Trans Circuits Syst I Regul Pap 61:834–844
5. Johnson BB, Dhople SV, Cale JL, Hamadeh AO, Krein PT (2014) Oscillator-based inverter control for islanded three-phase microgrids. IEEE J Photovoltaics 4:387–395
6. Johnson BB, Dhople SV, Hamadeh AO, Krein PT (2014) Synchronization of parallel single-phase inverters with virtual oscillator control. IEEE Trans Power Electron 29:6124–6138
7. Maughan L, McKenney M, Benchley Z (2014) A model of aggregate operations for data analytics over spatiotemporal objects. In: Proceeding of the international conference on conceptual modeling, pp 234–240
8. Mauroy A, Sacré P, Sepulchre RJ (2012) Kick synchronization versus diffusive synchronization. In: Proceeding of the IEEE conference on decision and control, pp 7171–7183
9. McKenney M, Shelby R, Bagga S (2016) Implementing set operations over moving regions using the component moving region model. GeoInformatica 1–28
10. McKenney M, Frye R (2015) Generating moving regions from snapshots of complex regions. ACM Trans Spatial Algorithms Syst 1(1):4
11. National Post (2012) Pictures of Chaos as Massive India Blackout Leaves 670 Million without Power. http://news.nationalpost.com/2012/07/31/pictures-of-chaos-as-massive-india-blackout-leaves-670-million-without-power/. Accessed in Feb 2021
12. National University System Institute for Policy Research (2011) Preliminary report on the San Diego blackout economic impact. http://www.nusinstitute.org/Research/Briefs/PreliminaryReport-on-SanDiego-Blackout-Economic-Impact.html. Accessed in Feb 2021
13. Peng FZ, Lai J-S (1996) Generalized instantaneous reactive power theory for three-phase power systems. IEEE Trans Instrum and Measur 45
14. Pogaku N, Prodanovic M, Green T (2007) Modeling, analysis and testing of autonomous operation of an inverter-based microgrid. IEEE Trans Power Electron 22:613–625
15. Tôrres LAB, Hespanha JP, Moehlis J (2012) Power supplies dynamical synchronization without communication. In: Proceeding of the power & energy society 2012 general meeting
16. U.S. and Canada Power System Outage Task Force (2004) Final Report on the August 14th Blackout in the United States and Canada: Causes and Recommendations. https://reports.energy.gov/BlackoutFinal-Web.pdf. Accessed in Feb 2021
17. Wang F, Duarte J, Hendrix M (2009) Active and reactive power control schemes for distributed generation systems under voltage dips. In: IEEE energy conversion congress and exposition, pp 3564–3571
18. Zhong Q-C (2013) Robust droop controller for accurate proportional load sharing among inverters operated in parallel. IEEE Trans Industr Electron 60(4):1281–1290
Chapter 5
Water Distribution Systems
Water distribution systems (WDS) are a type of spatially organized system in which components are connected by pipes [20]. The interconnected components are nontrivial and interact in various ways through pipes; together they form a complex network, which comprises nodes and links [19]. Nodes include source nodes (e.g., reservoirs, tanks, and storage facilities), control and distribution nodes (e.g., pressure control valves, pipe junctions, and pumps), and demand nodes or sinks (e.g., consumers). Transmission and distribution pipes are links. A WDS may be characterized using several performance metrics. Redundancy refers to the existence of alternative water supply sources or paths. Path redundancy may be measured using three metrics; the larger they are, the more redundant the WDS is. The meshedness coefficient is the number of loops divided by the maximum possible number of loops. A loop provides two paths from a source to a customer. The link density is the number of links divided by the maximum possible number of links. Higher density indicates a more connected WDS. The clustering coefficient is the number of triangles divided by the maximum possible number of triangles. A triangle may be viewed as a loop of three nodes. If a link in a triangle breaks, the three nodes of the triangle remain connected. Another metric for the WDS is reliability or availability [2, 8, 9, 16], which is the probability of non-failure over time and is assessed using historical data on failures. Higher redundancy in a WDS increases its reliability. On the other hand, a more reliable WDS requires less redundancy, which reduces the cost of developing and maintaining the WDS. The third metric for the WDS is resilience or robustness, which indicates the overall structural tolerance of the WDS to errors and failures and may be measured using five different metrics. The average degree of nodes, θ, is the average number of links for a node in a WDS.
If a WDS can be described using a degree distribution, θ is the mean of the degree distribution. A WDS with a larger θ is usually more resilient to failures, although this conclusion depends on the structure of the WDS. The diameter is the maximum geodesic distance between all pairs of nodes. A WDS is a connected network: every node is connected to every other node directly or indirectly. The diameter indicates the size of a WDS. The average path length is the average geodesic distance between all pairs of nodes in a WDS. The central point dominance is the average difference in centrality between the most central point and all other nodes. The spectral gap is the difference between the first and second eigenvalues of the adjacency matrix of a WDS. The algebraic connectivity is the second smallest eigenvalue of the Laplacian matrix.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 X. W. Chen, Network Science Models for Data Analytics Automation, Automation, Collaboration, & E-Services 9, https://doi.org/10.1007/978-3-030-96470-2_5
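Several of the metrics above can be computed directly from a list of links. The following Python sketch is illustrative only. It assumes a connected, planar network with at least three nodes, so that the number of independent loops is m − n + 1 and the maximum possible number of loops is 2n − 5, where n is the number of nodes and m is the number of links. The function name `network_metrics` and the example network are hypothetical.

```python
from collections import deque


def network_metrics(n_nodes, links):
    """Compute simple redundancy and resilience metrics for a connected network.

    n_nodes: number of nodes, labelled 0..n_nodes-1
    links:   list of undirected (a, b) pairs with no duplicates
    """
    m = len(links)
    neighbours = {v: set() for v in range(n_nodes)}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Breadth-first search from every node gives all geodesic distances.
    total, longest = 0, 0
    for source in range(n_nodes):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in neighbours[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        longest = max(longest, max(dist.values()))

    return {
        "average_degree": 2 * m / n_nodes,
        "link_density": 2 * m / (n_nodes * (n_nodes - 1)),
        # Planar-graph assumption: loops = m - n + 1, maximum = 2n - 5.
        "meshedness": (m - n_nodes + 1) / (2 * n_nodes - 5),
        "diameter": longest,
        "average_path_length": total / (n_nodes * (n_nodes - 1)),
    }


if __name__ == "__main__":
    # Hypothetical five-node network: a square with one diagonal and one spur.
    example = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (3, 4)]
    for name, value in network_metrics(5, example).items():
        print(name, value)
```

The eigenvalue-based metrics (spectral gap and algebraic connectivity) are omitted here because they require a numerical linear algebra library.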
5.1 Design and Maintenance of WDS
Design of WDS aims to identify optimal pipe sizes that minimize the total cost and head deficit while satisfying the following conditions: (a) water demand must be met; (b) conservation of mass requires that inflows and outflows balance at each node; (c) conservation of energy requires that the head loss around a closed loop equal zero; (d) head loss in each pipe must be a function of the flow in the pipe and the pipe’s diameter, length, and hydraulic resistance; (e) pressure head and flow requirements indicate that a minimum pressure must be provided for any given set of demands; and (f) admissible pipe sizes require that diameters of pipes are selected from a set of commercially available ones. In a typical WDS, short and medium-size pipes are used at the city centers or in urban areas whereas a few long and large pipes carry water from suburban sources and reservoirs.
There are three types of maintenance policies for WDS [6, 20]: (a) corrective maintenance is to repair when a failure occurs or when the degradation level of the system reaches an unacceptable level; (b) condition-based maintenance is to repair when an indicator of the condition of the system reaches a predetermined level; and (c) preventive maintenance is to repair at predetermined time intervals, which are estimated based on historical data. Advancements in sensor technologies, chemical and physical non-destructive testing (NDT), and sophisticated measurement techniques have allowed better condition-based maintenance with lower cost and risk. There are three main tasks for condition-based maintenance. First, the maintenance team determines the condition indicator that can describe the condition of the system. A condition indicator can be a characteristic such as corrosion rate, crack growth, wear, or a lubricant condition such as viscosity.
Second, it is necessary to monitor the condition indicator and assess the current system condition from the measured data. Third, the limit value of the condition indicator is determined, and exceeding the limit triggers condition-based maintenance. When a system is subject to degradation and sudden failures (competing risks), the degradation state of the system is monitored continuously. When either the level of degradation reaches a predetermined threshold or a sudden failure occurs before the system reaches the degradation threshold, the system is immediately repaired (renewed) and restored to normal operation. This problem includes several processes and factors, including renewal process, degradation process, failure
process, operating times, repair times, and repair cost. A condition-based maintenance model needs to be formulated to describe and integrate these processes and factors. A maintenance policy must take into account properties of a WDS, historical data about failures, and water demand. For instance, if water demand is high and there is no tolerance for stopping water flow for an extended period of time, preventive maintenance is not the best option.
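The competing-risks rule described above, repair at whichever comes first, the degradation threshold or a sudden failure, can be sketched as follows. This is a minimal illustration rather than a full condition-based maintenance model. The function name `first_repair`, the discrete monitoring steps, and the sample readings are hypothetical, and repair times and costs are not modelled.

```python
def first_repair(degradation, threshold, failure_step=None):
    """Find when a monitored system is first repaired, and why.

    degradation:  degradation levels observed at monitoring steps 0, 1, 2, ...
    threshold:    predetermined degradation limit that triggers maintenance
    failure_step: step at which a sudden failure occurs, or None if none occurs

    Returns (step, cause), or None if no repair is triggered in the record.
    """
    for step, level in enumerate(degradation):
        if failure_step is not None and step >= failure_step:
            return step, "sudden failure"
        if level >= threshold:
            return step, "degradation threshold"
    return None


if __name__ == "__main__":
    readings = [0.1, 0.3, 0.6, 0.9]  # hypothetical condition-indicator values
    print(first_repair(readings, threshold=0.8))
    print(first_repair(readings, threshold=0.8, failure_step=1))
```

After each repair the system is treated as renewed, so the same function could be applied again to the next monitoring record.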
5.2 Data and Web Mining of the WDS in Washington D.C., U.S.: A Case Study
Washington D.C. (D.C. hereafter), the capital of the U.S., has a total area of 68.3 square miles, of which 61.4 square miles are land and 6.9 square miles are water. D.C. had a population of 601,723 in 2010 and was estimated to have a population of 632,323 in 2012 [17]. The water demand of D.C. Metropolitan Area was about 468 million gallons per day in 1997, whereas the D.C. population was 572,059 in the official census of 2000 (the census closest to the year 1997). The water demand in 2012 was estimated to be 517.30 million gallons per day (632,323/572,059 × 468 = 517.30). Residential water demand refers to the average water consumption in gallons per person per day from the utility in a particular supply area [11]. Economic, social, and environmental factors affect residential water demand. From the economic perspective, the household demand for water is a function of the direct demand for drinking purposes and the indirect demand for different household activities such as cooking, cleaning, personal hygiene, and gardening [11]. The price of water is one of the determinants that affect water demand per person, but demand is estimated to be mostly inelastic [1]. Even though water has no substitutes for basic uses, prices can play a significant role in demand management as long as the elasticity is not zero [1]. Non-residential water demand can be categorized into municipal and industrial water demand without tourism, municipal water demand restricted to tourism, and agricultural water demand [15]. Municipal and industrial water demand refers to water requirements for the commercial sector (e.g., shops, department stores, and hotels) and for the services sector (e.g., hospitals, offices, and schools). Municipal water demand restricted to tourism represents water demand for tourists who stay overnight [15]. Figures 5.1, 5.2, 5.3 and 5.4 indicate museums (total 13; Figs. 5.1 and 5.2), universities (total 10; Fig. 5.3), and hospitals (total eight; Fig. 5.4) in D.C. Most museums are located in the White House area. Universities are spread out across the city. Most hospitals are located close to each other. The area between the White House and South Dakota Avenue is crucial in terms of continuous water supply. Table 5.1 shows the total number of visits made to each museum in D.C. in 2012. Table 5.2 shows the total number of students and the number of on-campus residents at each university. Table 5.3 shows the hospitals in D.C. and the number of beds each
Fig. 5.1 Museums in D.C. (Part 1)
Fig. 5.2 Museums in D.C. (Part 2)
Fig. 5.3 Universities in D.C.
hospital has. In addition, Table 5.4 shows the water consumption of the Museum of Science, Environment and Climate of Lleida, Spain. Figure 5.5 shows major roads and intersections in D.C. Magenta lines indicate major roads, whereas light blue lines indicate the second-level roads that provide water to residential and industrial areas and are connected to major roads. Red circles indicate major intersections, whereas dark blue circles show the second-level intersections. It may be reasonably assumed that major water supply pipes are underneath the major roads and major pipe junctions are underneath the major intersections. The size of a pipe or junction is assumed to be proportional to the size of a road or intersection, respectively. Figure 5.6 is the AutoCAD drawing of D.C. Region 1 has all the museums and some hospitals. Region 2 has most of the hospitals. Region 3 includes most of the universities. The other four regions, 4, 5, 6, and 7, are large residential areas.
Fig. 5.4 Hospitals in D.C.
5.2.1 Water Supply in D.C.
In 1997, the WDS for D.C. Metropolitan Area supplied an average of 468 million gallons per day, and 96% of this demand was provided by three suppliers: the Fairfax County Water Authority (FCWA), servicing most of northern Virginia; the Washington Aqueduct Division (WAD) of the Army Corps of Engineers, servicing the District of Columbia, Arlington, and Falls Church; and the Washington Suburban Sanitary Commission (WSSC), servicing most of the Maryland suburbs. Seventy-five percent of the water supply to D.C. Metropolitan Area came from the Potomac River and the other 25% came from the Patuxent and Occoquan Rivers [18]. The FCWA provides water to Fairfax County and wholesales water to Virginia American Water Company for distribution to Alexandria and Dale City. The FCWA also supplies water under contract to the Prince William County Service Authority and the Loudoun County Sanitation Authority. The FCWA provided water to 385,000 households in 1995 [18]. Figure 5.7 shows that most of the service area of the FCWA
Table 5.1 Number of visits to museums in D.C. [7]

| Location in Figs. 5.1 and 5.2 | Museum | Number of visits |
| --- | --- | --- |
| 1 | Donald W. Reynolds Center for American Art and Portraiture | 1 million |
| 2 | National Museum of African Art | 353,959 |
| 3 | National Air and Space Museum | 6.8 million |
| 4 | National Museum of American History | 4.8 million |
| 5 | National Museum of the American Indian | 1.6 million |
| 6 | Hirshhorn Museum and Sculpture Garden | 753,258 |
| 7 | National Museum of Natural History | 7.6 million |
| 8 | Renwick Gallery of the Smithsonian American Art Museum | 162,051 |
| 9 | S. Dillon Ripley Center | 465,654 |
| 10 | Smithsonian Institution Building, “The Castle” | 1.4 million |
| 11 | Arthur M. Sackler Gallery & Freer Gallery of Art | 869,743 |
| 12 | National Postal Museum | 321,953 |
| 13 | Anacostia Community Museum | 31,168 |
| Total | | 26,157,786 |
Table 5.2 University students and residents in D.C. [14]

| Location in Fig. 5.3 | University | Number of on-campus residents | Student population |
| --- | --- | --- | --- |
| 1 | American University | 3906 | 12,380 |
| 2 | Catholic University of America | 4895 | 6894 |
| 3 | Corcoran College of Art and Design | N/A | 707 |
| 4 | Gallaudet University | 1175 | 1546 |
| 5 | George Washington University | 17,177 | 25,260 |
| 6 | Georgetown University | 11,648 | 17,130 |
| 7 | Howard University | 6032 | 10,583 |
| 8 | National Defense University | N/A | N/A |
| 9 | Potomac College | N/A | N/A |
| 10 | Radians College | N/A | N/A |
| Total | | 44,833 | 62,120 |
Table 5.3 Hospital beds in D.C. [14]

| Location in Fig. 5.4 | Hospital | Number of beds |
| --- | --- | --- |
| 1 | Children’s National Medical Center | 303 |
| 2 | George Washington University Hospital | 371 |
| 3 | Georgetown University Hospital | 609 |
| 4 | Howard University Hospital | 264 |
| 5 | National Rehabilitation Hospital | 137 |
| 6 | Providence Hospital | 512 |
| 7 | Psychiatric Institute of Washington | 104 |
| 8 | Sibley Memorial Hospital | 257 |
| 9 | Specialty Hospital of Washington | 82 |
| 10 | St. Elizabeth Hospital | 292 |
| 11 | United Medical Center | 234 |
| 12 | Washington Hospital Center | 926 |
| Total | | 4091 |

Table 5.4 Water consumption in museum of science, environment, and climate [10]

| Use | Usage/day | Liters/usage | Liters/day |
| --- | --- | --- | --- |
| Toilets | 120 | 3 | 360 |
| Urinals | 60 | 0 | 0 |
| Showers | 2 | 15 | 30 |
| Sinks | 180 | 2 | 360 |
| Cafeteria | 100 | 4 | 400 |
| Clinics | 1 | 240 | 240 |
| Irrigation | 0 | 0 | 0 |
| Mist/spraying | 1 | 344 | 344 |
| Water mass | 1 | 274 | 274 |
| Total | | | 2008 |
Fig. 5.5 Major roads and intersections in D.C.
is in Virginia; only Alexandria City is right next to D.C. The FCWA is not a main water supplier for D.C. The WAD supplies treated water to D.C.’s Water and Sewer Authority, the Arlington County Department of Public Works, and the Falls Church Department of Public Utilities. Falls Church distributes water to users in a large section of Fairfax County and in the town of Vienna. The WAD also furnishes water to federal facilities such as National Airport, Ft. Myer, and Arlington Cemetery. The raw water of the WAD comes mostly from the Potomac River through two conduits and the Great Falls; the remaining portion of raw water comes from the Little Falls pumping station, which is used mainly to meet demand during periods of low river flow. The WAD provided water to over 350,000 households in 1995 [18]. The WSSC supplies treated water to Prince George’s County, Montgomery County, and a small part of Howard County in Maryland. The WSSC withdraws water from the Patuxent River, where it is stored in two reservoirs, and from an intake on the Potomac River fourteen miles above the Chain Bridge. The WSSC provided water to almost 550,000 households in 1995 [18]. Figure 5.8 indicates that the service area of the WSSC surrounds D.C. but is not located within D.C. borders. The main water supplier for D.C. is the WAD, which also supplies water to part of Virginia. Assuming that the percentage of households in D.C. Metropolitan Area serviced by the three suppliers remained the same in 2012 as in 1995, it is estimated
Fig. 5.6 AutoCAD drawing of D.C.
that in 2012 the WAD provided 140.91 million gallons of water per day (517.30 × 27.24% = 140.91), where 27.24% is equal to 350,000/(385,000 + 350,000 + 550,000) (Fig. 5.9). The Washington Aqueduct is a federally owned and operated public water supply agency that produced an average of 180 million gallons of water per day in 2001 at two water treatment plants (WTPs) located in D.C., the Dalecarlia and McMillan WTPs [13]. Figure 5.10 shows the locations of the two WTPs. Scaling by population, the WAD is estimated to have provided 198.96 million gallons of water per day in 2012 (632,323/572,059 × 180 = 198.96). This estimate differs from the previous estimate, 140.91 million gallons, which is calculated using the percentage of households the WAD services. The percentage of non-residential water demand serviced by the WAD is not considered in the estimated 140.91 million gallons. The WAD serviced approximately one million citizens in D.C. and Virginia in 2001 [13]. The population of D.C. was 572,059 in 2000. The water demand of D.C. in 2012 is therefore estimated to be between 80.61 (140.91 × 572,059/1,000,000 = 80.61) and 113.82 million gallons per day (198.96 × 572,059/1,000,000 = 113.82). The total water demand of on-campus residents at universities in D.C. (Table 5.2) is between 5.7 (80.61 million/632,323 × 44,833 = 5,715,415) and 8.1 million gallons per day (113.82 million/632,323 × 44,833 = 8,070,072). Assuming that the amount of water consumed per use in the museums of D.C. is the same as in the Museum of Science, Environment and Climate in Spain, the water amount per use (Table 5.4) would be 233 gallons per day (882 L × 0.2642 = 233 gallons).
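The chain of estimates in this section is simple proportional scaling, and it can be retraced with a short Python sketch. The input figures come from the text; the function name `estimate_dc_demand` and the dictionary keys are hypothetical. Rounding the household share to two decimal places mirrors the 27.24% used above, so results may differ from the printed values in the last digit.

```python
POP_2000 = 572_059   # D.C. census population, 2000
POP_2012 = 632_323   # D.C. estimated population, 2012
HOUSEHOLDS_1995 = {"FCWA": 385_000, "WAD": 350_000, "WSSC": 550_000}
CAMPUS_RESIDENTS = 44_833  # total on-campus residents (Table 5.2)


def estimate_dc_demand():
    """Retrace the demand estimates; all volumes in million gallons per day."""
    growth = POP_2012 / POP_2000
    metro_2012 = 468 * growth  # metropolitan demand scaled from 1997

    # Share of households served by the WAD, rounded as in the text.
    wad_share_pct = round(
        100 * HOUSEHOLDS_1995["WAD"] / sum(HOUSEHOLDS_1995.values()), 2)
    wad_by_households = metro_2012 * wad_share_pct / 100
    wad_by_production = 180 * growth  # scaled from 2001 production

    # D.C. residents as a fraction of the one million people served by the WAD.
    dc_fraction = POP_2000 / 1_000_000
    dc_low = wad_by_households * dc_fraction
    dc_high = wad_by_production * dc_fraction

    return {
        "metro_2012": metro_2012,
        "wad_share_pct": wad_share_pct,
        "wad_by_households": wad_by_households,
        "wad_by_production": wad_by_production,
        "dc_low": dc_low,
        "dc_high": dc_high,
        "campus_low": dc_low / POP_2012 * CAMPUS_RESIDENTS,
        "campus_high": dc_high / POP_2012 * CAMPUS_RESIDENTS,
    }


if __name__ == "__main__":
    for name, value in estimate_dc_demand().items():
        print(f"{name}: {value:.2f}")
```

Keeping the two WAD estimates side by side makes the gap between the household-based and production-based figures explicit, which is the source of the range reported for D.C. demand.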
Fig. 5.7 Service area of the FCWA
Fig. 5.8 Service area of the WSSC
Fig. 5.9 Percentage of households in D.C. metropolitan area serviced by three water suppliers
Fig. 5.10 Dalecarlia and McMillan WTPs
Fig. 5.11 The D.C. WDS
5.2.2 The D.C. WDS
The D.C. WDS (Fig. 5.11) includes three reservoirs (the Dalecarlia Reservoir, the Georgetown Reservoir, and the McMillan Reservoir), two pumping stations (the Little Falls Pumping Station on the Potomac River and the Bryant Street Pumping Station, BSPS), two WTPs (the Dalecarlia WTP and the McMillan WTP), and other components such as pipes and junctions. The area right next to the Georgetown Reservoir includes several parks (e.g., Rock Creek Park) and American University (Fig. 5.12). The southeast part of D.C. includes parks (e.g., Fort Dupont Park) and St. Elizabeth Hospital (Fig. 5.13). Another pumping station, Fort Reno (Fig. 5.14), distributes treated water from the Fort Reno Reservoir [5]. Next to the BSPS is the water main for D.C. A water main is a primary pipeline used to move water from a purification and treatment plant to consumers. Water mains are also known as primary feeders, and they are connected to smaller water lines known as secondary feeders, which expand the reach of the WDS. Individual structures connect to the secondary feeders to tap into the water supply. A number of problems can cause water main breaks. When a water main breaks, shutoffs are used to isolate the failure so that workers can replace damaged pipes. Some cities experience infrastructure problems with water mains as a result of poor maintenance. In these cities, sections of old pipes have not been replaced due to concerns about expense or a lack of planning; these pipes fail and cause water main breaks.
Fig. 5.12 Area next to the Georgetown reservoir
Fig. 5.13 Southeast part of D.C.
Fig. 5.14 Fort Reno pumping station
The cost to repair water main breaks represents a significant portion of maintenance cost. Average direct and societal costs for a large-diameter water main break were estimated to be $1.7 million [4]. Factors that may cause water main breaks [12] include: (a) the quality and the age of pipes, including connectors and other equipment; (b) the type of environment in which pipes are laid; (c) the quality of the workmanship used in laying pipes; and (d) the service conditions such as pressure and water hammer. Figure 5.15 shows a revised version of the WDS in D.C. that includes both residential and non-residential demand points (numbered from 1 to 75). Table 5.5 shows residential demand points presented in Fig. 5.15. One additional demand point not included in Fig. 5.15 has a demand of 6,321,681 gallons per day and is located at latitude 38.9209 and longitude −77.0802. Table 5.6 shows non-residential demand points, including museums, universities, and hospitals. Blue represents hospitals, red represents universities, and green represents museums. Water demand in Tables 5.5 and 5.6 is calculated using the population of D.C. and the total estimated amount of water provided by the WAD in 2012. It is assumed that the daily water demand is 314.68 gallons per person in 2012 (198.98 M/632,323 = 314.68). Water demand for residential areas is calculated from the total demand for D.C. and the total non-residential water demand. Since the total demand is 198.98 million gallons and the total non-residential demand is approximately 4.299 million gallons, the total residential demand would be about 194.681 million
70
5 Water Distribution Systems
Fig. 5.15 Revised WDS in D.C.
gallons. This amount is allocated to different residential demand points based on the area of the region they are located. Tables 5.5 and 5.6 also include the coordinates (latitude and longitude) of demand points. Total water demand of museums in D.C. can be calculated using information in Table 5.4, which demonstrates the water consumption of Museum of Science, Environment and Climate of Lerida, Spain. The museums and exhibitions in Barcelona received 21 million visitors in 2010 [3]. This number includes almost 50 museums and art galleries. The average number of visitors per museum was 420,000 (21 million/50 = 420,000) in Catalan Area. The daily water consumption of Museum of Science, Environment and Climate of Lerida was about 530 gallons (2008 L × 0.2641 = 530.46) and it had almost 1150 visitors per day (420,000/365 = 1150.68). The daily water consumption per person was 0.46 gallon (530/1150 = 0.46). This amount is used to calculate water demand of museums in D.C. by assuming that these museums have similar water usage areas as museums in Spain had. Table 5.7 shows the junction points and their coordinates.
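The arithmetic above can be checked with a short script. The following Python sketch is illustrative only: all function and constant names are hypothetical, the liter-to-gallon factor (0.264172 U.S. gallons per liter) is the standard conversion rather than a value quoted in the text, and `museum_demand` assumes that a museum's daily demand equals its daily visits multiplied by the per-visitor figure, which appears consistent with the museum rows of Table 5.6 but is not stated explicitly.

```python
# Figures quoted in the text for the District of Columbia in 2012.
TOTAL_DEMAND_GAL_PER_DAY = 198.98e6     # total daily water demand
POPULATION = 632_323                    # population of D.C.
NON_RESIDENTIAL_GAL_PER_DAY = 4.299e6   # approximate non-residential demand

# Figures used for the museum estimate.
GALLONS_PER_LITER = 0.264172            # assumption: standard U.S. conversion
MUSEUM_LITERS_PER_DAY = 2008            # Lerida museum daily consumption
VISITORS_PER_YEAR = 21e6                # museum visitors in 2010 [3]
NUMBER_OF_MUSEUMS = 50


def per_capita_demand():
    """Daily demand per person in gallons."""
    return TOTAL_DEMAND_GAL_PER_DAY / POPULATION


def residential_demand():
    """Total residential demand = total demand - non-residential demand."""
    return TOTAL_DEMAND_GAL_PER_DAY - NON_RESIDENTIAL_GAL_PER_DAY


def gallons_per_visitor():
    """Per-visitor demand, following the rounding used in the text."""
    museum_gallons = round(MUSEUM_LITERS_PER_DAY * GALLONS_PER_LITER)      # 530
    visitors_per_day = int(VISITORS_PER_YEAR / NUMBER_OF_MUSEUMS / 365)    # 1150
    return round(museum_gallons / visitors_per_day, 2)


def museum_demand(visits_per_year):
    """Assumed daily demand of a museum, in gallons, from its yearly visits."""
    return round(visits_per_year / 365 * gallons_per_visitor())


if __name__ == "__main__":
    print(f"per capita: {per_capita_demand():.2f} gal/day")
    print(f"residential: {residential_demand() / 1e6:.3f} million gal/day")
    print(f"per visitor: {gallons_per_visitor()} gal")
    print(f"museum with 7.6 million visits/year: {museum_demand(7.6e6)} gal/day")
```

Keeping the rounding steps explicit makes it easier to see where the 0.46 gallon figure comes from and how sensitive the museum estimates are to it.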
Table 5.5 Residential demand points in D.C.
Demand point in Fig. 5.15   Water demand (gallons/day)   Latitude   Longitude
1     7,396,043    38.8948   −76.9291
2     7,163,531    38.8870   −76.9223
3     35,734,308   38.8769   −76.9383
4     5,296,183    38.8638   −76.9542
6     9,885,695    38.8616   −76.9918
8     11,153,715   38.9306   −76.9709
9     5,608,337    38.9361   −76.9815
12    19,030,380   38.9669   −77.0226
13    16,664,525   38.9786   −77.0292
14    3,440,368    38.9809   −77.0550
16    6,214,134    38.9591   −77.0577
18    3,596,067    38.9455   −77.1003
21    8,422,287    38.9303   −77.1018
29    6,230,340    38.9353   −77.0846
30    6,049,313    38.9238   −77.0782
31    6,212,620    38.9421   −77.0856
32    9,885,021    38.8392   −77.0124
47    3,217,445    38.9123   −76.9867
54    2,800,062    38.8969   −77.0193
55    1,056,990    38.8988   −77.0454
(continued)
Table 5.5 (continued)
Demand point in Fig. 5.15   Water demand (gallons/day)   Latitude   Longitude
56    620,663      38.8946   −77.0509
57    606,578      38.9055   −77.0369
58    643,708      38.9055   −77.0369
59    628,026      38.9005   −77.0321
60    386,777      38.8943   −77.0060
61    1,088,016    38.8900   −77.0001
62    1,155,228    38.8827   −77.0004
63    1,155,240    38.8836   −77.0082
64    2,835,165    38.8763   −77.0152
65    728,746      38.8857   −77.0158
66    3,624,914    38.8773   −77.0071
67    1,640,220    38.9055   −77.0062
68    1,116,607    38.9121   −76.9829
69    4,489,587    38.8945   −76.9794
70    630,543      38.8872   −76.9930
71    793,022      38.8847   −77.0070
74    562,282      38.8852   −76.9831
75    461,434      38.8878   −76.9988
183   6,321,681    38.9209   −77.0802
Table 5.6 Non-residential demand points in D.C.
Demand point in Fig. 5.15   Name   Water use   Water demand (gallons/day)   Latitude   Longitude
5    St. Elizabeth Hospital                      292 beds           91,887        38.8468   −76.9917
7    The Specialty Hospital of Washington        82 beds            25,804        38.8273   −77.0130
10   Providence Hospital                         512 beds           161,117       38.9460   −76.9924
11   Catholic University of America              4895 residents     1,540,363     38.9368   −76.9970
15   Potomac College                             (not listed)       (not listed)  38.9532   −77.0791
17   American University Park (2)                (not listed)       (not listed)  38.9502   −77.0883
19   American University (1)                     3096 residents     974,252       38.9375   −77.0878
20   Psychiatric Institute of Washington         104 beds           32,727        38.9781   −76.9359
22   Sibley Memorial Hospital                    257 beds           80,873        38.9384   −77.1092
23   Georgetown University (1)                   11,648 residents   5358          38.9085   −77.0645
24   Georgetown University Hospital              609 beds           191,641       38.9158   −77.0756
25   Georgetown University Campus (2)            (not listed)       (not listed)  38.9303   −77.1018
26   George Washington University                17,177 residents   7901          38.9021   −77.0485
27   The George Washington University Hospital   371 beds           116,747       38.9037   −77.0503
28   Corcoran College of Art + Design            (not listed)       (not listed)  38.9168   −77.0686
33   United Medical Center                       234 beds           73,635        38.8376   −76.9839
(continued)
Table 5.6 (continued)
Demand point in Fig. 5.15   Name   Water use   Water demand (gallons/day)   Latitude   Longitude
34   Smithsonian Anacostia Community Museum                       31,168 visits/year        39     38.8592   −76.9768
35   Renwick Gallery                                              162,051 visits/year       204    38.9011   −77.0389
36   National Museum of American History                          4.8 million visits/year   6049   38.8932   −77.0300
37   Smithsonian National Museum of Natural History               7.6 million visits/year   9578   38.8952   −77.0261
38   Arthur M. Sackler Gallery & Freer Gallery of Art             869,743 visits/year       1096   38.8907   −77.0273
39   Smithsonian Institution                                      1.4 million visits/year   1764   38.8914   −77.0260
40   National Museum of African Art                               353,959 visits/year       446    38.8906   −77.0255
41   Hirshhorn Museum and Sculpture Garden                        753,258 visits/year       949    38.8900   −77.0228
42   National Air and Space Museum                                6.8 million visits/year   8570   38.8905   −77.0200
43   National Museum of the American Indian                       1.6 million visits/year   2016   38.8908   −77.0160
44   Donald W. Reynolds Center for American Art and Portraiture   1 million visits/year     126    38.9018   −77.0229
45   National Postal Museum                                       321,953 visits/year       406    38.9001   −77.0082
(continued)
Table 5.6 (continued)
Demand point in Fig. 5.15   Name   Water use   Water demand (gallons/day)   Latitude   Longitude
46   Radians College                             (not listed)          (not listed)  38.9119   −77.0199
48   National Defense University                 (not listed)          (not listed)  38.8689   −77.0147
49   Howard University                           6032 residents        83,076        38.9256   −77.0217
50   Howard University Hospital                  264 beds              83,076        38.9202   −77.0203
51   DaVita Children’s National Medical Center   303 beds              95,348        38.9292   −77.0147
52   Washington Hospital Center                  926 beds              291,395       38.9314   −77.0148
53   National Rehabilitation Hospital            137 beds              43,111        38.9341   −77.0124
72   S. Dillon Ripley Center                     465,654 visits/year   587           38.9119   −77.0200
73   Gallaudet University                        1175 residents        369,750       38.9110   −76.9924
Table 5.7 Junction points and coordinates
Junction   Latitude   Longitude   Junction   Latitude   Longitude   Junction   Latitude   Longitude
76    38.8896   −76.9535   112   38.8989   −77.0535   148   38.9153   −77.0675
77    38.8783   −76.9613   113   38.9234   −77.0366   149   38.9073   −77.0629
78    38.8448   −77.0081   114   38.9627   −77.0359   150   38.9047   −77.0622
79    38.8297   −77.0170   115   38.9288   −77.0545   151   38.8982   −77.0509
80    38.9056   −76.9704   116   38.9234   −77.0519   152   38.9011   −77.0463
81    38.9068   −76.9790   117   38.9234   −77.0583   153   38.9026   −77.0465
82    38.9002   −76.9835   118   38.9297   −77.0583   154   38.9026   −77.0418
83    38.9019   −76.9790   119   38.9377   −77.0860   155   38.9026   −77.0319
84    38.8993   −76.9794   120   38.9331   −77.0910   156   38.9026   −77.0296
85    38.8924   −76.9839   121   38.9276   −77.1044   157   38.9068   −77.0366
86    38.9170   −76.9779   122   38.9019   −76.9426   158   38.9286   −77.0732
87    38.9074   −77.0092   123   38.8896   −76.9245   159   38.9294   −77.0740
88    38.8987   −76.9765   124   38.8727   −76.9487   160   38.9011   −77.0186
89    38.8935   −76.9772   125   38.8702   −76.9625   161   38.8954   −77.0038
90    38.8862   −77.0001   126   38.8647   −76.9759   162   38.8921   −77.0301
91    38.8834   −77.0063   127   38.8599   −77.0032   163   38.8875   −77.0261
92    38.8898   −77.0075   128   38.8515   −77.0066   164   38.8923   −77.0153
93    38.8765   −77.0044   129   38.8357   −77.0111   165   38.8922   −77.0200
94    38.8763   −77.0138   130   38.8390   −77.0177   166   38.8846   −77.0181
95    38.8858   −77.0227   131   38.9314   −76.9716   167   38.8846   −77.0174
96    38.8966   −77.0071   132   38.9587   −77.0078   168   38.8793   −77.0129
97    38.9180   −76.9785   133   38.9533   −77.0117   169   38.8766   −77.0069
98    38.9127   −76.9927   134   38.9368   −77.0241   170   38.8721   −77.0160
99    38.9001   −77.0145   135   38.9276   −77.0298   171   38.8790   −76.9957
100   38.8874   −77.0319   136   38.9699   −77.0254   172   38.8864   −77.0007
101   38.9027   −77.0232   137   38.9478   −77.0365   173   38.8875   −77.0075
102   38.9057   −77.0319   138   38.9610   −77.0665   174   38.8879   −76.9951
103   38.9025   −77.0338   139   38.9610   −77.0771   175   38.8863   −76.9872
104   38.9026   −77.0394   140   38.9490   −77.0806   176   38.8865   −76.9872
105   38.9057   −77.0409   141   38.9377   −77.0856   177   38.8898   −76.9883
106   38.9071   −77.0366   142   38.9433   −77.0893   178   38.9040   −76.9742
107   38.9093   −77.0433   143   38.9467   −77.0976   179   38.8977   −77.0088
108   38.9097   −77.0298   144   38.9477   −77.0994   180   38.9133   −77.0093
109   38.9024   −77.0503   145   38.9270   −77.0962   181   38.8898   −76.9772
110   38.8955   −77.0475   146   38.9270   −77.0865   182   38.9205   −77.0168
111   38.8972   −77.0552   147   38.9176   −77.0691
References
1. Arbués F, García-Valiñas MÁ, Martínez-Espiñeira R (2003) Estimation of residential water demand: a state-of-the-art review. J Socio-Econ 32(1):81–102
2. Belovich SG (1995) A design technique for reliable networks under a non-uniform traffic distribution. IEEE Trans Reliab 44:377–387
3. Catalan News Agency (2011) Museums and exhibitions in Catalonia receive more than 21 million visitors in 2010. http://www.catalannewsagency.com/news/culture/museums-and-exhibitions-catalonia-receive-more-21-million-visitors-2010. Accessed Apr 2021
4. Davis P, Sullivan E, Marlow D, Marney D (2013) A selection framework for infrastructure condition monitoring technologies in water and wastewater networks. Expert Syst Appl 40(6):1947–1958
5. District of Columbia Water and Sewer Authority (2011) Rehabilitation and upgrade of the Fort Reno Pumping Station. http://www.dcwater.com/workzones/projects/pdfs/Fort_Reno_Pumping_Station_Rehabilitation.pdf. Accessed Feb 2021
6. Luong HT, Fujiwara O (2002) Fund allocation model for pipe repair maintenance in water distribution networks. Eur J Oper Res 136:403–421
7. Newsroom of the Smithsonian (2012) Visitors statistics. http://newsdesk.si.edu/about/stats. Accessed March 2021
8. Ostfeld A (2001) Reliability analysis of regional water distribution systems. Urban Water 3:253–260
9. Ostfeld A, Kogan D, Shamir U (2002) Reliability simulation of water distribution systems – single and multiquality. Urban Water 4:53–61
10. Sagrera A, Lopez F, Wadel G, Volpi L (2011) Efficiency in water use: a museum that saves 85%. http://unaus.eu/index.php/blog/22-efficiency-in-water-use-a-museum-that-saves-85. Accessed March 2021
11. Schleich J, Hillenbrand T (2009) Determinants of residential water demand in Germany. Ecol Econ 68(6):1756–1769
12. Shamir U, Howard CDD (1979) An analytical approach to scheduling pipe replacement. J AWWA 248–258
13. U.S. Army Corps of Engineers and EA Engineering, Science, and Technology (2001) Water quality studies in the vicinity of Washington aqueduct. http://washingtonaqueduct.nab.usace.army.mil/Residuals/Pubs/RefDocs/EA_NPDES_PermitStudy.pdf. Accessed Feb 2021
14. U.S. News & World Report (2012) Best hospital rankings. http://health.usnews.com/best-hospitals. Accessed March 2021
15. Vanham D, Millinger S, Pliessnig H, Rauch W (2011) Rasterised water demands: methodology for their assessment and possible applications. Water Resour Manag 25(13):3301–3320
16. Wang Y, Au SK (2009) Spatial distribution of water supply reliability and critical links of water supply to crucial water consumers under an earthquake. Reliab Eng Syst Saf 94:534–541
17. Washington D.C. Office of Planning State Data Center (2010) District of Columbia census 2010 demographic and housing profiles. http://planning.dc.gov/DC/Planning/DC+Data+and+Maps/DC+Data/Tables/Data+by+Geography/Wards/Ward+7/Demographic++and+Housing+Profiles+2010+by+Ward. Accessed Feb 2021
18. Water Supply Task Force, League of Women Voters of the National Capital Area (1999) Drinking water supply in the Washington, D.C. metropolitan area: prospects and options for the 21st century
19. Yezdani A, Jeffrey P (2011) Complex network analysis of water distribution systems. Chaos 1:1–11
20. Zhu Y, Elsayed EA, Liao H, Chan LY (2010) Availability optimization of systems subject to competing risk. Eur J Oper Res 202:781–788
Chapter 6
Transportation Systems
Transportation systems are interconnected and cover large geographic regions. The functions of one component in a system may have a substantial impact on other components in the same system or in interconnected systems. Conflicts and errors (CEs) are ubiquitous in complex systems. In transportation systems, for example, a conflict between vehicles occurs when they try to use a resource at the same time (e.g., crossing a four-way intersection from two different roadways); the conflict must be resolved to avoid collisions. As another example, the on-board equipment (OBE) of a connected vehicle (CV) may fail, so that the vehicle loses communications with other vehicles and roadside equipment (RSE). This error may propagate and cause conflicts between vehicles. Cyber components in complex systems are abundant and provide an increasingly large amount of data, which may be used for the prevention and detection of, and recovery from, conflicts and errors. Sustainability and resilience of complex systems depend on social, economic, environmental, and geographical factors. On a large scale, natural and man-made disasters often occur unexpectedly and are difficult to predict. For example, OBE on multiple vehicles may fail simultaneously due to tornadoes, floods, earthquakes, oil spills, terrorist attacks, and other disasters. It is essential to understand how transportation systems can continue operating under extreme events. On a smaller scale, traffic flows and patterns may affect system performance. For example, an accident causes a traffic jam, which in turn causes a spike in communications for RSE in the area. RSE may fail due to communications overload at the time it is most needed. Transportation systems must be effective, efficient, scalable, and adaptive to dynamic behavioral and structural changes. CEs propagate within and between cyber and physical components in transportation systems, which may be modeled as graphs or networks in which nodes interact with each other through links.
Many tools in graph theory [14] have been developed with the specific goal of cleaning and protecting “contaminated” parts of a graph efficiently. Transportation systems may be analyzed for parameters such as tree-width, clique-width, and other metrics, and appropriate algorithms may be applied to isolate “contaminated” parts and subsequently contain and resolve CEs. There are several challenges, however. First, most methods are domain specific; they may be effective for one isolated application but cannot adapt to other applications. For example, there are different methods for hardware diagnostics [68, 69] and software testing [6, 24]. It is necessary to develop a cross-cutting knowledge representation structure as fundamental scientific and engineering principles for complex systems across all application sectors. Second, multiple CEs may occur simultaneously in a large yet distributed system. As conflicts and errors propagate and their number increases, they become more difficult to resolve and require more resources for an effective and efficient response. Resources such as manpower, time, and capacity are often limited. An optimal sequence of actions, such as the order of resolutions and the allocation of limited resources, must be identified in real time. Algorithms must be developed to dynamically optimize the outcome of resolving CEs under resource constraints [19, 20]. Third, CEs in large systems are temporally and spatially related. There have been efforts to explore the structure of connected CEs (e.g., [26]). The main obstacle is to identify and assess the root causes of a large number of interconnected CEs and to determine the resolutions, and the sequence of resolutions, that minimize the negative impact. It is necessary to capture the intricate relations between CEs and use structure- and behavior-adaptive algorithms for safe, reliable, and effective resolutions [18].
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 X. W. Chen, Network Science Models for Data Analytics Automation, Automation, Collaboration, & E-Services 9, https://doi.org/10.1007/978-3-030-96470-2_6
6.1 Connected Autonomous Vehicles
An autonomous vehicle is capable of sensing its environment and navigating without human input. Autonomous vehicles sense their surroundings with technologies such as radar, Light Detection and Ranging (LiDAR), Global Positioning System (GPS), and computer vision. Advanced control systems on board the vehicle interpret sensor information to identify appropriate navigation paths and obstacles and to interpret relevant signs. Significant advances have been made in technology and legislation relevant to autonomous vehicles. Several major companies, including Tesla, Google, Nissan, Toyota, Audi, BMW, and others, have developed working autonomous prototypes. In June 2011, the State of Nevada became the first jurisdiction in the U.S. to enact legislation concerning the operation of autonomous vehicles for testing purposes using professional drivers. Autonomous vehicles have the potential to generate benefits that are consistent with the objectives of the Connected Vehicle (CV) initiatives [71], such as reducing traffic crashes and congestion and improving fuel efficiency [34].
The CV testbeds and associated interoperable environments provide supporting vehicles, infrastructure, and equipment to serve the needs of CE modeling [40]. A CV testbed includes multiple road intersections that are capable of broadcasting signal phase and timing (SPAT) and geometric intersection description (GID). The testbed has vehicles that are equipped with SPAT-capable devices. Aside from SPAT and GID, each intersection is capable of broadcasting traveler information messages (TIM) and provides IPv6 network access. Wireless connectivity among vehicles (vehicle-to-vehicle, or V2V, communications) and between the infrastructure and vehicles (vehicle-to-infrastructure, or V2I, communications) is supported by dedicated short-range communications (DSRC), an open-source protocol for wireless communication in the 5.850 to 5.925 GHz band (5.9 GHz band). DSRC emits messages 10 times per second up to a range of 1,000 m and has low latency (0.02 s) with limited interference.
To control multiple connected autonomous vehicles, a network structure of constraints represents the sophisticated relations in domain knowledge as large-scale interacting constraints. Advanced algorithms are developed that use the topological information in the structure and reason on multi-scale data extracted from the network structure to detect and resolve CEs. The network structure of constraints evolves from constraint networks, which were originally designed as representations for solving constraint satisfaction problems (CSP) [49]. Research shows that constraint networks are capable of representing both quantitative and qualitative knowledge as continuous and binary variables and constraints. Error detection is achieved through an inconsistency check of constraints. The development in complex networks has successfully demonstrated the use of advanced constraint networks, which have deterministic links that represent inclusive, exclusive, or independent relations between constraints [19, 20]. Most engineered systems can be viewed as having a multi-layer architecture, as systems allocate different responsibilities to different layers. The non-deterministic relations between constraints enable dynamic knowledge integration and updating through contemporizing the structure and weight of links.
By using dependencies, shared variables, and interactions between constraints, a network structure can be constructed that allows reasoning with data at different scales. The network structure supports multi-granularity, generic, non-deterministic relations, and transformability to enable modeling and representation of different complex systems in engineering and scientific fields. There are at least four general classes of network architectures: fully centralized, Bose-Einstein condensation, scale-free, and random. Below is a brief description of how these network architectures represent the control structure of connected constraints (nodes), which may represent CV, semi-autonomous vehicles, or autonomous vehicles.
1. Fully centralized networks. Centralized networks have been the primary network structure for small-scale communications and computer networks [66]. A centralized network is one where each vehicle, computer, workstation, or communication node in the network connects to the central server directly, which forms a typical client/server architecture. A centralized network has a star topology, with the server in the center and links connecting the server and clients.
2. Bose-Einstein condensation networks. Bose-Einstein condensation networks [12] model the competitive nature of networks. A fitness parameter $\xi_i$ is assigned to each node $i$. A node with higher $\xi_i$ has a higher probability of obtaining links. $\xi_i$ is randomly chosen from a distribution $\rho(\xi)$. When $\rho(\xi)$ follows certain distributions (e.g., $\rho(\xi) = (\lambda + 1)(1 - \xi)^{\lambda}$, $\lambda > 1$), a Bose-Einstein condensation network forms. The Bose-Einstein condensation network shows the “winner-takes-all” phenomenon observed in competitive networks. The fittest node acquires a finite fraction of the total links (about 80%) in the network [12]. The fraction is independent of the order of the network. The concept of Bose-Einstein condensation networks arose out of studies of the behavior of certain gases at temperatures close to absolute zero. The fittest node corresponds to the lowest energy level in the Bose gas. The fact that the fittest node acquires most links corresponds to the phenomenon that many particles are at the lowest energy level when the temperature is close to absolute zero.
3. Scale-free networks. Scale-free networks have been studied extensively [2, 8, 15] and capture the topology of many real-world networks. In scale-free networks, the probability $Pr$ that a node has degree $\theta$ (the number of links connected to the node) follows a power law distribution, $Pr \propto \theta^{-\gamma}$, where $\gamma$ is between 2.1 and 4 for real-world scale-free networks [8]. In contrast to Bose-Einstein condensation networks, scale-free networks are formed following the rule that older nodes have a higher probability of obtaining links.
4. General random networks. Random networks [28–30, 65] consist of nodes connected to each other with probability $Pr$. Most random networks assume that a node does not have a link to itself. In a random network with $n$ nodes and probability $Pr$ to connect any pair of nodes, the maximum number of links in the network is $\frac{n(n-1)}{2}$. The probability that a node has degree $\theta$ is $\binom{n-1}{\theta} Pr^{\theta} (1 - Pr)^{n-1-\theta}$, which is also the fraction of nodes in the network that have degree $\theta$. The mean degree is $\bar{\theta} = (n - 1)Pr$. The random network has two important properties: (a) phase transition or bond percolation [4, 55, 56, 65]. There is a phase transition from a fragmented random network for the average degree $\bar{\theta} \le 1$ to a random network dominated by a large component for the average degree $\bar{\theta} > 1$; and (b) critical probability $Pr_c$ [28–30]. For many properties of a random network, there exists a critical probability $Pr_c$ such that if $Pr$, the probability to connect any pair of nodes, grows slower than $Pr_c$ as $n \to \infty$, the random network fails to have those properties. If $Pr$ grows faster than $Pr_c$, the random network has those properties. For instance, the probability of having a triple (a single node connected to two other nodes) is negligible if $Pr < cn^{-1/2}$ for some constant $c$, but tends to one as $n$ becomes large if $Pr > cn^{-1/2}$. Triples, which indicate multiple-vehicle conflicts (those that involve more than two vehicles), are more difficult to resolve.
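The quantities given for random networks in item 4 can be evaluated directly. The following is a minimal Python sketch using only the standard library; the function names are illustrative and do not come from the source, and the probability of connecting a pair of nodes is written `p`.

```python
from math import comb


def max_links(n):
    """Maximum number of links in a network of n nodes without self-links."""
    return n * (n - 1) // 2


def degree_probability(n, p, theta):
    """Probability that a node has degree theta when each pair of nodes
    is connected independently with probability p (binomial distribution)."""
    return comb(n - 1, theta) * p**theta * (1 - p) ** (n - 1 - theta)


def mean_degree(n, p):
    """Mean degree of the random network, (n - 1) * p."""
    return (n - 1) * p


if __name__ == "__main__":
    n, p = 101, 0.02
    print("maximum number of links:", max_links(n))
    print("mean degree:", mean_degree(n, p))
    for theta in range(5):
        print(theta, round(degree_probability(n, p, theta), 4))
```

With these illustrative values the mean degree is above 1, so by property (a) the network would be expected to be dominated by a large component.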
A node controls or coordinates with another node if and only if the two nodes are directly connected (communicate with each other). From the collaborative control perspective, centralized networks have the central server, for example, a vehicle or RSE, that controls all other nodes in the network. Bose-Einstein condensation networks are mostly centralized networks in which the fittest node controls most nodes (for example, 80%) in the network. Scale-free networks are mostly decentralized networks in which the few nodes with large degree control a portion of nodes in the network. Random networks are fully decentralized networks without any central control. The other dimension of collaborative control is the scale of collaboration, which indicates how many nodes may participate in collaboration and is determined by the order of the largest component. As defined in Chap. 1, let $k$ be the index of components in a network and $n_k$ be the order or the number of nodes in component $k$. $\max_k n_k$ is the order of the largest component in a network of $n$ nodes. $\frac{\max_k n_k}{n}$ is the portion of nodes in the largest component. $\frac{1}{n} \le \frac{\max_k n_k}{n} \le 1$. $\frac{\max_k n_k}{n} = 1$ for centralized networks. Larger $\frac{\max_k n_k}{n}$ indicates more nodes share information whereas smaller $\frac{\max_k n_k}{n}$ indicates fewer nodes share information. For centralized control, $\frac{\max_k n_k}{n}$ determines the maximum number of nodes the central server can control. For decentralized control, $\frac{\max_k n_k}{n}$ determines the number of nodes that can collaborate.
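The portion of nodes in the largest component can be computed from a list of links. The sketch below is one possible way to do this, using a union-find structure; the function name and the representation of nodes as integers 0 to n − 1 are assumptions made for illustration and are not taken from the source.

```python
def largest_component_fraction(n, links):
    """Return (order of the largest component) / n for a network with
    nodes 0..n-1 and undirected links given as (a, b) pairs."""
    parent = list(range(n))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components joined by each link.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Count the nodes in each component.
    sizes = {}
    for node in range(n):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


if __name__ == "__main__":
    star = [(0, i) for i in range(1, 5)]               # centralized network
    print(largest_component_fraction(5, star))          # every node reaches the server
    print(largest_component_fraction(5, []))            # no links at all
    print(largest_component_fraction(6, [(0, 1), (1, 2), (3, 4)]))
```

The star network reaches the upper bound of 1 and the network without links reaches the lower bound of 1/n, matching the bounds stated above.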
6.1.1 Centralized Network Control with Decentralized Collision Avoidance
Vehicles are connected to a sector controller, with a few proximate vehicles connected to each other through DSRC for collision avoidance. The sector controller is a vehicle, RSE, or other infrastructure component. The number of links to the sector controller is small because of communications distance and workload limitations, and is further limited by communications failures and weather and road conditions. The sector controller provides advisories to CV, or determines the trajectory and speed of autonomous vehicles and communicates with them to guide their movement. A conflict occurs when two vehicles intend to use the same resource at the same time. The sector controller communicates with the two vehicles and provides a resolution of the conflict. This resolution may cause other conflicts. With centralized control, the sector controller is connected to and commands all vehicles within its communications range; it can evaluate the networks of conflicts resulting from different resolutions and choose the resolution that meets multiple objectives, such as avoiding collisions and minimizing traffic delay. A conflict may occur when two vehicles get too close before receiving any advisory from the controller because of communications limitations or failures. The two vehicles can communicate using DSRC to take immediate actions and resolve the conflict.
Fig. 6.1 A transportation system sector controlled by a centralized network
A transportation system consists of a number of these centralized networks that are independent of one another; vehicles leave and join these networks in a predictable fashion based on trajectory and speed. In a transportation system, the number of sectors is fixed, but not all sectors have the same infrastructure. Figure 6.1 shows this centralized control structure, which is vulnerable to failures and attacks because of the fully centralized control. If the sector controller, for example, the RSE in Fig. 6.1, fails, the entire sector will lose coordination. Drivers or autonomous vehicles can use DSRC to prevent operational-level conflicts (imminent collisions) but cannot provide strategic (long-range) and tactical (short-range) travel guidance, which is necessary to maximize roadway throughput and to reduce road congestion, vehicle fuel consumption, emissions, and queue waiting time. When the controller node fails, the network is overwhelmed by multiple conflicts that occur frequently, similar to the current transportation system.
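The idea that a sector controller compares candidate resolutions, because resolving one conflict may create another, can be illustrated with a toy model. The sketch below is not the method of the source: it assumes, purely for illustration, that each vehicle requests one resource in one discrete time slot, that the only available resolution is to delay one vehicle, and that the controller prefers the candidate leaving the fewest conflicts and then the shortest delay. All names are hypothetical.

```python
from itertools import combinations


def find_conflicts(requests):
    """requests maps vehicle -> (resource, time_slot).
    Two vehicles conflict when they request the same resource in the same slot."""
    return [
        (a, b)
        for a, b in combinations(sorted(requests), 2)
        if requests[a] == requests[b]
    ]


def best_resolution(requests, conflict, max_delay=3):
    """Try delaying either vehicle in the conflict by 1..max_delay slots and
    return (vehicle, delay, remaining_conflicts) for the best candidate."""
    candidates = []
    for vehicle in conflict:
        resource, slot = requests[vehicle]
        for delay in range(1, max_delay + 1):
            trial = dict(requests)
            trial[vehicle] = (resource, slot + delay)
            # Evaluate the conflicts that would exist after this resolution.
            candidates.append((len(find_conflicts(trial)), delay, vehicle))
    remaining, delay, vehicle = min(candidates)
    return vehicle, delay, remaining


if __name__ == "__main__":
    # A and B want intersection X1 in slot 0; C has already claimed slot 1.
    requests = {"A": ("X1", 0), "B": ("X1", 0), "C": ("X1", 1)}
    print(find_conflicts(requests))
    print(best_resolution(requests, ("A", "B")))
```

In this example a one-slot delay would simply move the conflict onto vehicle C, so the sketch selects a two-slot delay instead, which mirrors the observation that a resolution may cause other conflicts.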
6.1.2 Centralized Control with Backup Decentralized Control
Under this concept, CV or autonomous vehicles are controlled by a decentralized system (Fig. 6.2) when the centralized system (Fig. 6.1) fails. The structure of the centralized control in Fig. 6.1 is a centralized network, and the structure of the decentralized system is a random network. The decentralized system provides operational collision avoidance continuously through DSRC and acts as a backup to the centralized system when the latter fails. With the decentralized system, each vehicle is responsible for operational collision avoidance. Sufficient communications and computational capabilities are required for each vehicle.
Fig. 6.2 A transportation system sector controlled by a random network
Compared to centralized control with decentralized collision avoidance in Fig. 6.1, the sector under this backup decentralized control can accommodate high-density traffic. For example, when an accident occurs and causes a traffic jam, many vehicles move into this sector and relatively few leave it. The high traffic volume may cause communications overload for the central controller, which then fails. The decentralized system takes control and oversees collision avoidance. This allows the central controller to manage many vehicles without increasing the risk of collision, since there is always a backup system. The main challenge is how the control structure shifts discretely between a centralized network and a random network. When the control transitions from a centralized network to a random network, sufficient communications bandwidth between vehicles must be provided. When multiple central controllers from adjacent sectors fail simultaneously, vehicles from these sectors form a random network for decentralized control. In a random network, the number of links required for each node to ensure effective communications does not change as the network order increases. When the control transitions from a random network to a centralized network, multiple central controllers are needed because of the controllers’ limited capability. In other words, the vehicles in Fig. 6.2 need to be divided into groups, each of which is a centralized network as shown in Fig. 6.1. Protocols must be developed to guide the formation of centralized networks. One possibility is to use a first-request-first-form policy. Suppose a central controller controls 10 vehicles. The first 10 requests received will be accepted, and these 10 vehicles form a centralized network.
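A first-request-first-form policy of this kind can be sketched in a few lines. The Python example below is only an illustration under stated assumptions: requests are modeled as (vehicle, controller) pairs in arrival order, every controller has the same capacity, and vehicles whose requests arrive after the capacity is reached are simply returned in a separate list. The function name and the identifiers used in the example are hypothetical.

```python
def first_request_first_form(requests, capacity=10):
    """Assign vehicles to controllers in order of arrival.

    requests: iterable of (vehicle, controller) pairs in arrival order.
    Returns (networks, not_accepted), where networks maps each controller
    to the vehicles in its centralized network.
    """
    networks = {}
    not_accepted = []
    for vehicle, controller in requests:
        members = networks.setdefault(controller, [])
        if len(members) < capacity:
            members.append(vehicle)       # request accepted
        else:
            not_accepted.append(vehicle)  # controller is already full
    return networks, not_accepted


if __name__ == "__main__":
    requests = [(f"V{i}", "RSE-1") for i in range(12)]
    networks, not_accepted = first_request_first_form(requests)
    print(networks["RSE-1"])
    print(not_accepted)
```

What happens to vehicles that are not accepted (for example, whether they request another controller or remain under decentralized control) is a design decision that the sketch leaves open.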
6.1.3 Mixed Centralized and Decentralized Control

Under the mixed centralized and decentralized control, autonomous vehicles that are sufficiently equipped navigate by themselves in a decentralized fashion, whereas a centralized system is responsible for the navigation of less equipped CV. The control structure is a Bose-Einstein condensation network in which the fittest node is the central controller for less equipped CV. The order of the largest component, $\max_k n_k$, where $k$ is the index of components and $n_k$ is the order (the number of nodes) of component $k$, depends on the total number of links and the fraction of links the fittest node acquires. When $\max_k n_k < n$, where $n$ is the order (the total number of nodes) of the network, there exist nodes that do not belong to the largest component. These nodes represent autonomous vehicles that use decentralized control.

Let $\theta_{fittest}$ be the degree of the fittest node in a Bose-Einstein condensation network. There are $n - 1$ vehicles in the network that are not the fittest node. The fittest node is the central controller, which controls $\theta_{fittest}$ less equipped CV. Normally $n$ and $\theta_{fittest}$ are given, i.e., there are $\theta_{fittest}$ less equipped CV, and the number of sufficiently equipped autonomous vehicles is $n - 1 - \theta_{fittest}$. If any autonomous vehicle is connected to a CV directly or indirectly, the autonomous vehicle is controlled by the central controller. Any autonomous vehicle that is not connected to a CV uses decentralized control.

Let $e$ be the total number of links in a Bose-Einstein condensation network. $e - \theta_{fittest}$ is the number of links that connect either two autonomous vehicles or an autonomous vehicle and a CV. $\theta_{fittest}/e$ is the fraction of links acquired by the fittest node. A link between a CV and an autonomous vehicle adds one to the total degree of the autonomous vehicles, whereas a link between two autonomous vehicles adds two. The average number of links for autonomous vehicles, $\theta_{autonomous}$, therefore lies between $\frac{e-\theta_{fittest}}{n-1-\theta_{fittest}}$ and $\frac{2(e-\theta_{fittest})}{n-1-\theta_{fittest}}$:

$\theta_{autonomous} = \frac{e-\theta_{fittest}}{n-1-\theta_{fittest}}$ if all links connect CV and autonomous vehicles;

$\theta_{autonomous} = \frac{2(e-\theta_{fittest})}{n-1-\theta_{fittest}}$ if all links connect autonomous and autonomous vehicles;

$\frac{e-\theta_{fittest}}{n-1-\theta_{fittest}} < \theta_{autonomous} < \frac{2(e-\theta_{fittest})}{n-1-\theta_{fittest}}$ if a portion of links connect CV and autonomous vehicles, and the rest connect autonomous vehicles.

Since the average number of links for the entire network, $\theta > 1$, is required to ensure acceptable connectivity and enable valid comparison between networks, $e = n\theta/2 > n/2$ and $e - \theta_{fittest} > n/2 - \theta_{fittest}$. The minimum number of links needed to connect all $n - 1 - \theta_{fittest}$ autonomous vehicles to CV is $n - 1 - \theta_{fittest}$. There are three desired conditions for the control of a Bose-Einstein condensation network: (a) $e$ is expected to be small since a larger number of links means higher communications and information sharing cost; (b) there are two types of control in a
Bose-Einstein condensation network, centralized and decentralized. A large number of decentralized controlled vehicles may create many conflicts between centrally controlled CV and decentralized autonomous vehicles. It is desired to have a small number of decentralized controlled vehicles, i.e., $n - 1 - \theta_{fittest}$ is expected to be small; and (c) $\theta_{autonomous}$ is expected to be large. This helps form a large group of centrally controlled vehicles (large order for the largest component) and a group of well-connected decentralized vehicles.

As a practical example, when $n - 1 - \theta_{fittest}$ is large, to ensure that $2 \le \theta_{autonomous} \le 4$ for a network of 100 vehicles ($n = 101$ in this case), $e = 160$ if $\theta_{fittest} = 40$. The fraction $\theta_{fittest}/e = 0.25$. When $n - 1 - \theta_{fittest}$ is small, to ensure that $2 \le \theta_{autonomous} \le 4$ for a Bose-Einstein condensation network with $n = 101$, $e = 120$ if $\theta_{fittest} = 80$. In this case the fraction $\theta_{fittest}/e = 0.67$. This example shows that to meet all three conditions, $\theta_{fittest}/e$ must be large, i.e., the fittest node acquires most links. A suitable control structure for the mixed centralized and decentralized concept is a Bose-Einstein condensation network with large $\theta_{fittest}/e$.
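The arithmetic of the practical example can be checked with a short script. The following is a minimal sketch, not part of the original text; the function names autonomous_degree_bounds and fittest_fraction are hypothetical and simply encode the bounds on the average degree of autonomous vehicles and the fraction of links held by the fittest node as stated in this section.

```python
def autonomous_degree_bounds(n, e, theta_fittest):
    """Return (lower, upper) bounds on the average degree of autonomous vehicles.

    n: total number of nodes, e: total number of links,
    theta_fittest: degree of the fittest node (number of less equipped CV).
    """
    n_autonomous = n - 1 - theta_fittest      # sufficiently equipped vehicles
    remaining_links = e - theta_fittest       # links not attached to the fittest node
    if n_autonomous <= 0:
        raise ValueError("the network has no autonomous vehicles")
    lower = remaining_links / n_autonomous        # every link joins a CV and an autonomous vehicle
    upper = 2 * remaining_links / n_autonomous    # every link joins two autonomous vehicles
    return lower, upper


def fittest_fraction(e, theta_fittest):
    """Fraction of all links acquired by the fittest node."""
    return theta_fittest / e


# The two cases of the practical example (n = 101).
for theta_fittest, e in [(40, 160), (80, 120)]:
    low, high = autonomous_degree_bounds(101, e, theta_fittest)
    print(theta_fittest, e, low, high, round(fittest_fraction(e, theta_fittest), 2))
```

Both cases give bounds of 2 and 4 on the average degree of autonomous vehicles, while the second case reaches them with fewer links and a larger fraction for the fittest node.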
6.1.4 Mostly Decentralized Control with Centralized Scheduling

The mostly decentralized with centralized scheduling concept completely decentralizes collision avoidance while enforcing compliance with scheduling in a centralized, strategic fashion. This concept involves a dense network, in which vehicles are connected to each other, but also to a centralized scheduler. The control structure consists of a scheduler or a network of short-range schedulers. Vehicles are informed by schedule, i.e., trajectory and speed; compliance is secondary to conflict resolution. This control concept has the following properties compared to the other control structures discussed earlier: (a) the layer for physical devices is a scale-free network. Schedulers are nodes with large degree; (b) vehicles must be sufficiently equipped to communicate and compute trajectories and speed, which are also requirements for the control structure in Fig. 6.2; (c) the scheduler as a central controller can be a vehicle or infrastructure component; (d) scale-free networks are resilient to random failures [3, 22] but are vulnerable to targeted attacks [23]. If any vehicle other than a scheduler loses its ability to communicate, other vehicles can still function effectively with strategic and tactical travel guidance provided by the scheduler and collision avoidance through DSRC. If a scheduler fails, the group of vehicles it supports must identify a new scheduler, which must establish communications with other schedulers; and (e) the control structure of one scheduler and its supported vehicles is a centralized network (Fig. 6.1). With multiple schedulers, the control structure becomes a scale-free network that can enable high density traffic.
6.1.5 Fully Decentralized Control

Under the fully decentralized control, vehicles have full autonomy over their trajectory with no centralized assistance. Vehicles must perform both tactical and strategic route planning. The underlying control structure is a classical random network, which is also used in the centralized with decentralized backup control and is a scaled-up version of the local network of decentralized vehicles discussed in Sect. 6.1.3. The performance of the fully decentralized control over a random network depends on the order of the largest component. When the average degree $\theta > 1$, the percentage of nodes to which there exist paths from an arbitrary node increases rapidly, starting with a slope of two. For instance, the percentage is 80% (i.e., $\frac{\max_k n_k}{n} = 0.8$) when $\theta = 2$. Except for the largest component, there are many smaller components that fill the portion of a random network not occupied by the largest component, with an average order of $\frac{1}{1-(n-1)Pr+(n-1)Pr\frac{\max_k n_k}{n}}$ [13], where $Pr$ is the probability to connect any pair of nodes. When $\theta = (n-1)Pr = 2$, the average order of small components is 1.67. For a random network of 100 vehicles, 80 vehicles form the largest component and there are on average 20/1.67 = 12 small components. This is a viable control structure although there may be many conflicts between components.

There is no strategic or tactical route planning when vehicles are not connected. Links between vehicles form for two reasons: (a) the need for collision avoidance; and (b) the need for strategic and tactical route planning. The minimum requirement for a random network to function effectively is $\theta > 1$. The performance of the fully decentralized control is expected to improve as $\theta$ increases. For instance, when $\theta = (n-1)Pr = 3$, $\frac{\max_k n_k}{n} = 0.94$. $\frac{\max_k n_k}{n}$ is a function of $\theta$ and can be calculated according to $\theta = \frac{-\ln\left(1-\frac{\max_k n_k}{n}\right)}{\frac{\max_k n_k}{n}}$ [28]. The average order of small components is 1.22 when $\theta = 3$. For a random network of 100 vehicles, 94 vehicles form the largest component and there are on average 6/1.22 = 4.92 small components. Compared to 80 vehicles in the largest component and 12 small components when $\theta = 2$, the random network with $\theta = 3$ reduces potential conflicts between components. A random network allows high density traffic when $\theta > 1$ and the control performance improves as $\theta$ increases. When $\theta \le 1$, the fully decentralized control is not possible.
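The relation between the average degree and the largest component can be evaluated numerically. The sketch below is an illustration under stated assumptions rather than the method of the cited references: it writes s for the fraction of nodes in the largest component, rearranges θ = −ln(1 − s)/s into s = 1 − exp(−θ s), and solves it by fixed-point iteration. The function names and the starting value 0.5 are hypothetical choices.

```python
import math


def largest_component_fraction(theta, tol=1e-12, max_iter=10_000):
    """Fraction s of nodes in the largest component of a random network.

    Solves theta = -ln(1 - s) / s, rewritten as s = 1 - exp(-theta * s),
    by fixed-point iteration. Returns 0.0 when theta <= 1.
    """
    if theta <= 1:
        return 0.0
    s = 0.5  # assumed starting guess inside (0, 1)
    for _ in range(max_iter):
        s_next = 1.0 - math.exp(-theta * s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s


def small_component_order(theta, s):
    """Average order of the small components: 1 / (1 - theta + theta * s)."""
    return 1.0 / (1.0 - theta + theta * s)


for theta in (2, 3):
    s = largest_component_fraction(theta)
    print(theta, round(s, 3), round(small_component_order(theta, s), 2))
```

With the rounded fraction 0.8 used in the text for θ = 2, small_component_order(2, 0.8) returns 1/0.6, which is the 1.67 quoted above; the unrounded solution of the equation is slightly below 0.8, so the computed order is marginally larger.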
6.2 Freight Transportation Using Railroad Networks

Many cities and nations are moving more freight than ever before, much of it over long distances and across national borders. In 1997, on average, 41 million tons of freight, valued at over $23 billion, was transported within the U.S. per day. In total, this represented 14.8 billion tons and $8.6 trillion of merchandise,
requiring almost 3.9 billion ton-miles of freight activity [54]. Much of this freight was a direct result of the growth in population and economic activity, while technological developments have also contributed to a greater reliance on transportation in the production process. Worldwide merchandise trade (exports) is estimated to have grown from $58 billion in 1948 to $6168 billion in 2000. Between 1960 and 2000, while the worldwide production of merchandised goods grew more than threefold, the volume of international trade increased by a factor of almost 10 [38]. Recent projections called for increases in both U.S. and worldwide trade and associated freight volumes. These high growth rates exceed the historical growth rates in railroad infrastructures and vehicle fleets that handle freight. With many of these infrastructures already under stress and suffering from costly traffic congestion, freight planners have an important role to play in the future of the world’s railroad transportation and economic systems [38]. A railroad network consists of stations and track lines connecting adjacent stations. Stations in a railroad network include loading and unloading stations. Freight orders are generated at loading stations and moved to designated unloading stations. A rake (train) is formed by coupling railcars together based on their destinations. A freight rake may consist of both loaded and empty railcars. Empty railcars are attached to a freight rake based on an empty railcar distribution policy or per specific requirements. Loaded railcars are attached to the rake based on the destination of the railcars to minimize switching of railcars for the rake. The length of a freight rake is standardized based on the locomotive hauling power and length of sidings at stations. Sidings provide meet/overtake opportunities for trains. Different types of railcars carry various classes of commodities. Compatibility among railcars is considered in forming a rake.
A rake is coupled with a locomotive and moved by the locomotive to its destination. Once a rake arrives at its destination station, the locomotive that hauls it becomes available for the next assignment, either at the same station or at another station after a deadheading run. Deadheading of locomotives is necessary to minimize the waiting time of rakes for locomotives. The locomotive assignment problem consists of assigning an available set of locomotives to cover all freight demand at minimum cost. Uneven flow of traffic in one direction accumulates locomotives at one station, resulting in a shortage of locomotives at another station. This is the root cause of deadheading. In addition, each locomotive has a maintenance schedule; it is taken offline and remains unavailable for a few hours to a few days from time to time according to the schedule [37].
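The core of the locomotive assignment problem can be illustrated with a toy instance. The sketch below is a deliberately simplified assumption, not a model from the cited literature: each waiting rake needs exactly one locomotive, the only cost is the deadheading cost of moving a locomotive from its current station to the rake's station, and maintenance schedules are ignored. The station names, cost values, and the function assign_locomotives are hypothetical; the search is brute force and only practical for very small instances.

```python
from itertools import permutations


def assign_locomotives(loco_stations, rake_stations, deadhead_cost):
    """Assign one locomotive to each rake at minimum total deadheading cost.

    loco_stations: current station of each available locomotive.
    rake_stations: station at which each rake is waiting.
    deadhead_cost[a][b]: cost of a deadheading run from station a to station b.
    Returns (total_cost, assignment), where assignment[r] is the index of the
    locomotive assigned to rake r.
    """
    best_cost, best_assignment = float("inf"), None
    for candidate in permutations(range(len(loco_stations)), len(rake_stations)):
        cost = sum(
            deadhead_cost[loco_stations[loco]][rake_stations[rake]]
            for rake, loco in enumerate(candidate)
        )
        if cost < best_cost:
            best_cost, best_assignment = cost, candidate
    return best_cost, best_assignment


# Hypothetical instance: two locomotives idle at A, one at C; rakes wait at A and B.
deadhead_cost = {
    "A": {"A": 0, "B": 4, "C": 5},
    "B": {"A": 4, "B": 0, "C": 2},
    "C": {"A": 5, "B": 2, "C": 0},
}
total_cost, assignment = assign_locomotives(["A", "A", "C"], ["A", "B"], deadhead_cost)
print(total_cost, assignment)
```

In this instance the rake at A is served by a locomotive already at A, and the rake at B is served by the locomotive deadheading from C, which is cheaper than deadheading the second locomotive from A.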
6.2.1 Scheduling of Railcars

Railroad operations involve complex switching and classification decisions that must be made in a short time. Optimization with respect to these decisions is difficult due to discrete and nonlinear characteristics of the problem. The formation of a rake is one of the most important decisions in railroad operations. Mathematical programming and algorithms are available for solving the rake formation problem, but it
requires excessive computation time to obtain optimal solutions. Meanwhile, shorter decision windows become necessary given the highly competitive railroad industry. Artificial intelligence (AI) offers promising alternatives to conventional optimization approaches and may offer better solutions than conventional models [52]. The railroad industry desires to operate with schedules [58]. At the strategic level, a policy for deriving schedules is established. This policy is implemented on a monthly or weekly basis. At the tactical level, all trains obtain their schedules [47]. Railroad managers determine which pairs of stations are provided with direct train connections, the frequency of service, how railcars are routed through the available configuration of trains and intermediate stations, and how railcars are grouped or “blocked” within a rake. The objective is to minimize the total train costs, railcar time costs, and classification yard costs, while not exceeding limits on train size and yard volumes. These decisions are modeled as a mixed-integer programming problem, where the decision to operate a given train between two stations corresponds to a 0–1 binary variable. If there is no limit on train size, the mixed-integer programming model can be solved efficiently using Lagrangian relaxation. If the solution contains some overloaded trains, which is likely, heuristic adjustments are necessary to obtain a feasible operating plan [45]. Other models have been developed to determine the optimal structure and size of a railcar fleet under uncertainty in demand and travel times [25, 46].
6.2.2 Impact of Weather on Railroad Operations

Weather conditions significantly impact surface transportation, including railroad systems. Weather conditions adversely affect operating efficiency, physical infrastructure, and safe passage of freight rakes. Railroad companies must operate under a variety of meteorological conditions, some of which are particularly problematic for rail transportation [60]. In addition to transient weather conditions, there are seasonal and time-of-day factors. Precipitation and fog lead to decreased visibility of signals. Flash floods can wash out tracks and excessive heat can warp tracks. Crosswinds reduce stability and may lead to blow-over of railcars. Snow and ice often cause regional delays and shutdowns. Serious problems may result from preexisting accumulations of rain, ice, and snow that are not associated with current weather conditions. Weather can lead to delays in railroad systems and cause loss of economic efficiency [60]. There are potential benefits in using enhanced weather information to support railroad decision making. For example, the positive train control (PTC) technology is part of the intelligent transportation system (ITS). The ITS embodies a unified set of electronics, communications, hardware, and software. Freight rakes’ onboard sensors are data sources for meteorological models and forecast decisions. Stationary sensors mounted in wayside bungalows and along tracks provide meteorologists and railroad traffic managers with valuable, multipurpose observations from remote locations [60].
6.2.3 Railroad Transportation of Hazardous Materials

Railroad transportation plays a critical role in transporting hazardous materials (hazmat) safely and efficiently. Several strategies have been implemented to reduce the risk of hazmat release from train accidents. Each of these risk reduction strategies has its safety benefit and corresponding implementation cost. The cost effectiveness of the integration of different risk reduction strategies is not well understood. There has been growing interest in the U.S. rail industry and government in how best to allocate resources for improving hazmat transportation safety. Approximately two million railcars of hazmat are shipped in North America annually [5]. Although most of these shipments (99.996% in 2011) safely reached their destinations [5], the potentially severe consequences of a hazmat release incident remain a major safety concern to the rail industry, government, and public. For example, the release of chlorine gas from a train collision in Graniteville, South Carolina in January 2005 resulted in nine fatalities, hundreds of injuries, an evacuation of about 5400 people, and economic loss exceeding $6.9 million [72]. There has been growing interest and intensifying regulatory requirements in the U.S. to improve the safety of railway hazmat transportation. Improvements have focused on enhancing packaging and tank car safety design [9–11, 61–63, 70], deploying wayside defect detection technologies [41, 57, 59, 67], upgrading track infrastructure [44, 50, 51], routing [1, 31, 35, 36, 42], reducing train speed [43], and improving emergency response practices [17]. Each strategy has a direct effect on the hazmat release risk and different strategies may have compounding effects.
6.3 Airport Traffic Control

Airport capacity is an airport’s capability of meeting demand (the number of arrivals and departures of aircraft) [7]. With the continuing increase in air transportation demand in the long term (Table 6.1), current capacity in some major airports is insufficient [33]. The growing demand exceeds the capacity and causes congestion. Congested airports are “choke points,” which are priority areas for improvement to increase efficiency [16]. Federal agencies such as the FAA and the National Aeronautics and Space Administration (NASA), other organizations, and academia continue working on the elimination of choke points. Airports need a 4-D (three-dimensional space plus time) decision support system that can model and analyze dynamic trajectory information of vehicles in real time and provide recommendations for the best schedules to minimize congestion.

Table 6.1 Airport demand projections in thousands of passengers

| Airport | 2015 | 2040 |
|---|---|---|
| Hartsfield-Jackson Atlanta International (ATL) | 1028 | 1724 |
| Chicago O’Hare International (ORD) | 988 | 1676 |
| Dallas/Fort Worth International (DFW) | 685 | 1016 |
| Denver International (DEN) | 704 | 1119 |
| Los Angeles International (LAX) | 638 | 1037 |
| George Bush Intercontinental/Houston (IAH) | 634 | 1383 |
6.3.1 Congestion and Choke Points

The FAA identifies 30 core airports, which are the busiest airports in the U.S. Three years of data (2010–2012) were obtained from FAA’s Aviation System Performance Metrics (ASPM) database and analyzed according to FAA’s airport performance criteria. San Francisco International, Newark International, and La Guardia airports experienced significantly high gate arrival delay. Newark International airport had significantly high departure and taxiing delays. The IAH is the only airport with increasing arrival, departure, and taxiing delays. In terms of takeoff, La Guardia, John F. Kennedy, Newark International, and Philadelphia airports had steadily high airborne delays in the three years analyzed, indicating that the northeastern corridor had the worst congestion in the U.S. The FAA identifies congested airspaces by looking at the size and the number of congested airports in a metropolitan area. A candidate airspace for traffic optimization has at least one large or medium hub airport with constrained capacity or at least two small hub airports with constrained capacity [53]. An airspace that contains two or more capacity constrained large hub airports requires traffic optimization. Congestion in air and air-ground traffic can also occur due to the inefficiencies in air traffic control. The FAA determined three main issues that caused congestion in the IAH airspace: (a) limitations of the conventional, ground-based navigation system and existing area navigation procedures; (b) limited flight path predictability and flexibility, particularly during adverse weather conditions; and (c) high occurrence of voice communications among controllers and pilots, leading to excessive workload, and increased hear-back and read-back errors. Quantification of choke points is vital because it is a means for understanding the magnitude of congestion at airports. Three metrics can be used for quantification: volume, demand to capacity ratio, and delays.
The correlation analysis that includes the US’s 30 busiest airports revealed that there is no significant relationship between delays and high volume at airports. Tables 6.2, 6.3, and 6.4 show operations and delays, including National Airspace System (NAS) delays, caused by volume in years 2010, 2011, 2012, respectively, for the 30 core airports. Seven airports have an average number of delays per year from 2010 to 2012 that is above 10,000: ATL, LAX, CLT, PHL, DTW, JFK, and LGA. ATL has experienced the largest number of delays.
Table 6.2 Operations and delays in 2010

| Airports | Number of operations | Number of delays | Number of NAS delays | Percentage of NAS delays (%) |
|---|---|---|---|---|
| ATL | 948,833 | 25,755 | 75,242 | 8 |
| ORD | 879,332 | 9411 | 75,710 | 9 |
| DFW | 651,371 | 4663 | 31,005 | 5 |
| DEN | 636,036 | 4875 | 27,858 | 4 |
| LAX | 572,811 | 11,433 | 28,583 | 5 |
| Charlotte/Douglas International (CLT) | 522,155 | 19,275 | 33,575 | 6 |
| McCarran International (LAS) | 404,616 | 6602 | 15,780 | 4 |
| IAH | 535,495 | 9307 | 41,715 | 8 |
| Phoenix International (PHX) | 443,562 | 3748 | 17,077 | 4 |
| Philadelphia International (PHL) | 449,013 | 14,461 | 39,019 | 9 |
| Detroit Metropolitan (DTW) | 453,578 | 14,346 | 31,524 | 7 |
| Minneapolis St. Paul (MSP) | 436,063 | 7458 | 28,562 | 7 |
| San Francisco International (SFO) | 384,202 | 1557 | 54,057 | 14 |
| Newark Liberty International (EWR) | 405,996 | 8250 | 64,350 | 16 |
| John F Kennedy International (JFK) | 404,751 | 13,538 | 43,308 | 11 |
| Miami International (MIA) | 376,437 | 6281 | 25,635 | 7 |
| La Guardia (LGA) | 365,039 | 12,445 | 42,491 | 12 |
| Logan International (BOS) | 358,304 | 4347 | 33,107 | 9 |
| Washington Dulles International (IAD) | 364,473 | 7092 | 17,458 | 5 |
| Salt Lake City International (SLC) | 324,441 | 3395 | 15,638 | 5 |
| Seattle Tacoma International (SEA) | 313,468 | 3172 | 13,573 | 4 |
| Orlando International (MCO) | 313,796 | 6192 | 16,663 | 5 |
| Ronald Reagan National (DCA) | 270,848 | 5782 | 16,576 | 6 |
| Honolulu International (HNL) | 193,070 | 1918 | 7086 | 4 |
| Memphis International (MEM) | 338,598 | 4700 | 15,948 | 5 |
| Baltimore/Washington International (BWI) | 269,016 | 4616 | 12,724 | 5 |
| Fort Lauderdale International (FLL) | 257,255 | 7084 | 15,410 | 6 |
| Chicago Midway International (MDW) | 232,382 | 2391 | 8738 | 4 |
| Tampa International (TPA) | 186,511 | 2731 | 8561 | 5 |
| San Diego International (SAN) | 187,358 | 2309 | 7195 | 4 |
6.3.2 Airport Capacity Utilization, Delays, and Congestion

Increasing volume at airports boosts the capacity utilization but pushes the utilization rate to its limit during peak times. Airport traffic control becomes challenging during peak times; this is a precursor to bottlenecks. For example, the New York City area is one of the most congested airspaces due to its three airports, JFK, EWR, and LGA, with above-average delays [27]. The bottlenecks caused by capacity utilization can be eliminated through enhancement projects. Airport planning guidelines suggest that capacity enhancement projects are justified for airports with 60% to 80% capacity utilization. Capacity improvement projects should be initiated for airports with higher than 80% capacity utilization [73]. When the capacity utilization is greater than 0.8, delays become excessive and must be resolved. An airport is congested if it experiences long arrival and departure delays. The threshold for long delays is 12 min [73]. An airport is congested if the average gate arrival delay is 12 min or more per flight on an annual basis. An airport is congested if the average departure queue delay is 12 min or more per flight in good or bad weather conditions. Table 6.5 shows average gate arrival delays at 30 core airports in years 2010–2012. Table 6.6 shows average departure queue delays at 30 core airports. Table 6.6 also shows average airport and gate delays during departure. In Table 6.6, the average airport delay is the sum of the average gate and queue delays. Tables 6.5 and 6.6 indicate that 11 airports, ATL, ORD, IAH, PHL, SFO, EWR, JFK, MIA, LGA, IAD, and MEM, are congested due to long delays. Another indicator of congestion at airports is the average taxi-in and taxi-out times. Above-average taxi times are caused by congestion at airports. When the capacity of an airport is saturated [64], taxi-times exceed the average among similar airports. Tables 6.7 and 6.8 show average taxi-in and taxi-out times at 30 core airports,
Table 6.3 Operations and delays in 2011

| Airports | Number of operations | Number of delays | Number of NAS delays | Percentage of NAS delays (%) |
|---|---|---|---|---|
| ATL | 922,521 | 28,446 | 60,886 | 7 |
| ORD | 875,930 | 9916 | 74,892 | 9 |
| DFW | 646,409 | 7265 | 28,313 | 4 |
| DEN | 635,219 | 6612 | 32,396 | 5 |
| LAX | 602,238 | 14,722 | 39,627 | 7 |
| CLT | 533,987 | 26,630 | 40,850 | 8 |
| LAS | 420,748 | 7757 | 17,545 | 4 |
| IAH | 530,665 | 10,431 | 36,244 | 7 |
| PHX | 455,981 | 7870 | 17,738 | 4 |
| PHL | 445,882 | 12,424 | 45,658 | 10 |
| DTW | 444,399 | 9070 | 21,909 | 5 |
| MSP | 435,320 | 5026 | 21,635 | 5 |
| SFO | 400,805 | 2016 | 52,505 | 13 |
| EWR | 408,212 | 10,232 | 69,274 | 17 |
| JFK | 411,614 | 8723 | 41,203 | 10 |
| MIA | 391,140 | 4983 | 21,982 | 6 |
| LGA | 368,501 | 10,489 | 45,547 | 12 |
| BOS | 359,946 | 4539 | 39,234 | 11 |
| IAD | 355,718 | 8505 | 20,489 | 6 |
| SLC | 305,506 | 2857 | 10,234 | 3 |
| SEA | 313,800 | 4628 | 15,941 | 5 |
| MCO | 315,504 | 6456 | 18,236 | 6 |
| DCA | 282,383 | 6746 | 20,529 | 7 |
| HNL | 189,309 | 749 | 5149 | 3 |
| MEM | 311,984 | 3499 | 13,072 | 4 |
| BWI | 272,248 | 3628 | 13,122 | 5 |
| FLL | 250,513 | 7636 | 16,384 | 7 |
| MDW | 239,933 | 1926 | 8014 | 3 |
| TPA | 183,751 | 2297 | 8526 | 5 |
| SAN | 182,242 | 2613 | 8037 | 4 |
respectively. The taxi-in time is the summation of taxi-in delay and unimpeded taxi-in time. Similarly, the taxi-out time is the summation of taxi-out delay and unimpeded taxi-out time. The unimpeded taxi time represents the time required for taxi-in or taxi-out under optimal airport conditions.
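The congestion criteria and the taxi-time decomposition described in this section can be written as a few small functions. This is a minimal sketch under assumptions: the function names and the label returned for utilization below 60% are hypothetical, and the sample inputs are the 2010 SFO and DFW gate arrival and departure queue delays together with the 2010 ATL taxi-out delay and unimpeded taxi-out time from the tables in this section.

```python
LONG_DELAY_MINUTES = 12.0  # threshold for long delays stated in the text [73]


def utilization_band(utilization):
    """Map capacity utilization (0 to 1) to the planning guidance in the text."""
    if utilization > 0.8:
        return "initiate capacity improvement project"
    if utilization >= 0.6:
        return "capacity enhancement project justified"
    return "no project indicated"  # assumption: the text gives no guidance below 60%


def is_congested(gate_arrival_delay, departure_queue_delay):
    """An airport is congested if either average delay is 12 min or more per flight."""
    return (gate_arrival_delay >= LONG_DELAY_MINUTES
            or departure_queue_delay >= LONG_DELAY_MINUTES)


def taxi_time(taxi_delay, unimpeded_taxi_time):
    """Taxi-in or taxi-out time as the sum of taxi delay and unimpeded taxi time."""
    return taxi_delay + unimpeded_taxi_time


print(utilization_band(0.85))
print(is_congested(15.3, 2.6))        # SFO, 2010
print(is_congested(7.0, 2.7))         # DFW, 2010
print(round(taxi_time(8.6, 12.7), 1))  # ATL taxi-out, 2010
```

Because the tabulated values are rounded to one decimal place, the sum of a tabulated delay and unimpeded time can differ from the tabulated total by 0.1 min.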
Table 6.4 Operations and delays in 2012

| Airports | Number of operations | Number of delays | Number of NAS delays | Percentage of NAS delays (%) |
|---|---|---|---|---|
| ATL | 927,521 | 22,491 | 48,788 | 5 |
| ORD | 874,561 | 11,913 | 54,223 | 6 |
| DFW | 649,495 | 7711 | 28,643 | 4 |
| DEN | 617,901 | 5343 | 27,991 | 5 |
| LAX | 603,381 | 12,841 | 32,040 | 5 |
| CLT | 546,715 | 14,938 | 26,406 | 5 |
| LAS | 415,097 | 4846 | 14,985 | 4 |
| IAH | 511,202 | 6705 | 29,343 | 6 |
| PHX | 443,850 | 5029 | 18,375 | 4 |
| PHL | 441,000 | 10,227 | 35,412 | 8 |
| DTW | 428,070 | 6274 | 17,594 | 4 |
| MSP | 424,405 | 4427 | 14,345 | 3 |
| SFO | 419,867 | 2544 | 57,816 | 14 |
| EWR | 413,778 | 7520 | 52,881 | 13 |
| JFK | 405,083 | 10,041 | 27,181 | 7 |
| MIA | 387,253 | 7238 | 23,235 | 6 |
| LGA | 369,960 | 9966 | 37,736 | 10 |
| BOS | 346,144 | 3546 | 20,111 | 6 |
| IAD | 337,007 | 6460 | 16,244 | 5 |
| SLC | 286,905 | 1793 | 7460 | 3 |
| SEA | 307,379 | 3397 | 12,387 | 4 |
| MCO | 305,351 | 6727 | 15,817 | 5 |
| DCA | 286,448 | 6384 | 18,218 | 6 |
| HNL | 200,747 | 2065 | 5280 | 3 |
| MEM | 268,761 | 2863 | 8735 | 3 |
| BWI | 264,829 | 4777 | 14,433 | 5 |
| FLL | 248,275 | 5947 | 14,822 | 6 |
| MDW | 239,586 | 2808 | 7834 | 3 |
| TPA | 180,537 | 2625 | 8269 | 5 |
| SAN | 183,985 | 2341 | 6642 | 4 |
6.3.3 Methods of Airport Traffic Control and Runway Traffic Optimization

Several procedures and tools (e.g., Traffic Management Advisor (TMA) and Surface Decision Support System (SDSS)) are implemented to mitigate delays at airports. They have various advantages but are insufficient during peak times. Continuous
Table 6.5 Average gate arrival delays (minutes) at 30 core airports

| Airports | 2010 | 2011 | 2012 |
|---|---|---|---|
| ATL | 11.4 | 9.4 | 7.3 |
| ORD | 9.8 | 10.4 | 8.5 |
| DFW | 7.0 | 6.7 | 6.7 |
| DEN | 7.6 | 7.3 | 7.5 |
| LAX | 7.5 | 8.2 | 7.7 |
| CLT | 7.3 | 8.5 | 6.2 |
| LAS | 7.5 | 7.6 | 7.4 |
| IAH | 8.3 | 8.4 | 8.7 |
| PHX | 5.8 | 5.7 | 5.5 |
| PHL | 10.6 | 13.1 | 10.3 |
| DTW | 9.2 | 7.8 | 6.6 |
| MSP | 9.1 | 7.8 | 6.2 |
| SFO | 15.3 | 14.0 | 15.5 |
| EWR | 15.1 | 17.2 | 14.3 |
| JFK | 12.5 | 11.0 | 8.1 |
| MIA | 10.2 | 9.9 | 8.0 |
| LGA | 12.0 | 13.5 | 10.2 |
| BOS | 11.9 | 12.2 | 8.4 |
| IAD | 9.6 | 10.5 | 10.1 |
| SLC | 7.3 | 6.0 | 5.2 |
| SEA | 6.5 | 6.2 | 5.6 |
| MCO | 8.8 | 8.5 | 7.6 |
| DCA | 7.3 | 8.3 | 7.2 |
| HNL | 6.6 | 5.1 | 5.5 |
| MEM | 7.7 | 7.9 | 6.5 |
| BWI | 8.6 | 8.3 | 8.1 |
| FLL | 9.5 | 9.1 | 8.2 |
| MDW | 8.7 | 7.9 | 7.0 |
| TPA | 8.6 | 8.0 | 7.7 |
| SAN | 7.2 | 7.6 | 7.2 |
improvement of these procedures and tools becomes necessary. For instance, a Multicenter Traffic Management Advisor (McTMA) [48] was developed for air traffic management of the northern corridor, which has some of the most congested air traffic. The McTMA covers a 400 nautical-mile radius, an upgrade from the Single-center Traffic Management Advisor (ScTMA), whose capacity is limited to a 250 nautical-mile radius [48]. The McTMA was implemented at PHL and significantly reduced delays, airborne holding, and vectoring [32]. Sequencing is another focal point in
Table 6.6 Average departure delays (minutes) at 30 core airports

| Airport | 2010 Airport | 2010 Gate | 2010 Queue | 2011 Airport | 2011 Gate | 2011 Queue | 2012 Airport | 2012 Gate | 2012 Queue |
|---|---|---|---|---|---|---|---|---|---|
| ATL | 17.9 | 10.9 | 7.0 | 15.5 | 9.5 | 6.0 | 12.1 | 7.6 | 4.5 |
| ORD | 14.1 | 9.9 | 4.2 | 15.0 | 10.5 | 4.5 | 14.0 | 10.0 | 4.0 |
| DFW | 12.3 | 9.6 | 2.7 | 11.6 | 9.2 | 2.4 | 11.7 | 9.3 | 2.4 |
| DEN | 11.0 | 8.3 | 2.7 | 11.6 | 8.7 | 2.9 | 11.8 | 9.3 | 2.5 |
| LAX | 10.2 | 7.9 | 2.3 | 10.4 | 7.5 | 2.9 | 11.3 | 7.9 | 3.4 |
| CLT | 12.0 | 7.5 | 4.5 | 13.3 | 8.8 | 4.5 | 10.2 | 6.8 | 3.4 |
| LAS | 10.5 | 7.7 | 2.8 | 10.2 | 7.4 | 2.8 | 10.0 | 7.5 | 2.5 |
| IAH | 11.5 | 8.1 | 3.3 | 12.5 | 8.6 | 3.9 | 14.9 | 10.6 | 4.3 |
| PHX | 9.7 | 6.1 | 3.6 | 8.5 | 6.1 | 2.4 | 8.5 | 5.7 | 2.9 |
| PHL | 16.4 | 9.9 | 6.6 | 16.8 | 11.0 | 5.8 | 13.4 | 8.6 | 4.8 |
| DTW | 15.7 | 10.7 | 5.0 | 12.3 | 9.9 | 2.4 | 11.5 | 9.3 | 2.2 |
| MSP | 12.8 | 9.3 | 3.5 | 10.9 | 7.9 | 3.0 | 8.9 | 6.6 | 2.4 |
| SFO | 12.1 | 9.5 | 2.6 | 12.0 | 8.7 | 3.3 | 14.6 | 10.3 | 4.2 |
| EWR | 18.8 | 11.9 | 6.9 | 19.6 | 12.4 | 7.2 | 20.5 | 13.2 | 7.3 |
| JFK | 23.2 | 14.1 | 9.1 | 19.9 | 12.5 | 7.4 | 16.8 | 10.4 | 6.4 |
| MIA | 16.3 | 13.5 | 2.8 | 15.4 | 12.7 | 2.7 | 14.8 | 12.1 | 2.7 |
| LGA | 19.2 | 9.4 | 9.9 | 20.7 | 10.6 | 10.1 | 17.5 | 8.2 | 9.3 |
| BOS | 13.1 | 8.8 | 4.3 | 13.4 | 9.3 | 4.1 | 10.7 | 7.5 | 3.2 |
| IAD | 14.0 | 11.0 | 3.0 | 15.2 | 11.9 | 3.3 | 14.7 | 11.6 | 3.1 |
| SLC | 11.3 | 7.2 | 4.1 | 8.9 | 5.9 | 3.0 | 7.8 | 5.1 | 2.7 |
| SEA | 7.1 | 5.7 | 1.5 | 6.7 | 5.0 | 1.7 | 6.6 | 5.0 | 1.7 |
| MCO | 10.6 | 8.1 | 2.5 | 10.3 | 7.7 | 2.6 | 9.1 | 6.6 | 2.5 |
| DCA | 10.0 | 6.7 | 3.3 | 11.2 | 7.9 | 3.4 | 9.8 | 6.8 | 3.0 |
| HNL | 5.8 | 4.5 | 1.3 | 4.9 | 3.8 | 1.1 | 5.4 | 4.1 | 1.3 |
| MEM | 14.5 | 11.1 | 3.4 | 16.6 | 15.1 | 1.5 | 15.2 | 14.2 | 1.0 |
| BWI | 12.0 | 9.4 | 2.6 | 11.7 | 9.0 | 2.7 | 11.0 | 8.3 | 2.7 |
| FLL | 11.7 | 9.2 | 2.5 | 11.8 | 8.7 | 3.2 | 11.3 | 8.0 | 3.3 |
| MDW | 14.6 | 11.6 | 3.0 | 13.2 | 10.8 | 2.4 | 11.6 | 9.6 | 2.0 |
| TPA | 9.1 | 7.0 | 2.1 | 7.8 | 6.3 | 1.5 | 7.1 | 5.8 | 1.3 |
| SAN | 8.9 | 6.8 | 2.1 | 8.4 | 6.3 | 2.1 | 8.8 | 6.4 | 2.4 |
enhancement projects, and new sequencing rules (e.g., [39]) help reduce delays. On the other hand, high-precision taxiing and automated control systems significantly improve airport efficiency by reducing the time required to clear taxiing and cross active runways [21]. Both FAA and NASA suggest that delays at airports are caused by not only air traffic but also surface operations.
Table 6.7 Average taxi-in times (minutes) at 30 core airports

| Airport | 2010 Taxi-in | 2010 Taxi-in delay | 2010 Unimpeded taxi-in | 2011 Taxi-in | 2011 Taxi-in delay | 2011 Unimpeded taxi-in | 2012 Taxi-in | 2012 Taxi-in delay | 2012 Unimpeded taxi-in |
|---|---|---|---|---|---|---|---|---|---|
| ATL | 11.3 | 4.3 | 7.0 | 10.8 | 4.0 | 6.8 | 9.9 | 3.1 | 6.8 |
| ORD | 8.9 | 3.3 | 5.6 | 8.7 | 3.2 | 5.5 | 8.6 | 3.2 | 5.5 |
| DFW | 9.4 | 3.7 | 5.7 | 8.7 | 3.3 | 5.4 | 9.0 | 3.6 | 5.4 |
| DEN | 8.0 | 2.1 | 6.0 | 7.9 | 2.0 | 5.9 | 8.0 | 2.0 | 5.9 |
| LAX | 8.9 | 3.0 | 5.9 | 9.2 | 3.3 | 5.8 | 9.9 | 4.0 | 5.9 |
| CLT | 7.5 | 2.7 | 4.8 | 8.5 | 3.4 | 5.1 | 8.2 | 3.1 | 5.1 |
| LAS | 6.4 | 1.6 | 4.8 | 6.3 | 1.6 | 4.8 | 6.8 | 2.1 | 4.7 |
| IAH | 7.5 | 2.0 | 5.5 | 7.6 | 2.3 | 5.3 | 7.5 | 2.4 | 5.1 |
| PHX | 7.1 | 2.3 | 4.8 | 6.9 | 2.2 | 4.7 | 7.3 | 2.9 | 4.5 |
| PHL | 6.4 | 2.0 | 4.5 | 6.3 | 1.8 | 4.5 | 6.1 | 1.7 | 4.4 |
| DTW | 10.6 | 2.9 | 7.7 | 8.5 | 0.9 | 7.6 | 8.2 | 0.8 | 7.4 |
| MSP | 7.9 | 2.4 | 5.6 | 7.0 | 1.5 | 5.5 | 6.5 | 1.2 | 5.3 |
| SFO | 6.0 | 1.7 | 4.2 | 6.4 | 2.2 | 4.3 | 7.3 | 3.1 | 4.3 |
| EWR | 8.6 | 2.0 | 6.7 | 9.0 | 2.5 | 6.5 | 8.9 | 2.9 | 6.1 |
| JFK | 10.6 | 3.7 | 6.9 | 9.3 | 2.6 | 6.7 | 8.7 | 2.0 | 6.7 |
| MIA | 8.4 | 2.9 | 5.5 | 7.8 | 2.3 | 5.5 | 8.1 | 2.5 | 5.6 |
| LGA | 7.6 | 2.4 | 5.2 | 7.1 | 2.0 | 5.2 | 7.4 | 2.2 | 5.2 |
| BOS | 6.6 | 1.5 | 5.1 | 7.0 | 2.0 | 5.0 | 6.8 | 1.9 | 4.9 |
| IAD | 6.6 | 1.4 | 5.2 | 6.8 | 1.6 | 5.2 | 6.5 | 1.4 | 5.1 |
| SLC | 6.5 | 1.9 | 4.6 | 5.9 | 1.3 | 4.5 | 5.8 | 1.3 | 4.5 |
| SEA | 6.0 | 1.2 | 4.8 | 5.8 | 1.0 | 4.9 | 6.2 | 1.2 | 4.9 |
| MCO | 7.3 | 2.2 | 5.2 | 7.6 | 2.3 | 5.3 | 7.3 | 2.1 | 5.2 |
| DCA | 5.1 | 1.4 | 3.6 | 5.0 | 1.3 | 3.7 | 4.8 | 1.1 | 3.7 |
| HNL | 6.1 | 1.6 | 4.5 | 5.9 | 1.1 | 4.9 | 6.2 | 1.3 | 4.8 |
| MEM | 7.3 | 1.4 | 5.9 | 5.9 | 0.5 | 5.5 | 5.5 | 0.3 | 5.1 |
| BWI | 5.8 | 1.7 | 4.1 | 5.6 | 1.5 | 4.1 | 5.6 | 1.7 | 3.9 |
| FLL | 4.7 | 1.1 | 3.6 | 4.8 | 1.1 | 3.6 | 4.6 | 1.0 | 3.6 |
| MDW | 5.3 | 1.4 | 3.9 | 5.2 | 1.3 | 3.9 | 5.5 | 1.6 | 3.9 |
| TPA | 4.8 | 0.9 | 3.8 | 4.5 | 0.8 | 3.8 | 4.6 | 0.9 | 3.7 |
| SAN | 3.6 | 0.9 | 2.8 | 3.5 | 0.8 | 2.7 | 3.8 | 1.1 | 2.7 |
Table 6.8 Average taxi-out times (minutes) at 30 core airports

| Airport | 2010 Taxi-out | 2010 Taxi-out delay | 2010 Unimpeded taxi-out | 2011 Taxi-out | 2011 Taxi-out delay | 2011 Unimpeded taxi-out | 2012 Taxi-out | 2012 Taxi-out delay | 2012 Unimpeded taxi-out |
|---|---|---|---|---|---|---|---|---|---|
| ATL | 21.3 | 8.6 | 12.7 | 20.3 | 7.5 | 12.8 | 18.8 | 6.0 | 12.8 |
| ORD | 16.3 | 5.1 | 11.2 | 16.3 | 5.6 | 10.7 | 16.1 | 5.0 | 11.0 |
| DFW | 14.8 | 3.6 | 11.2 | 14.4 | 3.3 | 11.1 | 14.5 | 3.4 | 11.1 |
| DEN | 14.3 | 3.5 | 10.9 | 14.0 | 3.7 | 10.3 | 13.6 | 3.3 | 10.3 |
| LAX | 14.1 | 3.2 | 10.9 | 14.9 | 4.0 | 10.9 | 15.4 | 4.5 | 10.9 |
| CLT | 17.2 | 5.4 | 11.7 | 17.9 | 5.5 | 12.3 | 16.9 | 4.6 | 12.3 |
| LAS | 14.1 | 3.9 | 10.3 | 14.3 | 3.8 | 10.5 | 13.9 | 3.6 | 10.4 |
| IAH | 16.1 | 4.3 | 11.8 | 16.5 | 4.8 | 11.7 | 16.4 | 5.3 | 11.2 |
| PHX | 14.9 | 4.9 | 10.1 | 13.8 | 3.8 | 10.1 | 14.5 | 4.4 | 10.1 |
| PHL | 19.0 | 7.5 | 11.5 | 19.4 | 6.6 | 12.9 | 18.5 | 5.6 | 13.0 |
| DTW | 20.0 | 6.1 | 13.9 | 17.6 | 3.3 | 14.3 | 17.3 | 3.1 | 14.2 |
| MSP | 17.3 | 4.3 | 13.0 | 16.5 | 3.7 | 12.8 | 16.1 | 3.4 | 12.8 |
| SFO | 16.0 | 3.7 | 12.3 | 16.4 | 4.4 | 12.0 | 17.5 | 5.4 | 12.1 |
| EWR | 20.9 | 8.1 | 12.8 | 21.0 | 8.5 | 12.5 | 21.3 | 8.7 | 12.6 |
| JFK | 28.7 | 10.1 | 18.6 | 26.0 | 8.5 | 17.5 | 25.0 | 7.6 | 17.4 |
| MIA | 16.1 | 3.5 | 12.6 | 16.2 | 3.4 | 12.8 | 16.2 | 3.5 | 12.8 |
| LGA | 24.4 | 11.5 | 12.9 | 24.7 | 11.7 | 13.0 | 24.0 | 11.0 | 13.0 |
| BOS | 18.4 | 5.5 | 12.9 | 18.2 | 5.2 | 13.0 | 17.2 | 4.3 | 12.9 |
| IAD | 16.0 | 3.7 | 12.3 | 16.0 | 4.0 | 12.0 | 15.7 | 3.8 | 11.9 |
| SLC | 18.3 | 5.3 | 13.0 | 17.4 | 4.5 | 12.9 | 17.2 | 4.3 | 12.9 |
| SEA | 14.3 | 2.4 | 11.9 | 14.1 | 2.6 | 11.5 | 14.1 | 2.6 | 11.5 |
| MCO | 13.1 | 3.4 | 9.6 | 13.5 | 3.7 | 9.8 | 13.4 | 3.6 | 9.8 |
| DCA | 15.7 | 4.4 | 11.3 | 16.1 | 4.3 | 11.8 | 15.7 | 3.9 | 11.8 |
| HNL | 12.7 | 2.3 | 10.4 | 12.7 | 2.0 | 10.7 | 13.0 | 2.3 | 10.7 |
| MEM | 16.8 | 4.4 | 12.5 | 15.2 | 2.1 | 13.1 | 14.7 | 1.6 | 13.0 |
| BWI | 12.5 | 3.4 | 9.1 | 12.5 | 3.4 | 9.2 | 12.7 | 3.4 | 9.2 |
| FLL | 15.1 | 3.2 | 11.9 | 15.9 | 4.2 | 11.7 | 16.1 | 4.4 | 11.7 |
| MDW | 11.5 | 3.5 | 8.0 | 11.0 | 3.0 | 8.1 | 10.6 | 2.6 | 8.0 |
| TPA | 12.5 | 3.0 | 9.4 | 11.9 | 2.4 | 9.5 | 11.7 | 2.2 | 9.5 |
| SAN | 12.9 | 3.1 | 9.8 | 12.9 | 3.2 | 9.8 | 13.4 | 3.6 | 9.8 |
Major U.S. airports have been experiencing significant delays, and most delays occur during taxi-in and taxi-out activities. A trajectory prediction-based conflict detection and resolution (CD&R) model can address taxiing delays more efficiently than currently available tools. Conflicts in airport surface operations can be defined as incompatibilities between constraints, which are based on 4D (three-dimensional space plus time) trajectory predictions. Conflicts arise under several conditions, including takeoffs/departures, deviation from assigned clearance, and updated trajectories based on new information. A successful CD&R methodology must answer four questions:
1. How can the dynamically changing constraints such as aircraft movements and spontaneous airport adjustments be modeled?
2. How can the interactions between the constraints be identified and analyzed?
3. How do conflicts propagate and trigger other conflicts?
4. Can a solution of conflicts cause further conflicts?
An airport has multiple conflicts at any time during operations. These conflicts evolve and form complex networks. The ultimate question in airport traffic control is whether a decision support system based on complex network theory can detect and resolve conflicts on taxiways and runways. A constraint describes the 4D trajectories of aircraft and ground transportation vehicles and the resources used by these vehicles. There are two types of constraints in a complex network: time constraints and capacity constraints. Time constraints describe the predicted times of entering and exiting intersections and runway and taxiway segments. Capacity constraints specify the acceptable number of vehicles, including aircraft and ground transportation vehicles, on a runway or taxiway segment or at an intersection at a given time. Complex network theory is applied to provide guidance on local and global conflict resolutions and decisions. There are three conflict resolution approaches:
(a) Global conflict resolution for all resources. All resources have the same priority, and time constraints for all resources are randomly selected to resolve conflicts. Conflicts may propagate among resources.
(b) Local conflict resolution for the resources involved in conflicts. Conflict resolutions are confined to the resources involved in conflicts. As in global conflict resolution, all resources have the same priority and time constraints are randomly selected to resolve conflicts, but a conflict over the use of one resource does not propagate to other resources.
(c) A hybrid approach that applies global conflict resolution to high-priority resources (e.g., runways) and local conflict resolution to low-priority resources (e.g., taxiways).
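The time and capacity constraints described above can be illustrated with a small sketch. It is not the CD&R model itself: the `TimeConstraint` record, the resource names, and the default capacity of one vehicle per resource are assumptions made for illustration. The function sweeps over predicted entry and exit times for each resource and reports every moment at which the number of vehicles present would exceed the capacity constraint.

```python
from collections import defaultdict
from typing import NamedTuple


class TimeConstraint(NamedTuple):
    """Predicted use of one resource by one vehicle (hypothetical schema)."""
    vehicle: str
    resource: str    # runway segment, taxiway segment, or intersection
    t_enter: float   # predicted entry time, in seconds
    t_exit: float    # predicted exit time, in seconds


def detect_conflicts(time_constraints, capacity):
    """Return (resource, time, vehicles) wherever occupancy exceeds capacity."""
    by_resource = defaultdict(list)
    for tc in time_constraints:
        by_resource[tc.resource].append(tc)

    conflicts = []
    for resource, tcs in by_resource.items():
        limit = capacity.get(resource, 1)  # assumption: capacity 1 if unspecified
        events = []
        for tc in tcs:
            events.append((tc.t_enter, 1, tc.vehicle))  # 1 marks an entry
            events.append((tc.t_exit, 0, tc.vehicle))   # 0 marks an exit
        events.sort()  # at equal times, exits are processed before entries
        present = set()
        for time, is_entry, vehicle in events:
            if is_entry:
                present.add(vehicle)
                if len(present) > limit:
                    conflicts.append((resource, time, sorted(present)))
            else:
                present.discard(vehicle)
    return conflicts


# Illustrative predictions for three aircraft and one tug (made-up values).
plan = [
    TimeConstraint("AC1", "TWY_A", 0, 60),
    TimeConstraint("AC2", "TWY_A", 30, 90),
    TimeConstraint("AC3", "RWY_1", 100, 160),
    TimeConstraint("TUG7", "TWY_A", 60, 120),
]
for conflict in detect_conflicts(plan, {"TWY_A": 1, "RWY_1": 1}):
    print(conflict)
```

A resolution step, whether global, local, or hybrid, would then adjust the time constraints of the vehicles named in each reported conflict; that step is deliberately left out of this sketch.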
Compared to local conflict resolutions, global conflict resolutions are more likely to find a solution because of their larger solution space, but conflict propagation poses safety risks, and global conflict resolutions can be computationally challenging for large airports. Local conflict resolutions minimize safety risks but can result in infeasible solutions. The hybrid approach for decision support is expected to provide feasible yet low-risk solutions for high-throughput surface operations. A typical airport has several runways and taxiways. Figure 6.3 shows the St. Louis Lambert International Airport (STL) with four runways and numerous taxiways.
Fig. 6.3 The STL airport in 2010
STL is a medium-size airport; larger airports and hubs have more runways and taxiways. Runway traffic optimization aims to minimize incursions and delays, which improves flight safety, reduces cost, increases customer satisfaction, and raises runway utilization so that more flights can be accommodated. Several parameters must be included in a network of constraints to help optimize runway traffic: (a) aircraft ground speed; (b) the type, size, and capacity of the aircraft, which determine how much runway the aircraft uses before pulling into a taxiway; and (c) the gate that the aircraft is approaching or leaving. There are also conflicts of interest. Airlines want to minimize fuel consumption and maximize operational efficiency. The FAA is responsible for minimizing the risk of incursions on the ground or in the air. Airports often have their own objectives and priorities.
References 1. Abkowitz M, List G, Radwan AE (1989) Critical issues in safe transport of hazardous materials. J Transp Eng 115(6):608–629 2. Albert R, Jeong H, Barabási AL (1999) Internet: diameter of the World-Wide Web. Nature 401:130–131 3. Albert R, Jeong H, Barabási AL (2000) Error and attack tolerance of complex networks. Nature 406:378–382 4. Angeles Serrano M, De Los Rios P (2007) Interfaces and the edge percolation map of random directed networks. Phys Rev E: Stat, Nonlin, Soft Matter Phys 76:56–121 5. Association of American Railroads (2012) Annual Report of hazardous materials transported by rail: calendar year 2011, Washington, DC 6. Ball T, Rajamani S (2000) Bebop: a symbolic model-checker for boolean programs,” In: Proceedings of the 7th international SPIN workshop, Lecture Notes in computer science 1885
References
105
7. Bazargan M, Fleming K, Subramanian P (2002) A simulation study to investigate runway capacity using TAAM. In: Proceedings of the winter simulation conference 2002 8. Barabási AL, Albert R (1999) Emergence of scaling in random networks. Science 286:509–512 9. Barkan CPL, Glickman TS, Harvey AE (1991) Benefit–cost evaluation of using different specification tank cars to reduce the risk of transporting environmentally sensitive chemicals. Transp Res Rec 1313:33–43 10. Barkan CPL, Ukkusuri S, Waller ST (2007) Optimizing the design of railway tank cars to minimize accident-caused releases. Comput Oper Res 34(5):1266–1286 11. Barkan CPL (2008) Improving the design of higher-capacity railway tank cars for hazardous materials transport: optimizing the trade-off between weight and safety. J Hazard Mater 160:122–134 12. Bianconi G, Barabási AL (2001) Bose-Einstein condensation in complex networks. Phys Rev Lett 86:5632–5635 13. Bollobas B (2001) Random graphs, 2nd edn. Academic Press, New York 14. Brandstädt A. Le VB, Spinrad JP (1999) Graph classes: a survey. In: Discrete mathematics and algorithms. Society for Industrial and Applied Mathematics 15. Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener J (2000) Graph structure in the web. Comput Netw 33:309–320 16. Callaham MB, DeArmon JS, Cooper AM, Goodfriend JH, Moch-Mooney D, Solomos GH (2001) Assessing NAS performance: Normalizing for the effects of weather. In: Proceedings of the 4th USA/Europe air traffic management R&D symposium 17. Center for Chemical Process Safety (2008) Guidelines for chemical transportation safety, security, and risk management, 2nd edn. Hoboken, NJ 18. Chen XW, Nof SY (2009) Automating errors and conflicts prognostics and prevention. In: Nof SY (ed) Springer handbook of automation. Springer, Heidelberg, Germany, pp 503–525 19. Chen XW, Nof SY (2012a) Constraint-based conflict and error management. Eng Optim 44(7):821–841 20. 
Chen XW, Nof SY (2012b) Conflict and error prevention and detection in complex networks. Automatica 48(5):770–778 21. Cheng VHL, Sharma V, Foyle DC (2001) A study of aircraft taxi performance for enhancing airport surface traffic control. IEEE Trans Intell Transp Syst 2(2):39–54 22. Cohen R, Erez K, Ben-Avraham D, Havlin S (2000) Resilience of the internet to random breakdowns. Phys Rev Lett 85:4626–4628 23. Cohen R, Erez K, Ben-Avraham D, Havlin S (2001) Breakdown of the internet under intentional attack. Phys Rev Lett 86:3682–3685 24. Corbett JC, Dwyer MB, Hatcliff J, Laubach S, Pasareanu CS, Robby, Zheng H (2000) Bandera: Extracting finite-state models from Java source code. In Proceedings of the 22nd international conference on software engineering 25. Crainic TG (1987) Freight transport planning and logistics. In: Proceedings of an international seminar on freight transport planning and logistics, Bressanone, Italy, pp 463–509 26. Dawande M, Mookerjeeh V, Sriskandarajah C, Zhu Y (2011) Structural search and optimization in social networks. INFORMS J Comput 24(4):611–623 27. Donaldson AD (2011) Improvement of terminal area capacity in the New York Airspace. Ph.D. Dissertation, Massachusetts Institute of Technology 28. Erd˝os P, Rényi A (1959) On random graphs. Publicationes Matehmaticae Debrecen 6:290–291 29. Erd˝os P, Rényi A (1960) On the evolution of random graphs. Magyar Tud Akad Mat Kutato Int Kozl 5:17–61 30. Erd˝os P, Rényi A (1961) On the strength of connectedness of a random graph. Acta Math Academiae Scientiarum Hung 12:261–267 31. Erkut E, Tjandra SA, Verter V (2007) Hazardous materials transportation. In: Barnhart C, Laporte G (eds) Handbooks in operations research and management science, vol 14. Transportation, North-Holland, Amsterdam, The Netherlands 32. Farley CT, Landry SJ, Hoang T, Nickelson M, Levin KM, Rowe D, Welch JD (2005) Multicenter traffic management advisor: Operational test results. 
In: Proceedings of the AIAA 5th aviation technology, integration, and operations conference
106
6 Transportation Systems
33. Federal Aviation Administration (FAA) (2012) Terminal area forecast summary: fiscal years 2011–2040 34. Glassco R (2011) State of the practice of techniques for evaluating the environmental impacts of ITS deployment. Report No. FHWA-JPO-11-142, Prepared by Noblis, Inc. for U.S. DOT Research and Innovative Technology Administration ITS Joint Program Office 35. Glickman TS (1983) Rerouting railroad shipments of hazardous materials transportation risk assessment. Accid Anal Prev 15(5):329–335 36. Glickman TS, Rosenfield DB (1984) Risks of catastrophic derailments involving the release of hazardous materials. Manage Sci 30(4):503–511 37. Godwin T, Gopalan R, Narendran TT (2008) Tactical locomotive fleet sizing for freight train operations. Transp Res Part E 44:440–454 38. Goulias KG (2002) Transportation systems planning, freight transportation planning: models and methods. CRC Press, Frank Southworth Oak Ridge National Laboratory 39. Helbing K, Spaeth T, Valasek J (2006) Improving aircraft sequencing and separation at a small aircraft transportation system airport. J Aircr 43(6):1636 40. Hill CJ, Garrett JK (2011) AASHTO connected vehicle field infrastructure deployment analysis. Prepared by Mixon Hill, Inc. for the American Association of State Highway Officials and the U.S. DOT Research and Innovative Technologies Administration, Report No. FHWA-JPO-11090 41. Kalay S, French P, Tournay HM (2011) The safety impact of wagon health monitoring in North America. In: Proceedings of the world congress on railway research, Lille, France 42. Kawprasert A, Barkan CPL (2008) Effects of route rationalization on hazardous materials transportation risk. Transp Res Rec 2043:65–72 43. Kawprasert A (2010) Quantitative analysis of options to reduce risk of hazardous materials transportation by railroad. Ph.D. Dissertation, Department of Civil and Environmental Engineering. University of Illinois at Urbana-Champaign, Urbana, IL 44. 
Kawprasert A, Barkan CPL (2010) Effect of train speed on risk analysis of transporting hazardous materials by rail. Transp Res Rec 2159:59–68 45. Keaton MH (1989) Designing optimal railroad operating plans: Lagrangian relaxation and heuristic approaches. Transp Res Part B Methodol 23(6):415–431 46. Klosterhalfen ST, Kallrath J, Fischer G (2014) Rail car fleet design: Optimization of structure and size. Int J Prod Econ 157:112–119 47. Kraay DR, Harker PT (1995) Real-time scheduling of freight railroads. Transp Res Part B Methodol 29(3):213–229 48. Landry SJ (2004) Modifications to the design of the multi-center traffic management advisor distributed scheduler. In: Proceedings of the AIAA 4th aviation technology, integration, and operations (ATIO) technical forum 49. Lecoutre C (2009) Constraint networks: techniques and algorithms. Wiley, Hoboken, NJ 50. Liu X, Saat MR, Barkan CPL (2011a) Benefit–cost analysis of heavy haul railway track upgrade for safety and efficiency. In: Proceedings of the international heavy haul association conference, Calgary, Canada 51. Liu X, Barkan CPL, Saat MR (2011b) Analysis of derailments by accident cause: evaluating railroad track upgrades to reduce transportation risk. Transp Res Rec 2261:178–185 52. Martinelli DR, Teng H (1996) Optimization of railway operations using neural networks. Transp Res Part C Emerg Technol 4(1):33–49 53. MITRE Corporation (2013) An Analysis of airports and metropolitan area demand and operational capacity in the future 54. Mitretek Systems (2001) Intelligent transportation systems benefits: 2001 update, The federal highway administration, United States Department of Transportation, Washington, D.C. 55. Newman MEJ (2003) The structure and function of complex networks. SIAM Rev 45:167–256 56. Newman MEJ, Barabási AL, Watts DJ (2006) The structure and dynamics of networks. Princeton University Press, Princeton, N.J. 57. 
Ouyang Y, Li X, Lai YC, Barkan CPL, Kawprasert A (2009) Optimal locations of railroad wayside defect detection installations. Comput Aided Civil Infrastruct Eng 24:1–11
References
107
58. Pålsson BA (2013) Design optimisation of switch rails in railway turnouts. Veh Syst Dyn 51(10):1619–1639 59. Resor RR, Zarembski AM (2004) Factors determining the economics of wayside defect detectors. In: Proceedings of transportation research board of the national academies, Washington, DC 60. Rossetti M (2003) Potential impacts of climate change on railroads. Geography 61. Saat MR, Barkan CPL (2005) Release risk and optimization of railroad tank car safety design. Transp Res Rec 1916:78–87 62. Saat MR (2009) Optimizing railroad tank car safety design to reduce hazardous materials transportation risk. Ph.D. Dissertation, Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana 63. Saat MR, Barkan CPL (2011) Generalized railway tank car safety design optimization for hazardous materials transport: addressing the trade-off between transportation efficiency and safety. J Hazard Mater 189:62–68 64. Simaiakis I, Balakrishnan H (2010) Impact of congestion on taxi times, fuel burn, and emissions at major airports. Transp Res Rec 2184(1):22–30 65. Solomonoff R, Rapoport A (1951) Connectivity of random nets. Bull Math Biophys 13:107–117 66. Tcha D, Yoon M (1995) Conduit and cable installation for a centralized network with logical star-star topology. IEEE Trans Commun 43:958–967 67. Tournay HM, Cummings S (2005) Monitoring the performance of railroad cars by means of wayside detectors in support of predictive maintenance. In: Proceedings of the 8th international heavy haul conference, Rio De Janeiro, Brazil 68. Tu F, Pattipati KR, Deb S, Malepati VN (2003) Computationally efficient algorithms for multiple fault diagnosis in large graph-based systems. IEEE Trans Syst Man Cybern Part A Syst Hum 33(1):73–85 69. Tu F, Pattipati KR (2003) Rollout strategies for sequential fault diagnosis. IEEE Trans Syst Man Cybern Part A Syst Hum 33(1):86–99 70. Tyrell D, Jeong DY, Jacobsen K (2007) Improved tank car safety research. 
In: Proceedings of the ASME rail transportation division fall technical conference, Chicago, IL 71. U.S. DOT research and innovative technologies administration ITS Joint Program Office (2010) Achieving the Vision: From VII to IntelliDrive. White Paper 72. U.S. National Transportation Safety Board (2005) Railroad Accident Report, RAR-05–04, Washington, DC 73. Zabarah A, Callahan B, Antezano C, Lamartin D, Henriques R, Rahman S (2009) Breaking through the bottleneck transportation to make Stewart a viable New York airport. In: Systems and information engineering design symposium
Chapter 7
Adaptive Algorithms for Knowledge Acquisition Over Complex Networks
Knowledge acquisition over complex networks is a process of identifying nodes (knowledge sources) that possess the required knowledge. Despite substantial research on knowledge management, it remains a challenge to effectively acquire distributed knowledge from a few entities within a network of interrelated knowledge sources. The main challenges in knowledge acquisition are: (a) what are the structures and characteristics of knowledge networks? (b) how can knowledge be acquired from a complex knowledge network? (c) how can different knowledge acquisition algorithms be evaluated to identify the most effective algorithm for a specific knowledge network? and (d) how do the dynamics of knowledge networks, i.e., interactions between and changes of knowledge sources, affect the performance of knowledge acquisition algorithms? To overcome these challenges, it is necessary to develop a complex knowledge network theory and effective knowledge acquisition algorithms for knowledge networks of various scales. The algorithms must be network-adaptive in the sense that they dynamically adjust themselves in response to network properties as the knowledge acquisition process evolves.
An important assumption in knowledge acquisition is that knowledge is distributed and an omniscient entity does not exist. Multiple distributed sources may possess the same knowledge. Knowledge acquisition can involve identifying one source of the knowledge or identifying all the sources for further analysis. In other cases, each source possesses a portion of the knowledge, and complete knowledge of the subject can only be obtained by identifying all distributed knowledge sources. Knowledge consists of data and/or information that have been organized and processed to convey understanding, experience, accumulated learning, and expertise for problem solving.
There are at least three types of knowledge networks:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2022 X. W. Chen, Network Science Models for Data Analytics Automation, Automation, Collaboration, & E-Services 9, https://doi.org/10.1007/978-3-030-96470-2_7
1. Professional organizations, institutions, and other connected knowledge groups, e.g., the Institute of Industrial and Systems Engineers (IISE), the National Institute of Standards and Technology (NIST), and university presidents and provosts. In these organizations, each member is a knowledge source. Some sources are connected to other sources and some are not. They may form a random network, a scale-free network, a hierarchical network, or another type of network. When two sources are connected, there are also different types of relationships. For instance, sources may form a subgroup, also known as a circle, in a knowledge network, and each source in the group is aware of what knowledge other sources in the same group have or do not have. Activities such as searching for a university president are typical knowledge acquisition processes.
2. Supply chains in which customers and suppliers form a knowledge network. Knowledge desired from suppliers includes what products they can provide, price, quantity, lead time, due-date performance, and other features that concern customers. Knowledge required from customers includes what products are needed, acceptable price, delivery date, payment methods, and other features that concern suppliers. Supply chains may be hierarchical networks in which a knowledge source can be both a customer and a supplier. In this case, a supply network is not a strict bipartite network [19].
3. The Internet and World-Wide Web (WWW). Each Web site or Web page is a knowledge source. The Web pages on the WWW and the underlying physical structure of the Internet are well-known scale-free networks (SFNs) [1].
The importance of knowledge acquisition is twofold:
(a) First, obtaining knowledge at the right time is critical for the success of organizations and individuals. As the amount of data and information increases exponentially, knowledge becomes highly distributed. Rather than reinventing the wheel to find solutions for problems, researchers need to ask the more challenging question of how to find existing knowledge. Obtaining knowledge effectively can reduce or eliminate waste of resources and significantly improve system performance. For instance, through effective knowledge acquisition, lives may be saved if doctors can quickly acquire critical knowledge about certain diseases, or if pharmaceutical companies can use an effective algorithm to find the supplies necessary to produce sufficient vaccines; energy consumption may be reduced if a supplier can identify real-time customer demand and consolidate customers’ orders for optimal shipping schedules.
(b) Second, effective knowledge acquisition is the cornerstone of collaboration science. Advances in science and technology have enabled access to a large amount of data and information since the 1970s, mostly through the Internet. In the last ten to twenty years, the Internet has enabled real-time interactions between individuals and groups. At the beginning of this century, we are marching into a new era that will hopefully provide a platform for effective collaboration among distributed entities, which requires timely identification of knowledge sources as potential collaboration partners. For instance, one of the objectives of emergency preparedness is to establish a collaboration network whose members have complementary knowledge and/or skills. To this end, knowledge acquisition is fundamental to collaboration and the advance of science and technology.
A variety of knowledge management systems, including expert systems, decision support systems, and artificial intelligence systems, and tools such as interviewing, neural networks, and data mining, have been developed to extract knowledge from individuals, data, and information. It is often assumed that knowledge sources can be easily identified, whereas one of the practical challenges for many organizations is how to acquire knowledge most effectively from a few sources within a large, complex network. Acquiring knowledge from such networks poses a real challenge because of their scale and complexity. A large knowledge network, such as a professional organization or a supply chain, can easily have over ten thousand nodes (knowledge sources, i.e., members of a professional organization or customers and suppliers in a supply chain). Multiple nodes may own the same, complete piece of knowledge, or only part of certain knowledge. Knowledge networks are complex because there are many nodes and various types of direct and indirect relationships between nodes.
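To make the notion of identifying knowledge sources concrete, the following is a minimal sketch, not an algorithm from this chapter. It assumes a plain breadth-first search from a starting node, and it assumes that what a node knows is revealed as soon as the node is visited; the names `acquire_knowledge`, `links`, and `holdings` are hypothetical. The search stops once the visited sources together cover the required knowledge, which reflects the case in which each source holds only part of it.

```python
from collections import deque


def acquire_knowledge(links, holdings, start, required):
    """Breadth-first search for sources that together cover `required`.

    links:    dict mapping each node to the nodes it is connected to
    holdings: dict mapping each node to the set of knowledge items it owns
    Returns (sources, missing): the selected sources in visiting order and
    the required items that no reachable source possesses.
    """
    missing = set(required)
    sources = []
    visited = {start}
    queue = deque([start])
    while queue and missing:
        node = queue.popleft()
        found = holdings.get(node, set()) & missing
        if found:
            sources.append(node)
            missing -= found
        for neighbor in links.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return sources, missing


# Illustrative five-node network; node E is not connected to the others.
links = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B"], "E": []}
holdings = {
    "B": {"price"},
    "C": {"price", "lead time"},
    "D": {"quantity"},
    "E": {"payment methods"},
}
print(acquire_knowledge(links, holdings, "A", {"price", "lead time", "quantity"}))
```

A search of this kind ignores network properties altogether; a network-adaptive algorithm would use them, for example the degree of a node, to decide which source to contact next.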
7.1 Strengths and Limitations of Knowledge Management and Web Mining
A variety of topics, including knowledge representation [27], knowledge acquisition [12], and knowledge management [17], have been studied in the context of intelligent systems [23, 24]. Advances in knowledge management have enabled knowledge to be represented in many different ways. A wide range of information systems, such as decision support systems, expert systems, databases, and knowledge bases, have been developed to manage knowledge after it is acquired. Other tools and techniques, e.g., data mining and interviewing, deal with knowledge extraction from individuals, data sets, or information. Little work [12] has been performed on acquiring knowledge over a network of interrelated sources.
The closest analogy to knowledge acquisition over complex networks is Web mining, which employs tools including fuzzy sets, artificial neural networks, genetic algorithms, and rough sets [21]. The objective of Web mining is either to extract knowledge from a single entity, e.g., a text, Web page, or table, or to retrieve data and information from multiple sources. Web mining is similar to data mining when the objective is to extract knowledge from a single entity. When the objective is to retrieve data and information from multiple sources, Web mining is similar to knowledge acquisition over complex networks, but little effort has gone into developing mining algorithms that take advantage of network characteristics and respond to network dynamics.
The difference between Web mining and knowledge acquisition over networks also lies in the two main challenges for Web mining: (a) how to acquire knowledge that is not available on the Web (e.g., knowledge about suppliers and customers in a supply chain); and (b) how to model interactions between multiple sources when knowledge is implicit. For instance, it is often not sufficient to go through the information available online about members of a professional organization to locate specific knowledge. This is because knowledge owned by a source is often implicit. Even if it is explicit, it may be too general and lack specifics. The source may even decline to offer knowledge for various reasons. The interactions between knowledge sources must be modeled for effective knowledge acquisition.
7.2 Complex Networks
The study of complex networks has a long history (see Chapter 1 for a brief discussion). The classic network model, the random network, was first discussed in the early 1950s [26] and was rediscovered and analyzed in a series of papers published in the late 1950s and early 1960s [13–15]. The degree of a node, $\theta$, is the number of links connected to it. In a random network with $n$ nodes and probability $Pr$ of connecting any pair of nodes, the maximum number of links in the network is $\frac{1}{2}n(n-1)$. The probability that a node has degree $\theta$ is $\binom{n-1}{\theta} Pr^{\theta}(1-Pr)^{n-1-\theta}$, which is also the expected fraction of nodes in the network that have degree $\theta$. The average degree is $\bar{\theta} = (n-1)Pr$. One of the important properties of a random network is phase transition or bond percolation [2, 20, 26]. There is a phase transition from a fragmented random network for mean degree $\bar{\theta} \le 1$ to a random network dominated by a giant component for mean degree $\bar{\theta} > 1$.
Two other types of networks that have been studied extensively and capture the topology of many real-world networks are: (a) the scale-free network [1, 3, 7, 22]. In scale-free networks, the probability $Pr_{\theta}$ that a node has degree $\theta$ follows a power law distribution, i.e., $Pr_{\theta} \propto \theta^{-\gamma}$, where $\gamma$ is between 2.1 and 4 for real-world scale-free networks [3]; and (b) the Bose-Einstein condensation network [5, 6], which was discovered in an effort to model the competitive nature of networks. A fitness model [6] was proposed to assign a fitness parameter $\xi_i$ to each node $i$. A node with higher $\xi_i$ has a higher probability of obtaining links. $\xi_i$ is randomly chosen from a distribution $\rho(\xi)$. Table 7.1 summarizes the three types of complex networks with examples.

Table 7.1 Summary of three types of complex networks
| | Random network | Scale-free network | Bose-Einstein condensation network |
|---|---|---|---|
| Network characteristic | The link between any pair of nodes exists with probability $Pr$ | The degree distribution follows a power law distribution | The fittest node acquires a finite fraction of all links, independent of the size of the network |
| Degree distribution | $\binom{n-1}{\theta} Pr^{\theta}(1-Pr)^{n-1-\theta}$ | $c\theta^{-\gamma}$, where $c$ is a positive constant | The fittest node acquires about 80% of all links |
| Topology | Homogeneous: the majority of nodes have the same number of links, and nodes that deviate from the average are rare | Hierarchical: nodes span from rare hubs that have many links to the numerous tiny nodes with a few links | Winner-takes-all: the fittest node has a large fraction of all links |
| Topology example ($n = 10$; $Pr = 0.17$; $\gamma = -3$; $c = 1/1.2$) | [network diagram] | [network diagram] | [network diagram] |
| Real-world examples | US highway systems | World-Wide Web, actor collaboration, power grid | Supply chains |

Yet another type of complex network that has been gaining attention recently is the hierarchical network, found in biology and supply chains. Biological networks can be decomposed into modular components that recur across and within organisms. The top level in a hierarchical biological network is comprised of interacting regulatory motifs consisting of groups of 2–4 components such as proteins or genes [16, 25, 30]. Motifs are small sub-networks, and pattern searching techniques can be used to determine the frequency of occurrence of these simple motifs [25]. At the lowest level in this hierarchy is the module describing transcriptional regulation [4]. In supply chains, a customer may have multiple suppliers, some of which may have their own suppliers. The top level in a hierarchical supply chain is comprised of customers only, whereas the lowest level is comprised of suppliers only. Nodes at all levels in between are both customers and suppliers. Unlike random networks, scale-free networks, and Bose-Einstein condensation networks, hierarchical networks do not yet have analytical models that describe their topology and reveal their properties. The rich analytical and experimental studies on complex networks have laid a solid foundation for the development of a complex knowledge network theory. The analytical study of knowledge acquisition algorithms is largely based on available network theory and statistical tools.
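The random-network quantities in this section can be checked numerically. The sketch below is illustrative only: it evaluates the binomial degree distribution for $n = 10$ and $Pr = 0.17$ (the values listed with the topology example in Table 7.1), confirms that the mean degree equals $(n-1)Pr$, and then samples random networks with mean degrees below, at, and above 1 to show the change in the size of the largest component. The network size of 1000 nodes, the seed, and the function names are assumptions chosen for this example.

```python
import math
import random


def degree_pmf(n, p, theta):
    """Probability that a node in a random network has degree theta."""
    return math.comb(n - 1, theta) * p**theta * (1 - p) ** (n - 1 - theta)


def largest_component_fraction(n, p, rng):
    """Sample one random network; return the fraction of nodes in its largest component."""
    parent = list(range(n))  # union-find forest over the nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Each of the n(n-1)/2 possible links exists independently with probability p.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                parent[find(i)] = find(j)

    sizes = {}
    for node in range(n):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


n, p = 10, 0.17
pmf = [degree_pmf(n, p, theta) for theta in range(n)]
mean_degree = sum(theta * pr for theta, pr in enumerate(pmf))
print(f"sum of pmf = {sum(pmf):.6f}, mean degree = {mean_degree:.4f}, (n-1)Pr = {(n - 1) * p:.4f}")

rng = random.Random(0)  # fixed seed so that repeated runs agree
m = 1000                # illustrative network size
for mean in (0.5, 1.0, 2.0):
    frac = largest_component_fraction(m, mean / (m - 1), rng)
    print(f"mean degree {mean}: largest component holds {frac:.1%} of the nodes")
```

Because each run samples a single network per mean degree, the printed fractions vary with the seed; the contrast between mean degrees below and above 1 is the point of the example, not the exact values.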
7.3 Network-Adaptive Algorithms
Many domain-specific algorithms have been developed to address network problems, for instance, model checking for software design [11], testing algorithms for hardware design [28, 29], and decentralized algorithms for conflict and error prevention and detection [8, 9]. Most recently, network-adaptive algorithms [10] have been developed to prevent and detect conflicts and errors over complex networks. There are two types of conflict and error prevention and detection (CEPD) algorithms: centralized and decentralized. The centralized algorithm has a central unit that controls data, information, and algorithm execution, whereas the decentralized algorithm has multiple agents that perform tasks simultaneously. Substantial analytical results have been derived and validated by simulation experiments. Table 7.2 shows a portion of the analytical results for CEPD algorithms [10].

Table 7.2 Performance of CEPD algorithms over the same undirected networks. Columns: centralized algorithm and decentralized algorithm. Rows: the performance measures $TT$, $CA$, and $PA$, and the condition under which each result holds. [The cell formulas are garbled in extraction and are not reproduced here; see [10].]

$TT$, $CA$, and $PA$ are mean total time, mean coverage ability, and mean preventability, respectively. $T$ and $TS$ are the end time and start time of an invariant model. $t_d$ and $t_i$ are detection time and communication time between agents, respectively. $p_g$ is the order of the largest component (or giant component) in a complex network; $p_g$ is different for different complex networks. $Pr_{CE}$ is the probability that a node has a conflict or an error. $r$ is the reliability of a detection technique. These analytical results offer insight into the performance of CEPD algorithms and help develop network-adaptive algorithms. Analytical models, however, unavoidably rely on assumptions, as shown in the last row of Table 7.2, to allow tractable analytical results. Experimental studies must follow to validate analytical results and reveal the algorithms’ performance when certain assumptions are relaxed. The success of network-adaptive CEPD algorithms has shown the potential for developing effective network-adaptive algorithms for knowledge acquisition. Three complex networks, a random network, a scale-free network, and a Bose-Einstein condensation network, were studied in CEPD, but these networks may not reflect the topology and characteristics of knowledge networks.
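The practical difference between the two algorithm types can be seen in a rough timing sketch. It is not taken from [10]: it assumes that a centralized check of one node costs one detection time $t_d$ plus one message in each direction ($2t_i$), that checks are strictly sequential, and that in the decentralized case every agent checks its own node at the same time. The function names are hypothetical.

```python
def centralized_nodes_checked(window, t_d, t_i, n):
    """Nodes one central unit can check sequentially within `window`.

    Assumes each check costs t_d plus one message each way (2 * t_i).
    """
    return min(n, int(window // (t_d + 2 * t_i)))


def decentralized_elapsed(detection_times):
    """Elapsed time when every agent checks its own node simultaneously."""
    return max(detection_times)


# Illustrative values: the invariant model starts at TS = 10 and ends at T = 100.
T, TS = 100, 10
print(centralized_nodes_checked(T - TS, t_d=2, t_i=0.5, n=50))  # nodes covered out of 50
print(decentralized_elapsed([2, 2.5, 1.5]))                     # slowest agent sets the time
```

Under these assumptions the centralized algorithm is limited by how many sequential checks fit between $TS$ and $T$, whereas the decentralized algorithm is limited only by its slowest agent.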
The relationships and interactions between nodes in knowledge networks are most likely different than those in CEPD networks. The algorithms for these two different areas are therefore different. The types of algorithms studied for CEPD, centralized and decentralized, can be applied to knowledge acquisition. The network-adaptive algorithms for CEPD serve as a good reference for the development of knowledge acquisition algorithms. The CEPD algorithms, however, lack (a) network dynamics analysis. The CEPD algorithms assumed that interactions and relationships between network nodes do not change, and nodes do not join or leave the network (invariant model), and (b) adaptability to real-world problems. The centralized and decentralized algorithms are two extremes: sequential versus parallel. In real-world knowledge acquisition
applications, mixed algorithms that are partly centralized and partly decentralized are necessary for many problems. For instance, in a hierarchical supply chain, a customer may acquire knowledge from its direct suppliers with a centralized algorithm, i.e., by interacting with the direct suppliers sequentially, while the direct suppliers acquire knowledge from their respective suppliers with a decentralized algorithm, i.e., multiple direct suppliers acquire knowledge from their own suppliers simultaneously and convey it to the customer. Mixed algorithms may therefore be a good fit for hierarchical networks.
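The timing benefit of the mixed structure in the supply chain example can be illustrated with a minimal sketch. It assumes a hypothetical two-tier chain, a single fixed interaction time, and that each direct supplier contacts its own suppliers one at a time; the names `SUPPLY_CHAIN`, `T_INTERACT`, `mixed_time`, and `fully_sequential_time` are illustrative and do not come from the text.

```python
from typing import Dict, List

# Hypothetical two-tier supply chain: direct supplier -> its own suppliers.
# All names and values here are illustrative assumptions.
SUPPLY_CHAIN: Dict[str, List[str]] = {
    "S1": ["S1a", "S1b"],
    "S2": ["S2a"],
    "S3": ["S3a", "S3b", "S3c"],
}
T_INTERACT = 1.0  # assumed time for one agent-to-agent interaction


def mixed_time(chain: Dict[str, List[str]], t: float = T_INTERACT) -> float:
    """Lower tier runs in parallel across direct suppliers; the customer
    then interacts with the direct suppliers one after another."""
    # Parallel lower tier: the slowest direct supplier sets the duration.
    lower = max((len(subs) * t for subs in chain.values()), default=0.0)
    # Sequential upper tier: one interaction per direct supplier.
    upper = len(chain) * t
    return lower + upper


def fully_sequential_time(chain: Dict[str, List[str]], t: float = T_INTERACT) -> float:
    """Baseline in which every interaction in both tiers is serialized."""
    lower = sum(len(subs) * t for subs in chain.values())
    upper = len(chain) * t
    return lower + upper


if __name__ == "__main__":
    print("mixed:", mixed_time(SUPPLY_CHAIN))
    print("fully sequential:", fully_sequential_time(SUPPLY_CHAIN))
```

Under these assumptions the mixed structure saves time whenever more than one direct supplier has suppliers of its own, because only the slowest lower-tier acquisition contributes to the total.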
7.4 Development of Knowledge Acquisition Algorithms

Knowledge acquisition requires a combined approach that builds on developments in knowledge management and incorporates extensive analytical and experimental studies on complex networks together with existing techniques in algorithm design and analysis for highly distributed systems. Knowledge acquisition algorithms integrate three disciplines: Computer Science for agent technology and intelligent systems, Industrial Engineering for knowledge management, and Physics for complex network theory. To convey the essentials of this interdisciplinary approach, Fig. 7.1 shows how algorithms with the best performance are developed to acquire knowledge. Rectangles in Fig. 7.1 denote activities and are numbered; the numbers indicate the sequence. Activities that can be performed simultaneously have the same number, e.g., 1 and 1’. An oval denotes the product of an activity. Dashed lines indicate a loop that is repeated until appropriate products are obtained.

There are three phases in the development of the complex knowledge network theory and knowledge acquisition algorithms. Phase I is requirement modeling and network identification. Required knowledge must be appropriately represented (activity 1) with clear objectives (activity 2). Network dynamics analysis in this stage focuses on identifying relationships between knowledge sources. Network identification provides an analytical model describing the relationships between nodes. Activities 1’ and 2’ repeat until a correct network model is identified.

The first task in Phase I is to collect and review data from complex knowledge networks such as connected knowledge groups, supply chains, and the Internet and WWW. These knowledge networks are then studied to identify relationships between nodes. For the Internet and WWW, a local area network (LAN) or Web pages inside an organization can be studied.
There are distinct differences between the different types of knowledge networks. In terms of topology, the Internet and WWW are scale-free networks. Supply chains are loosely defined bipartite networks and can be modeled as hierarchical networks. Connected knowledge groups may be random networks, scale-free networks, Bose-Einstein condensation networks, or hierarchical networks. In terms of relationships between nodes, connected groups, the Internet, and WWW are undirected networks. Supply chains are directed networks but may include some undirected links. Aside from the direction, each link may indicate different relationships such as proximity,
Fig. 7.1 Three-phase development process for the complex knowledge network theory and knowledge acquisition algorithms
precedence, inclusive, and exclusive. In terms of interactions between nodes, the Internet and WWW have the least number of interactions whereas connected knowledge groups and supply chains require high-level interactions between knowledge sources.

Requirement modeling aims at representing knowledge and defining the objective of knowledge acquisition. Knowledge can be represented as logic, frames, rule-based systems, or other formats. A review of a knowledge network will help determine the most appropriate representation and related objectives for knowledge acquisition. At least four different objectives exist when knowledge is required:

1. Multiple copies of the same knowledge are required, e.g., a group of suppliers for the same product is needed.
2. Only one copy of the knowledge is required, e.g., any member of IISE who is familiar with gradient search is desired.
3. No knowledge sources own the complete piece of knowledge. All sources that own part of the knowledge are required to provide their knowledge. For instance, to prepare a documentary for an event, all Web pages that cover the event are identified to piece together the documentary.
4. No knowledge sources own the complete piece of knowledge. Distributed knowledge that can together provide the complete knowledge is required. For instance, a group of surgeons who have complementary skills are needed for a surgery.
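The four objectives can be made concrete with a small sketch. It assumes, purely for illustration, that each knowledge source owns a set of knowledge "pieces" and that the required knowledge is also a set; the source names, the piece labels, and the function names are hypothetical. For the fourth objective the sketch uses a greedy set-cover heuristic, which is a choice made here and not a method stated in the text.

```python
from typing import Dict, List, Optional, Set

# Hypothetical sources and the knowledge pieces each one owns.
SOURCES: Dict[str, Set[str]] = {
    "A": {"k1", "k2", "k3"},
    "B": {"k1", "k2", "k3"},
    "C": {"k1"},
    "D": {"k2", "k3"},
    "E": {"k9"},
    "F": {"k3"},
}
REQUIRED: Set[str] = {"k1", "k2", "k3"}
# Sources that own only part (or none) of the required knowledge.
PARTIAL_ONLY = {s: k for s, k in SOURCES.items() if not REQUIRED <= k}


def complete_sources(sources: Dict[str, Set[str]], required: Set[str]) -> List[str]:
    """Sources that own every required piece."""
    return sorted(s for s, owned in sources.items() if required <= owned)


def objective_1(sources, required, copies: int) -> List[str]:
    """Multiple copies: return `copies` complete sources, or [] if too few."""
    found = complete_sources(sources, required)
    return found[:copies] if len(found) >= copies else []


def objective_2(sources, required) -> Optional[str]:
    """One copy: return any single complete source, or None."""
    found = complete_sources(sources, required)
    return found[0] if found else None


def objective_3(sources, required) -> List[str]:
    """All sources that own at least part of the required knowledge."""
    return sorted(s for s, owned in sources.items() if owned & required)


def objective_4(sources, required) -> List[str]:
    """A group of sources that together cover the required knowledge
    (greedy set cover; returns [] if full coverage is impossible)."""
    remaining = set(required)
    chosen: List[str] = []
    while remaining:
        best = max(sorted(sources), key=lambda s: len(sources[s] & remaining),
                   default=None)
        if best is None or not sources[best] & remaining:
            return []
        chosen.append(best)
        remaining -= sources[best]
    return chosen


if __name__ == "__main__":
    print(objective_1(SOURCES, REQUIRED, copies=2))
    print(objective_2(SOURCES, REQUIRED))
    print(objective_3(PARTIAL_ONLY, REQUIRED))
    print(objective_4(PARTIAL_ONLY, REQUIRED))
```

The difference between the third and fourth objectives shows up in the return values: the third collects every partial owner, while the fourth stops as soon as the selected group covers the requirement.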
Network identification is part of system identification, where data-driven methods, linear approximations, and mechanistic models [18] can be used to infer models from system observations. The methodology for network identification in this project is the data-driven method. Data collected from knowledge networks will be analyzed with statistical tools and software programs to infer analytical models. Once an analytical model is identified, simulation programs can be developed to validate whether the model is a good fit for a knowledge network.

Phase II focuses on algorithm development. With clearly defined knowledge requirements, centralized, decentralized, and mixed algorithms are developed. Each knowledge source is a work agent that interacts with other agents to acquire knowledge. Agent modeling is part of network dynamics analysis that determines time requirements for interactions between agents and develops communication protocols for agents. Performance measures align the algorithms’ objectives with knowledge requirements.

In the centralized algorithm, a central unit (knowledge source) interacts with other knowledge sources sequentially and takes advantage of the relationships between sources to acquire knowledge. In the decentralized algorithm, multiple agents (knowledge sources) interact with each other simultaneously to acquire knowledge. The centralized algorithm has a broad view of the knowledge network, i.e., the central unit is aware of the knowledge possessed by other sources and of the network topology, but lacks parallelism in knowledge acquisition. The decentralized algorithm has a narrow view of the knowledge network, i.e., each knowledge source is aware only of the knowledge possessed by directly connected sources, but enables parallelism in knowledge acquisition. The mixed algorithm is partly centralized and partly decentralized.
At the high level, the mixed algorithm is centralized: a central unit interacts with other knowledge sources sequentially. At the low level, the mixed algorithm is decentralized in the sense that, after a group of knowledge sources is reached by the central unit, multiple knowledge sources interact with each other simultaneously to acquire knowledge. A reverse structure is also possible, i.e., the mixed algorithm is decentralized at the high level and centralized at the low level. One of the critical tasks is to determine the optimal ratio between the centralized and decentralized parts of the mixed algorithm. Two tasks must be performed for algorithm development: agent modeling and design of performance measures. In agent modeling, network dynamics analysis focuses on modeling interactions between knowledge sources. Parameters for agents such as communication and knowledge acquisition times can be obtained from data and modeled with statistical distributions for analytical study. Communication protocols are developed to model the interactions between sources for both analytical and experimental studies. Performance measures including algorithm response time,
knowledge completeness, coverage ability, and error rate are designed to align the algorithms with knowledge acquisition objectives.

Phase III is performance evaluation and improvement. Performance of algorithms is evaluated with analytical and experimental studies. Network dynamics analysis in this stage studies network properties and algorithms’ performance when knowledge sources change (join, leave, knowledge update), relationships between knowledge sources change (connect, disconnect, relationship update), or interactions between knowledge sources change (time requirements or communication protocol changes). Algorithms are coupled with knowledge networks to improve their performance and are evaluated after each improvement; the improvement loop stops when the performance is acceptable or reaches the optimum. The phase produces a complex knowledge network theory and a set of algorithms with the best performance for acquiring the required knowledge.

Extensive analytical studies are performed to evaluate the performance of the algorithms over knowledge networks using complex network theory and statistical tools. In complex network theory, analytical models have been developed for random networks, scale-free networks, and Bose-Einstein condensation networks. Analysis of algorithms over these networks naturally integrates the analytical models with appropriate statistical tools. Analytical models for hierarchical networks need to be developed for supply chains and/or connected knowledge groups. Simulation software such as AutoMod, Arena, or Extend can be used to develop experiments to validate analytical results. Various assumptions must be made to facilitate analytical studies. Simulation experiments are used to evaluate the performance of algorithms when the assumptions are relaxed. The performance of algorithms may be improved based on analytical and simulation results.
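A simulation experiment of the kind mentioned above can be sketched in a few lines. The sketch below compares a sequential (centralized) and a parallel (decentralized) acquisition over an undirected network under simplifying assumptions that are made here and are not taken from the text: every interaction takes the same time `t`, the central unit contacts each reachable source one at a time, and in the decentralized case every source contacts all of its neighbors in one time step. The `coverage` value is simply the fraction of sources reachable from the starting source and is not the book's formal coverage ability measure.

```python
import random
from collections import deque
from typing import Dict, Set

Graph = Dict[int, Set[int]]


def random_network(n: int, p: float, seed: int = 0) -> Graph:
    """Undirected random network: each pair is linked with probability p."""
    rng = random.Random(seed)
    graph: Graph = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def bfs_levels(graph: Graph, start: int) -> Dict[int, int]:
    """Number of hops from `start` to every reachable source."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def evaluate(graph: Graph, start: int, t: float = 1.0) -> Dict[str, float]:
    """Response time of both algorithm types and the fraction of sources reached."""
    dist = bfs_levels(graph, start)
    reached = len(dist)
    return {
        "coverage": reached / len(graph),
        # Sequential: one interaction per reachable source other than start.
        "centralized_time": (reached - 1) * t,
        # Parallel: one time step per hop to the farthest reachable source.
        "decentralized_time": max(dist.values()) * t,
    }


if __name__ == "__main__":
    print(evaluate(random_network(50, 0.1, seed=1), start=0))
```

Relaxing an assumption, for example drawing a different interaction time for each link, only requires changing `evaluate`, which is the sort of variation that is easier to study by simulation than analytically.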
Another important task in this phase is to analyze network dynamics when (a) knowledge sources join or leave networks, or update knowledge; (b) knowledge sources become connected or disconnected, or the relationships between knowledge sources change; and (c) the interactions between knowledge sources change, including changes in time requirements and protocols. These changes may affect the performance of the algorithms and must be studied analytically and experimentally to identify the best network-adaptive algorithms. For instance, a network-adaptive centralized algorithm may start knowledge acquisition from the knowledge source that is connected to the largest number of knowledge sources in a scale-free network, and then move to the knowledge source that is connected to the second largest number of knowledge sources.

In summary, the three phases answer the important question of how distributed knowledge can be effectively obtained from a complex knowledge network. This development process uses data from different types of knowledge networks and develops a complex knowledge network theory and centralized, decentralized, and mixed algorithms for effective knowledge acquisition over different knowledge networks.
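The network-adaptive centralized algorithm mentioned above, which visits the most connected knowledge source first, can be sketched as follows. The example network, the function names, and the notion that visiting a source also reaches its direct neighbors are assumptions made for illustration only.

```python
from typing import Dict, List, Set

Graph = Dict[int, Set[int]]

# Hypothetical undirected network with one hub (node 0).
NETWORK: Graph = {
    0: {1, 2, 3, 4},
    1: {0},
    2: {0},
    3: {0},
    4: {0, 5},
    5: {4, 6},
    6: {5},
}


def degree_order(graph: Graph) -> List[int]:
    """Visiting order: most connected source first, ties broken by label."""
    return sorted(graph, key=lambda v: (-len(graph[v]), v))


def sources_reached(graph: Graph, order: List[int], visits: int) -> int:
    """Sources reached after the first `visits` sequential visits, assuming
    a visit to a source also reaches its directly connected sources."""
    reached: Set[int] = set()
    for v in order[:visits]:
        reached.add(v)
        reached |= graph[v]
    return len(reached)


if __name__ == "__main__":
    adaptive = degree_order(NETWORK)
    arbitrary = [1, 2, 3, 4, 5, 6, 0]  # an order that ignores topology
    print("adaptive order:", adaptive)
    print("reached after 2 visits (adaptive):", sources_reached(NETWORK, adaptive, 2))
    print("reached after 2 visits (arbitrary):", sources_reached(NETWORK, arbitrary, 2))
```

In a network where a few sources hold most of the links, ordering visits by degree reaches a large share of the sources in the first few sequential steps, which is why the topology matters for a centralized algorithm.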
References
1. Albert R, Jeong H, Barabasi AL (1999) Internet: diameter of the World-Wide Web. Nature 401(6749):130–131
2. Angeles Serrano M, De Los Rios P (2007) Interfaces and the edge percolation map of random directed networks. Phys Rev E Stat Nonlinear Soft Matter Phys 76(5):56–121
3. Barabasi AL, Albert R (1999) Emergence of scaling in random networks. Science 286(5439):509–512
4. Barkai N, Leibler S (2000) Circadian clocks limited by noise. Nature 403:267–268
5. Bianconi G, Barabasi AL (2001) Bose-Einstein condensation in complex networks. Phys Rev Lett 86(24):5632–5635
6. Bianconi G, Barabasi AL (2001) Competition and multiscaling in evolving networks. Europhys Lett 54(4):436–442
7. Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener J (2000) Graph structure in the Web. Comput Netw 33(1):309–320
8. Chen XW, Nof SY (2007) Error detection and prediction algorithms: application in robotics. J Intell Rob Syst 48(2):225–252
9. Chen XW, Nof SY (2010) A decentralized conflict and error detection and prediction model. Int J Prod Res 48(16):4829–4843
10. Chen XW (2009) Prognostics and diagnostics of conflicts and errors with prevention and detection logic. Ph.D. Dissertation, Purdue University, West Lafayette, Indiana, USA
11. Clarke EM, Grumberg O, Peled DA (2000) Model checking. The MIT Press
12. Da Fontoura Costa L (2006) Learning about knowledge: A complex network approach. Phys Rev E Stat Nonlinear Soft Matter Phys 74(2):026103(1–11)
13. Erdős P, Rényi A (1959) On random graphs. Publicationes Mathematicae Debrecen 6:290–291
14. Erdős P, Rényi A (1960) On the evolution of random graphs. Magyar Tud Akad Mat Kutato Int Kozl 5:17–61
15. Erdős P, Rényi A (1961) On the strength of connectedness of a random graph. Acta Mathematica Academiae Scientiarum Hungaricae 12:261–267
16. Lee TI et al (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science 298:799–804
17. Liebowitz J (1999) Knowledge management handbook. CRC Press
18. Ljung L (1999) System identification: theory for the user. Prentice Hall, New Jersey, USA
19. Newman MEJ, Strogatz SH, Watts DJ (2001) Random graphs with arbitrary degree distributions and their applications. Phys Rev E Stat Nonlinear Soft Matter Phys 64(2):0261181–02611817
20. Newman MEJ, Barabasi AL, Watts DJ (2006) The structure and dynamics of networks. Princeton University Press, Princeton, N.J.
21. Pal SK, Talwar V, Mitra P (2002) Web mining in soft computing framework: relevance, state of the art and future directions. IEEE Trans Neural Networks 13(5):1163–1177
22. De Solla Price DJ (1965) Networks of scientific papers. Science 149:510–515
23. Ras ZW, Zemankova M (eds) (1991) Methodologies for intelligent systems: 6th international symposium. ISMIS ’91, Charlotte, North Carolina, USA, October 16–19
24. Ras ZW, Zemankova M (eds) (1994) Methodologies for intelligent systems: 8th international symposium. ISMIS ’94, Charlotte, North Carolina, USA, October 16–19
25. Shen-Orr SS, Milo R, Mangan S, Alon U (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet 31:64–68
26. Solomonoff R, Rapoport A (1951) Connectivity of random nets. Bull Math Biophys 13:107–117
27. Sowa JF (2000) Knowledge representation. Brooks/Cole, CA, USA
28. Tu F, Pattipati K, Deb S, Malepati VN (2002) Multiple fault diagnosis in graph-based systems. The International Society for Optical Engineering, Orlando, FL, United States
29. Tu F, Pattipati KR, Deb S, Malepati VN (2003) Computationally efficient algorithms for multiple fault diagnosis in large graph-based systems. IEEE Trans Syst Man Cybern Part A Syst Hum 33(1):73–85
30. Zak DE, Gonye GE, Schwaber JS, Doyle FJ III (2003) Importance of input perturbations and stochastic gene expression in the reverse engineering of genetic regulatory networks: insights from an identifiability analysis of an in-silico network. Genome Res 13:2396–2405
Index
A
Airport capacity, 91
Alternating current, 50

B
Biological network, 112
Bose-Einstein condensation network, 3, 81, 82

C
Choke point, 92
Collaboration protocol, 26
Complex network theory, 39
Conflict, 44, 45
Congestion, 92
Connected vehicle, 79

D
Dedicated short-range communications, 81
Diffusion scale, 5
Diffusion speed, 4
Direct current, 50
Distributed power generation system, 49

E
Eigenvector centrality, 12
Electrical power grid, 49
Erdős-Rényi random network, 2
Error, 44, 45

F
First-request-first-form, 86
Fittest node, 86

H
Health insurance claim denial, 40
Homogeneity, 9

I
Interdependent critical infrastructure, 17

K
Knowledge acquisition, 109, 110
Knowledge network, 115

L
Link, 1, 2

M
Multi-center traffic management advisor, 97

N
Network conductance, 7
Networked constraint, 23
Network identification, 117
Network metric, 4
Network science model, 1, 2
Node, 1, 2
Node metric, 11

O
On-board equipment, 79

P
Phase transition, 12, 13
Photovoltaic, 49
Portfolio theory, 20
Public health network, 36

R
Railroad network, 88, 89
Roadside equipment, 79

S
Scale-free network, 2, 3
Small-world network, 2, 3
St. Louis Lambert International Airport, 103
Structural search, 39

V
Virtual oscillator control, 51

W
Washington D.C., 57
Water distribution system, 55