SPRINGER BRIEFS IN OPTIMIZATION
V. A. Kalyagin A. P. Koldanov P. A. Koldanov P. M. Pardalos
Statistical Analysis of Graph Structures in Random Variable Networks
SpringerBriefs in Optimization Series Editors Sergiy Butenko, Texas A & M University, College Station, TX, USA Mirjam Dür, University of Trier, Trier, Germany Panos M. Pardalos, University of Florida, Gainesville, FL, USA János D. Pintér, Lehigh University, Bethlehem, PA, USA Stephen M. Robinson, University of Wisconsin-Madison, Madison, WI, USA Tamás Terlaky, Lehigh University, Bethlehem, PA, USA My T. Thai , University of Florida, Gainesville, FL, USA
SpringerBriefs present concise summaries of cutting-edge research and practical applications across a wide spectrum of fields. Featuring compact volumes of 50 to 125 pages, the series covers a range of content from professional to academic. Briefs are characterized by fast, global electronic dissemination, standard publishing contracts, standardized manuscript preparation and formatting guidelines, and expedited production schedules. Typical topics might include:
• A timely report of state-of-the-art techniques
• A bridge between new research results, as published in journal articles, and a contextual literature review
• A snapshot of a hot or emerging topic
• An in-depth case study
• A presentation of core concepts that students must understand in order to make independent contributions
SpringerBriefs in Optimization showcase algorithmic and theoretical techniques, case studies, and applications within the broad-based field of optimization. Manuscripts related to the ever-growing applications of optimization in applied mathematics, engineering, medicine, economics, and other applied sciences are encouraged. Titles from this series are indexed by Web of Science, Mathematical Reviews, and zbMATH.
More information about this series at http://www.springer.com/series/8918
V. A. Kalyagin • A. P. Koldanov P. A. Koldanov • P. M. Pardalos
Statistical Analysis of Graph Structures in Random Variable Networks
123
V. A. Kalyagin Laboratory of Algorithms and Technologies for Networks Analysis National Research University Higher School of Economics Nizhny Novgorod, Russia
A. P. Koldanov Laboratory of Algorithms and Technologies for Networks Analysis National Research University Higher School of Economics Nizhny Novgorod, Russia
P. A. Koldanov Laboratory of Algorithms and Technologies for Networks Analysis National Research University Higher School of Economics Nizhny Novgorod, Russia
P. M. Pardalos Department of Industrial & Systems Engineering University of Florida Gainesville, FL, USA
ISSN 2190-8354 ISSN 2191-575X (electronic) SpringerBriefs in Optimization ISBN 978-3-030-60292-5 ISBN 978-3-030-60293-2 (eBook) https://doi.org/10.1007/978-3-030-60293-2 © The Author(s) 2020 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
Network analysis is nowadays a rapidly developing area in the study of complex systems. A complex system is understood as a network of a large number of interacting elements. Examples of such systems are the stock market, gene expression networks, brain networks, climate networks, the Internet, social networks, and many others. Network analysis of such complex systems involves the construction of a weighted graph, where the nodes are the elements of the system and the weights of the edges reflect the degree of similarity between the elements. To identify key information in a complex system, one can filter the huge amount of data to obtain representative unweighted subgraphs of the network. Such unweighted subgraphs, called network structures, include, in particular, the maximum spanning tree, the threshold graph, its cliques and independent sets, and other characteristics known in graph and network theory. In this book we study complex systems whose elements are represented by random variables. Data analysis for such systems is confronted with statistical errors and uncertainty. The main goal of this book is to study and compare the uncertainty of network structure identification algorithms, with applications to market network analysis. For this purpose we introduce a mathematical model of a random variable network, define the uncertainty of an identification procedure through a risk function, discuss random variable networks with different measures of similarity (dependence), and study general statistical properties of identification algorithms. In addition, we introduce a new class of identification algorithms based on a new measure of similarity and prove its robustness over a large class of distributions. This monograph can be used by experts in the field and can be helpful for graduate and PhD students. The work is supported by the lab LATNA in the framework of the Basic Research Program at the National Research University Higher School of Economics and by RRFI grant 18-07-00524.
Valery A. Kalyagin, Nizhny Novgorod, Russia
Alexander P. Koldanov, Nizhny Novgorod, Russia
Petr A. Koldanov, Nizhny Novgorod, Russia
Panos M. Pardalos, Gainesville, FL, USA
Contents
1 Introduction
   1.1 Network Structures
   1.2 Market Networks
   1.3 Gene Networks
   1.4 Uncertainty
   1.5 Related Type of Networks
2 Random Variable Networks
   2.1 Basic Definitions and Notations
   2.2 Distributions and Measures of Similarity
   2.3 Relations Between Different Networks and Network Structures
   2.4 Partial Correlation Network
3 Network Structure Identification Algorithms
   3.1 Threshold Graph Identification: Multiple Testing Algorithms
       3.1.1 Simultaneous Inference
       3.1.2 Holm Step Down Procedure
       3.1.3 Hochberg Step-up Procedure
       3.1.4 Benjamini-Hochberg Procedure
       3.1.5 Individual Test Statistics
   3.2 Concentration Graph Identification
   3.3 Maximum Spanning Tree Identification
   3.4 Example of MST Identification
4 Uncertainty of Network Structure Identification
   4.1 Multiple Decision Approach
   4.2 Loss and Risk Functions
   4.3 Additive Loss and Risk Functions
5 Robustness of Network Structure Identification
   5.1 Concept of Robustness
   5.2 Robust Network Structure Identification in Sign Similarity Network
   5.3 Robust Network Structure Identification in Correlation Networks
6 Optimality of Network Structure Identification
   6.1 Concept of Optimality
   6.2 W-Unbiasedness for Additive Loss Function
   6.3 Optimal Concentration Graph Identification in Partial Correlation Network
       6.3.1 Gaussian Graphical Model Selection
       6.3.2 GGMS Problem Statement
       6.3.3 Uniformly Most Powerful Unbiased Tests of Neyman Structure
       6.3.4 Uniformly Most Powerful Unbiased Test for Conditional Independence
       6.3.5 Sample Partial Correlation Test
       6.3.6 Optimal Multiple Decision Procedures
   6.4 Optimal Threshold Graph Identification in Pearson Correlation Network
   6.5 Optimal Threshold Graph Identification in Sign Similarity Network
7 Applications to Market Network Analysis
   7.1 Market Network Analysis
   7.2 Measures of Uncertainty
   7.3 Numerical Experiments Framework
   7.4 Statistical Uncertainty of MST
   7.5 Statistical Uncertainty of PMFG
   7.6 Statistical Uncertainty of MG
   7.7 Statistical Uncertainty of MC
   7.8 Statistical Uncertainty of MIS
8 Conclusion
References
Chapter 1
Introduction
Abstract Network analysis is widely used in modern data mining techniques. In this book, we consider networks in which the nodes are associated with random variables and the edges reflect some kind of similarity between them. Statistical analysis of this type of network involves uncertainty. This aspect is not well covered in the existing literature. The main goal of the book is to develop a general approach to handle the uncertainty of network structure identification. This approach allows us to study general statistical properties (unbiasedness, optimality) of different identification algorithms and to design a new class of robust identification algorithms. The range of applications extends from market network analysis to gene network analysis.
1.1 Network Structures Consider a complex system with N elements. A network model of a complex system is a complete weighted graph with N nodes where the weights of the edges reflect the degree of interaction between the elements of the system. To identify key information in a complex system, one can reduce the complete weighted graph to simpler graph models. We will call such graph models network structures. There is a variety of network structures considered in the literature: the concentration graph, the threshold (market) graph, cliques and independent sets of the threshold graph, the maximum spanning tree, the planar maximally filtered graph, and others. The concentration graph is an unweighted graph obtained from the complete weighted graph by a simple filtration procedure: an edge is included in the concentration graph if and only if its weight in the complete weighted graph is nonzero. In a similar way, an edge is included in the threshold graph if and only if its weight is larger than a given threshold. A clique in a graph is a set of nodes such that every two nodes in the set are connected by an edge. An independent set is a set of nodes in a graph without edges between them. The maximum spanning tree in a weighted graph is the spanning tree of maximal total weight. The planar maximally filtered graph is the planar subgraph of a complete weighted graph of maximal total weight.
The concentration graph gives information about the topology of pairwise connections in a complex system. The family of threshold graphs gives information about the variation of the topology of pairwise connections with respect to the varying threshold. Cliques in the threshold graph are sets of closely connected elements of a complex system. Independent sets in the threshold graph are sets of non-connected elements of a complex system. The maximum spanning tree and the planar maximally filtered graph allow one to detect a hierarchical cluster structure in a complex system.
1.2 Market Networks The network approach has become very popular in stock market analysis. Different network structures are considered. The maximum spanning tree (MST) was used in [66] to describe the hierarchical structure of the US stock market. This approach was further developed in [73] for portfolio analysis and in [74] for clustering in the stock market. The planar maximally filtered graph (PMFG) was introduced in [87], and it was used in [83] to develop a new clustering technique for the stock market. MSTs for different stock markets were investigated in [7, 8, 24, 26, 48, 69, 92]. Investigation of stock markets using another popular network structure, the threshold (market) graph, was started in [5] and developed in [4, 6]. It was shown that cliques and independent sets of a threshold graph contain useful information on the stock market. Different stock markets are investigated using the threshold (market) graph technique in [14, 27, 30, 39, 42, 71, 89]. Other network characteristics of the stock market have attracted considerable attention in the literature too. The most influential stocks related to the stock index are investigated in [22, 23, 33, 81, 85]. Connections with random matrix theory were studied in [46, 70, 72, 76, 94]. Clustering and dynamics of market networks were investigated in a large number of publications, see for example [28, 40, 52, 53, 80, 84]. Big data aspects of market networks are developed in [31, 50, 51]. A network model for interbank connections was considered in [63]. Different measures of similarity for stock market networks were considered in [47, 82, 93, 95]. A review of the subject and a large bibliography are presented in [67]. Most publications are related to numerical algorithms and economic interpretations of the obtained results. Much less attention is paid to the uncertainty of the obtained results generated by the stochastic nature of the market. A new approach to this problem is proposed in [44].
1.3 Gene Networks According to Anderson [1], the use of graphical diagrams in genetics goes back to the work of the geneticist Sewall Wright (1921, 1934). Nowadays, "a graphical model in statistics is a visual diagram in which observable variables are identified
with points (vertices or nodes) connected by edges and an associated family of probability distributions satisfying some independence specified by the visual pattern" (see [1]). The fundamental probabilistic theory of Gaussian graphical models was developed in [59], where conditional independence is used for visualization. Some applications of graphical models to genetic analysis are presented in [58]. One of the important structures in graphical models is the concentration graph of the network whose edge weights are given by partial correlations. Such a network is usually called a gene expression network. In practice, for a gene expression network it is important to identify the concentration graph from observations. This problem is called the graphical model selection problem. Different graphical model selection (identification of the concentration graph) algorithms have been developed in the literature. A detailed review of the topic is presented in [17]. Another type of gene network is the so-called gene co-expression network. The concept of a gene co-expression network was introduced in [12] and it is actively used in bioscience. An important structure for the gene co-expression network is the threshold graph for different measures of similarity (Pearson correlation, mutual information, Spearman's rank correlation, and others). Weighted gene co-expression network analysis (WGCNA) was introduced in [96]. WGCNA is a tool for constructing and analyzing weighted gene co-expression networks with the scale-free topology. A detailed review of the subject can be found in [38]. Most publications on gene networks are related to biological applications and interpretation of the obtained results. Much less attention is paid to the uncertainty of the obtained results generated by the stochastic nature of the network.
1.4 Uncertainty To handle uncertainty, we introduce the concept of a Random Variable Network (RVN), define a true network structure, and consider identification algorithms as statistical procedures. We then define uncertainty by a risk function. This allows us to compare the uncertainty of different identification procedures and to study their general statistical properties, such as unbiasedness, optimality, and robustness (distribution-free risk functions). A random variable network is a pair (X, γ), where X = (X1, X2, ..., XN) is a random vector and γ is a measure of similarity (association) between a pair of random variables. For the Gaussian Pearson correlation network, the vector X has a multivariate Gaussian distribution and the measure of similarity γ is the Pearson correlation. In the same way, one can define the Student Pearson correlation network, the Gaussian partial correlation network, and so on. The random variable network generates a network model, a complete weighted graph with N nodes, where the weight of the edge (i, j) is given by γi,j = γ(Xi, Xj), i ≠ j, i, j = 1, 2, ..., N. Note that we consider only symmetric pairwise measures, so that γi,j = γj,i. We will call this complete weighted graph the true network model. A network structure in the true network model will be called a true network structure.
We show in the book that different random variable networks can generate the same true network model and the same true network structures. This is the case, for example, when two different random vectors have elliptically contoured distributions and the measure of similarity γ is the Pearson correlation. Moreover, different random variable networks can generate functionally related true network models. This is the case, for example, when X has a given multivariate Gaussian distribution and the two measures of similarity are the Pearson and Kendall-τ correlations. This result is important for an adequate choice of the measure of similarity in practical problems. In practice, it is important to identify a true network structure from observations. We call this problem the network structure identification problem. Any identification algorithm in this setting can be considered as a statistical procedure, which is a map from the sample space to a set of graphs with specific properties. For example, a threshold graph identification procedure is a map from the sample space to the set of simple unweighted graphs with N nodes, and an MST identification procedure is a map from the sample space to the set of spanning trees with N nodes. We call the network structure obtained by an identification procedure the sample structure. Network structure identification uncertainty is defined by the difference between the true and the sample network structures. This difference generates a loss from a false decision. The expected value of the loss function is called the risk function [62, 91]. We will use the risk function to measure the uncertainty of network structure identification procedures. In general, the loss function can be associated with different distances between the two graphs (the true and the sample network structures), such as the symmetric difference, mutual information, K-L divergence, and others. The network structure identification problem can be considered as a multiple testing problem of individual hypotheses about edges. Two types of identification errors arise: a Type I error is the false inclusion of an edge in the structure (the edge is absent in the true structure but present in the sample structure), and a Type II error is the false exclusion of an edge from the structure (the edge is present in the true structure but absent in the sample structure). Well-known quality characteristics of multiple hypothesis testing procedures such as FWER (Family Wise Error Rate, the probability of at least one Type I error) and FDR (False Discovery Rate) are particular cases of the risk function for an appropriate choice of the losses. In terms of machine learning, the network structure identification problem is a binary classification problem. Each edge in the network model is classified into one of two classes: class Yes, if the edge is included in the true network structure, and class No, if the edge is not included in the true network structure. In this setting, a Type I error is associated with a False Positive decision, and a Type II error is associated with a False Negative decision. Well-known characteristics of binary classification such as TPR (True Positive Rate) and FPR (False Positive Rate) are particular cases of the risk function for an appropriate choice of the losses. In our opinion, it is natural in network structure identification to consider so-called additive losses, i.e., the loss of a false decision about the network structure is the sum of the losses of the false decisions about individual edges.
We show in the book that under some additional conditions, the risk function for additive losses is a linear combination of the expected numbers of Type I (FP) and Type II
(FN) errors. For this risk function, we study general statistical properties of known identification procedures. In particular, we show that the statistical procedure for concentration graph identification based on individual partial correlation tests in the Gaussian partial correlation network is optimal in the class of unbiased multiple testing procedures. In practice the distribution of the random vector X is unknown. Therefore, it is important to construct robust (distribution-free risk function) network structure identification procedures. Such a procedure has a risk function which does not depend on the distribution within a large class. We introduce in the book a new class of identification algorithms based on a new measure of similarity and prove its robustness (distribution-free risk function) in the class of elliptically contoured distributions. This result can be applied in a wide area of practical applications.
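To make the additive-loss view concrete, the following minimal sketch (not part of the book; the function names and the unit losses a and b are illustrative assumptions) counts Type I (false positive) and Type II (false negative) edge errors between a true and a sample network structure and combines them into an additive loss; averaging this loss over repeated samples gives an estimate of the risk.

```python
from itertools import combinations

def edge_errors(true_edges, sample_edges, n_nodes):
    """Count Type I (false edge inclusion) and Type II (false edge exclusion) errors."""
    true_set = {frozenset(e) for e in true_edges}
    sample_set = {frozenset(e) for e in sample_edges}
    fp = fn = 0
    for i, j in combinations(range(1, n_nodes + 1), 2):
        e = frozenset((i, j))
        if e in sample_set and e not in true_set:
            fp += 1   # Type I: edge absent in the true structure, present in the sample one
        elif e not in sample_set and e in true_set:
            fn += 1   # Type II: edge present in the true structure, absent in the sample one
    return fp, fn

def additive_loss(true_edges, sample_edges, n_nodes, a=1.0, b=1.0):
    """Additive loss: a * (#Type I errors) + b * (#Type II errors)."""
    fp, fn = edge_errors(true_edges, sample_edges, n_nodes)
    return a * fp + b * fn

# Example: the true structure is a star with hub 1; the sample structure misses
# one true edge and adds a spurious one.
true_structure = [(1, 2), (1, 3), (1, 4)]
sample_structure = [(1, 2), (1, 3), (2, 4)]
print(edge_errors(true_structure, sample_structure, 4))    # (1, 1)
print(additive_loss(true_structure, sample_structure, 4))  # 2.0
```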
1.5 Related Type of Networks There are different types of networks related to those considered in this book. In particular, functional brain networks and climate networks are closely connected with the networks considered in the book. In brain networks, one considers symmetric measures of statistical association or functional connectivity, such as correlations, coherence, and mutual information, to construct undirected graphs and study their properties. A survey on this topic is presented in [11], see also [56]. In climate networks, one uses correlations between climate observations at different points and studies the threshold graph generated by this network. More detailed information can be found in [86]. Other types of networks are presented in the recent surveys [10, 97]. In the present book, we develop a general approach to study statistical properties of network structure identification algorithms and to measure their uncertainty. Most applications are given for market networks, but this approach can be developed for other networks too.
Chapter 2
Random Variable Networks
Abstract In this chapter, we give basic definitions related to random variable networks. After rigorous definitions of a random variable network and a network model, we give definitions of specific network structures: maximum spanning tree, planar maximally filtered graph, concentration graph, threshold graph, maximum clique, and maximum independent set in the threshold graph. All definitions are illustrated by examples. Then we consider a large class of network models generated by different distributions and different measures of similarity. For distributions, we use an important class of elliptical distributions. For pairwise measures of similarity, we consider Pearson correlation, Kruskal correlation, sign similarity, Fechner correlation, Kendall and Spearman correlations. As a main result of the chapter, we establish connections between network structures in different random variable network models. In addition, we define the partial correlation network and establish its properties for elliptical distributions.
2.1 Basic Definitions and Notations Random variables network is a pair (X, γ ), where X = (X1 , . . . , XN ) is a random vector, and γ is a pairwise measure of similarity (dependence, association,. . . ) between random variables. One can consider different random variable networks associated with different distributions of the random vector X and different measures of similarity γ . For example, the Gaussian Pearson correlation network is the random variable network, where X has a multivariate Gaussian distribution and γ is the Pearson correlation. On the same way one can consider the Gaussian partial correlation network, the Gaussian Kendall correlation network, the Student Pearson correlation network, and so on. The random variables network generates a network model. Network model for random variable network (X, γ ) is the complete weighted graph with N nodes (V , Γ ), where V = {1, 2, . . . , N } is the set of nodes, Γ = (γi,j ) is the matrix of weights, γi,j = γ (Xi , Xj ). Network structure in the network model (V , Γ ) is an unweighted graph (U, E), where U ⊂ V , E is the set of edges between nodes in © The Author(s) 2020 V. A. Kalyagin et al., Statistical Analysis of Graph Structures in Random Variable Networks, SpringerBriefs in Optimization, https://doi.org/10.1007/978-3-030-60293-2_2
U . In this book, we consider the following network structures: maximum spanning tree (MST), planar maximally filtered graph (PMFG), concentration graph (CG), threshold graph (TG), maximum clique (MC), and maximum independent set (MIS) in the threshold graph. The spanning tree in the network model (V , Γ ) is a connected graph (network structure) (V , E) without cycles. Weight of the spanning tree (V , E) is the sum of weights of its edges (i,j )∈E γi,j . Maximum spanning tree (MST) is the spanning tree with maximal weight. Natural extension of maximum spanning tree is the planar maximally filtered graph (PMFG), planar connected graph (V , E) of maximal weight (see [29] for planar graph definition). There are different algorithms to find MST or PMFG in a weighted graph. In this book, we use well known Kruskal algorithm for MST and Kruskal type algorithm for PMFG. Both algorithms have a polynomial computational complexity. The concentration graph in the graph model (V , Γ ) is the network structure (V , E), where edge (i, j ) ∈ E if and only if γi,j = 0. Threshold graph in the network model (V , Γ ) is the network structure (V , E), where (i, j ) ∈ E if and only if γi,j > γ0 and γ0 is a given threshold. Depending on the value of the threshold γ0 , threshold graph is varying from complete unweighted graph to the graph with isolated vertices. Clique in a threshold graph G = (V , E) is the set of nodes U, U ⊂ V such that (i, j ) ∈ E, ∀i, j ∈ U, i = j . Independent set in a threshold graph G = (V , E) is the set of nodes U, U ⊂ V such that (i, j ) ∈ / E, ∀i, j ∈ U, i = j . Maximum clique and maximum independent set problems are known to be NP-hard. In our computations, we use fast exact algorithm by Carraghan and Pardalos [15]. All these defined structures are popular in applications. MST is widely used in stock market network analysis. PMFG was used in market network analysis for cluster structure detection. Concentration graph gives information about a dependence structure in the network model (V , Γ ). Family of threshold graphs gives information about the variation of topology of pairwise connections with respect to the variable threshold. It was observed [6] that for some threshold values the market graph has a scale free property, i.e., vertex degree distribution in the market graph follows a power law. Cliques in the threshold graph are sets of closely connected elements of the network model (V , Γ ). For some markets (e.g., Russian market), maximum cliques are shown to be the most influential stocks of the market [89]. Independent sets in the threshold graph are sets of nonconnected elements of the network model. Maximum independent sets are known to be useful for portfolio optimization [43]. Maximum spanning tree and planar maximally filtered graph allow to detect a hierarchical clusters structure in the network model [88]. Example 2.1 Let us illustrate the introduced network structures by the following example. Consider network model with 10 nodes V = {1, 2, . . . , 10}, and matrix Γ given by matrix below. The network structures are given by Figs. 2.1, 2.2, 2.3, 2.4 and 2.5. The Fig. 2.1 represents the maximum spanning tree. One can observe two clusters in MST (1, 2, 3, 4, 10 and 5, 6, 7, 8, 9) with the centers 9 and 10 connected by an edge. The Fig. 2.2 represents the planar maximally filtered
Fig. 2.1 Maximum spanning tree for the network model of Example 2.1
Fig. 2.2 Planar maximally filtered graph for the network model of Example 2.1
Fig. 2.3 Threshold graph for the network model of Example 2.1. Threshold γ0 = 0.3
graph. The structure of the PMFG is more complicated than that of the MST. The filtered graph contains more information about the structure of the considered network model: new connections appear. There are 5 new connections inside the clusters and 10 new connections between the clusters.
Fig. 2.4 Threshold graph for the network model of Example 2.1. Threshold γ0 = 0.55
Fig. 2.5 Threshold graph for the network model of Example 2.1. Threshold γ0 = 0.7
Γ =
        1      2      3      4      5      6      7      8      9     10
  1  1.0000 0.7220 0.4681 0.4809 0.6209 0.5380 0.6252 0.6285 0.7786 0.7909
  2  0.7220 1.0000 0.4395 0.5979 0.6381 0.5725 0.6666 0.6266 0.8583 0.8640
  3  0.4681 0.4395 1.0000 0.3432 0.3468 0.2740 0.4090 0.4016 0.4615 0.4832
  4  0.4809 0.5979 0.3432 1.0000 0.4518 0.4460 0.4635 0.4940 0.6447 0.6601
  5  0.6209 0.6381 0.3468 0.4518 1.0000 0.5640 0.5994 0.5369 0.7170 0.7136
  6  0.5380 0.5725 0.2740 0.4460 0.5640 1.0000 0.4969 0.4775 0.6439 0.6242
  7  0.6252 0.6666 0.4090 0.4635 0.5994 0.4969 1.0000 0.6098 0.7161 0.7158
  8  0.6285 0.6266 0.4016 0.4940 0.5369 0.4775 0.6098 1.0000 0.6805 0.6748
  9  0.7786 0.8583 0.4615 0.6447 0.7170 0.6439 0.7161 0.6805 1.0000 0.9523
 10  0.7909 0.8640 0.4832 0.6601 0.7136 0.6242 0.7158 0.6748 0.9523 1.0000
Figures 2.3, 2.4, and 2.5 represent the threshold graphs constructed for the thresholds γ0 = 0.3, 0.55, 0.7. For γ0 = 0.3, the graph is almost complete; only the edge (3, 6) is absent. There are two maximum cliques: {1,2,3,4,5,7,8,9,10} and {1,2,4,5,6,7,8,9,10}. The maximum independent set is {3,6}. For γ0 = 0.55, there are two maximum cliques with 6 vertices. One of them, {1,2,5,7,9,10}, has the maximal weight. There are four maximum independent sets with 4 vertices. One of them, {3, 4, 6, 7}, has minimal weight. For γ0 = 0.7, the threshold graph has only 10 edges. There is one maximum clique with 4 vertices, {1,2,9,10}. There are two maximum independent sets with 7 vertices, {2,3,4,5,6,7,8} and {1,3,4,5,6,7,8}. Note that the described structures (MST, PMFG, MG, MC, MIS) are unweighted subgraphs of the network model and reflect different aspects of pairwise connections. Remark This network model is taken from the USA stock market. The following stocks are considered: A (Agilent Technologies Inc), AA (Alcoa Inc), AAP (Advance Auto Parts Inc), AAPL (Apple Inc), AAWW (Atlas Air Worldwide Holdings Inc), ABAX (Abaxis Inc), ABD (ACCO Brands Corp), ABG (Asbury Automotive Group Inc), ACWI (iShares MSCI ACWI Index Fund), ADX (Adams Express Company). Here γi,j are the sample Pearson correlations between daily stock returns calculated from 250 observations starting in November 2010. Note that for a different number of observations one would obtain different values of the sample Pearson correlations and consequently different network structures. This leads to uncertainty in network structure identification. One can construct a huge number of different network models of this type depending on the period of observations and the measure of similarity used. The main questions are: how reliable are the results of network structure construction? Is there a stable structure behind these observations? How large is the deviation from this structure due to the stochastic nature of observations? To answer these questions, we propose a new approach based on the notion of a random variable network.
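The structures of Example 2.1 can be reproduced from a similarity matrix with standard graph software. The sketch below is an illustration only (it is not the authors' code and uses the networkx package; the matrix is a placeholder for the Γ of Example 2.1): it builds the complete weighted graph, extracts the maximum spanning tree, constructs a threshold graph, and lists its maximal cliques.

```python
import numpy as np
import networkx as nx

def network_model(gamma):
    """Complete weighted graph (V, Gamma) from a symmetric similarity matrix."""
    n = gamma.shape[0]
    g = nx.Graph()
    for i in range(n):
        for j in range(i + 1, n):
            g.add_edge(i + 1, j + 1, weight=gamma[i, j])
    return g

def threshold_graph(gamma, gamma0):
    """Unweighted graph with an edge (i, j) iff gamma[i, j] > gamma0."""
    n = gamma.shape[0]
    g = nx.Graph()
    g.add_nodes_from(range(1, n + 1))
    for i in range(n):
        for j in range(i + 1, n):
            if gamma[i, j] > gamma0:
                g.add_edge(i + 1, j + 1)
    return g

# Placeholder matrix; in Example 2.1 one would load the 10 x 10 matrix given above.
rng = np.random.default_rng(0)
gamma = rng.uniform(0.2, 0.9, size=(10, 10))
gamma = (gamma + gamma.T) / 2           # symmetrize for the illustration
np.fill_diagonal(gamma, 1.0)

mst = nx.maximum_spanning_tree(network_model(gamma))   # Kruskal-type algorithm
tg = threshold_graph(gamma, 0.55)
cliques = list(nx.find_cliques(tg))                    # maximal cliques of the threshold graph
print(sorted(mst.edges()))
print(max(cliques, key=len))
```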
2.2 Distributions and Measures of Similarity In this section, we consider in more detail different random variable networks (X, γ) associated with different distributions of the random vector X and different measures of similarity γ. We consider a large class of distributions, elliptically contoured distributions (or simply elliptical distributions), which are known to be useful in many applications [32]. A random vector X belongs to the class of elliptically contoured distributions if its density function has the form [1]:

f(x; μ, Λ) = |Λ|^{−1/2} g((x − μ)′ Λ^{−1} (x − μ))     (2.1)

where Λ = (λi,j)_{i,j=1,2,...,N} is a positive definite symmetric matrix, g(x) ≥ 0, and

∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} g(y′y) dy1 dy2 ··· dyN = 1
This class includes, in particular, the multivariate Gaussian distribution

f_Gauss(x) = (2π)^{−N/2} |Λ|^{−1/2} exp( −(1/2) (x − μ)′ Λ^{−1} (x − μ) )

and the multivariate Student distribution with ν degrees of freedom

f_Student(x) = ( Γ((ν + N)/2) / ( Γ(ν/2) ν^{N/2} π^{N/2} ) ) |Λ|^{−1/2} ( 1 + (x − μ)′ Λ^{−1} (x − μ) / ν )^{−(ν+N)/2}

Here Γ(λ) is the well-known Gamma function given by

Γ(λ) = ∫_0^{+∞} t^{λ−1} e^{−t} dt

The class of elliptical distributions is a natural generalization of the class of Gaussian distributions. Many properties of Gaussian distributions have analogs for elliptical distributions, but this class is much larger; in particular, it includes distributions with heavy tails. For a detailed investigation of elliptical distributions, see [1, 32]. It is known that if E(X) exists then E(X) = μ. One important property of an elliptical distribution X is the connection between the covariance matrix of the vector X and the matrix Λ. Namely, if the covariance matrix exists, one has

σi,j = Cov(Xi, Xj) = C · λi,j     (2.2)

where

C = ( 2π^{N/2} / ( N Γ(N/2) ) ) ∫_0^{+∞} r^{N+1} g(r²) dr
In particular, for the Gaussian distribution one has Cov(Xi, Xj) = λi,j. For the multivariate Student distribution with ν degrees of freedom (ν > 2), one has σi,j = (ν/(ν − 2)) λi,j. In this section, we discuss several measures of dependence (similarity) known as measures of association studied by Kruskal [57]. In our opinion, this important paper was not fully appreciated in the literature. Consider a two-dimensional random vector (X, Y). The classical Pearson correlation is defined by

γ^P(X, Y) = Cov(X, Y) / ( √Cov(X, X) √Cov(Y, Y) )     (2.3)

The common interpretation of γ^P(X, Y) is as a measure of linear dependence between Y and X. There is an old and interesting interpretation of this correlation [45]. Suppose that the structure of X and Y has the form

X = U1 + U2 + ... + Um + V1 + ... + Vn
Y = U1 + U2 + ... + Um + W1 + ... + Wn

where the U, V, W are all mutually uncorrelated with the same variance. Then

γ^P(X, Y) = m / (m + n),

which is the proportion of common components between X and Y. The Pearson correlation is the most popular measure of similarity in network analysis. On the other hand, there is a family of measures of similarity (association) between X and Y based on probabilities of the form P{(X > x0 and Y > y0) or (X < x0 and Y < y0)} = P{(X − x0)(Y − y0) > 0}, where x0, y0 are some real numbers [57]. These probabilities have a natural interpretation as the proportion of concordance of the deviations of X and Y from x0 and y0. Different choices of x0, y0 lead to different measures of similarity. In this book, we emphasize the sign similarity measure of association. This measure is obtained by the choice x0 = E(X), y0 = E(Y):

γ^Sg(X, Y) = P{(X − E(X))(Y − E(Y)) > 0}     (2.4)
Sign similarity is related to the Fechner correlation:

γ^Fh(X, Y) = 2γ^Sg(X, Y) − 1     (2.5)

The sign similarity measure of association is the proportion of concordance of the deviations of X and Y from their expected values. For x0 = Med X and y0 = Med Y, one has the Kruskal correlation

γ^Kr(X, Y) = 2P{(X − Med X)(Y − Med Y) > 0} − 1     (2.6)

Following [57], we notice that γ^Kr(X, Y) remains unchanged by monotone functional transformations of the coordinates: if instead of X and Y we consider f(X) and g(Y), where f and g are both monotone strictly increasing (decreasing), then γ^Kr(X, Y) is unchanged. In order to avoid arbitrariness in the choice of x0, y0, one can consider the difference between two independent random vectors (X1, Y1), (X2, Y2) with the same distribution as (X, Y). This leads to a measure of similarity connected with the Kendall-τ correlation. Define

γ^Kd(X, Y) = 2P{(X1 − X2)(Y1 − Y2) > 0} − 1     (2.7)
The classical Kendall-τ correlation can be considered as an unbiased and consistent estimator of γ^Kd(X, Y). There are obvious relations between γ^Kd(X, Y), γ^Kr(X, Y), and γ^Fh(X, Y):

γ^Kd(X, Y) = γ^Kr(X1 − X2, Y1 − Y2) = γ^Fh(X1 − X2, Y1 − Y2)

If we consider three independent random vectors (X1, Y1), (X2, Y2), (X3, Y3) with the same distribution as (X, Y), then one can define the following measure of similarity:

γ^Sp(X, Y) = 6P{(X1 − X2)(Y1 − Y3) > 0} − 3     (2.8)

The classical Spearman correlation can be considered as an unbiased and consistent estimator of γ^Sp(X, Y). Both measures of similarity γ^Kd(X, Y) and γ^Sp(X, Y) also remain unchanged by monotone functional transformations of the coordinates. Note that the considered interpretation (proposed in [57]) of the Kendall-τ correlation and the Spearman correlation is not traditional. In our investigations, it is important to define the Kendall-τ correlation and the Spearman correlation as measures of similarity between two random variables in order to construct true network structures in the associated network models. Different distributions and different measures of similarity generate different random variable networks. The choice of the measure of similarity is crucial for network analysis. Traditionally, the Pearson correlation is the most widely used. This measure is appropriate for the Gaussian distribution. However, there is no theoretical justification for the use of the Pearson correlation for other types of distributions. In this book, we show that for the class of elliptical distributions the sign similarity is more appropriate for network analysis than the Pearson correlation.
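As a simple illustration (not part of the book's text; the simulated pair and sample size are assumptions), the sketch below computes sample counterparts of the pairwise measures defined above for one pair of observed series. The plug-in estimators actually used by the identification algorithms are discussed in Chap. 3.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 0.6 * x + 0.8 * rng.normal(size=1000)   # correlated pair for the illustration

# Sample sign similarity: fraction of concordant deviations from the sample means
sign_sim = np.mean((x - x.mean()) * (y - y.mean()) > 0)
fechner = 2 * sign_sim - 1                   # sample Fechner correlation
pearson = stats.pearsonr(x, y)[0]            # estimates gamma^P
kendall = stats.kendalltau(x, y)[0]          # estimates gamma^Kd
spearman = stats.spearmanr(x, y)[0]          # estimates gamma^Sp

print(f"sign similarity {sign_sim:.3f}, Fechner {fechner:.3f}")
print(f"Pearson {pearson:.3f}, Kendall {kendall:.3f}, Spearman {spearman:.3f}")
```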
2.3 Relations Between Different Networks and Network Structures First we investigate the relations between different measures of similarity for two-dimensional elliptical distributions. Let (X, Y) be a two-dimensional vector with an elliptical distribution with density function

f(x, y) = det(A)^{−1/2} g( a^{1,1}(x − μX)² + 2a^{1,2}(x − μX)(y − μY) + a^{2,2}(y − μY)² )     (2.9)

where A = (ai,j)_{i=1,2; j=1,2} is a positive definite symmetric matrix, the a^{i,j} are the elements of the inverse matrix A^{−1} = (a^{i,j})_{i=1,2; j=1,2}, and

∫_{−∞}^{∞} ∫_{−∞}^{∞} g(u² + v²) du dv = 1.
The density function f(x, y) is symmetric with respect to the point (μX, μY). It implies that Med(X) = μX, Med(Y) = μY. If E(X), E(Y) exist then E(X) = μX = Med(X), E(Y) = μY = Med(Y), and the Kruskal and Fechner correlations coincide, γ^Kr(X, Y) = γ^Fh(X, Y). For simplicity of notation we will sometimes write γ(X, Y) = γ if it does not lead to confusion. In the case when (X, Y) has a bi-variate Gaussian distribution, one has [57]:

γ^Fh = γ^Kr = γ^Kd = (2/π) arcsin(γ^P),    γ^Sg = 1/2 + (1/π) arcsin(γ^P)     (2.10)

and

γ^Sp = (6/π) arcsin(γ^P / 2)     (2.11)
Formulas (2.10) and (2.11) are well known and can be used to get an interpretation of the Pearson correlation in the case of the Gaussian distribution. For example, γ^P(X, Y) = 0.600 means that γ^Sg, the proportion of concordance of the deviations of X and Y from E(X) and E(Y), is roughly equal to 0.705. In the general case of a bi-variate distribution of the random vector (X, Y), one can only state that the following inequalities are sharp [57]:

−1 + (1/2)(1 + γ^Kd)² ≤ γ^Sp ≤ (1/2)(1 + 3γ^Kd)   if γ^Kd ≤ 0
(1/2)(−1 + 3γ^Kd) ≤ γ^Sp ≤ 1 − (1/2)(1 − γ^Kd)²   if γ^Kd ≥ 0
(1/4)(1 + γ^Kr)² − 1 ≤ γ^Kd ≤ 1 − (1/4)(1 − γ^Kr)²
(3/16)(1 + γ^Kr)³ − 1 ≤ γ^Sp ≤ 1 − (3/16)(1 − γ^Kr)³

Note that in the general case, if X and Y are independent, then γ^P(X, Y) = γ^Kr(X, Y) = γ^Kd(X, Y) = γ^Sp(X, Y) = 0, and if in addition E(X) = Med(X), E(Y) = Med(Y), then γ^Fh(X, Y) = 0. For a given positive definite symmetric matrix A = (ai,j)_{i=1,2; j=1,2}, let us introduce the class K(A) of random vectors (X, Y) with the density function given by (2.9) with an arbitrary function g. From the general property of elliptical distributions, one has (if all covariances exist):

γ^P(X, Y) = Cov(X, Y) / ( √Cov(X, X) √Cov(Y, Y) ) = ai,j / √(ai,i aj,j)
Therefore, the Pearson correlation γ^P(X, Y) is the same for any elliptical vector (X, Y) from the class K(A) such that E(X² + Y²) < ∞. Now we prove that the relations (2.10) between γ^Fh, γ^Kr, γ^Kd, γ^P, γ^Sg, true for a Gaussian vector, remain valid in the case where the random vector (X, Y) has a bi-variate elliptically contoured distribution. For the Kendall-τ correlation this was proved in [65] and [25]. We take a slightly different approach. The main result is

Theorem 2.1 Let (X, Y) be a random vector with density function (2.9) and E(X² + Y²) < ∞. Then

γ^Fh = γ^Kr = γ^Kd = (2/π) arcsin(γ^P),    γ^Sg = 1/2 + (1/π) arcsin(γ^P)     (2.12)

We split the proof into three lemmas.

Lemma 2.1 Let (X, Y) be a random vector from the class K(A). Then

γ^Sg(X, Y) = γ^Sg(XG, YG)     (2.13)
where (XG, YG) is the Gaussian vector from the class K(A).

Proof Let (X, Y) be a random vector from the class K(A) with the density function (2.9). Without loss of generality, one can suppose μX = 0, μY = 0. In this case one has γ^Sg(X, Y) = P(X > 0, Y > 0) + P(X < 0, Y < 0). Write

P(X > 0, Y > 0) = det(A)^{−1/2} ∫_0^∞ ∫_0^∞ g(a^{1,1}x² + 2a^{1,2}xy + a^{2,2}y²) dx dy = det(A)^{−1/2} ∫_0^∞ ∫_0^∞ g((x, y) A^{−1} (x, y)′) dx dy,

where B′ means the transpose of the matrix (or vector) B. The matrix A^{−1} is symmetric positive definite. Therefore, there exists a matrix C = (ci,j)_{i=1,2; j=1,2} such that

C′ A^{−1} C = ( 1 0 ; 0 1 ).

By the change of variables x = c1,1 u + c2,1 v, y = c1,2 u + c2,2 v, that is, (x, y) = (u, v)C, one gets

det(A)^{−1/2} ∫_0^∞ ∫_0^∞ g((x, y) A^{−1} (x, y)′) dx dy = det(A)^{−1/2} det(C) ∫∫_D g(u² + v²) du dv,

where D = {(u, v) : c1,1 u + c2,1 v > 0, c1,2 u + c2,2 v > 0}. One has

(C′)^{−1} C^{−1} = (C′)^{−1} C′ A^{−1} C C^{−1} = A^{−1}.

It implies CC′ = A and det(C) = det(C′) = √det(A). The domain D is a cone (angle) with vertex at (0, 0). In polar coordinates (r, φ) it is defined by 0 < r < ∞, φ1 < φ < φ2, where φ1, φ2 are defined by the matrix C and do not depend on g. It implies

P(X > 0, Y > 0) = ∫∫_D g(u² + v²) du dv = ∫_0^∞ ∫_{φ1}^{φ2} g(r²) r dφ dr = (φ2 − φ1) ∫_0^∞ g(r²) r dr = (φ2 − φ1)/(2π).

Therefore, P(X > 0, Y > 0) does not depend on g. In the same way one can prove that P(X < 0, Y < 0) does not depend on g. It implies that γ^Sg(X, Y) does not depend on g and takes the same value for all distributions from the class K(A). The Lemma follows.

Now we prove the relation between the Pearson correlation and the sign similarity for any elliptical bi-variate vector.

Lemma 2.2 For any bi-variate elliptical vector (X, Y), if E(X² + Y²) < ∞, then

γ^Sg(X, Y) = 1/2 + (1/π) arcsin(γ^P(X, Y)).

Proof Let (X, Y) be a random vector with density function (2.9), and let (XG, YG) be the Gaussian vector from the same class K(A). One has

γ^P(X, Y) = γ^P(XG, YG),   γ^Sg(X, Y) = γ^Sg(XG, YG).

Then

γ^Sg(X, Y) = γ^Sg(XG, YG) = 1/2 + (1/π) arcsin(γ^P(XG, YG)) = 1/2 + (1/π) arcsin(γ^P(X, Y)),

and the lemma is proved.

Next we prove the relations between the Pearson, Fechner, Kruskal, and Kendall-τ correlations and the sign similarity.

Lemma 2.3 For any bi-variate elliptical vector (X, Y), if E(X² + Y²) < ∞, then

γ^Fh(X, Y) = γ^Kr(X, Y) = γ^Kd(X, Y) = (2/π) arcsin(γ^P(X, Y))

and

γ^Kd(X, Y) = 2γ^Sg(X, Y) − 1.

Proof Let (X1, Y1), (X2, Y2) be independent random vectors with the same distribution as (X, Y). It can be proved that the vector (X1 − X2, Y1 − Y2) is elliptical too. Therefore,

γ^Sg(X1 − X2, Y1 − Y2) = 1/2 + (1/π) arcsin(γ^P(X1 − X2, Y1 − Y2)).

One has

cov(X1 − X2, Y1 − Y2) = 2 cov(X, Y),   var(X1 − X2) = 2 var(X),   var(Y1 − Y2) = 2 var(Y).

It implies

γ^Kd(X, Y) = 2γ^Sg(X1 − X2, Y1 − Y2) − 1 = (2/π) arcsin(γ^P(X, Y)).
The theorem is proved. Let K(Λ) be the class of elliptical distributions defined by the density function (2.1) with a fixed matrix Λ. It follows from the statements above that network models and network structures for different random variable networks are related if X ∈ K(Λ). Indeed, if X ∈ K(Λ), then the Fechner correlation network model, the Kruskal correlation network model, and the Kendall correlation network model coincide. The same is true for all network structures in these network models. All these network models are related to the Pearson correlation network model by a monotone transformation of the weights of the edges. The same is true for the sign similarity network model. Therefore, the MST and PMFG network structures are the same for the following network models: the Fechner correlation network model, the Kruskal correlation network model, the Kendall correlation network model, the Pearson correlation network model, and the sign similarity network model. Moreover, the threshold graph in one model can be obtained as a threshold graph in another model by an appropriate choice of thresholds. The same is true for maximum cliques and maximum independent sets in the threshold graph. Taking this into account, the following question is crucial: are there any differences in network structure identification using different correlations, and if so, which correlation is more appropriate? In what follows we show that the use of different correlations generates different uncertainty in identification, and we discuss the choice of correlation.
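Theorem 2.1 can also be checked numerically. The sketch below is an illustration under assumed parameters (it is not from the book): it samples a heavy-tailed bivariate Student distribution from an assumed class K(A) and compares the sample Kendall correlation and sample sign similarity with the arcsine expressions computed from the true Pearson correlation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.6],
              [0.6, 1.0]])                       # assumed matrix A of the class K(A)
true_pearson = A[0, 1] / np.sqrt(A[0, 0] * A[1, 1])

# Bivariate Student distribution with 3 degrees of freedom from the class K(A)
sample = stats.multivariate_t(loc=[0.0, 0.0], shape=A, df=3).rvs(size=100_000,
                                                                 random_state=rng)
x, y = sample[:, 0], sample[:, 1]

kendall = stats.kendalltau(x, y)[0]
sign_sim = np.mean(x * y > 0)                    # E(X) = E(Y) = 0 by construction

print("(2/pi) arcsin(gamma^P)      :", 2 / np.pi * np.arcsin(true_pearson))
print("sample Kendall-tau          :", kendall)
print("1/2 + (1/pi) arcsin(gamma^P):", 0.5 + np.arcsin(true_pearson) / np.pi)
print("sample sign similarity      :", sign_sim)
```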
2.4 Partial Correlation Network Let X = (X1, X2, ..., XN) be a random vector. Consider the conditional distribution of the vector (Xi, Xj) with the other Xk, k = 1, ..., N; k ≠ i, k ≠ j, being fixed. The correlation between Xi and Xj with the other Xk, k = 1, ..., N; k ≠ i, k ≠ j, being fixed is known as the saturated partial correlation, and we will denote it by γ^Par_{i,j}. This partial correlation is used as the measure of similarity in gene expression network analysis [58]. If X has an elliptical distribution with the density function (2.1), then there is a connection between the partial correlations and the matrix Λ:

γ^Par_{i,j} = − λ^{i,j} / √( λ^{i,i} λ^{j,j} ),
where the λ^{i,j} are the elements of the inverse matrix Λ^{−1}. Identification of the concentration graph from observations in the Gaussian partial correlation network is known as the Gaussian Graphical Model Selection Problem [19]. The relations between the partial correlations and the matrix Λ imply that for all distributions X ∈ K(Λ) the associated network models (V, Γ) coincide. Therefore, all concentration graphs, and other network structures, are the same in the partial correlation network for all distributions X ∈ K(Λ).
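The relation between the saturated partial correlations and Λ can be written directly in code. The sketch below is illustrative only (the 4 × 4 matrix is an assumption, not taken from the book): it computes the partial correlation matrix by inverting Λ and standardizing it, and then reads off the concentration graph of the partial correlation network.

```python
import numpy as np

def partial_correlations(lam):
    """gamma^Par_{i,j} = -lam^{i,j} / sqrt(lam^{i,i} lam^{j,j}),
    where lam^{i,j} are the elements of the inverse of lam."""
    inv = np.linalg.inv(lam)
    d = np.sqrt(np.diag(inv))
    pc = -inv / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc

# Assumed example matrix Lambda (hypothetical values for illustration)
lam = np.array([[1.0, 0.5, 0.3, 0.0],
                [0.5, 1.0, 0.4, 0.0],
                [0.3, 0.4, 1.0, 0.2],
                [0.0, 0.0, 0.2, 1.0]])

pc = partial_correlations(lam)
# Concentration graph: edge (i, j) iff the partial correlation is nonzero
edges = [(i + 1, j + 1) for i in range(4) for j in range(i + 1, 4)
         if abs(pc[i, j]) > 1e-12]
print(np.round(pc, 3))
print(edges)
```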
Chapter 3
Network Structure Identification Algorithms
Abstract In this chapter we state the problem of network structure identification from observations. We discuss in detail the problem of threshold graph identification. We show that this problem can be considered as a multiple testing problem and describe different multiple testing algorithms for threshold graph identification. We note that the existing practice of market graph identification can be considered in the proposed framework. In the same way, we discuss the problem of identification of the concentration graph. In addition, we describe some algorithms for MST and PMFG identification. We use numerical simulations to illustrate the described algorithms and discuss the results.
3.1 Threshold Graph Identification: Multiple Testing Algorithms Let (X, γ) be a random variable network. For a given threshold γ0, the threshold graph in the network model (V, Γ) is constructed as follows: the edge between two vertices i and j is included in the threshold graph iff γ(Xi, Xj) = γi,j > γ0. We call it the true threshold graph. In practice the γi,j (the true network model) are not known. We have only observations from the distribution of the vector X. The threshold graph identification problem is to identify the true threshold graph from observations. This problem can be considered as a multiple testing problem. Consider the set of individual hypotheses:

hi,j : γi,j ≤ γ0   vs   ki,j : γi,j > γ0,   i, j = 1, ..., N; i ≠ j.     (3.1)
We shall assume that tests for the individual hypotheses are available and have the form

ϕi,j(x) = { 0, if Ti,j(x) ≤ ci,j;  1, if Ti,j(x) > ci,j }     (3.2)
where the Ti,j(x) are individual test statistics, i, j = 1, 2, ..., N, i ≠ j. Define the p-value p(i, j) of the test ϕi,j as

p(i, j) = Pγ0(Ti,j > ti,j)     (3.3)
where ti,j is the observed value of Ti,j, i.e., ti,j = Ti,j(x), and the probability Pγ0 is calculated from the distribution of the statistic Ti,j under the condition γi,j = γ0. In what follows we describe some threshold graph identification algorithms associated with multiple testing procedures.
3.1.1 Simultaneous Inference Calculate the p-value p(i, j) of the test ϕi,j, i, j = 1, ..., N; i ≠ j. Fix a significance level αi,j for each individual hypothesis hi,j.
• If p(i, j) < αi,j then the hypothesis hi,j is rejected, i.e., the edge (i, j) is included in the threshold graph.
• If p(i, j) ≥ αi,j then the hypothesis hi,j is accepted, i.e., the edge (i, j) is not included in the threshold graph.
Note that the popular Bonferroni procedure is a particular case of simultaneous inference if one puts αi,j = α/M, where M is the number of individual hypotheses, M = N(N − 1)/2, and α is a number from the interval (0, 1). It is known that for this procedure the probability of at least one false rejection (false edge inclusion) is bounded by α [62].
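A minimal sketch of how simultaneous inference with the Bonferroni choice αi,j = α/M can be implemented is given below. It is an illustration only: the Fisher z-transform statistic used here is one standard choice for the Pearson correlation network, while the individual test statistics actually adopted in the book are discussed in Sect. 3.1.5; the data and threshold are assumed values.

```python
import numpy as np
from scipy.stats import norm

def bonferroni_threshold_graph(x, gamma0=0.3, alpha=0.05):
    """Identify the threshold graph from an (n x N) sample by testing
    h_{i,j}: gamma_{i,j} <= gamma0 with Bonferroni-corrected Fisher z-tests."""
    n, N = x.shape
    M = N * (N - 1) // 2                      # number of individual hypotheses
    r = np.corrcoef(x, rowvar=False)          # sample Pearson correlations
    edges = []
    for i in range(N):
        for j in range(i + 1, N):
            # Fisher z statistic for testing gamma_{i,j} = gamma0 against gamma_{i,j} > gamma0
            z = np.sqrt(n - 3) * (np.arctanh(r[i, j]) - np.arctanh(gamma0))
            p_value = norm.sf(z)              # one-sided p-value
            if p_value < alpha / M:           # Bonferroni: alpha_{i,j} = alpha / M
                edges.append((i + 1, j + 1))
    return edges

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.7, 0.2, 0.2],
                [0.7, 1.0, 0.2, 0.2],
                [0.2, 0.2, 1.0, 0.6],
                [0.2, 0.2, 0.6, 1.0]])
data = rng.multivariate_normal(np.zeros(4), cov, size=500)
print(bonferroni_threshold_graph(data, gamma0=0.3, alpha=0.05))
```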
3.1.2 Holm Step Down Procedure Calculate the p-value p(i, j) of the test ϕi,j, i, j = 1, ..., N, i ≠ j. Fix α ∈ (0, 1). The step down procedure consists of at most M steps, M = N(N − 1)/2. At each step either one individual hypothesis hi,j is rejected or all remaining hypotheses are accepted. The Holm step down procedure is constructed as follows:
• Step 1: If min p(i, j) ≥ α/M then accept all hypotheses hi,j, i, j = 1, 2, ..., N (all vertices in the threshold graph are isolated, there are no edges), else if min p(i, j) = p(i1, j1)
γ0 ,
i, j = 1, 2, . . . , N, i = j
It is interesting to investigate general statistical properties of the described procedures for different networks and for a general choice of significance levels.
3.2 Concentration Graph Identification Let (X, γ) be a random variable network. The concentration graph in a network model (V, Γ) is constructed as follows: the edge between two vertices i and j is included in the concentration graph iff γ(Xi, Xj) = γi,j ≠ 0. We call it the true concentration graph. The concentration graph identification problem is to identify the true concentration graph from observations. This problem can be considered as a multiple testing problem too. Consider the set of individual hypotheses:

hi,j : γi,j = 0   vs   ki,j : γi,j ≠ 0,   i, j = 1, ..., N; i ≠ j.     (3.4)
We shall assume that tests for the individual hypotheses are available and have the symmetric form

ϕi,j(x) = { 0, if |Ti,j(x)| ≤ ci,j;  1, if |Ti,j(x)| > ci,j }     (3.5)
where the Ti,j(x) are individual test statistics, i, j = 1, 2, ..., N, i ≠ j. Define the p-value p(i, j) of the test ϕi,j as

p(i, j) = P0(|Ti,j| > |ti,j|)     (3.6)
where ti,j is the observed value of Ti,j, i.e., ti,j = Ti,j(x), and the probability P0 is calculated from the distribution of the statistic Ti,j under the condition γi,j = 0. One can use the same algorithms (simultaneous inference, Holm, Hochberg, Benjamini-Hochberg) for concentration graph identification as for threshold graph identification. For concentration graph identification in the Pearson, Fechner, Kruskal, Kendall, Spearman, and partial correlation networks one can use the same statistics as for threshold graph identification with γ0 = 0, with the p-values calculated by p(i, j) = 2[1 − Φ(|ti,j|)], where ti,j is the observed value of the associated statistic. Remark Partial correlation is a popular measure of similarity in gene expression network analysis. Identification of the concentration graph is well studied for the Gaussian distribution of the vector X, and in this case the identification problem is called the Gaussian Graphical Model Selection (GGMS) problem. Different algorithms are known for the solution of the GGMS problem [19, 20].
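A corresponding sketch for concentration graph identification is given below. It is illustrative only: it uses the two-sided Fisher z-test for zero Pearson correlation and a hand-rolled Holm step-down correction; the function name and parameters are assumptions, not the book's notation.

```python
import numpy as np
from scipy.stats import norm

def holm_concentration_graph(x, alpha=0.05):
    """Identify the concentration graph by testing h_{i,j}: gamma_{i,j} = 0
    for all pairs and applying the Holm step-down procedure to the p-values."""
    n, N = x.shape
    r = np.corrcoef(x, rowvar=False)
    pairs, pvals = [], []
    for i in range(N):
        for j in range(i + 1, N):
            z = np.sqrt(n - 3) * np.arctanh(r[i, j])
            pairs.append((i + 1, j + 1))
            pvals.append(2 * (1 - norm.cdf(abs(z))))   # p(i, j) = 2[1 - Phi(|t_{i,j}|)]
    order = np.argsort(pvals)
    M = len(pvals)
    edges = set()
    for k, idx in enumerate(order):                    # Holm: compare to alpha / (M - k)
        if pvals[idx] < alpha / (M - k):
            edges.add(pairs[idx])                      # edge included in the concentration graph
        else:
            break                                      # accept all remaining hypotheses
    return sorted(edges)

# Example usage:
#   data = np.random.default_rng(0).standard_normal((500, 4))
#   print(holm_concentration_graph(data))
```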
3.3 Maximum Spanning Tree Identification Let (X, γ ) be a random variable network, and (V , Γ ) associated network model. The true maximum spanning tree in the complete weighted graph (V , Γ ) can be constructed by any appropriate algorithm, such as Kruskal algorithm, Prim algorithm, and others [29]. In what follows we will use the Kruskal algorithm. Kruskal algorithms To construct the MST, a list of edges is sorted in descending order according to the weight and following the ordered list an edge is added to the MST if and only if it does not create a cycle. In general, MST constructed by Kruskal algorithm is not unique. To avoid complications, we will consider the case where all weights of edges are different. In this case, MST is unique. In practice γi,j (true network model) are not known. We have only observations from the distribution of the vector X. The maximum spanning tree identification problem is to identify the MST from observations. To solve this problem one can use γˆi,j , estimation of γi,j , to construct a sample network model, and then use any algorithm of maximum spanning tree construction (e.g., Kruskal algorithm). In particular, one can use the following estimations for different networks Sample Pearson correlation P γˆi,j
= ri,j =
t (xi (t) − x i )(xj (t) − x j )
t (xi (t) − x i )
2
t (xj (t) − x j )
2
3.3 Maximum Spanning Tree Identification
33
Sample sign similarity 1 Sg 1 Sg Ti,j = Ii,j (t) n n n
Sg
γˆi,j =
t=1
Sample Fechner similarity 1 Fh Ii,j (t) n n
Fh γˆi,j =
t=1
Sample Kruskal correlation 1 Kr = Ii,j (t) n n
Kr γˆi,j
t=1
Sample Kendall correlation Kd γˆi,j =
n n 1 Kd (t, s) Ii,j n(n − 1) t=1 s=1 s = t
Sample Spearman correlation Sp
γˆi,j =
n n n 3 Sp Ii,j (t, s, l) n(n − 1)(n − 2) t=1 s=1 l=1 s = t l = t l = s
Sample partial correlation Par γˆi,j = r i,j = − √
s i,j s i,i s j,j
Here s i,j are the elements of inverse sample covariance matrix S −1 . Remark MST is widely used in market network analysis. Number of publications is growing [67]. However, the question of uncertainty of MST identification is not well studied.
34
3 Network Structure Identification Algorithms
3.4 Example of MST Identification We take the example of the Sect. 2.1. Consider the class of elliptical distributions of the class K(Λ), where matrix Λ is defined in the Example 2.1. True MST is given by the Fig. 2.1. True MST is the same for all distributions from the class K(Λ). By Cayley formula [13] total number of possible MST is 108 . To reduce the number of variants, we introduce the following simple topological characteristic of MST: vector of degrees of vertices of MST ordered in ascending order. With respect to the characteristic, there are 22 possible topologically different MST, which are listed below: {(1, 1, 1, 1, 1, 1, 1, 1, 1, 9), (1, 1, 1, 1, 1, 1, 1, 1, 4, 6), (1, 1, 1, 1, 1, 1, 1, 2, 3, 6), (1, 1, 1, 1, 1, 1, 1, 3, 4, 4), (1, 1, 1, 1, 1, 1, 2, 2, 4, 4), (1, 1, 1, 1, 1, 2, 2, 2, 2, 5), (1, 1, 1, 1, 2, 2, 2, 2, 2, 4), (1, 1, 2, 2, 2, 2, 2, 2, 2, 2)}
(1, 1, 1, 1, 1, 1, 1, 1, 2, 8), (1, 1, 1, 1, 1, 1, 1, 1, 5, 5), (1, 1, 1, 1, 1, 1, 1, 2, 4, 5), (1, 1, 1, 1, 1, 1, 2, 2, 2, 6), (1, 1, 1, 1, 1, 1, 2, 3, 3, 4), (1, 1, 1, 1, 1, 2, 2, 2, 3, 4), (1, 1, 1, 1, 2, 2, 2, 2, 3, 3),
(1, 1, 1, 1, 1, 1, 1, 1, 3, 7), (1, 1, 1, 1, 1, 1, 1, 2, 2, 7), (1, 1, 1, 1, 1, 1, 1, 3, 3, 5), (1, 1, 1, 1, 1, 1, 2, 2, 3, 5), (1, 1, 1, 1, 1, 1, 3, 3, 3, 3), (1, 1, 1, 1, 1, 2, 2, 3, 3, 3), (1, 1, 1, 2, 2, 2, 2, 2, 2, 3),
True MST corresponds to the following vector (1, 1, 1, 1, 1, 1, 1, 1, 5, 5). We conduct the following experiments: • For a given distribution of the vector X from K(Λ), generate sample of the size n from X • Calculate the estimations γˆi,j of the true edge weights in the network • Apply Kruskal algorithm to construct MST for the sample network (V , Γˆ ), Γˆ = (γˆi,j ) • Calculate the topological characteristic of obtained MST • Repeat the experiment S times, and calculate the frequencies of appearance of each topological characteristic We chose two distributions from the class K(Λ): Gaussian distribution and Student distribution with 3 degrees of freedom. Sample sizes are going from n = 5 to n = 50,000. Number of replications is S = 1000. The results are presented in the Tables 3.1, 3.2, 3.3, and 3.4. Tables 3.1 and 3.2 present the results for Pearson correlation network. One can see that for Gaussian distribution and n = 5, 10, 20, the true MST almost not appear, most popular are MST with the following topological characteristics (1, 1, 1, 1, 2, 2, 2, 2, 3, 3), (1, 1, 1, 1, 1, 2, 2, 2, 3, 4). The situation is much worse for Student distribution. Note that for a small number of observations the hubs of the true MST (vertices 9 and 10) are not identified. The true MST structure starts to be identified only from n = 50,000. Tables 3.3 and 3.4 present the results for sign similarity network. One can see that in this case the picture is much more stable with respect to distribution than for Pearson correlation network. As we
3.4 Example of MST Identification
35
Table 3.1 Observed frequencies of degree vectors for 1000 simulations. Pearson correlation network, normal distribution Degree vec./no. of observations (1, 1, 1, 1, 1, 1, 1, 1, 1, 9) (1, 1, 1, 1, 1, 1, 1, 1, 2, 8) (1, 1, 1, 1, 1, 1, 1, 1, 3, 7) (1, 1, 1, 1, 1, 1, 1, 1, 4, 6) (1, 1, 1, 1, 1, 1, 1, 1, 5, 5) (1, 1, 1, 1, 1, 1, 1, 2, 2, 7) (1, 1, 1, 1, 1, 1, 1, 2, 3, 6) (1, 1, 1, 1, 1, 1, 1, 2, 4, 5) (1, 1, 1, 1, 1, 1, 1, 3, 3, 5) (1, 1, 1, 1, 1, 1, 1, 3, 4, 4) (1, 1, 1, 1, 1, 1, 2, 2, 2, 6) (1, 1, 1, 1, 1, 1, 2, 2, 3, 5) (1, 1, 1, 1, 1, 1, 2, 2, 4, 4) (1, 1, 1, 1, 1, 1, 2, 3, 3, 4) (1, 1, 1, 1, 1, 1, 3, 3, 3, 3) (1, 1, 1, 1, 1, 2, 2, 2, 2, 5) (1, 1, 1, 1, 1, 2, 2, 2, 3, 4) (1, 1, 1, 1, 1, 2, 2, 3, 3, 3) (1, 1, 1, 1, 2, 2, 2, 2, 2, 4) (1, 1, 1, 1, 2, 2, 2, 2, 3, 3) (1, 1, 1, 2, 2, 2, 2, 2, 2, 3) (1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
5
10
20
0 0 0 0 0 0 0 0 0 2 0 1 4 16 5 3 107 171 61 410 210 10
0 0 0 0 0 2 5 6 9 8 13 61 39 107 8 40 236 166 70 180 49 1
0 1 1 2 5 16 38 48 25 27 32 125 84 119 9 55 211 92 42 62 6 0
100 2 35 72 128 84 67 180 217 18 17 35 67 44 14 0 7 12 1 0 0 0 0
1000 0 47 164 339 255 7 45 143 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10,000 0 0 28 430 540 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0
50,000 0 0 0 378 622 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
will show later this fact has a strong theoretical justification. At the same time MST identification in sign similarity network is more correct for Student distribution that in Pearson correlation network.
36
3 Network Structure Identification Algorithms
Table 3.2 Observed frequencies of degree vectors for 1000 simulations. Pearson correlation network, Student distribution with 3 degree of freedom Degree vec./no. of observations (1, 1, 1, 1, 1, 1, 1, 1, 1, 9) (1, 1, 1, 1, 1, 1, 1, 1, 2, 8) (1, 1, 1, 1, 1, 1, 1, 1, 3, 7) (1, 1, 1, 1, 1, 1, 1, 1, 4, 6) (1, 1, 1, 1, 1, 1, 1, 1, 5, 5) (1, 1, 1, 1, 1, 1, 1, 2, 2, 7) (1, 1, 1, 1, 1, 1, 1, 2, 3, 6) (1, 1, 1, 1, 1, 1, 1, 2, 4, 5) (1, 1, 1, 1, 1, 1, 1, 3, 3, 5) (1, 1, 1, 1, 1, 1, 1, 3, 4, 4) (1, 1, 1, 1, 1, 1, 2, 2, 2, 6) (1, 1, 1, 1, 1, 1, 2, 2, 3, 5) (1, 1, 1, 1, 1, 1, 2, 2, 4, 4) (1, 1, 1, 1, 1, 1, 2, 3, 3, 4) (1, 1, 1, 1, 1, 1, 3, 3, 3, 3) (1, 1, 1, 1, 1, 2, 2, 2, 2, 5) (1, 1, 1, 1, 1, 2, 2, 2, 3, 4) (1, 1, 1, 1, 1, 2, 2, 3, 3, 3) (1, 1, 1, 1, 2, 2, 2, 2, 2, 4) (1, 1, 1, 1, 2, 2, 2, 2, 3, 3) (1, 1, 1, 2, 2, 2, 2, 2, 2, 3) (1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
5
10
20
0 0 0 0 0 0 0 0 0 0 0 1 3 11 3 5 82 121 47 406 285 36
0 0 0 1 0 0 1 2 1 3 4 38 36 58 13 21 217 146 84 278 94 3
0 0 0 1 0 5 8 21 12 11 19 87 50 106 8 50 241 116 67 151 43 4
100 1 8 9 24 13 30 97 96 36 31 56 170 91 94 3 39 113 43 15 26 5 0
1000 2 35 85 148 95 44 147 226 20 19 35 54 37 16 0 7 20 5 4 1 0 0
10,000 1 36 156 279 210 20 89 179 7 5 1 6 4 2 0 0 2 1 1 1 0 0
50,000 0 32 116 351 318 7 37 116 0 1 4 5 4 5 1 0 0 0 0 3 0 0
3.4 Example of MST Identification
37
Table 3.3 Observed frequencies of degree vectors for 1000 simulations. Sign similarity network, normal distribution Degree vec./no. of observations (1, 1, 1, 1, 1, 1, 1, 1, 1, 9) (1, 1, 1, 1, 1, 1, 1, 1, 2, 8) (1, 1, 1, 1, 1, 1, 1, 1, 3, 7) (1, 1, 1, 1, 1, 1, 1, 1, 4, 6) (1, 1, 1, 1, 1, 1, 1, 1, 5, 5) (1, 1, 1, 1, 1, 1, 1, 2, 2, 7) (1, 1, 1, 1, 1, 1, 1, 2, 3, 6) (1, 1, 1, 1, 1, 1, 1, 2, 4, 5) (1, 1, 1, 1, 1, 1, 1, 3, 3, 5) (1, 1, 1, 1, 1, 1, 1, 3, 4, 4) (1, 1, 1, 1, 1, 1, 2, 2, 2, 6) (1, 1, 1, 1, 1, 1, 2, 2, 3, 5) (1, 1, 1, 1, 1, 1, 2, 2, 4, 4) (1, 1, 1, 1, 1, 1, 2, 3, 3, 4) (1, 1, 1, 1, 1, 1, 3, 3, 3, 3) (1, 1, 1, 1, 1, 2, 2, 2, 2, 5) (1, 1, 1, 1, 1, 2, 2, 2, 3, 4) (1, 1, 1, 1, 1, 2, 2, 3, 3, 3) (1, 1, 1, 1, 2, 2, 2, 2, 2, 4) (1, 1, 1, 1, 2, 2, 2, 2, 3, 3) (1, 1, 1, 2, 2, 2, 2, 2, 2, 3) (1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
5 25 63 53 61 35 60 108 109 44 31 38 106 40 81 0 17 63 37 8 17 4 0
10
20
3 11 14 12 6 33 53 57 22 21 63 111 70 101 5 44 189 61 34 69 20 1
0 8 4 6 1 24 42 34 15 12 59 121 63 99 3 70 199 83 42 94 21 0
100 1 17 17 27 14 54 106 106 28 24 65 157 88 78 6 41 116 27 10 18 0 0
1000 7 80 161 225 123 55 120 194 8 6 6 8 6 0 0 0 1 0 0 0 0 0
10,000 0 31 188 370 307 5 28 71 0 0 0 0 0 0 0 0 0 0 0 0 0 0
50,000 0 5 104 431 456 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0
38
3 Network Structure Identification Algorithms
Table 3.4 Observed frequencies of degree vectors for 1000 simulations. Sign similarity network, Student distribution with 3 degree of freedom Degree vec./no. of observations (1, 1, 1, 1, 1, 1, 1, 1, 1, 9) (1, 1, 1, 1, 1, 1, 1, 1, 2, 8) (1, 1, 1, 1, 1, 1, 1, 1, 3, 7) (1, 1, 1, 1, 1, 1, 1, 1, 4, 6) (1, 1, 1, 1, 1, 1, 1, 1, 5, 5) (1, 1, 1, 1, 1, 1, 1, 2, 2, 7) (1, 1, 1, 1, 1, 1, 1, 2, 3, 6) (1, 1, 1, 1, 1, 1, 1, 2, 4, 5) (1, 1, 1, 1, 1, 1, 1, 3, 3, 5) (1, 1, 1, 1, 1, 1, 1, 3, 4, 4) (1, 1, 1, 1, 1, 1, 2, 2, 2, 6) (1, 1, 1, 1, 1, 1, 2, 2, 3, 5) (1, 1, 1, 1, 1, 1, 2, 2, 4, 4) (1, 1, 1, 1, 1, 1, 2, 3, 3, 4) (1, 1, 1, 1, 1, 1, 3, 3, 3, 3) (1, 1, 1, 1, 1, 2, 2, 2, 2, 5) (1, 1, 1, 1, 1, 2, 2, 2, 3, 4) (1, 1, 1, 1, 1, 2, 2, 3, 3, 3) (1, 1, 1, 1, 2, 2, 2, 2, 2, 4) (1, 1, 1, 1, 2, 2, 2, 2, 3, 3) (1, 1, 1, 2, 2, 2, 2, 2, 2, 3) (1, 1, 2, 2, 2, 2, 2, 2, 2, 2)
5 24 72 67 48 38 66 105 110 34 39 34 110 40 76 6 20 66 21 7 15 1 1
10 3 11 8 19 2 28 53 48 22 17 63 138 60 102 6 58 171 74 33 68 14 2
20 2 5 3 3 5 25 42 40 23 13 46 133 57 101 5 55 193 90 47 94 17 1
100 0 14 14 28 16 57 84 108 32 30 75 173 72 75 2 36 119 39 14 12 0 0
1000 12 48 133 237 132 46 125 219 5 7 4 16 12 3 0 1 0 0 0 0 0 0
10,000 0 32 165 418 282 2 33 68 0 0 0 0 0 0 0 0 0 0 0 0 0 0
50,000 0 7 95 431 461 0 0 6 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Chapter 4
Uncertainty of Network Structure Identification
Abstract In this chapter, we develop a theoretical foundation to study uncertainty of network structure identification algorithms. We suggest to consider identification algorithms as multiple decision statistical procedures. In this framework, uncertainty is connected with a loss function and it is defined as expected value of the total loss, known as the risk function. We discuss, from this point of view, different measures of quality of multiple hypotheses testing and binary classification. We emphasize the class of additive loss functions as the most appropriate for network structure identification. We show that under some additional conditions, the risk function for additive losses is a linear combination of expected value of the numbers of Type I (False Positive) and Type II (False Negative) errors.
4.1 Multiple Decision Approach In this section, we consider the network structure identification problem in the framework of decision theory [60, 91]. According to this approach, we specify the decision procedure (decision rule) and risk function. Let (X, γ ) be a random variable network, (V , Γ ) be the network model generated by (X, γ ). Assume that vector X has distribution from some class K. Each distribution Pθ from the class K is associated with some parameter θ from the parameter space Ω. The network structure is an unweighted graph (U, E), where U ⊂ V , and E is a set of edges between nodes in U . For better understanding, we develop our approach for the networks structures with U = V (threshold graph, concentration graph, maximum spanning tree, planar maximally filtered graph). The case U = V can be treated in a similar way. For the case U = V , network structure is defined by an adjacency matrix (N × N ). Define the set G of all N × N symmetric matrices G = (gi,j ) with gi,j ∈ {0, 1}, i, j = 1, 2, . . . , N, gi,i = 0, i = 1, 2, . . . , N . Matrices G ∈ G represent adjacency matrices of all simple undirected graphs with N vertices. The total number of matrices in G is equal to L = 2M with M = N(N − 1)/2. The network structure
© The Author(s) 2020 V. A. Kalyagin et al., Statistical Analysis of Graph Structures in Random Variable Networks, SpringerBriefs in Optimization, https://doi.org/10.1007/978-3-030-60293-2_4
39
40
4 Uncertainty of Network Structure Identification
identification problem can be formulated as a multiple decision problem of the choice of one from L hypotheses: HG : G is the adjacency matrix of the true network structure
(4.1)
Let x = (xi (t)) ∈ R N ×n be a sample of the size n from the distribution X. The multiple decision statistical procedure δ(x) is a map from the sample space R N ×n to the decision space D = {dG , G ∈ G }, where the decision dG is the acceptance of hypothesis HG , G ∈ G . According to [91], the quality of the statistical procedure is defined by the risk function. Let w(S, Q) be the loss from decision dQ when hypothesis HS is true, i.e., w(HS ; dQ ) = w(S, Q), S, Q ∈ G . We assume that w(S, S) = 0. The risk function is defined by Risk(S, θ ; δ) =
w(S, Q)Pθ (δ(x) = dQ ), θ ∈ ΩS ,
Q∈G
where ΩS is the parametric region corresponding to the hypothesis HS (i.e., the set of distributions such that the true network structure in (V , Γ ) has adjacency matrix S), and Pθ (δ(x) = dQ ) is the probability that decision dQ is taken. The multiple decision statistical procedure can be represented by the matrix ⎛
0 ϕ1,2 (x) ⎜ ϕ2,1 (x) 0 Φ(x) = ⎜ ⎝ ... ... ϕN,1 (x) ϕN,2 (x)
⎞ . . . ϕ1,N (x) . . . ϕ2,N (x) ⎟ ⎟, ... ... ⎠ ... 0
(4.2)
where ϕi,j (x) ∈ {0, 1}. In this case one has δ(x) = dG , iff Φ(x) = G
(4.3)
The value w(S, Φ(x)) is the loss from the decision dΦ , when the true decision is dS . Risk is the expected value of this loss for the fixed S and θ ∈ ΩS . Uncertainty of multiple decision procedure can therefore be measured by the risk. This framework allows to study general statistical properties of network structure identification algorithms. In this book, we are interested in two properties: optimality and robustness. For a given loss function, decision procedure δ is called optimal in the class of decision procedures F if Risk(S, θ ; δ) ≤ Risk(S, θ ; δ ), θ ∈ ΩS , S ∈ G , δ, δ ∈ F
(4.4)
4.2 Loss and Risk Functions
41
To find an optimal procedure, one needs to solve a multiobjective optimization problem. It is possible that this problem does not have a solution in the class F . It is interesting to specify the class of procedures where there is an optimal procedure. In this book, we consider the class of unbiased procedures and describe an optimal procedure in this class for some network structure identification problem. In the general case, one can look for Pareto optimal or admissible procedures. For a given loss function, decision procedure is called distribution free (robust) in the class of distributions K if Risk(S, θ ; δ) = Risk(S, θ ; δ), Pθ , Pθ ∈ K, θ, θ ∈ ΩS
(4.5)
Such procedures are especially important for practical use in the case where there is no enough information about the distribution. In this book, we propose a new type of identification procedures which are robust (distribution free) in the class of elliptical distributions.
4.2 Loss and Risk Functions The multiple decision approach allows to consider in the same framework many measures of uncertainty (error rate) widely used in multiple hypotheses testing and machine learning. In this section, we consider the most used among them. Let S = (si,j ), Q = (qi,j ) be two adjacency matrix from G . Loss function w(S, Q) must be related with the difference between S and Q. Depending on how this difference is measured, one has different measures of uncertainty (error rates). The most simple loss function is wSimple(S,Q) =
1 if S = Q 0 if S = Q
The associated risk is the probability of the false decision Risk(S, θ ; δ) = Pθ (δ(x) = dS ), θ ∈ ΩS . This loss function does not take into account how large is the difference between the matrices, only the fact of the difference is indicated. Moreover, this loss function does not make difference between two types of error: zero is replaced by one (Type I error), or one is replaced by zero (Type II error). However, in practice it can be important. To explain better what follows we introduce two tables. Table 4.1 illustrates Type I and Type II errors for the individual edge (i, j ). Table 4.2 illustrates the difference between S and Q based on the numbers of Type I and Type II errors. Table 4.1 represents all possible cases for different values of si,j and qi,j . Value 0 means that the edge (i, j ) is not included in the corresponding structure, value 1 means that the edge (i, j ) is included in the corresponding structure. We associate
42
4 Uncertainty of Network Structure Identification
Table 4.1 Type I and Type II errors for the edge (i, j ) qi,j \si,j 0 0 1
1
Edge is not included correctly, no error Edge is not included incorrectly, Type II error Edge is included incorrectly, Type I error Edge is included correctly, no error
Table 4.2 Numbers of Type I and Type II errors Q\S 0 in Q 1 in Q Total
0 in S TN FP Number of 0 in S
1 in S FN TP Number of 1 in S
Total Number of 0 in Q Number of 1 in Q N (N − 1)/2
the case si,j = 0, qi,j = 1 with Type I error (false edge inclusion), and we associate the case si,j = 1, qi,j = 0 with Type II error (false edge noninclusion). Table 4.2 represents the numbers of Type I errors (False Positive, FP), number of Type II errors (False Negative, FN), and numbers of correct decisions (True Positive, TP and True Negative, TN). In multiple hypotheses testing, important attention is paid to the control of FWER (Family Wise Error Rate). FWER is the probability of at least one Type I error [9, 34, 62]. In the framework of multiple decision approach, it is related with the following loss and risk functions: wFWER(S,Q) =
1 if F P > 0 0 if F P = 0
In this case, one has RiskFWER(S,θ;δ) = Eθ (wFWER(S;δ) ) = Pθ (F P (S; δ) > 0) = FWER These loss and associated risk functions take into account only one type of errors (Type I) and do not take into account the numbers of such errors. It was noted that Bonferroni and Holm procedures control FWER for any S in the following sense: FWER ≤ α. Natural generalization of FWER is k-FWER [61]. To obtain k-FWER, one can define the following loss function: wk−FWER (S, Q) =
1 if F P ≥ k 0 if F P < k
For this loss function, one has Riskk−FWER (S, θ ; δ) = Eθ (wk−FWER (S; δ)) = Pθ (F P (S; δ) ≥ k) = k-FWER It is known that some modifications of Bonferroni and Holm procedures control kFWER for any S [61]. In multiple hypotheses testing, Type II errors are often taken
4.2 Loss and Risk Functions
43
into account using a generalization of the classical notion of the power of test. We consider Conjunctive Power (CPOWER) and Disjunctive Power (DPOWER) [9]. Conjunctive power is analogous to FWER for the Type II errors. It is defined as probability of absence of Type II error (all edges in S are included in the Q). It can be obtained with the use of the following loss function: wCPOWER(S,Q) =
1 if F N > 0 0 if F N = 0
The associated risk function can be calculated as RiskCPOWER(S,θ;δ) = Eθ (wCPOWER(S;δ) ) = Pθ (F N (S; δ) > 0) = = 1 − Pθ (F N (S; δ) = 0) = 1 − CPOWER These loss and risk functions take into account only one type of errors (Type II) and do not take into account the numbers of such errors. Alternatively, Disjunctive Power is defined as the probability of at least one correct edge inclusion (at least one edge in S is included in Q). Define the loss function wDPOWER(S,Q) =
1 if T P = 0 0 if T P > 0
In this case, one has RiskDPOWER(S,θ;δ) = Eθ (wDPOWER(S;δ) ) = Pθ (T P (S; δ) = 0) = = 1 − Pθ (T P (S; δ) > 0) = 1 − DPOWER Considered measures of uncertainty, FWER, k-FWER, CPOWER, DPOWER do not take into account the numbers of associated errors. Next uncertainty measures take into account the numbers of errors. We start with Per-Family Error Rate (PFER) and Per-Comparison Error Rate (PCER). PFER is defined as the expected number of Type I errors (FP). Associated loss function can be defined as wP F ER = F P . PCER is defined by P CER = P F ER/M, M = N(N − 1)/2. Average Power (AVE) is defined by E(T P /(F N + T P )). Associated loss function can be defined as wAVE = F N/(T P + F N). In this case, one has RiskAVE = 1 − AVE. In binary classification, RiskAVE is related with False Negative Rate, and 1−RiskAVE is related with Sensitivity or Recall. All these uncertainty characteristics take into account only one type of errors. Both type and numbers of errors are taken into account in False Discovery Rate (FDR) and Accuracy. FDR is defined as FDR = E(F P /(F P + T P )). Associated loss function can be defined as wF DR = F P /(F P + T P ). One has RiskF DR = FDR. Accuracy (ACC), or proportion of correct decisions, is defined as
44
4 Uncertainty of Network Structure Identification
ACC = E(T P +T N)/M, M = N(N −1)/2. It can be defined by the following loss function wACC = (F P +F N)/M, M = N(N −1)/2. One has RiskACC = 1−ACC. There are many other uncertainty characteristics which are used in the literature. All of them are related with TN, FN, TP, FP. For example, ROC curve is defined in the coordinates False Positive Rate (F P /(F P + T N)) and True Positive Rate (T P /(T P + F N)), and Recall-Precision curve is defined in the coordinates Recall (T P /(T P + F N)), Precision (T P /(T P + F P )). FDR is one of the most popular measures of uncertainty both in multiple testing and machine learning. However, this measure has some drawbacks. For example, if the true structure is sparse (the number of zeros in S is large) then FDR can be close to 1 (bad classifier), but accuracy of both class classification can be close to 1. Similarly, in the case when the true structure is dense (the number of ones in S is large), FDR can be close to zero, but accuracy of both class classification can be close to 0.
4.3 Additive Loss and Risk Functions In this book, we suggest to attract attention to additive loss functions. Define the individual loss for the edge (i, j ) as ⎧ ⎨ ai,j , if si,j = 0, qi,j = 1, wi,j (S, Q) = bi,j , if si,j = 1, qi,j = 0, ⎩ 0, otherwise ai,j is the loss from the false inclusion of edge (i, j ) in the structure Q, and bi,j , is the loss from the false noninclusion of the edge (i, j ) in the structure Q, i, j = 1, 2, . . . , N; i = j . Following Lehmann [60], we call the loss function w(S, Q) additive if w(S, Q) =
N N
(4.6)
wi,j (S, Q)
i=1 j =1
In this case, the total loss from the misclassification of S is equal to the sum of losses from the misclassification of individual edges: w(S, Q) =
{i,j :si,j =0;qi,j =1}
ai,j +
bi,j
{i,j :si,j =1;qi,j =0}
This loss function takes into account both types of individual errors and highlights importance of each type of error. The following statement is true:
4.3 Additive Loss and Risk Functions
45
Theorem 4.1 Let the loss function w be defined by (4.6), and ai,j = a, bi,j = b, i = j , i, j = 1, 2, . . . , N . Then Risk(S, θ ; δ) = aEθ [YI (S, δ)] + bEθ [YI I (S, δ)], θ ∈ ΩS where YI (S, δ) = F P (S; δ), YI I (S, δ) = F N(S; δ) are the numbers of Type I and Type II errors by δ when the state (true decision) is dS . Proof One has Risk(S, θ ; δ) =
w(S, Q)Pθ (δ(x) = dQ ) =
Q∈G
=
[
Q∈G {i,j :si,j =0;qi,j =1}
=
ai,j +
bi,j ]Pθ (δ(x) = dQ ).
{i,j :si,j =1;qi,j =0}
[aYI (S; δ) + bYI I (S; δ)]Pθ (δ(x) = dQ ) = aEθ [YI (S, δ)] + bEθ [YI I (S, δ)]
Q∈G
The theorem is proved. Some important measures of uncertainty described above correspond to additive loss functions. In particular, for a = b = 1/M, M = N(N − 1)/2, this loss function is related with ACC (Accuracy). Note that by the choice of a, b one can take into account the numbers of elements in unbalanced classes. For a = 1, bi,j = 0, one get PFER as the associated risk function. Therefore, in our opinion, a risk function associated with additive losses can be well adapted to measure uncertainty of network structures identification algorithms.
Chapter 5
Robustness of Network Structure Identification
Abstract In this chapter, we discuss robustness of network structure identification algorithms. We understand robustness of identification algorithm as the stability of the risk function with respect to the distribution of the vector X from some class of distributions (distribution free property). We show that popular identification algorithms based on sample Pearson correlations are not robust in the class of elliptical distributions. To overcome this issue, we consider the sign similarity network, introduce a new class of network structure identification algorithms, and prove its robustness in the class of elliptical distributions. We show how to use these algorithms to construct robust network structure identification algorithms in other correlation networks.
5.1 Concept of Robustness Let (X, γ ) be a random variable network, (V , Γ ) be the network model generated by (X, γ ). Assume that vector X has distribution from some class K and for all X ∈ K the network models (V , Γ ) are identical. It means that a given network structure (U, E) is the same in all network models (V , Γ ), X ∈ K. Let δ be an identification procedure (identification algorithm) for the network structure (U, E). Let S ∈ G be adjacency matrix corresponding to the network structure (U, E). Consider the loss function w(S, Q). The associated risk function is defined as (see Chap. 4): R(S, θ ; δ) =
w(S, Q)Pθ (δ = dQ ), θ ∈ ΩS
Q∈G
The true network structure S is the same for all X ∈ K, but the quality (risk function) of the identification procedure δ can depend on distribution of X ∈ K. The decision procedure δ is called robust (stable or distribution free) in the class of distributions K if
© The Author(s) 2020 V. A. Kalyagin et al., Statistical Analysis of Graph Structures in Random Variable Networks, SpringerBriefs in Optimization, https://doi.org/10.1007/978-3-030-60293-2_5
47
48
5 Robustness of Network Structure Identification
Risk(S, θ ; δ) = Risk(S, θ ; δ), Pθ , Pθ ∈ K.
(5.1)
Uncertainty of distribution free procedure does not depend on distribution from the class K, network model (V , Γ ) being fixed. Such procedures are especially important for practical use in the case where there is not enough information about the distribution. In this book, we consider the class of elliptical distributions, defined by the density function: 1
f (x; μ, Λ) = |Λ|− 2 g{(x − μ) Λ−1 (x − μ)}
(5.2)
where Λ = (λi,j )i,j =1,2,...,N is positive definite symmetric matrix, g(x) ≥ 0, and
∞
−∞
...
∞ −∞
g(y y)dy1 dy2 · · · dyN = 1
Let K(Λ) be the class of elliptical distributions with fixed matrix Λ. It follows from the Chap. 2 that for X ∈ K(Λ) the network models (V , Γ ) are identical in each random variables network: Pearson correlation network, Fechner correlation network, Kruskal correlation network, Kendall correlation network, sign similarity network, and partial correlation network. Therefore, one can investigate the robustness of network structure identification procedures in all these networks for distributions X ∈ K(Λ). In addition, network models in different random variables networks are related. It implies that an identification procedure in one network can be used as identification procedure for corresponding network structure in other network. Let the measure of similarity γ be fixed. Consider a random variables networks with X ∈ K(Λ). To investigate the robustness of network structure identification algorithms, we compare the risk functions for different distributions X ∈ K(Λ). Variation of the risk function with variation of distribution is an indicator of nonrobustness of the identification procedure. Nonvariation of the risk function with variation of distribution is an indicator of robustness. To prove the robustness, one needs a theoretical arguments; to prove the nonrobustness, it is enough to present an example. Example 5.1 This example shows that Kruskal algorithm based on sample Pearson correlations for MST identification is not robust, i.e., its risk function essentially depends on distribution from the class K(Λ). Consider the Pearson correlation random variables network with N = 10, X ∈ K(Λ), and matrix Λ is given below
5.1 Concept of Robustness
49
Fig. 5.1 Maximum spanning tree for Example 5.1
⎛ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎜ ⎝
1 2 3 4 5 6 7 8 9 10
1 1.0000 0.4312 0.3470 0.3421 0.3798 0.3935 0.3212 0.4929 0.3342 0.1341
2 0.4312 1.0000 0.4548 0.3994 0.4889 0.5377 0.6053 0.4597 0.5743 0.1167
3 0.3470 0.4548 1.0000 0.3571 0.4930 0.5322 0.4478 0.4096 0.4618 0.2404
4 0.3420 0.3994 0.3571 1.0000 0.4315 0.4095 0.3741 0.4196 0.3705 0.0676
5 0.3798 0.4889 0.4930 0.4315 1.0000 0.5364 0.4732 0.4202 0.4361 0.1596
6 0.3935 0.5377 0.5322 0.4095 0.5364 1.0000 0.5197 0.5133 0.4958 0.2200
7 0.3212 0.6053 0.4478 0.3741 0.4732 0.5197 1.0000 0.4465 0.4866 0.1512
8 0.4928 0.4597 0.4096 0.4196 0.4202 0.5133 0.4465 1.0000 0.4686 0.1690
9 0.3342 0.5743 0.4618 0.3705 0.4361 0.4958 0.4866 0.4686 1.0000 0.1688
⎞ 10 0.1341 ⎟ ⎟ 0.1167 ⎟ ⎟ 0.2404 ⎟ ⎟ ⎟ 0.0676 ⎟ ⎟ 0.1596 ⎟ ⎟ 0.2200 ⎟ ⎟ 0.1512 ⎟ ⎟ 0.1690 ⎟ ⎟ 0.1688 ⎠ 1.0000
The associated network model (V , Γ ) is given by V = {1, 2, . . . , 10}, Γ = Λ. True MST for this network model is presented in Fig. 5.1 Consider, as in example 3.4, the vector of degrees of vertices of MST ordered in ascending order. True vector is (1, 1, 1, 1, 1, 2, 2, 2, 3, 4). Let us introduce the following loss function w=
1 if the vector of degrees of vertices of S differs from this vector of Q 0 if the vector of degrees of vertices of S is equal to this vector of Q
The associated risk function is the probability of error in the identified vector of degrees of vertices of MST ordered in ascending order. To study the variation of the risk function with variation of distribution, we consider the mixture of two elliptical distributions: f (x) = (1 − )fGauss(x) + fStudent(x) , 0 ≤ ≤ 1
50
5 Robustness of Network Structure Identification
Fig. 5.2 Values of the risk function for different values of mixture parameter ∈ [0, 1] for Example 5.1
where fGauss is the density function for the multivariate Gaussian distribution from the class Gauss(0, Λ), and fStudent is the density function for the multivariate Student distribution from the class Student(0, Λ; ν). We conduct the following experiment: • Generate n = 10 000 observations from the distribution with the density function f (x) (degree of freedom for Student distribution ν = 3). P of Pearson correlations and construct the matrix Γˆ . • Calculate the estimations γˆi,j • Use the Kruskal algorithm to calculate sample MST in the network model (V , Γˆ ) • Calculate the vector of degrees of vertices of sample MST ordered in ascending order. • Compare the vector of degrees of vertices of true MST with the vector of degrees of vertices of sample MST and calculate the value of the loss function w. • Repeat the experiment S = 1000 times, and estimate the value of the risk function. Result are shown on the Fig. 5.2. As one can see, the risk function essentially varies with the variation of from 0 to 1. This means that the Kruskal algorithm for MST identification in Pearson correlation network is not robust.
5.1 Concept of Robustness
51
Remark Correlation matrix for Example 5.1 was calculated from the data of Indian stock market for the year 2011. The correlation matrix was considered as true correlation matrix. The following example gives a motivation to theoretical study of the robustness of network structure identification algorithms in sign similarity network. Example 5.2 Consider the sign similarity networks for the class of distributions K(Λ), where the matrix Λ is taken from the Example 5.1. Consider the Kruskal algorithm for MST identification in sign similarity network. True MST is the same, as in the Example 5.1. Consider the same loss function w=
1 if the vector of degrees of vertices of S differs from this vector of Q 0 if the vector of degrees of vertices of S is equal to this vector of Q
The associated risk function is the probability of error in the identified vector of degrees of vertices of MST ordered in ascending order. To study the variation of the risk function with variation of distribution, we consider the same mixture of two elliptical distributions: f (x) = (1 − )fGauss(x) + fStudent(x) , 0 ≤ ≤ 1 where fGauss is the density function for the multivariate Gaussian distribution from the class Gauss(0, Λ), and fStudent is the density function for the multivariate Student distribution from the class Student(0, Λ; ν). We conduct the similar experiment: • Generate n = 10 000 observations from the distribution with the density function f (x) (degree of freedom for Student distribution ν = 3). Sg • Calculate the estimations γˆi,j of sign similarities and construct the matrix Γˆ . • Use the Kruskal algorithm to calculate sample MST in the network model (V , Γˆ ) • Calculate the vector of degrees of vertices of sample MST ordered in ascending order. • Compare the vector of degrees of vertices of true MST with the vector of degrees of vertices of sample MST and calculate the value of the loss function w. • Repeat the experiment S = 1000 times, and estimate the value of the risk function. Results are shown in Fig. 5.3. As one can see, the risk function is almost stable with the variation of from 0 to 1.
52
5 Robustness of Network Structure Identification
Fig. 5.3 Values of the risk function for different values of mixture parameter ∈ [0, 1] for Example 5.2
5.2 Robust Network Structure Identification in Sign Similarity Network In this section, we prove that the network structure identification algorithms in sign similarity networks described in the Chap. 3 are robust (distribution free) in the class K(Λ) of elliptical distributions. The sign similarity network is a pair (X, γ Sg ), where γ Sg is the probability of sign coincidence for a pair of random variables. Associated network model is (V , Γ ), where Γ = (γi,j ), γi,j = γ Sg (Xi , Xj ) = P ((Xi − E(Xi ))(Xj − E(Xj )) > 0) For X ∈ K(Λ), all network models (V , Γ ) coincide. Therefore, all true network structures coincide too. Network structure identification algorithms in sign similarity network are described in Chap. 3. They are based on the following statistics Sg
Ti,j =
n
Sg
Ii,j (t)
t=1
where Sg Ii,j (t)
=
1, (xi (t) − μi )(xj (t) − μj ) ≥ 0 0, (xi (t) − μi )(xj (t) − μj ) < 0
(5.3)
5.2 Robust Network Structure Identification in SignSimilarity Network
53
We start with two lemmas. Lemma 5.1 Let random vector (X1 , . . . , XN ) has elliptical distribution with density f (x; 0, Λ) = |Λ|−1/2 g(x Λx) Then the probabilities p(i1 , i2 , . . . , pN ) := P (i1 X1 > 0, i2 X2 > 0, . . . , iN XN > 0)
(5.4)
do not depend on the function g for any ik ∈ {−1, 1}, k = 1, 2, . . . , N . Proof One has
1
|Λ|− 2 g(x Λx)dx1 . . . dxN
P (i1 X1 > 0, i2 X2 , . . . , iN XN > 0) = ik xk >0,k=1,2,...,N
(5.5)
Matrix Λ is symmetric positive definite; therefore, there exists a matrix C such that C ΛC = I . Put y = C −1 x. Then x = Cy and
1
|Λ|− 2 g(x Λx)dx1 . . . dxN =
ik xk >0,k=1,2,...,N
g(y y)dy1 . . . dyN
(5.6)
D
where D is given by 0 < ik (ck,1 y1 + ck,2 y2 + . . . + ck,N yN ) < ∞, k = 1, 2, . . . , N
(5.7)
The vector y can be written in polar coordinates as: y1 = r sin(θ1 ) y2 = r cos(θ1 ) sin(θ2 ) y3 = r cos(θ1 ) cos(θ2 ) sin(θ3 ) ... yN −1 = r cos(θ1 ) cos(θ2 ) . . . cos(θN −2 ) sin(θN −1 ) yN = r cos(θ1 ) cos(θ2 ) . . . cos(θN −2 ) cos(θN −1 )
(5.8)
where − π2 ≤ θi ≤ π2 , i = 1 . . . , N − 2; −π ≤ θN −1 ≤ π, 0 ≤ r ≤ ∞ The Jacobian of the transformation (5.8) is r N −1 cosN −2 (θ1 ) cosN −3 (θ2 ) . . . cos(θN −2 ) 1 where D is In polar coordinates region, (5.7) is transformed to the region D × R+ given by (k = 1, 2, . . . , N ):
54
5 Robustness of Network Structure Identification
0 < ik (c11 sin(θ1 ) + . . . + c1N cos(θ1 ) cos(θ2 ) . . . cos(θN −2 ) cos(θN −1 )) < ∞ (5.9) Then p(i1 , i2 , . . . , iN ) can be written as
D
0
=
∞
D
r N −1 cosN −2 (θ1 ) cosN −3 (θ2 ) . . . cos(θN −2 )g(r 2 )drdθ1 . . . dθN −1 =
cosN −2 (θ1 ) cosN −3 (θ2 ) . . . cos(θN −2 )dθ1 . . . dθN −1
∞
r N −1 g(r 2 )dr
0
It is known [1] that
∞
r N −1 g(r 2 )dr =
0
1 C(N )
where C(N) =
π 2
− π2
...
π 2
− π2
π
−π
cosN −2 (θ1 ) cosN −3 (θ2 ) . . . cos(θN −2 )dθ1 . . . dθN −1
The region D is defined by the matrix Λ and does not depend on the function g. Therefore, the probabilities p(i1 , i2 , . . . , iN ) are defined by the matrix Λ and do not depend on the function g. In particular for N = 2 the probabilities P (X1 > 0, X2 > 0), P (X1 > 0, X2 < 0), P (X1 < 0, X2 > 0), P (X1 < 0, X2 < 0) do not depend on the function g. The lemma is proved. Sg
Next lemma shows that joint distribution of statistics Ti,j does not depend on the function g too. Lemma 5.2 Let random vector (X1 , . . . , XN ) has elliptical distribution with density f (x; 0, Λ) = |Λ|−1/2 g(x Λx) Sg
Then joint distribution of the statistics Ti,j (i, j = 1, 2, . . . , N ; i = j ) does not depend on the function g. Sg
Proof Statistic Ti,j can be written as 1 1 + sign(Xi (t))sign(Xj (t)) 2 2 n
Sg
Ti,j =
t=1
It follows from the Lemma 5.1 that joint distribution of the random vector sign(X) = (sign(X1 ), sign(X2 ), . . . , sign(XN )) is defined by the matrix Λ and does not depend on the function g. Random vectors sign(X(t)), t = 1, 2, . . . , n are
5.2 Robust Network Structure Identification in SignSimilarity Network
55
independent and identically distributed. Therefore, the joint distribution of random variables sign(Xi (t)), i = 1, 2, . . . , N, t = 1, 2, . . . , n is defined by the matrix Λ and does not depend on the function g. It implies that joint distribution of statistics Sg Ti,j , i, j = 1, 2, . . . , N; i < j does not depend on the function g. The lemma is proved. It is possible to give an explicit expression for the joint distribution of statistics Sg Ti,j , i, j = 1, 2, . . . , N; i < j using probabilities (5.4). To simplify the notations, let us introduce statistics T1 , T2 , . . . , TM , M = N(N − 1)/2 by 1 Sg Tl = Tk,s , l = l(k, s) = N(k − 1) − k(k − 1) + s − k, k < s, l = 1, 2, . . . , M 2 Then P (T1 = k1 , . . . , TM = kM ) =
n!
js ∈{0,1}
A
q(j1 , j2 , . . . , jM )m(j1 ,j2 ,...,jM ) m(j1 , j2 , . . . , jM )!
where A = {m(j1 , j2 , . . . , jM ), js ∈ {0, 1} :
m(j1 , j2 , . . . , jM ) = n;
js ∈{0,1}
m(j1 , j2 , . . . , jM ) = ks , s = 1, 2, . . . , M}
js =1
and q(j1 , j2 , . . . , jM ) =
p(i1 , i2 , . . . , iN )
B
where B={(i1 , i2 , . . . , iN ), is ∈ {−1, 1} : ik =is , if jl = 1; ik =−is , if jl = 0; l = l(k, s)} In particular, for N = 2 one has P (T1 = k1 ) =
n! q(1)k1 q(0)n−k1 k1 !(n − k1 )!
where q(1) = p(−1, −1) + p(1, 1), q(0) = p(1, −1) + p(−1, 1), q(1) + q(0) = 1 First we prove the robustness in the class K(Λ) of threshold graph identification algorithms in sign similarity networks.
56
5 Robustness of Network Structure Identification
Theorem 5.1 Let Λ be positive definite symmetric matrix, μ be a fixed vector. Then for any loss function, simultaneous inference, Holm, Hochberg, and BenjaminiHochberg threshold graph identification procedures in sign similarity network are robust (distribution free) in the class of elliptical distributions K(Λ). Sg
Proof Let γ0 be a given threshold. p-values of individual tests for threshold graph identification are defined by (see Sect. 3.1.5): Sg
Sg
p Sg (i, j ) = 1 − F (Ti,j ; n, γ0 )
(5.10)
Sg
where F (t; n, γ0 ) is the cumulative distribution function for binomial distribution Sg with parameters (n, γ0 ). It follows from the Lemma 5.2 that the joint distribution of pSg (i, j ) does not depend on the function g. All procedures under consideration are based on two operations: ordering of pSg (i, j ) (which is the same as ordering Sg of Ti,j ), and comparison of pSg (i, j ) with the fixed constants (which is the Sg
same as comparison of Ti,j with another fixed constants). For all procedures, the probabilities P (δ(x) = dQ /HS ) are defined by the joint distribution of Sg pSg (i, j ) (or by the joint distribution of Ti,j ). It implies that these probabilities do not depend on g. More precisely, denote by δ S , δ H , δ H g , δ BH , respectively, Sg simultaneous inference, Holm, Hochberg, and Benjamini-Hochberg γ0 – threshold graph identification procedures. Simultaneous inference One has for δ S procedure: P (δ S (x) = dQ /HS ) = PΛ,g (Φ(x) = Q) = Sg
Sg
Sg
Sg
= PΛ,g (Ti.j > ci,j , for qi,j = 1 and Ti.j ≤ ci,j , for qi,j = 0) = = PΛ (Ti.j > ci,j , for qi,j = 1 and Ti.j ≤ ci,j , for qi,j = 0) where Λ ∈ HS . The theorem for the simultaneous inference procedure follows. Holm procedure Let I = {(i, j ) : qi,j = 1, i, j = 1, 2, . . . , N, i < j }, k = |I |. For the Holm procedure one has P (δ H (x) = dQ /HS ) = σ
PΛ,g (Aσ ∩ Bσ ∩ CI ) =
PΛ (Aσ ∩ Bσ ∩ CI )
σ
where σ is the set of all permutations of the set I and
5.2 Robust Network Structure Identification in SignSimilarity Network
57
Aσ = {x ∈ R N ×n : Tσ (1) (x) ≥ Tσ (2) (x) ≥ · · · ≥ Tσ (k) (x)} Sg
Sg
Sg
Bσ = {x ∈ R N ×n : Tσ (1) (x) > c1H , Tσ (2) (x) > c2H , · · · , Tσ (k) (x) > ckH } Sg
Sg
Sg
H } CI = {x ∈ R N ×n : max Ti,j ≤ ck+1 Sg
(i,j )∈I /
Hochberg procedure Let J = {(i, j ) : qi,j = 0, i, j = 1, 2, . . . , N, i < j }, m = |J |. For the Hochberg procedure one has P (δ H g (x) = dQ /HS ) =
PΛ,g (Dσ ∩ Eσ ∩ FJ ) =
σ
PΛ (Dσ ∩ Eσ ∩ FJ )
σ
where σ is the set of all permutations of the set J and Dσ = {x ∈ R N ×n : Tσ (1) (x) ≤ Tσ (2) (x) ≤ · · · ≤ Tσ (m) (x)} Sg
Sg
Sg
Eσ = {x ∈ R N ×n : Tσ (1) (x) < c1 , Tσ (2) (x) < c2 , · · · , Tσ (m) (x) < cm } Sg
Hg
Sg
Hg
Sg
Hg
FJ = {x ∈ R N ×n : min Ti,j ≥ cm+1 } Sg
Hg
(i,j )∈J /
Benjamini-Hochberg procedure Let J = {(i, j ) : qi,j = 0, i, j 1, 2, . . . , N, i < j }, m = |J |. For the Benjamini-Hochberg procedure one has P (δ BH (x) = dQ /HS ) =
PΛ,g (Dσ ∩ Eσ ∩ FJ ) =
σ
PΛ (Dσ ∩ Eσ ∩ FJ )
σ
where σ is the set of all permutations of the set J and Dσ = {x ∈ R N ×n : Tσ (1) (x) ≤ Tσ (2) (x) ≤ · · · ≤ Tσ (m) (x)} Sg
Sg
Sg
BH } Eσ = {x ∈ R N ×n : Tσ (1) (x) < c1BH , Tσ (2) (x) < c2BH , · · · , Tσ (m) (x) < cm Sg
Sg
Sg
BH } FJ = {x ∈ R N ×n : min Ti,j ≥ cm+1 Sg
(i,j )∈J /
The theorem is proved.
=
58
5 Robustness of Network Structure Identification
Second we prove the robustness in the class K(Λ) of concentration graph identification algorithms in sign similarity networks. Theorem 5.2 Let Λ be positive definite symmetric matrix, μ be a fixed vector. Then for any loss function, simultaneous inference, Holm, Hochberg, and BenjaminiHochberg concentration graph identification procedures in sign similarity network are robust (distribution free) in the class of elliptical distributions K(Λ). Proof The concentration graph in sign similarity network is defined as follows: the Sg edge (i, j ) is included in the concentration graph if and only if γi,j = 12 . The pvalues of individual tests for concentration graph identification are defined by (see Sect. 3.2): p (i, j ) = Sg
Sg
2F (Ti,j ; n, 12 ), Sg
Sg
if Ti,j ≤ Sg
2(1 − F (Ti,j ; n, 12 )), if Ti,j >
n 2 n 2
(5.11)
where F (t; n, 12 ) is the cumulative distribution function for binomial distribution with parameters (n, 12 ). It follows from the Lemma 5.2 that the joint distribution of pSg (i, j ) does not depend on the function g. All procedures under consideration are based, as above, on two operations: ordering of pSg (i, j ), and comparison of pSg (i, j ) with the fixed constants. For all procedures, the probabilities P (δ(x) = dQ /HS ) are defined by the joint distribution of pSg (i, j ). It implies that these probabilities don’t depend on g. The theorem follows. Finally, we prove the robustness in the class K(Λ) of Kruskal MST identification algorithm in sign similarity networks. Theorem 5.3 Let Λ be positive definite symmetric matrix, μ be a fixed vector. Then for any loss function, Kruskal MST identification algorithm in sign similarity network is robust (distribution free) in the class of elliptical distributions K(Λ). Sg
Proof The first step of the Kruskal MST identification algorithm is to order γˆi,j in descending order. Each ordering defines a decision dQ , where Q is the adjacency matrix of the associated MST. Probability of any ordering is defined by the joint Sg distribution of the statistics Ti,j . Therefore, the probabilities P (Kruskal(x) = dQ /HS ) do not depend on the function g. It implies that the risk function is stable with variation of distributions in the class K(Λ), and the theorem follows.
5.3 Robust Network Structure Identification in Correlation Networks Robust algorithms of network structure identification in a sign similarity network can be adapted to construct robust identification algorithms in other correlation
5.3 Robust Network Structure Identification in Correlation Networks
59
networks. In this section, we construct robust identification algorithms in Pearson, Kruskal, Fechner, and Kendall correlation networks. It was shown in the Sect. 2.3 that for X ∈ K(Λ), the network models and network structures for Pearson, Kruskal, Fechner, Kendall, and sign similarity networks are related. In particular, one has Sg
Fh Kr Kd γi,j = γi,j = γi,j = 2γi,j − 1,
Sg
Sg
P γi,j = sin[π(γi,j − 0, 5)] = − cos(π γi,j )
(5.12) Therefore, for X ∈ K(Λ), true concentration graph and true MST in all correlation networks are the same as true concentration graph and true MST in sign similarity network. It implies that any robust identification algorithm in sign similarity network generates a robust identification algorithm in any correlation network. The true threshold graph in any correlation network (Pearson, Kruskal, Fechner, and Kendall correlation networks) is a true threshold graph in sign similarity network with an appropriate choice of the threshold. More precisely, the appropriate choices of the thresholds in sign similarity network in connection with given thresholds in other correlations networks are given by Sg
γ0 =
1 1 1 1 1 Sg (1 + γ0F h ) = (1 + γ0Kr ) = (1 + γ0Kd ), γ0 = + arcsin(γ0P ) 2 2 2 2 π
It implies that any robust identification algorithm for the threshold graph identification in sign similarity network generates a robust identification algorithm in any correlation network.
Chapter 6
Optimality of Network Structure Identification
Abstract In this chapter, we discuss optimality of network structure identification algorithms. We introduce a concept of optimality in the sense of minimization of the risk function. We investigate optimality of identification procedures for two problems: concentration graph identification and threshold graph identification. We restrict our study to the risk functions associated with additive losses. We prove optimality of simultaneous inference for concentration graph identification in Gaussian partial correlation network in the class of unbiased procedures. For threshold graph identification, we prove optimality of simultaneous inference in Gaussian Pearson correlation network in the class of statistical procedures invariant under scale/shift transformations. Finally, we prove optimality of simultaneous inference for threshold graph identification in sign similarity network.
6.1 Concept of Optimality In this section, we discuss the concept of optimality related to multiple decision statistical procedures. Let Ω be the set of parameters. By ΩS we denote the parametric region corresponding to hypothesis HS . For all θ ∈ ΩS , the associated network structures have the same adjacency matrix S. Let S = (si,j ), Q = (qi,j ), S, Q ∈ G . By w(S, Q) we denote the loss from decision dQ when hypothesis HS is true, i.e., w(HS ; dQ ) = w(S, Q), S, Q ∈ G . We assume that w(S, S) = 0. Let S ∈ G . The risk function of δ(x) is defined by Risk(S, θ ; δ) =
w(S, Q)Pθ (δ(x) = dQ ), θ ∈ ΩS ,
Q∈G
where Pθ (δ(x) = dQ ) is the probability that decision dQ is taken.
© The Author(s) 2020 V. A. Kalyagin et al., Statistical Analysis of Graph Structures in Random Variable Networks, SpringerBriefs in Optimization, https://doi.org/10.1007/978-3-030-60293-2_6
61
62
6 Optimality of Network Structure Identification
For a given loss function w, the decision procedure δ is called optimal in the class of decision procedures F (see Sect. 4.1) if Risk(S, θ ; δ) ≤ Risk(S, θ ; δ ), θ ∈ ΩS , S ∈ G , δ, δ ∈ F
(6.1)
To find an optimal procedure, one needs to solve a multiobjective optimization problem. It is possible that this problem has no solution in the class F . Therefore, it is important to specify the class of procedures where an optimal procedure exists. In this book, we consider the classes of unbiased and shift-scale invariant procedures and describe an optimal procedure in these classes for some network structure identification problem. In the general case one can be interested in Pareto optimal or admissible procedures.
6.2 W-Unbiasedness for Additive Loss Function The statistical procedure δ(x) is referred to as w-unbiased [62] if Risk(S, θ ; δ) = Eθ w(S; δ) ≤ Eθ w(S ; δ) = Risk(S , θ ; δ),
(6.2)
for any S, S ∈ G , θ ∈ ΩS . The unbiasedness of statistical procedure δ for a general risk function means that δ comes closer in expectation to the true decision than to any other decision. For an additive loss function (see Sect. 4.3) and for the case where ai,j = a, bi,j = b, the w-unbiasedness of the identification procedure δ means that aEθ [YI (S, δ)] + bEθ [YI I (S, δ)] ≤ aEθ [YI (S , δ)] + bEθ [YI I (S , δ)] for any S, S ∈ G , θ ∈ ΩS . Unbiasedness of multiple testing procedure for an additive loss function implies unbiasedness of individual tests. To show this one can take S, S such that S and S differ only in two positions (i, j ) and (j, i). In this case, one has Risk(S, θ ; δ) = 2Risk(si,j , θ ; ϕi,j ) +
Risk(sk,l , θ ; ϕk.l )
(k,l)=(i,j );(k,l)=(j,i)
and: , θ ; ϕi,j ) + Risk(S , θ ; δ) = 2Risk(si,j
(k,l)=(i,j );(k,l)=(j,i)
Risk(sk,l , θ ; ϕk.l )
6.2 W-Unbiasedness for Additive Loss Function
63
Therefore, , θ ; ϕi,j ). Risk(si,j , θ ; ϕi,j ) ≤ Risk(si,j
One has Risk(si,j , θ, ϕi,j ) =
ai,j Pθ (ϕi,j = 1), θ ∈ ωi,j −1 bi,j Pθ (ϕi,j = 0), θ ∈ ωi,j
and Risk(si,j , θ, ϕi,j )
=
bi,j Pθ (ϕi,j = 0), θ ∈ ωi,j −1 ai,j Pθ (ϕi,j = 1), θ ∈ ωi,j
It implies that ai,j Pθ (ϕi,j = 1) ≤ bi,j Pθ (ϕi,j = 0), if θ ∈ ωi,j and: −1 bi,j Pθ (ϕi,j = 0) ≤ ai,j Pθ (ϕi,j = 1), if θ ∈ ωi,j
It implies usual unbiasedness of individual test ϕi,j [62]: −1 Eθ (ϕi,j ) ≤ αi,j , ∀θ ∈ ωi,j ; Eθ (ϕi,j ) ≥ αi,j , ∀θ ∈ ωi,j
where αi,j =
bi,j ai,j + bi,j
For the case ai,j = a, bi,j = b, by an appropriate choice of the matrix S one can obtain the following bounds for expected numbers of Type I and Type II errors for any unbiased procedure EF P = E(YI ) ≤
b b (T N + F P ), EF N = E(YI I ) ≤ (T P + F N) a+b a+b
where (T N + F P ) is the number of off diagonal zeros in the matrix S, T P + F N is the number of ones in S.
64
6 Optimality of Network Structure Identification
6.3 Optimal Concentration Graph Identification in Partial Correlation Network 6.3.1 Gaussian Graphical Model Selection A Gaussian graphical model graphically represents the dependence structure of a Gaussian random vector. For the undirected graphical model, this dependence structure is associated with simple undirected graph. This graph is given by its adjacency matrix, which is symmetric matrix with {0, 1} entries, where zero means conditional independence and one means conditional dependence of random variables. It has been found to be an effective method in different applied fields such as bioinformatics, error-control codes, speech and language recognition, and information retrieval [41, 90]. One of the main questions in graphical models is how to recover the structure of a Gaussian graph from observations and what are statistical properties of associated algorithms. This problem is called the Gaussian graphical model selection problem (GGMS). A comprehensive survey of different approaches to this problem is given in [17, 20]. One of the first approaches to GGMS for undirected graphs was based on covariance selection procedures [16, 21]. GGMS problem is still popular our days [49, 64, 68, 75, 77, 78]. Several selection algorithms are used for large dimensional graphical models. Some approaches to GGMS are related with multiple hypotheses testing [20]. Measures of quality in multiple testing include the FWER (family wise error rate), k-FWER, FDR (false discovery rate), and FDP (false discovery proportion) [62]. Procedures with control of such type errors for GGMS are investigated in [20]. However, general statistical properties such as unbiasedness and optimality of these procedures are not well known. In this section, we combine the tests of a Neyman structure for individual hypotheses with simultaneous inference and prove that the obtained multiple decision procedure is optimal in the class of unbiased procedures.
6.3.2 GGMS Problem Statement Let X = (X1 , X2 , . . . , XN ) be a random vector with the multivariate Gaussian distribution from N(μ, Σ), where μ = (μ1 , μ2 , . . . , μN ) is the vector of means and Σ = (σi,j ) is the covariance matrix, σi,j = cov(Xi , Xj ), i, j = 1, 2, . . . , N . Let x(t), t = 1, 2, . . . , n be a sample of size n from the distribution of X. We assume that n > N, and that the matrix Σ is nonsingular. The case n < N has a practical interest too [64], but it is not considered. The undirected Gaussian graphical model is an undirected graph with N nodes. The nodes of the graph are associated with the random variables X1 , X2 , . . . , XN , edge (i, j ) is included in the graph if the random variables Xi , Xj are conditionally dependent [1, 59].
6.3 Optimal Concentration Graph Identification in Partial Correlation Network
65
The Gaussian graphical model selection problem consists of the identification of a graphical model from observations. The partial correlation ρ i,j of Xi , Xj given Xk , k ∈ N(i, j ) = {1, 2, . . . , N } \ {i, j } is defined as the correlation of Xi , Xj in the conditional distribution of Xi , Xj given Xk , k ∈ N (i, j ). This conditional distribution is Gaussian with the correlation ρ i,j . It implies that the conditional independence of Xi , Xj given Xk , k ∈ N(i, j ) = {1, 2, . . . , N } \ {i, j } is equivalent to the equation ρ i,j = 0. Therefore, the Gaussian graphical model selection is equivalent to simultaneous inference on hypotheses of pairwise conditional independence ρ i,j = 0, i = j , i, j = 1, 2, . . . , N . The problem of pairwise conditional independence testing has the form: hi,j : ρ i,j = 0 vs ki,j : ρ i,j = 0,
i = j, i, j = 1, 2, . . . , N
(6.3)
According to [59], the partial correlation can be written as ρ i,j = − √
σ i,j σ i,i σ j,j
,
where σ i,j are the elements of the inverse matrix Σ −1 = (σ i,j ), known as the concentration or precision matrix of X. Hence, (6.3) is equivalent to hi,j : σ i,j = 0, vs ki,j : σ i,j = 0, i = j, i, j = 1, 2, . . . , N
(6.4)
6.3.3 Uniformly Most Powerful Unbiased Tests of Neyman Structure To construct UMPU test for the problem (6.4), we use a test of Neyman structure for natural parameters of exponential family. Let f (x; θ ) be the density of the exponential family: ⎛ f (x; θ ) = c(θ )exp ⎝
M
⎞ θj Tj (x)⎠ m(x)
(6.5)
j =1
where c(θ ) is a function defined in the parameters space, m(x), Tj (x) are functions defined in the sample space, and Tj (X) are the sufficient statistics for θj , j = 1, . . . , M. Suppose that hypothesis has the form: hj : θj = θj0 vs kj : θj = θj0 , where θj0 is fixed.
(6.6)
66
6 Optimality of Network Structure Identification
The UMPU test for hypotheses (6.6) is (see [62], Ch. 4, theorem 4.4.1): ϕj =
0, if cj (t1 , . . . , tj −1 , tj +1 , . . . , tM ) < tj < cj (t1 , . . . , tj −1 , tj +1 , . . . , tM ) 1, otherwise (6.7)
where ti = Ti (x), i = 1, . . . , M. The constants cj , cj are defined from the equations
cj
cj
f (tj ; θj0 |Ti = ti , i = 1, . . . , M; i = j )dtj = 1 − α
(6.8)
and cj tj f (tj ; θj0 |Ti = ti , i = 1, . . . , M; i = j )dtj + −∞ +∞ + c tj f (tj ; θj0 |Ti = ti , i = 1, . . . , M; i = j )dtj = j +∞ = α −∞ tj f (tj ; θj0 |Ti = ti , i = 1, . . . , M; i = j )dtj
(6.9)
where f (tj ; θj0 |Ti = ti , i = 1, . . . , M; i = j ) is the density of conditional distribution of statistic Tj given Ti = ti , i = 1, 2, . . . , N, i = j , and α is the significance level of the test.
6.3.4 Uniformly Most Powerful Unbiased Test for Conditional Independence
Now we construct the UMPU test for testing the hypothesis of conditional independence (6.4). Consider the statistics

S_{k,l} = (1/n) Σ_{t=1}^{n} (X_k(t) − X̄_k)(X_l(t) − X̄_l),   k, l = 1, 2, . . . , N.

The joint distribution of the statistics S_{k,l}, k, l = 1, 2, . . . , N, n > N, is given by the Wishart density function [1]:

f({s_{k,l}}) = ( [det(σ^{k,l})]^{n/2} × [det(s_{k,l})]^{(n−N−2)/2} × exp[ −(1/2) Σ_k Σ_l s_{k,l} σ^{k,l} ] ) / ( 2^{(Nn/2)} × π^{N(N−1)/4} × Γ(n/2) Γ((n−1)/2) · · · Γ((n−N+1)/2) )

if the matrix S = (s_{k,l}) is positive definite, and f({s_{k,l}}) = 0 otherwise. It implies that the statistics S_{k,l} are sufficient statistics for the natural parameters σ^{k,l}. The Wishart density function can be written as:
f({s_{k,l}}) = C({σ^{k,l}}) exp[ −σ^{i,j} s_{i,j} − (1/2) Σ_{(k,l) ≠ (i,j), (k,l) ≠ (j,i)} s_{k,l} σ^{k,l} ] m({s_{k,l}})

where

C({σ^{k,l}}) = c_1^{−1} [det(σ^{k,l})]^{n/2},
c_1 = 2^{(Nn/2)} × π^{N(N−1)/4} × Γ(n/2) Γ((n−1)/2) · · · Γ((n−N+1)/2),
m({s_{k,l}}) = [det(s_{k,l})]^{(n−N−2)/2}.

According to (6.7), the UMPU test for the hypothesis (6.4) has the form:
ϕ_{i,j}({s_{k,l}}) = { 0, if c_{i,j}^{(1)}({s_{k,l}}) < s_{i,j} < c_{i,j}^{(2)}({s_{k,l}});  1, if s_{i,j} ≤ c_{i,j}^{(1)}({s_{k,l}}) or s_{i,j} ≥ c_{i,j}^{(2)}({s_{k,l}}) }    (6.10)

where c_{i,j}^{(1)}, c_{i,j}^{(2)} depend on {s_{k,l}: (k, l) ≠ (i, j), (j, i)} only, and the critical values c_{i,j}^{(1)}, c_{i,j}^{(2)} are defined from the equations (according to (6.8) and (6.9))
∫_{I ∩ [c_{i,j}^{(1)}, c_{i,j}^{(2)}]} [det(s_{k,l})]^{(n−N−2)/2} ds_{i,j} = (1 − α) ∫_{I} [det(s_{k,l})]^{(n−N−2)/2} ds_{i,j}    (6.11)

∫_{I ∩ (−∞, c_{i,j}^{(1)}]} s_{i,j} [det(s_{k,l})]^{(n−N−2)/2} ds_{i,j} + ∫_{I ∩ [c_{i,j}^{(2)}, +∞)} s_{i,j} [det(s_{k,l})]^{(n−N−2)/2} ds_{i,j} = α ∫_{I} s_{i,j} [det(s_{k,l})]^{(n−N−2)/2} ds_{i,j}    (6.12)

where I is the interval of values of s_{i,j} such that the matrix S = (s_{k,l}) is positive definite and α is the significance level of the test.
Let S = (s_{k,l}) be positive definite (this is true with probability 1 if n > N). Consider det(s_{k,l}) as a function of the variable s_{i,j} only, with the values of all other {s_{k,l}} fixed. This determinant is a quadratic polynomial in s_{i,j}:

det(s_{k,l}) = −a s_{i,j}^2 + b s_{i,j} + c    (6.13)
Let K = (n − N − 2)/2. Denote by x_1, x_2 (x_1 < x_2) the roots of the equation −a x^2 + b x + c = 0. One has

∫_{f}^{d} (a x^2 − b x − c)^K dx = (−1)^K a^K (x_2 − x_1)^{2K+1} ∫_{(f−x_1)/(x_2−x_1)}^{(d−x_1)/(x_2−x_1)} u^K (1 − u)^K du
Therefore, the equation (6.11) takes the form:

∫_{(c^{(1)}−x_1)/(x_2−x_1)}^{(c^{(2)}−x_1)/(x_2−x_1)} u^K (1 − u)^K du = (1 − α) ∫_{0}^{1} u^K (1 − u)^K du    (6.14)

or

( Γ(2K + 2) / (Γ(K + 1) Γ(K + 1)) ) ∫_{(c^{(1)}−x_1)/(x_2−x_1)}^{(c^{(2)}−x_1)/(x_2−x_1)} u^K (1 − u)^K du = 1 − α    (6.15)

It means that the conditional distribution of (S_{i,j} − x_1)/(x_2 − x_1), when all other S_{k,l} are fixed, S_{k,l} = s_{k,l}, is the beta distribution Be(K + 1, K + 1). The beta distribution Be(K + 1, K + 1) is symmetric with respect to the point 1/2. Therefore, the significance level condition (6.11) and the unbiasedness condition (6.12) are satisfied if and only if:

(c^{(1)} − x_1)/(x_2 − x_1) = 1 − (c^{(2)} − x_1)/(x_2 − x_1)

Let q be the α/2-quantile of the beta distribution Be(K + 1, K + 1), i.e., F_{Be}(q) = α/2. Then the thresholds c^{(1)}, c^{(2)} are defined by:

c^{(1)} = x_1 + (x_2 − x_1) q,   c^{(2)} = x_2 − (x_2 − x_1) q    (6.16)

Finally, the UMPU test for testing conditional independence of X_i, X_j has the form

ϕ_{i,j} = { 0, if 2q − 1 < (a s_{i,j} − b/2) / √(b^2/4 + a c) < 1 − 2q;  1, otherwise }    (6.17)

where a, b, c are defined in (6.13).
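To indicate how (6.13)–(6.17) can be evaluated in practice, the sketch below recovers the coefficients a, b, c of the quadratic (6.13) from three evaluations of the determinant and takes q from the quantile function of Be(K + 1, K + 1). It is a minimal sketch assuming Python with numpy and scipy; the function and variable names are illustrative, not from the text.

import numpy as np
from scipy.stats import beta

def umpu_conditional_independence_test(S, n, i, j, alpha):
    # S: sample covariance matrix (N x N), n: sample size with n > N
    N = S.shape[0]

    def det_with_entry(x):
        # det(s_{k,l}) viewed as a function of the (i, j) = (j, i) entry, cf. (6.13)
        M = S.copy()
        M[i, j] = M[j, i] = x
        return np.linalg.det(M)

    c0 = det_with_entry(0.0)
    d_plus, d_minus = det_with_entry(1.0), det_with_entry(-1.0)
    a = c0 - (d_plus + d_minus) / 2.0     # det = -a x^2 + b x + c
    b = (d_plus - d_minus) / 2.0
    c = c0
    K = (n - N - 2) / 2.0
    q = beta.ppf(alpha / 2.0, K + 1, K + 1)   # alpha/2-quantile of Be(K+1, K+1), cf. (6.16)
    t = (a * S[i, j] - b / 2.0) / np.sqrt(b * b / 4.0 + a * c)
    # Test (6.17): accept h_{i,j} (conditional independence) if t lies in (2q - 1, 1 - 2q)
    return 0 if (2.0 * q - 1.0 < t < 1.0 - 2.0 * q) else 1

By Theorem 6.1 below, the statistic t coincides, up to sign, with the sample partial correlation r^{i,j}, so the same decision can be computed from r^{i,j} directly.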
6.3.5 Sample Partial Correlation Test
It is known [59] that the hypothesis σ^{i,j} = 0 is equivalent to the hypothesis ρ^{i,j} = 0, where ρ^{i,j} is the partial correlation between X_i and X_j given X_k, k ∈ N(i, j) = {1, 2, . . . , N} \ {i, j}:

ρ^{i,j} = − σ^{i,j} / √(σ^{i,i} σ^{j,j}) = − Σ^{i,j} / √(Σ^{i,i} Σ^{j,j})

where for a given matrix A = (a_{k,l}) we denote by A^{i,j} the cofactor of the element a_{i,j}. Denote by r^{i,j} the sample partial correlation

r^{i,j} = − S^{i,j} / √(S^{i,i} S^{j,j})

where S^{i,j} is the cofactor of the element s_{i,j} in the matrix S of sample covariances. The well-known sample partial correlation test for testing the hypothesis ρ^{i,j} = 0 has the form [1]:

ϕ_{i,j} = { 0, if |r^{i,j}| ≤ c_{i,j};  1, if |r^{i,j}| > c_{i,j} }    (6.18)

where c_{i,j} is the (1 − α/2)-quantile of the distribution with the density function

f(x) = (1/√π) ( Γ((n − N + 1)/2) / Γ((n − N)/2) ) (1 − x^2)^{(n−N−2)/2},   −1 ≤ x ≤ 1.

Note that in practical applications the following Fisher transformation is applied:

z_{i,j} = (√n / 2) ln( (1 + r^{i,j}) / (1 − r^{i,j}) )

Under the condition ρ^{i,j} = 0 the statistic Z_{i,j} has asymptotically the standard normal distribution. That is why the following test is largely used in applications [19], [20], [18]:

ϕ_{i,j} = { 0, if |z_{i,j}| ≤ c_{i,j};  1, if |z_{i,j}| > c_{i,j} }    (6.19)

where the constant c_{i,j} is the (1 − α/2)-quantile of the standard normal distribution.
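As an illustration of the tests (6.18) and (6.19), the following sketch computes the sample partial correlation r^{i,j} through the inverse of the sample covariance matrix (whose entries are proportional to the cofactors S^{i,j}) and applies the asymptotic Fisher transformation test. It is a minimal sketch assuming Python with numpy and scipy; names are illustrative, and the exact test (6.18) would instead use the (1 − α/2)-quantile of the density given above.

import numpy as np
from scipy.stats import norm

def sample_partial_correlation(S, i, j):
    # r^{i,j} = -S^{i,j} / sqrt(S^{i,i} S^{j,j}); for symmetric S the entries of the
    # inverse are proportional to the cofactors, so the ratio can be taken from S^{-1}
    P = np.linalg.inv(S)
    return -P[i, j] / np.sqrt(P[i, i] * P[j, j])

def fisher_z_test(S, n, i, j, alpha):
    # Asymptotic test (6.19): reject h_{i,j} when |z_{i,j}| exceeds the
    # (1 - alpha/2)-quantile of the standard normal distribution
    r = sample_partial_correlation(S, i, j)
    z = 0.5 * np.sqrt(n) * np.log((1.0 + r) / (1.0 - r))
    return int(abs(z) > norm.ppf(1.0 - alpha / 2.0))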
The main result of this section is the following theorem.
Theorem 6.1 The sample partial correlation test (6.18) is equivalent to the UMPU test (6.17) for testing the hypothesis ρ^{i,j} = 0 vs ρ^{i,j} ≠ 0.
Proof It is sufficient to prove that

(a s_{i,j} − b/2) / √(b^2/4 + a c) = S^{i,j} / √(S^{i,i} S^{j,j})    (6.20)

To verify this equation, we introduce some notations. Let A = (a_{k,l}) be an (N × N) symmetric matrix. Fix i < j, i, j = 1, 2, . . . , N. Denote by A(x) the matrix obtained from A by replacing the elements a_{i,j} and a_{j,i} by x. Denote by A^{i,j}(x) the cofactor of the element (i, j) in the matrix A(x). Then the following statement is true.
Lemma 6.1 One has [det A(x)]′ = −2 A^{i,j}(x).
Proof of the Lemma One has from the general Laplace decomposition of det A(x) by the two rows i and j:

det(A(x)) = det( [a_{i,i}, x; x, a_{j,j}] ) · A_{{i,j},{i,j}} + · · ·

where A_{{i,j},{i,j}} denotes the cofactor complementary to the rows i, j and the columns i, j, and the remaining terms of the decomposition contain x at most linearly.

6.4 Optimal Threshold Graph Identification in Pearson Correlation Network

The test

ϕ^{Corr}_{i,j} = { 1, if (r_{i,j} − ρ_0) / √(1 − r_{i,j}^2) > c_{i,j};  0, if (r_{i,j} − ρ_0) / √(1 − r_{i,j}^2) ≤ c_{i,j} }    (6.28)
is the uniformly most powerful invariant test with respect to the group G for testing the individual hypothesis h_{i,j} defined by (6.23). Here c_{i,j} is chosen to make the significance level of the test equal to the prescribed value α_{i,j}. This means that for any test ϕ_{i,j}′ invariant with respect to the group G with E_{ρ_0} ϕ_{i,j}′ = α_{i,j} one has

P_{ρ_{i,j}}(ϕ^{Corr}_{i,j} = 1) ≥ P_{ρ_{i,j}}(ϕ_{i,j}′ = 1),  ρ_{i,j} > ρ_0
P_{ρ_{i,j}}(ϕ^{Corr}_{i,j} = 1) ≤ P_{ρ_{i,j}}(ϕ_{i,j}′ = 1),  ρ_{i,j} ≤ ρ_0    (6.29)

where P_{ρ_{i,j}}(ϕ_{i,j} = 1) is the probability of rejection of the hypothesis h_{i,j} for a given ρ_{i,j}. The statistical procedure δ(x) is called invariant with respect to the group G if δ(g_{c,d} x) = δ(x) for all g_{c,d} ∈ G. We consider the following class D of statistical procedures δ(x) for network structure identification:
1. Any statistical procedure δ(x) ∈ D is invariant with respect to the group G of shift/scale transformations of the sample space.
2. The risk function of any statistical procedure δ(x) ∈ D is continuous with respect to the parameter.
3. The individual tests ϕ_{i,j} generated by any δ(x) ∈ D depend on the observations x_i(t), x_j(t), t = 1, . . . , n, only.
Define the statistical procedure δ_{Corr}(x) by
δ_{Corr}(x) = d_Q  if  Φ^{Corr}(x) = Q    (6.30)

where

Φ^{Corr}(x) =
( 0                   ϕ^{Corr}_{1,2}(x)   . . .   ϕ^{Corr}_{1,N}(x) )
( ϕ^{Corr}_{2,1}(x)   0                   . . .   ϕ^{Corr}_{2,N}(x) )
( . . .               . . .               . . .   . . .             )
( ϕ^{Corr}_{N,1}(x)   ϕ^{Corr}_{N,2}(x)   . . .   0                 )    (6.31)
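To illustrate the procedure δ_{Corr}, the sketch below applies an individual threshold test to every pair of variables and assembles the adjacency matrix Φ^{Corr}(x). It is a minimal sketch assuming Python with numpy and scipy; for the critical value it uses the standard Fisher z approximation of the distribution of r_{i,j} near ρ_0 rather than the exact constant c_{i,j} of (6.28), so it approximates rather than reproduces the optimal procedure.

import numpy as np
from scipy.stats import norm

def delta_corr(x, rho0, alpha):
    # x: N x n matrix of observations; returns the 0/1 adjacency matrix Phi^Corr(x)
    N, n = x.shape
    r = np.corrcoef(x)                      # sample Pearson correlations
    # Fisher z approximation: z(r_{i,j}) is approximately N(z(rho0), 1/(n-3)) when rho_{i,j} = rho0
    z = np.arctanh(np.clip(r, -0.999999, 0.999999))
    crit = np.arctanh(rho0) + norm.ppf(1.0 - alpha) / np.sqrt(n - 3)
    G = (z > crit).astype(int)
    np.fill_diagonal(G, 0)
    return G

For α_{i,j} = 0.5 the normal quantile is zero and the rule reduces to including the edge (i, j) exactly when r_{i,j} > ρ_0, which is the common practice discussed after Theorem 6.3 below.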
The main result of this section is given in the following theorem.
Theorem 6.3 Let (X, γ^P) be a Gaussian Pearson correlation network, let the loss function be additive, and let the losses a_{i,j}, b_{i,j} of the individual tests ϕ_{i,j} for testing the hypotheses h_{i,j} be connected with the significance levels α_{i,j} of the tests by a_{i,j} = 1 − α_{i,j}, b_{i,j} = α_{i,j}. Then the statistical procedure δ_{Corr} defined by (6.30) and (6.31) is the optimal multiple decision statistical procedure in the class D.
Proof The statistical tests ϕ^{Corr} defined by (6.28) are invariant with respect to the group G of shift/scale transformations of the sample space. It implies that the multiple decision statistical procedure δ_{Corr} is invariant with respect to the group G. The statistical tests ϕ^{Corr} depend on x_i(t), x_j(t), t = 1, . . . , n, only. According to [55, 60], the risk function of the statistical procedure δ_{Corr} for an additive loss function can be written as

R(S, (ρ_{i,j}), δ_{Corr}) = Σ_{i,j: i ≠ j} r(ρ_{i,j}, ϕ^{Corr}_{i,j})    (6.32)

where r(ρ_{i,j}, ϕ^{Corr}_{i,j}) is the risk function of the individual test ϕ^{Corr}_{i,j} for the individual hypothesis h_{i,j}, and (ρ_{i,j}) is the matrix of correlations. One has

r(ρ_{i,j}, ϕ^{Corr}_{i,j}) = { (1 − α_{i,j}) E_{ρ_{i,j}} ϕ^{Corr}_{i,j},  if ρ_{i,j} ≤ ρ_0;   α_{i,j} (1 − E_{ρ_{i,j}} ϕ^{Corr}_{i,j}),  if ρ_{i,j} > ρ_0 }

Since E_{ρ_{i,j}} ϕ^{Corr}_{i,j} = α_{i,j} if ρ_{i,j} = ρ_0, the function r(ρ_{i,j}, ϕ^{Corr}_{i,j}) is continuous as a function of ρ_{i,j}. Therefore, the multiple decision statistical procedure δ_{Corr} belongs to the class D.
Let δ′ ∈ D be another statistical procedure for threshold graph identification. Then ϕ_{i,j}′(x) depends on x_i = (x_i(1), . . . , x_i(n)), x_j = (x_j(1), . . . , x_j(n)) only. A statistical procedure δ′ ∈ D is invariant with respect to the group G if and only if the associated individual tests ϕ_{i,j}′(x) are invariant with respect to the group G for all i, j = 1, . . . , N, i ≠ j. One has for an additive loss function (see [55, 60]):
R(S, θ, δ′) = Σ_{i,j} r(θ, ϕ_{i,j}′)    (6.33)

where

r(θ, ϕ_{i,j}′) = { (1 − α_{i,j}) E_θ ϕ_{i,j}′,  if ρ_{i,j} ≤ ρ_0;   α_{i,j} (1 − E_θ ϕ_{i,j}′),  if ρ_{i,j} > ρ_0 }

Since the tests ϕ_{i,j}′ are invariant with respect to the group G, the distributions of the tests ϕ_{i,j}′ depend on ρ_{i,j} only [62]. It implies

r(θ, ϕ_{i,j}′) = r(ρ_{i,j}, ϕ_{i,j}′)

and

R(S, θ, δ′) = R(S, (ρ_{i,j}), δ′)

Risk functions of statistical procedures from the class D are continuous with respect to the parameter, and one gets E_{ρ_0} ϕ_{i,j}′ = α_{i,j}. It means that the test ϕ_{i,j}′ has significance level α_{i,j}. The test ϕ^{Corr}_{i,j} is the UMP invariant test of the significance level α_{i,j}; therefore, one has

r(ρ_{i,j}, ϕ^{Corr}_{i,j}) ≤ r(ρ_{i,j}, ϕ_{i,j}′),   i, j = 1, . . . , N.
Then

R(S, (ρ_{i,j}), δ_{Corr}) ≤ R(S, (ρ_{i,j}), δ′),  ∀ S ∈ G, ∀ δ′ ∈ D, ∀ (ρ_{i,j}).

The theorem is proved.
Note that in many publications on market network analysis the edge (i, j) is included in the market graph if r_{i,j} > ρ_0, and it is not included in the market graph if r_{i,j} ≤ ρ_0. This statistical procedure corresponds to the statistical procedure δ_{Corr}(x) with α_{i,j} = 0.5, i, j = 1, 2, . . . , N, i ≠ j. Therefore, this procedure is optimal in the class D for the risk function equal to the sum of the expected number of false edge inclusions and the expected number of false edge exclusions. For the case ρ_0 = 0, the statistical procedure δ_{Corr}(x) is in addition optimal in the class of unbiased statistical procedures for the same additive loss function. This follows from the optimality of the tests ϕ^{Corr}(x) in the class of unbiased tests and some results from [54, 60].
The class D is defined by three conditions. All of them are important in the proof of Theorem 6.3 and cannot be removed. The condition that the risk function is continuous with respect to the parameter cannot be removed, because if we consider the class D without this condition, the statistical procedure δ_{Corr} with
significance levels of the individual tests α_{i,j} is no longer optimal in this larger class. A counterexample is given by any statistical procedure of the same type as δ_{Corr}, but with different significance levels of the individual tests. Note that all these statistical procedures have a discontinuous risk function for the losses a_{i,j} = 1 − α_{i,j}, b_{i,j} = α_{i,j}. The condition that the individual tests ϕ_{i,j} depend on the observations x_i(t), x_j(t) only also cannot be removed. A counterexample is given by Holm-type step-down procedures, as can be shown by numerical experiments.
6.5 Optimal Threshold Graph Identification in Sign Similarity Network
In this section, we study optimal statistical procedures for threshold graph identification in a sign similarity (sign correlation based) network. Our construction of optimal procedures is based on the simultaneous inference of optimal two-decision procedures. In the sign similarity network model, the weight of the edge (i, j) is defined by

γ^{Sg}_{i,j} = p_{i,j} = P((X_i − E(X_i))(X_j − E(X_j)) > 0)    (6.34)

For a given threshold γ^{Sg}_0 = p_0, the true threshold graph in the sign similarity network model is constructed as follows: the edge between two nodes i and j is included in the true threshold graph if p_{i,j} > p_0, where p_{i,j} is the probability of the sign coincidence of the random variables associated with the nodes i and j. In practice we are given a sample of observations x(1), x(2), . . . , x(n) from X, which are modeled by a family of random vectors

X(t) = (X_1(t), X_2(t), . . . , X_N(t)),   t = 1, 2, . . . , n,

where n is the number of observations (sample size) and the vectors X(t) are independent and identically distributed as X = (X_1, X_2, . . . , X_N). Henceforth we assume that the expectations E(X_i), i = 1, 2, . . . , N, are known. We put (for simplicity) E(X_i) = 0, i = 1, 2, . . . , N. In this case

p_{i,j} = P(X_i X_j > 0),   i, j = 1, 2, . . . , N    (6.35)

Define the N × n matrix x = (x_i(t)). Consider the set G of all N × N symmetric matrices G = (g_{i,j}) with g_{i,j} ∈ {0, 1}, i, j = 1, 2, . . . , N, g_{i,i} = 0, i = 1, 2, . . . , N. Any individual edge test can be reduced to a hypothesis testing problem:

h_{i,j}: p_{i,j} ≤ p_0 vs k_{i,j}: p_{i,j} > p_0    (6.36)
Let ϕ_{i,j}(x) be a test for the individual hypothesis (6.36). More precisely, ϕ_{i,j}(x) = 1 means that the hypothesis h_{i,j} is rejected (the edge (i, j) is included in the threshold graph), and ϕ_{i,j}(x) = 0 means that h_{i,j} is accepted (the edge (i, j) is not included in the threshold graph). Let Φ(x) be the matrix Φ(x) = (ϕ_{i,j}(x))_{i,j=1,...,N}. The multiple decision statistical procedure δ(x) based on the simultaneous inference of the individual edge tests (6.36) can be written as

δ(x) = d_G,  if Φ(x) = G    (6.37)
Let

p^{i,j}_{1,1} = P(X_i > 0, X_j > 0),   p^{i,j}_{1,0} = P(X_i > 0, X_j ≤ 0),
p^{i,j}_{0,1} = P(X_i ≤ 0, X_j > 0),   p^{i,j}_{0,0} = P(X_i ≤ 0, X_j ≤ 0).

One has p_{i,j} = p^{i,j}_{0,0} + p^{i,j}_{1,1}. Define

u_k(t) = { 0, if x_k(t) ≤ 0;  1, if x_k(t) > 0 },   k = 1, 2, . . . , N.

Let us introduce the statistics

T^{i,j}_{1,1} = Σ_{t=1}^{n} u_i(t) u_j(t);   T^{i,j}_{0,0} = Σ_{t=1}^{n} (1 − u_i(t))(1 − u_j(t));
T^{i,j}_{0,1} = Σ_{t=1}^{n} (1 − u_i(t)) u_j(t);   T^{i,j}_{1,0} = Σ_{t=1}^{n} u_i(t)(1 − u_j(t));
V_{i,j} = T^{i,j}_{1,1} + T^{i,j}_{0,0}    (6.38)
To construct a multiple decision procedure, we use the following individual edge tests:

ϕ^{Sg}_{i,j}(x_i, x_j) = { 0, if V_{i,j} ≤ c_{i,j};  1, if V_{i,j} > c_{i,j} }    (6.39)

where, for a given significance level α_{i,j}, the constant c_{i,j} is defined as the smallest integer such that:

Σ_{k=c_{i,j}}^{n} ( n! / (k! (n − k)!) ) (p_0)^k (1 − p_0)^{n−k} ≤ α_{i,j}    (6.40)
Let Φ^{Sg}(x) be the matrix

Φ^{Sg}(x) =
( 1                  ϕ^{Sg}_{1,2}(x)   . . .   ϕ^{Sg}_{1,N}(x) )
( ϕ^{Sg}_{2,1}(x)    1                 . . .   ϕ^{Sg}_{2,N}(x) )
( . . .              . . .             . . .   . . .           )
( ϕ^{Sg}_{N,1}(x)    ϕ^{Sg}_{N,2}(x)   . . .   1               ),    (6.41)
where ϕ^{Sg}_{i,j}(x) is defined by (6.39) and (6.40). Now we can define our multiple decision statistical procedure for the threshold graph identification in the sign similarity network by

δ^{Sg}(x) = d_G,  if Φ^{Sg}(x) = G    (6.42)
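A minimal sketch of the procedure δ^{Sg}, assuming Python with numpy and scipy (function and variable names are illustrative): the statistic V_{i,j} counts sign coincidences as in (6.38), and the critical value c_{i,j} is the smallest integer satisfying (6.40).

import numpy as np
from scipy.stats import binom

def sign_similarity_graph(x, p0, alpha):
    # x: N x n matrix of observations with E(X_i) = 0 assumed; returns the adjacency matrix G
    N, n = x.shape
    u = (x > 0).astype(int)
    # V[i, j] = T_{1,1}^{i,j} + T_{0,0}^{i,j}, cf. (6.38)
    V = u @ u.T + (1 - u) @ (1 - u).T
    # Smallest integer c with P(Bin(n, p0) >= c) <= alpha, condition (6.40)
    c = int(np.ceil(binom.ppf(1.0 - alpha, n, p0)))
    while binom.sf(c - 1, n, p0) > alpha:   # sf(c - 1) = P(Bin(n, p0) >= c)
        c += 1
    G = (V > c).astype(int)                 # individual tests (6.39)
    np.fill_diagonal(G, 0)
    return G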
The main result of this section is the following theorem.
Theorem 6.4 Let the loss function w be additive, let the individual test ϕ_{i,j} depend only on u_i(t), u_j(t) for any i, j, and let the following symmetry conditions be satisfied:

p^{i,j}_{1,1} = p^{i,j}_{0,0},   p^{i,j}_{1,0} = p^{i,j}_{0,1},   ∀ i, j = 1, 2, . . . , N    (6.43)

Then, for the statistical procedure δ^{Sg} defined by (6.39), (6.40), (6.41), and (6.42) for the threshold graph identification in the sign similarity network, one has Risk(S, δ^{Sg}) ≤ Risk(S, δ) for any adjacency matrix S and any w-unbiased statistical procedure δ.
Proof We prove optimality in three steps.
First, we prove that under the symmetry conditions (6.43), each individual test (6.39) is uniformly most powerful (UMP) in the class of tests based on u_i(t), u_j(t) only for the individual hypothesis testing problem

h_{i,j}: p_{i,j} ≤ p_0 vs k_{i,j}: p_{i,j} > p_0    (6.44)
By symmetry, the individual hypothesis (6.44) can be written as:

h_{i,j}: p^{i,j}_{0,0} ≤ p_0/2 vs k_{i,j}: p^{i,j}_{0,0} > p_0/2    (6.45)

For simplicity of notation, let p_{0,0} = p^{i,j}_{0,0}; p_{0,1} = p^{i,j}_{0,1}; p_{1,0} = p^{i,j}_{1,0}; p_{1,1} = p^{i,j}_{1,1}; T_{0,0} = T^{i,j}_{0,0}; T_{0,1} = T^{i,j}_{0,1}; T_{1,0} = T^{i,j}_{1,0}; T_{1,1} = T^{i,j}_{1,1}. One has

T_{1,1} + T_{1,0} + T_{0,1} + T_{0,0} = n.

Symmetry implies p_{0,0} + p_{1,0} = 1/2.
Let t_{1,1}, t_{1,0}, t_{0,1}, t_{0,0} be nonnegative integers with t_{1,1} + t_{1,0} + t_{0,1} + t_{0,0} = n and C = n!/(t_{1,1}! t_{1,0}! t_{0,1}! t_{0,0}!). One has
P(T_{1,1} = t_{1,1}; T_{1,0} = t_{1,0}; T_{0,1} = t_{0,1}; T_{0,0} = t_{0,0}) = C p_{1,1}^{t_{1,1}} p_{1,0}^{t_{1,0}} p_{0,1}^{t_{0,1}} p_{0,0}^{t_{0,0}} = C p_{0,0}^{t_{1,1}+t_{0,0}} p_{1,0}^{t_{1,0}+t_{0,1}} = C_1 exp{ (t_{1,1} + t_{0,0}) ln( p_{0,0} / (1/2 − p_{0,0}) ) }

where C_1 = C (1/2 − p_{0,0})^n. Then, the hypotheses (6.45) are equivalent to the hypotheses:

h_{i,j}: ln( p_{0,0} / (1/2 − p_{0,0}) ) ≤ ln( p_0 / (1 − p_0) ) vs k_{i,j}: ln( p_{0,0} / (1/2 − p_{0,0}) ) > ln( p_0 / (1 − p_0) )    (6.46)
For p_{0,0} = p_0/2, the random variable V = T_{1,1} + T_{0,0} has the binomial distribution B(n, p_0). Therefore, the critical value for the test (6.39) is defined from (6.40). According to ([62], Ch. 3, Corollary 3.4.1), the test (6.39) is uniformly most powerful (UMP) at the level α_{i,j} for the hypothesis testing problem (6.46).
Second, we prove that the statistical procedure (6.42) is w-unbiased. For any two-decision test for the hypothesis testing problem (6.44), the risk function can be written as:

R(s_{i,j}, ϕ_{i,j}) = { a_{i,j} P(ϕ_{i,j}(x) = 1 | p_{i,j}),  if s_{i,j} = 0 (p_{i,j} ≤ p_0);   b_{i,j} P(ϕ_{i,j}(x) = 0 | p_{i,j}),  if s_{i,j} = 1 (p_{i,j} > p_0) }

One has

a_{i,j} P(ϕ^{Sg}_{i,j}(x) = 1 | p_{i,j}) ≤ b_{i,j} P(ϕ^{Sg}_{i,j}(x) = 0 | p_{i,j}),  if p_{i,j} ≤ p_0
a_{i,j} P(ϕ^{Sg}_{i,j}(x) = 1 | p_{i,j}) ≥ b_{i,j} P(ϕ^{Sg}_{i,j}(x) = 0 | p_{i,j}),  if p_{i,j} > p_0

which is equivalent to

R(s_{i,j}, ϕ^{Sg}_{i,j}) ≤ R(s′_{i,j}, ϕ^{Sg}_{i,j}),  ∀ s_{i,j}, s′_{i,j}    (6.47)

This relation implies

P(ϕ^{Sg}_{i,j}(x) = 1 | p_{i,j} = p_0) = α_{i,j} = b_{i,j} / (a_{i,j} + b_{i,j}),

i.e., the test ϕ^{Sg}_{i,j} has significance level α_{i,j} = b_{i,j}/(a_{i,j} + b_{i,j}). For the loss function (4.6) and any multiple decision statistical procedure δ, one has

R(H_S, δ) = Σ_{Q ∈ G} ( Σ_{i,j: s_{i,j}=0; q_{i,j}=1} a_{i,j} + Σ_{i,j: s_{i,j}=1; q_{i,j}=0} b_{i,j} ) P(x ∈ D_Q | H_S) = Σ_{i,j: s_{i,j}=0} a_{i,j} P(ϕ_{i,j}(x) = 1 | H_S) + Σ_{i,j: s_{i,j}=1} b_{i,j} P(ϕ_{i,j}(x) = 0 | H_S)    (6.48)
Therefore:

R(H_S, δ) = Σ_{i=1}^{N} Σ_{j=1}^{N} R(s_{i,j}; ϕ_{i,j})    (6.49)

From (6.47), one has

Σ_{Q ∈ G} w(S, Q) P(δ^{Sg}(x) = d_Q | H_S) ≤ Σ_{Q ∈ G} w(S′, Q) P(δ^{Sg}(x) = d_Q | H_S),  ∀ S, S′ ∈ G    (6.50)

Thus, the multiple testing statistical procedure δ^{Sg} is unbiased.
Third, we prove that the procedure (6.42) is optimal in the class of unbiased statistical procedures for the threshold graph identification in the sign similarity network. Let δ(x) be another unbiased statistical procedure for the threshold graph identification in the sign similarity network. Then δ(x) generates a partition of the sample space R^{N×n} into L = 2^M, M = N(N − 1)/2, parts:

D_G = {x ∈ R^{N×n}: δ(x) = d_G};   ∪_{G ∈ G} D_G = R^{N×n}

Define

A_{i,j} = ∪_{G: g_{i,j}=0} D_G,   A′_{i,j} = ∪_{G: g_{i,j}=1} D_G    (6.51)

and

ϕ_{i,j}(x) = { 0, if x ∈ A_{i,j};  1, if x ∉ A_{i,j} }    (6.52)

The tests (6.52) are tests for the individual hypothesis testing problems (6.44). Since the procedure δ(x) is unbiased, one has

Σ_{Q ∈ G} w(S, Q) P(δ(x) = d_Q | H_S) ≤ Σ_{Q ∈ G} w(S′, Q) P(δ(x) = d_Q | H_S),  ∀ S, S′ ∈ G

Consider the hypotheses H_S and H_{S′} which differ only in the two components s_{i,j} ≠ s′_{i,j}, s_{j,i} ≠ s′_{j,i}. Taking into account the unbiasedness of the procedure δ and the structure of the loss function (4.6), one has R(s_{i,j}, ϕ_{i,j}) ≤ R(s′_{i,j}, ϕ_{i,j}). This means that the two-decision tests (6.52) are unbiased. Therefore,

P(ϕ_{i,j} = 1 | p_0) = α_{i,j} = b_{i,j} / (a_{i,j} + b_{i,j}).
Since we are restricted to the tests based only on u_i(t), u_j(t), and the test ϕ^{Sg}_{i,j} is UMP among the tests of this class at the significance level α_{i,j}, for any test ϕ_{i,j} based only on u_i(t), u_j(t) one has:

R(s_{i,j}, ϕ^{Sg}_{i,j}) ≤ R(s_{i,j}, ϕ_{i,j})

From (6.49), one has R(H_S, δ^{Sg}) ≤ R(H_S, δ) for any adjacency matrix S. The theorem is proved.
Note: In the proof of the theorem, we restricted ourselves to nonrandomized tests only. The generalization to the general case can easily be obtained.
Chapter 7
Applications to Market Network Analysis
Abstract In this chapter, we use the general approach developed in the previous chapters to analyze the uncertainty of market network structure identification. The results of Chap. 4 are illustrated by numerical experiments using data from different stock markets. We study the uncertainty of identification algorithms for the following market network structures: the market graph, maximum cliques and maximum independent sets in the market graph, the maximum spanning tree, and the planar maximally filtered graph. Uncertainties of identification of different network structures are compared on the basis of the risk function for the associated multiple decision statistical procedures.
7.1 Market Network Analysis
Network models of financial markets have attracted attention during the last decades [4–7, 39, 66, 87]. A common network representation of the stock market is based on Pearson correlations of stock returns. In such a representation, each stock corresponds to a vertex, and the weight of the edge between two vertices is estimated by the sample Pearson correlation of the corresponding returns. In our setting, the obtained complete weighted graph corresponds to a sample Pearson correlation network. In order to simplify the network and preserve the key information, different filtering techniques are used in the literature. One of the filtering procedures is the extraction of a minimal set of important links associated with the highest degree of similarity, belonging to the maximum spanning tree (MST) [66]. The MST was used to find a topological arrangement of stocks traded in a financial market which is associated with a meaningful economic taxonomy. This topology is useful in the theoretical description of financial markets and in the search for economic common factors affecting specific groups of stocks. The topology and the hierarchical structure associated with it are obtained by using only the information present in the time series of stock prices.
The reduction to a minimal skeleton of links leads to a loss of valuable information. To overcome this issue, it was proposed in [87] to extend the MST by iteratively connecting the most similar nodes until the graph can be embedded on a surface of a given genus k. For example, for k = 0 the resulting graph is planar; it is called the Planar Maximally Filtered Graph (PMFG). It was concluded in [87] that the method is quite efficient in filtering relevant information about the connection structure both of the whole system and within the obtained clusters. Another filtering procedure leads to the concept of the market graph [4–6]. A market graph (MG) is obtained from the original network by removing all edges with weights less than a specified threshold γ_0 ∈ [−1, 1]. Maximum clique (MC) and maximum independent set (MIS) analysis of the market graph was used to obtain valuable knowledge about the structure of the stock market. For example, it was noted in [89] that the peculiarity of the Russian market is reflected by the strong connection between the volume of stocks and the structure of maximum cliques. The core set of stocks of the maximum cliques for the Russian market is composed of the most valuable stocks. These stocks account for more than 90% of the market value and represent the largest Russian international companies from the banking and natural resource sectors. In contrast, the core set of stocks of the maximum cliques for the US stock market has a different structure, without connection to the stock values. Today, network analysis of financial markets is a very active area of investigation, and various directions are being developed in order to obtain valuable information for different stock markets [67]. Most publications are related to numerical algorithms and economic interpretations of the obtained results. Much less attention is paid to the uncertainty of the obtained results generated by the stochastic nature of the market. In this chapter, we apply the results of Chap. 4.
Let N be the number of stocks and n the number of days of observations. In our study, financial instruments are characterized by the daily returns of the stocks. The return of stock k for day t is defined as
R_k(t) = ln( P_k(t) / P_k(t − 1) ),    (7.1)

where P_k(t) is the adjusted closing price of stock k on day t. We assume that for fixed k, R_k(t), t = 1, . . . , n, are independent random variables with the same distribution as R_k (i.i.d.), and that the random vector R = (R_1, . . . , R_N) has a multivariate distribution with correlation matrix

Γ = (ρ_{i,j}) =
( ρ_{1,1}  · · ·  ρ_{1,N} )
( · · ·    · · ·  · · ·   )
( ρ_{N,1}  · · ·  ρ_{N,N} ).    (7.2)
For this model, we introduce the reference (true) network model, which is a complete weighted graph with N nodes and weight matrix Γ = (ρi,j ). For the reference network, one can consider corresponding reference structures, for example, reference MST, reference PMFG, reference market graph, and others.
Let r_k(t), k = 1, . . . , N, t = 1, . . . , n, be the observed values of returns. Define the sample covariance

s_{i,j} = (1/(n − 1)) Σ_{t=1}^{n} (r_i(t) − r̄_i)(r_j(t) − r̄_j),

and the sample correlation

r_{i,j} = s_{i,j} / √(s_{i,i} s_{j,j}),

where r̄_i = (1/n) Σ_{t=1}^{n} r_i(t). Using the sample correlations, we introduce the sample network, which is a complete weighted graph with N nodes and weight matrix Γ̂ = (r_{i,j}). For the sample network, one can consider the corresponding sample structures, for example, the sample MST, the sample PMFG, the sample market graph, and others.
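As an illustration, the sketch below computes the log returns (7.1), the sample correlation matrix, and, as one example of a sample structure, the maximum spanning tree of the sample network. It is a minimal sketch assuming Python with numpy and networkx; the layout of the hypothetical price array (days in rows, stocks in columns) is an assumption, not taken from the text.

import numpy as np
import networkx as nx

def log_returns(prices):
    # prices: (n + 1) x N array of adjusted closing prices; returns n x N log returns, cf. (7.1)
    return np.diff(np.log(prices), axis=0)

def sample_mst(returns):
    # Sample Pearson correlation network and its maximum spanning tree
    r = np.corrcoef(returns, rowvar=False)      # N x N sample correlation matrix
    N = r.shape[0]
    G = nx.Graph()
    for i in range(N):
        for j in range(i + 1, N):
            G.add_edge(i, j, weight=r[i, j])
    return nx.maximum_spanning_tree(G)          # sample MST of the sample network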
7.2 Measures of Uncertainty
To handle statistical uncertainty, we propose to compare the sample network with the reference network. Our comparison will be based on the risk function connected with the loss function developed in Chap. 4. For a given structure S, we introduce a set of hypotheses:
• h_{i,j}: the edge between vertices i and j is not included in the reference structure S;
• k_{i,j}: the edge between vertices i and j is included in the reference structure S.
To measure the losses, we consider two types of errors:
Type I error or False Positive error: an edge is included in the sample structure when it is absent in the reference structure;
Type II error or False Negative error: an edge is not included in the sample structure when it is present in the reference structure.
Let a_{i,j} be the loss associated with the error of the first kind and b_{i,j} the loss associated with the error of the second kind for the edge (i, j). According to Chap. 4, we define the risk for the additive loss function for a given structure S, a given identification procedure δ, and a given number of observations n as

R(S; δ, n) = Σ_{1 ≤ i < j ≤ N} [ a_{i,j} P_n(d_{k_{i,j}} | h_{i,j}) + b_{i,j} P_n(d_{h_{i,j}} | k_{i,j}) ]    (7.3)
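The risk (7.3) can be estimated by simulation: for a known reference correlation matrix one repeatedly generates n observations, identifies the sample structure, and counts false edge inclusions and exclusions. The following minimal sketch uses unit losses a_{i,j} = b_{i,j} = 1 and assumes Python with numpy; the identify argument is a hypothetical user-supplied function (for example, one returning the edge set of the sample MST built as in the sketch above), and reference_edges is the edge set of the corresponding reference structure.

import numpy as np

def estimate_risk(reference_edges, identify, Gamma, n, n_rep=1000, seed=0):
    # Monte Carlo estimate of (7.3) with a_{i,j} = b_{i,j} = 1
    # reference_edges: set of edges (i, j), i < j, of the reference structure
    # identify: function mapping an n x N sample of returns to a set of edges
    # Gamma: reference correlation matrix used to simulate Gaussian returns
    rng = np.random.default_rng(seed)
    N = Gamma.shape[0]
    total = 0.0
    for _ in range(n_rep):
        sample = rng.multivariate_normal(np.zeros(N), Gamma, size=n)
        edges = identify(sample)
        false_pos = len(edges - reference_edges)      # Type I errors (false edge inclusions)
        false_neg = len(reference_edges - edges)      # Type II errors (false edge exclusions)
        total += false_pos + false_neg
    return total / n_rep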