Graphs: Theory and Algorithms 1774077019, 9781774077016

The book Graphs: Theory and Algorithms is a collection of modern articles featuring several graph-based methods and algorithms.


English Pages 277 [394] Year 2020


Table of contents :
Cover
Title Page
Copyright
DECLARATION
ABOUT THE EDITOR
TABLE OF CONTENTS
List of Contributors
List of Abbreviations
Preface
Chapter 1 Introduction
Introduction
References
Chapter 2 An Image Encryption Algorithm Based on Random Hamiltonian Path
Abstract
Introduction
Hamiltonian Path
Adjusted Bernoulli Map
Proposed Scheme
Discussion
Simulation Experiments
Histograms
Conclusions
Acknowledgments
Conflicts Of Interest
References
Chapter 3 Traveling in Networks with Blinking Nodes
Abstract
Introduction
Traveling In Complete Graphs
Traveling In Complete Bipartite Graphs
Open Problems
References
Chapter 4 On Minimum Spanning Subgraphs of Graphs with Proper Connection Number 2
Abstract
Introduction
Complete Bipartite Graphs
Complete Multipartite Graphs
References
Chapter 5 New Algorithm for Calculating Chromatic Index of Graphs and its Applications
Abstract
Introduction
The Main Results
Conclusion
Acknowledgements
References
Chapter 6 An Edge-Swap Heuristic for Finding Dense Spanning Trees
Abstract
Introduction
Preliminaries
The Edge-Swap Heuristic
Computational Results
Further Improvements
Complexity Analysis
References
Chapter 7 Identifying Network Structure Similarity using Spectral Graph Theory
Abstract
Introduction
Background
Methodology
Results And Analysis
Conclusions
Future Direction
References
Chapter 8 On Generalized Distance Gaussian Estrada Index of Graphs
Abstract
Introduction
Motivation
Bounds For Generalized Distance Gaussian Estrada Index
Examples For Some Fundamental Special Graphs
Conclusions
Acknowledgments
Conflicts Of Interest
References
Chapter 9 Nullity and Energy Bounds of Central Graph of Smith Graphs
Abstract
Introduction
Literature Review
Preposition
Nullity Of Central Graph Of Smith Graphs
Conclusions
References
Chapter 10 Induced Subgraph Saturated Graphs
Abstract
Introduction
Paths
Cycles
Claws
Future Work
References
Chapter 11 Connection and Separation in Hyper Graphs
Abstract
Introduction
Fundamental Concepts
Connection In Hyper Graphs
Conclusion
Acknowledgement
References
Chapter 12 Vertex Rough Graphs
Abstract
Introduction
Preliminaries
Vertex Rough Graph
Rough Properties Of Rough Graph
Conclusion
Acknowledgements
References
Chapter 13 Incremental Graph Pattern Matching Algorithm for Big Graph Data
Abstract
Introduction
Related Work
Model And Definition
Experiments And Results Analysis
Conclusion
Notations
Conflicts Of Interest
References
Chapter 14 Framework And Algorithms For Identifying Honest Blocks In Block Chain
Abstract
Introduction
Honest Block Identification Problem
Results
Conclusions And Discussions
Funding Statement
References
Chapter 15 Enabling Controlling Complex Networks with Local Topological Information
Abstract
Introduction
Minimizing The Number Of Driver Nodes Through Local-Game
Matching
Minimization Of The Cost Control
Discussions And Conclusion
Acknowledgements
References
Chapter 16 Estimation Of Traffic Flow Changes Using Networks in Networks Approaches
Abstract
Introduction
Background
Methodology
Application
Results And Discussion
Discussion
Conclusions
Acknowledgements
Funding
References
Chapter 17 Hidden Geometries In Networks Arising From Cooperative Self-Assembly
Abstract
Introduction
Results And Discussion
Discussion
Methods
Acknowledgements
References
Chapter 18 Using Graph Theory to Assess the Interaction between Cerebral Function, Brain Hemodynamics, and Systemic Variables in Premature Infants
Abstract
Introduction
Dataset
Methods
Results
Discussion
Conclusions
Data Availability
Disclosure
Conflicts Of Interest
Acknowledgments
References
Index
Back Cover

Graphs: Theory and Algorithms

Edited by: Olga Moreira

Arcler Press
www.arclerpress.com

Graphs: Theory and Algorithms Olga Moreira

Arcler Press 224 Shoreacres Road Burlington, ON L7L 2H2 Canada www.arclerpress.com Email: [email protected]

e-book Edition 2021
ISBN: 978-1-77407-902-7 (e-book)

This book contains information obtained from highly regarded resources. Reprinted material sources are indicated. Copyright for individual articles remains with the authors as indicated and published under a Creative Commons License. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data, and the views articulated in the chapters are those of the individual contributors, and not necessarily those of the editors or publishers. Editors or publishers are not responsible for the accuracy of the information in the published chapters or the consequences of their use. The publisher assumes no responsibility for any damage or grievance to persons or property arising out of the use of any materials, instructions, methods or thoughts in the book. The editors and the publisher have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission has not been obtained. If any copyright holder has not been acknowledged, please write to us so we may rectify it. Notice: Registered trademarks of products or corporate names are used only for explanation and identification, without intent of infringement.

© 2021 Arcler Press
ISBN: 978-1-77407-701-6 (Hardcover)

Arcler Press publishes a wide variety of books and eBooks. For more information about Arcler Press and its products, visit our website at www.arclerpress.com

DECLARATION

Some content or chapters in this book are open access, copyright-free published research works, which are published under a Creative Commons License and are indicated with their citations. We are thankful to the publishers and authors of the content and chapters, as without them this book would not have been possible.

ABOUT THE EDITOR

Olga Moreira obtained her Ph.D. in Astrophysics from the University of Liege (Belgium) in 2010, and her BSc. in Physics and Applied Mathematics from the University of Porto (Portugal). Her post-graduate travels and international collaborations with the European Space Agency (ESA) and the European Southern Observatory (ESO) led to great personal and professional growth as a scientist. Currently, she is working as an independent researcher, technical writer, and editor in the fields of Mathematics, Physics, Astronomy and Astrophysics.


LIST OF CONTRIBUTORS

Olga Moreira
Wei Zhang – Software College, Northeastern University, No.11, Lane 3, Wenhua Road, Shenyang 110819, China
Shuwen Wang – Software College, Northeastern University, No.11, Lane 3, Wenhua Road, Shenyang 110819, China
Weijie Han – Software College, Northeastern University, No.11, Lane 3, Wenhua Road, Shenyang 110819, China
Hai Yu – Software College, Northeastern University, No.11, Lane 3, Wenhua Road, Shenyang 110819, China
Zhiliang Zhu – Software College, Northeastern University, No.11, Lane 3, Wenhua Road, Shenyang 110819, China
Braxton Carrigan – Southern CT State University
James Hammer – Cedar Crest College
Zhenming Bi – Western Michigan University
Gary Chartrand – Western Michigan University
Garry L. Johns – Western Michigan University
Ping Zhang – Western Michigan University
F. Salama – Department of Mathematics, Faculty of Science, Tanta University, Tanta, Egypt; Department of Mathematics, Faculty of Science, Taibah University, Madinah, Kingdom of Saudi Arabia
Mustafa Ozen – Bogazici University
Hua Wang – Georgia Southern University
Kai Wang – Georgia Southern University
Demet Yalman – Bogazici University
Ralucca Gera – Department of Applied Mathematics, 1 University Avenue, Naval Postgraduate School, Monterey 93943, CA, USA
L. Alonso – Instituto de Física, Benemérita Universidad Autónoma de Puebla, Apartado Postal J-48, Puebla 72570, Mexico
Brian Crawford – Department of Computer Science, 1 University Avenue, Naval Postgraduate School, Monterey 93943, CA, USA
Jeffrey House – Department of Operation Research, 1 University Avenue, Naval Postgraduate School, Monterey 93943, CA, USA
J. A. Mendez-Bermudez – Instituto de Física, Benemérita Universidad Autónoma de Puebla, Apartado Postal J-48, Puebla 72570, Mexico
Thomas Knuth – Instituto de Física, Benemérita Universidad Autónoma de Puebla, Apartado Postal J-48, Puebla 72570, Mexico
Ryan Miller – Department of Applied Mathematics, 1 University Avenue, Naval Postgraduate School, Monterey 93943, CA, USA
Abdollah Alhevaz – Faculty of Mathematical Sciences, Shahrood University of Technology, P.O. Box 3163619995161, Shahrood, Iran
Maryam Baghipur – Faculty of Mathematical Sciences, Shahrood University of Technology, P.O. Box 3163619995161, Shahrood, Iran
Yilun Shang – Department of Computer and Information Sciences, Northumbria University, Newcastle NE1 8ST, UK
Usha Sharma – Department of Mathematics and Statistics, Banasthali University, Banasthali Rajasthan, India
Renu Naresh – Department of Mathematics and Statistics, Banasthali University, Banasthali Rajasthan, India
Craig M. Tennenhouse – University of New England
Mohammad A. Bahmanian – Illinois State University
Mateja Sajna – University of Ottawa
Bibin Mathew – Department of Mathematics, National Institute of Technology, Calicut, India
Sunil Jacob John – Department of Mathematics, National Institute of Technology, Calicut, India
Harish Garg – School of Mathematics, Thapar Institute of Engineering and Technology, Patiala, India
Lixia Zhang – College of Mathematics and Computer Science, Key Laboratory of High Performance Computing and Stochastic Information Processing, Ministry of Education of China, Hunan Normal University, Changsha 410081, China
Jianliang Gao – School of Information Science and Engineering, Central South University, Changsha 410083, China
Xu Wang – Key Laboratory of Management, Decision and Information Systems, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China; Laboratory of Big Data and Block chain, National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing, China
Guohua Gan – Laboratory of Big Data and Block chain, National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing, China; Beijing Taiyiyun Technology Co., Ltd., Beijing, China; University of Science & Technology Beijing, Beijing, China
Ling-Yun Wu – Key Laboratory of Management, Decision and Information Systems, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China; Laboratory of Big Data and Block chain, National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing, China
Guoqi Li – Center for Brain Inspired Computing Research, Department of Precision Instrument, Tsinghua University, Beijing, P. R. China; Beijing Innovation Center for Future Chip, Tsinghua University, Beijing, P. R. China
Lei Deng – Center for Brain Inspired Computing Research, Department of Precision Instrument, Tsinghua University, Beijing, P. R. China; present address: Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA, USA
(Guoqi Li, Lei Deng, Gaoxi Xiao and Pei Tang contributed equally to this work.)
Gaoxi Xiao – School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Pei Tang – Center for Brain Inspired Computing Research, Department of Precision Instrument, Tsinghua University, Beijing, P. R. China; Beijing Innovation Center for Future Chip, Tsinghua University, Beijing, P. R. China
Changyun Wen – School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Wuhua Hu – School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Jing Pei – Center for Brain Inspired Computing Research, Department of Precision Instrument, Tsinghua University, Beijing, P. R. China; Beijing Innovation Center for Future Chip, Tsinghua University, Beijing, P. R. China
Luping Shi – Beijing Innovation Center for Future Chip, Tsinghua University, Beijing, P. R. China
H. Eugene Stanley – Center for Polymer Studies, Department of Physics, Boston University, Boston, USA
Jürgen Hackl – Institute for Construction Engineering and Management, ETH Zurich, Stefano-Franscini-Platz 5, 8093, Zurich, Switzerland
Bryan T. Adey – Institute for Construction Engineering and Management, ETH Zurich, Stefano-Franscini-Platz 5, 8093, Zurich, Switzerland
Milovan Šuvakov – Department of Theoretical Physics, Jožef Stefan Institute, 1000, Ljubljana, Slovenia; Institute of Physics, University of Belgrade, 11080, Belgrade, Serbia
Miroslav Andjelković – Department of Theoretical Physics, Jožef Stefan Institute, 1000, Ljubljana, Slovenia; Institute of Nuclear Sciences Vinča, University of Belgrade, 1100, Belgrade, Serbia
Bosiljka Tadić – Department of Theoretical Physics, Jožef Stefan Institute, 1000, Ljubljana, Slovenia
Dries Hendrikx – Department of Electrical Engineering (ESAT), STADIUS, KU Leuven, Leuven, Belgium; imec, Leuven, Belgium
Liesbeth Thewissen – Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Neonatology, UZ Leuven, Leuven, Belgium
Anne Smits – Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Neonatology, UZ Leuven, Leuven, Belgium
Gunnar Naulaers – Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Neonatology, UZ Leuven, Leuven, Belgium
Karel Allegaert – Department of Development and Regeneration, KU Leuven, Leuven, Belgium; Department of Pediatric Surgery and Intensive Care, Erasmus MC-Sophia Children's Hospital, Rotterdam, Netherlands; Department of Neonatology, Erasmus MC-Sophia Children's Hospital, Rotterdam, Netherlands
Sabine Van Huffel – Department of Electrical Engineering (ESAT), STADIUS, KU Leuven, Leuven, Belgium; imec, Leuven, Belgium
Alexander Caicedo – Department of Electrical Engineering (ESAT), STADIUS, KU Leuven, Leuven, Belgium; imec, Leuven, Belgium

LIST OF ABBREVIATIONS

BA – Business process analyst
CCP – Circled control path
CRIB – Clinical risk index for babies
DB – Database engineer
DPoS – Delegated Proof of Stake
DLA – Diffusion limited aggregation
DAG – Directed acyclic graph
DCPs – Directed control paths
EEG – Electroencephalography
GOE – Gaussian Orthogonal Ensemble
GED – Graph Edit Distance
GPMS – Graph pattern matching algorithm
G-H – Gromov–Hausdorff
HR – Heart rate
ILQR – Implicit linear quadratic regulator
IR – Improvement ratio
IBI – Interburst interval
IQR – Interquartile range
LM – Local-game matching
MM – Maximum matching
MABP – Mean arterial blood pressure
MLCP – Minimizing longest control path
MSTP – Minimum-weight spanning tree problem
NIRS – Near-infrared spectroscopy
NiN – Networks in Networks
NREM – Nonrapid eye movement
OPGM – Orthonormal-constraint-based projected gradient method
PMA – Postmenstrual age
PNA – Postnatal age
PSD – Power spectral density
PDF – Probability density function
PM – Project manager
PoI – Proof of Importance
PoL – Proof of Luck
PoS – Proof of Stake
PoW – Proof of Work
RBF – Radial basis function
RAMc – Random allocation method
RAM – Random Allocation Method
RMS – Root mean squared
SLN – Single-layer network
SA – Software architecture
SD – Software developer
ST – Software tester
TA – Tracé alternant
TSP – Traveling Salesman Problem
UGC – University Grants Commission
VIF – Variance inflatable factor

PREFACE

This book is composed of 18 selected chapters describing graph-based methodologies, algorithms, and their applications in various research fields. The first chapter introduces basic graph theory concepts, such as the definition of directed and undirected graphs, as well as the definition of incidence, adjacency, Laplacian, degree, and distance matrices. The first chapter concludes with a brief description of the graph coloring method. The remaining chapters are organized into the following topics:

• Hamiltonian path problem (Chapters 2 and 3);
• Minimum spanning trees and graph coloring (Chapters 4 to 6);
• Spectral graph theory (Chapters 7 to 9);
• Induced subgraphs, hypergraphs, and vertex rough graphs (Chapters 10 to 12);
• Graph-based algorithms for the analysis of patterns and networks (Chapters 13 to 15);
• Examples of graph theory applications (Chapters 16 to 18).

Chapter 2 describes an algorithm for encrypting digital images based on graph theory; encryption is achieved by generating random Hamiltonian paths within the images. Chapter 3 is concerned with the description of Hamiltonian walks in a blinking node system. Chapter 4 presents a new method for determining the proper connection number and size of several classes of multipartite graphs. Chapter 5 proposes a new algorithm for solving edge coloring problems, called the RF algorithm. Chapter 6 describes a methodology and algorithm for finding dense spanning trees using an edge-swap heuristic. As mentioned above, Chapters 7, 8, and 9 are focused on spectral graph theory. Chapter 7 includes a new method for detecting network similarity using sequential comparisons of the eigenvalue distributions of the adjacency, Laplacian, and normalized Laplacian matrices. Chapter 8 reviews the original Estrada index and introduces a new variant based on the Gaussianization of the generalized distance matrix. Chapter 9 discusses the nullity of the central graph of different types of Smith graphs. Three graph topologies are discussed in Chapters 10, 11, and 12: Chapter 10 addresses the problem of graph saturation as it pertains to induced subgraphs; Chapter 11 includes a study of the connectivity properties of hypergraphs; and Chapter 12 introduces the theoretical aspects of vertex rough graphs. Chapters 13, 14, and 15 are centered on the implementation of three graph-based algorithms: (i) an incremental graph pattern matching algorithm for the analysis of big data; (ii) an algorithm for the detection of honest blocks in a blockchain; and (iii) a local-game matching algorithm for analyzing the structural controllability of large-scale real-world networks. Finally, Chapters 16, 17, and 18 provide detailed examples of graph theory applications in the analysis of transportation networks, self-assembly networks of nanoscale objects, and human brain network functions.

Chapter 1
Introduction

Olga Moreira

INTRODUCTION

Graph theory was first formulated and introduced by Leonhard Euler in his analysis of the Seven Bridges of Königsberg problem (Euler, 1741; Newman et al., 1953). To solve the problem, Euler replaced each land mass with an abstract vertex (graph node) and each bridge with an abstract connection (graph edge). Nowadays, graph theory has numerous applications in many research fields, as graphs can be used to represent different types of data. For instance, graphs have been used to represent network structures, molecular models, species migration patterns, natural language grammatical structures, and more (e.g., Gross and Yellen, 2009; Foulds, 2012). Graphs are different from tree structures, as shown in Figure 1. A tree structure only flows in one direction, from the root node to the child nodes, and a child node can have only one parent node. A graph, on the other hand, can be directed or undirected, and cyclic or acyclic, depending on whether its edges connect nodes in one direction only or in both directions, and on whether a node can be revisited or not. Furthermore, graph edges can have a weight value associated with them; for instance, an edge can be weighted by its centrality. Directed acyclic graphs have been extensively used to represent data processing networks, task scheduling with ordering constraints, and structural causal models (e.g., Shrier and Platt, 2008; Barrett, 2012; Textor et al., 2016). Directed cyclic graphs have been used to represent feedback and economic process models (e.g., Gassner et al., 2009).

Graph structures in algorithms are represented by lists or matrices. A list representation of a graph consists of incidence lists (data arrays containing pairs of nodes) and adjacency lists (data arrays containing information on how each graph node is associated with its neighboring nodes or edges). Similarly, a matrix representation of a graph consists of incidence and adjacency matrices. An incidence matrix is a (0,1)-matrix whose rows represent nodes and whose columns represent edges. The incidence matrix of an undirected graph is such that 1 means that the node-edge pair is incident and 0 otherwise. The incidence matrix of a directed graph is such that -1 means that the edge leaves the paired node, 1 means that the edge enters the paired node, and 0 otherwise. An adjacency matrix is a (0,1)-matrix in which both rows and columns are indexed by the nodes, such that 1 means that the pair of nodes is adjacent and 0 otherwise (e.g., Chartrand, 1977; Bollobas, 2012). The relationship between a graph and the eigenvalues and eigenvectors of its adjacency matrix is studied using spectral graph theory methods (Chung and Graham, 1997). Other important matrix representations of graphs are the Laplacian, the degree, and the distance matrices. The degree matrix is a diagonal matrix that contains information about the number of edges attached to each node. The Laplacian matrix is obtained by subtracting the adjacency matrix from the degree matrix. It can be used to map how a graph differs at a given node from the neighboring nodes and to calculate the number of spanning trees. The distance matrix is a weighted adjacency matrix that contains information pertaining to the distance between a pair of nodes and can be used to compute the length of the shortest path between two nodes. The Laplacian and distance matrices have been used to solve well-known problems such as the Hamiltonian path problem, the shortest path problem, and the minimum spanning tree problem, with direct applications in the design and analysis of networks (e.g., Chartrand, 1977; Bollobas, 2012).
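The matrix representations described above are straightforward to build for a small example. The following sketch (not from the book) constructs the adjacency, degree, and Laplacian matrices of a four-node undirected graph and uses a Laplacian cofactor (Kirchhoff's matrix-tree theorem) to count spanning trees, as mentioned in the text; the edge list and node count are arbitrary illustrative choices.

```python
# Build adjacency, degree, and Laplacian matrices of a small undirected graph,
# then count its spanning trees with the matrix-tree theorem.
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]   # undirected edge list (illustrative)
n = 4

A = np.zeros((n, n), dtype=int)            # adjacency matrix
for u, v in edges:
    A[u, v] = A[v, u] = 1

D = np.diag(A.sum(axis=1))                 # degree matrix (diagonal)
L = D - A                                  # Laplacian matrix

# Matrix-tree theorem: the number of spanning trees equals any cofactor of L.
spanning_trees = int(round(np.linalg.det(L[1:, 1:])))
print(L)
print("spanning trees:", spanning_trees)   # 3 for this example
```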


Figure 1. Illustration of a tree structure (a); a graph with undirected edges (b); a directed acyclic graph (c); a directed cyclic graph (d).

The first graph coloring method was originally proposed to solve the "Four Colour Problem", published in the form of a puzzle by Cayley in 1878 (Kubale, 2004). Graph coloring consists in assigning labels to graph elements subject to specific constraints. Vertex coloring consists of labelling the graph nodes such that no two adjacent nodes have the same color. Similarly, edge coloring consists of labelling the graph edges such that no two adjacent edges have the same color. Graph coloring can be used to determine whether or not a graph is bipartite, i.e., whether the graph nodes can be divided into two disjoint and independent sets such that every edge connects a node in one set to a node in the other set. This has important implications for dynamic programming and task scheduling, as it provides a means to estimate the time complexity of computations over a graph. Other methodologies, algorithms, applications, and different aspects of graph theory are discussed in the remaining chapters of this book, from Hamiltonian path problems, graph coloring, and pattern matching to the analysis of complex networks.
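As an illustration of the bipartiteness test mentioned above (not taken from the book), the following sketch tries to 2-color a graph with a breadth-first search; the coloring succeeds exactly when the graph is bipartite. The function name and the sample graphs are illustrative choices.

```python
# Bipartiteness check by BFS 2-coloring: a graph is bipartite exactly when no
# edge ever joins two nodes of the same color.
from collections import deque

def is_bipartite(n, edges):
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = [None] * n
    for start in range(n):                 # handle disconnected graphs
        if color[start] is not None:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:  # adjacent nodes share a color
                    return False
    return True

print(is_bipartite(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # True: 4-cycle
print(is_bipartite(3, [(0, 1), (1, 2), (2, 0)]))          # False: triangle
```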


REFERENCES

1. Barrett, M. (2012). An Introduction to Directed Acyclic Graphs. https://cran.r-project.org/web/packages/ggdag/vignettes/intro-to-dags.html
2. Bollobas, B. (2012). Graph Theory: An Introductory Course (Vol. 63). Springer Science & Business Media. ISBN: 978-1-4612-9969-1.
3. Chartrand, G. (1977). Introductory Graph Theory. Courier Corporation. ISBN: 0-486-24775-9.
4. Euler, L. (1741). Solutio problematis ad geometriam situs pertinentis. Commentarii Academiae Scientiarum Petropolitanae, 128-140.
5. Chung, F. R., & Graham, F. C. (1997). Spectral Graph Theory (No. 92). American Mathematical Society. ISBN: 0-8218-0315-8.
6. Foulds, L. R. (2012). Graph Theory Applications. Springer Science & Business Media. ISBN: 978-0-387-97599-3.
7. Kubale, M. (2004). Graph Colorings (Vol. 352). American Mathematical Society. ISBN: 0-8218-3458-4.
8. Gassner, M., & Maréchal, F. (2009). Thermo-economic process model for thermochemical production of Synthetic Natural Gas (SNG) from lignocellulosic biomass. Biomass and Bioenergy, 33(11), 1587-1604. https://arxiv.org/abs/1302.4982
9. Gross, J. L., & Yellen, J. (2005). Graph Theory and Its Applications. CRC Press. ISBN: 978-1-58488-505-4.
10. Newman, J. (1953). Leonhard Euler and the Konigsberg Bridges. Scientific American, 189, 66-70.
11. Shrier, I., & Platt, R. W. (2008). Reducing bias through directed acyclic graphs. BMC Medical Research Methodology, 8(1), 70. https://doi.org/10.1186/1471-2288-8-70
12. Textor, J., van der Zander, B., Gilthorpe, M. S., Liśkiewicz, M., & Ellison, G. T. (2016). Robust causal inference using directed acyclic graphs: the R package 'dagitty'. International Journal of Epidemiology, 45(6), 1887-1894. https://doi.org/10.1093/ije/dyw341

Chapter 2
An Image Encryption Algorithm Based on Random Hamiltonian Path

Wei Zhang, Shuwen Wang, Weijie Han, Hai Yu and Zhiliang Zhu
Software College, Northeastern University, No.11, Lane 3, Wenhua Road, Shenyang 110819, China

ABSTRACT

In graph theory, a Hamiltonian path refers to a path that visits each vertex exactly once. In this paper, we designed a method to generate random Hamiltonian paths within digital images, which is equivalent to permutation in image encryption. By these means, building a Hamiltonian path across bit planes can shuffle the distribution of the pixels' bits. Furthermore, a similar thought can be applied for the substitution of pixels' grey levels. To ensure the randomness of the generated Hamiltonian path, an adjusted Bernoulli map is proposed. By adopting these novel techniques, a bit-level image encryption scheme was devised. Evaluation of the simulation results shows that the proposed scheme achieves fair performance. In addition, we pinpoint a common flaw in calculating the correlation coefficients of adjacent pixels; after enhancement, the correlation coefficient becomes a stricter criterion for image encryption algorithms.

Keywords: image encryption; Hamiltonian path; Bernoulli map; chaotic system

Citation: Zhang, W., Wang, S., Han, W., Yu, H., & Zhu, Z. (2020). An Image Encryption Algorithm Based on Random Hamiltonian Path. Entropy, 22(1), 73. (18 pages) DOI: https://doi.org/10.3390/e22010073 URL: https://www.mdpi.com/1099-4300/22/1/73/htm

Copyright: © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

INTRODUCTION

When computers and the internet came on the scene, the era of information arrived with them, accompanied by the formidable challenge of information security. Among the many kinds of information, vivid multimedia information is preferred by people, especially digital images. Such information involves both collective and personal interests: for instance, images of military affairs are related to the safety of a whole country, while the privacy and copyright of images influence everyone's peace of mind. To protect the rights of an image's owners, methods like steganography, watermarking, and encryption are frequently utilized [1]. Among these techniques, encryption is a direct and thorough means. Nowadays, image encryption is an inviting and fruitful field, and many imaginative image encryption algorithms have been proposed. One picture is worth more than ten thousand words, and there are indeed tens of thousands of pixels in a digital image. To encrypt the bulk data of images, traditional cryptosystems are not efficient enough. Among specific image encryption schemes, the permutation–diffusion structure is widely used. Essentially, permutation rearranges image pixels along different dimensions. In [2], a 2D CMT (chaotic magic transform) was proposed for permutation. In [3], image scrambling was performed by a parametric 2D Sudoku matrix. In [4], horizontal and vertical wave lines were utilized to realize row rotation and column rotation; this is also a 2D method. In [5], spatial permutation was performed on a 3D bit matrix by using orthogonal Latin cubes. Moreover, file-based algorithms like [6] deem images as 1D binary files when realizing permutation. Considering the features of bit distribution in digital images, encryption schemes [7,8] with bit-level permutation have been proposed.


Sometimes, the permutation phase is accompanied by a sort operation, as in [2]. However, the time complexity of sorting is usually nonlinear; to obtain high efficiency, this additional operation should be avoided. Regarding an image as a 1D pixel array, permutation can be depicted as an arrangement of pixels, which is represented by the bijective map from the plain image to the permuted image. If we connect the pixels in the order of the arrangement, all the pixels are traversed exactly once. Deeming pixels the vertices of a graph, such a traversal path is known as a Hamiltonian path in graph theory. Conversely, a Hamiltonian path corresponds to an arrangement of a permutation. Following this thought, a method of building Hamiltonian paths is equivalent to a permutation scheme. As a Hamiltonian path can be generated without a sort operation, the corresponding permutation algorithm has the advantage of efficiency. In cryptography, substitution is a classical method of cipher schemes. To substitute a pixel's grey value, an arrangement of all the possible grey levels is requisite. Hence, the thought of the random Hamiltonian path is also suitable for the substitution of grey levels. It is common knowledge that chaotic systems have conspicuous advantages for cryptosystems. High-dimensional chaotic systems possess complex chaotic behavior, while 1D chaotic systems are convenient for implementation. Under synthetical consideration, some combined chaotic maps have been explored in recent encryption schemes [2,9,10,11,12]. Just resembling the series-parallel connection of resistors in circuits, these chaotic maps are a combination or adjustment of the original chaotic maps. Multiple chaotic maps can be coupled as a CML (chaotic map lattice) [13,14]. By these means, chaotic behavior is magnified, leading to better chaotic performance. In this paper, a new chaotic map was also explored. There are some innovative works in this paper:

• A method of building a random Hamiltonian path within digital images was designed, which is equivalent to permutation. On this basis, bit-level permutation of high efficiency was achieved.
• Following the thought of the random Hamiltonian path, arrays for grey levels' substitution can be generated.
• An adjusted Bernoulli map is proposed, which is suitable for image encryption schemes.
• The ambiguous definition of the diagonal direction is normalized to two orthogonal directions when calculating correlation coefficients.


The rest of this paper is organized as follows: Section 2 explains the Hamiltonian path and the procedures to generate such paths within images. Section 3 expounds the adjusted Bernoulli map. In Section 4, the proposed scheme is thoroughly introduced. The results of the simulation experiments are exhibited in Section 5. Section 6 is the summary of the entire paper.

HAMILTONIAN PATH

As was mentioned earlier, the scheme of generating a random Hamiltonian path is tantamount to permutation. In this section, the relevant theories are presented, while the method of generating a Hamiltonian path is proposed.

Basic Theory of Hamiltonian Path

Graph theory is a classical branch of mathematics. The term graph refers to a figure composed of points and the connecting lines between the points. Commonly, the points are called vertices, and the lines are called edges. The definition of a graph is G = (V, E), where V is a nonempty set of finite vertices and the set of edges is E = {(x, y) | x, y ∈ V}. If the vertex pair (x, y) in E is ordered, the graph is named a directed graph; otherwise, it is named an undirected graph. In an undirected graph, a path P is a sequence of vertices v1 v2 … vk such that there exists an edge between each vertex pair vi vi+1. Here k is the number of vertices that P contains, in other words, the length of P. There are two special categories of paths, the Euler path and the Hamiltonian path. An Euler path refers to a path that traverses each edge once and only once. A famous instance is the problem of the Konigsberg bridges [15]; in 1736, Leonhard Euler proved that there is no solution to the problem, which is known as the beginning of graph theory. A Hamiltonian path refers to a path that traverses each node once and only once, or an arrangement of all vertices in which every adjacent vertex pair is connected by at least one edge. The problem of the Hamiltonian path can be traced back to 1859, when William Hamilton talked about a mathematical game: traverse all the vertices of a dodecahedron and pass by each vertex exactly once. Figure 1 is the graphic illustration of the two famous problems.


Figure 1. (a) The problem of Konigsberg bridges and (b) Hamiltonian path.

Graphs are generally intricate. The problem of finding a Hamiltonian path is a nondeterministic polynomial complete problem (NP-C problem), one of the most burdensome challenges in mathematics [16,17,18,19]. Off the beaten track, DNA computing [20] and light-based computers [21] have been developed to solve this problem efficiently. However, generating a Hamiltonian path within digital images can be much easier.

Hamiltonian Path within Digital Images

For an undirected graph of N vertices, there are, at most, N × (N − 1)/2 edges. Under this condition, any two vertices are connected by an edge. Such a graph is called a complete graph. Some instances of complete graphs are shown in Figure 2: the complete graphs with three nodes, four nodes, and five nodes are shown in Figure 2a–c, respectively.

Figure 2. Complete graphs. (a) Three nodes, (b) Four nodes, (c) Five nodes.

There are various theorems for determining whether a graph contains a Hamiltonian path. One of these theorems is given below:


Dirac theorem: In a graph G of N vertices, if for each vertex vi there always is d(vi) ≥ N/2, then at least one Hamiltonian path exists in G. The d(vi), otherwise called the degree of vi, represents the quantity of edges connected with vi.

In our scheme, digital images were regarded as complete graphs. Hereof, the pixels are the vertices, and there is an edge between every two pixels. According to the Dirac theorem, there always exist Hamiltonian paths in such graphs. To build a Hamiltonian path within an image, pixels are divided into two parts: one is composed of the pixels that have been added into the path, and the other is composed of the rest of the pixels. Firstly, a pixel is chosen to be the path's outset. Then, the other pixels are added to the path one by one. If the image's size is M × N and its pixels are {P1, P2, …, PM×N}, this process can be generalized as the following steps (a code sketch is given after Figure 4):

Step 1: Choose a pixel from {P1, P2, …, PM×N} and put it in the position of PM×N.
Step 2: Choose a pixel from {P1, P2, …, PM×N−1} and put it in the position of PM×N−1.
Step 3: Choose a pixel from {P1, P2, …, PM×N−2} and put it in the position of PM×N−2.
…
Step M×N−1: Choose a pixel from {P1, P2} and put it in the position of P2.

In the above process, pixels that have been added into the path are insulated at the back of the image's pixel array. Among the whole image, these pixels are deemed permutated pixels. The complete process is shown in Figure 3, and Figure 4 illustrates the generated Hamiltonian path from a graph perspective.

Figure 3. Process of generating a Hamiltonian path in an image of size 3 × 3. The blue pixels are chosen, and the grey pixels have been added into the path.


Figure 4. Generated Hamiltonian path. The red pixel is the beginning of the path and the blue pixel is the rear of the path.
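As referenced above, the following sketch mirrors the path-building steps in ordinary Python, with a generic pseudo-random generator standing in for the chaotic map used in the paper. Picking a random pixel from the not-yet-placed ones and moving it to the back of the array yields a random ordering of all pixels, i.e., a permutation.

```python
# Build a random Hamiltonian ordering of pixels: repeatedly pick a random pixel
# from the unvisited ones and move it to the back of the array.
import random

def random_hamiltonian_order(pixels):
    p = list(pixels)
    for i in range(len(p) - 1, 0, -1):      # fill positions P_{M*N} down to P_2
        j = random.randint(0, i)            # choose among the unvisited pixels
        p[i], p[j] = p[j], p[i]             # the chosen pixel joins the path
    return p                                # read in order: the Hamiltonian path

print(random_hamiltonian_order(range(9)))   # e.g. a 3x3 image flattened to 1D
```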

Hamiltonian Path across Bit Planes

In [7], the intrinsic features of bit distribution in digital images were revealed. Higher bits of pixels hold higher weight of an image's information, and there are strong correlations among the higher bit planes. In the instance of Figure 5, the 8th bit plane and the 7th bit plane tend to have opposite values. These features shall not be neglected in a secure cryptosystem.

Figure 5. Bit planes of Lena, (a–h) are from the 8th bit plane to the 1st bit plane, respectively.

To build a Hamiltonian path across bit planes, the strategy of [7] was extended to greyscale images in this paper. By these means, a plain image of size M × N is expanded to 2M × 2N. All the bit planes of a plain image’s pixels were placed to the 1st bit plane and the 2nd bit plane of the expanded image’s pixels. After generating a Hamiltonian path, the bit planes were restored, and a permutated image of size M × N was formed. The whole procedure can be generalized into Figure 6.


Figure 6. Modified expand–shrink strategy.

Figure 7 is the illustration of the generated Hamiltonian path across bit planes.

Figure 7. Hamiltonian path across bit planes.
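The text does not spell out the exact bit layout of the expand–shrink strategy, so the following sketch only illustrates the general idea under an assumed layout: each 8-bit pixel is split into four 2-bit chunks stored in the two lowest bit planes of a 2 × 2 block of the expanded image, and the shrink step inverts the packing after permutation.

```python
# Expand an (M, N) 8-bit image to (2M, 2N) using only its two lowest bit planes,
# and shrink it back; the chosen 2x2 packing order is an illustrative assumption.
import numpy as np

def expand(img):                            # img: (M, N) array of uint8
    m, n = img.shape
    big = np.zeros((2 * m, 2 * n), dtype=np.uint8)
    for k in range(4):                      # four 2-bit chunks per pixel
        chunk = (img >> (2 * k)) & 0b11
        big[k // 2::2, k % 2::2] = chunk    # place chunk k in the 2x2 block
    return big

def shrink(big):                            # inverse of expand
    m, n = big.shape[0] // 2, big.shape[1] // 2
    img = np.zeros((m, n), dtype=np.uint16)
    for k in range(4):
        img |= (big[k // 2::2, k % 2::2].astype(np.uint16) & 0b11) << (2 * k)
    return img.astype(np.uint8)

test = np.random.randint(0, 256, (4, 4), dtype=np.uint8)
assert np.array_equal(shrink(expand(test)), test)   # round trip is lossless
```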

ADJUSTED BERNOULLI MAP

To ensure the randomness of the Hamiltonian path, chaotic maps can serve as pseudo random number generators. Theoretically, any 1D chaotic map is compatible. In this section, an adjusted Bernoulli map is proposed.

Bernoulli Map

The original definition of the Bernoulli map [22,23] is given by:

xn+1 = 2xn mod 1, with xn ∈ [0, 1).    (1)


The piecewise linear property of a Bernoulli map is demonstrated in Figure 8. When implemented into discrete computer systems, the map resembles bit shifting of floating numbers. Such degradation means that the original Bernoulli map is seldom applied to encryption algorithms directly.

Figure 8. Bernoulli map.
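The degradation noted above can be checked numerically. The snippet below (not from the paper) iterates the classic Bernoulli (doubling) map x → 2x mod 1 in 64-bit floating point; because each step shifts the mantissa, every orbit collapses to exactly zero within roughly 53 iterations, which is why the unmodified map is a poor pseudo-random generator.

```python
# Iterate the doubling map in double precision and watch the orbit die out.
x = 0.387420489
for i in range(60):
    x = (2.0 * x) % 1.0
    if x == 0.0:
        print("orbit collapsed to 0 after", i + 1, "iterations")
        break
```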

Adjusted Bernoulli Map

To amplify the limited nonlinear property of the original Bernoulli map, cascaded modulus operations are adopted in the adjusted Bernoulli map (ABM):

(2)

The parameters α and β can be many of the floating-point numbers that are bigger than two. Though the multiplication operation is linear in mathematics, the multiplication operation in computer systems involves the conversion between decimal and binary numbers. The ABM possesses fair chaotic behavior in practice, especially when the parameters α and β are random. Owing to the finite precision of computers, the ABM does not work well when its parameters are big numbers, and special values such as 2^N and 10^N should be avoided (here N is a natural number). Part of the parameters' value range is shown in Figure 9.


Figure 9. Bifurcation diagrams of ABM. (a) β = 3. The value of α is increased by 0.1, ranging from 2.1 to 202.1. (b) α = 3. The value of β is increased by 0.1, ranging from 2.1 to 202.1.

To examine the randomness of the pseudo-random numbers generated by the ABM, the NIST SP800-22 test suite [24] was utilized. In our experiment, 300 bitstreams of length 10^6 were generated and tested, with α = 10.45678 and β = 10.123. The initial value of x was increased in steps of 0.0033, ranging from 0.001 to 0.991. The test results are listed in Table 1.

Table 1. Randomness test using NIST SP800-22 test suite.

  Statistical Test               P-Value     Pass Rate (%)
  Frequency                      0.798139    100.00
  Block frequency                0.108791    99.33
  Cumulative sums *              0.282804    99.83
  Runs                           0.588652    99.67
  Longest run                    0.245072    99.33
  Rank                           0.319084    100.00
  FFT                            0.280306    99.00
  Non-overlapping template *     0.468139    98.95
  Overlapping template           0.425059    98.00
  Universal                      0.449672    99.33
  Approximate entropy            0.561227    99.67
  Random excursions *            0.533005    98.95
  Random excursions variant *    0.419542    99.27
  Serial *                       0.464632    98.83
  Linear complexity              0.915745    99.33

  * Average value of multiple tests.


PROPOSED SCHEME

After the permutation process of Section 2, a bit-level permutation has been completed. To obtain fair diffusion properties, XOR operations are performed on the pixels, and their grey values are substituted dynamically. The whole encryption scheme is detailed in this section.

Encryption Algorithm

As illustrated in Figure 10, the whole cryptosystem is driven by the ABM. The inputs of the algorithm are the plain image P of size M × N and the parameters of the ABM; the output is the cipher image C.

Figure 10. Encryption progress.

The whole encryption process is as follows:

Step 1: Read in P. Iterate the ABM several times to avoid the transient effect.
Step 2: Decompose P's bit planes and make a montage of these bit planes to obtain an image B of size 2M × 2N.
Step 3: For i = 2M × 2N, 2M × 2N − 1, 2M × 2N − 2, …, 3, 2, iterate the ABM to generate a pseudo-random number ri and use Equation (3) to quantize it. Swap B's ith pixel Bi and jth pixel Bj. (3)
Step 4: Merge the decomposed bit planes to obtain the permutated image H of size M × N.
Step 5: Initialize two 1D arrays (vectors) S and T by Equation (4): for i = 0, 1, 2, …, 255,


Si = Ti = i. (4)

Step 6: For i = 255, 254, …, 2, 1, iterate the ABM to generate a pseudo-random number ui and use Equation (5) to quantize it. Swap Si and Sj, where
j = round(ui × 10^14) mod i. (5)

Step 7: For i = 255, 254, …, 2, 1, iterate the ABM to generate a pseudo-random number pi and use Equation (6) to quantize it. Swap Ti and Tj, where
j = round(pi × 10^14) mod i. (6)
Step 8: For i = 1, 2, …, M × N − 1, use Equation (7) to diffuse H's pixel Hi+1. Here, ai is a pseudo-random number generated by the ABM. (7)
Step 9: For i = M × N, M × N − 1, …, 3, 2, use Equation (8) to diffuse H's pixel Hi−1. Here, bi is a pseudo-random number generated by the ABM. (8)
Step 10: Save H as the cipher image C.
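The construction of the substitution arrays in Steps 5–7 is essentially a Fisher–Yates shuffle of the identity permutation driven by the chaotic sequence. A minimal C++ sketch is given below; the nextRandom callable standing in for the ABM output in (0, 1) is an assumption for illustration, not the paper's exact generator.

```cpp
#include <array>
#include <cmath>
#include <cstdint>
#include <functional>
#include <utility>

// Steps 5-7 sketched: shuffle the identity permutation 0..255 with indices
// quantized as in Equations (5)/(6), j = round(u * 1e14) mod i.
std::array<uint8_t, 256> buildSubstitutionArray(const std::function<double()>& nextRandom) {
    std::array<uint8_t, 256> S{};
    for (int i = 0; i < 256; ++i) S[i] = static_cast<uint8_t>(i);   // Equation (4)
    for (int i = 255; i >= 1; --i) {
        const double u = nextRandom();                               // one ABM iteration
        const long long j = static_cast<long long>(std::llround(u * 1e14)) % i;
        std::swap(S[i], S[j]);                                       // Equations (5)/(6)
    }
    return S;
}
```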

DISCUSSION

Steps 2, 3, and 4 perform the permutation phase, which works at the bit level. Following the same method used to build a Hamiltonian path, two arrays are generated for the substitution of grey values in Steps 5, 6, and 7. The arrays are arrangements of the integers from 0 to 255, in accordance with the pixels' grey levels. As the arrays are randomly generated, there are 256! ≈ 8.578 × 10^506 possible arrangements. In this way, any modification of the plain image is amplified and propagated in Steps 8 and 9, causing an avalanche effect.

Decryption Algorithm

The decryption algorithm is the reverse of the encryption algorithm, as can be seen from Figure 11.


Figure 11. Decryption progress.

In the encryption process, the substitution is realized by the arrays S and T. The reverse operation of the substitution needs the inverse maps of S and T, which can be generated by Equation (9):

S′(Si) = i, T′(Ti) = i, (9)

where i = 255, 254, …, 2, 1, 0, and S′ and T′ are the inverse maps of S and T, respectively.
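Under the reading of Equation (9) given above (the standard inversion of a permutation), the inverse arrays can be computed as in the following sketch:

```cpp
#include <array>
#include <cstdint>

// Equation (9): invert a byte permutation so that Sinv[S[i]] == i for all i.
std::array<uint8_t, 256> invertSubstitutionArray(const std::array<uint8_t, 256>& S) {
    std::array<uint8_t, 256> Sinv{};
    for (int i = 0; i <= 255; ++i)
        Sinv[S[i]] = static_cast<uint8_t>(i);
    return Sinv;
}
```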

SIMULATION EXPERIMENTS

To check the performance of the proposed scheme, the results of simulation experiments were evaluated against several criteria in this section. Our experimental environment was a desktop PC with a 64-bit Windows 10 OS, an Intel i7-2600 CPU, and 8 GB of RAM. The programming language was C++, and the development environments were Visual Studio 2019 and OpenCV 4.1.0. The test images were chosen from the SIPI image database [25].

Secret Key Analysis

The secret key is an indispensable component of a cryptosystem, and the key space is suggested to be no less than 2^100 [26]. The secret keys of the proposed scheme are the parameters of the ABM. In our simulation experiments, the data type of the keys was double-precision floating point. According to the IEEE 754 standard, each key occupies 8 bytes and has a 52-bit significand. The structure of the secret key is shown in Figure 12, and the key space is larger than 2^100.


Figure 12. Encryption process.

To examine the key sensitivity in the encryption process and the decryption process, a strict test of the bit change rate, NBCR (number of bit change rate) [27], was utilized:

NBCR(C1, C2) = Ham(C1, C2) / (M × N × d), (10)

where C1 and C2 are two images of size M × N and bit depth d, and Ham(C1, C2) is the Hamming distance between C1 and C2, i.e., the number of bits that differ between the two images. The NBCR should be close to 50%, which indicates that around 50% of the bits differ between C1 and C2. In our work, three groups of modified keys were used as illegal keys. These illegal keys were used to encrypt the plain images in the encryption process and to decrypt the cipher images in the decryption process. The resulting encrypted and decrypted images were then compared with the original plain images and cipher images. The NBCRs are listed in Table 2.

Table 2. Key sensitivity (Δ = 0.00000000000001).

Key sensitivity in the encryption process
  Modified key     Boat (512 × 512)   Couple (512 × 512)   Tank (512 × 512)   Male (1024 × 1024)   Clock (256 × 256)
  (x0 + Δ, α, β)   0.499321           0.499997             0.500057           0.499904             0.501156
  (x0, α + Δ, β)   0.499923           0.500155             0.499765           0.499937             0.500164
  (x0, α, β + Δ)   0.49994            0.500076             0.500499           0.500078             0.500856

Key sensitivity in the decryption process
  Modified key     Boat (512 × 512)   Couple (512 × 512)   Tank (512 × 512)   Male (1024 × 1024)   Clock (256 × 256)
  (x0 + Δ, α, β)   0.500289           0.499741             0.500082           0.499858             0.500328
  (x0, α + Δ, β)   0.500154           0.499415             0.5001             0.49992              0.501308
  (x0, α, β + Δ)   0.499747           0.500337             0.499742           0.49986              0.499378
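The NBCR values in Table 2 can be reproduced with a routine like the following sketch, which counts differing bits between two equally sized 8-bit images (the function name and image representation are illustrative assumptions):

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

// Equation (10): fraction of differing bits between two 8-bit images of the
// same size; values near 0.5 indicate strong key sensitivity.
double nbcr(const std::vector<uint8_t>& c1, const std::vector<uint8_t>& c2) {
    std::size_t differing = 0;
    for (std::size_t k = 0; k < c1.size(); ++k)
        differing += std::bitset<8>(c1[k] ^ c2[k]).count();   // Hamming distance per pixel
    return static_cast<double>(differing) / (8.0 * c1.size());
}
```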

HISTOGRAMS

The histogram is the foundation of various spatial image processing techniques, e.g., image enhancement, and the inherent information of histograms is useful in image compression and segmentation. For an image of size M × N and bit depth d, the histogram is the discrete function

h(ri) = qi, i = 0, 1, 2, …, 2^d − 1, (11)

where qi is the number of pixels with grey value ri. The variance of the histogram can be calculated by Equation (12):

var(h) = (1 / 2^d) Σ (qi − μh)^2, (12)

where μh is the arithmetic mean of the qi. The histogram of a cipher image should be relatively uniform, so after encryption the variance of an image's histogram should be greatly reduced. The variances of several images' histograms are listed in Table 3.

Table 3. Variance of histograms.

  Image                         Plain Image    Cipher Image
  Chemical plant (256 × 256)    50,326.4       248.469
  Clock (256 × 256)             282,062        248.328
  Moon surface (256 × 256)      135,688        248.094
  Boat (512 × 512)              1,535,880      1137.66
  Couple (512 × 512)            1,195,460      1002.11
  Lena (512 × 512)              632,254        986.281
  Tank (512 × 512)              8,103,600      1043.73
  Airplane (1024 × 1024)        115,199,000    3783.7
  Airport (1024 × 1024)         31,596,400     3832.03
  Male (1024 × 1024)            11,349,400     4412.8

In practice, histograms are often normalized by Equation (13):

p(ri) = qi / (M × N), (13)

where, after normalization, p(ri) represents the occurrence probability of the ith grey value. The normalized histograms of plain images and cipher images are shown in Figure 13.

Figure 13. Histograms. (a) Plain image boat; (b) plain image male; (c) plain image clock; (d) histogram of plaintext boat; (e) histogram of plaintext male; (f) histogram of plaintext clock; (g) cipher image boat; (h) cipher image male; (i) cipher image clock; (j) histogram of cyphertext boat; (k) histogram of cyphertext male; (l) histogram of cyphertext clock.
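Under the straightforward readings of Equations (11)–(13) given above, the histogram of an 8-bit image, its variance, and its normalized form can be computed as in this sketch:

```cpp
#include <cstdint>
#include <vector>

struct HistogramStats {
    std::vector<double> counts;        // h(r_i) = q_i, Equation (11)
    std::vector<double> probabilities; // p(r_i) = q_i / (M*N), Equation (13)
    double variance;                   // (1/2^d) * sum (q_i - mu_h)^2, Equation (12)
};

HistogramStats histogramStats(const std::vector<uint8_t>& pixels) {
    const int bins = 256;              // bit depth d = 8
    HistogramStats s{std::vector<double>(bins, 0.0), std::vector<double>(bins, 0.0), 0.0};
    for (uint8_t v : pixels) s.counts[v] += 1.0;
    const double mean = static_cast<double>(pixels.size()) / bins;   // mu_h
    for (int i = 0; i < bins; ++i) {
        s.variance += (s.counts[i] - mean) * (s.counts[i] - mean) / bins;
        s.probabilities[i] = s.counts[i] / pixels.size();
    }
    return s;
}
```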


Information Entropy

Information entropy, proposed by C. E. Shannon, is a measure of the randomness of information: for a digital image, it is hard to predict the content if the information entropy is high. Information entropy is calculated as in Equation (14), where p(ri) is the same as in Equation (13):

H = − Σ p(ri) log2 p(ri), i = 0, 1, …, 2^d − 1. (14)

The ideal value of a cipher image's information entropy is its bit depth d. In Table 4, the information entropies of the plain images and cipher images are listed.

Table 4. Information entropy.

  Image                         Plain Image    Proposed Scheme    [2]        [4]        [28]
  Chemical plant (256 × 256)    7.34243        7.99725            7.99716    7.99692    7.99683
  Clock (256 × 256)             6.70567        7.99727            7.99726    7.99692    7.99705
  Moon surface (256 × 256)      6.70931        7.99725            7.99738    7.9974     7.9972
  Boat (512 × 512)              7.19137        7.99922            7.99934    7.9994     7.99921
  Couple (512 × 512)            7.20101        7.99931            7.99934    7.99931    7.99936
  Lena (512 × 512)              7.44551        7.99932            7.99929    7.99934    7.99932
  Tank (512 × 512)              5.49574        7.99928            7.99934    7.99923    7.99934
  Airplane (1024 × 1024)        5.64145        7.99984            7.99984    7.99983    7.99981
  Airport (1024 × 1024)         6.83033        7.99984            7.99983    7.99981    7.99983
  Male (1024 × 1024)            7.52374        7.99981            7.99978    7.99981    7.99981
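The entropy values reported in Table 4 follow from Equation (14) applied to the normalized histogram of Equation (13); a minimal self-contained sketch for 8-bit images:

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Shannon entropy, Equation (14); the ideal value for an 8-bit cipher image is 8.
double informationEntropy(const std::vector<uint8_t>& pixels) {
    std::vector<double> q(256, 0.0);
    for (uint8_t v : pixels) q[v] += 1.0;
    double H = 0.0;
    for (double count : q) {
        if (count == 0.0) continue;          // empty bins contribute 0
        const double p = count / pixels.size();
        H -= p * std::log2(p);
    }
    return H;
}
```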

Differential Attack

To resist differential attacks, a tiny modification of the plain image should cause massive changes in the cipher image; this is known as the diffusion property in cryptography. NPCR (number of pixel change rate) and UACI (unified averaged changed intensity) are two common indicators of an algorithm's ability to resist differential attacks [29]. If C1 and C2 are two images of size M × N and bit depth d, then

NPCR = ( Σ D(i, j) ) / (M × N) × 100%, (15)
UACI = ( Σ |C1(i, j) − C2(i, j)| ) / ((2^d − 1) × M × N) × 100%. (16)

Here,

D(i, j) = 0 if C1(i, j) = C2(i, j), and D(i, j) = 1 otherwise. (17)

In our experiment, the plain image boat of size 512 × 512 was used to evaluate the diffusion effect. Several pixels were chosen in the image, and the last bit of each of these pixels was flipped in turn; the modified images were then encrypted. As can be seen from Table 5, the NPCRs and UACIs were close to the theoretical values after two encryption rounds.

Table 5. Results of NPCR and UACI.

  Index of Modified Pixel    NPCR (1 Round)    UACI (1 Round)    NPCR (2 Rounds)    UACI (2 Rounds)
  0                          0.996983          0.3349            0.995941           0.335169
  255                        0.99733           0.335224          0.996063           0.3338
  511                        0.996616          0.334727          0.996143           0.333902
  65,151                     0.99897           0.3355            0.996078           0.333911
  65,407                     0.998333          0.335641          0.996254           0.334896
  130,560                    0.996365          0.333719          0.995861           0.334763
  130,816                    0.99995           0.335614          0.996147           0.334734
  131,071                    0.999985          0.335876          0.996216           0.335797
  196,096                    0.999943          0.336146          0.995804           0.333875
  196,352                    0.999031          0.335429          0.996269           0.334645
  261,632                    0.9981            0.335204          0.996181           0.333829
  261,888                    0.999249          0.335551          0.996037           0.334519
  262,143                    0.997608          0.335361          0.995998           0.334605
  Theoretical value          0.996094          0.334635          0.996094           0.334635
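With Equations (15)–(17) read as reconstructed above, NPCR and UACI for two 8-bit cipher images of equal size can be computed as follows (a sketch with assumed names, not the authors' code):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct DiffScores { double npcr; double uaci; };

// Equations (15)-(17): NPCR counts differing pixel positions, UACI averages
// the normalized absolute intensity difference; both are given in percent.
DiffScores npcrUaci(const std::vector<uint8_t>& c1, const std::vector<uint8_t>& c2) {
    double changed = 0.0, intensity = 0.0;
    for (std::size_t k = 0; k < c1.size(); ++k) {
        if (c1[k] != c2[k]) changed += 1.0;                          // D(i, j)
        intensity += std::abs(static_cast<int>(c1[k]) - static_cast<int>(c2[k]));
    }
    const double n = static_cast<double>(c1.size());
    return { changed / n * 100.0, intensity / (255.0 * n) * 100.0 };
}
```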


Correlation Coefficients

Plain images are usually redundant in the spatial domain, which means that adjacent pixels are highly correlated; in cipher images, such correlation should be broken. To measure the correlation between adjacent pixels, we calculate correlation coefficients as

r_xy = E[(x − μx)(y − μy)] / (σx σy), (18)

where x and y are pixel vectors of the same length, μx and μy are their arithmetic means, and σx and σy are their standard deviations. The range of the correlation coefficient is [−1, 1]; if x and y are not correlated, r_xy is close to 0.

Commonly, adjacent pixels are examined in three directions: horizontal, vertical, and diagonal. However, there are two orthogonal diagonal directions in a 2D matrix of pixels: the principal diagonal direction (from upper-left to lower-right) and the minor diagonal direction (from upper-right to lower-left). For instance, in a 2 × 2 pixel block with entries p1, p2, p3, p4, the pixels p1 and p4 are adjacent in the principal diagonal direction, while p2 and p3 are adjacent in the minor diagonal direction. In the field of image encryption, the definition of the diagonal direction is usually ambiguous, yet the two diagonal directions are not equivalent for some image processing techniques and image encryption algorithms [30,31,32,33,34]. Under the extreme circumstances shown in Figure 14 and Figure 15, it is not sufficient to calculate only three of the four directions.

Figure 14. One example. (a) An image in which all adjacent pixels of minor diagonal direction are equal; (b) its scatter plots.


Figure 15. Another example. (a) An image in which all adjacent pixels of principal diagonal direction are equal; (b) its scatter plots.
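Equation (18) as reconstructed above is the Pearson correlation coefficient of two pixel vectors; a sketch of its computation is given below, and sampling 10,000 adjacent pairs in each of the four directions is left to the caller.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Correlation coefficient of Equation (18) for two equally long pixel vectors,
// e.g. the first and second members of adjacent pixel pairs in one direction.
double correlationCoefficient(const std::vector<uint8_t>& x, const std::vector<uint8_t>& y) {
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t k = 0; k < x.size(); ++k) { mx += x[k]; my += y[k]; }
    mx /= n;
    my /= n;
    double cov = 0.0, vx = 0.0, vy = 0.0;
    for (std::size_t k = 0; k < x.size(); ++k) {
        cov += (x[k] - mx) * (y[k] - my);
        vx += (x[k] - mx) * (x[k] - mx);
        vy += (y[k] - my) * (y[k] - my);
    }
    return cov / std::sqrt(vx * vy);
}
```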

For all the plain images and cipher images in our experiments, correlation coefficients of 10,000 adjacent pixel pairs in each of the four directions were calculated. The results are listed in Table 6.

Table 6. Correlation coefficients.

                        Boat           Male            Clock           Figure 14a       Figure 15a
Plain image
  Horizontal            0.936502       0.978016        0.956658        −0.0104305       −0.0347679
  Vertical              0.970165       0.981711        0.973594        −0.0262352       −0.0253942
  Principal diagonal    0.922103       0.965681        0.940988        0.0309025        1
  Minor diagonal        0.924285       0.967724        0.934225        1                0.00261237
Proposed scheme
  Horizontal            −0.00790818    −0.00530914     −0.00627326     0.0106215        0.00724055
  Vertical              −0.0032019     −0.00593338     −0.00787923     0.0000261687     0.00380631
  Principal diagonal    −0.00753223    0.0180877       −0.00519652     0.000707789      −0.0161754
  Minor diagonal        0.001262       0.00994515      −0.00729377     −0.0060987       0.00262413
[2]
  Horizontal            0.0272732      0.00444088      0.0105537       0.00498476       −0.00189389
  Vertical              −0.0321433     −0.000856047    −0.00733777     −0.0126175       0.0129397
  Principal diagonal    −0.00603878    −0.00964336     −0.0138118      0.0118652        −0.0067004
  Minor diagonal        −0.0013256     0.0046903       −0.00501911     0.00299615       −0.0185178
[4]
  Horizontal            0.00361182     −0.00595886     −0.00236848     0.000340202      −0.0171794
  Vertical              0.00145023     −0.0103426      −0.00437046     0.00520304       0.00879099
  Principal diagonal    0.00395435     0.00305054      −0.000705693    −0.0120762       −0.00874923
  Minor diagonal        −0.000165327   0.00232492      0.000369637     −0.00743531      −0.00299425
[28]
  Horizontal            0.00899491     0.00754775      0.000411031     0.0057195        −0.00967912
  Vertical              −0.0041634     0.000629605     −0.00538419     −0.00266845      0.0105734
  Principal diagonal    −0.00463651    0.00000710876   0.011115        −0.0034216       −0.00922371
  Minor diagonal        0.0127711      0.00677395      0.00671256      0.0121195        0.00781511

Efficiency

In the proposed scheme, the bit-level permutation is performed in linear time, and the diffusion phase is also linear; if the encrypted image is of size M × N, the algorithm's time complexity is O(MN). The time complexity of the algorithm in [4] is also O(MN), although our bit-level scheme is slower than the pixel-level scheme of [4]. In [2], the permutation is accompanied by a sorting operation, so that scheme's efficiency depends on the adopted sorting algorithm. In [28], the algorithm's time complexity is O(MN(M + N)). A comparison of these algorithms' running times is presented in Table 7.

Table 7. Efficiency of algorithms.

  Image                         Proposed Scheme    [2]            [4]             [28]
  Chemical plant (256 × 256)    0.008 s            0.02 s         0.004 s         0.03 s
  Clock (256 × 256)             0.008 s            0.019 s        0.003 s         0.028 s
  Moon surface (256 × 256)      0.008 s            0.019 s        0.003 s         0.029 s
  Boat (512 × 512)              0.029 s            0.093 s        0.011 s         0.236 s
  Couple (512 × 512)            0.03 s             0.094 s        0.012 s         0.234 s
  Lena (512 × 512)              0.028 s            0.091 s        0.009 s         0.239 s
  Tank (512 × 512)              0.029 s            0.091 s        0.011 s         0.243 s
  Airplane (1024 × 1024)        0.129 s            0.441 s        0.032 s         2.392 s
  Airport (1024 × 1024)         0.121 s            0.447 s        0.033 s         2.388 s
  Male (1024 × 1024)            0.116 s            0.434 s        0.033 s         2.398 s
  Average throughput            66.062 Mbps        21.884 Mbps    194.571 Mbps    9.542 Mbps

CONCLUSIONS

In this paper, a 1D adjusted Bernoulli map suitable for encryption systems is proposed. Based on the new chaotic map, an image encryption algorithm was designed. The permutation phase is realized by generating a random Hamiltonian path, which is performed across different bit planes; the idea of the random Hamiltonian path is then extended to the substitution of grey levels in the diffusion phase. Various criteria indicate that our scheme performs well. In addition, to measure the correlation of adjacent pixels more reasonably, both the principal diagonal direction and the minor diagonal direction are considered when calculating correlation coefficients.

ACKNOWLEDGMENTS

We thank the anonymous reviewers for their helpful suggestions.

CONFLICTS OF INTEREST

The authors declare no conflict of interest.


REFERENCES
1. Khan, M.; Shah, T. A literature review on image encryption techniques. 3D Res. 2014, 5, 29.
2. Hua, Z.Y.; Zhou, Y.C.; Pun, C.M.; Chen, C.L.P. 2D sine logistic modulation map for image encryption. Inf. Sci. 2015, 297, 80–94.
3. Wu, Y.; Zhou, Y.; Agaian, S.; Noonan, J.P. 2D Sudoku associated bijections for image scrambling. Inf. Sci. 2016, 327, 91–109.
4. Ye, G.; Zhao, H.; Chai, H. Chaotic image encryption algorithm using wave-line permutation and block diffusion. Nonlinear Dyn. 2016, 83, 2067–2077.
5. Xu, M.; Tian, Z. A novel image cipher based on 3D bit matrix and latin cubes. Inf. Sci. 2019, 478, 1–14.
6. Diaconu, A.V. Circular inter-intra bit-level permutation and chaos-based image encryption. Inf. Sci. 2016, 355, 314–327.
7. Zhang, W.; Wong, K.; Yu, H.; Zhu, Z. A symmetric color image encryption algorithm using the intrinsic features of bit distributions. Commun. Nonlinear Sci. Numer. Simul. 2013, 18, 584–600.
8. Cao, C.; Sun, K.; Liu, W. A novel bit-level image encryption algorithm based on 2D-LICM hyperchaotic map. Signal Process. 2018, 143, 122–133.
9. Hua, Z.; Zhou, Y.; Chen, C.L.P. A new series-wound framework for generating 1D chaotic maps. In Proceedings of the 2013 IEEE Digital Signal Processing and Signal Processing Education Meeting, Napa, CA, USA, 11–14 August 2013; pp. 118–123.
10. Lan, R.; He, J.; Wang, S.; Gu, T.; Luo, X. Integrated chaotic systems for image encryption. Signal Process. 2018, 147, 133–145.
11. Pak, C.; Huang, L. A new color image encryption using combination of the 1D chaotic map. Signal Process. 2017, 138, 129–137.
12. Zhou, Y.; Bao, L.; Chen, C.L.P. Image encryption using a new parametric switching chaotic system. Signal Process. 2013, 93, 3039–3052.
13. Chapaneri, S.; Chapaneri, R.; Sarode, T. Evaluation of Chaotic Map Lattice Systems for Image Encryption. In Proceedings of the 2014 International Conference on Circuits, Systems, Communication and Information Technology Applications (CSCITA), Mumbai, India, 4–5 April 2014; pp. 59–64.
14. Tang, Y.; Wang, Z.; Fang, J. Image encryption using chaotic coupled map lattices with time-varying delays. Commun. Nonlinear Sci. Numer. Simul. 2010, 15, 2456–2468.
15. Bellman, R.; Cooke, K.L. The Konigsberg bridges problem generalized. J. Math. Anal. Appl. 1969, 25, 1–7.
16. Bax, E.T. Inclusion and exclusion algorithm for the Hamiltonian path problem. Inf. Process. Lett. 1993, 47, 203–207.
17. Bertossi, A.A. The edge Hamiltonian path problem is NP-complete. Inf. Process. Lett. 1981, 13, 157–159.
18. Conrad, A.; Hindrichs, T.; Morsy, H.; Wegener, I. Solution of the knight's Hamiltonian path problem on chessboards. Discret. Appl. Math. 1994, 50, 125–134.
19. Schiermeyer, I. Problems remaining NP-complete for sparse or dense graphs. Discuss. Math. Gr. Theory 1995, 15, 33–41.
20. Baumgardner, J.; Acker, K.; Adefuye, O.; Crowley, T.S.; DeLoache, W.; Dickson, J.O.; Heard, L.; Martens, A.; Morton, N.; Ritter, M.; et al. Solving a hamiltonian path problem with a bacterial computer. J. Biol. Eng. 2009, 3, 11.
21. Oltean, M. Solving the Hamiltonian path problem with a light-based computer. Nat. Comput. 2007, 7, 57–70.
22. Zhang, W.; Zhu, Z.; Yu, H. A symmetric image encryption algorithm based on a coupled logistic–bernoulli map and cellular automata diffusion strategy. Entropy 2019, 21, 504.
23. Saito, A.; Yamaguchi, A. Pseudorandom number generator based on the Bernoulli map on cubic algebraic integers. Chaos Interdiscip. J. Nonlinear Sci. 2017, 28, 1054–1500.
24. Dong, L.; Yong, Z.; Ji, L.; Han, X. Study on the Pass Rate of NIST SP800-22 Statistical Test Suite. In Proceedings of the 2014 Tenth International Conference on Computational Intelligence and Security (CIS), Kunming, China, 15–16 November 2014; pp. 402–404.
25. The USC-SIPI Image Database. Available online: http://sipi.usc.edu/database/database.php
26. Alvarez, G.; Li, S. Some basic cryptographic requirements for chaos-based cryptosystems. Int. J. Bifurc. Chaos 2006, 16, 2129–2151.
27. Castro, J.C.H.; Sierra, J.M.; Seznec, A.; Izquierdo, A.; Ribagorda, A. The strict avalanche criterion randomness test. Math. Comput. Simul. 2005, 68, 1–7.
28. Zhang, Y. The unified image encryption algorithm based on chaos and cubic S-Box. Inf. Sci. 2018, 450, 361–377.
29. Wu, Y.; Noonan, J.P.; Agaian, S. NPCR and UACI randomness tests for image encryption. Cyber J. Multidiscip. J. Sci. Technol. J. Sel. Areas Telecommun. 2011, 7714, 31–38.
30. Ji, X.; Bai, S.; Guo, Y.; Guo, H. A new security solution to JPEG using hyper-chaotic system and modified zigzag scan coding. Commun. Nonlinear Sci. Numer. Simul. 2015, 22, 321–333.
31. Maniccam, S.S.; Bourbakis, N.G. Image and video encryption using SCAN patterns. Pattern Recognit. 2004, 37, 725–737.
32. Ramasamy, P.; Ranganathan, V.; Kadry, S.; Damaševičius, R.; Blažauskas, T. An image encryption scheme based on block scrambling, modified zigzag transformation and key generation using enhanced logistic–tent map. Entropy 2019, 21, 656.
33. Richter, T. Lossless coding extensions for JPEG. In Proceedings of the Data Compression Conference, Snowbird, UT, USA, 7–9 April 2015; pp. 143–152.
34. Chai, X.; Zheng, X.; Gan, Z.; Han, D.; Chen, Y. An image encryption algorithm based on chaotic system and compressive sensing. Signal Process. 2018, 148, 124–144.

Chapter 3
Traveling in Networks with Blinking Nodes

Braxton Carrigan1 and James Hammer2
1. Southern CT State University
2. Cedar Crest College

ABSTRACT

We say that a blinking node system modulo n is an ordered pair (G, L) where G is a graph and L is an on-labelling which indicates when vertices can be visited. An On-Hamiltonian walk is a sequence of all the vertices of G such that the position of each vertex modulo n is an element of the label of that vertex. This paper primarily investigates finding the shortest On-Hamiltonian walks in a blinking node system on complete graphs and complete bipartite graphs, but it also establishes the terminology and initial observations for working with blinking node systems on other graphs.

Citation: (APA): Carrigan, B., & Hammer, J. (2018). Traveling in Networks with Blinking Nodes. Theory and Applications of Graphs, 5(1), 2. (11 pages) DOI: https://doi.org/10.20429/tag.2018.050102
Copyright: © This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

INTRODUCTION

Consider the situation where you wish to travel in a graph and visit every vertex; however, the vertices are labelled with discrete times at which they can be visited, and a vertex cannot be visited at a time for which it is not labelled. This can be thought of as the vertices blinking "on" and "off", indicating when each vertex can be visited.

Problem (Traveling Baseball Fan). Imagine a baseball fan wishes to see a game in every baseball stadium. A stadium is considered "on" if a game is played in that stadium on that night (the lights of that stadium are on). Natural questions arise about how one might travel to see a baseball game in every city, and other variations arise naturally where a city may be visited more than once.

This is obviously not unique to our baseball analogy, as it is natural to think of moving inside any network with restrictions on when a node may be visited; for instance, virus scans on servers in use, a janitor cleaning rooms at a university while classes are in session, etc. Any scheduling problem that relies on an object travelling through a graph to "check" every node, while being careful to visit nodes at allotted times, can be viewed in this fashion. While many applications may also incorporate other restrictions, such as those discussed for the travelling salesman [6], vehicle routing [2], and travelling purchaser [8] problems, we assume the baseball fan can travel from one stadium to the next before the next game takes place, without financial or time constraints.

Definition 1.1. Let B = (G, L) be a blinking node system modulo n, where G is a graph and L : V(G) → P(Zn) \ {∅} is a function known as an on-labelling. When the underlying graph is a complete graph on n vertices and the labellings are done modulo n, we denote the blinking node system as BNS(n).

The blinking node system shown in Figure 1 contains a walk [a, b, g, i, d, c, h, j, e, a, f], highlighted in red, which only "visits" a vertex when it is "on."

Observation 1. If n ≠ |V(G)|, it is usually possible to re-label the vertices modulo |V(G)|. For instance, you can re-label vertex a in Figure 1 as {0, 2, 4, 6, 8}. However, one needs to be careful doing so, as similarly re-labelling vertex c to {0, 1, 3, 6, 7, 9} may cause issues, because 10 ≡ 0 (mod 10) but 10 ≡ 4 (mod 6).


While other variations may be interesting, the above observation leads us to focus our results on blinking node systems with modulo |V (G)| in this paper.

Figure 1: A Blinking Node System on the Petersen Graph modulo 5

Definition 1.2. Let W be a walk of length w − 1 represented by a sequence of w vertices in the graph G of a blinking node system. W is called an on-walk if, for all 1 ≤ i ≤ w, the vertex vi in the ith position of W satisfies i − 1 (mod n) ∈ L(vi).
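As a small illustration of Definition 1.2, the following sketch (with an assumed vertex-index and label representation, not taken from the paper) tests whether a given vertex sequence is an on-walk; edge existence is not checked, so on complete graphs this test alone suffices.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Definition 1.2: the vertex in position i (1-indexed) must have (i - 1) mod n
// in its on-labelling.  Vertices are 0..|V|-1 and labels[v] represents L(v).
bool isOnWalk(const std::vector<int>& walk, const std::vector<std::set<int>>& labels, int n) {
    for (std::size_t i = 0; i < walk.size(); ++i)
        if (labels[walk[i]].count(static_cast<int>(i) % n) == 0)
            return false;
    return true;
}
```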

More precisely, we will be looking at on-walks that visit every vertex in G, which we call On-Hamiltonian walks (not to be confused with Hamiltonian walks, which are closed). Furthermore, an On-Hamiltonian walk of length |V(G)| − 1 will be called an On-Hamiltonian path.

"Are there blinking node systems where no On-Hamiltonian walk exists?" If the graph is not connected, then obviously the BNS will not contain an On-Hamiltonian walk, but even blinking node systems on connected graphs can be constructed without On-Hamiltonian walks, such as the one illustrated in Figure 2.

Figure 2: Graph with 2


Remark 1.1. A necessary condition for the existence of an On-Hamiltonian walk in a blinking node system B = (G, L) modulo n is that for every i ∈ Zn there exists some v ∈ V(G) such that i ∈ L(v); otherwise stated, the union of the labels L(v) over all v ∈ V(G) is Zn.

One can find stronger necessary conditions for the existence of an On-Hamiltonian walk. For example, consider the set Si = {v | i ∈ L(v)}; there must exist a vertex w ∈ N(Si) such that i − 1 ∈ L(w). If this does not hold, no on-walk contains a neighbor of a vertex of Si at time i − 1, therefore the on-walk does not contain a vertex from Si at time i, and hence it is not an on-walk.

"How long can an On-Hamiltonian walk be?" Assuming there is no restriction requiring a vertex to be visited when it is available before revisiting another vertex, these walks can be as long as one wishes. This is shown in the BNS(5) of Figure 3, where the sequence {b, e, b, a, d, . . . , b, e, b, a, d, c}, with the substring {b, e, b, a, d} repeated as many times as desired, is an On-Hamiltonian walk.

Figure 3: BNS (5).

The literature is rich on the many variations of the existence of Hamiltonian paths and cycles [4]. However, deciding whether an undirected graph contains a Hamiltonian path is NP-complete [3], so we only explore blinking node systems for which the underlying graph is known to have a Hamiltonian path. We start by considering G = Kn, for which any sequence of vertices is a walk, and ask the general question: does a BNS(n) contain an On-Hamiltonian walk? Remark 1.1 states a necessary condition, and for complete graphs it is also sufficient for an On-Hamiltonian walk to exist. One can achieve this by considering any arbitrary ordering of the vertices and greedily placing each vertex v at the first unused time t with t (mod n) ∈ L(v).


Any unused time slots can then be filled by duplicating a vertex available at such a time; however, this may require visiting the same vertex in consecutive time slots. We will discuss the idea of repeating a vertex later in Section 2 when searching for the shortest On-Hamiltonian walk for graphs that do not contain an On-Hamiltonian path.

Observation 2. If all but one vertex have the same labelling of size one, the shortest On-Hamiltonian walk has length 2n − 2, which is the longest such shortest On-Hamiltonian walk a BNS(n) can have.

Since we have a necessary and sufficient condition for the existence of an On-Hamiltonian walk in a BNS(n), it is natural to search for the shortest On-Hamiltonian walk in a BNS(n). We begin by giving necessary and sufficient conditions for the existence of an On-Hamiltonian path in Theorem 2.2 and then generalize to find the shortest On-Hamiltonian walk in Theorem 2.3. Finally, we extend these results to On-Hamiltonian walks in blinking node systems on complete bipartite graphs in Theorems 3.1 and 3.2, and then apply all of these results to obtain Corollary 3.3.

It should be noted that the conditions in Theorem 2.2 are similar to Hall's Theorem on matchings in bipartite graphs [5]. One can see this by constructing a bipartite graph B(X, Y) with parts X = V(Kn) and Y = {x | 0 ≤ x ≤ n − 1}, and edges E(B) = {(x, y) | x ∈ X, y ∈ Y and y ∈ L(x)}, representing the labelling of the blinking node system. Applying Hall's Theorem to B(X, Y) then determines whether the blinking node system contains an On-Hamiltonian path. However, this approach is highly dependent on the underlying graph of the blinking node system being complete, since B(X, Y) neglects to model the edges of the graph in the blinking node system. Therefore we provide a proof devoid of Hall's theorem, in the spirit of generalizing this process to non-complete graphs. Furthermore, since Hall's Theorem is typically proven by utilizing augmenting paths of a matching via Berge's Theorem [1], one can see that our proof is different, since we begin with a bijection assigning all the vertices to time slots and perform "switches" to find an On-Hamiltonian path.

TRAVELING IN COMPLETE GRAPHS

Let us begin with a small example on 5 vertices where each labelling has the same size. One could easily check all possible Hamiltonian paths in the graph in search of an On-Hamiltonian path, but for the purpose of our results, we illustrate an algorithmic approach to finding such an On-Hamiltonian path in Figure 4.


The algorithm shown in Table 4b finds a vertex v that is in a column numbered x ∉ L(v); that vertex is then "switched" with a vertex w such that x ∈ L(w). We formalize this "switching process" in Algorithm 2.1, and the proof of Theorem 2.2 shows that the algorithm produces, in a finite number of steps, a sequence in which the number of vertices in "incorrect" columns is 0, hence producing an On-Hamiltonian path.

Figure 4: Finding an On-Hamiltonian path

Algorithm 2.1.
0. Let R : V(G) → Zn be a bijection and T = ∅.
1. While there exists v ∈ V(G) such that R(v) ∉ L(v):
(a) Find w ∈ V(G) such that (w, v) ∉ T and R(w) ∈ L(v).
(b) Define the permutation σ = (R(w), R(v)).
(c) Set R = σ ◦ R and T = T ∪ {(w, v)}.
*Note that (x, y) ◦ R is a transposition of the vertices mapped to x and y, respectively.
2. Define the sequence H such that v is the (R(v) + 1)st term for all v ∈ V(G).
3. Output H.

Since we are mostly concerned with vertices that are not mapped to integers in their labelling, let us define the following:

Definition 2.1. Given the label L(v) of the vertex v, the complement of L(v) is Zn \ L(v).

Definition 2.2.

Theorem 2.2. There exists an On-Hamiltonian path in a BNS(n) if and only if, for every S ∈ P(Zn), S ∩ L(v) = ∅ for at most n − |S| vertices of Kn.


Proof. We will first show that if the condition is not met, no On-Hamiltonian path exists; otherwise, an On-Hamiltonian path is created by Algorithm 2.1.

For contradiction, consider the negation: assume there is a subset S ∈ P(Zn) and a set V of vertices with S ∩ L(v) = ∅ for every v ∈ V, where n − |V| < |S|. Then for every time i ∈ S, an On-Hamiltonian path must use a vertex w ∉ V; however, since |S| > n − |V|, there are not enough such vertices to be mapped to the elements of S. Therefore, by the pigeonhole principle, no such On-Hamiltonian path exists.

Now we assume that for every S ∈ P(Zn) at least |S| vertices of Kn have a label meeting S, and we will construct an On-Hamiltonian path; Algorithm 2.1 will suffice for constructing it. When |S| = 1 this assumption is equivalent to Remark 1.1, so whenever a vertex v ∈ V(G) has R(v) = i with i ∉ L(v), there exists a vertex w such that i ∈ L(w). Furthermore, for such a vertex v there are at least n − |L(v)| vertices whose labels contain an element outside L(v), but only n − |L(v)| − 1 other positions j ∉ L(v); therefore there are enough vertices that can be transposed with v to guarantee that Algorithm 2.1 will create R such that R(v) ∈ L(v). Since each transposition maps w to an element within its labelling, no step of the algorithm increases the number of vertices mapped outside their labels, and once v is transposed so that R(v) ∈ L(v) the algorithm decreases it. Finally, keeping track of the transpositions with T ensures that Step 1 of Algorithm 2.1 terminates; therefore Algorithm 2.1 generates an On-Hamiltonian path in the BNS(n).
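A rough C++ sketch of Algorithm 2.1 follows. The vertex numbering, the position array standing in for R, and the failure return value are illustrative assumptions; the guarantee that the switching succeeds is exactly the condition of Theorem 2.2.

```cpp
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Algorithm 2.1 (switching process) for a BNS(n) on a complete graph.
// labels[v] is L(v); position[v] plays the role of R(v).  Returns the vertex
// visiting order for time slots 0..n-1, or nullopt if no switch sequence was found.
std::optional<std::vector<int>> onHamiltonianPath(const std::vector<std::set<int>>& labels) {
    const int n = static_cast<int>(labels.size());
    std::vector<int> position(n);
    for (int v = 0; v < n; ++v) position[v] = v;            // initial bijection R
    std::set<std::pair<int, int>> tried;                    // the set T of used switches

    bool changed = true;
    while (changed) {
        changed = false;
        for (int v = 0; v < n; ++v) {
            if (labels[v].count(position[v])) continue;      // R(v) already lies in L(v)
            for (int w = 0; w < n; ++w) {
                if (w == v || tried.count({w, v})) continue;
                if (labels[v].count(position[w])) {           // R(w) in L(v): switch v and w
                    std::swap(position[v], position[w]);
                    tried.insert({w, v});
                    changed = true;
                    break;
                }
            }
        }
    }
    std::vector<int> order(n);
    for (int v = 0; v < n; ++v) {
        if (!labels[v].count(position[v])) return std::nullopt;   // condition of Theorem 2.2 failed
        order[position[v]] = v;
    }
    return order;
}
```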

“What is the shortest walk in a BNS where the conditions of Theorem 2.2 do not hold?” Let us say that such a set S appears r times, where r > n − |S|. The proof of Theorem 2.2 gives a natural lower bound of n + r − |S| for the length of an On-Hamiltonian walk. However, the labelling may impose further restrictions, as shown in Figure 5, which illustrates a BNS(6) where the length of the shortest On-Hamiltonian walk is 8 ([a, f, b, f, f, c, d, f, e]) compared to the stated lower bound of 7. Also, it may be necessary to visit a vertex in consecutive time slots, so we add loops at every vertex to illustrate the walk travelling along an edge when a vertex is adjacent to itself in the sequence of an On-Hamiltonian walk.


Figure 5: BNS (6) with loops.

Let S ∈ P(Zn) and VS = {v | S ∩ L(v) = ∅}. If |S| + |VS| > n, let eS = |S| + |VS| and AS = {nx + i | i ∈ LS and x a non-negative integer}, so that for every a ∈ AS, a (mod n) ∈ LS. Write AS = {a1, a2, a3, . . .} with ai < aj whenever i < j. Finally, let bS = aeS.

Theorem 2.3. If Remark 1.1 is satisfied, then the length of the shortest On-Hamiltonian walk in the BNS(n) is l = max{bS | S ∈ P(Zn)}.

Since the only sets S ∈ P(Zn) requiring us to use more than n vertices are those with |S| + |VS| > n, and any superset of such a set with this property provides greater restrictions, we define the set of deficiency time slots as D = {S ∈ P(Zn) | |S| + |VS| > n}.

Proof. We wish to build a one-to-one relation R : V(G) → Z≥0, similar to that described in Algorithm 2.1, for which each element of V(G) is related to at least one element of Z≥0 and the range is {x | 0 ≤ x ≤ t}. Given an S ∈ P(Zn), the vertices in VS need to be related to |VS| integers, for which bS must be used; consequently t ≥ l, and thus any On-Hamiltonian walk in the BNS is at least this long. It therefore suffices to show that there exists a one-to-one relation R with t = l, so we define R : V(G) → {0, 1, . . . , l − 1}. Let M = {S ∈ D | bS = l} and let VM be the corresponding set of vertices.

Define R(v) = l − 1 for some v ∈ VM. The BNS(n − 1) on G − v with the same on-labelling satisfies max{bS | S ∈ P(Zn)} < l. Thus recursively


assign vertices of G to integers between n and l − 1 until Theorem 2.2 is satisfied, at which point all remaining vertices can be assigned to integers less than n using Algorithm 2.1. Finally, for every 0 ≤ i < l without an assigned vertex, define R(v) = i for some v ∈ V(G) such that i ∈ L(v); hence an On-Hamiltonian walk of length l exists in which v is the (R(v) + 1)st term for all v ∈ V(G).

TRAVELING IN COMPLETE BIPARTITE GRAPHS

We will denote a blinking node system on Km,n modulo m + n as BNS(m, n). This section examines similar arguments for complete bipartite graphs; therefore, assume the underlying graph of a BNS is Km,n with bi-partition X and Y. The structure of bipartite graphs ensures that any walk alternates between vertices in the set X and vertices in the set Y; therefore the removal of any odd integer from L(v) for v in the partition visited at even time intervals, and of any even integer from L(w) for w in the other partition, will not prohibit the existence of an On-Hamiltonian path. Figure 6 illustrates this removal process.

Figure 6: The Removal Process.

Furthermore, we will view an on-walk as the alternation between two sequences: one sequence consisting only of vertices from X and the other consisting only of vertices from Y. Therefore it is natural to think of each partition as a complete graph needing an On-Hamiltonian path, but in a way that uses only every other time interval, motivating the following definitions.

Definition 3.1. Let S ⊆ V(G) be a part of the bi-partition of Km,n. Define BS(even) to be a BNS(|S|) on a complete graph on S with labelling M, where i ∈ M(v) if and only if 2i ∈ L(v) for all v ∈ S.

Definition 3.2. Let S ⊆ V(G) be a part of the bi-partition of Km,n. Define BS(odd) to be a BNS(|S|) on a complete graph on S with labelling N, where i ∈ N(v) if and only if 2i + 1 ∈ L(v) for all v ∈ S.
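Definitions 3.1 and 3.2 can be realized by a simple relabelling routine; the sketch below (with an assumed representation of labels as integer sets, and the partition size m + n passed as total) restricts one partition's labels to even or odd time slots and rescales them.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Build the labelling M (even) or N (odd) of Definitions 3.1/3.2:
// i belongs to the new label of v iff 2i (resp. 2i + 1) belongs to L(v).
std::vector<std::set<int>> restrictToParity(const std::vector<std::set<int>>& labels,
                                            int total, bool odd) {
    std::vector<std::set<int>> result(labels.size());
    for (std::size_t v = 0; v < labels.size(); ++v)
        for (int i = 0; 2 * i + (odd ? 1 : 0) < total; ++i)
            if (labels[v].count(2 * i + (odd ? 1 : 0)))
                result[v].insert(i);
    return result;
}
```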

Consider a BNS(m, n) with bi-partition X, Y. Without loss of generality, we can assume |X| = m ≥ n = |Y|. By Ore's theorem [7], it is clear that |m − n| ≤ 1 is a necessary condition, so we need only consider two cases: either m = n + 1 or m = n.

Theorem 3.1. A BNS(n + 1, n), B, with bi-partition X and Y such that |X| = n + 1 and |Y| = n has an On-Hamiltonian path if and only if BX(even) and BY(odd) both contain On-Hamiltonian paths.

Proof. Since any Hamiltonian path in Kn+1,n must begin and end in X, we consider BX(even) and BY(odd). If B contains an On-Hamiltonian path H, then BX(even) contains an On-Hamiltonian path corresponding to the sequence of vertices visited by H at even indexed times, and BY(odd) contains an On-Hamiltonian path corresponding to the sequence of vertices visited by H at odd indexed times. Conversely, if both BX(even) and BY(odd) contain On-Hamiltonian paths, then the path constructed by alternating vertices from these two paths is an On-Hamiltonian path in B. Therefore it is both necessary and sufficient that BX(even) and BY(odd) satisfy the condition stated in Theorem 2.2 for B to contain an On-Hamiltonian path.

Theorem 3.2. A BNS(n, n), B, with bi-partition X and Y has an On-Hamiltonian path if and only if BX(even) and BY(odd) both contain On-Hamiltonian paths, or BX(odd) and BY(even) both contain On-Hamiltonian paths.

Proof. A Hamiltonian path in Kn,n either begins in X and ends in Y or begins in Y and ends in X; thus we recognize the possible need for both odd and even integers in the labelling of all vertices. However, without loss of generality we can assume the On-Hamiltonian path starts in X, and the same analysis as in Theorem 3.1 applies; thus it is both necessary and sufficient that BX(even) and BY(odd) both satisfy the condition stated in Theorem 2.2 for B to contain an On-Hamiltonian path starting at a vertex in X.

Let B be a BNS(m, n) with bi-partition X and Y. A similar removal can be used to find a shortest On-Hamiltonian walk, although care needs to be taken to consider the On-Hamiltonian walk starting in either partition, as in Theorem 3.2. Essentially, we wish to apply Theorem 2.3 to both BX(even) and BY(odd), or to BX(odd) and BY(even), depending on the sizes of X and Y.

To consider an On-Hamiltonian walk starting in X, define x1 to be the length of the shortest On-Hamiltonian walk in BX(even) and y1 to be the length of the shortest On-Hamiltonian walk in BY(odd), and let l1 = max{2x1, 2y1 + 1}.


Likewise, to consider an On-Hamiltonian walk starting in Y, define x2 to be the length of the shortest On-Hamiltonian walk in BX(odd) and y2 to be the length of the shortest On-Hamiltonian walk in BY(even), so that l2 = max{2x2 + 1, 2y2}.

Corollary 3.3. The minimum On-Hamiltonian walk in B = BNS(m, n) has length l = min{l1, l2}.

OPEN PROBLEMS

As with any formalization of new language around a problem, there are many ways one can interpret the elements to ask new problems. In the case of on-walks in blinking node systems, there are the obvious questions about when they exist or what conditions prohibit such a walk. Furthermore, one could establish other criteria for the walks, as we did with the On-Hamiltonian walk, such as an Eulerian version, but we outline a few problems here that seem promising and are most related to what has been presented.

• Finding On-Hamiltonian paths in known Hamiltonian graphs. Keeping with the theme of the travelling baseball fan, it is natural to explore graphs which are known to be Hamiltonian. There are certainly characteristics, such as bridges, that reduce the search significantly, but we suggest looking at families of graphs which are highly connected, such as complete multipartite graphs, polyhedral graphs, and grid graphs.

• Finding On-Hamiltonian walks in non-Hamiltonian graphs. The graphs we considered were known to have a Hamiltonian path, but we then searched for the shortest On-Hamiltonian walk when Theorem 2.2 was not satisfied. Therefore it is natural to search for the shortest On-Hamiltonian walk in a BNS on an arbitrary graph. For instance, one can apply Corollary 3.3 to a star graph, but what about arbitrary trees? What conditions are necessary for such an On-Hamiltonian walk to exist? Can we find the shortest On-Hamiltonian walk?

• Finding shortest on-paths. Consider the application of needing to travel from a start vertex to a target vertex in a BNS. Any on-walk starting and ending at the desired vertices would meet the application, but we would want to find the shortest such on-walk, which would naturally be called the shortest on-path. The first step would be to decide whether such a path exists, then to find the shortest on-path. Again, we suspect the original inquiry should involve certain families of graphs, or at least specific knowledge of the connectivity between the starting and target vertices.


REFERENCES
1. Berge, C.: Two theorems in graph theory. Proceedings of the National Academy of Sciences 43(9), 842–844 (1957)
2. Dantzig, G.B., Ramser, J.H.: The truck dispatching problem. Management Science 6(1), 80–91 (1959)
3. Garey, M.R., Johnson, D.S., Stockmeyer, L.: Some simplified NP-complete graph problems. Theoretical Computer Science 1(3), 237–267 (1976)
4. Gould, R.J.: Recent advances on the Hamiltonian problem: Survey III. Graphs and Combinatorics 30(1), 1–46 (2014)
5. Hall, P.: On representatives of subsets. Journal of the London Mathematical Society 1(1), 26–30 (1935)
6. Kruskal, J.B.: On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society 7(1), 48–50 (1956)
7. Ore, O.: Note on Hamilton circuits. The American Mathematical Monthly 67(1), 55–55 (1960)
8. Ramesh, T.: Traveling purchaser problem. Opsearch 18(1–3), 78–91 (1981)

Chapter 4
On Minimum Spanning Subgraphs of Graphs with Proper Connection Number 2

Zhenming Bi1, Gary Chartrand1, Garry L. Johns1, Ping Zhang1
1. Western Michigan University

ABSTRACT

An edge coloring of a connected graph G is a proper-path coloring if every two vertices of G are connected by a properly colored path. The minimum number of colors required of a proper-path coloring of G is called the proper connection number pc(G) of G. For a connected graph G with proper connection number 2, the minimum size of a connected spanning subgraph H of G with pc(H) = 2 is denoted by µ(G). It is shown that if s and t are integers such that t ≥ s + 2 ≥ 5, then µ(Ks,t) = 2t − 2. We also determine µ(G) for several classes of complete multipartite graphs G. In particular, it is shown that if G = Kn1,n2,...,nk is a complete k-partite graph with t = nk ≥ r^2 + r, then µ(G) = 2t − 2r + 2.

Citation: (APA): Bi, Z., Chartrand, G., Johns, G. L., & Zhang, P. (2016). On minimum spanning subgraphs of graphs with proper connection number 2. Theory and Applications of Graphs, 3(2), 2. (15 pages) DOI: https://doi.org/10.20429/tag.2017.030202
Copyright: © This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

INTRODUCTION

An edge coloring of a connected graph G is a rainbow coloring of G if every two vertices are connected by a path no two edges of which are colored the same (a rainbow path). A connected graph G with a rainbow coloring is a rainbow-connected graph. The minimum number of colors required for a rainbow coloring of a connected graph G is its rainbow connection number rc(G). These concepts were introduced and studied by Chartrand, Johns, McKeon, and Zhang in 2006, resulting in the 2008 paper [3]. Since then, much research has been done on these topics; in fact, a book [5] has been written on rainbow connection in graphs.

An edge coloring of a graph G is a proper coloring of G if every two adjacent edges of G are assigned distinct colors. The minimum number of colors required for a proper edge coloring of G is its chromatic index, denoted by χ′(G). An edge coloring of a connected graph G is called a proper-path coloring if every two vertices of G are connected by a properly colored path (a proper path). The minimum number of colors required of a proper-path coloring of G is called the proper connection number pc(G) of G. Therefore, pc(G) ≤ χ′(G) for every connected graph G. This concept was independently introduced and studied in [1, 2]. Recently, much research has been done on these concepts and, in fact, there is a dynamic survey on this topic due to Li and Magnant [4].

In [1] it was shown that every complete multipartite graph that is neither a complete graph nor a star has proper connection number 2. In fact, many connected graphs have been shown to have proper connection number 2. On the other hand, a graph can have a large proper connection number; for example, pc(T) = ∆(T) for every tree T. Indeed, we have the following observation (see [1]).

Observation 1.1. Let G be a nontrivial connected graph containing bridges. If the maximum number of bridges incident with a single vertex in G is b, then pc(G) ≥ b.

While trees can have large proper connection number, this is not true for 2-connected graphs. The following theorem is due to Borozan, Fujita, Gerek, Magnant, Manoussakis, Montero, and Tuza in [2].


Theorem 1.2 If G is a 2-connected graph that is not complete, then pc(G) ≤ 3.

Figure 1: A 2-connected graph G with proper connection number 3.

The upper bound 3 in Theorem 1.2 cannot be improved, as it was shown in [2] that 2-connected graphs with proper connection number 3 exist. The 2-connected graph G in Figure 1 has proper connection number 3, and a proper-path 3-coloring of G is shown in Figure 1. The graph G in Figure 1 is, of course, also 2-edge-connected; that is, there exist 2-edge-connected graphs with proper connection number 3. While verifying that the proper connection number of the graph G of Figure 1 is 3 may require some time, there is a class of 2-edge-connected graphs with proper connection number 3 for which the verification is more immediate. A graph G is a friendship graph if every two distinct vertices of G have a unique common neighbor. A friendship graph has odd order 2k + 1 for some positive integer k. The friendship graph of order 3 is K3, and the friendship graph of order 2k + 1 ≥ 5 is obtained by identifying one vertex from each of k triangles.

Proposition 1.3. If G is the friendship graph of order 7, then pc(G) = 3.

Proof. Label the vertices of G as shown in Figure 2(a). First, the edge coloring c′ where c′(wx1) = c′(wy1) = c′(wz1) = 1, c′(wx2) = c′(wy2) = c′(wz2) = 2 and c′(x1x2) = c′(y1y2) = c′(z1z2) = 3 is a proper-path 3-coloring of G, and so pc(G) ≤ 3.


Figure 2: The friendship graph of order 7

Next, we show that pc(G) ≥ 3. Assume, to the contrary, that there is a proper-path 2-coloring c of G with the colors 1 and 2. Denote the triangle with vertices x1, x2, w by T1, the triangle with vertices y1, y2, w by T2 and the triangle with vertices z1, z2, w by T3, as shown in Figure 2(a). We claim that in each of the three triangles T1, T2, T3, the two edges incident with w are assigned different colors by c. Assume, to the contrary, that one of T1, T2, T3 has its two edges incident with w colored the same, say c(wx1) = c(wx2) = 1. Since there is a proper x1 − y1 path and there is a proper x1 − z1 path, at least one edge incident with w in T2 and T3 is colored 2, say c(wy1) = c(wz1) = 2. Since G contains a proper y1 − z1 path, either wy2 or wz2 is colored 1, say c(wy2) = 1. Since G contains a proper x1 − y2 path, c(y1y2) = 1. Thus, (y1, w, z2, z1) is the unique proper y1 − z1 path, which implies that c(wz2) = 1 and c(z1z2) = 2. However then, there is no proper x1 − z2 path, which produces a contradiction. Thus, as claimed, in each of three triangles T1, T2, T3, the two edges incident with w are assigned different colors by c. Hence, we may assume that c(wx1) = c(wy1) = c(wz1) = 1 and c(wx2) = c(wy2) = c(wz2) = 2, as shown in Figure 2(b).

Since G contains a proper x1 − y1 path, it follows that c(x1x2) = 1 or c(y1y2) = 1, say the former. There is now only one possible proper x2 − z2 path, namely (x2, w, z1, z2), which implies that c(z1z2) = 2. But then there is only one proper x2 − y2 path, namely (x2, w, y1, y2), which implies that c(y1y2) = 2. Thus, we arrive at the coloring of G shown in Figure 2(c). However, then there is no proper y1 − z1 path, producing a contradiction. Hence, pc(G) ≥ 3 and so pc(G) = 3.

The proof of Proposition 1.3 and the proper-path 3-coloring of G described in this proof can be extended to give the following result.

Corollary 1.4. Every friendship graph of order at least 7 has proper connection number 3.


We have noted that many graphs have proper connection number 2. Certainly, many 2-connected graphs have proper connection number 2. If G is a noncomplete connected graph containing a connected spanning subgraph H such that pc(H) = 2, then pc(G) = 2 as well. In fact, every non-complete connected supergraph F of H with V (F) = V (H) also has proper connection number 2. This suggests the following concept. For a connected graph G with pc(G) = 2, let µ(G) denote the minimum size of a connected spanning subgraph H of G with pc(H) = 2. In this context, we refer to a spanning subgraph H of G with µ(G) edges as a minimum spanning subgraph of G. In what follows, we determine µ(G) for some familiar graphs G with pc(G) = 2.

COMPLETE BIPARTITE GRAPHS

We now investigate this concept for complete multipartite graphs that are neither a star nor a complete graph, beginning with complete bipartite graphs Ks,t with 2 ≤ s ≤ t. Since Ks,t contains a Hamiltonian path if and only if t − s ≤ 1, it follows that this minimum size is t + s − 1 for these graphs. It therefore suffices to consider those graphs Ks,t with t − s ≥ 2. We begin with the graphs K2,t where t ≥ 4.

Theorem 2.1. For an integer t ≥ 4, µ(K2,t) = 2t − 2.

Proof. Let U = {u1, u2} and W = {w1, w2, . . . , wt} be the two partite sets of K2,t and let H = K2,t − {u1w2, u2w1}. Thus, the size of H is 2t − 2. We show (1) that pc(H) = 2 and (2) that H has the minimum size of a connected spanning subgraph of K2,t with proper connection number 2. First, we show that pc(H) = 2. Define the edge coloring c : E(H) → {1, 2} by

To verify that c is a proper-path coloring of H, we show that every two vertices of H are connected by a proper path. Let x and y be two nonadjacent vertices of H.

* If {x, y} = {u1, u2}, then (u1, w3, u2) is a proper u1 − u2 path. If {x, y} = {u1, w2}, then (u1, wt, u2, w2) is a proper u1 − w2 path. If {x, y} = {u2, w1}, then (u2, wt, u1, w1) is a proper u2 − w1 path.


* Let x = wi and y = wj where 1 ≤ i < j ≤ t. First, suppose that x ∈ {w1, w2}. If x = w1 and y = wj for 2 ≤ j ≤ t − 1, then (w1, u1, wt, u2, wj) is a proper w1 − wj path. If x = w1 and y = wt, then (w1, u1, wt) is a proper w1 − wt path. If x = w2 and y = wj for 3 ≤ j ≤ t − 1, then (w2, u2, wt, u1, wj) is a proper w2 − wj path. If x = w2 and y = wt, then (w2, u2, wt) is a proper w2 − wt path. Next, suppose that x = wi where 3 ≤ i ≤ t − 1. If y = wj for i + 1 ≤ j ≤ t − 1, then (wi, u1, wt, u2, wj) is a proper wi − wj path. If y = wt, then (wi, u1, wt) is a proper wi − wt path. Hence, c is a proper-path coloring of H and so pc(H) = 2.

Next, we show that H has the minimum size of a connected spanning subgraph of K2,t with proper connection number 2. Suppose that there is a connected spanning subgraph F of K2,t having fewer than 2t − 2 edges for which pc(F) = 2. Necessarily, at least three vertices of W have degree 1 in F. It cannot occur that three vertices of degree 1 in F are adjacent to the same vertex of U, for otherwise pc(F) ≥ 3 by Observation 1.1. Hence, we may assume that the vertices wi, 1 ≤ i ≤ 3, have degree 1 in F and that u1w1, u1w2, u2w3 are edges of F. Any proper-path coloring c : E(F) → {1, 2} of F must assign distinct colors to u1w1 and u1w2, say c(u1w1) = 1 and c(u1w2) = 2. We may assume, without loss of generality, that c(u2w3) = 1. Let P be a proper w1 − w3 path in F. Thus, P = (w1, u1, wj, u2, w3) for some integer j ≥ 4. Since c(w1u1) = c(u2w3) = 1 and P is a proper path, it follows that which is a contradiction.

Next, we determine µ(Ks,t) for all integers s and t with t ≥ s + 2 ≥ 5.

Theorem 2.2. If s and t are integers with t ≥ s + 2 ≥ 5, then µ(Ks,t) = 2t − 2.

Proof. Let G = Ks,t with partite sets U = {u1, u2, . . . , us} and W = {w1, w2, . . . , wt}. Write t = sq + r for integers q and r where 0 ≤ r ≤ s − 1. First, we construct a connected spanning subgraph H of G of size 2t − 2 such that pc(H) = 2. We consider two cases, according to whether r = 0 or 1 ≤ r ≤ s − 1.


Case 1. r = 0. Partition the set W into q ≥ 2 subsets W1, W2, . . . , Wq where |Wi | = s for 1 ≤ i ≤ q such that W1 = {w1, w2, . . . , ws} and W2 = {ws+1, ws+2, . . . , w2s}.

*First, suppose that s = 3. If q = 2, let Q1 = (u1, w3, u2, w4, u1) = C4 and Q2 = (u2, w5, u3, w6, u2) = C4. If q ≥ 3, then for each integer i with 3 ≤ i ≤ q, let Qi = C6 be a Hamiltonian cycle of the subgraph G[U ∪ Wi ] = K3,3 of G induced by U ∪ Wi . Let H be the spanning subgraph of G with (1)

* Next, suppose that s ≥ 4. Let U′ = U − {u1, u2} and W′ = W1 − {w1, w2}. Let Q1 = C2s−4 be a Hamiltonian cycle in the subgraph G[U′ ∪ W′] = Ks−2,s−2 of G induced by U′ ∪ W′ and, for each integer i with 2 ≤ i ≤ q, let Qi = C2s be a Hamiltonian cycle in the subgraph G[U ∪ Wi] = Ks,s of G induced by U ∪ Wi. Let H be the spanning subgraph of G whose edge set is described in (1). Hence, the size of H is 2t − 2. It remains to show that pc(H) = 2. We first define an edge coloring c : E(H) → {1, 2}. For 1 ≤ i ≤ q, let c be a proper edge coloring of the even cycle Qi. Also, let c(u1w1) = 1 and c(u1w2) = 2. Next, we show that c is a proper-path coloring of H. Let x and y be two nonadjacent vertices of H. If x and y lie on some even cycle Qp for some p ∈ {1, 2, . . . , q}, then there is a proper x − y path in H. Thus, we may assume that x and y do not lie on a common even cycle Qi where 1 ≤ i ≤ q.

First, suppose that s = 3. Since x and y do not lie on a common even cycle Qi where 1 ≤ i ≤ q, it follows that x, y ∈ U ∪ {w1, w2, . . . , w6}. Let H′ be the subgraph of H induced by U ∪ {w1, w2, . . . , w6}. We may furthermore assume that the edges of H′ are colored as shown in Figure 3, where a solid edge and a dashed edge are colored differently. If x, y ∈ U, say x = u1 and y = u3, then (u1, w3, u2, w5, u3) is a proper x − y path. Thus, we may assume that at least one of x and y does not belong to U, say y ∈ W. Then y = wi for some i ∈ {1, 2, . . . , 6}. A case-by-case argument shows that there is a proper x − y path in H′ (and so in H). For example, (u1, w4, u2, w6) is a proper u1 − w6 path and (w1, u1, w4, u2, w6) is a proper w1 − w6 path.

Next, suppose that s ≥ 4. Since every two distinct vertices of U lie on the even cycle Q2, it follows that at least one of x and y does not belong to U, say y ∈ W. We consider two subcases, according to whether x ∈ U or x ∈ W.

Subcase 1.1. x ∈ U. First, suppose that y = wi for 3 ≤ i ≤ s. Then x ∈ {u1, u2}. Since x lies on Q2 and y is adjacent to a vertex z on Q2, there are two x − y paths passing through z, one of which is proper. Next, suppose that y ∈ {w1, w2}. Since x and y are nonadjacent, x = ui for some i with 2 ≤ i ≤ s. Then x lies on Q2 and y is adjacent to the vertex u1 on Q2. Then there are two x − y paths passing through u1, one of which is proper.

Figure 3: A subgraph H’ in H when s = 3

Subcase 1.2. x ∈ W. We may assume that x = wi and y = wj where 1 ≤ i < j ≤ t. First, suppose that i ∈ {1, 2}. If (i, j) = (1, 2) or j ≥ s + 1, then (wi, u1, wj) is a proper wi − wj path in H. Next, suppose that 3 ≤ j ≤ s. Then x is adjacent to u1 of Q2 and y is adjacent to a vertex z of Q2 (where it is possible that u1 = z). There are two x − y paths passing through z, one of which is proper. Next, suppose that i ≥ 3. Hence, x ∈ Wa and y ∈ Wb where 1 ≤ a < b ≤ q. Then x is adjacent to a vertex z on Qb and y lies on Qb. Thus, there are two x − y paths passing through z, one of which is proper.

Case 2. 1 ≤ r ≤ s − 1. Partition the set W into q + 1 subsets W0, W1, W2, . . . , Wq where |W0| = r and |Wi| = s for 1 ≤ i ≤ q such that W0 = {w1, w2, . . . , wr} and W1 = {wr+1, wr+2, . . . , wr+s}.

* For r = 1, let W′ = W1 − {w2} and U′ = U − {u1}. Now, let Q1 = C2s−2 be a Hamiltonian cycle in the subgraph G[U′ ∪ W′] = Ks−1,s−1 of G induced by U′ ∪ W′ and, for 2 ≤ i ≤ q if q ≥ 2, let Qi = C2s be a Hamiltonian cycle in the subgraph G[U ∪ Wi] = Ks,s of G induced by U ∪ Wi. Let H be the spanning subgraph of G whose edge set is described in (1).

* For r = 2, let Qi = C2s be a Hamiltonian cycle in the subgraph G[U ∪ Wi] = Ks,s for 1 ≤ i ≤ q. Let H be the spanning subgraph of G whose edge set is described in (1).

* For r = 3, let Q0 = (u1, w3, u2, w4, u1) = C4. Let U′ = U − {u1} and W′ = W1 − {w4}. Now let Q1 = C2s−2 be a Hamiltonian cycle in the subgraph G[U′ ∪ W′] = Ks−1,s−1 of G induced by U′ ∪ W′. For each integer i with 2 ≤ i ≤ q, let Qi = C2s be a Hamiltonian cycle in the subgraph G[U ∪ Wi] = Ks,s of G induced by U ∪ Wi. Let H be the spanning subgraph of G whose edge set is

E(H) = {u1w1, u1w2} ∪ E(Q0) ∪ E(Q1) ∪ · · · ∪ E(Qq). (2)

* For r ≥ 4, let W′ = W0 − {w1, w2} and U′ = {u1, u2, . . . , ur−2}. Now, let Q0 = C2r−4 be a Hamiltonian cycle in the subgraph G[U′ ∪ W′] = Kr−2,r−2 of G induced by U′ ∪ W′ and, for 1 ≤ i ≤ q, let Qi = C2s be a Hamiltonian cycle in the subgraph G[U ∪ Wi] = Ks,s of G induced by U ∪ Wi. Let H be the spanning subgraph of G whose edge set is described in (2).

In each case, the size of H is 2t − 2. We define an edge coloring c : E(H) → {1, 2} such that each even cycle Qi (1 ≤ i ≤ q or 0 ≤ i ≤ q) is properly colored and c(u1w1) = 1 and c(u1w2) = 2. An argument similar to that in Case 1 shows that c is a proper-path coloring of H and so pc(H) = 2.

Next, we show that the minimum size of a connected spanning subgraph of Ks,t with proper connection number 2 is 2t − 2. Suppose that there is a connected spanning subgraph F of Ks,t having fewer than 2t − 2 edges but pc(F) = 2. Necessarily, at least three vertices of W have degree 1 in F. It cannot occur that three vertices of degree 1 in F are adjacent to the same vertex of U, for otherwise, pc(F) ≥ 3 by Observation 1.1. We may assume that the vertices wi, 1 ≤ i ≤ 3, have degree 1 in F. Suppose that e1, e2, e3 are the three pendant edges incident with w1, w2, w3, respectively. Any proper-path coloring c: E(F) → {1, 2} of F must assign the same color to two of e1, e2, e3, say c(e1) = c(e2) = 1. Thus, e1 and e2 cannot be adjacent, say e1 = u1w1 and e2 = u2w2. Let P be a proper w1 − w2 path in F, say P = (w1, u1, wi, . . . , wj, u2, w2) for some integers i, j ∈ {3, 4, . . . , t}. Let P′ = P − {w1, u1, u2, w2} be the subpath of P. Then P′ has even length and the edges of P′ are colored alternately 2 and 1. However then, c(wju2) = c(u2w2) = 1, which is a contradiction. Therefore, the minimum size of a connected spanning subgraph of Ks,t with proper connection number 2 is 2t − 2.

Combining Theorems 2.1 and 2.2, we obtain the following.

Corollary 2.3 If s and t are integers with t ≥ s + 2 ≥ 4, then µ(Ks,t) = 2t − 2.

The proof of Theorem 2.2 gives rise to the following useful corollary.

Corollary 2.4 Let F be a connected spanning subgraph of the complete bipartite graph Ks,t with partite sets U and W, where |U| = s, |W| = t and 1 ≤ s ≤ t and t ≥ 3. If at least three vertices of W have degree 1 in F, then pc(F) ≥ 3.


COMPLETE MULTIPARTITE GRAPHS
We now look at complete k-partite graphs for k ≥ 3. Let G = Kn1,n2,...,nk be the complete k-partite graph of order n = n1 + n2 + · · · + nk, where 1 ≤ n1 ≤ n2 ≤ · · · ≤ nk and k ≥ 3. Since G contains a Hamiltonian path if and only if nk ≤ n1 + n2 + · · · + nk−1 + 1, it follows that µ(G) = n − 1 in this case. Thus, it suffices to consider the case when nk ≥ n1 + n2 + · · · + nk−1 + 2. We begin by establishing an upper bound for µ(G) for such complete k-partite graphs G.

Proposition 3.1 Let G = Kn1,n2,...,nk be the complete k-partite graph of order n = n1 + n2 + · · · + nk, where 1 ≤ n1 ≤ n2 ≤ · · · ≤ nk and k ≥ 3. If nk ≥ n1 + n2 + · · · + nk−1 + 2, then µ(G) ≤ 2nk − 2.

Proof. Let V1, V2, . . . , Vk be the partite sets of G with |Vi| = ni for 1 ≤ i ≤ k and let F be the complete bipartite graph with partite sets V1 ∪ V2 ∪ · · · ∪ Vk−1 and Vk. It follows by the proofs of Theorems 2.1 and 2.2 that F contains a connected spanning subgraph H of size 2nk − 2 such that pc(H) = 2. Since F is a connected spanning subgraph of G, it follows that G contains a connected spanning subgraph H of size 2nk − 2 such that pc(H) = 2 and so µ(G) ≤ 2nk − 2.

Next, we describe a class of complete k-partite graphs where k ≥ 3 such that the upper bound described in Proposition 3.1 is attainable.

Proposition 3.2 Let G = Kn1,n2,...,nk be the complete k-partite graph of order n = n1 + n2 + · · · + nk, where 1 ≤ n1 ≤ n2 ≤ · · · ≤ nk and k ≥ 3. If nk = n1 + n2 + · · · + nk−1 + 2, then µ(G) = 2nk − 2.


Proof. Since G has order n = 2nk − 2, it follows by Proposition 3.1 that µ(G) ≤ 2nk − 2. Every connected spanning subgraph H of G of size 2nk − 3 = n − 1 is a tree. Since G contains no Hamiltonian path, it follows that pc(H) = ∆(H) ≥ 3. Thus, there is no connected spanning subgraph of size 2nk − 3 having proper connection number 2. Hence, µ(G) ≥ 2nk − 2 and so µ(G) = 2nk − 2.

We now investigate µ(G) for complete 3-partite graphs G. To simplify the notation, let G = Kr,s,t be a complete 3-partite graph where 1 ≤ r ≤ s ≤ t and t ≥ r + s + 2. We show that the upper bound described in Proposition 3.1 is attainable when r = s = 1 and t ≥ 4. First, we state a useful observation.

Observation 3.3 If F is a connected spanning subgraph of a graph G such that pc(F) = pc(G) = 2, then µ(G) ≤ µ(F).

Proposition 3.4 For each integer t ≥ 4, µ(K1,1,t) = 2t − 2.

Proof. Since K2,t ⊆ K1,1,t, it follows by Observation 3.3 and Theorem 2.1 that µ(K1,1,t) ≤ µ(K2,t) = 2t − 2. Thus, it suffices to show that µ(K1,1,t) ≥ 2t − 2. Let the partite sets of K1,1,t be {u}, {v} and W = {w1, w2, . . . , wt}. Assume, to the contrary, that there is a connected spanning subgraph F of K1,1,t of size m < 2t − 2 with pc(F) = 2. Then at least three vertices of W have degree 1 in F. First, suppose that exactly three vertices of W have degree 1 in F. Thus, m ≥ 2t − 3. Since m < 2t − 2, it follows that m = 2t − 3 and so uv ∉ E(F). However then, F is a bipartite subgraph of K2,t with partite sets {u, v} and W. Since F has three vertices of degree 1 in W, it then follows by Corollary 2.4 that pc(F) ≥ 3, which is impossible.

Next, suppose that at least five vertices of W have degree 1 in F. Then either u or v is incident with at least three bridges in F and so pc(F) ≥ 3 by Observation 1.1, which is a contradiction. Thus, exactly four vertices of W have degree 1 in F and each of u and v is adjacent to exactly two of these four vertices. Therefore, we may assume that the vertices wi (1 ≤ i ≤ 4) have degree 1 in F and that uw1, uw2, vw3, vw4 are edges of F. Furthermore, if uv ∉ E(F), then F is a bipartite subgraph of K2,t with partite sets {u, v} and W. Again, since F has four vertices of degree 1 in W, it then follows that pc(F) ≥ 3 by Corollary 2.4, which is impossible. Thus, we assume that uv ∈ E(F).

Any proper-path coloring c : E(F) → {1, 2} of F must assign distinct colors to uw1 and uw2, say c(uw1) = 1 and c(uw2) = 2, and distinct colors to vw3 and vw4, say c(vw3) = 1 and c(vw4) = 2. Assume, without loss of generality, that c(uv) = 1. Any proper w1 − w3 path P in F must begin with w1, u and terminate with v, w3. Necessarily, the only other vertex on P is wj for some j with 5 ≤ j ≤ t, that is, P = (w1, u, wj, v, w3). Since c(w1u) = c(vw3) = 1, it follows that c(uwj) = c(vwj) = 2, which is a contradiction.

Next, we show that the upper bound described in Proposition 3.1 can be strict if t is sufficiently large.

Proposition 3.5 For an integer t ≥ 11, if G ∈ {K1,2,t, K1,1,1,t}, then µ(G) = 2t − 4.

Proof. Since K1,2,t is a spanning subgraph of K1,1,1,t, it follows by Observation 3.3 that µ(K1,1,1,t) ≤ µ(K1,2,t). Thus, it suffices to show that µ(K1,2,t) ≤ 2t − 4 and µ(K1,1,1,t) ≥ 2t − 4 for each integer t ≥ 11.

We first show that µ(K1,2,t) ≤ 2t − 4. Let G = K1,2,t, whose partite sets are {x, z}, {y} and W = {w1, w2, . . . , wt}. Let F be the connected spanning subgraph of G obtained from the path P3 = (x, y, z) by joining w1 and w2 to x, w3 and w4 to y, w5 and w6 to z, w7 to x and z, wi to x and y for 8 ≤ i ≤ t − 2, and wi to y and z for i = t − 1, t. The graph F is shown in Figure 4 for t = 11. Since (i) there are six vertices of W of degree 1 in F, namely wi for 1 ≤ i ≤ 6, (ii) the remaining t − 6 vertices of W have degree 2 in F and (iii) there are exactly two edges in G[{x, y, z}], it follows that the size of F is 2t − 4.

Figure 4: A connected spanning subgraph F of K1,2,11 for t = 11.

Since there is a proper-path 2-edge coloring of F, it follows that pc(F) = 2. Such a proper-path 2-edge coloring of F is shown in Figure 5 for t = 11. If t ≥ 12, then (1) color the two edges wix and wiy (10 ≤ i ≤ t − 2) as w8x and w8y, (2) color the two edges wt−1y and wt−1z as w10y and w10z and (3) color the two edges wty and wtz as w11y and w11z. It can be verified that for every two nonadjacent vertices u and v of F, there is a proper u − v path in F. For example, if 12 ≤ i ≤ t − 2, then (w8, x, w9, y, wi) is a proper w8 − wi path in F.

Figure 5: A proper-path 2-edge coloring of F for t = 11.

Next, we show that µ(K1,1,1,t) ≥ 2t − 4. Let G = K1,1,1,t, whose partite sets are {x}, {y}, {z} and W = {w1, w2, . . . , wt}. Assume, to the contrary, that µ(G) ≤ 2t − 5. Then there is a connected spanning subgraph F of G of size m ≤ 2t − 5 with pc(F) = 2. We consider three cases, according to the number of edges in the subgraph G[{x, y, z}] induced by {x, y, z}.

Case 1. The subgraph G[{x, y, z}] contains no edge. Then F is a connected spanning subgraph of K3,t with pc(F) = 2. Since µ(K3,t) = 2t − 2, the size of F is at least 2t − 2, which is impossible.

Case 2. The subgraph G[{x, y, z}] contains exactly one edge, say e = xy is an edge of G[{x, y, z}]. Since the size of F is at most 2t − 5, at least six vertices of W have degree 1 in F. It cannot occur that three vertices of degree 1 in W are adjacent to the same vertex in F, for otherwise, pc(F) ≥ 3 by Observation 1.1. Thus, we may assume, without loss of generality, that the vertices wi (1 ≤ i ≤ 6) have degree 1 in F and that xw1, xw2, yw3, yw4, zw5, zw6 are edges of F. Any proper-path coloring c : E(F) → {1, 2} of F must assign distinct colors to the two adjacent pendant edges in F, say c(xw1) = c(yw3) = 1 and c(xw2) = c(yw4) = 2. Then a proper w1 − w3 path must be (w1, x, y, w3) and so c(xy) = 2; while a proper w2 − w4 path must be (w2, x, y, w4) and so c(xy) = 1, which is impossible.

Case 3. The subgraph G[{x, y, z}] contains at least two edges. Then at least seven vertices of W have degree 1 in F. Thus, three vertices of degree 1 in W are adjacent to the same vertex in F and so pc(F) ≥ 3 by Observation 1.1, which is impossible.

Hence, µ(K1,1,1,t) ≥ 2t − 4. Since 2t − 4 ≤ µ(K1,1,1,t) ≤ µ(K1,2,t) ≤ 2t − 4, it follows that µ(K1,1,1,t) = µ(K1,2,t) = 2t − 4.

Proposition 3.5 is a special case of a more general result. Let G = Kn1,n2,...,nk be the complete k-partite graph, where k ≥ 3, and let r = n1 + n2 + · · · + nk−1 and t = nk. We now present a formula for µ(G) when r ≥ 3 and t is sufficiently large compared to r.

Theorem 3.6 Let G = Kn1,n2,...,nk be a complete k-partite graph where r = n1 + n2 + · · · + nk−1 ≥ 3 and t = nk.

If t ≥ r² + r, then µ(G) = 2t − 2r + 2.

Proof. Denote the partite sets of G by V1, V2, . . . , Vk, where |Vi| = ni for 1 ≤ i ≤ k. First, we show that µ(G) ≥ 2t − 2r + 2. Let H be a minimum connected spanning subgraph of G with pc(H) = 2. Certainly, every vertex of Vk has degree at least 1 in H. Also, at most 2r vertices of Vk have degree 1, for otherwise, there are vertices of H incident with three or more pendant edges and so pc(H) ≥ 3 by Observation 1.1, contradicting the fact that pc(H) = 2. Thus, at most two vertices of Vk of degree 1 can be adjacent to the same vertex of V1 ∪ V2 ∪ · · · ∪ Vk−1. If any two vertices w′, w″ of degree 1 in Vk are incident with edges of the same color, then the subgraph of H induced by V1 ∪ V2 ∪ · · · ∪ Vk−1 must contain an edge of the other color. Thus, the size of H must be at least 2(t − 2r) + 2r + 2 = 2t − 2r + 2

and so µ(G) ≥ 2t − 2r + 2.

Next, we show that µ(G) ≤ 2t − 2r + 2. To verify this, we show that there exists a connected spanning subgraph F of size 2t − 2r + 2 in G such that pc(F) = 2. Write V1 ∪ V2 ∪ · · · ∪ Vk−1 = {v1, v2, . . . , vr}, where the only edges of F joining two vertices of {v1, v2, . . . , vr} are vr−2vr−1 and vr−1vr. Let Vk = {w1, w2, . . . , wt}, where degF wi = 1 for 1 ≤ i ≤ 2r. In particular, for 1 ≤ i ≤ r, w2i−1 and w2i are adjacent to vi. Next, we define an edge coloring c : E(F) → {1, 2} as follows. Let c(vr−2vr−1) = 1, c(vr−1vr) = 2, c(viw2i−1) = 1 and c(viw2i) = 2 for 1 ≤ i ≤ r. There are r(r − 1)/2 distinct pairs of vertices in {v1, v2, . . . , vr}. For each such pair {va, vb}, two vertices w′ and w″ in {w2r+1, w2r+2, . . . , wr²+r} are selected, both of which are joined to va and vb, where c(vaw′) = c(vbw″) = 1 and c(vaw″) = c(vbw′) = 2. If t > r² + r, then each vertex in {wr²+r+1, wr²+r+2, . . . , wt} is joined to two vertices in {v1, v2, . . . , vr}, where one edge is colored 1 and the other edge is colored 2. This completes the construction of F and the edge coloring c of F. The subgraph F and the coloring c of F are illustrated in Figure 6 for K2,2,20, where each solid edge is colored 1 and each dashed edge is colored 2.

Figure 6: A subgraph F of K2,2,20 and an edge coloring of F.
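To make the construction above concrete, the short sketch below (illustrative helper names, not from the paper) builds the colored subgraph F for the hypothetical parameters r = 3 and t = 12, counts its edges, and checks, by a breadth-first search over (vertex, last-edge-color) states, whether every pair of vertices is joined by a properly colored path.

from collections import deque
from itertools import combinations

def build_F(r, t):
    """Edge-colored graph as a dict: vertex -> list of (neighbor, color)."""
    assert r >= 3 and t >= r * r + r
    V = [f"v{i}" for i in range(1, r + 1)]
    W = [f"w{i}" for i in range(1, t + 1)]
    adj = {x: [] for x in V + W}
    def add(a, b, c):
        adj[a].append((b, c)); adj[b].append((a, c))
    add(V[r - 3], V[r - 2], 1)                 # v_{r-2} v_{r-1} colored 1
    add(V[r - 2], V[r - 1], 2)                 # v_{r-1} v_r colored 2
    for i in range(1, r + 1):                  # pendant vertices w_1, ..., w_{2r}
        add(V[i - 1], W[2 * i - 2], 1)
        add(V[i - 1], W[2 * i - 1], 2)
    pool = iter(range(2 * r, r * r + r))       # indices of w_{2r+1}, ..., w_{r^2+r}
    for a, b in combinations(range(r), 2):     # one pair of pool vertices per {v_a, v_b}
        wa, wb = W[next(pool)], W[next(pool)]
        add(V[a], wa, 1); add(V[b], wa, 2)
        add(V[a], wb, 2); add(V[b], wb, 1)
    for j in range(r * r + r, t):              # leftover w's get one edge of each color
        add(V[0], W[j], 1); add(V[1], W[j], 2)
    return adj

def has_proper_path(adj, s, goal):
    """BFS over states (vertex, color of last edge used); 0 means no edge used yet."""
    seen, q = {(s, 0)}, deque([(s, 0)])
    while q:
        x, last = q.popleft()
        if x == goal:
            return True
        for y, c in adj[x]:
            if c != last and (y, c) not in seen:
                seen.add((y, c)); q.append((y, c))
    return False

adj = build_F(3, 12)
print(sum(len(v) for v in adj.values()) // 2)   # 2t - 2r + 2 = 20 edges
pairs = combinations(adj, 2)
print(all(has_proper_path(adj, a, b) for a, b in pairs))  # True if F is properly connected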

We now show that c is a proper-path 2-coloring of F. It remains to show that every two nonadjacent vertices x and y of F are connected by a proper path. This is obvious if there is a proper path of length 2 connecting x and y. Thus, we consider the other possibilities, namely either x, y ∈ Vk or exactly one of x and y belongs to Vk. First, we consider the situation when x, y ∈ Vk. There are three cases.


Case 1. {x, y} = {wi, wj} where 1 ≤ i ≠ j ≤ 2r. Suppose that wivp and wjvq are edges in F where 1 ≤ p < q ≤ r. We consider three subcases, according to whether p < q ≤ r − 3, or p, q ≥ r − 2, or p ≤ r − 3 and q ≥ r − 2.

Subcase 1.1. p < q ≤ r − 3. If c(wivp) ≠ c(wjvq), say c(wivp) = 1 and c(wjvq) = 2, then there exists a vertex ws where s > 2r such that vpws, wsvq ∈ E(F) with c(vpws) = 2 and c(wsvq) = 1. Then (wi, vp, ws, vq, wj) is a proper wi − wj path in F. If c(wivp) = c(wjvq), say c(wivp) = c(wjvq) = 1, then there are vertices wa and wb where a, b > 2r such that vpwa, wavr−1, vqwb, wbvr ∈ E(F) with c(wavr−1) = c(wbvr) = 1 and c(wavp) = c(wbvq) = 2. Thus, (wi, vp, wa, vr−1, vr, wb, vq, wj) is a proper wi − wj path in F.

Subcase 1.2. p, q ∈ {r − 2, r − 1, r}. First, assume that wivr−2, wjvr−1 ∈ E(F).

* If wivr−2 and vr−1wj are both colored 2, then (wi, vr−2, vr−1, wj) is a proper wi − wj path in F.

* If wivr−2 and vr−1wj are both colored 1, then there is a vertex wa with a > 2r such that vr−2wa, wavr ∈ E(F), c(vr−2wa) = 2 and c(wavr) = 1. Hence, (wi, vr−2, wa, vr, vr−1, wj) is a proper wi − wj path in F.

* If wivr−2 and vr−1wj are colored differently, say c(wivr−2) = 1 and c(vr−1wj) = 2, then there is a vertex wa with a > 2r such that vr−2wa, wavr−1 ∈ E(F), c(vr−2wa) = 2 and c(wavr−1) = 1. Hence, (wi, vr−2, wa, vr−1, wj) is a proper wi − wj path in F.

An argument for the situation where wivr−1, wjvr ∈ E(F) is similar. Next, assume that wivr−2, wjvr ∈ E(F).

* If c(wivr−2) = 2 and c(vrwj) = 1, then (wi, vr−2, vr−1, vr, wj) is a proper wi − wj path in F.

* If c(wivr−2) = 1 and c(vrwj) = 2, then there is a vertex wa with a > 2r such that vr−2wa, wavr ∈ E(F), c(vr−2wa) = 2 and c(wavr) = 1. Hence, (wi, vr−2, wa, vr, wj) is a proper wi − wj path in F.

* If wivr−2 and vrwj are colored the same, say c(wivr−2) = c(vrwj) = 1, then there is a vertex wa with a > 2r such that vr−2wa, wavr−1 ∈ E(F), c(vr−2wa) = 2 and c(wavr−1) = 1. Hence, (wi, vr−2, wa, vr−1, vr, wj) is a proper wi − wj path in F.

Subcase 1.3. p ≤ r − 3 and q ∈ {r − 2, r − 1, r}. Suppose that q = r − 2 and so wivp, wjvr−2 ∈ E(F).

*If wivp and wjvr−2 are colored differently, say c(wivp) = 1 and c(wjvr−2) = 2, then there is a vertex wa with a > 2r such that c(wavp) = 2 and c(wavr−2) = 1. Hence, (wi , vp, wa, vr−2, wj ) is a proper wi − wj path in F.


*If c(wivp) = c(vr−2wj ) = 1, then there are vertices wa and wb with a, b > 2r such that vpwa, wavr, vr−1wb, wbvr−2 ∈ E(F) where c(wavr) = c(vr−1wb) = 1 and c(wavp) = c(vr−2wb) = 2. Then (wi , vp, wa, vr, vr−1, wb, vr−2, wj ) is a proper wi − wj path in F.

* If c(wivp) = c(vr−2wj) = 2, then there is a vertex wa with a > 2r such that vpwa, wavr−1 ∈ E(F), c(vpwa) = 1 and c(wavr−1) = 2. Hence, (wi, vp, wa, vr−1, vr−2, wj) is a proper wi − wj path in F. The situations for q = r − 1 and q = r are similar.

Case 2. {x, y} = {wi, wj} where 2r < i ≠ j ≤ t. Since each of wi and wj is incident with two edges of different colors in F, it follows that there exist p, q ∈ {1, 2, . . . , r} such that wivp, wjvq ∈ E(F) and c(wivp) ≠ c(wjvq). If vp = vq, then (wi, vp, wj) is a proper wi − wj path in F; while if vp ≠ vq, then there is a vertex wa with a > 2r such that vpwa, wavq ∈ E(F), c(wivp) ≠ c(vpwa) and c(wjvq) ≠ c(vqwa). Thus, (wi, vp, wa, vq, wj) is a proper wi − wj path in F.

Case 3. {x, y} = {wi, wj} where 1 ≤ i ≤ 2r and 2r + 1 ≤ j ≤ t. Suppose that wivp ∈ E(F). Choose vq such that c(wivp) ≠ c(wjvq). We may assume that vp ≠ vq. Then there is a vertex wa with a > 2r such that vpwa, wavq ∈ E(F) and c(wivp) ≠ c(vpwa). Thus, (wi, vp, wa, vq, wj) is a proper wi − wj path in F.

Next, we consider the situation when exactly one of x and y belongs to Vk, say x ∈ Vk and y = vj for some integer j with 1 ≤ j ≤ r. First, suppose that x = wi where 2r + 1 ≤ i ≤ t. Let wivp, wivq ∈ E(F), where say c(wivp) = 1 and c(wivq) = 2. Then there is a vertex wa with a > 2r such that vjwa, wavp ∈ E(F) and c(wavj) = 1 and c(wavp) = 2. Thus, (wi, vp, wa, vj) is a proper wi − vj path in F. Next, suppose that x = wi where 1 ≤ i ≤ 2r. Let wivp ∈ E(F), where say c(wivp) = 1. Then there is a vertex wa with a > 2r such that vjwa, wavp ∈ E(F) and c(wavj) = 1 and c(wavp) = 2. Thus, (wi, vp, wa, vj) is a proper wi − vj path in F. Hence, c is a proper-path 2-coloring of F and so pc(F) = 2. Therefore, µ(G) ≤ 2t − 2r + 2 and so µ(G) = 2t − 2r + 2.

What remains, then, is determining µ(G) for G = Kn1,n2,...,nk, where r = n1 + n2 + · · · + nk−1 and t = nk, when r + 2 ≤ t < r² + r. Of course, the more general problem is that of determining, or at least finding bounds for, µ(G) for other connected graphs G not possessing a Hamiltonian path.

Closing Remarks: In the relatively short period since the concept of proper-path colorings in graphs was introduced, it has been studied by many, resulting in numerous beautiful theorems and intriguing conjectures and open questions (such as in the previous paragraph). The dynamic survey by Li and Magnant [4] provides useful information on this topic.


REFERENCES

1. E. Andrews, E. Laforge, C. Lumduanhom and P. Zhang, On proper-path colorings in graphs. J. Combin. Math. Combin. Comput. To appear.
2. V. Borozan, S. Fujita, A. Gerek, C. Magnant, Y. Manoussakis, L. Montero and Z. Tuza, Proper connection of graphs. Discrete Math. 312 (2012), 2550-2560.
3. G. Chartrand, G. L. Johns, K. A. McKeon and P. Zhang, Rainbow connection in graphs. Math. Bohem. 133 (2008), 85-98.
4. X. Li and C. Magnant, Properly colored notions of connectivity - a dynamic survey. Theory and Applications of Graphs, Vol. 0, Iss. 1, Article 2 (2015).
5. X. L. Li and Y. F. Sun, Rainbow Connections of Graphs. Springer, Boston, MA, 2012.
6. V. G. Vizing, On an estimate of the chromatic class of a p-graph (Russian). Diskret. Analiz 3 (1964), 25-30.

Chapter 5

New Algorithm for Calculating Chromatic Index of Graphs and its Applications

F. Salama 1,2

1 Department of Mathematics, Faculty of Science, Tanta University, Tanta, Egypt

2 Department of Mathematics, Faculty of Science, Taibah University, Madinah, Kingdom of Saudi Arabia

ABSTRACT
The problem of edge coloring is one of the fundamental problems in graph theory. Graph coloring problems arise in several settings, such as scheduling and assignment. To follow this line of investigation, we design a new algorithm, called the “RF algorithm,” to color the edges of a graph. In addition, we re-establish some classical results by applying the RF algorithm.

Citation: (APA): Salama, F. (2019). New algorithm for calculating chromatic index of graphs and its applications. Journal of the Egyptian Mathematical Society, 27(1), 18. (8 pages) DOI: https://doi.org/10.1186/s42787-019-0018-9

Copyright: © Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) License.


INTRODUCTION
The problem of edge coloring appeared together with the four-color problem. In 1880, Tait wrote the first paper dealing with the problem of edge coloring; he showed that the four-color problem is equivalent to the statement that the edges of every 3-connected cubic planar graph can be colored with only three colors. An s-edge coloring, where s is a positive integer, is a way to color the edges with s colors. The chromatic index χ′(G) is the minimum number of colors needed to color the edges of G so that any two adjacent edges receive different colors (for more details, see [1, 3, 4, 5, 7–9, 11, 12, 13, 14]). König proved, in 1916, that χ′(G) = ∆(G) for every bipartite graph G. Shannon showed that every graph can be edge-colored with at most ⌊3∆(G)/2⌋ colors. Moreover, Vizing obtained χ′(G) ≤ ∆(G) + 1 for any simple graph G. Omai et al. [10] determined the AVD-chromatic index for the powers of paths. Further, Lehner [6] established that if every non-trivial automorphism of a countable graph G with distinguishing index D′(G) moves infinitely many edges, then D′(G) ≤ 2. Grzesik and Khachatrian [2] proved that K1,m,n is interval colorable if and only if gcd(m + 1, n + 1) = 1.

The above discussion motivates us to design a new algorithm to calculate the chromatic index of a graph. The main objective of our algorithm is to find a coloring that uses the smallest possible number of distinct colors. Some examples are given in support of our algorithm.

THE MAIN RESULTS
In this article, a new algorithm, the RF coloring algorithm, is designed to evaluate the chromatic index of a loopless graph. The RF coloring algorithm is introduced as follows. Consider a graph G of order n and size m, and list its vertices as v1, v2, v3, … , vn and its edges as e1, e2, e3, … , em.

• Step 1. Write the incidence matrix of the graph G.

• Step 2. In this step, we construct the RF coloring matrix from the incidence matrix as follows: (a). In the first row of the RF matrix, put


After that, put

In the same way, for the entry a*1j, where 1 < j ≤ m, put

(b). If any column k has the entry a*1k = h, put

where 1 < i ≤ n and 1 < k ≤ m. (c). Now, starting from the second row, put

where 1 ≤ l ≤ m, 1 ≤ j ≤ m and 1 ≤ s ≤ n. (d). If any column l has the entry a*2l = f, put

(e). Again repeat steps (c) and (d) to complete the RF coloring matrix.



• Step 3. The greatest number in the RF coloring matrix is the chromatic index of the graph G, and the value of the entry a*ij is the color of the edge ej, where a*ij ≠ 0.
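Since the entries of the RF coloring matrix are given by displayed formulas not reproduced above, the short sketch below illustrates the general idea with a standard greedy rule instead (it is not the RF algorithm): each edge receives the smallest color not already used on an adjacent edge, and the number of colors used is an upper bound on the chromatic index.

def greedy_edge_coloring(edges):
    """Assign to each edge the smallest color unused at either endpoint."""
    used = {}                      # vertex -> set of colors already incident to it
    coloring = {}
    for u, v in edges:
        c = 1
        while c in used.get(u, set()) or c in used.get(v, set()):
            c += 1
        coloring[(u, v)] = c
        used.setdefault(u, set()).add(c)
        used.setdefault(v, set()).add(c)
    return coloring

# Example: a 4-cycle v1 v2 v3 v4, whose chromatic index is 2.
edges = [("v1", "v2"), ("v2", "v3"), ("v3", "v4"), ("v4", "v1")]
coloring = greedy_edge_coloring(edges)
print(coloring)                    # alternating colors 1 and 2 around the cycle
print(max(coloring.values()))      # 2 colors used, so the chromatic index is at most 2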


In the following, we give examples solved by the new algorithm. Example 1. Let G be the graph shown below (Fig. 1).

Figure 1. The graph G.

The chromatic index of the graph G will be calculated by using the RF coloring algorithm as follows:

Then, the RF coloring matrix is given by


From the above matrix, we find that the chromatic index χ′(G) of the graph G is equal to 3. Example 2. Consider the graph G shown in Fig. 2.

Figure 2. The quartic graph.

We apply the RF coloring algorithm step by step to evaluate the edge colors and the chromatic index, as shown in the sequence of matrices below:

Hence, the RF coloring matrix is given by

It is clear from the RF coloring matrix that the chromatic index of the graph G is 5, i.e., χ′(G) = 5, and that the color of e1, e7, e10 is 1, the color of e3, e4, e12 is 2, the color of e2, e5 is 3, the color of e6, e9 is 4, and the color of e8, e11 is 5.

In the following section, we reprove some theorems by using the RF coloring algorithm.

Theorem 1 Let Sn be a star graph of order n. Then, χ ΄ (Sn) = n − 1, where χ ΄ (Sn) is the chromatic index of Sn.

Proof Let Sn be a star graph of order n as shown in Fig. 3.

Figure 3. A star graph Sn.

By applying the RF coloring algorithm, we find that the RF coloring matrix is given by

The greatest number in the RF coloring matrix is n − 1, so the chromatic index of Sn equals n − 1.

Theorem 2 Let Cn be a cycle graph of order n. Then χ′(Cn) = 2 if n is even and χ′(Cn) = 3 if n is odd, where χ′(Cn) is the chromatic index of Cn.

Proof Let Cn be a cycle graph of order n. Applying the RF coloring algorithm, we stop when the RF coloring matrix becomes

Now we want to evaluate the values of l and m and we have two cases:


Case 1. When n is even, then l = 2 and m = 1; hence, the RF coloring matrix is

and the chromatic index of Cn from the RF coloring matrix is 2, i.e., χ′(Cn) = 2.

Case 2. When n is odd, then l = 1 and m = 3; hence, the RF coloring matrix in this case is given by

and the chromatic index of Cn equals 3, i.e., χ ΄ (Cn) = 3.

One of the most famous applications of edge coloring of a graph is the timetabling problem; let us give an example and solve it by our new algorithm as follows. In a particular faculty of science, the Mathematics department has five teachers t1, t2, t3, t4, t5. The teaching assignments of the five teachers are given by the array:

              t1    t2    t3    t4    t5
I Year Y1     1     1     1     __    __
II Year Y2    2     1     __    __    __
III Year Y3   __    1     __    1     1
IV Year Y4    __    __    2     __    1

We want to find a minimum-period timetable. To solve this problem, we first draw the graph corresponding to this problem; see Fig. 4.


Figure 4. The graph corresponding to the timetable problem.

By applying the RF coloring algorithm, we find the RF coloring matrix is given by:

It is clear from the RF coloring matrix above that we have a 3-period timetable given by:

              t1    t2    t3    t4    t5
Period I      Y1    Y2    Y4    __    Y3
Period II     Y2    Y3    Y1    __    Y4
Period III    Y2    Y1    Y4    Y3    __
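As an illustration of the same idea with standard tools (again, not the RF coloring matrix), the sketch below encodes the teaching assignments as a bipartite multigraph and assigns periods greedily; for this particular instance the greedy pass already reaches a 3-period timetable, although the individual period assignments need not coincide with the table above.

requirements = {                       # teacher -> {class: number of lessons}
    "t1": {"Y1": 1, "Y2": 2},
    "t2": {"Y1": 1, "Y2": 1, "Y3": 1},
    "t3": {"Y1": 1, "Y4": 2},
    "t4": {"Y3": 1},
    "t5": {"Y3": 1, "Y4": 1},
}
lessons = [(t, y) for t, classes in requirements.items()
           for y, count in classes.items() for _ in range(count)]

busy = {}                              # teacher or class -> set of occupied periods
timetable = {}                         # period -> list of (teacher, class) meetings
for t, y in lessons:
    period = 1
    while period in busy.get(t, set()) or period in busy.get(y, set()):
        period += 1
    busy.setdefault(t, set()).add(period)
    busy.setdefault(y, set()).add(period)
    timetable.setdefault(period, []).append((t, y))

for period in sorted(timetable):
    print(f"Period {period}: {timetable[period]}")
print("Periods needed:", max(timetable))   # 3, matching the 3-period timetable above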

CONCLUSION
In the present manuscript, we have designed a new algorithm to find the chromatic index of a graph and to color its edges. It has been found that some classical results are successfully re-established by our new algorithm. The newly introduced algorithm has also been applied to a real-life example, where a correct result has been obtained. It has great scope for further research in the fields of graph theory, computer programming, and algebraic structures.

ACKNOWLEDGEMENTS
I am so grateful to the reviewers for their many valuable suggestions and comments that significantly improved the paper.


REFERENCES

1. Bondy, J.A., Murty, U.S.R.: Graph Theory. Graduate Texts in Mathematics Series, Elsevier Science Publishing Co. Inc. (2008)
2. Grzesik, A., Khachatrian, H.: Interval edge-colorings of K1,m,n. Discret. Appl. Math. 174, 140–145 (2014)
3. Harris, J.M., Hirst, J.L., Mossinghoff, M.J.: Combinatorics and Graph Theory. Springer Science+Business Media, USA (2008)
4. Hermann, F., Hertz, A.: Finding the chromatic number by means of critical graphs. Electron. Notes Discret. Math. 5, 174–176 (2000)
5. Kandel, A., Bunke, H., Last, M.: Applied Graph Theory in Computer Vision and Pattern Recognition. Springer-Verlag, Berlin Heidelberg (2007)
6. Lehner, F.: Breaking graph symmetries by edge colourings. J. Combin. Theory, Ser. B 127, 205–214 (2017)
7. Liu, C., Zhu, E.: General vertex-distinguishing total coloring of graphs. J. Appl. Math. 2014, Article ID 849748, 7 pages (2014). https://doi.org/10.1155/2014/849748
8. Luna, G., Romero, J.R.M., Moyao, Y.: An approximate algorithm for the chromatic number of graphs. Electron. Notes Discret. Math. 46, 89–96 (2014)
9. Molloy, M., Reed, B.: Colouring graphs when the number of colours is almost the maximum degree. J. Combin. Theory, Ser. B 109, 134–195 (2014)
10. Omai, M.M., de Almeida, S.M., Sasaki, D.: AVD-edge coloring on powers of paths. Electron. Notes Discret. Math. 62, 273–278 (2017)
11. Salama, F.: 1-mother vertex graphs. Int. J. Math. Comb. 1, 123–132 (2011)
12. Salama, F., Rafat, H., El-Zawy, M.: General-graph and inverse-graph. Appl. Math. 3, 346–349 (2012)
13. Salama, F., Rafat, H.: General-graph and inverse-graph. Appl. Math. 3, 346–349 (2012)
14. Tomon, I.: On the chromatic number of regular graphs of matrix algebras. Linear Algebra Appl. 475, 154–162 (2015)

Chapter 6

An Edge-Swap Heuristic for Finding Dense Spanning Trees

Mustafa Ozen 1, Hua Wang 2, Kai Wang 2 and Demet Yalman 1

1 Bogazici University

2 Georgia Southern University

ABSTRACT
Finding spanning trees under various restrictions has been an interesting question for researchers. A “dense” tree, from a graph theoretical point of view, has small total distance between vertices and a large number of substructures. In this note, the “density” of a spanning tree is conveniently measured by the weight of the tree (defined as the sum of products of adjacent vertex degrees). By utilizing established conditions and relations between trees with the minimum total distance or maximum number of subtrees, an edge-swap heuristic for generating “dense” spanning trees is presented. Computational results are provided for randomly generated graphs and specific examples from applications.

Citation: (APA): Ozen, M., Wang, H., Wang, K., & Yalman, D. (2016). An edge-swap heuristic for finding dense spanning trees. Theory and Applications of Graphs, 3(1), 1. (12 pages) DOI: https://doi.org/10.20429/tag.2016.030101

Copyright: © This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.

INTRODUCTION
Given an undirected graph G with vertex set V and edge set E, a subtree of G is a connected acyclic subgraph of G. A subtree with vertex set V is a spanning tree of G. Finding spanning trees (under various restrictions) of a given graph is of importance in many applications such as information technology and network design. Many questions have been studied in this respect, including, but not limited to, the well-known minimum-weight spanning tree problem (MSTP), spanning trees with bounded degree, with bounded number of leaves, or with bounded number of branch vertices. The goal in such studies is usually to find efficient algorithms to produce the desired subgraphs. Recently, an edge-swap heuristic for generating spanning trees with a minimum number of branch vertices was presented [7], where an efficient algorithm resulted from iteratively reducing the number of branch vertices of a random spanning tree by swapping tree edges with edges not currently in the tree. A tree with a given number of vertices is considered “dense” if the number of substructures (including isomorphic subgraphs) is large or the total distance between vertices is small. In applications, structures generated from such spanning trees are preferred as they offer more choices of sub-networks and allow more efficient transfer of resources at minimum cost. In this note we discuss an edge-swap heuristic, inspired by similar work presented in [7], for finding dense spanning trees.

PRELIMINARIES
The number of subtrees and the total distance of a tree belong to a group of graph invariants, called topological indices, that are used in the literature as effective descriptors of graph structures. For instance:

• The sum of distances between all pairs of vertices, also known as the Wiener index [11], is one of the most well-known distance-based indices in chemical graph theory;
• The number of subtrees is an example of counting-based indices, introduced from a pure mathematical point of view [8] and with applications in phylogeny [2].


These two indices have been extensively studied. In particular, it is well known that the star minimizes the Wiener index and maximizes the number of subtrees, while the path maximizes the Wiener index and minimizes the number of subtrees. More interestingly, among trees of a given degree sequence, the greedy tree (Definition 1) was shown to minimize the Wiener index [6, 9, 12] and maximize the number of subtrees [13], where the degree sequence is simply the non-increasing sequence of vertex degrees.

Definition 1 (Greedy trees). Given a degree sequence, the greedy tree is achieved through the following “greedy” algorithm:

i) Start with a single vertex v = v1 as the root and give v the appropriate number of neighbors so that it has the largest degree;
ii) Label the neighbors of v as v2, v3, . . ., and assign to them the largest available degrees such that deg(v2) ≥ deg(v3) ≥ · · · ;
iii) Label the neighbors of v2 (except v) as v21, v22, . . . such that they take all the largest degrees available and that deg(v21) ≥ deg(v22) ≥ · · · , then do the same for v3, v4, . . .;
iv) Repeat (iii) for all the newly labeled vertices, always starting with the neighbors of the labeled vertex of largest degree whose neighbors are not labeled yet.

For example, Fig. 1 shows a greedy tree with degree sequence (4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 2, 2, 1, . . . , 1).

Figure 1. A greedy tree.
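A minimal sketch of the construction in Definition 1 is given below (the helper name greedy_tree and the example degree sequence are illustrative): degrees are handed out in non-increasing order, level by level, which matches the rule of always continuing with the labeled vertex of largest degree.

from collections import deque

def greedy_tree(degree_sequence):
    degrees = sorted(degree_sequence, reverse=True)
    n = len(degrees)
    assert sum(degrees) == 2 * n - 2, "not a tree degree sequence"
    edges = []
    deg_of = {0: degrees[0]}
    next_vertex, next_degree = 1, 1
    # Vertices enter the queue in non-increasing order of degree, so FIFO order
    # matches "start with the labeled vertex of largest degree".
    queue = deque([(0, degrees[0])])           # (vertex, number of children still needed)
    while queue and next_vertex < n:
        v, slots = queue.popleft()
        for _ in range(slots):
            u = next_vertex
            deg_of[u] = degrees[next_degree]
            edges.append((v, u))
            queue.append((u, degrees[next_degree] - 1))
            next_vertex += 1
            next_degree += 1
    return edges, deg_of

edges, deg_of = greedy_tree([4, 3, 3, 2, 2, 2, 1, 1, 1, 1, 1, 1])
print(len(edges))                              # n - 1 = 11 edges
print(all(sum(1 for e in edges if v in e) == d for v, d in deg_of.items()))  # True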

Interestingly, the greedy trees are also extremal with respect to many other graph indices, among which is the following special case of the Randić index [3], also called the weight of a tree [4]:

R(T) = ∑_{uv ∈ E(T)} deg(u) · deg(v).

A comprehensive discussion of the extremal trees of given degree sequences with respect to functions defined on adjacent vertex degrees can be found in [10]. For trees of different given degree sequences, much work has been done in comparing the greedy trees (of the same order) of different degree sequences. In particular, for two non-increasing sequences π′ = (d′1, d′2, . . . , d′n) and π″ = (d″1, d″2, . . . , d″n), π″ is said to majorize π′ if d″1 + · · · + d″k ≥ d′1 + · · · + d′k for k = 1, · · · , n − 1 and d″1 + · · · + d″n = d′1 + · · · + d′n.
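The majorization comparison just defined can be checked with a few lines of code; the sketch below (illustrative helper name) tests whether one non-increasing sequence majorizes another by comparing prefix sums.

def majorizes(pi2, pi1):
    """True if pi2 majorizes pi1: equal totals and every prefix sum of pi2 is at least that of pi1."""
    a, b = sorted(pi2, reverse=True), sorted(pi1, reverse=True)
    if len(a) != len(b) or sum(a) != sum(b):
        return False
    run_a = run_b = 0
    for x, y in zip(a, b):
        run_a, run_b = run_a + x, run_b + y
        if run_a < run_b:
            return False
    return True

star = [5, 1, 1, 1, 1, 1]      # degree sequence of the star on 6 vertices
path = [2, 2, 2, 2, 1, 1]      # degree sequence of the path on 6 vertices
print(majorizes(star, path), majorizes(path, star))   # True False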

The concept of majorization has been applied to the comparison of greedy trees of different degree sequences in order to find the dense structures (with minimal total distance or maximal number of subtrees) under various constraints. See [13] for an example of such discussions. For convenience we also say that π″ is “higher in the majorization ladder” than π′ if π″ majorizes π′ and π″ ≠ π′.

To find dense spanning trees, our edge-swap heuristic starts with a random spanning tree. We then continuously remove a “bad” edge and add a “good” edge in order to improve the density of the spanning tree. From the perspective of distance-based and structure-based graph indices, evaluating the corresponding index of the resulting tree at each step would be extremely time consuming. We propose an edge-swap heuristic that is based on the above results and uses R(T), instead of the distance or the number of subtrees, as an effective measure. In every step, we consider the degrees of the end vertices of the edge to be removed or added, as well as the resulting change in R(T). Such a strategy simultaneously optimizes the value of R(T) and improves the degree sequence in the ladder of majorization. The consideration of R(T) results in an efficient algorithm that quickly finds a dense spanning tree, which we present in the next section. Computational results will be provided for both randomly generated graphs and specific examples from applications. We also comment on improvements of the final result with the degree sequences taken into account.

THE EDGE-SWAP HEURISTIC
In this section we present an edge-swap heuristic in detail. The following algorithm takes a graph G = (V, E) as input and returns a dense spanning tree T as output.


Step 1. Input G(V, E) and generate a random spanning tree T for G. Let SPARSE be “true”.

Step 2.
Step 2-1: Find the candidate edge e to be removed from T. For each edge e = uv ∈ E(T), let

where du and dv are the degrees of the vertices u and v respectively (in T), and dui for 1 ≤ i ≤ du − 1 (dvj for 1 ≤ j ≤ dv − 1) are the degrees of the other neighbors of u (v) in T. Let e be an edge with the minimum f(·) value.

Step 2-2: Generate the spanning forest T′ = T − e with two components Tu and Tv.

Step 2-3: Find the candidate edge e″ (with end vertices in Tu and Tv respectively) to be added to T′. For each such edge, let

where du′ and dv′ are the degrees of the vertices u′ and v′ respectively (in T′), and du′i (dv′j) are the degrees of the other neighbors of u′ (v′) in T′.

Step 2-4: Generate the spanning tree T″ = T′ + e″.

Step 2-5: If f(e) < g(e″), let SPARSE be “true”. Otherwise let SPARSE be “false”.

Step 3. While SPARSE is “true”, let T = T″ and repeat Step 2. Return T when SPARSE is “false”.

Figure 2 and Figure 3 present a step-by-step illustration (left → right and top → bottom) of the algorithm, where the spanning tree in each step is shown in bold face and the removed edge in each step is shown with a dotted line.

Remark 1. In the above algorithm, the value g(e″) − f(e) is the maximum possible improvement in R(·) over one swap. In the case of a tie (i.e., multiple edges can serve as e or e″), we simply pick one of them. Since the value of R(T) is strictly increasing after each swap, this process terminates after finitely many steps.
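The following sketch illustrates the edge-swap loop on a small example. It is a simplified, brute-force variant: because the displayed formulas for f and g are not reproduced above, it evaluates the change in the weight R(T) directly for every candidate removal/insertion pair and keeps the swap that increases R(T) the most; all helper names are illustrative.

import random
from itertools import combinations

def weight(adj):
    """R(T): sum of deg(u)*deg(v) over the edges of the tree stored in adj."""
    return sum(len(adj[u]) * len(adj[v]) for u in adj for v in adj[u] if u < v)

def reachable(adj, start):
    seen, stack = {start}, [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def random_spanning_tree(vertices, edges):
    parent = {v: v for v in vertices}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = {v: set() for v in vertices}
    for u, v in random.sample(edges, len(edges)):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree[u].add(v)
            tree[v].add(u)
    return tree

def edge_swap(vertices, edges):
    tree = random_spanning_tree(vertices, edges)
    tree_edges = {(min(u, v), max(u, v)) for u in tree for v in tree[u]}
    while True:
        best = None
        for u, v in tree_edges:                       # candidate edge to remove
            tree[u].remove(v); tree[v].remove(u)
            comp = reachable(tree, u)                 # one side of the resulting cut
            for a, b in edges:                        # candidate edge to add back
                if (a, b) not in tree_edges and (a in comp) != (b in comp):
                    tree[a].add(b); tree[b].add(a)
                    gain = weight(tree)
                    tree[a].remove(b); tree[b].remove(a)
                    if best is None or gain > best[0]:
                        best = (gain, (u, v), (a, b))
            tree[u].add(v); tree[v].add(u)
        if best is None or best[0] <= weight(tree):   # no swap increases R(T); stop
            return tree
        _, (u, v), (a, b) = best
        tree[u].remove(v); tree[v].remove(u)
        tree[a].add(b); tree[b].add(a)
        tree_edges.remove((u, v))
        tree_edges.add((a, b))

random.seed(1)
vertices = list(range(6))
edges = [(u, v) for u, v in combinations(vertices, 2)]   # K6, which contains a spanning star
dense = edge_swap(vertices, edges)
print(sorted(len(dense[v]) for v in vertices))  # degree sequence of the dense tree; the star is [1, 1, 1, 1, 1, 5]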

COMPUTATIONAL RESULTS
Of course, the heuristic proposed in the previous section does not guarantee the densest spanning tree as an output. But as experimental results show, this heuristic effectively finds a dense spanning tree within very few swaps and hence is of great practical interest. When tested on 100 randomly generated graphs, each of order 15 and containing a spanning star, the algorithm returns a star in over 60 runs. Part of this data (that is representative of the general performance) is shown in Table 1. Note that a star on 15 vertices has total distance 196. This is attainable for all graphs considered above. As shown in Table 1, all resulting spanning trees are dense (even if not a star) with only one exception, the graph “D”. In the example shown in Figure 4, 7 edge-swaps resulted in the final spanning tree from the original graph on 15 vertices and 37 edges. When applied to the US Airports data set of 332 vertices and 2126 edges [5], only 15 edge-swaps were needed to obtain the final spanning tree. In this case, the total distance of the tree is reduced from 1444880 to 1421327, a reduction of 23553.

Figure 2. Step by step illustration of the algorithm.


Figure 3. The original graph (on the left) and the resulting spanning tree (on the right).

Table 1. Results of ten randomly generated graphs on 15 vertices

Figure 4. The original graph with the first spanning tree (on the left) and the resulting spanning tree (on the right).

FURTHER IMPROVEMENTS
In earlier sections we provided a simple and efficient edge-swap heuristic to find dense spanning trees, and computational results were presented and analyzed. A simple way of improving the likelihood of achieving denser spanning trees can be obtained by replacing Step 2-5 of the algorithm with the following:

(Step 2-5)′: If f(e) < g(e″), or f(e) = g(e″) and the degree sequence of T″ is higher in the majorization ladder than that of T, let SPARSE be “true”. Otherwise let SPARSE be “false”.

In this case, after each swap, the value of R(T) is strictly increasing or non-decreasing with the degree sequence moving up in the majorization ladder. Taking, for instance, two of the randomly generated graphs on 15 vertices discussed in the previous section, Figures 5 and 6 show improvements in the resulting spanning trees.

Figure 5. The resulting spanning trees from the original algorithm (left) and the modified algorithm (right).

Figure 6. The resulting spanning trees from the original algorithm (left) and the modified algorithm (right).

When applying the modified algorithm to the US Airports data set, the improvement over the original algorithm is shown in Table 2. For completeness, we also describe the pseudo-code (for the modified algorithm) in Algorithm 1.


Table 2. Comparison of the algorithms with the US Airport data set

Algorithm 1. Pseudo-code for the modified edge-swap heuristic.

The algorithm starts by loading the data set in the format of an n × 3 matrix. The first two columns represent an edge with two vertices and the last column shows the weight between these vertices. A minimum weighted spanning tree T is then generated through the Kruskal algorithm. Each iteration of edge-swapping includes removing a “bad” edge and adding a “good” edge which is not in the current tree. The function “findRemovalEdges” takes the adjacency matrix of the tree T as input and returns the list of candidate edges with the minimum value of f(·). From the candidate list, one of the edges, e = (u, v), is chosen to be removed. After removing the edge from the adjacency matrix of the tree T, we obtain “T_r” as the updated graph. The function “split” is used to split the adjacency matrices of the two new subtrees and return the lists of vertices in each component. The function “findInsertionEdges” takes these lists and the adjacency matrices of the updated graph and the original graph as inputs. After calculating g(·) for each candidate edge, it returns the minimum g(·) and the list of corresponding edges. One of the candidate edges, e″ = (u″, v″), is chosen to be inserted and the updated tree “T_a” is obtained. If f(e) < g(e″), the new tree is denser than the previous one and the process continues. If f(e) = g(e″), the degree sequences of the initial tree and the current tree are calculated. If the degree sequence of the current tree “T_a” majorizes that of T (and the two degree sequences are not the same), then an edge-swap is made; otherwise the process is terminated.

COMPLEXITY ANALYSIS
For the original question of finding a spanning tree that maximizes the number of subtrees or minimizes the total distance, the complexity appears to be difficult to determine. To our best knowledge, the complexity of this problem is not yet determined. However, given that “dense” trees usually have a large number of leaves and the problem of finding spanning trees with the most leaves is NP-hard [1], it is natural to guess that finding “dense” spanning trees is also hard.

On the other hand, the complexity can be easily deduced for our algorithm described in Section 3. It is easy to see that it takes O(n) time to find an edge to remove in Step 2-1. The total number of edges in G is at most n(n − 1)/2, implying that Step 2-3 takes O(n²) time to complete. Since each iteration of the original algorithm strictly increases the value of R(·) and the largest possible R(T) on an n-vertex tree is achieved by R(K1,n−1) = (n − 1)², there are at most O(n²) iterations. Thus the complexity of the original algorithm is O(n⁴). Note that this is just a worst-case analysis; in practice the algorithm usually runs much faster.

As shown in the previous section, the results can be improved by implementing a slightly different Step 2-5 as shown in the modified algorithm (Algorithm 1). Note that each degree sequence of a tree on n vertices is equivalent to (after removing 1 from each degree and only keeping non-zero entries) a non-increasing sequence of positive integers that sum up to n − 2. For example, the degree sequence (5, 4, 3, 3, 3, 2, 1, . . . , 1) of a tree on 16 vertices is mapped to (4, 3, 2, 2, 2, 1) with 4 + 3 + 2 + 2 + 2 + 1 = 14. Thus the number of all possible degree sequences of a spanning tree of a graph on n vertices is the same as the number of integer partitions of n − 2, which grows faster than any polynomial in n. This implies that the number of iterations of the modified algorithm could, in the worst case, be super-polynomial. However, as shown in Table 2 for the US Airport data set and experimentation with random graphs, the modified algorithm generally terminates much faster.


REFERENCES

1. M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA (1979).
2. B. Knudsen, Optimal multiple parsimony alignment with affine gap cost using a phylogenetic tree. Lecture Notes in Bioinformatics 2812, Springer Verlag, 2003, 433–446.
3. M. Randić, On characterization of molecular branching. J. Amer. Chem. Soc. 97 (1975), 6609–6615.
4. D. Rautenbach, A note on trees of maximum weight and restricted degrees. Discrete Math. 271 (2003), 335–342.
5. Pajek datasets, US Air lines: http://vlado.fmf.uni-lj.si/pub/networks/data/
6. N. Schmuck, S. Wagner, H. Wang, Greedy trees, caterpillars, and Wiener-type graph invariants. MATCH Commun. Math. Comput. Chem. 68(1) (2012), 273–292.
7. R. Silva, D. Silva, M. Resende, G. Mateus, J. Goncalves, P. Festa, An edge-swap heuristic for generating spanning trees with minimum number of branch vertices. Optim. Lett. 8 (2014), 1225–1243.
8. L.A. Szekely, H. Wang, On subtrees of trees. Advances in Applied Mathematics 34 (2005), 138–155.
9. H. Wang, The extremal values of the Wiener index of a tree with given degree sequence. Discrete Applied Mathematics 156 (2008), 2647–2654.
10. H. Wang, Functions on adjacent vertex degrees of trees with given degree sequence. Central European J. Math. 12 (2014), 1656–1663.
11. H. Wiener, Structural determination of paraffin boiling points. J. Amer. Chem. Soc. 69 (1947), 17–20.
12. X.-D. Zhang, Q.-Y. Xiang, L.-Q. Xu, R.-Y. Pan, The Wiener index of trees with given degree sequences. MATCH Commun. Math. Comput. Chem. 60 (2008), 623–644.
13. X.-M. Zhang, X.-D. Zhang, D. Gray, H. Wang, The number of subtrees of trees with given degree sequence. J. Graph Theory 73(3) (2013), 280–295.

Chapter 7

Identifying Network Structure Similarity using Spectral Graph Theory

Ralucca Gera 2, L. Alonso 1, Brian Crawford 3, Jeffrey House 4, J. A. Mendez-Bermudez 1, Thomas Knuth 1 and Ryan Miller 2

1 Instituto de Física, Benemérita Universidad Autónoma de Puebla, Apartado Postal J-48, Puebla 72570, Mexico.

2 Department of Applied Mathematics, 1 University Avenue, Naval Postgraduate School, Monterey 93943, CA, USA.

3 Department of Computer Science, 1 University Avenue, Naval Postgraduate School, Monterey 93943, CA, USA.

4 Department of Operations Research, 1 University Avenue, Naval Postgraduate School, Monterey 93943, CA, USA.

Citation: (APA): Gera, R., Alonso, L., Crawford, B., House, J., Mendez-Bermudez, J. A., Knuth, T., & Miller, R. (2018). Identifying network structure similarity using spectral graph theory. Applied network science, 3(1), 2. (15 pages)

Copyright: © Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) License.


ABSTRACT Most real networks are too large or they are not available for real time analysis. Therefore, in practice, decisions are made based on partial information about the ground truth network. It is of great interest to have metrics to determine if an inferred network (the partial information network) is similar to the ground truth. In this paper we develop a test for similarity between the inferred and the true network. Our research utilizes a network visualization tool, which systematically discovers a network, producing a sequence of snapshots of the network. We introduce and test our metric on the consecutive snapshots of a network, and against the ground truth. To test the scalability of our metric we use a random matrix theory approach while discovering Erdös-Rényi graphs. This scaling analysis allows us to make predictions about the performance of the discovery process.

INTRODUCTION
The successful discovery of a network/graph is of great interest to the network science community. Many algorithms have been proposed for network discovery. Our focus, however, is on deciding when we have discovered enough of the network for it to be similar to the ground truth, namely representative of the whole network. We measure similarity of temporal snapshots of a network, as it is discovered through monitor placement, by comparing consecutive temporal snapshots (subgraphs) produced in the inference of the network. For a given network, one perspective on network discovery is to consider any subgraph as one of many possible outcomes from some discovery process. For a simple graph G(V, E), with |V(G)| = n and |E(G)| = m, there are 2^m possible subgraphs on n vertices. In real-world applications, say if m = 1200, the count of possible subgraphs grows rapidly: 2^1200 is on the order of 10^360. Any discovered subgraph is one of many possible random outcomes. We wish to determine whether one collection of discovered nodes and edges is very similar to the underlying graph. A comparison technique like percent of vertices discovered, percent of edges discovered, or degree sequence distribution reveals a practical problem: in general, we do not know what the underlying network looks like. How do we compare graphs to the ground truth if we do not know the ground truth? We search for a method that identifies similar graphs during the discovery process, so that we know at what point of the inference only little new information is being discovered, so that pursuing the inference further has little benefit.


The method we use is based on nonparametric statistical tests that tell whether two consecutive snapshots are similar, without actually knowing the true network. In a two-sample nonparametric test we compare two samples under the null hypothesis that they came from the same distribution; the alternative hypothesis is that there was a significant change between the two samples. For this purpose, we introduced in Crawford et al. (2016) the two-sample nonparametric test on Sequential Adjacency and Laplacian Matrix Eigenvalue Distribution. For a proof of concept, we examined a synthetic network (an Erdös-Rényi random graph) and three terrorist networks. In the current paper, for a complementary analysis, we also use the normalized Laplacian Matrix Eigenvalue Distribution to compare snapshots. We contrast the methodology based on either the Sequential Adjacency, the Laplacian, or the normalized Laplacian Matrix Eigenvalue Distribution. Furthermore, we perform a theoretical systematic study of our metric while discovering ensembles of Erdös-Rényi graphs. This allows us to make predictions about the performance of a discovery process, characterized by our metric, once the basic properties of a network (of Erdös-Rényi type) are known. We then present the resulting theory on a terrorist network for validation.

BACKGROUND
In graph theory, an established metric for graph comparison is isomorphism. Two labeled graphs G and H are isomorphic if there exists a bijection ϕ from V(G) to V(H) such that uv ∈ E(G) if and only if ϕ(u)ϕ(v) ∈ E(H) (Chartrand and Zhang 2012). Comparing graphs based on isomorphism has a binary outcome: the graphs are either exactly the same (isomorphic), or they are different (non-isomorphic). In practice we prefer similarity values to belong to a range, and to converge as we approach isomorphism. We build in the validation of our comparison methodology by using a network discovery process (or lighting up a network) that produces a sequence of consecutive temporal snapshots. An assumption we make is that consecutive snapshots of the network are similar, which was validated using http://faculty.nps.edu/rgera/projects.html (Gera 2015).

Similarity of networks
Existing similar research has considered the count or the percent of nodes/edges discovered during a network's discovery. For a network G, this was done by measuring the percent that has been discovered at step i with the subgraph Gi, through tracking |V(Gi)|/|V(G)| and |E(Gi)|/|E(G)|. While both are great intuitive measures, they only capture the cardinality of the sets of nodes and edges discovered, but not so much the topology of the network itself (Davis et al. 2016; Chen et al. 2017; Wijegunawardana et al. 2017).

Common metrics for measuring general network similarity use comparisons of degree distributions, density, clustering coefficient, average path length, etc. Inexact matching of two networks after a sequence of edits is commonly measured by Graph Edit Distance (GED). GED measures the cost of adding/removing/substituting nodes and edges to make one graph look like the other one. This works well for shortest paths rather than arbitrary graphs. Algorithms use combinatorial searches over the space of possible edits, and therefore they are computationally intensive. To optimize this idea, kernel functions are used that explore regions of the network to be matched using GED, such as:

• Node/Edge kernel: For labeled graphs, whenever two nodes/edges have the same label, the kernel function value is 1, otherwise 0. If the node/edge labels take real values, then a Gaussian kernel is used.
• Path kernel: Whenever two paths (sequences of alternating nodes and edges) are of the same length, the path kernel is the product of the kernels of the nodes and edges on the paths. If the length of the common paths is different (i.e., the algorithm didn't detect a common path between the networks), the value of the path kernel function is 0.
• Graph kernel: The graph kernel compares subgraphs of each of the graphs by comparing the proportion of all common paths out of all possible paths in each of the two subgraphs.

Kashima and Inokuchi (2002) map networks to feature vectors and then cluster the vectors based on naïve distance methods. Attributes about the data are needed in order to create the feature vectors, which they mine using search engines on the Web. More sophisticated methods consider capturing networks' topology before comparing them. For example, the Gromov–Hausdorff (G-H) distance uses shape analysis. The network's shape is constructed by piecing together small subgraphs whose similar structure is easy to find. This shape is then transformed into a linkage matrix that captures how these subgraphs interconnect. Then the G-H distance is the farthest distance any node of a network G is from the network H, or the farthest any node of H is from G, whichever is greater, taken over all possible embeddings (drawings) of the two networks G and H (Lee et al. 2011).


Similarly, Pržulj uses graphlets (Pržulj 2007) to capture the topology of a network. Graphlets are all possible subgraphs on a small number of nodes, capturing local structural properties. The graphlet frequency distribution can be used to compare networks by keeping track of the frequencies of all the different-sized graphlets (Rahman et al. 2014). This is infeasible for large graphs, as it requires an exact count of each graphlet. It has been used in comparing aerial images (Zhang et al. 2013), in scene classification for multimedia information (Zhang et al. 2011), and in learning on synthetic networks (Janssen et al. 2012). Koutra et al. proposed DeltaCon (Koutra et al. 2013), a scalable algorithm for same-size networks based on node influence. It compares two networks with exactly the same node set by computing each node's influence on the other network's nodes. These values are stored as matrices for each network, and the difference between the matrices is then measured to give an affinity score measuring similarity.

Feasible inexact matching of graphs of different sizes uses spectral analysis. The spectra of classes of graphs have been studied since 1956 (Günthard and Primas 1956) and further since then (VanDam and Haemers 2003; Florkowski 2008; Gera and Stanica 2011). General graph-theoretical results have been known since the 1970s (Cvetković 1971), are reviewed in (Chung 1997), and were later extended to complex networks (VanMieghem 2010). Spectral clustering of graphs can use the eigenvalues of several matrices (Wilson and Zhu 2008). We use the adjacency matrix A, the Laplacian L = D − A (where D is the degree matrix), and the normalized Laplacian, defined entrywise for vertices u and v as 1 if u = v and d_v ≠ 0, as −1/√(d_u d_v) if u and v are adjacent, and as 0 otherwise, where d_v denotes the degree of the vertex v.
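As a quick, hedged sketch (the example graph below is an arbitrary stand-in, not one of the networks studied in this chapter), the three matrices and their spectra can be assembled directly with numpy and networkx:

    import networkx as nx
    import numpy as np

    G = nx.karate_club_graph()                  # stand-in example graph
    A = nx.to_numpy_array(G)                    # adjacency matrix A
    deg = A.sum(axis=1)                         # degrees d_v
    L = np.diag(deg) - A                        # Laplacian L = D - A

    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    N = np.diag(d_inv_sqrt) @ L @ np.diag(d_inv_sqrt)   # normalized Laplacian

    for name, M in (("adjacency", A), ("Laplacian", L), ("normalized Laplacian", N)):
        spectrum = np.linalg.eigvalsh(M)        # the eigenvalues define the spectrum
        print(name, spectrum.min(), spectrum.max())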

The eigenvalues of each of these matrices define a spectrum of the network. The existence of cospectral graphs (i.e., graphs that share the same adjacency matrix spectrum) has been known for decades (VonCollatz and Sinogowitz 1957; Harary et al. 1971), and ongoing research considers which graphs are determined by their spectra (VanDam and Haemers 2003). Even so, spectral analysis is very useful in comparing networks, as we explain next, particularly since finding such cospectral graphs is "out of reach" in practice (Schwenk 1973; Godsil and McKay 1982).


Eigenvalue analysis is used to describe the behavior of a dynamic system (Trefethen and Bau III 1997) and, in our case, the behavior of a network representing the system. To see its relevance in comparing networks, note that eigenvalues measure the node cluster cohesiveness, or community structure, that has been widely studied in network science. Moreover, the algebraic connectivity of the graph, and thus the spectrum, captures the topology of the graph (Frankl and Rödl 1987). Of particular interest for us is that spectral clustering can differentiate between the structural equivalence and the regular equivalence of nodes. For structural equivalence, nodes are placed in the same community if they have similar connection patterns to the same neighbors. For regular equivalence, nodes are placed in the same community if they have similar connection patterns to any neighbors. This can be extended to probabilistic models in which stochastic equivalences are introduced, groups being stochastically equivalent if their respective connecting probabilities to neighbors are the same.

The distribution of eigenvalues of the adjacency matrix can be found in Chung (1997), with a focus on the correlation between the range of the eigenvalue distribution and the type of graph. This was further studied through the behavior of the distribution of the eigenvalues of the graph, such as the convergence results in (Dumitriu and Pal 2012). Correlations between the power-law degree distribution of a graph and the distribution of its eigenvalues were presented in (Mihail and Papadimitriou 2002). Analyzing several real graphs, the authors inferred that if the degrees d1, …, dn of the graph are power-law distributed, then with high probability the largest eigenvalues of the graph are power-law distributed as well, taking on values close to the square roots of the largest degrees (Mihail and Papadimitriou 2002). The distribution of the eigenvalues of the Laplacian is more closely linked to the structure of the graph than the eigenvalues of the adjacency matrix alone (Chung 1997); the normalized Laplacian contains the degree distribution as well as the adjacency information of the graph.

While spectral analysis was previously used to cluster similar trees and synthetic graphs (Wilson and Zhu 2008), we use the spectra within a different methodology based on nonparametric statistics. Nonparametric statistical tests can capture whether two samples of a network are similar without actually knowing the true networks. We compare the eigenvalue distributions of two samples (subgraphs) and test the assumption that they came from the same distribution; the alternative hypothesis is that there is a significant change between the two samples. By looking


at the actual step-by-step inference using Gera (2015), we could visually see only small differences between consecutive snapshots; no major changes happen from one snapshot to the next. We use the nonparametric test of Ruth and Koyak (2011), in which the first m of N observations X_1, …, X_m, …, X_N are assumed to follow a distribution F1 and the rest follow F2. This allows us to detect a "shift point" at X_{m+1}, where our samples are no longer from the same distribution. For our research, each observation X_i is the eigenvalue distribution of a sampled/inferred graph. In our case, the null hypothesis is that the eigenvalue distributions of the two samples are the same. Each test yields a p-value representing the probability that the test statistic would be as extreme or more extreme than what was observed with a particular sample, assuming the null hypothesis is true. If two samples have very different eigenvalue distributions, the null hypothesis is less plausible and the p-value is low. If two distributions of eigenvalues match, that is strong evidence in support of the null hypothesis, and the p-value is close to 1.
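The Ruth–Koyak matching-based test used in this chapter has no standard Python implementation that we rely on here; as a hedged stand-in that conveys the same workflow (two eigenvalue samples in, one p-value out), a two-sample Kolmogorov–Smirnov test can be used. The graph sizes, probabilities, and seed below are illustrative assumptions only:

    import networkx as nx
    import numpy as np
    from scipy.stats import ks_2samp

    def adjacency_spectrum(G):
        return np.linalg.eigvalsh(nx.to_numpy_array(G))

    # Two consecutive "snapshots": G_i and a slightly larger G_{i+1}.
    truth = nx.erdos_renyi_graph(200, 0.05, seed=1)
    nodes = list(truth.nodes())
    G_i   = truth.subgraph(nodes[:150])
    G_ip1 = truth.subgraph(nodes[:160])

    # A large p-value means no evidence that the two eigenvalue
    # distributions differ (consecutive snapshots look similar).
    stat, p_value = ks_2samp(adjacency_spectrum(G_i), adjacency_spectrum(G_ip1))
    print(p_value)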

Scaling the Erdös-Rényi networks

We will use the Erdös-Rényi random graph model for a scaling analysis of the metric. The Erdös-Rényi random graph model is characterized by two parameters: the number of nodes (or graph size) N and the connectivity probability α, where α is defined as the fraction of the N(N−1)/2 independent non-vanishing off-diagonal adjacency matrix elements. All the nodes are isolated when α = 0, whereas we have a complete graph for α = 1. From a random matrix theory point of view it is a common practice to look for the scaling parameter(s) of a random matrix model; in this way the universal properties of the model can be revealed (Méndez-Bermúdez et al. 2017a; 2017b). Scaling studies of the Erdös-Rényi random graph model can be found in Méndez-Bermúdez et al. (2015) and Martínez-Mendoza et al. (2013). The average degree is then

ξ = α × N,	(1)

where ξ is the mean number of nonzero elements per adjacency matrix row, called the scaling parameter of the Erdös-Rényi random graph model. In particular, it was shown that spectral, eigenfunction, and transport properties are universal (i.e. equivalent) for a fixed average degree ξ.
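A two-line sanity check of Eq. (1) (purely illustrative; the sizes and seed are assumptions): fixing ξ and letting α = ξ/N keeps the empirical average degree essentially constant as N grows.

    import networkx as nx

    xi = 10                                  # fixed average degree (scaling parameter)
    for N in (250, 500, 1000):
        alpha = xi / N                       # connectivity probability alpha = xi / N
        G = nx.gnp_random_graph(N, alpha, seed=0)
        mean_deg = 2 * G.number_of_edges() / N
        print(N, round(mean_deg, 2))         # stays close to xi as N grows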


In fact, several papers have been devoted to analytical and numerical studies of the Erdös-Rényi random graph model as a function of ξ. Among the most relevant results of these studies we can mention that: (i) In the very sparse case (ξ→1) the density of eigenvalues was found to deviate from the Wigner semicircle law with the appearance of singularities, around and at the band center, and tails beyond the semicircle (Rodgers and Bray 1988; Rodgers and deDominicis 1990; Mirlin and Fyodorov 1991; Evangelou and Economou 1992; Evangelou 1992; Semerjian and Cugliandolo 2002; Khorunzhy and Rodgers 1997; Kühn 2008; Rogers et al. 2008; Slanina 2011); (ii) A delocalization transition of eigenstates was found at ξ≈1.4, see Mirlin and Fyodorov (1991); Evangelou and Economou (1992); Evangelou (1992) and Fyodorov and Mirlin (1991); (iii) The nearest-neighbor eigenvalue spacing distribution P(s) was found to evolve from the Poisson to the Gaussian Orthogonal Ensemble (GOE) predictions for increasing ξ, see Jackson et al. (2001); Evangelou and Economou (1992) and Evangelou (1992) (the same transition was reported for the spectral number variance in Jackson et al. (2001)); (iv) The onset of the GOE limit for the spectral properties occurs at ξ≈7, see Méndez-Bermúdez et al. (2015) and Martínez-Mendoza et al. (2013), meaning that the spectral properties of the graph above this value coincide with those of a system with maximal disorder. Also, the first eigenvalue/eigenfunction problem was addressed in Kabashima et al. (2010) and Kabashima and Takahashi (2012). For our paper, following Méndez-Bermúdez et al. (2015) and MartínezMendoza et al. (2013), we look for universal properties of the discovery algorithm.

METHODOLOGY

Using the Network Visualization Tool (Gera 2015), we choose a discovery algorithm and run it on a network. This produces the sequence of inferred subgraphs to be analyzed for comparison. The chosen algorithm is not relevant; it merely creates the sequence of subgraphs. For our research, we chose Fake Degree Discovery, a sophisticated degree-greedy algorithm (Gera et al. 2017) (code is available at https://github.com/Pelonza/Graph_Inference/blob/master/Clean_Algorithms/FDD.py, see Schmitt (2015), and it can be tested at http://faculty.nps.edu/rgera/projects.html (Gera 2015)). Let G_i be the sequence of graphs recorded while lighting up some given graph G, where, if i < j, then G_i is a subgraph of G_j. (iv) We do not observe any relevant difference in the scaling of the nonparametric test p-value curves for adjacency and normalized Laplacian matrices. (v) The onset of the discovery transition takes place at (3) Moreover, the scaling shown in Fig. 5 (lower panels) can be used to predict how efficient a discovery algorithm, characterized by our graph comparison metric, will be once the average degree of the graph and its size are known: Eq. (3) means (for an Erdös-Rényi–type graph of size N and ξ > 7) that a discovery algorithm needs more than CN^γ discovery steps to uncover most of the graph. See an application in the "Application of the scaling analysis" subsection.
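The authors' FDD code lives at the GitHub link above; the sketch below is not that algorithm but an assumed, much simpler degree-greedy "lighting up" loop, included only to show how a sequence of inferred subgraphs G_i and their spectra can be produced for the comparisons described here. All parameters are illustrative.

    import networkx as nx
    import numpy as np

    def discovery_snapshots(G, start, steps):
        """A toy 'lighting up' process (stand-in for FDD): at each step, probe a
        known-but-unexplored vertex of largest degree and reveal its neighbourhood."""
        known, explored, snapshots = {start}, set(), []
        for _ in range(steps):
            frontier = list(known - explored)
            if not frontier:
                break
            v = max(frontier, key=G.degree)            # crude degree-greedy choice
            explored.add(v)
            known |= set(G.neighbors(v))
            snapshots.append(G.subgraph(known).copy())  # the inferred subgraph G_i
        return snapshots

    truth = nx.erdos_renyi_graph(300, 0.05, seed=2)
    for i, G_i in enumerate(discovery_snapshots(truth, start=0, steps=10), 1):
        eig = np.linalg.eigvalsh(nx.to_numpy_array(G_i))   # adjacency spectrum of G_i
        print(i, G_i.number_of_nodes(), round(float(eig.max()), 3))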

Application of the scaling analysis

We now use the main result of our scaling analysis, i.e. Eq. (3), to estimate the performance of the discovery algorithm on the Noordin Top terrorist network (even though this real-world network is different from the Erdös-Rényi random network model used to derive the scaling). We first note that the average degree of the Noordin Top terrorist network is 21.72, well above the requirement ξ > 7 for the scaling to be valid. Then, with N = 139, Eq. (3) (in combination with the values


of C and γ reported in Table 1) predicts that the discovery algorithm needs more than 58 steps to uncover most of the graph when the algorithm is applied to the adjacency or normalized Laplacian matrices; in the case of the Laplacian matrix the discovery algorithm needs more than 70 steps. In Fig. 6 we show plots of the nonparametric test p-value as a function of the discovery step for the adjacency, Laplacian, and normalized Laplacian matrices of the Noordin Top terrorist network and of an Erdös-Rényi graph with the same number of nodes and edges.

Figure 6. Nonparametric test p-value (against ground truth) as a function of discovery step for adjacency, Laplacian, and normalized Laplacian matrices of the Noordin Top terrorist network (black curves). Red-dashed curves, included as a reference, correspond to the nonparametric test p-value of an Erdös-Rényi graph with the same number of nodes and edges.

The good correspondence between the nonparametric test p-value curves for the adjacency and normalized Laplacian matrices of the synthetic and real-world networks validates the applicability of Eq. (3): indeed, it is clear that the discovery algorithm needs about 58 steps to uncover both networks. However, for the Laplacian matrix the discovery algorithm works much faster on the real-world network; see the middle panel in Fig. 6. The scaling prediction is therefore a good proxy even for networks that are far from random.

CONCLUSIONS

This paper explores the potential of eigenvalue distribution analysis for graph comparison. There are many questions this analysis could help answer. As stated in the introduction, there are several network discovery algorithms, and it is important to identify which algorithm is the most effective for discovering a network. The methodology developed in this research could be applied to measure the effectiveness of different types of network discovery algorithms.


We introduced a methodology that measures the similarity of networks, which we validated on consecutive snapshots of networks. We achieved this using a nonparametric test on the distribution of eigenvalues of the networks, for three matrices: adjacency, Laplacian and normalized Laplacian. Our numerical experiments show what we anticipated: using the p-value from the nonparametric test as a measure of similarity, (1) the eigenvalue distributions of consecutive subgraphs become more similar as the portion of newly discovered network becomes small compared to the already discovered network, and (2) the p-values stabilize towards the end of the discovery. Further, comparison using this metric against the true underlying graph is a nondecreasing function of time (temporal discovery).

In addition, we performed a systematic study of our metric while discovering ensembles of Erdös-Rényi graphs. This allowed us to consider different network sizes in our analysis. The resulting scaling analysis allows us to make predictions about the performance of a discovery process, which we successfully tested on a real-world network: the Noordin Top terrorist network. We conclude that the use of sequential adjacency, Laplacian, and normalized Laplacian matrix eigenvalue distribution comparisons based on the nonparametric test p-values is a promising method to guide network discovery.

FUTURE DIRECTION

One possible extension of this paper is to explore the properties of the eigenvalues of the normalized Laplacian in various graph types, particularly scale-free and sparse graphs. An analysis of the properties of the normalized Laplacian eigenvalues has led to the establishment of both specific and general boundary conditions depending on the type of graph. For example, all graphs produce normalized Laplacian eigenvalues between 0 and 2 (Chung 1997); for a complete graph, the eigenvalues are 0 and n/(n−1), the latter with multiplicity n−1 (Chung 1997). Recall that the spectrum of a union of disjoint components is the union of the spectra of the components (Chung 1997). Thus, if we consider the early snapshot subgraphs in our discovery, they can be disconnected graphs. By adding more monitors, we effectively add another component to the previous subgraph, and the resulting updated subgraph will inherit the eigenvalues of each separate component. For example,


when the eigenvalue distributions are compared, the eigenvalues of an early temporal snapshot will be present in a later temporal snapshot. Following this logic, it may be possible to infer missing eigenvalues from the distribution and reverse engineer the network. Concerning the theoretical model, a next step in our research is the scaling analysis of the discovery algorithm when applied to other graph models (beyond the basic Erdös-Rényi random network model used here), such as scale-free graphs. We are particularly interested in an extension to more complicated networks, such as multilayered networks, to see how the identification and interdependence of the layers influence our measure.


REFERENCES

1. Aliakbary, S, Motallebi S, Rashidian S, Habibi J, Movaghar A (2015) Distance metric learning for complex networks: Towards size-independent comparison of network structures. 25(2):023111.
2. Chartrand, G, Zhang P (2012) A first course in graph theory. Courier Corporation.
3. Chen, S, Debnath J, Gera R, Greunke B, Sharpe N, Warnke S (2017) Graph Structure Similarity using Spectral Graph Theory In: Discovering Community Structure using Network Sampling. 32nd ISCA International Conference on Computers and Their Applications (CATA).
4. Chung, FRK (1997) Spectral graph theory, Vol. 92. American Mathematical Soc.
5. Crawford, B, Gera R, House J, Knuth T, Miller R (2016) Graph Structure Similarity using Spectral Graph Theory In: International Workshop on Complex Networks and their Applications, 209–221. Springer.
6. Cvetković, DM (1971) Graphs and their spectra. Publikacije Elektrotehničkog fakulteta. Serija Matematika i fizika 354/356:1–50.
7. Davis, B, Gera R, Lazzaro G, Lim BY, Rye EC (2016) The Marginal Benefit of Monitor Placement on Networks In: Complex Networks VII, 93–104. Springer.
8. Dumitriu, I, Pal S (2012) Sparse regular random graphs: spectral density and eigenvectors. Ann Probab 40(5):2197–2235.
9. Evangelou, SN (1992) A numerical study of sparse random matrices. J Stat Phys 69(1):361–383.
10. Evangelou, SN, Economou EN (1992) Spectral density singularities, level statistics, and localization in a sparse random matrix ensemble. Phys Rev Lett 68(3):361–364.
11. Florkowski, SF (2008) Spectral graph theory of the hypercube. Naval Postgraduate School, Monterey.
12. Frankl, P, Rödl V (1987) Forbidden intersections. Trans Am Math Soc 300(1):259–286.
13. Fyodorov, YV, Mirlin AD (1991) Localization in ensemble of sparse random matrices. Phys Rev Lett 67(15):2049–2052.
14. Gera, R, Stanica P (2011) The spectrum of generalized Petersen graphs. Australas J Combin 49:39–45.
15. Gera, R (2015) Network Discovery Visualization Project: Naval Postgraduate School network discovery visualization project. http://faculty.nps.edu/dl/networkVisualization/.
16. Gera, R, Juliano N, Schmitt K (2017) Optimizing Network Discovery with Clever Walks.
17. Godsil, CD, McKay BD (1982) Constructing cospectral graphs. Aequationes Math 25(1):257–268.
18. Günthard, HH, Primas H (1956) Zusammenhang von Graphentheorie und MO-Theorie von Molekeln mit Systemen konjugierter Bindungen. Helv Chim Acta 39:1645–1653.
19. Harary, F, King C, Mowshowitz A, Read RC (1971) Cospectral graphs and digraphs. Bull Lond Math Soc 3(3):321–328.
20. Jackson, AD, Mejia-Monasterio C, Rupp T, Saltzer M, Wilke T (2001) Spectral ergodicity and normal modes in ensembles of sparse matrices. Nucl Phys A 687(3–4):40–434.
21. Janssen, J, Hurshman M, Kalyaniwalla N (2012) Model selection for social networks using graphlets. Internet Math 8(4):338–363.
22. Kabashima, Y, Takahashi H, Watanabe O (2010) Cavity approach to the first eigenvalue problem in a family of symmetric random sparse matrices. J Phys Conf Ser 233(1):012001.
23. Kabashima, Y, Takahashi H (2012) First eigenvalue/eigenvector in sparse random symmetric matrices: influences of degree fluctuation. J Phys A Math Theor 45(32):325001.
24. Kashima, H, Inokuchi A (2002) Kernels for graph classification In: ICDM Workshop on Active Mining. Citeseer.
25. Khorunzhy, A, Rodgers GJ (1997) Eigenvalue distribution of large dilute random matrices. J Math Phys 38(6):3300–3320.
26. Klir, GJ, Elias D (2003) Architecture of Systems Problem Solving. 2nd edn., IFSR International Series on Systems Science and Engineering, vol. 21. Kluwer/Plenum, New York.
27. Koutra, D, Vogelstein JT, Faloutsos C (2013) DeltaCon: A principled massive-graph similarity function In: Proceedings of the 2013 SIAM International Conference on Data Mining, 162–170. SIAM.
28. Kühn, R (2008) Spectra of sparse random matrices. J Phys A Math Theor 41(29):295002.
29. Lee, H, Chung MK, Kang H, Kim B-N, Lee DS (2011) Computing the shape of brain networks using graph filtration and Gromov-Hausdorff metric In: International Conference on Medical Image Computing and Computer-Assisted Intervention, 302–309. Springer.
30. Martínez-Mendoza, AJ, Alcazar-López A, Méndez-Bermúdez JA (2013) Scattering and transport properties of tight-binding random networks. Phys Rev E 88(1):012126.
31. Méndez-Bermúdez, JA, Alcazar-López A, Martínez-Mendoza AJ, Rodrigues FA, Peron TKD (2015) Universality in the spectral and eigenfunction properties of random networks. Phys Rev E 91(3):032122.
32. Méndez-Bermúdez, JA, Ferraz-de Arruda G, Rodrigues FA, Moreno Y (2017) Scaling properties of multilayer random networks. Phys Rev E 96(1):012307.
33. Méndez-Bermúdez, JA, Ferraz-de Arruda G, Rodrigues FA, Moreno Y (2017) Diluted banded random matrices: Scaling behavior of eigenfunction and spectral properties. arXiv:1701.01484.
34. Mihail, M, Papadimitriou C (2002) On the eigenvalue power law In: Randomization and approximation techniques in computer science, 254–262. Springer.
35. Mirlin, AD, Fyodorov YV (1991) Universality of level correlation function of sparse random matrices. J Phys A Math Gen 24(10):2273–2286.
36. Neri, I, Metz FL (2012) Spectra of sparse non-hermitian random matrices: An analytical solution. Phys Rev Lett 109(3):030602.
37. Pržulj, N (2007) Biological network comparison using graphlet degree distribution. Bioinformatics 23(2):77–83.
38. Rahman, M, Bhuiyan MA, Rahman M, AlHasan M (2014) GUISE: a uniform sampler for constructing frequency histogram of graphlets. Knowl Inf Syst 38(3):511–536.
39. Roberts, N, Everton SF (2011) Terrorist Data: Noordin Top Terrorist Network (Subset). [Machine-readable data file]. https://sites.google.com/site/sfeverton18/research/appendix-1.
40. Rodgers, GJ, Bray AJ (1988) Density of states of a sparse random matrix. Phys Rev B 37(7):3557–3562.
41. Rogers, T, Castillo IP (2009) Cavity approach to the spectral density of non-hermitian sparse matrices. Phys Rev E 79(1):012101.
42. Rodgers, GJ, deDominicis C (1990) Density of states of sparse random matrices. J Phys A Math Gen 23(9):1567–1573.
43. Rogers, T, Castillo IP, Kühn R, Takeda K (2008) Cavity approach to the spectral density of sparse symmetric random matrices. Phys Rev E 78(3):031116.
44. Ruth, DM, Koyak RA (2011) Nonparametric tests for homogeneity based on non-bipartite matching. J Am Stat Assoc 106(496).
45. Schmitt, K (2015) Fake degree discovery algorithm for lighting up networks. https://github.com/Pelonza/Graph_Inference/blob/master/Clean_Algorithms/FDD.py.
46. Schwenk, AJ (1973) Almost all trees are cospectral. New directions in the theory of graphs X:275–307.
47. Semerjian, G, Cugliandolo LF (2002) Sparse random matrices: the eigenvalue spectrum revisited. J Phys A Math Gen 35(23):4837–4851.
48. Slanina, F (2011) Equivalence of replica and cavity methods for computing spectra of sparse random matrices. Phys Rev E 83(1):011118.
49. Tong, H, Faloutsos C, Gallagher B, Eliassi-Rad T (2007) Fast best-effort pattern matching in large attributed graphs In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, 737–746. ACM.
50. Trefethen, LN, Bau III D (1997) Numerical linear algebra, Vol. 50. SIAM.
51. VanDam, ER, Haemers WH (2003) Which graphs are determined by their spectrum? Linear Algebra Appl 373:241–272.
52. VanMieghem, P (2010) Graph spectra for complex networks. Cambridge University Press.
53. VonCollatz, L, Sinogowitz U (1957) Spektren endlicher Grafen In: Abhandlungen aus dem Mathematischen Seminar der Universität Hamburg, 63–77. Springer.
54. Wijegunawardana, P, Ojha V, Gera R, Soundarajan S (2017) Seeing Red: Locating People of Interest in Networks In: Complex Networks VIII. Springer.
55. Wilson, RC, Zhu P (2008) A study of graph spectra for comparing graphs and trees. Pattern Recogn 41(9):2833–2841.
56. Zager, LA, Verghese GC (2008) Graph similarity scoring and matching. Appl Math Lett 21(1):86–94.
57. Zhang, L, Bian W, Song M, Tao D, Liu X (2011) Integrating local features into discriminative graphlets for scene classification In: Neural Information Processing, 657–666. Springer.
58. Zhang, L, Han Y, Yang Y, Song M, Yan S, Tian Q (2013) Discovering discriminative graphlets for aerial image categories recognition. IEEE Trans Image Process 22(12):5071–5084.

On Generalized Distance Gaussian Estrada Index of Graphs

8

Abdollah Alhevaz 1, Maryam Baghipur 1 and Yilun Shang 2

1 Faculty of Mathematical Sciences, Shahrood University of Technology, P.O. Box 316-3619995161, Shahrood, Iran

2 Department of Computer and Information Sciences, Northumbria University, Newcastle NE1 8ST, UK

ABSTRACT

For a simple undirected connected graph G of order n, let D(G), DL(G), DQ(G) and Tr(G) be, respectively, the distance matrix, the distance Laplacian matrix, the distance signless Laplacian matrix and the diagonal matrix of the vertex transmissions of G. The generalized distance matrix Dα(G) is defined

Citation: (APA): Alhevaz, A., Baghipur, M., & Shang, Y. (2019). On generalized distance Gaussian Estrada index of graphs. Symmetry, 11(10), 1276. (21 pages) DOI: https://doi. org/10.3390/sym11101276 URL: https://www.mdpi.com/2073-8994/11/10/1276/htm

Copyright: © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution 4.0 International (CC BY 4.0) License.


by Dα(G) = αTr(G) + (1 − α)D(G), where α ∈ [0, 1]. Here, we propose a new kind of Estrada index based on the Gaussianization of the generalized distance matrix of a graph. Let ∂1, ∂2, . . . , ∂n be the generalized distance eigenvalues of a graph G. We define the generalized distance Gaussian Estrada index Pα(G) = ∑_{i=1}^n e^{−∂i²}. Since characterization of Pα(G) is very appealing in quantum information theory, it is interesting to study the quantity Pα(G) and explore some properties like the bounds, the dependence on the graph topology G and the dependence on the parameter α. In this paper, we establish some bounds for the generalized distance Gaussian Estrada index Pα(G) of a connected graph G, involving different graph parameters, including the order n, the Wiener index W(G), the transmission degrees and the parameter α ∈ [0, 1], and characterize the extremal graphs attaining these bounds.

Keywords: Gaussian Estrada index; generalized distance matrix (spectrum); Wiener index; generalized distance Gaussian Estrada index; transmission regular graph.

INTRODUCTION

In this paper, we study connected simple graphs with a finite number of vertices. Standard graph terminology will be adopted; we refer the reader to, e.g., [1] for related concepts. A graph is denoted by G = (V(G), E(G)), where V(G) = {v1, v2, . . . , vn} is its vertex set and E(G) is its edge set. The order of G is the number n = |V(G)| and its size is the number m = |E(G)|. The set of vertices adjacent to v ∈ V(G), denoted by N(v), is the neighborhood of v. The degree of v, denoted by dG(v) (we simply write dv if it is clear from the context), is the cardinality of N(v). A graph is called regular if each of its vertices has the same degree. The distance between two vertices u, v ∈ V(G), denoted by duv, is defined as the length of a shortest path between u and v in G. The diameter of G is the maximum distance between any two vertices of G. The distance matrix of G is denoted by D(G) and is defined as D(G) = (duv)u,v∈V(G). The transmission TrG(v) of a vertex v is defined as the sum of the distances from v to all other vertices in G, that is, TrG(v) = ∑_{u∈V(G)} duv. A graph G is said to be k-transmission regular if TrG(v) = k for each v ∈ V(G). The transmission (sometimes referred to as the Wiener index) of a graph G, denoted by W(G), is the sum of distances between all pairs of vertices in G; namely, W(G) = (1/2) ∑_{v∈V(G)} TrG(v). For any vertex vi ∈ V(G), the transmission TrG(vi) is also called the transmission degree, denoted by Tri for short, and the sequence


{Tr1, Tr2, . . . , Trn} is called the transmission degree sequence of the graph G. The second transmission degree of vi, denoted by Ti, is given by Ti = ∑_{j=1}^n d_{ij} Trj.

Let Tr(G) = diag(Tr1, Tr2, . . . , Trn) be the diagonal matrix of vertex transmissions of G. In [2], M. Aouchiche and P. Hansen introduced the distance Laplacian matrix DL(G) and the distance signless Laplacian matrix DQ(G) as DL(G) = Tr(G) − D(G) and DQ(G) = Tr(G) + D(G). The spectral properties of D(G), DL(G) and DQ(G) have attracted much attention from researchers and a large number of papers have been published regarding their spectral properties, including the spectral radius, second largest eigenvalue, smallest eigenvalue, etc. For some recent works we refer to [3–5] and the references therein. Nikiforov [6] investigated the integration of the adjacency spectrum and the signless Laplacian spectrum via convex combinations of the diagonal degree matrix and the adjacency matrix. Recently, in [7], Cui et al. introduced the generalized distance matrix Dα(G) as a convex combination of Tr(G) and D(G), defined as Dα(G) = αTr(G) + (1 − α)D(G) for 0 ≤ α ≤ 1. Since D0(G) = D(G), 2D_{1/2}(G) = DQ(G), D1(G) = Tr(G) and Dα(G) − Dβ(G) = (α − β)DL(G), any result regarding the spectral properties of the generalized distance matrix has its counterpart for each of these particular graph matrices, and these counterparts follow immediately from a straightforward proof. In fact, this matrix leads to merging the distance spectral, distance Laplacian spectral and distance signless Laplacian spectral theories. As Dα(G) is a real symmetric matrix, its eigenvalues are real; we arrange them as ∂1 ≥ ∂2 ≥ · · · ≥ ∂n. The largest eigenvalue ∂1 of the matrix Dα(G) is called the generalized distance spectral radius of G; for simplicity, we will refer to ∂1(G) as ∂(G) in the sequel. It follows from the Perron-Frobenius theorem and the non-negativity and irreducibility of Dα(G) that ∂(G) is a simple eigenvalue and that there is a unique positive unit eigenvector X corresponding to ∂(G), which is called the generalized distance Perron vector of G. As usual, Kn, Ks,t, Pn and Cn denote, respectively, the complete graph on n vertices, the complete bipartite graph on s + t vertices, the path on n vertices and the cycle on n vertices.
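A minimal sketch of these matrices (the example graph is an arbitrary choice, not one from the paper), including a numerical check of the identity 2D_{1/2}(G) = DQ(G) stated above:

    import networkx as nx
    import numpy as np

    def distance_matrices(G, alpha):
        """Builds D(G), Tr(G), D^L, D^Q and D_alpha = alpha*Tr + (1-alpha)*D."""
        nodes = list(G.nodes())
        lengths = dict(nx.all_pairs_shortest_path_length(G))
        D = np.array([[lengths[u][v] for v in nodes] for u in nodes], dtype=float)
        Tr = np.diag(D.sum(axis=1))          # diagonal matrix of vertex transmissions
        DL, DQ = Tr - D, Tr + D              # distance Laplacian / signless Laplacian
        Dalpha = alpha * Tr + (1 - alpha) * D
        return D, Tr, DL, DQ, Dalpha

    G = nx.petersen_graph()
    D, Tr, DL, DQ, Dhalf = distance_matrices(G, alpha=0.5)
    print(np.allclose(2 * Dhalf, DQ))        # checks 2*D_{1/2}(G) = D^Q(G)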

MOTIVATION

Graph spectral theory has gained momentum during the last few decades, partly due to the mounting availability of scientific data and network representations stemming from a wide range of areas including biology, economics, engineering and social sciences [8]. Graph spectral techniques have proved highly instrumental in dissecting interconnection network structures.


Based upon investigations of geometric properties of biomolecules, Ernesto Estrada [9] considered an expression of the form EE(G) = ∑_{i=1}^n e^{λi},

where λ1, λ2, . . . , λn are the eigenvalues of the adjacency matrix of a molecular graph G. The mathematical significance of this quantity was recognized later [10] and soon it became known under the name "Estrada index" [11]. The mathematical properties of the Estrada index have been intensively studied; see, for example, [11,12]. The Estrada index and its bounds have been extensively studied in the graph spectral community, and we refer the interested reader to the recent survey [13]. This graph-spectrum-based invariant also plays an important role in chemistry and physics. It can be used, for example, as a metric for the degree of folding of long-chain polymeric molecules [14,15]. It has found a number of applications in complex networks and characterizes centrality [9,16,17]. We refer the reader to [18] for an account of the numerous applications of the Estrada index.

Beyond the adjacency spectrum, the Estrada index has been explored in various forms based on non-adjacency matrices, a proposal already made in the pioneering work [9]. Because of the remarkable usefulness of the graph Estrada index, varied Estrada indices based on the eigenvalues of other graph matrices have been tackled: Estrada-index-type invariants with respect to the distance matrix, the Laplacian matrix, the signless Laplacian matrix, the distance Laplacian matrix and the distance signless Laplacian matrix, to name just a few. For some related results on this subject see, for example, [19–23] and the references therein. Let µ1, µ2, . . . , µn be the eigenvalues of the distance matrix of a graph G. Then the distance Estrada index of a connected graph G was introduced in [20] as DEE(G) = ∑_{i=1}^n e^{µi}.

A different way to study graph spectra consists of analyzing matrix functions of the matrices associated with a graph or network. In analyzing graph invariants such as centrality and communicability, matrix functions have been found to be a powerful tool [24]. When the gap λ1 − λ2 is large, EE tends to be dominated by the largest eigenvalue λ1. In this sense, information that is hidden in the smaller eigenvalues, which is particularly useful, for example, in the context of molecular orbital theory [25], is overlooked. Estrada et al. [26] recently proposed to capture this


overlooked information by using a Gaussian matrix function, which gives rise to the Gaussian Estrada index H(G), defined as H(G) = ∑_{i=1}^n e^{−λi²}.

The Gaussian Estrada index H is able to describe the partition function of quantum-mechanical systems with Hamiltonian A² [27]. It gives more weight to eigenvalues close to zero and ideally complements the Estrada index. Moreover, it is also related to the time-dependent Schrödinger equation with the squared Hamiltonian. Based on numerical simulations, H is found to be effective in differentiating the dynamics of particle hopping among bipartite and non-bipartite structures [24]. More results can be found in [28]. A distance matrix, on the other hand, is an important variation of an adjacency matrix. It encodes information related to random walks and self-avoiding walks of chemical graphs which is not manifest in the adjacency matrix. The distance spectrum has been intensively studied in the past few years [29]. In the pedigree of the distance matrix, important members include the distance Laplacian and distance signless Laplacian matrices.

In this work, we propose a new kind of Estrada index based on the Gaussianization of the generalized distance matrix of a graph. The generalized distance eigenvalues of a graph G are denoted by ∂1, ∂2, . . . , ∂n. We define the generalized distance Gaussian Estrada index Pα(G) as

Pα(G) = ∑_{i=1}^n e^{−∂i²}.	(1)

The results for the Gaussian Estrada index of the distance matrix (namely, the distance Gaussian Estrada index P^D(G)) and the Gaussian Estrada index of the distance signless Laplacian matrix (namely, the distance signless Laplacian Gaussian Estrada index P^Q(G)) can be naturally obtained when setting, respectively, α = 0 and α = 1/2 in the above definition.
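A minimal sketch computing Pα(G) exactly as in Eq. (1) (the helper name p_alpha and the example graph are assumptions made for illustration only):

    import networkx as nx
    import numpy as np

    def p_alpha(G, alpha):
        """P_alpha(G) = sum_i exp(-eigenvalue_i(D_alpha)^2), as in Eq. (1)."""
        nodes = list(G.nodes())
        lengths = dict(nx.all_pairs_shortest_path_length(G))
        D = np.array([[lengths[u][v] for v in nodes] for u in nodes], dtype=float)
        Dalpha = alpha * np.diag(D.sum(axis=1)) + (1 - alpha) * D
        eig = np.linalg.eigvalsh(Dalpha)     # generalized distance eigenvalues
        return float(np.exp(-eig ** 2).sum())

    G = nx.cycle_graph(6)
    print(p_alpha(G, 0.0))    # distance Gaussian Estrada index P^D(G)
    print(p_alpha(G, 0.5))    # the alpha = 1/2 case, tied to P^Q(G) as described above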

Since characterization of Pα(G) tends to be very appealing in quantum information theory, it will be desirable to consider the quantity Pα(G) and explore some properties such as the bounds, the dependence on the structure of graph G and the dependence on the parameter α. In this paper, we aim to establish some bounds for the generalized distance Gaussian Estrada index Pα(G) of a connected graph G, in terms of the different graph parameters like the order n, the Wiener index W(G), the transmission degrees and the parameter α ∈ [0, 1]. We also characterize the extremal graphs attaining


these bounds. Moreover, Pα(G) for some fundamental special graphs has been obtained, which helps us to interpret this metric when applied to sophisticated topologies. We also give an expression for Pα(G) of a (transmission) regular graph G in terms of the distance eigenvalues as well as adjacency eigenvalues of G, and describe the generalized distance Gaussian Estrada index of some graphs obtained by operations.

BOUNDS FOR GENERALIZED DISTANCE GAUSSIAN ESTRADA INDEX

We present in this section some useful bounds for the generalized distance Gaussian Estrada index Pα(G) of a connected graph G, in terms of different graph parameters including the order n, the Wiener index W(G), the transmission degrees and the parameter α ∈ [0, 1]. We will also identify the extremal graphs attaining these bounds. We start by recalling some previously known results that will be needed below.

Lemma 1. [7] For a connected graph G of order n, we have ∂(G) ≥ 2W(G)/n, where the equality holds if and only if G is transmission regular.

Lemma 2. [7] Define the transmission degree sequence and the second transmission degree sequence of G as {Tr1, Tr2, . . . , Trn} and {T1, T2, . . . , Tn}, respectively. We have

If ≤ α ≤ 1, the equality holds if and only if G is transmission regular.

The proof of the following lemma is similar to that of Lemma 2 in [30], and is omitted here.

Lemma 3. A connected graph G has exactly two distinct generalized distance eigenvalues if and only if G is a complete graph.

Lemma 4. If the transmission degree sequence of G is {Tr1, Tr2, . . . , Trn}, then


with equality if and only if G is transmission regular.

Proof. The proof is analogous to that of Theorem 2.2 in [4], and is omitted.

Lemma 5. [31] If A is an n × n non-negative matrix with spectral radius λ(A) and row sums r1, r2, . . . , rn, then min_{1≤i≤n} r_i ≤ λ(A) ≤ max_{1≤i≤n} r_i. Moreover, if A is irreducible, then both of the equalities hold if and only if the row sums of A are all equal.

Note that the i-th row sum of Dα(G) is αTri + (1 − α)Tri = Tri. Hence, applying Lemma 5, we derive the following result.

Corollary 1. Let G be a simple connected graph of order n. Let Trmax and Trmin denote, respectively, the largest and smallest transmissions of G. Then Trmin ≤ ∂(G) ≤ Trmax. Moreover, either of the equalities holds if and only if G is a transmission regular graph.

Next, we present upper bounds for the generalized distance Gaussian Estrada index involving different graph invariants. To fix notation, we first introduce some preliminaries. For k ≥ 0, define

Then S0 = n and

(2)

Our first result gives an upper bound for the generalized distance Gaussian Estrada index Pα(G) through the order n, the transmission degrees and the parameter α.

Theorem 1. Let G be a connected graph of order n. Then for any integer k0 ≥ 2

(3)


with equality if and only if G = K1.

Proof. Starting with Equation (2), we have

and Equation (3) follows. From the derivation of Equation (3), it is clear that the equality will be attained in Equation (3) if and only if G has no nonzero Dα-eigenvalues, i.e., G = K1. The next result gives another upper bound as well as a lower bound for Pα(G) of a connected graph G.

Theorem 2. Suppose that G is a connected graph of order n. Then for any integer k0 ≥ 2,

(4)

where . Equality holds on both sides of Equation (4) if and only if G = K1.

Proof. We will first prove the right inequality. According to the definition of Pα(G), we have

since


Then the right hand side of Equation (4) follows. Again, from the derivation of the right hand side of Equation (4), it is evident that equality will be attained in the right hand side of Equation (4) if and only if G has no non-zero Dα-eigenvalues, i.e., G = K1.

Next, we prove the left inequality. By Taylor's theorem, e^x ≥ 1 + x, with equality if and only if x = 0. Consequently, we have

Hence, we get the left inequality. One can easily see that the left equality holds in Equation (4) if and only if G = K1.

Remark 1. Assume that G is a connected graph of order n with diameter d. Since and there are pairs of vertices in G, also for each i, we have , then from the lower bound of Theorem 2, we see that

Next, we turn our attention to giving some lower bounds for the generalized distance Gaussian Estrada index Pα(G) through different graph invariants. The following result presents a lower bound in terms of the order n, the transmission degrees and the parameter α ∈ [0, 1]. Theorem 3. Suppose that G is a connected graph with order n and

Then (5) with equality if and only if G = K1.

Proof. According to the definition of Pα(G), we have


(6) By the arithmetic-geometric mean inequality, we obtain

(7) By means of a power-series expansion and noting that S0 = n and S1 = −2(1 − α) 2

we obtain (8)

since

holds for all i, and hence

then we get if . By substituting Equations (7) and (8) in Equation (6), we see that

This gives us the first part of the proof. From the derivation of Equation (5), it is clear that equality holds if and only if the graph G has no non-zero Dα-eigenvalues. Since G is a connected graph, this only happens in the case G = K1. The proof is complete.

As an immediate consequence of Theorem 3, we have the following corollary.

Corollary 2. Let G be a connected graph of order n with diameter d. Then

where . Equality holds if and only if G = K1.

Proof. Since , the result follows from Theorem 3.


One of our main results in this paper is the following theorem, which gives a lower bound for Pα(G) involving the order n, transmission degrees, second transmission degrees and the parameter α ∈ [0, 1]. Moreover, it gives an upper bound via order n and the parameter α ∈ [0, 1]. It shows that, among all connected graphs of order n, the generalized distance Gaussian Estrada index takes its maximum for the complete graph. Theorem 4. Assume that G is a connected graph of order n. Then

(9) The right equality holds if and only if G = Kn. Moreover, the left equality holds if and only if either G is a complete graph or, for ≤ α ≤ 1, G is a k-transmission regular graph with exactly three distinct Dα-eigenvalues

where

Proof. We first consider the right-hand side inequality. Let ∂1 ≥ ∂2 ≥ . . . ≥ ∂n be the generalized distance eigenvalues of G. Note that by Corollary 1, we have ∂1 ≥ Trmin ≥ n − 1. On the other hand, by Proposition 5 in [7] we get ∂n = ∂n(G) ≥ ∂n(Kn) = αn − 1. Therefore

with equality if and only if G has exactly two distinct generalized distance eigenvalues ∂1 = n − 1 and ∂2 = ∂3 = · · · = ∂n = αn − 1. Then by Lemma 3, we obtain G = Kn, and the proof is complete.

Next, we consider the left-hand side. According to the definition of Pα(G) and in view of the arithmetic-geometric mean inequality, we have

(10)


(11)

Consider the following function

(12)

for . It is easy to see that f(x) is an increasing function for x ≥ (see [32]). Since , also by the Cauchy-Schwarz inequality we have , and furthermore, since ,

Therefore

It follows from Equation (11) that

(13) This completes the first part of the proof. Now, we suppose that the left equality holds in Equation (9). Then all inequalities in the above argument must be equalities. From Equation (13), we have , which, by


Lemma 2, implies that, for , G is a transmission regular graph. From Equation (10) and the arithmetic-geometric mean inequality, we get = , then |∂2| = · · · = |∂n| and hence

where

. Therefore

Hence, |∂i| can have at most two distinct values and we arrive at the following classification:

(i) G has exactly one distinct Dα-eigenvalue. Then G = K1.

(ii) G has exactly two distinct Dα-eigenvalues. Then, by Lemma 3, G = Kn.

(iii) G has exactly three distinct Dα-eigenvalues. Then ∂1 . Moreover, for , G is a k-transmission regular graph. Then it is clear that G is a graph with exactly three distinct Dα-eigenvalues where .

We have arrived at the desired result.

The next result gives another lower bound for Pα(G), involving the order n, the Wiener index W(G), the transmission degrees and the parameter α.

Theorem 5. Let G be a connected graph of order n and

. Then (14)

Equality holds if and only if either G is a complete graph or a k-transmission regular graph with exactly three distinct Dα-eigenvalues

where

.


Proof. By similar argument as in the proof of Theorem 4, we have

Consider the following function

for

. It is easy to see that f(x) is an increasing function for x ≥ (see [32]). Since , also by the Cauchy-Schwarz inequality, we have

Hence,

Note that

since

. Therefore,

It follows from Equation (15) that

The first part of the proof is complete. The remainder of the proof is similar to that of Theorem 4, and hence is omitted.

Remark 2. If we use instead in Equation (13), we are led to the following simpler estimation


with equality if and only if G = K1. Let N(G) = (Tr1 Tr2 · · · Trn)^{1/n} be the geometric mean of the transmission degree sequence. It can be seen that 2W(G)/n ≥ N(G) holds, and equality is attained if and only if Tr1 = · · · = Trn (i.e., the graph G is transmission regular).

Lemma 6. [33] Let a1, a2, . . . , an be non-negative numbers. Then

We next establish a further lower bound for Pα(G) in terms of the order n, the geometric mean of the transmission degrees sequence N(G), the Wiener index W(G) and the parameter α ∈ [0, 1]. Theorem 6. Let G be a connected graph of order n ≥ 2. Then

(17) The equality holds if either G is a complete graph or a graph with exactly three distinct Dα-eigenvalues. Proof. By similar argument as in the proof of Theorem 4, we have

(18) (19)

By Lemma 4, we see that . Setting in Lemma 6, we have

Combining this with Lemma 4 yields (20) It is easy to see that

, and so, by Equation (16), we have (21)


The remainder of the proof is similar to that of Theorem 4, and hence is omitted. Remark 3. If we rely on the inequality

, then we obtain

Since the function f(x) defined in Equation (12) is increasing, we see that our lower bound in Equation (17) is better than the lower bound given in Equation (14). Let G be a k-transmission regular graph. Then it is clear that W(G) = nk/2 and N(G) = k. We have the following observation based upon Theorem 6.

Corollary 3. Let G be a k-transmission regular graph. Then

with equality if and only if G = Kn.

EXAMPLES FOR SOME FUNDAMENTAL SPECIAL GRAPHS

In this section, we explicitly derive the exact values of Pα(G) for some fundamental special graphs including complete graphs, complete bipartite graphs, transmission regular graphs, regular graphs, cycles and various graphs generated by graph operations. As mentioned in the introduction of the paper, for α = 0 the generalized distance matrix Dα(G) is equivalent to the distance matrix D(G), and for α = 1/2, twice the generalized distance matrix Dα(G) is the same as the distance signless Laplacian matrix DQ(G). Therefore, if in particular we put α = 0 and α = 1/2 in all the results obtained in Section 3, we obtain the corresponding bounds for the distance Gaussian Estrada index P^D(G) and the distance signless Laplacian Gaussian Estrada index P^Q(G), respectively.

Since the generalized distance spectrum of Kn is spec(Kn) = {n − 1, [αn − 1]^{n−1}} (that is, αn − 1 with multiplicity n − 1), we have the following.

Lemma 7. Let Kn be the complete graph of order n. Then
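Since Lemma 7 follows directly from the spectrum spec(Kn) = {n − 1, [αn − 1]^{n−1}}, its value can be checked numerically; a minimal sketch, in which the closed form below is our own reading of that spectrum rather than a quotation of the lemma:

    import numpy as np

    def p_alpha_Kn_direct(n, alpha):
        D = np.ones((n, n)) - np.eye(n)                  # distance matrix of K_n
        Dalpha = alpha * np.diag(D.sum(axis=1)) + (1 - alpha) * D
        return np.exp(-np.linalg.eigvalsh(Dalpha) ** 2).sum()

    def p_alpha_Kn_closed_form(n, alpha):
        # From spec(K_n) = {n-1, (alpha*n - 1) with multiplicity n-1}:
        return np.exp(-(n - 1) ** 2) + (n - 1) * np.exp(-(alpha * n - 1) ** 2)

    print(np.isclose(p_alpha_Kn_direct(7, 0.3), p_alpha_Kn_closed_form(7, 0.3)))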


Following [34], the generalized distance spectrum of the complete bipartite graph Ka,b is

where η = (α − 2) 2 (a 2 + b 2 ) + 2ab(α 2 − 2). We have the following expression for the Pα(G) of the complete bipartite graph. Lemma 8. Let Ka,b be the complete bipartite graph of order a + b. Then

If we set α = 0, the distance Gaussian Estrada index of Ka,b is as follows.

Corollary 4.

Moreover, if we set α = 1/2, the distance signless Laplacian Gaussian Estrada index of Ka,b is as follows.

Corollary 5.

The next result gives an expression for Pα(G) of a transmission regular graph G through the distance eigenvalues of G. Theorem 7. Let G be a k-transmission regular graph of order n having distance eigenvalues µ1, µ2, . . . , µn. Then


Proof. Note that the generalized distance spectrum of the graph G reads as αk + (1 − α)µ1 ≥ αk + (1 − α)µ2 ≥ · · · ≥ αk + (1 − α)µn, where µ1 ≥ · · · ≥ µn is the distance spectrum of G. Therefore,

as desired.

We next present an expression for Pα(G) of a regular graph G in terms of the adjacency eigenvalues of G.

Theorem 8. Let G be an r-regular graph of order n, size m and diameter at most 2. If {r, λ2, . . . , λn} are the eigenvalues of the adjacency matrix A(G) of G, then

Proof. We know that the transmission of each vertex v ∈ V(G) is Tr(v) = d(v) + 2(n − d(v) − 1) = 2n − d(v) − 2 and the Wiener index of G is W(G) = n² − n − m. Moreover,

where J is the all ones matrix. Therefore, we obtain

as desired.

Example 1. Let Cn be a cycle of order n. Since Cn is a k-transmission regular graph with k = n²/4 if n is even and k = (n² − 1)/4 if n is odd, the generalized distance Gaussian Estrada index of Cn, according to the parity of n, is as follows. If n = 2p (i.e., n is even), then following [2] the distance spectrum of Cn is

(22) Hence, applying Theorem 7 we have


If n = 2p + 1 (i.e., n is odd), then following [2] the distance spectrum of Cn is (23) Hence, applying Theorem 7 we have

Therefore, if we set α = 0, the distance Gaussian Estrada index of Cn for n = 2p is

and for n = 2p + 1 is

The graph G∇G is obtained by connecting each vertex of G to each vertex of a copy of G.

Example 2. Let G be an r-regular graph with adjacency matrix A and adjacency spectrum spec(G) = {r, λ2, . . . , λn}. For any vertex v ∈ G∇G, it is easy to see that Tr(v) = 3n − r − 2. Then G∇G is a transmission regular graph and Tr(G∇G) = (3n − r − 2)I. Note that by [35] the graph G∇G has distance spectrum spec , for i = 2, . . . , n. By Theorem 7, we have

where σ1 = α(5n − r − 2) + n − r − 2, σ2 = (α + 1)(3n − r − 2) and σ3 = 2α(λi + 3n − r) − 2λi − 4. Therefore, if we set α = 0, the distance Gaussian Estrada index of G∇G is


If we set α = 1/2, the distance signless Laplacian Gaussian Estrada index of G∇G is

where σ1 = 7n − 3r − 6, σ2 = 3(3n − r − 2) and σ3 = 2(3n − λi − r − 4).

The cartesian product of two graphs G1 and G2 is denoted by G1 × G2. The cartesian product can be viewed as a graph with vertex set V(G1) × V(G2) and edge set containing all edges {(u1, u2),(v1, v2)} such that u1 = v1 and u2v2 ∈ E(G2) or u2 = v2 and u1v1 ∈ E(G1). Example 3. Let G be an r-regular graph of diameter at most 2 with an adjacency matrix A and adjacency spectrum spec(G) = {r, λ2, . . . , λn}, and let H = G × K2. Suppose V(G) = {v1, v2, . . . , vn} and V(K2) = {w1, w2}. From the fact dH((vi , wj),(vs , wt)) = dG(vi , vs) + dK2 (wj , wt) = dG(vi , vs) + 1, we see that all vertices of H have the same transmission and TrH(vi , wj) = 5n − 2r − 4. So Tr(H) = (5n − 2r − 4)I. Note that by [35], the graph H = G ×

K2 has distance spectrum spec(H) = , for i = 2, . . . , n. Then by Theorem 7, we have

where ξ1 = α(11n − 4r − 8) − n and ξ2 = α(λi + 5n − 2r − 2) − λi − 2. Therefore, if we set α = 0, the distance Gaussian Estrada index of H = G × K2 is

If we set α = 1/2, the distance signless Laplacian Gaussian Estrada index of H = G × K2 is

where ξ1 = 9n − 4r − 8 and ξ2 = 5n − λi − 2r − 6.


Now, we give an expression for the generalized distance Gaussian Estrada index of the lexicographic product G[H] built upon two graphs G and H.

Definition 1. [1] Let G and H be two graphs on vertex sets V(G) = {u1, u2, . . . , up} and V(H) = {v1, v2, . . . , vn}, respectively. Their lexicographic product G[H] is the graph with vertex set V(G[H]) = V(G) × V(H), the Cartesian product of V(G) and V(H), in which u = (u1, v1) is adjacent to v = (u2, v2) if and only if either (a) u1 is adjacent to u2 in G, or (b) u1 = u2 and v1 is adjacent to v2 in H.

Example 4. Let G be a k-transmission regular graph of order p. Let H be an r-regular graph of order n with adjacency eigenvalues {r, λ2, . . . , λn}. Let {µ1, . . . , µp} be the eigenvalues of the distance matrix D(G) of G. For v ∈ G[H], it is easy to see that Tr(v) = r + 2(n − r − 1) + kn = kn + 2n − r − 2. Then G[H] is a transmission regular graph and Tr(G[H]) = (kn + 2n − r − 2) I. Note that by [36], the graph G[H] has distance spectrum spec(G[H]) = , for i = 1, . . . , p and j = 2, . . . , n. In view of Theorem 7, we have

where ζ1 = α(2kn − nµi + 2n − r − 2) + nµi + 2n − r − 2 and ζ2 = α(kn + λi + 2n − r) − λi − 2. Similarly, if we set α = 0, the distance Gaussian Estrada index of G[H] is

If we set α = 1/2, the distance signless Laplacian Gaussian Estrada index of G[H] is

where ζ1 = 2kn + nµi + 6n − 3r − 6 and ζ2 = kn − λi + 2n − r − 4.

We conclude by computing the generalized distance Gaussian Estrada index of the closed fence graph. Example 5. Let Cn be a cycle of order n and K2 be the complete graph of


order 2. Then the closed fence graph is defined as G = Cn[K2], and depicted in Figure 1. Applying Example 4, we will be able to compute the generalized distance Gaussian Estrada index of closed fence G = Cn[K2]. It is well known that the adjacency spectrum of the graph K2 is spec(K2) = {1, −1}. Then, applying Example 4, the generalized distance Gaussian Estrada index of closed fence Cn[K2], according to the parity of n, is as follows. If n = 2z (i.e., n is even), then by Equation (22) and applying Example 4 we have

If n = 2z + 1 (i.e., n is odd), then by Equation (23) and applying Example 4 we have

Figure 1. The closed fence graph.

CONCLUSIONS

The concept of the Estrada index of a graph was first motivated by Ernesto Estrada in [9] as the sum of the exponentials of the eigenvalues of the adjacency matrix of a graph. It has attracted increasing attention in recent


years and has been extended to varied forms involving many of the important graph matrices, such as the Laplacian matrix and the distance matrix. The recently introduced Gaussian Estrada index H(G) [26] has the merit of encoding the information hidden in the eigenvalues close to zero, which is overlooked in other Estrada indices. It has also played an essential role in quantum mechanics [27]. In this paper we have proposed a new sort of Estrada index based upon the Gaussianization of the generalized distance matrix of a graph. Let ∂1, ∂2, . . . , ∂n be the generalized distance eigenvalues of a graph G. We defined the generalized distance Gaussian Estrada index as Pα(G) = ∑_{i=1}^n e^{−∂i²}, which reduces to merging the spectral theories of the Gaussian Estrada index with respect to the distance matrix and the Gaussian Estrada index with respect to the distance signless Laplacian matrix; any result regarding the spectral properties of the generalized distance Gaussian Estrada index has its counterpart for each of these particular indices, and these counterparts follow immediately from a single proof. Since characterization of Pα(G) turns out to be highly desirable in quantum information theory, it is interesting to study the quantity Pα(G) and explore some properties including the bounds, the dependence on the structure of the graph G and the dependence on the parameter α. We established some bounds for the generalized distance Gaussian Estrada index Pα(G) of a connected graph G through different graph parameters including the order n, the Wiener index W(G), the transmission degrees and the parameter α ∈ [0, 1]. We have also characterized the extremal graphs attaining these bounds. Expressions for Pα(G) of some fundamental special graphs have also been worked out.

ACKNOWLEDGMENTS The authors would like to thank the academic editor and the three anonymous referees for their constructive comments that helped improve the paper. The research of A. Alhevaz was in part supported by a grant from Shahrood University of Technology. This work of Y. Shang was supported in part by the UoA Flexible fund of Northumbria University.

CONFLICTS OF INTEREST The authors declare no conflict of interest.


REFERENCES

1. Cvetković, D.M.; Doob, M.; Sachs, H. Spectra of Graphs—Theory and Application; Academic Press: New York, NY, USA, 1980.
2. Aouchiche, M.; Hansen, P. Two Laplacians for the distance matrix of a graph. Linear Algebra Appl. 2013, 439, 21–33.
3. Alhevaz, A.; Baghipur, M.; Hashemi, E.; Ramane, H.S. On the distance signless Laplacian spectrum of graphs. Bull. Malay. Math. Sci. Soc. 2019, 42, 2603–2621.
4. Alhevaz, A.; Baghipur, M.; Paul, S. On the distance signless Laplacian spectral radius and the distance signless Laplacian energy of graphs. Discrete Math. Algorithms Appl. 2018, 10, 19.
5. Xing, R.; Zhou, B.; Li, J. On the distance signless Laplacian spectral radius of graphs. Linear Multilinear Algebra 2014, 62, 1377–1387.
6. Nikiforov, V. Merging the A- and Q-spectral theories. Appl. Anal. Discrete Math. 2017, 11, 81–107.
7. Cui, S.Y.; He, J.X.; Tian, G.X. The generalized distance matrix. Linear Algebra Appl. 2019, 563, 1–23.
8. Cvetković, D.M. Applications of graph spectra: An introduction to the literature. Appl. Graph Spectra 2009, 21, 7–31.
9. Estrada, E. Characterization of the amino acid contribution to the folding degree of proteins. Proteins 2004, 54, 727–737.
10. Gutman, I.; Estrada, E.; Rodriguez-Velázquez, J.A. On a graph-spectrum-based structure descriptor. Croat. Chem. Acta 2007, 80, 151–154.
11. De la Peña, J.A.; Gutman, I.; Rada, J. Estimating the Estrada index. Linear Algebra Appl. 2007, 427, 70–76.
12. Gutman, I.; Furtula, B.; Chen, X.; Qian, J. Resolvent Estrada index—Computational and mathematical studies. Match Commun. Math. Comput. Chem. 2015, 74, 431–440.
13. Gutman, I.; Deng, H.; Radenković, S. The Estrada index: An updated survey. Sel. Top. Appl. Graph Spectra 2011, 22, 155–174.
14. Estrada, E. Characterization of 3-D molecular structure. Chem. Phys. Lett. 2000, 319, 713–718.
15. Estrada, E.; Rodriguez-Velázguez, J.A.; Randić, M. Atomic branching in molecules. Int. J. Quantum Chem. 2006, 106, 823–832.
16. Shang, Y. Local natural connectivity in complex networks. Chin. Phys. Lett. 2011, 28, 068903.
17. Shang, Y. Biased edge failure in scale-free networks based on natural connectivity. Indian J. Phys. 2012, 86, 485–488.
18. Estrada, E. The Structure of Complex Networks—Theory and Applications; Oxford Univ. Press: New York, NY, USA, 2012.
19. Ayyaswamy, S.K.; Balachandran, S.; Venkatakrishnan, Y.B.; Gutman, I. Signless Laplacian Estrada index. Match Commun. Math. Comput. Chem. 2011, 66, 785–794.
20. Güngör, A.D.; Bozkurt, Ş.B. On the distance Estrada index of graphs. Hacet. J. Math. Stat. 2009, 38, 277–283.
21. Ilić, A.; Zhou, B. Laplacian Estrada index of trees. Match Commun. Math. Comput. Chem. 2010, 63, 769–776.
22. Shang, Y. Distance Estrada index of random graphs. Linear Multilinear Algebra 2015, 63, 466–471.
23. Shang, Y. Bounds of distance Estrada index of graphs. Ars Combin. 2016, 128, 287–294.
24. Alhomaidhi, A.A.; Al-Thukair, F.; Estrada, E. Gaussianization of the spectra of graphs and networks. Theory and applications. J. Math. Anal. Appl. 2019, 470, 876–897.
25. Kutzelnigg, W. What I like about Hückel theory. J. Comput. Chem. 2007, 28, 25–34.
26. Estrada, E.; Alhomaidhi, A.A.; Al-Thukair, F. Exploring the “Middel Earth” of network spectra via a Gaussian matrix function. Chaos 2017, 27, 023109.
27. Wang, L.W.; Zunger, A. Solving Schrödinger’s equation around a desired energy: Application to silico quantum dots. J. Chem. Phys. 1994, 100, 2394.
28. Shang, Y. Lower bounds for Gaussian Estrada index of graphs. Symmetry 2018, 10, 325.
29. Aouchiche, M.; Hansen, P. Distance spectra of graphs: A survey. Linear Algebra Appl. 2014, 458, 301–386.
30. Indulal, G. Sharp bounds on the distance spectral radius and the distance energy of graphs. Linear Algebra Appl. 2009, 430, 106–113.
31. Minć, H. Nonnegative Matrices; John Wiley and Sons Inc.: New York, NY, USA, 1988.
32. Diaz, R.C.; Rojo, O. Sharp upper bounds on the distance energies of a graph. Linear Algebra Appl. 2018, 545, 55–75.
33. Zhou, B.; Gutman, I.; Aleksić, T. A note on Laplacian energy of graphs. Match Commun. Math. Comput. Chem. 2008, 60, 441–446.
34. Ganie, H.A.; Pirzada, S.; Alhevaz, A.; Baghipur, M. Generalized distance spectral spread of a graph. 2019, submitted.
35. Indulal, G.; Gutman, I.; Vijayakumar, A. On distance energy of graphs. Match Commun. Math. Comput. Chem. 2008, 60, 461–472.
36. Indulal, G. The distance spectrum of graph compositions. Ars. Math. Contemp. 2009, 2, 93–100.

Nullity and Energy Bounds of Central Graph of Smith Graphs

9

Usha Sharma1 and Renu Naresh2

1,2 Department of Mathematics and Statistics, Banasthali University, Banasthali, Rajasthan, India

ABSTRACT A Smith graph G is a graph whose largest eigenvalue equals 2. The nullity η(G) of a graph G is the multiplicity of the eigenvalue 0 in the spectrum of its adjacency matrix A(G). The energy of a graph is the sum of the absolute values of the eigenvalues of its adjacency matrix. The central graph C(G) of a graph G is obtained by subdividing each edge exactly once and joining all the non-adjacent vertices of G.

Citation: (APA): Sharma, U., & Naresh, R. (2015). Nullity and Energy Bounds of Central Graph of Smith Graphs. IJSR, 2015. ISSN: 2319-7064 (6 pages). URL: https://www.ijsr.net/archive/v5i12/ART20163183.pdf Copyright: Open Access. This article is distributed under the terms of the Creative Commons Attribution 3.0 Unported (CC BY 3.0).


We have evaluated the nullity and the energy bounds of the central graph of Smith graphs. By Hückel molecular orbital theory, these spectral properties are applicable for determining the stability of unsaturated conjugated hydrocarbons whose molecular graphs are isomorphic to central graphs of Smith graphs. Keywords: Smith graph, Central graph, Nullity, Energy, Hückel molecular orbital theory

INTRODUCTION Spectral graph theory is the study of properties of graphs in relation to the characteristic polynomials, eigenvalues, and eigenvectors of matrices associated with graphs. In the theory of graph spectra, some special types of graphs are studied in detail; their characteristics are well known and are summarized in [CvDSa]. Here we discuss the Smith graphs. The Smith graphs are Cn (n ≥ 3), Wn (n ≥ 6), S5 = K1,4, H7, H8 and H9.

Figure 1: Cn; n ≥ 3, Wn; n ≥ 6, S5 = K1,4, H7, H8, H9.

The central graph C (G) of a graph G is obtained by subdividing each edge of G exactly once and joining all the nonadjacent vertices of G.

Figure 2: C (Wn).


Here, we have evaluated the nullity of this special family of graphs and have also derived energy bounds. Let G be a graph of order n, with vertex set V(G) and edge set E(G). Let A(G) = [aij] be the adjacency matrix of G, where aij = 1 if vi and vj are adjacent in G and aij = 0 otherwise.

A scalar λ is called an eigenvalue of the square matrix A(G) if there exists a nonzero vector X such that AX = λX; X is called an eigenvector corresponding to the eigenvalue λ. In other words, the eigenvalues are the scalars satisfying the characteristic equation of the adjacency matrix of G, that is |A − λI| = 0, where I is the identity matrix. The nullity of G is the multiplicity of the eigenvalue 0 in the spectrum of A(G); it was first introduced by Gutman. The chemical importance of this graph-spectrum-based invariant lies in the fact that, within the Hückel molecular orbital model, if η(G) > 0 for the molecular graph G, then the corresponding chemical compound is highly reactive and unstable or nonexistent, and if η(G) = 0 then the respective molecule is predicted to have a stable, closed-shell electron configuration and low chemical reactivity. The energy concept has its roots in chemistry in the 1940s; Ivan Gutman (1978) gave the first graph-theoretic interpretation of the energy of a graph. The inspiration for his definition, however, goes back much earlier, to the 1930s, when Erich Hückel proposed the famous Hückel molecular orbital theory. Hückel's method permits chemists to approximate energies related to π-electron orbitals in a special class of molecules called conjugated hydrocarbons. The method assumes that the Hamiltonian operator is a simple linear combination of certain orbitals and uses the time-independent Schrödinger equation to solve for the desired energies (2012). Gunthard, H. H. and Primas, H. (1956) realized that the matrix used in the Hückel method is a first-degree polynomial of the adjacency matrix of a certain graph associated with the molecule being studied.

134

Graphs: Theory and Algorithms

Moreover, under certain reasonable assumptions about the molecule, its total π-electron energy can be written in terms of the energy of the graph, defined as the sum of the absolute values of the eigenvalues λi of the adjacency matrix A(G), that is, E(G) = Σ_{i=1}^{n} |λi|, which is related to the total π-electron energy of the molecular graph.

We have obtained the energy bounds of the graph with the help of the Cauchy–Schwarz inequality. Let a1, …, an and b1, …, bn be two real sequences. Then (Σ_{i=1}^{n} ai bi)² ≤ (Σ_{i=1}^{n} ai²)(Σ_{i=1}^{n} bi²).
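The definitions of nullity and energy, together with the Cauchy–Schwarz inequality just stated, are easy to check numerically. The following is a minimal sketch (my own illustration, not the authors' code), using the Smith graph S5 = K1,4 as a test case; the bound checked in the last line is the standard consequence of the inequality above with ai = 1 and bi = |λi|.

```python
import networkx as nx
import numpy as np

def spectrum(G):
    """Eigenvalues of the adjacency matrix A(G)."""
    return np.linalg.eigvalsh(nx.to_numpy_array(G))

def nullity(G, tol=1e-8):
    """Multiplicity of the eigenvalue 0, computed as n - rank(A(G))."""
    A = nx.to_numpy_array(G)
    return G.number_of_nodes() - np.linalg.matrix_rank(A, tol=tol)

def energy(G):
    """Graph energy: sum of the absolute values of the adjacency eigenvalues."""
    return float(np.abs(spectrum(G)).sum())

if __name__ == "__main__":
    G = nx.star_graph(4)                       # S5 = K_{1,4}
    n, m = G.number_of_nodes(), G.number_of_edges()
    E = energy(G)
    print("nullity:", nullity(G), "energy:", E)
    # Cauchy-Schwarz with a_i = 1, b_i = |lambda_i| gives E <= sqrt(2*m*n)
    print("E <= sqrt(2mn):", E <= np.sqrt(2 * m * n) + 1e-9)
```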

The rest of the paper is organized as follows. In Section 2 we review work related to the nullity and energy of graphs. In Section 3 we state a proposition used throughout the paper. In Section 4 we determine new results on the nullity of the central graph of Smith graphs. In Section 5 we evaluate energy bounds of the central graphs of Smith graphs. We conclude in Section 6.

LITERATURE REVIEW Spectral graph theory emerged during the two decades from 1940 to 1960. "The theory of graph spectra", written by Cvetkovic, D. M. et al. (1980), collected the research of this area in monograph form, and the theory of graph spectra was further updated with newer results (1989). Many developments have since appeared in this arena; for example, Lubotzky, A. et al. (1988) gave isoperimetric properties of expander graphs. Currently developed spectral techniques are more powerful and convenient for well-established classes of graphs. Gutman, I. (2001) showed that the nullity of the line graph of a tree, η(L(T)), is at most one. Borovićanin, B. and Gutman, I. (2011), in their paper on the nullity of graphs, explained the chemical importance of the graph spectrum within Hückel molecular orbital theory and surveyed recently obtained general mathematical results on the nullity η(G). Barrett, W. et al. (2014) found the maximum nullity of a complete subdivision graph. Gu, R. et al. (2014) studied the Randić incidence energy of graphs. Sharaf, K. R. and Rasul, K. B. (2014) gave results on the nullity of expanded graphs, and Sharaf, K. R. and Ali, D. A. (2014) studied the nullity of t-tupple graphs. S. (2014) showed that the nullity set of unicyclic graphs depends on the extremal nullity.


Coulson, C. A. et al. (1978) explained some graphical aspects of Hückel theory and also discussed the energy levels of a graph. Biggs, N. (1993) discussed the applications of linear algebra and matrix theory in algebraic graph theory. Gutman, I. (2001) presented fundamental mathematical results on graph energy, the relation between E(G) and the characteristic polynomial of G, and bounds for E. Zhou, B. (2004) presented an upper bound for the energy of a graph in terms of its degree sequence and specified the maximal energy graph and the maximal energy bipartite graph. Adiga, C. et al. (2007) determined several classes of graphs, such as biregular, molecular, and triregular graphs, that satisfy the condition E(G) > n. Gutman, I. et al. (2012) explicated the chemical origin of the graph energy concept and briefly surveyed applications of total π-electron energy. Gutman, I. et al. (2015) showed that, for a fixed value of n, both the spectral radius and the energy of complete p-partite graphs are minimal for the complete split graph CS(n, p−1) and maximal for the Turán graph T(n, p). Song, Y. Z. et al. (2015) proved that η(G) = |G| − 2 − 2 log₂(Δ(G)) if G is a reduced bipartite graph.

PROPOSITION Proposition: Let G be a graph with q edges and let λ1, λ2, …, λn be the eigenvalues of its adjacency matrix A(G). Then Σ_{i=1}^{n} λi = 0 and Σ_{i=1}^{n} λi² = 2q.

NULLITY OF CENTRAL GRAPH OF SMITH GRAPHS We will discuss the nullity of the central graphs of the different types of Smith graphs; many of them turn out to be nonsingular. Theorem 4.1: The nullity of the central graph of the Smith graph Wn, n ≥ 6, is zero, that is, η[C(Wn)] = 0 for all n ≥ 6.

Proof: Consider a Smith graph Wn with vertex set V = {v1, v2, v3, …, vn} and edge set E = {e1, e2, e3, …, e(n−1)}. Then C(Wn) has vertex set {v1, u1, v2, u2, …, v(n−1), u(n−1), vn} and edge set {e1, e′1, e2, e′2, …, e(n−1), e′(n−1), e13, e14, …, e1(n−1), e24, e25, …, e2(n−1), …, e(n−2)(n)}.


Figure 3: C(Wn)

Now the total number of vertices p in C(Wn) = number of vertices in Wn + number of edges in Wn = n + (n−1) = 2n − 1. The total number of edges q in C(Wn) = 2(number of edges in Wn) + number of edges between non-adjacent vertices in Wn = 2(n−1) + number of edges in Kn − number of edges in Wn = 2(n−1) + n(n−1)/2 − (n−1) = (n−1)(n+2)/2.

Now we find the nullity of the graph C(Wn). First we construct its adjacency matrix A[C(Wn)].

Now we apply elementary row operations to find the nullity of the graph. Nullity of the graph = |V(G)| − r(A(G)), where r(A(G)) is the rank of the matrix. Then r(A(G)) = number of non-zero rows in the row-reduced form of the matrix = 2n − 1, so the nullity of the graph = |C(Wn)| − r(A(G)) = (2n−1) − (2n−1) = 0. Therefore, η[C(Wn)] = 0 for all n ≥ 6.

Theorem 4.2: Nullity of central graph of smith graph H7 is zero, that is η [C (H7)] = 0. Proof: Consider a smith graph H7 with vertex set V= {v1, v2, v3, v4, v5, v6, v7} and edge set E= {e1, e2, e3, e4, e5, e6} respectively. Then, C(H7) has vertex set {v1, u1, v2, u2, …, v5, u5, v6, u6, v7} and edge set {e1, e’ 1, e2, e’ 2,…, e6, e’ 6,…, e13, e14,…, e17, e24, e25,…, e27,…, e56, e57}.


Figure 4: C (H7)

Now the total number of vertices in C(H7) = number of vertices in H7 + number of edges in H7 = 13, and the total number of edges in C(H7) = 2(number of edges in H7) + number of edges between non-adjacent vertices in H7 = 12 + number of edges in K7 − number of edges in H7 = 12 + 21 − 6 = 27. Now we find the nullity of the graph C(H7). We construct the adjacency matrix A = [aij]n×n; here A[C(H7)] has order 13 × 13.

We apply elementary row operations to determine the nullity of the graph. Nullity of the graph = |V(G)| − r(A(G)), where r(A(G)) is the rank of the matrix. The rank of the matrix = number of non-zero rows in the row-reduced form = 13. Therefore, η[C(H7)] = 13 − 13 = 0.

Theorem 4.3: Nullity of central graph of smith graph H8 is zero. i.e. η [C (H8)] = 0. Proof: Consider a smith graph H8 with vertex set V= {v1, v2,…, v8} and


edge set E = {e1, e2,…, e7} respectively. Then, C (H8) has vertex set {v1, u1, v2, u2,…, u7, v8} and edge set {e1, e’ 1, e2, e’ 2,…, e7, e’ 7, e13, e14,…, e18, e24, e25,…, e28,…, e68}.

Figure 5: C (H8).

Now the total number of vertices in C(H8) = number of vertices in H8 + number of edges in H8 = 15, and the total number of edges in C(H8) = 2(number of edges in H8) + number of edges between non-adjacent vertices in H8 = 14 + number of edges in K8 − number of edges in H8 = 14 + 28 − 7 = 35. Now we find the nullity of the graph C(H8).

Firstly we construct the adjacency matrix A = [aij]15×15 and apply elementary row operations to find its rank. Nullity of the graph = |V(G)| − r(A(G)), where r(A(G)) is the rank of the matrix. Then the rank of the matrix = number of non-zero rows in the row-reduced form = 15. Therefore, η[C(H8)] = 0. Theorem 4.4: The nullity of the central graph of the Smith graph S5 is zero, i.e. η[C(S5)] = 0.

Proof: Consider the Smith graph S5 with vertex sets V1 = {v1} and V2 = {u1, u2, u3, u4} and edge set E = {e1, e2, e3, e4}. Then C(S5) has vertex set {v1, w1, u1, w2, u2, w3, u3, w4, u4} and edge set {e1, e′1, e2, e′2, e3, e′3, e4, e′4, e12, e13, e14, e23, e24, e34}.


Figure 6: C(S5).

Now the total number of vertices in C(S5) = number of vertices in S5 + number of edges in S5 = 9, and the total number of edges in C(S5) = 2(number of edges in S5) + number of edges between non-adjacent vertices in S5 = 8 + number of edges in K5 − number of edges in S5 = 8 + 10 − 4 = 14. Now we find the nullity of the graph C(S5) = C(K1,4). Firstly we construct the adjacency matrix A = [aij]9×9 and apply elementary row operations to find its rank. Nullity of the graph = |V(G)| − r(A(G)), where r(A(G)) is the rank of the matrix. Then the rank of the matrix = number of non-zero rows in the row-reduced form = 9. Therefore, η[C(S5)] = 0. Theorem 4.5: The nullity of the central graph of the Smith graph H9 is one, i.e. η[C(H9)] = 1.

Proof: Consider the Smith graph H9 with vertex set V = {v1, v2, …, v9} and edge set E = {e1, e2, …, e8}. Then C(H9) has vertex set {v1, u1, v2, u2, …, u8, v9} and edge set {e1, e′1, e2, e′2, …, e8, e′8, e13, e14, …, e19, e24, e25, …, e29, …, e79}.

Figure 7: C(H9).

Now the total number of vertices in C(H9) = number of vertices in H9 + number of edges in H9 = 17, and the total number of edges in C(H9) = 2(number of edges in H9) + number of edges between non-adjacent vertices in H9 = 16 + number of edges in K9 − number of edges in H9 = 16 + 36 − 8 = 44. Now we find the nullity of the graph C(H9). Firstly we construct the adjacency matrix A = [aij]17×17 and apply elementary row operations to find its rank. Nullity of the graph = |V(G)| − r(A(G)), where r(A(G)) is the rank of the matrix. Then the rank of the matrix = number of non-zero rows in the row-reduced form = 16. Therefore, η[C(H9)] = 17 − 16 = 1.


Theorem 4.6: The nullity of the central graph of the Smith graph Cn, n ≥ 3, is as follows: η[C(Cn)] = 0 if n is odd, and η[C(Cn)] = 1 if n is even.

Proof: Consider the Smith graph Cn with vertex set V = {v1, v2, v3, …, vn} and edge set E = {e1, e2, e3, …, en}. Then C(Cn) has vertex set {v1, u1, v2, u2, …, vn, un} and edge set {e1, e′1, e2, e′2, …, en, e′n, e13, e14, …, e1(n−1), e24, e25, …, e2n, …, e(n−2)n}.

Figure 8: C(Cn)

Now the total number of vertices in C(Cn) = number of vertices in Cn + number of edges in Cn = 2n, and the total number of edges in C(Cn) = 2(number of edges in Cn) + number of edges between non-adjacent vertices in Cn = 2n + number of edges in Kn − number of edges in Cn = 2n + n(n−3)/2. Now we find the nullity of the graph C(Cn). Firstly we construct the adjacency matrix A = [aij]2n×2n of C(Cn) and apply elementary row operations to find its rank. If n is odd, then the nullity of C(Cn) = |C(Cn)| − rank of the matrix = 0, and if n is even, then the nullity of C(Cn) = |C(Cn)| − rank of the matrix = 1. Therefore, η[C(Cn)] = 0 for odd n and η[C(Cn)] = 1 for even n.
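The constructions and rank computations in this section are easy to reproduce numerically. The following is a minimal sketch (not the authors' code) that builds the central graph C(G) as defined in the Introduction and computes the nullity as n − rank(A); the helper names and the labels used for subdivision vertices are my own choices, and the expected values in the comments are those claimed by Theorems 4.4 and 4.6.

```python
import networkx as nx
import numpy as np
from itertools import combinations

def central_graph(G):
    """Central graph C(G): subdivide each edge of G exactly once and join
    every pair of vertices that are non-adjacent in G."""
    C = nx.Graph()
    C.add_nodes_from(G.nodes())
    for k, (u, v) in enumerate(G.edges()):
        w = ("sub", k)                       # new subdivision vertex on edge uv
        C.add_edges_from([(u, w), (w, v)])
    for u, v in combinations(G.nodes(), 2):
        if not G.has_edge(u, v):
            C.add_edge(u, v)
    return C

def nullity(G, tol=1e-8):
    A = nx.to_numpy_array(G)
    return G.number_of_nodes() - np.linalg.matrix_rank(A, tol=tol)

if __name__ == "__main__":
    print("C(S5):", nullity(central_graph(nx.star_graph(4))))   # expect 0 by Theorem 4.4
    print("C(C7):", nullity(central_graph(nx.cycle_graph(7))))  # expect 0 (Theorem 4.6, n odd)
    print("C(C8):", nullity(central_graph(nx.cycle_graph(8))))  # expect 1 (Theorem 4.6, n even)
```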

Energy bounds of C(Wn), n ≥ 6, C(Hn), 7 ≤ n ≤ 9, and C(S5). Theorem 5.1: Let G denote any of the central graphs C(Wn) (n ≥ 6), C(H7), C(H8), C(H9), C(S5) of the Smith graphs, where n is the order of the underlying Smith graph, and let λ1, λ2, …, λ(2n−1) be the eigenvalues of the adjacency matrix A(G). Then the energy E(G) satisfies

(n − 1)(n + 2) + (2n − 1)(2n − 2) (∏_{i=1}^{2n−1} |λi|)^{2/(2n−1)} ≤ E²(G) ≤ (2n − 1)(n − 1)(n + 2).

Proof: Upper Bound: Note that, by the Proposition, Σ_{i=1}^{2n−1} λi² = 2q = (n − 1)(n + 2). By the Cauchy–Schwarz inequality (taking ai = 1 and bi = |λi|), the energy of the given graph satisfies

E[C(G)] = Σ_{i=1}^{2n−1} |λi| ≤ ((2n − 1) Σ_{i=1}^{2n−1} λi²)^{1/2} = ((2n − 1)(n − 1)(n + 2))^{1/2}.   (1)

Lower Bound: We use the arithmetic–geometric mean inequality:

E[C(G)] = Σ_{i=1}^{2n−1} |λi|,

E²[C(G)] = (Σ_{i=1}^{2n−1} |λi|)² = Σ_{i=1}^{2n−1} λi² + 2 Σ_{i<j} |λi λj|,

E²[C(G)] ≥ (n − 1)(n + 2) + (2n − 1)(2n − 2) · A.M.(|λi λj|),

E²[C(G)] ≥ (n − 1)(n + 2) + (2n − 1)(2n − 2) · G.M.(|λi λj|).   (2)

Moreover,

G.M.(|λi λj|) = (∏_{i<j} |λi λj|)^{2/((2n−1)(2n−2))} = (∏_{i=1}^{2n−1} |λi|^{2n−2})^{2/((2n−1)(2n−2))} = (∏_{i=1}^{2n−1} |λi|)^{2/(2n−1)}.

Putting this value of the G.M. into equation (2), we obtain

E²[C(G)] ≥ (n − 1)(n + 2) + (2n − 1)(2n − 2) (∏_{i=1}^{2n−1} |λi|)^{2/(2n−1)}.   (3)
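The arithmetic–geometric mean step used in the lower bound above is an instance of the classical McClelland-type inequality E(G)² ≥ 2m + N(N − 1)|det A(G)|^(2/N) for a graph on N vertices with m edges. The following is a small numerical sanity check of that general inequality (an illustrative sketch, not the authors' code).

```python
import networkx as nx
import numpy as np

def energy_lower_bound_check(G):
    """Check E(G)^2 >= 2m + N(N-1)*|det A|^(2/N), the AM-GM step used above."""
    A = nx.to_numpy_array(G)
    N = A.shape[0]
    m = G.number_of_edges()
    eigs = np.linalg.eigvalsh(A)
    E = np.abs(eigs).sum()
    det_abs = np.abs(np.prod(eigs))            # |det A(G)| = prod |lambda_i|
    bound = 2 * m + N * (N - 1) * det_abs ** (2 / N)
    return E ** 2, bound, E ** 2 >= bound - 1e-9

if __name__ == "__main__":
    for G in (nx.petersen_graph(), nx.complete_graph(6), nx.path_graph(7)):
        print(energy_lower_bound_check(G))
```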


CONCLUSIONS We have shown that if a chemical compound has a molecular graph isomorphic to the central graph of a Smith graph, other than C(H9) or C(Cn) with n even, then it is predicted to be stable, with a closed-shell electron configuration and low chemical reactivity; whereas a chemical compound whose molecular graph is isomorphic to C(H9) or to C(Cn) with n even is predicted to be unstable, with an open-shell electron configuration, and highly reactive.


REFERENCES
1. Adiga C., Khoshbakht Z. and Gutman I. (2007), "More graphs whose energy exceeds the number of vertices", Iranian Journal of Mathematical Sciences and Informatics, vol. 2, no. 2, pp. (57-62).
2. Biggs N. (1993), "Algebraic graph theory", Cambridge University Press, Cambridge, 2nd Edition.
3. Bo Zhou (2004), "Energy of Graph", MATCH Communications in Mathematical and in Computer Chemistry, no. 51, pp. (111-118).
4. Coulson C. A., Leary B. D. and Mallion R. B. (1978), "Huckel theory for Organic Chemistry", Academic Press Inc., ISBN 10: 0121932508 / ISBN 13: 9780121932503.
5. Dragan Stevanovic, Ivan Gutman and Masood U. Rehman (2015), "On spectral radius and energy of complete multipartite graphs", Ars Mathematica Contemporanea, no. 9, pp. (109-113).
6. Gutman I. (1978), "The energy of a graph", 10. Steiermarkisches Mathematisches Symposium (Stift Rein, Graz, 1978), no. 103, pp. (1-22).
7. Gutman I. (2001), "The energy of a graph", Algebraic Combinatorics and Applications, pp. (196-211).
8. Gutman I., Antoaneta Klobucar and Snejezana Majstorovic (2009), "Theory of Graph Energy: Hypoenergetic Graphs, Applications of Graph Spectra", Mathematical Institution, Belgrade, pp. (65-105).
9. Gutman I., Xueliang L. and Yongtang S. (2012), "Graph energy", Springer-Verlag New York Inc., United States, ISBN 10: 1461442192 / ISBN 13: 9781461442196.
10. Gunthard H. H. and Primas H. (1956), "Zusammenhang von Graphentheorie und MO theory von Molekeln mit Systemen konjugierter Bindungen", Helv. Chim. Acta, no. 39, pp. (1645-1653).
11. Ya-zhi Song, Xiao-qui Song and Mingeui Zhang (2015), "An upper bound for the nullity of a bipartite graph in terms of its maximum degree", Linear and Multilinear Algebra.
12. Hu S., Xuezhong T. and Liu B. (2008), "On the nullity of bicyclic graphs", Linear Algebra and its Applications, vol. 429, pp. (1387-1391), available online at: http://www.sciencedirect.com
13. Jiang Q. and Chen S. (2011), "The nullity of 2-connected tricyclic graphs", Int. J. Contemp. Math. Sciences, vol. 6, no. 12, pp. (571-575).
14. Sharaf K. R. and Ali D. A. (2014), "Nullity of t-tupple Graphs", International Journal of Mathematical, Computational, Physical, Electrical and Computer Engineering, vol. 8, no. 2.
15. Sharaf K. R. and Rasul K. B. (2014), "On the Nullity of expanded graphs", Gen. Math. Notes, vol. 21, no. 1, pp. (97-117), available online at: http://www.geman.in.
16. Xuezhong T. and Liu B. (2005), "On the nullity of unicyclic graphs", Linear Algebra and its Applications, vol. 408, pp. (212-220), available online at: http://www.sciencedirect.com

Induced Subgraph Saturated Graphs

10

Craig M. Tennenhouse

University of New England

ABSTRACT A graph G is said to be H-saturated if G contains no subgraph isomorphic to H but the addition of any edge between non-adjacent vertices in G creates one. While induced subgraphs are often studied in the extremal case with regard to the removal of edges, we extend saturation to induced subgraphs. We say that G is induced H-saturated if G contains no induced subgraph isomorphic to H and the addition of any edge to G results in an induced copy of H. We demonstrate constructively that there are non-trivial examples of saturated graphs for all cycles and an infinite family of paths, and find a lower bound on the size of some induced path-saturated graphs. Citation: (APA): Tennenhouse, C. M. (2016). Induced Subgraph Saturated Graphs. Theory and Applications of Graphs, 3(2), 1. (15 pages) DOI: https://doi.org/10.20429/tag.2017.030201 Copyright: This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.


INTRODUCTION In this paper we address the problem of graph saturation as it pertains to induced subgraphs, in particular paths and cycles. We begin with some background and definitions, and complete Section 1 with statements of the main theorems. In Section 2 we demonstrate that there are non-trivial induced saturated graphs for an infinite family of paths, and prove lower bounds on the number of edges in possible constructions. We continue on to demonstrate results regarding induced cycles in Section 3 and claws in Section 4. A number of results were discovered using the SAGE mathematics software [11], an open source mathematics suite. Throughout we use Kn to denote the complete graph on n vertices, Cn the cycle on n vertices, Kn,m the complete bipartite graph with parts of order n and m, and Pn the path on n vertices. All graphs in this paper are simple, and by G̅ we mean the complement of the graph G. If u, v are nonadjacent vertices in G then G + uv is the graph G with edge uv added. Given graphs G and H the graph join G ∨ H is composed of a copy of G, a copy of H, and all possible edges between the vertices of G and the vertices of H. The graph union G ∪ H consists of disjoint copies of G and H. The graph kH consists of the union of k copies of H. In particular a matching is a collection of pairwise disjoint edges, denoted kK2. The order n(G) and size e(G) of G are the numbers of its vertices and edges, respectively. A vertex v in a connected graph G is a cut vertex if its removal results in a disconnected graph. If G has no cut vertex then it is 2-connected, and a maximal 2-connected subgraph of G is a block. Note that a cut edge is also a block. For simple graphs G and H we say that G is H-saturated if it contains no subgraph isomorphic to H but the addition of any edge from G̅ creates a copy of H. We refer to G as the parent graph. The study of graph saturation began when Mantel and students [9] determined the greatest number of edges in a K3-free graph on n vertices in 1907, which was generalized by Turán in the middle of the last century [12] to graphs that avoid arbitrarily large cliques. Erdős, Hajnal, and Moon then addressed the problem of finding the fewest number of edges in a Km-saturated graph [3]. In particular, they proved the following theorem. Theorem 1.1. For m ≥ 3 and n ≥ m, the unique smallest graph on n vertices that is Km-saturated is K(m−2) ∨ K̅(n−m+2). This graph contains (m − 2)(m − 3)/2 + (n − m + 2)(m − 2) edges.


Since then, graph saturation has been studied extensively, having been generalized to many other families of graphs, oriented graphs [7], topological minors [5], and numerous other properties. A comprehensive collection of results in graph saturation is available in [4]. Given a graph G and a subset X of vertices of G, the subgraph induced by X is the graph composed of the vertices X and all edges in G among those vertices. We say that a subgraph H of G is an induced subgraph if there is a set of vertices in G that induces a graph isomorphic to H. We say that G is H-free if G contains no induced subgraph isomorphic to H. Finding induced subgraphs of one graph isomorphic to another is a traditionally difficult problem. Chung, Jiang, and West addressed the problem of finding the greatest number of edges in degree-constrained Pn-free graphs [1]. Martin and Smith created the parameter of induced saturation number [10]. We include their definition below for completeness. Definition 1.2 (Martin, Smith 2012). Let T be a graph with edges colored black, white, and gray. The graph T realizes H if the black edges and some subset of the gray edges of T together include H as an induced subgraph. The induced saturation number of H with respect to an integer n is the fewest number of gray edges in such a graph T on n vertices that does not realize H but if any black or white edge is changed to gray then the resulting graph realizes H. In this paper we only consider adding edges to a simple non-colored graph. Definition 1.3. Given graphs G and H we say that G is induced H-saturated if G does not contain an induced subgraph isomorphic to H but the addition of any edge from G̅ to G creates one. Note that in Definition 1.3 we allow G to be a complete graph. This case provides for a trivial family of induced H-saturated graphs for any non-complete graph H. Henceforth we will be concerned with determining non-trivial induced H-saturated graphs.
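Definition 1.3 translates directly into a brute-force test for small graphs. The sketch below is my own illustration, not part of the paper; it uses networkx's VF2 matcher, whose subgraph_is_isomorphic method checks for node-induced subgraphs. The Petersen-graph example anticipates the claim in Section 2.1 that it is induced P6-saturated.

```python
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher
from itertools import combinations

def has_induced_copy(G, H):
    """True if some set of vertices of G induces a subgraph isomorphic to H."""
    return GraphMatcher(G, H).subgraph_is_isomorphic()

def is_induced_saturated(G, H):
    """Definition 1.3: G has no induced copy of H, but adding any edge of the
    complement of G creates one."""
    if has_induced_copy(G, H):
        return False
    for u, v in combinations(G.nodes(), 2):
        if not G.has_edge(u, v):
            G.add_edge(u, v)
            created = has_induced_copy(G, H)
            G.remove_edge(u, v)
            if not created:
                return False
    return True

if __name__ == "__main__":
    # The paper claims the Petersen graph is induced P6-saturated (Section 2.1).
    print(is_induced_saturated(nx.petersen_graph(), nx.path_graph(6)))
```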

Main Results We will prove the following results to show the existence of non-complete induced Pm-saturated graphs for infinitely many values of m.

Theorem. (2.6) For any k ≥ 0 and n ≥ 14 + 8k there is a non-complete induced P9+6k-saturated graph on n vertices. Further, if n is a multiple of (14 + 8k) there is such a graph that is 3-regular.


As we will see in Section 2.1, these orders are the result of the search for a longest induced path in a class of vertex transitive hamiltonian graphs of small size, visualizable with high rotational symmetry. Theorem. (2.11) If G is an induced Pm-saturated graph on n vertices with no pendant edges except a K2 component, m > 4, then G has size at least 3(n − 2)/2 + 1. This bound is realized when m = 9 + 6k and n ≡ 2 mod (14 + 8k).

Figure 1: An induced P5-saturated graph

Theorem. (2.14) For every integer k > 0 there is a non-complete graph that is induced P11+6k-saturated. Regarding cycles, we will prove the following theorem. Theorem. (3.3) For any k ≥ 3 and n ≥ 3(k − 2) there is a non-complete induced Ck-saturated graph of order n. Finally, we will demonstrate the following regarding induced claw-saturated graphs. Theorem. (4.2) For all n ≥ 12, there is a graph on n vertices that is induced K1,3-saturated and is non-complete.

PATHS An infinite family of paths The only induced P2-saturated graph on n ≥ 2 vertices is the empty graph K̅n. The induced P3-saturated graph of order n with the smallest size is either the matching (n/2)K2 if n is even, or ((n − 1)/2)K2 ∪ K1 if n is odd. The case for P4 is similar, consisting of the matching (n/2)K2 if n is even and ((n − 3)/2)K2 ∪ K3 if n is odd.


It is also easily seen that the graph of order 9 and size 12, consisting of a triangle each of whose vertices is shared with another triangle, as in Figure 1, is induced P5-saturated, and that the Petersen graph is induced P6-saturated.

We begin our analysis of induced path-saturated graphs by examining an infinite family of cubic hamiltonian graphs developed by Lederberg [8] and modified by Coxeter and Frucht, and later by Coxeter, Frucht, and Powers [6, 2]. For our purposes we will only consider graphs from this family denoted in LCF (for Lederberg, Coxeter, Frucht) notation by [x, −x]^a with x odd. A graph of this form consists of a cycle on 2a vertices {v0, v1, …, v2a−1} together with a matching that pairs each v2i with v2i+x, with arithmetic taken modulo 2a; the resulting graph is 3-regular. See Figure 2 for an example.

Let Gk denote the graph with LCF notation [5, −5]^(7+4k). Note that the order of Gk is 14 + 8k. First we find a long induced cycle in Gk. Fact 2.1. The graph Gk has an induced cycle of length 8 + 6k.

Figure 2: The graph G2, which has LCF notation [5, −5]^15, with an induced C20. Proof. Let n be the order of Gk, and let C = {v0, v1, v2, v3, vn−2, vn−3, vn−8, vn−9}. Then proceed to add 6 vertices at a time to C in the following way until v5 is included. Let vp be the last vertex added to C; add the vertices {vp−5, vp−4, vp−3, vp−2, vp−7, vp−8}. Once v5 is added, the graph induced by C is a chordless cycle of order 8 + 6k (Figure 2). Note that the closed neighborhoods of vn−5 and vn−6 are disjoint from the cycle C constructed in the proof of Fact 2.1. Therefore, the addition of any edge between any vertex on this cycle and vn−5 or vn−6 generates an induced path of order 9 + 6k.


A simple reflection that exchanges vn−5 and vn−6 shows that another induced cycle of the same length exists in Gk.
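The family Gk is easy to generate from its LCF notation, and for k = 0 the exhaustive search mentioned in the proof of Lemma 2.2 below is feasible by brute force. A hedged sketch follows (function names are my own; networkx provides an LCF_graph generator).

```python
import networkx as nx
from itertools import combinations

def G_k(k):
    """The graph with LCF notation [5, -5]^(7+4k); its order is 14 + 8k."""
    return nx.LCF_graph(14 + 8 * k, [5, -5], 7 + 4 * k)

def longest_induced_path_order(G):
    """Brute force: the largest vertex subset inducing a path (small graphs only)."""
    nodes = list(G.nodes())
    best = 1
    for r in range(2, len(nodes) + 1):
        for S in combinations(nodes, r):
            H = G.subgraph(S)
            degs = [d for _, d in H.degree()]
            if nx.is_connected(H) and max(degs) <= 2 and degs.count(1) == 2:
                best = r
    return best

if __name__ == "__main__":
    G0 = G_k(0)                              # 14 vertices
    # The proof of Lemma 2.2 reports a longest induced path of order 7 in G0.
    print(G0.number_of_nodes(), longest_induced_path_order(G0))
```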

We next must bound the length of induced paths in Gk. Note that a simple counting argument is not sufficient, since in general a 3-regular graph on 14 + 8k vertices may contain an induced path on as many as 10+6k vertices as seen in the following construction. Consider a path P on 10+6k vertices and an independent set X of 4+2k vertices. From each internal vertex in P add a single edge to a vertex in X, and from the endpoints in P add two edges to vertices in X, in such a way as to create a 3-regular graph. The resulting graph has order 14 + 8k and an induced path on 10 + 6k vertices.

Lemma 2.2. The graph Gk contains no induced path on more than 8 + 6k vertices. Proof. First, note that an exhaustive search of G0 yields a longest induced path of order 7. Consider the case where k ≥ 1. Let P be a longest induced path in Gk. Let V be the m = 8 + 6k vertices in the cycle C from the proof of Fact 2.1. Let the sets U, X, and Y contain the remaining vertices that have 3, 2, and 0 neighbors, respectively, in V. Note that |X| = 4 and |Y| = 2, irrespective of k, and |U| = 2k. Consider the induced path P′ on m vertices in Figure 3. For example, in Figure 2, P′ would be the path v26v1v2v3v28v29v24v23v22v21v20v19v14v13v8v9v10v11v6v5. We claim P is no longer than P′.

Figure 3: The graph Gk with longest induced path P′.

For every vertex u ∈ U in P there is one neighbor (if u is an internal vertex of P) or two neighbors (if u is a terminal vertex) from V not in P. Let us assume that UP = {u1, . . . , ul} = U ∩P, and denote by N(UP ) ⊂ V the neighbors of all vertices in UP . So P includes l vertices from U and avoids at least l vertices from N(UP ).


Similarly, assume there is a vertex x in P ∩ V that is not in P′. Then either a neighbor of x in P′ ∩ V is not in P, or a route through X ∪ Y on P′ is diverted and hence a vertex from P′ ∩ (X ∪ Y) is not in P. In either case, the inclusion of any vertex from U in P leads to at least one fewer vertex from P′, and hence does not lead to a longer induced path. Finally, we consider the possible inclusion of vertices from X ∪ Y. It is easily seen that no induced path can include all vertices from X ∪ Y. If P has no vertices from X ∪ Y then it is strictly shorter than P′. If P contains exactly one or two vertices from X then there are at least three vertices in V not in P, making P at least as short as P′. Any induced path containing 5 vertices from X ∪ Y must exclude one from Y as in P′, and so no other induced path also containing 5 of these vertices will be longer than P′. Therefore, P′ is a longest induced path in Gk, and hence Gk does not contain an induced path on more than 8 + 6k vertices. Now we demonstrate that the graph Gk is saturated with respect to the property of long induced paths. Lemma 2.3. The graph Gk is induced P9+6k-saturated.

Proof. Once again we let n = 14 + 8k, the order of Gk. By Lemma 2.2 there is no induced path of order 9 + 6k in the graph Gk. Define the bijections φ and ψ on the vertices of Gk by φ(vi) = v1−i (reflection) and ψ(vi) = vi+1 (rotation), with arithmetic modulo n. Note that both φ and ψ² are automorphisms of Gk. Given vi, vj ∈ V(Gk) there is an automorphism of the form ψ^(2k) ◦ φ^m that takes vi to vj, where k is some integer and m ∈ {0, 1}. Therefore, the graph Gk is vertex transitive under repeated applications of φ and ψ². In particular, the induced cycle C of Fact 2.1 can be rotated and reflected via these functions to yield a function f such that for any pair x, y of nonadjacent vertices in Gk there is an image f(C) so that x is on f(C) and neither y nor its neighbors are on f(C). Therefore, between any two nonadjacent vertices the addition of any edge creates an induced P9+6k. We now generalize to an arbitrary number of vertices.

Lemma 2.4. The disjoint union H of m copies of Gk, with at most one complete graph on at least 2 vertices, is induced P9+6k-saturated.

Proof. Since each connected component of H is induced P9+6k-saturated by Lemma 2.3 we need only consider the addition of edges between components. Since Gk is vertex transitive and contains an induced cycle on (8 + 6k) vertices (Fact 2.1), each vertex in Gk is the terminal vertex of an induced path on (7 + 6k) vertices. Therefore, any edge between disjoint copies of Gk creates an induced path on far more than the necessary 9 + 6k vertices. The addition of an edge between a copy of Gk and a complete component will also result in an induced P9+6k. Therefore, H is induced P9+6k-saturated.

Lemma 2.5. The join of any complete graph to any induced Pn-saturated graph, for any n ≥ 4, generates a new induced Pn-saturated graph.

Proof. Note that joining a clique to a graph does not contribute to the length of a longest induced path, except in the most trivial cases of P1, P2, and P3, nor does it add any non-edges to the graph which require testing for saturation. Hence joining a complete graph to any induced H-saturated graph, for any non-complete graph H, generates a new induced H-saturated graph. Therefore, we can prove the main result of this section. Theorem 2.6. For any k ≥ 0 and n ≥ 14 + 8k there is a non-complete induced P9+6k-saturated graph on n vertices. Further, if n is a multiple of (14 + 8k) there is such a graph that is 3-regular. Proof. By Lemma 2.3 there is such a graph on n = 14 + 8k vertices, and Lemmas 2.4 and 2.5 demonstrate that such graphs exist for arbitrarily large n.

Lower bounds As an analogue to Theorem 1.1 by Erdős, Hajnal, and Moon [3], in which smallest Kn-saturated graphs are studied, we now turn our attention to finding the smallest induced Pm-saturated graphs. Assume throughout that m > 3.

First we look at some properties of induced Pm-saturated graphs with pendant edges, and then we will turn our attention to graphs with minimum degree two. Fact 2.7. If u and v are distinct pendant vertices in an induced Pm-saturated graph G then the distance from u to v is greater than three. Proof. If u and v share a neighbor w then the addition of edge uv cannot create an induced path that includes w, so their distance is at least three. If instead u has neighbor wu and v has neighbor wv, with wu adjacent to wv, then the added edge uwv must begin an induced Pm. However, this edge can be replaced in G by vwv, so G must already contain an induced Pm. Therefore the neighbors of u and v cannot be adjacent. Next we examine the neighbor of a pendant vertex in an induced Pm-saturated graph.


Fact 2.8. Let v be a pendant vertex in a non-complete component of an induced Pm-saturated graph G, with neighbor u. Then u has degree at least four. Proof. If deg(u) = 2 then the addition of the edge joining its neighbors cannot create a longer induced path than one that includes u. Assume u only has neighbors v, a, and b. If a and b are adjacent then the added edge va must begin an induced Pm that avoids b, but we can then replace va with ua and get an induced path of the same length. If instead a and b are not adjacent then adding edge ab to G does not result in an induced path longer than one containing the path aub. Hence u has at least one other neighbor c. For the remainder of the section we will consider non-complete graphs without pendant edges. Fact 2.9. If G is induced Pm-saturated and contains a vertex v of degree 2 then the neighbors of v are adjacent. Proof. Assume that deg(v) = 2 and v has neighborhood {u, w}. If u is not adjacent to w then the addition of edge uw cannot generate a longer induced path than one originally present in G that includes edges uv, vw. Therefore, u and w must already be adjacent. As noted in the beginning of Section 2.1, a matching with possibly an isolated vertex or a connected component isomorphic to K3 constitutes an induced P3- and P4-saturated graph, respectively. Note that when m > 4 an induced Pm-saturated graph cannot have more than one complete component, as any edge between two such components generates an induced path of order at most 4. We now demonstrate that any induced Pm-saturated graph on n vertices, for m > 4, has average degree at least 3 among its non-complete components. Lemma 2.10. For m > 5 all non-complete connected components of an induced Pm-saturated graph with no pendant edges have average degree at least 3. Proof. Let G be an induced Pm-saturated graph. If all vertices of G have degree 3 or more then the result is clear, so let us assume that v is a vertex of G with degree 2 and with neighbors u and w. By Fact 2.9, u and w are adjacent. Without loss of generality we may assume that deg(w) ≤ deg(u). We will consider the cases in which deg(w) = 2 and deg(w) > 2. First assume that both deg(u) and deg(w) are at least 3. We demonstrate that there are sufficiently many vertices of high degree to yield an average degree of at least 3. Let a be a neighbor of u and b a neighbor of w, with a, b ∉ {u, v, w}.


If deg(w) = 3 and a and b are distinct then no induced path containing the new edge wa can be longer than an induced path containing the sub-path wua. If instead a = b then the addition of va does not create any induced path not already in G by means of edge wa. Thus, if deg(u) ≥ deg(w) ≥ 3 then deg(u) ≥ deg(w) ≥ 4. Now we consider the case in which deg(w) = 2. Vertex u is therefore a cut vertex of G. Note that if there is another block containing u that is isomorphic to K3 then the addition of an edge between two such blocks does not result in a longer induced path than one already present in the graph. If deg(u) = 3, with u adjacent to a vertex a distinct from w and v, then adding edge va to G does not create any induced path longer than one already present in G that uses the edge ua. So deg(u) ≥ 4. Say that {v, w, u′, w′} are in the neighborhood of u and note that, as above, if deg(u′) = deg(w′) = 3 then the addition of edge w′a shows that G is not induced Pm-saturated. So the graph G with vertices v, w removed must also have average degree at least 3, and therefore G does as well. Next consider the set T of vertices of degree 2 whose neighbors all have degree at least 3, and the set S composed of neighbors of vertices in T. Since vertices in S all have degree at least 4, if t = |T| ≤ |S| = s then the graph has at least as many vertices with degree greater than three as those with degree 2 and we are done. Assume instead that t > s. Since the two neighbors of each vertex in T are adjacent, we know that for each vertex in T there are at least 3 edges in the induced graph ⟨T ∪ S⟩. Hence the average degree among vertices in ⟨T ∪ S⟩ is at least 3. Since all other vertices in G either have degree at least 3 or are part of a distinct triple with total degree at least 9 as shown above, the average degree of any non-complete component of an induced Pm-saturated graph is at least 3. This leads us to the proof of the lower bound for the size of a class of induced Pm-saturated graphs.

Theorem 2.11. If G is an induced Pm-saturated graph on n vertices with no pendant edges except for a K2 component, m > 5, then G has size at least 3(n − 2)/2 + 1. This bound is realized when m = 9 + 6k and n ≡ 2 mod (14 + 8k). Proof. In the graph G all but at most one connected component consists of vertices of average degree at least 3, with potentially one component isomorphic to K2 or K3, by Lemma 2.10. Therefore, e(G) ≥ 3(n − 2)/2 + 1. By Lemma 2.4 the graph consisting of disjoint copies of Gk and a K2 has size 3(n − 2)/2 + 1 and is induced Pm-saturated.


Other Path Results For certain induced Pm-saturated graphs we can create induced Pm+2-saturated graphs by using the following constructions. Construction 2.12. Generate the graph Tv(G) by identifying each vertex in G with one vertex of a distinct triangle. The new graph Tv(G) has order 3n(G) and size e(G) + 3n(G) (Figure 4).

Figure 4: The graphs C5, Tv(C5), and Te(C5).

Construction 2.13. The graph Te(G) is composed of the graph G along with a new vertex for each edge of G, adjoined to both endpoints of that edge. The graph Te(G) has order n(G) + e(G) and size 3e(G) (Figure 4). Now we will show that both constructions yield the expected results. First, we restate and prove Theorem 2.14 in a different form than that given in Section 1.1. Theorem 2.14. The graph Tv(Gk) is induced P11+6k-saturated.

Proof. First we establish that every vertex v in the graph Gk is the terminal vertex of two induced paths of order 8 + 6k, each with a different terminal edge. Let P be the path that begins {v0v5v6v7v2v3vn−2vn−3vn−4vn−5vn−6vn−7vn−12}. Starting with i = (n − 12), proceed similarly to the construction of C in Fact 2.1 by adding {vi vi−1 vi−6 vi−5 vi−4 vi−3 vi−8} to P, repeating until the addition of edge v10v9. By applying the automorphism φ⁸ ◦ ψ we get another induced path P′ of order 8 + 6k starting at v0. Notice that P contains the edge v0v5 and P′ the edge v0vn−1. Again, since Gk is vertex transitive, we see that each vertex in Gk is the terminal vertex of two induced paths with distinct terminal edges. Next, we consider a pair x, y of distinct non-adjacent vertices in Tv(Gk). We need only consider the cases in which the pair of nonadjacent vertices are both in the original graph Gk, neither is in the original graph Gk, or exactly one is in Gk.


Say that x, y ∈ Gk. Since the addition of edge xy to Gk creates an induced P9+6k, one vertex from each triangle added to the endpoints of this path in Tv(Gk) yields an induced P11+6k. If neither x nor y is in Gk and their neighbors in Gk, say x′ and y′ respectively, are not adjacent then, since a new edge between x′ and y′ in Gk generates an induced P9+6k, this path is extended similarly by two edges to create an induced P11+6k in Tv(Gk). If instead x′y′ is an edge of Gk, consider the induced P8+6k in Gk that begins at x′ and avoids the edge x′y′. This extends to an induced P9+6k in Tv(Gk). Since x′y′ is not in this path, the vertex y′ is also avoided entirely. The addition of edge xy to Tv(Gk) creates an induced P11+6k beginning with y. Lastly, consider the case in which x is a vertex of Gk and y is not. Let y′ be y's neighbor in Gk and y′′ the vertex of degree 2 adjacent to y in Tv(Gk). Again, since there is an induced P9+6k in Tv(Gk) that begins at x and avoids y′, the addition of the edge xy creates an induced P11+6k beginning at y′′.

Theorem 2.15. If G is K3-free, induced Pm-saturated, and every vertex is in a component of order at least three, then Te(G) is induced Pm+2-saturated. Proof. Just as in the proof of Theorem 2.14 we need to consider the same three cases.

If x, y are nonadjacent vertices, both in G, then the addition of edge xy to G generates an induced Pm. Since none of the neighbors of the end vertices are in the path, it can be extended on both ends to the added vertices, yielding an induced Pm+2. If both x and y are new vertices added in the construction of Te(G), then let the neighbors of x be x′, x′′ and the neighbors of y be y′, y′′. If adding edge x′y′ or x′′y′′ creates an induced Pm that avoids the edges x′x′′ and y′y′′ then the addition of edge xy generates an induced Pm+2. If instead every induced Pm created by adding either edge x′y′ or x′′y′′ includes at least one of the edges {x′x′′, y′y′′}, then there is an induced Pm that includes the added edge x′y′′ and avoids them, since G is K3-free. Therefore the addition of edge xy is equivalent to adding an edge between a neighbor of each in G, with the induced Pm extended by one edge toward x and one toward y. Lastly, if x is in the original graph G and y is not, then we proceed as above and extend the induced Pm that results from joining x to a neighbor of y not already adjacent to x (which exists since G is K3-free) by one edge toward y and by another edge at a terminal vertex of the path. Therefore, Te(G) is induced Pm+2-saturated. We end this section by noting that a computer search using SAGE [11], in conjunction with Constructions 2.12 and 2.13, has found induced Pm-saturated graphs for all 7 ≤ m ≤ 30. The results are listed in Figure 5, most in LCF notation.


In the interest of space we have omitted the proofs that they are saturated, as each is simply a case analysis. Note that there are induced Pm-saturated graphs that cannot be written in the form [x, −x]^n, and further some are the result of the operations Te and Tv. Therefore, not all induced Pm-saturated graphs are regular, nor the result of a regular graph joined to a complete graph.
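For experimentation, Constructions 2.12 and 2.13 translate directly into code. The following minimal sketch is my own (node labels are illustrative), and it reproduces the order and size counts stated with the constructions above.

```python
import networkx as nx

def T_v(G):
    """Construction 2.12: attach a distinct triangle at every vertex of G
    (the vertex is identified with one corner of its triangle)."""
    H = G.copy()
    for v in list(G.nodes()):
        a, b = (v, "t1"), (v, "t2")          # two new triangle vertices per v
        H.add_edges_from([(v, a), (v, b), (a, b)])
    return H

def T_e(G):
    """Construction 2.13: for every edge uv of G add a new vertex adjacent to
    both u and v."""
    H = G.copy()
    for k, (u, v) in enumerate(list(G.edges())):
        w = ("e", k)
        H.add_edges_from([(u, w), (v, w)])
    return H

if __name__ == "__main__":
    C5 = nx.cycle_graph(5)
    print(T_v(C5).number_of_nodes(), T_v(C5).number_of_edges())   # 15, 20: order 3n, size e + 3n
    print(T_e(C5).number_of_nodes(), T_e(C5).number_of_edges())   # 10, 15: order n + e, size 3e
```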

CYCLES The star K1,(n−1) is induced C3-saturated, and is in fact the induced C3-saturated graph on n vertices of smallest size for n ≥ 3; this is a direct consequence of Theorem 1.1. The largest such graph is the complete bipartite graph K⌈n/2⌉,⌊n/2⌋, due to Mantel [9]. Note that C5 is trivially both induced C3-saturated and induced C4-saturated.

We now show that for all integers k ≥ 3 and n ≥ 3(k − 2) there is an induced Ck-saturated graph on n vertices that is non-complete. We begin with another construction. Construction 3.1. Define the graph G[k] on 3k vertices, k ≥ 3, in the following way. Let v0v1 . . . vk−1v0 be a k-cycle, the internal cycle of G[k]. Add the matching uiwi and the edges uivi, wivi, and wiui+1, for 0 ≤ i ≤ (k − 1) with addition modulo k; the vertices ui, wi and the edges uiwi, wiui+1 form the external cycle (Figure 6).
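Construction 3.1 can be sketched in code as follows (an illustration with my own node labels, not the author's implementation); combined with the induced-saturation test sketched in Section 1, small cases of Claim 3.2 below can be checked directly.

```python
import networkx as nx

def G_brackets(k):
    """Construction 3.1: the graph G[k] on 3k vertices (k >= 3)."""
    H = nx.Graph()
    for i in range(k):
        v, u, w = ("v", i), ("u", i), ("w", i)
        H.add_edge(v, ("v", (i + 1) % k))          # internal k-cycle edge v_i v_{i+1}
        H.add_edges_from([(u, w), (u, v), (w, v)])  # matching u_i w_i and the spokes
        H.add_edge(w, ("u", (i + 1) % k))           # external cycle edge w_i u_{i+1}
    return H

if __name__ == "__main__":
    G5 = G_brackets(5)
    print(G5.number_of_nodes(), G5.number_of_edges())   # 15 vertices, 25 edges
```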

Figure 5: Induced path-saturated graphs.


Figure 6: The graph G[5], which is induced C7-saturated.

Claim 3.2. The graph G[k] is induced Ck+2-saturated.

Proof. First we show that G[k] does not contain an induced cycle of length k + 2. Note that every copy of Ck+2 in G[k] contains vertices from both the internal and external cycles. Any induced cycle C in G[k] contains paths on the internal cycle of the form vivi+1vi+2 . . . vj and/or paths on the outer cycle, and edges joining these paths into a cycle. The cycle C therefore has length either k (if i = j) or at least (k + 3). Now we show that the addition of any edge e to G[k] results in an induced Ck+2. Note that there are three potential forms that e can take: an edge among the vertices of the internal cycle, an external cycle edge, or an edge between these cycles. If e = vivj then we need only consider k > 3. There is a newly created induced cycle of length l ≥ (⌈k/2⌉ + 1) along the internal cycle. This can be extended by considering an edge from vi to one of its neighbors on the external cycle, and traversing an appropriate number of edges before rejoining the internal cycle. In this way we create an induced cycle of every length between (l + 4) and (k + 3), inclusive. If instead e joins vertices between the internal and external cycles, then we create an induced Ck+2 in the following way. Without loss of generality we assume that e = viu0. We get an induced Ck−i+2 by proceeding around the internal cycle from vi to vk−1 and then to wk−1. Other induced cycles result from returning to the outer cycle sooner, creating cycles of length (k − i + 3) through (2k − 2i + 1). We can also find induced cycles proceeding in the other direction along the internal cycle from vi down to v1 (or v0 if i = k − 1), then back to u1, yielding induced cycles with lengths from (i + 3) through (2i + 2).


Therefore, an induced cycle of length (k + 2) can be found in G[k] with the added edge for i ≥ ⌈k/2⌉ in the latter case and for i ≤ ⌊(k − 1)/2⌋ in the former. Finally, if the new edge joins vertices on the external cycle of G[k] then an induced cycle of length (k + 2) can be formed by utilizing an edge from one of its endpoints to the internal cycle, continuing along a sufficiently long internal path, then rejoining the described induced path along the external cycle. Theorem 3.3. For any k ≥ 3 and n ≥ 3(k − 2) there is a non-complete induced Ck-saturated graph of order n.

Proof. By Claim 3.2, for n = 3(k − 2) there is a graph G[k − 2] on n vertices that is induced Ck-saturated. We can extend Construction 3.1 to a larger number of vertices by joining it to a clique, since any vertex in the joined clique is adjacent to all other vertices and so cannot be in an induced cycle. The graph G[k] can also be extended to more vertices by replacing each vertex with a clique; if the vertices are distributed in a balanced way, the resulting graph has size approximately quadratic in n.

Figure 7: A claw K1,3 and the induced K1,3-saturated graph G

CLAWS We now turn our attention to the claw graph K1,3 (Figure 7). We build a graph that is induced K1,3-saturated. Construction 4.1. Let G be a 6-cycle on the vertices {v0, . . . , v5}. To this set, we add the vertices {u0, . . . , u5} and for each i join ui to vi , vi+1, and ui+3 with addition taken modulo 6 (Figure 7).
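Construction 4.1 also translates into a short script. The sketch below is my own (with illustrative labels); it builds the graph and uses networkx's induced-subgraph matcher to test for claws, anticipating the claw-freeness observation that follows.

```python
import networkx as nx
from networkx.algorithms.isomorphism import GraphMatcher

def construction_4_1():
    """Construction 4.1: C6 on v0..v5 plus u0..u5, with ui joined to vi, vi+1, ui+3."""
    G = nx.cycle_graph(6)                     # vertices 0..5 play the role of v0..v5
    for i in range(6):
        u = ("u", i)
        G.add_edge(u, i)
        G.add_edge(u, (i + 1) % 6)
    for i in range(6):
        G.add_edge(("u", i), ("u", (i + 3) % 6))
    return G

if __name__ == "__main__":
    G = construction_4_1()
    claw = nx.star_graph(3)                   # K_{1,3}
    # G should contain no induced claw (the text observes it is claw-free).
    print(GraphMatcher(G, claw).subgraph_is_isomorphic())   # expect False
```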

It is easy to see that the graph in Construction 4.1 is claw-free. We can now prove the following theorem.


Theorem 4.2. For all n ≥ 12, there is a graph on n vertices that is induced K1,3-saturated and non-complete.

Proof. First we demonstrate that the graph G in Construction 4.1 is induced K1,3-saturated. If we join ui to uj then ui is the center of a claw with neighbors uj , ui+3, and vi . If we join ui to vj then vj has pairwise nonadjacent neighbors ui, vj−1 or vj+1, and either uj or uj−1. Finally, if edge vivj is added to G then vi is the center of a claw along with vj , ui , and ui−1. Since the disjoint union of induced K1,3-saturated graphs is itself induced K1,3-saturated we can generate a graph on n vertices with disjoint copies of G and possibly a complete connected component.

FUTURE WORK It would be interesting to find a smallest construction G(m) that is induced Pm-saturated for all m > 1, or determine that no such construction exists; there is a suspected value for the size of G(m). The largest such graph, in the spirit of Turán's Theorem [12], would also be worth investigating. Induced Pm-saturated graphs with pendant edges also remain to be studied, as these graphs may be smaller than those in Theorem 2.11. Indeed, it is not hard to construct such a graph by joining K1 ∪ Gk to a single vertex, but this graph is quite large. Further, as we have considered paths and claw graphs in this paper, the study of induced saturation could be furthered by considering the family of trees. We wish to thank the referees for their very helpful comments and suggestions, which improved this manuscript measurably.


REFERENCES
1. M.S. Chung, T. Jiang, D.B. West, "Induced Turán problems: Largest Pm-free graphs with bounded degree", Preprint.
2. H.S.M. Coxeter, R. Frucht, D.L. Powers, "Zero-symmetric graphs: Trivalent graphical regular representations of groups", New York: Academic Press (1981).
3. P. Erdős, A. Hajnal, J.W. Moon, "A problem in graph theory", Amer. Math. Monthly 71 (1964), 1107-1110.
4. J.R. Faudree, R.J. Faudree, J.R. Schmitt, "A survey of minimum saturated graphs", Elec. J. of Comb. DS19 (2011), 36 pp.
5. M. Ferrara, M. Jacobson, K. Milans, C. Tennenhouse, P. Wenger, "Saturation numbers for families of graph subdivisions", J. Graph Theory 71,4 (2012), 416-434.
6. R. Frucht, "A canonical representation of trivalent Hamiltonian graphs", J. Graph Theory 1,1 (1976), 45-60.
7. M.S. Jacobson, C. Tennenhouse, "Oriented graph saturation", J. Comb. Math. Comb. Comp. 80 (2012), 157-169.
8. J. Lederberg, "DENDRAL-64: A System for Computer Construction, Enumeration and Notation of Organic Molecules as Tree Structures and Cyclic Graphs. Part II. Topology of Cyclic Graphs", Interim Report to the National Aeronautics and Space Administration (1965).
9. W. Mantel, "Problem 28", solution by H. Gouwentak, W. Mantel, J. Teixeira de Mattes, F. Schuh, W.A. Wythoff, Wiskundige Opgaven 10 (1907), 60-61.
10. R.R. Martin, J.J. Smith, "Induced saturation number", Disc. Math. 312,21 (2012), 3096-3106.
11. W. Stein et al., "Sage Mathematics Software (Version 5.8)", The SAGE Development Team (2013), http://www.sagemath.org.
12. P. Turán, "On the theory of graphs", Colloq. Math. 3 (1954), 19-30, MR 15,976b.

Connection and Separation in Hyper Graphs

11

Mohammad A. Bahmanian1 and Mateja Sajna2

1. Illinois State University
2. University of Ottawa

ABSTRACT In this paper we study various fundamental connectivity properties of hyper graphs from a graph-theoretic perspective, with the emphasis on cut edges, cut vertices, and blocks. We prove a number of new results involving these concepts. In particular, we describe the exact relationship between the block decomposition of a hyper graph and the block decomposition of its incidence graph.

Citation: (APA): Bahmanian, M. A., & Sajna, M. (2015). Connection and separation in hyper graphs. Theory and Applications of Graphs, 2(2), 5. (25 pages) DOI: https://doi.org/10.20429/tag.2015.020205 Copyright: This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.


Keywords: Hyper graph, incidence graph, walk, trail, path, cycle, connected hyper graph, cut edge, cut vertex, separating vertex, block.

INTRODUCTION A data base search under “hyper graph” returns hundreds of journal articles published in the last couple of years alone, but only a handful of monographs. Among the latter, most either treat very specific problems in hyper graph theory (for example, colouring in [8] and also in [9]), or else are written with a non-mathematician audience in mind, and hence focus on applications (for example, [6]). A mathematician or mathematics student looking for a general introduction to hyper graphs is left with Berge’s decades-old Hyper graphs [3] and Graphs and Hyper graphs [2], and Voloshin’s much more recent Introduction to Graphs and Hyper graphs [9], aimed at undergraduate students. The best survey on hyper graphs that we could find, albeit already quite out of date, is Duchet’s chapter [7] in the Handbook on Combinatorics. In particular, it describes the distinct paths that lead to the study of hyper graphs from graph theory, optimization theory, and extremal combinatorics, explaining the fragmented terminology and disjointed nature of the results. Berge’s work, for example, though an impressive collection of results, shows a distinct bias for hyper graphs arising from extremal set theory and optimization theory, and as such is rather unappealing to graph theorists, in general. The numerous journal publications, on the other hand, treat a great variety of specific problems on hyper graphs. Graph theorists find various ways of generalizing concepts from graph theory, often without justifying their own approach or comparing it with others. The same term in hyper graphs (for example, cycle) may have a variety of different meanings. Sometimes, authors implicitly assume that results for graphs extend to hyper graphs. A coherent theory of hyper graphs, as we know it for graphs, is sorely lacking. This article can serve as an introduction to hyper graphs from a graphtheoretic perspective, with a focus on basic connectivity. To prepare the ground for the more involved results on block decomposition of hyper graphs, we needed to carefully and systematically examine the fundamental connectivity properties of hyper graphs, attempting to extend basic results such as those found in the first two chapters of a graph theory textbook. We are strongly biased in our approach by the second author’s graph-theoretic perspective, as well as in our admiration for Bondy and Murty’s graph theory “bible” [5] and its earlier incarnation [4]. While we expect that some


of these observations have been made before, to the best of our knowledge they have never been tied to a coherent theory of connection in hyper graphs and published in a widely accessible form. Our paper is organized as follows. In Section 2 we present the fundamental concepts involving hyper graphs, as well as some immediate observations. Section 3 forms the bulk of the work: from graphs to hyper graphs, we generalize the concepts of various types of walks, connection, cut edges and cut vertices, and blocks, and prove a number of new results involving these concepts. A longer version of this paper is available on ArXiv [1].

FUNDAMENTAL CONCEPTS

Hyper graphs and sub hyper graphs

We shall begin with some basic definitions pertaining to hyper graphs. The graph-theoretic terms used in this article are either analogous to the hyper graph terms defined here, or else are standard and can be found in [5].

Definition 2.1. A hyper graph H is an ordered pair (V, E), where V and E are disjoint finite sets such that V ≠ ∅, together with a function ψ: E → 2^V, called the incidence function. The elements of V = V(H) are called vertices, and the elements of E = E(H) are called edges. The number of vertices |V| and the number of edges |E| are called the order and size of the hyper graph, respectively. Often we denote n = |V| and m = |E|. A hyper graph with a single vertex is called trivial, and a hyper graph with no edges is called empty. Two edges e, e′ ∈ E are said to be parallel if ψ(e) = ψ(e′), and the number of edges parallel to edge e (including e) is called the multiplicity of e. A hyper graph H is called simple if no edge has multiplicity greater than 1; that is, if ψ is injective. As is customary for graphs, the incidence function may be omitted when no ambiguity can arise (in particular, when the hyper graph is simple, or when we do not need to distinguish between distinct parallel edges). An edge e is then identified with the subset ψ(e) of V, and for v ∈ V and e ∈ E, we conveniently write v ∈ e or v ∉ e instead of v ∈ ψ(e) or v ∉ ψ(e), respectively. Moreover, E is then treated as a multiset, and we use double braces to emphasize this fact when needed. Thus, for example, {1, 2} = {{1, 2}}, but {1, 1, 2} = {1, 2} ≠ {{1, 1, 2}}.


Definition 2.2. Let H = (V, E) be a hyper graph. If v, w ∈ V are distinct vertices and there exists e ∈ E such that v, w ∈ e, then v and w are said to be adjacent in H (via edge e). Similarly, if e, f ∈ E are distinct (but possibly parallel) edges and v ∈ V is such that v ∈ e ∩ f, then e and f are said to be adjacent in H (via vertex v). Each ordered pair (v, e) such that v ∈ V, e ∈ E, and v ∈ e is called a flag of H; the (multi)set of flags is denoted by F(H). If (v, e) is a flag of H, then we say that vertex v is incident with edge e.

The degree of a vertex v ∈ V (denoted by degH(v), or simply deg(v) if no ambiguity can arise) is the number of edges e ∈ E such that v ∈ e. A vertex of degree 0 is called isolated, and a vertex of degree 1 is called pendant. A hyper graph H is regular of degree r (or r-regular) if every vertex of H has degree r. The maximum (minimum) cardinality |e| of any edge e ∈ E is called the rank (corank, respectively) of H. A hyper graph H is uniform of rank r (or r-uniform) if |e| = r for all e ∈ E. An edge e ∈ E is called a singleton edge if |e| = 1, and empty if |e| = 0.

The concepts of isomorphism and incidence matrix for hyper graphs are straightforward generalizations from graphs and designs; see [1] for more details. The following types of sub hyper graphs will be used in this paper.

Definition 2.3. Let H = (V, E) be a hyper graph.
1. A hyper graph H′ = (V′, E′) is called a sub hyper graph of H if V′ ⊆ V and either E′ = ∅ or the incidence matrix of H′, after a suitable permutation of its rows and columns, is a submatrix of the incidence matrix of H. Thus, every edge e′ ∈ E′ is of the form e ∩ V′ for some e ∈ E, and the corresponding mapping from E′ to E is injective.
2. A sub hyper graph H′ = (V′, E′) of H with E′ = {{e ∩ V′ : e ∈ E, e ∩ V′ ≠ ∅}} is said to be induced by V′.
3. If |V| ≥ 2 and v ∈ V, then H\v will denote the sub hyper graph of H induced by V − {v}, also called a vertex-deleted sub hyper graph of H.
4. A hyper graph H′ = (V′, E′) is called a hyper sub graph of H if V′ ⊆ V and E′ ⊆ E.
5. A hyper sub graph H′ = (V′, E′) of H is said to be induced by V′, denoted by H[V′], if E′ = {{e ∈ E : e ⊆ V′, e ≠ ∅}}.
6. A hyper sub graph H′ = (V′, E′) of H is said to be induced by E′, denoted by H[E′], if V′ = ∪_{e∈E′} e.
7. For E′ ⊆ E and e ∈ E, we write shortly H − E′ and H − e for the hyper sub graphs (V, E − E′) and (V, E − {{e}}), respectively. The hyper sub graph H − e may also be called an edge-deleted hyper sub graph.

Note that the above definitions of sub hyper graphs and vertex-subset-induced sub hyper graphs are consistent with [7]. A more detailed discussion of these terms can be found in [1]. Observe that, informally speaking, the vertex-deleted sub hyper graph H\v is obtained from H by removing vertex v from V and from all edges of H, and then discarding the empty edges. It is easy to see that every hyper sub graph of H = (V, E) is also a sub hyper graph of H, but not conversely. However, not every hyper sub graph of H induced by V′ ⊆ V is a sub hyper graph of H induced by V′.

Observe also that if H is a 2-uniform hyper graph (and hence a loopless graph), its hyper sub graphs, vertex-subset-induced hyper sub graphs, edge-subset-induced hyper sub graphs, and edge-deleted hyper sub graphs are precisely its sub graphs, vertex-subset-induced sub graphs, edge-subset-induced sub graphs, and edge-deleted sub graphs (in the graph-theoretic sense), respectively. However, its vertex-deleted sub graphs are obtained by deleting all singleton edges from its vertex-deleted sub hyper graphs. The union and intersection of hyper graphs are again defined analogously to graphs, and if a hyper graph H is an edge-disjoint union of hyper graphs H1 and H2, then {H1, H2} is a decomposition of H, and we write H = H1 ⊕ H2.

The dual of a non-empty hyper graph H is a hyper graph H^T whose incidence matrix is the transpose of the incidence matrix of H. To obtain the dual H^T = (E^T, V^T) of a hyper graph H = (V, E), we label the edges of H as e1, . . . , em (with distinct parallel edges receiving distinct labels). Then let E^T = {e1, . . . , em} and V^T = {{v^T : v ∈ V}}, where v^T = {e ∈ E^T : v ∈ e} for all v ∈ V. Observe that (v, e) ∈ F(H) if and only if (e, v^T) ∈ F(H^T). Hence (H^T)^T = H.

Lemma 2.4. Let H = (V, E) be a non-empty hyper graph with dual H^T = (E^T, V^T), and let v ∈ V and e ∈ E. Then:
1. degH(v) = |v^T|.
2. v is an isolated vertex (pendant vertex) in H if and only if v^T is an empty edge (singleton edge, respectively) in H^T.
3. If |V| ≥ 2, H has no empty edges, and {v} ∉ E, then (H\v)^T = H^T − v^T.
4. If |E| ≥ 2, H has no isolated vertices, and e contains no pendant vertices, then (H − e)^T = H^T\e.

Proof. The first two statements of the lemma follow straight from the definition of vertex degree. To see the third statement, assume that |V| ≥ 2, H has no empty edges, and {v} ∉ E. Now H\v is obtained from H by deleting vertex v, deleting all flags containing v from F(H), and discarding all resulting empty edges. Hence (H\v)^T is obtained from H^T by deleting edge v^T, deleting all flags containing v^T from F(H^T), and discarding all resulting isolated vertices. However, any such isolated vertex would in H^T correspond either to an isolated vertex or to a pendant vertex incident only with the edge v^T. This would imply the existence of an empty edge or an edge {v} in H, a contradiction. Hence (H\v)^T is obtained from H^T just by deleting edge v^T and all flags containing v^T; that is, (H\v)^T = H^T − v^T. To prove the fourth statement, assume that |E| ≥ 2, H has no isolated vertices, and e contains no pendant vertices. Recall that H − e is obtained from H, and similarly (H − e)^T from H^T, by deleting e and all flags containing e. This operation on H^T is exactly vertex deletion provided that (H − e)^T has no empty edges. Now an empty edge in (H − e)^T corresponds to an isolated vertex in H − e, and hence in H it corresponds either to an isolated vertex or to a pendant vertex incident with e. However, by assumption, H does not have such vertices. We conclude that (H − e)^T = H^T\e as claimed.
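As a concrete illustration of the dual construction, the following sketch is ours and not code from the paper: with a hyper graph stored as a dictionary mapping edge labels to vertex sets, the dual is obtained by inverting the incidence relation, and the degree identity of Lemma 2.4(1) can be checked directly. The function name dual and the dictionary encoding are our own choices for illustration.

```python
# Sketch (ours): compute the dual H^T of a hyper graph H given as a dict
# {edge_label: set_of_vertices}.  Each vertex v is sent to the set v^T of
# labels of edges incident with v, so degH(v) = |v^T| (Lemma 2.4(1)).
# Isolated vertices of H would correspond to empty edges of H^T and are
# not produced by this simple inversion.

def dual(H):
    HT = {}
    for e, vertices in H.items():
        for v in vertices:
            HT.setdefault(v, set()).add(e)
    return HT

if __name__ == "__main__":
    H = {"e1": {1, 2, 3}, "e2": {2, 3}, "e3": {3, 4}}
    HT = dual(H)
    print(HT)  # {1: {'e1'}, 2: {'e1', 'e2'}, 3: {'e1', 'e2', 'e3'}, 4: {'e3'}}
    # degH(v) = |v^T| for every vertex v appearing in some edge
    assert all(len(HT[v]) == sum(v in H[e] for e in H) for v in HT)
```

Applying the same inversion twice recovers the original incidence relation, mirroring the identity (H^T)^T = H.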

Graphs associated with a hyper graph

A hyper graph is, of course, an incidence structure, and hence can be represented with an incidence graph (to be defined below). This representation retains complete information about the hyper graph, and thus allows us to translate problems about hyper graphs into problems about graphs — a much better explored territory.

Definition 2.5. Let H = (V, E) be a hyper graph with incidence function ψ. The incidence graph G(H) of H is the graph G(H) = (VG, EG) with VG = V ∪ E and EG = {ve : v ∈ V, e ∈ E, v ∈ ψ(e)}.

Observe that the incidence graph G(H) of a hyper graph H = (V, E) with


E ≠ ∅ is a bipartite simple graph with bipartition {V, E}. We shall call a vertex x of G(H) a v-vertex if x ∈ V , and an e-vertex if x ∈ E. Note that the edge set of G(H) can be identified with the flag (multi)set F(H); that is, EG = {ve : (v, e) ∈ F(H)}. The following is an easy observation, hence the proof is left to the reader.

Lemma 2.6. Let H = (V, E) be a non-empty hyper graph and H^T = (E^T, V^T) its dual. The incidence graphs G(H) and G(H^T) are isomorphic, with an isomorphism φ : V ∪ E → E^T ∪ V^T defined by φ(e) = e for all e ∈ E, and φ(v) = v^T for all v ∈ V.

Next, we outline the relationship between sub hyper graphs of a hyper graph and the sub graphs of its incidence graph. The proof of this lemma is straightforward and hence omitted. Lemma 2.7. Let H = (V, E) be a hyper graph and H′ = (V ′, E′) a sub hyper graph of H. Then: 1.

G (H′) is the sub graph of G (H) induced by the vertex set V ′ ∪ E ′. 2. If H′ is a hyper sub graph of H, then in addition, degG(H′) (e) = degG(H) (e) = |e| for all e ∈ E ′ Conversely, take a sub graph G′ of G (H). Then: 1.

V (G′) = V ′ ∪ E ′ for some V ′ ⊆ V and E ′ ⊆ E, and E (G′) ⊆ {ve: v ∈ V ′, e ∈ E ′, v ∈ e}. 2. G′ is the incidence graph of a sub hyper graph of H if and only if V ′ ≠ ∅ and for all e ∈ E ′ we have {ve : v ∈ e ∩ V ′} ⊆ E(G′ ). 3. G′ is the incidence graph of a hyper sub graph of H if and only if V ′ ≠ ∅ and degG′(e) = degG(H) (e) = |e| for all e ∈ E ′ . In the following lemma, we determine the incidence graphs of vertexdeleted sub hyper graphs and edge-deleted hyper sub graphs. Lemma 2.8. Let H = (V, E) be a hyper graph. Then: 1. 2.

For all e ∈ E, we have G(H − e) = G(H)\e. If |V | ≥ 2, H has no empty edges, and v ∈ V is such that {v} ∉ E, then G (H\v) = G (H)\v. Proof. 1. Recall that H −e is obtained from H by deleting e from E, thus also destroying all flags containing e. This is equivalent to deleting e from the vertex set of G(H), as well as all edges of G(H) incident with e, which results in the vertex-deleted sub graph G(H)\e.


2. Now H\v is obtained from H by deleting v from V and from all edges containing v, and then discarding all resulting empty edges. However, if H has no empty edges and {v} ∉ E, then there are no empty edges to discard, and so this operation is equivalent to deleting v from the vertex set of G(H) and deleting all edges of G(H) incident with v, resulting in the vertex-deleted sub graph G(H)\v. Hence G(H)\v = G(H\v).

CONNECTION IN HYPER GRAPHS

Walks, trails, paths, cycles

In this section, we would like to systematically generalize the standard graph-theoretic notions of walks, trails, paths, and cycles to hyper graphs. In this context, we need to distinguish between distinct parallel edges, hence the original definition of a hyper graph that includes the incidence function will be used.

Definition 3.1. Let H = (V, E) be a hyper graph with incidence function ψ, let u, v ∈ V, and let k ≥ 0 be an integer. A (u, v)-walk of length k in H is a sequence v0e1v1e2v2 . . . vk−1ekvk of vertices and edges (possibly repeated) such that v0, v1, . . . , vk ∈ V, e1, . . . , ek ∈ E, v0 = u, vk = v, and for all i = 1, 2, . . . , k, the vertices vi−1 and vi are adjacent in H via the edge ei. If W = v0e1v1e2v2 . . . vk−1ekvk is a walk in H, then the vertices v0 and vk are called the endpoints of W, and v1, . . . , vk−1 are the internal vertices of W.

We denote the set of all edges of a walk W by E(W), and the set of all its vertices by V(W); that is, V(W) = ∪_{e∈E(W)} e. Furthermore, the vertices v0, v1, . . . , vk are called the anchor vertices (or anchors) of W, and we write Va(W) = {v0, v1, . . . , vk}.

Observe that since adjacent vertices are by definition distinct, no two consecutive vertices in a walk are the same. Note that the edge set E(W) of a walk W may contain distinct parallel edges. Recall that a trail in a graph is a walk with no repeated edges. For a walk in a graph, having no repeated edges is necessary and sufficient for having no repeated flags; in a hyper graph, only sufficiency holds. This observation suggests two possible ways to define a trail.

Definition 3.2. Let W = v0e1v1e2v2 . . . vk−1ekvk be a walk in a hyper graph H = (V, E) with incidence function ψ.


1. If the anchor flags (v0, e1), (v1, e1), (v1, e2), . . . , (vk−1, ek), (vk, ek) are pairwise distinct, then W is called a trail.
2. If the edges e1, . . . , ek are pairwise distinct, then W is called a strict trail.
3. If the anchor flags (v0, e1), (v1, e1), (v1, e2), . . . , (vk−1, ek), (vk, ek) and the vertices v0, v1, . . . , vk are pairwise distinct (but the edges need not be), then W is called a pseudo path.
4. If both the vertices v0, v1, . . . , vk and the edges e1, . . . , ek are pairwise distinct, then W is called a path.

We emphasize that in the above definitions, “distinct” should be understood in the strict sense; that is, parallel edges need not be distinct. We extend the above definitions to closed walks in the usual way.

Definition 3.3. Let W = v0e1v1e2v2 . . . vk−1ekvk be a walk in a hyper graph H = (V, E) with incidence function ψ. If k ≥ 2 and v0 = vk, then W is called a closed walk. Moreover:
1. If W is a trail (strict trail), then it is called a closed trail (closed strict trail, respectively).
2. If W is a closed trail and the vertices v0, v1, . . . , vk−1 are pairwise distinct (but the edges need not be), then W is called a pseudo cycle.
3. If the vertices v0, v1, . . . , vk−1 and the edges e1, . . . , ek are pairwise distinct, then W is called a cycle.

From the above definitions, the following observations are immediate.

Lemma 3.4. Let W be a walk in a hyper graph H. Then:
1. If W is a trail, then no two consecutive edges in W are the same (including the last and the first edge if W is a closed trail).
2. If W is a (closed) strict trail, then it is a (closed) trail.
3. If W is a pseudo path (pseudo cycle), then it is a trail (closed trail, respectively), but not necessarily a strict trail (closed strict trail, respectively).
4. If W is a path (cycle), then it is both a pseudo path (pseudo cycle, respectively) and a strict trail (closed strict trail, respectively).

We mention that several special types of hyper graph cycles have been defined and studied in the literature, for example, loose cycles and tight cycles. Our definition coincides with the one in [2] and [7]; in fact, our cycles are sometimes called Berge cycles.
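Since the distinctions among trails, strict trails, pseudo paths, and paths hinge only on which objects may repeat, a small checker helps when working examples by hand. The following sketch is ours, not from the paper; it covers only the open-walk notions of Definition 3.2, and the encoding of a walk as an alternating list over edge labels is an assumption of the sketch.

```python
# Sketch (ours): test which properties of Definition 3.2 a walk has.
# H is a dict {edge_label: set_of_vertices}; the walk is an alternating
# list [v0, e1, v1, e2, ..., ek, vk] over the edge labels of H.

def classify_walk(H, walk):
    anchors = walk[0::2]          # v0, v1, ..., vk
    edges = walk[1::2]            # e1, ..., ek
    # sanity check: consecutive anchors are distinct and lie in the joining edge
    assert all(a != b and a in H[e] and b in H[e]
               for a, e, b in zip(anchors, edges, anchors[1:])), "not a walk"
    # anchor flags (v0,e1),(v1,e1),(v1,e2),...,(vk-1,ek),(vk,ek)
    flags = list(zip(anchors, edges)) + list(zip(anchors[1:], edges))
    distinct_flags = len(set(flags)) == len(flags)
    distinct_edges = len(set(edges)) == len(edges)
    distinct_anchors = len(set(anchors)) == len(anchors)
    return {
        "trail": distinct_flags,
        "strict trail": distinct_edges,
        "pseudo path": distinct_flags and distinct_anchors,
        "path": distinct_edges and distinct_anchors,
    }

if __name__ == "__main__":
    H = {"a": {1, 2, 3, 4}, "b": {2, 3}}
    # edge 'a' repeats, so: trail and pseudo path, but not strict trail or path
    print(classify_walk(H, [1, "a", 2, "b", 3, "a", 4]))
```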


In a graph, a path or cycle can be identified with the corresponding sub graph (also called path or cycle, respectively). This is not the case in hyper graphs. First, we note that there are (at least) two ways to define a sub hyper graph associated with a path or cycle. We define these more generally for walks. Definition 3.5. Let W be a walk in a hyper graph H = (V, E). Define the hyper sub graph H(W) and a sub hyper graph H′ (W) of H associated with the walk W as follows: H(W) = (V (W), E(W)) and H′ (W) = (Va(W), {{e ∩ Va(W) : e ∈ E(W)}}).

That is, H′ (W) is the sub hyper graph of H (W) induced by the set of anchor vertices Va(W).

Second, we observe that, even when W is a path or a cycle, not much can be said about the degrees of the vertices in the associated sub hyper graphs H(W) and H′ (W). Thus, unlike in graphs, we cannot use a path (cycle) W (as a sequence of vertices and edges) and its associated sub hyper graphs H (W) and H′ (W) interchangeably. The following lemma will justify the terminology introduced in this section. Lemma 3.6. Let H = (V, E) be a hyper graph and G = G(H) its incidence graph. Let vi ∈ V for i = 0, 1, . . . , k, and ei ∈ E for i = 1, . . . , k, and let W = v0e1v1e2v2 . . . vk−1ekvk be an alternating sequence of vertices and edges of H. Denote the corresponding sequence of vertices in G by WG. Then the following hold: 1.

W is a (closed) walk in H if and only if WG is a (closed) walk in G with no two consecutive v-vertices the same. 2. W is a trail (path, cycle) in H if and only if WG is a trail (path, cycle, respectively) in G. 3. W is a strict trail in H if and only if WG is a trail in G that visits every e ∈ E at most once. 4. W is a pseudo path (pseudo cycle) in H if and only if WG is a trail (closed trail, respectively) in G that visits every v ∈ V at most once. Proof. 1. If W is a walk in H, then any two consecutive elements of the sequence W are incident in H, and hence the corresponding


vertices are adjacent in G. Thus WG is a walk in G. Moreover, no two consecutive vertices in W are the same, whence not two consecutive v-vertices in WG are the same. The converse is shown similarly. Clearly W is closed if and only if WG is. Observe that the anchor vertices and the edges of W correspond to the v-vertices and e-vertices of WG, respectively, and the anchor flags of W correspond to the edges of WG. 2.

If W is a trail in H, then W is a walk with no repeated anchor flags; hence WG is a walk in G with no repeated edges, that is, a trail. Conversely, if WG is a trail in G, then it is a walk with no repeated edges, and hence no two identical consecutive v-vertices. It follows that W is a walk in H with no repeated anchor flags, that is, a trail. Similarly, if W is a path (cycle) in H, then W is a walk with no repeated edges and no repeated vertices (except the endpoints for a cycle). Hence WG is a walk in G with no repeated vertices (except the endpoints for a cycle), that is, a path (cycle, respectively). The converse is shown similarly. 3.

If W is a strict trail in H, then it is a trail with no repeated edges. Hence WG is a trail in G with no repeated e-vertices. The converse is shown similarly. 4. If W is a pseudo path (pseudo cycle) in H, then it is a trail with no repeated vertices (except the endpoints for a pseudo cycle). Hence WG is a trail in G with no repeated v-vertices (except the endpoints for a pseudo cycle). The converse is similar.

The next observations are easy to see, hence the proof is omitted.

Lemma 3.7. Let H = (V, E) be a non-empty hyper graph and H^T = (E^T, V^T) its dual. Let vi ∈ V for i = 0, 1, . . . , k − 1, and ei ∈ E for i = 0, 1, . . . , k − 1, and let W = v0e0v1e1v2 . . . vk−1ek−1v0 be a closed walk in H. Denote W^T = e0 v1^T e1 v2^T e2 . . . ek−2 vk−1^T ek−1 v0^T e0, where for each vertex vi of H, the symbol vi^T denotes the corresponding edge in H^T. Then the following hold:
1. If no two cyclically consecutive edges of W are equal (that is, ei ≠ ei+1 for 0 ≤ i ≤ k − 2 and ek−1 ≠ e0), then W^T is a closed walk in H^T.
2. If W is a closed trail (cycle) in H, then W^T is a closed trail (cycle, respectively) in H^T.
3. If W is a strict closed trail in H, then W^T is a pseudo cycle in H^T.
4. If W is a pseudo cycle in H, then W^T is a strict closed trail in H^T.


Connected hyper graphs

Connected hyper graphs are defined analogously to connected graphs, using existence of walks (or equivalently, existence of paths) between every pair of vertices. The main result of this section is the observation that a hyper graph (without empty edges) is connected if and only if its incidence graph is connected. The reader will observe that existence of empty edges in a hyper graph does not affect its connectivity; however, it does affect the connectivity of the incidence graph.

Definition 3.8. Let H = (V, E) be a hyper graph. Vertices u, v ∈ V are said to be connected in H if there exists a (u, v)-walk in H. The hyper graph H is said to be connected if every pair of distinct vertices are connected in H.

Lemma 3.9. Let H = (V, E) be a hyper graph, and u, v ∈ V. There exists a (u, v)-walk in H if and only if there exists a (u, v)-path.

Proof. Suppose H has a (u, v)-walk. By Lemma 3.6, it corresponds to a (u, v)-walk in the incidence graph G(H), and by a classical result in graph theory, existence of a (u, v)-walk in a graph guarantees existence of a (u, v)-path. Finally, by Lemma 3.6, a (u, v)-path in G(H) (since u, v ∈ V ) corresponds to a (u, v)-path in H. The converse obviously holds by definition.
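Lemma 3.9, read together with Lemma 3.6, also has an algorithmic reading: to connect two vertices of H, search the incidence graph. The sketch below is ours, not the authors' code; it assumes the dictionary encoding used in the earlier sketch and performs a breadth-first search over v-vertices and e-vertices, so the shortest walk it finds corresponds to a path of H.

```python
from collections import deque

# Sketch (ours): find a (u, v)-path in a hyper graph H by BFS on its
# incidence graph G(H) (Definition 2.5).  Nodes of G(H) are tagged
# ('v', x) for v-vertices and ('e', label) for e-vertices; a shortest
# (u, v)-walk in G(H) is a path, hence corresponds to a path in H
# (Lemma 3.6).

def find_path(H, u, v):
    start, goal = ('v', u), ('v', v)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        kind, x = node
        if kind == 'v':   # neighbours of a v-vertex are its incident e-vertices
            nbrs = [('e', lbl) for lbl, verts in H.items() if x in verts]
        else:             # neighbours of an e-vertex are the vertices of that edge
            nbrs = [('v', w) for w in H[x]]
        for nb in nbrs:
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    if goal not in parent:
        return None       # u and v are not connected in H
    path, node = [], goal
    while node is not None:
        path.append(node[1])
        node = parent[node]
    return path[::-1]     # alternating list u, e1, w1, e2, ..., v

if __name__ == "__main__":
    H = {"a": {1, 2}, "b": {2, 3, 4}, "c": {5}}
    print(find_path(H, 1, 4))   # e.g. [1, 'a', 2, 'b', 4]
    print(find_path(H, 1, 5))   # None
```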

It is clear that vertex connection in a hyper graph H = (V, E) is an equivalence relation on the set V. Hence the following definition makes sense.

Definition 3.10. Let H = (V, E) be a hyper graph, and let V ′ ⊆ V be an equivalence class with respect to vertex connection. The hyper sub graph of H induced by V ′ is called a connected component of H. We denote the number of connected components of H by c(H).

Observe that, by the definition of a vertex-subset-induced hyper sub graph, the connected components of a hyper graph have no empty edges. Alternatively, the connected components of H can be defined as the maximal connected hyper sub graphs of H that have no empty edges. It is easy to see that for a Hyper graph H = (V, E) with the multiset of empty edges denoted E0, the hyper sub graph H − E0 decomposes into the connected components of H. Theorem 3.11. Let H = (V, E) be a Hyper graph without empty edges. Then H is connected if and only if its incidence graph G = G(H) is connected. Proof. Assume H is connected. Take any two vertices x, y of G. If x and y are both v vertices, then there exists an (x, y)-walk in H, and hence, by


Lemma 3.6, an (x, y)-walk in G. If x is an e-vertex and y is a v-vertex in G , then x is a non-empty edge in H. Choose any v ∈ x. Since H is connected, it possesses a (v, y)-walk W. Then xW is an (x, y)-walk in G. The remaining case x, y ∈ E is handled similarly. We conclude that G is connected.

Assume G is connected. Take any two vertices u, v of H. Then there exists (u, v)-path in G, and hence by Lemma 3.6, a (u, v)-path in H. Therefore H is connected. Corollary 3.12. Let H be a Hyper graph and G = G (H) its incidence graph. Then: 1.

If H′ is a connected component of H, then G (H′) is a connected component of G. 2. If G′ is a connected component of G with at least one v-vertex, then there exists a connected component H′ of H such that G′ = G (H′). 3. If H has no empty edges, then there is a one-to-one correspondence between connected components of H and connected components of its incidence graph. Proof. 1 . Let H′ be a connected component of H, and let G′ = G(H′ ). Since H′ has no empty edges by definition, G′ is connected by Theorem 3.11. Let G′′ be the connected component of G containing G′ as a subgraph. Then G′′ contains v-vertices and degG′′(e) = degG(e) for all e-vertices e of G′′, and so by Lemma 2.7, G′′ = G(H′′) for some hyper sub graph H′′ of H. Since G′′ is connected and the incidence graph of a Hyper graph, it has no isolated e-vertices. Hence H′′ has no empty edges, and so by Theorem 3.11, H′′ is connected since G′′ is. Now H′ is a maximal connected hyper sub graph of H without empty edges, and a hyper sub graph of a connected hyper sub graph H′′ without empty edges; it must be that H′′ = H′ . Consequently, G′ = G′′ and so G′ is indeed a connected component of G. 2. Let G′ be a connected component of G with at least one v-vertex. Then degG′(e) = degG(e) for all e-vertices e of G′ , and so by Lemma 2.7, G′ = G(H′ ) for some hyper sub graph H′ of H. Since G′ is connected and the incidence graph of a Hyper graph, it has no isolated e-vertices; hence H′ has no empty edges. Thus, by Theorem 3.11, H′ is connected since G′ is. Let H′′ be the connected component of H containing H′ , and G′′ = G(H′′). Again by Theorem 3.11, G′′ is connected, and hence G′′ = G′ by


the maximality of G′. It follows that H′ = H′′, so indeed G′ = G(H′), where H′ is a connected component of H.
3. Since H has no empty edges, every connected component of G has at least one v-vertex. The conclusion now follows directly from the first two statements of the corollary.

Corollary 3.13. Let H be a hyper graph without empty edges and G = G(H) its incidence graph. Then:
1. c(H) = c(G).
2. If H is non-empty and has no isolated vertices, and H^T is its dual, then c(H) = c(H^T).

Proof. 1. Since H has no empty edges, by Corollary 3.12 there is a one-to-one correspondence between the connected components of H and G. Therefore, c(H) = c(G).
2. Assume H is non-empty and has no isolated vertices. Then H^T is well defined and has no empty edges, and so c(H^T) = c(G(H^T)) by the first statement. Since by Lemma 2.6 a hyper graph and its dual have isomorphic incidence graphs, it follows that c(H^T) = c(G(H^T)) = c(G) = c(H).
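Corollary 3.13 also says how c(H) can be computed in practice: it equals the number of components of the incidence graph when H has no empty edges, and one may just as well merge vertices edge by edge. A small union-find sketch of ours, not from the paper, using the same dictionary encoding as before:

```python
# Sketch (ours): the number of connected components c(H) of a hyper graph,
# computed with a union-find structure over the vertex set.  'vertices'
# lists V explicitly so that isolated vertices are counted; empty edges,
# if any, are simply ignored, in line with Definition 3.10.

def count_components(vertices, H):
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for verts in H.values():                # each edge merges all of its vertices
        verts = list(verts)
        for w in verts[1:]:
            rx, rw = find(verts[0]), find(w)
            if rx != rw:
                parent[rx] = rw
    return len({find(v) for v in vertices})

if __name__ == "__main__":
    V = {1, 2, 3, 4, 5, 6}
    H = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}}
    print(count_components(V, H))           # 3 components: {1,2,3,4}, {5}, {6}
```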

Cut Edges and Cut Vertices

In this section, we define cut edges and cut vertices in a hyper graph analogously to those in a graph. The existence of cut edges and cut vertices is one of the first measures of strength of connectivity of a connected graph. In hyper graphs, however, we must consider two distinct types of cut edges.

Definition 3.14. A cut edge in a hyper graph H = (V, E) is an edge e ∈ E such that c(H − e) > c(H).

Lemma 3.15. Let e be a cut edge in a hyper graph H = (V, E). Then c(H) < c(H − e) ≤ c(H) + |e| − 1.

Proof. The inequality on the left follows straight from the definition of a cut edge. To see the inequality on the right, first observe that e is not empty. Let H1, . . . , Hk be the connected components of H − e whose vertex sets intersect e. Since e has at least one vertex in common with each V(Hi), we have |e| ≥ k. Hence c(H − e) = c(H) + k − 1 ≤ c(H) + |e| − 1.

Definition 3.16. A cut edge e of a hyper graph H is called strong if c(H − e) = c(H) + |e| − 1, and weak otherwise.

Observe that a cut edge has cardinality at least two, and that any cut edge


of cardinality two (and hence any cut edge in a simple graph) is necessarily strong. Recall that an edge of a graph is a cut edge if and only if it appears in no cycle. We shall now show that an analogous statement holds for hyper graphs if we replace “cut edge” with “strong cut edge”.

Theorem 3.17. Let e be an edge in a connected hyper graph H = (V, E). The following are equivalent:
1. e is a strong cut edge, that is, c(H − e) = |e|.
2. e contains exactly one vertex from each connected component of H − e.
3. e lies in no cycle of H.

Proof. (1) ⇒ (2): Let e be a strong cut edge of H. Since H is connected, the edge e must have at least one vertex in each connected component of H − e. Since there are |e| connected components of H − e, the edge e must have exactly one vertex in each of them.
(2) ⇒ (1): Assume e contains exactly one vertex from each connected component of H − e. Then clearly c(H − e) = |e|.
(2) ⇒ (3): Assume e contains exactly one vertex from each connected component of H − e, and suppose e lies in a cycle C = v0e1v1e2v2 . . . vk−1ev0 of H. Then v0e1v1e2v2 . . . vk−1 is a path in H − e, and so v0 and vk−1 are two vertices of e in the same connected component of H − e, a contradiction. Hence e lies in no cycle of H.
(3) ⇒ (2): Assume e lies in no cycle of H. Since H is connected, the edge e must contain at least one vertex from each connected component of H − e. Suppose e contains two vertices u and v in the same connected component H′ of H − e. Then H′ contains a (u, v)-path P, and appending e and the vertex u to P yields a cycle in H that contains e, a contradiction. Hence e possesses exactly one vertex from each connected component of H − e.

The above theorem can be easily generalized to all (possibly disconnected) hyper graphs as follows.

Corollary 3.18. Let e be an edge in a hyper graph H = (V, E). The following are equivalent:
1. e is a strong cut edge, that is, c(H − e) = c(H) + |e| − 1.
2. e contains exactly one vertex from each connected component of H − e that it intersects.
3. e lies in no cycle of H.
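Definitions 3.14 and 3.16 suggest a direct, if brute-force, classification: delete each edge in turn, recount components, and compare the increase with |e| − 1. The sketch below is ours, not from the paper; it reuses the union-find idea and the dictionary encoding of the earlier sketches, and all helper names are our own.

```python
# Sketch (ours): classify every edge of a hyper graph as "not a cut edge",
# a "weak cut edge", or a "strong cut edge" by comparing c(H - e) with
# c(H) (Definitions 3.14 and 3.16).

def components(vertices, edge_sets):
    parent = {v: v for v in vertices}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for verts in edge_sets:
        verts = list(verts)
        for w in verts[1:]:
            ra, rb = find(verts[0]), find(w)
            if ra != rb:
                parent[ra] = rb
    return len({find(v) for v in vertices})

def classify_edges(vertices, H):
    c = components(vertices, H.values())
    result = {}
    for lbl, e in H.items():
        c_minus = components(vertices, [f for k, f in H.items() if k != lbl])
        if c_minus == c:
            result[lbl] = "not a cut edge"
        elif c_minus == c + len(e) - 1:        # c(H - e) = c(H) + |e| - 1
            result[lbl] = "strong cut edge"
        else:                                   # c(H) < c(H - e) < c(H) + |e| - 1
            result[lbl] = "weak cut edge"
    return result

if __name__ == "__main__":
    V = {1, 2, 3, 4, 5}
    H = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}, "d": {4, 5}}
    # 'a' and 'b' are strong cut edges, 'c' is a weak cut edge (edge 'd'
    # keeps 4 and 5 together), and 'd' is not a cut edge.
    print(classify_edges(V, H))
```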


We know that an even graph has no cut edges; in other words, every edge of an even graph (that is, a graph with no odd-degree vertices) lies in a cycle. This statement is false for hyper graphs, as the example below demonstrates. In the following two theorems, however, we present two generalizations to hyper graphs that do hold. Counterexample 3.19. For every even n ≥ 2, define a Hyper graph H = (V, E) as follows. Let V = {vi : i = 1, . . . , 2n} and E = {ei : i = 1, . . . , 2n}, and let F(H) = {(vi , ej ) : i, j = 1, . . . , n} ∪ {(vi , ej ) : i, j = n + 1, . . . , 2n} ∪ {(v1, en+1)} − {(v1, e1)}. Then every vertex in H has degree n, which is even, but en+1 is a cut edge in H. Theorem 3.20. Let H = (V, E) be a k-uniform Hyper graph such that degH(u) ≡ 0 (mod k) for every vertex u of H. Then H has no cut edges.

Proof. Suppose e is a cut edge of H, and let H1 = (V1, E1) be a connected component of H − e that contains a vertex of e. Furthermore, let r = |e ∩ V1|. Then 1 ≤ r ≤ k − 1, and ∑_{v∈V1} degH1(v) = (∑_{v∈V1} degH(v)) − r ≡ −r ≢ 0 (mod k). However, ∑_{v∈V1} degH1(v) = ∑_{f∈E1} |f| = k|E1| ≡ 0 (mod k), a contradiction. Hence H cannot have cut edges.

Theorem 3.21. Let H = (V, E) be a hyper graph such that the degree of each vertex and the cardinality of each edge are even. If e is a cut edge of H, then every connected component of H − e contains an even number of vertices of e. In particular, H has no strong cut edges.

Proof. Suppose e is a cut edge of H, and let H1 = (V1, E1) be any connected component of H − e. Furthermore, let r = |e ∩ V1|. Then ∑_{v∈V1} degH1(v) = (∑_{v∈V1} degH(v)) − r = ∑_{f∈E1} |f|. Since ∑_{v∈V1} degH(v) and ∑_{f∈E1} |f| are both even, so is r. Thus e intersects every connected component in an even number of vertices, and hence by Corollary 3.18 cannot be a strong cut edge.

We now turn our attention to cut vertices. Recall that the vertex-deleted sub Hyper graph H\v is obtained from H by deleting v from the vertex set, as well as from all edges containing v, and then discarding any resulting empty edges. Definition 3.22. A cut vertex in a Hyper graph H = (V, E) with |V | ≥ 2 is a vertex v ∈ V such that c(H\v) > c(H).
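The vertex-deleted sub hyper graph is equally mechanical to construct, and pairing it with a component count gives the test c(H\v) > c(H) of Definition 3.22. A sketch of ours, under the same encoding as the earlier sketches:

```python
# Sketch (ours): build the vertex-deleted sub hyper graph H\v of
# Definition 3.22: remove v from the vertex set and from every edge,
# then discard any edge that becomes empty.  Combined with a component
# counter (as in the earlier sketches), v is a cut vertex of H exactly
# when c(H\v) > c(H).

def delete_vertex(vertices, H, v):
    new_vertices = set(vertices) - {v}
    new_edges = {}
    for lbl, e in H.items():
        reduced = set(e) - {v}
        if reduced:                          # drop resulting empty edges
            new_edges[lbl] = reduced
    return new_vertices, new_edges

if __name__ == "__main__":
    V = {1, 2, 3, 4}
    H = {"a": {1, 2}, "b": {2, 3, 4}, "c": {2}}
    # ({1, 3, 4}, {'a': {1}, 'b': {3, 4}}): edge 'c' disappears, and the
    # result has two components, so 2 is a cut vertex of H.
    print(delete_vertex(V, H, 2))
```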

Before we can prove a result similar to Lemma 3.15 for cut vertices, we need to examine the relationship between cut vertices and cut edges of a Hyper graph and its dual, as well as the relationship between cut vertices and cut edges of a Hyper graph and cut vertices of its incidence graph.


Theorem 3.23. Let H = (V, E) be a hyper graph without empty edges, and G = G(H) its incidence graph.
1. Take any e ∈ E. Then e is a cut edge of H if and only if it is a cut vertex of G.
2. Let |V| ≥ 2 and take any v ∈ V such that {v} ∉ E. Then v is a cut vertex of H if and only if it is a cut vertex of G.

Proof. 1. By Lemma 2.8, we have G(H − e) = G\e. Since H, and hence H − e, has no empty edges, Corollary 3.13 tells us that c(H) = c(G) and c(H − e) = c(G(H − e)). Hence c(H − e) = c(G\e). Thus c(H − e) − c(H) = c(G\e) − c(G), and it follows that e is a cut edge of H if and only if it is a cut vertex of G.
2. Since H has no empty edges and {v} ∉ E, Lemma 2.8 shows that G(H\v) = G\v. Since H and H\v have no empty edges, Corollary 3.13 gives c(H) = c(G) and c(H\v) = c(G(H\v)), respectively. Hence c(H\v) − c(H) = c(G\v) − c(G), and v is a cut vertex of H if and only if it is a cut vertex of G.

In the next corollary, recall that we denote the dual of a hyper graph H = (V, E) by H^T = (E^T, V^T), where E^T is the set of labels for the edges in E, V^T = {v^T : v ∈ V}, and v^T = {e ∈ E^T : v ∈ e} for all v ∈ V.

Corollary 3.24. Let H = (V, E) be a non-empty Hyper graph with neither empty edges nor isolated vertices, and let HT be its dual. 1.

Let |E| ≥ 2 and let e ∈ E be an edge without pendant vertices. Then e is a cut edge of H if and only if e is a cut vertex of HT. 2. Let |V | ≥ 2 and let v ∈ V be such that {v} ∉ E. Then v is a cut vertex of H if and only if v T is a cut edge of HT. Proof. 1. First, since H has no empty edges, by Theorem 3.23, e is a cut edge of H if and only if it is a cut e-vertex of G(H), and hence if and only if e is a cut v-vertex of G(HT ). On the other hand, since e contains no pendant vertices of H, we have that {e} ∉ V T . Also, HT has no empty edges since H has no isolated vertices. Hence by Theorem 3.23, e is a cut vertex of HT if and only if e is a cut v-vertex of G (HT). The result follows. 2. By Theorem 3.23, since H has no empty edges and {v} ∉ E, vertex v is a cut vertex of H if and only if v is a cut v-vertex of G(H), and hence if and only if v T is a cut e-vertex of G(HT ). Again by Theorem 3.23, since HT has no empty edges, this is the case if and only if v T is a cut edge of HT.


Corollary 3.25. Let H = (V, E) be a Hyper graph with |V | ≥ 2, |E| ≥ 1, and with neither empty edges nor isolated vertices. Furthermore, let v be a cut vertex such that {v} ∉ E. Then c(H\v) ≤ c(H) + degH(v) − 1.

Proof. Consider the dual H^T of H. Since v is a cut vertex of H and {v} ∉ E, by Corollary 3.24 the edge v^T of H^T is a cut edge, and hence c(H^T − v^T) ≤ c(H^T) + |v^T| − 1 by Lemma 3.15. By Corollary 3.13 we have c(H^T) = c(H), and by Lemma 2.4 we have |v^T| = degH(v). It remains to show that c(H^T − v^T) = c(H\v). Using Corollary 3.13 and Lemma 2.8, we have c(H^T − v^T) = c(G(H^T − v^T)) = c(G(H^T)\v^T) = c(G(H\v)) = c(H\v), since H^T − v^T and H\v have no empty edges, since G(H^T − v^T) = G(H^T)\v^T, and since G(H^T)\v^T is isomorphic to G(H)\v, which in turn is equal to G(H\v) because {v} ∉ E. We conclude that c(H\v) ≤ c(H) + degH(v) − 1.

A graph with a cut edge and at least three vertices necessarily possesses a cut vertex. Here is the analogue for Hyper graphs.

Theorem 3.26. Let H = (V, E) be a Hyper graph with a cut edge e such that for some non-trivial connected component H′ of H − e, we have |e ∩ V (H′)| = 1. Then H has a cut vertex. Proof. We may assume H is connected. Let H′ and H′′ be two connected components of H −e, with H′ non-trivial and e∩V (H′) = {u}. Take any x ∈ V (H′) − {u} and y ∈ V (H′′). Since e is a cut edge, every (x, y)-path P in H must contain the edge e, and since u is the only vertex of e in V (H′), any such path P must also contain u as an anchor vertex. Hence x and y are disconnected in H\u, and u is a cut vertex of H. Corollary 3.27. Let H = (V, E) be a connected Hyper graph with a strong cut edge e such that |e| < |V |. Then H has a cut vertex. Proof. Let H1, . . . , Hk be the connected components of H − e. By Theorem 3.17, the edge e contains exactly one vertex from each Hi (for i = 1, . . . , k), and so k = |e| < |V |. Hence |V (Hi)| ≥ 2 for at least one connected component Hi , and |e∩V (Hi)| = 1 since e is a strong cut edge. It follows by Theorem 3.26 that H has a cut vertex.
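Theorem 3.23 yields a much faster route than edge-by-edge or vertex-by-vertex deletion: a single articulation-point computation on the incidence graph reports all cut vertices and all cut edges of H at once, subject to the theorem's hypotheses (no empty edges, and {v} ∉ E for the vertices tested). The sketch below is ours, not the authors' code, and assumes the networkx library is available; the encoding and helper name are our own.

```python
import networkx as nx

# Sketch (ours): by Theorem 3.23, the articulation points of the incidence
# graph G(H) are exactly the cut vertices of H (v-vertices) and the cut
# edges of H (e-vertices), provided H has no empty edges and no singleton
# edge {v} at the vertices in question.

def cuts_via_incidence_graph(vertices, H):
    G = nx.Graph()
    G.add_nodes_from(('v', x) for x in vertices)
    G.add_nodes_from(('e', lbl) for lbl in H)
    G.add_edges_from((('v', x), ('e', lbl)) for lbl, e in H.items() for x in e)
    cut_vertices, cut_edges = set(), set()
    for kind, name in nx.articulation_points(G):
        (cut_vertices if kind == 'v' else cut_edges).add(name)
    return cut_vertices, cut_edges

if __name__ == "__main__":
    V = {1, 2, 3, 4, 5}
    H = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}
    # cut vertices {2, 3}; cut edges {'a', 'b', 'c'}
    print(cuts_via_incidence_graph(V, H))
```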

Blocks and non-separable hyper graphs

Throughout this section, we shall assume that our hyper graphs are connected and have no empty edges. We begin by extending the notion of a cut vertex as follows.

Definition 3.28. Let H = (V, E) be a connected hyper graph without


empty edges. A vertex v ∈ V is a separating vertex for H if H decomposes into two non-empty connected hyper sub graphs with just vertex v in common. That is, H = H1 ⊕ H2, where H1 and H2 are two non-empty connected hyper sub graphs of H with V (H1) ∩ V (H2) = {v}.

Theorem 3.29. Let H = (V, E) be a connected hyper graph without empty edges, with |V| ≥ 2 and v ∈ V.
1. If v is a cut vertex of H, then v is a separating vertex of H.
2. If v is a separating vertex of H and {v} ∉ E, then v is a cut vertex of H.

Proof. 1. Assume v is a cut vertex of H, let V1 be the vertex set of one connected component of H\v, and let V2 = V(H\v) − V1. Furthermore, let H1 and H2 be the sub hyper graphs induced by the sets V1 ∪ {v} and V2 ∪ {v}, respectively, so that E(Hi) = {{e ∩ (Vi ∪ {v}) : e ∈ E, e ∩ (Vi ∪ {v}) ≠ ∅}} for i = 1, 2. Clearly V(H1) ∩ V(H2) = {v}. We show that H1 and H2 are in fact hyper sub graphs of H with just vertex v in common. To see that each Hi is connected, note that every vertex x ∈ Vi is connected to v in H, and hence also in Hi. Since H1 and H2 are non-trivial and connected, they must be non-empty. Thus v is a separating vertex for H.
2. Assume v is a separating vertex of H such that {v} ∉ E. Let H1 and H2 be non-empty connected hyper sub graphs of H with just vertex v in common such that H = H1 ⊕ H2. Hence either e ∈ E(H1) or e ∈ E(H2) for all e ∈ E. For each i = 1, 2, since the hyper graph Hi is non-empty and connected without edges of the form {v}, there exists a vertex vi ∈ V(Hi) − {v} connected to v in Hi. We can now see that the vertices v1 and v2 are connected in H but not in H\v, since every (v1, v2)-path in H must contain v as an anchor vertex. It follows that H\v is disconnected, and so v is a cut vertex of H.

Observe that the additional condition in the second statement of the theorem cannot be omitted: a vertex incident with a singleton edge and at least one more edge (which, as we show below, is necessarily a separating vertex) need not be a cut vertex. A simple example is a hyper graph H = (V, E) with V = {u, v} and E = {e1, e2} for e1 = {v} and e2 = {u, v}. Then v is a separating vertex of H since H = H1 ⊕ H2 for H1 = ({v}, {e1}) and H2 = ({u, v}, {e2}). However, v is not a cut vertex since H\v = ({u}, {{u}}) is connected.


Lemma 3.30. Let H = (V, E) be a connected Hyper graph without empty edges, with |E| ≥ 2, and with v ∈ V such that {v} ∈ E. Then v is a separating vertex for H.

Proof. Since H is connected and has at least two (non-empty) edges, it must have at least two edges incident with v. Let e1 = {v} and e2 be another edge incident with v. Furthermore, let H1 = ({v}, {e1}) and H2 = (V, E − {e1}). Then H1 and H2 are two non-empty connected hyper sub graphs of H with just vertex v in common such that H = H1 ⊕ H2. Hence v is a separating vertex for H.

Recall that in a graph without loops, separating vertices are precisely the cut vertices. Hence these two terms are equivalent for the incidence graph of a Hyper graph. Next, we determine the correspondence between separating vertices of a Hyper graph and separating vertices (cut vertices) of its incidence graph. Theorem 3.31. Let H = (V, E) be a connected Hyper graph without empty edges, and G = G(H) be its incidence graph. Take any v ∈ V. Then v is a separating vertex of H if and only if it is a separating vertex (cut vertex) of G.

Proof. If |V | ≥ 2 and {v} ∉ E, then by Theorem 3.29, v is a separating vertex of H if and only if it is a cut vertex of H and therefore, by Theorem 3.23, if and only if it is a cut vertex (separating vertex) of G. Assume e = {v} ∈ E. If v is a separating vertex of H, then it must be incident with another edge e ′ . Hence in the graph G\v, vertex e is an isolated vertex and e ′ lies in another connected component, showing that v is a cut vertex for G. Conversely, if v is a cut vertex of G, then G must contain e-vertices adjacent to v other than e, and hence H contains edges incident with v other than e. Hence, by Lemma 3.30, v is a separating vertex of H. The remaining case is that |V | = 1 and {v} ∉ E. Then H must be empty, G is a trivial graph, and v is a separating vertex for neither. Corollary 3.32. Let H = (V, E) be a connected non-empty Hyper graph with neither empty edges nor isolated vertices, and let HT be its dual. Let v ∈ V and e ∈ E, and let v T and e be the corresponding edge and vertex, respectively, in HT. Then:

1 . v is a separating vertex of H if and only if v T is a cut edge of HT . 2 . e is a cut edge of H if and only if it is a separating vertex of HT . Proof. Observe that by Corollary 3.13, HT is connected since H is. Clearly, it is also nonempty with neither empty edges nor isolated vertices.


1

. By Theorem 3.31, v is a separating vertex of H if and only if it is a cut vertex of its incidence graph G(H), and by Theorem 3.23, v T is a cut edge of HT if and only if it is a cut vertex of G(HT ). Since G(H) and G(HT ) are isomorphic with an isomorphism mapping v to v T , the result follows. 2 . Interchanging the roles of H and HT, this statement follows from the previous one. We shall now define blocks of a hyper graph, and in the rest of this section, investigate their properties. Definition 3.33. A connected Hyper graph without empty edges that has no separating vertices is called non-separable. A block of a hyper graph H is a maximal non-separable hyper sub graph of H. Lemma 3.34. Let H be a connected hyper graph without empty edges and B an empty block of H. Then H = B, and H is empty and trivial. Proof. Since B is empty and connected, it contains a single vertex, say v. If H is nonempty, then it contains an edge e incident with v. But then (e, {e}) is a non-separable hyper sub graph of H that properly contains the block B, a contradiction. Hence H is empty. Since it is connected, it must also be trivial (that is, V = {v}). Consequently, H = B. In a graph, every cycle is contained within a block. What follows is the analogous result for hyper graphs. Lemma 3.35. Let H be a Hyper graph without empty edges, C a cycle in H, and H(C) and H′ (C) the hyper sub graph and sub Hyper graph, respectively, of H associated with C (see Definition 3.5). Then H(C) and H′ (C) are non-separable. Proof. As in Definition 3.5, let V (C), Va(C), and E(C) be the sets of vertices, anchors, and edges of the cycle C, respectively. Recall that H(C) = (V (C), E(C)) and H′ (C) = (Va(C), {{e ∩ Va(C): e ∈ E(C)}}).

To see that H(C) is non-separable, first observe that it is connected. Let GC be the incidence graph of H(C). Then GC consists of a cycle CG with v-vertices and e-vertices alternating, and with additional v-vertices (corresponding to vertices of C that are not anchors) adjacent to some of the e-vertices of the cycle. Suppose v ∈ V is a separating vertex of H(C). By Theorem 3.31, v is then a cut v-vertex of GC. Because GC is bipartite, every connected component of GC\v must contain e-vertices. However, GC\v contains the cycle CG if v is not an anchor, and the path CG\v if v is an anchor, both containing all e-vertices of GC. Thus GC\v must have a single


connected component, and GC has no cut vertices, a contradiction. Hence H(C) is non-separable. Similarly it can be shown that H′ (C) is non-separable. (Note that the incidence graph of H′ (C) possesses a Hamilton cycle.) We are now ready to show that a hyper graph decomposes into its blocks just as a graph does. Theorem 3.36. Let H = (V, E) be a connected Hyper graph without empty edges. Then: 1

. The intersection of any two distinct blocks of H contains no edges and at most one vertex. 2 . The blocks of H form a decomposition of H. 3. The hyper sub graph H(C) associated with any cycle C of H is contained within a block of H. Proof. 1. Suppose B1 and B2 are distinct blocks of H that share more than just a single vertex. First assume that B1 and B2 have at least two vertices in common, and let B = B1 ∪B2. We’ll show B is a non-separable hyper graph. First, B is connected since B1 and B2 are connected with intersecting vertex sets. Take any v ∈ V (B). Can v be a separating vertex of B? Since B1 and B2 are non-separable, v is not a separating vertex in either block, and hence by Theorem 3.29, v is not a cut vertex in either block, and B1\v and B2\v are connected. Since B\v = (B1\v) ∪ (B2\v), and B1\v and B2\v are connected with at least one common vertex, it follows that B\v is connected. Hence v is not a cut vertex of B. If v is a separating vertex of B, then by Theorem 3.29, we must have e ∈ E(B) for e = {v}. Hence, without loss of generality, e ∈ E(B1). But then, by Lemma 3.30, v is a separating vertex of B1, because B1 is connected with at least two vertices and hence at least one more edge incident with v — a contradiction. Hence B is a non-separable hyper sub graph of H, and since B1 and B2 are maximal non-separable hyper sub graphs of H, we must have B1 = B2 = B, a contradiction.

Hence B1 and B2 have at most one common vertex. Suppose they have a common edge e. Then e must be a singleton edge, say e = {v}. If B1 or B2 contains another edge, then by Lemma 3.30, v is a separating vertex for this block, a contradiction. Hence B1 = B2 = ({v}, {e}), again a contradiction. We conclude that B1 and B2 have no common edges and at most one common vertex. 2

. If H has an isolated vertex v, then V = {v} and E = ∅, so H is a block. Hence assume every vertex of H is incident with an edge. Observe that any e ∈ E induces a hyper sub graph (e, {e}) of H,


which is non-separable and hence is a hyper sub graph of a block of H. Thus every edge and every vertex of H is contained in a block. Since by the first statement of the theorem no two blocks share an edge, every edge of H is contained in exactly one block, and H is an edge-disjoint union of its blocks. 3. By Lemma 3.35, the hyper sub graph H(C) of a cycle C is nonseparable, and hence a hyper sub graph of a block of H. The next lemma will be used several times. Lemma 3.37. Let H′ be a connected hyper sub graph of a connected Hyper graph H without empty edges, and v ∈ V (H′). If H′ contains edges of two blocks of H that intersect in vertex v, then v is a separating vertex of H′.

Proof. Let B1 and B2 be distinct blocks of H intersecting in vertex v such that H′ contains an edge from each of them. Note that B1 and B2 must both be non-empty, since otherwise B1 = B2 = H is empty by Lemma 3.34. If B1 is trivial, then {v} ∈ E(B1) ∩ E(H′ ), and v is a separating vertex of H′ by Lemma 3.30. Hence assume B1 and B2 are both non-trivial. Since H′ is connected, we may assume there exist a vertex x adjacent to v in B1∩H′ via edge e1, and a vertex y adjacent to v in B2 ∩ H′ via edge e2. Suppose there exists an (x, y)-path P in H′\v. Then P ye2ve1x is a cycle in H′ containing vertices v, x, and y. By Statement (3) of Theorem 3.36, these three vertices lie in a common block B, and by Statement (1) of the same result, B1 = B = B2, a contradiction. Hence x and y must lie in distinct connected components of H′\v. It follows that v is a cut vertex of H′, and hence a separating vertex of H′ by Theorem 3.29.

Theorem 3.38. Let H = (V, E) be a connected Hyper graph without empty edges, and v ∈ V. Then v is a separating vertex of H if and only if it lies in more than one block.

Proof. Assume v is a separating vertex of H. Then H = H1 ⊕ H2, where H1 and H2 are non-empty connected hyper sub graphs with just vertex v in common. Hence there exist e1 ∈ E(H1) and e2 ∈ E(H2) such that v ∈ e1 ∩ e2. By Statement (2) of Theorem 3.36, there exist blocks B1 and B2 of H such that e1 ∈ E(B1) and e2 ∈ E(B2). Observe that B1 ∩ H1 is connected: since B1 is connected, and H1 and H2 intersect only in the vertex v, every vertex in B1 ∩ H1 is connected to v in B1 ∩ H1. Similarly, B1 ∩ H2 is connected.

Suppose that B1 = B2. Then B1 = (B1 ∩ H1) ⊕ (B1 ∩ H2) with B1 ∩ H1 and B1 ∩ H2 connected, non-empty, and intersecting only in vertex v — a


contradiction, because B1 is non-separable. Hence B1 and B2 must be distinct blocks of H containing vertex v. Conversely, assume that v lies in the intersection of distinct blocks B1 and B2 of H. By Lemma 3.34, B1 and B2 are non-empty. Then H itself is a connected hyper sub graph of H containing edges from two blocks of H that intersect in v. It follows from Lemma 3.37 that v is a separating vertex of H. Theorems 3.36 and 3.38 show that a block graph of a hyper graph can be defined just as for graphs. Namely, let H be a connected Hyper graph without empty edges, S the set of its separating vertices, and B the collection of its blocks. Then the block graph of H is the bipartite graph with vertex bipartition {S, B} and edge set {vB: v ∈ S, B ∈ B, v ∈ V (B)}. From the third statement of Theorem 3.36 it then follows that the block graph of H is a tree. Next, we show that blocks of a Hyper graph correspond to maximal clusters of blocks of its incidence graph, to be defined below.

Definition 3.39. Let H = (V, E) be a connected hyper graph without empty edges, and G = G(H) its incidence graph. A cluster of blocks of G is a connected union of blocks of G, no two of which share a v-vertex.

Theorem 3.40. Let H = (V, E) be a connected hyper graph without empty edges and H′ its hyper sub graph, and let G = G(H) and G′ = G(H′) be their incidence graphs, respectively. Then H′ is a block of H if and only if G′ is a maximal cluster of blocks of G.

Proof. Assume H′ is a block of H. We first show that G′ = G(H′) is a cluster of blocks of G. Let C be the union of all blocks of G that have a common edge with G′. Observe that since H′ is connected and has no empty edges, G′ is connected by Theorem 3.11, and consequently C is connected. Suppose that two distinct blocks of C, say B1 and B2, share a v-vertex v of G. Since G′ contains an edge from both B1 and B2, v is a separating vertex of G′ by Lemma 3.37. However, by Theorem 3.31, v is then a separating vertex of the block H′ of H, a contradiction. Hence no two distinct blocks in C intersect in a v-vertex, and C is a cluster of blocks of G. Let C* be a maximal cluster of blocks of G containing C. Then C* is connected, and has no separating v-vertices by Theorem 3.38. Since C* is maximal, no e-vertex of C* can be contained in a block not in C*. Consequently, for every e-vertex e of C*, all edges of the form ev (for v ∈ V) are contained in C*. Hence, by Lemma 2.7, C* is the incidence graph of a hyper sub graph H* of H. Now H* is connected and has no separating vertices since C* is connected and has no separating v-vertices. Moreover,


H* contains the block H′. We conclude that H* = H′ and C* = G′. It follows that G′ is a maximal cluster of blocks of G. Conversely, let G′ be a maximal cluster of blocks of G. Then for every e-vertex e of G′, all edges of G of the form ev (for v ∈ V such that v ∈ e) must be in G′, so by Lemma 2.7, G′ = G(H′) for some hyper sub graph H′ of H. Since G′ is connected and has no separating v-vertices, H′ is connected and non-separable. Hence H′ is contained in a block B of H. By the previous paragraph, G(B) is a maximal cluster of blocks of G, and it also contains the maximal cluster G′. We conclude that G(B) = G′; that is, G′ is the incidence graph of a block of H.

The next corollary is immediate.

Corollary 3.41. Let H = (V, E) be a connected hyper graph without empty edges, and G = G(H) its incidence graph. Then H is non-separable if and only if G is a cluster of blocks of G.

To complete the discussion on the blocks of the incidence graph of a hyper graph, we show the following.

Theorem 3.42. Let H = (V, E) be a non-separable hyper graph with at least two edges of cardinality greater than 1. Let G = G(H) be its incidence graph and x a cut vertex of G. Then x ∈ E and x is a weak cut edge of H.

Proof. If x ∈ V, then x is a separating vertex of H by Theorem 3.31, a contradiction. Hence x ∈ E, and x is a cut edge of H by Theorem 3.23. Suppose x is a strong cut edge. If |x| < |V|, then H has a cut vertex by Corollary 3.27, and hence a separating vertex by Theorem 3.29, a contradiction. Hence |x| = |V|, and by Theorem 3.17, H − x has exactly |x| connected components, implying that x is the only edge of H of cardinality greater than 1, a contradiction. Hence x must be a weak cut edge of H.

In the last four theorems we attempt to generalize the following classic result from graph theory.

Theorem 3.43. [5]
1. A connected graph is non-separable if and only if any two of its edges lie on a common cycle.
2. A connected graph with at least three vertices has no cut vertex if and only if any two of its vertices lie on a common cycle.

Theorem 3.44. Let H = (V, E) be a non-separable hyper graph with |V| ≥ 2 and |E| ≥ 2, and let G = G(H) be its incidence graph. Assume in addition that V ∉ E and that H has no weak cut edges. Then any two distinct vertices of H and any two distinct edges of H lie on a common cycle.


Proof. Suppose that G has a separating vertex x. If x ∈ V, then by Theorem 3.31, x is a separating vertex of H, a contradiction. Thus x ∈ E, and x is a cut edge of H by Theorem 3.23. By assumption, x is a strong cut edge and |x| < |V |. Hence H has a cut vertex, and hence a separating vertex, by Corollary 3.27 and Theorem 3.29, respectively — a contradiction. Hence G has no cut vertex, and by Theorem 3.43, any two vertices of G lie on a common cycle. It then follows from Lemma 3.6 that any two vertices (and any two edges) of H lie on a common cycle. Theorem 3.45. Let H = (V, E) be a connected Hyper graph with |V | ≥ 2, without edges of cardinality less than 2, and without vertices of degree less than 2. Then the following are equivalent: 1. H has no separating vertices and no cut edges. 2. Every pair of elements from V ∪ E lie on a common cycle. 3. Every pair of vertices lie on a common cycle. 4. Every pair of edges lie on a common cycle. Proof. Let G = G(H) be the incidence graph of H. (1)

⇒ (2): Since H has no separating vertices and no cut edges, G has no cut vertices by Theorems 3.31 and 3.23. Hence by Theorem 3.43, since |V(G)| ≥ 3, every pair of vertices of G lie on a common cycle in G, and therefore every pair of elements from V ∪ E lie on a common cycle in H.
(2) ⇒ (3): This is obvious.
(3) ⇒ (4): Since every pair of vertices of H lie on a common cycle in H, every pair of v-vertices of G lie on a common cycle in G. Consequently, by Theorem 3.36, all v-vertices of G are contained in the same block B, and if G has any other blocks, then they are isomorphic to K2. Let B1 be one of these “trivial” blocks, and let e be its e-vertex. Then degG(e) = 1 — a contradiction, since H has no singleton edges. It follows that G has no “trivial” blocks, and hence no cut vertices. Therefore every pair of e-vertices of G lie on a common cycle in G, and every pair of edges of H lie on a common cycle in H.
(4) ⇒ (1): Since every pair of edges of H lie on a common cycle in H, every pair of e-vertices of G lie on a common cycle in G. Consequently, all e-vertices of G are contained in the same block B, and if G has any other blocks, then they are isomorphic to K2. Let B1 be one of these “trivial” blocks, and let v be its v-vertex.


Then degG(v) = 1 — a contradiction, since H has no pendant vertices. It follows that G has no “trivial” blocks, and hence no cut vertices. Therefore H has no separating vertices and no cut edges by Theorems 3.31 and 3.23, respectively.

Theorem 3.46. Let H = (V, E) be a connected hyper graph with |V| ≥ 2, without edges of cardinality less than 2, and without vertices of degree less than 2. Then the following are equivalent:
1. H has no cut edges.
2. Every pair of elements from V ∪ E lie on a common strict closed trail.
3. Every pair of vertices lie on a common strict closed trail.
4. Every pair of edges lie on a common strict closed trail.

Proof. Let G = G(H) be the incidence graph of H.
(1) ⇒ (2): Since H has no cut edges, G has no cut e-vertices by Theorem 3.23. Take any two elements x0 and xk of V ∪ E. We construct a strict closed trail in H containing x0 and xk as follows. Let B1 and Bk be blocks of G containing x0 and xk, respectively, and let P = B1x1B2 . . . Bk−1xk−1Bk be the unique (B1, Bk)-path in the block tree of G. Here, of course, B1, . . . , Bk are blocks of G, x1, . . . , xk−1 are separating (cut) vertices of G, and each separating vertex xi (necessarily a v-vertex) is shared between blocks Bi and Bi+1. (We may assume that vertex x0 does not lie in block B2, and xk does not lie in Bk−1, otherwise the path P may be shortened accordingly.) By Theorem 3.43, each pair of vertices xi−1 and xi, for i = 1, . . . , k, lie on a common cycle Ci within block Bi. Note that these cycles C1, . . . , Ck are pairwise edge-disjoint and intersect only in the v-vertices x1, . . . , xk−1. Let T = C1 ⊕ . . . ⊕ Ck. Then T is a closed trail in G containing x0 and xk that does not repeat any e-vertices. (We count the first and last vertex of a closed trail — which are identical — as one occurrence of this vertex.) We conclude that every pair of vertices of G lie on a common closed trail in G that traverses each e-vertex at most once. Therefore, by Lemma 3.6, every pair of elements from V ∪ E lie on a common strict closed trail in H.
(2) ⇒ (3): This is obvious.
(3) ⇒ (4): Since every pair of vertices of H lie on a common strict closed trail in H, every pair of v-vertices of G lie on a common


closed trail in G that visits each e-vertex at most once. Suppose G has a cut e-vertex e, and let v1 and v2 be two v-vertices in distinct connected components of G\e. Since e is a cut vertex, v1 and v2 are disconnected in G\e. On the other hand, by assumption, v1 and v2 lie on a closed trail T that traverses e at most once. Hence T\e contains a (v1, v2)-path of G\e, a contradiction. Consequently, G has no cut e-vertices, which implies (as seen in the previous paragraph) that any two vertices, and hence any two e-vertices, lie on a common closed trail in G that does not repeat any e-vertices. Therefore every pair of edges of H lie on a common strict closed trail in H.

(4) ⇒ (1): Since every pair of edges of H lie on a common strict closed trail in H, every pair of e-vertices of G lie on a common closed trail in G that has no repeated e-vertices. Suppose G has a cut e-vertex e. Since H has no vertex of degree less than 2, G\e has no trivial connected components; that is, each connected component of G\e contains an e-vertex. Let e1 and e2 be two e-vertices from distinct connected components of G\e. Then e1 and e2 are disconnected in G\e. On the other hand, by assumption, e1 and e2 lie on a closed trail T that traverses e at most once. Hence T\e contains an (e1, e2)-path of G\e, a contradiction. It follows that G has no cut e-vertices, and H has no cut edges by Theorem 3.23.

We conclude with the dual version of the previous theorem.

Corollary 3.47. Let H = (V, E) be a connected Hyper graph with |E| ≥ 2, without edges of cardinality less than 2, and without vertices of degree less than 2. Then the following are equivalent:

1. H has no separating vertices.
2. Every pair of elements from V ∪ E lie on a common pseudo cycle.
3. Every pair of edges lie on a common pseudo cycle.
4. Every pair of vertices lie on a common pseudo cycle.

Proof. Let HT be the dual of H, and observe that (by Corollary 3.13 and since H must have at least 2 vertices) HT satisfies the assumptions of Theorem 3.46. Since separating vertices of H correspond precisely to cut edges of HT by Corollary 3.32, and pseudo cycles of H correspond to strict closed trails of HT by Lemma 3.7, the corollary follows easily from Theorem 3.46.
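The arguments above repeatedly reduce questions about a hyper graph H to questions about cut vertices of its incidence graph G(H). As an illustration only, the following Python sketch (assuming the networkx library is available; the encoding of a hypergraph as a dictionary from edge names to vertex sets is our own convention, not the chapter's) builds the bipartite incidence graph and tests whether it has any cut vertices, which, via Theorems 3.23 and 3.31, corresponds to H having no cut edges and no separating vertices.

```python
# Illustrative sketch only: a hypergraph encoded as {edge_name: set_of_vertices}.
# networkx is assumed; articulation_points() enumerates cut vertices.
import networkx as nx

def incidence_graph(hyperedges):
    """Bipartite incidence graph G(H): v-vertices on one side, e-vertices on the other."""
    G = nx.Graph()
    for e, verts in hyperedges.items():
        for v in verts:
            G.add_edge(("v", v), ("e", e))
    return G

def no_separating_vertex_and_no_cut_edge(hyperedges):
    """True iff the incidence graph is connected and has no cut vertices."""
    G = incidence_graph(hyperedges)
    if not nx.is_connected(G):
        return False
    return next(nx.articulation_points(G), None) is None

# Example: a 4-cycle seen as a hypergraph whose edges all have cardinality 2.
H = {"e1": {1, 2}, "e2": {2, 3}, "e3": {3, 4}, "e4": {4, 1}}
print(no_separating_vertex_and_no_cut_edge(H))  # True: every pair lies on a common cycle
```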


CONCLUSION In this paper, we generalized several concepts related to connection in graphs to hyper graphs. While some of these concepts generalize naturally in a unique way, or behave in hyper graphs similarly to graphs, other concepts lend themselves to more than one natural generalization, or reveal surprising new properties. Many more concepts from graph theory remain unexplored for hyper graphs, and we hope that our work will stimulate more research in this area.

ACKNOWLEDGEMENT The first author wishes to thank the Department of Mathematics and Statistics, University of Ottawa, for its hospitality during his postdoctoral fellowship, when this research was conducted. The second author gratefully acknowledges financial support by the Natural Sciences and Engineering Research Council of Canada (NSERC). Thanks are also due to the anonymous referee for their quick and careful review.



Chapter 12

Vertex Rough Graphs

Bibin Mathew1, Sunil Jacob John1 and Harish Garg2

1 Department of Mathematics, National Institute of Technology, Calicut, India

2 School of Mathematics, Thapar Institute of Engineering and Technology, Patiala, India

ABSTRACT

This article introduces the notion of vertex rough graph and discusses certain basic graph theoretic definitions and examples. Adjacency of vertices is used to create a matrix corresponding to a vertex rough graph. Also, the membership function of a vertex rough graph is introduced with the help of Pawlak's rough set theory, and using this, certain results are obtained. The concepts of rough precision and rough similarity degree are extended to vertex rough graphs.

Citation: (APA): Mathew, B., John, S.J. and Garg, H. (2020). Vertex rough graphs. Complex Intell. Syst. (7 pages) DOI: https://doi.org/10.1007/s40747-020-00133-8
Copyright: Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

INTRODUCTION

Uncertainty and imprecision, occurring in the form of vagueness and ambiguity, make many naturally occurring situations complex and complicated. Classical mathematical techniques often fail to prosper in situations like this. Further, most of these techniques are crisp, precise and deterministic. The classical technique of probability theory has the limitation that the happening of an event is strictly determined by chance. Zadeh [1] defined fuzzy sets, which can mathematically model situations that are imprecise and vague. Pawlak [2] introduced the concept of rough sets, which is an excellent mathematical tool to handle the ambiguity and equivocalness associated with given information. The main advantage of rough set theory is that it does not need any additional information about the data, such as the membership values in fuzzy sets. In classical set theory, crisp sets are defined by a membership function, but in rough set theory the primary concept used to define a rough set is an indiscernibility relation. It employs indiscernibility relations to evaluate to what extent two objects are similar. Using this indiscernibility relation, one can construct lower and upper approximations of a set. The lower approximation consists of all instances which surely belong to the concept, and the upper approximation consists of all instances which possibly belong to the concept. One benefit of rough set theory is that it does not require any additional parameter to extract information. Rough set theory has found applications [3] in many branches like rough classification and logic [4, 5], decision making [6, 7], machine learning [8], data mining [9, 10], banking [11], medicine [12], etc.

A graph is a symmetric binary relation on a set. It is a fundamental tool in mathematical modelling and has applications in almost all branches of science and engineering. Many real-life problems have been solved through mathematical modelling with the help of graph theory. The theory of rough graphs is an attempt to unify rough set theory and graph theory. Graph theory, where objects are represented by vertices and relations by edges, is a convenient way of representing information involving relationships between objects. When there is ambiguity in the description of the objects, in their relationships, or in both, it is quite natural that we need to design a structure supporting it, which is called a rough graph.


With the advent of the World Wide Web, the amount of data that needs to be collected and stored has increased exponentially, and a major part of this data can be represented as graphs, including page link structures and social, professional and academic networks such as Facebook, LinkedIn, DBLP, etc. Most of the time, the pattern of connections between entities in these networks represents non-trivial topological features, which are neither purely crisp nor completely random; such a network is called a complex network [13]. A major challenge nowadays is to mine these complex networks, and the abundance of data in them has motivated a new area, called graph mining, which focuses on investigating, proposing and developing new algorithms designed to mine complex networks. As ambiguity is naturally inherent in these networks, a suitable modelling can be achieved by utilizing the concept of rough graphs.

The notion of edge rough graph was introduced by He and Shi [14]. They established the concept using a partition on the edge set of a graph. He et al. [15] extended this concept to weighted rough graphs by endowing the edges of a rough graph with a weight attribute, gave an algorithm for finding the class optimal tree in a weighted rough graph, which generalizes the classical Kruskal algorithm for finding the optimal tree, and presented an application in relationship analysis. Another application of weighted rough graphs was discussed in [16]. Combining edge rough graphs and Cayley graphs, Liang et al. [17] studied an application of rough graphs in data mining. Tong He introduced further rough theoretic properties of rough graphs [18] and representation forms of rough graphs [19]. Some other hybrid structures of rough graphs, like soft rough graphs, neutrosophic soft rough graphs, and intuitionistic fuzzy rough graphs, are also introduced in [20, 21, 22].

In an edge rough graph, the vertex set plays no significant role. It is not possible to compare two arbitrary rough graphs; they can be compared only if their vertex sets are the same. If such a comparison were possible, then the real-life applications of rough graphs would have more flexibility. The main objective of this paper is to introduce the concept of vertex rough graph, which is a more general concept than the edge rough graph. The vertex rough graph is constructed using a partition on the vertex set. Using a partition of the vertex set, we define the lower approximation and upper approximation of a graph. Hence, this paper is an introduction to the theory of vertex rough graphs.

In this paper, the basic idea of edge rough graph is extended to vertex rough graph. Section 2 discusses some basic definitions of graph theory,


rough set theory and edge rough graph. In Sect. 3, the notion of vertex rough graph is introduced and some examples are given. Basic graph theoretic definitions for vertex rough graphs are given, and a counterexample of a connected graph which is not surely connected is provided. Later, the adjacency matrix of a vertex rough graph is defined and some of its properties are discussed. In the last section, some rough theoretic ideas like membership functions and precision of a vertex rough graph are defined and related properties are derived.

PRELIMINARIES Some basic definitions from graph theory, Rough set theory and edge rough graph are given:

Definition 2.1 [23] A graph G is an ordered triple (V(G), E(G), ψG) consisting of a nonempty set V(G) of vertices, a set E(G), disjoint from V(G), of edges, and an incidence function ψG that associates with each edge of G an unordered pair of (not necessarily distinct) vertices of G. If e is an edge and u and v are vertices such that ψG(e) = uv, then e is said to join u and v; the vertices u and v are called the ends of e. Two graphs G and H are identical (written G = H) if V(G) = V(H), E(G) = E(H), and ψG = ψH . Two graphs G and H are said to be isomorphic ( written G ∼= H) if there are bijections θ : V(G) → V(H) and φ : E(G) → E(H) such that ψG(e) = uv if and only if ψH (φ(e)) = θ (u)θ (v); such a pair (θ , φ) of mappings is called an isomorphism between G and H.

Definition 2.2

[2] Suppose we are given a set of objects U called the universe and an indiscernibility relation R ⊆ U × U, representing our lack of knowledge about the elements of U. For the sake of simplicity we assume that R is an equivalence relation. Let X be a subset of U. We want to characterize the set X with respect to R:

• R-lower approximation of X: R_*(X) = ∪_{x∈X} {R(x) : R(x) ⊆ X}
• R-upper approximation of X: R^*(X) = ∪_{x∈X} {R(x) : R(x) ∩ X ≠ ∅}
• R-boundary region of X: RN_R(X) = R^*(X) − R_*(X)

The pair (R_*(X), R^*(X)) is called a rough set. X is crisp (exact with respect to R) if the boundary region of X is empty. The set X is rough (inexact with respect to R) if the boundary region of X is non-empty.
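As a concrete illustration of Definition 2.2, the short Python sketch below computes the lower and upper approximations of a subset X from the equivalence classes of R; the function names and the representation of R as a list of classes are our own illustrative choices.

```python
# Illustrative sketch of Pawlak's approximations (Definition 2.2).
# R is represented by its equivalence classes, i.e. the partition U/R.

def lower_approx(classes, X):
    """Union of the equivalence classes entirely contained in X."""
    return set().union(*[c for c in classes if c <= X])

def upper_approx(classes, X):
    """Union of the equivalence classes that intersect X."""
    return set().union(*[c for c in classes if c & X])

# Example: U = {1,...,6}, U/R = {{1,2},{3,4},{5,6}}, X = {1,2,3}.
classes = [{1, 2}, {3, 4}, {5, 6}]
X = {1, 2, 3}
print(lower_approx(classes, X))   # {1, 2}
print(upper_approx(classes, X))   # {1, 2, 3, 4}
# The boundary region is {3, 4}, so X is rough with respect to R.
```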

Definition 2.3

[14] Given a universe of discourse U, let V = {v1, v2, . . . , v|V|}, let P = {r1, r2, . . . , r|P|} be an attribute set on U, and let P contain the vertex attribute (vi, vj), where vi ∈ V, vj ∈ V. Let E = ∪ ek(vi, vj) be the edge set on U; the graph U = (V, E) is called the universe graph. For any attribute set R ⊆ P on E, the elements (also called edges) in E can be classified into different equivalence classes [e]R. For any subgraph T = (W, X), where W ⊆ V, X ⊆ E, the graph T is called an R-definable graph or R-exact graph if X is the union of some [e]R; otherwise, the graph T is called an R-undefinable graph or R-rough graph. An R-rough graph can be approximated by two exact graphs R_*(T) = (W, R_*(X)) and R^*(T) = (W, R^*(X)), where

R_*(X) = ∪ {[e]R : [e]R ⊆ X},  R^*(X) = ∪ {[e]R : [e]R ∩ X ≠ ∅}.

The graphs R_*(T) and R^*(T) are called the R-lower and R-upper approximate graphs of T. The pair of graphs (R_*(T), R^*(T)) is called an R-rough graph. The set bnR(X) = R^*(X) − R_*(X) is called the R-boundary of the edge set X of T.

VERTEX ROUGH GRAPH

In this section, the vertex rough graph of a graph with respect to an indiscernibility relation on the vertex set V is presented.

Definition 3.1 Let G=(V,E) be a universe graph with V={v1,v2,…,vn} and E={e1,e2,…,em}. Let R be an equivalence relation defined on V. Then the elements in V can be divided into different equivalence classes [v]R.

Definition 3.2 Let T = (W, X) be a subgraph of G = (V, E), where W ⊆ V and X ⊆ E. The graph T is called an R-definable graph or R-exact graph if W is the union of some [v]R. Otherwise, the graph T is called an R-undefinable graph or R-rough graph.


Definition 3.3 The R-vertex rough graph is defined in terms of two exact graphs R_*(T) = (R_*(W), R_*(X)) and R^*(T) = (R^*(W), R^*(X)), where the vertex approximations are

R_*(W) = ∪ {[v]R : [v]R ⊆ W},  R^*(W) = ∪ {[v]R : [v]R ∩ W ≠ ∅}.

The graphs R_*(T) and R^*(T) are called the R-lower approximate graph of T and the R-upper approximate graph of T. The pair of graphs (R_*(T), R^*(T)) is called the R-vertex rough graph.
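Since the displayed formulas for the edge sets R_*(X) and R^*(X) are not reproduced in this edition, the following Python sketch should be read as one plausible construction only: it takes the lower (resp. upper) vertex approximation and keeps the edges of X (resp. of G) whose two ends both survive. The helper names and this edge rule are assumptions made for illustration, not the paper's exact definition.

```python
# Hedged sketch of a vertex rough graph (Definition 3.3).
# ASSUMPTION: edges are approximated by restricting/extending to the approximated
# vertex sets; the paper's exact edge formulas are not reproduced in this chapter.

def vertex_lower(classes, W):
    return set().union(*[c for c in classes if c <= W])

def vertex_upper(classes, W):
    return set().union(*[c for c in classes if c & W])

def vertex_rough_graph(classes, W, X, E):
    """Return (lower_graph, upper_graph) as (vertex set, edge set) pairs."""
    lw, uw = vertex_lower(classes, W), vertex_upper(classes, W)
    lx = {(u, v) for (u, v) in X if u in lw and v in lw}   # assumed edge rule
    ux = {(u, v) for (u, v) in E if u in uw and v in uw}   # assumed edge rule
    return (lw, lx), (uw, ux)

# Example in the spirit of Example 3.1: V/R = {{v3,v4,v5},{v1,v2}} (edge data illustrative).
classes = [{"v3", "v4", "v5"}, {"v1", "v2"}]
W, X = {"v1", "v2", "v3"}, {("v1", "v2"), ("v2", "v3")}
E = X | {("v3", "v4"), ("v4", "v5")}
print(vertex_rough_graph(classes, W, X, E))
```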

Example 3.1 Consider G(V, E) with V = {v1, v2, v3, v4, v5} and V/R = {{v3, v4, v5}, {v1, v2}}.

Let T = (W, X) be a subgraph of G(V, E) (Fig. 1).

By using Definition 3.3, we get the lower and upper approximations of the vertex set and edge set as shown in Fig. 2:

Figure 1. Graph G(V, E) and its subgraph T(W, X) in Example 3.1.


Figure 2. Lower and upper approximations of the graph T in Example 3.1.

Figure 3. Graph G and its subgraph T in Example 3.2.

Proposition 3.1 Lower and upper approximations of a graph have the following properties: For all T , T1, T2 ⊆ G,

Definition 3.4 Let T = (W1, X) and S = (W2, Y) be subgraphs of G(V, E), where W1 ⊆ V, W2 ⊆ V, X ⊆ E, Y ⊆ E, and let T = (R_*(T), R^*(T)) and S = (R_*(S), R^*(S)) be their rough graphs. S is


said to be a surely subgraph of T if R_*(S) ⊆ R_*(T). Also, S is said to be a possibly subgraph of T if R^*(S) ⊆ R^*(T). If S is both a surely subgraph and a possibly subgraph of T, then S is a rough subgraph of T.

Definition 3.5 A set of two or more edges of a rough graph T is said to be multiple or parallel edges if they have the same end vertices. An edge whose two ends are the same is called a loop at the common vertex. A rough graph T = (R_*(T), R^*(T)) is said to be surely simple if R_*(T) contains no loops or parallel edges, and possibly simple if R^*(T) contains no loops or parallel edges. A rough graph T is said to be simple if it is both surely and possibly simple.

Definition 3.6 Two rough graphs T = (R_*(T), R^*(T)) and S = (R_*(S), R^*(S)) are said to be surely isomorphic if there is a graph isomorphism between R_*(T) and R_*(S), and possibly isomorphic if there is a graph isomorphism between R^*(T) and R^*(S). Two rough graphs T and S are said to be isomorphic if they are both surely and possibly isomorphic.

Definition 3.7 Let T = (R_*(T), R^*(T)) be a rough graph. The complement Tc of T with respect to G is defined by taking V(Tc) = V(T) and Tc = (R_*(T)c, R^*(T)c), where adjacency in R_*(T)c is defined by letting two vertices u and v be adjacent if and only if they are non-adjacent in R_*(T), and adjacency in R^*(T)c is defined by letting two vertices u and v be adjacent if and only if they are non-adjacent in R^*(T).

Remark 3.1 The connectedness of vertex rough graph is the same as the connectedness of edge rough graph.

Result 3.1 If T(W, X) is connected, then it need not be surely connected. Similarly, if T is a tree, then it need not be a sure tree.


Example 3.2 Consider G(V, E)

Let T = (W, X) be a subgraph (Fig. 3) of G(V, E). Then we get the lower approximation of the vertex set and edge set as shown in Fig. 4.

Figure 4. Lower approximation of T in Example 3.2.

Here R_*(T) is disconnected, but T = (W, X) is connected. Also, T is a tree but R_*(T) is not a tree; it is a forest.

Matrix Corresponding To a Rough Graph

Let G = (V, E) be a universe graph with V = {v1, v2, …, vn}, and let R be an equivalence relation defined on V. Let T(W, X) be a subgraph of G and let T = (R_*(T), R^*(T)) be the corresponding rough graph. Then we can define a nonzero ternary matrix AR(T) of T by


Example 3.3 Matrix corresponding to the rough graph in Example 3.1 is

Note: There is no one-to-one correspondence between the set of all rough graphs and the set of all ternary matrices, but for every rough graph there is a ternary matrix.

Remark 3.2 Let T be a rough graph and AR(T) be its corresponding matrix. Then:

1. T is exact if all entries of AR(T) are 0.
2. T is rough if at least one entry of AR(T) is 1.

ROUGH PROPERTIES OF ROUGH GRAPH

In the same way as in rough set theory, a rough graph can also be defined by employing a rough membership function instead of approximations.

Definition 4.1

The rough vertex membership function of a rough graph T = (R_*(T), R^*(T)) is a function μT : V → [0, 1] defined as

Also, the rough edge membership function of a rough graph T = (R_*(T), R^*(T)) is a function μT : V × V → [0, 1] defined as μT(vi, vj) = min{μT(vi), μT(vj)}.
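The displayed formula for the vertex membership function is not reproduced above. If, as the opening sentence of this section suggests, it follows Pawlak's rough membership function, a plausible reading is μT(v) = |[v]R ∩ W| / |[v]R|; the sketch below implements that assumption together with the min rule for edges, purely as an illustration.

```python
# Hedged sketch: rough vertex/edge membership (Definition 4.1).
# ASSUMPTION: the vertex membership follows Pawlak's rough membership function,
# mu_T(v) = |[v]_R intersect W| / |[v]_R|; this formula is not shown explicitly above.

def eq_class(classes, v):
    return next(c for c in classes if v in c)

def vertex_membership(classes, W, v):
    c = eq_class(classes, v)
    return len(c & W) / len(c)

def edge_membership(classes, W, vi, vj):
    # The min rule for edges is taken from the definition above.
    return min(vertex_membership(classes, W, vi), vertex_membership(classes, W, vj))

classes = [{"v1", "v2"}, {"v3", "v4", "v5"}]
W = {"v1", "v2", "v3"}
print(vertex_membership(classes, W, "v3"))        # 1/3
print(edge_membership(classes, W, "v1", "v3"))    # min(1, 1/3) = 1/3
```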

Definition 4.2 The vertex and edge membership function can be used to define the rough graph of a graph as shown


Proposition 4.1 The membership function has the following properties:

Proof 1.


Next we extend the definition of the edge precision αR(T) [18] of an edge rough graph to vertex rough graphs:

Definition 4.3 Let T = (R_*(T), R^*(T)) be a vertex rough graph, where R_*(T) = (R_*(W), R_*(X)) and R^*(T) = (R^*(W), R^*(X)). The R-vertex precision αR(T) and the R-edge precision βR(T) of T are defined by

αR(T) = |R_*(W)| / |R^*(W)|,  βR(T) = |R_*(X)| / |R^*(X)|.
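A small self-contained sketch of these precision values follows; the ratio form |R_*(·)|/|R^*(·)| matches the reconstruction given above and the behaviour required by Result 4.1 (both values equal 1 exactly when T is exact), while the edge-approximation rule remains the same assumption made in the earlier sketch.

```python
# Sketch: R-vertex and R-edge precision (Definition 4.3), self-contained.
# ASSUMPTION: edge approximations restrict X (resp. E) to the approximated vertex sets.

def precisions(classes, W, X, E):
    lw = set().union(*[c for c in classes if c <= W])       # lower vertex approximation
    uw = set().union(*[c for c in classes if c & W])        # upper vertex approximation
    lx = {e for e in X if e[0] in lw and e[1] in lw}        # assumed lower edge set
    ux = {e for e in E if e[0] in uw and e[1] in uw}        # assumed upper edge set
    alpha = len(lw) / len(uw) if uw else 1.0                # R-vertex precision
    beta = len(lx) / len(ux) if ux else 1.0                 # R-edge precision
    return alpha, beta

classes = [{"v1", "v2"}, {"v3", "v4", "v5"}]
W, X = {"v1", "v2", "v3"}, {("v1", "v2"), ("v2", "v3")}
E = X | {("v3", "v4"), ("v4", "v5")}
print(precisions(classes, W, X, E))  # both values lie in [0, 1]
```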

Result 4.1 Let M be the set of all vertex rough graphs. For any vertex attribute set W, edge attribute set X and T ⊆ M, we have 0 ≤ αR(T) ≤ 1 and 0 ≤ βR(T) ≤ 1, and T is exact if and only if αR(T) = 1 and βR(T) = 1.

Proof Since R_*(W) ⊆ R^*(W) and R_*(X) ⊆ R^*(X), we have 0 ≤ αR(T) ≤ 1 and 0 ≤ βR(T) ≤ 1. Moreover, T is exact ⇔ R_*(W) = R^*(W) and R_*(X) = R^*(X) ⇔ αR(T) = 1 and βR(T) = 1.


Result 4.2 Let T and S be two vertex rough graphs, where T = (W1, X1) and S = (W2, X2). If S is a vertex rough subgraph of T, then αR(S) ≤ αR(T) and βR(S) ≤ βR(T).

To compare two rough graphs, rough similarity degree [18] is an important measure. We can extend it to vertex rough graph.

Definition 4.4 Given a vertex rough graph set M and attribute set R, K = (M, R) is a knowledge system. Let H, J ⊆ M, where H = (W1, X) and J = (W2, Y). Then:

1. The rough vertex similarity degree ⟨H, J⟩R and the rough edge similarity degree [H, J]R between H and J are defined by
2. The lower rough vertex similarity degree ⟨H, J⟩R_* and the lower rough edge similarity degree [H, J]R_* between H and J are defined by
3. The upper rough vertex similarity degree ⟨H, J⟩R^* and the upper rough edge similarity degree [H, J]R^* between H and J are defined by

Proposition 4.2 Given a vertex rough graph set M and attribute set R, let K = (M, R) be the knowledge system and H, J ⊆ M. Then:

1. H and J are R-rough equal iff ⟨H, J⟩R = [H, J]R = 1.
2. H and J are R-lower rough equal iff ⟨H, J⟩R_* = [H, J]R_* = 1.
3. H and J are R-upper rough equal iff ⟨H, J⟩R^* = [H, J]R^* = 1.


Proof

CONCLUSION

Both rough set theory and graph theory have a variety of applications across different fields. This paper introduced the concept of vertex rough graph, which combines rough set theory and graph theory. Similar to


rough set theory, the notions of vertex and edge rough membership functions are introduced, and using these membership functions, an alternative definition of vertex rough graph has been developed. Later, vertex precision and edge precision are defined and some of their properties are discussed. Since the edge rough graph has a lot of applications in various fields, like relationship analysis, data mining, etc., vertex rough graphs will also have applications in these and many other fields. In the future, we will investigate further rough properties and applications of vertex rough graphs.

ACKNOWLEDGEMENTS

The first author acknowledges the financial assistance given by the University Grants Commission (UGC), Government of India, throughout the preparation of this paper. The authors are very thankful to the reviewers for the valuable suggestions which helped greatly in improving the quality of the paper.


REFERENCES

1. Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353
2. Pawlak Z (1982) Rough sets. Int J Comput Inf Sci 11:341–356
3. Pawlak Z (2002) Rough set theory and its application. J Telecommun Inf Technol 3:7–10
4. Pawlak Z (1984) Rough classification. Int J Man Mach Stud 20(5):469–483
5. Pawlak Z (1987) Rough logic. Bull Pol Acad Sci Tech Sci 35(5–6):253–258
6. Pawlak Z (1998) Rough set theory and its applications to data analysis. Cybern Syst 29(7):661–688
7. Fariha Z, Akram M (2018) A novel decision-making method based on rough fuzzy information. Int J Fuzzy Syst 20(3):1000–1014
8. Prasad V, Rao TS, Babu MSP (2016) Thyroid disease diagnosis via hybrid architecture composing rough data sets theory and machine learning algorithms. Soft Comput 20(3):1179–1189
9. Sheeja TK, Kuriakose AS (2018) A novel feature selection method using fuzzy rough sets. Comput Ind 97:111–116
10. Chen H, Li T, Fan X, Luo C (2019) Feature selection for imbalanced data based on neighborhood rough sets. Inf Sci 483:1–20
11. Shi B, Meng B, Yang H, Wang J, Shi W (2018) A novel approach for reducing attributes and its application to small enterprise financing ability evaluation. Complexity 1032643:17
12. Immaculate HJ, Arockiarani I (2018) Cosine similarity measure for rough intuitionistic fuzzy sets and its application in medical diagnosis. Int J Pure Appl Math 118(1):1–7
13. Newman ME (2003) The structure and function of complex networks. SIAM Rev 45(2):167–256
14. He T, Shi K (2006) Rough graph and its structure. J Shandong Univ (Nat Sci) 6:88–92
15. He T, Chan Y, Shi K (2006) Weighted rough graph and its application. IEEE Sixth Int Conf Intell Syst Des Appl 1:486–492
16. He T, Xue P, Shi K (2008) Application of rough graph in relationship mining. J Syst Eng Electron 19:742–747
17. Liang M, Liang B, Wei L, Xu X (2011) Edge rough graph and its application. Proc of Eighth International Conference on Fuzzy Systems and Knowledge Discovery 335–338
18. He T (2012) Rough properties of rough graph. Appl Mech Mater 157–158:517–520
19. He T (2012) Representation form of rough graph. Appl Mech Mater 157–158:874–877
20. Akram M, Nawaz S (2015) On fuzzy soft graphs. Ital J Pure Appl Math 34:497–514
21. Akram M, Malik H, Shahzadi S, Smarandache F (2018) Neutrosophic soft rough graphs with application. Axioms 7:1–14
22. Malik H, Akram M (2018) A new approach based on intuitionistic fuzzy rough graphs for decision-making. J Intell Fuzzy Syst 34:2325–2342
23. Bondy J, Murty DS (1992) Graph theory with applications. North-Holland, Amsterdam

Chapter 13

Incremental Graph Pattern Matching Algorithm for Big Graph Data

Lixia Zhang1 and Jianliang Gao 2

1 College of Mathematics and Computer Science, Key Laboratory of High Performance Computing and Stochastic Information Processing, Ministry of Education of China, Hunan Normal University, Changsha 410081, China

2 School of Information Science and Engineering, Central South University, Changsha 410083, China

ABSTRACT

Graph pattern matching is widely used in big data applications. However, real-world graphs are usually huge and dynamic. A small change in the data graph or pattern graph could cause a serious computing cost. Incremental graph matching algorithms can avoid recomputing on the whole graph

Citation: (APA): Zhang, L., & Gao, J. (2018). Incremental graph pattern matching algorithm for big graph data. Scientific Programming, 2018. (8 pages) DOI: https://doi.org/10.1155/2018/6749561
Copyright: © 2018 Lixia Zhang and Jianliang Gao. This is an open access article distributed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.


and reduce the computing cost when the data graph or the pattern graph is updated. The existing incremental algorithm PGC_IncGPM can effectively reduce the matching time when no more than half of the edges of the pattern graph are updated. However, as the number of changed edges increases, the improvement of PGC_IncGPM gradually decreases. To solve this problem, an improved algorithm iDeltaP_IncGPM is developed in this paper. For multiple insertions (resp., deletions) on pattern graphs, iDeltaP_IncGPM determines the nodes' matching state detection sequence and processes them together. Experimental results show that iDeltaP_IncGPM has higher efficiency and a wider application range than PGC_IncGPM.

INTRODUCTION

Graph pattern matching is to find all the subgraphs that are the same as or similar to a given pattern graph P in a data graph G. It is widely used in a number of applications, for example, web document classification, software plagiarism detection, and protein structure detection [1–3]. With the rapid development of the Internet, huge amounts of graph data emerge every day. For example, the Linked Open Data Project, which aims to connect data across the Web, had published 149 billion triples by 2017 [4]. In addition, real-world graphs are dynamic [5]. It is often cost prohibitive to recompute matches starting from scratch when G or P is updated. An incremental matching algorithm is needed, which aims to minimize unnecessary recomputation by analyzing and computing the changes of the matching result in response to updates ΔG (resp., ΔP) to G (resp., P).

For example, Figure 1(a) is a pattern graph P and Figure 1(b) is a data graph G. The subgraph which is composed of A1, B1, C1, D1, E1, and the edges between them (for simplicity, denoted as {A1, B1, C1, D1, E1}) is the only matching subgraph. Assuming that (B, E) and (C, D) are removed from the pattern graph, the traditional recomputing algorithm will compute the matches for the new pattern graph on the whole data graph. It is time consuming. The incremental algorithm will just check a part of the nodes in G, that is, B2, B3, C2, C3, A2, and A3, and add the new matching subgraphs ({A2, B2, C2, D2, E2}, {A3, B3, C3, D3, E3}) to the original matching result. At present, the study of incremental graph pattern matching is still in its infancy and existing work [6–12] mainly focuses on the updates of data graphs. In our previous study, we proposed an incremental graph matching algorithm named PGC_IncGPM, which can be used in scenarios where data


graphs are constant and pattern graphs are updated [13]. PGC_IncGPM can effectively reduce the runtime of graph matching as long as the number of changed edges is less than the number of unchanged edges in P. However, the improvement effect of PGC_IncGPM gradually decreases as the number of changed edges increases. In this paper, the bottleneck of PGC_IncGPM is further analyzed. An optimization method of nodes’ matching state detection sequence is proposed, and a more efficient algorithm called iDeltaP_IncGPM is designed and implemented.

Figure 1: An example of incremental graph pattern matching.

Using Figure 1 as an example, suppose (B, E) and (C, D) are deleted from the pattern graph. PGC_IncGPM algorithm will first consider the deletion of (B, E), that is, checking B2, A2, B3, and A3, and then consider the deletion of (C, D), that is, checking C2, B2, A2, C3, B3, and A3. Thus B2, A2, B3, and A3 are all checked twice. iDeltaP_IncGPM considers the two deletions together; C2, B2, A2, C3, B3, and A3 are all checked only once. The remainder of this paper is organized as follows. In Section 2, related work is reviewed. The model and definition are described in Section 3. In Section 4, our algorithm is presented. Section 5 is experimental results and comparison, and Section 6 presents the conclusion.

RELATED WORK

We surveyed related work in two categories: graph pattern matching models and incremental algorithms for graph matching on massive graphs. Graph pattern matching is typically defined in terms of subgraph isomorphism [14, 15]. However, subgraph isomorphism is an NP-complete problem [16]. In addition, subgraph isomorphism is often too restrictive because it requires that the matching subgraphs have exactly the same topology as the pattern graph. These hinder its applicability in emerging


applications such as social networks and crime detection. Thus, graph simulation [17] and its extensions [18–22] are adopted for pattern matching. Graph simulation preserves the labels and the child relationships of a graph pattern in its match. In practical applications, graph simulation is so loose that it may produce a large number of useless matches, which can drown out the useful information. Dual simulation [18] enhances graph simulation by imposing an additional condition, to preserve both child and parent relationships (downward and upward mappings). Due to the good balance and high practical value of dual simulation in response time and effectiveness, graph pattern matching is defined as dual simulation in this paper.

At present, the study of incremental graph pattern matching is still in its infancy; existing work [6–12] mainly focuses on the updates of data graphs. Fan et al. proposed the incremental graph simulation algorithm IncMatch [6, 7]. Sun et al. studied the Maximal Clique Enumeration problem on dynamic graphs [8]. Stotz et al. studied the incremental inexact subgraph isomorphism problem [9]. Wang and Chen proposed an incremental approximate graph matching algorithm, which transforms the approximate subgraph search into vector space relation detection [10]. When inserting or deleting on the data graph, the vectors of the relevant nodes are modified and whether the new vectors still contain the vector of the pattern graph is rechecked. Choudhury et al. developed a fast matching system StreamWorks for dynamic graphs [11]. The system can detect suspicious pattern graphs in real time and give early warnings of high-risk data transfer modes on constantly updated network graphs. Semertzidis and Pitoura proposed an approach to find the most durable matches of an input graph pattern on graphs that evolve over time [12]. In [13], an incremental graph matching algorithm was proposed for updates of pattern graphs.

In the big data era [23], graph computing is widely used in different fields such as social networks [24], sensor networks [25, 26], internet-of-things [27, 28], and cellular networks [29]. Therefore, there is an urgent demand for improving the performance of big graph processing, especially graph pattern matching.

MODEL AND DEFINITION

For graph pattern matching, pattern graphs and data graphs are directed graphs with labels. Each node in a graph has a unique label, which defines the


attribute of the node (such as keywords, skills, class, name, and company).

Definition 1 (graph). A node-labeled directed graph (or simply a graph) is defined as G = (V, E, L), where V is a finite set of nodes, E ⊆ V × V is a finite set of edges, and L is a function that maps each node u in V to a label L(u); that is, L(u) is the attribute of u.

Definition 2 (graph pattern matching). Given a pattern graph P = (Vp, Ep, Lp) and a data graph G = (V, E, L), P matches G if there is a binary relation R ⊆ Vp × V such that

(1) if (u, v) ∈ R, then Lp(u) = L(v);
(2) for every u ∈ Vp, there exists a node v in G such that (u, v) ∈ R, and
 (a) for every edge (u, u′) ∈ Ep, there exists an edge (v, v′) ∈ E such that (u′, v′) ∈ R;
 (b) for every edge (u′′, u) ∈ Ep, there exists an edge (v′′, v) ∈ E such that (u′′, v′′) ∈ R.

Condition (2)(a) ensures that the matching node v keeps the child relationship of u; condition (2)(b) ensures that v maintains the parent relationship of u. For any P and G, there exists a unique maximum matching relation RM. Graph pattern matching is to find RM, and the result graph Gr is a subgraph of G that can represent RM.
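For readers who want to experiment with this matching semantics, the following Python sketch computes the maximum dual-simulation relation by iteratively removing candidate pairs that violate conditions (2)(a) or (2)(b). It is a straightforward fixpoint computation written for clarity, not the optimized indexed algorithms discussed later, and all function names are our own.

```python
# Illustrative fixpoint computation of the maximum dual-simulation relation R_M.
# Graphs are given as dicts node -> label plus lists of (source, target) edges.

def dual_simulation(p_nodes, p_edges, g_nodes, g_edges):
    children = lambda edges, x: {b for a, b in edges if a == x}
    parents = lambda edges, x: {a for a, b in edges if b == x}
    # Start from label-based candidates (the cand(.) sets described below).
    sim = {u: {v for v in g_nodes if g_nodes[v] == p_nodes[u]} for u in p_nodes}
    changed = True
    while changed:
        changed = False
        for u in p_nodes:
            for v in list(sim[u]):
                ok_child = all(children(g_edges, v) & sim[u2] for u2 in children(p_edges, u))
                ok_parent = all(parents(g_edges, v) & sim[u2] for u2 in parents(p_edges, u))
                if not (ok_child and ok_parent):
                    sim[u].discard(v)
                    changed = True
    if any(not sim[u] for u in p_nodes):
        return None  # P does not match G
    return sim

# Tiny example: pattern A->B matched against a data graph with A1->B1 and an isolated A2.
P_nodes, P_edges = {"A": "A", "B": "B"}, [("A", "B")]
G_nodes, G_edges = {"A1": "A", "A2": "A", "B1": "B"}, [("A1", "B1")]
print(dual_simulation(P_nodes, P_edges, G_nodes, G_edges))  # {'A': {'A1'}, 'B': {'B1'}}
```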

Considering a real-life example, a recruiter wants to find a professional software development team from social network. Figure 2(a) is the basic organization graph of a software development team. The team consists of the following staffs with identity: project manager (PM), database engineer (DB), software architecture (SA), business process analyst (BA), user interface designers (UD), software developer (SD), and software tester (ST). Each node in the graph represents a person, and the label of node means the identity of person. The edge from node A to node B means that B works well under the supervision of A. A social network is shown in Figure 2(b). In this example, RM is{ (DB, DB1), (PM, PM1), (SA, SA1), (BA, BA1), (UD, UD1), (SD, SD1), (SD, SD2), (ST, ST1), (ST, ST2)}. Because BA2 does not have a child matching UD and SA2 does not have a parent matching DB, PM2 does not keep the child relationship of PM. For the same reason, SD3 (resp., ST3) does not match SD (resp., ST).


Figure 2: An example of graph pattern matching

Definition 3 (incremental graph pattern matching for pattern graph changing). Given a data graph G and a pattern graph P, the matching result in G for P is M(P, G). Assuming that P changes by ΔP, the new pattern graph is expressed as P ⊕ ΔP. As opposed to batch algorithms that recompute matches starting from scratch, an incremental graph matching algorithm aims to find the changes ΔM to M(P, G) in response to ΔP such that M(P ⊕ ΔP, G) = M(P, G) ⊕ ΔM.

When ΔP is small, ΔM is usually small as well, and it is much less costly to compute than to recompute the entire set of matches. In other words, this suggests that we compute matches once on the entire graph via a batch-matching algorithm and then incrementally identify new matches in response to ΔP without paying the cost of the high complexity of graph pattern matching. In order to get ΔM quickly, indexes can be prebuilt based on selected data features of graphs to reduce the search space during incremental matching. The more indexes, the shorter the time to get ΔM and the larger the space to store indexes. For large-scale data graphs, both response time and storage cost need to be reduced. Considering the balance of storage cost and response time, in this paper three kinds of sets generated in the process of graph matching are used as the index. (1) First are the candidate matching sets cand(⋅); for each node u in P, cand(u) includes all the nodes in G which only have the same label as u. The nodes in cand(⋅) are called c-nodes. (2) The second are the child matching sets sim(⋅); for each node u in P, sim(u) includes all the nodes in G which preserve the child relationship of u. The nodes in sim(⋅) are called s-nodes. (3) The third are the complete matching sets mat(⋅); for each node u in P, mat(u) includes all the nodes in G which preserve both the child and parent relationships of u. The nodes in mat(⋅) are called m-nodes.

The symbols used in this paper are listed in the Notations section.


iDeltaP_IncGPM Algorithm

In this section, we propose the improved incremental graph pattern matching algorithm for pattern graph changing (ΔP).

The Idea of the PGC_IncGPM Algorithm

The basic framework of PGC_IncGPM [13] is shown in Figure 3. The graph pattern matching algorithm (GPMS) is first performed on the entire data graph G for the pattern graph P. It computes the matching result graph Gr and creates the index needed for subsequent incremental matching. ΔP may include edge insertions (E+) and edge deletions (E−). The incremental graph pattern matching algorithm PGC_IncGPM first calls the sub-algorithm AddEdges for E+ and then calls the sub-algorithm SubEdges for E−, which yield the new matching result M(P ⊕ ΔP, G) and the new index that can be used for subsequent incremental matching if the pattern graph changes again. Edge insertions (resp., edge deletions) in ΔP are processed one by one by AddEdges (resp., SubEdges). For example, when deleting multiple edges from P, the processing of PGC_IncGPM is as follows.

In the first step, the following operations are performed for each deleted edge (u, u′): for each v ∈ cand(u), whether v keeps the child relationship of u in P ⊕ ΔP is checked. If v keeps the child relationship of u, then v is moved from cand(u) to sim(u), and the parents of v in cand(⋅) are also processed. In the second step, each node in sim(⋅) is repeatedly filtered according to its parents and children; the newly generated m-nodes are added to mat(⋅). In the first step, when deleting (u, u′) from P, some nodes in cand(u) and cand(u′′) (u′′ is an ancestor of u) may change from c-nodes to s-nodes. So when a c-node becomes an s-node, a bottom-up approach is used to find its parents and ancestors from cand(⋅). If (u1, u′1) and (u2, u′2) are deleted, and u1 and u2 have a common ancestor u′, then cand(u′) will be visited twice. In summary, there is a bottleneck of PGC_IncGPM for multiple deleted edges. The same problem exists for multiple inserted edges.


Optimization for Matching State Detection Sequence

Since PGC_IncGPM deals with edge insertions (resp., deletions) one by one, its efficiency gradually decreases as the number of changed edges increases. To overcome the bottleneck of PGC_IncGPM, multiple edge insertions (resp., deletions) should be considered together. In this paper, an optimization method for the nodes' matching state detection sequence is proposed. The optimization can be applied to both insertions and deletions on P. Taking SubEdges as an example, the optimization method is as follows.

Figure 3: Basic framework of PGC IncGPM algorithm.

First, analyze all the edges deleted from P to determine which nodes' candidate matching sets may change. If cand(u) may change, then u is added to the filtorder− set. Secondly, filtorder− is sorted by the inverse topological sequence of P. There may be some strongly connected components in P. In this case, we first find all the strongly connected components in P and then contract each strongly connected component into a single node to get a directed acyclic graph P′ and find the inverse topological sequence of P′; finally, we replace each contracted node with its original node set. Thus, the approximate inverse topological sequence of P is obtained.

Finally, for each u in filtorder−, cand(u) is processed in turn. Depending on whether there is a deleted edge from u, two different filtering methods are used: (1) if u has at least one out-edge to be deleted, then each node in cand(u) may now keep the child relationship of u, so whether they keep the child relationship of u should be checked; (2) if u does not have an out-edge deleted, then only part of the nodes in cand(u) need to be checked; that is, a node in cand(u) will be checked only if it has at least one child which changes from c-node to s-node. The visited times of some candidate matching sets can be reduced through the above optimization.
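The ordering step described above (contract strongly connected components, take a reverse topological order of the condensation, then expand each contracted node) can be written compactly with networkx. The sketch below illustrates that step only, with our own function name and an illustrative pattern graph rather than the exact graph of Figure 4.

```python
# Sketch of the ordering step: approximate inverse topological sequence of P.
# networkx is assumed; condensation() contracts strongly connected components.
import networkx as nx

def approx_inverse_topological_order(pattern_edges):
    P = nx.DiGraph(pattern_edges)
    C = nx.condensation(P)                        # DAG whose nodes are the SCCs of P
    order = []
    for scc_id in reversed(list(nx.topological_sort(C))):
        order.extend(C.nodes[scc_id]["members"])  # expand a contracted node to its members
    return order

# Illustrative pattern graph (not the exact graph of Figure 4).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "E"), ("C", "E"),
         ("C", "G"), ("D", "G"), ("E", "H"), ("G", "I")]
print(approx_inverse_topological_order(edges))  # sinks such as H and I come first, A comes last
```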


iDeltaP_IncGPM Algorithm

Based on the optimization method proposed in Section 4.2, iDeltaP_IncGPM is proposed. It uses the optimized method for both multiple inserted edges and multiple deleted edges. The optimization algorithm for edge deletions is shown in Algorithm 1. In Algorithm 1, nodes− contains all the nodes which have an out-edge deleted. For a node u in P, if the changes of P may result in some nodes in cand(u) becoming s-nodes, then u ∈ filtorder−. Filtorder− is sorted by the inverse topological sequence of P (lines (1)–(5)). If u has an out-edge removed, that is, u ∈ nodes−, then all the nodes in cand(u) need to be checked to see whether they keep the child relationship of u (lines (7)–(12)). If u ∈ filtorder− and u is not in nodes−, then only part of the nodes in cand(u) are checked; that is, if w has a child w′ and w′ is moved from cand(u′) to sim(u′) (w′ ∈ snew(u′)), then whether w has become an s-node will be checked (lines (14)–(20)). Here we use an example to illustrate the implementation process of PGC_IncGPM and iDeltaP_IncGPM. The pattern graph P is shown in Figure 4, assuming that (E, H), (G, I), and (C, G) are deleted from P.

The process of PGC_IncGPM is as follows: (1) the deletion of (E, H) is processed, and each w in cand(E) is checked to see whether it keeps the child relationship of E in P ⊕ ΔP; if w keeps the child relationship, then its parents found in cand(B) (resp., cand(C)) are checked, and if these nodes keep the child relationship of B (resp., C), they are moved to sim(B) (resp., sim(C)); after that, their parents found in cand(A) are checked; (2) the deletion of (G, I) is processed, and the nodes in cand(G), cand(C), cand(D), and cand(A) are checked in turn; (3) the deletion of (C, G) is processed, and the nodes in cand(C) and cand(A) are checked in turn. From the above steps, it can be seen that cand(C) and cand(A) are visited three times, and cand(G), cand(D), cand(E), and cand(B) are visited once.

The process of iDeltaP_IncGPM is as follows: because of the deletion of (E, H), (G, I), and (C, G), some nodes in cand(E), cand(G), cand(C), cand(B), cand(D), and cand(A) may become s-nodes. The nodes in cand(⋅) are checked in the order {G, E, D, C, B, A}; that is, the nodes in cand(G) are checked first, and the nodes in cand(A) are checked last. E, G, and C all have out-edges deleted, so all the nodes in their candidate matching sets are checked. The nodes in cand(B), cand(D), and cand(A) are checked only if they have a child changing from c-node to s-node. Therefore, cand(C), cand(D), cand(A), cand(G), cand(E), and cand(B) are each visited only once. In other words, the optimized scheme reduces the visited times of cand(⋅).


Algorithm 1: iDeltaP IncGPM for edge deletions.
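The pseudocode listing referenced by this caption is not reproduced in this edition. Based on the prose description above (nodes−, filtorder−, and the two filtering cases), a rough Python reconstruction might look like the following sketch; the helper names and the exact test inside keeps_child are our own placeholders for the index operations described in the text, and the original line numbering is not preserved.

```python
# Hedged reconstruction of the edge-deletion pass (in the spirit of Algorithm 1).
# cand/sim: dicts u -> set of data-graph nodes; p_edges_new: edges of P (+) Delta P;
# deleted: deleted pattern edges; filtorder: approximate inverse topological sequence.

def g_children(g_edges, v):
    return {b for a, b in g_edges if a == v}

def keeps_child(v, u, p_edges_new, g_edges, cand, sim):
    # Simplified test: for every pattern child u2 of u, v has at least one data-graph
    # child that is currently a candidate or s-node for u2 (an assumption, not the
    # paper's exact index test).
    p_kids = {b for a, b in p_edges_new if a == u}
    return all(g_children(g_edges, v) & (cand[u2] | sim[u2]) for u2 in p_kids)

def sub_edges_pass(p_edges_new, deleted, g_edges, cand, sim, filtorder):
    nodes_minus = {u for u, _ in deleted}            # nodes with an out-edge deleted
    s_new = {u: set() for u in cand}                 # nodes newly moved into sim(.)
    for u in filtorder:
        p_kids = {b for a, b in p_edges_new if a == u}
        if u in nodes_minus:
            to_check = set(cand[u])                  # case (1): check every c-node of u
        else:                                        # case (2): only nodes with a child in s_new
            to_check = {w for w in cand[u]
                        if any(g_children(g_edges, w) & s_new[u2] for u2 in p_kids)}
        for w in to_check:
            if keeps_child(w, u, p_edges_new, g_edges, cand, sim):
                cand[u].discard(w)
                sim[u].add(w)
                s_new[u].add(w)
    return cand, sim
```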

Figure 4: An example for pattern graph changing.

For multiple edges inserted into the pattern graph, a similar optimization method is adopted. Nodes+ contains all the source nodes of the inserted edges. If some nodes in sim(u) may become c-nodes because of edge insertions, then u is in filtorder+. Filtorder+ is ordered by the reverse topological sequence of the pattern graph. Nodes+ and filtorder+ are used to reduce the visited times of sim(⋅) and mat(⋅).


EXPERIMENTS AND RESULTS ANALYSIS

The following experiments evaluate our proposed algorithm. Runtime is used as the key assessment of the algorithms. In addition, in order to show the effectiveness of the incremental algorithms visually, the improvement ratio (IR) is proposed, which is the ratio of the runtime saved by an incremental matching algorithm to the runtime of the ReComputing algorithm. Two real data sets (Epinions and Slashdot [30]) are used for the experiments. The former is a trust network with 75879 nodes and 508837 edges. The latter is a social network with 82168 nodes and 948464 edges. In previous work, we experimented with normal-size and large-size pattern graphs, respectively, and the results show that the complexity and effectiveness of the incremental matching algorithm are not affected by the size of the pattern graph. Therefore, in this paper, by default, the number of nodes in P (|Vp|) is 9, and the original number of edges in P (|Ep|) is 8 (resp., 16) for insertions (resp., deletions) and 9 for both insertions and deletions.
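Read literally, the improvement ratio compares the saved runtime to the ReComputing baseline; as a worked illustration only (with made-up runtimes, not the paper's measurements):

```python
# Improvement ratio (IR) of an incremental algorithm relative to ReComputing.
def improvement_ratio(t_recompute, t_incremental):
    return (t_recompute - t_incremental) / t_recompute

# Hypothetical runtimes in seconds, for illustration only.
print(improvement_ratio(10.0, 6.0))  # 0.4, i.e. 40% of the ReComputing time is saved
```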

In order to evaluate the improvement of our proposed algorithm, iDeltaP IncGPM, PGC IncGPM, and ReComputing are all performed on Epinions and Slashdot under different settings. Each experiment was performed 5 times with different pattern graphs, and the average results are reported here. The experimental results are shown in Figure 5. The histogram represents the runtime of algorithm, and the line chart represents the improvement ratio of iDeltaP IncGPM and PGC IncGPM to Re Computing.

Figure 5(a) (resp., Figure 5(b)) shows the runtime of the three algorithms over Epinions (resp., Slashdot) for insertions on pattern graphs. The X-axis represents the number of insertions on P: "+2" represents that two edges are inserted into P, "+4" represents that four edges are inserted into P, and so on. The figure tells us the following: (a) when insertions are no more than 10, the runtime of PGC_IncGPM and iDeltaP_IncGPM is significantly shorter than that of ReComputing, and iDeltaP_IncGPM has the shortest runtime; (b) when insertions are 12 (the newly inserted edges account for 60% of the edges in P ⊕ ΔP), the runtime of PGC_IncGPM is longer than that of ReComputing, while iDeltaP_IncGPM still has the shortest runtime; (c) the improvement ratio of iDeltaP_IncGPM and PGC_IncGPM decreases as edge insertions increase, but the decrease of iDeltaP_IncGPM is smaller. The more inserted edges, the greater the advantage of iDeltaP_IncGPM over PGC_IncGPM. When 12 edges are inserted into P, the IR of iDeltaP_IncGPM is 40% on average, and the IR of PGC_IncGPM is 33% on average. Therefore, iDeltaP_IncGPM is better than PGC_IncGPM. The reason is that PGC_IncGPM processes the


inserted edges one by one; therefore, as the number of insertions increases, its runtime grows almost linearly. However, iDeltaP_IncGPM integrates all the inserted edges, analyzes which matching sets are affected, and processes them in the appropriate order. This prevents some matching sets from being processed repeatedly, which shortens the running time.

Figure 5: The runtime of different algorithms when pattern graph changed.

Figure 5(c) (resp., Figure 5(d)) shows the runtime of the three algorithms over Epinions (resp., Slashdot) for deletions on the pattern graph. The X-axis represents the number of deletions on P: "−2" represents that two edges are deleted from P, "−4" represents that four edges are deleted from P, and so on. It can be seen that (a) when the number of deletions changes from 2 to 12, the runtime of all three algorithms increases, and iDeltaP_IncGPM always has the shortest runtime; (b) as the number of deletions increases, the IR of PGC_IncGPM decreases and the IR of iDeltaP_IncGPM slowly increases. For 12 deletions, the IR of PGC_IncGPM decreases to 7% on average, while the IR of iDeltaP_IncGPM increases to 78% on average. The reason is that as the number of deletions increases, the runtime of ReComputing increases dramatically, while the runtime of iDeltaP_IncGPM increases only a little. iDeltaP_IncGPM is better than PGC_IncGPM because it processes the deleted edges together and its runtime does not increase linearly as the number of deleted edges increases.


Figure 5(e) (resp., Figure 5(f)) shows the runtime of the three algorithms over Epinions (resp., Slashdot) for both insertions and deletions on the pattern graph. The X-axis represents the number of insertions and deletions on P: "+2−2" means that two edges are inserted into P and another two edges are removed from P, and so on. As shown in the figure, iDeltaP_IncGPM always has a shorter runtime than the others.

In conclusion, iDeltaP IncGPM effectively improves the efficiency of PGC IncGPM through the optimization strategy. For the same Δ𝑃, the runtime of iDeltaP IncGPM is shorter, and as |Δ𝑃| increases, the runtime increases less; the decrease of IR is also more moderate. Therefore, iDeltaP IncGPM can be applied to larger changes of the pattern graph, and it has a wider range of applications.

CONCLUSION

In this paper, we analyze PGC_IncGPM to find its efficiency bottleneck and propose a more efficient incremental matching algorithm, iDeltaP_IncGPM. Multiple insertions (resp., deletions) are considered together, and an optimization method for the nodes' matching state detection sequence is used. Experimental results on real data sets show that iDeltaP_IncGPM has higher efficiency and a wider application range than PGC_IncGPM. Next, we will study distributed incremental graph matching algorithms. Real-life graphs grow rapidly in size, and hyper-massive data graphs cannot be centrally stored in one data center and need to be distributed across multiple data centers. It is well worth studying how to perform efficient incremental matching on distributed large graphs.

NOTATIONS

P/G: Pattern graph / data graph
u/u′: Nodes in P
v/v′: Nodes in G
ΔP: Changes of P
ΔG: Changes of G
P ⊕ ΔP: New pattern graph
cand(u): Nodes in G that have the same label as u but do not keep the child relationship of u
sim(u): Nodes in G that only keep the child relationship of u
mat(u): Nodes in G that keep both the child and parent relationships of u
index: The sets cand(u), sim(u) and mat(u)
c/s/m-node: v in G such that v ∈ cand(u)/sim(u)/mat(u)
M(P, G): The maximum match in G for P
Gr: The result graph, a subgraph of G that represents M(P, G)

CONFLICTS OF INTEREST

The authors declare that there are no conflicts of interest regarding the publication of this paper.


REFERENCES

1. X. Ren and J. Wang, "Multi-query optimization for subgraph isomorphism search," Proceedings of the VLDB Endowment, vol. 10, no. 3, pp. 121–132, 2016.
2. Z. Yang, A. W.-C. Fu, and R. Liu, "Diversified top-k subgraph querying in a large graph," in Proceedings of the 2016 ACM SIGMOD International Conference on Management of Data, SIGMOD 2016, pp. 1167–1182, USA, July 2016.
3. J. Gao, B. Song, W. Ke, and X. Hu, "BalanceAli: Multiple PPI Network Alignment With Balanced High Coverage and Consistency," IEEE Transactions on NanoBioscience, vol. 16, no. 5, pp. 333–340, 2017.
4. A. Jentzsch, "Linked Open Data Cloud," in Linked Enterprise Data, X.media.press, pp. 209–219, Springer Berlin Heidelberg, Berlin, Heidelberg, 2014.
5. Y. Hao, G. Li, P. Yuan, H. Jin, and X. Ding, "An Association-Oriented Partitioning Approach for Streaming Graph Query," Scientific Programming, vol. 2017, pp. 1–11, 2017.
6. W. Fan, J. Li, J. Luo, Z. Tan, X. Wang, and Y. Wu, "Incremental graph pattern matching," in Proceedings of ACM SIGMOD and 30th PODS 2011 Conference, pp. 925–936, Greece, June 2011.
7. W. Fan, C. Hu, and C. Tian, "Incremental Graph Computations," in Proceedings of ACM International Conference, pp. 155–169, Chicago, Illinois, USA, May 2017.
8. S. Sun, Y. Wang, W. Liao, and W. Wang, "Mining Maximal Cliques on Dynamic Graphs Efficiently by Local Strategies," in Proceedings of IEEE 33rd International Conference on Data Engineering (ICDE), pp. 115–118, San Diego, CA, USA, April 2017.
9. A. Stotz, R. Nagi, and M. Sudit, "Incremental graph matching for situation awareness," in Proceedings of 12th International Conference on Information Fusion, FUSION 2009, pp. 452–459, USA, July 2009.
10. C. Wang and L. Chen, "Continuous subgraph pattern search over graph streams," in Proceedings of the 25th IEEE International Conference on Data Engineering, ICDE 2009, pp. 393–404, China, April 2009.
11. S. Choudhury, L. Holder, G. Chin, A. Ray, S. Beus, and J. Feo, "Streamworks - A system for dynamic graph search," in Proceedings of ACM SIGMOD Conference on Management of Data, SIGMOD 2013, pp. 1101–1104, USA, June 2013.


12. K. Semertzidis and E. Pitoura, "Durable Graph Pattern Queries on Historical Graphs," in Proceedings of International Conference on Data Engineering, pp. 541–552, October 2016.
13. L. X. Zhang, W. P. Wang, J. L. Gao, and J. X. Wang, "Pattern graph change oriented incremental graph pattern matching," Journal of Software. Ruanjian Xuebao, vol. 26, no. 11, pp. 2964–2980, 2015.
14. X. Ren and J. Wang, "Exploiting Vertex Relationships in Speeding up Subgraph Isomorphism over Large Graphs," in Proceedings of the 3rd Workshop on Spatio-Temporal Database Management, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 617–628, Korea, September 2006.
15. F. Bi, L. Chang, X. Lin, L. Qin, and W. Zhang, "Efficient subgraph matching by postponing Cartesian products," in Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 1199–1214, USA, July 2016.
16. J. R. Ullmann, "An algorithm for subgraph isomorphism," Journal of the ACM, vol. 23, no. 1, pp. 31–42, 1976.
17. W. Fan, J. Li, S. Ma, N. Tang, Y. Wu, and Y. Wu, "Graph pattern matching," Proceedings of the VLDB Endowment, vol. 3, no. 1-2, pp. 264–275, 2010.
18. W. Fan, "Graph pattern matching revised for social network analysis," in Proceedings of the 15th International Conference on Database Theory, ICDT 2012, pp. 8–21, Germany, March 2012.
19. J. Gao, Q. Ping, and J. Wang, "Resisting re-identification mining on social graph data," World Wide Web-Internet and Web Information Systems, 2017.
20. S. Ma, Y. Cao, W. Fan, J. Huai, and T. Wo, "Capturing topology in graph pattern matching," Proceedings of the VLDB Endowment, vol. 5, no. 4, pp. 310–321, 2011.
21. A. Fard, M. U. Nisar, L. Ramaswamy, J. A. Miller, and M. Saltz, "A distributed vertex-centric approach for pattern matching in massive graphs," in Proceedings of IEEE International Conference on Big Data, Big Data 2013, pp. 403–411, USA, October 2013.
22. Y. Liang and P. Zhao, "Similarity Search in Graph Databases: A Multi-Layered Indexing Approach," in Proceedings of IEEE 33rd International Conference on Data Engineering (ICDE), pp. 783–794, San Diego, CA, USA, April 2017.


23. X. Liu, Y. Liu, H. Song, and A. Liu, "Big Data Orchestration as a Service Network," IEEE Communications Magazine, vol. 55, no. 9, pp. 94–101, 2017.
24. J. Gao, J. Wang, J. He, and F. Yan, "Against Signed Graph Deanonymization Attacks on Social Networks," International Journal of Parallel Programming.
25. Q. Zhang and A. Liu, "An unequal redundancy level-based mechanism for reliable data collection in wireless sensor networks," EURASIP Journal on Wireless Communications and Networking, vol. 2016, article 258, 2016.
26. J. Gao, J. Wang, P. Zhong, and H. Wang, "On Threshold-Free Error Detection for Industrial Wireless Sensor Networks," IEEE Transactions on Industrial Informatics, pp. 1–11.
27. Y. Xu, A. Liu, and C. Changqin, "Delay-aware program codes dissemination scheme in internet of everything, mobile information systems," Mobile Information Systems, vol. 2016, Article ID 2436074, 18 pages, 2016.
28. X. Liu, S. Zhao, A. Liu, N. Xiong, and A. V. Vasilakos, "Knowledge-aware Proactive Nodes Selection approach for energy management in Internet of Things," Future Generation Computer Systems, 2017.
29. K. Zhou, J. Gui, and N. Xiong, "Improving cellular downlink throughput by multi-hop relay-assisted outband D2D communications," EURASIP Journal on Wireless Communications and Networking, vol. 2017, no. 1, 2017.
30. Stanford Large Network Dataset Collection, http://snap.stanford.edu/data/index.html.

Chapter 14

Framework and Algorithms for Identifying Honest Blocks in Block Chain

Xu Wang1,2,3, Guohua Gan3,4,5, and Ling-Yun Wu1,2,3

1 Key Laboratory of Management, Decision and Information Systems, Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China

2 School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China

3 Laboratory of Big Data and Block chain, National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences, Beijing, China

4 Beijing Taiyiyun Technology Co., Ltd., Beijing, China

5 University of Science & Technology Beijing, Beijing, China

Citation: (APA): Wang, X., Gan, G., & Wu, L. Y. (2020). Framework and algorithms for identifying honest blocks in block chain. PloS one, 15(1), e0227531. (14 pages) Copyright: © 2020 Wang et al. This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) License.


ABSTRACT

Block chain technology has gained increasing attention over the past decade and has been applied in many areas. The main bottleneck for the development and application of block chain is its limited scalability. Block chain with a directed acyclic graph structure (Block DAG) has been proposed to alleviate the scalability problem. One of the key technical problems in Block DAG is the identification of honest blocks, which is essential for establishing a stable and invulnerable total order of all the blocks. The stability and security of Block DAG largely depend on the precision of honest block identification. This paper presents a novel universal framework based on graph theory, called MaxCord, for identifying the honest blocks in Block DAG. By introducing the concept of discord, honest block identification is modelled as a generalized maximum independent set problem. Several algorithms are developed, including exact, greedy and iterative filtering algorithms. Extensive comparisons between the proposed algorithms and the existing method were conducted on simulated Block DAG data, showing that the proposed iterative filtering algorithm identifies the honest blocks both efficiently and effectively. The proposed MaxCord framework and algorithms can lay a solid foundation for Block DAG technology.

INTRODUCTION

Block chain is a decentralized transaction and data management technology which was first developed by Nakamoto [1] for Bitcoin. A block chain is formed chronologically by a sequence of blocks, each consisting of a block header and a block body. Marc Andreessen, the doyen of Silicon Valley's venture capitalists, listed block chain as the most significant invention since the Internet itself, with the potential to transform the world of finance and beyond [2,3]. Nowadays block chain has gained more and more attention and has been applied in many areas, such as IoT [4–7], healthcare [8], finance [9–12] and supply chains [13]. But there still exist some developmental bottlenecks for block chain. Swan [14] points out seven challenges that the block chain technology faces, including throughput, security and so on. The throughput of the current block chain system under the Proof of Work (PoW) consensus mechanism is very low, only about 6–7 transactions per second, while card networks such as Visa can process thousands of transactions per second. Many other consensus mechanisms, such as Proof of Stake (PoS), Delegated Proof of Stake (DPoS), Proof of Importance (PoI), Proof of Luck (PoL), and hybrid PoW/PoS, were proposed to accelerate the block validation process, but


the performance of block chain is still not substantially improved due to the occurrence of block chain forks. Moreover, the consensus mechanisms with higher throughput often sacrifice decentralization to some extent. The poor scalability greatly limits the development and application of block chain, especially in areas with high-frequency transactions. Most research on block chain has focused on improving block chain from the perspective of privacy and security [15–19], and some has worked on consensus algorithms [20], while only a few researchers have studied its scalability limitation. Lin et al. [21] described the SPV (Simplified Payment Verification) technology, which addresses the scalability limitation of block chain by using only the block header information without maintaining the full block data, and is effectively equivalent to expanding the block size. Decker et al. [22] studied the information propagation in the Bitcoin network and pointed out that the propagation delay in the network is the main reason for block chain forks, and that the blocks abandoned due to forks are the source of low throughput and poor scalability. Biais et al. [23] modeled the block chain protocol as a stochastic game to analyze the equilibrium strategies of miners. They found that the longest chain without forks is a Markov perfect equilibrium and that there also exist equilibria with forks, which lead to orphaned blocks. Their model shows how forks can be generated through information delays and software updates. Zohar et al. [24] proposed an alternative to the longest-chain rule used in Bitcoin, named GHOST, which determines the main chain using all blocks in the subtree at each fork. By utilizing the abandoned blocks, GHOST improves the security of block chain, but the throughput remains the same. In another paper, Zohar et al. [25] presented the block chain with directed acyclic graph structure (Block DAG), which allows blocks to reference multiple predecessors so that the information from all blocks is incorporated into the ledger. The DAG structure works well and leads to an increased throughput. More and more researchers agree that Block DAG is the next-generation direction for block chain technology. An increasing number of Block DAG-based block chain platforms have appeared in industry, such as IOTA (Tangle), Byteball, and XDAG [26–28]. The block ordering problem is the main concern under the directed acyclic graph (DAG) structure. In order to avoid the double-spending attack (attackers try to spend the same cryptocurrency more than once), the ordering of transactions is necessary: people can determine the valid transactions by accepting the first transaction and rejecting the later one according to the order of two conflicting transactions. And the ordering of


transactions depends on the ordering of blocks. Unlike the traditional block chain, the ordering of blocks in the DAG is not straightforward since the DAG is essentially a partial order. Therefore, many approaches were developed to derive a stable and invulnerable total order of blocks from the DAG. A typical approach for the block ordering problem, such as PHANTOM proposed by Zohar et al. [29], is mainly composed of two steps. First, distinguish the honest blocks (blocks generated and propagated in a timely manner by miners who conform to the rules of block chain) from the suspect dishonest blocks (blocks generated by miners who deliberately keep their blocks secret for a long time or take other actions not in accordance with the rules of block chain). Second, derive a full topological ordering based on the temporal information embedded in the topological structure of the honest blocks. In this two-step approach, the quality of the final block ordering largely depends on the precision of the honest block identification, since the dishonest blocks might disturb the true temporal information. The more precise the honest block identification is, the better the block ordering. In this paper, we present a novel graph-theory-based framework, MaxCord, for identifying the honest blocks in Block DAG. Using this framework, several algorithms were developed and evaluated on simulated Block DAG data. Compared with the existing approach PHANTOM, the proposed algorithms are both effective and efficient, which can lay a solid foundation for the further development of Block DAG technology. The rest of the paper is organized as follows. In Section 2, we describe the honest block identification problem, propose a novel general framework, and illustrate its relationship with PHANTOM. Several algorithms for solving the problem are presented, including an iterative filtering algorithm MAXCORD-IFA. The algorithms are evaluated, analyzed and compared in Section 3 based on the simulated Block DAG datasets. Conclusions are drawn, and further research directions are given in the last section.

HONEST BLOCK IDENTIFICATION PROBLEM

Briefly speaking, Block DAG is a DAG of blocks, in which each block is constructed by one of the participating miners and linked to other blocks. A new block is connected to the existing DAG by referencing all the tip


blocks (blocks that are not yet referenced by any other block) of the DAG observed by the miner. If there is no network delay, the DAG will become a directed chain where the first block (genesis block) is the root. Generally, the blocks constitute a growing DAG. If all participating miners honestly obey the rules of block generation and connection, the relative temporal order of blocks can be straightforwardly inferred, with an acceptably small error that depends on the extent of the network delay. However, the blocks generated by potential attackers might significantly disturb the temporal order information embedded in the Block DAG. By analyzing the possible attack modes, we find that there are two key policies an attacker can adopt:
1. keep their blocks secret and publish them later, e.g. after a certain transaction is confirmed;
2. only reference specific blocks, e.g. the blocks created by themselves.
When generating a block and connecting it to the Block DAG, the attackers might adopt either or both of the above policies depending on their attacking strategy. Whatever policy the attackers use, the connection patterns between the blocks created by the attackers and the normal blocks are very different from the patterns within each group. Based on this intuition, we first define a novel discord measurement which estimates the likelihood that two blocks belong to different groups. Using the discord measurement, the honest block identification problem can be formulated as a maximum k-independent set problem.

Discord between blocks

Given a block DAG G = (V, E), each vertex in V represents a block and each directed edge in E represents a reference link. For a block A, the future set is the set of blocks that can reach A, denoted future(A, G). Similarly, the past set of block A is the set of blocks that can be reached from A, denoted past(A, G). Fig 1 shows an example of the future and past sets. Naturally, some blocks remain besides the block itself, its future and its past. These blocks form the complementary set of past(A, G), A and future(A, G), denoted anticone(A, G):
anticone(A, G) = G \ (A ∪ past(A, G) ∪ future(A, G)).
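The three sets above can be computed directly by graph reachability. The following minimal sketch (not the authors' code) uses networkx and assumes a DiGraph whose edges point from each block to the blocks it references (newer to older), so past(A) is the set of descendants of A and future(A) the set of ancestors.

```python
# Minimal sketch of the past / future / anticone sets defined above.
# Assumption: edges point from a block to the blocks it references (newer -> older).
import networkx as nx

def past(G: nx.DiGraph, a):
    """Blocks reachable from a, i.e. blocks that a directly or indirectly references."""
    return nx.descendants(G, a)

def future(G: nx.DiGraph, a):
    """Blocks that can reach a, i.e. blocks that directly or indirectly reference a."""
    return nx.ancestors(G, a)

def anticone(G: nx.DiGraph, a):
    """All remaining blocks: neither a itself, nor in past(a), nor in future(a)."""
    return set(G.nodes) - {a} - past(G, a) - future(G, a)
```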


Figure 1. Illustration of the past, future and anticone sets of block A.

The discord between blocks A and B is 6 (the length of the blue loop), and the discord between blocks A and D is 4 (the length of the red loop). An example of the anticone is also illustrated in Fig 1. Normally, the anticone set contains the blocks that are created during block A's propagation time. When blocks are created very frequently, there are more blocks in the anticone set of each block. We further define a novel measurement called discord to distinguish the blocks in the anticone set. For a block A, the discord between A and any block in past(A, G) or future(A, G) is defined to be zero. The discord between A and a block B in anticone(A, G) is defined as the length of the loop formed by A, B, their nearest common ancestor and their nearest common descendant. For the situation where there is no common descendant of the two blocks, a virtual block is added which references all tip blocks, so that the blocks A, B, their nearest common ancestor and the virtual block can form a loop. An example of discord is illustrated in Fig 1. Suppose dij denotes the length of the shortest directed path from block i to block j in the DAG (augmented with the virtual block when necessary).

The discord between blocks A and B can then be defined by the following formula:

discord(A, B) = 0, if B ∈ {A} ∪ past(A, G) ∪ future(A, G);
discord(A, B) = min{ dAC + dBC : C ∈ past(A, G) ∩ past(B, G) } + min{ dDA + dDB : D ∈ future(A, G) ∩ future(B, G) }, otherwise.     (1)
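Reusing the past and future helpers from the earlier sketch, Eq (1) can be evaluated as below. This is an illustrative sketch rather than the authors' implementation; the hypothetical "virtual" node realizes the tip-referencing virtual block described above.

```python
# Illustrative sketch of Eq (1), reusing past()/future() from the previous sketch.
import networkx as nx

def discord(G: nx.DiGraph, a, b):
    if b == a or b in past(G, a) or b in future(G, a):
        return 0
    H = G.copy()
    common_desc = future(G, a) & future(G, b)
    if not common_desc:
        # no common descendant: add a virtual block referencing every tip
        tips = [v for v in G.nodes if G.in_degree(v) == 0]
        H.add_edges_from(("virtual", t) for t in tips)
        common_desc = {"virtual"}
    common_anc = past(G, a) & past(G, b)
    up = min(nx.shortest_path_length(H, a, c) + nx.shortest_path_length(H, b, c)
             for c in common_anc)
    down = min(nx.shortest_path_length(H, d, a) + nx.shortest_path_length(H, d, b)
               for d in common_desc)
    return up + down
```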

MaxCord framework

The discord is designed to estimate the temporal inconsistency (ambiguity) between two blocks. If two blocks are in the anticone of each other, their temporal order cannot be determined. But the time discrepancy of the two blocks is bounded by their nearest common ancestor and nearest common descendant, since the real creation time of a block is bounded by its ancestors and descendants. Dishonest blocks always intend to hide or counterfeit their real creation time in order to carry out some attack such as double spending. Therefore, the discords between a dishonest block and most honest blocks should be very large; otherwise the real creation time of the dishonest block would be bounded by some honest blocks into a small interval. On the other hand, the discord of two honest blocks is normally much smaller. If the links between blocks are not artificially manipulated, the discord of two blocks should only depend on the network propagation speed and the block creation rate. When the network propagation speed or the block creation rate increases, the time discrepancy between the nearest common ancestor and the nearest common descendant will decline; however, the length of the shortest path between two blocks will increase and cancel out the decline of the time discrepancy to some extent. Therefore, the discords are not very sensitive to the network propagation speed and the block creation rate. The analysis on simulated Block DAG datasets shows that the discords between two honest blocks are mostly smaller than 10, while the discords between an honest block and a dishonest block might be two or more orders of magnitude higher. Fig 2 shows the relationship between the block creation rate and the maximum discord between honest blocks. For each block creation rate, 100 simulations were conducted. The statistical analysis is shown in Table 1. Even when the block creation rate reaches 600 blocks per second, the discord between honest blocks still does not increase much. Therefore, the discords can be utilized to filter out the suspect dishonest blocks. In this section, we give a novel framework named MaxCord for identifying the honest blocks based on the discord.


Figure 2. The relationship between block generation rate and maximum honest block discord.

Table 1. The statistical analysis of maximum honest block discords.

Block creation rate (blocks/s)   1/120   1/30   1/5    1      5      30     120    600
Mean                              4.08    7.68   8.50   8.09   8.34   8.78   10     9.07
Maximum                           5       9      10     9      9      10     10     10
Variance                          0.07    0.30   0.29   0.12   0.23   0.27   0.00   0.61

Given a block DAG, we first calculate the discords for every pair of blocks to obtain the discord matrix. The discord matrix is then converted into a binary matrix, in which an element is 1 if the corresponding element of the discord matrix is larger than a preset threshold d and 0 otherwise. By using the obtained binary matrix as an adjacency matrix, we can construct an undirected graph in which each vertex represents a block. This graph is called the d-discord graph of the given block DAG. Intuitively, if a block DAG only contains honest blocks, the degrees in the d-discord graph would be very small, since the discords between most honest blocks are zero and the remaining non-zero discords are also very small. Considering that the honest blocks are the majority, the honest block identification problem can be addressed by identifying the maximum subset of vertices with small degrees.
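As a minimal sketch (not the authors' code), the thresholding step can be written as follows, assuming the pairwise discords are stored in a NumPy matrix.

```python
# Minimal sketch of the d-discord graph construction described above.
import numpy as np
import networkx as nx

def d_discord_graph(discord_matrix: np.ndarray, d: float) -> nx.Graph:
    adjacency = (discord_matrix > d).astype(int)   # 1 iff the discord exceeds the threshold d
    np.fill_diagonal(adjacency, 0)                 # no self-loops
    return nx.from_numpy_array(adjacency)          # vertices are block indices 0..n-1
```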


Consider a graph G = (V, E). A k-independent set of G is a vertex subset V' whose induced subgraph has maximum degree at most k. The maximum k-independent set problem is to find a k-independent set of maximum size, which is a generalization of the classical maximum independent set problem. It can be formulated as the following integer program, in which xs indicates whether vertex s is selected and aij denotes the element of the adjacency matrix of the graph G:

maximize   Σs xs
subject to xi Σj aij xj ≤ k   for every vertex i,
           xs ∈ {0, 1}   for every vertex s.     (2)

Given the non-negative integer parameters k and d, the honest block identification problem can be formulated as a maximum k-independent set problem in the d-discord graph of the given block DAG, denoted MaxCord-(k,d). It is straightforward to verify that MaxCord-(0,0) is equivalent to the longest-chain rule adopted in traditional block chain systems such as Bitcoin. PHANTOM also falls into the MaxCord framework, as it in fact solves the model MaxCord-(k,0). PHANTOM's intuition is that the number of honest blocks in the anticone set of each honest block should be very small, e.g. less than k, since not many blocks are generated during the propagation of an honest block. However, the parameter k is closely related to the block creation rate and it is difficult to determine a proper value of k in practice. Notice also that the discord information is not fully utilized in the model MaxCord-(k,0). In this paper, we focus on the other special case, i.e. MaxCord-(0,d), which assumes that the discords between each pair of honest blocks should be very small, e.g. less than d. The parameter d in the model MaxCord-(0,d) plays a very important role as it determines the potential boundary between the honest blocks and the suspect dishonest blocks. A smaller threshold d implies a more stringent criterion for honest blocks. If the discords between a suspect dishonest block and all honest blocks are smaller than the threshold, the suspect dishonest block might be misidentified as an honest block; therefore, a larger threshold might lower the precision of honest block identification. On the other hand, if the discord between two honest blocks is larger than the threshold, one of these two blocks


would be misidentified as a suspect dishonest block; thus, a smaller threshold might decrease the recall. In short, a small d emphasizes the precision of honest block identification, while a large d emphasizes the recall. This property will be exploited to develop a heuristic algorithm in the next section.
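For very small instances, the definition behind Eq (2) can be checked directly by brute force. The following illustrative snippet (not part of the original paper) enumerates vertex subsets from largest to smallest and returns the first one whose induced subgraph has maximum degree at most k.

```python
# Illustrative brute-force solver for the maximum k-independent set problem (tiny graphs only).
from itertools import combinations
import networkx as nx

def max_k_independent_set(G: nx.Graph, k: int) -> set:
    nodes = list(G.nodes)
    for size in range(len(nodes), 0, -1):                 # try larger subsets first
        for subset in combinations(nodes, size):
            sub = G.subgraph(subset)
            if max(dict(sub.degree).values(), default=0) <= k:
                return set(subset)                        # first feasible subset is maximum
    return set()
```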

Algorithms for MaxCord-(0, d)

Many algorithms for the maximum independent set problem can be directly applied to solve the model MaxCord-(0,d), including approximation algorithms [30, 31] and exact algorithms for particular classes of graphs [32, 33]. It is well known that the maximum independent set problem is NP-hard, so a polynomial-time exact algorithm is unlikely to exist. Generally, the exact algorithm is fast if the graph is very sparse, and very slow for dense graphs. Besides the exact algorithm, a simple greedy algorithm was also implemented to solve the problem more efficiently, in which the vertex with the maximum degree is repeatedly removed as long as the remaining vertex set is not an independent set. The exact and greedy algorithms are named MAXCORD-EXACT and MAXCORD-GREEDY, respectively. In order to solve the honest block identification problem more efficiently, we further developed a new heuristic algorithm called MAXCORD-IFA. Notice that when the threshold d is large, the model MaxCord-(0, d) has high recall. A large d also leads to a sparse d-discord graph. By exploiting these two special characteristics of the MaxCord framework, MAXCORD-IFA solves the model MaxCord-(0,d) by iteratively filtering out the suspect dishonest blocks identified by a series of models MaxCord-(0,s) with s > d. Starting with a very large s, MAXCORD-IFA applies the exact algorithm for the maximum independent set problem to solve the model MaxCord-(0,s) and removes the vertices identified as suspect dishonest blocks. It then decreases the value of s and solves a new model MaxCord-(0, s) defined on the remaining blocks. The procedure is repeated until s converges to the predetermined d. The detail of the MAXCORD-IFA algorithm is described in Fig 3.
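A minimal sketch (not the authors' implementation) of the greedy strategy just described: repeatedly delete the highest-degree vertex of the d-discord graph until the remaining vertices form an independent set.

```python
# Minimal sketch of the MAXCORD-GREEDY idea on a d-discord graph.
import networkx as nx

def maxcord_greedy(discord_graph: nx.Graph) -> set:
    H = discord_graph.copy()
    while H.number_of_edges() > 0:                       # not yet an independent set
        worst = max(H.degree, key=lambda nd: nd[1])[0]   # vertex with the highest degree
        H.remove_node(worst)                             # discard the most conflicting block
    return set(H.nodes)                                  # remaining blocks are kept as honest
```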


Figure 3. The pseudocode of the MAXCORD-IFA algorithm.

There are two parameters in Algorithm 1. The parameter d is the final value of the threshold for constructing the d-discord graph. In each iteration, the algorithm takes only the top (1 − alpha) fraction of the discords into account. That is, we gradually decrease the value of the threshold until it reaches the desired value d. In this way, the suspect dishonest blocks with high probability are filtered out step by step. The parameter alpha determines the sparsity of the d-discord graph in each iteration. If it is set close to 1, the d-discord graph becomes very sparse, so the computation of each iteration is fast, but more iterations are required to finish the algorithm. On the other hand, if it is set too low, the computation of each iteration would be unacceptably slow, and the result would be similar to that of the single-iteration exact algorithm. Because MAXCORD-IFA converts the discord matrix into a binary matrix through multiple iterations while MAXCORD-EXACT does so in a single step, MAXCORD-EXACT is a special case of MAXCORD-IFA whenever the parameter d takes the same value.
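The following sketch (not the authors' implementation; Fig 3 gives the authoritative pseudocode) illustrates the iterative filtering idea under a simple threshold schedule: at each round the threshold s is the alpha-quantile of the remaining large discords, floored at d, and an exact maximum-independent-set routine keeps the surviving blocks.

```python
# Illustrative sketch of the MAXCORD-IFA idea (the threshold schedule is a simplifying assumption).
import numpy as np
import networkx as nx

def exact_mis(G: nx.Graph) -> set:
    """Exact maximum independent set via a maximum clique of the complement graph."""
    clique, _ = nx.max_weight_clique(nx.complement(G), weight=None)
    return set(clique)

def maxcord_ifa(discord_matrix: np.ndarray, d: float, alpha: float = 0.9) -> set:
    kept = list(range(discord_matrix.shape[0]))          # block indices still considered honest
    while True:
        sub = discord_matrix[np.ix_(kept, kept)]
        large = sub[sub > d]
        # current threshold: alpha-quantile of the remaining large discords, floored at d
        s = max(d, float(np.quantile(large, alpha))) if large.size else d
        honest = exact_mis(nx.from_numpy_array((sub > s).astype(int)))
        new_kept = [kept[i] for i in sorted(honest)]
        if s <= d:                                       # final round at the target threshold d
            return set(new_kept)
        if len(new_kept) == len(kept):                   # no block filtered: jump to threshold d
            honest = exact_mis(nx.from_numpy_array((sub > d).astype(int)))
            return {kept[i] for i in sorted(honest)}
        kept = new_kept
```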


RESULTS

In order to evaluate the proposed MaxCord framework, we applied several algorithms, including MAXCORD-EXACT, MAXCORD-GREEDY and MAXCORD-IFA, to simulated Block DAG datasets and compared them with the existing method PHANTOM. Although some real Block DAG systems already exist, it is hard to obtain real data. We simulated the Block DAG datasets using a probabilistic model of block creation and propagation on the P2P network. The computation power is assumed to be equally distributed, i.e. the blocks are created by the participating miners with equal probability. Two types of interval between two consecutively generated blocks are considered: fixed intervals and random intervals (e.g. exponentially distributed). The block transfer time on the P2P network is simulated using a gamma distribution. The parameters are tuned so that the throughput of a block chain with the longest-chain rule approximates the real world. The simulation algorithm is implemented in the R package BlockSim, which is available at http://github.com/wulingyun/BlockSim. The following simulation experiments were all run in R 3.5.1.
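For readers who do not use the BlockSim R package, the generative process described above can be sketched in a few lines. This is a simplified illustration, not the package itself; the delay_shape and delay_scale parameters of the gamma propagation delay are placeholders rather than the values used in the paper.

```python
# Simplified sketch of the simulation model described above (not the BlockSim package).
import numpy as np
import networkx as nx

def simulate_block_dag(n_blocks=200, rate=10.0, delay_shape=2.0, delay_scale=0.5, seed=0):
    """Edges point from each new block to the tip blocks its miner can see."""
    rng = np.random.default_rng(seed)
    G = nx.DiGraph()
    G.add_node(0)                                        # genesis block
    visible = {0: 0.0}                                   # time at which a block is seen network-wide
    t = 0.0
    for b in range(1, n_blocks):
        t += rng.exponential(1.0 / rate)                 # exponential inter-block interval
        seen = {v for v in G.nodes if visible[v] <= t}   # blocks already propagated to the miner
        referenced = {v for u in seen for v in G.successors(u)}
        tips = seen - referenced                         # visible blocks nobody references yet
        G.add_node(b)
        G.add_edges_from((b, tip) for tip in tips)
        visible[b] = t + rng.gamma(delay_shape, delay_scale)   # gamma propagation delay
    return G
```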

Parameter Determination

The parameter d of the MaxCord-(0, d) model is set to be identical for the three algorithms MAXCORD-EXACT, MAXCORD-GREEDY and MAXCORD-IFA. According to the analysis in the previous section, we take 8 as the default value of the parameter d to identify the honest blocks in this study. There is an additional parameter alpha in the algorithm MAXCORD-IFA. The parameter alpha is chosen not only to make sure the d-discord graph constructed in each iteration is sparse enough to be solved easily by the exact algorithm for the maximum independent set problem, but also to strike a balance between the number of iterations and the time taken by each iteration. The influence of the parameter alpha on running time, precision and recall under different attack powers is shown in Fig 4. The results of MAXCORD-EXACT are also shown for comparison. Two attack modes are simulated: moderate attack (the attackers have 33% of all computation power) and heavy attack (the attackers have 49% of all computation power). In each attack mode, the algorithms are applied to 100 Block DAG datasets. Each dataset contains 1000 blocks and the block creation rate is 10 blocks per second. The left part A of Fig 4 shows the results under the moderate attack case, while the right part B shows the results under the heavy attack case. The horizontal axis represents the parameter alpha, which ranges from 0.7 to 0.95.


Notice that when the parameter alpha takes the value 0.7, the results of MAXCORD-EXACT and MAXCORD-IFA are the same, because when alpha is small enough, MAXCORD-IFA takes only one iteration and degenerates to MAXCORD-EXACT. In terms of running time, as alpha increases (and with it the number of iterations), the total running time decreases, although the decrements become smaller and smaller. Considering precision and recall, 0.9 is the best choice for alpha regardless of the attack power. In the following analysis of this study, we set the default value of alpha to 0.9.

Figure 4. The influence of the parameter alpha on MAXCORD-IFA.

Part A represents the case with moderate attack power, while part B represents the case with heavy attack power.

Comparison between algorithms for MaxCord-(0,d)

We first evaluated the algorithms proposed for the MaxCord-(0,d) model, including MAXCORD-EXACT, MAXCORD-GREEDY and MAXCORD-IFA, and compared them with each other. The Block DAG is simulated with 1000 blocks and the block creation rate is 10 blocks per second. For each circumstance, we simulate 100 Block DAG networks, and all algorithms are applied to the same block DAG each time. The results are shown in Fig 5.


Figure 5. The comparisons among the algorithms for MaxCord-(0,d).

The left part A represents the case with moderate attack power, while the right part B represents the case with heavy attack power. The horizontal axis represents the parameter d. As the parameter d increases, the precision of the three algorithms decreases step by step, while their recall shows no big difference and is close to 1 when the attack power is moderate. The results of MAXCORD-GREEDY and MAXCORD-EXACT are almost identical in this case. MAXCORD-EXACT is slightly better than MAXCORD-IFA in terms of recall, and they are nearly the same in terms of precision. Stated another way, when the attack power is moderate, MAXCORD-IFA greatly improves the running time by omitting a few honest blocks while maintaining precision, and this small omission does not have a large influence on the subsequent block ordering problem. When the attack power is heavy, the parameter d has a larger influence on the results of MAXCORD-EXACT than on those of MAXCORD-IFA; that is, MAXCORD-IFA is more robust. MAXCORD-IFA outperforms MAXCORD-EXACT in terms of both precision and recall, and MAXCORD-GREEDY is the worst in this case. In short, MAXCORD-IFA is both effective and efficient in distinguishing the honest blocks from the suspect dishonest blocks.


Comparison between MAXCORD-IFA and PHANTOM

Because PHANTOM is a recursive algorithm which is very time-consuming, it is impracticable to apply PHANTOM in large-scale cases. Therefore, we only simulated small-scale cases to compare MAXCORD-IFA with PHANTOM. In detail, we simulated Block DAGs with 200 blocks and a block creation rate of 1/10 blocks per second. The attackers' computation power percentage ranges from 0.1 to 0.45. For each attack circumstance, we conducted 10 simulations, and both MAXCORD-IFA and PHANTOM are applied to the same 10 Block DAG datasets. The parameter k in PHANTOM takes the default value 3. Fig 6 shows the precision and recall indicators of the two algorithms. It can be seen that MAXCORD-IFA recognizes the honest blocks well no matter how much computation power the attackers own. The recall of MAXCORD-IFA is not always 1 (though very close to 1), which means it may omit a few honest blocks; this small sacrifice is worthwhile since it saves plenty of time. When the attack power is very strong, MAXCORD-IFA obviously outperforms PHANTOM. In short, MAXCORD-IFA is an effective and efficient algorithm for the honest block identification problem.

Figure 6. The comparison between MAXCORD-IFA and PHANTOM.

CONCLUSIONS AND DISCUSSIONS

In this paper, we introduce the honest block identification problem in Block DAG technology and present a novel universal framework for it by converting it into the maximum k-independent


set problem, on the basis of the definition of the discord measurement between blocks. We point out that the existing method PHANTOM is one of its special cases, and give several algorithms for the other special case MaxCord-(0,d), named MAXCORD-EXACT, MAXCORD-GREEDY, and MAXCORD-IFA. Comparisons are made among these algorithms on simulated Block DAG datasets. MAXCORD-IFA outperforms PHANTOM by a large margin, especially when the attack power is heavy. PHANTOM takes a very long time and therefore can only be applied to small-scale cases. The results of PHANTOM are also sensitive to the value of the parameter k, which is difficult to determine for a given dataset. In large-scale cases, MAXCORD-IFA outperforms MAXCORD-EXACT and MAXCORD-GREEDY, except that the recall of MAXCORD-IFA is slightly lower than that of the other two algorithms when the attack power is moderate. But this small sacrifice, omitting only a few honest blocks while maintaining the precision of identification, saves a lot of time and does not have a significant influence on the subsequent block ordering problem. This study is the first work on the MaxCord framework, and many problems in this direction deserve future research. For example, MaxCord-(0,d) and MaxCord-(k,0) are only two special cases of the MaxCord framework for identifying the honest blocks. As our study suggests that MaxCord-(0,d) is better than MaxCord-(k,0), it is interesting to investigate the performance of the general model MaxCord-(k,d) for cases where d and k are both non-zero. Development of better algorithms for MaxCord models is also a very important and challenging task. The influence of different algorithms on the subsequent block ordering problem is still not very clear; the block orderings derived from the honest block sets identified by different algorithms might be very different, even if their honest block identification results are very similar. There are also other mathematical problems in Block DAG technology that are closely related to honest block identification, such as the transaction fee allocation. In this paper, we consider the Block DAG in which a new block can be connected to all the tip blocks observed by the node when issuing the new block. All blocks are kept in the system and can be referenced directly or indirectly by new blocks in the future. Block DAG attempts to identify the dishonest blocks by an algorithm independent of the construction of the DAG. There also exist different approaches for DAG-based block chain technology. For example, IOTA Tangle attempts to distinguish the dishonest


transactions from normal transactions by a tip selection algorithm based on MCMC (Markov Chain Monte Carlo) random walks and cumulative weights, and the dishonest transactions may fall into oblivion. It is interesting and important to study the pros and cons of the two different approaches as well as their possible combination.

FUNDING STATEMENT

LYW was supported by the Laboratory of Big Data and Block chain, National Center for Mathematics and Interdisciplinary Sciences, Chinese Academy of Sciences. GG received salary from Beijing Taiyiyun Technology Co., Ltd. The funders did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the 'author contributions' section.


REFERENCES

1. Nakamoto S. Bitcoin: A peer-to-peer electronic cash system. 2008. Available from: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.221.9986&rep=rep1&type=pdf
2. Iansiti M, Lakhani KR. The truth about block chain. Harv Bus Rev. 2017; 95:118–127.
3. Crosby M, Nachiappan, Pattanayak P, Verma S, Kalyanaraman V. Block chain technology: Beyond bitcoin. Applied Innovation. 2016; 2(6–10):71.
4. Bahga A, Madisetti VK. Block chain platform for industrial internet of things. Journal of Software Engineering and Applications. 2016; 9(10):533.
5. Huh S, Cho S, Kim S. Managing IoT devices using block chain platform. 19th International Conference on Advanced Communication Technology (ICACT), IEEE. 2017:464–467.
6. Zhang Y, Wen J. The IoT electric business model: Using block chain technology for the internet of things. Peer Peer Netw Appl. 2017; 10(4):983–994.
7. Reyna A, Martín C, Chen J, Soler M, Diaz M. On block chain and its integration with IoT: Challenges and opportunities. Future Gener Comput Syst. 2018; 88:173–190.
8. Mettler M. Block chain technology in healthcare: The revolution starts here. 18th International Conference on e-Health Networking, Applications and Services (Healthcom), IEEE. 2016:1–3.
9. Treleaven P, Brown RG, Yang D. Block chain technology in finance. Computer. 2017; 50(9):14–17.
10. Tapscott A, Tapscott D. How block chain is changing finance. Harv Bus Rev. 2017; 1(9):2–5.
11. Kristoufek L. What are the main drivers of the Bitcoin price? Evidence from wavelet coherence analysis. PLoS ONE. 2015; 10(4):e0123923. 10.1371/journal.pone.0123923
12. Kim YB, Lee J, Park N, Choo J, Kim J, Kim CH. When Bitcoin encounters information in an online forum: Using text mining to analyse user opinions and predict value fluctuation. PLoS ONE. 2017; 12(5):e0177630. 10.1371/journal.pone.0177630
13. Apte S, Petrovsky N. Will block chain technology revolutionize excipient supply chain management? Journal of Excipients and Food Chemicals. 2016; 7(3):910.
14. Swan M. Block chain: Blueprint for a new economy. O'Reilly Media, Inc; 2015.
15. Yli-Huumo J, Ko D, Choi S, Park S, Smolander K. Where is current research on block chain technology?—a systematic review. PLoS ONE. 2016; 11(10):e0163477. 10.1371/journal.pone.0163477
16. Zheng Z, Xie S, Dai HN, Chen X, Wang H. Block chain challenges and opportunities: A survey. International Journal of Web and Grid Services. 2018; 14(4):352–375.
17. Li X, Jiang P, Chen T, Luo X, Wen Q. A survey on the security of block chain systems. Future Gener Comput Syst. 2017.
18. Feng Q, He D, Zeadally S, Khan MK, Kumar N. A survey on privacy protection in block chain system. Journal of Network and Computer Applications. 2019; 126:45–58.
19. Juhász PL, Stéger J, Kondor D, Vattay G. A Bayesian approach to identify Bitcoin users. PLoS ONE. 2018; 13(12):e0207000. 10.1371/journal.pone.0207000
20. Nguyen GT, Kim K. A survey about consensus algorithms used in block chain. Journal of Information Processing Systems. 2018; 14(1).
21. Lin IC, Liao TC. A survey of block chain security issues and challenges. IJ Network Security. 2017; 19(5):653–659.
22. Decker C, Wattenhofer R. Information propagation in the bitcoin network. IEEE Thirteenth International Conference on P2P Computing. 2013:1–10.
23. Biais B, Bisiere C, Bouvard M, Casamatta C. The block chain folk theorem. The Review of Financial Studies. 2019; 32(5):1662–1715.
24. Sompolinsky Y, Zohar A. Secure high-rate transaction processing in bitcoin. International Conference on Financial Cryptography and Data Security, Springer, Berlin, Heidelberg; 2015:507–527.
25. Lewenberg Y, Sompolinsky Y, Zohar A. Inclusive block chain protocols. International Conference on Financial Cryptography and Data Security, Springer, Berlin, Heidelberg; 2015:528–547.
26. Popov S. The tangle. 2016. Available from: https://www.iotawiki.com/download/iota_whitepaper.pdf
27. Pervez H, Muneeb M, Irfan MU, Haq IU. A comparative analysis of DAG-based block chain architectures. 12th International Conference on Open Source Systems and Technologies, IEEE. 2018:27–34.
28. Bai C. State-of-the-art and future trends on block chain based on DAG structure. International Workshop on Structured Object-Oriented Formal Language and Method, Springer, Cham; 2018:183–196.
29. Sompolinsky Y, Zohar A. PHANTOM: A scalable Block DAG protocol. IACR Cryptology ePrint Archive, Report 2018/104.
30. Tarjan RE, Trojanowski AE. Finding a maximum independent set. SIAM J Comput. 1977; 6(3):537–546.
31. Feo TA, Resende MGC, Smith SH. A greedy randomized adaptive search procedure for maximum independent set. Operations Research. 1994; 42(5):860–878.
32. Xiao M, Nagamochi H. An exact algorithm for maximum independent set in degree-5 graphs. Discrete Applied Mathematics. 2016; 199:137–155.
33. Tsukiyama S, Ide M, Ariyoshi H, Shirakawa I. A new algorithm for generating all the maximal independent sets. SIAM J Comput. 1977; 6(3):505–517.

Chapter 15: Enabling Controlling Complex Networks with Local Topological Information

Guoqi Li1,4, Lei Deng1,5, Gaoxi Xiao2, Pei Tang1,4, Changyun Wen2, Wuhua Hu2, Jing Pei1,4, Luping Shi1,4 and H. Eugene Stanley3

1 Center for Brain Inspired Computing Research, Department of Precision Instrument, Tsinghua University, Beijing, P. R. China.

2 School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore.

3 Center for Polymer Studies, Department of Physics, Boston University, Boston, USA.

4 Beijing Innovation Center for Future Chip, Tsinghua University, Beijing, P. R. China.

5 Present address: Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA, USA.

Guoqi Li, Lei Deng, Gaoxi Xiao and Pei Tang contributed equally to this work.

Citation: (APA): Li, G., Deng, L., Xiao, G., Tang, P., Wen, C., Hu, W., ... & Stanley, H. E. (2018). Enabling controlling complex networks with local topological information. Scientific reports, 8(1), 1-10. (11 pages) https://doi.org/10.1038/s41598-018-22655-5 URL: https://www.nature.com/articles/s41598-018-22655-5 Copyright: Open Access. This article is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.


ABSTRACT

Complex networks characterize the nature of internal/external interactions in real-world systems including social, economic, biological, ecological, and technological networks. Two issues remain obstacles to achieving full control of large-scale networks: structural controllability, which describes the ability to guide a dynamical system from any initial state to any desired final state in finite time with a suitable choice of inputs; and optimal control, which is a typical control approach to minimize the cost for driving the network to a predefined state with a given number of control inputs. For large complex networks without global information of the network topology, both problems remain essentially open. Here we combine graph theory and control theory to tackle the two problems in one go, using only local network topology information. For the structural controllability problem, a distributed local-game matching method is proposed, where every node plays a simple Bayesian game with local information and local interactions with adjacent nodes, ensuring a suboptimal solution at linear complexity. Starting from any structural controllability solution, a minimizing longest control path method can efficiently reach a good solution for optimal control in large networks. Our results provide solutions for distributed complex network control and demonstrate a way to link structural controllability and optimal control together.

INTRODUCTION

Over the past decade, the complex natural and technological systems that permeate many aspects of everyday life—including human brain intelligence, medical science, social science, biology, and economics—have been widely studied1,2,3. Many of these complex systems can be modeled as static or dynamic networks, which has stimulated the emergence and rapid development of research on complex networks. There are two fundamental issues associated with the control of complex networks, focusing respectively on (i) whether the networks are controllable and (ii) how to control them with the least cost when they are controllable. The first issue is typically investigated by studying the structural controllability problem, which concerns the ability to guide a dynamical system from any initial state to any desired final state in finite time. The second issue is known as the optimal cost control problem, with the main objective of minimizing the cost for driving the network to a predefined state with a given number of control inputs. Figure 1 illustrates the structural controllability problem


and the optimal cost control problem. Note that for large complex networks without global information of network topology, both problems remain essentially open. In this work, we shall combine graph theory and control theory for tackling the two problems in one go, using only local network topology information.

Figure 1. Two basic problems in controlling complex networks. (a) An essential issue interconnecting graph theory and control theory: how to provide a link from “structural controllability” to “optimal cost control”. (b) Illustration of the local topological information available to a node in a network. For a node x0 with neighbor nodes x1, x2, x3 and x4, it is assumed that the node x0 can only observe the numbers of incoming and outgoing links connected to each of the nodes x1, x2, x3 and x4. (c) Optimal cost control aims to determine a solution of driving a system state to any predefined state with minimum cost.

Researchers are using a multidisciplinary approach to study the structural controllability of complex networks, focusing on linear time invariant (LTI)4 systems dx(t)/dt = Ax(t) + Bu(t), where x(t) = [x1(t), …, xN(t)]T is the state vector of N nodes at time t with an initial state x(0), u(t) = [u1(t), …, uM(t)]T is the time-dependent external control input vector, and M (M ≤ N) is the number of inputs, in which the same input ui(t) can drive multiple nodes. The matrix A = [aij]N×N is the weighted adjacency matrix of the network, i.e., aij ≠ 0 if there is a link connecting node i to node j and aij = 0 otherwise, and B = [bim]N×M is the input matrix, where bim is nonzero when controller m is connected to node i and zero otherwise. The nodes have different physical meanings in different scenarios. In the Traveling Salesman Problem (TSP) a node is a city or location, in a social network it is a person or group, and in an organism it could be an interacting protein. Even when networks have similar properties, a node can have a variety of interpretations in different applications. For example, in a recent work studying the structural controllability of brain networks5, researchers show that the neural activity process can be approximated using linearized generalizations of nonlinear models of cortical circuit activities. In their proposed LTI system, aij is the number of streamlines


connecting brain region i to region j. Using an intricately detailed model of a very small region would reveal whether aij describes a connection between neuron i and neuron j. Note that in this report, "controllability" always refers to "structural controllability"; hence hereafter we shall use these two terms interchangeably for convenience of discussion. Although there is recent literature studying nonlinear dynamics of complex networks6,7, this paper follows the mainstream work focusing on LTI systems, mainly for two reasons: (i) a lot of real-world systems can be approximated by LTI systems; the optimal control of LTI dynamics on complex networks thus forms a basis for the control and optimal control of complex systems; (ii) even for LTI dynamics on complex networks, no existing literature has considered their control and optimal cost control with only local topology information. We focus on the fundamental issues of control and optimal control of complex systems, for the first time to the best of our knowledge demonstrating a way to link "structural controllability" and "optimal cost control" of LTI systems together. The results shall find wide applications and great potential for further extensions in control and optimal control of complex systems. Meanwhile, there is still a long way to go in our future research to develop a general methodology for the control of nonlinear dynamics on complex networks. Our work is mainly composed of two parts. In the first part, we study the controllability problem. We show that with properly designed local operations based strictly on only local network topology information, a controllability solution can be found which is nearly as good as the optimal solution calculated using global network topology information. In the second part, we first propose a relatively sophisticated optimal cost control algorithm which works effectively for small or medium-sized networks. Such an algorithm has its applications in those complex systems that are not so big, and provides a benchmark for heuristic algorithm design as well. Then we propose a simple algorithm that works efficiently for large and extra-large networks. To the best of our knowledge, this is the first time that an efficient algorithm is proposed for the optimal control of large-scale complex networks. Some brief discussions on each of the three proposed algorithms are presented below, while further technical details and mathematical work can be found in the Supplementary Information (hereafter termed SI). Maximum matching (MM) is a concept in graph theory that has been used to address the structural controllability problem, a classic concept in control theory8,9,10. Generally speaking, MM is to find the largest set of


the edges that do not share start or end nodes. A node is said to be matched if a link in the maximum matching points at it; otherwise it is unmatched. By assuming that the topological information of a network is fully known and by employing MM, the matched and unmatched nodes and edges form elementary stems and elementary circles8. Here an elementary stem is a directed network component consisting of n nodes 1, …, n connected by a sequence of n − 1 directed edges {1 → 2, 2 → 3, …, n − 1 → n}, and an elementary stem becomes an elementary circle when an additional edge n → 1 is added. Note that only the starting node of an elementary stem is an unmatched node. Network controllability can be achieved when the unmatched nodes are taken as driver nodes, each of which is connected to an independent external input. A driver node can control one of its immediate neighbors, and the control influence propagates along the stem; thus all nodes on the stem can be fully controlled. On the other hand, all the matched nodes in the elementary circles do not need to be connected to extra external inputs. These nodes can be fully controlled by connecting one of the nodes in the circle to an existing external input8. This approach indicates which node sets are connected to a minimum number of external inputs, and subsequently reveals the input matrix B. Existing schemes have focused on these problems11,12,13, and they have wide significance in many real-world network applications11,14. We propose a local-game matching (LM) algorithm to explore the structural controllability of large-scale real-world networks when the global topological information of matrix A is absent and only local topological information is available [Fig. 1(b)]. The main idea is to form elementary stems and elementary circles based on matching requests between adjacent nodes, using only local network topology information. We show that LM is equivalent to a static game with incomplete information as in static Bayesian game theory15, a configuration common in economic or social networks, and that the LM algorithm achieves a Nash equilibrium in the game (Theorems 3–4 in SI). We show that LM consistently approximates the global optimal solution found using MM, with linear time complexity O(N) (SI, Theorem 5). Its satisfactory performance is demonstrated in various synthetic and real-world networks (SI, Section 2.4). For the optimal control problem, we propose an orthonormal-constraint-based projected gradient method (OPGM) (SI, Section 3.2) and an implicit linear quadratic regulator (ILQR) to design an optimal controller for linear systems when the input matrix B is a matrix variable to be determined [Fig. 2(b)]. We find that, in the solutions, nodes connected to external inputs


tend to divide the network into control paths of the same length because the control cost is strongly dependent on the length of the longest path. This finding inspires us to construct a fast and efficient minimizing longest control path (MLCP) algorithm without using global topology information.

Figure 2. The contribution of this work in demonstrating a "link" (red solid arrows) from "structural controllability" to "optimal cost control". In (a), the question mark means that currently there is no existing work considering such a problem. Our work is based on two facts: (i) To ensure "structural controllability", we propose LM, which is proven to steadily approximate the global optimal solution found by MM with linear time complexity (SI, Section 2.2); (ii) To achieve "optimal cost control", we introduce ILQR in (b) to design an optimal controller for uncertain LTI systems when the input matrix B becomes selectable, by employing OPGM (SI, Section 3.2). We uncover that nodes which should be connected to external inputs tend to divide elementary topologies (stem, circle and dilation) evenly in order to achieve a lower cost, since the control cost mainly depends on the length of the longest control path, which inspires (red dashed arrows) the design of the MLCP algorithm without using global topology. By combining LM and MLCP together (red solid arrows), we are able to obtain an optimal control of large-scale complex networks using only local topological information.

As we will see later, the MLCP algorithm can efficiently work out a good solution for the optimal control problem based on the results of the LM algorithm for structural controllability. Combining LM and MLCP thus demonstrates a link (red solid line) between "structural controllability" and "optimal cost control" [Fig. 1(a)]. This allows us to control large-scale complex networks using only local topological information [Fig. 2(a)].
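As a concrete illustration of the maximum-matching route to driver nodes described above (a sketch, not the authors' code), the standard bipartite construction and networkx's Hopcroft–Karp matching can be used; nodes whose "in" copy is unmatched are the driver nodes.

```python
# Sketch of driver-node identification via maximum matching on the bipartite representation.
import networkx as nx
from networkx.algorithms import bipartite

def driver_nodes_via_mm(D: nx.DiGraph) -> set:
    B = nx.Graph()
    out_side = {("out", v) for v in D.nodes}
    B.add_nodes_from(out_side, bipartite=0)
    B.add_nodes_from((("in", v) for v in D.nodes), bipartite=1)
    B.add_edges_from((("out", u), ("in", v)) for u, v in D.edges)
    matching = bipartite.hopcroft_karp_matching(B, top_nodes=out_side)
    matched = {v for (side, v) in matching if side == "in"}     # nodes pointed at by a matched link
    drivers = set(D.nodes) - matched                            # unmatched nodes are driver nodes
    return drivers if drivers else {next(iter(D.nodes))}        # at least one input is always needed
```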


MINIMIZING THE NUMBER OF DRIVER NODES THROUGH LOCAL-GAME MATCHING

Prior research has focused on structural controllability and has not addressed method efficiency. For example, when using MM the value of ND is exact, but it is expensive to calculate in large networks, with complexity O(√N·L). It is also extremely difficult, if not impossible, to apply MM to large real-world complex networks where global network topological information is seldom available. Even when this information is available, it is generally very difficult to control all the nodes by simply implementing MM16, as the communications between the central controller and so many nodes can be prohibitively expensive. We address this issue by proposing an iterative local-game matching (LM) method. Figure 1(b) shows how we assume that each node requests only its local topology information, i.e., the input-output degrees of its immediate neighbors. We also assume that each node can initiate an action without global coordination. In a directed network, when there is a directional link from node xi to node xj, we designate xi the "parent" and xj the "child". In implementing LM, xi → xj → xk is a matching sequence with two parent-child matches, one between nodes xi and xj and the other between nodes xj and xk. When a sequence of parent-child matches forms a path, we designate it a directed control path (DCP) when it begins at an inaccessible node, and a circled control path (CCP) when it is configured end-to-end. Thus DCPs and CCPs correspond to the elementary stems and circles in the maximum matching. To guarantee network controllability, the inaccessible nodes are connected to external inputs. To avoid confusion, all inaccessible nodes found using MM and LM are called driver nodes, and their numbers are denoted as ND and NDLM, respectively. By directly controlling these driver nodes we can steer all the nodes along the control paths. We determine the minimum driver node set in a network by locating the parent-child matches for all the nodes that form directed control paths (DCPs) and circled control paths (CCPs) and by minimizing the number of DCPs. In the LM method, each node uses local information to request one neighbor to become its parent and another to become its child. When there is a match of requests (e.g., when node xi requests node xj to become its parent node and node xj requests node xi to become its child node), a parent-child match is achieved and is fixed. The parent node then removes all of its other outgoing links, and in the iterations that follow no other node can send it a parent request. At the same time the child node removes all of its other incoming links, and in the iterations that follow no other node can


send it a child request. Note that a node may send a child or parent request to itself when it has a self-loop connection. The iterative request-matching operations continue until no more child or parent requests can be sent. After implementing LM, the sequences of parent-child matches not forming a closed loop form DCPs, each beginning at an inaccessible node without a matched parent and ending at a node without a matched child. A closed loop (a "circle") of parent-child matches becomes a CCP. A DCP requires an independent outside controller, but a CCP does not and can be controlled by connecting a node on the circle to any existing external control input of a directed control path. Thus the number of independent external control inputs equals the number of DCPs found using LM. When a node is seeking a match, we define the current number of its unmatched child (parent) nodes, i.e., the nodes that have not yet achieved a match with a parent (child), as its u-output (u-input) degree. To increase the probability that a match of requests will take place, we have each node send a child (parent) request to the unmatched neighbor child (parent) node with the lowest u-input (u-output) degree. Figure 3(a1) and (a2) show a simple example. Because we assume that nodes with lower u-input (u-output) degrees will on average receive fewer child (parent) requests, we expect that this technique will increase the probability of achieving a match and thus lower the probability that a node will become a driver node with no match. A simple example in Figure 3(a3) and (a4) shows that by using this simple strategy LM gives the same result as MM.

Figure 3. Controlling networks with local topological information by LM and MLCP. (a) Illustrations of the LM algorithm. In child locating (a1), node x0 has two child nodes x1 and x2. As node x1 has only one parent node while node x2 has two parent nodes, node x0 sends a child request to x1. In parent locating (a2), node x0 has two parent nodes x1 and x2. As node x1 has two child nodes while node x2 has three child nodes, node x0 sends a parent request to x1. A simple example of LM is shown in (a3), where node 2 and node 3 receive a parent request and a child request from each other. These two nodes match with each


other. Thus, LM gives the same result as MM in this example. (b) Controlling a network with LM and MLCP. First, obtain the control paths (DCPs and CCPs) by LM. To control the network in (b1), node 2 and node 7 (red in (b2)) form the driver node set, which should be connected to the external control inputs. Note that the driver node set for a network may not be unique. If we add one more external input to the network, as seen in (b3), the new input will be added at node 6 by applying minimizing longest control path (MLCP). More detailed examples can be seen in SI, Section 4.

When there is a tie, i.e., when a node has multiple unmatched child (parent) nodes with the same minimum u-input (u-output) degree, the node can either do nothing with a waiting probability ω, or it can break the tie randomly at this iteration step. Our experiments show that introducing the waiting probability ω into the LM method improves its performance in certain cases (SI, Section 2.6). A detailed description of the LM algorithm (the code is available at https://github.com/PinkTwoP/Local-Game-Matching) and a few examples showing its step-by-step execution can be found in Section 2.1 of SI. In LM, each node tends to maximize its own chance of being matched, collecting and using only local topological information to accomplish matches quickly as far as possible, thus allowing LM to be used in large-scale complex networks. It is shown that the LM algorithm is equivalent to a static Bayesian game with incomplete information (SI, Theorem 4). Because this configuration is common in real-world complex economic and social networks, LM also helps us understand them. We test the LM method on synthetic and real-life networks. The synthetic networks include the ER model17, the BA network18,19, and networks generated using Chi-squared, Weibull and Gamma distributions, respectively. Topology information about all the real-life networks we have tested is available from open sources (see the reference citations in Table S3). Figure 4(a) shows the percentage of driver nodes identified by the LM and MM methods in the synthetic networks with different average nodal degrees. The results for real-life networks are summarized in Table S3 of SI. It is observed that the number of driver nodes identified by the LM method is consistently close to or equal to the optimal solution identified by the MM method in both synthetic and real-life networks.
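The request-matching idea can be sketched as follows. This is a simplified illustration rather than the authors' released code (see the GitHub link above): ties are broken deterministically instead of with the waiting probability ω, and all requests in a round are processed synchronously.

```python
# Simplified sketch of the LM request-matching rounds described above.
import networkx as nx

def local_game_matching(D: nx.DiGraph):
    matched_child, matched_parent = {}, {}      # parent -> its matched child, child -> its matched parent
    while True:
        child_req, parent_req = {}, {}
        for v in D.nodes:
            if v not in matched_child:          # v still needs a child
                cands = [u for u in D.successors(v) if u not in matched_parent]
                if cands:                       # ask the candidate child with the lowest u-input degree
                    child_req[v] = min(cands, key=lambda u: sum(1 for p in D.predecessors(u)
                                                                if p not in matched_child))
            if v not in matched_parent:         # v still needs a parent
                cands = [u for u in D.predecessors(v) if u not in matched_child]
                if cands:                       # ask the candidate parent with the lowest u-output degree
                    parent_req[v] = min(cands, key=lambda u: sum(1 for c in D.successors(u)
                                                                 if c not in matched_parent))
        new = [(p, c) for p, c in child_req.items() if parent_req.get(c) == p]
        if not new:
            break
        for p, c in new:                        # reciprocal requests become fixed parent-child matches
            matched_child[p], matched_parent[c] = c, p
    drivers = {v for v in D.nodes if v not in matched_parent}   # nodes without a matched parent
    return drivers, matched_child
```

Nodes without a matched parent are the inaccessible starting points of DCPs and hence the driver nodes; nodes on CCPs all have matched parents and therefore require no extra input, consistent with the description above.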


Figure 4. LM and MM in synthetic networks: (a) Number of driver nodes with respect to the mean degree μ for ER, BA and other networks for LM and MM (N = 10000). (b) Input nodal degree distributions of the driver nodes found by LM and MM, respectively (N = 10000 and μ = 6). (c) Output nodal degree distributions of the driver nodes found by LM and MM, respectively.

Although the numbers of driver nodes found by MM and LM in different networks are about the same, an immediate question that arises is whether the driver nodes identified by the two methods have similar statistical properties. Figure 4(a–c) shows that the two methods produce approximately the same number of driver nodes and that they also identify nodes with approximately the same input and output degree distributions. In addition, it is easy to see that the driver nodes generally avoid the hubs, which is consistent with the results in8. Thus, we conclude that the two methods either find approximately the same set of nodes, or find two sets of nodes with approximately the same statistical properties. The near-optimality of the LM method in these synthetic networks is thus verified. The SI further supplies formal proofs showing that (i) the network's structural controllability is guaranteed by LM (SI, Theorem 1); (ii) LM minimizes the probability that augmented paths will be formed based on local topological information and thus reduces the number of required external control inputs (SI, Theorems 2–3); and (iii) the Nash equilibrium of the Bayesian game can be achieved by LM (SI, Theorem 4). This theoretically explains why the solution of the LM method approximates the global optimal solution of the MM method in a number of synthetic and real-world networks (SI, Tables S2 and S3). We also prove that the time complexity of the LM algorithm is linear, O(N) (SI, Theorem 5), which is much lower than the O(√N L) complexity of MM (with L the number of links)20,21 and comparable to the state-of-the-art approximation algorithms in graph theory22,23. The difference between LM and prior algorithms is that it uses much less (purely local) topological information when approximating maximum matching (SI, Section 1.3).

MINIMIZATION OF THE COST CONTROL

Implicit Linear Quadratic Regulator (ILQR)

Although the controllability of complex networks is an important concern, minimizing the cost of control is even more important. Simply knowing the number of driver nodes does not tell us how to design an optimal controller for a given particular control objective. Figure 1(a) shows how control theory and graph theory can be used to determine the optimal cost control, a critical problem in complex network control. The objective is to find an optimal or suboptimal input matrix B* with a fixed dimension M without access to global topological information about the adjacency matrix A. Although there have been some recent studies on the relationship between network controllability and control cost24,25,26, this is an issue that traditional control theory has not considered. Traditional control theory allows us to design an input signal u(t) with the lowest cost when the system is LTI and the cost function is quadratic. If both the topological connection matrix A and the input matrix B are known27, this linear quadratic (LQ) problem has a solution given by a linear-quadratic regulator (LQR) that involves solving a set of complicated Riccati differential equations28,29. However, LQR cannot be used in large-scale real-world networks because (i) solving a high-dimensional Riccati differential equation is difficult and time consuming; and (ii) the Riccati equation requires global information about the network topology A and the input matrix B, which is seldom available for large-scale real-world networks. To address this issue we use a constrained optimization model for the optimal cost control problem in which the controller determines the variable B. Once the input matrix B is obtained, the optimal controller can be constructed. This optimal controller is called an implicit linear quadratic regulator (ILQR) because it is implicitly dependent on B. Figure 2(b) shows how ILQR differs from LQR. In LQR, the only decision variable to be determined is u(t). In ILQR, both the value of the input signal u(t) at each operating time and the input matrix B, i.e., the nodes to which the inputs are connected and the connection weights, are decision variables to be determined. We formulate ILQR as a matrix optimization problem under an orthonormal boundary condition, where the objective is to drive the states from any initial state x0 = x(0) = [x1(0), ..., xN(0)]^T ∈ R^(N×1) to the origin during the time interval [0, tf] using a minimum cost defined by the control energy

J = ∫_0^tf u^T(t) u(t) dt.

When u(t) is given by the minimum-energy input u(t) = −B^T e^(A^T(tf−t)) W^(−1)(tf) e^(A tf) x0, where W(tf) = ∫_0^tf e^(At) B B^T e^(A^T t) dt is the controllability Gramian, the system state is driven to the origin. As the input matrix B is selectable, both x(t) and u(t) become functions of B, which are denoted as x(t) = x(t, B) and u(t) = u(t, B), respectively. We thus present a constrained non-convex matrix optimization problem with the input matrix B ∈ R^(N×M) as its variable,

min_B J(B) = E_x0[ ∫_0^tf u^T(t, B) u(t, B) dt ]   subject to   (A, B) controllable,   B^T B = I_M,   (1)

where M is the number of control inputs and J(B) is the expectation of the control cost of driving the system from an arbitrary initial state to the origin (xf = x(tf) = 0) during the time interval [0, tf]. Here E_x0[·] denotes the expectation over all realizations of the random initial state, tr(·) is the matrix trace function (for the minimum-energy input above, the expected cost can be written as tr(e^(A^T tf) W^(−1)(tf) e^(A tf) E_x0[x0 x0^T])), and I_M is an identity matrix of dimension M. Note that a necessary condition for (A, B) to be controllable is that M ≥ N_D. In fact, the constraint on the controllability of (A, B) implies that the Gramian matrix W(tf) is invertible, and B^T B = I_M refers to the orthonormal boundary condition under which all columns of B are orthogonal to each other. The derivation of the model and a discussion of the orthonormal constraint are presented in Section 3 of SI. By assuming that each element of the initial state x0 is an identically and independently distributed (i.i.d.) variable with zero mean and variance 1, we have E_x0[x0 x0^T] = I_N in Equation (1). Because the above nonlinear constrained optimization problem has complicated matrices as its variables, it is difficult to obtain a solution. The challenge is to obtain the gradient of the cost function, which involves a series of nonlinear matrix-by-matrix derivatives that are not widely studied. We address this problem by proposing an iterative algorithm, the orthonormal-constraint-based projected gradient method (OPGM), on Stiefel manifolds for designing ILQR (SI, Section 3):

B^(k+1) = P_St( B^(k) − η ∇_B J(B^(k)) ),   (2)

where η is the learning step, ∇_B J(B) is the gradient of the cost function with respect to the input matrix B, and P_St(·) denotes the projection back onto the Stiefel manifold {B ∈ R^(N×M) : B^T B = I_M}; the explicit expressions needed in this update, including the gradient in Equation (3), are derived in SI, Section 3. Therefore, throughout the process in ILQR, both u(t) and B are decision variables to be determined, where u(t) specifies the values of the controller inputs at each operating time and B determines the nodes to which the controller inputs are connected and the weights of the connections. We prove that the iteration is convergent (SI, Theorem 7), with B^(k) converging to B*, where B* is an orthonormal matrix, i.e., B*^T B* = I_M, if η is sufficiently small. Our objective is to control the system at the lowest cost using the lowest possible number of independent control inputs determined without knowing the global topology of the network, i.e., to find the optimal input matrix B* when global topological information about both A and B is unavailable. Our immediate task is to locate the control nodes, i.e., the nodes directly connected to external inputs, so as to minimize the control cost. The mathematical work for developing the relatively complicated OPGM is presented in detail in Section 3.2 of SI. Figure 5(a1–a9) shows that by employing OPGM in three elementary topologies, the control nodes divide the three topologies approximately evenly, lowering the energy cost, and the control energy is strongly dependent on the length of the longest control path [see Fig. 5(b)]. This finding enables us to design a minimizing longest control path (MLCP) method, an efficient scheme for controlling large-scale complex networks when there are sufficient control nodes to avoid the numerical controllability transition area33.
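To illustrate how the control cost depends on input placement, the following sketch evaluates the expected control energy tr(e^(A^T tf) W^(−1)(tf) e^(A tf)) numerically for a small directed stem, comparing clustered inputs against inputs that divide the stem. It is a toy illustration under the assumptions above (identity covariance of x0, unit edge weights); it is not the OPGM code, and the 6-node chain and node indices are chosen only for demonstration.

```python
import numpy as np
from scipy.linalg import expm

def control_cost(A, B, tf=1.0, steps=400):
    """Expected energy tr(e^{A^T tf} W^{-1} e^{A tf}), with the controllability
    Gramian W = int_0^tf e^{At} B B^T e^{A^T t} dt approximated by trapezoidal
    quadrature. Assumes (A, B) is controllable so that W is invertible."""
    N = A.shape[0]
    ts = np.linspace(0.0, tf, steps)
    W = np.zeros((N, N))
    prev = None
    for k, t in enumerate(ts):
        eAt = expm(A * t)
        term = eAt @ B @ B.T @ eAt.T
        if prev is not None:
            W += 0.5 * (term + prev) * (ts[k] - ts[k - 1])
        prev = term
    eAtf = expm(A * tf)
    return float(np.trace(eAtf.T @ np.linalg.solve(W, eAtf)))

# Directed stem x1 -> x2 -> ... -> x6 (dx/dt = Ax + Bu, A[j, i] = 1 for edge i -> j)
N = 6
A = np.zeros((N, N))
for i in range(N - 1):
    A[i + 1, i] = 1.0

def input_matrix(nodes, N):
    B = np.zeros((N, len(nodes)))
    for j, v in enumerate(nodes):
        B[v, j] = 1.0
    return B

for nodes in [(0, 1), (0, 3)]:   # clustered inputs vs. inputs dividing the stem
    print("inputs at", nodes, "cost =", control_cost(A, input_matrix(nodes, N)))
```

In this toy case the spread-out placement yields a markedly lower cost, consistent with the observation that the cost is governed by the longest control path.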

Figure 5. Interesting observations of ILQR. (a) Illustration that control nodes tend to divide elementary topologies evenly so as to consume less energy, as the cost is mainly dependent on the length of the longest control path. The located control nodes are marked in an elementary stem and circle in (a1–a3) and (a7–a9) (M = 1, 2, 3) and in an elementary dilation in (a4–a6) (M = 2, 3). When M = 2, the control node set converges to either {node 1, 5} in case a4 or {node 1, 2} in case a5 with different probabilities, around 34.19% and 65.81% respectively in 10000 rounds of experiments. When M = 3, the control node set approaches {node 1, 4, 5} and {node 1, 3, 5} with percentages 24.64% and 75.36%, respectively, in 10000 rounds of experiments. (b) Illustration that the control cost is proportional to the longest control path. Here M external control inputs are randomly allocated on a 100-node elementary circle, and the control cost versus the longest control path, which is simply the maximum number of edges between any two adjacent control nodes, is recorded. The experiments are simulated in Matlab with higher precision (130 significant digits) using the Advanpix multi-precision computing toolbox. (c) Illustration that MLCP performs significantly better than the Random Allocation Method (RAM) in low-degree networks, while they become almost indistinguishable as the mean degree (mean in-/out-degree) of the networks becomes large. The experiment is done on an ER network by adding edges randomly and persistently (SI, Section 4.4), with M given by M = N_D^LM + m0 (m0 = 100). The mean degree increases as more edges are continuously added, and the three fitting curves plotted for RAM, OPGM and MLCP coincide with each other when the mean degree is around 6.

Minimizing the Longest Control Path (MLCP)

Finding the minimum number of driver nodes N_D using maximum matching is insufficient for controlling real-world complex networks. When we attempt to gain control by imposing input signals on the minimum set of driver nodes indicated by structural controllability theory25, we may not be able to reach the target state because too much energy is required. This is known as the numerical controllability transition, where large networks often cannot be controlled by a small number of drivers33 even when existing controllability criteria are satisfied unambiguously. The reason is that a small number of driver nodes is barely enough to ensure controllability, because the controllability Gramian may be ill-conditioned. Thus we need to set the number of control nodes connected to external control inputs to be sufficiently large. Basically, the numerical success rate increases abruptly from zero to approximately one as the number of control inputs is increased33. After implementing LM, a number of DCPs and CCPs form in the network. Based on the fact that control nodes tend to divide elementary topologies evenly, we design the minimizing longest control path (MLCP) algorithm, an efficient scheme for controlling large-scale complex networks with M = N_D^LM + m0 control inputs, where N_D^LM is approximately the same as N_D and m0 ≥ 0 is large enough for the number of controllers to go beyond the numerical controllability transition area. The main idea of MLCP is to make each DCP of nearly the same length, as far as possible, with minimum changes to the results obtained by LM. We assume that we have N_D^LM DCPs and that L_i is the length of path i, for i = 1, ..., N_D^LM. We add m0 additional control inputs to these paths to minimize the longest path length of the newly formed paths. If n_i is the number of additional control inputs added on path i, subject to Σ_i n_i = m0, then MLCP is formulated as a min-max optimization problem,

min over {n_i}  of  max_i ⌈ L_i / (n_i + 1) ⌉   subject to   Σ_i n_i = m0,   (4)

where ⌈·⌉ is the ceiling function; the longest control path after the allocation is thus max_i ⌈L_i/(n_i + 1)⌉. Figure 3(b) shows an example. After applying LM, the lengths of the two directed control paths are L1 = 7 and L2 = 3, respectively. Then the longest control path length is max{7, 3} = 7. Figure 3(b2 and b3) shows that when m0 = 1 the new control input is added to L1 by MLCP, giving n1 = 1, n2 = 0, and n1 + n2 = 1. Thus the longest control path length of the newly formed paths is max{⌈7/2⌉, 3} = 4. When both DCPs and CCPs exist after applying LM, since a CCP does not require an additional external control input, we assign each CCP to a particular DCP and obtain a new sequence of L_i; assigning the m0 additional inputs can then be done by MLCP. A more detailed illustration is given in SI, Section 4. MLCP is applied to synthetic networks including ER networks17, BA networks18,19, and a number of real-world networks, and comparisons to OPGM and a random connection method between controllers and network nodes are drawn. Note that in order to generate this random connection method while ensuring network controllability, we first apply the MM method to find one set of driver nodes, which can be any one among the multiple maximum matching solutions for the network. Using the simple random allocation method (RAM) we then randomly select M − N_D additional network nodes to construct the control node set to be connected to external inputs. Figure 5(c) presents simulation results on synthetic networks, while extensive results on real-world networks are summarized in Tables S5 and S6 in SI. We conclude that MLCP performs comparably to OPGM although network nodes are restricted to binary connections to external inputs (this shows the validity of MLCP), and it performs better than RAM. As an increasing number of edges are added, both MLCP and RAM are only slightly inferior to OPGM, and they become nearly indistinguishable as the network density increases. This is because when adding edges onto a low-degree network, the required number of driver nodes gradually reduces and the average/maximum length of the control paths becomes longer, which causes a higher control cost. However, as the network becomes denser still, the number of paths from an arbitrary node x_i to another arbitrary node x_j increases, implying many possibilities and opportunities for x_i to affect x_j. This makes the required number of driver nodes reduce only insignificantly, but the average/maximum length of the control paths becomes shorter, which drastically decreases the control cost. Thus the performance of MLCP finally converges to that of RAM. This is a significant finding for complex network control. In large-scale dense networks we can simply randomly select the control nodes to obtain an optimal cost control, but for lower-degree networks MLCP is the best choice.
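The min-max allocation in Equation (4) can be solved exactly by a binary search on the target length T, since a path of length L_i needs ⌈L_i/T⌉ − 1 extra inputs to ensure ⌈L_i/(n_i + 1)⌉ ≤ T. The sketch below reproduces the worked example above (L1 = 7, L2 = 3, m0 = 1); it illustrates the optimization only and is not the authors' MLCP routine.

```python
import math

def mlcp_allocate(path_lengths, m0):
    """Minimize max_i ceil(L_i / (n_i + 1)) subject to sum_i n_i = m0."""
    def inputs_needed(T):
        return sum(math.ceil(L / T) - 1 for L in path_lengths)

    lo, hi = 1, max(path_lengths)
    while lo < hi:                       # binary search on the longest allowed path T
        mid = (lo + hi) // 2
        if inputs_needed(mid) <= m0:
            hi = mid
        else:
            lo = mid + 1
    T = lo
    n = [math.ceil(L / T) - 1 for L in path_lengths]
    leftover = m0 - sum(n)
    while leftover > 0:                  # spend any remaining inputs on the longest paths
        i = max(range(len(n)), key=lambda j: math.ceil(path_lengths[j] / (n[j] + 1)))
        n[i] += 1
        leftover -= 1
    return T, n

print(mlcp_allocate([7, 3], 1))   # -> (4, [1, 0]): the longest control path becomes 4
```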

DISCUSSIONS AND CONCLUSION

We begin by proposing local-game matching (LM) to ensure the structural controllability of complex networks when we have incomplete information about the network topology, and we test its performance using real-world networks with millions of nodes. We then design a suboptimal controller, the "implicit linear quadratic regulator" (ILQR), for LTI systems with incomplete information about the input matrix. It is found that the control cost can be significantly reduced if we minimize the longest control path length. This conclusion is consistent with the findings in30,34. Thus, by combining LM and MLCP, we are able to demonstrate a "link" from "structural controllability" to "optimal cost control" in complex networks without using the global topology. We can apply MLCP to select control nodes in networks of relatively low degree, while in dense networks the random selection of control nodes is effective. As most real-world networks can be modeled using LTI systems with various assigned physical meanings of nodes and edges, we believe the methodology we propose here can be applied to many real-world networks studied in the human brain, medical science, social science, biology, and economics. Furthermore, many physical constraints may exist in real-life systems, affecting the selection of the matrix B. While some studies have been carried out to handle such constraints35, and MLCP may be viewed as, to a certain extent, helping fulfill the simple control of large-scale complex systems, extensive further studies are needed to handle the various constraints of real-life applications. In this work, we uncover that the local network topology information of directed networks provides sufficiently rich information not only for modelling the real world, but also for brain-inspired computing. It is well known that the human brain can be described as an extremely large-scale complex network having about 10^11 nodes (neurons) and 10^15 edges (synapses), with a scale-free organization. Many studies indicate that brain neural networks are organized by dense local clustering to efficiently support distributed multi-modal information processing, including vision, hearing, olfaction, etc. These characteristics make it possible for the human brain to work efficiently as a parallel, distributed processing computer with various circuits that process local information in parallel and in a distributed manner. We envision that our results on the control and optimal control of complex networks with limited topological information will contribute to advances in future brain-inspired computing theory aimed at delivering cost-effective information processing.

ACKNOWLEDGEMENTS

The work was partially supported by the National Science Foundation of China (61603209), the Beijing Natural Science Foundation (4164086), the Study of Brain-Inspired Computing System of Tsinghua University program (20151080467), and the Ministry of Education, Singapore, under contracts RG28/14, MOE2014-T2-1-028 and MOE2016-T2-1-119. Part of this work is an outcome of the Future Resilient Systems project at the Singapore-ETH Centre (SEC), which is funded by the National Research Foundation of Singapore (NRF) under its Campus for Research Excellence and Technological Enterprise (CREATE) programme.


REFERENCES
1. Watts, D. J. & Strogatz, S. H. Collective dynamics of small-world networks. Nature 393, 440–442 (1998).
2. Strogatz, S. H. Exploring complex networks. Nature 410, 268–276 (2001).
3. Barabási, A. L. & Bonabeau, E. Scale-free networks. Scientific American 288, 50–59 (2003).
4. Kalman, R. E. Mathematical description of linear dynamical systems. Journal of the Society for Industrial and Applied Mathematics, Series A: Control 1, 152–192 (1963).
5. Gu, S. et al. Controllability of structural brain networks. Nature Communications 6 (2015).
6. Cornelius, S. P., Kath, W. L. & Motter, A. E. Realistic control of network dynamics. Nature Communications 4 (2013).
7. Wang, L. Z. et al. Control and controllability of nonlinear dynamical networks: a geometrical approach. arXiv preprint arXiv:1509.07038 (2015).
8. Liu, Y. Y., Slotine, J. J. & Barabási, A. L. Controllability of complex networks. Nature 473, 167–173 (2011).
9. Lin, C. T. Structural controllability. IEEE Transactions on Automatic Control 19, 201–208 (1974).
10. Murota, K. Matrices and Matroids for Systems Analysis, vol. 20 (Springer Science & Business Media, 2009).
11. Yuan, Z., Zhao, C., Di, Z., Wang, W. X. & Lai, Y. C. Exact controllability of complex networks. Nature Communications 4 (2013).
12. Gao, J., Liu, Y. Y., D'Souza, R. M. & Barabási, A. L. Target control of complex networks. Nature Communications 5 (2014).
13. Klickstein, I., Shirin, A. & Sorrentino, F. Energy scaling of targeted optimal control of complex networks. Nature Communications 8 (2017).
14. Ruths, J. & Ruths, D. Control profiles of complex networks. Science 343, 1373–1376 (2014).
15. Harsanyi, J. Games with incomplete information played by Bayesian players. Management Science 14, 159–183 (1967).
16. Weiss, G. Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence (MIT Press, 1999).
17. Erdős, P. & Rényi, A. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 5, 17–60 (1960).
18. Albert, R. & Barabási, A. L. Statistical mechanics of complex networks. Reviews of Modern Physics 74, 47 (2002).
19. Newman, M. E. Power laws, Pareto distributions and Zipf's law. Contemporary Physics 46, 323–351 (2005).
20. Hopcroft, J. E. & Karp, R. M. An n^(5/2) algorithm for maximum matchings in bipartite graphs. In Switching and Automata Theory, 12th Annual Symposium on, 122–125 (IEEE, 1971).
21. Micali, S. & Vazirani, V. V. An O(√|V| |E|) algorithm for finding maximum matching in general graphs. In Foundations of Computer Science, 21st Annual Symposium on, 17–27 (IEEE, 1980).
22. Lotker, Z., Patt-Shamir, B. & Rosén, A. Distributed approximate matching. SIAM Journal on Computing 39, 445–460 (2009).
23. Mansour, Y. & Vardi, S. A local computation approximation scheme to maximum matching. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, 260–273 (Springer, 2013).
24. Yan, G. et al. Spectrum of controlling and observing complex networks. Nature Physics 11, 779–786 (2015).
25. Chen, Y. Z., Wang, L., Wang, W. & Lai, Y. C. The paradox of controlling complex networks: control inputs versus energy requirement. arXiv preprint arXiv:1509.03196 (2015).
26. Yan, G., Ren, J., Lai, Y. C., Lai, C. H. & Li, B. Controlling complex networks: How much energy is needed? Physical Review Letters 108, 218703 (2012).
27. Doyle, J. C., Glover, K., Khargonekar, P. P. & Francis, B. A. State-space solutions to standard H2 and H-infinity control problems. IEEE Transactions on Automatic Control 34, 831–847 (1989).
28. Nguyen, T. & Gajic, Z. Solving the matrix differential Riccati equation: a Lyapunov equation approach. IEEE Transactions on Automatic Control 55, 191–194 (2010).
29. Jiménez-Lizárraga, M., Basin, M., Rodríguez, V. & Rodríguez, P. Open-loop Nash equilibrium in polynomial differential games via state-dependent Riccati equation. Automatica 53, 155–163 (2015).
30. Li, G. et al. Minimum-cost control of complex networks. New Journal of Physics 18, 013012 (2015).
31. Rugh, W. J. Linear System Theory, vol. 2 (Prentice Hall, Upper Saddle River, NJ, 1996).
32. Klipp, E., Liebermeister, W., Wierling, C., Kowald, A. & Herwig, R. Systems Biology: A Textbook (John Wiley & Sons, 2016).
33. Sun, J. & Motter, A. E. Controllability transition and nonlocality in network control. Physical Review Letters 110, 208701 (2013).
34. Chen, Y. Z., Wang, L. Z., Wang, W. X. & Lai, Y. C. Energy scaling and reduction in controlling complex networks. Royal Society Open Science 3, 160064 (2016).
35. Iudice, F. L., Garofalo, F. & Sorrentino, F. Structural permeability of complex networks to control signals. Nature Communications 6 (2015).

Chapter 16

Estimation of traffic flow changes using networks in networks approaches

Jürgen Hackl and Bryan T. Adey

Institute for Construction Engineering and Management, ETH Zurich, Stefano-Franscini-Platz 5, 8093, Zurich, Switzerland

ABSTRACT

Understanding traffic flow in urban areas has great importance and implications from an economic, social and environmental point of view. For this reason, numerous disciplines are working on this topic. Although complex network theory has made its appearance in transportation research through empirical measures, the relationships between dynamic traffic patterns and the underlying transportation network structures have scarcely been investigated so far. In this work, a novel Networks in Networks (NiN) approach is presented to study changes in traffic flows caused by topological changes in the transportation network. The NiN structure is a special type of multi-layer network in which vertices are networks themselves. This embedded network structure makes it possible to encode multiple pieces of information, such as topology, paths, and origin-destination information, within one consistent graph structure. Since each vertex is an independent network in itself, it is possible to implement multiple diffusion processes with different physical meanings. In this way, it is possible to estimate how the travelers' paths will change and to determine the cascading effect in the network. Using the Sioux Falls benchmark network and a real-world road network in Switzerland, it is shown that NiN models capture both topological and spatial-temporal patterns in a simple representation, resulting in a better traffic flow approximation than single-layer network models.

Citation (APA): Hackl, J., & Adey, B. T. (2019). Estimation of traffic flow changes using networks in networks approaches. Applied Network Science, 4(1), 1-26. (26 pages) DOI: https://doi.org/10.1007/s41109-019-0139-y

Copyright: Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

INTRODUCTION

Mobility and accessibility are essential factors for lifestyle and prosperity. People travel to satisfy their needs by carrying out certain activities at specific places such as work, leisure and learning. The spatial distribution of these activities often leads to a coordination problem, which can significantly affect the equilibrium between the demand for, and the supply of, transportation. To determine network flow, costs and other aspects of interest, the satisfaction of a given demand for movements of persons and goods with different trip purposes, at different times, using various modes of transport, must be ensured in a transport system with a given operational capacity (de Dios Ortuozar and Willumsen 2011). As a result of the dynamic nature of mobility demand and supply, in combination with the topology and capacity limitations of the underlying network, transportation networks exhibit atypical dynamic behaviour. Unlike many other networks, network performance deteriorates as soon as the number of vehicles in the network exceeds a critical accumulation (Daganzo and Geroliminis 2008; Hoogendoorn and Knoop 2012), i.e. vehicles block each other and the flow decreases, leading to spillbacks and gridlock effects. This phenomenon is amplified by the fact that even small (unexpected) failures or damage to the infrastructure (i.e. changes in topology) can lead to significant disruptions that are disproportionate to the actual physical damage itself (Vespignani 2010). To prevent such situations, scientists and engineers are working on the implementation of resilient systems capable of withstanding failures, natural hazards and human-made disruptions. Part of the research deals with the quantification of network-related risks, including the modelling of traffic flows after multiple link failures (Erath 2011; Hackl et al. 2018a; Hackl et al. 2018b). Traffic models are needed to simulate current and predict future traffic flows. An essential component of such models is the so-called traffic assignment process, which aims to reproduce the pattern of vehicular movements based on certain behavioural rules (Wang et al. 2018). For example, a common behavioural rule is that travellers choose paths with minimum travel time (Wardrop 1952) or maximise their utilities (Charypar and Nagel 2005). In order to satisfy the needs of all travellers, an equilibrium between demand and supply has to be found, i.e. no traveller wants to change his path. This complex and computationally intensive mathematical problem is still being actively researched. To make matters worse, in order to quantify network-related risks, resilience, or optimal intervention strategies, the traffic assignment problem must not only be solved once but many times with different network topologies (e.g. see (Erath 2011; Vugrin et al. 2014; Hackl et al. 2018a; Hackl et al. 2018b; Schlögl et al. 2019)). While addressing such problems has led to a substantial body of work in areas such as geography, economics, and transportation research, complex network theory still plays a minor role. Although complex networks made their appearance in transportation research through empirical measures, little research has so far been done to investigate the relationship between dynamic traffic patterns and the underlying structures of the transportation networks (Barrat et al. 2008). In this work, the application of a novel Networks in Networks (NiN) approach is presented. This approach is used to study traffic flow changes caused by topological changes in the transportation network (e.g. due to multiple link failures) from a complex network perspective. NiNs are based on a multi-layer approach where each vertex itself represents a network. This embedded network structure allows encoding multiple pieces of information, such as the topology, paths used and origin-destination information, within one consistent graph structure (i.e. using only vertices and edges). In combination with a multi-layered diffusion process, an approximation to changes in traffic flow due to topology changes can be made. Specifically, this work advances the state-of-the-art in the field of complex network science in transportation research as follows.

• Using a modified multi-layer hypergraph it is formally feasible to describe vertices that are networks themselves. Thereby the relationships in the incidence graph represent the edges connecting different layers. The edges within the different layers are given by a connection model, which allows different topologies in the different layers.
• Because each vertex is an independent network in itself, it is possible to implement multiple diffusion processes. Therefore, it is possible to assign different physical meanings to the processes. For example, one process can describe how individual travelers switch between different paths, while another process describes the propagation of disturbances through the network.
• The proposed approach allows approximating traffic flow changes due to multiple edge failures. Using the Sioux Falls benchmark network and a real-world road network in Switzerland, it is shown that NiN models capture both topological and spatial-temporal patterns in a simple representation, resulting in a better traffic flow approximation than single-layer network models.

This work is organized as follows. A brief overview of the modelling, functionality and complexity of transport systems is given in the following section. In addition, advances in complex network theory regarding transport systems and dynamic processes are discussed. A general formulation of the NiN representation is presented in the "Methodology" section. In addition, the application to transportation networks and the modelling of traffic flow changes are discussed on a general level. Two applications are presented: the modelling of a small benchmark network and the modelling of a real network located in Switzerland. In particular, this section is divided into an overview of the data used, the assumptions made and the implementation of the methodology. Subsequently, the results and a critical discussion of the results, advantages and disadvantages of the method are given. Finally, concluding remarks and suggestions for further work in this area are presented. The notation used in this work is listed in the appendix.

BACKGROUND

Transportation networks (preliminaries)

The purpose of transport systems is to balance supply and demand for mobility. The demand for transport is derived from people trying to satisfy their needs (work, leisure, health, education) through activities in specific places. Transport supply is the service provided at a certain point in time. This includes the infrastructure (e.g. road network) and a set of mobile units (e.g. persons, vehicles, goods). In combination with a set of rules for operation, the movements of persons and goods can be ensured (de Dios Ortuozar and Willumsen 2011). In order to predict how the need for mobility will manifest itself in space and time, a formal representation of the transport system is required. In a mathematical sense such systems are often represented as graphs or networks, which are denoted by G = (V, E) and consist of a set of edges E and a set of vertices V. In this work, the term infrastructure network is used to refer to networks where only topology and connectivity are considered (Rodrigue et al. 2009), i.e. the network comprises vertices and edges that form a connected component. If, in addition to topology, flow characteristics, such as origin-destination demands, capacity constraints, path choice and travel costs, are taken into account to represent the movement of people, vehicles or goods, the network is referred to as a transportation network. In a transportation network, the edges represent the movement between vertices, which in turn represent points in space. An edge e ∈ E connects two vertices vi, vj ∈ V and a vertex connects two or more edges. Edges can be either directed, e = (vi, vj) ∈ E, indicating that vi and vj are directly connected and movement is only possible from vi to vj, or undirected, e = {vi, vj} ∈ E. Important properties of transportation network edges include edge length, edge cost and edge capacity. The edge length corresponds to the length of the road section connecting two vertices. The term edge cost is used to describe the disutility perceived by the network user for travelling on this edge. It is a composite measure of all factors known to be important for decision-making. Travel time and direct costs such as fuel consumption, parking fees and tolls are often taken into account for this purpose. Since transportation networks are physically constrained, it is assumed that each edge has a maximum capacity, i.e. a maximum rate at which people, vehicles or goods can travel on an edge during a given period under prevailing roadway, traffic and operation conditions (Hoogendoorn and Knoop 2012). The movements in a transportation network correspond to flows with a distinct origin and destination. Origins and destinations can represent particular locations such as residential buildings, offices, shopping centres, or specific zones. In the context of transportation networks, origins and destinations are represented as vertices o, d ∈ V. It should be noted that not all vertices in the network need to be an origin or a destination. Vehicle movements from an origin vertex o to a destination vertex d occurring along edges are represented as paths. A path p ∈ P is considered as a sequence of edges that are ordered so that two vertices are adjacent if and only if they are consecutive. P therefore denotes the set of all simple non-empty paths in G = (V, E). The set of od-paths is denoted by Pod ⊆ P.
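As a toy illustration of these definitions (the node names, costs, lengths and capacities below are invented for the example, not taken from the paper), a directed transportation network with edge attributes, its od-paths P_od, and the least-cost od-path can be represented as follows.

```python
import networkx as nx

# Toy transportation network: edges carry length [km], generalized cost and capacity [veh/h]
G = nx.DiGraph()
G.add_edge("a", "b", length=1.2, cost=4.0, capacity=1800)
G.add_edge("b", "d", length=0.8, cost=3.0, capacity=1200)
G.add_edge("a", "c", length=1.5, cost=5.0, capacity=1600)
G.add_edge("c", "d", length=1.0, cost=2.5, capacity=1600)

origin, destination = "a", "d"
P_od = list(nx.all_simple_paths(G, origin, destination))          # the set of od-paths
least_cost = nx.shortest_path(G, origin, destination, weight="cost")
print("P_od:", P_od)
print("least-cost od-path:", least_cost)
```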

To estimate the movements in a transportation network, it is necessary to find an equilibrium between demand and supply, i.e. in the equilibrium situation, each user chooses the path that he perceives to be the least costly at the time. Economic theory admits that this equilibrium may never actually occur in practice, as the system of demand and supply levels is constantly adapting to cope with internal and external changes. However, the concept of equilibrium is still valuable for understanding movement in transportation networks, assuming that the system is at least near an equilibrium situation. In order to find this equilibrium, various traffic flow models have been developed in recent decades. The most common classification in current traffic flow research is the distinction between macroscopic and microscopic traffic flow modelling approaches (Hoogendoorn and Knoop 2012). The macroscopic perspective considers the overall or average state of traffic, while the microscopic perspective considers the behaviour of individuals interacting with surrounding vehicles. Macroscopic models were the first to be derived by scientists (Wardrop 1952; Lighthill and Whitham 1955), who studied vehicle flow as an analogy to the flow of continuous media such as fluids or gases. These models are based on a limited number of partial differential equations, which reduces the computational complexity. The disadvantage, however, is that dynamic features cannot be modelled as accurately as with microscopic models. Microscopic models have been developed to try to emulate human behaviour in traffic situations. To accomplish this, the models contain different driving conditions to describe typical driving reactions. As each vehicle is an autonomous entity, microscopic models become very computationally expensive with increasing system size. In order to reduce the computational time for both modelling approaches, scholars have developed various techniques. These include, among others, the improvement of the optimization algorithms used to find an equilibrium solution (Charypar and Nagel 2005; Mitradjieva and Lindberg 2013; Gentile 2014); the development of speed-up techniques for sub-problems of the traffic assignment (e.g. finding the shortest paths) (Geisberger et al. 2008; Delling et al. 2009; Buchhold et al. 2018); and the utilisation of GPU cards to parallelise (agent-based microscopic) traffic models (Song et al. 2017; Heywood et al. 2018). Another way to address these problems could be through the use of complex network approaches.


Traffic on complex networks

Complex networks are based on the ideas of mathematical graph theory in order to gain insights into the behaviour of complex systems by abstracting information into ordinary graphs (networks). In these representations, the network comprises vertices connected by edges, with vertices representing individual elements and edges indicating interactions or relationships between them. Although this approach is simple in many ways, it allows the characterization of the complex system so that traditional graph-theoretical metrics can be used and analyses performed. For example, such abstractions have been used to study growth mechanisms (Barabási and Albert 1999; Clauset et al. 2009), processes of collective dynamics (Watts and Strogatz 1998), and to illustrate that certain vertices play a central role in the complex system (Freeman 1977; Wasserman and Faust 1994). The strength of the complex network paradigm lies in its ability to capture some of the essential structural features of interacting systems while reducing the details of both the elements and their interactions. Consequently, the early complex network literature focused almost exclusively on the structural properties of networks (Smith et al. 2011). This topology-driven analysis can reveal relevant properties of the structure of a complex system (Albert et al. 1999; Watts and Strogatz 1998) by highlighting the role of vertices and edges (Bavelas 1950; Freeman 1977) or global network properties (Taaffe et al. 1973; Cliff et al. 1979). The robust mathematical framework allows the derivation of analytical solutions even for large complex systems. For example, hierarchical network representations are used to study large complex transportation systems (Gómez et al. 2013; Lim et al. 2015). Thereby, hierarchical models are obtained by successive clustering of networks, i.e. decomposition of the system into different levels of detail (Ferrario et al. 2016). While structural properties are still important in constraining the behaviour of a system (Marr and Hütt 2005), the focus has expanded to an understanding of the relationship between structure and dynamics that takes place in networks and the impact of this relationship on network design (Toroczkai 2005). Most technological, biological, economic, social or infrastructural networks support a number of dynamic (transport) processes, such as the movement of information packages (Wang et al. 2006), finance and wealth (Coelho et al. 2005), rumours (Moreno et al. 2004), diseases (Newman 2002), people or goods. Gradually, these theories have been introduced to the field of transportation. More and more scholars have conducted research on the characteristics of various transportation networks, among others those of (urban) road networks (De Montis et al. 2007; Erath et al. 2009; Barthélemy 2011; Lin and Ban 2013), railway networks (Latora and Marchiori 2002; Sen et al. 2003), and transit networks (Guo Xl and Lu 2016; Solé-Ribalta et al. 2016). In addition, current studies use complex networks to analyse traffic time series (Tang et al. 2013; Yan et al. 2017; Bao et al. 2017). In the field of complex network science, a widely used approach to study the relationship between dynamic processes and the underlying network structures is the use of random walks. In such a model, random walkers move in the network and visit various edges and vertices over time. An extensive overview of the use of random walks and diffusion on complex networks is given by Masuda et al. (2017). Researchers using this technique have been able to gain insights into topological features such as vertex centralities (Brin and Page 1998) or community structures (Newman 2006; Jeub et al. 2015). In many real transportation systems, however, the assumption of such simple random walker models may not always be justified, since it is assumed that the walker moves randomly in the network without considering its origin or destination. For example, in a diffusion process on a road network, the next position of a vehicle depends only on its current position (occupied vertex) and the outgoing roads (edges), but not on any of the previously visited locations. In reality, however, travel in a network has a specific purpose: a person starts at home and navigates through the network to reach a particular destination (e.g. work), and then returns home with a high probability (Salnikov et al. 2016). Consequently, the naive application of the (static) network paradigm in modelling dynamic complex systems might lead to wrong conclusions (Rosvall et al. 2014; Scholtes et al. 2014; Scholtes 2017). One way to address this issue is through the extension to a multi-layer network representation. Multi-layer networks represent complex systems that are formed from several networks (layers), each of which represents interactions of a different nature and different connections. Due to the distinction between the different types of edges and vertices, multi-layer networks encode significantly more information than conventional single-layer networks (Iacovacci and Bianconi 2016). In network science, the two most prominent classes of multi-layer networks are multiplex networks and networks of networks. Networks of networks are formed by layers composed of different vertices.


Edges connecting different networks do not necessarily indicate dependency relationships. Examples can be found in complex infrastructure networks such as road networks, railway networks and flight networks, where each layer represents its own infrastructure. Multiplex networks, on the other hand, are formed by the same set of vertices connected by edges indicating different types of interactions. In the context of transport systems, this approach has been used, for example, for the analysis of flight networks (Cardillo et al. 2013). An overview of other types of multi-layer networks is given by Boccaletti et al. (2014); Kivelä et al. (2014); Bianconi (2018). In all these definitions, it is assumed that a multi-layer network can be represented as a graph GM, which is an ordered tuple GM = (VM, EM) consisting of a non-empty labelled vertex set VM and a multiset EM ⊆ VM × VM of edges. A vertex vα ∈ VM is a tuple representing vertex v on the layer α ∈ L, where L is the set of layers in the network.
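A minimal sketch of this vertex convention, with layered vertices stored as (v, α) tuples, could look as follows; the layers, nodes and inter-layer couplings below are invented purely for illustration.

```python
import networkx as nx

layers = {"road": [("a", "b"), ("b", "c")],   # intra-layer edges for each layer alpha
          "rail": [("a", "c")]}

GM = nx.Graph()
for alpha, edges in layers.items():
    GM.add_edges_from(((u, alpha), (v, alpha)) for u, v in edges)

# couple the copies of the same physical location that exist in several layers
for v in {"a", "b", "c"}:
    copies = [alpha for alpha in layers if (v, alpha) in GM]
    for a1, a2 in zip(copies, copies[1:]):
        GM.add_edge((v, a1), (v, a2))

print(sorted(GM.nodes()))   # layered vertices v_alpha = (v, alpha)
```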

METHODOLOGY

Networks in networks

A Networks in Networks (NiN) structure is a special type of multi-layer network in which vertices themselves are networks, i.e. GNiN = (GNiN, ENiN) is a network with a set of graphs GNiN acting as vertices and a multiset ENiN ⊆ GNiN × GNiN of edges. A vertex viβ ∈ Gβ is a tuple representing a graph viβ := (Gα, Eα), where α ...

... ν > 0, where the simplicial complexes containing the higher-order cliques remain strongly interconnected until the before-last level qmax − 1 = 8. These findings agree with the impact of the chemical potential favoring the cliques' attraction for ν > 0 and repulsion for ν < 0. In this context, it is interesting to note that the structure grown solely under the geometrical rules (ν = 0) already possesses a sizable hierarchical organization of simplicial complexes; although the degree of connectivity is systematically lower than in the case ν = +1, the structure holds together until the level q = 7 (see Table 1 for the exact values). As Fig. 3 shows, this hierarchical architecture of the assembled networks gradually builds up with increasing values of the parameter ν. To illustrate the differences in the hierarchical organization of the systems for ν = −1, 0, +1, in Fig. 4 we display those parts of their structure that are still visible at the topology level q = 5. Precisely, the nodes participating in the simplexes of order q ≤ 5 which are not faces of cliques of order q > 5 are removed. The connections among the remaining nodes are shown according to the network's adjacency matrix.

Table 1. The components of the three structure vectors for the networks generated at different chemical potential ν.

q  | ν = −1              | ν = 0               | ν = +1
   | Qq    nq    Q̂q      | Qq    nq    Q̂q      | Qq    nq    Q̂q
0  | 1     453   0.997   | 1     632   0.998   | 1     815   0.999
1  | 308   453   0.320   | 367   632   0.419   | 411   815   0.495
2  | 227   259   0.124   | 203   330   0.385   | 166   442   0.624
3  | 149   157   0.051   | 124   194   0.361   | 126   300   0.580
4  | 110   111   0.009   | 98    134   0.269   | 83    196   0.576
5  | 76    76    0.000   | 84    94    0.106   | 65    125   0.480
6  | 63    63    0.000   | 56    59    0.051   | 58    86    0.325
7  | 36    36    0.000   | 34    35    0.029   | 43    54    0.204
8  | 20    20    0.000   | 22    22    0.000   | 30    32    0.063
9  | 11    11    0.000   | 11    11    0.000   | 17    17    0.000

Figure 4. Adjacency matrix of the network's nodes which participate in structures that are "visible" at the topology level q = 5. Left to right: ν = −1, 0, and +1. Different colors identify clusters or communities.

The node's participation in building various simplexes also manifests in the global statistical features of the network. The cumulative degree distribution for several studied aggregates is given in Fig. 5. It is averaged over several realizations of the systems containing over 5000 nodes, where smax ∈ [2, 12]. Although a broad distribution of the node's degree occurs in each case, it strongly varies with the parameter ν. It is interesting to note that, in the networks grown under the geometrical constraints alone, with ν = 0, we obtain a distribution with a power-law decay of exponent τ + 1 ≈ 3 (within the numerical error bars); its cut-off appears to depend on the size of the largest clique. In contrast, an exponential decay is observed for ν < 0, while a structure containing many nodes of large degree, separated from the low-degree nodes, is present in the case of clique attraction for ν > 0. Other graph-theoretic measures also vary accordingly.

Figure 5. Cumulative distributions of the degree in networks of aggregated poly-disperse cliques s ∈ [2, 12] and varied chemical potential ν (top panel) and for purely geometrical aggregation (ν = 0) and varied size of the largest added clique smax (lower panel). Each distribution is averaged over several samples of the networks with the number of nodes N ≥ 5000.

δ-Hyperbolicity of the emergent networks

For network structures, δ-hyperbolicity is a generalization of negative curvature in the large37. Here, we consider the aggregates of cliques, which are known to be 0-hyperbolic graphs; therefore, these structures are expected to exhibit this intrinsic property at a larger scale. Following the procedure described in37, we investigate the 4-point Gromov hyperbolicity of different emergent networks. Specifically, we determine the average hyperbolicity 〈δ〉 in comparison to the graph's diameter for ν = −5, −1, 0, +1, and +5, by sampling 10^9 sets of four nodes, as described in Methods. Considering three different realizations of the network for each ν, we find numerically that δ can take the values {0, 1/2, 1}; hence, the maximum value δmax = 1 suggests that these assemblies are 1-hyperbolic. In Fig. 6 (bottom panel) we plot the average hyperbolicity 〈δ〉 against the minimal distance dmin of the involved pair in the smallest sum J, see Methods. Notably, for all network types 〈δ〉 remains bounded at small values. In particular, we find that 〈δ〉 = 0 for the tree graph of cliques corresponding to ν = −5. The hyperbolicity parameter is close to zero in the sparse network of cliques for ν = −1, and slightly increases in the more compact structures corresponding to ν = 0 and ν > 0. Note that due to the small number of pairs of nodes having the largest distance in the graph, we observe fluctuations of 〈δ〉 ∈ [0, 0.5]. The histograms of distances between all pairs of nodes in the considered networks are also shown in Fig. 6 (top panel).

Figure 6. Histograms of shortest distances d between pairs of nodes (top) and average hyperbolicity 〈δ〉 vs. dmin (bottom) in three samples of networks for ν = −1, 0, +1, and +5. Network size is above N = 500 nodes.

Aggregation of monodisperse cliques

In this section, we briefly consider the structures grown with the same aggregation rules but with monodisperse building blocks. Some compelling examples are the aggregates of tetrahedra and triangles. Tetrahedral forms are ubiquitous minimum-energy clusters of covalently bonded materials12. We also study the impact of the chemical potential in the case of the aggregation of triangles. The importance of triangular geometry was recently pointed out in the context of quantum networks44. Some examples of these structures grown by the aggregation rules of our model are shown in Fig. 7.

Figure 7. Aggregates of tetrahedra (left) and triangles (right) for ν = 0.0.

Since the aggregation process does not alter the size of the largest clique, these networks have only a few topology levels. Specifically, in the aggregates of tetrahedra qmax = 3, and they can share nodes, links, and triangles as faces of lower orders; for triangles, qmax = 2 and the shared faces are links and nodes. Therefore, their structure vectors are rather short. However, they possess a captivating structure of simplicial complexes, depending on the chemical potential and geometry constraints. Consequently, the degree distributions are altered by changing ν, as shown in Fig. 8. Notably, the appearance of scale-invariant structures is favored by the mutual attraction of cliques for ν > 0. The aggregation of tetrahedra builds such structures more efficiently than that of triangles, whereas the scale-free range is limited by exponential cut-offs in the case of triangles unless ν is sufficiently large. Further analysis of these and other networks of mono-disperse simplexes is left for future work.

Hidden geometries in networks arising from cooperative

323

Figure 8. Cumulative distributions of the node’s degree in networks of aggregated mono-disperse cliques (main panel) tetrahedra, and (inset) triangles, for different values of the chemical potential ν. Sample averaging and the number of nodes N ≥ 5000 applies. Thick broken and full lines indicate the range where the slopes given in the legend are measured (within the maximal error bars ±0.07).

DISCUSSION

We have introduced a computational model for cooperative self-assembly where small, pre-formed groups of particles appear as building blocks for a large-scale structure. In this context, in addition to the binding forces, the geometric constraints exerted by the growing architecture play an important role in the proper nesting of the added block. Different geometrically suitable options for nesting a given block are further altered by the chemical affinity ν of the system for receiving the excess number of particles. The formal rules of the model are motivated by situations that usually occur in the self-assembly of nanoparticles, where the possibilities for creating different clusters are tremendous. Nevertheless, the rules can easily be adapted to describe various other cases where, for example, due to interactions, only clusters of a certain type can appear and then combine into a hierarchical network. It should be noted that the model explicitly does not take into account the effects of temperature and diffusion, which are experimentally controllable parameters. In the assembly, simplexes are added one by one, and every added object is attached (with probability one) to the structure when a geometrically suitable nesting site is found, and it remains in place. Therefore, in the limits described below, these aggregation characteristics resemble the well-known process of diffusion-limited aggregation (DLA), where a random particle tree grows in low-density conditions by attaching a particle that diffuses in the solvent when it approaches the tree45. Indeed, in the limit of significantly negative values of the parameter ν, which promotes repulsive interactions between simplexes, the structure resembles a DLA tree, but here it is made from extended objects (simplexes) rather than individual particles. Note that in this case, ν refers to the number of excess particles of the incoming simplex, while the simplex joins the structure along the nest containing the remaining particles. Hence, effectively, the chemical potential for the nested particles of the simplex is positive, in analogy to DLA binding. More specifically, when qmax = 1, only one particle can be added with its link, and the growing structure is a random tree, independent of the value of ν (see the web demo40). In this case, only one type of binding process occurs, with probability one in equation (1), regardless of the value and the sign of the parameter ν. For qmax > 1, however, there are several types of bindings that are differentiated by the exponential factors in formula (1), as described above. Consequently, the emerging structure builds non-random features that are different from DLA clusters, as described in the Results section. We have demonstrated how different assemblies with a complex architecture can be formed in the interplay of these geometric and chemical factors. Moreover, the systematic mapping of the developing structure to a graph not only helps us formally implement the self-assembly process but also provides ways to adequately investigate the new structure employing advanced graph theory and algebraic topology methods. It is interesting that the complex structure of the assembly, which possesses combinatorial topology of higher order, can arise due to geometric factors alone. These topological features are further enhanced by chemically enforced compaction and, on the contrary, are gradually reduced in the sparse networks resulting from chemically favored repulsion between the building blocks. Moreover, depending on the dispersion of the components and the chemical factor, the new assemblies may possess scale invariance and an intrinsic global negative curvature; these features are essential for their practical use and functionality. Our model with its graph-based representation provides better insight into the mechanisms that drive the assembly of hierarchically organized networks with higher topological complexity, which are in growing demand for technological applications.


METHODS

Program flow for clique aggregation

Algorithm 1. Program Flow: Growth of the graph by attaching simplexes.

Q-analysis: definition of structure vectors

To describe the global graph's connectivity40 at different topology levels q = 0, 1, 2, ..., qmax, Q-analysis uses notation from the algebraic topology of graphs32,41,42,43. Specifically, the first structure vector Qq represents the number of q-connected components, and the second structure vector nq is defined as the number of simplexes of order greater than or equal to q. In this context, two simplexes are q-connected if they share a face of order q, i.e., they have at least q + 1 shared nodes. The third structure vector, determined as Q̂q ≡ 1 − Qq/nq, then measures the degree of connectivity at the topology level q among the higher-order simplexes. From the adjacency matrix of the considered graph, we construct the incidence matrix using the Bron–Kerbosch algorithm46, where simplexes are identified as maximal complete subgraphs (cliques). Then the dimension of the considered simplicial complex equals the dimension of the largest clique (with qmax + 1 nodes) belonging to that complex.
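A direct (unoptimized) way to compute the three structure vectors from a graph's maximal cliques, following the definitions above, is sketched below; it is an illustration rather than the code used for the reported results, and the small example graph is invented.

```python
import networkx as nx

def structure_vectors(G):
    """First (Qq), second (nq) and third (Qhat_q = 1 - Qq/nq) structure vectors.
    Qq is the number of connected components of the graph whose vertices are the
    maximal cliques of order >= q and whose edges join cliques sharing >= q+1 nodes."""
    cliques = [frozenset(c) for c in nx.find_cliques(G)]   # Bron-Kerbosch
    qmax = max(len(c) for c in cliques) - 1
    Q, n, Qhat = {}, {}, {}
    for q in range(qmax + 1):
        big = [c for c in cliques if len(c) >= q + 1]
        H = nx.Graph()
        H.add_nodes_from(range(len(big)))
        for i in range(len(big)):
            for j in range(i + 1, len(big)):
                if len(big[i] & big[j]) >= q + 1:       # shared face of order >= q
                    H.add_edge(i, j)
        Q[q], n[q] = nx.number_connected_components(H), len(big)
        Qhat[q] = 1.0 - Q[q] / n[q]
    return Q, n, Qhat

# Small example: a tetrahedron sharing node 3 with a triangle, plus a pendant link
G = nx.complete_graph(4)
G.add_edges_from([(3, 4), (3, 5), (4, 5), (5, 6)])
print(structure_vectors(G))
```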

Measure of curvature: δ-hyperbolicity definition

Following the studies in37 and references therein, we implement an algorithm which uses the Gromov hyperbolicity criterion. Specifically, for an arbitrary set of four nodes A, B, C, and D, the distances (shortest path lengths) between distinct pairs of these nodes are combined in three ways and ordered, for instance, d(A, B) + d(C, D) ≤ d(A, C) + d(B, D) ≤ d(A, D) + d(B, C).

We denote the largest value L = d(A, D) + d(B, C), the middle value M = d(A, C) + d(B, D), the smallest value J = d(A, B) + d(C, D), and the smaller of the two pair distances forming J as dmin = min{d(A, B), d(C, D)}. Then the graph is δ-hyperbolic if there is a fixed value δ for which any four nodes of the graph satisfy the 4-point condition

(L − M)/2 ≤ δ. (2)

There is a trivial upper bound (L − M)/2 ≤ dmin. Hence, by plotting (L − M)/2 against dmin we can investigate the worst-case growth of the function. For a given graph, we first compute the matrix of distances between all pairs of nodes; then, by sampling a large number of sets of four nodes for the 4-point condition (2), we determine and plot the average 〈δ〉 against the corresponding distance dmin.
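A small sketch of this sampling procedure is given below. It is an illustration under our own assumptions rather than the authors’ implementation: it assumes a connected networkx graph and a fixed number of randomly sampled 4-tuples, and it returns the average (L − M)/2 per dmin value.

import random
from collections import defaultdict

import networkx as nx


def sample_hyperbolicity(G, n_samples=100000, seed=0):
    # Full matrix of shortest path lengths (the graph is assumed connected).
    dist = dict(nx.all_pairs_shortest_path_length(G))
    nodes = list(G.nodes())
    rng = random.Random(seed)
    by_dmin = defaultdict(list)

    for _ in range(n_samples):
        a, b, c, d = rng.sample(nodes, 4)
        # The three pairings of the four nodes and their distance sums.
        pairs = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
        sums = [dist[p][q] + dist[r][s] for (p, q), (r, s) in pairs]
        sums_sorted = sorted(sums)
        L, M = sums_sorted[2], sums_sorted[1]
        # dmin is the smaller pair distance of the pairing giving the smallest sum J.
        (p, q), (r, s) = pairs[min(range(3), key=lambda i: sums[i])]
        d_min = min(dist[p][q], dist[r][s])
        by_dmin[d_min].append((L - M) / 2.0)

    # Average <delta> per dmin value, to inspect the worst-case growth.
    return {k: sum(v) / len(v) for k, v in sorted(by_dmin.items())}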

Graph visualization

We used gephi.org for graph presentation and for community structure detection by the maximum modularity method47.

ACKNOWLEDGEMENTS

The authors acknowledge the financial support from the Slovenian Research Agency under the program P1-0044 and from the Ministry of Education, Science and Technological Development of the Republic of Serbia, under the projects OI 171037, III 41011 and OI 174014.

REFERENCES

1. Pelaz, B. et al. The state of nanoparticle-based nanoscience and biotechnology: Progress, promises, and challenges. ACS Nano 6, 8468–8483 (2012).
2. Whitesides, G. M. & Grzybowski, B. Self-assembly at all scales. Science 295, 2418–2421 (2002).
3. Boles, M. A., Engel, M. & Talapin, D. V. Self-assembly of colloidal nanocrystals: From intricate structures to functional materials. Chemical Reviews 116, 11220–11289 (2016).
4. Meledandri, C. J., Stolarczyk, J. K. & Brougham, D. F. Hierarchical gold-decorated magnetic nanoparticle clusters with controlled size. ACS Nano 5, 1747–1755 (2011).
5. Wang, L., Xu, L., Kuang, H., Xu, C. & Kotov, N. A. Dynamic nanoparticle assemblies. Accounts of Chemical Research 45, 1916–1926 (2012).
6. Luo, D., Yan, C. & Wang, T. Interparticle forces underlying nanoparticle self-assemblies. Small 11, 5984–6008 (2015).
7. Liu, S. & Yu, J. Cooperative self-construction and enhanced optical absorption of nanoplates-assembled hierarchical Bi2WO6 flowers. Journal of Solid State Chemistry 181, 1048–1055 (2008).
8. Wang, R. et al. Self-replication of information-bearing nanoscale patterns. Nature 478, 225–229 (2011).
9. Rossi, L. et al. Cubic crystals from cubic colloids. Soft Matter 7, 4139–4142 (2011).
10. Gu, Y. et al. Collective alignment of nanorods in thin newtonian films. Soft Matter 9, 8532–8539 (2013).
11. Toulemon, D. et al. Enhanced collective magnetic properties induced by the controlled assembly of iron oxide nanoparticles in chains. Adv. Funct. Mat. 26, 1616–3028 (2016).
12. Xing, X. et al. Probing the low-energy structures of aluminum-magnesium alloy clusters: a detailed study. Phys. Chem. Chem. Phys. 18, 26177–26183 (2016).
13. Krivovichev, S. V. Combinatorial topology of salts of inorganic oxoacids: zero-, one- and two-dimensional units with corner-sharing between coordination polyhedra. Crystallography Reviews 10, 185–232 (2004).
14. Senyuk, B., Liu, Q., Bililign, E., Nystrom, P. D. & Smalyukh, I. I. Geometry-guided colloidal interactions and self-tiling of elastic dipoles formed by truncated pyramid particles in liquid crystals. Phys. Rev. E 91, 040501 (2015).
15. Kotani, M. & Ikeda, S. Materials inspired by mathematics. Science and Technology of Advanced Materials 17, 253–259 (2016).
16. Trefalt, G., Tadić, B. & Kosec, M. Formation of colloidal assemblies in suspensions for Pb(Mg1/3Nb2/3)O3 synthesis: Monte Carlo simulation study. Soft Matter 7, 5566–5577 (2011).
17. Živković, J. & Tadić, B. Nanonetworks: The graph theory framework for modeling nanoscale systems. Nanoscale Systems MMTA 2, 30–48 (2013).
18. Blunt, M. O. et al. Charge transport in cellular nanoparticle networks: Meandering through nanoscale mazes. Nano Letters 7, 855–860 (2007).
19. Šuvakov, M. & Tadić, B. Modeling collective charge transport in nanoparticle assemblies. Journal of Physics: Condensed Matter 22, 163201 (2010).
20. Tadić, B., Andjelković, M. & Šuvakov, M. The influence of architecture of nanoparticle networks on collective charge transport revealed by the fractal time series and topology of phase space manifolds. Journal of Coupled Systems and Multiscale Dynamics 4, 30–42 (2016).
21. Hirata, A. et al. Geometric frustration of icosahedron in metallic glasses. Science 341, 376–379 (2013).
22. Ikeda, S. & Kotani, M. A new direction in mathematics for materials science. Springer Briefs in the Mathematics of Materials 1 (Springer, Tokyo, 2015).
23. Šuvakov, M. & Tadić, B. Topology of Cell-Aggregated Planar Graphs. Computational Science–ICCS 2006: 6th International Conference, Reading, UK, May 28-31, 2006, Proceedings, Part III, Alexandrov, V. N., van Albada, G. D. & Sloot, P. Editors (Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, pp. 1098–1105, 2006).
24. Tesoro, S., Göpfrich, K., Kartanas, T., Keyser, U. F. & Ahnert, S. E. Nondeterministic self-assembly with asymmetric interactions. Phys. Rev. E 94, 022404 (2016).
25. Bianconi, G. & Rahmede, C. Emergent hyperbolic network geometry. Sci. Rep. 7, 41974 (2017).
26. Krioukov, D., Papadopoulos, F., Kitsak, M., Vahdat, A. & Boguñá, M. Hyperbolic geometry of complex networks. Phys. Rev. E 82, 036106 (2010).
27. Šuvakov, M. & Tadić, B. Transport processes on homogeneous planar graphs with scale-free loops. Physica A: Statistical Mechanics and its Applications 372, 354–361 (2006).
28. Toulemon, D. et al. Enhanced collective magnetic properties induced by the controlled assembly of iron oxide nanoparticles in chains. Advanced Functional Materials 26, 2454–2462 (2016).
29. Bollobás, B. Modern Graph Theory (Springer, New York, 1998).
30. Dorogovtsev, S. N. Lectures on Complex Networks. Oxford Master Series in Statistical, Computational, and Theoretical Physics (Oxford University Press, Oxford, 2010).
31. Kozlov, D. Combinatorial Algebraic Topology. Springer Series Algorithms and Computation in Mathematics 21 (Springer-Verlag, Berlin, Heidelberg, 2008).
32. Jonsson, J. Simplicial Complexes of Graphs. Lecture Notes in Mathematics (Springer-Verlag, Berlin, 2008).
33. Andjelković, M., Tadić, B., Maletić, S. & Rajković, M. Hierarchical sequencing of online social graphs. Physica A: Statistical Mechanics and its Applications 436, 582–595 (2015).
34. Andjelković, M., Gupte, N. & Tadić, B. Hidden geometry of traffic jamming. Phys. Rev. E 91, 052817 (2015).
35. Tadić, B., Andjelković, M., Boshkoska, B. M. & Levnajić, Z. Algebraic topology of multi-brain connectivity networks reveals dissimilarity in functional patterns during spoken communications. PLoS ONE 11, e0166787 (2016).
36. Andjelković, M., Tadić, B., Mitrović Dankulov, M., Rajković, M. & Melnik, R. Topology of innovation spaces in the knowledge networks emerging through questions-and-answers. PLoS ONE 11, e0154655 (2016).
37. Kennedy, W. S., Saniee, I. & Narayan, O. On the hyperbolicity of large-scale networks and its estimation. In 2016 IEEE International Conference on Big Data (Big Data), IEEE Xplore, pp. 3344–3351 (2016).
38. Albert, R., DasGupta, B. & Mobasheri, N. Topological implications of negative curvature for biological and social networks. Phys. Rev. E 89, 032811 (2014).
39. Narayan, O. & Saniee, I. Large-scale curvature of networks. Phys. Rev. E 84, 066108 (2011).
40. Šuvakov, M., Andjelković, M. & Tadić, B. Applet: Simplex aggregated growing graph. Date of access: 03/01/2018. http://suki.ipb.ac.rs/ggraph/ (2017).
41. Atkin, R. H. An algebra for patterns on a complex, II. International Journal of Man-Machine Studies 8, 483–498 (1976).
42. Johnson, J. Some structures and notation of Q-analysis. Environment and Planning B: Planning and Design 8, 73–86 (1981).
43. Beaumont, J. R. & Gatrell, A. C. An Introduction to Q-Analysis (Geo Abstracts, Norwich; printed by Edmund Nome Press, Norwich, 1982).
44. Bianconi, G., Rahmede, C. & Wu, Z. Complex quantum network geometries: Evolution and phase transitions. Phys. Rev. E 92, 022815 (2015).
45. Ball, R., Nauenberg, M. & Witten, T. A. Diffusion-controlled aggregation in the continuum approximation. Phys. Rev. A 29, 2017–2020 (1984).
46. Bron, C. & Kerbosch, J. Finding all cliques of an undirected graph. Comm. ACM 16, 575–577 (1973).
47. Blondel, V. D., Guillaume, J. L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 10, P10008 (2008).

Chapter 18

Using Graph Theory to Assess the Interaction between Cerebral Function, Brain Hemodynamics, and Systemic Variables in Premature Infants

Dries Hendrikx1,2, Liesbeth Thewissen3,4, Anne Smits3,4, Gunnar Naulaers3,4, Karel Allegaert3,5,6, Sabine Van Huffel1,2 and Alexander Caicedo1,2

1 Department of Electrical Engineering (ESAT), STADIUS, KU Leuven, Leuven, Belgium
2 imec, Leuven, Belgium
3 Department of Development and Regeneration, KU Leuven, Leuven, Belgium
4 Department of Neonatology, UZ Leuven, Leuven, Belgium
5 Department of Pediatric Surgery and Intensive Care, Erasmus MC-Sophia Children’s Hospital, Rotterdam, Netherlands
6 Department of Neonatology, Erasmus MC-Sophia Children’s Hospital, Rotterdam, Netherlands

Citation (APA): Hendrikx, D., Thewissen, L., Smits, A., Naulaers, G., Allegaert, K., Van Huffel, S., & Caicedo, A. (2018). Using graph theory to assess the interaction between cerebral function, brain hemodynamics, and systemic variables in premature infants. Complexity, 2018. (15 pages) DOI: https://doi.org/10.1155/2018/6504039

Copyright: © 2018 Dries Hendrikx et al. This is an open access article distributed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License.

ABSTRACT

Graphs can be used to describe a great variety of real-world situations and have therefore been used extensively in different fields. In the present analysis, we use graphs to study the interaction between cerebral function, brain hemodynamics, and systemic variables in premature neonates. We used data from a propofol dose-finding and pharmacodynamics study as a model in order to evaluate the performance of the graph measures to monitor signal interactions. Concomitant measurements of heart rate, mean arterial blood pressure, arterial oxygen saturation, regional cerebral oxygen saturation—measured by means of near-infrared spectroscopy—and electroencephalography were performed in 22 neonates undergoing INSURE (intubation, surfactant administration, and extubation). The graphs used to study the interaction between these signal modalities were constructed using the RBF kernel. Results indicate that propofol induces a decrease in the signal interaction up to 90 minutes after propofol administration, which is consistent with clinical observations published previously. The clinical recovery phase is mainly determined by the EEG dynamics, which were observed to recover much more slowly than the other modalities. In addition, we found a more pronounced loss in cerebral-systemic interactions with increasing propofol dose.

INTRODUCTION

A graph is a structure that can be used to represent the relation between different objects. In this context, a graph can be thought of as a diagram which consists of a set of points, where some or all of them are joined by lines. Formally, the points of the graph are referred to as vertices or nodes, whereas the lines between them are called edges or links. In general, graphs can be used to describe a great variety of real-world situations [1]. Think, for example, of a social network, where people are represented by nodes and the edges between the nodes are used to indicate friendship. Another example is a geographic network of cities, with an edge between two cities indicating a direct connection through a highway. In addition to the presence (or lack) of an edge connecting two nodes, extra measurements can be associated with the edges. These measurements are formally referred to as edge weights. In a social network, edge weights could be used to denote the strength of the friendship (acquaintances, close friends, etc.). In a geographic network, the weights can indicate the physical distance or the amount of traffic typically encountered on each road. Mathematically, this type of diagram corresponds to a weighted graph.

In the present analysis, weighted graphs are used to study the interaction of cerebral function, brain hemodynamics, and systemic variables in premature neonates. Multiple studies are available in the literature that studied the pairwise interactions between some of these variables. Caicedo et al. analysed the relation between mean arterial blood pressure (MABP) and regional cerebral oxygen saturation (rScO2), measured by means of near-infrared spectroscopy (NIRS) [2]. The coupling between these two variables, defined using a transfer function approach, was found to be a measure to assess cerebral autoregulation. Semenova et al. examined the relation between MABP and electroencephalography (EEG) [3]. The authors documented that preterm infants with a high clinical risk index for babies (CRIB) score were associated with a higher nonlinear coupling between EEG activity and MABP, quantified by means of mutual information. Tataranno et al. examined the relation between rScO2 and EEG and found that increased oxygen extraction was related to spontaneous activity transients observed in the EEG [4]. In contrast to the studies mentioned above, we aim to analyse the interaction between cerebral and systemic variables using an extended multimodal approach, integrating three systemic variables (heart rate (HR), MABP, and arterial oxygen saturation (SaO2)) together with rScO2 and EEG.

This study is situated within the interdisciplinary field of network physiology, which analyses how diverse physiologic systems dynamically interact and collectively behave to produce distinct physiologic states and functions [5]. Moreover, the use of graphs enables a graphical representation of the interaction between the different physiological systems in time. This study shows for the first time a comprehensive model of different physiological processes, comprising autoregulation, neurovascular coupling, and the baroreflex, working at the same moment in time. In the literature, most studies focus on these processes individually, without taking into account the influence of the other processes. With the graph approach outlined in this paper, we try to show the different processes, their interaction, and the importance of the individual processes at each moment in time. To the best of our knowledge, this is a totally new mindset and way of showing the physiological interaction between cerebral function, brain hemodynamics, and systemic variables in neonates. The interaction between the different variables is studied using premedication by means of propofol as a model. Propofol (2,6-diisopropylphenol) is a short-acting anesthetic: it has a rapid onset of action and is generally short in duration. In neonates, however, it is documented that clinical recovery takes time [6]. In clinical practice, propofol is administered to the neonates as a single intravenous (IV) bolus. Propofol administration is frequently associated with a decrease in MABP in neonates [6–11], children [12], and adults [13–15]. Propofol distributes into the central nervous system and fat tissue immediately after intravenous dosing, which explains the rapid onset of this anesthetic drug. In a secondary phase, propofol is redistributed into the circulation, which leads to vasodilation. Combined with the blunted reflex tachycardia, this can result in hypotension [10]. Therefore, a decrease in MABP is observed up to one hour after administration of propofol in neonates [8]. Premedicating neonates with propofol generally causes a modest and short-lasting decrease in HR, SaO2, and rScO2, as opposed to the longer-lasting and more pronounced decrease in MABP [8], [11, 16, 17]. In addition, the discontinuity pattern of the EEG is also influenced by propofol, which induces a reversible state of diminished responsiveness behaviorally similar to quiet (nonrapid eye movement (NREM)) sleep [18]. During quiet sleep, the EEG of premature neonates shows a spontaneous, physiological discontinuity of electrical activity, characterized by higher-amplitude, lower-frequency EEG rhythms (tracé alternant (TA)) [19, 20]. This phenomenon is generally referred to as burst suppression, which corresponds to an increase in interburst interval (IBI) duration [21, 22]. Moreover, a larger IBI duration is associated with smaller FTOE values, which indicate lower brain energy consumption [23].

This paper is structured as follows. Section 2 describes the dataset used in the present analysis. Section 3 discusses the methods, which include EEG processing, the construction of the graph models, and the definition of features computed from the graph models to quantify the strength of the effect of propofol on these interactions. Section 4 presents the results of the paper, which are extensively discussed in Section 5. Finally, Section 6 summarizes the conclusions.

DATASET

The dataset used in the present analysis was collected as part of a study on propofol dose selection by Smits et al. [6]. In the study, 50 neonates were sedated using propofol as part of an endotracheal intubation procedure. All subjects in the group of study were recruited at the NICU of the University Hospitals Leuven, Gasthuisberg. The trial was registered on ClinicalTrials.gov (NCT01621373), and ethical approval was provided by the ethical committee at the University Hospitals Leuven. Due to incomplete data and overly noisy channels found in 28 neonates, only 22 of the 50 neonates are included in this study. These neonates were all sedated using propofol as part of an INSURE (intubation, surfactant, and extubation) procedure. The neonates are characterized by median (range) postmenstrual ages (PMA) of 30 (26–35) weeks and a median (range) dose of propofol (Diprivan 1%; AstraZeneca, Brussels, Belgium) of 1.0 (0.5–4.5) mg·kg−1. In the present analysis, the neonates are stratified into three groups, based on PMA, since this is a major covariate of propofol clearance in the absence of variability in postnatal age (PNA) [24]. These groups are generally referred to as extremely preterm (group 1), very preterm (group 2), and moderate to late preterm (group 3). More details about the clinical characteristics of the subjects can be found in the original paper by Smits et al. [6].

Practices on propofol dosing, particularly in highly vulnerable premature neonates, are not standardized and vary between different NICUs. Multiple studies, however, indicate that propofol dose values of 2.0 to 2.5 mg·kg−1 should be used as preintubation medication in premature neonates [9–11, 16]. The dataset used in the present analysis was collected with the aim to find the median effective dose (ED50) of propofol for sedation. Therefore, lower values of starting propofol dose were used, as indicated in Table 1. More specifically, the administered dose ranges from 0.5 to 4.5 mg·kg−1 [6]. In general, the oldest neonates were sedated using higher propofol doses compared to the youngest neonates, as can be observed from Table 1.

Table 1. Stratification of the neonates into three age groups, based on postmenstrual age (PMA) in weeks. For each group, the number of patients, postnatal age (PNA) of the patients, and propofol dose values administered to the subjects in the group are presented.

The multimodal dataset used in this study consists of concomitant measurements of five signal modalities, comprising HR, MABP, SaO2, rScO2, and EEG, recorded from 5 minutes before propofol administration up to 10 hours after. For each neonate, a 6-hour long segment of multimodal data was considered in the analysis, where t = 0 was aligned with the moment of propofol administration. This length was defined based on the shortest recording found in the dataset. Thus, all signals were shortened to six hours for all patients in order to provide uniformity. Moreover, the use of a long time window of 6 hours allows focusing on the regime of interest, since we can study the effect of propofol together with the recovery of the neonates from the drug. Propofol is a three-compartment drug, characterized by short α and β half-lives (median estimates of 1 and 13 minutes, respectively) and a long γ half-life (median estimate of 350 minutes) [26, 27]. The pharmacodynamic effects are primarily associated with the first (α) and second (β) exponential half-life, which indicates that the effect of propofol at the end of the analysis window is minimal. This is confirmed by Smits et al., Vanderhaegen et al., and Ghanta et al., who all observed a clinical recuperation from single intravenous bolus propofol administration within the first hour in neonates [6, 8, 16]. Therefore, the analysis window is divided into two parts: the first 3-hour long time window is used to study the response of the neonates to propofol and the intubation procedure, while the last 3 hours are used as reference. Figure 1 presents an example of a 6-hour long segment of multimodal data for one neonate from the group of study.

The systemic variables (HR (beats/min), MABP (mmHg), and SaO2 (%)) were measured with an IntelliVue MP70 monitor (Philips, Eindhoven, The Netherlands) with a Nellcor pulse oximeter. These variables were recorded continuously with a sampling frequency of 1 Hz (Rugloop; Demed, Temse, Belgium). All 22 neonates incorporated in the present analysis had an arterial line, which enabled an invasive measurement of MABP. NIRS was used to measure rScO2 (%) noninvasively with an INVOS 5100 device using a cerebral neonatal OxyAlert NIRS sensor (Covidien, Mansfield, Massachusetts). As for the systemic variables, the sampling frequency for rScO2 is equal to 1 Hz. Cerebral functioning was assessed using a one-channel EEG (μV). The EEG was measured between the C3 and C4 electrodes according to the international 10–20 system with a sampling frequency of 100 Hz (Olympic Cerebral Function Monitor 6000, Natus). EEG segments with impedance values exceeding 10 kΩ were removed from the raw EEG signal [28]. In addition, movement artifacts, identified as rapid changes in the impedance measurement, were detected and also removed from the raw EEG signal.

METHODS

Running Interburst Interval Duration

In general, EEG signals of premature neonates alternate between periods of activity, called bursts or burst intervals (BIs), and periods of suppressed activity, referred to as IBIs. Thus, the morphology of neonatal EEG is discontinuous, as indicated by the IBIs. However, this discontinuous pattern evolves towards a more continuous trace with increasing PMA. Therefore, some studies have investigated the use of the length of the IBIs as a marker for maturation [29, 30].

Due to the different temporal characteristics between the EEG and all other signal modalities, the EEG signals are processed in order to obtain surrogates for brain activity in a similar time frame as the other measured signals. The EEG signal is segmented in burst and IBI segments using an in-house algorithm based on the line length [31]. The root mean squared (RMS) value and the duration in time for bursts and IBIs in overlapping windows of two minutes are used as surrogates for EEG. The running window is shifted by one second, producing a new score every second. In this way, the surrogate measures for EEG have the same sampling frequency as the other signal modalities.
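As a rough illustration of this preprocessing step (the in-house line-length burst detector itself is not reproduced here), the sketch below derives a 1 Hz running IBI-duration surrogate from a hypothetical boolean mask is_ibi sampled at 100 Hz; taking the mean IBI-segment duration per window is our own assumption.

import numpy as np


def running_ibi_duration(is_ibi, fs=100, win_s=120, step_s=1):
    # Mean duration (in seconds) of IBI segments inside a 2-minute window
    # shifted by 1 second, so the output is sampled at 1 Hz like HR or MABP.
    win, step = win_s * fs, step_s * fs
    out = []
    for start in range(0, len(is_ibi) - win + 1, step):
        seg = np.asarray(is_ibi[start:start + win], dtype=int)
        # Run-length encode the IBI mask: rises and falls of the padded signal.
        edges = np.flatnonzero(np.diff(np.r_[0, seg, 0]))
        run_lengths = edges[1::2] - edges[::2]
        out.append(run_lengths.mean() / fs if run_lengths.size else 0.0)
    return np.asarray(out)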

Figure 1. Illustration of the 5 signal modalities used to construct graph models for the neonates. A 6-hour long segment of multimodal data is presented for one neonate in the group of study (PMA 27 weeks, 0.5 mg·kg−1).

In total, five features are computed from the discontinuous neonatal EEG: running RMS values of the original EEG, BIs, and IBIs and running duration values of the BIs and IBIs. In this paper, we only report the results using the running IBI duration, since this is a very robust measure for EEG activity, and thus cerebral metabolism, as validated by our group in a previous study [31]. In addition, this measure is highly interpretable. It is important to note, however, that the other EEG features indicate similar results, since the different feature values are highly related. An example of the five EEG features is presented in Figure 2.

Graph Model Developed for This Study

In order to quantify the common dynamics of the different signal modalities, and changes thereof due to propofol, the interaction between the variables is modeled using a graph, as illustrated in Figure 3. In general, a graph is defined by a nonzero number of vertices (nodes) and a number of edges (links, connections) between these nodes. The model for the neonates is constructed using a complete graph. A complete graph is characterized by the presence of an edge between all the vertices. The vertex set V of the graph consists of n = 5 vertices, corresponding to the 5 signal modalities measured in the present analysis, that is,

V = {HR, MABP, SaO2, rScO2, EEG}. (1)

A complete graph with n vertices has m = n(n − 1)/2 edges. Therefore, the edge set E of the graph considered here consists of 10 edges. The vertices of the graph model defined in (1) are connected by edges. These edges are defined by the corresponding edge weight values, which are generally used to assess the strength of the connection between a pair of vertices. The topology of the complete graph described in (1) is assumed to be fixed in time. The edge weights, however, change in time, which we hypothesize to reflect the changes in the interaction between the different signals. In order to compute the graph models, the signals are first normalized to N(0, 1), since we are interested in the assessment of common dynamics (signal trends in time) and not absolute values of the signals. Next, the edge weights are computed using a 15-minute long running window of multimodal data, which is shifted by 1 minute (14 minutes overlap). Thus, new edge weight values are computed every minute. Finally, two types of interaction curves are extracted from the graph models: the pairwise interaction between two signal modalities, represented by the time course of the corresponding edge weight, and the overall signal interaction, represented by the graph average degree (see Section 3.4).
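The sliding-window construction described above can be sketched as follows. This is an illustrative skeleton under our own assumptions (a dictionary of 1 Hz signals already normalized, and a similarity callable such as the kernels defined in the next subsection), not the study code.

from itertools import combinations

import numpy as np

MODALITIES = ["HR", "MABP", "SaO2", "rScO2", "EEG"]   # vertex set V of eq. (1)


def edge_weight_series(signals, similarity, fs=1, win_min=15, step_min=1):
    # signals: dict modality -> 1-D array sampled at fs Hz, normalized to N(0, 1);
    # similarity: callable (xi, xj) -> weight in [0, 1], e.g. an RBF kernel.
    win, step = win_min * 60 * fs, step_min * 60 * fs
    n_samples = min(len(x) for x in signals.values())
    weights = {edge: [] for edge in combinations(MODALITIES, 2)}   # 10 edges
    for start in range(0, n_samples - win + 1, step):
        for a, b in weights:
            xi = signals[a][start:start + win]
            xj = signals[b][start:start + win]
            weights[(a, b)].append(similarity(xi, xj))
    return {edge: np.asarray(w) for edge, w in weights.items()}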

Figure 2. Illustration of the features computed from the EEG signal. (a) Illustrates a 6-hour long EEG segment for one neonate in the group of study (PMA 27 weeks, 0.5 mg·kg−1). (b) Illustrates the running RMS value. (c) and (d) illustrate the running RMS and running duration values for BIs (black) and IBIs (gray), respectively.

Figure 3. Physiological network representing the interaction between 5 signal modalities recorded on a neonate after propofol administration. The graph consists of 5 vertices, corresponding to the signal modalities. In addition, an edge is present between every pair of nodes (complete graph). Each edge is defined by a weight value that represents the interaction between the corresponding signal modalities.

In the present analysis, weight values are used to denote the interaction between two vertices, that is, two signal modalities. If two modalities are characterized by common nonlinear interactions, they follow the same trends in time. We compute the pairwise similarity using two different similarity measures. Consequently, we generate two graph models for each neonate. Both similarity measures use the radial basis function (RBF) kernel, which is a nonlinear similarity measure. As such, the similarity of the different signals is assessed in a possibly infinite-dimensional feature space, defined by a nonlinear map. However, the similarity in this feature space is computed implicitly using the RBF kernel function. The first similarity measure kT(xi, xj) uses the raw signals in the RBF kernel and is thus defined as

kT(xi, xj) = exp(−‖xi − xj‖² / (2σ²)), (2)

where xi and xj represent two segments of multimodal data [32] (subscript T indicates that time domain signals are used for the Euclidean distance in the exponent of the RBF kernel). In the present analysis, xi and xj are segments with a length of 15 minutes, as mentioned before. The similarity kT(xi, xj) is bounded by 0 (absence of common interactions) and 1 (exact common interactions). The signal similarity computed by (2) is a function of the Euclidean distance between input signals. Consequently, it highly depends on signal amplitudes and can be affected by delays between the signals. A graph model computed using the similarity measure kT(xi, xj) is denoted as GT.

The second similarity measure uses the power spectral density (PSD) of the signals in the RBF kernel. Thus, the time input data is transformed to the frequency domain, before computing the RBF kernel function. Mathematically, this similarity measure kF (xi , xj) is defined as

kF(xi, xj) = exp(−‖Sxi − Sxj‖² / (2σ²)), (3)

where Sxi and Sxj represent the PSDs of input signals xi and xj (length of 15 minutes), respectively (subscript F indicates that frequency domain signals are used for the Euclidean distance in the exponent of the RBF kernel). The PSD is computed using Welch’s method with overlapping subwindows of 5 minutes in order to reduce the noise in the PSD estimate (Hamming window, overlap of 4 minutes and 59 seconds). Note that the kernel presented in (3) is a valid positive definite kernel, since the input data is transformed before application of the kernel function. As before, the similarity defined by kF(xi, xj) is bounded by 0 and 1. The transformation to the frequency domain makes it possible to include time-delayed signal interactions and interactions of opposite sign, in contrast to kT(xi, xj), which only takes into account instantaneous amplitude interactions. In physiological systems, it is possible that if one signal increases (decreases), another signal decreases (increases) to maintain homeostasis and that this interaction is not instantaneous but delayed. A graph model computed using the similarity measure kF (xi, xj) is denoted as GF.
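A compact sketch of the two similarity measures is given below. It assumes the standard Gaussian RBF form exp(−‖·‖²/(2σ²)), 1 Hz inputs, and scipy’s Welch estimator with 5-minute Hamming subwindows; the exact normalization of the exponent used by the authors is not recoverable from the text, so treat it as an assumption.

import numpy as np
from scipy.signal import welch


def k_time(xi, xj, sigma):
    # k_T: RBF kernel on the raw (normalized) 15-minute segments.
    return float(np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2)))


def k_freq(xi, xj, sigma, fs=1.0):
    # k_F: RBF kernel on Welch power spectral densities, which makes the
    # similarity insensitive to delays and to interactions of opposite sign.
    nperseg = int(5 * 60 * fs)              # 5-minute subwindows
    noverlap = nperseg - int(fs)            # 4 minutes 59 seconds overlap at 1 Hz
    _, Si = welch(xi, fs=fs, window="hamming", nperseg=nperseg, noverlap=noverlap)
    _, Sj = welch(xj, fs=fs, window="hamming", nperseg=nperseg, noverlap=noverlap)
    return float(np.exp(-np.sum((Si - Sj) ** 2) / (2.0 * sigma ** 2)))

In this sketch, passing lambda a, b: k_time(a, b, sigma_opt) as the similarity argument of edge_weight_series above would yield the pairwise interaction curves for GT, and likewise with k_freq for GF.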

Kernel Tuning

In order to compute the similarity measure, the bandwidth σ of the RBF kernel should be tuned, that is, optimized to avoid kernel overfitting and underfitting. In the present analysis, the similarity measures kT(xi, xj) and kF(xi, xj) both depend on this parameter σ. The optimization procedure is the same for both similarity measures. Therefore, it is outlined in terms of k(xi, xj), which represents the two similarity measures. The strategy used to select the kernel bandwidth for the present analysis considers the kernel matrix Ω, which is defined as

Ωij = k(xi, xj). (4)

Note that the kernel matrix Ω is defined by the kernel bandwidth σ through the definitions presented in (2) and (3). The kernel bandwidth σ is tuned by maximizing the Shannon entropy of the kernel matrix Ω. The Shannon entropy H(Ω) is defined as

H(Ω) = −Σk pk log(pk), (5)

where pk is equal to the probability of seeing the kth possible element of matrix Ω. The entropy is thus determined by estimation of the probability density function (PDF) of matrix Ω. By maximizing the Shannon entropy, we try to obtain a uniform distribution of the values in the kernel matrix, and therefore, we avoid overfitting as well as underfitting. The kernel bandwidth is tuned for each neonate individually. The tuned bandwidth is denoted as σopt. The following optimization problem is defined to estimate σopt:

σopt = arg maxσ H(ΩC), (6)

with

ΩC = [Ω1 Ω2 ··· ΩN], (7)

where ΩC is a collection of kernel matrices, computed from all the signal segments recorded per neonate. Thus, a collection of kernel matrices is computed from the 6-hour long data segment instead of only one kernel matrix in the optimization procedure. If we were to consider only one kernel matrix per neonate, it would contain only 25 entries, since the kernel matrix is a 5 × 5 matrix. Clearly, this is not enough data to estimate a robust PDF. Therefore, to solve this problem, we assume that the graph model does not change and that it is situated in the same nonlinear subspace throughout the 6-hour long analysis window. This assumption indicates that σopt should be uniform throughout the analysis window and that σopt can be computed using a concatenation of kernel matrices ΩC, as defined in the optimization problem in (6) and (7). Figure 4 illustrates the optimization procedure in a schematic way. The original data segment of 6 hours was segmented into nonoverlapping segments of 15 minutes. Thus, N = 24 signal segments of 15 minutes were defined. For each of these segments l, the kernel matrix Ωl was computed, and all these kernel matrices Ωl were concatenated as indicated in (7). The use of a collection of kernel matrices allows estimation of the probability density function and, consequently, the Shannon entropy. Therefore, H(ΩC) is characterized by one global maximum. For the group of study, median (range) values of σopt are 27 (26–29) and 94 (86–113) for kT(xi, xj) and kF(xi, xj), respectively.
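The tuning step can be prototyped as below. This is only a sketch under our own assumptions: the distribution of kernel values is estimated with a fixed-bin histogram (the text does not specify the density estimator), and the bandwidth is selected by a simple grid search.

import numpy as np


def shannon_entropy(values, bins=20):
    # Histogram-based estimate of the Shannon entropy of the kernel values.
    counts, _ = np.histogram(values, bins=bins, range=(0.0, 1.0))
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def tune_bandwidth(segments, kernel, sigmas):
    # segments: list of N = 24 non-overlapping 15-minute windows, each a dict
    # modality -> 1-D array; kernel(xi, xj, sigma) is k_time or k_freq.
    modalities = list(segments[0].keys())
    best_sigma, best_h = None, -np.inf
    for sigma in sigmas:                                   # grid search over sigma
        entries = [kernel(seg[a], seg[b], sigma)
                   for seg in segments                     # one 5 x 5 matrix per segment
                   for a in modalities
                   for b in modalities]
        h = shannon_entropy(np.asarray(entries))           # entropy of the concatenation
        if h > best_h:
            best_sigma, best_h = sigma, h
    return best_sigma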

Graph Measures

In order to assess the overall interaction of the multimodal dataset, the average degree of the graph is used. This section introduces the adjacency matrix A of a graph, the degree di of a vertex, and the average degree δ(G) of a graph G.

Adjacency Matrix

A weighted graph G consists of a nonempty finite set V of elements called vertices vi (or nodes) and a finite set E of distinct unordered pairs of distinct elements of V called edges wij (or links) [33]. Note that the edges of the graph are represented by their weights wij. The adjacency matrix A is a matrix commonly used to define the graph G. The adjacency matrix A denotes the presence of edges between the vertices vi of V and their corresponding weights. More precisely, the adjacency matrix A is constructed as

Aij = wij for i ≠ j, and Aii = 0. (8)

Figure 4. Method used to tune the kernel bandwidth σ. In (a), the data is segmented in nonoverlapping signal segments of 15 minutes. For all of these segments, a kernel matrix Ωl is computed using a predefined σgs (4). All the individual kernel matrices Ωl are concatenated in ΩC, which is depicted in (b). Next, the Shannon entropy of ΩC is computed. This procedure is repeated for a range of σ values. The σ value associated with maximal H (ΩC) is selected as the bandwidth for the kernel function.

Vertex Degree

The degree dj associated with a vertex vj of an undirected weighted graph G, with adjacency matrix A, is defined as the sum of all edges incident to vj:

dj = Σi=1,…,n wij, (9)

where n is the number of vertices. Therefore, the degree dj characterizes the connection strength of the vertex vj with respect to the other vertices of the graph. In practice, the weights of the edges of a graph are often restricted to a predefined range, which is often normalized to wij ∈ [0, 1]. Considering normalized weights, the degree is bounded by 0 and n − 1, where n is the number of vertices of the graph, that is,

0 ≤ dj ≤ n − 1. (10)

If dj = 0, vertex vj is called an isolated vertex, since it is not connected to any other vertex of the graph. A vertex degree dj = n − 1 indicates a dominating vertex vj, connected to all other vertices of the graph with edge weight equal to 1.

Average Degree

The average degree δ(G) of a graph G is defined as the mean value of all vertex degrees dj

δ(G) = (1/n) Σj=1,…,n dj, (11)

and is a measure associated with the overall connectivity of the graph. Evidently, the bounds of δ (G) are equal to those of the individual vertex degree dj defined in (10). Small values (close to 0) imply a weak connectivity, whereas high values (close to n − 1) indicate a very strong connectivity of the graph.
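For one 15-minute window, the quantities in (8)-(11) reduce to a few lines of array code. The sketch below is our own illustration, with the kernel, bandwidth, and modality list passed in as assumed inputs.

import numpy as np


def adjacency_matrix(segment, kernel, sigma, modalities):
    # A_ij = w_ij for i != j and 0 on the diagonal, as in equation (8).
    n = len(modalities)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            w = kernel(segment[modalities[i]], segment[modalities[j]], sigma)
            A[i, j] = A[j, i] = w
    return A


def vertex_degrees(A):
    # d_j = sum_i w_ij, bounded by 0 and n - 1 for weights in [0, 1] (eqs. (9)-(10)).
    return A.sum(axis=0)


def average_degree(A):
    # delta(G): mean of the vertex degrees, the overall connectivity measure (eq. (11)).
    return float(vertex_degrees(A).mean())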

Features to Quantify Interaction Strength

In order to quantify the strength of the changes in signal interaction, two features are computed from the interaction curves: the normalized area S between the interaction curve and the reference level, and the maximal deviation Δ from the reference level. Both feature values are computed in a time frame from 0 to 90 minutes after propofol administration. Reference levels are defined as the median value of an interaction curve in a time frame from 180 to 360 minutes after propofol administration, as mentioned before. Normalization of S is done by dividing the area by the length of the time interval.

Figure 5. Features used to quantify the reduction in signal interaction strength: S (gray shaded area) (a) and Δ (gray arrow) (b). The feature values are illustrated for one neonate in the group of study (PMA 30 weeks, 2.5 mg·kg−1 ), where the pairwise interaction was computed using kT (xi , xj) (2). Feature values S and Δ are computed from 0 to 90 minutes, while the reference level is defined as the median value of the interaction curve from 180 to 360 minutes.

Note that S and Δ are bounded by 0 (no deviation from the reference level) and 1 (very strong deviation from the reference level). Figure 5 presents a graphical example of S (Figure 5(a)) and Δ (Figure 5(b)). The features are computed from the interaction curves in order to assess the effect of propofol on the dynamical interactions among the different signal modalities. In addition, we investigated how these features change with PMA and propofol dose. In the present analysis, the relation between the feature values S and Δ (dependent variables) and PMA and propofol dose (predictor variables) is studied using linear regression models. The coefficient of determination R²i is used to indicate the goodness of fit of the linear model (subscript i denotes the predictor variable i). In addition, the coefficient of partial determination was computed to account for the effect of both predictor variables at the same time. The significance of the coefficient of (partial) determination was assessed using the Monte Carlo permutation test with 10⁵ repetitions. A p value < 0.05 was defined to be statistically significant. A single asterisk, double asterisks, and triple asterisks denote a p value smaller than 0.05, 0.01, and 0.001, respectively.
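The two features can be computed directly from an interaction curve sampled once per minute, as in the sketch below. It is an illustration under our own assumptions (absolute deviation from the reference level, trapezoidal integration), not the authors’ exact implementation.

import numpy as np


def interaction_features(curve, t_min):
    # curve: edge weight or average degree per window; t_min: window times in minutes.
    curve, t_min = np.asarray(curve, dtype=float), np.asarray(t_min, dtype=float)
    reference = np.median(curve[(t_min >= 180) & (t_min <= 360)])
    effect = (t_min >= 0) & (t_min <= 90)
    deviation = np.abs(reference - curve[effect])
    # S: area between the curve and the reference level, normalized by the interval length.
    S = np.trapz(deviation, t_min[effect]) / (t_min[effect][-1] - t_min[effect][0])
    # Delta: maximal deviation from the reference level.
    Delta = deviation.max()
    return S, Delta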

Implementation

The analysis, the corresponding computations, and the figures presented throughout this study are implemented using MATLAB Release 2016b (The MathWorks, Natick, Massachusetts). Graph theory analysis is performed using the MATLAB toolbox for network analysis provided by MIT Strategic Engineering [34].

RESULTS

MABP-EEG Pairwise Interaction

The interaction curves of MABP with respect to EEG after administration of propofol at t = 0 minutes are illustrated in Figure 6. These curves are computed using kT(xi, xj), defined in (2). The EEG signal is represented by the running IBI duration, as outlined in Section 3. From top to bottom, the interaction pattern is shown for the entire group of study (N = 22) and the individual age groups presented in Table 1. First, a pronounced loss in interaction is observed, followed by a gradual increase to a reference level, which is in general reached at t = 90 minutes. Note that this loss in interaction is present among all of the signal modalities of the multimodal dataset, as indicated by the graphs in Figure 7. Figure 8 presents the relation between the features used to quantify interaction strength (S and Δ) and PMA and propofol dose. In addition to the data points, the least squares linear fit is shown (straight lines), together with the 0.95 percentiles of the linear fit (shaded area). The goodness of the linear fit is assessed using the coefficient of determination R²i, which is equal to R²A = 0.09 and R²D = 0.53 for feature S and R²A = 0.17 and R²D = 0.30 for feature Δ (subscripts A and D are used to denote PMA and dose, respectively). Since PMA and dose are correlated (Pearson correlation coefficient rAD = 0.45), we also define the coefficient of partial determination in order to account for the effect of both predictor variables on features S and Δ. Numerical values are equal to R²A|D = 0.002 and R²D|A = 0.49 for feature S and R²A|D = 0.05 and R²D|A = 0.20 for feature Δ. The statistical significance of the coefficients of (partial) determination is denoted in Figure 8. Finally, it is important to note that PMA and dose are not collinear using a linear model. This can be assessed by computing the variance inflation factor (VIF) [35], which is equal to VIF = 1.2572. A VIF close to 1 indicates the lack of collinearity.

Figure 6: Signal interaction between MABP and EEG after administration of propofol at t = 0 minutes. The signal interaction was computed using kT (xi , xj) . A reduction in interaction is observed among the different signal modalities after the administration of propofol, with a slow recovery to the reference level. The black line and gray shaded area present the median and interquartile range (IQR), respectively.

Overall Interactions

Figure 9(a) presents a comparison of the vertex degree di (in red, the interaction of modality i with respect to the other modalities) with the average degree δ(GT) (in black, the average interaction of all signal modalities) for all of the signal modalities after administration of propofol at t = 0 minutes. The curves are computed from graph models constructed using the similarity measure kT(xi, xj) (2). The results are presented for the whole group of study (N = 22). Propofol-induced loss of interaction among the signals is associated with a drop in δ(GT). The drop in average graph degree can also be observed in Figure 7, which illustrates the graph model for one neonate in the group of study at different time instances. As shown in Figure 9(a), the δ(GT) value is highly determined by dMABP during the first 30 minutes. Indeed, the MABP vertex degree is considerably lower compared to the degree of the other modalities in this time frame. From 30 minutes onwards, the increase of δ(GT) to the reference level is highly influenced by dEEG, which is associated with the slowest recovery in signal dynamics. Figure 9(b) shows the vertex degree di (red) with the graph average degree δ(GF) (black) after propofol administration at t = 0 for the graph models constructed using the second similarity measure, that is, kF(xi, xj) (3). As before, the results are presented for the whole group of study (N = 22). A reduction in interaction can be observed after propofol administration, which is in agreement with the results of Figure 9(a). Again, MABP is observed to be the contributing factor in the propofol-induced loss of interaction during the first 30 minutes after propofol administration.

Figure 7. Changes in the physiological network, assessed using a graph model GT, for one neonate in the group of study (PMA 30 weeks, 1.0 mg·kg−1) at three different time instances: plots (a), (b), and (c) illustrate the edge weights for t = 10, 30, and 180 minutes after propofol administration, respectively. The graph model was constructed using kT(xi, xj) defined in (2). Under each graph, the average graph degree δ(GT) is presented in a time frame starting right after propofol administration (t = 0) up to 6 hours after. The average graph degree measures the average connection strength of the graph edges. From (a) to (c), the edge weights increase, which translates into an increased δ(GT).

Indeed, this vertex is associated with the lowest degree values during this time frame. From 30 minutes onwards, the increase of δ(GF) is again influenced by EEG dynamics. This effect is, however, less pronounced compared to the observation of Figure 9(a). In general, the results from kT(xi, xj) and kF(xi, xj) are similar, which might indicate that time-delayed interactions and/or interactions of opposite sign are not present in our dataset or that the influence of those interactions is not relevant, probably due to the length of the analysis window (15 minutes) that we used in the analysis.

DISCUSSION

In the present analysis, we study how different physiologic systems dynamically interact and collectively behave after a propofol bolus administration in preterm neonates. These physiologic systems are represented by the different signal modalities under study. Note that we focus on the interaction between the brain and the cardiovascular system. This study can therefore be situated in the interdisciplinary field of network physiology [5]. Results indicate that propofol causes a change in the dynamical interactions between the different signals up to 90 minutes after propofol administration. The strength of this effect was observed to be mainly determined by propofol dose. In addition, the recovery phase was observed to be mainly determined by EEG dynamics, due to a much slower recovery to the reference level compared to the other signal modalities.

MABP-EEG Pairwise Interaction

Sedation of neonates using propofol induces a reduction in the interaction between MABP and EEG (Figure 6), with only a slow, gradual increase back to the reference level. The most pronounced decrease in interaction pattern is associated with the oldest neonates in the group of study (moderate to late preterm): a strong loss of interaction is observed during the first 60 minutes after propofol administration, followed by a brisk increase back to baseline (Figure 6(d)). This pattern clearly differs from that of the younger neonates (extremely to very preterm), which are characterized by a less pronounced reduction in interaction and a more gradual increase back to reference levels (Figures 6(b) and 6(c)). Two possible indicators for the observed difference in signal interaction patterns are proposed. Both indicators are based on signal amplitude changes, since the signal interaction measure kT(xi, xj) highly depends on signal amplitudes. Firstly, the discontinuity pattern of neonatal EEG changes with age. Especially the oldest neonates (moderate to late preterm) are characterized by a much more continuous EEG pattern (tracé continu) compared to the younger neonates (extremely to very preterm; tracé discontinu) [30]. A more continuous EEG can result in a more pronounced increase in IBI duration after propofol, potentially explaining the more pronounced loss in signal interaction observed among the oldest neonates in the group of study. Secondly, Simons et al. observed a higher incidence of hypotension with increasing dose of propofol [10]. In this study, higher doses were administered to older neonates, as demonstrated by Table 1. Evidently, a more pronounced impact on MABP can be responsible for a stronger loss in signal interaction. Since PMA and propofol dose (predictor variables) are correlated (rAD = 0.45), the influence of each factor on the resulting signal interaction pattern is assessed using features S and Δ (dependent variables). Figure 8 presents the relation between these features and PMA and propofol dose. From Figure 8, it is clear that the influence of PMA on the dependent variables is minimal, especially when taking into account the influence of the dose. Indeed, the coefficients of partial determination are very small for PMA (R²A|D = 0.002 and R²A|D = 0.05 for S and Δ, respectively).

This observation is confirmed by the fact that the coefficient of partial determination is only slightly smaller compared to the coefficient of determination for propofol dose, especially for feature S. Therefore, it is clear that the interaction between MABP and EEG is mainly influenced by propofol dose. The difference in interaction pattern observed in Figure 6 is thus mainly caused by the difference in propofol dose administered to the neonates in the different age groups, and not by the difference in PMA.

Figure 8. The relation between features S and Δ, computed from the MABP-EEG interaction curves presented in Figure 6, and PMA and propofol dose. The data points and the linear least squares fit are depicted in black and gray, respectively.

The shaded area indicates the 95% confidence bounds on the least squares fit. The coefficient of (partial) determination is indicated in each plot (subscripts A and D denote PMA and propofol dose, respectively). A single asterisk, double asterisks, and triple asterisks denote a p value smaller than 0.05, 0.01, and 0.001, respectively.

Overall Interactions

The phase of sedation using propofol is characterized by a markedly different network structure compared to the reference phase, indicating a clear association between network topology and physiologic function. This is illustrated in Figure 7: after 10 minutes, the graph is weakly connected, indicating a highly reduced overall signal interaction, as opposed to the strongly connected graph observed at 3 hours after propofol administration.

Figure 9. Comparing the vertex degree values (red) with the graph average degree (black) after administration of propofol at t = 0 minutes. The graph models were constructed using kT (xi, xj) (a) and kF (xi, xj) (b). The results are presented for the whole group of study (N = 22). From top to bottom, the vertex degree di is compared to the graph average degree δ (G) for HR, MABP, SaO2, rScO2, and EEG, respectively. dMABP highly determines the signal interaction pattern during the first 30 minutes after propofol administration, while dEEG highly influences the signal interaction pattern from 30 minutes to 90 minutes after propofol administration. After 90 minutes, the neonates are recovered from propofol, as indicated by the steady reference levels observed after 90 minutes.

MABP is observed to be the main contributor to the reduction in signal interaction during the first 30 minutes after propofol administration, as indicated in Figure 9. During this time frame, MABP strongly influences the strength of the overall interaction pattern, since its vertex degree is lower compared to the average graph degree. This effect can partly be explained as an amplitude effect. Indeed, propofol administration is associated with a pronounced decrease in MABP, which can last up to one hour after propofol administration, as described by many authors [6–8, 10]. The physiologic response of the other signal modalities is less affected by propofol compared to MABP. This pronounced change in signal amplitude could explain why MABP highly influences the overall interactions, especially during the first 30 minutes after propofol administration. It is important to note, however, that the observed loss in signal interaction cannot be entirely explained by taking into account only the signal amplitude and its change in time.

Indeed, the propofol-induced loss in signal interaction is also observed in Figure 9(b), which presents the results using similarity measure kF(xi, xj). This measure assesses the interaction of the signals in the frequency domain. From 30 minutes up to 90 minutes after propofol administration, the degree of the EEG signal is considerably lower than the degree values of the other modalities. As before, this finding can be observed in Figure 9. The EEG signal is the only signal associated with degree values below the average degree, indicating the slow recovery of EEG dynamics with respect to the other modalities. Thus, MABP dynamics recover faster (generally recovered 30 minutes after propofol administration) compared to EEG dynamics (recovery takes up to 90 minutes after propofol administration). From a signal processing point of view, this might indicate the safety of propofol, since MABP can adapt to the needs of brain metabolism once the EEG signal is recovered. It is important to note, however, that the neonates included in the present analysis were all sedated using propofol as part of an INSURE procedure. Surfactant causes a significant decrease in EEG activity, which can last up to 24 hours after surfactant administration, as described by van den Berg et al. [36]. Therefore, surfactant could also influence the decreased EEG interactions observed in Figure 9. The extent of this effect is, however, not clear at this point, since no control group without surfactant was available to compare with. From 90 minutes after propofol administration onwards, the vertex degree and average degree curves presented in Figure 9 are characterized by stable reference levels. This indicates that the signal interaction pattern is restored after propofol administration.

CONCLUSIONS

In this study, we have shown that graph theory can be used to assess changes in signal interaction and that the resulting graph models can be used to study the difference between distinct physiologic states. Moreover, for our propofol case study, we derived that the overall signal interaction pattern after propofol administration is highly influenced by both MABP and EEG. The MABP signal is the main contributor to the loss in signal interactions during the first 30 minutes after propofol, due to the strong decoupling of MABP dynamics with respect to the other signal modalities, while the EEG signal highly influences the interaction pattern thereafter. This finding indicates that MABP dynamics recover first, followed by a much slower recovery of the EEG signal, meaning that MABP dynamics are recovered while EEG metabolism is still down. Thus, when EEG dynamics recover, MABP can adapt to supply the new needs of the brain in order to sustain its function. Propofol affects signal dynamics with an overall recovery time of around 90 minutes, as assessed by the graph average degree. After 90 minutes, these curves are characterized by steady reference levels, indicating that, at least from a biosignal processing point of view, the overall signal dynamics are recovered from propofol and that the physiological system is associated with a high degree of signal interaction. The signal interaction pattern observed after propofol administration is influenced only by propofol dose, and thus not by PMA. This relation was observed for the pairwise interaction curves and the system interaction measure (average graph degree) derived from the graph model of the neonate.

DATA AVAILABILITY

The data used to support the findings of this study are restricted by the Ethische Commissie onderzoek UZ/KU Leuven in order to protect patient privacy. Data are available from Dries Hendrikx ([email protected]) for researchers who meet the criteria for access to this confidential data.

DISCLOSURE

This paper reflects only the authors’ views, and the Union is not liable for any use that may be made of the contained information.

CONFLICTS OF INTEREST

The authors declare that there are no conflicts of interest regarding the publication of this paper, since the received funding, as stated in the Acknowledgments, does not lead to any conflicts of interest.

ACKNOWLEDGMENTS

This research is supported by Bijzonder Onderzoeksfonds (BOF), KU Leuven: SPARKLE Sensor-based Platform for the accurate and remote monitoring of kinematics linked to Ehealth (no. IDO-13-0358), the effect of perinatal stress on the later outcome in preterm babies (no. C24/15/036), and TARGID—development of a novel diagnostic medical device to assess gastric motility (no. C32-16-00364); Fonds voor Wetenschappelijk Onderzoek (FWO), Vlaanderen: Hercules Foundation (AKUL 043) “Flanders BCI Lab—High-End, Modular EEG Equipment for Brain Computer Interfacing”; Agentschap Innoveren en Ondernemen (VLAIO) (150466: OSA+); Agentschap voor Innovatie door Wetenschap en Technologie (IWT) (O&O HBC 2016 0184); eWatch, imec funds 2017, imec ICON projects (ICON HBC.2016.0167) and “SeizeIT”; Belgian Foreign Affairs Development Cooperation: VLIR UOS programs (2013–2019); EU: European Union’s Seventh Framework Programme (FP7/2007-2013), The HIP Trial: no. 260777; ERASMUS+ (INGDIVS 2016-1-SE01-KA203022114); and European Research Council. The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC advanced grant: BIOTENSORS (no. 339804) and EU H2020-FETOPEN “AMPHORA” no. 766456. Dries Hendrikx is a SB Ph.D. fellow at Fonds voor Wetenschappelijk Onderzoek (FWO), Vlaanderen, supported by the Flemish government.

INDEX

A Adjacency matrices 79 adjacency matrix 342, 343 Adjacency matrix 87, 88, 89, 91, 93 Adjacency matrix spectrum 87 Adjacency spectrum 107, 108, 123, 124, 126 Adjacent edges 44 Adjacent vertex degrees 71, 74, 81 Algebraic-specific structures 68 algebraic topology methods 324 algebraic topology of graphs 310, 312, 325 Algorithm 36, 37, 38, 39 Alternative hypothesis 85, 88 Approximate entropy 14 Approximation graph matching algorithm 214 Arithmetic-geometric 114, 115, 117

Arrangement of pixels 7

B Bayesian 250, 253, 257, 258 Berge’s decades-old Hyper graphs 164 Bernoulli map 6, 7, 8, 12, 13, 26, 28 Big graph processing 214 binding energy 310 Biological 250 Biology 250, 265, 269 Bipartite graph 121, 135, 143 Bit coin network 231 Blinking node system 31, 32, 33, 34, 35, 39 Block chain 230 block chain forks 231 Block chain technology 230 Block DAG datasets 232, 235, 240, 243, 244

Block frequency 14 booming developments of research on complex networks 250 business process analyst (BA) 215

C Cartesian product 124 Cauchy-Schwartz inequality 116, 118 Central graph 131, 132 Central graph of smith graphs 132, 134, 139, 141, 142 Chaotic magic transform 6 Chaotic system 6, 27, 29 chemical composition 311 chemical factor 312, 324 Chemical reactivity 133, 142 Chromatic index 44, 61, 62, 63, 64, 65, 66, 67, 68 Chur scenario 293, 296, 297, 298 circled control path (CCP) 255 Classical mathematical techniques 194 Claw graph 159 Claw-saturated graphs 148 Clinical risk index for babies (CRIB) 333 complexity of transport systems 274 Complex materials 310 Complex natural 250 Complex networks 250, 277, 307 complex network theory 271, 273, 274, 299, 300, 307 Complex optimization problem 298 complex systems 250, 252, 265 Comprehensive collection 147 Computer programming 68 Computing algorithm 221 Connected graph 43, 44, 45, 47

Connected Hyper graph 177, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190 Connected hyper graphs 174 connection models 281, 283 control theory 250, 251, 259 Correlation coefficients 24 Cryptography 7, 21 Cryptosystems 6, 7, 28 Cut edge 164, 176, 177, 178, 179, 180, 182, 183, 187, 188 Cut vertex 164, 178, 179, 180, 181, 182, 183, 184, 185, 187, 188, 190

D database engineer (DB) 215 data graph 211, 212, 214, 215, 216, 217, 223 data management technology 230 Data mining 194, 195, 207 decryption algorithm 16 Delegated Proof of Stake (DPoS) 230 diffusion limited aggregation (DLA) 324 Digital images 5, 6, 7, 9, 10, 11 directed acyclic graph (DAG) 231 directed control paths (DCPs) 255 Discrete function 19 Distance-based index 72 Distance matrix 105, 106, 108, 109, 120, 125, 127, 128 Distance spectrum 109 Distinct triangle 155 Distributed complex network control 250 Dynamic graphs 214 dynamic networks 250

E Ecological 250 Economic 250, 253, 257 Edge coloring 43, 44, 45, 47, 49, 51, 54, 56 Edge-deleted sub graphs 167 Edge rough graph 195, 196, 200, 204, 207 Edges of a graph 61, 68 Edge-subset-induced hyper sub graphs 167 Edge-subset-induced sub graphs 167 Edge-swap heuristic 71, 72, 74, 77, 79, 81 Eigenvalue analysis 88 Eigen values 131, 132, 133, 134, 135 Eigenvalues 87, 88, 89, 90, 91, 93, 99, 100 Elaborated energy bounds 133 Electroencephalography (EEG) 333 Electron energy 134, 135 Empty edges 167, 168, 169, 170, 174, 175, 176, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187 encryption algorithm 16, 26, 27, 28, 29 Encryption algorithms 6, 13, 23 Energy of graphs 134 Erdös-Rényi graphs 84, 85, 94, 95, 96, 99 Erdös-Rényi random graph model 89, 90, 97 Erdös-Rényi random network model 97, 100 ER mode 257 Estrada index 106, 108, 109, 120, 126, 127, 128, 129 Eulerian version 41 Evaluate the colors of edges 65 Extremal combinatorics 164

F Fake Degree Discovery 90, 91 Feasible inexact matching 87 File-based algorithms 6 Frequency 14 Friendship graph 45, 46

G Gaussian Estrada index 105, 106, 109, 110, 111, 113, 115, 120, 121, 122, 123, 124, 125, 126, 127, 129 Gaussianization 106, 109, 127, 129 Gaussian Orthogonal Ensemble (GOE) 90 General graph theoretical 87 Generalized distance Gaussian Estrada index 106, 110, 126, 127 Generalized distance matrix 105, 106, 107, 109, 120, 127, 128 Geographic network 332 geometric factor 312 Gramme matrix 260 Graph coloring 61 Graph Edit Distance (GED) 86 Graph kernel 86 Graphlet Frequency Distribution 87 Graph mining 195 Graph pattern matching 211, 212, 213, 214, 215, 216, 217, 225, 226 graph pattern matching algorithm (GPMS) 217 Graphs 332

Graph simulation 214 Graph spectral community 108 Graph spectral theory 107 Graph spectrum 108 Graphs visualization 326 Graph theorists 164 Graph theory 5, 7, 8, 61, 68, 83, 85, 101, 194, 209 Grid graphs 41 Gromov–Hausdorff (G-H) 86 Gromov hyperbolicity 320 Growing network 310

H Hall’s Theorem 35 Hamiltonian cycle 49, 50, 51 Hamiltonian graphs 148, 149 Hamiltonian path 5, 6, 7, 8, 9, 10, 11, 12, 16, 26, 28, 47, 52, 58 Heart rate (HR) 333 Hemodynamics 331, 332, 333 hierarchical layered structure 279 hierarchically organized networks 311, 324 hierarchical network 323 hierarchical organization of social graphs 312 high-frequency transactions 231 High rotational symmetry 148 Histogram 19 Horizontal 24, 25 Huckel molecular orbital model 133 Huckel molecular Orbital theory 132 Huckel Molecular Orbital Theory 133 Huckel’s method 133 Huckel theory 135, 143

Human brain intelligence 250 hyperbolicity 310, 312, 320, 321, 325, 329 Hyperbolicity parameter 321 Hyper graph 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 192 Hyper vertices 279

I Image encryption 5, 6, 7, 23, 26, 27, 28, 29 implicit linear quadratic regulator 253, 259, 265 Improvement ratio (IR) 221 Incidence graph 163, 164, 168, 169, 172, 174, 175, 176, 178, 179, 182, 183, 184, 186, 187, 188, 189 Incremental graph matching algorithms 211 Induced path-saturated graphs 157 Information entropy 21 Insertion Edges 79 interburst interval (IBI) 334 interquartile range (IQR) 346 intra-layer edges 280, 282

K kernel bandwidth 341, 343 Kernel functions 86 kernel matrices 342, 343 Kruskal algorithm 79, 195

L Labeled vertex 73 Laplacian 85, 87, 88, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99 Laplacian distributions 92

Laplacian matrix 90, 91, 98, 99, 105, 107, 108, 109, 120, 127 Laplacian Matrix Eigenvalue Distribution 85 Latin cubes 6 Leonhard Euler 8 Linear complexity 14 linear time invariant 251 local-game matching (LM) 253, 255, 265

M macroscopic 276, 286, 289, 290, 291, 293, 297, 299, 302 Macroscopic models 276 Magnant’s a dynamic survey 58 Mathematical modelling 194 Mathematician 164 Mathematics 61, 67, 69, 164 Matrix corresponding 202 Matrix functions 108 Maximum matching (MM) 252 Mean arterial blood pressure (MABP) 333 Medical science 250, 265 Methodology 274 microscopic 276, 297, 299, 304 minimizing longest control path (MLCP) 253, 256, 261, 263 Minimum cost 72 Minimum-weight spanning tree problem (MSTP) 72 Minor diagonal 24, 25 Molecular graph 108 Multilevel self-assembly 309 Multipartite graph 44 Multipartite graphs 41 Multiple chaotic maps 7

N Nano crystals 310 nanoparticles 310, 323, 327, 329 nano-structured materials 311 Nash equilibrium 253, 258 Near-infrared spectroscopy (NIRS) 333 Network discovery 84, 85, 91, 92, 98, 99, 102 network grows 312 network’s adjacency matrix 319 Network Sciences community 84 Networks in Networks (NiN) 271, 273, 279, 299 Network’s shape 86 network’s structural controllability 258 Network topology 250, 251, 252, 253, 259, 265 Network Visualization Tool 90, 92 Node/Edge Kernel 86 node-labeled directed graph 215 Non - adjacent vertices 136, 137, 138, 139, 141 Non-Hamiltonian graphs 41 Nonlinear interactions 339 Nonparametric statistical tests 85, 88 nonrapid eye movement (NREM) 334 Non-trivial automorphism 62 Non-trivial induced H-saturated graphs 147 nonzero ternary matrix 201 Noordin Top’s network 91 Normalization 344 Normalized histograms 20 normalized Laplacian 85, 87, 88, 90, 92, 93, 94, 95, 96, 98, 99

NP-C problem 9 Nullity 131, 132, 133, 135, 136, 137, 138, 139, 140, 144 numerical controllability transition 261, 263 Numerical simulations 109

O Olympic Cerebral Function Monitor 336 On-Hamiltonian path 35, 36, 37, 39, 40 On-Hamiltonian walk 31, 33, 34, 35, 37, 38, 39, 40, 41 optimal control of large-scale complex networks 252 optimal cost control 250, 251, 252, 254, 259, 264, 265 Optimization method 213, 218, 219, 220, 223 ordinary graphs 277 Original graph 76, 77, 79 orthonormal boundary condition 260 orthonormal-constraint-based projected gradient method (OPGM) 253, 260 Overlapping template 14

P Paragraph 58 Partial information graphs 92 Partial information network 84 Particular paths and cycles 146 Path kernel 86 Path-saturated graphs 145, 149 pattern graph 211, 212, 213, 214, 215, 216, 217, 219, 220, 221,
222, 223 Pawlak’s Rough set theory 194 Petersen graph 149 Pharmacodynamic 335, 357 Physiological network 339 Plain images 23 Pm-saturated graphs 147, 152, 154, 155, 156, 160 Pn-saturated graph 152 Polydispersity of the binding simplexes 310 Polyhedral graphs 41 Postmenstrual age (PMA) 335 Postnatal age (PNA) 335 power spectral density (PSD) 340 Principal diagonal 24, 25 probability density function (PDF) 341 Problem of edge coloring 61, 62 programming language 17 project manager (PM) 215 Proof of Importance (PoI) 230 Proof of Luck (PoL) 230 Proof of Stake (PoS) 230 Proof of Work (PoW) 230 Proper-path 2-coloring 46, 56, 58 Proper-path coloring 43, 44, 47, 48, 49, 51, 53, 55 Proper-path colorings in graphs 58, 59 Properties of graphs 132 propofol administration 332, 335, 339, 344, 347, 348, 349, 350, 351, 352, 353

Q Quantum information theory 106, 109, 127 Quantum mechanics systems 109 Quartic graph 65

R radial basis function (RBF) 340 Rainbow coloring 44 Rainbow-connected graph 44 Rainbow connection 44 random allocation method (RAM) 264 Random graphs 80 Random matrix theory 84, 89 Random spanning tree 72, 74, 75 RBF kernel function 340 Real-life graphs 223 Real time analysis 84 Real-world graphs 211, 212 Real-world networks 98 Regular graphs 120 Removal Edges 79 Removal process 39 Resist differential attacks 21 RF algorithm 61 RF coloring algorithm 62, 64, 65, 66, 68 RF coloring matrix 62, 63, 64, 65, 66, 67, 68 Root mean squared (RMS) 337 Rough Graph 194, 201 Rough set theory 194, 196, 206, 208 Rupper approximate graphs 197

S SAGE mathematics software 146 scaling parameter 89 Schwartz inequality 134 Scientific data and network 107 Secret key 17 Self-assembly of nanoscale 310

Separating vertex 164, 181, 182, 183, 184, 185, 186, 187, 188, 189 Sequences of alternating nodes and edges 86 Sequential Adjacency 85 Sets of lower-order hyper vertices 279 Shannon entropy 341, 342, 343 Shape analysis 86 Signless Laplacian Gaussian 120 Silicon Valley’s 230 Simplified Payment Verification 231 single-layer network (SLN) 286, 292, 300 Smith graph 132 social network 215, 221, 226, 332 Social science 250, 265 software architecture (SA) 215 software developer (SD) 215 software development 215 software tester (ST) 215 Spanning star 76 Spanning subgraph 43, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56 spanning trees 71, 72, 74, 75, 76, 77, 78, 80, 81 Special graphs 110, 120, 127 Spectral graphs 87 Spectral graph theory 132, 134 Square matrix 133 Stiefel manifolds 260 structural controllability 250, 251, 252, 253, 254, 263, 265 Structure of bipartite graphs 39 Sub-graph 49 Subgraph 49, 50, 51, 53, 55, 56 Sub hyper graphs 165, 166 Supplementary Information 252

supra-adjacency matrix 280 Surfactant administration 332, 352 Swapping tree edges 72 systemic variables 331, 332, 333, 336

T Technological networks 250 ternary matrices 202 Theoretical systematic 85 time-dependent external control input vector 251 topological response 317 Topology 213, 226, 338, 350 Topology information 257 tracé alternant (TA) 334 Traditionally difficult problem 147 Traffic flow 271, 272, 273, 274, 276, 283, 284, 286, 288, 289, 290, 291, 292, 293, 294, 296, 297, 298, 299, 300, 307 Transmission degrees 106, 109, 110, 111, 113, 115, 117, 119, 127 Transmission regular graph 106, 111, 115, 117, 120, 121, 122, 123, 125 transportation network 271, 273, 275, 276, 281, 282, 290, 299 Traveling Baseball Fan 32 Traveling Salesman Problem (TSP) 251

Triangles 45, 46, 310, 313, 316, 321, 322, 323 Turan graph 135

U Universal curve 97 University Grants Commission (UGC) 207

V variance inflatable factor (VIF) 346 Vertex 31, 32, 33, 34, 35, 36, 37, 39, 40, 41 Vertex-deleted sub hyper graphs 167 Vertex rough graphs 194, 196, 204, 205, 207 Vertex-subset-induced hyper sub graph 174 Vertex-subset-induced sub graphs 167 Vertical 24, 25 Vertices of degree 48, 51, 53, 55 Virus scans 32

W weighted adjacency matrix 251 weighted graph 332, 342, 343 Wiener index 72, 73, 81, 106, 109, 110, 117, 119, 122, 127