Self-Managed Networks, Systems, and Services: Second IEEE International Workshops, SelfMan 2006, Dublin, Ireland, June 16, 2006, Proceedings (Lecture Notes in Computer Science, 3996) 3540347399, 9783540347392

This book constitutes the refereed proceedings of the Second IEEE International Workshop on Self-Managed Networks, Systems, and Services (SelfMan 2006).


English Pages 196 [194] Year 2006


Table of contents :
Frontmatter
Middleware and Infrastructure for Self-Management
Implementation and Evaluation of a Middleware for Self-Organizing Decentralized Web Services
Self-Adaptive Systems: A Middleware Managed Approach
Gossip-Based Clock Synchronization for Large Decentralized Systems
Peer-to-Peer and Overlay Networks
Proximity-Aware Superpeer Overlay Topologies
Self-Maintaining Overlay Data Structures for Pervasive Autonomic Services
Using Aggregation for Adaptive Super-Peer Discovery on the Gradient Topology
Self-Adaptation
Self-Adaptive Applications Using ADL Contracts
Dynamic Generation of Context Rules
Self-Managed Mobile Systems
Spirits: Using Virtualization and Pervasiveness to Manage Mobile Robot Software Systems
Mobile Service Clouds: A Self-Managing Infrastructure for Autonomic Mobile Computing Services
Networking
Capacity Efficient Shared Protection and Fast Restoration Scheme in Self-Configured Optical Networks
Increasing Lifetime of Wireless Sensor Networks with Energy-Aware Role-Changing
Work-in-Progress Papers
Self-Organisation of Resources in PROSA P2P Network
Plug-and-Play Address Management in Ambient Networks
k-Variable Movement-Assisted Sensor Deployment Based on Virtual Rhomb Grid in Wireless Sensor Networks
Toward Self-Managed Networks?
Backmatter

Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Moshe Y. Vardi, Rice University, Houston, TX, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany

3996

Alexander Keller Jean-Philippe Martin-Flatin (Eds.)

Self-Managed Networks, Systems, and Services Second IEEE International Workshop, SelfMan 2006 Dublin, Ireland, June 16, 2006 Proceedings


Volume Editors

Alexander Keller
IBM T.J. Watson Research Center
P.O. Box 704, Yorktown Heights, NY 10598, USA

Jean-Philippe Martin-Flatin
UQAM, Laboratoire de Téléinformatique
Département d’Informatique
Case postale 8888, Succursale Centre-Ville, Montréal, Québec H3C 3P8, Canada

Library of Congress Control Number: 2006926662
CR Subject Classification (1998): C.2, D.4.4, H.4.3, I.2.11
LNCS Sublibrary: SL 5 – Computer Communication Networks and Telecommunications
ISSN 0302-9743
ISBN-10 3-540-34739-9 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-34739-2 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springer.com © Springer-Verlag Berlin Heidelberg 2006 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 11767886 06/3142 543210

Preface

This volume of the Lecture Notes in Computer Science series contains all the papers accepted for presentation at the second IEEE International Workshop on Self-Managed Networks, Systems and Services (SelfMan 2006), which was held at University College Dublin, Ireland, on June 16, 2006. This workshop follows up on a very successful first edition that took place last year in Nice, France. The online proceedings of SelfMan 2005 are available at http://madynes.loria.fr/selfman2005/ .

The objectives of this year’s edition were to bring together people from different communities (networking, distributed systems, software engineering, P2P, service engineering, distributed artificial intelligence, robotics, etc.) and to cross-pollinate their experience in designing and implementing self-managed networks, systems and services. We received 51 papers from 21 countries, of which 12 were selected, an acceptance ratio below 24%. In addition, we selected three work-in-progress papers for short presentations. This one-day event was structured so as to encourage discussions and foster collaborations. The breadth of the topics presented herein reflects the current interest and developments in this rapidly growing field. It also testifies to the promise of self-management for designing, operating and managing today’s increasingly complex and heterogeneous networks, systems and services.

SelfMan 2006 was co-located with the third IEEE International Conference on Autonomic Computing (ICAC 2006). It was sponsored by the IEEE Computer Society’s Task Force on Autonomous and Autonomic Systems (TFAAS) and Technical Committee on Parallel Processing (TCPP), in cooperation with the ACM Special Interest Groups on Operating Systems (SIGOPS) and Artificial Intelligence (SIGART), the IEEE Systems, Man, and Cybernetics Society (SMC), and the IFIP Working Group 6.6 on Management of Networks and Distributed Systems (WG6.6).
The outstanding quality of this workshop’s technical program owes a good deal to the members of the Technical Program Committee, who encouraged colleagues in the field to submit papers and devoted much time to reviewing papers. We sincerely thank them, as well as the few external reviewers who also took part in the review process. Finally, we are grateful to the corporate patrons of SelfMan 2006, Cisco and BT, for their generous donations.

New York and Montreal, June 2006

Alexander Keller Jean-Philippe Martin-Flatin

Organization

Conference Chairs
Alexander Keller, IBM T.J. Watson Research Center, Yorktown Heights, NY, USA
Jean-Philippe Martin-Flatin, University of Quebec in Montreal, Canada

Sponsored by Institute of Electrical and Electronics Engineers (IEEE)

IEEE Computer Society

In cooperation with ACM SIGOPS, ACM SIGART, IEEE SMC and IFIP WG6.6

Corporate Patrons
Cisco, BT

Steering Committee
Kurt Geihs, University of Kassel, Germany
Joe Sventek, University of Glasgow, UK

Technical Program Committee
Ozalp Babaoglu, University of Bologna, Italy
Raouf Boutaba, University of Waterloo, Canada
Geoff Coulson, Lancaster University, UK
Giovanna Di Marzo Serugendo, Birkbeck College, University of London, UK
Jim Dowling, MySQL, Sweden
David Garlan, Carnegie Mellon University, USA
Joseph L. Hellerstein, IBM T.J. Watson Research Center, USA
Michael Hinchey, NASA, USA
Kazuo Iwano, IBM, Japan
Mark Jelasity, University of Bologna, Italy
Randy Katz, University of California, Berkeley, USA
Robert Laddaga, Massachusetts Institute of Technology, USA
Ian Marshall, University of Kent, UK
Radhika Nagpal, Harvard University, USA
George Pavlou, University of Surrey, UK
Paul Robertson, Massachusetts Institute of Technology, USA
Jerry Rolia, HP Labs Palo Alto, USA
Fabrice Saffre, BT Research & Venturing, UK
Jürgen Schönwälder, International University Bremen, Germany
Karsten Schwan, Georgia Institute of Technology, USA
Morris Sloman, Imperial College London, UK
Mikhail Smirnov, Fraunhofer FOKUS, Germany
Roy Sterritt, University of Ulster, UK
John Strassner, Motorola Labs, USA
Joe Sventek, University of Glasgow, UK
Aad van Moorsel, Newcastle University, UK
Maarten van Steen, Vrije Universiteit Amsterdam, The Netherlands
Franco Zambonelli, Università di Modena e Reggio Emilia, Italy
Zheng Zhang, Microsoft Research Asia, China

Reviewers

The task of reviewing the papers submitted to SelfMan 2006 was extremely important. It is therefore a great pleasure to thank the additional reviewers listed below for their constructive and detailed comments. Their efforts were key in assuring the high quality of the workshop.

Sharad Agarwal, Microsoft Research, USA
Matt Caesar, University of California, Berkeley, USA
Nikolaos Chatzis, Fraunhofer FOKUS, Germany
Markus Huebscher, Imperial College London, UK
Lutz Mark, Fraunhofer FOKUS, Germany
George Porter, University of California, Berkeley, USA
Christoph Reichert, Fraunhofer FOKUS, Germany
Giuseppe Valetto, IBM T.J. Watson Research Center, USA
Tanja Zseby, Fraunhofer FOKUS, Germany

Table of Contents

Middleware and Infrastructure for Self-Management

Implementation and Evaluation of a Middleware for Self-Organizing Decentralized Web Services
Constantin Adam, Rolf Stadler ... 1

Self-Adaptive Systems: A Middleware Managed Approach
Eli Gjørven, Frank Eliassen, Ketil Lund, Viktor S. Wold Eide, Richard Staehli ... 15

Gossip-Based Clock Synchronization for Large Decentralized Systems
Konrad Iwanicki, Maarten van Steen, Spyros Voulgaris ... 28

Peer-to-Peer and Overlay Networks

Proximity-Aware Superpeer Overlay Topologies
Gian Paolo Jesi, Alberto Montresor, Ozalp Babaoglu ... 43

Self-Maintaining Overlay Data Structures for Pervasive Autonomic Services
Marco Mamei, Franco Zambonelli ... 58

Using Aggregation for Adaptive Super-Peer Discovery on the Gradient Topology
Jan Sacha, Jim Dowling, Raymond Cunningham, René Meier ... 73

Self-Adaptation

Self-Adaptive Applications Using ADL Contracts
Leonardo Cardoso, Alexandre Sztajnberg, Orlando Loques ... 87

Dynamic Generation of Context Rules
Waltenegus Dargie ... 102

Self-Managed Mobile Systems

Spirits: Using Virtualization and Pervasiveness to Manage Mobile Robot Software Systems
Himanshu Raj, Balasubramanian Seshasayee, Keith J. O’Hara, Ripal Nathuji, Karsten Schwan, Tucker Balch ... 116

Mobile Service Clouds: A Self-Managing Infrastructure for Autonomic Mobile Computing Services
Farshad A. Samimi, Philip K. McKinley, S. Masoud Sadjadi ... 130

Networking

Capacity Efficient Shared Protection and Fast Restoration Scheme in Self-Configured Optical Networks
Jacek Rak ... 142

Increasing Lifetime of Wireless Sensor Networks with Energy-Aware Role-Changing
Frank Reichenbach, Andreas Bobek, Philipp Hagen, Dirk Timmermann ... 157

Work-in-Progress Papers

Self-Organisation of Resources in PROSA P2P Network
Vincenza Carchiolo, Michele Malgeri, Giuseppe Mangioni, Vincenzo Nicosia ... 171

Plug-and-Play Address Management in Ambient Networks
Zoltán Lajos Kis, Csaba Simon, László Harri Németh ... 175

k-Variable Movement-Assisted Sensor Deployment Based on Virtual Rhomb Grid in Wireless Sensor Networks
Wang Xueqing, Yang YongTian ... 179

Toward Self-Managed Networks? ... 184

Author Index ... 185

Implementation and Evaluation of a Middleware for Self-Organizing Decentralized Web Services

Constantin Adam and Rolf Stadler

KTH Royal Institute of Technology, Laboratory for Communication Networks, Stockholm, Sweden
{ctin, stadler}@kth.se

Abstract. We present the implementation of Chameleon, a peer-to-peer middleware for self-organizing web services, and we provide evaluation results from a test bed. The novel aspect of Chameleon is that key functions, including resource allocation, are decentralized, which facilitates scalability and robustness of the overall system. Chameleon is implemented in Java on the Tomcat web server environment. The implementation is nonintrusive in the sense that it does not require code modifications in Tomcat or in the underlying operating system. We evaluate the system by running the TPC-W benchmark. We show that the middleware dynamically and effectively reconfigures in response to changes in load patterns and server failures, while enforcing operating policies, namely, QoS objectives and service differentiation under overload.

1 Introduction

Large-scale web services, such as on-line shopping, auctioning, and webcasting, are rapidly expanding in geographical coverage and number of users. Current systems that support such services, including commercial solutions (IBM WebSphere, BEA WebLogic) and research prototypes (Ninja [1], Neptune [2]), are based on centralized designs, which limit their ability to scale while maintaining efficient operation, low configuration complexity and robustness. To address these limitations, we have developed Chameleon, a decentralized middleware design that dynamically allocates resources to multiple service classes inside a global server cluster. Chameleon has three features characteristic of peer-to-peer systems. First, the server cluster consists of a set of functionally identical nodes, which simplifies the design and configuration. Second, the design is decentralized and the cluster dynamically re-organizes after changes or failures. Third, each node maintains only a partial view of the system, which facilitates scalability. Chameleon supports QoS objectives for each service class. Its distinctive features are the use of an epidemic protocol [3] to disseminate state and control information, as well as the decentralized evaluation of utility functions to control resource partitioning among service classes. In [4], we presented our design in detail and evaluated it through extensive simulations. This work complements the simulation studies we conducted earlier to validate the scalability of the Chameleon design: the implementation presented in this paper further validates the design and the system model we used in our simulations. The rest of this paper is structured as follows: Section 2 reviews the design of our peer-to-peer middleware. Section 3 describes the implementation of the design on an experimental test bed. Section 4 evaluates the implementation. Section 5 reviews related work. Finally, Section 6 contains additional observations and outlines future work.

A. Keller and J.-P. Martin-Flatin (Eds.): SelfMan 2006, LNCS 3996, pp. 1–14, 2006. © Springer-Verlag Berlin Heidelberg 2006

2 Overview of the Chameleon Middleware Design

We have developed our design for large-scale systems. A typical deployment scenario that contains several data centers and many entry points is shown in Fig. 1. We assume that the data centers are connected through high-capacity links and that the networking delays are much smaller than the processing delays. We refer to the collection of servers in these centers as the cluster. Note that the design in this paper covers the tier of the application servers, but not the database tier. Specifically, we do not provide an approach to scale the database tier, which is an active area of current research. This is also reflected in the evaluation of the design, where we use database operations in browsing mode for read-only data, which do not have transaction requirements.

Service requests enter the cluster through the entry points, which function in our design as layer 7 switches. An entry point associates an incoming request with a service class and directs it to a server assigned to that class. An epidemic protocol runs in the cluster to provide the entry points with information about the servers assigned to each service class. We associate two performance targets with each service: the maximum response time (defined per individual request) and the maximum drop rate (defined over all the requests for a specific service). In this paper, we use the terms service and service class interchangeably. Upon receiving a request, a server schedules it for execution using a FIFO policy. In case of overload or high CPU utilization, the server redirects the request to one of its neighbors that runs the same service. A request that cannot be processed within the maximum response time objective is dropped by the system.

[Fig. 1 diagram; labeled elements: Internet Clients, Entry Points, Data center, Database, Management Station]
Fig. 1. We have developed a design for large-scale, self-configuring server clusters. The design includes the servers in data centers, the entry points and a management station.

2.1 Cluster Services and Utility Functions

The objective of the system is to maximize a cluster utility function that measures how well it meets the performance targets for the offered services. We define the cluster utility function as the sum of the service utility functions. A service utility function specifies the rewards for meeting and the penalties for missing the performance targets for a given service. Let ρ denote the maximum allowed drop rate and r the experienced drop rate. Meeting or exceeding the QoS objectives for a service yields a reward:

$$U^{+} = \alpha(\rho - r), \quad \text{if } r \le \rho.$$

The violation of the QoS objectives for a service results in a penalty. In order to avoid starvation of a service, the penalty increases more steeply, as the β-th power of r − ρ, past the point $r^{+} = \rho + \alpha^{1/(\beta-1)}$:

$$U^{-} = \begin{cases} -\alpha(r - \rho), & \rho < r \le r^{+} \\ -(r - \rho)^{\beta}, & r > r^{+} \end{cases}$$

The control parameters α and β are positive numbers that define the shape of the graph and determine the relative importance of a service. As the cluster utility is the sum of the service utilities, the system will attempt to maximize the cluster utility by allocating cluster resources to services in such a way that the performance targets of services with higher values of α and β are more likely to be met than those of services with lower values. This enables service differentiation in case of overload. Note that the behavior of a system depends on the specific choice of the cluster utility function. For example, a system whose cluster utility function is defined as the minimum of the service utility functions will attempt to provide a fair allocation to all cluster services.
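A minimal Java sketch of the service and cluster utility functions defined above may make the shape of the reward and penalty easier to see. The class and method names (ChameleonUtility, serviceUtility, clusterUtility) are hypothetical and not taken from Chameleon; the sketch also assumes β > 1 (the text only requires α and β to be positive, but the switch point r+ is undefined at β = 1) and drop rates given as fractions in [0, 1].

```java
import java.util.List;

// Hypothetical sketch of the Chameleon utility functions (Section 2.1).
// Assumes drop rates are fractions in [0, 1] and beta > 1.
final class ChameleonUtility {

    // Targets of one service class: maximum drop rate rho, shape parameters alpha and beta.
    static final class ServiceTarget {
        final double rho;
        final double alpha;
        final double beta;

        ServiceTarget(double rho, double alpha, double beta) {
            if (alpha <= 0.0 || beta <= 1.0) {
                throw new IllegalArgumentException("expected alpha > 0 and beta > 1");
            }
            this.rho = rho;
            this.alpha = alpha;
            this.beta = beta;
        }
    }

    // r+ = rho + alpha^(1/(beta-1)): where the linear penalty hands over to the power-law one.
    static double switchPoint(ServiceTarget t) {
        return t.rho + Math.pow(t.alpha, 1.0 / (t.beta - 1.0));
    }

    // Service utility for an experienced drop rate r.
    static double serviceUtility(ServiceTarget t, double r) {
        if (r <= t.rho) {
            return t.alpha * (t.rho - r);      // reward U+ for meeting the target
        }
        if (r <= switchPoint(t)) {
            return -t.alpha * (r - t.rho);     // linear penalty
        }
        return -Math.pow(r - t.rho, t.beta);   // steeper penalty past r+
    }

    // Cluster utility: sum of service utilities; targets.get(i) pairs with dropRates.get(i).
    static double clusterUtility(List<ServiceTarget> targets, List<Double> dropRates) {
        if (targets.size() != dropRates.size()) {
            throw new IllegalArgumentException("one drop rate per service is required");
        }
        double sum = 0.0;
        for (int i = 0; i < targets.size(); i++) {
            sum += serviceUtility(targets.get(i), dropRates.get(i));
        }
        return sum;
    }
}
```

For example, with ρ = 0.05 and α = 2, a drop rate of 0.01 yields a reward of 0.08. The two penalty branches meet at r = r+, since α(r+ − ρ) = α^{β/(β−1)} = (r+ − ρ)^β, so the penalty is continuous.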

2.2 Decentralized Control Mechanisms

Three distributed mechanisms, shown in Fig. 2, form the core of our design. Topology construction, based on Newscast, an epidemic protocol, organizes the cluster nodes into dynamic overlays, which are used to disseminate state and control information in a scalable and robust manner. Request routing directs service requests towards available resources, subject to response time constraints. Service selection dynamically partitions the cluster resources between services in response to external events, by periodically maximizing utility functions. These mechanisms run independently and asynchronously on each server.

Fig. 2. Three decentralized mechanisms control the system behavior: (a) topology construction, (b) request routing, and (c) service selection

The topology construction mechanism organizes the servers of a cluster with s service classes into s + 1 logical networks: one system overlay and s service overlays, one per service class. Each server runs, at any time, a single service S and is a member of two logical networks: the system overlay and the overlay for the service S. For each of these networks, a server has a neighbor table, with the structure shown in Tables 1 and 2. As we will explain below, the request routing mechanism uses the information in the service table, while the service selection mechanism uses the information in the system table.

Table 1. Structure of the service neighborhood table
ID | Timestamp | Utilization | Processed Requests | Dropped Requests

Table 2. Structure of the system neighborhood table
ID | Timestamp | Service | Processed Requests | Dropped Requests
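Newscast-style topology construction can be pictured as periodic exchanges of neighbor tables in which each side keeps only the freshest entries. The Java sketch below illustrates one such merge step under stated assumptions: entries carry only an ID and a timestamp (the other table columns are omitted), the cache size c is a free parameter, and the class NewscastView is hypothetical rather than Chameleon's code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a Newscast-style neighbor-table merge.
final class NewscastView {

    // One neighbor-table row: peer ID and the time the entry was created.
    static final class Entry {
        final String id;
        final long timestamp;

        Entry(String id, long timestamp) {
            this.id = id;
            this.timestamp = timestamp;
        }
    }

    // Merges the local table with the table received from a peer (which is assumed to
    // already contain a fresh entry for that peer), keeps the newest entry per ID,
    // drops the local node itself, and returns the c most recent entries.
    static List<Entry> merge(String selfId, List<Entry> local, List<Entry> received, int c) {
        if (c < 0) {
            throw new IllegalArgumentException("cache size must be non-negative");
        }
        Map<String, Entry> newest = new HashMap<>();
        List<Entry> all = new ArrayList<>(local);
        all.addAll(received);
        for (Entry e : all) {
            Entry old = newest.get(e.id);
            if (old == null || e.timestamp > old.timestamp) {
                newest.put(e.id, e);
            }
        }
        newest.remove(selfId); // a node does not list itself as a neighbor
        List<Entry> result = new ArrayList<>(newest.values());
        result.sort(Comparator.comparingLong((Entry e) -> e.timestamp).reversed());
        return new ArrayList<>(result.subList(0, Math.min(c, result.size())));
    }
}
```

Both peers would run the same merge on the tables they exchange, each adding a freshly timestamped entry for itself to what it sends. Keeping only the c freshest entries is what lets stale or failed neighbors age out of an overlay.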

The request routing mechanism directs incoming requests along the service overlays toward available resources, subject to the maximum response time objective. It is invoked when a server is overloaded and does not have sufficient local resources to process a request in time. If the server has light neighbors (i.e., neighbors with utilization below a threshold cpu_max) in its service neighborhood table, it forwards the request to a randomly selected light neighbor. Otherwise, it forwards the request to a random neighbor. In order to avoid routing loops, each request keeps in its header the addresses of the servers it has visited. If a node cannot process the request and the request has already visited all of its neighbors, the request is dropped.

    Service_id my_service, s, new_service;
    Hashtable known_services;
    Double max_util;
    while (true) {
        max_util = estimateNhoodUtility(my_service);
        new_service = my_service;
        for each s in known_services {
            estimateCurrentDropRate(s);
            predictDropRateAfterSwitch(s);
        }
        for each s in known_services {
            U_s = predictNhoodUtilityAfterSwitchTo(s);
            if (U_s > max_util) {
                max_util = U_s;
                new_service = s;
            }
        }
        if (new_service != my_service) {
            if (local_utilization

Δ. The filter addresses two important issues. First, it can reject samples with round-trip delays higher than “normal,” which often indicate a large error. A high round-trip delay is not necessarily equivalent to a large differential delay (i.e., the time to get from A to B differing from the time to get from B to A). However, without synchronized clocks it is not possible to determine a differential delay during a timestamp exchange. On the other hand, a large differential delay may be caused by a packet being stalled on its way in one direction, which implies a high round-trip delay. Second, the filter can adjust to a changing network, that is, it exhibits self-adaptation properties. When round-trip delays increase due to escalated network load, Δ gradually grows and eventually some samples will justifiably pass the filter.

Dispersion Metric. The dispersion metric provides a fine-grained estimation of sample errors, leading to improved accuracy. Recall that with the previous metric, the hop count measures the expected error accumulation while the filter deals with exceptional cases. However, if the filter contains high round-trip delays (so that Δ is large), a clock sample with a small hop count but a relatively large error may easily pass the filter. By using such a sample, a node’s hop count may become small, which in turn leads to a situation in which its now erroneously adjusted clock value is suddenly used by the other nodes with which it gossips. We thus see a poor sample rapidly propagating to potentially many other nodes.
This situation is easily avoided by deploying the dispersion metric presented here. Intuitively, the dispersion value tells a node whether it is better off with the error it has independently accumulated since its last synchronization than with the errors it might inherit by using a clock sample from the selected peer. Assume that each node knows the precision and the frequency tolerance of its clock; such values are a common element of a clock specification. Node A calculates the dispersion of a sample obtained by gossiping with node B as follows:

$$\lambda = \varepsilon^{B} + \rho^{B} + |\theta^{B}| + \left(T^{B}_{NOW} - T^{B}_{LU}\right)\phi^{B} + \frac{\delta}{2} \qquad (3)$$

where $\varepsilon^{B}$ is the dispersion of B, $\rho^{B}$ is the precision of B’s clock, $\theta^{B}$ denotes the outstanding clock correction of B (in case of the gradual clock adjustment method), $T^{B}_{NOW}$ is B’s current time, $T^{B}_{LU}$ is the time when B’s clock was last synchronized, $\phi^{B}$ is B’s frequency tolerance, and δ is the round-trip delay. The dispersion of the sample incorporates all types of errors that can cause a clock to be inaccurate. Node A decides to use the sample for synchronization if and only if

$$\lambda \le \varepsilon^{A} + \rho^{A} + \left(T^{A}_{NOW} - T^{A}_{LU}\right)\phi^{A} \quad \text{and} \quad \left(\theta^{A} = 0 \ \text{or} \ H^{A} > H^{B}\right), \qquad (4)$$

where the symbols have the same meaning as before and H denotes a node’s hop count. The first inequality determines whether using the sample for synchronization will improve the constantly degrading time accuracy. The second condition is a heuristic that allows synchronization to start only if a possible previous clock adjustment has completed or node B has a lower hop count than node A. If the sample satisfies both conditions, node A sets its dispersion value to λ.

3.3 Gossiping Frequency

Synchronization in GTP is a continuous process in order to compensate for clock skew, leap seconds, and, most of all, membership changes in the network. In the simplest scenario, a node could initiate timestamp exchanges at a fixed rate, that is, nodes would gossip at a constant frequency. Although a timestamp exchange is relatively inexpensive, it does consume resources. The gossiping frequency can therefore be low when the network is synchronized and stable. Ideally, however, it should increase automatically upon certain events (e.g., when the time displayed by the time source is reset, or when many new nodes join the network, such as after recovery from a major disruption). The gossiping frequency should then automatically return to lower values once the change has been accounted for.

This goal may be achieved in the following way. Each node is allowed to change its gossiping frequency between $\vartheta_{min}$ and $\vartheta_{max}$. To control the frequency, a node stores a sliding window containing the absolute values of the clock offsets derived from the last N samples. After m (m ≤ N) samples have been used for synchronization, the minimal ($\Theta_{min}$) and maximal ($\Theta_{max}$) offsets in the window are determined. Additionally, a weighted average (denoted θ) of the last M offsets (m ≤ M ≤ N) is computed. If $\Theta_{max} - \Theta_{min} \neq 0$, the gossiping frequency is set to

$$(\vartheta_{max} - \vartheta_{min}) \cdot \frac{\theta - \Theta_{min}}{\Theta_{max} - \Theta_{min}} + \vartheta_{min}, \qquad (5)$$

otherwise it is set to $\vartheta_{min}$.
The formula linearly maps a recent absolute offset θ from the interval $[\Theta_{min}, \Theta_{max}]$ to a value in $[\vartheta_{min}, \vartheta_{max}]$. The reaction speed and the smoothness of change for a node are controlled by the parameters m, M and N. In the next section, we evaluate this adjustment for different parameter settings. The presented algorithm allows each node to make its own decisions based on local information. Nevertheless, since the behavior of all nodes is consistent, the whole network adapts its gossiping frequency in a timely manner, as the change or the stability is discovered by subsequent nodes. Note that our solution is fully automatic in the sense that it does not require precalculating any critical parameters. The maximal and minimal gossiping frequencies can be derived by GTP from the clock specification and the expected time accuracy. As we show by experiments, the values of m, M and N, on the other hand, are not critical for the overall behavior.
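Equations (3) to (5) translate directly into local computations that a node can perform without global knowledge. The Java sketch below is illustrative only: the class GtpRules and its field names are hypothetical, all times are assumed to be in seconds, and the frequency tolerance φ is assumed to be a dimensionless drift rate (seconds of error per second).

```java
// Hypothetical sketch of GTP's local rules, Eqs. (3)-(5). Times in seconds;
// the frequency tolerance phi is assumed to be a dimensionless drift rate.
final class GtpRules {

    // Clock state a node knows about itself or receives from a peer.
    static final class ClockState {
        final double dispersion;  // epsilon
        final double precision;   // rho
        final double correction;  // theta: outstanding gradual correction
        final double now;         // T_NOW
        final double lastUpdate;  // T_LU: time of the last synchronization
        final double tolerance;   // phi: frequency tolerance
        final int hops;           // H: hop count

        ClockState(double dispersion, double precision, double correction,
                   double now, double lastUpdate, double tolerance, int hops) {
            this.dispersion = dispersion;
            this.precision = precision;
            this.correction = correction;
            this.now = now;
            this.lastUpdate = lastUpdate;
            this.tolerance = tolerance;
            this.hops = hops;
        }
    }

    // Eq. (3): dispersion of a sample from peer b measured with round-trip delay delta.
    static double sampleDispersion(ClockState b, double delta) {
        return b.dispersion + b.precision + Math.abs(b.correction)
                + (b.now - b.lastUpdate) * b.tolerance + delta / 2.0;
    }

    // Eq. (4): node a uses the sample only if it does not exceed a's own error bound
    // and no earlier adjustment is pending (or b has a lower hop count than a).
    static boolean accept(ClockState a, ClockState b, double lambda) {
        double ownBound = a.dispersion + a.precision + (a.now - a.lastUpdate) * a.tolerance;
        return lambda <= ownBound && (a.correction == 0.0 || a.hops > b.hops);
    }

    // Eq. (5): map the weighted average offset linearly from [minOffset, maxOffset]
    // onto [fMin, fMax]; fall back to fMin when all offsets in the window are equal.
    static double gossipFrequency(double avgOffset, double minOffset, double maxOffset,
                                  double fMin, double fMax) {
        double range = maxOffset - minOffset;
        if (range == 0.0) {
            return fMin;
        }
        return (fMax - fMin) * (avgOffset - minOffset) / range + fMin;
    }
}
```

In this sketch, a node holding a fresh sample would call sampleDispersion and then accept; the frequency rule would be evaluated once the sliding window holds at least m samples.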

4 Experimental Results

We have studied the properties of GTP on an emulated network of nodes, using a large server cluster. Each of the 72 machines in this cluster contains two 1-GHz PIII CPUs, at least 1 GB of RAM and a 20-GB hard disk. The machines within the cluster are connected by 100-Mbps Ethernet and run Red Hat Linux 7.2. Since the machines are not equipped with hardware capable of high-accuracy synchronization with UTC (e.g., WWV receivers), we had to rely on NTP to provide clock synchronization among them.

The GTP implementation [10] was written in Java and uses UDP for communication between nodes. Each machine was running two JVMs. One JVM in the whole testbed was hosting an RMI registry for utilities, while all others were hosting 500 GTP nodes each. We experimented with large overlay networks (64,500 nodes on 65 machines), medium ones (10,500 nodes on 11 machines) and small ones (1,500 nodes on 2 machines).

Emulation is superior to simulation as it involves a real implementation. This is especially true in the world of large-scale systems, where practice often diverges from theory. In the case of GTP, using real software implied real-world effects that affect the quality of synchronization, for example, delays caused by the operating system on the path of a packet through the protocol stack, or an asymmetry in execution time between the code responsible for sending messages and the code responsible for receiving messages. Emulation is also more efficient than deployment, as it allows one to study the behavior of incomparably bigger networks at a reasonable cost. Although our testbed was unable to model wide-area latencies, this drawback was partially compensated by the communication heterogeneity provided: intra-process (between threads within the same JVM), interprocess (between threads in two different JVMs on the same machine), and network communication (between threads on different machines). It is important to note that although up to 1,000 nodes resided on a single machine, the random selection of a peer node, as provided by our PSS, ensures that more than 98% (large system), 90% (medium system) and 33% (small system) of the messages, respectively, were transmitted through physical network links. Consider, for example, a large system.
For a node in such a configuration, at most 999 other nodes were running on the same machine, while 63,500 were running on different machines. As observed during the tests and in similar experiments [11], peers subsequently selected by a node were uniformly distributed over all machines. Therefore, the expected communication between nodes on the same machine in a large system constituted only about 1.55% (999 of the 64,499 possible peers) of the total communication. In the following sections, we only highlight the most important aspects of GTP. More information on the full set of experiments that we conducted can be found in [10].

4.1 Dissemination Speed and Accuracy

The speed at which nodes synchronize is influenced by the overlay topology, the size of the network and the gossiping frequency. To simplify matters, we assume that each node initiates an exchange precisely once every ΔT units of time, that is, the gossiping frequency is fixed. When all nodes have performed such an exchange, we say that a round has completed. Note that during a round a node may be involved in more than one exchange, as other nodes may select it for communication. We set ΔT = 25 s, leading to a gossiping frequency of f = 1/ΔT = 0.04 Hz. We measured the dissemination speed by counting the number of nodes that received at least one sample that was used for synchronization. The results are shown in Fig. 3. Note that the actual synchronization speed may be lower than the dissemination speed, for example, if the time is adjusted gradually (cf. Fig. 5).
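The node counts and locality percentages quoted at the start of Section 4 follow from the testbed layout. The short Java check below reproduces them, assuming, as the text states, that peers are selected uniformly at random and that a fully loaded machine hosts two JVMs with 500 nodes each; the class TestbedMath is hypothetical.

```java
// Hypothetical check of the testbed arithmetic in Section 4.
final class TestbedMath {
    static final int NODES_PER_JVM = 500;
    static final int JVMS_PER_MACHINE = 2;

    // Every JVM hosts 500 GTP nodes, except the single JVM running the RMI registry.
    static int totalNodes(int machines) {
        return (machines * JVMS_PER_MACHINE - 1) * NODES_PER_JVM;
    }

    // Worst case for a node on a fully loaded machine: 999 of its (total - 1)
    // possible peers are local, so this is the expected share of same-machine exchanges.
    static double sameMachineShare(int machines) {
        int coLocatedPeers = JVMS_PER_MACHINE * NODES_PER_JVM - 1;
        return (double) coLocatedPeers / (totalNodes(machines) - 1);
    }

    // Share of exchanges that cross a physical network link.
    static double networkShare(int machines) {
        return 1.0 - sameMachineShare(machines);
    }
}
```

For the large configuration this gives 999/64,499, or about 1.55% local traffic; the medium and small configurations give roughly 90.5% and 33.4% network traffic, matching the lower bounds stated in the text.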


K. Iwanicki, M. van Steen, and S. Voulgaris

[Fig. 3 plots omitted: (a) number of nodes (log scale) over time [s]; (b) percentage of nodes over time [s]; curves for 64,500, 10,500, and 1,500 nodes]

Fig. 3. Time dissemination speed of GTP: (a) shows that the number of nodes to which the time has been disseminated grows exponentially with time (note the log scale for the y-axis); the horizontal lines in (b), which approximate 90% dissemination, show that the dissemination time grows logarithmically with the size of the network

[Fig. 4 plots omitted: number of nodes (log scale) versus error [ms], with the x-axis spanning −100 to 100 ms in (a) and −16 to 16 ms in (b); curves for 64,500, 10,500, and 1,500 nodes]

Fig. 4. Time accuracy of GTP: (a) using hop counts and sample filtering; (b) using the dispersion metric. Note the log scale for the y-axis and different scales for the x-axis.

The results indicate that the dissemination time grows logarithmically with the size of the network. This is consistent with the observation that, in a network without differential delays, if two GTP nodes, one of which is synchronized, exchange their timestamps, both are synchronized afterwards. Assuming a topology with reasonable connectivity properties, the time should therefore propagate quickly with high probability.

We tested the accuracy on networks of different sizes, considering both hop counts (with sample filtering) and dispersion. Initial clock errors were drawn uniformly at random from the interval [10 s, 60 s], with either sign. Figure 4 presents snapshots of the error distribution taken in a converged state. The hop-count metric with sample filtering performs significantly worse than the dispersion metric. For example, in a converged 64,500-node network we measured errors as large as 322 ms when using hop counts, whereas with the dispersion criterion they did not exceed 12 ms. This difference is caused by the aforementioned scenario of error propagation, which only becomes worse as the network grows.
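The logarithmic trend can be illustrated with an idealized round-based simulation. This is a sketch under simplifying assumptions, not GTP itself: the peer-sampling service is modeled as a uniformly random choice among all other nodes, there are no differential delays, and (following the observation above) one exchange between a synchronized and an unsynchronized node synchronizes both. The function name `rounds_to_disseminate` and the choice of node 0 as time source are illustrative.

```python
import math
import random


def rounds_to_disseminate(n, target=0.9, seed=1):
    """Rounds until a `target` fraction of n nodes has received the source's time."""
    rng = random.Random(seed)
    synced = [False] * n
    synced[0] = True                      # node 0 plays the role of the time source
    count, rounds = 1, 0
    while count < target * n:
        rounds += 1
        order = list(range(n))
        rng.shuffle(order)                # each node initiates one exchange per round
        for node in order:
            peer = rng.randrange(n - 1)   # uniform over the other n - 1 nodes
            if peer >= node:
                peer += 1
            if synced[node] != synced[peer]:
                # Idealized exchange: without differential delays, a single
                # exchange with a synchronized node is enough.
                synced[node] = synced[peer] = True
                count += 1
    return rounds


results = {n: rounds_to_disseminate(n) for n in (1_500, 10_500, 64_500)}
for n, r in results.items():
    print(f"{n:>6} nodes: {r:>2} rounds (~{r * 25} s at dT = 25 s), "
          f"log2(n) = {math.log2(n):.1f}")
```

Growing the network by a factor of about 43 should add only a few rounds. The absolute times are not expected to match Fig. 3, since the sketch ignores network delays and GTP's sample evaluation.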

Gossip-Based Clock Synchronization for Large Decentralized Systems


[Fig. 5 plots omitted: (a) gossiping frequency [Hz] over time [s] (0 to 5400 s) for the average node, node 1, and node 2, with the average absolute time error and the time change marked; (b) average gossiping frequency [Hz] over time [s] for configurations 1-1-8, 1-4-8, 2-4-8, 4-4-8, and 4-8-8, with the time change marked]

Fig. 5. Adjustment of gossiping frequency: (a) shows the changes for two randomly selected nodes and the average, with configuration [m-M-N] = [1-1-8]; (b) shows the average adjustment for different configurations

The dispersion metric, on the other hand, ensures high accuracy that scales gracefully with the number of nodes.

4.2 Gossiping Frequency

The gossiping frequency is chosen individually by every node. To test the influence of such decisions on the macroscopic behavior of the network, we performed experiments in which the network first synchronized, and then the clock setting of the time source was changed by 30 s. We used the earlier configurations and set the maximal and minimal gossiping frequency of each node to 0.04 Hz and 0.01 Hz, respectively. Figure 5a illustrates the relationship between individual nodes and the whole system, consisting of 1,500 nodes. While the system is synchronizing, all nodes gradually decrease their gossiping frequency as the offsets between the nodes' clocks decrease. (Compare the curve representing the average absolute time error with the curve representing the average gossiping frequency during the initial 1700 s in Fig. 5a.) Later, when the time at the time source is changed, the frequency of individual nodes increases as they notice the change by observing high offset values reported by other nodes. (See the period from 2500 s to 4500 s in Fig. 5a.) The nodes that are closer to the time source in terms of network hops (e.g., node 1 in Fig. 5a) notice the time change, and consequently adjust their frequency, earlier. Additionally, they keep this high gossiping frequency for a longer period of time, as there are many other nodes that have not yet noticed the change, that is, nodes that report high clock offsets. In contrast, the nodes that are farther from the time source (e.g., node 2 in Fig. 5a), and thus notice the change later, do not gossip at a high frequency for long, because by that moment a number of nodes should already be correcting their clocks and therefore reporting lower offsets. These properties ensure that if there is work to be done (i.e., there are many nodes unaware of the change), some nodes will gossip longer at a high frequency to accelerate completion of the work. On the other hand, when a significant amount of work has already been done, there is no need to operate at a high frequency for long. Moreover, note that although some nodes gossip at the maximal frequency, the average value for the whole system is relatively small, because, first, not all nodes notice the change at the same time, and second, the clock correction causes a gradual decrease of reported offsets. This behavior is highly desirable, as it prevents network congestion, which would result in high differential delays and thus slower and less accurate synchronization.
When the system has accounted for the change, the gossiping frequency is gradually decreased, such that the moment of synchronization, in which the time offsets between nodes are minimal, coincides with the moment the minimal gossiping frequency is reached. (Compare the curve representing the average absolute time error with the curve representing the average gossiping frequency during the period from 2500 s to 4500 s in Fig. 5a.) Figure 5b shows how different parameters influence frequency adjustment. Recall that a setting m-M-N indicates that the frequency is adjusted every m samples by calculating the average absolute offset of the last M samples and comparing it to the maximal and minimal values derived from the last N samples. Varying N (results not plotted) changes the amount of historical information taken into account when establishing the frequency bounds. The M parameter controls the smoothness and the range of the frequency changes, as it specifies how many historical offset values are averaged. Finally, m determines the reaction speed. What we see, however, is that the actual parameter settings are not critical to the overall behavior, a property we consider important for self-organizing systems.
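The m-M-N rule can be sketched as a small controller. The text does not say how the average is mapped to a frequency, so this sketch assumes a linear mapping: the mean absolute offset of the last M samples is placed between the smallest and largest absolute offsets among the last N samples, and that position is scaled onto [0.01 Hz, 0.04 Hz], the bounds used in the experiments. The class `FrequencyAdjuster`, the linear mapping, and the starting frequency are hypothetical, not GTP's published rule.

```python
from collections import deque


class FrequencyAdjuster:
    """Hypothetical m-M-N gossiping-frequency controller (illustration only)."""

    def __init__(self, m, M, N, f_min=0.01, f_max=0.04):
        if not (m >= 1 and 1 <= M <= N):
            raise ValueError("need m >= 1 and 1 <= M <= N")
        self.m, self.M = m, M
        self.f_min, self.f_max = f_min, f_max
        self.history = deque(maxlen=N)   # absolute offsets of the last N samples
        self.samples_seen = 0
        self.frequency = f_max           # assumption: start at the maximal frequency

    def add_sample(self, offset):
        """Record one clock-offset sample (seconds); return the frequency in Hz."""
        self.history.append(abs(offset))
        self.samples_seen += 1
        if self.samples_seen % self.m == 0:          # adjust every m samples
            recent = list(self.history)[-self.M:]    # last M samples
            avg = sum(recent) / len(recent)
            lo, hi = min(self.history), max(self.history)   # bounds from last N
            if hi > lo:  # with no spread in the history, keep the current frequency
                position = (avg - lo) / (hi - lo)    # 0 = calm, 1 = most disturbed
                self.frequency = self.f_min + position * (self.f_max - self.f_min)
        return self.frequency


# Configuration 1-1-8: offsets shrink while converging, then the source jumps 30 s.
ctl = FrequencyAdjuster(m=1, M=1, N=8)
for offset in [30.0, 20.0, 10.0, 5.0, 1.0, 0.5, 0.1, 0.05]:
    f_converged = ctl.add_sample(offset)
f_after_jump = ctl.add_sample(30.0)
print(f"{f_converged:.3f} Hz while converging, {f_after_jump:.3f} Hz after the jump")
```

Because the average of the last M samples always lies within the bounds of the last N samples (M ≤ N), the mapped frequency stays within [f_min, f_max] without clamping. A larger M smooths the reaction, and a larger m makes it slower, in line with the roles described above.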

5 Related Work

Numerous time synchronization protocols focusing on clock accuracy have been proposed [12, 13]. The most prevalent one, NTP version 3 [7, 6], operates in a predefined hierarchical configuration in which nodes are divided into primary time servers, secondary time servers, and clients. The position in the hierarchy is determined by a stratum, a hop-count equivalent. A primary time server (stratum 1) is directly synchronized to an external reference time source (e.g., a radio clock or a high-quality GPS receiver) and is usually equipped with a precise and stable clock, such as an atomic clock. A secondary time server, intended to deliver the time to clients, synchronizes, possibly via other secondary servers, with one or more primary servers. The basic idea behind NTP is to build a synchronization spanning tree in which a primary server is the root, secondary servers are the inner nodes, and clients are the leaves. To achieve its remarkable accuracy, NTP requires complex sample evaluation algorithms, rigorous support from the hardware and the operating system, a stable network (well defined by the administrators of the servers), and a fair amount of configuration. In contrast, GTP focuses on very large, dynamic, decentralized systems. It sacrifices nanosecond-level accuracy for adaptability, scalability, simplicity, portability, and ease of deployment or integration. The fine-grained error estimation of GTP was inspired by NTP, and many other algorithms employed by NTP can also be successfully applied in GTP. Because of its assumptions about the dynamism of the network and only basic support from the operating system, GTP, unlike NTP, does not synchronize clock frequencies, since this requires long measurement periods involving exactly the same hosts, as well as special hardware [7]. Moreover, it does not prevent synchronization cycles as long as they do not degrade clock accuracy. As the experiments show, despite these limitations, GTP is still able to deliver reasonably accurate time. Another example, the UNIX timed [14], designed for LANs, uses an election algorithm to designate a single host as master. The master periodically polls every slave host to obtain its time. Based on the answers, it computes the average that determines the current global time. Afterwards, the clock corrections are propagated to all other hosts.
This is an example of internal synchronization, in which the network is not synchronized to a global reference time [13]. As opposed to the UNIX timed, GTP provides external synchronization. Moreover, it has been designed for applications that typically run over the Internet, not only on LANs.

Recently, a novel group of clock synchronization algorithms aimed at sensor networks, minimizing resource consumption and emphasizing robustness and self-organization, has been proposed [15]. Such networks require physical time so that data from different sensor nodes can be fused and delivered to the sink in a meaningful way. For example, Reference-Broadcast Synchronization (RBS) [16] exploits the broadcast property of the wireless communication medium, namely that two receivers within listening distance of the same sender receive the same message at approximately the same time (i.e., with very little variability in the delay). If every receiver records its local time as soon as the message arrives, the receivers can synchronize with high accuracy by comparing their recorded time readings. This receiver-to-receiver synchronization removes two major sources of delay in wireless sensor networks, namely, waiting for the medium to be clear for transmission and preparing the message. Since this type of synchronization requires all nodes to be within listening range of the sender, which may not be the case, RBS provides additional time forwarding. The topology for forwarding is determined by the senders of periodic synchronization messages. The Timing-sync Protocol for Sensor Networks (TPSN) [17], Lightweight Time Synchronization (LTS) [18], and Tiny-Sync [19] use Cristian's algorithm and a spanning tree to synchronize time. However, owing to the broadcast property of wireless communication, a number of optimizations can be made. For example, a node deep in the tree can overhear messages exchanged between nodes closer to the root and use some of the timestamps carried within these messages during its synchronization phase. Moreover, MAC-level packet timestamping, as employed by TPSN, significantly reduces possible errors. The Flooding Time Synchronization Protocol (FTSP) [20] uses periodic floods from an elected root rather than maintaining a spanning tree. In the case of root failure, the system elects a new root node. FTSP also refines MAC-level timestamping to microsecond accuracy.

Although time synchronization protocols for sensor networks bear some similarities to GTP, they all operate under completely different assumptions. They are very tightly coupled with the hardware they were devised for. This is particularly visible in how they exploit the broadcast property of the wireless communication medium or MAC-level timestamping to reduce synchronization errors and to minimize the number of exchanged messages. Moreover, unlike ordinary PCs, sensor nodes are often equipped with high-precision timers. Because of these differences, a direct performance comparison of these protocols with GTP is difficult. For example, high-precision timers and various low-level optimizations influencing accuracy favor the sensor network protocols. On the other hand, the time dissemination speed with respect to the number of nodes should be better for GTP, because of the properties of the physical network. Finally, the behavior of the sensor network protocols with respect to robustness and node population has not been widely studied, whereas in GTP we focus on very large, dynamic systems.
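Cristian's algorithm [8], which TPSN, LTS, and Tiny-Sync build on according to the discussion above, estimates a clock offset from a single request/reply exchange. The sketch below shows the textbook estimate under the usual assumption that the server's timestamp was taken somewhere within the round trip; the function name `cristian_offset` and the sample timestamps are illustrative, and none of the sensor-network optimizations are modeled.

```python
def cristian_offset(t_request, server_time, t_reply):
    """Estimate how far the local clock lags the server (Cristian's algorithm).

    t_request, t_reply: local clock readings when the request was sent and the
    reply was received; server_time: the server timestamp carried in the reply.
    Returns (offset, max_error): adding `offset` to the local clock aligns it with
    the server, and the true offset lies within +/- max_error, assuming both
    clocks run at the same rate during the exchange.
    """
    rtt = t_reply - t_request
    if rtt < 0:
        raise ValueError("reply observed before request")
    # The server stamped its time somewhere within the round trip, so its clock
    # now reads between server_time and server_time + rtt; take the midpoint.
    estimated_server_now = server_time + rtt / 2
    return estimated_server_now - t_reply, rtt / 2


offset, bound = cristian_offset(t_request=100.000, server_time=105.020, t_reply=100.040)
print(f"offset = {offset:.3f} s, error bound = +/-{bound:.3f} s")
```

The error bound is half the round-trip time, which is why long or asymmetric delays directly limit accuracy, and why MAC-level timestamping, which removes much of that uncertainty, helps the sensor-network protocols.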

6 Conclusion and Future Work

Many modern large-scale systems require at least loosely synchronized clocks. Existing time protocols were not designed to operate in the large, often self-organizing environments we consider in this paper. GTP addresses this problem. It requires no hierarchy; instead, all nodes actively participate in epidemic-style time synchronization. Sample evaluation and decisions on adjusting the gossiping frequency are made individually by every node, based only on locally available information. Such local decisions not only inherently ensure high scalability but, above all, lead to consistent behavior of the whole system, which is an essential self-management property. By using a peer-sampling service, such as the one described in [9], we obtain automatic organization of the overlay and robustness in the presence of various changes, such as those in node membership. The results of experiments conducted on emulated networks using a real implementation show that GTP is capable of attaining very reasonable accuracy and that it can automatically adjust its gossiping frequency when synchronization errors have become low or a sudden event occurs in the system.

We plan to improve GTP with mechanisms for dealing with time source failures and to conduct experiments on larger physical networks that exhibit wide-area transmission delays. We are also interested in applying GTP to wireless sensor networks. Finally, we intend to investigate serverless synchronization, in which heterogeneous nodes (in terms of clock parameters) would establish and maintain a consistent and accurate system-wide time without any designated time source. In this latter approach, because of the network dynamics, the stability of the algorithm may become an issue.

References

1. Yu, H., Vahdat, A.: Design and evaluation of a conit-based continuous consistency model for replicated services. ACM Transactions on Computer Systems (TOCS) 20(3) (2002) pp. 239–282
2. Duvvuri, V., Shenoy, P., Tewari, R.: Adaptive leases: A strong consistency mechanism for the World Wide Web. IEEE Transactions on Knowledge and Data Engineering 15(5) (2003) pp. 1266–1276
3. van Renesse, R., Birman, K.P., Vogels, W.: Astrolabe: A robust and scalable technology for distributed system monitoring, management, and data mining. ACM Transactions on Computer Systems (TOCS) 21(2) (2003) pp. 164–206
4. Garman, J.: Kerberos: The Definitive Guide. O'Reilly & Associates, Sebastopol, CA, USA (2003)
5. Rajendran, R.K., Ganguly, S., Izmailov, R., Rubenstein, D.: Performance optimization of VoIP using an overlay network. Technical report, NEC Laboratories America, Inc., Princeton, NJ, USA (2005)
6. Mills, D.L.: Improved algorithms for synchronizing computer network clocks. IEEE/ACM Transactions on Networking (TON) 3(3) (1995) pp. 245–254
7. Mills, D.L.: Network Time Protocol (version 3) specification, implementation and analysis. Technical Report RFC 1305, The Internet Society (1992)
8. Cristian, F.: Probabilistic clock synchronization. Distributed Computing 3 (1989) pp. 146–158
9. Jelasity, M., Guerraoui, R., Kermarrec, A.M., van Steen, M.: The peer sampling service: Experimental evaluation of unstructured gossip-based implementations. In: Proc. of Middleware 2004, Toronto, Canada (2004) pp. 79–98
10. Iwanicki, K.: Gossip-based dissemination of time. MSc thesis, Warsaw University and Vrije Universiteit Amsterdam (2005). Available at: http://www.few.vu.nl/~iwanicki/
11. Voulgaris, S., Jelasity, M., van Steen, M.: A robust and scalable peer-to-peer gossiping protocol. In: Proc. of the 2nd Int. Workshop on Agents and Peer-to-Peer Computing (AP2PC), Melbourne, Australia (2003) pp. 47–58
12. Simons, B., Welch, J.L., Lynch, N.A.: An overview of clock synchronization. Lecture Notes in Computer Science 448 (1990) pp. 84–96
13. Patt, B.: A Theory of Clock Synchronization. PhD thesis, MIT (1994)
14. Gusella, R., Zatti, S.: The accuracy of the clock synchronization achieved by TEMPO in Berkeley UNIX 4.3BSD. IEEE Transactions on Software Engineering 15(7) (1989) pp. 847–853
15. Sundararaman, B., Buy, U., Kshemkalyani, A.D.: Clock synchronization in wireless sensor networks: A survey. Ad-Hoc Networks 3(3) (2005) pp. 281–323
16. Elson, J., Girod, L., Estrin, D.: Fine-grained network time synchronization using reference broadcasts. In: Proc. of the 5th USENIX Symposium on Operating System Design and Implementation (OSDI), Boston, MA, USA (2002)
17. Ganeriwal, S., Kumar, R., Srivastava, M.B.: Timing-sync protocol for sensor networks. In: Proc. of the 1st Int. Conf. on Embedded Networked Sensor Systems, Los Angeles, CA, USA (2003) pp. 138–149
18. van Greunen, J., Rabaey, J.: Lightweight time synchronization for sensor networks. In: Proc. of the 2nd ACM Int. Conf. on Wireless Sensor Networks and Applications, San Diego, CA, USA (2003) pp. 11–19
19. Sichitiu, M.L., Veerarittiphan, C.: Simple, accurate time synchronization for wireless sensor networks. In: Proc. of the IEEE Wireless Communications and Networking Conference (WCNC) 2003 (2003)
20. Maróti, M., Kusy, B., Simon, G., Lédeczi, A.: The Flooding Time Synchronization Protocol. In: Proc. of the 2nd Int. Conf. on Embedded Networked Sensor Systems, Baltimore, MD, USA (2004) pp. 39–49

Proximity-Aware Superpeer Overlay Topologies

Gian Paolo Jesi¹, Alberto Montresor², and Ozalp Babaoglu¹

¹ Dept. of Computer Science, University of Bologna (Italy) {gjesi, babaoglu}@CS.UniBO.IT
² Dept. of Information and Communication Technology, University of Trento (Italy) {alberto.montresor}@dit.unitn.it

Abstract. The concept of superpeer has been introduced to improve the performance of popular P2P applications. A superpeer is a “powerful” node that acts as a server for a set of clients, and as an equal with respect to other superpeers. By exploiting heterogeneity, the superpeer paradigm can lead to improved efficiency without compromising the decentralized nature of P2P networks. The main issues in the construction of superpeer-based overlays are the selection of superpeers and the association between superpeers and clients. Generally, superpeers are either run voluntarily (without an explicit selection process) or chosen among the “best” nodes in the network, for example those with the most abundant resources, such as bandwidth or storage. In several contexts, however, shared resources are not the only factor; latency between clients and superpeers may play an important role, for example in online games. This paper presents SG-2, a novel protocol for building and maintaining a proximity-aware superpeer topology. SG-2 uses a gossip-based protocol to spread messages to nearby nodes and a biology-inspired task allocation mechanism to promote the “best” nodes to superpeer status. The paper includes extensive simulation experiments to demonstrate the efficiency, scalability, and robustness of SG-2.

1 Introduction

This work was partially supported by the FET unit of the European Commission through projects bison (IST-38923), delis (IST-01907), and cascadas (IST-27807).

A. Keller and J.-P. Martin-Flatin (Eds.): SelfMan 2006, LNCS 3996, pp. 43–57, 2006. © Springer-Verlag Berlin Heidelberg 2006

Modern P2P networks present several unique aspects that distinguish them from traditional distributed systems. Networks comprising hundreds of thousands of peers are not uncommon. A consequence of such scale is extreme dynamism, with a continuous flow of nodes joining or leaving. Such characteristics present several challenges to the developer. Neither a central authority nor a fixed communication topology can be employed to control the various components. Instead, a dynamically changing overlay topology is maintained and control is completely decentralized. The topology is defined by “cooperation” links among nodes, which are created and deleted based on the requirements of the particular application.

The choice of a particular topology is a crucial aspect of P2P design. Until recently, most deployed P2P applications were characterized by the absence of a specific mechanism for enforcing a given topology; for example, Gnutella nodes were free to accept or refuse connections at will [12]. Inefficient communication schemes, such as flooding, were a consequence of this choice. A distinct but related problem concerns the roles that nodes may assume: original P2P systems were based on a complete “democracy” among nodes (“everyone is a peer”). But physical hosts running P2P software are usually very heterogeneous in terms of computing, storage, and communication resources, ranging from high-end servers to low-end desktop machines.

The superpeer paradigm is an answer to both issues [12, 9]. It is based on a two-level hierarchy: superpeers are nodes that are faster and/or more reliable than “normal” nodes; they take on server-like responsibilities and provide services to a set of clients. For example, in the case of file sharing, a superpeer builds an index of the files shared by its clients and participates in the search protocol on their behalf. Clients are relieved from taking part in costly protocols, and the overall traffic is reduced by forwarding queries only among superpeers. Superpeers allow decentralized networks to run more efficiently by exploiting heterogeneity and distributing load to machines that can handle the burden. On the other hand, this architecture does not inherit the flaws of the client-server model, as it allows multiple, separate points of failure, increasing the robustness of the P2P network.

The superpeer paradigm is not limited to file sharing: it can be seen as a general approach to P2P networking. Yet the structural details are strongly application-dependent, so no “standard” superpeer topology can be identified. Parameters to be considered include how superpeers are linked together, how clients are arranged, and how many superpeers are needed. In this paper, we focus our investigation on a specific aspect of the problem: proximity. Our goal is to build a topology where clients and superpeers are related based on their distance (in terms of communication latency).
The idea is to select superpeers among the most powerful nodes and to associate them with clients whose round-trip time is bounded by a specified constant. This is a generic problem whose solution can benefit several P2P applications. Examples include P2P telephony networks such as Skype [2], streaming applications such as PeerCast [20], and online games such as Age of Empires [3]. In all these cases, communication latency is one of the main concerns.

Our solution, called SG-2, is a self-organizing, decentralized protocol capable of building and maintaining superpeer-based, proximity-aware overlay topologies. SG-2 uses an epidemic protocol to spread messages to nearby nodes and implements a task allocation protocol that mimics the behavior of social insects. These biology-inspired mechanisms are combined to promote the “best” nodes to superpeer status and to associate them with nearby clients.

To validate our protocol, we considered a specific test case: online games. In these applications, a large number of players interact with (or against) each other in virtual worlds. Most online games follow a client-server model, where the only function of the client software is to present a graphical user interface to the player, while the state of the simulated persistent world is hosted on the server side. This approach scales only thanks to the deployment of high-end clusters of replicated servers. A small number of games have attempted a different approach. MiMaze [11] and Age of Empires [3] are completely decentralized, and the game state is replicated at all participants. In this case, consistency requirements limit the number of players that may be involved in the same game. We believe that the superpeer paradigm could represent an interesting alternative to these two approaches. We envision a system where a small number of powerful nodes act as state servers when needed, with the remaining ones acting as clients. All nodes run the same code and can switch between the two roles when needed. Thus, superpeers change dynamically over time, depending on environmental conditions.

2 System Model

We consider a network consisting of a large collection of nodes. The network is highly dynamic; new nodes may join at any time, and existing nodes may leave, either voluntarily or by crashing. Since voluntary leaves may be simply managed through “logout” protocols, in the following we consider only node crashes. Byzantine failures, with nodes behaving arbitrarily, are excluded from the present discussion.

We assume nodes are connected through an existing routed network, such as the Internet, where every node can potentially communicate with every other node. To actually communicate with another node, however, a node must know its identifier, e.g., a pair ⟨IP address, port⟩. The nodes known to a node are called its neighbors, and as a set they are called its view. Together, the views of all nodes define the topology of the overlay network. Given the large scale and the dynamism of our envisioned system, views are typically limited to small subsets of the entire network. Views can change dynamically, and so can the overlay topology.

Nodes are heterogeneous: they differ in their computational and storage capabilities, and also (and more importantly) in the bandwidth of their network connection. To discriminate between nodes that may act as superpeers and nodes that must be relegated to the role of clients, each node v is associated with a capacity value cap(v) that represents the number of clients that can be handled by v. To simplify our simulations, we assume that each node knows its capacity. In reality, this parameter is strongly dependent on the specific application, and can be easily computed on the fly through online measurements. Besides the capacity associated with each single node (“how many”), another parameter to be considered is the end-to-end latency between nodes (“how well”). In our model, each pair of nodes (v, w) is associated with a latency distance lat(v, w), representing the average round-trip time (RTT) experienced by communications between them.
The latency distance between a specific pair of nodes may be measured directly and precisely through ping messages, or approximately estimated through a virtual coordinate service [7]; given the dynamic nature of our system and the large number of nodes to be evaluated as potential neighbors, we will adopt the latter approach.
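The idea behind a virtual coordinate service can be sketched in a few lines. SG-2 relies on Vivaldi [7] (Section 4.1); the code below is a simplified version of Vivaldi's spring-relaxation update with a constant step size and no per-node error weighting, run on synthetic latencies generated from random points on a plane rather than on measured RTTs. The constants `DIM` and `DELTA` and all function names are illustrative assumptions; `pairwise_error` follows the relative-error measure defined in Section 4.1.

```python
import math
import random
import statistics

DIM = 5        # SG-2's experiments used a 5-dimensional virtual space
DELTA = 0.05   # constant step size; real Vivaldi adapts it using error estimates


def dist(a, b):
    """Euclidean distance; applied to two coordinates it yields est(v, w)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def vivaldi_update(xi, xj, rtt, rng):
    """Move coordinate xi (in place) so that dist(xi, xj) moves toward rtt."""
    d = dist(xi, xj)
    if d > 0.0:
        direction = [(a - b) / d for a, b in zip(xi, xj)]
    else:
        # Coincident coordinates: push apart in a random direction.
        g = [rng.gauss(0.0, 1.0) for _ in xi]
        norm = math.sqrt(sum(c * c for c in g))
        direction = [c / norm for c in g]
    error = rtt - d          # positive: the nodes are too close, so move xi away
    for k, u in enumerate(direction):
        xi[k] += DELTA * error * u


def pairwise_error(est, lat):
    """|lat - est| / min(est, lat), the relative error used in Section 4.1."""
    low = min(est, lat)
    return math.inf if low == 0 else abs(lat - est) / low


def run(n=40, steps=40_000, seed=7):
    rng = random.Random(seed)
    # Synthetic ground truth: hosts on a 100 ms x 100 ms plane, RTT = distance.
    true_pos = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(n)]

    def lat(i, j):
        return dist(true_pos[i], true_pos[j])

    coords = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]

    def median_error():
        return statistics.median(
            [pairwise_error(dist(coords[i], coords[j]), lat(i, j)) for i, j in pairs])

    before = median_error()
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)     # node i probes a random peer j
        vivaldi_update(coords[i], coords[j], lat(i, j), rng)
    return before, median_error()


before, after = run()
print(f"median pairwise error: {before:.2f} before, {after:.3f} after")
```

Once trained, two nodes only need each other's coordinates to obtain est(v, w), which is what lets SG-2 evaluate many candidate neighbors without pinging each one.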

3 The Problem

Generally speaking, our goal is to create a topology where the most powerful nodes (in terms of capacity) are promoted to the role of superpeers, and clients are associated with superpeers in such a way that each client obtains a configurable quality of service (in terms of latency distance) from its superpeer.

More formally, we define the problem of building a proximity-aware, superpeer-based topology as follows. At any given time, the problem input is given by the current set of nodes V and the functions cap() and lat() defined over it. Furthermore, a global parameter tol expresses the maximum latency distance that can be tolerated between clients and superpeers. The constraints describing our target topology are the following:

– each node is either a superpeer or a client;
– each client c is associated with exactly one superpeer s (we write super(c) = s);
– the number of clients associated with a superpeer s does not exceed cap(s);
– given a superpeer s and one of its clients c, we require that lat(s, c) ≤ tol.

To avoid ending up with a set of disconnected, star-shaped components rooted at each superpeer, we require that superpeers form another proximity-based overlay: two superpeers are connected if their latency distance is smaller than tol + δ, where δ is another configuration parameter. We aim at selecting as few superpeers as possible (otherwise, the problem could be trivially solved by each node acting as a superpeer, with no client/superpeer connections). This choice is motivated, once again, by the particular scenario we are considering: in online games, superpeers manage the distributed simulation state, so centralizing as many decisions as possible is important from the performance point of view. Note that given the dynamism of our environment, obtaining the minimum number of superpeers may be difficult, or even impossible. Moreover, even in a steady state, the resulting optimization problem is NP-complete.
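The four constraints and the superpeer overlay rule can be expressed as a feasibility check. The sketch below only verifies a given assignment; it does not try to minimize the number of superpeers, which is the hard part of the problem. The names `check_topology` and `superpeer_links` and the toy instance (nodes on a line, latency equal to their distance) are hypothetical.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple

Node = Hashable


def check_topology(nodes: Iterable[Node], superpeers: Set[Node],
                   super_of: Dict[Node, Node], cap: Callable[[Node], int],
                   lat: Callable[[Node, Node], float], tol: float) -> List[str]:
    """List every violated constraint of the target topology (empty list = valid)."""
    problems: List[str] = []
    load = {s: 0 for s in superpeers}
    for v in nodes:
        if v in superpeers:                      # either a superpeer or a client
            if v in super_of:
                problems.append(f"{v} is both a superpeer and a client")
            continue
        s = super_of.get(v)                      # exactly one superpeer per client
        if s not in superpeers:
            problems.append(f"client {v} has no superpeer")
            continue
        load[s] += 1
        if lat(s, v) > tol:                      # lat(s, c) <= tol
            problems.append(f"lat({s}, {v}) = {lat(s, v)} > tol = {tol}")
    for s, clients in load.items():              # at most cap(s) clients
        if clients > cap(s):
            problems.append(f"superpeer {s} has {clients} clients > cap = {cap(s)}")
    return problems


def superpeer_links(superpeers: Set[Node], lat: Callable[[Node, Node], float],
                    tol: float, delta: float) -> Set[Tuple[Node, Node]]:
    """Superpeer overlay: connect s and t when lat(s, t) < tol + delta."""
    ordered = sorted(superpeers, key=str)
    return {(s, t) for i, s in enumerate(ordered) for t in ordered[i + 1:]
            if lat(s, t) < tol + delta}


# Toy instance: nodes on a line, latency = distance between positions (ms).
pos = {"a": 0, "b": 3, "c": 5, "d": 20, "e": 22}
capacity = {"a": 2, "b": 0, "c": 0, "d": 1, "e": 0}


def lat(v, w):
    return abs(pos[v] - pos[w])


superpeers = {"a", "d"}
super_of = {"b": "a", "c": "a", "e": "d"}
print(check_topology(pos, superpeers, super_of, capacity.get, lat, tol=6))
print(superpeer_links(superpeers, lat, tol=6, delta=15))
```

In this instance the assignment is valid, and the two superpeers are linked only because δ is large enough (20 < 6 + 15). Reassigning e to a would break both the latency bound and a's capacity.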

4 The SG-2 Protocol

The architecture of SG-2 is shown in Figure 1; here, we briefly describe the rationale behind it, leaving implementation details to the following subsections. Our solution to the problem described above is based on a fundamental observation: precisely measuring the RTT between all pairs of nodes (e.g., through pings) is extremely slow and costly, or even impossible due to topology dynamism. To circumvent this problem, and to allow nodes to estimate their latency without direct communication, the concept of a virtual coordinate service has been developed [7]. The aim of this service is to associate every node with a synthetic coordinate in a virtual, n-dimensional space. The Euclidean distance between the coordinates of two nodes can be used to predict, with good accuracy, the RTT between them; in other words, it is sufficient for two nodes to learn each other's coordinates to estimate their latency, without direct measurements.

Our problem may be redefined based on the concept of virtual coordinates. Nodes are represented by points in the virtual space; each of them is associated with an influence zone, described as an n-dimensional sphere of radius tol centered at the node. Our goal is to cover the virtual space with a small number of superpeers, in such a way that all nodes are either superpeers or are included in the influence zone of a superpeer. Figure 2 shows the topology resulting from the execution of SG-2 in a two-dimensional virtual space.

[Fig. 1 diagram omitted]

Fig. 1. The set of services composing the SG-2 architecture

[Fig. 2 plot omitted: virtual space with both axes ranging from 0 to 1]

Fig. 2. A superpeer topology in a bidimensional virtual space, where Euclidean distance corresponds to latency

Nodes communicate with each other using a local broadcast service, whose task is to efficiently disseminate messages to nodes included in the influence zone of the sender. This service is used by powerful nodes to advertise their availability to serve as superpeers, and by ordinary nodes to seek superpeers whose capacity has not yet been saturated.

The main component of SG-2 is the superpeer management service, which selects the superpeers and associates clients with them. The protocol is heavily inspired by the behavior of social insects [4], such as ants or bees, which have developed very sophisticated mechanisms for division of labor. In summary, such mechanisms work as follows. In a totally decentralized fashion, specialized groups of individuals emerge, with each group aimed at performing some particular task. The task allocation process is dynamic and follows the community's needs according to changes in the environment. The stimulus to perform some kind of task or to switch to another one can be given by many factors, but it is normally given by high concentrations of chemical signals, such as pheromones, that are released by other individuals and spread in the environment. Each individual has its own response threshold to the stimulus and reacts accordingly. The superpeer protocol mimics this general picture. Unassociated nodes diffuse a “request for superpeers” signal through local broadcasts; the signal concentration in the network may stochastically trigger a switch to the superpeer role in some nodes according to their response threshold, which is proportional to their capacity.
On the other hand, powerful nodes covering the same area of the virtual space compete with each other to gain new clients, by signaling their availability through local broadcasts. Clients associate themselves with the most powerful superpeers, and superpeers with an empty client set switch back to the


G.P. Jesi, A. Montresor, and O. Babaoglu

client role. The combination of these two trends (the creation of new superpeers to satisfy client requests, and the removal of unnecessary superpeers) reaches an equilibrium in a topology that approximates our target topology. The last component to be described is the peer sampling service. The task of this layer is to provide each node with a view containing a random sample of nodes [13]. The motivation is twofold: first, the random sample is used by the local broadcast service to perform gossiping; second, the topology resulting from this layer can be described as a random graph composed of a single connected component including all nodes. This topology is extremely robust and presents no central point of failure; it may be used to recover from catastrophic failures in the superpeer topology layered on top of it, for example due to a coordinated attack on the subset of superpeers.

4.1 Virtual Coordinate Service

In SG-2, the virtual coordinate service is provided by Vivaldi [7], a decentralized, scalable, and efficient protocol developed at MIT. Using Vivaldi, nodes may obtain good coordinates with few RTT probes directed to a small subset of nodes. More importantly, Vivaldi can exploit the normal traffic produced by the applications using it, without requiring additional communication. The estimate of the latency distance between v_i and v_j is denoted est(v_i, v_j). Being estimates, these values may differ from the actual latency lat(v_i, v_j). The pairwise error between the estimate and the actual latency can be computed as:

$$\frac{|\mathit{lat}(v_i, v_j) - \mathit{est}(v_i, v_j)|}{\min\{\mathit{est}(v_i, v_j), \mathit{lat}(v_i, v_j)\}}$$

In our experiments, the virtual space has 5 dimensions; measuring the error between all pairs of nodes, we found a median error of only 0.14 and a maximum error of 3.5. Note that latency distances that are “under-estimated” may pose a problem: if the actual latency exceeds tol but the estimated latency is smaller, a superpeer may accept a client outside the tolerated range. For this reason, the maximum error must be considered when selecting the parameter tol.

4.2 Peer Sampling Service

The sampling service is provided by Newscast [14], which has proven to be a valuable building block to implement several P2P protocols [15]. We provide here a brief description of the protocol and its characteristics. Each Newscast node maintains a view containing c node descriptors, each of them composed of a remote node identifier and a logical time-stamp. Newscast is based on the gossip paradigm: periodically, each node (i) selects a random peer from its partial view; (ii) updates its local descriptor; and (iii) performs a view exchange with the selected peer, during which the two nodes send each other their views, merge them, and keep the c freshest descriptors. This exchange mechanism has three effects: views are continuously shuffled, creating a topology that is close to a random graph with out-degree c; the resulting topology is strongly connected (according to experimental results, choosing

Proximity-Aware Superpeer Overlay Topologies


c = 20 is already sufficient for very stable and robust connectivity); and finally, the overlay topology is self-repairing, since crashed nodes cannot inject new descriptors any more, so their information quickly disappears from the system. The peer sampling service is a key component both during the initialization phase (bootstrap) of the other layers and during the normal functioning of the protocol, when it allows the discovery of “distant” or newly joined peers from the entire network. Newscast is extremely inexpensive: messages are small, and the rate of view exchanges may be as low as one message per minute [14].

4.3 Local Multicast Service

Unlike the previous layers, which are based on existing protocols, the local multicast service adapts an existing protocol [8] to the specific needs of SG-2. Each message m is associated with the sender identifier s_m and a radius parameter r_m. Message m is delivered only to those nodes that are within latency distance r_m from s_m, as estimated by Vivaldi; hence the name Spherecast. The protocol may be described as follows. When a node either receives a message or wants to multicast a new one, it forwards it to its local fan-out. The fan-out of node v for message m is the subset of neighbors known to v that are potentially interested in the message, i.e., whose distance from s_m is not larger than r_m. Spherecast does not maintain its own topology; instead, it relies on the underlying overlay network provided by the peer sampling service. When a message is originated locally, or is received for the first time, it is forwarded immediately to all nodes in the fan-out. If a message has already been received, a node may stochastically decide to drop it (i.e., not to forward it). This is a standard approach used to avoid flooding the network. A strictly deterministic approach, such as dropping every duplicate copy, would not work correctly due to the nature of the underlying overlay: the clustering coefficient of the underlying topology and its continuous rewiring may stop the message from spreading. The stochastic approach solves this issue in a straightforward manner. The probability of dropping a message is given by $p = 1 - e^{-s/\vartheta}$, where s is the number of times the node has seen this message and ϑ is a response threshold parameter. In this way, the more times a peer receives a message, the lower the probability that it forwards it again. From an implementation point of view, digests of received messages are stored in a per-node table, together with the number of times each message has been received.
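As an illustration, the forwarding rule can be sketched in Python as below. This is a minimal, single-process sketch under stated assumptions, not the SG-2 implementation: Vivaldi estimates are approximated by the Euclidean distance between virtual coordinates, the first copy of a message counts as s = 1 and is always forwarded, and the names Message, SpherecastNode, on_message, theta, and max_table_size are hypothetical. Duplicates are dropped with probability p = 1 − exp(−s/ϑ), and the digest table is bounded with an LRU policy built on collections.OrderedDict.

```python
import math
import random
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Message:
    msg_id: str          # stands in for the message digest
    sender_coord: tuple  # virtual coordinate of the originator s_m (assumed known)
    radius: float        # r_m, e.g. tol for cl-bcast, tol + delta for sp-bcast


def est(a, b):
    """Latency estimate, assumed here to be Euclidean distance in the virtual space."""
    return math.dist(a, b)


@dataclass
class SpherecastNode:
    coord: tuple
    theta: float = 2.0                              # response threshold (hypothetical value)
    max_table_size: int = 1000                      # LRU capacity (hypothetical value)
    neighbors: list = field(default_factory=list)   # (node_id, coord) from peer sampling
    seen: OrderedDict = field(default_factory=OrderedDict)  # digest -> times seen

    def fan_out(self, m):
        """Neighbors whose estimated distance from s_m is at most r_m."""
        return [nid for nid, c in self.neighbors if est(c, m.sender_coord) <= m.radius]

    def _record(self, m):
        """Increment the counter for m, mark it most recently used, evict if full."""
        count = self.seen.pop(m.msg_id, 0) + 1
        self.seen[m.msg_id] = count
        if len(self.seen) > self.max_table_size:
            self.seen.popitem(last=False)           # evict the least recently used digest
        return count

    def on_message(self, m, rng=random):
        """Return the neighbor ids m is forwarded to (empty list if dropped).

        A locally originated message is handled like a first reception.
        """
        s = self._record(m)
        if s > 1:
            # Duplicate copy: drop with probability p = 1 - exp(-s / theta).
            if rng.random() < 1.0 - math.exp(-s / self.theta):
                return []
        return self.fan_out(m)
```

For example, a node whose neighbors lie at estimated distances 10 and 500 from the sender forwards a message with r_m = 250 only to the first one; later copies of the same message are forwarded with decreasing probability.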
This per-node digest table is managed with an LRU policy to avoid unbounded growth.

4.4 Superpeer Management Service

This layer is the core component of SG-2. Nodes participate in this protocol either as superpeers or as clients; a client c may be either associated with a superpeer (super(c) = s) or actively seeking a superpeer in its tol range (super(c) = ⊥). At the beginning, all nodes start as clients; to converge to the target topology defined in Section 3, nodes may switch roles at will, or change their client-superpeer relationships. The decision process is completely decentralized.
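To make the coverage goal concrete, the following hypothetical checker (not part of SG-2) tests whether a configuration satisfies the conditions stated so far: every node is either a superpeer or associated with a superpeer within tol, and no superpeer exceeds its capacity. It assumes that latency is measured as Euclidean distance in the virtual space and that None stands for ⊥; it does not check that the number of superpeers is small, which is the other half of the target topology.

```python
import math
from collections import Counter


def is_target_topology(coords, cap, superpeers, super_of, tol):
    """Check coverage and capacity constraints (hypothetical helper).

    coords:     node id -> virtual coordinate (tuple of numbers)
    cap:        node id -> maximum number of clients
    superpeers: set of node ids currently acting as superpeers
    super_of:   client id -> associated superpeer id, or None for "bottom"
    tol:        maximum tolerated latency, in the unit of the coordinates
    """
    load = Counter()
    for v in coords:
        if v in superpeers:
            continue
        s = super_of.get(v)
        if s is None or s not in superpeers:
            return False                      # client still seeking a superpeer
        if math.dist(coords[v], coords[s]) > tol:
            return False                      # outside the superpeer's influence zone
        load[s] += 1
    # No superpeer may serve more clients than its capacity allows.
    return all(load[s] <= cap[s] for s in superpeers)
```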


Each node v maintains the following local variables. The variable role specifies the role currently adopted by v: role = sp if v is a superpeer, role = cl otherwise. cl_v and sp_v are two views, containing respectively the clients and the superpeers known to v. They are composed of node descriptors combining an identifier w and a logical time-stamp ts_w; the latter is used to purge obsolete identifiers, as in Newscast. When v acts as a superpeer, cl_v is populated with the clients currently associated with v; it is empty otherwise. The size of cl_v is limited by cap(v). sp_v contains descriptors for the superpeers that are within tol + δ range; its size is not explicitly limited, but it is bounded by the limited number of superpeers that can be found within tol + δ distance. When v acts as a client, one of the descriptors in sp_v may be the associated superpeer of v. Two distinct kinds of messages are broadcast using Spherecast: cl-bcast and sp-bcast. The former are sent in client state and are characterized by a radius parameter r_m equal to tol, i.e., the maximum tolerated latency. The latter are used in superpeer state, and their radius parameter is equal to tol + δ: superpeers need a wider radius to have a chance to contact other superpeers; furthermore, nodes with overlapping influence zones can exchange clients if they find a better client allocation that reduces their latency. At each node, two threads are executed, one active and one passive. The execution of active threads may be subdivided into periodic cycles: in each cycle, superpeers emit an sp-bcast signal, which is broadcast in the surrounding area to notify nodes about their presence and their residual capacity. Clients, on the other hand, periodically emit cl-bcast messages if and only if they are not associated with any superpeer. The shorter the cycle duration, the faster the system converges to the target topology; clearly, however, the overhead grows proportionally.
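The local state and the active thread described above can be sketched as follows. This is an illustrative Python sketch, not the authors' code: views are dictionaries from node identifiers to logical time-stamps, broadcasts are returned as hypothetical (kind, radius, payload) tuples instead of being handed to Spherecast, the sp-bcast payload mirrors the message format (s, ts_s, cap(s)), and the max_age purge parameter is an assumption standing in for Newscast-style time-stamp purging.

```python
from dataclasses import dataclass, field
from typing import Optional

SP, CL = "sp", "cl"


@dataclass
class SG2Node:
    node_id: str
    cap: int                                  # cap(v): maximum number of clients
    role: str = CL                            # every node starts as a client
    cl: dict = field(default_factory=dict)    # cl_v: client id -> logical time-stamp
    sp: dict = field(default_factory=dict)    # sp_v: superpeer id -> logical time-stamp
    superpeer: Optional[str] = None           # super(v); None stands for "bottom"

    def active_step(self, now, tol, delta):
        """One cycle of the active thread: return the broadcast to emit, if any."""
        if self.role == SP:
            # Superpeers always advertise themselves over the wider radius tol + delta.
            return ("sp-bcast", tol + delta, (self.node_id, now, self.cap))
        if self.superpeer is None:
            # Only unassociated clients ask for a superpeer, within radius tol.
            return ("cl-bcast", tol, (self.node_id, now))
        return None

    def accept_client(self, client_id, ts):
        """Superpeer side of a cl-bcast: associate the client if |cl_v| < cap(v)."""
        if self.role == SP and len(self.cl) < self.cap:
            self.cl[client_id] = ts
            return True
        return False

    def purge(self, now, max_age):
        """Drop descriptors whose time-stamp is older than max_age cycles."""
        for view in (self.cl, self.sp):
            for w in [w for w, ts in view.items() if now - ts > max_age]:
                del view[w]
```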
The passive threads react to incoming messages according to the message type and the current role. Four distinct cases are possible:

Superpeer v gets ⟨sp-bcast, s, ts_s, cap(s)⟩: The pair (s, ts_s) is inserted in sp_v; if s was already present, its time-stamp is updated. Then the capacities of the two superpeers are compared: if cap(v) > cap(s), a migration process is started. Clients associated with s that are inside the influence zone of v migrate to v, until the capacity of v is exhausted. Each affected client is notified about its new superpeer (v) by its current superpeer s. Node s, if left with no clients, switches back to the client role; it associates itself with v if est(v, s) ≤ tol and v still has residual capacity; otherwise, it starts emitting cl-bcast messages.

Superpeer v gets ⟨cl-bcast, c, ts_c⟩: If |cl_v| < cap(v) (i.e., the capacity of v has not been exhausted), the client node is associated with v (unless, given the asynchrony of messages, it has already been associated with another superpeer).

Client v gets ⟨sp-bcast, s, ts_s, cap(s)⟩: The pair (s, ts_s) is inserted in sp_v; if s was already present, its time-stamp is updated. If v is not a client of any superpeer, it sends a request to s asking to be associated with it. The response may be negative if s has exhausted its capacity between the sending of the message and its receipt by v. On the other hand, if v is already a client of another superpeer s′ and cap(s) > cap(s′), then it tries to migrate to the more powerful superpeer. This strategy promotes the emergence of a small set of high-capacity superpeers.

Client v gets ⟨cl-bcast, c, ts_c⟩: This kind of message can trigger a role change from client to superpeer; it is the cornerstone of our approach. The willingness to become a superpeer is a function of a node threshold parameter and of the signal concentration perceived by the node in its influence area. The switching probability can be modeled by the following function:

$$P(\mathit{role}(v) = cl \rightarrow \mathit{role}(v) = sp) = \frac{s^2}{s^2 + \theta_v^2}$$

where s is the signal magnitude and θ_v is the response threshold of node v. This function is such that the probability of performing a switch is close to 1 if s ≫ θ_v, and close to 0 if s ≪ θ_v. If c_max is the maximum capacity, θ_v is initialized to c_max − cap(v); in this way, nodes with higher capacity have a larger probability of becoming superpeers. The maximum capacity may be either known or easily computed by an aggregation protocol in a robust and decentralized fashion [15]. After initialization, in order to make the topology more stable and avoid fluctuations, the response threshold is modified in such a way that time reinforces the peer role: the more time spent as a client, the less probable it is to change role. Once again, the inspiration for this approach comes from biology: it has been observed, for example, that the time spent by an individual insect on a particular task produces important changes in some brain areas. Due to these changes, the probability of a task change (e.g., from foraging to nursing) is a decreasing function of the time spent on the current task [4]. For this reason, θ_v is reinforced as follows:

$$\theta_v(t) = \theta_v(t-1) + \alpha \, (t - t_v)$$

where t is the current cycle and t_v is the last cycle in which v became a superpeer; α is a parameter to limit or increase the influence of time. The response threshold is re-initialized based on the node's local capacity if its superpeer crashes or if it becomes a superpeer. The reaction to cl-bcast messages is the only mechanism that allows a client to become a superpeer. A superpeer can switch back to the client role only when other, higher-capacity superpeers have drained its client set. The θ adaptation process is only active when a node is in the client state.
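The switching rule and the threshold dynamics translate directly into code. The sketch below is a minimal illustration under assumptions: how the signal magnitude s is measured is left to the caller (for instance, a count of cl-bcast messages received in a cycle), a zero signal never triggers a switch (which also avoids the undefined 0/0 case when θ_v = 0), and the function names are hypothetical.

```python
import random


def switch_probability(s, theta):
    """P(role(v) = cl -> role(v) = sp) = s^2 / (s^2 + theta_v^2).

    Close to 1 when s >> theta_v and close to 0 when s << theta_v.
    Assumption: no signal (s == 0) never triggers a switch.
    """
    if s == 0:
        return 0.0
    return s * s / (s * s + theta * theta)


def initial_threshold(cap_v, c_max):
    """theta_v = c_max - cap(v): higher-capacity nodes get a lower threshold."""
    return c_max - cap_v


def reinforce(theta_prev, t, t_v, alpha):
    """theta_v(t) = theta_v(t - 1) + alpha * (t - t_v); applied only in client state."""
    return theta_prev + alpha * (t - t_v)


def maybe_switch(s, theta, rng=random):
    """Stochastic reaction of a client to the perceived cl-bcast signal."""
    return rng.random() < switch_probability(s, theta)
```

For instance, with s = 3 and θ_v = 4 the switching probability is 9/25 = 0.36, and a node with cap(v) = 400 when c_max = 500 starts from θ_v = 100. Because the increment α(t − t_v) grows with time, the threshold rises faster the longer a node stays a client, which is what makes long-lived clients increasingly reluctant to switch.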

5 Experimental Results

We performed a large number of simulation experiments to validate the effectiveness of our approach. The goal of our experiments was twofold: first, we measured the speed of convergence in a stable overlay, in the absence of failures; second, we measured the robustness of our approach in a dynamic environment, where a fixed percentage of nodes are periodically substituted with fresh ones. Any node in the network can be affected by substitution, regardless of its role. Unlike in the real world, where superpeers are expected to be more reliable, our choice is stricter and more “catastrophic”. Finally, we measured the communication overhead. The experiments have been performed using Peersim [21]. In our experiments, the network size is fixed at either 1000 or 2000 nodes. Several kinds of networks have been considered, but here, due to space restrictions, we focus on a gaming-oriented scenario [28, 23]; other scenarios present similar results. For each pair of nodes v, w, the latency distance lat(v, w) between them has been generated using a normal distribution with average value μ = 250 ms and variance σ = 0.1 [28]. Then, we ran Vivaldi on this network, obtaining the corresponding function est(v, w). In the corresponding virtual space, we considered tol values of 200 ms, 250 ms, and 300 ms, which are typical of strategy and role-playing games. We experimented with δ values of 200 ms, 300 ms, and 400 ms, corresponding to typical round-trip times that can be accepted for superpeer communication. The capacity function cap(), i.e., the maximum number of clients that can be served, is generated through a uniform distribution in the range [1 : 500]. The simulation is organized in synchronous cycles, during which each node has the possibility to initiate a gossip exchange; note, however, that in reality nodes do not need to be synchronized. All results are averaged over 10 experiments. Figure 3 illustrates the behavior of the protocol over time. All the figures in the left column are obtained in networks of 1000 nodes, while those in the right column refer to networks of 2000 nodes.
The content of each sub-figure is divided into two parts: the main plot shows the number of active superpeers at each cycle, while the small frame inside the main plot shows the percentage of clients that are already associated. In these experiments, the network is static; no nodes are removed or added. Figure 3(a) depicts a rather bad situation: for both network sizes, convergence is extremely slow, and the number of satisfied nodes is low. This poor performance is explained by the characteristics of the latency distributions [28, 23] and by the selected tolerance value: most node pairs have a latency higher than 200 ms, so SG-2 cannot help much. Figure 3(b) shows a much better situation: a large percentage of clients (between 94% and 100%, depending on the network size and on parameter δ) are associated after only a few cycles (10-20). The number of superpeers is also very small, after an initial peak due to a large number of clients reacting to the signal. Almost every client can reach the required latency because 250 ms is the average pairwise latency in our game-like coordinate distribution. However, some nodes lie beyond the 250 ms border, and it is challenging for SG-2 to accommodate them. Node density plays an important role for SG-2: the bigger network can be fully organized in a latency-aware fashion using the wider superpeer communication range (δ = 400 ms). Figure 3(c) shows the performance for tol = 300 ms, a response time that is perfectly acceptable in a strategy/role-playing game scenario. The latency-aware topology in the figure is very good with any δ value: we obtain 100% of in-range clients with about 50 superpeers in the small network and about 63 in the bigger network, in fewer than 10 cycles.

[Figure 3 plots: panels (a) tol = 200 ms, (b) tol = 250 ms, (c) tol = 300 ms. Each panel contains "SP Convergence (1000 nodes network, gaming scenario)" and "SP Convergence (2000 nodes network, gaming scenario)", plotting # of superpeers against Cycles (0-100) for DELTA 200, DELTA 300, and DELTA 400, with an inset plotting % associated clients.]

Fig. 3. Convergence time. Three tol values are considered: 200 ms (a), 250 ms (b), 300 ms (c). The main figures show the number of active superpeers at each cycle, while the small sub-figures show the number of clients that are in tol range. Three different δ values are shown in each figure.

Figure 4 illustrates the robustness of our protocol. The size of the network is fixed at 1000 nodes; its composition, however, is dynamic: at each cycle, 10% or 20% of the nodes crash and are substituted with new ones. The figure shows that the number of superpeers oscillates over time, as expected, and that up to 80% and 70% of the clients are associated with superpeers. The nodes that are not associated are those that have been recently created and are still trying to find a position in the topology.

[Figure 4 plot: "1000 nodes network, gaming scenario", plotting # of superpeers against Cycles (0-100) for DELTA 0.400 with 10% and 20% churn, with an inset plotting % of associated clients.]

Fig. 4. Experiments with churn. Network size is 1000; at each cycle, 10% or 20% of the nodes are substituted with new ones.

Finally, we discuss message overhead; due to space limitations, we provide summary data instead of plots. We measured the number of broadcast messages, including both cl-bcast and sp-bcast. Since the former type of message is broadcast only when a client is not satisfied, only a small number of them are generated: on average, fewer than 2 messages per thousand nodes. Superpeers, on the other hand, continuously send one message per cycle.

6 Related Work

The superpeer approach to organizing a P2P overlay is a trade-off that combines the relative simplicity of the client-server model with the autonomy and resilience to crashes of P2P systems. The need for a superpeer network is mainly motivated by the heterogeneity of the peers deployed on the Internet. Yang and Garcia-Molina [27] proposed some design guidelines; a mechanism to split node clusters is proposed and evaluated analytically, but no experimental results are presented. Superpeer solutions have proved effective in the real world: Kazaa/Fasttrack [9] and Skype [25] are two outstanding examples. However, their actual protocols are not publicly available, so they cannot be compared with other solutions or ideas. At the time of writing, only a few works [17, 2] describe some low-level networking details. The SG-2 protocol can be considered a natural evolution of the SG-1 [18] protocol; the two solutions, however, cannot be directly compared from a performance point of view because of their different goals. SG-1 focuses on optimizing the available bandwidth in the system, while SG-2 introduces the notion of latency between peer pairs and poses a QoS limit on it. The definition of the target topology is straightforward in SG-1 (i.e., the minimum number of superpeers needed to accommodate all the peers according to the superpeer capacities), while it is an NP-hard problem in the SG-2 case. From the architectural point of view, both rely on the existence of an underlying random overlay, on top of which the superpeer overlay is generated. The superpeer election process in SG-2 is strongly bio-inspired and much more randomized than the approach used in SG-1. In [24], the authors propose a socio-economic approach based on Schelling's model to create a variation of the superpeer topology. This variation allows ordinary peers to be connected with each other and to be connected to more than one superpeer at the same time. This topology focuses on efficient search. As in our case, the superpeers are connected to each other to form a network of hubs, and both solutions are suited for unstructured networks. However, that work does not address the problem of superpeer election. The basic problem of finding the best peer, having the required characteristics, to accomplish some task (e.g., the superpeer task) is addressed in a more general form in [1], where it is referred to as “optimal peer selection” in P2P downloading and streaming scenarios. The authors use an economics-inspired method to solve the optimization problem; the resulting methodologies are general and applicable to a variety of P2P resource economy problems. The proposed solution is analytically strong, but no experimental results are shown, in particular for large and dynamic scenarios such as the one the authors address. Our implementation is based on Vivaldi (see Section 4.1), but it is not tied to any particular virtual coordinate service. Other architectures can be adopted, such as IDMaps [10] and GNP [19], or PIC [6] and PCoord [16]. The first two rely on the deployment of infrastructure nodes, while the others provide latency estimates gathered only between end hosts, as Vivaldi does. We opted for Vivaldi because of its fully distributed nature and simple implementation.
In contexts with less strict latency requirements, hop count is usually preferred to millisecond latency as a distance estimate. Pastry [22, 5], for example, uses a hop-distance metric to optimize its response time.

7 Conclusions

This paper presented SG-2, a fully decentralized, self-organizing protocol for the construction of proximity-aware, superpeer-based overlay topologies. The protocol produces an overlay in which almost all nodes (99.5%) are in range for a tol latency of 300 ms. The number of generated superpeers is small with respect to the network size (only 3-5%). The protocol also shows acceptable robustness to churn. We believe that these results can be profitably adopted to implement several classes of applications, including strategy and role-playing games. Other classes of games, such as first-person shooters, are probably not suitable given their extremely strict latency requirements (below 100 ms). These results are an improvement over existing decentralized games [3], which are based on strong replication [26] or on low-level facilities such as IP multicast [11]. We conclude by noting that the results presented in this paper are only a first step toward the implementation of P2P games; several other problems remain to be solved, including security, state replication, and state distribution.


References

1. M. Adler, R. Kumar, K. W. Ross, D. Rubenstein, T. Suel, and D. D. Yao. Optimal peer selection for P2P downloading and streaming. In Proceedings of IEEE Infocom, Miami, FL, March 2005.
2. S. Baset and H. Schulzrinne. An analysis of the Skype peer-to-peer Internet telephony protocol. Technical Report CUCS-039-04, Columbia University, Department of Computer Science, New York, NY, Sept. 2004.
3. P. Bettner and M. Terrano. 1500 Archers on a 28.8: Network Programming in Age of Empires and Beyond. In Proc. of the GDC'01, Mar. 2001.
4. E. Bonabeau, M. Dorigo, and G. Theraulaz. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press, Inc., New York, NY, USA, 1999.
5. M. Castro, P. Druschel, Y. C. Hu, and A. Rowstron. Exploiting network proximity in distributed hash tables. In O. Babaoglu, K. Birman, and K. Marzullo, editors, International Workshop on Future Directions in Distributed Computing (FuDiCo), pages 52–55, June 2002.
6. M. Costa, M. Castro, A. Rowstron, and P. Key. PIC: Practical Internet coordinates for distance estimation. In Proc. of ICDCS'04, 2004.
7. F. Dabek, R. Cox, F. Kaashoek, and R. Morris. Vivaldi: A decentralized network coordinate system. In Proc. of SIGCOMM '04, Portland, Oregon, August 2004.
8. P. Eugster, R. Guerraoui, S. B. Handurukande, A.-M. Kermarrec, and L. Massoulié. Lightweight probabilistic broadcast. ACM Transactions on Computer Systems, 21(4):341–374, 2003.
9. Fasttrack Home Page. http://www.fasttrack.nu.
10. P. Francis, S. Jamin, C. Jin, Y. Jin, V. Paxson, D. Raz, Y. Shavitt, and L. Zhang. IDMaps: A global Internet host distance estimation service. In Proc. of IEEE Infocom '99, 1999.
11. L. Gautier and C. Diot. MiMaze, a Multiuser Game on the Internet. http://citeseer.ist.psu.edu/gautier97mimaze.html, 1997.
12. Gnutella web site. http://gnutella.wego.com.
13. M. Jelasity, R. Guerraoui, A.-M. Kermarrec, and M. van Steen. The peer sampling service: Experimental evaluation of unstructured gossip-based implementations. In Proc. of the 5th Int. Middleware Conf., Toronto, Canada, Oct. 2004.
14. M. Jelasity, W. Kowalczyk, and M. van Steen. Newscast computing. Technical Report IR-CS-006, Vrije Universiteit Amsterdam, Department of Computer Science, Amsterdam, The Netherlands, November 2003.
15. M. Jelasity, A. Montresor, and O. Babaoglu. Gossip-based aggregation in large dynamic networks. ACM Trans. Comput. Syst., 23(1):219–252, 2005.
16. L. Lehman and S. Lerman. PCoord: Network position estimation using peer-to-peer measurements. In Proc. of the 3rd IEEE International Symposium on Network Computing and Applications (NCA'04), 2004.
17. N. Leibowitz, M. Ripeanu, and A. Wierzbicki. Deconstructing the Kazaa network, 2003.
18. A. Montresor. A Robust Protocol for Building Superpeer Overlay Topologies. In Proc. of the 4th Int. Conf. on Peer-to-Peer Computing, Zurich, Switzerland, August 2004. IEEE. To appear.
19. T. Ng and H. Zhang. Predicting Internet network distance with coordinates-based approaches. In Proc. of IEEE Infocom, 2002.
20. Peercast P2P Radio. http://www.peercast.org.
21. Peersim Peer-to-Peer Simulator. http://peersim.sf.net.
22. A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), pages 329–350, Nov. 2001.
23. N. Sheldon, E. Girard, S. Borg, M. Claypool, and E. Agu. The effect of latency on user performance in Warcraft 3. In Proc. of the 2nd Workshop on Network and System Support for Games, pages 3–14, New York, NY, USA, 2003. ACM Press.
24. A. Singh and M. Haahr. Creating an adaptive network of hubs using Schelling's model. Commun. ACM, 49(3):69–73, 2006.
25. Skype: Free internet telephony that just works. http://www.skype.com.
26. Unreal networking protocol notes by Tim Sweeney. http://unreal.epicgames.com/Network.htm.
27. B. Yang and H. Garcia-Molina. Designing a super-peer network. In IEEE International Conference on Data Engineering, 2003. http://bpubs.stanford.edu/pub/showDoc.Fulltext?lang=en&doc=2003-33&for%mat=pdf&compression=.
28. S. Zanikolas and R. Sakellariou. Towards a monitoring framework for worldwide grid information services. In Euro-Par, pages 417–422, 2004.

Self-Maintaining Overlay Data Structures for Pervasive Autonomic Services

Marco Mamei and Franco Zambonelli

Dipartimento di Scienze e Metodi dell’Ingegneria, University of Modena and Reggio Emilia
Via Allegri 13, 42100 Reggio Emilia, Italy
{mamei.marco, franco.zambonelli}@unimore.it

Abstract. Overlay data structures are a powerful mechanism to provide application components with context-information and to let them interact in dynamic-network scenarios like mobile ad-hoc networks and pervasive computing. These overlays can be propagated across a network in order to support components’ context awareness and coordination activities. We present a modeling framework and some autonomic algorithms to create overlay data structures that are able to self-maintain their intended distribution under a number of circumstances. The paper presents some experiments and performance measures to validate our approach and to show its scalability.

1 Introduction

One of the main challenges for developing distributed applications in pervasive computing scenarios is to provide application components with autonomic and adaptive coordination mechanisms capable of sustaining the dynamics of the operational environment and of adapting to different contexts. A number of recent research efforts try to address this problem by relying on overlay data structures. Overlay data structures are distributed data structures encoding specific aspects of the application components’ operational environment. These overlays are propagated across a network by a component in order to represent and “communicate” its own activities. Overlay data structures are easily accessible by the components and provide easy-to-use context information (i.e., the overlays are specifically conceived to support their access and use). The strength of these overlay data structures is that they can be accessed piecewise as the application components visit different places of the distributed environment. This lets the components access the right information at the right location. In addition, overlay data structures decouple components’ activities from the underlying network dynamism. Components interacting with and perceiving their operational environment by means of these overlay data structures can disregard the underlying physical network and its dynamics. From this point of view, overlay data structures enable “stigmergy” [1], in that components’ interactions can be mediated by this kind of overlay “markers” distributed across the environment. From another perspective, overlay data structures generalize the idea of overlay networks. Overlay networks are basically distributed routing data structures providing components with a suitable application-specific view of the network (i.e., they allow components to perceive a specific overlay topology of the network) [2]. Overlay data structures do not focus on network topology only: they are general-purpose and can encode any kind of context information.

A. Keller and J.-P. Martin-Flatin (Eds.): SelfMan 2006, LNCS 3996, pp. 58–72, 2006.
© Springer-Verlag Berlin Heidelberg 2006

To clarify these concepts, let us focus on the problem of coordinating the movements of some autonomous components (i.e., agents) in a distributed environment [3]. Hereafter, we use the term agent to refer to any autonomous real-world or software entity with computing and networking capability (e.g., a user carrying a Wi-Fi PDA, a robot, or a modern car). In particular, we focus on the simple application of two people, each carrying a PDA, moving across an environment instrumented with an ad-hoc network infrastructure. The goal of the application is to allow one person to be guided by the PDA to follow the other person. A simple solution based on overlay data structures is to let the person to be followed spread in the environment (i.e., the ad-hoc network) a data structure whose integer value increases by one at every hop as it gets farther from the source. This creates a sort of gradient that can be followed downhill by the other person to complete the application [4] (see Figure 1(a)). If the person to be followed moves, the overlay data structure must adjust its shape accordingly, so that the gradient still leads to that person (see Figure 1(b)). The power of this approach is that the overlay data structure provides expressive contextual information tailored to that specific task. The agent running on the PDA does not need to know any map of the environment, nor does it have to execute complex algorithms to decide where to go. It just blindly follows the overlay data structure.
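The gradient example can be sketched in a few lines of Python. The sketch assumes an undirected ad-hoc network given as an adjacency map, builds the hop-count overlay with a breadth-first propagation from the source, and lets the follower move greedily downhill. When the followed person moves, it simply recomputes the whole gradient from the new position; this full recomputation is a stand-in for, not an implementation of, the self-maintaining algorithms this paper proposes. The function names are hypothetical.

```python
from collections import deque


def propagate_gradient(adjacency, source):
    """Hop-count gradient: each reached node stores its hop distance from the source.

    adjacency: node -> iterable of neighbor nodes (assumed undirected)
    Nodes unreachable from the source get no value.
    """
    value = {source: 0}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        for nb in adjacency.get(node, ()):
            if nb not in value:               # first (shortest) arrival wins
                value[nb] = value[node] + 1
                frontier.append(nb)
    return value


def follow_downhill(adjacency, gradient, start):
    """Move greedily to the lowest-valued neighbor until reaching the source (value 0)."""
    if start not in gradient:
        return [start]                        # no gradient here: nothing to follow
    path = [start]
    node = start
    while gradient[node] > 0:
        # On an undirected graph, a BFS gradient guarantees that some neighbor
        # holds gradient[node] - 1, so every step moves one hop closer.
        node = min(adjacency[node], key=lambda nb: gradient.get(nb, float("inf")))
        path.append(node)
    return path
```

On a chain a-b-c-d with the source at a, an agent at d follows d, c, b, a; after the source moves to d and the gradient is rebuilt, an agent at a follows a, b, c, d.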
Besides this exemplary application, overlay data structures are general-purpose and can be applied in a wide range of application scenarios, ranging from robotics to network routing (see the following section for a brief review of their use). The main contribution of this paper is to present a general framework to model and implement overlay data structures in dynamic network environments. In particular, we focus on some decentralized algorithms that enable overlay data structures to maintain their intended distribution despite network topology changes.

Fig. 1. (a) A gradient overlay data structure enables an agent to follow another one. (b) The data structure is updated to reflect the new agent position.


M. Mamei and F. Zambonelli

A number of experiments are reported to assess the efficiency of our approach. In particular, the experiments show that the proposed algorithms are scalable, in that the operations to maintain an overlay data structure tend to be confined near the place where the data structure has been corrupted. Moreover, we conducted experiments to evaluate the time taken to propagate, delete and maintain such overlay data structures. These experiments show that the time required to fix an overlay is a roughly bell-shaped function of the distance from where the data structure got corrupted. This time drops almost to 0 when the distance is large enough, confirming our scalability results.

2 Application Scenarios and Related Work

In this section, we first introduce several application scenarios where such overlay data structures have been successfully applied. The purpose of this brief survey is to illustrate the wide range of scenarios that benefit from these overlay data structures and to show their generality. Then, we present the approaches currently proposed to create and maintain these data structures, showing their shortcomings. Both aspects motivate the proposal described later.

2.1 Application Scenarios

Motion Coordination. As already stated in the introduction, overlay data structures spread across a properly networked environment have been used in [3, 4] to enable agents (e.g., users carrying a PDA, robots, cars) to coordinate their respective movements. The goals of agents' coordinated movements can vary: letting them meet somewhere, distributing themselves according to specific spatial patterns, or simply moving in the environment without interfering with each other and avoiding the emergence of traffic jams. Overlay data structures are suitable tools for this task because they can be accessed piecewise to guide agents' motion step by step.

Routing in Mobile Ad-Hoc Networks. Routing can be easily modelled as a coordination problem: agents (i.e., network nodes) need to cooperate by forwarding each other's messages to enable long-range, multi-hop communication. The main principle underlying many routing algorithms is to build several overlay data structures (implemented by means of a set of distributed routing tables) that provide route information. Specifically, these data structures create paths in the network enabling agents to forward messages in the right direction. These paths (i.e., data structures) are maintained to take into account changes in the network topology [5]. The idea behind distributed routing data structures is the same as in motion coordination: provide agents with a ready-to-use representation of the context (i.e., where the message should go next).

Swarm Intelligence. From a general perspective, overlay data structures are at the core of a number of swarm-intelligent (e.g., ant-inspired) systems [1]. These approaches mimic the way in which social insects, like ants, coordinate their activities to achieve complex tasks (e.g., the mechanism used by ants to find food can be used in computer networks to route packets or find relevant resources [1, 6]). The key to these approaches is emulating the way in which ants interact with one another: by means of pheromone signals they spread in the environment, which are perceived by other ants later on. These pheromone signals can be used to find food sources, to coordinate efforts in moving heavy objects, etc. Pheromone signals can be easily modelled by means of overlay data structures, which the agents themselves could distribute as they move across the network. These data structures can then be used as trails driving agents' activities. The research projects Anthill [6] and SwarmLinda [7] share the idea of applying ant-inspired algorithms to Internet-scale peer-to-peer systems. Here, overlay data structures modelling ants' pheromones create paths connecting peers that share similar files, thus enabling, for example, an effective content-based navigation in the network of peers.

Amorphous Computer. Overlay data structures are at the core of amorphous computer research [8]. An amorphous computer consists of massive numbers of identically programmed and locally interacting computing agents, embedded in space. It can be modelled as a collection of "computational particles" sprinkled randomly on a surface or mixed throughout a volume. Overlay data structures can be spread and deployed in the amorphous computer to let various patterns and shapes emerge among the computational particles. To mention a few trivial examples, if a leader particle spreads a hop-increasing overlay data structure (as defined above), it is possible to create approximately circular regions of controlled size: particles sensing the overlay can determine whether they are inside or outside a specific circular region of radius R (i.e., they are inside if they sense the data structure with a value lower than R). Similarly, if a line of particles propagates the above data structure, stripes instead of circles can be identified in the amorphous computer [8].

Modular Robotics. A modular or self-reconfigurable robot is a collection of simple autonomous actuators with few degrees of freedom, connected with each other. A distributed control algorithm is executed by all the actuators to let the robot assume a globally coherent shape or a globally coherent motion pattern (i.e., gait). Some proposed approaches adopt overlay data structures to control such a robot [9]: a distributed shape or motion gait is encoded by means of overlay data structures spread across the robot, specifying how the robot's actuators should bend. The modules are programmed to bend their actuators depending on the sensed data, thus realizing the prescribed motion gait.

2.2 Related Work

In all the above examples, it is clear that the self-maintenance property of the data structures being used is an asset of tremendous importance. In fact, failures in the data structures typically break down the application. For example, in a motion coordination task, broken overlay data structures are perceived as dead-ends by the application agents (i.e., agents follow an overlay that does not lead where it is supposed to). As another example, in modular robotics, a broken overlay, providing wrong information on where to bend the actuators, can disrupt the global gait. On the contrary, if the overlays self-adapt their structure to remain coherent, the robot can keep going despite module failures and even partitions (i.e., a robot that splits in two "produces" two smaller robots that are still able to move).

Given the wide range of application scenarios and the importance of maintaining the overlay data structures, it is not surprising that some mechanisms to create and maintain such data structures have already been proposed. In very general terms, there are two main approaches in the literature: proactive algorithms [10, 11] and reactive algorithms [5].

Proactive algorithms keep up the overlay data structures by simply letting agents repropagate them periodically. If repropagations are more frequent than network changes, the data structures (almost always) maintain their intended distribution. Although some mechanisms (e.g., in sensor networks [11]) propose aggregating overlay data structures to save bandwidth [10], the problem with this approach is its scalability: large and dynamic networks are likely to be saturated by repropagations. Partial relief from this problem comes from opportunistic and epidemic algorithms [12, 13]. These are proactive algorithms that, instead of flooding the network to update the whole structure, update only sub-parts of the overlay.

Reactive algorithms, originally introduced as mobile ad-hoc network routing algorithms, propose not to maintain the overlay data structures at all. Network dynamics can break the overlay data structures, but when a node needs to access one of them, it engages a discovery protocol to find the correct value of the data structure [5, 14]. Basically, such a protocol amounts to finding the source of the overlay data structure via a flood-based mechanism, and then letting the source resend the correct data according to the new network conditions. This approach is very effective if nodes are highly mobile and have to communicate only sporadically. On the contrary, if nodes need to constantly access the overlay data structures, as in most of the above-mentioned applications, this approach can severely degrade performance.

The main contribution of this paper is to overcome these proposals by presenting algorithms that combine the best of both proactive and reactive approaches, letting the overlay data structures self-maintain without flooding the network while still continuously providing components with the correct data structure values.

3 A Framework to Model Overlay Data Structures

In this section we present a modeling framework to create self-maintained overlay data structures. It is important to remark that these overlays are at the core of the distributed middleware infrastructure called Tuples On The Air (TOTA) developed within our group [3, 4]. However, in this paper we do not discuss such a middleware and focus only on the data structures themselves: how they can be created and which algorithms are needed to support them.

Overlay data structures can be defined by means of a pair (C, P). The content C can be an arbitrary data structure representing the information carried by the overlay. The propagation rule P determines how the overlay data structure should be distributed and propagated across the network. This includes determining the "scope" of the overlay (i.e., the distance to which it should be propagated and possibly the spatial direction of propagation) and how such propagation can be affected by the presence or absence of other data structures in the system. In addition, the propagation rule can determine how the content C should change while it is propagated. Overlay data structures are not necessarily distributed replicas: by assuming different values in different nodes, they can be effectively used to shape a structure expressing some kind of contextual and spatial information. In addition, a maintenance mechanism should be enforced to let the overlay data structure preserve its intended distribution (C, P) despite network contingencies.

The idea of overlay data structures can potentially be implemented on any distributed middleware providing basic support for data storage (to store data values), communication mechanisms (to propagate overlay data structures) and event-notification mechanisms (to update overlay data structures and notify agents about changes in their values). In most of the application scenarios described in Section 2.1, two main kinds of overlay data structures are employed. They are described in the next subsections.

3.1 Hop-Based Overlay Data Structures

Hop-based overlay data structures are those whose content (C) and propagation rule (P) depend only on the hop distance from the source. They make it possible to express and diffuse across the network (possibly within a bounded scope) contextual information related to the network distance from the source. This kind of data structure has been widely used in motion coordination, routing and location-based information access applications [4, 8]. We designed the Hop data structure as a basic template to build this kind of overlay. In particular, the Hop data structure has an integer hop counter (hop) as its content. Once one of these data structures is injected in the network, it propagates breadth-first, maintaining the hop distance from the source. Once this hop distance is maintained within the data structure, adding other parameters or triggering conditions based on the hop value becomes trivial (e.g., add the parameter k = f(hop), or the condition "stop propagating if hop > 5"). These overlay data structures have to be maintained despite network topology changes due to either node mobility or failures; the self-maintenance algorithm described in Section 4.1 performs exactly this task. The strength of these data structures, from a software engineering point of view, is that agents simply have to inject them without further taking care of their update: the whole burden of maintaining the data structures is moved away from the agents.

3.2 Space-Based Overlay Data Structures

Space-based overlay data structures are those whose content (C) and propagation rule (P) depend on the spatial coordinates (i.e., the location) of the node hosting the data. To realize this kind of overlay data structure it is thus fundamental to provide agents with a suitable localization mechanism. Localization can be realized either by means of specific hardware devices or via specific self-localization algorithms, discussed later, that use hop-based overlay data structures. These overlays make it possible to express contextual information related to the spatial location of the application agents. They have been widely used in sensor network scenarios and peer-to-peer applications [15, 4]. We designed two main types of space-based overlay data structures: Metric and Space data structures. Both have three numbers (x, y, z) as their content. Once one of these data structures is injected in the network, it propagates, changing its content so that (x, y, z) reflects the coordinates of the node in a coordinate system centered where the data structure was first injected. Once these coordinates are maintained within the data structure, adding other parameters or triggering conditions based on such coordinates is trivial. We describe the differences between these data structures (related to their maintenance algorithm) in the next section. Given the availability of a GPS-like device, the design of Metric and Space data structures is straightforward. Once injected, the data structure stores the injecting node's GPS coordinates and initializes its content to (0, 0, 0). Upon reaching a new node, it changes the content to the GPS coordinates of the new node translated back by the injecting node's coordinates. Interestingly, some recently proposed localization algorithms (e.g., beacon-based triangulation) [16] rely on Hop overlay data structures to create a coordinate system over the network.
Basically, in these algorithms a number of beacon nodes spread across the network Hop data structures expressing the hop distance from themselves together with their coordinates in some (possibly arbitrary) reference frame. Other nodes infer their own coordinates by triangulating the distances from these beacon nodes. Such a coordinate system can then be used as the basis for space-based overlay data structures (it acts as a GPS-like device). Self-maintenance of hop-based overlays ensures that the coordinate system always remains up to date, while self-maintenance of space-based overlays keeps them consistent despite low-level hop-based adjustments.
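As an illustration of how a node might turn beacon information into coordinates, the sketch below uses linearized trilateration in 2D. It assumes, as a simplification not taken from the paper, that each hop count has already been converted into a distance estimate by multiplying it by an average hop length; the names `hops_to_distances` and `trilaterate` are hypothetical, and the actual algorithms in [16] may differ.

```python
def hops_to_distances(hop_counts, hop_length):
    # Assumed conversion: distance is roughly hop count times average hop length.
    return [h * hop_length for h in hop_counts]


def trilaterate(beacons, distances):
    """Estimate (x, y) from three non-collinear beacons and distance estimates.

    Subtracting the first circle equation (x - xi)^2 + (y - yi)^2 = di^2 from
    the other two cancels the quadratic terms and leaves a 2x2 linear system.
    """
    (x1, y1), (x2, y2), (x3, y3) = beacons
    d1, d2, d3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = d1**2 - d2**2 - x1**2 + x2**2 - y1**2 + y2**2
    b2 = d1**2 - d3**2 - x1**2 + x3**2 - y1**2 + y3**2
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("beacons must not be collinear")
    # Cramer's rule for the 2x2 system.
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y


# Three beacons in an arbitrary reference frame; the node is 2 hops from each.
beacons = [(0.0, 0.0), (6.0, 0.0), (0.0, 8.0)]
estimate = trilaterate(beacons, hops_to_distances([2, 2, 2], hop_length=2.5))
print(estimate)  # (3.0, 4.0)
```

With real hop counts the distance estimates are coarse, so practical schemes typically use more than three beacons and a least-squares fit; the three-beacon version is kept here only because it is short and exact on clean inputs.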

4 Self-maintenance Algorithms

In this section we present the main contribution of this paper: two algorithms that enable hop-based and space-based overlay data structures to self-maintain their distribution despite network dynamism. For the sake of clarity, in the following we use the term data structure to refer to the whole distributed overlay, and the term data to indicate a single piece of the overlay stored in a single node. For example, a Hop data structure is made of a number of data, each stored in a node of the network.

4.1 The Hop-Based Self-maintenance Algorithm

The most significant algorithm we describe is the one allowing Hop data structures to self-maintain their shape despite network dynamism. For obvious scalability reasons, we would like the burden of such an algorithm to be evenly distributed among all the network nodes. Recall that a Hop data structure propagates, increasing its integer content by one at every hop. Given a local instance X of such a data, we call another data Y a supporting data of X if: Y belongs to the same distributed data structure as X, Y is one hop away from X, and the value of Y is equal to the value of X minus one. With this definition, a supporting data of X is a data that could have created X during its propagation. Moreover, we say that X is in a safe state if it has at least one supporting data, or if it is in the node that first injected the data structure (i.e., hop value = 0). A data is not in a safe state if the above condition does not apply (i.e., it has no supporting data and its hop value is greater than 0). The basic idea is that a data that is not in a safe state should not be there, since no neighbor data could have created it. Each local data can subscribe to the arrival or the removal of other data of its type in its one-hop neighborhood. Upon a removal, each data reacts by checking whether it is still in a safe state. If a data is not in a safe state, it erases itself from the local node. This eventually causes a cascading deletion of data until a safe-state data is found, or the source is reached, or all the data in that connected sub-network are deleted. If a data is in a safe state, the removal of a neighbor data triggers a reaction in which the data propagates to that node. It is worth noting that this mechanism is the same as when a new node is connected to the network.
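The safe-state rule can be made concrete with a small, centralized Python simulation. It is a sketch under simplifying assumptions, not the TOTA implementation: events are processed sequentially (the cascading deletion runs to completion before safe data re-propagate into the gap), the network is a dictionary of neighbor sets, and the class and method names are hypothetical. It also includes the overwrite rule for shortcuts that the paper describes right after this paragraph, so that `add_link` can handle new links.

```python
from collections import deque


class HopOverlay:
    """Toy, centralized simulation of Hop self-maintenance (hypothetical API).

    `links` maps each node to the set of its one-hop neighbors; `hop` maps each
    node to the local data (hop value) it currently stores.
    """

    def __init__(self, links, source):
        self.links = {n: set(ns) for n, ns in links.items()}
        self.source = source
        self.hop = {source: 0}
        self._spread([source])

    def _safe(self, node):
        # Safe state: the node is the source, or a neighbor holds hop - 1
        # (i.e., the node has at least one supporting data).
        if node == self.source:
            return True
        return any(self.hop.get(n) == self.hop[node] - 1 for n in self.links[node])

    def _spread(self, starts):
        # Propagate to neighbors holding no data or a value larger than
        # ours + 1 (the latter is the overwrite rule for shortcuts).
        queue = deque(starts)
        while queue:
            node = queue.popleft()
            for n in self.links[node]:
                if n not in self.hop or self.hop[n] > self.hop[node] + 1:
                    self.hop[n] = self.hop[node] + 1
                    queue.append(n)

    def remove_link(self, a, b):
        self.links[a].discard(b)
        self.links[b].discard(a)
        # Both endpoints perceive the removal of a neighboring data.
        to_check = deque([a, b])
        deleted = set()
        while to_check:
            node = to_check.popleft()
            if node in self.hop and not self._safe(node):
                del self.hop[node]  # not safe: erase the local data
                deleted.add(node)
                to_check.extend(self.links[node])  # neighbors re-check
        # Safe data next to deleted nodes propagate back into the gap.
        frontier = {n for d in deleted for n in self.links[d] if n in self.hop}
        self._spread(sorted(frontier, key=self.hop.get))

    def add_link(self, a, b):
        self.links[a].add(b)
        self.links[b].add(a)
        # A new link behaves like an arrival: propagate only if it helps.
        self._spread([n for n in (a, b) if n in self.hop])


ring = {
    "S": {"A", "D"}, "A": {"S", "B"}, "B": {"A", "C"},
    "C": {"B", "E"}, "D": {"S", "E"}, "E": {"D", "C"},
}
overlay = HopOverlay(ring, "S")
overlay.remove_link("S", "A")  # A and B delete themselves; C refills them
after_removal = dict(sorted(overlay.hop.items()))
print(after_removal)  # {'A': 5, 'B': 4, 'C': 3, 'D': 1, 'E': 2, 'S': 0}
overlay.add_link("S", "B")  # shortcut: S overwrites B, then A and C
print(dict(sorted(overlay.hop.items())))  # {'A': 2, 'B': 1, 'C': 2, 'D': 1, 'E': 2, 'S': 0}
```

The removal step mirrors the case where a safe-state data (here on C) stops the cascade and fills the gap; if the broken link was the only path to the source, the cascade simply deletes the whole disconnected part. In the real, decentralized setting deletions and re-propagations interleave rather than running in two phases.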
Similar considerations apply to data arrival: when a data senses the arrival of a data having a value higher than its own plus one, it means that, because of topology changes, a shortcut leading to the source has been created. In such a situation, the data can propagate to the new node to overwrite the previous data, fixing the data structure's shape. This set of mechanisms is enough to make Hop data structures self-maintain. To show the validity of this algorithm, we examine its correctness in four special cases. The rationale behind these four special cases is to answer the following questions: does the network topology change imply a link creation or a link removal? Is the changed link the only one connecting two networks, or are there others? The four possible combinations of yes/no answers to these questions (the four special cases) can be generalized to cover all the other possibilities (i.e., all possible topology reconfigurations). The four special cases are reported in Figure 2.

Fig. 2. Hop-maintenance in four special cases, panels (a) to (d); each node (A, B, C, D) is annotated with its hop value T.

In Figure 2(a), the link between A and B breaks down. Since the data on B no longer has any supporting data, it is not in a safe state anymore, so it deletes itself. After that, the data on C does not see any supporting data, so it deletes itself too. This applies recursively to the whole bottom network, after which the distributed data structure is in a consistent state with respect to the new topology. In Figure 2(b), a new link between A and B is created. The data on A propagates to B and then recursively to the whole bottom network. In Figure 2(c), the link between A and B breaks down. As in case (a), this causes a cascading deletion until a safe-state data is reached; in this example, the safe-state data is in node D. When this data sees that the data on C gets deleted, it propagates toward C, filling the gap. The propagation applies recursively to the whole bottom network, adjusting the distributed data structure. In Figure 2(d), a new link is created between C and D. The data on D finds in its neighbor a data with a value greater than its own plus one (i.e., n + 2 > k + 1). Thus, it propagates to C, overwriting the data on C. This process applies recursively until the node where the two branches of the data structure seamlessly merge is found (i.e., where n + 1 = k + 2). Clearly, if the network keeps changing its topology faster than the time the algorithm requires to fix the data structure's shape, the process may never converge. However, the self-maintenance algorithm ensures that, once the network stabilizes, it eventually makes the overlay distributed data structure converge to a consistent state.

4.2 The Space-Based Self-maintenance Algorithm

Given a reliable coordinate system, maintaining space-based overlay data structures is rather easy. Once these data structures have been propagated, if a node other than the source moves, only its local data value is affected, while all the others are left unchanged, since the other nodes' physical positions with respect to the source do not change. In particular, when a node other than the source moves, the data structure locally changes its content by accessing the new GPS information. To understand the difference between Metric and Space data structures, it is fundamental to focus on what happens when the source moves. In theory, all the data of the overlay must be changed, because the origin of the coordinate system has shifted. This is exactly what happens in Metric data structures, where the origin of the coordinate system is anchored to the source node. This, of course, can lead to scalability problems, especially if the source is highly mobile. What happens if the source, too, updates its value locally, without further propagating? In this case, the origin of the coordinate system remains where the data structure was first injected, even if no node is at that position anymore. The coordinate system is maintained by the network, but not affected by it. This is the Space data structure implementation.
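The difference between the two variants can be illustrated with a toy Python model that counts how many local data must be updated when a node moves. It assumes that every node knows its exact position (the GPS-like device mentioned above), ignores message passing, and uses hypothetical names; it is a sketch, not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Vec = Tuple[float, float]


@dataclass
class SpatialOverlay:
    """Toy model of Metric vs Space maintenance (hypothetical names).

    `positions` maps nodes to GPS-like (x, y) positions; `content` maps nodes
    to their coordinates relative to the overlay's origin.
    """

    positions: Dict[str, Vec]
    source: str
    anchored_to_source: bool  # True models Metric, False models Space
    origin: Vec = (0.0, 0.0)
    content: Dict[str, Vec] = field(default_factory=dict)

    def __post_init__(self):
        # The origin is where the source was when the overlay was injected.
        self.origin = self.positions[self.source]
        for node in self.positions:
            self._refresh(node)

    def _refresh(self, node):
        x, y = self.positions[node]
        ox, oy = self.origin
        self.content[node] = (x - ox, y - oy)

    def move(self, node, new_pos):
        """Move `node` and return how many local data had to be updated."""
        self.positions[node] = new_pos
        if node == self.source and self.anchored_to_source:
            # Metric: the origin follows the source, so every data changes.
            self.origin = new_pos
            for n in self.positions:
                self._refresh(n)
            return len(self.positions)
        # Any other case (including a moving Space source) is a local update.
        self._refresh(node)
        return 1


pos = {"S": (0.0, 0.0), "A": (3.0, 4.0), "B": (6.0, 8.0)}
metric = SpatialOverlay(dict(pos), "S", anchored_to_source=True)
space = SpatialOverlay(dict(pos), "S", anchored_to_source=False)

print(metric.move("A", (4.0, 4.0)), space.move("A", (4.0, 4.0)))  # 1 1
print(metric.move("S", (1.0, 0.0)), space.move("S", (1.0, 0.0)))  # 3 1
print(metric.content["B"], space.content["B"])  # (5.0, 8.0) (6.0, 8.0)
```

A non-source move costs one update in both variants, while a source move costs one update per node for Metric and a single update for Space, whose origin stays where the overlay was injected. A movement threshold for the Metric source, as suggested in Section 5.1, could be added as a distance check at the start of `move`.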

5 Performance and Experiments

The effectiveness of our approach is of course related to the costs and performance of managing overlay distributed data structures.

5.1 Overhead

The cost of propagating a data structure by means of a multi-hop mechanism is inherently scalable: each node only has to propagate the data structure to its immediate neighbors. The size of the network does not matter, since the global effort to spread the data structure is fairly partitioned among the constituent nodes. The scalability of data structure maintenance is less clear. The main requirement for our algorithms is to be independent of the network size. This implies that maintenance operations must be confined to the locality where the events that altered the data structure (e.g., a network topology change) happened. If so, concurrent events happening at distant points of the network do not accumulate locally. If, on the contrary, maintenance operations always spread across the whole network, distant concurrent events do accumulate and the system does not scale.

With regard to Hop data structures, establishing whether maintenance operations are confined to an area neighboring the place in which the network topology actually changed is rather complicated. The size of this neighborhood is not fixed and cannot be predicted a priori, since it depends on the network topology. Thus, to answer this question, we exploited a network simulator developed within our group [4], performing a large number of experiments to measure the scope of maintenance operations. We ran several simulations, varying the node density and the nodes' initial positions. In particular, we ran six sets of experiments in which we randomly deployed 200, 250, 300, 350, 400 and 450 nodes in the same area, thus obtaining an increasing node density and a shrinking network diameter. All the experiments were repeated a large number of times (over 100) with different initial network topologies, and the results were averaged together. In each experiment, a randomly chosen node injects a Hop data structure in the network. After that, randomly chosen nodes start moving independently (following a random waypoint motion pattern), perturbing the network. In particular, a randomly picked node moves randomly for a distance equal to 1 wireless radius. This movement changes the network topology by creating and disrupting links. The number of messages sent between nodes to adjust the data structure according to the new topology is recorded. Specifically, we evaluate the average number of messages exchanged by nodes located x hops away from the moving node. Then, we average these numbers over a large set of topology changes. The results of this experiment are shown in Figure 3(a). The experiments reported in Figure 3(b) have been conducted in the same manner, but this time nodes move for a distance of 1/4 wireless radius. This second set of experiments is intended to show what happens for very small topology reconfigurations (wider reconfigurations can be depicted as a chain of these smaller ones). The most important observation is that, when a node moves and the network topology changes consequently, many update operations are required near the area where the topology changes, while only a few operations are required far away from it. This implies that, even if the network and the data structures being propagated have no artificial boundaries, the operations to keep their shape consistent are strictly confined within a locality scope. This result is even more significant when compared to the average network diameter (averaged over the various experiments): the number of operations required to maintain a data structure falls close to zero well before the average diameter of the network, thus confirming our results. This supports the idea that the operations to fix distant concurrent topology changes do not add up, making the system scalable.

Fig. 3. The number of maintenance operations decreases sharply with the hop distance from topology reconfigurations caused by: (a) random node movements of 1 wireless radius; (b) random node movements of 1/4 wireless radius.

With regard to Metric and Space data structures, determining whether their self-maintenance operations are confined is rather easy. As described in Section 4.2, the maintenance operations of Metric data structures are confined to the single node that moved, for all nodes apart from the source, while they spread across the whole network if the source moves. Metric data structures must therefore be used carefully, for example with custom propagation rules limiting their scope a priori, or by triggering update operations only if the source node moves by at least a certain amount (e.g., trigger an update only if the source moves by at least 1 m). For Space data structures, instead, the answer is clearly affirmative, since maintenance is strictly locally confined.

How much time is required to spread a data structure across the network? How much time to delete it? How much time to let a data structure maintain its shape? These are fundamental questions to evaluate our algorithms.

5.2 Timing

The time to propagate or to delete a data structure can be easily computed in theory. Let us focus on the basic operation of a Hop overlay data structure travelling from a node to a neighbor node, and let us take the average time to perform this operation as the unit $T_u = 1$ of our timing model. In more detail, we can define $T_u$ as

$$T_u = T_{\mathrm{prop}} + T_{\mathrm{send}} + T_{\mathrm{travel}} + T_{\mathrm{rcv}},$$

where $T_{\mathrm{prop}}$ is the time taken by a data structure to run its algorithm on a node, $T_{\mathrm{send}}$ is the time required to serialize and send the data content, $T_{\mathrm{travel}}$ is the time for the stream of data to arrive at the other node, and $T_{\mathrm{rcv}}$ is the time to receive and deserialize the data and have it ready again to execute its methods. This abstract timing model lets us abstract away low-level details such as the network being used and the time to parse and process the data structures; all these details are wrapped in our abstract unit. Under this hypothesis, a data structure propagates to a distance of X hops in X time units. To assess this theoretical model, we performed several experiments with our simulator. In particular, a randomly selected node injects several overlay data structures in the network, and the time taken by the data structures to reach a distance of x hops from the source is recorded. The results were averaged over a large set of experiments and are depicted in Figure 4(a). The graph shows a small disagreement between the theoretical model and the experimental results. This can be explained by considering that data structure propagation does not always happen in a perfect expanding-ring manner: some extra operations are required to correct the imbalances and the consequent backward propagations, and these extra operations account for the time gap between the theoretical model and the experimental results.

Fig. 4. (a) Time required to propagate and delete a data structure: theoretical model and simulation experiments. (b) Time required to fix a data structure when randomly selected nodes move by 1 wireless radius. (c) Nodes move by 1/4 wireless radius.

Evaluating in theory the time to fix a Hop data structure is not easy, since it depends on the topology of the network. We therefore focused again on simulations to test the system performance, using the same experimental setup as in the overhead experiments. In this set of experiments, however, instead of counting the number of operations required to fix the structure, we measured the time taken. In particular, for a given network reconfiguration (caused by moving nodes), we recorded the time at which a node performs the last operation to fix the overlay data structure. These times are grouped by hop distance from the moving node and averaged together. These operations have been repeated many times (more than 100), and all the outcomes have been averaged together to obtain the results depicted in Figures 4(b) and 4(c).
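Under the idealized expanding-ring assumption, the timing model above predicts that a data structure reaches a node X hops away after X time units. The following Python sketch reproduces that prediction; it assumes that every hop costs the same $T_u$, and the four component values in the example are made up only so that they sum to 1 (the function name `arrival_times` is hypothetical).

```python
from collections import deque


def arrival_times(adjacency, source, t_prop, t_send, t_travel, t_rcv):
    """Ideal arrival time of a propagating overlay at every reachable node.

    Every hop costs T_u = t_prop + t_send + t_travel + t_rcv, and propagation
    is assumed to be a perfect expanding ring (breadth-first order).
    """
    t_u = t_prop + t_send + t_travel + t_rcv
    times = {source: 0.0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for n in adjacency[node]:
            if n not in times:
                times[n] = times[node] + t_u
                queue.append(n)
    return times


# A 5-node line network; the four components are made-up values summing to 1.
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < 5] for i in range(5)}
times = arrival_times(line, 0, t_prop=0.25, t_send=0.25, t_travel=0.25, t_rcv=0.25)
print([times[i] for i in range(5)])  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

The simulated curves in Figure 4(a) differ slightly from this straight line because real propagation is not a perfect expanding ring; modelling that gap would require per-link delays and backward corrections, which this sketch deliberately omits.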

Self-Maintaining Overlay Data Structures for Pervasive Autonomic Services


In these experiments, the time to complete maintenance operations shows a roughly bell-shaped behaviour. It is low near the topology change, because the data closest to the change are maintained first and therefore complete maintenance quickly. It then increases with the hop distance, since the local algorithm needs time to propagate information across the network to delete and update the data that must be maintained. Farther away, it decreases again because maintenance tends to remain confined, so much of the data needs no maintenance at all (it "maintains" in zero time). As for the other overlay data structures, Space data structures are maintained with one-hop-bounded operations only, so they are always maintained with a delay equal to 1. Metric data structures are either maintained with one-hop-bounded operations, and thus with a delay of 1, or, if the source moves, they are repropagated, and their timing then falls under the propagation and deletion case (previous subsection).

6 Conclusion and Future Work

In this paper, we have presented a modeling framework and some algorithms to create self-maintained overlay data structures. Such overlay data structures have proven useful in developing a number of application tasks in the context of distributed computing. Our future work will mainly be devoted to researching how the presented overlay data structures can be extended and generalized to cover more diverse application tasks.

Acknowledgments. This work was supported by the project CASCADAS (IST-027807), funded by the FET Program of the European Commission.

References
1. Bonabeau, E., Dorigo, M., Theraulaz, G.: Swarm Intelligence. From Natural to Artificial Systems. Oxford University Press, Oxford, United Kingdom (1999)
2. Ratnasamy, S., Francis, P., Handley, M., Karp, R.: A scalable content-addressable network. In: ACM SIGCOMM Conference. ACM Press, San Diego, CA, USA (2001)
3. Mamei, M., Zambonelli, F.: Field-based Coordination for Pervasive Multiagent Systems. Springer Verlag (2006)
4. Mamei, M., Zambonelli, F.: Programming pervasive and mobile computing applications with the TOTA middleware. In: IEEE International Conference on Pervasive Computing. IEEE CS Press, Orlando, FL, USA (2004)
5. Butera, W.: Embedded networks: Pervasive, low-power, wireless connectivity. PhD Thesis, Massachusetts Institute of Technology (2001)
6. Babaoglu, O., Meling, H., Montresor, A.: A framework for the development of agent-based peer-to-peer systems. In: International Conference on Distributed Computing Systems. IEEE CS Press, Wien, Austria (2002)
7. Menezes, R., Tolksdorf, R.: A new approach to scalable Linda-systems based on swarms. In: ACM Symposium on Applied Computing. ACM Press, Orlando, FL, USA (2003) 375–379
8. Nagpal, R.: Programmable self-assembly using biologically-inspired multiagent control. In: Proceedings of the 1st Joint Conference on Autonomous Agents and Multi-Agent Systems. ACM Press, Bologna, Italy (2002) 418–425
9. Shen, W., Salemi, B., Will, P.: Hormone-inspired adaptive communication and distributed control for CONRO self-reconfigurable robots. IEEE Transactions on Robotics and Automation 18 (2002) 1–12
10. Borcea, C., Iyer, D., Kang, P., Saxena, A., Iftode, L.: Cooperative computing for distributed embedded systems. In: International Conference on Distributed Computing Systems. IEEE CS Press, Wien, Austria (2002)
11. Intanagonwiwat, C., Govindan, R., Estrin, D.: Directed diffusion: A scalable and robust communication paradigm for sensor networks. In: ACM MobiCom. ACM Press, Boston, MA, USA (2000)
12. Chen, Y., Schwan, K.: Opportunistic overlays: Efficient content delivery in mobile ad hoc networks. In: International Middleware Conference, LNCS 3790, Grenoble, France (2005)
13. Eugster, P., Guerraoui, R., Handurukande, S., Kouznetsov, P., Kermarrec, A.: Lightweight probabilistic broadcast. ACM Transactions on Computer Systems 21 (2003) 341–374
14. Broch, J., Maltz, D., Johnson, D., Hu, Y., Jetcheva, J.: A performance comparison of multi-hop wireless ad hoc network routing protocols. In: ACM/IEEE Conference on Mobile Computing and Networking. ACM Press
15. Borcea, C., Iyer, D., Kang, P., Saxena, A., Iftode, L.: Spatial programming using smart messages: Design and implementation. In: International Conference on Distributed Computing Systems. IEEE CS Press, Tokyo, Japan (2004)
16. Nagpal, R., Shrobe, H., Bachrach, J.: Organizing a global coordinate system from local information on an ad hoc sensor network. In: International Workshop on Information Processing in Sensor Networks, Pasadena, CA, USA (2003)

Using Aggregation for Adaptive Super-Peer Discovery on the Gradient Topology

Jan Sacha, Jim Dowling, Raymond Cunningham, and René Meier

Distributed Systems Group, Trinity College, Dublin
{jsacha, jdowling, rcnnnghm, rmeier}@cs.tcd.ie

Abstract. Peer-to-peer environments exhibit very high diversity in individual peer characteristics, which range over orders of magnitude in terms of uptime, available bandwidth, and storage space. Many systems attempt to exploit this resource heterogeneity by using the best-performing and most reliable peers, called super-peers, to host system services. However, due to the inherent decentralisation, scale, dynamism, and complexity of P2P environments, self-managing super-peer selection is a challenging problem. In this paper, decentralised aggregation techniques are used to reduce the uncertainty about system properties by approximating the peer utility distribution, allowing peers to calculate adaptive thresholds in order to discover appropriate super-peers. Furthermore, a heuristic search algorithm is described that allows super-peers above a certain utility threshold to be efficiently discovered and utilised by any peer in the system.¹

1 Introduction

Measurements on deployed peer-to-peer (P2P) systems show that the distributions of peer characteristics such as uptime, bandwidth, or available storage space are highly skewed and often follow heavy-tailed distributions [1, 2, 3]. Researchers have also reported that the use of low-stability or low-performance peers can lead to poor performance in a P2P system [4, 5]. Consequently, in order to improve system efficiency, many P2P systems attempt to assign the most important services to selected, high-capability peers, called super-peers [6, 7, 8, 9, 10]. However, super-peer election in P2P environments poses a number of difficult problems. Due to the massive scale, dynamism, and complexity of P2P systems, it is not feasible for a peer or any other entity to maintain a global view of the system. The inherent decentralisation of P2P environments introduces uncertainty into decision making. Traditional election algorithms, such as the Bully algorithm [11], and other classical approaches to group communication [12] potentially require communication with all peers in the system and can only be applied to small clusters of peers. Other approaches to super-peer election, based on flooding or random walking [13], are difficult to apply in large P2P systems due to a high communication overhead that increases as the size of the P2P system grows. Solutions based on manual or static configuration of super-peers are inappropriate due to a lack of system-wide knowledge of peer characteristics.

¹ This work was supported by the European Union funded "Digital Business Ecosystem" Project IST-507953.

A. Keller and J.-P. Martin-Flatin (Eds.): SelfMan 2006, LNCS 3996, pp. 73–86, 2006. © Springer-Verlag Berlin Heidelberg 2006

J. Sacha et al.

This paper proposes a decentralised and self-managing approach to super-peer discovery based on gossipping. The paper introduces a peer utility metric in section 3 and applies a decentralised aggregation algorithm, covered in section 4, that generates a utility histogram in order to estimate the distribution of peer utility in the system. In section 5, a self-organising gradient topology is constructed based on the utility metric, which allows peers to apply an efficient search heuristic for super-peer discovery, described in section 6. The utility histogram is used to select the super-peer criteria adaptively. The presented approach is evaluated in section 7; this evaluation shows that the aggregation algorithm provides a good approximation of the peer utility distribution and that the search heuristic based on the utility metric achieves high search performance (significantly better than random walking). Section 8 concludes the paper.

2 Related Work

Recent research on P2P systems has focused primarily on Distributed Hash Tables (DHTs) [14, 15, 16, 17], where the main goal is to provide efficient routing between any pair of peers. In our approach, we focus on searching for peers with particular properties (high utility) in the system and, assuming that system services are placed on these peers, we provide a mechanism that allows the efficient discovery and consumption of these services. Furthermore, DHTs assume that peer identifiers are unique, relatively static, and uniformly distributed in a key space. In our approach, the utility is dynamic and may follow any distribution, with multiple peers allowed to have the same utility value.

A number of P2P systems based on super-peers have been proposed. Yang and Molina [6] investigate general principles of designing super-peer-based networks; however, they do not provide any specific super-peer election algorithm. OceanStore [18] proposed to elect a primary tier "consisting of a small number of replicas located in high-bandwidth, high connectivity regions of the network" for the purpose of handling updates; however, no specific algorithm for the election of such a tier is presented. Brocade [8] improves routing efficiency in a DHT by exploiting resource heterogeneity but, unlike our approach, does not address the super-peer election problem. In Chord [14, 10], it has been shown that the load between peers can be balanced by assigning multiple virtual servers to high-performance physical hosts. The DHT structure may be used for the discovery of under- or over-loaded peers using random sampling, distributed directories, and other similar techniques. Mizrak et al. [9] proposed the use of high-capacity super-peers to improve routing performance in a DHT. However, these systems focus on load balancing in a DHT rather than on the selection of potential super-peers from the set of all peers in the system.

Using Aggregation for Adaptive Super-Peer Discovery


Montresor [7] proposes a protocol for super-peer overlay generation; however, unlike our gradient topology, his topology maintains a discrete (binary) distinction between super-peers and client peers. In contrast, our approach introduces a continuous peer utility spectrum and approximates the distribution of peer utility in the system in order to discover super-peers above an adaptive utility threshold. Our neighbour selection algorithm can be seen as a special case of the T-Man protocol [19] that generates a gradient topology, where the ranking function is based on peer utility. The advantage of such a utility ranking function is that applications can exploit the underlying topology to elect appropriate super-peers. Kempe et al. [20] describe a push-based gossip algorithm for the computation of sums, averages, random samples, and quantiles, and provide a theoretical analysis of the algorithm. Montresor, Jelasity, and Babaoglu [21, 22] introduce a push-pull-based approach to aggregate computation; however, their model assumes that message exchange between any two peers is atomic and that the clocks of peers are synchronised. We have extended Kempe's algorithm to calculate histograms, and we have added a peer leave procedure that improves the behaviour of the algorithm in the face of peer churn. We use the aggregated data for super-peer discovery.

3 Peer Utility

This section introduces peer utility as a metric that captures the application-specific requirements and measures the capability of a peer to become a super-peer. Depending on the domain, the utility metric may involve a number of parameters. For example, in a P2P storage system, the utility may place greater emphasis on a peer's available local storage space and bandwidth. In a multimedia streaming application, the utility may combine a peer's latency and bandwidth, while in a grid computing system the utility may be a function of a peer's CPU load and availability. A simple approach to utility calculation would be for each peer to compute its own utility individually. A more sophisticated utility metric may involve feedback from neighbouring peers. In either case, the utility of a peer is a local, or microscopic, property of a peer (or of a neighbourhood of peers). In an untrusted environment, a decentralised approach to trust may be adopted to prevent malicious peers from providing fake utility information about themselves.

Given that the utility of each peer in the topology can be calculated by a common function $U()$, the selection of super-peers becomes a question of how an individual peer can discover a high-utility peer. In one possible approach, a peer may search for a super-peer above an absolute utility value. However, in many other applications, before a peer attempts to discover a high-utility peer, it needs to estimate the distribution of peer utility in the system in order to know what constitutes high utility in the running system. For example, if an application requires the selection of the most stable peers in the system, it needs to learn the peer stability characteristics before it can decide on the stability threshold for a super-peer. Otherwise, if the super-peer threshold is static (hardwired), it may happen that no peer in the system satisfies the criteria, or that the threshold is very low and hence sub-optimal. Moreover, due to the system's dynamism, the super-peer selection criteria have to be continuously adapted to the system's current state and peer availability.

In the remainder of this paper, we describe a set of algorithms that provide solutions to the problems highlighted above. We show a decentralised aggregation technique that allows peers to estimate the distribution of peer utility in the system and, from this, to identify an adaptive super-peer selection threshold. We then present the gradient topology and the gradient search heuristic, which enable the efficient discovery of (super-)peers above a given utility threshold.

4 Aggregation Algorithm

Our approach to aggregation is based on the algorithms described by Kempe [20] and Montresor [21]. We adopt a push-based gossip model, since it can be implemented using asynchronous message passing and does not require synchronisation between peers. In our approach, each peer continuously maintains estimates of a number of system properties by gossipping with neighbours. A peer $p$ has an estimate of the current number of peers in the system, $N_p$, the maximum peer utility in the system, $\mathit{Max}_p$, and a cumulative histogram of peer utility values, $H_p$. These values approximate the true system properties $N^*$, $\mathit{Max}^*$, and $H^*$. The cumulative utility histogram with $B$ bins of width $w$ is defined as

$$H(i) = \left|\{\, p \mid U(p) \ge i \cdot w \,\}\right|, \qquad 1 \le i \le B. \tag{1}$$

Parameter $B$ is also called the histogram resolution. The histogram is a discrete approximation of the peer utility distribution in $B$ points, where each bin corresponds to a single point of the distribution function. Peers joining the network contact any peer already in the system and obtain an initial set of neighbours and a current approximation of $N^*$, $\mathit{Max}^*$, and $H^*$. A newly joining peer has the minimum utility, which is zero, and the maximum utility of any peer is unbounded. The number of histogram bins, $B$, is constant in the algorithm. Peers periodically execute a gossip-based algorithm, where at one step (or round) of the algorithm a peer can send (push) messages to a number of neighbours and receive the messages sent by its neighbours in the previous round. A sequence of steps that leads to a new approximation of $N^*$, $\mathit{Max}^*$, and $H^*$ is called an aggregation epoch. An epoch can potentially be initiated by any peer at any time step, and the information about the newly created epoch is gradually propagated to all peers in the system. In order to distinguish between different, possibly overlapping, epochs, each epoch is tagged with a unique identifier selected by the initiating peer. Every peer $p$ maintains a cache, $\mathit{cache}_p$, that stores the identifiers of the aggregation epochs that this peer has participated in. The duration of an epoch is delimited by a time-to-live value. At the end of an epoch, every peer $p$ updates its estimates $N_p$, $\mathit{Max}_p$, and $H_p$.
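Formula (1) can be evaluated directly when all utilities are known, which is useful as a reference when checking the gossip-based estimates. The Python sketch below (function names are hypothetical) computes H(i) both directly and peer by peer, where a peer with utility u contributes 1 to bins 1 through floor(u / w). For integer i, the condition u ≥ i·w holds exactly when i ≤ floor(u / w), so both functions return the same histogram, apart from possible floating-point edge effects at exact bin boundaries.

```python
import math


def cumulative_histogram(utilities, bins, width):
    """Formula (1): H(i) = |{p : U(p) >= i * width}| for i = 1..bins.

    H(i) is stored at list index i - 1.
    """
    return [sum(1 for u in utilities if u >= i * width) for i in range(1, bins + 1)]


def histogram_from_contributions(utilities, bins, width):
    """Same histogram built peer by peer: utility u adds 1 to bins 1..floor(u / width)."""
    hist = [0] * bins
    for u in utilities:
        # Cap at `bins` so that utilities above bins * width fill every bin.
        for j in range(min(bins, math.floor(u / width))):
            hist[j] += 1
    return hist
```

The per-peer form is the one suited to gossip, since each peer only needs to know its own utility and the bin width.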


The algorithm performed at each step by a peer $p$ is shown in Figure 1. In line 1, peer $p$ starts a new aggregation epoch with probability $1/(F \cdot N_p)$. Thus, a new epoch is started by the system with average frequency $1/F$ (every $F$ time steps). The epoch is initiated by creating an aggregation message with a new epoch id and a weight $w = 1$, as specified by Kempe. The ttl field is initialised with an $O(\log N_p)$ value since, informally speaking, the propagation speed of push-based epidemics is exponential, and it requires only $O(\log N_p)$ steps to reach all peers in the system with high probability [20]. The histogram bin width is calculated as $hw = \mathit{Max}_p / B$. Furthermore, aggregation messages include a field used to estimate $N^*$, labelled n, a field used to estimate $\mathit{Max}^*$, labelled max, and, finally, a histogram, h, consisting of $B$ entries representing individual histogram bins. By combining all aggregation information in one message, the algorithm reduces the total number of messages and thus limits the network traffic generated. For a 100-bin histogram, the aggregation message size is below 1 KB.

Algorithm 1. Aggregation algorithm at a peer p at round t.
 1  with probability 1/(F·N_p) send message ⟨rand(), TTL, 1, 0, 0, Max_p/B, ⟨0⟩^B⟩ to self
 2  forall epoch identifiers id do
 3      let {m_i}_id be messages received at round t − 1 with identifier id
 4      M ← |{m_i}_id|
 5      let (id_i, ttl_i, w_i, n_i, max_i, hw_i, h_i) be m_i
 6      ttl ← (Σ_{i=1..M} ttl_i) / M
 7      w ← Σ_{i=1..M} w_i
 8      n ← Σ_{i=1..M} n_i
 9      max ← max_i(max_i)
10      for 0 ≤ j < B do
11          h(j) ← Σ_{i=1..M} h_i(j)
12      end
13      if id ∉ cache_p then
14          n ← n + 1
15          max ← max(U(p), max)
16          for 1 ≤ j ≤ ⌊U(p)/hw⌋ do
17              h(j) ← h(j) + 1
18          end
19      end
20      cache_p ← cache_p ∪ {id}
21      if ttl < 1 then
22          N_p ← n/w
23          Max_p ← max
24          H_p(j) ← h(j)/w
25      end
26      else
27          m ← (id, ttl − 1, w/2, n/2, max, hw, h/2)
28          send m to a random neighbour and to self
29      end
30  end
Fig. 1. Aggregation algorithm at peer p at time t

In lines 2-12 of Figure 1, a peer aggregates the received messages. A peer that receives an aggregation message with a new epoch identifier, i.e., with an id field that is not stored in its cache, joins the new aggregation (lines 13-20) by adding 1 to its n field and to the histogram bins selected according to formula (1). If the ttl value is less than 1 (indicating the end of the epoch), the peer updates its current estimates of the system properties (lines 21-25). Otherwise, the peer sends a message to a random neighbour and to itself, so that it continues to participate in the next aggregation round (lines 26-30).

The algorithm exhibits the property of mass conservation defined by Kempe [20], provided that no peers fail during an aggregation epoch. At any time step, for each aggregation epoch, the sum of the weights of all aggregation messages in the system is always equal to one, i.e., $\sum_i w_i = 1$. Furthermore, the sum of the n fields of all messages is equal to the number of peers participating in the aggregation, the maximum of the max fields is equal to the maximum utility among the peers participating in the aggregation, the average value of the ttl fields of all messages decreases by one at each subsequent round, and, for $1 \le j \le B$, $\sum_i h_i(j) = H^*(j)$, where $H^*$ is the utility histogram of the peers participating in the aggregation.

In order to ensure mass conservation, each peer leaving the system is required to perform the leave procedure shown in Figure 2. In lines 1-11 of this figure, a peer aggregates the currently buffered messages (as in lines 2-12 of Figure 1). In lines 12-15 of Figure 2, the peer subtracts 1 from the n field and from the histogram bins. Finally, in lines 16-18, the peer sends a message containing the aggregated values to a random neighbour.

Algorithm 2. Leave procedure at peer p
 1  forall epoch identifiers id do
 2      let {m_i}_id be messages received at previous round with identifier id
 3      M ← |{m_i}_id|
 4      let (id_i, ttl_i, w_i, n_i, max_i, hw_i, h_i) be m_i
 5      ttl ← (Σ_{i=1..M} ttl_i) / M
 6      w ← Σ_{i=1..M} w_i
 7      n ← Σ_{i=1..M} n_i
 8      max ← max_i(max_i)
 9      for 0 ≤ j < B do
10          h(j) ← Σ_{i=1..M} h_i(j)
11      end
12      n ← n − 1
13      for 1 ≤ j ≤ ⌊U(p)/hw⌋ do
14          h(j) ← h(j) − 1
15      end
16      m ← (id, ttl − 1, w, n, max, hw, h)
17      send m to a random neighbour
18  end
Fig. 2. Leave procedure at peer p

During one round of the aggregation algorithm, each peer participating in an epoch generates one aggregation message. Epochs are initiated on average every $F$ rounds (frequency $1/F$), and since each epoch lasts on average TTL rounds, the average number of aggregation messages generated and received by each peer in one round is bounded by $O(\mathit{TTL}/F)$, or $O(\frac{1}{F}\log N)$ if TTL is $O(\log N)$.
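The following Python sketch simulates a single epoch of the aggregation algorithm in Figure 1 so that the mass-conservation argument can be checked numerically. It is a simplified illustration, not the authors' simulator, and the simplifications are ours: a fully connected network (any other peer counts as a neighbour), synchronous rounds, no churn, one epoch only, and an initiator whose estimate Max_p is passed in as `max_estimate`. The function name `run_epoch` is hypothetical.

```python
import random


def run_epoch(utilities, bins, ttl, max_estimate, rng):
    """Simulate one aggregation epoch (Figure 1) and return per-peer estimates.

    Assumptions (ours): fully connected network, synchronous rounds, no churn,
    a single epoch, and an initiator whose estimate Max_p is `max_estimate`.
    Returns {peer: (N_p, Max_p, H_p)}, where H_p[j] approximates H(j + 1).
    """
    n_peers = len(utilities)
    hw = max_estimate / bins                 # histogram bin width
    joined = [False] * n_peers               # plays the role of cache_p
    inbox = [[] for _ in range(n_peers)]
    initiator = rng.randrange(n_peers)
    # Line 1: the initiator sends (weight 1, n 0, max 0, empty histogram) to itself.
    inbox[initiator].append((1.0, 0.0, 0.0, [0.0] * bins))
    estimates = {}
    ttl_now = ttl
    while True:
        outbox = [[] for _ in range(n_peers)]
        for p, msgs in enumerate(inbox):
            if not msgs:
                continue
            # Lines 2-12: merge the messages received in the previous round.
            w = sum(m[0] for m in msgs)
            n = sum(m[1] for m in msgs)
            mx = max(m[2] for m in msgs)
            h = [sum(m[3][j] for m in msgs) for j in range(bins)]
            # Lines 13-20: on first contact, add this peer's own contribution.
            if not joined[p]:
                joined[p] = True
                n += 1
                mx = max(mx, utilities[p])
                for j in range(min(bins, int(utilities[p] // hw))):
                    h[j] += 1
            if ttl_now < 1:
                # Lines 21-25: end of the epoch, normalise by the weight.
                estimates[p] = (n / w, mx, [x / w for x in h])
            else:
                # Lines 26-30: keep half of the mass, push half to a random other peer.
                half = (w / 2, n / 2, mx, [x / 2 for x in h])
                q = rng.randrange(n_peers - 1)
                q = q + 1 if q >= p else q
                outbox[p].append(half)
                outbox[q].append(half)
        if ttl_now < 1:
            return estimates
        inbox = outbox
        ttl_now -= 1


if __name__ == "__main__":
    rng = random.Random(1)
    utils = [rng.uniform(0.0, 50.0) for _ in range(500)]
    est = run_epoch(utils, bins=100, ttl=40, max_estimate=max(utils), rng=rng)
    n_est, max_est, _ = est[0]
    print(f"N estimate at peer 0: {n_est:.2f}, Max estimate: {max_est:.2f}")
```

Because every forwarded message halves w, n, and h together, the ratios n/w and h(j)/w converge to the true totals, which is why the estimates are computed by dividing by the weight. Peer failures during an epoch would break this invariant, which is what the leave procedure in Figure 2 guards against.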

5 Gradient Topology

In this section, we introduce the self-organising gradient P2P topology and outline its main properties. We briefly discuss the neighbour selection algorithm that generates the gradient topology, which is exploited by the gradient search heuristic. The gradient topology is a P2P topology where the highest utility peers are connected with each other and form a core in the system, while lower utility peers are located gradually farther from the core. The core, which clusters the highest utility peers in the system, corresponds to the set of super-peers in the system. Figure 3 shows a visualisation of a gradient topology. The position of each peer in the topology is determined by the peer's utility. We have designed and evaluated a self-organising neighbour selection algorithm that generates the gradient topology in a completely decentralised P2P environment.

Fig. 3. Visualisation of a gradient topology

Each peer $p$ maintains two sets of neighbours: a similarity-based set, $S_p$, and a random set, $R_p$. Peers periodically gossip with each other and exchange their sets. On receiving both sets from a neighbour, a gossipping peer selects the entry whose utility level is closest to its own utility and uses it to replace an entry in its similarity-based set. This behaviour clusters peers with similar utility characteristics and generates the core of the network, surrounded by peers with gradually decreasing utility. In addition, a gossipping peer randomly selects an entry from the received random set and uses it to replace a random entry in its own random set. Connections to random peers allow peers to explore the network in order to discover other potentially similar neighbours. This significantly reduces the possibility of more than one cluster of high utility peers forming in the network. Random connections also reduce the possibility of the gradient topology partitioning due to excessive clustering. Moreover, random connections between peers are used by the aggregation algorithm described in section 4. Peer $p$ removes a random entry from $R_p$ or $S_p$ if the number of entries in the sets exceeds the maximum allowed number of connections. In addition to the neighbour sets, a peer $p$ maintains a cache $U_p$ that stores an estimated utility value, $U_p(q)$, for each neighbour $q$. Entries in the cache are timestamped, and peers exchange these entries whenever they gossip. Our initial evaluation of the neighbour selection algorithm, described in [23], shows that the algorithm generates a P2P topology with a very small diameter (on the order of 5-6 hops for 100,000 peers) and that the topology has a global gradient structure. The emergence of a gradient topology is a result of the system's self-organisation. Peers are independent, have limited knowledge about the system, and interact with a limited number of neighbours. Utility can be considered a microscopic property of a peer that, through peer interaction, enables the construction of the macroscopic gradient structure.
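A single gossip exchange of the neighbour selection algorithm can be sketched in Python as follows. This is an illustration under our own assumptions, not the authors' code: peers are integer ids, one shared `utility` dict stands in for the timestamped cache U_p, and when a set overflows we evict a random entry other than the one just adopted (the text only states that a random entry is removed). The function name `gossip_update` is hypothetical.

```python
import random


def gossip_update(me, my_utility, similar, rand_set, recv_similar, recv_random,
                  utility, max_similar, max_random, rng=None):
    """Apply one gossip exchange to this peer's sets S_p and R_p (updated in place).

    similar, rand_set: this peer's similarity-based and random sets (peer ids).
    recv_similar, recv_random: the sets received from the gossip partner.
    utility: dict peer id -> estimated utility (stand-in for the cache U_p).
    max_similar must be at least 1.
    """
    if rng is None:
        rng = random.Random()
    # Similarity-based set: adopt the received entry whose utility is closest to ours.
    candidates = (set(recv_similar) | set(recv_random)) - similar - {me}
    if candidates:
        best = min(candidates, key=lambda q: abs(utility[q] - my_utility))
        similar.add(best)
        if len(similar) > max_similar:
            # Assumption: evict a random entry other than the one just adopted.
            similar.remove(rng.choice(sorted(similar - {best})))
    # Random set: take one random received entry in place of a random current entry.
    fresh = set(recv_random) - rand_set - {me}
    if fresh:
        if len(rand_set) >= max_random:
            rand_set.remove(rng.choice(sorted(rand_set)))
        rand_set.add(rng.choice(sorted(fresh)))
```

Repeating this exchange with random partners is what gradually pulls high utility peers together into the core, while the random set keeps the overlay connected.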

6 Gradient Search

The gradient structure of the topology allows us to develop an efficient search heuristic, called gradient search, that enables the discovery of high utility peers in the system. The algorithm exploits the information contained in the topology to limit the search space to a relatively small subset of peers and to achieve a significantly better search performance than traditional search techniques, such as random walking [24]. The goal of the search algorithm is to deliver a message from any peer in the system to a super-peer in the core, i.e., to a peer with utility above a certain threshold. The value of the threshold is assigned by a peer that initiates the search and can be calculated using the utility histogram generated by the aggregation algorithm described in section 4. The threshold is included in the search message. A peer below the specified utility threshold forwards search messages to higher utility peers until a peer is found whose utility is above the threshold.
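One plausible way to derive such a threshold from the aggregated estimates is sketched below in Python. The rule is our assumption, not a procedure stated in the paper: to target roughly the top fraction f of peers (for example, the 1% used in section 7), pick the lowest bin boundary i·w at which the estimated cumulative count H_p(i) drops to at most f·N_p. The function name `utility_threshold` is hypothetical.

```python
def utility_threshold(hist, n_est, bin_width, top_fraction):
    """Lowest bin boundary i * bin_width with at most top_fraction * n_est peers above it.

    hist[i - 1] is the estimated cumulative count H_p(i) = |{p : U(p) >= i * w}|.
    If no bin qualifies, the top boundary B * w is returned.
    """
    target = top_fraction * n_est
    for i, count in enumerate(hist, start=1):
        if count <= target:
            return i * bin_width
    return len(hist) * bin_width
```

Because H_p and N_p are refreshed at the end of every aggregation epoch, a threshold computed this way follows changes in the utility distribution without any global coordination.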


Each message is associated with a time-to-live (TTL) value that determines the maximum number of hops over which the message can be propagated. In gradient search, each peer greedily forwards messages to its highest utility neighbour, i.e., to a neighbour $q$ whose utility is equal to

$$\max_{x \in S_p \cup R_p} U_p(x).$$

Thus, messages are forwarded along the utility gradient, as in hill climbing and other similar techniques. It is important to note that the gradient search strategy is generally applicable only to a gradient topology, since it assumes that a higher utility peer is closer to the core, in terms of the number of hops, than a lower utility peer. Local maxima should never occur in an idealised gradient topology; however, every P2P system is under constant churn, and the gradient topology may undergo local perturbations from the idealised structure. In order to prevent message looping in the presence of such local maxima, a list of visited peers is appended to each search message, and a constraint is imposed that messages are never forwarded to already visited peers.
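The forwarding rule can be sketched in Python as follows. This is a simplified illustration with our own assumptions: one global `utility` dict replaces each peer's cache U_p, `neighbours[p]` stands for S_p ∪ R_p, and a message is delivered as soon as it reaches a peer whose utility is strictly above the threshold. The function name `gradient_search` is hypothetical.

```python
def gradient_search(start, threshold, neighbours, utility, ttl):
    """Greedy forwarding along the utility gradient with a visited list and a TTL.

    neighbours: dict peer -> iterable of neighbour ids (stand-in for S_p and R_p).
    utility: dict peer -> utility (stand-in for the caches U_p).
    Returns (peer, hops) when a peer above the threshold is reached, else None.
    """
    visited = {start}
    current, hops = start, 0
    while utility[current] <= threshold:
        if hops == ttl:
            return None  # TTL exhausted before reaching the core
        # Never forward to an already visited peer (prevents looping at local maxima).
        candidates = [q for q in neighbours[current] if q not in visited]
        if not candidates:
            return None  # dead end: every neighbour has already been visited
        current = max(candidates, key=lambda q: utility[q])
        visited.add(current)
        hops += 1
    return current, hops
```

For example, on a small chain where utility increases towards one end, the message climbs one hop per step until it reaches a peer above the threshold; with too small a TTL it is dropped instead.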

7 Experimental Evaluation

In this section, we describe our experimental setup and present the results of our experiments. The experiments evaluate the precision of the aggregation algorithm and the performance of gradient search. The following notation and metrics are used. In the first experiment, we measure the average error in histogram estimation, $\mathit{Err}_H$, defined as

$$\mathit{Err}_H(T) = \sum_{t=1}^{T} \sum_{p=1}^{N_t^*} \frac{D(H_t^*, H_{p,t})}{T \cdot N_t^*} \tag{2}$$

where $N_t^*$, $N_{p,t}$, $\mathit{Max}_t^*$, $\mathit{Max}_{p,t}$, $H_t^*$ and $H_{p,t}$ correspond to $N^*$, $N_p$, $\mathit{Max}^*$, $\mathit{Max}_p$, $H^*$ and $H_p$ at time $t$ of the experiment, $T$ is the duration of the experiment, and $D$ is a histogram distance function defined as

$$D(H_t^*, H_{p,t}) = \frac{\sum_{i=1}^{B} \left| H_t^*(i) - H_{p,t}(i) \right|}{\sum_{i=1}^{B} H_t^*(i)} \tag{3}$$
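Equations (2) and (3) translate directly into code. The Python sketch below (function names are hypothetical) shows how the error metric could be computed from logged true and estimated histograms; it uses the equivalent form of Eq. (2) in which the per-peer distances are first averaged over the N*_t peers at each time step and then averaged over time.

```python
def histogram_distance(true_h, est_h):
    """Eq. (3): L1 distance between histograms, normalised by the total mass of H*."""
    return sum(abs(a - b) for a, b in zip(true_h, est_h)) / sum(true_h)


def err_h(true_hists, est_hists):
    """Eq. (2): Err_H(T), the mean histogram distance over peers and time steps.

    true_hists[t] is H*_t; est_hists[t] lists the estimates H_{p,t} of the N*_t peers.
    """
    total = 0.0
    for true_h, peer_hists in zip(true_hists, est_hists):
        total += sum(histogram_distance(true_h, h) for h in peer_hists) / len(peer_hists)
    return total / len(true_hists)
```

Err_N and Err_Max can be computed in the same way, with the histogram distance replaced by the relative error of a scalar estimate.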

Similarly, we define $\mathit{Err}_N$ as the average error in the estimation of $N_t^*$, and $\mathit{Err}_{Max}$ as the average error in the estimation of $\mathit{Max}_t^*$ over the course of the experiment. In the second experiment, we compare the performance of gradient search with random walking by measuring two properties of both algorithms: the average number of hops in which the algorithms deliver a search message from a random peer in the network to a super-peer in the core, i.e., to a peer above a certain utility threshold, and the search success rate, i.e., the percentage of search messages that are delivered to super-peers.

We evaluate the aggregation and search algorithms in a discrete event simulator. An individual experiment consists of a set of peers, connections between peers, and messages passed between peers. We assume all peers are mutually reachable, i.e., any pair of peers can establish a connection. We also assume that it takes exactly one time step to pass a message between a pair of connected peers. We do not model network congestion; however, we limit the maximum number of concurrent connections per peer. In order to reflect network heterogeneity, we limit the number of peer connections according to the Pareto distribution (power law) with an exponent of 1.5 and a mean of 24 connections per peer.

The simulated P2P network is under constant churn. Every new peer $p$ is assigned a session duration, $s_p$, according to the Pareto distribution with an exponent of $\gamma = 1.5$ and minimum value $s_{min}$. Thus, the expected session duration is given by the formula $E(s) = \frac{\gamma \cdot s_{min}}{\gamma - 1}$. We calculate the churn rate as the fraction of peers that leave (or join) the system at one step of the simulation. Over the lifetime of a running system, the average churn rate, $E(c)$, is equal to the inverse of the expected peer session time, $E(s)$; therefore, in order to simulate a churn rate $c$ in the system, we set $s_{min}$ to

$$s_{min} = \frac{\gamma - 1}{\gamma \cdot c} \tag{4}$$
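The churn model can be checked with a few lines of Python. In this sketch (function names are hypothetical), `s_min_for_churn` implements Eq. (4), and `expected_session` confirms that the resulting Pareto mean equals 1/c. Sessions are drawn with the standard library's `random.Random.paretovariate`, which samples a Pareto distribution with minimum value 1, so the result is scaled by s_min.

```python
import random


def s_min_for_churn(churn_rate, gamma=1.5):
    """Eq. (4): minimum session length that yields a mean session length of 1 / c."""
    return (gamma - 1) / (gamma * churn_rate)


def expected_session(s_min, gamma=1.5):
    """Mean of a Pareto distribution with shape gamma > 1 and minimum s_min."""
    return gamma * s_min / (gamma - 1)


def sample_session(s_min, rng=None, gamma=1.5):
    """Draw one session duration; paretovariate(gamma) returns values >= 1."""
    if rng is None:
        rng = random.Random()
    return s_min * rng.paretovariate(gamma)
```

For a churn rate of c = 0.01, this gives s_min = 0.5 / 0.015 ≈ 33.3 time steps and an expected session length of 100 time steps. With γ = 1.5 the variance is infinite, so sample means converge slowly and occasional very long sessions are expected.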

We assume that 10% of peers leave the system without performing the leave procedure of the aggregation algorithm (i.e., they crash). A central bootstrap server is used that stores the addresses of the peers that have most recently joined the network. The list includes "dangling references" to peers that may have already left the system. Every joining peer receives an initial random set of 20 neighbours from the bootstrap server. If a peer becomes isolated from the network (i.e., has no neighbours), it is bootstrapped again. The bootstrap server executes the aggregation algorithm and provides initial estimates of $N^*$, $\mathit{Max}^*$, and $H^*$ for peers entering the system. We start each individual experiment from a network consisting of a single peer. The number of peers is increased by one percent at each time step until the network grows to the size required by the experiment. Afterwards, the network is still under continuous churn; however, the rate of arrivals is equal to the rate of departures, and the number of peers in the system remains constant. Each peer continuously performs the neighbour selection and aggregation algorithms at every time step after it is bootstrapped. Additionally, at each time step, a number of randomly selected peers emit search messages that are routed using gradient search. All peers attempt either to deliver search messages, if their utility is higher than the specified threshold, or to forward them to neighbours. For the purpose of the simulation, in all experiments, the number of bins in the utility histogram is 100, the aggregation frequency parameter $F$ is 10 (except in Figure 4), and TTL is set to $3 \cdot \log(N) + 10$ hops. The utility function of a peer $p$ with uptime $u$ and $d$ maximum connections with neighbours is defined as $U(p) = d \cdot \log(u + 1)$. Figure 4 shows the average precision of the $N^*$ estimate as a function of time and compares the results obtained for three different values of $F$. The best approximation, close to $N^*$, is obtained for $F = 10$; random fluctuations are visible.
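The simulation parameters above can be collected into a small Python sketch. Several details are our assumptions rather than statements from the paper: the logarithms are taken as natural logarithms, the TTL is rounded up to a whole number of hops, and the Pareto-distributed connection limit is rounded to the nearest integer. For a Pareto distribution with exponent 1.5, a mean of 24 corresponds to a minimum of 24 · 0.5 / 1.5 = 8 connections. All function names are hypothetical.

```python
import math
import random


def peer_utility(max_connections, uptime):
    """U(p) = d * log(u + 1); the natural logarithm is an assumption."""
    return max_connections * math.log(uptime + 1)


def aggregation_ttl(n_peers):
    """TTL = 3 * log(N) + 10 hops; natural log and rounding up are assumptions."""
    return math.ceil(3 * math.log(n_peers) + 10)


def sample_max_connections(rng=None, mean=24.0, alpha=1.5):
    """Pareto-distributed connection limit with the given mean (rounding is assumed)."""
    if rng is None:
        rng = random.Random()
    x_min = mean * (alpha - 1) / alpha  # 8.0 for a mean of 24 and exponent 1.5
    return round(x_min * rng.paretovariate(alpha))
```

Under these assumptions a network of 10,000 peers would use a TTL of 38 hops, and a newly joined peer (uptime 0) starts with utility zero, matching the statement in section 4 that joining peers have minimum utility.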


[Figure 4 plot: Number of Peers vs. Time Steps; curves N*, F=100, F=30, F=10]

Fig. 4. Average estimation of the number of peers in the system (N) as a function of time. Three experiments are compared, with the frequency of aggregation (F) set to 100, 30, and 10 time steps.

[Figure 5 plots: (a) Relative Estimation Error vs. Substituted Peers per Time Step; (b) Relative Estimation Error vs. Number of Peers; curves N, Max, Histogram]

Fig. 5. Average estimation error of the number of peers in the system (N), maximum utility (Max), and the utility distribution (Histogram) as a function of peer churn rate (a) and network size (b)

Figure 5 shows the average error of the aggregation algorithm, ErrN, ErrMax, and ErrH, as a function of the churn rate and as a function of the network size. The variance is not shown, as it is approximately two orders of magnitude lower than the plotted values. The churn rate is measured as the number of substituted peers per time step. The estimation of Max* is the most precise, because the algorithm for computing the maximum is simpler than the algorithms for estimating H* and N*. The approximation of H* is less accurate than that of N*, since the histogram changes more dynamically than the number of peers. The relative error remains bounded as the number of peers grows, because the number of rounds in an epoch is proportional to log(N), which corresponds to the theoretical analysis of Kempe et al. [20].

J. Sacha et al.

[Figure 6: gradient search vs. random walk; (a) search hop count (0 to 60) against number of peers (0 to 100,000); (b) search success rate (0 to 1) against churn rate (0 to 0.1)]

Fig. 6. Average hop count of search messages delivered to peers above the 1% utility threshold as a function of the number of peers in the system (a), and average success rate of searching for peers above the 1% utility threshold as a function of peer churn rate (b). Gradient search is compared with random walking.

Figure 6(a) shows the average hop count of search messages delivered to peers above the 1% utility threshold as a function of the number of peers in the system. Figure 6(b) shows the average success rate of searching for peers above the same utility threshold as a function of peer churn rate. Both figures show that gradient search outperforms random walk.
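The log(N) epoch length mentioned above comes from gossip-based aggregation in the style of Kempe et al. [20]. As a hedged illustration (this is the push-sum protocol from that line of work, not the authors' own aggregation algorithm, which is not reproduced here), network size can be estimated by letting one peer start with value 1 and all others with 0, every peer with weight 1; each round, a peer keeps half of its value and weight and pushes the other half to a random peer. The ratio weight/value then converges to N at every peer. Defining the relative error as |N* - N| / N is an assumption, as are all names.

```python
import random


def push_sum_size_estimates(num_peers: int, rounds: int, seed: int = 1) -> list[float]:
    """Estimate network size N at every peer with push-sum gossip.

    Total value (1) and total weight (N) are conserved, so each peer's
    value/weight converges to 1/N and weight/value converges to N.
    """
    rng = random.Random(seed)
    values = [0.0] * num_peers
    values[0] = 1.0
    weights = [1.0] * num_peers
    for _ in range(rounds):
        next_values = [0.0] * num_peers
        next_weights = [0.0] * num_peers
        for i in range(num_peers):
            half_v, half_w = values[i] / 2, weights[i] / 2
            # Keep one half locally...
            next_values[i] += half_v
            next_weights[i] += half_w
            # ...and push the other half to a uniformly random peer.
            j = rng.randrange(num_peers)
            next_values[j] += half_v
            next_weights[j] += half_w
        values, weights = next_values, next_weights
    # A peer that has not yet received any value has no estimate (inf).
    return [w / v if v > 0 else float("inf") for v, w in zip(values, weights)]


def relative_error(estimate: float, actual: float) -> float:
    """Relative estimation error |estimate - actual| / actual."""
    return abs(estimate - actual) / actual


if __name__ == "__main__":
    n = 2_000
    for rounds in (5, 10, 20, 40):
        estimates = push_sum_size_estimates(n, rounds)
        print(rounds, max(relative_error(e, n) for e in estimates))
```

Running the usage loop shows the worst-case error shrinking as rounds are added, which is why an epoch length growing with log(N) suffices to keep the error bounded as the network grows.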

8 Conclusions

In this paper we have shown that the combination of a peer utility metric, aggregation techniques, and a gradient topology with gradient searching allows the discovery of super-peers in peer-to-peer environments. Decentralised aggregation techniques reduce the uncertainty about the system by approximating the peer utility distribution, and enable the decentralised and adaptive calculation of super-peer utility thresholds. The neighbour selection algorithm used in the gradient topology allows peers to self-organise into a system-level gradient structure based on a local peer utility metric. The information contained in this topology enables efficient searching for (super-)peers above a given utility threshold.
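The two search strategies compared in Fig. 6 differ only in their forwarding rule. The sketch below assumes that gradient search greedily forwards a message to the highest-utility neighbour not yet visited (falling back to any neighbour when all have been visited), while a random walk forwards to a uniformly chosen neighbour; both deliver at the first peer whose utility reaches the threshold. It does not build the gradient topology itself, and the toy graph in the usage example is hypothetical.

```python
import random
from collections.abc import Hashable
from typing import Dict, List, Optional

Graph = Dict[Hashable, List[Hashable]]


def gradient_search(start, neighbours: Graph, utility: Dict, threshold: float,
                    ttl: int) -> Optional[int]:
    """Greedy forwarding towards higher utility.

    Returns the hop count at which a peer with utility >= threshold
    receives the message, or None if the TTL expires first.
    """
    current, visited = start, {start}
    for hops in range(ttl + 1):
        if utility[current] >= threshold:
            return hops  # deliver: this peer is above the threshold
        if hops == ttl:
            break
        fresh = [n for n in neighbours[current] if n not in visited]
        candidates = fresh or neighbours[current]  # avoid loops when possible
        if not candidates:
            break
        current = max(candidates, key=lambda n: utility[n])
        visited.add(current)
    return None


def random_walk(start, neighbours: Graph, utility: Dict, threshold: float,
                ttl: int, rng: random.Random) -> Optional[int]:
    """Baseline: forward to a uniformly random neighbour at each hop."""
    current = start
    for hops in range(ttl + 1):
        if utility[current] >= threshold:
            return hops
        if hops == ttl or not neighbours[current]:
            break
        current = rng.choice(neighbours[current])
    return None


if __name__ == "__main__":
    # Toy graph: utility rises along 0 -> 1 -> 2 -> 3; peer 4 is a low-utility branch.
    neighbours = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2], 4: [0]}
    utility = {0: 1.0, 1: 2.0, 2: 5.0, 3: 9.0, 4: 0.5}
    print(gradient_search(0, neighbours, utility, threshold=9.0, ttl=10))
    print(random_walk(0, neighbours, utility, threshold=9.0, ttl=10, rng=random.Random(7)))
```

On a topology where utility increases towards the core, the greedy rule follows the gradient directly, whereas the random walk may wander into low-utility branches and exhaust its TTL.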

References
1. Sen, S., Wong, J.: Analyzing peer-to-peer traffic across large networks. Transactions on Networking 12 (2004) 219–232
2. Gummadi, K.P., Dunn, R.J., Saroiu, S., Gribble, S.D., Levy, H.M., Zahorjan, J.: Measurement, modeling, and analysis of a peer-to-peer file-sharing workload. In: Proceedings of the Symposium on Operating Systems Principles. (2003) 314–329
3. Leibowitz, N., Ripeanu, M., Wierzbicki, A.: Deconstructing the Kazaa network. In: Proceedings of the 3rd International Workshop on Internet Applications. (2003) 112–120
4. Bhagwan, R., Savage, S., Voelker, G.M.: Understanding availability. In: the 2nd International Workshop on Peer-to-Peer Systems. (2003)
5. Rhea, S., Geels, D., Roscoe, T., Kubiatowicz, J.: Handling churn in a DHT. In: Proceedings of the USENIX 2004 Annual Technical Conference. (2004) 127–140
6. Yang, B., Garcia-Molina, H.: Designing a super-peer network. In: Proceedings of the 19th International Conference on Data Engineering. (2003) 49–60
7. Montresor, A.: A robust protocol for building superpeer overlay topologies. In: Proceedings of the 4th International Conference on Peer-to-Peer Computing. (2004) 202–209
8. Zhao, B.Y., Duan, Y., Huang, L., Joseph, A.D., Kubiatowicz, J.D.: Brocade: Landmark routing on overlay networks. In: Proceedings of the 1st International Workshop on Peer-to-Peer Systems. (2002) 34–44
9. Mizrak, A.T., Cheng, Y., Kumar, V., Savage, S.: Structured superpeers: Leveraging heterogeneity to provide constant-time lookup. In: Proceedings of the 3rd IEEE Workshop on Internet Applications. (2003) 104–111
10. Rao, A., Lakshminarayanan, K., Surana, S., Karp, R., Stoica, I.: Load balancing in structured P2P systems. In: the 2nd International Workshop on Peer-to-Peer Systems. (2003)
11. Garcia-Molina, H.: Elections in a distributed computing system. IEEE Transactions on Computers 31(1) (1982) 48–59
12. van Renesse, R., Birman, K.P., Maffeis, S.: Horus, a flexible group communication system. Communications of the ACM 39(4) (1996) 76–83
13. Yang, B., Garcia-Molina, H.: Improving search in peer-to-peer networks. In: Proceedings of the 22nd International Conference on Distributed Computing Systems. (2002) 5–14
14. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: A scalable peer-to-peer lookup service for Internet applications. SIGCOMM Computer Communication Review 31(4) (2001) 149–160
15. Ratnasamy, S., Francis, P., Handley, M., Karp, R., Shenker, S.: A scalable content-addressable network. In: Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. (2001) 161–172
16. Rowstron, A.I.T., Druschel, P.: Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In: Proceedings of the 18th International Conference on Distributed Systems Platforms. (2001) 329–350
17. Manku, G.S., Bawa, M., Raghavan, P.: Symphony: Distributed hashing in a small world. In: Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems. (2003) 127–140
18. Kubiatowicz, J., Bindel, D., Chen, Y., Czerwinski, S., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Weimer, W., Wells, C., Zhao, B.: OceanStore: An architecture for global-scale persistent storage. In: Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems. (2000) 190–201
19. Jelasity, M., Babaoglu, O.: T-Man: Gossip-based overlay topology management. In: the 3rd International Workshop on Engineering Self-Organising Applications. (2005)
20. Kempe, D., Dobra, A., Gehrke, J.: Gossip-based computation of aggregate information. In: Proceedings of the 44th IEEE Symposium on Foundations of Computer Science. (2003) 482–491
21. Montresor, A., Jelasity, M., Babaoglu, O.: Robust aggregation protocols for large-scale overlay networks. In: Proceedings of the International Conference on Dependable Systems and Networks. (2004) 19–28
22. Jelasity, M., Montresor, A.: Epidemic-style proactive aggregation in large overlay networks. In: Proceedings of the 24th International Conference on Distributed Computing Systems. (2004) 102–109
23. Sacha, J., Dowling, J.: A self-organising topology for master-slave replication in P2P environments. In: Proceedings of the 3rd International Workshop on Databases, Information Systems and Peer-to-Peer Computing. (2005) 52–64
24. Sacha, J., Dowling, J., Cunningham, R., Meier, R.: Discovery of stable peers in a self-organising peer-to-peer gradient topology. In: Proceedings of the 6th IFIP International Conference on Distributed Applications and Interoperable Systems. (2006) To appear.

Self-Adaptive Applications Using ADL Contracts

Leonardo Cardoso1, Alexandre Sztajnberg2, and Orlando Loques1

1 Instituto de Computação, Universidade Federal Fluminense (UFF), Rua Passo da Pátria, 156, Niterói - Brazil
{cardoso, loques}@ic.uff.br
2 DICC/IME and PEL/FEN, Universidade do Estado do Rio de Janeiro (UERJ), Rua S. F Xavier, 524 – Maracanã, Rio de Janeiro - Brazil
[email protected]

Abstract. This paper presents a comprehensive approach to facilitate the specification, deployment and self-management of application architectures that have context-sensitive non-functional requirements. The approach is centered on high-level contracts, associated with architectural descriptions, which are used to specify execution context requirements and to control configuration adaptations in the application's supporting infrastructure. A videoconference application is used to present the relevant features of the proposal and to evaluate the approach through an implementation.

A. Keller and J.-P. Martin-Flatin (Eds.): SelfMan 2006, LNCS 3996, pp. 87–101, 2006. © Springer-Verlag Berlin Heidelberg 2006

1 Introduction

The operation and utility of many distributed application architectures depend on context-sensitive non-functional requirements, which must be enforced in addition to their basic functional requirements. In our terms, context-sensitive non-functional requirements express concerns related to resources (e.g., processing, memory and communication), features (e.g., fault tolerance and security) or other services (e.g., location services, game providers, streamers, codecs) that complement the support infrastructure for a given application architecture. Considering that the same basic functional architecture can be deployed in different dynamic contexts, and that the non-functional requirements have some degree of flexibility, it is useful to adapt the architecture continuously to each particular context. This idea is present in many areas of current concern, among them context-sensitive and autonomic computing.

On the other hand, given that component technology has reached a mature stage, much effort is being put into characterizing the diverse non-functional aspects associated with typical architectural-level components (clients, servers and interaction mechanisms). This characterization provides the foundations for designing description (or specification) languages that offer abstractions to formalize the non-functional requirements of a given application architecture. Starting from the architectural-level non-functional requirements, such special-purpose languages allow the description of contracts that establish the individual and mutual requirements of the entities composing a given application architecture.

In a dynamic context, there will probably be several configuration options that comply with the demands of a given architectural-level contract. Thus, in order to support a self-adaptive architecture, it is worth defining a set of configuration options, one of which can be deployed initially, while the others can be deployed in the different contextual situations that may arise during the application's lifetime. As a further feature, the contract description language can therefore include an abstraction for describing a set of different configuration options, which satisfy the requirements of the application with different degrees of quality. (In our proposal, we use the word service to name each particular configuration option in a given contract; see Section 3.) Once a configuration option is selected, it can be used to guide adaptations in the supporting infrastructure in order to guarantee the non-functional properties of the application.

Based on the concepts described above, we present the CR-RIO approach [12] and discuss how it caters for the deployment and management of self-adaptive, context-aware applications. Besides the introduction and the conclusion, the paper presents the elements of our approach (Section 2), a prototype videoconference application, which provides a basis for discussing some aspects of our approach (Section 3), and implementation details and some evaluation results (Section 4). We also present related work in Section 5.

2 The CR-RIO Approach

The CR-RIO (Contractual-Reflective Reconfigurable Objects) approach is centered on an architectural model and uses a contract description language (CBabel) to express the application's context-sensitive non-functional requirements. Based on these elements, a supporting infrastructure was developed to: (i) parse the contract specifications and store them as meta-level data associated with the application; (ii) provide reflective and adaptive mechanisms, which allow the application's configuration (including its supporting elements) to be adapted in order to cope with the contract demands; and (iii) provide a set of mechanisms to interpret, impose, monitor and manage these contracts. The highlight of the approach is the direct association between the application's architectural components and the non-functional requirements described in a contract. This allows the designer to describe, with the required granularity, in which part of the application's component configuration a given contract has to be imposed. In addition, the adaptations that have to be performed to maintain a service quality level described in a contract are also made explicit over the application's architecture. This explicit representation also facilitates the use of formal techniques to verify structural and consistency properties of contracts [4].

2.1 Contracts

In our approach, non-functional requirements, which impose constraints on non-specialized activities that the application may have, are described by contracts [2]. A contract describes, at design time, the non-functional aspects of the application, specifying how supporting resources should be used at operation time and the acceptable variations in the availability of these resources. An application may depend on more than one contract; the semantics defined by each contract is imposed at operation time by a middleware composed of a standard set of components (Section 2.3). Similarly to the contracts described in [6], our contract concept has the following elements:


a) Categories, which describe, at an abstract level, properties of resources, services, or specific non-functional aspects, typically associated with the context in which the applications execute. For instance, processing, memory or communication resources can have an associated Category. Operational aspects, such as fault tolerance and cryptography, and less tangible aspects, such as price (“cheap”, “expensive”), quality (“good”, “avg”, “low”) and physical position (inside, outside), can also be described in categories. Categories are associated with the entities that implement, manage, monitor, or give access to properties of resources, services, or specific features available in the supporting infrastructure. Category descriptions can be composed or specialized to meet the specific requirements of the entities of interest in a given application.
b) Profiles, which quantify and valuate the properties of a given Category. The quantification constrains each property according to its description, functioning as an instance of acceptable values for a given Category. Components or parts of the application's architecture can define profiles constraining their required operating context with the desired granularity.
c) A set of services, where each service defines a set of constraints, acceptable to the application, related to architecture-level entities. This is accomplished by associating one or more profiles with the components or the interaction channels (used to connect components) of an application's architecture. In this way, the desired or acceptable level of quality associated with a service is differentiated from that of the others by the properties (and their values) declared in the profiles. A service can be deployed only if all of its associated profiles are valid. Each service defines a possible operating state for the application.
d) A negotiation clause that describes a policy, defined by a state machine, which establishes an arbitrary order in which to deploy the services. According to the described policy, when a service with higher preference can no longer be maintained, the contract management entity tries to deploy a service with lower preference. The return to a service of higher preference can also be described, allowing a service with better quality to be deployed if its associated profiles become valid.

2.2 Category Description

This section describes the categories used in the videoconference application (Section 3). Each category contains the properties of interest and the characteristics of these properties (Fig. 1). The LocalResc category represents the resources of the local system that will be allocated or monitored in order to support the quality expected by the user. The Transport category defines transport and communication characteristics, such as bandwidth, delay and jitter. The categories related to audio and video resources, which are described in a simplified form in Fig. 1, contain the properties that can be selected in typical implementations. When instantiating a videoconference terminal, for instance, the audio and video profiles can select specific values for the audio properties (e.g., codec:G711; sampleRate:8000) and the video properties (e.g., codec:H261; frameRate:20). These values are used by the associated Resource Agents to allocate the necessary resources and by the Configurator to create the component instances with the adequate parameters (see the next subsection).


Category LocalResc {
  utilization: decreasing numeric %;
  clockFreq: increasing numeric MHz;
  memReq: increasing numeric Mbytes;
}
Category Transport {
  bandwidth: increasing numeric Mbps;
  delay: decreasing numeric ms;
  jitter: decreasing numeric;
}
Category VMedia {
  codec: enum (H261, H263, MJPEG);
  quality: enum (LOW, MEDIUM, HIGH);
  size: enum (CIF, QCIF);
  frameRate: increasing numeric fps;
}
Category AMedia {
  codec: enum (G711, G723, GSM, DVI);
  sampleLenght: enum (8, 16); // bits
  sampleRate: increasing numeric Hz;
  channels: enum (MONO, STEREO);
}

Fig. 1. Categories for the videoconference application
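To make the role of the increasing/decreasing annotations in Fig. 1 concrete, here is a minimal Python sketch of how a profile could be checked against measured values. It assumes that for an increasing property larger values are better (the profile value is a minimum), for a decreasing property smaller values are better (the profile value is a maximum), and an enum property must match exactly. This reading, the Python encoding of the categories and all function names are assumptions for illustration, not CR-RIO's implementation.

```python
from dataclasses import dataclass
from typing import Dict, Union

Value = Union[float, str]


@dataclass(frozen=True)
class Property:
    direction: str  # "increasing", "decreasing" or "enum"


# Hypothetical Python encoding of three categories from Fig. 1.
LOCAL_RESC = {
    "utilization": Property("decreasing"),
    "clockFreq": Property("increasing"),
    "memReq": Property("increasing"),
}
TRANSPORT = {
    "bandwidth": Property("increasing"),
    "delay": Property("decreasing"),
    "jitter": Property("decreasing"),
}
V_MEDIA = {
    "codec": Property("enum"),
    "quality": Property("enum"),
    "size": Property("enum"),
    "frameRate": Property("increasing"),
}


def profile_holds(category: Dict[str, Property], profile: Dict[str, Value],
                  measured: Dict[str, Value]) -> bool:
    """Return True if every property constrained by the profile is satisfied."""
    for name, required in profile.items():
        direction = category[name].direction
        value = measured.get(name)
        if value is None:
            return False  # property not monitored: treat as not satisfied
        if direction == "increasing" and value < required:
            return False  # below the required minimum
        if direction == "decreasing" and value > required:
            return False  # above the tolerated maximum
        if direction == "enum" and value != required:
            return False
    return True


if __name__ == "__main__":
    ref_c_prof = {"delay": 50, "bandwidth": 16000}  # values of refCProf
    print(profile_holds(TRANSPORT, ref_c_prof, {"delay": 40, "bandwidth": 20000}))
```

A Contractor would evaluate such a check each time a Resource Agent reports new values, and notify the Contract Manager when the result changes.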

2.3 Support Infrastructure

Architectures described in CBabel are mapped to an object model [5]. This model is reflected in a meta-level repository, which maintains the configuration information of the architecture and can be queried and updated during the application's life. This repository also includes the representation of the contracts and maintains information about the resources (devices, services, applications, etc.) that are of interest to the applications. Based on the contracts, which use the information contained in the repository, a supporting infrastructure is used to manage the architectural configurations. This infrastructure is composed of a standard set of components (Fig. 2), described next:

[Figure 2: the Contract Manager, Contractor, Resource Agent and Configurator components, with flows labelled non-functional constraints (from the Contract), configurations, resource-level requests, constraints violation, and measured values]

Fig. 2. Components of the supporting infrastructure

Contract Manager (CM). The CM interprets the contracts, already mapped to the meta-level repository, and extracts from them the information about the services, their respective profiles and the service negotiation state machine. During the service deployment process, the CM commands the Contractors to verify whether the constraints defined in the service profiles can be satisfied. If any Contractor notifies the CM that a profile cannot be (or can no longer be) satisfied, the current service is invalidated and a new one must be selected. The CM can also initiate a new negotiation procedure when the resources needed to deploy a more preferable service become available (again).

Contractor. The Contractor manages and mediates the process of monitoring the properties of the basic elements (mechanisms, resources or low-level services associated with Categories) specified in the profiles. A Contractor interacts with the Resource Agents (RA), obtaining and evaluating the measured values of the properties. Violations and validations of the profiles are notified to the CM.

Resource Agent (RA). Categories are associated with RAs, which encapsulate the access to the supporting basic elements, providing interfaces to manage and monitor the values of the associated properties. The acquired values are transferred to the Contractor when significant changes are detected.

Configurator. The Configurator is responsible for mapping architectural descriptions into actions that configure the architecture. It can start an application (if it is not running yet) and execute commands received from the CM regarding service management. The Configurator provides a configuration API that allows components to be instantiated, linked, stopped and substituted during the application's operation. These operations are reflected in a persistent meta-level repository, which maintains the application's context and can be queried through an architectural reflection API.
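The monitoring chain described above (Resource Agent to Contractor to Contract Manager) can be sketched as three small classes. This is a hypothetical illustration, not the CR-RIO code: a "significant change" is assumed to mean an absolute change of at least a fixed min_change since the last report, and the Contractor is assumed to notify the CM only when the validity of its profile changes. All class and method names are invented for the example.

```python
from typing import Callable, Dict, List, Optional


class ContractManager:
    """Records the profile validations/violations it is notified about."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def notify(self, profile_name: str, valid: bool) -> None:
        self.events.append(f"{profile_name}:{'validated' if valid else 'violated'}")


class Contractor:
    """Evaluates one profile from the latest values reported by Resource Agents."""

    def __init__(self, profile_name: str, check: Callable[[Dict[str, float]], bool],
                 manager: ContractManager) -> None:
        self.profile_name = profile_name
        self.check = check
        self.manager = manager
        self.values: Dict[str, float] = {}
        self.valid: Optional[bool] = None

    def on_measurement(self, prop: str, value: float) -> None:
        self.values[prop] = value
        valid = self.check(self.values)
        if valid != self.valid:  # only changes of validity reach the CM
            self.valid = valid
            self.manager.notify(self.profile_name, valid)


class ResourceAgent:
    """Monitors one property and forwards only significant changes."""

    def __init__(self, prop: str, min_change: float, contractor: Contractor) -> None:
        self.prop = prop
        self.min_change = min_change
        self.contractor = contractor
        self.last_reported: Optional[float] = None

    def sample(self, value: float) -> None:
        if self.last_reported is None or abs(value - self.last_reported) >= self.min_change:
            self.last_reported = value
            self.contractor.on_measurement(self.prop, value)


if __name__ == "__main__":
    cm = ContractManager()
    # Hypothetical check derived from refCProf: delay must not exceed 50 ms.
    contractor = Contractor("refCProf", lambda v: v["delay"] <= 50, cm)
    delay_agent = ResourceAgent("delay", min_change=5.0, contractor=contractor)
    for d in (40, 42, 58, 57, 45):
        delay_agent.sample(d)
    print(cm.events)
```

Filtering at the RA keeps monitoring traffic low, while filtering at the Contractor ensures that the CM only reacts to actual profile validations and violations.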

3 Videoconference Application

In this section we present a videoconference application to expose details of the proposed approach. The application contains the usual elements: (i) a directory/session service, which performs session (room) registration and control, such as the H.323 standard Gatekeeper [19]; (ii) an audio and video flow redistribution element, equivalent to the H.323 MCU (without the transcoding function) or to the reflectors of systems such as VRVS [1]; and (iii) user terminals that capture, display and transmit the audio and video media. The basic scenario for the videoconference application comprises an arbitrary number of users distributed over a network such as the Internet, with an overlay network used to provide multipoint communication through point-to-point connections (Fig. 3(a)). The overlay network architecture is formed by reflectors (Fig. 3(b)), interconnected by communication connectors, which relay the audio and video flows. More details of this architecture are presented in Sections 3.1 and 3.2.

[Figure 3: (a) clients c1 to c4 connected through the overlay network; (b) the overlay network as reflectors r1, r2 and r3, to which clients c1 to c4 are connected]

Fig. 3. (a) Basic architecture; (b) Overlay network detailed


The teleconference application can have specific availability and reliability requirements. To cater for this case, we can add fault tolerance to the overlay network and self-repair to its individual links. This is accomplished by a set of contracts and services, as described in Section 3.3. For clients, different forms of adaptation can also be planned, such as: (i) for a static set of clients participating in a session, a set of reflectors can be selected so as to minimize the global communication delay; (ii) for a dynamic set of clients, when a new client is admitted, the supporting infrastructure can select, from the available reflector set, the one that, according to a policy, optimizes performance parameters, e.g., delay and network throughput. Contracts for clients are presented in Section 3.4.

3.1 Basic Architecture

The basic videoconference application architecture is described in CBabel as shown in Fig. 4. The overlay network (a module in CBabel terms) is described by stating the instantiation of each reflector (lines 3-5) and the connections between reflectors (lines 6-7), which establish the topology for relaying the flows. The terminal references are then specified (line 9), together with their link to the overlay network (line 10). All the elements are described in the scope of the module teleconf, which is used as a reference to the application's context.

01 module teleconf {
02   module ovNet {
03     instantiate reflector as refUFF at caueira.uff.br;
04     instantiate reflector as refUERJ at rio.uerj.br;
05     instantiate reflector as refLMPD at lmpd.uerj.br;
06     link refUFF to refUERJ by comSock;
07     link refUERJ to refLMPD by comSock;
08   }
09   instantiate Term as c1, c2, c3, c4;
10   link c1, c2, c3, c4 to ovNet;
11 }

Fig. 4. Videoconference application architecture
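To show how the Configurator's API (instantiate, link, stop, substitute) could act on the description in Fig. 4, the sketch below replays its statements against a hypothetical Configurator whose meta-level repository is a pair of in-memory tables. The class, its methods and the treatment of the ovNet module as a linkable entry are assumptions made for illustration; the comments map each call to the corresponding line of Fig. 4.

```python
from typing import Dict, List, Optional, Tuple


class Configurator:
    """Hypothetical configuration API; every operation is recorded in an
    in-memory stand-in for the meta-level repository."""

    def __init__(self) -> None:
        self.instances: Dict[str, Dict[str, Optional[str]]] = {}
        self.links: List[Tuple[str, str, Optional[str]]] = []

    def instantiate(self, component_type: str, name: str, host: Optional[str] = None) -> None:
        if name in self.instances:
            raise ValueError(f"{name} already exists")
        self.instances[name] = {"type": component_type, "host": host, "state": "running"}

    def link(self, source: str, target: str, connector: Optional[str] = None) -> None:
        for name in (source, target):
            if name not in self.instances:
                raise KeyError(f"unknown component {name}")
        self.links.append((source, target, connector))

    def stop(self, name: str) -> None:
        self.instances[name]["state"] = "stopped"

    def substitute(self, name: str, new_type: str) -> None:
        # Replace the implementation behind `name`; existing links are kept.
        self.instances[name]["type"] = new_type

    def neighbours(self, name: str) -> List[str]:
        """Reflection query: names of the components linked to `name`."""
        return [t if s == name else s for s, t, _ in self.links if name in (s, t)]


def deploy_teleconf(cfg: Configurator) -> None:
    """Replays the statements of Fig. 4 (line numbers in comments)."""
    cfg.instantiate("module", "ovNet")                          # 02 (module as an entry)
    cfg.instantiate("reflector", "refUFF", "caueira.uff.br")    # 03
    cfg.instantiate("reflector", "refUERJ", "rio.uerj.br")      # 04
    cfg.instantiate("reflector", "refLMPD", "lmpd.uerj.br")     # 05
    cfg.link("refUFF", "refUERJ", "comSock")                    # 06
    cfg.link("refUERJ", "refLMPD", "comSock")                   # 07
    for term in ("c1", "c2", "c3", "c4"):
        cfg.instantiate("Term", term)                           # 09
        cfg.link(term, "ovNet")                                 # 10


if __name__ == "__main__":
    cfg = Configurator()
    deploy_teleconf(cfg)
    print(sorted(cfg.neighbours("refUERJ")))
```

Recording every operation in the repository is what allows later queries (such as the reflectors connected to refUERJ) and reconfigurations driven by the contracts.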

Once described, the application architecture can actually be deployed. However, besides the functional aspects of the application, we are interested in non-functional aspects regarding operational requirements, such as the quality of the video and audio flows required by the user and the capability of the overlay network to forward the flows of several sessions. These requirements should be specified, monitored and imposed by the supporting environment, as discussed previously. In our approach, this is achieved by establishing contracts (including services) associated with different parts of the overall architecture. In this application architecture, specifically, the overlay network and the terminals used by the teleconference users are associated with one or more contracts.

3.2 Contract for a Single Overlay Network

The contract for a single overlay network requires only one service, viaInternet (Fig. 5), which specifies the minimum resources (CPU and memory) required to run the reflectors (in this case, the refLProf profile, lines 2-4) and the minimum resources required by the communication channels interconnecting them, defined according to the transport requirements of the service (profile refCProf, lines 5-6).

01 service {
02   instantiate reflector as refUFF at r1.uff.br with refLProf;
03   instantiate reflector as refUERJ at r2.uerj.br with refLProf;
04   instantiate reflector as refLMPD at r3.uerj.br with refLProf;
05   link refUFF to refUERJ by comSock with refCProf;
06   link refUERJ to refLMPD by comSock with refCProf;
07 } viaInternet;

Fig. 5. viaInternet service for the overlay network

The foreseen demand for local resources by each reflector is defined in the refLProf profile (Fig. 6), which specifies the admitted values for some properties of the LocalResc category. When the refUFF reflector (Fig. 5, line 2) is to be instantiated, for example, these characteristics are verified beforehand. The desired characteristics of the communication channels that interconnect each pair of reflectors are defined in the refCProf profile. The delay and bandwidth requirements are configured proportionally to the maximum number of users and simultaneous sessions, and to the type of audio and video coding to be used.

profile {
  LocalResc.utilization: 50;
  LocalResc.clockFreq: 2800;
  LocalResc.memReq: 512;
} refLProf;

profile {
  Transport.delay: 50;
  Transport.bandwidth: 16000;
} refCProf;

Fig. 6. LocalResc and Transport profiles for the reflectors
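The proportional dimensioning rule for refCProf can be illustrated with a worked example. This sketch rests on stated assumptions: every audio and video stream of every session is assumed to cross the inter-reflector channel, packet and protocol overhead is ignored, and the 384 kbit/s video rate is hypothetical. Only the 64 kbit/s G.711 rate (8000 samples/s at 8 bits per sample) is a fixed property of that codec. The numbers are not the ones behind the refCProf values.

```python
G711_KBPS = 64  # G.711: 8000 samples/s x 8 bits per sample


def channel_bandwidth_kbps(max_sessions: int, max_users_per_session: int,
                           audio_kbps: float, video_kbps: float) -> float:
    """Worst-case bandwidth of a reflector-to-reflector channel, assuming
    every user's audio and video stream of every session crosses it."""
    return max_sessions * max_users_per_session * (audio_kbps + video_kbps)


if __name__ == "__main__":
    # Hypothetical dimensioning: 2 sessions of 4 users each, G.711 audio
    # and an assumed 384 kbit/s video stream per user.
    print(channel_bandwidth_kbps(2, 4, G711_KBPS, 384))
```

Because the requirement is linear in each factor, doubling the number of simultaneous sessions doubles the bandwidth that the channel profile has to demand.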

It is worth noting that resource reservation is not always possible. In many systems it is only possible to perform an initial allocation of resources, which can actually be shared, or even reallocated to applications with higher priority. Thus, during operation, the defined characteristics can be monitored and, in case of violation, an adaptive mechanism (associated with a service) can be started, e.g., to introduce a new reflector into the overlay network to balance the load.

3.3 Contracts for Fault-Tolerance and Self-repair

We now present two contracts used to introduce fault tolerance (by duplication) and link-level self-repair into the overlay network.

Alternative overlay network. In our experiments we worked with two sets of reflectors: one used links from the commodity Internet and the other used dedicated Gigabit channels connecting UFF, UERJ and the LAMPADA Project. In the contract described in Fig. 7, in addition to the viaInternet service (described in Fig. 5), we have the viaGIGA service, which is also associated with profiles indicating its non-functional requirements. The idea is that “overlay providers” offer differentiated access services, and the user, by means of a contract, can specify a policy for choosing one of those services. For instance, the viaInternet service should be used preferably, as long as its non-functional requirements are met. Otherwise, the viaGIGA service, which has better resources but is more expensive, is negotiated and becomes the current service. If neither service can be deployed, the videoconference session cannot be established. This policy is defined in the negotiation clause of the contract (lines 18-21).

01 contract {
02   context @teleconf;
03   service {... @teleconf = _viaInternet} viaInternet;
11   service {... @teleconf = _viaGIGA} viaGIGA;
18   negotiation {
19     not viaInternet -> viaGIGA;
20     not viaGIGA -> out-of-service; }
21 } cAlt1Ovl;

Fig. 7. Contract providing fault tolerance

The overlay network selected in the contract becomes part of the context, which is used to make management decisions. To allow composition with other contracts of the videoconference application (for instance, the contracts for the terminals; see Section 3.4), the @teleconf variable (line 2) is updated with the reference of the selected network (lines 3 and 11), establishing a path to access its information, e.g., the identity of its component reflectors.

(a) [Figure 8(a): reflectors R1 and R2 interconnected by a primary link (prim) and an alternative link (alt)]

11 service {
12   link refUFF to refUERJ by comSock1 with refCProf;
13 } primeLink;
14 service {
15   link refUFF to refUERJ by comSock2 with refCProf;
16 } altLink;
17 negotiation {
18   not primeLink -> altLink;
19   altLink -> primeLink;
20   not altLink -> out-of-service; }
21 } cAltOvl;
(b)

Fig. 8. Contract to self-repair overlay links
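Read as a state machine, the negotiation clause of Fig. 8(b) has two kinds of transitions: `not A -> B` (fall back to B when A's profiles no longer hold, lines 18 and 20) and `A -> B` (return from A to the preferred B once B's profiles hold again, line 19). The sketch below is a hypothetical interpreter for such clauses. It assumes the CM calls step() with the set of services whose profiles currently hold, that recovery is attempted before fallback, and that out-of-service is a final state; none of these names come from CR-RIO.

```python
from typing import Dict, Iterable

OUT_OF_SERVICE = "out-of-service"


class Negotiator:
    """Interprets negotiation clauses of the form
    `not A -> B` (fallback) and `A -> B` (return to a preferred service)."""

    def __init__(self, initial: str, fallback: Dict[str, str],
                 recovery: Dict[str, str]) -> None:
        self.current = initial
        self.fallback = fallback    # A -> B for every `not A -> B`
        self.recovery = recovery    # A -> B for every `A -> B`

    def step(self, valid_services: Iterable[str]) -> str:
        valid = set(valid_services)
        preferred = self.recovery.get(self.current)
        if preferred is not None and preferred in valid:
            self.current = preferred  # e.g. line 19: altLink -> primeLink
            return self.current
        # Follow fallback rules while the current service is not deployable
        # (assumes the fallback rules contain no cycles).
        while self.current != OUT_OF_SERVICE and self.current not in valid:
            self.current = self.fallback[self.current]
        return self.current


if __name__ == "__main__":
    link = Negotiator("primeLink",
                      fallback={"primeLink": "altLink", "altLink": OUT_OF_SERVICE},
                      recovery={"altLink": "primeLink"})
    for valid in ({"primeLink", "altLink"}, {"altLink"}, {"primeLink", "altLink"}, set()):
        print(link.step(valid))
```

The clause of Fig. 7 fits the same model with an empty recovery table, so once viaGIGA is selected the contract stays on it even if viaInternet recovers.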

Alternative link. In the case of saturation or failure of a link connecting two reflectors, the interconnection can be reconfigured to use an alternative link (maintained in cold or hot standby), as illustrated in Fig. 8(a). This aspect can be handled in a specific contract, which describes the possible configurations of the link between two reflectors, as presented in Fig. 8(b), associated with the overlay network provided by the viaInternet service. The contract in Fig. 8(b) describes two link services. By using different communication connectors (comSock1 and comSock2), each service uses an independent route. The negotiation clause states that the preferred service is primeLink. If this service is not available, the architecture is reconfigured to use the altLink service (line 18). If the primeLink service becomes available again, it is re-established (line 19). If neither service can be deployed, this part of the architecture becomes non-operational (line 20), a fact that is also reflected in the whole overlay network.

3.4 Contracts for a Terminal

Once the overlay service is deployed and available, user terminals can connect to this network to participate in a session. The access of the terminals is handled in another view of the architecture. Each user terminal connects to a reflector in the overlay network independently of the other terminals. Each terminal is associated with a contract that covers the various requirements involved: (i) local resource characteristics; (ii) media quality, which depends on the chosen audio and video codecs; (iii) communication characteristics, to be observed in the link between the Term module and the associated reflector; and (iv) videoconference session management. For brevity's sake, we omit the details of this last profile, assuming that the directory service performs admission control, which limits the number of clients per session.
Besides defining the quality requirements, the user can play a proactive role regarding access to the videoconference session by enhancing the contracts as follows:
(a) Initially, the selection of the reflector to which the terminal will be linked can be performed by policies defined in the profiles. Several strategies can be used, considering the properties of interest described in the profiles of the service, such as network delay, available bandwidth and monetary cost, or a combination of these properties. As a more sophisticated option, the selection can be performed by a utility function (referenced in the contract) provided by the designer (as proposed in [9]). In that way, calculations over a set of properties (e.g., clockFreq and utilization from the LocalResc category, and delay and bandwidth from the Transport category) would guide the choice of the best reflector.
(b) After the initial selection, during operation, the link connecting the terminal to the reflector may begin to present a higher delay than can be tolerated, or its bandwidth may become lower than necessary. Foreseeing this case, the contract can include a lower-quality service that consumes fewer resources (for instance, using H261 instead of MJPEG), allowing the interaction to continue.
The instantiation of a terminal is governed by the contract described in Fig. 9, in which two services are specified:
(a) sHQTerm: the preferred service, which allows the use of high-quality audio and video. The instantiation of the Term module (line 2) and the link of the terminal to the reflector (line 3) are constrained by adequately dimensioned profiles;
(b) sLQTerm: a lower-quality service, to be used as a last resort. It is equivalent to the sHQTerm service but has less restrictive profiles (lines 6-7), which contain property values just sufficient to support this lower-quality service.
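Option (a) above leaves the utility function to the designer [9]. Purely as an illustration, the sketch below uses one possible function: a weighted sum of the properties mentioned in the text (clockFreq, utilization, delay, bandwidth), each min-max normalised over the candidate reflectors and inverted for properties where smaller is better. The weights, the normalisation scheme and the reflector measurements in the usage example are all hypothetical.

```python
from typing import Dict, Tuple

# property -> (weight, direction); weights are hypothetical designer choices
Weights = Dict[str, Tuple[float, str]]


def select_reflector(candidates: Dict[str, Dict[str, float]], weights: Weights) -> str:
    """Return the candidate reflector with the highest weighted utility.

    Each property is min-max normalised over the candidates to [0, 1];
    for "decreasing" properties (e.g. delay, utilization) the normalised
    value is inverted so that 1 is always best.
    """
    def normalised(prop: str, value: float, direction: str) -> float:
        values = [props[prop] for props in candidates.values()]
        low, high = min(values), max(values)
        if high == low:
            return 1.0  # all candidates equal on this property
        x = (value - low) / (high - low)
        return x if direction == "increasing" else 1.0 - x

    def utility(props: Dict[str, float]) -> float:
        return sum(w * normalised(p, props[p], d) for p, (w, d) in weights.items())

    return max(candidates, key=lambda name: utility(candidates[name]))


if __name__ == "__main__":
    reflectors = {  # hypothetical measurements
        "refUFF": {"clockFreq": 2800, "utilization": 70, "delay": 20, "bandwidth": 100},
        "refUERJ": {"clockFreq": 2400, "utilization": 20, "delay": 30, "bandwidth": 100},
        "refLMPD": {"clockFreq": 3000, "utilization": 90, "delay": 80, "bandwidth": 10},
    }
    equal = {"clockFreq": (1, "increasing"), "utilization": (1, "decreasing"),
             "delay": (1, "decreasing"), "bandwidth": (1, "increasing")}
    print(select_reflector(reflectors, equal))
```

Raising the weight of utilization from 1 to 3 makes the lightly loaded refUERJ win instead of refUFF, which shows how the designer's weights encode the selection policy.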


L. Cardoso, A. Sztajnberg, and O. Loques

01 service {
02   instantiate Term with hqTermLProf, hqTermVProf, hqTermAProf;
03   link Term to @ref by commCon with hqTermCProf, @ref=select(hqRef, reflector@teleconf);
04 } sHQTerm;
05 service {
06   instantiate Term with lqTermLProf, lqTermVProf, lqTermAProf;
07   link Term to @ref by commCon with lqTermCProf;
08 } sLQTerm;
09 negotiation {
10   not sHQTerm -> sLQTerm;
11   not sLQTerm -> out-of-service; }

Fig. 9. Providing dynamic adaptation for the terminal
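The negotiation clause in lines 9 to 11 of Fig. 9, like the link contract of Fig. 8(b), behaves as a small state machine. The Python sketch below only illustrates that reading under our own assumptions: the class `Negotiation`, its `fallback` and `recover` maps and the `step` method are hypothetical and do not reproduce the CR-RIO Contract Manager. Each call to `step` receives the set of services whose profiles currently hold, as the Contractors would report them.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

OUT_OF_SERVICE = "out-of-service"


@dataclass
class Negotiation:
    """Illustrative model of a negotiation clause (not the CR-RIO runtime)."""

    current: str
    # "not A -> B": if A's profiles can no longer be met, try B next.
    fallback: Dict[str, str]
    # Optional "return to A when it is available again" rules, keyed by the
    # service being replaced (e.g. {"altLink": "primeLink"}); empty for Fig. 9.
    recover: Dict[str, str] = field(default_factory=dict)

    def step(self, available: Set[str]) -> str:
        """Re-evaluate the clause; `available` holds services whose profiles hold now."""
        preferred = self.recover.get(self.current)
        if preferred is not None and preferred in available:
            self.current = preferred
            return self.current
        visited: Set[str] = set()
        # Follow fallback edges until a deployable service or out-of-service.
        while self.current != OUT_OF_SERVICE and self.current not in available:
            visited.add(self.current)
            nxt = self.fallback.get(self.current, OUT_OF_SERVICE)
            # Guard against a cyclic clause: stop instead of looping forever.
            self.current = OUT_OF_SERVICE if nxt in visited else nxt
        return self.current
```

Under this reading, the terminal contract of Fig. 9 has no rule for returning from sLQTerm to sHQTerm, so the terminal stays on the lower-quality service after resources recover, whereas the link contract returns to primeLink through a `recover` entry such as `{"altLink": "primeLink"}`.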

Each service contains a description of the local resource and media requirements for the instance of the Term module, given in the profiles *termLProf, *termVProf and *termAProf (line 2 for the sHQTerm service and line 6 for the sLQTerm service); the communication constraints are described in the *termCProf profile. These profiles are valuations of the properties of the LocalResc, Transport, VMedia and AMedia categories, previously defined.

(a)
profile { LocalResc.utilization: 30; LocalResc.clockFreq: 1600; LocalResc.memReq: 512; } hqTermLProf;
profile { VMedia.codec: MJPEG; VMedia.resolution: CIF; VMedia.frameRate: 24; VMedia.quality: HIGH; } hqTermVProf;
profile { AMedia.codec: G711; AMedia.sampleLenght: 16; AMedia.channels: MONO; AMedia.sampleRate: 32000; } hqTermAProf;
profile { Transport.delay: 80; Transport.bandwith: 1600; } hqTermCProf;

(b)
profile { VMedia.codec: H261; VMedia.resolution: QCIF; VMedia.frameRate: 14; VMedia.quality: MEDIUM; } lqTermVProf;
profile { AMedia.codec: DVI; AMedia.sampleLenght: 8; AMedia.channels: MONO; AMedia.sampleRate: 800; } lqTermAProf;
profile { Transport.delay: 80; Transport.bandwith: 128; } lqTermCProf;

Fig. 10. Profiles for (a) sHQTerm and (b) sLQTerm
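A profile can be read as a set of constraints that the measured context must satisfy, both before a service is deployed and while it runs. The sketch below checks a profile against measured values. The direction of each bound (minimum for `clockFreq`, `memReq`, `frameRate` and `bandwith`, maximum for `delay` and `utilization`, exact match for codecs and other enumerated values) is our assumption, since the contracts do not state it; `BOUND_KIND` and `violations` are hypothetical names.

```python
from typing import Dict, Union

Value = Union[int, float, str]

# Assumed meaning of each numeric bound (the contracts do not state it):
# "min": the measured value must be at least the profile value;
# "max": it must not exceed the profile value. Other properties must match exactly.
BOUND_KIND: Dict[str, str] = {
    "LocalResc.clockFreq": "min",
    "LocalResc.memReq": "min",
    "LocalResc.utilization": "max",
    "VMedia.frameRate": "min",
    "Transport.bandwith": "min",  # property name spelled as in the contract
    "Transport.delay": "max",
}


def violations(profile: Dict[str, Value], measured: Dict[str, Value]) -> Dict[str, Value]:
    """Return the profile properties that the measured context does not satisfy."""
    unmet: Dict[str, Value] = {}
    for prop, required in profile.items():
        if prop not in measured:
            unmet[prop] = required  # cannot be verified, so it cannot be imposed
            continue
        value = measured[prop]
        kind = BOUND_KIND.get(prop)
        if kind == "min":
            ok = value >= required
        elif kind == "max":
            ok = value <= required
        else:
            ok = value == required
        if not ok:
            unmet[prop] = required
    return unmet
```

For hqTermCProf, a measured delay of 95 ms with a bandwidth of 2000 yields `{"Transport.delay": 80}`: under these assumptions, sHQTerm could not be deployed, or, at run time, the violation would trigger the negotiation.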

Self-Adaptive Applications Using ADL Contracts


The profiles used to specify the requirements of sHQTerm and sLQTerm are presented in Fig. 10(a) and (b), respectively. The value of each property was selected consistently with the service with which it is associated. For instance, the hqTermVProf profile defines MJPEG as the codec to be used and CIF as the desired resolution, and sets the minimum tolerated frame rate to 24 fps. Thus, the sHQTerm service can only be deployed if these characteristics can be imposed. In the negotiation clause (Fig. 9), the sHQTerm service (line 10) is negotiated first. If it is not possible to maintain this service (assuming, in this example, that it can be deployed initially), the CM will negotiate the deployment of the lower-quality sLQTerm service (line 11). If neither service can be deployed, the application is terminated.

(a) category { selPol (random, lowCost, bestPerf, optim); } Refl;
(b) profile { Refl.selPol: optim; } hqRef;

Fig. 11. (a) Refl category; (b) hqRef profile

At contract deployment time, the CM must select the specific reflector to which the Term module will actually connect. So, in the description of the terminal link for the preferred service (Fig. 9, line 3), the @ref context variable will contain the reference to the reflector to be used in the link. This reference is resolved when the initial service is deployed, by executing the select primitive, which is parameterized by the selection function to be used (chosen among those listed in the Refl category, Fig. 11(a)) and by a reference to the context/domain to be considered in the selection (in this case, the reflectors of the @teleconf context). In this example, the utility function (optim, enumerated in the hqRef profile, Fig. 11(b)) is applied to the profile of each reflector that is part of the context, and the results are ranked to determine the most adequate reflector. The reference to the selected instance is then used in the configuration operations.
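The selection step can be pictured as ranking the reflectors of the context and keeping the best one. In the sketch below, only the policy names come from the Refl category. Their meaning (lowest `cost` for lowCost, lowest delay for bestPerf, a weighted sum of min-max normalized properties for optim), the `Reflector` fields and the `WEIGHTS` values are assumptions made for illustration, not the utility function used by the authors or the one proposed in [9].

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reflector:
    name: str
    delay: float        # Transport.delay (ms)
    bandwidth: float    # Transport bandwidth
    clock_freq: float   # LocalResc.clockFreq (MHz)
    utilization: float  # LocalResc.utilization (%)
    cost: float = 0.0   # hypothetical monetary cost


# Hypothetical weights of the "optim" utility function.
WEIGHTS = {"delay": 0.4, "bandwidth": 0.3, "clock_freq": 0.1, "utilization": 0.2}
HIGHER_IS_BETTER = {"delay": False, "bandwidth": True, "clock_freq": True, "utilization": False}


def _normalized(pool: List[Reflector], r: Reflector, attr: str) -> float:
    """Min-max normalize r.attr over the pool so that 1.0 is the best value."""
    values = [getattr(x, attr) for x in pool]
    lo, hi = min(values), max(values)
    if hi == lo:
        return 1.0
    score = (getattr(r, attr) - lo) / (hi - lo)
    return score if HIGHER_IS_BETTER[attr] else 1.0 - score


def utility(r: Reflector, pool: List[Reflector]) -> float:
    """Weighted sum of the normalized properties (an assumed utility form)."""
    return sum(w * _normalized(pool, r, attr) for attr, w in WEIGHTS.items())


def select(policy: str, pool: List[Reflector], rng: Optional[random.Random] = None) -> Reflector:
    """Rough analogue of select(hqRef, reflector@teleconf) for each Refl.selPol value."""
    if not pool:
        raise ValueError("no reflector available in the context")
    if policy == "random":
        return (rng or random.Random()).choice(pool)
    if policy == "lowCost":
        return min(pool, key=lambda r: r.cost)
    if policy == "bestPerf":
        return min(pool, key=lambda r: r.delay)
    if policy == "optim":
        # Rank all reflectors by utility and keep the highest-ranked one.
        return max(pool, key=lambda r: utility(r, pool))
    raise ValueError(f"unknown selection policy: {policy}")
```

With refUFF at 40 ms delay and 70% utilization and refUERJ at 60 ms and 20% (other values as in the test), bestPerf picks refUFF while optim picks refUERJ (utility 0.6 versus 0.5 under these weights), which shows how strongly the outcome depends on the chosen weighting.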

4 Implementation and Evaluation
The basic mechanisms for contract interpretation and management are recurrent, constituting a design pattern [11]. Based on this pattern, the specific support for contracts was implemented as a pre-composed set of objects (an object framework) [5], in which hot-spot objects allow some classes to be specialized to cope with the specific requirements of each application. Using this framework, the videoconference application was implemented with the objective of evaluating the contract approach and performing experiments related to the switching of services. Specializations of the Contractor were implemented to handle the specific constraints of the services defined in the contracts of this application. Similarly, the required RAs were derived from a base class to provide access to the metrics associated with each contract. Specifically, three resource agents were developed for this application:
• Processing: performs the discovery and monitoring of local resources such as CPU and memory; implemented using the Ganglia tool [13];
• Network: provides bandwidth and delay measurements for communication links; based on the Abing tool [15];
• Real-time Transport Protocol (RTP): in charge of monitoring the media flow properties; developed using the common library distributed with the VIC program [14]. In the current version, it monitors RTP packet loss.
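The specialization of RAs from a common base class can be sketched as follows, so that a Contractor sees every agent through the same interface. `ResourceAgent`, `ReplayAgent` and `collect` are hypothetical names; `ReplayAgent` replays recorded samples in place of the Ganglia-, Abing- or RTP-based agents, so the example runs without those tools.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class ResourceAgent(ABC):
    """Base class: gives Contractors uniform access to contract metrics."""

    @abstractmethod
    def read(self) -> Dict[str, float]:
        """Return the current value of every property this agent monitors."""


class ReplayAgent(ResourceAgent):
    """Stand-in agent that replays recorded samples instead of querying a tool."""

    def __init__(self, samples: Iterable[Dict[str, float]]) -> None:
        self._samples = iter(samples)

    def read(self) -> Dict[str, float]:
        return next(self._samples)


def collect(agents: List[ResourceAgent]) -> Dict[str, float]:
    """Merge the readings of several agents into one view of the context."""
    context: Dict[str, float] = {}
    for agent in agents:
        context.update(agent.read())
    return context
```

A real agent would override `read` to query its tool; the Contractor code that calls `collect` does not change.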

The reflector's implementation uses the code described in [8] as a base, which only performs the forwarding of UDP packets. The original version was enhanced with an API that allows the dynamic creation of sessions and the management of participants. In the experiment we configured some scenarios with pre-recorded audio and video flows. A set of machines (2.8 GHz P4, 512 MB RAM) distributed between UFF and UERJ was connected through Internet links, which were “strangled” by a 10 Mbps hub (to make it possible to inject background flows with no additional impact on the observed nodes). The same set was redundantly connected by dedicated optical gigabit links, with access through 100/1000 Mbps switches.

[Fig. 12 plot: delay (ms, left axis, 0 to 200) and RTP packet loss (pkt/s, right axis, 0 to 3) versus time (s, 1 to 176), with points A, B and C marked.]

Fig. 12. Measure (mobile avg.) for delay and RTP packet loss

Fig. 12 presents the results of one of the experiments regarding the contract associated with the communication between two reflectors, refUFF and refUERJ (primeLink service in Fig. 8). An audio flow was transmitted from refUFF to refUERJ, and a UDP flow was injected as background traffic. The delay measurement was used to identify the need for adaptation. In addition, RTP packet loss was monitored at refUERJ to evaluate the quality of the audio flow under test. Point A shows the moment when the UDP flow was introduced. Point B indicates the instant when the RA in refUERJ measures a value above the one established in the profile (150 ms, in this case), a fact that is reported to the CM by the Contractor. Point C shows the moment when the alternative service (altLink) is deployed (between B and C, the state machine is evaluated, and a new service is selected and deployed). The RA continues to report packet losses, which are discarded by the Contractor until the alternative service has had sufficient time to reach steady state; beyond this point, the delay returns to the desired level. To avoid instability caused by temporary spikes in the monitored flow, which could trigger false service transitions, a moving average filter is used. In the experiment we adopted a sample period of 3 s, with the moving average calculated over the last 5 samples. On the other hand, according to the tests performed in [5], the average time between receiving a profile violation notification and the establishment of the next service is approximately 2 ms. Additionally, the time to switch or add another component (which may be necessary to perform an adaptation, e.g., to change the codec) using a Configurator integrated into the middleware [16] varies between 10.4 ms (best case: local execution, pre-loaded object) and 28.7 ms (worst case: remote execution, with object creation). These measurements show that the bottleneck bounding the total adaptation time is the interval required to identify a profile violation consistently. [3] presents a more detailed investigation of this issue.
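The spike filtering can be sketched as a monitor that reports a violation only when the moving average over a full window exceeds the profile bound. The window length and the 150 ms bound come from the experiment; the class and function names are hypothetical, and requiring a full window before the first decision is our assumption. With a 3 s sample period and a 5-sample window, the average covers 15 s of measurements, several orders of magnitude more than the 2 ms decision time and the 10.4 to 28.7 ms reconfiguration times, which is consistent with the conclusion that detection bounds the adaptation time.

```python
from collections import deque
from typing import Deque, Iterable

WINDOW = 5             # moving average over the last 5 samples
MAX_DELAY_MS = 150.0   # profile bound whose violation marks point B


class MovingAverageMonitor:
    """Reports a profile violation only when the filtered value crosses the bound."""

    def __init__(self, bound: float, window: int = WINDOW) -> None:
        self.bound = bound
        self.samples: Deque[float] = deque(maxlen=window)  # oldest sample drops out

    def add(self, value: float) -> bool:
        """Add one sample; return True if the moving average violates the bound."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return False  # assumption: no decision until the window is full
        return sum(self.samples) / len(self.samples) > self.bound


def first_violation(delays_ms: Iterable[float], bound: float = MAX_DELAY_MS) -> int:
    """Index of the first sample at which a violation is reported, or -1."""
    monitor = MovingAverageMonitor(bound)
    for i, delay in enumerate(delays_ms):
        if monitor.add(delay):
            return i
    return -1
```

For a sustained rise from 40 ms to 200 ms, the violation is reported at the fourth consecutive high sample, while an isolated 400 ms spike is absorbed by the filter.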

5 Related Work
The approach described in this paper comprises a broad range of techniques [18]. Here we comment on additional related work. The TAPAS project [10] proposes the use of a special language (SLAng) [17] to specify non-functional restrictions, which are associated with specific parts of an application infrastructure. Concepts similar to categories and profiles are adopted, but they are designed for e-business applications in the WWW context. The proposal concentrates mainly on specification issues, only suggesting that the restrictions can be checked at execution time and eventually used to guide adaptive actions. The Rainbow [7] proposal adopts elements similar to those described in our proposal. It uses application invariants coordinated with adaptation strategies, which are equivalent, respectively, to the profiles and services of our contracts. In contrast to our approach, which associates adaptations explicitly with architectural-level entities, adaptations in Rainbow are embedded in procedures that implement the configuration actions. This can hide structural details, making formal verification of contracts more difficult than in CR-RIO. In our project, we were able to develop a tool [4] that transforms contract descriptions into rules and equations in Maude (based on a modular structural operational semantics definition formalism), allowing the verification of profile definition inconsistencies, service reachability and deadlocks. Huang presents an approach for service auto-configuration based on recipes related to the application's architecture [9]; however, the recipes only consider communication-level aspects (delay and throughput), and the association with the architectural level is only illustrated by means of a simulator. An attractive point of the approach is the use of utility functions, which identify the adequate set of resources, to make the initial configuration of the application. Once the application is initiated, local adaptations can be rapidly decided and performed to keep it running. Our approach generalizes this proposal by including diverse non-functional aspects, as well as catering for the explicit description of the adaptations to be performed.


6 Conclusion
We presented an approach that integrates diverse elements necessary for the design and implementation of applications with context-sensitive non-functional requirements. According to this approach, the non-functional specifications of applications are expressed by contracts associated with the description of their architecture. Guided by the contracts, several requirements can be imposed through adaptations of the application's configuration or of its supporting infrastructure. In this context, it should be noted that an application can in fact be managed by several contracts, each associated with different elements (resources, applications or services) that are part of it, making it self-manageable. In this way, the goal of building autonomous systems, capable of performing the required adaptations internally and automatically in order to impose the contractual specifications, could be achieved. To make the proposed adaptation techniques practical, we expect that an explicit separation between policies and mechanisms will in the future be adopted in the design and implementation of the artifacts commonly used to support context-sensitive non-functional requirements. Specifically, resource agents should provide (standard) APIs to manage and monitor the mechanisms associated with each resource. This should facilitate integration with the high-level contract management entities, allowing more flexibility in imposing adequate adaptation policies to cope with the demands of current applications.

Acknowledgments. The authors acknowledge the support of RNP/CPqD/Finep (GIGA Project). Alexandre Sztajnberg thanks Faperj for its support (PAEP 04/2005 E-26/171.130/2005).

References
1. Adamczyk, D., Collados, D., Denis, G., Fernandes, J. et al., “VRVS 3 - Global platform for rich media conferencing and collaboration”, Third Annual Workshop on Advanced Collaborative Environments, Seattle, Washington, June 2003.
2. Beugnard, A., Jézéquel, J.-M., Plouzeau, N. and Watkins, D., “Making components contract aware”, IEEE Computer, Vol. 32, No. 7, pp. 38-45, July 1999.
3. Bhatti, S. N. and Knight, G., “Enabling QoS adaptation decisions for Internet applications”, Computer Networks, Vol. 31, No. 7, pp. 669-692, March 1999.
4. Braga, C. and Chalub, F., “QoSTool”, in Loques, O., Sztajnberg, A., Abelém, A., Braga, C. and Stanton, M., “CARAVELA: Contracts for applications running on high speed networks”, 4th Report, GIGA Project, RNP, RNP/CPqD/Finep, January 2006. (in Portuguese)
5. Corradi, A. S., “A framework to support non-functional requirements for high-level services”, MSc Dissertation, Instituto de Computação, UFF, 2005. (in Portuguese)
6. Frolund, S. and Koistinen, J., “Quality-of-Service Specifications in Distributed Object Systems”, IEE Distributed Systems Engineering, No. 5, pp. 179-202, UK, 1998.
7. Garlan, D., Cheng, S., Huang, A., et al., “Rainbow: architecture-based self-adaptation with reusable infrastructure”, IEEE Computer, Vol. 37, No. 10, October 2004.
8. Hodson, O., Varakliotis, S. and Hardman, V., “A software platform for multi-way audio distribution over the Internet”, Audio and music technology: the challenge of creative DSP, IEE Colloquium, London, November 1998.
9. Huang, A.-C., “Building self-configuring services using service-specific knowledge”, PhD Thesis, Carnegie Mellon University, Pittsburgh, PA, December 2004.
10. Lamanna, D. D., Skene, J. and Emmerich, W., “Specification language for service level agreements”, D2 - TAPAS Project, http://www.newcastle.research.ec.org/tapas/deliverables/D2.pdf, 2004.
11. Lisbôa, J. C. and Loques, O., “An architectural pattern to manage Quality of Service in distributed systems”, The 4th SugarLoafPLoP Conference, Porto das Dunas, Brazil, 2004. (in Portuguese)
12. Loques, O., Sztajnberg, A., Cerqueira, R. C. and Ansaloni, S., “A contract-based approach to describe and deploy non-functional adaptations in software architectures”, JBCS, Vol. 10, No. 1, pp. 5-18, July 2004.
13. Massie, M. L., Chun, B. N. and Culler, D. E., “The Ganglia distributed monitoring system: design, implementation, and experience”, Parallel Computing, Vol. 30, No. 7, July 2004.
14. UCL Network and Multimedia Research Group, “Mbone conferencing applications”, http://www-mice.cs.ucl.ac.uk/multimedia/software
15. Navratil, J. and Cottrell, R. L., “A practical approach to available bandwidth estimation”, Passive & Active Measurement Workshop, pp. 1-11, La Jolla, CA, April 2003.
16. Santos, A. L. G., “Dynamic configuration support for component-based software architectures”, MSc Dissertation (in preparation), Instituto de Computação, UFF, 2005. (in Portuguese)
17. Skene, J., Lamanna, D. D. and Emmerich, W., “Precise service level agreements”, 26th Int. Conference on Software Engineering, Edinburgh, UK, pp. 179-188, 2004.
18. Sztajnberg, A., Corradi, A. M., Santos, A. L., Barros, F. A., Cardoso, L. X. T. and Loques, O., “Specification and support of non-functional requirements for high-level services”, 23rd SBRC Mini-courses, pp. 223-279, Fortaleza, CE, 2005. (in Portuguese)
19. Toga, J. and Ott, J., “ITU-T standardization activities for interactive multimedia communications on packet-based networks: H.323 and related recommendations”, Computer Networks, Vol. 31, No. 3, pp. 205-223, February 1999.

Dynamic Generation of Context Rules

Waltenegus Dargie
Department of Computer Networks, Faculty of Computer Science, Dresden University of Technology
Hans-Grundig-Str. 25, 01062 Dresden, Germany
[email protected]

Abstract. This paper presents a scheme for the dynamic generation of context rules, which are useful for modifying the behaviour of mobile devices according to the social and physical settings of their users. Existing context-aware systems employ a pool of predefined rules that are executed whenever a context of interest is sensed and captured. Defining rules at design time, however, has the following limitations: (1) the system designer must identify the set of context primitives that describe a context of interest as accurately as possible; (2) the various states of each context primitive must be predetermined and mapped to sensory data, which often requires experience or expertise; (3) the availability of mechanisms for capturing the context primitives is presupposed; if context primitives other than the specified ones are discovered, which may equally describe a similar situation, the system does not react to them unless, of course, all possible alternatives were foreseen; and (4) the desired behaviour itself may not be known a priori, as it is specific to the user. To facilitate a flexible use of context rules, they should be generated dynamically. The scheme proposed in this paper associates decision events, which signify the activities of a user, with a set of context primitives acquired at the time the decision events are produced. From these decision-context associations, context rules are generated. The Context-Aware E-Pad is introduced to illustrate the proposed scheme.

1 Introduction
In context-aware computing, mobile devices and applications adjust their behaviour in accordance with the social and physical settings wherein a computing task takes place [1]. Adjustment in behaviour implies either provision of suitable services or modification of device configurations [2]. Often a context-aware system constantly senses the context of a user and updates a knowledge base. A context reasoning engine receives a set of rules from an application (defined at the time the application is designed) to manipulate the knowledge base; if the captured context satisfies predefined criteria, an action specific to the application is executed to transform one world model into another [3]. Defining context rules at design time and embedding these rules into the business model of an application has the following limitations:

A. Keller and J.-P. Martin-Flatin (Eds.): SelfMan 2006, LNCS 3996, pp. 102 – 115, 2006. © Springer-Verlag Berlin Heidelberg 2006

• The application reacts only to the context primitives expressed in a rule; if different context primitives express a situation of interest, thereby calling for the same modification of behaviour, either the application developer has to foresee all these possibilities at design time or the application will fail to capture the situation and react to it.
• The application developer specifies a mechanism to detect the specified context; in the absence of the presupposed mechanism, the application has no way to exploit other resources. In a ubiquitous computing environment, the availability of mechanisms for detecting a context may not be foreseeable, as resources are highly dynamic.
• The threshold values (existential quantifiers) required for setting the criteria for evaluating a context rule must be set at design time, which usually requires a priori knowledge and, in some cases, expertise. For example, consider the rule¹: ‘When environment loudness is above 4 dB, set ringing tone volume to 2.5’. To construct such a rule, the meaning of 4 dB as well as the reference used to measure noise in dB must be known.

To alleviate these limitations, we provide support for the dynamic generation of context rules. We achieve this goal by learning the activities of a user and mapping these activities onto a multidimensional context space. This assumes that the user makes certain decisions habitually, or with some predictive regularity. Where there is no regularity in the activities of the user, learning cannot take place, or it requires substantial computational and storage overhead. The rest of this paper is organised as follows: in section 2, a brief assessment of related work is given; in section 3, elements of a context rule are introduced; in section 4, the architecture for context processing and rule generation is proposed; in section 5, the event processing components and event expression semantics are discussed; in section 6, the Context-Aware E-Pad is demonstrated; and finally, in section 7, a brief summary is given.

¹ The example is taken from [4].

2 Related Work
Wang et al. propose Semantic Space [5], a generic infrastructure for reasoning about higher-level contexts. It consists of context wrappers, a knowledge base, an aggregator, a context query engine, and a context reasoner. Context wrappers obtain raw context data from various sources and transform the data into context markups. Context elements are represented as ontology instances and associated properties that applications can easily interpret. Among various contexts, the researchers identify three classes of real-world objects (user, location, and computing entities) and one class of conceptual objects (activity). These classes of contexts characterise smart spaces. An aggregator discovers context wrappers, gathers context markups from them, and asserts the markups into the context knowledge base, which it updates whenever a context event occurs. A context knowledge base stores an extended context ontology for a particular space and the context markups that are provided by users or gathered from context wrappers. The KB links the context ontology and markups in a single semantic model and provides interfaces for the context query engine and context reasoner to manipulate correlated contexts. The context query engine provides support for applications to query the context KB. Finally, the context reasoner infers abstract higher-level contexts from basic sensed contexts. To employ general-purpose logic-based reasoning engines, Semantic Space explicitly represents all contexts. Applications submit a set of rules to the context reasoner, which applies them to infer higher-level contexts.

Gu et al. propose the SOCAM [6] architecture, which shares similar properties with Semantic Space and consists of a context provider, a context interpreter, a context database and a service locating component. A context provider provides context abstraction to separate low-level context sensing from higher-level context manipulation. Each context provider registers at a service registry by using the service locating component. The context interpreter consists of a Context Reasoner and a Knowledge Base to carry out logic reasoning to obtain a higher-level context. The Context Reasoner provides deduced contexts based on sensed contexts; the KB contains a context ontology and the instances of (assertions on) this ontology. The researchers distinguish between defined and sensed contexts, which are similar to explicit and implicit contexts as defined by Schmidt et al. [7]. In the case of defined contexts, the user may predefine the instances. A sensed context is an assertion of a context acquired from a context provider into the Knowledge Base. The user-defined rule-based reasoning provides forward chaining, backward chaining and a hybrid execution model. The forward-chaining rule engine employs the standard RETE algorithm [8]; the backward-chaining rule engine employs a logic-programming engine similar to Prolog engines; and the hybrid execution mode performs reasoning by combining both forward-chaining and backward-chaining engines.

One obvious drawback of these frameworks is that they do not deal with uncertainty. The various primitive contexts gathered from context wrappers, context providers, and context acquisition modules are taken as reliable evidence. This, however, is unrealistic, since uncertainty is always associated with sensed data, and the context extracted from these data is prone to be inexact. Furthermore, the reliability of the sources delivering the data may not be ascertained, and the a priori knowledge required to manipulate the sensed data may not be up to date, i.e., it may not reflect the reality represented by the reasoning world model.

3 Elements of a Context Rule
We define two types of events: context events and decision events. Context events signify the occurrence of an interesting real-world situation. Decision events, on the other hand, correspond to the invocation of services inside mobile devices. Moreover, we distinguish between primitive events and composite events. Primitive events are those predefined in the system; a mechanism for their detection is assumed to be available. Primitive events include temporal events and atomic context events, such as a room temperature being 20°C or the light intensity of a place being 1000 lux. Decision events are also primitive events; they occur when action subroutines are invoked to perform specific actions inside a mobile device.


A primitive event, E, is expressed as a predicate with four arguments: the event's name, the event's subject (the one responsible for the occurrence of the event), the event's value and the time of occurrence (timestamp). Thus, a primitive event is expressed as:

$$E(t) \equiv (\exists n)(\exists s)(\exists v)(\exists ts)\,\big(\mathit{event}(n, s, v, ts) \land (t = ts)\big) \tag{1}$$

In equation (1), the existentially quantified variables n, s, v, and ts are, respectively, the event type, the subject to which the event refers, the value of the event, and the timestamp; v can be a numerical value or an event object. Equation (2) shows a valid instance of a temperature context event.

$$E_T(t) \equiv \mathit{event}(\mathit{temp}, \mathit{RoomA}, 20^{\circ}\mathrm{C}, 10\,\mathrm{AM}) \tag{2}$$
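Equations (1) and (2) map naturally onto a record with four fields plus a predicate on its timestamp. The sketch below is an illustrative Python encoding; `Event` and `holds_at` are hypothetical names, and the calendar date attached to the 10 AM timestamp is arbitrary.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any


@dataclass(frozen=True)
class Event:
    """Primitive event: the predicate event(n, s, v, ts) of equation (1)."""
    name: str            # n: event type, e.g. "temp"
    subject: str         # s: entity the event refers to, e.g. "RoomA"
    value: Any           # v: numerical value or another event object
    timestamp: datetime  # ts: time of occurrence


def holds_at(event: Event, t: datetime) -> bool:
    """E(t) is true when the event's timestamp equals t."""
    return event.timestamp == t


# Equation (2): 20 degrees Celsius in RoomA at 10 AM (the date is arbitrary).
e_t = Event("temp", "RoomA", 20.0, datetime(2006, 6, 16, 10, 0))
```

Because `value` is typed as `Any`, it can also hold another `Event`, matching the remark that v may be an event object.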

Once a primitive event is specified, it is possible to define an atomic context rule:

$$(\forall t)(\forall i)(\forall k)(\exists j)(\exists l)(\exists m)\,\big(E_T(t) \land (i < j) \supset \mathit{heater}(k) \land (l \le k \le m)\big) \tag{3}$$

Equation (3) states that if a temperature event is detected whose value i is below the threshold j, heater k should be adjusted such that the value of k lies between l and m. The primitive events discussed so far are useful for modelling simple behaviours. For many context-aware applications, however, it is necessary to detect certain combinations of events in order to capture a dynamic real-world situation and perform suitable actions. Three composite event expressions are adopted from [9] to express composite events: ANY, SEQ, and APeriodic events. The ANY event expression is a conjunction event that occurs when m out of a pool of n distinct events occur, regardless of their order of occurrence. The SEQ event expression is a sequence of events where the order of occurrence is preserved. The APeriodic (A) event expression is the occurrence of an event, E2, within a closed time interval [E1, E3]. This composite event is non-cumulative; i.e., the event A is signalled every time E2 is detected within the time interval started by E1 and ended by E3. If ANY is used instead of E2 to express the APeriodic composite event, A will be signalled every time one or a combination of the events expressed in the ANY composite event is detected.
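The three composite operators can be sketched over a stream of event names, returning the positions at which the composite event is signalled. This is a simplified illustration, not the detector of [9]: events are reduced to their names, SEQ allows unrelated events to be interleaved, and the APeriodic middle argument is a set so that it can also stand for an ANY expression. All function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Set


def any_event(stream: Iterable[str], pool: Set[str], m: int) -> int:
    """ANY: position at which m distinct events from `pool` have occurred, in any order (-1 if never)."""
    seen: Set[str] = set()
    for i, name in enumerate(stream):
        if name in pool:
            seen.add(name)
            if len(seen) >= m:
                return i
    return -1


def seq_event(stream: Iterable[str], order: Sequence[str]) -> int:
    """SEQ: position at which the events in `order` have occurred in that order (-1 if never)."""
    if not order:
        return -1
    k = 0  # index of the next expected event
    for i, name in enumerate(stream):
        if name == order[k]:
            k += 1
            if k == len(order):
                return i
    return -1


def aperiodic(stream: Iterable[str], start: str, middle: Set[str], end: str) -> List[int]:
    """A(E1, E2, E3): every position where an event in `middle` occurs between
    an E1 and the next E3 (non-cumulative: signalled on each occurrence)."""
    signals: List[int] = []
    window_open = False
    for i, name in enumerate(stream):
        if name == start:
            window_open = True
        elif name == end:
            window_open = False
        elif name in middle and window_open:
            signals.append(i)
    return signals
```

For the stream E1, E2, x, E2, E3, E2, `aperiodic` signals at positions 1 and 3 only; the last E2 falls outside the interval closed by E3.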

4 Architecture
Two basic operations are required to dynamically generate context rules: capturing a dynamic real-world situation and understanding the user's preference in that situation. Consequently, the architecture we propose comprises two main units: the context acquisition and processing unit and the event processing unit. The former is responsible for interacting with a variety of sensors measuring physical parameters of a real-world situation, translating the sensory data into a suitable format, and performing various manipulations and comparisons in order to produce an expressive, higher-level context, which is an abstraction of a dynamic real-world situation. This unit consists of Primitive Context Servers (PCS), Aggregators, a Knowledge Base (KB), an Empirical Ambient Knowledge component (EAK), and a Composer. The second unit, the event processing unit, is responsible for generating context rules by associating a user's actions with the context in which these actions are carried out. This unit consists of an Event Handler (EH) and a Rule Organiser (RO). Figure 1 shows the complete architecture. Although the main goal of this paper is to explain the dynamic generation of context rules, we will briefly introduce the context processing unit.

Fig. 1. Architecture for context acquisition and processing

4.1 Primitive Context Server (PCS)
A PCS hides from other components, such as Aggregators, the details and complexities of extracting data from physical sensors. It transforms the raw sensory data into a meaningful context atom. In some cases, multiple features (context atoms) can be extracted from a single sensor. An atomic context maps directly to a real-world object. The primitive contexts provided by a PCS are delivered along with a description of the sensing element, so that the components employing the context can decide how to rate it.

4.2 Aggregator
Often a single context primitive is not sufficient to model a real-world situation appropriately [10]. The real world is far too complex to be captured in complete detail by a single primitive. Therefore, several aspects of entities should be captured to reason about a dynamic real-world situation. Moreover, the implications of inexact sensory data should also be taken into account. Consequently, an aggregator is responsible for enhancing the quality of sensed data.


4.3 Knowledge Base A priori knowledge of entities (places, devices, persons, etc) playing role in dynamic computing environments is useful both for modelling a real world situation and for appropriately interpreting sensor measurements. We distinguish between factual knowledge and knowledge based on beliefs. Facts reflect objective reality; they do not change or if they do, change only slowly over time. Beliefs, on the other hand, are not necessarily true, and they change quite frequently. The KB comprises a collection of facts constituting the vocabulary of an application domain, and a list of assertions about individual named entities in terms of this vocabulary. The vocabulary consists of concepts, which denote sets of entities, and relations, which denote binary relationships between these entities. In addition to atomic concepts and relations, the KB allows the building of complex descriptions of concepts and relations. 4.4 Empirical Ambient Knowledge The KB accommodates facts only, however incomplete. There is little support to encode uncertain knowledge. On the other hand, empirical and heuristic knowledge of situations and people’s perception of them is helpful to reason about dynamic realworld situations. However, this type of knowledge is rather based on beliefs established on past experiences and observations which cannot be described as facts, but as uncertain knowledge. The EAK quantitatively describes various aspects of an entity in terms of numerous atomic contexts (mapping to sensor data). The choice of a particular context atom depends on its capability in capturing a relevant (physical) aspect of the entity it describes. Other criteria include feasibility of measuring or recognising the context atom as accurately and unambiguously as possible. 4.5 Composer Even with reliable sensory data, we may still not be able to capture a real world situation. The reason for this is a gap between what sensors can provide and what applications may need. 
Composition deals with a single higher-level context as an abstraction of more numerous context atoms. Once entities are modelled using the facts and beliefs stored in the KB and EAK, respectively, the composer accesses these components to retrieve useful knowledge which can serve as a reference for manipulating reports from sensors. The result is a higher-level context representing a real-world situation. A composer has three components. The logic-based reasoning component (LBR) manipulates the facts, relations, and assertions in the KB. The probabilistic reasoning component (PR) deals with imprecise and possibly erroneous data in the EAK. We call it a probabilistic reasoning component for convenience of expression, although it need not actually be probabilistic. Our own implementation of the Composer is a probabilistic reasoning scheme based on Bayesian Networks. The decision unit reconciles the decisions of the LBR with those of the PR. At times, the PR may not be able to make a decision because two or more propositions have equal posterior probabilities, or because the differences in posterior probabilities are not significant enough. Consider a scenario of reasoning about the whereabouts of a person in the absence of any localisation sensor. Suppose the outcome of the PR indicates that it is equally probable for the person to be either in a CORRIDOR or inside a ROOM. Knowing from the KB that CORRIDOR and ROOM are mutually exclusive concepts, but that BUILDING subsumes both, the decision unit can decide that the person is inside a BUILDING instead of randomly selecting either a CORRIDOR or a ROOM.
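The decision unit's fallback can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation: the KB's concept hierarchy is reduced to a hypothetical child-to-parent map `KB_PARENT`, sibling concepts are taken to be mutually exclusive, and posteriors closer than a hypothetical threshold `epsilon` count as a tie. A tie is resolved by returning the most specific concept that subsumes every tied candidate.

```python
# Hypothetical KB excerpt: each concept maps to its direct subsumer.
KB_PARENT = {
    "CORRIDOR": "BUILDING",
    "ROOM": "BUILDING",
    "BUILDING": "PLACE",
    "OUTDOORS": "PLACE",
}


def ancestors(concept: str) -> list:
    """Return the concept followed by all of its subsumers, most specific first."""
    chain = [concept]
    while chain[-1] in KB_PARENT:
        chain.append(KB_PARENT[chain[-1]])
    return chain


def least_common_subsumer(concepts: list) -> str:
    """Most specific concept that subsumes every concept in the list."""
    common = set(ancestors(concepts[0]))
    for c in concepts[1:]:
        common &= set(ancestors(c))
    # Walk up from the first concept; the first shared ancestor is the most specific one.
    for c in ancestors(concepts[0]):
        if c in common:
            return c
    raise ValueError("concepts share no subsumer")


def decide(posteriors: dict, epsilon: float = 0.05) -> str:
    """Take the PR's best proposition, or generalise via the KB when the top ones tie."""
    best = max(posteriors.values())
    tied = [c for c, p in posteriors.items() if best - p < epsilon]
    if len(tied) == 1:
        return tied[0]
    return least_common_subsumer(tied)


print(decide({"CORRIDOR": 0.45, "ROOM": 0.45, "OUTDOORS": 0.10}))  # BUILDING
print(decide({"CORRIDOR": 0.20, "ROOM": 0.70, "OUTDOORS": 0.10}))  # ROOM
```

Returning the least common subsumer trades precision for correctness: BUILDING is less informative than ROOM, but it holds under both tied hypotheses.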

5 Event Processing Components
In this section, we introduce the Event Handler (EH) and the Rule Organiser (RO). These two components are required for the dynamic generation and execution of context rules, which modify the behaviour of mobile devices or induce context-aware applications to execute suitable services.

5.1 Event Handler (EH)
The EH is a bridge between various context-aware applications and the context acquisition and processing block. It receives decision events from the applications and associates them with contexts acquired from the computing environment. These decision events are produced when an interesting context is captured. As mentioned earlier, context rules are generated when decision events are associated with the context in which the events are produced. Decisions are the basic elements of context-aware applications with which useful services are executed and the behaviour of applications (devices) is modified. Each decision corresponds to an action routine whose execution causes a system to take certain actions, transforming one world model into another. To associate decisions with a context, the EH subscribes to the context-aware applications running on the mobile devices owned by a user, so that it is notified of the occurrence of decision events. When a decision event arrives, the EH checks whether the same event has previously occurred; if it has not, it tags the event with a universally unique identifier (UUID) and associates the event with a context describing the situation of the application, the user, the device, and/or the place. If the event has already occurred, the context association takes place in the same way, and additionally, the EH performs aggregation of decision-context associations. Aggregation deals with the merging and filtering of decision-context associations referring to the same decision. The aim is to identify the types and states of primitive contexts which best represent a situation of interest in which a decision event occurs.

5.2 Rule Organiser (RO)
The main task of the rule organiser is the generation of context rules from an aggregate of decision-context associations. Besides this, the RO manages context rules. Managing context rules is necessary if two or more contextual states trigger the same decision event, or if two or more decision events can be mapped to the same context. An example of the second case is switching a mobile phone to vibration mode while loading an appropriate document whenever a user attends a meeting or a lecture. The sequential event operation is employed to describe decision events referring to the same context.
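The Event Handler's bookkeeping from Section 5.1 (tag a first-seen decision with a UUID, attach the current context, and accumulate associations per decision for later merging and filtering) can be sketched as follows. The class and field names are hypothetical, since the paper does not specify data structures, and `current_context` stands in for whatever the context acquisition block reports.

```python
import uuid
from collections import defaultdict


class EventHandler:
    """Hypothetical sketch of the EH's decision-context bookkeeping."""

    def __init__(self):
        self.ids = {}                          # decision description -> UUID
        self.associations = defaultdict(list)  # UUID -> contexts seen with that decision

    def on_decision(self, decision: str, current_context: dict) -> str:
        # First occurrence: tag the decision event with a universally unique identifier.
        if decision not in self.ids:
            self.ids[decision] = str(uuid.uuid4())
        event_id = self.ids[decision]
        # Every occurrence: associate the decision with the context it occurred in;
        # repeated occurrences accumulate here for later merging and filtering.
        self.associations[event_id].append(dict(current_context))
        return event_id


eh = EventHandler()
a = eh.on_decision("load(DB2)", {"place": "ROOM", "time": "09:57"})
b = eh.on_decision("load(DB2)", {"place": "ROOM", "time": "10:06"})
print(a == b, len(eh.associations[a]))  # True 2
```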


There are times when a set of rules no longer represents a user's preferences. To cope with this situation, human intervention is required. Human intervention is followed by a process called unlearning, which refers to the disintegration of context rules.

5.3 Observation Time
Decision-context aggregation takes place during an observation time, the time required by the EH to learn the behaviour of a mobile user. The EH subscribes to one or more applications to be notified of the occurrence of decision events. The applications produce the desired events when a mobile user interacts with them by modifying their behaviour or by invoking certain services. An observation time is expressed as an aperiodic composite event: it begins and ends with temporal events. Between these two events, a number of decision events, and the association of these decision events with a context, occur. The starting event E1 of an observation time can be either an absolute or a relative temporal event. An absolute temporal event corresponds to a unique time span on the time line with a clearly defined reference time and an offset time. A relative temporal event also corresponds to a unique time span on the time line, but its reference event can be other than a temporal event. An example of an absolute temporal event is when a user specifies a time span to train a system; an example of a relative temporal event is when a user specifies the frequency of the occurrence of a particular decision event. During an observation time, the event flow is from the applications to the EH and down to the available Primitive Context Servers.

5.4 Execution Time
An observation time is followed by an execution time, during which context rules are fired. The condition for firing a rule is the occurrence of a desirable context signifying a situation of interest. During an execution time, the EH has already learned the behaviour of the user in a situation of interest, and therefore subscribes to the Composer to be notified of the detection of a specific context. The event flow is from the Primitive Context Servers to the Composer and the EH, and up to the context-aware applications. The final event to be produced is a decision event.
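The observation time behaves like an aperiodic composite event in the spirit of the Snoop operator A(E1, E2, E3) of Chakravarthy et al. [9]: decision events E2 are collected only between an opening event E1 and a closing event E3. The Python sketch below illustrates that windowing and the switch to execution time. The function names, the tuple layout of `events`, and the one-month training span are hypothetical choices; the decision timestamps are borrowed from the paper's later example.

```python
from datetime import datetime


def observation_window(events, start, end):
    """Return the decision events (E2) that fall between the opening temporal event
    E1 (at `start`) and the closing temporal event E3 (at `end`), inclusive.

    `events` is a list of (timestamp, kind, payload) tuples.
    """
    return [
        (ts, payload)
        for ts, kind, payload in sorted(events, key=lambda e: e[0])
        if kind == "decision" and start <= ts <= end
    ]


def phase(now, end):
    """Before E3 the EH is learning (observation time); afterwards rules may fire."""
    return "observation" if now <= end else "execution"


# Hypothetical one-month training span chosen by the user (an absolute temporal event).
start, end = datetime(2004, 9, 20), datetime(2004, 10, 20)
events = [
    (datetime(2004, 9, 21, 9, 57, 13), "decision", "load(DB2)"),
    (datetime(2004, 10, 5, 10, 6, 47), "decision", "load(DB2)"),
    (datetime(2004, 10, 5, 10, 7, 0), "context", "temperature=22"),
    (datetime(2004, 10, 26, 10, 0, 0), "decision", "load(DB2)"),  # after E3
]
window = observation_window(events, start, end)
print(len(window))                                # 2
print(phase(datetime(2004, 10, 26, 10, 0), end))  # execution
```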

6 Application
This section introduces the Context-Aware E-Pad (CAEP). The motivation behind the design and implementation of CAEP is to demonstrate (1) how applications can employ a higher-level context without the need to deal directly with its composition, and (2) the dynamic generation of context rules by associating a context with actions (decision events) performed by a mobile user. We chose CAEP as a demonstrator because it can produce a wide variety of decision events, so that each decision event can be associated with different situations. CAEP is a context-aware application intended for use in a mobile environment; it takes into account the context of a mobile user to create, load, send, receive, and delete confidential documents. Figure 2 shows the main components constituting CAEP.


Fig. 2. Components of the Context-Aware Electronic Document

The design concept of CAEP enables the dynamic generation of context rules from a user's activity. The essential features of CAEP are summarised as follows:

• During an observation time, CAEP publishes the user's activities (in the form of decision events) to the Event Handler (EH). This is useful for associating the decision events with a set of contexts describing the situation in which the activities are taking place.
• From the association, the Rule Organiser generates a context rule or a subpart thereof, and stores the result.
• During an execution time, CAEP receives instructions from the EH to create, load, receive, send, and delete documents. Instructions from the EH are results of the occurrence of a context of interest.
• Moreover, CAEP offers a semantic-based input interface to a user. The interface enables a user to enter instructions from which decision events and context rules can be constructed.

6.1 Components of CAEP
The components of CAEP are the Input Interface (II), the Parser, the Event Import/Export component (EIE), and the Command Generator (CG). Figure 2 shows the components of CAEP.

6.1.1 The Input Interface (II)
The input interface enables a user to enter instructions. To enable the dynamic generation of context rules, input instructions are managed centrally by the input interface. Instructions provided via the GUI produce decision events directly, so they need no further processing. However, besides the graphical user interface, CAEP provides a semantic-based input interface through which a user can enter textual, visual, or audio instructions from which decisions and context states are constructed. Such instructions require further processing, such as the conversion of audio and visual inputs into text and the parsing and processing of the textual instructions. The II is responsible for transforming inputs of other modalities into textual instructions.


6.1.2 The Parser
The parser receives unstructured textual instructions from the II and generates decision events from them. Any instruction from a user should therefore consist of the decision event to be produced and the object (a document's name) to which the decision refers; additional elements of an instruction include contextual elements which can be associated with the decision event, and the persons to or from whom a note (document) should be sent or received. The Parser employs the KB of the context processing unit to produce a meaningful decision event, by passing a parsed instruction to the KB. Words such as create, load, send, and receive, when they appear in an instruction, are automatically interpreted as decision event generators; as a result, the KB searches within the collection of parsed words for associated objects, namely, documents and notes. A document refers to a self-contained file, while a note refers to a part of a document. Receive and send instructions, besides being decision event generators, are associated with senders and recipients, respectively, i.e., with a human entity type. To illustrate decision event creation, consider the following instruction from a user:

Receive last week's DB2 note from Kathy.

When the II receives this instruction, it forwards it to the Parser without producing any decision event. The Parser parses the instruction word by word and issues a query request to the KB, which manipulates the concepts and roles it stores to create associations with the collection of words from the Parser. The result is displayed in figure 3.

$$\left[\begin{array}{l}
\mathit{receive\ from} \rightarrow \mathit{event}(\mathit{decision});\\
\mathit{last\ week} \rightarrow \mathit{context}(\mathit{time}(\mathit{between}(27.09.2004,\ 01.10.2004)));\\
\mathit{DB2} \rightarrow \mathit{entity}(\mathit{document});\\
\mathit{note} \rightarrow \mathit{partOf}(\mathit{document});\\
\mathit{kathy} \rightarrow \mathit{entity}(\mathit{person})
\end{array}\right]$$

Fig. 3. A Query result from the KB for producing a receive decision event

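The Parser receives the result and produces the decision event shown in figure 4.

$$\mathit{event}\left[\begin{array}{l}
\mathit{receive}[\mathit{from} = \text{'kathy'};],\\
\mathit{CAEP};\\
\left[\begin{array}{l}
\mathit{content} = \mathit{note};\\
\mathit{subcontentOf} = \mathit{document};\\
\mathit{createdOn} = \mathit{between}(27.09.2004,\ 01.10.2004)
\end{array}\right],\\
05.10.2004,\ 10{:}10
\end{array}\right]$$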

Fig. 4. A decision event generated by the Parser
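The Parser's path from the instruction to the decision event can be approximated with a small keyword-driven sketch. It is not the author's implementation: the KB query is replaced by hypothetical lookup tables (`KB_ENTITIES`, `KB_PARTS`), the field names `issuedAt` and `object` are illustrative, and "last week" is assumed to mean Monday to Friday of the previous calendar week, which happens to reproduce the 27.09.2004 to 01.10.2004 range for an instruction issued on 05.10.2004.

```python
import re
from datetime import date, datetime, timedelta

DECISION_VERBS = {"create", "load", "send", "receive", "delete"}
KB_ENTITIES = {"db2": "document", "kathy": "person"}  # hypothetical KB excerpt
KB_PARTS = {"note": "document"}                       # note -> partOf(document)


def last_week(today: date) -> tuple:
    """Monday..Friday of the previous calendar week (an assumed reading of 'last week')."""
    monday = today - timedelta(days=today.weekday() + 7)
    return monday, monday + timedelta(days=4)


def parse(instruction: str, now: datetime) -> dict:
    """Turn a textual instruction into a decision event with the fields of Fig. 4."""
    words = re.findall(r"[a-z0-9']+", instruction.lower())
    verbs = [w for w in words if w in DECISION_VERBS]
    if not verbs:
        raise ValueError("instruction contains no decision event generator")
    event = {"decision": verbs[0], "application": "CAEP", "issuedAt": now}
    for prev, word in zip([""] + words, words):
        if word in KB_PARTS:
            event["content"], event["subcontentOf"] = word, KB_PARTS[word]
        elif KB_ENTITIES.get(word) == "document":
            event["object"] = word.upper()      # the document the decision refers to
        elif KB_ENTITIES.get(word) == "person" and prev in ("from", "to"):
            event[prev] = word                  # sender (receive) or recipient (send)
    if "last week" in " ".join(words):
        event["createdOn"] = last_week(now.date())
    return event


ev = parse("Receive last week's DB2 note from Kathy.", datetime(2004, 10, 5, 10, 10))
print(ev["decision"], ev["from"], ev["createdOn"])
```

A real implementation would delegate the word-to-concept mapping to the KB rather than to fixed dictionaries; the tables here only make the example self-contained.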

6.1.3 The Event Import/Export Unit (EIE)
The event import/export unit (EIE) is responsible for dispatching decision events and receiving context events. Decision events are exported to the EH during an observation time, so that they can be associated with a context; a context event, or more precisely the notification of its occurrence, is received from the EH during an execution time, so that CAEP performs an action associated with the context.


6.1.4 The Command Generator (CG)
All decision events which can be generated inside CAEP are registered at the Command Generator. This is useful for monitoring the activities of CAEP in a centralised manner. Only registered events can be triggered or associated with a context. A request to trigger an action should therefore arrive from the EIE referring to one of the registered decision types. When a user performs an action referring to one of the decision events, the CG notifies the EIE, which in turn notifies the EH, and the association of a decision event with a context ensues.

6.2 Decision-Context Association Semantics
This section illustrates how we trained CAEP to autonomously load a document suitable for a particular lecture the user attended. Besides loading a document, CAEP autonomously collected notes from one of the user's friends and inserted them² into the appropriate document if the user happened to miss a previous lecture. An observation time of one month was set, during which four lecture sessions were conducted. The user loaded a DB2 document four times, but one of these decisions was made during a makeup class rather than a normal lecture session. The primitive contexts collected at the times the loading decisions were made are shown in table 1.

Table 1. Decision-context associations

No.  Primitive contexts collected when the loading decision was made
1    [Context: time: 21.09.2004, 9:57:13 AM] [Context: temperature: 20] [Context: sound_pressure: 13 dB] [Context: light_intensity: 720 Lux] [Context: RH: 50%]
2    [Context: time: 01.10.2004, 10:10:23 AM] [Context: RH: 46%] [Context: temperature: 22]
3    [Context: time: 05.10.2004, 10:06:47 AM] [Context: RH: 52%] [Context: temperature: 22] [Context: light_intensity: 700 Lux]
4    [Context: time: 12.10.2004, 10:01:07 AM] [Context: temperature: 23] [Context: sound_pressure: 11.98 dB] [Context: RH: 48%]

RH = relative humidity; temperature is given in degrees Celsius; sound pressure is given in dB relative to a reference pressure of 20 µPa.

Table 2. Summary of decision-context associations

Decision     Context: Place   Context: Time        Context: Day
Load (DB2)   ROOM             10:02 AM (±3 min)    Tuesday

² Duplicate documents were not allowed. If multiple notes referred to a similar context-decision association, they were considered to be duplicates.


The syntax to associate decisions with a context is given by:

decision(description) associatedWith context(description)

Fig. 5. A syntax for decision-context association

The Composer of the context processing unit computed a higher-level context from the primitive contexts listed in table 1; thus, what appeared in a context-decision association was the higher-level context Room instead of the set of primitive contexts (see figure 6). As a result, the dynamics of the primitive contexts are completely hidden from CAEP. The Composer employed a self-organising Bayesian Network to reason about various places (rooms, corridors, buildings and outdoors)³. The Rule Organiser accumulated the decision-context associations until the observation time was over, and then merged and filtered them to generate an aggregate association, summarised in table 2. The context rule generated from the aggregate association is given by equation (4). This rule was used during an execution time to autonomously load the DB2 document whenever the user attended a DB2 lecture. Notice that the existential quantifiers were identified dynamically from the primitive contexts.

decision(load(DB2)) associatedWith context((location(Room)); (time(September 21, 2004, 9:57:23 AM)))

Fig. 6. A loading decision associated with contexts

$$(\forall d)(\forall t)(\forall l)\Big(\big(\mathit{day}(d) \wedge \mathit{time}(t) \wedge \mathit{location}(l) \wedge (d = \text{'Tuesday'}) \wedge (09{:}59 \le t \le 10{:}05) \wedge (l = \text{'Room'})\big) \supset \mathit{decision}(\mathit{load}(\mathit{pad}(\mathit{DB2})))\Big) \tag{4}$$
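The paper does not spell out how associations are merged and filtered, so the following Python sketch shows one plausible policy, labelled as an assumption: keep the most frequent (day, place) pair (which discards the Friday makeup class), centre a window on the mean time of the retained decisions rounded to the nearest minute, and widen it by ±3 minutes. Applied to the timestamps of table 1, each labelled ROOM by the Composer, it reproduces the Tuesday, ROOM, and 10:02 ±3 entries of table 2. The function names `aggregate` and `fires` are hypothetical.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def minute_of_day(ts: datetime) -> float:
    return ts.hour * 60 + ts.minute + ts.second / 60


def aggregate(associations, tolerance_min=3):
    """Merge (timestamp, place) associations of one decision into a rule antecedent.

    Assumed policy: keep the most frequent (day, place) pair, filter out the rest
    (e.g. a makeup class), and centre a +/- tolerance_min window on the mean time.
    """
    key = Counter((DAYS[ts.weekday()], place) for ts, place in associations).most_common(1)[0][0]
    kept = [ts for ts, place in associations if (DAYS[ts.weekday()], place) == key]
    centre = round(mean(minute_of_day(ts) for ts in kept))
    return {"day": key[0], "place": key[1],
            "window": (centre - tolerance_min, centre + tolerance_min)}


def fires(rule, ts: datetime, place: str) -> bool:
    """The rule fires only if every conjunct of its antecedent holds."""
    lo, hi = rule["window"]
    return (DAYS[ts.weekday()] == rule["day"] and place == rule["place"]
            and lo <= minute_of_day(ts) <= hi)


# Table 1 timestamps, each labelled ROOM by the Composer.
observed = [
    (datetime(2004, 9, 21, 9, 57, 13), "ROOM"),
    (datetime(2004, 10, 1, 10, 10, 23), "ROOM"),
    (datetime(2004, 10, 5, 10, 6, 47), "ROOM"),
    (datetime(2004, 10, 12, 10, 1, 7), "ROOM"),
]
rule = aggregate(observed)
print(rule)  # {'day': 'Tuesday', 'place': 'ROOM', 'window': (599, 605)}, i.e. 09:59 to 10:05
print(fires(rule, datetime(2004, 10, 19, 10, 0), "ROOM"))  # True
```

Using a fixed weekday list instead of `strftime("%A")` keeps the day names independent of the process locale.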

6.2.1 Mapping Multiple Decisions
Two or more decision events can be associated with a similar context, signifying the occurrence of multiple activities in the same situation. For example, while CAEP was being trained to load the DB2 document, the user switched the ringing tone of his mobile phone to vibration mode. Since the EH had subscribed to the application managing the mobile phone's ringing style, it was notified of the change in the behaviour of the mobile phone while the user attended a lecture. The two decision events, namely the loading event and the ringing-style adjustment event, frequently occurred together; hence, they were associated with the same set of contexts. The expression for associating multiple decisions with a similar context is given by:

decision(description) followedBy decision(description) associatedWith context(description)

Fig. 7. Syntax for associating multiple decisions with a similar context

³ Composition of a higher-level context is treated elsewhere [10].

decision(ringerStyle(vibration)) followedBy decision(load(DB2)) associatedWith context((location(Room)); (time(September 21, 2004, 9:57:23 AM)))

Fig. 8. Multiple Decisions associated with a similar context

For the scenario we just described, the merged association looked like the one depicted in figure 8. The context rule generated from the aggregated decision-context association is given in equation (5).

$$(\forall d)(\forall t)(\forall l)\Big(\big(\mathit{day}(d) \wedge \mathit{time}(t) \wedge \mathit{location}(l) \wedge (d = \text{'Tuesday'}) \wedge (09{:}59 \le t \le 10{:}05) \wedge (l = \text{'Room'})\big) \supset \big(\mathit{decision}(\mathit{load}(\mathit{pad}(\mathit{DB2}))) \wedge \mathit{decision}(\mathit{ringerStyle}(\mathit{mobilePhone}(\mathit{vibration})))\big)\Big) \tag{5}$$
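How the Rule Organiser decides that two decisions "occurred together frequently" is not specified. The Python sketch below assumes a simple support threshold: a decision joins the merged consequent if it appears in at least `min_support` of the sessions sharing the context, and decisions keep their order of first appearance, standing in for the followedBy sequence. The session logs and the 0.75 threshold are hypothetical.

```python
from collections import Counter


def merge_cooccurring(sessions, min_support=0.75):
    """Keep the decisions that occurred in at least `min_support` of the sessions that
    share a context, ordered by first appearance (standing in for followedBy)."""
    counts = Counter(d for session in sessions for d in set(session))
    merged = []
    for session in sessions:
        for d in session:
            if d not in merged and counts[d] / len(sessions) >= min_support:
                merged.append(d)
    return merged


# Hypothetical decision logs from four lecture sessions with the context Room.
sessions = [
    ["ringerStyle(vibration)", "load(DB2)"],
    ["ringerStyle(vibration)", "load(DB2)"],
    ["load(DB2)"],
    ["ringerStyle(vibration)", "load(DB2)", "send(note)"],
]
print(merge_cooccurring(sessions))  # ['ringerStyle(vibration)', 'load(DB2)']
```

Raising `min_support` makes the merged rule more conservative: at 1.0 only `load(DB2)` survives, because the ringer change was missing from one session.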

6.2.2 Managing Temporal Events
Temporal context can directly reveal a user's habits. However, due to the unpredictable nature of the user's actions, there is uncertainty in dealing with time. Additional knowledge is required to resolve this uncertainty, since the acceptable deviation in time is application-specific. For example, a user may not arrive at a lecture session exactly at the set time; he may arrive later or earlier than the time at which the lecture begins, but he cannot be earlier or later than the starting time by more than 90 minutes, given that a lecture lasts 90 minutes. This extra knowledge has therefore been made explicitly known to the RO.
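The domain knowledge handed to the RO amounts to a tolerance check, sketched below in Python. The 90-minute bound comes from the text; the function name `belongs_to_session` and the 10:00 session start are hypothetical.

```python
from datetime import datetime, timedelta

# Domain knowledge made known to the RO: a lecture lasts 90 minutes.
LECTURE_DURATION = timedelta(minutes=90)


def belongs_to_session(event_time: datetime, session_start: datetime,
                       tolerance: timedelta = LECTURE_DURATION) -> bool:
    """Accept a decision as part of a session only if it deviates from the session's
    starting time by no more than `tolerance`, earlier or later."""
    return abs(event_time - session_start) <= tolerance


start = datetime(2004, 9, 21, 10, 0)  # hypothetical lecture start
print(belongs_to_session(datetime(2004, 9, 21, 9, 57, 13), start))  # True
print(belongs_to_session(datetime(2004, 9, 21, 12, 0), start))      # False
```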

7 Discussion
Existing context-aware applications integrate rules which are defined at design time. It has been shown that defining context rules at design time has certain limitations, including the need to identify the actions, the context in which these actions are invoked, and the existential quantifiers that set the criteria for triggering the context rules. This paper presented a distributed architecture for dynamically generating context rules. The architecture consists of two main units: the context acquisition and recognition unit and the event detection and processing unit. The first unit is responsible for gathering data from various sensors, transforming the data into a desirable format, and performing various manipulations to reason about a higher-level context, which is an abstraction of a dynamic real-world situation. The second unit is responsible for detecting the occurrence of decision events inside context-aware applications and for associating these events with a context of interest. The time required to perform context-decision association is called an observation time, and association takes place inside the Event Handler. When the observation time is over, the Rule Organiser aggregates decision-context associations and generates context rules from the aggregation. Afterwards, the Event Handler begins to listen for the occurrence of a context of interest to fire the decision events stored in the Rule Organiser. The Context-Aware E-Pad was designed and implemented to demonstrate our approach. The design concept of CAEP frees both the application developer and the user from setting rules to perform desirable actions. It has been shown how CAEP employed the higher-level context PLACE (corridor, room, building, or outdoors) without the need to deal directly with its composition. Moreover, CAEP has been trained to load suitable documents during lecture sessions.


References
1. Dourish, P. 2004. What we talk about when we talk about context. Personal Ubiquitous Comp. 8, 1 (Feb. 2004), 19-30.
2. Dargie, W., Loeffler, T., Droegehor, O., David, K. 2005. Composition of Reusable Higher-Level Contexts. In Proceedings of the 14th Mobile and Wireless Communication Summit, IST, Dresden, Germany (June 2005).
3. Dargie, W., Loeffler, T., Droegehor, O., David, K. 2005. Architecture for Higher-Level Context Composition. In Proceedings of the Workshop on Context-Aware Proactive Services (CAPS05), Helsinki (June 2005).
4. Mäntyjärvi, J. and Seppänen, T. 2003. Adapting Applications in Handheld Devices Using Fuzzy Context Information. Interacting with Computers J., vol. 15, no. 4, 2003, pp. 521–538.
5. Wang, X., Dong, J. S., Chin, C., Hettiarachchi, S., and Zhang, D. 2004. Semantic Space: An Infrastructure for Smart Spaces. IEEE Pervasive Computing 3, 3 (Jul. 2004), 32-39.
6. Gu, T., Pung, H. K., and Zhang, D. Q. 2004. Toward an OSGi-Based Infrastructure for Context-Aware Applications. IEEE Pervasive Computing 3, 4 (Oct. 2004), 66-74.
7. Schmidt, A., Beigl, M., and Gellersen, H.-W. 1999. There is more to context than location. Computers and Graphics 23, 6 (1999), 893–901.
8. Forgy, C. L. 1982. Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence 19, 1 (1982), 17-37.
9. Chakravarthy, S., Krishnaprasad, V., Anwar, E., and Kim, S. 1994. Composite Events for Active Databases: Semantics, Contexts and Detection. In Proceedings of the 20th International Conference on Very Large Data Bases (September 12-15, 1994).
10. Dargie, W. and Hamann, T. 2006. A Distributed Architecture for Reasoning about a Higher-Level Context. (To appear) In Proceedings of the 2nd IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob 2006), IEEE Press (June 19-16 2006).

Spirits: Using Virtualization and Pervasiveness to Manage Mobile Robot Software Systems

Himanshu Raj, Balasubramanian Seshasayee, Keith J. O'Hara, Ripal Nathuji, Karsten Schwan, and Tucker Balch

College of Computing
Georgia Institute of Technology
Atlanta, GA 30332
http://www.cc.gatech.edu/~kjohara/MORPH

Abstract. Management capabilities comprise a key component of any autonomous distributed system. In this work our focus is on mobile systems like teams of robots exploring and operating in some physical environment. Here, basic management goals are to adapt application behavior to prevent or mitigate reductions in the application's quality of service. This paper presents the spirits system-level mechanisms supporting (1) behavior persistence – the ability to maintain some desirable behavior learned through online adaptation – and (2) behavior propagation – the ability to propagate a learned behavior across different physical components. With spirits, software management is enriched with a low-level mechanism, termed a spirit cache, which permits a mobile entity to cache its current state and code to realize behavior persistence. Next, a cached spirit can be acquired by a different physical component and then used, thereby propagating it. By using system-level virtualization techniques to realize spirit caching and propagation, both can be performed without the need to make any changes to application code, without requiring middleware-level support, and without changes to operating system kernels or utilities. Furthermore, any number of spirits can exist in a robot with system-level isolation guarantees. Experimental results presented in this paper highlight the types of overheads for spirit exchanges experienced on typical next-generation virtualizable embedded machines, and indicate optimizations to be considered in future research.

A. Keller and J.-P. Martin-Flatin (Eds.): SelfMan 2006, LNCS 3996, pp. 116–129, 2006.
© Springer-Verlag Berlin Heidelberg 2006

1 Introduction

Self-management is a defining aspect of autonomic systems. In the embedded and pervasive systems addressed by our research, management tasks include (re)configuring systems to adapt them to changing environments and to optimize resource usage. The environment for a system includes both external surroundings and internal resource availability, such as remaining battery levels. Here, dynamic self-management is critical because resource constraints may not only result in sub-optimal performance if not managed properly, but they may threaten the very existence of the system due to lack of battery power, for example. Further, self-management cannot be replaced by manual or operator-based
methods, since mobile nodes distributed over some large geographical area will not always be accessible and will be difficult, if not impossible, to service [1]. Our work assumes that basic management goals are to adapt application behavior to prevent or mitigate reductions in the application’s quality of service and to propagate desirable behaviors across different system components. Yet, in pervasive systems, high component failure rates coupled with lack of connectivity make it difficult to maintain desirable global state or to cause global behavior changes. Substantial literature addresses this fact, including rich basic insights about global state [2], state propagation techniques like gossiping or constrained flooding [3, 4], and practical methods for the pervasive or peer-to-peer domains [5, 6]. Reacting to the difficulty of maintaining desirable state and behaviors in a distributed pervasive system, the specific technical problems addressed by our research are (1) behavior persistence – the ability to maintain some desirable behavior learned through online adaptation – and (2) behavior propagation – the ability to propagate a learned behavior across different physical components. The contributions of this research are new system-level techniques for behavior persistence and propagation. Specifically, we present the spirits system-level mechanisms, which support (1) behavior persistence through a system-level spirit cache and (2) behavior propagation through system-level techniques for transplanting a cached spirit from one machine to another. To better explain the idea, we first describe a realistic usage scenario, followed by additional technical information about the spirit approach. The mobile system studied in our research is a team of autonomous robots deployed for a specific task, such as disaster recovery and reconnaissance. For instance, imagine a forest that has had a sensor network previously deployed [1]. 
The sensor network is primarily used to monitor temperature for detecting forest fires. On this occasion, a team of mobile robots has been commissioned to search for a wounded forest worker. The robots are deployed and venture deep into the woods, where one of them comes upon the victim. It begins using its cameras and complex computer vision algorithms to monitor her vital signs. Then the robot's camera malfunctions. Another nearby robot is commissioned to fulfill the broken robot's role. Because the wireless communication system between the robots is unreliable, high-bandwidth communication is unavailable. Instead, the broken robot creates a spirit by freezing its software (including the code, data, and state of the operating system and application) and sends it to nearby sensor nodes for storage (i.e., a spirit cache). It then heads back to be repaired. The replacement robot finds its way to the victim (possibly with the assistance of the sensor network) and downloads the cached software (i.e., spirit) from the storage nodes (i.e., spirit cache) of the sensor network. The robot is then imbued with exactly the same behaviors as the previous one, able to monitor the victim and help as if it were the same robot. The outcomes are behavior persistence and propagation, realized with simple system-level methods that replace complex multi-robot or multi-sensor schemes for global state update or propagation.


H. Raj et al.

As is common in robotics applications, our software is represented as a task graph with known task dependencies. A subtask, or role, may run continually throughout the application's lifetime. Thus, behavior or role persistence reduces to the problem of capturing the state of a task, and propagation reduces to the ability to reinstantiate captured (and cached) states. Both are desirable because it is difficult or impossible to re-create the state maintained by the role. First, the dynamic state is created from interaction with the environment, rather than being static state associated with the role's implementation. Second, if we use code morphing techniques, the code might be dynamically optimized for the robot's current task and environment. Thus, it is not an option to simply shut down a role on one robot and instantiate it afresh on another. A benefit of our approach is its simplicity. Instead of requiring application developers to use representations of roles built into application software or middleware, as with agent-based approaches [7], the spirit approach can be applied to any software implementation of robotic or pervasive systems. Further, spirits can be used with arbitrary operating software or operating systems, as will become clear in the technical approach for spirit implementation described in Section 3. Finally, the implementation of spirits is straightforward, replacing otherwise complex application-level methods for task or agent migration [8]. This is due to our ability to leverage new hardware and operating system capabilities developed for system virtualization and isolation [9]. Based on our experiments with representative applications running on VMs managed by the Xen virtual machine monitor, we show that virtual machine migration is a feasible way to migrate spirits among robots.
Moreover, this approach obviates the need for the adaptation interactions that spirit migration would otherwise require between the applications implementing spirits and the control plane software performing the overall management of the system. This way, new applications can be deployed and integrated into the system adaptation framework relatively easily. The tradeoff is the amount of state that must be moved over the network, which is the ultimate bottleneck in a resource-constrained system such as ours. We use compression techniques to reduce the state size. We intend to incorporate other middleware- and application-level system adaptation techniques into the framework [10].
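The tradeoff of shipping a whole frozen VM over a constrained link, and shrinking it with compression, can be illustrated with a small Python sketch of a spirit cache node. It is not the authors' Xen-based mechanism, and the paper does not name its compression technique: the class `SpiritCache`, the ID `"spirit-42"`, the use of zlib, and the SHA-256 integrity check are all illustrative assumptions. The toy state is mostly zero-filled because idle memory pages compress well; real VM images will likely compress less.

```python
import hashlib
import zlib


class SpiritCache:
    """Hypothetical in-memory stand-in for a spirit cache node."""

    def __init__(self):
        self._store = {}

    def put(self, spirit_id: str, state: bytes, level: int = 9) -> int:
        """Compress a frozen spirit, remember its digest, and return the stored size."""
        blob = zlib.compress(state, level)
        self._store[spirit_id] = (blob, hashlib.sha256(state).hexdigest())
        return len(blob)

    def get(self, spirit_id: str) -> bytes:
        """Decompress a cached spirit and check that it matches what was stored."""
        blob, digest = self._store[spirit_id]
        state = zlib.decompress(blob)
        if hashlib.sha256(state).hexdigest() != digest:
            raise ValueError(f"spirit {spirit_id!r} is corrupted")
        return state


# A toy "frozen VM": mostly zero-filled pages plus a little role state.
frozen = bytes(4 * 1024 * 1024) + b"role=monitor-victim;pose=12.5,3.1"
cache = SpiritCache()
stored = cache.put("spirit-42", frozen)
print(f"{len(frozen)} -> {stored} bytes")
assert cache.get("spirit-42") == frozen
```

The digest guards against silent corruption on lossy wireless links; a deployment would also need the discovery and ID negotiation steps the scenario describes.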

2 System Adaptation for Autonomous Robotics

Two current applications of autonomous robot technology are search and rescue missions and dangerous (or dull) exploratory missions (e.g., on Mars). The robot systems used in these applications range from large numbers of simple sensing devices to very sophisticated mobile robots. Despite the variety in hardware and domains, all of these applications pose unique challenges to the distributed software systems they rely on. The platforms are often mobile, inhabit dangerous and dynamic environments, rely on limited energy resources, and use a large number of external devices. Three aspects of this domain lend themselves particularly well to our approach.


First, autonomous and semi-autonomous robotics applications are often long-lived and stateful. For instance, search and rescue missions may last for days and include the creation of detailed maps. Unfortunately, robot hardware and energy supplies do not always easily support such long-lived applications. The authors of [11] found in one of their studies that the mean time to failure of unmanned ground vehicles was 8 hours, with 50% of the failures due to effectors, 33% to control systems, and 9% to both power and sensing. The robots have limited energy capacity, which must at some point be replenished, and their sensors and effectors often break. Methods that can easily, or automatically, move the robot's application to a new platform will ease the deployment of such critical systems. Second, robotics applications are becoming increasingly distributed and are often deployed on heterogeneous hardware and communication platforms; examples include teams of robots, or even a single robot whose application is distributed over multiple computing platforms. However, mobile robots live in a dynamic environment where failures of sensors, effectors, control systems, and communication are commonplace. Techniques for easily managing the distributed nature of these applications will greatly aid the deployment of future robotics systems. Third, robots and sensor networks find themselves in unique locations for scientific study. Consider the Mars rovers, or sensor networks deployed in deep forests. These computing systems give scientists unprecedented access to some of the most interesting places in nature. Unfortunately, these systems are expensive to deploy and are often available only to a limited group of scientists. We would like to "time-share" these systems, allowing multiple scientists to work in parallel. For this to be practical and useful, the scientists must view the sensor network or robot system as their own, isolated from other scientists' applications.
For instance, a software fault in one scientist's application should not affect another's experiments. The spirits approach addresses all of these issues.

System Architecture - A Usage Scenario

To highlight our system architecture, we revisit the example scenario briefly described earlier. A variety of conditions can arise in robotics that create the need for spirit migration. In our example, we consider a local failure event in the form of a broken sensor vital to the role being executed. This type of failure is common in robotics because of the many sensors and peripheral devices that the platforms incorporate. Once the failure is detected and reported to a high-level decision-making layer, a replacement for the faulty robot is assigned.

Robotics systems rely heavily on wireless communication devices for network operations. Indeed, individual robots often support multiple radios that vary in range, bandwidth, and power consumption. All of these radios, though, are plagued by effects such as signal attenuation and interference, which can increase bit error rates and create transient network partitions. Mechanisms such as retransmission, which are implemented to counter these effects, reduce the effective bandwidth provided by the network.

H. Raj et al.

This creates a problem for long-haul communication of state during spirit migration. Indeed, in the case of network partitions, transmitting the state remotely can be impossible. In our example scenario, for power reasons the failed robot needs to leave its region promptly in order to return to base for repairs before its battery is too low to make the trip. At the same time, for the reasons described above, the spirit state cannot be transferred to the new robot arriving from a distant location until that robot is in closer proximity. To decouple spirit migration from the communication of state between the failed robot and its replacement, we propose an underlying network of participants that we term spirit cache nodes. These entities consist of extremely low-power, pervasive embedded devices that can perform simple tasks such as communication, storage, and lightweight computation. Embedded networks deployed in the field for other purposes (e.g., sensing [12], supporting robots [13]) may also fulfill this role. We envision these platforms to be similar in capabilities to the XScale-based Gumstix device [14]. The Gumstix platform comes equipped with a Bluetooth radio and can be extended with additional 802.11 radios and storage such as flash. Using the environment to save state has also been explored in other contexts, such as [15], where collaborative construction by a team of robots is shown to be more efficient with extended stigmergy.

Spirit caches are distributed geographically in the environment for use by robots during redeployment. In particular, the robot that experiences the sensor failure uses a discovery process to find the local spirit cache and then stores its spirit on it. This spirit state is tagged with a pre-negotiated unique ID. Upon arrival, the replacement robot discovers the local spirit cache and retrieves its new spirit by that unique ID. In this manner, spirit caches allow the failed robot to return for repairs while the application continues by re-instantiating the role on the assigned robot. Figure 1 provides an illustration of our approach.

Fig. 1. An illustration of the spirits approach
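The store-and-retrieve exchange through a spirit cache can be sketched as follows. This is a hypothetical illustration only: the `SpiritCache` class, the `discover_nearest` function, the nearest-cache discovery rule, and the use of a UUID as the pre-negotiated ID are our assumptions. The text does not specify the discovery protocol or the cache interface, and a real cache node would also need radio communication and persistent (e.g., flash) storage.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class SpiritCache:
    """Hypothetical spirit cache node: holds checkpointed spirit state keyed by ID."""
    location: tuple[float, float]
    _store: dict[str, bytes] = field(default_factory=dict)

    def store(self, spirit_id: str, state: bytes) -> None:
        self._store[spirit_id] = state

    def retrieve(self, spirit_id: str) -> bytes:
        # pop() hands the state over exactly once; KeyError means no such spirit here.
        return self._store.pop(spirit_id)


def discover_nearest(caches: list[SpiritCache], position: tuple[float, float]) -> SpiritCache:
    """Assumed stand-in for the discovery process: choose the closest cache."""
    def sq_dist(c: SpiritCache) -> float:
        return (c.location[0] - position[0]) ** 2 + (c.location[1] - position[1]) ** 2
    return min(caches, key=sq_dist)


caches = [SpiritCache((0.0, 0.0)), SpiritCache((12.0, 9.0))]
spirit_id = str(uuid.uuid4())           # pre-negotiated unique ID
state = b"compressed VM checkpoint"     # placeholder for the real checkpoint

# The failed robot at (10, 10) deposits its spirit, then heads back to base.
discover_nearest(caches, (10.0, 10.0)).store(spirit_id, state)

# The replacement robot later arrives near (11, 8) and picks the spirit up by ID.
restored = discover_nearest(caches, (11.0, 8.0)).retrieve(spirit_id)
assert restored == state
```

Removing the state on retrieval is one possible design choice; a cache could instead keep a copy until the replacement robot confirms a successful restore.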

3 Utilizing Virtualization for Spirit Migration

A key problem associated with spirit migration in any adaptive system is state transfer. If spirits do not accrue state over time, migration can be implemented via a simple stop-start mechanism, i.e., stop the application on one robot and start it on another. However, this is of little use in our domain, where robots use machine learning algorithms to learn and build knowledge bases over time. One example is learning a posture that maintains stability on a particular terrain. A robot starts with a conservative, resource-consuming, highly stable posture when deployed in a new area. As it moves around, it learns about the terrain and switches to a posture that is stable enough for that terrain but consumes fewer resources. Also, if we employ dynamic code optimization, the code will be morphed to suit the external and internal environments of the robot. A simple stop-start mechanism incurs the cost of re-learning, which results in unnecessary resource waste. Hence we must migrate spirits in a way that captures their associated state. Several techniques can readily be employed: an agent-based framework, process migration, or virtual machine migration. With agent-based solutions, spirits are implemented as agents that know how to capture the necessary state when it is time to migrate. This requires either a runtime system that can capture agent state fully via, e.g., object reflection and serialization, or agents that implement that facility themselves. The former approach usually ties the spirit to a specific language runtime, such as Java. The latter implies that the entity implementing the agent must know exactly how to serialize its state.
Since the data structures that the algorithms use to represent internally the “knowledge” gathered from learning vary from one algorithm to another, an agent migration technique customized to each individual algorithm is tedious and must be built at design time. Because VM migration encapsulates the entire memory state, regardless of the semantics of the memory used by the processes running inside, it naturally overcomes this problem. Another alternative is process migration, where an underlying management system tightly coupled with the OS, such as MOSIX [16], migrates a process within a cluster environment. This approach is not very attractive for our area of focus, for several reasons:
1. The system cannot truly be modeled as a cluster, as there may be large communication delays between any two nodes (and probable network partitions).
2. This approach does not provide isolation. If a robot hosts multiple spirits, tighter isolation can be enforced between them by using virtual machines rather than processes.

3. Since the resources owned by a virtual machine are better abstracted than those of a process (a VM owns architectural resources, whereas a process owns OS resources), it is much easier, and possibly less costly, to encapsulate VM state than process state.

There are two primary issues with using virtual machines for spirit migration in our system. The first is the size of the checkpointed state, which consists mainly of the memory contents allocated to a virtual machine in the Xen VMM (details of the VM state that is checkpointed appear elsewhere [17]). A virtual machine should have enough memory to run the application(s) comprising the spirit role gracefully; any amount beyond that just adds to the state size without benefiting the application. This amounts to estimating the working set size of a spirit, which is non-trivial. The problem can be largely alleviated by compression. In addition, we modify the OS of the guest VM so that all free pages in the system remain scrubbed (filled with zeros). This makes compression quite efficient, as the experimental results demonstrate. Any peripheral state, such as disk contents, must be captured separately.

Once it is determined that a spirit can no longer exist on a robot, Xen stops its execution and checkpoints its state to a file. The checkpointed state is then compressed and moved either to a spirit cache or to another robot, where Xen reinstates it after decompression. Hence, the spirit does not execute while it is being migrated. Another approach to VM migration is live migration [17], in which a VM continues to execute while its state is being checkpointed. We plan to incorporate this approach in future work.

The second issue is peripheral virtualization. Virtualizing the core architectural resources, the CPU and memory, is readily done by the Xen VMM.
One must implement device-specific backend/frontend drivers [9] in order to virtualize a peripheral device. For this work, we chose to implement logical devices rather than full peripheral virtualization for all the devices present on our robots, such as cameras, sonar, and lasers. A logical device is a user-space construct. One part of it, termed the backend logical device, resides in the controller domain (the domain with access to the actual peripheral hardware). The other part, termed the frontend logical device, resides in the guest VM (where a spirit executes). The backend provides data to the frontend over a channel such as a shared memory buffer or network packets. Logical devices are easier to implement, since they make use of the device drivers and software frameworks already available to the OS running in the controller domain. For our experiments, we implemented a logical camera device. The backend captures data via the video4linux framework [18] and sends it over UDP to the frontend running in the guest VM. This severely limits the size of images that can be transmitted atomically to the guest VM, since each image must fit in a single UDP datagram. A future shared memory implementation will alleviate this restriction.

Limitations

This section describes the limitations of using a virtual machine approach in general for adaptation in mobile robotics systems. One such limitation is inflexibility. Depending on the availability of robots and the number of spirits, a robot might be assigned multiple roles (one role per spirit), and migration might require that only a subset of these spirits be migrated. This situation is readily supported by assigning different spirits to independent virtual machines. However, subdividing a spirit's role into smaller entities then becomes a problem similar to process migration (one needs to create additional VMs and migrate some processes of a role to another VM). Our approach is currently inflexible in this regard and cannot handle such a situation. Hence, it requires design-time effort on our part to define spirits that are small enough that they need not be subdivided, but large enough that in the common case one does not need to run multiple copies of them for a single task on the same robot. Otherwise, overheads and performance costs will increase. Another challenge of the virtualization approach is that, although nodes in a distributed embedded system are often highly heterogeneous in multiple respects, including architecture, virtual machine states are by nature highly architecture-dependent, since they rely on native execution for performance. A portable solution, such as a Java-based VM, might support architectural heterogeneity, but it is not well suited to embedded platforms for various reasons, including reduced performance, higher operating costs, and difficulty in accessing native resources such as peripherals. Our Xen-based solution requires that at least the core architecture of all robots be the same, which is a reasonable assumption. However, it imposes no restriction on heterogeneity arising from different peripherals on different robots. In our case, peripheral heterogeneity gives rise to another problem: how do we determine which robot a spirit should be migrated to?
For this, we envision using a markup language such as the Service Provisioning Markup Language [19] to describe both robots and spirit roles in order to perform capability matching.
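The capability matching envisioned above could be approximated as follows. The paper proposes SPML descriptions; this sketch substitutes plain Python records, and the class names, field names, and matching rules (identical core architecture, a superset of the required peripherals, enough free memory for the spirit's VM) are our assumptions, drawn from the constraints discussed in this section rather than from any SPML schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RobotDescription:
    """Hypothetical robot description (stand-in for an SPML document)."""
    name: str
    architecture: str             # core ISA; Xen checkpoints are architecture-dependent
    peripherals: frozenset[str]   # e.g. {"camera", "sonar", "laser"}
    free_memory_mb: int


@dataclass(frozen=True)
class SpiritRole:
    """Hypothetical spirit-role description."""
    name: str
    architecture: str
    required_peripherals: frozenset[str]
    vm_memory_mb: int


def candidates(role: SpiritRole, robots: list[RobotDescription]) -> list[RobotDescription]:
    """Robots able to host the role: same core architecture, every required
    peripheral present, and enough free memory for the spirit's VM."""
    return [r for r in robots
            if r.architecture == role.architecture
            and role.required_peripherals <= r.peripherals
            and r.free_memory_mb >= role.vm_memory_mb]


robots = [
    RobotDescription("r1", "i686", frozenset({"camera", "sonar"}), 64),
    RobotDescription("r2", "i686", frozenset({"sonar", "laser"}), 128),
    RobotDescription("r3", "arm", frozenset({"camera", "laser"}), 128),
]
role = SpiritRole("blob-mapper", "i686", frozenset({"camera"}), 32)
print([r.name for r in candidates(role, robots)])  # ['r1']
```

Here r2 lacks a camera and r3 has a different core architecture, so only r1 qualifies. Choosing among several qualifying robots (e.g., by distance or battery level) is left open.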

4 Evaluation and Discussion

4.1 Experimental Results

We conduct simple experiments with the open-source Xen virtual machine monitor to determine the feasibility of our approach. All experiments are run on an i686-based machine with a 3 GHz Pentium 4 HT-enabled processor and 512 MB of RAM (386 MB is available to the controller domain, and the remainder is left for creating guest VMs). The platform runs a paravirtualized Linux 2.6.12 kernel and Xen 3.0. The computational resources of this machine are modestly superior to those available on robots. We envision that future robots, even those based on embedded platforms, will use x86 architectures such as the Intel Centrino platform and will have comparable computing power. As the experiments demonstrate, communication overheads dominate the overall cost and hence determine the feasibility of a virtual machine based approach for dynamic adaptation. In the first set of experiments, we characterize the effect that an application's behavior has on the size of the checkpointed state and, consequently, on the time it takes to capture and reinstate it. Next, we run two representative applications inside a virtual machine: the localization module from the CARMEN [20] robotic software suite, and a CMVision [21] based application that captures images of its environment and builds a knowledge base. Since the checkpointed state of a VM in Xen consists mostly of its memory contents, these virtual machines should have enough memory to run the applications gracefully; any amount beyond that only adds to the state size without benefiting the application. We use a ramdisk-based image for the root file system in order to capture peripheral state as part of the memory state itself. For our experiments, an 8 MB ramdisk was sufficient to hold all application-related files. We configured VMs with 32 MB of RAM, running a paravirtualized Linux 2.6.12 kernel and a small BusyBox-based file system from the ramdisk image. These VMs have approximately 18 MB of free RAM for application use after system startup, which is sufficient for the representative applications. For compression and decompression we use the standard GNU zip (gzip) v1.3.3.

To demonstrate the effect of memory usage on checkpointed state, we use a simple application, eatmem, that allocates a specified amount of memory and dirties it with either random data or a specified pattern. Regardless of the application's memory requirement, the size of the checkpoint file is about 32 MB. However, significant savings in state size can be obtained depending on an application's memory usage pattern. Figure 2(a) shows the total time taken for state capture and reinstantiation, including (1) the VM save operation that checkpoints a VM to a file, (2) the cost of transmitting it over an 802.11b wireless link, and (3) the VM restore operation that restores a VM from a checkpoint file. These results are presented for idle domains and for the eatmem application using different amounts of memory.
Except for the idle-unzipped domain case, all cases also include the cost of compressing and decompressing the state. The idle-scrubbed domain and all eatmem experiments use scrubbed memory. The transmission cost is obtained by measuring the average throughput achieved by an FTP stream between a laptop and a handheld computer connected via a single-hop 802.11b wireless network running at 11 Mbps. The measured throughput is approximately 2.2 Mbps. Even with higher throughput, transmission would still dominate the overall cost of state exchange. This clearly shows the benefit of compression, and of scrubbing, which makes compression efficient. As expected, higher memory usage by the eatmem application yields less benefit from compression, but the overall cost is still lower than in the uncompressed case (which equals the idle-unzipped domain case). Figure 2(b) shows only the computational part of the state exchange cost. This cost is dominated by compression; VM save/restore and decompression are relatively small components.

[Figure 2: (a) Total; (b) Compute Only. Time (seconds) per stage: vm save, gzip -9, transmission (panel (a) only), gunzip, vm restore; shown for the idle domain (unzipped, zipped, scrubbed) and for eatmem dirtying 2M, 6M, 10M, 14M, and 18M of memory with random data.]

Fig. 2. State exchange cost for idle domain and eatmem application

The CMVision based application represents a typical vision processing component of a robotic framework. It searches for a blob of a specified color in the image captured by the camera attached to the host machine. Based on the result of this search, it builds a knowledge base that indexes the result by its current coordinates (randomly chosen from a 1000x1000 grid). Table 1(a) shows the migration costs associated with the state of the VM running the CMVision based application. The amount of state is limited by the grid and image size, the latter of which is small because of the restrictions of logical virtualization discussed earlier.

The localization module infers the robot's position given a map and current sensor readings. Currently we use simulated sensor readings provided by another node over the network rather than actual physical sensor readings. The state of the VM running localization consists of the map of the environment and the sensor data being read over the network. Table 1(b) shows the migration costs associated with the state of this VM.

                      (a) CMVision   (b) localize
VM Save               0.550 s        0.428 s
Compress              3.016 s        3.131 s
Transfer state        27.3 s         30.11 s
Decompress            0.715 s        0.781 s
VM Restore            0.398 s        0.547 s
Compression ratio     77.6%          75.3%
Free memory in VM     14012 kB       13852 kB

Table 1. Migration costs for a VM running representative applications

Based on the results above, we observe that:
– Migration cost for the representative applications is dominated by the cost of state transfer over a limited-bandwidth wireless network.
– Using a spirit cache node provides flexibility in deploying spirits on replacement robots, at the cost of an extra state transfer from the cache node to the replacement robot.
– Since most of the cost lies in state transfer, greater savings can be obtained by reducing state size. Currently we use compression for this purpose. Another approach that we plan to use in the future is to generate a diff of the current guest VM state against a pristine VM state. This diff can then be used as a binary patch to re-create the state at the other node.
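As a rough consistency check on the first observation, the transfer times in Table 1 can be approximated from the checkpoint size, the compression ratio, and the measured 2.2 Mbps throughput. This sketch rests on two assumptions of ours, not statements from the measurements: that the ~32 MB checkpoint is exactly 32 MiB, and that the reported compression ratio is the fraction of bytes removed by gzip (so the compressed image is 1 - ratio of the original).

```python
# Back-of-envelope estimate of state-transfer time over the 802.11b link.
# Assumptions (ours): checkpoint = 32 MiB exactly, and the "compression ratio"
# in Table 1 is the fraction of bytes removed by compression.
CHECKPOINT_BYTES = 32 * 1024 * 1024   # ~32 MB checkpoint file
THROUGHPUT_BPS = 2.2e6                # measured FTP throughput, ~2.2 Mbps


def transfer_seconds(space_saved: float) -> float:
    """Time to send the compressed checkpoint at the measured throughput."""
    compressed_bytes = CHECKPOINT_BYTES * (1.0 - space_saved)
    return compressed_bytes * 8 / THROUGHPUT_BPS


def total_seconds(save, compress, transfer, decompress, restore):
    """End-to-end state exchange cost, with the five stages run one after another."""
    return save + compress + transfer + decompress + restore


cmvision_est = transfer_seconds(0.776)   # Table 1(a) reports 27.3 s measured
localize_est = transfer_seconds(0.753)   # Table 1(b) reports 30.11 s measured
print(f"CMVision: {cmvision_est:.1f} s, localize: {localize_est:.1f} s")
# prints "CMVision: 27.3 s, localize: 30.1 s"

# Share of the measured CMVision total spent on transfer (Table 1(a) values).
cm_total = total_seconds(0.550, 3.016, 27.3, 0.715, 0.398)
print(f"transfer share: {27.3 / cm_total:.0%}")  # prints "transfer share: 85%"
```

The close agreement with the measured values suggests that this interpretation of the table is reasonable, and the roughly 85% transfer share makes the first observation concrete.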
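The diff-based approach mentioned in the last observation could look roughly like the following sketch. It compares the current image against a pristine image page by page and ships only the changed pages, compressed with zlib. The 4 KB page size, the function names `make_patch` and `apply_patch`, and the in-memory byte strings standing in for Xen checkpoint files are illustrative assumptions. The sketch also assumes that both nodes already hold an identical pristine image, and it does not model Xen's actual checkpoint format.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page granularity for the diff


def make_patch(pristine: bytes, current: bytes) -> bytes:
    """Collect (page index, page contents) for every page that differs from
    the pristine image, then compress the result."""
    if len(pristine) != len(current) or len(current) % PAGE_SIZE:
        raise ValueError("images must be the same size and page-aligned")
    records = []
    for offset in range(0, len(current), PAGE_SIZE):
        page = current[offset:offset + PAGE_SIZE]
        if page != pristine[offset:offset + PAGE_SIZE]:
            records.append((offset // PAGE_SIZE).to_bytes(4, "big") + page)
    return zlib.compress(b"".join(records), 9)


def apply_patch(pristine: bytes, patch: bytes) -> bytes:
    """Rebuild the current image from the receiver's copy of the pristine
    image plus the patch."""
    image = bytearray(pristine)
    raw = zlib.decompress(patch)
    record_size = 4 + PAGE_SIZE
    for pos in range(0, len(raw), record_size):
        start = int.from_bytes(raw[pos:pos + 4], "big") * PAGE_SIZE
        image[start:start + PAGE_SIZE] = raw[pos + 4:pos + record_size]
    return bytes(image)


# Stand-in images: a 1 MiB "pristine" image and a copy with two dirtied pages.
pristine = os.urandom(256 * PAGE_SIZE)
dirty = bytearray(pristine)
dirty[10 * PAGE_SIZE:12 * PAGE_SIZE] = os.urandom(2 * PAGE_SIZE)
current = bytes(dirty)

patch = make_patch(pristine, current)
full = zlib.compress(current, 9)
assert apply_patch(pristine, patch) == current
print(f"full image compressed: {len(full)} bytes, patch: {len(patch)} bytes")
```

The savings depend entirely on how few pages a spirit dirties relative to the pristine image. A VM that has rewritten most of its memory would see little benefit over compressing the full image.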

5 Related Work

Apart from power savings, one of the primary motivations for system adaptation in robotics is the unreliability inherent in the domain [11]. A range of techniques has been deployed for system adaptation, including software evolution, middleware-based solutions, and adaptable and extensible operating systems [22]. This paper focuses primarily on migrating software components among robots for system adaptation. We encapsulate migratable components in spirits, which form the basic unit of system adaptation via migration. In contrast to the numerous techniques that exist for migrating specialized processes [7], we focus on the more general case. Process migration, whereby an executing process is transferred from one system to another, is an effective technique for addressing adaptation. However, as observed in [8], transparent process migration for systems originally designed to run stand-alone is considered unrealistic. Instead, checkpoint-restart [23] is gaining acceptance. Zap [24] is an example of such a mechanism, where a thin virtualization layer on top of the operating system is used in conjunction with checkpoint-restart to provide process migration. The MIGSOCK [25] project at CMU implements process migration while also providing network socket migration support. Recent efforts also include using compilers to identify points in a process where checkpointing is easier because the amount of state to be saved is minimized [26].

VM migration, like process migration, was originally developed for load balancing in computational clusters. Recent efforts have adapted VMs for use in other domains. An example is SoulPad [27], where the operating system runs in a virtual machine, which in turn is stored on and accessed from a small portable device. In this setup, the VM presents a uniform virtual environment for the operating system, enabling it to run on any platform. In this paper, we apply the VM migration approach to system adaptation in mobile robotics. The vehicle that ultimately carries out migration could range from ad hoc networks to techniques such as message ferrying [28].

6 Conclusions and Future Work

This paper makes the case for applying techniques from virtualization and pervasive networks to overcome the limitations arising from power and unreliability in the robotics domain. The feasibility of the spirits approach is demonstrated through sample applications migrated with the Xen virtualization environment, and the effect of varying checkpointed state sizes is studied. The increasing processing power of current robots could make system adaptation via VM migration, originally developed with enterprise computing in mind, both feasible and beneficial. Our future work includes investigating and addressing the various challenges discussed in Section 3. We also plan to study the effects of including power-aware applications in our virtualized environment. Since such applications typically rely on underlying power models in the OS as feedback mechanisms, the virtualization technology must be extended to recreate these mappings as applications are shifted to different platforms with varying power tradeoffs.

References
[1] Estrin, D., Govindan, R., Heidemann, J., Kumar, S.: Next century challenges: scalable coordination in sensor networks. In: The 5th annual ACM/IEEE international conference on Mobile computing and networking (MobiCom ’99). (1999) 263–270
[2] Fischer, M.J., Lynch, N.A., Paterson, M.S.: Impossibility of distributed consensus with one faulty process. J. ACM 32(2) (1985) 374–382
[3] Demers, A., Greene, D., Hauser, C., Irish, W., Larson, J., Shenker, S., Sturgis, H., Swinehart, D., Terry, D.: Epidemic algorithms for replicated database maintenance. In: The sixth annual ACM Symposium on Principles of distributed computing (PODC ’87). (1987) 1–12
[4] Haas, Z., Halpern, J.Y., Li, E.L.: Gossip based ad-hoc routing. In: IEEE Conference on Computer Communications (INFOCOM). (2002)
[5] Elson, J., Girod, L., Estrin, D.: Fine-grained network time synchronization using reference broadcasts. SIGOPS Oper. Syst. Rev. 36(SI) (2002) 147–163
[6] Stoica, I., Morris, R., Karger, D., Kaashoek, F., Balakrishnan, H.: Chord: A scalable Peer-To-Peer lookup service for internet applications. In: The 2001 ACM SIGCOMM Conference. (2001) 149–160
[7] Kiniry, J., Zimmerman, D.: A hands-on look at Java mobile agents. IEEE Internet Computing 1(4) (1997)
[8] Milojicic, D., Douglis, F., Paindaveine, Y., Wheeler, R., Zhou, S.: Process Migration Survey. ACM Computing Surveys (2000)
[9] Pratt, I., Fraser, K., Hand, S., Limpach, C., Warfield, A., Magenheimer, D., Nakajima, J., Mallick, A.: Xen 3.0 and the Art of Virtualization. In: Ottawa Linux Symposium (OLS ’05). (2005)
[10] O’Hara, K.J., Nathuji, R., Raj, H., Schwan, K., Balch, T.: Autopower: Toward energy-aware software systems for distributed mobile robots. In: IEEE International Conference on Robotics and Automation. (2006)
[11] Carlson, J., Murphy, R.: How UGVs physically fail in the field. IEEE Transactions on Robotics 21(3) (2005) 423–437
[12] Culler, D.E., Hill, J., Buonadonna, P., Szewczyk, R., Woo, A.: A network-centric approach to embedded software for tiny devices. EMSOFT 2001 (2001)
[13] O’Hara, K.J., Bigio, V., Whitt, S., Walker, D., Balch, T.: Evaluation of a large scale pervasive embedded network for robot path planning. In: IEEE International Conference on Robotics and Automation. (2006)
[14] Gumstix robotics platform. (Available from http://www.gumstix.com)
[15] Werfel, J., Bar-Yam, Y., Nagpal, R.: Construction by robot swarms using extended stigmergy. Technical Report MIT-CSAIL-TR-2005-024, Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory (2005)
[16] The MOSIX organizational grid - a white paper. (Available from http://www.mosix.org/txt%5Fpub.html)
[17] Clark, C., Fraser, K., Hand, S., Hansen, J.G., Jul, E., Limpach, C., Pratt, I., Warfield, A.: Live migration of virtual machines. In: The 2nd Symposium on Networked Systems Design and Implementation (NSDI ’05). (2005)
[18] Video for Linux. (Available from linux.bytesex.org/v4l2)
[19] Service Provisioning Markup Language. (Available from http://www.oasis-open.org)
[20] Montemerlo, M., Roy, N., Thrun, S.: Perspectives on standardization in mobile robot programming: The Carnegie Mellon Navigation (CARMEN) toolkit. In: IEEE/RSJ Int. Conf. on Intelligent Robots and Systems. (2003) 2436–2441
[21] Bruce, J., Balch, T., Veloso, M.: Fast and inexpensive color image segmentation for interactive robots. In: IEEE/RSJ International Conference on Intelligent Robots and Systems. (2000)
[22] Kasten, E.P., McKinley, P.K.: A taxonomy for computational adaptation. Technical Report MSU-CSE-04-4, Department of Computer Science, Michigan State University (2004)
[23] Sankaran, S., Squyres, J., Barrett, B., Sahay, V., Lumsdaine, A., Duell, J., Hargrove, P., Roman, E.: The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing. International Journal of High Performance Computing Applications 19(4) (2000)
[24] Osman, S., Subhraveti, D., Su, G., Nieh, J.: The Design and Implementation of Zap: A System for Migrating Computing Environments. In: Operating Systems Design and Implementation (OSDI ’02). (2002) 361–376
[25] Kuntz, B., Rajan, K.: MIGSOCK: Migratable TCP socket in Linux. Master’s thesis, Information Networking Institute, Carnegie Mellon University (2002) Available from www.cs.cmu.edu/ softagents/migsock/MIGSOCK.pdf.
[26] Zhang, K., Pande, S.: Efficient application migration under compiler guidance. In: The SIGPLAN Conference on Languages, Compiler and Tools for Embedded Systems (LCTES ’05). (2005)
[27] Caceres, R., Carter, C., Narayanaswami, C., Raghunath, M.: Reincarnating PCs with portable SoulPads. In: The 3rd international conference on Mobile systems, applications, and services (MobiSys ’05). (2005) 65–78
[28] Zhao, W., Ammar, M., Zegura, E.: A message ferrying approach for data delivery in sparse mobile ad hoc networks. In: ACM Mobihoc. (2004)

Mobile Service Clouds: A Self-Managing Infrastructure for Autonomic Mobile Computing Services

Farshad A. Samimi (1), Philip K. McKinley (1), and S. Masoud Sadjadi (2)

(1) Department of Computer Science and Engineering, Michigan State University, East Lansing, MI 48823, USA, {farshad, mckinley}@cse.msu.edu
(2) School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA

Abstract. We recently introduced Service Clouds, a distributed infrastructure designed to facilitate rapid prototyping and deployment of autonomic communication services. In this paper, we propose a model that extends Service Clouds to the wireless edge of the Internet. This model, called Mobile Service Clouds, enables dynamic instantiation, composition, configuration, and reconfiguration of services on an overlay network to support mobile computing. We have implemented a prototype of this model and applied it to the problem of dynamically instantiating and migrating proxy services for mobile hosts. We conducted a case study involving data streaming across a combination of PlanetLab nodes, local proxies, and wireless hosts. Results are presented demonstrating the effectiveness of the prototype in establishing new proxies and migrating their functionality in response to node failures.

1 Introduction

As the cyberinfrastructure becomes increasingly complex, the need for autonomic [1] communication services is also increasing. Autonomic communication services can be used to support fault tolerance, enhance security, and improve quality of service in the presence of network dynamics. An integral part of such systems is the ability to dynamically instantiate and reconfigure services, transparently to end applications, in response to changes in the network environment. A popular approach to supporting transparent reconfiguration of software services is adaptive middleware [2]. However, supporting autonomic communication services often requires adaptation not only at the communication end points, but also at intermediate nodes “within” the network. One approach to this problem is the deployment of a service infrastructure within an overlay network [3], in which end hosts form a virtual network atop the physical network. The presence of hosts along the paths between communication end points enables intermediate processing of data streams, without modifying the underlying routing protocols or router software. This paper investigates the integration of adaptive middleware and overlay networks to support autonomic communication services on behalf of mobile hosts.

This work was supported in part by the U.S. Department of the Navy, Office of Naval Research under Grant No. N00014-01-1-0744, and in part by National Science Foundation grants EIA0000433, EIA-0130724, and ITR-0313142.

A. Keller and J.-P. Martin-Flatin (Eds.): SelfMan 2006, LNCS 3996, pp. 130–141, 2006. © Springer-Verlag Berlin Heidelberg 2006

Mobile Service Clouds: A Self-Managing Infrastructure

We recently introduced Service Clouds [4], an overlay-based infrastructure intended to support rapid prototyping and deployment of autonomic communication services. In this approach, the nodes in an overlay network provide a “blank computational canvas” on which services can be instantiated on demand and later reconfigured in response to changing conditions. When a new service is needed, the infrastructure finds a suitable host, instantiates the service, and maintains it only as long as it is needed. We implemented a prototype of Service Clouds and experimented with it atop the PlanetLab Internet testbed [5]. We conducted two case studies in which we used the Service Clouds prototype to develop new autonomic communication services. The first was a TCP-Relay service, in which a node is selected and configured dynamically to serve as a relay for a data stream. Experiments demonstrate that our implementation, which is not optimized, can in many cases produce significantly better performance than a native TCP/IP connection. The second case study involved MCON, a service for constructing robust connections for multimedia streaming. When a new connection is being established, MCON exploits physical network topology information in order to dynamically find and establish a high-quality secondary path, which is used as a shadow connection to the primary path. Details of these studies can be found in [4]. In this paper, we propose Mobile Service Clouds, an extension of the above model that supports autonomic communication at the wireless edge of the Internet, defined as those nodes that are one or at most a few wireless hops away from the wired infrastructure. Mobile computing environments exhibit operating conditions that differ greatly from those of their wired counterparts. In particular, applications must tolerate the highly dynamic channel conditions that arise as users move about the environment.
Moreover, the computing devices used by different end users may vary in display characteristics, processor speed, memory size, and battery lifetime. Given their synchronous and interactive nature, real-time applications such as video conferencing are particularly sensitive to these differences. The Mobile Service Clouds model supports dynamic composition and reconfiguration of services to support clients at the wireless edge. We have implemented a prototype of this model and applied it to the problem of dynamically instantiating and migrating proxy services for mobile hosts. We conducted a case study involving data streaming across a combination of PlanetLab nodes, local proxies, and wireless hosts. We present results demonstrating the effectiveness of the prototype in establishing new proxies and migrating their functionality in response to node failures.

The remainder of this paper is organized as follows. Section 2 provides background on Service Clouds and discusses related work. Section 3 introduces the Mobile Service Clouds model. Section 4 describes the architecture and implementation of Mobile Service Clouds. Section 5 presents a case study and experimental results. Finally, Section 6 summarizes the paper and discusses future directions.

F.A. Samimi, P.K. McKinley, and S.M. Sadjadi

2 Background and Related Work

This research is part of the RAPIDware project, which addresses the design of high-assurance adaptable software. In this paper, we focus primarily on the software infrastructure needed to realize adaptive behavior in streaming services. Other parts of the RAPIDware project address different dimensions of autonomic computing, including state maintenance and consistency across adaptations [6], contracts to guide autonomic behavior [7], and a perceptual memory system to support decision making [8].

Figure 1 shows a conceptual view of Service Clouds. A service cloud can be viewed as a collection of hosts whose resources are available to enhance communication services (e.g., in terms of fault tolerance, quality of service, or security) transparently with respect to the communication end points. To do so requires autonomic behavior, in which individual nodes in the cloud use adaptive middleware and cross-layer collaboration to support reconfiguration. An overlay network connecting these nodes serves as a vehicle to support cross-platform cooperation. The nodes in the overlay network provide the computing resources with which services can be instantiated as needed, and later reconfigured in response to changing conditions. The Service Clouds infrastructure is designed to be extensible: a suite of low-level services for local and remote interactions can be used to construct higher-level autonomic services.
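The extensibility idea, building higher-level services from a suite of low-level ones, can be pictured with a minimal Python sketch. It assumes, purely for illustration, that every service is a function from one packet to another; `compose`, `ByteCounter`, and `add_tag` are hypothetical names and not part of the Service Clouds prototype.

```python
from typing import Callable, List

# Hypothetical service signature: take a packet, return a (possibly new) packet.
Service = Callable[[bytes], bytes]


def compose(services: List[Service]) -> Service:
    """Chain low-level services into a single higher-level service."""
    def composite(packet: bytes) -> bytes:
        for service in services:
            packet = service(packet)
        return packet
    return composite


class ByteCounter:
    """Observing service: records traffic volume and passes packets through unchanged."""

    def __init__(self) -> None:
        self.total = 0

    def __call__(self, packet: bytes) -> bytes:
        self.total += len(packet)
        return packet


def add_tag(packet: bytes) -> bytes:
    """Modifying service: prepends a one-byte tag (a stand-in for real processing)."""
    return b"\x01" + packet


counter = ByteCounter()
pipeline = compose([counter, add_tag])
result = pipeline(b"frame")
```

Because every stage shares one signature, a new component can be plugged into the chain without changing the others. The prototype's actual component interfaces are not specified in this paper.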

Fig. 1. Conceptual view of the Service Clouds infrastructure (legend: overlay node; distributed overlay service)

The Service Clouds concept incorporates results from three areas of research in distributed systems. First, adaptive middleware and programming frameworks [9, 10, 11] enable dynamic reconfiguration of software in response to changing conditions. Research in this area has been extensive (see [2] for a survey) and has provided a better understanding of several key concepts relevant to autonomic computing, including reflection, separation of concerns, component-based design, and transparent interception of communication streams. Second, cross-layer cooperation mechanisms [12, 13] enable coordinated adaptation of the system as a whole, and in ways not possible within a single layer. The Service Clouds architecture supports cross-layer cooperation and incorporates low-level network status information in the establishment and configuration of high-level communication services. Third, overlay networks [3] provide an adaptable and responsive chassis on which to implement communication services for many distributed applications [14, 15, 16, 17].


Several research groups have recently investigated ways to use overlay networks to support dynamic composition and configuration of communication and streaming services (e.g., SpiderNet [18], CANS [19], GridKit [20], DSMI [21], and iOverlay [22]). Service Clouds complements this work by providing a “toolkit” with which to develop and test new services that require dynamic instantiation, composition, and reconfiguration. Service Clouds provides an extensive set of components that can be used to compose complex communication services. Moreover, developers can introduce new or customized components by simply plugging them into the Service Clouds infrastructure.

Our initial work on Service Clouds [4] focused on services in the wired network. At the wireless edge, services are often deployed on proxy nodes, which operate on behalf of mobile hosts. Extensive research has been conducted on the design of proxy services that transcode data, handle frequent disconnections, and enhance the quality of wireless connections through techniques such as forward error correction (FEC) [23, 24, 25]. Rather than addressing the operation of specific proxy services, in this work we concentrate on the dynamic instantiation and reconfiguration of proxy services in response to changing conditions.

3 Extending Service Clouds to the Wireless Edge

Figure 2 depicts an extension of the original Service Clouds model to support mobile computing. In this model, mobile service clouds comprise collections of overlay hosts that implement services close to the wireless edge, while deep service clouds perform services using an Internet overlay network (such as the PlanetLab wired hosts used in our earlier study). In a manner reminiscent of the Domain Name System (DNS), the clouds in this federation of clouds cooperate to meet the needs of client applications. Typically, a mobile service cloud will comprise nodes on a single intranet, for example, a collection of nodes on a university campus or nodes used by an Internet Service Provider (ISP) to enhance quality of service at wireless hotspots. A mobile user may interact with different mobile service clouds as he or she moves about the wireless edge, with services instantiated and reconfigured dynamically to meet changing needs.

Fig. 2. Example scenario involving Mobile Service Clouds (a Service Clouds federation: a deep service cloud (Internet overlay) hosting a streaming multimedia server, and mobile service clouds at the Internet wireless edge, namely airport hotspots, ISP-operated city hotspots, and a university campus; a mobile user, with possible stream redirection based on user mobility; legend: overlay node, distributed overlay service)

Figure 2 shows an example in which a mobile user is receiving a live or interactive video stream on a mobile device. Service elements are instantiated at different locations along an overlay path according to application requirements and current conditions. For example, if the user is connected via a relatively low-bandwidth wireless link, a video transcoder may be established close to the video source to reduce the bit rate of the video stream and avoid wasting bandwidth along the wired segment of the path. On the other hand, a proxy service that uses FEC and limited retransmissions to mitigate wireless packet losses may be established on a mobile service cloud node at the wireless edge. The operation of such a proxy service depends on the type of data stream to be transmitted across the wireless channel. Over the past few years, our group has investigated proxy services for reliable multicasting [26], audio streaming [27, 28], and video streaming [27, 29, 30].

As the user moves within an area serviced by a single mobile service cloud, or among different service clouds, the proxy can be migrated so as to follow the user. We refer to such an instantiation as a transient proxy [30]. There are several reasons to keep the proxy close (in terms of network latency) to the mobile user.
First is the quality of service of data stream delivery. For example, EPR [29] is a forward error correction method for video streaming in which the proxy encodes video frames using a block-erasure code, producing a set of parity packets. The proxy sends a subset of the parity packets proactively with the stream. Additional parity packets are sent in response to feedback from the mobile device, to handle temporary spikes in the loss rate. However, the effectiveness of these additional parity packets depends on the round-trip delay between the mobile device and the proxy: if the delay is too long, the parity packets arrive too late for real-time video playback.

The second reason for the proxy to be close to the mobile host is resource consumption. For example, a proxy that implements forward error correction increases the bandwidth consumption of the data stream. When the mobile device connects to a different Internet access provider, if the proxy is not relocated, the additional traffic may traverse several network links across Internet service providers. While the effect of a single data stream may be small, the combined traffic generated by a large number of mobile users may have a noticeable effect on performance.

Third is the policy of the service provider. While an ISP may be willing to use its computational resources to meet the needs of users connected through its own access points, this policy may not apply to mobile hosts using access points that belong to another provider. Just as the mobile device changes its access point but remains connected, the proxy services on the connection may need to move for the new connection to be comparable to the old one.
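To make the parity-packet idea concrete, the sketch below substitutes the simplest block-erasure code: one XOR parity packet per block of equal-length packets, which can rebuild a single lost packet per block. This is an illustrative stand-in, not the code used by EPR [29], which produces a set of parity packets per block. The `repair_in_time` helper is a deliberately simplified model (our assumption) of the timeliness argument above: a feedback-triggered parity packet is useful only if the round trip to the proxy fits within the frame's remaining playback slack, ignoring transmission and processing delays.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode_block(packets: List[bytes]) -> bytes:
    """Produce one parity packet for a block of equal-length data packets."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild at most one lost packet (marked None) using the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("a single parity packet can repair only one loss per block")
    present = [p for p in received if p is not None]
    if not missing:
        return present
    block = list(received)
    # XOR of the parity with every surviving packet yields the lost packet.
    block[missing[0]] = reduce(xor_bytes, present, parity)
    return [p for p in block if p is not None]


def repair_in_time(rtt_ms: float, slack_ms: float) -> bool:
    """Simplified check: a reactive parity packet helps only if it can arrive before playback."""
    return rtt_ms < slack_ms


data = [b"abcd", b"efgh", b"ijkl"]
parity = encode_block(data)
restored = recover([b"abcd", None, b"ijkl"], parity)
```

A real block-erasure code with several parity packets per block can repair several losses; the single-parity version only shows the encode, lose, and rebuild cycle.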


Mobile Service Clouds provides an infrastructure to support the deployment and migration of such proxy services for mobile clients. In the experiments described in Section 5, we address a fourth reason to migrate a proxy service, namely, fault tolerance. If a proxy suddenly crashes or becomes disconnected, another node in the service cloud should assume its duties with minimal disruption to the communication stream(s). Next, we describe the architecture and implementation of our proof-of-concept prototype.

4 Architecture and Implementation

The Service Clouds infrastructure is intended primarily to facilitate rapid prototyping and deployment of autonomic communication services. Overlay nodes provide processing and communication resources on which transient services can be created as needed to assist distributed applications. Figure 3 shows a high-level view of the Service Clouds software organization and its relationship to Schmidt’s model of middleware layers [31]. Most of the Service Clouds infrastructure can be considered host-infrastructure middleware, which provides a layer of abstraction on top of heterogeneous platforms. The Application-Middleware eXchange (AMX) provides a high-level interface to applications and encapsulates the logic required to drive various overlay services. The Kernel-Middleware eXchange (KMX) layer provides services that facilitate collaboration between middleware and the operating system. Distributed composite services are created by plugging in new algorithms and integrating them with lower-level control and data services, which in turn depend on underlying overlay services.

Fig. 3. Relationship of Service Clouds to other system layers [4]

Figure 4 provides a more detailed view of the Service Clouds architecture, showing those components introduced or used in this study (unshaded boxes), as well as those used in our TCP-Relay and MCON studies (shaded boxes). The architecture comprises four main groups of services, situated between the AMX and KMX layers. At the lowest layer, Basic Overlay Services provide generic facilities for establishing an overlay topology, exchanging status information and distributing control packets among overlay hosts. Control Services include both event processors and DCS-specific services.

Fig. 4. Instantiation of the general Service Clouds model (layers, top to bottom: Application-Middleware eXchange (AMX), which handles application service requests and replies; Distributed Composite Services (DCS), comprising overlay engines and coordination and service composition; Control Services, comprising event processors and DCS-specific services; Data Services; Basic Overlay Services; Kernel-Middleware eXchange (KMX), above the operating system and distributed infrastructure. Components shown include the Service Gateway, Service Composer, Service Path Computation, Recomposer, Service Monitor, RTT Meter, Path RTT, UDP Relay, and FEC encoder/decoder.)

An event processor handles specific control events and messages. For example, such a service might receive certain types of inquiry packets and extract information useful to multiple higher-level services. Data Services are used to process data streams as they traverse a node; they include monitors, which carry out measurements on data streams, and actuators, which modify data streams. At the highest layer, Distributed Composite Services include overlay engines, which codify complex distributed algorithms, and Coordination and Service Composition, which provides the “glue” between overlay engines and lower-level services.

For the prototype implementation of Mobile Service Clouds, we introduced new components and reused several others. First, the Service Path Computation overlay engine manages the interaction between a mobile host and a service cloud federation. Its tasks include selecting a node in a deep service cloud, called the primary proxy, which coordinates composition and maintenance of the service path between two end nodes. The Service Path Computation engine also finds a suitable node in a mobile service cloud on which to deploy the transient proxy services (FEC in our study). We also required lower-level services to identify potential proxies. The RTT Meter component uses “ping” to measure the round-trip time (RTT) to an arbitrary node, and the Path RTT component measures the end-to-end RTT between two nodes whose communication must pass through an intermediate node. The Service Gateway component implements a simple protocol to accept and reply to service requests. Upon receiving a request, it invokes the overlay engine to find a suitable primary proxy.

The Service Composer component implements mechanisms for composing a service path. It uses the Relay Manager to instantiate and configure a UDP relay on the primary proxy and on the transient proxy. The UDP relay on the transient proxy enables the infrastructure to intercept the stream and augment it with FEC encoding. As soon as the FEC proxy service is instantiated, the Service Monitor on the transient proxy begins sending heartbeat messages through a TCP channel to the Service Monitor on the primary proxy. The Recomposer component on the primary proxy tracks the activity of the service monitors. Upon detecting a failure, it starts a self-healing operation that recomposes the service path and restores communication.

The prototype also has a main program that deploys the infrastructure primitives. It reads configuration files containing the IP addresses of overlay nodes and the overlay topology, instantiates a basic set of components, and configures the components according to the specified parameter values. Examples of these parameters include the interval between probes for monitoring purposes, service port numbers, and an assortment of debugging options. The prototype software package also includes a collection of scripts to configure nodes, update code at nodes, launch the infrastructure, run test scenarios, and collect results. To deploy management commands that control test execution on several nodes, we used the Java Message Service (JMS). Example management commands include those for gathering operation statistics on each node, changing framework parameters at run time, and shutting down the Service Clouds infrastructure. We emphasize that the purpose of this prototype is merely to identify different aspects of the problem and conduct a requirements analysis. Therefore, we have implemented only a small set of features.
Building a number of such prototype systems will help reveal the salient issues for designing a more complete Service Clouds framework.
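The heartbeat mechanism between the Service Monitor on the transient proxy and the Recomposer on the primary proxy can be sketched as a timeout-based failure detector. The case study in Section 5 sends beacons every 5 ms; the rule used here (declare a node failed after `timeout` seconds of silence) and the class name `HeartbeatMonitor` are assumptions for illustration. The clock is injectable so the behavior can be checked deterministically.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Timeout-based failure detector: a node is suspected once it stays silent too long."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout  # seconds of silence tolerated
        self.clock = clock      # injectable for testing
        self.last_seen: Dict[str, float] = {}

    def beat(self, node: str) -> None:
        """Record a heartbeat beacon from `node`."""
        self.last_seen[node] = self.clock()

    def failed_nodes(self) -> List[str]:
        """Nodes whose last beacon is older than the timeout."""
        now = self.clock()
        return [n for n, t in self.last_seen.items() if now - t > self.timeout]


# Fake clock and a 20 ms timeout, i.e., four missed 5 ms beacon intervals.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout=0.020, clock=lambda: fake_now[0])
monitor.beat("W1")
fake_now[0] = 0.015
before = monitor.failed_nodes()
fake_now[0] = 0.030
after = monitor.failed_nodes()
```

On a non-empty result, a recomposer would reroute the relay toward a backup node. The timeout trades detection speed against false alarms caused by delayed beacons.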

5 Case Study

We conducted a case study in which we assessed the ability of MSC (Mobile Service Clouds) to establish transient proxies for mobile devices, monitor the service path, and support dynamic reconfiguration (with minimal interruption) when the proxy node fails.

Basic Operation. In this scenario, a mobile node on a wireless link wants to receive a multimedia stream (e.g., in an interactive video conference or a live video broadcast). In this case, the MSC infrastructure needs to fulfill the following requirements. First, the quality of the received stream must remain acceptable as the wireless link experiences packet loss. Second, the video stream must be transcoded to satisfy resource restrictions such as wireless bandwidth and processing power at the mobile device. Third, stream delivery should not be interrupted as conditions on the service path change (e.g., when user movement causes a wireless network domain change, or when a service node fails).

Figure 5 shows the configuration used in the experiments, with four PlanetLab nodes in a deep service cloud and two workstations on our department intranet in a mobile service cloud. These systems are all Unix/Linux-based machines. We used a PlanetLab node to run a UDP streaming program and a Windows XP laptop to receive the stream over a wireless link. The middleware software on the mobile client connects to a Service Gateway node (N1) and requests the desired service. Gateway nodes are the entry point to the Service Clouds: they accept requests for connection to the Service Clouds and designate a service coordinator based on the requested service. In this work, we assume that gateway nodes are known in advance, as with local DNS servers. In future implementations, we plan to integrate other methods, such as directory services, into the infrastructure to enable automatic discovery of gateway nodes. Upon receiving the request, the gateway begins a process to find a node to act as the primary proxy (N4) and, when this is completed, informs the mobile client of the selection. The primary proxy receives details of the desired service, sets up a service path, and coordinates monitoring and automatic reconfiguration of the service path during the communication.

A gateway might consider several factors in choosing a primary proxy for a requested service: security policies of the client and service cloud components, round-trip time between the node and the communication endpoints, and computational load at the node. In this example, N4 is chosen such that the path round-trip time between the two endpoints is minimal. Next, the MSC infrastructure instantiates FEC functionality at the transient proxy (W1) in a mobile service cloud, and dynamically re-instantiates the service on another node when W1 fails. “Failure” can be defined in different ways: high computational load, high RTT to the client due to a change in the access point used by the client, software failure of the service, or hardware failure. To study and test basic self-repair in Service Clouds, we simply inject a failure by terminating the service process on W1.

In the example depicted in Figure 5, the Service Clouds client middleware, residing on the laptop, sends a service request to the gateway node N1, which chooses N4 as the primary proxy and informs the client. Next, the client software sends a primary proxy service request to N4, which constructs a service path comprising a UDP relay on itself and a UDP relay augmented with an FEC encoder on W1.
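The selection rule in this example (choose the candidate that minimizes the path round-trip time between the endpoints) can be sketched as follows. The sketch assumes that the path RTT through a candidate is approximated by the sum of its RTT to the client and its RTT to the server; the prototype's Path RTT component instead measures the end-to-end RTT through the intermediate node directly. The RTT values in the usage lines are hypothetical, not measurements from Figure 5.

```python
from typing import Dict, Tuple


def choose_primary_proxy(
    rtt_to_client: Dict[str, float],
    rtt_to_server: Dict[str, float],
) -> Tuple[str, float]:
    """Return (node, estimated path RTT) for the candidate with the smallest path RTT."""
    candidates = sorted(rtt_to_client.keys() & rtt_to_server.keys())
    if not candidates:
        raise ValueError("no candidate has RTT measurements to both endpoints")

    def path_rtt(node: str) -> float:
        # Assumed approximation: client->proxy RTT plus proxy->server RTT.
        return rtt_to_client[node] + rtt_to_server[node]

    best = min(candidates, key=path_rtt)
    return best, path_rtt(best)


# Hypothetical measurements in milliseconds.
to_client = {"N2": 70.0, "N3": 80.0, "N4": 40.0}
to_server = {"N2": 80.0, "N3": 85.0, "N4": 15.0}
proxy, rtt = choose_primary_proxy(to_client, to_server)
```

The other factors a gateway might weigh, such as security policy or computational load, could be applied as filters on the candidate set before the minimization.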
As soon as W1 starts the service, it begins sending heartbeat beacons over a TCP connection to N4, indicating that the service node is active. N4 runs a monitoring thread that listens for beacons from W1 (sent every 5 ms in our experiment). If it detects a failure, it reconfigures the overlay service path to use another node in the mobile service cloud.

Fig. 5. The experimental testbed and example scenario. The deep service cloud consists of PlanetLab nodes; the mobile service cloud is at Michigan State University. Nodes:
M, mobile user: senslap10.cse.msu.edu
S, UDP streamer (Cornell University PlanetLab node): planetlab1.cs.cornell.edu
N1, service gateway: planet-lab-1.csse.monash.edu.au
N2, candidate primary proxy: planetlab01.cs.washington.edu
N3, candidate primary proxy: planetlab1.cs.ucsb.edu
N4, candidate primary proxy (chosen at run time): planetlab1.csail.mit.edu
W1, wireless edge relay: countbasey.cse.msu.edu
W2, wireless edge backup relay: arctic.cse.msu.edu

Experimental Results. To test dynamic reconfiguration of a service path, we terminate the program running on W1. We evaluated two strategies for self-healing at the time of failure detection: (1) on-demand backup, in which another node, W2, is configured dynamically as soon as the failure is detected; and (2) ready backup, in which W2 is configured as a backup at the same time as W1, so the system only needs to reconfigure the relay on N4 to forward the stream to W2 instead of W1. We measured the percentage of packets received at the wireless node. Figure 6 plots the average over 12 runs, in 50 ms epochs, around the time W1 fails and the system recovers automatically. As the plot shows, the system can completely recover from the failure at W1 in less than 0.4 seconds. The on-demand backup is slightly slower, since the system has to instantiate and configure the proxy service. In the ready backup case, the service is instantiated at the time of service composition, yielding a faster response. In future work, we will consider additional failure detection and recovery strategies. For example, adding a limited capability to the client middleware to participate in failure detection may enable the client to trigger the reconfiguration faster, since the mobile node is closer (in terms of round-trip time) to the wireless edge.
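The per-epoch metric behind Figure 6 can be computed roughly as in the sketch below. The paper does not describe how the traces were processed, so the input format is an assumption: each run yields the send time (in integer milliseconds) of every packet the streamer sent plus the set of sequence numbers received at the wireless node. Packets are binned by send time into 50 ms epochs, and the per-epoch percentages are averaged across runs. Integer milliseconds avoid floating-point surprises at epoch boundaries.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def epoch_percentages(
    sent: Sequence[Tuple[int, int]], received: Set[int], epoch_ms: int = 50
) -> Dict[int, float]:
    """Percentage of packets received per epoch for one run.

    sent: (sequence number, send time in ms) for every packet sent.
    received: sequence numbers that arrived at the receiver.
    """
    total: Dict[int, int] = defaultdict(int)
    got: Dict[int, int] = defaultdict(int)
    for seq, t_ms in sent:
        epoch = t_ms // epoch_ms
        total[epoch] += 1
        if seq in received:
            got[epoch] += 1
    return {e: 100.0 * got[e] / total[e] for e in sorted(total)}


def average_over_runs(runs: List[Dict[int, float]]) -> Dict[int, float]:
    """Mean percentage per epoch across runs (an epoch absent from a run is skipped)."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for run in runs:
        for epoch, pct in run.items():
            sums[epoch] += pct
            counts[epoch] += 1
    return {e: sums[e] / counts[e] for e in sorted(sums)}


# Two toy runs: ten packets sent 10 ms apart; run 1 loses packets 7, 8, and 9.
sent = [(i, i * 10) for i in range(10)]
run1 = epoch_percentages(sent, received={0, 1, 2, 3, 4, 5, 6})
run2 = epoch_percentages(sent, received=set(range(10)))
mean = average_over_runs([run1, run2])
```

Binning by arrival time instead of send time is an equally plausible choice; it would shift losses toward the epochs in which the gap is observed at the receiver.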

Fig. 6. Packet loss during node failure (percentage of packets received, 0–100, versus time in seconds, 12.45–13; curves: On Demand Backup, Ready Backup)

6 Conclusions and Future Work

In this paper, we addressed the issue of dynamic services at the wireless edge of the Internet. We introduced Mobile Service Clouds, an infrastructure for rapid prototyping and deployment of such services. We described a proof-of-concept implementation that we used to support proxy instantiation and fault tolerance on a testbed comprising PlanetLab nodes and hosts on a university intranet. Preliminary results demonstrate the usefulness of the model and the effectiveness of the prototype. Our ongoing investigations address dynamic insertion and reconfiguration of transcoding filters along a stream path, dynamic migration of proxy services to enhance quality of service, and integration of mobile service clouds with data streaming protocols for sensor networks.

Further Information. Related publications on the RAPIDware project, as well as a download of the Service Clouds prototype, can be found at the following website: http://www.cse.msu.edu/rapidware.

References

1. Kephart, J.O., Chess, D.M.: The vision of autonomic computing. IEEE Computer 36(1) (2003) 41–50
2. McKinley, P.K., Sadjadi, S.M., Kasten, E.P., Cheng, B.H.C.: Composing adaptive software. IEEE Computer (2004) 56–64
3. Andersen, D., Balakrishnan, H., Kaashoek, F., Morris, R.: Resilient Overlay Networks. In: Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP 2001) (2001)
4. McKinley, P.K., Samimi, F.A., Shapiro, J.K., Tang, C.: Service Clouds: A distributed infrastructure for composing autonomic communication services. Technical Report MSU-CSE-0531 (Available at http://www.cse.msu.edu/rapidware/serviceclouds.pdf), Department of Computer Science, Michigan State University, East Lansing, Michigan (2005)
5. Peterson, L., Anderson, T., Culler, D., Roscoe, T.: A Blueprint for Introducing Disruptive Technology into the Internet. In: Proceedings of HotNets-I, Princeton, New Jersey (2002)
6. Zhang, J., Cheng, B.H.C.: Specifying adaptation semantics. In: Proceedings of the IEEE ICSE Workshop on Architecting Dependable Systems (WADS), St. Louis, Missouri, IEEE (2005)
7. Zhou, Z., McKinley, P.K.: COCA: a contract-based infrastructure for composing adaptive multimedia systems. In: Proceedings of the 8th International Workshop on Multimedia Network Systems and Applications (MNSA 2006), held in conjunction with the IEEE 26th International Conference on Distributed Computing Systems (ICDCS 2006), Lisboa, Portugal (2006)
8. Kasten, E.P., McKinley, P.K.: Meso: Perceptual memory to support online learning in adaptive software. In: Proceedings of the 3rd International Conference on Development and Learning (ICDL’04), La Jolla, California (2004)
9. Zinky, J.A., Bakken, D.E., Schantz, R.E.: Architectural support for quality of service for CORBA objects. Theory and Practice of Object Systems 3(1) (1997) 1–20
10. Redmond, B., Cahill, V.: Supporting unanticipated dynamic adaptation of application behaviour. In: Proceedings of the 16th European Conference on Object-Oriented Programming, Malaga, Spain. Volume 2374 of Lecture Notes in Computer Science, Springer-Verlag (2002)
11. Liu, H., Parashar, M., Hariri, S.: A component-based programming model for autonomic applications. In: Proceedings of the 1st International Conference on Autonomic Computing, New York, NY, USA, IEEE Computer Society (2004) 10–17
12. Noble, B.D., Satyanarayanan, M., Narayanan, D., Tilton, J.E., Flinn, J., Walker, K.R.: Agile application-aware adaptation for mobility. In: Proceedings of the Sixteenth ACM Symposium on Operating Systems Principles (1997) 276–287
13. Kong, J., Schwan, K.: KStreams: kernel support for efficient data streaming in proxy servers. In: Proceedings of the 15th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV), ACM (2005) 159–164
14. Gribble, S.D., Welsh, M., von Behren, J.R., Brewer, E.A., Culler, D.E., Borisov, N., Czerwinski, S.E., Gummadi, R., Hill, J.R., Joseph, A.D., Katz, R.H., Mao, Z.M., Ross, S., Zhao, B.Y.: The Ninja architecture for robust Internet-scale systems and services. Computer Networks 35(4) (2001) 473–497


15. Byers, J.W., Considine, J., Mitzenmacher, M., Rost, S.: Informed content delivery across adaptive overlay networks. IEEE/ACM Transactions on Networking (TON) 12(5) (2004) 767–780
16. Li, B., Xu, D., Nahrstedt, K.: An integrated runtime QoS-aware middleware framework for distributed multimedia applications. Multimedia Systems 8(5) (2002) 420–430
17. Rodriguez, A., Killian, C., Bhat, S., Kostic, D., Vahdat, A.: Macedon: Methodology for automatically creating, evaluating, and designing overlay networks. In: Proceedings of the USENIX/ACM First Symposium on Networked Systems Design and Implementation (NSDI 2004), San Francisco, California (2004) 267–280
18. Gu, X., Nahrstedt, K., Yu, B.: SpiderNet: An integrated peer-to-peer service composition framework. In: Proceedings of IEEE International Symposium on High-Performance Distributed Computing (HPDC-13), Honolulu, Hawaii (2004) 110–119
19. Fu, X., Shi, W., Akkerman, A., Karamcheti, V.: CANS: composable and adaptive network services infrastructure. In: The 3rd USENIX Symposium on Internet Technology and Systems, San Francisco, California (2001)
20. Grace, P., Coulson, G., Blair, G., Mathy, L., Duce, D., Cooper, C., Yeung, W.K., Cai, W.: GridKit: Pluggable overlay networks for grid computing. In: Proceedings of International Symposium on Distributed Objects and Applications (DOA), Larnaca, Cyprus (2004) 1463–1481
21. Kumar, V., Cooper, B.F., Cai, Z., Eisenhauer, G., Schwan, K.: Resource-aware distributed stream management using dynamic overlays. In: Proceedings of the 25th International Conference on Distributed Computing Systems (ICDCS 2005), Columbus, OH, USA, IEEE Computer Society (2005) 783–792
22. Li, B., Guo, J., Wang, M.: iOverlay: A lightweight middleware infrastructure for overlay application implementations. In: Proceedings of the Fifth ACM/IFIP/USENIX International Middleware Conference, Toronto, Canada. Volume 3231 of Lecture Notes in Computer Science (2004) 135–154
23. Fox, A., Gribble, S.D., Chawathe, Y., Brewer, E.A.: Adapting to network and client variation using active proxies: Lessons and perspectives. IEEE Personal Communications (1998)
24. Zenel, B.: A general purpose proxy filtering mechanism applied to the mobile environment. Wireless Networks 5 (1999) 391–409
25. Roussopoulos, M., Maniatis, P., Swierk, E., Lai, K., Appenzeller, G., Baker, M.: Person-level routing in the mobile people architecture. In: Proceedings of the 1999 USENIX Symposium on Internet Technologies and Systems, Boulder, Colorado (1999)
26. McKinley, P.K., Tang, C., Mani, A.P.: A study of adaptive forward error correction for wireless collaborative computing. IEEE Transactions on Parallel and Distributed Systems (2002)
27. McKinley, P.K., Padmanabhan, U.I., Ancha, N., Sadjadi, S.M.: Composable proxy services to support collaboration on the mobile internet. IEEE Transactions on Computers (Special Issue on Wireless Internet) (2003) 713–726
28. Zhou, Z., McKinley, P.K., Sadjadi, S.M.: On quality-of-service and energy consumption tradeoffs in FEC-enabled audio streaming. In: Proceedings of the 12th IEEE International Workshop on Quality of Service (IWQoS 2004), Montreal, Canada (2004)
29. Ge, P.: Interactive Video Multicast in Wireless LANs. PhD thesis, Michigan State University, Department of Computer Science and Engineering (2004)
30. Samimi, F.A., McKinley, P.K., Sadjadi, S.M., Ge, P.: Kernel-middleware interaction to support adaptation in pervasive computing environments. In: Proceedings of the 2nd Workshop on Middleware for Pervasive and Ad-Hoc Computing, Toronto, Ontario, Canada, ACM Press (2004) 140–145
31. Schmidt, D.C.: Middleware for real-time and embedded systems. Communications of the ACM 45(6) (2002) 43–48

Capacity Efficient Shared Protection and Fast Restoration Scheme in Self-Configured Optical Networks*

Jacek Rak

Gdansk University of Technology, Narutowicza 11/12, 80-952 Gdansk, Poland

Abstract. At present, one can observe society’s increasing dependency on large-scale complex networked systems. The consequences of faults of network elements are magnified by the rapidly growing bandwidth of links and nodes. In this paper, a novel heuristic algorithm, SCPO, is proposed for establishing survivable connections in wide-area networks while optimizing the level of resource (link capacity) utilization. Unlike many popular optimization methods, it guarantees fast restoration of connections. The key idea is to keep backup paths as short as possible by performing the optimization after the connections are established. The proposed a posteriori optimization is based on the Largest-First graph coloring heuristic. The model is designed for a static traffic pattern and a preplanned survivability scheme. The algorithm was evaluated on the US Long-Distance Network and compared with earlier approaches to optimizing resource utilization. The results show that fast restoration can be achieved with only a slight degradation in capacity utilization. The observed reduction in restoration time is significant (up to 41%). The presented solutions are intended for WDM-based optical communication network architectures.
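The Largest-First heuristic named in the abstract is a greedy graph-coloring rule: visit vertices in order of non-increasing degree and give each the smallest color not already used by a neighbor. The sketch below shows only this generic heuristic. How SCPO builds the graph it colors is not described in this excerpt; one plausible reading, which is our assumption, is that vertices are backup paths, edges join backups that must not share capacity, and each color marks a group of backups that may share a channel.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as symmetric adjacency sets


def largest_first_coloring(graph: Graph) -> Dict[Hashable, int]:
    """Greedy Largest-First coloring; colors are 0, 1, 2, ..."""
    # Highest-degree vertices first; sorted() is stable, so ties keep input order.
    order = sorted(graph, key=lambda v: len(graph[v]), reverse=True)
    color: Dict[Hashable, int] = {}
    for v in order:
        used = {color[u] for u in graph[v] if u in color}
        c = 0
        while c in used:  # smallest color not taken by a colored neighbor
            c += 1
        color[v] = c
    return color


# Toy conflict graph: a triangle a-b-c plus a vertex d attached to a.
g: Graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
coloring = largest_first_coloring(g)
```

Like any greedy coloring, Largest-First does not guarantee the minimum number of colors, but coloring high-degree vertices first tends to keep the count low.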

1 Introduction

At present, one can observe the increasing dependency of society on large-scale, complex networked systems. This magnifies the consequences of failures and increases the importance of assuring network survivability. The Software Engineering Institute defines survivability as the capability of a system to fulfill its mission in a timely manner in the presence of attacks, failures or accidents [4, 11]. The term mission refers to very high-level goals or requirements, including timeliness. In the past, mostly failures (caused by system deficiencies such as software errors or hardware faults) and accidents (all potentially damaging events, such as natural disasters) were considered, and they were all assumed to be independent and random. Now, however, many failures are the result of an attack directed at an important network element.

This paper investigates the survivability of end-to-end connections in optical wide-area networks, where, due to wavelength division multiplexing, each network link consists of a set of channels (wavelengths), each capable of independently transmitting data at a peak electronic speed of a few Gbps. Survivability of connections is achieved by introducing redundancy and applying appropriate network self-reconfiguration procedures. This means that for the main path of a connection, referred to as the active or working path, there are additional (redundant) paths, called backup paths (or simply backups), used to protect the connection in a given failure scenario [7, 8, 12, 15, 16]. The most common case is protection against a single network element failure (either a link or a node). In a failure state, if the active path of a connection uses the capacity of a failed network element, the flow of data must be redirected onto the backup path, as shown in Fig. 1.

* This work was partially supported by the Ministry of Education and Science, Poland, under the grant 3 T11D 001 30.

A. Keller and J.-P. Martin-Flatin (Eds.): SelfMan 2006, LNCS 3996, pp. 142 – 156, 2006. © Springer-Verlag Berlin Heidelberg 2006

[Fig. 1: example network of nodes 1 to 9 showing the active path and the backup path of a connection]

Fig. 1. An example of a survivable connection. For a connection established between nodes 1 and 8, there is an active path (1,4,6,8). To provide survivability (here against a single link failure), a backup path (1,2,5,8), link-disjoint¹ with its active path, is used. The active path transmits the data in the normal operating state. However, if a failure affects a network element used by the active path (e.g. link (4,6) fails), the connection reconfiguration procedure is executed (the backup path is activated) and the flow of data is switched onto the backup path. To fulfill the initial capacity requirement, the backup path must have the same capacity as the protected active path.

Protection and restoration are the two main issues of fault management in survivable networks [12, 15, 16]. In the protection (preplanned) scheme, backup paths are installed in advance, when service is requested. This guarantees the ability to restore, but uses much more spare capacity. In contrast, the restoration approach does not provide full survivability, since it relies on attempts, made after a failure, to dynamically establish a backup path for each broken connection. Regarding the scope of protection/restoration, we mainly distinguish path and link protection/restoration [8, 15]. In the former scheme, a single backup path protects the whole active path, while in the latter, if protection against a single link failure² is to be assured, each link of an active path is protected by its own backup path. In region protection/restoration, proposed in [13], each backup path protects a certain region of an active path.

In order to detect a network element failure, analog signal parameters such as the signal-to-noise ratio or signal power are monitored at each link channel. An alarm is triggered when the value of at least one monitored parameter goes beyond a predefined level [19]. Each optical link has a control channel [10] used for administrative purposes; in particular, this channel is used to restore connections after a failure. Figs. 2 and 3 show example processes of backup path activation [16, 17] after a single link failure for the path and link protection schemes, respectively.

¹ The term link-disjoint refers to a backup path that has no links in common with its active path; such a backup protects the connection against a single link failure. A backup path that is node-disjoint with its active path has no nodes in common with it (except the source and destination nodes of the backup path) and protects the connection against a single node failure.
² If protection against a single node failure is to be assured, then each backup path must protect two neighboring links of an active path.

[Fig. 2: sequence of diagrams of the connection between source s and destination d]
(1) A LINK-FAIL message is sent by the end nodes of the failed link to the connection-source node s and the connection-destination node d.
(2) A SETUP message is sent by the connection-source node s along the backup path.
(3) A CONFIRM message is sent by the connection-destination node d along the backup path.
(4) The backup path activation process is complete.
Fig. 2. Backup path activation procedure for path protection scheme

[Fig. 3: sequence of diagrams of the connection between source s and destination d]
(1) A SETUP message is sent by the link-source node along the backup path.
(2) A CONFIRM message is sent by the link-destination node along the backup path.
(3) The backup path activation process is complete.
Fig. 3. Backup path activation procedure for link protection scheme
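The handshake in Figs. 2 and 3 explains why backup length drives restoration time: each message crosses the whole backup, and every node it reaches adds processing delay (and, for SETUP, configuration delay). The following is a minimal Python sketch of that timing under a simplified model. The delay constants and function names are hypothetical and are not the values used in the paper's simulator; only the roughly 5 µs/km propagation delay in fibre is a physical figure.

```python
# Simplified restoration-time model for the three-way handshake of Figs. 2-3.
# All delay constants are illustrative assumptions, not values from the paper.
PROPAGATION_MS_PER_KM = 0.005  # about 5 us/km: light travels at roughly 2e8 m/s in fibre
DETECT_MS = 0.01               # hypothetical failure-detection time
PROCESS_MS = 0.01              # hypothetical message-processing time per receiving node
CONFIGURE_MS = 1.0             # hypothetical cross-connect configuration time per node


def send_ms(hops_km):
    """Time for one message to cross links of the given lengths (km),
    being processed at every node that receives it."""
    return sum(km * PROPAGATION_MS_PER_KM + PROCESS_MS for km in hops_km)


def activate_ms(backup_km):
    """SETUP along the backup (configuring each node it reaches), then CONFIRM back."""
    setup = send_ms(backup_km) + CONFIGURE_MS * len(backup_km)
    confirm = send_ms(list(reversed(backup_km)))
    return setup + confirm


def path_protection_ms(notify_km, backup_km):
    """Fig. 2: LINK-FAIL from the failed link's end node to source s, then SETUP/CONFIRM
    over the end-to-end backup. Notifying d runs in parallel and is not on the critical path."""
    return DETECT_MS + send_ms(notify_km) + activate_ms(backup_km)


def link_protection_ms(backup_km):
    """Fig. 3: the link-source node starts SETUP at once; no LINK-FAIL is needed."""
    return DETECT_MS + activate_ms(backup_km)


if __name__ == "__main__":
    print(round(link_protection_ms([100, 100]), 3))           # short detour around one link
    print(round(path_protection_ms([300], [400, 500, 600]), 3))  # end-to-end backup
```

In this model every extra backup link adds its propagation delay twice (SETUP and CONFIRM) plus processing and configuration time, which is the effect the following paragraphs attribute to long backups.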

The network self-reconfiguration procedure is, however, different for the restoration approach. For instance, in link restoration, the end nodes of the failed link try to dynamically discover a route around that link for each broken connection [16]. In path restoration, the end nodes of the connection try to find an end-to-end backup.

Assuring survivability of connections by establishing backup paths results in an excessive level of link capacity utilization. This redundancy, together with the limited capacities of links, limits the number of connections that may be established. A typical optimization technique that reduces the link capacity utilization ratio, here referred to as the a priori approach, is based on sharing backup path capacities [6, 8, 15]. However, it often finds backups that are not the shortest ones, since, when backup path capacities are shared, backup paths may consist of many zero-cost links. Long backups make restoration time-consuming, since, due to the restoration protocol, the longer the backup path, the more time it takes to restore a connection [16, 17]. Apart from violating the required guarantee on restoration time, a long backup path may also degrade optical signal quality: due to transmission and switching impairments, the bit error rate or the signal-to-noise ratio at the destination node may become unacceptable. Thus a long backup path, even if installed in advance, can unexpectedly fail to restore a connection.

2 Related Work

Although the issue of assuring fast service restoration while optimizing link capacity utilization has been investigated by several research groups, in general it has not yet been extensively studied. In [14], an ILP model achieving fast connection restoration by minimizing the length of backup paths is proposed. It adds a parameter μ to each backup link cost, so that the link cost reflects both the link length and the level of unused link capacity that the backup may use. Another interesting approach, based on the p-cycle concept, eliminates the time-consuming task of reconfiguring backup path transit nodes during connection restoration by ensuring that sharing backup path resources does not create so-called branch points [2]. In [1] and [18], the connection restoration time is reduced by using a two-way messaging protocol instead of the typical three-way handshake restoration routine. No CONFIRM message, normally needed to finish the restoration process, is sent, which considerably reduces the overall restoration time. The data are switched onto the backup path just after the SETUP message has been sent along the backup path, followed by a time interval ε.

In this paper, a novel heuristic algorithm, SCPO, is proposed for establishing survivable connections in wide-area networks while optimizing the level of resource (link capacity) utilization. Unlike many popular optimization methods, it guarantees fast restoration of connections. The key idea is to keep backup paths the shortest possible by performing the optimization after the connections have been established. The proposed a posteriori optimization is based on the Largest-First graph coloring heuristic. An important feature is that it can be combined with many algorithms for establishing connections to improve (i.e. lower) the network resource utilization ratio without increasing connection restoration time. In this sense, the only competing approach is the a priori optimization [6, 8]. To the best of the author's knowledge, this is the first paper to propose an algorithm that optimizes link capacity utilization in survivable networks without increasing the length of backup paths. The results show that fast restoration can be achieved at the cost of only a slight degradation in capacity utilization. Our algorithm may be applied to all protection scopes and all types of failures; here, however, only protection against a single link failure is considered. To simplify the comparisons, the standard distance metric and Dijkstra's shortest path algorithm are used throughout the paper to find both active and backup paths. However, our optimization method can easily be applied with other metrics and path-finding algorithms. The traffic is assumed to be static, meaning that the set of connection requests is given in advance.

The rest of the paper is organized as follows. Section 3 presents the principles of optimizing link capacity utilization when assuring survivability; the author's SCPO algorithm with the novel a posteriori optimization of resource utilization is given in detail and then compared with the common a priori technique. Section 4 describes the modeling assumptions. The results, obtained for the US Long-Distance Network, are presented in Section 5 and include the lengths of backup paths, the level of link capacity utilization and the values of connection restoration time.

3 Optimization of Link Capacity Utilization

Typical methods optimizing the ratio of link capacity utilization are generally based on the concept of sharing the link capacities that are reserved for backup paths. If, for each connection, provisioning 100% of the requested capacity after a failure is required, then the capacity of a link channel reserved for backup purposes may be shared among different backups if the respective protected parts of active paths are mutually disjoint³ [6, 8]. Indeed, if the protected parts of active paths are mutually disjoint, then after a failure of any single network element there is no danger of parallel activation of more than one backup path from the set of backups that share the same unit of bandwidth (i.e. a particular channel of a link). In other words, backup capacities may be shared if they protect against different failure scenarios (which are not expected to occur simultaneously) [17]. Considering the strength of optimization, there are three main variants, described in [8] and illustrated in Fig. 4:
• intra-demand sharing – sharing the link capacities of backup paths that protect disjoint parts of the active path of the same connection,
• inter-demand sharing – sharing the link capacities of backup paths that protect disjoint parts of active paths of different connections,
• parallel intra- and inter-demand sharing – the most capacity-efficient kind of optimization, combining the features of both intra- and inter-demand sharing.

A typical optimization technique, presented in [6, 8] and here referred to as the a priori approach, is performed before finding a given backup path. It is applied when calculating the cost ξij of each link lij. If the backup path for the kth connection of capacity r(k) is to be found, then the cost ξij of each link lij is determined as:

$$
\xi_{ij} =
\begin{cases}
0 & \text{if } r^{(k)} \le m_{ij}^{(k)} \\
\left(r^{(k)} - m_{ij}^{(k)}\right) \cdot s_{ij} & \text{if } r^{(k)} > m_{ij}^{(k)} \text{ and } f_{ij} \ge r^{(k)} - m_{ij}^{(k)} \\
\infty & \text{otherwise}
\end{cases}
\qquad (1)
$$

where:
r(k) – the requested capacity,
mij(k) – the capacity reserved so far at a link lij (for the backups of the already established connections) that may be shared,
fij – the unused capacity of a link lij,
sij – the length of a link lij.

³ Depending on the kind of protection (either against a single link or a single node failure), these parts of active paths must be mutually link- or node-disjoint, respectively.

If, for a link lij, the requested capacity r(k) is less than or equal to the capacity already reserved for backups that may be shared, then the link cost ξij is set to 0. If the demanded capacity r(k) is greater than the capacity mij(k) that may be shared, but there is enough unused capacity fij, then the link cost ξij is determined by the extra capacity that must be reserved at lij and often (as in Eq. 1) also reflects the length sij of this link. Otherwise, it is set to infinity. After the costs of all network links have been calculated, the backup path is found as the cheapest one (with respect to the aggregated sum of costs of the used links) using any path-finding algorithm (e.g. Dijkstra's [3]).
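Equation (1) translates directly into a small cost function. The Python sketch below is illustrative (the paper gives no code): argument names mirror the symbols of Eq. (1), and `length_cost` is a hypothetical helper for the plain length-or-infinity rule used when searching for active paths.

```python
import math


def a_priori_cost(r, m, f, s):
    """Cost xi_ij of link l_ij for the backup of demand k, following Eq. (1).

    r -- requested capacity r(k)
    m -- capacity already reserved at l_ij for backups that k may share, m_ij(k)
    f -- unused capacity of l_ij, f_ij
    s -- length of l_ij, s_ij
    """
    if r <= m:          # enough shareable backup capacity: the link costs nothing
        return 0.0
    if f >= r - m:      # reserve only the missing capacity, weighted by the link length
        return (r - m) * s
    return math.inf     # the link cannot carry this backup


def length_cost(r, f, s):
    """Plain cost used for active paths: the link length, or infinity if capacity is short."""
    return s if f >= r else math.inf


print(a_priori_cost(r=1, m=1, f=0, s=2000))  # a 2000 km link with shareable capacity
print(a_priori_cost(r=1, m=0, f=5, s=300))   # a 300 km link needing a new reservation
```

The first call shows the effect discussed below: a very long link whose backup capacity can be shared costs 0, so the path-finding algorithm is free to route a backup over it.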

[Fig. 4: example network of nodes 1 to 10 with active paths and backup paths]

Fig. 4. An example of various strengths of link capacity utilization optimization. Two connections are established, between node pairs [1,9] and [1,5]. The network is protected against a single link failure; the link protection scheme is assumed. Connection [1,9] has the active path (1,3,7,9); each link of this path is protected by a dedicated backup ((1,4,3), (3,4,6,7) and (7,6,4,9), respectively). Connection [1,5] has the active path (1,2,5) and two backups: (1,4,2) and (2,4,6,5). Intra-demand sharing may be applied to link (3,4) for backups (1,4,3) and (3,4,6,7), as they protect link-disjoint parts of the same active path of connection [1,9]. Inter-demand sharing may be used at link (1,4) for backups (1,4,3) and (1,4,2), as they protect link-disjoint parts of active paths of different connections. Both intra- and inter-demand sharing may be applied to link (4,6), which is traversed by backups (3,4,6,7) and (7,6,4,9) of connection [1,9] and by backup (2,4,6,5) of connection [1,5].
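The sharing rule and the three sharing variants can be checked mechanically on the Fig. 4 example. A minimal Python sketch follows; the data come from the Fig. 4 caption, while the function names are hypothetical. A pair of backups on a link may share a channel when the active-path parts they protect are link-disjoint; the pair counts as intra-demand sharing if both backups belong to the same connection, and as inter-demand sharing otherwise.

```python
from itertools import combinations


def links(path):
    """Undirected links of a node sequence, e.g. (1, 4, 3) -> {{1,4}, {4,3}}."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


# Fig. 4, link protection: backup path -> (connection, active-path link it protects)
FIG4_BACKUPS = {
    (1, 4, 3): ("[1,9]", (1, 3)),
    (3, 4, 6, 7): ("[1,9]", (3, 7)),
    (7, 6, 4, 9): ("[1,9]", (7, 9)),
    (1, 4, 2): ("[1,5]", (1, 2)),
    (2, 4, 6, 5): ("[1,5]", (2, 5)),
}


def may_share(protected_a, protected_b):
    """Backups may share a channel if the parts of active paths they protect are link-disjoint."""
    return links(protected_a).isdisjoint(links(protected_b))


def sharing_kinds(link, backups=FIG4_BACKUPS):
    """Kinds of sharing ('intra', 'inter') possible among backups traversing `link`."""
    on_link = [b for b in backups if frozenset(link) in links(b)]
    kinds = set()
    for a, b in combinations(on_link, 2):
        (conn_a, prot_a), (conn_b, prot_b) = backups[a], backups[b]
        if may_share(prot_a, prot_b):
            kinds.add("intra" if conn_a == conn_b else "inter")
    return kinds


for link in [(3, 4), (1, 4), (4, 6)]:
    print(link, sorted(sharing_kinds(link)))
```

The loop reports intra-demand sharing at (3,4), inter-demand sharing at (1,4) and both kinds at (4,6), in agreement with the caption.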

The well-known heuristic algorithm for establishing Survivable Connections with the a PRiori optimization (here referred to as SCPR) is given in Fig. 5. Since, under the a priori optimization, the obtained costs ξij of backup path links (Eq. 1) frequently do not reflect the real lengths of links well, the established backups are often not the shortest ones. Any link, even a very long one, may be chosen for a backup path as long as its capacity may be shared.

Input: A set K of demands to establish survivable connections, each of capacity r(k)

Establish survivable connections by sequentially processing the demands from K as follows. For each demand k:
1 Find and install its active path by performing the following:
  a) calculate the matrix Ξ of link costs ξij (for each link lij, set its cost ξij to the link length sij if the amount of unused capacity fij is not less than r(k); otherwise set ξij to infinity);
  b) find the path, using any algorithm of path finding (e.g. Dijkstra's [3]);
  c) if finding the active path for a demand is infeasible due to the lack of resources, then reject the demand, else install the active path.
2 Find (applying in advance the a priori optimization of link capacity utilization) and install its backup path (or the set of backups*) by performing for each backup the following:
  a) calculate the matrix Ξ of costs ξij of using links lij, including optimization (allow sharing), by calculating the link costs according to Equation 1. Any type of backup path sharing may be applied. In order to assure that the backup path is link-disjoint with the respective active path, set the costs ξij of links used by the active path to infinity;
  b) find the path, using any algorithm of path finding (e.g. Dijkstra's [3]);
  c) if finding any backup path for a given demand is infeasible due to the lack of resources, then reject the demand and delete all the previously installed paths of the demand, else install the backup path.

* depending on the scope of protection. Path or link protection scheme is used most commonly.

Fig. 5. SCPR algorithm
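To see how SCPR can lengthen backups, here is a compact Python sketch of Fig. 5 for the path protection scheme with inter-demand sharing and r(k) equal to one channel (the demand size assumed in Section 4). The toy network, its node names and all function names are hypothetical; Dijkstra's algorithm is the standard heap-based version. On this network the second demand's backup takes the 600 km route c-x-y-d, because link (x,y) already carries a shareable backup channel and so costs 0 under Eq. (1), although a 500 km route c-z-w-d exists.

```python
import heapq
import math

# Hypothetical toy network (link lengths in km), not taken from the paper.
EDGES = {("a", "b"): 100, ("c", "d"): 100, ("a", "x"): 200, ("x", "y"): 200,
         ("y", "b"): 200, ("c", "x"): 200, ("y", "d"): 200, ("c", "z"): 200,
         ("z", "w"): 100, ("w", "d"): 200}
CHANNELS = 32  # channels per link, as in Section 4
LENGTH = {frozenset(e): length for e, length in EDGES.items()}
ADJ = {}
for u, v in EDGES:
    ADJ.setdefault(u, []).append(v)
    ADJ.setdefault(v, []).append(u)


def hops(path):
    """Undirected links of a node sequence."""
    return [frozenset(h) for h in zip(path, path[1:])]


def km(path):
    return sum(LENGTH[l] for l in hops(path))


def dijkstra(src, dst, cost):
    """Cheapest src->dst path; cost(link) may be math.inf to forbid a link."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v in ADJ[u]:
            nd = d + cost(frozenset((u, v)))
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def scpr(demands):
    """SCPR, path protection, r(k) = 1 channel, inter-demand sharing via Eq. (1)."""
    active_load = {l: 0 for l in LENGTH}
    backup_ch = {l: [] for l in LENGTH}  # per link: channels, each a list of protected link sets

    def unused(l):  # f_ij, in channels
        return CHANNELS - active_load[l] - len(backup_ch[l])

    def shareable(l, prot):  # channels whose backups all protect parts disjoint from prot
        return [ch for ch in backup_ch[l] if all(p.isdisjoint(prot) for p in ch)]

    routes = []
    for s, d in demands:
        active = dijkstra(s, d, lambda l: LENGTH[l] if unused(l) >= 1 else math.inf)
        if active is None:
            routes.append(None)
            continue
        prot = frozenset(hops(active))
        for l in prot:
            active_load[l] += 1

        def cost(l):  # Eq. (1) with r = 1; links of the active path are forbidden
            if l in prot:
                return math.inf
            if shareable(l, prot):  # m_ij(k) >= r: sharing is free
                return 0.0
            return LENGTH[l] if unused(l) >= 1 else math.inf

        backup = dijkstra(s, d, cost)
        if backup is None:  # reject the demand, releasing its active path
            for l in prot:
                active_load[l] -= 1
            routes.append(None)
            continue
        for l in hops(backup):
            chans = shareable(l, prot)
            if chans:
                chans[0].append(prot)  # share an existing backup channel
            else:
                backup_ch[l].append([prot])  # reserve a new channel
        routes.append((active, backup))
    return routes, backup_ch


def shortest_backup(s, d, active):
    """Length-only backup (as in SCPO Step 1), link-disjoint with `active`."""
    banned = set(hops(active))
    return dijkstra(s, d, lambda l: math.inf if l in banned else LENGTH[l])


routes, backup_ch = scpr([("a", "b"), ("c", "d")])
active, backup = routes[1]
alt = shortest_backup("c", "d", active)
print("SCPR backup:", backup, km(backup), "km")
print("length-only backup:", alt, km(alt), "km")
print("backup channels reserved by SCPR:", sum(len(c) for c in backup_ch.values()))
```

Here SCPR reserves 5 backup channels with a 600 km second backup, whereas the length-only backup is 100 km shorter but, having no link in common with the first backup, would need its own 3 channels (6 in total): a small instance of the capacity/length trade-off discussed in this section.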

In summary, the main disadvantage of the a priori optimization is the non-optimal length of the established backup paths. Nowadays, an optimization that is capacity-effective but, due to increased backup path lengths, makes the restoration process take a long time is often not acceptable. Network operators are interested in fast restoration, even at the price of a worse link capacity utilization ratio.

3.1 Details of the Proposed SCPO Algorithm

The author's heuristic algorithm for establishing Survivable Connections with the a POsteriori optimization of link capacity utilization (SCPO), given in Fig. 6, reduces the level of link capacity utilization without increasing the lengths of backup paths. Short backups, in turn, provide fast restoration of connections. Active and backup paths may be found, for instance, with Dijkstra's shortest path algorithm [3], with link costs equal to their lengths or to infinity. The optimization is applied after attempts have been made to find and install the active and backup paths of all the network connections. That is why the backups, originally established according to the path-finding algorithm used, remain unchanged after the a posteriori optimization is applied. However, since the a posteriori optimization is performed after paths have been searched for all the connections, its scope is confined to a single link. To get the full advantage of the a posteriori optimization, full wavelength conversion capability is assumed at each network node, so that backups can be reassigned to different channels on each link independently.

Input: A set K of demands to establish survivable connections, each of capacity r(k)

1 Establish connections by sequentially processing the demands from K as follows. For each demand k:
  1.1 Find* and install its active path
  1.2 Find* and install its backup path(s)
2 After having processed all the demands from K, apply the a posteriori optimization of backup paths' link capacities, as follows. For each network link lij:
  2.1 Divide the set Bij of backup paths installed on channels of link lij into subsets Bijs such that:
      – each subset contains backups that may share capacity with one another**,
      – the number of subsets Bijs for link lij is minimized.
  2.2 For each subset Bijs:
      2.2.1 delete the link channel allocations made in Step 1.2 for the backups of Bijs;
      2.2.2 apply sharing by allocating one common channel for all the backups of Bijs (allow sharing with regard to all the backups of the subset Bijs).

* Perform the following:
  a) calculate the matrix Ξ of link costs (for each link lij, set its cost ξij to the link length sij if the amount of unused capacity fij is not less than r(k); otherwise set ξij to infinity). If a backup path is to be found, then, in order to assure that it is link-disjoint with the respective active path, set the costs ξij of links used by the active path to infinity;
  b) find the path, using any algorithm of path finding (e.g. Dijkstra's [3]);
  c) if finding any path of a demand is infeasible due to the lack of resources, then reject the demand and delete all the previously installed paths of the demand, else install the found path.
** Any type of sharing is allowed. Inter- or intra-demand sharing is used most commonly.

Fig. 6. SCPO algorithm
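Step 2 of Fig. 6 changes only channel allocations, never routes, which is why it cannot lengthen a backup. Below is a minimal Python sketch of that step, assuming each backup on a link is described by the set of active-path links it protects, so that two backups may share a channel exactly when these sets are disjoint. The function names, link labels such as "3-7" and the HYPOTHETICAL input are illustrative; Step 2.1 uses the Largest-First ordering described in Section 3.2, with colors numbered from 0.

```python
def lf_groups(parts):
    """Step 2.1 with Largest-First coloring.

    parts maps a backup id to the set of active-path links it protects; two backups
    conflict (cannot share a channel) when these sets intersect. Returns the subsets
    B_ij^s as lists of backup ids, one list per channel.
    """
    ids = list(parts)
    conflicts = {a: {b for b in ids if b != a and parts[a] & parts[b]} for a in ids}
    order = sorted(ids, key=lambda a: len(conflicts[a]), reverse=True)  # largest degree first
    color = {}
    for a in order:
        taken = {color[b] for b in conflicts[a] if b in color}
        color[a] = min(c for c in range(len(ids)) if c not in taken)  # lowest free color
    groups = {}
    for a in order:
        groups.setdefault(color[a], []).append(a)
    return [groups[c] for c in sorted(groups)]


def a_posteriori(backups_per_link):
    """Step 2: for every link, channels used before (one per backup) and after regrouping.

    Only channel allocations change; the backup routes found in Step 1 are untouched.
    """
    return {link: (len(parts), len(lf_groups(parts)))
            for link, parts in backups_per_link.items()}


# Backups crossing link (4,6) in Fig. 4 (link protection): each protects one active-path link.
FIG4_L46 = {"(3,4,6,7)": {"3-7"}, "(7,6,4,9)": {"7-9"}, "(2,4,6,5)": {"2-5"}}
# A hypothetical link whose backups p..s partly conflict.
HYPOTHETICAL = {"p": {"1-2", "2-3"}, "q": {"2-3", "3-4"}, "r": {"5-6"}, "s": {"3-4"}}

print(a_posteriori({"(4,6)": FIG4_L46, "hypothetical": HYPOTHETICAL}))
print(lf_groups(HYPOTHETICAL))
```

For link (4,6) of Fig. 4 the three backups fall into one subset (3 channels reduced to 1); on the hypothetical link, q and r share one channel and p and s the other.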

The problem of optimally dividing the set Bij of backup paths of each link lij into the subsets Bijs (Step 2.1 of the SCPO algorithm) is NP-hard [9], as it is equivalent to the vertex coloring problem of an induced graph of conflicts Gij. In such a graph:
• the vertices correspond to parts of active paths whose backups run through link lij,
• there is an edge between a given pair of vertices p and r of Gij if and only if the respective parts p and r of active paths are not disjoint (i.e. a conflict appears).

For reasons of computational complexity, the Largest-First (LF) heuristic algorithm [9], described in the next subsection, was used in the experiments to color the vertices of each graph Gij. This algorithm has polynomial computational complexity of O(k), where k denotes the number of demands. After applying the LF algorithm to a given link lij, all vertices that obtain the same color represent mutually disjoint parts of active paths whose backups, traversing link lij, belong to one particular subset Bijs and thus may share a common channel at lij. The total number of colors assigned to vertices of Gij determines the number of channels of link lij that will be allocated to backup paths after applying Step 2.2 of the SCPO algorithm. A comparison of the main features of the a priori and the a posteriori optimization techniques is given in Table 1.

Table 1. Comparison of the main features of the a priori and the a posteriori optimizations

| feature | a priori optimization | a posteriori optimization |
|---|---|---|
| optimization scope | all the network links | each single link |
| does the optimization increase the length of backups? | yes | no |
| time of applying the optimization | when establishing each connection, at defining the cost matrix Ξ | after establishing all the connections |

3.2 Graph Coloring Algorithm Used in the A Posteriori Optimization

The following graph coloring routine is used to perform the sharing of backup path resources. For each link lij, we construct a graph of conflicts Gij with vertices representing the backups that are installed at lij. Let us define the following:
Vij – the set of vertices of Gij, |Vij| = nij;
Eij – the set of edges of Gij, |Eij| = mij;
deg(va) – the degree of vertex va in Gij, i.e. the number of vertices adjacent to va (meaning the number of backup paths traversing link lij that protect parts of active paths that are not disjoint with the part of the active path protected by backup a).

There is an edge eab between two vertices va and vb in Gij if and only if the respective backups a and b protect parts of active paths that are not disjoint with each other (i.e. a conflict appears). At each link lij, channels must be assigned to backups so that any two backups protecting non-disjoint parts of active paths are assigned different channels and the number of channels assigned to backups at lij is minimized. This channel assignment problem may be reduced to the vertex coloring problem of Gij, where neighboring vertices must be assigned different colors. Since the vertex coloring problem is NP-hard, many sequential graph coloring routines have been introduced [9]. Generally, in such routines, vertices are sequentially added to the already colored subgraph of Gij and assigned colors, which are represented by integers. At each step, the newly added vertex is assigned the lowest possible color. Here, the Largest-First (LF) heuristic is used to color the vertices of Gij: vertices are first ordered by descending degree in Gij and then sequentially assigned colors according to their position in this ordering (i.e. the vertex of the highest degree is colored first, and so on). Intuitively, if there are a few high-degree vertices in Gij, coloring them first will reduce the total number of assigned colors.

Fig. 7 shows an example network with five survivable connections. The idea of the LF algorithm is explained for link l46. The corresponding graph of conflicts G46 is given in Fig. 8, and the results of the LF color assignment are presented in Table 2. They show that two colors were sufficient for link l46, so two channels will be allocated for backup purposes (instead of the previous five). One channel will be reserved for the backups of connections kb and kc, while the backups of connections ka, kd and ke will use the other common channel of link l46.


[Fig. 7: example network of nodes 1 to 10 with the active paths and backup paths of connections a to e]

Fig. 7. Example survivable connections established in an artificial network. The network is protected against a single link failure; the path protection scheme is assumed. Five connections, ka, kb, kc, kd and ke, established between node pairs [3,5], [1,9], [2,8], [7,9] and [3,7], respectively, are realized by the active paths (3,1,2,5), (1,3,7,9), (2,5,8), (7,9) and (3,7). Each active path has a corresponding backup: (3,4,6,5), (1,4,6,10,9), (2,4,6,10,8), (7,6,4,9) and (3,4,6,7), respectively.

[Fig. 8: graph of conflicts G46 with vertices a, b, c, d and e]

Fig. 8. Example graph of conflicts G46 for backups traversing link l46. Backups a, b, c, d and e, represented by the vertices of G46, protect the active paths of connections ka, kb, kc, kd and ke, respectively. The edges eab, ebd, ebe and eac denote that the respective active paths are not link-disjoint with each other.

Table 2. Example vertex coloring of G46 obtained by LF algorithm

| descending ordering of G46 vertices | b | a | c | d | e |
|---|---|---|---|---|---|
| degrees of G46 vertices | 3 | 2 | 1 | 1 | 1 |
| color assignment (according to LF algorithm) | 1 | 2 | 1 | 2 | 2 |
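Table 2 can be reproduced from the paths listed in the Fig. 7 caption: build G46 by testing which active paths share a link, then apply Largest-First. Below is a minimal Python sketch with hypothetical function names. Colors are numbered from 1 as in Table 2, and vertices of equal degree keep the listing order a, b, c, d, e, which here gives the same ordering as the table.

```python
def links(path):
    """Undirected links of a node sequence."""
    return {frozenset(hop) for hop in zip(path, path[1:])}


# Fig. 7: active and backup paths of connections k_a..k_e (path protection).
ACTIVE = {"a": (3, 1, 2, 5), "b": (1, 3, 7, 9), "c": (2, 5, 8), "d": (7, 9), "e": (3, 7)}
BACKUP = {"a": (3, 4, 6, 5), "b": (1, 4, 6, 10, 9), "c": (2, 4, 6, 10, 8),
          "d": (7, 6, 4, 9), "e": (3, 4, 6, 7)}


def conflict_graph(link):
    """G_ij: vertices are backups crossing `link`; edges join backups whose
    active paths are not link-disjoint."""
    on_link = [k for k in BACKUP if frozenset(link) in links(BACKUP[k])]
    return {k: {j for j in on_link if j != k and links(ACTIVE[k]) & links(ACTIVE[j])}
            for k in on_link}


def largest_first(graph):
    """Order vertices by descending degree, then give each the lowest color
    not used by its already-colored neighbors."""
    order = sorted(graph, key=lambda v: len(graph[v]), reverse=True)
    color = {}
    for v in order:
        taken = {color[u] for u in graph[v] if u in color}
        color[v] = min(c for c in range(1, len(graph) + 1) if c not in taken)
    return order, color


g46 = conflict_graph((4, 6))
order, color = largest_first(g46)
for v in order:
    print(v, len(g46[v]), color[v])  # vertex, degree, color (rows of Table 2)
print("channels needed on l46:", max(color.values()))
```

The graph it builds has exactly the edges eab, eac, ebd and ebe listed under Fig. 8, and the coloring needs two channels on l46 instead of five.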

4 Modeling Assumptions

The experiment was performed for the U.S. Long-Distance Network, presented in Fig. 9, consisting of 28 nodes and 45 bidirectional links. Due to the size of the network, only computations using heuristic algorithms (SCPR and SCPO) were feasible⁴. Simulations, performed with the author's network simulator, measured the link capacity utilization ratio and the values of connection restoration time. Connection restoration time was measured according to [12, 16, 17] and comprised the time to detect a failure, link propagation delay, the time to configure backup path transit nodes and the time of message processing at network nodes (including queuing delay). All links were assumed to have 32 channels. The channel capacity unit was assumed to be equal for all network links. Network nodes were assumed to have full channel (wavelength) conversion capability.

⁴ Integer Linear Programming formulations of the NP-complete problem of finding optimal active and backup paths for the path protection scheme can be found in [12, 13].

[Fig. 9: topology of the U.S. Long-Distance Network, nodes numbered 1 to 28]

Fig. 9. U.S. Long-Distance Network

For each connection, the following properties were assumed:
• protection against a single link failure (each failure state consisted of a failure of one link at a time, the rest being fully operational),
• a demand of resource allocation equal to the capacity of one channel,
• provisioning 100% of the requested bandwidth after a failure (preplanned scheme),
• a demand to assure unsplittable flows (both for active and backup paths),
• the distance metric and Dijkstra's shortest path algorithm in all path computations,
• the three-way handshake protocol of restoring the connections (the exchange of LINK-FAIL, SETUP and CONFIRM messages), described in [16, 17].

During a single experiment, described in Fig. 10, one particular variant of optimization strength and one type of protection scope were applied to all the connections.

Repeat 30 times the following steps:
1 Randomly choose 30 pairs of source s and destination d nodes.
2 Try to establish the connections by applying the SCPR or SCPO algorithm*.
3 Store the level of link capacity utilization and the lengths of the backups.
4 30 times, simulate random failures of single links. For each failure state, restore the connections that were broken and note the values of connection restoration time.

* Any type of sharing as well as any scope of protection is allowed here.

Fig. 10. Research plan

5 Modeling Results

5.1 Backup Path Length

Fig. 11 shows the average lengths of backup paths for all variants of optimization strength and for various scopes of protection. The results show that with our a posteriori optimization the average length of backup paths is not increased and remains at the same level as when no optimization is performed; fast restoration is thus possible. They also show that after applying the common a priori optimization, the length of backup paths is often far from optimal. Moreover, the stronger the a priori optimization, the longer the backup paths. In the extreme case, when the link protection scheme and the parallel intra- and inter-demand optimization were used, the average backup path length for our a posteriori optimization (2219,5 km) was about 39% shorter than for the a priori approach (3661,0 km).

[Fig. 11: two panels, "Path protection scheme" and "Link protection scheme"; x-axis: optimization strength (no optimization, intra-demand, inter-demand, parallel intra- and inter-demand); y-axis: backup path length [km]; bars: a priori optimization vs. a posteriori optimization]

Fig. 11. Average length of backup path as a function of optimization strength

5.2 Level of Link Capacity Utilization

Fig. 12 shows the average values of the link capacity utilization ratio for all variants of optimization strength and for various scopes of protection. It shows that our a posteriori optimization considerably reduces the level of link capacity utilization (by up to 38% for the link protection scheme and parallel intra- and inter-demand sharing, compared to the "no optimization" case). The a priori optimization technique is, however, always more capacity-effective, but for larger scopes of protection (e.g. the path protection scheme) its advantage over our a posteriori technique practically vanishes, as shown in the left part of Fig. 12.

[Fig. 12: two panels, "Path protection scheme" and "Link protection scheme"; x-axis: optimization strength (no optimization, intra-demand, inter-demand, parallel intra- and inter-demand); y-axis: link capacity utilization [%]; bars: a priori optimization vs. a posteriori optimization]

Fig. 12. Average level of link capacity utilization as a function of optimization strength

Table 3. Confidence intervals of 95% for the mean values of link capacity utilization [%]

| protection scope | optimization | intra-demand sharing | inter-demand sharing | parallel intra- and inter-demand sharing |
|---|---|---|---|---|
| path protection | a priori optimization | 0,84 | 0,60 | 0,60 |
| path protection | a posteriori optimization | 0,84 | 0,69 | 0,69 |
| link protection | a priori optimization | 1,06 | 0,63 | 0,64 |
| link protection | a posteriori optimization | 1,10 | 0,98 | 0,97 |


J. Rak

5.3 Restoration Time Values

Figs. 13 and 14 show the cumulative distribution function of restoration time for all variants of optimization strength and for various scopes of protection, while Fig. 15 shows the respective average values. It is worth mentioning that the mean restoration times obtained for our a posteriori optimization were always close to the shortest ones, which were achieved when no optimization was performed (about 25 ms and 50 ms for link and path protection, respectively). Compared with the results of the a priori optimization, the difference is significant. In particular, for the link protection scheme with parallel intra- and inter-demand sharing, restoring a connection takes about 41% less time (24.78 ms versus 41.86 ms) when our a posteriori optimization is used. The results also show that the restoration time depends on the protection scope and on the length of the backup path to be activated. Additionally, the overall restoration time is increased by the message processing delay at network nodes and by the reconfiguration of transit nodes of shared backups, which must be performed when restoring each connection [16]. The stronger the optimization, the more transit nodes of backup paths have to be reconfigured. This is shown in Fig. 16, which presents the average ratio of restoration time to backup path length as a function of optimization strength. The results show that the impact of factors other than the propagation delay is greater for larger scopes of protection (e.g., for path protection). The obtained ratios were, however, similar for both kinds of optimization.

Fig. 13. Cumulative distribution function of restoration time for various optimization strengths (path protection scheme; panels: intra-demand, inter-demand, and parallel intra- and inter-demand sharing; x-axis: restoration time [ms]; y-axis: probability)

Fig. 14. Cumulative distribution function of restoration time for various optimization strengths (link protection scheme; panels and axes as in Fig. 13)
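The factors named in Section 5.3 (propagation along the backup path, message processing at nodes, and reconfiguration of transit nodes of shared backups) can be illustrated with a simple additive model. The sketch below is an assumption for illustration, not the paper's simulation model: the linear form, the function names, and the per-node delays of 1 ms and 2 ms are hypothetical. Only the propagation figure of about 5 µs per km of fiber is a standard physical approximation.

```python
"""Illustrative additive model of restoration time (hypothetical; not the paper's simulator)."""

# Light in optical fiber travels at roughly 2 x 10^5 km/s, i.e. about 0.005 ms per km.
PROPAGATION_MS_PER_KM = 0.005


def restoration_time_ms(backup_length_km: float,
                        processing_nodes: int,
                        reconfigured_nodes: int,
                        processing_ms: float,
                        reconfiguration_ms: float) -> float:
    """Propagation delay + per-node message processing + per-node reconfiguration.

    The linear form and the per-node delays are assumptions made for illustration only.
    """
    propagation = backup_length_km * PROPAGATION_MS_PER_KM
    processing = processing_nodes * processing_ms
    reconfiguration = reconfigured_nodes * reconfiguration_ms
    return propagation + processing + reconfiguration


def time_per_km(backup_length_km: float, restoration_ms: float) -> float:
    """Restoration time divided by backup path length, the quantity plotted in Fig. 16."""
    return restoration_ms / backup_length_km


if __name__ == "__main__":
    # Same backup path; stronger sharing means more transit nodes must be reconfigured.
    for reconfigured in (0, 2, 4):
        t = restoration_time_ms(2000.0, processing_nodes=5, reconfigured_nodes=reconfigured,
                                processing_ms=1.0, reconfiguration_ms=2.0)
        print(f"{reconfigured} reconfigured nodes: {t:.1f} ms "
              f"({time_per_km(2000.0, t):.4f} ms/km)")
```

Under these assumptions, the time-per-km ratio rises further above the pure propagation value of 0.005 ms/km as more transit nodes need reconfiguration, which matches the qualitative trend described for Fig. 16.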

Capacity Efficient Shared Protection and Fast Restoration Scheme

Fig. 15. Average values of restoration time as a function of optimization strength (panels: path protection scheme, link protection scheme; x-axis: optimization strength; y-axis: restoration time [ms]; series: a priori and a posteriori optimization)

Table 4. Confidence intervals of 95% for the mean values of restoration time [ms]

Scheme            Optimization   Intra-demand sharing   Inter-demand sharing   Parallel intra- and inter-demand sharing
path protection   a priori       1.69                   2.04                   2.00
path protection   a posteriori   1.69                   1.72                   1.71
link protection   a priori       1.12                   1.85                   2.04
link protection   a posteriori   0.99                   1.01                   0.99

Fig. 16. Relationship between the restoration time and the length of backup path as a function of optimization strength (panels: path protection scheme, link protection scheme; x-axis: optimization strength; y-axis: restoration time / backup path length [ms/km])
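The relative improvements quoted above (about 39% shorter backup paths and about 41% faster restoration) follow directly from the reported mean values. A minimal check, using a hypothetical helper `relative_reduction` with the a priori result as the baseline:

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is smaller than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# Mean values quoted in the text: a priori (baseline) vs. a posteriori optimization.
QUOTED = {
    "backup path length [km]": (3661.0, 2219.5),
    "restoration time [ms], link protection, parallel sharing": (41.86, 24.78),
}

if __name__ == "__main__":
    for label, (a_priori, a_posteriori) in QUOTED.items():
        print(f"{label}: {relative_reduction(a_priori, a_posteriori):.1f}% lower")
```

It reports about 39.4% and 40.8%, which round to the percentages given in the text.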

6 Conclusions

The obtained results confirm that the proposed SCPO algorithm for establishing survivable connections with a posteriori optimization significantly reduces the link capacity utilization ratio while simultaneously providing fast restoration. Our algorithm does not increase the lengths of backup paths, regardless of the optimization strength. The commonly used a priori optimization is still more capacity-efficient, but at the price of much longer restoration times. Finally, it must be pointed out that restoration time and link capacity utilization cannot both be kept at their minimum: improving one of these factors worsens the other. If they are of similar importance, the best choice is the a posteriori optimization, which provides fast restoration of connections together with a medium level of link capacity utilization.

References

1. Assi, C., Ye, Y., Shami, A., Dixit, S., Ali, M.: Efficient Path Selection and Fast Restoration Algorithms for Shared Restorable Optical Networks. ICC 2003 - IEEE International Conference on Communications, Vol. 26, No. 1 (2003) 1412-1416
2. Chow, T. Y., Chudak, F., Ffrench, A. M.: Fast Optical Layer Mesh Protection Using Pre-cross-connected Trails. IEEE/ACM Trans. on Networking, Vol. 12, No. 3 (2004) 539-548
3. Dijkstra, E.: A Note on Two Problems in Connection with Graphs. Numerische Mathematik, 1 (1959) 269-271
4. Ellison, R. J., Fisher, D. A., Linger, R. C., Lipson, H. F., Longstaff, T., Mead, N. R.: Survivable Network Systems: An Emerging Discipline. Carnegie Mellon University, Software Engineering Institute, Technical Report CMU/SEI-97-TR-013 (1997, Rev. 1999)
5. Hauser, O., Kodialam, M., Lakshman, T. V.: Capacity Design of Fast Path Restorable Optical Networks. IEEE INFOCOM, Vol. 2 (2002) 817-826
6. Ho, P-H., Tapolcai, J., Cinkler, T.: Segment Shared Protection in Mesh Communications Networks With Bandwidth Guaranteed Tunnels. IEEE/ACM Transactions on Networking, Vol. 12, No. 6 (2004) 1105-1118
7. Kawamura, R.: Architectures for ATM Network Survivability. IEEE Communications Surveys, Vol. 1, No. 1 (1998) 2-11
8. Kodialam, M., Lakshman, T. V.: Dynamic Routing of Locally Restorable Bandwidth Guaranteed Tunnels Using Aggregated Link Usage Information. IEEE INFOCOM (2001) 376-385
9. Kubale, M. et al.: Models and Methods of Graph Coloring. WNT (2002) (in Polish)
10. Lee, M., Yum, J., Kim, Y., Park, J.: A Restoration Method Independent of Failure Location in All-optical Networks. Computer Communications 25 (2002) 915-921
11. Mead, N. R., Ellison, R. J., Linger, R. C., Longstaff, T., McHugh, J.: Survivable Network Analysis Method. Carnegie Mellon University, Software Engineering Institute, Technical Report CMU/SEI-2000-TR-013 (2000)
12. Molisz, W.: Survivability Issues in IP-MPLS Networks. To appear in Systems Science, Vol. 32 (2006)
13. Molisz, W., Rak, J.: Region Protection/Restoration Scheme in Survivable Networks. 3rd Workshop on Mathematical Methods, Models and Architectures for Computer Network Security (MMM-ACNS’05), St. Petersburg, Russia, Springer-Verlag, LNCS, Vol. 3685 (2005) 442-447
14. Qiao, Ch. et al.: Novel Models for Efficient Shared Path Protection. OFC (2002) 545-547
15. Ramamurthy, S., Mukherjee, B.: Survivable WDM Mesh Networks, Part I – Protection. IEEE INFOCOM (1999) 744-751
16. Ramamurthy, S., Mukherjee, B.: Survivable WDM Mesh Networks, Part II – Restoration. Proc. IEEE International Conference on Communications (ICC) (1999) 2023-2030
17. Ramamurthy, S., Sahasrabuddhe, L., Mukherjee, B.: Survivable WDM Mesh Networks. IEEE Journal of Lightwave Technology, Vol. 21, No. 4 (2003) 870-883
18. Wang, H., Li, J., Hong, P.: RSVP-TE Fast Restoration Scheme for Meshed IPO All-optical Networks. IEEE (2002) 830-834
19. Wei, J. Y.: Advances in the Management and Control of Optical Internet. IEEE Journal on Selected Areas in Communications, Vol. 20, No. 4 (2002) 768-785

Increasing Lifetime of Wireless Sensor Networks with Energy-Aware Role-Changing

Frank Reichenbach, Andreas Bobek, Philipp Hagen, and Dirk Timmermann

University of Rostock, Germany
Institute of Applied Microelectronics and Computer Engineering
Richard-Wagner-Street 31, 18119, Rostock-Warnemuende, Germany
{frank.reichenbach, andreas.bobek, philipp.hagen, dirk.timmermann}@uni-rostock.de

Abstract. Energy-aware and robust self-organization is a challenging task in large, randomly deployed wireless sensor networks. In this paper, we achieve such self-organization by introducing a hierarchical network structure and, additionally, roles that represent basic network functions such as packet forwarding or data aggregation. These roles are exchanged between the participating nodes subject to specific constraints. We focus on a long network lifetime, which strongly depends on the limited energy resources of each node. Therefore, complex roles are released by nodes with critical battery levels and assigned to nodes with more remaining energy. With this approach, we achieve a uniform energy distribution over the whole network. Overall, we extend the lifetime of the network by 40% while keeping it fully functional at all times. We demonstrate the correct operation and the efficiency of the proposed protocol, and we show its benefits by simulating an application-oriented “Forest Fire Scenario”.

1 Introduction

Tiny devices equipped with sensors are increasingly used to measure physical phenomena such as vibration, pressure, temperature, or humidity. Technical progress has led to ever smaller sensor nodes, which are equipped with sensing hardware, their own processing units, memory, communication modules, and energy sources [1]. Hundreds or even thousands of these sensor nodes are randomly deployed, organize themselves, and communicate wirelessly. Normally, at least one special node, called the sink node, is installed at the edge of the sensor network. From the network's point of view, this node acts as the final collector of sensed data, and it provides gateway capabilities for further processing and monitoring.

When designing sensor networks, minimizing power consumption is the key challenge. There are two main strategies to reduce energy consumption. First, since many sensors are spread over an area and observe a common physical phenomenon, the collected data is expected to be redundant under normal circumstances. Filtering and aggregating this data is a common technique to remove redundant measurements and therefore to reduce the amount of data that has to be sent. Secondly, instead of sending packets directly via long-range communication, it is much more efficient to use short-range multi-hop routing [2]. In the latter case, it is important to avoid node failures caused by battery exhaustion. Given these facts, a uniform energy distribution that takes the remaining energy of each node into account is highly desirable.

Depending on the network infrastructure, nodes are assigned functions with different power consumption, such as aggregating, routing, or managing. We refer to these functions as roles. This paper introduces dynamic roles to achieve efficient self-organization in sensor networks. Roles are changed dynamically within the network depending on the available energy of each node. We show that this approach results in a homogeneous power consumption of all nodes and consequently increases the overall network lifetime.

Section 2 describes the advantages of self-organization and related research in the field. Section 3 then describes our protocol, which achieves dynamic role changes very efficiently. Comprehensive simulation results are presented in Section 4. Finally, we critically discuss these results and conclude the paper in Section 5.

A. Keller and J.-P. Martin-Flatin (Eds.): SelfMan 2006, LNCS 3996, pp. 157–170, 2006. © Springer-Verlag Berlin Heidelberg 2006

2 Self-Organization in Sensor Networks

Self-organization in wireless sensor networks is essential to ensure proper measurement and communication even if external environmental influences (e.g., weather, impacts, or electromagnetic fields) affect the network. Moreover, nodes may join or leave the network, nodes may be mobile, or the connection between nodes may be asymmetric or interrupted. Because the position of each node is initially unknown, the self-organization process involves multi-hop routing protocols and localization algorithms. Self-organization is particularly important for the following reasons:
– Initially, the topology of the deployed network is unknown and can change frequently.
– The number of participating nodes fluctuates strongly due to node failures or battery exhaustion.
– Networks must be able to adapt to any kind of unknown environment and be aware of obstacles.

2.1 An Approach with Dynamic Roles

It is common practice to define roles in a network that run on different nodes and therefore must be exchangeable. In our work, a role represents a specific algorithm whose execution consumes a defined quantity of energy. For example, a parity check on the physical layer consumes less energy than a complex encryption algorithm on the application layer. Nevertheless, both are abstract roles that must be executed to fulfill the overall network function. It is conceivable to detach the roles from the specific hardware and to distribute them in the network depending on different parameters. In this paper, we focus on low energy consumption on each node and, ultimately, in the overall network. This approach enables:


– Fair distribution of all tasks in the network.
– A defined protocol by which nodes communicate with each other.
– The adaptation of tasks to specific parameters (e.g., releasing a role when energy resources are low).
– Adding and removing roles, even after the sensor network has been deployed.
– The concentration of roles near a phenomenon (e.g., to strengthen a region or to minimize the communication overhead).

Different approaches already exist in the literature; they are described in the following.

2.2 Related Work

Dynamic Clusterhead Gateway Switched Routing. A well-known approach using roles is the “Dynamic Clusterhead Gateway Switched Routing”, which extends the “Dynamic Destination-Sequenced Distance-Vector Routing” (DSDV) by introducing hierarchical levels [3]. This achieves efficient routing with small communication overhead. A hierarchy is established by so-called cluster-heads that manage the processes within their specific clusters. Clusters are groups of nodes, and a cluster-head communicates directly with each node in its cluster. Clusters overlap topologically, and a second role, the gateway, handles communication between them: a gateway forwards packets from a cluster-head to the neighboring cluster-head. Extensive simulations showed that this approach can increase the overall network lifetime.

Low Energy Adaptive Clustering Hierarchy. In the “Low Energy Adaptive Clustering Hierarchy” (LEACH) algorithm, data is collected from sensor nodes and then transmitted to a base station [4]. LEACH also uses a clustered network hierarchy in which some of the nodes execute the role of a cluster-head. These nodes collect sensor data from all their direct neighbors, aggregate it, and send it to a base station. Because transmitting data to a base station consumes much power, the cluster-head role rotates within the network. The authors showed that this approach also leads to a longer lifetime of the overall network.

Connected Dominating Sets. By extending the “Connected Dominating Sets” (CDS) routing from [5] with roles, a new approach to self-organization is presented in [6]. New roles, namely the “Sensor Coordinator”, the “Sensing Node”, and the “Level 1 / Level 2 Backbone Nodes”, are defined at different levels of the hierarchy. So-called “Sensing Zones” are formed by “Sensor Coordinator” nodes together with “Sensing Nodes”. For this purpose, the authors defined a sensor-dependent metric called the “Cumulative Sensing Degree”. At present, dynamic role changes are not supported.

Generic Role Assignment. In another approach, a role-based organization in a clustered network is extended by two new roles, the aggregator and the sink [7]. The first role aggregates information in the cluster to decrease the communication overhead caused by redundancy. The aggregator essentially takes over the function of the cluster-head, controlling the slaves, which are neighboring nodes. The authors maintain lists with all properties of the nodes. Additionally, they introduce tables that describe each role together with the rules for assigning it. Together, this achieves a generic role assignment. We build on some of these concepts and add dynamic roles to increase lifetime. In doing so, we focus on minimizing the power consumption of every node and consequently of the overall network. The following section describes our protocol and its concepts.

3 Concept of a Dynamic Role-Based Protocol

3.1 Architecture of the Network

To increase the lifetime of the sensor network by changing roles dynamically, the protocol has to fulfill several requirements. First, the network must support self-organization of its nodes in an ad-hoc manner. Nodes can be added to or removed from the network transparently. Note that node mobility can also be interpreted as removing and adding nodes. We require a mechanism for initializing and updating roles in order to react to changes in the environment and in energy resources. For this purpose, nodes must be able to request services from their neighboring nodes as well as information about the neighbors' current resource state. Role assignment shall minimize global and local communication and lead to a uniform energy distribution over the whole sensor network. We use clusters as the basis of our network architecture. A cluster is a logical structure formed by a group of nodes (see Fig. 1) and is managed by a special node called the cluster-head. Communication between a cluster-head and its cluster nodes is bidirectional, which means that they can communicate directly with each other. Each node belongs to a cluster and thus can reach at least one cluster-head within one hop, or is a cluster-head itself. Here, the hop count is the number of hops (relays or nodes) a message traverses when traveling from source to destination.

Fig. 1. Topology of 3 clusters and the participating roles
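The cluster structure described above, together with the gateway rule and the start-up decisions of Section 3.3 (Fig. 2), can be expressed as a few checks on a node's neighbor info table. This is a simplified Python sketch, not the authors' J-Sim implementation; the names `Role`, `Action`, `NeighborInfo`, `is_gateway_candidate`, and `initial_action` are hypothetical, and random delays as well as the resolution of competing cluster-heads are left out.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Role(Enum):
    SENSOR = "sensor"
    AGGREGATOR = "aggregator"
    GATEWAY = "gateway"
    CLUSTER_HEAD = "cluster-head"


class Action(Enum):
    START_SENSOR = "join the cluster and start the sensor role"
    WAIT_FOR_COMMAND = "wait for a role command from the cluster-head"
    BECOME_CLUSTER_HEAD = "start the cluster-head role and announce it"


@dataclass
class NeighborInfo:
    """One row of the neighbor info table, filled from received info packets."""
    node_id: int
    energy: float  # remaining energy capacity in percent
    role: Role


def cluster_heads_in_range(table: List[NeighborInfo]) -> List[int]:
    """IDs of cluster-heads that this node hears directly (one hop)."""
    return [n.node_id for n in table if n.role is Role.CLUSTER_HEAD]


def is_gateway_candidate(table: List[NeighborInfo]) -> bool:
    """A node that knows more than one cluster-head can bridge two clusters."""
    return len(cluster_heads_in_range(table)) > 1


def initial_action(table: List[NeighborInfo]) -> Action:
    """Simplified start-up decision; random delays and cluster-head conflicts are omitted."""
    has_head = bool(cluster_heads_in_range(table))
    has_aggregator = any(n.role is Role.AGGREGATOR for n in table)
    if has_head and has_aggregator:
        return Action.START_SENSOR        # working cluster found
    if has_head:
        return Action.WAIT_FOR_COMMAND    # the cluster-head will assign a role
    return Action.BECOME_CLUSTER_HEAD     # nobody manages this area yet


if __name__ == "__main__":
    table = [NeighborInfo(1, 91.0, Role.CLUSTER_HEAD),
             NeighborInfo(2, 88.5, Role.AGGREGATOR),
             NeighborInfo(7, 95.0, Role.CLUSTER_HEAD)]
    print(initial_action(table).value, is_gateway_candidate(table))
```

Keeping the decision a pure function of the info table mirrors the idea that every node runs the same initialization algorithm and differs only in what it has heard from its neighbors.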


Nodes that reach more than one cluster-head are called gateways. They act as bridges between clusters and enable communication between cluster-heads. Sensor data is conveyed via gateways and cluster-heads to one or more sink nodes, which serve as data collectors for further processing outside the sensor network. The protocol differentiates between global and local communication. Marking global packets with sequence numbers helps reduce global network traffic: cluster-heads and gateways discard packets whose sequence numbers they have already seen, which avoids packet circulation. During role initialization, nodes analyze all received packets. After this, communication is limited to the node's cluster; nevertheless, it is still possible to send cluster broadcasts. To reduce global traffic, we introduce an aggregator. This node collects the data of the sensor nodes within one cluster. If the data is nearly homogeneous, the average value is determined and transmitted to the cluster-head. Moreover, we introduce a report cycle that defines the time between two info packets. We want to emphasize that, in contrast to other works, the cluster-head and aggregator roles are assigned to different nodes. Since the cluster-head is already stressed by global communication, we expect this separation to yield a more uniform distribution of energy within a cluster and thereby to avoid too many role changes.

3.2 Basic Network Roles

The sensor role is the default role of the nodes. After initialization, a node to which no other role was assigned implicitly starts with the sensor role. Whenever a node has to release another role, it falls back into the sensor role. Sensor nodes measure data and periodically forward it to the aggregator. The aggregator role is assigned to only one node per cluster. The aggregator collects the data of all sensor nodes, analyzes it, filters out redundancy, and forwards the result to the sink node via the cluster-head. The gateway role is responsible for inter-cluster communication. The cluster-head role also forwards global packets and manages the cluster and its gateways to keep a consistent state.

Measuring and sending data within the network depends on a base cycle. This value is determined by the aggregator and dynamically adapted to the current situation. The base cycle equals the time between two data packets sent by a sensor node. Currently, a sensor node measures data 10 times per base cycle; the average value is computed and sent to the aggregator. If a measured value exceeds the previous average values, a packet is sent to the aggregator immediately. Since these packets represent an exception, they are marked with a special exception flag. Moreover, all nodes in the cluster periodically send info packets to each other. Among other information, info packets contain the sender's energy level. All nodes except sensors need this information to assign a role to another node. The main task of the aggregator is filtering and aggregating sensor data before forwarding it to the cluster-head. Furthermore, aggregators define the base cycle: if the measured data is almost homogeneous over time, the base cycle can be increased. Cycles are implemented as levels, where higher levels correspond to longer times between two cycles than lower levels. Instead of assigning an absolute time value, the aggregator assigns a level and increases (or decreases) it.

3.3 Network Initialization

Self-organization of the network is divided into an initialization phase and an update phase. During initialization, every node executes the same initialization algorithm (see Fig. 2). Afterwards, the node owns a role and starts the role-specific task. The protocol uses random delays to avoid a traffic storm when numerous nodes start at the same time. After starting (booting), a node broadcasts an info packet. Info packets are sent periodically; they contain the energy level, the possible roles, and the current role of the sender. The information in these packets is extracted and stored in an info table that holds the energy levels and currently assigned roles of all neighbor nodes. If a node detects that it is within range of a working cluster, i.e., it finds a cluster-head and an aggregator, it joins this cluster and starts the sensor role. If a cluster-head but no aggregator is detected, the node waits for a role command sent by the cluster-head. If the node loses the connection to its cluster-head, or if no cluster-head is available at initialization, it starts the cluster-head role and broadcasts a corresponding info packet. If no other node tries to become a cluster-head within a random time, the new cluster-head sends a stand-by message to consolidate the network and starts the cluster-head role. If more than one node falls into the cluster-head role during initialization, they all release their role and start the initialization routine once again.

Fig. 2. Flow chart of the initialization phase (annotation in the figure: each info packet is answered by the receiver with an info packet, and the extracted data yield a neighbor node information table)

In the following, we describe the process after a node has adopted the cluster-head role. All neighbor nodes are prompted to check whether they can act as a gateway, which is the case if a node knows more than one cluster-head. Such nodes take over the gateway role and immediately send an info packet to the cluster-head. Next, the cluster-head chooses one gateway for each neighboring cluster according to their energy levels and tries to set up a connection. If all connections fail (e.g., no neighboring cluster is accessible), the cluster-head gives up its role. If at least one connection is successfully established, it broadcasts a cluster-head initialization message. All nodes that are one hop away start participating in the cluster; gateways that receive this message also participate in the cluster. The node with the highest energy capacity is assigned the aggregator role. All other nodes receive a broadcast message to fall back into the sensor role.

3.4 Dynamic Adaptation of the Roles

During runtime, roles must be changed to adapt to the current energy situation. Role changes can be triggered actively or when a missing role is detected. Aggregator and cluster-head monitor each other: if one of them fails, the other chooses an appropriate node and initiates a role assignment. The same assignment is started if an aggregator or a cluster-head wants to leave its role. When an aggregator actively releases its role, it sends a stand-by message to the sensor nodes, which in turn stop sending data. Then, the aggregator's database is transferred in a two-step procedure, and each step is acknowledged by the new aggregator node (Fig. 3). The database consists of mean, deviation, and exception values. The last acknowledgment releases the role at the old node and starts the role at the new node. Changing the cluster-head is very simple: the old cluster-head chooses a new node, sends a change message, and waits for an acknowledgment. This is repeated until a new cluster-head is established. If a gateway wants to release its role, it asks the cluster-head for admission. The cluster-head evaluates possible nodes that reach the same neighboring cluster. If no successor is found, the gateway has to keep its role; otherwise, the change is applied to the old and the new node (Fig. 4).
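Each hand-over in this section starts by choosing a successor. Assuming that “appropriate” means “most remaining energy”, as for the initial aggregator assignment in Section 3.3, the choice reduces to a maximum over the admissible candidates; for gateways, only nodes that reach the same neighboring cluster are admissible. `Candidate`, `best_successor`, and `gateway_successor` are hypothetical names used for this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    node_id: int
    energy: float                                     # remaining energy capacity in percent
    reachable_clusters: FrozenSet[int] = frozenset()  # neighboring clusters heard directly


def best_successor(candidates: Iterable[Candidate], exclude: int) -> Optional[Candidate]:
    """Candidate with the most remaining energy, ignoring the current role holder."""
    pool = [c for c in candidates if c.node_id != exclude]
    return max(pool, key=lambda c: c.energy, default=None)


def gateway_successor(candidates: Iterable[Candidate], current: Candidate,
                      neighbor_cluster: int) -> Optional[Candidate]:
    """Only nodes that also reach the same neighboring cluster may replace a gateway.

    None means that no successor exists and the current gateway keeps its role.
    """
    eligible = [c for c in candidates if neighbor_cluster in c.reachable_clusters]
    return best_successor(eligible, exclude=current.node_id)


if __name__ == "__main__":
    old_gateway = Candidate(1, 35.0, frozenset({4}))
    nodes = [old_gateway, Candidate(2, 90.0), Candidate(3, 70.0, frozenset({4}))]
    print(best_successor(nodes, exclude=1))   # e.g. a new aggregator
    print(gateway_successor(nodes, old_gateway, neighbor_cluster=4))
```

Returning `None` instead of raising keeps the “gateway has to keep its role” case an ordinary outcome rather than an error.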

Fig. 3. Flow chart of a role reversal of the aggregator (lanes: current aggregator, new aggregator)

Fig. 4. Flow chart of the role reversal of a gateway (lanes: current gateway, cluster-head, new gateway)
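The aggregator hand-over of Fig. 3 is a chain of acknowledged steps: role command, two data transfers, and a final confirmation. The following sketch models it over an assumed lossless in-memory exchange. The stand-by message to the sensor nodes is not modeled, and splitting the database into two halves for “data level 1” and “data level 2” is a hypothetical choice, since the text does not specify how the levels are formed.

```python
from enum import Enum, auto
from typing import Dict, List, Optional, Tuple


class Msg(Enum):
    ROLE_COMMAND = auto()   # "take over the aggregator role"
    DATA_LEVEL_1 = auto()   # first part of the aggregator database
    DATA_LEVEL_2 = auto()   # second part of the aggregator database
    CONFIRM_DROP = auto()   # old aggregator confirms that it drops the role


class NewAggregator:
    """Receiving node: acknowledges each step and starts the role after the final one."""

    def __init__(self, fail_on: Optional[Msg] = None) -> None:
        self.role = "sensor"
        self.database: Dict[str, float] = {}
        self.fail_on = fail_on  # lets a test simulate a missing ACK

    def receive(self, msg: Msg, payload: Dict[str, float]) -> bool:
        if msg is self.fail_on:
            return False                      # no ACK is sent
        if msg is Msg.ROLE_COMMAND:
            self.role = "handover"            # drops the sensor role
        elif msg in (Msg.DATA_LEVEL_1, Msg.DATA_LEVEL_2):
            self.database.update(payload)
        elif msg is Msg.CONFIRM_DROP:
            self.role = "aggregator"
        return True                           # ACK


def hand_over(database: Dict[str, float],
              new: NewAggregator) -> Tuple[List[Tuple[Msg, bool]], str]:
    """Old aggregator side; returns the exchange log and the old node's resulting role."""
    items = list(database.items())
    half = len(items) // 2                    # hypothetical split into two data levels
    steps = [(Msg.ROLE_COMMAND, {}),
             (Msg.DATA_LEVEL_1, dict(items[:half])),
             (Msg.DATA_LEVEL_2, dict(items[half:])),
             (Msg.CONFIRM_DROP, {})]
    log: List[Tuple[Msg, bool]] = []
    for msg, payload in steps:
        ack = new.receive(msg, payload)
        log.append((msg, ack))
        if not ack:
            return log, "aggregator"          # hand-over aborted: keep the role
    return log, "sensor"                      # final ACK: fall back to the sensor role


if __name__ == "__main__":
    new_node = NewAggregator()
    log, old_role = hand_over({"mean": 20.1, "deviation": 0.4, "exceptions": 0.0}, new_node)
    print([m.name for m, _ in log], old_role, new_node.role)
```

Stopping at the first missing acknowledgment keeps the old node in the aggregator role, so the cluster is never left without an aggregator in this simplified model.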

Network Communication. Communication is organized in logical layers. Each packet addresses a logical layer, which implies one of the following semantics:
– Physical layer (0): all nodes that are one hop away analyze the packet.
– Cluster layer (1): all nodes in the same cluster as the sender analyze the packet.
– Direct layer (2): the packet is for a specific addressed node only.
– Global layer (3): the packet is for gateways and cluster-heads only.
Three packet types for local communication are supported: the sensor information message (SIM) used for info messages, the sensor data message (SDM), and the sensor command message (SCM). Furthermore, a global message (GM) packet is available to send sensor messages globally; sensor messages are packed as payload into global messages (multiplexing and demultiplexing paradigm). To verify our concepts, we implemented them in the simulation tool J-Sim.
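The addressing rules of the four logical layers, combined with the sequence-number filtering of global packets described in Section 3.1, can be sketched as a per-node acceptance test. The class and field names are hypothetical; identifying duplicates by the pair (source, sequence number) and giving each node a single cluster ID (gateways actually border two clusters) are simplifying assumptions.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Set, Tuple


class Layer(IntEnum):
    PHYSICAL = 0  # every node one hop away analyzes the packet
    CLUSTER = 1   # every node in the sender's cluster
    DIRECT = 2    # only the addressed node
    GLOBAL = 3    # only gateways and cluster-heads


@dataclass(frozen=True)
class Packet:
    layer: Layer
    source: int
    cluster: int            # cluster of the sender
    destination: int = -1   # used by DIRECT packets only
    seq: int = 0            # sequence number of GLOBAL packets


@dataclass
class Node:
    node_id: int
    cluster: int
    role: str  # "sensor", "aggregator", "gateway" or "cluster-head"
    seen: Set[Tuple[int, int]] = field(default_factory=set)

    def accepts(self, p: Packet) -> bool:
        """Should this node analyze a packet it received from a one-hop neighbor?"""
        if p.layer == Layer.PHYSICAL:
            return True
        if p.layer == Layer.CLUSTER:
            return p.cluster == self.cluster
        if p.layer == Layer.DIRECT:
            return p.destination == self.node_id
        # GLOBAL: backbone roles only, and each (source, seq) pair only once
        if self.role not in ("gateway", "cluster-head"):
            return False
        key = (p.source, p.seq)
        if key in self.seen:
            return False  # already handled: discard to avoid circulation
        self.seen.add(key)
        return True


if __name__ == "__main__":
    head = Node(node_id=1, cluster=0, role="cluster-head")
    gm = Packet(Layer.GLOBAL, source=9, cluster=3, seq=5)
    print(head.accepts(gm), head.accepts(gm))  # the second copy is a duplicate
```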

4 Simulation and Evaluation

Numerous simulation tools for wireless sensor networks are available. Well-known simulation platforms are NS2, OMNeT++, J-Sim, SENSE, and Shawn [8, 9, 10, 11, 12]. We decided to use J-Sim as our simulation framework. For our purposes, we had to modify J-Sim by replacing some components to include our own energy model. This new model accounts for the power consumed when processing data on a node, which was not previously possible. Further, we integrated hardware-dependent power consumption values based on the MICA-2 system [13]. After these modifications, we were able to verify our concepts in extensive simulations. This section presents simulation series designed, first, to provide a proof of concept and, second, to validate and characterize the performance of the dynamic role-change approach. In all simulations, we used a sensor field of 800×800 meters. The field was split into a grid of 10×10 subfields, each with a side length of 80 meters. Each cluster lies within one of these subfields, and the minimum distance between any pair of clusters is always the length of one subfield. With our simulations, we aimed to answer the following questions:
– Do the initial assignment of roles and the dynamic change of roles work correctly?
– What are the implications of the specific energy capacity at which the role change is initiated?
– What is the effect of a moving phenomenon in the sensor network (simulated in the “Forest Fire Scenario”)?
In all simulations, we started with a relative energy capacity of 100%, which corresponds to an absolute capacity of 5000 mAh. Every activity (e.g., computation, transmission) on a node consumes a defined quantity of energy based on the power consumption of the MICA-2 motes given in [13].

4.1 Initial Role Assignment Without Dynamic Role-Change

In this first simulation series, we studied the initial role assignment after starting the network. We placed 6 clusters in the sensor field as illustrated in Fig. 5. Each cluster was directly connected only with its neighboring clusters. Cluster 6 forwarded data via a gateway to the sink of the network. We stimulated all sensor nodes with randomly generated sensor values drawn from a uniform distribution between 18 and 22, because we later intended to use temperatures as sensor values. Moreover, to obtain clear results within an acceptable simulation time, we multiplied all calculated energy consumptions by a factor of 10. This did not affect the correctness of the results but caused the overall network energy to drop faster. We stopped the simulation after the last base cycle change had been completed, because at this point the initialization phase was over. Fig. 6 shows the energy capacity of each node over time in cluster 4. Please note that in all following figures, time is given in seconds ×10³. The energy capacity of the aggregator will be exhausted first, which cannot be seen directly in the figure. Because the sensor data remained constant, the aggregator successively raised its base cycle, and thus the curve flattened more and more. Furthermore, the power consumption of the cluster-head and both gateways between 80 and 130 sec was marginal because global communication had not yet been established. After 130 sec, global communication started, resulting in a fast drop of the curve. Fig. 6 also shows that all remaining nodes, excluding those with special roles, took over the sensor role and started the measuring process. After this phase, the power consumption of all nodes was almost linear.

Fig. 5. Network configuration with 6 clusters and one data sink; network size is 800×800 meters (only 800×480 meters are shown); the communication between the clusters is marked with arrows

Fig. 6. Energy capacity [%] of each node over the time [sec] ×10³ in cluster 4 with 10 nodes (nodes 30 to 39; y-axis: remaining energy capacity [%]; base cycle changes are marked)

4.2 Initial Role Assignment and Dynamic Role-Change

In this simulation, we explored the dynamic role-change after the initialization phase with varying energy differences. The energy difference indirectly determines the moment of a role-change: if the own energy capacity of a node ($E_{own}$) falls below the mean energy capacity of the whole cluster ($E_{cl,mean}$) by more than the energy difference ($E_{diff}$), the node must release its role, i.e., with $E_x = E_{cl,mean} - E_{own}$, the node switches if $E_x > E_{diff}$. We considered networks with 10 and 20 nodes in each cluster. We fixed the report cycle at a constant 10 sec to obtain clear results regarding the role-change functionality. In addition, the power consumption was multiplied by 50 to shorten the simulation time. First, we studied the implications in a cluster with only 10 nodes.

Implications of the Energy Difference. Fig. 7 illustrates the result with an energy difference of 10%. All nodes switched roles correctly, which ensured a more uniformly distributed power consumption over the sensor field. Moreover, no sensor node died prematurely, and almost all nodes died within the same 600 sec interval. All curves fluctuated within a specific band around an average straight line. To analyze the influence of the energy difference, we ran two more simulations, first with a smaller value of 5% and then with a larger value of 20%. The results are presented in Fig. 8 and Fig. 9. The deviation at an energy difference of 5% was much smaller than the deviation at 20%. Besides, at 20% the lifetime of the cluster slightly increased. This was probably due to the higher communication overhead caused by more frequent role switching at smaller energy differences. In reality, nodes cannot predict the precise time at which their battery will be exhausted, owing to the electronic limitations of measuring battery capacity. Thus, role-changes very close to battery exhaustion are questionable.

In further simulations, we used the same test setup as before, but with 20 nodes forming a cluster (see Figs. 10–12). An energy difference of 10% was not sufficient to load all nodes evenly. In contrast to the 10-node cluster, these results showed a longer lifetime at higher energy differences. The nodes holding the important roles lived longer, whereas the sensor nodes died successively after 2000 sec. Thus, the overall network functionality lasted longer, but with fewer measuring nodes.
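A minimal Python sketch of the role-change criterion described above follows. It assumes that energies are expressed as percentages of the initial capacity; the `Node` class and the hand-over policy in `hand_over_role` (give the released role to the sensor node with the most remaining energy) are hypothetical illustrations, not the protocol's actual successor election.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Node:
    node_id: int
    energy: float          # remaining capacity in % of the initial capacity
    role: str = "sensor"   # e.g. "sensor", "aggregator", "clusterhead", "gateway"


def should_release_role(node: Node, cluster: list[Node], e_diff: float) -> bool:
    """Return True once E_x = E_cl,mean - E_own exceeds E_diff."""
    e_cl_mean = mean(n.energy for n in cluster)
    e_x = e_cl_mean - node.energy
    return e_x > e_diff


def hand_over_role(node: Node, cluster: list[Node]) -> Node | None:
    """Hypothetical policy: pass the role to the sensor node with the most energy."""
    candidates = [n for n in cluster if n is not node and n.role == "sensor"]
    if not candidates:
        return None
    successor = max(candidates, key=lambda n: n.energy)
    # Swap roles: the successor takes over, the old holder becomes a sensor node.
    successor.role, node.role = node.role, "sensor"
    return successor


if __name__ == "__main__":
    cluster = [Node(1, 70.0, "aggregator"), Node(2, 85.0), Node(3, 82.0), Node(4, 83.0)]
    aggregator = cluster[0]
    if should_release_role(aggregator, cluster, e_diff=5.0):
        new_holder = hand_over_role(aggregator, cluster)
        print(new_holder.node_id, new_holder.role)  # prints: 2 aggregator
```

In the example, the aggregator is 10 percentage points below the cluster mean of 80%, so it releases its role at `e_diff=5.0` but would keep it at `e_diff=20.0`; a larger energy difference postpones the switch, which is consistent with the wider spread of the curves at 20%.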

(Figs. 7–12 plot the remaining energy capacity [%] of each node over time [sec] ×10^3.)
Fig. 7. 10 nodes in each cluster and an energy difference of 10%
Fig. 8. 10 nodes in each cluster and an energy difference of 5%
Fig. 9. 10 nodes in each cluster and an energy difference of 20%
Fig. 10. 20 nodes in each cluster and an energy difference of 10%
Fig. 11. 20 nodes in each cluster and an energy difference of 20%
Fig. 12. 20 nodes in each cluster and an energy difference of 5%

4.3 Dynamic Role-Change vs. the Static Case

To verify the benefits of dynamic roles, we compared the simulation results of a dynamic network with those of a static network without role-changing. For this purpose, we deactivated the dynamic role-change. As a result, nodes whose batteries are exhausted die without transferring their roles to another node.

Fig. 13. Static network with initial role assignment only; the network consisted of 20 nodes in each cluster
Fig. 14. Simulation of a moving sunbeam (the target node) that heated up the environment along a path with 3 clusters, each containing 10 nodes, and one data sink

Thus, the remaining nodes have to detect the role failure and reactivate the missing role. To achieve quick detection of missing roles, we decreased the report cycle to 30 seconds. Apart from this, the configuration was similar to the previous ones. We simulated a cluster with 20 nodes. Fig. 13 contains the characteristic curve of the energy capacity of a cluster, which shows a lifetime reduced by 600 seconds compared with the optimized dynamic network (see Fig. 12). The reason for this higher energy consumption was the higher frequency of info packets. The network did not work continuously at its full capability. In more detail, when one of the major roles died, it took between 30 and 45 seconds until the remaining nodes detected it. During this detection period, no measured data left the cluster because the aggregator was missing. Moreover, missing cluster-heads or gateways caused problems in packet forwarding that reduced the performance of the cluster and of the whole network; for example, alternative routing paths had to be explored. Hence, the efficiency of the global network organization decreased. To summarize, in the static network the first node, holding the important aggregator role, died at 750 sec, followed by all other nodes until 1600 sec. Less than 80% of all nodes in the cluster were working after 60% of the network lifetime. In the dynamic network, by contrast, the first node died at 1500 sec. A dynamic cluster lived up to 40% longer than the static one. Moreover, in the static network the failure of the cluster-head at 900 seconds required complex reorganization, which the dynamic network avoided; thus, the global consistency of the dynamic network was higher. Finally, the energy wasted on global routing and reorganization was reduced.

4.4 Simulation of the “Forest Fire Scenario”

In this last simulation, we verified our concepts in a conceivable application. In this scenario, the sun, represented by a sunbeam, heated up the surface along a path in the forest. Along that path, a forest fire could spark. We placed 3 clusters, each consisting of 10 nodes, along the critical path in order to detect temperature changes. Cluster 3 was directly connected to the sink, and each cluster was connected to its neighbor. The power consumption was multiplied by 10, and the test setup corresponded to the one in the previous sections. The target node, representing the stimulus, started at 350 sec. The geometry of this scenario is shown in Fig. 14. Besides the stimulus of the target node itself (temperature of the sun), all clusters were stimulated by basic stimuli (temperatures of the environment) ranging from 16 to 24. The target node, representing the sunbeam, moved with a speed of 2.1 m/s. After it started, the target node first affected the sensor nodes in cluster 1. The target node needed 500 sec to reach the endpoint of the path, where it finally lost its influence on cluster 3. The energy consumption of cluster 1 is shown as an example in Fig. 15. The moving target node caused the sensor nodes to initiate exception packets. Thus, global communication increased. While the target node passed the clusters, both the aggregator role and the cluster-head role changed. In all three clusters, the new aggregators retained the report cycle of their predecessors. At the end position, the target node lost its influence on the network, and the energy consumption normalized in all clusters. The report cycle was reset to its maximum. The different maxima of the sensor values in Fig. 16 were due to the statistical distribution of the nodes in all 3 clusters. Finally, we verified the adaptability of our protocol to this specific scenario.

Fig. 15. Energy capacity of cluster 3 over time in a network with a moving sunbeam
Fig. 16. Temperatures over time as measured at the data sink
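To make the scenario geometry concrete, the following Python sketch models the target node as a point moving along a straight path at the stated 2.1 m/s, active from 350 sec until it has covered the path (2.1 m/s × 500 sec = 1050 m). The straight path, the influence radius `INFLUENCE_RADIUS_M`, and the sunbeam temperature `SUN_C` are hypothetical values chosen for illustration; only the speed, the start time, and the 16–24 baseline range come from the text.

```python
import math
import random

SPEED_M_S = 2.1            # target speed (from the text)
T_START_S = 350.0          # time at which the target starts (from the text)
INFLUENCE_RADIUS_M = 60.0  # hypothetical radius of the circular influence
SUN_C = 40.0               # hypothetical temperature inside the sunbeam
BASELINE_C = (16.0, 24.0)  # basic environmental stimuli (from the text)


def target_position(t, start, end, t_start=T_START_S, speed=SPEED_M_S):
    """Target position at time t on a straight path, or None when inactive."""
    length = math.dist(start, end)
    travelled = (t - t_start) * speed
    if travelled < 0 or travelled > length:
        return None  # not started yet, or past the endpoint (influence lost)
    f = travelled / length
    return (start[0] + f * (end[0] - start[0]),
            start[1] + f * (end[1] - start[1]))


def stimulus(node, t, start, end, rng):
    """Temperature sensed by a node located at `node` at time t."""
    pos = target_position(t, start, end)
    if pos is not None and math.dist(node, pos) <= INFLUENCE_RADIUS_M:
        return SUN_C
    return rng.uniform(*BASELINE_C)


if __name__ == "__main__":
    path = ((0.0, 0.0), (1050.0, 0.0))  # 2.1 m/s * 500 s = 1050 m
    rng = random.Random(42)
    for t in (300.0, 600.0, 900.0):
        print(t, target_position(t, *path), stimulus((525.0, 10.0), t, *path, rng))
```

A node is thus stimulated only while the target is active and within the influence circle; otherwise it reports a baseline environmental temperature.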

5 Conclusion

This paper described a protocol that enables the robust and efficient self-configuration that is strongly required in large and dynamic wireless sensor networks. For that, we introduced roles that represent the different fundamental network functionalities, such as communicating, forwarding, or precalculation. We required these roles to be executed fairly by all nodes, regardless of the different complexity of each role. By exchanging these roles between nodes, we achieved a uniform energy distribution over the whole network. This ultimately extended the lifetime of the network by up to 40% compared with a static case, where each node executes all its tasks until its battery is exhausted. Additionally, we showed the proper function of the protocol in extensive simulations. We demonstrated the practical relevance of our concepts by simulating a realistic “Forest Fire Scenario” with a moving sunbeam along a path. In further research, we intend to study the implications of more roles (e.g., localization and encryption) and of networks consisting of different hardware platforms.

Acknowledgment
This work was supported by the German Research Foundation under grant numbers TI254/15-1 and BI467/17-1 (keyword: Geosens).

References
1. I. F. Akyildiz, W. Su, Y. Sankarasubramaniam, and E. Cayirci, A Survey on Sensor Networks, IEEE Communications Magazine, August 2002, pp. 102-114.
2. J. Broch, D. A. Maltz, D. B. Johnson, Y.-C. Hu, and J. Jetcheva, A Performance Comparison of Multi-Hop Wireless Ad Hoc Network Routing Protocols, in Proceedings of the 4th Annual ACM/IEEE International Conference on Mobile Computing and Networking, 1998, pp. 85-97.
3. C. E. Perkins and P. Bhagwat, Highly Dynamic Destination-Sequenced Distance-Vector Routing (DSDV) for Mobile Computers, in Proceedings of the Conference on Communications Architectures, Protocols and Applications, London, United Kingdom, 1994, pp. 234-244.
4. W. Heinzelman, A. Chandrakasan, and H. Balakrishnan, Energy-Efficient Communication Protocols for Wireless Microsensor Networks, in Proceedings of the 33rd Hawaii International Conference on System Sciences, Volume 8, 2000, p. 8020.
5. J. Wu, M. Gao, and I. Stojmenovic, On Calculating Power-Aware Connected Dominating Sets for Efficient Routing in Ad Hoc Wireless Networks, in Proceedings of the International Conference on Parallel Processing, 2001, pp. 346-356.
6. M. Kochhal, L. Schwiebert, and S. Gupta, Role-Based Hierarchical Self Organization for Wireless Ad Hoc Sensor Networks, in Proceedings of the 2nd ACM International Conference on Wireless Sensor Networks and Applications, 2003, pp. 98-107.
7. K. Roemer, C. Frank, P. J. Marrón, and C. Becker, Generic Role Assignment for Wireless Sensor Networks, in Proceedings of the 11th ACM SIGOPS European Workshop, 2004, pp. 7-12.
8. http://www.isi.edu/nsnam/ns
9. http://www.omnetpp.org
10. http://www.j-sim.org
11. http://www.cs.rpi.edu/~cheng3/sense
12. http://www.swarmnet.de/shawn/
13. V. Shnayder, M. Hempstead, R. Chen, G. W. Allen, and M. Welsh, Simulating the Power Consumption of Large-Scale Sensor Network Applications, in Proceedings of the 2nd International Conference on Embedded Networked Sensor Systems, Baltimore, MD, USA, 2004, pp. 188-200.

Self-Organisation of Resources in PROSA P2P Network
Vincenza Carchiolo, Michele Malgeri, Giuseppe Mangioni, and Vincenzo Nicosia
Dipartimento di Ingegneria Informatica e delle Telecomunicazioni, Facoltà di Ingegneria – Università di Catania, Viale A. Doria 6 – 95100 Catania (Italy)
{car, malgeri, gmangioni, vnicosia}@diit.unict.it

Abstract. PROSA tries to simulate the evolution of human relationships from simple acquaintance to friendship and partnership. Our aim is to obtain a self-reconfiguring P2P system that possesses some of the desirable features of social communities. We show that PROSA naturally evolves into a small-world network, with a high clustering coefficient and a very short average path length.

1 Introduction
The way social contacts and relationships are arranged, how they evolve and how they end, is a matter of research for psychologists and social scientists. Nevertheless, some studies about social groups and their connections reveal that a “social network”, i.e., the network of relationships among people ranging from simple acquaintance to friendship, has many interesting properties that can be exploited in a real-world P2P structure, such as being a small world. In this paper we define and simulate a P2P structure, named PROSA, in which semantic proximity of resources is mapped onto topological proximity of peers. PROSA uses a self-organising algorithm heavily inspired by social networks of collaborating peers, which dynamically links peers sharing similar knowledge and resources, putting them into highly clustered and self-structured “semantic groups”. The paper is organised as follows: in Section 2 we discuss our proposal; in Section 3 we show simulation results; finally, Section 4 presents a plan for future work. An extended explanation of PROSA is available in [3]. Related work in the field of P2P network algorithms inspired by social relationships includes SETS [1], GES [6], and Bibster [2].

2 PROSA
In P2P networks, the culture or knowledge of a peer is represented by the resources (documents) it shares with other peers, while links among peers represent existing relationships. To implement such a P2P system, it is necessary to define a model of knowledge, culture and interests, and a self-organising network management algorithm that creates and destroys links among peers.

A. Keller and J.-P. Martin-Flatin (Eds.): SelfMan 2006, LNCS 3996, pp. 171–174, 2006. © Springer-Verlag Berlin Heidelberg 2006

2.1 Modelling Knowledge
In PROSA, knowledge is represented using the Vector Space Model (VSM) [5]. In this approach, each document is represented by a state vector of (stemmed) terms called Document Vector (DV); each term in the vector is assigned a weight based on the relevance of the term inside the document. This weight is calculated using a modified logarithmic version of TF-IDF [4]. The whole knowledge owned by a peer is itself represented by a state vector called Peer Vector (PV), which is the weighted sum of all the frequencies of the terms contained in the resources shared by the peer. Similarly, queries are represented by a state vector of the query terms, called Query Vector (QV). Using VSM, the relevance of a peer P with respect to a given query Q can be defined as follows, where $w_{t,P}$ and $w_{t,Q}$ are the weights of term $t$ in the PV and in the QV, respectively:

$$ r(P,Q) = \sum_{t \in P \cap Q} w_{t,P} \cdot w_{t,Q} $$
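The vector-space model and relevance formula above can be sketched in Python with sparse term-to-weight dictionaries. The exact "modified logarithmic" TF-IDF variant and the weights used when summing documents into a PV are not specified here, so `document_vector` uses a standard log-scaled TF-IDF and `peer_vector` a plain sum; both are assumptions. Only `relevance` implements the formula as stated.

```python
import math
from collections import Counter


def document_vector(terms, doc_freq, n_docs, max_terms=100):
    """DV from a document's stemmed terms, keeping the `max_terms` most frequent.

    ASSUMPTION: standard log-scaled TF-IDF, (1 + ln tf) * ln(N / df);
    PROSA's exact modification of TF-IDF is not reproduced here.
    """
    tf = Counter(terms)
    return {
        t: (1.0 + math.log(c)) * math.log(n_docs / doc_freq[t])
        for t, c in tf.most_common(max_terms)
    }


def peer_vector(doc_vectors):
    """PV as the plain sum of the peer's DVs (the weighting is an assumption)."""
    pv = Counter()
    for dv in doc_vectors:
        pv.update(dv)  # adds the weights term by term
    return dict(pv)


def relevance(pv, qv):
    """r(P, Q): sum of w_{t,P} * w_{t,Q} over the terms shared by P and Q."""
    return sum(pv[t] * qv[t] for t in pv.keys() & qv.keys())


if __name__ == "__main__":
    pv = peer_vector([{"graph": 2.0, "node": 1.0}, {"graph": 1.0, "proof": 3.0}])
    print(relevance(pv, {"graph": 1.0, "proof": 0.5, "kant": 1.0}))  # prints: 4.5
```

Because only shared terms contribute, a query term the peer has never seen (here `kant`) simply adds nothing to the relevance.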

This relevance is used by the PROSA query routing algorithm. Note that a high relevance between a QV and a PV means that the given peer probably has documents that can match the query. More details about this model are reported in [3].

2.2 Network Management Algorithm
In PROSA, three types of links have been introduced in order to model possible relationships among peers: i) Acquaintance-Link (AL), ii) Temporary Semantic-Link (TSL), and iii) Full Semantic-Link (FSL). FSLs represent “stable” connections (parents, relatives); TSLs are similar to FSLs but weaker (friends, colleagues); finally, ALs are very weak links (random links). A new peer that wants to join PROSA simply looks for other peers (for example, using broadcasting, or by selecting them from a list of peers that are supposed to be up, as in Freenet or Gnutella) and adds some of them to its peer list (PL) as ALs.

Updating. In PROSA, the dynamics of FSLs and TSLs are strictly related to queries. When a user of PROSA requires a resource, he performs a query and specifies the number of results he wants to obtain. The relevance of the query with respect to the resources hosted by the user's peer is evaluated first: if none of the hosted resources has sufficient relevance with respect to the query, the query has to be forwarded to other peers. When a peer receives a query forwarded by another peer, it first updates its PL:
– If the requesting peer is unknown, a new TSL to that peer is added to the PL, and the QV becomes the corresponding Temporary Peer Vector (TPV).
– If the requesting peer is linked by a TSL, the corresponding TPV in the list is updated by adding the received QV and normalising the result.
– If the requesting peer is linked by an FSL, its PV is already in the PL, and no update is necessary.
If the peer has a sufficient number of relevant documents, it sends the requesting peer a result message that lists the available resources. Otherwise, the query is forwarded to the most relevant link available or, if neither FSLs nor TSLs are sufficiently relevant, to a randomly chosen one.
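The update and forwarding rules above can be sketched as a small Python class. The link representation, the Euclidean normalisation of the TPV, and the `threshold` parameter standing in for "sufficiently relevant" are assumptions; the text does not say how a query from an AL-linked peer is handled, so the sketch leaves such links unchanged.

```python
import math
import random
from dataclasses import dataclass, field

AL, TSL, FSL = "AL", "TSL", "FSL"


def relevance(v, q):
    """Sum of w_{t,v} * w_{t,q} over the terms shared by the two vectors."""
    return sum(v[t] * q[t] for t in v.keys() & q.keys())


def normalise(v):
    """Scale a term vector to unit Euclidean length (assumed normalisation)."""
    norm = math.sqrt(sum(w * w for w in v.values()))
    return {t: w / norm for t, w in v.items()} if norm else dict(v)


@dataclass
class Peer:
    peer_id: str
    links: dict = field(default_factory=dict)  # peer id -> (link type, vector or None)

    def on_query(self, sender, qv):
        """Update the peer list (PL) when a query forwarded by `sender` arrives."""
        if sender not in self.links:
            # Unknown peer: new TSL whose Temporary Peer Vector is the QV.
            self.links[sender] = (TSL, dict(qv))
            return
        kind, vec = self.links[sender]
        if kind == TSL:
            merged = dict(vec)
            for t, w in qv.items():
                merged[t] = merged.get(t, 0.0) + w
            self.links[sender] = (TSL, normalise(merged))
        # FSL: the sender's PV is already stored, nothing to update.
        # AL: not specified in the text; left unchanged here.

    def next_hop(self, qv, threshold, rng):
        """Most relevant FSL/TSL if relevant enough, otherwise a random link."""
        scored = [(relevance(vec, qv), pid)
                  for pid, (kind, vec) in self.links.items() if kind in (TSL, FSL)]
        if scored:
            best_score, best_pid = max(scored)
            if best_score >= threshold:
                return best_pid
        return rng.choice(sorted(self.links)) if self.links else None
```

For example, a peer holding an FSL to `B` with PV `{"graph": 1.0}` forwards a query on `graph` to `B`, while a query on a term no semantic link covers falls back to a random link.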

3 PROSA Simulations and Results
To simulate PROSA functionalities, we chose a data set composed of scientific articles in the fields of mathematics (740 articles) and philosophy (750 articles). Article terms have been stemmed and stored in a database. For each article, its DV was computed using only the 100 most frequent terms of the document. We chose to limit the number of terms of the DV because it has been shown in [6] that a larger DV does not give better results. Preliminary simulations of PROSA have focused on topological characteristics, such as clustering coefficient and average path length. Since links between peers in PROSA are not symmetric, a PROSA network can be represented as a directed graph G(V,E). The clustering coefficient of a node in a directed graph can be defined as the ratio between the number of existing edges among its neighbors and the maximum number of possible edges, while the clustering coefficient of a graph is defined as the mean clustering coefficient over all the vertices (nodes) in the graph. In Figure 1, the clustering coefficient (CC) and average path length (APL) of PROSA are compared with those of an “equivalent” random graph, defined as a graph with the same number of nodes and edges as the PROSA network and randomly chosen edges.

# nodes   # edges   CC prosa   APL prosa   CC rnd   APL rnd   CC prosa/CC rnd
400       15200     0.26       2.91        0.095    1.65      2.7
600       14422     0.19       2.97        0.04     2.01      4.75
800       14653     0.17       2.92        0.02     2.29      8.5
1000      14429     0.15       2.90        0.014    2.58      10.7
3000      15957     0.11       2.41        0.002    4.8       55
5000      19901     0.06       2.23        0.0008   6.17      75

Fig. 1. Clustering coefficients and APL for different network size
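The directed clustering coefficient defined above can be computed as follows. The sketch assumes that the neighbourhood of a node is its set of out-neighbours and assigns C_i = 0 to nodes with fewer than two neighbours; the text does not fix either choice. The second function gives the standard estimates for a directed random graph with n nodes and E edges, CC ≈ E/(n(n−1)) and APL ≈ ln n / ln(E/n). The CC rnd and APL rnd columns of Fig. 1 appear to agree with these estimates to the printed precision (e.g., 15200/(400·399) ≈ 0.095 and ln 400 / ln 38 ≈ 1.65).

```python
import math


def directed_clustering(adj):
    """Mean clustering coefficient of a directed graph.

    `adj` maps each node to the set of its out-neighbours. ASSUMPTION: the
    neighbourhood N_i of node i is its out-neighbour set, so
    C_i = |edges between members of N_i| / (k_i * (k_i - 1)), and C_i = 0 if k_i < 2.
    """
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count directed edges j -> m with both endpoints in the neighbourhood.
        links = sum(1 for j in nbrs for m in adj.get(j, ()) if m in nbrs and m != j)
        total += links / (k * (k - 1))
    return total / len(adj)


def random_graph_estimates(n, e):
    """Expected CC and approximate APL of a directed random graph (n nodes, e edges)."""
    cc = e / (n * (n - 1))               # edge density p
    apl = math.log(n) / math.log(e / n)  # ln n / ln <k>, with <k> = e / n
    return cc, apl


if __name__ == "__main__":
    for n, e in [(400, 15200), (1000, 14429), (5000, 19901)]:
        cc, apl = random_graph_estimates(n, e)
        print(n, round(cc, 4), round(apl, 2))
```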

These measures refer to PROSA networks in which each peer starts with 20 documents on average. The CC and APL are computed after 10,000 queries, each containing 4 terms on average. The results show that PROSA networks always have a higher clustering coefficient than the corresponding random graphs. This means that each peer tends to link with a strongly connected neighborhood, which represents (a part of) the “semantic group” joined by the peer. As shown in Figure 1, the clustering coefficient of both PROSA and the random network decreases almost exponentially, but the ratio between CC prosa and CC rnd increases exponentially with the number of nodes. Since the clustering coefficient depends on the average number of links per node, and links in PROSA are created as a consequence of queries, we think this is due to the fact that the clustering degree of a PROSA network is related to the number of queries performed by its nodes. To verify this conjecture, we simulated the behaviour of PROSA networks for different numbers of queries, from 5000 to 20000. The results are shown in Figure 2. The table shows that a PROSA network has a higher clustering coefficient than the corresponding random graph for networks with more than 200 nodes. Given that, for all network sizes, PROSA has a much larger CC than the corresponding random graph and an APL of comparable magnitude (lower for the 3000- and 5000-node networks), we can finally deduce that a PROSA network naturally evolves into a “small world” if a sufficient number of queries is performed by peers. This fact confirms that the implemented algorithm is a good approximation of a real social network.

Nodes   PROSA (# of queries)            RND (# of queries)
        5000   10000  15000  20000      5000   10000  15000  20000
200     0.24   0.37   0.44   0.49       0.16   0.37   0.53   0.65
400     0.21   0.26   0.27   0.30       0.04   0.09   0.15   0.21
600     0.15   0.19   0.22   0.22       0.02   0.04   0.06   0.09
800     0.14   0.17   0.19   0.20       0.011  0.023  0.036  0.05
1000    0.11   0.15   0.17   0.18       0.007  0.014  0.023  0.033
Fig. 2. Clustering coefficient for different network sizes and # of queries

4 Conclusions and Future Work
In this paper, a novel P2P self-organising algorithm for resource searching and retrieval has been presented. The algorithm emulates the way social relationships among people naturally arise and evolve, and it produces a small-world network topology, as confirmed by simulation results. The next step is to prove that a PROSA network is internally organized into semantic groups, i.e., highly clustered groups of peers formed by nodes that share knowledge in a certain field.

References
1. Mayank Bawa, Gurmeet Singh Manku, and Prabhakar Raghavan. SETS: search enhanced by topic segmentation. In SIGIR '03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 306–313, New York, NY, USA, 2003. ACM Press.
2. Jean Broekstra, Marc Ehrig, Peter Haase, Frank van Harmelen, Maarten Menken, Björn Schnizler, and Ronny Siebes. Bibster: A semantics-based bibliographic peer-to-peer system.
3. V. Carchiolo, M. Malgeri, G. Mangioni, and V. Nicosia. PROSA: P2P resource organisation by social acquaintances. 2006. To be presented at AP2PC (Agents and P2P Computing Workshop) at AAMAS 2006.
4. Gerard Salton and Chris Buckley. Term weighting approaches in automatic text retrieval. Technical report, Ithaca, NY, USA, 1987.
5. H. Schütze and C. Silverstein. A comparison of projections for efficient document clustering. In Proceedings of ACM SIGIR, pages 74–81, Philadelphia, PA, July 1997.
6. Yingwu Zhu, Xiaoyu Yang, and Yiming Hu. Making search efficient on Gnutella-like P2P systems. In Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, pages 56a–56a. IEEE Computer Society, April 2005.

Plug-and-Play Address Management in Ambient Networks*
Zoltán Lajos Kis, Csaba Simon, and László Harri Németh
Budapest University of Technology and Economics, Hungary
Magyar Tudósok krt. 2., Budapest, H-1117

Abstract. This paper presents a peer-to-peer address distribution protocol, called the Address and Domain Allocation Protocol, designed for Ambient Networks that use the identifier/locator split. The protocol focuses on the optimal distribution and management of the IP locators used within Ambient Networks. The paper describes the protocol in detail and presents the results of our measurements.

1 Introduction
Address distribution and management is always a central issue in ad hoc networks. The locator/identifier split approach tries to alleviate the problem by using two identifiers: a globally unique identifier is used to identify a node, and another one is used solely to find the actual location of the node. Ambient Networks (AN) [3,4] is a European Union-financed project that aims at developing solutions for beyond-3rd-generation networks by extending the all-IP concept. The main innovations offered by ANs are support for enhanced mobility, network heterogeneity, and dynamic, instant network composition. A number of solutions have already been proposed for looking up the locator mapped to a given identifier. The most promising of these is the Host Identity Protocol (HIP) [1], in which the identifier (Host Identity Tag, HIT) is resolved into IP addresses. The protocol specifications, however, focus on the resolution process itself and not on assigning the IP addresses. Our paper focuses on this latter problem, namely the distribution and management of locators within Ambient Networks. It does not detail the interworking with HIP, but the presented solution can be used by any HIP implementation to provide the locators. Like most current solutions in this research field, our proposed protocol uses IP addresses as network locators. It is thus a natural choice to use the widely accepted Dynamic Host Configuration Protocol (DHCP) [2] and extend its capabilities by peer-to-peer means.

* This paper describes work undertaken in the Ambient Networks project, which is part of the EU's IST programme. In total, 41 organizations from Europe, Canada, Australia and Japan are involved in this project. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the Ambient Networks project.

A. Keller and J.-P. Martin-Flatin (Eds.): SelfMan 2006, LNCS 3996, 175 – 178, 2006. © Springer-Verlag Berlin Heidelberg 2006


Z.L. Kis, C. Simon, and L.H. Németh

2 Problem Statement
The locator/identifier split removes the burden from the distribution and management of IP addresses in Ambient Networks. Since IP addresses are only used to locate a node, they need neither remain the same nor be unique for the lifetime of a node. Our work focuses on IP address management such that the resulting distribution of IP addresses optimizes IP routing and address discovery in Ambient Networks.

Composing Ambient Networks results in an AN that, on the management plane, can best be visualized as a hierarchical tree [6]. The topmost level represents the functional entities common to all nodes in the whole network. Lower levels represent increasingly autonomous sub-networks, and the lowest nodes of the tree represent the nodes themselves. Typical management traffic flows between different levels of the tree. Thus, the best solution for locator distribution is to follow the hierarchy, which can be done by mapping different hierarchy levels onto IP subnetworks (Fig. 1.a). We envisage the address distribution function to be common to the whole network and thus controlled by the topmost level. The actual address distribution, on the other hand, is done at the lowest levels of the tree, over the physical connections, by the peers themselves. This implies that, before an address distribution takes place, a management message has to traverse the whole tree; we therefore recommend some form of caching for this purpose.

In our proposed solution, each Ambient Control Space (ACS) [5] possesses an address space (not necessarily contiguous) that it can freely distribute among lower-level ACSs and nodes. An ACS may receive requests for a chunk of its address space from a lower-level ACS, or for a single IP address from a bootstrapping node. As the topmost ACS virtually possesses the whole AN address space, it can always allocate free addresses and propagate them on demand to lower-level ACSs.

Our proposed protocol takes a chunk of the local address space (e.g., the 192.168.0.0/16 part) and uses it as an AN-temporary space with local scope only. The rest is used for AN-global addresses. Both bootstrapping AN nodes and legacy nodes are served from the temporary address space. Legacy nodes keep their temporary address until they move out of range, while bootstrapping nodes use this address only for the bootstrapping phase. Once they are members of an AN, they can hand back the temporary address and acquire a global one using the ACS.
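The split between temporary and global addresses can be illustrated with Python's standard `ipaddress` module. This is a sketch under the assumption that the temporary space is exactly the 192.168.0.0/16 example from the text; the `TemporaryPool` class is hypothetical and only shows the lease and hand-back cycle of bootstrapping nodes, not the ACS or DHCP messaging.

```python
import ipaddress

# Example split from the text: 192.168.0.0/16 is the AN-temporary space.
TEMP_SPACE = ipaddress.ip_network("192.168.0.0/16")


def is_temporary(addr: str) -> bool:
    """True if `addr` lies in the AN-temporary (local-scope) space."""
    return ipaddress.ip_address(addr) in TEMP_SPACE


class TemporaryPool:
    """Hypothetical pool that leases temporary addresses and takes them back."""

    def __init__(self, network=TEMP_SPACE):
        self._fresh = network.hosts()  # iterator over usable host addresses
        self._returned = []            # addresses handed back by nodes
        self.leased = set()

    def lease(self):
        """Serve a bootstrapping or legacy node (raises StopIteration when exhausted)."""
        addr = self._returned.pop() if self._returned else next(self._fresh)
        self.leased.add(addr)
        return addr

    def release(self, addr):
        """Called once a bootstrapping node has acquired an AN-global address."""
        self.leased.remove(addr)
        self._returned.append(addr)


if __name__ == "__main__":
    pool = TemporaryPool()
    tmp = pool.lease()
    print(tmp, is_temporary(str(tmp)))  # prints: 192.168.0.1 True
    pool.release(tmp)
```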

Fig. 1. a. – Mapping Address Spaces onto Ambient Networks. b. – Bootstrapping Phase.


3 Protocol Details
We implemented the protocol on the Peer-to-Peer ACS Prototype [7] (PAP) platform, providing it with the Address and Domain Allocation Protocol functional entity. We used DHCP as the final-step address distribution protocol, because most of today's operating systems support it natively. Thus, our protocol can also support non-AN-aware nodes, serving them with connectivity in a best-effort manner. The process of acquiring an IP address is the same for both legacy and bootstrapping nodes. Every AN node runs a 'thin' DHCP server and can capture requests for IP addresses. The request is propagated to the ACS, which returns a temporary address. This address is delivered to the requestor using the standard DHCP procedure. The process is illustrated in Fig. 1.b.

Global addresses of an Ambient Network and the address spaces of the lower-level ANs are controlled solely by the Ambient Control Space of the AN. Each ACS maintains a list of used and free IP addresses reserved for its own AN, and a list of the IP address spaces delegated to its lower-level ANs. A bootstrapping node has to change its temporary address to a permanent one using the Address and Domain Allocation Protocol. Inside the AN, communication is done via the ACS, using ANIDs. ANIDs remain constant throughout the lifetime of an AN node; thus, nodes can freely change their IP addresses.

Given the DHCP-based solution presented, two different address domain allocation strategies were implemented: a static and a dynamic one. In the static strategy (called fixed allocation), every request from an ACS is answered with the allocation of a fixed-size IP address domain. The dynamic strategy (called adaptive allocation) allocates relatively small domains at first and then doubles the size of the allocated domain for subsequent requests.
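The two allocation strategies and the upward propagation of requests through the ACS hierarchy can be sketched with Python's `ipaddress` module. The `ACS` class, the initial /28 domain size, the per-requester doubling rule, and the /8 cap are assumptions made for illustration; the sketch does not model DHCP, caching, or the PAP platform.

```python
import ipaddress


class ACS:
    """Hypothetical Ambient Control Space delegating address domains downwards."""

    def __init__(self, networks=(), parent=None, strategy="fixed", base_prefix=28):
        self.free = [ipaddress.ip_network(n) for n in networks]  # may be non-contiguous
        self.parent = parent            # upper-level ACS, None for the topmost one
        self.strategy = strategy        # "fixed" or "adaptive"
        self.base_prefix = base_prefix  # /28 = 16 addresses (hypothetical default)
        self.grants = {}                # requester id -> domains granted so far

    def _take(self, prefixlen):
        """Carve the first free block of the requested size, or return None."""
        for i, net in enumerate(self.free):
            if net.prefixlen <= prefixlen:
                block = next(net.subnets(new_prefix=prefixlen))
                # Keep whatever remains of `net` after removing `block`.
                self.free[i:i + 1] = sorted(net.address_exclude(block))
                return block
        return None

    def allocate(self, prefixlen):
        """Allocate locally, or fetch the block from the upper-level ACS."""
        block = self._take(prefixlen)
        if block is None:
            if self.parent is None:
                raise RuntimeError("AN address space exhausted")
            block = self.parent.allocate(prefixlen)
        return block

    def request_domain(self, requester):
        """Answer a lower-level ACS's request according to the strategy."""
        n = self.grants.get(requester, 0)
        self.grants[requester] = n + 1
        if self.strategy == "fixed":
            prefixlen = self.base_prefix
        else:
            # Adaptive: each further grant doubles the domain (prefix one bit shorter).
            prefixlen = max(self.base_prefix - n, 8)
        return self.allocate(prefixlen)


if __name__ == "__main__":
    adaptive = ACS(["10.0.0.0/16"], strategy="adaptive")
    print([str(adaptive.request_domain("an1")) for _ in range(3)])
```

In this toy model the adaptive strategy hands one requester 16, 32, and then 64 addresses, so it needs fewer requests (less control traffic) at the price of possibly unused addresses, which is the trade-off discussed in the evaluation.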

4 Evaluation

We set up a test network to evaluate our protocol. First we connected only two computers, so that the experiment contained only two ANs connected through a gateway [6] (for example, PR2 and PR4 in Fig. 2.a). The remaining computers were then added one by one, each increasing the depth of the hierarchy tree by one. Each computer emulated an AN with 200 nodes requesting an IP address, and we measured the average service time of the address requests. Fig. 2.b shows the average service times of the fixed and adaptive allocation strategies for increasing network sizes. The curve of the fixed strategy shows a linear trend. The adaptive strategy performs better and at the same time has a more efficient allocation process, since it hands out domains of increasing size. In conclusion, the fixed strategy is more economical with addresses but generates more control traffic than the adaptive strategy, and therefore has higher service times. The adaptive strategy wastes addresses in some cases, but when requests are frequent it can radically reduce service times. The linear characteristic of the fixed strategy shows that our protocol is scalable in this case.
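A back-of-the-envelope calculation illustrates the trade-off between control traffic and address waste. For one AN with 200 nodes, as in the experiment, the Python snippet below counts how many domain requests each strategy needs and how many delegated addresses stay unused. The domain size of 16 addresses, used both as the fixed size and as the adaptive initial size, is an assumed value; the paper does not state the sizes used in the measurements.

```python
def requests_needed(nodes: int, initial_size: int, adaptive: bool):
    """Return (number of domain requests, number of allocated addresses)."""
    allocated, requests, size = 0, 0, initial_size
    while allocated < nodes:
        allocated += size
        requests += 1
        if adaptive:
            size *= 2  # adaptive strategy: each new domain is twice as large
    return requests, allocated


for adaptive in (False, True):
    req, alloc = requests_needed(200, 16, adaptive)
    name = "adaptive" if adaptive else "fixed"
    print(f"{name}: {req} requests, {alloc} addresses, {alloc - 200} unused")
# fixed: 13 requests, 208 addresses, 8 unused
# adaptive: 4 requests, 240 addresses, 40 unused
```

Under these assumed sizes, the adaptive strategy needs about a third of the requests (4 instead of 13) but leaves 40 instead of 8 addresses unused, which is consistent with the qualitative trade-off described above.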


Z.L. Kis, C. Simon, and L.H. Németh

[Fig. 2.b: Comparison of average service times. Average service times (ms, axis range 120 to 155) plotted against the number of ANs (2 to 8) for the Fixed and Adaptive strategies.]

Fig. 2. a. – Virtual network topology for IP address distribution evaluation. b. – Average service times of fixed and adaptive strategies.

Within the limits of our measurements, the adaptive strategy always performed better than the fixed one, but its curve shows an exponential trend. To stop this exponential growth, we will investigate optimization techniques in our future work.

5 Conclusions

This paper introduces a solution for automated address management in Ambient Networks. The solution is a potential answer to the more general locator/identifier split problem. It relies on the hierarchical tree of the composed ANs and extends the DHCP protocol to support non-AN-aware nodes as well. We tested two strategies for allocating address spaces to the requesting overlays, and implemented both in the Peer-to-Peer ACS Prototype in order to evaluate them. Our measurements have shown that the fixed strategy is more economical but generates more control traffic than the adaptive one, and both strategies proved to be scalable within the measured range. During our evaluation we identified several proactive and reactive optimization techniques. In the near future we plan to test the impact of these techniques on the performance of our solution.

References
1. R. Moskowitz et al.: Internet Draft, work in progress, IETF, February 2004.
2. R. Droms: RFC 2131 – Dynamic Host Configuration Protocol, March 1997.
3. N. Niebert et al.: Ambient Networks – An Architecture for Communication Networks Beyond 3G, IEEE Wireless Communications, April 2004.
4. The Ambient Networks Project, http://www.ambient-networks.org
5. Ambient Networks Deliverable D1-5: AN Framework Architecture, December 2005.
6. R. Szabo et al.: Dynamic Network Composition in Ambient Networks: a Management View, Eurescom Summit 2005, April 2005.
7. Cs. Simon et al.: Peer-to-peer Management in Ambient Networks, 14th IST Mobile and Wireless Communications Summit, Dresden, Germany, June 19-23, 2005.

k-Variable Movement-Assisted Sensor Deployment Based on Virtual Rhomb Grid in Wireless Sensor Networks

Wang Xueqing and Yang YongTian

College of Computer Science and Technology, Harbin Engineering University, China, 150001

Abstract. The paper proposes a k-variable Movement-assisted Sensor Deployment based on Virtual Rhomb Grid (kMSDVRG), where k is the required coverage degree. The kMSDVRG algorithm has the following characteristics: (1) it forms a minimum connected dominating set (MCDS); (2) it leaves no “holes” in the sensor field; (3) different applications can use different values of k; (4) different regions can use different values of k within the same application.

Index Terms: Minimum Connected Dominating Set (MCDS), Sensor Deployment, Virtual Rhomb Grid (VRG), Wireless Sensor Networks (WSN).

1 Introduction

Sensor deployment is either deterministic or self-organizing. Random placement of sensors may not satisfy the deployment requirements. Two methods can be used to enhance coverage: incremental sensor deployment and movement-assisted sensor deployment. Algorithms of the former kind are computationally expensive and do not scale. For the latter, [1] proposes the Virtual Force Algorithm (VFA), which can leave coverage “holes” in the sensor field, and [2] develops the self-deployment protocols VEC, VOR and Minimax. In these protocols, the constructed Voronoi polygons may not be accurate enough, which results in more moves and longer moving distances; moreover, the algorithms may terminate with an unbalanced distribution. All of the above methods use the same k (the required coverage degree) across the whole sensor field. For some applications this leads to at least two drawbacks: excessive coverage or incomplete coverage. Furthermore, users cannot configure k according to the needs of specific applications or regions. In this paper, we use different values of k for different regions of the same application, which reduces the number of sensors while still satisfying the application requirements and without reducing the coverage capability. Users can thus configure k according to the needs of each region of the same application.

2 Preliminaries

2.1 Hypotheses

A sensor’s radio is omnidirectional. Within a sensor field, all sensors have the same radio power and lie in the same plane. The initial deployment is random. Every node can determine its own location. Finally, each node is able to move.

A. Keller and J.-P. Martin-Flatin (Eds.): SelfMan 2006, LNCS 3996, pp. 179 – 183, 2006. © Springer-Verlag Berlin Heidelberg 2006


2.2 Rhomb Grid

As illustrated in Figure 1 (a), each sensor associates itself with one of the vertices of a rhomb grid. The rhomb grid makes full use of the sensing range and ensures both full coverage and full connectivity [3]. Furthermore, the minimum number of nodes that covers the sensor field fully and seamlessly is given by [3]

$$N = \frac{F}{\delta} = \frac{F}{\frac{3\sqrt{3}}{2} r^{2}} = \frac{2F}{3\sqrt{3}\, r^{2}} \qquad (1)$$

where $F$ is the area of the sensor field, $\delta = \frac{3\sqrt{3}}{2} r^{2}$ is the effective area of each sensor, and $r$ is the sensing or communication radius of the sensors.
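As a worked example of Eq. (1), the Python snippet below evaluates N for the simulation parameters used later in this paper (a 30×40 field, so F = 1200, and r = 5). Rounding N up to the next integer is our assumption, since a fractional number of nodes cannot be deployed; `min_nodes` is a hypothetical helper name.

```python
import math


def min_nodes(F: float, r: float) -> float:
    """Eq. (1): N = F / delta, with delta = (3 * sqrt(3) / 2) * r**2."""
    delta = 3 * math.sqrt(3) / 2 * r ** 2  # effective area of one sensor
    return F / delta


n = min_nodes(30 * 40, 5)
print(round(n, 2), math.ceil(n))  # 18.48 19
```

Under this rounding assumption, single coverage of the simulated field needs at least 19 nodes, so the 58 simulated sensors leave room for regions with k > 1 and for spare sensors.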

3 k-Variable Movement-Assisted Sensor Deployment Based on Virtual Rhomb Grid (kMSDVRG)

kMSDVRG partitions the sensor field into virtual rhomb grids (VRGs), as illustrated in Figure 1 (b). Figure 1 (c) shows the resulting form of sensor deployment based on VRGs.
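A virtual rhomb grid can be generated as a triangular lattice, in which every pair of adjacent triangles forms a rhomb with 60° and 120° angles. The sketch below assumes a vertex spacing of d = √3·r. With this spacing, each vertex's hexagonal cell has exactly the area δ = (3√3/2)r² of Eq. (1). The function name and the choice to keep only vertices inside the field are assumptions, not the authors' partition procedure.

```python
import math


def rhomb_grid_vertices(width: float, height: float, r: float):
    """Vertices of a triangular (rhomb) lattice inside a width x height field.

    Assumption: spacing d = sqrt(3) * r, so every point of the plane lies within
    r of some vertex of the infinite lattice. Vertices outside the field are
    dropped, so a strip along the border may need extra vertices for full coverage.
    """
    d = math.sqrt(3) * r       # distance between neighbouring vertices
    h = d * math.sqrt(3) / 2   # vertical distance between rows (= 1.5 * r)
    vertices = []
    row = 0
    while row * h <= height:
        x = d / 2 if row % 2 else 0.0  # odd rows are shifted by half a spacing
        while x <= width:
            vertices.append((x, row * h))
            x += d
        row += 1
    return vertices


grid = rhomb_grid_vertices(30, 40, 5)
print(len(grid))  # 21
```

Every vertex is at distance d from its neighbours in the same row and in the adjacent rows, so any two adjacent triangles form one rhomb of the grid.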

Fig. 1. (a) Rhomb grid, (b) a partition example of virtual rhomb grid for sensor field, (c) sensor deployment based on virtual rhomb grids

Figure 2 shows the data structures of the kMSDVRG algorithm, and Figure 3 gives its implementation details in pseudocode. For a given number of sensors, kMSDVRG attempts to move each sensor to its desired vertex of the VRGs, namely the vertex closest to the sensor. The cluster head or base station is responsible for executing kMSDVRG and managing the one-time movement of sensors to the desired locations. Once the effective sensor positions are identified, a one-time movement is carried out to redeploy the sensors at these positions. Energy conservation is also taken into account during the repositioning of sensors. The kMSDVRG algorithm first ensures that the coverage degree required by each region or cluster is satisfied (as illustrated in Figure 4 (b)). Subsequently, it deploys the remaining sensors to appropriate vertices of the VRGs on the basis of energy conservation and the required coverage degree (as illustrated in Figure 4 (c)).


kMSDVRG data structures:
Vertex data structure v(i):
   xv, yv: the coordinates of the vertices of the virtual rhomb grids;
   k: the required coverage degree of sensing or communication for each region;
   kcur: the coverage degree of sensing or communication reached so far during kMSDVRG’s execution;
Sensor data structure s(i):
   xs, ys: the random coordinates of the sensor;
   xd, yd: the destination coordinates of the sensor;
   r: the radius of sensing or communication of sensors;
   state: the state of the sensor (such as active, sleep and off);
   IDv: the identifier of the vertex the sensor belongs to.
Fig. 2. Data structures used in the kMSDVRG algorithm

kMSDVRG pseudocode:
kMSDVRG (the set of sensors: s(j) (j = 1, 2, …, n), the set of vertices: v(i) (i = 1, 2, …, m))
{
   Initialization:
      // Initial coordinates of all sensors of the WSN and of all vertices of the VRGs.
      v(i).k = a random number;
      s(j).r = a designated constant;
      v(i).kcur = 0;
      set every sensor’s state = sleep, its xd, yd to the smallest number, and its IDv to 0;
   // Each sensor is moved to a vertex and does not move any more afterwards. A sensor placed
   // while its vertex still needs coverage becomes active, and the kcur of that vertex is increased by one.
   for (s(j) sensors) {
      for (v(i) vertices) {
         if (v(i).kcur = v(i).k) continue;   // coverage degree of this vertex already reached
         if (v(i).kcur < v(i).k and d(s(j), v(i)) is minimum)
            { move s(j) to v(i); s(j).IDv = i; s(j).state = active; v(i).kcur = v(i).kcur + 1; }
      }
      if coverage degree for all the regions is satisfied {
         for (v(i) vertices) {
            if s(j).state = active continue;
            if d(s(j), v(i)) is minimum
               { move s(j) to v(i); s(j).IDv = i; v(i).kcur = v(i).kcur + 1; }
         }
      }
   }
}
Fig. 3. Pseudocode of the kMSDVRG algorithm
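The following Python sketch is one possible reading of the pseudocode, under stated assumptions: vertices and sensors follow the fields of Fig. 2 (with `idv = None` meaning "not yet assigned"); each sensor in turn is moved to the nearest vertex whose coverage degree is not yet reached and becomes active; and once every vertex has reached its k, each remaining sensor goes to its nearest vertex and stays asleep as a spare. Moves are recorded as destination coordinates only. The names `Vertex`, `Sensor` and `kmsdvrg` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Vertex:
    xv: float
    yv: float
    k: int         # required coverage degree of this vertex's region
    kcur: int = 0  # number of sensors assigned so far


@dataclass
class Sensor:
    xs: float                   # random initial coordinates
    ys: float
    xd: Optional[float] = None  # destination coordinates
    yd: Optional[float] = None
    state: str = "sleep"        # "active", "sleep" or "off"
    idv: Optional[int] = None   # index of the vertex the sensor belongs to


def kmsdvrg(sensors: List[Sensor], vertices: List[Vertex]) -> None:
    """Assign every sensor to a vertex in O(m*n) for m vertices and n sensors."""
    for s in sensors:
        def dist(i: int) -> float:
            return math.dist((s.xs, s.ys), (vertices[i].xv, vertices[i].yv))

        # Phase 1: nearest vertex whose required coverage degree is not yet met.
        open_vertices = [i for i, v in enumerate(vertices) if v.kcur < v.k]
        if open_vertices:
            i = min(open_vertices, key=dist)
            s.state = "active"
        else:
            # Phase 2: all regions satisfied; the spare sensor sleeps at its nearest vertex.
            i = min(range(len(vertices)), key=dist)
        s.xd, s.yd, s.idv = vertices[i].xv, vertices[i].yv, i
        vertices[i].kcur += 1


vertices = [Vertex(0, 0, k=1), Vertex(10, 0, k=1)]
sensors = [Sensor(1, 0), Sensor(2, 0), Sensor(9, 0)]
kmsdvrg(sensors, vertices)
print([(s.idv, s.state) for s in sensors])  # [(0, 'active'), (1, 'active'), (1, 'sleep')]
```

As the example shows, the result depends on the order in which sensors are processed: the sensor at (2, 0) travels to the far vertex, while the closer sensor at (9, 0) arrives later as a spare. The pseudocode does not fix a processing order, so a different ordering could shorten the total movement distance.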

In order to prolong the network lifetime and balance energy consumption among sensors, sleep-scheduling methods are used, which put some sensors off duty and keep others on duty. Because the kMSDVRG algorithm associates each node with a vertex of the VRGs, scheduling takes place among the sensors belonging to the same vertex. The communication distance between these sensors is therefore almost 0, and so is the energy consumed by scheduling communication. For m vertices of VRGs in a sensor field with n sensors deployed, the computational complexity of kMSDVRG is O(mn). Because kMSDVRG ensures that the sensor field is k(i)-covered (i = 1, 2, …, m), its convergence is controlled by k(i); that is, kMSDVRG continues to iterate until kcur(i) = k(i) or all the sensors are deployed at their desired positions.
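The paper does not specify which scheduling policy is used among the sensors attached to one vertex. As an illustrative assumption only, the sketch below rotates duty round-robin: in each round, k of the sensors at a vertex are on duty and the rest sleep, so that over time every sensor carries a similar share of the load. The function name `on_duty` and the sensor identifiers are hypothetical.

```python
from typing import List


def on_duty(sensor_ids: List[int], k: int, round_no: int) -> List[int]:
    """Return the k sensors of one vertex that are active in the given round.

    Round-robin rotation: the window of k active sensors advances by k
    positions each round, wrapping around the list of co-located sensors.
    """
    n = len(sensor_ids)
    if n < k:
        raise ValueError("vertex has fewer sensors than its required coverage degree")
    start = (round_no * k) % n
    return [sensor_ids[(start + j) % n] for j in range(k)]


# A vertex with required coverage degree k = 2 and three co-located sensors.
for t in range(3):
    print(t, on_duty([7, 8, 9], k=2, round_no=t))
# 0 [7, 8]
# 1 [9, 7]
# 2 [8, 9]
```

Over these three rounds each sensor is on duty exactly twice, while the vertex always keeps its required coverage degree of 2.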

4 Simulations

For the simulation, 58 sensors are placed in the sensor field at random. The sensor radius is 5 units and the field measures 30×40 units. kMSDVRG was simulated in Matlab 7.0. The initial sensor locations are shown in Figure 4 (a). Figure 4 (b) shows the sensor deployment at iteration 30, when all the regions have reached their required coverage degrees. Figure 4 (c) shows the final deployment of all the sensors. The figures show that the sensor deployment becomes uniform, and Figure 4 (c) shows that there are no “holes” in the field. Figure 5 (a) shows the required coverage degree of each region, and Figure 5 (b) shows the sleep sensors that kMSDVRG finally distributes to each region. Furthermore, the kMSDVRG algorithm yields an MCDS, namely MCDS = {sensors | the sensors are at the vertices of the virtual rhomb grids}. Because each vertex has k(i) sensor nodes, kMSDVRG provides an MCDS in which each node has a different weight, namely the coverage degree of its region. Thus k(i) = 1 gives single coverage and k(i) = k gives multiple (k-) coverage.

Fig. 4. (a) Initial sensor deployment at random, (b) when coverage degrees of all the regions are satisfied (iteration 30), (c) after all the sensors are redeployed (iteration 58)

Fig. 5. (a) the required coverage degree of each region, (b) the distributed sleep sensors of each region

5 Conclusion

In this paper, we proposed a k-variable movement-assisted sensor deployment based on virtual rhomb grid (kMSDVRG), which integrates deterministic and self-organizing deployment in a unified framework. The kMSDVRG algorithm has the following characteristics: (1) it forms an MCDS, which makes energy conservation more effective and gives the MCDS k-robustness compared with one formed by single nodes; (2) it leaves no “holes” in the sensor field, which ensures full and seamless coverage for sensing and communication; (3) k can vary with the application; (4) k can vary with the region within the same application. The flexibility of (3) and (4) allows the network to self-configure for a wide range of applications.

References
[1] Y. Zou and K. Chakrabarty: “Sensor Deployment and Target Localization Based on Virtual Forces,” in Proceedings of IEEE INFOCOM, March 2003.
[2] G. Wang, G. Cao, and T. La Porta: “Movement-Assisted Sensor Deployment,” in Proceedings of IEEE INFOCOM, 2004.
[3] X. Wang, W. Yang, and Y. Yang: “Research on Coverage Problem of Wireless Sensor Networks,” in Proceedings of the International Conference on Advanced Design and Manufacture (ADM2006), Harbin, China, 2006, pp. 543-546.

Toward Self-Managed Networks?

The successes achieved by self-* techniques (self-organization, self-healing, etc.) in biology, distributed artificial intelligence and robotics, to name only a few, have intrigued many researchers in a variety of fields. Some people are now convinced that these techniques may also be the answer to the increasingly difficult problem of designing and managing today’s networks. The complexity of these networks stems from their large scale, the frequent hardware and software changes that they undergo, the high heterogeneity of their components, the complex dependencies between these components, the variety of services offered to end-users, the short time-to-market of new services and technologies, the resulting lack of debugging, etc. Can this complexity be reduced by leveraging self-management? Does the industry believe in the future of self-management in general, and of self-managed networks in particular?

Panelists:
– David Bartlett, IBM Corporation, USA
– Bruno Klauser, Cisco, Switzerland
– Huw Oliver, Ericsson, Ireland
– Fabrice Saffre, BT, UK

Panel Chair: Jean-Philippe Martin-Flatin, University of Quebec in Montreal, Canada

A. Keller and J.-P. Martin-Flatin (Eds.): SelfMan 2006, LNCS 3996, p. 184, 2006. © Springer-Verlag Berlin Heidelberg 2006

Author Index

Adam, Constantin 1
Babaoglu, Ozalp 43
Balch, Tucker 116
Bobek, Andreas 157
Carchiolo, Vincenza 171
Cardoso, Leonardo 87
Cunningham, Raymond 73
Dargie, Waltenegus 102
Dowling, Jim 73
Eide, Viktor S. Wold 15
Eliassen, Frank 15
Gjørven, Eli 15
Hagen, Philipp 157
Iwanicki, Konrad 28
Jesi, Gian Paolo 43
Kis, Zoltán Lajos 175
Loques, Orlando 87
Lund, Ketil 15
Malgeri, Michele 171
Mamei, Marco 58
Mangioni, Giuseppe 171
McKinley, Philip K. 130
Meier, René 73
Montresor, Alberto 43
Nathuji, Ripal 116
Németh, László Harri 175
Nicosia, Vincenzo 171
O’Hara, Keith J. 116
Raj, Himanshu 116
Rak, Jacek 142
Reichenbach, Frank 157
Sacha, Jan 73
Sadjadi, S. Masoud 130
Samimi, Farshad A. 130
Schwan, Karsten 116
Seshasayee, Balasubramanian 116
Simon, Csaba 175
Stadler, Rolf 1
Staehli, Richard 15
Sztajnberg, Alexandre 87
Timmermann, Dirk 157
van Steen, Maarten 28
Voulgaris, Spyros 28
Xueqing, Wang 179
YongTian, Yang 179
Zambonelli, Franco 58