Next Generation Teletraffic and Wired/Wireless Advanced Networking: 7th International Conference, NEW2AN 2007, St. Petersburg, Russia, September ... (Lecture Notes in Computer Science, 4712) 3540748326, 9783540748328

This book constitutes the refereed proceedings of the 7th International Conference on Next Generation Teletraffic and Wired/Wireless Advanced Networking, NEW2AN 2007, held in St. Petersburg, Russia, in September 2007.


English, 499 pages, 2007


Table of Contents:
Title Page
Preface
Organization
Table of Contents
Effects of Spatial Aggregation on the Characteristics of Origin-Destination Pair Traffic in Funet
Introduction
Measurements and Original Data
Magnitudes of OD Pairs
Diurnal Variation of the OD Pair Traffic
Gaussianity
Mean-Variance Relation
Conclusion
References
Users Dimensioning and Traffic Modelling in Rural e-Health Services
Introduction
Description of the e-Health Rural Scenario
Use Cases
Traffic Descriptors
Parameters Optimization
SF Services
RT Services
Users Dimensioning
Traffic Modelling
Discussion and Conclusions
References
Empirical Observations of Traffic Patterns in Mobile and IP Telephony
Introduction
Traffic Measurement Data
Traffic Demand Characterisation
Traffic Intensities
Call Holding Times
Traffic Demand Profiles
Closing Comments
References
On the Number of Losses in an MMPP Queue
Introduction
Arrival Process and Queueing System
Number of Losses
Numerical Example
Conclusions
References
On-Line State Detection in Time-Varying Traffic Patterns
Introduction
Time-Varying Nature of Aggregated Traffic
Service Configuration
Traffic Aggregate: Varying Number of Sources
Traffic Aggregate: Fixed Number of Sources
Model for Aggregated Traffic
Change-Point Statistical Tests
Basic Principles
Change in the Mean Value
EWMA Control Charts
Numerical Examples
Aggregated Traffic: Fixed Number of Sources
Aggregated Traffic: Varying Number of Sources
Conclusions
References
The Drop-From-Front Strategy in AQM
Introduction
Active Queue Management
Self-similarity of Network Traffic
Analytical and Simulation Models of RED and DSRED
Numerical Results
Conclusions
References
TCP Congestion Control over 3G Communication Systems: An Experimental Evaluation of New Reno, BIC and Westwood+
Introduction
Related Work
Live UMTS Network Performance Evaluation
TCP Congestion Control Algorithms
Experimental Testbed
Experimental Results
Goodput, Link Utilization and Fairness
RTT and Queuing Time
Timeouts and Packet Retransmission Percentage
Goodput Versus File Size
Discussion of Results
Conclusions
References
Cross-Layer Enhancement to TCP Slow-Start over Geostationary Bandwidth on Demand Satellite Networks
Introduction
Enhancing TCP for Satellite Networks
Bandwidth-on-Demand Geostationary Satellite System
Cross-Layer Enhancement for TCP Slow-Start Phase
Problem Statement
Algorithm of Cross-Layer Enhanced Slow-Start Phase
Performance Evaluation
Simulation Setup
Preliminary Inspection of the Mechanism Behavior
Impact to File Transfer Sessions
Multiple Competing Connections Scenario
Lossy Satellite Link Scenario
Summary and Conclusions
References
TCP Performance over Cluster-Label-Based Routing Protocol for Mobile Ad Hoc Networks
Introduction
An Overview of Cluster-Label-Based Mechanism for Backbone
Cluster-Label-Based Routing Protocol
Performance Evaluations
Simulation Model
Accuracy of Simulations
Simulation Results
Conclusions
References
An Analytic Model of IEEE 802.16e Sleep Mode Operation with Correlated Traffic
Introduction
Queueing Model
Buffer Analysis
Derivation of $V(z)$
Derivation of $U_c(z)$
Derivation of $U(z)$
Mean Queue Content and Mean Delay
Energy Consumption
Numerical Examples
References
Real Life Field Trial over a Pre-mobile WiMAX System with 4th Order Diversity
Introduction
System Setup
System Description
Measurement Area
Measurements
Physical Performance
Received Signal Strength Indicator
Signal to Noise Ratio
Throughput Performance
Diversity Impact
Received Signal Strength Indicator
Modulation Rate
Signal to Noise Ratio
Throughput
Conclusion
References
On Evaluating a WiMAX Access Network for Isolated Research and Data Networks Using NS-2
Introduction
The WEIRD System, Objectives and Architecture
Simulating the WEIRD Scenario 1: Forest Fire Prevention
Simulating the WEIRD Scenario A3: Monitoring Volcanic Unrest
Conclusions
References
Performance Evaluation of the IEEE 802.16 ARQ Mechanism
Introduction
IEEE 802.16 ARQ Mechanism
Basics of the ARQ Mechanism
ARQ Feedback Types
Choosing the Feedback Type
Scheduling of ARQ Feedbacks and Retransmissions
ARQ Block Rearrangement
ARQ Transmission Window and ARQ Block Size
Simulation
General ARQ Results
ARQ Block Rearrangement
ARQ Feedback Types
ARQ Transmission Window
Conclusions
References
Evaluating Differentiated Quality of Service Parameters in Optical Packet Switching
Introduction
Switch Architecture and Model
Efficient Simulation Technique
Importance Sampling
Adaptive Change of Measure
Search and Update Procedure
Numerical Results
Conclusion
References
GESEQ: A Generic Security and QoS Model for Traffic Priorization over IPSec Site to Site Virtual Private Networks
Introduction
IPSec Virtual Private Networks
Quality of Service
GESEQ: Generic Security and QoS Model
Security
Quality of Service
Management
Test Scenario and Implementation
Security
Quality of Service
Management
Results Analysis
Latency
Jitter
Packet Loss
Conclusion
References
Routes Building Approach for Multicast Applications in Metro Ethernet Networks
Introduction
Metro Ethernet
Metro Ethernet Features
Spanning Tree Protocols
An Efficient Approach of Spanning Tree Building for Multicast Traffic
Conclusions
References
Performance Modelling and Evaluation of Wireless Multi-access Networks
Introduction
General Setting
GSM/EDGE Model
UMTS/HSDPA Model
802.11a WLAN Model
Numerical Experiments
Concluding Remarks
References
Analysis of a Cellular Network with User Redials and Automatic Handover Retrials
Introduction
System Description
System Model and Performance Analysis
Results and Discussion
Approximate Methodology
Redimensioning with Redials
Impact of Automatic Retrial Configuration
Distribution of the Maximum Number and Time Between Reattempts
Conclusions
References
Stochastic Optimization Algorithm Based Dynamic Resource Assignment for 3G Systems
Introduction
Dynamic OVSF Code Allocation Using GA and SA
GA Based Dynamic OVSF Code Assignment Scheme
SA Based Dynamic OVSF Code Assignment Scheme
Computer Simulations
Conclusion
References
Adaptive Resource Reservation for Efficient Resource Utilization in the Wireless Multimedia Network
Introduction
The Adaptive Resource Reservation Scheme
The Concept of Adaptive Resource Reservation Scheme
The Probability of Handoff Occurrence
The Crossing Probability per Neighbor Cell
Handoff Probability Prediction with Filtering Algorithm
The Adaptive Resource Reservation Procedure
The Performance Evaluation
Conclusion and Future Work
References
A Discrete-Time Queueing Model with a Batch Server Operating Under the Minimum Batch Size Rule
Introduction
Model Description
Analysis
Optimization of MBS
Single-Slot Service Times
Geometric Service Times
Deterministic Service Times of $m$ Slots
Conclusion
References
Derivatives of Blocking Probabilities for Multi-service Loss Systems and Their Applications
Introduction
Model Description
Convolution Algorithm
Algorithm for Derivatives of Individual Blocking
Examples of Usage of Derivatives for Multi-service Loss Systems
Conclusion
References
Rare Events of Gaussian Processes: A Performance Comparison Between Bridge Monte-Carlo and Importance Sampling
Introduction
Setting of the Problem
Definition of IS Estimators
The BMC Approach
Comparison Between IS and BMC Estimators
Simulation Results
Conclusions
References
A Forwarding Spurring Protocol for Multihop Ad Hoc Networks (FURIES)
Introduction
FURIES General Description
FURIES Entities
Incentive Factor $IF$
FURIES Credit-Based Protocol
Initialization
Micropayment Scheme
Charging and Rewarding Model
Evaluation
Conclusions
References
Direct Conversion Transceivers as a Promising Solution for Building Future Ad-Hoc Networks
Introduction
Main DSA Challenges
Main System Architectural Features
Test Results
Conclusion
References
Location Tracking for Wireless Sensor Networks
Introduction
Related Work
The Proposed Scheme
Assumptions
Scheme Description
Performance Evaluation
Performance Metrics
Simulation Environment
Simulation Results
Conclusions
References
An Incentive-Based Forwarding Protocol for Mobile Ad Hoc Networks with Anonymous Packets
Introduction
Forwarding Model and Selfish Behavior
LAC Game
Reachability of Nash Equilibria
D&F Protocol
LAC Game Payoffs Under D&F
Conclusion
References
Providing Seamless Mobility Using the FOCALE Autonomic Architecture
Introduction
The Vision of Seamless Mobility
Salient Features of Autonomic Networking
The FOCALE Architecture
FOCALE’s Use and Adaptation of Multiple Control Loops
FOCALE’s Behavioral Orchestration
FOCALE’s Management Data and Command Normalization
FOCALE’s Context-Driven Policy Management
FOCALE’s Use of Machine Learning and Reasoning
Applying FOCALE to Wired/Wireless Network Management
Using FOCALE to Implement Seamless Mobility
Conclusions
References
Evaluation of Joint Admission Control and VoIP Codec Selection Policies in Generic Multirate Wireless Networks
Introduction
A Multirate Scenario
VoIP Codec-Rate Adaptation
Admission Control
VoIP Codec-Rate Selection Policies
Performance Results
Is There a Right Policy?
Conclusions
References
A Novel Inter-LMD Handoff Mechanism for Network-Based Localized Mobility Management
Introduction
Network-Based Localized Mobility Management
Protocol Overview
Intra-LMD Handoff
Inter-LMD Handoff with the Current Protocol
Novel Inter-LMD Handoff Mechanism
Performance Evaluation and Discussion
Performance of the Handoff Signaling
Performance of the Packet Delivery
Numerical Results and Discussion
Conclusion
References
Improvement of Link Cache Performance in Dynamic Source Routing (DSR) Protocol by Using Active Packets
Introduction
DSR: Dynamic Source Routing
Caching
Related Work
Active Packets Approach
Overview
Format of Active Packet
Initialization
Performance Evaluation
Conclusions
References
tinyLUNAR: One-Byte Multihop Communications Through Hybrid Routing in Wireless Sensor Networks
Introduction
Problem Statement: Design Objectives for tinyLUNAR
Networking Model of WSN
Routing in Wireless Sensor Networks
Design Objectives for tinyLUNAR
From LUNAR to tinyLUNAR: Solution Outline
Packet Forwarding Via Label Switching
tinyLUNAR: Protocol Description
Modified Selector Structure
Generation of Incoming Labels
Path Establishment in tinyLUNAR
Implementation Details and Memory Footprint
Related Work
Discussion and Future Developments
Addressing and Routing for WSN
Future Development
Conclusions
References
On the Optimality and the Stability of Backoff Protocols
Introduction
Protocol
Related Work
Analysis
Model with Unbounded Backoff Counter
System Load
Expected Transmission Time
Stability Condition
Optimality Condition
Elimination of the Saturation Conditions
Model with Bounded Backoff Counter
Application to the Ethernet Case
Conclusions
References
Maximum Frame Size in Large Layer 2 Networks
Introduction
Carrier Grade Ethernet
IEEE Approach
IETF Approach
The MTU Problem
Possible Solutions for the MTU Problem
Gratuitous ICMP Unreachable Type Fragment Needed
Modification of ARP/ND
Comparison of Gratuitous ICMP Fragment Needed and Modified ARP Approaches
Conclusion
References
Analysis of Medium Access Delay and Packet Overflow Probability in IEEE 802.11 Networks
Introduction
Medium Access Delay
The Packet Transmission Process of a Terminal
The Background Traffic Analysis
Medium Access Delay Analysis
Analysis of an Unsaturated Terminal
Numerical Studies
Conclusions
References
Communications Challenges in the Celtic-BOSS Project
Introduction
State-of-the-Art Situation
State-of-the-Art on Dual Mobility for Transmission with QoS
State-of-the-Art on Wireless Mobile Transmission Links
Relevance to Market Needs and Expected Impact
Technological Innovation and Strategic Relevance
Conclusions
References
Performance Analysis of the REAchability Protocol for IPv6 Multihoming
Introduction
Failure Detection and Path Exploration in the SHIM6 Architecture
Simulation Setup
Analysis of the Results
UDP Behavior
TCP Behavior
Conclusion and Future Work
References
Controlling Incoming Connections Using Certificates and Distributed Hash Tables
Introduction
Requirements and Related Work
Recipient Controlled Session Management Protocol
Controlling Incoming Connections
Revocation of Rights
Providing a Packet Level Security
Analysis and Comparison with Other Solutions
Conclusions and Future Work
References
Design and Implementation of an Open Source IMS Enabled Conferencing Architecture
Introduction
Context and Motivation
The IMS Architecture
An IMS Compliant Video Conferencing Architecture
CONFIANCE: An Open Source Implementation of the Conferencing Architecture
Server Side Components
Client Side Components
Related Work
Conclusions and Future Work
References
Author Index


Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board

David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Moshe Y. Vardi, Rice University, Houston, TX, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany

4712

Yevgeni Koucheryavy Jarmo Harju Alexander Sayenko (Eds.)

Next Generation Teletraffic and Wired/Wireless Advanced Networking 7th International Conference, NEW2AN 2007 St. Petersburg, Russia, September 10-14, 2007 Proceedings


Volume Editors

Yevgeni Koucheryavy
Jarmo Harju
Tampere University of Technology, Institute of Communications Engineering
Korkeakoulunkatu 1, 33720 Tampere, Finland
E-mail: {yk, harju}@cs.tut.fi

Alexander Sayenko
Nokia Research Center, Computation Structures
Itämerenkatu 11-13, 00180 Helsinki, Finland
E-mail: [email protected]

Library of Congress Control Number: 2007934040
CR Subject Classification (1998): C.2, C.4, H.4, D.2, J.1, K.6, K.4
LNCS Sublibrary: SL 5 – Computer Communication Networks and Telecommunications

ISSN 0302-9743
ISBN-10 3-540-74832-6 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-74832-8 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com

© Springer-Verlag Berlin Heidelberg 2007
Printed in Germany

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper
SPIN: 12120567 06/3180 543210

Preface

We welcome you to the proceedings of the seventh NEW2AN 2007 (Next Generation Teletraffic and Wired/Wireless Advanced Networking), held in St. Petersburg, Russia. Significant contributions have been made in various aspects of networking in next-generation teletraffic. The presented topics encompassed several layers of communication networks: from the physical layer to transport protocols and the modeling of new services. New and innovative developments for enhanced signaling protocols, QoS mechanisms, cross-layer optimization, and traffic characterization were also addressed within the program. In particular, issues of QoS in wireless and IP-based multi-service networks were dealt with, as well as financial aspects of future networks. It is also worth mentioning the emphasis placed on wireless networks, including, but not limited to, cellular networks, wireless local area networks, personal area networks, mobile ad hoc networks, and sensor networks.

The call for papers attracted 113 papers from 29 countries, resulting in an acceptance ratio of 35%. With the help of the excellent Technical Program Committee and a number of associated reviewers, the best 39 high-quality papers were selected for publication. The conference was organized in 13 single-track sessions. The Technical Program of the conference benefited from two keynote speakers: Rod Walsh, NOKIA, Finland, and Saverio Mascolo, Politecnico di Bari, Italy. Moreover, a panel session on emerging wireless technologies, organized and moderated by Alexander Sayenko, NOKIA, Finland, brought the wireless domain of the conference to a new level.

We wish to sincerely thank the Technical Program Committee members and associated reviewers for their hard work and important contribution to the conference. This year the conference was organized in cooperation with ITC (International Teletraffic Congress), IEEE, and COST 290, with the support of NOKIA (Finland) and BalticIT Ltd. (Russia).
The support of these organizations is gratefully acknowledged. Finally, we wish to thank the many people who contributed to the NEW2AN organization. In particular, Jakub Jakubiak (TUT) carried a substantial load handling submissions and reviews and maintaining the Web site, and he did an excellent job compiling the camera-ready papers and interacting with Springer. Sergei Semenov (NOKIA) is to be thanked for his great efforts in linking the conference to industry. Many thanks go to Natalia Zaharova (Monomax Meetings & Incentives) for her excellent local organization efforts and the preparation of the conference's social program.

We believe that the work done for the seventh NEW2AN conference provided an interesting and up-to-date scientific experience. We hope that all the participants enjoyed the technical and social conference program, Russian hospitality, and the beautiful city of St. Petersburg.

September 2007

Yevgeni Koucheryavy Jarmo Harju Alexander Sayenko

Organization

International Advisory Committee

Ian F. Akyildiz (Georgia Institute of Technology, USA)
Nina Bhatti (Hewlett Packard, USA)
Igor Faynberg (Alcatel Lucent, USA)
Jarmo Harju (Tampere University of Technology, Finland)
Andrey Koucheryavy (ZNIIS R&D, Russia)
Villy B. Iversen (Technical University of Denmark, Denmark)
Paul Kühn (University of Stuttgart, Germany)
Kyu Ouk Lee (ETRI, Korea)
Mohammad S. Obaidat (Monmouth University, USA)
Michael Smirnov (Fraunhofer FOKUS, Germany)
Manfred Sneps-Sneppe (Ventspils University College, Latvia)
Ioannis Stavrakakis (University of Athens, Greece)
Sergey Stepanov (Sistema Telecom, Russia)
Phuoc Tran-Gia (University of Würzburg, Germany)
Gennady Yanovsky (State University of Telecommunications, Russia)

Technical Program Committee

Mari Carmen Aguayo-Torres (University of Malaga, Spain)
Ozgur B. Akan (METU, Turkey)
Khalid Al-Begain (University of Glamorgan, UK)
Tricha Anjali (Illinois Institute of Technology, USA)
Konstantin Avrachenkov (INRIA, France)
Francisco Barcelo (UPC, Spain)
Thomas M. Bohnert (University of Coimbra, Portugal)
Torsten Braun (University of Bern, Switzerland)
Georg Carle (University of Tübingen, Germany)
Chrysostomos Chrysostomou (University of Cyprus, Cyprus)
Ibrahim Develi (Erciyes University, Turkey)
Roman Dunaytsev (Tampere University of Technology, Finland)
Eylem Ekici (Ohio State University, USA)
Sergey Gorinsky (Washington University in St. Louis, USA)
Markus Fidler (NTNU Trondheim, Norway)
Giovanni Giambene (University of Siena, Italy)
Stefano Giordano (University of Pisa, Italy)
Ivan Ganchev (University of Limerick, Ireland)


Andrei Gurtov (HIIT, Finland)
Vitaly Gutin (Popov Society, Russia)
Martin Karsten (University of Waterloo, Canada)
Andreas Kassler (Karlstad University, Sweden)
Maria Kihl (Lund University, Sweden)
Vitaly Kondratiev (Baltic-IT, Russia)
Tatiana Kozlova Madsen (Aalborg University, Denmark)
Yevgeni Koucheryavy (Tampere University of Technology, Finland) (Chair)
Jae-Young Kim (Purdue University, USA)
Jong-Hyouk Lee (Sungkyunkwan University, Korea)
Vitaly Li (Kangwon National University, Korea)
Lemin Li (University of Electronic Science and Technology of China, China)
Leszek T. Lilien (Western Michigan University, USA)
Saverio Mascolo (Politecnico di Bari, Italy)
Maja Matijašević (University of Zagreb, FER, Croatia)
Paulo Mendes (DoCoMo Euro-Labs, Germany)
Ilka Miloucheva (Salzburg Research, Austria)
Dmitri Moltchanov (Tampere University of Technology, Finland)
Edmundo Monteiro (University of Coimbra, Portugal)
Seán Murphy (University College Dublin, Ireland)
Marc Necker (University of Stuttgart, Germany)
Mairtin O'Droma (University of Limerick, Ireland)
Jaudelice Cavalcante de Oliveira (Drexel University, USA)
Evgeni Osipov (RWTH Aachen, Germany)
George Pavlou (University of Surrey, UK)
Simon Pietro Romano (Università degli Studi di Napoli "Federico II", Italy)
Stoyan Poryazov (Bulgarian Academy of Sciences, Bulgaria)
Alexander Sayenko (University of Jyväskylä, Finland)
Dirk Staehle (University of Würzburg, Germany)
Sergei Semenov (NOKIA, Finland)
Burkhard Stiller (University of Zürich and ETH Zürich, Switzerland)
Weilian Su (Naval Postgraduate School, USA)
Veselin Rakocevic (City University London, UK)
Dmitry Tkachenko (IEEE St. Petersburg BT/CE/COM Chapter, Russia)
Vassilis Tsaoussidis (Demokritos University of Thrace, Greece)
Christian Tschudin (University of Basel, Switzerland)
Kurt Tutschku (University of Würzburg, Germany)
Lars Wolf (Technische Universität Braunschweig, Germany)
Linda J. Xie (University of North Carolina, USA)


Additional Reviewers

A. Akan, A. Amirante, F. Araujo, M. Bechler, B. Bellalta, A. Binzenhoefer, C. Blankenhorn, J. Brandt, M. Bredel, G. Bruck, F. Cercas, M. Ciurana, S. D'Antonio, L. De Cicco, M. Dick, S. Enoch, J.T. Entrambasaguas, C. Esli, D. Ficara, A. Fonte, I. Goldberg, L. Grieco, X. Gu, S. Gundavelli, A. Gutscher, R. Henjes, T. Hoßfeld, T. Hossmann, J. Jakubiak, I. Jawhar, R. Jurdak, A. Kuwadekar, M. Kaschub, D. Kyung Kim, E.S. Lohan, F.J. Lopez-Martinez, S. Luna, M. Maggiora, A. Malkov, I. Martin-Escalona, L. Martucci, D. Milic, L. Miniero, C. Mueller, J. Munilla, A. Ruzzelli, J.J. Sanchez Sanchez, L. Servi, D.M. Shila, J. Silva, T. Ozsahin, V. Palmisano, P. Papadimitriou, I. Psaras, J. Riihijärvi, D. Schlosser, T. Staub, B. Soret, A. Spedalieri, G. Stette, L. Tavanti, A. Tsioliaridou, N. Vassileva, F. Velez, M. Waelchli, O. Wellnitz, L. Wood, M. Wulff, H. Xiong, C.Y. Yang, S. Yerima, I.P. Zarko, E. Zola


Table of Contents

Teletraffic I

Effects of Spatial Aggregation on the Characteristics of Origin-Destination Pair Traffic in Funet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ilmari Juva, Riikka Susitaival, Markus Peuhkuri, and Samuli Aalto

1

Users Dimensioning and Traffic Modelling in Rural e-Health Services . . . I. Martínez, J. García, and E. Viruete

13

Empirical Observations of Traffic Patterns in Mobile and IP Telephony . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Poul E. Heegaard

26

Teletraffic II

On the Number of Losses in an MMPP Queue . . . . . . . . . . . . . . . . . . . . . . . Andrzej Chydzinski, Robert Wojcicki, and Grzegorz Hryn

38

On-Line State Detection in Time-Varying Traffic Patterns . . . . . . . . . . . . . D. Moltchanov

49

The Drop-From-Front Strategy in AQM . . . . . . . . . . . . . . . . . . . . . . . . . . . . Joanna Domańska, Adam Domański, and Tadeusz Czachórski

61

TCP Protocol in Wireless Systems

TCP Congestion Control over 3G Communication Systems: An Experimental Evaluation of New Reno, BIC and Westwood+ . . . . . . . . . . Luca De Cicco and Saverio Mascolo

73

Cross-Layer Enhancement to TCP Slow-Start over Geostationary Bandwidth on Demand Satellite Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . Wei Koong Chai and George Pavlou

86

TCP Performance over Cluster-Label-Based Routing Protocol for Mobile Ad Hoc Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Vitaly Li and Hong Seong Park

99


WiMAX

An Analytic Model of IEEE 802.16e Sleep Mode Operation with Correlated Traffic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Koen De Turck, Stijn De Vuyst, Dieter Fiems, and Sabine Wittevrongel

Real Life Field Trial over a Pre-mobile WiMAX System with 4th Order Diversity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pål Grønsund, Paal Engelstad, Moti Ayoun, and Tor Skeie

On Evaluating a WiMAX Access Network for Isolated Research and Data Networks Using NS-2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Thomas Michael Bohnert, Jakub Jakubiak, Marcos Katz, Yevgeni Koucheryavy, Edmundo Monteiro, and Eugen Borcoci

Performance Evaluation of the IEEE 802.16 ARQ Mechanism . . . . . . . . . Vitaliy Tykhomyrov, Alexander Sayenko, Henrik Martikainen, Olli Alanen, and Timo Hämäläinen

109

121

133

148

QoS Topics in Fixed Networks

Evaluating Differentiated Quality of Service Parameters in Optical Packet Switching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Poul E. Heegaard and Werner Sandmann

162

GESEQ: A Generic Security and QoS Model for Traffic Priorization over IPSec Site to Site Virtual Private Networks . . . . . . . . . . . . . . . . . . . . . Jesús A. Pérez, Victor Zárate, and Angel Montes

175

Routes Building Approach for Multicast Applications in Metro Ethernet Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Anatoly M. Galkin, Olga A. Simonina, and Gennady G. Yanovsky

187

Wireless Networking I

Performance Modelling and Evaluation of Wireless Multi-access Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Remco Litjens, Ljupco Jorguseski, and Mariya Popova

Analysis of a Cellular Network with User Redials and Automatic Handover Retrials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jose Manuel Gimenez-Guzman, Ma Jose Domenech-Benlloch, Vicent Pla, Vicente Casares-Giner, and Jorge Martinez-Bauset

Stochastic Optimization Algorithm Based Dynamic Resource Assignment for 3G Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mustafa Karakoc and Adnan Kavak

194

210

223


Adaptive Resource Reservation for Efficient Resource Utilization in the Wireless Multimedia Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Seungwoo Jeon, Hanjin Lee, and Hyunsoo Yoon


235

Teletraffic III

A Discrete-Time Queueing Model with a Batch Server Operating Under the Minimum Batch Size Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dieter Claeys, Joris Walraevens, Koenraad Laevens, and Herwig Bruneel

248

Derivatives of Blocking Probabilities for Multi-service Loss Systems and Their Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . V.B. Iversen and S.N. Stepanov

260

Rare Events of Gaussian Processes: A Performance Comparison Between Bridge Monte-Carlo and Importance Sampling . . . . . . . . . . . . . . . Stefano Giordano, Massimiliano Gubinelli, and Michele Pagano

269

AdHoc Networks I

A Forwarding Spurring Protocol for Multihop Ad Hoc Networks (FURIES) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

281

Direct Conversion Transceivers as a Promising Solution for Building Future Ad-Hoc Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Oleg Panfilov, Antonio Turgeon, Ron Hickling, and Lloyd Linder

294

Location Tracking for Wireless Sensor Networks . . . . . . . . . . . . . . . . . . . . . Kil-Woong Jang

306

An Incentive-Based Forwarding Protocol for Mobile Ad Hoc Networks with Anonymous Packets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jerzy Konorski

316

Wireless Networking II

Providing Seamless Mobility Using the FOCALE Autonomic Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . John Strassner, Dave Raymer, and Srini Samudrala

330

Evaluation of Joint Admission Control and VoIP Codec Selection Policies in Generic Multirate Wireless Networks . . . . . . . . . . . . . . . . . . . . . . B. Bellalta, C. Macian, A. Sfairopoulou, and C. Cano

342


A Novel Inter-LMD Handoff Mechanism for Network-Based Localized Mobility Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Joong-Hee Lee, Jong-Hyouk Lee, and Tai-Myoung Chung

356

AdHoc Networks II

Improvement of Link Cache Performance in Dynamic Source Routing (DSR) Protocol by Using Active Packets . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dimitri Marandin

367

tinyLUNAR: One-Byte Multihop Communications Through Hybrid Routing in Wireless Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Evgeny Osipov

379

Wireless Topics

On the Optimality and the Stability of Backoff Protocols . . . . . . . . . . . . . Andrey Lukyanenko

393

Maximum Frame Size in Large Layer 2 Networks . . . . . . . . . . . . . . . . . . . . Karel Slavicek

409

Analysis of Medium Access Delay and Packet Overflow Probability in IEEE 802.11 Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gang Uk Hwang

419

EU Projects Experience

Communications Challenges in the Celtic-BOSS Project . . . . . . . . . . . . . . Gábor Jeney, Catherine Lamy-Bergot, Xavier Desurmont, Rafael López da Silva, Rodrigo Álvarez García-Sanchidrián, Michel Bonte, Marion Berbineau, Márton Csapodi, Olivier Cantineau, Naceur Malouch, David Sanz, and Jean-Luc Bruyelle

431

NGN Topics

Performance Analysis of the REAchability Protocol for IPv6 Multihoming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Antonio de la Oliva, Marcelo Bagnulo, Alberto García-Martínez, and Ignacio Soto

Controlling Incoming Connections Using Certificates and Distributed Hash Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dmitrij Lagutin and Hannu H. Kari

443

455


Design and Implementation of an Open Source IMS Enabled Conferencing Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A. Buono, T. Castaldi, L. Miniero, and S.P. Romano

468

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

481

Effects of Spatial Aggregation on the Characteristics of Origin-Destination Pair Traffic in Funet

Ilmari Juva, Riikka Susitaival, Markus Peuhkuri, and Samuli Aalto

Networking Laboratory, Helsinki University of Technology, P.O. Box 3000, FI-02015 TKK, Finland
[email protected]

Abstract. In this paper we analyze measurements from the Finnish University Network (Funet) and study the effect of spatial aggregation on the origin-destination flows. The traffic is divided into OD pairs based on IP addresses, using different prefix lengths to obtain data sets with various aggregation levels. We find that typically the diurnal pattern of the total traffic is followed more closely by the OD pairs as their volume increases, but there are many exceptions. The Gaussian assumption holds well for all OD pairs when the aggregation level is high enough, and we find an approximate threshold for OD pair traffic volume after which they tend to be Gaussian. The functional mean-variance relation also holds better when the aggregation level is higher. Keywords: Measurements, Traffic characterization, Gaussianity, Mean-variance relation.

1 Introduction

Origin-Destination (OD) pair traffic refers to the traffic flow that traverses between two nodes in a network. Depending on the aggregation level, these nodes can be, for example, hosts, routers, or ISPs. The main feature of measuring OD pair traffic is that traffic has to be aggregated both in time and space. Diurnal variation of Internet traffic is usually studied at a coarse level of temporal aggregation, with sample intervals of some minutes, whereas packet-level dynamics have to be studied at a very fine granularity of time. When aggregation in space is considered, traffic flowing between two hosts is an example of a very fine level of spatial aggregation, whereas ISP-level studies represent coarse-grained aggregation. In many areas of traffic engineering, the nature of OD pair traffic plays an important role. For example, in load balancing, shares of OD pair traffic are moved from one route to another. The idea of traffic matrix estimation is to estimate the OD traffic flows from the measured link flows. The existing estimation techniques make several assumptions about the OD pair traffic, including Gaussianity, a functional mean-variance relationship, and independence of the traffic samples. Evidently, the validity of these assumptions in real traffic traces depends on both the temporal and the spatial level of aggregation. Few papers have studied the characteristics of OD pair traffic earlier. First, Feldmann et al. [3] characterize point-to-multipoint traffic and find that a few demands account

Corresponding author.

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 1–12, 2007. © Springer-Verlag Berlin Heidelberg 2007


for 80% of total traffic and that the traffic volumes follow Zipf's law. Daily profiles of the greatest demands also vary significantly from each other. Bhattacharyya et al. characterize Point of Presence-level (POP) and access-link-level traffic dynamics in [4]. They also find that there are huge differences in the traffic volumes of the demands. In addition, the larger the traffic volume of an egress node, the larger the variability of its traffic during the day. Finally, Lakhina et al. [5] analyze the traffic of two backbone networks. Using Principal Component Analysis (PCA) they demonstrate that OD flows can be approximated by a linear combination of a small number of so-called eigenflows. In addition, they observe that these eigenflows fall into three categories: deterministic, spiky and noisy. We have also previously studied the characteristics of measured Funet link traffic: in [7] we studied the characteristics of the aggregate link traffic and in [8] OD pair traffic at a fixed spatial aggregation level. Even though the aforementioned measurement studies answer some questions related to OD pair traffic, a full understanding of how spatial aggregation changes the characteristics of OD pair traffic is still missing. To this end, in this paper we study the effect that aggregation in space has on the OD pair traffic characteristics. Thus, the main contribution of this paper is to locate boundaries for certain traffic behavior characteristics as a function of the aggregation level. The traffic of the studied link in the Funet network is divided into OD pairs with different prefix lengths. Traffic characteristics are often analyzed on short time scales. We take the vantage point of traffic engineering and traffic matrix estimation, in which the relevant time scale is minutes, instead of seconds or less.
We show that while the diurnal pattern of the OD pairs is not always the same as the diurnal pattern of the total traffic, the correlation generally improves as the OD pair's traffic volume increases. The Gaussian assumption, on the other hand, is shown to hold well for all OD pairs exceeding a certain size. For the relation between mean and variance we found that the larger the aggregation level, the better the relation holds.

2 Measurements and Original Data

The traffic traces of this paper were captured by Endace DAG 4.23 cards from a 2.5 Gbit/s STM-16 link connecting nodes csc0-rtr and helsinki0-rtr in the Funet network^1. The link is bidirectional and we denote the direction from helsinki0-rtr to csc0-rtr by d0 and the opposite direction by d1. Further details of the measurement process are available in earlier work based on the same measurements [7]. We divide the traffic of the link into origin-destination pairs by identifying the origin and destination networks of packets by the left-most bits in the IP address. Let l denote the number of bits in this network prefix, also called the network mask. Different levels of

^1 For details about the Finnish university network (Funet), see www.csc.fi/suomi/funet/verkko.html.en

aggregation are obtained by changing the prefix length l. The maximum length of the network prefix is 24 bits. With this resolution, there are 2^24, or over sixteen million, possible origin networks. On the other hand, with the prefix length l = 1 there are only two networks and thus four possible OD pairs.

Our procedure for selecting OD pairs for further analysis from the original link traffic is the following. Combining both directions, the N most active networks in terms of traffic sent are selected and an N × N traffic matrix is formed, where N ≤ 100. This is enough to include all the significant OD pairs. From the obtained traffic matrix, at most the M greatest OD pairs in terms of sent traffic are selected for further analysis. We select M = 100, except in Section 6, where we use M = 1000. Note that for a very coarse level of aggregation the number of all OD pairs remains under 100.

The measurements capture the traffic of two days: November 30th 2004 and June 31st 2006, with the main focus being on the first day. The traffic is divided into origin-destination pairs using different prefix lengths and aggregated in time to one-minute resolution. For each prefix length l and direction d0/d1 separately, we denote the original measurement data by x = (x_{t,k}; t = 1, 2, ..., T, k = 1, 2, ..., K), where x_{t,k} refers to the measured bit count of OD pair k over a one-minute period at time t minutes.

Let us consider the traffic of individual OD pairs. As in [1], we split the OD pair bit counts x_{t,k} into components,

    x_{t,k} = m_{t,k} + s_{t,k} z_{t,k},

where m_{t,k} refers to the moving sample average, s_{t,k} to the moving sample standard deviation, and z_{t,k} to the sample standardized residual of OD pair k. The averaging period was chosen to be one hour. Thus,

    m_{t,k} = \frac{1}{60} \sum_{j=t-30+1}^{t+30} x_{j,k}

and

    s_{t,k} = \sqrt{ \frac{1}{60} \sum_{j=t-30+1}^{t+30} (x_{j,k} - m_{j,k})^2 }.
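The decomposition above can be sketched as follows. This is an illustrative Python fragment (numpy assumed), not the authors' code; window edges, where the centred one-hour window does not fit, are simply left as NaN.

```python
import numpy as np

def moving_decomposition(x, window=60):
    """Split a one-minute bit-count series x[t] into a moving sample average
    m[t], a moving sample standard deviation s[t], and the standardized
    residual z[t] = (x[t] - m[t]) / s[t], using a centred one-hour window."""
    x = np.asarray(x, dtype=float)
    n, half = len(x), window // 2
    m = np.full(n, np.nan)
    for t in range(half - 1, n - half):
        m[t] = x[t - half + 1 : t + half + 1].mean()    # j = t-29 .. t+30
    s = np.full(n, np.nan)
    for t in range(half - 1, n - half):
        j = np.arange(t - half + 1, t + half + 1)
        s[t] = np.sqrt(np.nanmean((x[j] - m[j]) ** 2))  # deviations from m_j
    z = (x - m) / s                                     # NaN at the edges
    return m, s, z
```

Note that, following the formula, the standard deviation measures deviations from the moving mean m_j at each sample j, not from a single window mean.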

Fig. 1. One day original traffic trace and moving average of the studied link. Left side: direction d0 , right side: direction d1 .


The traces of total traffic on the first measured day in the studied link for directions d0 and d1 are shown on the left and right sides of Figure 1, respectively. The figure also depicts the moving sample averages of the traces. The diurnal variation of the traffic at this level of aggregation is clearly visible. The busiest hour of the day is in the middle of the day, from 11 a.m. to 12 noon, in both directions.
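The prefix-based OD classification described above can be sketched as follows. This is an illustrative Python fragment, not the measurement tool itself; the addresses used in testing are arbitrary examples.

```python
import ipaddress

def od_pair(src_ip, dst_ip, l):
    """Map a packet's source/destination IPv4 addresses to an OD pair of
    l-bit network prefixes (l between 1 and 32)."""
    mask = (0xFFFFFFFF << (32 - l)) & 0xFFFFFFFF
    src = int(ipaddress.IPv4Address(src_ip)) & mask   # keep the top l bits
    dst = int(ipaddress.IPv4Address(dst_ip)) & mask
    return (f"{ipaddress.IPv4Address(src)}/{l}",
            f"{ipaddress.IPv4Address(dst)}/{l}")
```

With l = 1 all addresses collapse into two networks (first bit 0 or 1), giving at most four OD pairs; with l = 24 there are 2^24 possible networks on each side.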

3 Magnitudes of OD Pairs

In this section we study the size of the OD pairs at different aggregation levels. We are interested in how the traffic is distributed in the address space, and whether there is power-law behavior observable in the sizes of the OD pairs, which would mean that the decrease in OD pair size as a function of rank should be linear in the log-log scale. For OD pair k we define the volume X_k as the average number of bits transferred per second over one day,

    X_k = \frac{1}{60T} \sum_{t=1}^{T} x_{t,k}.

When the level of aggregation is very coarse (l ≤ 4), the number of non-zero OD pairs is smaller than 100 and we are able to study the volumes of the complete traffic matrix. In Figure 2 we have depicted traffic matrices for the cases from l = 1 to l = 4. In the density graphs, the darker the color, the more traffic is sent, while white indicates that there is no traffic between the networks. When l = 1, the classification into OD pairs is done based on the first bit of the network prefix. The density plot shows that most of the traffic in the link originates and terminates in the network whose first prefix bit is 1. On the other hand, there is no traffic at all between networks with first bit 0. As we increase l, the density plots become sparser, since the non-zero OD pairs form only a minor part of all possible OD pair combinations in the traffic matrix. One reason for the sparseness is that the measurements are not network-wide, but cover just one link. Next we consider the volumes of the OD pairs with different values of l. In Figure 3 the OD pairs are sorted from greatest to smallest and their volumes are plotted on the log-log scale, when the prefix length varies from l = 4 to l = 22. For every level of aggregation there are approximately 15 very significant OD pairs, after which the volumes decrease. We note that for l ≥ 10 the decrease is quite linear.
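The volume definition and the rank-size linearity check can be sketched as below; this is hypothetical Python (numpy assumed), and `loglog_slope` is an illustrative helper, not a function from the paper.

```python
import numpy as np

def od_volumes(counts):
    """counts: (T, K) array of per-minute bit counts x[t, k].  Returns the
    daily average rates X_k in bits per second, sorted largest first."""
    T = counts.shape[0]
    X = counts.sum(axis=0) / (60.0 * T)   # X_k = (1/60T) * sum_t x_{t,k}
    return np.sort(X)[::-1]

def loglog_slope(X_sorted):
    """Slope of the rank-size curve in log-log scale; an approximately
    straight curve indicates Zipf-like (power-law) behaviour."""
    rank = np.arange(1, len(X_sorted) + 1)
    keep = X_sorted > 0                   # log is undefined for idle pairs
    return np.polyfit(np.log(rank[keep]), np.log(X_sorted[keep]), 1)[0]
```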


Fig. 2. Traffic volume sent between the origin and destination network for different prefix lengths l. Black: a lot of traffic, white: no traffic. Direction d0 .


Fig. 3. Traffic volume of OD pairs for different prefix lengths l. Direction d0 .


On the left side of Figure 4 the volume of the greatest OD pair for each aggregation level l is plotted. The decrease in the volume as a function of l is at first very steep, even in logarithmic scale, but then it saturates until l changes from 16 to 17, where the volume drops again. In general, compared to the hypothetical situation where all link traffic is divided evenly among all possible OD pairs, the decrease is moderate. On the right side of Figure 4 we show the percentage of the total link traffic that the 15 greatest OD pairs comprise, as a function of l. Even for finer resolutions, such as l = 16, these 15 pairs form a significant part of the traffic. As a result of this section we can say that the classification of the link traffic based on origin and destination pairs produces "mice" and "elephants", a well-known phenomenon from earlier Internet measurement studies. However, the power-law assumption is valid only for finer granularities of aggregation, such as l ≥ 10, where the traffic volumes are smaller.


Fig. 4. Left side: The volumes of the greatest OD pairs as a function of prefix length l. Right side: The percentage of traffic of 15 greatest OD pairs as a function of l. Direction d0 .


4 Diurnal Variation of the OD Pair Traffic

In [8] we observed that at the fine aggregation level of l = 22 none of the OD pairs seemed to follow the diurnal variation of the total link traffic, in which the traffic peaks at midday. We concluded that the strong diurnal variation in the link traffic is explained more by the variation in the number of active on-off OD pairs than by a diurnal pattern within these OD pairs. However, we would expect that when increasing the aggregation level, at some point the diurnal pattern should become visible in the OD pairs.

In this section we study in more detail the diurnal variation of the OD pairs at different levels of OD pair aggregation. This is done by comparing the daily profiles of the OD pairs with the corresponding profile of the total link traffic, shown in the lower row of Figure 1. As an example, we plot the moving sample averages of the four largest OD pairs with aggregation levels l = 4 and l = 8 for direction d0 in Figure 5. At the coarse level of aggregation we can see different types of diurnal patterns. Pairs 3 and 4 have a diurnal variation close to the variation of the total link traffic, while pairs 1 and 2 do not. At the resolution l = 8 only the fourth OD pair follows the diurnal pattern of the link traffic.

To better understand how the diurnal variation changes as the aggregation level l increases, we study the correlation between two time series: the moving sample average of the total link traffic, and the moving sample average of OD pair k. The correlation coefficient between any two time series x = (x_i, i = 1, ..., n) and y = (y_i, i = 1, ..., n) is defined as

    r(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}.    (1)

On the left side of Figure 6 we plot the correlation coefficients for all OD pairs with all aggregation levels l and directions d0 and d1 as a function of the volume of the OD pair.
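Equation (1) is the ordinary Pearson correlation coefficient; a direct transcription (numpy assumed):

```python
import numpy as np

def corr(x, y):
    """Linear correlation coefficient r(x, y) of Eq. (1)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()          # centre both series
    return (xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
```

In this section it is applied to the two moving-average series: the total link traffic and one OD pair.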
For small OD pairs there exist both positive and negative correlations, but for large OD pairs the correlations are positive, as we would expect. However, the dependence between the correlation and the volume of the OD pair is not strong. On the right-hand side of the same figure, the mean of the correlation coefficients for the OD pairs with


Fig. 5. The moving sample average for the 4 greatest OD pairs. Prefix length l = 4 (upper) and l = 8 (lower). Direction d0 .


Fig. 6. Testing diurnal variation. Left side: OD pairs correlation to total link traffic as a function of the traffic volume. Right side: Average correlation of OD pairs with different prefix lengths l.

a given prefix length l is plotted. We can see that the mean correlation decreases as a function of l, as the earlier figures indicated. As a conclusion of this section we can state that as the aggregation level of the traffic becomes coarser, the diurnal traffic pattern of the OD pairs comes closer to the variation of the total link traffic. However, there is no clear bound in OD pair volume or in prefix length after which we can say that the daily behavior is similar to the familiar profile found in the link traces.

5 Gaussianity

A typical assumption in traffic engineering is that traffic follows the Gaussian distribution. Specifically, traffic matrix estimation techniques make this assumption to simplify statistical calculations. In [7] the aggregated link traffic was found to follow the Gaussian distribution very closely. However, when we studied the origin-destination flows in [8], only a small portion of them were anywhere close to Gaussian, typically only the larger flows. Due to the Central Limit Theorem we might assume that when the aggregation of individual non-Gaussian flows is large enough, the aggregate will indeed follow the Gaussian distribution. In [9] the authors studied the number of users required for an aggregate to be Gaussian and found that "a few tens of users" is typically sufficient. We study the different aggregation levels in terms of traffic volume in order to determine how much traffic is needed to yield Gaussian behavior.

In this paper, the Gaussianity of each OD pair is evaluated by the normal-quantile (N-Q) plot of the standardized residual z_{t,k}. The original sample (denoted by x in the equation) is ordered from the smallest to the largest and plotted against a, which is defined as

    a_i = \Phi^{-1}\left( \frac{i}{n+1} \right),   i = 1, ..., n,

where Φ is the cumulative distribution function of the Gaussian distribution. The vector a contains the quantiles of the standard Gaussian distribution, thus ranging approximately from −3 to 3. If the considered data follow the Gaussian distribution, the N-Q plot should be linear. Goodness of fit with respect to this can be calculated by the linear correlation coefficient r(x, a), defined in (1), and the value r^2 is used as a measure of the goodness of fit, an approach used in [10] and in our earlier works [7,8]. In [9] the


Fig. 7. Testing Gaussianity: Goodness of fit values r^2 as a function of OD pair traffic volume


Fig. 8. Testing Gaussianity: Distribution of r^2 values for OD pairs of different traffic volumes

authors studied this method and found that, although simple, it is sufficiently accurate to determine the validity of the Gaussian assumption. They note that when r^2 > 0.9, the more complex Kolmogorov-Smirnov test usually also supports the assumption that the traffic is Gaussian. In Figure 7 the OD pair traffic volume (bits per second) is plotted against the goodness of fit value r^2 of the Gaussian assumption. We can see from the figure that the larger flows are always close to Gaussian, with r^2 values easily over 0.90. The largest OD pair with r^2 < 0.90 has a traffic volume of 17.5 Mbps. The vertical line in the figure is located at 10 Mbps, which seems to be an approximate threshold after which an overwhelming majority of the OD pairs have r^2 > 0.90, with r^2 > 0.98 for many of the OD pairs, as seen in the histogram of Figure 8. For OD pairs of size 1 Mbps


Fig. 9. Testing Gaussianity: Average OD pair traffic volumes and goodness of fit values r^2 as a function of prefix length l. Direction d0 .

to 10 Mbps there is still a lot of Gaussian traffic, while for OD pairs smaller than 1 Mbps no Gaussian behavior is observable. For the smallest OD pairs the fit is almost always near zero, as these are typically flows that have one or a few bursts of traffic and are idle the rest of the time.

In Figure 9 the average OD pair traffic volumes and the average r^2 values are shown as a function of the prefix length. The average is taken over those largest OD pairs that comprise 80 percent of total traffic. For the link direction d0, depicted in the figure, the first six cases, with prefix lengths from 1 to 6, have an aggregation level high enough that their average traffic volume is over 10 Mbps, and the r^2 values for the first seven cases exceed 0.9. For the d1 direction, the first six are over 10 Mbps and the same six are over 0.9, while the seventh is almost exactly 0.9. In general, the ten-megabit threshold seems to apply approximately to averages as well: an average of 10 Mbps implies that the goodness of fit is better than 0.90. However, in both directions the values decline rather slowly from good to reasonable to adequate, until a steep drop from the adequate values to the bad values occurs between network prefixes of 15 and 20 bits. While Figure 9 is in linear scale and fails to depict any observable change in the mean flow size in this region, Figure 4, in logarithmic scale, shows a steep drop in the maximum size of the OD pair. To summarize, while it is impossible to set a concrete threshold, it seems that in our data the majority of the OD pairs with at least 10 Mbps of traffic are fairly Gaussian.
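The N-Q goodness-of-fit test used throughout this section can be sketched as follows (illustrative Python; `statistics.NormalDist.inv_cdf` supplies Φ⁻¹):

```python
import numpy as np
from statistics import NormalDist

def nq_r2(sample):
    """Goodness of fit r^2 of the normal-quantile plot: order the sample and
    correlate it with the Gaussian quantiles a_i = Phi^{-1}(i / (n + 1))."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    a = np.array([NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)])
    r = np.corrcoef(x, a)[0, 1]   # linear correlation of the N-Q plot
    return r ** 2
```

A well-behaved Gaussian sample scores close to 1, while a bursty on-off sample, typical of the smallest OD pairs, scores much lower.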

6 Mean-Variance Relation

Traffic matrix estimation is an underdetermined problem if we do not have any extra information. A typical way to obtain the extra information is to use the mean-variance relation: a functional relation is assumed between the mean λ and the variance Σ of an OD pair's traffic volume. Although this spatial mean-variance relation is a key assumption in many traffic matrix estimation techniques [1,11,12,13], evidence of its validity is contradictory. Cao et al. [1] found the relation sufficiently valid to justify its use, but their study is of a local area network. Gunnar et al. [6] found the relation valid in a study of a Global Crossing backbone network, while Soule et al. considered the validity


not sufficient in their study [14]. We found the relation to hold moderately in the Funet network, with an average goodness of fit value around r^2 = 0.80 [8]. That study, however, was done at a high resolution, leading to rather small traffic volumes. Now we have extended the measurement data to higher aggregation levels, which probably gives more relevant results, as this is similar to the typical traffic matrix estimation environment, where a backbone network with large traffic volumes is considered.

The commonly used power-law relation can be written as Σ = φ · diag{λ^c}. The power-law relation for OD pair i is σ_i^2 = φ · λ_i^c, and its logarithm is

    log σ_i^2 = c log λ_i + log φ.

Thus, if the relation held, the points would fall on a line with slope c and intercept log φ in the log-log scale. This is a simple linear regression model, and we can measure the validity of the mean-variance relation with the linear correlation goodness of fit value r^2 used in the previous section.

For each prefix length, the mean and the variance are calculated for each one-hour period in the 24-hour trace. In Figure 10 the values are depicted for one selected hour and two selected prefix lengths, with one point in the plot representing the mean and the variance of one OD pair for that hour. For the longer prefix (l = 18), r^2 = 0.80, which is in line with previous results. It can be seen that the values deviate significantly more from the regression line, making the fit worse. However, for the shorter prefix (l = 7), depicted in the same figure, the fit is much better, about r^2 = 0.95.

In Figure 11 the average goodness of fit values are shown as a function of the network prefix length l. As the prefix gets longer, there are more OD pairs, with the average size of an OD pair obviously getting smaller. Recall that the average OD pair sizes for different prefixes are shown in Figure 9. For the longer prefixes the fit of the mean-variance relation is around 0.75 to 0.80.
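The log-log regression just described can be sketched as below (illustrative Python, numpy assumed); it returns the slope c, the factor φ, and the r^2 used as the goodness-of-fit measure.

```python
import numpy as np

def fit_mean_variance(means, variances):
    """Fit sigma_i^2 = phi * lambda_i^c by least squares in log-log scale;
    returns the exponent c, the factor phi, and the r^2 of the fit."""
    lm = np.log(np.asarray(means, dtype=float))
    lv = np.log(np.asarray(variances, dtype=float))
    c, log_phi = np.polyfit(lm, lv, 1)            # slope c, intercept log(phi)
    resid = lv - (c * lm + log_phi)
    r2 = 1.0 - (resid ** 2).sum() / ((lv - lv.mean()) ** 2).sum()
    return c, np.exp(log_phi), r2
```

For simple linear regression this r^2 coincides with the squared linear correlation coefficient of (1) applied to the log-transformed points.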
As the resolution gets coarser, the goodness of fit values improve to over 0.90, in some cases as high as 0.95. The OD pair traffic volumes at these aggregation levels are still less than 100 Mbps, and as the growth is approximately linear as a function of the aggregation level, we may conclude that for larger traffic flows the fit is at least as good, probably better. Table 1 shows the values of the exponent parameter c for different aggregation levels. It can be said that the parameter stays relatively constant and that the values fall within the range of results reported for this parameter in other networks [1,6,14].


Fig. 10. Mean-variance relation in log-log scale. Left: r^2 = 0.95, right: r^2 = 0.80.


Fig. 11. Testing mean-variance relation: Goodness of fit values r^2 as a function of prefix length l. Directions d0 on the left side, d1 on the right side.

Table 1. Estimates for the mean-variance relation's exponent parameter c for different prefix lengths l

  l   4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20
  c  1.64 1.60 1.60 1.60 1.66 1.71 1.72 1.76 1.77 1.73 1.75 1.76 1.73 1.71 1.67 1.66 1.71

We can conclude that there is a clear dependency between the mean-variance relation fit and the aggregation level. Most importantly, there is a strong functional mean-variance relation in the cases where the aggregation level is high.

7 Conclusion

In this paper we have analyzed the origin-destination pair traffic in the Funet network, and in particular the effects that spatial aggregation has on its characteristics. The Gaussian assumption holds better when the aggregation level is higher. An approximate threshold, after which all OD pairs are at least fairly Gaussian, appears to be around traffic volumes of 10 to 20 Mbps. This means that for many traffic engineering and traffic modeling tasks, where we consider much larger traffic flows, the Gaussian assumption is justified, but it probably cannot be used in cases with smaller traffic volumes due to the low aggregation level. The diurnal variation of the OD pairs follows the diurnal pattern of the total traffic more closely when the aggregation level is higher. However, there is no clear-cut boundary as with the Gaussianity assumption. We can point out, though, that it would be ill-advised to assume in any scenario that diurnal patterns are similar for all OD pairs, or that the busy hours of different flows coincide. We validated the spatial power-law assumption between the mean and variance of the OD pairs. Particularly at large aggregation levels it holds well. This is an essential result for traffic matrix estimation techniques, which rely on this very assumption. Our results also show that the exponent parameter remained approximately constant regardless of the aggregation, and was within the range of values obtained for it in the literature. To conclude, we can state that the more aggregated the traffic becomes, the more well behaved it is in general, in the sense that the assumptions studied hold better.


References

1. Cao, J., Davis, D., Wiel, S.V., Yu, B.: Time-Varying Network Tomography. Journal of the American Statistical Association 95, 1063–1075 (2000)
2. Barthelemy, M., Gondran, B., Guichard, E.: Spatial Structure of the Internet Traffic. Physica A: Statistical Mechanics and its Applications 319 (2003)
3. Feldmann, A., Greenberg, A., Lund, C., Reingold, N., Rexford, J., True, F.: Deriving Traffic Demands for Operational IP Networks: Methodology and Experience. IEEE/ACM Transactions on Networking 9(3) (2001)
4. Bhattacharyya, S., Diot, C., Jetcheva, J., Taft, N.: POP-Level and Access-Link-Level Traffic Dynamics in a Tier-1 POP. In: Proceedings of the ACM Internet Measurement Workshop (IMW) (2001)
5. Lakhina, A., Papagiannaki, K., Crovella, M., Diot, C., Kolaczyk, E.D., Taft, N.: Structural Analysis of Network Traffic Flows. In: SIGMETRICS/Performance 2004, New York, USA (2004)
6. Gunnar, A., Johansson, M., Telkamp, T.: Traffic Matrix Estimation on a Large IP Backbone – A Comparison on Real Data. In: IMC 2004, Taormina, Italy (2004)
7. Juva, I., Susitaival, R., Peuhkuri, M., Aalto, S.: Traffic Characterization for Traffic Engineering Purposes: Analysis of Funet Data. In: NGI 2005, Rome, Italy (2005)
8. Susitaival, R., Juva, I., Peuhkuri, M., Aalto, S.: Characteristics of OD Pair Traffic in Funet. In: ICN 2006, Mauritius (2006) (extended version to appear in Telecommunication Systems)
9. van de Meent, R., Mandjes, M., Pras, A.: Gaussian Traffic Everywhere? In: ICC 2006, Istanbul, Turkey (2006)
10. Kilpi, J., Norros, I.: Testing the Gaussian Approximation of Aggregate Traffic. In: 2nd ACM SIGCOMM Internet Measurement Workshop, Marseille, France (2002)
11. Vardi, Y.: Network Tomography: Estimating Source-Destination Traffic Intensities from Link Data. Journal of the American Statistical Association 91, 365–377 (1996)
12. Liang, G., Yu, B.: Pseudo Likelihood Estimation in Network Tomography. In: IEEE Infocom (2003)
13. Juva, I., Vaton, S., Virtamo, J.: Quick Traffic Matrix Estimation Based on Link Count Covariances. In: Proceedings of ICC 2006, Istanbul, Turkey (2006)
14. Soule, A., Nucci, A., Cruz, R., Leonardi, E., Taft, N.: How to Identify and Estimate the Largest Traffic Matrix Elements in a Dynamic Environment. In: SIGMETRICS/Performance 2004, New York, USA (2004)

Users Dimensioning and Traffic Modelling in Rural e-Health Services

I. Martínez, J. García, and E. Viruete

Communications Technologies Group (GTC), Aragon Institute for Engineering Research (I3A), D.204 – Dpt. IEC, Ada Byron Building, Univ. Zaragoza (CPS.UZ), 50018 Zaragoza, Spain
{imr, jogarmo, eviruete}@unizar.es

Abstract. The development of e-Health services in rural environments, where broadband networks are usually not accessible, requires a specific analysis of the available resources to improve Quality of Service (QoS) management. This work quantifies the maximum number of simultaneous users that fulfill the specific QoS levels in common e-Health services, including both store-and-forward and real-time telemedicine applications. The analysis also proposes variations in the modelling of traffic distributions with respect to the number of multiplexed users. The results obtained in this study permit accurate user dimensioning, which is necessary to optimize the performance and to guarantee the QoS requirements in this kind of service, where network resources are limited. Keywords: e-Health, QoS, rural services, traffic model, user dimensioning.

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 13–25, 2007. © Springer-Verlag Berlin Heidelberg 2007

1 Introduction

The great advance of new technologies in recent years has made it possible to increase the quantity and improve the quality of e-Health services in very varied assistance scenarios (rural environments, tele-assistance, home assistance, etc.) [1]-[3]. Each of these heterogeneous environments includes different Types of Service (ToS) that require specific analyses and precise estimations of the Quality of Service (QoS) level they can offer [4], [5]. In order to achieve that objective, it is crucial to study two aspects: the specific nature of the information to transmit and the exact behaviour of the networks transporting it. Regarding the first aspect, a particular description of the traffic models and parameters associated with the service is required. With regard to the second, the network parameters that allow QoS levels to be estimated have to be studied, to guarantee the feasibility, the efficiency and the precise parameter range for the correct behaviour of e-Health services [6]-[8]. In this line, an extended idea is to manage and adaptively vary the transmission of the information generated by applications (codecs, transmission rate, compression levels, etc.) to adapt it to the network resources (capacity, available bandwidth, performance, etc.). This concept would make it possible to improve the QoS of e-Health communications, approaching their optimum behaviour at every moment [9], [10]. In the last

14

I. Martínez, J. García, and E. Viruete

years, this idea has been developed in multimedia scenarios over best-effort networks like Internet, but a detailed analysis in a rural environment, like the one presented in this article, would contribute with quantitative results to optimize QoS and to model the traffic of the sources in the e-Health applications. Rural scenarios (characterized by the long distances to the hospital) are one of the most representative environments in which new technologies allow to improve health services by bringing closer the hospital and the patient, and benefiting users in a massive way, irrespective of their location. In this context, a study to fix specific models depending on the type of traffic and the volume of information transferred as a function of the available resources are required to correctly develop services and to dimension the maximum number of users to be granted guaranteed QoS in the most adverse situations. The analysis presented in this article has been carried out thanks to an ad-hoc tool [11], [12] that allow integrating the results obtained from experimental measurements (developed at the network laboratory of the Zaragoza University) and simulated traces (developed with the Network Simulator (NS-2) tool and using specific traffic and network models). This integrated methodology permits to characterize, optimize and model the service from the two main points-of-view: application traffic and communication networks. Section 2 describes the characteristics of the rural scenario, the use cases and the traffic parameters (from the point of view of the application and the network). Section 3 analyzes the optimum application parameters that fulfill QoS depending on network conditions. These parameters serve as the starting point to Section 4, where the maximum number of system users is obtained. The different traffic models for this environment are presented in Section 5. 
Finally the results obtained and their translations into adaptive mechanisms to guarantee QoS are discussed in Section 6.

2 Description of the e-Health Rural Scenario

The rural scenario corresponds to a communication between a non-specialist physician (situated in a rural medical centre) and the reference hospital, in order to offer tele-consultation with a medical specialist or patient tele-care, see Fig. 1. The rural medical centres are situated in remote places with fixed interconnection technologies (Public Switched Telephone Network, PSTN, or Digital Subscriber Line, DSL). These accesses are often based on narrowband technologies [13], [14]. Thus, for every user connection, a maximum transmission rate to the hospital (upstream) of r ≤ 64 kb/s is considered at the access point. The different user connections are multiplexed at the remote hospital server, which requires more capacity (C = k·64 kb/s, with k ≥ 1). In addition, every user connection may include different ToS, grouped into two main categories: Store-and-Forward (SF) services and Real-Time (RT) services. SF services are used by applications without time requirements (e.g. transmission of medical tests to an Electronic Healthcare Record (EHR) database). RT services are used by applications that require minimum thresholds of delay and packet loss (biomedical signal transmission, medical video-conference, etc.). In order to cover most rural situations, several Use Cases (UC) are proposed, see Fig. 1.

Users Dimensioning and Traffic Modelling in Rural e-Health Services


In every UC, it is useful to study the service performance (according to the occupation factor of network resources, ρ) in order to evaluate the number of simultaneous users (N) that may be multiplexed while keeping their individual QoS level.

2.1 Use Cases

Based on the technical description of the rural scenario, several real situations are proposed (UCs, see Fig. 1). The UCs are the following:

− UC1. The most frequent UC consists of the remote transmission (to the reference hospital) of medical tests (ECGs, ECHOs, digital images) acquired at the medical centre (SF.Data).
− UC2. On top of UC1, it adds the transmission of clinical/administrative data and remote access to the EHR database (RT.EHR).
− UC3. It consists of UC2 plus an RT videoconference with a medical specialist for diagnostic support (RT.Media), which includes audio (RT.Audio) and video (RT.Video) services.
− UC4. On top of UC3, it is usual to add the RT acquisition and transmission of specific vital signals (pulse, blood pressure) in order to complete the patient diagnosis (RT.Bio).

These UCs include SF and RT services and permit the optimum performance areas to be evaluated and quantified as a function of N and ρ so as to guarantee the recommended QoS. The result of this evaluation will also permit the traffic parameters that characterize the service to be modelled, in order to propose new traffic models and to design optimum algorithms according to the variable conditions of the system. To lead this study, it is necessary to define the main traffic parameters that take part in the

Fig. 1. Evaluation scenario for a rural e-Health service between a Primary Healthcare Centre and a hospital, including transmission of medical tests, patient information and biomedical signals, EHR updating, and audio/video-conference


Fig. 2. Application traffic descriptors and network QoS parameters associated with the evaluation of rural e-Health scenarios

scenario, their specific values in the rural context, and the variable QoS needed to optimize network resources.

2.2 Traffic Descriptors

The service model used in this study is based on previous contributions detailed in [15]; it has been designed from the technical results obtained in [11], [12] and from the main traffic descriptors and conclusions on QoS in related works [16]-[19]. All these proposed QoS models include the performance of the application as well as of the network technologies and, from both points of view, a generic evaluation scheme for rural e-Health scenarios is proposed in Fig. 2.

A. Application Parameters

− Data size (S). Amount of data (in its original format) generated by the traffic source application.
− Packet size. The transfer unit size of the Internet Protocol (IP) datagram, using the TCP (SMSS) or UDP (s) protocol, depending on the information type (the final packet sizes are calculated by adding network and physical layer headers).
− Data rate. It may be defined by several parameters: the Peak Data Rate (PDR), which is the maximum data rate (the inverse of the minimum interval T = Δt between the timestamps of two consecutive packets), and the Sustained Data Rate (SDR), which is the transmission data rate measured over a reference time interval Ts, see (1).
− Maximum Burst Size (MBS). The maximum number of packets that may be transmitted at PDR while still guaranteeing SDR. The relation between both rates is captured by the Burst Tolerance (BT), see (1).

$$\mathrm{PDR} = \frac{1}{T}\,,\quad \mathrm{SDR} = \frac{1}{T_s}\,,\quad \mathrm{MBS} = \left\lfloor 1 + \frac{\mathrm{BT}}{T_s - T} \right\rfloor,\quad \text{with } \mathrm{BT} = (\mathrm{MBS}-1)\left(\frac{1}{\mathrm{SDR}} - \frac{1}{\mathrm{PDR}}\right) \quad (1)$$
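The relations in (1) can be sketched numerically. The following minimal example uses illustrative rate values (a 64 kb/s peak with a 16 kb/s sustained rate), not measurements from the paper:

```python
def burst_tolerance(mbs, sdr, pdr):
    """BT = (MBS - 1) * (1/SDR - 1/PDR), as in Eq. (1)."""
    return (mbs - 1) * (1.0 / sdr - 1.0 / pdr)

def max_burst_size(bt, sdr, pdr):
    """MBS = floor(1 + BT / (Ts - T)) with Ts = 1/SDR, T = 1/PDR (Eq. 1)."""
    t_s, t = 1.0 / sdr, 1.0 / pdr
    return int(1 + bt / (t_s - t))

# Illustrative source: peaks at 64 kb/s with a sustained rate of 16 kb/s
pdr, sdr = 64e3, 16e3
bt = burst_tolerance(8, sdr, pdr)          # tolerance for bursts of 8 packets
print(max_burst_size(bt, sdr, pdr))        # recovers MBS = 8
```

The two formulas are inverses of each other, which is a quick consistency check on the reconstruction of (1).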

B. Network Parameters

− End-to-End Delay (EED) [20]. The time from the instant a packet is transmitted until it is received at its destination. It is the sum of several delays: access, processing, buffering, etc. The EED is complemented by other parameters such as jitter (the variation between consecutive delays; for RT services, a probability P[jitter > 20 ms] < 10% must be guaranteed).


− Packet Loss Rate (PLR) [21]. The ratio of lost packets to transmitted packets. The EED-PLR combination is decisive in the QoS study.
− BandWidth (BW) and Available BW (ABW) [22]. BW represents the capacity (C) shared by all the communications on the link, and ABW is the unused capacity, available for new incoming connections. Moreover, it is usual to define the effective capacity (Ce) as the real resources for data transmission measured over a reference time interval.
− Occupation factor (ρ). It is normally used to compare link occupation with the available resources, and it is a good indicator of service efficiency and performance [23], [24]. In a stable system without packet loss, ρ is limited by a maximum value (ρmáx). Moreover, control bits are usually distinguished from information bits; thus, ρ is usually normalized to its maximum value, see ρ* in (2):

$$\rho^{*} = \frac{\rho}{\rho_{\text{máx}}} = \frac{C_e}{C_{e\,\text{máx}}} < 1\,, \quad \text{with } \rho = \frac{C_e}{C}\,,\ \ \rho_{\text{máx}} = \frac{C_{e\,\text{máx}}}{C}\,, \quad (2)$$

and C = r·k, Ce = r·ke, Ce máx = r·ke máx.
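A minimal sketch of the normalized occupation factor in (2); the number of connections k and the effective-capacity factors ke and ke,máx below are hypothetical values, chosen only for illustration:

```python
def occupation(r, k, k_e, k_e_max):
    """Normalized occupation factor rho* = rho / rho_max = Ce / Ce_max (Eq. 2),
    with C = r*k, Ce = r*k_e and Ce_max = r*k_e_max."""
    c, c_e, c_e_max = r * k, r * k_e, r * k_e_max
    rho, rho_max = c_e / c, c_e_max / c
    return rho / rho_max

# Hypothetical link: r = 64 kb/s per connection, k = 4 multiplexed connections
print(occupation(64e3, 4, 2.1, 3.5))   # ~0.6, i.e. 60% of the achievable maximum
```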

3 Parameters Optimization

Starting from the specific characteristics of rural scenarios and from conclusions obtained in previous works [25], this paper proposes new considerations for the traffic descriptors, focused on the application parameters: data size (S), packet size (SMSS for TCP, s for UDP), data rate (1/Δt), and burst lengths (bs, bt, and MBS). The variation range considered in this study is detailed in Appendix I.

3.1 SF Services

In order to study the main parameters related to SF services, UC1 (which only includes SF.Data) was analyzed. Thus, the influences of the SF parameters (SMSS, Δt and MBS) were evaluated according to EED and ρ* thresholds for different congestion levels: low-light (PLR


Fig. 2. Pr[U > n] in terms of n; c = 10, ρ = 0.7, α = 0.9

A Discrete-Time Queueing Model with a Batch Server


$$(y-1)\,E\!\left[x^{W_k} y^{H_k}\right] = (y-1)\sum_{n=0}^{l-1} q_0(n) - y\,R(1,x) + S(y)\,y\left[x^{c}\Pr[H=1] + \sum_{n=l}^{c-1} e(n)\,\big(x^{n}-x^{c}\big)\right]. \quad (10)$$

Hereby, we have used the fact that R(1,1) = Pr[H = 1]. Let us denote the steady-state PGF of the number of customers in a served batch by B(x). Replacing y by 1 and using R(1,x) = B(x) Pr[H = 1] in (10) implies:

$$B(x) = x^{c} + \frac{1}{\Pr[H=1]} \sum_{n=l}^{c-1} e(n)\,\big[x^{n} - x^{c}\big]. \quad (11)$$

From (11), we easily obtain the corresponding probabilities b(n):

$$b(n) = \begin{cases} \dfrac{e(n)}{\Pr[H=1]} & \text{if } l \le n \le c-1\,, \\[1ex] 1 - \sum_{m=l}^{c-1} \dfrac{e(m)}{\Pr[H=1]} & \text{if } n = c\,, \\[1ex] 0 & \text{else.} \end{cases}$$

We still need to calculate Pr[H = 1]. To this end, we replace z by 1 in (8); we obtain:

$$\Pr[H=1] = \frac{A'(1)\left[\displaystyle\sum_{n=0}^{l-1} q_0(n) + \sum_{n=l}^{c-1} e(n)\,(c-n)\right]}{c - S'(1)\,A'(1)}\,, \quad (12)$$

where q0(n) and e(n) have already been calculated numerically.

In Table 1, some probabilities b(n) are shown for a system with c = 10, l = 5, and parameter α of the geometric service distribution equal to 0.9. The probabilities are calculated for several loads. We observe that the higher the load, the more the batches are filled. Other performance measures, such as the mean number of customers in the queue while a non-full batch is served, can be calculated as well.

Table 1. b(n) for a system with c = 10, l = 5 and α = 0.9

 n   load = 0.1   load = 0.5   load = 0.9
 5   0.93891      0.46668      0.07885
 6   0.05269      0.15156      0.04244
 7   0.00502      0.06565      0.02498
 8   0.00175      0.04553      0.01936
 9   0.00084      0.03781      0.01779
10   0.00079      0.23277      0.81659
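The trend described above (batches fill up as the load grows) can be checked directly from the columns of Table 1 by computing the mean size of a served batch:

```python
# Columns of Table 1: b(n), n = 5..10, for each load (c = 10, l = 5, alpha = 0.9)
table = {
    0.1: [0.93891, 0.05269, 0.00502, 0.00175, 0.00084, 0.00079],
    0.5: [0.46668, 0.15156, 0.06565, 0.04553, 0.03781, 0.23277],
    0.9: [0.07885, 0.04244, 0.02498, 0.01936, 0.01779, 0.81659],
}
means = {}
for load, bn in table.items():
    # mean served batch size: sum of n * b(n) over n = 5..10
    means[load] = sum(n * p for n, p in zip(range(5, 11), bn))
    print(load, round(means[load], 3))
```

The mean batch size grows from roughly 5.08 customers at load 0.1 to about 9.30 at load 0.9, and each column sums to 1 up to the rounding of the published values.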


D. Claeys et al.

4 Optimization of MBS

In this section, we investigate through some numerical examples how an appropriate choice of the MBS l is influenced by the distribution of the service times and by the load. Note that other parameters can have an influence as well, but we do not investigate them in this paper. We assume a Poisson distribution for the number of arrivals, i.e. the probability of having n arrivals during a slot is

$$e^{-A'(1)}\,\frac{\big(A'(1)\big)^{n}}{n!}\,.$$

We consider three cases for the distribution of the service times:

– S(z) = z: service cycles last exactly one slot.
– S(z) = (1−α)z / (1−αz): the lengths of the service cycles have a geometric distribution.
– S(z) = z^m, m > 1: service cycles last exactly m slots.

We define the optimal MBS as the MBS that minimizes the mean system contents and hence the mean delay of the customers.

4.1 Single-Slot Service Times

Fig. 3 shows the mean system contents (left pane) and the mean delay (right pane) in terms of the load for a system with a server capacity c of 10. In the figure we plotted the curves for three MBSs. The figure shows that the system performs best with MBS l equal to one, which corresponds to an immediate-batch service policy. This is trivial, since waiting is not useful in this case. Indeed, a service cycle is finished at the end of the slot in which it started, so the best policy is to serve in every slot. In this case, we obtain for U(z) and l = 1 the same expression as for a system with c servers of capacity one.

In Fig. 3, we also observe that, for very low load ρ and l > 1, the mean system contents does not converge to zero, although for load equal to zero the mean system contents is zero. Hence, we have a discontinuity in ρ = 0. It can be shown intuitively that for an MBS l > 1 and ρ → 0, the mean system contents is about (l−1)/2. This is also observed in the figure. If we investigate the mean delay of the system, we again remark a special situation for ρ → 0 and l > 1: the mean delay goes to infinity. This is caused by the long time customers may have to wait to form a batch of at least l customers when the load is low. If the load increases, the mean delay diminishes. For still higher loads, the queueing effect becomes dominant, leading to an increasing mean delay.

4.2 Geometric Service Times

In Fig. 4, the mean delay is plotted versus the load. We assume a server capacity c equal to 10. The mean service time is equal to 5 slots (i.e. α = 0.8) in the left part, while it is 10 slots in the right part (i.e. α = 0.9). The figure illustrates that l = 1 is not always the best choice in this case. We observe that a larger MBS is preferable for a higher load. So, there are transition points in the load from which onwards the system performs better with a certain MBS i (≤ c) than with MBS i − 1. From these loads onwards, the positive effect of waiting longer to provide a


Fig. 3. Server capacity c = 10; service cycles last exactly 1 slot; left part: mean system contents; right part: mean delay


Fig. 4. Mean delay versus the load for several MBS’s with server capacity c = 10; α = 0.8 in the left part and α = 0.9 in the right part

better utilisation of the server outweighs the negative effect that customers may have to wait a long time before l customers are present. We remark that the profit of waiting is larger for larger mean service times; the transition points then also appear at a lower load.

4.3 Deterministic Service Times of m Slots

In this last example, the service times are deterministically equal to m slots. In Fig. 5, we plotted the mean delay as a function of the load. The service time m is equal to 5 slots in the left part and 10 slots in the right part. These are the same mean service times as in Fig. 4. Although it appears as if there are no transition points in this figure, there in fact are: they appear at a very high load

Fig. 5. Mean delay versus the load for several MBS’s with server capacity c = 10; m = 5 in the left part and m = 10 in the right part

and the performance gain is negligible. This difference with respect to geometric service times is caused by the variance of the service times. It is well known that in many queueing systems a larger variance of the service times causes a higher mean system contents. Apparently, the lower the MBS l, the more this effect plays a role.
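As a quick sanity check on the three service-time laws compared in Sections 4.1-4.3, the mean service time S'(1) can be recovered from each PGF by a central difference at z = 1; this reproduces the means quoted above (5 and 10 slots for the geometric cases with α = 0.8 and α = 0.9, and m slots for the deterministic case). A small sketch:

```python
def pgf_mean(S, h=1e-6):
    """Mean of a discrete distribution from its PGF S(z): S'(1),
    estimated by a central finite difference at z = 1."""
    return (S(1 + h) - S(1 - h)) / (2 * h)

# Geometric service times: S(z) = (1 - a) z / (1 - a z), mean 1/(1 - a)
geo = lambda a: (lambda z: (1 - a) * z / (1 - a * z))

print(round(pgf_mean(geo(0.8))))           # 5  (left part of Fig. 4)
print(round(pgf_mean(geo(0.9))))           # 10 (right part of Fig. 4)
print(round(pgf_mean(lambda z: z ** 7)))   # 7  (deterministic, m = 7 slots)
```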

5 Conclusion

In this paper, we have studied a system with a batch server operating under the MBS service policy. Specifically, we have calculated the steady-state PGF of the system contents at the beginning of an arbitrary slot for the M^X|GI^{l,c}|1 queueing model. From this PGF we could obtain the required performance measures. We have also analyzed the number of customers in a served batch. Furthermore, by means of some case studies in Section 4, we have investigated the influence of the load and of the distribution of the service times on the optimal choice of the MBS l. This has enabled us to conclude:

– If the service times are equal to one slot, then l = 1 is the optimal MBS.
– If the service times can last longer than one slot, the load influences the optimal MBS. There are then transition points in the load from which onwards the system performs better with a certain MBS i (≤ c) than with MBS i − 1.

Obviously, other parameters also influence the optimal MBS, e.g. the distribution of the number of arrivals during a slot. This is a topic for future research. Furthermore, we also plan to incorporate the influence of the number of customers in a served batch on the service times in a future analysis. This will obviously complicate the analysis.



Derivatives of Blocking Probabilities for Multi-service Loss Systems and Their Applications

V.B. Iversen 1 and S.N. Stepanov 2

1 COM·DTU, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark, [email protected]
2 Sistema Telecom, 1st Tverskaya-Yamskaya 5, 125047 Moscow, Russia, [email protected]

Abstract. Derivatives of the blocking probabilities of multi-service loss networks are important for traffic engineering. An explicit formula for the derivatives of the blocking probabilities with respect to the offered traffics is obtained, expressed in terms of the stationary global state probabilities. The approach is based on the convolution algorithm and allows us to find expressions for the derivatives much more easily than known so far. It is briefly shown how the derivatives can be applied for the approximate evaluation of performance measures and for studying the error in performance measures caused by small changes of the offered traffic. The results can also be applied to network optimization.

Keywords: Derivatives, blocking probabilities, multi-service loss networks, approximate evaluation.

1 Introduction

The derivatives of the blocking probabilities of multi-service loss systems are important for solving optimization problems, but obtaining them in a suitable form is a complicated task. There are only few results in this field [1]-[6]. Complete solutions exist for models whose performance measures are expressed through the Erlang-B function [1,3]. The derivative of the Erlang-B function E_{n,A} with respect to the intensity of offered traffic A is:

$$\frac{dE_{n,A}}{dA} = \frac{n - A(1 - E_{n,A})}{A} \cdot E_{n,A}\,. \quad (1)$$

Here n is the capacity of the link expressed in basic bandwidth units (BBU). The requests for bandwidth arrive according to Poisson processes. In (1), the derivative of the Erlang-B formula is expressed through the value of the Erlang-B function itself. This property simplifies the estimation of the derivatives: after calculating E_{n,A} we can easily find the derivative of E_{n,A} with respect to the offered traffic.

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 260-268, 2007. © Springer-Verlag Berlin Heidelberg 2007
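A small numerical sketch of (1): compute E_{n,A} by the standard Erlang-B recurrence and compare the closed-form derivative with a central finite difference. The values of n and A below are illustrative:

```python
def erlang_b(n, a):
    """Erlang-B blocking E_{n,A} via the standard recurrence
    E_i = a E_{i-1} / (i + a E_{i-1}), starting from E_0 = 1."""
    e = 1.0
    for i in range(1, n + 1):
        e = a * e / (i + a * e)
    return e

def erlang_b_derivative(n, a):
    """dE/dA = (n - A(1 - E)) / A * E, Eq. (1)."""
    e = erlang_b(n, a)
    return (n - a * (1.0 - e)) / a * e

# Sanity check against a central finite difference (illustrative n and A)
n, a, h = 20, 15.0, 1e-6
numeric = (erlang_b(n, a + h) - erlang_b(n, a - h)) / (2 * h)
print(abs(erlang_b_derivative(n, a) - numeric) < 1e-8)   # True
```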


For multi-rate systems the situation is more complicated, because (i) we do not have explicit expressions for the performance measures in terms of the input parameters, and (ii) we may have an arbitrary number of input flows. Results similar to (1) are obtained in [4], where, for a single-link multi-rate model, simple expressions are found for the derivatives of the blocking probability of flow number i with respect to the traffic intensity of the i'th flow. These derivatives are expressed through the global state probabilities, i.e. the distribution of the total number of occupied basic bandwidth units. The global state probabilities, together with the performance measures, are obtained when running the convolution algorithm [10], so in this case the values of the derivatives can be found together with the main stationary performance measures of the model.

Unfortunately, these particular derivatives are not sufficient for studying the error in the estimation of the performance measures caused by small changes of the offered traffic, or for solving other, similar problems related to the practical usage of derivatives. To do so we need to know the derivatives of the blocking probabilities with respect to the intensities of all traffic streams offered to the link. This problem was studied in general form in [2,6], but without presenting an algorithm for obtaining the values of the derivatives. These results have a certain theoretical value, but it is difficult to apply them to practical engineering problems. In [5], a computational scheme is presented for the multi-rate one-link model for estimating the derivative of any blocking probability with respect to the intensity of any stream, but realizing the suggested approach requires much more complex calculations than in the Erlang case.

In this paper, based on the ideas of [4], we present a new algorithm for finding the derivatives of the blocking probabilities with respect to the intensity of any traffic stream in multi-rate models. The complexity of the problem is reduced to the same complexity as for the classical single traffic stream. We illustrate the suggested approach by the example of a one-link model with full accessibility and an arbitrary number of traffic streams, but the proposed method can also be applied to any model whose performance measures are estimated by means of the convolution algorithm [10].

The paper is organized as follows. In Section 2 the model is described. The convolution algorithm is formulated in Section 3. In Section 4 we derive the basic formulæ for the derivatives of the blocking probabilities with respect to the intensity of any traffic stream. In Section 5 we present possible applications of the results for the approximate evaluation of performance measures and for studying the estimation error of performance measures caused by small variations of the offered traffic.

2 Model Description

Let us consider a single-link traffic model, where the link transmission capacity is represented by n basic bandwidth units, and let us suppose that we have K incoming Poisson flows of calls with traffic intensities A_k, k = 1, 2, ..., K. A call of the k'th flow uses d_k bandwidth units for the duration of the connection. It is supposed


that calls from the k'th stream are blocked when more than n − d_k bandwidth units are occupied. Without loss of generality we shall assume that the holding times are all exponentially distributed with mean value chosen as the time unit; it is known that the model considered is insensitive to the distribution of the holding time, and each flow may furthermore have an individual mean holding time. Let i_k(t) denote the number of calls of the k'th flow served at time t. The model is described by a K-dimensional Markovian process of the type r(t) = {i_1(t), i_2(t), ..., i_K(t)} with state space S consisting of vectors (i_1, i_2, ..., i_K), where i_k is the number of calls of the k'th flow being served by the link under stationary conditions. The state space S is defined as follows:

$$(i_1, i_2, \ldots, i_K) \in S, \qquad i_k \ge 0,\ k = 1, 2, \ldots, K, \qquad \sum_{k=1}^{K} i_k d_k \le n\,.$$

Let P(i_1, i_2, ..., i_K) denote the unnormalised values of the stationary probabilities of r(t). After normalisation, the value p(i_1, i_2, ..., i_K) denotes the mean proportion of time in which exactly {i_1, i_2, ..., i_K} connections of the different types are established. For a state (i_1, i_2, ..., i_K), let i denote the total number of occupied bandwidth units, i = i_1 d_1 + i_2 d_2 + ... + i_K d_K. The process of transmission of requests for bandwidth of the k'th flow is described by the blocking probability π_k, k = 1, 2, ..., K, and by m_k, the mean number of bandwidth units occupied by calls of the k'th flow. Their formal definitions through the state probabilities are as follows (here and further, summations are over all states (i_1, ..., i_K) ∈ S satisfying the formulated condition, and lower-case letters denote the normalised values of the state probabilities):

$$\pi_k = \sum_{i + d_k > n} p(i_1, \ldots, i_K), \qquad m_k = \sum_{(i_1, \ldots, i_K) \in S} p(i_1, \ldots, i_K)\, i_k d_k\,. \quad (2)$$

Because m_k = A_k d_k (1 − π_k), we only consider the problem of estimating π_k in the following. There are many algorithms for the estimation of π_k. All of them are based on the product-form relations valid for P(i_1, ..., i_K):

$$P(i_1, i_2, \ldots, i_K) = P(0, \ldots, 0) \cdot \frac{A_1^{i_1}}{i_1!} \cdot \frac{A_2^{i_2}}{i_2!} \cdot \ldots \cdot \frac{A_K^{i_K}}{i_K!}\,, \qquad (i_1, i_2, \ldots, i_K) \in S. \quad (3)$$

The best calculation scheme for the model introduced is the recurrence algorithm first obtained in [7] and later also derived in [8], [9]. This algorithm exploits the fact that the performance measures (2) can be found if, for the process r(t), we know the probabilities p(i) of being in a state where exactly i bandwidth units are occupied:

$$p(i) = \sum_{i_1 d_1 + \cdots + i_K d_K = i} p(i_1, i_2, \ldots, i_K)\,.$$

The corresponding formulæ are as follows:

$$\pi_k = \sum_{i=n-d_k+1}^{n} p(i), \qquad k = 1, 2, \ldots, K. \quad (4)$$


The unnormalised values of P(i) are found by the recurrence:

$$P(i) = \frac{1}{i} \sum_{k=1}^{K} A_k d_k\, P(i - d_k)\, I(i - d_k \ge 0), \qquad i = 1, 2, \ldots, n, \quad (5)$$
where we usually let P (0) = 1, and the function I(·) equals one if the formulated condition is fulfilled and otherwise equals zero. A numerically better and stable procedure is to normalize the state probabilities during each iteration. The alternative approach for estimation of πk , k = 1, 2, . . . , K, is based on the convolution algorithm [10]. The realization of this scheme allows very easily to find in one run the values of individual probabilities of blocking πk , k = 1, 2, . . . , K ∂πk , i = 1, . . . , K, j = 1, . . . , K. We will show this and the values of derivatives ∂A j in Section 4, but first we give a short description of the convolution algorithm.

3 Convolution Algorithm

For two independent vectors of the same dimension, with components x = {x(0), x(1), ..., x(a)} and y = {y(0), y(1), ..., y(a)}, we can define the convolution operator that, applied to the vectors x, y, gives the vector z with components {z(0), z(1), ..., z(b)} as follows:

$$z(i) = x(0)\,y(i) + x(1)\,y(i-1) + \ldots + x(i-1)\,y(1) + x(i)\,y(0), \qquad i = 0, 1, \ldots, b\,,$$

where b ≤ a. In the following, the term convolution means usage of the convolution operator defined in the above way. Because it is known that the solution of the system of state equations has the product form (3), it can be found by means of an algorithm that we shall refer to as the convolution algorithm [10]. It consists of the following three steps:

1. For the k'th stream (k = 1, 2, ..., K), calculate its individual unnormalized state probabilities {P_k(0), P_k(1), ..., P_k(n)} as if it were the only traffic stream offered to the n bandwidth units. For the determination of P_k(r) we use the product-form expression (3), considered for only one incoming flow, number k. We have the following relations:

$$P_k(r) = \begin{cases} \dfrac{A_k^{i}}{i!}\,, & r = i\,d_k,\ \ i = 0, 1, \ldots, \ell_k\,, \\[1ex] 0, & \text{otherwise.} \end{cases} \quad (6)$$

In (6), ℓ_k is the maximum number of calls of the k'th stream that can be served simultaneously; clearly ℓ_k is the integer part of n/d_k. Let us call {P_k(0), P_k(1), ..., P_k(n)} the individual state distribution for call stream number k.

2. In any fixed order, make successive convolutions of all K individual state distributions. Let P(r) be the vector obtained after convolving all K individual distributions. Here r, r = 0, 1, ..., n, is the total number of bandwidth units occupied by all calls. Let P_{K\k}(r) be the vector obtained after convolving


all K individual distributions except stream number k. Here r = 0, 1, ..., n is the total number of bandwidth units occupied by all calls except those of stream number k.

3. If we convolve the vector P_{K\k}(r), r = 0, 1, ..., n, with the vector P_k(ℓ), ℓ = 0, 1, ..., n, of individual state distributions for stream number k, we obtain after normalization the system state distribution p(r), r = 0, 1, ..., n, and the individual performance measures π_k, m_k of the last stream, number k.

This algorithm is an alternative to the calculation scheme (5). For this particular case it is not as effective as (5), but it is much more general with respect to the number of models to which the convolution algorithm can be applied. The performance measures for all streams can be found by performing the above-mentioned steps for each stream, putting it at the end of the convolution procedure. Let us denote by N_m the computational effort required to find the performance measures for all streams; we measure N_m by the number of multiplications. Let us denote by N_c the number of convolutions and by N_cm the mean number of multiplications needed to perform one convolution. Then clearly N_m = N_c N_cm. For the convolution algorithm N_c = K(K − 1), so the required computational effort is N_m = K(K − 1) N_cm. In [11] it is shown how to decrease the computational effort of the convolution algorithm by decreasing both N_c and N_cm: the total number of convolutions can be reduced to N_c = 4K − 6 by storing some of the intermediate results, and N_cm can be decreased by truncation of the used state space.
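Steps 1-3 can be sketched as below. On a small two-stream example (illustrative values) the convolution result coincides with the recursion (5), as expected from the product form (3):

```python
from math import factorial

def individual_dist(n, a, d):
    """Step 1: unnormalised state probabilities of one stream alone, Eq. (6)."""
    P = [0.0] * (n + 1)
    for i in range(n // d + 1):       # i = 0 .. floor(n/d) calls in service
        P[i * d] = a ** i / factorial(i)
    return P

def convolve(x, y):
    """Convolution operator, truncated at the link capacity."""
    return [sum(x[j] * y[i - j] for j in range(i + 1)) for i in range(len(x))]

def blocking_by_convolution(n, A, d):
    """Steps 2-3: convolve all streams, putting stream k last to read off pi_k."""
    pis = []
    for k in range(len(A)):
        rest = [1.0] + [0.0] * n      # neutral element of the convolution
        for j in range(len(A)):
            if j != k:
                rest = convolve(rest, individual_dist(n, A[j], d[j]))
        P = convolve(rest, individual_dist(n, A[k], d[k]))
        s = sum(P)
        pis.append(sum(P[n - d[k] + 1:]) / s)
    return pis

print(blocking_by_convolution(6, [1.0, 0.5], [1, 2]))  # same pi's as recursion (5)
```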

4 Algorithm for Derivatives of Individual Blocking

The auxiliary expressions obtained during the realization of the convolution algorithm result in a very simple scheme for the determination of the derivatives of the individual blocking probabilities π_k, k = 1, 2, ..., K. Let us show this by the example of the model considered. Let us denote by $N = \sum_{r=0}^{n} P(r)$ the normalizing constant.

Then, in accordance with the definition, the derivative of π_k with respect to A_j can be found through the following expressions:

$$\frac{\partial \pi_k}{\partial A_j} = \frac{\partial}{\partial A_j}\left[\frac{P(n) + P(n-1) + \ldots + P(n-d_k+1)}{N}\right] \quad (7)$$
$$= \frac{1}{N^2}\left[ N\,\frac{\partial\big(P(n) + P(n-1) + \ldots + P(n-d_k+1)\big)}{\partial A_j} - \big(P(n) + P(n-1) + \ldots + P(n-d_k+1)\big)\,\frac{\partial N}{\partial A_j} \right].$$

Then, in accordance with the definition of the convolution algorithm, we have for r = 0, 1, ..., n:

$$P(r) = P_{K\backslash j}(r)\,P_j(0) + P_{K\backslash j}(r-1)\,P_j(1) + \ldots + P_{K\backslash j}(0)\,P_j(r)\,. \quad (8)$$

Derivatives of Blocking Probabilities

265

Because PK\j(r) does not depend on Aj , and due to the definition (6) of Pj(r), its derivative has the form:

  ∂Pj(r)/∂Aj = Pj(r−dj) ,    r = dj, dj+1, . . . , n .                          (9)

From (8) and (9) we obtain the following relation for the derivative of the unnormalized stationary probability P(r) with respect to the intensity of the j-th traffic stream:

  ∂P(r)/∂Aj = PK\j(r−dj) Pj(0) + PK\j(r−dj−1) Pj(1) + . . . + PK\j(0) Pj(r−dj)
            = P(r−dj) .                                                         (10)

Using (10) we finally obtain from (7):

  ∂πk/∂Aj = p(n−dj) + p(n−dj−1) + . . . + p(n−dj−dk+1)                          (11)
          − {p(n−dj) + p(n−dj−1) + . . . + p(0)} · {p(n) + p(n−1) + . . . + p(n−dk+1)}
          = p(n−dj) + p(n−dj−1) + . . . + p(n−dj−dk+1) − (1−πj) πk .

Using (11) we can obtain the derivatives of the individual blocking probabilities with respect to the intensity of any traffic stream served by the link in a single run of the recursion (5) used for the estimation of the individual blocking probabilities.
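Relation (11) is easy to verify numerically. The sketch below (ours; it uses the Kaufman-Roberts recursion to obtain the occupancy distribution, with invented parameter values) compares the derivative given by (11) with a central finite difference of the blocking probability; for this product-form model the two should agree up to finite-difference error:

```python
def occupancy(A, d, n):
    """Kaufman-Roberts recursion for the normalized link-occupancy distribution."""
    g = [1.0] + [0.0] * n
    for r in range(1, n + 1):
        g[r] = sum(Ak * dk * g[r - dk] for Ak, dk in zip(A, d) if r >= dk) / r
    s = sum(g)
    return [v / s for v in g]

def blocking(p, dk):
    """pi_k = p(n) + ... + p(n - d_k + 1)."""
    return sum(p[len(p) - dk:])

# invented example parameters: three streams on a link of n = 30 units
A, d, n = [20.0, 10.0, 5.0], [1, 2, 4], 30
k, j = 2, 0                    # d(pi_k)/d(A_j) for stream k = 3, w.r.t. A_1

p = occupancy(A, d, n)
pik, pij = blocking(p, d[k]), blocking(p, d[j])

# formula (11): p(n-dj) + ... + p(n-dj-dk+1), minus (1 - pi_j) * pi_k
analytic = sum(p[n - d[j] - d[k] + 1 : n - d[j] + 1]) - (1.0 - pij) * pik

# independent check: central finite difference in A_j
h = 1e-6
Ap = list(A); Ap[j] += h
Am = list(A); Am[j] -= h
numeric = (blocking(occupancy(Ap, d, n), d[k])
           - blocking(occupancy(Am, d, n), d[k])) / (2.0 * h)
```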

5

Examples of Usage of Derivatives for Multi-service Loss Systems

Let us suppose that the performance of a multi-service link is described by a function f(π1, π2, . . . , πK), where πk , k = 1, 2, . . . , K, are the individual blocking probabilities for stream number k, each depending on the arriving call intensities A1, A2, . . . , AK, so we can write πk = πk(A1, A2, . . . , AK). Examples of specific types of the function f(·) are the following: 1. The individual blocking probability for stream number k:

  f(π1, π2, . . . , πK) = πk .                                                  (12)

2. The mean number of resource units occupied by calls of the k-th flow:

  f(π1, π2, . . . , πK) = Ak dk (1 − πk) .                                      (13)

3. The average revenue from calls being served:

  f(π1, π2, . . . , πK) = Σ_{k=1}^{K} c_k Ak (1 − πk) ,                         (14)

where c_k is the rate at which class-k calls generate revenue.

266

V.B. Iversen and S.N. Stepanov

Because we know the derivatives of the individual blocking probabilities with respect to the intensity of any traffic stream served by the link, we can study how changes in the arrival rates affect the link performance described by the function f(·). Let us for simplicity choose the type of f(·) defined by (12):

  πk(A1+ΔA1, A2+ΔA2, . . . , AK+ΔAK) = πk(A1, A2, . . . , AK)                   (15)
      + (∂πk/∂A1) ΔA1 + (∂πk/∂A2) ΔA2 + . . . + (∂πk/∂AK) ΔAK .

The above result is of great practical importance. It solves at least three problems:
1. Discrete-event simulation. Suppose we want to forecast blocking probabilities, but we only have a rough idea of what the offered load (A1, A2, . . . , AK) is going to be. Then we need to simulate the system not only for the values (A1, A2, . . . , AK) but also at the perturbed loads (A1+ΔA1, A2, . . . , AK), (A1, A2+ΔA2, . . . , AK), . . . . This requires K+1 simulation runs and hence much computer time. An alternative approach is to simulate πk(A1, A2, . . . , AK) once and then to use relation (15).
2. Approximate calculation. If we calculate the exact value of the probability πk(A1, A2, . . . , AK) for given input values (A1, A2, . . . , AK), then it is possible to use the linear function (15) for approximate calculation of the blocking probabilities in some neighbourhood of the values (A1, A2, . . . , AK), determined by the expression (A1±ΔA1, A2±ΔA2, . . . , AK±ΔAK). Because the function πk(A1, A2, . . . , AK) changes smoothly with respect to the components (A1, A2, . . . , AK), we can attain good estimation accuracy for comparatively large values of ΔAk , k = 1, 2, . . . , K.
3. Error of measurements. Very often in solving dimensioning problems we use as input parameters the values of traffic intensities (A1, A2, . . . , AK) found with some error depending on the statistical procedures used and the confidence level. So instead of (A1, A2, . . . , AK) we need to use (A1±ΔA1, A2±ΔA2, . . . , AK±ΔAK). For this case relation (15) allows us to find the error of the performance-measure estimation caused by the error of measurement of the model's input parameters.

Let us consider a numerical example that, for the model studied, shows the accuracy of the approximate estimation of the blocking probabilities with the help of derivatives. Figure 1 shows the exact value of πk(A1+ΔA1, A2+ΔA2, . . . , AK+ΔAK) for k = 3, K = 5 and its approximation found by (15). The values of the other parameters are as follows: n = 500, A1 = 100, d1 = 1, A2 = 50, d2 = 2, A3 = 25, d3 = 4, A4 = 20, d4 = 5, A5 = 10, d5 = 10, with ΔAk defined by the relation:


(Figure: exact and approximate value of the blocking probability, y-axis 0 to 0.14, versus x from 0.900 to 1.100.)

Fig. 1. The approximate and exact value of blocking

ΔAk = Ak (x − 1). The value of x in the figure varies from 0.9 to 1 (negative ΔAk) and from 1 to 1.1 (positive ΔAk). The results presented show good accuracy of the approximation, especially with increasing blocking probabilities.
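The experiment behind Figure 1 can be repeated with a short script. The sketch below (ours) uses the parameter values quoted above and evaluates both the exact blocking probability at the scaled loads x·Ak (for x = 1.05) and its linear approximation (15); the first-order approximation should land closer to the exact value than the unperturbed blocking does:

```python
def occupancy(A, d, n):
    """Kaufman-Roberts recursion for the normalized link-occupancy distribution."""
    g = [1.0] + [0.0] * n
    for r in range(1, n + 1):
        g[r] = sum(Ak * dk * g[r - dk] for Ak, dk in zip(A, d) if r >= dk) / r
    s = sum(g)
    return [v / s for v in g]

def blocking(p, dk):
    """pi_k = p(n) + ... + p(n - d_k + 1)."""
    return sum(p[len(p) - dk:])

def gradient(p, dk, d):
    """d(pi_k)/d(A_j) for every stream j, from formula (11)."""
    n = len(p) - 1
    pik = blocking(p, dk)
    out = []
    for dj in d:
        pij = blocking(p, dj)
        out.append(sum(p[n - dj - dk + 1 : n - dj + 1]) - (1.0 - pij) * pik)
    return out

# parameter values of the numerical example in the text
n = 500
A = [100.0, 50.0, 25.0, 20.0, 10.0]
d = [1, 2, 4, 5, 10]
k = 2                                     # stream number 3 (0-based index)

p = occupancy(A, d, n)
base = blocking(p, d[k])
g = gradient(p, d[k], d)

x = 1.05                                  # 5% overload, Delta A_k = A_k (x - 1)
approx = base + sum(gi * Ak * (x - 1.0) for gi, Ak in zip(g, A))
exact = blocking(occupancy([Ak * x for Ak in A], d, n), d[k])
```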

6

Conclusion

In this paper we present a new algorithm for finding the derivatives of the blocking probabilities with respect to the intensity of any traffic stream in multi-rate models. The complexity of the problem is reduced to that of the classical single-stream case. The results obtained give explicit expressions for the derivatives of the blocking probabilities in terms of the stationary probabilities of the total number of occupied bandwidth units. We demonstrate the realization of the suggested approach by the example of a one-link model with an arbitrary number of traffic streams and full accessibility of the calls to the link, but the proposed method can be applied to any model whose performance measures are estimated by the convolution algorithm [10]. It is briefly shown how the derivatives can be applied to the approximate evaluation of performance measures, or to studying the error of performance-measure estimation caused by small changes of the offered traffic. These results can also be applied to solving economic problems and to network optimization.


References

1. Akimaru, H., Takahashi, H.: Asymptotic Expansion for Erlang Loss Function and its Derivatives. IEEE Transactions on Communications COM-29(9), 1257–1260 (1981)
2. Virtamo, J.: Reciprocity of Blocking Probabilities in Multiservice Loss Systems. IEEE Transactions on Communications 36(10), 1257–1260 (1988)
3. Esteves, J.S., Craveirinha, J., Cardoso, D.: Computing Erlang-B Function Derivatives in the Number of Servers. Communications in Statistics, Stochastic Models 11(2), 311–331 (1995)
4. Iversen, V.B.: Derivatives of Blocking Probabilities of Multi-Service Loss Systems. In: NTS-12, 12th Nordic Teletraffic Seminar, Otnas, Esbo, Finland, August 22–24 (1995)
5. Nilsson, A.A., Perry, M., Gersht, A., Iversen, V.B.: On Multi-rate Erlang-B Computation. In: ITC-16, 16th International Teletraffic Congress, Edinburgh, Scotland (1999)
6. Ross, K.W.: Multiservice Loss Models for Broadband Telecommunication Networks. Springer, Heidelberg (1995)
7. Fortet, R., Grandjean, Ch.: Congestion in a Loss System when some Calls want Several Devices Simultaneously. Electrical Communications 39, 513–526 (1964). Paper presented at ITC-4, Fourth International Teletraffic Congress, London, UK, 15–21 July 1964
8. Kaufman, J.S.: Blocking in a Shared Resource Environment. IEEE Transactions on Communications COM-29, 1474–1481 (1981)
9. Roberts, J.W.: A Service System with Heterogeneous User Requirements – Applications to Multi-Service Telecommunication Systems. In: Pujolle, G. (ed.) Performance of Data Communication Systems and their Applications, pp. 423–431. North-Holland, Amsterdam (1981)
10. Iversen, V.B.: The Exact Evaluation of Multi-Service Loss Systems with Access Control. NTS-7, Lund, Sweden; and Teleteknik 31(2), 56–61 (1987)
11. Iversen, V.B., Stepanov, S.N.: The Usage of Convolution Algorithm with Truncation for Estimation of Individual Blocking Probabilities in Circuit Switched Telecommunication Networks. In: Proceedings of the 15th International Teletraffic Congress (ITC-15), Washington, USA, pp. 1327–1336 (1997)

Rare Events of Gaussian Processes: A Performance Comparison Between Bridge Monte-Carlo and Importance Sampling

Stefano Giordano¹, Massimiliano Gubinelli², and Michele Pagano¹

¹ Università di Pisa, Dipartimento di Ingegneria dell'Informazione, Via Caruso 16, I-56126 Pisa, Italy
  {s.giordano,m.pagano}@iet.unipi.it
² Université de Paris-Sud, Equipe de probabilités, statistique et modélisation, Bâtiment 425, F-91405 Orsay Cedex, France
  [email protected]

Abstract. A goal of modern broadband networks is their ability to provide stringent QoS guarantees to different classes of users. This feature is often related to events with a small probability of occurring, but with severe consequences when they occur. In this paper we focus on overflow probability estimation and analyze the performance of Bridge Monte-Carlo (BMC), an alternative to Importance Sampling (IS) for the Monte-Carlo estimation of rare events with Gaussian processes. After a short description of the BMC estimator, we prove that the proposed approach has clear advantages over the widespread single-twist IS in terms of variance reduction. Finally, to better highlight the theoretical results, we present some simulation outcomes for a single server queue fed by fractional Brownian motion, the canonical model in the framework of long range dependent traffic. Keywords: Rare Event Simulation, Gaussian Processes, Importance Sampling, Most Likely Path, Bridge Monte-Carlo.

1

Introduction

In the framework of teletraffic engineering, many challenging issues have recently arisen as a consequence of the evolution of network architectures and services. First of all, the last decade was marked by the search for global network architectures, which should handle heterogeneous applications and (sometimes) very stringent Quality of Service (QoS) guarantees. On the other hand, the growing interest in new sophisticated traffic models, able to take into account the Long Range Dependent (LRD) nature of real traffic, had a deep negative impact on analytical tractability, making simulation a more and more relevant tool for performance estimation. Indeed, even in the case of a simple single server queue, only a few asymptotic results for specific input traffic models are known [1,2].

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 269–280, 2007.
© Springer-Verlag Berlin Heidelberg 2007

270

S. Giordano, M. Gubinelli, and M. Pagano

A primary QoS parameter is the loss probability, whose typical values can be very small and therefore difficult to estimate through standard Monte-Carlo (MC) simulation, since long run times are required to achieve accurate results. Simple queuing models are often considered in the literature in order to test the efficiency of new speed-up techniques and then generalize the results to more complex scenarios. In the framework of LRD traffic, the usual benchmark is represented by the estimation of the overflow probability (an upper bound for the loss probability in the corresponding finite-buffer system) for a lossless single server queue. By virtue of central-limit-type arguments (superposition of a large number of independent sources), Gaussian processes have been used quite often to capture in a parsimonious and flexible way the long-memory property of actual traffic flows; in particular, fractional Brownian motion (fBm) has become the canonical model with an LRD correlation structure [3]. To tackle the limits of traditional MC simulation in dealing with rare events, Importance Sampling (IS) techniques can be applied. Unfortunately, the goodness of an IS-based algorithm strongly depends on the choice of a proper change of measure to reduce the variance of the estimate. Recent works [4,5] show that single-twist IS (which consists in a change of measure chosen within the class of pdfs that differ from the original one only by a shift in the mean value) cannot even be asymptotically efficient if the input is fBm with Hurst parameter H ≠ 0.5. As a matter of fact, asymptotic optimality can be achieved [5,6] by the use of more refined IS techniques, but at the cost of a higher computational complexity. In [7] we introduced Bridge Monte-Carlo (BMC), an alternative strategy for the efficient MC estimation of the overflow probability, which exploits the Gaussian nature of the input process.
The BMC estimator has the same computational complexity as single-twist IS and does not rely on a change of measure; hence, its applicability is quite general, since the only assumption is the knowledge of the correlation structure of the incoming traffic flow. The aim of this work is to evaluate the performance of the BMC estimator (in terms of its variance) and compare it with the more traditional single-twist IS. The analytical results are then verified through discrete event simulations, considering as input to the queue fBm traces with different values of H.

2

Setting of the Problem

The reference model we will consider in the paper is a single server queue with infinite buffer and deterministic service rate. In particular, we are interested in the evaluation of the overflow probability, i.e., the probability that the steady-state queue-length Q exceeds a given threshold b. By Lindley's recursion, the latter is given by

  IP(Q ≥ b) = IP( sup_{t∈I} (X_t − ϕ_t) ≥ 0 )                                   (1)

where ϕ_t = b + tμ, μ is the difference between the mean values of the service and arrival rates, and {X_t}_{t∈I} is a Gaussian noise, with covariance Γ_{ts} = E[X_t X_s],

Rare Events of Gaussian Processes

271

t, s ∈ I, modeling the fluctuation of the input traffic. In the following, we will assume that Γ_{tt} > 0 for any t ∈ I except a point t0 ∈ I for which Γ_{t0 t0} = 0. For instance, if the input is modelled by fBm, it is easy to see that the previous hypotheses are fulfilled (with t0 = 0) since, in that case, Γ_{tt} = σ0² t^{2H}. In general the set I can be a finite subset of ZZ or a whole bounded interval of IR. In the first case the process X is just a random vector in X = IR^{|I|} (|I| is the cardinality of I), which we will consider a Banach space using the Euclidean norm. In the second case, it can usually be assumed that the process X belongs to the Banach space X of continuous functions from I to IR endowed with the supremum norm. Since for simulation purposes a finite-dimensional I is enough, we prefer to restrict our discussion to the first framework, which is free of some technicalities that would prevent a clear exposition of the novel methodology, although most of the arguments work in (almost) the same way both in the discrete and in the continuous settings. Let us introduce a metric |·|_H on X associated to the finite-dimensional Gaussian process X and defined as

  |ρ|²_H = ⟨ρ, ρ⟩_H = ⟨ρ, Γ⁻¹ρ⟩ = Σ_{t,s∈I} [Γ⁻¹]_{ts} ρ_t ρ_s                  (2)

where ⟨·, ·⟩ is the Euclidean scalar product of X and Γ⁻¹ is the inverse of the covariance matrix Γ. We will denote by H the so-called reproducing kernel Hilbert space of X, i.e., the set of elements ρ ∈ X for which |ρ|_H < +∞. In order to compare the behavior of the estimators when the probability of interest is small, we will introduce a small parameter ε in equation (1) and consider the probabilities pε defined as

  pε = IP( sup_{t∈I} (εX_t − ϕ_t) ≥ 0 ) .                                       (3)

For instance, in the many-sources regime, i.e., when n i.i.d. Gaussian sources are aggregated and the queuing resources (buffer size and service rate) are scaled with n, buffer overflow (over level nb) becomes a rare event when n → ∞ as a consequence of statistical multiplexing and, as can easily be checked by direct computation, the corresponding overflow probability is given by equation (3) with ε = 1/√n.

3

Definition of IS Estimators

A trivial approach to the estimation of the probability pε is to draw i.i.d. samples X^(1), . . . , X^(N) from X and consider the MC estimator

  p̂^N_ε = (1/N) Σ_{i=1}^{N} 1_{Aε}(X^(i))

where Aε is the event

  Aε = { x ∈ X : sup_{t∈I} [ε x_t − ϕ_t] ≥ 0 } .                                (4)


However, if pε → 0, the number N of samples needed to obtain a reliable estimate grows roughly as pε⁻¹, so the estimation of very small probabilities (e.g. pε ≈ 10⁻¹⁰ or smaller) becomes impossible or computationally heavy. Importance Sampling (IS) is a popular technique devised to build unbiased estimators that do not suffer from the smallness of pε. This is achieved by changing the law of the process so as to favor the occurrence of the target rare event, and taking this change into account by reweighting the estimation according to the likelihood ratio, which, in measure-theoretic terms, is the Radon-Nikodym derivative of the original law with respect to the new one [8]. The efficiency of an IS-based algorithm depends on the choice of a proper change of measure to reduce the variance of the estimate. It is well known that the optimal change of measure (zero-variance pdf) involves the knowledge of the probability we want to estimate and therefore cannot be practically adopted. The issue is commonly tackled by minimizing some sub-optimal criteria instead. To this aim, we consider the class of IS estimators constructed by shifting the process X by a constant path η ∈ H (single-twist estimator). In the case of Gaussian processes, the likelihood ratio L^η(x) becomes

  L^η(x) ≜ (dγ/dλ^η)(x) = exp( −⟨η, x⟩_H + |η|²_H / 2 )                         (5)

where γ and λ^η are the laws (on X) of {X_t}_{t∈I} and {X_t + η_t}_{t∈I} respectively. Hence, the overflow probability can be rewritten as

  pε = E[1_{Aε}(X)] = ∫_X 1_{Aε}(x) dγ(x)
     = ∫_X 1_{Aε}(x) L^{ε⁻¹η}(x) dλ^{ε⁻¹η}(x)
     = E[ 1_{Aε}(X + ε⁻¹η) L^{ε⁻¹η}(X + ε⁻¹η) ]

and the single-twist IS estimator is defined to be

  p̂^N_{twist,ε,η} ≜ (1/N) Σ_{i=1}^{N} 1_{Aε}(X^(i) + ε⁻¹η) L^{ε⁻¹η}(X^(i) + ε⁻¹η)   (6)

where η is the twist and its choice determines the performance of the estimator. Indeed, the variance of the single-twist IS estimator is controlled by the quantity

  σ²_{twist,ε,η} = Var[ 1_{Aε}(X + ε⁻¹η) L^{ε⁻¹η}(X + ε⁻¹η) ]
                 = E[ ( 1_{Aε}(X + ε⁻¹η) L^{ε⁻¹η}(X + ε⁻¹η) )² ] − p²ε
                 = E[ 1_{Aε}(X + ε⁻¹η) L^{2ε⁻¹η}(X + 2ε⁻¹η) ] e^{|η|²_H/ε²} − p²ε
                 = I_twist − p²ε

where

  I_twist ≜ E[ 1_{Aε}(X + ε⁻¹η) L^{2ε⁻¹η}(X + 2ε⁻¹η) ] e^{|η|²_H/ε²} .          (7)


It is worth noticing that, as proved in [4,5], the single-twist IS estimator cannot be asymptotically efficient if the input is fBm with H ≠ 0.5. In any case, according to Large Deviation arguments, the natural choice (see, for instance, [5,9,10]) for η is represented by the most-likely path to overflow

  ρ*_t = ϕ_{t*} Γ_{t t*} / Γ_{t* t*}                                            (8)

where t* is a most-likely time, i.e., a time which satisfies

  inf_{t∈I} ϕ²_t / Γ_{tt} = ϕ²_{t*} / Γ_{t* t*} .

4

The BMC Approach

The Bridge Monte-Carlo (BMC) method is based on the idea of expressing the overflow probability as the expectation of a function of the bridge Y of the Gaussian input process X, i.e., the process obtained by conditioning X to reach a certain level at some prefixed time t̄ ∈ I:

  Y_t = X_t − ψ_t X_{t̄} ,   where  ψ_t ≜ Γ_{t t̄} / Γ_{t̄ t̄} .                   (9)

By the properties of Gaussian processes, the joint process (X, Y) is still Gaussian and the process Y is independent of X_{t̄} since

  E[X_{t̄} Y_t] = Γ_{t̄ t} − (Γ_{t t̄} / Γ_{t̄ t̄}) Γ_{t̄ t̄} = 0 .

Moreover, its covariance is a simple function of the covariance of the original process X:

  Γ̃_{ts} ≜ E[Y_t Y_s] = Γ_{ts} − Γ_{t t̄} Γ_{s t̄} / Γ_{t̄ t̄} .                   (10)

Finally, it is relevant to point out that the computational effort to simulate Y is equal to that of X. Since X_t = Y_t + ψ_t X_{t̄} for any t ∈ I, we can express the probability pε of the event of interest Aε as follows:

  pε = IP( sup_{t∈I} [εX_t − ϕ_t] ≥ 0 ) = IP( sup_{t∈I} [εY_t + εψ_t X_{t̄} − ϕ_t] ≥ 0 )
     = IP( inf_{t∈I} ψ_t⁻¹ [ϕ_t − εY_t] ≤ εX_{t̄} ) = E[ IP( inf_{t∈I} ψ_t⁻¹ [ϕ_t − εY_t] ≤ εX_{t̄} | Y ) ]
     = E[ Φ( Y_ε / (ε √Γ_{t̄ t̄}) ) ]                                            (11)


where

  Y_ε ≜ inf_{t∈I} (ϕ_t − εY_t) / ψ_t    and    Φ(x) ≜ ∫_x^{+∞} e^{−y²/2} / √(2π) dy .

Given an i.i.d. sequence {Y^(i), i = 1, . . . , N} distributed as Y, we introduce the Bridge Monte-Carlo (BMC) estimator for pε as follows:

  p̂^N_{bmc,ε,t̄} ≜ (1/N) Σ_{i=1}^{N} Φ( Y_ε^(i) / (ε √Γ_{t̄ t̄}) )

with

  Y_ε^(i) ≜ inf_{t∈I} (ϕ_t − εY_t^(i)) / ψ_t .

It is easy to check that the BMC estimator is unbiased, i.e.

  E[ p̂^N_{bmc,ε,t̄} ] = pε ,

and that its variance is given by Var[ p̂^N_{bmc,ε,t̄} ] = σ²_{bmc,ε,t̄} / N, where

  σ²_{bmc,ε,t̄} = Var[ Φ( Y_ε / (ε √Γ_{t̄ t̄}) ) ] = E[ Φ( Y_ε / (ε √Γ_{t̄ t̄}) )² ] − p²ε .

To heuristically justify the efficiency of BMC from a computational perspective, we point out that if the basic MC method can be seen as a numerical scheme to perform integration in a large number of variables, then BMC is a hybrid method in that it performs one of these integrations exactly exploiting the properties of Gaussian processes, while the remaining integrations are still performed using a MC scheme (the bridge Y lives in a smaller space than the original process X, since one of the coordinates is zero by definition). When it comes to rare event estimation, it happens that in the full space of the process the characteristic function of the rare event has support on a region with small probability and this renders direct MC estimation ineffective. However, BMC smoothes out the function to be integrated allowing a more efficient estimation by the MC part.
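A minimal sketch of the BMC estimator (ours, purely illustrative) for the simplest case H = 0.5, where X is a discrete-time standard Brownian motion and paths can be generated from i.i.d. increments without any factorization of Γ. Note that the twist η = 0 (λ = 0) corresponds to plain MC, so, by the comparison result proved in the next section, the sample variance of the smoothed BMC terms should fall below that of the raw MC indicators:

```python
import math
import random

random.seed(7)

T = 30                       # discrete time grid I = {1, ..., T}
mu, b, eps = 0.1, 0.3, 0.15  # drift, threshold, rarity parameter (invented values)
phi = [b + mu * t for t in range(1, T + 1)]
tbar = 3                     # conditioning point, near t* = b/mu for H = 1/2
psi = [min(t, tbar) / tbar for t in range(1, T + 1)]  # psi_t = Gamma(t,tbar)/Gamma(tbar,tbar)
sd_tbar = math.sqrt(tbar)    # std of X_tbar for standard Brownian motion

def tail(x):
    """Phi(x) = P(N(0,1) >= x), the Gaussian tail used by the BMC estimator."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

N = 10000
mc, bmc = [], []
for _ in range(N):
    # a standard Brownian motion sample path on the grid (iid N(0,1) increments)
    X, s = [], 0.0
    for _ in range(T):
        s += random.gauss(0.0, 1.0)
        X.append(s)
    # plain MC: indicator of the overflow event A_eps
    mc.append(1.0 if max(eps * X[t] - phi[t] for t in range(T)) >= 0.0 else 0.0)
    # BMC: bridge Y_t = X_t - psi_t X_tbar, then the smoothed term
    xb = X[tbar - 1]
    Y = [X[t] - psi[t] * xb for t in range(T)]
    Yeps = min((phi[t] - eps * Y[t]) / psi[t] for t in range(T))
    bmc.append(tail(Yeps / (eps * sd_tbar)))

p_mc = sum(mc) / N
p_bmc = sum(bmc) / N
var_mc = sum((v - p_mc) ** 2 for v in mc) / N
var_bmc = sum((v - p_bmc) ** 2 for v in bmc) / N
```

Both estimates are unbiased for the same discrete-grid overflow probability; the BMC terms are smooth values in (0, 1) rather than 0/1 indicators, which is where the variance reduction comes from.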

Rare Events of Gaussian Processes

5

275

Comparison Between IS and BMC Estimators

The main result of the paper is represented by the following theorem, which basically states that, for any choice of the rarity parameter ε, BMC performs better than single-twist IS, even when the change of measure is based on the most likely path ρ*.

Theorem 1. For any ε > 0 and any twist η of the form η_t = λψ_t, t ∈ I, with λ ∈ IR, we have

  σ²_{bmc,ε,t̄} ≤ σ²_{twist,ε,λψ} .                                             (12)

Proof. Since the mean values of the two estimators are the same (both estimators are unbiased), it is enough to prove that

  I_bmc ≜ E[ Φ( Y_ε / (ε √Γ_{t̄ t̄}) )² ] ≤ I_twist .

We can rewrite the l.h.s. as

  I_bmc = E[ ( ∫_{−∞}^{+∞} ( e^{−y²/(2Γ_{t̄ t̄})} / √(2πΓ_{t̄ t̄}) ) 1_{Y_ε ≤ εy} dy )² ]

and then perform the change of variables z = y − α (for an arbitrary α ∈ IR):

  I_bmc = E[ ( ∫_{−∞}^{+∞} ( e^{−z²/(2Γ_{t̄ t̄})} / √(2πΓ_{t̄ t̄}) ) 1_{Y_ε−εα ≤ εz} e^{−αz/Γ_{t̄ t̄} − α²/(2Γ_{t̄ t̄})} dz )² ]

which, by Jensen's inequality, is bounded as

  I_bmc ≤ E[ ∫_{−∞}^{+∞} ( e^{−z²/(2Γ_{t̄ t̄})} / √(2πΓ_{t̄ t̄}) ) 1_{Y_ε−εα ≤ εz} e^{−2αz/Γ_{t̄ t̄} − α²/Γ_{t̄ t̄}} dz ] .

Now we want to rewrite the exponent −2αz/Γ_{t̄ t̄} − α²/Γ_{t̄ t̄} in a different way, taking into account that ⟨Y, ψ⟩_H = 0 almost surely and that |ψ|²_H = Γ_{t̄ t̄}⁻¹ (see Lemma 1 below). Hence

  −2αz/Γ_{t̄ t̄} − α²/Γ_{t̄ t̄} = −2⟨Y + zψ, αψ⟩_H − |αψ|²_H

and, setting η = εαψ (so that λ = εα is an arbitrary real number), we get

  I_bmc ≤ E[ ∫_{−∞}^{+∞} ( e^{−z²/(2Γ_{t̄ t̄})} / √(2πΓ_{t̄ t̄}) ) 1_{Y_ε−εα ≤ εz} e^{−2ε⁻¹⟨Y+zψ, η⟩_H − ε⁻²|η|²_H} dz ] .

Now recall the definition of Y_ε and note that the event {Y_ε − εα ≤ εz} is equivalent to the event {sup_t [ε(z+α)ψ_t + εY_t − ϕ_t] ≥ 0} = {zψ_· + Y_· + ε⁻¹η_· ∈ Aε}, so that

  I_bmc ≤ E[ ∫_{−∞}^{+∞} ( e^{−z²/(2Γ_{t̄ t̄})} / √(2πΓ_{t̄ t̄}) ) 1_{zψ_·+Y_·+ε⁻¹η_· ∈ Aε} e^{−2ε⁻¹⟨Y+zψ, η⟩_H − ε⁻²|η|²_H} dz ] .

Finally, since Y + zψ has the same distribution as the original process X (since Y is its bridge and z is an independent Gaussian random variable with the "right" covariance), we have

  I_bmc ≤ E[ 1_{X_·+ε⁻¹η_· ∈ Aε} e^{−2ε⁻¹⟨X, η⟩_H − ε⁻²|η|²_H} ]
        = E[ 1_{Aε}(X + ε⁻¹η) L^{2ε⁻¹η}(X + 2ε⁻¹η) ] e^{ε⁻²|η|²_H} = I_twist

where we took into account the definition (5) of the likelihood ratio for Gaussian processes. This proves our claim.

Here we prove the auxiliary results needed in the above proof.

Lemma 1. We have ⟨Y, ψ⟩_H = 0 almost surely and ⟨ψ, ψ⟩_H = Γ_{t̄ t̄}⁻¹.

Proof. Let us first prove that ⟨ψ, ψ⟩_H = Γ_{t̄ t̄}⁻¹. By definition, the scalar product on H is such that ⟨Γ_{·a}, Γ_{·b}⟩_H = Γ_{ab}; since ψ_t = Γ_{t t̄}/Γ_{t̄ t̄}, we have the claim:

  ⟨ψ, ψ⟩_H = Γ_{t̄ t̄}⁻² ⟨Γ_{·t̄}, Γ_{·t̄}⟩_H = Γ_{t̄ t̄}⁻¹ .

To prove the second statement, recall that Y is a centered Gaussian process with covariance Γ̃ given by equation (10); then the random variable A = ⟨Y, ψ⟩_H is still Gaussian with mean zero. We will prove that its variance is also zero, so that we can conclude that A = 0 almost surely. Let us compute E[A²]:

  E[A²] = E[ ( Σ_{t,s} [Γ⁻¹]_{ts} Y_t ψ_s ) ( Σ_{t′,s′} [Γ⁻¹]_{t′s′} Y_{t′} ψ_{s′} ) ]
        = Σ_{t,s,t′,s′} [Γ⁻¹]_{ts} ψ_s ψ_{s′} [Γ⁻¹]_{t′s′} E[Y_t Y_{t′}]
        = Σ_{t,s,t′,s′} [Γ⁻¹]_{ts} ψ_s ψ_{s′} [Γ⁻¹]_{t′s′} Γ̃_{t t′}
        = Σ_{t,s,t′,s′} [Γ⁻¹]_{ts} ψ_s ψ_{s′} [Γ⁻¹]_{t′s′} ( Γ_{t t′} − Γ_{t̄ t̄} ψ_t ψ_{t′} )
        = Σ_{s,s′} [Γ⁻¹ Γ Γ⁻¹]_{s s′} ψ_s ψ_{s′} − Γ_{t̄ t̄} ⟨ψ, ψ⟩²_H
        = ⟨ψ, ψ⟩_H − Γ_{t̄ t̄} ⟨ψ, ψ⟩²_H = 0                                     (13)

by the previous result on the H-norm of ψ.
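Both statements of Lemma 1 can be checked numerically for a concrete covariance matrix. The sketch below (ours; helper names and values are invented) takes Γ_ts = min(s, t) on five grid points with t̄ the third one; note that ⟨y, ψ⟩_H vanishes for every path x (not just almost surely), since Γ⁻¹ψ = e_{t̄}/Γ_{t̄ t̄} and the bridge of any path is zero at t̄:

```python
def solve(M, rhs):
    """Solve M a = rhs by Gaussian elimination with partial pivoting (small systems)."""
    n = len(M)
    A = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for j in range(c, n + 1):
                A[r][j] -= f * A[c][j]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        a[r] = (A[r][n] - sum(A[r][j] * a[j] for j in range(r + 1, n))) / A[r][r]
    return a

def inner_H(u, v, Gamma):
    """<u, v>_H = u^T Gamma^{-1} v (reproducing-kernel inner product)."""
    a = solve(Gamma, v)
    return sum(ui * ai for ui, ai in zip(u, a))

grid = [1, 2, 3, 4, 5]                                    # index set I
Gamma = [[float(min(s, t)) for t in grid] for s in grid]  # BM covariance, as an example
ib = 2                                                    # conditioning index: tbar = 3
psi = [Gamma[t][ib] / Gamma[ib][ib] for t in range(len(grid))]

x = [0.7, -1.3, 0.4, 2.1, -0.5]                           # an arbitrary path
y = [x[t] - psi[t] * x[ib] for t in range(len(grid))]     # its bridge Y_t = X_t - psi_t X_tbar

norm_psi = inner_H(psi, psi, Gamma)                       # should equal 1/Gamma[tbar][tbar]
orth = inner_H(y, psi, Gamma)                             # should vanish
```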

Rare Events of Gaussian Processes

277

The Theorem is quite general and inequality (12) holds for any choice of the conditioning point t̄ and any twist η of the form

  η_t = λψ_t = λ Γ_{t t̄} / Γ_{t̄ t̄}    (λ ∈ IR) .

In particular, if the most likely time to overflow t* is chosen as conditioning point (i.e., t̄ = t*), then equation (8) implies that η_t ∼ ρ*_t. Since λ is an arbitrary constant, from the previous Theorem it follows that the BMC estimator (with t̄ = t*) has a lower variance than the single-twist IS estimator with η = ρ*.

6

Simulation Results

In this section we present some of the results achieved by applying the above described techniques to the simulation of a single server queue. As a reference scenario, we consider the many-sources regime, where n i.i.d. Gaussian sources are multiplexed together (i.e., the aggregate traffic has covariance nΓ_{ts}) and the queuing resources are scaled with n. We recall that in this case the overflow probability is still given by equation (3), i.e.:

  p_n ≜ P( Q^(n) ≥ nb ) = P( sup_t (X_t/√n − ϕ_t) ≥ 0 )

where ε = 1/√n and X is a centered Gaussian process with covariance Γ_{ts}. In our simulations, we consider a queue with μ = 0.1 and b = 0.3, fed by fBm traces (with σ0 = 1) for different values of the Hurst parameter. In all cases the trace length is 3·t* (with preliminary sets of simulations, we verified that longer traces do not significantly affect the results) and the estimations are averaged over N = 10⁶ random samples. In the performance comparison, for IS we consider two different changes of measure, corresponding to the most-likely path ρ* and the widely used (see, for instance, [11]) linear path, denoted in the following as IS-MLP and IS respectively. Figure 1 shows that the overflow probability (for the sake of brevity, only the plots for H = 0.7 are shown) decays exponentially as expected, and the slope of the plot (in logarithmic scale) is in accordance with the Large Deviation limit [5,9]:

  − lim_{n→∞} (1/n) log p_n = inf_{t∈I} ϕ²_t / (2Γ_{tt}) = ϕ²_{t*} / (2Γ_{t* t*}) ≜ (1/2)|ρ*|²_H .   (14)

The estimations obtained with IS-MLP and BMC (with t̄ = t*) are quite close, while, as expected, naive IS (i.e., IS with linear path) is less efficient. To confirm the correctness of the previous Theorem, we present the behavior of the Relative Error (i.e., the estimated standard deviation of the estimator divided by p̂_ε) for H = 0.5 (standard Brownian motion – see Figure 2) as well as H = 0.7 (LRD traffic – see Figure 3). In both cases, the relative position of the IS-MLP and BMC curves agrees with the theoretical results, while naive IS introduces bigger errors for H = 0.7. Heuristically, this is explained by the fact
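For fBm with σ0 = 1 and linear drift ϕ_t = b + μt, the most-likely time minimizing ϕ²_t/Γ_tt has the closed form t* = (H/(1−H))·(b/μ) (so t* = 7 for the values used here), and the Large Deviation rate of (14) follows by evaluation at t*. The sketch below (ours, purely illustrative) verifies the closed-form minimizer by brute force:

```python
def decay_exponent(t, mu=0.1, b=0.3, H=0.7):
    """phi_t^2 / (2 Gamma_tt) with phi_t = b + mu*t and Gamma_tt = t**(2H) (sigma0 = 1)."""
    return (b + mu * t) ** 2 / (2.0 * t ** (2.0 * H))

H, mu, b = 0.7, 0.1, 0.3
t_star = (H / (1.0 - H)) * (b / mu)      # stationarity of log decay_exponent gives this form
rate = decay_exponent(t_star, mu, b, H)  # = phi_{t*}^2 / (2 Gamma_{t* t*}) = |rho*|_H^2 / 2

# brute-force check on a fine grid t in (0, 50]
grid = [0.01 * i for i in range(1, 5001)]
t_grid = min(grid, key=lambda t: decay_exponent(t, mu, b, H))
```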

278

S. Giordano, M. Gubinelli, and M. Pagano 1 BMC IS-MLP IS 0.01

OverflowProbability

1e-04

1e-06

1e-08

1e-10

1e-12

1e-14 100

200

300

400

500

600

700

800

900

n

Fig. 1. Overflow Probability estimates for H = 0.7 1000

(Figure: relative error, log scale, versus n from 100 to 900; curves: BMC, IS = IS-MLP, BMC-1/2MLT, BMC-2MLT.)

Fig. 2. Relative Error estimates for H = 0.5

that we did not use the additional information about the most probable way in which overflow is reached. It is worth noticing that for H = 0.5, IS and IS-MLP are equivalent: indeed, in the case of standard Brownian motion, the most likely path is linear.

Rare Events of Gaussian Processes

279

10000

RelativeError

1000

100

10 BMC IS-MLP IS BMC-1/2MLT BMC-2MLT 1 100

200

300

400

500 n

600

700

800

900

Fig. 3. Relative Error estimates for H = 0.7

Finally, Figures 2 and 3 also compare the performance of BMC for different choices of the conditioning point: t̄ = t* (BMC), t̄ = (1/2)·t* (BMC-1/2MLT) and t̄ = 2·t* (BMC-2MLT). As expected, in accordance with the Large Deviation interpretation, the most likely time to overflow t* appears to be the best choice for t̄. BMC estimators, for a wrong choice of the conditioning point, may behave worse than IS with MLP; this result is quite intuitive and not in contrast with Section 5. Indeed, when t̄ ≠ t*, ψ_t is not proportional to ρ*_t and the Theorem only refers to changes of measure of the form η_t = λψ_t.

7

Conclusions

It is well known that the single-twist IS approach is not the best technique to tackle the problem of rare events for Gaussian traffic. This was the main motivation that led us to propose the Bridge Monte-Carlo (BMC) estimator, a novel approach that exploits the Gaussian nature of the input process and relies on the properties of bridges. The computational cost of BMC is comparable to that of single-twist IS (the same random samples are generated during the simulation runs) and in this paper we proved that BMC performs better than IS in terms of estimator variance. The theoretical results are also confirmed by various sets of simulations, which in addition highlight that naive IS (with linear path to overflow) has the worst performance among the analyzed speed-up techniques. Finally, it is important to point out that the principle underlying the BMC method can be applied to any Gaussian process (only the knowledge of the


covariance function is required in order to estimate Y_ε) and in a wide variety of network systems (e.g. tandem queues, schedulers, etc.), and could be generalized with more than one conditioning point or with a dynamic choice of the parameters. Acknowledgments. The authors would like to acknowledge support from the Network of Excellence EuroFGI (Design and Engineering of the Future Generation Internet).

References

1. Dębicki, K., Mandjes, M.R.H.: Exact overflow asymptotics for queues with many Gaussian inputs. Technical Report PNA-R0209, CWI, the Netherlands (2002)
2. Narayan, O.: Exact asymptotic queue length distribution for fractional Brownian traffic. Advances in Performance Analysis 1(1), 39–63 (1998)
3. Norros, I.: Studies on a model for connectionless traffic, based on fractional Brownian motion. In: Conference on Applied Probability in Engineering, Computer and Communications Sciences, Paris, INRIA/ORSA/TIMS/SMAI (1993)
4. Baldi, P., Pacchiarotti, B.: Importance Sampling for the Ruin Problem for General Gaussian Processes. Technical report (2004)
5. Dieker, A.B., Mandjes, M.R.H.: Fast simulation of overflow probabilities in a queue with Gaussian input. ACM Trans. Model. Comput. Simul. 16(2), 119–151 (2006)
6. Dupuis, P., Wang, H.: Importance Sampling, Large Deviations and differential games. Technical report, Lefschetz Center for Dynamical Systems, Brown University (2002)
7. Giordano, S., Gubinelli, M., Pagano, M.: Bridge Monte-Carlo: a novel approach to rare events of Gaussian processes. In: Proc. of the 5th St. Petersburg Workshop on Simulation, St. Petersburg, Russia, pp. 281–286 (2005)
8. Heidelberger, P.: Fast Simulation of Rare Events in Queueing and Reliability Models. Performance Eval. of Computer and Commun. Syst. 729, 165–202 (1993)
9. Addie, R., Mannersalo, P., Norros, I.: Most Probable Paths and performance formulae for buffers with Gaussian input traffic. Eur. Trans. on Telecommunications 13 (2002)
10. O'Connell, N., Procissi, G.: On the Build-Up of Large Queues in a Queueing Model with Fractional Brownian Motion Input. Technical Report HPL-BRIMS-98-18, BRIMS, HP Labs, Bristol (U.K.) (1998)
11. Huang, C., Devetsikiotis, M., Lambadaris, I., Kaye, A.: Modeling and simulation of self-similar variable bit rate compressed video: a unified approach. In: Proc. of SIGCOMM'95, Cambridge, US, pp. 114–125 (1995)

A Forwarding Spurring Protocol for Multihop Ad Hoc Networks (FURIES)

Helena Rifà-Pous and Jordi Herrera-Joancomartí

Universitat Oberta de Catalunya, Rb. del Poble Nou, 156, 08018 Barcelona
{hrifa,jordiherrera}@uoc.edu

Abstract. The functioning of an ad hoc network is based on the supportive contributions of all of its members. Nodes behave as routers and take part in route discovery and maintenance. In this paper, a forwarding protocol is presented that stimulates node cooperation to improve the throughput of the network. The incentive mechanism is provided through a micropayment protocol that takes into account the degree of cooperation of the users. The most cooperative nodes intrinsically benefit from the best routes and availability, and they take precedence over selfish ones.

Keywords: Multihop ad hoc networks, cooperation, forwarding, payment.

1 Introduction

The functioning of an ad hoc network is based on the supportive contributions of all of its members. Nodes cooperate to form a communication infrastructure that extends the wireless transmission range of every terminal without using any dedicated network device. To ensure and spur the cooperative behavior of ad hoc network members, an incentive mechanism is required that regulates the resources spent on and given to the community.
Protocols to stimulate cooperation can be divided into two groups: reputation-based and credit-based¹. The former treat packet forwarding as an obligation and isolate and punish those nodes that do not behave as expected, while the latter consider it a service that can be valued and charged for.
Reputation-based schemes define a method for keeping track of nodes' actions in order to distinguish reliable from unreliable nodes [2,3,4,5]. The main problem of this approach is distinguishing misbehaving nodes from those that cannot retransmit packets due to energy constraints, channel fading or simply natural disconnections. The assumption that a node must always forward all the packets it receives is too strong for a network formed of, among others, small and handheld

¹ For a detailed comparison of different cooperation protocols we refer to [1], which summarizes the most relevant proposals.

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 281–293, 2007. © Springer-Verlag Berlin Heidelberg 2007


devices. On the other hand, nodes at some strategic points of the network will receive more transmission requests than those on the periphery, and it would be unfair to punish them if they cannot carry all the traffic.
In credit-based schemes, virtual currency is introduced to stimulate each node to behave cooperatively. Nodes that generate traffic have to pay those that help forwarding the data. In this category, a distinction can be made regarding the nature of the payment: money-based schemes and token-based schemes.
Money-based schemes [6,7,8] use money as the payment token. The drawback of this kind of currency model is that the costs of managing financial information carry a considerable legal and administrative overhead. Furthermore, the minimization of selfish nodes is not guaranteed, since users without economic concerns can behave selfishly in the net and pay whatever is needed to have their packets transmitted.
Token-based schemes generally require the nodes to keep a balanced number of packets transmitted and relayed [9,10]. Nodes increase the number of stored tokens when they forward packets, and decrease them proportionally to the number of hops when sending messages. A node must forward packets until it earns enough to send its own, so this kind of protocol can sometimes limit the capacity of the network if the average token level is too low. On the other hand, if it is too high, tokens no longer constitute an incentive to cooperate and the mechanism no longer fulfills its purpose.
Present research in credit-based mechanisms is basically focused on how much a node should be paid for forwarding messages. One research direction is finding a fair incentive algorithm that rewards the users for the resources used in the forwarding connection [10,11,12]. The circumstances and resources employed by relaying parties (battery level, transmission energy, position within the network topology, mobility, bandwidth, ...) are considered to calculate the cost of a certain path. Although theoretically this kind of algorithm is very attractive, it is too complex for mobile ad hoc networks. The real cost of a transmission changes for every transferred packet, so the overhead involved in sending a message is barely affordable. Overly demanding protocols may provoke the contrary effect on the users, who may become unwilling to participate in the network.
In this paper, we present a Forwarding Spurring Protocol for Multihop Ad Hoc Networks (FURIES), a simple credit-based scheme that provides incentives to selfish mobile nodes to cooperate. The proposed protocol seeks to foster the traffic through a fair protocol, but instead of trying to pay for the resources spent in a connection, it rewards constantly collaborative nodes with a high quality of service. An evaluation of the system through a simulation analysis is also presented.
The contributions of the proposal are the following. In contrast to previous approaches, which try to spur the system through a payment model that rewards the nodes based on their utility function, FURIES uses a payment protocol to categorize nodes' behavior. Users are prone to collaborate in order to obtain a better quality of service. One of the novelties of this protocol with respect to the


previous ones proposed in the literature is that it introduces an incentive factor to reward the forwarding of packets from highly ranked users.
The paper is organized as follows. In section 2 we introduce the protocol and give an overview of the proposed architecture. Section 3 describes the protocol details and analyzes some interesting aspects of spurring traffic in multihop networks. Section 4 evaluates the solution based on simulation results. Finally, we conclude the paper in section 5.

2 FURIES General Description

We present in this section the general description of our Forwarding Spurring Protocol for Multihop Ad Hoc Networks (FURIES). FURIES is a credit-based protocol that combines properties of both credit-based and reputation-based incentive models. On one hand, it uses payment mechanisms to charge/reward the forwarding of packets through the net. On the other hand, it manages user reputation status to separate reliable from unreliable nodes. Packets of both high and low reputed nodes are prone to be sent; however, nodes with higher reputation take preference in getting their data forwarded, i.e. they enjoy a better quality of service.
The exchange currency used in the FURIES payment protocol is not money but credit to transmit data. The unit of credit is a token that represents one packet of 2346 bytes². Credit tokens exchanged in a transmission session are used to state the reputation of a user and categorize its involvement in the net. Nodes that generate traffic lose tokens and reputation, while the ones that forward it gain them. However, payments and collections are not balanced. The cost of sending a packet depends on the hop distance to the destination. On the other hand, the reward is based on the credit level of the sender, that is, its participation status. Thus, nodes earn more credits for forwarding packets of highly reputed and credited users.

2.1 FURIES Entities

In this paper we consider a user that wants to connect to another one who is not in his transmission range, so a multihop route has to be established. We assume a routing protocol that provides information about available routes. As opposed to other credit-based protocols for ad hoc networks, FURIES does not require that the source node know the complete path to the destination, but only the hop distance. FURIES will stimulate the transmission through the discovered routing paths.
Credit-based schemes require the use of tamper-proof hardware or a trusted third party (TTP) to manage the tokens. We make use of a TTP to securely store the credit account of nodes and give memory to the system, that is, credits earned or spent in a session are taken into consideration beyond the lifetime of a particular ad hoc network.

² The maximum size of an IP packet over an 802.11 network is 2346 bytes [13].


FURIES architecture is composed of the following entities:
– Certification Authorities (CA) that issue identity certificates for the participants of ad hoc networks. The recognized CAs are the ones accepted in the Internet community that follow some established security policies.
– Reputation Authority (RpA), a TTP that is used to manage the users' credit accounts. Such information is contained in a reputation certificate that will be implemented as an attribute certificate according to the X.509 standard.
All users in our model are registered with a well-known CA that issues them a certificate which binds their identity with their public key. With this certificate, users can sign on to the RpA that will manage their credits. The RpA is an independent entity not related to any specific CA. It can deal with CAs of different providers as long as it accepts their certification policies. Moreover, the RpA does not need to be centrally controlled but can be a distributed entity under the control of a world-wide community.
Reputation certificates are used to classify users and fix the rewarding credits of a forwarding. For this reason it is important that these certificates hold updated information at any time. Therefore, reputation certificates are short-lived certificates, with a validity that we fix at 10 days.
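As an illustration, the content of such a short-lived reputation certificate can be sketched as a small Python data structure. This is a hypothetical sketch: the field names are ours and do not follow the X.509 attribute-certificate encoding; only the attributes named in the text (credits, incentive factor, serial number, 10-day validity) are modeled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# The 10-day validity period fixed in the text.
VALIDITY = timedelta(days=10)

@dataclass
class ReputationCertificate:
    serial_number: str       # SN_A0, later referenced in forwarding contracts
    credits: int             # accumulated credits c at issue time
    incentive_factor: float  # IF(c), precomputed by the RpA
    issued_at: datetime

    def is_valid(self, now: datetime) -> bool:
        # Short-lived: the certificate expires 10 days after issue.
        return self.issued_at <= now < self.issued_at + VALIDITY
```

For example, a certificate issued on September 1 would still validate on September 5 but not on September 12.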

2.2 Incentive Factor IF

The FURIES protocol introduces an Incentive Factor (IF) element to prioritize the forwarding of packets from collaborative nodes and thus provide them a good quality of service. Nodes do not need to pay more to receive a better service; the incentives a router receives to forward a packet are intrinsically stated in the protocol based on the profile of each payer. The incentive factor modulates the credits (c) that an intermediate node has to receive for its job such that c = IF · d, where IF is the incentive factor of the sender node, and d is the number of transmitted packets.
We have designed the incentive factor as a function of the credits such that it asymptotically tends to 0 when the credit balance of a user grows in negative values, and increases polynomially otherwise. Since the amount of data transmitted in ad hoc networks can range from a few Kb, when the devices are very small and limited, up to hundreds of Mb, when the net has access to the Internet, the gradient of the incentive factor function is bigger for values around 0 (see figure 1(a)). This allows us to clearly distinguish between selfish and unselfish nodes. The IF function on the credits is the following:

IF(c) = A · |c|^(sign(c)/B)

Through simulations we have heuristically approximated the two values A and B, resulting in A = 1/2 and B = 10 (see figure 1(b)):

IF(c) = 1/2 · |c|^(sign(c)/10),   −10⁹ < c < 10⁹   (1)

Fig. 1. Incentive factor function: (a) IF for low range credits; (b) IF function (log axis)

The charges and rewards of a transmission are not balanced, so we have limited the range of the accumulated credits to [−10⁹, 10⁹] in order to avoid the saturation of a node in an extreme position.
When the credit rate of a user is 0, its incentive factor is A = 1/2, which is lower than 1. This discourages users from indiscriminately registering themselves under a new identity to reset their record. The neutral incentive factor (IF = 1), that is, the point at which a forwarder receives the same amount of credits for a carried packet as it would have to pay if it initiated a transaction, is reached when the accumulated credit of a user is 10³ packets, which is a little more than 2.2 MBytes of data.
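Equation (1) is straightforward to evaluate; the following Python sketch (a minimal illustration, taking IF(0) to be the value A = 1/2 stated above) reproduces the neutral point at roughly 10³ credits:

```python
def incentive_factor(c: float, A: float = 0.5, B: float = 10.0) -> float:
    """IF(c) = A * |c|**(sign(c)/B), equation (1) with A = 1/2, B = 10,
    defined on -10**9 < c < 10**9. IF(0) is taken as A, the paper's
    value for a zero-credit user."""
    if c == 0:
        return A
    sign = 1.0 if c > 0 else -1.0
    return A * abs(c) ** (sign / B)
```

With these parameters IF(0) = 0.5, IF(10³) ≈ 1 (the neutral point), and the factor decays toward 0 as the balance grows more negative.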

3 FURIES Credit-Based Protocol

FURIES stimulates cooperation through a credit mechanism that regulates nodes' transmissions based on their reputation. In this section we detail this mechanism, which can be divided into three phases:
– Initialization phase
– Contract establishment and communication, driven by a micropayment scheme
– Charging and rewarding phase

3.1 Initialization

In order to initiate a transmission in a multihop network a user needs to hold a reputation certificate that states its forwarding parameters. In particular, the reputation certificate sets two main attributes:
– Credits (c): Accumulated user credits at the time of certificate generation.
– Incentive Factor (IF): The result of applying the IF equation (1) to c.


When a user first requests a certificate from an RpA he is issued a certificate with c = 0 and IF = 1. His IF will be 1 until the user starts transmitting data or his accumulated credit is equivalent to an incentive factor greater than 1. We give new users an IF of 1 so as not to prejudice their first transactions. At the same time, we spur users first to give resources to the net and only then take the profit.

3.2 Micropayment Scheme

The micropayment scheme we use in this paper is highly inspired by PayWord [14], a light protocol that allows offline verification of the payment proofs. The micropayment protocol is divided in two parts: Contract Establishment and Data Transmission. Figure 2 depicts all the steps.

Contract Establishment
When node A0 wants to send data to node An, assuming the path will go through nodes A1, · · · , An−1:
1. A0 generates payment tokens in the following way: node A0 generates a long fresh chain of paywords w0, w1, ..., wm by choosing w0 at random and by applying a hash function h iteratively such that wj = h(wj−1) for j = 1, 2, · · · , m, where m is the maximum number of possible payments during the session.
2. A0 prepares a contract offer. The offer includes the sender and receiver identifiers, IA0, IAn, the serial number of the sender's reputation certificate, SNA0, and its validity period V, the number of hops of the route n, and the top hash chain value wm:
Offer = {IA0, IAn, SNA0, V, n, wm}
3. Node A0 sends a forwarding request towards An that contains its digital signature of the contract offer together with its reputation certificate RCertA0:
ReqA0 = {SA0[Offer], RCertA0}
4. The request is read by the intermediate nodes of the path (A1, · · · , An−1). If they are not interested in forwarding the packet, because for them the expense is not worthwhile, they send a reject response to A0. Otherwise, they enclose in the request a signed attachment with information about their identity:
ReqAi = {ReqAi−1, SAi[ReqAi−1], IAi}   for i = 1, · · · , n − 1
After forwarding an offer request, a node Ai waits (n − i) · timeout seconds for a response, either positive or negative, from Ai+1. If it does not arrive, it sends a break-up-chain message to A0.
5. Node An receives the request of transmission from user A0 along with the information of the relaying parties Ai for i = 1, · · · , n − 1. An verifies the signatures and checks that the number of hops stated in the contract offer is at most n.

Fig. 2. Micropayment protocol

6. If all data is correct and node An accepts the transmission from A0, it generates a contract with the data of the received offer and an appendix with the list of recruited routing nodes, and signs the overall information. It sends the contract to node A0 using the same bidirectional path as the one used in the reception:
RepAn = Contr = SAn[ReqAn−1]
7. All routing nodes keep a copy of the contract.

Data Transmission
At the end of the contract set-up phase, data transmission can be started.
8. If A0 wants to send d packets of data to An, it will transmit to A1 the information along with a payment check. The payment check consists of the d next hash values of the chain; in fact, presenting the highest hash is enough. For instance, for the first d packets A0 has to send the chain value wm−d:
info = {packets, wm−d}
9. A1 verifies the payment, checking that wm = h^d(wm−d), where d is obtained from the number of transmitted packets. A1 keeps a copy of the wm−d value and forwards the info to the next node. This operation is performed at each intermediate node Ai, for i = 1, · · · , n − 1.
10. Finally, An obtains the packet info.
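The payword chain of step 1 and the offline verification of step 9 can be sketched in a few lines of Python. This is an illustrative fragment: the paper does not fix a concrete hash function h, so SHA-256 is our assumption here.

```python
import hashlib

def h(x: bytes) -> bytes:
    # The hash applied at each chain step (SHA-256 is our choice;
    # the protocol only requires a one-way hash function).
    return hashlib.sha256(x).digest()

def make_chain(w0: bytes, m: int) -> list:
    # Build w_0 .. w_m with w_j = h(w_{j-1}); the top value w_m
    # is the one committed to in the contract offer.
    chain = [w0]
    for _ in range(m):
        chain.append(h(chain[-1]))
    return chain

def verify_payment(w_m: bytes, proof: bytes, d: int) -> bool:
    # A router accepts payment for d packets iff w_m == h^d(proof),
    # i.e. the proof is the chain value w_{m-d}.
    x = proof
    for _ in range(d):
        x = h(x)
    return x == w_m
```

For a chain of m = 100 paywords, the proof for the first 5 packets is the chain value w₉₅; any node holding the committed top value w₁₀₀ can check it with 5 hash evaluations and no online contact with the sender.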

3.3 Charging and Rewarding Model

Charging and rewarding is performed using a protocol between the routing nodes involved in the transmission and the reputation authority, RpA. This phase must be executed any time after the data transmission session and within the validity period of the contract, when the nodes have an online connection with the RpA.
It is important to notice that the possession of a payment proof by a node Ai does not entail that Ai has forwarded the data, just

288

H. Rif` a-Pous and J. Herrera-Joancomart´ı

that it has received it. However, if Ai has the payment proof, it is clear that Aj for 0 ≤ j ≤ i − 1 indeed forwarded the packets. For that reason, when a routing node Ai with i ≠ n reports a payment proof to the RpA, it only receives half of the full router rate, while the lower nodes of the path are completely rewarded. Only when the destination node of a packet, An, sends payment proofs to the RpA is it evidenced that the data has been delivered, and then all intermediate nodes are rewarded. In order to stimulate destination parties to send the proofs, they are also rewarded with a tax of 1 credit for each packet they demonstrate they have received. The detailed protocol is the following:
1. When node Ai wants to get paid for its forwarding services, it sends to the RpA the forwarding contract, Contr, and the payment proof wk, where k = m − d, m being the maximum number of packets that can be transmitted within that session, and d the number of forwarded packets.
2. The RpA verifies that h^d(wk) = wm, which ensures that the token is valid. The RpA obtains the value wm from Contr, where the value is signed by the sender node A0 and thus assumed authentic.
3. Then, the RpA executes the following procedure:
– If no proof wk has been previously presented by any node, then the RpA adds (IFA0 · d) credits to each node Aj for 1 ≤ j < i and (1/2 · IFA0 · d) credits to Ai, in case i ≠ n. If i = n (i.e. the reporter is the destination node) then An is rewarded with d credits. In any case, the RpA also deducts (i · d) credits from A0's credits.
– If Aj, for some 1 ≤ j < i, has already presented the proof wk to the RpA, then the RpA adds (1/2 · IFA0 · d) credits to Aj, (IFA0 · d) credits to each node Ak for j + 1 ≤ k < i, and (1/2 · IFA0 · d) credits to Ai, in case i ≠ n. If i = n, then An is rewarded with d credits. In any case, the RpA also deducts ((i − j) · d) credits from A0's credits.
– If Aj, for some i < j ≤ n, has already presented the proof wk to the RpA, then the RpA informs Ai that it has already been rewarded for this operation.
Since the incentive factor of a node can suffer changes in short periods of time, the rewarding IFA0 to be used in step 3 is the one stated in the reputation certificate whose serial number SNA0 appears in the forwarding contract Contr. However, when the transmission path is short (n < 5), the rewarding IFA0 cannot exceed 1. This prevents fake nodes from creating looping traffic between them in order to increase their credit. Then,

Rewarding IFA0 = 1, if n < 5 and IFA0 > 1; IFA0, otherwise.

It has to be noted that the charging and rewarding model we propose is unbalanced; hence, it faces a problem of credit saturation when all nodes achieve the maximum credit level. This congestion leads the system to work as if it were a plain model that can neither prioritize transmission packets to provide a quality

Fig. 3. Illustration of the payment charges

of service, nor offer any real incentive to the routing nodes to spur the data forwarding. To avoid this case, the RpA maintains a sliding window for each user that inspects the accumulated amount of data forwarded by each of them during the last 30 days. If the result does not exceed 1% of his forwarding credit, this credit is reduced by 1% for every day that passes under these conditions.
Figure 3 illustrates the charging and rewarding model with an example. Nodes only transmit packets from initiators whose incentive factor is greater than a threshold. Node A0 has two connection routes to node A5; however, it cannot use the shortest one to send data to A5 because its reputation value is not high enough to encourage the intermediate nodes of this path to forward its packets. Nodes in the shortest path are centrally located in the network, receive a lot of forwarding requests, and only relay packets of users who are very collaborative and have a high reputation level. As a result, A0 has to select the longest path for the transmission, which is more expensive (it costs 5 credits/packet instead of 3 credits/packet), but offers the required availability.
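The step-3 procedure above can be sketched as a function computing the credit deltas produced by a single report. This is an illustrative Python fragment with hypothetical names (not from the paper); the short-path cap on IFA0 is assumed to have been applied by the caller.

```python
from typing import Optional

def settle(i: int, n: int, d: int, IF0: float,
           prev_j: Optional[int] = None) -> dict:
    """Credit deltas (node index -> credits; index 0 is the sender A_0)
    when node A_i presents the payment proof for d packets to the RpA.
    prev_j is the index of the node that already reported this proof,
    or None if the proof is new."""
    if prev_j is not None and prev_j >= i:
        return {}  # a node further down already reported: A_i was rewarded
    deltas = {}
    if prev_j is not None:
        deltas[prev_j] = 0.5 * IF0 * d  # earlier reporter topped up to full rate
    first_unpaid = 1 if prev_j is None else prev_j + 1
    for k in range(first_unpaid, i):
        deltas[k] = IF0 * d  # provably forwarding intermediates: full rate
    # The reporter gets half rate, except the destination, which gets a
    # flat d credits as its incentive to report.
    deltas[i] = d if i == n else 0.5 * IF0 * d
    # The sender is charged d credits per newly rewarded hop.
    deltas[0] = -(i - (0 if prev_j is None else prev_j)) * d
    return deltas
```

For instance, with n = 5, d = 10 and IFA0 = 0.8, a first report by A3 yields 8 credits each for A1 and A2, 4 for A3, and a 30-credit charge to A0; a later report by the destination A5 adds 4 to A3, 8 to A4, 10 to A5, and charges A0 another 20.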

4 Evaluation

Simulations of FURIES were conducted to evaluate the general characteristics of the protocol and provide a proof of concept. We used a self-developed application that considers network-layer factors and allows us to make qualitative appraisals. However, we do not model the problems of the physical and link layers, so quantitative performance figures cannot be directly extracted from the tests.
We simulated two different payment models: a plain payment protocol without incentives, such as [9] (that is, sending one packet through 3 hops costs 3 credits, and the intermediate nodes get 1 credit each), and the proposed FURIES protocol with the incentive factor defined in section 2.2. The simulated networks are composed of 100 nodes that move randomly in a square area of 1000 m². The transmission range is 70 m. Each node starts, on average, 2 transmissions a day, of messages whose size is uniformly distributed from 1 Kb to 10 Mb. The application is run over a simulation period of a year. 100 simulation runs have been performed.
Table 1 compares the results of a population attempting to send data through a multihop network, giving the mean and variance over 100 simulations. We have

Table 1. Forwarding Simulation: Plain protocol vs. FURIES

                                                   Plain protocol          FURIES
Ratio of accepted transmissions                    E(X) = 69%, σ² = 0.95   E(X) = 83%, σ² = 1.82
Reputation level of accepted senders vs. average   E(X) = 0%, σ² = 4.64    E(X) = 8%, σ² = 0.64
Reputation level of rejected senders vs. average   E(X) = 0%, σ² = 5.23    E(X) = −12%, σ² = 2.59

modeled the users' willingness to forward packets based on their available resources (i.e. battery level) and the profit they can make from the action. Relaying parties do not transmit if the battery level is below 20%. Our assumption is that between 20% and 50% they will resend packets if they obtain a credit rate over the cost price, in particular a benefit of more than 30%. If the remaining battery is above 50%, users will transmit if the reward is at least 90% of what they would charge. Regardless of the battery level, we also assume that nodes with a negative credit balance will accept any forwarding request. When a forwarding request is rejected, the initiator has to search for another routing path; it tries up to five times.
First of all, it has to be noted that the number of accepted transmissions in FURIES is greater than in the plain payment protocol. This is one of the goals of incentive protocols, and FURIES achieves it. By offering appropriate incentives (a good reputation status that, as we state in the next point, provides quality of service), FURIES can exploit the maximum forwarding capacity of nodes and thus improve the overall throughput of the network.
The service of forwarding packets is rewarded with credits, and the accumulation of credits increases the reputation status. The second and third rows of Table 1 show that in plain mode the reputation level of the nodes whose packets are accepted or rejected is not relevant, since its average is the same as that of the rest of the population. That is, regardless of its accumulated credits, the sending of any node can be blocked. Nevertheless, in FURIES the accepted traffic comes from users who hold a better profile (8% better than the average), and the rejected traffic from nodes that tend to behave more selfishly (their reputation is 12% worse than the average). Hence the connectivity of cooperative nodes takes priority, and such users receive a better quality of service.
FURIES spurs cooperation, but does not enforce it. There are multiple reasons for which a node may not collaborate at a given moment (lack of resources, bandwidth, ...). What is not acceptable is a continuously selfish behavior, and thus it is penalized. Moreover, when users enter the FURIES system, they start with a negative reputation level in order to prevent sybil attacks that would cause the unfair exploitation of the system. In general, the advantage of FURIES over other credit-based mechanisms [8,6,7] is that credits have a double use: they are the exchange currency of the payment protocol and, moreover, the hook that attracts nodes to relay packets of certain users. The accumulation of credits is rewarded, and because credits cannot be obtained by external means, users have to provide resources to the net if they want to benefit from its services.
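The simulated willingness model can be sketched as a decision function. This is an illustrative Python fragment; the interpretation of "what they offer" as the relay's own reference price per packet is our assumption, and the thresholds are the paper's modelling parameters, not protocol constants.

```python
def willing_to_forward(battery_pct: float, reward: float, cost: float,
                       credit_balance: float) -> bool:
    """Relay decision model used in the evaluation: reward is the credit
    offered per packet, cost the relay's own reference price."""
    if credit_balance < 0:
        return True                  # indebted nodes accept any request
    if battery_pct < 20:
        return False                 # battery below 20%: never forward
    if battery_pct <= 50:
        return reward > 1.30 * cost  # 20-50% battery: demand a >30% benefit
    return reward >= 0.90 * cost     # above 50%: accept >= 90% of the price
```

For example, a node at 30% battery forwards only if offered more than 1.3 times its price, while the same node at 80% battery accepts offers down to 90% of it.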

Fig. 4. Forwarding response of an ad hoc network (x-axis: x = Threshold/Router_IF; bars: packets delivered (%); lines: accepted and rejected senders' credit vs. average credit (%); arrows: credit trend)

The evolution of an ad hoc network depends on the behavior of each of its members and how they react to the proposed incentives. We made a simulation of FURIES to analyze the performance of a network relative to the threshold used to trigger the forwarding services. We assume nodes always refuse to forward when their battery level is below 20% of its capacity. Otherwise, they accept the transmission if the incentive factor of the initiator stated in its reputation certificate, IFA0, exceeds their own IF by a certain factor x, that is, IFA0 ≥ IFAi · x, where Ai is the forwarding node.
Figure 4 shows the results of the simulation based on the parameter x, that is, the quotient between the triggering threshold and the incentive factor of the forwarding node. The background columns of the figure depict the percentage of packets accepted for transmission. It is shown that the throughput of the network is nearly constant whatever the threshold. However, when we harden the condition and require that IFA0 be equal to the IFAi of the forwarding node (x = 1), the throughput drops to 58%. If we incremented the threshold a little more, the throughput would continue to fall toward 0%. With this result it may seem that the best x to choose is a low one. However, for very low values of x we cannot offer quality of service: the probability of getting a packet rejected is nearly the same for all kinds of nodes. It is worth noting the lines in the figure that show the relation between the reputation level of the nodes whose packets are accepted or rejected, and the average level. The more these lines are separated, the better the quality of service offered, because the reputation of a node has more influence on the forwarding acceptance decision. Moreover, the figure depicts with black arrows the credit storage trend of a group of nodes whose initial credit level was 0.
It is shown that when x is low, the credit storage of the group tends to decrease, so in the long run people will not have credits to transmit. Therefore, there is a compromise to get the best results. Setting thresholds to low values increases the performance in the short term, but the network gets unhealthy: fewer credits, no quality of service, and so, in the end, less motivation


to do the forwarding. On the other hand, high thresholds can reduce the throughput of the net. Consequently, there is no fixed optimum threshold: it depends on the resources of the node, its eagerness to transmit and hence its need to obtain credits, etc. The threshold is a variable that has to be adjusted in every case to get the expected reactions. However, the adjustment can be done automatically to meet the requirements of a specific environment.

5 Conclusions

The work presented in this paper describes a simple solution to stimulate cooperation in multihop networks and provides a proof of concept. We have analyzed the protocol and, by means of simulation, we have evaluated the functionality of the system based on its configurable parameters. The results show that FURIES fulfills its objectives: it improves the throughput of the network and reinforces the quality of service for collaborative nodes. In terms of future work, we plan to study the performance of the protocol in real environments, evaluate its overhead in terms of energy consumption and delay, and compare it quantitatively and qualitatively with other incentive mechanisms.

Acknowledgement. The work described in this paper has been supported by the Spanish MCYT through a grant for the project PROPRIETAS-WIRELESS SEG2004-04352-C04-04.

References
1. Marias, G.F., Georgiadis, P., Flitzanis, D., Mandalas, K.: Cooperation enforcement schemes for MANETs: A survey. Wirel. Commun. Mob. Comput. 6, 319–332 (2006)
2. Buchegger, S., Le Boudec, J.-Y.: Nodes bearing grudges: Towards routing security, fairness, and robustness in mobile ad hoc networks. In: Euromicro Workshop on Parallel, Distributed and Network-based Processing (PDP) (2002)
3. Michiardi, P., Molva, R.: Core: A COllaborative REputation mechanism to enforce node cooperation in Mobile Ad Hoc Networks. Institut Eurecom, RR-02-062 (2001)
4. Rebahi, Y., Mujica, V., Simons, C., Sisalem, D.: SAFE: Securing packet Forwarding in ad hoc networks. In: Workshop on Applications and Services in Wireless Networks (2005)
5. He, Q., Wu, D., Khosla, P.: SORI: A Secure and Objective Reputation-based Incentive Scheme for Ad-hoc Networks. In: IEEE Wireless Communications and Networking Conference (2004)
6. Buttyán, L., Hubaux, J.-P.: Nuglets: a virtual currency to stimulate cooperation in self-organized ad hoc networks. Tech. Rep. DSC (2001)
7. Anderegg, L., Eidenbenz, S.: Ad hoc-VCG: a truthful and cost-efficient routing protocol for mobile ad hoc networks with selfish agents. In: Mobile Computing and Networking (MobiCom), pp. 245–259. ACM Press, New York (2003)
8. Zhong, S., Chen, J., Yang, Y.R.: Sprite: A simple, cheat-proof, credit-based system for mobile ad hoc networks. In: IEEE INFOCOM, vol. 3, pp. 1987–1997. IEEE Computer Society Press, Los Alamitos (2003)

A Forwarding Spurring Protocol for Multihop Ad Hoc Networks (FURIES)



Direct Conversion Transceivers as a Promising Solution for Building Future Ad-Hoc Networks

Oleg Panfilov¹, Antonio Turgeon¹, Ron Hickling¹, and Lloyd Linder²

¹ Technoconcepts, 6060 Sepulveda Blvd., Van Nuys, CA 91411, USA
² Consultant
{panfilov, tony, ronh, llinder}@technoconcepts.com

Abstract. A potential solution for building ad-hoc networks is described, based on Technoconcepts' TSR chipset, which provides direct conversion of RF signals to baseband. These RF/DTM chips convert received signals into digital form immediately after the antenna, making it possible to perform all required signal processing digitally and thus allowing adaptive selection of frequency bands free of interference. This flexibility provides the most favorable conditions for quality communications. The offered solution has its own set of challenges; the paper describes the major ones and potential ways of addressing them. An example of a possible ad-hoc network architecture based on the RF/DTM chips is presented.

Keywords: Direct RF conversion, dynamic spectrum allocation, opportunistic networks, network connectivity, frequency agility, protocol independence.

1 Introduction

Ad-hoc wireless networks (AWN) show great promise in meeting stringent communication requirements for system self-organization and self-configuration in the very dynamic and unpredictable communication environment typical of mission-critical applications. Operation in such an environment, with its random pattern of interference, demands adaptive principles for allocating system resources in general and frequency bands in particular. Adding to the seriousness of the situation is the acute shortage of available frequencies, which slows the convergence of a viable mix of voice, data, and video. The major culprit in the current spectrum shortage is the static nature of frequency allocation.

Dynamic spectrum allocation (DSA) is a relatively new concept that takes advantage of the architectural features of software-defined radio (SDR) to allow clear, unjammed reception of a radio service when the pre-determined allocated frequency band contains interference from another, unwanted service. DSA would substantially enhance quality of service by eliminating, or substantially minimizing, connection down time due to an unfavorable RF environment. It could be the sought-after solution for mission-critical applications, although any kind of communication system would undoubtedly benefit from it.

The necessity to review the current state of affairs in spectrum utilization was emphasized in [1], [2]. The suggestion is to rely more on market forces than on administrative regulations. Dynamic channel frequency assignment is proposed as the main mechanism to be implemented in the design of the physical and link layers of the OSI network model. This idea has attracted a lot of attention as a solution to the current spectrum shortage. For example, adaptive, dynamic mechanisms of spectrum reuse based on smart radio are considered in [4], [5] as a viable alternative to the current static frequency allocation. The importance of adaptive principles in DSA, including priority classes of different services for individual or multiple operators in a multi-vendor environment, is described in [6]-[21].

This paper leverages the ideas presented in [4]-[21] and provides the next step in the implementation of DSA. The main focus here is the practical realization of spectrum management principles using direct-conversion RF/DTM chips. These chips provide universal network access over a broad range of frequencies covering the major wireless communication protocols, including CDMA, GSM, WiMax, and WiFi, as well as future wireless standards. The emphasis is on using dynamically downloadable software to operate in a frequency-agile environment based on dynamic frequency management. A range of possible scenarios implementing RF/DTM chips illustrates the benefits of frequency agility and protocol independence in solving DSA problems.

The solution of one problem, in particular dynamic spectrum allocation, brings other problems that require adequate attention. This paper shows the benefits and challenges of broadband frequency-agile chips in light of the specifics of operating over broad frequency ranges. Powerful interferers must be cancelled at the receiver front end even when they lie outside the spectrum of the desired signals, as can be seen in Figure 1.

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 294–305, 2007. © Springer-Verlag Berlin Heidelberg 2007
The spectral occupancy measurements for Figure 1 were performed in New York City during the 2004 Republican Convention, when traffic intensity was substantially above the average level. It can be seen from Figure 1 that a lot of interference is picked up by the broadband receiver.

Fig. 1. Amplitude histogram of PCS band (courtesy of [3]). The figure annotations note a continuous distribution peaked at low values (many handsets at many distances) and large amplitude concentrations (a small number of base stations).


Proper handling of in-band and out-of-band interference preserves the dynamic range of the receiver front end and avoids operation in the nonlinear region, with all its negative consequences. Since interference is by nature unpredictable, it has to be cancelled or avoided by adaptive means; the approach to this solution is described below.

Dynamic spectrum allocation can be viewed as an area where new solutions can be brought to an industry built on static spectrum allocation principles. Such outdated allocation principles result in gross inefficiencies in the use of allocated frequencies on one hand, and an acute shortage of frequencies on the other. A successful solution to the spectrum availability problem would create a highly sought-after win-win situation for customers as well as service providers. A significant amount of system simulation work has been done on how this concept could be implemented at the system level. What is lacking, and what this paper focuses on, is the analysis of circuit-level architectures supporting the DSA concept from an integrated-circuit standpoint: understanding the practical limitations of implementing the concept in hardware, and performing architectural trade-off studies to determine how to overcome these limitations.

The remainder of the paper has the following structure. Section 2 describes the main DSA challenges. Section 3 shows the main architectural features and technical realization specifics of ad-hoc networks incorporating DSA. Section 4 is devoted to test results for the existing TSR chip; the results show the chip's level of maturity, the areas where it can be used right away, and the kinds of improvements expected in future chip generations. Section 5 summarizes the obtained results.

2 Main DSA Challenges

The main DSA implementation challenges are easy to understand from an analysis of system operation. Figure 2 shows a block diagram of the network operating environment of an ad-hoc network system implementing the DSA concept. Here, a simultaneously operating multitude of different wireless networks, including wide area (WANs), metropolitan (MANs), and global (GANs) networks using satellites at different elevations, all sharing pre-defined frequency bands, has to coexist with ad-hoc networks set up to deal with an emergency situation. The illustration in Figure 2 corresponds to setting up an ad-hoc network to deal with a large forest fire, coordinating the operation of first-response teams from the fire, ambulance, and police departments. Two-way communications between each element and the DSA controller are shown in purple. This is an example of centralized control of frequency allocation: the interference level is analyzed at each site and that information is sent to a DSA controller, which analyzes data from all participating nodes and sends allocated frequencies to each node. As soon as the interference environment changes, the DSA controller sends an updated frequency distribution based on the newly available frequencies. To operate in such a dynamic environment, the individual nodes of an ad-hoc network have to be able to receive and transmit over a broad range of frequencies, in addition to being able to scan the entire operational frequency range sequentially with a separate set of transceivers to provide the DSA controller with data on the spectrum distribution of interference.

Fig. 2. Network operating environment (an ad-hoc network using DSA control coexists with a multitude of traditional networks)

Figure 3 shows an ad-hoc network in operation, separately illustrating the interaction of its components. The practical implementation requires coordination within the network between node receivers and the DSA controller, as shown in Figure 3. Here the DSA controller monitors the operational environment through back channels with the ad-hoc network nodes, where each node is equipped with a scanning RF receiver measuring the local interference level.

Fig. 3. An extended view of an ad-hoc network in action
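The centralized control loop described above (each node measures local interference, reports it over a back channel, and the controller replies with assigned frequencies) can be sketched in a few lines. The fragment below is purely illustrative: the node names, band numbers, and greedy assignment heuristic are our assumptions, not part of the paper.

```python
# Illustrative sketch of a centralized DSA controller: each node reports its
# measured interference per candidate band, and the controller assigns every
# node the cleanest band it can have without double-allocating a band.

def assign_bands(reports):
    """reports: {node_id: {band: interference_dBm}} -> {node_id: band}."""
    assignment = {}
    taken = set()
    # Serve nodes with the fewest candidate bands first (simple greedy rule).
    for node in sorted(reports, key=lambda n: len(reports[n])):
        # Candidate bands for this node, quietest (lowest dBm) first.
        for band, _level in sorted(reports[node].items(), key=lambda kv: kv[1]):
            if band not in taken:
                assignment[node] = band
                taken.add(band)
                break
    return assignment

# Hypothetical first-response nodes reporting interference in three bands.
reports = {
    "fire":      {2412: -95.0, 2437: -60.0, 2462: -88.0},
    "ambulance": {2412: -70.0, 2437: -92.0, 2462: -65.0},
    "police":    {2412: -80.0, 2437: -75.0, 2462: -97.0},
}
print(assign_bands(reports))
```

Re-running the assignment whenever new reports arrive models the controller's reaction to a changing interference environment.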

3 Main System Architectural Features

The DSA concept is based on determining a frequency band free of interference and using it for the current communication session. In addition, it must support real-time mission-critical services across ad-hoc wireless networks, involving traditional quality of service (QoS) mechanisms for the dynamic allocation of RF spectrum among multiple users. With that in mind, a partial set of requirements for the implementation of DSA in ad-hoc networks is:

• Operational frequency band: It has to be wide enough to include the major frequency bands of the most popular communication standards.
• Low switching time: Low latency in switching from one spectrum band to another is important for providing uninterruptible service; it has to be less than 150 ms to satisfy QoS requirements.
• Low bit error rate (BER): A low BER is important to maintain the required QoS.
• Spurious-free dynamic range (SFDR): A high system SFDR allows operation over a broad dynamic range of input signals without the negative impact of nonlinear distortion and compression in the receiver front end.
• Power efficiency: It is important to prolong the battery life of each device, thus extending the lifetime of the entire system.
• Self-organization: The procedures involved in self-organization include spectrum occupancy analysis within the targeted range of frequencies, selection of a few recommended frequencies for potential users, and sending these recommendations to the participating nodes. The actual implementation can be centralized or distributed, using dedicated transceivers and specialized signal processing for the spectrum analysis and allocation functions. Both options have issues.
• Scalability: This refers to the ability of the system to support a variable number of nodes without affecting the performance of each of them. It covers the number of nodes, the traffic load, and some mobility aspects.
• Security: Security is always a critical aspect of deploying a wireless network, since the broadcast nature of wireless signals makes the network vulnerable to attacks at various protocol layers. Efficient resilience to physical-layer jamming is crucial for SDR; adaptive interference cancellation may be used to enhance system security and the availability of its services.
• Multicasting: Efficient multicast should be supported by the ad-hoc network. In two-way or many-to-one communication, services such as voice and multimedia require the simultaneous processing of different signals coming from geographically distributed sources with varying signal power spectral densities at the receiver input. The frequency allocated to each source has to take into account the interference level at the destination's location.

As a further extension of the DSA concept, it is conceivable that multiple services could share frequency bands, or use frequency bands they typically do not have, by taking advantage of this intelligent network.
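The self-organization step above (spectrum occupancy analysis followed by recommending a few frequencies to participating nodes) reduces, in its simplest centralized form, to ranking channels by measured occupancy. The channel numbers and occupancy fractions below are invented for illustration.

```python
# Minimal sketch of the self-organization procedure: given a per-channel
# occupancy estimate from the scanning receivers, recommend the k channels
# least occupied by interference. Values are illustrative only.

def recommend_channels(occupancy, k=3):
    """occupancy: {channel: fraction_of_time_busy in [0, 1]} -> k cleanest."""
    return sorted(occupancy, key=occupancy.get)[:k]

occupancy = {36: 0.72, 40: 0.05, 44: 0.31, 48: 0.02, 52: 0.55}
print(recommend_channels(occupancy))  # the three least-occupied channels
```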
DSA provides a new way of using communications transceivers to avoid jamming and interference across different frequency bands and services. By doing so, the receiver is not taxed from a dynamic-range perspective. Consequently, through digital programmability, the DC power of the receiver can be optimized for normal operation: with no interference present, the receiver block requirements can be relaxed and the DC power of the receiver reduced, enhancing its efficiency. As with all new concepts, the DSA implementation will evolve architecturally. The initial applications of DSA may be expected in systems that can afford higher power dissipation, allowing the ideas to be refined. Eventually, the concept will be applied to consumer and hand-held applications, and the flexibility of digital programmability, as well as software license upgradeability, will open up an entirely new way of looking at wireless communications.

Putting it all together, a DSA-controlled system might look like the one shown in Figure 4, which shows the DSA controller from Figure 3 in more detail. There are two signal paths, one for transmission and the other for reception. The system includes two transceivers and baseband DSA controllers; a baseband controller provides intelligent digital signal processing. Initially, the receiver on transceiver #1 is in SCAN mode, selecting the wide-band input filter setting. The entire frequency range is down-converted with the receiver in a high-dynamic-range mode. The frequency band of interest is probed for the signals of interest as well as for interference. If there are interferers, the transmitter sends information to the receiver in transceiver #2 to move frequency bands; the transmitter of transceiver #2, as well as the filter bank of the receiver on transceiver #1, is digitally programmed for the new frequency band, and the signal is received on a narrow-bandwidth channel. If there are no interferers, the filter bank on transceiver #1 is selected for the proper frequency band, and the signal is received on a narrow-bandwidth channel. For reception on transceiver #2, the opposite holds. Additional transceivers can be added to create a mesh network, and the baseband processors must keep track of the services, time-slotting, frequency bands, and hand-shaking that must occur for the network to work properly, without self-jamming.

Fig. 4. Detailed block diagram of a conceptual implementation of the DSA architecture

Aside from the TechnoConcepts radios, most of the equipment is off the shelf. The system controller function is provided by a standard personal computer. Ethernet is used for all inter-equipment data transfer, so standard routers form the core of the interoperability switch and the channel bank MUX and controller. For transmission, the baseband signal is routed to the proper radio by the channel bank multiplexer. An appropriate TX module (a radio exciter) is software-defined to operate on the appropriate channel using the appropriate protocol. The output of the exciter is routed to the appropriate PA/channel through the transmit cross-point switch, which permits any TX module to be used as an exciter for any PA/channel combination. Customized PA modules are used for each channel to match the PA requirements to the licensed power and mode(s) available to the channel. Duplicate hot-standby PAs automatically replace the main PA in case of failure, and alternate TX modules can serve as hot standbys, resulting in complete redundancy for the entire radio.

For reception, the channel signal is first routed through a receive cross-point switch, permitting any RX module to be assigned to any channel. The chosen RX module is software-defined to operate on the desired channel with the appropriate protocol. As in the TX case, any RX module can be assigned to any channel, providing maximum reliability, flexibility, and redundancy. The demodulated baseband signal is then routed through the channel bank to the desired location. These locations can include another transmitter (for simple repeater operation), a trunk controller (for a trunking configuration), a pager server (for a pager configuration), and/or the dispatcher via the microwave link. Of course, multiple transmitters, including other base stations, can be fed simultaneously, through the appropriate processing and time delays, to configure a simulcast system.

Figure 5 provides block diagrams of the TX and RX modules. These modules are based upon TechnoConcepts' proprietary RX and TX silicon-germanium integrated circuits. These chips bridge the radio-frequency domain with the digital-processing domain: the receiver chip provides a one-way conversion from a radio signal to digital signals, and the transmitter chip a one-way conversion from digital signals to a radio signal. The baseband processors are digital integrated circuits that receive or transmit the desired radio message using the proper channel and protocol. The configuration and control processors choose the proper RX and/or TX module, set it to the proper channel, instruct it to use a chosen protocol, and route the appropriate baseband signals. The channel bank multiplexer and controller routes the baseband signals to and from the RF gateway system, defines system configurations (interoperability), and provides interoperability with other systems. The network in Figure 5 consists of multiple transceivers of the kind shown in Figure 4; it is controlled by the baseband DSA controller network, and the hand-shaking procedure between the transceivers must be accomplished intelligently in order to avoid self-jamming.

Fig. 5. Details of the DSA network
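The scan-and-retune handshake of the two-transceiver arrangement described above can be illustrated with a small decision routine. This is a hypothetical sketch, not the authors' implementation: the detection threshold, band labels, and measured powers are invented for illustration.

```python
# Sketch of the DSA scan/retune decision: the SCAN-mode receiver on
# transceiver #1 measures power per band; if the active band is jammed,
# transceiver #2 is told to move to the quietest clean band.

INTERFERENCE_THRESHOLD_DBM = -85.0  # assumed detection threshold

def retune_if_needed(active_band, scan_result):
    """scan_result: {band: measured_power_dBm} from the SCAN-mode receiver.
    Returns the band transceiver #2 should use."""
    if scan_result.get(active_band, -999.0) < INTERFERENCE_THRESHOLD_DBM:
        return active_band                      # active channel is clean
    clean = {b: p for b, p in scan_result.items()
             if p < INTERFERENCE_THRESHOLD_DBM}
    if not clean:
        return active_band                      # nowhere better to go
    return min(clean, key=clean.get)            # quietest clean band

scan = {1: -60.0, 2: -95.0, 3: -90.0}
print(retune_if_needed(1, scan))  # band 1 is jammed, so move to band 2
```

In a real system the return value would be sent over the back channel so that both ends of the link retune together, avoiding the self-jamming discussed above.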

4 Test Results

In support of DSA, TSR receiver and transmitter ICs have been developed by Technoconcepts. The receiver IC has a wide-band RF front end, allowing the digitization of a wide frequency range. The test set-up for the TSR measurements includes two external synthesizers, for the RF input signal and for the RF clock source. A simplified version of the test set-up is shown in Figure 6.


Fig. 6. Block diagram of the system test

Additionally, band-pass filters are included at the outputs of the synthesizers to filter out the wide-band noise and distortion of the signal sources. The ADC produces a serial bit stream, which is de-multiplexed on-chip. The de-multiplexed digital data then goes into an on-chip decimating filter block, whose output is the digital output of the chip. This data is transferred from the receiver evaluation board to a data capture board, which interfaces to a PC, where a Fast Fourier Transform (FFT) is performed on the digital data. The digital output bits of the TSR chip are thus a decimated, filtered version of the on-chip ADC's digital output. The spur-free dynamic range (SFDR) and signal-to-noise ratio (SNR) over a frequency range of 850 MHz to 6 GHz are obtained from the ADC FFT plots and are shown in Figures 7 and 8.

Fig. 7. Existing TSR SFDR as a function of ADC output power (SFDR in dBc versus ADC output power in dBFS, for receiver input frequencies of 0.85, 1.75, 2.75, 3.5, and 6 GHz)

Fig. 8. Existing TSR SNR as a function of RF input frequency (receiver SNR in a 5 MHz bandwidth, in dBc, versus ADC output power in dBFS, at the same input frequencies)

The receiver's SFDR versus ADC output power, for a few typical frequencies within the operational range, is shown in Figure 7. The selected frequencies include the end points of the operational range as well as a few frequencies near popular communication standards. It can be seen from Figure 7 that SFDR, as expected, degrades with frequency; however, the receiver demonstrates frequency agility and the ability to receive and digitize a very wide band of frequencies. At lower RF input frequencies, the receiver's measured SFDR is on the order of 70 dB. Figure 8 shows measurements of the receiver's SNR in a 5 MHz bandwidth versus the ADC amplitude for several RF input frequencies. The SNR performance of the receiver supports a number of wireless applications, and thus demonstrates the frequency agility and flexibility of the receiver. For DSA applications, wide-band receivers such as the TSR receiver IC will be needed to encompass a wide frequency range. These receivers will be used in conjunction with intelligent, switchable front-end filter banks in order to take advantage of the DSA architecture.
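For readers unfamiliar with how figures like 7 and 8 are produced, the following sketch shows how SFDR and an SNR estimate can be read off a set of FFT bin powers. It is our own illustration, not the TSR test software: the bin values are made up, and treating all non-carrier, non-spur bins as noise is a simplifying assumption.

```python
# Sketch: deriving SFDR and SNR from an ADC FFT. SFDR is the carrier power
# minus the largest spur (in dBc); SNR sums the remaining non-carrier bins
# in linear power and compares them to the carrier.
import math

def sfdr_and_snr(bins_dbfs, signal_bin):
    """bins_dbfs: FFT magnitude per bin in dBFS; signal_bin: carrier index."""
    signal = bins_dbfs[signal_bin]
    others = [p for i, p in enumerate(bins_dbfs) if i != signal_bin]
    largest_spur = max(others)
    sfdr = signal - largest_spur                      # dBc
    # Noise: every non-carrier bin except the largest spur, summed linearly.
    noise_lin = (sum(10 ** (p / 10.0) for p in others)
                 - 10 ** (largest_spur / 10.0))
    snr = signal - 10.0 * math.log10(noise_lin)       # dBc
    return sfdr, snr

# Made-up spectrum: carrier at bin 1, a spur at bin 2, noise floor elsewhere.
bins = [-110.0, -5.0, -75.0, -105.0, -108.0]
sfdr, snr = sfdr_and_snr(bins, 1)
print(round(sfdr, 1), round(snr, 1))
```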

5 Conclusion

This paper has described a new approach to the implementation specifics of ad-hoc networks (AHN) intended to operate in the very dynamic and unpredictable communication environment typical of mission-critical applications. The offered approach is based on direct-conversion RF/DTM chips that take advantage of the architectural features of software-defined radio (SDR) to allow clear, un-jammed reception of a radio service when the pre-determined allocated frequency band contains interference from another, unwanted service. RF/DTM chips convert received signals into digital form immediately after the antenna, making it possible to perform all required signal processing digitally and thus allowing adaptive selection of frequency bands free of interference. The advantages and challenges associated with applying dynamic spectrum allocation (DSA) to ad-hoc networks have been presented, and notional concepts have illustrated how an RF/DTM transceiver can be used to implement DSA in a wireless system. All of the analysis assumed that the conventional regulatory constraints of the current static spectrum distribution are either relaxed or removed. A mesh network of transceivers operating on the DSA principle has been discussed. Architectural trade-offs and system issues remain to be worked out for practical implementations of the DSA concept. To that end, Technoconcepts has built and demonstrated a wide-band receiver front-end IC that can support further development of DSA. The SNR and SFDR of the IC have been measured over a broad range of RF input frequencies, verifying that it is feasible to support the DSA concept from a hardware standpoint.

References
1. Hoffmeyer, J.A.: Regulatory and standardization aspects of DSA technologies – global requirements and perspectives. In: DySPAN 2005, pp. 700–705 (2005)
2. Pawełczak, P., Prasad, R.V., Xia, L., Niemegeers, I.G.: Cognitive radio emergency networks – requirements and design. In: DySPAN 2005, pp. 601–606 (2005)
3. McHenry, M.: NSF Spectrum Occupancy Measurements, New York (2004)
4. Maldonado, D., Le, B., Hugine, A., Rondeau, T.W., Bostian, C.W.: Cognitive radio applications to dynamic spectrum allocation: a discussion and an illustrative example. In: DySPAN 2005, First IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks, pp. 597–600 (November 2005)
5. Cowen-Hirsch, R., Shrum, D., Davis, B., Stewart, D., Kontson, K.: Software radio: evolution or revolution in spectrum management. In: MILCOM 2000, Proceedings of the 21st Century Military Communications Conference, vol. 1, pp. 8–14 (2000)
6. Xu, L., Tonjes, R., Paila, T., Hansmann, W., Frank, M., Albrecht, M.: DRiVE-ing to the Internet: Dynamic Radio for IP services in Vehicular Environments. In: LCN 2000, Proceedings of the 25th Annual IEEE Conference on Local Computer Networks (2000)
7. Kim, H., Lee, Y., Yun, S.: A dynamic spectrum allocation between network operators with priority-based sharing and negotiation. In: PIMRC 2005, IEEE 16th International Symposium on Personal, Indoor and Mobile Radio Communications, vol. 2 (September 11–14, 2005)
8. Kulkarni, R., Zekavat, S.: Traffic-aware inter-vendor dynamic spectrum allocation: performance in multivendor environment. In: IWCMC '06, Vancouver, British Columbia, Canada (July 3–6, 2006)
9. Oner, M., Jondral, F.: Cyclostationarity-based methods for the extraction of the channel allocation information in a spectrum pooling system. In: IEEE Radio and Wireless Conference (September 19–22, 2004)
10. Leaves, P., Moessner, K., Tafazolli, R., Grandblaise, D., Bourse, D., Tonjes, R., Breveglieri, M.: Dynamic spectrum allocation in composite reconfigurable wireless networks. IEEE Communications Magazine 42(5) (2004)
11. Grandblaise, D., Moessner, K., Leaves, P., Bourse, D.: Reconfigurability support for dynamic spectrum allocation: from the DSA concept to implementation. In: SympoTIC '03, Joint First Workshop on Mobile Future and Symposium on Trends in Communications (October 26–28, 2003)
12. Keller, R., Lohmar, T., Tonjes, R., Thielecke, J.: Convergence of cellular and broadcast networks from a multi-radio perspective. IEEE Wireless Communications 8(2) (2001)
13. Andrews, J.: Interference Cancellation for Cellular Systems. University of Texas, Austin (February 8, 2005)
14. Lu, X.: Novel adaptive methods for narrow band interference cancellation in CDMA multiuser detection. In: ICASSP 2005 (2005)
15. Panfilov, O., Hickling, R., Turgeon, T., McClellan, K.: Direct conversion software radio – its benefits and challenges in solving system interoperability problems. In: Mobility 2006, Bangkok, Thailand (2006)
16. Santivanez, C., Ramanathan, R., Partridge, C., Krishnan, R., Condell, M., Polit, S.: Opportunistic spectrum access: challenges, architecture, protocols. In: WiCon '06, Boston, MA (2006)
17. van Wazer, L. (FCC): Spectrum access and the promise of cognitive radio technology. Presented at the FCC Cognitive Radio Workshop (2003)
18. Marshall, P. (DARPA): Beyond the outer limits – XG next generation communications. Presented at the FCC Cognitive Radio Workshop (May 19, 2003)
19. Hsieh, T., Kinget, P., Gharpurey, R.: An approach to interference detection for ultra wideband radio systems. In: 2006 IEEE Dallas/CAS Workshop on Design, Applications, Integration and Software, pp. 91–94 (2006)
20. Cagley, R.E., McNally, S.A., Wiatt, M.R.: Dynamic channel allocation for dynamic spectrum use in wireless sensor networks. In: MILCOM 2006, Military Communications Conference, pp. 1–5 (2006)
21. Haas, H., Nguyen, V.D., Omiyi, P., Nedev, N., Auer, G.: Interference aware medium access in cellular OFDMA/TDD networks. In: ICC '06, vol. 4, pp. 1778–1783 (June 2006)

Location Tracking for Wireless Sensor Networks

Kil-Woong Jang

Division of Nano Data System, Korea Maritime University
1 YeongDo-Gu Dongsam-Dong, Busan, Korea
[email protected]

Abstract. In location tracking, there is a trade-off between the data accuracy of mobile targets and the energy efficiency of sensor nodes: if the number of nodes tracking a mobile target is increased, data accuracy improves, but the energy consumption of the nodes also increases. In this paper, we propose a new location tracking scheme that considers both factors. The proposed scheme is designed to track mobile targets in a cluster-based sensor network. To increase the energy efficiency of the sensor nodes, only a portion of the nodes that detect the mobile target is selected to track it, using a backoff procedure. In addition, we address data accuracy by controlling the number of tracking nodes through the transmission range of the nodes. We perform a simulation to evaluate the performance of the proposed scheme over sensor networks, comparing it with the general approach. The simulation results show that the proposed scheme performs well over a broad range of parameters.

Keywords: wireless sensor networks, location tracking, backoff procedure, moving objects, wireless networks.

1 Introduction

Sensor networks are an emerging technical challenge in ubiquitous networking. Using many sensor nodes with sensing, wireless communication, and computation functions, a wide range of monitoring applications, covering temperature, pressure, noise, and so on, has been studied in the literature [1]. Sensor networks consist of a large number of nodes that are very close to each other and form a multi-hop wireless topology. The nodes' batteries are constrained and cannot be recharged or replaced after deployment; in sensor networks, low power consumption is therefore the most important constraint on the nodes.

Detecting and tracking a mobile target is one of the challenging application areas in sensor networks [2-5]. In general, the nodes surrounding a mobile target detect and track the target, and collaborate among themselves to aggregate data regarding it. The aggregated data can then be forwarded to a sink or a base station. A detection and tracking scheme should provide reliable data about the mobile target and forward that data to the sinks or base stations in a fast and energy-efficient way.


In this paper, we propose a new scheme to detect and track a mobile target that balances energy efficiency and data accuracy. The proposed scheme is designed for a cluster-based network: the head of each cluster collects data about the mobile target from its nodes and aggregates it to obtain more accurate information about the target. In general, all active nodes surrounding the mobile target can participate in tracking it. In the proposed scheme, however, energy efficiency is increased by having only a portion of the active nodes track the target. To select the participating nodes, the proposed scheme makes use of a backoff procedure: a node that detects the mobile target broadcasts a message to the other nodes within its transmission range, and the number of nodes required to track the target can therefore be varied by adjusting the transmission range. In the performance evaluation, we evaluate the data accuracy and energy efficiency of the proposed scheme under a variety of parameters.
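The backoff-based selection outlined above can be sketched as a small simulation. This is our own illustrative model, not the authors' code: the seeded random backoffs and node coordinates are invented, and suppression is modeled simply as falling within the broadcaster's transmission range.

```python
# Sketch of backoff-based tracker selection: every node that detects the
# target draws a random backoff; when a node's timer fires it broadcasts,
# and detecting nodes within its transmission range cancel their timers
# and sit out. Widening the range suppresses more nodes.
import random

def select_trackers(detectors, positions, tx_range, rng):
    """detectors: node ids that sensed the target; positions: {id: (x, y)}."""
    backoff = {n: rng.random() for n in detectors}   # shorter timer fires first
    selected, suppressed = [], set()
    for node in sorted(detectors, key=backoff.get):
        if node in suppressed:
            continue
        selected.append(node)                        # this node broadcasts
        for other in detectors:                      # neighbours stay silent
            dx = positions[node][0] - positions[other][0]
            dy = positions[node][1] - positions[other][1]
            if other != node and dx * dx + dy * dy <= tx_range ** 2:
                suppressed.add(other)
    return selected

# Four detecting nodes in two pairs; with tx_range=2 one node per pair tracks.
positions = {"A": (0, 0), "B": (1, 0), "C": (5, 0), "D": (6, 0)}
picked = select_trackers(["A", "B", "C", "D"], positions, tx_range=2.0,
                         rng=random.Random(42))
print(picked)
```

Increasing `tx_range` shrinks the selected set (saving energy at the cost of accuracy), which mirrors the trade-off the scheme exposes as a tunable parameter.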

2 Related Work

Several approaches have been studied to detect and track a mobile target in sensor networks. Yang et al. [2] studied the problem of tracking moving objects using distributed wireless sensor networks in which sensors are deployed randomly. They proposed an energy-efficient tracking algorithm, called Predict-and-Mesh, that is well suited to pervasively monitoring various kinds of objects with random movement patterns. Predict-and-Mesh is a distributed algorithm consisting of two prediction models, n-step prediction and collaborative prediction, and a prediction failure recovery process called mesh. Zhang et al. [3] proposed a dynamic convoy tree-based collaboration (DCTC) framework to detect and track the mobile target and monitor its surrounding area. DCTC relies on a tree structure, which includes nodes around the mobile target; the tree is dynamically reconfigured to add and prune nodes as the target moves. Since the configuration of the tree changes dynamically, they proposed two tree expansion and pruning schemes and two tree reconfiguration schemes, and compared and evaluated them under different node densities in terms of coverage and energy consumption. Halkidi et al. [4] presented a distributed mechanism to track moving objects efficiently and accurately with a network of sensors. Their mechanism provides set-up and cooperation of the sensors within the network, while providing fault tolerance through replication. They give an algorithm for predicting, with high probability, the future location of an object based on past observations by many sensors.

3 The Proposed Scheme

3.1 Assumptions

We present some assumptions before describing how the proposed scheme operates. Every node has capabilities for sensing, communication and processing.


K.-W. Jang


Fig. 1. An example of selection process for the proposed scheme. A circle node represents a sensor node, and a rectangle node represents a mobile target. A dotted circle represents the maximum transmission range of a node.

In addition, all nodes have the same amount of energy, and communication and processing are error-free. Every node knows its own position a priori, either through manual configuration or by using various other techniques [7].

3.2 Scheme Description

We describe the two processes of the proposed scheme: selection and release. In the proposed scheme, all nodes periodically alternate between active and sleep status. Nodes in active status are capable of detecting and tracking a mobile target. Whenever a mobile target is detected by the nodes surrounding it, the backoff procedure determines which of those nodes will track the target. The backoff procedure used in the proposed scheme differs slightly from that of IEEE 802.11 [8] and works as follows. Each node selects a random backoff time, and the node whose backoff time is shortest transmits a DETECT message; the node that transmits the message to its neighbors then starts to track the mobile target. Upon receiving the message, the neighbors stop their backoff procedure, do not wait for the remaining backoff time, and do not reply with any message. To briefly illustrate the selection process, we use the example shown in Fig. 1. Suppose five nodes detect the mobile target simultaneously and trigger the backoff procedure. If node A has the lowest backoff time, node A sends a DETECT message to its neighbors when its backoff timer reaches zero. In Fig. 1, nodes C and E receive the message from node A and stop their backoff procedure. However, as nodes B and D exist out of


Fig. 2. Collision examples for the selection process of the proposed scheme. An arrow between nodes indicates that the source sends a control message to the destination.

the transmission range of node A, they do not receive the message. Thus, nodes B and D continue their backoff procedure. If node D has the lowest backoff time, node D is also selected to track the target. If two or more nodes transmit the DETECT message simultaneously because they chose the same backoff time, a collision occurs. We consider some collision examples. Suppose nodes A and E have the same backoff time; then a collision occurs, as shown in Fig. 2(a). Nodes A and E start to track the mobile target after sending their DETECT messages. Due to the collision, no node receives the message, and the remaining nodes continue their backoff procedure. In this situation, we can consider two cases. In the first case, node C has the lowest backoff time. Node C then sends a DETECT message to its neighbors and starts to track the mobile target. On receiving the message, nodes A and E continue to track the mobile target and nodes B and D finish the backoff procedure. Finally, nodes A, C and E track the mobile target. In the second case, node B or D has the lowest backoff time. If node D has the lowest backoff time, node D sends a DETECT message to its neighbors. Nodes A and E continue to track the target, since node C, the intermediate node between node A (or E) and node D, does not reply to the message. Therefore, three nodes, A, E and D, track the mobile target. Another example is as follows. Consider that nodes A and D have the same backoff time, as shown in Fig. 2(b). When node A sends a DETECT message to its neighbors, nodes C and E receive the message. At the same time, when node D sends the message to its neighbors, nodes B and C receive the message. As node C concurrently receives the messages from nodes A and D, a collision occurs at node C. Therefore, since node C cannot decode the message, it continues its backoff procedure. When the backoff time of


Fig. 3. An example of release process for the proposed scheme. A dashed circle represents the maximum sensing range of a node.

node C is zero, node C sends a DETECT message to its neighbors. Finally, nodes A, C and D track the target. As the target moves, it will eventually leave the sensing range of the tracking nodes. In such a case, the tracking nodes send a RELEASE message to their neighbors. Of the nodes receiving the message, those that can detect the target trigger the backoff procedure. As mentioned above, the node with the lowest backoff time is selected as a tracking node. To illustrate the release process of the proposed scheme, we consider the example shown in Fig. 3. Node A is tracking the target, and nodes B and C have only detected it. As the target moves, it leaves the sensing range of node A. Node A sends a RELEASE message to its neighbors, since it can no longer track the target. Upon receiving the message, nodes B and C start the backoff procedure, and the node with the lower backoff time may track the target. If there are nodes that are not neighbors of node A but can detect the target, they also trigger the backoff procedure. In Fig. 3, node D is not a neighbor of node A and can detect the target. Nodes B and C start the backoff procedure only after they receive the message from node A, whereas node D starts the backoff procedure as soon as it detects the target. Therefore, node D starts the backoff procedure earlier than nodes B and C, and has a higher probability of being selected as the tracking node. As a result, the proposed scheme does not need to change tracking nodes frequently when the target moves in a uniform direction. If the RELEASE message is lost, only the nodes outside the transmission range of node A (like node D in Fig. 3) start the backoff procedure. Our location tracking scheme is designed for a cluster-based sensor network [6]. As the head of each cluster aggregates data from tracking nodes, the proposed
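The selection logic described above can be sketched in a few lines. The following illustrative Python model is our own sketch, not the paper's implementation: the node positions, ranges and the tie-free random backoffs are assumptions. Nodes that sense the target draw random backoff times; the earliest timer fires a DETECT that suppresses detectors within transmission range, and unsuppressed nodes keep counting down and fire in turn.

```python
import math
import random

def select_trackers(positions, target, sensing_range, tx_range, rng):
    """Sketch of the DETECT backoff selection. Nodes that sense the target
    draw random backoff times; the earliest sender starts tracking and
    suppresses detectors inside its transmission range; suppressed nodes
    never fire their own DETECT."""
    detectors = {n for n, p in positions.items()
                 if math.dist(p, target) <= sensing_range}
    backoff = {n: rng.random() for n in detectors}
    trackers, suppressed = set(), set()
    for n in sorted(detectors, key=backoff.get):  # fire in backoff order
        if n in suppressed:
            continue
        trackers.add(n)  # n's timer expired first among the remaining nodes
        for m in detectors - {n}:  # DETECT heard within tx_range suppresses m
            if math.dist(positions[n], positions[m]) <= tx_range:
                suppressed.add(m)
    return trackers
```

By construction, no two selected trackers lie within each other's transmission range, so a longer range yields fewer trackers, which matches the behavior reported in Sec. 4.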


scheme can increase the level of data accuracy; moreover, by reducing the size of the data report, it can decrease the energy required to send data from the head to the sink.

4 Performance Evaluation

We carried out computer simulations to evaluate the performance of the proposed scheme. In this section, we describe the performance metrics, simulation environment and simulation results.

4.1 Performance Metrics

In order to evaluate the performance of the proposed scheme, the performance metrics of interest are
– the number of tracking nodes and
– total energy consumption.

4.2 Simulation Environment

The network model for the simulation consists of nodes placed randomly in a square area of 200 × 200 m², divided into clusters of 50 × 50 m² each. To measure the energy dissipation of the nodes, we use the radio model developed in [5]. In this model, a node's transmitter circuitry consists of transmit electronics and a transmit amplifier. Let Ee be the energy dissipated in the transmit and receive electronics and Ea be the energy dissipated in the transmit amplifier. We assume that Ee = 50 nJ/bit and Ea = 100 pJ/bit/m². We also assume that the energy loss depends on the distance between source and destination. Therefore, the transmit energy Et dissipated to send a k-bit data packet to a destination at distance d is

Et = Ee × k + Ea × k × d².   (1)

The total energy E dissipated to transmit data between a tracking node and the head is

E = Es + Et.   (2)

Here Es is defined as the energy dissipated to detect and track the mobile target. However, Es is very small compared to Et, so we do not include it when calculating E in this paper. The proposed scheme requires additional energy Ec to carry out the selection and release processes for tracking nodes. Therefore, E for the proposed scheme is

E = Es + Et + Ec.   (3)
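As a sanity check on Eqs. (1)–(3), the radio model can be written directly in code. This is a sketch using the paper's constants; the Es and Ec terms are left as plain parameters since the paper does not fix their values.

```python
E_ELEC = 50e-9   # Ee = 50 nJ/bit: transmit/receive electronics energy
E_AMP = 100e-12  # Ea = 100 pJ/bit/m^2: transmit amplifier energy

def transmit_energy(k_bits, d_m):
    """Eq. (1): Et = Ee*k + Ea*k*d^2 for a k-bit packet over distance d metres."""
    return E_ELEC * k_bits + E_AMP * k_bits * d_m ** 2

def total_energy(k_bits, d_m, e_sense=0.0, e_control=0.0):
    """Eqs. (2)/(3): E = Es + Et, plus Ec for the proposed scheme's
    selection/release signalling."""
    return e_sense + transmit_energy(k_bits, d_m) + e_control
```

For example, a 40-byte (320-bit) report sent over 10 m costs Et = 320 · 50 nJ + 320 · 100 pJ · 100 ≈ 19.2 µJ.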

The simulation parameters used to simulate the proposed scheme are listed in Table 1.

Table 1. Simulation parameters

Parameter                                              Value
Network size (m²)                                      200 × 200
Number of nodes                                        100 – 700
Transmission range (m)                                 6 – 20
Sensing range (m)                                      10
Simulation time (s)                                    1000
Speed of a mobile target (m/s)                         2
Size of a control message (DETECT, RELEASE) (bytes)    10
Size of a sensing report, Rs (bytes)                   40 or 80

4.3 Simulation Results

In this section, we compare the proposed scheme with a different location tracking scheme in terms of the average number of tracking nodes and total energy consumption. In the compared scheme, all nodes that can detect the mobile target track it; hereafter we call this the normal scheme. In the comparison, both schemes are run in the same cluster-based sensor networks. We first experiment with various transmission ranges for the sensor nodes, as shown in Figs. 4 and 5. In Fig. 4, we plot the total energy consumption, and in Fig. 5, the average number of tracking nodes for the two schemes; the proposed scheme is denoted "proposed" and the normal scheme "normal". We also experiment with two sizes of the sensing report Rs, 40 and 80 bytes. Fig. 4 shows that the energy efficiency of the proposed scheme is approximately 3 times greater than that of the normal scheme, because the proposed scheme uses fewer nodes. In the normal scheme, when the size of the data report doubles, the energy consumption almost doubles as well. In the proposed scheme, by contrast, doubling the data report size less than doubles the energy consumption, because the total includes the energy spent on the selection and release processes. With the normal scheme, the average number of nodes required to track a target is about 5.5, as shown in Fig. 5; that is, about 5.5 nodes surround the target on average. The figure shows that the proposed scheme tracks the target using 2 – 4 times fewer nodes than the normal scheme. By using fewer nodes to track the target, the proposed scheme increases energy efficiency; however, as the number of tracking nodes decreases, the data accuracy can decrease. The figure also shows that as the transmission range of the nodes increases, the average number of tracking nodes decreases accordingly, because the messages of the backoff procedure reach further. In particular, when the transmission range is twice the sensing range, only one node tracks the target.



Fig. 4. Total energy consumption under various transmission ranges


Fig. 5. Average number of tracking nodes under various transmission ranges

We next evaluate the schemes under various numbers of deployed nodes, as shown in Figs. 6 and 7. In Fig. 6, we plot the total energy consumption, and in Fig. 7 the average number of tracking nodes for the two schemes. Although the number of nodes increases, the energy consumption of the proposed scheme increases only slightly, because the number of tracking nodes grows slowly. The energy dissipated by the normal scheme, however, increases in direct proportion to the number of deployed nodes. Moreover, when the size of the data report is doubled, the energy consumption of the normal scheme doubles, whereas that of the proposed scheme increases only slowly. Fig. 7 shows that the proposed scheme uses fewer tracking nodes than the normal scheme. In particular, as the number of deployed nodes increases, the gap between the numbers of tracking nodes widens. When many nodes are deployed in the network, even though only a portion of the nodes surrounding the target track it, the above figures show that the energy efficiency is increased and



Fig. 6. Total energy consumption under the various numbers of deployed nodes


Fig. 7. Average number of tracking nodes under the various numbers of deployed nodes

the data accuracy decreases only slightly. In Figs. 5 and 7, the plots for the two report sizes overlap, because the average number of tracking nodes of each scheme is the same irrespective of Rs. The simulation results show that the proposed scheme achieves high energy efficiency by using only a portion of the nodes surrounding the target when many nodes are deployed in the network. In an environment requiring high data accuracy, the proposed scheme can increase the data accuracy by increasing the number of tracking nodes, i.e., by reducing the transmission range of the nodes.

5 Conclusions

In this paper, we presented a new location tracking scheme that uses the energy of sensors in sensor networks efficiently. The proposed scheme is designed to track mobile targets in a cluster-based sensor network. In order to increase the


energy efficiency of the sensor nodes, a portion of the sensor nodes that detect the mobile target are selected to track it using the backoff procedure. Moreover, we can trade off data accuracy by controlling the number of tracking nodes through the transmission range of the nodes. Using simulation, we evaluated the performance of the proposed scheme in terms of the energy dissipated in the network and the data accuracy. The simulation results demonstrated that the proposed scheme outperforms a general scheme over various parameter ranges.

References

1. Akyildiz, I., Su, W., Sankarasubramaniam, Y., Cayirci, E.: Wireless sensor networks: a survey. Computer Networks 38, 393–422 (2002)
2. Yang, L., Feng, C., Rozenblit, J.W., Qiao, H.: Adaptive tracking in distributed wireless sensor networks. In: 13th Annual IEEE International Symposium and Workshop on Engineering of Computer Based Systems (2006)
3. Zhang, W., Cao, G.: DCTC: Dynamic Convoy Tree-Based Collaboration for Target Tracking in Sensor Networks. IEEE Transactions on Wireless Communications 5, 1689–1701 (2004)
4. Halkidi, M., Papadopoulos, D., Kalogeraki, V., Gunopulos, D.: Resilient and energy efficient tracking in sensor networks. International Journal of Wireless and Mobile Computing 1(2), 87–100 (2006)
5. Kung, H.T., Vlah, D.: Efficient Location Tracking Using Sensor Networks. In: Proceedings of IEEE WCNC (2003)
6. Heinzelman, W.R., Chandrakasan, A., Balakrishnan, H.: An Application-Specific Protocol Architecture for Wireless Microsensor Networks. IEEE Transactions on Wireless Communications 1(4), 660–669 (2002)
7. Zou, Y., Chakrabarty, K.: Sensor Deployment and Target Localization Based on Virtual Forces. In: INFOCOM (2003)
8. IEEE Standard 802.11 for Wireless Medium Access Control and Physical Layer Specifications (August 1999)

An Incentive-Based Forwarding Protocol for Mobile Ad Hoc Networks with Anonymous Packets

Jerzy Konorski
Gdansk University of Technology
ul. Narutowicza 11/12, 80-952 Gdansk, Poland
[email protected]

Abstract. A mobile ad hoc network (MANET) station acts both as a source packet generator and a transit packet forwarder. With selfish stations and no administrative cooperation enforcement, the lack of forwarding incentives has long been recognized as a serious design problem in MANETs. Reputation systems discourage selfishness by having past cooperation increase the present source packet throughput. We describe a simple watchdog-controlled first-hand reputation system and point to a form of selfishness not addressed by existing research, arising from packet anonymity. If the watchdog at a station cannot tell a nearby station's source packets from transit packets, that station is tempted to admit more source packet traffic than a fair local admittance control (LAC) scheme permits. We analyze a related noncooperative LAC game and characterize three types of its Nash equilibria. Next we propose a simple packet forwarding protocol named Decline and Force (D&F) and, using an approximate performance model, show that, when properly configured, D&F leads to a fair and efficient game outcome.

1 Introduction

A mobile ad hoc network (MANET) uses a wireless medium to interconnect a number of stations, some pairs of which are adjacent, i.e., can directly receive each other's transmissions. MANET stations should act both as terminals (admit source packets and absorb destination packets) and packet forwarders (transmit transit packets on behalf of nonadjacent stations); this enables multihop source-to-destination routes. However, forwarding in MANETs needs incentives. Firstly, stations belonging to different owners need not be concerned about global connectivity. Secondly, forwarding transit packets is doubly unprofitable: it consumes energy and delays source packets. Finally, MANETs allow station anonymity, so refusal to forward by an "energy stingy" and/or "bandwidth greedy" station may meet with little punishment. One can envisage various kinds of selfish (as distinct from cooperative and malicious) forwarding behavior; [17] presents a taxonomy. Various schemes have been proposed to enforce cooperative forwarding behavior. Micropayment schemes make it necessary for a station to forward transit packets in order to earn a virtual currency with which to pay other stations for forwarding its source packets. Honest credit clearance necessitates tamper-proof cryptographic modules at each station [6, 7] or secure communication protocols [2, 21]. Typically,


a station is required to forward at least as many transit packets as it transmits source packets; on the other hand, there will never be enough incentives to forward more than that [19]. To overcome this difficulty, a market-based approach [11, 13, 15, 20] has each station define a payoff it wants to maximize, while the underlying pricing scheme ensures that cooperative forwarding yields greater payoffs. A rational outcome of the stations' behavior, in the form of a Nash equilibrium [12], can then be predicted in a game-theoretic framework. Reputation systems such as CORE [18], CONFIDANT [5], or OCEAN [2] can in essence be considered a variety of market-based solutions. Here, part of the payoff is a reputation a station owes to past forwarding cooperation. Verification of packet forwarding can be provided by a watchdog (WD) [16], a mechanism that promiscuously senses an adjacent station's transmissions and compares them with copies of packets sent to, and supposed to be forwarded by, that station. First-hand WD-based reputation measures can be disseminated to produce indirect reputation measures. The success of a market-based or reputation system depends on how adequately the defined payoff reflects the stations' preferences (e.g., source packet throughput, energy expenditure, a ratio or linear combination of the two, etc.). In this paper we describe a WD-controlled first-hand reputation system where the stations are primarily after source packet throughput. In Sec. 2 we point to a difficulty not addressed by previous research, arising from packet anonymity. In short, the WD cannot tell an adjacent station's source packets from transit packets. This gives rise to selfish manipulation of a local admittance control (LAC) mechanism. In Sec. 3 we introduce a noncooperative LAC game and in Sec. 4 characterize its reachable Nash equilibria. Next, in Sec. 5 we propose a simple packet forwarding protocol named Decline and Force (D&F).
The idea is to give a station under heavy transit traffic the right to decline to receive packets, as well as give adjacent stations the right to nevertheless force packets into that station if they are under heavy transit traffic themselves. Using an approximate performance model we show in Sec. 6 that, when properly configured, D&F may lead to a fair and efficient game outcome. Sec. 7 concludes the paper.

2 Forwarding Model and Selfish Behavior

A MANET is modeled as a collection of N stations, each of which has a permanent identity announced to the other stations, implements agreed-upon MAC and multihop routing protocols, is equipped with a WD, and may agree with each destination upon a full-packet encryption scheme. The latter feature allows packet anonymity in that no station other than the destination can determine a packet's source station. This protects user data and prevents traffic analysis attacks. Fig. 1 illustrates anonymous packet forwarding with WDs, assuming AODV-type routing [4] (DSR-type routing modified similarly as in [8] can also apply). In particular:

• A pair of adjacent stations, n and m, establish a neighborhood relationship (by exchanging their identities and routing information).
• Station n transmits a packet to next-hop neighbor station m, appending to it n, m, and the destination station identity d (the source station identity, in this case n, is encrypted along with the packet body).


• Station m checks that it has a neighborhood relationship with n. If m = d, the packet is decrypted; otherwise n is replaced by m and m is replaced by the next-hop neighbor station l.
• If m ≠ d, station n performs a WD check: it compares a sensed packet from station m with a retained packet copy; if no match is found within a predefined deadline B, the WD check is failed. To factor out MAC delays we express B as the number of packets transmitted by station m; this amounts to a station buffer limit of B packets.

Based on the statistics of failed WD checks, station n may recognize station m as selfish and punish it by terminating the neighborhood relationship.
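The WD check in the last bullet can be sketched as follows. This is an illustrative Python fragment of our own; representing a packet as a (header, body) pair is a simplification, and only the body can match, since the header is rewritten at every hop.

```python
def watchdog_check(retained_body, overheard, B):
    """Pass the WD check iff a packet whose body matches the retained copy
    is overheard among the next B transmissions of the monitored station.
    Expressing the deadline B in packet counts factors out MAC delays."""
    return any(body == retained_body for _header, body in overheard[:B])
```

A failed check here is exactly "no match among m's first B transmissions", feeding the per-neighbor statistics mentioned above.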


Fig. 1. Anonymous packet forwarding with WDs

The backlog limit B mandates local admittance control (LAC) to prevent buffer overflow, and grants the right to legitimately decline to receive a transit packet by announcing a current backlog of B (via a FULL primitive). LAC settings should ensure equal admittance rates at all stations. However, since packet anonymity blurs the distinction between source and transit packets sensed by the WD, nothing stops a station from unrestrained admission of source packets while issuing FULL primitives. Such behavior is undetectable (hence, unpunished) and selfish, as it yields a larger-than-fair source packet throughput. A possible remedy consists in (i) immediate termination of a neighborhood relationship if a failed WD check occurs without a prior FULL primitive, (ii) imposing a public-knowledge tolerable rate V* of FULL primitives that keeps a neighborhood relationship alive, and (iii) forcing a station to receive high enough transit traffic, while at the same time giving it the right to legitimately decline to receive transit packets when it is itself under heavy transit traffic.
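One plausible reading of rules (i) and (ii), sketched in Python. The event encoding and the way a FULL "excuses" the next failed WD check are our assumptions; the paper specifies the rules only abstractly.

```python
def neighbor_verdict(events, V_star):
    """Sketch of the remedy (our reading; names are ours). `events` is what
    the WD records about one neighbor: 'ok' (check passed), 'full' (FULL
    primitive received), 'fail' (check failed). Rule (i): a failed check
    not excused by a preceding FULL terminates the relationship at once.
    Rule (ii): an overall FULL rate of V* or more also terminates it."""
    fulls = 0
    excused = False  # one pending FULL excuses the next failed check
    for ev in events:
        if ev == "full":
            fulls += 1
            excused = True
        elif ev == "fail":
            if not excused:
                return "terminate"  # rule (i): unexcused failure
            excused = False
    if events and fulls / len(events) >= V_star:
        return "terminate"          # rule (ii): FULL rate too high
    return "keep"
```

Under this reading, a selfish station that issues FULL primitives to mask its over-admission eventually exceeds V* and loses all its neighborhood relationships.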

3 LAC Game

In the sequel we focus on a simple Drop-and-Throttle LAC mechanism [14], which admits a source packet only if the current backlog is below a. The LAC threshold a is set autonomously by each station (the smaller a, the less source traffic is admitted). Consider an N-player noncooperative LAC game where station n's feasible actions correspond to LAC thresholds an (1 ≤ an ≤ B − 1). A LAC profile has the form (an a−n), where a−n = (am, m ≠ n) is the opponent profile. Station n's payoff involves:

• source throughput S[an a−n] (the source packet admittance rate), and
• a fullness measure V[an a−n] (the proportion of time with backlog B).

We take V[an a−n] ≥ V* as the condition of termination of all neighborhood relationships involving station n. Note that V[an a−n] thus measures station n's first-hand reputation. Let 1C be the indicator function (1 if C is true and 0 otherwise) and define station n's payoff as

payoff_n[an a−n] = S[an a−n] · 1_{V[an a−n] < V*}.   (1)

Call a LAC profile efficient if payoff_n[an a−n] > 0 for n = 1,…,N and inefficient otherwise (in the latter case, some station(s) find all their neighborhood relationships terminated). According to game theory [12], selfish stations reach a Nash equilibrium (NE), where no station has an incentive to change its LAC threshold unilaterally.
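The payoff and the efficiency notion translate directly into code. This is a minimal sketch; in practice S and V would come from measurements or from the performance model of Sec. 6.

```python
def payoff(S, V, V_star):
    """Throughput S counts only while the fullness measure V stays below
    the tolerable rate V*; at V >= V* all neighborhood relationships of
    the station are terminated and the payoff drops to zero."""
    return S if V < V_star else 0.0

def efficient(payoffs):
    """A LAC profile is efficient iff every station's payoff is positive."""
    return all(p > 0 for p in payoffs)
```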

Definition 1. A NE is a LAC profile [a_n^o a_−n^o] such that for n = 1,…,N and any a_n,

payoff_n[a_n^o a_−n^o] ≥ payoff_n[a_n a_−n^o].   (2)

We now formalize reachability of a NE. To this end, we model dynamic scenarios of the LAC game, where each station n adjusts an to seek a maximum payoff.

Assumption 2. The LAC threshold adjustment mechanism is (i) gradual: a change of an by ±j causes a payoff change at any other station equivalent to j consecutive changes of an by ±1, and (ii) prompt: any other station becomes aware of, and can react to, each of these j changes before the next one takes effect.

Part (i) reflects updating of S[an a−n] and V[an a−n] via a low-pass filter, hence without abrupt changes. Part (ii) is an idealization that limits our interest to unit changes of LAC thresholds. We thus assume that the stations move sequentially, each changing its LAC threshold by ±1, which immediately yields station payoffs corresponding to the new LAC profile. Moreover, the stations act in quasi-unison, none lagging behind more than one move. We model the LAC game in extensive form to account for the order and timing of the stations' moves (a move consists in an adjustment of the LAC threshold). Given past play we only specify a set of stations on move, rather than a single station; yet owing to Assumption 2 no information sets [12] need to be specified.

320

J. Konorski

Definition 2. A noncooperative LAC game is a quadruple ({1,…,N}, A, payoff, move), where A is the set of feasible LAC thresholds, payoff: A^N → R^N, and move: Π → 2^{1,…,N} determines the order of moves; Π is the set of feasible play paths and 2^{1,…,N} is the powerset of {1,…,N}. A feasible play path has the form π^k = ((a^0, M^0),…,(a^k, M^k)), where a^k = (a_1^k,…,a_N^k) ∈ A^N and ∅ ≠ M^k ⊆ move(π^{k−1}); a_n^k ≠ a_n^{k−1} implies that n ∈ M^k and a_n^k = a_n^{k−1} ± 1. By convention, move(π^0) = {1,…,N}. A play path specifies the LAC profiles in successive "rounds" of moves along with the sets M^0, M^1,… of stations that moved. These are arbitrary subsets of the sets move(π^0), move(π^1),… of stations on move. The timing of moves within M^k is not specified: the stations may move simultaneously, one by one, or subset by subset. Let v_n(k) be the number of times station n has changed its LAC threshold on the play path π^k. The quasi-unison specification is

move(π^k) = {n = 1,…,N | v_n(k) = min_{1≤m≤N} v_m(k)}.   (3)
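The quasi-unison rule is simply "the stations that have moved least often so far are on move"; as a one-function sketch:

```python
def on_move(move_counts):
    """Quasi-unison rule: given a mapping station -> v_n(k) (number of
    threshold changes so far), return the stations whose count equals the
    minimum over all stations; only these are on move in the next round."""
    least = min(move_counts.values())
    return {n for n, v in move_counts.items() if v == least}
```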

Definition 3. A best-reply strategy of station n is a function σ_n^BR: Π → A with

σ_n^BR(π^k) ∈ argmax_{a: |a − a_n^k| ≤ 1} payoff_n[a a_−n^k], if n ∈ move(π^k) ∩ M^{k+1};
σ_n^BR(π^k) = a_n^k, otherwise,   (4)

where k = 0,1,… That is, each station seeks a best move given the past play path and the current opponent profile. Fig. 2 shows a 5-station LAC game scenario under best-reply strategies, indicating at each "round" the sets of stations on move and those actually moving, as well as the current LAC profile; it is assumed that V[2 (2…2)] < V* and V[3 (2…2)] ≥ V*.
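To make the best-reply dynamics concrete, here is a toy Python model. The payoff function is our assumption, not the paper's queueing model: the admitted throughput a_n survives only while the aggregate load stays within a budget C, mimicking the indicator in the payoff definition. Sequential unit best replies then reproduce the qualitative behavior of Fig. 2: starting from (1,…,1), five stations climb in quasi-unison and settle at a fair and efficient NE at (2,…,2).

```python
def payoff_fn(a_n, others_sum, C=10):
    """Toy payoff (our assumption): throughput a_n counts only while the
    aggregate load a_n + others_sum stays within the buffer budget C."""
    return a_n if a_n + others_sum <= C else 0

def best_reply_dynamics(profile, B=8, C=10, max_rounds=50):
    """Sequential unit (-1/stay/+1) best replies until no station moves,
    i.e., a reachable NE of the toy LAC game (thresholds in 1..B-1)."""
    a = list(profile)
    for _ in range(max_rounds):
        moved = False
        for n in range(len(a)):
            others = sum(a) - a[n]
            candidates = [x for x in (a[n] - 1, a[n], a[n] + 1)
                          if 1 <= x <= B - 1]
            best = max(candidates, key=lambda x: payoff_fn(x, others, C))
            if best != a[n]:
                a[n], moved = best, True
        if not moved:
            break
    return a
```

With N = 5 and C = 10, every station ends at threshold 2 with payoff 2: raising to 3 would push the aggregate load past C and zero the deviator's payoff, exactly the situation assumed for Fig. 2.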


Fig. 2. LAC game scenario under best-reply strategies


Definition 4. Given a^0, a reachable NE is a LAC profile [a_1^o,…,a_N^o] such that (i) the set Π contains a play path π = ((a^0, M^0),…,(a^k, M^k), (a^{k+1}, M^{k+1}),…) with a^k = [a_n^o a_−n^o] for some finite k, and (ii) a_n^o ∈ argmax_{1≤a≤B−1} payoff_n[a a_−n^k] for n = 1,…,N. Condition (ii) implies that a NE reached in the kth "round" persists later on; indeed, by (4) and Assumption 1, σ_n^BR(π^{k′}) ≡ a_n^o for all k′ ≥ k. E.g., (2,…,2) is a NE in Fig. 2.

4 Reachability of Nash Equilibria

Suppose an efficient LAC profile a0 = [a0 (a0 … a0)] initially prevails, e.g., is negotiated at network setup. (We continue to indicate an arbitrarily chosen station's LAC threshold and the opponent profile.) If the MANET topology and traffic pattern are symmetric then a0 is also fair. For readability we further write payoff[an a−n] instead of payoff_n[an a−n]. Hence, payoff[a + 1 (a … a)] denotes the payoff to a station whose LAC threshold is a + 1, all the other stations setting a. The proposition to be presented shortly states a necessary and sufficient condition for any reachable NE to be fair and efficient; moreover, its proof helps categorize any other reachable Nash equilibria if the condition is not fulfilled. In particular, the reachable outcomes of the game can be characterized as follows:

• each station receives a positive payoff and has no incentive to change its LAC threshold (a fair and efficient NE), or
• each station receives a zero payoff and finds no incentive to change its LAC threshold as all neighborhood relationships are terminated (a fair and inefficient NE), or
• a timing game (a "war of preemption" or a "war of attrition") arises after some play path, leading to an unfair NE.¹

Proposition 1. Fix V* such that V[a (a ... a)] < V* and V[a + 1 (a + 1 ... a + 1)] ≥ V* for some a, i.e., payoff[a (a ... a)] > 0 and payoff[a + 1 (a + 1 ... a + 1)] = 0. Under best-reply strategies, [a (a ... a)] is the only (and fair and efficient) NE reachable from a0 if and only if payoff[a + 1 (a ... a)] = 0.

(5)

Proof: Let us show first that [a (a ...a)] is reachable from the initial LAC profile a0. If payoff[a0 + 1 (a0 + 1 ... a0 + 1)] = 0 then the assertion is immediate, so assume the opposite: payoff[a0 + 1 (a0 + 1 ... a0 + 1)] > 0. By Assumption 1, payoff[a0 + 1 (a0 + 1 ... a0 + 1 a0 ... a0)] > payoff[a0 + 1 (a0 + 1 ... a0 + 1)] > 0.

1 In a timing game (see [12] for suggestive examples), a player moves at most once and initially all players have incentives to move. In a "war of preemption" moving early yields higher payoffs than moving late or not at all, whereas in a "war of attrition" the converse is true.


J. Konorski

Consequently, payoff[a0 + 1 (a0 + 1 ... a0 + 1 a0 ... a0)] > payoff[a0 (a0 + 1 ... a0 + 1 a0 ... a0)] regardless of the number of (a0 + 1) entries in the opponent profile. That is, regardless of how many stations have already increased their LAC thresholds, any other one has an incentive to do so. This implies that any play path starting at a0 and conforming to (3) and (4) contains an a^k = [a0 + 1 (a0 + 1 ... a0 + 1)]. Fig. 3a illustrates this type of scenario using a conceptual payoff vs. LAC threshold plot: with each station finding its payoff higher upon increasing its LAC threshold, all of them end up at a symmetric and higher LAC profile. Continuing along any play path we arrive at a^l = [a (a ... a)] with move(π_l) = {1,…,N} such that payoff[a (a ... a)] > 0 and payoff[a + 1 (a + 1 ... a + 1)] = 0. Assume that (5) holds, implying payoff[a (a ... a)] > payoff[a + 1 (a ... a)]. Using Assumption 1, we also find that payoff[a (a ... a)] > payoff[a − 1 (a ... a)], as illustrated in Fig. 3b. Thus for any n, σ_n^BR(π_l) ≡ a and so [a (a ... a)] is a fair and efficient NE, which proves the "if" part. To prove the "only if" part we assume payoff[a + 1 (a ... a)] > 0 and give examples of play paths conforming to (3) and (4) that lead to inefficient and/or unfair NE. Since now payoff[a (a ... a)] < payoff[a + 1 (a ... a)], each station n ∈ move(π_l) has an incentive to set a + 1, i.e., σ_n^BR(π_l) = a + 1. Imagine a continuation of the play path π_l with M_{l+1} = {1}, M_{l+2} = {2}, …, M_{l+N} = {N} (clearly, it conforms to the quasi-unison constraint (3)). One concludes that a "war of preemption" results, in which some stations will have set a + 1 and some will not. Indeed, σ_1^BR(π_l) = a + 1, that is, station 1 finds payoff[a (a ... a)] < payoff[a + 1 (a ... a)] (recall that payoff[a + 1 (a ... a)] > 0). However, σ_N^BR(π_{l+N−1}) = a, i.e., station N, the last that contemplates setting a + 1, will find no incentive to do so since payoff[a (a + 1 ... a + 1)] ≥ payoff[a + 1 (a + 1 ... a + 1)] = 0. In the meantime, a number of stations besides station 1 (say 2 through l′) may have set a + 1, namely those that found payoff[a (a + 1 ... a + 1 a ... a)] < payoff[a + 1 (a + 1 ... a + 1 a ... a)]. The resulting LAC profile a^{l+l′} = [a + 1 a + 1 ... a + 1 a ... a] (with l′ entries equal to (a + 1)) is an unfair and efficient NE if payoff[a + 2 (a + 1 ... a + 1 a ... a)] = 0; otherwise the play path continues to eventually reach an unfair and efficient NE where some or all of the l′ stations will have set a + 2 or larger. Either way, stations 1 through l′ receive a larger payoff than the others. In Fig. 3c, the upward arrow symbolizes some stations setting a higher LAC threshold upon finding that payoff[a + 1 (a ... a)] > payoff[a (a ... a)], whereas the other arrow symbolizes the rest finding payoff[a (a + 1 ... a + 1)] ≥ payoff[a + 1 (a + 1 ... a + 1)] and so staying at a. Still assuming payoff[a + 1 (a ... a)] > 0, consider another scenario starting with the same play path π_l as before. Let M_{l+1} = {1,…,N}, i.e., all the stations set a + 1 almost simultaneously (in the same "round" of moves), producing a^{l+1} = [a + 1 (a + 1 ... a + 1)]. Since payoff[a + 1 (a + 1 ... a + 1)] is zero, so is payoff[a + 2 (a + 1 ... a + 1)] (by Assumption 1); hence no station will find an incentive to further increase its LAC threshold. However, decreasing it may be worthwhile. Consider two cases:


1) payoff[a (a + 1 ... a + 1)] = 0. No station has an incentive to change its LAC threshold; therefore [a + 1 (a + 1 ... a + 1)] is a fair and inefficient NE with all neighborhood relationships terminated. This type of scenario is illustrated in Fig. 3d; it is similar to that in Fig. 3a except that the resulting payoffs are zero.

2) payoff[a (a + 1 ... a + 1)] > 0. Now each station has an incentive to decrease its LAC threshold, i.e., set a. We can continue along the play path assuming M_{l+2} = {1}, M_{l+3} = {2}, …, M_{l+N+1} = {N} and reason similarly as before to conclude that a "war of attrition" results, in which some stations will have decreased their LAC thresholds and some will not. Indeed, the first station to set a (station 1) does so because payoff[a (a + 1 ... a + 1)] > payoff[a + 1 (a + 1 ... a + 1)] = 0, whereas the last one (station N) finds no incentive to do so because payoff[a (a ... a)] < payoff[a + 1 (a ... a)]. Hence an unfair and efficient NE is reached: those stations that have set a receive payoff[a (a + 1 ... a + 1 a ... a)], whereas the other stations receive a higher payoff[a + 1 (a + 1 ... a + 1 a ... a)]. This scenario is illustrated in Fig. 3e: initial incentives to move upwards bring all stations' payoffs to zero; subsequent incentives (or lack thereof) make some of them stay at a + 1, whereas the others go back to a for a higher payoff.

Of the above outcomes of the LAC game, a fair and efficient NE is the only desirable one. Proposition 1 is mainly of cautionary value, as it shows that the other outcomes are reachable too, even though the MANET topology, traffic pattern, and initial LAC profile are symmetric.

[Figure: five conceptual plots (a) through (e) of payoff versus LAC threshold z, showing the curves payoff[z+1 (z…z)], payoff[z (z…z)], and payoff[z−1 (z…z)] around the thresholds a and a+1, with arrows indicating the stations' incentives to move in each scenario.]

Fig. 3. LAC game payoff vs. LAC threshold; see text for explanation of scenarios a through e

5 D&F Protocol

A packet forwarding protocol should give each station n under heavy transit traffic the right to legitimately decline to receive packets, and each neighbor station the right to legitimately force packets into station n. In the Decline and Force (D&F) protocol presented below, executing these rights is discretional and linked to local conditions. A station operates in the NORMAL, CONGESTED, or FULL mode, depending on whether the danger of a failed WD check is perceived as remote, incipient, or immediate, respectively. A CONGESTED station legitimately declines to receive transit


packets, which a CONGESTED or FULL neighbor station can disregard and force a transit packet transmission; a FULL station receives no transit traffic. Thus D&F assists a CONGESTED station in reducing inbound transit traffic, and prevents failed WD checks at a FULL station. (A destination packet can always be received as it is not subject to a WD check.) It is therefore vital that the current mode be announced to all neighbor stations via mode primitives, either broadcast in special control packets or piggybacked on data packets. The proportion of time a station remains FULL, i.e., the fullness measure V, is thus known to each neighbor station (no record of the CONGESTED mode has to be kept). Observe that having forwarded a transit packet (and thus passed the WD check at a neighbor station), a station has no incentive to attempt a retransmission.2 Consequently, D&F must stipulate that (i) every transmitted packet be received (no retransmissions) and (ii) every received transit packet be forwarded (no failed WD checks). Condition (i) implies that a NORMAL station must know about a neighbor station's mode change to CONGESTED or FULL (and a CONGESTED station about a neighbor station's mode change to FULL) in time to suspend an intended packet transmission; accordingly, expedited transmission of CONGESTED and FULL primitives is prescribed. Condition (ii) can be enforced by imposing severe punishment for a failed WD check; D&F prescribes immediate termination of the relevant neighborhood relationship even if V < V*. Both prescriptions can be replaced by some tolerance levels to account for transmission errors and misinterpretation of the mode primitives (to keep the description simple we assume zero tolerance). Any systematic violation of D&F rules is counterproductive. Indeed:

• Failure of station n to announce a NORMAL to CONGESTED change forgoes the right to legitimately decline to receive packets; ultimately, to avoid failed WD checks, station n would have to announce FULL mode, worsening its fullness measure V. Similar failure in the case of a CONGESTED to FULL change directly leads to failed WD checks and the termination of station n's neighborhood relationships. Conversely, failure to act upon a FULL to CONGESTED change unnecessarily worsens V, whereas similar failure upon a CONGESTED to NORMAL change drives the neighbor stations CONGESTED and enables them to force traffic into station n, soon bringing about FULL mode.

• Illegitimate declining to receive a packet from, or forcing a packet into, a neighbor station terminates the neighborhood relationship upon a failed WD check. Both these violations are counterproductive, for obviously a station wants to maintain all neighborhood relationships for which V < V* and none for which V ≥ V*.

• Finally, failure to acknowledge a sensed packet (as if the packet was not sensed or was received in error) causes no retransmission or a failed WD check. A possible remedy consists in treating such events as failed WD checks if their frequency distinctly exceeds the estimated statistics of channel errors.

Being free to set the mode on its own, each station pursues a maximum payoff (1). Using an approximate analysis we will show that the principle of mode setting decides the type of reachable NE and for some V* may cause the LAC game to reach a fair and efficient NE.

See [20] for a (costly) alternative method to include retransmissions in a reputation measure.
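The receive, decline, and force rules above can be summarized as a small decision function. This is a sketch with invented names, and it assumes a CONGESTED or FULL sender always exercises its right to force:

```python
# Hedged sketch of the D&F receive/force decision. A CONGESTED station
# legitimately declines transit packets, but a CONGESTED or FULL sender
# may force one anyway; a FULL station receives no transit traffic;
# destination packets are always received (not subject to a WD check).
NORMAL, CONGESTED, FULL = "NORMAL", "CONGESTED", "FULL"

def must_receive(receiver_mode, sender_mode, is_transit):
    if not is_transit:
        return True                          # destination packet: always received
    if receiver_mode == FULL:
        return False                         # FULL: no transit traffic at all
    if receiver_mode == CONGESTED:
        # declined, unless a CONGESTED or FULL sender forces the packet
        return sender_mode in (CONGESTED, FULL)
    return True                              # NORMAL: receives all transit packets
```

For instance, a CONGESTED receiver declines a transit packet from a NORMAL sender, but cannot decline one forced by a FULL sender.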


6 LAC Game Payoffs Under D&F

We have seen that the outcome of the LAC game is determined by the payoffs for LAC threshold profiles of the form [a′ (a ... a)] with |a′ − a| ≤ 1. These will now be approximately calculated for a symmetric topology and traffic pattern, where each station has M neighbor stations, and the average source-to-destination route length is H hops. Let the station mode be solely determined by its current backlog x (x ≤ B) and a D&F threshold e: NORMAL if x < e, CONGESTED if e ≤ x < B, and FULL if x = B. Furthermore, (A1) transit packets follow a stationary Poisson process with exponentially distributed transmission times whose mean 1/μ includes MAC delays, (A2) a station can simultaneously transmit and sense a packet, (A3) mode primitives are issued and transmitted with negligible delay, (A4) the rate of packet corruption due to channel errors is negligible, and (A5) a station admits as many source packets as its LAC threshold permits. Even though the model and the principle of mode setting are simplistic, our objective is to discuss reachability of NE rather than accurate performance measures. (A1) and (A2) enable a birth and death approximation [10] and by (A3) the birth and death coefficients only depend on current modes at neighbor stations. (A2) implies separate channels for transmission and reception; we stick to this assumption to avoid shifting the focus to multiple access. (A5) reflects that selfishness is a concern primarily under heavy traffic. Consider a symmetric LAC threshold profile [a (a…a)]. Following the "isolated node" approach [1], we focus upon a station n, where the backlog follows a birth and death process with steady-state probability distribution (pX(x), a ≤ x ≤ B). (Because of (A5), it never drops below a.) Let the current backlog at each of M neighbor stations be drawn from the steady-state probability distribution (pY(x), a ≤ x ≤ B). This determines the birth and death coefficients at station n as explained below. We thus seek a dependence of the form pX(⋅) = f[pY(⋅)]; by the model symmetry this becomes a fixpoint-type relationship p(⋅) = f[p(⋅)] to be solved iteratively for p(⋅). Denote the birth and death coefficients conditional on x by αx and βx (βa, αe−1, and αB−1 signify admission of a source packet, and mode change to CONGESTED and FULL, respectively). Recalling D&F operation we express βx through pY(⋅):

βx = μ ⋅ [1 − PY(e) ⋅ (1 − 1/H)], if x < e,
βx = μ ⋅ [1 − pY(B) ⋅ (1 − 1/H)], if x ≥ e,

(6)

where PY(e) = Σe≤y≤B pY(y) is the probability that a neighbor station is CONGESTED or FULL, and 1/H is the fraction of packets whose destination is next-hop. To approximate αx, let βC = Σe≤x≤B pY(x) ⋅ βx and βN = Σa≤x<e pY(x) ⋅ βx be the average death coefficients at a CONGESTED and NORMAL neighbor station, respectively (by putting βx we take advantage of the model symmetry). The flow of packets into CONGESTED station n only consists of forced and destination packets,


whereas NORMAL station n receives all packets for which it is the next-hop station (1/M of the traffic from each of M neighbor stations). Since the fraction 1/H of traffic received at station n are destination packets, which do not contribute to αx, we have for x = 0,…,B − 1:

αx = (1 − 1/H) ⋅ βC, if x ≥ e,
αx = (1 − 1/H) ⋅ (βN + βC), if x < e.

(7)

Standard calculation [10] now yields:

pX(x) = C ⋅ (αa ⋅ αa+1 ⋅ ... ⋅ αx−1) / (βa+1 ⋅ βa+2 ⋅ ... ⋅ βx),

(8)

where C is a normalization constant. Further calculation is performed iteratively: assuming some initial pY(⋅), calculate pX(⋅) and in the next iteration substitute it for pY(⋅), until pX(⋅) ≈ pY(⋅) ≈ p(⋅). The fullness measure follows directly, and for the source throughput we use the observation that a source packet is admitted each time a death occurs at backlog a. Hence,

S[a (a ... a)] = βa ⋅ p(a) and V[a (a ... a)] = p(B).

(9)
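The iteration just described can be sketched directly from (6)–(9). The following is a simplified implementation under the stated assumptions; the uniform initial guess for pY(⋅) and the requirement a < e are our own choices. It returns the throughput S[a (a ... a)] and fullness V[a (a ... a)]:

```python
# Fixpoint iteration p(.) = f[p(.)] for the birth-and-death model,
# eqs. (6)-(9). Assumes a < e, as in the text.
def steady_state(a, e, B=8, H=3, mu=1.0, iters=500, tol=1e-12):
    states = list(range(a, B + 1))              # backlog never drops below a
    p = {x: 1.0 / len(states) for x in states}  # initial guess for p_Y(.)
    for _ in range(iters):
        PY_e = sum(p[y] for y in states if y >= e)  # neighbor CONGESTED or FULL
        def beta(x):                                # death coefficient, eq. (6)
            frac = p[B] if x >= e else PY_e
            return mu * (1.0 - frac * (1.0 - 1.0 / H))
        beta_C = sum(p[x] * beta(x) for x in states if x >= e)
        beta_N = sum(p[x] * beta(x) for x in states if x < e)
        def alpha(x):                               # birth coefficient, eq. (7)
            return (1.0 - 1.0 / H) * (beta_C if x >= e else beta_N + beta_C)
        q = {a: 1.0}                                # eq. (8), unnormalized
        for x in range(a + 1, B + 1):
            q[x] = q[x - 1] * alpha(x - 1) / beta(x)
        Z = sum(q.values())
        new_p = {x: q[x] / Z for x in states}
        converged = max(abs(new_p[x] - p[x]) for x in states) < tol
        p = new_p
        if converged:
            break
    PY_e = sum(p[y] for y in states if y >= e)
    S = mu * (1.0 - PY_e * (1.0 - 1.0 / H)) * p[a]  # eq. (9): S = beta_a * p(a)
    V = p[B]                                        # eq. (9): fullness measure
    return S, V
```

Normalizing S by Mμ then gives the percentage payoffs of the kind reported in Fig. 4, although the sketch above makes no claim to reproduce those exact numbers.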

To calculate payoff[a′ (a ... a)], where a′ = a ± 1, calculate pX′(x) by putting a := a′ in (8), with the αx and βx obtained by substituting PY(e) = Σe≤y≤B p(y), and next reapply (9), i.e., take S[a′ (a ... a)] = βa′ ⋅ pX′(a′) and V[a′ (a ... a)] = pX′(B). In doing so, we assume that the interaction of the other stations with a single station setting a different LAC threshold does not affect their birth and death processes significantly. Numerical experiments confirm the validity of Assumption 1 under D&F. Sample results are depicted in Fig. 4 for various LAC and D&F thresholds (source throughput is normalized with respect to the maximum attainable value Mμ; discrete points are joined by solid lines for readability). In view of (A5) it is logical to consider only a < e, as otherwise there would be no packet transmissions other than forced ones. Taking e = 4 yields a fair and efficient NE at a = 3, with normalized payoff S[3 (3 ... 3)] = 39.5%; this coincides with the highest fair payoff attainable in a cooperative setting. Taking e = 3 leads to zero payoffs and the termination of all neighborhood relationships (fair payoffs of 39.5% could be attained in a cooperative setting), while e = 5 leads to a timing game (cf. Fig. 3c,e) with unfair payoffs ranging from 29.4% to 50.4% (fair payoffs of 39.3% could be attained if all stations were cooperative and stuck to a = 3). The outcome of the LAC game varies with e and V*. E.g., if we increase V* twofold (V* = 0.01) then e = 5 and e = 7 yield fair and efficient Nash equilibria and no e yields a timing game. One may look for V* yielding a fair and efficient NE. This brings into consideration a quality we refer to as robustness. Let a fair and efficient NE occur at [a (a ... a)]. Then V[a (a ... a)] < V* and V[a + 1 (a ... a)] ≥ V* (cf. Fig. 3b). Since V is a statistical average and V* is typically low in magnitude, the


[Figure: three panels plotting payoff (%) against the LAC threshold a (a = 1,…,5) for e = 3, e = 4, and e = 5; the reference levels 39.5% (e = 3, 4) and 29.4%, 50.4% (e = 5) are marked.]

Fig. 4. LAC game payoffs for various e; M = 3, H = 3, B = 8, V* = 0.005, marking as in Fig. 7

[Figure: three panels plotting robustness r against V* (log scale, 0.0001 to 0.1) for curves labelled e = 2 through e = 7.]

Fig. 5. Robustness for various types of NE; top left: fair and efficient, top right: inefficient, bottom: timing game; circled: suggested range of V*

values of V[a (a ... a)], V*, and V[a + 1 (a ... a)] ought to be far enough apart for each station to credibly perceive the payoff it is receiving and avoid accidental departure from the NE. To assess statistical credibility, the distance between V[a (a ... a)] and V* should be expressed as a multiple of the standard deviation of the sample V[a (a ... a)]; the latter is roughly proportional to the square root of V[a (a ... a)]. A similar conclusion applies to the distance between V* and V[a + 1 (a ... a)]. Therefore, the lesser of the two relative distances measures the robustness r of the NE:


r = min{ (V* − V[a (a ... a)]) / √V[a (a ... a)], (V[a + 1 (a ... a)] − V*) / √V[a + 1 (a ... a)] },

(10)

and the interesting range of V* is where r > 0. For inefficient NE the three values to be far enough apart are V[a + 1 (a ... a)], V*, and V[a (a + 1 ... a + 1)], and for the timing game, V[a + 1 (a ... a)], V*, and V[a (a + 1 ... a + 1)], the rest of the argument being similar. Fig. 5 plots r versus V* for the three types of NE. Note that high and wide-ranged robustness is desirable only in the case of a fair and efficient NE. The plots therefore indicate preferable values of V*, namely those for which an e* exists yielding high r in the top left plot and no high r in the other two (a suggested range of V* is circled, e* = 6).
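The robustness measure (10) is straightforward to compute. A minimal sketch (the function and argument names are ours), with V_low standing for V[a (a ... a)] and V_high for V[a + 1 (a ... a)]:

```python
import math

# Robustness r of a NE per eq. (10): the lesser of the two distances from
# V* to the fullness measures below and above it, each expressed as a
# multiple of the standard deviation of the sampled V, taken (as in the
# text) to be roughly proportional to the square root of V.
def robustness(V_low, V_star, V_high):
    return min((V_star - V_low) / math.sqrt(V_low),
               (V_high - V_star) / math.sqrt(V_high))
```

A positive r means V* strictly separates the two fullness levels; r grows as V* moves away from both, matching the requirement that the three values be far enough apart.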

7 Conclusion

MANET stations are understandably reluctant to forward transit packets given that it consumes the energy and bandwidth needed by source packets. One way of incentivizing packet forwarding is to define a WD-controlled reputation system where a station should not decline to receive transit packets too often lest its neighborhood relationships be terminated. However, stations under heavy transit traffic should be exempted from this rule, which selfish stations may abuse by setting their LAC thresholds so as to admit larger-than-fair source packet traffic. With packet anonymity allowed, source and transit packets are indistinguishable to a neighbor station's WD, hence the selfishness cannot be detected or punished. Using simple MANET and game models we have characterized reachable Nash equilibria of the resulting LAC game. In a symmetric MANET, the desirable type of NE is a fair and efficient one, where each station gets the same source packet throughput and no neighborhood relationship is terminated. In this context, a class of packet forwarding protocols worth studying is D&F, where each station is given the right to decline to receive transit packets, as well as the right to force transit packets into a neighbor station under certain conditions. While in principle these conditions can be quite sophisticated, decided autonomously, and possibly time-varying, we have demonstrated that the type of NE can be controlled even with a single decision parameter (the D&F threshold). Research challenges for the future involve D&F optimization under more general principles of mode setting, and for more realistic MANET, traffic, and LAC game models. These are being studied in both simulation and real-world environments.

Acknowledgment

This work was supported by the Ministry of Education and Science, Poland, under Grant 1599/T11/2005/29.


References

1. Agnew, G.B., Mark, J.W.: Performance Modeling for Communication Networks at a Switching Node. IEEE Trans. Comm. COM-32, 902–910 (1984)
2. Bansal, S., Baker, M.: Observation-based Cooperation Enforcement in Ad hoc Networks (2003), http://arxiv.org/abs/cs.NI/0307012
3. Ben Salem, N., Buttyan, L., Hubaux, J.-P., Jakobsson, M.: A Charging and Rewarding Scheme for Packet Forwarding in Multi-hop Cellular Networks. In: Proc. ACM Symposium on Mobile Ad Hoc Networking and Computing MobiHoc'03, Annapolis, MD (2003)
4. Broch, J., Johnson, D., Maltz, D.: The Dynamic Source Routing Protocol for Mobile Ad Hoc Networks. IETF Internet Draft (1998)
5. Buchegger, S., Le Boudec, J.-Y.: Performance Analysis of the CONFIDANT Protocol (Cooperation Of Nodes Fairness In Dynamic Ad-hoc NeTworks). Tech. Rep. IC/2002/01, Swiss Federal Institute of Technology (2002)
6. Buttyan, L., Hubaux, J.-P.: Nuglets: A Virtual Currency to Stimulate Cooperation in Self-Organized Mobile Ad-Hoc Networks. Tech. Rep. DSC/2001/001, Swiss Federal Institute of Technology (2001)
7. Buttyan, L., Hubaux, J.-P.: Stimulating Cooperation in Self-Organizing Mobile Ad Hoc Networks. J. Mobile Networks and Applications 8, 579–592 (2003)
8. Cheng, Y., Agrawal, D.P.: Distributed Anonymous Secure Routing Protocol in Wireless Mobile Ad Hoc Networks, http://www.ececs.uc.edu/ cdmc/OPNETWORK_Yi.pdf
9. Felegyhazi, M., Hubaux, J.-P., Buttyan, L.: Nash Equilibria of Packet Forwarding Strategies in Wireless Ad Hoc Networks. IEEE Trans. Mobile Computing 5, 1–14 (2006)
10. Feller, W.: An Introduction to Probability Theory and its Applications. J. Wiley and Sons, New York (1966)
11. Fratkin, E., Vijayaraghavan, V., Liu, Y., Gutierez, D., Li, T.M., Baker, M.: Participation Incentives for Ad Hoc Networks, www.stanford.edu/#y1314/adhoc
12. Fudenberg, D., Tirole, J.: Game Theory. MIT Press, Cambridge (1991)
13. Ileri, O., Mau, S.-C., Mandayam, N.B.: Pricing for Enabling Forwarding in Self-Configuring Ad Hoc Networks. IEEE J. Selected Areas Commun. 23, 151–162 (2005)
14. Kamoun, F.: A Drop-and-Throttle Flow Control Policy for Computer Networks. IEEE Trans. Comm. COM-29, 444–452 (1981)
15. Marbach, P.: Cooperation in Wireless Ad Hoc Networks: A Market-Based Approach. IEEE/ACM Trans. Networking 13, 1325–1338 (2005)
16. Marti, S., Giuli, T.J., Lai, K., Baker, M.: Mitigating Routing Misbehavior in Mobile Ad Hoc Networks. In: Proc. 6th Annual Conf. on Mobile Computing and Networking MobiCom 2000, Boston, MA (2000)
17. Michiardi, P., Molva, R.: Making Greed Work in Mobile Ad Hoc Networks. Res. Rep. RR-02-069, Institut Eurecom, Sophia-Antipolis (2002)
18. Michiardi, P., Molva, R.: CORE: A Collaborative Reputation Mechanism to Enforce Node Cooperation in Mobile Ad hoc Networks. In: Proc. 6th IFIP Conf. on Security Comm. and Multimedia (CMS 2002), Portoroz, Slovenia (2002)
19. Srinivasan, V., Nuggehalli, P., Chiasserini, C.F., Rao, R.R.: An Analytical Approach to the Study of Cooperation in Wireless Ad Hoc Networks. IEEE Trans. Wireless Commun. 4, 722–733 (2005)
20. Wang, Y., Giruka, V.C., Singhal, M.: A Fair Distributed Solution for Selfish Nodes Problem in Wireless Ad Hoc Networks. In: Nikolaidis, I., Barbeau, M., Kranakis, E. (eds.) ADHOC-NOW 2004. LNCS, vol. 3158, Springer, Heidelberg (2004)
21. Zhong, S., Chen, J., Yang, Y.R.: Sprite: A Simple, Cheat-Proof, Credit-Based System for Mobile Ad-Hoc Networks. In: Proc. INFOCOM 2003, San Francisco (2003)

Providing Seamless Mobility Using the FOCALE Autonomic Architecture

John Strassner, Dave Raymer, and Srini Samudrala

Motorola Labs, 1301 East Algonquin Road, MS IL02-2240, Schaumburg, IL 60196, USA
{john.strassner, david.raymer, srini.samudrala}@motorola.com

Abstract. Existing wireless networks have little in common, as they are designed around vendor-specific devices that use specific radio access technologies to provide particular functionality. Next generation networks seek to integrate wired and wireless networks in order to provide seamless services to the end user. Seamless Mobility is an experiential architecture, predicated on providing mechanisms that enable a user to accomplish his or her tasks without regard to technology, type of media, or device. This paper examines how autonomic mechanisms can satisfy some of the challenges in realizing seamless mobility solutions.

Keywords: autonomic communications, autonomic networking, network management, Seamless Mobility.

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 330–341, 2007. © Springer-Verlag Berlin Heidelberg 2007

1 Introduction

Current voice and data communications networks are difficult to manage, as exemplified by the stovepipe systems that are common in Operational and Business Support Systems [1]. This is due to the desire to incorporate best of breed functionality, which prohibits the sharing and reuse of common data [2], resulting in the inability to manage the increase in operational, system, and business complexity. For example, management of wireless operations requires the analysis and interpretation of diverse management data from different sources to provide a machine-interpretable view of system quality as perceived by the end user [3][4]. Management and optimization of wireless systems require mechanisms that are mostly specific to a particular type of radio access technology (RAT). Unfortunately, current RATs use a set of non-compatible standards and vendor-specific functionality. This is exacerbated by current trends, such as network convergence (which combines different types of wired and wireless networks), as well as future multi-access-mode devices [5] and cognitive networks [6], in which the type of network access can be dynamically defined. The vision of Seamless Mobility [7] is even more ambitious – the ability for the user to get and use data independent of access mode, device, and media. Note that while handover between multiple network technologies is challenging, the hard part of Seamless Mobility is in maintaining session continuity. This paper describes ongoing research in supporting the needs of Seamless Mobility through the use of a novel autonomic networking architecture called FOCALE. The organization of this paper is as follows. Section 2 describes the vision of Seamless


Mobility. Section 3 defines autonomic networking, and how it differs from autonomic computing. Section 4 describes the FOCALE architecture in detail, and Section 5 shows how FOCALE meets the needs of wired and wireless network management. Section 6 describes how FOCALE can be used to implement Seamless Mobility. Section 7 concludes the paper.

2 The Vision of Seamless Mobility

Businesses are gaining competitive advantage through innovative applications that empower their increasingly mobile employees and customers, and end-users are now starting to expect a world of easy, uninterrupted access to information, entertainment, and communication across diverse environments, devices and networks. Businesses want anywhere, anytime communications to provide enhanced productivity to their workforce. Consumers are equally eager for personalized services that make it easy to access and share digital content when, where and how they want it. Network operators seeking an edge in a changing marketplace are exploring new approaches to delivering this content in a personalized, timely and cost-effective manner. Seamless Mobility, and its vision of seamless service delivery, requires significant changes to existing wired and wireless network management systems. For example, when handover from one wireless system to another wired or wireless system is performed, a "seam", or discontinuity, is created that interrupts the continuity of application experience. Motorola's vision of Seamless Mobility is to provide simple, uninterrupted access to any type of information desired at any time, independent of place, network and device. Seamless Mobility is dependent on the underlying business models (the willingness of the user to pay for it); this is why it revolves around an experiential architecture that captures the current context of what the user is doing, so that services that the user desires can be optimized. In earlier work, we have developed a novel context model that provides a first step in solving this difficult problem; it is part of our FOCALE [12] architecture and will be explained in more detail in Section 5.

3 Salient Features of Autonomic Networking

The purpose of autonomic computing and autonomic networking is to manage complexity. The name "autonomic" was chosen to reflect the function of the autonomic nervous system in the human body. By automating more manual functions (e.g., treating them as functions under involuntary control), additional resources (human and otherwise) are made available to manage higher-level processes. Most current autonomic computing architectures look like that shown in Figure 1 [13][14], and focus on providing self-* functionality to IT components and systems. They consist of a Managed Element instrumented with Sensors and Effectors to send data to and receive commands from an Autonomic Element, which is part of an Autonomic Manager. The Autonomic Manager provides its own set of Sensors and Effectors to communicate with other Autonomic Managers. The Autonomic Element implements a simple control loop: it uses sensors to retrieve data, which is then


analyzed to determine if any correction to the Managed Resource being monitored is needed (e.g., to correct "non-optimal", "failed" or "error" states). If so, then those corrections are planned, and appropriate actions are executed using effectors that translate commands back to a form that the Managed Resource can understand.

[Figure: a Managed Element, instrumented with Sensors and Effectors, communicates with an Autonomic Element inside an Autonomic Manager; the Autonomic Element runs a Monitor – Analyze – Plan – Execute loop around shared Knowledge, and acts on Managed Resources through its own Sensors and Effectors.]

Fig. 1. Autonomic Building Block

The motivation behind autonomic networking is twofold: (1) to perform manual, time-consuming network tasks (such as configuration management) on behalf of the network administrator, and (2) to pre- and post-process data to enable the system and the administrator to work together to perform higher-level cognitive functions, such as planning and optimization of the network. The FOCALE autonomic networking architecture is specifically designed for use with heterogeneous wired and wireless networks. While it builds on autonomic computing concepts, it introduces several novel changes into an autonomic architecture in order to cope with the problems described in Section 1, which are summarized below:

• Inability to relate current business needs to network services and resources delivery
• Multiple control mechanisms, each having different side effects and resource requirements, are applied to the same "path"
• Heterogeneous information sources, each with their own organization, structure, and semantics to their own management information, must be correlated to infer problems and solutions
• Heterogeneous languages per network element, meaning that correlation between commands issued and data found must be deduced, not looked up

In current environments, user needs and environmental conditions can change without warning. Therefore, the system, its environment, and the needs of its users must be continually analyzed with respect to business objectives. FOCALE uses inferencing to instruct the management plane to coordinate the (re)configuration of its control loops in order to protect the current business objectives. In addition, FOCALE uses modeling and ontologies to develop representations of Managed Resources that are inherently extensible, so that the knowledge of a Managed Resource can be updated over time. This work is novel. The second difference is a result of converged networks. The control mechanisms of wired and wireless networks are very different. Autonomic computing efforts to date do not consider this – rather, they assume a set of largely homogeneous computing resources. FOCALE addresses this through a novel knowledge fusion approach (see Section 4.3) that relates diverse functionality in different managed entities to each other. The last two points are usually not found in autonomic computing applications, which typically use the same sensors and effectors to get data and send commands. FOCALE solves this by using a model-based translation layer, which is described in more detail in Section 5.3.
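The monitor, analyze, plan, and execute stages of the control loop in Fig. 1 can be sketched as follows. All class and method names here are illustrative, not FOCALE's actual API:

```python
# Minimal sketch of the autonomic control loop of Fig. 1: the element
# monitors sensed data, analyzes it against a desired state, plans
# corrective actions, and executes them through effectors, recording
# each cycle in a shared knowledge base.
class AutonomicElement:
    def __init__(self, desired_state, sensors, effectors):
        self.desired_state = desired_state  # dict: key -> desired value
        self.sensors = sensors              # dict: key -> callable returning observed value
        self.effectors = effectors          # dict: key -> corrective callable
        self.knowledge = []                 # history of (observed, deviations)

    def monitor(self):
        return {name: read() for name, read in self.sensors.items()}

    def analyze(self, observed):
        # deviations: desired entries the observed state does not satisfy
        return {k: v for k, v in self.desired_state.items() if observed.get(k) != v}

    def plan(self, deviations):
        return [self.effectors[k] for k in deviations if k in self.effectors]

    def execute(self, actions):
        for act in actions:
            act()

    def step(self):
        observed = self.monitor()
        deviations = self.analyze(observed)
        self.knowledge.append((observed, deviations))
        self.execute(self.plan(deviations))
        return deviations
```

For instance, with a sensor reading a link state and an effector restoring it, one step detects and repairs the deviation, and the next step reports none.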

Providing Seamless Mobility Using the FOCALE Autonomic Architecture


4 The FOCALE Architecture

FOCALE stands for Foundation – Observation – Comparison – Action – Learn – rEason, the six key principles required to support autonomic networking. These principles manage complexity while enabling the system to adjust to the changing demands of its users and environmental conditions.

Basic operation is as follows. Assume that behavior can be defined using a set of state machines, and that the configuration of each device is determined from this information. FOCALE is a closed-loop control system, in which the current state of the managed element is calculated from sensed data and compared to the desired state defined in the state machines. Any variance from the desired state is analyzed to ensure that business goals and objectives are still being met. If they are, the system keeps monitoring state (though it may need to change what is being monitored); if they are not, then the system executes a set of configuration changes to fix the problem(s). Equally important, the results of these changes are observed to ensure that the system reacted as expected.

However, since networks are complex, highly interconnected systems, FOCALE introduces six novel modifications to current autonomic control loops: (1) multiple control loops are used; (2) a combination of information models, data models, and ontologies is used to develop state machines for orchestrating behavior; (3) diverse information is normalized for gathering vendor-specific management data and issuing vendor-specific commands; (4) the functions of the control loop are changed based on context, policy, and the semantics of the data, as well as the current management operations being processed; (5) reasoning mechanisms are used to generate hypotheses as to why the actual state of the managed element is not equal to its desired state; and (6) learning mechanisms are used to update the system’s knowledge base.
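The closed-loop comparison described above can be sketched as follows; this is an illustration, not FOCALE code, and the state names, threshold, and function names are all invented for the example.

```python
# Illustrative sketch of the closed control loop: derive the current state
# from sensed data, compare it to the desired state from the state machine,
# and plan corrective actions only on a mismatch.

DESIRED_STATE = "operational"

def derive_state(sensor_data):
    # Toy state derivation: a packet-loss rate above 5% means "degraded".
    return "degraded" if sensor_data["loss_rate"] > 0.05 else "operational"

def control_step(sensor_data):
    actual = derive_state(sensor_data)
    if actual == DESIRED_STATE:
        return []  # maintenance path: keep monitoring, no reconfiguration
    # adjustment path: plan configuration changes to close the gap
    return ["reconfigure: reduce offered load"]

print(control_step({"loss_rate": 0.01}))  # []
print(control_step({"loss_rate": 0.12}))  # ['reconfigure: reduce offered load']
```

The key point of the sketch is that the loop emits actions only when the derived state diverges from the desired one, mirroring the maintenance/adjustment split discussed next.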
4.1 FOCALE’s Use and Adaptation of Multiple Control Loops

Figure 2 shows FOCALE’s two types of control loops. The desired states of the Managed Resource are predefined in the appropriate state machines using business goals [15][16][17]. In our case, we use Key Performance and Quality Indicators (KPIs and KQIs) of Service Level Agreements (SLAs) to define these business goals. These are modeled in DEN-ng [20], which enables each to be strongly related to the others. This is one example of translating business needs into network functionality.

The top control loop (maintenance) is used when no anomalies are found (i.e., when either the current state is equal to the desired state, or the state of the managed element is moving towards its intended goal). The bottom (adjustment) control loop is used when one or more reconfiguration actions must be performed.

The use of multiple control loops (Figure 2 shows two for simplicity) enables FOCALE to provide more flexible management, and is fundamental to overcoming the limitations of a single static control loop having fixed functionality. Since FOCALE is designed to adapt its governance model according to the context (see Section 4.4), it associates a given set of policies with each particular context. This set of policies determines the specific functionality that can be provided by the system, and defines the functionality and operation of each component of each control loop (note that this adaptation of system functionality could not be managed using a static, non-changing control loop, as is done in the current state of the art). In addition,

334

J. Strassner, D. Raymer, and S. Samudrala

controlling the reconfiguration process must be able to have its functionality adapted to suit the vendor-specific needs of the different devices being adapted. For example, even a standard protocol like BGP cannot share the same configuration among different vendors, due to vendor-specific implementation differences as well as different functionality that may or may not be part of a standard.

Fig. 2. Two of FOCALE’s Control Loops (sensor data is gathered from the Managed Resource and the actual state is compared to the desired state; on a match, the maintenance loop (Loop 1) continues monitoring, otherwise the adjustment loop (Loop 2) defines new device configuration(s))

As will be seen in Section 4.7, different components of FOCALE can each alter the function of the control loops according to the set of policies active for a particular context. Another use of multiple control loops is to protect the sets of business goals and objectives of different constituencies (e.g., business users vs. programmers vs. architects vs. network operators). Each of these constituencies has different concepts, vocabularies, and understandings of the same function. Thus, the implementation of a “common” objective for these constituencies is different and sometimes in conflict. Therefore, FOCALE uses multiple control loops – having a single control loop protect these objectives is simply not feasible.

The reconfiguration process uses dynamic code generation based on models and ontologies [15][16][17][18]. This forms a novel control loop: context changes policies, which in turn change functionality through dynamic code generation that is orchestrated through a set of state machines. The policies are tightly linked to the model, which enables the model to be used both as a generic specification of functionality and as an instance model. More specifically, sensor data is used to populate the state machines that in turn specify the operation of each entity that the autonomic system is governing.
The management information that the autonomic system is monitoring signals any context changes, which in turn adjust the set of policies being used to govern the system, which in turn supplies new information to the state machines. The state machines define the (re)configuration commands required to achieve a particular state or set of states.
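As a toy illustration of state machines defining the commands needed to reach a target state, the following sketch encodes transitions as a table in which each edge carries (vendor-neutral) configuration commands, and searches for the command sequence from the current state to the desired one. The states and commands are invented for this example.

```python
from collections import deque

# Illustrative FSM: each transition carries the configuration commands
# needed to move between the two states it connects.
TRANSITIONS = {
    ("degraded", "recovering"): ["throttle-traffic"],
    ("recovering", "operational"): ["restore-routes"],
    ("operational", "monitored"): ["enable-probes"],
}

def commands_to_reach(current, desired):
    """Breadth-first search over the FSM; returns the command sequence,
    or None if the desired state is unreachable."""
    queue = deque([(current, [])])
    seen = {current}
    while queue:
        state, cmds = queue.popleft()
        if state == desired:
            return cmds
        for (src, dst), c in TRANSITIONS.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append((dst, cmds + c))
    return None

print(commands_to_reach("degraded", "operational"))
# ['throttle-traffic', 'restore-routes']
```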

4.2 FOCALE’s Behavioral Orchestration

FOCALE uses information and data modeling to capture knowledge relating to network capabilities, environmental constraints and business rules. Unlike other approaches, we combine the knowledge from these models with a different type of knowledge – information from a set of ontologies; this produces an augmented set of data structures that, together with machine-based learning techniques, can be used to reason about this knowledge. Figure 3 shows how each activity in a business model can be represented by a set of classes that are then related to one or more Finite State Machines (FSMs). As the system changes, code is dynamically generated according to the appropriate FSM(s) to protect business goals. Knowledge embedded within

Fig. 3. The FOCALE Approach (Business Process Model, DEN-ng Model, Finite State Machine, Code Generation, FOCALE Architecture)
system models will be used by policy management systems to automatically configure network elements in response to changing business goals and/or environmental changes. Policies help standardize how configuration changes are applied in relation to context and business goals. An important benefit of this approach is that it explicitly relates commands and sensor data to each other, thereby simplifying the management task. This enables a closed control loop to be formed, one in which the commands issued can be related to management data that can be queried for, thereby verifying that the issued commands had the desired effect.

4.3 FOCALE’s Management Data and Command Normalization

Networks use vendor-specific devices that have varying functionality, as well as devices that implement the same functionality in different ways. This is why management standards such as SNMP are in and of themselves not enough to adequately manage networks. FOCALE associates one or more ontologies with its DEN-ng based [20] data and information models. This enables ontologies to represent relationships and semantics that cannot be represented using UML. For example, UML cannot represent the relationship “is similar to” because it doesn’t define the logic mechanisms to enable this comparison. Note that this relationship is critical for heterogeneous end-to-end management, since different devices have different languages, programming models, and side effects [1], and administrators need to ensure that the same relative commands are given to devices having different languages. This is why we combine UML models with ontological data to synthesize new semantic capabilities.

The autonomic manager uses ontologies to analyze sensed data to determine the current state of the managed entities being monitored. Often, this task requires inferring knowledge from incomplete facts. For example, consider the receipt of an SNMP alarm. The alarm in and of itself doesn’t provide the business information that the system needs. Which customers are affected by the alarm? Which SLAs of which customers are affected? FOCALE tries to determine, without human intervention, which SLAs of which customers are impacted by examining its libraries of model and ontology data. Once an SLA is identified, it can be linked to business information, which in turn can assign the priority of solving this problem. FOCALE uses a process known as semantic similarity matching [21] to establish additional semantic relationships between sensed data and known facts.
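The SLA-impact example above can be approximated as chaining over linked model facts: an alarm names a device, the model relates devices to services, services to SLAs, and SLAs to customers. The facts and relationship names below are invented for this sketch and are not part of DEN-ng.

```python
# Toy fact base: who hosts what, which SLA covers which service, and who
# owns each SLA. Chaining these relationships approximates the inferencing
# from "SNMP alarm on a device" to "customers whose SLAs are impacted".
FACTS = {
    "hosts": {"router-7": ["vpn-service"]},
    "covered_by": {"vpn-service": ["gold-sla"]},
    "owned_by": {"gold-sla": ["acme-corp"]},
}

def impacted_customers(alarm_device):
    customers = set()
    for service in FACTS["hosts"].get(alarm_device, []):
        for sla in FACTS["covered_by"].get(service, []):
            customers.update(FACTS["owned_by"].get(sla, []))
    return customers

print(impacted_customers("router-7"))  # {'acme-corp'}
```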
This is required because, in this example, an SLA is not directly related in the model to an SNMP alarm. Inferencing is used to establish semantic relationships between the fact that an SNMP alarm was received and other facts that can be used to determine which SLAs and which customers could be affected by that SNMP alarm. Note that without the use of coherent information

336


and data models, these facts could not be established; without augmenting this knowledge with ontological data, this inferencing could not be accomplished.

4.4 FOCALE’s Context-Driven Policy Management

Figure 4 shows a simplified form of the DEN-ng context model, which relates Context to Management Information to Policy [21], and works as follows. Context determines the working set of Policies that can be invoked at any given time; this working set defines the set of Profiles and Roles that can be assigned, which in turn defines the functionality that can be invoked or provided. Significantly, this model also defines the set of management information that is used to determine how the Managed Element is operating. Note that this proactive definition of how to determine whether a component or function is operating correctly is very important to the central concept of governance.

Fig. 4. Simplified DEN-ng Context Model (Context, Policy, ManagementInfo and ManagedEntityRole classes linked by associations such as SelectsPolicies, GovernsManagementInfo, MgmtInfoAltersContext and ManagedEntityRoleAltersContext)

Managed Entity Roles are used to describe the state of the Managed Entity, and are then linked to both Policy and Context by the four aggregations shown. Specifically, Policy is used to define which management information will be collected and examined; this management information in turn affects policy. Similarly, Context defines the management information to monitor, and the values of these management data in turn affect context.

Our context-aware architecture, which controls our autonomic manager, is shown in Figure 5. This architecture enables the type of algorithm, the function, and even the type of data to use to be changed as a function of context. This is facilitated by detecting context changes and using them to change the active policies that are being used at any given time.
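A minimal sketch of this context-to-policy selection follows: each context names a working set of policies, and each policy names the management data to monitor. The contexts, policies, and variables are invented for the example and do not come from DEN-ng.

```python
# Context selects the working set of policies; the working set determines
# which management data the system monitors.
POLICY_SETS = {
    "business-hours": ["protect-sla", "limit-guest-bw"],
    "maintenance-window": ["allow-reconfig"],
}
MONITORED = {
    "protect-sla": ["latency", "loss"],
    "limit-guest-bw": ["guest_throughput"],
    "allow-reconfig": ["config_events"],
}

def activate(context):
    """Return (active policies, management data to monitor) for a context."""
    policies = POLICY_SETS.get(context, [])
    data = sorted({v for p in policies for v in MONITORED[p]})
    return policies, data

print(activate("business-hours"))
# (['protect-sla', 'limit-guest-bw'], ['guest_throughput', 'latency', 'loss'])
```

A context change simply swaps the active working set, which changes both the enforced behavior and what is monitored, as described above.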
Fig. 5. Simplified FOCALE Architecture (a Context Manager and Policy Manager control the Autonomic Manager, whose loop gathers data from the Managed Resource through Model-Based Translation, analyzes data and events, determines the actual state using Ontological Comparison and Machine Learning and Reasoning, and defines new device configuration(s))

Current systems that use policy (regardless of whether it is part of an autonomic system) use it in a static way, causing three serious problems: (1) it is impossible for pre-defined policies to anticipate all conditions


that can affect a managed resource, let alone a managed system; (2) management systems are static, in that they are designed to manage known resources and services – if a new resource or service is dynamically composed, how can a static management system manage it, and how can pre-defined static policies be applicable; and (3) if the underlying context or business objectives change, existing policies may no longer be relevant. Therefore, FOCALE enables context changes to vary the policies used, which in turn change the functions of the control loop.

The policies used in our project follow a standard event-condition-action model: events are generated when new wireless system data is available; these events trigger the evaluation of the conditions of one or more policy rules. If the conditions are matched, then one or more policy actions are executed. (Note that this is a very simplified description; for more details, including more granular execution options, please see [20].) In our prototype system, this results in a Causal Analysis, which classifies the reason for the KPI or KQI violation and defines actions to fix the problem. A separate set of policy rules is then used to implement the actions; this allows humans to examine the proposed operation of the system until they gain the confidence needed to let the policies run on their own. While this is possibly not needed in autonomic computing, it is definitely needed in autonomic networking, due to the tremendous complexity of networks.

4.5 FOCALE’s Use of Machine Learning and Reasoning

Machine learning and reasoning are provided by the “Ontological Comparison” and “Machine Learning and Reasoning” functions in Figure 5. The former implements semantic similarity matching as previously described, and is used by other components to find equivalent semantic terms in the analysis, learning and reasoning functions; the latter implements reasoning and learning algorithms.
A bus enables the reasoning and learning components to “watch” the current operation being performed, and to be used in addition to, or instead of, that operation. For example, machine learning can examine the operations being performed and note the effectiveness of the actions taken given the context, policies, and input data used. This can be used to build a knowledge base that can help guide future decisions. Similarly, an abductive reasoning algorithm can be used to generate hypotheses as to the root cause of problems sensed, which the autonomic manager then tries to verify by using the models and ontologies to query the system and the environment to gather more data to support each hypothesis. The combination of these learning and reasoning algorithms enables the autonomic system to adapt to changing business goals, user needs and environmental conditions.

Machine learning enables the incorporation of new and learned behavior and data. Information modeling facilitates machine learning by providing a general-to-specific ordering of functionality, as well as details regarding aggregation and connectivity, within the system. Machine learning can be applied to temporal, spatial and hierarchical system aspects, allowing for learned behavior about different system “cuts” or cross-sections.

Hypothesis formation is a mapping of the data to be explained into the set of all possible hypotheses, rank-ordered by plausibility [22]. We define the hypothesis space by using a combination of the information model, the results of the topology discovery



process, and axiomatic knowledge. If the hypothesis space is too large, then falsification techniques can be used to provide a “critic” function [22], which helps reject some hypotheses and reduce the hypothesis space cardinality. Two examples of a critic function are (1) incorporating upper and lower boundary conditions on capacities and qualities directly into the information model to use for comparison purposes, and (2) the use of ontological relationships, such as “never-has-a”, “is-not-a-kind-of”, and especially “does-not-cause” relationships.

One or more machine learning algorithms may be employed to gain experience from the environment and to aid the reasoning process. We expand the traditional definition of machine learning [23] to include notions of identifying specific values (statistics), identification of specific attribute-value pairs (traditional “data mining”), and finally the identification of attributes and processes linked to events (our new definition of “machine learning”). Clearly, techniques such as candidate elimination and decision trees, coupled with notions of positive and negative examples, may be employed to help define those attributes of interest surrounding an anomalous event. However, these techniques tell us nothing about the cause behind this event (how the attributes might be linked), nor about what its effects and consequences might be; furthermore, they convey no understanding. Hence, our machine learning approach combines modeled data with the knowledge of subject matter experts to define a set of axioms and theories. We use machine learning to maintain and repair established theories, as well as to find successively minimal descriptions of those theories upon encountering future examples of the behavior they describe.

Finite state machines are a way of encoding behavior, and may be considered a form of causal structure.
The transition probabilities between states need to be maintained for any managed entity whose behavior varies with context. Machine learning and statistics are critical in refinement of transition probabilities and maintenance/repair activities as well as in finding behavioral cues by linking together state change with stimulus/response pairs that describe behavior.
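As a toy illustration of hypothesis formation with a critic function, the sketch below rank-orders candidate root causes by plausibility and lets a “does-not-cause” exclusion rule falsify candidates, shrinking the hypothesis space. All causes, plausibility values, and exclusion relationships are invented for the example.

```python
# Candidate root causes with illustrative plausibility scores.
HYPOTHESES = [
    ("link-failure", 0.7),
    ("misconfiguration", 0.5),
    ("cpu-overload", 0.2),
]
# Ontological exclusion: pairs (cause, symptom) that are known impossible.
DOES_NOT_CAUSE = {("cpu-overload", "packet-loss")}

def rank(symptom):
    """Apply the critic (exclusion rules), then rank by plausibility."""
    surviving = [(c, p) for c, p in HYPOTHESES
                 if (c, symptom) not in DOES_NOT_CAUSE]
    return sorted(surviving, key=lambda cp: cp[1], reverse=True)

print(rank("packet-loss"))
# [('link-failure', 0.7), ('misconfiguration', 0.5)]
```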

5 Applying FOCALE to Wired/Wireless Network Management

Sensor data from the Managed Element is analyzed to determine if the current state of the Managed Element is equal to its desired state. If it is, the process repeats. If it isn’t, then the autonomic manager examines the sensor data. If the autonomic manager already understands the data, then it continues to execute the processes it was performing. Otherwise, the autonomic manager examines the models and ontologies to develop knowledge about the received data (the remaining steps are complex and beyond the scope of this paper; indeed, this first step can often be enough). This knowledge is then fed to a set of machine-based learning and reasoning algorithms that reason about the received data. For example, if the data represents a problem, then the algorithms try to determine its root cause; once a cause is found, actions are issued, translated into vendor-specific commands by the model-based translation functions, and applied to the appropriate Managed Elements. Note that these may include Managed Elements that were not the cause of the problem, or were not being monitored. The cycle then repeats itself, except that in general the monitoring points will have changed to ensure that the reconfiguration commands had their desired effect.


6 Using FOCALE to Implement Seamless Mobility

One of the many challenges of Seamless Mobility is providing seamless management, not only across wired and wireless domains, but also across multiple domains of various Radio Access Technologies (RATs). Each of these RATs has its own set of specifications, provides different functionality, and needs different governance and management mechanisms. For example, an SLA for a customer will have different specifications depending on the type of network the customer is currently using. A customer could expect better voice service quality on a GSM network than on a Wi-Fi network using VoIP; this knowledge is easily encoded into ontologies, while the detailed settings for network configuration are encoded in our data models. Furthermore, the SLAs in this example will be different for each network type and RAT; they could also be affected by the vendor-specific functionality of the devices being used.

Context-Driven Policy Management determines which Policies are used to govern which functionality of which type of network. Furthermore, other business-driven policies can also be used to control the overall experience in a Seamless Mobility environment. For example, if a user with a dual-mode phone enters an area where a Wi-Fi connection is available, the decision to switch a call from cellular to Wi-Fi VoIP should not be based only on the power availability of the network, but on a number of other factors that are context-specific. For example, suppose a customer has three profiles in a Seamless Mobility environment: (1) a Business Profile, in which their devices must use their employer’s network services, (2) a Home Profile, in which the customer wants to minimize cost (since these are services that the customer is paying for), and (3) an Entertainment Profile, which is currently designed to maximize a video-on-demand experience.
Simple business policies for this scenario could look as follows:

IF Business Profile is Enabled, THEN maximize Security
ELSE, IF Home Profile is Enabled, THEN minimize Cost
ELSE, maximize Quality of Service for Video on Demand application

In this scenario, the phone, being a device controlled by Seamless Mobility policies, will act differently depending on the context (i.e., the particular profile that is currently active). For example, if the Business Profile is active, then the phone will strive to maximize security, and hence hand over to the most secure of its candidate networks (or even alert the user that no such network service exists!). In contrast, if that same device is in Home Profile mode, then its prime directive is to minimize cost, and hence it will try to hand over any calls received to the cheaper Wi-Fi network to save money for the customer. However, this solution will not be blindly followed, since the Wi-Fi network may not be able to provide the SLA requirements of the customer. Hence, the autonomic manager may determine that the call should not be switched, in order to fulfill the SLA requirements of the customer (or, alternatively, alert the user as to the possible degradation in quality and ask whether that is acceptable). Note that in both of these cases, we strive to avoid making “black or white” decisions. In other words, just because a policy says “hand over to the cheapest network” doesn’t always mean that this should be done if it violates some other user metric. It is this ability to reason that makes autonomics so valuable for Seamless Mobility.
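The profile policies above, including the SLA guard that overrides “hand over to cheapest” when the target network cannot sustain the call, can be sketched as follows. The profiles, network attributes, and numbers are invented for this illustration; this is not the paper's actual decision logic.

```python
# Choose a network given the active profile, subject to an SLA guard:
# only networks that can sustain the call's required bandwidth are eligible.
def choose_network(profile, networks, required_bw):
    feasible = [n for n in networks if n["bandwidth"] >= required_bw]
    if not feasible:
        return None  # no network satisfies the SLA: alert the user
    if profile == "business":
        return max(feasible, key=lambda n: n["security"])["name"]
    if profile == "home":
        return min(feasible, key=lambda n: n["cost"])["name"]
    # entertainment profile: maximize quality (bandwidth as a proxy)
    return max(feasible, key=lambda n: n["bandwidth"])["name"]

nets = [{"name": "cellular", "cost": 5, "security": 3, "bandwidth": 2.0},
        {"name": "wifi", "cost": 1, "security": 1, "bandwidth": 0.5}]
# Home profile prefers cheap Wi-Fi, but the SLA guard forces cellular here:
print(choose_network("home", nets, required_bw=1.0))  # 'cellular'
```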



(Similarly, the ability of the phone to learn user preferences in such situations and act more intelligently on behalf of the user is also very valuable, but is beyond the scope of this paper).

7 Conclusions

This paper has described the application of the novel FOCALE autonomic networking architecture to realize Seamless Mobility solutions. The rationale behind FOCALE was explained by examining current problems in network management, the challenges of next generation networks and applications such as Seamless Mobility, and how autonomic principles can be used to meet these challenges. Against this background, autonomic networking was differentiated from autonomic computing. FOCALE builds on these differences and introduces six novel enhancements to current autonomic architectures: the use of multiple control loops; the use of models with ontologies to develop state machines for orchestrating behavior; the normalization and correlation of diverse data; the ability of control loop components to change their operation according to context; the use of reasoning mechanisms for hypothesis generation; and the incorporation of learning mechanisms to update the knowledge base.

Behavioral orchestration is achieved by deducing the current state of a Managed Resource through analyzing and reasoning about sensed management data. This is done using a novel set of control loops that are reconfigured to meet the changing context and/or data being examined. Machine learning is facilitated by our use of models and ontologies; machine reasoning is used for hypothesis generation and theory maintenance. Finally, we explained how FOCALE could be used to realize Seamless Mobility. Handovers are analyzed using the model-based translation functions; models and ontologies enable us to gather and reason about sensed data, and ensure that the correct relative commands for heterogeneous network devices are issued. Future work will expand the types of management data that are analyzed, as well as evaluate different types of machine learning and reasoning algorithms for different situations in Seamless Mobility applications.

References

1. Strassner, J.: Autonomic Networking – Theory and Practice. Tutorial for IM (2007)
2. Strassner, J.: Knowledge Management Issues for Autonomic Systems. In: TAKMA 2005 conference (2005)
3. Lee, J., Miller, L.: CDMA Systems Engineering Handbook. Artech House Publishers (1998) ISBN 0-89006-990-5
4. Rosenberg, A.N., Kemp, S.: CDMA Capacity and Quality Optimization. McGraw-Hill, New York (2003)
5. Ovesjö, F., Dahlman, E., Ojanperä, T., Toskala, A., Klein, A.: FRAMES Multiple Access Mode 2 – Wideband CDMA. In: PIMRC 1997 (1997)
6. Mitola, J.: Cognitive Radio Architecture: The Engineering Foundations of Radio XML. Wiley-Interscience, Chichester. ISBN 0471742449
7. http://www.motorola.com/content.jsp?globalObjectId=6611-9309
8. TMF Wireless Service Measurements Handbook, Approved version 3.0, GB923v30_040315.pdf (March 2004)


9. Kreher, R.: UMTS Performance Measurement: A Practical Guide to KPIs for the UTRAN Environment (October 2006) ISBN 0-470-03487-4
10. Pareto chart definition, http://en.wikipedia.org/wiki/Pareto_chart
11. Homer-Dixon, T.: The Ingenuity Gap: Facing the Economic, Environmental, and other Challenges of an Increasingly Complex and Unpredictable World. Vintage Books, New York (2002)
12. Strassner, J., Agoulmine, N., Lehtihet, E.: FOCALE – A Novel Autonomic Networking Architecture. In: LAACS 2006
13. IBM: An Architectural Blueprint for Autonomic Computing, v7 (June 2005)
14. Kephart, J.O., Chess, D.M.: The Vision of Autonomic Computing (January 2003), www.research.ibm.com/autonomic/research/papers/AC_Vision_Computer_Jan_2003.pdf
15. Strassner, J.: A New Paradigm for Network Management – Business Driven Device Management. In: SSGRR 2002 conference
16. Strassner, J., Raymer, D., Lehtihet, E., van der Meer, S.: End-to-end Model-Driven Policy Based Network Management. In: Policy 2006 Conference
17. Strassner, J., Raymer, D.: Implementing Next Generation Services Using Policy-Based Management and Autonomic Computing Principles. In: NOMS 2006
18. www.omg.org/mda
19. Strassner, J.: Seamless Mobility – A Compelling Blend of Ubiquitous Computing and Autonomic Computing. In: Dagstuhl Workshop on Autonomic Networking (January 2006)
20. Strassner, J.: Policy-Based Network Management. Morgan Kaufmann Publishers, Seattle (2003)
21. Wong, A., Ray, P., Parameswaran, N., Strassner, J.: Ontology mapping for the interoperability problem in network management. Journal on Selected Areas in Communications 23(10), 2058–2068 (2005)
22. Josephson, J., Josephson, S.: Abductive Inference: Computation, Philosophy, Technology, ch. 7. Cambridge University Press, Cambridge (1996)
23. Mitchell, T.: Machine Learning. McGraw-Hill, New York (1997)

Evaluation of Joint Admission Control and VoIP Codec Selection Policies in Generic Multirate Wireless Networks

B. Bellalta, C. Macian, A. Sfairopoulou, and C. Cano

Network Technologies and Strategies (NeTS) Research Group
Departament de Tecnologies de la Informació i Comunicació
Universitat Pompeu Fabra, Passeig de Circumval·lació, 8, 08003 Barcelona, Spain
{boris.bellalta, carlos.macian, anna.sfairopoulou, cristina.cano}@upf.edu

Abstract. Multirate wireless networks in general share a common problem for the transmission of VoIP traffic, since the rate changes of some of the flows affect the transmission of all others, which causes an unacceptable voice quality degradation. In this work, an admission control algorithm is combined with a codec selection block to mitigate this negative effect. This centralized admission control is able to block or drop calls in order to maintain the system stability despite rate changes. Moreover, the integrated codec selection block is able to select the most adequate VoIP codec for each incoming or already active call based on the channel rate used and according to a number of optimization policies. Several such policies are designed and evaluated in this paper. Results show that this combined adaptive solution provides a beneficial trade-off among the performances of the different codecs in terms of MOS, blocking and dropping probability and cell resource usage. Furthermore, a number of interesting singularities deriving from the multirate nature of the network are identified. Keywords: VoIP codecs, Mobile Networks, Multiple channel transmission rates, WLANs.

1 Introduction
All kinds of wireless channels suffer from high error rates due to variable fading, noise and interference, which make correct data transmission difficult (for example, between a mobile node (MN) and an access point (AP)). Moreover, the channel profile also changes during the duration of a communication session due to the MN mobility pattern, with the distance d between the two communicating nodes being one of its key parameters: the farther apart the nodes, the lower the quality. In order to allow for larger coverage areas (larger values of d) with adequate communication conditions, the system increases the data protection by strengthening the channel coding (longer redundancy patterns) and using more robust modulations. Hence, longer messages have to be transmitted for the same amount of user data, while using a less efficient modulation, which results in

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 342–355, 2007. © Springer-Verlag Berlin Heidelberg 2007


longer delays to transmit the same amount of information. As a consequence, the available bit rate for user data is greatly reduced. Such systems are referred to as multirate systems: systems which dynamically choose the transmission rate (code and modulation) to guarantee a lower error rate in the transmissions [1]. WLANs are but one example of such systems, using a multirate scheme in order to achieve large coverage areas (a radius of about 100 meters). MNs near the AP see good channel conditions and therefore can use high transmission rates (low protection codes and efficient modulations) to obtain a high throughput. However, MNs far from the AP observe bad channel profiles and require lower transmission rates. For example, IEEE 802.11b [2] defines four transmission rates: 11, 5.5, 2 and 1 Mbps, which are obtained from the use of a punctured convolutional code and four different modulations [2]. As shown in [3], and as a consequence of the CSMA/CA random access protocol, 802.11b MNs using low transmission rates degrade the performance of MNs using higher transmission rates, as the latter have to wait for the long transmission delays of the slow ones.

For best-effort (elastic) data transmission, the use of multiple rates does not have a critical impact on user or network performance: the effect is to lower the overall network performance, which translates simply into larger delays for transmitting or receiving the desired data (for example, downloading a web page). For inelastic and/or real-time traffic such as VoIP, however, the variable channel data rate can be very problematic. To understand this, consider that n VoIP calls are active and that the total bandwidth requirement of the n calls at time t, B(n, t), is lower than the channel capacity: B(n, t) < C(t). At t + 1, a single MN suffers a reduction in its transmission rate, which results in a new capacity C'(t + 1) < C(t) such that B(n, t + 1) > C'(t + 1).
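As a rough numeric illustration of this capacity shortfall, the sketch below checks the aggregate demand B against the capacity C' and downgrades calls to a lower-rate codec until they fit. The codec bandwidths (kbps, meant to include overhead) are illustrative values for this example, not measurements from the paper.

```python
# Illustrative per-call bandwidths (kbps) for a high- and a low-rate codec.
CODECS = {"G.711": 80.0, "G.729": 24.0}

def total_demand(calls):
    """Aggregate bandwidth B for a list of calls (codec names)."""
    return sum(CODECS[c] for c in calls)

def downgrade_until_fit(calls, capacity):
    """Switch high-rate calls to the low-rate codec until B <= capacity."""
    calls = list(calls)
    for i, c in enumerate(calls):
        if total_demand(calls) <= capacity:
            break
        if c == "G.711":
            calls[i] = "G.729"  # switch this call to the low-rate codec
    return calls, total_demand(calls) <= capacity

calls = ["G.711"] * 4  # aggregate demand: 320 kbps
print(downgrade_until_fit(calls, capacity=200.0))
# (['G.729', 'G.729', 'G.729', 'G.711'], True)
```

Here a rate drop that shrinks capacity to 200 kbps is absorbed by downgrading three of the four calls, instead of dropping a call outright.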
This situation degrades the quality of all n VoIP calls, as they are not able to achieve the necessary (constant) throughput for the codec in use, while packet losses and delay increase. Possible solutions are to drop one or more calls (actually, dropping the call which has changed to a lower transmission rate would be enough) or to reduce B(n, t + 1) until at least B(n, t + 1) = C(t + 1). One way to do this is to use a lower bandwidth VoIP codec, which can be obtained by switching to another codec [4,5,6], using AMR (Adaptive Multi-Rate) codecs [7,8], changing the codec parameters (different packetization intervals), or by choosing the combination of codec and packetization interval that is most appropriate [9]. In most of the works mentioned above, the authors use threshold metrics to invoke the adaptation procedure when the quality decrease becomes unacceptable. The most commonly used metrics are the packet loss ratio and the end-to-end delay, since they are the critical parameters in a VoIP transmission. Of these works, only [6] and [9] address the specific problems of a multirate environment, while the others use the adaptive mechanism to deal mostly with bad channel conditions. Also, most of the works mentioned above focus only on one part of the problem, either addressing the new calls arriving to the network with an admission control mechanism or the ongoing calls with the codec adaptation procedure. Only [5] and [9] provide a combined mechanism in which a codec adaptation module works together with admission control. Still,


B. Bellalta et al.

both solutions are somewhat limited: the former proposes a change of codec for all calls when congestion is detected, while the latter proposes changing the codec only of the node that suffers a rate change, which depending on the case might not be enough to correct the congestion. As for the works based on AMR codecs, their working principle could be useful for our purposes if the AMR codecs were able to react to network congestion rather than to bad channel conditions (bit errors or measured SNR values), since in our scenario a MN observing very good channel conditions will still suffer from the extra bandwidth required by other MNs with low transmission rates. In any case, a decision block is required which selects the most suitable VoIP codec rate. In this paper, a combined call admission control (CAC) scheme and dynamic VoIP codec rate selection module is presented to alleviate the effects of the multirate issue in a generic wireless network. The CAC is not only able to block (incoming) or drop (active) VoIP calls, but also includes a policy to select the most adequate VoIP codec rate in order to avoid having to block or drop calls, by adapting them to the current cell situation. Different adaptation policies are proposed and evaluated, which try to optimize the network usage according to different criteria: call quality, number of simultaneous active calls, fairness, etc. These are simple yet representative policies, which provide a basic insight into the intrinsic trade-offs found in the most common adaptation policies when applied to multirate environments. Only general and simple assumptions about the channel conditions are made to evaluate the policies, in order to make the results qualitatively valid for a broad range of wireless technologies.
Results show the relation between the different metrics of interest for designing a successful VoIP service: offered load, blocking / dropping probabilities, speech quality and simultaneous number of calls.

2 A Multirate Scenario

Throughout the paper, a single wireless channel (cell) of capacity C is ideally shared (statistically multiplexed) among n Mobile Nodes (MNs) and a single Access Point (AP). An example of the considered scenario is depicted in Figure 1, where MNs use different VoIP codecs at different transmission rates. Each node tries to use the bandwidth it requires, without any fixed channel division into time slots, frequency sub-bands or codes, which allows a node to access the complete channel capacity. No assumptions are made about the MAC protocol employed, except that all MNs share a common wireless channel with equal access probability. Then, if the bandwidth required by all active nodes is higher than the channel capacity, all of them will see their communications degraded, as they will all suffer from packet losses and/or higher packet delays (overload of the transmission MAC queues). Due to the channel conditions, a node i (including the AP) will transmit to the channel using the transmission rate Ri, occupying a relative bandwidth Bi* = αi·Bi, where αi = C/Ri is the bandwidth excess due to using lower


Fig. 1. Sketch of the considered scenario: several MNs and an AP share the cell, each MN using a different VoIP codec (G.711 at 64 Kbps, G.726 at 32 Kbps, G.728 at 16 Kbps, G.729 at 8 Kbps) over a different transmission rate (11, 5.5, 2 or 1 Mbps); the AP carries the aggregate of all flows (208 Kbps in the example) over the full rate set

transmission rates than the channel capacity C, and Bi is the bandwidth required by node i. The set of channel transmission rates is R = [R1, R2, ..., RN] = [C, C/2, C/5.5, C/11], where the channel capacity is C = 1 Mbps.

2.1 VoIP Codec-Rate Adaptation

Each VoIP call comprises one downlink flow and one uplink flow. Since all flows go through the AP, the AP requires the same amount of bandwidth as all MNs together. Thus, VoIP call i consumes a channel capacity equal to 2·Bi(va), where Bi(va) is the one-way throughput of a flow using the specified VoIP codec-rate (va)¹ for MNi. It is also assumed that the same transmission rate Ri is used for the uplink and downlink directions (see Figure 2), and the relative capacity required when transmitting at a rate Ri is Bi*(va, Ri) = αi·Bi(va). Figure 2 shows how MNs using low transmission rates require a relative capacity Bi*(va) higher than that of MNs using higher transmission rates (see call 2, which uses the same VoIP codec as call 1 while R1 > R2). To mitigate this effect, MNs which use low transmission rates could use low rate VoIP codecs in order to keep their Bi*(va, Ri) low.

1. For the sake of generality, and since it does not affect the validity of our results, only the nominal codec rate is used, obviating all protocol-dependent overheads.
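The relative-capacity relation above can be checked numerically. The following sketch uses the paper's rate set; the variable names are ours.

```python
# Numerical check of the relative-bandwidth relation B*_i = alpha_i * B_i
# with alpha_i = C / R_i, for the paper's rate set R = [C, C/2, C/5.5, C/11].
# Variable names are illustrative, not the paper's notation.

C = 1_000_000.0                       # channel capacity: 1 Mbps, in bit/s
RATES = [C, C / 2, C / 5.5, C / 11]   # per-node transmission rates R_i

def relative_bandwidth(b_i, r_i, capacity=C):
    """Channel capacity actually consumed by a flow of nominal rate b_i."""
    alpha = capacity / r_i            # bandwidth excess of a slower node
    return alpha * b_i

# The excess factors for the four rates are [1, 2, 5.5, 11]; a 64 kbps G.711
# flow at the slowest rate therefore consumes 11 times its nominal bandwidth.
alphas = [relative_bandwidth(1.0, r) for r in RATES]
assert all(abs(a - e) < 1e-9 for a, e in zip(alphas, [1.0, 2.0, 5.5, 11.0]))
assert abs(relative_bandwidth(64_000.0, RATES[3]) - 704_000.0) < 1e-3
```

The last assertion makes the multirate problem concrete: a single slow G.711 call occupies 704 kbps of the 1 Mbps cell on its own, per direction.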


Fig. 2. Example with three calls sharing the channel: calls 1 and 2 use a 64 Kbps codec at transmission rates R1 and R2 respectively, call 3 uses an 8 Kbps codec at R2; the uplink and downlink portions of each call are shown against the channel capacity C

It is assumed that the considered VoIP codec-rates provide different voice qualities, this quality being proportional to the required bandwidth². Therefore, considering the set of N VoIP codecs, an integer value ranging from 1 (lowest bandwidth codec) to N (highest bandwidth codec) is assigned to each VoIP codec as a measure of its voice quality. The set of VoIP codec rates is V = [v1, v2, ..., vN]. Four VoIP codec rates have been considered, with V = [64, 32, 16, 8] Kbps and corresponding quality values Q = [4, 3, 2, 1].

2.2 Admission Control

The cell state is governed by the admission control entity. It decides whether to accept or reject new calls based on the current system state and the information carried in the admission request transmitted by the MN or arriving from the fixed network. Moreover, it is able to drop active calls when the system state becomes unstable due to the behavior of the already accepted calls, for example due to a transmission rate change³. A VoIP codec selection module is co-located with the admission control. It suggests the VoIP codec to be used following a given policy. For example, a valid policy would be "always use the G.711 VoIP codec, independently of the considered rate, for it gives the best voice quality". The set of policies used in this work is introduced in the next section. A block scheme of the considered admission control is shown in Figure 3. For evaluation purposes, it is considered that VoIP calls arrive to the system from an assumed infinite population, following a Poisson process with arrival rate λ (calls/second). The call duration has a mean of 1/μ seconds and

2. It is theoretically possible to find a codec with higher bandwidth requirements and a lower achieved MOS than a contender. Indeed, such codecs have been proposed in practice. However, once a better performing option is found, the older codec is abandoned. Hence, in our argumentation, we assume an ordered list of codec-rates along a bandwidth-MOS axis.
3. In general, for any change to a faster rate than the one in use, the resulting system state is always feasible. Therefore, dropping calls only occurs when a MN using a faster rate changes to a slower one.


Fig. 3. Call Admission Control scheme with VoIP codec selection: new call requests arrive at rate λ and rate changes occur at rate γ(n); the codec selection policy consults the cell state information (bandwidth used, number of active calls n, codec and rate used by each call); calls complete at rate μ(n) − d(n) and dropped calls depart at rate d(n)

it follows an exponential distribution. A new call arrives to the system using a transmission rate Ri picked from the set of existing rates R; the rate is chosen uniformly at random, so all rates are equally likely. Once admitted, every call suffers rate changes after exponentially distributed periods with mean 1/γ seconds (the overall cell rate-change rate is nγ). Notice that calls depart the system either upon completion, at rate μ(n), or because they are dropped, which occurs at rate d(n). How the codec negotiation and management takes place is outside the scope of this paper, for it is protocol- and technology-specific. However, as an example, the SIP protocol [10] provides the possibility of updating the codec used during an active connection by means of an additional INVITE message. Obviously, in-call signalling to adaptively select the proper codec (from the set of codecs available to the two end points) without extra signalling overhead would be the desirable solution.
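The traffic model of this section (Poisson arrivals at rate λ, exponential holding times with mean 1/μ, and an admission check) can be sketched as a toy event-driven loop. This is a simplified illustration under our own assumptions: rate changes and codec adaptation are omitted, and admission is reduced to a fixed-bandwidth capacity check.

```python
import random

# Toy event-driven sketch of the arrival model in Sect. 2.2: Poisson call
# arrivals (rate lam), exponential holding times (mean 1/mu), and a plain
# capacity check standing in for the admission-control decision.
# Illustrative only; rate changes (gamma) and codec adaptation are omitted.

def simulate(lam, mu, capacity_kbps, codec_kbps, horizon=10_000.0, seed=1):
    rng = random.Random(seed)
    active, arrivals, blocked = [], 0, 0
    t = rng.expovariate(lam)               # time of the first arrival
    while t < horizon:
        active = [end for end in active if end > t]   # completed calls leave
        arrivals += 1
        if (len(active) + 1) * codec_kbps > capacity_kbps:
            blocked += 1                              # reject: no room left
        else:
            active.append(t + rng.expovariate(mu))    # admit: draw duration
        t += rng.expovariate(lam)                     # next Poisson arrival
    return blocked / arrivals

# Offered load A = lam/mu = 15 Erlangs against room for 15 fixed-codec calls:
p_block = simulate(lam=15 / 120, mu=1 / 120, capacity_kbps=15 * 64,
                   codec_kbps=64)
assert 0.0 < p_block < 1.0
```

With fixed codecs this collapses to an Erlang-B-like loss system; the paper's policies differ precisely in how they relax the fixed per-call bandwidth in the admission check.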

3 VoIP Codec-Rate Selection Policies

To mitigate the multirate problem, several codec selection policies can be implemented, depending on the criterion to be optimized. A number of trade-offs exist among criteria such as call quality, maximum number of simultaneous active calls, resource usage efficiency, dropping and blocking probabilities, signalling overhead, complexity, and the time needed to regain system stability after a rate change. In this section, a set of simple policies is proposed and later evaluated. The chosen policies are valuable because they provide upper and lower bounds on the call quality versus call quantity trade-off, arguably one of the most critical aspects when tuning any


admission control scheme for VoIP. Moreover, the policies have been chosen to exemplify the most common approaches for codec adaptation found in the literature, albeit in a very simple form. As such, they are both representative of more complex schemes and intuitively easy to grasp. Furthermore, they isolate the most common adaptation mechanisms, so as to identify their individual effects, which in more sophisticated policies are usually found combined in refined but complex ways. The set of considered policies is the following:

P1: Always Best Quality (ABQ). The highest bandwidth VoIP codec-rate is used for all calls, without consideration of the transmission rate. This policy results in high VoIP speech quality but suffers from higher blocking and dropping probabilities. As such, it provides the upper bound on call quality, and it is expected to show the worst blocking and dropping probabilities.

P2: Always Maximum Calls (AMC). The lowest bandwidth VoIP codec-rate is used for all calls, without consideration of the transmission rate. Being the opposite of the previous policy, AMC provides the lower bound on quality, but an upper bound on the number of simultaneous active calls, and it is expected to provide the lower bound on blocking and dropping probabilities.

P3: Constant Relative Capacity Consumption (CRCC). A specific VoIP codec-rate is assigned to each transmission rate, so as to equalize the capacity consumption across transmission rates, i.e. Bi*(va, Ra) ≈ Bi*(vb, Rb) for all a, b in the set of codec-rate pairs. Therefore, a MN using transmission rate Ra will use the VoIP codec va. Notice that this is a "fair" policy, as all MNs obtain a voice quality proportional to their transmission rate. The typical operation mode of AMR codecs matches this policy.

P4: Capacity Before Quality for the target call (CBQ-1).
Based on P3, a MN perceiving a change to a transmission rate Ra is allowed to choose a VoIP codec equal to or lower than the codec va prescribed by P3. This policy allows each VoIP call to proceed with a worse voice quality while reducing its blocking and dropping probabilities, as it is able to check whether lower codec-rates would be better suited to the network conditions. This allows, for example, an incoming call to be accepted in an otherwise saturated network, by accepting an "unfairly" low rate codec for its transmission rate. It is expected to provide a higher number of simultaneous calls than P3, but with lower speech quality.

P5: Capacity Before Quality with Iterative Capacity Reduction (CBQ-n-IC↓). Based on P4, this policy also contemplates reducing the VoIP codec-rate of other active calls⁴. Different criteria can be used to select which calls have to suffer a VoIP codec degradation despite experiencing higher transmission rates.

4. Again, it is not considered how this policy would be implemented in practice. However, intuitively, a centralized cell optimization instance, which could be placed jointly with the AP and/or a SIP proxy, could be a good candidate.
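The P3 pairing and the P4 relaxation above can be sketched as follows. The rate and codec sets come from the paper; the index-wise pairing is our reading of "constant relative capacity", since α·B(v) then stays within a narrow 64-88 kbps band for every pair.

```python
# Sketch of the P3 (CRCC) codec/rate pairing and the P4 (CBQ-1) relaxation.
# The sets follow the paper; the index-wise pairing is an assumption of ours.

C = 1.0                               # channel capacity (Mbps)
RATES = [C, C / 2, C / 5.5, C / 11]   # transmission rates R_1..R_4
CODECS = [64, 32, 16, 8]              # codec bandwidths V (kbps), best first

def crcc_codec(rate_index):
    """P3: the VoIP codec paired with transmission rate R_a."""
    return CODECS[rate_index]

def cbq1_codecs(rate_index):
    """P4: any codec equal to or lower than the P3 choice is allowed."""
    return CODECS[rate_index:]

# Relative load alpha_a * B(v_a) is roughly constant across the four pairs:
loads = [(C / r) * crcc_codec(i) for i, r in enumerate(RATES)]
assert all(abs(l - e) < 1e-9 for l, e in zip(loads, [64, 64, 88, 88]))
assert cbq1_codecs(1) == [32, 16, 8]   # an R_2 node may also pick 16 or 8
```

P1 and P2 would simply return `CODECS[0]` and `CODECS[-1]` regardless of the rate index, which is what makes them the reference bounds.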


The algorithm followed by this policy is:
1. Select the lowest codec-rate for the target VoIP call.
2. Compute the bandwidth required to accept (or, for already active calls, not to drop) the target VoIP call i; this required bandwidth is labelled RBi.
3. Compute the amount of bandwidth ABi that can be re-allocated from other calls and assigned to call i. If ABi ≥ RBi, a feasible solution exists and the process is initiated. Otherwise, block (drop) the target call i.
4. Starting with the active calls with the lowest transmission rate, decrease each call to a lower codec-rate until RBi is achieved. If all calls have suffered a codec-rate reduction and RBi has not been achieved, re-start the process⁵.

The rationale behind starting with low rate calls is that a codec-rate change from 16 to 8 Kbps in an active VoIP call with an excess bandwidth factor α = 11 reduces the consumed relative capacity from 176 Kbps to 88 Kbps, a capacity saving of 88 Kbps, while the same change on a call with α = 1 reduces the consumed capacity by only 8 Kbps. Hence, it is more effective to begin with lower rate calls.

P6: Capacity Before Quality with Iterative Capacity Reuse (CBQ-n-IC↑). This is an extension of P5. Each time a call arrives, ends or is dropped, the available/released capacity is assigned (if possible) to other VoIP calls in order to increase their VoIP codec-rate and hence the quality of the communication. No unused capacity is kept as long as better quality for active calls can be achieved, but preference is given to a higher number of simultaneous calls, as in P5.
1. Compute the free capacity FBi left by a leaving or dropped VoIP call i, or made available when a new VoIP call arrives to the system.
2. Starting with the active call with the lowest transmission rate, increase each call to a higher codec until FBi is exhausted.
If FBi is not exhausted after upgrading all calls by one codec, re-start the process⁶. The rationale for starting with the lower codec-rates is to mitigate the impact of the initial P5 policy, as it allows increasing the quality of the VoIP calls that have probably suffered most from previously lowering their VoIP codec. However, it would be more efficient to start with the high transmission rate calls, as incrementing a single codec then implies a lower relative capacity consumption; this would provide the most efficient resource usage in all cases. The performance of each policy is also linked to the available sets of codec-rates and transmission rates: a higher density of codec-rates (i.e., smaller rate changes between two codecs) would allow for a finer tuning of the network resources.

5. If a call is decreased by more than one codec in the same step, it is considered to be a single change.
6. If a call is increased by more than one codec in the same step, it is considered to be a single change.
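The P5 reduction step can be sketched as a greedy pass over the active calls. This is a minimal illustration under our own assumptions; call records and numbers are made up, and only a single pass is shown (the paper re-starts the pass if it does not free enough capacity).

```python
# Greedy sketch of the P5 reduction step described above: walk the active
# calls starting from the slowest transmission rates (largest alpha) and
# step each one down a single codec until the bandwidth RB_i needed by the
# target call has been freed. Illustrative only.

CODECS = [64, 32, 16, 8]   # codec bandwidths (kbps), best to worst

def saving_of_step(call):
    """Relative capacity (kbps) freed by moving `call` one codec down."""
    idx = CODECS.index(call["codec"])
    if idx == len(CODECS) - 1:
        return 0            # already at the lowest codec, nothing to free
    return call["alpha"] * (CODECS[idx] - CODECS[idx + 1])

def reduce_until(calls, required_kbps):
    """One pass of step 4; the paper re-starts the pass if it falls short."""
    freed = 0
    for call in sorted(calls, key=lambda c: -c["alpha"]):  # slow rates first
        if freed >= required_kbps:
            break
        step = saving_of_step(call)
        if step > 0:
            call["codec"] = CODECS[CODECS.index(call["codec"]) + 1]
            freed += step
    return freed >= required_kbps

# The rationale's arithmetic: a 16 kbps call at alpha = 11 frees
# 11 * (16 - 8) = 88 kbps in one step, while at alpha = 1 it frees only 8.
assert saving_of_step({"alpha": 11, "codec": 16}) == 88
assert saving_of_step({"alpha": 1, "codec": 16}) == 8
calls = [{"alpha": 11, "codec": 16}, {"alpha": 1, "codec": 64}]
assert reduce_until(calls, 88) and calls[0]["codec"] == 8
```

The same skeleton, with the sort order kept but the codec index decremented instead of incremented, describes the P6 reuse step.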


4 Performance Results

The joint call admission control and VoIP codec-rate selection mechanism is evaluated using a simulation model of the described system. The simulator is built upon the Component Oriented Simulation Toolkit (COST) simulation libraries [11], allowing for a modular implementation of the proposed scheme (see Figure 3). The considered parameters are shown in Table 1.

Table 1. Simulation Parameters

                                  Scenario 1              Scenario 2
Channel Capacity (C), Mbps        1                       1
Available Channel Rates (Mbps)    [C, C/2, C/5.5, C/11]   [C, C/2, C/5.5, C/11]
VoIP Codec Rates (Kbps)           [64, 32, 16, 8]         [64, 32, 16, 8]
1/μ (seconds/call)                120                     120
λ (calls/second)                  A·μ                     A·μ
Traffic Load A (Erlangs)          variable                15
γ (rate changes/second)           1/60                    variable

In Figure 4 the blocking (a,c) and dropping (b,d) probabilities are shown. Figure 5 shows the average number of simultaneous active calls (a,c) and the average bandwidth used (b,d). Finally, Figure 6 shows the voice quality index (a,c) and the average number of codec-rate changes per call (b,d), for the different policies. P1 and P2 are the reference policies, as they always select a fixed codec, independently of the system state and the transmission rate used by the requesting call. P1 uses the highest bandwidth codec, which results in the lowest average number of active calls. This low value is due to the high blocking (dropping) probabilities that new (already active) VoIP calls suffer, which, as expected, increase with the traffic load. Transmission rate changes cause a very high dropping probability for active calls (higher than with the other policies), as most changes to a lower transmission rate imply dropping the call (for example, note that a 128 Kbps call using R1 which changes to R4 is always dropped, as it would require more bandwidth than the channel capacity). Nevertheless, all calls that depart from the system satisfactorily have perceived the best voice quality. The opposite case is P2, which uses the lowest bandwidth codec-rate. P2 thus allows the highest average number of simultaneous calls among all policies, due to its lower blocking probability. However, the high acceptance probability makes the system more vulnerable to call drops; hence, P2 only shows the lowest dropping values at very low load conditions. Obviously, it provides the lowest VoIP voice quality. Neither P1 nor P2 implies any codec change during the call. P3 is the basic transmission rate / codec-rate policy. It selects a codec based on the rate used by a MN/AP; therefore, it is the fairest policy, as each VoIP call consumes resources proportionally to its transmission rate. However, a performance lower than intuitively expected is observed, showing a low number

Fig. 4. VoIP call Blocking and Dropping Probabilities; Scenario 1 (a-b), Scenario 2 (c-d). The blocking and dropping probabilities of policies P1-P6 are plotted against the traffic load (Erlangs) in Scenario 1 and against 1/γ in Scenario 2

of simultaneous active calls (only higher than P1's). This is due to its high blocking probability, which at high offered loads is even higher than P1's, because of the low dropping probability at those points. Thus, P3 ensures that an accepted call has a higher probability of finishing satisfactorily, as it presents a lower dropping probability, especially at high offered VoIP traffic loads. This is due to the fact that the relative capacity used by P3 for an accepted call, Bi*(va, Ra), remains more or less constant in spite of rate changes. An interesting observation is that the voice quality index of P3 increases with the traffic load. The reason is quite simple: the calls dropped are those which change from a high to a low transmission rate, i.e. precisely the calls whose voice quality would have degraded due to the codec change. P4 is based on P3 but adapts better to the system state, reducing its own voice codec-rate below what P3 allows if necessary. This results in a lower blocking probability and a higher number of active calls. However, since the number of accepted calls is higher, the dropping probability is also slightly higher than with P3. In terms of voice quality, P4 shows a degradation compared to P3, due to its use of lower VoIP codec-rates. In any case, the performance of P3 is closer to P4 than to P5 and P6, meaning that adapting only a single VoIP call cannot improve the overall system performance significantly.

Fig. 5. Number of active calls and bandwidth used; Scenario 1 (a-b), Scenario 2 (c-d). The average number of active calls and the average bandwidth used (Mbps) by policies P1-P6 are plotted against the traffic load (Erlangs) in Scenario 1 and against 1/γ in Scenario 2

With 1/γ = 60 seconds, non-dropped calls suffer on average 4 transmission rate changes during their lifetime⁷. P4 shows a lower number of codec changes, especially at high traffic loads. The reason is quite simple: before the call is started (at the admission control phase), a lower codec-rate than in the P3 case is selected; hence, the call ends up using the same low codec for its entire duration. P5 is based on P4 but is also able to reduce the VoIP codec-rates of the other active calls, starting with those which use the lowest transmission rates. For example, a new call request using R1 could be accepted if and only if it uses the lowest bandwidth codec v4 and another call commits to reducing its codec to a lower one. As expected, P5 shows the closest performance to P2, while providing a better speech quality under all load conditions. Finally, P6 tries to improve the low voice quality achieved by P5 by sharing among the remaining calls the (free) bandwidth released when a call leaves the system or is dropped. Moreover, when a new call arrives to the system, the highest possible codec is allocated to it, independently of its transmission rate. At very low offered traffic, P6 even provides a better speech quality

7. Notice that as the dropping probability increases, the average number of transmission rate changes per call falls for both P3 and P4, since some of the calls are evicted before exhausting their lifetime.

Fig. 6. Voice quality indicator and number of codec changes; Scenario 1 (a-b), Scenario 2 (c-d). The voice quality index (policies P1-P6) and the average number of codec changes per call (policies P3-P6) are plotted against the traffic load (Erlangs) in Scenario 1 and against 1/γ in Scenario 2

than P3, and it remains higher than P5 for any offered load. However, it is still considerably worse than all other policies except P2. As shown, P6 provides the best bandwidth usage, as it is able to allocate more than 95% of the bandwidth to active calls. P4, P5 and P6 require frequent codec-rate changes, which could prove impractical in real systems, especially due to the signalling overhead involved in re-negotiating the codecs for every call: an intense and repeated control packet interchange (with its associated additional delay and jitter), as well as an additional processing burden for the involved parties. A possible solution would be to carry the selection of the new codecs in-band with the data flow; RTP, for example, includes a field in its header indicating the transported codec format [12]. Assuming a solution along these or similar lines, P6 provides a very good trade-off for all metrics across all load conditions. Notice that different values of γ scale all policies proportionally and do not change their relative order; therefore, the selected policy will perform as expected with respect to the others for any value of γ. A counterintuitive result is that, as the transmission rate change frequency decreases, policies P4, P5 and P6 reduce their voice


quality index. This is due to the fact that, when rate changes are infrequent, the VoIP codec-rate reductions are caused by the bandwidth allocated to new requests more than by the need to avoid dropping an already active call.

4.1 Is There a Right Policy?

No single policy can be considered optimal for all cases simultaneously. Therefore, the policy selection has to be based on balancing several metrics depending on the scenario. From the set of policies, two seem particularly attractive. The first is P4, as it achieves the best trade-off among all considered variables (number of calls, blocking and dropping probability, speech quality) and requires the lowest number of VoIP codec changes. Due to the robustness of this policy and its simplicity (very low signalling traffic), it could be applied to scenarios which require a fast response, such as a scenario with emergency calls. The second is P6, which always provides the best channel utilization and the maximum number of simultaneous calls (the same as P5, but with better speech quality) and, under low load conditions, the best speech quality among all adaptive policies. However, a much heavier signalling load and processing burden is necessary. Therefore, it could be used in lightly loaded scenarios, switching to P5 or P4 when the call arrival rate increases.

5 Conclusions

A proper selection of the VoIP codecs allows the system to react to the random variations in capacity due to transmission rate changes, which are caused by the wireless channel and/or user mobility. In this paper, a simple set of decision policies is presented, based on the ratio between the VoIP codec bandwidth requirement and the channel rate. Numerical results show that adaptive schemes provide a trade-off among the different expected characteristics of the set of codecs, such as the speech quality index, the average number of active calls, and the achieved blocking and dropping probabilities. Merging all these metrics into a single indicator is a difficult task, since it is difficult to quantify the impact of increasing or decreasing one of these variables on the overall user/operator perception. This will be considered in further research.

References

1. Lacage, M., Manshaei, M.H., Turletti, T.: IEEE 802.11 Rate Adaptation: A Practical Approach. In: ACM International Symposium on Modeling, Analysis, and Simulation of Wireless and Mobile Systems (MSWiM), Venice, Italy. ACM Press, New York (2004)
2. IEEE Std 802.11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications. ANSI/IEEE Std 802.11 (1999 edn.) (Revised 2003)
3. Heusse, M., Rousseau, F., Berger-Sabbatel, G., Duda, A.: Performance Anomaly of 802.11b. In: IEEE INFOCOM 2003, San Francisco, USA. IEEE Computer Society Press, Los Alamitos (2003)


4. Leng Ng, S., Hoh, S., Singh, D.: Effectiveness of Adaptive Codec Switching VoIP Application over Heterogeneous Networks. In: 2nd Int. Mobility Conference, Guangzhou, China (2005)
5. Toshihiko, T., Tadashi, I.: Wireless LAN Resource Management Mechanism Guaranteeing Minimum Available Bandwidth for Real-time Communication. In: IEEE WCNC 2005, New Orleans, USA. IEEE Computer Society Press, Los Alamitos (2005)
6. Sfairopoulou, A., Macian, C., Bellalta, B.: QoS adaptation in SIP-based VoIP calls in multi-rate 802.11 environments. In: ISWCS 2006, Valencia (2006)
7. Servetti, A., De Martin, J.C.: Adaptive interactive speech transmission over 802.11 wireless LANs. In: Proc. IEEE Int. Workshop on DSP in Mobile and Vehicular Systems, Nagoya, Japan, April 2003. IEEE Computer Society Press, Los Alamitos (2003)
8. Matta, J., Pépin, C., Lashkari, K., Jain, R.: A Source and Channel Rate Adaptation Algorithm for AMR in VoIP Using the E-model. In: 13th NOSSDAV, Monterey, CA, USA (2003)
9. McGovern, P., Murphy, S., Murphy, L.: Addressing the Link Adaptation Problem for VoWLAN using Codec Adaptation. In: IEEE Globecom 2006 - Wireless Communications and Networking, San Francisco, CA. IEEE Computer Society Press, Los Alamitos (2006)
10. Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., Schooler, E.: RFC 3261: SIP: Session Initiation Protocol. Internet RFCs (2002)
11. Chen, G.: Component Oriented Simulation Toolkit (2004), http://www.cs.rpi.edu/~cheng3/
12. Schulzrinne, H., Casner, S., Frederick, R., Jacobson, V.: RFC 1889: RTP: A Transport Protocol for Real-Time Applications. Internet RFCs (1996)

A Novel Inter-LMD Handoff Mechanism for Network-Based Localized Mobility Management

Joong-Hee Lee*, Jong-Hyouk Lee, and Tai-Myoung Chung

Internet Management Technology Laboratory, Electrical and Computer Engineering, Sungkyunkwan University, 300 Cheoncheon-dong, Jangan-gu, Suwon-si, Gyeonggi-do, 440-746, Korea
{jhlee00, jhlee, tmchung}@imtl.skku.ac.kr

Abstract. Network-based Localized Mobility Management (NetLMM) is an outstanding candidate solution for mobility management controlled by the network. In NetLMM, mobile nodes (MNs) can be provided mobility services without installing any mobility-support stack. However, there is a restriction: the MN can only be mobile within a single localized mobility domain (LMD). In this paper, we propose a novel Inter-LMD handoff mechanism in order to eliminate this shortcoming of the current NetLMM protocol. The proposed Inter-LMD handoff mechanism enables the MN to hand off across LMDs even if the MN does not have Mobile IPv6 (MIPv6) functionality. According to the performance evaluation, the proposed Inter-LMD handoff mechanism has approximately 5.8% more overhead than the current Inter-LMD handoff of MIPv6-capable devices, while the current NetLMM protocol does not support the handoff of MIPv6-incapable devices at all.

1 Introduction

MIPv6 is the basic mobility management protocol [1], and the IETF has been working on improving it. As a result of this effort, improved mobility management protocols such as Fast Mobile IPv6 (FMIPv6) and Hierarchical Mobile IPv6 (HMIPv6) have been introduced [2,3]. These protocols improve mobility management from the host's perspective: in such host-based protocols, a mobile node (MN) must have a host software stack and play an active role in managing its own mobility. Note that the MN is quite likely to be a hand-held device with limited battery capacity and low computing power. Specialized and complex security transactions are also required between the MN and the network, because the MN has to act as a part of the mobility management [4]. Hence, the IETF has become interested in network-based localized mobility management (NetLMM) in order to minimize the mobility-related load on the MN [5]. In a network-based mobility management protocol, the MN can be provided with continuous Internet access without the functionality of

Corresponding author.

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 356–366, 2007. © Springer-Verlag Berlin Heidelberg 2007


MIPv6 or any other host-based mobility management protocol such as FMIPv6 or HMIPv6. This means that the MN needs only wireless access capability, such as an IEEE 802.11 interface, to be mobile. A protocol called "A Protocol for Network-based Localized Mobility Management (NetLMM)" is being developed as a candidate network-based mobility management solution by the NetLMM working group [6,7]. The NetLMM protocol is based on the concept of Proxy Mobile IPv6 (PMIP) and employs a PMIP client [5]; it uses entities called the Local Mobility Anchor (LMA), the Mobile Access Gateway (MAG), and the PMIP client to support mobility for an MN. No signaling beyond the basic MIPv6 specified in [1] is required. Whenever the MN changes the MAG to which it is attached, the PMIP client simply generates a binding update message (BU) and sends it to the LMA, which acts as the home agent (HA) of the MN. This message generated by the PMIP client is called the Proxy Binding Update message (PBU). As a response to the PBU, the LMA sends a Proxy Binding Acknowledgment message (PBAck) back to the PMIP client, and the PMIP client then relays the PBAck to the MAG. With the operation outlined above, the MN maintains its access to the Internet regardless of movements within a single LMD. The MN does not need any mobility management protocol, only the ability for wireless access. A handoff within a single LMD is called an Intra-LMD handoff. However, the MN must have Mobile IP client functionality in its IPv6 stack if it wants to hand off to another LMD, which is called an Inter-LMD handoff [6]. We argue that this is contrary to the goal of NetLMM, because the MN then has to take part in the mobility management operation. In this paper, we propose a mechanism that supports Inter-LMD handoff without any additional requirement on the MN.
The MN does not need the functionality of MIPv6 or any other mobility management protocol. The entities in the network, such as the LMA, the MAG, and the PMIP client, exchange only messages that have already been introduced in [1], [5], and [6]. With the proposed mechanism, the MN gains global mobility while retaining all the benefits of network-based localized mobility management. The rest of this paper is organized as follows. Section 2 introduces the details of NetLMM, Intra-LMD handoff, and Inter-LMD handoff. Section 3 explains the details of the proposed mechanism. Section 4 evaluates the proposed mechanism and discusses its benefits. Section 5 concludes the paper.

2 Network-Based Localized Mobility Management

2.1 Protocol Overview

The NetLMM protocol is the result of an effort toward localized mobility management controlled by the network. It is designed to support the mobility of MNs within an administrative network. In the NetLMM protocol, several entities cooperate to provide mobility for an MN. An


LMA acts as a standard MIPv6 Home Agent (HA) as described in [1]. A MAG is the current attachment point of the MN. The LMA sends data traffic toward the MN via the MAG, using an address called the Proxy Care-of Address (pCoA). The pCoA is the address of the MAG to which the MN is currently attached. If the MN hands off to another MAG, the designated PMIP client sends the LMA a PBU containing the new pCoA of the MN. By receiving PBUs, the LMA always knows the current location of the MN, so it can tunnel the data traffic to the appropriate MAG. The MN keeps a single address in the domain, even though it hands off between several MAGs. Within a single domain, the MN does not have to send a BU or change its address. This is the reason why the MN does not need MIPv6 functionality.
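The binding state kept at the LMA, as described above, can be sketched as a simple table mapping each MN's pHoA to the pCoA of its current MAG; a PBU overwrites the entry, and the LMA tunnels traffic accordingly. The following Python sketch is illustrative only (the class and message names are our own, not taken from the drafts):

```python
# Illustrative sketch of the LMA binding cache: pHoA -> pCoA (current MAG).
# A PBU from the PMIP client refreshes the mapping; the LMA then tunnels
# traffic for that pHoA to the recorded MAG. All names are hypothetical.

class LMA:
    def __init__(self):
        self.bindings = {}  # pHoA -> pCoA (address of the current MAG)

    def on_pbu(self, phoa, pcoa):
        """Proxy Binding Update: record/refresh the MN's current MAG."""
        self.bindings[phoa] = pcoa
        return ("PBAck", phoa)           # acknowledgment for the PMIP client

    def route(self, phoa):
        """Return the tunnel endpoint (pCoA) for traffic toward this pHoA."""
        return self.bindings.get(phoa)

lma = LMA()
lma.on_pbu("pHoA:MN1", "pCoA:MAG1")      # initial attachment
lma.on_pbu("pHoA:MN1", "pCoA:MAG2")      # Intra-LMD handoff to MAG2
print(lma.route("pHoA:MN1"))             # traffic is now tunneled to MAG2
```

The point of the sketch is that the MN's address (pHoA) never changes inside the LMD; only the LMA-side mapping does.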

Fig. 1. Message flow of MN’s initial attachment to LMA

The procedure for the initial attachment of an MN to an LMD is represented in Fig. 1. As shown in Fig. 1, a Proxy Home Address (pHoA) is assigned to the MN after it attaches to the MAG1 at layer 2 and receives a router advertisement message from the MAG1. The MAG1 sends the PMIP client a trigger containing the MN ID and the pHoA of the MN. With this information, the PMIP client sends a PBU to the LMA. After the PBU is accepted by the LMA, the LMA returns a PBAck and begins to act as a standard HA for the MN. With the LMA and the pHoA, the MN is ready to communicate over the Internet. Communication sessions established after this initial attachment use the pHoA of the MN.


If the MN is a MIPv6-capable device and has already established sessions with correspondent nodes via its HoA using standard MIPv6 before the initial attachment to the LMA, the MN must send a BU containing the pHoA as the Care-of Address (CoA) to the GMA. The BU procedure toward the GMA is the same as described in "Mobility Support in IPv6" [1], because the GMA is the entity that acts as the HA providing global mobility for the MN. The MN must have MIPv6 functionality to perform this binding update to the GMA and thus acquire global mobility. Note that the LMA can provide only local mobility to the MN: if the MN moves to a MAG controlled by another LMA, the MN is configured with a new pHoA and loses the sessions established via the old pHoA.

2.2 Intra-LMD Handoff

Fig. 2 represents the procedure for an Intra-LMD handoff. With this procedure, the MN can hand off between MAGs in a single domain without losing connectivity to the Internet. After the MN changes its layer 2 attachment to the MAG2, another MAG in the same LMD, the MAG1 transfers the context of the MN and buffered data packets to the MAG2 using the Context Transfer Protocol (CXTP) [8]. Then, the MAG2 sends a trigger to the PMIP client, and the PMIP client performs the proxy binding update with the LMA using the PBU and the PBAck. With this procedure, the LMA can intercept data packets destined for the MN and forward them to the appropriate MAG, i.e., the current attachment point of the MN.
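The Intra-LMD handoff steps above can be summarized as the following message trace; the function is a hypothetical simplification for illustration (the entity names follow the text):

```python
# Hypothetical trace of the Intra-LMD handoff message flow (cf. Fig. 2):
# L2 attach, CXTP context transfer, trigger, and proxy binding update.

def intra_lmd_handoff(mn_id, old_mag, new_mag):
    return [
        (mn_id,         new_mag,       "L2 attachment"),
        (old_mag,       new_mag,       "CXTP: MN context + buffered packets"),
        (new_mag,       "PMIP client", f"trigger({mn_id}, pHoA)"),
        ("PMIP client", "LMA",         f"PBU({mn_id}, pCoA={new_mag})"),
        ("LMA",         "PMIP client", "PBAck"),
        ("PMIP client", new_mag,       "PBAck"),
    ]

for src, dst, msg in intra_lmd_handoff("MN1", "MAG1", "MAG2"):
    print(f"{src:11s} -> {dst:11s}: {msg}")
```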

Fig. 2. Message flow of Intra-LMD handoff

2.3 Inter-LMD Handoff with the Current Protocol

If the MN wants to hand off across LMDs with the current protocol, the MN must have the functionality of MIPv6 or another mobility management protocol.


Because MAGs located in different LMDs advertise different prefixes representing their own LMDs, the MN has to configure a new address with a new prefix when it attaches to a MAG belonging to another LMA. The session cannot be maintained with the old pHoA in this scenario, so a standard BU message has to be sent to the GMA by the MN to maintain the sessions established via the old pHoA. Therefore, the MN must be a MIPv6-capable device, even though this is contrary to the goals of NetLMM. The procedure for this Inter-LMD handoff is exactly the same as the initial attachment to the LMA depicted in Fig. 1.

3 Novel Inter-LMD Handoff Mechanism

In this section, we propose a novel Inter-LMD handoff mechanism. With the proposed mechanism, an MN does not have to be a MIPv6-capable device, yet it has mobility without any restriction. To simplify the explanation, let us consider the simple topology depicted in Fig. 3.

Fig. 3. Simple topology for NetLMM

The MN is a device communicating in the LMD1. Before the communication ends, the MN changes its attachment to the MAG3. The MAG3 does not advertise the prefix of the LMA1, so the MN configures a New pHoA (NpHoA) after the L2 attachment to the MAG3. Once the MN is configured with the NpHoA, under the current protocol there is no way to maintain the sessions established via the Old pHoA (OpHoA) obtained in the LMD1 when the MN is not a MIPv6-capable device. However, the basic purpose of the NetLMM protocol is to provide mobility for MNs that lack MIPv6 or any other mobility management capability. If a MIPv6-incapable MN can hand off to another LMD, it gains global mobility. Therefore, a novel Inter-LMD handoff is mandatory for the NetLMM protocol to serve as a complete mobility management solution.


Fig. 4. Message flow of Inter-LMD handoff

To maintain the session even if the MN hands off to another LMD, we develop the mechanism represented in Fig. 4. When the MN hands off to the MAG3 at layer 2, the MAG2 sends the information of the MN to the MAG3 using CXTP [8]. This information consists of the MN ID, the OpHoA, and the address of the LMA1. After the CXTP exchange, the MAG3 sends a trigger message to the PMIP client2. The information contained in this trigger differs subtly from the original trigger of the NetLMM protocol: the trigger used in the Inter-LMD handoff contains the OpHoA as well as the MN ID and the NpHoA. The OpHoA is needed to send the PBU1 to the LMA1, and the NpHoA is needed to send the PBU2 to the LMA2. The PMIP client2 recognizes that the trigger indicates an Inter-LMD handoff, so it sends the PBU1 containing the (OpHoA, NpHoA) pair to the LMA1, and the PBU2 containing the (NpHoA, pCoA) pair to the LMA2. After the PBAcks are received from the LMA1 and the LMA2, respectively, data traffic toward the OpHoA is delivered via the LMA1. The PBU and PBAck basically have to be protected by IP security (IPsec) [1,9]. However, it is likely that a Security Association (SA) between the PMIP client2 and the LMA1 has not yet been established; before sending the PBU1, this SA should be dynamically established using a key exchange protocol such as IKEv2 [10]. Packets arriving at the LMA1 are then forwarded to the LMA2, because the LMA1 has received the PBU1 from the PMIP client2 and thus recognizes the NpHoA as the pCoA of the MN. The packets delivered to the LMA2 are forwarded to the MAG3 because the LMA2 has received the PBU2 from


the PMIP client2. With this procedure, the MN maintains the connectivity of the sessions established with both the OpHoA and the NpHoA, and the handoff is transparent to the transport layer. The security considerations for CXTP, such as key establishment, can be addressed based on public keys or an AAA protocol [8], but the details of CXTP security are out of the scope of this paper.
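The two chained bindings created by the PBU1 and the PBU2 can be sketched as follows (illustrative data structures, not an implementation of the drafts): the LMA1 redirects OpHoA traffic to the NpHoA, and the LMA2 tunnels NpHoA traffic to the MAG3.

```python
# Sketch of the forwarding chain after the proposed Inter-LMD handoff.
# lma1/lma2 model the binding caches installed by PBU1 and PBU2.

lma1 = {"OpHoA": "NpHoA"}        # PBU1: (OpHoA, NpHoA) pair at the LMA1
lma2 = {"NpHoA": "pCoA:MAG3"}    # PBU2: (NpHoA, pCoA) pair at the LMA2

def deliver(dst):
    """Follow the LMA1 -> LMA2 -> MAG3 chain for a packet addressed to dst."""
    hop = lma1.get(dst, dst)     # LMA1 redirects OpHoA traffic to the NpHoA
    return lma2.get(hop, hop)    # LMA2 tunnels NpHoA traffic to the MAG3

print(deliver("OpHoA"))          # old sessions survive the Inter-LMD handoff
print(deliver("NpHoA"))          # new sessions reach the MN directly via LMA2
```

Both lookups end at the MAG3, which is why the handoff is transparent to the transport layer.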

4 Performance Evaluation and Discussion

In this section, we evaluate the performance of the proposed Inter-LMD handoff mechanism explained in Section 3. Before doing so, it should not be overlooked that the current NetLMM protocol cannot provide Inter-LMD mobility for a MIPv6-incapable device; throughout this section we assume that the MN has no MIPv6 functionality. We evaluate the performance based on the concept of signaling cost introduced in [11], and then discuss the results.

4.1 Performance of the Handoff Signaling

– C_L2: the total cost of changing the L2 attachment.
– C_CXTP: the total cost of exchanging the MN context using CXTP.
– C_acq: the total cost of acquiring an IP address after the L2 attachment.
– T_mp: the transmission cost of the handoff signaling between the MAG and the PMIP client.
– T_pl: the transmission cost of the handoff signaling between the PMIP client and the LMA.
– T_pl1: the transmission cost of the handoff signaling between the PMIP client and the LMA1.

Fig. 5. Signaling of the current Inter-LMD handoff


Fig. 6. Signaling of the proposed Inter-LMD handoff

– T_pl2: the transmission cost of the handoff signaling between the PMIP client and the LMA2.
– T_mg: the transmission cost of the handoff signaling between the MN and the GMA.
– P_MN: the processing cost of the handoff signaling at the MN.
– P_GMA: the processing cost of the handoff signaling at the GMA.
– P_MAG: the processing cost of the handoff signaling at the MAG.
– P_PMIP: the processing cost of the handoff signaling at the PMIP client.
– P_LMA: the processing cost of the handoff signaling at the LMA.

According to the message flows illustrated in Fig. 5 and Fig. 6, the signaling costs of the current Inter-LMD handoff and the proposed Inter-LMD handoff can be calculated as follows, where C_Ch and C_Ph denote the costs of the current and the proposed Inter-LMD handoff, respectively:

C_Ch = C_L2 + C_acq + 2T_mp + 2T_pl + 2T_mg + 2P_PMIP + P_LMA + P_MAG + P_MN + P_GMA    (1)

C_Ph = C_L2 + C_acq + C_CXTP + 2T_mp + max(2T_pl1, 2T_pl2) + 3P_PMIP + 2P_LMA + P_MAG    (2)

The transmission cost can be assumed to be proportional to the distance between the source and the destination, with proportionality constant δ_U [11]. Hence, T_mp, T_pl, T_pl1, and T_pl2 can be expressed as l_mp·δ_U, l_pl·δ_U, l_pl1·δ_U, and l_pl2·δ_U, respectively, where l_mp, l_pl, l_pl1, and l_pl2 are the average distances between the MAG and the PMIP client, between the PMIP client and the LMA,


between the PMIP client and the LMA1, and between the PMIP client and the LMA2, respectively. We also assume that the transmission cost over a wireless link is ρ times higher than that over a unit of wired link, since wireless transmission is in general more expensive. Since T_mg is the cost of the BU and BAck for the MN, T_mg consists of one unit of wireless link, from the MN to the MAG, and a wired path from the MAG to the GMA. Hence, T_mg can be expressed as ρδ_U + l_mg·δ_U, where l_mg is the average distance between the MAG and the GMA. Therefore, we can rewrite Eq. (1) and Eq. (2) as:

C_Ch = C_L2 + C_acq + 2(l_mp + l_pl + l_mg + ρ)δ_U + 2P_PMIP + P_LMA + P_MAG + P_MN + P_GMA    (3)

C_Ph = C_L2 + C_acq + C_CXTP + 2(l_mp + max(l_pl1, l_pl2))δ_U + 3P_PMIP + 2P_LMA + P_MAG    (4)

4.2 Performance of the Packet Delivery

In the proposed Inter-LMD handoff mechanism, a GMA is not necessary, whereas the GMA is an indispensable entity in the current Inter-LMD handoff. Because of this difference, we assume that the LMA to which the MN first attaches acts as the GMA in the current protocol, in order to simplify the evaluation.

Fig. 7. The Packet Delivery of Current protocol

Fig. 8. The Packet Delivery of Proposed protocol

As shown in Fig. 7 and Fig. 8, the packet delivery of the current protocol and that of the proposed protocol follow exactly the same path if the GMA in Fig. 7 and the LMA1 in Fig. 8 are the same entity. The LMA1 is the LMA in the previous LMD, from which the MN hands off. Therefore, we do not separately evaluate the cost of packet delivery.

4.3 Numerical Results and Discussion

We compare the cost of the handoff signaling based on Eq. (3) and Eq. (4) in this subsection. For simplicity, we assume that all processing costs in Eq. (3) and Eq. (4) are equal (i.e., P = P_PMIP =


Table 1. The parameter values for performance analysis

Parameter | L  | P  | δ_U | ρ  | C_L2 | C_acq | C_CXTP
Value     | 15 | 10 | 0.1 | 10 | 5    | 10    | 12
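As a cross-check, the Table 1 values can be substituted into Eqs. (3) and (4). Under this direct reading the proposed handoff costs a few percent more than the current one, in the same range as the roughly 5.8% reported in the text; the exact figure depends on how the shared parameters L and P are mapped onto the individual distances and processing costs.

```python
# Substituting the Table 1 parameters into Eq. (3) and Eq. (4).

L, P, dU, rho = 15, 10, 0.1, 10
C_L2, C_acq, C_CXTP = 5, 10, 12
l_mp = l_pl = l_pl1 = l_pl2 = l_mg = L        # all average distances = L

# Eq. (3): current Inter-LMD handoff (MIPv6-capable MN)
C_Ch = C_L2 + C_acq + 2 * (l_mp + l_pl + l_mg + rho) * dU + 6 * P
# Eq. (4): proposed Inter-LMD handoff (MIPv6-incapable MN)
C_Ph = (C_L2 + C_acq + C_CXTP
        + 2 * (l_mp + max(l_pl1, l_pl2)) * dU + 6 * P)

print(f"C_Ch = {C_Ch:.1f}, C_Ph = {C_Ph:.1f}, "
      f"overhead = {100 * (C_Ph - C_Ch) / C_Ch:.1f}%")
```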

(Figure: signaling cost versus the movement rate across LMDs for the current and the proposed Inter-LMD handoff.)

Fig. 9. The comparative result for the evaluation

P_LMA = P_MAG = P_MN = P_GMA), and that the average distances are also equal (i.e., L = l_pl = l_pl1 = l_pl2 = l_mp = l_mg). For the analysis, we further assume the parameter values in Table 1. The result of the comparison of the two handoff mechanisms is represented in Fig. 9. The cost of the proposed Inter-LMD handoff mechanism is approximately 5.8% higher than that of the current mechanism. This difference stems from the cost of CXTP in the proposed mechanism. Note, however, that we assumed the average distance between the MAG and the GMA to be the same as the average distances between the other entities, even though the GMA is not needed in the proposed mechanism. In practice the GMA is likely to be much farther away than the other entities, because every entity except the GMA resides in the same or a nearby administrative domain. We should also keep the assumptions of the two mechanisms in mind: in the proposed mechanism the MN does not need MIPv6 functionality, whereas in the current mechanism the MN cannot hand off across LMDs without standard MIPv6 functionality, which can be a burden to the MN, as explained in [4].

5 Conclusion

The current NetLMM protocol cannot support the mobility of an MN that hands off across LMDs: the MN must have MIPv6 functionality to do so, even if it adopts NetLMM as its mobility management. We therefore proposed a novel Inter-LMD handoff mechanism in which the MN does not have to be a MIPv6-capable device to obtain global mobility, i.e., the ability to hand off across LMDs. As the performance evaluation shows, the signaling cost of the proposed Inter-LMD handoff is similar to that of the current mechanism, even though the MN needs no MIPv6 functionality. As future work, we aim to further decrease the handoff signaling cost and to develop a scalable key distribution mechanism for MAGs in NetLMM.

Acknowledgement. This research has been supported by a grant of the Korea Health 21 R&D Project, Ministry of Health & Welfare, Republic of Korea (02-PJ3-PG6-EV080001).

References

1. Johnson, D., Perkins, C., Arkko, J.: Mobility Support in IPv6. RFC 3775 (June 2004)
2. Koodli, R. (ed.): Fast Handovers for Mobile IPv6. RFC 4068 (July 2005)
3. Soliman, H., Castelluccia, C., El Malki, K., Bellier, L.: Hierarchical Mobile IPv6 Mobility Management (HMIPv6). RFC 4140 (August 2005)
4. Kempf, J. (ed.): Goals for Network-based Localized Mobility Management (NETLMM). draft-ietf-netlmm-nohost-req-05 (October 2006)
5. Gundavelli, S., Leung, K., Devarapalli, V., Chowdhury, K., Patil, B.: Proxy Mobile IPv6. draft-sgundave-mip6-proxymip6-02 (March 2007)
6. Bedekar, A., Singh, A., Kumar, V., Kalyanasundaram, S.: A Protocol for Network-based Localized Mobility Management. draft-singh-netlmm-protocol-02 (March 2007)
7. NetLMM WG web site (accessed March 2007), http://www.ietf.org/html.charters/netlmm-charter.html
8. Loughney, J., Nakhjiri, M., Perkins, C., Koodli, R. (eds.): Context Transfer Protocol (CXTP). RFC 4067 (July 2005)
9. Arkko, J., Devarapalli, V., Dupont, F.: Using IPsec to Protect Mobile IPv6 Signaling Between Mobile Nodes and Home Agents. RFC 3776 (June 2004)
10. Kaufman, C. (ed.): Internet Key Exchange (IKEv2) Protocol. RFC 4306 (December 2005)
11. Xie, J., Akyildiz, I.F.: A Novel Distributed Dynamic Location Management Scheme for Minimizing Signaling Costs in Mobile IP. IEEE Transactions on Mobile Computing 1(3) (2002)

Improvement of Link Cache Performance in Dynamic Source Routing (DSR) Protocol by Using Active Packets

Dimitri Marandin

Technische Universität Dresden, Chair for Telecommunications, Georg-Schumann-Str. 9, 01062 Dresden, Germany
[email protected]

Abstract. Dynamic Source Routing (DSR) is an efficient on-demand routing protocol for ad hoc networks, in which only needed routes are found and maintained. The route discovery/setup phase becomes the dominant factor for applications with short-lived, small-transfer traffic (a single packet or a short stream of packets per transaction) between the source and the destination: resource discovery, text messaging, object storage/retrieval, queries, and short transactions. Route caching helps to avoid route discovery, or to shorten its delay, before each data packet is sent. The goal of this work is to develop a caching strategy that permits nodes to update their caches quickly, minimizing the end-to-end delay for short-lived traffic. The proposed approach employs an Active Packet that travels through the nodes of the network twice. During the first traversal, it visits the nodes and collects fresh network topology information. When the first traversal is finished, a second one is started to validate and update the caches of the nodes with the newly obtained information. This mechanism removes invalid cached links and caches valid links based on the collected topology information. Correct information in the caches speeds up Route Discovery and may even avoid it entirely. Keywords: mobile ad hoc networks, Dynamic Source Routing (DSR), link cache.

1 Introduction

Mobile ad hoc networks are an active research topic in wireless communications. This technology makes it possible for network nodes to communicate with each other using wireless transceivers (possibly along multihop paths) without requiring a fixed infrastructure or centralized administration [1]. This distinguishes ad hoc networks from more conventional wireless networks, such as cellular networks and WLANs, in which nodes (for example, mobile phone users) communicate with each other via base stations. Since the nodes in ad hoc networks use wireless technology, the topology of the network can change as the nodes move. A routing protocol that can manage these topology changes and keep the network reconfigurable is therefore necessary. Routing in ad hoc networks has been a dynamically growing research area in recent years. Many routing protocols for multihop ad hoc networks have been developed,

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 367–378, 2007. © Springer-Verlag Berlin Heidelberg 2007


beginning with straightforward modifications of Internet protocols and extending to complicated multilevel hierarchical proposals. The design of routing protocols is one of the most significant challenges in ad hoc networks and is critical for basic network operation. In [1] it is shown that on-demand routing protocols perform better than table-driven ones in mobile ad hoc networks. In an on-demand scheme, a node attempts to find a route only when it has data packets to send to a destination. To avoid the cost of finding a route for each data packet, nodes maintain discovered routes in a cache. Because of the potential mobility of nodes in an ad hoc network, cached routes can become stale; thus, a good caching strategy is necessary to keep node caches up to date. In this paper we focus on the Dynamic Source Routing protocol (DSR) [2], an on-demand routing protocol, and we investigate and develop a caching strategy for it that permits nodes to efficiently adapt their caches to network changes. This paper is structured as follows. Section 2 gives an overview of the DSR protocol. Section 3 describes the cache structure used and the problem statement. Section 4 discusses related work. Section 5 describes our approach for improving DSR with the use of Active Packets. Section 6 presents an evaluation of our approach, and finally, Section 7 presents our conclusions.

2 DSR: Dynamic Source Routing

The Dynamic Source Routing protocol (DSR) [2] is a simple but effective on-demand protocol used in ad hoc networks. DSR has two basic mechanisms [2]: Route Discovery and Route Maintenance. Route Discovery is the mechanism used at the source of the packets to find a route to the destination. When a source generates a data packet for a destination, it floods the network to find a route to that destination by broadcasting a ROUTE REQUEST (RREQ) packet. When the RREQ is received at an intermediate node, the node adds its address to the source route contained in the RREQ packet and re-broadcasts it. When the RREQ is received by the destination, it adds its address to the source route and unicasts a ROUTE REPLY (RREP) to the originator of the RREQ. To reach the source of the packet, the destination reverses the route contained in the RREQ packet. Each intermediate node on the source route forwards this RREP. When the source receives the RREP, it extracts the source route from the packet and sends data packets (DP) to the destination using this route. Route Maintenance is the mechanism that detects link failures and repairs them. Each node on the source route has to detect whether the packet has been received by the next hop. When the Route Maintenance mechanism detects that a link is broken, a ROUTE ERROR (RERR) packet is sent to the source of the packet. The source then has to use another route to send packets to the destination or start Route Discovery again.
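The Route Discovery flood described above can be modeled, for a static topology, as a breadth-first expansion in which each RREQ copy accumulates the source route and the destination reverses it for the RREP. This is an idealized sketch (no duplicate-suppression timing, caches, or link failures):

```python
# Minimal model of DSR Route Discovery: the RREQ floods outward,
# accumulating the source route; the destination reverses it for the RREP.

from collections import deque

def route_discovery(graph, src, dst):
    """Breadth-first flood of the RREQ; returns the source route src..dst."""
    seen = {src}
    queue = deque([[src]])               # each entry is an accumulated route
    while queue:
        route = queue.popleft()
        node = route[-1]
        if node == dst:
            return route                 # destination unicasts the RREP back
        for neigh in graph[node]:        # re-broadcast to all neighbours
            if neigh not in seen:
                seen.add(neigh)
                queue.append(route + [neigh])
    return None

topology = {"S": ["A", "B"], "A": ["S", "C"], "B": ["S", "C"],
            "C": ["A", "B", "D"], "D": ["C"]}
route = route_discovery(topology, "S", "D")
print("source route:", route)            # e.g. ['S', 'A', 'C', 'D']
print("RREP path   :", list(reversed(route)))
```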

3 Caching

Any on-demand routing protocol must maintain some type of route cache in order to avoid the need to discover a route before sending each data packet. A


route discovery is an expensive operation, due to the flooding of the network, and it causes delay before the first data packet can be sent. After the source discovers a route, it has to store the route in some cache for transmitting the following packets. Thus, caching is an essential component of on-demand routing protocols for wireless ad hoc networks. DSR uses the route cache even more extensively: not only to cache routes for the purpose of originating packets, but also to allow nodes to answer Route Requests targeted at other nodes [10]. Route caching is the main approach to reducing the flooding overhead by avoiding route discovery as much as possible, so that non-optimal but available routes are preferred to the effort of finding the currently optimal route. The use of a cache introduces the problem of managing it properly: a good caching strategy that updates the caches of nodes to reflect the new topology is needed. For the development of a caching strategy in on-demand routing protocols, the cache structure used is very important. Two types of cache structure can be used: (i) a path cache, in which a node caches a complete path (a sequence of links), and (ii) a link cache, in which a node caches each link separately, adding it to a graph of links. Fig. 1 shows the differences between the two structures. With a path cache (Fig. 1(a)), when node A adds a new route A-B-F-G-H to its cache, it has to add the whole path as an independent entry. With a link cache (Fig. 1(b)), when node A adds the new route A-B-F-G-H, it has to add only the link B-F, since the other links already exist in its cache. It has been shown that a link cache outperforms a path cache [3], because the link cache makes better use of the cached network state information and deletes only a broken link when that link causes a path to break, instead of deleting the whole path as a path cache does.
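The behavioral difference between the two structures can be sketched as follows; the classes are illustrative only, showing why a broken link removes whole entries from a path cache but only a single edge from a link cache:

```python
# Illustrative contrast of path cache vs. link cache on a link break.

class PathCache:
    def __init__(self):
        self.paths = []
    def add(self, path):
        self.paths.append(list(path))          # whole path is one entry
    def on_link_break(self, u, v):
        # a broken link invalidates every path containing it
        self.paths = [p for p in self.paths
                      if (u, v) not in zip(p, p[1:])]

class LinkCache:
    def __init__(self):
        self.links = set()
    def add(self, path):
        for u, v in zip(path, path[1:]):       # only individual links stored
            self.links.add(frozenset((u, v)))
    def on_link_break(self, u, v):
        self.links.discard(frozenset((u, v)))  # remove just that one link

pc, lc = PathCache(), LinkCache()
for route in (["A", "B", "C", "D"], ["A", "B", "F", "G", "H"]):
    pc.add(route); lc.add(route)
pc.on_link_break("B", "C"); lc.on_link_break("B", "C")
print(len(pc.paths), "path(s) left;", len(lc.links), "links left")
```

After the break of B-C, the path cache has lost the entire route A-B-C-D, while the link cache still holds the unaffected links A-B and C-D for reuse.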
For all these reasons, we use a link cache in our approach. For a link cache, the reasonable choice is to permit the cache to store all links that are learnt, because there is a predetermined maximum of N² links in a network of N nodes.


Fig. 1. Path and Link cache structure for node A

Each node maintains its own cache. A node learns routes either when it forwards a data packet or when it forwards a route reply. To make the protocol independent of the MAC protocol used, all cached links have to be bidirectional. So, when a route reply message is forwarded by a node, its cache stores the links between the originator of the route reply and the node itself. These links are cached because both directions have been tested: from the current node to the destination by the route discovery, and from the destination to the current node by the route reply. The links from the originator of a


route request to the node forwarding a route reply might be unidirectional, so they are not cached. These links of the path will not be cached until a data packet is sent over them.

4 Related Work

Simulation studies in [1],[11],[12],[13],[14] have demonstrated the efficiency of route caching in on-demand routing protocols. However, [15] has shown that the high delay and low throughput of DSR are mostly due to the aggressive use of caching and the lack of any technique to remove stale routes or to judge the freshness of routes when many routes are available. Replies from caches reduce the delay of route discovery by preventing the RREQ storm from reaching every node of the network. Cached replies stop the flooding early, reducing the overhead. However, without an effective caching strategy, the information maintained in caches may be stale, and replies from node caches may then contain invalid links. The undesirable effects of invalid routes, when they are used by a source node to send data packets, can be summarized as follows:
• Packet losses, increased packet delivery latency, and increased routing overhead. These problems can be substantial. As mobility, traffic load, or network size increase, more routes become stale and negatively affect more traffic flows. When replying to route requests from node caches is allowed, stale routes are rapidly propagated to other nodes, worsening the situation. The route discovery overhead increases because the source node has to initiate more route discovery attempts.
• Degradation of TCP performance. Stale routes also severely degrade TCP performance [3]. Since TCP cannot distinguish packet losses due to route failures from those due to congestion, it incorrectly triggers congestion control mechanisms, reducing throughput.
• Increased energy consumption at source and intermediate nodes. If stale routing information is not removed quickly from the cache, TCP retransmits lost packets over the invalid routes.
• More time required to find a new route to the destination node.
The main reason for stale routes is mobility.
Another cause of the stale cache problem in the DSR protocol [4] is incomplete error notification. When a link failure is detected, a node returns a RERR to the source of the data packet that could not be delivered; as a result, only the nodes belonging to the source route of this packet remove the invalid links from their caches. With the optimization "Gratuitous ROUTE ERRORS" [5], a RREQ packet piggybacks the RERR information, but due to replies from caches the RREQ flooding does not reach every node of the network. On the other hand, nodes must maintain some type of route cache, because the route discovery/setup phase becomes the dominant factor for applications with short-lived, small-transfer traffic (a single packet or a short stream of packets per transaction) between the source and the destination: resource discovery, text messaging, object storage/retrieval, queries, and short transactions. Thus, caching is an essential component of on-demand routing protocols for wireless ad hoc networks.

Improvement of Link Cache Performance in Dynamic Source Routing (DSR) Protocol


Hu and Johnson [14] suggested several adaptive link timeout mechanisms. For a link cache, assuming the cache capacity is limited, as is typically the case, the setting of the cache timeout is critical, as it can have a significant effect on the performance of the routing protocol in terms of packet delivery ratio, routing overhead, etc. But only a few studies have examined how to adjust it. In [16], a static caching scheme was considered in which a fixed cache timeout is assigned to all links. After a link stays in the cache for this period of time, it is deleted. The disadvantage of this approach is that it cannot adapt to network changes. If the timeout is relatively short compared to the actual link lifetime, the link cache works poorly because of the large number of route requests and unnecessary overhead. Alternatively, if the timeout lasts too long, the link cache also works badly, since the number of route errors may grow due to failed links remaining in the cache. For these reasons, assigning a timeout close to a link's lifetime can improve performance. As the real lifetime of a link strongly depends on network parameters, for example node mobility and node density, adaptive caching schemes are required to obtain good performance. But if the cache timeout in an adaptive scheme does not adapt correctly in some scenarios, its performance can be even poorer than that of static caching. The authors of [6] proposed an active network method to solve the problem of stale caches. In this approach, an active packet roams around the network and gathers information about the network topology. Nodes check the payload of this active packet once it is received and update their route caches. Therefore, the cache miss rate is smaller and the route discovery flooding is reduced. With active packets, not only are used routes updated, but possible routes for future use are also added to the cache.
Therefore, both route request flooding for new routes and flooding due to stale routes are mostly avoided. But in [6] the update phase is similar to the information gathering phase and takes the same amount of time. In contrast, our approach employs a quick update mechanism that updates caches by broadcast. Additionally, some information at nodes is already updated during the information collection phase. In [6] the active packet is generated periodically by a randomly chosen node, but how this random node is selected, and how it is guaranteed that only one node generates an active packet in the specified period of time, is not described. As a consequence, multiple active packets can be created simultaneously, which drastically increases overhead without any benefit and degrades performance. In our approach, the last node of the information collection phase is responsible for generating the next active packet. In this way, we avoid having several active packets in the same network segment. We also complement the approach with mechanisms for active packet regeneration when network segmentation occurs and the active packet cannot reach part of the network.

5 Active Packets Approach

5.1 Overview

The suggested improvement is based on the introduction of a new packet to the DSR. This network layer packet will be henceforth referred to as an Active Packet (AP). It visits each node of the network twice. Fig. 2 depicts a rough outline of the general functioning of the mechanism which guides the AP through the mobile network.


D. Marandin

The first time the packet arrives at a node, topology data is collected by the packet. If it is the first visit, the left branch of the diagram is followed and the neighbour discovery procedure is started. The node must send messages to find out who its neighbours are at this precise moment. Once these are known, the node enters this data into a connection matrix stored in the AP in compressed form and forwards the AP to a randomly chosen unvisited neighbour. The unvisited neighbours can be identified by means of the connection matrix. From a logical point of view, a connection matrix is an array of lists. Each list is associated with a visited node and contains the addresses of that node's neighbouring nodes. After the neighbour discovery process, a visited node adds its address, with the list of its neighbouring nodes, to the connection matrix. When transmitted, the connection matrix is efficiently encoded. If there are no more unvisited neighbouring nodes, the 2nd AP visit starts. The field "Enable 2nd visit" is set to "1" and the AP is broadcast to the nodes visited during the 1st AP visit. When a node receives the AP on its 2nd visit, the connection matrix in the AP is used to update and validate the link caches of the nodes of the network. The update is done in two phases. First, cached links that (according to the connection matrix contained in the AP) no longer exist, that is, broken links, are deleted from the node's cache. Second, new links whose existence the cache had no knowledge of, but that are stored in the matrix, are added. This helps to improve the cache

[Fig. 2. Simplified flow chart of the reception of an AP. 1st visit: neighbour discovery, update of the connection matrix, then either forwarding to a randomly chosen unvisited neighbouring node or, if none remain, the start of the 2nd visit; 2nd visit: update of the node cache and re-broadcast of the AP, with already-received APs dropped.]


performance of the protocol by keeping the entries in the caches from becoming stale (the portion of good cache hits should be higher). It is also expected that the extra signalling traffic generated by the APs introduced by the modification is compensated by the avoided route discovery flooding. As soon as a node receives the AP broadcast by its neighbour in the 2nd visit phase, it examines whether the AP has already been received in the 2nd visit phase. If so, the node drops the AP; if not, the cache of the node is updated by the AP. Then the AP is broadcast again if necessary. A timer determines when an AP ought to be created. This timer runs at every node in the network and starts when a node joins the network or receives an active packet. If no active packets are received during a predefined timeout, the node selects a random back-off time after which it is responsible for generating an AP. If the node receives an AP during the back-off period, it cancels the new AP generation and restarts the timer.

5.2 Format of Active Packet

First, we must define the format of the AP. To remain compatible with nodes that use the standard DSR protocol, the DSR header is left unchanged. The AP information is included in an extra header after the standard DSR header; all the necessary information fits in this header. Header number 253 is used, which is reserved by the Internet Assigned Numbers Authority (IANA) for testing. Thus, the value 253 of the next-header field in the DSR header indicates that an AP header follows. The resulting frame is presented in fig. 3.

Fig. 3. MAC frame with the AP header

Nodes that are not AP-compatible but run the original DSR protocol simply ignore the AP header, while nodes that are AP-capable can extract and process the enclosed information. We define two AP header formats for the two visit phases of the AP. For the first visit phase, the AP header contains the following information:
• Enable 2nd visit (1 bit): determines whether the AP is in its 1st visit phase of collecting information (value false, or "0") or in its 2nd visit phase of updating the caches (value true, or "1").
• Backtracking (BT) variable (4 bytes): the identifier of the node to which the packet must be routed back (in case the previous node has already been visited but still has unvisited neighbours).
• GenNodeID (4 bytes) and AP_ID (1 byte): the node that generated the AP and the unique identifier of the AP. This pair is unique for each AP. AP_ID must be set to a new value, different from that used for other APs recently initiated by this node. For example, each node may maintain a single counter for generating a new AP_ID value for each AP it initiates. When a backtracking


variable must be stored at a node, it is saved together with this pair of fields; this makes it possible to know later to which AP a backtracking variable belongs.
• Connection matrix: represents the current view of the topology after the 1st AP visit. It contains the information about the direct links discovered during the 1st AP visit. The AP collects information only about bidirectional links in the network.
For the second visit phase, the AP header contains the following information:
• Enable 2nd visit (1 bit): as above, "0" during the 1st visit phase of collecting information and "1" during the 2nd visit phase of updating the caches.
• GenNodeID (4 bytes) and AP_ID (1 byte): the node that generated the AP and the unique identifier of the AP. When a node broadcasts the AP in the 2nd visit, it stores this pair in its ACTIVE_PACKET table; the pair is used to prevent broadcasting the packet multiple times during the 2nd visit. AP_ID must be set to a new value, different from that used for other APs recently initiated by this node.
• List of visited nodes (variable length): a list of addresses of nodes that have already been visited in the 2nd phase.
• Connection matrix (variable length): represents the current view of the topology after the 1st AP visit. It is essential to reduce the amount of data sent: when transmitted, the connection matrix is efficiently encoded and sent in a compressed form.
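The first-visit header layout described above can be sketched as follows. This is a minimal illustration, not the paper's wire format: the exact byte packing, the naive per-node serialization of the connection matrix (the paper compresses it, which is omitted here), and all function names are assumptions.

```python
import struct

AP_FIRST_VISIT = 0   # "Enable 2nd visit" flag cleared
AP_SECOND_VISIT = 1  # "Enable 2nd visit" flag set

def pack_first_visit_header(bt_node_id, gen_node_id, ap_id, conn_matrix):
    # conn_matrix: {node_address: [neighbour addresses]} -- the logical
    # "array of lists" view; serialized naively as (node, count, neighbours).
    flags = AP_FIRST_VISIT
    body = bytearray(struct.pack("!BIIB", flags, bt_node_id, gen_node_id, ap_id))
    for node, neighbours in conn_matrix.items():
        body += struct.pack("!IB", node, len(neighbours))
        body += struct.pack("!%dI" % len(neighbours), *neighbours)
    return bytes(body)

def unpack_first_visit_header(data):
    flags, bt, gen, ap_id = struct.unpack_from("!BIIB", data, 0)
    offset, matrix = struct.calcsize("!BIIB"), {}
    while offset < len(data):
        node, n = struct.unpack_from("!IB", data, offset)
        offset += struct.calcsize("!IB")
        matrix[node] = list(struct.unpack_from("!%dI" % n, data, offset))
        offset += 4 * n
    return flags, bt, gen, ap_id, matrix
```

A round trip through these two functions recovers the flag bits, the backtracking pair, and the connection matrix unchanged.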
There are many other techniques that could be used to reduce the size of the packet. It is also important to be aware that the size of the AP is dynamic: the more nodes are visited, the larger the packet becomes. This can cause substantial overhead and slow down the functioning of the whole network, which is one reason to consider limiting the number of nodes the AP visits before it starts updating the caches.

5.3 Initialization

In normal operation, the node that starts the 2nd visit phase is responsible for generating the next AP after the timeout ACTIVE_PACKET_CREATION. This timeout defines the frequency of updates and must be set depending on the degree of mobility. For exceptional situations (AP loss during the 1st visit phase, at the beginning after network formation, or in the case of a network partition), a timer ACTIVE_PACKET_REGENERATION determines when an AP ought to be created. This timer runs at every node in the network and starts when a node joins the network. The timer is restarted whenever an AP is received. If no APs are received during the ACTIVE_PACKET_REGENERATION timeout, the node selects a random back-off time after which it is responsible for generating an AP. If during


the back-off period the node receives an AP, it cancels the new AP generation and restarts the timer. The timer ACTIVE_PACKET_REGENERATION is thus what triggers AP creation at the beginning after network formation or in the case of a network partition. The most important problem with active packets is that, because of the mobile nature of this kind of network, packets are often prone to getting lost.

6 Performance Evaluation

We compared our algorithm, called DSR-AP (DSR with Active Packets), with the normal operation of DSR, called DSR. The network simulator ns-2 [7] is used, including the Monarch Project's [8] wireless and mobile extensions. The signal propagation model used is a two-ray ground reflection model. The network interface uses the IEEE 802.11 DCF MAC protocol [2]. The mobility model used is the modified version of random waypoint [9], in which the steady-state speed distribution is applied to the first trip to eliminate any speed decay. Node speed is randomly chosen from 1 m/s to 19 m/s, as proposed in [9]. The communication model is Constant Bit Rate (CBR) with four packets per second and a packet size of 64 bytes, in order to factor out the effect of congestion [1]. At every instant of the simulation, 3 simultaneous short-lived flows between random nodes exist. Each flow lasts 10 seconds. The simulation time is 2000 s. As new flows are started every 10 seconds, route requests to the route cache are frequent, which makes it possible to evaluate the performance of DSR-AP.

Fig. 4. Performance evaluation


It can be observed (fig. 4) that the throughput of DSR increases when the pause time is increased, but the throughput of DSR-AP does not change. This means that the AP approach strongly reduces the impact of mobility on performance. When mobility is higher, more links are broken and more packets are dropped. DSR-AP achieves its better performance by increasing the portion of good cache hits. The highest improvement takes place at pause time 0. At pause time 0, the nodes move a lot and the network topology changes quickly. For this reason the cache information becomes obsolete and contains many invalid entries; but thanks to the quick propagation of the AP, in DSR-AP the cache is updated in time. At a pause time of 600 seconds, the network is not changing quickly and DSR shows better performance, though still below that of DSR-AP. Generally, as the pause time increases, the packet delivery ratio constantly increases as well. Regardless of the degree of mobility, the packet delivery ratio of DSR-AP stays almost constant at 99%, which indicates that the AP approach successfully updates the node caches and significantly reduces the chance of packet loss due to a route failure. When a route failure occurs, the up-to-date node cache helps in switching to a new valid route, and unnecessary packet losses are avoided. In the standard DSR, if an intermediate node cannot locate an alternate route in its cache, the data packet is dropped. The proposed approach reduces the average end-to-end delay as well. Since the AP avoids performing new unnecessary route discoveries, some delay time is saved. The requested routes are mostly available in caches, and the cache information collected through APs turns out to be correct. The consequence is insignificant or zero route discovery delay. The improvement is more noticeable for high mobility.
Normalized routing overhead (the ratio of the amount of routing overhead, comprising the routing packets, to the amount of transferred data) decreases when the nodes move less (higher pause times), because fewer links are broken and therefore less routing traffic is needed to repair or discover routes. DSR produces lower control overhead, but at the cost of a low packet delivery ratio and a high end-to-end delay. In low-traffic conditions like those in our scenario, the per-packet overhead is high. If the data rate were higher, the normalized overhead of DSR-AP would decrease at all pause times. Likewise, if the number of connections increased, the normalized overhead of DSR would grow, as more route discoveries would have to be performed (since fewer routes are available in the caches), while the overhead of DSR-AP would stay stable. Improving the performance of the caching system reduces the route discovery control traffic. The improvement is obtained because the AP helps to avoid or speed up many route discoveries by maintaining more valid entries in the node caches. Although the AP overhead is high, it poses no problem for the high capacity of modern WLANs. The increased control overhead does not cause network congestion, significantly increased average packet latency, or packet loss. On the contrary, the proposed proactive link cache maintenance can significantly improve overall routing performance.
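As a toy illustration of the definition above (all packet counts are invented for the example), the metric falls as the volume of delivered data grows while control traffic stays fixed — which is the data-rate argument made in the text:

```python
def normalized_routing_overhead(routing_packets, data_packets):
    """Ratio of routing (control) packets to delivered data packets."""
    return routing_packets / data_packets

# Same control traffic, doubled data traffic -> halved normalized overhead.
low_rate = normalized_routing_overhead(1200, 4000)    # 0.30
high_rate = normalized_routing_overhead(1200, 8000)   # 0.15
```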

7 Conclusions

The route discovery/setup phase becomes the dominant factor for applications with short-lived small transfer traffic (one single packet or a short stream of packets per


transaction) between the source and the destination: resource discovery, text messaging, object storage/retrieval, queries and short transactions. The proposed approach to shortening this phase consists of an Active Packet (AP) that travels through the nodes of the network twice. During the first travel, it visits the nodes and collects fresh network topology information. When the first visit is finished, a second one is started to validate and update the route caches of the nodes with this newly obtained information. This mechanism removes invalid cache links and stores the currently existing links based on the collected topology information. The valid and complete information in the caches makes it possible to speed up route discovery or even to avoid it. It is concluded that the Active Packet approach achieved its objective by systematically improving, to some extent, all the metrics under analysis except overhead. The improvements are in most cases more significant under high node mobility (smaller values of the pause time).

References

[1] Broch, J., Maltz, D.A., Johnson, D., Hu, Y.-C., Jetcheva, J.: A Performance Comparison of Multi-Hop Wireless Ad Hoc Network Routing Protocols. In: Proceedings of the 4th Annual ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), Dallas, Texas, pp. 85–97 (1998)
[2] Johnson, D., Maltz, D., Hu, Y.-C.: The Dynamic Source Routing for mobile ad hoc networks, IETF Internet Draft (July 2004), http://www.ietf.org/internet-drafts/draft-ietf-manet-dsr-10.txt
[3] Holland, G., Vaidya, N.: Analysis of TCP performance over mobile ad hoc networks. In: Proc. 5th ACM/IEEE MOBICOM, pp. 219–230 (1999)
[4] Marina, M., Das, S.: Performance of routing caching strategies in Dynamic Source Routing. In: Proc. 2nd WNMC, pp. 425–432 (2001)
[5] Maltz, D., Broch, J., Jetcheva, J., Johnson, D.: The effects of on-demand behavior in routing protocols for multi-hop wireless ad hoc networks. IEEE J. on Selected Areas in Communications 17(8), 1439–1453 (1999)
[6] He, Y., Raghavendra, C.S., Berson, S., Braden, B.: Active Packets Improve Dynamic Source Routing for Ad-hoc Networks. In: Proceedings OpenArch 2002 (June 2002)
[7] Fall, K., Varadhan, K. (eds.): ns notes and documentation. The VINT Project, UC Berkeley, LBL, USC/ISI, and Xerox PARC (1997)
[8] The Monarch Project: Mobile networking architectures. http://www.monarch.cs.rice.edu/
[9] Yoon, J., Liu, M., Noble, B.: Random Waypoint Considered Harmful. Electrical Engineering and Computer Science Department, University of Michigan
[10] Maltz, D., Broch, J., Jetcheva, J., Johnson, D.: The effects of on-demand behavior in routing protocols for multi-hop wireless ad hoc networks. IEEE J. on Selected Areas in Communications 17(8), 1439–1453 (1999)
[11] Das, S.R., Perkins, C.E., Royer, E.M.: Performance Comparison of Two On-demand Routing Protocols for Ad Hoc Networks. In: Proceedings of the IEEE Conference on Computer Communications (INFOCOM), Tel Aviv, Israel, pp. 3–12 (March 2000)
[12] Johansson, P., Larsson, T., Hedman, N., Mielczarek, B., Degermark, M.: Scenario-based Performance Analysis of Routing Protocols for Mobile Ad-Hoc Networks. In: Proceedings of the 5th ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), Seattle, WA, pp. 195–206 (August 1999)


[13] Holland, G., Vaidya, N.: Analysis of TCP performance over mobile ad hoc networks. In: Proc. 5th ACM/IEEE MOBICOM, pp. 219–230 (August 1999)
[14] Hu, Y.-C., Johnson, D.B.: Caching strategies in on-demand routing protocols for wireless ad hoc networks. In: ACM/IEEE MOBICOM, pp. 231–242 (2000)
[15] Perkins, C., Royer, E., Das, S., Marina, M.: Performance comparison of two on-demand routing protocols for ad hoc networks. IEEE Personal Communications 8(1), 16–28 (2001)
[16] Marina, M.K., Das, S.R.: Performance of Route Cache Strategies in Dynamic Source Routing. In: Proc. Second Wireless Networking and Mobile Computing (WNMC) (April 2001)

tinyLUNAR: One-Byte Multihop Communications Through Hybrid Routing in Wireless Sensor Networks

Evgeny Osipov

Luleå University of Technology (LTU), Department of Computer Science and Electrical Engineering, Campus Porsön, S-971 87 Luleå, Sweden

Abstract. In this paper we consider the problem of implementing a hybrid routing protocol for wireless sensor networks which natively supports the data-centric, geographic-based and address-centric communication paradigms. We demonstrate the feasibility of such a protocol by presenting tinyLUNAR, a reactive routing scheme originally developed for mobile wireless ad hoc networks and adapted to the specifics of sensor networks. In addition to supporting several communication paradigms, tinyLUNAR implements highly efficient multihop forwarding using only a 1-byte field that can be directly encoded in the standard IEEE 802.15.4 MAC header.

1 Introduction

In recent years, wireless sensor networks (WSNs) have emerged as a unique networking environment with respect to routing, amongst other aspects. Firstly, WSNs inherit the need for routing on geographic coordinates from their closest "relative", mobile ad hoc networks (MANETs), due to the spatial distribution of nodes. Secondly, differently from MANETs and the Internet in general, where communications are purely address-centric, the communications in sensor networks are heavily data-centric. With this type of communications, a set of nodes satisfying certain attributes (e.g. readings of on-board sensors larger than a pre-defined threshold in a specific geographic region) should report the information, mainly in a connection-less manner. However, while data-centric communications dominate in sensor networks, there are a number of applications that still require address-centric communications (e.g. sending an alarm message to a base station with a pre-defined unique ID). Finally, being built of devices severely constrained in energy and computing resources, WSNs place serious performance requirements on routing protocols and data forwarding.



The work described in this paper is based on results of the IST FP6 STREP UbiSec&Sens (www.ist-ubisecsens.org). A significant part of this work was performed in the Department of Wireless Networks at RWTH Aachen University while the author was working there.

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 379–392, 2007. © Springer-Verlag Berlin Heidelberg 2007


E. Osipov

The motivation behind this work is straightforward. During the last several years the area of WSN routing has blossomed with a variety of protocols that separately support the data-centric, geographic-based and address-centric communication paradigms. A good overview of the existing approaches is presented in [1]. One general conclusion, however, is that none of the existing protocols supports all three communication styles. This leads to a situation where, in complex WSN applications such as large-scale surveillance in homeland security scenarios, where all types of communications are present, at least three software instances for routing are needed. While not arguing against the development of specialized protocols, in this work we want to demonstrate the feasibility of designing an efficient hybrid protocol natively supporting multiple communication paradigms. In this paper we present tinyLUNAR, an adaptation of the Lightweight Underlay Ad hoc Routing protocol [18] to the specifics of the WSN environment. The major contribution delivered by our protocol is the implementation of multihop forwarding using a one-byte field that can be encoded directly in the IEEE 802.15.4 MAC header. Secondly, tinyLUNAR supports multiple data communication types in one package, exposes flexible interfaces to the application-level programmer, and delivers competitive performance in comparison to existing protocols. The protocol is implemented under TinyOS v.2.x¹ and is currently the default routing scheme for the secure distributed data storage middleware tinyPEDS [3]. To the best of our knowledge, there are currently no competitors to tinyLUNAR in terms of its existing functionality and potential capabilities. The paper is structured as follows. In Section 2 we present the considered network model, overview routing principles in wireless sensor networks, and formulate the design requirements for tinyLUNAR. We outline our solution and present the background material in Section 3.
The details of tinyLUNAR operations follow in Section 4. After overviewing the related work in Section 5, we discuss future developments of the protocol in Section 6. We conclude the paper in Section 7.

2 Problem Statement: Design Objectives for tinyLUNAR

2.1 Networking Model of WSN

We design tinyLUNAR for a network formed by wireless sensor nodes arbitrarily deployed in a monitored area. We do not place any assumptions either on the scale of the network or on its topological structure. The network can be hierarchical, deploying a certain cluster formation scheme, or flat. We consider a relatively static network with low or no node mobility. The low mobility is implicitly present in the form of changing positions of cluster head nodes and eventual node failures. Note that we place no specific assumptions on the structure of node IDs either. Moreover, we consider a general case where each sensor node maintains a set of identities, including a) MAC addresses; b) position information (geographic coordinates, relative position); c) functional roles (cluster head, actuator);

¹ TinyOS web portal. Online. Available: http://www.tinyos.net


d) description of on-board sensors (temperature, humidity); etc. We also do not assume the availability of a centralized directory service for address/name resolution. The target application of the sensor network requires both data-centric and address-centric communications. The network supports anycast, unicast, convergecast and multicast traffic flows.

2.2 Routing in Wireless Sensor Networks

There are two routing approaches applicable to wireless sensor networks. In centralized routing schemes [11,2] the forwarding map is computed at a base station based on the link and neighbour information gathered from the network. While accurate computation of optimal paths can be achieved in this case, the centralized schemes are known for poor scalability, as they require periodic gathering of topology information from all network nodes. In the distributed approaches [5,13,14], on the other hand, the routes are computed cooperatively by all network nodes exchanging network state and control information. In this paper we concentrate on the decentralized routing approaches only. The distributed routing approaches fall into two global categories: proactive and reactive schemes. The proactive approach is inspired by the routing experience in the wireline Internet. The routing topology is created prior to data transmissions from mobile nodes. The routing information is then dynamically updated according to changes of the network topology. In contrast, the reactive routing approach assumes no existing routing state in the network prior to data transmission from a particular station. Upon arrival of a first data packet, the node enters a route discovery phase in which it announces the request for the particular destination address to the network. In reactive routing, the routing information is maintained in the network only for the period of activity of the particular session. The major representative of proactive routing for WSN is DSDV [14]. For reactive routing these are adapted versions of DSR [8] and AODV [13]. A special subclass of reactive routing schemes is constituted by self-routable protocols. In these approaches the forwarding state is not maintained by intermediate relay nodes. Instead, the forwarding decision is taken individually for every packet. Examples of self-routable schemes are geographic routing [10,9] and flooding.

2.3 Design Objectives for tinyLUNAR

The protocols referenced in the previous section represent only a small share of the approaches developed for sensor networks during the last decade. While each scheme targets a specific need in WSNs, one conclusion is clear: currently there is no solution supporting several communication types in one package. The major design objective for tinyLUNAR is the capability to support as many types of traffic flows as possible for both data-centric and address-centric communications. Of course, functional universality comes at the price of increased complexity. With tinyLUNAR we want to understand how complex a universal protocol must be. At the same time, we are not arguing against the development of specialized routing protocols. They are obviously best suited to specially engineered and rather narrow-purpose


WSNs. However, we foresee that in large-scale distributed networks with rich functionality, a single implementation of routing supporting several connection types is more efficient than several separate single-purpose schemes.

3 From LUNAR to tinyLUNAR: Solution Outline

The original Lightweight Underlay Adhoc Routing [18] is a reactive protocol that uses a simple mechanism of flooding and limited re-broadcasting (by default the range is limited to three hops) to establish a label-switching virtual circuit in the forwarding plane. While keeping the core logic of its predecessor, the operation of tinyLUNAR differs in the following ways. Firstly, tinyLUNAR does not interpret data packets as connection initiation events. Instead, the protocol exposes well-defined interfaces that allow upper-layer programmers to configure the characteristics of the path. With these interfaces it is possible to parametrically specify a) various classes of destination identifiers; b) the type of the desired communications; and c) the propagation behaviour for the route request messages. Secondly, in the data forwarding plane we change the dimension of, and impose a certain structure on, the selector² field. The size of the selector field in tinyLUNAR is 1 byte. This potentially allows implementing multihop forwarding using only the type field of the standard IEEE 802.15.4 MAC header, without consuming payload space. Finally, we scale the forced path re-establishment mechanisms to satisfy the bandwidth and energy limitations of wireless sensor networks. While the periodic route rediscovery is still present in tinyLUNAR, the duration of the period is tailored to the specific application that uses the protocol in a particular sensor network. For example, if tinyLUNAR is used in a hierarchical network to build routes toward a cluster head then, naturally, the period is synchronized with the cluster head re-election mechanism.

3.1 Packet Forwarding Via Label Switching

Label switching is technique for overcoming the inefficiency of traditional layer 3 hop-by-hop routing. In the Internet the label switching (or virtual circuit) model is used amongst others in MPLS [16]. A simplified example of multihop data forwarding using label switching is illustrated in Figure 1. Assume each application running at a particular node is assigned with an ID number which allows a deterministic internal multiplexing of data packets between different applications3 . Suppose also our application needs a bidirectional unicast path for communication. In this setting the application with AppID = 120 at nodes with addresses X and Z communicate through a node with address Y . The figure shows the content of the forwarding table 2

² Traversing the network, LUNAR's route request messages obtain a special forwarding label (called a selector) in each router along the path to the destination.
³ Further on, we use the terms application, component, and interface in the sense defined by TinyOS.

tinyLUNAR: One-Byte Multihop Communications Through Hybrid Routing

[Figure 1 here: the forwarding tables (FT) at the nodes with IDs X, Y, and Z for the applications with AppID = 120; each FT lists an outgoing label (Out lbl) and next-hop address (Addr) per entry index. Data packet format: MAC header | Link Label header | Payload.]

Fig. 1. Packet forwarding via label switching

(FT) in each node. The label switching paradigm consists of two phases: path establishment and data forwarding. Upon establishing a connection, node X locally generates an incoming label and creates an entry in its FT. This number will be used by the next-hop node to send packets to X on the backward path. In the created entry node X sets the outgoing label to the application ID 120 as the destination for incoming packets. The empty address field in the first FT entry of X in our example indicates that the packet must be multiplexed locally to the application. Finally, node X signals the generated label to the next-hop node in a path establishment message.

During the forward propagation of this message node Y locally generates its incoming label (in our case 3). As the outgoing label it specifies the incoming label received from node X, and it also records the address of the node from which the request was received. In its turn node Z figures out that it is the destination node and performs the following actions. Firstly, it creates a backward label (in our case 4), which will be used by the local application to send packets to node X. The outgoing label is set to the incoming label generated by node Y; the address of Y is recorded in the corresponding field. Secondly, the local application (AppID = 120) is signaled that to send packets to node X it needs to indicate 4 as the entry point to the virtual circuit between Z and X. Finally, node Z initiates building of the forward path from X to itself by the same procedure described for node X. The path establishment message returns to node X along the established backward path, from Z to Y and from Y to X. The result of the path establishment procedure is two numbers available to our communicating applications, indicating the entry points of the established virtual circuits (2 at node X and 4 at node Z).
The data forwarding, which may commence immediately upon successful path establishment, is straightforward. Each packet is assigned an outgoing label as shown in the figure. In packets sent to the network it is the outgoing label recorded in the FT for a specific incoming label. Normally, the incoming labels map directly to FT indices inside the next-hop node. Originally, this was meant to allow the label lookup and label switching procedures to be placed directly within the switching fabric.

E. Osipov

[Figure 2 here: networking components (concast logic, multicast logic, applications App_i, RTAction1…RTActionN handlers), each with Snd/Rcv interfaces, sit on top of tinyLUNAR; tinyLUNAR uses the Send/Receive interfaces of the ActiveMessage component, which in turn sits on top of MAC + Radio.]

Fig. 2. Position of tinyLUNAR in the TinyOS software architecture

This would enable the label switching technique to perform much faster than the traditional routing method, where each packet is examined in the CPU before a forwarding decision is made⁴.

4 tinyLUNAR: Protocol Description

Overall, the design of tinyLUNAR is partially influenced by the specifics of the TinyOS software architecture. These design choices, however, do not affect the applicability of the protocol to a general class of embedded operating systems. As is the case with its predecessor, we position tinyLUNAR directly above the link layer⁵, as illustrated in Figure 2. It intercepts all broadcast and unicast packets addressed to a node and is responsible for their forwarding to the remote destination and for local demultiplexing between the communicating components. All networking components communicate directly through tinyLUNAR. Note that by networking components we mean pieces of software with self-consistent functionality, e.g. aggregator node election, multicast/convergecast handlers, etc. Each local component executed in a node is assigned a locally unique ID. The networking components can be of one of two types, depending on the type of communication: a) connection-less and b) connection-oriented (uni- or bidirectional).

4.1 Modified Selector Structure

The selector field, appended to each outgoing data packet and to all tinyLUNAR routing control messages, is a one-byte integer; its format is shown in Figure 3(a).

⁴ One general comment is needed on the uniqueness of the labels. Since incoming labels are assigned internally by the relay node, the values are obviously only locally unique. Determinism in multihop forwarding between several neighbors is achieved by the per-link uniqueness introduced by MAC and link-layer addresses. In wireless sensor networks achieving global uniqueness of MAC-level IDs is challenging. However, for correct functioning of label-switching forwarding it is sufficient to have two-hop MAC uniqueness. An example of a distributed address auto-configuration protocol which achieves the needed uniqueness level is [15].
⁵ In TinyOS the ActiveMessage component, situated directly above the MAC layer, can be regarded as the link-layer component.

[Figure 3 here. (a) The selector field: APP/FWD (1 bit) | NetPTR (7 bits), one byte in total. (b) The forwarding table: per-index columns DstAddr (2 B), NetPTR (1 B), AppID (1 B), ConnID (1 B), Flags (1 B), Signature (2 B); example entries 1–3 are referred to in the text.]

Fig. 3. Format of the selector field and the structure of forwarding table in tinyLUNAR

The first bit of the selector field splits the value space into two parts: the application part and the forwarding part. When the APP/FWD bit is zero, tinyLUNAR treats the following seven bits, denoted NetPTR, as the ID of the internal application or service component to which the packet shall be multiplexed. When the APP/FWD bit is set, the NetPTR part is interpreted as the incoming label, as explained in Section 3.1. The rationale behind restricting the selector field to one byte is simple. We want to minimize the overhead due to multihop forwarding. In addition, we want to maximally re-use the standard fields of the IEEE 802.15.4 MAC header; our intention is to encode the selector value in the type field of the latter.

4.2 Generation of Incoming Labels

The format of the forwarding table is shown in Figure 3(b). The purpose of the DstAddr, NetPTR, and AppID fields is essentially the same as in our example in Section 3.1. The ConnID field is used by connection-oriented application components to differentiate between incoming (outgoing) connections. The Flags field is used internally for FT management purposes; Signature is used to detect duplicates of route request messages, and its computation is described below.

The procedure for local generation of incoming labels is as follows. As described in Section 3.1, the major objective of label-switching forwarding is to make the route lookup procedure fast. Recall also that many embedded operating systems, including TinyOS, lack support for dynamic memory allocation. In tinyLUNAR we statically allocate the memory for the forwarding table⁶. The incoming label in our protocol is the index of the FT entry. In this way we completely avoid implementing a sophisticated search algorithm. Upon reception of a packet with a selector belonging to the forwarding space, we find the outgoing label and the MAC address of the next hop by directly indexing the forwarding table with the value extracted from the NetPTR part.

Recall that the selector field must be present in all control messages of tinyLUNAR. In order for the routing layer to direct these packets to the corresponding processing blocks, we assign the selector values for route request (RREQ) and route reply (RREP) messages in the application space. The two values are the same for all RREQs and RREPs generated by any node.

⁶ The number of entries in the FT is an adjustable parameter of tinyLUNAR; it is bounded by 128 entries (the maximal number of incoming labels that can be encoded in the 7 bits of the selector).

[Figure 4 here: a time-sequence diagram over nodes N0, N1, N2 showing RREQ flooding, the RREP returning along the backward path, DATA forwarding, and a route timeout triggering a new RREQ round. Message formats: RREQ = TTL, AppID, num_IDs, DST_ID_SET, match_action, not_match_action, replyto_sel, replyto_addr, Signature; RREP = AppID, replyto_sel, replyto_addr, replyfor_sel; DATA = standard MAC header with the FWD selector in the one-byte type field, followed by the payload.]

Fig. 4. Path establishment and data forwarding phases of tinyLUNAR

4.3 Path Establishment in tinyLUNAR

The path establishment and data forwarding phases of tinyLUNAR are shown schematically on a time-sequence graph in Figure 4. TinyLUNAR is a reactive routing protocol in which route establishment is initiated upon a demand from the application. Normally, in all reactive routing schemes the indication of such a demand is the first data packet arriving from an upper-layer component. Recall, however, that our major design objective is to create a flexible routing protocol supporting different types of addressing and communication, and we consider guessing the routing properties from the content of data packets to be difficult. Instead, we decided to expose a set of intuitive interfaces to upper-layer programmers, allowing them to actively configure the characteristics of the desired routes.

An intuitive abstraction for the route request (RREQ) procedure, from the programmer's point of view, is an "if" statement formed by the source node that travels through the network. The intermediate nodes check whether the identity condition matches their own identity. If it does, the node performs a match action; otherwise a not-match action is invoked. The identity condition is a set of tuples (ID_CLASS, VALUE) that describes an individual node or a set

[Figure 5 here: the interface call sequence RREQinit(); appendRREQcondition(ID_CLASS, VALUE); …; addRREQaction(MATCH_ACTION, ACTION); addRREQaction(NOT_MATCH_ACTION, ACTION); RREQfini(); corresponds to the statement IF (identity condition(s)) DO match_action ELSE DO not_match_action ENDIF, together with example parameters:

Identity conditions (256 ID classes are possible):
ID_CLASS_GEOGRAPHIC — geographic coordinates
ID_CLASS_ROLE — functional role played by the node (e.g. cluster head)
ID_CLASS_ADDRESS — unique address (e.g. MAC)

Match actions (256 MATCH and NOT_MATCH actions are possible):
RT_ACTION_ROUTE_UNIDIR — establish a one-way unicast route
RT_ACTION_ROUTE_BIDIR — establish a bidirectional unicast route
RT_ACTION_SUBSCRIBE — establish a multicast tree
RT_ACTION_REPORT — establish a convergecast tree (directed-diffusion-like)

Not-match actions:
RT_ACTION_REBCAST — re-broadcast to all (simple flooding)
RT_ACTION_REBCAST_GEO — geography-based re-broadcast]

Fig. 5. Interfaces exposed by tinyLUNAR to actively control the characteristics of the path and examples of ID classes, matching and not matching actions

of nodes. The match action allows a programmer to specify the behavior of the destination node(s). With the not-match action the programmer may coordinate the propagation of the RREQ message through the network. Figure 5 illustrates the above concepts. The following piece of NesC code from a networking component shows an example of forming a RREQ message for a bidirectional path to a node which is a cluster head in a neighboring cluster; the RREQ should be propagated using the shortest geographic distance.

    dstregion = getDSTregion();
    if (call tinyLUNAR.RREQinit() == SUCCESS) {
      // First condition should always be "OR"
      call tinyLUNAR.appendRREQcondition(OR_CLAUSE, ID_CLASS_GEOGRAPHIC,
                                         CONDITION_TYPE_EQ, (void*)dstregion);
      // Subsequent conditions could be any
      call tinyLUNAR.appendRREQcondition(AND_CLAUSE, ID_CLASS_ROLE,
                                         CONDITION_TYPE_EQ, (void*)ROLE_CLUSTER_HEAD);
      call tinyLUNAR.addRREQaction(MATCH_ACTION, RT_ACTION_ROUTE_BIDIR);
      call tinyLUNAR.addRREQaction(NOT_MATCH_ACTION, RT_ACTION_REBCAST_GEO);
      call tinyLUNAR.finiRREQpacket(APP_ID);
    }

When called through the RREQfini() interface, tinyLUNAR creates an entry for incoming route reply or data messages, then forms and sends the RREQ


message. In the first vacant entry of the forwarding table, the AppID field is set to the ID of the component that initiated the route discovery. The index of the entry becomes the incoming label. The entry with index 1 in Figure 3(b) is an example of an entry created through the RREQfini() interface.

The format of the RREQ message is shown in Figure 4. There, AppID is the ID of the communicating component; TTL is the maximum hop count that the RREQ is allowed to traverse; num_IDs is the number of items in the following identity set (DST_ID_SET); the match_action and not_match_action fields are set using the addRREQaction() interface; replyto_sel is the locally generated incoming label; and replyto_addr is the MAC address of the node. The Signature field is a CRC computed over all fields of the RREQ message. The completed message is marked with the RREQ selector and is sent to the broadcast MAC address.

The path establishment procedure that follows, with respect to setting up the label-switching path, follows the logic presented in Section 3.1. The entry with index 2 in Figure 3(b) is an example of an entry created at a relay node after processing and re-broadcasting the RREQ message, and entry 3 is created at the destination node before sending the RREP message. Below we only highlight the differences in the path establishment procedure of tinyLUNAR: the identity checking, the decision on RREQ propagation, and the reaction of the destination nodes to received RREQ messages.

The identity checking procedure, while more complex than conventional matching of a single destination address, is rather straightforward and follows a similar logic. With tinyLUNAR a node has a choice of the particular pattern by which to propagate the route request when its identity does not match the one in the RREQ message. In the simplest case this is a "blind" re-broadcast. However, the node may drop a non-matching request if it does not satisfy the propagation parameters.
For example, if one of the target identities is a geographic coordinate and the not-match action is geographic forwarding, then the node will re-broadcast the request only if it is within a certain proximity metric of the target region. As for the match action, the destination node will issue a route reply message only if the specified type of communication requires joining a multicast group, sending acknowledgments (reliable routing), or a bidirectional unicast path. However, when the match action is REPORT, the matching node may start sending data immediately upon completing the processing of the RREQ message.

In the forwarding plane tinyLUNAR provides a connection-less service. When upper-layer components require connection-oriented communication, this can be achieved by using port numbers in a protocol-specific header. TinyLUNAR supports connection-oriented components by interfacing the ConnID field in the forwarding table to such components.

It remains to say that, as is the case with LUNAR, our protocol adopts the simplest route recovery procedure: it re-builds paths periodically from scratch. In tinyLUNAR the route expiration time is tailored to the specifics of relatively static sensor networks. We keep a route active for several minutes, and the timer is refreshed every time a data packet arrives on the route. In hierarchical


Table 1. Memory footprint of tinyLUNAR and tinyAODV

Item                   tinyLUNAR   tinyAODV
RAM: FIB (B)           56          133 (incl. cache)
RAM: supporting (B)    714         204
RAM: total (B)         770         337
ROM: total (B)         1134        2760 (AODV core and AODV fwd components)

networks we foresee that the route expiration timer is synchronized with the cluster head election protocol.

4.4 Implementation Details and Memory Footprint

We implemented tinyLUNAR in the TinyOS 2.x operating system. Note that currently TinyOS uses the type field of the IEEE 802.15.4 MAC header to indicate the communicating applications. In order to minimize changes to the core parts of the operating system, we decided to encode the selector in the payload part of the packet for the proof-of-concept implementation. The memory footprint of the tinyLUNAR component, together with reference numbers from the implementation of tinyAODV (the number of supported FT entries in both protocols is 7), is shown in Table 1. Note that tinyLUNAR consumes less than half the ROM of its counterpart. With respect to RAM consumption the total number for tinyLUNAR is higher; however, its FT occupies less than half the memory of tinyAODV's. The remaining RAM overhead in tinyLUNAR comes from the universality of the protocol. An additional gain in RAM can be achieved by further careful optimization of the implementation.

5 Related Work

The major representatives of routing protocols for wireless sensor networks were overviewed in Section 2. In this section we discuss the relation of tinyLUNAR to the existing approaches. The universality of the protocol, supporting both data-centric and address-centric communication, uniquely positions our approach in the domain of WSN routing. The ports of almost all address-centric protocols from MANETs, such as DSDV and AODV, remain address-centric in WSNs. One exception to this rule could be a modification of DSR in which the forwarding plane (source routing) is separated from the routing plane (reactive route request). With relatively easy modifications the addressing concept presented here can be adapted to work with DSR as well. However, one clear advantage of tinyLUNAR is its ability to conduct multihop communication using only one byte of overhead; the source routing of DSR quickly becomes less efficient on paths longer than one hop. As for data-centric routing, the parametrized specification of the destination node(s) in tinyLUNAR allows address-centric communication to be implemented as a special case of data-centric communication. In principle, any existing data-centric


routing scheme can be adapted according to the principles described in this paper; however, to the best of our knowledge there have been no such attempts. Furthermore, in the data-centric domain the routing issues are normally hidden behind a specific data-centric application. Typical examples are TinyDB [12] and Directed Diffusion [6], where the authors mainly focus on a systematic way of querying information from the network and not on the routing issues.

6 Discussion and Future Developments

6.1 Addressing and Routing for WSN

While tinyLUNAR allows address-centric connections, a general question is to what extent this communication paradigm is suitable for wireless sensor networks. In the original Internet architecture addresses indicate the location of the data, while names (e.g. URLs, email addresses, etc.) are used to describe the communicating parties [17]. In the context of the Internet, however, addresses were mainly introduced to organize the network hierarchically and to perform efficient route lookup based on fixed-length bit sequences. Another property of addresses that justifies their usage in the Internet is centralized control over their spatial distribution. The presence of names and addresses implied a two-stage destination resolution process: first the name is mapped to a destination address using a name resolution service, and then the address is mapped to a forwarding path using a routing protocol.

In a general class of randomly deployed, large-scale wireless sensor networks, however, control over global address distribution and the subsequent centralized topological organization is a rather infeasible task. In this case the Internet's purely address-based routing approach appears redundant for WSNs. This also limits the usability of ports of the address-centric MANET routing protocols, as an additional bandwidth- and energy-consuming directory service is required. TinyLUNAR, on the contrary, follows an alternative communication model, routing by name, which appeared in the early days of the Internet [4] and has recently been revived and proven successful in the context of address self-configuration in MANETs [7].

6.2 Future Development

In general, the active message communication paradigm is a very powerful tool which increases the intelligence of the network. We conjecture that using this mechanism only for internal multiplexing between communication components, as currently implemented in TinyOS, is suboptimal. The selectors used in tinyLUNAR play a twofold role: firstly, they are used as forwarding labels; secondly, inside the end nodes they also indicate the application to which the packet must be multiplexed. Thus, having multiple functional roles, tinyLUNAR selectors better reflect the semantics of active message tags. In our future work we intend to move the tinyLUNAR functionality as close to the MAC layer as possible; in the case of TinyOS this would require modification of the ActiveMessage component.


The current implementation of tinyLUNAR includes a limited set of route request and reply actions. In particular, the RREQ propagation handler based on geographic coordinates, support for in-network processing, and the building of multicast and convergecast trees remain to be implemented; we leave these issues for future work. We also consider adding some connection-oriented functionality to tinyLUNAR by encoding a limited number of ports in the selector field. By this we intend to further reduce the communication overhead for a selected class of connection-oriented applications.

7 Conclusions

In this paper we presented tinyLUNAR, a reactive routing protocol for wireless sensor networks. TinyLUNAR retains the simplicity of its predecessor, originally developed for mobile ad hoc networks. We showed that multihop forwarding in wireless sensor networks can be implemented using only a one-byte field of the IEEE 802.15.4 MAC header by adopting the label-switching forwarding paradigm. The interfaces exposed by tinyLUNAR to upper-layer programmers allow flexible configuration of the protocol's behavior. One distinct feature which makes our protocol unique is its ability to build routes to parametrically specified destinations. With this property tinyLUNAR is capable of establishing routes for both data-centric and address-centric communication.

References

1. Ács, G., Buttyán, L.: A taxonomy of routing protocols for wireless sensor networks. Híradástechnika (December 2006), http://www.hit.bme.hu/~buttyan/publications/AcsB06ht-en.pdf
2. Deng, J., Han, R., Mishra, S.: INSENS: Intrusion-tolerant routing in wireless sensor networks. Technical Report CU-CS-939-02, Department of Computer Science, University of Colorado (2002)
3. Girao, J., Westhoff, D., Mykletun, E., Araki, T.: TinyPEDS: Tiny persistent encrypted data storage in asynchronous wireless sensor networks. Elsevier Journal on Ad Hoc Networks (2007)
4. Hauzeur, B.: A model for naming, addressing and routing. ACM Transactions on Office Information Systems (October 1986)
5. Hill, J., Szewczyk, R., Woo, A., Hollar, S., Culler, D., Pister, K.: System architecture directions for networked sensors. In: Proceedings of the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) (2000)
6. Intanagonwiwat, C., Govindan, R., Estrin, D.: Directed diffusion: a scalable and robust communication paradigm for sensor networks. In: MOBICOM, pp. 56–67 (2000), http://doi.acm.org/10.1145/345910.345920
7. Jelger, C., Tschudin, C.: Dynamic names and private address maps: complete self-configuration for MANET. In: ACM CoNEXT'06, ACM Press, New York (December 2006)
8. Johnson, D.: Routing in ad hoc networks of mobile hosts. In: Workshop on Mobile Computing Systems and Applications, Santa Cruz, CA, USA (1994)
9. Karp, B., Kung, H.T.: GPSR: Greedy perimeter stateless routing for wireless networks. In: Proc. ACM MOBICOM, Boston, MA (August 2000)
10. Kuhn, F., Wattenhofer, R., Zollinger, A.: Worst-case optimal and average-case efficient geometric ad-hoc routing. In: Proc. 4th ACM International Symposium on Mobile Ad Hoc Networking and Computing, ACM Press, New York (2003)
11. Li, Q., Aslam, J., Rus, D.: Hierarchical power-aware routing in sensor networks. In: Proc. DIMACS Workshop on Pervasive Networking (May 2001)
12. Madden, S.R., Franklin, M.J., Hellerstein, J.M.: TinyDB: An acquisitional query processing system for sensor networks. ACM Transactions on Database Systems (March 2005)
13. Perkins, C., Belding-Royer, E., Das, S.: Ad hoc On-Demand Distance Vector (AODV) Routing. RFC 3561 (Experimental) (July 2003), http://www.ietf.org/rfc/rfc3561.txt
14. Perkins, C.E., Bhagwat, P.: Highly dynamic destination-sequenced distance-vector routing (DSDV) for mobile computers. In: SIGCOMM, pp. 234–244 (1994), http://doi.acm.org/10.1145/190314.190336
15. Ribeiro, C.: Robust sensor self-initialization: Whispering to avoid intruders. In: IEEE SECURWARE'07: International Conference on Emerging Security Information, Systems and Technologies, IEEE Computer Society Press, Los Alamitos (2007)
16. Rosen, E., Viswanathan, A., Callon, R.: Multiprotocol Label Switching Architecture. RFC 3031 (Proposed Standard) (January 2001), http://www.ietf.org/rfc/rfc3031.txt
17. Shoch, J.: Inter-network naming, addressing, and routing. In: 17th IEEE Conference on Computer Communication Networks, IEEE Computer Society Press, Los Alamitos (1978)
18. Tschudin, C., Gold, R., Rensfelt, O., Wibling, O.: LUNAR: a lightweight underlay network ad-hoc routing protocol and implementation. In: Next Generation Teletraffic and Wired/Wireless Advanced Networking (NEW2AN'04) (2004)

On the Optimality and the Stability of Backoff Protocols

Andrey Lukyanenko, University of Kuopio, [email protected]

Abstract. In this paper, we analyze backoff protocols, such as the one used in Ethernet. We examine a general backoff function (GBF) rather than just the binary exponential backoff (BEB) used by Ethernet. Under some mild assumptions we find stability and optimality conditions for a wide class of backoff protocols with a GBF. In particular, it is proved that the maximal throughput rate over the class of backoff protocols with N stations is (1 − 1/N)^(N−1), and that the optimal average service time for any station is ES = N/(1 − 1/N)^(N−1), or about Ne for large N. The reasons for the instability of the BEB protocol (for a large enough input rate) are explained.

Keywords: Ethernet, backoff protocol, contention resolution, stability, optimality, queueing theory.

1 Introduction

Ethernet was developed in 1973 by Bob Metcalfe and David Boggs at the Xerox Palo Alto Research Center. Nowadays it is the most popular local area network technology, due to its ease of maintenance and low cost. The principle of Ethernet is that all stations are connected to the same shared medium through transceivers. Whenever a single station wants to send a message, it simply broadcasts it to the medium. When at some moment of time there are two or more messages in the medium, they interfere, and none of them can be received correctly by any station. To deal with such collisions, a resolution protocol was developed (see [7]). It has the following mechanisms:

1. Carrier detection. This mechanism lets stations know when the network has a message. If a station senses that there is a (continuous) phase-encoded signal in the network, it will defer its own transmission until the channel becomes empty.

2. Interference detection. Each station listens to the channel. When it sends a message, it continuously compares the signal that has just been sent with the signal in the network at the same moment of time. If these signals have different values, the message is considered corrupted. Here we introduce the round trip time: the time during which a signal propagates from one end of the network to the other and back.

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 393–408, 2007. © Springer-Verlag Berlin Heidelberg 2007


3. Packet error detection. This mechanism uses checksums to detect corrupted messages. Every message with a wrong checksum is discarded.

4. Truncated packet filtering. This mechanism reduces the load on the system when a message is already known to be corrupted (detected during the round trip time) by filtering such messages at the hardware level.

5. Collision consensus enforcement. When a station sees that its message is corrupted, it jams the whole network by sending a special "jam" signal. This mechanism ensures that no station will consider the corrupted message in the network to be a good one.

Due to these mechanisms, a message is not sent when it is known that there is information in the medium, and if a collision happens, it can be clearly sensed. But there is still the possibility that one station decides that the medium is empty while another has already started a transmission, so that there will be interference at some point of the network. A probabilistic protocol helps to avoid this problem. Our work examines a general type of such probabilistic protocol.

1.1 Protocol

Let there be N stations, each with a queue of messages to send. These stations are connected to a shared medium, where collisions may happen from time to time. To deal with such collisions, the backoff protocol from the Aloha network was adopted (see [2,3]). If a collision occurs in the backoff protocol, the next retransmission will be done in one of the next W moments, where W is a time window of a certain size and the retransmission moment within the window is chosen uniformly. A time slot (or just a slot) is a time equal to the round trip time. We can vary this time in our model in order to bring the model closer to the real protocol, where there is no time synchronization. If a collision does not happen during the first time slot (for a message that requires more than one time slot to be transmitted), it will not happen at all, due to the carrier detection mechanism. Owing to this behavior, we can assume that the transmission of one message takes only one time slot in our model. The segment of time slots [1 . . . W] is called a contention window; the idea is that we select a time slot for transmission uniformly within it. The main goal of this principle is to reduce the load on the network, and hence to increase the probability of successfully resending a message during one of the next time slots within the contention window. Backoff protocols are acknowledgement-based protocols. This means that a station stores information about its own transmissions, namely the number of consecutive unsuccessful transmissions up to the current moment. This number is called a backoff counter, denoted b_i for station i. Initially the counter is equal to 0, and it increases after every unsuccessful attempt. The counter returns to 0 after a successful transmission, and the message is removed from the top of the corresponding queue.
The counter is not increased endlessly; at some point it is stopped, and we decide that the current message cannot be sent successfully and discard it. In Ethernet the upper bound for the backoff counter is 16.

On the Optimality and the Stability of Backoff Protocols


In general, in any backoff protocol, the contention window changes with the backoff counter. The probability of sending in a time slot of the contention window ($W_{b_i}$ for station $i$) is a function of the backoff counter ($b_i$ for station $i$), and we call this probability the backoff function. We treat $f(b_i)$ as a probability, but do not require $\sum_{b_i} f(b_i) = 1$. At any moment of time, when the backoff counter $b_i$ is known, the probability $f(b_i)$ defines a uniform distribution for the next transmission attempt. We set the contention window size via $f(b_i) \le 1$ as $W_{b_i} = f^{-1}(b_i)\,W_0 \ge 1$, where $W_0$ is the minimal contention window size and $f^{-1}(i) \stackrel{\mathrm{def}}{=} \frac{1}{f(i)}$. For $W_0$ we use the value 1 by default, unless stated otherwise. Note that $f^{-1}(i)$ does not necessarily take integer values; for that case, a slightly modified uniform distribution is defined precisely below. After every collision we must either retransmit the message or discard it. First, we increase the backoff counter, which reflects the load of the system. Knowing the backoff function for this counter, we can determine the contention window size. Then we draw a random value from the contention window, representing a delay measured in slots: this is the time that must elapse before the next transmission attempt. We call this random value the backoff time; it is uniformly distributed over the contention window. As an example, in Ethernet the backoff protocol is the BEB (binary exponential backoff), with $f(b_i) = 2^{-b_i}$ for $b_i \le 10$ and $f^{-1}(b_i) = 1024$ for $b_i > 10$. As mentioned before, after $M = 16$ attempts the packet is discarded.
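As an illustration (a Python sketch, not from the paper; the helper names are ours), Ethernet's BEB rule above can be written as:

```python
import random

M = 16  # Ethernet retry limit: the message is discarded once the counter passes M

def contention_window(b, W0=1):
    """W_b = f^{-1}(b) * W0 with f(b) = 2^-b for b <= 10 and f^{-1}(b) = 1024 above."""
    return 2 ** min(b, 10) * W0

def backoff_time(b, W0=1):
    """Backoff time: a slot chosen uniformly from the contention window [1 .. W_b]."""
    return random.randint(1, contention_window(b, W0))
```

A collision increments $b$ and draws a fresh `backoff_time(b)`; a success resets $b$ to 0.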

1.2 Related Work

The BEB protocol has over 30 years of research history. The results obtained appear to contradict each other: some authors say the protocol is stable, others say it is not. The outcome of the analysis depends greatly on the mathematical model used, i.e., on how the Ethernet protocol is mathematically approximated. Here we mention some of the most interesting research outcomes. Kelly [8] (later with MacPhee [9]) showed that, for the infinite model ($N = \infty$), the BEB protocol is unstable for $\lambda > 0.693$ and strongly stable for $\lambda < 0.567$; he also states that "the expected number of successful transmissions is finite for any acknowledgement based scheme with slower than exponential backoff". Aldous [1], with almost the same queue-free infinite model, found that all acknowledgment-based protocols are unstable. Later, however, Håstad et al. [6] discovered that the finite model (i.e., a model with a finite number of stations) with queues is stable for polynomial backoff protocols, while the BEB protocol is unstable for
$$\lambda \ge \lambda_0 + \frac{1}{4N-2}, \quad \text{with } \lambda_0 \approx 0.567.$$

Additionally, the definition of stability in [6] differs from that of the earlier authors: they define stability as the finiteness of the expected time to return to zero (also called positive recurrence [6]) and the finiteness of the expected average total queue size, while the first two authors speak of stability in terms of throughput rate. Several other results on the stability of Ethernet can be found in the literature, see [10, 11, 12, 13, 14]. However, we are mostly interested in the works of Bianchi [5] and Kwak et al. [4], whose models seem to be the most reasonable. In [5], the throughput rate of a wireless local area network is analyzed for a protocol close to the Ethernet protocol. A similar model was considered in [4], which obtains results on the expected delay time and throughput for the exponential backoff protocol of the Ethernet network with a general factor $r$ (in the BEB protocol the factor $r$ equals 2).

2 Analysis

Our analysis is based on the work of Kwak et al. [4] and Bianchi [5]. We use their model and now present some assumptions adopted from [4, 5].

2.1 Model with Unbounded Backoff Counter

We make the following assumptions:

– Our first assumption is that the system is in a steady state. This is reasonable for a large enough number of stations $N$: we assume that a large number of stations makes the network behave uniformly, i.e., a single station does not have a great effect on the performance of the system. For a small number of stations this assumption may be far from reality (it is possible, for example, that one station captures the channel, which is called the capture effect). Under this assumption, at any moment of time there is the same probability $p_c$ that a message sent to the medium collides.
– The second assumption is that all stations are identical, so the performance of every station is the same.
– The third assumption is that the model is under the saturation condition, so there are always messages waiting in the input. Without the saturation assumption the system might show "better" results, but this assumption lets us understand the worst case.
– The last assumption is that time is divided into time slots of equal length. During every time slot we can send a message, and the propagation time of a message is assumed to equal the time slot; every message is synchronized to time-slot boundaries. Recall that if a collision has not happened during the first time slot of a large message (one whose transmission lasts longer than a time slot), then with high probability it will not happen in the remaining time slots of that message.

When a new packet is transmitted for the first time, the (initial) contention window size is $W_0$. After the first attempt, the time within which the transmission

Fig. 1. State model (states $i = 0, 1, 2, \dots$; from each state $i$ the station moves to state $i+1$ on a collision, with probability $p_c$, and returns to state $0$ on a success, with probability $1-p_c$)

will be tried is delayed by an amount of time which is uniformly distributed over the set $\{1, \dots, W_0\}$. Every time we have a collision, we increase this delay set according to the backoff function $f(i)$, where $0 < f(i) < 1$ for $i > 0$, and we assume $f(0) = 1$. After the $i$-th collision the delay is distributed over $\{1, \dots, f^{-1}(i)\,W_0\}$. The initial value $W_0$ can be interpreted as the multiplier of the function $f^{-1}(i)$ (we always view it as a multiplier). After a successful transmission the delay is again distributed over $\{1, \dots, W_0\}$. In our model, the backoff counter specifies the state of each station. Let $D_i$ be the time of staying in state $i$, called the delay; then $D_i$ has the following distribution:
$$\Pr\{D_i = k\} = \frac{1}{X_i} - \frac{Y_i}{X_i(X_i+1)} = \frac{X_i + 1 - Y_i}{X_i(X_i+1)}, \quad k = 1, \dots, X_i,$$
$$\Pr\{D_i = X_i + 1\} = \frac{Y_i}{X_i+1}, \tag{1}$$
where $X_i = \lfloor f^{-1}(i)\,W_0 \rfloor$ and $Y_i = f^{-1}(i)\,W_0 - X_i$. The construction above handles a continuous backoff function, and it is applicable if
$$f^{-1}(i)\,W_0 \ge 1 \quad \text{for all } i. \tag{2}$$
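As a quick numerical check of (1) (a Python sketch of ours; the fractional window value 3.7 is an arbitrary example):

```python
from math import floor

def delay_pmf(w):
    """Distribution (1) of the delay D_i for contention window w = f^{-1}(i) * W0."""
    X, Y = floor(w), w - floor(w)
    pmf = {k: (X + 1 - Y) / (X * (X + 1)) for k in range(1, X + 1)}
    pmf[X + 1] = Y / (X + 1)
    return pmf

pmf = delay_pmf(3.7)
total = sum(pmf.values())                  # probabilities sum to 1
mean = sum(k * p for k, p in pmf.items())  # mean equals (w + 1) / 2 = 2.35
```

The mean $(w+1)/2$ is exactly the expected delay used in the derivation that follows.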

If $f^{-1}(i)\,W_0$ is an integer, then (1) reduces to the uniform distribution
$$\Pr\{D_i = k\} = \frac{1}{f^{-1}(i)\,W_0}, \quad k = 1, \dots, f^{-1}(i)\,W_0.$$

Definition (1) is almost the same as in [4]; now $X_i$ and $Y_i$ are the integer and fractional parts not only of $r^i W_0$, but of $f^{-1}(i)\,W_0$ in general (in [4], $f(i) = \frac{1}{r^i}$). Now we know how long we stay in state $i$; next we find the probability $P_i$ of succeeding in state $i$. The state model remains the same as in [4] (see Figure 1), hence the probability $P_i$ is
$$P_i = (1 - p_c)\,p_c^i, \tag{3}$$
where the collision probability $p_c$ is determined below.

2.2 System Load

Let $ED_i$ be the expected delay in state $i$. It then follows from (1) that
$$ED_i = \frac{W_i + 1}{2}, \tag{4}$$


where $W_i = f^{-1}(i)\,W_0$. We know that we enter state $i$ with probability $P_i$ and stay in $i$ for $ED_i$ time on average. Thus, we can find the probability $\gamma_i$ of being in state $i$ at any instant; it corresponds to the fraction of time that the system spends in this state in the steady-state model:
$$\gamma_i = \frac{ED_i\,P_i}{\sum_{j=0}^{\infty} ED_j\,P_j} = \frac{(W_i+1)(1-p_c)p_c^i}{\sum_{j=0}^{\infty}(W_j+1)(1-p_c)p_c^j} = \frac{(W_i+1)(1-p_c)p_c^i}{W_0(1-p_c)\sum_{j=0}^{\infty} f^{-1}(j)\,p_c^j + 1}. \tag{5}$$

In general we cannot find the exact value of $\sum_{j=0}^{\infty} f^{-1}(j)\,p_c^j$, and furthermore we cannot even expect that the series converges on $0 < p_c < 1$. Let us define a new function
$$F(z) \stackrel{\mathrm{def}}{=} \sum_{j=0}^{\infty} f^{-1}(j)\,z^j. \tag{6}$$
Denote by $\xi = \xi(p_c)$ the (random) number of successive collisions before a successful transfer; then
$$E\,f^{-1}(\xi) = \sum_{i=0}^{\infty} f^{-1}(i)\,P\{\xi = i\} = (1-p_c)\,F(p_c).$$

Note that we cannot treat $F(p_c)$ as a generating function, because of the dependence between $p_c$ and the set of backoff functions $\{f(i), i \ge 0\}$. Substituting (6) into (5), we obtain a compact form of the equation for $\gamma_i$:
$$\gamma_i = \frac{(W_i+1)(1-p_c)p_c^i}{W_0(1-p_c)F(p_c) + 1}. \tag{7}$$

It follows from [4] that the probability of being in state $i$ with backoff timer equal to zero (the station is transmitting in state $i$) is exactly $\frac{\gamma_i}{ED_i}$; hence the transmission probability $p_t$ at any instant is (see (4))
$$p_t = \sum_{i=0}^{\infty} \frac{\gamma_i}{ED_i} = \sum_{i=0}^{\infty} \frac{2(1-p_c)p_c^i}{W_0(1-p_c)F(p_c)+1}.$$
This immediately implies
$$p_t = \frac{2}{W_0(1-p_c)F(p_c)+1}. \tag{8}$$

Another relation between $p_t$ and $p_c$ can be taken from [5]:
$$p_c = P\{\text{collision}\} = 1 - P\{\text{no transmissions from the other } N-1 \text{ stations}\} = 1 - (1-p_t)^{N-1}. \tag{9}$$


So we obtain another equation connecting $p_t$ and $p_c$:
$$p_t = 1 - (1-p_c)^{\frac{1}{N-1}}. \tag{10}$$
Combining (8) and (10) implies
$$\frac{2}{W_0(1-p_c)F(p_c)+1} = 1 - (1-p_c)^{\frac{1}{N-1}}. \tag{11}$$

Note that the right-hand side of (11) is $0$ when $p_c = 0$, is $1$ when $p_c = 1$, and increases monotonically with $p_c$. Let
$$G(p_c) \stackrel{\mathrm{def}}{=} \frac{2}{W_0(1-p_c)F(p_c)+1}. \tag{12}$$
Putting $p_c = 0$ in (12) and taking (2) into account, we obtain
$$0 < G(0) = \frac{2}{W_0\,f^{-1}(0) + 1} = \frac{2}{W_0 + 1} \le 1.$$

To have a unique solution $p_c$ of (11), it is sufficient that the function $G(p_c)$ decreases monotonically for $0 < p_c < 1$. To check this, we calculate the derivative
$$G'(p_c) = \left( \frac{2}{W_0(1-p_c)F(p_c)+1} \right)'_{p_c} = -\frac{2}{\left(W_0(1-p_c)F(p_c)+1\right)^2}\Big[ -W_0 F(p_c) + W_0(1-p_c)F'_{p_c}(p_c) \Big].$$
As we can see, only the rightmost bracketed part of the equation above determines the sign of the derivative $G'(p_c)$. Recall that $F(p_c) \stackrel{\mathrm{def}}{=} \sum_{j=0}^{\infty} f^{-1}(j)\,p_c^j$. Thus we have
$$W_0\left[ -F(p_c) + (1-p_c)F'(p_c) \right] = W_0\left[ -\sum_{j=0}^{\infty} f^{-1}(j)p_c^j + (1-p_c)\sum_{j=0}^{\infty}(j+1)f^{-1}(j+1)p_c^j \right]$$
$$= W_0\left[ -\sum_{j=0}^{\infty} f^{-1}(j)p_c^j + \sum_{j=0}^{\infty}(j+1)f^{-1}(j+1)p_c^j - \sum_{j=0}^{\infty}(j+1)f^{-1}(j+1)p_c^{j+1} \right]$$
$$= W_0\left[ -\sum_{j=0}^{\infty} f^{-1}(j)p_c^j + \sum_{j=0}^{\infty}(j+1)f^{-1}(j+1)p_c^j - \sum_{j=0}^{\infty} j\,f^{-1}(j)p_c^j \right]$$
$$= W_0\left[ \sum_{j=0}^{\infty} (j+1)\left( f^{-1}(j+1) - f^{-1}(j) \right) p_c^j \right].$$

From the last equations we see that the condition $f^{-1}(i+1) \ge f^{-1}(i)$ for every $i$ is enough to make the function $G$ non-increasing. Hence, if there is at least one


Fig. 2. Intersection points for the equation $F(x) = L(x)$, where $F(x)$ is shown for particular cases: $F_Q(x) = \frac{1+x}{(1-x)^3}$ for the quadratic polynomial function, $F_L(x) = \frac{1}{(1-x)^2}$ for the linear function, and $F_E(x) = \frac{1}{1-2x}$ for the BEB protocol.

$k$ with $f^{-1}(k+1) > f^{-1}(k)$ (in addition to the condition $f^{-1}(i+1) \ge f^{-1}(i)$ for every $i$), then there is only one intersection. Note that if $W_0 > 1$, then only the condition $f^{-1}(i+1) \ge f^{-1}(i)$ is needed. This is a sufficient condition for a unique solution $p_c$ of (11). Note that if $f(i) = d$ for all $i$ (the Aloha protocol), then the function $G$ is a horizontal line. Now we solve equation (11) for $F(p_c)$:
$$F(p_c) = \frac{1 + (1-p_c)^{\frac{1}{N-1}}}{W_0(1-p_c)\left(1-(1-p_c)^{\frac{1}{N-1}}\right)}. \tag{13}$$

For $z \in (0,1)$, introduce the function
$$L(z) \stackrel{\mathrm{def}}{=} \frac{1 + (1-z)^{\frac{1}{N-1}}}{W_0(1-z)\left(1-(1-z)^{\frac{1}{N-1}}\right)}. \tag{14}$$

Thus, the solution of the equation $F(p_c) = L(p_c)$ gives us the value of $p_c$ (see Figure 2). We can see that the faster the contention window grows, the smaller the probability of collision. By these graphs, the BEB protocol therefore seems better than, for example, a polynomial one, since the contention window of exponential backoff increases faster than that of polynomial backoff. Later we will show that with a small number of collisions (a big contention window) the channel becomes more and more loaded (more packets wait in the queue, which is called instability in [6]).
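The intersection can also be found numerically. A Python sketch (ours, not the paper's) for the unbounded BEB case, where $F(x) = F_E(x) = 1/(1-2x)$ so the root must lie in $(0, 1/2)$, with $W_0 = 1$:

```python
def L(z, N, W0=1):
    """Level function (14)."""
    t = (1 - z) ** (1 / (N - 1))
    return (1 + t) / (W0 * (1 - z) * (1 - t))

def F_beb(z):
    """F(z) for unbounded binary exponential backoff, f^{-1}(i) = 2^i."""
    return 1 / (1 - 2 * z)

def solve_pc(N):
    """Bisection on F - L: negative near 0 (L blows up), positive near 1/2 (F blows up)."""
    lo, hi = 1e-9, 0.5 - 1e-9
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if F_beb(mid) - L(mid, N) < 0 else (lo, mid)
    return (lo + hi) / 2

pc_11 = solve_pc(11)  # about 0.49 for N = 11 stations
```

The same bisection works for any $F$ from the class above, with the upper search bound adjusted to $F$'s radius of convergence.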

2.3 Expected Transmission Time

Next, we should find the average time for a message to leave the system successfully. In this model the message has no other way to leave, as we do not discard messages. For this reason we introduce a new random variable: let $N_R$ be the state at which a node transmits successfully; in other words, it is the number of collisions before a successful transmission. Obviously, the probability of transmitting successfully exactly in the $i$-th state is
$$P_i = P(N_R = i) = (1-p_c)\,p_c^i, \quad i \ge 0. \tag{15}$$
Hence, the average number of collisions before a transmission is
$$E[N_R] = \sum_{i=0}^{\infty} i\,P_i = \frac{p_c}{1-p_c}.$$

Let $S$ be the service time of a message, that is, the time from the first attempt to transmit up to the instant when the message leaves the system (in other words, the complete transmission time of a message). Now we can compute the average service time $ES$ of a message at the top of the queue. Because the variables $N_R$ and $D_i$ are independent, we can use the known property of conditional expectation to obtain
$$ES = E\left[\sum_{i=0}^{N_R} D_i\right] = E_{N_R}\left[E\left[\sum_{i=0}^{N_R} D_i \,\bigg|\, N_R\right]\right] = E\left[\sum_{i=0}^{N_R} \frac{W_i+1}{2}\right] = \frac{W_0}{2}\,E\left[\sum_{i=0}^{N_R} f^{-1}(i)\right] + \frac{E[N_R]+1}{2}. \tag{16}$$
The first term in the sum is
$$E\left[\sum_{i=0}^{N_R} f^{-1}(i)\right] = E\left[\sum_{i=0}^{\infty} f^{-1}(i)\,\mathbb{1}\{N_R \ge i\}\right] = \sum_{i=0}^{\infty} f^{-1}(i)\,P(N_R \ge i),$$
where $\mathbb{1}(\cdot)$ is the indicator function. Recall that $P(N_R \ge i) = p_c^i$. Then (16) becomes
$$ES = \frac{1}{2}\left(W_0\sum_{i=0}^{\infty} f^{-1}(i)\,p_c^i + \frac{1}{1-p_c}\right).$$
Thus, we finally have
$$ES = \frac{1}{2}\left(W_0\,F(p_c) + \frac{1}{1-p_c}\right). \tag{17}$$

Now we insert (13) into (17) and obtain the following expected delay of the message at the top of the queue:
$$ES = \frac{1}{(1-p_c)\left(1-(1-p_c)^{\frac{1}{N-1}}\right)}. \tag{18}$$


By easy algebra, (18) attains its minimum at
$$p_c^* = 1 - \left(1 - \frac{1}{N}\right)^{N-1}. \tag{19}$$
Recall the well-known limit
$$\left(1 - \frac{1}{N}\right)^{N} \xrightarrow{\;N \to \infty\;} e^{-1}. \tag{20}$$
Hence
$$p_c^* = 1 - \left(1 - \frac{1}{N}\right)^{N-1} \xrightarrow{\;N \to \infty\;} 1 - e^{-1}, \tag{21}$$
and this simple expression gives us the optimal collision probability $p_c^*$ as the number of stations $N$ tends to infinity.

2.4 Stability Condition

Let $\lambda$ be the incoming rate for the whole system. We assume that an incoming message chooses a station in the network uniformly; hence the incoming rate for a single station is $\frac{\lambda}{N}$. The condition that the queue of a station shrinks on average over time is $ES < \frac{N}{\lambda}$. This gives the following stability condition for the protocol:
$$\lambda < N(1-p_c)\left(1-(1-p_c)^{\frac{1}{N-1}}\right). \tag{22}$$
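In code form (a sketch; the helper names are ours), (18) and (22) read, for any collision probability $p_c$:

```python
def ES_unbounded(pc, N):
    """Expected service time (18)."""
    return 1 / ((1 - pc) * (1 - (1 - pc) ** (1 / (N - 1))))

def max_arrival_rate(pc, N):
    """Right-hand side of the stability condition (22); by construction N / ES."""
    return N * (1 - pc) * (1 - (1 - pc) ** (1 / (N - 1)))
```

A protocol operating at collision probability $p_c$ is stable only if the total arrival rate satisfies $\lambda < \mathtt{max\_arrival\_rate}(p_c, N)$.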

2.5 Optimality Condition

Now we can state clearly what conditions must be met to obtain the best possible protocol over the class of backoff protocols. The optimal value (19) of the collision probability, $p_c^* = 1 - \left(1-\frac{1}{N}\right)^{N-1}$, tends to $1-e^{-1}$ as the number of stations $N$ tends to infinity. Moreover, for this (optimal) value the maximal attainable throughput of the system is
$$\lambda^* = \sup\left\{\lambda :\; \lambda < \left(1-\frac{1}{N}\right)^{N-1}\right\} = \left(1-\frac{1}{N}\right)^{N-1}, \tag{23}$$
which tends to $e^{-1}$ as $N$ tends to infinity. It then follows from (18) and (19) that the optimal point $p_c = p_c^*$ gives the following minimal average service time:
$$ES = \frac{N}{\left(1-\frac{1}{N}\right)^{N-1}}. \tag{24}$$

This expectation tends to infinity as $N$ tends to infinity. Note that for an individual station this tendency means instability if we have an infinite number of stations. These results agree with previous results of other authors on the stability of the infinite model. In spite of the tendency to infinity for individual

Fig. 3. State model without saturation condition (the chain of Fig. 1 extended with a state $i = -1$ that represents waiting for an incoming message)

stations, the whole network has a finite expected service time as $N \to \infty$. For us it is most significant to know the service time of every station for some fixed finite parameter $N$ (the number of stations), with which we can tune a real network for the best performance. Now we see from (13) that, to achieve the optimal point, the backoff parameters must satisfy the condition
$$F(p_c^*) = \frac{2N-1}{W_0\left(1-\frac{1}{N}\right)^{N-1}}. \tag{25}$$

This means that if we can find a protocol (i.e., a set of backoff functions $\{f^{-1}(i)\}$) such that (25) holds, then the protocol is the best over the class of backoff protocols in the sense of minimizing the average transmission time of a message.
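For the unbounded exponential family $f^{-1}(i) = a^i$ we have $F(z) = 1/(1-az)$ for $az < 1$, so (25) can be solved for the factor $a$ in closed form. This closed form is our illustration, not given in the paper; a sketch with $W_0 = 1$:

```python
def optimal_factor(N, W0=1):
    """Factor a of f^{-1}(i) = a^i satisfying (25), i.e. 1 / (1 - a * pc_star) = F(pc_star)."""
    q = (1 - 1 / N) ** (N - 1)     # (1 - 1/N)^(N-1); tends to 1/e
    pc_star = 1 - q                # optimal collision probability (19)
    T = (2 * N - 1) / (W0 * q)     # right-hand side of (25)
    return (1 - 1 / T) / pc_star

a = optimal_factor(100)   # an exponential factor between 1 and 2 for moderate N
```

Note that $a \cdot p_c^* = 1 - 1/T < 1$ automatically, so the geometric series for $F$ converges at the optimal point.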

2.6 Eliminating the Saturation Condition

We extend our analysis by omitting the saturation condition. To do this we modify the model as follows (see Figure 3). In the new model almost everything remains the same, except that we add a new state $-1$ representing the arrival of messages. We assume that there is always some incoming rate ($\lambda > 0$), hence messages keep arriving at the incoming queue; the only difference is that we may need to wait for them for a distinct delay time. Let this time be the following random variable:
$$D_{-1} \stackrel{\mathrm{def}}{=} \begin{cases} 0, & \text{if } q > 0, \\ \tau, & \text{if } q = 0, \end{cases}$$
where $\tau$ is a random variable representing the time between incoming messages ($E[\tau] = \frac{1}{\lambda}$) and $q$ is the number of waiting messages in the queue. We can write the time delay for the incoming queue as $D_{-1} = \tau \cdot \mathbb{1}\{q = 0\}$. As we can see, there is a tight dependence between $\tau$ and the queue size $q$. The technique of negative drift can help us analyze the stability of this model. This technique says that if, outside some finite set, the expected change of the queue size has a negative tendency, then the system is positive recurrent (see [15]). (Another condition is also required, to exclude infinite jumps.) To use this technique we define a random process $X(t) = \{q(t), b(t), l(t)\}$ for some


Fig. 4. State model with bounded counter (the chain of Fig. 1 truncated at state $i = M$; a collision in state $M$ discards the message and returns the station to state $0$)

station, where $q(t)$ is the number of waiting messages in the queue, $b(t)$ is the value of the backoff counter, and $l(t)$ is the number of time slots remaining until the end of the current contention window at moment $t$ (note that $b(t) = 0$ and $l(t) = 0$ whenever $q(t) = 0$). For our purposes, positive recurrence is enough for the system to be stable. Let us choose the finite set (outside which we need negative drift) so that outside it the station has at least one waiting message. Under this condition the station is saturated, and our model becomes identical to the saturation model already studied. The negative drift condition is hence identical to inequality (22).

2.7 Model with Bounded Backoff Counter

We can easily extend the previous results to the model with an upper bound on the backoff counter (see Figure 4). In this model, if the backoff counter exceeds some value $M$, the message is discarded and we take a new one from the queue. The probability of discarding a message is $P\{\text{discard}\} = p_c^{M+1}$. Now we recalculate the values for the new model; some of them do not depend on the number of states and hence remain the same. One of the unchanged values is the delay time $D_i$. The probability of entering state $i$ has to be modified; now we find it as
$$P_i = \frac{(1-p_c)\,p_c^i}{1-p_c^{M+1}}. \tag{26}$$

Hence the probability of being in state $i$ at any moment of time is
$$\gamma_i = \frac{P_i\,ED_i}{\sum_{j=0}^{M} P_j\,ED_j} = \frac{(W_i+1)(1-p_c)p_c^i}{\sum_{j=0}^{M}(W_j+1)(1-p_c)p_c^j} = \frac{(W_i+1)(1-p_c)p_c^i}{W_0(1-p_c)\sum_{j=0}^{M} f^{-1}(j)\,p_c^j + 1 - p_c^{M+1}}. \tag{27}$$

Using the same arguments, we find that
$$p_t = \sum_{i=0}^{M} \frac{\gamma_i}{ED_i} = \sum_{i=0}^{M} \frac{2(1-p_c)p_c^i}{W_0(1-p_c)F_M(p_c) + 1 - p_c^{M+1}} = \frac{2\left(1-p_c^{M+1}\right)}{W_0(1-p_c)F_M(p_c) + 1 - p_c^{M+1}}, \tag{28}$$


where $F_M(p_c) = \sum_{i=0}^{M} f^{-1}(i)\,p_c^i$. Note that equation (10) is independent of the upper bound on the backoff counter, hence we can use it here. Combining (28) and (10), we obtain the solution for $F_M(p_c)$:
$$F_M(p_c) = \frac{\left(1+(1-p_c)^{\frac{1}{N-1}}\right)\left(1-p_c^{M+1}\right)}{W_0(1-p_c)\left(1-(1-p_c)^{\frac{1}{N-1}}\right)}. \tag{29}$$

Applying almost the same formula for the service time (now with a finite sum instead of an infinite one), we have
$$ES = E\left[\sum_{i=0}^{\min\{M,N_R\}} D_i\right] = E_{N_R}\left[E\left[\sum_{i=0}^{\min\{M,N_R\}} D_i \,\bigg|\, N_R\right]\right] = E\left[\sum_{i=0}^{\min\{M,N_R\}} \frac{W_i+1}{2}\right]$$
$$= \frac{W_0}{2}\,E\left[\sum_{i=0}^{\min\{M,N_R\}} f^{-1}(i)\right] + \frac{E[\min\{M,N_R\}]+1}{2}, \tag{30}$$

where a similar computation is possible for the last component by virtue of the same random variable $N_R$. For the first expectation,
$$E\left[\sum_{i=0}^{\min\{M,N_R\}} f^{-1}(i)\right] = E\left[\sum_{i=0}^{M} f^{-1}(i)\,\mathbb{1}\{N_R \ge i\}\right] = \sum_{i=0}^{M} f^{-1}(i)\,P(N_R \ge i) = \sum_{i=0}^{M} f^{-1}(i)\,p_c^i = F_M(p_c),$$
and, in the same way, $E[\min\{M,N_R\}] + 1 = \sum_{i=0}^{M} P(N_R \ge i) = \frac{1-p_c^{M+1}}{1-p_c}$.

Combining the last equations again, we have
$$ES = \frac{1}{2}\left(W_0\,F_M(p_c) + \frac{1-p_c^{M+1}}{1-p_c}\right). \tag{31}$$

But we have already found $F_M(p_c)$; substituting (29) into (31) gives
$$ES = \frac{1-p_c^{M+1}}{(1-p_c)\left(1-(1-p_c)^{\frac{1}{N-1}}\right)}, \tag{32}$$

which is equal to (18) when $M = \infty$. The negative drift condition for the bounded model becomes
$$\lambda < \frac{N(1-p_c)\left(1-(1-p_c)^{\frac{1}{N-1}}\right)}{1-p_c^{M+1}}. \tag{33}$$
When $M = \infty$, (33) gives (22).


Fig. 5. Collision probability for Ethernet, where $L(x)$ is plotted for 11, 51, 101, 501 and 1001 stations, $F_E^{16}(x)$ is the function for Ethernet, and $F_{2.1}^{16}(x)$, $F_{2.4}^{16}(x)$ are exponential backoff functions (with parameter $a = 2.1$ and $a = 2.4$, respectively) with at most 16 attempts to transmit (as for Ethernet)

3 Application to the Ethernet Case

Now we apply these results to the real Ethernet protocol (see Section 1.1). In addition, we present two exponential protocols that seem to show better performance in mathematical models. We examine these protocols for 11, 51, 101, 501 and 1001 stations, and at the end we give an outline of the estimated performance for these cases. In the introduction we said that Ethernet is a bounded BEB protocol with $M = 16$. Hence, for the Ethernet policy we have
$$F_E^{16}(p_c) = \sum_{i=0}^{10} 2^i p_c^i + \sum_{i=11}^{16} 2^{10} p_c^i = \frac{1-(2p_c)^{11}}{1-2p_c} + 2^{10}\,p_c^{11}\,\frac{1-p_c^6}{1-p_c}.$$

Additionally, we consider another (exponential) set of backoff functions:
$$F_a^M(x) = \sum_{i=0}^{M} a^i x^i = \frac{1-(ax)^{M+1}}{1-ax}.$$

In particular, we are interested in the functions $F_a^{16}(x)$ (particularly $F_{2.1}^{16}(x)$ and $F_{2.4}^{16}(x)$); see Figure 5 for their behavior. In the table below we give comparative data on the behavior of the network depending on the protocol and the number of stations. The leftmost column shows the number of stations, the next column the point of intersection $p_c$, the following column the average service time for the network, and the last column the probability of discarding for that protocol. Every cell contains three numbers, separated by '/'; these are, respectively, the data for $F_{2.4}^{16}(x)$, $F_{2.1}^{16}(x)$ and $F_E^{16}(x)$.

    N     |        p_c         |        ES/N        |            P{discard}
 N = 11   | 0.48 / 0.54 / 0.62 | 2.77 / 2.64 / 2.59 | 3·10^-6 / 3·10^-5 / 3·10^-4
 N = 51   | 0.54 / 0.62 / 0.74 | 2.76 / 2.69 / 2.83 | 3·10^-5 / 3·10^-4 / 6·10^-3
 N = 101  | 0.57 / 0.65 / 0.80 | 2.74 / 2.71 / 3.02 | 6·10^-5 / 7·10^-4 / 0.022
 N = 501  | 0.64 / 0.73 / 0.94 | 2.72 / 2.82 / 3.86 | 5·10^-4 / 5·10^-3 / 0.349
 N = 1001 | 0.67 / 0.77 / 0.99 | 2.73 / 2.93 / 3.52 | 1.2·10^-3 / 0.012 / 0.809

From the table we conclude that the Ethernet protocol is the better choice for a small number of stations (around 10), while the protocol $F_{2.1}^{16}(x)$ is better for 51 or 101 stations, and the protocol $F_{2.4}^{16}(x)$ is a good choice for 501 or 1001 active stations. The existing Ethernet protocol is better avoided when the number of stations exceeds 500, because the probability that a message is discarded becomes high. In a real network, where some stations may be almost "silent", the number of active stations is much lower than the actual number of stations in the network.
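The table entries can be reproduced from (28), (10) and (32); a Python sketch (ours) for the Ethernet column $F_E^{16}$, with $W_0 = 1$, $M = 16$ and $N = 11$:

```python
def F_E16(z):
    """Ethernet's bounded BEB: sum_{i=0}^{10} 2^i z^i + sum_{i=11}^{16} 2^10 z^i."""
    return sum(2 ** min(i, 10) * z ** i for i in range(17))

def solve_pc(N, M=16, W0=1):
    """Bisection on the fixed point obtained by equating (28) with (10)."""
    def g(z):
        t = (1 - z) ** (1 / (N - 1))
        pt = 2 * (1 - z ** (M + 1)) / (W0 * (1 - z) * F_E16(z) + 1 - z ** (M + 1))
        return pt - (1 - t)          # positive near 0, negative near 1
    lo, hi = 1e-9, 1 - 1e-9
    for _ in range(100):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

N, M = 11, 16
pc = solve_pc(N)                     # about 0.62, cf. the Ethernet cell for N = 11
ES = (1 - pc ** (M + 1)) / ((1 - pc) * (1 - (1 - pc) ** (1 / (N - 1))))  # (32)
discard = pc ** (M + 1)              # about 3e-4, cf. the last column
```

Replacing `F_E16` with $F_a^{16}$ for $a = 2.1$ or $a = 2.4$ reproduces the other two entries in each cell.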

4 Conclusions

We have found stability conditions for the steady-state models of backoff protocols. These conditions were obtained both for the bounded and for the unbounded retry-limit models. Consequently, we can analytically measure the throughput of the system and the service time. We have also found the optimality conditions. The question of optimality in general is still open, but for our model (unbounded retry limit) we show that an exponential function is the best choice for the backoff function. In the paper we present graphs that show the relation between the level function $L(x)$ and the "extended" backoff function $F(x)$. Moreover, from the graph in Figure 2 we can see why sub-linear and super-exponential functions are not good choices. Finally, we show the "connection" between the stability of bounded backoff and the successful throughput rate ($1 - P\{\text{discard}\}$). In this paper we present analytical solutions, but some questions remain open, for example the optimality (for different $M$) of general bounded backoff (particularly Ethernet) and the appropriateness of the steady-state assumption. A simulation would help to answer the last question, but at present there are good reasons for supposing that this assumption is appropriate.

Acknowledgment. I would like to thank Prof. Martti Penttonen, who introduced me to the problem arising in Ethernet and helped me a lot with the editing of this article. I thank Prof. Evsey Morozov for many valuable comments which have improved the presentation of the paper, and for attracting my attention to the principle of negative drift and to the excellent book of Meyn and Tweedie.


References

1. Aldous, D.J.: Ultimate Instability of Exponential Back-Off Protocol for Acknowledgement-Based Transmission Control of Random Access Communication Channel. IEEE Trans. on Information Theory 33(2), 219–223 (1987)
2. Abramson, N.: The ALOHA system - another alternative for computer communications. AFIPS 37, 281–285 (1970)
3. Abramson, N.: Development of the ALOHANET. IEEE Trans. on Inform. Theory 31(2), 119–123 (1985)
4. Kwak, B., Song, N., Miller, L.E.: Performance analysis of exponential backoff. IEEE/ACM Trans. Netw. 13(2), 343–355 (2005)
5. Bianchi, G.: Performance Analysis of the IEEE 802.11 Distributed Coordination Function. IEEE J. on Sel. Areas in Commun. 18(3), 535–547 (2000)
6. Håstad, J., Leighton, T., Rogoff, B.: Analysis of backoff protocols for multiple access channel. SIAM J. Comput. 25(4), 740–774 (1996)
7. Metcalfe, R., Boggs, D.: Ethernet: Distributed Packet Switching for Local Computer Networks. Commun. ACM 19(7), 395–404 (1976)
8. Kelly, F.P.: Stochastic models of computer communication systems. J. Roy. Statist. Soc. B 47, 379–395 (1985)
9. Kelly, F.P., MacPhee, I.M.: The number of packets transmitted by collision detect random access schemes. Annals of Prob. 15, 1557–1568 (1987)
10. Goodman, J., Greenberg, A.G., Madras, N., March, P.: Stability of binary exponential backoff. J. of the ACM 35(3), 579–602 (1988)
11. Fayolle, G., Flajolet, P., Hofri, M.: On a functional equation arising in the analysis of a protocol for a multi-access broadcast channel. Adv. Appl. Prob. 18, 441–472 (1986)
12. Rosenkrantz, W.A.: Some theorems on the instability of the exponential back-off protocol. Performance '84, 199–205 (1985)
13. Goldberg, L.A., MacKenzie, P.: Analysis of Practical Backoff Protocols for Contention Resolution with Multiple Servers. J. of Comp. and System Sciences 58, 232–258 (1999)
14. Goldberg, L.A., MacKenzie, P., Paterson, M., Srinivasan, A.: Contention Resolution with Constant Expected Delay. Journal of the ACM 47(6), 1048–1096 (2000)
15. Meyn, S.P., Tweedie, R.L.: Markov Chains and Stochastic Stability. Springer, London (1993)

Maximum Frame Size in Large Layer 2 Networks

Karel Slavicek

CESNET and Masaryk University, Botanicka 68a, 60200 Brno, Czech Republic
[email protected]

Abstract. The Ethernet protocol, originally designed for local area networks, has become very popular and is in practice the only protocol used in local and metropolitan networks. Currently both the IETF and the IEEE are working on improvements of this protocol to make it more attractive for use in WAN networks. These modifications may add additional fields to the ethernet header, so that the effective maximum transportable data unit is decreased. From this follows the problem of how to inform upper layers about the maximum data unit that the ethernet layer can transport. This problem is not addressed in the IETF and IEEE standards under preparation. This paper points out this problem and proposes some possible solutions.

1 Introduction

Ethernet is the grand unifying technology that enables communication via the grand unifying network protocol, IP. Today's applications (based on IP) are passed seamlessly through a complex ethernet system across carrier networks, enterprise networks and consumer networks. Since its origin more than 30 years ago, ethernet has evolved to meet the increasing demand of IP networks. Due to its low implementation cost, relative simplicity and easy maintenance, ethernet has grown to the point that today almost all IP traffic starts and ends on an ethernet connection. Ethernet has evolved beyond just offering a fast and reliable local area network; it is now being used for access, metropolitan and wide area networks. The next step on the road is carrier-class networks. Let us consider why ethernet is preparing to colonise carrier backbone networks just now. We can see that ethernet has grown to SONET/SDH speeds and comes rather close to SONET/SDH transport properties. From the experimental ethernet of roughly 1976, through the first standardised ethernet in 1982, to the IEEE standard in 1985, the world of networking protocols was very diverse, and it was difficult to predict which protocol would survive. In 1995 fast ethernet was standardised, and we can say that it defeated its main data-network competitors, token ring and 100VG-AnyLAN. However, the carriers' transport protocols, SONET and SDH, at that time offered more bandwidth, better resiliency and quality of service. It took a rather long way through gigabit ethernet (standardised in 1998) to ten gigabit ethernet (standardised in 2002).

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 409–418, 2007.
© Springer-Verlag Berlin Heidelberg 2007

Ten gigabit ethernet provides the


same speed as most carriers' backbone networks, and it will probably keep up with the speed of SONET and SDH for good. Up to gigabit ethernet, it is reasonable to transport ethernet frames inside SONET/SDH concatenated VC-4 containers. Ten gigabit ethernet would consume a whole STM-64 frame or a quarter of an STM-256. The STM-256 is not so widely deployed because of its stronger constraints on the fiber-optic infrastructure. The next bandwidth step in ethernet, 100 gigabit, will probably come earlier than the next bandwidth step in SONET/SDH, because almost all of today's data traffic starts and ends on an ethernet connection and ethernet technology is very price-competitive with legacy SONET/SDH technologies. For all the above reasons, deploying ethernet into provider backbone networks is a very straightforward step.

2 Carrier Grade Ethernet There are more reasons which make service providers think about ethernet. With development of Internet and data networking in general there appeared customers who want to interconnect their remote branches to central office or/and between them selves. Such interconnections were implemented as leased layer 2 lines. Network operators usually offered leased lines on top of SDH or PDH transport network. Typical router at that time offered a lot of serial interfaces of type V.35, X.21 or G.703. As customers wanted more bandwidth, the ethernet was natural replacement of serial lines. A natural property of ethernet is the ability to connect large number of hosts via uniform layer 2 environment. It’s something very different from former networks offering a set of point-to-point connections. SONET/SDH networks is not able to simply simulate ethernet environment for more than two hosts. Another way how service providers may offer interconnection of remote customers facilities is to use layer 3 protocols - typically MPLS - and build a virtually dedicated infrastructure for each customer. Customers private layer 3 routes are stored in Virtual Route Forwarding (VRF) tables. If the customer requests layer 2 service, the situation is rather complicated even in case of MPLS usage. Originally only point-to-point ethernet over MPLS tunnels were defined. Even first IETF standards [9], [10] describing layer 2 network transport over MPLS were speaking about pseudo wires. Even though a protocol for point to multipoint layer 2 connections over MPLS was developed during pas few years. This protocol is called VPLS ind is specified in [12] and [13]. The VPLS protocol is rather complicated and uses many layers of encapsulation. The customers ethernet frame is encapsulated into MPLS frame and the MPLS frame is once more encapsulated into another layer 2 frame. The providers MPLS network has to emulate ethernet switching environment. 
MPLS PE routers have to learn the MAC addresses carried over the VPLS network and distribute this information to the other MPLS PE routers serving the given VPLS instance. In other words, MPLS PE routers have to maintain the customers' layer 2 forwarding database. As a consequence, the VPLS protocol is implemented only in very expensive and complicated "carrier-grade" networking equipment. On the other hand, legacy carrier protocols (SONET/SDH and PDH) offer some properties which are not yet standardised for Ethernet. Two largely independent working groups are working on standardising these properties for Ethernet.

Maximum Frame Size in Large Layer 2 Networks


One of these groups is IEEE 802.1. IEEE starts from the original Ethernet specification and tries to address network operators' needs without employing layer 3 protocols. The second is the IETF TRILL group. The IETF has extensive experience and expertise in the Internet protocol suite; it puts more emphasis on layer 2 / layer 3 cooperation and does not hesitate to use layer 3 protocols where they are useful. A third group working on Ethernet improvements is ITU-T Study Group 15. The work of the ITU-T concentrates mainly on Ethernet management and is out of the scope of this paper.

3 IEEE Approach

In the IEEE approach we can see the endeavour to build a truly homogeneous set of networking protocols which can seamlessly carry data across local, metropolitan and provider backbone networks. The basis for scaling Ethernet networks is the IEEE 802.1Q Virtual Local Area Network standard, which allows us to construct logical layer 2 networks on a single physical network infrastructure. This standard was, of course, designed for a simple enterprise LAN environment, and it therefore has scalability limitations. The limiting factor is that the VLAN ID is only 12 bits wide, so we can use at most 4096 VLANs. This number is enough for an enterprise, but may not be enough for larger metropolitan or even wide area networks. For this reason the Q-in-Q concept was introduced: the IEEE 802.1ad standard defines "stacking" of two .1Q labels. Packets coming from the customer's network into the provider's equipment are labelled with this new label. The IEEE 802.1ad frame is shown in Fig. 1.

Dest. Addr. | Src. Addr. | T | Service VL-ID | T | Cust. VL-ID | T/L | User Data | FCS

Fig. 1. IEEE 802.1ad frame format

The main benefit of this approach is the independence of the customer's and the provider's addressing schemes (VLAN numbering). This is the first prerequisite for Ethernet being usable in large networks. However, the IEEE 802.1ad standard does not solve the problem of the limited number of VLANs usable in the provider backbone; for really large network operators this may still be limiting. Moreover, all the operator's devices carrying customer data would have to learn the MAC addresses of the customer's devices, which may lead to a very large forwarding information base in the provider's core networking equipment. Inside the provider backbone some sort of aggregation is clearly needed. For this case IEEE is working on a new standard, 802.1ah or MAC-in-MAC. This protocol should be used on provider backbone bridges. It simply encapsulates the original 802.1ad (or 802.1Q, or "legacy Ethernet") frame into a new Ethernet frame, so that provider bridges need to know only a limited number of MAC addresses. Moreover, the 802.1ah header provides two new tags: the B-tag, or backbone tunnel tag, and the I-tag, or extended services tag. This enables enough service instances across large backbone networks. The frame format of 802.1ah is shown in Fig. 2.
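The tag-stacking idea behind 802.1ad (and, one level up, the re-encapsulation of 802.1ah) can be sketched as follows. This is our illustration, not code from the paper: field layouts are simplified, and the FCS as well as the 802.1ah B-TAG/I-TAG fields are omitted.

```python
import struct

# TPID values: 0x88A8 is the 802.1ad S-tag, 0x8100 the 802.1Q C-tag.
TPID_C = 0x8100
TPID_S = 0x88A8

def vlan_tag(tpid: int, vid: int, pcp: int = 0) -> bytes:
    """4-byte VLAN tag: 16-bit TPID, then 3-bit PCP, 1-bit DEI, 12-bit VID."""
    assert 0 <= vid < 4096  # only 12 bits, hence at most 4096 VLANs per tag
    return struct.pack("!HH", tpid, (pcp << 13) | vid)

def qinq_frame(dst: bytes, src: bytes, s_vid: int, c_vid: int,
               ethertype: int, payload: bytes) -> bytes:
    """802.1ad frame: provider S-tag stacked in front of the customer C-tag."""
    return (dst + src + vlan_tag(TPID_S, s_vid) + vlan_tag(TPID_C, c_vid)
            + struct.pack("!H", ethertype) + payload)

def mac_in_mac(b_dst: bytes, b_src: bytes, inner_frame: bytes) -> bytes:
    """802.1ah idea: the whole customer frame rides inside a new backbone
    frame (B-TAG and I-TAG omitted here for brevity)."""
    return b_dst + b_src + inner_frame
```

Note how the backbone bridges only ever see the outer addresses produced by `mac_in_mac`, which is exactly why their forwarding tables stay small.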

K. Slavicek

Dest. Addr. | Src. Addr. | B-TAG | I-TAG | Original 802.1ad frame | FCS

Fig. 2. IEEE 802.1ah frame format

Another problem of Ethernet is the low efficiency of physical link utilisation. The root of the problem is that an Ethernet frame does not carry any hop count, time-to-live or similar field by which frames wandering in a circle could be identified. Instead, Ethernet bridges run a spanning tree algorithm and construct a logical tree topology on top of the general physical one to prevent packets from circulating in the network. The IEEE 802.1aq group is working on a new protocol called shortest path bridging, which should improve physical topology utilisation.

4 IETF Approach

The IETF approach defines a new type of device called an RBridge. A proper definition of this device can be cited from [3]: "RBridges are link layer (L2) devices that use routing protocols as a control plane. They combine several of the benefits of the link layer with network layer routing benefits. RBridges use existing link state routing (without requiring configuration) to improve RBridge to RBridge aggregate throughput. RBridges also provide support for IP multicast and IP address resolution optimizations. They are intended to be applicable to similar L2 network sizes as conventional bridges and are intended to be backward compatible with those bridges as both ingress/egress and transit. They also support VLANs (although this generally requires configuration) and otherwise attempt to retain as much 'plug and play' as is already available in existing bridges."

The IETF TRILL group tries to support increased RBridge-to-RBridge bandwidth and to keep the layer 2 network configuration-free, as current layer 2 networks are, while still being compatible with existing bridges and hubs. Of course, the configuration-free property is (as in current layer 2 networks) usable only in small networks; in large layer 2 networks it is theoretically possible but, as we can confirm from practical experience, very unstable. RBridges use the Intermediate System to Intermediate System (IS-IS) routing protocol to discover RBridge peers, determine the RBridge link topology, advertise layer 2 reachability information and establish layer 2 delivery along shortest paths. Forwarding information is derived from a combination of learning attached MAC addresses and path computation using the link-state routing protocol. RBridges thus have some characteristics of both bridges and routers. A unicast Ethernet frame is forwarded toward the RBridge advertising its destination MAC address; a distribution tree is used for forwarding broadcast, multicast and unknown unicast traffic.
This solution has no impact on the Internet network layer architecture. However, it is designed to cooperate with the IPv4 ARP and IPv6 ND protocols. More precisely, RBridges should cache ARP/ND responses and may, instead of forwarding an ARP/ND request, send the ARP/ND response on behalf of the target if they know the proper IP-to-MAC address mapping. Let us cite from the RBridge protocol definition [1]:


"RBridges SHOULD implement an 'optimized ARP/ND response'. When the target's location is assumed to be known by the first RBridge, it need not flood the query. Alternative behaviors of the first Designated RBridge that receives the ARP/ND query would be to:

1. send a response directly to the querier, with the layer 2 address of the target, as believed by the RBridge
2. encapsulate the ARP/ND query to the target's Designated RBridge, and have the Designated RBridge at the target forward the query to the target. This behavior has the advantage that a response to the query is authoritative. If the query does not reach the target, then the querier does not get a response
3. block ARP/ND queries that occur for some time after a query to the same target has been launched, and then respond to the querier when the response to the recently-launched query to that target is received

The reason not to do the most optimized behavior all the time is for timeliness of detecting a stale cache. Also, in the case of secure neighbor discovery (SEND) [RFC3971], cryptography might prevent behavior 1, since the RBridge would not be able to sign the response with the target's private key."

This fact can be used for distributing information about the MTU. The TRILL concept encapsulates the transported layer 2 data frame behind a TRILL header. The overall RBridge architecture is shown in Fig. 3; in this respect the TRILL approach is very similar to the IEEE 802.1ah one. The structure of the TRILL frame is shown in Fig. 4.

Fig. 3. The overall RBridge architecture (protocol stacks of the end systems: Higher Layers / TRILL layer / data link layer / physical layer, connected through a TRILL-layer relay in the RBridge)

Outer Ethernet header | TRILL header | Inner (original) Ethernet header | Ethernet payload | New (outer) FCS

Fig. 4. Structure of a TRILL frame

5 The MTU Problem

The IP protocol was designed to be used on a variety of transmission links. Although the maximum length of an IP packet is 64 kB, most transmission lines enforce a smaller packet length. The maximum length of a packet which can be transported on a given line is


called the MTU. If an application sends an IP packet larger than the MTU, some network device or devices must fragment this packet into smaller units; the IP header contains enough machinery to deal with fragmented packets. Packet fragmentation nevertheless has many disadvantages. Firewalls that filter or manipulate packets based on higher layers may have trouble processing fragmented packets: if the fragments arrive at the firewall out of order, the non-initial fragments may be dropped, so that the original packet cannot be reassembled by the receiver. And, of course, reassembling packets at the receiving host (even without firewalls), in an environment where fragments may arrive out of order and may be dropped (e.g. due to congestion), can consume a significant share of CPU time at the receiving host. For these reasons many applications, especially TCP-based ones, try to avoid fragmentation by setting the DF (don't fragment) bit in the IP header and splitting the data into segments of proper size at the sending host.

As the network protocol stack grows (Ethernet header, IP header, GRE header, IPSEC header, ...), the real amount of data transported inside one packet decreases. Until now, all necessary information about the MTU was known at layer 3, where there is a mechanism for signalling the requested MTU to the sender: the ICMP unreachable message. The situation is much more difficult when we use advanced layer 2 protocols like TRILL or 802.1ah. These protocols may decrease the effective MTU somewhere on a path where only layer 2 devices are used. These devices have no means of informing the sender about the MTU; moreover, they are not intended to deal with layer 3 protocols, so they cannot fragment IP packets and must simply truncate or drop them. The situation is easy to see in Fig. 5. Until now, when e.g. a metropolitan network operator uses the 802.1ad protocol, the MTU problem is solved by simply increasing the Ethernet MTU of the bridges involved. Even though this solution does not strictly correspond to the IEEE standards (802.3as, which specifies frames larger than 1500 bytes, was released only in September 2006), it is commonly used. The problem is that this solution is bound to a single network

Fig. 5. The MTU problem in a large-scale multiprovider Ethernet network (two L3 routers, with interface MTU 1500, connected across an L2 transport network whose links have MTU 2000)


operator and is not prepared for a large Ethernet network which may eventually traverse several operators, where some operator may not be able or willing to carry larger frames. In this new Ethernet environment it would be much better to be able to signal the MTU to the upper layers. As we mentioned above, the TRILL protocol is very suitable for this. We are working on an "IEEE-like" solution as well, but this work still needs some effort and will be presented in a future paper.
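The shrinking of the effective payload as encapsulations stack up (mentioned in Sect. 5) can be illustrated with rough, typical header sizes. The overhead values below are illustrative assumptions of ours, not figures from the paper; real overheads vary with options (e.g. GRE and IPsec depend on configuration).

```python
# Rough, illustrative per-layer overheads in bytes.
OVERHEAD = {
    "ethernet": 18,   # 14-byte header + 4-byte FCS
    "802.1q": 4,      # one VLAN tag
    "802.1ad": 8,     # two stacked VLAN tags
    "ipv4": 20,       # minimum IPv4 header
    "gre": 4,         # basic GRE header
}

def effective_payload(link_mtu: int, extra_layers: list[str]) -> int:
    """Bytes left for upper-layer data after stacking extra
    encapsulations onto a link with the given MTU."""
    return link_mtu - sum(OVERHEAD[layer] for layer in extra_layers)
```

For example, tunnelling IPv4 over GRE across a 1500-byte link already leaves only 1476 bytes for the inner packet, which is exactly the kind of silent shrinkage a layer 2 device cannot report back to the sender.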

6 Possible Solutions for the MTU Problem

RBridges exchange reachability and link state information via the IS-IS protocol. This protocol can be modified very simply to carry the MTU of each link. In this way all RBridges may (mostly at no cost) know, next to the current layer 2 topology, also the MTU of all lines. An RBridge can then offer a rather smart solution to the problem of how to pass the MTU size from a layer 2 device to the layer 3 network. Of course, there are two trivial solutions. The first is to reduce the MTU on the outgoing interface of the customer's boundary router. This is possible and always applicable; its main disadvantage is higher protocol overhead and suboptimal utilisation of the WAN connectivity. The second is to ask all service providers along the path used for the long-distance layer 2 line to increase the MTU. This is not always applicable, because the customer does not always know all the carriers, and carriers' carriers, participating in the transport service. There are two possible solutions which may lead to optimal bandwidth utilisation (that is, use of the maximum available MTU) and which at the same time require minimal (if any) changes to existing protocols. The first is to use the same mechanism as Path MTU discovery, i.e. ICMP unreachable messages. The second is to modify the ARP protocol a little.

6.1 Gratuitous ICMP Unreachable, Type Fragmentation Needed

This method is so simple that there is almost nothing to explain. The idea can be easily demonstrated by the following example: suppose an ingress RBridge has received an IP Ethernet frame which is larger than the MTU of the line that should be used to send the packet towards the destination. This RBridge simply discards the frame and sends the originator an ICMP unreachable message of type Fragmentation needed, putting the proper MTU into the message, just as a router would do.
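A minimal sketch of how such a message could be constructed (our illustration, not code from the paper): per RFC 792 and RFC 1191, the message is ICMP type 3, code 4, with the next-hop MTU in the low 16 bits of the second 32-bit word, followed by the offending IP header plus the first 8 payload bytes.

```python
import struct

def icmp_checksum(data: bytes) -> int:
    """Standard Internet checksum (RFC 1071): one's complement of the
    one's complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    s = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while s >> 16:
        s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF

def frag_needed(next_hop_mtu: int, orig_header: bytes) -> bytes:
    """ICMP Destination Unreachable / Fragmentation Needed (type 3, code 4).
    The body echoes the dropped packet's IP header and first payload bytes,
    so the originator can match the error to the right connection."""
    hdr = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu) + orig_header
    csum = icmp_checksum(hdr)
    return hdr[:2] + struct.pack("!H", csum) + hdr[4:]
```

An RBridge implementing the scheme above would wrap this ICMP payload in an IP header whose source address is "borrowed" as described next.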
There is, of course, a question about the source IP address of this ICMP message, because RBridges need not have any IP address at all, and if they do, the address may be in a different VLAN than the sender of the large data frame. Even if the RBridge has an IP address in the same VLAN as the IP Ethernet frame that caused the ICMP Fragmentation needed message, there may be problems with firewalling or filtering on some node on the path from the RBridge to the originator of the frame. Here the solution is a little tricky: the RBridge borrows the target IP address of the packet being dropped. That is, from the point of view of the originator of the oversize packet, it looks as if the target host responded with this ICMP message. This solution is very easy to implement, but the fact that it needs to borrow someone else's IP address is not very elegant (although it is similar to the way RBridges deal with ARP). The main advantage of this approach is full compatibility with existing protocols: it does not introduce any modification of existing protocol stacks and is fully transparent to today's IP stack implementations.

6.2 Modification of ARP/ND

A slightly more elegant way, which on the other hand needs some modification of existing protocols, is to use ARP for passing the MTU information to layer 3. Because RBridges have all the necessary information and, moreover, may respond to ARP requests, we can include the MTU information in the ARP packet. The original ARP frame is shown in Fig. 6 and the modified one in Fig. 7. The idea is very simple: the ARP protocol uses its 16-bit opcode field for only two possible values, request and response. We can use some bits of this field to indicate that the ARP packet uses a new frame type, i.e. that it contains one more field carrying the MTU. The situation is much easier in the case of IPv6: the IPv6 ND (Neighbour Discovery) protocol already has an options field, and one of the defined options is the MTU. The proposed modification of ARP is, in a sense, a retrofit of this IPv6 property into IPv4. The big disadvantage of this method is the modification of the existing ARP protocol, which requires modifying the IP stack of equipment communicating directly with RBridges. Of course, the backward compatibility problem can be solved very easily.

hardware address space (16 bit) | protocol address space (16 bit) | length of hardware address (8 bit) | length of protocol address (8 bit) | opcode (16 bit; 1 = request, 2 = reply) | hardware address of sender (n) | protocol address of sender (m) | hardware address of target (n) | protocol address of target (m)

Fig. 6. Structure of an ARP frame

hardware address space (16 bit) | protocol address space (16 bit) | length of hardware address (8 bit) | length of protocol address (8 bit) | opcode (16 bit; 0x0101 = MTU request, 0x0102 = MTU reply) | hardware address of sender (n) | protocol address of sender (m) | hardware address of target (n) | protocol address of target (m) | maximum hardware frame size (16 bit)

Fig. 7. Proposed enhancement of the ARP frame
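The proposed frame of Fig. 7 could be built, for example, like this (our sketch; the opcodes 0x0101/0x0102 are the hypothetical values from the proposal, not standard ARP opcodes, while the fixed-width fields follow RFC 826):

```python
import struct

# Hypothetical opcodes from the proposal above (not standard ARP values).
OP_MTU_REQUEST = 0x0101
OP_MTU_REPLY = 0x0102

def arp_mtu_frame(opcode: int, sender_mac: bytes, sender_ip: bytes,
                  target_mac: bytes, target_ip: bytes, mtu: int) -> bytes:
    """ARP payload extended with a trailing 16-bit 'maximum hardware
    frame size' field, as in Fig. 7. Fixed header per RFC 826:
    htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4."""
    return (struct.pack("!HHBBH", 1, 0x0800, 6, 4, opcode)
            + sender_mac + sender_ip + target_mac + target_ip
            + struct.pack("!H", mtu))
```

Because legacy stacks parse only the fixed-length fields they expect, a receiver that does not understand the new opcodes would simply discard the frame, which is what the dual-response scheme below relies on.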


The RBridge may send, as a response to an ARP request, a pair of ARP responses: the first with the MTU option and the second without it. In this way, hosts with the original implementation of ARP will ignore the first ARP response and learn the MAC-to-IP address mapping from the second one. Of course, the MTU problem remains unsolved for such hosts.

6.3 Comparison of the Gratuitous ICMP Fragmentation Needed and Modified ARP Approaches

The common problem of both approaches is how to signal to the host that the MTU for some destinations has increased. As an example, consider the simple network of Fig. 5. Now we introduce a new connection with a higher MTU, as in Fig. 8, and let the newly added path be the preferred one.

Fig. 8. The MTU may increase under some circumstances (a second L2 transport network with MTU 1800 is added in parallel to the original path with MTU 1500)

In this situation we can use an MTU of 1800 instead of the original 1500. The problem is how to inform the transmitter that a larger MTU is available. With the gratuitous ICMP Fragmentation needed approach there is no way to propagate the larger MTU. The situation is a little better with the modified ARP approach: a gratuitous ARP with the new MTU option can be sent. Of course, this solves the problem only partially. The RBridge can send a gratuitous ARP only to hosts which are already in its ARP cache, and this approach works only for end nodes: if the L3 device in Fig. 8 is a router, it has probably already propagated the smaller MTU to the true originator of the traffic. Current IP protocols have capabilities only to decrease the MTU; the reverse problem is not solved.


7 Conclusion

The intention of this paper was to point out the problem of MTU signalling between layer 2 (Ethernet) and layer 3 (IPv4 or IPv6). This problem may become a real issue in next generation Ethernet deployment in provider networks. Both proposed solutions are straightforward and easy enhancements of existing protocols, and both have positive and negative properties. The proposed solutions are not meant as the optimal method of layer 2 to layer 3 MTU signalling, but as an entry point for a discussion of this problem, which should end in a common consensus and perhaps yet another solution.

References

1. Perlman, R., Gai, S., Dutt, D.G.: RBridges: Base Protocol Specification. draft-ietf-trill-rbridge-protocol-03.txt
2. Touch, J., Perlman, R.: Transparent Interconnection of Lots of Links (TRILL): Problem and Applicability Statement. draft-ietf-trill-prob-01.txt
3. Gray, E.: The Architecture of an RBridge Solution to TRILL. draft-ietf-trill-rbridge-arch-02.txt
4. Gray, E.: TRILL Routing Requirements in Support of RBridges. draft-ietf-trill-routing-reqs-02.txt
5. Plummer, D.: An Ethernet Address Resolution Protocol: Converting Network Protocol Addresses to 48-bit Ethernet Addresses for Transmission on Ethernet Hardware. RFC 826 (November 1982), http://www.ietf.org/rfc/rfc826.txt
6. Narten, T., Nordmark, E., Simpson, W.: Neighbor Discovery for IP Version 6 (IPv6). RFC 2461 (Standards Track) (December 1998), http://www.ietf.org/rfc/rfc2461.txt
7. Callon, R.: Use of OSI IS-IS for Routing in TCP/IP and Dual Environments. RFC 1195 (December 1990), http://www.ietf.org/rfc/rfc1195.txt
8. IEEE Standard for Local and Metropolitan Area Networks: Virtual Bridged Local Area Networks. IEEE 802.1Q-2005 (May 19, 2006)
9. Bryant, S., Pate, P. (eds.): Pseudo Wire Emulation Edge-to-Edge (PWE3) Architecture. RFC 3985 (March 2005), http://www.ietf.org/rfc/rfc3985.txt
10. Martini, L., El-Aawar, N., Heron, G., Rosen, E., Tappan, D., Smith, T.: Pseudowire Setup and Maintenance Using the Label Distribution Protocol (LDP). RFC 4447 (April 2006), http://www.ietf.org/rfc/rfc4447.txt
11. ANSI/IEEE Standard 802.1Q-2005: IEEE Standards for Local and Metropolitan Area Networks: Virtual Bridged Local Area Networks (2005)
12. Andersson, L., Rosen, E.: Framework for Layer 2 Virtual Private Networks (L2VPNs). RFC 4664 (September 2006), http://www.ietf.org/rfc/rfc4664.txt
13. Lasserre, M., Kompella, V.: Virtual Private LAN Service (VPLS) Using Label Distribution Protocol (LDP) Signaling. RFC 4762 (January 2007), http://www.ietf.org/rfc/rfc4762.txt

Analysis of Medium Access Delay and Packet Overflow Probability in IEEE 802.11 Networks

Gang Uk Hwang

Department of Mathematical Sciences and Telecommunication Engineering Program, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon, 305-701, Republic of Korea
[email protected], http://queue.kaist.ac.kr/~guhwang

Abstract. In this paper, we first analyze the medium access delay of a packet in a terminal in a saturated IEEE 802.11 network. In our analysis, we use renewal theory to analyze the detailed packet transmission processes of the terminals in the network, including backoff counter freezing. Using this detailed analysis, we analyze the packet transmission process of a tagged terminal and the background traffic for the tagged terminal, generated by the non-tagged terminals, and derive the Laplace transform of the medium access delay of a packet under the saturated condition. Next, based on this analysis, we propose a mathematical model to analyze the packet overflow probability of an unsaturated terminal. We also provide numerical and simulation results to validate our analysis and investigate the characteristics of the system performance.

Keywords: IEEE 802.11 WLAN, Distributed Coordination Function, Medium Access Delay, Performance Evaluation, Packet Overflow Probability

1 Introduction

During the past few years, standards for WLANs (Wireless Local Area Networks) have been proposed to satisfy the demand for wireless services, and the IEEE 802.11 MAC (Medium Access Control) protocols [1] are the de facto standards for WLANs and the most widely used today. In IEEE 802.11, the main mechanism to access the channel is the DCF (Distributed Coordination Function), a random access scheme based on CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance). The DCF has two access techniques for packet transmission: the default, called the basic access mechanism, and an optional four-way handshake scheme, called the RTS/CTS mechanism. Both mechanisms use a backoff counter and a backoff stage to determine packet transmission times [14,16]. The RTS/CTS mechanism involves the transmission of the RTS (Request-To-Send) and CTS (Clear-To-Send) control frames prior to the transmission of the

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 419-430, 2007. © Springer-Verlag Berlin Heidelberg 2007


data frame. A successful exchange of RTS and CTS frames attempts to reserve the channel for the time duration needed to transmit the data frame under consideration. The rules for the transmission of an RTS frame are the same as those for a data frame under the basic access scheme. Hence, the analysis of the basic access mechanism is the same as that of the RTS/CTS mechanism, except that the packet transmission times differ in the two mechanisms. We consider only the RTS/CTS mechanism in this paper.

Many works dealing with the performance analysis of the IEEE 802.11 DCF can be found in the literature. They have focused primarily on its throughput and capacity [2,3,4], adaptive backoff schemes [5,6,7], and statistical analysis such as the packet service time (or packet delay) of a terminal and the related queueing analysis [8,9,10,11,12]. Regarding the analysis of the medium access delay (or packet service time), defined as the time needed for a packet to be successfully transmitted after it is first placed in the transmission buffer of the terminal, Carvalho and Garcia-Luna-Aceves [8,9] introduced an analytical model to characterize the service time of a terminal in saturated IEEE 802.11 ad hoc networks. However, their analysis in [8] and [9] was carried out in terms of first and second order statistics only. Özdemir and McDonald [13] derived the packet service time distribution of a terminal. Tickoo and Sikdar [11,12] computed service time distributions by explicitly modelling the impact of the network load on the loss rates, and thus on the delays.

In this paper, we consider a network of N identical IEEE 802.11 DCF terminals with the RTS/CTS mechanism, each of which is assumed to be saturated. For the performance analysis, we propose a simple and efficient mathematical model to derive the Laplace transform of the medium access delay of a packet in a terminal.
In our analysis, we first use renewal theory to analyze the detailed procedure of backoff counter freezing, as in [16]. We then consider the packet transmission process of a tagged terminal and the background traffic for the tagged terminal, which is generated by the non-tagged terminals. Finally, we derive the Laplace transform of the medium access delay of a packet in the tagged terminal. In addition to the saturated analysis, we consider an unsaturated terminal in an IEEE 802.11 network where the other N − 1 terminals are assumed to be saturated. Based on our analytic results from the saturated analysis, we propose a mathematical model to analyze the packet overflow probability of an unsaturated terminal. Note that our approach is totally different from those in previous studies, including Bianchi's [14], and provides a simple and efficient model to analyze the system performance, as shown later; this is the main contribution of this paper. The organization of this paper is as follows. In Section 2, we first develop an analytic model to capture the details of the packet transmission process in a terminal. Then, we consider a tagged terminal, introduce its background traffic, and analyze the medium access delay of a packet in the tagged terminal through its Laplace transform. In Section 3, we consider an unsaturated terminal and analyze the packet overflow probability based on the


results obtained in Section 2. In Section 4, we provide numerical examples to validate our analysis and investigate the characteristics of the system performance. In Section 5, we give our conclusions.

2 Medium Access Delay

In this section, we analyze the medium access delay of a packet in a terminal of a saturated IEEE 802.11 network, defined as the time needed for the packet to be successfully transmitted after it is first placed in the transmission buffer of the terminal. We assume that there are N identical terminals in the network. We tag an arbitrary terminal, called the tagged terminal, and consider the medium access delay of a packet of the tagged terminal. For the analysis, we treat the packet transmission processes of the other N − 1 terminals as the background traffic of the tagged terminal. Thus, if there is a packet transmission in the background traffic at a packet transmission epoch of the tagged terminal, the transmitted packet of the tagged terminal experiences a collision; if there is none, the packet of the tagged terminal is successfully transmitted. We first analyze the packet transmission process of an arbitrary terminal, then the packet transmission process in the background traffic, and finally the medium access delay of the tagged terminal.

2.1 The Packet Transmission Process of a Terminal

For the analysis, we focus on embedded time points where the backoff counter value of a terminal of interest changes (i.e., the backoff counter value is decremented by 1 or a new backoff counter value is selected). A time interval between two consecutive embedded time points is called a virtual slot in the analysis. We assume that the packet transmission process of a terminal is well approximated by a renewal process. Let R denote the inter-renewal time, i.e., the number of virtual slots between two consecutive packet transmissions of an arbitrary terminal in steady state. To obtain the distribution of the random variable R, we first compute the steady state probability π_k, 0 ≤ k ≤ m, that the terminal is in backoff stage k just after a packet transmission, where m denotes the maximum backoff stage. To do this, we assume, as in [14], that a packet collision occurs independently at each packet transmission with a fixed probability p in steady state. Then the steady state probability π_k, 0 ≤ k ≤ m, is computed by [16]

π_k = p^k (1 − p), if 0 ≤ k ≤ m − 1;   π_m = p^m.   (1)

Next, let BC denote the backoff counter value selected by the terminal after the packet transmission in steady state. When the backoff stage after the packet


transmission is k, the tagged terminal selects a new backoff counter value uniformly from the window [0, CW_k − 1], where the window size CW_k at backoff stage k is CW_k = 2^k CW_0, 1 ≤ k ≤ m, and CW_0 is the initial window size at backoff stage 0. Hence, the steady state probability r_k, 0 ≤ k ≤ CW_m − 1, that the backoff counter BC is equal to k is given as follows [16]:

r_k ≜ P{BC = k} = Σ_{j=l}^{m} π_j / CW_j,   CW_{l−1} ≤ k ≤ CW_l − 1,   (2)

for l = 0, 1, 2, . . . , m, where CW_{−1} = 0. Since R includes the virtual slot containing a packet transmission by its definition, we have R = BC + 1, from which the distribution of R is given by

P{R = k} = P{BC = k − 1},   1 ≤ k ≤ CW_m.   (3)

Now, let τ denote the steady state probability that an arbitrary terminal transmits a packet in an arbitrary virtual slot. By the definition of τ and the random variable R, it follows that

τ = 1 / E[R],   (4)

where E[R] denotes the expectation of the random variable R, given by

E[R] = Σ_{j=0}^{m} P{BS = j} E[R | BS = j] = Σ_{j=0}^{m} π_j Σ_{k=1}^{CW_j} k (1/CW_j) = Σ_{j=0}^{m} π_j (CW_j + 1)/2,   (5)

where BS denotes the backoff stage after an arbitrary packet transmission in steady state. In addition, from the definitions of p and τ we have

p = 1 − (1 − τ)^{N−1}.   (6)

Therefore, starting from (1) and (6) with an initial value of τ, we obtain an updated value of τ from (4) and (5). If the updated value of τ is not sufficiently close to the initial value, we repeat the same procedure with the updated value as the new initial value, and we iterate until τ converges. Note that our recent paper [16] shows that the value of τ obtained by this procedure is exactly the same as that obtained from Bianchi's model [14]. For later use, we introduce a random variable R^(e) which denotes the remaining number of virtual slots, at an arbitrary virtual slot boundary in steady state, until an arbitrary terminal performs its next packet transmission. By renewal theory [15], it can be shown that

P{R^(e) = k} = P{R ≥ k} / E[R],   1 ≤ k ≤ CW_m,


where P{R ≥ k} is obtained from (2) and (3). Since all terminals operate identically, their probabilistic characteristics are all identical; that is, the above results for R and R^(e) can be used for the packet transmission process of any terminal in the network.
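The fixed-point computation of τ from eqs. (1) and (4)-(6) can be sketched as follows. This is our illustration, not code from the paper: we solve the fixed point by bisection rather than the plain recursion described above (equivalent, since τ − 1/E[R] is increasing in τ), and the example parameters CW0 = 32, m = 5 are assumed values of ours.

```python
def expected_R(tau: float, N: int, CW: list, m: int) -> float:
    """E[R] for a given tau: eq. (6) gives p, eq. (1) gives pi_k,
    eq. (5) gives the expectation."""
    p = 1 - (1 - tau) ** (N - 1)                         # eq. (6)
    pi = [p**k * (1 - p) for k in range(m)] + [p**m]     # eq. (1)
    return sum(pi[j] * (CW[j] + 1) / 2 for j in range(m + 1))  # eq. (5)

def solve_tau(N: int, CW0: int = 32, m: int = 5, tol: float = 1e-12) -> float:
    """Solve tau = 1/E[R] (eq. (4)) by bisection. Since E[R] increases
    with tau, the function tau - 1/E[R] is increasing and the root is
    unique."""
    CW = [CW0 * 2**k for k in range(m + 1)]              # CW_k = 2^k CW_0
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid - 1 / expected_R(mid, N, CW, m) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2
```

At the returned value, τ and 1/E[R] coincide, which is the consistency condition that [16] shows to match Bianchi's fixed point.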

The Background Traffic Analysis

Now we first construct the packet transmission process of the background traffic based on our analysis in Section 2.1. To do this, we consider actual slot boundaries where the backoff counter value of the tagged terminal is either changed or frozen. A time interval between two consecutive actual slot boundaries is called an actual slot in the analysis. Note that the actual slot is different from the physical slot in the IEEE 802.11 DCF standard. In addition, the actual slot is also different from the virtual slot. The difference between actual and virtual slots occurs when there is a packet transmission of a non-tagged terminal in the virtual slot. In this case the virtual slot containing a packet transmission of a non-tagged terminal is divided into two actual slots, one of which is the packet transmission time of the non-tagged terminal and the other of which is the physical slot during which the backoff counter value of the tagged terminal is still frozen.

Let H_bg be the number of actual slots between two consecutive packet transmissions in the background traffic. To obtain the distribution function of H_bg in steady state, we assume that at least one packet transmission occurs in the background traffic at an arbitrary actual slot boundary, called the initial boundary, in steady state. We condition on the number N_0 of non-tagged terminals in the background traffic transmitting packets simultaneously at the initial boundary. Then, since there are N − 1 non-tagged terminals, the probability mass function of N_0 is given as follows:

$$P\{N_0 = k\} = \frac{\binom{N-1}{k}\,\tau^k (1-\tau)^{N-1-k}}{\sum_{l=1}^{N-1}\binom{N-1}{l}\,\tau^l (1-\tau)^{N-1-l}}, \qquad 1 \le k \le N-1. \tag{7}$$

Here, the denominator is the probability that there is at least one non-tagged terminal transmitting a packet at the initial boundary in steady state. Now, to compute the distribution of H_bg, we should consider the backoff freezing procedure for non-transmitting terminals in the background traffic at the initial boundary.
That is, a non-transmitting terminal in the background traffic with a positive backoff counter value, on detecting a packet transmission, needs one more actual slot to decrease its backoff counter value after the actual slot containing the packet transmission [1]. Then, if we consider a terminal transmitting a packet at the initial boundary in steady state, the next transmission time for that terminal is given by R. On the other hand, if we consider a terminal transmitting no packet at the initial boundary in steady state, the next transmission time is given by R^{(e)} + 1. Here, one actual slot is added due to the backoff freezing procedure for non-transmitting terminals explained above. Hence, the distribution of H_bg is given as follows: for 1 ≤ k ≤ CW_m,

$$P\{H_{bg} \le k\} = \sum_{j=1}^{N-1} P\{N_0 = j\}\left[1 - \left(P\{R > k\}\right)^j \left(P\{R^{(e)} > k-1\}\right)^{N-1-j}\right] \tag{8}$$
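As a sanity check, (2)-(3), (7) and (8) can be evaluated numerically. The sketch below assumes the per-stage probabilities π_j and windows CW_j are supplied by the fixed-point step of Section 2.1; the function name and input layout are illustrative:

```python
import math

def background_gap_cdf(N, tau, CW, pi):
    """Evaluate P{H_bg <= k} of (8) from the R distribution of (2)-(3).

    CW and pi are per-stage lists (illustrative inputs; in the paper they
    come from the fixed-point analysis of Section 2.1)."""
    m = len(CW) - 1
    CWm = CW[m]
    bounds = [0] + CW                          # CW_{-1} = 0
    # (2): P{BC = k} = sum_{j=l}^{m} pi_j / CW_j on CW_{l-1} <= k <= CW_l - 1
    P_BC = [0.0] * CWm
    for l in range(m + 1):
        val = sum(pi[j] / CW[j] for j in range(l, m + 1))
        for k in range(bounds[l], bounds[l + 1]):
            P_BC[k] = val
    P_R = [0.0] + P_BC                         # (3): P{R = k} = P{BC = k - 1}
    ER = sum(k * P_R[k] for k in range(1, CWm + 1))
    tail_R = [sum(P_R[j] for j in range(k, CWm + 1)) for k in range(CWm + 2)]
    # renewal theory: P{R^(e) = k} = P{R >= k} / E[R]
    P_Re = [0.0] + [tail_R[k] / ER for k in range(1, CWm + 1)]
    tail_Re = [sum(P_Re[j] for j in range(k, CWm + 1)) for k in range(CWm + 2)]
    # (7): conditional distribution of N_0 (index 0 unused)
    denom = 1 - (1 - tau) ** (N - 1)
    P_N0 = [math.comb(N - 1, k) * tau**k * (1 - tau) ** (N - 1 - k) / denom
            for k in range(N)]
    # (8): P{H_bg <= k}, using P{R > k} = tail_R[k+1], P{R^(e) > k-1} = tail_Re[k]
    return {k: sum(P_N0[j] * (1 - tail_R[k + 1] ** j * tail_Re[k] ** (N - 1 - j))
                   for j in range(1, N))
            for k in range(1, CWm + 1)}
```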

G.U. Hwang

! "N −j where P {N0 = j} is given in (7). Note that (P {R > k})j P {R(e) > k − 1} is the conditional probability that there will be no packet transmission in the background traffic during [0, k], given that there are j terminals transmitting packets in the background traffic at the initial boundary. For later use, We need to consider the conditional probability qbg that there occurs a collision in the background traffic, given that there is a packet transmission in the background traffic. Since there are N − 1 terminals in the background traffic, we have qbg =

1 − (1 − τ )N −1 − (N − 1)τ (1 − τ )N −2 . 1 − (1 − τ )N −1

We then define the packet transmission time Tbg in the background traffic by T = (1 − qbg )Ts + qbg Tc .

(9)

Here, T_s and T_c denote the times taken by a successful transmission and a collided transmission, respectively. That is, since we consider the RTS/CTS mechanism, T_s and T_c are given by

$$T_s = RTS + SIFS + CTS + SIFS + Packet + SIFS + ACK + DIFS,$$
$$T_c = RTS + DIFS,$$

where RTS, CTS, Packet and ACK are the respective transmission times of the RTS, CTS, data packet and ACK frames, and SIFS and DIFS denote the Short Inter-Frame Space and the DCF Inter-Frame Space, respectively. In practice, there is a correlation between the background packet transmission process and the packet transmission process of the tagged terminal due to packet collisions. However, to keep the analysis as simple as possible, we assume from now on that the two packet transmission processes are independent.

2.3 Medium Access Delay Analysis

In this subsection, we analyze the medium access delay of our tagged terminal based on our results in Subsection 2.2. We start with the maximum backoff stage m. Let D_m be the time needed for a packet in the tagged terminal to be successfully transmitted, given that the starting backoff stage of the packet is stage m at an arbitrary time. To compute D_m, which is in fact the sojourn time in stage m during the medium access delay, let D_m(i) denote the time needed for a packet to be successfully transmitted, given that the starting backoff stage is m and the value of the starting backoff counter is i (0 ≤ i ≤ CW_m − 1) at an arbitrary time, and let E_m(i) denote the remaining time needed for a packet to be successfully transmitted, given that the backoff stage is m and a backoff counter freezing occurs at an arbitrary time with backoff counter value i (1 ≤ i ≤ CW_m − 1). Note that, since we have a backoff counter freezing for E_m(i), i must satisfy 1 ≤ i ≤ CW_m − 1, while since there is no such condition for D_m(i), i can take


any value in [0, CW_m − 1]. Now we consider the packet transmission processes of the background traffic as well as the tagged terminal. The fact that the starting stage is m implies that there was a collision, and consequently the next packet transmission time in the background traffic is H_bg, obtained in (8). So we have the following cases to compute D_m(i): for H_bg = j,

- When 1 ≤ j ≤ i, a packet transmission occurs in the background traffic before the packet transmission of the tagged terminal. In this case, the packet transmission in the background traffic is either a successful one or a collided one. Then we have $D_m(i) \stackrel{d}{=} (j-1)\sigma + T + E_m(i-j+1)$.
- When j = i + 1, simultaneous packet transmissions occur in the background traffic and at the tagged terminal. In this case, we have a collided packet transmission. Since the backoff stage is the maximum stage m, we have $D_m(i) \stackrel{d}{=} i\sigma + T_c + D_m$.
- When i + 2 ≤ j ≤ CW_m, the packet transmission of the tagged terminal occurs before the packet transmission in the background traffic. In this case, the packet transmission of the tagged terminal is successful. Then we have $D_m(i) \stackrel{d}{=} i\sigma + T_s$.

Here, T is given by (9), σ denotes the physical slot time in the IEEE 802.11 DCF, and $\stackrel{d}{=}$ means both sides are equal in distribution. For later use we compute the Laplace transform E[e^{−sD_m(i)}] of D_m(i) as follows:

$$E[e^{-sD_m(i)}] = \sum_{j=1}^{i} P\{H_{bg}=j\}\,E[e^{-s[(j-1)\sigma + T + E_m(i-j+1)]}] + P\{H_{bg}=i+1\}\,E[e^{-s[i\sigma + T_c + D_m]}] + \sum_{j=i+2}^{CW_m} P\{H_{bg}=j\}\,E[e^{-s[i\sigma + T_s]}]. \tag{10}$$

Here, P{H_bg = j} is computed from (8), and the Laplace transforms E[e^{−sE_m(i)}] and E[e^{−sD_m}] of E_m(i) and D_m, respectively, will be computed below. By similar arguments, the Laplace transform E[e^{−sE_m(i)}] of E_m(i) is given by

$$E[e^{-sE_m(i)}] = \sum_{j=1}^{i} P\{H_{bg}=j\}\,E[e^{-s[(j-1)\sigma + T + E_m(i-j+1)]}] + P\{H_{bg}=i+1\}\,E[e^{-s[i\sigma + T_c + D_m]}] + \sum_{j=i+2}^{CW_m} P\{H_{bg}=j\}\,E[e^{-s[i\sigma + T_s]}]. \tag{11}$$

For backoff stage n (0 ≤ n ≤ m − 1), we define D_n, D_n(i) and E_n(i) similarly as for backoff stage m. Let D_n (0 ≤ n ≤ m − 1) be the time needed for a packet in the tagged terminal to be successfully transmitted, given that the starting backoff stage of the packet is stage n. Since the starting backoff stage after a successful packet transmission is always stage 0, the medium access delay of a packet is, in fact, D_0 by our definition. Let D_n(i) (0 ≤ i ≤ CW_n − 1) and E_n(i) (1 ≤ i ≤ CW_n − 1) be the random variables for stage n (0 ≤ n ≤ m − 1) corresponding to D_m(i) and E_m(i),


respectively. Then, by similar arguments as above (to save space we omit the detailed derivation), the corresponding Laplace transforms E[e^{−sD_n(i)}] and E[e^{−sE_n(i)}] can be obtained as in (10) and (11), respectively. When the starting backoff stage is 0 at an arbitrary time, which implies that the tagged terminal has just transmitted a packet successfully, none of the non-tagged terminals in the background traffic is transmitting a packet. Hence, in this case the next packet transmission time in the background traffic is the remaining transmission time, denoted by H_bg^{(e)}, which is given by

$$P\{H_{bg}^{(e)} \le k\} = 1 - \left(P\{R^{(e)} > k-1\}\right)^{N-1}.$$

Hence, the computation of D_0(i), 0 ≤ i ≤ CW_0 − 1, is the same as above except that H_bg^{(e)} is used instead of H_bg. However, since there is a packet transmission in the background traffic when we compute E_0(i), 1 ≤ i ≤ CW_0 − 1, we use H_bg in the computation of E_0(i). Hence, the corresponding Laplace transforms are given by

$$E[e^{-sD_0(i)}] = \sum_{j=1}^{i} P\{H_{bg}^{(e)}=j\}\,E[e^{-s[(j-1)\sigma + T + E_0(i-j+1)]}] + P\{H_{bg}^{(e)}=i+1\}\,E[e^{-s[i\sigma + T_c + D_1]}] + \sum_{j=i+2}^{CW_m} P\{H_{bg}^{(e)}=j\}\,E[e^{-s[i\sigma + T_s]}], \tag{12}$$

$$E[e^{-sE_0(i)}] = \sum_{j=1}^{i} P\{H_{bg}=j\}\,E[e^{-s[(j-1)\sigma + T + E_0(i-j+1)]}] + P\{H_{bg}=i+1\}\,E[e^{-s[i\sigma + T_c + D_1]}] + \sum_{j=i+2}^{CW_m} P\{H_{bg}=j\}\,E[e^{-s[i\sigma + T_s]}]. \tag{13}$$

Now, using the above equations, we can compute the Laplace transforms of D_n, 0 ≤ n ≤ m. Since the initial backoff counter in backoff stage n is selected uniformly in the window [0, CW_n − 1], it follows that

$$E[e^{-sD_n}] = \sum_{i=0}^{CW_n - 1} \frac{1}{CW_n}\,E[e^{-sD_n(i)}], \qquad 0 \le n \le m.$$

Note that the Laplace transform of D0 is, in fact, the Laplace transform of the medium access delay. Hence, the expectation and variance of the medium access delay can be easily obtained by differentiating the Laplace transform of D0 once and twice, respectively. In the numerical studies, we compute the expectation and variance of the medium access delay.
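Concretely, the two moments mentioned above follow from the standard Laplace-transform identities:

```latex
E[D_0] = -\left.\frac{d}{ds}\,E\!\left[e^{-sD_0}\right]\right|_{s=0},
\qquad
\mathrm{Var}(D_0) = \left.\frac{d^2}{ds^2}\,E\!\left[e^{-sD_0}\right]\right|_{s=0} - \bigl(E[D_0]\bigr)^2 .
```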

3 Analysis of an Unsaturated Terminal

In this section, we evaluate the system performance of an unsaturated terminal in the IEEE 802.11 wireless network consisting of N terminals. Since the performance of a terminal in the network largely depends on the packet transmission processes of the other terminals in the network, we consider the worst case scenario where N − 1 terminals are assumed to be saturated and the terminal of


interest, called the tagged terminal, is unsaturated. Since the network contains an unsaturated terminal, we call it the unsaturated network for simplicity from now on. Due to the saturation assumption on the other terminals in the worst case scenario, the resulting performance of the tagged terminal is the worst case performance. Note that the analysis of the worst case performance is meaningful because the quality of service that a terminal (or a user) perceives largely depends on the worst case performance.

To analyze the performance of the tagged terminal, we first compute the service capacity (or the service rate) of the tagged terminal in our worst case scenario. In Section 2, we obtained the expected medium access delay E[D_0] of a packet. Noting that the medium access delay is in fact the service time of a packet in a terminal, we see that the service rate of the tagged terminal is 1/E[D_0]. From now on we assume that λ < 1/E[D_0] for the stability of the tagged terminal.

Next, we propose a mathematical model to evaluate the performance of the tagged terminal in the worst case scenario. The proposed model is based on effective bandwidth theory [17]. The merit of using effective bandwidth theory is that we can treat the arrival process and the service process separately, so we do not need to consider the arrival process in the analysis of the service process. That is, even though we consider the performance of the tagged terminal in the unsaturated network, with the help of effective bandwidth theory we can use the packet transmission process of the tagged terminal under the saturated condition as the packet service process of the tagged terminal in the analysis. We will verify this in the numerical studies later.

Let N(t) be the number of successfully transmitted packets of the tagged terminal under the saturated condition during the time interval [0, t]. We first compute the M.G.F. (Moment Generating Function) E[e^{θN(t)}] of N(t).
For convenience, let μ and σ² denote the expectation and variance of the successful packet transmission time of the tagged terminal, which are obtained in Section 2. From renewal theory [15], for sufficiently large t it can be shown that N(t) is well approximated by a Normal distribution with mean t/μ and variance σ²t/μ³. Then it follows that

$$E[e^{\theta N(t)}] \approx e^{\frac{t}{\mu}\theta + \frac{\sigma^2 t}{2\mu^3}\theta^2}.$$

Then the EBF (Effective Bandwidth Function) ξ_S(θ) of the packet service process of the tagged terminal [17] is given by

$$\xi_S(\theta) = \lim_{t\to\infty} \frac{-1}{t\theta} \log E[e^{-\theta N(t)}] \approx \frac{1}{\mu} - \frac{\sigma^2\theta}{2\mu^3}.$$

Similarly, we define the EBF ξ_A(θ) of the arrival process by ξ_A(θ) = Λ_A(θ)/θ, where A(t) denotes the number of packets arriving at the tagged terminal during the time interval [0, t] and Λ_A(θ) = lim_{t→∞} t^{−1} log E[e^{θA(t)}] [17]. Then it can be shown [17] that the number Q of packets in the tagged terminal in steady state satisfies

$$Q \stackrel{d}{=} \sup_{t \ge 0}\,\{A(t) - N(t)\}$$

and that the overflow probability of Q is given by

$$P\{Q > x\} \approx P\{Q > 0\}\,e^{-\theta^* x}, \tag{14}$$

where θ* is the unique positive real solution of the equation ξ_A(θ) − ξ_S(θ) = 0. In the numerical studies, we will use equation (14) to examine the overflow probability of the tagged terminal and compare the result with simulation results.
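A minimal sketch of how θ* in (14) can be found numerically. For illustration a Poisson arrival process is assumed here (Λ_A(θ) = λ(e^θ − 1)); the paper itself uses an ON/OFF arrival process in Section 4, whose EBF would replace `xi_A`:

```python
import math

def xi_S(theta, mu, sigma2):
    """Effective bandwidth of the service process (Gaussian approximation)."""
    return 1 / mu - sigma2 * theta / (2 * mu**3)

def xi_A(theta, lam):
    """Effective bandwidth of a Poisson arrival process (illustrative):
    Lambda_A(theta) = lam * (exp(theta) - 1), so xi_A = Lambda_A / theta."""
    return lam * math.expm1(theta) / theta

def theta_star(lam, mu, sigma2, lo=1e-9, hi=10.0, iters=200):
    """Bisection for theta* in (14), the root of xi_A - xi_S = 0.

    xi_A is increasing and xi_S decreasing in theta, so the root is unique;
    lam < 1/E[D0] (stability) guarantees a sign change on [lo, hi]."""
    f = lambda t: xi_A(t, lam) - xi_S(t, mu, sigma2)
    assert f(lo) < 0 < f(hi), "requires lam < 1/E[D0] (stability)"
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return (lo + hi) / 2
```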

4 Numerical Studies

In this section, we first provide numerical results based on our analysis in Subsection 2.3 to investigate the medium access delay of the IEEE 802.11 DCF. We also simulate the IEEE 802.11 network using a simulator we developed in the C++ programming language, and compare the simulation results with our numerical results to validate our analysis. In all simulations and numerical studies, we use the following system parameters:

– Payload Size: 8000 bits
– PHY Header (including preamble): 192 bits
– MAC Header (including CRC bits): 272 bits
– RTS Frame: PHY Header + 160 bits
– CTS Frame: PHY Header + 112 bits
– ACK Frame: PHY Header + 112 bits
– Data Rate: 1e6 bits/sec
– Time Slot σ = 20e-6 sec, SIFS = 10e-6 sec, DIFS = 50e-6 sec
– CW_0 = 32, m = 5
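With these parameters, the transmission times T_s and T_c of Section 2.2 can be computed directly. One assumption in this sketch: the data frame is taken as PHY header + MAC header + payload, following common 802.11 conventions rather than an explicit statement in the paper:

```python
# Per-frame transmission times under the parameters above (sketch).
RATE = 1e6                       # bits/sec
PHY, MAC, PAYLOAD = 192, 272, 8000
RTS, CTS, ACK = PHY + 160, PHY + 112, PHY + 112
PACKET = PHY + MAC + PAYLOAD     # assumed data-frame composition
SIFS, DIFS = 10e-6, 50e-6        # seconds

# T_s = RTS + SIFS + CTS + SIFS + Packet + SIFS + ACK + DIFS
Ts = (RTS + CTS + PACKET + ACK) / RATE + 3 * SIFS + DIFS   # successful tx
# T_c = RTS + DIFS
Tc = RTS / RATE + DIFS                                      # collided tx
```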

The results are given in Fig. 1. From Fig. 1(a), our analytic results match the simulation results well. In addition, the expectation of the medium access delay increases linearly in the number of terminals. From Fig. 1(b), our analytic results also follow the simulation results well, even though they slightly underestimate the variance of the medium access delay. We believe this underestimation is due to the independence assumption on the packet transmission processes of the background traffic and the tagged terminal in our analysis.

Fig. 1. The Expectation (a) and Variance (b) of the Medium Access Delay of a Packet (in seconds), plotted against the number of terminals

Fig. 2. The Overflow Probability of an Unsaturated Terminal: log10(P{Q > x}) versus the buffer size x, simulation and analysis

In Fig. 2, we plot the overflow probabilities obtained by (14) as well as by simulation. In this study, the number of nodes is 50 and the packet arrival process is an ON/OFF process in which packet arrivals during ON periods follow a Poisson process and there are no arrivals during OFF periods. The transition rate from the ON state (resp. OFF state) to the OFF state (resp. ON state) is 70000 (resp. 25000) per second. The Poisson arrival rate during ON periods is obtained from the constraint that the average arrival rate is 1.5. As shown in Fig. 2, our analytic results match the simulation results well, which verifies the usefulness of the model proposed in this paper.

5 Conclusions

In this paper, we analyzed the medium access delay of a packet in the saturated IEEE 802.11 wireless network. In our analysis, we considered the detailed packet transmission processes of terminals in the network and derived the Laplace transform of the medium access delay of a packet. Based on the analysis of the medium access delay under the saturated condition, we proposed a mathematical model to analyze the packet overflow probability of an unsaturated terminal. We also provided numerical and simulation results to validate our analysis and to investigate the characteristics of the system performance.

References

1. IEEE LAN MAN Standard, Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, ANSI/IEEE Std 802.11, 1999 edn.
2. Chhaya, H., Gupta, S.: Performance modeling of asynchronous data transfer methods of IEEE 802.11 MAC protocols. Wireless Networks 3, 217–234 (1997)
3. Foh, C.H., Tantra, J.W.: Comments on IEEE 802.11 saturation throughput analysis with freezing of backoff counters. IEEE Communications Letters 9(2), 130–132 (2005)


4. Tay, Y., Chua, K.: A capacity analysis for the IEEE 802.11 MAC protocol. Wireless Networks 7(2), 159–171 (2001)
5. Cali, F., Conti, M., Gregori, E.: IEEE 802.11 protocol: design and performance evaluation of an adaptive backoff mechanism. IEEE Journal on Selected Areas in Communications 18(9), 1774–1786 (2000)
6. Wang, C., Li, B., Li, L.: A new collision resolution mechanism to enhance the performance of IEEE 802.11 DCF. IEEE Transactions on Vehicular Technology 53(4), 1235–1246 (2004)
7. Weinmiller, J., Woesner, H., Ebert, J.-P., Wolisz, A.: Modified backoff algorithms for DFWMAC's distributed coordination function. In: Proceedings of the 2nd ITG Fachtagung Mobile Kommunikation, Neu-Ulm, Germany, pp. 363–370 (September 1995)
8. Carvalho, M.M., Garcia-Luna-Aceves, J.J.: Delay analysis of IEEE 802.11 in single-hop networks. In: Proceedings of the 11th IEEE International Conference on Network Protocols, Atlanta, USA (November 2003)
9. Carvalho, M.M., Garcia-Luna-Aceves, J.J.: Modeling single-hop wireless networks under Rician fading channels. In: Proceedings of WCNC 2004 (March 2004)
10. Tickoo, O., Sikdar, B.: On the impact of IEEE 802.11 MAC on traffic characteristics. IEEE Journal on Selected Areas in Communications 21(2), 189–203 (2003)
11. Tickoo, O., Sikdar, B.: Queueing analysis and delay mitigation in IEEE 802.11 random access MAC based wireless networks. In: Proceedings of IEEE INFOCOM, Hong Kong, China, pp. 1404–1413 (March 2004)
12. Tickoo, O., Sikdar, B.: A queueing model for finite load IEEE 802.11 random access MAC. In: Proceedings of IEEE International Conference on Communications, vol. 1, pp. 175–179 (June 2004)
13. Özdemir, M., McDonald, A.B.: A queueing theoretic model for IEEE 802.11 DCF using RTS/CTS. In: The 13th IEEE Workshop on Local and Metropolitan Area Networks, pp. 33–38 (April 2004)
14. Bianchi, G.: Performance analysis of the IEEE 802.11 distributed coordination function. IEEE Journal on Selected Areas in Communications 18(3), 535–547 (March 2000)
15. Ross, S.M.: Stochastic Processes, 2nd edn. John Wiley & Sons, Chichester (1996)
16. Hwang, G.U., Lee, Y., Chung, M.Y.: A new analytic method for the IEEE 802.11 distributed coordination function. In revision for IEICE Transactions on Communications (March 2007)
17. Chang, C.-S.: Performance Guarantees in Communication Networks. Springer, Heidelberg (2000)

Communications Challenges in the Celtic-BOSS Project

Gábor Jeney¹, Catherine Lamy-Bergot², Xavier Desurmont³, Rafael Lopez da Silva⁴, Rodrigo Álvarez García-Sanchidrián⁵, Michel Bonte⁶, Marion Berbineau⁷, Márton Csapodi⁸, Olivier Cantineau⁹, Naceur Malouch¹⁰, David Sanz¹¹, and Jean-Luc Bruyelle¹²

¹ Budapest University of Technology and Economics, Hungary ([email protected])
² THALES Communications, France ([email protected])
³ Multitel Asbl, Belgium ([email protected])
⁴ Telefónica Investigación y Desarrollo, Spain ([email protected])
⁵ Ingeniería y Economía del Transporte, S.A., Spain ([email protected])
⁶ ALSTOM-TRANSPORT, France ([email protected])
⁷ Institut National de Recherche sur les Transports et leur Sécurité, France ([email protected])
⁸ EGROUP-Services Ltd, Hungary ([email protected])
⁹ BARCO-SILEX, Belgium ([email protected])
¹⁰ University Pierre and Marie Curie, France ([email protected])
¹¹ Société Nationale des Chemins de fer Français, France ([email protected])
¹² Université Catholique de Leuven ([email protected])

Abstract. The BOSS project [1] aims at developing an innovative and bandwidth-efficient communication system to transmit large data rate communications between public transport vehicles and the wayside, in order to answer the increasing need from public transport operators for new and/or enhanced on-board functionality and services, such as passenger security and exploitation services such as remote diagnostics or predictive maintenance. As a matter of fact, security issues, traditionally covered in stations by means of video-surveillance, are clearly lacking on board trains, due to the absence of efficient transmission means from the train to a supervising control centre. Similarly, diagnostic and maintenance issues are generally handled when the train arrives in stations or during maintenance stops, which prevents proactive actions from being carried out. The aim of the project is to circumvent these limitations and offer a system-level solution. This article focuses on the communication system challenges.

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 431–442, 2007. © Springer-Verlag Berlin Heidelberg 2007

G. Jeney et al.

1 Introduction

The purpose of the BOSS project is to design, develop and validate an efficient railway communication system, both on board trains and between the train and the wayside, with a guaranteed QoS (Quality of Service) level to support the high demands for services such as security on board trains and predictive maintenance, as well as providing internet access and other information services to travellers. The principle of the overall architecture proposed to meet this goal is given in Figure 1. Railway communications can be considered highly challenging because of the specific environment (safety, speeds up to 350 km/h, tunnels, low cost, etc.). Therefore, the results from the BOSS project will also serve other application areas, such as road applications.

The work in the BOSS project will focus on providing an optimised communication system relying on an IPv6 (Internet Protocol version 6) architecture for both on-board and wayside communications, which will offer a high QoS level to support both the high demands of passenger security on board trains and the high throughputs with lower QoS requirements of the travellers. The BOSS project will lead to the availability of validated new and/or enhanced passenger services, such as passenger security, predictive maintenance and internet access on board public transport vehicles such as trains, through an efficient interconnection between land and on-board communication systems with a guaranteed level of QoS. Moreover, in order to validate the design and also integrate real requirements, realistic specifications and simulations adapted from real-life inputs will be validated by the BOSS project through the implementation and testing of two different transmitting

Fig. 1. The BOSS project architecture


techniques (typically among WLAN, WiMAX, UMTS or similar systems). This will demonstrate the wireless connection capability both for surveillance needs inside the carriages and from the train to the surveillance centre, as well as for internet access within the train. This technology platform will make it possible to demonstrate the feasibility of on-board surveillance in transport, leading to an increased level of security services and comfort in public transport for European citizens.

It should be noted that the reason why video applications are a key issue in the project is that the bandwidth they require is far from negligible, which will have a deep impact on the choice of wireless accesses and the design of end-to-end QoS solutions, as well as on handover capabilities. Furthermore, video streams have to be transmitted simultaneously from several trains to the control centre when alarms are raised. The problems of radio resource management and multi-user access are therefore also important issues in the BOSS project. To reach these goals, the following axes will be investigated:

– establish interconnectivity between internal wired, internal wireless (e.g. WLAN) and external wireless (e.g. WiMAX, UMTS, 3GPP+, 802.22, . . . ) systems, with handover issues on external wireless links, coverage and multi-user issues,
– ensure guaranteed QoS, including while performing handovers, to allow for end-to-end QoS over the whole system, and manage the different QoS levels on the different data links,
– develop robust and efficient video coding tools that can be embedded in handheld devices and respect low-delay requirements (for mobile supervisors' use),
– propose video analysis solutions for security issues,
– provide audio processing tools to complement video and improve the robustness of alarm generation.

This document focuses on the first two areas. Since the project started in October 2006 and runs for 2.5 years, there are no specific results worth mentioning yet. The primary aim of this publication is to receive feedback from the scientific community on the approach the project has chosen to follow.

2 State-of-the-Art Situation

2.1 State-of-the-Art on Dual Mobility for Transmission with QoS

Mobility over IP networks. Mobile IPv6 (MIPv6) provides layer-3 mobility that is transparent to the higher layers (for example TCP), so a mobile node remains reachable at its home address without needing to be connected to its home network. The transition, or handover, between networks is transparent to the higher layers, and the connectivity loss produced during the handover is due to the exchange of the corresponding signalling messages. Every Mobile Node (MN) has a local address (or Home Address), which represents its original network address. This address remains the same independently of the mobile node


position: when moving to another network, the node still keeps its home address. Packets sent to the mobile node while it stays in its original network are routed normally, as if the node were not mobile. The prefix of this address is the same as the prefix of the network where the node originated. When a mobile node moves to a different network, it obtains a guest address (Care-of Address, CoA) belonging to the address space of the visited network. The mobile node can acquire its care-of address through conventional IPv6 mechanisms, such as stateless or stateful auto-configuration. From then on, it can also be reached at this new address (in addition to the home address). After obtaining the new address, the mobile node contacts a specific router in its home network (the Home Agent, HA) and, during the registration process, registers its current CoA. Afterwards, when a packet is sent to the mobile node's home address, the Home Agent intercepts it and tunnels it to the mobile node's CoA. With this mechanism, packets reach the mobile node at any location, because the CoA belongs to the address space of the subnet where it is connected.

Handovers over wireless systems. Handover refers to the process of changing access points during communication. It can be divided into two separate cases: horizontal handover (HH) and vertical handover (VH). Horizontal handover means changing access points inside the same network, i.e. the user changes its geographic position and a new access point is assigned to maintain the communication link. For example, horizontal handover happens when a mobile service subscriber exits a cell and enters another. Vertical handover means switching between different access technologies (networks) which are available at the same geographic location, without disturbing the communication. For instance, if both GSM/GPRS and UMTS networks are available, the user switches from GSM/GPRS to UMTS, typically in the hope of higher bandwidth.
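The Home Agent behaviour described above can be sketched as a simple binding-cache lookup. This is an illustrative toy model only, not an implementation of the Mobile IPv6 specification, and the addresses below are hypothetical documentation addresses:

```python
# Toy sketch of a Home Agent's binding cache (illustrative only).
binding_cache = {}   # home address -> care-of address

def register_binding(home_addr, care_of_addr):
    """Binding Update: record where the mobile node currently is."""
    binding_cache[home_addr] = care_of_addr

def forward(dst_addr):
    """Decide how the Home Agent handles a packet for dst_addr."""
    if dst_addr in binding_cache:
        return ("tunnel", binding_cache[dst_addr])   # MN away from home
    return ("direct", dst_addr)                      # MN at home

register_binding("2001:db8:home::1", "2001:db8:visited::42")
```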
Another goal is to implement IP-based vertical handover protocols. With such protocols, transparent mobility management can be realised (switching between access networks, roaming). An important scope of such protocols is support for the multimedia streaming services now spreading in mobile communication. The most serious problem is the IP address change caused by the handover: during the handover the mobile client roams to another access technology, and at the same time to another service provider; after the handover a new IP address is assigned to it. The objective of the protocol is to hide the change of address from the upper layers, both at the mobile device and at the communication partner. Since we focus on TCP/UDP traffic, the protocol must handle the socket pair on both sides to maintain the connection state.

End-to-end QoS over heterogeneous networks. Special care will be taken to address end-to-end service continuity and QoS support across wired and ad hoc wireless networks. Providing Quality of Service in the network, especially over several hops in a heterogeneous and dynamic environment, is a challenging task. Static internet Quality of Service (QoS) solutions are difficult to implement in mobile ad hoc networks: the IntServ/RSVP architecture QoS


model is not suitable due to the limited resources (e.g. the amount of state information increases dramatically with the number of flows and the nodes must perform admission control, classification and scheduling functions). The DiffServ architecture would seem a better solution. However, DiffServ is defined for a fixed infrastructure in which there are boundary DiffServ routers that perform QoS functions and a Service Level Agreement (SLA) that defines the kind of contract between the Internet Service Provider (ISP) and the client. In a mobile ad hoc network it may be difficult to identify what are the boundaries. A node should be able to work both as a boundary and as an interior router. That complicates the tasks of the nodes. Furthermore the very concept of SLA is also difficult to define in such environments. There are several proposals based on QoS aware routing and QoS aware MAC layer. However, QoS architectures and protocols still need to be studied further, especially with respect to node mobility, network diameter and processing power. Transport protocols face major challenges for the end-to-end support of QoS in wired/wireless heterogeneous networks. The main transport protocol, TCP (Transmission Control Protocol), has been designed to provide reliable end-to-end data delivery. The mechanisms have been designed and tuned for wired networks and ignoring the specificities of the wireless medium, such as high bit error rates, frequent loss of connectivity, power constraints. . . Streaming applications, especially for audio and video, share a preference for timeliness over reliability. These applications tend to use RTP (Real Time Protocol) in place of TCP, to avoid the built-in congestion control mechanisms of TCP that lead to increased delay in the delivery of packets. 
The coexistence of congestion-controlled traffic (TCP) and non-congestion-controlled traffic (RTP/UDP) on the same network induces a lack of fairness between the different sessions and thus poses severe challenges to the overall QoS architecture. The project will study this issue, review the work carried out in the IETF DCCP (Datagram Congestion Control Protocol) group and propose specific congestion control mechanisms adapted to the BOSS heterogeneous wired/wireless mesh network. The BOSS project will study and propose an end-to-end QoS framework that applies both to fixed and to ad hoc network extensions. The key features of this framework will comprise the definition of a QoS policy for the ad hoc and public safety world, in compliance with management and accounting rules; the identification, signalling and marking of flows to render a co-ordinated end-to-end service; and the development of techniques for prioritising flows in each individual sub-network.

QoS over IP networks. In itself, QoS is an intuitive concept, defined by ITU-T Rec. E.800 as "the collective effect of the service performance which determines the degree of satisfaction of a user of the service", or "a measure of how good a service is, as presented to the user; it is expressed in user-understandable language and manifests itself in a number of parameters, all of which have either subjective or objective values." Even though these definitions are quite simple and comprehensive on a global level, it is generally complex to derive from them concrete measures that reflect specific network requirements or constraints. Furthermore, realising applications conforming to subjective parameters can be extremely difficult due to contrasting user needs. For these reasons, standards organisations and network researchers have spent considerable effort on mapping the end-user perspective to specific network requirements. The results essentially report the subjective opinion of a proper set of test users with regard to their satisfaction with the service (for example, viewing a video or listening to recorded audio), which in turn depends on several aspects, network-related ones in particular in a telecommunication scenario. ITU-T has defined in Rec. P.800 a Mean Opinion Score (MOS) scale, which can be used to map the subjective opinion score to a qualitative value.

New wireless solutions in mobility mode. The current state of mobile communication and consumer electronics can be characterised by the convergence of devices and by the growing need to connect those devices. In this last respect, simplicity and security are the two primary goals. Cumbersome network settings can possibly be dealt with in the computer world, but certainly not in the mobile and consumer electronics world. The main driver for creating the Near Field Communication Interface and Protocol (NFCIP-1) was to enable users to create a connection between two devices without any special knowledge about the "network", yet so that any NFC-compliant device could be connected securely. The concept is strikingly simple: in order to make two devices communicate, bring them together or make them touch. As the two devices identify each other, they exchange their configuration data via NFC, and then the devices can set up and continue communication either with NFC or via other (longer-range and faster) communication channels (such as Bluetooth or WiFi).
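Returning to the QoS metrics above: a common way to relate network-level impairments to the subjective MOS scale is the ITU-T G.107 E-model, which condenses impairments into a transmission rating factor R and converts it to an estimated MOS with a standard formula. A minimal sketch (the formula is the standard G.107 mapping; the sample R values are illustrative):

```python
def r_to_mos(r: float) -> float:
    """Convert an E-model R-factor (0..100) to an estimated MOS,
    using the standard ITU-T G.107 mapping."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6

# Illustrative values: a clean narrowband call (R around 93) rates
# near "good/excellent", while heavy impairment drops towards "poor".
for r in (93.2, 70.0, 50.0):
    print(f"R = {r:5.1f}  ->  MOS = {r_to_mos(r):.2f}")
```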
In Fall 2002, Sony and Philips reached agreement on the development of Near Field Communication (NFC) technology. In order to promote NFC worldwide, the two companies submitted the draft specifications to ECMA International, the organisation responsible for standardising information and communication systems. After developing open technical specifications, NFCIP-1 was approved as ECMA-340, and subsequently submitted by ECMA International to ISO/IEC. It has received approval as ISO/IEC IS 18092. The NFCIP-2 standard allows interoperation with RFID, Proximity Card (ISO/IEC 14443:2001, Identification cards – Contactless integrated circuit(s) cards – Proximity cards) and Vicinity Card (ISO/IEC 15693:2001, Identification cards – Contactless integrated circuit(s) cards – Vicinity cards) devices and readers by defining a mechanism for selecting among the three operation modes as part of establishing the connection.

Communication systems for video surveillance tools. The combination of video surveillance and communication systems has been tested for metro applications in the PRISMATICA [14], STATUE [15] and ESCORT [16] projects, and a first-generation system is already deployed in specific tunnel environments on RATP line 14 and in the Singapore underground [17]. In the case of urban buses, the RATP systems AIGLE and ALTAÏR were a first step, based on low-capacity professional radio. Projects like TESS [13], SECURBUS in Belfort [18], LOREIV in Marseilles [19] and EVAS can also be mentioned. In the TESS project, a DVB-T link using the Worldspace geostationary satellite was successfully tested to transmit passenger information in urban buses. WiMAX has been tested in Lille, and innovative tools for audio event detection have been proposed to design a new bimodal surveillance system adapted to an embedded context. Some of the building blocks developed there will be re-used in the BOSS project in the context of the railway environment. Existing cellular systems allowing multimedia communication, such as GSM, GPRS, EDGE and UMTS, offer poor bandwidth efficiency in the downlink direction. Besides the problem of high communication costs in nominal traffic mode, it is impossible for these systems to cope with the flexibility and high traffic demand of a crisis situation.

2.2 State-of-the-Art on Wireless Mobile Transmission Links

With today's offerings, combined high data rate and mobile access is not realistic, particularly from the train to the ground, due to the poor spectral efficiency of existing systems. In the French national project TESS [13], the inadequacy of existing cellular systems was demonstrated for data transmission between a moving urban bus and a control centre. Recently, many new signal processing schemes have been proposed and are being studied to increase wireless link capacity while taking mobility issues into account. The main topics of interest are MIMO/MTMR links, adaptive links, cross-layer optimisations, interference cancellation, etc. Some of these techniques are proposed as options in the standards and need to be evaluated to verify that they will meet the expected capacity and robustness. Ongoing standards such as 802.16e, 3GPP+, 802.20 and 802.22 aim at providing both mobility and high throughput. To do so, the signal processing schemes that are generally foreseen are based on diversity techniques (MIMO), new coding and modulation schemes, and adaptivity, as presented in the previous section. These issues thus need to be qualified in order to evaluate their impact on system performance. Convergence between WMAN and 3GPP+ is also foreseen, which could lead to lower-cost deployments. For the application of these wireless links to the project objective, we can identify two major issues. The security application will mainly require a large uplink (from the train to the ground), whereas travellers' needs would rather call for an ADSL-like service with a large downlink throughput and a smaller uplink. In this scope, two main directions will be followed: increasing the throughput and increasing the coverage. In both cases, multiple-element antennas are appropriate, as the capacity increase may be used to work at lower SNR and/or to increase the throughput.
For a capacity increase to be achievable using point-to-point links, it is necessary to have sufficient propagation diversity for the MIMO channel to increase the capacity. Notice that the antennas could be distributed along the train. In a multipoint-to-point link, the capacity increase is obtained by establishing multiple links with either the train or the access points (the links sharing the same bandwidth and transmitting simultaneously). A harsh issue in this specific context is the short coherence time of the propagation channel. Indeed, due to the speed of the trains and the specific propagation conditions, the propagation channels will vary much faster than those experienced in today's standards developments, which focus on pedestrian applications. Efficient capacity-increasing schemes rely on knowledge of the propagation channel, requiring fast tracking/prediction and robust communication links.
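The capacity gain that the MIMO schemes above aim for can be made concrete with the standard MIMO capacity formula C = log2 det(I + (ρ/Nt)·H·Hᴴ) for equal power split over Nt transmit antennas. The following self-contained sketch evaluates it for a 2×2 link; the channel matrices are illustrative toy examples, not measured train-to-ground channels:

```python
import math

def mimo_capacity_2x2(h, snr_linear):
    """Capacity (bits/s/Hz) of a 2x2 MIMO link with equal power split:
    C = log2 det(I2 + (snr/2) * H * H^H).
    h is [[h11, h12], [h21, h22]] with (possibly complex) entries."""
    # G = H * H^H (2x2 Hermitian Gram matrix)
    g = [[sum(h[i][k] * h[j][k].conjugate() for k in range(2))
          for j in range(2)] for i in range(2)]
    a = snr_linear / 2.0
    # det of the 2x2 matrix M = I + a*G
    m00 = 1 + a * g[0][0]
    m11 = 1 + a * g[1][1]
    det = m00 * m11 - (a * g[0][1]) * (a * g[1][0])
    return math.log2(abs(det))

# Rich scattering (full-rank channel) vs. keyhole-like (rank-1) channel:
# the full-rank channel supports two spatial streams, so its capacity
# grows roughly twice as fast with SNR.
rich = [[1, 0], [0, 1]]
rank1 = [[1, 1], [1, 1]]
snr = 10 ** (20 / 10)  # 20 dB
print(mimo_capacity_2x2(rich, snr))
print(mimo_capacity_2x2(rank1, snr))
```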

3 Relevance to Market Needs and Expected Impact

The BOSS project will provide the communication systems needed to develop new telecommunications applications in the domain of public transport. The major impact will be on societal benefits to European end-users, both in terms of confidence in the security of the transport system and in terms of comfort and transport efficiency, especially during off-peak hours.

3.1 Technological Innovation and Strategic Relevance

The objectives presented in Section 1 will be reached by the development of a dedicated architecture relying on the interconnection of the indoor world (the train) and the outdoor world via a common and unique IP gateway. As illustrated by Figure 1, the IP gateway will allow the interconnection of a set of both wired and wireless accesses:

– Cameras and microphones dedicated to bimodal surveillance, aiming at ensuring a high security level for passengers and employees by alerting on abnormal events;
– Sensors dedicated to maintenance needs, aiming at providing alarms and machine-state information to any supervisor able to manage them;
– Mobile units with wireless access via an indoor wireless network (e.g. WLAN), carried by on-board supervisors (e.g. controllers), aiming at ensuring a second level of safety/security by being able to react to alarms/alerts and also to release such alarms/alerts;
– Wireless access via an outdoor network (e.g. UMTS/WiMAX) between the train and a ground control centre, where in-depth event detection or maintenance analysis can be carried out, and specific security/safety actions launched (e.g. calling the police to the next train stop).

New approaches to mobility. Due to its target use case, the BOSS project considers not one but two levels of mobility. The first corresponds to mobility inside the train itself, which can be viewed as a traditional setting for the mobile user within the train reference frame. The second takes into account the mobility of the train itself in the terrestrial reference frame. As such, the BOSS project targets a dual mobility mode with a guaranteed QoS level.

Example of scenario for the BOSS project demonstration. Based on the BOSS IP gateway architecture presented in Figure 1, and on the partners' experience in the domain, a first example scenario is described hereafter in order to illustrate the type of application that the BOSS architecture could offer. This scenario corresponds to the transmission of an alert, for example due to a passenger's health problem, to the distributed control centre for immediate action.

Communication from/to train and distributed control centre (see Figure 2). A train (especially a sub-urban train) will cross several types of wireless network, from GSM-R/GPRS/UMTS/WiMAX and WLAN to future systems such as 802.16e (mobile WiMAX) and 802.22 WRAN (Wireless Regional Area Network). Based on its IP mobility capability, the BOSS architecture will select the adequate outdoor wireless link for transmission of the data to the control centre. In practice, the IP gateway sees the outdoor wireless link as just another link in its network, and the control centre as another node in the global IP network. In one or several hops, the data (alert, context, etc., as well as other data if necessary) are then transmitted to the control centre for analysis or action.

Abnormal event detection (see Figure 3). A camera installed in the passenger car, and connected to the IP gateway by a wired link, films the scene, which is then analysed by event detection tools that generate an automatic alert in case an abnormal event is detected; alternatively, a manual alert is transmitted over a wireless link to the IP gateway by a mobile supervisor (e.g. a controller). The health problem is detected here. In the same way, a set of microphones connected to the IP gateway captures the sound environment inside the vehicle. Audio signals are then analysed to automatically detect certain events. In overcrowded environments where occlusions appear, visual analysis alone is not always sufficient to reliably understand

Fig. 2. Scenario example: communication from/to train and distributed control centre

Fig. 3. Example: video surveillance and abnormal event detection (health problem)

passengers' activity. Sound can provide salient information to resolve ambiguities and improve the detection/identification rate.

Decision and feedback. Once the alert has been given, the control centre will launch the adapted procedure. If available, a local team will be able to take over immediately, assembling the appropriate medical team and sending it to the next train station for immediate care of the passenger. In the case where the alert was an automatic one, the control centre could also decide to alert the controller via a return channel, to direct him/her to the sick passenger and help direct the medical team. This feedback capability will also be used by the event detection systems to adapt their detection process and sensitivity according to relevance information from surveillance operators.

4 Conclusions

This article introduces the Celtic project BOSS. The BOSS project intends to develop a communication system relying on an IP gateway inside the train that will enable communications both inside the train (inside carriages, and for mobile passengers and controllers) and outside the train, mobile in the terrestrial reference frame, with a link towards wireless base stations (e.g. WiMAX, DVB). The BOSS project will consequently work on a dual mobility level, and will work to guarantee a differentiated Quality of Service for the different targeted services. The project partners will also work on the adaptation of video surveillance applications, in particular via

– the robustification of the existing tools and the development of behaviour analysis algorithms, to ensure that passenger security is handled in the best possible way;
– the addition of audio processing tools, to increase the confidence in a generated alarm in situations where video analysis alone is not sufficient.

As an enhanced level of railway passenger security services is highly demanding in terms of bandwidth, this application represents a good case study to validate the BOSS concepts. Moreover, taking advantage of the bandwidth made available both downlink and uplink, wireless communication solutions which greatly interest travellers, such as video on demand, Internet access and travel information services, will be integrated in the global BOSS framework via an adapted level of service management. The BOSS project, with its IP gateway, aims to offer the mobile train the possibility to inform a control centre of both security-related and train-operation-related issues, and consequently to greatly increase user protection, while demonstrating the possibility to offer at the same time video on demand, on-board information and telecommunication services to the travellers. Validation will be performed through on-line tests on trains in revenue service.

Acknowledgement. The editor of this document wishes to thank all the BOSS project members whose contributions made this publication possible.

References
1. http://www.celtic-boss.org
2. Haritaoglu, I., Harwood, D., Davis, L.: W4: real-time surveillance of people and their activities. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 809–830 (2000)
3. Oliver, N., Rosario, B., Pentland, A.: A Bayesian computer vision system for modelling human interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence 22, 831–843 (2000)
4. Johnson, N., Hogg, D.: Learning recognition. Image and Vision Computing 14, 609–615
5. Hongeng, S., Brémond, F., Nevatia, R.: Representation and optimal recognition of human activities. In: IEEE Proceedings of Computer Vision and Pattern Recognition (2000)
6. Vu, T., Brémond, F., Thonnat, M.: Automatic video interpretation: a novel algorithm for temporal scenario recognition. In: Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI03) (2003)
7. Desurmont, X., Chaudy, C., Bastide, A., Parisot, C., Delaigle, J.F., Macq, B.: Image analysis architectures and techniques for intelligent systems. In: IEE Proc. on Vision, Image and Signal Processing, Special Issue on Intelligent Distributed Surveillance Systems (2005)
8. Huang, T., Russell, S.: Object identification in a Bayesian context. In: Proceedings of International Joint Conference on Artificial Intelligence, Nagoya, Aichi, Japan, August 23–29, pp. 1276–1283 (1997)
9. Kettnaker, V., Zabih, R.: Bayesian multi-camera surveillance. In: IEEE Conference on Computer Vision and Pattern Recognition, Fort Collins, Colorado, 23–25 June, pp. 253–259 (1999)
10. Javed, O., Rasheed, Z., Shafique, K., Shah, M.: Tracking across multiple cameras with disjoint views. In: Proceedings of the Ninth IEEE International Conference on Computer Vision, Nice, France, pp. 952–957 (2003)
11. Pedersini, F., Sarti, A., Tubaro, S.: Multi-camera systems. IEEE Signal Processing Magazine 16(3), 55–65 (1999)
12. Wren, C.R., Rao, S.G.: Self-configuring lightweight sensor networks for ubiquitous computing. In: Proceedings of the International Conference on Ubiquitous Computing, Seattle, WA, USA, 12–15 October 2003, pp. 205–206 (2003)
13. Final report for TESS project: Les systèmes de communications satellites ou terrestres pour les flottes d'autobus urbains (satellite or terrestrial communication systems for urban bus fleets) (March 2003)
14. Final report, 5th PCRD project PRISMATICA
15. Final report, PREDIT2 – STATUE project
16. Final report, ESCORT – Enhanced diversity and Space Coding for underground metrO and Railway Transmission, IST 1999–20006 project (December 2002)
17. INRETS synthesis document: Synthèse INRETS No 40, Nov 2001, Les systèmes de télécommunication existants ou émergents et leur utilisation dans le domaine des transports guidés (existing or emerging telecommunication systems and their use in guided transport)
18. SECURBUS – ACTES de la journée sécurité dans les transports (Proceedings of the Day on Security in Transport Means), Belfort, France (January 2002)
19. Communication avec les mobiles: application au trafic et aux transports routiers (communication with mobiles: application to traffic and road transportation). Collection du CERTU (March 2001), ISBN 2-11-090861-0, ISSN 0247-1159
20. UITP (International Association of Public Transport), http://www.uitp.com
21. Vacher, M., Istrate, D., Serignat, J.F.: Sound detection through transient models using wavelet coefficient trees. In: Proc. CSIMTA, Cherbourg, France (2004)
22. Khoudour, L., Aubert, D., Bruyelle, J.L., Leclerq, T., Flancquart, A.: A distributed multi-sensor surveillance system for public transport applications. In: Intelligent Distributed Video Surveillance Systems, ch. 7, IEEE, Los Alamitos (to appear)
23. Bruyelle, J.L., Khoudour, L., Velastin, S.A.: A multi-sensor surveillance system for public transport environments. In: 5th International Conference on Methods and Techniques in Behavioral Research, The Netherlands (2005)
24. Sun, J., Velastin, S.A., Vicencio-Silva, M.A., Lo, B., Khoudour, L.: An intelligent distributed surveillance system for public transport. In: European Workshop on the Integration of Knowledge, Semantics and Digital Media Technology, London, UK (2004)

Performance Analysis of the REAchability Protocol for IPv6 Multihoming

Antonio de la Oliva¹, Marcelo Bagnulo², Alberto García-Martínez¹, and Ignacio Soto¹

¹ Universidad Carlos III de Madrid
² Huawei Lab at Universidad Carlos III de Madrid
{aoliva,marcelo,alberto,isoto}@it.uc3m.es

Abstract. There is ongoing work in the IETF aimed at providing support for different flavors of multihoming configurations, such as SHIM6 for multihomed sites, multiple-CoA support in MIP for multihomed mobile nodes, and HIP for multihomed nodes and sites. A critical aspect for all the resulting multihoming protocols is to detect failures and gain information about the paths available between two hosts. The Failure Detection and Locator Path Exploration Protocol (in short, REAchability Protocol, REAP), being defined in the SHIM6 WG of the IETF, is a good candidate for inclusion as the reachability detection component of protocols requiring this functionality. We perform a performance study combining analytical estimations and simulations to evaluate its behavior and tune its main parameters.

Keywords: multihoming, failure detection, SHIM6, REAP.

1 Introduction

So far, IPv4 has failed to provide a scalable solution to preserve established communications after an outage for arbitrarily small sites connected to the Internet through different providers. This is so because the current IPv4 multihoming solution, based on the injection of BGP routes [BGPMULT] in order to make a prefix reachable through different paths, would collapse if the number of managed routing entries increased to accommodate small sites and even hosts. In particular, this restriction prevents end hosts equipped with different access interfaces, such as IEEE 802.11, UMTS, etc., connected to different providers, from benefiting from fault tolerance, traffic engineering, etc. On the other hand, the huge address space provided by IPv6 has enabled the configuration of public addresses from each of the providers of an end host. A step further is being taken in the SHIM6 WG of the IETF to develop a framework to manage, in an end-to-end fashion, the use of the different addresses in a communication held between two hosts. To achieve this, a SHIM6 layer is included

This work was supported by IST FP6 Project OneLab and by the Spanish MEC through Project CAPITAL (TEC2004-05622-C04-03).

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 443–454, 2007. © Springer-Verlag Berlin Heidelberg 2007

inside the IP layer to ensure that the same IP address pair is presented to the upper layers to identify a given communication, while the packets flowing through the network can use different IP addresses (locators) in order to enforce different paths. The SHIM6 [SHIM] [SHIMAPP] layers of two communicating nodes that want to benefit from the multihoming capabilities first execute a four-way handshake to exchange, in a secure way, the relevant information for managing the IP addresses that play the roles of identifiers and locators. Once this exchange has been performed, both SHIM6 layers use the REAP (REAchability Protocol) protocol to detect failures in the currently used path in a timely manner and, once a failure is detected, to select a new path through which the communication can be continued. Note that while the REAP protocol is being defined as a component of the SHIM6 multihoming solution, it is envisioned that such a protocol could become a generic component in different scenarios in which end-to-end path validation is of paramount concern, such as HIP [HIP] or Mobile IPv6 with registration of multiple CoAs [MONAMI]. It is straightforward to conclude from the framework presented above that the REAP protocol determines the performance that upper layers perceive when an outage occurs in the communication path. The path failure detection function of the REAP protocol relies on timers driven by the inspection of upper-layer traffic, and by specific Keepalive probe packets when upper-layer traffic is too sparse. The current specification of the REAP protocol lacks experimental support to properly configure the timers of the protocol and to fully understand the interaction with transport layers such as UDP or TCP. In this paper we simulate the protocol with the OPNET tool (OPNET University Program, http://www.opnet.com/services/university/) and we analyze the configuration of these timers and their impact on different types of applications.

The remainder of the paper is organized as follows. A description of the REAP protocol is given in Section 2. The scenario used in the simulation, as well as the details of the simulation environment, are shown in Section 3. Section 4 presents the results obtained regarding the behavior of the UDP (Section 4.1) and TCP (Section 4.2) protocols, providing an analysis of several application types. Finally, in Section 5 we provide the conclusions of our work.
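The identifier/locator split described above can be sketched as a translation table maintained by the shim sublayer: upper layers always see a stable identifier pair, while packets on the wire carry the currently selected locator pair. The class and field names below are illustrative, not the protocol's actual context format:

```python
# Sketch of the SHIM6 idea: the shim rewrites addresses below the
# transport layer, so TCP/UDP state bound to the identifiers survives
# a switch to alternate locators after a path failure.

class Shim6Context:
    def __init__(self, local_id, peer_id, locator_pairs):
        self.local_id, self.peer_id = local_id, peer_id
        self.locator_pairs = locator_pairs   # ordered candidate pairs
        self.current = 0                     # index of the pair in use

    def outbound(self, packet):
        """Rewrite identifier addresses to the current locators."""
        src, dst = self.locator_pairs[self.current]
        return {**packet, "src": src, "dst": dst}

    def inbound(self, packet):
        """Restore identifiers so transport state stays unchanged."""
        return {**packet, "src": self.peer_id, "dst": self.local_id}

    def rehome(self):
        """After a failure is detected, switch to the next pair."""
        self.current = (self.current + 1) % len(self.locator_pairs)

ctx = Shim6Context("2001:db8:a::1", "2001:db8:b::1",
                   [("2001:db8:a::1", "2001:db8:b::1"),
                    ("2001:db8:c::1", "2001:db8:d::1")])
pkt = {"payload": b"data", "src": "2001:db8:a::1", "dst": "2001:db8:b::1"}
ctx.rehome()             # path failure detected by REAP
out = ctx.outbound(pkt)  # now carried over the alternate locators
```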

2 Failure Detection and Path Exploration in the SHIM6 Architecture

The SHIM6 architecture is currently being defined by the IETF to provide multihoming between hosts with multiple provider-independent addresses. The architecture defines a shim sublayer, placed in the IP layer, which is responsible for ensuring that the same local and remote addresses are provided to the upper layers for the peers involved in a given communication, while at the same time different addresses can be used to allow the usage of different paths. As a consequence, two roles are assumed by the IP addresses: the term identifier is used for addresses passed to the transport and application layers, and the term locator is reserved for the actual addresses used for IP forwarding. SHIM6 defines
two components to manage the identifier/locator relationship between two communicating peers: the secure exchange between the peers of information related to identifiers and locators, performed by the SHIM6 protocol [SHIM], and the identification of communication failures together with the exploration of alternative paths.

Failure detection does not need specific tools while traffic is flowing between two hosts. On the other hand, when a node has no packets to send, it is irrelevant to the node whether the locator pair is properly working or not, since it has no information to transmit. So, a potentially relevant failure situation occurs when a node is sending packets but is not receiving incoming packets. Such a situation does not necessarily imply a failure, since a unidirectional flow may be being received, but it is indistinguishable from a failure without additional tests. In this case, the node needs to perform an explicit exchange of probe packets to discover whether the current locator pair is properly working. This exchange is described in the REAchability Protocol (REAP) specification [REAP]. The REAP protocol relies on two timers, the Keepalive timer and the Send timer, and a probe message, namely the Keepalive message. The Keepalive timer T_KA is started each time a node receives a data packet from its peer, and is stopped and reset each time the node sends a packet to the peer. When the Keepalive timer expires, a Keepalive message is sent to the peer. The Send timer T_Send, defined roughly as three times the Keepalive timer plus a deviation to accommodate the round-trip time, is started each time the node sends a packet and stopped each time the node receives a packet from the peer. If no answer (either a Keepalive or a data packet) is received within the Send period, a failure is assumed and a locator path exploration is started.
Consequently, the Send timer reflects the requirement that when a node sends a payload packet, there should be some return traffic within Send Timeout seconds. On the other hand, the Keepalive timer reflects the requirement that when a node receives a payload packet, there should be a similar response towards the peer within Keepalive seconds (if no traffic is interchanged, there is no Keepalive signaling). As a consequence, there is a tight relationship between the values of the timers defined by the REAP protocol and the time REAP requires to detect a failure. The current specification suggests a value of 3 seconds for the Keepalive timer and of 10 seconds for the Send timer, although these values are supported by neither analytical studies nor experimental data. Once a node detects a failure, it starts the path exploration mechanism. A Probe message is sent to test the current locator pair, and if no response is obtained during a period of time called the Retransmission timer T_RTx, the nodes start sending Probes testing the rest of the available address pairs, using all possible source/destination address pairs. Currently, a sequential algorithm is defined to drive the exploration of locator pairs, a behavior that will be assumed for the rest of the paper. So far, the REAP specification process has focused on functionality, without paying much attention to performance metrics under common operating conditions, such as the time required to detect a failure and recover from it. An experimental analysis would provide relevant guidelines to tune the main parameters that define the REAP behavior when combined with different types of applications. In particular,
interaction with applications using TCP should be considered, in order to characterize the interplay between REAP and the flow and congestion control mechanisms provided by this protocol. When UDP transport is considered, the resulting behavior is driven mainly by the application protocol; in this case, relevant applications should be analyzed. In the next sections we perform simulations that provide valuable information on REAP timer configuration for applications using UDP and TCP.
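The timer interplay described in this section can be sketched as a small event-driven model. The sketch below is a simplified illustration using the suggested default values (T_KA = 3 s, T_Send = 10 s), not the full REAP state machine from the specification:

```python
T_KA, T_SEND = 3.0, 10.0

class ReapNode:
    def __init__(self):
        self.send_deadline = None   # when the Send timer would expire
        self.ka_deadline = None     # when a Keepalive must be sent
        self.failed_at = None

    def on_send(self, now):
        """Payload packet sent: start the Send timer if not running,
        and cancel any pending Keepalive (the peer owes us traffic)."""
        if self.send_deadline is None:
            self.send_deadline = now + T_SEND
        self.ka_deadline = None

    def on_receive(self, now):
        """Payload or Keepalive received: stop the Send timer and
        start the Keepalive timer (we owe the peer a response)."""
        self.send_deadline = None
        self.ka_deadline = now + T_KA

    def tick(self, now):
        """Returns 'failure' if the Send timer expired with no return
        traffic, 'keepalive' if a Keepalive must be transmitted."""
        if self.send_deadline is not None and now >= self.send_deadline:
            self.send_deadline = None
            self.failed_at = now
            return "failure"
        if self.ka_deadline is not None and now >= self.ka_deadline:
            self.ka_deadline = None
            return "keepalive"
        return None

# Unidirectional sender with a broken path: no Keepalives come back,
# so a failure is declared T_Send seconds after the first unanswered
# packet, triggering the locator path exploration.
node = ReapNode()
node.on_send(0.0)
events = [node.tick(t) for t in range(1, 12)]
print(events)   # failure reported at t = 10
```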

3 Simulation Setup

In this section we present the scenario used to test the path failure detection functionality of the REAP protocol. Figure 1 shows two nodes, Node A and Node B, each one with two interfaces and an IPv6 address configured on each interface. All simulations have been performed by establishing a communication through the pair (IP_A1, IP_B1). All traffic exchanged between these IP addresses goes through Clouds 1 and 2. At a certain time, the link connecting Clouds 1 and 2 fails; this is detected by REAP and, after a path exploration, the communication is continued using the IP pair (IP_A2, IP_B2). The tests performed involve the TCP

Fig. 1. Simulated Scenario

and UDP protocols. The TCP tests, designed to evaluate TCP behavior in cases with high and low data rates, are performed using an FTP file download application and a Telnet application. The traffic used to evaluate UDP behavior corresponds either to a Voice over IP (VoIP) application showing bidirectional packet exchange or to a unidirectional voice flow. Note that unidirectional flows result in an increased exchange of REAP-specific packets. For TCP, the Windows XP model defined in OPNET has been used. For UDP, a VoIP conversation using the G.729 codec with a compression delay of 0.02 seconds has been modeled. The RTT in both paths is the same; it has been implemented as a normal distribution with mean 80 ms and variance 20 ms. The failure event occurs at a time defined by a uniform distribution between 75 and 125 seconds. All simulations have been run for 250 seconds; the presented results are the average
of 45 samples. The real values are within ±10% (in the worst case) of the estimated values, with a confidence level of 95%.
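A confidence bound of this kind follows from the sample mean and the standard error in the usual way; a sketch using the normal approximation (reasonable for 45 samples; the sample values below are illustrative, not the paper's measurements):

```python
import math
import statistics

def ci95_halfwidth(samples):
    """95% confidence half-width for the mean, using the normal
    approximation z = 1.96 and the sample standard deviation."""
    n = len(samples)
    sem = statistics.stdev(samples) / math.sqrt(n)
    return 1.96 * sem

# Illustrative recovery-time samples (seconds), not measured data:
samples = [10.4, 11.0, 10.7, 10.9, 10.5, 10.8, 10.6, 11.1, 10.3]
mean = statistics.mean(samples)
hw = ci95_halfwidth(samples)
print(f"mean = {mean:.2f} s, 95% CI = ±{hw:.2f} s "
      f"({100 * hw / mean:.1f}% of the mean)")
```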

4

Analysis of the Results

In order to find the values of the REAP timers that optimize the behavior of TCP and UDP when a path failure occurs, several measurements have been performed. The main metric used throughout the analysis is the Application Recovery Time, defined as the time difference between the last packet arriving through the old IP locators (addresses) and the first packet arriving through the new ones. This metric accurately measures the time to recover from a path failure when there is continuous traffic. The analysis is structured in the following items:

– UDP behavior: To fully understand the behavior of applications using UDP, two types of traffic have been considered: bidirectional traffic (a VoIP conversation) and unidirectional traffic (audio streaming).

– TCP behavior: TCP incorporates several characteristics, such as congestion control and reliability, that determine the resulting performance when a valid path is provided as a result of the REAP operation. To understand the behavior of applications using TCP, two traffic types have been considered: an FTP download from a server and a telnet session with sparse traffic exchange. With these two traffic types, the behavior of applications with high traffic demands and applications with low traffic profiles is considered.
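The Application Recovery Time metric is straightforward to compute from per-path packet traces; a minimal sketch with hypothetical timestamps (not the authors' tooling):

```python
def application_recovery_time(old_path_arrivals, new_path_arrivals):
    # Gap between the last packet seen on the old locator pair and the
    # first packet seen on the new one (the metric defined above).
    return min(new_path_arrivals) - max(old_path_arrivals)

# Hypothetical packet-arrival timestamps, in seconds
rt = application_recovery_time([99.8, 100.0], [104.2, 104.3])
```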

4.1 UDP Behavior

Consider that a failure occurs. When the TSend timer expires, the node tries to reach the peer by sending a probe to the IP address currently in use. This probe expires after TRtx seconds; at this time, a second probe is sent to the secondary IP address. The path exploration mechanism finalizes after the exchange of 3 probes per peer, directed to the secondary IP addresses. The time required to finalize the path exploration mechanism is almost constant (there is some variation due to the RTT variance), with a value of 0.7 seconds². Figure 2 shows the Recovery Time for different TSend values (TKA = TSend/3) and for two types of UDP applications: a bidirectional Voice over IP (VoIP) conversation and a unidirectional VoIP flow. The results follow the expected behavior, the relation between the Recovery Time and TSend being linear. This relation was expected to be linear since UDP does not react to path conditions, and once the path is restored traffic is immediately sent through the new IP locators. Note that in Figure 2 the Recovery Time of the unidirectional traffic is lower than that of the bidirectional one. The difference between them can be quantified and, on average, it is approximately

² This value was obtained experimentally, although it can be computed as 0.5 s + 3·RTT.
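With the simulated mean RTT of 80 ms, the footnote's expression gives a value close to the measured 0.7 s; a one-line sketch of that arithmetic:

```python
def exploration_time(rtt, t_rtx=0.5):
    # Approximate REAP path-exploration duration from the footnote:
    # the initial probe timeout plus roughly three RTT-paced exchanges.
    # Illustrative sketch, not the simulator's code.
    return t_rtx + 3 * rtt

t = exploration_time(0.08)  # mean RTT of 80 ms gives about 0.74 s
```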

448

A. de la Oliva et al.

Fig. 2. UDP Recovery Time

equal to TKA/2. This is due to the fact that when there is only unidirectional traffic, Keep Alive messages are always exchanged on a regular basis. When a failure occurs, the last Keep Alive was probably received at the peer side some time before the failure, so TSend was started when the first packet after the reception of that Keep Alive was sent, which is probably some time before the failure. On the other hand, if there is continuous traffic in both directions, the TSend timer is probably started closer to the time of the failure (the last time a packet was sent after the reception of a packet from the other side).

Keep Alive signaling in the unidirectional case. The worst case regarding the signaling overhead introduced by REAP is unidirectional communication traffic. If the traffic is unidirectional, Keep Alive messages are exchanged on a regular basis to keep the relation updated. Once a packet is received, the TKA timer is started; after this time period without sending any packet, a Keep Alive message is sent. The timer is not set again until a new packet arrives, hence the number of Keep Alive messages sent depends on the transmission rate of the source. If we call δ the time between two consecutive packets sent by the source, δ is an upper bound on the time between sending a Keep Alive message and starting the Keep Alive timer again. Finally, the number of Keep Alive messages sent per second is 1/(TKA + δ).
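The keep-alive overhead bound can be sketched as follows; the timer values below are assumptions chosen only for illustration:

```python
def keepalive_rate(t_ka, delta):
    # Upper bound on Keep Alive messages per second for a purely
    # unidirectional flow: at most one Keep Alive every (t_ka + delta)
    # seconds, where delta is the inter-packet gap at the source.
    return 1.0 / (t_ka + delta)

# Assumed values: T_Send = 9 s (so T_KA = 3 s), 50 packets/s voice flow
t_ka = 9.0 / 3
rate = keepalive_rate(t_ka, delta=0.02)
head_start = t_ka / 2  # average unidirectional recovery head start
```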

4.2 TCP Behavior

FTP Application. Figure 3 shows the Recovery Time achieved while varying the TSend timer. Note that the results for TCP traffic are not linear with


Fig. 3. TCP Recovery Time

the TSend parameter, as occurred with the UDP values (Figure 2). This behavior is due to the mechanisms implemented in TCP for congestion detection and avoidance, in particular the retransmission timeout of TCP. TCP interprets a retransmission timeout as an indication of congestion in the network, so it uses an exponential back-off when retransmitting packets to avoid increasing the congestion. This mechanism affects the Recovery Time: although the path has been reestablished, the packets will not be retransmitted until the retransmission timeout expires. A detailed explanation of this behavior is given in Figure 4(a), which presents, for a given experiment (TSend = 10 s), the Retransmission Timeout, the Congestion Window and the traffic sent through both paths. Traffic starts being sent through the primary path, until the link fails. At this moment the congestion window decreases and the retransmission timer increases. When the path exploration mechanism ends, the retransmission timer, set to 8 seconds, has not yet expired. When it expires, packets are sent according to TCP's slow start policy. Figure 5 shows the difference in time between the arrival of a packet in a connection with a failure and the arrival of the same packet if no failure in the path occurs. As can be observed, packets suffer a large delay when the link fails (equivalent to the time needed to discover the failure and complete the path exploration mechanism), and the delay then remains roughly constant. This effect is due to the increase of the congestion window after the communication is recovered: packets are sent faster until the congestion window reaches its maximum, after which packets are sent at a constant rate. This behavior can be observed in Figures 4(a) and 5. Following the explanation presented above, we argue that the stair-shaped graph in Figure 3 is caused by the back-off mechanism of the TCP retransmission timer.
Figure 6 presents the backoff mechanism used by TCP to set up the


(a) Normal TCP operation

(b) TCP operation resetting the retransmission timeout

Fig. 4. TCP behavior explanation

retransmission timer. As the number of retransmissions increases, the retransmission timer doubles its value. We argue that as TSend varies, the instant at which the path is recovered falls in one of the steps presented in Figure 6; this causes the differences in time presented in Figure 3. To improve
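The stair shape can be reproduced with a toy model of the doubling retransmission timer. This is our own illustration (assumed initial RTO and cap values), not the OPNET TCP model:

```python
def tcp_ready_time(restore_time, rto0=1.0, max_backoff=64.0):
    # Time at which TCP first retransmits after the path is restored,
    # assuming the RTO doubles on every expiry.
    t, rto = 0.0, rto0
    while t + rto < restore_time:
        t += rto
        rto = min(2 * rto, max_backoff)
    return t + rto  # next scheduled retransmission

# Recovery time grows in steps: every restore instant inside one
# backoff interval maps to the same retransmission instant.
for restore in (3.0, 5.0, 10.0, 14.0):
    print(restore, "->", tcp_ready_time(restore))
```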


Fig. 5. Difference in time between packets in a communication with path failure and without path failure

Fig. 6. TCP Retransmission Timeout

the performance, we propose to reset the TCP retransmission timer after a new path is chosen for the communication (Figure 4(b)). Notice that this is not only more efficient, but also appropriate, as the retransmission timer value depends on the properties of the path. In the simulator, we implemented a


Fig. 7. UDP vs TCP (resetting the retransmission timer) Recovery Time

hook in the TCP stack that allows REAP to reset the retransmission timer, forcing TCP to start retransmitting packets immediately. The experimental results of this proposal are presented in Figure 7, along with the previous results for UDP bidirectional VoIP traffic and TCP traffic, to ease the comparison. As expected, the relation between the TCP Recovery Time and TSend is linear; moreover, the modified TCP behavior is very similar to that of UDP.

Telnet Application. To conclude the TCP study, an application with low data traffic has been analyzed. The chosen application is Telnet, in which long periods of time without packet transmission usually occur. The design of REAP tries to minimize signalling overhead, so no Keep Alive probing is performed when no upper layer traffic is exchanged. Due to this behavior, REAP notices the failure a time TSend after the first packet sent after the failure. This behavior is shown in Figure 8. The Application Recovery Time metric here refers to the time elapsed between the first packet (using the old locators) sent after the failure and the first packet sent using the new locators; in this period, the failure discovery and path exploration mechanisms are performed. The start time offset of the recovery procedure depends on the time between the path failure and the time when the application first tries to send a packet, but this offset is not important, since the application is not affected by the failure while it is not trying to exchange any traffic. Figure 8 presents a trend similar to that of Figure 3, which is shown in the figure for comparison purposes. The impact on the Application Recovery Time of resetting the TCP retransmission timer is also shown in Figure 8; as can be seen, the effect is similar to the one presented in Figure 7, a noticeable decrease in the Recovery Time of the application.


Fig. 8. Recovery time in a telnet application

TCP Reset Time. One of the most important characteristics for the REAP protocol to work in a TCP environment is to handle the recovery before the TCP session expires. In order to measure how many address pairs may be checked in the path exploration phase before TCP resets the connection, several tests have been performed using the default TCP configuration of a Microsoft Windows Server 2003 machine³. The TCP stack implemented on it resets the TCP connection when a limit of 5 retransmissions is reached. The time between the failure detection and the reset of the connection is 75 seconds on average. The REAP specification sets a backoff mechanism for the retransmissions in the path exploration protocol. This mechanism follows Tout = 2^n, where n is the retransmission count. The exponential backoff starts when the number of addresses probed is higher than 4; for the first 4 retransmissions a Tout of 0.5 seconds is used. The exponential backoff is limited to 60 seconds, where it reaches its maximum; the Tout for the remaining retransmissions is 60 seconds. Taking the backoff mechanism into account, the number of IP address pairs that can be explored is 10. It is worth noticing that the first try of the REAP protocol is to check the IP pair currently in use. As the previous results show, the REAP protocol can check a large number of IP address pairs before the TCP session expires, providing a mechanism to restore the communication.
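Under one reading of this probe schedule (four probes paced 0.5 s apart, then timeouts of 2^n seconds capped at 60 s), the number of address pairs that fit before the roughly 75 s reset can be checked numerically; this is our reconstruction, not the REAP specification's pseudocode:

```python
def reap_probe_times(limit):
    # Send times of REAP exploration probes: the first four timeouts
    # are 0.5 s, then the timeout follows 2**n capped at 60 s.
    times, t = [0.0], 0.0
    timeouts = [0.5] * 4 + [min(2 ** n, 60) for n in range(1, 20)]
    for tout in timeouts:
        t += tout
        if t >= limit:
            break
        times.append(t)
    return times

# Windows Server 2003 resets the connection ~75 s after failure detection
probes = reap_probe_times(75.0)
print(len(probes))  # 10 address pairs can be probed before the reset
```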

5

Conclusion and Future Work

This paper presents details of the appropriate timer configuration of the REAP protocol as well as the effect of the protocol configuration on the two main

³ http://technet2.microsoft.com/WindowsServer/


transport protocols, TCP and UDP. We also present a possible modification to the TCP stack which enables TCP to take advantage of the REAP protocol, providing a faster Recovery Time. The results show a clear increase in the performance of failure recovery over TCP, making it comparable to UDP. As future work, there are several issues to be analyzed, such as the aggressiveness of the path exploration mechanism (for example, exploring alternative paths in parallel instead of serially) or a timer configuration based on the RTT, which would decrease the Recovery Time while minimizing false positives in failure detection. We are also interested in increasing the information provided by the REAP protocol, going from path availability to gathering some characteristics of the path (RTT being the simplest example), and finally in studying the combination of REAP with other protocols and scenarios, for example Mobile IP with multiple registrations and mobility.

References

[REAP] Arkko, J., van Beijnum, I.: Failure Detection and Locator Pair Exploration Protocol for IPv6 Multihoming. IETF draft, draft-ietf-shim6-failure-detection-06 (September 2006)
[LOCSEL] Bagnulo, M.: Default Locator-pair Selection Algorithm for the SHIM6 Protocol. IETF draft, draft-ietf-shim6-locator-pair-selection-01 (October 2006)
[HIP] Moskowitz, R., Nikander, P.: Host Identity Protocol (HIP) Architecture. RFC 4423
[MIP] Johnson, D., Perkins, C., Arkko, J.: Mobility Support in IPv6. RFC 3775
[SHIM] Nordmark, E., Bagnulo, M.: Level 3 Multihoming Shim Protocol. IETF draft, draft-ietf-shim6-proto-06 (November 2006)
[SHIMAPP] Abley, J., Bagnulo, M.: Applicability Statement for the Level 3 Multihoming Shim Protocol (shim6). IETF draft, draft-ietf-shim6-applicability-02 (October 2006)
[BGPMULT] Van Beijnum, I.: BGP: Building Reliable Networks with the Border Gateway Protocol. O'Reilly (2002)
[MONAMI] Wakikawa, R., Ernst, T., Nagami, K.: Multiple Care-of Addresses Registration. IETF draft, draft-ietf-monami6-multiplecoa-01 (October 2006)

Controlling Incoming Connections Using Certificates and Distributed Hash Tables

Dmitrij Lagutin and Hannu H. Kari

Laboratory for Theoretical Computer Science, Helsinki University of Technology, P.O. Box 5400, FI-02015 TKK, Finland
[email protected], [email protected]

Abstract. The current architecture of the Internet, where anyone can send anything to anybody, presents many problems. The recipient of a connection might be using a mobile access network, and thus unwanted incoming connections could produce a high cost to the recipient. In addition, denial of service attacks are easy to launch. As a solution to this problem, we propose the Recipient Controlled Session Management Protocol, where all incoming connections are denied by default and the recipient of the connection can use certificates to choose which incoming connections are allowed. The recipient can also revoke the right to make an incoming connection at any time.

Index terms: Session management, rights delegation, rights management, certificates, DoS countermeasures.

1 Introduction

In the current Internet architecture, the initiator of the communication controls the connection. This is fine when connecting to Internet servers, but this policy might cause problems when connecting to a private user directly. There are many reasons why the other endpoint, the recipient, would want to control which incoming connections are allowed. The recipient might be using a wireless access network that has limited bandwidth. The recipient might even have to pay for all network traffic, including incoming traffic. In addition, the recipient might be in a situation where he does not want to be disturbed by unnecessary connections, like sales calls; in this case, only really important connections from a limited set of initiators should go through. Many access networks' firewalls block all incoming connections unless the recipient has initiated the connection first; however, this is too restrictive a policy, and the recipient should have the option of receiving direct incoming connections from trusted initiators. Naturally, unwanted incoming connections could also be blocked using a personal firewall, but in that case they would still consume network resources. Thus, it is better to block unwanted connections already at the access network level, before they even reach the destination. Blocking unwanted connections would also make it much harder to launch denial of service and distributed denial of service attacks against the network or the recipient. The structure of this paper is as follows: in Chapter 2 we go through related work and the requirements of our system. Chapter 3 introduces the Recipient Controlled Session

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 455–467, 2007. © Springer-Verlag Berlin Heidelberg 2007

456

D. Lagutin and H.H. Kari

Management Protocol. Chapter 4 describes how packet security can be handled under the proposed system, Chapter 5 contains a comparison with other similar solutions, and Chapter 6 contains conclusions and discusses future work.

2 Requirements and Related Work

We challenge the traditional freedom of the Internet (i.e. "anyone can send anything to anybody") by setting requirements for the initiator. The initiator must have direct or indirect permission from the recipient in advance in order to bypass a gatekeeper that protects the recipient against garbage from the Internet. The basic requirement of the proposed system is that unauthorized incoming connection attempts are denied before they reach the destination. The blocking is done by a gatekeeper (i.e. the firewall of the access network). Detailed requirements for the system are listed in Table 1.

Table 1. Requirements for limiting incoming connections

Mandatory requirements:

R1. Blocking the unauthorized incoming traffic before the destination: The unauthorized incoming traffic should not disturb the recipient or cause any (monetary, energy or other) costs to the recipient or the access network of the recipient. This requirement is especially important in mobile networks, where the available bandwidth might be quite low and the recipient may have to pay also for incoming traffic.

R2. Rights for making an incoming connection should be revocable: The recipient must be able to revoke rights from the initiator at any given time.

R3. System must support mobility and change of the initiator's IP address: The initiator might use different networks, such as fixed LAN, wireless LAN and 3G, and might change between them frequently. A change of the initiator's network or IP address should not require a full renegotiation of rights.

R4. Verification of data packets: There should be a mechanism to verify that the incoming data really comes from a trusted initiator and is not forged or duplicated by a malicious party.

R5. Authentication of the initiator: The recipient must be certain that the initiator is really the authorized initiator and not some malicious party.

R6. Rights for making an incoming connection should be delegatable: If the recipient allows delegation, the initiator must be able to delegate rights to other parties.

R7. Resilience of the system: The system should not have a single point of failure.

Optional requirements:

R8. Good performance, low signalling overhead: Rights management should not consume an excessive amount of any network resource.

R9. System must support quality-of-service issues (future enhancement): The right for sending incoming data may contain bandwidth or other quality-of-service limitations.


Previous research in this field has mainly concentrated on the prevention of denial of service attacks. Anderson et al. [2] describe a system of capabilities to prevent denial of service attacks. The idea behind their system is that the initiator of the connection receives capability tokens beforehand from the recipient, and these tokens allow the initiator to send data to the recipient for a limited time. In this approach, there exists a network of Request-To-Send (RTS) servers coupled with verification points (VPs). The initiator sends a token request using RTS servers, which forward the request to the recipient, who allows or denies the request. In the former case, the capability token is returned to the initiator using the same RTS servers and the data connection can be established. RTS servers along the path take note of a token and pass token information to the coupled verification points. When VPs receive traffic from the initiator, they check whether the token included in the data packets is valid and allow only data packets with valid tokens to go through. This kind of solution limits incoming connections, but it does not satisfy all our requirements, especially R1, R3, R4 and R5. Under this scheme, token requests are always forwarded to the recipient, and thus even denied requests consume resources within the recipient's access network. Additionally, when the recipient receives a token request from the initiator, the recipient only knows the initiator's IP address, so the recipient cannot be completely sure who the initiator actually is. Since the recipient sends capability tokens unencrypted to the initiator, a malicious party can snoop tokens and thus gain the recipient's rights for itself. Finally, the recipient or the recipient's network cannot guarantee that data packets sent with valid tokens are indeed coming from the original trusted initiator and are not duplicated. Yaar et al.
[20] introduce the Stateless Internet Flow Filter (SIFF) to prevent distributed denial of service attacks. The aim of SIFF is to separate traffic into privileged and unprivileged traffic. Privileged traffic is treated with a higher priority by routers, while unprivileged traffic receives low-priority treatment, making it impossible to launch denial of service attacks using only unprivileged connections. In order to establish a privileged connection to the recipient, the initiator must first obtain a "capability" from the recipient. During the capability negotiation phase, routers along the path between the initiator and the recipient take note of capabilities and later check that the data packets carry the correct capability. The capability is bound to the initiator's IP address and is valid for a limited time, after which the recipient must renew it. While SIFF reduces the risk of denial of service attacks, it does not completely satisfy our requirements, especially R1, R3, R4, R5 and R6. For example, binding capabilities to a certain IP address presents some problems. The Host Identity Indirection Infrastructure (Hi3) [14] combines the Host Identity Protocol [13] with the Secure Internet Indirection Infrastructure (Secure-i3) [1]. Under the Host Identity Protocol (HIP), separate host identifiers are introduced to describe the identity of the host, and IP addresses are only used for determining the topological location of the host. Secure-i3 is based on the Internet Indirection Infrastructure (i3) [18]. The aim is to use a separate overlay network for connections. The initiator sends data to a certain identifier and the overlay network forwards the data to a certain recipient based on the identifier. This way the initiator does not have to know the IP address of the recipient beforehand. Under Hi3, the HIP base


exchange is handled by the overlay network, while the actual data transfer is done between hosts using HIP. This way the recipient is not bothered by connection attempts; only after the base exchange has been completed successfully with the overlay network can the initiator start the actual data connection to the recipient. In order to provide more protection against denial of service attacks, SPI multiplexed NAT (SPINAT) [21] can be used together with Hi3. A SPINAT proxy is placed near the recipient, and the initiator makes a connection to the SPINAT proxy, which forwards the connection to the recipient. Thus, the initiator does not know the actual IP address of the recipient, which makes it harder to launch denial of service attacks against the recipient. Such a system, however, would not satisfy requirement R6 concerning the delegatability of rights. It would also be possible to implement a similar system for limiting incoming connections based on the SIP [16] protocol. With this approach, the recipient and the initiator would register themselves with a SIP proxy, and the SIP proxy would be used for the negotiation of rights before the actual data connection. The initiator would contact the SIP proxy and ask for permission to send data to the recipient. The SIP proxy would then ask for permission from the recipient and, if this permission were granted, the proxy would send the recipient's IP address to the initiator and the data connection could be established. A SIP-based system would not fully satisfy our requirements, especially R1, R2, R4 and R5. The biggest drawback of this kind of system is that after the initiator gains the recipient's IP address, it is impossible for the recipient to stop the initiator from sending data. Distributed Hash Tables (DHTs) [6] provide an effective solution for storing large amounts of data in a distributed way. The DHT network is fault tolerant: if some node exits the DHT network, other nodes can take over its duties.
There exist several DHT protocols, such as Tapestry [7] and Chord [19]. Many DHT protocols achieve O(log n) lookup latency, where n is the number of nodes in the DHT network; some protocols achieve even lower latency by using a larger number of links to other nodes.
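The basic key-to-node mapping behind such DHTs can be sketched with consistent hashing. This toy (with a made-up node naming scheme and a truncated hash space) also illustrates the fault tolerance: removing the responsible node simply shifts the key to the next node on the ring:

```python
import hashlib

def node_for(key, nodes):
    # Consistent-hashing sketch: the key is served by the first node
    # whose hash is >= the key's hash on the ring, wrapping around.
    def h(s):
        return int(hashlib.sha1(s.encode()).hexdigest(), 16) % 2 ** 16
    ring = sorted((h(n), n) for n in nodes)
    k = h(key)
    for node_hash, node in ring:
        if node_hash >= k:
            return node
    return ring[0][1]  # wrap around to the first node on the ring

nodes = [f"proxy-{i}" for i in range(8)]
home = node_for("recipient@example", nodes)
survivors = [n for n in nodes if n != home]
fallback = node_for("recipient@example", survivors)  # another node takes over
```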

3 Recipient Controlled Session Management Protocol

The aim of the Recipient Controlled Session Management Protocol is to allow a recipient to control incoming connections using certificates. Only explicitly allowed incoming connections will reach the recipient. A general architecture of the system is presented in Figure 1. To better understand the architecture depicted in Figure 1 and the use cases presented below, the main concepts (the initiator, the recipient, the session controller and the firewall) are explained here. The initiator is the party that initiates the connection. The recipient is the party at the other end of the connection; in our model, the recipient grants explicit rights to initiators to allow incoming connections. The session controller (also called the "proxy") is an entity that the recipient trusts, and it can give certificates for making incoming connections to trusted initiators. If the recipient changes access networks, the session controller keeps track of the recipient's current IP address or addresses, just like the home agent in Mobile IPv6 [8]. The services of the


Fig. 1. A general architecture of the Recipient Controlled Session Management Protocol

session controller can be offered to the recipient by, e.g., the recipient's operator¹. In the proposed architecture, session controllers form a Distributed Hash Table network to eliminate a single point of failure. When the initiator requests certificates, it contacts the closest proxy in the DHT network and this proxy forwards the request through the DHT network to the home proxy of the recipient; thus, the initiator does not need to know which proxy is the home proxy of the recipient. This home proxy is trusted by the recipient and provides certificates for trusted initiators. The firewall denotes the firewall in the recipient's access network. The duty of the firewall is to block incoming traffic into the access network that is not allowed or desired; thus, the firewall must take note of the certificates that pass through it. The firewall also grants certificates that allow nodes to use the firewall's access network. The objective is to create a connection from the initiator to the recipient. In order to create this connection, the initiator needs to receive the necessary certificates for making an incoming connection from the proxy that acts as the session controller; otherwise, the firewall in the recipient's access network will not allow the incoming connection to the recipient. Two conditions must be satisfied before the initiator can receive certificates from the proxy. First, the recipient must grant a certificate to the initiator which shows that the initiator is trusted by the recipient. In addition, the recipient must certify the home proxy so that the home proxy has the right to grant the necessary certificates to trusted initiators. The structure of the certificates used with the Recipient Controlled Session Management Protocol is explained briefly here. The certificate contains the public keys of the issuer and the subject of the certificate.
In addition, the certificate contains a validity period and information about the rights and delegatable rights provided by the certificate. Rights include the right to create an incoming connection, the right for

¹ The proposed architecture does not care who controls the proxy or whether there are several proxies in the network. The recipient might buy proxy services from the operator or operate their own proxy. The important thing is that the recipient must trust that the proxy will give permissions to initiate connections and will disclose the recipient's location (IP addresses) only to authorized initiators.


Fig. 2. Controlling incoming connections

session initialization management, the right to request a new certificate, the right to send data to the network and the right to delegate rights to other parties. Finally, there is an issuer's signature over the certificate. Below we show different cases of how the Recipient Controlled Session Management Protocol can be used. The first example describes how the protocol can be used to limit incoming connections to the recipient. The second example shows how the revocation of rights works.

3.1 Controlling Incoming Connections

This basic case, where the mobile user (the recipient) wants to restrict incoming connections, is illustrated in Figure 2.

The recipient authorizes the proxy

1. The recipient gives a C1 certificate to the proxy, which means that the proxy can disclose the recipient's location to certain trusted initiators and can also authorize them to initiate connections to the recipient. Following traditional certificate formats, in this certificate the recipient is the issuer, the proxy is the subject, and the certificate is signed by the recipient. The certificate is valid from time T1 until T2.

The recipient authorizes the initiator

In steps 2 and 3 the recipient authorizes the initiator. This is a necessary step, since without the recipient's authorization the proxy will not trust the initiator.

2. The initiator delivers its public key to the recipient; it can be delivered offline.


3. If the recipient trusts the initiator, it creates a certificate C2 that allows the initiator to request the recipient's current location (IP address) and the necessary certificates from the proxy (the right-for-session-initialization-management bit is set to one in the C2 certificate). This certificate is sent to the initiator. Steps 2-3 can be carried out offline.

The recipient changes the access network

Steps 4 and 5 describe the mobility management.

4. When the recipient changes the access network, as part of the network's AAA procedure the recipient gets a certificate Cf from the network (denoted in the figure as "firewall") that allows the recipient to use the new access network.

5. The recipient sends a location update together with this new Cf certificate to the proxy.

Establishment of the data connection

Steps 6-9 describe the establishment of the data connection between the initiator and the recipient.

6. The initiator contacts the proxy. The initiator sends the C2 certificate to the proxy and requests the IP address of the recipient together with all required certificates. The recipient might change networks frequently, so the initiator must retrieve the recipient's latest IP address from the proxy.

7. Based on the C2 certificate, the proxy knows that the initiator is authorized to get the recipient's latest IP address and to receive a certificate for sending data to the recipient. As a result, the proxy creates a new C3 certificate that allows the initiator to initiate the incoming connection to the recipient. Certificates Cf, C1 and C3, together with the recipient's IP address, are then sent to the initiator.

8. The initiator first sends a control message, using e.g. ICMP, with certificates Cf, C1 and C3 to the recipient. This allows the firewall of the recipient's network to check that the recipient is a legitimate node in the access network and that the recipient is willing to receive incoming traffic from the initiator.
The Cf certificate tells the firewall that the recipient is authorized by the access network and is a valid entity. The C1 and C3 certificates together create a certificate chain, recipient->proxy->initiator, which denotes that the initiator has the right to initiate the connection to the recipient.

9. The data connection can now be established.

Notes: The C3 certificate that allows the initiator to create an incoming connection is given through the proxy because that way the recipient can easily revoke it. For example, if after step 2 the recipient no longer trusts the initiator, the recipient can notify the proxy that the C2 certificate is revoked, and thus the initiator will not be able to receive the C3 certificate from the proxy. If the recipient had sent the C3 certificate directly to the initiator, revocation of C3 would be much more difficult.

If the recipient uses several network interfaces simultaneously, the proxy could return several different IP addresses and associated Cf certificates to the initiator. In addition to permitting certain initiators to send data to itself, the recipient could also specify more general policies to the proxy, like "now I do not want to be disturbed, do not allow anybody to contact me" or "anybody can contact me, even without a valid C2 certificate".
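The chain check performed by the firewall in step 8 can be sketched in a few lines. The following Python model is purely illustrative: the field names, the encoding of rights as a set of strings, and the omission of real signature verification are our assumptions, not the paper's wire format.

```python
from dataclasses import dataclass
import time

# Toy model of the certificates described above. Real signature
# verification is omitted; only the chain's structure is checked.
@dataclass
class Cert:
    issuer: str
    subject: str
    rights: set
    valid_from: float
    valid_to: float

def chain_grants(chain, required_right, now=None):
    """Validate an issuer->subject chain such as recipient->proxy->initiator.

    Each certificate must be currently valid, each link's issuer must equal
    the previous link's subject, and every link must carry the required
    right (a right cannot appear out of nowhere mid-chain).
    """
    now = time.time() if now is None else now
    prev = None
    for cert in chain:
        if not (cert.valid_from <= now <= cert.valid_to):
            return False
        if prev is not None and cert.issuer != prev.subject:
            return False
        if required_right not in cert.rights:
            return False
        prev = cert
    return True

# C1: recipient authorizes the proxy; C3: proxy authorizes the initiator.
c1 = Cert("recipient", "proxy", {"session_init", "send_data", "delegate"}, 0, 2e9)
c3 = Cert("proxy", "initiator", {"send_data"}, 0, 2e9)
print(chain_grants([c1, c3], "send_data"))  # True: the firewall accepts the chain
print(chain_grants([c3], "delegate"))       # False: C3 does not carry that right
```

A broken link (an issuer that is not the previous subject) or an expired certificate anywhere in the chain makes the whole chain invalid, which is exactly why revoking the middle C2/C3 link at the proxy suffices to cut off the initiator.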

462

D. Lagutin and H.H. Kari

3.2 Revocation of Rights

This use case is a continuation of the previous one; it describes a situation where the recipient wants to revoke the initiator's right to send data. The revocation is done by notifying the proxy, the firewall and the initiator. The proxy is responsible for giving the initiator the recipient's IP address with the necessary certificates in exchange for the C2 certificate. It is possible for the recipient to send a new version of the C2 certificate (called C2N and C2N2 in this example) to the proxy. The proxy keeps only the newest version of this kind of certificate in its database, and this certificate always overrides the certificate received from the initiator. Thus, this method can be used to revoke and reissue C2.

Fig. 3. Revocation of rights for making an incoming connection

The use case, which is illustrated in Figure 3, demonstrates the situation where the initiator has already obtained a C3 certificate and is sending data, but then starts misbehaving. As a result, the recipient revokes the initiator's rights, and the firewall in the recipient's access network notices the revocation and stops the traffic coming from the initiator. Later in the example, the recipient reissues the rights for making an incoming connection to the initiator for another time period.

The recipient wants to revoke rights

1. The recipient creates a C2N certificate, which is similar to the C2 certificate in the previous use case but with all rights cleared. This C2N certificate is then sent to the proxy, where it overrides any previous similar certificate. If the initiator requests the recipient's IP address or a C3 certificate from the proxy (step 6 in the previous use case), the proxy denies the request, because its database contains a valid C2N certificate that has its rights set to zero. The C2N certificate is also sent to the initiator to notify it that it should not send any more data; at the same time the C2N certificate passes the firewall, which takes note of it. Thus, if the initiator tries to send data directly to the recipient, the firewall will block the data flow.


The recipient wants to give rights back for a different time period

2. A new certificate, C2N2, is created; this certificate has the right for session initialization management bit set to one. The certificate is sent to both the proxy and the initiator. It will also pass the firewall, which takes note of it.

Establishment of the data connection

3. Now the initiator can request the recipient's IP address with the necessary certificates from the proxy. If the initiator makes a request to the proxy using the original C2 certificate outside the validity time of the C2N2 certificate, the request will be denied, since the proxy has the C2N2 certificate in its database and this certificate automatically overrides other certificates.

4. Just like in the previous case, the proxy sends the recipient's IP address with the necessary certificates to the initiator.

5. Similarly, the initiator sends a control message to the recipient containing the necessary certificates. Certificates Cf, C1 and C3 show that the initiator has valid rights for making an incoming connection to the recipient, and the C2N2 certificate does not refute this right.

6. The data connection can now be established.
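The proxy's override rule — keep only the newest recipient-issued certificate and let it take precedence over whatever the initiator presents — can be sketched as follows. This is an illustrative Python model; the dictionary-based certificate encoding and the serial-number ordering are our assumptions (the paper does not specify how "newest" is determined).

```python
import time

class Proxy:
    """Toy sketch of the override rule from this section.

    The proxy stores only the newest recipient-issued certificate per
    initiator (C2 -> C2N -> C2N2), and the stored one always overrides
    the certificate presented with a request.
    """
    def __init__(self):
        self._newest = {}  # (issuer, subject) -> certificate

    def update(self, cert):
        # Keep only the newest version of this kind of certificate.
        key = (cert["issuer"], cert["subject"])
        cur = self._newest.get(key)
        if cur is None or cert["serial"] > cur["serial"]:
            self._newest[key] = cert

    def allow_session_init(self, presented, now=None):
        now = time.time() if now is None else now
        key = (presented["issuer"], presented["subject"])
        cert = self._newest.get(key, presented)  # stored cert wins
        return ("session_init" in cert["rights"]
                and cert["valid_from"] <= now <= cert["valid_to"])

proxy = Proxy()
c2 = {"issuer": "recipient", "subject": "initiator", "serial": 1,
      "rights": {"session_init"}, "valid_from": 0, "valid_to": 2e9}
c2n = dict(c2, serial=2, rights=set())   # step 1: all rights cleared
print(proxy.allow_session_init(c2))      # True: nothing stored yet
proxy.update(c2n)                        # the recipient revokes
print(proxy.allow_session_init(c2))      # False: C2N overrides the presented C2
```

Reissuing rights (C2N2) is the same `update` call with a higher serial and the session-initialization right set again, plus the new validity window.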

4 Providing Packet Level Security

The certificate-based solution presented above does not satisfy our requirements completely by itself. Especially during the negotiation phase and when the data connection is established, the sender of the data must be verified on the packet level to ensure that the data is really sent by the trusted initiator. Packet Level Authentication (PLA) [3][10][15] can be used to accomplish this task.

The idea behind PLA is that every packet is cryptographically signed by the sender of the packet. Thus the authenticity and integrity of the packet can be verified by any node in the network. PLA adds an additional header to the standard IPv6 packet; this header includes information such as the sender's public key, the sender's signature over the whole packet, a sequence number, and a timestamp. A certificate given by a trusted third party is also included in the PLA header; this certificate guarantees that the sender is a valid, well behaving entity. The trusted third party can, for example, be an operator or a state authority.

PLA uses elliptic curve cryptography (ECC) [9][12] with 160 bit keys and 320 bit signatures, thus using PLA does not create a significant bandwidth overhead. Elliptic curve cryptography is computationally intensive, but hardware accelerators can be used to speed up the signing and verification tasks. Such accelerators achieve good performance [11][17] with relatively low power consumption [4][5].

PLA provides benefits in the following steps of the original example (Figure 2). In step 6, when the initiator requests the recipient's IP address and certificates from the proxy, the proxy can check whether the public key of the packet received from the initiator matches the public key in the C2 certificate's subject field. If they match, then the initiator is really a trustee of the recipient and the proxy can trust the initiator.
If they do not match, then some malicious entity has intercepted the C2 certificate and is trying to obtain rights for itself with the intercepted certificate; in this case the request is naturally denied.
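The per-packet sign-and-verify idea behind PLA can be illustrated with a dependency-free sketch. Note the substitution: PLA itself uses 160-bit ECC signatures, while this toy uses textbook RSA with small fixed primes so that it runs on the bare Python standard library — it is completely insecure and for illustration only.

```python
import hashlib

# PLA's core idea: the sender signs a digest of every packet, and any node
# on the path can verify it with the sender's public key carried in the
# packet header. Textbook RSA stands in here for PLA's real ECC signatures.

P, Q = 999983, 1000003             # small known primes (toy key, insecure)
N = P * Q
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)

def digest(packet: bytes) -> int:
    return int.from_bytes(hashlib.sha256(packet).digest(), "big") % N

def sign(packet: bytes) -> int:              # done by the sender
    return pow(digest(packet), D, N)

def verify(packet: bytes, sig: int) -> bool:  # done by any node on the path
    return pow(sig, E, N) == digest(packet)

pkt = b"IPv6 payload|seq=42|timestamp=1189036800"
sig = sign(pkt)
print(verify(pkt, sig))               # True: packet accepted
print(verify(b"forged" + pkt, sig))   # False: tampering is detected
```

Because verification needs only the public key, an intermediate firewall can drop forged or duplicated packets before they consume resources in the recipient's access network — the property the comparison in the next section relies on.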


A similar check must be made in step 9: the firewall must check that the sender of the data is really the entity that is authorized to make an incoming connection by the certificate C3. In addition, PLA is also required in the revocation of rights example (Figure 3). In steps 1 and 2 the proxy must check that the certificate updates that override previously issued certificates really come from the recipient. Thus other parties are not able to send certificate updates to the proxy in the recipient's name.

5 Analysis and Comparison with Other Solutions

A summary of the comparison with other solutions is presented in Table 2.

The first requirement, R1, is blocking unauthorized incoming traffic before the destination. Our proposed system and Hi3 together with SPINAT satisfy this requirement; with the other approaches, malicious parties can freely send data to the recipient if the recipient's IP address is known.

The R2 requirement is the revocability of rights. The SIFF and Capability tokens approaches do not support revocability directly, but a similar result can be achieved using short-time rights: if the initiator misbehaves, the right for sending data will not be renewed. With the hypothetical SIP based system the initiator can freely send data to the recipient after it has received the recipient's IP address. With our system it is possible to revoke rights using new certificates that override existing ones.

The R3 requirement is mobility. SIFF does not support mobility, since capabilities are bound to the initiator's IP address; when the initiator's IP address changes, the capabilities are no longer valid. Capability tokens include the path between the recipient and the initiator, thus if the recipient changes the network, the capability token will no longer be valid.

The R4 requirement is the verifiability of data packets. Only our system and Hi3 guarantee the authenticity and integrity of the data. However, with Hi3 only the recipient can validate the integrity, while with our system any node on the path can perform an integrity check on data packets. Thus, with Hi3 forged or duplicated packets will reach the recipient and consume resources in the recipient's access network. With the other systems, data is sent unsigned, thus it can easily be forged or duplicated.

The R5 requirement is the authentication of the initiator. Our system and Hi3 use public keys to check the initiator before giving rights.
The SIFF and Capability tokens approaches do not have any means to check the initiator beforehand; with these approaches the recipient will know only the IP address of the initiator.

The R6 requirement is the delegability of rights. Our system fully supports this requirement: if the recipient allows delegation, the initiator can create a new certificate in which the initiator authorizes another party. This certificate can be combined with the existing certificate chain recipient->proxy->initiator to create a new certificate chain recipient->proxy->initiator->initiator2, thus initiator2 will also be allowed to create an incoming connection to the recipient. With Capability tokens it is possible to give the token, and thus delegate rights, to another party that is located within the same subnet, but this approach does not work in a generic way. With a SIP based system, delegation of rights could also be implemented using the proxy.
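The delegation rule just described — a chain may be extended only by a party that itself holds the delegation right — can be sketched as follows (an illustrative Python model; the triple encoding and right names are our assumptions):

```python
# Toy check of delegation along a certificate chain: each link may be
# extended only by a subject that was itself granted the "delegate" right.
def delegation_ok(chain):
    """chain: list of (issuer, subject, rights) triples, root first."""
    for (_, subj, rights), (issuer, _, _) in zip(chain, chain[1:]):
        if issuer != subj or "delegate" not in rights:
            return False
    return True

chain = [("recipient", "proxy", {"send_data", "delegate"}),
         ("proxy", "initiator", {"send_data", "delegate"}),
         ("initiator", "initiator2", {"send_data"})]
print(delegation_ok(chain))   # True: every issuer held the delegate right

chain[1] = ("proxy", "initiator", {"send_data"})  # delegate right withheld
print(delegation_ok(chain))   # False: initiator may not authorize initiator2
```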


The final mandatory requirement, R7, is the resilience of the system: the system should not have a single point of failure. A hypothetical SIP based system with a centralized proxy would not satisfy this requirement. Our proposed system satisfies it, because a DHT network is used for the proxies. Performance (R8) is quite good with all approaches; none of them consumes an excessive amount of network resources.

Table 2. Comparison between different approaches regarding our requirements

Requirement          | Capability Tokens       | SIFF                    | SIP based system | Our system (RCSMP + PLA) | Hi3 + SPINAT
Mandatory
R1 Blocking          | No                      | No                      | No               | Yes                      | Yes
R2 Revocability      | Using short time rights | Using short time rights | No               | Yes                      | Yes
R3 Mobility          | No                      | No                      | Yes              | Yes                      | Yes
R4 Verifiability     | No                      | No                      | No               | Yes                      | Yes
R5 Authentication    | No                      | No                      | No               | Yes                      | Yes
R6 Delegability      | Within the same subnet  | No                      | Yes              | Yes                      | No
R7 Resilience        | Yes                     | Yes                     | No               | Yes                      | Yes
Optional
R8 Performance       | Yes                     | Yes                     | Yes              | Yes                      | Yes
Future enhancements
R9 Quality of service| Yes                     | No                      | No               | With extensions          | No

The final requirement, R9, is support for quality of service. Capability tokens have some support for this; e.g. the number of packets that the initiator is allowed to send is included in the capability. Our system can be extended to satisfy this requirement: the certificate that is given to the initiator can include, e.g., the priority level of the traffic and the maximum number of bytes and packets that the initiator is allowed to send using the certificate.

Overall, the biggest problem with most other approaches is that the traffic will always reach the recipient initially and thus consume resources within the recipient's network. With our approach, unauthorized traffic to the recipient is blocked; connection attempts go through the proxy first, and only authorized initiators can send packets directly to the recipient. In addition, when the recipient changes the access network, it needs to send only one message to the proxy, regardless of the number of initiators that want to contact the recipient.

6 Conclusions and Future Work

We have presented the Recipient Controlled Session Management Protocol, a novel way to control incoming connections that uses certificates and distributed hash tables. Its aim is to block unauthorized incoming connections in a robust way before they reach the destination.


Using certificates for rights management is a better approach than binding rights to a certain IP address. The initiator might use different networks, like fixed LAN, wireless LAN and 3G, and might change networks rapidly. If the right for making an incoming connection were bound to a certain IP address, the initiator would need to renegotiate that right each time its access network changes. This problem does not exist when using certificates: as long as the initiator possesses a valid certificate, it can make an incoming connection to the given recipient regardless of the initiator's IP address. Secondly, IP addresses can be forged by a malicious entity, which would allow it to hijack rights given to others. Finally, the IP address is not directly related to the identity of the initiator, thus when the recipient grants rights to a certain IP address, it cannot be completely sure to whom it is actually granting rights. The certificate approach does not have this problem, since certificates contain the initiator's public key.

Security on the packet level is essential to satisfy our requirements; otherwise the recipient and the firewall in the recipient's access network cannot be sure that packets are indeed coming from the trusted initiator and have not been tampered with. Using a DHT network for proxies eliminates a single point of failure from the system: if some proxy goes offline, other proxies can take over its duties.

The presented system can be improved and extended in many ways to provide more complex services. For example, the recipient could mark trusted initiators with priority levels and provide this information to the proxy. The recipient could also notify the proxy of his state (busy, available, in a meeting, etc.) and the proxy could decide whether to grant rights to initiators based on this state.
Thus, if the recipient is very busy, the proxy would grant rights for making incoming connections only to trusted initiators with a high priority level. The recipient's state could also contain information on whether he is at work or at home; if the recipient is at home, the proxy could block incoming connections from work-related initiators. Since certifying each initiator separately may be cumbersome and time consuming for the recipient, the proposed system could be modified to allow certificates for making incoming connections to be given at a higher level. For example, two companies could give the necessary certificates to each other, and afterwards employees of those companies could contact each other without additional certificates.

References

1. Adkins, D., Lakshminarayanan, K., Perrig, A., Stoica, I.: Towards a More Functional and Secure Network Infrastructure. Technical Report UCB/CSD-03-1232, Computer Science Division (EECS), University of California, Berkeley, USA (2003)
2. Anderson, T., Roscoe, T., Wetherall, D.: Preventing Internet Denial-of-Service with Capabilities. In: ACM SIGCOMM Computer Communications Review, pp. 39–44 (2004)
3. Candolin, C.: Securing Military Decision Making in a Network-Centric Environment. Doctoral dissertation, Espoo (2005)
4. Gaubatz, G., Kaps, J., Öztürk, E., Sunar, B.: State of the Art in Ultra-Low Power Public Key Cryptography for Wireless Sensor Networks. In: Proceedings of the Third International Conference on Pervasive Computing and Communications Workshops, Hawaii, USA (March 2005)


5. Goodman, J., Chandrakasan, A.: An Energy-Efficient Reconfigurable Public-Key Cryptography Processor. IEEE Journal of Solid-State Circuits 36(11), 1808–1820 (2001)
6. Gribble, S.D., Brewer, E.A., Hellerstein, J.M., Culler, D.: Scalable, Distributed Data Structures for Internet Service Construction. In: Proceedings of the 4th Symposium on Operating System Design and Implementation (OSDI 2000), pp. 319–332 (2000)
7. Hildrum, K., Kubiatowicz, J.D., Rao, S., Zhao, B.Y.: Distributed Object Location in a Dynamic Network. In: Proceedings of the 14th ACM Symposium on Parallel Algorithms and Architectures (SPAA 2002), pp. 41–52 (2002)
8. Johnson, D., Perkins, C., Arkko, J.: Mobility Support in IPv6. The Internet Society, Network Working Group, Request for Comments: 3775 (2004)
9. Koblitz, N.: Elliptic Curve Cryptosystems. Mathematics of Computation 48, 203–209 (1987)
10. Lunberg, J.: Packet Level Authentication Protocol Implementation. In: Military Ad Hoc Networks, vol. 1(19), Helsinki (2004)
11. Lutz, J., Hasan, A.: High Performance FPGA Based Elliptic Curve Cryptographic Co-Processor. In: Proceedings of the International Conference on Information Technology: Coding and Computing, ITCC 2004, Las Vegas, USA (April 2004)
12. Miller, V.: Use of Elliptic Curves in Cryptography. In: Williams, H.C. (ed.) CRYPTO 1985. LNCS, vol. 218, Springer, Heidelberg (1986)
13. Moskowitz, R., Nikander, P.: Host Identity Protocol. Internet draft, work in progress (June 2006)
14. Nikander, P., Arkko, J., Ohlman, B.: Host Identity Indirection Infrastructure (Hi3). In: Proceedings of the Second Swedish National Computer Networking Workshop, Karlstad, Sweden (November 2004)
15. Packet Level Authentication [online] [Accessed 10 October 2006], Available from: http://www.tcs.hut.fi/Software/PLA/
16. Rosenberg, J., et al.: SIP: Session Initiation Protocol. The Internet Society, Network Working Group, Request for Comments: 3261 (2002)
17. Satoh, A., Takano, K.: A Scalable Dual-Field Elliptic Curve Cryptographic Processor. IEEE Transactions on Computers 52(4), 449–460 (2003)
18. Stoica, I., Adkins, D., Zhuang, S., Shenker, S., Surana, S.: Internet Indirection Infrastructure. In: Proceedings of ACM SIGCOMM 2002, Pittsburgh, USA (August 2002)
19. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. In: Proceedings of ACM SIGCOMM 2001, pp. 149–160 (2001)
20. Yaar, A., Perrig, A., Song, D.: SIFF: A Stateless Internet Flow Filter to Mitigate DDoS Flooding Attacks. In: Proceedings of the 2004 IEEE Symposium on Security and Privacy, Oakland, USA (May 2004)
21. Ylitalo, J., Nikander, P.: BLIND: A Complete Identity Protection Framework for Endpoints. In: Proceedings of the Twelfth International Workshop on Security Protocols, Cambridge, UK (April 2004)

Design and Implementation of an Open Source IMS Enabled Conferencing Architecture

A. Buono², T. Castaldi¹, L. Miniero¹, and S. P. Romano¹

¹ University of Napoli Federico II, Via Claudio 21, 80125 Napoli, Italy
² CRIAI Consortium, P.le E. Fermi 1, 80055 Portici (NA), Italy

Abstract. In this paper we embrace an engineering approach to service delivery over the Internet by presenting an actual implementation of a conferencing framework compliant with the IP Multimedia Core Network Subsystem (IMS) specification. The architecture we describe has been conceived from the outset by taking into account ongoing standardization efforts inside the various active international bodies. In its current state, it is capable of providing video conferencing facilities with session management capabilities and floor control. The system presented is intended to serve as a running experimental testbed useful for protocol testing, as well as for field trials and experimentation. It will first be described from a high level design perspective and subsequently analyzed in further detail by highlighting the most notable implementation choices. A mapping between the actual system components and the corresponding IMS logical functions will be provided, and a discussion will be conducted concerning those parts of the system which somehow depart from the IMS paradigm. This will, on the one hand, help the reader figure out potential discrepancies between our solution and the IMS model; on the other hand, it will open space for discussion around some important open issues on which the international research community still has to reach a rough consensus.

Keywords: IP Multimedia Subsystem, Centralized Conferencing, Floor Control.

1 Introduction

The IP Multimedia Subsystem (IMS) architecture is currently being standardized by the 3rd Generation Partnership Project (3GPP) and aims to provide a common service delivery mechanism capable of significantly reducing the development cycle associated with service creation across both wireline and wireless networks. The main objective of IMS is to reduce both capital and operational expenditures (i.e. CAPEX and OPEX) for service providers, while at the same time providing operational flexibility and simplicity. The envisaged portfolio of IMS services includes advanced IP-based applications like Voice over IP (VoIP), online gaming, videoconferencing, and content sharing. All such services are to be provided on a single, integrated infrastructure, capable of offering seamless switching between different services.

Y. Koucheryavy, J. Harju, and A. Sayenko (Eds.): NEW2AN 2007, LNCS 4712, pp. 468–479, 2007.
© Springer-Verlag Berlin Heidelberg 2007

It is worth noting that IMS


is conceived as an access agnostic platform. This requirement clearly imposes a careful study of the core IMS components (such as the Call/Session Control Function – CSCF, Home Subscriber Server – HSS, Media Resource Function – MRF and Application Server – AS), which must be scalable and able to provide advanced features, like five-nines reliability. In the scenario depicted above, the need arises for an architecture which effectively tackles many complex issues, including implementation. Indeed, although early IMS trials and deployments are underway, various challenges still have to be faced, related both to the infrastructure and to the service level. The goal of this paper is to contribute to the solution of some of the above mentioned challenges, with special regard to the need for actual implementations of IMS architectures and services. More precisely, we present an implementation of a conferencing framework compliant with the IP Multimedia Core Network Subsystem specification and currently capable of providing video conferencing facilities in conjunction with session management capabilities and floor control. The overall architecture will first be described from a high level design view; then, we will delve into the implementation details.

The paper is structured as follows. Section 2 helps position our work by providing useful information about the reference context, as well as about the motivations behind our contribution. An IMS-compliant architecture for moderated video conferences is depicted in Section 3. Implementation details are illustrated in Section 4, whereas in Section 5 we deal with related work. Finally, Section 6 provides some concluding remarks, together with information about our future work.

2 Context and Motivation

The Session Initiation Protocol (SIP) [1] provides users with the capability to initiate, manage, and terminate communication sessions in an IP network. SIP already allows calls among multiple parties. However, conferencing represents a more sophisticated service than a multi-party call. Indeed, conferencing applies to any kind of media stream by which users may want to communicate: this includes, for example, audio and video media streams, as well as conferences based on instant messaging or even gaming. The conferencing service provides the means for a user to create, manage, terminate, join and leave conferences. This service also provides the network with the ability to deliver information about these conferences to the involved parties.

The standardization process associated with centralized conferencing over IP is still at an early stage within the different communities involved in the development of a standard conference system. The Internet Engineering Task Force (IETF) is an open international community concerned with the evolution of the Internet architecture and protocols. The main working groups within the IETF involved in the multimedia conferencing standardization effort are Session Initiation Proposal Investigation (SIPPING) and Centralized Conferencing (XCON). The SIPPING working group has


developed a framework for multi-party conferencing with SIP [2]. This framework is based on the Conferencing Requirements document [3], which defines a general architectural model, presents terminology, and explains how SIP is involved in a tightly coupled conference. The goal of the XCON working group, on the other hand, is to define both a reference framework and a data model [4] for tightly coupled conference scenarios envisaging the presence of a centralized management entity, called the focus. A focus is a logical entity which maintains a call signalling interface with each participating client and the so-called conference object representing a conference at a certain stage (e.g. description upon conference creation, reservation, activation, etc.). Thus, the focus acts as an endpoint for each of the supported signaling protocols and is responsible for all primary conference membership operations. At present, XCON has specified the so-called Binary Floor Control Protocol (BFCP) [5]. BFCP enables applications to provide users with coordinated (shared or exclusive) access to resources like the right to send media over a particular media stream.

The 3rd Generation Partnership Project (3GPP) is a collaboration agreement among a number of regional standard bodies. The scope of 3GPP is to develop Technical Specifications for a third-generation mobile system based on GSM. 3GPP has specified the requirements and defined the overall architecture [6] for tightly-coupled conferencing. The mentioned document actually represents a sort of umbrella specification within the IP Multimedia Core Network subsystem (IMS), trying to harmonize the combined use of existing standard protocols, such as the Session Initiation Protocol (SIP), SIP Events, the Session Description Protocol (SDP) and the Binary Floor Control Protocol (BFCP).
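The kind of coordinated access BFCP provides can be illustrated with a toy floor control server. Real BFCP is a binary protocol with FloorRequest/FloorRelease transactions and optional chair-mediated decisions; this sketch merely serializes exclusive access to a single floor in FIFO order, and all names are illustrative.

```python
from collections import deque

class FloorControlServer:
    """Minimal sketch of exclusive floor control in the spirit of BFCP.

    One floor (e.g. the right to send video) is held by at most one
    participant at a time; further requests queue up in FIFO order.
    """
    def __init__(self):
        self.holder = None
        self.queue = deque()

    def request(self, participant):
        if self.holder is None:
            self.holder = participant
            return "GRANTED"
        self.queue.append(participant)
        return "PENDING"

    def release(self, participant):
        if self.holder != participant:
            return "ERROR"        # only the current holder may release
        self.holder = self.queue.popleft() if self.queue else None
        return "RELEASED"

fcs = FloorControlServer()
print(fcs.request("alice"))  # GRANTED
print(fcs.request("bob"))    # PENDING: the floor is exclusive
fcs.release("alice")
print(fcs.holder)            # bob now holds the floor
```

In a full implementation the grant/deny decision at the head of the queue would be taken by the floor chair rather than automatically, which is the role distinction revisited in Section 4.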
The Open Mobile Alliance (OMA) is the leading industry forum for developing market driven, interoperable mobile service enablers on top of the extensive 3GPP IMS architecture. The OMA conferencing solution builds on these service enablers. At present, OMA has standardized a conference model for Instant Messaging [7], as well as a conference model for the Push to Talk service [8]. Our conferencing framework is based on the architecture for the 3GPP conference service and is fully compliant with the associated requirements.

2.1 The IMS Architecture

Fig. 1 shows the architecture for the 3GPP IMS conferencing service and the interfaces among the different entities involved. For the sake of clarity we do not explain all the IMS entities and interfaces; we rather focus on those which are relevant to our project. The User Equipment (UE) implements the role of a conference participant and may also support the floor participant or floor chair role (the difference between such roles will be clarified in Section 4). The UE might be located either in the Visited or in the Home Network (HN). In any case, it can find the P-CSCF via the CSCF discovery procedure. Once done with the discovery phase, the UE sends SIP requests to the Proxy-Call Session Control Function (P-CSCF). The P-CSCF in turn forwards such messages to the Serving-CSCF (S-CSCF). In order to handle any UE request properly,


[Figure 1 (diagram): the 3GPP IMS reference architecture, showing the access networks, P-CSCF, I-CSCF, S-CSCF, HSS, SLF, SGW, the application servers (SIP-AS, OSA-SCS, IM-SSF), the MRFC/MRFP, and the MGCF, BGCF and MGW towards the legacy PLMN and non-IMS IP networks, together with the reference points interconnecting them (Gm, Mw, Cx, Dx, ISC, Sh, Si, Mr, Mp, Mb, Mg, Mi, Mj, Mk, Mm, Mn).]

Fig. 1. The IMS Architecture

the S-CSCF needs both registration and session control procedures (so as to use both the subscriber and service data stored in the Home Subscriber Server – HSS). It also uses SIP to communicate with the Application Servers (AS). An AS is a SIP entity hosting and executing services (in our scenario, the AS clearly hosts the conferencing service). The IP Multimedia Service Control (ISC) interface sends and receives SIP messages between the S-CSCF and the AS. The two main procedures of the ISC are: (i) routing the initial SIP request to the AS; (ii) initiating a SIP request from the AS on behalf of a user. For the initiating request, the SIP AS and the OSA SCS (Open Service Access - Service Capability Server) need either to access the user's data or to know an S-CSCF to rely upon for such a task. As we already mentioned, such information is stored in the HSS, so the AS and the OSA SCS can communicate with it via the Sh interface.

In a SIP based conferencing scenario, the MRFC (Media Resource Function Controller) regards the MRFP (Media Resource Function Processor) as a mixer. In fact, the MRFP hosts both the mixer and the floor control server. When the MRFC needs to control media streams (creating a conference, handling or manipulating a floor, etc.) it uses the Mp interface. This interface is fully compliant with the H.248 protocol standard. The MRFC is needed to support bearer related services, such as conferencing. The focus, conference policy server and media policy server are co-located in an AS/MRFC component in the 3GPP framework. The S-CSCF communicates with the MRFC via Mr, a SIP based interface. In this scenario the AS/MRFC implements the role of a conference focus and a conference notification service. The MRFC may support the floor control server role, the floor chair role or the floor participant role.

3 An IMS Compliant Video Conferencing Architecture

From an architectural perspective, our effort was first to identify and locate the IMS logical elements needed to properly handle an advanced conferencing scenario, and subsequently to find out how such elements could be replaced with real-world components. The following paragraphs provide more information about these two steps.

Starting from the bird's eye view of the IMS architecture shown in Fig. 1, we can clearly identify several elements defining the behaviors needed in a conferencing scenario. The very first mandatory element is the User Equipment (UE), which has to be both SIP compliant and XCON enabled in order to correctly support conferencing. According to the flow of messages, we need the P-CSCF, which may behave like a proxy. The P-CSCF accepts requests and forwards them to the S-CSCF. Hence, the S-CSCF and HSS are the next selected elements, which perform a number of important tasks, such as checking users' access and authorization rights, handling session control for the registered endpoint sessions, and interacting with service platforms for the support of services.

Once done with the control elements needed to build the signaling plane of a conferencing scenario, we can now focus on floor management, streaming and control. To accomplish this task we selected the following elements. The SIP-AS is the SIP Application Server as defined in [9] and is in charge of managing conferences (e.g. creating, modifying, deleting them). Besides, it is responsible for floor control, by managing access rights to shared resources in our conferencing framework. The MRFC, in turn, controls the media stream resources in the MRFP, interprets information coming from the AS and S-CSCF, and controls the MRFP accordingly. The MRFP provides resources to be controlled by the MRFC, as well as additional functionality like mixing of incoming media streams (in our case, audio and video streams) and media stream processing (e.g.
audio transcoding, media analysis). The MGCF will perform the interworking with the PSTN, while controlling the MG for the required media conversions. Finally, the MGW will help perform the interworking with the PSTN, at the same time controlling and reserving the resources required by the media streams. According to the identified requirements, we can replace the IMS Elements with several real-world components. In our architecture, some of these components have been provided by the open source community. Some other entities have been either developed from scratch or based on open source components that have been appropriately extended in order to meet our architecture requirements. As described in Fig. 2, we replaced the UE with a SIP client called Minisip (http://www.minisip.org/), made capable to handle BFCP protocol messages. We also replaced the P-CSCF with a fully compliant SIP Proxy server called OpenSER (http://www.openser.org/). The S-CSCF and HSS elements have been realized by exploiting an open source SIP server called Asterisk (http://www.asterisk.org). Asterisk actually provided us with many required

Design and Implementation of an Open Source IMS

473

Fig. 2. IMS Elements Mapping

IMS functions. In fact, the role of the SIP-AS is played in our architecture by an enhanced version of an Asterisk component called MeetMe, capable of managing conferences. Furthermore, the roles of the MRFC and MRFP components are played by a couple of ad-hoc modified Asterisk modules providing media management, streaming and floor control. Finally, we replaced the MGCF and MGW components with a native Asterisk component performing the interworking with the PSTN, including all the related activities. Based on the above considerations, in the next section we delve into the details of the implementation of our architecture.
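The mapping just described (cf. Fig. 2) can be condensed into a small lookup table; the sketch below simply restates the correspondence in code form, with component names taken from the text:

```python
# Illustrative mapping of IMS logical elements to the open source
# components used in the architecture (cf. Fig. 2).
IMS_COMPONENT_MAP = {
    "UE":     "Minisip (extended with BFCP support)",
    "P-CSCF": "OpenSER",
    "S-CSCF": "Asterisk",
    "HSS":    "Asterisk",
    "SIP-AS": "Asterisk MeetMe (enhanced)",
    "MRFC":   "Asterisk (ad-hoc modified modules)",
    "MRFP":   "Asterisk (ad-hoc modified modules)",
    "MGCF":   "Asterisk (native PSTN interworking)",
    "MGW":    "Asterisk (native PSTN interworking)",
}

def realization_of(element: str) -> str:
    """Return the real-world component playing the given IMS role."""
    return IMS_COMPONENT_MAP[element]

print(realization_of("P-CSCF"))  # OpenSER
```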

4

CONFIANCE: An Open Source Implementation of the Conferencing Architecture

In this section we present an actual implementation of an open platform for the support of IP-based conferencing scenarios. This platform, which we called CONFIANCE (CONFerencing IMS-enabled Architecture for Next-generation Communication Experience), has been realized within a collaboration between the University of Napoli and Ericsson’s Nomadic Lab in Helsinki, and it takes into account the most recent proposals under development inside the various standardization communities. More precisely, starting from the IMS-compliant design described in the previous sections, we implemented a video conferencing service which provides advanced capabilities, such as moderated access to conference resources. In order to accomplish this task, we took inspiration from the ongoing work of the IETF XCON working group. As stated in Section 2, the XCON framework defines a set of conferencing protocols, which are complementary to the call signaling protocols, for building

A. Buono et al.

advanced conferencing applications. Among them, the so-called BFCP (Binary Floor Control Protocol) deserves special attention. BFCP enables conferencing applications to provide users with coordinated (shared or exclusive) access to available conference resources. By coordinated access we mean the capability to manage access to a set of shared resources, such as the right to send media over a particular media stream. Each shared resource or set of resources is associated with a so-called floor, which is defined as a permission to temporarily access or manipulate the set of resources in question. A logical entity called the chair is made responsible for one or more such floors; its main task is managing requests for the floors it is assigned to. The clients of a conference can make floor requests on a transaction-by-transaction basis to the Floor Control Server, thus asking for permission to access a specific set of resources. The server forwards incoming requests to the chair, asking her/him for a decision about them. BFCP can be used not only to make requests, but also to ask for information about specific floors, existing requests, and requests made by other participants in a conference. A third-party floor request mechanism is also offered, enabling requests for floors whose beneficiaries are users other than the original requester. Chairs are additionally offered more complex functionality, e.g. to actively revoke a floor from a participant who may be abusing it. Notice that while BFCP offers a mechanism for coordinated access to resources, the policies a Floor Control Server follows to grant or deny floors are outside its specification. To make available an XCON-compliant architecture, we had to work both on the client and on the server side, as well as on the communication protocols between them. On the client side we implemented the two roles envisaged in the architecture, namely the simple participant and the chair.
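As a concrete illustration of the protocol machinery, every BFCP message begins with a fixed 12-octet COMMON-HEADER carrying the version, the primitive (e.g. FloorRequest) and the conference, transaction and user identifiers. The sketch below packs and parses such a header; the field layout follows our reading of RFC 4582, and attribute encoding is omitted:

```python
import struct

# BFCP COMMON-HEADER (cf. RFC 4582): Ver(3)+Reserved(5) | Primitive(8) |
# Payload Length(16, in 4-octet units) | Conference ID(32) |
# Transaction ID(16) | User ID(16)  -> 12 octets in total.
HEADER_FMT = "!BBHIHH"
FLOOR_REQUEST = 1  # primitive value for FloorRequest

def pack_common_header(primitive, conf_id, trans_id, user_id, payload_len=0):
    ver = 1
    return struct.pack(HEADER_FMT, ver << 5, primitive,
                       payload_len, conf_id, trans_id, user_id)

def parse_common_header(data):
    first, primitive, plen, conf_id, trans_id, user_id = \
        struct.unpack(HEADER_FMT, data[:12])
    return {"version": first >> 5, "primitive": primitive,
            "payload_length": plen, "conference_id": conf_id,
            "transaction_id": trans_id, "user_id": user_id}

hdr = pack_common_header(FLOOR_REQUEST, conf_id=4321, trans_id=7, user_id=99)
assert len(hdr) == 12
assert parse_common_header(hdr)["primitive"] == FLOOR_REQUEST
```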
On the server side, we implemented the role of the focus, as defined in [4]. Finally, coming to the communication protocols, besides implementing BFCP as currently specified by the IETF [5], we also designed and realized a brand new conference control protocol, whose features are briefly described in the following. Specifically, BFCP has been implemented as a library integrated into both the client and server entities of the architecture. Due to the current lack of agreement in the XCON WG about a specific conference control protocol, we instead went for a temporary alternative solution capable of offering the basic functionality that the architecture is supposed to provide. We thus specified and implemented a text-based protocol, called XCON Scheduler. Clients can use this protocol to dynamically manage conference creation as well as conference information.

4.1 Server Side Components

On the server side, we adopted Asterisk, an open source PBX which is gaining more and more popularity. Asterisk has been conceived as a modular software component, which can be quite easily modified in order to introduce new functionality as needed. Indeed, we added the following functionality to Asterisk:


– XCON-related identifiers, needed to manage conferences;
– Floor Control Server (FCS), by means of a library implementing the server-side behaviour of the BFCP;
– Scheduler Server, the server-side component implementing the conference scheduling protocol;
– Notification Service, to enable asynchronous event interception.

These components have been realized as extensions to an already available conferencing facility, called MeetMe, which allows users to access conferences by simply calling a predefined conference room associated with a standard extension of Asterisk’s dialplan. By interacting with clients through the dynamic exchange of conference scheduling messages (as defined in the Scheduler component), the enhanced MeetMe module allows for dynamic conference management in a user-friendly fashion. Required changes in the dialplan, as well as dynamic reloading of the needed Asterisk modules, have been achieved by appropriately modifying the already available MeetMe module. To enable video conferencing functionality, which is lacking in the base MeetMe module, we added a BFCP-moderated video switching feature to MeetMe; work is currently being done on video mixing and transcoding functionality as well. Since the XCON framework defines new identifiers (e.g. Conference URI [10] and User ID [11]), as does the BFCP specification, the existing MeetMe data model has been enriched with the newly required information. Coming to BFCP, this has actually required the greatest development effort, since we had to implement the entire protocol from scratch (at the time of this writing BFCP has only recently become a completely specified RFC, and nonetheless work is still in full swing inside the IETF with respect to its refinement). BFCP has been realized as a library, which is loaded at run time by the Asterisk server and is called whenever the need arises to deal with BFCP messages (creation, parsing, etc.).
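The wire syntax of the XCON Scheduler protocol is not given in the paper; purely as a hypothetical illustration of how a line-oriented scheduling exchange might be handled server-side (the CREATE/LIST verbs and the reply format are our own assumptions, not the actual protocol), consider:

```python
# Hypothetical parser for a line-oriented conference-scheduling protocol.
# The verbs and argument layout below are illustrative assumptions only,
# not the actual XCON Scheduler syntax.
def handle_scheduler_line(line, conferences):
    parts = line.strip().split()
    if not parts:
        return "ERROR empty request"
    verb, args = parts[0].upper(), parts[1:]
    if verb == "CREATE" and args:
        conf_uri = args[0]
        conferences[conf_uri] = {"participants": []}
        return f"OK {conf_uri}"
    if verb == "LIST":
        # Return ongoing conferences and those scheduled in the near future.
        return "OK " + " ".join(sorted(conferences))
    return "ERROR unknown verb"

confs = {}
print(handle_scheduler_line("CREATE conf42", confs))  # OK conf42
print(handle_scheduler_line("LIST", confs))           # OK conf42
```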
In fact, Asterisk acts as the Floor Control Server of the architecture (see Fig. 3). It comes into play every time a request is generated by a participant asking for the right to access a specific resource (e.g. audio or video). As highlighted in the picture, the FCS itself does not take any decision; it rather forwards floor requests to the appropriate floor chair, who makes a decision that is eventually notified to the originating client (and potentially to other interested parties, e.g. other participants involved in the same conference as the originating client). When no chair is assigned to a floor, the FCS takes automatic decisions according to a previously specified policy (i.e. automatically accept or deny new requests for the floor). As a transport method for BFCP messages, support for both TCP/BFCP and TCP/TLS/BFCP (as specified in [5]) has been implemented. Since a UE, to take advantage of the BFCP functionality, needs to know all the BFCP-related information of a conference she/he will be participating in, the focus needs a way to transmit this data to her/him. Apart from any out-of-band mechanism that could be exploited, the IETF specifies a


[Figure: 1. Request → 2. Notification → 3. Decision → 4. Granted or Denied → 6. Notification]

Fig. 3. The BFCP protocol in action
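The flow of Fig. 3 can be sketched as a small floor-control loop: requests reach the FCS, which either forwards them to the assigned chair or falls back to a preconfigured accept/deny policy. The Python class below is only a schematic illustration of this behavior, not the actual Asterisk module:

```python
class FloorControlServer:
    """Schematic FCS: forwards requests to the chair, or applies a
    default policy when no chair is assigned to the floor."""

    def __init__(self, auto_policy="deny"):
        self.chairs = {}          # floor_id -> chair decision callback
        self.auto_policy = auto_policy

    def assign_chair(self, floor_id, decide):
        # decide(user_id) must return "granted" or "denied"
        self.chairs[floor_id] = decide

    def floor_request(self, floor_id, user_id):
        chair = self.chairs.get(floor_id)
        if chair is None:
            # No chair: decide automatically from the configured policy.
            return "granted" if self.auto_policy == "accept" else "denied"
        # Forward to the chair; the outcome is then notified to the client.
        return chair(user_id)

fcs = FloorControlServer(auto_policy="accept")
fcs.assign_chair("video", lambda user: "granted" if user == "alice" else "denied")
print(fcs.floor_request("video", "alice"))   # granted (chair decision)
print(fcs.floor_request("audio", "bob"))     # granted (no chair, auto-accept)
```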

way [12] to let SDP (Session Description Protocol) transport this information. This functionality has been implemented in the module as well. Then, in order to provide users with the capability of dynamically managing conferences, we specified and implemented a brand new protocol. The Scheduler has been conceived as a text-based protocol, mainly used by clients either to create a new conference instance or to ask the server for the list of available conferences (i.e. either ongoing conferences or conferences scheduled in the near future). Starting from this protocol, we implemented a Web Services-enabled wrapper for its functionality, and a proxy client that allows clients using simple browsers to access and manage conference information. This approach is the same one the WG has recently started investigating for a new conference protocol candidate, the Centralized Conferencing Manipulation Protocol [13], with which our approach is fully compliant as far as the currently provided functionality is concerned. Finally, we implemented a Notification Service by exploiting both existing solutions and custom-built modules. As an existing solution, the Asterisk Manager Interface (AMI) has been used to notify authorized listeners about all relevant conference-related events. However, this approach only allows active notifications towards a passive listener. To overcome this limitation, and to allow a bidirectional event notification mechanism with an external entity, we implemented a brand new protocol, which we called Dispatcher. This protocol is the basis for work we are carrying out in order to improve the scalability of centralized conferencing frameworks, and as such it is only mentioned here; it is presented in detail in [14].

4.2 Client Side Components

On the client side, we decided to adopt Minisip as a starting point. Minisip is an open source soft-phone supporting the SIP protocol and making use of the


GTK+ framework for its graphical interfaces. Most of the available configuration widgets used in Minisip have been appropriately modified in order to enable support for XCON and BFCP settings. Furthermore, brand new widgets have been created in order to include the required client-side functionality, related to both conference scheduling and BFCP. As to conference scheduling, a widget has been implemented which provides users with the needed conference scheduling functionality: through this widget it becomes easy to create a new conference, retrieve the list of active XCON conferences, or join an existing XCON conference. Coming to BFCP, new classes implementing the client-side BFCP behavior have been added to Minisip. Associated with such classes, new widgets have been provided which enable users to: (i) send BFCP messages to the BFCP server; (ii) interactively build BFCP floor requests, either in participant or in chair (i.e. with enhanced floor management functionality) mode; (iii) keep an up-to-date log of the BFCP messages exchanged with the server. Finally, with respect to the role of the chair, we implemented ad-hoc interfaces used either to manage floor requests issued by conference participants, or to build so-called third-party floor requests, i.e. requests generated by the chair on behalf of a different participant¹. Since we also added to Minisip support for the encapsulation of BFCP information in SDP bodies, BFCP is automatically exploited whenever a SIP INVITE or re-INVITE contains BFCP-related data. Besides, the appropriate transport method for the BFCP communication with the FCS (i.e. TCP/BFCP or TCP/TLS/BFCP) is automatically chosen according to this SDP negotiation.
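The SDP negotiation mentioned above can be illustrated with a short parser: RFC 4583 lets SDP advertise a BFCP stream through a TCP/BFCP or TCP/TLS/BFCP m-line plus attributes such as confid, userid and floorid. The sketch below extracts those fields and derives the transport choice; it is a simplified reading of that format, not actual Minisip code:

```python
def parse_bfcp_sdp(sdp_lines):
    """Extract BFCP-related data from an SDP body (cf. RFC 4583)."""
    info = {}
    for line in sdp_lines:
        if line.startswith("m=application") and "BFCP" in line:
            proto = line.split()[2]          # e.g. TCP/BFCP or TCP/TLS/BFCP
            info["use_tls"] = "TLS" in proto
        elif line.startswith("a=confid:"):
            info["conference_id"] = int(line.split(":", 1)[1])
        elif line.startswith("a=userid:"):
            info["user_id"] = int(line.split(":", 1)[1])
        elif line.startswith("a=floorid:"):
            info.setdefault("floors", []).append(line.split(":", 1)[1])
    return info

sdp = ["m=application 50000 TCP/TLS/BFCP *",
       "a=confid:4321", "a=userid:99", "a=floorid:1 mstrm:10"]
print(parse_bfcp_sdp(sdp)["use_tls"])  # True
```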

5

Related Work

The architecture we presented in this paper focuses on two main aspects: (i) compatibility with the IMS framework; (ii) capability to offer advanced functionality such as floor control, conference scheduling and management, etc. While there is a rich literature on each of the above points when considered alone, to the best of our knowledge no integrated effort has been made to date to provide a real architecture that is both IMS compliant and capable of offering floor management functionality. This is mainly due to the fact that no agreed-upon solution has so far been designated in the various international standardization fora with respect to some crucial points, such as the choice of the most suitable conferencing control protocol, as well as its integration into the general IMS architecture. Interestingly enough, a few works have already proposed taking a further step ahead by moving from a centralized to a distributed perspective. This is the case, for example, of [15], where the authors propose a model trying to extend the XCON approach to a distributed scenario. While this is currently outside the scope of the IETF, it does represent one of our primary goals for the near future, as will be explained in the next section. On the IMS side, some efforts have already been devoted to the realization of IMS compliant testbeds, as in

¹ Notice that such functionality is particularly interesting since it enables the chair to allow non-XCON-enabled clients to take part in an XCON-enabled conference.


the case of [16], where the authors propose a testbed for multimedia services support based on the IMS specification. Finally, several other works can be found in the literature, though based on superseded models such as those defined in the IETF SIPPING Working Group. This is the case, e.g., of [17] and [18].

6

Conclusions and Future Work

In this paper we presented an actual implementation of an IMS-compliant architecture aimed at offering a video conferencing service with enhanced functionality, such as conference scheduling facilities and conference moderation. The system we developed is based on open source components, which have been appropriately modified in order to introduce support for the newly required protocols and mechanisms. We are currently done with the main implementation effort, related to a tightly coupled, centralized conferencing model. However, many challenging issues still have to be faced. In particular, we have already defined an architecture capable of realizing a distributed conferencing system with strong reliability and scalability properties. Starting from the available centralized conferencing system, we have defined the overall architecture for distributed conferencing in terms of framework, data model and protocol definitions. The framework under definition has been called DCON, standing for Distributed Conferencing, while at the same time explicitly recalling the already standardized XCON model. Indeed, DCON will be implemented as a large-scale evolution of the XCON framework. We propose to deploy our architecture on top of a two-layer network topology. The top layer is represented by an overlay network in which each node plays the role of the focus element of an XCON “island”. The lower layer, in turn, is characterized by a star topology (in which the central hub is represented by the focus element) and is fully compliant with the XCON specification. In the DCON scenario, communication among different islands (i.e. among the focus elements managing different islands) becomes of paramount importance, since it enables sharing information about the state of the available conferences, as well as about the participants involved in a distributed conference.
To this purpose, we are investigating the possibility of adopting the so-called S2S (Server-to-Server) module of XMPP (Extensible Messaging and Presence Protocol). XMPP has been standardized by the IETF as a candidate protocol to support instant messaging, e-presence and generic request-response services, and it looks to us like the ideal communication means among DCON focus entities. A prototype of the platform is already available (http://dcon.sf.net/) and currently provides distributed videoconferencing functionality.

Acknowledgments This work has been carried out with the financial support of the European projects NetQoS, OneLab and Content. Such projects are partially funded by the EU as part of the IST Programme, within the Sixth Framework Programme.


References

1. Rosenberg, J., Schulzrinne, H., Camarillo, G., et al.: SIP: Session Initiation Protocol. RFC 3261 (June 2002)
2. Rosenberg, J.: A Framework for Conferencing with the Session Initiation Protocol (SIP). RFC 4353 (February 2006)
3. Levin, O., Even, R.: High-Level Requirements for Tightly Coupled SIP Conferencing. RFC 4245 (November 2005)
4. Barnes, M., Boulton, C., Levin, O.: A Framework and Data Model for Centralized Conferencing. draft-ietf-xcon-framework-07 (January 2007)
5. Camarillo, G., Ott, J., Drage, K.: The Binary Floor Control Protocol (BFCP). RFC 4582 (November 2006)
6. 3GPP: Conferencing using the IP Multimedia (IM) Core Network (CN) subsystem; Stage 3. Technical report, 3GPP (March 2006)
7. OMA: Instant Messaging using SIMPLE Architecture. Technical report, OMA
8. OMA: Push to talk over Cellular (PoC) - Architecture. Technical report, OMA
9. 3GPP: IP multimedia subsystem; Stage 2, Technical Specification. Technical report, 3GPP (June 2006)
10. Boulton, C., Barnes, M.: A Universal Resource Identifier (URI) for Centralized Conferencing (XCON). draft-boulton-xcon-uri-01 (February 2007)
11. Boulton, C., Barnes, M.: A User Identifier for Centralized Conferencing (XCON). draft-boulton-xcon-userid-01 (February 2007)
12. Camarillo, G.: Session Description Protocol (SDP) Format for Binary Floor Control Protocol (BFCP) Streams. RFC 4583 (November 2006)
13. Barnes, M., Boulton, C., Schulzrinne, H.: Centralized Conferencing Manipulation Protocol. draft-barnes-xcon-ccmp-02 (January 2007)
14. Buono, A., Loreto, S., Miniero, L., Romano, S.P.: A Distributed IMS Enabled Conferencing Architecture on Top of a Standard Centralized Conferencing Framework. IEEE Communications Magazine 45(3) (2007)
15. Cho, Y., Jeong, M., Nah, J., Lee, W., Park, J.: Policy-Based Distributed Management Architecture for Large-Scale Enterprise Conferencing Service Using SIP. IEEE Journal on Selected Areas in Communications 23, 1934–1949 (2005)
16. Magedanz, T., Witaszek, D., Knuettel, K.: The IMS Playground @ Fokus: An Open Testbed for Next Generation Network Multimedia Services. In: Proceedings of the First International Conference on Testbeds and Research Infrastructures for the Development of Networks and Communities (TRIDENTCOM 2005) (2005)
17. Yang, Z., Huadong, M., Zhang, J.: A Dynamic Scalable Service Model for SIP-based Video Conference. In: Proceedings of the 9th International Conference on Computer Supported Cooperative Work in Design
18. Singh, A., Mahadevan, P., Acharya, A., Shae, Z.: Design and Implementation of SIP Network and Client Services. In: Proceedings of the 13th International Conference on Computer Communication and Networks (ICCCN), Chicago, IL (2004)

Author Index

Aalto, Samuli 1
Alanen, Olli 148
Álvarez García-Sanchidrián, Rodrigo 431
Ayoun, Moti 121
Bagnulo, Marcelo 443
Bellalta, B. 342
Berbineau, Marion 431
Bohnert, Thomas Michael 133
Bonte, Michel 431
Borcoci, Eugen 133
Bruneel, Herwig 248
Bruyelle, Jean-Luc 431
Buono, A. 468
Cano, C. 342
Cantineau, Olivier 431
Casares-Giner, Vicente 210
Castaldi, T. 468
Chai, Wei Koong 86
Chung, Tai-Myoung 356
Chydzinski, Andrzej 38
Claeys, Dieter 248
Csapodi, Márton 431
Czachórski, Tadeusz 61
De Cicco, Luca 73
de la Oliva, Antonio 443
Desurmont, Xavier 431
De Turck, Koen 109
De Vuyst, Stijn 109
Domańska, Joanna 61
Domański, Adam 61
Domenech-Benlloch, Ma Jose 210
Engelstad, Paal 121
Fiems, Dieter 109
Galkin, Anatoly M. 187
García, J. 13
García-Martínez, Alberto 443
Gimenez-Guzman, Jose Manuel 210
Giordano, Stefano 269
Grønsund, Pål 121
Gubinelli, Massimiliano 269
Hämäläinen, Timo 148
Heegaard, Poul E. 26, 162
Herrera-Joancomartí, Jordi 281
Hickling, Ron 294
Hryn, Grzegorz 38
Hwang, Gang Uk 419
Iversen, V.B. 260
Jakubiak, Jakub 133
Jang, Kil-Woong 306
Jeney, Gábor 431
Jeon, Seungwoo 235
Jorguseski, Ljupco 194
Juva, Ilmari 1
Karakoc, Mustafa 223
Kari, Hannu H. 455
Katz, Marcos 133
Kavak, Adnan 223
Konorski, Jerzy 316
Koucheryavy, Yevgeni 133
Laevens, Koenraad 248
Lagutin, Dmitrij 455
Lamy-Bergot, Catherine 431
Lee, Hanjin 235
Lee, Jong-Hyouk 356
Lee, Joong-Hee 356
Li, Vitaly 99
Linder, Lloyd 294
Litjens, Remco 194
Lopez da Silva, Rafael 431
Lukyanenko, Andrey 393
Macian, C. 342
Malouch, Naceur 431
Marandin, Dimitri 367
Martikainen, Henrik 148
Martínez, I. 13
Martinez-Bauset, Jorge 210
Mascolo, Saverio 73
Miniero, L. 468
Moltchanov, D. 49
Monteiro, Edmundo 133
Montes, Angel 175
Osipov, Evgeny 379
Pagano, Michele 269
Panfilov, Oleg 294
Park, Hong Seong 99
Pavlou, George 86
Pérez, Jesús A. 175
Peuhkuri, Markus 1
Pla, Vicent 210
Popova, Mariya 194
Raymer, Dave 330
Rifà-Pous, Helena 281
Romano, S.P. 468
Samudrala, Srini 330
Sandmann, Werner 162
Sanz, David 431
Sayenko, Alexander 148
Sfairopoulou, A. 342
Simonina, Olga A. 187
Skeie, Tor 121
Slavicek, Karel 409
Soto, Ignacio 443
Stepanov, S.N. 260
Strassner, John 330
Susitaival, Riikka 1
Turgeon, Antonio 294
Tykhomyrov, Vitaliy 148
Viruete, E. 13
Walraevens, Joris 248
Wittevrongel, Sabine 109
Wojcicki, Robert 38
Yanovsky, Gennady G. 187
Yoon, Hyunsoo 235
Zárate, Victor 175