Selected papers from the 3rd international workshop on QoS in multiservice IP networks (QoS-IP 2005)



English Pages 170 Year 2006


Table of contents:
Selected papers from the 3rd international workshop on QoS in multiservice IP networks (QoS-IP 2005)
Introduction
Arrival curves
The Legendre transform
Convex and concave conjugates
Properties of the Legendre transform
Service curves
Output bounds
Performance bounds
Rate-latency performance bounds
Service-curve-based routing in the Legendre domain
References
Introduction
Related study
Adaptive models
QoS requirements
WRR scheduler
QoS requirements
Pricing criterion
QoS requirements
Upper constraints
Integrated Services
Differentiated Services
Simulation environment
Case 1
Case 2
References
Introduction
State-dependent arrival rates
Packet-level balancing
Fixed routes
Separately balanced capacity allocation and routing
Jointly balanced capacity allocation and routing
Numerical results
Conclusions
References
Introduction
Network, traffic and problem description
The BE traffic models
The stationary model
The dimensioning mechanism
QoS traffic dimensioning
BE traffic dimensioning
The final dimensioning
Numerical results
Appendix A
Appendix B
References
Introduction
Transport-layer QoS translators
Virtual private networks
The Capacity Assignment problem
The Buffer Assignment problem
Numerical examples and simulations
The capacity and flow assignment problem
The Greedy Weight Flow Deviation method
Numerical examples
Numerical examples and simulations
Acknowledgement
Genetic algorithms
References
Introduction
Model description
Probability of losses in a block
Performance analysis
Constant average load case
Constant average packet loss case
On the accuracy of the Gilbert model
Acknowledgements
References
Introduction
An overview of MPLS
Problem description and existing solutions
Virtual Path Hopping (VPH) concept
Dynamic Virtual Path Allocation (DVPA) algorithm
DVPA algorithm
VPH concept
DVPA algorithm
Conclusions
References
Introduction
IPv4 and IPv6 multihoming
Impact of PI and PA prefixes on available AS paths
A two-level topology with delays
Simulation results
A new path diversity metric
Internet topologies
Simulation results
Influence of topology on path diversity
Acknowledgements
References
Introduction
Micromobility protocol and resource reservations
Packet classifier and scheduler
MEHROM handoff scheme
Bandwidth table
Bandwidth reservations (Fig. 5(1))
Handoff phase 2: route optimization (Fig. 5(3))
Mobile host is sender
Handoff (Fig. 6(2))
Evaluation
Load of control traffic in the access network
Handoff performance
Highly loaded access networks
References
Introduction
Related works
The packet marking algorithm (PMA)
The analytical model
About fixed point approximations
The sources model
The network model
Model validation
A model application
Conclusions and further research issues
References

Computer Networks 50 (2006) 1023–1025 www.elsevier.com/locate/comnet

Guest Editorial

Selected papers from the 3rd international workshop on QoS in multiservice IP networks (QoS-IP 2005)

Technological advances in communication have fostered an impressive increase in the transport capacity of currently deployed networks. Despite this, quality of service (QoS) remains a central research issue. New bandwidth-consuming greedy services are emerging (e.g., peer-to-peer applications), and many critical bottlenecks exist in currently deployed networks (especially, but not necessarily, in the access part). All this calls for techniques that properly manage resources and differentiate, prioritize, and control traffic when congestion arises.

This special issue is a collection of selected papers, originally presented at the 3rd international workshop on QoS in multiservice IP networks (QoS-IP 2005), held in Catania, Italy, in February 2005, and presented here in significantly revised and extended versions. QoS-IP 2005 was organized as the final event of the project TANGO (Traffic models and Algorithms for Next Generation IP networks Optimization), funded by the Italian Ministry for Education, Universities and Research during 2003 and 2004. As the third edition of a successful international event, the workshop attracted almost 100 submissions from 21 different countries, from which 51 high-quality papers were selected for presentation. QoS-IP 2005 covered most of the hot topics in the field of networking. A representative selection of the issues discussed during the workshop is collected in this special issue of Computer Networks, which includes 10 high-quality papers.

With the objective of providing new effective analytical tools, in the paper "Conjugate network calculus: A dual approach applying the Legendre transform," by Markus Fidler and Stephan Recker, the Legendre transform is proposed as the basis for deriving a dual domain for network calculus that is analogous to the Fourier transform domain in system theory. Network calculus min–plus convolution and de-convolution are used as the basis for the computation of an output signal given the system component burst response and input signal. The mapping consists of simple additions and subtractions in the Legendre transform domain. As an application, bounds on backlog and delay are derived in simple queuing systems.

QoS provision for different service classes is the topic of the paper "Comparison and analysis of the revenue-based adaptive queuing models" by Alexander Sayenko, Timo Hamalainen, Jyrki Joutsensalo, and Lari Kannisto, in which several resource sharing models are considered as a means to jointly guarantee that QoS requirements are met and the service provider's revenues are maximized. Resources are allocated so that a minimum amount of bandwidth is provided to the various services depending on their requirements, and the additional available resources are shared according to a price-based scheme aiming at maximizing revenue. Analytical models and simulations are used to evaluate and compare several schemes and to derive criteria for choosing the best solution to be employed in a router, given its characteristics and the network conditions.

Schemes for load balancing are investigated in the paper "Insensitive load balancing in data networks," by Juha Leino and Jorma Virtamo. In particular, the authors consider load balancing policies that are insensitive to flow size distribution,

1389-1286/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2005.09.014

insensitivity being an interesting property that allows the derivation of robust and simple results without detailed traffic characterization. The schemes are analyzed by applying Markov decision process theory. A linear programming technique is used to derive the optimal routing policy. Packet-level schemes are also considered.

Network design and planning are crucial tasks, especially in networks with integrated multimedia and best-effort traffic, as in the cases considered in the paper "Capacity planning in IP virtual private networks under mixed traffic," by Raffaele Bolla, Roberto Bruschi, and Franco Davoli. The authors propose a mechanism for bandwidth dimensioning based on a hybrid combination of analytical and simulation models. Call-level analytical models of the traffic are combined with a simulation tool employing fluid models of traffic aggregates.

The network design methodology proposed in the paper "Algorithms for IP networks design with end-to-end QoS constraints," by Emilio C.G. Wille, Marco Mellia, Emilio Leonardi, and Marco Ajmone Marsan, aims at meeting an end-to-end QoS constraint. In order to account for the effect of protocols at different levels of the protocol stack, the end user QoS constraints are first mapped into transport layer requirements and then translated into network layer specifications. Several design problems are discussed, from buffer dimensioning to flow and capacity assignment.

An effective solution for reducing the degradation of VBR audio and video streams in modern data networks (especially when ARQ schemes cannot be adopted due to real-time constraints) is to use a forward error correction (FEC) coding scheme, increasing the redundancy of the transmitted data so as to be able to cope with packet losses. The paper "On the effect of the packet size distribution on FEC performance," by György Dán, Viktória Fodor, and Gunnar Karlsson, explores the relation between the loss process, represented by an analytical model, the packet size distribution of the streaming data, and the FEC performance in several scenarios, showing the different effect of the packet size distribution in the access network and in the backbone.

The increase in availability of IP/MPLS networks, especially for real-time traffic, is the objective of the algorithms discussed in "A connection oriented network architecture with guaranteed QoS for future real-time applications over the internet," by Manodha Gamage, Mitsuo Hayasaka, and Tetsuya Miki. While virtual path hopping provides proactive allocation of alternative paths to reduce the degradation of the data connections due to temporary failures in the MPLS control plane, the proposed dynamic virtual path algorithm is used to pre-plan backup paths for the premium traffic that needs to be quickly re-routed during the failures, so that their joint application can effectively provide a fast recovery time while still maintaining good efficiency in bandwidth utilization.

The advantages of using the multihoming features of future IPv6 networks are discussed in "Leveraging network performances with IPv6 multihoming and multiple provider-dependent aggregatable prefixes," by Cédric de Lanois, Bruno Quoitin, and Olivier Bonaventure, where the authors compare the existing multihoming solutions in IPv4 and IPv6 and show how the use of multiple prefixes can help in creating distinct alternate routes for the multihomed autonomous system, thereby improving the path diversity.

Micromobility management in a mobile-IP wireless network is the topic of "Q-MEHROM: Mobility support and resource reservation for mobile senders and receivers," by Liesbeth Peters, Ingrid Moerman, Bart Dhoedt, and Piet Demeester. The authors integrate the MEHROM protocol and the resource reservation capabilities of QOSPF to provide efficient micromobility management using the available resources of the wired, meshed network. The performance of the protocol is studied under various network topologies and hand-off scenarios.

The differentiated services QoS architecture can provide further benefits other than "just" differentiated treatment of traffic flows. The paper "An analytical model of a new packet marking algorithm for TCP flows," by Giovanni Neglia, Vincenzo Falletta, and Giuseppe Bianchi, exploits differentiated treatment of marked packets to improve the throughput performance of TCP flows. An adaptive packet marking strategy, to be enforced at the ingress nodes of a DiffServ domain (edge routers), is proposed and thoroughly studied by means of analytical techniques.

Finally, we would like to thank all the people who contributed to this special issue. Above all, we thank the authors of the papers for submitting the valuable work you can appreciate in this volume. We are also indebted to the reviewers who, besides helping us in selecting the manuscripts, helped improve the quality of the papers with their suggestions and comments. Last but not least, we are very grateful to Harry Rudin, Co-Editor-in-Chief of Computer Networks, not only for approving this special issue, but also for his continuous support and precious help during all the phases of its preparation.

Giuseppe Bianchi was Assistant Professor at the Politecnico di Milano, Italy, from 1993 to 1998, and Associate Professor at the University of Palermo, Italy, from 1998 to 2003. He is currently Associate Professor at the University of Roma Tor Vergata, Italy. He spent 1992 as a Visiting Researcher at the Washington University of St. Louis, Missouri, USA, and 1997 as a Visiting Professor at the Columbia University of New York. His research activity (documented in about 100 papers in peer-refereed international journals and conferences) spans several areas, among which are multiple access and mobility management in wireless local area networks, design and performance evaluation of broadband networking protocols, and quality of service support in IP networks. He has been co-organizer of the first ACM workshop on wireless mobile internet (ACM WMI 2001), of the first ACM workshop on wireless mobile applications over WLAN hot-spot (ACM WMASH 2003), and of the third IEEE international workshop on multiservice IP networks (IEEE QoS-IP 2005). He was general chair for the second ACM workshop on wireless mobile applications over WLAN hot-spot (ACM WMASH 2004).

Marco Listanti received his Dr. Eng. degree in electronics engineering from the University "La Sapienza" of Roma in 1980. He joined the Fondazione Ugo Bordoni in 1981, where he was leader of the group "TLC network architecture" until 1991. In November 1991 he joined the INFOCOM Dept. of the University of Roma "La Sapienza", where he is Full Professor of Switching Systems. He is author of several papers published in the most important technical journals and conferences in the area of telecommunication networks and has been guest editor of the feature topic "Optical Networking Solutions for Next Generation Internet Networks" in IEEE Communications Magazine. His current research interests focus on traffic control in IP networks and on the evolution of techniques for optical networking. Prof. Listanti has been representative of the Italian PTT administration in international standardization organizations (ITU, ETSI) and has been coordinator of several national and international research projects (CNR, MURST, RACE, ACTS, ICT). He is also a member of the IEEE Communications Society.


Michela Meo received the Laurea degree in Electronic Engineering in 1993, and the Ph.D. degree in Electronic and Telecommunication Engineering in 1997, both from Politecnico di Torino. Since November 1999, she has been an Assistant Professor at Politecnico di Torino. She has co-authored more than 70 papers, about 20 of which are in international journals. She has edited five special issues of international journals, including ACM Monet, Performance Evaluation Journal and Computer Networks. She was program co-chair of two editions of ACM MSWiM (International workshop on modeling, analysis and simulation of wireless and mobile systems), general chair of another edition of ACM MSWiM, program co-chair of IEEE QoS-IP 2005 (the 3rd International Workshop on QoS in multiservice IP Networks), and a member of the program committee of more than twenty international conferences, including Sigmetrics, ICC and Globecom. Her research interests are in the field of performance evaluation of transport and link layer protocols, analysis and dimensioning of cellular networks, and traffic characterization.

Maurizio M. Munafò is Assistant Professor in the Electronics Department of Politecnico di Torino. He has held a Dr.Ing. degree in Electronic Engineering since 1991 and a Ph.D. in Telecommunications Engineering since 1994, both from Politecnico di Torino. Since November 1991 he has been with the Electronics Department of Politecnico di Torino, where he has been involved in the development of an ATM networks simulator. He is co-author of about 30 journal and conference papers in the area of communication networks and systems. He was on the program committee of QoS-IP 2003, HPSR 2003 and QoS-IP 2005 and edited a special issue of Computer Networks. His current research interests are in simulation and performance analysis of communication systems, QoS routing algorithms and network security.

Guest Editors

Giuseppe Bianchi
Università di Roma Tor Vergata, Italy

Marco Listanti
Università di Roma La Sapienza, Italy

Michela Meo
Politecnico di Torino, Dpto. di Elettronica, 10129 Torino, Italy
Tel.: +39 011 5644167; fax: +39 011 5644099
E-mail address: [email protected]

Maurizio M. Munafò
Politecnico di Torino, Dpto. di Elettronica, 10129 Torino, Italy

Available online 10 October 2005

Computer Networks 50 (2006) 1026–1039 www.elsevier.com/locate/comnet

Conjugate network calculus: A dual approach applying the Legendre transform ☆

Markus Fidler a,*, Stephan Recker b

a Centre for Quantifiable Quality of Service in Communication Systems, NTNU Trondheim, Norway
b IMST GmbH, Kamp-Lintfort, Germany

Available online 5 October 2005

Abstract

Network calculus is a theory of deterministic queuing systems that has successfully been applied to derive performance bounds for communication networks. Founded on min–plus convolution and de-convolution, network calculus bears a strong analogy to system theory. Yet, system theory has been extended beyond the time domain by applying the Fourier transform, thereby allowing for an efficient analysis in the frequency domain. A corresponding dual domain for network calculus has not been elaborated so far. In this paper we show that, in analogy to system theory, such a dual domain for network calculus is given by convex/concave conjugates, also referred to as the Legendre transform. We provide solutions for dual operations and show that min–plus convolution and de-convolution become simple addition and subtraction in the Legendre domain. Additionally, we derive expressions for the Legendre domain to determine upper bounds on backlog and delay at a service element and provide representative examples for the application of conjugate network calculus. © 2005 Elsevier B.V. All rights reserved.

Keywords: Network calculus; Legendre transform; Fenchel duality theorem

1. Introduction

Network calculus [3,11] is a theory of deterministic queuing systems that allows analyzing various fields in computer networking. Being a powerful and elegant theory, network calculus obeys a number of analogies to classical system theory, however, under a min–plus algebra, where addition becomes computation of the minimum and multiplication becomes addition [1,3,11].

☆ This article is an extended version of [8].
* Corresponding author. E-mail address: fi[email protected] (M. Fidler).

System theory applies a characterization of systems by their response to the Dirac impulse, which constitutes the neutral element. A system is considered to be linear, if a constantly scaled input signal results in a corresponding scaling of the output signal and if the sum of two input signals results in the sum of the two output signals that correspond to the individual input signals. The output of a system can be efficiently computed by convolution of the input signal and the system's impulse response and the concatenation of separate systems can be expressed by convolution of the individual impulse responses. Network calculus relates very much to the above properties of system theory, while being based on

1389-1286/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2005.09.004

M. Fidler, S. Recker / Computer Networks 50 (2006) 1026–1039

the calculus for network delay presented in [5,6] and on Generalized Processor Sharing in [17,18]. The neutral element of network calculus is the burst function and systems are described by their burst response. Min–plus linearity is fulfilled, if the addition of a constant to the input signal results in an addition of the same constant to the output signal and if the minimum of two input signals results in the minimum of the two output signals that correspond to the individual input signals. The output of a system is given as the min–plus convolution, respectively de-convolution, of the input signal and the system's burst response, and the concatenation of separate systems can be described by min–plus convolution of the individual burst responses. Extensions and a comprehensive overview on current network calculus are given in [3,11].

Yet, system theory provides another practical domain for analysis applying the Fourier transform, which is particularly convenient due to its clearness and because the convolution integral becomes a simple multiplication in the Fourier domain. Besides, fast algorithms for Fourier transformation exist. A corresponding domain in network calculus has, however, not been elaborated in depth so far. In [15] it is shown that the backlog bound at a constant rate server is equal to the Legendre, respectively Fenchel, transform [1,21] of the input. A similar concept is used in [10,16], where the output of a network element is computed in the Legendre domain. Related theories are, however, far more developed in the field of morphological signal processing [7,14], where the slope transform has been successfully applied. Yet, it can be shown that the Legendre transform provides the basis for a new and comprehensive theory, which constitutes a dual approach to network calculus that can be efficiently applied to a variety of problems [8,20]. As for the Fourier transform, fast algorithms for the Legendre transform exist [13].
The remainder of this paper is organized as follows: first, Section 2 briefly summarizes some essential elements of network calculus, including arrival curves for input and output as well as service curves. Then, Section 3 introduces the Legendre transform, convex and concave conjugates, and related properties, especially min–plus convolution and de-convolution. In Section 4 the derived theory is applied to introduce the conjugate network calculus comprised of the transformed elements of network calculus described in Section 2. Section 5 provides two sample applications of conjugate network calculus and finally Section 6 concludes the paper.

2. Elements of network calculus

The foundations of network calculus are min–plus convolution and de-convolution, where min–plus operations can be derived from the corresponding classical operations by replacing addition by computation of the minimum and multiplication by addition. Consequently the algebraic structure that is used by network calculus is the commutative dioid $(\mathbb{R} \cup \{\infty\}, \min, +)$ [11].

Definition 1 (Min–plus convolution and min–plus de-convolution). The min–plus convolution $\otimes$ of two functions $f(t)$ and $g(t)$ is defined as

$$(f \otimes g)(t) = \inf_{u} \, [f(t - u) + g(u)]$$

and min–plus de-convolution $\oslash$ as

$$(f \oslash g)(t) = \sup_{u} \, [f(t + u) - g(u)].$$

In the context of network calculus $t \ge u \ge 0$, respectively $t \ge 0$ and $u \ge 0$, is often applied to min–plus convolution, respectively min–plus de-convolution. The algebraic structure $(\mathcal{F}, \min, \otimes)$ is again a commutative dioid, where $\mathcal{F}$ is the set of wide-sense increasing functions with $f(s) \le f(t)$ for all $s \le t$ and $f(t) = 0$ for $t < 0$ [11]. Note, however, that min–plus de-convolution does not generally fulfill these properties. In particular it is not commutative.

Furthermore, in the context of network calculus, arrival and service curves play an important role. Arrival curves constitute upper bounds on the input and output of network elements, while service curves represent lower bounds on the service offered by network elements. Networks are usually comprised of more than one network element and a network service is generally composed of a concatenation of the services of the individual network elements. This section provides the formal definitions for arrival and service curves, and the computation rules for concatenated service curves and single server output bounds, which will be used in the sequel. The corresponding proofs can be found, for example, in [3,11].

2.1. Arrival curves

Flows or aggregates of flows can be described by arrival functions $F(t)$ that are given as the

1028

M. Fidler, S. Recker / Computer Networks 50 (2006) 1026–1039

cumulated number of bits seen in an interval [0, t]. Arrival curves a(t) are defined to give upper bounds on the arrival functions.

Theorem 1 (Concatenation). The service curve b(t) of the concatenation of n service elements with service curves bi(t) becomes

Definition 2 (Arrival curve). An arrival function F(t) conforms to an arrival curve a(t), if for all t P 0 and all s 2 [0, t]

bðtÞ ¼ b bi ðtÞ.

aðt  sÞ P F ðtÞ  F ðsÞ.

n i¼1

2.4. Output bounds

A typical constraint for incoming flows is given by the leaky-bucket algorithm, which allows for bursts of size b and a defined sustainable rate r.

Bounds on the output from a service element can be derived to be the min–plus de-convolution of the bound on the input and the corresponding service curve.

Definition 3 (Leaky-bucket arrival curve). The arrival curve that is enforced by a leaky bucket is given by  0; t ¼ 0; aðtÞ ¼ b þ r  t; t > 0.

Theorem 2 (Output bound). Consider a service element b(t) with input that is bounded by a(t). A bound on the output a 0 (t) is given by

2.2. Service curves The service that is offered by the scheduler on an outgoing link can be characterized by a minimum service curve, denoted by b(t). Definition 4 (Service curve). A lossless network element with input arrival function F(t) and output arrival function F 0 (t) is said to offer a service curve b(t) if for any t P 0 there exists at least one s 2 [0, t] such that F 0 ðtÞ  F ðsÞ P bðt  sÞ. A characteristic service curve is the rate-latency type with rate R and latency T. Definition 5 (Rate-latency service curve). The ratelatency service curve is defined as

a0 ðtÞ ¼ aðtÞÜbðtÞ. 2.5. Performance bounds The bounds on the performance delivered by a particular service element can be immediately determined from the arrival curve of traffic traversing the element and from the pertaining service curve. Theorem 3 (Server performance thresholds). Consider a server with service curve b(t). Let Q(t) be the backlog at the server at time t and let D(t) be the virtual delay of the last packet that arrives at time t for traffic with an arrival curve a(t). Then the backlog is upper bounded by Q 6 supfaðuÞ  bðuÞg ¼ ðaÜbÞð0Þ uP0

and the maximum delay in case of FIFO scheduling is bounded by D 6 inffd P 0 : aðuÞ 6 bðu þ dÞ 8u P 0g.

where [  ]+ is zero if the argument is negative.

The maximum backlog and maximum delay can be determined from the maximum vertical and horizontal deviations of the arrival and service curve, respectively.

2.3. Concatenation

3. The Legendre transform

Networks are usually comprised of more than one network element and a network service is generally composed of a series of individual network element services. The service curve of a concatenation of service elements can be efficiently described by min–plus convolution of the individual service curves.

In this section we show the existence of eigenfunctions in classical and in particular in min–plus system theory. The corresponding eigenvalues immediately yield the Fourier respective Legendre transform. Following the definition of convex and concave conjugates, the two major operations of network calculus, min–plus convolution and de-

þ

bðtÞ ¼ R  ½t  T  ;

M. Fidler, S. Recker / Computer Networks 50 (2006) 1026–1039

convolution, are derived in the Legendre domain and finally a list of properties of the Legendre transform is provided. 3.1. Eigenfunctions and eigenvalues Let us recall the definition of eigenfunctions and eigenvalues in classical and in min–plus algebra. Definition 6 (Eigenfunctions and eigenvalues). Consider a linear operator A on a function space. The function f(t) is an eigenfunction for A with associated eigenvalue k, if A½f ðtÞ ¼ f ðtÞ  k.

is the Fenchel conjugate that applies for convex functions g(t). The Fenchel conjugate of a differentiable function is generally denoted as the Legendre transform. Under the assumption of differentiability we use the terms Legendre transform and conjugate interchangeably. 3.2. Convex and concave conjugates Before providing further details on the Legendre transform, convexity and concavity are defined. Definition 7 (Convexity and concavity). A function f(t) is convex, if for all u 2 [0, 1]

Accordingly in min–plus algebra eigenfunctions and eigenvalues are defined by

f ðu  s þ ð1  uÞ  tÞ 6 u  f ðsÞ þ ð1  uÞ  f ðtÞ.

A½f ðtÞ ¼ f ðtÞ þ k.

The output $h(t)$ of a linear time-invariant system with impulse response $g(t)$ and input $f(t)$ is given by the convolution integral

$$h(t) = f(t) * g(t) = \int_{-\infty}^{+\infty} f(t-u)\, g(u)\, du.$$

The functions $f(t) = e^{j 2\pi s t}$ are known to be eigenfunctions for the convolution integral as shown by

$$e^{j 2\pi s t} * g(t) = \int_{-\infty}^{+\infty} e^{j 2\pi s (t-u)}\, g(u)\, du = e^{j 2\pi s t} \cdot G(s),$$

where the eigenvalue

$$G(s) = \int_{-\infty}^{+\infty} e^{-j 2\pi s u}\, g(u)\, du$$

is equivalent to the Fourier transform of $g(t)$.

In analogy, network calculus applies min–plus convolution to derive lower bounds on the output of network elements, and min–plus de-convolution to derive upper bounds. For explanatory reasons the following derivation applies min–plus de-convolution according to Definition 1, which provides an upper bound on the output of a network element with burst response $g(t)$ and upper bounded input $f(t)$. Eigenfunctions with regard to min–plus de-convolution are the affine functions $b + s \cdot t$, as established by

$$(b + s \cdot t) \oslash g(t) = \sup_u \,[b + s \cdot (t+u) - g(u)] = b + s \cdot t + G(s),$$

where the eigenvalue is $G(s) = \sup_u \,[s \cdot u - g(u)]$.

A function $g(t)$ is concave if for all $u \in [0, 1]$

$$g(u \cdot s + (1-u) \cdot t) \ge u \cdot g(s) + (1-u) \cdot g(t).$$

Among others the following properties are of particular interest: if $f(t) = -g(t)$ is convex then $g(t)$ is concave. The sum of two convex or two concave functions is convex, respectively concave. If the domain of a convex or concave function is smaller than $\mathbb{R}$, the function can be extended to $\mathbb{R}$ while retaining convexity, respectively concavity, by setting it to $+\infty$, respectively $-\infty$, where it is undefined [21].

The Legendre transform is defined independently for convex and concave functions. Details can be found in [21]. Furthermore, set-valued extensions exist that allow transforming arbitrary functions [7,14]; these are, however, not used here. Let $\mathcal{L}$ denote the Legendre transform in general, where for clarity we distinguish between convex conjugates $\overline{\mathcal{L}}$ and concave conjugates $\underline{\mathcal{L}}$.

Definition 8 (Convex and concave conjugates). The convex Fenchel¹ conjugate is defined as

$$F(s) = \overline{\mathcal{L}}(f(t))(s) = \sup_t \,[s \cdot t - f(t)]$$

and the concave conjugate as

$$G(s) = \underline{\mathcal{L}}(g(t))(s) = \inf_t \,[s \cdot t - g(t)].$$

If $f(t) = -g(t)$ is convex then $\underline{\mathcal{L}}(g(t))(s) = -\overline{\mathcal{L}}(f(t))(-s)$ holds.

¹ Convex and concave conjugates can also be derived by means of the Fenchel duality theorem, which will be discussed in Section 4.5.
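The two conjugates of Definition 8 can be approximated numerically by replacing the supremum and infimum with a maximum and minimum over sample points. The following is a minimal sketch under that assumption; the helper names, the quadratic test functions and the grid are illustrative choices and do not come from the paper.

```python
def convex_conjugate(f, s, ts):
    """Approximate sup_t [s*t - f(t)] by a maximum over the sample points ts."""
    return max(s * t - f(t) for t in ts)


def concave_conjugate(g, s, ts):
    """Approximate inf_t [s*t - g(t)] by a minimum over the sample points ts."""
    return min(s * t - g(t) for t in ts)


ts = [i / 100.0 for i in range(-500, 501)]  # t in [-5, 5], step 0.01


def f(t):
    return t * t      # convex test function (illustrative assumption)


def g(t):
    return -f(t)      # concave, chosen so that f = -g


s = 1.5
F = convex_conjugate(f, s, ts)            # analytically s^2 / 4
G = concave_conjugate(g, s, ts)           # analytically -s^2 / 4
G_via_f = -convex_conjugate(f, -s, ts)    # relation between the two conjugates for f = -g
print(F, G, G_via_f)
```

The grid must be wide enough to contain the maximizing, respectively minimizing, value of t; otherwise the approximation is silently wrong.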


M. Fidler, S. Recker / Computer Networks 50 (2006) 1026–1039

3.3. Min–plus convolution and de-convolution

The foundation of network calculus are min–plus convolution and de-convolution, for which corresponding operations in the Legendre domain are derived here.

Theorem 4 (Min–plus convolution in the Legendre domain). The min–plus convolution of two convex functions $f(t)$ and $g(t)$ in the time domain becomes an addition in the Legendre domain.

Theorem 4 has already been reported in [1,21].

Proof. The min–plus convolution of two convex functions is convex [11]. Thus, it is meaningful to apply the convex conjugate, which becomes

$$\begin{aligned}
\overline{\mathcal{L}}((f \otimes g)(t))(s) &= \sup_t \bigl[s \cdot t - \inf_u \,[f(t-u) + g(u)]\bigr] \\
&= \sup_t \bigl[s \cdot t + \sup_u \,[-f(t-u) - g(u)]\bigr] \\
&= \sup_u \bigl[\sup_t \,[s \cdot (t-u) - f(t-u)] + s \cdot u - g(u)\bigr] \\
&= \sup_u \bigl[\overline{\mathcal{L}}(f(t))(s) + s \cdot u - g(u)\bigr] \\
&= \overline{\mathcal{L}}(f(t))(s) + \sup_u \,[s \cdot u - g(u)] \\
&= \overline{\mathcal{L}}(f(t))(s) + \overline{\mathcal{L}}(g(t))(s). \qquad \square
\end{aligned}$$

Lemma 1 (Concavity of min–plus de-convolution). The min–plus de-convolution of a concave function $f(t)$ and a convex function $g(t)$ is concave.

Proof. The proof is a variation of a proof provided in [11], where it is shown that the min–plus convolution of two convex functions is convex. Define $S_{-f(t)}$ and $S_{g(-t)}$ to be the epigraphs of $-f(t)$ and $g(-t)$ according to

$$S_{-f(t)} = \{(t, \eta) \in \mathbb{R}^2 \mid -f(t) \le \eta\}, \qquad S_{g(-t)} = \{(t, \vartheta) \in \mathbb{R}^2 \mid g(-t) \le \vartheta\}.$$

Since $-f(t)$ and $g(-t)$ are both convex, the corresponding epigraphs are also convex [11,21], as well as the sum $S = S_{-f(t)} + S_{g(-t)}$, that is

$$S = \{(r + s, \eta + \vartheta) \mid (r, \eta) \in \mathbb{R}^2,\ (s, \vartheta) \in \mathbb{R}^2,\ -f(r) \le \eta,\ g(-s) \le \vartheta\}.$$

Substitution of $r + s$ by $t$, $s$ by $-u$, and $\eta + \vartheta$ by $\xi$ yields

$$S = \{(t, \xi) \in \mathbb{R}^2 \mid (-u, \vartheta) \in \mathbb{R}^2,\ -f(t+u) \le \xi - \vartheta,\ g(u) \le \vartheta\}.$$

Since $S$ is convex, $h(t) = \inf\{\xi \in \mathbb{R} \mid (t, \xi) \in S\}$ is also convex [21]:

$$\begin{aligned}
h(t) &= \inf\{\xi \in \mathbb{R} \mid (-u, \vartheta) \in \mathbb{R}^2,\ -f(t+u) \le \xi - \vartheta,\ g(u) \le \vartheta\} \\
&= \inf\{\xi \in \mathbb{R} \mid (-u) \in \mathbb{R},\ -f(t+u) + g(u) \le \xi\} \\
&= \inf_u \{-f(t+u) + g(u)\}.
\end{aligned}$$

It follows that $(f \oslash g)(t) = \sup_u \,[f(t+u) - g(u)]$ is concave. $\square$

Theorem 5 (Min–plus de-convolution in the Legendre domain). The min–plus de-convolution of a concave function $f(t)$ and a convex function $g(t)$ in the time domain becomes a subtraction in the Legendre domain.

Proof. With Lemma 1 we find

$$\begin{aligned}
\underline{\mathcal{L}}((f \oslash g)(t))(s) &= \inf_t \bigl[s \cdot t - \sup_u \,[f(t+u) - g(u)]\bigr] \\
&= \inf_t \bigl[s \cdot t + \inf_u \,[-f(t+u) + g(u)]\bigr] \\
&= \inf_u \bigl[\inf_t \,[s \cdot (t+u) - f(t+u)] + g(u) - s \cdot u\bigr] \\
&= \inf_u \bigl[\underline{\mathcal{L}}(f(t))(s) + g(u) - s \cdot u\bigr] \\
&= \underline{\mathcal{L}}(f(t))(s) - \sup_u \,[s \cdot u - g(u)] \\
&= \underline{\mathcal{L}}(f(t))(s) - \overline{\mathcal{L}}(g(t))(s). \qquad \square
\end{aligned}$$
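Theorems 4 and 5 can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: the quadratic test functions, the grid and all helper names are hypothetical, and suprema and infima are approximated by maxima and minima over the grid, so both sides agree only up to discretization error.

```python
def convex_conjugate(f, s, ts):
    """Approximate sup_t [s*t - f(t)] over the sample points ts."""
    return max(s * t - f(t) for t in ts)


def concave_conjugate(f, s, ts):
    """Approximate inf_t [s*t - f(t)] over the sample points ts."""
    return min(s * t - f(t) for t in ts)


def conv(f, g, us):
    """Min-plus convolution: t -> inf_u [f(t - u) + g(u)], u restricted to us."""
    return lambda t: min(f(t - u) + g(u) for u in us)


def deconv(f, g, us):
    """Min-plus de-convolution: t -> sup_u [f(t + u) - g(u)], u restricted to us."""
    return lambda t: max(f(t + u) - g(u) for u in us)


grid = [i * 0.02 for i in range(-200, 201)]  # [-4, 4], step 0.02


def convex_a(t):
    return t * t


def convex_b(t):
    return 2.0 * t * t


def concave_a(t):
    return -t * t


s = 1.0
# Theorem 4: conjugate of the convolution equals the sum of the conjugates.
conv_lhs = convex_conjugate(conv(convex_a, convex_b, grid), s, grid)
conv_rhs = convex_conjugate(convex_a, s, grid) + convex_conjugate(convex_b, s, grid)
# Theorem 5: concave conjugate of the de-convolution equals the difference.
deconv_lhs = concave_conjugate(deconv(concave_a, convex_b, grid), s, grid)
deconv_rhs = concave_conjugate(concave_a, s, grid) - convex_conjugate(convex_b, s, grid)
print(conv_lhs, conv_rhs, deconv_lhs, deconv_rhs)
```

For these test functions the analytic values are 3s²/8 for the convolution and −3s²/8 for the de-convolution, which the grid results should approach as the step size shrinks.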



3.4. Properties of the Legendre transform

The Legendre transform exhibits a number of useful properties, of which Table 1 lists the most relevant ones. More details can be found in [21].

The Legendre transform is self-dual, that is, it is its own inverse. More precisely, $\mathcal{L}(\mathcal{L}(f))(t) = (\operatorname{cl} f)(t)$, where $(\operatorname{cl} f)(t)$ is the closure of $f(t)$, defined as $(\operatorname{cl} f)(t) = \liminf_{s \to t} f(s)$ for convex functions and $(\operatorname{cl} f)(t) = \limsup_{s \to t} f(s)$ for concave functions. Thus, if $f(t)$ is convex then $(\operatorname{cl} f)(t) \le f(t)$, and if $f(t)$ is concave then $(\operatorname{cl} f)(t) \ge f(t)$. If $(\operatorname{cl} f)(t) = f(t)$ then $f(t)$ is said to be closed. The Legendre transform of a convex function is a closed convex function, and the Legendre transform of a concave function is a closed concave function.

Now consider an arbitrary function $f(t)$. The convex conjugate becomes $\overline{\mathcal{L}}(f(t))(s) = \overline{\mathcal{L}}((\operatorname{cl}(\operatorname{conv} f))(t))(s)$, where the convex hull $(\operatorname{cl}(\operatorname{conv} f))(t)$ of $f(t)$ is the greatest closed convex function majorized by $f(t)$. It can be seen as the pointwise supremum of all affine functions majorized by $f(t)$, such that

$$(\operatorname{cl}(\operatorname{conv} f))(t) = \sup_{b, r} \{b + r \cdot t : (\forall s : b + r \cdot s \le f(s))\}$$

and consequently $(\operatorname{cl}(\operatorname{conv} f))(t) \le f(t)$ holds. For the concave conjugate $\underline{\mathcal{L}}(f(t))(s) = \underline{\mathcal{L}}((\operatorname{cl}(\operatorname{conc} f))(t))(s)$ holds, where the concave hull follows as

$$(\operatorname{cl}(\operatorname{conc} f))(t) = \inf_{b, r} \{b + r \cdot t : (\forall s : b + r \cdot s \ge f(s))\}$$

and $(\operatorname{cl}(\operatorname{conc} f))(t) \ge f(t)$.

Table 1. Properties of the Legendre transform

| Time domain | Legendre domain |
|---|---|
| $f(t)$ | $F(s) = \mathcal{L}(f(t))(s)$ |
| $f(t) = \mathcal{L}(F(s))(t)$ | $F(s)$ |
| $f(t)$, $f$ convex | $F(s) = \overline{\mathcal{L}}(f(t))(s) = \sup_t [s \cdot t - f(t)]$, $F$ convex |
| $f(t)$, $f$ concave | $F(s) = \underline{\mathcal{L}}(f(t))(s) = \inf_t [s \cdot t - f(t)]$, $F$ concave |
| $f(t) + c$ | $F(s) - c$ |
| $f(t) \cdot c$ | $F(s/c) \cdot c$ |
| $f(t) + t \cdot c$ | $F(s - c)$ |
| $f(t + c)$ | $F(s) - s \cdot c$ |
| $f(t \cdot c)$ | $F(s/c)$ |
| $f(t) = g(t) \otimes h(t)$, $g$ convex, $h$ convex | $F(s) = G(s) + H(s)$, $G$ convex, $H$ convex |
| $f(t) = g(t) \oslash h(t)$, $g$ concave, $h$ convex | $F(s) = G(s) - H(s)$, $G$ concave, $H$ convex |
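The statement that transforming twice yields the closed convex hull rather than the function itself can be illustrated with a non-convex example. The sketch below is a grid approximation under assumptions of our own: the double-well test function, the grids and the helper names are hypothetical and chosen only because the convex hull is known in closed form (it is zero between the two wells).

```python
ts = [i * 0.01 for i in range(-400, 401)]   # time-domain samples, t in [-4, 4]
ss = [i * 0.01 for i in range(-400, 401)]   # Legendre-domain samples, s in [-4, 4]


def f(t):
    """Non-convex double well with minima at t = -1 and t = +1."""
    return min((t + 1.0) ** 2, (t - 1.0) ** 2)


# Convex conjugate F(s) = sup_t [s*t - f(t)], tabulated on the s grid.
F = [max(s * t - f(t) for t in ts) for s in ss]


def biconjugate(t):
    """Approximate sup_s [s*t - F(s)], i.e. the closed convex hull of f at t."""
    return max(s * t - F_s for s, F_s in zip(ss, F))


print(f(0.0), biconjugate(0.0))   # f is 1 at t = 0, its convex hull is 0 there
print(f(2.0), biconjugate(2.0))   # outside the wells, hull and function coincide
```

The first pair shows the hull lying strictly below f between the wells, the second pair shows agreement where f is already locally convex.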


4. Conjugate network calculus

After the introduction of convex and concave conjugates and of the dual min–plus operations in the Legendre domain we can derive the dual operations to the network calculus concatenation theorem and output theorem. However, the prerequisite for application of the dual operations is a transformation of arrival and service curves into this domain, which will be presented first. The set of dual elements in the Legendre domain shall be denoted by the term conjugate network calculus. These dual operations are complemented by a dual approach to determine performance bounds in the Legendre domain. Each of the following sub-sections presents the dual element in the Legendre domain that corresponds to the element of network calculus presented in the pertaining sub-section of Section 2.

4.1. Arrival curves

According to Definition 3 arrival curves of leaky-bucket type are concave and defined for $t \ge 0$. We apply the concave extension described in Section 3.2 by setting $\alpha(t) = -\infty$ for $t < 0$. Strictly, the extended arrival curve does not belong to the set $\mathcal{F}$, which is usually applied by network calculus in the time domain, where $f(t) \in \mathcal{F}$ implies $f(t) = 0$ for $t < 0$. The concave extension is, however, meaningful in this context. Thus, we can derive the concave conjugate of a leaky-bucket arrival curve according to Corollary 1.

Corollary 1 (Conjugate leaky-bucket arrival curve). The concave conjugate of the leaky-bucket constraint given in Definition 3 can be computed according to Definition 8 and is given as

$$A(s) = \inf_t \,[s \cdot t - \alpha(t)] = \begin{cases} -\infty, & s < r, \\ -b, & s \ge r. \end{cases}$$

In [12] the burstiness curve is defined to be the maximal backlog at a constant rate server with rate $s$ and input $\alpha(t)$, whereby it has been pointed out in [15] that the burstiness curve is actually the Legendre transform $-\underline{\mathcal{L}}(\alpha(t))(s)$. Thus, we obtain a very clear interpretation of $A(s)$.

Generally the concave hull of an arbitrary arrival curve is a valid arrival curve, since $(\operatorname{cl}(\operatorname{conc} \alpha))(t) \ge \alpha(t)$ for all $t$. Note that the hull need not be derived explicitly in the time domain, since it follows immediately from $\underline{\mathcal{L}}((\operatorname{cl}(\operatorname{conc} \alpha))(t))(s) = \underline{\mathcal{L}}(\alpha(t))(s)$.

4.2. Service curves

The rate-latency service curve according to Definition 5 is convex and defined for $t \ge 0$. The convex extension described in Section 3.2 allows setting the curve to $+\infty$ for $t < 0$. However, in this context it is more meaningful to set the service curve to zero for $t < 0$, which results in a convex function that belongs to $\mathcal{F}$, where $f(t) \in \mathcal{F}$ implies $f(t) = 0$ for $t < 0$.

Corollary 2 (Conjugate rate-latency service curve). The convex conjugate of the rate-latency service curve given in Definition 5 can be computed based on Definition 8 and follows immediately according to

$$B(s) = \sup_t \,[s \cdot t - \beta(t)] = \begin{cases} +\infty, & s < 0, \\ s \cdot T, & 0 \le s \le R, \\ +\infty, & s > R. \end{cases}$$

The conjugate $B(s)$ of the service curve $\beta(t)$ can be interpreted as the backlog bound that holds if a constant bit rate stream with rate $s$ is input to the respective network element.

Generally the convex hull of an arbitrary service curve is a valid service curve, since $(\operatorname{cl}(\operatorname{conv} \beta))(t) \le \beta(t)$ for all $t$. Note that the hull need not be derived explicitly in the time domain, since it follows immediately from $\overline{\mathcal{L}}((\operatorname{cl}(\operatorname{conv} \beta))(t))(s) = \overline{\mathcal{L}}(\beta(t))(s)$.
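The closed forms of Corollaries 1 and 2 translate directly into code. The sketch below implements them and compares them with brute-force grid conjugates for one value of s. It assumes the leaky-bucket curve is b + r·t for t > 0 and zero at t = 0; the function names and the parameter values are hypothetical. Note that the infimum in Corollary 1 is only approached as t tends to zero from above, so the grid value differs from −b by roughly (s − r) times the grid step.

```python
import math


def leaky_bucket(b, r):
    """Arrival curve with the concave extension: -inf for t < 0."""
    def alpha(t):
        if t < 0:
            return -math.inf
        return 0.0 if t == 0 else b + r * t
    return alpha


def rate_latency(R, T):
    """Service curve R * [t - T]^+, which is zero for t < T (including t < 0)."""
    return lambda t: R * max(t - T, 0.0)


def conj_leaky_bucket(b, r, s):
    """Corollary 1: concave conjugate of the leaky-bucket arrival curve."""
    return -b if s >= r else -math.inf


def conj_rate_latency(R, T, s):
    """Corollary 2: convex conjugate of the rate-latency service curve."""
    return s * T if 0.0 <= s <= R else math.inf


b, r, R, T = 5.0, 2.0, 4.0, 1.5             # illustrative parameters
alpha, beta = leaky_bucket(b, r), rate_latency(R, T)
ts = [i * 0.01 for i in range(-200, 1001)]  # t in [-2, 10], step 0.01
s = 3.0                                     # chosen with r <= s <= R

A_grid = min(s * t - alpha(t) for t in ts)  # grid version of inf_t [s*t - alpha(t)]
B_grid = max(s * t - beta(t) for t in ts)   # grid version of sup_t [s*t - beta(t)]
print(A_grid, conj_leaky_bucket(b, r, s))
print(B_grid, conj_rate_latency(R, T, s))
```

For s below r or above R the true conjugates are infinite, which a finite grid cannot reproduce; the closed forms should be used there.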

4.3. Concatenation

The concatenation of service elements can be represented by min–plus convolution of the individual service curves according to Theorem 1. With Theorem 4 we can immediately formulate the following corollary.

Corollary 3 (Conjugate concatenation). The conjugate service curve $B(s)$ of the concatenation of $n$ service elements is given as the sum of the individual conjugate service curves $B_i(s)$ according to

$$B(s) = \sum_{i=1}^{n} B_i(s).$$

Since it is known that $\overline{\mathcal{L}}(\overline{\mathcal{L}}(\beta))(t) = \overline{\mathcal{L}}(B(s))(t) = (\operatorname{cl}(\operatorname{conv} \beta))(t) \le \beta(t)$, we find that $\overline{\mathcal{L}}(B(s))(t)$ is generally a valid service curve.

Here, we provide an example for the concatenation of rate-latency service elements. Consider $n$ service elements in series with service curves $\beta_i(t) = R_i \cdot [t - T_i]^+$ for all $t$. The corresponding conjugates are $B_i(s) = s \cdot T_i$ for $0 \le s \le R_i$ and $+\infty$ elsewhere. The sum is $B(s) = s \cdot \sum_i T_i$ for $0 \le s \le \min_i [R_i]$ and $+\infty$ elsewhere. An example for $n = 2$ is shown in Fig. 1. The result is convex, and deriving the convex conjugate of $B(s)$ we find $(\operatorname{cl}(\operatorname{conv} \beta))(t) = \overline{\mathcal{L}}(B(s))(t) = \min_i [R_i] \cdot [t - \sum_i T_i]^+$. The result is exact since $(\operatorname{cl}(\operatorname{conv} \beta))(t) = \beta(t)$, where $\beta(t) = \bigotimes_{i=1}^{n} \beta_i(t)$. The service curve that is derived here by Legendre transform is exactly the same as the one that follows by min–plus convolution in the time domain.

Fig. 1. Conjugate concatenated service curve.

4.4. Output bounds

For the output bound defined in Theorem 2 we can formulate the following corollary by applying Theorem 5.

Corollary 4 (Conjugate output bound). The conjugate output bound $A'(s)$ of a service element with conjugate service curve $B(s)$ and constrained input with conjugate input bound $A(s)$ is provided by

$$A'(s) = A(s) - B(s).$$

As stated before, $\underline{\mathcal{L}}(\underline{\mathcal{L}}(\alpha'))(t) = \underline{\mathcal{L}}(A'(s))(t) = (\operatorname{cl}(\operatorname{conc} \alpha'))(t) \ge \alpha'(t)$ holds and we find that $\underline{\mathcal{L}}(A'(s))(t)$ is generally a valid output arrival curve.

As an example consider the output bound that can be derived for a rate-latency service element with service curve $\beta(t) = R \cdot [t - T]^+$ for all $t$ and leaky-bucket constrained input with arrival curve $\alpha(t) = b + r \cdot t$ for $t > 0$, zero for $t = 0$ and $-\infty$ for $t < 0$. The respective conjugates are $B(s) = s \cdot T$ for $0 \le s \le R$ and $+\infty$ else, and $A(s) = -b$ for $s \ge r$ and $-\infty$ else. The difference is $A'(s) = -b - s \cdot T$ for $r \le s \le R$ and $-\infty$ else. The example is shown in Fig. 2. The result is concave according to Lemma 1 and the concave conjugate of $A'(s)$ becomes

$$(\operatorname{cl}(\operatorname{conc} \alpha'))(t) = \underline{\mathcal{L}}(A'(s))(t) = \begin{cases} b + r \cdot (t + T), & t \ge -T, \\ b + R \cdot (t + T), & t < -T. \end{cases}$$

Again our solution is exact since $(\operatorname{cl}(\operatorname{conc} \alpha'))(t) = \alpha'(t)$, where $\alpha'(t) = \alpha(t) \oslash \beta(t)$. The result is well known for $t \ge 0$ from min–plus de-convolution in the time domain. Furthermore, it is shown in [11] that the result of min–plus de-convolution is also valid for $t < 0$. Note that min–plus de-convolution in the time domain as well as in the Legendre domain is not closed in $\mathcal{F}$, where $f(t) \in \mathcal{F}$ implies $f(t) = 0$ for $t < 0$. In contrast, the output arrival curve $\alpha'(t)$ is strictly positive for $0 > t > -T - b/R$. The usual approach applied by network calculus in the time domain is to truncate functions for $t < 0$ implicitly by allowing only values $t \ge u \ge 0$ in min–plus convolution, respectively $t \ge 0$ and $u \ge 0$ in min–plus de-convolution. Here, we require an explicit truncation of the output arrival curve $\alpha'(t)$ for $t < 0$ before interpreting the result of the Legendre transform $\underline{\mathcal{L}}(\alpha'(t))(s)$ as a meaningful backlog bound for a virtual subsequent constant rate network node.

Fig. 2. Conjugate output arrival curve.

4.5. Performance bounds

According to Theorem 3 the maximum backlog is given as the maximum vertical deviation between arrival and service curve. The derivation of this maximum deviation can be mapped to the problem considered in the Fenchel duality theorem, which deals with the problem of finding the minimum distance between two functions $f_1$ and $f_2$, where in our case $f_1 : \mathbb{R} \to (-\infty, \infty]$ is a convex and $f_2 : \mathbb{R} \to [-\infty, \infty)$ a concave extended real-valued function.

Theorem 6 (Fenchel duality theorem). Let the functions $f_1$ and $-f_2$ be convex with $f_1, -f_2 : \mathbb{R} \to (-\infty, \infty]$. Then

$$\inf_{x \in \mathbb{R}} \{f_1(x) - f_2(x)\} = \sup_{s \in \mathbb{R}} \{F_2(s) - F_1(s)\}$$

holds, with $F_1(s)$ and $F_2(s)$ being the convex and concave conjugate of $f_1(x)$ and $f_2(x)$, respectively.

Proof. We will partly reproduce the proof given in [2], but limited to functions on $\mathbb{R}$ instead of functions on $\mathbb{R}^n$. First, we want to apply the Lagrange approach for optimization with equality constraints. Therefore we restate the problem

$$\text{minimize } f_1(x) - f_2(x) \quad \text{subject to } x \in \mathbb{R}$$

as the problem

$$\text{minimize } f_1(y) - f_2(z) \quad \text{subject to } z - y = 0$$

with $y \in \operatorname{dom}(f_1)$ and $z \in \operatorname{dom}(f_2)$. With the Lagrangian given by $L(y, z, s) = f_1(y) - f_2(z) + (z - y)s$ we obtain the dual function

$$\begin{aligned}
q(s) &= \inf_{y \in \mathbb{R},\, z \in \mathbb{R}} \{f_1(y) - f_2(z) + (z - y)s\} \\
&= \inf_{z \in \mathbb{R}} \{sz - f_2(z)\} + \inf_{y \in \mathbb{R}} \{f_1(y) - sy\} \\
&= \inf_{z \in \mathbb{R}} \{sz - f_2(z)\} - \sup_{y \in \mathbb{R}} \{sy - f_1(y)\} \\
&= F_2(s) - F_1(s),
\end{aligned}$$

where $F_1(s) = \sup_{y \in \mathbb{R}} \{sy - f_1(y)\}$ and $F_2(s) = \inf_{z \in \mathbb{R}} \{sz - f_2(z)\}$ denote the convex, respectively concave, Fenchel conjugates² of a convex function $f_1(t)$, respectively a concave function $f_2(t)$, as previously defined in Definition 8. Now, with the functions $F_1 : \mathbb{R} \to (-\infty, \infty]$ and $F_2 : \mathbb{R} \to [-\infty, \infty)$ and according to duality theory in optimization [2], we can derive the dual problem

$$\text{maximize } q(s) = F_2(s) - F_1(s) \quad \text{subject to } s \in \mathbb{R},$$

which corresponds to $\sup_{s \in \mathbb{R}} \{F_2(s) - F_1(s)\}$. $\square$

Fig. 3. Illustration of Fenchel's duality theory.
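The equality in Theorem 6 can be observed numerically for a simple pair of functions. The sketch below is a grid approximation under assumptions of our own: the quadratic test functions, the grids and the helper names are hypothetical, and because suprema and infima are replaced by grid maxima and minima the two sides agree only up to discretization error.

```python
xs = [i * 0.01 for i in range(-500, 501)]   # x in [-5, 5], step 0.01
ss = [i * 0.02 for i in range(-200, 201)]   # s in [-4, 4], step 0.02


def f1(x):
    return (x - 1.0) ** 2 + 2.0   # convex


def f2(x):
    return -x * x                 # concave


# Primal problem: minimum vertical distance between f1 and f2.
primal = min(f1(x) - f2(x) for x in xs)


def F1(s):
    """Convex conjugate of f1, approximated on the x grid."""
    return max(s * x - f1(x) for x in xs)


def F2(s):
    """Concave conjugate of f2, approximated on the x grid."""
    return min(s * x - f2(x) for x in xs)


# Dual problem: maximum difference of the conjugates.
dual = max(F2(s) - F1(s) for s in ss)
print(primal, dual)
```

For this pair the analytic optimum is 2.5, attained at x = 0.5 in the primal and at s = −1 in the dual, where the tangents to both functions are parallel.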

Fig. 3 illustrates a graphical interpretation of the Fenchel conjugates and of the Fenchel duality theorem. The convex conjugate $F_1(s)$ for a particular $s$ can be constructed by drawing the lower tangent to $f_1$ with a slope of $s$. As $-F_1(s) = -\sup_{x \in \mathbb{R}} \{sx - f_1(x)\} = \inf_{x \in \mathbb{R}} \{f_1(x) - sx\}$ holds, the ordinate of this tangent equals $-F_1(s)$. Accordingly, the negative concave conjugate $-F_2(s)$ is given where the upper tangent to $f_2$ with slope $s$ intersects the vertical axis. Fenchel's duality theorem says that the minimum vertical distance between a convex and a concave curve is equivalent to the maximum difference of the concave conjugate and the convex conjugate. The first order necessary condition for an extremum of $f_1(x) - f_2(x)$ yields $df_1(x)/dx = df_2(x)/dx$, i.e., at the minimum distance the tangents to both functions must be parallel. Consequently, the maximum vertical distance of parallel tangents to $f_1(x)$ and $f_2(x)$ at one $x^*$ corresponds to the minimum distance of the two functions.

² As $f_1$ and $-f_2$ are convex functions and $\mathbb{R}$ is a convex set with a linear constraint $z - y = 0$, there is no duality gap between the primal function $\inf_{y, z \in \mathbb{R},\, z = y} \{f_1(y) - f_2(z)\}$ and $\inf_{y \in \mathbb{R},\, z \in \mathbb{R}} \{f_1(y) - f_2(z) + (z - y)s\}$.

By applying the Fenchel duality theorem we can derive the backlog bound in the Legendre domain.

Theorem 7 (Backlog bound in the Legendre domain). The maximum backlog at the server considered in Theorem 3 is given as

$$Q = -\sup_{s \in \mathbb{R}} \{A(s) - B(s)\}.$$

Proof. The analogy to Fenchel's duality theorem becomes apparent when $f_1$ corresponds to the convex minimum service curve and $f_2$ corresponds to the concave arrival curve. Generally the concave arrival curve $\alpha$ resides above the convex function $\beta$ on a subset $U \subset \mathbb{R}$. This implies a negative difference $\beta(t) - \alpha(t)\ \forall t \in U$, such that the minimization yields the maximum vertical distance between arrival and minimum service curve. Therefore we can immediately apply Fenchel's duality theorem and obtain

$$\inf_{t \in \mathbb{R}} \{\beta(t) - \alpha(t)\} = \sup_{s \in \mathbb{R}} \{A(s) - B(s)\}. \qquad \square$$
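As a worked instance of Theorem 7, consider a leaky-bucket arrival curve and a rate-latency service curve, for which the conjugates are known in closed form (Corollaries 1 and 2) and the backlog bound should equal b + r·T. The sketch below evaluates the bound in both domains; the parameter values and names are illustrative assumptions, and the suprema are taken over finite grids.

```python
import math

b, r = 5.0, 2.0      # leaky-bucket burst size and rate (illustrative values)
R, T = 4.0, 1.5      # rate-latency service rate and latency (illustrative values)


def A(s):
    """Concave conjugate of the leaky-bucket arrival curve (Corollary 1)."""
    return -b if s >= r else -math.inf


def B(s):
    """Convex conjugate of the rate-latency service curve (Corollary 2)."""
    return s * T if 0.0 <= s <= R else math.inf


def alpha(t):
    return b + r * t                 # arrival curve for t > 0


def beta(t):
    return R * max(t - T, 0.0)       # service curve


ss = [i * 0.01 for i in range(0, 601)]    # s in [0, 6]
ts = [i * 0.01 for i in range(0, 1001)]   # t in [0, 10]

Q_legendre = -max(A(s) - B(s) for s in ss)        # Theorem 7
Q_time = max(alpha(t) - beta(t) for t in ts)      # maximum vertical deviation
print(Q_legendre, Q_time, b + r * T)
```

In the Legendre domain only the interval between r and R contributes, since outside it the difference A(s) − B(s) is −∞; the supremum is attained at s = r.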

Theorem 8 (Delay bound in the Legendre domain). An upper bound on the delay at the server considered in Theorem 3 in case of FIFO scheduling is indicated by the difference of the slopes of the tangents to $B(s)$ and $A(s)$ in the same $s^*$, which intersect the vertical axis in the same point.

Proof. We cannot easily apply Fenchel's duality theorem for the horizontal deviation of arrival and minimum service curve unless we want to express the deviation as vertical distance of the inverse functions of $\alpha$ and $\beta$. Instead, we assume that $\alpha$ and $\beta$ are closed functions and make use of their convexity and concavity properties. As a consequence, for their biconjugates $\underline{\mathcal{L}}(\underline{\mathcal{L}}(\alpha))(t) = \alpha(t)$ and $\overline{\mathcal{L}}(\overline{\mathcal{L}}(\beta))(t) = \beta(t)$ holds. Fig. 4 depicts the conjugate transforms of a concave arrival curve and a convex minimum service curve. The value of the biconjugate, for example of $\beta(t)$, is obtained from the intersection of the tangent to $B(s)$ with slope $t$ with the vertical axis, which lies at $-\beta(t)$.

The problem of determining the maximum horizontal deviation in the time domain can be stated as follows:

$$\text{maximize } d \quad \text{subject to } \alpha(t) = \beta(t + d); \quad t, d \in \mathbb{R}.$$

The Lagrangian for this problem is $L(t, s) = d + s(\alpha(t) - \beta(t + d))$ and the first order necessary conditions $\partial L / \partial t = 0$ and $\partial L / \partial s = 0$ yield

$$\partial \alpha(t) / \partial t = \partial \beta(t + d) / \partial t, \qquad \alpha(t) = \beta(t + d).$$

The equivalent condition to $\alpha(t) = \beta(t + d)$ in the Legendre domain means that the tangents to the conjugate functions must intersect the vertical axis at the same point, as illustrated in Fig. 4. The condition $\partial \alpha(t) / \partial t = \partial \beta(t + d) / \partial t$ translates to requiring tangents to the conjugate functions at the same $s$. Consequently, the points on $\alpha(t)$ and $\beta(t)$ with maximum horizontal deviation in the time domain correspond to the points on $A(s)$ and $B(s)$ in the Legendre domain, where the tangents to $A$ and $B$ at the same $s$ intersect in the same point on the vertical axis. Due to the characteristics of the Fenchel conjugates the difference of the slopes of these tangents indicates the maximum horizontal deviation in the time domain. $\square$

Fig. 4. Illustration of maximum delay in the conjugate transform domain.

We do not claim that the computation of performance bounds in the conjugate transform domain is simpler. But even if the complexity is the same in both domains, the benefit is that one can determine performance bounds once all arrival and service curves have been transformed. In summary, we see the simplification of convolution and de-convolution operations as the major driver for conjugate network calculus, while determining the performance bounds in the conjugate transform domain additionally alleviates the need to re-transform into the time domain.

5. Example applications

The application of network calculus and performance analysis in the Legendre domain will be demonstrated using the example of calculating the maximum delay and maximum backlog of Dual Leaky Bucket (DLB) constrained traffic at a rate-latency server. Furthermore, the example of service-curve-based routing following the proposal presented in [19] intends to illustrate the advantage of conjugate network calculus in terms of simplifying calculations which are more complex in the time domain.

5.1. Rate-latency performance bounds

The example of DLB-constrained traffic traversing a rate-latency server originates from the Integrated Services Model [22] and considers a variable bit rate flow, which is upper bounded by an arrival curve $\alpha(t) = \min[pt + M, \rho t + \sigma]$.

Theorem 9 (DLB rate-latency bounds in the Legendre domain). Consider a variable bit rate flow, which is upper bounded by a DLB arrival curve, served at a rate-latency server that guarantees a minimum service curve $\beta(t) = r[t - d]^+$, where $p \ge r \ge \rho$ shall hold. Then, the maximum backlog can be computed by

$$Q = \sigma + \rho d + \left[\frac{\sigma - M}{p - \rho} - d\right]^+ (\rho - r)$$

and the maximum packet delay, when being served in FIFO order, can be calculated from

$$D = \frac{M}{r} + \frac{\sigma - M}{p - \rho} \cdot \frac{p - r}{r} + d.$$

These performance bounds have been derived in [11], for example, and correspond to the maximum vertical and horizontal deviations between $\alpha$ and $\beta$, respectively (cf. Fig. 5). We will provide an alternative proof for Theorem 9 serving as sample application of Theorems 7 and 8.

Fig. 5. DLB-LR performance bounds in the time domain.
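The closed-form bounds of Theorem 9 are easy to evaluate and to cross-check against the vertical and horizontal deviations in the time domain. The sketch below does both on a grid; the function name, the parameter names and the numeric values are hypothetical, and the horizontal deviation uses the fact that the rate-latency curve reaches the level α(t) at time d + α(t)/r.

```python
def dlb_rate_latency_bounds(p, M, rho, sigma, r, d):
    """Backlog bound Q and delay bound D of Theorem 9 (requires p >= r >= rho, p > rho)."""
    if not (p >= r >= rho) or p == rho:
        raise ValueError("Theorem 9 assumes p >= r >= rho and p > rho")
    t_star = (sigma - M) / (p - rho)          # breakpoint of the DLB arrival curve
    Q = sigma + rho * d + max(t_star - d, 0.0) * (rho - r)
    D = M / r + t_star * (p - r) / r + d
    return Q, D


# Illustrative parameters: peak rate p, packet size M, sustained rate rho,
# burst size sigma, service rate r, latency d.
p, M, rho, sigma, r, d = 10.0, 1.0, 2.0, 9.0, 5.0, 0.5
Q, D = dlb_rate_latency_bounds(p, M, rho, sigma, r, d)


def alpha(t):
    return min(p * t + M, rho * t + sigma)    # DLB arrival curve for t > 0


def beta(t):
    return r * max(t - d, 0.0)                # rate-latency service curve


ts = [i * 0.01 for i in range(1, 1001)]       # t in (0, 10]
Q_time = max(alpha(t) - beta(t) for t in ts)      # maximum vertical deviation
D_time = max(d + alpha(t) / r - t for t in ts)    # maximum horizontal deviation
print(Q, Q_time, D, D_time)
```

With these values the arrival curve has its breakpoint at t = 1, which is where both deviations are attained.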

Proof of Theorem 9. In order to derive the maximum backlog bound we have to determine the minimum vertical distance between $B(s)$ and $A(s)$. Note that $A(s)$ for a concave $\alpha(t) \in \mathcal{F}$ is generally negative and so is $A(s) - B(s)$. The concave conjugate of a DLB arrival curve $\alpha(t) = \min[pt + M, \rho t + \sigma]$ and the convex conjugate of a latency-rate service curve $\beta(t) = r[t - d]^+$ are given by

$$A(s) = \begin{cases} -\infty, & s < \rho, \\ \dfrac{\sigma - M}{p - \rho}(s - \rho) - \sigma, & \rho \le s \le p, \\ -M, & s > p, \end{cases} \qquad B(s) = \begin{cases} +\infty, & s < 0, \\ sd, & 0 \le s \le r, \\ +\infty, & s > r. \end{cases}$$

As depicted in Fig. 6 the minimum vertical distance between $B(s)$ and $A(s)$ is either at $s = \rho$ in case that $d \ge (\sigma - M)/(p - \rho)$ holds, or at $s = r$ otherwise. For $s = \rho$ this distance is $\sigma + \rho d$, which corresponds to the maximum backlog of Theorem 9 for $(\sigma - M)/(p - \rho) - d \le 0$.

Fig. 6. DLB-LR performance bounds in the Legendre domain.

If $s = r$ applies, the vertical distance is given by

$$Q = rd + \sigma - \frac{\sigma - M}{p - \rho}(r - \rho) = \sigma + rd + \frac{\sigma - M}{p - \rho}(\rho - r),$$

which corresponds to the maximum backlog given in Theorem 9 with

$$\left[\frac{\sigma - M}{p - \rho} - d\right]^+ (\rho - r) = \frac{\sigma - M}{p - \rho}(\rho - r) - \rho d + rd.$$

Thus we have derived the backlog bounds from the minimum vertical distance of the conjugate arrival and service curves.

In order to derive the delay bound we need to determine the tangents to the conjugates at the same $s^*$ which intersect in the same ordinate. As illustrated in Fig. 6 such intersecting tangents can only be at $s^* = r$. The tangent to $A(s)$ has a slope of $(\sigma - M)/(p - \rho)$, the tangent to $B(s)$ a slope of $d + \sigma/r + \rho(\sigma - M)/[r(p - \rho)]$. Now the maximum delay bound can be calculated from the difference of the slopes of these tangents, where

$$\begin{aligned}
D &= d + \frac{\sigma}{r} + \frac{\sigma - M}{p - \rho}\left(\frac{\rho}{r} - 1\right) \\
&= d + \frac{\sigma(p - r) - M(\rho - r)}{r(p - \rho)} \\
&= d + \frac{M}{r} + \frac{\sigma - M}{p - \rho} \cdot \frac{p - r}{r}
\end{aligned}$$

corresponds to $D$ in Theorem 9. $\square$

5.2. Service-curve-based routing in the Legendre domain

We consider the problem of finding a feasible path in a certain network for traffic with a particular arrival curve $\alpha(t)$ and delay demand $d_{e2e,\max}$. In this context the feasibility of a path is defined as follows:

Definition 9. For traffic which is upper bounded by $\alpha(t)$ and for a given delay demand $d_{xy,\max}$ between two network nodes $x$ and $y$ the necessary condition for a series of network elements to become a feasible path for that demand is

$$\inf\{\tau \ge 0 : \alpha(t) \le \beta_{xy}(t + \tau)\ \forall t \ge 0\} \le d_{xy,\max},$$

where according to Theorem 1 the end-to-end service curve $\beta_{xy}(t)$ is computed by

$$\beta_{xy}(t) = \bigotimes_{i=x}^{y-1} \beta_{i,i+1}(t)$$

with $\beta_{i,i+1}(t)$ denoting the service curve assigned to the link between node $i$ and its link peer $i + 1$.

Initially, one path shall be given between source $x$ and destination $y$, which, for example, may result from a minimum hop computation. Fig. 7 depicts an excerpt of an example graph representing the network with a given (minimum hop) path.

Fig. 7. Local search in the neighborhood of a minimum weight path.

In case that an initially computed path is not feasible, [19] proposes to deploy tabu-search as local search heuristic in order to exploit the neighborhood of the initial path to find a feasible solution. The neighborhood of a path is given as the number of nodes adjacent to each of the nodes along the current path. A move of the tabu-search procedure means replacing one or more links of the current path by another sequence of links. After selecting the starting node of an alternative path, a minimum hop path to one downstream node of the original path shall constitute a substitute path. For example, the link from the source node to the next downstream node can be replaced by choosing the link to the right hand neighbor. The substitute path is marked by solid arrows in Fig. 7 and is denoted by $R$, while the substituted path is denoted by $S$. The service curve of the substitute path is given by $\beta_R = \bigotimes_{i \in R} \beta_i$ and the resulting end-to-end service curve is computed from

$$\beta_{xy,\mathrm{subs}} = \beta_R \otimes \bigotimes_{j \in \{x, \ldots, y-1\} \setminus S} \beta_{j,j+1} = \bigotimes_{i \in R} \beta_{i,i+1} \otimes \bigotimes_{j \in \{x, \ldots, y-1\} \setminus S} \beta_{j,j+1}.$$
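For the special case of rate-latency links and leaky-bucket traffic, the feasibility test of Definition 9 can be sketched in a few lines. By the concatenation example of Section 4.3, summing the conjugates of rate-latency links yields again a rate-latency curve with the minimum rate and the summed latency; the horizontal deviation to a leaky-bucket curve with burst b and rate r then equals the total latency plus b divided by the minimum rate, provided r does not exceed that rate. The link parameters, function names and numbers below are hypothetical and serve only as an illustration of replacing a path segment S by a substitute R.

```python
import math


def concatenate(links):
    """End-to-end (rate, latency) of rate-latency links given as (R_i, T_i) pairs.

    Adding the conjugates B_i(s) = s * T_i on [0, R_i] gives s * sum(T_i) on
    [0, min(R_i)], which is the conjugate of a rate-latency curve.
    """
    return min(R for R, _ in links), sum(T for _, T in links)


def delay_bound(b, r, links):
    """Delay bound of leaky-bucket traffic (b, r) over the concatenated links."""
    R, T = concatenate(links)
    if r > R:
        return math.inf          # sustained rate exceeds the bottleneck rate
    return T + b / R


def is_feasible(b, r, links, d_max):
    """Feasibility in the sense of Definition 9 for this special case."""
    return delay_bound(b, r, links) <= d_max


b, r, d_max = 4.0, 2.0, 1.5                         # traffic and delay demand
original = [(10.0, 0.2), (4.0, 0.5), (10.0, 0.2)]   # initial path
# Substitute the slow middle link (S) by two faster links (R).
substitute = [original[0], (8.0, 0.2), (8.0, 0.2), original[2]]

print(delay_bound(b, r, original), is_feasible(b, r, original, d_max))
print(delay_bound(b, r, substitute), is_feasible(b, r, substitute, d_max))
```

The substitute path has one more hop and the same end-to-end latency budget per link, yet becomes feasible because the bottleneck rate doubles; this kind of trade-off is what a local search over substitute paths explores.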


A procedure COMPUTEROUTE to find a feasible path is depicted in Fig. 8 using pseudo-code notation [4]. The parameter $p[\cdot]$ is a vector denoting the sequence of nodes traversed by the initially configured path; the parameters $\alpha$ and $d_{xy,\max}$ represent the arrival curve and delay demand of the traffic to be routed, respectively. The two-dimensional array $N_p[\cdot, \cdot]$ describes the neighborhood of the path $p[\cdot]$, where for each node $p[i]$ the array element $N_p[i, j]$ with $j \in J$ indicates one direct neighbor³ and $J$ denotes the number of neighbors. This neighborhood is always adapted to the currently configured path (cf. line 3). The procedure checks all substitute paths, which are determined by a sub procedure FIND-SUBSTITUTE, beginning at node $p[i]$ for all nodes $i \in I$ along $p[\cdot]$. If changing the route to include the substitute path $R$ does not lead to a tabu-move and if the service curve $\beta_{xy,\mathrm{subs}}$ along the new path results in a lower delay for traffic constrained by $\alpha$, then the new path is set as best currently found path $p_{\mathrm{best}}[\cdot]$ and the pertaining service curve is stored as $\beta_{\mathrm{best}}$ (lines 4–13). A tabu-move is a move which is stored in the so-called tabu-list. The tabu-list stores the recent history of moves and comes into play when attempting to leave a local extremum. The identified next moves in terms of the substitute path are stored in the tabu-list (cf. line 13). For further details on tabu-search we refer to [9].

³ A node with direct link to $p[i]$.

Fig. 8. Algorithm to find a new feasible path.

The computational complexity of this procedure mainly stems from the frequently invoked min–plus convolution in line 10 and from the infimum operation in the while condition in line 2. In the Legendre domain the min–plus convolution is transformed to a simple addition of the convex conjugates $B_{k,k+1}(s)$ such that line 10 of the procedure transforms to

$$B_{xy,\mathrm{subs}}(s) \leftarrow B_R(s) + \sum_{k \in \{x, \ldots, y-1\} \setminus S} B_{k,k+1}(s)$$

with $B_R(s) = \sum_{l \in R} B_{l,l+1}(s)$, where the pointwise addition of the convex conjugates can be considered significantly less complex than the min–plus convolution operation, which mainly consists of an infimum operation.

The feasibility criterion of Definition 9 transforms to requiring that the maximum difference between the slopes of the tangents to $A(s)$ and $B_{xy}(s)$, which intersect in one ordinate, be less than $d_{xy,\max}$. However, if we consider a particular arrival curve, for example a leaky-bucket constrained arrival process, then we can derive more convenient feasibility criteria. If $\alpha$ corresponds to a leaky bucket, then according to Corollary 1 the concave conjugate is given by $A(s) = -b\ \forall s \ge r$ and $A(s) = -\infty\ \forall s < r$. In this case the tangent to $A(s)$ intersects the y-axis in $-b$ and the feasibility criterion is that the function $T(s) = s\, d_{xy,\max} - b$ must reside above $B_{xy}(s)$ or be a tangent to $B_{xy}(s)$ for one $s^* \ge r$. Fig. 9 illustrates a graphical explanation of this criterion. With a horizontal tangent to $A(s)$ the slope of the tangent to $B(s)$ which intersects in the same ordinate must not exceed $d_{xy,\max}$. Consequently, all feasible service curve conjugates $B(s)$ must have a tangent with a smaller slope, i.e., either $B(s)$ resides below the line $T(s) = s\, d_{xy,\max} - b$ or $T(s)$ is tangent to $B(s)$. The condition "for at least one $s^* \ge r$" ensures that the common $s^*$, where both tangents touch $A(s)$ and $B(s)$ respectively (cf. Theorem 8), is in a subset of $\mathbb{R}$ where $A(s) \ne -\infty$.

Fig. 9. Path feasibility criterion in Legendre domain for leaky-bucket constrained traffic.

6. Conclusions

In this paper we have shown that the Legendre transform provides a dual domain for analysis of data networks applying network calculus. Our work is a significant extension of the known analogy of system theory and network calculus, where the Legendre transform corresponds to the Fourier transform in system theory. In particular, we have proven that min–plus convolution and min–plus de-convolution correspond to addition, respectively subtraction, in the Legendre domain, which allows for an efficient analysis and fast computation. Furthermore, we have derived bounds on backlog and delay in the Legendre domain using the conjugate arrival and service curves. Example applications of this conjugate network calculus demonstrate the characteristic properties of network calculus and performance analysis in the Legendre domain.

Acknowledgements

This work was supported in part by an Emmy Noether grant of the German Research Foundation (DFG) and in part by the Centre for Quantifiable Quality of Service in Communication Systems (Q2S). The Q2S Centre of Excellence is appointed by the Research Council of Norway and funded by the Research Council, NTNU and UNINETT.

References

[1] F. Baccelli, G. Cohen, G.J. Olsder, J.-P. Quadrat, Synchronization and Linearity. An Algebra for Discrete Event Systems, Wiley, 1992.
[2] D. Bertsekas, A. Nedic, A. Ozdaglar, Convex Analysis and Optimization, Athena Scientific, Massachusetts, USA, 2003.
[3] C.-S. Chang, Performance guarantees in communication networks, TNCS, Springer, 2000.
[4] T. Cormen, C. Leiserson, R. Rivest, Introduction to Algorithms, MIT Press, 1990.
[5] R.L. Cruz, A Calculus for Network Delay, Part I: Network Elements in Isolation, IEEE Transactions on Information Theory 37 (1) (1991) 114–131.
[6] R.L. Cruz, A Calculus for Network Delay, Part II: Network Analysis, IEEE Transactions on Information Theory 37 (1) (1991) 132–141.
[7] L. Dorst, R. van den Boomgard, Morphological Signal Processing and the Slope Transform, Elsevier Signal Processing 38 (1) (1994) 79–98.
[8] M. Fidler, S. Recker, A Dual Approach to Network Calculus Applying the Legendre Transform, Proceedings of QoS-IP (2005) 33–48.
[9] F. Glover, M. Laguna, Tabu search, in: C. Reeves (Ed.), Modern Heuristic Techniques for Combinatorial Problems, Blackwell Scientific Publishing, Oxford, England, 1993.
[10] T. Hisakado, K. Okumura, V. Vukadinovic, L. Trajkovic, Characterization of a simple communication network using Legendre transform, Proceedings of IEEE International Symposium on Circuits and Systems (2003) 738–741.
[11] J.-Y. Le Boudec, P. Thiran, Network calculus: a theory of deterministic queuing systems for the internet, LNCS 2050, Springer, 2002.
[12] S. Low, P. Varaiya, Burst reducing servers in ATM networks, Queueing Systems, Theory and Applications 20 (1–2) (1995) 61–84.
[13] Y. Lucet, A fast computational algorithm for the Legendre–Fenchel transform, Computational Optimization and Application (1) (1996) 27–57.
[14] R. Maragos, Slope Transform: Theory and Application to Nonlinear Signal Processing, IEEE Transactions on Signal Processing 43 (4) (1995) 864–877.
[15] J. Naudts, Towards real-time measurement of traffic control parameters, Elsevier Computer Networks 34 (1) (2000) 157–167.
[16] K. Pandit, J. Schmitt, C. Kirchner, R. Steinmetz, Optimal allocation of service curves by exploiting properties of the min–plus convolution, Technical Report, TR-KOM-200408, Technical University Darmstadt, 2004.
[17] A.K. Parekh, R.G. Gallager, A generalized processor sharing approach to flow control in integrated services networks: the single-node case, IEEE/ACM Transactions on Networking 1 (3) (1993) 344–357.
[18] A.K. Parekh, R.G. Gallager, A generalized processor sharing approach to flow control in integrated services networks: the multiple-node case, IEEE/ACM Transactions on Networking 2 (2) (1994) 137–150.
[19] S. Recker, Service curve based routing subject to deterministic QoS constraints, in: B. Gavish, P. Lorenz (Eds.), Telecommunications Systems, vol. 24, Kluwer Academic Publishers, Boston, MA, 2003, pp. 385–413.
[20] S. Recker, M. Fidler, Network calculus and conjugate duality in network performance analysis, in: Proceedings of ITC 19, 2005.
[21] R.T. Rockafellar, Convex Analysis, Princeton, 1972.
[22] S. Shenker et al., Specification of guaranteed quality of service, RFC 2212, Internet Engineering Task Force, 1997.

M. Fidler, S. Recker / Computer Networks 50 (2006) 1026–1039

Markus Fidler received his Dipl. Ing. in Electrical Engineering from Aachen University (Germany) in 1997 and his Dipl. Kfm. in business economics from the University of Hagen in 2001. He was with Hagenuk Telecom and Alcatel Research and Development from 1997 to 2000, where he worked in the field of cellular mobile communications and in particular on the General Packet Radio Service (GPRS). In 2001 he joined the Department of Computer Science at Aachen University, where he received his Dr. Ing. at the end of 2003. Currently he is with the Centre for Quantifiable Quality of Service in Communication Systems at NTNU Trondheim, Norway. His research interests are in Traffic Engineering, Quality of Service, and Network Calculus.


Stephan Recker received a diploma degree in Electrical and Electronic Engineering from the Technical University of Aachen, Germany, and a Ph.D. degree in Information and Communications Engineering from the University of Duisburg-Essen, Germany, in 1997 and 2004, respectively. Since 1997 he has been a researcher at IMST GmbH, a medium-sized research company specializing in radio communication systems, where he is responsible for research activities in the area of resource management for packet switched networks and for radio networking systems.

Computer Networks 50 (2006) 1040–1058 www.elsevier.com/locate/comnet

Comparison and analysis of the revenue-based adaptive queuing models

Alexander Sayenko *, Timo Hämäläinen, Jyrki Joutsensalo, Lari Kannisto

University of Jyväskylä, MIT Department, P.O. Box 35, Mattilaniemi 2 (Agora), 40014 Jyväskylä, Finland

Available online 5 October 2005

Abstract

This paper presents several adaptive resource sharing models that use a revenue criterion to allocate bandwidth in an optimal way. The models ensure QoS requirements of data flows and, at the same time, maximize the total revenue by adjusting parameters of the underlying schedulers. Besides, the adaptive models eliminate the need to find the optimal static weight values because they are calculated dynamically. The simulation consists of several cases that analyse the models and the way they provide the required QoS guarantees. The simulation reveals that the installation of the adaptive model increases the total revenue and ensures the QoS requirements for all service classes. The paper also presents how the adaptive models can be integrated with the IntServ and DiffServ QoS frameworks. © 2005 Elsevier B.V. All rights reserved.

Keywords: QoS; WRR; DRR; WFQ; Adaptive scheduling

* Corresponding author. Tel.: +358 14 260 3243. E-mail addresses: [email protected].fi (A. Sayenko), timoh@cc.jyu.fi (T. Hämäläinen), [email protected].fi (J. Joutsensalo), lari.kannisto@jyu.fi (L. Kannisto).

1389-1286/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2005.09.002

A. Sayenko et al. / Computer Networks 50 (2006) 1040–1058

1. Introduction

The current development of communication networks can be characterized by two important factors: (a) the growth of the number of mobile users and (b) the tremendous growth of new services in wired networks. While at the moment most mobile users access services provided by mobile operators, it is possible to predict that they will be eager to use services offered by wired networks. Furthermore, as the throughput of wireless channels grows, more users will access them. In this framework, it is important that users obtain the required end-to-end guarantees, which are referred to collectively as the Quality-of-Service (QoS) requirements.

The provision of QoS implies that appropriate arrangements are made along the packet path, which comprises network access points, one or more core networks, and the interconnections between them. As a packet moves from a source to a destination point, it spends most of the time in the core networks. As a result, the efficient provision of resources in the core networks is an essential part of QoS.

IETF has proposed several architectures to realize QoS in packet networks. While Integrated Services (IntServ) [7] rely upon the per-flow approach, Differentiated Services (DiffServ) [6] perform the allocation of resources on a per-class basis, thus providing more scalable solutions. However, the presence of services such as VoIP and video conferencing imposes additional constraints on the DiffServ framework.

As the output bandwidth is allotted for each service class by the queuing policy in routers along a path, the provided QoS guarantees depend on a scheduler and its parameters. In most cases, a static configuration is used, which makes the scheduler unresponsive to the number of flows within each traffic class. As a result, a service provider has to overprovision its network with resources to meet all the QoS requirements, regardless of the current number of active flows. Such an approach results in inefficient allocation of resources. Though it may not be a significant issue for wired providers, it is very critical for the core wireless network, in which resources should be allocated in an optimal way.

The problem of effective allocation of the network resources can be solved if a router exploits adaptive service weight assignment. This means that the minimum departure rate should be adjusted dynamically to reflect changes in the bandwidth requirements of service classes. Obviously, this requires the DiffServ framework to track the number of active data flows within each service class. One possibility is to use the Resource Reservation Protocol (RSVP) from the IntServ framework, as considered in [3,5]. Other proprietary solutions can also be used.

This paper presents and analyses several resource sharing models that ensure the QoS requirements and maximize a service provider's revenue. The objectives of the models are somewhat similar to those considered in [13]: (a) to share bandwidth between various service classes with the required level of QoS and (b) to distribute free resources in a predictable and controlled way. However, we propose a more rigorous bandwidth allocation. While the QoS requirements determine the minimal required amount of resources, the prices of network services can control the allocation of free bandwidth. Intuitively, it is worth providing more bandwidth for those classes for which end-users are willing to pay more. Furthermore, such an approach may interest wireless providers as they charge for all kinds of services that are accessed from mobile devices. Thus, the goal of the proposed models is to increase the total revenue by allocating free resources to certain service classes and reducing bandwidth previously assigned to the other ones.

This paper extends our previous research work, in which we have presented several adaptive


resource sharing models [30,28,29]. The adaptive models are optimization tasks that calculate the optimal parameters of a scheduler based on the QoS requirements of data flows and the pricing information. As we present in this article, these models solve several important tasks simultaneously: (a) control whether a router has enough resources, (b) ensure the QoS guarantees, and (c) calculate the optimal configuration to increase the total revenue. Though the general idea of each model is the same, different underlying schedulers are used. Since each scheduler has its advantages and drawbacks, it is a challenging task to analyse which adaptive model gives the best results. The analysis is done by comparing the analytical models and by extensive simulations. One simulation case considers the behaviour of the adaptive models in the IntServ framework.

The rest of this paper is organized as follows. Section 2 provides a brief overview of related research work. Section 3 considers the fundamental queuing disciplines used to provide QoS. Section 4 presents the proposed adaptive models. Section 5 presents several simulation scenarios that are used in analysing the models. This section also considers the simulation results. Finally, Section 6 summarizes the obtained results and sketches future research directions.

2. Related study

The problem of efficient allocation of resources has recently gained significant attention, including the creation of new scheduling policies and combinations of existing ones. Another direction is the dynamic adaptation of parameters to varying network environments. Here, we present a brief overview of studies devoted to the adaptive allocation of resources in different queuing disciplines.

The problem of adjusting weights of the Weighted Round Robin (WRR) policy to support the premium service has been considered in [36]. It is proposed to allocate resources according to the dynamics of the average queue size, which is calculated using a low-pass filter. If the average queue size of the Expedited Forwarding (EF) aggregate increases, then the processing resources are taken from the Assured Forwarding (AF) and Best Effort (BE) aggregates; otherwise they are returned to the AF class. However, it was not considered how the resources should be allocated for the AF aggregate to ensure all QoS guarantees.


A similar kind of algorithm, in which the state of the queues is used to adapt weights, has been proposed in [20]. It relies upon the Weighted Fair Queuing (WFQ) policy and makes dynamic assignment of weights based on the usage of queues. Unfortunately, this algorithm was considered only for the IntServ framework. It might be necessary to refine it for the DiffServ architecture, in which, for instance, short physical queues are built and, at the same time, a significant amount of resources is allocated for the EF aggregate.

In [38], a modified WRR scheme, which is called Fair WRR (FWRR), has been proposed to protect BE traffic from AF out-of-profile packets in the core routers. This policy dynamically adjusts the service weights and buffer allocations by using congestion hints in order to avoid unfair bandwidth sharing. However, the EF traffic aggregate was not considered and no recommendations were provided on how to reserve resources for this aggregate.

In [21], the Variable WRR (VWRR) policing model was introduced that uses the average packet length to adapt weights in the WRR policy. A weight value, which is based on bandwidth requirements, is corrected using the average length of packets. However, the simulation presented in that paper only includes the general case. Neither the DiffServ architecture nor the settings to implement PHBs with the proposed policy were considered. Moreover, it is not clear how to choose the base value of a weight.

In [23], the effective bandwidth is used to adjust bandwidth allocation in the DiffServ framework. The proposed measurement-based adaptive scheme, which is referred to as Dynamic WRR (DWRR), either increases or decreases the bandwidth based on the values of the estimated bandwidth, multiplexing gain factor, and the measured loss ratios. The proposed scheme might work well for the AF traffic aggregates, but it fails to take into account the delay requirements of the EF class. Besides, the article does not clarify how bandwidth allocations are translated into the weight values of the WRR scheduler.

In [24], a configuration scheme has been considered that guarantees maximum revenue for the service provider while keeping the utilization high. The proposed scheme, which is based on the WFQ scheduler, selects those traffic flows that maximize the benefit. However, the study has focused only on the best effort service class. Neither EF nor the other behaviour aggregates have been considered.

3. Queuing disciplines

The choice of an appropriate queuing discipline is the key to providing QoS because it is the basis for allocating resources [39]. Though it is possible to share resources using buffer management mechanisms [16], it is not as efficient as the usage of schedulers or the combination of both approaches. The most popular and fundamental queuing disciplines are First-Come-First-Served (FCFS), Priority Queue (PQ) [22], WRR [17] and WFQ [26].

FCFS determines the service order of packets strictly based on their arrival order. Therefore, this policy cannot perform the necessary bandwidth allocation and provide the required QoS. The PQ policy absolutely prefers classes with higher priority and, therefore, packets of a higher priority queue are always served first. Thus, if a higher priority queue is always full, then the lower priority queues are never served. This problem can be eliminated by using WRR, in which queues of all service classes are served in a round robin manner. However, if some queue has a larger average packet size than the other queues, it implicitly receives more bandwidth. This disadvantage was overcome with the WFQ technique, which schedules packets according to their arrival time, size, and the associated weight.

Though WFQ provides a way to specify precisely the amount of output bandwidth for each traffic class, it is complicated to implement. From this viewpoint, WRR does not introduce any computational complexity, but it fails to take the packet size into account. The Deficit Round Robin (DRR) policy [32] came as a tradeoff between the implementation complexity and the precise allocation of resources. So, depending on the requirements and the network equipment available, a network provider uses the appropriate queuing disciplines [35].

4. Adaptive models

In order to calculate the optimal values of weights, different criteria can be exploited, e.g., the mean packet size, queue size, and packet loss. Here, we propose to use the prices of network services as the main criterion for allocation of network resources. Nowadays, several charging models are used in pricing the Internet services [27], among which flat charging and usage charging are the most popular. Flat pricing implies that a customer only pays the joining fee and has unlimited access to the network resources. However, this strategy does


not usually guarantee any QoS. Furthermore, flat charging is not suitable for maximizing revenue, as it charges all customers equally and fails to take into account the amount of data transferred or the time a service is provided. Thus, usage pricing, which is based on the amount of resources used or reserved by a customer, could be used. Experiments have shown that usage pricing is a fair way to charge customers and to allocate network resources [1]. Internet services usually use volume-based rather than time-based charging because the former reflects the duration of a connection and access speeds.

4.1. Weighted Fair Queuing

4.1.1. WFQ scheduler

Weighted Fair Queuing (WFQ) [26] schedules packets according to their arrival time, size, and the associated weight. Upon arrival of a new packet, the virtual time is calculated and the packet is scheduled for departure in the right order with respect to the other packets [11]. Such an approach enables the sharing of resources between traffic aggregates in a fair and predictable way. Furthermore, it is possible to estimate the bandwidth allocation and the worst-case delay performance, which makes the use of the WFQ policy very attractive for the provision of QoS. However, this policy is not deployed widely in high-speed routers due to its implementation complexity and computational cost.

4.1.2. QoS requirements

Each service class can have an associated weight that specifies the allocated bandwidth. Suppose $B$ is the total throughput of an output link on which a router implements the WFQ service discipline. If all sessions of the WFQ scheduler are active, then each class receives a portion of the total bandwidth, which is determined by its weight $w_i$ and is equal to $w_i B$. Hence, to simplify the expressions, we assume that it holds for all weights $w_i$ that

\[ \sum_i w_i = 1, \quad w_i \in (0, 1). \tag{1} \]

If there are $N_i$ active flows within the $i$th class, then each flow has bandwidth that can be approximated by¹

\[ B_i^f = \frac{w_i B}{N_i}. \tag{2} \]

¹ In the case of WFQ, the fairness between data flows within a service class can be achieved by per-flow buffer management.

$B_i^f$ can be treated as one of the QoS parameters that specifies the required bandwidth of a flow belonging to the $i$th service class. Thus, the minimum value of the weight, which provides the necessary amount of bandwidth for every flow, can be given by

\[ w_i \ge N_i \frac{B_i^f}{B}. \tag{3} \]

The inequality in (3) states that a provider can allocate more resources than necessary. Indeed, if the network has free bandwidth resources, then a provider can allocate, either explicitly or implicitly, more bandwidth to a service class. It is often the case that instead of specifying the requirements for a single traffic stream it is necessary to allocate resources for the whole class without taking the number of active flows into account. For instance, suppose a provider wants to allocate a minimum amount of bandwidth for the best-effort class. For these purposes, a modified version of (3) is proposed:

\[ w_i \ge \frac{B_i}{B}. \tag{4} \]

Here $B_i$ specifies the minimum amount of bandwidth resources for the whole $i$th class. Depending on the resource allocation strategy, a provider uses either (3) or (4), or both. In the latter case, it is possible to reserve a certain minimum amount of bandwidth regardless of the number of active flows.

Due to the buffering, scheduling, and transmission of packets, the size of router queues varies all the time. In turn, the length of a queue in a routing node has an impact on the queuing delay and on the overall end-to-end delay of a packet. It can be shown that under the WFQ policy the worst-case queuing delay is given by the following expression, where $L^{\max}$ denotes the maximum packet size:

\[ D = \frac{\sigma}{\rho} + \frac{L^{\max}}{B}. \tag{5} \]

In (5), it has been assumed that each incoming flow is regulated by the Token Bucket scheme [9] with the bucket depth $\sigma$ and token rate $\rho$. Parameters $\sigma$ and $\rho$ can be viewed as the maximum burst size and the long-term bounding rate, respectively. Since $\rho$ is a long-term rate, it is reasonable to assume that it is equal to the bandwidth allocated for a distinct flow in a service class. As mentioned earlier, it is equal to $w_i B / N_i$. Thus, (5) can be given in the following way, where $D_i$ is the worst-case delay of a packet in the $i$th class:

\[ D_i = \frac{N_i \sigma}{w_i B} + \frac{L^{\max}}{B}. \tag{6} \]

Since there is a need to know the value of $w_i$ under which the required queuing delay can be guaranteed, it is possible to use (6) to obtain it:

\[ w_i \ge \frac{N_i \sigma}{B D_i - L^{\max}}. \tag{7} \]
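As a numerical illustration of (3), (6), and (7), the following minimal sketch computes the smallest admissible WFQ weight for one class. The function names and all numeric values are assumptions chosen for the example; they do not come from the paper.

```python
def wfq_min_weight(n_flows, flow_rate, link_rate, burst=None, delay=None, l_max=0.0):
    """Smallest weight satisfying the bandwidth bound (3) and, if given, the delay bound (7)."""
    weight = n_flows * flow_rate / link_rate            # bound (3)
    if burst is not None and delay is not None:
        slack = link_rate * delay - l_max               # B * D_i - L^max
        if slack <= 0:
            raise ValueError("delay target is below the transmission time of one packet")
        weight = max(weight, n_flows * burst / slack)   # bound (7)
    return weight


def wfq_delay(weight, n_flows, burst, link_rate, l_max):
    """Worst-case queuing delay (6) for a class served with the given weight."""
    return n_flows * burst / (weight * link_rate) + l_max / link_rate


# Hypothetical class: 20 flows of 64 kbit/s on a 10 Mbit/s link,
# 4000-bit bursts, 12000-bit maximum packet, 20 ms delay target.
w_bw = wfq_min_weight(20, 64e3, 10e6)
w_all = wfq_min_weight(20, 64e3, 10e6, burst=4000, delay=0.02, l_max=12000)
print(w_bw, w_all, wfq_delay(w_all, 20, 4000, 10e6, 12000))
```

In this example the delay bound is the stricter one: it asks for a weight of about 0.43, whereas the bandwidth bound alone asks for 0.128.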

The right-hand side of (7) is the minimum weight that is necessary to guarantee the delay requirements. Clearly, the more active flows there are and the smaller the required delay is, the larger the portion of resources that must be allocated.

4.1.3. Pricing criterion

To charge customers, a provider uses a pricing function, which can be chosen in the following form. If the fluid model is taken into consideration and if all sessions of the WFQ scheduler are active, then the expression $w_i B$ approximates the amount of the $i$th service class data a scheduler outputs during a time unit. Suppose $C_i()$ specifies the price for one data unit; then the instantaneous revenue for the $i$th class is given by

\[ r(w_i) = C_i() w_i B. \tag{8} \]

Since resources are shared between several classes, the overall instantaneous revenue can be written in the form:

\[ r(w_1, \ldots, w_m) = \sum_{i=1}^{m} C_i() w_i B \quad [\text{monetary units/second}]. \tag{9} \]

It is interesting to note that the proposed function does not depend on the number of active flows. Indeed, if a provider has switching equipment with a certain capacity, then it does not matter how many flows there are. The more data streams that are aggregated within a class, the less bandwidth each stream has. The total amount of data capable of being transferred over a period of time remains the same (which is true if all flows send data continuously and use all resources). Thus, by manipulating the weights $w_i$, different instantaneous revenue can be obtained, which affects the total revenue.

4.1.4. General model

The adaptive model for the WFQ scheduler consists of the pricing function (9), which will henceforth be referred to as the target function, and a set of constraints (1), (3), and (7). It should be noted that the target function can be simplified by removing the constant component $B$. Furthermore, it is proposed to add a new parameter $\gamma_i$. Its purpose is to disable or enable the allocation of excess resources for the $i$th service class. Suppose there is a service class consisting of applications that generate constant rate data streams. Although it may be the most expensive class, all the excess resources allotted to it will be shared among the other classes, because the constant rate sources will not increase their transmission rates. Therefore, the allocation of the excess resources can be disabled by setting $\gamma_i = 0$. If more bandwidth is allocated for a service class that consists predominantly of TCP flows, then the applications will increase their window sizes and, as a result, their transmission rates. Thus, it makes sense to set $\gamma_i = 1$:

\[ \max \left\{ \sum_{i=1}^{m} \gamma_i C_i() w_i \right\} \tag{10} \]

subject to

\[ \sum_{i=1}^{m} w_i = 1, \quad w_i \in (0, 1), \]

\[ w_i \ge \max \left\{ \frac{N_i B_i^f}{B}, \; \frac{N_i \sigma}{B D_i - L^{\max}} \right\}, \quad \forall i = \overline{1, m}. \]
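Because model (10) has a single equality constraint plus per-class lower bounds, its optimum can also be found without a general LP solver: give every class its minimum weight and hand all spare capacity to the class with the largest $\gamma_i C_i()$. The sketch below uses this shortcut instead of a general-purpose LP method; the function name and the numbers are assumptions made for illustration.

```python
def solve_wfq_model(prices, gammas, min_weights):
    """Maximize sum(gamma_i * C_i * w_i) subject to sum(w_i) = 1 and w_i >= min_weights[i].

    min_weights[i] is the right-hand side of the last constraint in (10).
    """
    spare = 1.0 - sum(min_weights)
    if spare < 0:
        # The router does not have enough resources for the requested QoS.
        raise ValueError("minimum weights exceed the link capacity")
    gains = [g * c for g, c in zip(gammas, prices)]
    best = max(range(len(gains)), key=gains.__getitem__)
    weights = list(min_weights)
    weights[best] += spare      # all excess goes to the most profitable enabled class
    return weights


# Hypothetical example: class 0 is the most expensive one but has gamma = 0
# (constant-rate sources), so the excess goes to class 1.
weights = solve_wfq_model(prices=[3, 2, 1], gammas=[0, 1, 1], min_weights=[0.3, 0.2, 0.1])
print(weights)
```

The `ValueError` branch corresponds to the case in which the constraints cannot be met, that is, the router lacks the resources for the requested guarantees.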

The model presented in (10) is a linear optimization problem in which the weights $w_i$ are the optimization variables. Solving this task yields the optimal weight values. One of the methods that can be used to calculate the optimal values of the $w_i$ coefficients is the simplex algorithm [33]. If a provider has to ensure only bandwidth guarantees, then the delay term can be removed from the constraint, but the general model remains the same.

This adaptive model can be used with other FQ schedulers, such as Worst-case Weighted Fair Queuing [2], Self-Clocked Fair Queuing [14], and Start-time Fair Queuing [15]. Only minor corrections of the delay constraint will be necessary.

4.2. Weighted Round Robin

4.2.1. WRR scheduler

The WRR scheduler works in a cyclic manner, serving the input queues in sequence. During a cycle, a certain number of packets, determined by the associated weight, are sent from each queue. If a queue has fewer packets than the value of the weight, the WRR scheduler begins to serve the next queue. The WRR scheduler does not take the size of transmitted packets into account. As a result, it is difficult to predict the actual bandwidth that each queue gets. In other words, it is difficult to use only the weight values as the means to specify the amount of the output bandwidth.

Suppose $w_i$ is the value of the weight associated with the $i$th queue. If $L_i$ is the mean packet size of the $i$th input queue, then $w_i L_i$ bits of data are sent during each cycle on average. If there are $m$ input queues, then it is easy to show that the average amount of data transmitted from all queues during one cycle can be approximated by the following expression:

\[ \sum_{i=1}^{m} w_i L_i. \tag{11} \]

Taking the mean packet size and weight values of all queues into account, it is possible to approximate the amount of the output bandwidth for the given $k$th queue:

\[ \frac{w_k L_k}{\sum_i w_i L_i} B, \quad k \in [1, m], \tag{12} \]

where $B$ specifies the output bandwidth of an interface on which a router implements WRR.

4.2.2. QoS requirements

Assume that each service class is associated with a queue of the WRR scheduler. Then, (12) approximates the bandwidth allocated for the whole class. However, if class $k$ contains $N_k$ active data flows, then each data stream obtains the bandwidth that can be expressed as follows²:

\[ B_k^f = \frac{w_k L_k}{N_k \sum_i w_i L_i} B. \tag{13} \]

² The fairness among data flows within a service class can be achieved by the Stochastic Fair Queuing (SFQ) [25] combined with the WRR scheduler and/or by the per-flow buffer management.

Parameter $B_k^f$ can be understood as one of the QoS parameters that specifies the bandwidth that should be provided for each data flow in the $k$th traffic class. Thus, if $B_k^f$ is given, then a router must allocate a certain minimum amount of resources to satisfy the QoS requirements of all data streams. Based on this, it is possible to rewrite (13) in the following form:

\[ \frac{w_k L_k}{N_k \sum_i w_i L_i} B \ge B_k^f. \tag{14} \]

While the right side of this inequality specifies the minimum amount of resources to be allocated, the left side specifies the amount of provided resources. Since the weights of the WRR scheduler control the allocation of resources, the task is to find such values of $w_k$ that provide enough bandwidth resources. It is possible to use (14) to determine $w_k$:

\[ w_k \ge \frac{B_k^f N_k}{L_k (B - B_k^f N_k)} \sum_{\substack{i=1 \\ i \ne k}}^{m} w_i L_i. \tag{15} \]

As in the case of WFQ, sometimes there is a need to allocate resources on a per-class basis, regardless of the number of flows and their requirements. For instance, the best-effort class has no requirements at all, but resources should be provided for this class as well. For these purposes, it is possible to modify (15) as follows:

\[ w_k \ge \frac{B_k}{L_k (B - B_k)} \sum_{\substack{i=1 \\ i \ne k}}^{m} w_i L_i. \tag{16} \]

Here $B_k$ stands for the bandwidth requirements of the whole service class. Depending on the resource allocation strategy, a service provider chooses either (15) or (16).

Along with bandwidth requirements, a certain service class must be provided with the delay guarantees. Since the WRR scheduler serves input queues in a cyclic manner, the processing of a packet can be delayed by

\[ \left( \sum_i w_i L_i^{\max} \right) / B \]

seconds. To decrease this delay, it is possible to introduce the Low Latency Queue (LLQ) that can work in two modes [35]: strict priority mode and alternate priority mode. In the strict priority mode, the WRR scheduler always outputs packets from LLQ first. However, it is difficult to predict the allocation of bandwidth for other queues in this case. Thus, the alternate priority mode will be considered, in which LLQ is served in between queues of the other service classes. For instance, if there are 3 input queues, numbered from 1 to 3, and queue 1 is LLQ, then the queues are served in the following order: 1–2–1–3–…. In this case, the processing of a packet in LLQ can be delayed by

\[ \max_i \{ w_i L_i^{\max} \} / B \]

seconds. As in the WRR scheduler, each queue is allowed to transmit no more than $w_i$ packets during a round.

If a router implements LLQ, it is necessary to reformulate the above equations (11)–(15). Suppose LLQ is identified by index $l$, where $l \in [1; m]$. Then, it is possible to approximate the amount of data that the WRR scheduler outputs in a round:

\[ (m - 1) w_l L_l + \sum_{\substack{i=1 \\ i \ne l}}^{m} w_i L_i. \tag{17} \]
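The approximations (12) and (17) are easy to evaluate numerically. The following sketch is illustrative only: the function name, the zero-based queue indices, and the numbers are assumptions. It shows how a queue with larger packets obtains more bandwidth than its weight suggests, and how the alternate priority LLQ mode changes the shares.

```python
def wrr_shares(weights, mean_sizes, link_rate, llq=None):
    """Approximate per-queue bandwidth under WRR, following (12).

    If llq is the index of the Low Latency Queue, that queue is served
    (m - 1) times per round, so the round size follows (17).
    """
    m = len(weights)
    per_round = [w * size for w, size in zip(weights, mean_sizes)]
    if llq is not None:
        per_round[llq] *= m - 1
    total = sum(per_round)
    return [link_rate * data / total for data in per_round]


# Hypothetical queues: weights 2, 1, 1 and mean packet sizes 500, 1500, 1000.
plain = wrr_shares([2, 1, 1], [500, 1500, 1000], link_rate=7e6)
with_llq = wrr_shares([2, 1, 1], [500, 1500, 1000], link_rate=9e6, llq=0)
print(plain)
print(with_llq)
```

In the plain mode the second queue receives the largest share although its weight is half that of the first queue, which is the packet-size effect described in Section 4.2.1.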

Taking account of the presented considerations (12)–(15), it is possible to derive an expression for the minimum weight values that satisfy all the bandwidth requirements of each service class:

\[ w_k \ge \frac{B_k^f N_k}{L_k (B - B_k^f N_k)} \left( (m - 1) w_l L_l + \sum_{\substack{i=1 \\ i \ne k,\, i \ne l}}^{m} w_i L_i \right), \quad k \ne l, \tag{18} \]

\[ w_l \ge \frac{B_l^f N_l}{(m - 1) L_l (B - B_l^f N_l)} \sum_{\substack{i=1 \\ i \ne l}}^{m} w_i L_i. \tag{19} \]

These constraints only reserve bandwidth for normal queues and LLQ, but they do not provide any delay guarantees. Suppose that each data flow that belongs to the class with the delay requirements is regulated by the Token Bucket policer [9] with the mean rate $\rho$ and the burst size $\sigma$. Thus, it takes the WRR scheduler $\sigma / B$ seconds to transmit the received burst under ideal conditions. However, if $\sigma$ is bigger than $w_l L_l$, then more time is needed to output the burst because the WRR scheduler will start to serve another queue. While the scheduler serves that queue, packets in LLQ can be delayed by

\[ \max_{i, i \ne l} \{ w_i L_i^{\max} \} / B \]

seconds at most. Thus, the queuing delay of packets in LLQ can be estimated by

\[ D = \frac{\sigma}{B} + \max \left\{ \frac{\sigma}{w_l L_l^{\min}} - 1, \; 0 \right\} \frac{\max_{i, i \ne l} \{ w_i L_i^{\max} \}}{B}. \tag{20} \]

Here $D$ stands for the worst-case delay experienced by packets in LLQ. The term

\[ \max \left\{ \frac{\sigma}{w_l L_l^{\min}} - 1, \; 0 \right\} \]

estimates the number of times the LLQ is interrupted by other queues. If $\sigma$ is less than $w_l L_l^{\min}$, then the burst is transmitted completely during one round. The previous expression does not consider the fact that the initial processing of LLQ can be delayed by the other queue being processed when a LLQ packet arrives at an empty queue. Thus, it is possible to introduce a corrected estimation:

\[ D = \frac{\sigma}{B} + \max \left\{ \frac{\sigma}{w_l L_l^{\min}} - 1, \; 0 \right\} \frac{\max_{i, i \ne l} \{ w_i L_i^{\max} \}}{B} + \frac{\max_{i, i \ne l} \{ w_i L_i^{\max} \}}{B} = \frac{\sigma}{B} + \max \left\{ \frac{\sigma}{w_l L_l^{\min}}, \; 1 \right\} \frac{\max_{i, i \ne l} \{ w_i L_i^{\max} \}}{B}. \tag{21} \]

Based on the value of $\sigma$ and $w_l L_l^{\min}$, it is possible to consider two distinct cases:

\[ D = \begin{cases} \dfrac{1}{B} \left( \sigma + \max_{i, i \ne l} \{ w_i L_i^{\max} \} \right), & \sigma \le w_l L_l^{\min}, \quad \text{(a)} \\[2ex] \dfrac{\sigma}{B} \left( 1 + \dfrac{\max_{i, i \ne l} \{ w_i L_i^{\max} \}}{w_l L_l^{\min}} \right), & \sigma > w_l L_l^{\min}. \quad \text{(b)} \end{cases} \tag{22} \]
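The corrected estimate (21) and its two cases in (22) can be cross-checked numerically. The sketch below is a hedged illustration; the function name, argument names, and numbers are assumptions and are not part of the paper.

```python
def llq_delay(burst, link_rate, w_l, l_min_l, other_weights, other_l_max):
    """Worst-case LLQ queuing delay estimate according to (21)."""
    # Longest time another queue can hold the link, expressed in data units.
    longest = max(w * size for w, size in zip(other_weights, other_l_max))
    # How many times the burst has to wait for another queue, at least once.
    waits = max(burst / (w_l * l_min_l), 1.0)
    return burst / link_rate + waits * longest / link_rate


# Hypothetical setting: 1 Mbit/s link, LLQ weight 4 with 500-bit minimum packets,
# two other queues with weights 2 and 1 and 1500-bit maximum packets.
small = llq_delay(1000, 1e6, 4, 500, [2, 1], [1500, 1500])   # burst fits in one visit, case (22a)
large = llq_delay(8000, 1e6, 4, 500, [2, 1], [1500, 1500])   # burst needs four visits, case (22b)
print(small, large)
```

With these numbers the small burst is delayed by (1000 + 3000)/10^6 s, and the large one by (8000/10^6)(1 + 3000/2000) s, matching (22a) and (22b).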

The first case in (22) corresponds to a burst that is output completely in one round. However, as a service class aggregates multiple data flows, one could expect that the resulting burst size of the whole service class is bigger than $\sigma$ and is not bigger than $N_l \sigma$. Since the latter value is usually larger than $w_l L_l^{\min}$, we will consider (22b). We will present it in the form of an inequality, meaning that packets in LLQ can experience various queuing delays, but the worst-case delay must not exceed a certain value:

\[ D_l \ge \frac{N_l \sigma}{B} \left( 1 + \frac{\max_{i, i \ne l} \{ w_i L_i^{\max} \}}{w_l L_l^{\min}} \right). \tag{23} \]

It is possible to rewrite it in the following form that is suitable for the optimization problem:

\[ \left( \frac{B D_l}{N_l \sigma} - 1 \right) w_l L_l^{\min} - \max_{i, i \ne l} \{ w_i L_i^{\max} \} \ge 0. \tag{24} \]

4.2.3. Pricing criterion

The pricing function for the WRR scheduler slightly differs from the one considered earlier for WFQ. Though it is possible to approximate the instantaneous revenue per time unit, it makes the function too complicated and non-linear. Instead, it is possible to approximate the revenue obtained during a round of the WRR scheduler. As presented earlier, (11) approximates the amount of data the WRR scheduler outputs during one cycle. Thus, the mean revenue can be approximated as follows:

\[ r(w_1, \ldots, w_m) = \sum_{i=1}^{m} C_i() w_i L_i \quad [\text{monetary units/round}]. \tag{25} \]

In the LLQ mode, it is modified to the form:

\[ r(w_1, \ldots, w_m) = (m - 1) C_l() w_l L_l + \sum_{\substack{i=1 \\ i \ne l}}^{m} C_i() w_i L_i. \tag{26} \]

4.2.4. General model

The general adaptive model for the WRR scheduler comprises the pricing function (26) and constraints (18), (19), and (24). Parameter $\gamma_i$ has the same purpose as in the case of the adaptive model based upon WFQ. The last set of constraints specifies that the weight values are integer numbers that must be greater than or equal to 1. It ensures that at least one packet will be transmitted from each queue during a round:

\[ \max \left\{ (m - 1) \gamma_l C_l() w_l L_l + \sum_{\substack{i=1 \\ i \ne l}}^{m} \gamma_i C_i() w_i L_i \right\} \tag{27} \]

subject to

\[ (m - 1) B_k^f N_k w_l L_l + B_k^f N_k \sum_{\substack{i=1 \\ i \ne k,\, i \ne l}}^{m} w_i L_i + (B_k^f N_k - B) w_k L_k \le 0, \quad \forall k = \overline{1, m},\; k \ne l, \]

\[ (m - 1)(B_l^f N_l - B) w_l L_l + B_l^f N_l \sum_{\substack{i=1 \\ i \ne l}}^{m} w_i L_i \le 0, \]

\[ \left( \frac{B D_l}{N_l \sigma_l} - 1 \right) w_l L_l^{\min} - w_k L_k^{\max} \ge 0, \quad \forall k = \overline{1, m},\; k \ne l, \]

\[ w_k \ge 1, \quad w_k \in \mathbb{N}, \quad \forall k = \overline{1, m}. \]

Alternatively, if a provider has to provide only bandwidth guarantees, then the model given above can be simplified significantly, i.e., the pricing function (25) and the constraint (15) are used:

\[ \max \left\{ \sum_{i=1}^{m} \gamma_i C_i() w_i L_i \right\} \tag{28} \]

subject to

\[ B_k^f N_k \sum_{\substack{i=1 \\ i \ne k}}^{m} w_i L_i + (B_k^f N_k - B) w_k L_k \le 0, \quad \forall k = \overline{1, m}, \]

\[ w_k \ge 1, \quad w_k \in \mathbb{N}, \quad \forall k = \overline{1, m}. \]
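For a small number of classes, model (28) can be explored by plain enumeration. The sketch below uses exhaustive search rather than a dedicated integer programming method, and it adds an upper bound `w_max` on every weight. That bound is an assumption of this example: the constraints of (28) are unchanged when all weights are scaled by the same integer factor, so some cap is needed for the search to terminate. The function name and the numbers are also assumptions.

```python
from itertools import product


def solve_wrr_model(prices, gammas, sizes, flow_rates, n_flows, link_rate, w_max):
    """Exhaustive search for model (28) over integer weights 1..w_max."""
    m = len(prices)
    demand = [rate * n for rate, n in zip(flow_rates, n_flows)]    # B_k^f * N_k
    best, best_revenue = None, float("-inf")
    for weights in product(range(1, w_max + 1), repeat=m):
        data = [w * size for w, size in zip(weights, sizes)]        # w_i * L_i
        total = sum(data)
        # Bandwidth constraint of (28), rearranged: B_k^f N_k * sum_i(w_i L_i) <= B * w_k L_k.
        if any(demand[k] * total > link_rate * data[k] for k in range(m)):
            continue
        revenue = sum(g * c * d for g, c, d in zip(gammas, prices, data))
        if revenue > best_revenue:
            best, best_revenue = weights, revenue
    return best, best_revenue


# Hypothetical two-class case: class 1 needs at least 30% of the link,
# but excess allocation is enabled only for the more expensive class 0.
best, revenue = solve_wrr_model(prices=[5, 1], gammas=[1, 0], sizes=[100, 100],
                                flow_rates=[1, 1], n_flows=[2, 3], link_rate=10, w_max=4)
print(best, revenue)
```

Here the search keeps the first feasible combination with the highest revenue, so class 1 ends up with the smallest weight that still satisfies its bandwidth constraint.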

The presented adaptive model is an integer linear optimization problem. Since the weights of the WRR scheduler are integers, the solution of the optimization problem must be a set of integer weight values. One method that can be used to calculate the optimal values of $w_i$ is the branch & bound algorithm. Note that if the packet size is constant, as in ATM networks, the model can be simplified because there is no need to include the $L_i$ term in the constraints and the target function. Furthermore, this model can be used with the MWRR scheduling policy [35] with minor differences in the target function and constraints.

4.3. Deficit Round Robin

4.3.1. DRR scheduler

The DRR scheduler works in a cyclic manner, serving input queues in sequence. During a round, a certain number of packets, determined by the value of the deficit counter, are sent from each queue. Once all queues have been served, the DRR scheduler updates the deficit counters using the quantum values ($Q_i$) and begins the next cycle. As in WRR, the bandwidth allotted to a given class $k$ can be approximated from the quantum values $Q_i$:

\[
\frac{Q_k}{\sum_i Q_i}\, B, \quad k \in [1, m], \tag{29}
\]

where $B$ is the bandwidth of the output link.

4.3.2. QoS requirements

The general considerations for the DRR queuing discipline are the same as for WRR, so we present only the final formulas. Since the quantum values control the allocation of the output bandwidth between service classes, the task is to find values of $Q_k$ such that all the QoS requirements are satisfied:

\[
Q_k \ge \frac{B^f_k N_k}{B - B^f_k N_k} \sum_{\substack{i=1 \\ i \neq k}}^{m} Q_i. \tag{30}
\]
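The bandwidth approximation (29) and the quantum constraint (30) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes that $B^f_k$ is the per-flow bandwidth requirement of class $k$ and $N_k$ the number of active flows in that class, so that $B^f_k N_k$ is the aggregate demand of the class.

```python
def drr_share(quanta, link_bw):
    """Approximate per-class bandwidth, Eq. (29): Q_k / sum(Q_i) * B."""
    total = sum(quanta)
    return [q / total * link_bw for q in quanta]


def satisfies_eq30(quanta, link_bw, flow_bw, flows):
    """Check the quantum constraint (30) for every class k.

    flow_bw[k] plays the role of B^f_k and flows[k] the role of N_k
    (an assumed reading of the symbols).
    """
    total = sum(quanta)
    for q_k, bf_k, n_k in zip(quanta, flow_bw, flows):
        need = bf_k * n_k                      # aggregate demand of class k
        if need >= link_bw:
            return False                       # demand alone exceeds the link
        if q_k < need / (link_bw - need) * (total - q_k):
            return False                       # constraint (30) violated
    return True


if __name__ == "__main__":
    quanta = [500, 1500]
    print(drr_share(quanta, 10.0))
    print(satisfies_eq30(quanta, 10.0, [0.5, 1.0], [4, 6]))
```

Rearranging (30) gives $Q_k B / \sum_i Q_i \ge B^f_k N_k$, so the constraint simply requires the share from (29) to cover the aggregate demand of each class.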


A. Sayenko et al. / Computer Networks 50 (2006) 1040–1058

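In the LLQ mode, there are two constraints to reserve the minimum amount of bandwidth resources:

\[
Q_k \ge \frac{B^f_k N_k}{B - B^f_k N_k} \Biggl( (m-1)\, Q_l + \sum_{\substack{i=1 \\ i \neq k,\, i \neq l}}^{m} Q_i \Biggr), \quad k \neq l, \tag{31}
\]

\[
Q_l \ge \frac{B^f_l N_l}{(m-1)(B - B^f_l N_l)} \sum_{\substack{i=1 \\ i \neq l}}^{m} Q_i. \tag{32}
\]

The worst-case delay of a packet in the LLQ is determined as follows:

\[
D = \begin{cases}
\dfrac{1}{B} \Bigl( \sigma + \max\limits_{i,\, i \neq l} \{Q_i\} \Bigr), & \sigma \le Q_l, \quad \text{(a)} \\[2ex]
\dfrac{\sigma}{B} \Bigl( 1 + \dfrac{\max_{i,\, i \neq l} \{Q_i\}}{Q_l} \Bigr), & \sigma > Q_l. \quad \text{(b)}
\end{cases} \tag{33}
\]

Two previous equations yield the final constraint:

\[
\left( \frac{B D_l}{N_l \sigma} - 1 \right) Q_l - \max_{i,\, i \neq l} \{Q_i\} \ge 0. \tag{34}
\]

4.3.3. Pricing criterion

The pricing function for the DRR scheduler is very similar to WRR, except that the quantum values $Q_i$ are used:

\[
r(Q_1, \ldots, Q_m) = \sum_{i=1}^{m} C_i()\, Q_i \quad [\text{monetary units/round}]. \tag{35}
\]

In the LLQ mode it has the following form:

\[
r(Q_1, \ldots, Q_m) = (m-1)\, C_l()\, Q_l + \sum_{\substack{i=1 \\ i \neq l}}^{m} C_i()\, Q_i. \tag{36}
\]

\[
\max \Biggl\{ (m-1)\,\gamma_l C_l()\, Q_l + \sum_{\substack{i=1 \\ i \neq l}}^{m} \gamma_i C_i()\, Q_i \Biggr\} \tag{37}
\]

subject to

\[
(m-1) B^f_k N_k\, Q_l + B^f_k N_k \sum_{\substack{i=1 \\ i \neq k,\, i \neq l}}^{m} Q_i + (B^f_k N_k - B)\, Q_k \le 0, \quad \forall k = \overline{1,m},\ k \neq l,
\]

\[
(m-1)(B^f_l N_l - B)\, Q_l + B^f_l N_l \sum_{\substack{i=1 \\ i \neq l}}^{m} Q_i \le 0,
\]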
[...] $x_i > 0,\ x_j > 0$. (2)

All balanced service rates can be expressed in terms of a unique balance function $\Phi$ so that $\Phi(0) = 1$ and

\[
\phi_i(x) = \frac{\Phi(x - e_i)}{\Phi(x)} \quad \forall x : x_i > 0. \tag{3}
\]

The steady-state distribution of the process is

\[
\pi(x) = \pi(0)\, \Phi(x) \prod_{i=1}^{N} \rho_i^{x_i}, \tag{4}
\]

where $\pi(0)$ is the normalization constant. The state distribution depends on the traffic characteristics only through the traffic intensities $\rho_i$.

2.2. State-dependent arrival rates

In the more general case with state-dependent arrival rates, the network is insensitive if and only if the function $\psi_i(x)$ defined as

\[
\psi_i(x) = \frac{\lambda_i(x - e_i)\, \sigma_i}{\phi_i(x)} \tag{5}
\]

is balanced [7]:

\[
\frac{\psi_i(x - e_j)}{\psi_i(x)} = \frac{\psi_j(x - e_i)}{\psi_j(x)} \quad \forall i, j;\ x_i > 0,\ x_j > 0. \tag{6}
\]

Balance condition (6) is equivalent to the existence of a balance function $\Psi$ so that $\Psi(0) = 1$ and

\[
\psi_i(x) = \frac{\Psi(x - e_i)}{\Psi(x)} \quad \forall x : x_i > 0. \tag{7}
\]

Balance condition (6) is also equivalent to the detailed balance conditions

\[
\lambda_i(x)\, \pi(x) = \frac{\phi_i(x + e_i)}{\sigma_i}\, \pi(x + e_i) \quad \forall x, i. \tag{8}
\]

These constitute a stricter requirement than the global balance conditions (1). The steady-state distribution of the system is

\[
\pi(x) = \frac{\pi(0)}{\Psi(x)}. \tag{9}
\]

If the service rates are balanced by some function $\Phi$, a network with state-dependent arrival rates is balanced if and only if the arrival rates satisfy the balance conditions [7]


J. Leino, J. Virtamo / Computer Networks 50 (2006) 1059–1068

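\[
\frac{\lambda_i(x + e_j)}{\lambda_i(x)} = \frac{\lambda_j(x + e_i)}{\lambda_j(x)} \quad \forall i, j. \tag{10}
\]

The balance conditions are equivalent to the existence of a balance function $\Lambda$ so that $\Lambda(0) = 1$ and

\[
\lambda_i(x) = \frac{\Lambda(x + e_i)}{\Lambda(x)} \quad \forall x. \tag{11}
\]

In this case, the steady-state distribution of the process is

\[
\pi(x) = \pi(0)\, \Phi(x)\, \Lambda(x). \tag{12}
\]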
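As a sanity check of the balance function in (3) and the product form in (4), consider the simplest possible case. This is an illustrative assumption, not an example taken from the paper: a single processor-sharing node ($N = 1$) with fixed capacity $C$, constant arrival rate, and traffic intensity $\rho < C$.

\[
\begin{aligned}
\phi(x) &= C \quad (x > 0)
  &&\Rightarrow\quad \Phi(x) = \frac{\Phi(x-1)}{C} = C^{-x}, \\
\pi(x) &= \pi(0)\, \Phi(x)\, \rho^{x} = \pi(0) \left( \frac{\rho}{C} \right)^{x},
  &&\qquad \pi(0) = 1 - \frac{\rho}{C}.
\end{aligned}
\]

The result is the familiar geometric distribution of a processor-sharing queue, which depends on the flow size distribution only through its mean, as the insensitivity property requires.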

3. Application to data networks

The queueing network model presented in the previous section can be used to model a variety of systems, including, for example, both circuit- and packet-switched communication networks and grid computing networks [11,12]. In this section, we discuss the modeling of data networks and show how the model can be used to analyze load balancing at packet or flow level. We focus on data networks, but the methods can also be applied to other kinds of systems.

The traffic is modeled at flow level. The flows are assumed to be elastic, i.e., the size of a transfer is fixed and its duration depends on the allocated bandwidth. Currently, most Internet traffic consists of elastic TCP flows, for example HTTP and FTP transfers. Most peer-to-peer transfers are also elastic TCP connections. If needed, the effect of non-elastic traffic, such as audio or video streams, can be approximated using the upper and lower bounds presented in [14].

The queueing network described in the previous section can be used to model data networks [6]. The customers in the queueing network represent data flows. The capacity of a PS node is the bit rate allocated for the flows in that node. We assume that every time the number of flows in the system changes, the bandwidth resources of the network are instantaneously reallocated among the flows. Thus, between epochs of flow arrivals or departures, each flow receives a constant bandwidth.

The communication network consists of nodes $n \in \mathcal{N}$ and links $l \in \mathcal{L}$ with capacities $C_l$. We assume that there are $K$ traffic classes. Class-$k$ flows arrive at rate $\lambda_k$ and the mean flow size is $\sigma_k$. The traffic load (bits/s) of class $k$ is $\rho_k = \lambda_k \sigma_k$.

We introduce three different network models: fixed routes, packet-level balancing and flow-level balancing. In the simplest approach, each traffic class has a fixed route. Network resources, however, are utilized more efficiently if the traffic load is controlled dynamically depending on the network state. A dynamic policy reacts to congestion, and the load is distributed more evenly across the network. Instead of using a fixed route for each traffic class, the traffic load is divided between several routes. Load balancing can be executed either at packet or flow level. When packet-level balancing is used, a single flow can be split over several routes. When flow-level balancing is used, in contrast, each arriving flow is directed to one of the possible routes and the same route is utilized until the flow is finished.

3.1. Fixed routes

The simplest network model uses a fixed route for each traffic class. In the queueing network model, each traffic class corresponds to a PS node. The capacity of the node is the bit rate (bits/s) allocated for the class. Class-$k$ traffic utilizes route $r_k$, which is a set of links $r_k \subset \mathcal{L}$. The network state is denoted $x = (x_1, \ldots, x_K)$, where $x_k$ is the number of active class-$k$ flows. The bit rate allocated for class $k$ in state $x$ is denoted $\phi_k(x)$. Allocations need to satisfy the link capacity constraints

\[
\sum_{k : l \in r_k} \phi_k(x) \le C_l \quad \forall x, l. \tag{13}
\]

The network is insensitive to the flow size distribution if and only if the allocation is balanced, i.e., it satisfies the balance condition (2).

3.2. Packet-level balancing

When packet-level balancing is used, a flow can be split over several routes at packet level. The flows are divided between the routes depending on the network state. The aim is to balance the load while retaining insensitivity to detailed traffic characteristics. When packet-level balancing is modeled using the queueing network model, each node represents a traffic class. The capacity $\phi_k(x)$ of a node $k$ is the total bandwidth allocated for class-$k$ traffic among all feasible routes. The system state is denoted by a vector $x = (x_1, \ldots, x_K)$, where $x_k$ is the number of active class-$k$ flows. Class-$k$ flows can be split over routes $r \in \mathcal{R}_k$. Each route $r$ consists of a set of links $r \subset \mathcal{L}$. The bandwidth allocated for class-$k$ flows on route $r$ is denoted $\phi_k^r(x)$. The total bandwidth allocated for class-$k$ traffic is $\phi_k(x) = \sum_{r \in \mathcal{R}_k} \phi_k^r(x)$. The allocations have to satisfy the link capacity constraints

\[
\sum_{k} \sum_{r \in \mathcal{R}_k : l \in r} \phi_k^r(x) \le C_l \quad \forall x, l. \tag{14}
\]

The network is insensitive if and only if the total class capacities $\phi_k(x)$ satisfy the balance condition (2). The capacities on the individual routes $\phi_k^r(x)$ are not required to be balanced.

3.3. Flow-level balancing

When flow-level balancing is used, an arriving flow is directed to one of the possible routes and the same route is used until the flow is finished. The arriving flows are routed stochastically (with state-dependent probabilities) to the available routes so that the system remains insensitive. In the queueing network model, each feasible route of a traffic class is represented by a PS node. The capacity of the node is the bandwidth allocated for the traffic class on the corresponding route. The arrival rates at the different PS nodes depend on the routing policy. The system state is denoted by a vector $x = (x_1^1, \ldots, x_1^{|\mathcal{R}_1|}, \ldots, x_K^1, \ldots, x_K^{|\mathcal{R}_K|})$, where $x_k^r$ is the number of active class-$k$ flows on route $r$ and $|\mathcal{R}_k|$ is the number of different class-$k$ routes. The total arrival rate of class-$k$ traffic is $\lambda_k$. The arrival rate on route $r$ is $\lambda_k^r(x)$. The rates need to satisfy the traffic constraints

\[
\sum_{r \in \mathcal{R}_k} \lambda_k^r(x) \le \lambda_k \quad \forall x, k. \tag{15}
\]

If $\sum_{r \in \mathcal{R}_k} \lambda_k^r(x) < \lambda_k$, part of the class-$k$ traffic is rejected. The bit rate allocated for class-$k$ flows on route $r$ is denoted $\phi_k^r(x)$ and is assumed to be equally shared between these flows. The allocations have to satisfy the capacity constraints

\[
\sum_{k} \sum_{r \in \mathcal{R}_k : l \in r} \phi_k^r(x) \le C_l \quad \forall x, l. \tag{16}
\]

If the capacity allocation is fixed to some balanced allocation, the network is insensitive if and only if the arrival rates $\lambda_k^r(x)$ satisfy the balance condition (10). Note that in this case the capacities on the different routes $\phi_k^r(x)$ need to be balanced, in contrast to packet-level balancing, where the total class capacities $\phi_k(x)$ are balanced. A more efficient way is to balance the routing and capacity allocation jointly. In this case, the arrival rates $\lambda_k^r(x)$ and capacity allocations $\phi_k^r(x)$ need to satisfy balance condition (6).

4. Optimal capacity allocation and load balancing policies

In this section, we discuss several insensitive load balancing methods using the presented network models. With each method, our aim is to find the policy that minimizes the flow blocking probability of the system. Because we consider insensitive routing, the blocking probabilities under a given policy do not depend on the flow size distribution. More efficient routing policies may be found if the requirement for insensitivity is omitted. The best insensitive policy can be used to derive a lower bound for the performance of a sensitive network.

We assume that the network has an access-control policy that guarantees a certain minimum bandwidth for all the flows. An arriving class-$k$ flow is blocked if the minimum bit rate $\phi_k^{\min}$ cannot be provided [13]. If a network has fixed routes or packet-level load balancing is used, the state space $\mathcal{S}$ of the system is

\[
\mathcal{S} = \left\{ x : \frac{\phi_k(x)}{x_k} \ge \phi_k^{\min},\ \forall k \right\}. \tag{17}
\]

The state space of a network utilizing flow-level balancing is

\[
\mathcal{S} = \left\{ x : \frac{\phi_k^r(x)}{x_k^r} \ge \phi_k^{\min},\ \forall k,\ r \in \mathcal{R}_k \right\}. \tag{18}
\]

4.1. Fixed routes

First we consider the case without load balancing. Traffic class $k$ is routed to a fixed route $r_k$. The aim is to find the capacity allocation that performs best while satisfying the balance conditions (2) and the link capacity constraints (13). The higher the value of the factor $\Phi(x)^{-1}$ in Eq. (3), the more bandwidth is utilized in state $x$. The balanced allocation with the highest bandwidth utilization can be determined recursively. The bandwidths allocated to different classes in state $x$ are proportional to the values $\Phi(x - e_i)$, $x_i > 0$. The link capacities impose constraints on the maximum bandwidths. In order to determine the most efficient capacity allocation, $\Phi(x)^{-1}$ is increased until a constraint is met.
Capacity allocation maximizing the utilized bandwidth in every state is known as balanced fairness (BF) [7,15] and can be defined recursively by $\Phi(0) = 1$ and

\[
\Phi(x) = \max_{l} \frac{1}{C_l} \sum_{k : l \in r_k,\ x_k > 0} \Phi(x - e_k). \tag{19}
\]

BF is the balanced capacity allocation that is most efficient in the sense that at least one link is saturated in each state. It is also the insensitive allocation for which the network is empty with the highest probability [15].

4.2. Static load balancing

The simplest load balancing policy is static balancing executed at flow level. Arriving flows are stochastically routed in some fixed ratios among the different routes. The network is insensitive if the capacity allocation is balanced. The best such policy can be determined as a straightforward optimization problem, see, e.g., [16].

4.3. Packet-level balancing

When packet-level balancing is used, we assume that all traffic is accepted as long as the minimum bandwidth $b_k^{\min}$ can be provided for each class-$k$ flow. Similarly to BF, the capacity allocation utilizing the most bandwidth can be determined recursively [10]. In each state $x$, the balance function $\Phi(x)$ is minimized so that the balance condition (2) and the link capacity constraints (14) are satisfied. The optimization problem can be formulated as a linear optimization problem. For state $x$, the formulation is

\[
\max_{\Phi(x)^{-1},\ \phi_k^r(x)} \Phi(x)^{-1} \tag{20}
\]

\[
\text{s.t.} \quad \sum_{r \in \mathcal{R}_k} \phi_k^r(x) = \Phi(x)^{-1}\, \Phi(x - e_k) \quad \forall k : x_k > 0, \tag{21}
\]

\[
\sum_{k} \sum_{r \in \mathcal{R}_k : l \in r} \phi_k^r(x) \le C_l \quad \forall l, \tag{22}
\]

\[
\phi_k^r(x) \ge 0 \quad \forall k, r, \tag{23}
\]

where (21) is the balance condition and (22) represents the link capacity constraints. The maximization problem can be solved using standard LP algorithms. The maximal capacities of the traffic classes $\phi_k(x) = \sum_r \phi_k^r(x)$ are uniquely determined, but the capacity may be provided with different route capacities $\phi_k^r(x)$. For example, this happens when several parallel links limit the amount of traffic but the traffic classes can be split onto the links in different ways.

The optimal packet-level load balancing problem can be defined in two ways. In this paper, we assumed that each class has a predefined set of possible routes. A more general problem can be formulated by allowing the packets to use arbitrary routes in the network. The problem can be formulated and solved as an LP problem utilizing network flows. For details, see [10].

4.4. Flow-level balancing

Optimization of flow-level balancing is a more difficult problem than the packet-level balancing problem. In addition to the capacity allocation, the routing probabilities also need to be optimized. As discussed in Section 2, the arrival process can be balanced either separately or jointly with the capacity allocation. Separate balancing is easier, as the separate problems are smaller than the joint problem. On the other hand, separate balancing is more restrictive, hence its performance is worse than that of jointly balanced allocation and routing. When routing and capacity allocation are balanced separately, we use BF as the allocation, as it is the most efficient one.

When fixed routes or packet-level load balancing is studied, the capacity allocation policy maximizing the utilized bandwidth can be determined recursively one state at a time. The ratios of the capacities are given and the amount of allocated capacity is decided. The obvious policy choice is to maximize the allocated bandwidth in every state. When flow-level balancing is considered, the situation is not as simple. The amount of arriving traffic is given and the routing probabilities among the different routes need to be decided. There is no obvious optimal policy that could be determined recursively. In fact, unlike the capacity allocation, the optimal routing policy cannot be determined recursively; the whole state space needs to be considered at the same time.

4.4.1. Separately balanced capacity allocation and routing

First, we consider the case with separately balanced routing and capacity allocation. We assume that the capacity is allocated according to balanced fairness, as it is the most efficient balanced allocation.
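The balanced fairness allocation used here follows from the recursion (19) and the relation (3) between the balance function and the allocated bit rates. A minimal sketch of that recursion is given below; it is an illustration rather than the authors' implementation, and the function names, the dictionary of link capacities and the tuple encoding of the state are assumptions made for the example. Every route is assumed to contain at least one link.

```python
from functools import lru_cache


def make_balance_function(capacities, routes):
    """Return the balanced fairness balance function Phi of Eq. (19).

    capacities: dict mapping a link name to its capacity C_l
    routes:     routes[k] is the set of links used by class k (fixed routes)
    """
    @lru_cache(maxsize=None)
    def phi(x):
        if all(n == 0 for n in x):
            return 1.0                          # Phi(0) = 1
        best = 0.0
        for link, cap in capacities.items():
            total = 0.0
            for k, n in enumerate(x):
                if n > 0 and link in routes[k]:
                    prev = x[:k] + (n - 1,) + x[k + 1:]
                    total += phi(prev)          # Phi(x - e_k)
            best = max(best, total / cap)       # maximum over links
        return best

    return phi


def allocation(phi, x, k):
    """Bit rate of class k in state x, Eq. (3): Phi(x - e_k) / Phi(x)."""
    prev = x[:k] + (x[k] - 1,) + x[k + 1:]
    return phi(prev) / phi(x)


if __name__ == "__main__":
    # Two classes sharing one link of capacity 2 (hypothetical numbers).
    phi = make_balance_function({"a": 2.0}, [{"a"}, {"a"}])
    state = (1, 2)                              # states must be tuples
    print([allocation(phi, state, k) for k in range(2)])
```

In the single-link usage example the class allocations are proportional to the numbers of active flows and sum to the link capacity, which matches the statement that at least one link is saturated in each state.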


When a routing decision is made, the best result is achieved if the state of the whole network is known at the time of the decision. In practice, this information is not always available. If a traffic class knows only the number of flows belonging to that class on each route, it is said to have local information. The routing policy of class $k$ then depends only on the local state, $\lambda_k^r(x) \equiv \lambda_k^r(x_k)$, where $x_k = \{x_k^r\}_{r \in \mathcal{R}_k}$. When the state of the whole network is known at the time of the routing decision, the routing is more efficient than with only local information. For example, if a link is congested, the traffic can be routed to another link with less traffic.

In [11], the authors present a method to determine optimal insensitive routing with local information when capacity allocation and routing are balanced separately. For a variety of objective functions, including blocking probability, the optimal policy is simple, i.e., there is only one local state where traffic is rejected. Finding the optimal policies is straightforward and fast.

The routing problem with global information can be solved using the theory of Markov decision processes (MDP), see, e.g., [17,18]. We apply the linear programming (LP) formulation of the MDP theory to formulate and solve the problem. In the MDP-LP formulation, the state of the process consists of the network state and the routing decision. The routing vector is denoted $d = (d_1, \ldots, d_K)$, where $d_k = r$ if class-$k$ traffic is directed to route $r \in \mathcal{R}_k$ and $d_k = 0$ if the class is blocked. $\pi(x, d)$ is a decision variable in the LP problem and corresponds to the probability that the network is in state $x$ and routing $d$ is used. In the ordinary LP formulation of the MDP theory, the global balance conditions (1) appear as linear constraints on the decision variables. In order to retain insensitivity, we impose the stricter detailed balance conditions (8) as constraints.

The objective function of the LP problem is the total blocking probability. The probability that the system is in state $x$ and blocking class-$k$ traffic is $\sum_{d : d_k = 0} \pi(x, d)$, hence the total blocking probability is

\[
\sum_{k} \frac{\lambda_k}{\lambda} \sum_{x} \sum_{d : d_k = 0} \pi(x, d), \tag{24}
\]

where $\lambda = \sum_k \lambda_k$ is the total arrival rate. Using this notation, the MDP-LP formulation of the problem reads

\[
\min_{\pi(x, d)} \sum_{k} \frac{\lambda_k}{\lambda} \sum_{x} \sum_{d : d_k = 0} \pi(x, d) \tag{25}
\]

\[
\text{s.t.} \quad \lambda_k \sum_{d : d_k = r} \pi(x, d) = \frac{\phi_k^r(x + e_k^r)}{\sigma_k} \sum_{d} \pi(x + e_k^r, d) \quad \forall x, k, r \in \mathcal{R}_k, \tag{26}
\]

\[
\sum_{x} \sum_{d} \pi(x, d) = 1, \tag{27}
\]

\[
\pi(x, d) \ge 0 \quad \forall x, d, \tag{28}
\]

where (26) is the detailed balance condition corresponding to Eq. (8). When the problem has been solved, the obtained values $\pi(x, d)$ can be used to analyze the optimal policy. For example, the state probabilities are

\[
\pi(x) = \sum_{d} \pi(x, d) \tag{29}
\]

and the class-$k$ arrival rates at the different routes in state $x$ are

\[
\lambda_k^r(x) = \frac{\sum_{d : d_k = r} \pi(x, d)}{\sum_{d} \pi(x, d)}\, \lambda_k. \tag{30}
\]
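Once an LP solver has returned the decision variables, the post-processing in (24), (29) and (30) is simple bookkeeping. The sketch below assumes, purely for illustration, that the solution is stored in a dictionary keyed by `(x, d)` pairs, where `x` is a state tuple and `d` a routing tuple with `d[k] == 0` meaning that class `k` is blocked; the function names and this data layout are hypothetical.

```python
from collections import defaultdict


def state_probabilities(p):
    """Eq. (29): marginal state probabilities, summing over routings d."""
    pi = defaultdict(float)
    for (x, d), prob in p.items():
        pi[x] += prob
    return dict(pi)


def route_arrival_rate(p, x, k, r, lam_k):
    """Eq. (30): arrival rate of class k on route r in state x."""
    on_route = sum(prob for (y, d), prob in p.items() if y == x and d[k] == r)
    total = sum(prob for (y, d), prob in p.items() if y == x)
    return on_route / total * lam_k


def blocking_probability(p, rates):
    """Eq. (24): total blocking probability, weighted by arrival rates."""
    lam = sum(rates)
    return sum(rates[k] / lam * prob
               for (x, d), prob in p.items()
               for k in range(len(rates)) if d[k] == 0)


if __name__ == "__main__":
    # Hypothetical solution for one class with two routes.
    p = {((0,), (1,)): 0.5, ((1,), (1,)): 0.2,
         ((1,), (2,)): 0.2, ((1,), (0,)): 0.1}
    print(state_probabilities(p))
    print(route_arrival_rate(p, (1,), 0, 1, 2.0))
    print(blocking_probability(p, [2.0]))
```

The ratio in `route_arrival_rate` is the conditional probability of choosing route `r` given state `x`, so a state with zero probability in the solution would need separate handling.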

4.4.2. Jointly balanced capacity allocation and routing

In the previous section, capacity allocation was assumed to be separately balanced and fixed in advance (balanced fairness) and only routing was optimized. Better results can be obtained if routing and capacity allocation are balanced jointly. The problem of jointly balanced allocation and routing utilizing local information has been studied for networks with only one traffic class in [12], where a simple two-pass recursive algorithm is presented to solve the problem. Joint capacity allocation and routing with local information and several classes is still an open question. The problem with global information can be formulated and solved as an MDP-LP problem.

In the jointly balanced problem, the decisions consist of capacity allocation and routing decisions. Let $C_k^r = \min_{l \in r}(C_l)$ be the maximum feasible bandwidth for class-$k$ traffic on route $r$. The allocation decisions are modeled with a binary vector $b = (b_1^1, \ldots, b_1^{|\mathcal{R}_1|}, \ldots, b_K^1, \ldots, b_K^{|\mathcal{R}_K|})$, where $b_k^r = 1$ if bandwidth $C_k^r$ is allocated to class $k$ on route $r$ and 0 if no capacity is allocated. The decision variable in the LP problem, $\pi(x, d, b)$, corresponds to the probability that the system is in state $x$, routing vector $d$ is used and the capacity allocation is $b$. When routing is considered, the accepted traffic may not exceed the offered traffic. The problem formulation takes this constraint into account implicitly. When capacity allocation is considered, the allocated capacity on any link may not exceed the capacity of the link. In addition to the detailed balance constraints, the capacity constraints need to be added explicitly to the problem. Additional constraints are also needed to guarantee the minimum bit rate $\phi_k^{\min}$ for the accepted flows.

The MDP-LP formulation of the problem reads

\[
\min_{\pi(x, d, b)} \sum_{k} \frac{\lambda_k}{\lambda} \sum_{x} \sum_{d : d_k = 0} \sum_{b} \pi(x, d, b) \tag{31}
\]

\[
\text{s.t.} \quad \lambda_k \sum_{d : d_k = r} \sum_{b} \pi(x, d, b) = \frac{C_k^r}{\sigma_k} \sum_{d} \sum_{b : b_k^r = 1} \pi(x + e_k^r, d, b) \quad \forall x, k, r \in \mathcal{R}_k, \tag{32}
\]

\[
\sum_{k} \sum_{r \in \mathcal{R}_k : l \in r} C_k^r \sum_{d} \sum_{b : b_k^r = 1} \pi(x, d, b) \le C_l \sum_{d} \sum_{b} \pi(x, d, b) \quad \forall x, l, \tag{33}
\]

\[
x_k^r\, \phi_k^{\min} \sum_{d} \sum_{b} \pi(x, d, b) \le C_k^r \sum_{d} \sum_{b : b_k^r = 1} \pi(x, d, b) \quad \forall x, k, r \in \mathcal{R}_k, \tag{34}
\]

\[
\sum_{x} \sum_{d} \sum_{b} \pi(x, d, b) = 1, \tag{35}
\]

\[
\pi(x, d, b) \ge 0 \quad \forall x, d, b, \tag{36}
\]

where (32) represents the detailed balance condition, (33) the link capacity constraints and (34) the minimum bit rate constraints. The optimal $\pi(x, d, b)$ values can be used to analyze the capacity allocation and routing policy. State probabilities are

\[
\pi(x) = \sum_{d} \sum_{b} \pi(x, d, b). \tag{37}
\]

In state $x$, the class-$k$ arrival intensity on route $r$ is

\[
\lambda_k^r(x) = \frac{\sum_{d : d_k = r} \sum_{b} \pi(x, d, b)}{\sum_{d} \sum_{b} \pi(x, d, b)}\, \lambda_k \tag{38}
\]

and the capacity allocated for class $k$ on route $r$ is

\[
\phi_k^r(x) = \frac{\sum_{d} \sum_{b : b_k^r = 1} \pi(x, d, b)}{\sum_{d} \sum_{b} \pi(x, d, b)}\, C_k^r. \tag{39}
\]

4.4.3. Extensions to the MDP-LP formulations

The LP formulation makes it possible to easily modify the objective function or the constraints. Any objective function or constraint that can be written as a linear expression in the decision variables can be included in the problem. The traffic classes can have additional service requirements that need to be satisfied. The following examples are given using the formulation with jointly balanced capacity allocation and routing, but the same equations can be written using the formulation with separate balancing. For instance, we may want to minimize the mean duration of the flows. Using Little's formula, the objective function is

\[
\min \frac{\sum_{x} \sum_{d} \sum_{b} \pi(x, d, b)\, |x|}{\sum_{k} \lambda_k}, \tag{40}
\]

where $|x|$ is the number of active flows in state $x$. The blocking probability of class $k$ can be constrained to be less than a threshold $\alpha_k$ as follows:

\[
\sum_{x} \sum_{d : d_k = 0} \sum_{b} \pi(x, d, b) \le \alpha_k. \tag{41}
\]

Another possible requirement is that the average throughput of accepted class-$k$ flows is at least $\gamma_k$:

\[
\sum_{r \in \mathcal{R}_k} \sum_{x : x_k^r > 0} \sum_{d} \sum_{b : b_k^r = 1} C_k^r\, \pi(x, d, b) \ge \gamma_k. \tag{42}
\]

5. Numerical results

In this section, we compare the different insensitive load balancing methods in a simple network illustrated in Fig. 1. There are two parallel links with capacities $C_1$ and $C_2$ and three traffic classes. An adaptive traffic class can utilize both links, while the links also receive dedicated background traffic. The offered arrival rate of the adaptive class is $\lambda_0$ and the rates of the background traffic classes are $\lambda_1$ and $\lambda_2$. The number of active background flows on link $i$ is $x_i$ and the number of adaptive flows is $x_0$. In addition to the total number of adaptive flows, the numbers of flows on the individual links are needed when static or flow-level load balancing is used.

Fig. 1. Example network.


The number of adaptive flows on link $i$ is denoted $x_{0,i}$. The allocated bit rate on link $i$ is $\phi_{0,i}(x)$ for the adaptive class and $\phi_i(x)$ for the background traffic. If capacity is allocated according to balanced fairness, the capacity of a link is equally shared between all the active flows utilizing that link. The balance function is

\[
\Phi(x) = \frac{\dbinom{x_1 + x_{0,1}}{x_1} \dbinom{x_2 + x_{0,2}}{x_2}}{C_1^{\,x_1 + x_{0,1}}\, C_2^{\,x_2 + x_{0,2}}} \tag{43}
\]

and the bandwidths are $\phi_{0,i}(x) = \frac{x_{0,i}}{x_{0,i} + x_i} C_i$ and $\phi_i(x) = \frac{x_i}{x_{0,i} + x_i} C_i$.

We compare different policies using different traffic loads. Each background class is assumed to make up 10% of the total load, and the mean flow sizes $\sigma_0$, $\sigma_1$ and $\sigma_2$ are assumed identical. We assume unit link capacities, and the minimum bit rate $\phi^{\min}$ is taken to be 0.2 for all the traffic classes. We compare the blocking probabilities of insensitive static load balancing, flow-level balancing with BF, flow-level balancing with jointly balanced routing and capacity allocation, packet-level balancing and sensitive flow-level balancing. In the network we are analyzing, the blocking probability of flow-level balancing with local information is practically indistinguishable from that of the policy utilizing global information, hence it is omitted from the results. The optimal policies are determined using the methods discussed in Section 4. In order to estimate the performance penalty caused by the insensitivity requirement, we also determined the performance of the optimal sensitive flow-level balancing policy assuming exponentially distributed flow sizes. The optimal sensitive policy accepts all traffic and routes an arriving adaptive class flow to the link that provides more bandwidth.

Fig. 2 illustrates the blocking probabilities as a function of the total offered traffic load. Static load balancing has the worst performance, as expected. Flow-level balancing with separately balanced routing and capacity allocation outperforms static routing but is the least efficient dynamic policy. If routing and allocation are jointly balanced, slightly better results are achieved. Packet-level balancing is significantly better than flow-level balancing and performs almost as well as the sensitive flow-level policy. A key factor is the more efficient bandwidth usage. When packet-level balancing is used, an adaptive flow can utilize the capacity of both links, while with flow-level balancing only one link is used. This has a significant effect especially at low loads. If an adaptive flow arrives in an empty network, it utilizes twice the bandwidth it would receive with flow-level routing.

Fig. 2. Blocking probabilities. (Blocking probability, on a logarithmic scale, versus overall load for the static, flow-level, joint routing and allocation, packet-level and sensitive policies.)

6. Conclusions

Load balancing has a significant effect on performance in many communication networks. Analysis of optimal load balancing is difficult because the optimal policy and its performance depend on detailed traffic characteristics. In this paper, we studied load balancing policies that are insensitive to the flow size distribution. We extended and summarized our earlier work on optimal insensitive load balancing. Based on the insensitivity results by Bonald and Proutière, we studied insensitive load balancing in data networks executed either at packet or flow level. When insensitive load balancing is used, the flow size distribution does not affect the state distribution or performance of the system.

When packet-level balancing is used, the most efficient capacity allocation policy can be determined recursively. The flow-level balancing problem is analyzed using the theory of Markov decision processes. By using the linear programming formulation of MDP theory, the optimal routing policy can be solved either separately or jointly with the capacity allocation. The size of the LP problem is proportional to the state space of the network, hence the approach is feasible only in small networks. The recursive packet-level approach is more scalable.

We compared the performance of the different methods in a toy network. Flow-level balancing is the least efficient dynamic insensitive policy. The performance is improved if capacity allocation and


routing are jointly balanced and optimized. Packet-level balancing outperforms flow-level balancing regardless of the network load. Still, even in this case, some performance penalty has to be paid for insensitivity.

Acknowledgement

This work has been financially supported by the Academy of Finland (Grant No. 74524).

References

[1] A. Ephremides, P. Varaiya, J. Walrand, A simple dynamic routing problem, IEEE Transactions on Automatic Control 25 (1980) 690–693.
[2] D. Towsley, D. Panayotis, P. Sparaggis, C. Cassandras, Optimal routing and buffer allocation for a class of finite capacity queueing systems, IEEE Transactions on Automatic Control 37 (1992) 1446–1451.
[3] W. Whitt, Deciding which queue to join: some counterexamples, Operations Research 34 (1986) 55–62.
[4] L. Massoulié, J.W. Roberts, Bandwidth sharing and admission control for elastic traffic, Telecommunication Systems 15 (2000) 185–201.
[5] G. Fayolle, A. de La Fortelle, J. Lasgouttes, L. Massoulié, J.W. Roberts, Best-effort networks: modeling and performance analysis via large networks asymptotics, in: Proceedings of INFOCOM 2001, pp. 709–716.
[6] T. Bonald, A. Proutière, Insensitive bandwidth sharing in data networks, Queueing Systems and Applications 44 (2003) 69–100.
[7] T. Bonald, A. Proutière, Insensitivity in processor-sharing networks, Performance Evaluation 49 (2002) 193–209.
[8] J.W. Roberts, A survey on statistical bandwidth sharing, Computer Networks 45 (2004) 319–332.
[9] J. Leino, J. Virtamo, Optimal load balancing in insensitive data networks, in: Proceedings of QoS-IP, 2005, pp. 313–324.
[10] J. Leino, J. Virtamo, Insensitive traffic splitting in data networks, in: Proceedings of the 19th International Teletraffic Congress (ITC19), 2005, pp. 1355–1364.
[11] T. Bonald, M. Jonckheere, A. Proutière, Insensitive load balancing, SIGMETRICS Performance Evaluation Review 32 (2004) 367–377.
[12] M. Jonckheere, J. Virtamo, Optimal insensitive routing and bandwidth sharing in simple data networks, SIGMETRICS Performance Evaluation Review 33 (2005) 193–204.
[13] L. Massoulié, J.W. Roberts, Arguments in favour of admission control for TCP flows, in: Proceedings of the 16th International Teletraffic Congress (ITC16), 1999, pp. 33–44.
[14] T. Bonald, A. Proutière, On performance bounds for the integration of elastic and adaptive streaming flows, SIGMETRICS Performance Evaluation Review 32 (2004) 235–245.
[15] T. Bonald, A. Proutière, On performance bounds for balanced fairness, Performance Evaluation 55 (2004) 25–50.
[16] M.B. Combe, O.J. Boxma, Optimization of static traffic allocation policies, Theoretical Computer Science 125 (1994) 17–43.
[17] H. Tijms, Stochastic Models: An Algorithmic Approach, John Wiley & Sons, New York, 1994.
[18] E. Altman, Applications of Markov decision processes in communication networks: a survey, in: Handbook of Markov Decision Processes, Kluwer, Dordrecht, 2002, pp. 489–536.

Juha Leino received his M.Sc. degree in engineering physics from Helsinki University of Technology in 2003. Currently he is a Ph.D. student in the Networking Laboratory of Helsinki University of Technology. His research interests include queueing theory and performance analysis of data networks.

Jorma Virtamo received the M.Sc. (Tech) degree in engineering physics and D.Sc. (Tech) degree in theoretical physics from the Helsinki University of Technology, in 1970 and 1976, respectively. In 1986, he joined VTT Information Technology, where he led a teletraffic research group, and became a Research Professor in 1995. Since 1997 he has been a Professor in the Networking Laboratory of Helsinki University of Technology. His current research interests include queueing theory and performance analysis of the Internet, ad hoc networks and peer-to-peer networks.

Computer Networks 50 (2006) 1069–1085 www.elsevier.com/locate/comnet

Capacity planning in IP Virtual Private Networks under mixed traffic

Raffaele Bolla *, Roberto Bruschi, Franco Davoli

DIST - Department of Communications, Computer and Systems Science, University of Genoa, Via Opera Pia 13, 16145 Genova, Italy

Available online 6 October 2005

Abstract

A new mechanism for capacity planning of a Virtual Private Network carrying both Quality of Service (QoS) and Best-Effort (BE) IP traffic is introduced. The link bandwidth allocation in the network is based on a hybrid combination of analytical and simulation models. The allocation tool works in three phases: phases 1 and 2 deal with the assessment of the capacities needed to satisfy requirements for QoS and BE traffic individually; the effects of multiplexing gains arising from traffic mixing are taken into account in phase 3. There are two analytical parts, based, respectively, on a simplified version of call-level traffic models derived from the teletraffic theory of circuit-switched networks, and on an algorithmic procedure to compute the distribution of average values of BE traffic flows. The same algorithm is used inside a fast simulation model, adopting a fluid representation of TCP traffic aggregates. The tool is tested both on simple networks and on a more complex real national backbone. © 2005 Published by Elsevier B.V.

Keywords: Virtual private networks; Network planning; Multiservice networks; QoS; Hybrid models

1. Introduction The Internet at large has not yet seen a dramatic increase in Quality of Service (QoS) traffic. As a matter of fact, though rather sophisticated technologies exist for QoS support over IP networks, their full exploitation is not generally felt as a must by many Service Providers and, if packet loss, time delay or bandwidth constraints have to be maintained for some user applications, their satisfaction is most often achieved by over-provisioning [1].

* Corresponding author. 1389-1286/$ - see front matter © 2005 Published by Elsevier B.V. doi:10.1016/j.comnet.2005.09.012

On the other hand, an attractive playground for experimenting and offering truly mixed Best-Effort (BE) and QoS traffic environments with a customer-limited scope may be that of Virtual Private Networks (VPNs) or Enterprise Networks. Indeed, VPNs have become one of the most interesting "new" services, which involves potentially all traffic types and has a lot of customer-attractive aspects; as such, it has received the attention of both the standard bodies and the scientific community (see, among others, [2–7]). In the last decade a wide range of network architectures and technologies has been proposed (and used) to support this kind of service. Today, most VPNs are realized by using "layer 2" (Frame Relay or, in a broad interpretation of layer 2, ATM and

MPLS) and layer 3 technologies (IP over IP, GRE, IPSEC) [8]. But it should be noted that the same type of service may be built also by using layer 1 technologies (such as SDH or WDM) [9]. In particular, we refer to the new layer 1 protocols, able to guarantee bandwidth allocations on the network links with a minimum acceptable level of granularity and efficiency, e.g., the SDH Generic Framing Procedure (GFP) [10], which provides leased links with a bandwidth allocation up to 10 Gbps and a minimum granularity of about 1 Mbps (with this extension, one may indeed consider SDH as another layer 2 technology). All these technological solutions (layer 1, 2 and 3) can be used to realize VPNs with guaranteed bandwidth allocations and support for QoS algorithms (see, e.g., [11,12]). In this paper, the bandwidth dimensioning of a fully integrated data and multimedia VPN with heterogeneous traffic patterns and requirements is approached, and a possible integrated tool for its solution, mixing analytical and simulation techniques, is proposed and analyzed. The metric considered in the dimensioning is throughput only (we do not explicitly consider delay or delay jitter constraints, nor pricing issues). The paper extends the formulation and the results of [13]. The planning/design phase is always a very complex and difficult task in data networks and integrated services packet networks [14–16]. It assumes a specific role in this context, where it has to be solved a number of times, at least at each new VPN's activation upon a customer's request. As to the need for precise sizing of a VPN, it can be noted that, nowadays, backbone and local networks are often characterized by capacities so high as to be heavily under-utilized: as reported by the traffic measurements of many Internet Service Providers (ISPs), the bandwidth occupation peaks are about 10% of the overall available capacity.
But, in spite of all that, bandwidth on-demand is still quite an expensive resource, which makes the optimal dimensioning for VPN planning a strongly topical issue. We propose a set of analytical and simulation-based mechanisms and tools to optimize the link bandwidth allocations for fully integrated data and multi-service IP VPNs. The simulation part concerns mainly Best-Effort (TCP) services and is based on the fluidic TCP traffic model that has been introduced in [17]. This type of approach allows us to impose very specific and accurate constraints on the traffic performance, but it requires the knowledge of the traffic matrix, in terms of mean generation rates per traffic class. The proposed mechanism could be quite easily extended to provide complete VPN planning (including the topology design). In the scientific literature, some different approaches can be found to this kind of problem. One of the most interesting is that based on the "hose" model, which has been proposed and recently developed [18,19]. This technique does not need a complete traffic matrix specification among the connected customer nodes. As underlined in [19], the hose model is characterized by some key advantages with respect to the others (such as the ease of specification, the flexibility or the multiplexing gain), but, especially in a static capacity provisioning environment, this kind of model might over-dimension the network resources. Other approaches, based on different types of technologies, can be found in [20–22], among others. The rest of the paper is organized as follows. The main hypotheses and the basic characteristics of the problem and scenario are summarized in Section 2. Section 3 reports the traffic models applied in the optimization procedure, while in Section 4 the proposed dimensioning mechanism is described. Section 5 shows some numerical results obtained from the tool's application, while Section 6 reports the conclusions. 2. Network, traffic and problem description In a multi-service environment, it is reasonable to suppose that the network can carry at least two main sets of traffic: a first one composed of all multimedia streams generated by applications that require QoS constraints (e.g., VoIP, videoconference, or similar), indicated as QoS traffic in the following; another set that groups together all the standard best effort data traffic (WWW, e-mail, FTP, remote login, etc.). We assume that the bandwidth is allocated to a VPN statically and that it is not shared with other VPNs.
To achieve the desired end-to-end service level for all traffic classes, we suppose that the ISP provides the needed set of infrastructure for QoS support on its backbone and, specifically, that this is achieved within a Differentiated Services approach. In fact, as is well described in [11], the DiffServ environment suits VPN applications very well for many and different reasons; first of all, DiffServ handles traffic aggregates and it is able to differentiate the service levels per VPN and/or per traffic types. The DiffServ network architecture aims at satisfying common-class QoS requirements, by

controlling network resources at the nodes (Per Hop Behavior, PHB). This is done by setting suitable queue service disciplines at the Core Routers (CR), by limiting the amount of traffic with QoS requests in the network, by assigning different priorities, and by performing Call Admission Control (CAC) at the Edge Routers (ER) of the provider's network. Thus, to achieve the desired service level, the provider's Core Routers need to apply suitable queue management mechanisms that forward QoS traffic packets with a certain priority, or, better, a different PHB, with respect to BE ones. This differentiation should be done on two levels: among VPNs, and among different traffic types inside each VPN. In this sense, we can suppose the VPN bandwidth to be allocated by using global VPN MPLS pipes; so, our proposed tool tries to find the optimal size of these pipes. Inside the VPN, we suppose each QoS flow, i.e., QoS traffic streams between the same pair of ISP border routers, to be activated dynamically and subjected to Call Admission Control (CAC). More specifically, as far as the QoS traffic is concerned, we suppose that each stream has to ask for bandwidth during the setup phase and, if it is accepted (by CAC), the allocated bandwidth is in principle completely dedicated to it. To realize this type of allocation, the QoS streams are treated as Connection Oriented (CO) flows and routed by using a static routing algorithm. This means that every time there is a new QoS stream activation request, a static path is computed and the availability of the requested bandwidth along this path is verified. If the bandwidth is available, it is reserved along the whole path (e.g., by a second level MPLS path setup) and the traffic is sent along that route. The BE traffic uses all the VPN bandwidth not utilized by the QoS streams and it is supported by datagram adaptive routing, where the link weights are computed as the inverse of the bandwidth available on the link for BE.
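The BE routing rule just described (link weight equal to the inverse of the bandwidth left for BE) can be illustrated with a minimal Python sketch. The function names, the dictionary-based link representation and the choice of Dijkstra's algorithm are assumptions made for illustration only; the text does not specify which adaptive routing protocol is used.

```python
import heapq

def be_link_weights(capacity, qos_used):
    """Weight of each directed link (u, v) = 1 / bandwidth currently left for BE."""
    weights = {}
    for link, cap in capacity.items():
        free = cap - qos_used.get(link, 0.0)
        # Assumption: a link with no residual bandwidth is treated as unusable.
        weights[link] = float("inf") if free <= 0 else 1.0 / free
    return weights

def shortest_path(weights, src, dst):
    """Plain Dijkstra over the weighted directed links; returns a node list or None."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```

With this rule, a direct link that QoS streams have partly filled can become less attractive for BE than a longer path over lightly loaded links, which is the adaptive behavior the paragraph refers to.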
In our mechanism we provide a bandwidth threshold for the QoS traffic on each link, which prevents the QoS streams from occupying all the available resources of the VPN. In summary, our planning problem is the following. We assume knowledge of the QoS traffic matrix, in terms of average stream activation request rates, durations and bandwidth requirements for each source–destination (SD) pair (we will refer to a QoS "class" to indicate calls with a given SD pair and bandwidth requirement). The BE traffic matrix contains the average arrival rate and the

average length of data bursts (blocks of data with random size, representing all the data to be carried by a single TCP connection, which will be described in more detail in Section 3) for each SD pair. Moreover, we define QoS constraints at call level, in terms of the maximum blocking probability for the QoS traffic stream requests. We fix a constraint with respect to the BE traffic in terms of the maximum number of "congestion periods" (to be precisely defined in Section 4) of a certain duration, acceptable during a day. The latter is defined as a stochastic constraint (i.e., the maximum number with a certain probability). Given these initial data and the topology of the VPN, we find the minimum value of the VPN link capacities and, within these, of thresholds on the volume of QoS traffic, which together allow the satisfaction of the constraints. Note that the QoS and BE traffic flows are modeled at the edge nodes of the ISP's network as aggregates: this means that the same node can generate multiple QoS or BE flows. In particular, we think of an aggregate as composed of all flows that traverse the network between the same pair of nodes. The QoS flows are generated at each source with exponentially distributed and independent interarrival and service times; the BE ones follow exponentially distributed and independent inter-arrival times, whereas the dimension of the data blocks is modeled as a Pareto-distributed random variable. 3. The BE traffic models While the guaranteed QoS traffic can be reasonably described as Continuous Bit Rate (CBR) or Variable Bit Rate (VBR) streams, which are anyway modeled at the flow level as requests for "bandwidth units" within a bandwidth pipe, it is quite difficult to capture the most important characteristics of BE traffic. This difficulty derives almost entirely from the presence of TCP, which is characterized by an adaptive and heavily non linear congestion control, as transport protocol for data traffic.
On the other hand, TCP is a fundamental element for the evaluation of BE performance and it cannot be neglected. TCP behavior has been studied for many years; a good deal of models have been proposed and can be found in the scientific literature. However, most of the proposals describe the congestion control mechanism in detail, and they generally have a limited scalability. Moreover, traffic models play a strategic role inside the optimization procedures used in

network planning and dimensioning, where they often need to be applied repeatedly: they usually determine in large part the scalability level and the accuracy of the whole procedure. Thus, we have chosen to adopt simple traffic models that can assure high scalability levels. In particular, we use two similar representations, derived from [17], which are both based on a fluid approximation. This kind of approximation permits realizing TCP traffic models [23–25] able to guarantee a better scalability than the ‘‘classical’’ ones (i.e., those based on finite-state Markov chains; see, e.g., [26,27]), as it represents the behavior of TCP connections without explicitly taking into account the dynamics at the packet level. The very simple framework adopted can capture the aggregate behavior of network traffic between an ingress–egress router pair, without describing the performance of each single TCP connection. The first model is substantially realized as a simulator (with flow birth and death events) that also allows observing the transient behavior of some network performance indexes (namely, throughput and link bandwidth utilizations). The other is an algorithmic procedure (based on the same principle adopted in the simulator), able to obtain an estimation of the average values of the performance indexes with a very low computational effort. They will be referred to as the ‘‘simulative model’’ and the ‘‘stationary model’’, respectively. Both are based on a simple aggregate fluid approximation that describes all the data exchanges between the same source and destination pair as a single fluid: in other words, each flow represents a set of aggregate TCP connections with the same path in the network. 3.1. The simulative model The simulative model represents the incoming traffic as blocks of data with random size, which become available (for transmission) at a source node at random instants. 
Each block (or ‘‘burst’’) represents all the data to be carried by a single connection. The details of this type of traffic source can be found in Appendix A. Each source–destination pair is modeled as a tank, and the network links represent the pipes, which drain the tank, while the data blocks fill it. Where more than one ‘‘fluid data’’ source (either coming directly from a tank or from a previous link) has to share a single pipe, this sharing is guided by a specific law, which, roughly speaking, divides the

capacity of the link in proportion to a sort of "pressure" of the crossing flows. This source pressure is simply given by the traffic offered load (in terms of bits per second) of the aggregate flow. The source–destination path of each flow is decided by the routing algorithm. The rate (in bits per second) associated to each aggregate flow is fixed by the link along the path, which assigns the smallest available bandwidth to that specific flow (the bottleneck link). These rates are always dynamic ones; they can change when a tank in the network becomes empty, when it is empty and receives a block of data, or when the routing algorithm changes one or more paths. Thus, the core procedure is represented by the algorithm to compute these rates. This algorithm is substantially based on the max–min rule described in [28]. It is worth noting that, though the max–min sharing is known not to hold when considering the behavior of individual TCP connections, it can be used as a reasonable (and fast) approximation to represent fairly well the way the bandwidth is subdivided among aggregates of TCP connections between the same SD pairs. This characteristic has been confirmed by numerous simulation experiments, over a wide range of operating conditions [17]. This kind of model results in a very fast simulator, which gives the possibility of computing the values of performance indexes between consecutive significant events (as specified above) numerically in quite a precise way, and also to follow the time evolution of these indexes. Moreover, any routing algorithm can be used with it, without modifying anything in the system. For the notational aspects, we indicate with $N$ the set of the nodes, with $P$ the set of flows $p$ between each node pair $(i, j)$: $i \neq j$, $i, j \in N$. The instants $t_k$, $k = 0, 1, 2, \ldots$
, are those of occurrence of the significant events (i.e., the instants when a fluid buffer changes from filled to empty or vice versa, or when there is a change in a BE routing table), and $r_{t_k}^{(p)}$ is the rate associated to the flow $p \in P$ during the period $[t_k, t_{k+1})$. The model acts at each event instant $t_k$, $k = 1, 2, 3, \ldots$, by re-computing the algorithm described in Appendix B, obtaining the value of rate $r_{t_k}^{(p)}$ for every aggregate flow $p \in P$. 3.2. The stationary model This model has been studied and proposed with the aim of obtaining the average link bandwidth utilizations in a very fast way. It is based on the same max–min criterion used in the simulative model, but it is realized as a simple "one-shot" algorithmic procedure. The latter derives from the simple consideration that the average bandwidth occupancy of an aggregate flow should be equal to or less than its mean offered load. In particular, a flow that does not cross any bottleneck links (i.e., the links without enough available bandwidth) is expected to have an average throughput almost equal to its mean offered load, whereas the flows that cross at least one bottleneck link should be limited to a smaller average throughput value. In this last case, the average throughput value is thought to be limited by the most restrictive bottleneck link on the flow path, and the resources of shared links are divided among the flows crossing them, according to the offered load "pressure", as in the simulative model. It must be noted that the model is used in a first rough dimensioning case, up to the satisfaction of a given link utilization limit, which will be detailed in the next section. Thus, these average throughput values are calculated by applying only once (at each iteration of the dimensioning procedure) the same max–min algorithm used in the simulative model and reported in Appendix B, with all the BE traffic sources active at the initial step and by setting the $R_{max}^{(p)}$ parameter equal to the average offered load of the aggregate flow $p \in \tilde{P}$ (with reference to the notation of Appendix B). 4. The dimensioning mechanism The proposed dimensioning mechanism is composed of three different parts: the first two (QoS module and BE module) are essentially used to identify a set (one for each link) of small capacity ranges, inside which the optimal solution should be contained; the last one finds the final solution by searching the optimal assignments inside these reduced ranges.
More specifically, the QoS module dimensions the network by considering only the QoS traffic, while the other (BE module) does the same by considering only the BE traffic. The QoS module finds the link capacities needed to maintain the blocking probability of QoS stream requests under a certain threshold. By using the stationary model, the BE module finds the link capacities needed to maintain the maximum (over the links) average utilization around a reference value (e.g., 80%), which should be the maximum link utilization needed to maintain an acceptable performance level for the TCP traffic flows. From these two modules, which can operate in parallel, we obtain two capacity values for each link; their sum represents an upper bound on the link bandwidth dimensioning, when the two traffic types are carried by the same network. In this case, we neglect the multiplexing gains generated by the partial superposition of the two traffic types. Consider now the VPN dimensioned with the above described upper bound (for each link, the sum of the two capacities found by the two modules). First of all, note that the capacity values found for the QoS traffic are also the QoS thresholds, i.e., the maximum capacities usable by QoS traffic on the links. This means that the network considered correctly respects the performance constraints on the QoS traffic, i.e., the network is correctly dimensioned with respect to it. However, one main problem remains: we have over-provisioned for BE traffic. There are two reasons why this happens. First, this kind of traffic can utilize both the capacity computed by the BE module and any amount of bandwidth temporarily left unused by the QoS traffic. Second, the constraint imposed on the BE traffic by the BE module is quite rough and conservative. To overcome this drawback, the final module first computes the occurrence distribution histogram of the capacity used by QoS traffic per link; this is done by simulation for each link of the network dimensioned by the QoS module. By using this result, we can obtain the value of each link capacity occupied by QoS traffic for more than d% of the time (e.g., 90%). For each link, we add this value to the one computed by the BE module to identify a bandwidth lower bound. At this point, we have the capacity ranges for the final solution.
Thus, we define a more precise performance constraint for BE traffic: we want to obtain a minimum capacity dimensioning that assures a maximum of v BE congestion periods per day with probability b, where a congestion period is considered as each time interval larger than s seconds, in which at least one network link exhibits 100% utilization value for the BE traffic. Finally, we operate by considering the two traffic types active and by using the two bounds described above and the QoS threshold (computed by the QoS module) to find the minimum capacities needed to assure the last BE constraint. The reduced range is required because of the relatively high complexity of this last computation. Let us now describe in detail the

operation of the three modules that compose the proposed dimensioning mechanism. 4.1. QoS traffic dimensioning As previously described, we consider the QoS traffic as composed of streams, which are activated dynamically at the edge nodes of the VPN and which are subject to CAC at those nodes. The hypotheses are that each stream is routed as a virtual circuit (the path is decided at the setup) and during the set-up phase it asks for an amount of bandwidth. If this bandwidth is available along the source–destination path identified by the routing, the stream is accepted in the VPN, otherwise it is refused. Each stream in progress occupies all its assigned capacity. Indeed, this is equivalent to assuming the following conditions to hold: (i) CBR traffic, (ii) shortest path static routing, (iii) throughput constraints only (no delay, jitter, etc.), (iv) peak rate admission control. However, it is worth noting that, in the case of VBR traffic or in the presence of other QoS constraints beside throughput, we implicitly suppose the QoS problem at the packet level to be already solved by the node schedulers. In other words, we suppose to operate at the call level, over a schedulable region that ensures the satisfaction of packet-level constraints, and the bandwidth need expresses, in these cases, a requirement in terms of equivalent bandwidth [29]. Thus, we model only the dynamics of stream activations and completions, and we take into account quality constraints at the stream level only. With this approach, as far as the QoS traffic is concerned, we can use a product-form loss network model identical to that described in [29]. We consider a VPN with $J$ links that supports $K$ stream classes. Every class-$k$ stream has $\lambda_{QoS}^{(k)}$ [stream/s] arrival rate, $\mu_{QoS}^{(k)}$ [s$^{-1}$] departure rate, a bandwidth requirement $b^{(k)}$ and a route $R^{(k)}$, and the $k$-class offered load is

$$\rho^{(k)} = \lambda_{QoS}^{(k)} / \mu_{QoS}^{(k)}. \tag{1}$$

Note that, owing to the use of a static shortest path routing, we have only one path for each source–destination pair. Moreover, we suppose the amounts of required bandwidth belong to a finite size set B = {b(r), r = 1, . . . , R} of admissible sizes. The stream arrival and departure instants are supposed to be generated by a Poisson process (both arrival and duration times follow an exponential distribution). An arrival class-k request is admitted

in the network if and only if there are at least $b^{(k)}$ free capacity units along each link of $R^{(k)}$. Our objective in this context is to find the minimum capacity assignment $\tilde{C}_j^{QoS}$, $\forall j = 1, \ldots, J$, needed to respect the following constraints on the blocking probability:

$$Pb^{(k)} \le \overline{Pb}^{(k)}, \quad k = 1, \ldots, K, \tag{2}$$

where $Pb^{(k)}$ is the blocking probability of the streams of class $k$ and $\overline{Pb}^{(k)}$ is its upper-bound constraint. The equilibrium distribution of this loss network and the corresponding blocking probabilities can be computed by using a stochastic knapsack problem solution. However, this product form computation of $Pb^{(k)}$, $k = 1, \ldots, K$, for arbitrary topologies requires the solution of an NP-complete problem (see [29]), and it can be computationally heavy in our context (we have to use it inside the minimization procedure). To find $\tilde{C}_j^{QoS}$, $\forall j = 1, \ldots, J$, we use an approach that follows the same philosophy as the whole mechanism: at first we apply an approximate fast analytical method to find a starting temporary solution not too far from the optimal one, and finally we improve it by using a simulation. The first step is to define a simple method to find $Pb^{(k)}$, $k = 1, \ldots, K$, given the $C_j^{QoS}$, $j = 1, \ldots, J$. The idea is to use an approximate computation link by link by supposing, in principle, the setup requests and CAC functions to be realized for each link along the stream path. For the sake of simplicity, let us suppose there is only one rate for all classes, i.e., $b^{(k)} = 1$ bandwidth unit, $\forall k = 1, \ldots, K$. Indicating by $\bar{\rho}_j$ the total offered traffic to link $j$, and expressing $C_j^{QoS}$ in bandwidth units, the blocking probability per link can be computed by the Erlang B formula as

$$Pb_j = ER[\bar{\rho}_j, C_j^{QoS}] = \frac{(\bar{\rho}_j)^{C_j^{QoS}} / C_j^{QoS}!}{\sum_{s=0}^{C_j^{QoS}} (\bar{\rho}_j)^{s} / s!}. \tag{3}$$
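As an illustration of how Eq. (3) can be evaluated and inverted for a capacity, the following Python sketch uses the standard Erlang B recursion $B(0) = 1$, $B(c) = \rho B(c-1) / (c + \rho B(c-1))$, which gives the same value as the closed form while avoiding large factorials. The function names `erlang_b` and `min_capacity` and the `max_units` safeguard are illustrative assumptions, not part of the paper's tool.

```python
def erlang_b(offered_load, capacity):
    """Erlang B blocking probability for `capacity` bandwidth units (Eq. (3))."""
    b = 1.0  # blocking with zero capacity
    for c in range(1, capacity + 1):
        b = offered_load * b / (c + offered_load * b)
    return b

def min_capacity(offered_load, target_blocking, max_units=100000):
    """Smallest number of bandwidth units whose Erlang B blocking is <= target."""
    if target_blocking >= 1.0:
        return 0
    b = 1.0
    for c in range(1, max_units + 1):
        b = offered_load * b / (c + offered_load * b)
        if b <= target_blocking:
            return c
    # Assumption: give up beyond max_units rather than loop forever.
    raise ValueError("target blocking not reachable within max_units")
```

For example, `min_capacity(rho_j, pb_j)` would return a single-rate capacity for link j once a per-link blocking constraint `pb_j` is available.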

Eq. (3) may be used to determine the $\tilde{C}_j^{QoS}$, $\forall j = 1, \ldots, J$, if we are able to transform the end-to-end probability constraints (2) into the following link constraints:

$$Pb_j^{(k)} \le \overline{Pb}_j^{(k)}, \quad k = 1, \ldots, K. \tag{4}$$

So, now the problem is to find $\overline{Pb}_j^{(k)}$, $k = 1, \ldots, K$ (or $\overline{Pb}_j$ in the single rate case), and we solve it by fixing

$$\overline{Pb}^{(k)} = \sum_{j \in R^{(k)}} \overline{Pb}_j. \tag{5}$$

This choice represents an upper-bound in the single rate case, and in general it results in a capacity assignment usually rather close to the optimum one. Then, we use the following algorithm, by which we "distribute" the $\overline{Pb}^{(k)}$ probabilities along the links. Let us introduce some notations:

• $\overline{Pb}_j$ is the maximum blocking probability constraint associated to link $j$;
• $R_{res}^{(k)}$ is the set of links in the residual path for class $k$;
• $Pb_{res}^{(k)}$ is the residual blocking probability for class $k$;
• $Pb_{eq}^{(k)}$ is the value of the blocking probability of class $k$ if $\overline{Pb}^{(k)}$ were equally spread on the path links;
• $\tilde{K}$ is the set of the classes $k$ with $R_{res}^{(k)} = \emptyset$.

The proposed algorithm works as follows:

1. Let $Pb_{res}^{(k)} = \overline{Pb}^{(k)}$, $k = 1, \ldots, K$, $\tilde{K} = \emptyset$ and $R_{res}^{(k)} = R^{(k)}$, $k = 1, \ldots, K$.
2. Compute $Pb_{eq}^{(k)}$ as $Pb_{eq}^{(k)} = Pb_{res}^{(k)} / |R_{res}^{(k)}|$, $\forall k \notin \tilde{K}$ ($|X|$ denotes the cardinality of the set $X$).
3. Choose the class $\tilde{k}$ (the class with the most restrictive constraint): $\tilde{k} = \arg\min_k \{Pb_{eq}^{(k)}\}$, $\forall k \notin \tilde{K}$, and $\tilde{K} \leftarrow \tilde{K} \cup \{\tilde{k}\}$.
4. For each link $j$ that belongs to $R_{res}^{(\tilde{k})}$:
   (a) $\overline{Pb}_j = Pb_{eq}^{(\tilde{k})}$,
   (b) for each traffic class $k \notin \tilde{K}$, if the link $j$ belongs to $R_{res}^{(k)}$ then do: $R_{res}^{(k)} \leftarrow R_{res}^{(k)} - \{j\}$ and $Pb_{res}^{(k)} \leftarrow Pb_{res}^{(k)} - \overline{Pb}_j$.
5. If $|\tilde{K}| = K$ then stop, else return to 2.
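The five steps above can be sketched in Python as follows. This is only an illustrative reading of the procedure: the function name, the dictionary inputs, and the decision to mark a class as completed as soon as its residual path becomes empty (which follows the stated definition of the completed-class set and avoids a division by zero in step 2) are assumptions.

```python
def distribute_blocking(routes, pb_max):
    """Spread end-to-end blocking constraints over the links.

    routes: {class_id: iterable of link ids}   (the route of each class)
    pb_max: {class_id: end-to-end blocking constraint}
    Returns {link_id: per-link blocking constraint}.
    """
    pb_res = dict(pb_max)                                   # step 1
    r_res = {k: set(links) for k, links in routes.items()}
    done = {k for k, links in r_res.items() if not links}
    pb_link = {}
    while len(done) < len(routes):                          # step 5
        # Step 2: equal share of the residual constraint over the residual path.
        pb_eq = {k: pb_res[k] / len(r_res[k]) for k in routes if k not in done}
        # Step 3: most restrictive class.
        k_star = min(pb_eq, key=pb_eq.get)
        done.add(k_star)
        # Step 4: fix its links and update the other classes crossing them.
        for j in list(r_res[k_star]):
            pb_link[j] = pb_eq[k_star]
            for k in routes:
                if k not in done and j in r_res[k]:
                    r_res[k].discard(j)
                    pb_res[k] -= pb_link[j]
        r_res[k_star].clear()
        # Assumption: classes whose residual path is now empty are completed too.
        done |= {k for k in routes if not r_res[k]}
    return pb_link
```

By construction, the per-link values summed along each route never exceed the end-to-end constraint of the corresponding class.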

By using the $\overline{Pb}_j$, $j = 1, \ldots, J$, computed by the above algorithm, to find a suboptimal value of the link capacities $\hat{C}_j^{QoS}$, $j = 1, \ldots, J$, we can adopt (3) in the case of a single class rate, and the per class link blocking probabilities resulting from a stochastic knapsack in the multirate case. These values result in upper-bounds in the single rate case (this can be demonstrated by using the "Product Bound" theorem [29]) and in general they are rather close to the optimum quantities if the constraints are very restrictive ($\le 0.01$) and the path's maximum length is not too large. To obtain the final result, we start with the $\hat{C}_j^{QoS}$, $j = 1, \ldots, J$, network configuration and apply an optimization procedure by using a simulative approach. The stream births and deaths

are generated to find the exact (within a given confidence interval) end-to-end blocking probabilities, and if they are more restrictive than the constraints the capacities on the links are reduced (or increased if they do not respect the constraints) and the simulation is re-started in an iterative procedure. This kind of simulation is quite fast and the number of iterations needed is very small, so a near-optimal solution can be found in a very short time. 4.2. BE traffic dimensioning Allocating network resources for BE is quite a complex task, because this kind of traffic has no QoS guarantees and it exhibits a very elastic and adaptive behavior (caused by the TCP congestion control): the average BE performance can improve according to the available resources. However, our aim is not to maximize the BE performance, but to find a fair compromise between the service level and the allocated resource quantity. Thus, as the BE performance heavily depends on the available bandwidth of the links traversed, we have decided to perform a first rough network dimensioning for BE traffic in this module, by imposing a high average link utilization, but quite distant from the network saturation point. In particular, since different simulation results and measurements [30] have shown that for a link loaded with TCP traffic composed of many TCP connections, when the global offered load increases above 80%, the performance of each single connection's end-to-end throughput decreases very quickly, we have chosen to impose an average link utilization equal to this experimental threshold. This phase of the mechanism is substantially based on the stationary traffic model reported in Section 3.2, and it performs a first BE traffic dimensioning, by trying to find the minimum capacities $\tilde{C}_j^{BE}$, $\forall j = 1, \ldots, J$, for which the maximum average throughput is around 80%. The flow model is used inside a descent procedure, which stops when the bandwidth on each network link is such that the average throughput satisfies the imposed constraint. For the notational aspects we define:

• $C_j^{BE(i)}$ the bandwidth assignment to link $j$ at step $i$;
• $L^i$ the set of links traversed by the BE traffic at step $i$;
• $g^{BE}$ the minimum bandwidth granularity;
• $\rho_j^i$ the throughput on link $j$ at step $i$, measured by the simulative process;
• $\gamma$ the required link bandwidth utilization (we always use 0.8 in this paper);
• $\Delta_j^i$ the maximum bandwidth tolerance range for link $j$ at step $i$.

The bandwidths of all the network links are initialized with a value above the sum of all the BE traffic offered loads. At the generic step $i$ the algorithm works as follows:

1. $\forall j \in L^i$ do $\Delta_j^i = g^{BE} / C_j^{BE(i)}$.
2. If $\rho_j^i < \gamma - \Delta_j^i$ then $C_j^{BE(i+1)} = C_j^{BE(i)} - g^{BE}$.
3. If $\rho_j^i > \gamma + \Delta_j^i$ then $C_j^{BE(i+1)} = C_j^{BE(i)} + g^{BE}$.
4. If $\gamma - \Delta_j^i < \rho_j^i < \gamma + \Delta_j^i$ $\forall j \in L^i$ then stop.
5. Run another simulation with the new bandwidth allocation and return to step 1.

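The descent procedure of Section 4.2 can be sketched as the following Python loop. The callable `measure_utilization` is a hypothetical stand-in for the stationary flow model of Section 3.2 (it must return, for the links carrying BE traffic, the average throughput normalized to the link capacity); the function name, the `max_steps` safeguard and the rule of never shrinking a link below one granularity unit are assumptions added so that the sketch is self-contained.

```python
def dimension_be(initial_capacity, measure_utilization, granularity,
                 gamma=0.8, max_steps=10000):
    """Adjust link capacities until every utilization is within gamma +/- delta."""
    cap = dict(initial_capacity)
    for _ in range(max_steps):
        rho = measure_utilization(cap)           # step 5 (and the initial run)
        changed = False
        for j, c in list(cap.items()):
            if j not in rho:                     # link not traversed by BE traffic
                continue
            delta = granularity / c              # step 1
            if rho[j] < gamma - delta and c > granularity:
                cap[j] = c - granularity         # step 2: link is over-dimensioned
                changed = True
            elif rho[j] > gamma + delta:
                cap[j] = c + granularity         # step 3: link is under-dimensioned
                changed = True
        if not changed:                          # step 4: all links within tolerance
            return cap
    raise RuntimeError("no stable allocation found within max_steps")
```

A toy usage, assuming fixed per-link loads instead of the real flow model: `dimension_be({"l1": 20}, lambda cap: {j: 8 / c for j, c in cap.items()}, 1)` shrinks the link step by step until its utilization enters the tolerance band around 0.8.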
4.3. The final dimensioning As already stated in Section 2, the output of the previous two steps gives us:

• An upper-bound capacity assignment represented by the values

$$C_j^{max} = \tilde{C}_j^{QoS} + \tilde{C}_j^{BE}, \quad \forall j = 1, \ldots, J. \tag{6}$$

• The CAC thresholds for the QoS traffic (i.e., $\tilde{C}_j^{QoS}$, $\forall j = 1, \ldots, J$).

We recall that the capacity assignment (6) represents an upper bound, because it corresponds to a very conservative configuration with respect to BE traffic. Thus, the mechanism runs a simple network simulation, whose duration is fixed by a stabilization algorithm (i.e., a Student's t-based mechanism with a 95% confidence range of 1%), by generating only the QoS flows' birth and death events and, during this simulation, it builds an occurrence distribution histogram for the bandwidth occupancy on each link, i.e., it computes the percentage of time $D_j(C)$ during which link $j$ has $C$ bandwidth units occupied by QoS traffic. This information gives us the possibility to identify the capacity $\tilde{C}_{j,n}^{QoS,min}$, below which the QoS utilization of link $j$ remains for n% of the time, i.e.,

$$\tilde{C}_{j,n}^{QoS,min} = \min \left\{ C : \sum_{i=0}^{C} D_j(i) > n \right\}. \tag{7}$$
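Eq. (7) amounts to reading a percentile off the occupancy histogram. A minimal Python sketch, assuming the histogram is available as a list whose entry at index C is the percentage of time with exactly C units occupied (the function name and the fallback for a threshold that is never exceeded are illustrative assumptions):

```python
def qos_min_capacity(occupancy_percent, n):
    """Smallest C whose cumulative occupancy percentage exceeds n (Eq. (7))."""
    total = 0.0
    for c, share in enumerate(occupancy_percent):
        total += share
        if total > n:
            return c
    # Assumption: if n is never exceeded (e.g. n = 100), return the largest level.
    return len(occupancy_percent) - 1
```

The strict inequality mirrors Eq. (7): a cumulative share exactly equal to n does not yet qualify.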

Thus we consider

$$C_{j,n}^{min} = \tilde{C}_{j,n}^{QoS,min} + \tilde{C}_j^{BE} \qquad (8)$$

as the lower bound assignment, and

$$G_{j,n}^{max} = \tilde{C}_j^{QoS} - \tilde{C}_{j,n}^{QoS,min} \qquad (9)$$

as the maximum gains that we can obtain when the two traffic types are carried on the same network. In fact, it is reasonable to think that in a network dimensioned with the $C_{j,n}^{min}$ capacities, when the QoS bandwidth occupancy on one or more network links begins to utilize more than $\tilde{C}_{j,n}^{QoS,min}$ bandwidth units, the BE connections that cross those links might not receive the minimum capacities needed for an efficient average operation. Moreover, as the QoS traffic dynamics are generally very slow with respect to BE traffic dynamics, these periods of resource starvation for BE might be excessively long, and they might generate an appreciable deterioration of the user-perceived performance: most of the TCP connections active during such a period suffer packet drops or expired timeouts; so, they apply the congestion avoidance mechanism, lowering their throughput and increasing their transfer time. An example of this condition is shown in Fig. 1, which reports some measurements obtained by ns2 (network simulator 2, [31]) simulations.

The large difference between the dynamics of the BE and QoS traffic models is the main reason why the global simulation can be computationally very heavy. So, in this last step we have realized a sort of "smart" simulation tool, which operates in the following way. The algorithm is iterative and it executes one simulation at each step. During the general iteration step i, the main procedure simulates the activations and terminations of the QoS traffic streams. Only when the bandwidth occupied by the QoS traffic on at least one link j exceeds a threshold $T_j^{(i)}$ is the aggregate flow BE simulator activated, and the BE performance is monitored and measured. The BE simulator is stopped when all the QoS traffic bandwidth occupancies decrease below $T_j^{(i)}$. At step i = 0, we initialize $C_j^{(0)} = C_{j,n}^{min}$ and $T_j^{(0)} = \tilde{C}_{j,n}^{QoS,min}$.

Fig. 1. Bandwidth occupation obtained by means of ns2 simulation. The figure shows the decay in BE traffic performance (average throughput per TCP connection [kbps]), when the bandwidth left free by the QoS traffic goes below a certain threshold.

While the BE traffic is injected into the network, we collect the BE average bandwidth utilization on all the links every s seconds. Every time we find a utilization equal to 100% on a generic link j, we count a congestion event. If we collect more than v congestion events on link j every C hours (e.g., 24 h) with a probability greater than b, and if $C_j^{(i)} < C_j^{max}$, then we increase the bandwidth assignment on such a link by one bandwidth unit and we update the parameters as follows:

$$C_j^{(i+1)} = C_j^{(i)} + 1, \quad T_j^{(i+1)} = T_j^{(i)} + 1 \quad \text{and} \quad i \leftarrow i + 1.$$
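The update rule can be sketched as follows. This is a hedged illustration, not the authors' tool: the function names are hypothetical, the probability is estimated as the fraction of observed periods in which the event count exceeds v (an assumption), and the condition follows the wording of the text literally.

```python
def count_congestion_events(utilization_samples):
    """Count the samples (one every s seconds) that show 100% link utilization."""
    return sum(1 for u in utilization_samples if u >= 1.0)


def update_link(capacity, threshold, c_max, events_per_period, v, b):
    """Apply one update of C_j and T_j for a single link.

    events_per_period: congestion-event counts, one per observed period of C hours.
    Returns (new_capacity, new_threshold, changed).
    """
    # Assumed estimator: share of observed periods with more than v events.
    exceed_fraction = sum(1 for e in events_per_period if e > v) / len(events_per_period)
    if exceed_fraction > b and capacity < c_max:
        return capacity + 1, threshold + 1, True
    return capacity, threshold, False


print(count_congestion_events([0.7, 1.0, 0.95, 1.0]))
print(update_link(10, 6, 12, [12, 15, 11, 9], v=10, b=0.5))
```

The outer iteration would call `update_link` for every link after each simulation and stop once no link changes, which mirrors the stopping condition stated in the text.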

The dimensioning mechanism stops when the simulation yields stable values since the last link assignment change. Note that this final dimensioning procedure can be considerably sped up by parallelizing the network simulations: for example, it is possible to run several simulations in parallel to observe the network behavior (in terms of congestion events) during different periods C.

5. Numerical results

In this section the results obtained in the validation test sets are reported. All tests, except the last one, have been realized by using the networks in Figs. 2 and 11. In the last test we have applied the global method to a complex network, which closely matches a national backbone.

In the first test, we have used the traffic matrix reported in Table 1, applied to the network in Fig. 2. To easily verify the correctness of the results, we have utilized the same end-to-end blocking probability constraint for all four QoS traffic classes. Note that, for this reason and since the A–B, B–E, C–D and D–E links (with reference to Fig. 2) are traversed by traffic flows with identical offered loads, they should result in the same capacity allocation, and this is what happens at the end of the procedure.

Fig. 2. The network topology used in the first validation test sessions.

Table 1
QoS traffic matrix for the first test

Class ID k   Source node   Destination node   $k_{QoS}^{(k)}$ [streams/s]   $l_{QoS}^{(k)}$ [streams/s]   $b^{(k)}$ [Mbps]
1            A             F                  0.1                           0.01                          1
2            A             F                  0.1                           0.01                          2
3            C             F                  0.2                           0.002                         1
4            C             F                  0.2                           0.002                         2

Our first aim is to verify the QoS dimensioning procedure. Fig. 3 shows the blocking probabilities measured on the network, dimensioned both by the analytical procedure only and by the whole QoS phase (i.e., including the simulation-based refinement described at the end of Section 4.1). It can be seen that the difference between the two dimensioning results grows with the constraint value, and it is also quite evident that the approximate analytical solution always remains a lower bound. On the other hand, by observing Figs. 4 and 5, where the corresponding computed capacities versus different values of the desired blocking probability are reported, it can be noted that even a significant difference in blocking probabilities determines a relatively small difference in capacity allocation. This means that in very complex networks, the analytical procedure might be considered precise enough to determine the upper bounds and the QoS thresholds.

Fig. 3. Measured blocking probabilities versus imposed constraints with both analytical and complete QoS procedure capacity assignments.

Fig. 6. Average errors obtained by comparing the mean throughputs calculated by the stationary model and by ns2 simulations for different values of offered traffic load.

Fig. 4. QoS capacity assignment for links A–B, B–E, C–D and D–E, computed by the analytical part only and by the whole procedure, versus the blocking probability constraint.

Fig. 5. QoS capacity assignment for link E–F, computed by the analytical part only and by the whole procedure, versus blocking probability constraints.

The aim of the second test is to check the BE dimensioning procedure. Some tests for the performance evaluation of the stationary model are shown in Fig. 6, which compares the mean throughput values obtained by the model with the ns2 ones for different offered traffic loads. As the ns2 simulations have been stabilized within a 95% confidence interval equal to 3% of the estimated value, and as the maximum average error does not exceed 2.5%, the results in Fig. 6 point out the high accuracy level of the proposed model. The reference network topology is again that in Fig. 2, while the QoS and BE traffic matrices have been changed; they are reported in Tables 2 and 3, respectively. The BE and QoS capacity assignments obtained from the BE dimensioning and QoS dimensioning procedures are reported in Table 4. The same table also shows the BE average link throughput measured by ns2 simulation. These last values confirm again the correctness of the BE dimensioning procedure, taking into account that the allocation granularity $g_{BE}$ has been fixed to 1 Mbps and that the choice of the procedure is always conservative.

Table 2
QoS traffic matrix for the second test

Class ID k   Source node   Destination node   $k_{QoS}^{(k)}$ [streams/s]   $l_{QoS}^{(k)}$ [streams/s]   $b^{(k)}$ [Mbps]   $P_b^{(k)}$ [%]
1            A             F                  0.01                          0.01                          1                  0.75
2            A             F                  0.002                         0.002                         2                  0.75
3            C             F                  0.01                          0.01                          1                  1
4            C             F                  0.002                         0.002                         2                  0.75

Table 3
BE traffic matrix for the second test

Flow ID f   Source node   Destination node   $k_{BE}^{(f)}$ [conn/s]   $\bar{x}_{BE}^{(f)}$ [MB]   $a^{(f)}$
1           A             F                  1                         0.24                        2.5
2           B             F                  1                         0.24                        2.5
3           C             F                  1                         0.24                        2.5
4           D             F                  1                         0.24                        2.5

Table 4
Capacity assignments for BE and QoS dimensioning and measured BE average throughput

Link   QoS assigned capacity [Mbps]   BE assigned capacity [Mbps]   Measured average throughput [Mbps]
A–B    14                             3                             2.0167
C–D    14                             3                             1.9983
B–E    14                             5                             3.940
D–E    14                             5                             3.975
E–F    20                             10                            7.915

The third test concerns the results of the global method. Fig. 7 shows a first validation test derived by comparing the simulative model, applied in the final dimensioning procedure, with ns2. In this case, the average errors are obtained by comparing the link bandwidth utilizations over time for both simulators. The link utilizations have been averaged over non-overlapping sample time windows of different durations, while the results have been stabilized with a 95% confidence interval at 3% width. As shown in Fig. 7, the average error of this model decreases as the sample time duration increases, and achieves values lower than 4%. Moreover, as the performance of the final dimensioning mechanism heavily depends on the simulative model's ability to detect the link congestion periods, we have chosen to report further validation tests comparing the congestion periods detected by both the simulative model and ns2. To this end, both simulators have been used with the same traffic traces. As pointed out in Figs. 8 and 9, where the numbers of detected congestion events are shown for different congestion durations, the simulative model is more conservative than ns2; besides, as the congestion periods become longer, the model overestimates the congestion events more clearly with respect to ns2.

Continuing to use the network in Fig. 2 and the traffic matrices in Tables 2 and 3, we have fixed the final BE constraint by imposing C = 24 h and b = 95% (i.e., the constraint is imposed on the maximum number of BE congested periods per day and with probability 0.95). Fig. 10 reports the obtained final global capacity reductions (bandwidth gains)

Fig. 8. Congestion events detected by both the simulative model and ns2, according to different bottleneck link bandwidth values and congestion durations. The average traffic load offered to this link is 24 Mbps.

Fig. 7. Average errors derived by comparing the link bandwidth utilizations over time in the simulative model and in ns2. The bandwidth utilization values have been calculated by averaging over different time windows (total duration 10,000 s).

Fig. 9. Congestion events detected by both the simulative model and ns2, according to different bottleneck link bandwidth values and congestion durations. The average traffic load offered to this link is 24 Mbps.

with respect to the initial upper-bound computation. These results are shown for an increasing number of allowed congestion instances per day (v = 10, 15, 20, 25, 30 and 35) and also for different durations of what is considered to be a congestion period (s = 30, 60, 90 and 120 s). It can be seen that the advantage obtained by using the final procedure with respect to the first computed upper-bound assignment increases as the constraints are progressively relaxed. In the extreme cases the gain reaches the maximum value forecast by the procedure (i.e., 25 Mbps); this means that the first BE constraint (80% utilization) is here more stringent than the other one.

Fig. 10. Bandwidth gains (capacity reduction with respect to the upper bound) versus allowable number of daily BE congestion periods, with respect to different values of the congestion period duration.

Fig. 11. Network topology.

Table 5
QoS traffic matrix

Class ID k   Source node   Destination node   $k_{QoS}^{(k)}$ [streams/s]   $l_{QoS}^{(k)}$ [streams/s]   $b^{(k)}$ [Mbps]
1            C             F                  0.05                          0.01                          1
2            B             G                  0.05                          0.01                          1
3            A             G                  0.05                          0.01                          1
4            D             F                  0.05                          0.01                          1

Table 6
BE traffic matrix

Flow ID f   Source node   Destination node   $k_{BE}^{(f)}$ [conn/s]   $\bar{x}_{BE}^{(f)}$ [MB]   $a^{(f)}$
1           A             F                  2.5                       0.24                        2
2           B             F                  2.5                       0.24                        2
3           C             F                  2.5                       0.24                        2
4           D             F                  2.5                       0.24                        2

Table 7
Capacity assignments for BE and QoS dimensioning and gain ranges

Link   QoS assigned capacity [Mbps]   BE assigned capacity [Mbps]   Gain range [Mbps]
A–B    15                             13                            3
C–B    0                              6                             0
B–D    21                             6                             4
B–E    0                              13                            0
C–D    15                             0                             3
D–F    21                             6                             4
D–G    21                             6                             4

All the links that are not reported here have received no bandwidth allocation.

Then, we have carried out further tests, using the test network in Fig. 11, to evaluate the performance of the final dimensioning procedure. Tables 5 and 6 report the QoS and BE traffic matrices, respectively; the blocking probability constraint has been set to 2% for all QoS traffic flows, while Table 7 shows the bandwidth assignments obtained in the BE and QoS traffic dimensioning modules. We have fixed the final BE constraint by imposing

C = 24 h, b = 95% and s = 40 s. Fig. 12 reports the sum of the capacities allocated by the final mechanism. These results are shown for an increasing number of allowed congestion events per day (v = 5, 10, 20, 30, 40, 50 and 60). Figs. 13 and 14 show the average computation time needed by the QoS and BE dimensioning mechanisms, respectively. These values have been obtained using the network in Fig. 11 and different offered traffic loads; the computations were carried out on Intel Xeon processors with a 2.4 GHz internal clock. Fig. 15 reports the average computation times needed to simulate a day of network operation with the final dimensioning mechanism, according to different average offered loads of BE traffic.

Fig. 12. Sum of the link bandwidths allocated by the final dimensioning procedure according to different numbers of allowed congestion events per day (v = 5, 10, 20, 30, 40, 50 and 60).

Fig. 13. Average computation time needed by the first dimensioning procedure with the network in Fig. 11 and according to different $k_{QoS}^{(k)}$ values (all classes have the same intensities).

Fig. 14. Average computation time needed by the first dimensioning procedure with the network in Fig. 11 and according to different $k_{BE}^{(k)}$ values (all sources have the same intensities).

Fig. 15. Average values and variance of the computation time needed by the final procedure for the network in Fig. 11, over a 24-h period, according to different $k_{BE}^{(f)}$ values.

Fig. 16. Network topology used in the last test.

The last test has been realized to dimension a large VPN on the national (Italian) backbone, shown in Fig. 16, which substantially represents the network of the Italian academic and research provider GARR [32]. The traffic matrix used represents a fully meshed traffic exchange among all the nodes. In more detail, each node has the following three sources of traffic streams/flows for each possible destination:

• QoS sources A: $b^{(k)} = 1$ Mbps, $k_{QoS}^{(k)} = 0.01$ streams/s, $l_{QoS}^{(k)} = 0.01$ streams/s, blocking probability constraint = 1% for all $k \in A$;
• QoS sources B: $b^{(k)} = 2$ Mbps, $k_{QoS}^{(k)} = 0.02$ streams/s, $l_{QoS}^{(k)} = 0.02$ streams/s, blocking probability constraint = 1% for all $k \in B$;
• BE sources: $k_{BE}^{(f)} = 1$ flow/s, $\bar{x}_{BE}^{(f)} = 0.24$ MB, $a^{(f)} = 2$ for all $f \in BE$.

Note that, in this case, Eq. (3) must be substituted by the expressions of the blocking probabilities corresponding to a stochastic knapsack. The final BE constraints are: congestion period duration s = 60 s, v = 10 congestion events every C = 24 h, b = 95%. Table 8 reports all the results obtained at the different steps of the global method. The

Table 8
Dimensioning results of the last test

Link    QoS bandwidth [Mbps]   BE bandwidth [Mbps]   Gain range [Mbps]   Allocated bandwidth [Mbps]
MI-RM   186                    199                   25                  365
MI-PD   51                     36                    16                  74
MI-PI   51                     36                    14                  77
MI-BO   51                     0                     0                   51
PD-BO   63                     0                     0                   63
PI-BO   42                     0                     0                   42
BO-NA   96                     0                     0                   96
RM-MI   123                    59                    23                  187
PD-MI   51                     36                    14                  77
PI-MI   51                     36                    13                  78
BO-MI   135                    143                   23                  261
BO-PD   42                     0                     0                   42
BO-PI   42                     0                     0                   42
NA-BO   156                    143                   19                  285
MI-U1   81                     36                    23                  101
U1-MI   81                     36                    20                  104
MI-U2   81                     36                    18                  104
U2-MI   81                     36                    17                  107
RM-U3   78                     36                    15                  104
U3-RM   78                     36                    17                  101
RM-NA   144                    143                   25                  269
NA-RM   78                     0                     0                   78
NA-CT   81                     36                    22                  107
CT-NA   81                     36                    18                  109
NA-BA   81                     36                    21                  108
BA-NA   81                     36                    14                  108
NA-PA   81                     36                    18                  107
PA-NA   81                     36                    19                  106
RM-CA   78                     36                    15                  104
CA-RM   78                     36                    18                  104
MI-GE   75                     36                    17                  106
GE-MI   75                     36                    17                  106
PI-GE   15                     0                     0                   15
GE-PI   15                     0                     0                   15
MI-TO   81                     36                    18                  107
TO-MI   81                     36                    20                  107
PD-TS   15                     0                     0                   15
TS-PD   42                     0                     0                   42
MI-TS   75                     36                    16                  106
TS-MI   54                     36                    10                  84

measurements obtained with ns2 of the BE average throughput and QoS blocking probabilities (not reported here) confirm the correctness of the first two dimensioning procedures. Moreover, it can be noticed that the final step allows a global bandwidth saving of 299 Mbps.

6. Conclusions

A complete set of methodologies for bandwidth dimensioning in a VPN has been proposed and analyzed in the paper. The aim of the whole structure is to obtain a reasonable balance between precision and computational complexity. In particular, the presence of both QoS and BE traffic has been explicitly taken into account, and two dimensioning procedures, tailored to the specific traffic models, have been devised. The QoS procedure is based on an analytical model, whereas the BE one uses a static algorithmic solution as a first approximation. The final dimensioning considers the effect of the multiplexing gain, arising when the different traffic types share the assigned link capacities. This refinement requires a more accurate fast simulation model for BE, as well as the iteration of simulation steps involving both traffic categories. The whole methodology has been tested on specific networks, and the effects of the approximations made have been examined by comparison with detailed ns2 simulations. Globally, even though further testing and refinements are being planned, the proposed tool appears to be capable of responding to the needs of a network operator with reasonable accuracy, and it allows significant savings in bandwidth activation.

Further work should address some aspects that have been neglected here, especially as regards packet-level constraints; this includes buffer dimensioning to meet assigned loss probability constraints, a problem that is treated, for instance, in [16]. However, the consideration of this issue implies a change in our model of BE traffic, in order to represent explicit queueing at intermediate nodes. This, in turn, is likely to increase the model's complexity; the whole matter is currently under investigation.

Appendix A

To obtain realizations of the data sources we have to generate two quantities: the arrival instants of the blocks between pairs and their sizes. Note that the first quantity is a quite "random" parameter, because it strictly depends on the user behavior and on the interactions between users and applications. Taking this characteristic into account, we have decided to use a Poisson process to generate the arrival instants, i.e., we use an exponential interarrival distribution, where the interarrival time $\Delta T$ of a burst in the flow $p \in P$ has an exponential probability density with parameter $k^{(p)}$, $p \in P$. Concerning the size of data blocks, many studies and measurements reported in the literature (see [33,34], among others) suggest the use of heavy-tailed distributions, which give rise to self-similar behavior when traffic coming from many generators is mixed on the network links. Thus, we have decided to use a Pareto–Levy distribution, i.e., one following the probability density function

$$f_X(x) = a^{(p)} \left( D_{INF}^{(p)} \right)^{a^{(p)}} x^{-(1 + a^{(p)})}, \quad x \ge D_{INF}^{(p)}, \; p \in P, \qquad (10)$$

where x is the block size in bits, $a^{(p)}$ is the shape parameter and $D_{INF}^{(p)}$ the location parameter, i.e., the minimum size of a block. This distribution is characterized by an infinite variance when $a^{(p)} \le 2$ and an infinite mean for $a^{(p)} \le 1$. The average block size $\bar{x}^{(p)}$, with $a^{(p)} > 1$, is

$$\bar{x}^{(p)} = E\{x^{(p)}\} = \frac{a^{(p)} D_{INF}^{(p)}}{a^{(p)} - 1}, \quad p \in P. \qquad (11)$$

Actually, in the model we utilize a truncated version of Eq. (10). This choice has two origins. On one side, it makes the generator more realistic, since the real maximum data block size is finite. On the other hand, this choice makes the model able to give stable average performance indexes, which is not really possible with a generator with infinite variance (note that the realistic values for a(p) are between 1 and 2). In particular, all the reported numerical results have been obtained by imposing a(p) = 2. This approach gives us the possibility to represent both short connections (mice), which do not exit the slow-start phase of TCP flow control, as well as very long data transfers (elephants). It is evident, anyway, that phenomena related to the packet level (like synchronization of elephant connections) are not observable with our representation.
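A source of this kind can be sketched with inverse-transform sampling. This is an illustrative sketch, not the authors' generator: the upper truncation point `d_max` is a hypothetical parameter (the text does not state its value), and the function names are chosen here for illustration. Note that `pareto_mean` implements Eq. (11) for the untruncated density, so it only approximates the mean of the truncated one.

```python
import random


def pareto_mean(alpha, d_inf):
    """Average block size of the untruncated Pareto density, Eq. (11)."""
    if alpha <= 1:
        raise ValueError("the mean is infinite for alpha <= 1")
    return alpha * d_inf / (alpha - 1)


def truncated_pareto_sample(rng, alpha, d_inf, d_max):
    """Draw one block size from the Pareto density of Eq. (10), truncated to [d_inf, d_max]."""
    u = rng.random()                               # uniform in [0, 1)
    tail_mass = 1.0 - (d_inf / d_max) ** alpha     # probability mass kept by the truncation
    return d_inf / (1.0 - u * tail_mass) ** (1.0 / alpha)


def generate_blocks(rng, rate, alpha, d_inf, d_max, horizon):
    """Poisson arrivals (exponential interarrival times) with truncated Pareto sizes."""
    t = 0.0
    blocks = []
    while True:
        t += rng.expovariate(rate)                 # exponential interarrival time
        if t > horizon:
            return blocks
        blocks.append((t, truncated_pareto_sample(rng, alpha, d_inf, d_max)))


rng = random.Random(1)                             # fixed seed for repeatability
blocks = generate_blocks(rng, rate=1.0, alpha=2.0, d_inf=0.12, d_max=100.0, horizon=50.0)
print(len(blocks), pareto_mean(2.0, 0.12))
```

With a shape of 2 and a minimum size of 0.12 MB, Eq. (11) gives the 0.24 MB average block size used in the traffic matrices.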

Appendix B

Let us drop, for the sake of simplicity, the time step index $t_k$, and define the following quantities:

• $A$, the set of all the links in the network;
• $C^{(a)}$, the capacity of the link $a \in A$;
• $A_i$, the set of all the links that are not already completely utilized at step i;
• $\tilde{P}$, the set of all the active aggregates (i.e., the aggregates with the buffer not empty);
• $\tilde{P}_i$, the set of all the aggregate flows that have not reached the maximum possible rate value at algorithm step i;
• $\tilde{P}_i^{(a)}$, the set of all the flows $p \in \tilde{P}_i$ which cross the link $a \in A$ at step i;
• $S^{(p)}$, the set of the links on the path followed by flow $p \in \tilde{P}$;
• $U_i^{(a)} \le C^{(a)}$, the capacity used on the link $a \in A$ at step i;
• $r_i^{(p)}$, the rate of the flow $p \in \tilde{P}$ at step i;
• $R_{max}^{(p)}$, the maximum rate that flow $p \in \tilde{P}$ can reach.

At step i = 0 the variables are initialized as follows: $A_0 = A$; $\tilde{P}_0 = \tilde{P}$; $U_0^{(a)} = 0, \forall a \in A$; $r_0^{(p)} = 0, \forall p \in \tilde{P}$. Then, the algorithm applied to find the rate of each flow ($r_{t_k}^{(p)}$, $\forall p \in \tilde{P}$) during the period $[t_k, t_{k+1})$ (for i = 1, 2, ...) is:

1. Compute the percentage of link capacity sharing for each flow as

$$q_i^{(a,p)} = \frac{k^{(p)} \bar{x}^{(p)}}{\sum_{j \in \tilde{P}_{i-1}^{(a)}} k^{(j)} \bar{x}^{(j)}}, \quad \forall p \in \tilde{P}_{i-1}^{(a)}, \; \forall a \in A_{i-1}.$$

2. Then compute the incremental rate for all flows that have not yet reached the maximum value, as

$$\Delta r_i^{(p)} = \min_{a \in S^{(p)}} \left\{ \left( C^{(a)} - U_{i-1}^{(a)} \right) q_i^{(a,p)} \right\}, \quad \forall p \in \tilde{P}_{i-1}.$$

3. The new rates become

$$r_i^{(p)} = \min \left\{ r_{i-1}^{(p)} + \Delta r_i^{(p)}, R_{max}^{(p)} \right\}, \quad \forall p \in \tilde{P}_{i-1}.$$

4. Then the values of the utilized capacities and the sets of active links and flows must be updated as

$$U_i^{(a)} = \sum_{p \in \tilde{P}_{i-1}^{(a)}} r_i^{(p)}, \quad \forall a \in A_{i-1};$$
$$A_i = \{ a : C^{(a)} - U_i^{(a)} > 0 \};$$
$$\tilde{P}_i = \{ p : S^{(p)} \subseteq A_i \wedge r_i^{(p)} < R_{max}^{(p)} \}.$$

5. $i \leftarrow i + 1$.
6. If $\tilde{P}_i \ne \emptyset$ go to step 1; otherwise the procedure ends.

In our model the concept of connection loses its individual meaning, but this observation also suggests that the average bandwidth sharing has a steady point proportional to the offered load (if we suppose that the instantaneous throughput of each connection is the same, on average). So, in step 1 of the algorithm, we have defined a sharing parameter $q^{(p,a)}$, which represents the maximum bandwidth portion that an aggregate flow p on a link a can use as

$$q^{(p,a)} = \frac{k^{(p)} \bar{x}^{(p)}}{\sum_{j \in \tilde{P}^{(a)}} k^{(j)} \bar{x}^{(j)}}, \qquad (12)$$
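The iterative rate computation of this appendix can be sketched as follows. This is a hedged illustration with several assumptions that are not in the text: the names are hypothetical, a small tolerance `eps` and an iteration cap guard against floating-point residue, the used capacity of a link is summed over all flows crossing it (so that flows already capped at their maximum rate keep occupying bandwidth), and a flow is dropped from the active set once it reaches its maximum rate.

```python
def allocate_rates(capacity, paths, load, r_max, eps=1e-9, max_iter=10_000):
    """Share link capacities among aggregate flows in proportion to offered load.

    capacity: dict link -> C(a).
    paths: dict flow -> list of links S(p).
    load: dict flow -> offered load (arrival rate times average block size), > 0.
    r_max: dict flow -> maximum reachable rate.
    """
    rate = {p: 0.0 for p in paths}
    used = {a: 0.0 for a in capacity}
    open_links = set(capacity)          # links not yet completely utilized
    active = set(paths)                 # flows that can still increase their rate
    for _ in range(max_iter):
        if not active:                  # step 6: nothing left to raise
            return rate
        # Step 1: total offered load of the active flows on each open link.
        total = {a: sum(load[p] for p in active if a in paths[p]) for a in open_links}
        # Step 2: increment limited by the tightest link on each path.
        delta = {
            p: min((capacity[a] - used[a]) * load[p] / total[a] for a in paths[p])
            for p in active
        }
        # Step 3: new rates, capped at the maximum rate.
        for p in active:
            rate[p] = min(rate[p] + delta[p], r_max[p])
        # Step 4: update used capacities, open links and active flows.
        for a in capacity:
            used[a] = sum(rate[p] for p in paths if a in paths[p])
        open_links = {a for a in capacity if capacity[a] - used[a] > eps}
        active = {
            p for p in paths
            if set(paths[p]) <= open_links and rate[p] < r_max[p] - eps
        }
    raise RuntimeError("rate allocation did not converge")


rates = allocate_rates(
    capacity={"a": 10.0, "b": 4.0},
    paths={"p1": ["a"], "p2": ["a", "b"]},
    load={"p1": 1.0, "p2": 1.0},
    r_max={"p1": 100.0, "p2": 100.0},
)
print(rates)
```

In this toy case the second flow is limited by link b, and the bandwidth it cannot use on link a is handed to the first flow in the following iteration.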

That is, each aggregate's share is proportional to its offered load relative to the other active flows on the link.

References

[1] M. Montañez, QoS in the enterprise, Packet - CISCO Syst. Users Mag. 14 (4) (2002) 30–34.
[2] Y. Maeda, Standards for Virtual Private Networks (Guest Editorial), IEEE Commun. Mag. 42 (6) (2004) 114–115.
[3] M. Carugi, J. De Clercq, Virtual Private Network Services: scenario, requirements and architectural construct from a standardization perspective, IEEE Commun. Mag. 42 (6) (2004) 116–122.
[4] C. Metz, The latest in Virtual Private Networks: Part I, IEEE Internet Comput. 7 (1) (2003) 87–91.
[5] C. Metz, The latest in VPNs: Part II, IEEE Internet Comput. 8 (3) (2004) 60–65.
[6] J. De Clercq, O. Paridaens, Scalability implications of Virtual Private Networks, IEEE Commun. Mag. 40 (5) (2002) 151–157.
[7] W. Cui, M.A. Bassiouni, Virtual Private Network bandwidth management with traffic prediction, Comput. Networks 42 (6) (2003) 765–778.
[8] P. Knight, C. Lewis, Layer 2 and 3 Virtual Private Networks: taxonomy, technology, and standardization efforts, IEEE Commun. Mag. 42 (6) (2004) 124–131.
[9] T. Takeda, I. Inoue, R. Aubin, M. Carugi, Layer 1 Virtual Private Networks: service concepts, architecture requirements, and related advances in standardization, IEEE Commun. Mag. 42 (6) (2004) 132–138.
[10] S.S. Gorshe, T. Wilson, Transparent Generic Framing Procedure (GFP): a protocol for efficient transport of block-coded data through SONET/SDH networks, IEEE Commun. Mag. 40 (5) (2002) 88–95.
[11] J. Zeng, N. Ansari, Toward IP Virtual Private Network Quality of Service: a service provider perspective, IEEE Commun. Mag. 41 (4) (2003) 113–119.
[12] I. Khalil, T. Braun, Edge provisioning and fairness in VPN-DiffServ networks, J. Network Syst. Manag. 10 (1) (2002) 11–37.
[13] R. Bolla, R. Bruschi, F. Davoli, Planning multiservice VPN networks: an analytical/simulative mechanism to dimension the bandwidth assignments, in: Proc. 3rd Int. Workshop on Quality of Service in Multiservice IP Networks (QoS-IP 2005), Catania, Italy, February 2005, Lecture Notes in Computer Science (LNCS 3375), Springer-Verlag, Berlin, 2005, pp. 176–190.
[14] A. Olsson, Understanding Telecommunications, Chapter A.10, Ericsson, STF, Studentlitteratur, 1997–2002.
[15] A. Kumar, R. Rastogi, A. Silberschatz, B. Yener, Algorithms for provisioning Virtual Private Networks in the hose model, IEEE/ACM Trans. Networking 10 (4) (2002) 565–578.
[16] E.C.G. Wille, M. Mellia, E. Leonardi, M. Ajmone Marsan, Topological design of survivable IP networks using metaheuristic approaches, in: Proc. 3rd Int. Workshop on Quality of Service in Multiservice IP Networks (QoS-IP 2005), Catania, Italy, February 2005, Lecture Notes in Computer Science (LNCS 3375), Springer-Verlag, Berlin, 2005, pp. 191–206.
[17] R. Bolla, R. Bruschi, M. Repetto, A fluid simulator for aggregate TCP connections, in: Proc. 2004 Int. Symp. on Performance Evaluation of Computer and Telecommunication Systems (SPECTS'2004), San Diego, CA, July 2004, pp. 5–10.
[18] N.N. Duffield, P. Goyal, A. Greenberg, P. Mishra, K. Ramakrishnan, J. van der Merwe, Resource management with hoses: point-to-cloud services for Virtual Private Networks, IEEE/ACM Trans. Networking 10 (10) (2002) 679–691.
[19] A. Jüttner, I. Szabó, Á. Szentesi, On bandwidth efficiency of the hose model in private networks, in: Proc. IEEE Infocom 2003, San Francisco, CA, April 2003, vol. 1, pp. 386–395.
[20] W. Yu, J. Wang, Scalable network resource management for large scale virtual private networks, in: Simulation Modelling Practice and Theory 12, Elsevier, 2004, pp. 263–285.
[21] R. Isaacs, I. Leslie, Support for resource-assured and dynamic Virtual Private Networks, IEEE J. Select. Areas Commun. 19 (3) (2001) 460–472.
[22] R. Cohen, G. Kaempfer, On the cost of Virtual Private Networks, IEEE/ACM Trans. Networking 8 (6) (2000) 775–784.
[23] V. Misra, W. Gong, D. Towsley, A fluid-based analysis of a network of AQM routers supporting TCP flows with an application to RED, in: Proc. ACM SIGCOMM 2000, Stockholm, Sweden, August 2000, pp. 151–160.
[24] M. Barbera, A. Lombardo, G. Schembra, A. Tricarico, A fluid-based model of policed RIO router networks loaded by time-limited TCP flows, in: Proc. 2004 Int. Symp. on Performance Evaluation of Computer and Telecommunication Systems (SPECTS'2004), San Diego, CA, July 2004, pp. 11–18.
[25] M. Ajmone Marsan, G. Carofiglio, M. Garetto, P. Giaccone, E. Leopardi, E. Schiattarelle, A. Tarello, Of mice and models, in: Proc. 3rd Int. Workshop on Quality of Service in Multiservice IP Networks (QoS-IP 2005), Catania, Italy, February 2005, Lecture Notes in Computer Science (LNCS 3375), Springer-Verlag, Berlin, 2005, pp. 15–32.
[26] M. Garetto, R. Lo Cigno, M. Meo, M. Ajmone Marsan, Closed queueing network models of interacting long-lived TCP flows, IEEE/ACM Trans. Networking 12 (2) (2004) 300–311.
[27] B. Sikdar, S. Kalyanaraman, K.S. Vastola, Analytic models and comparative study of the latency and steady-state throughput of TCP Tahoe, Reno and SACK, in: Proc. IEEE GLOBECOM 2001, San Antonio, TX, vol. 3, November 2001, pp. 1781–1787.
[28] D. Bertsekas, R. Gallager, Data Networks, second ed., Prentice-Hall, Englewood Cliffs, NJ, 1992.
[29] K.W. Ross, Multiservice Loss Models for Broadband Telecommunication Networks, Springer, Berlin, 1995.
[30] R. Bolla, R. Bruschi, F. Davoli, M. Repetto, Analytical/simulation optimization system for access control and bandwidth allocation in IP networks with QoS, in: Proc. 2005 Int. Symp. on Performance Evaluation of Computer and Telecommunication Systems (SPECTS 05), Philadelphia, PA, July 2005, pp. 339–248.
[31] The Network Simulator - ns2. Documentation and source code from the home page: http://www.isi.edu/nsnam/ns/.
[32] GARR backbone, network topology and statistics homepage: http://www.garr.it/.
[33] W. Willinger, V. Paxson, M. Taqqu, Self-similarity and heavy tails: structural modeling of network traffic, in: A Practical Guide to Heavy Tails: Statistical Techniques and Applications, Birkhauser, Boston, 1998.
[34] K. Park, W. Willinger, Self-Similar Network Traffic and Performance Evaluation, John Wiley & Sons, Inc., New York, 2000.

Raffaele Bolla was born in Savona, Italy, in 1963. He obtained the ‘‘Laurea’’ degree in Electronic Engineering from the University of Genoa in 1989 and the Ph.D. degree in Telecommunications at the Department of Communications, Computer and Systems Science (DIST) of the University of Genoa, in 1994. From 1996 to 2004 he was a researcher at DIST and since 2004 he has been an Associate Professor at DIST, where he teaches a course in Telecommunication Networks and Telematics. His current research interests are in resource allocation, Call Admission Control and routing in Multi-service IP networks, Multiple Access Control, resource allocation and routing in both cellular and ad hoc wireless networks. He has authored or coauthored over 100 scientific publications in international journals and conference proceedings. He has been the Principal Investigator in many projects in the Telecommunication Networks field.

Roberto Bruschi received the ‘‘Laurea’’ degree in Telecommunication Engineering in 2002 from the University of Genoa. He is now a Ph.D. student at the Department of Communications, Computer and Systems Science of the University of Genoa. He is also a member of the CNIT, the Italian inter-university consortium for telecommunications. He is an active member of some Italian research project groups such as TANGO and EURO, whose main research interests include Open Router, Network processor, TCP modeling, VPN design, P2P and QoS IP network control.


Franco Davoli received the ‘‘Laurea’’ degree in Electronic Engineering in 1975 from the University of Genoa, Italy. Since 1990 he has been a Full Professor of Telecommunication Networks at the University of Genoa, where he is with the Department of Communications, Computer and Systems Science (DIST). From 1989 to 1991 and from 1993 to 1996, he was also with the University of Parma, Italy. His current research interests are in bandwidth allocation, admission control and routing in multi-service networks, wireless mobile and satellite networks and multimedia communications and services. He has coauthored over 200 scientific publications in international journals, book chapters and conference proceedings. He is a member of the Editorial Board of the International Journal of Communication Systems (Wiley) and of the International Journal of Studies in Informatics and Control, and is an Associate Editor of the Simulation—Transactions journal of the SCS. He has been a guest co-editor of two Special Issues of the European Transactions on Telecommunications and of a Special Issue of the International Journal of Communication Systems. In 2004, he was the recipient of an Erskine Fellowship from the University of Canterbury, Christchurch, New Zealand, as a Visiting Professor. He has been the Principal Investigator in a large number of projects and has served in several positions in the Italian National Consortium for Telecommunications (CNIT). He was the Head of the CNIT National Laboratory for Multimedia Communications in Naples for the term 2003–2004, and he is currently the Vice-President of the CNIT Management Board. He is a Senior Member of the IEEE.

Computer Networks 50 (2006) 1086–1103 www.elsevier.com/locate/comnet

Algorithms for IP network design with end-to-end QoS constraints ☆

Emilio C.G. Wille a, Marco Mellia b,*, Emilio Leonardi b, Marco Ajmone Marsan b

a Federal Technology Education Center of Paraná, Department of Electronics, Av. Sete de Setembro 3165, Curitiba (PR), Brazil
b Politecnico di Torino, Corso Duca degli Abruzzi, 24-10129 Torino, Italy

Available online 5 October 2005

Abstract

The new generation of packet-switching networks is expected to support a wide range of communication-intensive real-time multimedia applications. A key issue in the area is how to devise reasonable packet-switching network design methodologies that allow the choice of the most adequate set of network resources for the delivery of a given mix of services with the desired level of end-to-end Quality of Service (e2e QoS) and, at the same time, consider the traffic dynamics of today's packet-switching networks. In this paper, we focus on problems that arise when dealing with this subject, namely the Buffer Assignment (BA), Capacity Assignment (CA), Flow and Capacity Assignment (FCA), and Topology, Flow and Capacity Assignment (TCFA) problems. Our proposed approach first maps the end-user's performance constraints into transport-layer performance constraints, and then into network-layer performance constraints. This mapping is then considered together with a refined TCP/IP traffic modeling technique that is both simple and capable of producing accurate performance estimates for general-topology packet-switching networks subject to realistic traffic patterns. Subproblems are derived from a general design problem, and a collection of heuristic algorithms is introduced to compute approximate solutions. We illustrate examples of network planning/dimensioning considering Virtual Private Networks (VPNs). © 2005 Elsevier B.V. All rights reserved.

Keywords: Packet-switching networks design and planning; TCP/IP; Queueing theory; Mathematical programming/optimization; Heuristic methods

1. Introduction

The new generation of packet-switching networks is expected to support a wide range of communication-intensive real-time multimedia applications. These applications will have their own different quality-of-service (QoS) requirements in terms of throughput, reliability, and bounds on end-to-end (e2e) delay, jitter, and packet-loss ratio. It is technically a challenging and complicated problem to deliver multimedia information in a timely, synchronized manner over a decentralized, shared network environment, especially one that was originally designed for best-effort traffic such as the Internet.

☆ This work was supported by the Italian Ministry for Education, University and Research under the FIRB project TANGO, and by a CAPES Foundation scholarship from the Ministry of Education of Brazil.
* Corresponding author. Tel.: +39 011 227 6608. E-mail address: [email protected] (M. Mellia).

1389-1286/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2005.09.005

E.C.G. Wille et al. / Computer Networks 50 (2006) 1086–1103

Accordingly, a key issue in this area is how to devise reasonable packet-switching network design methodologies that allow the choice of the most adequate set of network resources for the delivery of a given mix of services with the desired level of e2e QoS and, at the same time, consider the traffic dynamics of today's packet-switching networks. The traditional approaches to optimal design and planning of packet networks, extensively investigated in the early days of packet-switching networks [1,2], focus on the network-layer infrastructure, thus neglecting e2e QoS issues and Service Level Agreement (SLA) guarantees.

From the end-user's point of view, QoS is driven by e2e performance parameters, such as data throughput, Web page latency, transaction reliability, etc. Matching the user-layer QoS requirements to the network-layer performance parameters is not a straightforward task. The QoS perceived by end-users in their access to Internet services is mainly driven by the Transmission Control Protocol (TCP), the reliable transport protocol of the Internet, whose congestion control algorithms dictate the latency of information transfer. Indeed, it is well known that TCP accounts for a great amount of the total traffic volume in the Internet [3,4], and among all the TCP flows, a vast majority is represented by short-lived flows (also called mice), while the rest is represented by long-lived flows (also called elephants).

The description of traffic patterns inside the Internet is a particularly delicate issue, since it is well known that IP packets do not arrive at router buffers following a Poisson process [5]. Traditionally, either M/M/1 or M/M/1/B queueing models were considered good representations of packet queueing elements in the network. However, the traffic flowing in IP networks is known to exhibit Long Range Dependent (LRD) behavior, which causes queue dynamics to deviate severely from the predictions of these models.
For these reasons, the usual approach of modeling packet-switching networks as networks of M/M/1 queues [6–8] now appears inadequate for the design of such networks. Recently, in [9], the authors for the first time abandon the Markovian assumption in favor of a fractional Brownian motion model, i.e., an LRD traffic model. They solve the discrete Capacity Assignment problem under network e2e delay constraints only, using a simulated annealing metaheuristic. Unfortunately, it is difficult to extend this approach to more general network problems, because the relation among traffic, capacity and queueing delay is not available as a closed-form expression.

Additionally, with the enormous success of the Internet, all enterprises have become dependent upon networks or networked computation applications. In this context the loss of network services is a serious outage, often resulting in unacceptable delays, loss of revenue, or temporary disruption. To avoid loss of network services, communications networks should be designed so that they remain operational and maintain as high a performance level as feasible, even in the presence of network component failures.

In this paper, we focus on several types of problems that arise when dealing with packet-switching network design. We consider the traffic dynamics of packet networks, as well as the effect of protocols at the different layers of the Internet architecture on the e2e QoS experienced by end-users. Of course, in any realistic network problem an "optimal design" is an extremely difficult task. In [10,11] an IP network design methodology is proposed which is based on a "Divide and Conquer" approach, in the sense that the design is split into several tasks.

Fig. 1 shows the flow diagram of the design methodology. Shaded, rounded boxes represent function blocks, while white parallelograms represent inputs/outputs of functions. There are three main blocks, which correspond to the classic blocks in constrained optimization problems: constraints (on the left), inputs (on the bottom right) and optimization procedure (on the top right). The constraints are the user-layer QoS parameters specified for every source/destination pair. Thanks to the definition of the QoS translators, all the user-layer constraints are then mapped into lower-layer constraints, down to the IP layer.
The optimization procedure takes as inputs (according to the problem to be solved) the description of the physical topology, the routing algorithm specification, the traffic matrix, and the expression of the cost as a function of the design variables. The objective of the optimization is to find the minimum-cost solution that satisfies the user-layer QoS constraints.

A second important point of the proposed methodology is the adoption of a refined TCP/IP traffic modeling technique that is both simple and capable of producing accurate performance estimates for packet-switching networks subject to realistic traffic patterns. The main idea behind this approach is to reproduce the effects of traffic correlations on network queueing elements by means of Markovian queueing models with batch arrivals, i.e., M[X]/M/1-like queues.

Fig. 1. Schematic flow diagram of the network design methodology.

The rest of the paper is organized as follows. Section 2 briefly describes the QoS translation problem as well as the traffic and queueing models. Section 3 outlines the general design problem and provides the formulation of the related optimization subproblems. It also introduces heuristic algorithms to compute approximate solutions, and discusses numerical and simulation results. Finally, Section 4 summarizes the main results obtained in this research.

2. QoS translation and models

In this section we describe the QoS translation problem as well as the traffic and queueing models (focusing on the TCP protocol) [10,11].

2.1. QoS translators

The process of translating QoS specifications between different layers of the protocol stack is called QoS translation. According to the Internet protocol architecture, at least two QoS translating procedures should be considered: the first translates the application-layer QoS constraints into transport-layer QoS constraints, and the second translates transport-layer QoS constraints into network-layer QoS constraints.
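As an illustration of the second translation step (detailed in Section 2.1.2), the following Python sketch inverts a TCP throughput formula by bisection to obtain the largest RTT compatible with a throughput target at a fixed loss probability. It assumes the widely used form of the PFTK formula; the function names, the 1500-byte segment size, the fixed retransmission timeout `t0`, the acknowledgment factor `b`, and the search interval are illustrative assumptions rather than values taken from this paper, so the numbers it produces need not match Fig. 2.

```python
import math

def pftk_throughput(rtt, p, mss_bits=12000, t0=1.0, b=2, wmax=32):
    """Approximate long-lived TCP throughput in bit/s (PFTK-style formula).

    rtt: round-trip time [s]; p: packet loss probability.
    mss_bits, t0 (timeout [s]), b (packets per ACK) and wmax (segments)
    are illustrative assumptions.
    """
    timeout_term = (t0 * min(1.0, 3.0 * math.sqrt(3.0 * b * p / 8.0))
                    * p * (1.0 + 32.0 * p * p))
    loss_limited = 1.0 / (rtt * math.sqrt(2.0 * b * p / 3.0) + timeout_term)
    window_limited = wmax / rtt
    return min(loss_limited, window_limited) * mss_bits

def max_rtt_for_throughput(target_bps, p, lo=1e-4, hi=10.0, tol=1e-6):
    """Largest RTT in [lo, hi] whose predicted throughput meets the target.

    Throughput decreases monotonically with RTT, so bisection applies.
    """
    if pftk_throughput(lo, p) < target_bps:
        raise ValueError("target throughput unreachable at this loss rate")
    if pftk_throughput(hi, p) >= target_bps:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pftk_throughput(mid, p) >= target_bps:
            lo = mid   # constraint still met: RTT can grow
        else:
            hi = mid   # constraint violated: RTT must shrink
    return lo

# Example: RTT bound for a 512 kbps target at several loss probabilities.
for p_loss in (0.001, 0.005, 0.01, 0.02):
    print(p_loss, max_rtt_for_throughput(512e3, p_loss))
```

Fixing the loss probability and solving for RTT mirrors the choice described in the paper of treating RTT as the free variable; a latency model would be inverted the same way, with the latency formula in place of `pftk_throughput`.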

2.1.1. Application-layer QoS translator

This module takes as input the application-layer QoS constraints, such as Web page transfer latency, data throughput, audio quality, etc. Given the multitude of Internet applications it is not possible to devise a generic procedure to solve this problem. Hence, in this paper we will focus on ad-hoc solutions depending on the application.

2.1.2. Transport-layer QoS translators

The translation from transport-layer QoS constraints to network-layer QoS parameters, such as Round Trip Time (RTT) and packet loss probability (Ploss), is more difficult. This is mainly due to the complexity of the TCP protocol, which implements error, flow and congestion control algorithms. The TCP QoS translator accepts as inputs either the maximum file transfer latency (Lt), or the minimum file transfer throughput (Th). We require that all flows shorter than a given threshold (i.e., mice) meet the maximum file transfer latency constraint, while longer flows (i.e., elephants) are subject to the throughput constraint. Obviously, the more stringent of the latency and throughput constraints is the one that determines the result.

The approach is based on the numerical inversion of analytic TCP models, taking as input either the file transfer throughput or latency, and obtaining as outputs RTT and Ploss. Among the many models of TCP presented in the literature, we used the TCP latency model described in [12]. We will refer to this model as the CSA model (from the authors' names). When considering throughput, we instead exploit the formula in [13], referred to as the PFTK model (from the authors' names). Here, the numerical inversion is just a root-finding procedure.

There are at least two parameters that affect TCP performance, i.e., RTT and Ploss. We decided to fix the Ploss parameter, and leave RTT as the free variable. This choice is due to the fact that the loss probability has a larger impact on the latency of very short flows, and that it may impact the network load due to retransmissions. Therefore, after choosing a value for Ploss, a set of curves can be derived, showing the behavior of RTT as a function of file latency and throughput.

As one example, we consider a mixed traffic scenario where data files are exchanged with the file size distribution reported in [4]. This distribution, obtained from one-week-long measurements, shows that 85% of all TCP flows are shorter than 20 packets. Considering this distribution, given the file transfer latency constraint and a fixed throughput constraint of 512 kbps, the curves of Fig. 2 report the maximum admissible RTT which satisfies the most stringent constraint for different values of Ploss.

Fig. 2. RTT constraints as given by the transport-layer QoS translator. [Log-log plot of the maximum admissible RTT [s] versus file transfer latency [s] for flows shorter than 20 packets, together with the Th = 512 kbps throughput constraint; one curve each for Ploss = 0.001, 0.005, 0.010 and 0.020.]

2.2. Traffic and queueing models

In order to obtain a useful formulation of the optimization problems, it is necessary on one side to be accurate in the prediction of the performance metrics of interest (average delay, packet-loss probability), while on the other side limiting the complexity of the model (i.e., we adopt models allowing a simple closed-form solution).

The representation of traffic patterns inside the Internet is a particularly delicate issue, since it is well known that IP packets do not arrive at router buffers following a Poisson process [5]. Instead, a high degree of correlation exists, which can be partly due to the TCP control mechanisms. In [14], a simple and quite effective expedient was proposed to accurately predict the performance of network elements subject to TCP traffic, using Markovian queueing models. The main idea behind this approach consists in reproducing the effects of traffic correlations on network queueing elements by means of Markovian queueing models with batch arrivals. The choice of using batch arrivals following a Poisson process has the advantage of combining the nice characteristics of Poisson processes (analytical tractability in the first place) with the possibility of capturing the burstiness of the TCP/IP traffic. Hence, we model network queueing elements using M[X]/M/1-like queues.

The batch size varies between 1 and W with distribution [X], where W is the maximum TCP window size expressed in segments. The distribution [X] is obtained considering the number of segments that TCP sources send in one RTT for a given file size distribution [14]. The Markovian assumption for the batch arrival process is mainly justified by the Poisson assumption for the TCP connection generation process, as well as the fairly large number of TCP connections simultaneously present in the network. Given the file size distribution, a stochastic model of TCP (described in [14]) is used to obtain the batch size distribution [X]. The distribution [X] is obtained only once before starting the optimization process.

2.3. Virtual private networks

Designing a packet-switching network today may have quite different meanings, depending on the type of network that is being designed. If we consider the design of the physical topology of the network of a large Internet Service Provider (ISP), the design must very carefully account for the existing infrastructure, for the costs associated with the deployment of a new connection or the upgrade of an existing link, and for the very coarse granularity in the data rates of high-speed links. Instead, if we consider the design of a corporate Virtual Private Network (VPN), then connections are leased from a long distance carrier, the set of leased lines is not a critical component, costs are directly derived from the leasing fees, and the data rate granularity is much finer. While the general methodology for packet-network design and planning, described in Section 1, can be applied to both contexts, as well as others, in this paper we concentrate on the design of corporate VPNs.

3. Problem statement

The network infrastructure is represented by a graph G = (V, E) in which V is a set of nodes (with cardinality n) and E is a set of edges (with cardinality m). A node represents a network router and an edge represents a physical link connecting one router to another. Each output interface of a router is modeled by a queue with a finite buffer. Each network link is characterized by a set of attributes, principally the flow, the capacity and the buffer size. For a given link (i, j), the flow $f_{ij}$ is defined as the quantity of information transported by this link, while its capacity $C_{ij}$ is a measure of the maximal quantity of information that it can transmit. Flow and capacity are expressed in bits per second (bps). Each buffer can accommodate a maximum of $B_{ij}$ packets, and $d_{ij}$ is the physical length of the link. Considering the M[X]/M/1/∞ queue, the average packet delay is given by the following expression [15] (where we drop the subscript (ij) for simplicity):

$$E[T] = \frac{K}{\lambda}\,\frac{\rho}{1-\rho} = \frac{K}{\mu}\,\frac{1}{C-f} \tag{1}$$

with K given by

$$K = \frac{m'_{[X]} + m''_{[X]}}{2\,m'_{[X]}}, \tag{2}$$

where $\rho = f/C$ is the link utilization factor, the packet length is assumed to be exponentially distributed [1,2] with mean $1/\mu$ (bits/packet), $\lambda = \mu f$ is the arrival rate (packets/s), and $m'_{[X]}$ and $m''_{[X]}$ are the first and second moments of the batch size distribution [X].

The average traffic requirements between nodes are represented by a traffic matrix $\hat{\Gamma} = \{\hat{\gamma}_{sd}\}$, where the traffic $\hat{\gamma}_{sd}$ between a node pair (s, d) represents the average number of bps sent from source s to destination d. We consider as traffic offered to the network $\gamma_{sd} = \hat{\gamma}_{sd}/(1 - P_{loss})$, to take into account the retransmissions due to the losses that flows experience along their path to the destination. The flow of each link that composes the topological configuration depends on the traffic matrix. We consider that for each source/destination pair (s, d), the traffic is transmitted over exactly one directed path in the network. The routing and the traffic uniquely determine the vector $f = (f_1, f_2, \ldots, f_m)$, where f is a multi-commodity flow for the traffic matrix; it must obey the law of flow conservation. A multi-commodity flow results from the sum of single-commodity flows $f_{kl}$, where $f_{kl}$ is the average flow generated by packets with source node k and destination node l.

Now we can state the general network design problem as follows: consider that we are given the locations of the network routers, the traffic flow requirements, and the link and buffer costs. Our design task is to choose a topology, to select the capacity of the links in this topology, and to design a routing procedure for the traffic from its origins to its destinations, in a way which optimizes an objective function while meeting all the system (QoS and reliability) constraints. As reliability constraint we consider that all traffic must be exchanged even if a single node fails (2-connectivity), and the QoS constraints correspond to maintaining the e2e packet delay for each network source/destination pair below a maximum tolerable value. When explicitly considering TCP traffic it is also necessary to tackle the Buffer Assignment (BA) problem, which consists of dimensioning buffer sizes subject to packet-loss probability constraints.

The above stated problem is intractable. The number of topologies to consider is too large and, in addition, we have a multi-commodity flow problem. Subproblems can be derived from this general problem and solved separately, in such a way as to obtain feasible solutions to the general problem. Hence, we may now define three subproblems that differ only in the set of permissible design variables. It is important to note that each subproblem requires its own specific optimization technique.
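Eqs. (1) and (2) can be evaluated directly once the batch-size distribution is known. The sketch below assumes the distribution is given as a dictionary mapping batch size to probability; the function names and the example distribution are hypothetical and only illustrate how burstier batches inflate the delay relative to the M/M/1 case (K = 1).

```python
def batch_factor(batch_pmf):
    """K = (m' + m'') / (2 m') from a batch-size pmf {size: probability} (Eq. 2)."""
    m1 = sum(k * p for k, p in batch_pmf.items())      # first moment
    m2 = sum(k * k * p for k, p in batch_pmf.items())  # second moment
    return (m1 + m2) / (2.0 * m1)

def mean_delay(capacity_bps, flow_bps, mean_pkt_bits, batch_pmf):
    """E[T] = K / (mu (C - f)) with 1/mu = mean packet length in bits (Eq. 1)."""
    if flow_bps >= capacity_bps:
        raise ValueError("the queue is unstable unless f < C")
    return batch_factor(batch_pmf) * mean_pkt_bits / (capacity_bps - flow_bps)

# Hypothetical example: a 10 Mbps link carrying 8 Mbps of 1500-byte packets.
single = {1: 1.0}                    # one packet per batch: plain M/M/1
bursty = {1: 0.5, 2: 0.25, 4: 0.25}  # made-up batch-size distribution
print(mean_delay(10e6, 8e6, 12000, single))
print(mean_delay(10e6, 8e6, 12000, bursty))
```

Because K depends only on [X], it can be computed once and reused for every link, which is consistent with the remark in Section 2.2 that [X] is obtained only once before the optimization starts.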
3.1. The Capacity Assignment problem

In this subsection we focus on the Capacity Assignment (CA) problem, i.e., the selection of the link capacities. The decision of fixing a priori the loss probability allows us to decouple the CA problem from the BA problem. We first solve the CA problem considering the e2e delay constraints only. Then, we enforce the loss probability to meet the Ploss constraints by properly choosing buffer sizes.


In the first optimization, an M[X]/M/1/∞ queueing model is used, i.e., a queueing model with infinite buffers. This provides a pessimistic estimate of the queueing delay that packets suffer with finite buffers, which will result from the second optimization step, during which an M[X]/M/1/B queueing model is used.

Different formulations of the CA problem result by selecting (i) the cost functions, (ii) the routing model, and (iii) the capacity constraints. In the VPN case common assumptions are (i) linear costs, (ii) non-bifurcated routing, and (iii) continuous capacities. Our goal is to determine the link capacities in order to minimize the network cost subject to the maximum allowable e2e packet delay. Given the network topology, the traffic requirements, and the routing, the CA problem is formulated as the following optimization problem:

$$Z_{CA} = \min \sum_{i,j} g(d_{ij}, C_{ij}) \tag{3}$$

subject to:

$$K_1 \sum_{i,j} \frac{\delta^{sd}_{ij}}{C_{ij} - f_{ij}} \le RTT_{sd} - \tau_{sd} - \tau_{ds} \quad \forall (s,d), \tag{4}$$

$$f_{ij} = \sum_{s,d} \delta^{sd}_{ij}\, \gamma_{sd} \quad \forall (i,j), \tag{5}$$

$$C_{ij} \ge f_{ij} \ge 0 \quad \forall (i,j). \tag{6}$$

The objective function (3) represents the total link cost, which is a linear function of both the link capacity and the physical length, i.e., $g(d_{ij}, C_{ij}) = d_{ij} C_{ij}$. Eq. (4) is the e2e packet delay constraint for each source/destination pair. It says that the total amount of delay experienced by all the flows routed on a path should not exceed the maximum RTT (see Section 2.1) minus the propagation delay of the route. $\delta^{sd}_{ij}$ is an indicator function which is one if link (i, j) is in path (s, d) and zero otherwise. A non-bifurcated routing model is used, in which the traffic follows exactly one path from source to destination. Eq. (5) defines the average data flow on the link. Constraints (6) are non-negativity constraints. Finally, $K_1 = K/\mu$.

We notice that the above stated CA problem is a convex optimization problem, and its global minimum can be found using standard convex programming techniques, for example, the logarithmic barrier method [16]. However, these algorithms are time-consuming. A fast suboptimal solution to this problem can be found using the following heuristic.

3.1.1. Suboptimal solution to the CA problem

A simple heuristic can be derived to obtain solutions to the CA problem. The main idea is to decompose the problem into $n(n-1)$ single-constraint problems (one for each path (s, d)). Let $I_{sd}$ be the set of links which compose path (s, d), and let $C^{sd}_{ij}$ be an auxiliary variable which corresponds to the capacity of the link (i, j) when considering the path (s, d). To solve each single-path problem we apply the Lagrangian multiplier method, obtaining:

$$L(\psi) = \min \left[ \sum_{(i,j) \in I_{sd}} d_{ij}\, C^{sd}_{ij} + \psi \left( \sum_{(i,j) \in I_{sd}} \frac{1}{C^{sd}_{ij} - f_{ij}} - \beta_{sd} \right) \right], \tag{7}$$

subject to:

$$C^{sd}_{ij} \ge f_{ij} \ge 0 \quad \forall (i,j),\ \forall (s,d), \tag{8}$$

where

$$\beta_{sd} = \frac{1}{K_1}\left(RTT_{sd} - \tau_{sd} - \tau_{ds}\right) \quad \forall (s,d). \tag{9}$$

The solutions to this problem are given by

$$C^{sd}_{ij} = f_{ij} + \frac{\sum_{(k,l) \in I_{sd}} \sqrt{d_{kl}}}{\beta_{sd} \sqrt{d_{ij}}}. \tag{10}$$

Knowing the values for the variables $C^{sd}_{ij}$ (in the single-path problem) we obtain admissible values for the capacities $C_{ij}$ (in the original CA problem) by assigning:

$$C_{ij} = \max_{s,d} \{ C^{sd}_{ij} \}. \tag{11}$$
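The heuristic of Eqs. (9)-(11) can be sketched in a few lines of Python. The data layout (dictionaries keyed by link and by source/destination pair) and all names are assumptions made for illustration; `beta` is expected in the units of Eq. (9), i.e., seconds per bit when capacities are in bps.

```python
import math

def ca_heuristic(paths, flow, length, beta):
    """Per-path closed-form capacities (Eq. 10) merged by a maximum (Eq. 11).

    paths:  {(s, d): [link, ...]}   links composing each path
    flow:   {link: f_ij}            link flows in bps
    length: {link: d_ij}            link lengths
    beta:   {(s, d): beta_sd}       delay budget of Eq. (9)
    Returns {link: C_ij} for every link used by at least one path.
    """
    capacity = {}
    for sd, links in paths.items():
        root_sum = sum(math.sqrt(length[l]) for l in links)
        for l in links:
            c_sd = flow[l] + root_sum / (beta[sd] * math.sqrt(length[l]))
            capacity[l] = max(capacity.get(l, 0.0), c_sd)
    return capacity

# Hypothetical two-path example sharing link "b".
paths = {("s", "d"): ["a", "b"], ("s", "e"): ["b"]}
flow = {"a": 1e6, "b": 2e6}
length = {"a": 100.0, "b": 400.0}
beta = {("s", "d"): 1e-6, ("s", "e"): 1e-7}
print(ca_heuristic(paths, flow, length, beta))
```

Taking the maximum over paths can only increase capacities, so every single-path delay constraint remains satisfied; the price is that the result is in general more expensive than the optimum of the joint convex problem.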

3.2. The Buffer Assignment problem

A second step consists of dimensioning buffer sizes, i.e., solving the following optimization problem:

$$Z_{BA} = \min \sum_{i,j} h(B_{ij}) \tag{12}$$

subject to:

$$\sum_{i,j} \delta^{sd}_{ij}\, p(B_{ij}, C_{ij}, f_{ij}, [X]) \le P_{loss} \quad \forall (s,d), \tag{13}$$

$$B_{ij} \ge 0 \quad \forall (i,j). \tag{14}$$

The objective function (12) represents the total buffer cost, which is the sum of the buffer cost functions, $h(B_{ij}) = B_{ij}$. Eq. (13) is the loss probability constraint for each source/destination node pair. It says that the total loss probability experienced by all the flows routed on the path (s, d) should not exceed the maximum fixed Ploss. Here, $p(B_{ij}, C_{ij}, f_{ij}, [X])$ is the average loss probability for the M[X]/M/1/B queue, which is evaluated by solving its Continuous Time Markov Chain (CTMC). Constraints (14) are non-negativity constraints. In the previous formulation we have considered the following upper bound on the loss probability for path (s, d) (constraint (13)):

$$\hat{P}_{loss} = 1 - \prod_{i,j} \left( 1 - \delta^{sd}_{ij}\, p(B_{ij}, C_{ij}, f_{ij}, [X]) \right) \le \sum_{i,j} \delta^{sd}_{ij}\, p(B_{ij}, C_{ij}, f_{ij}, [X]). \tag{15}$$

Notice also that the first part of Eq. (15) is based on the assumption that link losses are independent. Consequently, the solution of the BA problem is a conservative solution to the full problem. The above stated BA problem is a convex optimization problem [10], and its global minimum can be found using standard convex programming techniques [16].

3.2.1. Numerical examples and simulations

In this section we present some numerical results, which correspond to the solution of selected CA and BA problems (here we used the logarithmic barrier method). In order to validate our designs, we ran simulation experiments using the ns-2 software package [17].

As a first example, we present results obtained considering the multi-bottleneck mesh network shown in Fig. 3. The network topology comprises 5 nodes and 12 links. In this case, link propagation delays are all equal to 0.5 ms, which corresponds to a link length of 150 km. Fig. 3 indicates link identifiers, link routing weights (in parentheses), and traffic requirements. Routing weights are chosen in order to have one single path for every source/destination pair. We consider a mixed traffic scenario where the file size (ranging from 1 to 195 packets) follows the distribution reported in [4]. We choose, for this case, the following TCP QoS constraints: (i) latency Lt ≤ 0.5 s for files shorter than 20 packets, (ii) throughput Th ≥ 512 kbps for files longer than 20 packets, and (iii) Ploss = 0.01. Using the transport-layer QoS translator we obtain the equivalent network-layer performance constraint RTT ≤ 0.07 s (for the sake of simplicity, in the examples we will consider $RTT_{sd} = RTT$, ∀(s, d)). The CA and BA problems associated with this network have 12 unknown variables and 11 constraint functions (we have discarded nine redundant constraint functions).

In order to obtain some comparisons, we also implemented a design procedure using the classical formula (see [1,2]), which considers an M/M/1 queue model in the CA problem. We also extended the classical approach to the BA problem, which is solved considering M/M/1/B queues. We imposed the same constraints in the classical approach.

In Fig. 4, it can be immediately noticed that considering the burstiness of IP traffic radically changes the network design. The link utilization factors have an average equal to about $\bar{\rho} = 0.8$, and buffer sizes have average $\bar{B} = 175$, which is about 4 times the average number of packets in the queue (40 packets). Indeed, the link utilizations obtained with our methodology are much smaller than those produced by the classical approach, and buffers are much larger. This is due to the bursty arrival process of IP traffic, which is well captured by the M[X]/M/1/B model.

Fig. 3. 5-Node network: topology and traffic requirements. [Topology drawing with 5 nodes and 12 links; link identifiers with routing weights in parentheses: 1 (8), 2 (5), 3 (5), 4 (6), 5 (6), 6 (3), 7 (3), 8 (3), 9 (5), 10 (8), 11 (10), 12 (9).]

Traffic Matrix [Mbps]

O/D   1   2   3   4   5
1     0   7   9   8   3
2     9   0   3   9   2
3     4   1   0   8   7
4     8   1   6   0   2
5     3   8   4   9   0

Fig. 4. Link utilization factor and buffer size for a 5-node network. [Two plots over the links: utilization factor for the M/M/1 and M[X]/M/1 designs, and buffer length [pkt] for the M/M/1/B and M[X]/M/1/B designs.]

Fig. 5. Model and simulation results for latency; 3-link path from the 5-node network. [File transfer latency [s] versus flow length [pkt] for the CSA model and for the DropTail and RED simulations.]

To validate the design methodology, we ran ns-2 simulations for drop-tail and RED buffers (optimal values for the RED [18] parameters are obtained according to the procedure given in [10,11]). We assume that New Reno is the TCP version of interest. In addition, we assume that TCP connections are established choosing at random a server–client pair, and are opened at instants described by a Poisson process. Connection opening rates are determined so as to set the link flows to their desired values. The packet size is assumed constant, equal to the maximum segment size (MSS); the maximum window size is assumed to be 32 segments. We report detailed results selecting traffic from node 4 to node 1, which is routed over one of the most congested paths (three hops, over links 8, 7, 6). Fig. 5 plots the file transfer latency for all file sizes for the selected source/destination pair. The QoS constraint of 0.5 s for the maximum latency is also reported. We can see that model results and simulation estimates are in perfect agreement with the specifications, the constraints being perfectly satisfied for all files shorter than 20 packets. Note also that longer flows obtain a much higher throughput than the target, because the file transfer latency constraint is more stringent (as also shown in Fig. 2). It is important to observe that a network dimensioned using the classical approach cannot satisfy all the QoS constraints.

As a second example of multi-bottleneck topology we chose a network comprising 10 nodes and 24 links. Link propagation delays are uniformly distributed between 0.05 and 0.5 ms, i.e., link lengths vary between 15 km and 150 km. The traffic requirement matrix is set to obtain an average link flow of about 15 Mbps. The CA and BA problems associated with this network have 24 unknown variables and 66 constraint functions (we have discarded 24 redundant constraint functions). We considered the same design target QoS parameters as for the previous example.

In order to observe the impact of traffic load and performance constraints on our design methodology, we performed several numerical experiments. Fig. 6 shows the range of network link utilization as a function of traffic load (first plot). Looking at how traffic requirements impact the CA problem, we observe that the larger the traffic load, the higher the utilization factor. This is quite intuitively explained by a higher statistical multiplexing gain, and by the fact that the RTT is less affected by the transmission delay of packets at higher speed. The behavior of buffer sizes as a function of traffic requirements is shown in the second plot. As expected, the larger the traffic load, the larger the buffer space needed in the queues.
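To give a feel for how buffer requirements grow with link utilization, the sketch below sizes a buffer under the classical M/M/1/B baseline mentioned in Section 3.2.1. It uses the textbook blocking formula for an M/M/1/B queue, not the M[X]/M/1/B model adopted in this paper, and it applies the loss target to a single link, which is an assumption made for brevity; B is taken as the total number of packets the queue can hold, and the function names are hypothetical.

```python
def mm1b_loss(rho, B):
    """Blocking probability of an M/M/1/B queue holding at most B packets."""
    if rho == 1.0:
        return 1.0 / (B + 1)
    return (1.0 - rho) * rho ** B / (1.0 - rho ** (B + 1))

def min_buffer(rho, target_loss, b_max=100000):
    """Smallest B whose blocking probability does not exceed target_loss."""
    for B in range(1, b_max + 1):
        if mm1b_loss(rho, B) <= target_loss:
            return B
    raise ValueError("no buffer size up to b_max meets the target")

# Buffer needed for a 1% per-link loss target at increasing utilization.
for rho in (0.5, 0.7, 0.8, 0.9):
    print(rho, min_buffer(rho, 0.01))
```

Under Poisson arrivals the required buffer stays small even at high utilization, which is the behavior the paper contrasts with the much larger buffers obtained when batch arrivals are taken into account.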

Fig. 6. Link utilization factor and buffer length for a 10-node network (considering different source/destination traffic). [Two plots versus source/destination average traffic [Mbps]: link utilization and buffer length [pkt], each showing the average, minimum and maximum over the links.]

Fig. 7. Link utilization factor and buffer length for a 10-node network (considering different target file transfer latencies). [Two plots versus file transfer latency [s]: link utilization and buffer length [pkt], each showing the average, minimum and maximum over the links.]

The impact of more stringent QoS requirements is considered in Fig. 7 (Ploss = 0.01, average link traffic = 15 Mbps). Notice that, in order to satisfy a very tight constraint (file latency Lt ≤ 0.2 s), it is necessary to have a utilization factor close to 20% on some particularly congested links (first plot). Tight constraints require small packet delays and thus capacities that are large relative to the link flows. On the contrary, relaxing the QoS constraints, we note a general increase in the link utilization, up to 90%. The behavior of buffer sizes as a function of file transfer latency requirements is shown in the second plot. We see that stringent QoS requirements force small values for buffer sizes.

Finally, Fig. 8 shows link utilization and buffer sizes considering different packet-loss probability constraints, while keeping fixed the file transfer latency Lt ≤ 2 s and throughput Th ≥ 512 kbps (average link traffic = 15 Mbps). Obviously, an increase of Ploss values forces the transport-layer QoS translator to reduce the RTT to meet the QoS constraints. As a consequence, the utilization factor decreases (first plot). More interesting is the effect of selecting different values of Ploss on buffer sizes (second plot). Indeed, to obtain Ploss ≤ 0.005, buffers larger than 350 packets are required, while Ploss ≤ 0.02 can be guaranteed with buffers shorter than 70 packets. This result stems from the correlation of TCP traffic and is not captured by a Poisson model. Simulations using ns-2 confirm that the target QoS parameters are met in all cases.

3.3. The capacity and flow assignment problem

Traditionally, packet-network design focused on optimizing either network cost or performance by tuning link capacities and routing strategies. Since the routing and link capacity optimization problems are closely interrelated, it is appropriate to solve them jointly in what is called the Capacity and Flow Assignment (CFA) problem.

[Fig. 8: two plots. First plot: link utilization (average, min, max) versus packet loss probability; second plot: buffer length [pkt] (average, min, max) versus packet loss probability.]

Fig. 8. Link utilization factor and buffer length for a 10-node network (considering different target packet-loss probabilities).

Our goal is to determine a route for the traffic that flows on each source/destination pair and the link capacities in order to minimize the network cost subject to the maximum allowable e2e packet delay. Let $\kappa_{ij}^{sd}$ be a decision variable which is one if link (i, j) is in path (s, d) and zero otherwise. Thus the CFA problem is formulated as the following optimization problem:

$$Z_{CFA} = \min \sum_{i,j} g(d_{ij}, C_{ij}) \tag{16}$$

subject to:

$$\sum_{j} \kappa_{ij}^{sd} - \sum_{j} \kappa_{ji}^{sd} = \begin{cases} 1 & \text{if } i = s \\ -1 & \text{if } i = d \\ 0 & \text{otherwise} \end{cases} \qquad \forall (i, s, d) \tag{17}$$

$$K_1 \sum_{i,j} \frac{\kappa_{ij}^{sd}}{C_{ij} - f_{ij}} \le RTT_{sd} - K_2 \sum_{i,j} \kappa_{ij}^{sd} d_{ij} \qquad \forall (s, d) \tag{18}$$

$$f_{ij} = \sum_{s,d} \kappa_{ij}^{sd} \gamma_{sd} \qquad \forall (i, j) \tag{19}$$

$$C_{ij} \ge f_{ij} \ge 0 \qquad \forall (i, j) \tag{20}$$

$$\kappa_{ij}^{sd} \in \{0, 1\} \qquad \forall (i, j),\ \forall (s, d) \tag{21}$$

The objective function (16) represents the total link cost, which is a linear function of both the link capacity and the physical length. Constraint set (17) enforces flow conservation, defining a route for the traffic from a source s to a destination d. Eq. (18) is the e2e packet delay constraint for each source/destination pair. Eq. (19) defines the average data flow on the link. Constraints (20) and (21) are non-negativity and integrality constraints, respectively. Finally, $K_1 = K/\mu$ and $K_2$ is a constant to convert distance into time.

We notice that this problem is a nonlinear, non-convex mixed-integer programming problem. Other than the nonlinear constraint (18), it is basically a multi-commodity flow problem [19]. Multi-commodity flow belongs to the class of NP-hard problems, for which no polynomial-time algorithms are known [20]. In addition, because the problem is non-convex, there are in general a large number of local minima. Therefore, in this paper we only discuss suboptimal CFA solutions.

In [11] we proposed a composite upper and lower bounding procedure based on a Lagrangean relaxation [22] of the CFA problem. The purpose is to obtain a relaxed problem, called the Lagrangean subproblem, which is easier to solve than the original problem. The objective value of the Lagrangean relaxation provides a lower bound (LB), in the case of minimization, on the optimal solution of the original problem. The best lower bound can be derived by solving the Lagrangean dual. To solve the dual problem we used a subgradient optimization technique [23]. Information obtained from the Lagrangean relaxation is then used by application-dependent heuristics to construct feasible solutions to the original problem, i.e., a primal heuristic (PH). In order to permit some comparisons, we also apply a logarithmic barrier CA solution with minimum-hop routing (MinHop + CA), i.e., we simply ignore the routing optimization when solving the CA problem.
Another approach is described in the following subsection.
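To make the formulation concrete, the following minimal Python sketch evaluates objective (16) and checks constraints (18)–(20) for one candidate routing and capacity assignment. It is only an illustration under stated assumptions: the cost function is assumed to have the form g(d_ij, C_ij) = d_ij C_ij (the text only states that the cost is linear in capacity and length), each route is represented as a node path (so that flow conservation (17) holds by construction), and the function names `link_flows` and `evaluate_cfa` as well as all numerical values (demands, capacities, K1, K2) are hypothetical.

```python
from typing import Dict, List, Tuple

Link = Tuple[int, int]  # directed link (i, j)
Pair = Tuple[int, int]  # source/destination pair (s, d)


def link_flows(routes: Dict[Pair, List[int]],
               demand: Dict[Pair, float]) -> Dict[Link, float]:
    """Eq. (19): sum the demand of every pair whose path uses link (i, j)."""
    flows: Dict[Link, float] = {}
    for pair, path in routes.items():
        for link in zip(path, path[1:]):
            flows[link] = flows.get(link, 0.0) + demand[pair]
    return flows


def evaluate_cfa(routes: Dict[Pair, List[int]],
                 demand: Dict[Pair, float],
                 capacity: Dict[Link, float],
                 length: Dict[Link, float],
                 rtt_max: Dict[Pair, float],
                 k1: float, k2: float) -> Tuple[float, bool]:
    """Return (cost, feasible) for one candidate routing and capacity assignment."""
    flows = link_flows(routes, demand)
    # Objective (16), ASSUMING g(d_ij, C_ij) = d_ij * C_ij.
    cost = sum(length[link] * cap for link, cap in capacity.items())
    # Constraint (20), taken strictly so that 1 / (C_ij - f_ij) stays finite.
    if any(capacity.get(link, 0.0) <= flow for link, flow in flows.items()):
        return cost, False
    # Constraint (18), checked for every source/destination pair.
    for pair, path in routes.items():
        links = list(zip(path, path[1:]))
        queueing = k1 * sum(1.0 / (capacity[lk] - flows[lk]) for lk in links)
        propagation = k2 * sum(length[lk] for lk in links)
        if queueing + propagation > rtt_max[pair]:
            return cost, False
    return cost, True


# Hypothetical 3-node example: two demands share link (0, 1).
demand = {(0, 2): 5.0, (0, 1): 3.0}            # Mbps
routes = {(0, 2): [0, 1, 2], (0, 1): [0, 1]}   # node paths
capacity = {(0, 1): 10.0, (1, 2): 8.0}         # Mbps
length = {(0, 1): 100.0, (1, 2): 100.0}        # km
rtt_max = {(0, 2): 0.032, (0, 1): 0.032}       # s
cost, feasible = evaluate_cfa(routes, demand, capacity, length,
                              rtt_max, k1=0.01, k2=5e-6)
print(cost, feasible)
```

A search procedure for the CFA problem would call such an evaluation repeatedly while changing the routes and capacities; the sketch itself performs no optimization.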

3.4. The Greedy Weight Flow Deviation method

The classical Flow Deviation (FD) method is well known to solve CFA problems [1,2]. In this section we present a heuristic, based on the FD method, to solve the CFA problem presented in Section 3.3. Considering that no closed-form expression for the optimal capacities can be derived from our CFA formulation, we proceeded in the following way. First, it is straightforward to show that the link weights in the original FD method are given by

$$L_{ij} = \frac{d_{ij} C_{ij}}{f_{ij}} \tag{22}$$

Second, in order to enforce e2e QoS delay performance constraints, the link capacities $C_{ij}$ must be obtained using the CA solver presented in Section 3.1.1. As our new method relies on the greedy nature of the CA solver algorithm to direct computations toward a local optimum, we called it the Greedy Weight Flow Deviation (GWFD) method. As noted before, the CFA problem admits several local minima. A way to obtain a more accurate estimate of the global minimum is to restart the procedure using random initial flows. However, we obtained very good results setting $L_{ij} = d_{ij}$ as the initial trial. The following is a description in pseudo-code of the GWFD method:

Greedy Weight Flow Deviation method:

Given: feasible f^0 and C^0; f* = f^0; C* = C^0; p = 0
Repeat
    1. Compute link weights L^p
    2. Compute minimum-weight paths
    3. Compute flows f^{p+1}
    4. Solve CA problem and obtain C^{p+1}
    5. If D(C^{p+1}) ≥ D(C^p)
           Stop
       Else
           (a) f* = f^{p+1}; C* = C^{p+1}
           (b) p = p + 1
       End Else
End Repeat
End

It must be noted that the problem represented by the formulation (16)–(21) and the problem addressed by the GWFD algorithm are not exactly the same. In fact, the traffic routing solutions resulting from the GWFD algorithm are minimum-weight paths, whereas those resulting from the CFA formulation are not necessarily minimum-weight paths.

3.4.1. Numerical examples

In this section we present results obtained considering several fixed topologies (40 nodes and 160 links each), which have been generated using the BRITE topology generator [21] with the router-level option. Link propagation delays are uniformly distributed between 0.5 ms and 1.5 ms, i.e., link lengths vary between 100 km and 300 km. Random traffic matrices were generated by picking the traffic intensity of each source/destination pair from a uniform distribution. The average source/destination traffic requirement was set to $\hat{\gamma}_{sd} = 5$ Mbps. For all source/destination pairs, the target QoS constraints are: (i) latency $L_t \le 0.2$ s for files shorter than 20 segments, (ii) throughput $T_h \ge 512$ kbps for files longer than 20 segments, and (iii) $P_{loss} = 0.001$. Using the transport-layer QoS translator (Section 2), we obtain the equivalent network-layer performance constraint $RTT \le 0.032$ s for all source/destination node pairs. Our goal is to obtain routing and link capacities. For each topology, we solved both the related CFA and BA problems.

[Fig. 9: network cost versus network ID (1 to 10) for $L_t < 0.2$ s, $\gamma = 5$ Mbps; curves: Min Hop + CA, GWFD, PH, LB.]

Fig. 9. Network cost for 40-node network random topologies.

[Fig. 10: network cost versus file transfer latency [s] for $\gamma = 5$ Mbps; curves: GWFD, LB.]

Fig. 10. Network costs as a function of latency constraint (40-node, 160-link network).

Fig. 9 shows network costs for 10 different topologies. The GWFD solutions are compared to solutions from three other techniques (LB, PH, and MinHop + CA) [11]. We can observe that the GWFD solutions, for all considered topologies, always fall rather close to the lower bound (LB). The gap between GWFD and LB is about 13%. In addition, the GWFD algorithm is faster than the primal heuristic (PH) approach, needing only 5 s of CPU time to solve an instance with 40 nodes, while obtaining very similar results. Not optimizing the flow assignment subproblem results in more expensive solutions, as shown by the "Min Hop" routing associated with an optimized CA problem. This underlines the need to solve the CFA problem rather than a simpler CA problem.

A second set of experiments was performed to investigate the impact of the latency constraints on the optimized network cost. Fig. 10 shows the LB and GWFD results for latency constraint values ranging from 0.2 s to 1.0 s. The plots clearly show the trade-off between cost and latency; as expected, costs grow when the latency constraints become tighter. It is interesting to observe that when the latency constraints become very tight (latencies become close to zero), the sensitivity of the network cost increases.

3.5. The Topology, Capacity and Flow Assignment problem

In this section the objective is to determine an inexpensive solution to interconnect nodes, and to assign flows and capacities, while satisfying the reliability and e2e QoS constraints. This problem is called the Topological, Capacity and Flow Assignment (TCFA) problem. This is a complex combinatorial optimization problem, which can be classified as NP-complete [20]. Polynomial algorithms which can find the optimal solution for this problem are not known. Therefore, heuristic algorithms are applied to search for good solutions. We analyze two metaheuristic approaches to address the topological design problem: the Genetic Algorithm (GA) [24,25] and the Tabu Search (TS) algorithm [26]. GAs are heuristic search procedures which apply natural genetic ideas such as natural selection, mutations and survival of the fittest. The TS algorithm is derived from the classical Steepest Descent method; however, thanks to an internal mechanism that accepts worse solutions than the best solution found so far, it is less subject to local optima entrapment. Details about the application of GA and TS algorithms to topological design can be found in Appendix A.

The TCFA problem can be formulated as follows: given the geographical location of the network nodes on the territory, the traffic matrix, and the capacity costs, minimize the total link cost by choosing the network topology and selecting link flows and capacities, subject to QoS and reliability constraints. As reliability constraint we consider that all traffic must be exchanged even if a single node fails (2-connectivity). There is a trade-off between reliability and network cost: more links between nodes imply more routes between each node pair, and consequently the network is more reliable; on the other hand, the network is more expensive. Finally, the QoS constraints correspond to maintaining the e2e packet delay for each network source/destination pair below a maximum tolerable value.

Our solution approach is based on the exploration of the solution space (i.e., 2-connected topologies) using the metaheuristic algorithms. As the goal is to design a network that remains connected despite one node failure, for each topology evaluation we actually construct n different topologies, each obtained from the topology under evaluation by the failure of one node, and then for each of them we solve its related CFA problem (using the GWFD method). Link capacities are set to the maximum capacity value found so far considering the set of topologies. Using the capacities obtained, the objective function (network cost) is computed.

3.5.1. Numerical examples and simulations

In this section we present some selected numerical results considering network designs obtained with the metaheuristic approaches. We consider the same mixed traffic scenario where the file size follows the distribution shown in [4]. A first set of experiments was performed to investigate the performance of GA and TS algorithms. As a first example, we applied the proposed methodology to set up a 15-node VPN network over a given 35-node, 140-link physical topology. The target QoS constraints for all source/destination pairs are: (i) file latency $L_t \le 1$ s for files shorter than 20 segments, (ii) throughput $T_h \ge 512$ kbps for files longer than 20 segments. Selecting $P_{loss} = 0.01$, we obtain a network-level design constraint equal to $RTT \le 0.15$ s for all source/destination pairs. The average source/destination traffic requirement was set to $\hat{\gamma}_{sd} = 3$ Mbps. Link lengths vary between 15 km and 300 km (average = 225 km).

[Fig. 11: network cost versus computational time [s] for the 35-node network; curves: GA NI = 80, GA NI = 160, TS TL = 10, TS TL = 20.]

Fig. 11. Network cost as a function of GA and TS computational time (35-node network).

Fig. 11 shows the network cost as a function of computational time (in seconds) considering both the GA and TS algorithms (for different values of population (NI) and Tabu list (TL) sizes). We notice that, after a period of 1.38 h, the solution values differ by at most 2.5%. The best solution is given by the GA with NI = 160 individuals (in contrast, the same solution value was reached by the TS, with TL = 20 moves, after a period of 12 h). The GA with a small population quickly stagnates, and its solution value is "relatively poor". Using a population size of NI = 160 (value suggested by the estimate given in Appendix A.1) the GA makes slower progress but consistently discovers better solutions. If the population size increases further, the GA becomes slower without, however, being able to improve the solutions (not shown in the figure).

In the second set of experiments, we analyze the impact of the traffic scenario on the obtained network topology. We consider the dimensioning of a VPN network over a given 10-node, 40-link physical topology. The target QoS constraints for all source/destination pairs are the same used in the first experiment, and the link lengths vary between 140 km and 760 km (average = 380 km). Two traffic scenarios are considered. In the first scenario, source/destination traffic is randomly generated from a uniform distribution with average value $\hat{\gamma}_{sd} = 1$ Mbps. Using the GA approach, the final topology is shown in Fig. 12(A). Solid lines correspond to the chosen links that synthesize the VPN network topology (dashed lines are existing links that are not chosen for the VPN topology). We notice that several connections are needed to guarantee the network 2-connectivity. In the second case, traffic relations are set as follows: two nodes offer an average aggregated traffic equal to 5 Mbps (nodes 3 and 6 in Fig. 12), one node offers 2 Mbps (node 4), and the rest offer traffic equal to 1 Mbps. From Fig. 12(B) we see that three new links were added in order to drain off the increased traffic from nodes 3 and 6, while a link between nodes 2 and 5 was removed.

[Fig. 12: two network topologies, panels (A) and (B), over nodes 1 to 10.]

Fig. 12. Network topologies for two different traffic scenarios: (A) uniform distribution and (B) non-uniform distribution.

In order to validate the network design, we compare the target performance parameters against the performance measured from very detailed simulation experiments (using the ns-2 simulator). We performed packet-level simulations to check whether the e2e QoS constraints are actually met. In this case, we completed the design of the network shown in Fig. 12(A) by solving its associated BA problem with drop-tail buffers. As one example, we ran path simulations considering the path that connects nodes 10, 6, 2, 8 and 3. Table 1 reports the optimal values for capacities and buffer sizes; it also shows link flow values for two working scenarios: (i) normal network operation, and (ii) failure of network node 5 (in this case some links must transport an increased traffic flow, $f'$). We notice that, in this example, the path from node 10 to node 3 is the same in both the normal and the node 5 failure cases.

Table 1
Dimensioning for a 10-node network (values for a 4-link path)

Link    f (Mbps)    f' (Mbps)    C (Mbps)    B (pkt)    d (km)
10–6    35.6        44.7         47.3        656        485
6–2     82.1        67.3         88.6        222        380
2–8     57.3        62.4         67.2        587        155
8–3     43.7        82.6         87.0        756        270
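As a small worked example, the values of Table 1 can be used to compare the utilization and the spare capacity of each link in the two working scenarios. The Python sketch below only re-expresses the tabulated numbers; the variable names are ours, and "spare capacity" is taken here simply as capacity minus flow.

```python
# Table 1 values in Mbps: link -> (f, f_fail, C).
table1 = {
    "10-6": (35.6, 44.7, 47.3),
    "6-2": (82.1, 67.3, 88.6),
    "2-8": (57.3, 62.4, 67.2),
    "8-3": (43.7, 82.6, 87.0),
}

# Per-link utilization in normal operation and with node 5 failed.
for link, (f, f_fail, c) in table1.items():
    print(f"{link}: normal {f / c:.2f}, node 5 failed {f_fail / c:.2f}")

# Total spare capacity (C - flow) along the path in each scenario.
spare_normal = sum(c - f for f, _, c in table1.values())
spare_failure = sum(c - f_fail for _, f_fail, c in table1.values())
print(f"spare capacity: normal {spare_normal:.1f} Mbps, "
      f"failure {spare_failure:.1f} Mbps")
```

Every capacity exceeds the flow in both scenarios, and the total spare capacity along the path is larger in normal operation, even though link 6–2 carries less traffic when node 5 fails.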

[Fig. 13: file transfer latency [s] versus flow length [pkts]; curves: CSA Model (fault), NS (fault), NS (no fault).]

Fig. 13. Model and simulation results for latency; 4-link path from the 10-node network.

Fig. 13 plots the file transfer latency for all flow size classes for the selected source/destination pair. The QoS constraint of 1 s for the maximum latency is reported, as well as model results. We can see that the results are in perfect agreement with the specifications, the constraints being perfectly satisfied for all files shorter than 20 packets, for both working scenarios. Note also that in the case of normal network operation the file transfer latency has smaller values, resulting from the greater gap between link capacities and flows. As noted before, longer flows obtain a much higher throughput than the desired one.

4. Conclusion

In this paper, we have considered the QoS and reliability design of packet networks, presenting mathematical formulations and introducing a collection of heuristic algorithms to compute approximate solutions. Two important elements are considered in our approach: (a) the mapping of the e2e QoS constraints into transport-layer performance constraints first, and then into network-layer performance constraints; and (b) a refined TCP/IP traffic modeling technique that is both simple and capable of producing accurate performance estimates for general-topology packet-switching networks loaded by realistic traffic patterns. By explicitly considering TCP traffic, we also need to consider the impact of finite buffers, therefore facing the Buffer Assignment problem. To the best of our knowledge, no previous work solves packet-network design problems accounting for user-layer e2e QoS constraints and considering more realistic traffic models.

The numerical results have shown that the burstiness of IP traffic radically changes the network design. Indeed, the link utilization obtained with our approach is much smaller than that produced by the classical approach, and the buffers are much longer. This is due to the bursty arrival process of IP traffic, which is well captured by the M[X]/M/1/B model. On the other hand, the capacity assignment using the classical approach cannot satisfy all the QoS constraints.

In addition, network costs can be reduced by the joint optimization of routing and link capacities, since these are closely interrelated. To this end, we have proposed a new CFA algorithm, called GWFD, that is capable of assigning flows and capacities under e2e QoS constraints. The proposed GWFD method can solve CFA instances in a fast and quite accurate way. Based on the GWFD method we have proposed a practical way to solve the topological design problem with e2e QoS and reliability constraints. This approach, which considers GA and TS metaheuristics, while not necessarily an original idea, represents a pragmatic solution to the problem. Computational results suggest a better efficiency of the GA approach in providing good solutions for medium-sized computer networks, in comparison with well-tried conventional TS methods.

In order to validate the proposed methodology, we have compared results against detailed simulation experiments (using ns-2 software) in terms of network performance. The target QoS performance is met in all cases.

Acknowledgement

The authors would like to thank the anonymous reviewers for their helpful comments and suggestions.


Appendix A. Applying metaheuristic algorithms to network design

A.1. Genetic algorithms

Genetic algorithms (GAs), as powerful and broadly applicable stochastic search and optimization techniques, are the most widely known types of evolutionary computation methods today. The basic principles of GAs were first established rigorously by Holland [24]. Selection, Genetic Operation and Replacement, derived directly from natural evolution mechanisms, are applied to a population of solutions, thus favoring the birth and survival of the best solutions. GAs are generally good at finding "acceptably good" solutions to problems in "reasonable" computing times.

In general, a genetic algorithm has five basic components: (a) an encoding method, that is, a genetic representation (genotype) of solutions to the problem; (b) a way to create an initial population of individuals (chromosomes); (c) an evaluation function, rating solutions in terms of their fitness, and a selection mechanism; (d) the genetic operators (crossover and mutation) that alter the genetic composition of offspring during reproduction; and (e) values for the parameters of the genetic algorithm.

In GAs, a population of NI solutions (individuals) is created initially. Then, by using genetic operators, a new generation is evolved. The fitness of each individual determines whether it will survive or not. The individuals in the current population are then replaced by their offspring, based on a certain replacement strategy. After a number of generations NG, or when some other criterion is met, it is hoped that a near-optimal solution has been found.

Genotype: The genetic encoding of an individual is called a genotype and the corresponding physical appearance of an individual is called a phenotype. A gene in a chromosome is characterized by two factors: locus, the position of the gene within the structure of the chromosome, and allele, the value the gene takes. The allele can be encoded as binary, a real number, or other forms, and its range is usually defined by the problem. Manipulation of topologies with the genetic algorithm requires that they are represented in some suitable format. Although more compact alternatives are possible, the connectivity matrix of a graph is an adequate representation. A graph is represented by an n × n binary matrix, where n is the number of nodes of the graph. The value of each element in row i and column j, a "1" or a "0", tells whether or not a specific edge connects the pair of nodes (i, j).

Fitness evaluation: The evaluation of the objective function is usually the most demanding part of a GA because it has to be done for every individual in every generation. In this paper, the fitness function, which is an estimate of the goodness of the solution for the topological design problem, is inversely proportional to the objective function value (cost). The lower the cost, the better the solution is.

Selection: Parent selection emulates the survival-of-the-fittest mechanism in nature. It is important to ensure some diversity among the population by breeding from a selection of the fitter individuals, rather than just from the fittest. The selection process used here is based on the tournament method. In this approach pairs of individuals are picked at random and the one with the better fitness (the one which "wins the tournament") is used as one parent. The tournament selection is then repeated on a second pair of individuals to find the other parent from which to breed.

Crossover: Crossover is a recombination operator that combines subparts of two parent chromosomes to produce offspring that contain some parts of both parents' genetic material. The probability, pc, that these chromosomes are recombined (mated) is a user-controlled option and is usually set to a high value (e.g., 0.95). Unfortunately, with the genotype representation used here, plain crossover operators are not suitable for the recombination of two individuals (the crossover operation mostly leads to illegal individuals). In this paper, we first use a simple one-point crossover operator. Then, an effective and fast check and recovery algorithm is used to repair the illegal individuals. If the repair operation is unsuccessful, the parents are considered as crossover outputs.

Mutation: Mutation is used to change, with a fixed small probability, pm, the value of a gene (a bit) in order to avoid the convergence of the solutions to "bad" local optima. As a population evolves, there is a tendency for genes to become dominant. Mutation is therefore important to "loosen up" genes which would otherwise become fixed. In our experiments good results were obtained using a mutation operator that simply changes one bit, picked at random, for each produced offspring.
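The selection, crossover and mutation operators described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the genome is assumed to be the list of upper-triangle bits of the connectivity matrix (one bit per node pair), `cost` and `repair` are hypothetical user-supplied callables (the paper does not detail its check and recovery algorithm), and `repair` is assumed to return `None` when a child cannot be repaired.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Genome = List[int]  # one bit per node pair (i < j) of the connectivity matrix


def tournament(population: Sequence[Genome],
               cost: Callable[[Genome], float],
               rng: random.Random) -> Genome:
    """Tournament selection: draw a random pair, keep the cheaper individual."""
    a, b = rng.sample(list(population), 2)
    return a if cost(a) <= cost(b) else b


def crossover(p1: Genome, p2: Genome,
              repair: Callable[[Genome], Optional[Genome]],
              rng: random.Random,
              pc: float = 0.95) -> Tuple[Genome, Genome]:
    """One-point crossover with probability pc, followed by repair.

    If either child cannot be repaired, the parents are returned unchanged.
    """
    if rng.random() >= pc:
        return list(p1), list(p2)
    cut = rng.randrange(1, len(p1))
    children = [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
    repaired = [repair(child) for child in children]
    if any(r is None for r in repaired):
        return list(p1), list(p2)
    return repaired[0], repaired[1]


def mutate(genome: Genome, rng: random.Random) -> Genome:
    """Flip exactly one randomly chosen bit of an offspring."""
    child = list(genome)
    k = rng.randrange(len(child))
    child[k] = 1 - child[k]
    return child


# Hypothetical usage: cost = number of links, repair = accept every child.
rng = random.Random(1)
parents = [[1, 0, 1, 1, 0, 1], [0, 1, 1, 0, 1, 1], [1, 1, 1, 1, 1, 1]]
mother = tournament(parents, sum, rng)
father = tournament(parents, sum, rng)
offspring = [mutate(c, rng) for c in crossover(mother, father, lambda g: g, rng)]
print(offspring)
```

In a real design loop `cost` would be the network cost of the topology and `repair` would restore 2-connectivity; both are left abstract here.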


Replacement strategies: In this paper we use the generational-replacement method with elitist strategy: once the offspring population has been generated, it is merged with the parents' population according to the following rule: only the best individuals present in both the offspring and the parents' populations enter the new population. The elitist strategy may increase the speed of domination of a population by a super chromosome, but on balance it appears to improve the performance.

Population size: GAs work surprisingly well with quite small populations. Nevertheless, if the population is too small, there is an increased risk of convergence to a local optimum; the variety that drives a genetic algorithm's progress cannot be maintained. As the population size increases, the GA discovers better solutions more slowly; it becomes more difficult for the GA to propagate good combinations of genes through the population and join them together. A simple estimate of the appropriate population size for topological optimization is given by the following expression:

$$N_I \ge \frac{\log\left(1 - p^{1/q}\right)}{\log\left(1 - \frac{q}{n(n-1)}\right)} \tag{A.1}$$
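As a numerical check of (A.1), the short Python sketch below evaluates the right-hand side of the bound, with n the number of nodes, q the number of links and p the target probability. The function name `population_size_bound` is ours, and rounding to the nearest integer is an assumption about how the tabulated values were obtained; under that assumption the sketch reproduces the entries of Table A.1.

```python
import math


def population_size_bound(n: int, q: int, p: float) -> float:
    """Right-hand side of (A.1): log(1 - p**(1/q)) / log(1 - q / (n * (n - 1)))."""
    return math.log(1.0 - p ** (1.0 / q)) / math.log(1.0 - q / (n * (n - 1)))


# Entries of Table A.1: (n, q) -> values for p = 0.990, 0.995, 0.999.
table_a1 = {
    (10, 20): (30, 33, 39),
    (25, 50): (98, 106, 124),
    (35, 70): (146, 157, 184),
}

for (n, q), tabulated in table_a1.items():
    computed = tuple(round(population_size_bound(n, q, p))
                     for p in (0.990, 0.995, 0.999))
    print((n, q), "computed:", computed, "tabulated:", tabulated)
```

For the 35-node case the bound lies between roughly 146 and 184 depending on p, which is in line with the population size NI = 160 used in the experiments.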

In Eq. (A.1), $N_I$ is the population size, n is the number of nodes, q is the number of links, and p is the probability that the optimal links occur at least once in the population. The choice of an appropriate value for q is based on the behavior of the topological optimization process, which can change the number of links in order to reduce the network cost. We, therefore, set q as the minimum number of links that potentially can maintain the network 2-connectivity. Some population size values are given in Table A.1.

Table A.1
Population size for topological optimization (NI)

n     q     p = 0.990    p = 0.995    p = 0.999
10    20    30           33           39
25    50    98           106          124
35    70    146          157          184

A.2. Tabu search algorithm

The second heuristic considered relies on the application of the Tabu Search (TS) methodology [26]. The TS algorithm can be seen as an evolution of the classical local-search algorithm called Steepest Descent; however, thanks to an internal mechanism that also accepts worse solutions than the current one, it is less subject to local minima entrapment. For each admissible solution, a class of neighbor solutions is defined. A neighbor solution is defined as a solution that can be obtained from the current solution by applying an appropriate transformation called a move. The set of all the admissible moves uniquely defines the neighborhood of each solution. The dimension of the neighborhood is NN. In this paper a simple move is considered: it either removes an existing link or adds a new link between network nodes.

At each iteration of the TS algorithm, all solutions in the neighborhood of the current one are evaluated, and the best is selected as the new current solution. A special rule, the tabu list, is introduced in order to prevent the algorithm from deterministically cycling among already visited solutions. The tabu list stores the last accepted moves; while a move is stored in it, it cannot be used to generate a new solution for a certain number of iterations. Therefore it may happen that TS, continuing the search, will select an inferior solution because better solutions are tabu. If a TS move yields a better solution than any encountered so far, its tabu classification can be overridden (aspiration criterion). The choice of the tabu list size is very important in the optimization procedure: a small size could cause the cyclic visitation of the same solutions, while a big one could block the optimization process for many iterations, preventing a good exploration of the solution space.

References

[1] L. Kleinrock, Queueing Systems, Volume II: Computer Applications, Wiley Interscience, New York, 1976.
[2] M. Gerla, L. Kleinrock, On the topological design of distributed computer networks, IEEE Transactions on Communications 25 (January) (1977) 48–60.
[3] K. Claffy, G. Miller, K. Thompson, The nature of the beast: recent traffic measurements from an Internet backbone, INET '98, July 1998.
[4] M. Mellia, A. Carpani, R. Lo Cigno, Measuring IP and TCP behavior on edge nodes, in: Proceedings of IEEE Globecom 2002, Taipei, Taiwan, November 2002.
[5] V. Paxson, S. Floyd, Wide-area traffic: the failure of Poisson modeling, IEEE/ACM Transactions on Networking 3 (3) (1995) 226–244.
[6] K.T. Cheng, F.Y.S. Lin, Minimax end-to-end delay routing and capacity assignment for virtual circuit networks, IEEE Globecom 95, Singapore, November 1995, pp. 2134–2138.
[7] E. Rolland, A. Amiri, R. Barkhi, Queueing delay guarantees in bandwidth packing, Computers and Operations Research 26 (1999) 921–935.
[8] A. Gersht, R. Weihmayer, Joint optimization of data network design and facility selection, IEEE Journal on Selected Areas in Communications 8 (9) (1990) 1667–1681.
[9] C. Fraleigh, F. Tobagi, C. Diot, Provisioning IP backbone networks to support latency sensitive traffic, IEEE Infocom 03, San Francisco, CA, March 2003.
[10] E. Wille, M. Garetto, M. Mellia, E. Leonardi, M. Ajmone Marsan, Considering end-to-end QoS in IP network design, NETWORKS 2004, Vienna, Austria, June 13–16.
[11] E. Wille, M. Garetto, M. Mellia, E. Leonardi, M. Ajmone Marsan, IP network design with end-to-end QoS constraints: The VPN case, in: 19th International Teletraffic Congress, Beijing, China, August/September 2005.
[12] N. Cardwell, S. Savage, T. Anderson, Modeling TCP latency, IEEE Infocom 00, Tel Aviv, Israel, March 2000.
[13] J. Padhye, V. Firoiu, D. Towsley, J. Kurose, Modeling TCP Reno performance: a simple model and its empirical validation, IEEE/ACM Transactions on Networking 8 (2) (2000) 133–145.
[14] M. Garetto, D. Towsley, Modeling, simulation and measurements of queuing delay under long-tail internet traffic, ACM SIGMETRICS 2003, San Diego, CA, June 2003.
[15] X. Chao, M. Miyazawa, M. Pinedo, Queueing Networks, Customers, Signals and Product Form Solutions, John Wiley, 1999.
[16] M. Wright, Interior methods for constrained optimization, Acta Numerica 1 (1992) 341–407.
[17] S. McCanne, S. Floyd, NS Network simulator, Available at .
[18] S. Floyd, V. Jacobson, Random early detection gateways for congestion avoidance, IEEE/ACM Transactions on Networking 1 (4) (1993) 397–413.
[19] B. Gendron, T.G. Crainic, A. Frangioni, Multicommodity capacitated network design, in: B. Sansò, P. Soriano (Eds.), Telecommunications Network Planning, Kluwer, Boston, MA, 1998, pp. 1–19.
[20] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, San Francisco, CA, 1979.
[21] A. Medina, A. Lakhina, I. Matta, J. Byers, BRITE: Boston university representative internet topology generator, Boston University. Available from , April 2001.
[22] A.M. Geoffrion, Lagrangean relaxation and its uses in integer programming, Mathematical Programming Study 2 (1974) 82–114.
[23] M.L. Fisher, The Lagrangean relaxation method for solving integer programming problems, Management Science 27 (1981) 1–18.
[24] J.H. Holland, Adaptation in Natural and Artificial Systems, MIT Press, 1975.
[25] D.E. Goldberg, Genetic Algorithm in Search, Optimization, and Machine Learning, Addison-Wesley Publishing Company, 1989.
[26] F. Glover, M. Laguna, Tabu Search, Kluwer Academic Publishers, 1997.

Emilio C.G. Wille received his degree in Electronic Engineering in February 1989, and an M.Sc. in Electronic and Telecommunications Engineering in July 1991, both from Centro Federal de Educação Tecnológica do Paraná (CEFET/PR), Curitiba, Brazil. Since October 1991, he has been with the Electronics Department of CEFET/PR as an Assistant Professor. His teaching duties at CEFET/PR comprise graduate and undergraduate-level courses on electronic and telecommunication theory. From February 2001 until February 2004, he was with the Electronics Department of Politecnico di Torino (Italy) as a Ph.D. student. He was supported by a CAPES Foundation scholarship from the Ministry of Education of Brazil. His research interests are centered upon the application of optimization algorithms for telecommunication network design and planning, Markov processes, queueing models, and performance analysis of telecommunication systems.

Marco Mellia was born in Torino, Italy, in 1971. He received his degree in Electronic Engineering in 1997, and a Ph.D. in Telecommunications Engineering in 2001, both from Politecnico di Torino. From March to October 1999 he was with the CS department at Carnegie Mellon University as a visiting scholar. Since April 2001, he has been with the Electronics Department of Politecnico di Torino as an Assistant Professor. He has co-authored over 50 papers published in international journals and presented in leading international conferences, all of them in the area of telecommunication networks. He participated in the program committees of several conferences including IEEE Globecom and IEEE ICC. His research interests are in the fields of all-optical networks, traffic measurement and modeling, and QoS routing algorithms.

Emilio Leonardi was born in Cosenza, Italy, in 1967. He received a Dr. Ing. degree in Electronics Engineering in 1991 and a Ph.D. in Telecommunications Engineering in 1995, both from Politecnico di Torino. He is currently an Assistant Professor at the Dipartimento di Elettronica of Politecnico di Torino. In 1995, he visited the Computer Science Department of the University of California, Los Angeles (UCLA); in summer 1999, the High Speed Networks Research Group at Bell Laboratories/Lucent Technologies, Holmdel (NJ); in summer 2001, the Electrical Engineering Department of Stanford University; and finally, in summer 2003, the IP Group at Sprint, Advanced Technologies Laboratories, Burlingame, CA. He participated in several national and European projects such as MIUR-MQOS, MIUR-PLANET IP, MIUR-IPPO, MIUR-TANGO, IST-SONATA and IST-DAVID. He is also involved in several consulting and research projects with private industries, including Lucent Technologies-Bell Labs., British Telecom, Alcatel and TILAB. He has co-authored over 100 papers published in international journals and presented in leading international conferences, all of them in the area of telecommunication networks. He received the IEEE TCGN best paper award for a paper presented at the IEEE Globecom 2002 "High Speed Networks Symposium". He participated in the program committees of several conferences including IEEE Infocom, IEEE Globecom and IEEE ICC. He was guest editor of two special issues of the IEEE Journal on Selected Areas in Communications focused on high speed switches and routers. His research interests are in the fields of performance evaluation of communication networks, all-optical networks, queueing theory, and packet switching.

Marco Ajmone Marsan is a Full Professor at the Electronics Department of Politecnico di Torino, in Italy, and the Director of the Institute for Electronics, Information and Telecommunications Engineering of the National Research Council. He has degrees in Electronic Engineering from Politecnico di Torino and University of California, Los Angeles. He was at the Politecnico di Torino Electronics Department from November 1975 to October 1987, first as a researcher, then as an Associate Professor. He was a Full Professor at the University of Milan Computer Science Department from November 1987 to October 1990. During the summers of 1980 and 1981, he was with the Research in Distributed Processing Group, Computer Science Department, UCLA. During the summer of 1998 he was an Erskine Fellow at the Computer Science Department of the University of Canterbury in New Zealand. He has co-authored over 300 journal and conference papers in Communications and Computer Science, as well as the two books "Performance Models of Multiprocessor Systems," published by the MIT Press, and "Modeling with Generalized Stochastic Petri Nets," published by John Wiley. In 1982, he received the best paper award at the Third International Conference on Distributed Computing Systems in Miami, Florida. In 2002, he was awarded a "Honoris Causa" Degree in Telecommunications Networks from the Budapest University of Technology and Economics. He is a corresponding member of the Academy of Sciences of Torino. He participates in a number of editorial boards of international journals, including the IEEE/ACM Transactions on Networking, and the Computer Networks Journal by Elsevier. He has been the principal investigator in national and international research projects dealing with telecommunication networks. His current interests are in the performance evaluation of communication networks and their protocols.

Computer Networks 50 (2006) 1104–1129 www.elsevier.com/locate/comnet

On the effects of the packet size distribution on FEC performance

György Dán *, Viktória Fodor, Gunnar Karlsson

Department of Signals, Sensors and Systems, KTH, Royal Institute of Technology, Osquldas Väg 10, 10044 Stockholm, Sweden

Available online 5 October 2005

Abstract

For multimedia traffic like VBR video, knowledge of the average loss probability is not sufficient to determine the impact of loss on the perceived visual quality and on the possible ways of improving it, for example by forward error correction (FEC) and error concealment. In this paper we investigate how the packet size distribution affects the packet loss process, i.e., the probability of consecutive losses and the distribution of the number of packets lost in a block of packets, and the related FEC performance. We present an exact mathematical model for the loss process of an MMPP + MMPP/Er/1/K queue and compare the results of the model to simulations performed with various other packet size distributions (PSDs), among others, the measured PSD from an Internet backbone. The results show that analytical models of the PSD matching the first three moments (mean, variance and skewness) of the empirical PSD can be used to evaluate the performance of FEC in real networks. We conclude that the exponential PSD, though it is not a worst case scenario, is a good approximation for the PSD of today's Internet to evaluate FEC performance. We also conclude that the packet size distribution affects the packet loss process and thus the efficiency of FEC mainly in access networks where a single multimedia stream might affect the multiplexing behavior. We evaluate how the PSD affects the accuracy of the widely used Gilbert model to calculate FEC performance and conclude that the Gilbert model can capture loss correlations better if the CoV of the PSD is high.

© 2005 Elsevier B.V. All rights reserved.

Keywords: Forward error correction; Packet loss process; MMPP; Packet size distribution; Gilbert-model

* Corresponding author. Tel.: +46 8 790 4253; fax: +46 8 752 6548. E-mail addresses: [email protected] (G. Dán), [email protected] (V. Fodor), [email protected] (G. Karlsson).

1389-1286/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2005.09.006

1. Introduction

For flow-type multimedia communications, as opposed to elastic traffic, the average packet loss is not the only measure of interest. The burstiness of the loss process, the number of losses in a block of packets, has a great impact both on the user-perceived visual quality and on the possible ways of improving it, for example by error concealment and forward error correction. Forward error correction (FEC) is an attractive means to decrease the loss probability experienced by delay sensitive traffic, such as real-time multimedia, when ARQ schemes cannot be used to recover losses due to strict delay constraints. There are two main directions of FEC design to recover from packet losses. One solution, proposed by the IETF

and implemented in Internet audio tools like Rat [1] and Freephone [2] is to add a redundant copy of the original packet to one of the subsequent packets [3]. In the case of packet loss the information is regained from the redundant copy. The other set of solutions use media-independent block coding schemes based on algebraic coding, e.g. Reed–Solomon coding [4]. The error correcting capability of RS codes with k data packets and c redundant packets is c if data is lost. While Reed–Solomon codes are typically used to correct bit errors, they can be used to recover lost packets via block interleaving as described in [5]. Given a block of k packets, the packets are prefixed by their lengths in bytes, and packets shorter than the longest one in the block are padded by zeros. Reed–Solomon coding is applied to the ith symbol (typically byte) of each packet (in total k symbols) to form the ith symbols of the c redundant packets. Packets are then transmitted one-by-one, without the padding zeros. The loss of a packet appears as the loss of a symbol in a block of c + k symbols at the receiver and can be corrected as long as the number of lost packets is no more than c. The performance of both FEC schemes depends on the burstiness of the loss process: the performance of the first, media-dependent FEC scheme depends on the probability of consecutive packet losses; the capability of the second, media-independent FEC scheme to recover from losses depends on the distribution of the number of packets lost in a block. The burstiness of the loss process in the network can be influenced by three factors, the burstiness of the stream traversing the network, the burstiness of the background traffic and the packet size distribution. The effects of the burstiness of the stream traversing the network and the background traffic have been investigated before. In [6], the authors performed simulations to study the efficiency of FEC for recovery of lost packets. 
They concluded that as long as the ratio of streams using FEC is low, FEC can decrease the uncorrected loss probability significantly. In [4], the authors used an analytical model to evaluate how the burstiness, the block length and the number of sources employing FEC influence the efficiency of FEC, concluding that the burstiness has a significant effect on the efficiency of FEC. The authors in [7] performed simulations to investigate the trade-off between traffic shaping and FEC to decrease the uncorrected loss probability. They concluded that for longer delays the joint use of shaping and FEC gives the best performance. The authors studied the effects of the burstiness of the background traffic on the efficiency of FEC as well, and concluded that its effects are moderate. In [8], the authors used the Gilbert channel model [9] and evaluated the efficiency of FEC as a function of the burstiness of the loss process, concluding that a burstier loss process decreases the efficiency of FEC.

The effects of the packet size distribution (PSD) are not clear, however, since previous research concentrated on the exponential and deterministic packet size distributions only. The deterministic packet size distribution was motivated by the ATM standard in the last decade, while the exponential packet size distribution was motivated by the analytical tractability of the resulting models. Nevertheless, the PSD in the network can vary on the short term due to changes in the ongoing traffic and on the long term as new applications and protocols emerge. As individual applications cannot control the PSD in the network, it is important to know how the PSD will affect their performance, for example, how much gain an application can expect from FEC given a certain measured end-to-end average loss probability.

It is well known that in an M/G/1 queue the average number of customers increases linearly with the squared coefficient of variation (CoV) of the service time distribution, as given by the Pollaczek–Khintchine formula [10]. For the finite capacity M/G/1 queue there is no closed form formula to calculate the packet loss probability [11,12], though we know from experience that a lower CoV of the service time distribution yields lower average loss probability. It is however unclear how the distribution of the service time affects the loss process in a finite queue and thus how the potential of using FEC changes.

In this paper we present a queuing model to analyze the packet loss process of a bursty source, for example VBR video, multiplexed with background traffic in a single multiplexer.
The multiplexer is modeled as a single server with a finite queue. We model the bursty source and the background traffic by Markov-modulated Poisson processes (MMPP) and consider Erlang-r distributed packet sizes. We investigate the effects of the network PSD on the packet loss process and the efficiency of FEC based on the queuing model presented here, a similar model with deterministic packet sizes and simulations. In particular, we compare the analytical results with Erlang-r (exponential, as a special case)


and deterministic PSDs to those of simulations performed with general PSDs, among them the measured PSD of an Internet backbone [13].

The paper is organized as follows. Section 2 gives an overview of the previous work on the modeling of the loss process of a single server queue. In Section 3 we describe our model used for calculating the loss probabilities in a block of packets and the consecutive loss probability. In Section 4 we evaluate the effects of the PSD on the packet loss process in various scenarios. We consider constant average load in Section 4.1, constant average loss probability in Section 4.2, and we isolate the effect of the PSD from other factors in Section 4.3. In Section 5 we evaluate how the packet size distribution affects the accuracy of the Gilbert model to capture the correlation between losses. We conclude our work in Section 6.

2. Related work

In [14], Cidon et al. presented an exact analysis of the packet loss process in an M/M/1/K queue, that is, the probability of losing j packets in a block of n packets, and showed that the distribution of losses may be bursty compared to the assumption of independence. They also considered a discrete time system fed with a Bernoulli arrival process describing the behavior of an ATM multiplexer. Gurewitz et al. presented explicit expressions for the above quantities of interest for the M/M/1/K queue in [15]. In [16], Altman and Jean-Marie obtained the multidimensional generating function of the probability of j losses in a block of n packets and gave an easy-to-calculate asymptotic result under the condition that n ≤ K + j + 1. Schulzrinne et al. [17] derived the conditional loss probability (CLP) for the N*IPP/D/1/K queue and showed that the CLP can be orders of magnitude higher than the loss probability. In [4] Kawahara et al. used an interrupted Bernoulli process to analyze the performance of FEC in a cell-switched environment.
The loss process of the MMPP/D/1/K queue was analyzed in [18] and the results compared to a queue with exponential packet size distribution. Models with general service time distribution have been proposed for calculating various measures of queuing performance [19,20], but not to analyze the loss process. Though models with exponential and deterministic PSDs are available, a thorough analysis of the effects of the PSD on the packet loss process has not yet been done.

3. Model description

In this section we first present the model used to calculate the probability of losses in a block, then we show how it can be used to calculate the consecutive loss probability. Flows traversing large networks like the Internet cross several routers before reaching their destination. However, most of the losses in a flow occur in the router having the smallest available bandwidth along the transmission path, so that one may model the series of routers with a single router, the bottleneck [21,22]. We consequently model the network with a single queue with Erlang-r distributed packet sizes having average transmission time $1/\mu$. The Erlang-r distribution is the distribution of the sum of r independent identically distributed random variables each having an exponential distribution. By increasing r to infinity the variance of the Erlang-r distribution goes to zero, and thus the distribution becomes deterministic. Packets arrive at the system from two sources, two Markov-modulated Poisson processes (MMPP), representing the tagged source (MMPPa) and the background traffic (MMPPs) respectively. The packets are stored in a buffer that can host up to K packets, and are served according to a FIFO policy.

It is well known that compressed multimedia, like VBR video, exhibits a self-similar nature [23]. In [24], Robert and Le Boudec used a discrete time MMPP to fit the mean and the Hurst parameter of pseudo self-similar traffic. In [25], Andersen and Nielsen used the superposition of two-state MMPPs to model second-order self-similar behavior over several timescales. Yoshihara et al. used the superposition of two-state interrupted Poisson processes (IPPs) and a Poisson process to model self-similar traffic in [26] and compared the behavior of the resulting MMPP/D/1 queue with simulations. They found that the approximation works well under heavy load conditions and gives a tight upper bound on the queue length. Klemm et al.
[27] used the batch Markovian arrival process for aggregated traffic modeling in IP networks, and showed the effectiveness of the model in terms of queueing behavior and statistical properties of the traffic. Ryu and Elwalid [28] showed that short term correlations have dominant influence on the network performance under realistic scenarios of buffer sizes for real-time traffic. Based on these previous works we argue that the MMPP may be a practical model to derive approximate results for the queuing behavior of long-range dependent traffic such as


real-time VBR video, especially in the case of small buffer sizes [29]. Our assumption on the background traffic is justified by recent results indicating that Internet traffic can be approximated by a non-stationary Poisson process [30]. According to the results the change-free intervals are well above 150 ms, the ITU's G.114 recommendation for end-to-end delay for real-time applications. The authors in [31] used the superposition of MMPPs to model self-similar traffic over several timescales, and achieved good results in terms of queueing behavior. These empirical results are consistent with recent theoretical results [32].

3.1. Probability of losses in a block

In the following we describe the calculation of the probability of losses in a block. Every n consecutive packets from the tagged source form a block, and we are interested in the probability distribution of the number of lost packets in a block in the steady state of the system. Throughout this section we use notation similar to that in [14]. We assume that the sources feeding the system are independent. MMPPa is described by the infinitesimal generator matrix $Q^a$ with elements $r^a_{bc}$, $b, c \in \mathcal{B} = \{1, \ldots, B\}$, and the arrival rate matrix $\Lambda^a = \mathrm{diag}\{\lambda^a_1, \ldots, \lambda^a_B\}$, where $\lambda^a_b$ is the average arrival rate while the underlying Markov chain is in state b [33]. MMPPs is described by the infinitesimal generator matrix $Q^s$ with elements $r^s_{uv}$, $u, v \in \mathcal{U} = \{1, \ldots, U\}$, and the arrival rate matrix $\Lambda^s = \mathrm{diag}\{\lambda^s_1, \ldots, \lambda^s_U\}$, where $\lambda^s_u$ is the average arrival rate while the underlying Markov chain is in state u.

Let us denote the joint state space of the two MMPPs with the set of ordered pairs $\mathcal{BU} = \mathcal{B} \times \mathcal{U}$ of cardinality $B \cdot U$. The superposition of the two sources can be described by a single MMPP with arrival rate matrix $\hat{\Lambda} = \Lambda^a \oplus \Lambda^s$ and infinitesimal generator matrix $\hat{Q} = Q^a \oplus Q^s$, where $\oplus$ is the Kronecker sum [33,34]. Both $\hat{\Lambda}$ and $\hat{Q}$ are square matrices of size $L = B \cdot U$, and we denote the state of the superposed MMPP with $l \in \mathcal{L} = \{1, \ldots, L\}$. Due to the special structure of $\hat{\Lambda}$, $l = (b - 1) \cdot U + u$ is a one-to-one and onto mapping from $\mathcal{BU}$ to $\mathcal{L}$, with the property that the arrival intensity in state $l \in \mathcal{L}$ of the superposed MMPP is $\hat{\lambda}_l = \lambda^a_b + \lambda^s_u$ for $l = (b - 1) \cdot U + u$, $b \in \mathcal{B}$, $u \in \mathcal{U}$. Since the mapping is invertible, we can calculate $\hat{\lambda}^a_l$, the arrival intensity of MMPPa in state l of the superposed MMPP, as $\hat{\lambda}^a_l = \lambda^a_{\lfloor (l-1)/U \rfloor + 1}$.

Each packet in the queue corresponds to r exponential stages, and thus the state space of the queue is $\{0, \ldots, rK\} \times \{1, \ldots, L\}$. Our purpose is to calculate the probability of j losses in a block of n packets, $P(j, n)$, $n \ge 1$, $0 \le j \le n$. We define the probability $P^a_{i,l}(j, n)$, $0 \le i \le rK$, $l \in \mathcal{L}$, $n \ge 1$, $0 \le j \le n$, as the probability of j losses in a block of n packets, given that the remaining number of exponential stages in the system is i just before the arrival of the first packet in the block from the tagged source and the first packet of the block is generated while the superposed MMPP is in state l. As the first packet in the block is arbitrary,

$$P(j, n) = \sum_{l=1}^{L} \sum_{i=0}^{rK} \Pi(i, l)\, P^a_{i,l}(j, n). \tag{1}$$

$\Pi(i, l)$, the steady state distribution of the exponential stages in the queue as seen by an arriving packet from the tagged source, can be derived from the steady state distribution of the MMPP/Er/1/K queue as

$$\Pi(i, l) = \frac{\pi(i, l)\, \hat{\lambda}^a_l}{\sum_{l=1}^{L} \sum_{i=0}^{rK} \hat{\lambda}^a_l\, \pi(i, l)}, \tag{2}$$

where $\pi(i, l)$ is the steady state distribution of the MMPP/Er/1/K queue and $\hat{\lambda}^a_l$ is the arrival intensity of MMPPa in state l of the superposed MMPP. The probabilities $P^a_{i,l}(j, n)$ can be derived according to the following recursion. The recursion is initiated for n = 1 with the following relations:

$$P^a_{i,l}(j, 1) = \begin{cases} 1 & j = 0 \\ 0 & j \ge 1 \end{cases} \quad \text{for } i \le r(K - 1), \qquad P^a_{i,l}(j, 1) = \begin{cases} 0 & j = 0,\ j \ge 2 \\ 1 & j = 1 \end{cases} \quad \text{for } r(K - 1) < i. \tag{3}$$
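The superposition of the two MMPPs described earlier in this subsection can be sketched numerically. The following is a minimal pure-Python illustration, not the authors' implementation: the helper names (`kron`, `kron_sum`, `diag`) are introduced here for illustration, the tagged-source rates are the values the paper quotes for its three-state MMPP, and the two-state background source is purely hypothetical. Indices are 0-based, so `l0 = b0 * U + u0` plays the role of $l = (b - 1) \cdot U + u$.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def kron_sum(a, b):
    """Kronecker sum: A (+) B = A (x) I_b + I_a (x) B."""
    left = kron(a, identity(len(b)))
    right = kron(identity(len(a)), b)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(left, right)]


# Tagged source (MMPPa): three states, rates as quoted in the paper.
lam_a = [116.0, 274.0, 931.0]
Q_a = [[-0.12594, 0.12594, 0.0],
       [0.25, -(0.25 + 1.97), 1.97],
       [0.0, 2.0, -2.0]]

# Background source (MMPPs): hypothetical two-state example.
lam_s = [1000.0, 3000.0]
Q_s = [[-1.0, 1.0],
       [2.0, -2.0]]

B, U = len(lam_a), len(lam_s)
Q_hat = kron_sum(Q_a, Q_s)                      # generator of the superposed MMPP
Lam_hat = kron_sum(diag(lam_a), diag(lam_s))    # diagonal arrival rate matrix

# Total and tagged-source arrival intensity in each superposed state.
lam_hat = [Lam_hat[l0][l0] for l0 in range(B * U)]
lam_hat_a = [lam_a[l0 // U] for l0 in range(B * U)]

# Probability that an arrival in a given superposed state is from the tagged source.
p_tagged = [la / lt for la, lt in zip(lam_hat_a, lam_hat)]
```

Because both arrival rate matrices are diagonal, their Kronecker sum is again diagonal, and the integer division `l0 // U` recovers the tagged-source state from the superposed state.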

We denote the probability that a packet arriving in state m of the superposed MMPP comes from the tagged source with $p_m = \lambda^a_b / (\lambda^a_b + \lambda^s_u)$ and the probability of the complement event with $\bar{p}_m = \lambda^s_u / (\lambda^a_b + \lambda^s_u)$, where $m = (b - 1) \cdot U + u$. Thus for $n \ge 2$ the following equations hold.

$$P^a_{i,l}(j, n) = \sum_{m=1}^{L} \sum_{k=0}^{i+r} Q^{l,m}_{i+r}(k) \left\{ p_m P^a_{i+r-k,m}(j, n - 1) + \bar{p}_m P^s_{i+r-k,m}(j, n - 1) \right\} \tag{4}$$

for $0 \le i \le r(K - 1)$, and for $r(K - 1) < i$

$$P^a_{i,l}(j, n) = \sum_{m=1}^{L} \sum_{k=0}^{i} Q^{l,m}_{i}(k) \left\{ p_m P^a_{i-k,m}(j - 1, n - 1) + \bar{p}_m P^s_{i-k,m}(j - 1, n - 1) \right\}. \tag{5}$$

$P^s_{i,l}(j, n)$ is given by

$$P^s_{i,l}(j, n) = \sum_{m=1}^{L} \sum_{k=0}^{i+r} Q^{l,m}_{i+r}(k) \left\{ p_m P^a_{i+r-k,m}(j, n) + \bar{p}_m P^s_{i+r-k,m}(j, n) \right\}, \tag{6}$$

for $0 \le i \le r(K - 1)$, and for $r(K - 1) < i$

$$P^s_{i,l}(j, n) = \sum_{m=1}^{L} \sum_{k=0}^{i} Q^{l,m}_{i}(k) \left\{ p_m P^a_{i-k,m}(j, n) + \bar{p}_m P^s_{i-k,m}(j, n) \right\}. \tag{7}$$

The probability $P^s_{i,l}(j, n)$, $0 \le i \le rK$, $l \in \mathcal{L}$, $n \ge 1$, $0 \le j \le n$, is the probability of j losses in a block of n packets, given that the remaining number of exponential stages in the system is i just before the arrival of a packet from the background traffic and the superposed MMPP is in state l. $Q^{l,m}_i(k)$ denotes the joint probability that the next arrival occurs while the superposed MMPP is in state m and that k exponential stages out of i are completed before the next arrival from the joint arrival process, given that the last arrival was in state l of the superposed MMPP. A way to calculate $Q^{l,m}_i(k)$ is shown in Appendix A.

The procedure of computing $P^a_{i,l}(j, n)$ is as follows. First we calculate $P^a_{i,l}(j, 1)$, $i = 0, \ldots, rK$, from the initial conditions (3). Then in iteration k we first calculate $P^s_{i,l}(j, k)$, $k = 1, \ldots, n - 1$, using Eqs. (6) and (7) and the probabilities $P^a_{i,l}(j, k)$, which have been calculated during iteration $k - 1$. Then we calculate $P^a_{i,l}(j, k + 1)$ using Eqs. (4) and (5).

In the special case when the background traffic is a Poisson process with arrival intensity $\lambda$, we have that $U = 1$, $L = B$, $\hat{\Lambda} = \Lambda^a + \lambda I$ and $\hat{Q} = Q^a$, where I is an identity matrix of size B. The mapping from $\mathcal{BU}$ to $\mathcal{L}$ becomes $l = b$, and thus $p_m = \lambda_m / (\lambda_m + \lambda)$ as presented in [35]. We will use the resulting MMPP + M/Er/1/K model in Section 4.

3.2. Consecutive loss probability

Now we turn to calculating the probability of consecutive losses. We define two sets of states, $\alpha$ and $\omega$, as the set of states of the queue where arriving packets can enter the system and where arriving packets are discarded, respectively. Then $\alpha = \{0, \ldots, r(K - 1)\}$ and $\omega = \{r(K - 1) + 1, \ldots, rK\}$. Let us denote by $A_i$ the event that the first packet in a block arrives to the system when the remaining number of exponential stages in the system is i, and we define the event $A_\omega = \bigcup_{i \in \omega} A_i$. Similarly, we

denote with $A_l$ the event that the first packet in the block was generated in state l of the superposed MMPP. Using these notations the consecutive loss probability, i.e., the conditional probability that a packet arriving to the system from the tagged source is lost, given that the previous packet from the tagged source was lost, is given as

$$\begin{aligned} p_{\omega|\omega} &= P(2, 2 \mid A_\omega) = \frac{P(2, 2 \cap A_\omega)}{P(A_\omega)} = \frac{\sum_{i \in \omega} P(2, 2 \cap A_i)}{P(A_\omega)} = \frac{\sum_{l \in \mathcal{L}} \sum_{i \in \omega} P(2, 2 \cap A_i \cap A_l)}{P(A_\omega)} \\ &= \frac{\sum_{l \in \mathcal{L}} \sum_{i \in \omega} P^a_{i,l}(2, 2)\, \Pi(i, l)}{P(A_\omega)} = \frac{\sum_{l \in \mathcal{L}} \sum_{i \in \alpha \cup \omega} P^a_{i,l}(2, 2)\, \Pi(i, l)}{P(A_\omega)} = \frac{P(2, 2)}{P(1, 1)}, \end{aligned} \tag{8}$$

since the first packet is arbitrary and $P^a_{i,l}(2, 2) = 0$ for $i \in \alpha$. The probabilities $p_{\alpha|\alpha}$, $p_{\omega|\alpha}$ and $p_{\alpha|\omega}$ can be defined similarly and calculated as $p_{\alpha|\omega} = 1 - p_{\omega|\omega}$, $p_{\omega|\alpha} = p_{\alpha|\omega}\, p_\omega / (1 - p_\omega)$ and $p_{\alpha|\alpha} = 1 - p_{\omega|\alpha}$, where $p_\omega$ is the average loss probability.

4. Performance analysis

In this section we show results obtained with the MMPP + M/Er/1/K model described in Section 3, the MMPP + M/D/1/K model described in [18] and simulations. The average packet length of both the tagged and the background traffic is set to 454 bytes, which is the mean packet size measured on an Internet backbone [13]. Note that increasing the average packet length is equivalent to decreasing the link speed, and thus the particular fixed value of the average packet length does not limit the generality of the results presented here. The PDF, CoV ($\sigma/m$) and skewness ($E[(X - m)^3]/\sigma^3$) parameters of the twelve considered PSDs are shown in Table 1 (see Fig. 1). The G1 distribution is the measured PSD on a 2.5 Gbps Internet backbone link as given by the Sprint IP Monitoring project [13]. The considered link speeds are 10 Mbps, 22.5 Mbps and 45 Mbps. The maximum queuing delay is set to around 1.5 ms in all cases, resulting in buffer sizes from 5 to 20 packets depending on the link speed. Both in the analytical models and in the simulations we consider a three-state MMPP as the tagged

Table 1
Considered packet size distributions: coefficient of variation, skewness, PDF and notation in the figures

Distribution    CoV     Skewness   PDF                                        Notation
General 1       1.2     1.07       b(x) taken from [13], see Fig. 1           G1
General 2       1.2     1.07       b(x) = 0.74N(127, 20) + 0.26N(1366, 20)    G2
Phase type      1.2     1.07       b(x) = 0.54E(5, 26) + 0.46E(5, 956)        G3
Exponential     1       2          E(1, 454)                                  M*
General 4       1       √2         b(x) = 0.79N(219, 1) + 0.21N(1331, 1)      G4
General 5       1/√2    2          b(x) = 0.85N(321, 1) + 0.15N(1229, 1)      G5
Erlang-2        1/√2    √2         E(2, 454)                                  E2*
General 6       1/√2    √0.4       b(x) = 0.65N(219, 1) + 0.35N(892, 1)       G6
General 7       √0.1    √2         b(x) = 0.79N(379, 1) + 0.21N(731, 1)       G7
Erlang-10       √0.1    √0.4       E(10, 454)                                 E10*
General 8       √0.1    0          b(x) = 0.5N(310, 1) + 0.5N(598, 1)         G8
Deterministic   0       0          b(x) = δ_454(x)                            D*

N(m, σ) denotes a normal distribution with mean m and variance σ². E(r, 1/μ) denotes an r-stage Erlang distribution with mean 1/μ. Results for PSDs marked with a * are obtained with the models, the rest with simulations.
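The CoV and skewness entries of Table 1 can be cross-checked with a few lines of code. The sketch below is illustrative only: it uses the standard closed forms for an r-stage Erlang distribution (CoV $1/\sqrt{r}$, skewness $2/\sqrt{r}$) and the moments of a mixture of normals; the function names are hypothetical and the second argument of N(m, σ) is read as the standard deviation, as stated in the table note.

```python
import math


def erlang_cov(r):
    """Coefficient of variation of an r-stage Erlang distribution."""
    return 1.0 / math.sqrt(r)


def erlang_skewness(r):
    """Skewness of an r-stage Erlang distribution."""
    return 2.0 / math.sqrt(r)


def normal_mixture_cov(components):
    """CoV of a mixture of normals given as (weight, mean, std) triples."""
    mean = sum(w * m for w, m, _ in components)
    second_moment = sum(w * (s * s + m * m) for w, m, s in components)
    return math.sqrt(second_moment - mean * mean) / mean


# Two of the simulated PSDs from Table 1.
g2 = [(0.74, 127.0, 20.0), (0.26, 1366.0, 20.0)]
g4 = [(0.79, 219.0, 1.0), (0.21, 1331.0, 1.0)]

print("Erlang-2  CoV, skewness:", erlang_cov(2), erlang_skewness(2))
print("Erlang-10 CoV, skewness:", erlang_cov(10), erlang_skewness(10))
print("G2 CoV:", normal_mixture_cov(g2))
print("G4 CoV:", normal_mixture_cov(g4))
```

The mixtures reproduce the tabulated CoV values only approximately, since the table entries are rounded.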

Fig. 1. Cumulative distribution functions of the considered packet size distributions. (Four panels; x-axis: packet size (b), 0 to 1500; y-axis: P(B ≤ b), 0 to 1; curves M, G1 to G8, E2 and E10.)

source, with an average bitrate of 540 kbps, arrival intensities $\lambda^a_1 = 116$/s, $\lambda^a_2 = 274$/s, $\lambda^a_3 = 931$/s and transition rates $r^a_{12} = 0.12594$, $r^a_{21} = 0.25$, $r^a_{23} = 1.97$, $r^a_{32} = 2$. These values were derived from an MPEG-4 encoded video trace by matching the average arrival intensities in the three states of the MMPP with the average frame size of the I, P and B frames. For the background traffic we use a Poisson process. This assumption is valid if there are many sources sharing the same link, as the traffic generated by a large number of sources tends to Poisson as the load increases due to statistical multiplexing [36]. We believe that this assumption will not influence our results with respect to the effects of the PSD; on the other hand it keeps the number of parameters of the model low, and thus eases readability. The simulations were performed in ns-2. The simulation time was between 40,000 and 400,000 s (5–50 million packets from the tagged source), and the margin of error in the simulations was below 5% at a 95% confidence level.

We use three measures to compare the packet loss process. The first measure is the consecutive packet loss probability, denoted by $p_{\omega|\omega}$ and calculated according to (8). The consecutive packet loss probability has an influence on the efficiency of media-dependent FEC schemes proposed for real-time audio [3,37]. The lower the value of the consecutive packet loss probability, the more effective are the media-dependent FEC schemes, as shown in [38]. The second measure is the Kullback–Leibler distance [39] of the distributions of the number of packets lost in a block. The Kullback–Leibler distance is a commonly used measure of closeness, defined for two distributions as

$$d(p_1, p_2) = \sum_{j=0}^{n} P_1(j, n) \log_2 \frac{P_1(j, n)}{P_2(j, n)}. \tag{9}$$

Given the probabilities P(j, n) the uncorrected loss probability for an FEC(k, c + k) scheme can be calculated as $p_\omega^{k,c+k} =$ […]

[Figure: uncorrected loss probability on a logarithmic y-axis from $10^{0}$ to $10^{-7}$.]

$$f(k, c + k) = p_\omega / p_\omega^{k,c+k}. \tag{11}$$
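As a rough illustration of how these measures can be evaluated from a block loss distribution P(j, n), consider the following sketch. It rests on assumptions that are not taken from the paper: the uncorrected loss probability is computed under the common assumption that a block of n = c + k packets with more than c losses is not repaired at all, so that every lost packet in it stays lost, and the input distribution is a hypothetical binomial one (independent losses) rather than the output of the recursion in Section 3. All function names are illustrative.

```python
import math


def kl_distance(p1, p2):
    """Eq. (9): sum_j P1(j,n) * log2(P1(j,n) / P2(j,n)); zero-mass terms are skipped."""
    return sum(a * math.log2(a / b) for a, b in zip(p1, p2) if a > 0)


def uncorrected_loss(block_loss_dist, c):
    """Expected fraction of packets still lost after FEC with c redundant packets.

    Assumption: blocks with at most c losses are fully repaired, all others
    keep all of their losses. block_loss_dist[j] = P(j, n), j = 0..n.
    """
    n = len(block_loss_dist) - 1
    return sum(j * block_loss_dist[j] for j in range(c + 1, n + 1)) / n


def binomial_block_losses(n, p):
    """Hypothetical P(j, n) for independent losses with probability p."""
    return [math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]


# FEC(10, 11): k = 10 data packets, c = 1 redundant packet, block of 11.
dist = binomial_block_losses(11, 0.01)
p_loss = uncorrected_loss(dist, 0)        # no correction: average loss probability
p_fec = uncorrected_loss(dist, 1)         # after correcting single losses
ratio = p_loss / p_fec                    # the ratio f(k, c + k) of Eq. (11)

# Distance between the independent-loss distributions for two loss rates.
distance = kl_distance(dist, binomial_block_losses(11, 0.02))
```

With bursty losses the mass of P(j, n) shifts toward larger j, which raises the uncorrected loss probability for the same average loss; this is the effect that the measures above are meant to capture.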

4.1. Constant average load case In this subsection we investigate the effects of the PSD on the packet loss process and the efficiency of FEC as a function of the average load in the network. Fig. 2 shows the uncorrected packet loss probability without error correction (denoted by FEC(1, 1)), for FEC(10, 11) and for FEC(20, 22) on a 10 Mbps link for the G1, G2, G3 (which have the same first three moments), M and D distributions. Figs. 3 and 4 show the same results on a 22.5 Mbps and a 45 Mbps link. The figures show that results obtained with the G1, G2 and G3 distributions are practically the same (the difference is 4. Performing the inverse Laplace transform we get fl;m ðtÞ ¼

L X

bj t Bl;m . j e

ðA:3Þ

j¼1

Now we turn to the calculation of $Q_i^{l,m}(k)$. We denote with $P^{l,m}(k)$ the joint probability of having k Poisson arrivals with intensity rl between two arrivals from the MMPP and the next arrival from the MMPP coming in state c of the MMPP given that the last arrival came in state b. The z-transform $P^{l,m}(z)$ of $P^{l,m}(k)$ is given by [10]

$$P^{l,m}(z) = \sum_{k=0}^{\infty} \left( \int_0^{\infty} \frac{(rlt)^k}{k!} e^{-rlt} f^{l,m}(t)\, dt \right) z^k = f^{*\,l,m}(rl - rlz), \qquad (A.4)$$

where $f^{*\,l,m}(s)$ denotes the Laplace transform of $f^{l,m}(t)$.

Thus we can express $P^{l,m}(k)$ from (A.4) by performing the inverse z-transform after the substitution. Using the notation $a_j = 1 + b_j/(rl)$ and $A_j^{l,m} = B_j^{l,m}/(rl\,a_j)$ we get

$$P^{l,m}(k) = \sum_{j=1}^{L} A_j^{l,m} \frac{1}{a_j^{k}}. \qquad (A.5)$$

Given the probability $P^{l,m}(k)$ one can express $Q_i^{l,m}(k)$ as

$$Q_i^{l,m}(k) = \begin{cases} P^{l,m}(k) & \text{if } k < i, \\ \sum_{j=i}^{\infty} P^{l,m}(j) & \text{if } k = i, \end{cases} \qquad (A.6)$$

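so that we have

$$Q_i^{l,m}(k) = \begin{cases} \sum_{j=1}^{L} A_j^{l,m} \left(\dfrac{1}{a_j}\right)^{k} & 0 \le k < i, \\[2ex] \sum_{j=1}^{L} \dfrac{A_j^{l,m}}{1 - 1/a_j} \left(\dfrac{1}{a_j}\right)^{i} & k = i. \end{cases} \qquad (A.7)$$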
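Equations (A.5) to (A.7) can be checked numerically with a short sketch. This is only an illustration under stated assumptions: the function names are hypothetical, the coefficients A_j and a_j are passed as plain lists for one fixed pair of states, and every a_j is assumed to be greater than 1 so that the geometric tail in (A.6) converges.

```python
def p_between_arrivals(k, A, a):
    """P(k) = sum_j A_j / a_j**k, following Eq. (A.5)."""
    return sum(A_j / a_j ** k for A_j, a_j in zip(A, a))


def q_truncated(i, k, A, a):
    """Q_i(k), following Eq. (A.7).

    For k < i this is P(k); for k == i it is the tail sum of P(j) over
    j >= i, evaluated in closed form as a geometric series (needs a_j > 1).
    """
    if not 0 <= k <= i:
        raise ValueError("k must satisfy 0 <= k <= i")
    if k < i:
        return p_between_arrivals(k, A, a)
    return sum(A_j / (1.0 - 1.0 / a_j) / a_j ** i for A_j, a_j in zip(A, a))
```

Because the value at k = i collects the whole tail of P, the values Q_i(0), ..., Q_i(i) sum to the same total as P(k) over all k, which gives a quick sanity check on the coefficients.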

References
[1] The Multimedia Integrated Conferencing for European Researchers (MICE) Project, RAT: Robust Audio Tool, Available from: .
[2] A.V. García, S. Fosse-Parisis, Freephone audio tool, Available from: .
[3] P. Dube, E. Altman, Utility analysis of simple FEC schemes for VoIP, in: Proceedings of IFIP Networking, 2002, pp. 226–239.
[4] K. Kawahara, K. Kumazoe, T. Takine, Y. Oie, Forward error correction in ATM networks: An analysis of cell loss distribution in a block, in: Proceedings of IEEE INFOCOM, 1994, pp. 1150–1159.
[5] M. Luby, L. Vicisano, J. Gemmell, L. Rizzo, M. Handley, J. Crowcroft, The use of forward error correction (FEC) in reliable multicast, RFC 3453, 2002.
[6] E. Biersack, Performance evaluation of forward error correction in ATM networks, in: Proceedings of ACM SIGCOMM, 1992, pp. 248–257.
[7] G. Dán, V. Fodor, Quality differentiation with source shaping and forward error correction, in: Proceedings of MIPS'03, 2003, pp. 222–233.
[8] P. Frossard, FEC performances in multimedia streaming, IEEE Commun. Lett. 5 (3) (2001) 122–124.
[9] E. Gilbert, Capacity of a burst-noise channel, Bell Syst. Tech. J. 69 (1960) 1253–1265.
[10] L. Kleinrock, Queueing Systems, vol. I, Wiley, New York, 1975.
[11] E. Altman, C. Barakat, V. Ramos, On the utility of FEC mechanisms for audio applications, in: Proceedings of Quality of Future Internet Services, LNCS 2156, 2001, pp. 45–56.
[12] J.W. Cohen, The Single Server Queue, North-Holland Publishing, Amsterdam, 1969.


G. Dán et al. / Computer Networks 50 (2006) 1104–1129

[13] Sprint IP Monitoring Project, Available from: .
[14] I. Cidon, A. Khamisy, M. Sidi, Analysis of packet loss processes in high speed networks, IEEE Trans. Inform. Theor. 39 (1) (1993) 98–108.
[15] O. Gurewitz, M. Sidi, I. Cidon, The ballot theorem strikes again: packet loss process distribution, IEEE Trans. Inform. Theor. 46 (7) (2000) 2595–2599.
[16] E. Altman, A. Jean-Marie, Loss probabilities for messages with redundant packets feeding a finite buffer, IEEE J. Select. Areas Commun. 16 (5) (1998) 779–787.
[17] H. Schulzrinne, J. Kurose, D. Towsley, Loss correlation for queues with bursty input streams, in: Proceedings of IEEE ICC, 1992, pp. 219–224.
[18] G. Dán, V. Fodor, G. Karlsson, Analysis of the packet loss process for multimedia traffic, in: Proceedings of the 12th International Conference on Telecommunication Systems, Modeling and Analysis, 2004.
[19] H. Heffes, D.M. Lucantoni, A Markov modulated characterization of packetized voice and data traffic and related statistical multiplexer performance, IEEE J. Select. Areas Commun. 4 (6) (1986) 856–868.
[20] C. Blondia, The N/G/1 finite capacity queue, Commun. Stat. - Stoch. Mod. 5 (2) (1989) 273–294.
[21] J.C. Bolot, End-to-end packet delay and loss behavior in the Internet, in: Proceedings of ACM SIGCOMM, 1993, pp. 289–298.
[22] O.J. Boxma, Sojourn times in cyclic queues-the influence of the slowest server, Comput. Perform. Reliab. (1988) 13–24.
[23] J. Beran, R. Sherman, M. Taqqu, W. Willinger, Long-range dependence in variable-bit-rate video traffic, IEEE Trans. Commun. 43 (2/3/4) (1995) 1566–1579.
[24] S. Robert, J. Le Boudec, New models for pseudo self-similar traffic, Perform. Evaluat. 30 (1–2) (1997) 57–68.
[25] A.T. Andersen, B.F. Nielsen, A Markovian approach for modeling packet traffic with long-range dependence, IEEE J. Select. Areas Commun. 16 (5) (1998) 719–732.
[26] T. Yoshihara, S. Kasahara, Y. Takahashi, Practical timescale fitting of self-similar traffic with Markov-modulated Poisson process, Telecommun. Syst. 17 (1–2) (2001) 185–211.
[27] A. Klemm, C. Lindemann, M. Lohmann, Modeling IP traffic using the batch Markovian arrival process, Perform. Evaluat. 54 (2) (1993) 149–173.
[28] B. Ryu, A. Elwalid, The importance of long-range dependence of VBR video traffic in ATM traffic engineering: myths and realities, in: Proceedings of ACM SIGCOMM, 1996, pp. 3–14.
[29] P. Skelly, M. Schwartz, S. Dixit, A histogram-based model for video traffic behavior in an ATM multiplexer, IEEE/ACM Trans. Networking 1 (4) (1993).
[30] T. Karagiannis, M. Molle, M. Faloutsos, A nonstationary Poisson view of Internet traffic, in: Proceedings of IEEE INFOCOM, 2004, pp. 1–12.
[31] P. Salvador, R. Valadas, A. Pacheco, Multiscale fitting procedure using Markov-modulated Poisson processes, Telecommun. Syst. 23 (1–2) (2003) 123–148.
[32] R. Gaigalas, I. Kaj, Convergence of scaled renewal processes and a packet arrival model, J. Bernoulli Soc. Math. Stat. Prob. 9 (4) (2003) 671–703.
[33] W. Fischer, K. Meier-Hellstern, The Markov-modulated Poisson process MMPP cookbook, Perform. Evaluat. 18 (2) (1992) 149–171.

[34] R. Bellman, Introduction to Matrix Analysis, McGraw-Hill, New York, 1960.
[35] G. Dán, V. Fodor, G. Karlsson, Packet size distribution: an aside? in: Proceedings of QoS-IP'05, 2005, pp. 75–87.
[36] J. Cao, W.S. Cleveland, D. Lin, D.X. Sun, Internet traffic tends toward Poisson and independent as the load increases, in: Nonlinear Estimation and Classification, Springer, 2002.
[37] J.C. Bolot, S. Fosse-Parisis, D. Towsley, Adaptive FEC-based error control for Internet telephony, in: Proceedings of IEEE INFOCOM, 1999, pp. 1453–1460.
[38] G. Dán, V. Fodor, G. Karlsson, Are multiple descriptions better than one? in: Proceedings of IFIP Networking 2005, 2005, pp. 684–696.
[39] S. Kullback, Information Theory and Statistics, Wiley, New York, 1959.
[40] L. Le Ny, B. Sericola, Transient analysis of the BMAP/PH/1 queue, Int. J. Simulat. 3 (3–4) (2003) 4–15.
[41] W. Whitt, Approximating a point process by a renewal process. I: Two basic methods, Operat. Res. 30 (1) (1982) 125–147.
[42] M. Neuts, Matrix Geometric Solutions in Stochastic Models, Johns Hopkins University Press, 1981.
[43] E.P.C. Kao, Using state reduction for computing steady state probabilities of queues of GI/PH/1 types, ORSA J. Comput. 3 (3) (1991) 231–240.
[44] O. Ait-Hellal, E. Altman, A. Jean-Marie, I.A. Kurkova, On loss probabilities in presence of redundant packets and several traffic sources, Perform. Evaluat. 36–37 (1–4) (1999) 485–518.
[45] M. Yajnik, S. Moon, J. Kurose, D. Towsley, Measurement and modelling of the temporal dependence in packet loss, in: IEEE INFOCOM, 1999, pp. 345–352.
[46] W. Jiang, H. Schulzrinne, Modeling of packet loss and delay and their effect on real-time multimedia service quality, in: Proceedings of NOSSDAV, 2000.
[47] P. Frossard, O. Verscheure, Joint source/FEC rate selection for quality-optimal MPEG-2 video delivery, IEEE Trans. Image Process. 10 (12) (2001) 1815–1825.
[48] P. Billingsley, Statistical Inference for Markov Processes, Chicago University Press, 1961.
[49] E. Elliott, Estimates of error rates for codes on burst-noise channels, Bell Syst. Tech. J. 42 (1963) 1977–1997.

György Dán received the M.Sc. degree in Informatics from the Budapest University of Technology and Economics, Hungary in 1999 and the M.Sc. degree in Business Administration from the Budapest University of Economic Sciences, Hungary in 2003. He worked as a consultant in the field of access networks, streaming media and videoconferencing from 1999 until 2001. He was a Ph.D. student at the Technical University of Budapest from 1999 to 2002 and a guest researcher at KTH, Royal Institute of Technology, Stockholm, Sweden from 2001 to 2002. Currently he is a Ph.D. student at the Department of Signals, Sensors and Systems of KTH, Royal Institute of Technology working on traffic control for real-time multimedia traffic in packet switched networks.

Viktória Fodor received her M.Sc. and Ph.D. degrees in Computer Engineering from the Budapest University of Technology and Economics in 1992 and 1999, respectively. In 1994 and 1995 she was a visiting researcher at Politecnico Torino and at Boston University, where she conducted research on optical packet switching solutions. In 1998 she was a senior researcher at the Hungarian Telecommunication Company. In 1999 she joined the KTH, Royal Institute of Technology, where she now works as an assistant professor. Her current research interests include performance analysis of communication networks and traffic and error control for multimedia communication.


Gunnar Karlsson has been a professor in the Department of Signals, Sensors and Systems of KTH, the Royal Institute of Technology, since 1998; he is the director of the Laboratory for Communication Networks. He has previously worked for IBM Zurich Research Laboratory and the Swedish Institute of Computer Science (SICS). His Ph.D. is from Columbia University (1989), New York, and his M.Sc. from Chalmers University of Technology in Gothenburg, Sweden (1983). He has been a visiting professor at EPFL, Switzerland, and the Helsinki University of Technology in Finland. His current research relates to quality of service and wireless LAN developments.

Computer Networks 50 (2006) 1130–1144 www.elsevier.com/locate/comnet

A connection-oriented network architecture with guaranteed QoS for future real-time applications over the Internet Manodha Gamage *, Mitsuo Hayasaka, Tetsuya Miki The Department of Electronics, The National University of Electro-Communications, 1-5-1, Chofugaoka, Chofu-Shi, Tokyo 182-8585, Japan Available online 5 October 2005

Abstract The QoS provided by today's best effort Internet is not good enough, especially for real-time interactive traffic categorized as Premium Traffic (PT). It is believed that QoS guarantees could be better provided by connection-oriented networks such as IP/MPLS. However, these connection-oriented networks are inherently more prone to network failures. Failures of connection-oriented MPLS can be broadly classified into two types namely: link/path failures and degraded failures. Degraded failures that account for about 50% of total failures are detected by the timers maintained at the control plane peers. The control plane and the data plane of IP/MPLS packet networks are logically separated and therefore a failure in the control plane should not immediately disconnect the communications in the data plane. The Virtual Path Hopping (VPH) concept in this study distinguishes these two types of failures and avoids the disconnections of communications in the data plane due to degraded failures. Thereby it reduces the number of failures in the data plane. Computer simulations were performed and the results indicate that VPH is a proactive technique that minimizes failures in the data plane. The proposed Dynamic Virtual Path Allocation (DVPA) algorithm improves the availability of the connection-oriented networks by overcoming link/path failures, especially for PT, without compromising the network resource utilization efficiency. Comprehensive simulations were performed to evaluate the DVPA algorithm and the results show the improvements that DVPA can achieve. Therefore implementation of the DVPA algorithm in conjunction with VPH would improve the reliability and availability aspects of QoS in connection-oriented IP/MPLS packet networks. © 2005 Elsevier B.V. All rights reserved. Keywords: VPH; DVPA algorithm; Connection-oriented networks; Degraded failures; Link/path failures; PT; BET

1. Introduction With the explosive growth and operational cost reductions of the Internet, real-time multimedia applications over the Internet are increasingly

* Corresponding author. E-mail address: [email protected] (M. Gamage).

becoming popular. The real-time interactive traffic that is categorized as Premium Traffic (PT) over the Internet prefers guaranteed QoS in the dimensions of minimum delay and jitter, guaranteed bandwidth, high availability and reliability, fast recovery from failures, etc. Connection-oriented networks such as IP/MPLS [1] are better able to provide the required QoS in many of these dimensions compared to the conventional connectionless,

1389-1286/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2005.09.007

M. Gamage et al. / Computer Networks 50 (2006) 1130–1144

best-effort IP networks. On the other hand, connection-oriented networks are potentially more vulnerable to failures and the focus of this study is to find a suitable solution to overcome the problems due to the network failures and increase the availability and reliability, especially for PT. According to [2], the failures in connection-oriented networks such as IP/MPLS can be categorized into degraded failures and link/path failures. Degraded failures are detected using the control plane timers and they are mainly due to restart of the control plane nodes, congestion, hardware and software failures in the control plane, protocol failures such as adjacency losses, etc. The studies have shown that about 50% of the network failures last only for very short periods of time ( n the VPH is made in a cyclic manner and therefore each path is used more than once as explained in the previous section. The VPH concept will almost eliminate degraded type failures, making pdij very small (almost zero). Therefore P_VPH < P_No_VPH. This shows an improved reliability and availability of the network due to the VPH concept. 6.2. DVPA algorithm In the analysis performed here for the DVPA algorithm, it is assumed that the VP-pool consists of n link-disjoint VPs each with a bandwidth of B, and there are N AVPs at any given time (2 ≤ N ≤ n). The initial analysis is done for a single failure scenario and then the analysis is extended for dual failure situations that might be more appropriate for future networks.


PT ratio of the ith AVP:

$$P_i = \frac{\text{PT in } i\text{th AVP}}{\text{Total BW of } i\text{th AVP}}, \qquad 1 \le i \le N.$$

Total PT in N AVPs:

$$B \sum_{i=1}^{N} P_i.$$

Total available bandwidth after failure of one AVP:

$$B (N - 1).$$

The available bandwidth for recovery of PT (SC to protect PT), if the jth AVP fails ($1 \le j \le N$), is given by

$$B \left( (N - 1) - \left( \sum_{i=1}^{N} P_i - P_j \right) \right).$$

Here the bandwidth used by the BET is not considered because the objective of this method is to provide guaranteed QoS and availability for PT. The maximum PT to be recovered is max(P_i) and the DVPA algorithm should always have enough spare capacity to recover it. For 100% restorability of PT (single failure):

$$B \left( (N - 1) - \left( \sum_{i=1}^{N} P_i - P_j \right) \right) > B \max(P_i) \;\Rightarrow\; N > \max(P_i) + \left( \sum_{i=1}^{N} P_i - P_j \right) + 1.$$

In Fig. 2,

$$A = \max(P_i) + \left( \sum_{i=1}^{N} P_i - P_j \right) + 1.$$

A service factor s is considered to allow some extra bandwidth to make sure that the AVPs are not overloaded. N > A should always be maintained and if N < A/s, where 0 < s < 1, another VP should be activated in the VP-pool in order to increase N by one. The PT ratio for every AVP is calculated by the ingress and these values are used to allocate PT to the AVP with minimum P_i as explained in Section 5. Whenever a new allocation of PT is done the PT ratios are updated. The utilization factor, u, is defined such that 0 < u < s < 1 and if N > max{A/u, TT/(B * u)}, then the utilization of the AVPs is low. Therefore reduce N by one as explained before. u is used to avoid frequent fluctuations in the number of AVPs that can occur if a single threshold value is used to increase and decrease N. The values of s and u can be decided by the network administrator according to the needs of the network. Similar analysis can be extended for a dual failure situation, where the jth and kth VPs fail, and then it is possible to obtain

$$A = 2 \max(P_i) + \left( \sum_{i=1}^{N} P_i - (P_j + P_k) \right) + 2.$$

This means that there should be enough spare capacity to recover 2 max(Pi) of affected PT. 7. Simulations and results Sections A and B below explain the separate simulations carried out to evaluate the VPH concept and DVPA algorithm respectively. The traffic flow arrivals were simulated according to Poisson distribution. Many simulations were performed with different average traffic flow arrival values such as 3 s, 5 s, 10 s, 30 s, 60 s, and 300 s. The durations of communication sessions were decided based on an exponential distribution and different averages of 300 s, 600 s, 900 s, 1800 s, and 3600 s were simulated. The bandwidths of the sessions were randomly decided with averages of 1 Mbps, 5 Mbps, 10 Mbps, and 20 Mbps. After many simulations with different combinations of the average values above, it was found that the improvements due to the VPH and DVPA were not very sensitive to these average values. Therefore the results shown here are for averages of 10 s, 1800 s and 10 Mbps for traffic flow arrivals, session durations and bandwidth of sessions respectively. The number of sessions in a flow arrival was randomly decided to be between 1 and 10. According to many simulations, the parameters of the distribution of repair times of failures had negligible effects to the performance of VPH and therefore it was assumed to be a constant for all simulations of VPH. 7.1. VPH concept Different network topologies with nodes 10, 20, 40, 50, 60 and 90 with different number of links were simulated for many failure combination scenarios in nodes and links as shown in Table 1. The results indicated almost complete elimination of terminations of the data communications in the data plane due to degraded failures irrespective of the network size and topology. Therefore the results of the topol-


performed to restore it and a free VP from the VP-pool was always used as a backup path in all simulations. Therefore, the number of re-routings done would be a count of network failures and this is used as a performance evaluation measure of VPH concept. VPH_Timer values were randomly decided for each peer to be a multiple of 10 s in the range of 30–80 s. Fig. 3 indicates the Frequency of Occurrence vs. Number of Re-routings (per month) graphs. The vertical dashed lines in these graphs indicate the highest frequency of occurrence of re-routings and we can see that most occurrences are concentrated around these dashed lines as expected. According to the results shown in Fig. 3, VPH always reduces this highest frequency of occurrence of re-routings by about 50% irrespective of the failure probability

Table 1. Different failure combination scenarios

Combination   Failures/link/month   Failures/node/month
I             0.01                  0.01
II            0.01                  0.1
III           0.05                  0.1
IV            0.05                  0.01

ogy with 90 nodes and 270 bi-directional links, simulated for failure combination scenarios in Table 1 are presented here. Over 30 connectivity orientations of topologies with 90 nodes and 270 bi-directional links were simulated for each failure combination in order to obtain more generalized results. Simulations were performed for with-VPH and without-VPH scenarios. Whenever a network failure occurs in the data plane a fast re-routing is

[Figure 3, panels (a) to (d): x-axis: Number of Re-routings (per month); y-axis: Frequency of Occurrence; curves: With VPH, Without VPH.]

Fig. 3. Frequency of occurrence vs. number of re-routings: (a) failure combination I, (b) failure combination II, (c) failure combination III and (d) failure combination IV.
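The traffic model used in the simulations of Section 7 (Poisson flow arrivals, exponentially distributed session durations, and between 1 and 10 sessions per flow arrival) can be sketched as follows. This is an illustrative sketch only, not the authors' simulator: the function and field names are hypothetical, and since the text gives only the average session bandwidth, the uniform bandwidth distribution used here is an assumption.

```python
import random


def generate_flows(sim_time_s, mean_interarrival_s=10.0,
                   mean_duration_s=1800.0, mean_bandwidth_mbps=10.0, seed=0):
    """Generate flow arrivals over sim_time_s seconds (hypothetical helper)."""
    rng = random.Random(seed)
    flows = []
    # Poisson arrivals: inter-arrival times are exponentially distributed.
    t = rng.expovariate(1.0 / mean_interarrival_s)
    while t < sim_time_s:
        sessions = []
        for _ in range(rng.randint(1, 10)):  # 1 to 10 sessions per arrival
            sessions.append({
                "duration_s": rng.expovariate(1.0 / mean_duration_s),
                # Assumption: uniform on [0, 2 * mean], which keeps the mean.
                "bandwidth_mbps": rng.uniform(0.0, 2.0 * mean_bandwidth_mbps),
            })
        flows.append({"arrival_s": t, "sessions": sessions})
        t += rng.expovariate(1.0 / mean_interarrival_s)
    return flows
```

The defaults correspond to the averages reported for the presented results (10 s, 1800 s and 10 Mbps); a 24 h run as in Section 7.2 would use `generate_flows(24 * 3600)`.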


width of each VP was considered to be 100 Mbps. All simulations were carried out for durations of 24 h. Utilization factor, u and service factor, s explained in our numerical analysis were set to 0.7 and 0.9 respectively. DVPA was also evaluated for dual fault situations and it was necessary to have a VP-pool of seven link-disjoint VPs, in order to accommodate the same amount of traffic as in the single fault scenario. Networks with six fixed AVPs and seven fixed AVPs between an ingress and egress were also simulated in order to compare the above single (S) and dual (D) fault scenarios respectively. The fixed numbers of AVPs were decided as 6 and 7 for the two scenarios because they were the minimum bandwidth (600 and 700 Mbps) required to

[Figure 4: x-axis: Load (0 to 1); y-axis: Difference of Utilization Efficiency (%), i.e. UE with VPH minus UE without VPH.]

Fig. 4. Difference of utilization efficiency vs. load (for VPH).


7.2. DVPA algorithm A VP-pool consisting of six link-disjoint VPs was simulated to evaluate the performance of the proposed DVPA algorithm for single faults. The band-


of nodes and links. This reflects the reduction of failures in the data plane and accounts for the elimination of the terminations of data communications in the data plane due to degraded failures, which are detected by the control plane timers. Similar graphs were obtained for the network topologies with 10, 20, 40, 50 and 60 nodes and all those graphs indicated an almost identical 50% improvement irrespective of the network topology and the failure probabilities. The changes in the network utilization efficiency due to the implementation of VPH were measured. Fig. 4 shows the difference of utilization efficiency (with and without VPH) vs. load. Here the difference of utilization efficiency is defined as utilization efficiency with VPH − utilization efficiency without VPH. For load values ranging from 0.1 to 0.9, the utilization efficiency of network resources was measured with and without VPH being implemented. As expected, the graph shows that there is not much difference (less than ±5%) between them. This means that the implementation of VPH does not affect the utilization efficiency of the network resources. This is because at any given time VPH uses only one VP and its resources, even though it has a VP-pool.

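[Figure 5, panels (a) and (b): x-axis: Load; y-axis: Utilization Efficiency (%); curves: Proposed for S, Proposed for D, Fixed for S, Fixed for D.]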

Fig. 5. Utilization efficiency vs. load (for DVPA): (a) for 25% PT and (b) for 50% PT.
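The activation and deactivation rule from the DVPA analysis in Section 6.2 (activate another VP when N < A/s, release one when N > max{A/u, TT/(B * u)}) can be sketched as below. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function names and return values are hypothetical, `failed` lists the indices of the one or two AVPs assumed to fail, and TT is taken to be the total traffic expressed in the same unit as the VP bandwidth B.

```python
def threshold_a(pt_ratios, failed):
    """Threshold A from Section 6.2 for one or two failed AVPs.

    Single failure: A = max(P_i) + (sum(P_i) - P_j) + 1.
    Dual failure:   A = 2 max(P_i) + (sum(P_i) - (P_j + P_k)) + 2.
    """
    f = len(failed)
    if f not in (1, 2):
        raise ValueError("only single and dual failures are covered")
    surviving_pt = sum(pt_ratios) - sum(pt_ratios[j] for j in failed)
    return f * max(pt_ratios) + surviving_pt + f


def dvpa_decision(pt_ratios, failed, s, u, total_traffic, vp_bandwidth):
    """Return 'activate', 'deactivate' or 'keep' for the current N AVPs."""
    if not 0.0 < u < s < 1.0:
        raise ValueError("factors must satisfy 0 < u < s < 1")
    n = len(pt_ratios)  # N, the current number of AVPs
    a = threshold_a(pt_ratios, failed)
    if n < a / s:
        return "activate"  # too little spare capacity: increase N by one
    if n > max(a / u, total_traffic / (vp_bandwidth * u)):
        return "deactivate"  # AVPs under-utilized: decrease N by one
    return "keep"
```

Using two thresholds (s = 0.9 and u = 0.7 in the simulations) leaves a band in which N is unchanged, which is what avoids frequent fluctuations in the number of AVPs.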



accommodate both active and backup bandwidth of all traffic during the peak time. Fig. 5 shows Utilization Efficiency vs. Load graphs for the DVPA algorithm and for fixed number of AVPs for both S and D fault scenarios. Utilization efficiency here is defined as the ratio of used bandwidth (PT and BET) between ingress and egress to the total available active bandwidth between them. Simulations were done for 10%PT, 20%PT, 25%PT, 30%PT, 35%PT, 40%PT, 45%PT, and 50%PT. We believe it is highly unlikely that PT would go beyond 50% in the foreseeable future. Since they all indicated similar improvements, Fig. 5(a) and (b) shown here are only for 25% and 50% of PT respectively. According to the graphs in Fig. 5(a), when the proposed algorithm was implemented for the single fault scenario, the utilization efficiency for 25% PT is over 80% for loads above 0.25. Even at low loads (0.1–0.2) it is over 50%. In contrast to this the utilization efficiency for fixed AVP (for S) is less than 50% for the loads below 0.5 and less than 80% even for loads of 0.8. As expected, for the dual (D) failure situation, the utilization efficiency is slightly reduced compared to single failure scenario as more spare capacity is required for more protection of PT. Still it can achieve over 60% efficiency when the load is over 0.4. The utilization efficiency even deteriorated for the dual fault scenario of fixed AVP. As shown in Fig. 5(b) the utilization efficiency is slightly reduced for all cases when PT is increased up to 50%. In all these simulations there were no dropped traffic flows even at the peak time. Many simulations with the same traffic patterns were performed for different VP-pools with a given load and PT percentage. They were done for load values varying from 0.3 to 0.9 and for PT percentages varying from 10 to 50. Figs. 6 and 7 show the Number of Occurrences vs. Utilization Efficiency graphs for 25% and 50% of PT respectively. 
They show graphs for loads of 0.4 and 0.8 for each PT percentage of 25 and 50. The vertical dashed lines in these graphs represent the most frequent occurrence of utilization efficiency. In both Figs. 6 and 7, we can see that the most frequent occurrence for the proposed DVPA is always higher than the fixed number of VP situations indicating an improvement in the utilization efficiency due to the DVPA algorithm. It is also evident that the utilization efficiency is slightly less for higher loads (for both S and D scenarios) as well as for higher PT%. However, according to these graphs we can conclude that the DVPA algorithm, in general,

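[Figure 6, panels (a) and (b): x-axis: Utilization Efficiency (%); y-axis: Number of Occurrence; curves: Dual Dynamic (Proposed), Dual Fixed, Single Dynamic (Proposed), Single Fixed.]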

Fig. 6. Frequency of occurrence vs. UE (PT of 25%): (a) load of 0.8 and (b) load of 0.4.

improves the utilization efficiency for both S and D situations irrespective of the PT percentages and the load of the network that are considered. As expected when the average PT was over 50%, first it started to drop a few BET calls and when further increased, it started to drop some PT too. This was due to lack of bandwidth and it can be overcome by increasing the bandwidth of each VP or by increasing the number of VPs in the VP-pool. Increasing the VPs in the VP-pool does not affect the utilization efficiency in the proposed algorithm as it dynamically varies the number of AVPs. However an increasing number of VPs between any ingress and egress would make the conventional


[Figure 7, panels (a) and (b): x-axis: Utilization Efficiency; y-axis: Number of Occurrence; curves: Dual Dynamic (Proposed), Dual Fixed, Single Dynamic (Proposed), Single Fixed.]

Fig. 7. Frequency of occurrence vs. UE (PT of 50%): (a) load of 0.8 and (b) load of 0.4.

fixed VP method more and more inefficient during low traffic periods. In all these simulations there were no interruptions for applications, as the recovery was very fast whenever there was a failure. Fig. 8 describes how the proposed algorithm distributes the PT among all AVPs for both (a) off-peak and (b) peak conditions. Fig. 8(a) and (b) are for loads of 0.4 and 0.8–0.9, respectively. These figures are the results of the simulations performed for the single fault situation when there is an average of 50% PT. Fig. 8(a) indicates that only three VPs are activated (VP #1, #2 and #3) at low loads, allowing the resources of the unused VPs to be used by other traffic. Fig. 8(b) shows that when the load is very high the algorithm activates all VPs in the VP-pool. This

[Figure 8, panels (a) and (b): x-axis: Virtual Path Number (1 to 6); y-axis: Traffic (Mbps); bars: BET, PT.]

Fig. 8. Traffic with respect to VP number: (a) off-peak and (b) peak.

clearly shows the dynamic variation of number of AVPs between ingress and egress according to the traffic arrival. Also it proves the DVPA algorithm can distribute PT among AVPs and improve the utilization efficiency of the network resources. 8. Conclusions The rapid growth of real-time multimedia applications over the Internet demands guaranteed QoS and near 100% availability. Therefore connection-oriented networks that can meet QoS demands with respect to delay, jitter, and guaranteed bandwidth can be expected to dominate the future Internet. Connection-oriented networks are more vulnerable to network failures and it is a timely requirement to find a solution to achieve 100% availability not only for single failure situations but also for dual


failure situations in these networks. The DVPA algorithm with the VPH concept provides this demanded availability and reliability without deteriorating the utilization efficiency of network resources as the results of this study show. The terminations of data communications in the data plane due to degraded failures that are detected by control plane timers can be completely eliminated by VPH. The link/path failures of the data plane, especially for PT, can be recovered with very fast recovery times (x) [%]

remains an active research topic [21]. Hence, in order to draw some conclusions about the real Internet, we perform our simulations on several Internet-like topologies, with different properties. The simulations on these various topologies allow us to determine the impact of the topology on the results, but also to explore possible evolution scenarios for the Internet. In Section 4, we used a large router-level Internet topology that models delays. Here, we use AS-level topologies instead, for two reasons. A first reason is the computation time. The topology used in Section 4 is unnecessarily complex for an AS-level simulation since it models routers and delays. A second reason is that we want to consider different types of topologies to estimate the variability of our results with respect to the topology. We first use an AS-level Internet topology inferred from several BGP routing tables using the method developed by Subramanian et al. [13]. Next, we generate three AS-level Internet-like topologies, using a Barabási–Albert model [22]. The topologies are created level by level, from the dense core to the customer level. Nodes are added one at a time, using the Barabási–Albert preferential connectivity model, i.e., new nodes tend to connect to existing nodes that are highly connected. The generated topologies provide details about customer-provider and peer-to-peer relationships. Their numbers of Internet hierarchy levels and nodes in each level can be specified, so that we can produce small- or large-diameter Internet topologies while preserving the same number of stub ASes and transit ASes. This feature is used in Section 5.4 to explore different scenarios of the Internet evolution.
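The preferential connectivity rule described above, where new nodes tend to attach to nodes that are already highly connected, can be illustrated with a short sketch. This is a generic Barabási–Albert growth loop written for illustration, not the topology generator used in the paper: it ignores hierarchy levels and customer-provider or peer-to-peer relationships, and the function name and the fully meshed seed core of `m_links + 1` nodes are assumptions.

```python
import random


def barabasi_albert_edges(n_nodes, m_links, seed=0):
    """Grow a graph by preferential attachment; return a list of (u, v) edges."""
    if not 1 <= m_links < n_nodes:
        raise ValueError("need 1 <= m_links < n_nodes")
    rng = random.Random(seed)
    edges = []
    endpoints = []  # every node appears once per incident edge (its degree)
    core = m_links + 1
    for u in range(core):  # assumed seed: a small fully meshed core
        for v in range(u + 1, core):
            edges.append((u, v))
            endpoints += [u, v]
    for new in range(core, n_nodes):  # nodes are added one at a time
        targets = set()
        while len(targets) < m_links:
            # Drawing from `endpoints` picks a node with probability
            # proportional to its current degree.
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints += [new, t]
    return edges
```

Each new node brings exactly `m_links` edges, so the average degree is controlled by a single parameter while the degree distribution becomes heavy-tailed as the graph grows.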

[Figure 10: x-axis: Diversity (0 to 1); one curve per number of providers (1, 2, 3, 4, 5, 6, 10 and 25).]

Fig. 10. AS-level path diversity for the inferred Internet topology, using traditional IPv4 multihoming.

[Figure 11: IPv6 Multihoming BGP Path Diversity; x-axis: Diversity (0 to 1); y-axis: P(X>x) [%]; one curve per number of providers (1, 2, 3, 4, 5, 6, 10 and 25).]

Fig. 11. AS-level path diversity for the inferred Internet topology, using IPv6 multihoming.

This percentage rises to 22% for dual-homed stub ASes. Fig. 11 shows that about 50% of dual-homed IPv6 stub ASes have a path diversity better than 0.2. We can observe that the diversity remains the same when considering only single-homed destinations. Indeed, only one prefix is announced by a single-homed stub AS, using either the IPv4 or the IPv6 multihoming technique. The use of IPv6 multihoming does not introduce any benefit in this case. When comparing Figs. 10 and 11, it appears that the AS-level path diversity is much better when stub ASes use multiple PA prefixes than when they use a single PI prefix. For example, when considering dual-homed IPv6 stub ASes, Fig. 11 shows that the path diversity observed is already as good as the path diversity of a 25-homed stub AS that uses traditional IPv4 multihoming. The path diversity obtained by a 3-homed stub AS that uses IPv6 multihoming completely surpasses the diversity of even


C. de Launois et al. / Computer Networks 50 (2006) 1145–1157

a 25-homed stub AS that uses traditional IPv4 multihoming. These results are corroborated by Figs. 12 and 13. These figures present the probability that a stub AS has at least two disjoint paths towards another stub AS, in the inferred Internet topology. They show the mean, 5th percentile, median and 95th percentile of this probability. The results are classified according to the number of providers of the stub AS. The percentage of single-homed stub ASes in this topology is about 40%. Since two disjoint paths cannot end at a single-homed destination, the probability of having disjoint paths is at most 60%, whatever the number of providers. Fig. 12 shows for instance that a dual-homed stub AS has at least two disjoint paths towards 20% of the destination ASes on average. Fig. 13 considers the use of multiple PA prefixes. It shows in this case that being dual-homed is sufficient for most stub ASes to reach the maximum probability of having disjoint paths up to a destination AS. This confirms our previous finding.

5.4. Influence of topology on path diversity

The way the Internet will evolve in the future remains essentially unknown. In order to determine the range of variation for our simulation results, we perform simulations with three distinct generated topologies. The first is a topology that tries to resemble the current Internet [13]. Four hierarchy levels of ASes are generated for this topology: a fully-meshed dense core, a level of large transit ASes, a level of local transit ASes, and a level of stub ASes. The proportion of nodes in each level is similar to the proportion observed for the current Internet. Figs. 14 and 15 show the AS-level path diversity for this generated topology. As expected, the path diversity results for

Fig. 12. Probability that a stub AS has at least two disjoint paths towards any other stub AS, when it uses a single PI prefix. [Plot not reproduced; y-axis: P(disjoint path) [%]; x-axis: number of providers; series: mean, 5%, median, 95%.]

Fig. 13. Probability that a stub AS has at least two disjoint paths towards any other stub AS, when it uses multiple PA prefixes. [Plot not reproduced; y-axis: P(disjoint path) [%]; x-axis: number of providers; series: mean, 5%, median, 95%.]

Fig. 14. AS-level path diversity d for a generated Internet-like topology, using a single PI prefix. [Plot not reproduced; panel title: AS-Level IPv4 Multihoming BGP Path Diversity; y-axis: P(X>x) [%]; x-axis: diversity.]

Fig. 15. AS-level path diversity d for a generated Internet-like topology, using multiple PA prefixes. [Plot not reproduced; panel title: AS-Level IPv6 Multihoming BGP Path Diversity; y-axis: P(X>x) [%]; x-axis: diversity.]


this generated topology are almost identical to the results obtained for the inferred topology. The second is a small-diameter Internet topology, consisting of stub ASes directly connected to a fully meshed dense core. This topology simulates a scenario where ASes in the core and large transit ASes concentrate for commercial reasons. At the extreme, the Internet could consist of a small core of large transit providers, together with a large number of stub ASes directly connected to the transit core. This could lead to an Internet topology with a small diameter. The AS-level path diversity for such a topology is illustrated in Figs. 16 and 17. As expected, the diversity in a small-diameter topology is better, since the paths are shorter than in the current Internet. When comparing the results illustrated by Figs. 16 and 17, it appears that the gain in path diversity is also large for a small-diameter topology.

The third is a large-diameter topology, generated using eight levels of ASes. This topology simulates a scenario where the Internet continues to grow, with more and more core, continental, national and metropolitan transit providers. In this case, the Internet might evolve towards a network with a large diameter. The same simulations are performed. The path diversity results are presented in Figs. 18 and 19. These figures show a poor path diversity in comparison with the path diversity of the previous topologies, because the paths are longer. Again, these two figures show that the path diversity remains low when stub ASes use a single PI prefix, whatever their number of providers. When multiple PA prefixes are used, the path diversity rises much faster with the number of providers, as shown by Fig. 19. These two figures confirm that the gain in path diversity is also substantial for a large-diameter topology.

Fig. 16. AS-level path diversity d for a small-diameter generated topology, using a single PI prefix. [Plot not reproduced; y-axis: P(X>x) [%]; x-axis: diversity.]

Fig. 17. AS-level path diversity d for a small-diameter generated topology, using multiple PA prefixes. [Plot not reproduced; y-axis: P(X>x) [%]; x-axis: diversity.]

Fig. 18. AS-level path diversity d for a large-diameter generated topology, using a single PI prefix. [Plot not reproduced; y-axis: P(X>x) [%]; x-axis: diversity.]

Fig. 19. AS-level path diversity d for a large-diameter generated topology, using multiple PA prefixes. [Plot not reproduced; y-axis: P(X>x) [%]; x-axis: diversity.]


Figs. 20 and 21 show the average path diversity as a function of the number of providers for all topologies considered. For a given destination stub AS D, we compute the mean of path diversities from every source stub towards D. We then group the destination stub ASes according to their number of providers, and compute the mean of their path diversities. In Figs. 20 and 21, we can first observe that the results obtained for the generated and inferred Internet topologies are quite close. We can also observe that the average diversity of the inferred Internet lies between the average diversities of the small- and large-diameter generated Internet topologies. Fig. 20 shows that the average path diversity using a single PI prefix does not rise much as a function of the number of providers, for all topologies considered. Figs. 10 and 20 suggest that it is nearly impossible for a stub AS to achieve a good path diversity using traditional IPv4 multihoming, whatever its number of providers.

Fig. 20. Average path diversity using traditional IPv4 multihoming. [Plot not reproduced; y-axis: average diversity; x-axis: number of providers; curves: low-diameter Internet, inferred Internet, generated Internet, large-diameter Internet.]

In contrast, as shown by Fig. 21, the path diversity that is obtained using multiple PA prefixes is much better. Figs. 20 and 21 show that a dual-homed stub AS using IPv6 multihoming already gets a higher diversity than any multihomed stub AS that uses traditional IPv4 multihoming, whatever its number of providers and for all topologies considered. In a small-diameter Internet, this diversity rises fast with the number of providers, but the marginal gain diminishes quickly. In a large-diameter Internet, the diversity rises more slowly. Fig. 22 summarises the results for the analysed topologies. It shows the path diversity benefit in percent that a stub AS obtains when it uses multiple PA prefixes instead of a single PI prefix. The gain is null for single-homed stubs, as the use of one PA prefix instead of one PI prefix has no impact on the path diversity. The figure shows that the gain is high when multiple PA prefixes are used, as soon as the stub AS has more than a single provider. Additionally, we can see that the gain does not vary much with the topology considered. Fig. 22 also shows that the gain for the current inferred Internet lies almost everywhere between the gains of the two extreme cases. Hence, this figure strongly suggests that the results observed for our synthetic topologies should also hold for the real Internet. In particular, the gain curve for the real Internet should most likely lie somewhere between the two extreme cases. So far, we have analysed the AS-level path diversity considering one router per AS. However, a factor that can impact the path from a source to a destination is the intradomain routing policy used inside transit ASes. In [23], we also evaluate the path

Fig. 21. Average path diversity using IPv6 multihoming. [Plot not reproduced; y-axis: average diversity; x-axis: number of providers; curves: IPv6 low-diameter Internet, IPv6 inferred Internet, IPv6 generated Internet, IPv6 large-diameter Internet.]

Fig. 22. Summary of path diversity gains when using multiple PA prefixes instead of a single PI prefix. [Plot not reproduced; y-axis: average gain [%]; x-axis: number of providers; curves: low-diameter Internet, inferred Internet, generated Internet, large-diameter Internet.]


diversity that exists when ISP routing policies in the Internet conform to hot-potato routing. In hot-potato routing, an ISP hands off traffic to a downstream ISP as quickly as possible. Results presented in [23] show that hot-potato routing has no significant impact on the AS-level path diversity.
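The averaging used for Figs. 20 and 21 and the gain summarised in Fig. 22 can be sketched as follows. This is only an illustration under stated assumptions: the data layout (a dictionary keyed by source and destination) and the function names are hypothetical, and the gain is assumed here to be the relative increase of the PA average over the PI average, which the text does not spell out.

```python
from collections import defaultdict
from statistics import mean


def average_diversity_by_providers(diversity, providers):
    """Average path diversity per number of providers of the destination.

    diversity: {(source, destination): d}  (hypothetical input layout)
    providers: {destination: number of providers}
    """
    # Step 1: for each destination D, mean diversity over all sources.
    per_destination = defaultdict(list)
    for (_source, destination), d in diversity.items():
        per_destination[destination].append(d)
    # Step 2: group destinations by provider count and average again.
    per_group = defaultdict(list)
    for destination, values in per_destination.items():
        per_group[providers[destination]].append(mean(values))
    return {count: mean(values) for count, values in sorted(per_group.items())}


def gain_percent(avg_pi, avg_pa):
    """Assumed definition: relative gain of PA over PI averages, in percent."""
    return {
        count: 100.0 * (avg_pa[count] - avg_pi[count]) / avg_pi[count]
        for count in avg_pi
        if count in avg_pa and avg_pi[count] > 0
    }
```

With this definition a single-homed group, whose PI and PA averages coincide, gets a gain of zero, in line with the observation above.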

5.5. Impact of BGP on path diversity

We discuss in this section how the path diversity is affected by the BGP protocol. Multihoming is assumed to increase the number of alternative paths. However, the AS-level path diversity offered by multihoming depends on how much the interdomain routes, as distributed by BGP, overlap. The results presented in the previous section suggest that BGP heavily reduces the path diversity, at the level of autonomous systems. Two factors can explain why the diversity is so much reduced. The first and primary factor is that, for each destination prefix, each BGP router in the Internet receives one route from a subset of its neighbours. Based on this set of received routes, BGP selects a single best route towards the destination prefix, and next advertises this single best route to its neighbours. Therefore, each BGP router reduces the diversity of available paths. As a consequence, a single-homed stub AS will receive from its provider only a single route towards each destination prefix, even if the destination site is connected to the Internet through multiple providers. Unfortunately, BGP is designed as a single path routing protocol. It is thus difficult to do better with BGP. A second factor exists that further reduces the path diversity. The tie-breaking rule used by BGP to decide between two equivalent routes often prefers the same next-hops. Let us consider a BGP router that receives two routes from its provider towards a destination D. According to the BGP decision process, the shortest AS path is selected. However, the diameter of the current Internet is small, more or less 4 hops [2]. As a consequence, paths are often of the same length, and do not suffice to select the best path. It has been shown that between 40% and 50% of routes in core and large transit ASes are selected using tie-breaking rules of the BGP decision process [24]. In our model with one router per AS, the only tie-breaking rule used in this case is to prefer routes learned from the router with the lowest router address. This is the standard rule used by BGP-4. Unfortunately it always prefers the same next-hop, a practice that degrades the path diversity.

The first factor suppresses paths, while the second factor increases the probability that paths overlap. An IPv6 multiaddress multihoming solution circumvents the first factor by using multiple prefixes. However, the use of multiple PA prefixes has no impact on the second factor, since it does not modify BGP and its decision process in particular.

6. Conclusion

In this paper, we have shown that a new way to improve network performance at the interdomain level is to use multiple provider-dependent aggregatable (PA) prefixes in an IPv6 Internet. We have shown that stub ASes that use multiple PA prefixes can exploit paths that are otherwise unavailable. In other words, the use of multiple prefixes increases the number of paths available, i.e., the Internet path diversity. Among the newly available paths, some offer lower delays. Our simulations suggest that about 60% of the pairs of stub ASes can benefit from lower delays.

We have also proposed a new, fine-grain metric to measure the AS-level path diversity. We performed simulations on various topologies to quantify the gain in path diversity when multiple prefixes are used. We have shown that a dual-homed stub AS that uses multiple PA prefixes already has a better Internet path diversity than any multihomed stub AS that uses a single provider-independent (PI) prefix, whatever its number of providers. We have observed that this gain in path diversity does not vary much with the topology considered, which suggests that the results obtained will most likely also hold for the real Internet.

Our observations show that, from a performance point of view, IPv6 multihomed stub ASes benefit from the use of multiple PA prefixes and should use them instead of a single PI prefix as in IPv4 today. This study thus strongly encourages the IETF to pursue the development of IPv6 multihoming solutions relying on the use of multiple PA prefixes. The use of such prefixes reduces the size of the BGP routing tables, and also enables hosts to use lower-delay and more diverse Internet paths, which in turn offers more opportunities to balance the traffic load and to support quality of service.

Acknowledgements

Cédric de Launois is supported by a grant from FRIA (Fonds pour la Formation à la Recherche dans l'Industrie et dans l'Agriculture, Belgium). Bruno Quoitin is supported by the Walloon Government within the WIST TOTEM project http://totem.info.ucl.ac.be. This work is also partially supported by the European Union within an E-Next project. We thank Steve Uhlig and Marc Lobelle for their useful comments and support. We also thank the authors of [13] for providing the inferred Internet topology.

References

[1] C. de Launois, B. Quoitin, O. Bonaventure, Leveraging network performances with IPv6 multihoming and multiple provider-dependent aggregatable prefixes, in: 3rd International Workshop on QoS in Multiservice IP Networks (QoSIP 2005), Catania, Italy, February 2005.
[2] G. Huston, BGP routing table analysis reports, May 2004.
[3] S. Agarwal, C.N. Chuah, R.H. Katz, OPCA: Robust interdomain policy routing and traffic control, in: Proceedings OPENARCH, 2003.
[4] J.W. Stewart, BGP4: Inter-Domain Routing in the Internet, Addison-Wesley, 1999.
[5] R. Atkinson, S. Floyd, IAB concerns and recommendations regarding internet research and evolution, RFC 3869, IETF, August 2004.
[6] T. Bu, L. Gao, D. Towsley, On routing table growth, in: Proceedings IEEE Global Internet Symposium, 2002.
[7] C. Huitema, R. Draves, M. Bagnulo, Host-Centric IPv6 Multihoming, Internet Draft, February 2004, work in progress.
[8] A. Akella et al., A measurement-based analysis of multihoming, in: Proceedings ACM SIGCOMM'03, 2003.
[9] A. Akella et al., A comparison of overlay routing and multihoming route control, in: Proceedings ACM SIGCOMM'04, 2004.
[10] V. Fuller, T. Li, J. Yu, K. Varadhan, Classless inter-domain routing (CIDR): an address assignment and aggregation strategy, RFC 1519, IETF, September 1993.
[11] G. Huston, Architectural approaches to multi-homing for IPv6, Internet Draft, IETF, October 2004, work in progress.
[12] L. Gao, On inferring autonomous system relationships in the internet, IEEE/ACM Trans. Network. 9 (6) (2001).
[13] L. Subramanian, S. Agarwal, J. Rexford, R.H. Katz, Characterizing the internet hierarchy from multiple vantage points, in: Proceedings IEEE Infocom, 2002.
[14] A. Medina, A. Lakhina, I. Matta, J. Byers, BRITE: an approach to universal topology generation, in: Proceedings MASCOTS '01, 2001.
[15] C. Jin, Q. Chen, S. Jamin, Inet: Internet topology generator, Technical Report CSE-TR-433-00, 2000.
[16] K. Calvert, M. Doar, E. Zegura, Modeling internet topology, IEEE Commun. Mag. (1997).
[17] B. Quoitin, Towards a POP-level internet topology, August 2004.
[18] B. Quoitin, C-BGP: an efficient BGP simulator, March 2004.
[19] R. Teixeira, K. Marzullo, S. Savage, G.M. Voelker, Characterizing and measuring path diversity of internet topologies, in: Proceedings SIGMETRICS'03, June 2003.
[20] R. Teixeira, K. Marzullo, S. Savage, G.M. Voelker, In search of path diversity in ISP Network, in: Proceedings IMC'03, 2003.
[21] B. Zhang, R. Liu, D. Massey, L. Zhang, Collecting the internet AS-level topology, SIGCOMM Comput. Commun. Rev. 35 (1) (2005) 53–61.
[22] A. Barabási, R. Albert, Emergence of scaling in random networks, Science 286 (October) (1999) 509–512.
[23] C. de Launois, Leveraging Internet Path Diversity and Network Performances with IPv6 Multihoming, Research Report RR 2004-06, Université catholique de Louvain, Department of Computer Science and Engineering, August 2004.
[24] B. Quoitin, C. Pelsser, O. Bonaventure, S. Uhlig, A performance evaluation of BGP-based traffic engineering, Int. J. Network Manage. 15 (3) (2004).

Cédric de Launois obtained his degree in computer science and engineering in 2001 from Université catholique de Louvain (UCL), Belgium. He is currently finalizing his Ph.D. in the Department of Computing Sciences and Engineering of the same university. His research interests include IPv6 multihoming and traffic engineering.

Bruno Quoitin obtained his MS degree in computer science from the University of Namur (Belgium). He currently works as a researcher at the Université catholique de Louvain (Belgium). His research interests include interdomain routing and traffic engineering.

Olivier Bonaventure leads the network research group at Université catholique de Louvain (UCL), Belgium. He has published more than thirty papers and is on the editorial board of IEEE Network Magazine. His current research interests include intra- and interdomain routing, traffic engineering, multicast and network security.

Computer Networks 50 (2006) 1158–1175 www.elsevier.com/locate/comnet

Q-MEHROM: Mobility support and resource reservations for mobile senders and receivers

Liesbeth Peters *, Ingrid Moerman, Bart Dhoedt, Piet Demeester

Department of Information Technology (INTEC), Ghent University - IBBT - IMEC, Gaston Crommenlaan 8 bus 201, B-9050 Gent, Belgium

Available online 5 October 2005

Abstract

The increasing use of wireless networks and the popularity of multimedia applications lead to the need for Quality of Service support in a Mobile IP-based environment. This paper investigates the reservation of resources for mobile receivers as well as senders in combination with micromobility support. We present Q-MEHROM, which is the close coupling between the micromobility protocol MEHROM and a resource reservation mechanism. In case of handoff, Q-MEHROM updates the routing information and allocates the resources for a sending or receiving mobile host simultaneously. Invalid routing information and reservations along the old path are explicitly deleted. Resource reservations along the part of the old path that overlaps with the new path are reused. Q-MEHROM uses access network topology and link state information calculated by QOSPF. Simulation results show that the control load is limited. Moreover, it consists mainly of QOSPF traffic and it is influenced by the handoff rate in the network. Q-MEHROM makes real use of the mesh links and extra uplinks that are present in the access network, in order to increase the robustness against link failures, to reduce handoff packet loss and to improve the performance for highly asymmetric network loads. Also, attention is paid to the differences between the mobile sender and the mobile receiver scenario.

© 2005 Elsevier B.V. All rights reserved.

Keywords: IP QoS; IP mobility; Resource reservations; Micromobility

1. Introduction

Today, wireless networks evolve towards IP-based infrastructures to allow a seamless integration between wired and wireless technologies. Most routing protocols that support IP mobility assume that the network consists of an IP-based core network and several IP domains (also referred to as access networks), each connected to the core network via a domain gateway. This is illustrated in Fig. 1. In contrast to wired networks, the user's point of attachment to the network changes frequently due to mobility. Since an IP address indicates the location of the user in the network as well as the end point of its connections, user mobility leads to several challenges. Mobile IP (IPv4 [1], IPv6 [2]), which is standardized by the IETF, is the best known routing protocol that supports host mobility.

* Corresponding author. Tel.: +32 9 33 14900; fax: +32 9 33 14899. E-mail addresses: [email protected] (L. Peters), [email protected] (I. Moerman), bart.dhoedt@intec.ugent.be (B. Dhoedt), [email protected] (P. Demeester).

1389-1286/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2005.09.008

L. Peters et al. / Computer Networks 50 (2006) 1158–1175

Fig. 1. General structure of an all IP-based network. [Diagram not reproduced; labels: IP-based core network, domain gateways, IP home domain, IP foreign domain, access routers, mobile host; MACRO mobility between domains, MICRO mobility within a domain.]

While Mobile IP is used to support macromobility, i.e., the movements from one IP domain to another, much research is done to support the local movements within one IP domain, called micromobility. Examples of micromobility protocols are per-host forwarding schemes like Cellular IP [3] and Hawaii [4], and tunnel-based schemes like MIPv4 Regional Registration [5] and Hierarchical MIPv6 [6], proposed within the IETF, and the BCMP protocol [7], developed within the IST BRAIN Project. These protocols try to solve the weaknesses of Mobile IP with respect to micromobility support, by aiming to reduce the handoff latency, the handoff packet loss and the load of control messages in the core network. Low Latency Handoff protocols, like Low Latency MIPv4 [8] and Fast MIPv6 [9], were developed by the IETF to reduce the amount of configuration time in a new subnet by using link layer triggers in order to anticipate the movement of the mobile host.

However, most research in the area of micromobility assumes that the links of the access network form a tree topology or hierarchical structure. Nevertheless, for reasons of robustness against link failures and load balancing, a much more meshed topology is preferred. In our previous work [10,11], MEHROM (Micromobility support with Efficient Handoff and Route Optimization Mechanisms) was developed, resulting in good performance, irrespective of the topology, for frequent handoffs within an IP domain.

In a mobile IP-based environment, users want to receive real-time applications with the same QoS (Quality of Service) as in a fixed environment [12]. Several extensions to RSVP (Resource Reservation Protocol) under macro- and micromobility are proposed in [13,14]. However, the rerouting of the RSVP branch path at the cross-over node under micromobility again assumes an access network with tree topology. Moreover, the cross-over node triggers a PATH or RESV message after a Route Update message is received, introducing a substantial amount of delay. Current work within the IETF NSIS (Next Steps in Signaling) working group includes the analysis of some of the existing QoS signaling protocols for an IP network [15], the listing of Mobile IP specific requirements of a QoS solution [16] and the development of an architectural framework for protocols that signal information about a data flow along its path in the network [17]. The applicability of the proposed NSIS protocols in mobile environments [18] is also investigated; however, only macromobility scenarios have been considered so far.

In this paper, we investigate the reservation of resources for data flows from mobile receivers and senders in the micromobility scenario. We present Q-MEHROM, which is the close coupling between the micromobility protocol MEHROM and a resource reservation mechanism. At the time of handoff, the updating of routing information and the allocation of resources for a mobile receiver or sender are performed simultaneously, irrespective of the topology. Information gathered by QOSPF (Open Shortest Path First protocol with QoS extensions [19,20]) is used.

The rest of this paper is structured as follows. Section 2 presents the framework used. In Section 3, a short overview of the goals and operation of the micromobility protocol MEHROM is given. Section 4 describes which information, obtained by the QOSPF protocol, is used by Q-MEHROM. In Section 5, the operation of Q-MEHROM is explained for mobile senders and receivers. Simulation results are presented in Section 6. The final Section 7 contains our concluding remarks.

2. Framework for micromobility support and resource reservations

In Fig. 2, the framework used for a close interaction between micromobility support and resource reservations is presented. The grey blocks are the focus of this paper and are explained in more detail in the following subsections.

2.1. Micromobility protocol and resource reservations

The central block in Fig. 2 is responsible for the propagation of information through the access network about the location of the mobile hosts and

Fig. 2. Framework for micromobility routing and QoS support in an IP-based access network. [Diagram not reproduced; blocks: Routing Agent (QOSPF), Admission Control, Micromobility Protocol (MEHROM) with Resource Reservations (MEHROM extensions), together forming Q-MEHROM, Packet Classifier, Packet Scheduler.]

about the requested resources for the data flows sent or received by those mobile hosts. Frequent handoffs within one IP domain are supported by the micromobility protocol MEHROM [11], as explained in Section 3. By defining the resource reservation mechanism as an extension of MEHROM, resources can be allocated or re-allocated at the same time that the routing tables are updated at power up or after handoff. The combination of MEHROM and its resource reservation extensions is referred to as Q-MEHROM and is further explained in Section 5. Although a micromobility protocol is only required to update the routing tables to support data traffic towards a mobile host, in this paper we consider the reservation of resources for data flows both towards and from a mobile host.

2.2. Routing agent

In order to obtain information about the topology of the access network and the state of the links, we use the routing agent QOSPF, as described in [20]. This is represented by the upper left block of the framework in Fig. 2. The interaction between both grey blocks is as follows: QOSPF computes paths in the access network that satisfy given QoS requirements and provides this information to Q-MEHROM in the form of QoS tables, which are presented in Section 4. Every time Q-MEHROM reserves or releases resources on a link, QOSPF is informed so that the updated link states can be advertised through the access network.

2.3. Admission control

The upper right block in Fig. 2 represents the admission control policy. Although a variety of mechanisms is possible, we have chosen a simple admission control priority mechanism: priority is given to handoff requests over new requests. When a new request for resources is made by a mobile host, its access router decides whether the access network has sufficient resources to deliver the required QoS. If not, the request is rejected. At the time of handoff, the mobile host sends a handoff request for its existing connection. If the required resources cannot be delivered via the new access router, the handoff request is not rejected, but the delivered service is reduced to best-effort. This means that for a mobile receiver, a best-effort path is installed between the domain gateway and the access router, while for a mobile sender, standard IP routing is used.

If the availability of resources for an existing connection were only checked at the time of handoff, this mechanism would have an important drawback: as long as a mobile host receiving best-effort service does not perform handoff, its service remains best-effort, even if enough resources became available due to the handoff of other mobile hosts. Therefore, the mobile host itself can trigger a new check at the access router by sending an additional request. The mobile host sends this trigger when it receives a new beacon from its current access router, as long as it receives best-effort service and is not about to perform handoff. To estimate the next time of handoff, cross-layer information can be used, e.g., link layer (L2) triggers or location information. The admission control mechanism gives the routing agent information about the resources that were reserved by the mobile host before handoff, as these resources can be reused. The routing agent in turn provides information about paths with sufficient resources in the access network by its QoS tables.
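The priority rule described above can be sketched as follows. This is a simplified illustration and not the Q-MEHROM implementation: it assumes that a path is summarised by a single available-bandwidth figure, and the names `Decision`, `admit`, `should_recheck` and `reusable_bw` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    accepted: bool
    service: str  # "reserved", "best-effort" or "none"


def admit(request_kind, requested_bw, available_bw, reusable_bw=0.0):
    """Decide on a request at the access router (illustrative model only).

    request_kind: "new" or "handoff".
    reusable_bw: bandwidth the host already holds and can reuse (assumption:
    modelled as extra capacity that is available to this host alone).
    """
    if request_kind not in ("new", "handoff"):
        raise ValueError("request_kind must be 'new' or 'handoff'")
    if requested_bw <= available_bw + reusable_bw:
        return Decision(True, "reserved")
    if request_kind == "handoff":
        # Handoff requests are never rejected, only downgraded.
        return Decision(True, "best-effort")
    # New requests are rejected when resources are insufficient.
    return Decision(False, "none")


def should_recheck(service, handoff_expected_soon):
    """On each beacon: re-request while best-effort and no handoff is near."""
    return service == "best-effort" and not handoff_expected_soon
```

For example, `admit("handoff", 2.0, 1.0)` returns a best-effort decision, whereas the same request marked `"new"` is rejected; the `should_recheck` helper mirrors the beacon-triggered check that lets a downgraded host regain reserved service.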
During admission control, an access router must of course also take into account the availability of resources on the wireless link. This is, however, out of the scope of this paper.

2.4. Packet classifier and scheduler

The lower blocks in Fig. 2 are the packet classifier and scheduler. When Q-MEHROM updates the routing tables, these routes are installed in the packet classifier so that data packets are mapped towards the correct outgoing link. For the scheduler, a variant of CBQ (Class Based Queueing) is used [21]. When resources are reserved for a mobile host, Q-MEHROM inserts a class for that mobile host. When these resources are released after handoff, the class is removed again. The different classes for the mobile hosts have the same priority. Within those classes, the scheduler uses weighted round-robin, with weights proportional to the amount of reserved resources. One class, which is never removed and which has a higher priority, is the class for all control traffic. To reduce complexity, we assume that the routing decision is only based on the mobile host's care-of address, irrespective of the number of flows. For simplicity, we consider one class for all flows to a specific mobile host and one class for all flows from a specific mobile host. Extensions to path setups and resource reservations for individual data flows are however straightforward.

3. Micromobility support by MEHROM

3.1. Requirements for micromobility support

Based on our study of Cellular IP, Hawaii and MIPv4 Regional Registration [22,23], we concluded that a micromobility protocol that is able to support frequent handoffs must meet the following requirements for the handoff process:

1. During the handoff process, the load of the control messages should be concentrated near the involved access routers at the edge of the access network. If possible, the messages should not traverse the whole access network nor overload the domain gateway.
2. The handoff control messages should update the necessary routers as fast as possible, trying to limit packet loss and handoff latency.
3. The handoff process should result in an optimal new path between the current access router and the domain gateway. The term 'optimal path' may refer to a path with minimum hop count, lowest cost, highest available bandwidth, etc.
4. The mobile host should be informed whether the handoff process succeeded or failed, so that it can react in case of a handoff failure.
5. The use of a micromobility protocol should be transparent to the mobile terminal, as this can ease the integration with Mobile IP. Doing so enables the mobile host to travel to domains that support Mobile IP but no micromobility protocol.

Table 1 Performance of Cellular IP (CIP), Hawaii (HAW) and MIPv4 Regional Registration (MIPv4-RR) concerning the protocol requirements

Requirement                          CIP    HAW    MIPv4-RR
Control load near access routers     ±      Yes    No
Low latency and low packet loss      ±      Yes    No
Optimal paths (minimum hop count)    Yes    No     Yes
Handoff acknowledgements             No     Yes    Yes
Transparent to mobile host           No     Yes    ±

Although the above-mentioned micromobility protocols work correctly in a meshed access network, the topology has an important influence on their performance [22,23]. For example, Cellular IP and MIPv4 Regional Registration do not use the mesh links, while Hawaii takes advantage of the mesh links to reduce the handoff latency and packet loss. However, the use of Hawaii in a meshed topology results in suboptimal routes and high end-to-end delays after several handoffs. To what extent the requirements are fulfilled by the studied existing micromobility protocols, when used in a meshed access network topology, is summarized in Table 1.

3.2. MEHROM handoff scheme

Our micromobility scheme MEHROM aims to satisfy the requirements previously described for frequent handoffs within an IP domain, irrespective of the topology. This is realized by splitting the handoff scheme into two phases. During the first phase, a fast handoff is made by sending a Route Update message from the new access router to the old one. As a result, MEHROM can use mesh links to limit the handoff latency and packet loss, in contrast to Cellular IP and MIPv4 Regional Registration. Only if the new path is not optimal is the second phase executed, by sending a new Route Update message from the new access router towards the domain gateway. This guarantees the setup of an optimal new path between the domain gateway and the current access router in meshed topologies, in contrast to Hawaii. A more detailed description of MEHROM and a comparison with Cellular IP, Hawaii and MIPv4 Regional Registration is given in [10,11].

MEHROM is a per-host scheme and every router along the path between the domain gateway and the current access router has an entry in its routing table indicating the next hop to the mobile host. To perform handoff, MEHROM updates the

routing tables of the routers in the access network to set up an optimal path between the domain gateway and the currently used access router. Here, an optimal path is defined as the path with minimum hop count. Data packets are routed towards the mobile host by the use of the entries installed by MEHROM. For the routing of data packets from the mobile host towards the core network, thus via the domain gateway, standard IP routing is used.

The handoff scheme has the following characteristics: at the time of handoff, the necessary signaling to update the routing tables is kept as local as possible. New entries are added by Route Update messages and obsolete entries are explicitly deleted by Route Delete messages, resulting in a single installed path for each mobile host. A successful setup of a new path is reported to the new access router by an Acknowledgment message.

This micromobility scheme is very suitable to be closely coupled with a resource reservation mechanism. When a mobile host performs a handoff, we want to propagate the necessary QoS information as fast as possible to limit the degradation of the delivered QoS. In addition, we want to restrict the signaling to the part of the new path that does not overlap with the old path and explicitly release reserved resources along the unused part of the old path. In what follows, we assume that the requested resource is a certain amount of bandwidth.

4. Information calculated by QOSPF

In order to develop a resource reservation mechanism, information about the state of the access network must be provided. QOSPF advertises link metrics, like link available bandwidth and link delay, across the access network by Link State Advertisements (LSAs). As a result, the domain gateway (GW), all routers and all access routers (ARs) have an updated link-state database that reflects the access network state and topology. To provide useful information to Q-MEHROM, QOSPF calculates, in each node of the access network, several QoS routing tables from the database. Fig. 3 shows a simple access network, which is used to illustrate the structure of the different QoS routing tables. One bidirectional link is indicated by two directed edges, each with its available bandwidth.

[Figure 3: topology drawing with the nodes GW, R1, R2, R3, AR1 and AR2; each directed edge carries a bandwidth label.]
Fig. 3. Example of a simple access network. [x] = [available bandwidth (Mbps)].

4.1. Delay table

The Delay table is only used during the setup of a path towards a mobile host, after handoff. In this case, the first phase of the Q-MEHROM handoff algorithm uses the Delay table to send a Route Update message from the new access router towards the old one as fast as possible. The Delay table of a router has this router as source and an entry for every access router as destination. Therefore, the router calculates the path with smallest delay to a specific AR, using a single-objective path optimization. The next hop to that AR is then put in the Delay table, as illustrated in Table 2. As the delay characteristics are not expected to change frequently, the Delay table needs no frequent recalculation.

Table 2 Delay table for router R2 of Fig. 3, if all links have the same link delay

Destination    Next hop
AR1            R1
AR2            AR2

4.2. Bandwidth table

A Bandwidth table is used when resources are reserved for a new traffic flow. Here, the available bandwidth on the links of the access network is taken into account. A double-objective path optimization is used, also taking the hop count into account. For a certain value of the hop count, the path with at most this number of hops and with maximum bandwidth (BW) is calculated between source and destination. If more than one path with maximum bandwidth is available, the shortest path is chosen. To make reservations for data traffic towards a mobile host (MH), a router calculates a

BWtable_to_MH with the domain gateway as source and itself as destination. The BWtable_to_MH gives the last node on the path before reaching the destination. To make reservations for data traffic from a mobile host, the router calculates a BWtable_from_MH with itself as source and the domain gateway as destination. In this case, the next hop on the path to the domain gateway is put in the BWtable_from_MH. When the amount of available bandwidth in the access network changes, the Bandwidth tables should be recalculated. An example is given in Table 3.

Table 3 Bandwidth tables for router R2 of Fig. 3

             BWtable_to_MH           BWtable_from_MH
Max. hops    Last hop    Max. bw     Next hop    Max. bw
1            –           –           –           –
2            R3          2 Mbps      R3          4 Mbps
3            R1          3 Mbps      R3          4 Mbps

4.3. Reuse-Bandwidth table

A Reuse-Bandwidth table is used for handoff reservations. When a mobile host performs handoff, a new path must be set up. It is possible that the new path partly overlaps the old path. Resources along this common part should be reused and not be allocated twice. Therefore, QOSPF must also consider the resources reserved for the mobile host before handoff as available resources and calculate the Reuse-Bandwidth tables, called the RBWtable_to_MH for the data path towards the mobile host and the RBWtable_from_MH for the data path from the mobile host. Note that it is not our intention that the new path overlaps as much as possible with the old path. These tables can only be calculated at the time of handoff and they need the old path and the amount of reserved bandwidth for the mobile host as input. The Reuse-Bandwidth tables have the same structure as the Bandwidth tables. For the values in Table 4, it is assumed that a mobile host performs handoff from AR1 to AR2, with an old reservation of 2 Mbps on the downlink path [GW, R3, R1, AR1] and of 1 Mbps on the uplink path [AR1, R1, R3, GW].

Table 4 Reuse-Bandwidth tables for router R2 of Fig. 3

             RBWtable_to_MH          RBWtable_from_MH
Max. hops    Last hop    Max. bw     Next hop    Max. bw
1            –           –           –           –
2            R3          2 Mbps      R3          5 Mbps
3            R1          5 Mbps      R3          5 Mbps

5. Q-MEHROM mechanism

In this section, we explain the basic operation of Q-MEHROM in three cases. First, the situation where the mobile host is the receiver is considered. Secondly, the mechanisms in the case where the mobile host is the sender are presented. Thirdly, the situation where the mobile host is both sender and receiver is discussed. The update of the routing tables and the reservation of resources at the time of handoff is initiated by the mobile host. This results in a receiver-initiated path setup for data traffic towards the mobile host and a sender-initiated setup of the path for data traffic from the mobile host. Because the mobile host initiates the setup when performing handoff, the new path and reservations are set up as fast as possible and the reduction in the delivered QoS is minimized. Fig. 4 illustrates the packet formats, specific to Q-MEHROM (Version = 2), that are used in the following subsections. The fields in bold are extensions to the basic MEHROM protocol (Version = 1).

5.1. Mobile host is receiver

The same phases of MEHROM are adopted to reduce the handoff packet loss (thanks to phase 1) and to guarantee the setup of an optimal path (thanks to phase 2). In Q-MEHROM, these phases are extended with a reservation mechanism. An optimal path is now defined as a path with minimum hop count that can deliver the requested amount of bandwidth. Like basic MEHROM, Q-MEHROM is capable of detecting whether the new path will be optimal, using an optimal_path parameter in the Route Update messages. The basic operation of Q-MEHROM is explained using the three successive situations in Fig. 5. These three situations are elaborated below.

5.1.1. Bandwidth reservations (Fig. 5(1))

When a mobile host (MH) powers up in the access network, only MEHROM is used to set up a (best-effort) path for the MH in the access network. This path is a path with minimum hop count between the domain gateway (GW) and the current access router.
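The reservations described in this section rely on the Bandwidth and Reuse-Bandwidth tables of Section 4. As a minimal sketch of how such a table could be computed, the Python fragment below runs a hop-bounded maximum-bandwidth search (a Bellman-Ford style relaxation, which is an assumed technique and not necessarily the one used in the QOSPF implementation). The function name `bandwidth_table` and the directed link bandwidths in `LINKS` are hypothetical: the values were picked so that the result agrees with the to_MH columns of Tables 3 and 4, and the links to the access routers are left out.

```python
def bandwidth_table(links, source, dest, max_hops):
    """Hop-constrained maximum-bandwidth search.

    links: {(u, v): available bandwidth in Mbps} for directed links.
    Returns {h: (last hop before dest, bottleneck bandwidth)} for
    h = 1..max_hops, or None when dest is unreachable within h hops.
    """
    nodes = {node for link in links for node in link}
    # best[n] = (bottleneck bandwidth, last hop) of the widest known
    # path from source to n within the current hop budget.
    best = {node: (0, None) for node in nodes}
    best[source] = (float("inf"), None)
    table = {}
    for hops in range(1, max_hops + 1):
        extended = dict(best)
        for (u, v), bw in links.items():
            width = min(best[u][0], bw)
            # Strictly greater: on a tie the shorter, older path is kept.
            if width > extended[v][0]:
                extended[v] = (width, u)
        best = extended
        width, last_hop = best[dest]
        table[hops] = (last_hop, width) if width > 0 else None
    return table


# Assumed directed link bandwidths (Mbps), not read from Fig. 3.
LINKS = {
    ("GW", "R3"): 3, ("R3", "GW"): 4,
    ("R3", "R1"): 3, ("R1", "R3"): 4,
    ("R3", "R2"): 2, ("R2", "R3"): 5,
    ("R1", "R2"): 5, ("R2", "R1"): 4,
}

# BWtable_to_MH of router R2: source GW, destination R2.
to_mh = bandwidth_table(LINKS, "GW", "R2", 3)

# Reuse-Bandwidth variant: add the old 2 Mbps downlink reservation
# back onto the links of the old path, then run the same search.
reuse = dict(LINKS)
for link in [("GW", "R3"), ("R3", "R1")]:
    reuse[link] += 2
rbw_to_mh = bandwidth_table(reuse, "GW", "R2", 3)
```

Running the same function on the reversed link set yields the next hop towards the domain gateway, which is the quantity stored in a from_MH table.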

[Figure 4: 32-bit-wide packet layouts for the Route Update (Type = 1), Route Delete (Type = 2), Acknowledgment (Type = 3) and Power Up (Type = 4) messages. Field names appearing in the layouts: Version, Type, Length, O, S, Optimal, Requested_BW, Mobile Host Address, New Access Router Address, Ack Source, Destination Address, Old Path, New Path, New Path until now.]
Fig. 4. Packet formats specific to Q-MEHROM. 'Optimal' is the optimal_path parameter. 'O' is a flag indicating whether the new path is optimal. 'S' is a flag indicating whether the mobile host is sender.

When a traffic flow is set up towards a MH, this MH requests an amount of bandwidth (information obtained from the application layer) for that flow at its current access router. After a positive admission control, the access router starts the resource reservation mechanism, using its BWtable_to_MH. If the MH is the receiver of multiple flows, the requested bandwidth is the sum of the individual amounts of bandwidth requested by the applications.

5.1.2. Handoff phase 1: fast handoff (Fig. 5(2))

After the receipt of a Mobile IP registration request, the new access router (nAR) (we assume a positive admission control) adds a new entry for the MH in its routing table and sends a Route Update message towards the old access router (oAR) in a hop-by-hop fashion. When R2 processes the Route Update message, it checks the optimal_path parameter in this message. As the node is part of the optimal path between the GW and the nAR, the value of the optimal_path parameter was set to 1. Therefore, the necessary resources are reserved on the link between R2 and the nAR. As a next step, it looks up the next hop on the path with smallest delay towards the oAR in the Delay table (i.e., R1). Furthermore, it retrieves in the RBWtable_to_MH the last node before reaching R2 on the optimal path between the GW and R2 (i.e., R3). As both nodes are different, the optimal_path parameter is set to 0, indicating the path will be suboptimal. Once this parameter contains the value 0, it remains 0 and only the Delay table must be consulted. The Route Update message is forwarded using the next hop from the Delay table.

R1 has an old entry for the MH. Therefore R1 is considered as the cross-over node (CN), and the Route Update message is not forwarded further. The CN sends a suboptimal handoff Acknowledgment (ACK) back to the nAR. The CN also deletes the reservation on the link from R1 to oAR and sends a Route Delete message to the oAR, to delete the invalid entry in its routing table. The nAR is informed through a suboptimal handoff ACK that a suboptimal path was found during phase 1. As data packets can reach the MH via this suboptimal path, the nAR first sends a Mobile IP registration reply to the MH. Next, the nAR starts the route optimization phase.

5.1.3. Handoff phase 2: route optimization (Fig. 5(3))

The nAR sends a new Route Update message towards the GW, also in a hop-by-hop fashion, to establish an optimal path. During phase 2, the optimal_path parameter is set to value 2, indicating that the Route Update message must be forwarded towards the node retrieved in the RBWtable_to_MH (the Delay table is not consulted). Since R2 already has a new entry in its routing table, the entry is refreshed and the message is simply forwarded. The Route Update message finds a new CN, i.e., R3, which updates its routing table, allocates resources on the link from R3 to R2 and deletes the reservation on the link from R3 to R1. At that time, an optimal path is set up: an optimal handoff ACK is sent back to the nAR and a Route Delete message is sent to R1.
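The role of the optimal_path parameter in both phases can be summarized in a few lines of code. The sketch below is only an illustration of the forwarding rule as described in the text: the function name `forward_route_update` and its arguments are hypothetical, and the table lookups are assumed to have been done by the caller.

```python
def forward_route_update(optimal_path, delay_next_hop, rbw_last_hop):
    """Return (node to forward to, updated optimal_path value).

    optimal_path == 2: phase 2, follow the Reuse-Bandwidth table only.
    optimal_path == 1: phase 1, still on the optimal path so far.
    optimal_path == 0: phase 1, path already known to be suboptimal.
    """
    if optimal_path == 2:
        # Route optimization: the Delay table is not consulted.
        return rbw_last_hop, 2
    if optimal_path == 1 and delay_next_hop != rbw_last_hop:
        # The fastest way to the old access router leaves the optimal
        # path, so the handoff will end on a suboptimal path.
        optimal_path = 0
    # Phase 1 always forwards along the smallest-delay path.
    return delay_next_hop, optimal_path


# Router R2 in Fig. 5(2): Delay table says R1, RBWtable_to_MH says R3.
phase1_at_r2 = forward_route_update(1, "R1", "R3")
# Router R2 in Fig. 5(3): phase 2 follows the RBWtable_to_MH entry.
phase2_at_r2 = forward_route_update(2, "R1", "R3")
```

In this reading, a value of 1 survives only as long as the Delay table and the Reuse-Bandwidth table agree at every hop, which is how the new access router learns from the Acknowledgment whether phase 2 is needed.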

[Figure 5: three message-sequence panels, (1) to (3), over the nodes MH, oAR, nAR, R1, R2, R3 and GW, showing the Power Up, MIP registration request and reply, Route Update, Route Delete and Acknowledgment messages together with the bandwidth reservations and releases on the links.]
Fig. 5. Example of bandwidth reservation for a mobile receiver. (1) The MH requests resources to receive its traffic flows. The MH performs handoff, which consists of phase 1 (see (2)) and phase 2 (see (3)). The arrows indicate entries for the mobile receiver and point to the next hop towards the mobile host. A dashed arrow indicates an entry without resource reservations on the next link. [x, y] = [link delay (ms), available bandwidth (Mbps)]; [z] = [requested bandwidth (Mbps)].

5.2. Mobile host is sender

At the time of handoff, only the second phase of Q-MEHROM is required, as the mobile host always sends the data packets to its current access router. This results in two successive situations, which are illustrated in Fig. 6. The setup of a best-effort path towards the mobile host after handoff is realized by basic MEHROM and is not considered in this section, as this is very similar to Section 5.1.

[Figure 6: two message-sequence panels, (1) and (2), over the nodes MH, oAR, nAR, R1, R2, R3 and GW, showing the Power Up, MIP registration request and reply, Route Update, Route Delete and Acknowledgment messages together with the bandwidth reservations and releases on the links.]
Fig. 6. Example of bandwidth reservation for a mobile sender. (1) The MH requests resources to send its traffic flows. (2) The MH performs handoff, which consists of a single phase. The arrows indicate entries for the mobile sender and point to the next hop towards the domain gateway. [x, y] = [link delay (ms), available bandwidth (Mbps)]; [z] = [requested bandwidth (Mbps)].

5.2.1. Bandwidth reservations (Fig. 6(1))

When a MH powers up in the access network, standard IP routing is used to route the data packets on a (best-effort) path towards the domain gateway. When a traffic flow from a MH is set up, this MH requests an amount of bandwidth (information obtained from the application layer) for that flow at its current access router. After a positive admission control, the access router starts the resource reservation mechanism, using its BWtable_from_MH. If the MH is the sender of multiple flows, the requested bandwidth is the sum of the individual amounts of bandwidth requested by the applications.

5.2.2. Handoff (Fig. 6(2))

After the receipt of a Mobile IP registration request, the nAR (again we assume a positive admission control) adds a new entry for the MH in its routing table and reserves the necessary resources on the link between the nAR and R2. The optimal_path parameter is set to value 2, indicating that the Route Update message must be forwarded towards the GW, in a hop-by-hop fashion. Therefore, the next hop retrieved in the RBWtable_from_MH (no Delay table is consulted) is used. Thus, the nAR forwards the Route Update message to R2, which updates its routing table, makes reservations on the link between R2 and R3 and forwards the Route Update message further to R3. R3 has an old entry for the MH. Therefore R3 is considered as the CN, and the Route Update message is not forwarded further. The CN sends an optimal handoff ACK back to the nAR. Upon receipt of this ACK, the nAR sends a Mobile IP registration reply to the MH. The CN also sends a Route Delete message to R1, in order to delete the reservation on the link from R1 to R3. Finally, this Route Delete message arrives in the oAR and the reservations on the link from oAR to R1 are released.

5.3. Mobile host is receiver and sender

When the mobile host is both sender and receiver, the Q-MEHROM handoff mechanisms for the path from the mobile host and for the path towards the mobile host can be performed independently from each other. The Q-MEHROM control messages must indicate to which handoff mechanism they belong. As a result, the two paths do not necessarily contain the same nodes and the cross-over nodes can be different. Tables 5 and 6 illustrate how a router can detect whether it has the function of cross-over node during handoff. In the case of the path for data flowing towards the mobile host, this is based on a changing next hop in the router's routing table, illustrated in Table 5. When the path for data flowing from the mobile host towards the domain gateway is considered, a changing previous hop in the routing table indicates that the router is the cross-over node, as shown in Table 6.

Table 5 Cross-over node detection by router R3 for a receiving mobile host in Fig. 5(3)

                  Destination    Next hop    Reserved bandwidth (Mbps)
Before handoff    MH             R1          3
After handoff     MH             R2          3

Table 6 Cross-over node detection by router R3 for a sending mobile host in Fig. 6(2)

                  Source    Next hop    Previous hop    Reserved bandwidth (Mbps)
Before handoff    MH        GW          R1              3
After handoff     MH        GW          R2              3
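The detection rule of Tables 5 and 6 amounts to comparing one field of the old and the new routing entry. The following sketch illustrates this under stated assumptions: the `RouteEntry` class and the function `is_crossover_node` are hypothetical names, and a real router would hold more state per entry than the three fields shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouteEntry:
    """Per-mobile-host routing entry (assumed, simplified layout)."""
    next_hop: str
    previous_hop: Optional[str] = None
    reserved_bw_mbps: float = 0.0


def is_crossover_node(old, new, mobile_is_sender):
    """Decide whether this router is the cross-over node.

    old: entry held before the Route Update arrived, or None.
    new: entry that the Route Update would install.
    """
    if old is None:
        # No old entry: this router was not on the old path.
        return False
    if mobile_is_sender:
        # Uplink path: the previous hop changes (Table 6).
        return old.previous_hop != new.previous_hop
    # Downlink path: the next hop changes (Table 5).
    return old.next_hop != new.next_hop


# Router R3 in Table 5 (mobile receiver) and Table 6 (mobile sender).
receiver_case = is_crossover_node(
    RouteEntry("R1", None, 3), RouteEntry("R2", None, 3), False)
sender_case = is_crossover_node(
    RouteEntry("GW", "R1", 3), RouteEntry("GW", "R2", 3), True)
```

A router whose entry is merely refreshed, such as R2 during phase 2 in Section 5.1.3, sees no change in the compared field and therefore keeps forwarding the Route Update message.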

6. Evaluation

The evaluation is done by examining the control traffic load and the handoff packet loss. The case of a highly loaded network is also investigated.

In order to evaluate the performance of Q-MEHROM, the protocol is implemented in the network simulator ns-2 [24]. QOSPF is simulated with extensions for the calculation of the different QoS routing tables, added to the implementation of [25]. Unless otherwise mentioned, the following parameter values are chosen:

Wired and wireless links. The wired links of the access network have a delay of 2 ms and a capacity of 2.5 Mbps. This rather low value for the link capacity is chosen to investigate highly loaded networks with a limited number of mobile hosts. For the wireless link, IEEE 802.11 is used with a physical bitrate of 11 Mbps (ns-2 does not take scanning and authentication delays on the link layer (L2) into account).

Access routers and mobile hosts. Every access router broadcasts beacons at fixed time intervals T_b of 1.0 s. The distance between two adjacent access routers is 200 m, with a cell overlap d_o of 30 m. All access routers are placed on a straight line. Mobile hosts move at a speed v_MH of 20 m/s and travel from one access router to another, maximizing the overlap time of d_o/v_MH.

Traffic. CBR (constant bit rate) data traffic patterns are used, with a bitrate of 0.5 Mbps. For every mobile node (receiver or sender), one UDP connection is set up between a fixed host in the core network directly connected to the domain gateway and the mobile terminal itself.

Access network topology. Tree, mesh and random topologies are investigated. The topologies that are used for the simulations are given in Fig. 7. Every change in link available bandwidth triggers QOSPF LSAs, and the information in the QoS routing tables is calculated on demand.

[Figure 7: access network drawing with nodes numbered 1 to 15, the domain gateway at the top, the access routers at the bottom, and sender/receiver hosts attached at both ends; mesh links and extra uplinks are marked.]
Fig. 7. Tree, mesh and random topology of the access network, used during the simulations. The mesh topology consists of the tree structure (full lines) with the indicated additional mesh links (dashed lines), while the random topology is formed by adding extra uplinks (dotted lines) to the mesh topology.

6.1. Load of control traffic in the access network

For the different topologies of Fig. 7, the average amount of control traffic on a wired link in the access network is investigated. The results for an increasing number of mobile hosts are shown in Fig. 8, where in the left figure mobile receivers and in the right figure mobile senders are considered. Fig. 9 investigates the influence of the velocity of a single mobile host, where the figure on the left shows the results for a mobile receiver and the figure on the right the results for a mobile sender. During one simulation, the mobile hosts move randomly from one access router to another during 300 s, requesting 0.5 Mbps. Randomly means that a mobile node moves towards a randomly chosen access router. When arriving at its destination, the mobile host moves towards another (again randomly) chosen access router. Using this movement pattern, a mobile host never comes to rest in a single cell.

There are two contributors to the control load: Q-MEHROM and QOSPF. Both are shown in the figures. The Q-MEHROM control load is formed by the Route Update, Route Delete, Acknowledgment and Power Up messages. The QOSPF control traffic, formed by Link State Advertisements and Acknowledgements, consists of two parts. The first part is formed by the QOSPF messages at the start of the simulation, advertising the initial link state information through the network. The second part is the QOSPF traffic caused by resource reservations and resource releases, as these change the amount of available bandwidth on the involved links.

[Figure 8: two plots of the average control traffic on a link (bytes, scale ×10^4) versus the number of mobile receivers (left) and mobile senders (right), from 0 to 8, with QOSPF and Q-MEHROM curves for the tree, mesh and random topologies.]
Fig. 8. Average control load on a wired link of the access network as a function of the number of mobile receivers (left figure) or senders (right figure).

[Figure 9: two plots of the average control traffic on a link (bytes, scale ×10^4) versus the speed of the mobile receiver (left) and mobile sender (right), from 10 to 50 m/s, with QOSPF and Q-MEHROM curves for the tree, mesh and random topologies.]
Fig. 9. Average control load on a wired link of the access network as a function of the speed of a single mobile receiver (left figure) or sender (right figure).

The load of Q-MEHROM messages is very low compared to the load of QOSPF traffic. The QOSPF traffic in the absence of mobile hosts contains only the first part, with messages to advertise the initial link state information. This part depends purely on the topology and the number of links. The second part of the QOSPF load, with messages triggered by resource reservations and releases, is also determined by the number of handoffs and their location in the topology. The required amount of bandwidth for QOSPF and Q-MEHROM control traffic increases for higher handoff rates, i.e., the number of handoffs in the network per time unit. This handoff rate increases for a higher number of mobile hosts, as illustrated in Fig. 8, and for higher speeds, as shown in Fig. 9.

The load caused by Q-MEHROM messages is a little higher for mobile senders compared to the situation with mobile receivers. Although a mobile sender only requests reservations along a path towards the domain gateway, the micromobility protocol must also set up a best-effort path from the domain gateway towards the current access router. In contrast, no best-effort path to the domain gateway is set up for a mobile receiver, as standard IP routing can be used. The QOSPF load is also higher for mobile senders, but for a different reason. When a mobile receiver performs handoff, the cross-over node makes reservations on the new link, releases the reservations on the old link and triggers a single link state update. For a mobile sender, the node that is indicated as the previous hop before handoff in the routing table of the cross-over node releases the resources and triggers a link state update. In addition, the new previous hop makes resource reservations and again triggers a link state update.

Note that we have investigated the worst case, where every change in available bandwidth triggers QOSPF LSAs. Periodic triggering of LSAs, or triggering only after a significant change in available bandwidth, can reduce the control load, at the cost of the accuracy of link state information. Even in the considered worst case, the required amount of bandwidth for control traffic remains low compared to the link capacities (approximately 0.053% for 8 mobile receivers with a speed of 20 m/s and approximately 0.075% for 8 mobile senders with a speed of 20 m/s). Therefore, we reserved 1% of the given link capacity for the control traffic class of CBQ. This is sufficient to support more than 100 mobile hosts with a speed of 20 m/s.
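The statement that a 1% control class supports more than 100 mobile hosts can be checked with a rough extrapolation from the measured shares of 0.053% (8 receivers) and 0.075% (8 senders). The sketch below assumes that the control load grows linearly with the number of hosts, which is an assumption made here for illustration and not a derivation given in the text; the function name `hosts_supported` is hypothetical.

```python
def hosts_supported(reserved_pct, measured_pct, measured_hosts):
    """Linear extrapolation of the number of hosts a control class fits.

    reserved_pct:   share of link capacity reserved for control traffic
    measured_pct:   share actually used by `measured_hosts` hosts
    """
    return measured_hosts * reserved_pct / measured_pct


# Measured worst-case shares at 20 m/s, both for 8 mobile hosts.
receivers = hosts_supported(1.0, 0.053, 8)   # about 150 receivers
senders = hosts_supported(1.0, 0.075, 8)     # about 106 senders
```

Because the measured shares also contain the initial QOSPF advertisements, which do not depend on the number of hosts, this per-host estimate is on the pessimistic side.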

6.2. Handoff performance

To investigate the handoff performance, the following scenario is considered: during one simulation, a single mobile host moves from the leftmost to the rightmost access router of Fig. 7, performing 7 handoffs. The results in Figs. 10 and 11 are average values of a set of 200 independent simulations, i.e., there is no correlation between the sending of beacons by the access routers, the movements of the mobile host and the sending of the data packets. Fig. 10 shows the loss of data packets as a function of the speed of the mobile host. The data flow has a packet size of 500 bytes and a rate of 1 Mbps. Results are given for beacons sent at time intervals T_b = 1.0 s, 0.75 s and 0.50 s. The packet loss is independent of the velocity, as long as the mobile host resides long enough in the overlap region to receive at least one beacon from a new access router while it is still connected to its previous access router. Otherwise, packet loss increases rapidly.

[Figure 10: two plots of the average number of lost packets versus the speed of the mobile receiver (left) and mobile sender (right), from 10 to 50 m/s, with curves for the tree, mesh and random topologies at beacon intervals of 1.0 s, 0.75 s and 0.50 s.]
Fig. 10. Average packet loss as a function of the mobile receiver's (left figure) or sender's (right figure) speed, for several beacon time intervals.

[Figure 11: two plots of the average number of lost packets versus the data rate towards (left) and from (right) the mobile host, from 0.5 to 2 Mbps, with curves for the tree, mesh and random topologies at packet sizes of 500, 1000 and 1500 bytes.]
Fig. 11. Average packet loss as a function of the data rate sent to (left figure) or from (right figure) the mobile host, for several packet sizes.

In the considered situation, where the mobile host moves on a straight line through the center of the overlap regions, this requirement is fulfilled when

\[ v_{MH} \le d_o / T_b \qquad (1) \]
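Requirement (1) can be evaluated directly for the simulated parameters. The short sketch below uses the cell overlap of 30 m and the three beacon intervals from the simulation setup; the function names are hypothetical and only illustrate the arithmetic.

```python
def max_speed(overlap_m, beacon_interval_s):
    """Highest speed (m/s) that still satisfies v_MH <= d_o / T_b."""
    return overlap_m / beacon_interval_s


def satisfies_requirement(speed_mps, overlap_m, beacon_interval_s):
    """True if the host sees at least one beacon inside the overlap."""
    return speed_mps <= max_speed(overlap_m, beacon_interval_s)


D_O = 30.0  # cell overlap in metres, from the simulation parameters

# Speed limits for the three beacon intervals used in Fig. 10.
limits = {t_b: max_speed(D_O, t_b) for t_b in (1.0, 0.75, 0.50)}
```

For the default interval of 1.0 s the limit is 30 m/s, so the default host speed of 20 m/s satisfies the requirement, while shorter beacon intervals move the limit to 40 m/s and 60 m/s.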

When we compare the results for a mobile receiver (left figure of Fig. 10) with the results for a mobile sender (right figure of Fig. 10), some important differences can be noticed. As long as the mobile sender resides long enough in the overlap region, the packet loss is much lower (almost zero) compared to the case of the mobile receiver. As soon as the mobile sender receives a beacon from a new access router, it sends its new data packets towards this new access router, while the mobile receiver must wait until a new path towards the new access router is set up and new data packets are routed towards this new access router. If requirement (1) is not fulfilled, the packet loss for a mobile sender is much higher. When the mobile sender moves out of the area of the old access router, all data packets, generated before the receipt of a new beacon, are routed towards the old access router and wait in the buffer of the mobile host to be sent. Even if the mobile sender receives a beacon from a new access router, it still tries to send those data packets to the old access router. This results in an extra delay before the first data packet is sent to the new access router. This extra delay leads to higher packet loss due to buffer overflow in the mobile sender. For the mobile receiver, the packet loss only increases because the time to set up a new path in the access network is longer when requirement (1) is not fulfilled. In the latter case of the mobile receiver, QMEHROM takes advantage of mesh links and extra uplinks to reduce the packet loss compared to the tree topology. For the mobile sender, the results

are almost the same for the three considered topologies.

Fig. 11 gives the results for the data packet loss as a function of the data rate. The mobile host moves at a speed of 20 m/s and the access routers send beacons at time intervals of 1.0 s. The packet sizes used are 500 bytes, 1000 bytes and 1500 bytes. The use of IEEE 802.11 and collisions on the wireless link limit the achievable throughput. For the simulations in this subsection, the RTS/CTS scheme was used. For a wireless bitrate of 11 Mbps and a data packet size of x bytes, the TMT (Theoretical Maximum Throughput) is given by [26]

$$\mathrm{TMT}(x) = \frac{8x}{ax + b} \times 10^{6}\ \text{bps}, \quad \text{with } \begin{cases} a = 0.72727,\ b = 890.73 & \text{for CSMA/CA},\\ a = 0.72727,\ b = 1566.73 & \text{for RTS/CTS}. \end{cases} \tag{2}$$

This explains the rapid increase of packet loss for a packet size of 500 bytes and data rates higher than 1.85 Mbps (TMT = 2.072 Mbps). In the case of a mobile receiver (left figure of Fig. 11), while after handoff the attempts of the old access router to send a data packet to the mobile host are useless, the next beacon of this access router can be delayed. This delay is higher for increasing data rates and can result in a ping-pong handoff and thus higher packet loss. In the case of the mobile sender (right figure of Fig. 11), handoffs in combination with high data rates can result in a buffer overflow in the mobile host. Furthermore, the mobile sender can fail to successfully receive a beacon from the new access router because it was sending a data packet to the old access router at the same time.
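The TMT values quoted in the text can be checked numerically. The following is a minimal sketch of Eq. (2), assuming the coefficients given above for an 11 Mbps link; the names `TMT_COEFFS` and `tmt_mbps` are illustrative and do not come from the paper or from [26].

```python
# Coefficients (a, b) of Eq. (2) for an 11 Mbps IEEE 802.11 link, as quoted in the text.
# The dictionary keys and function name are hypothetical, chosen for this sketch only.
TMT_COEFFS = {
    "csma_ca": (0.72727, 890.73),
    "rts_cts": (0.72727, 1566.73),
}


def tmt_mbps(packet_bytes, scheme):
    """Theoretical maximum throughput in Mbps for a data packet of packet_bytes bytes."""
    a, b = TMT_COEFFS[scheme]
    # Eq. (2) gives 8x / (a*x + b) * 1e6 bps, which is 8x / (a*x + b) when expressed in Mbps.
    return 8 * packet_bytes / (a * packet_bytes + b)


if __name__ == "__main__":
    for size in (500, 1000, 1500):
        print(size,
              round(tmt_mbps(size, "csma_ca"), 3),
              round(tmt_mbps(size, "rts_cts"), 3))
```

For 500-byte packets with RTS/CTS this evaluates to roughly 2.07 Mbps, which is consistent with the loss increase observed above 1.85 Mbps.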


L. Peters et al. / Computer Networks 50 (2006) 1158–1175

In the case of a mobile receiver, for a given packet size, the packet loss increases for higher data rates. Furthermore, it is higher for the tree topology compared to the other topologies. The reason is that for a mobile receiver, the handoff packet loss depends on the time needed to set up a new path. For the mobile sender, the packet loss is much less influenced by the data rate and is not influenced by the access topology.

6.3. Highly loaded access networks

In order to investigate how well Q-MEHROM can handle mismatch in traffic load and network capacity, we consider the following scenario: at the start of the simulation, the network is symmetrically loaded by a single mobile host in each of the eight cells of Fig. 7. During the simulation, these mobile hosts move randomly from one access router to another during 300 s with a velocity of 20 m/s. Each


mobile host is the receiver or sender of a CBR data flow with a packet size of 1500 bytes and a bitrate of 0.5 Mbps. These values correspond to a situation where the wireless channel is not the bottleneck for the received or sent bandwidth (TMT = 6.056 Mbps for the CSMA/CA scheme and TMT = 4.515 Mbps for the RTS/CTS scheme), even if all mobile hosts move into the same cell. For each of the eight mobile hosts, the average received or sent bandwidth, calculated as a sliding average with a window of 3 s, is given as a function of the simulation time in Figs. 12 and 13 for the tree topology and in Figs. 14 and 15 for the random topology. For the presented results, the CSMA/CA scheme was used. For the tree topology (see Figs. 12 and 13), only one possible path exists between the domain gateway and a specific access router. So, both in the best-effort case, where basic MEHROM and standard IP routing are used (left figures), and in the case of resource reservations by the use of Q-MEHROM

Fig. 12. Received bandwidth by the mobile hosts in highly loaded networks for the tree topology. Panels: best-effort (left) and resource reservations (right); axes: received bandwidth (bits/second) versus time (seconds).

Fig. 13. Sent bandwidth by the mobile hosts in highly loaded networks for the tree topology. Panels: best-effort (left) and resource reservations (right); axes: sent bandwidth (bits/second) versus time (seconds).

Fig. 14. Received bandwidth by the mobile hosts in highly loaded networks for the random topology. Panels: best-effort (left) and resource reservations (right); axes: received bandwidth (bits/second) versus time (seconds).

Fig. 15. Sent bandwidth by the mobile hosts in highly loaded networks for the random topology. Panels: best-effort (left) and resource reservations (right); axes: sent bandwidth (bits/second) versus time (seconds).

(right figures), the data packets of the data flows of mobile hosts residing in the same cell are routed via the same path. The capacity of the links closest to the domain gateway limits the amount of bandwidth that can be allocated for mobile hosts residing in the underlying cells. As a small part of the total link bandwidth of 2.5 Mbps is reserved for control traffic, only four reservations of 0.5 Mbps can be made on a single link. For the best-effort situation, the available bandwidth is shared between all mobile hosts using the same link. When Q-MEHROM is used, the mobile hosts that made a reservation receive their requested bandwidth irrespective of newly arriving mobile hosts. As a result, more mobile hosts receive 0.5 Mbps than in the best-effort case.

For the random topology, the difference between the best-effort situation and the use of Q-MEHROM is much more significant, as illustrated in Figs. 14 and 15. Due to the presence of extra uplinks, more than one path with minimum hop count

may be found between the domain gateway and a specific access router. In the case of best-effort, one of these possible paths is chosen without taking into account the available bandwidth. Only for the mobile hosts in cell 15 of Fig. 7 does ns-2 use the path via the right link closest to the domain gateway. All the other mobile hosts share the bandwidth of the left link closest to the domain gateway. In contrast, Q-MEHROM chooses a path with enough resources, if available. Due to the mesh links and extra uplinks, the capacity of the links closest to the domain gateway no longer forms a bottleneck. The right figures of Figs. 14 and 15 clearly show how the proposed Q-MEHROM handoff scheme efficiently makes use of the extra links to support asymmetric network loads and to spread the data load over the network. The results for mobile senders are very similar to the results for mobile receivers. However, when several mobile hosts are sending to the same access router, many more collisions on the wireless link occur than when an access router has data packets for several mobile receivers. This explains the larger number of small bandwidth variations in the case of mobile senders compared to the case of mobile receivers.

7. Conclusions

In this paper we presented Q-MEHROM, which is the close coupling between the micromobility protocol MEHROM and a resource reservation mechanism. While a micromobility protocol updates the routing tables to support data traffic towards a mobile host, Q-MEHROM supports resource reservations for both mobile receivers and senders. In addition, Q-MEHROM makes no assumptions about the access network topology. A router can itself detect that it has the function of cross-over node and whether the new path is optimal. Q-MEHROM uses topology and link state information, which is gathered and calculated by QOSPF and provided to Q-MEHROM in the form of QoS tables. At the time of handoff, the Q-MEHROM handoff scheme is initiated by the mobile host, sender or receiver (or both). The updating of the routing information and the resource reservations for the data flows of the mobile host are performed at the same time. In addition, invalid routing information and reservations along the old path are explicitly deleted. Resource reservations along the part of the old path that overlaps with the new path are reused.

Simulation results showed that the control overhead caused by Q-MEHROM and QOSPF consists mainly of QOSPF traffic. The control load increases for higher handoff rates in the network, which are influenced by the number of mobile hosts and their velocity. For the studied topologies and scenarios, this load remains very low compared to the link capacities. The first phase of Q-MEHROM efficiently uses the extra links, which are present in the topology to increase the robustness against link failures, to reduce the handoff packet loss compared to a pure tree topology.
This handoff packet loss is independent of the speed of the mobile hosts and increases with the data rate. For high network loads, the use of Q-MEHROM allows mobile hosts to make reservations and protect their received bandwidth against newly arriving mobile hosts. Furthermore, Q-MEHROM takes the available bandwidth into account while choosing a path and is able to


use mesh links and extra uplinks to improve the balancing of data load in the access network.

Acknowledgements

Liesbeth Peters is a Research Assistant of the Fund for Scientific Research – Flanders (F.W.O.-V., Belgium). Part of this research is funded by the Belgian Science Policy Office (BelSPO, Belgium) through the IAP (phase V) Contract No. IAPV/11, and by the Institute for the promotion of Innovation by Science and Technology in Flanders (IWT, Flanders) through the GBOU Contract 20152 "End-to-End QoS in an IP Based Mobile Network".

References

[1] C. Perkins (Ed.), IP mobility support for IPv4, IETF RFC 3344, August 2002.
[2] D. Johnson, C. Perkins, J. Arkko, Mobility support in IPv6, IETF RFC 3775, June 2004.
[3] A. Valkó, Cellular IP: a new approach to internet host mobility, ACM Computer Communication Review 29 (1) (1999) 50–65.
[4] R. Ramjee, T. La Porta, L. Salgarelli, S. Thuel, K. Varadhan, IP-based access network infrastructure for next-generation wireless data networks, IEEE Personal Communications (August) (2000) 34–41.
[5] E. Gustafsson, A. Jonsson, C. Perkins, Mobile IPv4 regional registration, draft-ietf-mobileip-reg-tunnel-07.txt, October 2002 (work in progress).
[6] H. Soliman, C. Castelluccia, K. El Malki, L. Bellier, Hierarchical mobile IPv6 mobility management (HMIPv6), draft-ietf-mipshop-hmipv6-02.txt, June 2004 (work in progress).
[7] C. Boukis, N. Georganopoulos, H. Aghvami, A hardware implementation of BCMP mobility protocol for IPv6 networks, in: GLOBECOM 2003 – IEEE Global Telecommunications Conference, vol. 22(1), 2003, pp. 3083–3087.
[8] K. El Malki (Ed.), Low latency handoffs in Mobile IPv4, draft-ietf-mobileip-lowlatency-handoffs-v4-09.txt, June 2004 (work in progress).
[9] R. Koodli (Ed.), Fast handovers for Mobile IPv6, draft-ietf-mipshop-fast-mipv6-02.txt, July 2004 (work in progress).
[10] L. Peters, I. Moerman, B. Dhoedt, P. Demeester, Micromobility support for random access network topologies, in: IEEE Wireless Communication and Networking Conference (WCNC 2004), 21–25 March, Georgia, USA, ISBN 0-7803-8344-3.
[11] L. Peters, I. Moerman, B. Dhoedt, P. Demeester, MEHROM: micromobility support with efficient handoff and route optimization mechanisms, in: 16th ITC Specialist Seminar on Performance Evaluation of Wireless and Mobile Systems (ITCSS16 2004), 31 August–2 September, Antwerp, Belgium, pp. 269–278.
[12] J. Manner, A. Toledo, A. Mihailovic, H. Muñoz, E. Hepworth, Y. Khouaja, Evaluation of mobility and quality of service interaction, Computer Networks 38 (2) (2002) 137–163.
[13] B. Moon, A.H. Aghvami, Quality-of-service mechanisms in all-IP wireless access networks, IEEE Journal on Selected Areas in Communications 22 (5) (2004) 873–887.
[14] B. Moon, A.H. Aghvami, RSVP extensions for real-time services in wireless mobile networks, IEEE Communications Magazine (December) (2001) 52–59.
[15] J. Manner, X. Fu, P. Pan, Analysis of existing quality of service signaling protocols, draft-ietf-nsis-signalling-analysis-04.txt, May 2004 (work in progress).
[16] H. Chaskar (Ed.), Requirements of a quality of service (QoS) solution for Mobile IP, IETF RFC 3583, September 2003.
[17] R. Hancock, G. Karagiannis, J. Loughney, S. van den Bosch, Next steps in signaling: framework, draft-ietf-nsis-fw-06.txt, July 2004 (work in progress).
[18] S. Lee (Ed.), S. Jeong, H. Tschofenig, X. Fu, J. Manner, Applicability statement of NSIS protocols in mobile environments, draft-ietf-nsis-applicability-mobility-signaling-00.txt, October 2004 (work in progress).
[19] J. Moy, OSPF Version 2, IETF RFC 2328, April 1998.
[20] G. Apostolopoulos, D. Williams, S. Kamat, R. Guerin, A. Orda, T. Przygienda, QoS routing mechanisms and OSPF extensions, IETF RFC 2676, August 1999.
[21] S. Floyd, V. Jacobson, Link-sharing and resource management models for packet networks, IEEE/ACM Transactions on Networking 3 (4) (1995) 365–386.
[22] L. Peters, I. Moerman, B. Dhoedt, P. Demeester, Influence of the topology on the performance of micromobility protocols, in: Proceedings of WiOpt'03, 3–5 March 2003, Sophia Antipolis, France, pp. 287–292.
[23] L. Peters, I. Moerman, B. Dhoedt, P. Demeester, Performance of micromobility protocols in an access network with a tree, mesh, random and ring topology, in: Proceedings of the IST Summit 2003, 15–18 June 2003, Aveiro, Portugal, pp. 63–67.
[24] NS-2 Home Page, www.isi.edu/nsnam/ns.
[25] QoSR in ns-2, www.netlab.hut.fi/tutkimus/ironet/ns2/ns2.html.
[26] J. Jun, P. Peddabachagari, M. Sichitiu, Theoretical maximum throughput of IEEE 802.11 and its applications, in: IEEE International Symposium on Network Computing and Applications (NCA-2003), Cambridge, USA, 2003.

Liesbeth Peters was born in Temse, Belgium, in 1978. She received her Master of Science degree in Electrotechnical Engineering from Ghent University, Gent, Belgium in 2001. Since August 2001, she has been working as a doctoral researcher with the Department of Information Technology (INTEC) of the faculty of Applied Sciences, Ghent University, where she joined the Broadband Communications Networks Group. Since October 2002, she works there as a research assistant of the Fund for Scientific Research—Flanders (F.W.O.-V., Belgium). Her current research interests are in broadband wireless communication and the support of IP mobility in wired cum wireless networks.

Ingrid Moerman was born in Gent, Belgium, in 1965. She received the degree in Electro-technical Engineering and the Ph.D. degree from the Ghent University, Gent, Belgium in 1987 and 1992, respectively. Since 1987, she has been with the Interuniversity Micro-Electronics Centre (IMEC) at the Department of Information Technology (INTEC) of the Ghent University, where she conducted research in the field of optoelectronics. In 1997, she became a permanent member of the Research Staff at IMEC. Since 2000 she is a part-time professor at the Ghent University. Since 2001 she has switched her research domain to broadband communication networks. She is currently involved in the research and education on broadband mobile & wireless communication networks and on multimedia over IP. Her main research interests related to mobile & wireless communication networks are: adaptive QoS routing in wireless ad hoc networks, personal networks, body area networks, wireless access to vehicles (high bandwidth and driving speed), protocol boosting on wireless links, design of fixed access/metro part, traffic engineering and QoS support in the wireless access network. Ingrid Moerman is author or co-author of more than 300 publications in the field of optoelectronics and communication networks.

Bart Dhoedt received his degree in Engineering from the Ghent University in 1990. In September 1990, he joined the Department of Information Technology of the Faculty of Applied Sciences, University of Ghent. His research, addressing the use of micro-optics to realize parallel free space optical interconnects, resulted in a Ph.D. degree in 1995. After a 2 year post-doc in optoelectronics, he became a professor at the Faculty of Applied Sciences, Department of Information Technology. Since then, he is responsible for several courses on algorithms, programming and software development. His research interests are software engineering and mobile & wireless communications. He is an author or co-author of more than 100 papers published in international journals or in the proceedings of international conferences. His current research addresses software technologies for communication networks, peer-to-peer networks, mobile networks and active networks.

Piet Demeester finished his Ph.D. thesis at the Department of Information Technology (INTEC) at the Ghent University in 1988. At the same department he became group leader of the activities on Metal Organic Vapour Phase Epitaxial growth for optoelectronic components. In 1992 he started a new research group on Broadband Communication Networks. The research in this field has already resulted in more than 300 publications. In this research domain he was and is a member of several programme committees of international conferences, such as: ICCCN, the International Conference on Telecommunication Systems, OFC, ICC, and ECOC. He was the Chairman of DRCN'98. In 2001 he was the chairman of the Technical Programme Committee of ECOC'01. He was the Guest Editor of three special issues of the IEEE Communications Magazine. He is also a member of the Editorial Board of the journals "Optical Networks Magazine" and "Photonic Network Communications". He was a member of several national and international Ph.D. thesis commissions. He is a member of IEEE (Senior Member), ACM and KVIV. His current research interests include: multilayer networks, Quality of Service (QoS) in IP networks, mobile networks, access networks, grid computing, distributed software, network and service management and applications (supported by FWO-Vlaanderen, the BOF of the Ghent University, the IWT and the European Commission). He is currently a full-time professor at the Ghent University, where he is teaching courses in Communication Networks. He has also been teaching in different international courses.

Computer Networks 50 (2006) 1176–1191 www.elsevier.com/locate/comnet

An analytical model of a new packet marking algorithm for TCP flows

Giovanni Neglia a,*, Vincenzo Falletta a, Giuseppe Bianchi b

a Dipartimento di Ingegneria Elettrica, DIE, Università degli Studi di Palermo, 90128 Palermo, Italy
b Dipartimento di Ingegneria Elettronica, DIE, Università degli Studi di Roma, Tor Vergata, 00133 Roma, Italy

Available online 5 October 2005

Abstract

In Differentiated Services networks, packets may receive a different treatment according to their Differentiated Services Code Point (DSCP) label. As a consequence, packet marking schemes can also be devised to differentiate packets belonging to the same TCP flow, with the goal of improving the performance experienced. This paper presents an analytical model for an adaptive packet marking scheme proposed in our previous work. The model combines three specific sub-models aimed at describing (i) the TCP sources aggregate, (ii) the marker, and (iii) the network status. Preliminary simulation results show quite accurate predictions for throughput and average queue occupancy. Besides, the research suggests new interesting guidelines to model queues fed by TCP traffic.

© 2005 Elsevier B.V. All rights reserved.

Keywords: TCP marking; Differentiated services; Models

1. Introduction

Differentiated Services (DiffServ) networks provide the ability to enforce a different forwarding behavior for packets, based on their Differentiated Services Code Point (DSCP) value. A possible way to exploit the DiffServ architecture is to provide differentiated support for flows belonging to different traffic classes, distinguished on the basis of the DSCP employed. However, since it is not required that all packets belonging to a flow are marked with the same DSCP label, another possible way to

* Corresponding author. E-mail addresses: [email protected] (G. Neglia), [email protected] (V. Falletta), giuseppe.bianchi@uniromaz.it (G. Bianchi).

exploit DiffServ is to identify marking strategies for packets belonging to the same flow. Several packet marking algorithms have been proposed for TCP flows. The marking strategy is enforced at the ingress node of a DiffServ domain (edge router). Within the DiffServ domain, marked packets are handled in an aggregated manner, and receive a different treatment based on their marked DSCP. Generally, a two-level marking scheme is adopted, where packets labelled as IN receive better treatment (lower dropping rate) than packets marked as OUT. Within the network, dropping priority mechanisms are implemented in active queue management schemes such as RIO—Random Early Discard with IN/OUT-packets [1]. The basic idea of the proposed algorithms is that a suitable marking profile (e.g., a token bucket

1389-1286/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2005.09.003

G. Neglia et al. / Computer Networks 50 (2006) 1176–1191

which marks IN/OUT profile packets) may provide some form of protection in the case of congestion. A large number of papers [1–16] have thoroughly studied marking mechanisms for service differentiation, and have evaluated how the service marking parameters influence the achieved rate. More recently, TCP marking has been proposed as a way to achieve better than best-effort performance [17–19]. The idea is that packet marking can also be adopted in a scenario of homogeneous flows (i.e., all marked according to the same profile), with the goal of increasing the performance of all flows. Our algorithm was first proposed in [20] and shares this aim. An introductory comparison with the other marking algorithms is presented in Section 2. In this paper we slightly modify the mechanism proposed in [20], and we describe an analytical model to evaluate the network performance. This model can be employed to study possible variants of the algorithm. Incidentally, the network submodel exhibits some novelty in comparison to previous approaches and could be useful in different network scenarios where TCP traffic is considered.

The rest of this paper is organized as follows. After an overview of proposed marking schemes in Section 2, Section 3 describes our adaptive packet marking algorithm, focusing on some changes to the previous version. Section 4 presents the analytical model, which relies on the Fixed Point Approximation, whose rationale and use in the computer networks field are briefly introduced in Section 4.1. The three submodels are detailed in Sections 4.2–4.4, respectively, while in Section 4.5 existence and uniqueness of a solution are proven. Section 5 deals with validation of the proposed model. A simple application of the model to evaluate the performance of a variant of the algorithm is presented in Section 6. Finally, concluding remarks and further research issues are given in Section 7.

2. Related works

The idea to employ marking mechanisms for service differentiation was first introduced in [1], where the authors propose a time-sliding window marker. In [2] token bucket appears to achieve better performance in comparison to time-sliding window. At the same time the authors claim that marking cannot offer a quantifiable service to TCP traffic due to the interaction of TCP dynamics with priority dropping: when IN-packets and OUT-packets are mixed in a single TCP connection, drops of OUT-packets negatively impact the connection's performance. Afterwards token bucket and time-sliding window markers have been extended to three colors [3–5].

The following studies confirm the difficulty of marker configuration. A detailed experimental study of the main factors that impact the throughput of TCP flows in a RIO-based DiffServ network is provided in [6]. The article shows that in an over-provisioned network all target rates are achieved, but unfair shares of excess bandwidth are obtained. However, as the network approaches an under-provisioned state, not all target rates can be achieved. In [7] it is shown that it is possible to improve the throughput significantly even when a small portion of traffic is sent as in-profile packets. At the same time the authors observe that, in order to fully utilize the benefit of out-profile packets, the amount of out-profile packets sent in addition to the in-profile packets has to be carefully determined. In [8] a set of experimental measures is presented. The main result is that the differentiation among the transmission rates of TCP flows can be achieved, but it is difficult to provide the required rates with a good approximation. In [9] the limits of token bucket are thoroughly investigated. It appears that (i) the achieved rate is not proportional to the assured rate, (ii) it is not always possible to achieve the assured rate, and (iii) there exist ranges of values of the achieved rate for which token bucket parameters have no influence.

These results suggested the need to introduce some adaptivity in order to cope with TCP dynamics. In [10] the Packet Marking Engine monitors and sustains the requested level of service by setting the DS-field in the packet headers appropriately. If the observed throughput falls below the minimum target rate the Engine starts prioritizing packets until the desired target rate is reached.
Once the target is reached, it strives to reduce the number of priority packets without falling below the minimum requested rate. Active Rate Management is proposed in [11] in order to provide minimum throughput to traffic aggregates. It is a classical, linear, time-invariant controller, which sets the token bucket parameters (specifically the token bucket rate), adapting to changes in the network. The same issue is tackled in [12]. The adaptive dual token bucket in [13] regulates the amount of OUT-packets in order to prevent TCP packet losses caused by excess low-priority traffic in the network. This adaptive technique requires a congestion signaling procedure from internal routers to border routers.
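Several of the markers surveyed above build on a token bucket. As a point of reference only, the following is a minimal sketch of a generic two-level token-bucket marker, assuming a byte-based bucket refilled at a constant rate; it is not the marker of any specific cited paper, and the class name, parameters and values are hypothetical.

```python
class TokenBucketMarker:
    """Generic two-level (IN/OUT) token-bucket marker; an illustrative sketch only."""

    def __init__(self, rate_bps, depth_bytes):
        self.rate = rate_bps / 8.0          # token refill rate in bytes per second
        self.depth = float(depth_bytes)     # bucket depth in bytes
        self.tokens = float(depth_bytes)    # assumption: the bucket starts full
        self.last = 0.0                     # time of the previous arrival (seconds)

    def mark(self, now, size_bytes):
        """Return "IN" if the packet conforms to the profile, otherwise "OUT"."""
        # Refill tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.depth, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return "IN"
        return "OUT"


if __name__ == "__main__":
    marker = TokenBucketMarker(rate_bps=8000, depth_bytes=3000)
    for t in (0.0, 0.0, 0.0, 1.0, 1.5):
        print(t, marker.mark(t, 1500))
```

In this sketch a burst larger than the bucket depth is marked OUT once the tokens are exhausted, which is the behaviour the adaptive schemes above try to tune by changing the rate or depth.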


Equation-Based Marking [14] is somewhat similar to ours because it senses the current network conditions; in particular it estimates the loss probability and the Round Trip Time (RTT) experienced by a TCP flow (without any signaling with core routers), and adapts the packet marking probabilities accordingly. In particular it uses the TCP model in [21] and these estimates in order to identify the target loss probabilities, corresponding to target throughput rates. Then, it uses the current loss probability estimate as well as these target loss probabilities to calculate the packet-marking probabilities. Its main targets are fairness among heterogeneous TCP flows and protection against non-assured traffic. Fairness is also the main focus of [15,16]: the first concentrates on the effect of different RTTs, the second proposes the Direct Congestion Control Scheme to achieve fairness between responsive and unresponsive aggregates.

The proposals described above share the purpose of assuring a minimum throughput to TCP individual flows or aggregates. As we wrote in the previous section, TCP marking has also been proposed as a way to achieve better than best-effort performance [17–19]. In particular [17] focuses on WWW traffic and proposes two packet marking schemes. The first one is tightly integrated with the TCP protocol: the source is allowed to send up to Ns IN-packets when it starts, and then up to Na = ssthresh at the beginning of a Slow Start phase, and up to Na = cwnd at the beginning of a Fast Recovery phase. The second scheme does not require the knowledge of internal TCP variables, but it uses a constant value Na = Ns = 5, hence this scheme can be implemented at ingress routers.
The rationale behind the schemes in [17] is that packets marked as IN will be protected against network congestion; hence marking can be usefully employed to protect flows with small windows or retransmitted packets, when packet losses cannot be recovered via the fast retransmission algorithm but trigger timeouts, which would reduce TCP source throughput. The TCP-friendly marker in [18,19] considers long-lived flows and adopts goodput and loss as performance metrics. The main guidelines are: (1) protect small-window flows and retransmitted packets from losses by marking them IN; (2) avoid, if possible, marking OUT consecutive packets in order to reduce the possibility of burst loss of packets. Our approach shares the purpose of spacing packet losses as much as possible; at the same time many differences hold. In the TCP-friendly marker a fixed number of IN tokens is available for each time interval and these have to be distributed among the flows; on the contrary our scheme adaptively sets the length of the IN-packet burst (i.e., the number of consecutive packets of a flow that are marked IN) according to the network status. Besides, all the marking schemes share the idea that packets marked IN will be protected against network congestion, while our algorithm operates according to the somewhat opposite philosophy of employing OUT-packets as probes (see Section 3). Finally, our approach is much simpler.

We present some further remarks about the previous algorithms in order to stress the novelty of our approach. In a DiffServ Assured Forwarding (AF) scenario, the differentiation between traffic classes is relative. For example, the usual RIO configuration [1] assures that the IN-packet dropping probability is lower than the OUT-packet one, but no bound is guaranteed. For this reason the protection of IN-packets in [17] relies on the assumption that most of the packets in the network are of type OUT, hence IN-packets will receive a "good-enough" service. In fact in [17] the authors show that a throughput reduction may be encountered when the percentage of IN traffic becomes greater than a given threshold. The authors claim that the problem is interleaving IN and OUT-packets, when the loss rate of the OUT traffic is much larger than that of the IN traffic. We want to stress that the IN-packet protection vanishes as IN traffic increases. Indeed, we too have observed performance impairments for both a token-bucket marker and for a marking scheme very similar to the one proposed in [18,19] (protection of small window and retransmitted packets, an OUT-packet inserted every n IN-packets). Hence our approach shows two main differences [20]: (1) the majority of packets are IN, (2) the performance takes advantage of a very high OUT-packet loss rate.
The apparent conflict with results in [17] and with similar results for the marker proposed in [18,19] is a result of the adaptivity. These schemes are not designed to be adaptive to the network congestion status, while ours uses some heuristics to provide adaptivity. 3. The packet marking algorithm (PMA) In [20,22] we proposed a new marking algorithm, able to achieve better performance in terms of average queueing delay and flow completion time vs link

G. Neglia et al. / Computer Networks 50 (2006) 1176–1191

utilization. According to this marking scheme ‘‘long’’ IN-packet bursts are interleaved with a single OUT-packet. The OUT-packet is thence employed as a probe to reveal early a possible seed of congestion in the network. The algorithm dynamically updates the length of IN-packets bursts by a heuristic estimation of the experienced packet loss ratio. The idea of marking the majority of packets as IN seems to be in contrast with some results found with other marking schemes [18,19,17], but the intrinsic adaptivity of our algorithm is something all these models lack. If we think about Active Queue Management (AQM) techniques such as Random Early Detection (RED) we observe the same idea of dropping some packets when signs of an incoming congestion are received. Our algorithm goes further: it reallocates losses among the OUT packets, so it spaces them as much as possible, avoiding consecutive losses for a flow and assuring a more regular TCP adaptation behavior. By simulation evaluation we found better performance when OUT-packet dropping probability is near 100%, while IN-packets are not dropped at all. The algorithm flowchart is shown in Fig. 1. Now we will explain how this procedure works. Each time a new SYN packet arrives at the edge router a new state vector is set, containing the following variables: SNh: This counter stores the highest Sequence Number (SN) encountered in the flow. It is initially set to the ISN (Initial Sequence Number) value. It is

[Fig. 1. PMA flow diagram. Boxes: Arriving Packet; test SN > SNh ?; SNh := SN, Lseq := Lseq + 1; AIN := (1 - α)AIN + α Lseq, Lseq := 0, CIN := CIN + 1; test CIN > AIN ?; CIN := CIN + 1, MARK IN; CIN := 0, AIN := AIN + 1, MARK OUT.]


updated whenever a non-empty packet (i.e., non-ACK) arrives with a higher SN.

Lseq: It is initially set to zero. It is increased by one for each newly arrived in-sequence packet, while it is reset to zero every time an out-of-sequence packet arrives.

CIN: It counts the number of IN-packets in the current burst. It is reset to zero when it exceeds AIN and an OUT-packet is sent.

AIN: It stores the number of packets which will be marked IN; it tracks the average length of in-sequence packet bursts through autoregressive filtering.

The algorithm has been slightly changed in comparison to the version presented in [20,22]. In the previous algorithm a single variable (LIN) took into account both the number of in-sequence packets (as Lseq now does) and the number of IN-packets of the current IN-packet burst (as CIN now does). This coupling required an artificial increase of the variable AIN after marking an OUT-packet: we chose AIN := 2AIN + 1, but the correct amount depended on the network conditions, as discussed in [20,22]. After the introduction of the new variable CIN, a small increase of AIN has been retained: it assures better fairness among the flows, allowing flows with underestimated AIN values to reach the correct estimate faster.

4. The analytical model

The algorithm has shown good performance, but it essentially relies on a heuristic. In order to achieve a deeper understanding and to establish RIO setting criteria, we have developed an analytical model. The model assumes n long-lived homogeneous flows sharing a common bottleneck, whose capacity is C. The model is based on Fixed Point Approximation (FPA), a modeling technique described in the following subsection. According to FPA the system is divided into its three main components, as shown in Fig. 2: the TCP sources, the network and the marker. Each element is modeled separately, taking into account the effects of the others through the parameters shown in the figure.
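The per-flow marker state of Section 3 can be illustrated with a short Python sketch. It follows one plausible reading of the Fig. 1 flowchart (out-of-sequence packets update AIN and are marked IN; in-sequence packets are marked OUT when CIN exceeds AIN). The class name `PmaState`, the function `mark_packet`, and the default values of `a_in` and `alpha` are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class PmaState:
    """Per-flow marker state (names mirror SNh, Lseq, CIN, AIN in the text)."""
    sn_h: int            # highest sequence number seen so far (SNh)
    l_seq: int = 0       # in-sequence packets since the last out-of-sequence one
    c_in: int = 0        # IN-packets counted in the current burst
    a_in: float = 1.0    # target IN-burst length; initial value is an assumption
    alpha: float = 0.5   # autoregressive filter gain; value is an assumption


def mark_packet(state: PmaState, sn: int) -> str:
    """Return 'IN' or 'OUT' for a data packet with sequence number sn."""
    if sn > state.sn_h:
        # In-sequence packet: advance SNh and extend the in-sequence run.
        state.sn_h = sn
        state.l_seq += 1
        if state.c_in > state.a_in:
            # Burst complete: send one OUT probe and nudge AIN upwards.
            state.c_in = 0
            state.a_in += 1
            return "OUT"
        state.c_in += 1
        return "IN"
    # Out-of-sequence packet (e.g. a retransmission): filter Lseq into AIN.
    state.a_in = (1 - state.alpha) * state.a_in + state.alpha * state.l_seq
    state.l_seq = 0
    state.c_in += 1
    return "IN"
```

With `a_in = 2.0`, for instance, three in-sequence packets are marked IN before the fourth is marked OUT, because the test is the strict inequality CIN > AIN.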
For example, the TCP sources depend on the network through the RTT and the dropping probabilities pin and pout, and on the marker through the length of IN-packet bursts (AIN). After an overview of FPA methods in Section 4.1, the submodels for the TCP sources, for the marker and for the network are respectively presented


[Fig. 2. The three-block model. Blocks: Sources, Network, Marker; exchanged quantities: T, Lseq, AIN, and RTT, pin, pout.]

in Sections 4.2–4.4. Each of them could be replaced by a more sophisticated one.

4.1. About fixed point approximations

The expression Fixed Point Approximation (FPA) refers to a particular modeling technique, which we describe in this section. This name is widespread in the scientific literature [23–25], but other names also appear: fixed point models [26,27], fixed point approach [28], and reciprocal model tuning [29]. Other papers [30,31] use the expression “fixed point” to refer to the specific method employed to solve the model system of equations, rather than to the modeling technique. This section is organized as follows. First we introduce the idea of FPA with reference to our specific problem, and we explain the origin of the expression fixed point. Then we briefly present telecommunications works employing this kind of modeling technique. For a more detailed overview of FPA in the field of computer networks refer to [32].

Let us consider a single-bottleneck network, where a single TCP flow is marked at the edge and feeds the queue at the bottleneck. Suppose we are interested in some average values, like TCP throughput or queue occupancy. If we know all the parameters characterizing the network (i.e., link capacities, link delays, buffer size), the TCP sender (e.g., the TCP version, the maximum congestion window size, the timer granularity), the TCP receiver and the marker, we are able to describe exactly the behavior of each element of the network and to evaluate the throughput of the TCP sender or the queue occupancy at each instant. If we were able to describe the evolution of these quantities in closed form, we could evaluate their average values by integrating the analytical expressions, but in general this is not the case. In order to achieve our purpose we have to sacrifice the exact description of the system. A way to make the problem analytically tractable is to divide the system into three parts (e.g., the TCP source, the

queue at the bottleneck and the marker), to make some simplifying assumptions about their interaction, and then to develop an analytical model for each part. According to the FPA approach, the main assumptions are that we model each part considering the others in a steady state, and that this state is independent of the behavior of the part we are modeling. In our example we know that the throughput of the TCP source depends on the current RTT of the path, on packet discard at the queue and on the marking pattern (characterized by AIN). At the same time the TCP traffic builds up the queue in the network and, when the buffer is full, causes packet discard. Nevertheless, in order to model the TCP behavior, we assume that the network and the marker are in a steady state: specifically we consider that RTT and AIN are constant (AIN = A), and that the packet discards for IN and OUT-packets are Bernoulli processes with mean values pin and pout, respectively. Different assumptions can be made. Anyway, these allow us to derive an expression for the long-term steady-state TCP throughput as a function of RTT, pin, pout and A, say:

$$T = f(RTT, p_{in}, p_{out}, A) \tag{1}$$

and an expression for the average number of in-sequence packets as:

$$L = g(p_{in}, p_{out}, A). \tag{2}$$

In the same manner, in order to model the network, we assume that the TCP source offers a constant traffic intensity to the network, independently of the present network status (queue occupancy and packet discard probability), with a ratio of IN-packets to OUT-packets equal to A. If we add some further hypotheses about the statistical characterization of packet arrivals at the buffer and the way IN and OUT-packets are interspersed, we are able to derive the mean number of packets in the router and hence the average RTT and the mean dropping probabilities (pin, pout), i.e.,

$$p_{in} = h_{in}(T, A), \tag{3}$$

$$p_{out} = h_{out}(T, A), \tag{4}$$

$$RTT = l(T, A). \tag{5}$$

Finally, given the average number of in-sequence packets (L), we can derive the average length of IN-packet bursts:

$$A = m(L). \tag{6}$$


In order to determine T, RTT, pin, pout, L and A, we need to solve the system of Eqs. (1)–(6). If we define the function $U : \mathbb{R}^6 \to \mathbb{R}^6$ as

$$U(T, L, p_{in}, p_{out}, RTT, A) = [f(RTT, p_{in}, p_{out}, A),\ g(p_{in}, p_{out}, A),\ h_{in}(T, A),\ h_{out}(T, A),\ l(T, A),\ m(L)], \tag{7}$$

then we can note that a solution of such a system, $[T^*, L^*, p_{in}^*, p_{out}^*, RTT^*, A^*]$, if any, satisfies the following relation:

$$[T^*, L^*, p_{in}^*, p_{out}^*, RTT^*, A^*] = U(T^*, L^*, p_{in}^*, p_{out}^*, RTT^*, A^*), \tag{8}$$

i.e., the point $[T^*, L^*, p_{in}^*, p_{out}^*, RTT^*, A^*]$ is a fixed point for the $\mathbb{R}^6 \to \mathbb{R}^6$ mapping established by the function U (see footnote 1). This remark justifies the name of FPA.

Footnote 1: Note that in what follows we will introduce for convenience other variables, but nothing changes as regards the idea of FPA described here.

Under some proper conditions concerning the function U and its definition set, fixed-point theorems such as Bolzano's theorem, Brouwer's theorem or Kakutani's theorem can be used to conclude that at least one solution exists (see for example [33] or [34]). The question of uniqueness is more difficult; eventual monotonicity greatly constrains the possible dynamics. Different methods can be employed in order to solve Eq. (8). In particular, repeated substitution uses the following relation:

$$[T_{i+1}, L_{i+1}, p_{in,i+1}, p_{out,i+1}, RTT_{i+1}, A_{i+1}] = U(T_i, L_i, p_{in,i}, p_{out,i}, RTT_i, A_i), \tag{9}$$

assuming that

$$\lim_{i \to \infty} [T_i, L_i, p_{in,i}, p_{out,i}, RTT_i, A_i] = [T^*, L^*, p_{in}^*, p_{out}^*, RTT^*, A^*].$$

This kind of solution is particularly appealing, because Eq. (9) can be read as a dynamic system describing the network operation [24,35]: in our example, if the network is unloaded (pin = pout = 0) and the TCP source starts injecting traffic into the network, the buffer provides new (different) values of pin and pout by dropping packets. The source reacts to this packet loss probability by adjusting its sending rate, and at the same time the marker changes its marking profile, until convergence is reached. Despite this striking interpretation, it is


not clear how closely Eq. (9) actually describes the network operation. Moreover, convergence of Eq. (9) is not guaranteed.

Some other kinds of approximation are often employed together with FPA, and they often cannot be easily distinguished from an FPA approach extended to all the network elements, i.e., when we divide the network into as many parts as the number of TCP sources plus the number of network routers. One example is the Mean Field Theory (also known under many names and guises, e.g., Self Consistent Field Theory, Bragg–Williams Approximation, Bethe Approximation, Landau Theory), which simplifies a many-body interaction problem by replacing all interactions with any one body by an average or effective interaction. The Mean Field Theory is explicitly referenced in [36,37] as a way of modeling the interaction of many TCP flows. Another common assumption concerns networks of queues and is known as Kleinrock's independence approximation [38]. In a network of queues, too, there is a form of interaction, in the sense that a traffic stream departing from one queue enters one or more other queues, perhaps after merging with portions of other traffic streams departing from yet other queues. Analytically, this has the unfortunate effect of complicating the character of the arrival process at downstream queues. Kleinrock suggested that merging several packet streams on a transmission line has an effect akin to restoring the independence of interarrival times and packet lengths. It was concluded that it is often appropriate to adopt an M/M/1 queueing model for each communication link, regardless of the interaction of traffic on this link with traffic on other links.

The employment of FPA techniques to model networks is not a novelty. For example, there is a considerable body of literature on the application of fixed point methods to estimating blocking probabilities in circuit-switched networks (see [39] for some applications).
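The repeated-substitution scheme of Eq. (9) can be sketched generically in Python. This is only an illustration under stated assumptions: the function name `fixed_point`, the stopping rule and the optional `damping` factor are not part of the paper, and the scalar map used in the usage example (x = cos x) is a toy stand-in for the six-dimensional mapping U. As the text notes, convergence is not guaranteed in general, which is why the sketch gives up after `max_iter` iterations.

```python
import math


def fixed_point(phi, x0, tol=1e-10, max_iter=1000, damping=1.0):
    """Iterate x_{i+1} = (1 - damping) * x_i + damping * phi(x_i).

    phi maps a list of floats to a list of floats of the same length.
    Returns (solution, iterations). damping = 1.0 is plain repeated
    substitution as in Eq. (9); smaller values are a common stabilising
    variant (an assumption here, not something the paper prescribes).
    """
    x = list(x0)
    for iteration in range(1, max_iter + 1):
        fx = phi(x)
        x_new = [(1 - damping) * a + damping * b for a, b in zip(x, fx)]
        if max(abs(a - b) for a, b in zip(x, x_new)) < tol:
            return x_new, iteration
        x = x_new
    raise RuntimeError("repeated substitution did not converge")


# Toy usage: the fixed point of x = cos(x).
solution, iterations = fixed_point(lambda x: [math.cos(x[0])], [1.0])
```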
More recently FPA has been widely used to model the interaction of TCP sources with the network (see [23,24,26–29,31,35,40,41]). Ref. [26] probably has the merit of being the first paper where the FPA approach is clearly stated and presented as a method “which allows the adaptive nature of TCP sources to be accounted for”. As regards network models, [31,35,40,23,27,41] do not need a stochastic queue model, but they essentially rely on the assumption that long-lived TCP flows are able to achieve full bandwidth utilization. Aside from [27,41] they consider AQM mechanisms


relating the dropping probability and the queue occupancy. [27] considers a zero-buffer queue and [41] considers large delay-bandwidth networks in order to neglect queueing delay. Multi-bottleneck networks are considered in [40,23,27,41], and the existence of a solution is proved in [41] under the above simplification. The hypothesis of full bandwidth utilization is removed in [26,29,28,24], which consider Poisson arrivals at the queue. In [28] each buffer is modeled as an M/M/1/K queue or as an M[X]/M/1/K queue with batch arrivals. The paper discusses the admissibility of the Poisson hypothesis and proves the existence and the uniqueness of the solution when the nominal load is less than one, for short and long-lived TCP flows. A more detailed investigation of the existence, the uniqueness and the stability of equilibrium points appears in [24] for a single-bottleneck scenario and short-lived flows.

As a final remark we note that there has been related work focusing on the development and solution of a set of differential equations describing the transient behavior of TCP flows and queue dynamics [42]. FPA complements this approach. The fixed point approach is computationally much more efficient, as the number of unknowns equals the number of links in the network, whereas the differential-equation approach requires the solution of a number of equations equal to the number of links plus the number of TCP flows. On the other hand, the differential-equation approach can be used to study transient behavior.

4.2. The sources model

According to the previous description, we aim to obtain an expression for the average TCP throughput (T, the input to the Network block) and for the average length of the in-sequence packet burst (L, the input to the Marker block), given the marking profile (A) and the network status (RTT, pin, pout). We have conjectured a regenerative process for the TCP congestion window (cwnd), thus extending the arguments in [21] to include two different service classes with different priority levels. In our analysis we neglect slow-start operation and time-out events; we only consider loss indications due to triple duplicate ACKs, which trigger the (always successful) TCP fast-retransmit mechanism. As regards neglecting time-outs, this approximation appears not to be critical because PMA spaces OUT-packets and hence loss events. For this reason errors are usually recovered by fast retransmission, not by time-out. This intuition is confirmed by our simulation results, where the number of time-outs appears to be significantly reduced in comparison to a no-marker scenario.

A period of our regenerative process starts when the sender congestion window is halved due to a loss indication. Fig. 3 shows the cwnd trend as rounds succeed one another. $W_{i-1}$ is the cwnd value at the end of the $(i-1)$th period, hence in the $i$th period cwnd starts from $W_{i-1}/2$ and is incremented by one every b rounds (b is equal to 2 or 1, depending on whether or not the receiver supports the delayed ACK algorithm). Notice that, due to our assumptions on TCP operation, each period starts with an IN retransmitted packet; hence the number of packets sent in the period ($Y_i$) is equal to Lseq + 1, according to the marker description in Section 4.3. In the $i$th period we also define the following random variables: $I_i$ is the length of the period; $b_i$ is the number of packets transmitted in the last round; $a_i$ is the number of the first lost packet since

Fig. 3. Timeline and transmitted packets.


the beginning of the period, while $c_i$ is the number of packets transmitted between the two losses that occurred in the $(i-1)$th and in the $i$th period. We get $Y_i = a_i + W_i - 1$ and $a_i = c_i - (W_{i-1} - 1)$. By the renewal-reward theorem we can obtain the expression for the average throughput of n sources sharing the same path:

$$T(A, RTT, p_{in}, p_{out}) = n\,\frac{E[Y_i]}{E[I_i]}.$$

We first compute $E[Y_i]$. The relation between $a_i$ and $c_i$ allows us to write $E[Y_i]$ explicitly as a function of the marking profile (A) and the network status (in particular pin, pout). In general $Y_i \neq c_i$; however, if we consider their mean values, the following holds:

$$E[Y_i] = E[a_i] + E[W_i] - 1 = E[c_i] - (E[W_{i-1}] - 1) + E[W_i] - 1 = E[c_i].$$

Let us denote by N the expected value $E[c_i]$. We compute N as:

$$N = \sum_{n=0}^{\infty} n\,p(n) = \sum_{n=0}^{\infty} (1 - P(n)) = \sum_{n=0}^{\infty} Q(n),$$

where $p(n)$ is the probability of losing the $n$th packet after $(n-1)$ successful transmissions, $P(n) = \sum_{l=0}^{n} p(l)$ is the cumulative distribution function, and so $Q(n) = 1 - P(n)$ represents the probability of not losing any packet among these n. If we write n as $n = k(A+1) + h$, with $0 \le h < (A+1)$, we can write $Q(n)$ as

$$Q(n) = s_{in}^{kA+h}\, s_{out}^{k},$$

where $s_{in} = 1 - p_{in}$ and $s_{out} = 1 - p_{out}$. The expression of N can be rewritten as

$$N = \sum_{k=0}^{\infty} \sum_{h=0}^{A} s_{in}^{kA+h}\, s_{out}^{k}$$

and can be solved in closed form:

$$N = \frac{s_{in}^{A+1} - 1}{s_{in} - 1}\cdot\frac{1}{1 - s_{in}^{A}\, s_{out}}. \tag{10}$$

Now we compute $E[I_i]$. Denoting by $X_i$ the round in the $i$th period when a packet is lost, we obtain the period length as $I_i = \sum_{j=1}^{X_i+1} r_{ij}$, where $r_{ij}$ is the $j$th round trip time length. Supposing $r_{ij}$ independent of the round number j (i.e., independent of the cwnd size), and taking the expectation, we find

$$E[I_i] = (E[X] + 1)\,E[r],$$

where $E[r] = RTT$ is the average round trip time. In the $i$th period the cwnd size grows from $W_{i-1}/2$ to $W_i$ with linear slope $1/b$, so (see footnote 2)

$$W_i = \frac{W_{i-1}}{2} + \frac{X_i}{b} - 1$$

and taking the expectation we get

$$E[W] = \frac{2}{b}\,(E[X] - b).$$

To simplify our computations we assume $W_{i-1}/2$ and $X_i/b$ to be integers. Now let us count up all the packets:

$$Y_i = \sum_{k=0}^{X_i/b - 1} \left(\frac{W_{i-1}}{2} + k\right) b + b_i = \frac{X_i W_{i-1}}{2} + \frac{X_i}{2}\left(\frac{X_i}{b} - 1\right) + b_i = \frac{X_i}{2}\left(W_{i-1} + \frac{X_i}{b} - 1\right) + b_i = \frac{X_i}{2}\left(W_i + \frac{W_{i-1}}{2}\right) + b_i$$

and again taking the expectation it follows:

$$N = \frac{E[X]}{2}\left(E[W] + \frac{E[W]}{2}\right) + E[b_i].$$

Assuming $b_i$ uniformly distributed between 1 and $W_{i-1}$ we can write $E[b_i] = E[W]/2$; therefore, solving for $E[X]$:

$$E[X] = \frac{b}{2}\left(-\frac{2+3b}{3b} + \sqrt{\frac{8N}{3b} + \left(\frac{2+3b}{3b}\right)^{2}} + 2\right) = \frac{3b-2}{6} + \sqrt{\frac{2bN}{3} + \left(\frac{2+3b}{6}\right)^{2}}.$$

Then it follows that

$$E[I_i] = RTT\left(\frac{3b-2}{6} + \sqrt{\frac{2bN}{3} + \left(\frac{2+3b}{6}\right)^{2}} + 1\right).$$

Now we can write down the throughput formula:

$$T(N, RTT) = n\,\frac{N}{RTT\,(E[X] + 1)} = n\,\frac{N}{RTT}\cdot\frac{1}{\frac{3b-2}{6} + \sqrt{\frac{2bN}{3} + \left(\frac{2+3b}{6}\right)^{2}} + 1}. \tag{11}$$

Footnote 2: The formula is a linear approximation of the exact relation $W_i = W_{i-1}/2 + \lceil X_i/b \rceil - 1$. In [21] a different approximation has been considered.


The dependence of the throughput on A, pin and pout is included in N through Eq. (10). Note that if AIN = A = 0 (i.e., there is only one class of packets) and $p_{out} = p \to 0$ we get the well-known formula [21]:

$$T(p, RTT) \simeq \frac{n}{RTT}\sqrt{\frac{3}{2bp}}.$$

Finally, as regards the average length of the in-sequence packet burst (L), from the previous remarks it simply follows that:

$$L = E[Y_i] - 1 = N - 1. \tag{12}$$
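Eqs. (10) and (11) are easy to evaluate numerically. The following Python sketch is an illustration only: the function names are hypothetical, and `brute_force_packets` is an added helper that sums the double series directly (for integer A) so that the closed form of Eq. (10) can be cross-checked.

```python
import math


def expected_packets(a, p_in, p_out):
    """N from Eq. (10): mean number of packets between two losses."""
    s_in, s_out = 1.0 - p_in, 1.0 - p_out
    if p_in == 0.0:
        geom = a + 1.0  # limit of (s_in**(a+1) - 1) / (s_in - 1) as s_in -> 1
    else:
        geom = (s_in ** (a + 1) - 1.0) / (s_in - 1.0)
    return geom / (1.0 - s_in ** a * s_out)


def brute_force_packets(a, p_in, p_out, k_max=2000):
    """Direct truncated sum of s_in^(kA+h) * s_out^k, for integer A."""
    s_in, s_out = 1.0 - p_in, 1.0 - p_out
    return sum(s_in ** (k * a + h) * s_out ** k
               for k in range(k_max) for h in range(a + 1))


def throughput(n, rtt, big_n, b=2):
    """T from Eq. (11) for n flows, round trip time rtt and N = big_n."""
    e_x = (3 * b - 2) / 6 + math.sqrt(2 * b * big_n / 3 + ((2 + 3 * b) / 6) ** 2)
    return n * big_n / (rtt * (e_x + 1))


# Single-class limit (A = 0, small p): compare with (n/RTT) * sqrt(3 / (2 b p)).
p = 1e-6
exact = throughput(1, 1.0, expected_packets(0, 0.0, p), b=2)
approx = math.sqrt(3 / (2 * 2 * p))
```

In the single-class limit the two values agree to within a fraction of a percent, consistent with the square-root formula quoted from [21].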

4.3. The marker model

We have already discussed PMA in this paper, and we have seen how the procedure marks one packet OUT every AIN IN-packets, where AIN is obtained by filtering Lseq with an autoregressive unitary-gain filter. Hence, denoting by A and L the average values of AIN and Lseq respectively, they are tied by the relation (see footnote 3):

$$A = L. \tag{13}$$

The relation between A and L is in accordance with the rationale discussed in Section 3. A different relation between A and L can be chosen by the provider:

$$A = m(L). \tag{14}$$

A change of the m() law leads to a different marking algorithm, for example one pursuing a different objective. We show an example in Section 6. As regards the fixed-point approximation, we observe that the previous relation looks more suitable the closer the system is to the state where $p_{in} \simeq 0$ and $p_{out} \simeq 1$. In fact, in the case of pin = 0, pout = 1 we would have AIN = Lseq, not simply A = L. In [20,22] we have shown that the algorithm exhibits optimal performance in the hard differentiation setting, which leads to $p_{in} \simeq 0$ and $p_{out} \simeq 1$. Hence the fixed-point approximation appears justified for PMA.

Footnote 3: A closer look at the algorithm reveals that this is an approximation, due to the update A := A + 1 after each OUT-packet transmission.

4.4. The network model

In [43] we have proposed a network submodel, extending the approach proposed in [35] for a best-effort scenario to a DiffServ one, where routers deploy RIO (we indicate the configuration parameters as (minout, maxout, Pmaxout) and (minin, maxin, Pmaxin) respectively for OUT and IN packets [1]). A limit of that approach is that TCP sources are intrinsically assumed to achieve full bottleneck utilization (an assumption also made in [31,35,40,23,27,41]), hence the model is able to predict average queue occupation, not link utilization. Besides, the model in [43] predicts a range of solutions when maxout < minin. These problems could be overcome by introducing queue variability in the model.

In this paper the approach is radically different; we consider that the queue can be modelled as an M/M/1/K queueing system. This allows us to evaluate the stationary distribution of the queue for a given offered load T, and then the average values we are interested in, i.e., RTT, pin and pout. As regards the assumption of Markovian arrivals, it seems to be justified when the TCP connection rate increases [44]. Anyway, M/M/1/K models have been widely employed in the literature and have shown good performance [45,26,30,46,47]. In particular our framework is similar to those of [30,47], which model respectively Token Bucket and Single Rate Three Color Marker, but our model is different because it assumes state-dependent arrivals, rather than uniform ones. These models (Fig. 4) take into account the presence of different classes of traffic and the effect of an AQM mechanism like RIO, but they assume that the dropping probability depends only on the instantaneous queue size, disregarding the effect of filtering.

[Fig. 4. Interaction between the Network model and the Sources model. Blocks: Sources, Network; exchanged quantities: T and RTT, p.]

According to [30], the stationary distribution of the queue, $\pi(i)$, can be evaluated as:

$$\pi(i) = \pi(0) \left(\frac{T}{C}\right)^{i} \prod_{j=0}^{i-1} (1 - p(j)), \qquad i = 1, 2, \ldots, max_{in},$$

where C is the bottleneck capacity, $\pi(0)$ is given by the normalization equation

$$\pi(0) = \left(1 + \sum_{i=1}^{max_{in}} \left(\frac{T}{C}\right)^{i} \prod_{j=0}^{i-1} (1 - p(j))\right)^{-1}$$

and

$$p(i) = \frac{T_{in}\, p_{in}(i) + T_{out}\, p_{out}(i)}{T_{in} + T_{out}} = \frac{A\, p_{in}(i) + p_{out}(i)}{A + 1}.$$

Note that we assumed maxout < maxin, and that it is useless to consider queue values greater than maxin because RIO drops all the incoming packets when the instantaneous queue is equal to maxin. Once $\pi(i)$ has been obtained, RTT, pin and pout can be evaluated as

$$RTT = R_0 + q/C = R_0 + \frac{1}{C} \sum_{i=0}^{max_{in}} i\,\pi(i), \tag{15}$$

$$p_{in} = \sum_{i=0}^{max_{in}} p_{in}(i)\,\pi(i), \tag{16}$$

$$p_{out} = \sum_{i=0}^{max_{out}} p_{out}(i)\,\pi(i), \tag{17}$$

where $R_0$ is the sum of the propagation and transmission delays.

We have followed such an approach, but the results are unsatisfactory. The physical explanation appears from Fig. 5(b) and (a), which show the empirical distribution coming from simulations and the queue distribution predicted by the model given the same average load, for three different configurations. The RIO settings in the legend are given in the form (minout, maxout, Pmaxout)–(minin, maxin, Pmaxin). According to the model the queue should exhibit a spread distribution, with high probability for low queue values (in particular the probability density is strictly decreasing if T < C), while the empirical distribution looks like a Gaussian one: the dynamic adaptive throughput of the TCP sources, which increase their throughput when RTT decreases and vice versa, appears to be able to create a sort of “constant bias”. In order to capture this behavior, we have modified the model in [30] by making the arrival rate depend on the network status. The input to the sub-model is

$$F(N) = T \cdot RTT = \frac{N}{\frac{3b-2}{6} + \sqrt{\frac{2bN}{3} + \left(\frac{2+3b}{6}\right)^{2}} + 1} \tag{18}$$

and the arrival rate when there are j packets in the queue is

$$T(j) = \frac{F}{R_0 + \frac{j}{C}}.$$

Now the stationary distribution can be evaluated as

$$\pi(i) = \pi(0) \prod_{j=0}^{i-1} \frac{T(j)}{C}\,(1 - p(j)), \qquad i = 1, 2, \ldots, max_{in}.$$

[Fig. 5. (a) Queue distribution predicted by the model with uniform arrivals. (b) Queue distribution obtained by simulations. (c) Queue distribution predicted by the model with state dependent arrivals. Each panel plots Probability against Queue size for RIO settings (2,6,0.2)–(8,24,0.05), (8,24,0.2)–(32,96,0.05) and (24,72,0.2)–(96,288,0.05).]

Fig. 5(c) shows the queue distribution evaluated by the new model. The similarity with Fig. 5(b) is impressive; the only difference is for the first configuration ((2, 6, 0.2)–(8, 24, 0.05)), as regards low queue occupancy. The peak for q = 0 is probably due to timeouts, which are more common with low RIO settings, and make TCP throughput less uniform and hence the Markovian arrival assumption less accurate.
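The state-dependent stationary distribution above can be computed with a few lines of Python. This is a hedged sketch, not the authors' implementation: the function names are hypothetical, and the per-class dropping profile `red_drop` assumes the usual linear RED ramp (zero below the minimum threshold, rising linearly to Pmax, and one at or above the maximum threshold) applied to the instantaneous queue size, as the model assumes.

```python
def red_drop(queue, min_th, max_th, p_max):
    """Assumed linear RED dropping profile on the instantaneous queue size."""
    if queue < min_th:
        return 0.0
    if queue >= max_th:
        return 1.0
    return p_max * (queue - min_th) / (max_th - min_th)


def queue_distribution(big_f, a, capacity, r0, rio_out, rio_in):
    """Stationary distribution pi(0..max_in) with arrival rate T(j) = F / (R0 + j/C).

    rio_out and rio_in are (min, max, Pmax) tuples for OUT and IN packets.
    """
    max_in = rio_in[1]
    weights = [1.0]  # unnormalised pi(0)
    for i in range(1, max_in + 1):
        j = i - 1
        # Dropping probability for a generic packet, weighted A : 1 (IN : OUT).
        p_j = (a * red_drop(j, *rio_in) + red_drop(j, *rio_out)) / (a + 1)
        t_j = big_f / (r0 + j / capacity)
        weights.append(weights[-1] * (t_j / capacity) * (1.0 - p_j))
    total = sum(weights)
    return [w / total for w in weights]


def mean_rtt(pi, capacity, r0):
    """Eq. (15): RTT = R0 + (mean queue) / C."""
    return r0 + sum(i * prob for i, prob in enumerate(pi)) / capacity


# Illustrative numbers only (F and A are arbitrary here, not fitted values).
pi = queue_distribution(big_f=60.0, a=20.0, capacity=500.0, r0=0.138,
                        rio_out=(8, 24, 0.2), rio_in=(32, 96, 0.05))
rtt = mean_rtt(pi, capacity=500.0, r0=0.138)
```

Because T(j) decreases as the queue grows, the product of the ratios T(j)/C is what pulls probability mass away from the empty-queue state compared with the uniform-arrival model.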

4.5. About the solutions of the system

Summarizing, our model has eight variables (N, A, T, L, RTT, pin, pout, F) and eight equations, Eqs. (10)–(13) and (15)–(18). In this section we show existence and uniqueness of solutions for this system. We are going to reduce the system to a simpler one with two variables (F and q). First, let us note that F can be expressed as an increasing function of A by Eqs. (18), (12) and (13). Besides, it can be proven that the ratio $\pi(i+1)/\pi(i)$ of the stationary queue probabilities increases with F and A alike, since $p_{in}(i) < p_{out}(i)$. Hence q, pin and pout are continuous increasing functions of F and A, and by the relation between F and A we can express them as increasing functions of F (e.g., from q = q(A, F) and A = A(F), q = q(A(F), F) = q(F)). These functions being invertible, we can express pin and pout as (increasing) functions of q. Further, the following results hold:

$$q(F = 0) = 0; \qquad \lim_{F \to +\infty} q(F) = max_{in}.$$

As regards A, from Eqs. (10) and (13) (A = L), we obtain that A is the solution (if any) of the following equation:

$$A + 1 = \frac{s_{in}^{A+1} - 1}{s_{in} - 1}\cdot\frac{1}{1 - s_{in}^{A}\, s_{out}}. \tag{19}$$

The right member of Eq. (19) (i.e., N) is an increasing function of A, $s_{in}$ and $s_{out}$, as appears immediately from the definition of N, taking into account that pin < pout. Since $1/p_{out} \le N \le 1/p_{in}$, the curve represented by the right member intersects the line represented by the left member in exactly one point (because N increases with A). Hence Eq. (19) admits one and only one solution A, and this solution increases with $s_{in}$ and $s_{out}$ (because N increases with $s_{in}$ and $s_{out}$). From the relation between pin, pout and q it follows that A is a decreasing function of q. Besides, when q converges to zero, pout converges to zero and A > 1/pout diverges, i.e.,

$$\lim_{q \to 0} A(q) = +\infty.$$

Let us focus on the expression of F, Eq. (18). From the relation between F and A and the relations established above, it appears that F is a decreasing function of q and:

$$\lim_{q \to 0} F(q) = +\infty; \qquad \lim_{q \to +\infty} F(q) = 0.$$

From the previous considerations and hypotheses it follows that the simplified system in F and q admits one and only one solution, as is qualitatively shown in Fig. 6. All the functions being monotone, the original system admits only one solution. It is possible to set up an iterative numerical procedure to find this solution, and this is just what we did using MATLAB.

[Fig. 6. Existence and uniqueness of the solution. Axes: q (up to maxin) and F; curves: F(q) and q(F).]
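As an illustration of Eq. (19), the burst length A can be found numerically for given dropping probabilities. The Python sketch below uses plain bisection rather than the MATLAB procedure mentioned in the text, whose details are not given; the function names and the tolerance are assumptions. The bracket [0, 1/pin] relies on the bounds 1/pout ≤ N ≤ 1/pin stated above, so the sketch requires 0 < pin < pout ≤ 1 and treats A as a real number.

```python
def expected_n(a, p_in, p_out):
    """Right member of Eq. (19), i.e. N from Eq. (10), for real-valued a."""
    s_in, s_out = 1.0 - p_in, 1.0 - p_out
    return (s_in ** (a + 1) - 1.0) / (s_in - 1.0) / (1.0 - s_in ** a * s_out)


def solve_burst_length(p_in, p_out, tol=1e-9):
    """Solve A + 1 = N(A) by bisection on [0, 1/p_in]."""
    if not 0.0 < p_in < p_out <= 1.0:
        raise ValueError("sketch assumes 0 < p_in < p_out <= 1")

    def gap(a):
        return expected_n(a, p_in, p_out) - (a + 1.0)

    lo, hi = 0.0, 1.0 / p_in  # gap(lo) >= 0 and gap(hi) < 0 by the bounds on N
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Illustrative values only.
a_star = solve_burst_length(p_in=0.01, p_out=0.5)
```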

5. Model validation

To validate our model we considered the network topology shown in Fig. 7, consisting of a single bottleneck link with capacity equal to 6 Mbps. Considering both the transmission and the propagation delay of packets and ACKs in the network, the average Round Trip Time is $R_0 \cong 138$ ms. The IP packet size is chosen to be 1500 bytes, which gives a bottleneck link capacity of c = 500 packets/s. As regards RIO configurations, we considered non-overlapping ones, in which maxout < minin; more precisely we chose maxout = 3 minout, maxin = 3 minin and minin = 4 minout. In previous performance evaluations this kind of setting showed better results in comparison with an overlapping RIO configuration, in which maxout ≥ minin. We tested seven


[Fig. 7. Network topology. Nodes: sources S1–S10, destinations D1–D10, routers E1–E4, C1 and C2. Link labels: 30 Mbps, 5 ms (avg); 60 Mbps, 19 ms; 6 Mbps, 19 ms (bottleneck).]

different configurations, varying minout from 2 up to 24, and for each configuration we gathered statistics from 10 trials of 1000 s each. We chose Pmaxout = 0.2 and Pmaxin = 0.05. We ran our simulations using ns v2.1b9a, with the Reno version of TCP.

Table 1 compares model predictions with simulation results when the number of flows is equal to n = 10, as regards throughput (T), goodput (G, see footnote 4), queue occupancy (q), the dropping probability for the generic packet, for IN-packets and for OUT-packets (respectively Pdrop, Pdropin, Pdropout), and the average length of IN-packet bursts. The mean error over the different settings and the maximum error are shown in the last two rows. The model appears to be able to predict with significant accuracy throughput, goodput and queue occupancy, which are the most relevant performance indexes when we consider the performance of long-lived TCP flows. On the contrary, dropping probability estimates are very inaccurate, in particular as regards Pdropin. We think the reason is that the model neglects the effect of filtering on the dropping probability calculation for RIO routers. In fact some preliminary results which take filtering into account seem to suggest that filtering: (i) can be neglected in order to evaluate the dynamics of the instantaneous queue, (ii) is significant for the evaluation of the dropping probabilities. In particular, probability estimates look better. At the moment we have introduced the effect of filtering by considering a two-dimensional Markov chain where the state is the pair of instantaneous queue and filtered queue (whose values have been quantized). This approach is particularly heavy from the computational point of view; for this reason, at the moment, we have not adopted it. The goodput/delay tradeoff is presented in Fig. 8, where each point corresponds to a different threshold setting.

Footnote 4: The goodput is estimated as G = T(1 − Pdrop).

We also evaluated the model with the same network topology and a different number of flows (n = 6, n = 20). The differences between model predictions and simulation results are similar to those observed for n = 10 flows. The relative errors for these two scenarios are shown in Table 2.

6. A model application

Here we want to show a possible application of our model. In particular we want to evaluate a new variant of the algorithm where a higher number of packets is marked OUT. Intuitively this new version should be able to react more quickly to traffic changes, by allowing more probes. With reference to the algorithm flowchart in Fig. 1, in the new marking scheme a packet is marked OUT every time CIN exceeds AIN/2. From the modeling point of view we only have to change the marking law m() (see Section 4.3) as follows:

$$A = \frac{1}{2} L.$$

Table 1
Model vs simulation with 10 flows (Sim. = simulation)

Quantity             T (packet/s)      G (packet/s)      q (packet)        A (packet)
RIO                  Sim.     Model    Sim.     Model    Sim.     Model    Sim.     Model
(2, 6)(8, 24)        474.87   478.02   467.82   467.57   8.54     7.86     57.35    51.51
(3, 9)(12, 36)       492.59   494.22   486.29   485.02   12.13    12.49    67.70    60.42
(4, 12)(16, 48)      499.76   499.25   494.27   490.94   14.90    16.72    79.50    67.17
(6, 18)(24, 72)      504.15   501.29   499.27   494.25   22.17    24.52    90.76    78.83
(8, 24)(32, 96)      504.39   501.10   499.88   495.02   29.24    32.04    97.96    90.33
(12, 36)(48, 144)    503.83   500.39   499.98   495.68   44.01    46.63    113.96   114.59
(16, 48)(64, 192)    503.23   499.93   499.99   496.15   58.57    60.89    132.07   141.06
(24, 72)(96, 288)    502.37   499.44   500.00   496.84   86.67    89.03    169.65   201.26
Mean error (%)       0.28              0.65              5.00              3.92
Max error (%)        0.68              1.00              12.22             18.63

Quantity             Pdrop (%)         Pdropin (%)       Pdropout (%)
RIO                  Sim.     Model    Sim.     Model    Sim.     Model
(2, 6)(8, 24)        1.455    2.185    0.194    1.009    75.099   62.791
(3, 9)(12, 36)       1.288    1.860    0.131    0.731    80.599   70.108
(4, 12)(16, 48)      1.135    1.665    0.090    0.580    84.961   74.590
(6, 18)(24, 72)      1.017    1.405    0.051    0.408    89.252   79.978
(8, 24)(32, 96)      0.939    1.214    0.037    0.303    89.453   83.494
(12, 36)(48, 144)    0.792    0.942    0.017    0.184    89.628   87.805
(16, 48)(64, 192)    0.670    0.757    0.012    0.121    88.069   90.452
(24, 72)(96, 288)    0.506    0.522    0.011    0.060    85.179   93.531
Mean error (%)       30.52             648.84            6.02
Max error (%)        50.24             991.51            16.39

The model predictions and the simulation results are shown in Fig. 9 as performance frontiers. The same RIO configurations have been considered for both the original algorithm and the variant, with minout ranging from 2 to 24, while the other parameters have been chosen according to Section 5. The model is able to capture the main change: the curve of the new variant is shifted towards lower utilization because of its higher sensitivity to congestion, but its shape is almost unchanged. To stress this point, two pairs of points are circled in the figure: they correspond to the minout = 12 configuration for the original algorithm and to minout = 16 for the new variant. The two algorithms are thus able to achieve almost the same performance with different configurations.

At the same time, simulation results show another effect that the model is not able to capture: the new variant also exhibits a higher queuing delay for average utilization. We think the reason is a higher traffic variability (the dropping probability of OUT-packets decreases from about 80% to about 50%), which produces larger queues. This effect is not addressed by the model because the network sub-model mainly takes into account the average throughput, assuming the same Markovian arrival process independently of the specific marking strategy.

7. Conclusions and further research issues

G. Neglia et al. / Computer Networks 50 (2006) 1176–1191

In this paper we have presented an analytical model for the adaptive packet marking scheme we proposed in previous work. Preliminary simulation results indicate that the model predictions of throughput and average queue occupancy are quite accurate. We have also shown that the model can be employed to evaluate variants of the original marking algorithm. We plan to extend the simulation evaluation and to employ such models to study possible variants of the marking algorithm and to establish optimal RIO settings. Further, our network sub-model exhibits some novelty and seems better suited than traditional M/M/1/K proposals to capture the behavior of long-lived TCP flows. We plan to study it in more depth and to evaluate it in a simpler best-effort scenario. We also want to evaluate the effect of filtering, which is usually neglected in M/M/1/K models but appears to have a deep impact on performance.

Fig. 8. Queue delay vs goodput for the PMA model and the corresponding ns2 simulations, together with simulation results of a standard best effort service (without marking). Axes: goodput (%) from 80% to 100% on the x-axis, queue delay (s) from 0 to 0.25 on the y-axis. Series: PMA - model, PMA - ns, best effort.
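Each point of a goodput/delay plot of this kind can be derived from a (G, q) pair. The sketch below is one hedged way to do the conversion, not the procedure documented by the authors: it assumes that the queueing delay is approximated by the average queue occupancy divided by the bottleneck capacity, and it assumes a capacity of 500 packet/s, a hypothetical value chosen because the simulated goodput in Table 1 saturates at about that figure. The function name `tradeoff_point` is illustrative.

```python
CAPACITY_PPS = 500.0  # assumed bottleneck capacity in packet/s (not stated in this section)


def tradeoff_point(goodput_pps, queue_pkts, capacity_pps=CAPACITY_PPS):
    """Return (goodput as a percentage of capacity, queueing delay in seconds)."""
    goodput_pct = 100.0 * goodput_pps / capacity_pps
    delay_s = queue_pkts / capacity_pps  # time needed to drain the average backlog
    return goodput_pct, delay_s


# Simulation values (G, q) from Table 1 for the first and last RIO settings
points = [tradeoff_point(467.82, 8.54), tradeoff_point(500.00, 86.67)]
for pct, delay in points:
    print(f"goodput = {pct:.1f}%  delay = {delay:.3f} s")
```

With these inputs the two points fall within the axis ranges of Fig. 8, which gives some support to the assumed capacity but does not confirm it.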

Table 2
Model vs simulation with 6 and 20 flows

Flows    Error            T (packet/s)  G (packet/s)  q (packet)  Pdrop (%)  Pdropin (%)  Pdropout (%)  A (packet)
n = 6    Mean error (%)   0.65          0.74          1.01        12.12      649.54       1.60          25.98
n = 6    Max error (%)    0.86          0.95          22.46       23.71      1141.30      28.58         58.46
n = 20   Mean error (%)   1.25          0.27          2.17        83.70      598.05       3.23          36.48
n = 20   Max error (%)    4.43          1.58          18.42       120.97     729.45       13.51         46.13
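The error rows in Tables 1 and 2 summarize how far the model is from simulation across the RIO settings. The source does not spell out the error formula; the sketch below assumes a signed relative error, (simulation - model) / simulation, averaged over the settings, with the maximum taken over absolute values. Under that assumption the throughput figures for n = 10 flows give 0.28% and 0.68%, the values reported for T in Table 1. The helper names are illustrative, and `goodput` applies the estimate G = T(1 - Pdrop) quoted in Section 5.

```python
def relative_errors(sim, model):
    """Signed relative errors in percent, one per configuration."""
    return [100.0 * (s - m) / s for s, m in zip(sim, model)]


def summarize(sim, model):
    """Return (|mean of signed errors|, max absolute error), both in percent."""
    errs = relative_errors(sim, model)
    mean_err = abs(sum(errs) / len(errs))
    max_err = max(abs(e) for e in errs)
    return mean_err, max_err


def goodput(throughput, p_drop_percent):
    """Goodput estimate G = T(1 - Pdrop), with Pdrop given in percent."""
    return throughput * (1.0 - p_drop_percent / 100.0)


# Throughput T (packet/s) for n = 10 flows, one value per RIO setting (Table 1)
t_sim = [474.87, 492.59, 499.76, 504.15, 504.39, 503.83, 503.23, 502.37]
t_model = [478.02, 494.22, 499.25, 501.29, 501.10, 500.39, 499.93, 499.44]

mean_err, max_err = summarize(t_sim, t_model)
print(f"mean error = {mean_err:.2f}%, max error = {max_err:.2f}%")
print(f"model goodput, first setting = {goodput(478.02, 2.185):.1f} packet/s")
```

Applied to the first model entry (T = 478.02 packet/s, Pdrop = 2.185%), the goodput estimate gives roughly 467.6 packet/s, in line with the tabulated model goodput of 467.57 packet/s.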

Fig. 9. A study of a variant of the PMA. Axes: goodput (%) from 85% to 99% on the x-axis, queue delay (s) from 0 to 0.2 on the y-axis. Series: A=L matlab, A=L/2 matlab, A=L ns, A=L/2 ns.



Giovanni Neglia received a Laurea degree in Electronic Engineering and a Ph.D. degree in telecommunications from the University of Palermo, Italy, in 2001 and 2005, respectively. Since 2005 he has been a postdoc at the University of Palermo, and he is currently visiting the Computer Networks Research Group at the University of Massachusetts, Amherst. His research interests include performance evaluation in IP networks, statistical models for IP data traffic, admission control mechanisms, TCP models, peer-to-peer networks, and videoconference architectures and protocols.

Vincenzo Falletta received the Laurea degree in Electronic Engineering from the University of Palermo, Palermo, Italy, in November 2003. From November 2003 to November 2004 he was involved in the Italian Research Project FIRB-TANGO, dealing with simulation and analytical models for advanced TCP-IP control techniques with QoS guarantees. Since November 2004 he has been a Ph.D. student at the Department of Electronic Engineering of the University of Rome "Tor Vergata".

Giuseppe Bianchi was Assistant Professor at the Politecnico di Milano, Italy, from 1993 to 1998, and Associate Professor at the University of Palermo, Italy, from 1998 to 2003. He is currently Associate Professor at the University of Roma Tor Vergata, Italy. He spent 1992 as Visiting Researcher at Washington University in St. Louis, Missouri, USA, and 1997 as Visiting Professor at Columbia University, New York. His research activity (documented in about 100 papers in peer-refereed international journals and conferences) spans several areas, including multiple access and mobility management in wireless local area networks, design and performance evaluation of broadband networking protocols, and Quality of Service support in IP networks. He was co-organizer of the first ACM workshop on Wireless Mobile Internet (ACM WMI 2001), of the first ACM workshop on Wireless Mobile Applications over WLAN Hot-spot (ACM WMASH 2003), and of the third IEEE international workshop on QoS in Multiservice IP networks (IEEE QoS-IP 2005). He was general chair of the second ACM workshop on Wireless Mobile Applications over WLAN Hot-spot (ACM WMASH 2004).
