Web Content Caching and Distribution
Web Content Caching and Distribution Proceedings of the 8th International Workshop
Edited by
Fred Douglis IBM T.J. Watson Research Center, Hawthorne, NY, U.S.A. and
Brian D. Davison Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, U.S.A.
KLUWER ACADEMIC PUBLISHERS NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 1-4020-2258-1
Print ISBN: 1-4020-2257-3
©2004 Springer Science + Business Media, Inc.
Print ©2004 Kluwer Academic Publishers, Dordrecht. All rights reserved.
No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher
Created in the United States of America
Visit Springer's eBookstore at http://www.ebooks.kluweronline.com and the Springer Global Website Online at http://www.springeronline.com
Contents
A Message from the Workshop Chairs   ix
Credits   xi
Contributing Authors   xiii

Part 1 – Mobility
Mobility-aware server selection for mobile streaming multimedia content distribution networks   1
  Muhammad Mukarram Bin Tariq, Ravi Jain, Toshiro Kawahara
Performance of PEPs in cellular wireless networks   19
  Pablo Rodriguez, Vitali Fridman

Part 2 – Applications
Edge caching for directory based Web applications: Algorithms and performance   39
  Apurva Kumar, Rajeev Gupta
Computing on the edge: A platform for replicating Internet applications   57
  Michael Rabinovich, Zhen Xiao, Amit Aggarwal
Scalable consistency maintenance for edge query caches: Exploiting templates in Web applications   79
  Khalil Amiri, Sara Sprenkle, Renu Tewari, Sriram Padmanabhan

Part 3 – Architectures
Proxy+: Simple proxy augmentation for dynamic content processing   91
  Chun Yuan, Zhigang Hua, Zheng Zhang
Synopsis: Multicast cloud with integrated multicast and unicast content distribution routing   109
  Dan Li, Arun Desai, Zheng Yang, Kenneth Mueller, Stephen Morris, Dmitry Stavisky
Synopsis: A large enterprise content distribution network: Design, implementation and operation   119
  Jacobus Van der Merwe, Paul Gausman, Chuck Cranor, Rustam Akhmarov
Synopsis: Architectural choices for video-on-demand systems   129
  Anwar Al Hamra, Ernst W. Biersack, Guillaume Urvoy-Keller

Part 4 – Multimedia
Dynamic cache reconfiguration strategies for a cluster-based streaming proxy   139
  Yang Guo, Zihui Ge, Bhuvan Urgaonkar, Prashant Shenoy, Don Towsley
Stream engine: A new kernel interface for high-performance Internet streaming servers   159
  Jonathan Lemon, Zhe Wang, Zheng Yang, Pei Cao
Streaming flow analyses for prefetching in segment-based proxy caching to improve delivery quality   171
  Songqing Chen, Bo Shen, Susie Wee, Xiaodong Zhang

Part 5 – Customization
Subscription-enhanced content delivery   187
  Mao Chen, Jaswinder Pal Singh, Andrea LaPaugh
Cooperative architectures and algorithms for discovery and transcoding of multi-version content   205
  Claudia Canali, Valeria Cardellini, Michele Colajanni, Riccardo Lancellotti, Philip S. Yu
Synopsis: User specific request redirection in a content delivery network   223
  Sampath Rangarajan, Sarit Mukherjee, Pablo Rodriguez

Part 6 – Peer-to-Peer
Friendships that last: Peer lifespan and its role in P2P protocols   233
  Fabian E. Bustamante, Yi Qiao
Synopsis: A fine-grained peer sharing technique for delivering large media files over the Internet   247
  Mengkun Yang, Zongming Fei

Part 7 – Performance and Measurement
Proxy-cache aware object bundling for Web access acceleration   257
  Chi Hung Chi, HongGuang Wang, William Ku
Synopsis: A case for dynamic selection of replication and caching strategies   275
  Swaminathan Sivasubramanian, Guillaume Pierre, Maarten van Steen
Synopsis: Link prefetching in Mozilla: A server-driven approach   283
  Darin Fisher, Gagan Saksena
Synopsis: A generalized model for characterizing content modification dynamics of Web objects   293
  Chi Hung Chi, HongGuang Wang

Part 8 – Delta Encoding
Server-friendly delta compression for efficient Web access   303
  Anubhav Savant, Torsten Suel
Evaluation of ESI and class-based delta encoding   323
  Mor Naaman, Hector Garcia-Molina, Andreas Paepcke

Author Index   345
A Message from the Workshop Chairs
Dear Participant:

Welcome to the 8th International Web Caching and Content Delivery Workshop. Since our first meeting in 1996, this workshop has served as the premier forum for researchers and industry technologists to exchange research results and perspectives on future directions in Internet content caching and content delivery.

This year we received 46 submissions, of which 15 have been selected as full-length papers and 8 as synopses. We extend our thanks to the authors of the selected papers, all of which are included in these proceedings. In addition to the technical presentations, we are pleased to have Bill Weihl of Akamai present the keynote address, and a panel discussion on uncachable content organized by Zhen Xiao of AT&T Labs – Research.

While originally scheduled to be held in Beijing, China, the workshop moved to the US this year as a result of the concerns over the SARS virus. We are indebted to our industrial sponsor, IBM, for providing the facilities in which to hold the workshop. The T.J. Watson Research Center that serves as our venue spans three sites across two states, and is the headquarters for the eight IBM research labs worldwide.

We are also grateful to the members of the program committee for helping to select a strong program, and to the members of the steering committee who continue to provide advice and guidance, even as plans are made for next year's workshop.

In past years, we have found great topics and fruitful discussion as people from industry and academia interact. We are confident that you will experience the same at this year's workshop.
Brian D. Davison General Chair
Fred Douglis Program Chair
Credits
General Chair
Brian D. Davison, Lehigh University

Program Chair
Fred Douglis, IBM T.J. Watson Research Center

Program Committee
Martin Arlitt, University of Calgary
Remzi Arpaci-Dusseau, University of Wisconsin
Chi-Hung Chi, National University of Singapore
Mike Dahlin, University of Texas at Austin
Fred Douglis, IBM T.J. Watson Research Center
Zongming Fei, University of Kentucky
Leana Golubchik, University of Southern California
Jaeyeon Jung, MIT LCS
Dan Li, Cisco Systems, Inc.
Guillaume Pierre, Vrije Universiteit, Amsterdam
Weisong Shi, Wayne State University
Oliver Spatscheck, AT&T Labs – Research
Renu Tewari, IBM Almaden Research Center
Amin Vahdat, Duke University
Geoff Voelker, University of California, San Diego
Zhen Xiao, AT&T Labs – Research

Steering Committee
Azer Bestavros, Boston University
Pei Cao, Cisco
Jeff Chase, Duke University
Valentino Cavalli, Terena
Peter Danzig, University of Southern California
John Martin, Network Appliance
Michael Rabinovich, AT&T Labs – Research
Wojtek Sylwestrzak, Warsaw University
Duane Wessels, The Measurement Factory
Keynote Speaker
William Weihl, Akamai Technologies, Inc.

Panel Moderator
Zhen Xiao, AT&T Labs – Research

Panelists
Indranil Gupta, University of Illinois, Urbana-Champaign
Arun Iyengar, IBM Research
Michael Rabinovich, AT&T Labs – Research
Torsten Suel, Polytechnic University
William Weihl, Akamai Technologies, Inc.

Session Chairs
Chi-Hung Chi, National University of Singapore
Brian D. Davison, Lehigh University
Fred Douglis, IBM T.J. Watson Research Center
Zongming Fei, University of Kentucky
Michael Rabinovich, AT&T Labs – Research
Pablo Rodriguez, Microsoft Research, Cambridge
Oliver Spatscheck, AT&T Labs – Research
Torsten Suel, Polytechnic University

External Reviewers
Benjamin Atkin
Yan Chen
Subhabrata Sen
Andrew Tridgell
Limin Wang
Craig Wills
Kun-Lung Wu
Contributing Authors
Amit Aggarwal
Microsoft
Anwar Al Hamra
Institut Eurecom
Rustam Akhmarov
AT&T Labs – Research
Khalil Amiri
Imperial College London
Ernst W. Biersack
Institut Eurecom
Fabian E. Bustamante
Department of Computer Science, Northwestern University
Pei Cao
Cisco Systems, Inc.
Claudia Canali
University of Parma
Valeria Cardellini
University of Roma “Tor Vergata”
Mao Chen
Department of Computer Science, Princeton University
Songqing Chen
College of William and Mary
Chi Hung Chi
National University of Singapore
Michele Colajanni
University of Modena and Reggio
Chuck Cranor
AT&T Labs – Research
Arun Desai
Cisco Systems, Inc.
Zongming Fei
Department of Computer Science, University of Kentucky
Darin Fisher
IBM
Vitali Fridman
Microsoft Research, Cambridge
Hector Garcia-Molina
Department of Computer Science, Stanford University
Paul Gausman
AT&T Labs – Research
Zihui Ge
Department of Computer Science, University of Massachusetts at Amherst
Yang Guo
Department of Computer Science, University of Massachusetts at Amherst
Rajeev Gupta
IBM India Research Lab
Zhigang Hua
Institute of Automation, Chinese Academy of Sciences
Ravi Jain
DoCoMo Communications Laboratories USA
Toshiro Kawahara
DoCoMo Communications Laboratories USA
William Ku
National University of Singapore
Apurva Kumar
IBM India Research Lab
Riccardo Lancellotti
University of Roma “Tor Vergata”
Andrea LaPaugh
Department of Computer Science, Princeton University
Jonathan Lemon
Cisco Systems, Inc.
Dan Li
Cisco Systems, Inc.
Stephen Morris
Cisco Systems, Inc.
Kenneth Mueller
Cisco Systems, Inc.
Sarit Mukherjee
Microsoft Research, Cambridge
Mor Naaman
Department of Computer Science, Stanford University
Sriram Padmanabhan
IBM Santa Teresa Lab
Andreas Paepcke
Department of Computer Science, Stanford University
Guillaume Pierre
Department of Mathematics and Computer Science, Vrije Universiteit, Amsterdam
Yi Qiao
Department of Computer Science, Northwestern University
Michael Rabinovich
AT&T Labs – Research
Sampath Rangarajan
Lucent Technologies Bell Laboratories
Pablo Rodriguez
Microsoft Research, Cambridge
Gagan Saksena
AOL
Anubhav Savant
CIS Department, Polytechnic University
Bo Shen
Hewlett-Packard Laboratories
Prashant Shenoy
Department of Computer Science, University of Massachusetts at Amherst
Jaswinder Pal Singh
Department of Computer Science, Princeton University
Swaminathan Sivasubramanian
Department of Mathematics and Computer Science, Vrije Universiteit, Amsterdam
Sara Sprenkle
Duke University
Dmitry Stavisky
Cisco Systems, Inc.
Torsten Suel
CIS Department, Polytechnic University
Muhammad Mukarram Bin Tariq
DoCoMo Communications Laboratories USA
Renu Tewari
IBM Almaden Research Center
Don Towsley
Department of Computer Science, University of Massachusetts at Amherst
Bhuvan Urgaonkar
Department of Computer Science, University of Massachusetts at Amherst
Guillaume Urvoy-Keller
Institut Eurecom
Jacobus Van der Merwe
AT&T Labs – Research
Maarten van Steen
Department of Mathematics and Computer Science, Vrije Universiteit, Amsterdam
HongGuang Wang
National University of Singapore
Zhe Wang
Cisco Systems, Inc.
Susie Wee
Hewlett-Packard Laboratories
Zhen Xiao
AT&T Labs – Research
Zheng Yang
Cisco Systems, Inc.
Mengkun Yang
Department of Computer Science, University of Kentucky
Philip S. Yu
IBM T.J. Watson Research Center
Chun Yuan
Microsoft Research Asia
Xiaodong Zhang
College of William and Mary
Zheng Zhang
Microsoft Research Asia
MOBILITY AWARE SERVER SELECTION FOR MOBILE STREAMING MULTIMEDIA CONTENT DISTRIBUTION NETWORKS

Muhammad Mukarram Bin Tariq, Ravi Jain, and Toshiro Kawahara
DoCoMo Communications Laboratories USA, Inc.
Abstract
We propose a Content Delivery Network (CDN) with servers arranged hierarchically in multiple tiers. Lower-tier servers are topologically closer to the clients, and hence can deliver better QoS in terms of end-to-end delay and jitter. On the other hand, higher-tier servers have a larger coverage area and hence their clients incur fewer server handoffs. We present a server selection scheme that reduces the number of server handoffs while meeting differentiated QoS requirements for each client. The scheme dynamically estimates the client's residence time and uses a simple algorithm to assign clients to the appropriate tier. The scheme also caters for traditional server selection criteria, such as the expected QoS from the servers, bandwidth consumption, and the server load. We show through simulations that the scheme can achieve up to a 15% reduction in handoffs, at the cost of minimal increases in delay and jitter, while ensuring that clients of different QoS classes experience different delays.
1. Introduction and Overview
We expect that multimedia and data traffic will surpass the traditional voice traffic in mobile networks by the year 2005 [18]. High quality streaming multimedia content is likely to form a significant portion of this traffic. It is therefore important that large mobile networks find ways to manage client traffic and data efficiently. Content distribution networks (CDN) have proven effective for managing content-based traffic for large numbers of clients in the Internet. CDN consist of surrogate servers that replicate the content of the origin servers and serve it to the clients. CDN employ server selection and request redirection methods for selecting an appropriate (surrogate) server and redirecting the client's request to that server. CDN reduce load on both the network and origin server by localizing the traffic and providing many alternate sources of content. Iyengar et al. [8] provide an overview of existing CDN technologies. Although there has been much work on the Internet-wide CDN, CDN for mobile
networks have received little attention so far. Mobile CDN proposals, such as [4], target static WAP or HTML content for mobile clients, while only [16, 17] consider streaming multimedia content distribution for mobile networks. This lack of attention has largely been due to the limited Internet access capabilities of most mobile terminals of the recent past. However, this is changing quickly with the deployment and acceptance of 3G services [11] that allow mobile terminals to access the Internet and other data services at speeds comparable to traditional wired access. In this paper, we consider server selection algorithms for streaming multimedia content distribution in networks with mobile users. The large size of most multimedia content, the long-lived nature of typical multimedia sessions, client mobility, and the capricious nature of the wireless communication medium, put together, present an interesting challenge. A CDN for mobile networks must address all of these issues. In [16], Tariq et al. show that QoS, in terms of delay and jitter, can be significantly improved using server handoff: a process of changing the server as the client moves so that the client continues to receive content from a nearby server. Server handoff itself, however, can have adverse effects. It can disrupt the streaming from the server, causing glitches at the client. Moreover, changing the server can be an expensive process for the network. Before a session can be handed over to a new server, sufficient content must be pre-fetched (or placed) at the server to ensure smooth delivery [15]. Random and frequent client movements can cause significant signaling and content placement overhead. One approach to the problem of stream disruption is to mitigate it by sufficient buffering at the client equipment, at the cost of increased playback delay. Another approach is to use make-before-break handoffs. However, reducing the actual number of handoffs is not trivial. In [16], the authors propose to delay the server handoff process to reduce the number of handoffs and the resulting overhead. In this paper we present a more sophisticated server selection scheme. Our scheme reduces the number of server handoffs for mobile clients by using client mobility information and selecting a server with which the client can remain associated for an extended period, thus reducing the need for server handoffs. We also cater for traditional server selection criteria such as expected QoS (in terms of delay and jitter) from the server, traffic localization, and server load. The basic trade-off that we explore is how to maintain QoS service differentiation for clients while reducing the network cost due to server handoffs. The rest of this paper is organized as follows. Section 2 describes our mobility-based server-selection scheme. Sections 3 and 4 are dedicated to simulation and results. In Section 5, we discuss related work, and we conclude the paper in Section 6.
2. Mobility Based Server Selection
Our approach is to use the client's mobility information as an additional metric in the server selection process, along with traditional metrics such as client-server proximity, expected QoS, content availability, and server load. We try to select a server that will remain suitable for content delivery to a client for an extended period of time, thus eliminating the need for frequent server handoffs.
2.1 Layout of servers in content distribution network
We consider a content distribution network with a relatively large number of servers arranged in a logically hierarchical topology, where the servers closest to the access network belong to the lowest tier (see Figure 1). This topology allows us to maximize traffic localization and obtain the desired trade-off between the number of server handoffs and the QoS to the clients. These advantages become further apparent in the following sections. Each server has a coverage area defined as the subnets and cells in the underlying transport network (each subnet may consist of one or more cells). In general, each subnet (and cell) is in the coverage area of the closest server at any given tier, but multiple servers may serve subnets at coverage-area boundaries. Servers at higher tiers have larger coverage areas and, therefore, multiple tiers cover every subnet. We use the term server zone to refer to the coverage area of the lowest tier servers.

Figure 1. Tiered Topology of Servers in CDN (tier 1 server coverage areas, also called server zones, sit directly above the access network subnets, while tier 2 and tier 3 servers cover progressively larger areas).
As in [16], a server handoff is performed whenever a client moves from the coverage area of one server to that of another server. However, with the multi-tier arrangement of servers, there is an opportunity to trade off QoS against the number of server handoffs, by carefully choosing the tier of the server assigned after handoff. Due to the topological arrangement of the servers, it is likely that the data delivered from the servers in lower tiers will suffer less delay (and probably less jitter) than the data delivered from the servers in higher tiers; however, assignment to servers in lower tiers will result in more server handoffs and may increase the overhead for the network. We assume that clients have different categories of QoS requirements in terms of delay and jitter. We further assume that each of these categories of QoS can be roughly characterized as the property of a particular server tier. For example, we can assume that the highest QoS class, with strict delay and jitter requirements (QoS class 1), maps to the lowest tier server. A fixed mapping of this type is, however, not necessary. Other QoS metrics, such as available bandwidth, server response time, and media quality, may be used for server classification. Several dynamic QoS probing and measurement methods, such as those in [5, 10], can be used to dynamically classify servers based on different QoS metrics.
We assume that there is a Request Routing function (RR) in each server zone; recall a server zone is the coverage area of lowest-tier servers. We recommend a proxy-based RR approach because it provides greater control and flexibility but other request redirection techniques, such as those discussed in [1], may also be used. In case of proxy-based RR, the function can co-reside with the lowest tier servers. As the client moves from one server zone to another, it is assigned to a new serving RR. The new RR selects servers for any new sessions that the client may request and also initiates and coordinates server handoffs for the client’s existing sessions, if needed. We assume that the RR has sufficient knowledge about the CDN topology and the client mobility information to select appropriate servers and route the requests correctly. We describe the information collection and server selection process in the following.
2.2 Considerations in mobility based server selection
We must consider the following in mobility based server selection. Using lower tier servers reduces delay and jitter, but increases the number of server handoffs. Therefore, we must serve the users with high QoS requirements from the servers in the lower tiers and serve the users with high mobility from the servers in the higher tiers. However, there will be users that have high mobility and high QoS requirements at the same time. We must find a way to assign servers to clients with such conflicting characteristics. If disproportionate numbers of clients in a region have similar mobility characteristics, they may overload a certain tier. Therefore, we must perform load balancing among tiers. As an example, consider the mobility patterns in commute hours. Since many users are mobile, the mobility rate is high for many clients. Similarly, in a region that spans high-speed freeways, the average user mobility would be higher than in regions that cover residential areas. Reducing handoffs in high mobility rate regions requires that we serve users from servers in higher tiers. However, if we do not consider server load, servers in the higher tiers may become overloaded. Finally, when client requests belong to various differentiated QoS classes, we must maintain a desired differentiation of the mean QoS for sessions of each QoS class.
2.3 Measurement of mobility rate and server residence time estimation
We use client’s expected residence time as a measure of its mobility and as a parameter in the server selection process. The residence time is defined as the time that elapses between when a client enters a region and when it leaves that region. Each RR maintains the residence time value ri for every client i in its server zone. This value is computed by the clients themselves as a running average of the residence time over their k most recent movements, and reported to the serving RR upon each movement. This movement can be a movement across a cell region, a subnet region or any other lower layer region boundary, but the information based on
movements across subnet boundaries can be easily collected and reported using protocols such as Mobile IP [6] binding update messages. Each RR also maintains the average subnet residence time r over all the clients in its server zone. In addition, the RR maintains the mean server residence time R_t over all servers in each tier t. For the server residence time, we consider the server's coverage area as the region. After every successful handoff, the new RR sends a server handoff notification message to the old RR. This message includes the client identifier, session identifier, and the identifiers of the new and the old servers involved in the handoff. Using the information contained in this message, the old RR determines the server residence time as the difference between the time at which the client was originally assigned to the old server and the time of the server handoff. Information about the timing of the original assignment can either be contained in the notification message, or can be assumed to be maintained by the old RR since it is in the path of session control signaling traffic. The server tier can be determined using the server identifier. The old RR updates the mean server residence time for the corresponding tier after each handoff notification message. Equipped with measurements of individual and aggregate residence times, an RR can estimate the residence time of a client i with a server at tier t as:
E_{i,t} = (r_i / r) · R_t        (1)
In a refinement to the mobility information measurement, every RR maintains the average subnet residence time r_s and the server tier residence time R_{t,s} separately for clients in each subnet s in its coverage area, and uses these values for estimating the residence time of a client in subnet s. This improves the accuracy of the estimation because the extent of a client's mobility can be more accurately determined by comparing it with other clients in its close geographical proximity (in this case, a subnet) than by comparing it with clients over a larger area. With this refinement, the expected residence time of a client i in subnet s with a server in tier t is given as:
E_{i,t} = (r_i / r_s) · R_{t,s}        (2)
In the following text, we refer to equation (1) and equation (2) as low granularity residence time estimation and high granularity residence time estimation respectively. Several mobility prediction schemes have been proposed (e.g., [2], see [3] for a survey), which can be used by our algorithm with minor modifications. These schemes will likely provide better accuracy but only at the expense of extensive state maintenance and large overhead.
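To make the bookkeeping of this subsection concrete, the following Python sketch shows one way an RR could maintain these statistics and evaluate equations (1) and (2). The class and method names are illustrative assumptions, not part of the paper, and the exponential running average in record_handoff is one possible choice; the text only specifies running averages.

class ResidenceEstimator:
    """Sketch of the residence-time statistics kept by an RR (illustrative names)."""

    def __init__(self):
        self.client_r = {}       # r_i: running-average subnet residence time reported by client i
        self.zone_r = 1.0        # r:   mean subnet residence time over all clients in the zone
        self.subnet_r = {}       # r_s: mean subnet residence time per subnet s (refinement)
        self.tier_R = {}         # R_t: mean server residence time per tier t
        self.tier_subnet_R = {}  # R_{t,s}: mean server residence time per (tier, subnet)

    def record_handoff(self, tier, assigned_at, handoff_at, alpha=0.1):
        # Old RR: server residence time = handoff time minus original assignment time,
        # folded into the running mean for that tier (exponential average assumed here).
        sample = handoff_at - assigned_at
        self.tier_R[tier] = (1 - alpha) * self.tier_R.get(tier, sample) + alpha * sample

    def estimate_low(self, client, tier):
        """Equation (1): E_{i,t} = (r_i / r) * R_t."""
        return self.client_r[client] / self.zone_r * self.tier_R[tier]

    def estimate_high(self, client, subnet, tier):
        """Equation (2): E_{i,t} = (r_i / r_s) * R_{t,s}."""
        return (self.client_r[client] / self.subnet_r[subnet]
                * self.tier_subnet_R[(tier, subnet)])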
2.4 Server load and QoS information collection
Each server periodically sends information to all the RRs in its coverage area about the number of sessions that it can accept from each RR. Each server observes the load from each RR (i.e., the number of sessions emanating from the RR's server-zone) and allocates its remaining capacity to each RR proportionately to the load observed from that RR; we call this the server load allowance. The sum of the load allowances reported to an RR by all servers at tier t is denoted as L_t. With our hierarchical topology of the CDN, the number of RRs in the coverage area of each server is likely to be low, thus dispersal of such information is feasible. Each RR obtains information about the nominal QoS obtained from each tier. This can be done using QoS probing techniques, such as in [5, 10], or QoS feedback information, such as through RTCP [14], or by other means. For the following discussion, we assume delay as the primary QoS metric, but other QoS metrics can be used in a similar manner with our algorithm. The nominal delay information for a tier t is represented as D_t. Each RR also maintains the number of client requests of each QoS class q that it sends to tier t as N_{q,t}.
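As a rough illustration of the two pieces of state introduced here, the sketch below computes per-RR load allowances on the server side and the weighted per-class delay D_q on the RR side; the proportional split for the zero-load case and the function names are our assumptions.

def load_allowances(spare_capacity, sessions_from_rr):
    """Server side: divide remaining capacity among RRs in proportion to observed load."""
    total = sum(sessions_from_rr.values())
    if total == 0:
        n = len(sessions_from_rr) or 1
        return {rr: spare_capacity / n for rr in sessions_from_rr}  # even split (assumption)
    return {rr: spare_capacity * s / total for rr, s in sessions_from_rr.items()}

def nominal_class_delay(q, N, D, tiers):
    """RR side: D_q = sum_t N_{q,t} * D_t / sum_t N_{q,t} (used later by meanDelay)."""
    weight = sum(N.get((q, t), 0) for t in tiers)
    if weight == 0:
        return 0.0
    return sum(N.get((q, t), 0) * D[t] for t in tiers) / weight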
2.5 Server selection algorithm
Listing 1 shows the pseudo-code for the server selection process. When an RR needs to select a server for a request to establish a new session of QoS class q, it starts by selecting the highest possible server tier t whose reported nominal delay D_t meets or exceeds the requirements of QoS class q. If the request is for a server handoff of an existing session, then the RR sets t to the tier of the currently assigned server. The RR then uses a heuristic (lines 4-7 in Listing 1) to determine whether the client should be assigned to tier t+1. It finds the load allowance L_{t+1} for tier t+1, and determines whether the requesting client i is among the fastest clients. The intuition here is that since the load allowance is limited and only a limited number of clients can be served from higher tiers, the reduction in the number of server handoffs can be maximized if only the fastest clients are served from the higher tiers. Instead of relying on any absolute value of residence time or speed that would qualify a client for service from higher tiers, we simply use the client's relative mobility with respect to the other clients in its region and see if the client is fast enough. The heuristic used to determine whether a client i is "fast enough" is to determine whether the client is among the L_{t+1} fastest clients in the server zone. This heuristic is based on the assumption that the residence time distributions of the clients in the recent past are similar to those of the immediate future. The RR sums the load due to all the clients whose residence times are less than that of client i, and determines whether it is less than the load allowance L_{t+1}; if so, the client i is a candidate to be moved to tier t+1. The RR now determines whether moving the client i to tier t+1 will maintain the desired separation among the adjacent QoS classes. The comparison is performed using the weighted mean QoS of the client sessions of each QoS class (lines 12 and 23-25 in Listing 1). If the QoS separation is maintained, the client is assigned to a server at tier t+1.
If either of the above conditions is not satisfied, i.e., if the client is not "fast enough" or the QoS differentiation would be violated, the RR checks whether the client should be moved to tier t-1 (lines 10-16 in Listing 1). For this, the load allowance of tier t-1 must be positive. Additionally, the RR must ensure that moving the client to a lower tier would not increase the number of handoffs. This is only possible if the client is relatively slow moving. This is determined by calculating whether the estimated residence time for this client in the lower tier, E_{i,t-1}, exceeds the average residence time of the current tier t (line 11 in Listing 1). If E_{i,t-1} is greater than the average residence time of the current tier, it is an indication that the client i is slow moving, and thus assigning it to a lower tier will not increase the number of server handoffs.
Listing 1: Mobility Based Server Selection Algorithm
1.  proc selectServer (i: Client, q: QoS, t: serverTier, mode: String)
2.    selectedTier ← t;
3.    L_{t+1} ← findLoadAllowance (t+1)
4.    if clientInFastest (L_{t+1}, i)
5.      if (qosSeparationMaintained(q))
6.        selectedTier ← t+1;
7.      endif
8.    elseif (mode = "eager" || (mode = "lazy" & LoadAllowanceLow (L_{t+1}, t+1)))
9.      L_{t-1} = findLoadAllowance (t-1)
10.     if (L_{t-1} > 0)
11.       if (E_{i,t-1} > R_t)
12.         if (qosSeparationMaintained(q))
13.           selectedTier ← t-1;
14.         endif
15.       endif
16.     endif
17.   endif
18.   serverIdentifier ← findServer(selectedTier)
19.   return serverIdentifier;
20. endproc
21.
22. proc qosSeparationMaintained (q: QoSClass): bool
23.   return (meanDelay(q+1) - meanDelay(q) > δ_{q,q+1}) &
24.          (meanDelay(q) - meanDelay(q-1) > δ_{q-1,q})   %% δ is the desired delay separation of QoS classes
25. endproc
26.
27. proc meanDelay (q: QoSClass): QoS
28.   T ← number of server tiers that cover the server-zone of the RR
29.   D_q ← ( Σ_{t=1..T} N_{q,t} · D_t ) / ( Σ_{t=1..T} N_{q,t} )   %% weighted mean QoS of class q
30.   return D_q
31. endproc
32. proc clientInFastest (Allowance L, Client i): bool
33.   C ← total number of clients of the RR
34.   return ( Σ_{j=1..C, r_j < r_i} load_j < L )   %% total load of clients faster than i, compared with allowance L
35. endproc
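For readers who prefer an executable form, here is a hedged Python rendering of Listing 1. The rr object is assumed to expose the statistics from Sections 2.3 and 2.4 (load allowances, residence-time estimates, per-class mean delays); a single separation value delta replaces the per-pair deltas of the listing for simplicity, and the reading of the clientInFastest sum as a per-client load total follows the textual description above. This is a sketch under those assumptions, not the authors' implementation.

def select_server(rr, i, q, t, mode, delta):
    """Sketch of selectServer (Listing 1, lines 1-20): pick a tier for client i of QoS class q."""
    selected_tier = t
    allowance_up = rr.load_allowance(t + 1)                        # L_{t+1}
    if client_in_fastest(rr, allowance_up, i):
        if qos_separation_maintained(rr, q, delta):
            selected_tier = t + 1
    elif mode == "eager" or (mode == "lazy" and rr.allowance_low(allowance_up, t + 1)):
        if rr.load_allowance(t - 1) > 0:                           # L_{t-1} > 0
            # Move down only if the client would stay long enough: E_{i,t-1} > R_t.
            if rr.estimated_residence(i, t - 1) > rr.mean_tier_residence(t):
                if qos_separation_maintained(rr, q, delta):
                    selected_tier = t - 1
    return rr.find_server(selected_tier)

def qos_separation_maintained(rr, q, delta):
    """Listing 1, lines 22-25: adjacent QoS classes must stay at least delta apart in mean delay."""
    return (rr.mean_delay(q + 1) - rr.mean_delay(q) > delta
            and rr.mean_delay(q) - rr.mean_delay(q - 1) > delta)

def client_in_fastest(rr, allowance, i):
    """Listing 1, lines 32-35: sum the load of clients faster (smaller residence time) than i
    and check whether that load fits within the allowance of the next tier."""
    faster_load = sum(rr.client_load(j) for j in rr.clients()
                      if rr.residence(j) < rr.residence(i))
    return faster_load < allowance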
COMPUTING ON THE EDGE: A PLATFORM FOR REPLICATING INTERNET APPLICATIONS

Michael Rabinovich, Zhen Xiao, and Amit Aggarwal

If B_i/(A + U) > 1, then the amount of bandwidth that would be saved by serving these requests from the nearby server i would exceed the amount of bandwidth consumed by shipping the application to server i and by keeping the new replica fresh. Hence, the bandwidth overhead of replication would be fully compensated by the benefits from the new replica within one time period until the next execution of the placement algorithm, provided the demand patterns remain stable during this time. In fact, we might choose to replicate even if this ratio is less than one (e.g., if we are willing to trade bandwidth for latency), or only allow replication when this ratio exceeds a threshold that is greater than one (e.g., to safeguard against a risk that a fast-changing demand might not allow enough time to compensate for the replication overhead). Hence, the algorithm provides the redeployment threshold R, and replicates the application on server i if B_i/(A + U) > R. One caveat is the possibility of a vicious cycle of creating a new replica without enough demand, which will then be deleted because of the deletion threshold. Thus, we factor the deletion threshold into the replication decision and arrive at the following final replication criterion: To create a replica on server i, the demand from i's vicinity, B_i, should be such that B_i/(A + U) > R and B_i > D. The migration threshold M governs the migration decision. The application is migrated only if the fraction of demand from the target server exceeds M, B_i/B_total > M, and if the bandwidth benefit would be sufficiently high relative to the overhead, B_i/A > R. Note that the latter condition does not include the update bytes because migration does not increase the number of replicas. To avoid endless migration back and forth between two servers, we require that the migration threshold be over 50%; we set it at 60% in our experiments.
³ An alternative is to express it as the request rate. The implications and analysis of these two ways to express the deletion threshold are left for future work.
DecidePlacement():   /* Executed by server s */
  if load_s > HW, offloading = Yes;
  if load_s < LW, offloading = No;
  for each application app
    if B_total ≤ D
      delete app unless this is the sole replica
    elseif B_i/(A + U) > R AND B_i > D for some server i
      replicate app on server i
    elseif B_i/B_total > M AND B_i/A > R for some server i
      if server i accepts app
        migrate app to server i
      endif
  endfor
  if no application was deleted, replicated or migrated AND offloading = Yes
    find out the least loaded server t from the central replicator
    while load_s > LW AND not all applications have been examined
      let app be the unexamined application with the highest ratio of non-local demand, B_total/B_s;
      if B_s > D and B_t > D
        replicate app on t
        load_s = load_s - load_s(app, t)
      else
        if server t accepts app
          migrate app to t
          load_s = load_s - load_s(app)
        endif
      endif
    endwhile
  endif
end
Figure 3. Replica placement algorithm. load_s denotes the load on server s, load_s(app, t) denotes the load on server s due to demand for application app coming from clients in server t's area, and load_s(app) denotes the load on server s due to application app.
The above considerations hold when the server wants to improve the proximity of servers to client requests. Another reason for replication is when the current server is overloaded. In this case, it might decide to replicate or migrate some applications regardless of their proximity or their demand characteristics. So, if the server is overloaded, it queries the central replicator for the least-loaded server and, if the application cannot be replicated there (because the new replica might be deleted again), migrates the application to that server unconditionally. To add stability to the system, we use a standard watermarking technique. There are two load watermarks, high watermark HW and low watermark LW. The server considers itself overloaded if its load reaches the high watermark; once this happens, the server continues considering itself overloaded until its load drops below the low watermark. Again, one must avoid vicious cycles. The danger here is that after migrating an application to server i, the current server's
load reaches the high watermark; once this happens, the server continues considering itself overloaded until its load drops below the low watermark. Again, one must avoid vicious cycles. The danger here is that after migrating an application to server i, the current server’s load drops to the normal range, and the application would then migrate right back to the current server (because its initial migration worsened the client proximity). To prevent this, the target server only accepts the migration request if its projected load after receiving the application will remain acceptable (that is, below low watermark). This resolves the above problem because the current server will not accept the application back. This also prevents a herd effect when many servers try to offload to an underloaded server at once. To allow load predictions, the source server must apportion its total load to the application in question, loads (app), and to the requests that come from the target server’s area, loads (app, t). As a crude estimate, we can apportion the total load in proportion to the number of relevant requests.
5.2 Request distribution algorithm
The goal of the request distribution algorithm is to direct requests to the nearest non-overloaded server with a replica of the application. However, an intuitive algorithm that examines the servers in the order of increasing distance from the client and selects the first non-overloaded server for the request (similar to the algorithm described in [2]) can cause severe load oscillations due to a herd effect [4]. Furthermore, our simulations show that randomization introduced by DNS caching may not be sufficient to eliminate the herd effect. The algorithm used in the RaDaR system [17] does not suffer from the herd effect but often chooses distant servers even when closer servers with low load are available. Thus, our main challenge was to find an algorithm that never skips the nearest non-overloaded server and yet reduces oscillations in request distribution. An additional challenge was to make the algorithm compatible with our system environment. We use iDNS as our load-balancing DNS server [2]. For a given application,⁴ iDNS expects a request distribution policy in the form of tuples (R, Prob(1), . . . , Prob(N)), where R is a region and Prob(i) is the probability of selecting server i for a request from this region. Regions can be defined in a variety of ways and can be geographical regions (e.g., countries) or network regions (e.g., autonomous systems or BGP prefixes). For the purpose of this paper, we assume that each server i in the system is assigned a region Ri (represented as a set of IP addresses) for which this server is the closest. We also assume some distance metric between a region Ri and all servers that allows one to rank all servers according to their distance to a given region. The question of what kind of a distance metric is the most appropriate is a topic of active research in its own right; different CDNs use proprietary techniques to derive these rankings, as well as to divide client IP addresses into regions.
⁴ We assume that every application uses a distinct (sub)domain name.
for each region R do
  for every server j in the system
    Prob(j) = 0;
  endfor
  for each server i_k with a replica of the application do
    if load(i_k) < LW
      Prob(i_k) = 1;
    elseif load(i_k) > HW
      Prob(i_k) = 0
    else
      Prob(i_k) = (HW - load(i_k)) / (HW - LW)
    endif
  endfor
  residue = 1.0
  Loop through the servers with a replica of the application in the order of increasing distance from region R
  for each such server i_k do
    Prob(i_k) = residue * Prob(i_k)
    residue = residue - Prob(i_k)
  endfor
  let total be the sum of the Prob array computed above
  if total > 0
    for each server i_k with a replica of the application
      Prob(i_k) = Prob(i_k) / total
    endfor
  else
    for each server i_k with a replica of the application
      Prob(i_k) = 1/n, where n is the number of replicas
    endfor
  endif
  output (R, Prob)
endfor
Figure 4. Algorithm for computing request distribution policy for a given application.
In our ACDN, the central replicator computes the request distribution policy in the above format and sends it to iDNS. The policy is computed periodically based on the load reports from ACDN servers (obtained by accessing load reporter scripts as discussed in Section 1.3), and also whenever the set of replicas for the application changes (i.e., after a replica deletion, migration or creation).5 The computation uses the algorithm shown in Figure 4.
⁵ Technically, the policy is computed by a control module within iDNS that we modified; however, because this module can run on a different host from the component that actually answers DNS queries, we chose to consider the control module to be logically part of the central replicator.
Let the system contain servers 1, . . . , N , out of which servers i1 , . . . , in contain a replica of the application. The algorithm, again, uses low watermark LW and high watermark HW, and operates in three passes over the servers. The first pass assigns a probability to each server based on its load. Any server with load above HW gets zero weight. If the load of a server is below low watermark, the server receives unity weight. Otherwise, the algorithm assigns each examined server a weight between zero and unity depending on where the server load falls between the high and low watermarks. In the second pass, the algorithm examines all servers with a replica of the application in the order of the increasing distance from the region. It computes the probabilities in the request distribution policy to favor the selection of nearby servers. The third pass simply normalizes the probabilities of these servers so that they sum up to one. If all servers are overloaded, the algorithm assigns the load evenly among them.
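A compact Python version of the three passes just described, assuming a precomputed nearest-to-farthest ordering of the replicas for the region (the function and variable names are illustrative):

def request_distribution(replicas_by_distance, load, LW, HW):
    """Return a dict replica -> selection probability for one region.

    replicas_by_distance: replica ids ordered from nearest to farthest for this region
    load: dict replica id -> current load report
    """
    # Pass 1: per-replica weight in [0, 1] based on where its load falls between the watermarks.
    prob = {}
    for r in replicas_by_distance:
        if load[r] < LW:
            prob[r] = 1.0
        elif load[r] > HW:
            prob[r] = 0.0
        else:
            prob[r] = (HW - load[r]) / (HW - LW)
    # Pass 2: nearer replicas claim their share of the remaining probability mass first.
    residue = 1.0
    for r in replicas_by_distance:
        prob[r] = residue * prob[r]
        residue -= prob[r]
    # Pass 3: normalize; if every replica is overloaded, spread requests evenly.
    total = sum(prob.values())
    if total > 0:
        return {r: p / total for r, p in prob.items()}
    n = len(replicas_by_distance)
    return {r: 1.0 / n for r in replicas_by_distance}

In this sketch a nearby replica whose load is below the low watermark keeps the entire residue, so it is never skipped, while replicas between the watermarks receive only a fraction of the residue, which damps the kind of herd effect a single hard threshold would cause.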
6. Performance
In this section, we evaluate the performance of ACDN using a set of simulation experiments. We first evaluate the effectiveness of the request distribution algorithm in achieving a good balance of load and proximity. We then study the effectiveness of our content placement algorithm in reducing bandwidth consumption and user latency.
6.1 Request distribution
A request distribution algorithm would ideally direct client requests to their nearby replicas in order to reduce latency. At the same time, it must avoid overloading a replica in a popular region with too many requests. The simulation was conducted in a system with three replicas with decreasing proximity to the clients: replica 1 is closest to the clients, replica 2 is the next closest, and replica 3 is the farthest. The high watermark and the low watermark for the replicas are 1000 and 200 requests per second, respectively. Initially, there are 10 clients in the system. Each client starts at a randomized time and sends 5 requests per second. Then we gradually increase the number of clients to study the behavior of the algorithm when the replicas are overloaded. After that, we gradually decrease the number of clients to the original level to simulate a situation where the “flash crowd” is gone. To simulate the effect of DNS caching, each client caches the result of replica selection for 100 seconds. The results are shown in Figure 5. The top figure plots the number of requests served by each replica in sequential 100-second intervals. The bar graph on the bottom plots the request distribution every 10 minutes. As can be seen from the figures, when the number of clients is small, the closest replica absorbs all the requests – the request distribution is determined by proximity. As the number of clients increases, load balancing kicks in and all replicas begin to share the load. Proximity still plays a role, however: note that replica 1 serves more requests than replica 2, which in turn serves more requests than replica 3. Also note that the load of none of the replicas ever exceeds the high watermark, which is usually set to reflect the processing capacity of the underlying server. When the number of clients decreases, the load in the three replicas decreases accordingly. Consequently,
proximity starts playing an increasingly important role in the request distribution algorithm. When the load falls back to the original level, replica 1 again absorbs all the requests. The results indicate that our algorithm is efficient in utilizing proximity information while avoiding overloading the replicas. As targets for comparison, we also simulated two other algorithms: a pure random algorithm and the algorithm previously used in CDN brokering [2]. The pure random algorithm distributes requests randomly among the replicas regardless of the proximity. As can be seen from Figure 6, it achieves perfect load balancing among the three replicas. However, the clients might suffer unnecessarily from high latency due to requests directed to remote replicas even during low load. The request redirection algorithm in the previous CDN brokering paper [2] works as follows: Select the closest replica whose load is less than 80% of its capacity. In our simulation, we set the capacity of all replicas to 1000 requests per second, the same as the high watermark used in ACDN. If no such replica exists, distribute the load evenly across all replicas. The results are shown in Figure 7. When the load is low, replica 1 absorbs all the requests. When the load increases, the algorithm exhibits oscillations between the first and the second replicas, while the third replica remains unutilized. Moreover, note that the load on replica 1 can go substantially above the high watermark. These results show that, although this algorithm also takes into account both load and proximity, it does not perform as well overall as our ACDN algorithm. This is because it relies on a single load threshold (i.e., 80%) to decide whether a replica can be selected, which makes it susceptible to the herd effect. Although residual requests due to DNS caching add randomization to load balancing, they were not sufficient to dampen the herd effect. In contrast, our algorithm uses a pair of high and low watermarks to gradually adjust the probability of server selection based on the load of the server.
6.2 Content placement
The goal of the content placement algorithm is to detect hot regions based on observed user demands, and to replicate the application to those regions to reduce network bandwidth and client perceived latency. The simulation was conducted using the old UUNET topology with 53 nodes which we used for our previous study [17]. 10% of the nodes represent "hot" regions: they generate 90% of the requests in the system. The remaining 90% of the nodes are "cold" regions and generate 10% of the requests. In this simulation, the request rates from each hot region and each cold region are 810 and 10 requests per second, respectively. The simulation starts with a 100 second warm-up period during which clients in each region start at randomized times. To simulate the effect of changing demand patterns, the set of hot regions changes every 400 seconds. The application size used in the simulation is 10M bytes. The size of an application response message is 20K bytes. Every minute we introduce an update into the system that needs to be propagated to all the replicas. The size of the update is 5% of the application size. The redeployment threshold used in the simulation is 4.
Figure 5. Request distribution in ACDN. (Two plots of load in requests/sec at replicas 1, 2, and 3: the top over time in units of 100 seconds, the bottom over time in units of 10 minutes.)

The deletion
threshold is set to half the redeployment threshold times the size of the application. The algorithm makes a decision whether it needs to replicate or migrate every 100 seconds. We compare our approach with two other algorithms: a static algorithm and an ideal algorithm. In the static algorithm, a replica is created when the simulation starts and is fixed throughout the simulation. In the ideal algorithm, we assume that the algorithm can get instantaneous knowledge of which regions are hot or cold and then replicates or deletes applications accordingly. It represents the optimal case, which cannot be implemented in practice. The results of the simulation are shown in Figure 8. The top figure shows the amount of network bandwidth consumed in the simulation per second.
Figure 6. Request distribution in the random algorithm. (Load in requests/sec at replicas 1, 2, and 3 over time.)

This is measured as the product of the number of bytes sent and
the number of hops they travel. For example, if a replica sends 1000 bytes to a client which is 3 hops away, the amount of network bandwidth consumed is 3000 bytes. The bottom figure shows the average response latency among all clients in the system. We assume that the latency on each link in the topology is 10ms. For this preliminary study, we also assume that the processing overhead at the replicas is negligible. Both figures indicate that our ACDN algorithm can quickly adapt to the set of hot regions and significantly reduce network bandwidth and response latency. The spikes in the top figure are caused by the bandwidth incurred during the application replication process.⁶

⁶ Note that in some part of the curve the ACDN algorithm appears to perform slightly better than the ideal algorithm. This is due to random fluctuation in the simulation.
Figure 7. Request distribution in the brokering system. (Load in requests/sec at replicas 1, 2, and 3 over time.)
The migration algorithm was never triggered in this simulation because no region contributed enough traffic.
6.3 Redeployment threshold
Choosing an appropriate value for the redeployment threshold is essential for achieving good performance of the protocol. With a low threshold, more replicas will be created in the system.
Figure 8. Effectiveness of dynamic replication in ACDN: (a) network bandwidth consumption (MBytes/sec over time) and (b) response latency (ms over time), for the static, ACDN, and ideal algorithms.

This allows more requests to be served efficiently by a
nearby server, but increases the overhead for application replication and update propagation to all the replicas. On the other hand, a high redeployment threshold will result in fewer replicas, with less overhead due to application replication or updates, but also with less efficient processing of application requests. Finding the right threshold is not trivial as it depends on many factors: the size of the application, the sizes and frequency of its updates, the traffic pattern of user requests, etc. We did a preliminary experiment to explore this trade-off for a 100M bytes application in the UUNET topology. In this simulation, the request rates from each hot region and each cold region are 81 and 1 requests per second, respectively.
Figure 9. Effect of Redeployment Threshold. (Total bandwidth in GBytes versus the redeployment threshold.)
As before, we introduce an update into the system every minute that needs to be propagated to all the replicas. We used a larger application size and lower request rates in this experiment to emphasize the effects of the overhead of creating and maintaining extra replicas relative to the benefits from increased proximity of replicas to requests. We vary the redeployment threshold and see its impact on the total amount of traffic on the network. The results are shown in Figure 9. The x-axis is the redeployment threshold used in the protocol, and the y-axis is the total amount of bandwidth consumed in the entire simulation. The figure indicates that there is a "plateau" of good threshold values for this application: thresholds that are either too high or too low result in increased bandwidth consumption. As future work, we plan to investigate algorithms for adjusting the threshold automatically to optimize the performance of an application.
7. Related Work
The importance of supporting dynamic content in a CDN has been recognized and several proposals have been described that address this problem to various extents. Most of this work concentrates on caching dynamic responses and on mechanisms of timely invalidation of the cached copies, or on assembling a response at the edge from static and dynamic components [5, 14, 19, 10]. The fundamental difference between ACDN and these approaches is that the former replicates the computation as well as the underlying data used by the application while the latter handles only responses and leaves the computation to the origin server. The Globule system [16] uses an object-oriented approach to content replication. It encapsulates content into special Globule objects, which include replication functionality and can be used to distribute static or dynamically generated content. Compared to our ACDN, Globule gives each object a flexibility to use its own policy for distribution and consistency maintenance while ACDN applies its policies to all hosted applications. On the other hand, Globule uses its own protocols to implement distributed objects and it requires compiling applications into Globule objects as well as modifying the Web server to be able to use Globule objects. Our ACDN is built entirely
over HTTP, which simplifies firewall traversal, and works with existing applications and unmodified Web servers. As discussed in the introduction, one can run an ACDN on top of a general process migration system such as Ejasent and vMatrix [6, 1]. Finally, one can also run ACDN servers on top of a distributed file system where each server acts as a client of the global file system and where each file is replicated among CDN servers through caching within the file system. This approach replicates computation but is limited to only very simple applications since it does not replicate the environment, such as resident processes. Also, ensuring that different components of the application are always mutually consistent becomes difficult since consistency is maintained for each file individually. Turning to algorithms, ACDN involves two main algorithms: an algorithm for application placement and an algorithm for request distribution. Request distribution algorithms are closely related to load balancing and job scheduling algorithms. In particular, the issue of load oscillation that we faced has been well-studied in the context of load balancing (see, e.g., [4] and references therein). However, ACDN has to address the same issue in a new environment that takes into account client proximity in addition to server load. The algorithms by Fei et al. [7] and by Sayal et al. [20] use client-observed latency as the metric for server selection and thus implicitly account for both load and client proximity factors. Both algorithms, however, target client-based server selection, which does not apply to a CDN. Many algorithms have also been proposed for content or server placement. However, most of them assume static placement so that they can afford to solve a mathematical optimization problem to find an "optimal" placement (see, e.g., [3, 21] and references therein). Even with empirical pruning, this approach is not feasible if content were to dynamically follow the demand. Some server placement algorithms use a greedy approach to place a given number of servers into the network. These algorithms still require a central decision point and are mostly suitable for static server placement. Our ACDN placement algorithm is incremental and distributed. Among the few distributed placement algorithms, the approach by Leff et al. [13] targets the remote caching context and does not apply to our environment where requests are directed specifically to servers that already have the application replica. The strategies considered by Kangasharju et al. [11] assume a homogeneous request pattern across all regions. Our algorithm can react to different demands in different regions and migrate applications accordingly. Finally, the strategy mentioned by Pierre et al. [15] places a fixed number of object replicas in the regions with the highest demand. Our algorithm allows a variable number of replicas depending on the demand and takes into account the server load in addition to client proximity in its placement decisions. Our ACDN content placement algorithm is an extension of our earlier RaDaR algorithm [17]. However, because ACDN replicates entire applications, its placement algorithm is different from RaDaR in that it takes into account the size of the application and the amount of application updates in content placement decisions.
8. Conclusions
This paper describes ACDN, a middleware platform for providing scalable access to Web applications. Accelerating applications is extremely important to CDNs because it represents a unique value that CDNs can offer and client-side caching platforms cannot. ACDN relieves the content provider from guessing the demand when provisioning resources for the application and deciding on the location for those resources. The application can be deployed anywhere on one server, and ACDN will then replicate or migrate it as needed using shared infrastructure to gain economies of scale. We presented a system design and algorithms for request distribution and replica placement. The main challenge for the algorithms is to avoid a variety of "vicious cycles", such as endless creation and deletion of a replica, migration of a replica back and forth, or oscillations in request distribution, and yet to avoid deviating too far from optimal decisions in any given instance. Our preliminary simulation study indicated that our algorithms achieve promising results. To date, we have only experimented with one read-only application as a testbed for ACDN [12]. In the future, we would like to gain more experience with the system by deploying a variety of applications on it. In particular, we would like to explore various ways to support user updates to the application data, from relying on a shared back-end database to replicating these updates among application replicas in a consistent way.
Acknowledgments
We would like to acknowledge Pradnya Karbhari, who implemented the first version of the ACDN prototype, and Fred Douglis for his involvement in the first prototype. We also thank the anonymous reviewers for their comments on an early draft of the paper.
References
[1] A. A. Awadallah and M. Rosenblum. The vMatrix: A network of virtual machine monitors for dynamic content distribution. In 7th Int. Workshop on Web Content Caching and Distribution (WCW 2002), Aug. 2002.
[2] A. Biliris, C. Cranor, F. Douglis, M. Rabinovich, S. Sibal, O. Spatscheck, and W. Sturm. CDN brokering. In 6th Int. Workshop on Web Caching and Content Distribution, June 2001.
[3] I. Cidon, S. Kutten, and R. Soffer. Optimal allocation of electronic content. In Proceedings of IEEE INFOCOM, pages 1773–1780, Los Alamitos, CA, Apr. 22–26, 2001. IEEE Computer Society.
[4] M. Dahlin. Interpreting stale load information. IEEE Transactions on Parallel and Distributed Systems, 11(10):1033–1047, Oct. 2000.
[5] F. Douglis, A. Haro, and M. Rabinovich. HPP: HTML macro-preprocessing to support dynamic document caching. In Proceedings of the Symposium on Internet Technologies and Systems, pages 83–94. USENIX, Dec. 1997.
[6] Ejasent, Inc. Ejasent web site. http://www.ejasent.com/, 2003.
[7] Z. Fei, S. Bhattacharjee, E. W. Zegura, and M. H. Ammar. A novel server selection technique for improving the response time of a replicated service. In INFOCOM, pages 783–791, 1998.
[8] S. Gadde, J. Chase, and M. Rabinovich. Web caching and content distribution: A view from the interior. In 5th Int. Web Caching and Content Delivery Workshop (WCW5), 2000.
[9] A. V. Hoff, J. Payne, and S. Shaio. Method for the distribution of code and data updates. U.S. Patent Number 5,919,247, July 6, 1999.
[10] A. Iyengar and J. Challenger. Improving Web server performance by caching dynamic data. In Proceedings of the USENIX Symposium on Internet Technologies and Systems, pages 49–60, Berkeley, Dec. 8–11, 1997.
[11] J. Kangasharju, J. W. Roberts, and K. W. Ross. Object replication strategies in content distribution networks. In Proceedings of the Sixth Int. Workshop on Web Caching and Content Distribution (WCW), 2001.
[12] P. Karbhari, M. Rabinovich, Z. Xiao, and F. Douglis. ACDN: a content delivery network for applications (project demo). In Proceedings of ACM SIGMOD, page 619, June 2002.
[13] A. Leff, J. L. Wolf, and P. S. Yu. Replication algorithms in a remote caching architecture. IEEE Transactions on Parallel and Distributed Systems, 4(11):1185–1204, Nov. 1993.
[14] Oracle Corporation and Akamai Technologies, Inc. ESI - accelerating e-business applications. http://www.esi.org/, 2001.
[15] G. Pierre, I. Kuz, M. van Steen, and A. S. Tanenbaum. Differentiated strategies for replicating Web documents. Computer Communications, 24(2):232–240, Feb. 2001.
[16] G. Pierre and M. van Steen. Globule: a platform for self-replicating Web documents. In 6th Int. Conference on Protocols for Multimedia Systems, pages 1–11, Oct. 2001.
[17] M. Rabinovich, I. Rabinovich, R. Rajaraman, and A. Aggarwal. A dynamic object replication and migration protocol for an Internet hosting service. In 19th IEEE International Conference on Distributed Computing Systems (ICDCS '99), pages 101–113. IEEE, May 1999.
[18] M. Rabinovich and O. Spatscheck. Web Caching and Replication. Addison-Wesley, 2001.
[19] M. Rabinovich, Z. Xiao, F. Douglis, and C. Kalmanek. Moving edge-side includes to the real edge—the clients. In Proceedings of the 4th USENIX Symposium on Internet Technologies and Systems, Mar. 2003.
[20] M. Sayal, Y. Breitbart, P. Scheuermann, and R. Vingralek. Selection algorithms for replicated web servers. In Workshop on Internet Server Performance, June 1998.
[21] A. Wierzbicki. Models for Internet cache location. In The 7th Int'l Workshop on Web Content Caching and Distribution (WCW), 2002.
SCALABLE CONSISTENCY MAINTENANCE FOR EDGE QUERY CACHES
Exploiting templates in Web applications
Khalil Amiri (Imperial College London), Sara Sprenkle (Duke University), Renu Tewari (IBM Almaden Research Center), and Sriram Padmanabhan (IBM Santa Teresa Lab)
Abstract
Semantic database caching is a self-managing approach to dynamic materialization of "semantic" slices of back-end databases on servers at the edge of the network. It can be used to enhance the performance of distributed Web servers, information integration applications, and Web applications offloaded to edge servers. Such semantic caches often rely on update propagation protocols to maintain consistency with the back-end database system. However, the scalability of such update propagation protocols continues to be a major challenge. In this paper, we focus on the scalability of update propagation from back-end databases to the edge server caches. In particular, we propose a publish-subscribe-like scheme for aggregating cache subscriptions at the back-end site to enhance the scalability of the filtering step required to route updates to the target caches. Our proposal exploits the template-rich nature of Web applications and promises significantly better scalability. In this paper, we describe our approach, discuss the tradeoffs that arise in its implementation, and estimate its scalability compared to naive update propagation schemes.
1. Introduction
The performance and scalability of Web applications continues to be a critical requirement for content providers. Traditionally, static caching of HTML pages on edge servers has been used to help meet this requirement. However, with a growing fraction of the content becoming dynamic and requiring access to the back-end database, static caching is bypassed, as all the dynamically generated pages are marked uncacheable by the server. Dynamic data is typically served using a 3-tiered architecture consisting of a web server, an application server, and a database; data is stored in the database, accessed on demand by the application server components, and formatted and delivered to the client by the web server. In more recent architectures, the edge server (which includes client-side proxies, server-side reverse proxies, or caches within a content distribution network (CDN) [2])
acts as an application server proxy by offloading application components (e.g., JSPs, servlets, EJBeans) to the edge [12, 7]. Database accesses by these edge application components, however, still go to the back-end server over the wide-area network. To accelerate edge applications by eliminating wide-area network transfers, we have recently proposed and implemented DBProxy, a database cache that dynamically and adaptively stores structured data at the edge [4]. The cache in this scenario is a persistent edge cache containing a large number of changing and overlapping "materialized views" stored in common tables. The scalability of such a scheme depends on how efficiently we can maintain consistency with the back-end database without undue processing at the back-end or network bandwidth overheads.

Consistency in semantic caches has traditionally been achieved through two approaches. The first is timeout-based: data in the cache is expired after a specified timeout period, regardless of whether it has been updated at the back-end or not. The second approach is update-based consistency, which relies on propagating the relevant changed data to the caches, where it is locally applied. Timeout-based consistency suffers from increased latency and network message overheads, as the cache needs to validate the data with the back-end or fetch any modified data. Update-based consistency propagates all the new data to the cache when any change happens, but suffers from serious scalability limitations both as the number of caches grows and as the size of an individual cache, in terms of the number of cached views, increases. The scalability problem arises from the back-end overhead of "figuring out", when a row changes, which of the target caches it must be forwarded to. One approach, used initially in DBProxy, was to propagate all changes from the back-end and apply them to the local cache. This increases the network bandwidth overhead and increases the size of the local cache.

In this paper, we propose template-based filtering, which efficiently aggregates cache subscriptions at the back-end by exploiting the similarity of "views" in edge query caches. Similar aggregations have been proposed for event matching (against client subscriptions) in publish-subscribe systems. We discuss the specifics of our approach, describe how we handle dynamic changes in cache subscriptions, and highlight the challenges and trade-offs that arise in this space.

The rest of the paper is organized as follows. We review the design of our prototype dynamic edge data cache in Section 2 and describe its consistency maintenance framework in Section 3. We review naive filtering schemes in Section 4 and describe our approach in Section 5. We discuss related work in Section 6 and summarize the paper in Section 7.
2. Semantic caching over the Web
2.1 DBProxy overview
We designed and implemented an edge data cache, called DBProxy [4], as a JDBC driver which transparently intercepts the SQL calls issued by application components (e.g., servlets) executed on the edge and determines if they can be satisfied from the local cache (shown in Figure 1).
To make DBProxy as self-managing as possible, while leveraging the performance capabilities of mature database management systems, we chose to design DBProxy to be: (i) persistent, so that results are cached across instantiations and crashes of the edge server; (ii) DBMS-based, utilizing a stand-alone database for storage to eliminate redundancy using common tables and to allow for the efficient execution of complex local queries; (iii) dynamically populated, populating the cache based on the application query stream without the need for pre-defined administrator views; and (iv) dynamically pruned, adjusting the set of cached queries based on available space and the relative benefits of cached queries.
[Figure 1 here: an edge application servlet issues SQL through JDBC to the DBProxy JDBC driver, whose components include the JDBC interface, query parser, query evaluator, cache index, query matching module, resource manager, catalog, and consistency manager, backed by a cache repository and connected to the back-end.]
Figure 1. DBProxy key components. The query evaluator intercepts queries and parses them. The cache index is invoked to identify previously cached queries that operated on the same table(s) and columns. A query matching module establishes whether the new query’s results are contained in the union of the data retrieved by previously cached queries. A local database is used to store the cached data.
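To make the interception path concrete, the following is a minimal sketch of how a JDBC-driver-style shim could route a query either to the local cache database or to the back-end. It is an illustration under stated assumptions, not DBProxy's actual code: the EdgeStatement class and the QueryMatcher interface are hypothetical names standing in for the query matching module and cache index described above, and cache population on a miss is omitted.

    // Hypothetical sketch of a JDBC-style interception layer in the spirit of DBProxy.
    // Class and interface names are illustrative, not taken from the paper.
    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    interface QueryMatcher {
        // True if the query's result is contained in the union of previously cached queries.
        boolean containedInCache(String sql);
    }

    class EdgeStatement {
        private final Connection localDb;    // stand-alone edge database
        private final Connection backEndDb;  // origin database reached over the WAN
        private final QueryMatcher matcher;

        EdgeStatement(Connection localDb, Connection backEndDb, QueryMatcher matcher) {
            this.localDb = localDb;
            this.backEndDb = backEndDb;
            this.matcher = matcher;
        }

        ResultSet executeQuery(String sql) throws SQLException {
            if (matcher.containedInCache(sql)) {
                // Cache hit: evaluate the query against the local common-schema tables.
                Statement stmt = localDb.createStatement();
                return stmt.executeQuery(sql);
            }
            // Cache miss: fetch from the back-end. A fuller implementation would also
            // insert the retrieved rows into the local store and register the query
            // with the matcher so that later queries can hit locally.
            Statement stmt = backEndDb.createStatement();
            return stmt.executeQuery(sql);
        }
    }

In a real driver, the application would obtain such statement objects through the standard JDBC interfaces, so servlets remain unchanged; only the driver class name in the connection configuration would differ.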
2.2 Common Local Store
Data in a DBProxy edge cache is stored persistently in a local stand-alone database. The contents of the edge cache are described by a cache index containing the list of queries. To achieve space efficiency, data is stored in common-schema tables whenever possible, such that multiple query results share the same physical storage. Queries over the same base table are stored in a single, usually partially populated, cached copy of the origin server's base table. Join queries with the same "join condition" and over the same base table list are also stored in the same local table. This scheme not only achieves space efficiency but also simplifies the task of consistency maintenance, as discussed below.

When a query is "worth caching", a local result table is created (if it does not already exist) with as many columns as selected by the query. The column type and metadata information are retrieved from the back-end server and cached in a local catalog cache. For example, Figure 2 shows an example local table cached at the edge. The local 'item' table is created just before inserting the three rows retrieved by query Q1, with the primary key column (id) and the two columns requested by the query (cost and msrp). All queries are rewritten to retrieve the primary key so that identical rows in the cached table are identified. Later, to insert the three rows retrieved by Q2, the table is altered if necessary to add any new columns not already created.
Next, new rows fetched by Q2 are inserted (id = 450, 620) and existing rows (id = 340) are updated. Note also that, since Q2 did not select the cost column, a NULL value is inserted for that column. Cached queries in DBProxy are organized according to a multi-level index of schemas, tables, and clauses for efficient matching.
[Figure 2 here: the cached local item table with columns id, cost, and msrp. The first rows were retrieved by Q1 (SELECT cost, msrp FROM item WHERE cost BETWEEN 14 AND 16); the rows retrieved by Q2 (SELECT msrp FROM item WHERE msrp BETWEEN 13 AND 20), e.g., id 450 and 620, carry NULL in the cost column; the bottom rows (id 770 and 880) were inserted by the consistency protocol.]
Figure 2. Local storage. The local item table entries after the queries Q1 and Q2 are inserted in the cache. The first three rows are fetched by Q1 and the last three are fetched by Q2. Since Q2 did not fetch the cost column, NULL values are inserted. The bottom two rows were not added to the table as a part of query result insertion, but by the update propagation protocol which reflects UDIs performed on the origin table. In the original design, changes are forwarded from the origin to the edge cache whether or not they match cached query predicates.
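For concreteness, the following sketch shows how an edge cache could maintain such a shared local copy of the item table through JDBC. The LocalItemStore helper, the SQL dialect details (e.g., ADD COLUMN IF NOT EXISTS), and the use of plain string concatenation are simplifying assumptions for illustration, not DBProxy's actual implementation.

    // Illustrative maintenance of the shared local "item" table for queries like Q1 and Q2.
    // A production cache would use parameterized statements and dialect-specific DDL.
    import java.sql.Connection;
    import java.sql.SQLException;
    import java.sql.Statement;

    class LocalItemStore {
        private final Connection localDb;

        LocalItemStore(Connection localDb) { this.localDb = localDb; }

        // Create the local table with the primary key plus the columns selected by the
        // first cached query (Q1 selects cost and msrp; id is always added by rewriting).
        void createIfAbsent() throws SQLException {
            try (Statement s = localDb.createStatement()) {
                s.executeUpdate("CREATE TABLE IF NOT EXISTS item (id INT PRIMARY KEY, cost INT, msrp INT)");
            }
        }

        // If a later query selects a column that is not yet present, alter the table
        // (assuming a dialect that supports ADD COLUMN IF NOT EXISTS).
        void addColumnIfMissing(String column, String sqlType) throws SQLException {
            try (Statement s = localDb.createStatement()) {
                s.executeUpdate("ALTER TABLE item ADD COLUMN IF NOT EXISTS " + column + " " + sqlType);
            }
        }

        // Insert or update one row fetched by a query; columns the query did not select
        // are left NULL on insert, or untouched if the row already exists.
        void upsertRow(int id, Integer cost, Integer msrp) throws SQLException {
            try (Statement s = localDb.createStatement()) {
                StringBuilder set = new StringBuilder();
                if (cost != null) set.append("cost = ").append(cost).append(", ");
                set.append("msrp = ").append(msrp);
                int updated = s.executeUpdate("UPDATE item SET " + set + " WHERE id = " + id);
                if (updated == 0) {
                    s.executeUpdate("INSERT INTO item (id, cost, msrp) VALUES ("
                            + id + ", " + cost + ", " + msrp + ")");
                }
            }
        }
    }

Under these assumptions, inserting Q2's result amounts to upsertRow(450, null, 18), upsertRow(620, null, 20), and upsertRow(340, null, 13); the first two insert new rows with NULL cost, while the third only refreshes msrp for the row Q1 had already cached.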
3. Consistency Management
Read-only queries received by DBProxy are satisfied from the local cache whenever possible. Updates are always passed to the back-end database directly, bypassing the cache. The effects of updates trickle back to the edge cache through a push or a pull approach. While many Web applications can tolerate slightly stale data in the edge cache, they are nevertheless interested in reasonable consistency guarantees. For example, applications usually require that they observe the effects of their own updates on an immediately subsequent query. Since a query following an update can hit locally in a stale cache, updates not yet reflected in the cache would seem to have been lost, resulting in strange application behavior. Specifically, we assume that three consistency properties hold. First, the cache guarantees lag consistency with respect to the origin: the values of the tuples in the local cache equal those at the back-end at an earlier time, although the cache may have fewer tuples. Second, DBProxy exhibits monotonic state transitions, i.e., it does not allow an application to see a relatively current database state and then see a previous state corresponding to an earlier point in time. Finally, DBProxy supports immediate visibility of local updates: the result of a later read by an application will show the effect of the local update it committed earlier. To achieve these properties, DBProxy relies on protocols that force the cache to pull updates from the server on certain events, and also may decide to bypass the cache in others.
The details of the consistency protocols and criteria are not the focus of this paper; they are more fully described in [3]. Suffice it to say here that, regardless of the particular criteria, changed data must be efficiently propagated from the back-end database to the edge caches. In this section, we discuss two alternative approaches to update propagation and describe a server-side filtering architecture which can be used to propagate changes in back-end tables only to the caches that require them.
3.1 Update propagation approaches
To maintain consistency, DBProxy relies on a data propagator, which captures all UDIs (updates, deletes, inserts) to the tables at the origin and forwards them to the edge caches either in their entirety or after filtering.

Zero-filtering propagation. This approach, which we initially implemented in DBProxy, propagates all changes to the edge caches, regardless of whether or not they match cached predicates. This places no filtering load on the server. Data changes are propagated to the edges tagged by their transaction identifiers and applied to the edge cache in transaction commit order. Since cached data is maintained as partially populated copies of back-end tables, changes committed to the base tables at the origin can be applied "as is" to the cached versions, without the need to re-execute the queries. Future queries that execute over the cache will retrieve any matching tuples from these newly propagated changes. This solution presumes slowly changing data with few updates, which is typical of some Web environments.

Server-side filtering. The problem with propagation-based consistency with zero filtering is that all changes to the back-end are propagated regardless of whether they are required by the cache or not. This not only wastes network bandwidth but also increases the size of the local cache database. Periodic garbage collection is required to clean up the unaccessed tuples. An alternative approach is to filter the tuples that change in the back-end database and forward to each cache only the "relevant" ones. This filtering step can, however, place a high load on the back-end site as the number of caches and the number of views per cache increase. The rest of the paper describes filtering in more detail and suggests an approach to make it more scalable.
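In either approach, the changes that do reach an edge cache are applied directly to the local copies of the origin tables. The following is a rough sketch, under assumed names (ChangeRecord, LocalTableStore), of what applying a batch of propagated UDIs in transaction commit order could look like; the paper does not prescribe this exact structure.

    // Hypothetical shape of a propagated change record and its application at the edge.
    import java.util.List;

    class ChangeRecord {
        long transactionId;   // back-end transaction that committed the change
        String table;         // origin table the UDI was applied to
        Object[] oldTuple;    // null for an insert
        Object[] newTuple;    // null for a delete
    }

    interface LocalTableStore {
        void upsert(String table, Object[] tuple);  // insert or update by primary key
        void delete(String table, Object[] tuple);  // remove by primary key
    }

    class UpdateApplier {
        private final LocalTableStore store;

        UpdateApplier(LocalTableStore store) { this.store = store; }

        // Records are assumed to arrive grouped by transaction and ordered by commit;
        // each one is applied "as is" to the partially populated local copy.
        void apply(List<ChangeRecord> committedChanges) {
            for (ChangeRecord c : committedChanges) {
                if (c.newTuple == null) {
                    store.delete(c.table, c.oldTuple);
                } else {
                    store.upsert(c.table, c.newTuple);
                }
            }
        }
    }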
4. Basic filtering
Notation. Before we discuss the details of filtering, we first define a few notational conventions to simplify the rest of the discussion. For each query in the cache we have a subscription at the back-end. We express a subscription S as a 3-tuple S = (T, A, P), where T is the table name(s) accessed by the query. In case of a join, T may contain two or more tables. A is the set of attributes or columns projected by the query, and P is the search predicate. P is assumed to be in AND-OR normal form. When P is expressed in this form, we denote by C_i the i-th disjunct in the expression. Each disjunct contains several predicate terms (e.g., cost < 15) ANDed together. These predicate terms are atomic conditions, such as equality or inequality predicates over columns.
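To pin down this representation, the sketch below shows one way the subscription structures could be encoded. The class names, the numeric-only attribute values, and the Map-based tuple representation are simplifying assumptions for illustration, not the paper's data model.

    // Hypothetical in-memory representation of back-end subscriptions.
    // A predicate is kept in AND-OR normal form: a list of disjuncts, each of which
    // is a conjunction of atomic terms over single (here, numeric) columns.
    import java.util.List;
    import java.util.Map;

    enum Op { EQ, NEQ, LT, LEQ, GT, GEQ }

    class AtomicTerm {
        String column;  // e.g., "cost"
        Op op;          // e.g., LT
        long value;     // e.g., 15

        boolean matches(Map<String, Long> tuple) {
            Long v = tuple.get(column);
            if (v == null) return false;  // missing or NULL column does not satisfy the term
            switch (op) {
                case EQ:  return v == value;
                case NEQ: return v != value;
                case LT:  return v < value;
                case LEQ: return v <= value;
                case GT:  return v > value;
                default:  return v >= value;  // GEQ
            }
        }
    }

    class Disjunct {
        List<AtomicTerm> terms;  // predicate terms ANDed together

        boolean matches(Map<String, Long> tuple) {
            for (AtomicTerm t : terms) {
                if (!t.matches(tuple)) return false;
            }
            return true;
        }
    }

    class Subscription {
        List<String> tables;       // T: one table, or several in case of a join
        List<String> attributes;   // A: columns projected by the query
        List<Disjunct> predicate;  // P: C1 OR C2 OR ... OR Cm
    }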
84
Khalil Amiri, Sara Sprenkle, Renu Tewari, and Sriram Padmanabhan
Let N be the number of edge caches. For cache i, the set of subscriptions is denoted by S_i, and the number of subscriptions is therefore |S_i|. The j-th subscription of cache i is, therefore, S_ij = (T_ij, A_ij, P_ij).

Overview of Filtering. In this scheme, caches "subscribe" to a number of "logical" update streams. In particular, each cached view or query corresponds to one subscription. A filtering server at the back-end site manages subscriptions for the edge caches. Subscriptions are dynamic, that is, they change as new views are added or evicted. The filtering server, therefore, has to test each newly changed tuple (or row in a table) against all Σ_{i=1..N} |S_i| subscriptions. This results in a linear search overhead as the number of caches, N, increases, and as the number of subscriptions per cache, |S_i|, increases.

Precisely, for each tuple t_k, let t_k^old be its value before a change (i.e., a UDI) and t_k^new its value after the change. For a newly inserted tuple, t_k^old is NULL; similarly, when a tuple is deleted, t_k^new is NULL. Assuming there is a single cached query denoted by S_ij with predicate P_ij, the filtering algorithm decides to route tuple t_k to the target cache if either of the following two conditions holds:

    t_k^new ∈ P_ij, or
    t_k^old ∈ P_ij and t_k^new ∉ P_ij.
In the first case the tuple is inserted or updated in the cache. In the second case the tuple is either deleted or updated. The pseudo-code of the filtering algorithm is shown below.

    filter (TUPLE Told, TUPLE Tnew, CACHE i)
    begin
        for every SUBSCRIPTION Sij from CACHE i {
            for every DISJUNCT C in Pij {
                if ( C.matches(Tnew) )
                    return TUPLE_MATCH;
                else if ( C.matches(Told) )
                    return TUPLE_MATCH;
            }
        }
        return TUPLE_NO_MATCH;
    end
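As a cross-check of the pseudo-code, here is roughly how the same routine could be written against the hypothetical Subscription and Disjunct structures sketched earlier; again, the names and the Map-based tuple encoding are assumptions.

    // Rough executable rendering of the filtering procedure, reusing the hypothetical
    // Subscription and Disjunct classes sketched above.
    import java.util.List;
    import java.util.Map;

    class BasicFilter {
        // True if the changed tuple must be forwarded to cache i, i.e., if its old or
        // new value matches any disjunct of any subscription held by that cache.
        static boolean filter(Map<String, Long> oldTuple,   // null for an insert
                              Map<String, Long> newTuple,   // null for a delete
                              List<Subscription> cacheSubscriptions) {
            for (Subscription s : cacheSubscriptions) {
                for (Disjunct c : s.predicate) {
                    if (newTuple != null && c.matches(newTuple)) return true;  // insert/update at the cache
                    if (oldTuple != null && c.matches(oldTuple)) return true;  // delete/update at the cache
                }
            }
            return false;
        }
    }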
The filtering procedure above is invoked once for each cache, for each tuple updated by a transaction in the back-end server. For each subscription predicate P_ij = C_1 ∨ C_2 ∨ ... ∨ C_m, the tuple is tested against each of the C_i's and is forwarded to the edge cache if it matches any of them. Note that each of the C_i's is expressed as a set of atomic attribute tests (of the form attr {=,