English Pages 404 [386] Year 1999
Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis and J. van Leeuwen
1718
Springer Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Singapore Tokyo
Michel Diaz Philippe Owezarski Patrick Senac (Eds.)
Interactive Distributed Multimedia Systems and Telecommunication Services 6th International Workshop, IDMS'99 Toulouse, France, October 12-15, 1999 Proceedings
Springer
Series Editors
Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands

Volume Editors
Michel Diaz
Philippe Owezarski
Centre National de la Recherche Scientifique
Laboratoire d'Analyse et d'Architecture des Systemes
7, avenue du Colonel Roche, F-31077 Toulouse Cedex 04, France
E-mail: {michel.diaz/philippe.owezarski}@laas.fr
Patrick Senac
ENSICA
F-31077 Toulouse Cedex, France
E-mail: [email protected]

Cataloging-in-Publication data applied for
Die Deutsche Bibliothek - CIP-Einheitsaufnahme
Interactive distributed multimedia systems and telecommunications: 6th international workshop; proceedings / IDMS '99, Toulouse, France, October 12-15, 1999. Michael Diaz ... (ed.). - Berlin; Heidelberg; New York; Barcelona; Hong Kong; London; Milan; Paris; Singapore; Tokyo: Springer, 1999
(Lecture notes in computer science; Vol. 1718)
ISBN 3-540-66595-1
CR Subject Classification (1998): H.5.1, C.2, H.4, H.5
ISSN 0302-9743
ISBN 3-540-66595-1 Springer-Verlag Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.
© Springer-Verlag Berlin Heidelberg 1999
Printed in Germany
Typesetting: Camera-ready by author
SPIN: 10704591 06/3142 - 5 4 3 2 1 0
Printed on acid-free paper
Preface
The 1999 International Workshop on Interactive Distributed Multimedia Systems and Telecommunication Services (IDMS) in Toulouse is the sixth in a series that started in 1992. The previous workshops were held in Stuttgart in 1992, Hamburg in 1994, Berlin in 1996, Darmstadt in 1997, and Oslo in 1998. The area of interest of IDMS ranges from basic system technologies, such as networking and operating system support, to all kinds of teleservices and distributed multimedia applications. Technical solutions for telecommunications and distributed multimedia systems are merging, and quality-of-service (QoS) will play a key role in both areas. However, the range from basic system technologies to distributed multimedia applications and teleservices is still very broad, and we have to understand the implications of multimedia applications and their requirements for middleware and networks. We are challenged to develop new and more fitting solutions for all distributed multimedia systems and telecommunication services to meet the requirements of the future information society.
The IDMS'99 call for papers attracted 84 submissions from Asia, Australia, Europe, North America, and South America. The program committee (PC) members and additional referees worked hard to review all submissions such that each contribution received three reviews. Based on the comments and recommendations in these reviews, the PC carried out an online meeting over the Internet that was structured into two discussion and ballot phases. For this, we used a full conference organization system, called ConfMan, developed in Oslo for the previous IDMS, that combines the World Wide Web and e-mail with a database system, and enforces security, privacy, and integrity control for all data acquired, including the comments and votes of the PC.
The final result of the discussions and ballots of the PC online meeting was the final program, and the only additional task for us as program co-chairs was to group the selected papers and structure them into sessions. At IDMS'99, a high-quality program with 4 tutorials, 3 invited papers, 25 full papers and 3 position papers, coming from 13 countries, included topics such as user aspects, QoS, distributed applications, new networks, multimedia documents, platforms for collaborative systems, storage servers, and flow and congestion control. This technical program enabled IDMS'99 to follow the tradition of the very successful previous IDMS workshops.
We would like to express our deepest gratitude to the organizers of the previous IDMS workshops for their confidence, which allowed us to take on the responsibility of hosting IDMS'99 in Toulouse, and particularly to the previous organizers, Thomas Plagemann and Vera Goebel, who gave us very valuable information and help. We would like to acknowledge the cooperative efforts of ACM and IEEE, and the financial support from CNRS, Region Midi-Pyrenees, DGA, France Telecom and Microsoft, which allowed us to keep the fees of IDMS'99 affordable and to offer a very interesting technical and social program. Finally, we would like to thank all the people who helped us here in Toulouse, and particularly Daniel Daurat, Sylvie Gay, Marie-Therese Ippolito, Joelle Penavayre, Marc Boyer, Jean-Pierre Courtiat, Laurent Dairaine, and Pierre de Saqui-Sannes.
July 1999
Michel Diaz Philippe Owezarski Patrick Senac
Organization
Program Co-Chairs
Michel Diaz, LAAS-CNRS, Toulouse, France
Philippe Owezarski, LAAS-CNRS, Toulouse, France
Patrick Senac, ENSICA, Toulouse, France
Program Committee
H. Affifi, ENST Bretagne, France
P.D. Amer, University of Delaware, USA
E. Biersack, Institut Eurecom, France
G. V. Bochmann, University of Ottawa, Canada
B. Butscher, GMD FOKUS, Germany
A.T. Campbell, Columbia University, USA
T.S. Chua, University of Singapore, Singapore
J.-P. Courtiat, LAAS-CNRS, France
J. Crowcroft, University College London, UK
L. Delgrossi, University of Piacenza, Italy
C. Diot, Sprint ATL, USA
R. Dssouli, University of Montreal, Canada
M. Dudet, CNET France Telecom, France
W. Effelsberg, University of Mannheim, Germany
F. Eliassen, University of Oslo, Norway
S. Fdida, LIP6, France
D. Ferrari, University of Piacenza, Italy
V. Goebel, UNIK, Norway
T. Helbig, Philips, Germany
J.-P. Hubaux, EPFL, Switzerland
D. Hutchison, Lancaster University, UK
W. Kalfa, TU Chemnitz, Germany
T.D.C. Little, Boston University, USA
E. Moeller, GMD FOKUS, Germany
K. Nahrstedt, University of Illinois, USA
G. Neufeld, UBC Vancouver, Canada
B. Pehrson, KTH Stockholm, Sweden
T. Plagemann, UNIK, Norway
B. Plattner, ETH Zurich, Switzerland
L.A. Rowe, University of California at Berkeley, USA
H. Scholten, University of Twente, The Netherlands
A. Seneviratne, UTS, Australia
R. Steinmetz, GMD, Germany
J.B. Stefani, CNET France Telecom, France
L. Wolf, TU Darmstadt, Germany
M. Zitterbart, TU Braunschweig, Germany
Referees S. Abdelatif H. Affifi P.D. Amer J.C. Arnu P. Azema V. Baudin P. Berthou E. Biersack G. V. Bochmann M. Boyer B. Butscher A.T. Campbell J.M. Chasserie C. Chassot T.S. Chua L. Costa J.-P. Courtiat J. Crowcroft L. Delgrossi M. Diaz C. Diot A. Dracinschi
K. Drira R. Dssouli M. Dudet W. Effelsberg F. Eliassen S. Fdida D. Ferrari O. Fourmaux F. Garcia T. Gayraud V. Goebel T. Helbig J.-P. Hubaux D. Hutchison W. Kalfa T.D.C. Little A. Lozes E. Moeller C. Morin K. Nahrstedt G. Neufeld R. Noro
P. Owezarski S. Owezarski B. Pehrson T. Plagemann B. Plattner B. Pradin V. Roca L. Rojas Cardenas L.A. Rowe P. Sampaio P. de Saqui-Sannes H. Scholten G. Schürmann P. Senac A. Seneviratne M. Smirnov R. Steinmetz J.B. Stefani J.P. Tubach T. Villemur L. Wolf M. Zitterbart
Local Organization
Marc Boyer, LAAS-CNRS, Toulouse, France
Laurent Dairaine, ENSICA, Toulouse, France
Daniel Daurat, LAAS-CNRS, Toulouse, France
Sylvie Gay, ENSICA, Toulouse, France
Marie-Therese Ippolito, LAAS-CNRS, Toulouse, France
Joelle Pennavayre, LAAS-CNRS, Toulouse, France
Pierre de Saqui-Sannes, ENSICA, Toulouse, France
Supporting/Sponsoring Organizations
ACM, IEEE, DGA, CNRS, Region Midi-Pyrenees, France Telecom, Microsoft Research, Xerox
Table of Contents
Invited Paper
The Internet 2 QBONE-Project - Architecture and Phase 1 Implementation
P. Emer .......... 1

Network QoS
Hardware Acceleration inside a Differentiated Services Access Node
J. Forsten, M. Loukola, J. Skytta .......... 3
REDO RSVP: Efficient Signalling for Multimedia in the Internet
L. Mathy, D. Hutchison, S. Schmid, S. Simpson .......... 17
QoS-aware Active Gateway for Multimedia Communication
K. Nahrstedt, D. Wichadakul .......... 31

Application QoS
A Study of the Impact of Network Loss and Burst Size on Video Streaming Quality and Acceptability
D. Hands, M. Wilkins .......... 45
Transport of MPEG-2 Video in a Routed IP Network: Transport Stream Errors and Their Effects on Video Quality
L.N. Cai, D. Chiu, M. McCutcheon, M.R. Ito, G. W. Neufeld .......... 59
Specification and Realization of the QoS Required by a Distributed Interactive Simulation Application in a New Generation Internet
C. Chassot, A. Lozes, F. Garcia, L. Dairaine, L. Rojas Cardenas .......... 75

Mbone and Multicast
A Multicasting Scheme Using Multiple MCSs in ATM Networks
T.Y. Byun, K.J. Han .......... 93
A Platform for the Study of Reliable Multicasting Extensions to CORBA Event Service
J. Orvalho, L. Figueiredo, T. Andrade, F. Boavida .......... 107
MBone2Tel - Telephone Users Meeting the MBone
R. Ackermann, J. Pommnitz, L.C. Wolf, R. Steinmetz .......... 121
Quality of Service Management for Teleteaching Applications Using the MPEG-4/DMIF
G. V. Bochmann, Z. Yang .......... 133

Invited Paper
IP Services Deployment: A US Carrier Strategy
C. Diot .......... 147

Adaptive Applications and Networks
Network-Diffused Media Scaling for Multimedia Content Services
O.E. Kia, J.J. Sauvola, D.S. Doermann .......... 149
Extended Package-Segment Model and Adaptable Applications
Y. Wakita, T. Kunieda, N. Takahashi, T. Hashimoto, J. Kuboki .......... 163
Tailoring Protocols for Dynamic Network Conditions and User Requirements
R. De Silva, A. Seneviratne .......... 177
Design of an Integrated Environment for Adaptive Multimedia Document Presentation Through Real Time Monitoring
E. Carneiro da Cunha, L. F. Rust da Costa Carmo, L. Pirmez .......... 191

New Trends in IDMS
Authoring of Teletext Applications for Digital Television Broadcast
C. Fuhrhop, K. Hofrichter, A. Kraft .......... 205
Middleware Support for Multimedia Collaborative Applications over the Web: A Case Study
M. Pinto, M. Amor, L. Fuentes, J.M. Troya .......... 219
The Virtual Interactive Presenter: A Conversational Interface for Interactive Television
M. Cavazza, W. Perotto, N. Cashman .......... 235

Advances in Coding
A Video Compression Algorithm for ATM Networks with ABR Service, Using Visual Criteria
S. Felici, J. Martinez .......... 245
Content-Fragile Watermarking Based on Content-Based Digital Signatures
J. Dittmann, A. Steinmetz, R. Steinmetz .......... 259

Invited Paper
New Structures for the Next Generation of IDMS
D. Bultermann .......... 273

Conferencing
Multi-drop VPs for Multiparty Videoconferencing on SONET/ATM Rings - Architectural Design and Bandwidth Demand Analysis
G. Feng, T.S.P. Yum .......... 275
A Generic Scheme for the Recording of Interactive Media Streams
V. Hilt, M. Mauve, C. Kuhmünch, W. Effelsberg .......... 291
A Framework for High Quality/Low Cost Conferencing Systems
M. Benz, R. Hess, T. Hutschenreuther, S. Kümmel, A. Schill .......... 305

Video Servers
A Novel Replica Placement Strategy for Video Servers
J. Gafsi, E. W. Biersack .......... 321
Network Bandwidth Allocation and Admission Control for a Continuous Media File Server
D. Makaroff, G. W. Neufeld, N. Hutchinson .......... 337
Design and Evaluation of Ring-Based Video Servers
C. Guittenit, A. M'zoughi .......... 351

Position Papers
Pricing for Differentiated Internet Services
Z. Fan, E. Stewart Lee .......... 365
An Agent-Based Adaptive QoS Management Framework and Its Applications
M. Kosuga, T. Yamazaki, N. Ogino, J. Matsuda .......... 371
Improving the Quality of Recorded Mbone Sessions Using a Distributed Model
L. Lambrinos, P. Kirstein, V. Hardman .......... 377
The Internet 2 QBONE Project Architecture and Phase 1 Implementation
Phillip Emer
NCState.net & North Carolina Networking Initiative
Abstract. As IP-based real-time services move into the mainstream of national and international communications infrastructure, the need for a differentiated services framework becomes more important. The Internet 2 QBONE group is focused on building such a differentiated services framework atop NSF-supported high performance networks, namely Abilene and the vBNS. The QBONE proposes to offer an expedited forwarding (EF) service from campus edge to campus edge. This EF service includes marking and admission control at the ingress edge, and forwarding and rate shaping in the core transit network and at the egress edge. Further, the QBONE framework includes a bandwidth broker (BB) function for negotiating diffserv behaviours across domain (AS) boundaries. Finally, QBONE domains are instrumented with measuring and monitoring equipment for verification and traffic profiling (and research). In this paper, we describe the phase one QBONE architecture, some exercising applications, and some preliminary results. Some results are based on applications sourced in the North Carolina Networking Initiative's (NCNI) NC GigaPOP. Other results are based on interoperability tests performed in association with NC State University's Centennial Networking Labs (CNL).
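The abstract lists marking and admission control at the ingress edge as part of the EF service. A common way to police a flow against a committed rate is a token bucket. The sketch below illustrates that technique under assumed parameters. It is not the QBONE specification. The struct and function names (`token_bucket`, `tb_init`, `tb_conforms`) and the choice of bytes per second as the rate unit are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-reservation profile for an EF flow at the ingress edge. */
typedef struct {
    double rate;       /* committed rate in bytes per second (assumed unit) */
    double depth;      /* bucket depth, i.e. the largest allowed burst, in bytes */
    double tokens;     /* tokens currently available, in bytes */
    double last_time;  /* time of the previous update, in seconds */
} token_bucket;

void tb_init(token_bucket *tb, double rate, double depth, double now)
{
    assert(rate > 0.0 && depth > 0.0);
    tb->rate = rate;
    tb->depth = depth;
    tb->tokens = depth;      /* start with a full bucket */
    tb->last_time = now;
}

/* Returns true if a packet of pkt_bytes arriving at time `now` is within the
 * profile (forward it with the EF marking), false if it exceeds the profile
 * (the edge would drop or re-mark it). Time must not run backwards. */
bool tb_conforms(token_bucket *tb, uint32_t pkt_bytes, double now)
{
    tb->tokens += (now - tb->last_time) * tb->rate;   /* refill since last update */
    if (tb->tokens > tb->depth)
        tb->tokens = tb->depth;                        /* cap at the bucket depth */
    tb->last_time = now;

    if (tb->tokens >= (double)pkt_bytes) {
        tb->tokens -= (double)pkt_bytes;               /* consume tokens for this packet */
        return true;
    }
    return false;
}
```

With a 1000 byte/s rate and a 1500-byte depth, a full-size packet passes immediately. A second packet at the same instant is out of profile. It becomes admissible again only once 1.5 s of tokens have accumulated.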
Hardware Acceleration inside a Differentiated Services Access Node
Juha Forsten, Mika Loukola, and Jorma Skytta
Helsinki University of Technology, P.O. Box 3000, FIN-02015 HUT, FINLAND
{juha.forsten, mika.loukola, jorma.skytta}@hut.fi
Abstract. This paper describes the hardware implementation of a Simple Integrated Media Access (SIMA) access node. SIMA introduces Quality of Service (QoS) in IP networks by using the IPv4 Type of Service (TOS) field as a priority and real-time indication field. Hardware acceleration is used to speed up the SIMA calculations.
1 Introduction
SIMA is a Differentiated Services (DS) [1] scheme with detailed access node and core network node functions [2]. The SIMA access node has to calculate the momentary bit rate (MBR) to determine how well the user traffic flow conforms to the service level agreement (SLA). The access node can calculate the priority of the IP packet from the relation between the MBR and the nominal bit rate (NBR). The NBR value is specified in the SLA. After the priority is inserted in the IPv4 TOS field, the header checksum must be recalculated. As link speeds increase, less time is left for these calculations, so new methods for high-speed calculation are needed. In this implementation the SIMA-specific calculations are performed on a special hardware acceleration card. Because the CPU and the SIMA hardware acceleration card work in parallel, the processor is free to handle other tasks such as packet capture/send and SLA database management.
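The introduction notes that the IPv4 header checksum must be recalculated once the priority is written into the TOS field. For reference, the sketch below shows the software version of that step. It fully recomputes the standard RFC 1071 one's-complement header checksum. The function names are hypothetical. An incremental update (RFC 1624) is a common alternative that avoids summing the whole header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum (RFC 1071) over an IPv4 header of hdr_len bytes,
 * treating the checksum field (bytes 10 and 11) as zero. */
uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t hdr_len)
{
    assert(hdr_len >= 20 && hdr_len % 4 == 0);    /* IHL is in 32-bit words */
    uint32_t sum = 0;
    for (size_t i = 0; i < hdr_len; i += 2) {
        if (i == 10)
            continue;                              /* skip the checksum field */
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    }
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);       /* fold carries back in */
    return (uint16_t)~sum;
}

/* Hypothetical helper: write a new TOS byte (byte 1) and store the new checksum. */
void set_tos_and_checksum(uint8_t *hdr, size_t hdr_len, uint8_t tos)
{
    hdr[1] = tos;
    uint16_t c = ipv4_header_checksum(hdr, hdr_len);
    hdr[10] = (uint8_t)(c >> 8);
    hdr[11] = (uint8_t)(c & 0xFF);
}
```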
2 SIMA Specification
SIMA splits the DS field [3] into a 3-bit priority field and a 1-bit real-time/non-real-time indication bit. The QoS offered to the customer is determined by the MBR, the NBR, and the real-time bit (RT). The MBR is the only parameter that keeps changing in an active flow. The other two (NBR and RT) are static parameters agreed upon in the SLA. The SIMA access node may also mark the RT value based on the application requirements. When RT service is requested, the SIMA network attempts to minimize the delay and delay variation. This is achieved by using two queues in the core network routers, as indicated in Figure 1.
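The access node derives a 3-bit priority from the relation between MBR and NBR, then writes it together with the RT bit into the DS field. The exact mapping and bit positions are defined by the SIMA specification [2] and are not given in this excerpt. The sketch below therefore assumes a logarithmic rule: one priority level per doubling of MBR/NBR, level 4 when MBR equals NBR, clamped to 0..7. It also assumes that the priority occupies the top three TOS bits, with the RT flag directly below. Both choices, and all names, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SIMA_BASE_PRIORITY 4   /* assumed level when MBR == NBR */
#define SIMA_MAX_PRIORITY  7   /* largest value of a 3-bit field */

/* Hypothetical mapping: priority = 4 - floor(log2(MBR / NBR)), clamped to 0..7,
 * computed by repeated halving/doubling so no libm is needed. A flow sending
 * above its NBR gets a lower priority; one sending below it gets a higher one. */
int sima_priority(double mbr, double nbr)
{
    assert(mbr > 0.0 && nbr > 0.0);
    double x = mbr / nbr;
    int p = SIMA_BASE_PRIORITY;
    while (x >= 2.0 && p > 0) {                 /* each doubling above NBR: one level down */
        x /= 2.0;
        p--;
    }
    while (x < 1.0 && p < SIMA_MAX_PRIORITY) {  /* each halving below NBR: one level up */
        x *= 2.0;
        p++;
    }
    return p;
}

/* Hypothetical bit layout: priority in TOS bits 7..5, RT flag in bit 4. */
uint8_t sima_tos(int priority, int real_time)
{
    assert(priority >= 0 && priority <= SIMA_MAX_PRIORITY);
    return (uint8_t)((priority << 5) | ((real_time ? 1 : 0) << 4));
}
```

For example, a flow sending at three times its NBR lands at level 3, and one at half its NBR at level 5. The resulting TOS byte would then go through the checksum recalculation described in the introduction.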
[Figure 1 (not reproduced): Core Network Node, with RT and NRT traffic, a real-time queue, and the output port]

Fig. 5. Accessing the FPGA Card in the Linux Environment [figure not reproduced; recoverable labels: user process, I/O read/write, memory read/write, mapped FPGA card]
4 Floating Point Notation
In the early prototype, integer values were used for the SIMA parameters. However, the number of bits was limited and insufficient for the needed granularity. This led to very coarse-grained ρ and α values. As a result, the allocated priorities did not feature any dynamics and saturated to some static values. Based on those test cases, it was obvious that a special floating point data type was needed. We came up with the four different floating point notations presented in Table 2. The e2m4 notation indicates that a total of six bits is used for the data type: two bits for the exponent and four bits for the mantissa (see Figure 6). The integer data type was sufficient for MBR values, as the test customers featured low bit rates. However, the range of MBR could be expanded with a similar floating point notation.
Table 2. SIMA Parameter Types
Parameter    Type         Min value    Max value
ρ            e2m4         0.0001       0.9
α            e2m3         0.0001       0.8
Δl/L         e2m3         0.0001       0.8
C (kB/s)     e4m0         10^0 = 1     10^15
MBR          int (8-bit)  0            255

Fig. 6. SIMA Floating Point Data Types [figure not reproduced; bit layouts: e2m4 = 2-bit EXP + 4-bit MAN; e2m3 = 2-bit EXP + 3-bit MAN; e4m0 = 4-bit EXP]
The main program running on Linux is written in C. The SIMA parameters in the main program, as well as in the SLA, use standard C floating point data types. Before the parameters are sent to the SRAM on the hardware acceleration card, they must be converted to the notation used in the FPGA. Similarly, when the parameters are transferred back to the main program, a reverse conversion is needed. Examples of these conversion functions are presented for the e2m4 data type. In the reverse conversion, the e2m4 value is used as an index into a table that returns a standard C float.
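Conversion Function from float to e2m4

unsigned char float_to_e2m4(float f)
{
    int man = 0, exp = 0, i;   /* values below the minimum 0.0001 encode as 0 */

    if (f >= 0.9) {            /* saturate at the maximum value 0.9 */
        man = 9;
        exp = 3;
    } else if (f >= 0.0001) {
        /* multiply by 10 until f >= 1; exponent i means value = man * 10^(i-4) */
        for (i = 3; i >= 0; i--) {
            f = (float)(f * (float)10.0);
            if (f >= (float)1.0) {
                man = (int)(f + 0.5);   /* round to the nearest mantissa */
                exp = i;
                break;
            }
        }
    }
    return (unsigned char)(man + exp * 16);   /* EXP in bits 5-4, MAN in bits 3-0 */
}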
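The reverse direction is described as a table lookup indexed by the e2m4 code, but the table initialization loop is truncated in this copy. Reading `float_to_e2m4` above, a code keeps the mantissa in its low four bits and the exponent in the next two, and represents man × 10^(exp-4). The sketch below builds a 64-entry table on that reading. The names `e2m4_table`, `init_e2m4_table` and `e2m4_to_float` are hypothetical and are not the authors' code.

```c
#include <assert.h>

/* 2 exponent bits + 4 mantissa bits = 6-bit codes, so 64 table entries. */
static float e2m4_table[64];

/* Fill the table so that code = MAN + 16 * EXP decodes to MAN * 10^(EXP - 4),
 * the inverse of float_to_e2m4 (e.g. EXP = 3, MAN = 9 gives 0.9). */
void init_e2m4_table(void)
{
    static const float scale[4] = { 0.0001f, 0.001f, 0.01f, 0.1f }; /* 10^(EXP-4) */
    for (int j = 0; j < 64; j++) {
        int man = j & 0x0F;          /* low four bits */
        int expo = (j >> 4) & 0x03;  /* next two bits */
        e2m4_table[j] = (float)man * scale[expo];
    }
}

/* Reverse conversion: the e2m4 code is simply an index into the table. */
float e2m4_to_float(unsigned char code)
{
    assert(code < 64);
    return e2m4_table[code];
}
```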
Initialization of the Conversion Table from e2m4 to float
for(j=0;j
[Figure 6 (not reproduced): bar chart of configuration delays for the bars labelled #1-#4. Legend: total time used in the QoS handler; total time used for handling requests before contacting the media server; total time used at the active gateway for sending a request and receiving the READY_TO_SEND packet from the media server; total time used at the client from sending a request to the active gateway until receiving the READY_TO_SEND packet]
Fig. 6. Configuration delays in each component for handling a new connection request (without authentication) in the Primary Active Gateway (Scenario 1)
3. Experiment: This experiment runs Scenario 1 and measures the configuration time and the dynamic service delegation and linking time at the primary active gateway, as well as the end-to-end configuration time. The results in Figure 6 show that the dynamic service delegation and linking times in the QoS handler are similar to the values shown in Figure 5. The difference lies in the end-to-end configuration times, due to the absence of authentication.
Evaluation of Experiments: We classify the results into two categories and discuss each category separately. The first category comprises the results of experiments that do not perform authentication at each configuration and reconfiguration request. The second category includes the results of experiments that perform an authentication check at each configuration and reconfiguration request.
1. Category: For this set of experiments (Experiment 3), the results indicate that using the active network concept to configure and reconfigure active gateways (primary or substitute) with a dynamic loading service is applicable, and that we can provide flexible customization of QoS services at the gateways for multimedia communication. The flexible customization of QoS services can be done within the required time bound of 5 seconds. A larger configuration delay is acceptable if the quality of the connection improves. Furthermore, our goal is to bound the configuration time so that the configuration can happen during the multimedia transmission, without the user noticing, if a new QoS service needs to be installed or another substitute active gateway needs to be reconfigured. In this case, the duration of the configuration and reconfiguration influences how long the multimedia transmission service will be disrupted, during which QoS degradation may occur. The current delays of 1-2 seconds are acceptable, and the user of the VOD application does not perceive major QoS degradation.
2. Category: The results of Experiment 2 show large delays for authentication, and hence large and unacceptable overall end-to-end configuration times. Because of these unacceptable delays, authentication upon a configuration request is not performed on the current active gateways. As mentioned above, the large overheads are due to the currently available JDK 1.1.5 Security Package. We believe that the next generation of the JDK security implementation will reduce these overheads, and that authentication will then become part of the active gateway framework to ensure a secure active environment. In the meantime, there are several alternative solutions: (a) establish two connections between the client and the media server, through the primary and the substitute active gateways, and perform authentication only during the initial connection setup, where the large end-to-end configuration delay does not interfere with the multimedia transmission; (b) establish one connection between the client and the media server through the primary active gateway and perform authentication only during the initial connection setup; if the primary gateway goes down, we inform the user, tear down the connection, and establish a new connection with authentication through the substitute gateway; (c) let a centralized security management authenticate users' access to any gateway within the network in advance, giving users permission to establish connections and configure QoS services at any time. These solutions have different trade-offs among the number of connections to maintain, reconfiguration times, management complexity, and scalability.
6. Conclusions
The major contribution of this work is a flexible active gateway architecture based on active network concepts. Our active gateway architecture allows flexible configuration and reconfiguration of QoS services for multimedia communication in a timely fashion. Our objective was to bound the configuration and reconfiguration delays so that this active capability can be exercised during the multimedia transmission with minimal QoS degradation, and this objective was achieved. Our results show that, without security, the configuration delay of the active gateway framework can be bounded below 2 seconds in a LAN environment. They also show that acceptable end-to-end configuration delays, as well as dynamic service delegation and linking times, can be achieved to support minimal or no QoS degradation.
A Study of the Impact of Network Loss and Burst Size on Video Streaming Quality and Acceptability
David Hands and Miles Wilkins
BT Laboratories, Martlesham Heath, Ipswich, Suffolk, UK.
[email protected], [email protected]
Abstract. As multimedia IP services become more popular, commercial service providers are under pressure to maximise their utilisation of network resources whilst still providing a satisfactory service to the end users. The present work examines user opinions of quality and acceptability for video streaming material under different network performance conditions. Subjects provided quality and acceptability opinions for typical multimedia material presented under different network conditions. The subjective test showed that increasing loss levels result in poorer user opinions of both quality and acceptability. Critically, for a given percentage packet loss, the test found that increasing the packet burst size produced improved opinions of quality and acceptability. These results suggest that user-perceived Quality of Service can be improved under network congestion conditions by configuring the packet loss characteristics: larger bursts of packet loss occurring infrequently may be preferable to more frequent, smaller bursts.
1 Introduction
Quality of Service (QoS) is defined by the International Telecommunication Union (ITU) in Recommendation E.800 as the "collective effect of service performances that determine the degree of satisfaction by a user of the service" [1]. Within this definition, QoS is closely related to the users' perception and expectations. These aspects of QoS are mostly restricted to the identification of parameters that can be directly observed and measured at the point at which the service is accessed, in other words the users' perception of the service. E.800 also defines "Network Performance" (NP) as the parameters that are meaningful to the network provider. These NP parameters are expressed in terms that can be related directly to users' QoS expectations. This is subtly different from the network engineering meaning of QoS, which has traditionally been concerned with technical parameters such as error rates and end-to-end delay. The service provider can measure the provided network QoS (i.e. the operator's perception of QoS), map this to E.800 NP parameters, and then compare this with the measured customers' perception of QoS.
The user's perception of QoS will depend on a number of factors, including price, the end terminal configuration, video resolution, audio quality, processing power, media synchronisation, delay, etc. The influence of these factors on users' perception of QoS will be affected by the hardware, coding schemes, protocols and networks used. Thus, network performance is one factor that affects the users' overall perception of QoS. For multimedia applications, previous work has identified the most important network parameters as throughput, transit delay, delay variation and error rate [2]. The required network service quality does not necessarily coincide with the conditions at which the effects of delay, error rate and low throughput are perceptible to the user. Previous results [3], [4], [5] have shown that users can accept both loss and delay in the network, up to a certain point. However, beyond this critical network performance threshold users will not use the service. It has also been shown that a significant component of the total end-to-end delay experienced by the user is caused by traffic shaping in the network [6].
IP networks, such as the Internet and corporate intranets, currently operate a 'best-effort' service. This service is typified by first-come, first-served scheduling of data packets at each hop in the network. As demands for network resources increase, the network can become congested, leading to increased delays and eventually packet loss. The best-effort service has generally worked satisfactorily to date for electronic mail, World Wide Web (WWW) access and file-transfer type applications. Such best-effort networks are also capable of supporting streaming video. Video streaming requires packets to be transported by the network from the server to the client. As long as the network is not too heavily loaded, the stream will obtain the network resources (bandwidth, etc.) that it requires. However, once a certain network load is reached, the stream will experience packet delays and packet loss. Best-effort networks assume that users will be co-operative, that is, that they will share the available network resources fairly. This works if the applications can reduce their network traffic in the face of congestion.
Current video streaming applications used in intranets do not do this: they continue to send traffic into a loaded network. Data loss, whether caused by congestion in network nodes (IP routers or ATM switches) or by interference (e.g. electrical disturbance), is an unfortunate characteristic of most data networks. Network loss is generally not a problem when downloading data to be stored at the client, as the user can wait for all the data to be sent and any missing data can be resent. However, in real-time video streaming, which often requires the transfer of large numbers of data packets across a network, any loss of data directly affects the reproduction quality (i.e. seamlessness) of the video stream at the receiver. If traffic is buffered at the receiver, it may be possible to request re-transmission of missing data. However, this is only feasible if the buffer play-out time is long enough for the re-transmission to occur. The impact of network data loss on the quality of real-time streamed video services depends on three factors:
• Amount of loss. As the percentage of data packets lost increases, the quality of service tends to decrease.
• Burst size. Data loss can occur as a number of consecutive packets (for a particular stream) being dropped.
• Delay and delay variation. Ideally, packets will be received with the same inter-packet time intervals as when they were sent. However, the end-to-end network delay experienced by each packet varies according to network conditions, so some packets may not arrive when expected. Receivers are required to buffer received traffic in order to smooth out these variations in arrival time. In most video streaming applications, the buffer is large enough to accommodate the majority of variations in network delay.
Therefore, the loss characteristics most likely to affect users' perception of video streaming performance are the amount of loss and the burst size. Research using both human subjects and computer simulations has shown that the amount of loss can have a dramatic effect on quality perception and on the readiness of users to accept a service [3], [4], [5], [7]. However, in this research the burst size was not controlled (i.e. burst size could vary within each level of loss). The relationship between loss and burst size, and its effect on users' opinions of service quality and acceptability, is not known. In addition to the network loss characteristics, the perceived quality of service can be influenced by the manner in which a particular application streams the audio/video media. For example, audio and video information may be sent together in an IP packet or as two separate streams. The effect of network QoS on different streaming methods, and the resulting effect on user perceptions of service quality, will influence the design of the next generation of real-time streaming applications. This paper examines the impact of different levels of network performance on users' perception of video streaming service quality and acceptability. Data obtained from five different industrial and academic intranets were analysed to determine the performance of each network. Using the results of the network performance analysis, a subjective test was run in which subjects provided quality and acceptability judgements for test clips presented under a number of network conditions. The test focuses on the effects of network loss and burst size on judgements of quality and acceptability. The implications of the test results are discussed with particular reference to future network design to support acceptable user perceptions of video streaming services.
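The buffering trade-off described above can be sketched in a few lines. This is a minimal illustration, assuming a fixed playout delay measured from each packet's send time and synchronised clocks; the function names are hypothetical and nothing here is taken from the applications studied in this paper.

```python
def late_packets(send_times, recv_times, playout_delay):
    """Indices of packets that arrive after their playout deadline.

    A packet sent at time s is assumed to be played out at s + playout_delay;
    arriving later than that makes it as useless as a lost packet.
    Lost packets are given a receive time of None and are not counted as late.
    """
    late = []
    for i, (sent, received) in enumerate(zip(send_times, recv_times)):
        if received is not None and received > sent + playout_delay:
            late.append(i)
    return late


def retransmission_feasible(detect_delay, round_trip_time, playout_delay):
    """A missing packet can only be recovered if noticing the gap and
    fetching a fresh copy both complete within the playout delay."""
    return detect_delay + round_trip_time <= playout_delay


# Times in seconds; packet 2 is held up by queuing in the network.
send = [0.00, 0.10, 0.20, 0.30]
recv = [0.05, 0.16, 0.58, 0.36]
print(late_packets(send, recv, playout_delay=0.25))   # [2]
print(retransmission_feasible(0.10, 0.08, 0.25))      # True
```

A larger playout delay absorbs more delay variation and leaves more time for re-transmission, at the cost of a longer start-up delay for the viewer.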
2 Examination of Network Performance
Before subjective testing could begin, a range of realistic values for packet loss and burst length was required. The applications under study are designed to be used in corporate intranets rather than the public Internet, so the requirement was to determine the loss characteristics that these applications might encounter in an intranet. A number of tools are available for measuring network performance. However, these generally do not generate 'realistic' traffic, and it is unlikely that arbitrary streams of packets will experience the same loss and delay as real applications. For this reason, a test tool that generates traffic with characteristics similar to those of the applications under study was created. Clients and servers were connected to a stand-alone Ethernet LAN, and the traffic generated by a test video sequence was captured by a third machine running the Unix tool tcpdump. Analysis of the captured traffic gave each application's approximate traffic characteristics (packet size, packet inter-arrival time and bandwidth), as shown in Table 1. The MGEN/DREC toolset [8] was used to generate and collect traffic streams. A script was written for each application which caused the MGEN tool to generate traffic with characteristics (packet size, packet inter-arrival time and bandwidth) similar to those of the actual application. The traffic generated by the MGEN scripts was analysed to verify that it was indeed similar to that produced by the real applications. The MGEN scripts were then used to perform testing on five partners' intranets. At each site a
number of 5-minute tests were run consecutively, totalling several hours of continuous testing. The DREC tool was used to capture the received traffic. The MGEN tool writes a timestamp and sequence number into each packet, so analysis of the traffic logs revealed the delay, delay variation and packet loss caused by the intranet under test. The logs from all the intranets under study were analysed to determine the range of packet loss characteristics that had been experienced.
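The log analysis described above can be sketched as follows. This is a minimal illustration, assuming each received packet is logged as a (sequence number, send timestamp, receive timestamp) tuple with synchronised sender and receiver clocks; it does not reproduce the actual DREC log format, and delay variation is taken here as the standard deviation of one-way delay, which is only one of several common definitions.

```python
from statistics import mean, pstdev


def analyse_log(records, first_seq, last_seq):
    """Summarise loss, burst lengths and delay from (seq, sent, received) records."""
    received = {seq for seq, _, _ in records}
    expected = last_seq - first_seq + 1
    # Walk the expected sequence range; consecutive missing numbers form a burst.
    bursts, run = [], 0
    for seq in range(first_seq, last_seq + 1):
        if seq in received:
            if run:
                bursts.append(run)
            run = 0
        else:
            run += 1
    if run:
        bursts.append(run)
    delays = [recv - sent for _, sent, recv in records]
    return {
        "loss_percent": 100.0 * (expected - len(received)) / expected,
        "burst_lengths": bursts,
        "mean_delay": mean(delays),
        "delay_variation": pstdev(delays),
    }


# Hypothetical log: packets 2, 3 and 7 never arrived.
log = [(0, 0.0, 0.020), (1, 0.1, 0.125), (4, 0.4, 0.430), (5, 0.5, 0.520),
       (6, 0.6, 0.622), (8, 0.8, 0.821), (9, 0.9, 0.925)]
summary = analyse_log(log, first_seq=0, last_seq=9)
print(summary["loss_percent"])    # 30.0
print(summary["burst_lengths"])   # [2, 1]
```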
Application A (450 kbit/s)
  Audio & video (sent together): a 5763-byte packet every 100 ms
Application B (1.2 Mbit/s)
  Audio: 2 consecutive 1000-byte packets every 100 ms
  Video: 12 consecutive 1100-byte packets followed by a 100 ms gap
Table 1. Network traffic characteristics for both applications.
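As a consistency check on Table 1, the average bit rate of each packet pattern can be recomputed. The sketch assumes that each burst is sent back-to-back, so that the cycle length is approximately 100 ms; the helper name is hypothetical.

```python
def bit_rate(packets_per_cycle, packet_bytes, cycle_seconds):
    """Average bit rate (bit/s) of a periodic packet pattern."""
    return packets_per_cycle * packet_bytes * 8 / cycle_seconds


app_a = bit_rate(1, 5763, 0.1)            # audio and video sent together
app_b_audio = bit_rate(2, 1000, 0.1)
app_b_video = bit_rate(12, 1100, 0.1)     # burst duration assumed negligible
print(round(app_a))                        # 461040
print(round(app_b_audio + app_b_video))    # 1216000
```

The results, about 461 kbit/s and 1.22 Mbit/s, are close to the nominal 450 kbit/s and 1.2 Mbit/s quoted for the two applications.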
3 Subjective Evaluation of Network Performance
The purpose of the subjective quality experiment was to investigate the effects of loss and burst size on human opinions of video quality and its acceptability. A total of 24 subjects took part in the experiment. All subjects had experienced video over the corporate intranet. This experience was considered important so that the subjects' frame of reference would be computer-mediated video rather than broadcast television.
3.1 Test Equipment
A small test network (see Figure 1) was built to perform the experiments. The key component is a PC with two Ethernet cards and software that bridges traffic between the two network cards. This software (called Napoleon) was developed at BT Labs and allows the behaviour of the bridging function to be manipulated. The bridge can be configured to introduce controlled amounts of packet loss and delay, and the number of consecutive packets lost can also be defined. One Ethernet port was directly connected to a client PC configured with the application viewer programs. The other Ethernet port was connected to an Ethernet hub, to which two further PCs (300 MHz PII) were attached. These acted as the content manager and server: video sequences were stored on the server, and the content manager controlled the organisation of files held on the server. Thus, by configuring the Napoleon bridge, the loss characteristics of the network between video server and client could be manipulated. Using the results from the intranet characterisation described above, the network performance of a 'typical' corporate network could be emulated.
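The internals of Napoleon are not described here, so the following is only a hypothetical sketch of one way a bridge could drop packets to approximate a target loss rate with bursts drawn from a chosen size range. Outside a burst, each packet starts a new burst with probability p, chosen so that the mean burst length divided by the mean burst-plus-gap length equals the target rate; the class name and parameters are illustrative assumptions.

```python
import random


class BurstDropper:
    """Hypothetical bridge loss model: drops bursts of min_burst..max_burst
    consecutive packets so that about loss_rate of all packets are lost."""

    def __init__(self, loss_rate, min_burst, max_burst, seed=None):
        self.min_burst, self.max_burst = min_burst, max_burst
        mean_burst = (min_burst + max_burst) / 2
        # With start probability p the mean gap between bursts is (1 - p) / p,
        # so loss = mean_burst / (mean_burst + gap). Solving for p gives:
        self.p_start = loss_rate / (mean_burst * (1 - loss_rate) + loss_rate)
        self.remaining = 0                 # packets left to drop in this burst
        self.rng = random.Random(seed)

    def forward(self):
        """Return True to forward the next packet, False to drop it."""
        if self.remaining == 0 and self.rng.random() < self.p_start:
            self.remaining = self.rng.randint(self.min_burst, self.max_burst)
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True


dropper = BurstDropper(loss_rate=0.01, min_burst=6, max_burst=7, seed=1)
decisions = [dropper.forward() for _ in range(200_000)]
print(decisions.count(False) / len(decisions))   # roughly 0.01
```

Separating the loss rate from the burst-size range in this way is what allows conditions such as "1% loss, burst 6-7" and "1% loss, burst 9-10" to differ only in burst structure.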
Fig. 1. Test environment used in the subjective test.
The server was installed with two video streaming applications, termed Application A and Application B. Video files could be accessed and viewed from the client using appropriate viewers for both applications. Audio was presented using Altec Lansing ACS41 speakers.
3.2 Method
Seven different audio-video sequences were recorded from off-air broadcasts as MPEG-1 files at 1.3 Mbit/s. Four clips were used as practice sequences and three clips as test sequences. The practice sequences were presented using the range of network characteristics present in the test sequences. The selection of test material was based on typical business-oriented video streaming content. Each of the recorded sequences was edited down to 10 s duration and stored on the server. The details of each sequence were entered into the content manager as both Application A and Application B files, thereby enabling the client to access the files from the server. The content of each test sequence is provided in Table 2. The video viewer window on the client was set to 160 x 120 pixels for both applications.
News1
  Video content: Studio, female newsreader, head and shoulders, plain backdrop
  Audio content: Presenter reports a recent crime story
News2
  Video content: Outside broadcast, man enters building followed by several photographers
  Audio content: Commentary reporting some legal proceedings
Finance
  Video content: Male presenter, coloured backdrop, scrolling stock prices
  Audio content: Presenter reports business story
Table 2. Description of audio-video content for the test sequences used in the subjective test.
Expert examination of the performance of Application A and Application B files prior to the test found that Application B was more tolerant of higher loss levels. Indeed, Application A appeared unusable at loss levels above 1%. Table 3 shows the network conditions used in the present study.
Condition   % Loss   Burst size
1*          0        0
2*          0.5      1-2
3*          0.5      6-7
4*          1        1-2
5           1        6-7
6           1        9-10
7           4        1-2
8           7        1-2
Table 3. Network conditions used in the subjective test. * indicates the conditions used to present Application A files; Application B files were presented under all eight network conditions.
For Application A test trials, all three test sequences were presented under four network conditions (conditions 1-4 in Table 3). In test trials presented using Application B, each test sequence was presented at each of the eight network performance levels. At the end of each test presentation, four windows appeared sequentially on the client monitor, requesting subjects to make three quality ratings (overall quality, video quality and audio quality) and an acceptability rating. Table 4 shows the questions used in the test. Quality assessments were made using the 5-grade discrete category scale (Excellent, Good, Fair, Poor, Bad) [9], [10], [11]. Acceptability judgements were made using a binary scale ('Acceptable' and 'Not Acceptable'). Thus, for each trial, subjects made four responses (overall quality rating, video quality rating, audio quality rating and acceptability). A LabView program controlled the presentation of the quality and acceptability questions and was used to collect and store the response data. At the end of the experiment, subjects were asked to complete a short questionnaire (see Appendix). The aim of the questionnaire was to obtain feedback on the criteria subjects used to arrive at opinions of quality and acceptability.
Question type     Question text
Overall quality   "In your opinion the OVERALL quality for the last clip was"
Video quality     "In your opinion the VIDEO quality for the last clip was"
Audio quality     "In your opinion the AUDIO quality for the last clip was"
Acceptability     "Do you consider the performance of the last clip to be:"
Table 4. Questions used in subjective test.
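The design above fixes the number of test trials and responses per subject. A minimal sketch that enumerates them, assuming every subject saw every test trial and excluding the four practice clips; presentation order is not specified in the text, so none is modelled.

```python
from itertools import product

SEQUENCES = ["News1", "News2", "Finance"]
CONDITIONS_A = [1, 2, 3, 4]            # starred conditions in Table 3
CONDITIONS_B = list(range(1, 9))       # all eight conditions
QUESTIONS = ["Overall quality", "Video quality", "Audio quality", "Acceptability"]

trials = ([("A", seq, cond) for seq, cond in product(SEQUENCES, CONDITIONS_A)]
          + [("B", seq, cond) for seq, cond in product(SEQUENCES, CONDITIONS_B)])

print(len(trials))                      # 36 test trials per subject
print(len(trials) * len(QUESTIONS))     # 144 responses per subject
```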
4 Results
For each test sequence, the subjective test data showed a similar response pattern for each network condition. Therefore, to simplify interpretation of the test results, the results presented here are averaged across all three test sequences for each network condition. This section will examine subjects' quality responses first, followed by a description of the acceptability data.
4.1 Quality Ratings
Application B
Each subject's responses to the three quality questions (overall, video and audio quality) for each test trial were collected and Mean Opinion Scores (MOS) calculated. The results for each quality question are shown in Figure 2. Inspection of Figure 2 shows that both the amount of loss and the loss burst size were important to subjects' quality opinions, and this was true for all three quality questions. Increasing network loss generally led to a reduction in quality ratings. Not surprisingly, zero loss resulted in the best quality ratings. Quality ratings tended to become progressively poorer as the amount of loss in the network was increased from 0.5% through to 7%, and the poorest quality ratings were found with 7% loss. Examination of the effect of network performance on the separate quality questions shows that subjective responses to the video quality question were poorer than responses to the audio quality question. Video quality was rated worse than audio quality in every network condition, and the difference was especially pronounced at higher loss levels. Overall quality opinions appear to be derived from some form of averaging of video quality and audio quality. The effect of burst size is especially interesting (see Figure 3). At both the 0.5% and 1% loss levels, quality ratings were worse for the 1-2 packet burst size than for the 6-7 packet burst size. Larger burst sizes at 1% loss (namely 6-7 and 9-10 packets) produced better quality ratings than a burst size of 1-2 at the same loss level, and a burst size of 6-7 resulted in slightly better quality ratings than the largest burst size (9-10). Further, the larger burst sizes at the 1% loss level were rated as roughly equivalent to a 1-2 burst size with 0.5% loss.
[Figure 2 plot: MOS (1-5) for overall, video and audio quality under each network condition (loss, burst size)]
Fig. 2. MOS for each network condition for the Application B test trials.
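Each MOS reported here is an arithmetic mean of category ratings. A minimal sketch, assuming the conventional numeric mapping of the 5-grade scale (Excellent = 5 down to Bad = 1) and pooling responses by network condition and question, in line with the averaging across test sequences described above; the helper names are hypothetical and the example responses are invented, not data from this test.

```python
from statistics import mean

SCALE = {"Excellent": 5, "Good": 4, "Fair": 3, "Poor": 2, "Bad": 1}


def mos(ratings):
    """Mean Opinion Score of a list of category labels."""
    return mean(SCALE[r] for r in ratings)


def mos_by_condition(responses):
    """responses: iterable of (condition, question, label) tuples, pooled
    over subjects and test sequences."""
    groups = {}
    for condition, question, label in responses:
        groups.setdefault((condition, question), []).append(label)
    return {key: mos(labels) for key, labels in groups.items()}


# Invented responses for illustration only.
example = [("1% loss, burst 1-2", "video", "Poor"),
           ("1% loss, burst 1-2", "video", "Fair"),
           ("1% loss, burst 1-2", "audio", "Good")]
print(mos_by_condition(example))
```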
Fig. 3. Overall quality ratings for each network condition.
Application A
MOS derived from the Application A trials are shown in Figure 4. The Application A MOS data follow a very different trend from the Application B data. At zero loss, the quality ratings are similar to those obtained with zero loss using Application B. However, as soon as any loss is introduced into the network, quality ratings for Application A trials were severely affected. Increasing the level of packet loss had minimal further impact on quality ratings for Application A trials, perhaps because even very small levels of loss already produced a low quality baseline. Similarly, burst size had little or no effect on the MOS for Application A trials. Unlike Application B trials, video quality and audio quality ratings were equally affected by network loss for Application A files.
[Figure 4 plot: MOS (1-5) for overall, video and audio quality under each network condition (loss, burst): 0; 0.5%, 1-2; 0.5%, 6-7; 1%, 1-2]
Fig. 4. MOS for each network condition for the Application A test trials.
4.2 Acceptability Responses
Acceptability opinions for each test condition were based on a binary choice: acceptable or not acceptable. The acceptability data were collected from subjects' responses for each test trial. Figure 5 displays the percentage of subjects reporting 'acceptable' opinions for each network condition for both Application A and Application B trials. Figure 5 clearly shows that Application A is especially sensitive to loss. Even small levels of loss cause the video streaming performance to be considered unacceptable (e.g. with 0.5% loss, only 16.7% of subjects found the performance acceptable). Test sequences presented using Application B were found to be considerably more robust to network loss: loss levels up to and including 1% were found to be acceptable by over 70% of subjects. The acceptability data support the results of the quality ratings, with loss and burst size both affecting the acceptability of the video streams. As the loss level increased, acceptability decreased, although only the 7% loss level resulted in acceptability falling below 50% of subjects. Larger burst sizes were found to improve the acceptability of the video streaming, with burst sizes of 6-7 and 9-10 being more effective in maintaining acceptability than a burst size of 1-2. However, increasing burst size from 6-7 to 9-10 produced a decrease in acceptability, suggesting that an optimum burst size exists.
[Figure 5 chart: % acceptability for Application A and Application B under each network condition (loss, burst): 0; 0.5%, 1-2; 0.5%, 6-7; 1%, 1-2; 1%, 6-7; 1%, 9-10; 4%, 1-2; 7%, 1-2]
Fig. 5. Percentage of subjects reporting acceptable opinions of video streaming performance for Application A and Application B trials.
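The acceptability figures are proportions of binary responses. A minimal sketch with a hypothetical helper; the example only shows that 4 'Acceptable' answers out of 24 would give 16.7%, and the text does not state exactly how responses were pooled across the three test sequences.

```python
def percent_acceptable(responses):
    """Percentage of binary judgements that were 'Acceptable'."""
    if not responses:
        raise ValueError("no responses")
    accepted = sum(1 for r in responses if r == "Acceptable")
    return 100.0 * accepted / len(responses)


votes = ["Acceptable"] * 4 + ["Not Acceptable"] * 20   # hypothetical tally
print(round(percent_acceptable(votes), 1))             # 16.7
```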
4.3 Questionnaire
The responses to the questionnaire (see Appendix) showed that all subjects were aware of both audio and video errors being present in the test sequences. Subjects reported four types of error: video pauses, blockiness in the video, loss of audio, and audio-video asynchrony. Of these, loss of audio information was considered the most serious by 14 subjects, while 6 subjects identified video errors as the most
serious, with 4 subjects reporting audio and video errors to be equally serious. The majority of subjects did not find any type of error disturbing, although 3 subjects noted that rapid pause-play loss behaviour was slightly disturbing (described by one subject as a 'kind of strobe effect') and 3 subjects described audio errors as disturbing. Acceptability opinions were most affected by loss of meaning, and the majority of subjects reported that, provided the audio information remained intact, they considered the performance to be acceptable, as long as the video was not too bad. These responses are rather vague: what does 'as long as the video was not too bad' mean? It may be concluded that audio was of primary importance in the formation of acceptability opinions, but that video was an important secondary consideration. A point raised by two subjects was that the material used in this test was audio dominant (i.e. the message could be understood by listening to the audio alone). It would therefore be of interest to extend the scope of this study to visually dominant material (e.g. sports events). According to the subjects in this test, the acceptability of video streaming material would be improved by making the computer monitor more like a television (common responses to this question were to use a larger video window or surround sound).
5 Discussion
The present work represents an important advancement in our knowledge and understanding of the relationship between network behaviour and users' perception of quality and acceptability for video streaming applications. The results described above have practical implications for:
• enhancing the performance of networks employed in real-time video streaming
• improving user-perceived QoS
• development of video streaming application technologies
• understanding how human users arrive at quality opinions
IP networks can exhibit various patterns of loss, and although important work [3], [4], [5] has identified global effects of loss and delay on users' opinions of QoS, our understanding of how network behaviour impacts on users' perceptions of quality and acceptability remains limited. Describing a network's QoS as a percentage of packet loss can be misleading. For example, two different networks may both be characterised as exhibiting 5% loss. This paper has shown that the effect on the user perceived QoS will be very different if one network regularly drops a small number of packets and the other network only suffers occasional losses of large numbers of packets. Therefore, in understanding the effects of network behaviour on users' opinions of network performance, it is necessary to obtain network data on the frequency, burst size and temporal characteristics of packet loss. The present work clearly shows that burst size has considerable influence on subjects' opinions of quality and acceptability for video streaming data. Burst sizes of 1-2 were found to be more detrimental to quality and acceptability compared to larger burst sizes. This result suggests that networks exhibiting frequent, small packet losses may provide poorer user perceived QoS compared to a network showing infrequent large packet
losses. The implication of this result is that network protocols would be more efficient, and would provide improved user-perceived QoS, if they were designed to conform to a mode of loss behaviour in which bursts were large but occurred infrequently. Within EURESCOM Project P807 the configuration of network QoS techniques to achieve this aim is under study. The two video streaming applications used in the subjective test produced very different user responses for both quality and acceptability. Video streaming using Application A was particularly sensitive to loss, whereas Application B was reasonably robust to all but the highest loss levels. Application A streamed video and audio packets together, and although the application has a built-in priority for protecting audio packets, loss in the network was shown to produce a dramatic drop in quality ratings for both audio and video. As a result, overall quality ratings for Application A trials incorporating some degree of loss typically failed to reach 'fair' on the 5-grade quality scale, and few subjects considered the performance to be acceptable. Application B, on the other hand, streamed constant-size video packets together with occasional audio packets containing a large amount of audio data. The consequence of this method of streaming is that video packets have a much higher risk of being lost than audio packets. The subjective test data show that overall quality ratings tended to be arrived at by some form of averaging of the audio and video qualities. Thus, providing high-quality audio had the effect of improving both quality and acceptability ratings.
Protecting the audio packets appears to be especially important, given that the questionnaire responses identified audio quality as particularly important for understanding the message contained in the test material, and that audio was reported by subjects as being more critical in forming both quality and acceptability judgements. The streaming technique employed by Application B is clearly advantageous in maintaining user-perceived QoS.
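The earlier point that a single loss percentage can hide very different loss patterns is easy to illustrate. The sketch below builds two deterministic loss traces with the same 5% loss rate, one with frequent single-packet losses and one with rare 10-packet bursts; the periods and burst sizes are arbitrary illustrative choices, not measured data.

```python
def loss_pattern(n_packets, period, burst):
    """Deterministic trace (True = lost): `burst` consecutive losses at the
    start of every `period` packets."""
    return [i % period < burst for i in range(n_packets)]


def summarise(trace):
    """Return (loss percentage, number of loss bursts, mean burst length)."""
    runs, run = [], 0
    for lost in trace + [False]:       # sentinel closes a trailing burst
        if lost:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    loss = 100.0 * sum(trace) / len(trace)
    mean_burst = sum(runs) / len(runs) if runs else 0.0
    return loss, len(runs), mean_burst


frequent_small = loss_pattern(2000, period=20, burst=1)
rare_large = loss_pattern(2000, period=200, burst=10)
print(summarise(frequent_small))   # (5.0, 100, 1.0)
print(summarise(rare_large))       # (5.0, 10, 10.0)
```

Both traces would be reported as "5% loss", yet one interrupts the stream a hundred times and the other only ten times.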
6 Conclusions
This paper examined the effects of network loss and packet burst size on subjective opinions of quality and acceptability. A subjective test, using network parameters derived from real network data, found that both quality and acceptability judgements were affected by the amount of loss exhibited by a network and by packet burst size. Quality and acceptability judgements were reduced (i.e. became worse) as the amount of network loss was increased. Test results comparing different burst sizes for a constant level of network loss showed that small 1-2 burst sizes resulted in poorer user opinions compared to larger burst sizes (packet bursts of 6-7 and 9-10). Increasing the burst size beyond 6-7 did not lead to any further improvement in quality or acceptability judgements. This latter result suggests that an optimum burst size may exist, and that for the present test data the optimum burst size was 6-7. The findings of this test indicate that designing a network to lose larger numbers of packets simultaneously, provided packet loss occurs relatively infrequently, can enhance users' perceptions of QoS. There is still a great deal of research required before the impact of network behaviour on users' opinions of QoS can be fully characterised. For example, the present work employed audio dominant material. It would be useful to understand the
effects of loss and burst size on video dominant material. This is particularly important for streaming technologies such as Application B, where video packets have a far greater probability of being lost than audio packets. The present work has identified burst size as an important determinant of users' perceptions of quality and acceptability. However, only three levels of burst size were used (1-2, 6-7 and 9-10). To fully understand burst size effects, and to define the optimum burst size for different types of material, a more extensive experiment using a broader range of material and burst sizes is required. Finally, the present work employed short, 10-second test sequences. The use of longer test sequences (e.g. 5 minutes) would enable the examination of network loss and burst size under more realistic test conditions.
Acknowledgements
This work was undertaken as part of EURESCOM [12] project P807 ('JUPITER2'). The authors thank the project partners for their assistance in providing network performance data and for helpful discussions on many aspects of the work. We are also grateful for the technical support provided by Uma Kulkarni, Margarida Correia and David Skingsley of the Futures Testbed team at BT Labs.
References
1. ITU-T Rec. E.800: Terms and definitions related to quality of service and network performance including dependability. Geneva (1994)
2. Fluckiger, F.: Understanding networked multimedia. London: Prentice Hall (1995)
3. Watson, A., Sasse, M.A.: Evaluating audio and video quality in low-cost multimedia conferencing systems. Interacting with Computers, 3 (1996) 255-275
4. Apteker, R.T., Fisher, J.A., Kisimov, V.S., Neishlos, H.: Video acceptability and frame rate. IEEE Multimedia, Fall (1995) 32-40
5. Hughes, C.J., Ghanbari, M., Pearson, D., Seferidis, V., Xiong, X.: Modelling and subjective assessment of cell discard in ATM video. IEEE Transactions on Image Processing, 2 (1993) 212-222
6. Ingvaldsen, T., Klovning, E., Wilkins, M.: A study of delay factors in CSCW applications and their importance. Proc. of 5th Workshop on Interactive Distributed Multimedia Systems and Telecommunication Services, LNCS 1483, Springer (1998)
7. Gringeri, S., Khasnabish, B., Lewis, A., Shuaib, K., Egorov, R., Basch, B.: Transmission of MPEG-2 video streams over ATM. IEEE Multimedia, January-March (1998) 58-71
8. http://manimac.itd.nrl.navy.mil/MGEN/
9. ITU-R Rec. BT.500-7: Methodology for the subjective assessment of the quality of television pictures. Geneva (1997)
10. ITU-T Rec. P.800: Methods for subjective determination of transmission quality. Geneva (1996)
11. ITU-T Rec. P.920: Interactive test methods for audiovisual communications. Geneva (1996)
12. European Institute for Research and Strategic Studies in Telecommunications: http://www.eurescom.de/
Appendix
Questionnaire
1) Did you notice any errors in the clips you have seen? If yes, please indicate the type of errors present.
2) Of the errors you reported in answering question 1, which do you consider to be most serious and why (e.g. audio errors, video errors)?
3) Did any errors cause you discomfort?
4) What factors were most important in determining your opinion of acceptability?
5) What additional properties, if incorporated into the current application, would improve your opinion of acceptability?
Transport of MPEG-2 Video in a Routed IP Network*
Transport Stream Errors and Their Effects on Video Quality
Liang Norton Cai, Daniel Chiu, Mark McCutcheon, Mabo Robert Ito, and Gerald W. Neufeld
University of British Columbia, Electrical and Computer Engineering Department
University of British Columbia, Computer Science Department
mjmccut@cs.ubc.ca
Abstract. The effects of packet network error and loss on MPEG-2 video streams are very different from those in either wireless bitstreams or cell-based networks (ATM). We report a study transporting high-quality MPEG-2 video over an IP network. Our principal objective is to investigate the effects of network impairments at the IP layer, in particular packet loss in congested routers, on MPEG-2 video reconstruction. We have used an IP testbed network rather than simulation studies in order to permit evaluation of actual transmitted video quality by several means. We have developed a data-logging video client to permit off-line error analysis. Analysis requires video stream resynchronization and objective quality measurement. We have studied MPEG-2 video stream transport errors, error-affected video quality and error sources. We have also investigated the relationship of packet loss to slice loss, picture loss, and frame error. We conclude that slice loss is the dominant factor in video quality degradation.
1 Introduction
Multimedia services involving the delivery of digital video and audio streams over networks represent the future of conventional home entertainment, encompassing cable and satellite television programming as well as video-on-demand, video games, and other interactive services. This evolution is enabled by the rapid deployment by telcos (ADSL) and cable providers (cable modems) of high-bandwidth connections to the home. The ubiquity of the Internet and the continuous increase in computing power of the desktop computer, together with the availability of relatively inexpensive MPEG-2 decoder plug-in cards, have made MPEG-2 based, high-quality video communications an interesting possibility. Many recent studies have investigated the transport of MPEG-2 video over ATM. However, with the explosive growth of the Internet, intranets, and other IP-related technologies, it is our belief that a large majority of video applications in the near future will be implemented on personal computers using traditional best-effort IP-based networking technologies. "Best-effort" implies that the link bandwidth, packet loss ratio, end-to-end packet delay and jitter can vary wildly depending on the network traffic conditions. While not critical to most data-related applications, these variations present major hurdles to real-time digital video and audio (DAV) delivery. On the other hand, real-time DAV sources are non-stationary stochastic sources which are often considered to require guaranteed Quality of Service (QoS).
In order to gain a better understanding of the characteristics of MPEG-2 video traffic in a best-effort IP network, and of how reconstructed MPEG-2 video quality is affected, an experimental Video-on-Demand (VoD) system has been employed. The system comprises a video file server, a PC-based router, and several client machines connected via switched Fast Ethernet (Figure 1). Different load conditions for the IP-based network are generated using highly bursty background traffic sources. Video quality at the client is measured objectively using Peak Signal-to-Noise Ratio (PSNR) calculations and subjectively using the Mean Opinion Score (MOS) (see Section 4.5). We examine the correlation between IP packet loss and video quality.
* This work was made possible by grants from Hewlett-Packard Canada and the Canadian Institute for Telecommunications Research.
Fig. 1. VOD Test Network
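PSNR, used above as the objective quality measure, is 10 log10(255^2 / MSE) for 8-bit samples, where MSE is the mean squared error between the original and received frames. A minimal sketch, assuming the frames have already been decoded and resynchronised and that only luminance samples are compared; it shows the textbook definition, not necessarily the exact procedure used in Section 4.5.

```python
import math


def psnr(reference, received, peak=255):
    """PSNR in dB between two equal-sized 8-bit frames given as flat
    sequences of luminance samples."""
    if len(reference) != len(received):
        raise ValueError("frames differ in size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, received)) / len(reference)
    if mse == 0:
        return math.inf                 # identical frames
    return 10 * math.log10(peak ** 2 / mse)


ref = [16, 32, 64, 128]
out = [16, 32, 64, 118]                 # one sample off by 10
print(round(psnr(ref, out), 2))         # 34.15
```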
The remainder of this paper is organized as follows. In Section 2 we place our work relative to other research in networked MPEG transport. Section 3 provides a brief description of the effect of transport errors on MPEG-encoded digital video. Section 4 describes our experimental VOD system and measurements. Section 5 presents the results of the transport of MPEG-2 video over IP experiments. Finally, our conclusions are presented in Section 6.
2 Previous Work
Despite published work on the effects of channel errors on the MPEG bitstream, and how these errors affect perceived video quality at the client, there have been relatively few studies of network effects on MPEG quality. Current fiber optic or copper media packet networks differ greatly in their error and loss behaviour from "classical" distribution systems such as wireless satellite downlink. Most importantly, packet networks operate at very high signal-to-noise ratio and experience exceedingly low bit error rates. Conversely, the greatest cause of cell or packet loss in data networks is congestion at network nodes, which is not a factor in conventional video signal distribution. Some of the work relating network Quality of Service to video quality perceived by an end-system viewer has assumed that individual video frame loss is the dominant impairment. Ghinea and Thomas [1] emulate this effect by evaluating viewer perception and understanding of multimedia (MPEG-1 video and audio) at several different frame rates (5, 15 and 25 frames per second). However, as we show in Section 5.3, slice loss is actually more important than frame loss in degrading video quality and viewability. Measures designed to protect against frame loss without improving the network conditions that lead to slice loss are unlikely to be useful in practice. Most of the work regarding network effects on MPEG video or voice and video streams has focussed on ATM networks. Zamora et al. [2] [3] [4] have used the Columbia University VoD testbed and both ATM LAN and WAN setups to study the effects of traffic-induced impairment in ATM switches on MPEG-2 subjective video quality. This work uses MPEG-2 Transport Stream (TS) packets encapsulated in AAL5 PDUs carried in ATM cells [5] [6], transported over Permanent Virtual Circuits (PVC) defined within a Permanent Virtual Path (PVP) having guaranteed service parameters.
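Why PDU size matters in this encapsulation can be seen from AAL5 framing: each PDU carries an 8-byte trailer and is padded to a whole number of 48-byte cell payloads, and losing any one cell normally means the whole PDU is discarded. A minimal sketch, assuming independent cell losses (congestion losses in practice are bursty) and PDU sizes spanning the 2 to 20 TS packets supported by the clients in the Zamora et al. study; the function names are hypothetical.

```python
import math

TS_PACKET = 188      # bytes in an MPEG-2 Transport Stream packet
AAL5_TRAILER = 8     # bytes in the AAL5 CPCS-PDU trailer
CELL_PAYLOAD = 48    # payload bytes per ATM cell


def cells_per_pdu(ts_packets):
    """ATM cells needed for one AAL5 PDU carrying ts_packets TS packets."""
    return math.ceil((ts_packets * TS_PACKET + AAL5_TRAILER) / CELL_PAYLOAD)


def pdu_loss_probability(ts_packets, cell_loss):
    """Probability that a PDU is lost when any lost cell discards it,
    assuming cells are lost independently with probability cell_loss."""
    return 1 - (1 - cell_loss) ** cells_per_pdu(ts_packets)


for n in (2, 4, 14, 20):
    print(n, cells_per_pdu(n), f"{pdu_loss_probability(n, 1e-4):.2e}")
```

Two TS packets plus the trailer fill exactly 8 cells with no padding; larger PDUs are both more likely to be hit and carry more TS packets when they are lost.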
The effects of high levels of ATM network impairment are emulated using a Hewlett-Packard Network Impairment Emulator module, which generates bit errors, cell loss and PDU loss according to several scenarios. They analyze the effects of varying PDU size in the presence of different errors. Two types of video client are used: a set-top box with a fixed PDU size of 2 MPEG-2 TS packets, and a Windows NT client with variable-length PDU capability (up to 20 TS packets per PDU). The studies employ 2 types of program material, VCR-quality (2.9 Mbps) and HDTV-quality (5.4 Mbps), in both CBR and VBR modes. Interfering cross-traffic comes from a 5 Mbps bidirectional videoconference data stream, another CBR video server, or a bursty IP-ATM source which is either on (attempting to utilize the full path rate) or off. This is a complicated system and gives rise to complex results. In oversimplified summary, Zamora et al. find that:
- PHY bit error rates of less than 10~^ are required for "good" MPEG picture quality.
- PDU loss rates of 10~^ generally give good MPEG picture quality; PDU loss rates of 10~^ produce "slightly annoying" degradation (i.e. ITU MOS rating
62
3), while at PDU loss rates around 10 * the video degradation depends sensitively on the buffering characteristics of the client. - Greater video degradation is observed for large (14 TS packets) PDUs than small (4 TS packets) PDUs in the presence of moderate PDU loss (10~^); however, the choice of PDU size and the nature of the outcome is heavily influenced by client buffering capabilities and encapsulation choice, so the result not easily extrapolated to different situations. This is an interesting body of work, though its conclusions are not generally relevant to MPEG over IP networks. Many results are conditioned by the encapsulation used - MPEG-2 TS in ATM cells, in several block sizes. The MPEG-2 Transport Stream is not content-aware; TS packets are fixed size and not assigned according to slice boundaries or those of other picture elements. Thus PDU loss will degrade more slices than is the case for the MPEG-RTP-MTUDP-IP encapsulation we have used, and as we show, slice loss is the dominant factor influencing perceived video degradation. The test system was used to generate various levels of single- and multi-bit errors and delay-variance profiles, but as discussed above, we believe that these factors are not of much relevance to VoD over IP networks. The effect of PDU size during degraded conditions in this system depends mostly on the mode of encapsulation and nature of particular client network buffers, so does not easily generalize to other systems. Finally, use of a PVP with QoS guarantee contrasts strongly with the situation in today's IP networks, where no service guarantees of any kind are offered. The work most similar to ours of which we are aware is that of Boyce and Gaglianello [7]. These authors encapsulate MPEG-1 streams into RTP packets over UDP/IP transport/network layers, using the IETF RFC2250 [11] format as we do, always aligned on slice boundaries, though there are differences in slice size settings. 
This results in some of their packets exceeding the Ethernet MTU and being fragmented, while our packets are all limited to less than the MTU size. Two stream rates were used, 384 kbps and 1 Mbps, and streams were transmitted from each of 3 sites on the public Internet to Bell Labs in Holmdel, N.J. Packet loss statistics were accumulated as functions of date and time, packet size, and video frame error rate. Boyce and Gaglianello draw several conclusions. They find that average packet loss rates across the public Internet were in the range of 3.0% to 13.5%, with extremes ranging from 0% to 100%. Due to spatial and temporal dependencies in the MPEG stream, 3% packet loss translates to as much as 30% frame error rate. However, these authors made no subjective evaluation of video quality, and used only a very simplified notion of objective quality: any error in a given frame, whether the loss of the whole frame or some slight block error propagated from an earlier lost frame, scored as an errored frame. This is clearly quite different from a PSNR calculation. Their conditional packet loss probability curves show the most common lost burst length to be one packet, but with a significant probability of longer bursts. They also showed a significantly higher packet loss probability when the packet size exceeds the network MTU.
In areas of overlap, we are in general agreement with the work of Boyce and Gaglianello. However, because they used relatively low bit-rate video across the public Internet, while we used higher-rate video (6 Mbps) forwarded by an IP router with controllable levels of generated cross traffic, we believe that our results are better able to illuminate details of video quality effects due to packet loss. In the case of packet loss vs. frame error rate, we obtain a much stronger relationship than do these authors despite using the same calculation method, finding 1% packet loss rates correspond to 30%-40% frame error levels.
3 MPEG-2 Video Coding and Transport Errors
The ISO/IEC MPEG-2 standard outlines compression technologies and bitstream syntax for audio and video. ISO/IEC 13818-2 [8] defines a generic video coding method that is capable of supporting a wide range of applications, bit rates, video resolutions, picture qualities, and services. MPEG-2 defines three main picture types: I (intra), P (predictive), and B (bidirectionally-predicted) pictures. I-pictures are coded independently, entirely without reference to other pictures; they provide the access points to the coded bitstream where decoding can begin. P-pictures are coded with respect to a previous I- or P-picture. B-pictures use both previous and future I- or P-pictures. The video sequence is segmented into groups of pictures (GOPs). Within a transported MPEG-2 bitstream, errors induced by packet loss may propagate both within a frame (intra-frame) and across frames (inter-frame). An error can corrupt resynchronizing boundaries and propagate through multiple layers, resulting in a very visible erroneous block in the corrupted frame. Such errors may persist through multiple frames if the corrupted frame happens to be an I or P frame. Depending on where the error occurs in the MPEG-2 bitstream, the resulting video quality degradation may be quite variable. When packet loss is severe, however, the probability of picture header loss is high. A picture header loss results in an undecodable picture. If the lost picture is a reference picture, longer-lasting video quality degradation will occur due to loss propagation.
4 Experimental System and Measurements
4.1 Video Client-Server
The University of British Columbia's Continuous-Media File Server (CMFS) [9] is used as the video server. The CMFS supports both CBR and VBR streams as well as synchronization for multiple concurrent media streams (i.e., lip-synchronization of audio and video, or scalable video). Our server machine is a 200 MHz Pentium Pro PC running FreeBSD 3.0, with a 2 GB SCSI-II Fast/Ultra hard drive. We used either a software or a hardware MPEG-2 encoder to compress the video sequences as MPEG-2 video elementary streams (ES) and stored them on the video server.
[Figure: protocol stacks of the CMFS Server and the Video Client (MPEG-2 VBR Video over RTP, MT, UDP, IP, DLL and PL), connected through an IP Router (IP, DLL and PL).]
Fig. 2. Protocol Architecture of the VOD System
The protocol architecture of the experimental VOD system is given in Fig. 2. The Real-Time Transport Protocol (RTP), specified in RFC 1889 [10], is used to encapsulate the video data. We chose RTP because it is based on Application Level Framing (ALF) principles, which dictate using the properties of the payload in designing a data transmission system. Because the payload is MPEG video, we base the packetization scheme on slices, since they are the smallest independently decodable data units for MPEG video. RFC 2250 [11] describes the RTP payload format for MPEG-1/MPEG-2 video. The CMFS uses its own Media Transport (MT) protocol, a simple unreliable stream protocol that uses UDP to transmit data as a series of byte-sequenced datagrams. This allows the receiving end to detect missing data: while RTP uses packet sequence numbering to detect packet loss, MT detects the quantity of data lost rather than just the number of packets lost. Lost packets are not retransmitted, since this is usually not possible in real-time systems. The maximum UDP datagram size is set so that each datagram can be sent in one Ethernet frame (i.e. size < 1500 bytes) to avoid fragmentation. On the client side, we have developed a data-logging video client for off-line error analysis, which runs on a 200 MHz Pentium MMX PC. Upon arrival of each RTP packet, the client takes a timestamp and stores it along with the RTP packet in a log file on disk. No real-time decoding of the MPEG-2 video stream is done. An MPEG-2 software decoder is used for non-real-time decoding of each frame in order to perform objective video quality analysis. For subjective quality analysis, we use a REALmagic Hollywood2 MPEG-2 hardware decoder to view the transmitted video streams.
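To make the difference between the two loss measures concrete, here is a minimal Python sketch of how a receiver could count missing packets from sequence-number gaps (the RTP view) and missing bytes from byte-offset gaps (the MT view). The `Datagram` record and its field names are hypothetical; we do not reproduce the actual RTP or MT header layouts, and the sketch ignores 16-bit sequence-number wraparound and losses at the very start or end of a stream, which gap counting cannot see.

```python
from dataclasses import dataclass


@dataclass
class Datagram:
    seq: int     # packet sequence number (RTP-style); hypothetical field name
    offset: int  # byte offset of the first payload byte in the stream (MT-style)
    length: int  # payload length in bytes


def lost_packets(received: list[Datagram]) -> int:
    """Count packets missing between received ones, using sequence-number gaps."""
    seqs = sorted({d.seq for d in received})
    return sum(b - a - 1 for a, b in zip(seqs, seqs[1:]))


def lost_bytes(received: list[Datagram]) -> int:
    """Count payload bytes missing between received datagrams, using byte-offset gaps."""
    dgs = sorted(received, key=lambda d: d.offset)
    return sum(cur.offset - (prev.offset + prev.length) for prev, cur in zip(dgs, dgs[1:]))


# Four datagrams of 1400, 600, 1400 and 300 bytes were sent; the middle two were lost.
rx = [Datagram(seq=0, offset=0, length=1400), Datagram(seq=3, offset=3400, length=300)]
print(lost_packets(rx), lost_bytes(rx))  # 2 2000
```

Two lost packets can hide very different amounts of lost video data, which illustrates why a byte count says more about the damage to a variable-size MPEG stream than a packet count alone.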
4.2 Network
Our network backbone consists of a 100Base-T Ethernet switch and a PC-based IP router (200 MHz Pentium Pro PC running FreeBSD 2.2.5). IP forwarding in the unmodified FreeBSD kernel is strictly "best effort", using FIFO output queues on each network interface. When aggregate traffic in excess of an output link's capacity saturates the queue, new packets are simply discarded (tail-drop). Cho's Alternate Queueing [12] modifications provide the capability to implement more sophisticated queueing disciplines. The switch supports VLANs, allowing its use with multiple independent IP subnets for routing tests. We use three VLANs connected by the IP router. The video server and a host used as a background traffic generator source are placed on one subnet. The video client and another computer, used as the background traffic sink, are placed on the second subnet. Another host on a third subnet is used to generate background traffic through the router.
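The drop behaviour of the unmodified router can be illustrated with a small discrete-time model of one FIFO output queue with tail-drop. This is a hedged sketch, not the FreeBSD kernel code: time is divided into ticks, queue capacity is counted in packets, the link transmits a fixed number of packets per tick, and the packet names are made up for the example.

```python
from collections import deque


def tail_drop_fifo(arrivals: list[list[str]], capacity: int, service_per_tick: int):
    """Simulate one FIFO output queue with tail-drop.

    arrivals[t] lists the packets arriving during tick t; capacity is the queue
    limit in packets; service_per_tick is how many packets the link sends per tick.
    Returns (sent, dropped), both in order.
    """
    queue = deque()
    sent: list[str] = []
    dropped: list[str] = []
    for batch in arrivals:
        for pkt in batch:
            if len(queue) < capacity:
                queue.append(pkt)
            else:
                dropped.append(pkt)  # queue saturated: the newcomer is discarded
        for _ in range(min(service_per_tick, len(queue))):
            sent.append(queue.popleft())
    sent.extend(queue)  # flush what is still queued after the last tick
    return sent, dropped


# Video packets v*, cross-traffic x*; queue of 3 packets, link sends 1 packet per tick.
sent, dropped = tail_drop_fifo([["v0", "x0", "x1", "x2"], ["v1", "x3", "x4"]], 3, 1)
print(sent, dropped)  # ['v0', 'x0', 'x1', 'v1'] ['x2', 'x3', 'x4']
```

In the second tick, v1 survives only because it arrives ahead of the rest of the burst: tail-drop discards whatever arrives once the queue is full, with no regard for the MPEG data a packet carries.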
4.3 MPEG-2 Video Streams
For subjective video quality evaluation we have selected studio-quality Main Profile @ Main Level (MP@ML) CCIR 601 CBR video streams, which have 704x480x30Hz sampling dimensions, progressive frames, 4:2:0 chroma format, and a 6 Mb/s constant bit rate. These streams have been encoded with a single slice spanning the frame width (30 slices per frame). To ensure the generality of our study, it is important that the test streams cover the characteristics of most real-world MPEG-2 encoded video. The clips we selected present large differences in spatial frequency and motion: Cheer, Ballet, and Susi represent high, medium, and low activity and spatial detail, respectively. For clarity, we refer to the clips as video-H, video-M and video-L, respectively. MPEG-2 video can be seen as a stream composed of short sub-periods with varying levels of motion and spatial detail, and the overall subjective video quality may be characterized by the single worst sub-period. For this reason, we use test streams that are 10 seconds (300 frames) in length. Table 1 provides detailed statistics of the test streams.

Table 1. Video Stream Statistics
                          I         P         B     Total
Frames                   21        80       199       300
Slices                  630      2400      5970      9000
Bytes    video-H     866782   2458287   2930873   6255942
         video-M     664211   2983630   2598643   6246484
         video-L     864736   2854488   2532044   6251268
Packets  video-H        843      2374      3037      6475
         video-M        880      2701      3415      6317
         video-L        660      2472      3114      6605
4.4 Stream Re-synchronization and Error Concealment
In a highly congested IP network, multiple frames may be lost when consecutive packets are dropped, and frame loss is even more likely when the lost packets contain MPEG-2 bitstream headers, resulting in undecodable or unrecoverable frames. In either case, comparison of video frames between the original and the transmitted streams becomes difficult because temporal alignment is lost. Temporal alignment between the two streams is also required for the PSNR measure, so re-synchronization is necessary. Another important factor in providing accurate measurements is how lost frames are handled: a lost frame may be left blank or patched using error concealment techniques. A blank could be a black, grey or white frame, and various error concealment techniques are available. We have chosen to substitute a black frame for a lost frame, because in our experience error concealment techniques affect measurement results so significantly that the real characteristics of the measured video streams are masked. Our analysis detects frame loss in the transmitted video stream and substitutes a black frame for each lost frame, thus resynchronizing the original and the transmitted video sequences.
4.5 Measurements
We measure MPEG GOP header loss, Picture header loss, and slice loss to determine the probability of loss under different network loads. With better insight into loss probability, video quality degradation can be minimized, for example by duplicating critical data structures in the MPEG coded data stream (e.g. Picture headers). Such measurements will also help determine the optimal packet size to use: a balance exists where packets are not so large that a single lost packet results in extensive MPEG data loss, and not so small that too much overhead is incurred. At the application level, the log files created by the data-logging video client are parsed to calculate the client video buffer occupancy and the MPEG frame interarrival times. The MPEG-2 Elementary Stream (ES) data is extracted from the RTP packets contained in these log files. MPEG-2 ES data contained in any RTP packet that arrived later than the time at which it should have been taken from the buffer for decoding is discarded. The remaining MPEG-2 ES data is then decoded using a software decoder in order to perform objective video quality measurements. We use Peak Signal-to-Noise Ratio (PSNR) [13] as an objective measure of the degree of difference between the original video stream and the corrupted stream. PSNR correlates poorly with human visual perception, or subjective quality; nevertheless, it provides an accurate frame-to-frame pixel value comparison. We use the Mean Opinion Score (MOS), obtained through an informal viewer rating on a 5-point scale (see Table 2)¹, as the "true" evaluation of video quality.
¹ Scale defined per Recommendation ITU-R BT.500-7, "Methodology for the Subjective Assessment of the Quality of Television Pictures", Oct. 1995.
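The black-frame re-synchronization of Section 4.4 and the per-frame PSNR comparison described above can be sketched as follows. This is a minimal illustration under our own assumptions: frames are 8-bit luma planes held as nested Python lists, `decoded` maps display-order frame numbers to the frames that survived decoding, and PSNR uses the usual definition PSNR = 10 log10(255² / MSE). It is not the analysis tool used in our experiments.

```python
import math

Frame = list[list[int]]  # rows of 8-bit luma samples (assumption: luma only)


def black_frame(height: int, width: int) -> Frame:
    return [[0] * width for _ in range(height)]


def resynchronize(decoded: dict[int, Frame], num_frames: int, height: int, width: int) -> list[Frame]:
    """Re-align the received sequence with the original one by inserting a black
    frame wherever a frame number is missing from `decoded`."""
    return [decoded[n] if n in decoded else black_frame(height, width) for n in range(num_frames)]


def psnr(original: Frame, received: Frame, peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two frames of equal size."""
    diffs = [(a - b) ** 2 for row_o, row_r in zip(original, received) for a, b in zip(row_o, row_r)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


# Three-frame 2x2 example: frame 1 was lost, frame 2 has one damaged sample.
orig = [[[100, 100], [100, 100]] for _ in range(3)]
rx = resynchronize({0: orig[0], 2: [[100, 100], [100, 110]]}, 3, 2, 2)
print([round(psnr(o, r), 2) for o, r in zip(orig, rx)])  # [inf, 8.13, 34.15]
```

The substituted black frame drives PSNR very low for that frame, so a lost frame weighs heavily in any per-stream average; this is the measurement effect the black-frame choice accepts in exchange for not masking losses with concealment.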
We always rate the original video stream as level 5 in MOS-style evaluations, then visually compare the corrupted stream against it. The test streams are encoded and decoded using the same codec. In this way, we eliminate effects of the MPEG-2 codec (such as the noise introduced into the frames by the lossy compression process) from the tests, since we are only interested in transport loss issues.

Table 2. Subjective Quality Scale (MOS)
Scale  Impairment
5      Imperceptible
4      Perceptible, but not annoying
3      Slightly Annoying
2      Annoying
1      Very Annoying
5 Experimental Results
The PSNR and MOS results for each video stream under test are expressed as a function of IP network packet loss ratio. The router shown in Figure 1 is stressed by loading it moderately to heavily with cross-traffic so as to obtain various packet loss rates. The characteristics of the streams and the statistical correlation between PSNR, MOS and packet loss are investigated.
5.1 Quantitative MPEG-2 Video Quality to IP Packet Loss Mapping
We are primarily interested in the effects of IP packet loss on reconstructed MPEG-2 video, in other words, MPEG-2 video transport errors. Our review of existing work [14] indicates that video block artifacts, repeated movement and flickering due to the loss of slices or frames are the most visible types of quality degradation. Figure 3(a-c) shows PSNR vs. packet loss plots for video-H, video-M and video-L; Figure 3(d-f) shows the corresponding MOS vs. packet loss plots. Since a human viewer will be the recipient of the video service, the MOS subjective video quality measurement is considered to be the true video quality. We define video with an MOS between 5 and 3 as the Low Degradation Zone (LDZ), between 3 and 2 as the Annoying Degradation Zone (ADZ), and between 2 and 0 as the Very-annoying Degradation Zone (VDZ). We use the Low Degradation Point (LDP) and the Annoying Degradation Point (ADP) to specify the two quality boundaries. With video quality above the LDP, the viewer will occasionally observe corrupted slices. With video quality between the LDP and the ADP, besides seeing a large number of corrupted slices, the viewer will also observe occasional flickering due to the loss of one or more frames. With video quality below the ADP, the viewer will experience multiple damaged frames accompanied by strong flickering.

[Figure: panels (a)-(c) plot PSNR (dB) and panels (d)-(f) plot MOS against packet loss (%) for the high activity/high spatial, medium activity/medium spatial and low activity/low spatial streams; the LDP and ADP are marked.]
Fig. 3. Video Quality vs. Packet Loss

Table 3 maps each quality boundary to PSNR and packet loss ratio. For example, the VDZ of video-M normally results from more than 0.7% packet loss. The Low Degradation Zone (MOS 5 to 3) is of the most interest because it is the range of acceptable video quality for viewers. Observing the LDZ of Figure 3(d-f), one finds that the slope in Figure 3(d) is not as steep as those in Figure 3(e) and (f). A steeper slope in the MOS/packet loss plane indicates a higher rate of video quality degradation. We note that in the acceptable video quality range (MOS between 5 and 3), the lower-activity/lower-spatial detail MPEG-2 video stream is more susceptible to IP packet loss impairment. Quantitatively, video-H reaches the LDP at a packet loss rate of 0.60%, while video-M reaches the LDP at a loss rate of 0.40% and video-L at 0.28%. PSNR values at the LDP are 24.9 dB for video-H, 31.0 dB for video-M and 33.1 dB for video-L.
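As an illustration of how a provider might apply this mapping, the sketch below classifies a measured packet loss rate into the LDZ, ADZ or VDZ for each test stream. The thresholds are the LDP values quoted above and the ADP packet-loss boundaries of Table 3; treating each boundary value as belonging to the better zone, and applying the thresholds to streams other than our three test clips, are assumptions.

```python
# (LDP, ADP) packet-loss thresholds in percent for each test stream:
# LDP values from the text above, ADP values from Table 3.
THRESHOLDS = {
    "video-H": (0.60, 0.9),
    "video-M": (0.40, 0.7),
    "video-L": (0.28, 0.8),
}


def degradation_zone(stream: str, packet_loss_pct: float) -> str:
    """Return 'LDZ' (MOS 5-3), 'ADZ' (MOS 3-2) or 'VDZ' (MOS 2-0)."""
    ldp, adp = THRESHOLDS[stream]
    if packet_loss_pct <= ldp:
        return "LDZ"
    if packet_loss_pct <= adp:
        return "ADZ"
    return "VDZ"


for name in THRESHOLDS:
    print(name, degradation_zone(name, 0.5))
```

At 0.5% loss, for instance, video-H is still in the LDZ while video-M and video-L have already entered the ADZ, in line with the higher-activity stream tolerating more loss before its quality becomes annoying.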
The foregoing indicates that it is sufficient to examine only low-activity/low-spatial detail MPEG-2 streams for worst-case video quality evaluation of video streams transported over a lossy IP network.
5.2 Correlation of PSNR and Packet Loss
Many studies have indicated that PSNR as an objective measurement of video quality does not correlate well with true subjective video quality. This conclusion is quantitatively verified by our experiments. As shown in Figure 3, an MOS value of 3 corresponds to 24.9 dB PSNR in video-H but 33.1 dB PSNR in video-L, a 26% difference in value. On the other hand, as depicted in Figure 3(a-c), the PSNR of each of the three video streams shows a linear correlation with packet loss. The regression coefficients for rate and constant are obtained using simple linear regression estimation. For video-H, video-M and video-L, the coefficients are (-0.040, 1.6), (-0.046, 1.8), and (-0.43, 1.7), respectively. The similarity in the coefficients means that the PSNR-to-packet-loss mappings of the three video streams are almost identical. Thus, from Figure 3, at a packet loss ratio of 0.3% the PSNR values are 31.8, 32.9 and 32.7 dB, a spread which is not statistically significant.
5.3 Dominant Contributor to Video Quality Degradation
An MPEG-2 video bitstream is a prioritized stream of coded data units. Some headers, for example the P-frame Picture header, are more important than others, such as the I-frame Slice header or the B-frame Picture header. Packet loss protection techniques normally add redundancy back into the original bitstream, increasing the demand on bandwidth. Without knowing how prioritized MPEG-2 video bitstreams are affected by packet loss, blindly engineered error protection techniques are an inefficient means of dealing with video quality degradation. We must investigate which elements in the MPEG-2 bitstream should be protected from IP packet loss. Figure 4(a,b) are plots of PSNR and MOS vs. slice loss, and Figure 4(c,d) are plots of PSNR and MOS vs. picture loss, respectively. Comparing the slice loss results against the picture loss results, it is clear that the scatter (variance) of the video quality (PSNR and MOS) plotted against picture loss is much greater than that plotted against slice loss. The video quality degradation is closely related to the slice loss ratio but not to the picture loss ratio; put differently, picture loss is not well correlated with either PSNR or MOS.

Table 3. Quality Boundary Mapping
                  PSNR (dB)                          Packet Loss (%)
Quality Boundary    video-H    video-M    video-L  video-H  video-M  video-L
LDZ (MOS 5-3)     40.0-24.9  40.0-31.0  40.0-33.1  0.0-0.6  0.0-0.4  0.0-0.3
ADZ (MOS 3-2)     24.9-17.5  31.0-24.8  33.1-23.3  0.6-0.9  0.4-0.7  0.3-0.8
VDZ (MOS 2-0)        < 17.5     < 24.8     < 23.2    > 0.9    > 0.7    > 0.8

Fig. 4. a) Slice Loss vs. PSNR b) Slice Loss vs. MOS c) Picture Loss vs. PSNR d) Picture Loss vs. MOS

In an MPEG-2 video, a lost slice normally appears as a corrupted horizontal bar, a block artifact, or a localized jerky motion, while a picture loss can be seen as the video hesitating, repeating, or flickering. In our experiments, as network congestion increases, the video quality degrades through the increased occurrence of corrupted bars or block artifacts. By the time viewers perceive flickering or frame repeating, the video quality is already so badly damaged by corrupted slices as to be not worth watching. Through quantitative measurement and subjective assessment, our study indicates that when MPEG-2 video is transported in IP packets, in the range of fair video quality, the loss of slices rather than the loss of pictures is the dominant factor contributing to the degradation of video quality. Our study also found that for packet loss rates below 1.5%, single packet losses account for the majority of the losses. This suggests that it is not effective to protect picture-level data and parameters without first reducing slice losses.
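Whether losses arrive singly or in bursts can be read from a receiver log by looking at gaps in the packet sequence numbers. The following hedged sketch builds a histogram of loss-burst lengths from the sequence numbers that were received; it assumes the sequence numbers have already been unwrapped (no 16-bit rollover), and it cannot see losses before the first or after the last received packet.

```python
from collections import Counter


def loss_bursts(received_seqs: list[int]) -> Counter:
    """Map burst length -> number of bursts, from gaps in received sequence numbers."""
    seqs = sorted(set(received_seqs))
    return Counter(b - a - 1 for a, b in zip(seqs, seqs[1:]) if b - a > 1)


def single_loss_share(bursts: Counter) -> float:
    """Fraction of loss events that were isolated single-packet losses."""
    events = sum(bursts.values())
    return bursts[1] / events if events else 0.0


bursts = loss_bursts([0, 1, 3, 4, 7, 8, 10])
print(dict(bursts), round(single_loss_share(bursts), 2))  # {1: 2, 2: 1} 0.67
```

Counting loss events rather than lost packets is a choice made here for illustration; weighting each burst by its length would give the share of lost packets that were isolated losses instead.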
5.4 Characteristics of MPEG-2 Data Loss vs. Packet Loss
The MPEG-2 ES data is packetized according to the encapsulation scheme specified in IETF RFC 2250. Because the slice is the error resynchronization point in MPEG-2 and an IP packet can be as large as 64 KB, RFC 2250 selects the slice as the minimum data unit and defines fragmentation rules to ensure that the beginning of the next slice after one with a missing packet can be found without requiring the receiver to scan the packet contents.
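The slice-aligned packetization idea can be sketched as a greedy packer. This is a simplified illustration of the RFC 2250 principle, not the RFC's full rule set or our server's code: slices are given only by their byte sizes, the 1400-byte payload budget is an assumed value chosen to stay under the 1500-byte Ethernet MTU after headers, and the RFC 2250 MPEG video-specific header is not modelled.

```python
def packetize_slices(slice_sizes: list[int], budget: int = 1400) -> list[list[tuple[int, int]]]:
    """Group slices (given by byte size) into packet payloads.

    Returns a list of packets; each packet is a list of (slice_index, nbytes) pieces.
    Whole slices are packed greedily; a slice larger than `budget` is split into
    fragments that each occupy their own packet, so every packet begins either
    with the start of a slice or with a fragment of a single oversized slice.
    """
    packets: list[list[tuple[int, int]]] = []
    current: list[tuple[int, int]] = []
    used = 0
    for i, size in enumerate(slice_sizes):
        if size > budget:
            if current:  # close the open packet before fragmenting
                packets.append(current)
                current, used = [], 0
            remaining = size
            while remaining > 0:
                chunk = min(budget, remaining)
                packets.append([(i, chunk)])
                remaining -= chunk
        elif used + size <= budget:
            current.append((i, size))
            used += size
        else:  # slice does not fit: start a new packet with it
            packets.append(current)
            current, used = [(i, size)], size
    if current:
        packets.append(current)
    return packets


for pkt in packetize_slices([500, 600, 400, 3000, 200]):
    print(pkt)
```

Because no packet mixes the tail of one slice with the head of another, a receiver that loses a packet can resume decoding at the next packet that begins a slice, at the cost of some packets being well under the budget.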
Fig. 5. a) Packet loss vs. Slice loss, b) Packet loss vs. Picture loss
Figure 5(a) is a plot of slice loss vs. packet loss. Due to the RTP payload format, slice loss is linearly related to packet loss: loss of an IP packet means loss of at least one slice. It also means that the granularity of data loss is much coarser than a single bit or even a short burst of bits. Figure 5(b) is a plot of picture loss vs. packet loss. The variation in picture loss count is quite large. Compared with Figure 3(d), the LDZ, ADZ, and VDZ correspond to picture loss levels of 0-0.2%, 0.2-2% and 2-4%, respectively. Picture loss rates higher than 2% produce very annoying flickering. I- and P-frames account for 33.2% of the total frames in the test stream, so on average a 2% frame loss results in about 2 I- or P-frame losses in our 300-frame test program. The frame error rate, as calculated in [7], is a measure of the number of frames that contain errors. Errors are due to packet loss in the current frame or in a frame from which the current frame is predicted. Packet loss errors spread within a single picture up to the next resynchronization point (i.e. slice). This is referred to as spatial propagation and may damage any type of picture. When loss occurs in a reference picture (I-picture or P-picture), the error will remain until the next I-picture is received. This causes the error to propagate across several non-intra-coded pictures until the end of the Group of Pictures, which typically consists of about 12-15 pictures. This is known as temporal propagation and is due to inter-frame prediction.
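The frame error rate described above combines spatial and temporal propagation. The sketch below computes it for a display-order sequence of picture types under a simplified dependency model assumed for illustration: a P-picture references the nearest preceding I- or P-picture, a B-picture references the nearest preceding and following I- or P-pictures, and an errored reference makes every dependent picture errored. The GOP pattern in the example is hypothetical, and the function is not the calculation code of [7].

```python
def frame_error_rate(frame_types: str, damaged: set[int]) -> float:
    """Fraction of frames containing errors.

    frame_types: picture types in display order, e.g. "IBBPBBPBBPBB".
    damaged: display-order indices of frames that lost at least one packet.
    """
    anchors = [i for i, t in enumerate(frame_types) if t in "IP"]
    errored = [False] * len(frame_types)
    # Pass 1: I/P anchors in display order. A P-picture inherits errors from the
    # previous anchor; an I-picture stops temporal propagation.
    prev = None
    for i in anchors:
        inherited = frame_types[i] == "P" and prev is not None and errored[prev]
        errored[i] = i in damaged or inherited
        prev = i
    # Pass 2: each B-picture depends on its surrounding anchors.
    for i, t in enumerate(frame_types):
        if t == "B":
            refs = [a for a in anchors if a < i][-1:] + [a for a in anchors if a > i][:1]
            errored[i] = i in damaged or any(errored[a] for a in refs)
    return sum(errored) / len(frame_types)


gop2 = "IBBPBBIBBPBB"
print(frame_error_rate(gop2, {3}))  # P-picture loss: frames 1-5 errored -> 5/12
print(frame_error_rate(gop2, {1}))  # B-picture loss affects only itself -> 1/12
```

In this small example a single loss in an early P-picture marks almost half of the pictures as errored, which illustrates how modest packet loss can become a large frame error rate; the real ratio depends on the GOP structure and on where losses fall.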
Fig. 6. Packet loss vs. Frame Error
Figure 6 shows the relationship between the packet loss rate and the frame error rate for the low activity stream. All three activity streams showed very similar relationships. Small packet loss rates translate into much higher frame error rates; for example, a 1% packet loss rate translates into a 40% frame error rate. This measurement indicates the difficulty of sending MPEG video over a lossy IP-based network.
6 Conclusions
A quantitative mapping between MPEG-2 video quality and IP packet loss has been derived. Using such a mapping, network service providers may be able to predict perceived video quality from IP packet loss ratios. We have identified block artifacts, repeated movements and flickering as the most visible quality degradations in lossy IP network environments. Because the low-activity/low-spatial detail stream was the most susceptible to packet loss in the acceptable quality range, we believe that low-activity/low-spatial detail MPEG-2 video clips are suitable for worst-case scenario quality evaluation when studying the effects of packet loss. We have shown that PSNR as an objective measurement of video quality is poorly correlated with subjective assessment. However, PSNR as a numerical comparison method was found to correlate linearly with packet loss in the high, medium, and low activity test streams. When MPEG data is transported in IP packets, slice loss rather than picture loss is the dominant factor contributing to video artifacts at a given packet loss rate. By the time a viewer perceives flickering or frame repeating, the video quality is already badly damaged by corrupted slices and unwatchable. Cells used in ATM networks are small compared to IP packets. Depending upon the encapsulation chosen, when an ATM cell is lost, only one or a few macroblocks of a slice may be lost, whereas an IP packet loss will result in one or more slices being lost. Small packet loss rates translate into much higher frame error rates. Therefore, traditional Forward Error Correction (FEC) schemes designed for single bits or short bursts are no longer effective. New schemes are needed to provide data protection and recovery.
References
[1] G. Ghinea and J. P. Thomas, QoS Impact on User Perception and Understanding of Multimedia Video Clips, in Proceedings of ACM Multimedia '98.
[2] J. Zamora, D. Anastassiou and S-F Chang, Objective and Subjective Quality of Service Performance of Video-on-Demand in ATM-WAN, submitted for publication to Signal Processing: Image Communication, July 1997.
[3] J. Zamora, D. Anastassiou, S-F Chang and K. Shibata, Subjective Quality of Service Performance of Video-on-Demand under Extreme ATM Impairment Conditions, Proceedings AVSPN-97, Sept. 1997.
[4] J. Zamora, D. Anastassiou, S-F Chang and L. Ulbricht, Objective and Subjective Quality of Service Performance of Video-on-Demand in ATM-WAN, submitted for publication to Elsevier Science, Jan. 1999.
[5] ATM Forum Service Aspects and Applications, Audio/Visual Multimedia Services: Video on Demand v1.0, af-saa-0049.000, Jan. 1996.
[6] ATM Forum Service Aspects and Applications, Audio/Visual Multimedia Services: Video on Demand v1.1, af-saa-0049.001, Mar. 1997.
[7] J. M. Boyce and R. D. Gaglianello, Packet Loss Effects on MPEG Video Sent Over the Public Internet, in Proceedings of ACM Multimedia '98.
[8] ISO/IEC International Standard 13818; Generic coding of moving pictures and associated audio information, November 1994.
[9] G. Neufeld, D. Makaroff, and N. Hutchinson, Design of a Variable Bit Rate Continuous Media File Server for an ATM Network, IS&T/SPIE Multimedia Computing and Networking, pp. 370-380, San Jose, January 1996.
[10] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, RTP: A Transport Protocol for Real-Time Applications, RFC 1889, January 1996.
[11] D. Hoffman, G. Fernando, and V. Goyal, RTP payload format for MPEG1/MPEG2 video, RFC 2250, January 1998.
[12] K. Cho, A Framework for Alternate Queueing: Towards Traffic Management by PC-UNIX Based Routers, in Proceedings of USENIX 1998 Annual Technical Conference, New Orleans LA, June 1998.
[13] K. R. Rao and J. J. Hwang, Techniques and Standards For Image, Video and Audio Coding, Prentice Hall PTR, New Jersey, 1996.
[14] L. N. Cai, Taxonomy of Errors Study on Real-Time Transport of MPEG-2 Video Over Internet, TEVIA Project Report, UBC Electrical and Computer Engineering Department, January 1998.
Specification and Realization of the QoS Required by a Distributed Interactive Simulation Application in a New Generation Internet

Christophe Chassot¹, Andre Lozes¹, Fabien Garcia¹, Laurent Dairaine¹,²,*, Luis Rojas Cardenas²

¹ LAAS/CNRS, 7 Avenue du Colonel Roche, 31077 Toulouse cedex 04, France
{chassot, alozes, fgarcia, diaz}@laas.fr
² ENSICA, 1 Place Emile Blouin, 31056 Toulouse cedex, France
{dairaine, [email protected]}
Abstract: One of the current research activities in the networking area consists in trying to federate the new Internet services, mechanisms and protocols, particularly those of the IETF Integrated Services (IntServ) Working Group, with the ATM ones, in order to guarantee new multimedia applications an end-to-end Quality of Service (QoS). The study presented in this paper was carried out within a national project whose general goal was to specify and then implement the QoS required by a distributed interactive simulation in a heterogeneous WAN environment consisting of an ATM interconnection of three distant local platforms implementing the IntServ proposals. The first part of the results presented in the paper relates to the QoS that a distributed interactive simulation (DIS) application has to require in order to be correctly distributed in the considered network environment. The second part of the paper is dedicated to the end-to-end communication architecture defined within the project in order to provide a QoS matching DIS application requirements.
1. Introduction
1.1. Work context
In recent years, technological improvements in both computer science and telecommunications have led to the development of new distributed multimedia applications involving the processing and (deferred or real-time) transmission of all kinds of media. The communication requirements of such applications make it necessary to use new networks providing both high-speed transmission and guaranteed qualities of service (QoS): throughput, transit delay, etc. Among these applications, Distributed Interactive Simulation (DIS) applications have specific features (interactivity, hard time constraints, ...) that make both their traffic characterization and the specification of the QoS to require from the network difficult. To distribute these new applications on a large scale, the most widely used and analyzed wide area network (WAN) solutions are the Internet on one side and ATM-based networks on the other. In spite of (or due to) its popularity, the current Internet does not allow an adequate treatment of multimedia application features and requirements; in recent years, several efforts have been initiated to solve this problem. In particular, the Integrated Services (IntServ) working group [1] of the IETF (Internet Engineering Task Force) has recently proposed some extensions of the current Internet service model [2][3], and has also published several papers on mechanisms and protocols [4][5] that implement the newly defined services. On the other side, several local ATM platforms have been developed in different places, particularly in France in three research laboratories: LAAS/CNRS (Toulouse), INRIA (Sophia Antipolis) and LIP6 (Paris). In 1997, the interconnection of these platforms led to the birth of the MIRIHADE platform (today SAFIR), on which several experiments have been carried out (performance measurements, multimedia application testing, etc.). One of the main research activities in the networking area consists in trying to federate the Internet and ATM worlds in order to provide applications with an end-to-end guaranteed QoS, independently of the physical networks used. The study presented in this paper was carried out within the "DIS/ATM" project¹, involving a French industrial partner (Dassault Electronique, now Thomson/CSF-Detexis) and three laboratories and institutes (LAAS/CNRS, LIP6 and INRIA). The general goal of the project was to specify and then implement the QoS required by a DIS application in a WAN environment consisting of an ATM interconnection of three distant local platforms implementing the IntServ proposals. Within the project, LAAS-CNRS has been mainly involved in the following three parts:
1. Characterization of the QoS a DIS application has to require from the network in order to be distributed in a WAN environment;
2. Design of an end-to-end communication architecture:
- providing a QoS matching DIS application requirements;
- distributed over an Internet Protocol version 6 (IPv6) network environment, able to provide the IETF IntServ service models (Guaranteed Service and Controlled Load);
3. Integration of the targeted DIS application within this architecture on a PC/Linux machine/system.
1.2. Paper content
The results presented in this paper are related to the first two of the previous points. The rest of the paper is structured as follows. The first part (section 2) analyzes the end-to-end QoS a DIS application has to require in a WAN environment in order to be coherently² distributed. The second part (section 3) proposes a classification of the DIS protocol data units (PDUs). This classification matches application requirements. It also allows the design of an end-to-end communication architecture based on an architectural principle widely applied to multimedia architectures in recent years.
¹ The DIS/ATM project (convention n° 97/73-291) is co-financed by the French MENRT (Ministère de l'Education Nationale, de la Recherche et de la Technologie) and the DGA (Direction Générale de l'Armement).
² This term will be defined in the appropriate section (section 2).
The third part of the paper presents an overview of the defined end-to-end architecture (section 4). Finally, conclusions and future work are presented in section 5.
1.3. Related work
Several topics have recently been studied within the DIS context; only the major ones are introduced in this section. An important part of the work carried out in recent years is related to the "scalability" problem associated with deploying DIS applications over several hundred sites [6][7]. In particular, the LSMA Working Group has provided documentation on how the IETF multicast protocols, transport protocols and multicast routing protocols were expected to support DIS applications. Besides dead reckoning, several techniques have been proposed to reduce the required bandwidth. They can be divided into two groups:
- aggregation techniques, which consist either in grouping information related to several entities into a single DIS PDU [8] or in gathering several PDUs into a single UDP message [9];
- filtering techniques, which classify DIS PDUs into several fidelity levels depending on the coupling between sending and receiving entities [10].
With the same goal in the distributed games area, [11] proposes mechanisms allowing the deployment of the MiMaze application on the Internet. Another topic is the definition of an architecture (HLA/RTI³). In particular [12], its goals are to facilitate simulator interoperability and to extend the application area to the civilian domain. A last topic concerns DIS traffic measurements. Several studies have been published [13][14], but none of them provides a formal characterization model.
2. DIS applications and networking QoS
The goal of this section is to define the QoS a DIS application has to require from a WAN in order to be coherently distributed through the network. To this end, we first show the limits of the QoS specification formulated in the DIS standard when it is applied in a WAN environment. Then, we propose another QoS expression that solves the problems associated with these limits.
2.1. Coherence notion
Traffic generated by a DIS application may be divided into three parts:
- traffic corresponding to the broadcasting of the states of the simulated entities (Entity State PDU, etc.) and of events (Fire PDU, Detonation PDU, Collision PDU, etc.);
³ High Level Architecture / Real Time Infrastructure
- traffic corresponding to the communication between participants or to emissions towards sensors: radio communications (Signal PDU) and electromagnetic interactions (EM-emission PDU, etc.);
- traffic corresponding to the distributed simulation management: simulation initialization, participant synchronization, etc. (Start PDU, Resume PDU, etc.).
According to the DIS standard [15][16][17], a DIS application is correctly distributed if every simulation site has both a temporally and a spatially coherent view of the distributed simulation. In other words:
- any event (fire, detonation, etc.) occurring at time t_0 on a given site has to be known by the other sites no later than T time units after t_0 (temporal coherence);
- the state (position, orientation, etc.) of any entity simulated on a given site has to be "almost" identical to the state extrapolated on any other site (by application of the same dead reckoning algorithms). Here "almost" means that the difference between those two states has to be less than a maximal value, defined as the threshold value in the DIS standard (spatial coherence).
To illustrate this second point, consider an entity A representing a plane simulated on a site S_1. At a given time, another site S_2 has a spatially coherent view of A if the difference between what S_2 knows of the state of A and the real state of A (i.e. the one simulated on site S_1) does not exceed a given threshold value. The DIS standard defines this threshold as the maximal acceptable error on the position (Th_pos) and on the orientation (Th_or) of each entity. Fig. 1 shows, on the same plane, the position and the orientation of the entity A (here a plane) on site S_1 (plane on the left) and on site S_2 (plane on the right). As long as the position error E_p and the orientation error E_o do not exceed their respective thresholds (Th_pos and Th_or), S_1 and S_2 have a spatially coherent view of A.
As soon as one of those thresholds is exceeded, the state of A on S_2 has to be refreshed by means of a state update (typically an Entity State PDU) sent by S_1.
Fig. 1. Position and orientation of an entity: (actual) state of A on site S_1 and (extrapolated) state of A on site S_2; E_p = error on the position, E_o = error on the orientation.
2.2. DIS standard QoS specification
As expressed in the DIS standard, DIS application requirements are the following:
- the application has to be provided with a multicast transport-level service;
- the QoS associated with the end-to-end transport of DIS PDUs is defined by three parameters, whose values depend on the coupling of the corresponding entities with the other entities:
  - reliability, defined as the maximal PDU loss rate;
  - maximal end-to-end transit delay;
  - maximal jitter for radio-type PDUs.
Let us first recall the coupling notion. Let A be an entity simulated on a given site. According to the DIS standard, one of the two following situations may be identified: (1) A is loosely coupled with all other entities, or (2) A is tightly coupled with at least one other (local or distant) entity. The standard gives two examples:
- two distant tanks are loosely coupled entities;
- several tanks moving in formation at high speed are tightly coupled entities.
On this basis, the QoS associated with the transfer of Entity State PDUs is defined as follows:
- a transit delay less than or equal to 100 ms and a loss rate less than or equal to 2% are required for the transfer of PDUs associated with tightly coupled entities;
- a transit delay less than or equal to 300 ms and a loss rate less than or equal to 5% are required for the transfer of PDUs associated with loosely coupled entities.
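The Entity State PDU requirements above amount to a two-row lookup table indexed by coupling. The sketch below illustrates that table. The names `ESQoS`, `ES_PDU_QOS` and `meets_standard_qos` are hypothetical, and the loss rate is expressed as a fraction rather than a percentage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ESQoS:
    """QoS required by the DIS standard for Entity State PDUs."""
    max_delay_ms: float   # maximal end-to-end transit delay
    max_loss_rate: float  # maximal PDU loss rate, as a fraction (0.02 == 2 %)


# Values quoted from the DIS standard, indexed by the coupling of the entity.
ES_PDU_QOS = {
    "tight": ESQoS(max_delay_ms=100.0, max_loss_rate=0.02),
    "loose": ESQoS(max_delay_ms=300.0, max_loss_rate=0.05),
}


def meets_standard_qos(coupling: str, delay_ms: float, loss_rate: float) -> bool:
    """True if a measured delay and loss rate satisfy the requirement for `coupling`."""
    required = ES_PDU_QOS[coupling]
    return delay_ms <= required.max_delay_ms and loss_rate <= required.max_loss_rate


# A 150 ms path is acceptable for loosely but not for tightly coupled entities.
print(meets_standard_qos("loose", 150.0, 0.01), meets_standard_qos("tight", 150.0, 0.01))
# prints: True False
```

The coupling key must be known before a PDU can be mapped to a QoS. The specification therefore depends on information that is not carried by the PDU itself.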
2.3. DIS standard QoS specification limits
Two major limits may be identified in the previous QoS specification:
(L1) First, it does not allow the design of an end-to-end architecture based on the following principle: « one end-to-end channel per user flow, providing a specific QoS associated with the features and the constraints of the transported flow ».
(L2) Second, the spatial coherence of the simulation cannot be guaranteed at all times.
Let us detail this second statement. Consider two sites S_1 and S_2 belonging to the same DIS exercise; S_1 and S_2 are assumed to be perfectly synchronized. Consider now an entity A simulated on site S_1, whose state is maintained on site S_2 by a dead reckoning (DR) algorithm. To simplify the following explanations, the state of A is reduced to its position along a single dimension.
Fig. 2 shows, on the same plane, the evolution in time of the extrapolation error⁴ E on the position of A. The upper part of the figure corresponds to site S_1 and the lower part to site S_2. Both sites are assumed to share the same reference time and to apply the same DR algorithm. Recall that only S_1 is able to compute the real state of A. Black (respectively gray) dots in Fig. 2 mark the sending dates Ts (respectively receiving dates Tr) of the Entity State (ES) PDUs refreshing A. We assume that the time origin corresponds to the sending date of the first ES PDU (Ts_0). Thus, Tr_0, Tr_1, Tr_2 and Tr_3 are the receiving dates of the ES PDUs sent at Ts_0, Ts_1, Ts_2 and Ts_3.
Fig. 2. Extrapolation error evolution (E = error on the extrapolated position; upper part: sending site S_1, with ES PDU emissions; lower part: receiving site S_2).
Let us first analyze the sending dates on site S_1.
- Starting from Ts_0, the error E between the extrapolated and the actual position of A grows in absolute value. At Ts_1 it reaches its maximal acceptable value Th_pos (the position threshold). At this time, S_1 generates an ES PDU (DR application) indicating the actual position of A, and the error E becomes null on site S_1.
- Starting from Ts_1, the error E between the extrapolated and the actual position of A varies between -Th_pos and +Th_pos without reaching either value. After a Heart Beat Timer (HBT) period, i.e. at Ts_2 = Ts_1 + 5 seconds (the default HBT value in the DIS standard), S_1 automatically generates an ES PDU. As in the previous case, this PDU indicates the actual position of A, and the error E becomes null on site S_1.
Let us now analyze the evolution of the error E on site S_2.
⁴ The extrapolation error E is defined as the geometrical difference between the actual and the "dead reckoned" state of the considered entity.
- Starting from Tr_0, the error E between the extrapolated and the actual position of A is identical to the one on S_1. In particular, E reaches its maximal acceptable value Th_pos at time Ts_1, the ES sending date on S_1. However, the position of A is not updated before Tr_1; the interval separating Ts_1 from Tr_1 is the ES PDU transit delay through the network. The value of E is therefore undetermined between Ts_1 and Tr_1. During this period E may exceed Th_pos, and thus potentially cause a spatial coherence violation.
- Similarly, the same indetermination may occur between Ts_2 and Tr_2, and between Ts_3 and Tr_3, as shown by the gray zones in Fig. 2.
This analysis shows that the current standard QoS definition guarantees spatial coherence except, possibly, within each [Ts_i, Tr_i] time interval. During those periods, the absolute value of the error E may transitorily exceed the maximal acceptable value (the DIS standard threshold). Such a case is illustrated in Fig. 3.
Fig. 3. Transitory excess illustration (the transitory excess on S_2 corresponds to a spatial coherence violation; ES PDU emissions and receptions are marked on the time axis).
Two questions may then be raised: (1) is this phenomenon a problem for a DIS exercise? (2) if so, can it be solved, i.e. can the transitory excess E_T shown in Fig. 3 be kept under a given value?
As far as the first question is concerned, the transitory excess becomes problematic as soon as the transit delay (TD in Fig. 3) is not negligible compared with the mean interval between two consecutive ES PDU receptions. In this case, the receiving sites may have an incoherent view of the simulation during non-negligible periods. They may even have one « almost all the time » if the refresh period is negligible compared with the ES transit delay. This risk does not arise in a LAN environment. It has to be taken into account, however, when the application is distributed in a WAN environment such as the one considered in our study. We now formulate a QoS proposal aimed at keeping the transitory error under a maximal value.
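The indetermination on S_2 can be reproduced by holding each ES PDU for a fixed transit delay before the receiving site applies it. This sketch keeps the simplifications used above: 1-D motion, linear dead reckoning and, here, a constant acceleration. The constant delay `td` is an idealization of the network, and `max_receiver_error` is a hypothetical name.

```python
from collections import deque


def max_receiver_error(accel: float, th_pos: float, td: float, dt: float = 0.001,
                       duration: float = 10.0, hbt: float = 5.0) -> float:
    """Largest |E| seen on S_2 when every ES PDU needs `td` seconds to arrive."""
    td_steps, hbt_steps = round(td / dt), round(hbt / dt)
    p = v = 0.0                      # actual position and speed (constant acceleration)
    sent = (0.0, p, v)               # (Ts_i, P_i, V_i) of the last ES PDU sent by S_1
    applied = sent                   # last ES PDU applied on S_2
    in_flight = deque()              # (arrival step, PDU) in sending order
    last_k, worst = 0, 0.0
    for k in range(1, round(duration / dt) + 1):
        t = k * dt
        v += accel * dt
        p += v * dt
        ts, p_i, v_i = sent
        if abs(p - (p_i + v_i * (t - ts))) >= th_pos or k - last_k >= hbt_steps:
            sent, last_k = (t, p, v), k                 # S_1 emits a new ES PDU
            in_flight.append((k + td_steps, sent))
        while in_flight and in_flight[0][0] <= k:       # PDUs that have reached S_2
            applied = in_flight.popleft()[1]
        ts_r, p_r, v_r = applied
        worst = max(worst, abs(p - (p_r + v_r * (t - ts_r))))  # E on S_2
    return worst


# Without transit delay E stays below Th_pos; with 200 ms it transitorily exceeds it.
print(max_receiver_error(1.0, 1.0, 0.0), max_receiver_error(1.0, 1.0, 0.2))
```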
2.4. QoS proposition
To overcome the two previous limits (L1 and L2), the QoS specification that we propose is based on the following points:
- as in the DIS standard, the application has to be provided with a multicast transport-level service;
- the QoS associated with the end-to-end transport of DIS PDUs is defined by the same three parameters as in the DIS standard:
  - reliability, defined as the maximal PDU loss rate (total reliability meaning a 0% loss rate; null reliability meaning that a 100% loss rate is acceptable);
  - maximal end-to-end transit delay (possibly infinite);
  - maximal jitter for radio PDUs.
However, unlike the standard specification, we propose:
- to drop the coupling notion used in the standard to define QoS parameter values;
- to evaluate QoS parameter values:
  - only from the information contained in each PDU, which answers limit (L1);
  - so as to guarantee at all times that the extrapolated and the actual state of any simulated entity never differ by more than a maximal value, which answers limit (L2).
Let us now study how the announced guarantee can be implemented. More precisely, we study how the transitory excess E_T shown in Fig. 3 can be kept under a maximal value.
2.5. Transitory excess control mechanism
Consider Fig. 4. The transitory excess E_T can be kept under a maximal value E_Tmax as long as the ES PDU transit delay does not exceed the value TD_max shown in the figure.
Fig. 4. Transitory excess control (extrapolation error versus time, with ES PDU emission and reception dates)
Intuitively, the value of E_T depends on the dynamics (i.e. the acceleration) of the corresponding entity: the higher the dynamics, the higher E_T can potentially be. Consequently, the E_Tmax value also depends on the dynamics of the entity. Let us now formalize this approach. On each receiving site, the influence of the transit delay (TD) on the extrapolated state of a distant entity is expressed by e_p(t) (relation 1). This quantity is the difference, in absolute value, between the actual position P_a(t) and the extrapolated position P_DR(t) of the considered entity.
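$$P_a(t) = \int_{Ts_i}^{t} du \int_{Ts_i}^{u} A_a(\tau)\, d\tau + (t - Ts_i)\, V_i + P_i$$
$$e_p(t) = \left\| P_a(t) - P_{DR}(t) \right\| = \left\| \int_{Ts_i}^{t} du \int_{Ts_i}^{u} \left[ A_a(\tau) - A_i \right] d\tau \right\| \qquad (1)$$
where:
- A_a(t) is the actual acceleration of the considered entity at time t;
- P_i, V_i and A_i are respectively the actual position, speed and acceleration of the entity at time Ts_i (for a linear extrapolation, A_i is replaced by 0 inside the brackets).
Note that P_i, V_i and A_i are three specific fields of the ES PDU sent at time Ts_i. If the entity has a bounded acceleration (||A_a(t)|| ≤ A_max), then e_p(t) can be bounded during each [Ts_{i+1}, Ts_{i+1} + TD] time interval. Since e_p(t) is less than or equal to Th_pos while t < Ts_{i+1}, e_p(Ts_{i+1} + TD) can be bounded by a sum of three terms (relation 2):
$$e_p(Ts_{i+1} + TD) \le \left\| \int_{Ts_i}^{Ts_{i+1}} du \int_{Ts_i}^{u} \left[ A_a(\tau) - A_i \right] d\tau \right\| + \left\| TD \int_{Ts_i}^{Ts_{i+1}} \left[ A_a(\tau) - A_i \right] d\tau \right\| + \left\| \int_{Ts_{i+1}}^{Ts_{i+1}+TD} du \int_{Ts_{i+1}}^{u} \left[ A_a(\tau) - A_i \right] d\tau \right\| \qquad (2)$$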
Because of the dead reckoning mechanism on the sending site, the first term is bounded by Th_pos.
The third term is bounded by:
- $\frac{1}{2} A_{max} TD^2$ for a linear extrapolation ($P_{DR}(t) = P_i + V_i (t - Ts_i)$);
- $A_{max} TD^2$ for a quadratic extrapolation ($P_{DR}(t) = P_i + V_i (t - Ts_i) + \frac{1}{2} A_i (t - Ts_i)^2$).
It then remains to bound the second term (relation 3). This term is the product of TD and a constant: the difference between the actual speed of the entity and its extrapolated speed at time Ts_{i+1}, the sending date of the ES PDU.
$$TD \int_{Ts_i}^{Ts_{i+1}} \left[ A_a(\tau) - A_i \right] d\tau = \left[ (V_{i+1} - V_i) - A_i (Ts_{i+1} - Ts_i) \right] TD = \left( V_a(Ts_{i+1}) - V_{DR}(Ts_{i+1}) \right) TD \qquad (3)$$
This difference reaches its maximal absolute value when the difference between the actual and the extrapolated speed of the entity is also maximal over the whole [Ts_i, Ts_{i+1}] time interval. Under those conditions, the second term of relation (2) is bounded by $\sqrt{2 A_{max} Th_{pos}} \cdot TD$ for a linear extrapolation, and by $2\sqrt{A_{max} Th_{pos}} \cdot TD$ for a quadratic extrapolation. In conclusion, e_p(t) can be bounded by the following expression (relation 4):
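$$e_p(t) \le Th_{pos} + \sqrt{2 A_{max} Th_{pos}} \cdot TD + \frac{1}{2} A_{max} TD^2 \qquad \text{(linear extrapolation)}$$
$$e_p(t) \le Th_{pos} + 2\sqrt{A_{max} Th_{pos}} \cdot TD + A_{max} TD^2 \qquad \text{(quadratic extrapolation)} \qquad (4)$$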
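Relation (4) can be inverted to obtain the largest transit delay compatible with a given transitory excess. This sketch assumes, as Fig. 3 suggests, that the transitory excess is E_T = e_p - Th_pos. Under that assumption, keeping E_T ≤ E_Tmax amounts to keeping the TD-dependent terms of relation (4) under E_Tmax, and solving the resulting quadratic in TD gives TD_max. The function names are hypothetical, and units must be consistent (for example metres and seconds).

```python
import math


def excess_bound(a_max: float, th_pos: float, td: float, quadratic: bool = False) -> float:
    """Upper bound on E_T = e_p - Th_pos given by relation (4)."""
    if quadratic:
        return 2.0 * math.sqrt(a_max * th_pos) * td + a_max * td ** 2
    return math.sqrt(2.0 * a_max * th_pos) * td + 0.5 * a_max * td ** 2


def max_transit_delay(a_max: float, th_pos: float, e_t_max: float,
                      quadratic: bool = False) -> float:
    """Largest TD such that excess_bound(...) <= e_t_max (positive root in TD)."""
    if quadratic:
        a, b = a_max, 2.0 * math.sqrt(a_max * th_pos)
    else:
        a, b = 0.5 * a_max, math.sqrt(2.0 * a_max * th_pos)
    # a*TD^2 + b*TD - e_t_max = 0, keep the non-negative root
    return (-b + math.sqrt(b * b + 4.0 * a * e_t_max)) / (2.0 * a)


print(max_transit_delay(10.0, 1.0, 1.0))                  # linear dead reckoning
print(max_transit_delay(10.0, 1.0, 1.0, quadratic=True))  # quadratic dead reckoning
```

Consider an entity whose acceleration is bounded by 10 m/s², with Th_pos = 1 m and an allowed excess of 1 m. Under linear dead reckoning, its ES PDUs would have to be delivered within about 185 ms. The quadratic case yields a tighter limit because its acceleration error bound is 2·A_max instead of A_max.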
Fig. 4. Scheduling protocol (transitions include alarm() and changeTurn())
For the CSCW application, the next step is to extend the service subsystem to manage collaborative features. The new components and connectors receive the events generated by the GUI and broadcast them to all collaborators. The relation between them is shown in Fig. 5. In MultiTEL, the service runs on a distributed platform that implements the broadcast of an event to multiple participants inside the USP object, so no additional event broadcaster needs to be implemented. When a USP has to broadcast an event, it sends the event to the USP of each receiving participant, where it reaches the appropriate connector. For a particular service, these components and connectors depend on the floor control implemented. Suppose the service uses a rigid floor control based on a token-passing scheme. Then only the user who currently holds the token can change the state of the CSCW environment, so coherency is enforced; in this case, only the EventBroadcaster connector is needed. However, if the service lets several users control the state of the CSCW environment, user actions must be totally ordered. This requires a conflict resolution protocol to maintain consistency. There are two possible approaches: centralized control and distributed control.
- Centralized solution: in this case the service contains a single EventMerged connector, a MergedInput component and an EventBroadcaster connector. The EventMerged connector receives the events from all participants and sends them to the MergedInput component. The MergedInput component
sorts the events and sends them to the EventBroadcaster connector, which broadcasts them to all participants.
- Distributed solution: in this case the service contains a Clock component and an EventBroadcaster connector for each participant. When a participant sends an event, the EventBroadcaster receives it, adds a timestamp and broadcasts it to all participants. Here, ensuring general coherency between all collaborators matters more than ensuring that events are received in the order in which they were generated.
Fig. 5. Collaboration subsystem
The centralized solution requires more computation, since the events must be ordered, and it has the disadvantages of all centralized solutions. However, it makes coherency between participants easier to maintain. MultiTEL can support different interaction modes: by changing the EventMerged and/or EventBroadcaster implementation, we can choose between managing all input events collaboratively or only some of them. For example, a help button that explains the game's rules is not a collaborative component. If a user pushes this button, the other collaborators do not need to be notified.
Multimedia Subsystem
The multimedia subsystem is a collection of components and connectors that model multimedia devices and control data read/write operations (Fig. 6). Our compositional model helps to insulate application developers from the difficulties of multimedia programming. Connectors characterise the dynamic relationships among MultimediaDevice components. The MultimediaSAP component encapsulates the logical connection graph (LCG), which determines the devices to capture. In the CWPictionary service, a participant joins the service if the AllocCont connector can capture either a microphone and a speaker, or a chat device. If any participant does not have a microphone and a speaker, then everybody uses the chat device for communication.
In addition, the use of a camera and a display device can be defined as optional. All these requirements will be specified in the LCG. The AllocCont
Fig. 6. Left: Frequency division for the horizontal, vertical and temporal domains into two resolution levels using a 3D dyadic separable WT. Right: Its filter bank implementation.
function or filter G, and the low-pass one is called the scaling function, or filter H. Multiresolution is obtained by further iterations, in which the absolute low-pass subband is fed back as a new input to the filter bank. This decomposition splits each frequency domain (horizontal, vertical and temporal) into two parts, as can be seen on the left of figure 6. This process initially yields 2x2x2 frequency bands (called subbands), which make up the first resolution level. If the absolute low-pass subband (the output of the low-pass filters of all three domains) is processed again, we get 2x2x2 more subbands, which make up the second resolution level. The output of each filter is downsampled by 2 (dyadic sampling) to keep the amount of information constant [9] without adding any redundancy. In this way we also achieve a frequency resolution that is approximately constant on a logarithmic scale, similar to that of the HVS [7]. The notation used to identify each subband is: t and T for the temporal low and high responses, h and H for the horizontal low and high responses, and v and V for the vertical low and high responses. The inverse transform uses the same filter structure in reverse, with the analysis filters replaced by synthesis filters. Both the analysis and synthesis filters have to meet the following requirements: perfect reconstruction, null aliasing and null distortion [5][9]. These biorthogonal filters have been designed to be linear-phase using Daubechies' method, thereby avoiding damaging
the image after the quantization process [5]. Note also that a 2-coefficient filter (Haar filter) is used along the temporal axis, to reduce the number of frames that the coder needs to store. These filters are labelled with subindex OQ in figure 6. The transfer functions of the analysis (subindex o) and synthesis (subindex i) filters, for both the horizontal and vertical domains, are:
$$H_o(z) = \tfrac{1}{4}\left(1 + 2z^{-1} + z^{-2}\right) \qquad H_i(z) = \tfrac{1}{4}\left(-1 + 2z^{-1} + 6z^{-2} + 2z^{-3} - z^{-4}\right)$$
$$G_o(z) = \tfrac{1}{4}\left(1 + 2z^{-1} - 6z^{-2} + 2z^{-3} + z^{-4}\right) \qquad G_i(z) = \tfrac{1}{4}\left(1 - 2z^{-1} + z^{-2}\right) \qquad (1)$$
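The null-aliasing and perfect-reconstruction requirements stated above can be checked with exact rational arithmetic for the filter taps of equation (1). This sketch assumes a common scale factor of 1/4 for the four filters; with this choice, the distortion term equals 2z^-3. It checks the two usual two-channel conditions. First, the alias term H_o(-z)H_i(z) + G_o(-z)G_i(z) must vanish. Second, the distortion term H_o(z)H_i(z) + G_o(z)G_i(z) must be a pure delay. The helper names are hypothetical.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Product of two polynomials in z^-1 given as coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_add(a, b):
    """Sum of two polynomials in z^-1, padding the shorter one with zeros."""
    n = max(len(a), len(b))
    a = list(a) + [Fraction(0)] * (n - len(a))
    b = list(b) + [Fraction(0)] * (n - len(b))
    return [x + y for x, y in zip(a, b)]


def negate_z(h):
    """Coefficients of H(-z): the sign of each odd power of z^-1 flips."""
    return [c if n % 2 == 0 else -c for n, c in enumerate(h)]


q = Fraction(1, 4)                          # assumed common scale factor
H_o = [q * c for c in (1, 2, 1)]            # analysis low-pass
G_o = [q * c for c in (1, 2, -6, 2, 1)]     # analysis high-pass
H_i = [q * c for c in (-1, 2, 6, 2, -1)]    # synthesis low-pass
G_i = [q * c for c in (1, -2, 1)]           # synthesis high-pass

distortion = poly_add(poly_mul(H_o, H_i), poly_mul(G_o, G_i))
alias = poly_add(poly_mul(negate_z(H_o), H_i), poly_mul(negate_z(G_o), G_i))
print(distortion)  # only the z^-3 coefficient is non-zero (equal to 2)
print(alias)       # all zero: aliasing is cancelled
```

These taps are those of a 3/5 biorthogonal pair. The shorter filter is used for analysis low-pass and synthesis high-pass, and the longer one for the other two roles.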
4 Operation of System
When a system such as the one described in the previous section is fed with a video sequence of 25 frames/s, a set of 4 frames (4 x 40 = 160 ms) is needed to perform a complete 3D decomposition with two resolution levels. This number of frames is a trade-off between the decorrelation ratio and the number of frames that need to be stored at the coder. The process can be observed in figure 7.
Fig. 7. Subband generation using 3D Wavelet Transform with two resolution levels. Different frames are processed every 40 ms
For example, let us assume that the system performs the decomposition of 4 frames, labelled frames 1, 2, 3 and 4. The system uses the pair of frames 1-2 and the pair 3-4 to obtain the first resolution level, which generates 8 subbands from each pair of frames. Then the tvh subbands of each original pair of frames (1-2 and 3-4) are used to generate the second resolution level, which yields 8 additional subbands. Therefore, at the end of the decomposition process we obtain 7 + 7 + 8 = 22 subbands⁴. The absolute low-pass subband (tvh) of the second resolution level is coded using Differential Pulse Code Modulation (DPCM) because it has a uniform pixel distribution. In figure 7, the subbands shown as white boxes have low temporal frequencies, while those shown as dark boxes have high temporal frequencies. An example of the 8 subbands of the second resolution level of Miss America is given in figure 8.
⁴ Note that the 2 low-pass subbands of the first resolution level (denoted tvh) are used for the second resolution level and are not transmitted.
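The 4-frame, two-level scheme just described can be sketched in a few lines of plain Python. For brevity, the sketch uses orthonormal Haar filters along all three axes. The coder described here instead uses the biorthogonal filters of equation (1) horizontally and vertically, so only the temporal Haar step matches. Frame sizes are assumed to be multiples of 4, and all function names are hypothetical. The sketch reproduces the subband bookkeeping: 7 + 7 first-level subbands plus 8 second-level ones, each named with the t/T, v/V, h/H notation.

```python
import math

S = math.sqrt(0.5)  # orthonormal Haar scaling


def split_frames(f0, f1):
    """Temporal Haar split of two frames -> (low, high) frames."""
    low = [[(a + b) * S for a, b in zip(r0, r1)] for r0, r1 in zip(f0, f1)]
    high = [[(a - b) * S for a, b in zip(r0, r1)] for r0, r1 in zip(f0, f1)]
    return low, high


def split_rows(img):
    """Vertical Haar split: pairs of rows -> (low, high), half height."""
    pairs = [(img[r], img[r + 1]) for r in range(0, len(img), 2)]
    low = [[(a + b) * S for a, b in zip(r0, r1)] for r0, r1 in pairs]
    high = [[(a - b) * S for a, b in zip(r0, r1)] for r0, r1 in pairs]
    return low, high


def split_cols(img):
    """Horizontal Haar split: pairs of columns -> (low, high), half width."""
    low = [[(row[c] + row[c + 1]) * S for c in range(0, len(row), 2)] for row in img]
    high = [[(row[c] - row[c + 1]) * S for c in range(0, len(row), 2)] for row in img]
    return low, high


def dwt3d_level(f0, f1):
    """One 3-D level on a pair of frames -> dict of 8 subbands (e.g. 'tvh', 'TVH')."""
    bands = {}
    for t_name, t_band in zip("tT", split_frames(f0, f1)):
        for v_name, v_band in zip("vV", split_rows(t_band)):
            for h_name, h_band in zip("hH", split_cols(v_band)):
                bands[t_name + v_name + h_name] = h_band
    return bands


def decompose_group(frames):
    """Two-level decomposition of 4 frames -> list of (level, name, subband) to transmit."""
    sent, lows = [], []
    for f0, f1 in ((frames[0], frames[1]), (frames[2], frames[3])):
        bands = dwt3d_level(f0, f1)
        lows.append(bands.pop("tvh"))   # re-decomposed at level 2, not transmitted
        sent += [(1, name, band) for name, band in bands.items()]
    sent += [(2, name, band) for name, band in dwt3d_level(*lows).items()]
    return sent


frames = [[[float(f * 64 + r * 8 + c) for c in range(8)] for r in range(8)] for f in range(4)]
print(len(decompose_group(frames)))  # 22 subbands
```

Because the Haar steps here are orthonormal, the total energy of the 22 subbands equals that of the 4 input frames. This makes a convenient sanity check, even though it does not hold exactly for the biorthogonal filters of equation (1).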
Fig. 8. Subbands from the second resolution level of Miss America. From top to bottom and from left to right: tvh, tvH, tVh, tVH, Tvh, TvH, TVh and TVH
After this decomposition, the Bit Allocation Algorithm estimates the number of bits per coefficient (or the total number of bits for each subband), as described in the next subsections. The Bit Allocation Algorithm requires two inputs: the bandwidth available to the connection at any point in time, and the priority of the subband.
4.1 Estimation of the Connection Available Bandwidth
An ABR connection periodically receives information from the network about the bandwidth it has available. This information is conveyed by special cells called Resource Management (RM) cells. The source inserts the RM cells into the normal data cell flow sent towards the destination terminal. Once the RM cells reach the destination terminal, they are sent back to the source, collecting on their way congestion state information supplied by the switches. The transmission rate of an ABR source is computed from both the information conveyed by the RM cells and a set of source behaviour rules [10]. The rate at which an ABR source can transmit at any given time is called the Allowed Cell Rate (ACR). The proposed video coder tracks the values taken by the ACR to estimate the bandwidth that a connection has available. The ACR changes on a very small time scale (of the order of hundreds of µs) and cannot be used directly by the coder. Bear in mind that the coder works at the frame time scale⁵, so it only needs to estimate the available bandwidth of the connection at this time scale. This can be achieved by filtering the values taken by the ACR. One technique that is easy to implement is an exponentially weighted average of the ACR samples, that is MACR = MACR + α(ACR - MACR). Here MACR (Mean ACR) is the estimated available bandwidth, and α determines the cut-off frequency ω_c of the low-pass filter. The value of α can be related to the sampling frequency by
$$\cos \omega_c = \frac{\alpha^2 + 2\alpha - 2}{2(\alpha - 1)} \qquad (2)$$
⁵ In fact, the proposed coder works at a time scale equivalent to four frames (160 ms), according to the decomposition method described.
One of the problems in determining the value of α is that the sampling frequency is not constant. It depends on the inter-arrival times of the RM cells, which are themselves a function of the available bandwidth, and the available bandwidth changes with time. A common trade-off value is α = j ^ The video coder uses the value of MACR as a forecast of the bandwidth that the connection will have available during the next 160 ms. Figure 9 (left) shows how the MACR estimator (continuous line) performs during an experiment carried out under the same conditions as the experiment described in section 2, but using the proposed video coder instead of an H.263 coder. The same figure shows how the ACR of the ABR video source (left of figure 9, dotted line) adapts to the changing network conditions. The backbone link utilisation factor is shown on the right of figure 9.
Fig. 9. Left: Evolution of the ACR (dotted line) and of the MACR (continuous line) as a function of time for the H.263 video transmission experiment. Right: backbone link utilisation for the same experiment.
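The MACR estimator of section 4.1 and relation (2) can be sketched as follows. The sketch assumes that MACR is initialised with the first ACR sample and that one ACR sample is processed per RM cell. The cut-off ω_c is therefore expressed in radians per RM-cell sample rather than per second, which is why the irregular RM cell inter-arrival time makes α hard to choose. The function names are hypothetical, and α = 1/16 below is only an example value.

```python
import cmath
import math
from typing import Iterable, List


def macr_track(acr_samples: Iterable[float], alpha: float) -> List[float]:
    """MACR = MACR + alpha * (ACR - MACR), returning one estimate per ACR sample."""
    it = iter(acr_samples)
    macr = next(it)            # assumption: start from the first ACR sample
    out = [macr]
    for acr in it:
        macr += alpha * (acr - macr)
        out.append(macr)
    return out


def cutoff(alpha: float) -> float:
    """Half-power cut-off frequency (radians per sample) from relation (2)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    c = (alpha ** 2 + 2 * alpha - 2) / (2 * (alpha - 1))
    if not -1.0 <= c <= 1.0:
        raise ValueError("the filter has no half-power point for this alpha")
    return math.acos(c)


def gain(alpha: float, w: float) -> float:
    """Magnitude response of the averaging filter alpha / (1 - (1 - alpha) e^{-jw})."""
    return abs(alpha / (1 - (1 - alpha) * cmath.exp(-1j * w)))


w_c = cutoff(1 / 16)
print(w_c, gain(1 / 16, w_c) ** 2)  # squared gain at the cut-off is 1/2
```

Smaller α gives a lower cut-off and a smoother MACR, at the cost of reacting more slowly to genuine changes in available bandwidth.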
4.2 Bit Allocation Algorithm
The bit allocation procedure applies Rate-Distortion theory. This requires knowledge of the probability density function (pdf) of the coefficients in every subband, in order to estimate the distortion noise introduced by each quantization process. These pdfs are well characterised by zero-mean generalised Gaussian distributions [5]. In general, subbands with lower energy should
have fewer bits, but the subbands with more perceptual weight (like the low-pass ones) get more bits per pixel. Rate-Distortion theory (equations 3 and 4) is based on the computation of the Mean Square Error (MSE), which is not an HVS mechanism [5]. The proposed video coder introduces perceptual weighting factors [8] to achieve a better perceptual bit allocation. The bit allocation is obtained by minimising the following equations:
Distortion:
$$D(b) = \sum_{k=1}^{M} a_k W_k C_k \, 2^{-2 b_k} \sigma_k^2 \qquad (3)$$
Rate:
$$R(b) = \sum_{k=1}^{M} a_k b_k \qquad (4)$$
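The function below shows one way such a minimisation can be carried out, using the classic Lagrangian solution under stated assumptions. The distortion is taken as D = Σ a_k g_k 2^(-2 b_k), where g_k lumps together the perceptual weight, the pdf-dependent constant and the variance of subband k. The rate is taken as R = Σ a_k b_k, with a_k the fraction of coefficients in subband k. Subbands that would receive a negative number of bits are switched off, and the remaining bits are reallocated. This is the textbook reverse water-filling method, not necessarily the exact procedure of the coder described here, and `allocate_bits` is a hypothetical name.

```python
import math
from typing import List, Sequence


def allocate_bits(a: Sequence[float], g: Sequence[float], rate: float) -> List[float]:
    """Bits per coefficient b_k >= 0 minimising sum a_k g_k 2^(-2 b_k)
    subject to sum a_k b_k = rate (a_k: coefficient fractions, summing to 1)."""
    if rate < 0:
        raise ValueError("rate must be non-negative")
    active = [k for k in range(len(a)) if g[k] > 0 and a[k] > 0]
    if not active:
        raise ValueError("no subband with positive weight")
    while True:
        share = sum(a[k] for k in active)
        # Lagrangian optimum: g_k 2^(-2 b_k) is equal for every active subband.
        offset = (rate - 0.5 * sum(a[k] * math.log2(g[k]) for k in active)) / share
        bits = [0.0] * len(a)
        for k in active:
            bits[k] = 0.5 * math.log2(g[k]) + offset
        still_positive = [k for k in active if bits[k] >= 0]
        if len(still_positive) == len(active):
            return bits
        active = still_positive          # switch off subbands with negative bits


print(allocate_bits([0.25] * 4, [16, 4, 1, 1], rate=2.0))
```

Each extra bit divides a subband's distortion by 4. Subbands with a larger weighted variance g_k therefore receive ½·log2 of their ratio more bits. A real coder would also round b_k to implementable quantizer resolutions.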
Fig.6. Number of VPs needed for multicast VP scheme.
D. Multi-drop VP
Multicasting on SONET/ATM rings requires dropping and forwarding functions at the nodal transceivers. Based on these requirements, we design the "Multi-drop VP" for multicasting on SONET/ATM rings. It has all the advantages of VP multicast without requiring VPs to be established for all multicast combinations. The traditional point-to-point VP [9] has only one exit point, whereas a multi-drop VP allows multiple drops or exits. To distinguish these two types of VPs, we use one bit in the GFC (Generic Flow Control) field of the cell header as the "Multi-drop VP Indicator" (MDI). Specifically, MDI=1 means that the cell has to be copied (or tapped out) at all intermediate ADMs, and MDI=0 means that the cell is on a point-to-point VP. Note that the GFC field is normally used for traffic control data on a multi-access network. On SONET/ATM rings, however, a VC connection corresponds to a DS1 channel and there is no statistical multiplexing among the VCs. Therefore, cell-level flow control is not needed at the user-network interface (UNI), and the GFC field can be used to indicate the multi-drop nature of the VP.
All tapped-out cells are passed to the VC processing layer. Cells belonging to local destinations are passed to them, and the remaining ones are discarded. As an example, consider Fig. 7, where two multicast VCs are carried on a single multi-drop VP (VP 2). VC 1 is set up for the connection from node 4 to nodes 2 and 1. At the intermediate node (node 3), the cells belonging to VC 1 are tapped out from the VP processing layer but discarded at the VC processing layer. In fact, VP 2 can carry all multicast traffic from node 4 to node 1 and to any subset of the nodes between them. In other words, a multi-drop VP keeps the advantages of a multicast VP while accommodating multicast connections with various destination combinations. Multi-drop VP is simpler than VP-augmented VC multicasting because no VC switching function is needed at intermediate nodes. Since VPI values are assigned on a global basis, no VPI translation is required at intermediate nodes either. Multi-drop VP also requires far fewer VPIs than VP multicast. As there are 12 bits for the VPI field in NNI ATM cells, up to 4096 VPs can be defined. Suppose DS1 channels carried via circuit emulation on the SONET/ATM ring are used to support the videoconferencing service. Then each DS1 can be treated as a VC connection and assigned a VPI/VCI.
Fig. 7. Multi-drop VP in the SONET/ATM ring
4. Add-Drop Multiplexer (ADM) for SONET/ATM Ring
ADMs for SONET/ATM rings can be implemented in different ways depending on the actual SONET STS-Nc terminations. An ADM architecture for point-to-point SONET/ATM rings was introduced in [8]. In this section, we modify the hardware architecture of [8] to accommodate multi-drop VPs. The most commonly proposed ATM STS-Nc terminations are STS-3c, STS-12c, and STS-48c. Fig. 8 shows the ADM hardware architecture for STS-3c terminations. It consists of the SONET layer, the ATM layer and the service mapping layer. As the SONET layer is identical to that in [8], we focus only on the latter two. The ATM layer performs the following functions:
1. ATM/SONET interface: convert the STS-3c payload to an ATM cell stream and vice versa;
2. Cell type classifying: check the MDI of individual cells and copy out those with MDI=1;
3. Cell addressing: for cells with MDI=0, check their VPIs to determine whether they should be dropped (for local termination) or forwarded (for termination in downstream nodes);
4. Idle cell identifying: identify the idle cell locations for cell stream insertion via a sequential access protocol.
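Functions 2 to 4 amount to a per-cell decision. A minimal sketch, assuming the MDI is available as a field of an already parsed header and that an idle slot is represented by `None`; `CellHeader` and `atm_layer_action` are hypothetical names, and the case of a multi-drop VP that ends at this node is ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CellHeader:
    vpi: int
    vci: int
    mdi: int  # multi-drop indicator; its bit position in the header is left open here


def atm_layer_action(header: Optional[CellHeader], local_vpis: set) -> str:
    """Classify one cell slot of an STS-3c payload (hypothetical helper)."""
    if header is None:
        # Idle cell identifying: a free slot into which this node may insert its own cells.
        return "insert"
    if header.mdi == 1:
        # Cell type classifying: copy the multi-drop cell out and keep it on the ring.
        return "copy_and_forward"
    if header.vpi in local_vpis:
        # Cell addressing: point-to-point cell terminating at this node.
        return "drop"
    return "forward"


slots = [CellHeader(2, 1, 1), CellHeader(5, 9, 0), None, CellHeader(8, 3, 0)]
print([atm_layer_action(s, local_vpis={5}) for s in slots])
```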
Fig. 8. A simple ADM hardware architecture suitable for Multi-drop VP (diagram labels: STS-3c, Service Mapping Layer, incoming traffic streams)
The service mapping layer maps the input cells to their corresponding DS1 cards based on their VPI/VCI values. Cells from different STS-3c payloads are first multiplexed into a single cell stream, and their VPI/VCI values are checked. Cells corresponding to local terminations are passed to them, while the rest are discarded. According to [8], the bandwidth for DS1 service is allocated on a peak-rate basis, so no congestion will occur. The ADM architecture for point-to-point SONET/ATM rings is analyzed in [8]. The ADM architecture for supporting multicasting proposed in this paper additionally requires a cell type classifier (MDI check) and a cell copier. These two functional blocks can be embedded in a modified ADM chip.
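The service mapping step can be pictured as a multiplexer followed by a table lookup. A minimal sketch, assuming cells arrive as `(vpi, vci, payload)` tuples and that round-robin interleaving is an acceptable stand-in for the real cell multiplexer; `map_to_ds1` and the `ds1_table` layout are hypothetical.

```python
from itertools import zip_longest

_EMPTY = object()  # marks a payload stream that has run out of cells


def map_to_ds1(payload_streams, ds1_table):
    """Multiplex copied-out cells and hand them to DS1 cards by (VPI, VCI).

    payload_streams: one list per STS-3c payload, each cell a (vpi, vci, payload) tuple.
    ds1_table: hypothetical {(vpi, vci): card_id} map of the local terminations.
    Returns ({card_id: [payloads]}, number_of_discarded_cells).
    """
    per_card = {card: [] for card in ds1_table.values()}
    discarded = 0
    # Round-robin interleaving stands in for the real cell multiplexer.
    for slot in zip_longest(*payload_streams, fillvalue=_EMPTY):
        for cell in slot:
            if cell is _EMPTY:
                continue
            vpi, vci, payload = cell
            card = ds1_table.get((vpi, vci))
            if card is None:
                discarded += 1  # not a local termination
            else:
                per_card[card].append(payload)
    return per_card, discarded


streams = [[(2, 1, "a1"), (3, 5, "x")], [(2, 1, "a2"), (2, 4, "b1")]]
print(map_to_ds1(streams, {(2, 1): "DS1-0", (2, 4): "DS1-1"}))
```

Because DS1 bandwidth is allocated at peak rate, the sketch needs no buffering or drop policy beyond discarding cells that have no local termination.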
5. Conference Management
A. Multicast Setup and Release Procedure
Let there be a conference bridge which performs the functions of routing, admission control, and the management of changing active nodes. It is actually a program performing these functions at one of the network nodes. When a new conference is initiated, or when the active node changes in an ongoing conference, a conference management process is created. The conference bridge collects information such as the number of conferees, their locations and their busy/idle status, and tries to set up a multicast connection.
B. Call Admission
Call admission on a ring network is very simple. When a new call arrives, the conference bridge checks whether there is a minimum-hop multicast connection in which all the links involved have enough bandwidth for the new call. If so, the call is accepted; otherwise it is rejected.
C. Speaker Change Management
In the speaker-video conference network, network resources should be dynamically allocated and reclaimed in response to speaker changes throughout the conference session. If the next speaker is attached to the same node, the conference bridge keeps the existing multicast connection. Otherwise, a new connection is identified and established according to the minimum-hop routing rule, and the channels of the former multicast connection are released.
D. Conferee Joining and Withdrawing
One conferee may request to withdraw from an ongoing conference while another may wish to join. Upon receiving a withdrawal request, the conference bridge first checks the location of the node to which the withdrawing conferee is attached. If it is an intermediate node of a multi-drop VP, the multicasting route is not changed. The tradeoff is between saving network resources and the processing overhead of "hot" switching.
For joining, if the new conferee is attached to an intermediate or the termination node of a multi-drop VP being used, the conference bridge only needs to inform the new conferee of the VC identifier used by the conference in that VP. The local node then outputs the cell stream of that conference to the new conferee. On the other hand, if the location of the new conferee is outside all multi-drop VPs being used, a longer multicast route is set up for its inclusion.
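The admission rule of part B needs the minimum-hop multicast route on the ring: the route that leaves unused the largest gap between consecutive connection nodes (the same quantity used later in eqs. (12)-(13)). A minimal sketch, assuming the conference bridge tracks spare bandwidth per link and direction in two lists (`cw_free`, `ccw_free`); these names, `min_hop_routes` and `admit_call` are hypothetical.

```python
def min_hop_routes(source, dests, n):
    """All minimum-hop multicast routes from source on an n-node ring.

    Each route is (cw_hops, ccw_hops): how far it extends clockwise and
    counter-clockwise from the source.
    """
    w = sorted((d - source) % n for d in dests)  # clockwise distances, ascending
    bounds = [0] + w + [n]
    # Splitting after the i-th destination leaves the gap bounds[i]..bounds[i+1] idle.
    routes = [(bounds[i], n - bounds[i + 1]) for i in range(len(w) + 1)]
    best = min(cw + ccw for cw, ccw in routes)
    return [r for r in routes if sum(r) == best]


def admit_call(source, dests, n, cw_free, ccw_free, demand):
    """Accept the call if some minimum-hop route has `demand` free on every link.

    cw_free[j]: spare bandwidth on the clockwise link j -> j+1 (hypothetical bookkeeping).
    ccw_free[j]: spare bandwidth on the counter-clockwise link j -> j-1.
    """
    for cw_hops, ccw_hops in min_hop_routes(source, dests, n):
        cw_links = [(source + j) % n for j in range(cw_hops)]
        ccw_links = [(source - j) % n for j in range(ccw_hops)]
        if all(cw_free[link] >= demand for link in cw_links) and all(
            ccw_free[link] >= demand for link in ccw_links
        ):
            return True, (cw_hops, ccw_hops)
    return False, None


free = [1.0] * 6
print(admit_call(0, {1, 5}, 6, free, free, demand=1.0))
```

Ties (for example a single destination diametrically opposite the source) yield several minimum-hop routes; the sketch accepts the call if any of them fits.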
6. Multi-drop VP Assignment Schemes
In this section we propose five multi-drop VP assignment schemes and compare the numbers of VPIs they require. Their bandwidth demands are derived and compared in the next section.
A. Loop Scheme
In this scheme, each source node sets up a loop multi-drop VP that passes through all other nodes, as shown in Fig. 9. The total number of VPs required is therefore N. To balance the traffic in the clockwise and counter-clockwise directions, the VPs for source nodes 1, 3, 5, ... can be assigned in one direction and the VPs for source nodes 2, 4, 6, ... in the other direction.
Fig. 9. Loop Multi-drop VP Assignment scheme (showing the multi-drop VP for source node 3)
B. Double Half-loop Scheme
In the Double Half-loop scheme, two multi-drop VPs, one on each side of the source node, are set up to cover the rest of the nodes. Fig. 10 shows such an assignment with node 3 as the source node. The number of multi-drop VPs required for this assignment is 2N, and each VP has a length of approximately N/2 hops. Under this scheme, when all destination nodes are on one side of the source node, only one VP is needed. This results in a higher bandwidth efficiency than the Loop scheme.
Fig. 10. Double Half-loop Multi-drop VP Assignment: (a) N odd, (b) N even
C. Single Segmental Minimum-hop Scheme
For each source node, we set up multi-drop VPs to all other nodes in one direction, as shown in Fig. 11. Under this scheme, a minimum-hop route within one segment can be found for any multicast connection. Again, to balance the traffic in the two directions of a bidirectional ring, the VPs for nodes 1, 3, 5, ... can be assigned in one direction and the VPs for nodes 2, 4, 6, ... in the other. The total number of VPs required is N(N-1).
Fig. 11. Segmental Minimum-hop Multi-drop VP Assignment
D. Minimum-hop within Half-loop Scheme
In the Double Half-loop scheme, if we add VPs for all the sub-segments of the half-loop VPs, as shown in Fig. 12, the bandwidth utilization can be increased. We call this the Minimum-hop within Half-loop scheme. Its bandwidth efficiency is higher than that of the first three schemes. The minimum number of VPs required is N(N-1).
Fig. 12. Minimum-hop within Half-loop Multi-drop VP Assignment scheme
E. Unconstrained Minimum-hop Scheme
A minimum-hop route does not waste any bandwidth resources. Such a route is always available if multicast VPs are set up for all combinations of source and destinations. As discussed in Section 3(c), the total number of VPs needed is then huge. With multi-drop VPs, on the other hand, the same can be achieved when multi-drop VPs are set up for all combinations of "source and farthest destination" pairs. To do so, for each node as source node, we assign N-1 multi-drop VPs in the clockwise direction around the ring, one to each of the other nodes, and another N-1 multi-drop VPs in the counter-clockwise direction, again one to each of the other nodes (Fig. 13). The total number of VPs required is 2N(N-1). When all these VPs are available and used, we call this the Unconstrained Minimum-hop scheme.
Fig. 13. Minimum-hop Assignment scheme
Table 1 compares the numbers of VPs required by the five multi-drop VP assignment schemes with that of the VP multicast scheme.
Table 1. Comparison of the number of VPs required on an N-node ring
Scheme                                 Number of VPs
Loop Scheme                            N
Double Half-loop Scheme                2N
Single Segmental Minimum-hop Scheme    N(N-1)
Minimum-hop within Half-loop Scheme    N(N-1)
Unconstrained Minimum-hop Scheme       2N(N-1)
VP Multicast Scheme                    (see Section 3(c))
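The multi-drop entries of Table 1 can be checked by enumerating each scheme's VPs. A sketch, assuming each multi-drop VP is identified by its source node, its direction and its hop length (a hypothetical encoding), with nodes numbered 0..N-1 and the half-loop lengths of the Double Half-loop scheme (N/2 and N/2-1 hops for even N, (N-1)/2 each for odd N); `vp_sets` is a hypothetical name.

```python
def vp_sets(n):
    """Multi-drop VPs of each scheme as (source, direction, hops) on an n-node ring."""
    def one_direction(s):
        # Alternate directions by source parity to balance the two rings.
        return "cw" if s % 2 else "ccw"

    cw_half, ccw_half = n // 2, (n - 1) // 2  # half-loop lengths of the Double Half-loop scheme
    halves = (("cw", cw_half), ("ccw", ccw_half))
    nodes = range(n)
    return {
        "Loop": {(s, one_direction(s), n - 1) for s in nodes},
        "Double Half-loop": {(s, d, h) for s in nodes for d, h in halves},
        "Single Segmental Minimum-hop": {(s, one_direction(s), h) for s in nodes for h in range(1, n)},
        "Minimum-hop within Half-loop": {
            (s, d, h) for s in nodes for d, top in halves for h in range(1, top + 1)
        },
        "Unconstrained Minimum-hop": {
            (s, d, h) for s in nodes for d in ("cw", "ccw") for h in range(1, n)
        },
    }


for name, vps in vp_sets(7).items():
    print(f"{name}: {len(vps)} VPs")
```

For the Minimum-hop within Half-loop scheme the two half-loops contribute N/2 + (N/2 - 1) or 2(N-1)/2 sub-segments per source, which is N-1 in both cases and gives the N(N-1) of Table 1.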
7. Bandwidth Demand Analysis
In this section, we analyze the bandwidth demand of k-party conferences for the five multi-drop VP assignment schemes, assuming that the source and the k-1 destinations are randomly located on an N-node ring. For convenience, we refer to the five schemes as Schemes A, B, C, D and E, according to their order of presentation in the last section. Without loss of generality, let node 0 be the source and let $A = (A_1, A_2, \ldots, A_{N-1})$ be a random vector with $A_i = 1$ indicating that node $i$ is a destination and $A_i = 0$ otherwise. In addition, let $a = (a_1, a_2, \ldots, a_{N-1})$ be a binary vector and

$$\Omega_k = \Big\{ a : \sum_{i=1}^{N-1} a_i = k-1 \Big\} \tag{1}$$

By symmetry, the destination distribution takes any pattern in $\Omega_k$ with the same probability. Given $N$ and $k$, the total number of patterns is simply $\binom{N-1}{k-1} = \frac{(N-1)!}{(k-1)!\,(N-k)!}$. The probability that $A$ takes on any specific pattern $a$ is just one over that total number; specifically,

$$\Pr[A = a] = \begin{cases} \dfrac{(k-1)!\,(N-k)!}{(N-1)!} & \text{for } a \in \Omega_k \\[4pt] 0 & \text{otherwise} \end{cases} \tag{2}$$

Let $h_X(a)$ be the number of links used by a specific connection request of size $k$ with destination distribution $a$ under multi-drop VP assignment scheme $X$. Averaging over all destination distributions $a$ in $\Omega_k$, we get the expected number of links required as

$$E[h_X(A)] = \sum_{a \in \Omega_k} h_X(a)\,\Pr[A = a] \tag{3}$$
The bandwidth demand factor under scheme $X$, denoted $\eta_X$, is defined as the average number of links used, normalized by the ring size $N$. In other words,

$$\eta_X = \frac{E[h_X(A)]}{N} \tag{4}$$
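As a small worked instance of (1)-(4), take N = 6 and k = 3 under the Loop scheme, whose routes always use N-1 = 5 links (eq. (5) below):

```latex
\begin{aligned}
|\Omega_3| &= \binom{5}{2} = \frac{5!}{2!\,3!} = 10,
\qquad \Pr[A = a] = \frac{2!\,3!}{5!} = \frac{12}{120} = \frac{1}{10}
\quad \text{for } a \in \Omega_3,\\
E[h_A(A)] &= \sum_{a \in \Omega_3} 5 \cdot \frac{1}{10} = 5,
\qquad \eta_A = \frac{5}{6} \approx 0.83 .
\end{aligned}
```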
In the following, we derive $h_X(a)$ under the different multi-drop VP assignment schemes.
A. Loop Scheme
Under this scheme, a route of $N-1$ hops is always used for any multicast connection request. Thus we have

$$h_A(a) = N - 1 \tag{5}$$
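B. Double Half-loop Scheme
Under this scheme, all VPs have a length of about $N/2$. If all destinations are clustered within one of the two half-loops, one VP is enough; otherwise, two VPs are required. Specifically, if $N$ is odd, we have

$$h_B(a) = \begin{cases} \dfrac{N-1}{2} & \text{if } \displaystyle\sum_{i=1}^{(N-1)/2} a_i = k-1 \ \text{ or } \displaystyle\sum_{i=(N+1)/2}^{N-1} a_i = k-1 \\[4pt] N-1 & \text{otherwise} \end{cases} \tag{6}$$

If $N$ is even, we have

$$h_B(a) = \begin{cases} N/2 & \text{if } \displaystyle\sum_{i=1}^{N/2} a_i = k-1 \\[4pt] N/2 - 1 & \text{if } \displaystyle\sum_{i=N/2+1}^{N-1} a_i = k-1 \\[4pt] N-1 & \text{otherwise} \end{cases} \tag{7}$$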
C. Single Segmental Minimum-hop Scheme
Under this VP assignment scheme, the number of links used by the multi-drop VP is numerically equal to the VP hop count from the source to the farthest destination. Specifically,

$$h_C(a) = \max\{\, i \mid a_i > 0 \,\} \tag{8}$$
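D. Minimum-hop within Half-loop Scheme
Let $u$ and $v$ be the node numbers of the farthest destinations from the source within the two half-loops. For a specific $a$, they are given by

$$u = \begin{cases} \max\{\, j \mid a_j > 0,\ j \le (N-1)/2 \,\} & \text{for } N \text{ odd} \\[4pt] \max\{\, j \mid a_j > 0,\ j \le N/2 \,\} & \text{for } N \text{ even} \end{cases} \tag{9}$$

$$v = \begin{cases} \min\{\, j \mid a_j > 0,\ j > (N-1)/2 \,\} & \text{for } N \text{ odd} \\[4pt] \min\{\, j \mid a_j > 0,\ j > N/2 \,\} & \text{for } N \text{ even} \end{cases} \tag{10}$$

with $u = 0$ (respectively $v = N$) when the corresponding half-loop contains no destination. With that, the number of links used is simply

$$h_D(a) = u + (N - v) \tag{11}$$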
E. Unconstrained Minimum-hop Scheme
Under this scheme, a minimum-hop route can always be used for any multicast connection request. On a ring, the minimum-hop route can use either one VP in the clockwise direction, one VP in the counter-clockwise direction, or two VPs fanning out from the source in both directions. Let $w_1, w_2, \ldots, w_{k-1}$ be the node numbers of the destinations in ascending order, i.e., $w_1 < w_2 < \cdots < w_{k-1}$. After the minimum-hop route is used for the multicast connection, an idle segment is left on the ring. The length $y$ of this idle segment is the number of links between the two adjacent connection nodes that are farthest apart. Enumerating all such segment lengths and taking the largest one, $y$ is obtained as

$$y = \max\{\, N - w_{k-1},\ w_{k-1} - w_{k-2},\ \ldots,\ w_2 - w_1,\ w_1 \,\} \tag{12}$$

The number of links used under the Unconstrained Minimum-hop scheme is therefore

$$h_E(a) = N - y \tag{13}$$
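The factors plotted in Figs. 14-17 can be recomputed from eqs. (1)-(13) by brute force. A minimal sketch in exact rational arithmetic, assuming node 0 is the source and representing a pattern $a$ by the sorted tuple of its destination nodes. The function names (`h_min_hop_half`, `eta`, and so on) are hypothetical. The enumeration grows as the binomial coefficient in (2), so it only suits small rings like those in the figures.

```python
from fractions import Fraction
from itertools import combinations


# Each h_* takes the sorted tuple of destination nodes (1..n-1; the source is node 0).
def h_loop(dests, n):  # Scheme A, eq. (5)
    return n - 1


def h_double_half(dests, n):  # Scheme B, eqs. (6)-(7)
    half = n // 2  # last node of the clockwise half-loop
    if all(j <= half for j in dests):
        return half          # (N-1)/2 for odd N, N/2 for even N
    if all(j > half for j in dests):
        return n - 1 - half  # (N-1)/2 for odd N, N/2-1 for even N
    return n - 1


def h_single_segment(dests, n):  # Scheme C, eq. (8)
    return max(dests)


def h_min_hop_half(dests, n):  # Scheme D, eqs. (9)-(11)
    half = n // 2
    u = max((j for j in dests if j <= half), default=0)
    v = min((j for j in dests if j > half), default=n)
    return u + (n - v)


def h_unconstrained(dests, n):  # Scheme E, eqs. (12)-(13)
    gaps = [dests[0]] + [b - a for a, b in zip(dests, dests[1:])] + [n - dests[-1]]
    return n - max(gaps)


def eta(h, n, k):
    """Eq. (4): E[h_X(A)] / N, averaging uniformly over Omega_k as in eqs. (2)-(3)."""
    patterns = list(combinations(range(1, n), k - 1))  # every set of k-1 destinations
    p = Fraction(1, len(patterns))                       # eq. (2)
    return sum(p * h(d, n) for d in patterns) / n


SCHEMES = {"A": h_loop, "B": h_double_half, "C": h_single_segment,
           "D": h_min_hop_half, "E": h_unconstrained}

for k in (2, 4, 6, 8):
    for n in range(max(k, 5), 12):
        row = "  ".join(f"{x}={float(eta(h, n, k)):.3f}" for x, h in SCHEMES.items())
        print(f"k={k} N={n}: {row}")
```

Two properties follow directly. Under Scheme A every pattern costs N-1 links, so $\eta_A = (N-1)/N$. Under Scheme C the expected farthest destination is $(k-1)N/k$, so $\eta_C = (k-1)/k$ regardless of N.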
8. Numerical Results
Figs. 14-17 show the bandwidth demand factors for several network sizes and call sizes under the various multi-drop VP assignment schemes. Fig. 14 is for call size k=2, i.e., for point-to-point calls. We find that: (1) the bandwidth demand factor for Scheme E is significantly smaller than those for Schemes A, B, C and D; (2) Schemes C and D have the same $\eta$ values; (3) $\eta_?$ and $\eta_?$ do not change with the call size.
Fig. 14. Comparison of bandwidth demand factor $\eta$ for call size k=2 (horizontal axis: network size N)
Fig. 15. Comparison of bandwidth demand factor $\eta$ for call size k=4 (horizontal axis: network size N)
Fig. 15 shows the results for k=4. Here, $\eta_?$ and $\eta_?$ increase with the network size, while $\eta_?$ and $\eta_?$ behave the opposite way. Fig. 16 and Fig. 17 show the results for k=6 and k=8, respectively. Here, we see that for $k \ge 6$, $\eta_?$ and $\eta_?$ become virtually indistinguishable. While $\eta_?$ is always constant, $\eta_?$ and $\eta_?$ both decrease slowly as N increases.
Fig. 16. Comparison of bandwidth demand factor $\eta$ for call size k=6 (horizontal axis: network size N)
Note that the bandwidth demand of Scheme E, i.e., the Unconstrained Minimum-hop Multi-drop scheme, is the same as that of the multicast VP scheme. However, Scheme E needs far fewer VPIs than the multicast VP scheme. As a result, the VPIs can be assigned on a global basis when Scheme E is used, which can reduce the complexity of the routing and switching procedure.
Fig. 17. Comparison of bandwidth demand factor $\eta$ for call size k=8 (horizontal axis: network size N)
9. Summary
Current SONET rings cannot support multiparty videoconferencing efficiently. In this paper, we propose to use SONET/ATM rings to support this service via switched DS1 service. Various multicasting methods are discussed, and the new multi-drop VP is found to be suitable for multicasting on SONET/ATM rings. Several VP assignment schemes are proposed and their bandwidth demand factors are compared. Among them, the Unconstrained Minimum-hop Multi-drop VP scheme has the smallest bandwidth demand factor, which is identical to that of the multicast VP scheme. Since it requires a much smaller number of VPs to be set up, it is the preferred VP assignment scheme for multiparty videoconferencing service on SONET/ATM rings.
References
1. S. Sabri and B. Prasada, "Video conferencing systems," Proc. of IEEE, vol. 73, no. 4, pp. 671-688, Apr. 1985.
2. H. Noma, Y. Kitamura, et al., "Multi-point virtual space teleconferencing system," IEICE Trans. Commun., vol. E78-B, no. 7, July 1995.
3. Y. W. Leung and T. S. Yum, "Connection optimization for two types of videoconferences," IEE Proc. Commun., vol. 143, no. 3, June 1996.
4. T. S. Yum, M. S. Chen and Y. W. Leung, "Video bandwidth allocation for multimedia teleconferences," IEEE Trans. on Commun., pp. 457-465, Feb. 1995.
5. T. Turletti and C. Huitema, "Videoconferencing on the Internet," IEEE/ACM Trans. on Networking, vol. 4, June 1996.
6. CCITT Study Group XVIII, Report R34, COM XVIII-R 34-E, June 1990.
7. K. Sato, S. Ohta and I. Tokizawa, "Broadband ATM network architecture based on virtual paths," IEEE Trans. Commun., vol. 38, no. 8, Aug. 1990.
8. T. H. Wu et al., "An economic feasibility study for a broadband virtual path SONET/ATM self-healing ring architecture," IEEE J. on Selected Areas in Commun., vol. 10, no. 9, Dec. 1992.
9. CCITT Study Group XVIII, "Draft recommendation I.311, B-ISDN general network aspects," SG XVIII, Jan. 1990.
10. A. Horikawa et al., "A new bandwidth allocation algorithm for ATM ring networks," IEEE GLOBECOM'95, pp. 404-409.
11. Y. Kajiyama et al., "An ATM self-healing ring," IEEE J. on Selected Areas in Commun., vol. 12, no. 1, pp. 171-178, Jan. 1994.
12. J. J. Lee and K.-Y. Jung, "An algorithm for determining the feasibility of SONET/ATM rings in broadband networks," IEEE Fourth Int. Conf. on Computer Communication and Networks, 1995, pp. 356-360.
13. N. Tokura, K. Kikuchi and K. Oguchi, "Fiber-optic subscriber networks and systems development," Trans. IEICE, vol. E74, no. 1, 1991.
14. ITU-T draft Recommendation Q.931, Q.93B.
15. ITU-T Recommendation H.320, "Narrow-band ISDN visual telephone systems and terminal equipment," 1996.
16. D. Saha, D. Kandlur, T. Barzilai, Z. Y. Shae and M. Willebeek-LeMair, "A videoconferencing testbed in ATM: Design, implementation, and optimizations," Proc. ICMCS, 1995.
17. Y. C. Chang, Z. Y. Shae and M. Willebeek-LeMair, "Multiparty videoconferencing using IP multicast," SPIE Proc. Networking Commun., San Jose, CA, Feb. 1996.
18. T. Turletti, "The INRIA videoconferencing system (IVS)," Connexions, vol. 8, no. 10, pp. 20-24, 1994.
19. A. S. Tanenbaum, Computer Networks, 3rd edition, Prentice-Hall International, 1996, pp. 125-126.
20. H. Tode et al., "Multicast routing schemes in ATM," International Journal of Communication Systems, vol. 9, pp. 185-196, 1996.
21. H. Tode et al., "Multicast routing schemes in ATM," International Journal of Communication Systems, vol. 9, pp. 185-196, 1996.
A Generic Scheme for the Recording of Interactive Media Streams
Volker Hilt, Martin Mauve, Christoph Kuhmünch, Wolfgang Effelsberg
University of Mannheim, LS PI IV, L 15, 16, 68131 Mannheim, Germany
{hilt,mauve,kuhmuench,effelsberg}@informatik.uni-mannheim.de
Abstract. Interactive media streams with real-time characteristics, such as those produced by shared whiteboards, distributed Java applets or shared VRML viewers, are rapidly gaining importance. Current solutions to the recording of interactive media streams are limited to one specific application (e.g. one specific shared whiteboard). In this paper we present a generic recording service that enables the recording and playback of this new class of media. To facilitate generic recording, we have defined a profile for the Real-Time Transport Protocol (RTP) that covers common aspects of the interactive media class, analogous to the profile for audio and video. Based on this profile, we introduce a generalized recording service that enables the recording and playback of arbitrary interactive media.
1 Introduction
The use of real-time applications in the Internet is increasing quickly. One of the key technologies enabling such transmissions is a transport protocol that meets real-time requirements. The Real-Time Transport Protocol (RTP) has been developed for this purpose [16]. The RTP protocol provides a framework covering common aspects of real-time transmission. Each encoding of a specific media type entails tailoring the RTP protocol. This is accomplished by an RTP profile, which covers common aspects of a media class (e.g. the RTP profile for audio and video [14]), and an RTP payload, which specifies the transmission of a specific type of media encoding (e.g. H.261 video streams). While the class of audio and video is the most important one and is quite well understood, interactive media streams are used by several applications which are gaining importance rapidly. Interactive applications include shared whiteboard applications [3], multi-user VRML models [9] and distributed Java animations [7]. Many existing protocols for interactive media are proprietary. This prevents interoperability and requires re-implementation of similar functionality for each protocol. For this reason, we have defined an RTP profile [10] that covers common aspects of the distribution of interactive media. It can be instantiated for a specific interactive media encoding. The RTP profile for audio and video has enabled the development of generic recording services like those described in [4] [15]. The RTP audio and video recorders operate independently of a specific video or audio encoding. Instead of decoding
incoming RTP packets and storing video and audio content (e.g. in H.261 or MPEG format), they operate on entire RTP packets. This has the major advantage that the mechanisms implemented in the recorder (e.g. media storage or media synchronization during playback) are available for all video and audio formats. Recent developments extend these RTP recorders to the proprietary protocols of specific applications. In general, interactive media streams require additional functionality in an RTP recorder, since certain information about the semantics of an interactive media stream must be considered. In particular, random access to an interactive media stream requires mechanisms that provide the receivers with the current media state. For example, if a recorded shared whiteboard stream is accessed at a random position, the contents of the page active at that time must be displayed to the user. Thus, a recorder must provide the receiving shared whiteboards with the page content before the actual playback is started. Our RTP profile for interactive media provides a common framework that enables the development of generic services, such as recording or late join, for the class of interactive media. In this paper we discuss the principles of such a generic recording service. We present mechanisms that are required for the recording and playback of interactive media streams, and we show that random access to these media streams can be achieved by these mechanisms without having to interpret media-specific data. The remainder of this paper is structured as follows: Section Two provides an overview of related work. Section Three introduces a classification of different media types. Section Four provides a short overview of our RTP profile for interactive media, on which the presented recording scheme is based. Section Five describes the basic architecture of an RTP recording service.
Section Six discusses fundamentals of random access to stored interactive media streams, and Section Seven describes two mechanisms that realize media-independent random access to these media streams. Section Eight describes the current state of the implementation. Section Nine concludes the paper with a summary and an outlook.
2 Related Work
Much work has been done on the recording of media streams. The rtptools [15] are command-line tools for recording and playing back single RTP audio and video streams. The Interactive Multimedia Jukebox (IMJ) [1] uses these tools to set up a video-on-demand server. Clips from the IMJ can be requested via the Web. The mMOD [13] system is a Java-based media-on-demand server capable of recording and playing back multiple RTP and UDP data streams. Besides RTP audio and video streams, the mMOD system is capable of handling media streams of applications like mWeb, the Internet whiteboard wb [6], mDesk and NetworkTextEditor. While mMOD supports the recording and playback of UDP packets, it does not provide a generalized recording service with support for random access. The MASH infrastructure [11] comprises an RTP recording service called the MASH Archive System [12]. This system is capable of recording RTP audio and video streams as well as media streams produced by the MediaBoard [18]. The MASH Archive System supports random access to the MediaBoard media stream but does not
provide a recording service generalized for other interactive media streams. A different approach is taken by the AOF tools [2]. The AOF recording system does not use RTP packets for storage but converts the recorded data into a special storage format. The AOF recorder grabs audio streams from a hardware device and records the interactive media streams produced by one of two applications, AOFwb or the Internet whiteboard wb. Random access as well as fast visual scrolling through the recording are supported, but the recordings can only be viewed from a local hard disk or CD. The recording of other interactive media streams is not possible. In the Interactive Remote Instruction (IRI) system [8], a recorder was implemented that captures various media streams from different IRI applications. In all cases a media stream is recorded by means of a specialized version of the IRI application that is used for live transmission. This specific application performs regular protocol actions towards the network but stores the received data instead of displaying it to the user. For example, a specialized version of the video transmission tool is used to record the video stream. Such a specialized recording version must be developed for each IRI tool that is to be recorded. One of a number of commercial video-on-demand servers is the Real G2 server. The Real G2 server is capable of streaming video and audio data as well as SMIL presentations to RealPlayer G2 clients. A SMIL presentation may contain video and audio as well as other supported media types like RealText, RealPix and graphics. In contrast to the recording of interactive applications, a specialized authoring tool is used to author SMIL presentations, which consist of predefined media streams displayed according to a static schedule.
3 Interactive Media
3.1 Classification of Interactive Media
Before discussing the recording of interactive media streams, it is important to establish a common view on this media class. Basically, we separate media types by means of two criteria. The first criterion distinguishes whether a medium is discrete or continuous. The characteristic of a discrete medium is that its state is independent of the passage of time. Examples of discrete media are still images or digital whiteboard presentations. While discrete media may change their state, they do so only in response to external events, such as a user drawing on a digital whiteboard. The state of a continuous medium, however, depends on the passage of time and can change without the occurrence of external events. Video and animations belong to the class of continuous media. The second criterion distinguishes between interactive and non-interactive media. Non-interactive media change their state only in response to the passage of time and do not accept external events. Typical representations of non-interactive media are video, audio and images. Interactive media are characterized by the fact that their state can be changed by external events such as user interactions. Whiteboard presentations and interactive animations represent interactive media. Figure 1 depicts how the criteria characterize different media types.
                         Discrete              Continuous
Interactive Media        Digital Whiteboard    Animation
Non-Interactive Media    Image                 Video
Fig. 1. Examples of Media Types
3.2 Model for Interactive Media
An interactive medium is a medium that is well defined by its current state at any point in time. For example, at a given point in time the medium "Java animation" is defined by the internal state of the Java program that is implementing the animation. The state of an interactive medium can change for two reasons, either by the passage of time or by events. The state of an interactive medium between two successive events is fully deterministic and depends only on the passage of time. Any state change that is not a fully deterministic function of time is caused by an event. A typical example of an event is the interaction of a user with the medium. An example of a state change caused by the passage of time might be the animation of an object moving across the screen. In cases where a complex state of an interactive medium is transmitted frequently by an application, it is necessary to be able to send only those parts that have changed since the last state transmission. We call a state which contains only the state changes that have occurred since the last transmitted state a delta state. A delta state can only be interpreted if the preceding full state and interim delta states are also available. The main advantages of delta states are their smaller size and that they can be calculated faster than full states. In order to provide for a flexible and scalable handling of state information, it is sometimes desirable to partition an interactive medium into several sub-components. In addition to breaking down a large media state into more manageable parts, such partitioning allows participants of a session to track only the states of those sub-components they are actually interested in. Examples of sub-components are VRML objects (a house, a car, a room), or the pages of a whiteboard presentation. To display a non-interactive media stream like video or audio, a receiver needs to
have an adequate player for a specific encoding of the medium. If such a player is present in a system, every media stream that employs this encoding can be processed. This is not true for interactive media streams. For example, to process the media stream that is produced by a shared VRML browser, it is not sufficient for a receiver to have a VRML browser. The receiver will also need the VRML world on which the sender acts; otherwise the media stream cannot be interpreted by the receiver. But even if the receiver has loaded the correct world into its browser, the VRML world may be in a state completely different from that of the sender. Therefore, the receiver must synchronize the state of the local representation of the interactive medium to the state of the sender before it will be able to interpret the VRML media stream correctly. Generally speaking, it does not suffice to have a player for an interactive media type. Additionally, the player must be initialized with the context of a media stream before that stream can actually be played. The context is comprised of two components: (1) the environment of a medium and (2) the current state of the medium. The environment represents the static description of an interactive medium that must initially be loaded into the media player. Examples of environments are VRML worlds or the code of Java animations. The state is the dynamic part of the context. The environment within a player must be initialized with the current state of the interactive medium before the stream can be played. During transmission of the stream, both sender and receiver must stay synchronized since each event refers to a well-defined state of the medium and cannot be processed if the medium is in a different state.
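The model above already tells a recorder which packets a receiver needs before playback can start at an arbitrary position, without interpreting media data. The three ingredients are full states, delta states that are only meaningful after the preceding full state, and events addressed to sub-components. A minimal sketch follows. It assumes the packet kind and sub-component ID are readable from headers such as those of the profile in Section 4, and that the recording is sorted by timestamp. `Packet` and `packets_for_random_access` are hypothetical names, and this is not necessarily either of the two mechanisms described in Section Seven.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Packet:
    timestamp: int
    kind: str    # "state", "delta" or "event"
    sub_id: int  # sub-component ID
    data: Any    # media-specific; never inspected here


def packets_for_random_access(recording: List[Packet], t: int) -> List[Packet]:
    """Select the recorded packets a receiver needs to reach the media state at time t.

    For each sub-component this is its last full state at or before t, followed by
    every later delta state and event up to t. Sub-components without a full
    state before t are skipped (their state would have to be obtained otherwise).
    """
    prefix = [p for p in recording if p.timestamp <= t]
    last_state = {}
    for idx, p in enumerate(prefix):
        if p.kind == "state":
            last_state[p.sub_id] = idx
    return [p for idx, p in enumerate(prefix)
            if p.sub_id in last_state and idx >= last_state[p.sub_id]]


rec = [Packet(0, "state", 1, "S1"), Packet(1, "event", 1, "e1"),
       Packet(2, "state", 2, "T1"), Packet(3, "state", 1, "S2"),
       Packet(4, "delta", 1, "d1"), Packet(5, "event", 2, "e2"),
       Packet(6, "event", 1, "e3")]
print([p.data for p in packets_for_random_access(rec, 5)])
```

Jumping to t=5 replays the newer full state S2 of sub-component 1 (making S1 and e1 unnecessary), its delta d1, and sub-component 2's state T1 with its event e2.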
4 RTP Profile for Interactive Media
In order to be able to develop generic services that are based solely on our RTP profile for interactive media, the profile must support those common aspects of interactive media streams that are not already handled by RTP. These aspects can be separated into two groups: information and mechanisms. Information is needed so that a generic service can analyze the semantics of the application-level communication. The information provided by the RTP profile is: identification of application-layer packet content, identification of sub-components, and sequence numbers. Mechanisms are needed by a generic service to take appropriate actions on the medium. The mechanisms provided within the RTP profile are: announcement of sub-components, requesting state transmissions, and mapping of sub-component IDs to application-level names. The remainder of this section discusses basic concepts of the RTP profile for interactive media from the point of view of a generic recording service. A detailed description of the profile can be found in [10].
4.1 Structure of Data Packets
The model presented in Section 3.2 illustrates that states, delta states and events of an interactive medium must be transmitted in real time. Within our RTP profile, we define the structure of data packets containing these basic elements of interactive media as depicted in Figure 2; for the general structure of RTP packets see [16]. The most important fields of these packets are type, sub-component ID and data. The type field is needed to distinguish the different packet types state, delta state and event defined in
the profile. This is especially important for the recording service, which must be able to identify the type of content transported in an RTP packet without having to interpret the data part of the packet. In state and delta state packets, the sub-component ID field holds the ID of the sub-component whose state is included in the data part of the packet. In event packets, this field identifies the sub-component containing the "target" of the event. The data field of the packet contains the payload-type-specific definition of states, delta states or events.
Fig. 2. RTP Packet Structure for States, Delta States and Events (32-bit rows: the RTP header with V=2, P, X, CC, M, PT, sequence number, timestamp, synchronization source (SSRC) identifier and contributing source (CSRC) identifiers, followed by the profile fields V=0, type, PRI, sub-component sequence number, reserved, a sub-component ID spanning two rows, and the data)
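Because the recorder classifies packets without touching the data part, it first has to find where the profile-specific fields begin. A minimal sketch, assuming only the standard RTP fixed header of RFC 3550 (version, P, X, CC, M, PT, sequence number, timestamp, SSRC, CSRC list and the optional header extension). The bit layout of the profile fields (type, PRI, sub-component ID) is defined in [10] and is deliberately not decoded here. `parse_rtp_fixed_header` is a hypothetical name, and payload type 96 in the usage line is just an arbitrary dynamic value.

```python
import struct


def parse_rtp_fixed_header(packet: bytes) -> dict:
    """Parse the RTP fixed header (RFC 3550, Sec. 5.1) and locate what follows it."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte RTP fixed header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    cc = b0 & 0x0F                    # number of CSRC identifiers
    offset = 12 + 4 * cc
    if len(packet) < offset:
        raise ValueError("truncated CSRC list")
    csrcs = list(struct.unpack(f"!{cc}I", packet[12:offset]))
    if b0 & 0x10:                     # X bit: skip the header extension (Sec. 5.3.1)
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
        "payload_offset": offset,     # where the profile-specific fields of Fig. 2 start
    }


pkt = struct.pack("!BBHIII", 0x81, 96, 7, 1000, 0x1234, 0xABCD) + b"profile..."
hdr = parse_rtp_fixed_header(pkt)
print(hdr["payload_type"], hdr["csrcs"], pkt[hdr["payload_offset"]:])
```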
Since setting the state of a sub-component can be costly and might not always be reasonable, state and delta state packets contain a priority (PRI) field. This priority can be used by the sender of the state to signal its importance. A packet with high priority should be examined and applied by all communication peers that are interested in the specific sub-component. High priority is recommended for resynchronization after errors or packet loss. Basically, a state transmission with high priority forces every participant to discard its information about the sub-component and to adopt the new state. A state transmitted with low priority can be ignored at will by any participant. This is useful if only a subset of communication partners is interested in the state. An example of this case is a recorder that periodically requests the media state in order to insert it into the recording.
4.2 Announcement of Sub-Components
For the implementation of an efficient recording service, it is important that the sub-components present in a session are known. Furthermore, it should be possible to distinguish those sub-components which are currently needed to display the medium. Those sub-components are called active. Examples of active sub-components are the currently visible pages of a shared whiteboard. All remaining sub-components are
passive (e.g. those shared whiteboard pages which are currently not visible to any user). Declaring a sub-component active does not grant permission to modify anything within that sub-component. However, a sub-component must be activated before a session participant is allowed to modify (send events into) the sub-component. Knowing which sub-components in a session are active allows a recording service to transmit, during playback, only those sub-components that are actually visible in the receivers. The profile provides a standardized way to announce the sub-components of any application participating in an interactive media session and allows sub-components to be marked as active. Selected participants announce active and passive sub-components at regular intervals within RTCP reports.
4.3 Requesting State Transmissions
In many cases it is reasonable to let the receivers decide when the state of sub-components should be transmitted. Thus, a receiver must be able to request the state from other participants in the session. As the computation of state information may be costly, the sender must be able to distinguish between different types of requests. Recovery after an error urgently requires information on the sub-component state since the requesting party cannot proceed without it. The state is needed by the receiver to resynchronize with the ongoing transmission. These requests will be relatively rare. In contrast, a recording service needs the media states to enable random access to the recorded media. It does not urgently need the state but will issue requests frequently. For this reason, the state request mechanism supports different priorities through the priority (PRI) field in the state query packet. Senders should satisfy requests with high priority (e.g. for late joiners) very quickly, even if this has a negative impact on the presentation quality for the local user. Requests with low priority can be delayed or even ignored, e.g. if the sender currently has no resources to satisfy them. The sender must be aware that the quality of the service offered by the requesting application will decrease if requests are ignored.
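The PRI semantics for state packets (receiver side) and for state queries (sender side) can be illustrated with a small sketch. It assumes a two-level priority, since the actual encoding of the PRI field is not given here. The names `Priority`, `should_apply_state`, `StateRequest` and `StateResponder` are hypothetical. So is the bounded backlog for deferred low-priority requests. Together they show one possible policy, not behaviour mandated by the profile.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class Priority(Enum):
    # Assumption: two levels; the real PRI field encoding is not specified here.
    LOW = 0
    HIGH = 1


def should_apply_state(priority: Priority, interested: bool, wants_low: bool) -> bool:
    """Receiver side: decide whether an incoming state packet is applied."""
    if not interested:
        return False  # the sub-component is of no interest to this participant
    # High priority (e.g. resynchronization) must be applied; low priority is optional.
    return priority is Priority.HIGH or wants_low


@dataclass
class StateRequest:
    sub_component_id: int
    priority: Priority


@dataclass
class StateResponder:
    """Sender side: hypothetical policy for answering state queries."""
    max_deferred: int = 4  # assumed bound on the low-priority backlog
    deferred: deque = field(default_factory=deque)

    def on_request(self, req: StateRequest, busy: bool) -> bool:
        """Return True if the state should be sent immediately."""
        if req.priority is Priority.HIGH or not busy:
            return True  # e.g. error recovery or a late joiner: answer now
        # Low priority while busy: defer it, or ignore it if the backlog is full.
        if len(self.deferred) < self.max_deferred:
            self.deferred.append(req)
        return False

    def drain(self) -> list[int]:
        """Called once resources are free; returns the deferred sub-component IDs."""
        ids = [r.sub_component_id for r in self.deferred]
        self.deferred.clear()
        return ids
```

A request that the responder silently drops is permitted for low priority. As the text notes, however, dropping it lowers the quality of the service offered by the requesting application.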
5 RTP Recording Service
An RTP recording service such as the MBone VCR on Demand (MVoD) [4] usually handles two network sessions (see Figure 3). In the first, the recorder participates in the multicast transmission of the RTP media data. Depending on its mode of operation (recording or playback), it acts as a receiver or a sender towards the other participants of the session. A second network session can be used to control the recorder from a remote client, e.g. using the RTSP [17] protocol. During the recording of an RTP session, the recorder receives RTP data packets and writes them to a storage device. Packets from different media streams are stored separately. During playback, the recorder successively loads the RTP packets of each media stream and computes the time at which each packet must be sent from the timestamps in the RTP headers. The recorder then sends the packets according to the computed schedule. A detailed description of the synchronization mechanism implemented in the MVoD can be found in [5].
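The scheduling step can be sketched for a single stream as follows. The sketch makes three assumptions:

- The RTP clock rate of the payload is known. 90 kHz is the usual rate for video; the rate for an interactive media payload would be defined by its payload type.
- The 32-bit RTP timestamp is unwrapped between consecutive packets.
- Inter-stream synchronization, which the MVoD performs [5], is left out.

The function names are hypothetical.

```python
from __future__ import annotations

RTP_TS_MODULUS = 2 ** 32  # RTP timestamps are 32-bit and wrap around


def ts_delta(ts: int, prev: int) -> int:
    """Signed difference ts - prev of two RTP timestamps, tolerating one wrap."""
    d = (ts - prev) % RTP_TS_MODULUS
    return d - RTP_TS_MODULUS if d >= RTP_TS_MODULUS // 2 else d


def send_schedule(timestamps: list[int], clock_rate: int, start: float) -> list[float]:
    """Wall-clock send time (seconds) for each recorded packet of one stream."""
    times: list[float] = []
    elapsed = 0  # media clock ticks since the first packet
    for i, ts in enumerate(timestamps):
        if i > 0:
            elapsed += ts_delta(ts, timestamps[i - 1])
        times.append(start + elapsed / clock_rate)
    return times
```

The sketch accumulates the difference between consecutive packets instead of subtracting the first timestamp. This keeps the schedule correct across wrap-arounds in long recordings, provided consecutive packets are less than 2^31 ticks apart.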
[Figure 3: interactive media applications and the recorder exchanging RTP packets in a multicast session, plus a control connection to the recorder.]
Fig. 3. Scenario for the Recording of an RTP Session
6 Random Access
In contrast to traditional media types, where random access to any position within a stream is possible, interactive media streams do not allow easy random access without restoring the context of the stream at the desired access position. For example, jumping directly to annotations on a whiteboard page only makes sense if the right page is shown on the screen. To restore the context of a recorded stream in a receiver, two operations have to be performed. First, the environment has to be loaded into the receiver. The environment can be provided by the recording service or by a third party, e.g. an HTTP server. Then the receiver must get the state of the interactive medium at the desired access position within the recorded stream. Let us come back to our whiteboard example. If we want to jump to minute 17 of a recorded teleconferencing session, we must be able to show the contents of the page active at that time, together with the annotations made by the speaker. If we did not restore the state of the whiteboard, the page (which might have been loaded originally at minute 12) would not be visible.
6.1 Recovering the Media State
The state of an interactive medium can be recovered from a recorded media stream. Note that the generic recorder is not able to interpret the media-specific part of the RTP packets and thus cannot directly compute the media state and send it to the receivers. But the recorder may re-send existing RTP packets that are stored within the recorded media stream. Thus, it is our goal to compose a sequence of recorded RTP packets containing states and events that put a receiver into the desired state. The task a recorder has to accomplish before starting a playback is to determine the appropriate sequence of recorded packets. In an interactive media application the current state is determined by an initial state
and a sequence of events applied to that state. In a discrete interactive medium, the event sequence is not bound to specific points in time. Thus, applying an event sequence to an initial state of a discrete interactive medium will always result in the same media state, independent of the speed at which the sequence is applied. In contrast, the event sequence of a continuous interactive medium is bound to specific points in time. A sequence of events applied to the state of a continuous interactive medium will leave the system in the correct state only if each event is applied at a well-defined instant in time. This main difference between discrete and continuous interactive media must be considered when computing the sequence of event and state packets that recovers the media state. For a discrete medium, such a sequence can be computed to recover the media state at any point in a recorded stream. In contrast, the media state of a continuous medium can only be recovered at points within a recording where a state is available. Events cannot be used for state recovery because they must be played in real time. Therefore, random access to a continuous interactive media stream will usually result in a position near the desired access point. The more often the state is stored within a stream, the finer the granularity at which the stream of a continuous interactive medium can be accessed. Interactive media applications usually send the media state only upon request by another application. Thus, the recorder must request the state at periodic intervals. These requests use low priority because a delayed or missing response only reduces the access granularity of the stream, which can be tolerated to some degree.
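The effect of the discrete/continuous distinction on random access can be shown with a small sketch. It assumes the recorder keeps a sorted list of the positions (in seconds) at which a complete state was recorded. `access_point` and `effective_start` are hypothetical helper names.

```python
from __future__ import annotations

from bisect import bisect_right


def access_point(state_times: list[float], target: float) -> float | None:
    """Position of the latest recorded state at or before target, if any."""
    i = bisect_right(state_times, target)
    return state_times[i - 1] if i else None


def effective_start(state_times: list[float], target: float, discrete: bool) -> float | None:
    """Position at which a receiver can actually be synchronized."""
    base = access_point(state_times, target)
    if base is None:
        return None  # no state recorded before target: access is not possible
    # Discrete media: events after the state can be applied at full speed,
    # so the exact target is reachable. Continuous media: events must be
    # replayed in real time, so playback starts at the recorded state.
    return target if discrete else base
```

Suppose states were recorded every 30 s, as assumed in the test data. Access to a continuous medium can then land almost 30 s before the requested position, and unanswered low-priority requests widen the gaps further.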
7 Mechanisms for Playback
The mechanisms presented in this section recover the media state from recorded media streams. Both mechanisms can be implemented entirely in the recorder. The receiving applications need not treat the recorder as a special kind of sender, nor does the recorder need to interpret media-specific data. All applications that use a payload based on the RTP profile for interactive media can be recorded and will be able to receive data from the recorder.
7.1 The Basic Mechanism
This simple mechanism is able to recover the media state from interactive media streams that do not use multiple sub-components. When starting playback of such a stream, the best case is that a state is contained in the recorded stream exactly at the position at which playback is to start. Then playback can begin immediately. In general, however, playback will be requested at a position where no complete state is directly available in the stream. Consider, for example, a recorded media stream that consists of the sequence S0, containing a state, three successive delta (Δ) states and several events (see Figure 4). If a user wants to start playback of the recording at position t_P, the recorder must reconstruct the state at t_P. A continuous interactive medium does not allow direct access to t_P: the recorded stream contains no state at t_P, so the recorder cannot determine the state at that position. However, access to position t_Δ3 within the stream is feasible, because t_Δ3 is the location of a delta state. The complete media state at t_Δ3 can be reconstructed from the state located at position t_S and the subsequent delta states up to position t_Δ3, which is the position of the last delta state before t_P. The events between t_S and t_Δ3 can be ignored, because all modifications to the state at t_S are reflected in the delta states. The packets that contain states can be sent at the maximum speed at which the recorder is able to send packets. If required by the medium, the internal media clock is part of the media state. Thus, after applying a state, the media clock of a receiver will reflect the time contained in the state. When the recorder finally reaches t_Δ3 (and has sent Δ3), fast playback must be stopped and playback at regular speed must be started. The start of regular playback must not be delayed, because events must be sent in real time relative to the last state. This is important because, for continuous interactive media, events are only valid for a specific state that may change with the passage of time. Altogether, the recorder will play back sequence S1 shown in Figure 4. For discrete interactive media, fast playback of events is possible. Therefore, random access to position t_P can be achieved by also sending the events between t_Δ3 and t_P at full speed. The resulting sequence S2 is also shown in Figure 4.
[Figure 4: time lines for sequence S0 (original), sequence S1 (continuous interactive media) and sequence S2 (discrete interactive media), with the access position t_P marked. Legend: complete state, Δ state, event.]
Fig. 4. Playback of a Recorded Sequence of States, Delta States and Events
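The computation of S1 and S2 can be sketched as follows. The sketch assumes the recorded stream is held as a time-ordered list of packets tagged "state", "delta" or "event" (the tag corresponds to the type field of the profile), with positions in seconds. `Packet` and `basic_playback` are hypothetical names. The function returns three values: the packets to send at full speed, the packets to send at regular speed, and the position at which regular playback begins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    time: float     # position in the recording (seconds)
    kind: str       # "state", "delta" or "event"
    label: str = ""


def basic_playback(packets: list[Packet], t_p: float, discrete: bool):
    """Split a recorded stream into (fast, regular, start) for access at t_p."""
    # Index of the last complete state at or before t_p.
    s = max((i for i, p in enumerate(packets)
             if p.kind == "state" and p.time <= t_p), default=None)
    if s is None:
        raise ValueError("no complete state before the access position")
    # Index of the last state or delta state at or before t_p (always >= s).
    a = max(i for i, p in enumerate(packets)
            if p.kind in ("state", "delta") and p.time <= t_p)
    # Fast part: the complete state and the deltas up to a; events in between
    # are skipped because the deltas already reflect their modifications.
    fast = [packets[s]] + [p for p in packets[s + 1:a + 1] if p.kind == "delta"]
    rest = packets[a + 1:]
    if not discrete:
        # Continuous media (S1): regular playback must begin right at t_a.
        return fast, rest, packets[a].time
    # Discrete media (S2): events between t_a and t_p may also be sent at full speed.
    fast += [p for p in rest if p.time < t_p]
    return fast, [p for p in rest if p.time >= t_p], t_p
```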
7.2 Mechanism with Support for Sub-components
A more sophisticated mechanism can exploit the existence of sub-components to reduce the amount of data required to recover the media state. Using sub-components, the state of an interactive medium can be recovered selectively by considering only those sub-components which are actually required to display the medium in its current state. Let us take a closer look at the shared whiteboard example of Section 6, where we wanted to access minute 17 of the recording of a teleteaching session. Without sub-components, the recorder would have to recover the complete media state valid at minute 17, which comprises all pages displayed so far. But if the shared whiteboard has divided its media state into several sub-components (e.g. one whiteboard page per sub-component), the recorder is able to determine the sub-components that are active at minute 17 and may recover them selectively. In general, when a recorded stream is accessed, the set of active sub-components at the access position can be determined and their state can be recovered. This is sufficient to display an interactive medium at the access position. However, it must be ensured that a receiver is able to display all subsequent data in the recorded stream. Suppose the subsequent data contain the re-activation of a passive sub-component (e.g. a jump to a previous page in the recording of a shared whiteboard session). A receiver would not hold the state of this sub-component, since passive sub-components were not recovered initially. Consequently, the receivers would not be able to decode data referring to that sub-component. Thus, the recorder must ensure that the state of a sub-component is present in the recorded stream at every position where a passive sub-component is re-activated. This can be accomplished at recording time if the recorder requests a state transmission for each sub-component that gets activated and inserts the retrieved state into the recording. For discrete media streams, this scheme can be optimized by not requesting the state of a sub-component if the recorder can reconstruct it from previous data within the recorded stream.
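Determining the active set at an access position can be sketched from the RTCP announcements of Section 4.2. The sketch assumes that each announcement carries the sending participant's complete current set of active sub-components. Under that assumption, the active set at a position is the union of every sender's most recent announcement. The tuple layout and the name `active_at` are hypothetical.

```python
from __future__ import annotations


def active_at(announcements: list[tuple[float, str, frozenset[str]]], t: float) -> set[str]:
    """Union of each sender's most recent announced active set at or before t.

    announcements: (time, sender, active_set) tuples, sorted by time.
    """
    latest: dict[str, frozenset[str]] = {}
    for time, sender, active in announcements:
        if time > t:
            break
        latest[sender] = active  # a newer report replaces the sender's previous one
    return set().union(*latest.values())
```

Every sub-component outside this set is passive at the access position and is not recovered. If such a sub-component is re-activated later, the states requested at recording time cover that case.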
[Figure 5: recorded streams of Sender 1 (states and events of s1 and s3) and Sender 2 (states and events of s2 and s4, later s2 and s5) on a common time axis, with t_P marked. Legend: state of a sub-component, event referring to a sub-component, set of active sub-components (e.g. s2, s4), activation of s5, deactivation of s4.]
Fig. 5. Playback of a recording containing sub-components. Greyed states and events of the recorded streams are filtered during the recovery of the state at t_P.
The example shown in Figure 5 depicts recorded streams of a continuous interactive medium with two senders. Sender 1 operated on sub-components s1 and s3, whereas the recorded stream of sender 2 contains packets for sub-components s2 and s4 and later for s2 and s5. If these recorded streams are accessed at position t_P, the recorder has to compute the list of sub-components that are active at t_P. In our case these are s1, s2 and s3. For each of these sub-components, the recorder must locate in the recorded stream the position of the most recent sub-component state before t_P. As a result, the recorder obtains the positions of the sub-component states t_s1, t_s2 and t_s3. (For the sake of simplicity, we have only considered states; support for Δ states can be achieved similarly to the basic mechanism.) s1 is the sub-component whose state is farthest from t_P (here t_s1 < t_s2 < t_s3). Thus the recorder has to start playback at position t_s1 and recovers the state of s1. The recorder must continue with the playback because events referring to s1 are located between t_s1 and t_P. Note that we are considering a continuous interactive medium, where all events must be played in real time. During the playback of the stream between t_s1 and t_P, two problems may occur. First, the stream may contain events which refer to states that have not yet been sent. The sending of these events must be suppressed because a receiver cannot interpret them correctly. In our example, events concerning s3 and s4 are filtered out. Second, the stream may contain sub-component states that are not in the set of active sub-components at t_P (s4 in our example) and thus are not needed for playback at t_P. Therefore, the state of s4 (and all events referring to s4) must also be filtered out. Summing up, the recorder will start playback at position t_s1, sending the state of sub-component s1 and the events referring to s1. All other states and events will be filtered out. The next required state is that of s2, which will be sent as soon as it shows up; after that, all subsequent events referring to s2 will also pass the filter. The same holds true for s3.
Finally, once the recorder has reached position t_P, the sub-components s1, s2 and s3 will have been recovered, and regular playback without any filtering may start. After the start of regular playback, the set of active sub-components is enlarged by s5. The state of a newly activated sub-component was inserted into the stream during recording, so the recorder can send the state of s5. Thus, all receivers are able to interpret subsequent events referring to s5.
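The filtering rules above can be condensed into a sketch for a continuous medium. The sketch makes three assumptions. First, the streams of all senders have been merged into one time-ordered list of state and event packets, each tagged with its sub-component ID. Second, Δ states are absent, as in the example. Third, the active set at t_P is already known. `Pkt` and `recover_subcomponents` are hypothetical names. The function returns the start position and the filtered packets to replay in real time up to t_P.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Pkt:
    time: float  # position in the recording (seconds)
    kind: str    # "state" or "event" (delta states omitted for simplicity)
    sub: str     # sub-component ID


def recover_subcomponents(packets: list[Pkt], active: set[str],
                          t_p: float) -> tuple[float, list[Pkt]]:
    """Return the start position and the filtered packets to replay up to t_p."""
    if not active:
        return t_p, []  # nothing to recover
    # Most recent state of every active sub-component at or before t_p.
    latest: dict[str, int] = {}
    for i, p in enumerate(packets):
        if p.time > t_p:
            break
        if p.kind == "state" and p.sub in active:
            latest[p.sub] = i
    missing = active - set(latest)
    if missing:
        raise ValueError(f"no recorded state for {sorted(missing)}")
    start = min(latest.values())  # the state farthest from t_p
    recovered: set[str] = set()
    replay: list[Pkt] = []
    for i in range(start, len(packets)):
        p = packets[i]
        if p.time > t_p:
            break  # from here on, regular playback runs without filtering
        if p.kind == "state" and latest.get(p.sub) == i:
            recovered.add(p.sub)  # the required state: send it
            replay.append(p)
        elif p.kind == "event" and p.sub in recovered:
            replay.append(p)  # event for a sub-component already recovered
        # Everything else is filtered: states and events of passive
        # sub-components, and events whose state has not been sent yet.
    return packets[start].time, replay
```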
8 Status of the Implementation
Our work on the recording of interactive media streams started with the implementation of a recorder for a shared whiteboard, the digital lecture board (dlb) [3]. The dlb recorder is based on the MBone VCR on Demand (MVoD) [4] service, which is capable of recording and playing back multiple RTP audio and video streams. The MVoD server ensures the synchronization of multiple media streams during playback, and a Java user interface enables remote control of the recorder. The shared whiteboard dlb uses a specific RTP payload format to transmit the dlb media. The dlb recorder module basically extends the MVoD with functionality for random access to recorded dlb media streams. To achieve random access, the dlb recorder uses a mechanism that is specific to the dlb media stream. Based on the experience gained from the dlb recorder, we are currently implementing the recording service for interactive media described in this paper and are now finishing a very early alpha version. Like the dlb recorder, the interactive media recorder is realized as an extension to the MVoD. Thus, the existing algorithms for the synchronization of multiple media streams as well as the recording facilities for audio and video streams can be reused. The implementation of the mechanisms for random access allows discrete and continuous interactive media streams as well as audio and video streams to be present within the same recording.
9 Conclusion
We have presented a generic recording service for the class of interactive media with real-time characteristics. Examples of such media are shared whiteboards, multi-user VRML worlds and distributed Java applications. In analogy to RTP video and audio recorders, we have developed a generic recording service that is based on an RTP profile for the interactive media class. The profile covers the common aspects of this media class. We have presented the basic ideas of the RTP profile, pointing out the features that enable the recording and playback of interactive media regardless of a specific media encoding. We have described the key concepts of the generic recording service. An important aspect of this recording service is that it enables random access to recorded streams: the media context of a recording is restored before playback is started. We have shown that the context of a medium can be recovered relying only on the RTP profile, and we have presented two recovery mechanisms. We are currently finishing the implementation of a first prototype of the described interactive media recording service. In future work we will implement the RTP payload-type specific functionality for distributed Java animations and multi-user VRML. Our recording service will then be tested and validated with those media types. Furthermore, we are working on a second generic service that will implement a late join algorithm. We expect the implementation and testing of the RTP profile, the payload types and the generic services to provide enough feedback for a full specification of the profile and the payload types. We intend to publish those specifications as Internet drafts.
Acknowledgments. This work is partially supported by the BMBF (Bundesministerium für Forschung und Technologie) with the "V3D2 Digital Library Initiative" and by the Siemens Telecollaboration Center, Saarbrücken.
References
[1] K. Almeroth, M. Ammar. The Interactive Multimedia Jukebox (IMJ): A New Paradigm for the On-Demand Delivery of Audio/Video. In: Proc. Seventh International World Wide Web Conference, Brisbane, Australia, April 1998.
[2] C. Bacher, R. Müller, T. Ottmann, M. Will. Authoring on the Fly. A new way of integrating telepresentation and courseware production. In: Proc. ICCE '97, Kuching, Sarawak, Malaysia, 1997.
[3] W. Geyer, W. Effelsberg. The Digital Lecture Board - A Teaching and Learning Tool for Remote Instruction in Higher Education. In: Proc. ED-MEDIA '98, Freiburg, Germany, AACE, June 1998. Available on CD-ROM, contact: http://www.aace.org/pubs/.
[4] W. Holfelder. Interactive Remote Recording and Playback of Multicast Videoconferences. In: Proc. IDMS '97, Darmstadt, pp. 450-463, LNCS 1309, Springer Verlag, Berlin, September 1997.
[5] W. Holfelder. Aufzeichnung und Wiedergabe von Internet-Videokonferenzen. Ph.D. Thesis (in German), LS Praktische Informatik IV, University of Mannheim, Shaker-Verlag, Aachen, Germany, 1998.
[6] V. Jacobson. A Portable, Public Domain Network "Whiteboard", Xerox PARC, Viewgraphs, April 1992.
[7] C. Kuhmunch, T. Fuhrmann, G. Schoppe. Java Teachware - The Java Remote Control Tool and its Applications. In: Proc. of ED-MEDIA '98, Freiburg, Germany, AACE, June 1998. Available on CD-ROM, contact: http://www.aace.org/pubs/.
[8] K. Maly, C. M. Overstreet, A. Gonzalez, M. Denbar, R. Cutaran, N. Karunaratne. Automated Content Synthesis for Interactive Remote Instruction. In: Proc. of ED-MEDIA '98, Freiburg, Germany, AACE, June 1998. Available on CD-ROM, contact: http://www.aace.org/pubs/.
[9] M. Mauve. Transparent Access to and Encoding of VRML State Information. In: Proc. of VRML '99, Paderborn, Germany, pp. 29-38, 1999.
[10] M. Mauve, V. Hilt, C. Kuhmunch, W. Effelsberg. A General Framework and Communication Protocol for the Transmission of Interactive Media with Real-Time Characteristics. In: Proc. of IEEE ICMCS '99, Florence, Italy, 1999.
[11] S. McCanne et al. Toward a Common Infrastructure for Multimedia-Networking Middleware. In: Proc. of NOSSDAV '97, St. Louis, Missouri, 1997.
[12] S. McCanne, R. Katz, E. Brewer et al. MASH Archive System. On-line: http://mash.CS.Berkeley.edu/mash/overview.html, 1998.
[13] P. Parnes, K. Synnes, D. Schefström. mMOD: the multicast Media-on-Demand system. On-line: http://mates.cdt.luth.se/software/mMOD/paper/mMOD.ps, 1997.
[14] H. Schulzrinne. RTP Profile for Audio and Video Conferences with Minimal Control. Internet Draft, Audio/Video Transport Working Group, IETF, draft-ietf-avt-profile-new-05.txt, March 1999.
[15] H. Schulzrinne. RTP Tools. Software available on-line, http://www.cs.columbia.edu/~hgs/rtp/rtptools/, 1996.
[16] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson. RTP: A Transport Protocol for Real-Time Applications. Internet Draft, Audio/Video Transport Working Group, IETF, draft-ietf-avt-rtp-new-03.txt, March 1999.
[17] H. Schulzrinne, A. Rao, R. Lanphier. Real Time Streaming Protocol (RTSP). Request for Comments 2326, Multiparty Multimedia Session Control Working Group, IETF, April 1998.
[18] T. Tung. MediaBoard: A Shared Whiteboard Application for the MBone. Master's Thesis, Computer Science Division (EECS), University of California, Berkeley, 1998. On-line: http://www-mash.cs.berkeley.edu/dist/mash/papers/tecklee-masters.ps.
A Framework for High Quality/Low Cost Conferencing Systems
Mirko Benz, Robert Hess, Tino Hutschenreuther, Sascha Kümmel, and Alexander Schill
Department of Computer Science, Dresden University of Technology, D-01062 Dresden, Germany
{benz, hess, tino, kuemmel, schill}@ibdr.inf.tu-dresden.de
Abstract. This paper presents a framework for the development of advanced video conferencing systems with very high quality. The design and performance of the developed components, such as session management, a real-time scheduler and a specific transport system, are outlined. The core of the framework consists of a toolkit for the processing of audio and video data. It integrates capturing, compression, transmission and presentation of media streams into an object-oriented class library. The specifics of audio/video hardware, preferred operation modes, compression algorithms and transport protocol properties are encapsulated. Due to the achieved abstraction level, complex application software can easily be developed, resulting in very short design cycles. Unlike existing systems, our approach regards the developed application in its entirety. The focus on the highest possible performance throughout all components is unique to our approach. Due to a number of optimisations, specific compression hardware is not required, which leads to low overall system costs.
1 Introduction
Current conferencing systems based on pure software compression of video streams often offer a presentation quality that is not acceptable for business communication. The common approach to tackling these problems is dedicated hardware. However, hardware solutions have a number of drawbacks. First, usually only one compression algorithm is implemented. This often conflicts with the requirement of concurrent support for ISDN-based (H.261) and IP-based (H.263) conferencing. Second, it is difficult to keep pace with ongoing development, as can be seen in the H.263+ area. Furthermore, compression hardware incurs additional, relatively high costs and has to be installed on each computer, which makes this strategy scale poorly. As a consequence, compression algorithms and methods were improved to make software compression feasible. However, many research projects concentrate on the optimisation of this process alone. Moreover, the frequent requirement of portability restricts the use of processor extensions, which can lead to impressive gains. In contrast, our approach targets standard desktop hardware and operating systems within a corporate networking environment. The conferencing system is designed in an integrated manner. This means that optimisations are not restricted to several independent components like the compression or transmission of video streams. Instead, we use layer-overlapping techniques and integrate several processing steps. This leads to highly efficient cache usage and reduces memory accesses. Moreover, Intel MMX processor extensions were used to speed up several computations. The installation of modern high-speed communication systems opened the way to a novel distributed compression approach as introduced in [10]. In this scenario, dedicated servers can be installed for the demanding compression effort, while desktop systems perform only a lightweight preparation of the video processing. Using compression servers usually does not reduce the flexibility of the whole system, since modifying the compression process only requires replacing software components at the client side. Hence, scalability and moderate costs can be achieved. For the business application of conferencing systems, security issues are even more important than improved presentation quality and low costs. In this environment, encryption of voice data is often required. Furthermore, misuse of application sharing must be made impossible. Due to the achieved performance gains, this is now possible without reducing video quality. In addition, conferencing systems should support more than two participants and integrate quality of service management for real-time media streams, as designed and implemented here. All these goals were considered in the design and implementation of the framework. It consists of session management, application sharing and a specific transport system. Furthermore, it integrates a toolkit for the processing of audio and video data.
Capturing, compression, transmission and presentation of media streams are supported via an object oriented class library. The specifics of audio/video hardware, preferred operation modes (e.g. colour models), compression algorithms and transport protocol properties are encapsulated. Resources for object-chains for realtime processing of media streams can be reserved by an integrated scheduler. As a result, high performance conferencing systems can be developed in a relatively short time without requiring thorough knowledge of implementation details like the used compression techniques or multimedia hardware specifics. The next section outlines our current conferencing system architecture. It includes the following components: session management, application sharing, transport system, soft real-time scheduling and VTToolkit - an object-oriented class library for multimedia data handling. Chapter 3 presents more details concerning the core functionality of VTToolkit. Furthermore, it highlights the general architecture, object structures and interfaces. The components for the audio/video manipulation will be discussed as well. Afterwards, the applied optimisation methods and the achieved performance results will be discussed.
2 Conferencing System Architecture
In different research projects we developed special architectures, mechanisms and protocols that are integrated to form a framework for the development of high performance/low cost conferencing systems (figure 1). For example, compression algorithms were designed and optimised by our group as outlined in [9]. Furthermore, specific transport protocols as well as quality of service management [18] and scheduling mechanisms were developed to support real-time multimedia applications.
[Figure 1: framework overview with Conference Server and Session Management; Conferencing Client, File and Compression Servers and Gateways; VTToolkit Components; QoS Provision and Realtime Scheduling.]
H.26x, MJPEG and MPEG for x86/MMX and Alpha,
• "real-time" aware transport system for a broad spectrum of networks and protocols -> IP/RTP, H.320 and native transport over N-ISDN, (Fast, Gigabit) Ethernet, ATM and wireless LANs,
• configurable and integrated control component and
• enforcement of soft real-time with standard operating systems on end systems -> Windows NT.
The development process quickly revealed numerous technical influences arising from the many ways single components can be combined on end systems and networks. As a consequence, this complexity can hardly be handled without systematic design support. This led to the parallel development of tools that simplify the development and evaluation of VTToolkit's components and support application development based on VTToolkit.
3.2 Basic Architecture of VTToolkit
Two main aspects were important for the design and development of VTToolkit. First, we concentrated on the process of video transmission and the involved components. Second, we wanted to make the interfaces and structure of the objects as simple and uniform as possible to allow an abstraction of the specific operations and interactions. Therefore, the objects that make up a desired application are regarded as a streaming chain of objects, as shown in figure 2. All streaming objects except the Splitter Object possess exactly one input and one output channel. Depending on its position in a chain, an object is either a terminating object (local source or sink) or an intermediate object. Terminating objects may, for example, be input components like a Video Grabber, which are located at the beginning of a streaming chain. Such objects do not have a logical input stream but a physical video signal input. A Streaming Object consumes an input stream, applies processing according to its functionality and configuration (transformation function) and finally produces an output stream. The interactions of the single objects follow a consumer-producer model.
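The consumer-producer chain can be illustrated with a small Python analogue. VTToolkit itself is a class library built on `CConsumerProducer`. The classes `StreamingObject`, `Splitter` and `Sink` below are hypothetical simplifications that show push-style data streaming only: there is no pull mode and no control or QoS interface, and frames are plain strings.

```python
from __future__ import annotations

from typing import Callable, Optional


class StreamingObject:
    """One input, one output: consume a frame, transform it, produce a frame."""

    def __init__(self, name: str, transform: Optional[Callable[[str], str]] = None):
        self.name = name
        self.transform = transform or (lambda frame: frame)
        self.consumer: Optional[StreamingObject] = None

    def connect(self, consumer: StreamingObject) -> StreamingObject:
        self.consumer = consumer
        return consumer  # allows a.connect(b).connect(c)

    def push(self, frame: str) -> None:
        self.emit(self.transform(frame))

    def emit(self, frame: str) -> None:
        if self.consumer is not None:
            self.consumer.push(frame)


class Splitter(StreamingObject):
    """The only object type with more than one output channel."""

    def __init__(self, name: str):
        super().__init__(name)
        self.branches: list[StreamingObject] = []

    def connect(self, consumer: StreamingObject) -> StreamingObject:
        self.branches.append(consumer)
        return consumer

    def emit(self, frame: str) -> None:
        for branch in self.branches:
            branch.push(frame)


class Sink(StreamingObject):
    """Terminating object at the end of a chain, e.g. a presentation window."""

    def __init__(self, name: str):
        super().__init__(name)
        self.received: list[str] = []

    def push(self, frame: str) -> None:
        self.received.append(frame)


# Usage: grabber -> splitter -> {local presentation, compressor -> transmitter}
grabber = StreamingObject("FrameGrabber")
splitter = grabber.connect(Splitter("Splitter"))
local = splitter.connect(Sink("Presentation"))
remote = Sink("Transmitter")
splitter.connect(StreamingObject("Compressor", lambda f: f"compressed({f})")).connect(remote)
grabber.push("frame0")
```

Because every object exposes the same `connect`/`push` pair, stages can be recombined freely, which mirrors the uniform-interface goal stated above.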
[Figure 2 labels: Frame Grabber, Splitter, Presentation (Sink 1, Sink 2), Compressor and Decompressor (Macroblock, H263-Inter), Transmitter, Receiver, Compression service, LAN, WAN.]
Fig. 2. Sample Configuration of a Streaming Chain
All VTToolkit objects are derived from the virtual class CConsumerProducer. It defines the standard interface that every object has to implement (figure 3). These methods can be grouped into three sections:
• Data Streaming Interface: This interface is used to exchange video data. Buffers can be transferred via a push or a pull mechanism, which can differ between the IN and OUT interfaces. Buffer Objects interconnect the components and hide the specific transfer semantics. All required configuration is performed dynamically at run time.
• Generic Control Interface: Methods to control video streams, like start, stop or pause, are collected here. Furthermore, the IN and OUT interfaces can be configured.
• QoS Control Interface: Generic methods to control and adjust runtime behaviour are defined here. They make it possible to obtain current performance data as well as to negotiate and restrict resource usage.
Besides these interfaces, every object may define a Private Control Interface to configure a specific parameter set like quality, frames per second or bandwidth on the fly. Streaming Objects encapsulate one or more devices. These may be real hardware devices such as video grabber boards, but also software "devices" like different compression codecs. The devices that an object supports and the associated input/output formats may be queried and configured via the Generic Control Interface. A major optimisation approach is the avoidance of copy operations [7]. VTToolkit does not dictate whether the producer, the consumer or neither of them prepares a certain buffer. Hence, we introduced a buffer transfer technique that allows all possible combinations and, due to its in-band signalling, makes incorrect usage almost impossible. Another aspect in the design of the buffer transfer strategy was to find a compromise between optimisation and simple implementation. This was accomplished by delivering video buffers by reference between interfaces. All other information characterising the stream and the specific buffer is handed over by value. This avoids copying large amounts of data and also significantly simplifies the management of control information.
Fig. 3. Object Structure and Interactions
The described simple but well-defined base architecture made it possible to accomplish many design goals:
• All objects are simple to model, since they basically represent a Streaming Object and all exchanged information covers a single video frame as a common processing unit.
• Due to the common interfaces, all objects may be combined in almost any way, since their sequence is not determined in advance.
• Base objects can be automatically arranged and configured without adapting the specific control objects for every newly designed component.
• Reuse of single objects as well as of the common object structure is possible and already practised. This eases the development of new objects and additional devices.
• The design mandates the integration of a QoS Control Interface. Hence, differentiated performance monitoring is supported.
A number of optimisation techniques have been applied. Nevertheless, adaptation layers such as the Buffer Objects keep the implementation simple and clear.
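The interface structure described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not VTToolkit's C++ API. The names StreamingObject, FrameInfo, connect, push, process and statistics are hypothetical. Only the push mechanism is shown, and Buffer Objects are reduced to direct calls. The sketch shows the design decision discussed above: the pixel buffer travels by reference through the chain, while the per-frame control data is an immutable value.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FrameInfo:
    """Per-frame control data (hypothetical fields). Immutable, so it acts like a copied value."""
    width: int
    height: int
    colour_space: str
    sequence: int


class StreamingObject:
    """Hypothetical Python stand-in for an object derived from CConsumerProducer."""

    def __init__(self) -> None:
        self.downstream: Optional["StreamingObject"] = None
        self.running = False
        self.frames_handled = 0

    # Generic Control Interface: wiring and start-up.
    def connect(self, downstream: "StreamingObject") -> "StreamingObject":
        self.downstream = downstream
        return downstream  # returning the next object allows a.connect(b).connect(c)

    def start(self) -> None:
        self.running = True
        if self.downstream is not None:
            self.downstream.start()

    # QoS Control Interface: expose simple performance data.
    def statistics(self) -> dict:
        return {"frames": self.frames_handled}

    # Data Streaming Interface (push mode only in this sketch).
    def push(self, buf: bytearray, info: FrameInfo) -> None:
        if not self.running:
            return
        self.frames_handled += 1
        buf, info = self.process(buf, info)
        if self.downstream is not None:
            # The pixel buffer is handed on by reference, never copied.
            self.downstream.push(buf, info)

    def process(self, buf: bytearray, info: FrameInfo) -> Tuple[bytearray, FrameInfo]:
        return buf, info  # default: pass the frame through unchanged


class Brighten(StreamingObject):
    """Example filter that modifies the shared buffer in place."""

    def process(self, buf, info):
        for i, value in enumerate(buf):
            buf[i] = min(255, value + 10)
        return buf, info


class Sink(StreamingObject):
    """Records what arrives, standing in for a presentation object."""

    def __init__(self) -> None:
        super().__init__()
        self.received: List[Tuple[bytearray, FrameInfo]] = []

    def process(self, buf, info):
        self.received.append((buf, info))
        return buf, info


if __name__ == "__main__":
    source, sink = StreamingObject(), Sink()
    source.connect(Brighten()).connect(sink)
    source.start()
    frame = bytearray([0, 100, 250])
    source.push(frame, FrameInfo(352, 288, "I420", 0))
    print(sink.received[0][0] is frame, list(frame))
```

Running the example prints `True [10, 110, 255]`. The sink receives the very same buffer object that the source pushed, and the filter changed it without making a copy.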
4 Applied Optimisation Methods and Performance Results
This section discusses design decisions, measurements, performance tuning and the optimisations applied. Since the efficiency of the complete application is of primary concern, layer integration as well as the necessary compromises will be outlined. Our performance-oriented system design approach is illustrated with the video capture, compression and presentation components of VTToolkit.
4.1 Framegrabber and Videoview
Software compression is usually considered quite expensive in terms of required CPU capacity. On closer examination, however, it becomes evident that grabbing the frames and displaying them, which are involved in every live video transmission, are quite time-consuming in themselves. Thus, the optimisation of these components deserves close attention as well. Achieving maximum performance requires taking a multitude of different hardware and device driver features into account, which is usually not done in research projects. Our approach uses a fallback strategy in which each component provides a list of possible modes, ordered by performance for the specific hardware and device driver. These parameters are mainly colour space and colour depth, with an additional parameter for the frame grabber operation mode. If the required conditions are not met (e.g. no MMX support), a less optimised version is used. The display component is based on Microsoft's DirectDraw API. It avoids all unnecessary copy operations and puts the decompressed image directly into the frame buffer of the graphics adapter. Another version even integrates the decompressor with the output object, so that it can decompress directly into video memory. By side-stepping the strict separation into objects in this way, even higher optimisation can be achieved. This is only one example of the use of integrated layer processing (ILP) according to [8]. Since one copy operation over the whole image is avoided, processing can be sped up significantly. Nevertheless, the interface of the presentation component can communicate with all other objects in the normal fashion, so it can also be used in configurations where other decompression objects are needed. In the same manner, a fallback to conventional GDI output is possible whenever a display driver is encountered that does not support the DirectDraw API.
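The fallback strategy amounts to walking a performance-ordered list and taking the first mode that the CPU and driver can handle. The sketch below is hypothetical in two respects: the mode names are invented, and the MMX check is a simple flag. How VTToolkit actually probes drivers is not described in the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Tuple


@dataclass(frozen=True)
class Mode:
    """One capture/display mode (hypothetical fields)."""
    colour_space: str
    depth: int               # bits per pixel
    needs_mmx: bool = False  # True if this mode relies on an MMX-optimised routine


def select_mode(preferred: Sequence[Mode], driver_formats: Set[Tuple[str, int]],
                has_mmx: bool) -> Optional[Mode]:
    """Return the first mode, in performance order, that can actually be used."""
    for mode in preferred:
        if mode.needs_mmx and not has_mmx:
            continue  # optimised routine unavailable on this CPU: fall back
        if (mode.colour_space, mode.depth) in driver_formats:
            return mode
    return None  # no usable mode; the caller must report an error


# Fastest first, as a component might advertise it (hypothetical list).
PREFERRED = [Mode("YUV12", 12, needs_mmx=True), Mode("YUY2", 16, needs_mmx=True),
             Mode("RGB24", 24)]

if __name__ == "__main__":
    driver = {("YUY2", 16), ("RGB24", 24)}
    print(select_mode(PREFERRED, driver, has_mmx=True))
    print(select_mode(PREFERRED, driver, has_mmx=False))
```

The same pattern would also cover the fallback from DirectDraw to GDI output.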
4.2 Colour Space Conversions
Codecs usually prefer planar YUV colour space formats as input. However, some capture boards do not support these formats. Furthermore, the presentation requires specific formats, so packed YUV or RGB data have to be produced. As a consequence, colour space conversions are often required. However, since copy operations cannot be avoided entirely, the conversion can often be integrated into them, so that the added overhead is almost completely hidden. This is especially the case if optimised MMX conversions are used.
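The sketch below illustrates a colour space conversion folded into a copy. It converts packed YUY2 (4:2:2) into planar I420 (4:2:0) in a single pass over the source buffer. The text does not name the formats VTToolkit converts between, so YUY2 and I420 are assumed here as common examples. Averaging vertically adjacent chroma samples is only one possible choice, and this plain Python version stands in for an MMX implementation.

```python
def yuy2_to_i420(src: bytes, width: int, height: int) -> bytes:
    """Convert packed YUY2 (Y0 U Y1 V per pixel pair) to planar I420 (Y, then U, then V)."""
    if width % 2 or height % 2:
        raise ValueError("width and height must be even")
    stride = width * 2  # two bytes per pixel in YUY2
    if len(src) != stride * height:
        raise ValueError("buffer size does not match dimensions")
    cw, ch = width // 2, height // 2
    y_plane = bytearray(width * height)
    u_plane = bytearray(cw * ch)
    v_plane = bytearray(cw * ch)
    for row in range(height):
        line = src[row * stride:(row + 1) * stride]
        y_plane[row * width:(row + 1) * width] = line[0::2]  # every even byte is luma
    for crow in range(ch):
        top = src[2 * crow * stride:(2 * crow + 1) * stride]
        bottom = src[(2 * crow + 1) * stride:(2 * crow + 2) * stride]
        for cx in range(cw):
            base = 4 * cx
            # 4:2:2 -> 4:2:0: average the chroma of two vertically adjacent rows (rounded)
            u_plane[crow * cw + cx] = (top[base + 1] + bottom[base + 1] + 1) // 2
            v_plane[crow * cw + cx] = (top[base + 3] + bottom[base + 3] + 1) // 2
    return bytes(y_plane + u_plane + v_plane)
```

Because the output planes are written while the captured frame is read, no separate copy pass is needed. This is the sense in which the conversion overhead can be hidden inside an unavoidable copy.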
4.3 Video Compression Codec Design
Our evaluation is based mainly on an H.263 codec that we implemented and tuned especially for the investigation of live compression and high-quality video streams. Prior to this, first experiments were made with an H.261 codec, which are described in [9]. To improve flexibility, our compression object supports not only uncompressed video data as input but also intermediate formats of partly compressed video data. Within this environment, our compression process makes the following assumptions, with corresponding design constraints:
• The target video resolution is CIF (352x288) at approximately 25 fps with relatively high image quality.
• The codecs are highly specialised for the conferencing case, that is, half-length portraits and very moderate motion. This allows us to work with motion detection schemes only, since we found that in this case, at 25 fps, the motion rarely exceeds 1 pixel. So there is no real need for motion estimation, as there would be for fast-moving objects.
• We considered primarily LAN and broadband WAN connections, so the size of the resulting data streams is not of main concern. This assumption is also common in other systems such as vic [13] and NetMeeting when aiming for good video quality.
• With respect to network transport, the size of the handled data units and the possibilities for fast and simple error correction schemes must be evaluated.
Based on these constraints, the processing of all stages from capturing/compression to decompression/display was highly integrated. Ideas from concepts such as integrated layer processing (ILP) [8] and application level framing (ALF) [1] were applied where appropriate. Due to reduced copy operations, speed improved significantly. The coding makes maximum use of available hardware features, for example the MMX processor extensions for motion detection, DCT and bit coding. These optimisations yielded a tremendous speedup.
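Because motion is assumed to be small, the codec only has to decide whether each macroblock changed, instead of searching for motion vectors. The sketch below makes several assumptions. It uses a sum-of-absolute-differences (SAD) criterion on luma and treats the threshold as a per-macroblock SAD limit. This is a hypothetical interpretation, since the text does not define the scale of the thresholds 0, 20 and 50 used in Table 1. The threshold-zero shortcut mirrors the behaviour described in section 4.4.

```python
from typing import List, Sequence

MB = 16  # macroblock size in luma pixels


def changed_macroblocks(cur: Sequence[int], ref: Sequence[int], width: int, height: int,
                        threshold: int) -> List[List[bool]]:
    """Flag each 16x16 luma macroblock whose SAD against the reference exceeds threshold."""
    rows, cols = height // MB, width // MB
    if threshold == 0:
        # No detection at all: every macroblock is treated as changed and coded.
        return [[True] * cols for _ in range(rows)]
    flags = []
    for my in range(rows):
        row_flags = []
        for mx in range(cols):
            sad = 0
            for y in range(my * MB, (my + 1) * MB):
                base = y * width + mx * MB
                for x in range(base, base + MB):
                    sad += abs(cur[x] - ref[x])
            row_flags.append(sad > threshold)
        flags.append(row_flags)
    return flags
```

Unchanged macroblocks would then be signalled with a single bit, as in the H.263 inter coding scheme of Fig. 4.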
For fast compression, ideas derived from several publicly available implementations were adopted. In this process, the complexity of specific processing stages was weighed against the improvement in compression ratio they provide. Another aspect was the introduction of fixed-point operations for the DCT and an optimised algorithm for the adaptation of smaller quantisation step sizes. As discussed in [9] and [10], these optimisations led to codecs that are significantly faster than comparable versions from commercial vendors.
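The fixed-point DCT idea can be illustrated with an 8-point DCT-II whose cosine coefficients are pre-scaled to integers. The inner loop then uses only integer multiply-accumulate and one rounding shift. The sketch assumes a direct-form transform with 13 fractional bits. The paper does not state the factorisation or precision it used, and real codecs normally use a fast factorisation rather than this O(N^2) form. With 13 fractional bits every coefficient stays below 2^12 in magnitude, so it fits a 16-bit word.

```python
import math
from typing import List, Sequence

N = 8
FRAC_BITS = 13  # Q13 fixed-point coefficients (assumed precision)


def _coeff(k: int, n: int) -> float:
    """Orthonormal DCT-II basis value for output k and input n."""
    scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))


# Coefficient table computed once in floating point and rounded to integers.
COEFF_Q = [[round(_coeff(k, n) * (1 << FRAC_BITS)) for n in range(N)] for k in range(N)]


def dct8_float(x: Sequence[float]) -> List[float]:
    """Reference floating-point 8-point DCT-II."""
    return [sum(_coeff(k, n) * x[n] for n in range(N)) for k in range(N)]


def dct8_fixed(x: Sequence[int]) -> List[int]:
    """Integer-only 8-point DCT-II: multiply-accumulate, add half an LSB, shift back."""
    half = 1 << (FRAC_BITS - 1)
    return [(sum(COEFF_Q[k][n] * x[n] for n in range(N)) + half) >> FRAC_BITS
            for k in range(N)]
```

For 8-bit input samples, the fixed-point result stays within one unit of the floating-point transform. This is the kind of accuracy trade-off that section 4.4 later identifies as critical for the inter codec.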
4.4 Compression Codec Performance
This section discusses the applied compression codec optimisations and the resulting performance improvements. Figure 4 shows the block scheme of an H.263 inter codec [11]. The individual steps require different processing resources. The three most expensive parts are the DCT/IDCT, the half-pixel search and the VLC (Variable Length Coding). We started our development from the Telenor implementation. Our first step was to optimise its internal structure. Furthermore, we investigated which coding features consume too much time relative to their influence on the compression ratio. Based on this investigation, we changed the behaviour of some codec components. One optimisation was the restriction of the half-pixel search to the edges of the macroblock, which means that only a quarter of the search step must be performed. In most cases the influence on the compression ratio was less than 5%.
[Figure: H.263 inter coding scheme. The video stream passes through motion detection (for unchanged macroblocks exactly one bit is coded), full-pixel motion estimation, intra/inter decision, half-pixel search and prediction, DCT and quantisation, dequantisation and IDCT, run-length coding and macroblock coding, producing the H.263 bit stream.]
Fig. 4. H.263 inter coding scheme
After these basic steps, we replaced some parts of the codec with MMX code for modern Intel Pentium CPUs. We started with the DCT and IDCT, which had required more than 70% of the coding time. After this replacement, the DCT consumes only 30% of the total time. This causes a massive shift in the share of time used by each part of the codec. We were amazed to find that the VLC coding had now become the most expensive part. Therefore we replaced this part, too, with highly optimised MMX/assembler code, which led to an impressive reduction of the overall coding time. After this step, VLC and DCT consume a similar share of the computing power (~35%). Table 1 shows the overall results of our optimisation. The table is divided into three blocks for different codec types. The macroblock encoder (MB encoder) performs only a motion detection algorithm as defined in H.263. No further steps, such as DCT and VLC, are included. We use this specific codec to support distributed compression schemes [10]. The next type is the so-called Intra Codec. This codec is similar to the H.261 codec and uses neither motion compensation algorithms nor the calculation of difference pictures (P- and B-frames). The last type, the Inter Codec, provides the full functionality of the H.263 standard except the coding of P-frames.
Table 1. Compression codec performance measured on INTEL Celeron/400 and Pentium 166 MMX machines (times in ms; "w/o MMX" denotes the version without MMX/assembler optimisations)
| Codec | Threshold | Quality [%] | Celeron [ms] | Celeron w/o MMX [ms] | P166 [ms] | P166 w/o MMX [ms] |
| MB encoder | 0 | n/a | 1.39 | 2.12 | 5.63 | 6.44 |
| MB encoder | 20 | n/a | 1.18 | 1.76 | 4.48 | 5.24 |
| MB encoder | 50 | n/a | 0.97 | 1.55 | 3.77 | 4.30 |
| H.263 Intra | 0 | 80 | 7.00 | 14.38 | 18.98 | 53.18 |
| H.263 Intra | 20 | 80 | 7.82 | 14.89 | 19.27 | 49.53 |
| H.263 Intra | 50 | 80 | 6.75 | 12.97 | 16.08 | 40.95 |
| H.263 Intra | 0 | 93 | 8.49 | 17.37 | 21.99 | 60.11 |
| H.263 Intra | 20 | 93 | 9.30 | 17.76 | 21.78 | 55.74 |
| H.263 Intra | 50 | 93 | 8.07 | 15.48 | 18.17 | 46.26 |
| H.263 Intra | 0 | 94 | 7.61 | 19.63 | 20.03 | 65.42 |
| H.263 Intra | 20 | 94 | 8.35 | 20.01 | 20.15 | 60.65 |
| H.263 Intra | 50 | 94 | 7.26 | 17.29 | 16.91 | 50.22 |
| H.263 Inter | 0 | 80 | 17.52 | 22.57 | 78.29 | 96.52 |
| H.263 Inter | 20 | 80 | 18.57 | 23.44 | 77.28 | 96.83 |
| H.263 Inter | 50 | 80 | 12.32 | 14.94 | 57.10 | 72.09 |
| H.263 Inter | 0 | 93 | 22.30 | 26.40 | 80.20 | 100.67 |
| H.263 Inter | 20 | 93 | 21.49 | 24.89 | 76.76 | 95.58 |
| H.263 Inter | 50 | 93 | 13.03 | 15.94 | 53.11 | 68.72 |
| H.263 Inter | 0 | 94 | 24.42 | 28.81 | 83.96 | 105.37 |
| H.263 Inter | 20 | 94 | 21.67 | 24.45 | 75.56 | 93.25 |
| H.263 Inter | 50 | 94 | 13.34 | 16.58 | 53.45 | 69.37 |
For the measurements, two different systems were used: one with an Intel Celeron at 400 MHz and one with an older Intel Pentium-MMX at 200 MHz. Both systems had 64 Mbytes of RAM. The same video sequence was used for every measurement. It has a resolution of 352x288 at 25 frames per second. Its content is a typical video conferencing scenario with some slow and some fast motion parts. The table shows the results with and without MMX/assembler optimisations. Due to the extensive use of optimised coding techniques, integrated layer processing and MMX code, performance gains of up to 227% were achieved. However, the possible improvements differ from case to case. For instance, the macroblock encoder (MB encoder) shows very different results depending on the platform used. The reason is the different memory bandwidth of the two systems. The MB encoder is derived from the H.263 coding process and consists only of a motion detection algorithm; no further compression is performed by this codec. This means that once the (MMX-based) coder detects motion in a macroblock, the block is copied to the output buffer. Therefore, a large amount of data must be transferred, and data movement makes up the biggest part of this codec's workload. The Celeron processor has a much better second-level cache architecture than the standard Pentium processor. Hence the Celeron benefits more from the MMX-based coding of the motion detection algorithm. Another point is the difference between the results of the H.263 intra codecs with a motion threshold of zero and of twenty. Of course, the coder running with a motion threshold of zero produces more data and needs more time for DCT and run-length coding. One would therefore expect it to need more processing time, not less. In our implementation, however, the whole motion detection process is skipped if a motion threshold of zero is configured.
This leads to a reduction of coding time that outweighs the additional cost of the DCT and run-length processing. As the table shows, the performance gains of the optimised coding with the H.263 intra codec are higher on the P166. The reason is the different cache and bus architecture: the optimised code significantly reduces non-linear memory access. This is achieved by replacing and integrating table lookups through integrated layer processing. To this end, we used MMX code and merged several separate processing steps of the H.263 standard. The performance gains shown for the H.263 inter codec are considerably lower than those of the intra version, because the optimisation of this codec is not finished yet. The Inter codec compresses the video stream much better than the Intra codec, because only the differences between two consecutive pictures are coded. This process requires DCT and IDCT processing. Unfortunately, the MMX-based DCT used is not accurate enough. For simple Intra coding, where the whole DCT results are transferred, this does not matter. In contrast, using the same DCT in an Inter codec leads to a larger output data stream (10 to 20% more). We therefore decided not to use an MMX-based DCT and IDCT for the Inter coder for now. We are currently working on a better DCT implementation and hope that it will lead to performance gains similar to those achieved with the Intra codec.
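The percentages in this section can be checked with a little arithmetic. The first function works from the reported time shares: DCT at more than 70% of the time before the MMX replacement (taken here as 70%) and at 30% after it. From these it derives how much faster the DCT itself must have become, assuming the other stages kept their absolute cost. That assumption is ours, not the paper's. The last function reads "performance gain" as t_without / t_with - 1. Applied to the P166 intra rows of Table 1, this reading reproduces the 227% quoted in the text.

```python
def implied_speedup(share_before: float, share_after: float) -> float:
    """Speed-up of one stage, derived from its share of total time before and after
    optimising, assuming every other stage kept its absolute cost."""
    rest = 1.0 - share_before               # absolute cost of the other stages (old total = 1)
    new_total = rest / (1.0 - share_after)  # those stages now make up 1 - share_after
    return share_before / (share_after * new_total)


def overall_speedup(share_before: float, share_after: float) -> float:
    """Speed-up of the whole coder under the same assumption (old total = 1)."""
    rest = 1.0 - share_before
    return 1.0 / (rest / (1.0 - share_after))


def gain_percent(t_without: float, t_with: float) -> float:
    """Performance gain read as (t_without / t_with - 1) * 100."""
    return (t_without / t_with - 1.0) * 100.0


# (P166 with MMX, P166 without MMX) coding times in ms, H.263 Intra rows of Table 1
INTRA_P166 = [(18.98, 53.18), (19.27, 49.53), (16.08, 40.95),
              (21.99, 60.11), (21.78, 55.74), (18.17, 46.26),
              (20.03, 65.42), (20.15, 60.65), (16.91, 50.22)]

if __name__ == "__main__":
    print(f"DCT speed-up   ~{implied_speedup(0.70, 0.30):.1f}x")
    print(f"overall        ~{overall_speedup(0.70, 0.30):.1f}x")
    best = max(gain_percent(without, with_mmx) for with_mmx, without in INTRA_P166)
    print(f"largest gain   ~{best:.0f}%")
```

Under these assumptions the DCT became roughly 5.4 times faster and the whole coder roughly 2.3 times faster. The largest intra gain on the P166, 65.42 ms versus 20.03 ms, corresponds to about 227%.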
4.5 Related Work
An approach related to VTToolkit is the CINEMA project [3]. It focuses on the configuration of distributed multimedia applications and addresses the enforcement of synchronisation and quality of service specifications in general. In contrast, this paper concentrates on the configuration of video transmission systems, with performance aspects as the focal point. Another related system is Rendez-Vous, a successor of IVS, developed by the Rodeo group at INRIA. It includes RTP protocol support over multicast or unicast IP, the H.261 video standard, high-quality PCMU, ADPCM and VADPCM (with hi-fi support) as well as low-bandwidth GSM and LPC audio coding. Additionally, platform independence is of primary concern. Particularly interesting is the integrated scheduler for multilayer video and audio flow management and processing in Rendez-Vous. Our scheduling approach tries to guarantee that enough resources are available, whereas the Rendez-Vous scheduler handles intelligent dropping in case of overload. New transport protocols and distributed coding are not explicitly addressed. That approach focuses on new compression algorithms, whereas VTToolkit tackles system-level problems that are also relevant in real-world environments today. Mash, from the University of California at Berkeley, deals mainly with Scalable Reliable Multicast (SRM). Based on it, a collaboration tool, a mediaboard and a set of archival tools for standard RTP MBone sessions have been developed. The support of heterogeneous network and end-user capabilities is another topic; here, gateways are deployed in an intelligent network scenario. This resembles the gateways used in VTToolkit to support heterogeneous environments with specific requirements. High-end video transmission and the use of new transport protocols are not dominant topics within this project.
The focus on the highest possible performance throughout all components, while maintaining low overall system costs, is unique to our approach. In addition, some quite innovative contributions are made. These include the use of new transport protocols in a near-product system, coexisting with traditional IP, and the distribution of compression.
5 Conclusions and Future Work
In this paper we introduced a framework for advanced conferencing solutions. Based on VTToolkit, an object-oriented toolkit, it eases the handling of various aspects such as audio/video capture, compression techniques, transmission and presentation. All significant components are designed to be flexible and complete enough that applications such as a transport gateway or a remote compression server can be implemented simply by connecting them. This is possible because the complexity of various multimedia interfaces and compression algorithms is hidden, and because a common interface is used throughout, as outlined in section 3. Using a sample video transmission application, we showed that a comprehensive view of all components involved in a complex system enables a rich set of optimisations. These include general techniques such as application level framing and integrated layer processing. Furthermore, we considered the use of specialised multimedia system interfaces and distributed compression, and the exploitation of processor extensions such as MMX to improve compression efficiency. We also considered the application of efficient transport systems over advanced network technologies. Due to the applied optimisations and the integrated resource and quality of service management, it is now possible to hold high-quality multi-party conferences at moderate cost on modern platforms, since additional compression hardware is not required. Video transmission is characterised by the trade-off between quality, compression effort and available network bandwidth. Together with the number of VTToolkit components and their possible configurations (colour model, resolution, compression process or transport system), this makes suitable design support necessary. Development of VTToolkit and of the other components of the framework will be continued. This includes further improvements and optimisations, as well as extensions concerning additional audio and video codecs and support for standardised conferencing protocols via gateways to enable interoperability. Moreover, we intend to continue application development, including practical use as a wide area video conferencing system in several teleteaching projects.
References
1. Ahlgren, B., Gunningberg, P., Moldeklev, K.: "Performance with a Minimal-Copy Data Path supporting ILP and ALF", 1995.
2. Benz, M., Engel, F.: "Hardware Supported Protocol Processing for Gigabit Networks", Workshop on System Design Automation, Dresden, 1998, pp. 23-30.
3. Barth, I., Helbig, T., Rothermel, K.: "Implementierung multimedialer Systemdienste in CINEMA", GI/ITG-Fachtagung "Kommunikation in Verteilten Systemen", February 22-24, 1995, Chemnitz, Germany.
4. Braun, I., Hess, R., Schill, A.: "Teleworking support for small and medium-sized enterprises", IFIP World Computer Congress '98, Aug.-Sept. 1998.
5. Braun, T., Diot, C.: "Protocol Implementation Using Integrated Layer Processing", ACM SIGCOMM '95, 1995.
6. Clark, D. D., Tennenhouse, D. L.: "Architectural Considerations for a New Generation of Protocols", ACM SIGCOMM '90, 1990.
7. Druschel, P., Abbott, M. B., Pagels, M. A., Peterson, L.: "Network Subsystem Design: A Case for an Integrated Data Path", IEEE Network, 1993.
8. Braun, T., Diot, C.: "Protocol Implementation using Integrated Layer Processing", SIGCOMM 1995.
9. Geske, D., Hess, R.: "Fast and predictable video compression in software - design and implementation of an H.261 codec", Interactive Multimedia Service and Equipment - Syben 98 Workshop, Zurich, 1998.
10. Hess, R., Geske, D., Kümmel, S., Thurmer, H.: "Distributed Compression of Live Video - an Application for Active Networks", AcoS 98 Workshop, Lisbon, 1998.
11. ITU-T Recommendation H.263, Line Transmission of non-telephone signals: Video codec for low bitrate communication, March 1996.
12. Kümmel, S., Hutschenreuther, T.: "Protocol Support for Optimized, Context-Sensitive Request/Response Communication over Connection Oriented Networks", IFIP Int. Conf. on Open Distributed Processing, Toronto, May 1997, pp. 137-150.
13. McCanne, S., Jacobson, V.: "vic: A Flexible Framework for Packet Video", ACM Multimedia '95, San Francisco, CA, November 1995.
14. http://www.picturetel.com/products.htm
15. http://www.intel.com/proshare/conferencing/products/
16. Audio-Video Transport Working Group: "RTP: A Transport Protocol for Real-Time Applications", 1996.
17. Network Working Group: "Resource ReSerVation Protocol (RSVP)", 1997.
18. Schill, A., Hutschenreuther, T.: "Architectural Support for QoS Management and Abstraction: SALMON", Computer Communications Journal, Vol. 20, No. 6, July 1997, pp. 411-419.
19. Schill, A., Franze, K., Neumann, O.: "An Infrastructure for Advanced Education: Technology and Experiences", 7th World Conference on Continuing Engineering Education, Turin, May 1998.
20. Turletti, T.: "The INRIA Videoconferencing System (IVS)", ConneXions - The Interoperability Report Journal, Vol. 8, No. 10, pp. 20-24, October 1994.
A Novel Replica Placement Strategy for Video Servers
Jamel Gafsi and Ernst W. Biersack
{gafsi,erbi}@eurecom.fr
Institut EURECOM, B.P. 193, 06904 Sophia Antipolis Cedex, FRANCE
Abstract. Compared with parity-based reliability, mirroring-based reliability significantly simplifies the design and implementation of video servers. In case of failure, mirroring requires neither synchronised reads nor decoding to reconstruct the lost video data. While mirroring doubles the required storage volume, the steep decrease in the cost of magnetic disk storage makes it increasingly attractive as a reliability mechanism. In this paper we present a novel data layout strategy for replicated data on a video server. Classical replica placement schemes store original and replicated data separately. Our approach instead stores replicated data adjacent to original data, and thus incurs no additional seek overhead when operating with a disk failure. We show that our approach considerably improves server performance compared to classical replica placement schemes, such as the interleaved declustering scheme and the scheme used by the Microsoft Tiger video server. Our performance metric is the maximum number of users that a video server can support simultaneously (server throughput).
Keywords: Video Servers, Data Replication, Performance Analysis
1 Introduction
In order to store a large number of voluminous video files, a video server requires numerous storage components, typically magnetic disk drives. As the number of server disks grows, the server mean time to failure degrades and the server becomes more vulnerable to data loss. Hence the need for fault tolerance in a video server. Two techniques are mainly applied in the context of fault-tolerant video servers: mirroring and parity. Parity adds a small storage overhead for parity data, while mirroring requires twice as much storage volume as the non-fault-tolerant case. Compared with parity, mirroring significantly simplifies the design and implementation of video servers. It requires neither synchronised reads nor the additional processing time that parity needs to decode lost information. For this reason, various video server designers [1-3] have adopted mirroring to achieve fault tolerance. This paper focuses only on mirroring-based techniques. The video server is assumed to use round-based data retrieval, where each stream is served once in every time interval called the service round. For data retrieval from disk, we use the SCAN scheduling algorithm, which minimises the time spent seeking the different video blocks needed. Each video to be stored on the server is partitioned into video blocks that are stored on all disks of the server in round-robin fashion, so that each video is distributed over all disks of the server. The literature distinguishes two main strategies for storing original blocks on, and retrieving them from, the server: Fine Grained Striping (FGS) and Coarse Grained Striping (CGS). FGS retrieves multiple blocks for one stream from many disks during one service round. A typical example of FGS is RAID3 as defined by Katz et al. [4]. Other researchers proposed derivations of FGS, such as the streaming RAID of Tobagi et al. [5], the staggered-group scheme of Muntz et al. [6] and the configuration planner scheme of Ghandeharizadeh et al. [7]. Our own mean grained striping scheme [8] is another such derivation. FGS generally suffers from large buffer requirements. CGS, in contrast, retrieves one block for one stream from a single disk during each service round; RAID5 is a typical CGS scheme. Oezden et al. [9,10] have shown that CGS provides higher throughput than FGS (RAID5 vs. RAID3) for the same amount of resources (see also Vin et al. [11], Beadle et al. [12], and our contribution [8]). Accordingly, in order to achieve the highest throughput, we adopt CGS to store and retrieve original blocks. What remains to be solved is how the original blocks of a single disk are replicated on the server. Obviously, the original blocks of one disk are not replicated on the same disk. Mirroring schemes differ in whether a single disk contains original data, replicated data, or both. The mirrored declustering scheme uses two (or more) identical disk arrays, where the original content is replicated onto a distinct set of disks.
When the server works in normal operation mode (disk-failure-free mode), only half of the server disks are active while the other half remains idle, which results in load imbalance within the server. Unlike mirrored declustering, chained declustering [13, 14] partitions each disk into two parts. The first part contains original blocks and the second part contains replicated blocks (copies). The original blocks of disk i are replicated on disk (i + 1) mod D, where D is the total number of disks of the server. Interleaved declustering is an extension of chained declustering. Here the original blocks of disk i are not replicated entirely on disk (i + 1) mod D but are distributed over multiple disks of the server. Mourad [1] proposed the doubly striped scheme, which is based on interleaved declustering and distributes the original blocks of a disk evenly over all remaining disks of the server. Chained declustering can be considered a special case of interleaved declustering with a replica distribution granularity of 1. We restrict our discussion to interleaved declustering schemes, since these schemes distribute the total server load evenly among all components during normal operation mode. Note that interleaved declustering only indicates that the replicas of the original blocks belonging to one disk are stored on one, some, or all remaining disks. It does not indicate how a single original block is replicated.
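Chained and interleaved declustering differ only in where the copy of a block is put. The sketch below assumes coarse-grained round-robin striping, with block b on disk b mod D. For the interleaved case it uses a hypothetical round-robin rule that spreads the copies of disk i over the other D - 1 disks. Mourad's doubly striped scheme may order the copies differently.

```python
def original_disk(block: int, num_disks: int) -> int:
    """Coarse-grained round-robin striping: block b lives on disk b mod D."""
    return block % num_disks


def chained_replica_disk(block: int, num_disks: int) -> int:
    """Chained declustering: every copy of disk i's blocks goes to disk (i + 1) mod D."""
    return (original_disk(block, num_disks) + 1) % num_disks


def interleaved_replica_disk(block: int, num_disks: int) -> int:
    """Interleaved declustering (hypothetical round-robin rule): the copies of
    disk i's blocks are spread over the other D - 1 disks in turn."""
    disk = original_disk(block, num_disks)
    k = block // num_disks             # position of the block among disk i's blocks
    offset = 1 + k % (num_disks - 1)   # 1 .. D-1, so a copy never lands on its own disk
    return (disk + offset) % num_disks


if __name__ == "__main__":
    D = 6
    for b in range(0, 30, D):          # the original blocks of disk 0
        print(b, chained_replica_disk(b, D), interleaved_replica_disk(b, D))
```

With D = 6 and 30 blocks, chained declustering sends all copies of disk 0's blocks (0, 6, 12, 18, 24) to disk 1. The interleaved rule places them on disks 1 to 5, one each.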
This paper is organized as follows. Section 2 classifies and studies several interleaved declustering schemes. We present our novel replica placement strategy in section 3. In section 4, we show that our approach outperforms the other existing schemes in terms of the server throughput. The conclusions are presented in section 5.
2 Interleaved Declustering Schemes
Table 1 presents different interleaved declustering schemes, classified by two metrics. The first metric describes how a single block is replicated. The second concerns the number of disks that store the replicas of the original content of a single disk. For the first metric we consider the following three alternatives:
1. The copy of the original block is stored entirely on a single disk (One).
2. The copy of the original block is divided into a set of sub-blocks, which are distributed among some disks forming an independent group (Some).
3. The copy of the original block is divided into exactly (D - 1) sub-blocks, which are distributed over all remaining (D - 1) server disks (All).
We distinguish three alternatives for the second metric:
1. The original blocks stored on one disk are replicated on a single disk (One).
2. The original blocks of one disk are replicated on a set of disks that form an independent group (Some).
3. The original blocks of one disk are replicated on the remaining (D - 1) server disks (All).
The symbol "XXX" in Table 1 marks combinations that are not useful for our discussion. The name of each scheme has two parts. The first part indicates how an original block is replicated (first metric). The second gives the number of disks over which the content of one disk is distributed (second metric). For instance, the scheme One/Some means that each original block is replicated entirely (One) and that the original content of one disk is distributed among a set of disks (Some).
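Table 1. Classification of interleaved schemes (rows: how a single block is replicated; columns: number of disks holding the copies of one disk's content)
| | Single disk (One) | Set of disks (Some) | (D - 1) disks (All) |
| Entire block (One) | One/One | One/Some | One/All |
| Set of sub-blocks (Some) | XXX | Some/Some | XXX |
| (D - 1) sub-blocks (All) | XXX | XXX | All/All |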
Let us assume a video server containing 6 disks (disks 0 to 5) and a video to be stored that consists of 30 original blocks. Each disk is partitioned into two equal-size parts. The first part stores original blocks and the second part stores replicated blocks (copies) (see Figures 1 and 2). Figure 1(a) shows the One/One organization. For instance, the original blocks of disk 0 are replicated on disk 1 (dashed blocks). During a disk failure, the load of the failed disk is shifted entirely to a single other disk, which results in load imbalance within the server. On the other hand, the One/One organization has the advantage of surviving up to D/2 disk failures in the best case. Figure 1(b) shows the One/All organization. The replicas of the original blocks of disk 0 are stored on the other disks 1, 2, 3, 4 and 5 (dashed blocks). In the best case, this organization evenly distributes the load of one failed disk among all remaining disks. Its fault tolerance, however, is limited to a single disk failure. Figure 1(c) shows an example of the One/Some organization, which divides the server into 2 independent groups of 3 disks each. The original blocks of one disk are replicated entirely over the remaining disks of its group; e.g. the original blocks of disk 0 are replicated on disks 1 and 2. In order to ensure deterministic admission control, each disk of the server must reserve a proportion of its available I/O bandwidth for retrieving replicated data in disk failure mode. The amount of I/O bandwidth reserved on each disk must cover the worst-case scenario. Obviously, the One/One organization needs to reserve one half of the available I/O bandwidth on each disk for disk failure mode. For both the One/Some and the One/All organizations, the original blocks of one disk are spread over multiple disks (some disks for One/Some and (D - 1) disks for One/All).
However, in the worst case, all blocks that would have been retrieved from the failed disk for a set of streams may have their replicas stored on the same disk. This worst case therefore requires reserving one half of the I/O bandwidth of each disk. Consequently, all three schemes, One/One, One/All, and One/Some, must reserve half of each disk's available I/O bandwidth to ensure deterministic service in disk failure mode. The Microsoft Tiger [2,3] introduced a replication scheme in which an original block is not entirely replicated on a single disk. Instead, the replica of an original block consists of a set of sub-blocks, each stored on a different disk. The original blocks of one disk are replicated across the remaining disks of the group to which this disk belongs. We call this organization Some/Some in Table 1. Figure 2(a) illustrates an example of this organization, where dashed blocks show how the original blocks of disk 0 (3) are replicated on disks 1 and 2 (4 and 5) inside group 1 (2). Like the One/Some organization, the Some/Some organization can survive one disk failure inside each group. The last organization of Table 1 is All/All, an example of which is shown in Figure 2(b). The dashed original blocks of disk 0 are replicated as indicated. The main advantage of All/All is its perfect load balancing: the load of a failed disk is always evenly distributed among all remaining disks of the server. However, the All/All organization, like the One/All organization, only survives a single disk failure, which might not be sufficient for large video servers. Unlike the entire block replication organizations (Figure 1), the two sub-block replication organizations (Figure 2) do not need to reserve half of each disk's I/O bandwidth to ensure deterministic service in disk failure mode. However, for these two schemes the number of seek operations doubles in disk failure mode compared to normal operation mode. Exact values of the amount of I/O bandwidth to be reserved are given in Section 4.1. The main drawback of all replication schemes considered in this section is their additional seek overhead in disk failure mode, as we will see in Section 4.1: these schemes require additional seeks to retrieve replicated data that are stored separately from the original data. Unfortunately, high seek overhead decreases disk utilization and therefore server performance. In the following, we present our replication approach, which resolves this problem by eliminating the additional seek overhead: our approach requires the same seek overhead in disk failure mode as in normal operation mode.
Fig. 1. Entire block replication: (a) One/One, (b) One/All, and (c) One/Some organization.
Fig. 2. Sub-block replication: (a) Some/Some and (b) All/All organization.
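To make the differences between the entire block replication organizations concrete, the following minimal Python sketch encodes, for the 6-disk example, which disks hold the copies of a given disk's original blocks, and checks whether a set of failed disks loses data. The helper names (`replica_disks`, `survives`) are hypothetical, and One/One is modelled as pairwise mirroring (disk 0 with disk 1, disk 2 with disk 3, ...), which is one layout consistent with the description of Figure 1(a), not necessarily the exact one drawn there.

```python
def replica_disks(disk, num_disks, scheme, group_size=3):
    """Disks holding copies of the original blocks of `disk` (entire block replication)."""
    if scheme == "One/One":
        # assumed pairwise mirroring: 0<->1, 2<->3, ... (requires an even disk count)
        return [disk ^ 1]
    if scheme == "One/All":
        # copies spread over every other disk of the server
        return [d for d in range(num_disks) if d != disk]
    if scheme == "One/Some":
        # copies spread over the other disks of the same group
        first = (disk // group_size) * group_size
        return [d for d in range(first, first + group_size) if d != disk]
    raise ValueError(f"unknown scheme: {scheme}")


def survives(failed, num_disks, scheme, group_size=3):
    """True if no failed disk has any of its copies on another failed disk."""
    failed = set(failed)
    return all(
        not failed.intersection(replica_disks(d, num_disks, scheme, group_size))
        for d in failed
    )


if __name__ == "__main__":
    print(replica_disks(0, 6, "One/Some"))    # [1, 2]
    print(survives([0, 2, 4], 6, "One/One"))  # True: D/2 failures in the best case
    print(survives([0, 3], 6, "One/All"))     # False: only one failure tolerated
```

Under this model, One/One tolerates D/2 failures only when no two failed disks are partners, One/All loses data on any second failure, and One/Some tolerates one failure per group.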
3
A Novel Replica Placement Strategy
3.1
Motivation
If we look at the evolution of SCSI disk performance, we observe that (i) data transfer rates double every 3 years, whereas (ii) disk access time decreases by one third every 10 years [15]. Figure 3(a) shows the data transfer rates of different generations of Seagate disks (SCSI-1, SCSI-II, Ultra SCSI, and finally Ultra2 SCSI) [16]. Figure 3(b) depicts the evolution of the average access time of Seagate disks. Figures 3(a) and 3(b) confirm observations (i) and (ii), respectively. A disk drive typically contains a set of surfaces or platters that rotate in lockstep on a central spindle¹. Each surface has an associated disk head responsible for reading data. Unfortunately, the disk drive has a single read data channel, and therefore only one head is active at a time. A surface stores data in a series of concentric circles, called tracks. Tracks that belong to different surfaces and have the same distance to the spindle together form a cylinder. As an example of today's disks, the Seagate Barracuda ST118273W disk drive contains 20 surfaces, 7,500 cylinders, and 150,000 tracks. The time a disk spends performing seek operations is wasted, since it cannot be used to retrieve data. One seek operation mainly consists of a rotational latency and a seek time. Rotational latency is the time the disk spends, once the arm is positioned on the target cylinder, until the beginning of the block to be read rotates under the head.
¹ Regarding the mechanical components of disk drives and their characteristics, we rely in this paper on [16] and on the work done in [17].
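The two trends (i) and (ii) can be projected numerically to see why positioning overhead increasingly dominates block service time. This is a minimal sketch under stated assumptions: "decreases by one third" is read as multiplying the access time by 2/3 per decade, and the starting values (20 MByte/s, an access time of 23.79 ms = 13.79 ms seek + 10 ms rotation, and 0.5 Mbit blocks) are borrowed from the parameters used later in this paper purely for illustration; the function names are hypothetical.

```python
def project(years, rate0_mbyte_s, access0_ms):
    """Project transfer rate and access time `years` ahead under trends (i) and (ii)."""
    rate = rate0_mbyte_s * 2 ** (years / 3)          # (i) doubles every 3 years
    access = access0_ms * (2 / 3) ** (years / 10)    # (ii) loses one third every 10 years
    return rate, access


def positioning_share(block_mbit, rate_mbyte_s, access_ms):
    """Fraction of one block's service time spent positioning rather than transferring."""
    read_ms = block_mbit / (rate_mbyte_s * 8) * 1000.0
    return access_ms / (access_ms + read_ms)


if __name__ == "__main__":
    for years in (0, 3, 6, 9):
        rate, access = project(years, rate0_mbyte_s=20.0, access0_ms=23.79)
        share = positioning_share(0.5, rate, access)
        print(f"+{years} years: {rate:5.0f} MByte/s, {access:5.2f} ms, positioning {share:.0%}")
```

The positioning share grows over time, which is the motivation for an approach that avoids extra seeks even at the cost of extra read time.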
Fig. 3. Performance evolution of Seagate SCSI disks: (a) evolution of data transfer rates; (b) evolution of average access time. The horizontal axis gives year ranges from [86-92] to [98-00].
The maximum value of the rotational latency t_rot is directly given by the rotation speed of the spindle. Current spindle speeds are about 7,200 rpm; one revolution then takes 60/7,200 s ≈ 8.3 ms, and in the following we use the conservative value t_rot = 10 ms. Seek time t_seek, as studied in [17,18], is mainly composed of four phases: (i) a speedup phase, in which the arm accelerates, (ii) a coast phase (only for long seeks), in which the arm moves at its maximum velocity, (iii) a slowdown phase, in which the arm comes to rest close to the desired track, and finally (iv) a settle phase, in which the disk controller adjusts the head to access the desired location. Note that the duration t_stl of the settle phase is independent of the distance traveled and is about t_stl = 3 ms. However, the durations of the speedup phase (t_speed), the coast phase (t_coast), and the slowdown phase (t_slowdown) mainly depend on the distance traveled. The seek time t_seek then takes the following form:
$$ t_{seek} = t_{speed} + t_{coast} + t_{slowdown} + t_{stl} $$
Let us assume that the disk arm moves from the outer track (cylinder) to the inner track (cylinder) to retrieve data during one service round, and in the opposite direction (from the inner track to the outer track) during the next service round (CSCAN). If a single disk can support up to 20 streams, at most 20 blocks must be retrieved from the disk during one service round. If we assume that the 20 blocks to be retrieved are uniformly spread over the cylinders of the disk, we deal only with short seeks and the coast phase can be neglected (the distance between two blocks to read is about 300 cylinders). Wilkes et al. have shown that seek time is a function of the distance traveled by the disk arm and have proposed the following formula for short seeks:
$$ t_{seek} = 3.45 + 0.597\,\sqrt{d} \quad \text{(ms)} $$
where d is the number of cylinders the disk arm must travel. Assuming d ≈ 300 cylinders, the seek time is about t_seek ≈ 13.79 ms. Note that short seeks spend most of their time in the speedup phase.
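Both timing components can be checked numerically. This sketch assumes that the worst-case rotational latency equals one full revolution (60,000/rpm ms) and applies the short-seek formula quoted above; the function names are hypothetical. It reproduces t_seek ≈ 13.79 ms for d = 300, as well as the 6 ms and 4 ms values that Section 4.2 associates with 10,000 and 15,000 rpm.

```python
import math


def max_rotational_latency_ms(rpm):
    """Worst-case rotational latency: one full revolution of the spindle."""
    return 60_000.0 / rpm


def short_seek_ms(cylinders):
    """Short-seek model quoted above: t_seek = 3.45 + 0.597 * sqrt(d) ms."""
    return 3.45 + 0.597 * math.sqrt(cylinders)


if __name__ == "__main__":
    for rpm in (7_200, 10_000, 15_000):
        print(f"{rpm} rpm -> t_rot = {max_rotational_latency_ms(rpm):.2f} ms")
    print(f"d = 300 cylinders -> t_seek = {short_seek_ms(300):.2f} ms")  # 13.79 ms
```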
3.2
Our Approach
The Some/Some scheme (see Table 1) ensures a perfect distribution of the load of a failed disk over multiple disks and, compared to the interleaved declustering schemes (One/One, One/Some, and One/All), reduces the amount of bandwidth reserved for each stream on each surviving disk. Since the content of one disk is replicated inside one group, Some/Some tolerates one disk failure inside each group. We call our approach, which is based on the Some/Some scheme, the Improved Some/Some scheme. The basic idea is to store original data together with some replicated data contiguously, so that when a disk fails, no additional seeks are performed to read the replica. Consequently, our approach does not divide a disk into two separate parts, one for original blocks and the other for replicated blocks. Figure 4 shows an example of the Improved Some/Some scheme.
Fig. 4. Layout example of the Improved Some/Some scheme.
Let us consider only the content of the disks of group 1 (disks 0, 1, and 2), and in particular original block 9, which is stored on disk 2 (dashed block). The replication is performed as follows. We divide the original block into 3 − 1 = 2 sub-blocks², 9.1 and 9.2, which are stored immediately contiguous to original blocks 7 and 8, respectively. Note that original blocks 7 and 8 are the blocks that precede block 9 inside the group. Similarly, the original blocks that precede block 13 inside the group are blocks 8 and 9. Now assume that disk 2 fails. Block 9 is reconstructed as follows. During service round i, in which block 7 is retrieved, block 7 and sub-block 9.1 are retrieved together (no additional seek time or rotational latency, only additional read time). During the next service round i + 1, block 8 and sub-block 9.2 are retrieved together. Sub-blocks 9.1 and 9.2 are thus retrieved from the disks in advance and kept in the buffer until they are consumed during service round i + 2. In general, sub-blocks that are read in advance are buffered for several service rounds before being consumed. The number of buffering rounds mainly depends on the size of the server (the total number of server disks). If we assume that disk 0 is the failed disk, block 19 is reconstructed during the service rounds in which blocks 14 (with sub-block 19.1) and 15 (with sub-block 19.2) are retrieved. These sub-blocks are kept in the buffer for at most 5 service rounds before they are consumed. The example shows that, to read one original block and one sub-block together for one stream, at most two original blocks' worth of data must be retrieved. To ensure continuous reading, an original block and the corresponding replicated sub-blocks must be contained in the same track. Fortunately, today's disk drives satisfy this condition, since track sizes keep increasing: the current mean track size of new-generation Seagate disk drives is about 160 KByte, i.e., about 1.3 Mbit. It is therefore possible to store, inside one track, an original block together with the set of replicated sub-blocks, as shown in Figure 4. Our approach therefore does not increase the seek overhead but, in the worst case, doubles the read time. Note that the very first blocks require special treatment: our approach entirely replicates the first two blocks of a video within each group, as represented in Figure 4 by the dark-dashed blocks (copies of block 1 on disk 1 and of block 2 on disk 0 for group 1; copies of block 5 on disk 3 and of block 4 on disk 4 for group 2). The following example explains why. If disk 0 has already failed before a new stream is admitted to consume the video shown in the figure, the stream is delayed by one service round. During the next service round, the first two blocks, 1 and 2, are retrieved together from disk 1. Given the performance evolution of SCSI disks (see Figure 3), our approach improves server performance in terms of the number of streams that the server can admit simultaneously (see Section 4).
² 3 is the group size; therefore, the number of sub-blocks is 2.
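The placement rule can be expressed as a short program. This is a minimal sketch under our own assumptions: blocks are striped round-robin over D disks (block 1 on disk 0), groups are consecutive runs of D_c disks, and a stream reads block k in service round k; all helper names are hypothetical. It reproduces the examples in the text: the sub-blocks of block 9 sit next to blocks 7 and 8, those of block 19 next to blocks 14 and 15 with 5 buffering rounds, and blocks 1, 2, 4, and 5 have no preceding group blocks and therefore need full replication.

```python
def disk_of(block, num_disks):
    """Round-robin striping: block 1 on disk 0, block 2 on disk 1, ..."""
    return (block - 1) % num_disks


def group_of(block, num_disks, group_size):
    return disk_of(block, num_disks) // group_size


def host_blocks(block, num_disks, group_size):
    """Previous original blocks of the same group next to which the sub-blocks
    block.1 .. block.(group_size - 1) are stored, oldest first.
    Returns None for the first blocks of a group, which are replicated entirely."""
    g = group_of(block, num_disks, group_size)
    previous = [b for b in range(block - 1, 0, -1) if group_of(b, num_disks, group_size) == g]
    hosts = previous[: group_size - 1]
    return sorted(hosts) if len(hosts) == group_size - 1 else None


def placement(num_blocks, num_disks, group_size):
    """Map each original block to the sub-blocks stored right after it."""
    after = {}
    for block in range(1, num_blocks + 1):
        for j, host in enumerate(host_blocks(block, num_disks, group_size) or [], start=1):
            after.setdefault(host, []).append(f"{block}.{j}")
    return after


def buffering_rounds(block, num_disks, group_size):
    """Rounds the first sub-block waits in the buffer, assuming block k is read in round k."""
    hosts = host_blocks(block, num_disks, group_size)
    return None if hosts is None else block - hosts[0]


if __name__ == "__main__":
    print(host_blocks(9, 6, 3), buffering_rounds(9, 6, 3))    # [7, 8] 2
    print(host_blocks(19, 6, 3), buffering_rounds(19, 6, 3))  # [14, 15] 5
    print(placement(30, 6, 3)[7])                             # ['8.2', '9.1']
```

Because consecutive blocks of a group lie on distinct disks, the D_c − 1 sub-blocks of a block always end up on the other disks of its group.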
4
Performance Comparison
4.1
Admission Control Criterion
The admission control policy decides whether a new incoming stream can be admitted. The maximum number of streams Q that the server can admit simultaneously can be calculated in advance and is called the server throughput. The server throughput depends on disk characteristics as well as on the striping/reliability scheme applied. In this paper, the schemes considered differ only in the way original data is replicated. We consider the admission control criterion of Eq. 1. We first calculate the disk throughput and then derive the server throughput. Let Q_d denote the throughput achieved by a single disk. Without fault tolerance, the disk throughput is given by Eq. 1, where b is the block size, r_d is the data transfer rate of the disk, t_rot is the worst-case rotational latency, t_seek is the worst-case seek time, and τ is the service round duration³.
³ We take a constant value of τ, typically τ = b / r_p, where b is the size of an original block and r_p is the playback rate of a video.
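$$ Q_d \cdot \left( \frac{b}{r_d} + t_{rot} + t_{seek} \right) \le \tau \quad \Longrightarrow \quad Q_d = \left\lfloor \frac{\tau}{\frac{b}{r_d} + t_{rot} + t_{seek}} \right\rfloor \qquad (1) $$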
Introducing (mirroring-based) fault tolerance changes the disk throughput, which then depends on the mirroring scheme applied. Three schemes are considered: our approach (Improved Some/Some), the One/Some scheme, and the Microsoft Some/Some scheme. Let Q_d^OS, Q_d^SS, and Q_d^ISS denote the disk throughput for One/Some, Some/Some, and our Improved Some/Some, respectively. Note that the disk throughput is the same in normal operation mode and in disk failure mode. For the One/Some scheme, half of the disk I/O bandwidth must be reserved in the worst case to reconstruct failed original blocks, and Q_d^OS is therefore calculated following Eq. 2.
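$$ Q_d^{OS} = \left\lfloor \frac{Q_d}{2} \right\rfloor = \left\lfloor \frac{\tau}{2\left(\frac{b}{r_d} + t_{rot} + t_{seek}\right)} \right\rfloor \qquad (2) $$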
For the Some/Some scheme, reconstructing a failed original block requires only a small read overhead (a small sub-block to read on each disk) but a complete latency overhead (rotational latency plus seek time) for each additional sub-block read from a disk. The admission control criterion of Eq. 1 is therefore modified as shown in Eq. 3, where the parameter b_sub denotes the size of a sub-block.
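$$ Q_d^{SS} \cdot \left( \frac{b}{r_d} + t_{rot} + t_{seek} + \frac{b_{sub}}{r_d} + t_{rot} + t_{seek} \right) \le \tau \quad \Longrightarrow \quad Q_d^{SS} = \left\lfloor \frac{\tau}{\frac{b + b_{sub}}{r_d} + 2\,(t_{rot} + t_{seek})} \right\rfloor \qquad (3) $$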
If we take our Improved Some/Some scheme and consider the case where a disk fails inside one group, we obtain the admission control criterion of Eq. 4, where b_over denotes the amount of data (overhead) that must be read together with each original block. In the worst case, b_over = b.
$$ Q_d^{ISS} = \left\lfloor \frac{\tau}{\frac{b + b_{over}}{r_d} + t_{rot} + t_{seek}} \right\rfloor \qquad (4) $$
Once the disk throughput Q_d has been calculated, the server throughput Q is easily derived as Q = D · Q_d for each scheme, where D again denotes the total number of disks in the server.
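Equations 1 to 4 can be evaluated directly. This sketch assumes the units given in the docstring, τ = b / r_p from the footnote on τ, the worst case b_over = b, and b_sub = 0.25 Mbit (i.e., b/(D_c − 1) for a group of D_c = 3 disks as in Figure 4). The group size behind Figures 5 and 6 is not stated in this section, so the Some/Some values are illustrative only; the function names are hypothetical.

```python
import math


def read_ms(size_mbit, rate_mbyte_s):
    """Transfer time in ms for `size_mbit` Mbit at `rate_mbyte_s` MByte/s."""
    return size_mbit / (rate_mbyte_s * 8) * 1000.0


def disk_throughputs(b, r_d, t_rot, t_seek, r_p, b_sub, b_over=None):
    """Un-floored per-disk stream counts following Eq. 1-4.

    b, b_sub, b_over in Mbit; r_d in MByte/s; t_rot, t_seek in ms; r_p in Mbit/s.
    """
    tau = b / r_p * 1000.0                     # service round tau = b / r_p, in ms
    b_over = b if b_over is None else b_over   # worst case for Improved Some/Some
    lat = t_rot + t_seek
    return {
        "plain": tau / (read_ms(b, r_d) + lat),            # Eq. 1, no fault tolerance
        "OS": tau / (2 * (read_ms(b, r_d) + lat)),          # Eq. 2, One/Some
        "SS": tau / (read_ms(b + b_sub, r_d) + 2 * lat),    # Eq. 3, Some/Some
        "ISS": tau / (read_ms(b + b_over, r_d) + lat),      # Eq. 4, Improved Some/Some
    }


def server_throughput(num_disks, q_d):
    """Q = D * Q_d, with Q_d rounded down to whole streams."""
    return num_disks * math.floor(q_d)


if __name__ == "__main__":
    for r_d in (20, 40, 80):
        q = disk_throughputs(b=0.5, r_d=r_d, t_rot=10, t_seek=13.79, r_p=1.5, b_sub=0.25)
        print(r_d, {k: math.floor(v) for k, v in q.items()},
              f"ISS/OS = {q['ISS'] / q['OS']:.2f}", f"ISS/SS = {q['ISS'] / q['SS']:.2f}")
```

With the parameters of Figure 5, the un-floored ratio Q_d^ISS / Q_d^OS, which does not depend on b_sub, comes out at about 1.79 for r_d = 20 MByte/sec and 1.88 for r_d = 40 MByte/sec, in line with the first column of Table 2.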
4.2
Throughput Results
In the following, we present the server throughput Q^OS, Q^SS, and Q^ISS for the schemes One/Some, Some/Some, and our Improved Some/Some, respectively. In Figure 5, we keep the seek time and the rotational latency constant and vary the data transfer rate r_d of the disk. Figures 5(a) and 5(b) show that Improved Some/Some outperforms the Microsoft Some/Some scheme, which itself outperforms One/Some, for both values of r_d (20 and 80 MByte/sec). Figure 5 also shows that the gap between our Improved Some/Some and the two other schemes (One/Some and Some/Some) increases considerably with the data transfer rate r_d. Table 2 illustrates the benefit of our approach by listing the ratios Q^ISS/Q^OS and Q^ISS/Q^SS as a function of r_d.
Fig. 5. Server throughput for One/Some, Some/Some, and Improved Some/Some with t_seek = 13.79 ms, t_rot = 10 ms, b = 0.5 Mbit, and r_p = 1.5 Mbit/sec: (a) r_d = 20 MByte/sec; (b) r_d = 80 MByte/sec. Each panel plots the server throughput against the number of disks D in the server.
r_d              Q^ISS/Q^OS   Q^ISS/Q^SS
20 MByte/sec     1.79         1.69
40 MByte/sec     1.88         1.83
80 MByte/sec     1.93         1.91
Table 2. Throughput ratios.
We now focus on the impact of the evolution of the seek time t_seek and the rotational latency t_rot on the throughput of the three schemes. We keep the data transfer rate constant at a relatively small value, r_d = 40 MByte/sec. Figure 6 plots the server throughput for the corresponding parameter values. The figure shows that our Improved Some/Some scheme achieves the highest server throughput for all seek time/rotational latency combinations considered. Obviously, decreasing t_seek and t_rot increases the throughput of all schemes. We notice that the gap between our Improved Some/Some and the Microsoft Some/Some decreases slightly when t_seek and t_rot decrease, as Table 3 shows, where the disk transfer rate is again r_d = 40 MByte/sec. Note that the value t_rot = 6 ms corresponds to a spindle speed of 10,000 rpm, and the value t_rot = 4 ms corresponds to 15,000 rpm, which is a rather optimistic value. We observe that even for very low values of t_seek and t_rot, our Improved Some/Some scheme outperforms the Microsoft Some/Some in terms of server throughput (Q^ISS/Q^SS = 1.59).
Fig. 6. Server throughput for different access time values with r_d = 40 MByte/sec, b = 0.5 Mbit, and r_p = 1.5 Mbit/sec: (a) t_seek = 10 ms, t_rot = 8 ms; (b) t_seek = 8 ms, t_rot = 6 ms; (c) t_seek = 4 ms, t_rot = 4 ms. Each panel plots the server throughput of Improved Some/Some, Some/Some, and One/Some against the number of disks D in the server.
Access time                              Q^ISS/Q^SS
t_seek = 13.79 ms and t_rot = 10 ms      1.83
t_seek = 10 ms and t_rot = 8 ms          1.78
t_seek = 8 ms and t_rot = 6 ms           1.73
t_seek = 4 ms and t_rot = 4 ms           1.59
Table 3. Throughput ratio between our Improved Some/Some and the Microsoft Some/Some.
4.3
Reducing Read Overhead for our Approach
The worst-case read overhead of our Improved Some/Some scheme is the time to read redundant data of the size of a complete original block. We now present a method that reduces this worst-case amount of data down to half the size of one original block. The method simply consists of storing the sub-blocks not only on one side of an original block, but distributing them on both its left and its right side. Figure 7 shows an example, where each original block is stored between two replicated sub-blocks. Let us assume that disk 2 fails and that block 9 must be regenerated. While reading block 7, disk 0 continues its read process and reads sub-block 9.1. On disk 1, the situation is slightly different: sub-block 9.2 is read just before block 8. In this particular example, no useless data is read, in contrast to the example in Figure 4.
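The effect of halving the worst-case overhead can be estimated by plugging b_over = b/2 into Eq. 4. This is our own back-of-the-envelope sketch, not a result reported in this section; it assumes b = 0.5 Mbit, r_p = 1.5 Mbit/sec, and r_d = 40 MByte/sec, and the function name is hypothetical.

```python
import math


def iss_disk_throughput(b, b_over, r_d, t_rot, t_seek, r_p):
    """Eq. 4: tau / ((b + b_over) / r_d + t_rot + t_seek), floored to whole streams.

    b, b_over in Mbit; r_d in MByte/s; t_rot, t_seek in ms; r_p in Mbit/s.
    """
    tau_ms = b / r_p * 1000.0
    read_ms = (b + b_over) / (r_d * 8) * 1000.0
    return math.floor(tau_ms / (read_ms + t_rot + t_seek))


if __name__ == "__main__":
    for t_seek, t_rot in ((13.79, 10), (4, 4)):
        full = iss_disk_throughput(0.5, 0.5, 40, t_rot, t_seek, 1.5)   # one-sided sub-blocks
        half = iss_disk_throughput(0.5, 0.25, 40, t_rot, t_seek, 1.5)  # two-sided sub-blocks
        print(f"t_seek={t_seek}, t_rot={t_rot}: {full} -> {half} streams per disk")
```

With t_seek = 13.79 ms and t_rot = 10 ms the floored per-disk count stays at 12, whereas with t_seek = t_rot = 4 ms it rises from 29 to 32: under these assumptions, the reduced read overhead pays off mainly when access times are small and read time makes up a larger share of each service.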
Figure 3: Network Block Schedule - Modified (horizontal axis: Disk Slot Number)
arrivals and buffer stealing often results in a significant amount of contiguous reading when the new stream is accepted, increasing the bandwidth and the read-ahead achieved. For example, consider video streams of approximately 4 Mbps, a typical value for average (TV) quality. If there are 5 currently accepted video streams and 64 MBytes of server buffer space, each stream would have approximately 12 MBytes of buffer space (or 24 seconds of video). If a new stream is accepted, there would be 10.6 MBytes per stream in steady state (or 20 seconds of video). This amount of data could accumulate from the disk in about 3 seconds, so that steady state is achieved rather quickly. The only time that the server would not have read ahead at least 20 seconds is during the first few slots of reading. With staggered arrival patterns, the server is reading from only one stream immediately after acceptance, and so the disk is substantially ahead after the first disk slot. The steady state will be reached soon enough that none of the borderline cases of buffer space and bandwidth will be encountered. Smoothing the bandwidth usage of each stream is a reasonable course of action, which reduces the resource reservation and potentially permits more simultaneous streams.
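The buffer figures in this example follow from splitting the server buffer equally among the streams. This sketch assumes, as the example does, equal shares and 1 MByte = 8 Mbit; the function name is hypothetical. Exact division gives 12.8 MBytes (25.6 s of 4 Mbps video) per stream for 5 streams and about 10.7 MBytes (21.3 s) for 6 streams; the text rounds these values down.

```python
def per_stream_buffer(total_mbytes, streams, rate_mbps):
    """Equal share of the server buffer per stream, in MBytes and seconds of video."""
    mbytes = total_mbytes / streams
    seconds = mbytes * 8 / rate_mbps   # MBytes -> Mbit, divided by Mbps
    return mbytes, seconds


if __name__ == "__main__":
    for streams in (5, 6):
        mbytes, seconds = per_stream_buffer(64, streams, 4)
        print(f"{streams} streams: {mbytes:.1f} MBytes each, {seconds:.1f} s of 4 Mbps video")
```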
4
Network Admission Control Algorithm
Once we have obtained a suitable network bandwidth characterization for each stream, the stream requests are submitted to a network admission control algorithm that determines whether there is enough outgoing network bandwidth to support these requests. The network admission control algorithm used in the CMFS is relatively simple. The maximum number of bytes that the network interface can transmit per second is easily converted to a number of blocks per disk slot,* which we hereafter refer to as maxXmit. The algorithm is shown in Figure 4 and can be summarized as follows: for each network slot, the bandwidth values of all streams are added, and as long as the sum does not exceed maxXmit, the scenario is accepted. Requests that arrive in the middle of a network slot are adjusted so that the network slot ends for each stream simultaneously. Thus, such a stream has less opportunity to fill the client buffer in that first network slot. In the sample streams, this made very little difference to the overall bandwidth required for the network block schedule, although the initial shape differed somewhat; it did not change the overall distribution of bandwidth.
    NetworkAdmissionTest(newStream, networkSlotCount)
    begin
        for netwSlot = 0 to networkSlotCount do
            sum = 0
            for i = firstConn to lastConn do
                sum = sum + NetBlocks_i[netwSlot]
            end
            if (sum > maxXmit) then return (REJECT)
        end
        return (ACCEPT)
    end
Figure 4: Network Admission Control Algorithm
The network admission control algorithm is the same algorithm that was called the "Instantaneous Maximum" disk admission control algorithm in our previous work [7]. There, this algorithm was rejected in favour of the vbrSim algorithm, which takes advantage of aggressive read-ahead in the future at the guaranteed rate or aggressive read-ahead in the past at the achieved rate. The vbrSim algorithm could be considered for network admission control: the smoothing effect enabled by sending data early could further eliminate transient network bandwidth peaks. One major benefit of vbrSim is the ability to use the server buffer space to store the data that is read ahead.
This buffer space is shared by all streams; thus, at any given time, one connection may use several Megabytes while another uses only a small amount of buffer space. For scenarios whose cumulative bandwidth approaches capacity, significant server buffer space is required to enable acceptance.
* For disk blocks of 64 KBytes and disk slots of 500 msec, 1 Mbps is approximately 1 block/slot.
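The algorithm of Figure 4 can be sketched in Python as follows. This is a minimal illustration, not the CMFS code: the schedule representation (one list of per-slot block counts per stream, aligned on a common slot index) and the function names are assumptions, and `max_xmit_blocks` converts the interface rate using the 64 KByte blocks and 500 msec slots of the footnote. As in the pseudocode, a slot whose sum equals maxXmit is still accepted.

```python
from typing import Sequence


def max_xmit_blocks(link_bps, block_bytes=64 * 1024, slot_s=0.5):
    """Whole blocks per slot the network interface can send (maxXmit)."""
    return int(link_bps * slot_s // (8 * block_bytes))


def network_admission_test(net_blocks: Sequence[Sequence[int]], max_xmit: int) -> bool:
    """ACCEPT (True) iff no network slot needs more than max_xmit blocks in total.

    net_blocks[i][s] is the number of blocks stream i (existing streams plus the
    new one) is scheduled to send in network slot s; a stream whose schedule is
    shorter is treated as idle afterwards.
    """
    num_slots = max((len(sched) for sched in net_blocks), default=0)
    for slot in range(num_slots):
        total = sum(sched[slot] for sched in net_blocks if slot < len(sched))
        if total > max_xmit:
            return False  # REJECT: this slot would exceed the interface bandwidth
    return True  # ACCEPT


if __name__ == "__main__":
    max_xmit = max_xmit_blocks(100e6)         # 100 Mbps interface -> 95 blocks/slot
    current = [[40, 35, 30], [30, 30, 30]]    # hypothetical existing schedules
    new_stream = [20, 25, 10]
    print(max_xmit, network_admission_test(current + [new_stream], max_xmit))
```

Checking every slot against a fixed maxXmit is what makes the guarantee deterministic: no slot's cumulative demand may exceed the interface capacity, regardless of how short the peak is.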
If the same relative amount of buffer space were available at each client, network send-ahead could be effective. The server model only requires two disk slots' worth of buffer space, so very little send-ahead is possible. Even this amount of buffer space is large compared with the minimum required by a decoder; for example, according to the MPEG-2 specification, space for as few as three or four frames is required.
5
Experimental Design
In order to examine the admission performance of our network admission control algorithm, we loaded a CMFS with several representative VBR video streams on several disks. Each disk contained 11 streams. We then presented a number of stream request scenarios for streams located on the same disk, to determine which of the scenarios could be accepted by the vbrSim disk admission control algorithm. The streams for each scenario were initially selected by choosing permutations of streams such that each stream was requested the same number of times for each scenario size. Thus, there were 33 scenarios that contained 7 streams, in which each of the 11 streams was selected 33 × 7 / 11 = 21 times, and 33 scenarios that contained 6 streams. There were also 33 scenarios of 5 streams each and 44 of 4 streams each. When the arrival times of the streams were staggered, the streams were requested in order of decreasing playback time to ensure that all streams in a scenario were active at some point during delivery. The scenarios for each disk were then combined with similar scenarios from other disks, and the network admission control algorithm was used to determine whether the entire collection of streams could be accepted by a multi-disk server node. The admission control algorithm was not evaluated in a running CMFS, due to limitations in the measurement techniques employed. A summary of the stream characteristics used in these experiments is given in Table 1. Each disk has a similar mix of streams, ranging from 40 seconds to 10 minutes, with similar averages in variability, stream length, and average bandwidth. The variability measure reported is the coefficient of variation (standard deviation/mean) of the number of blocks per slot.
6
Results
In this section, we compare the results of the Original algorithm with the Smoothed algorithm. The first observation that can be made is that the average bandwidth reservation is significantly greater than the average bandwidth utilization. When averaged over all scenarios, the Smoothed algorithm reserves significantly less bandwidth than the Original algorithm (113.3 Mbps versus 122.8 Mbps), both of which exceed the bandwidth utilization of 96.5 Mbps. Thus, it is reasonable to expect that the Smoothed algorithm will provide better admission performance results.
                          Disk 1      Disk 2      Disk 3      Disk 4
Largest B/W               5.89 Mbps   6.03 Mbps   6.69 Mbps   7.28 Mbps
Smallest B/W              2.16 Mbps   3.33 Mbps   2.9 Mbps    1.71 Mbps
Average B/W               4.16 Mbps   4.89 Mbps   4.61 Mbps   4.61 Mbps
Std. Dev. B/W             1.15 Mbps   0.93 Mbps   1.07 Mbps   1.64 Mbps
Largest Variability       .43         .35         .354        .354
Smallest Variability      .184        .154        .185        .119
Average Variability       .266        .233        .251        .262
Longest Duration          574 secs    462 secs    625 secs    615 secs
Shortest Duration         95 secs     59 secs     52 secs     40 secs
Average Duration          260 secs    253 secs    311 secs    243 secs
Std. Dev. of Duration     160 secs    139 secs    188 secs    181 secs
Table 1: Stream Characteristics
We grouped the scenarios by the relative amount of disk bandwidth they request, computed by adding the average bandwidths of the streams and dividing by the bandwidth achieved on the particular run. The achieved bandwidth is affected by the placement of the blocks on the disk and the amount of contiguous reading that is possible. In the first experiment, 193 scenarios were presented to a single-node CMFS configured with 4 disks. Each disk had a similar request pattern that issued requests for delivery of all the streams simultaneously. Table 2 summarizes admission performance in terms of the number of scenarios in each request range that could be accepted by both the network admission algorithm and the disk admission algorithm on each disk; these were fewer than 193.
Pct. Band   Number of Scenarios   Disk Accepted   Original Accepted   Smoothed Accepted
95-100      0                     0               0                   0
90-94       8                     0               0                   0
85-89       6                     0               0                   0
80-84       9                     3               0                   0
75-79       26                    7               0                   3
70-74       21                    15              2                   14
65-69       34                    33              18                  33
60-64       25                    25              23                  25
55-59       10                    10              10                  10
50-54       2                     2               2                   2
Total       141                   95              56                  87
Table 2: Admission Performance: Simultaneous Arrivals (% of Disk)
The four disks were able to achieve between 110 and 120 Mbps. The scenario with the largest cumulative bandwidth that the Smoothed algorithm could accept was 93 Mbps, compared with 87.4 Mbps for the Original algorithm. In this set of scenarios, the requested bandwidth varied from approximately 55% to 95% of the achievable disk bandwidth. The Original algorithm accepts only a small fraction (2/15) of the disk-accepted scenarios in the 70-74% request range and approximately half of those in the band immediately below. With the Smoothed algorithm, about half of the disk-accepted requests in the 75-79% range are accepted, and nearly all of those in the 70-74% range. The Smoothed algorithm thus increases network utilization by approximately 10 to 15%.
One major benefit of vbrSim is the ability to take advantage of the read-ahead achieved when the disk bandwidth exceeds the minimum guarantee. This benefit is enhanced when only some of the streams are actively reading from the disk, which reduces the relative number of seeks and produces a significant change in admission results. With staggered arrivals, the achieved disk bandwidth increases by approximately 10%, only 9 of the 193 scenarios are rejected by the disk system, and the network block schedules are slightly different.
Pct. Band   Number of Scenarios   Disk Accepted   Original Accepted   Smoothed Accepted
95-100      29                    22              0                   0
90-94       9                     7               0                   0
85-89       12                    12              0                   0
80-84       14                    14              0                   0
75-79       5                     5               0                   3
70-74       22                    22              1                   7
65-69       23                    23              5                   21
60-64       34                    34              10                  34
55-59       25                    25              24                  25
50-54       17                    17              17                  17
45-49       2                     2               2                   2
Total       193                   184             59                  103
Table 3: Admission Performance: Staggered Arrivals (% of Disk)
Table 3 shows the admission decisions under staggered arrivals. The Original algorithm performed significantly worse in terms of the percentage of bandwidth requests that are accepted. As mentioned before, many of the scenarios move to a lower request band due to the increase in bandwidth achieved from the disk. This shows that the increase in achieved disk bandwidth due to staggering was greater than the increase in the amount of accepted network bandwidth. For the Smoothed algorithm, relative acceptance rates are unchanged: the ability to accept streams at the network level and at the disk level kept up with the increase in bandwidth achieved from the disks. Another experiment examined the percentage of the network bandwidth that can be accepted. The results of admission for the simultaneous arrivals and the staggered arrivals cases are shown in Tables 4 and 5. We see that smoothing is an effective way to enhance admission performance. The Original algorithm can accept at most 80% of the network bandwidth on simultaneous arrivals, although it does accept most of the scenarios just below that level. The smoothing operation allows almost all scenarios below 80% to be accepted, along with a small number with greater bandwidth requests.
Pct. Band   Number of Scenarios   Original Accepted   Smoothed Accepted
95-100      0                     0                   0
90-94       5                     0                   0
85-89       4                     0                   2
80-84       18                    1                   17
75-79       32                    19                  32
70-74       19                    18                  19
65-69       11                    11                  11
60-64       2                     2                   2
Total       91                    51                  85
Table 4: Admission Performance: Simultaneous Arrivals (% of Network)
Pct. Band   Number of Scenarios   Original Accepted   Smoothed Accepted
95-100      5                     0                   0
90-94       19                    0                   2
85-89       15                    2                   14
80-84       27                    3                   27
75-79       29                    18                  29
70-74       22                    22                  22
65-69       11                    11                  11
60-64       2                     2                   2
Total       131                   59                  106
Table 5: Admission Performance: Staggered Arrivals (% of Network)
In Table 5, we see that the maximum bandwidth range requested and accepted by the disk subsystem approaches 100 Mbps. None of these high-bandwidth scenarios is accepted by either network admission algorithm. A few scenarios between 80% and 90% can be accepted with the Original algorithm. The Smoothed algorithm accepts nearly all requests below 90% of the network bandwidth, because a smaller number of streams are reading and transmitting their first network slot at the same time. With staggered arrivals, all streams but the most recently accepted one are sending at smoothed rates, which means lower peaks for the entire scenario.
The results of these experiments enable an additional aspect of the CMFS design to be evaluated: scalability. It is desirable that disk and network bandwidth scale together. In the configuration tested, 4 disks (with minRead = 23) provided 96 Mbps of guaranteed bandwidth with a network interface of 100 Mbps. At this level of analysis, this would seem a perfect match, but the tests with simultaneous arrivals did not support this conjecture. A system configured with guaranteed cumulative disk bandwidth approximately equal to the nominal network bandwidth was unable to accept enough streams at the disk to use the network resource fully. No scenario accepted by the disk requested more than 94% of the network bandwidth. In Table 4, only 4 scenarios in the 85-89% request range were accepted by the disk system; in Table 5, there were 15 such scenarios. This increase is due only to the staggered arrivals, as the same streams were requested in the same order. With staggered arrivals, network admission control became the performance limitation, as more of the scenarios were accepted by the disk. No scenario requesting less than 100 Mbps was rejected by the disk. This arrival pattern would be the common case in the operation of a CMFS. Thus, equating disk bandwidth with network bandwidth is an appropriate design point that maximizes resource usage for moderate-bandwidth video streams of short duration if the requests arrive staggered in time.
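The 96 Mbps figure can be checked with a one-line calculation. This sketch assumes that minRead is the number of 64 KByte blocks each disk is guaranteed to read per 500 msec disk slot (the block and slot sizes come from the earlier footnote); the interpretation and the function name are ours.

```python
def guaranteed_mbps(min_read_blocks, disks, block_bytes=64 * 1024, slot_s=0.5):
    """Cumulative guaranteed disk bandwidth when each disk reads min_read_blocks per slot."""
    bits_per_slot = min_read_blocks * block_bytes * 8 * disks
    return bits_per_slot / slot_s / 1e6


if __name__ == "__main__":
    print(f"{guaranteed_mbps(23, 4):.1f} Mbps")  # about 96.5 Mbps vs. a 100 Mbps interface
```

Under this reading, each disk guarantees about 24.1 Mbps, and four disks together come within a few percent of the nominal 100 Mbps interface.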
7
Related Work
The problem of characterizing the network resource requirements of Variable Bit Rate audio/video transmission has been studied extensively. Zhang and Knightly [14] provide a brief taxonomy of the approaches, from conservative peak-rate allocation to probabilistic allocation using the VBR channels of networks such as ATM. The empirical envelope is the tightest upper bound on the network utilization of VBR streams, as proven by Knightly et al. [6], but it is computationally expensive. This characterization has inspired other approximations [3] that are less accurate and less expensive to compute, but still provide useful predictions of network traffic. Traffic shaping has been introduced to reduce the peaks and variability of network utilization for inherently bursty traffic. Graf [3] examines live and stored video and provides traffic descriptors and a traffic shaper based on multiple leaky buckets. Traffic can be smoothed in an optimal fashion [12], but this requires a priori calculation over the entire stream. If only certain portions of a stream are retrieved (e.g., only I-frames for a fast-motion, low-bandwidth MPEG stream delivery), the bandwidth profile of the stream changes greatly. Four different methods of smoothing bandwidth are compared by Feng and Rexford [2], with particular cost-performance tradeoffs. The algorithms they used attempt to minimize the number of bandwidth changes and the variability in network bandwidth, as well as the computation required to construct the
349
schedule. They do not integrate this with particular admission strategies other than peak-rate allocation. This bandwidth smoothing can be utilized in a system that uses either variable bit rate network channels or constant bit rate channels. Recent work in the literature has shifted the focus away from true VBR on the network towards variations of Constant Bit-Rate Transmission [8],[14]. Since the resource requirements vary over time, renegotiation of the bandwidth [4] is needed in most cases to police the network. This method is used by Kamiyama and Li [5] in a Video-On-Demand system. McManus and Ross [8] analyze a system of delivery that prefetches enough of the data stream to allow end-to-end constant bit rate transmission of the remainder without starvation or overflow at the client, but at the expense of substantial latency in start-up. Indications are that minimum buffer utilization can be realized with a latency of between 30 seconds and 1 minute [13]. For short playback times (less than 5 minutes) that may be appropriate for news-on-demand, such a delay would be unacceptable.
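Two of the characterizations discussed above can be computed directly from a per-frame size trace. The sketch below is a minimal illustration, not code from any of the cited works: `empirical_envelope` returns, for each window length k, the largest number of bytes in any k consecutive frames (the discrete form of the empirical envelope), and `cbr_startup_delay` returns the smallest start-up delay for which CBR delivery at a given rate never starves the client. Both function names and the frame-period time unit are assumptions.

```python
from itertools import accumulate


def empirical_envelope(frame_sizes):
    """E[k-1] = largest byte count in any window of k consecutive frames."""
    prefix = [0] + list(accumulate(frame_sizes))
    n = len(frame_sizes)
    return [
        max(prefix[j + k] - prefix[j] for j in range(n - k + 1))
        for k in range(1, n + 1)
    ]


def cbr_startup_delay(frame_sizes, rate):
    """Smallest start-up delay (in frame periods) for starvation-free CBR.

    Transmission starts at time 0 at `rate` bytes per frame period. Frame i
    is played at time d + i, so it must be completely received by then:
    rate * (d + i) >= frame_sizes[0] + ... + frame_sizes[i].
    """
    delay = 0.0
    for i, cumulative in enumerate(accumulate(frame_sizes)):
        delay = max(delay, cumulative / rate - i)
    return delay
```

The naive envelope is quadratic in the trace length, which echoes the remark that the exact envelope is expensive to compute. The delay sketch ignores the client-buffer overflow constraint that [8] also considers.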
8
Conclusions and Further Work
In this paper, we have presented a network bandwidth characterization scheme for Variable Bit Rate continuous media objects that provides a detailed network block schedule indicating the bandwidth needed for each time slot. This schedule can be used to police the bandwidth allocated to each network channel, via sender-based rate control or network-based renegotiation. We observed that the Original algorithm was susceptible to disk bandwidth peaks at the beginning of network slots. The Smoothed algorithm was introduced to reduce the overall reservation by taking advantage of client buffer space and of the excess network bandwidth that must be reserved. The network admission algorithm provides a deterministic guarantee of data transmission, ensuring that no network slot has a cumulative bandwidth peak above the network interface bandwidth. Scenarios with simultaneous arrivals were limited by the disk subsystem. With staggered arrivals, the disk admission control method [7] used in the CMFS allowed the same disk configuration to shift the bottleneck to the network side. Together, the network admission control algorithm and the smoothed network bandwidth stream characterization provide an environment in which scenarios that request up to 90% of the network interface can be supported. These experiments used a single value for the size of the network slot and a single granularity for the block size; extensions to this work could include comparing admission results for different values of these two parameters.
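The deterministic network admission test summarized above can be sketched as a per-slot check. This is a minimal illustration under stated assumptions, not the CMFS interface: `admit` and its parameters are hypothetical names, bandwidth is expressed in one common unit per slot, and a new stream's schedule is assumed to begin at its arrival slot.

```python
def admit(reserved, schedule, start_slot, capacity):
    """Deterministic per-slot network admission test (hypothetical sketch).

    reserved:   list of bandwidth already committed in each network slot
    schedule:   bandwidth the new stream needs in each of its slots
    start_slot: slot index at which the new stream begins
    capacity:   network interface bandwidth, in the same unit
    Commits the reservation and returns True only if no slot would exceed
    capacity; otherwise leaves `reserved` untouched and returns False.
    """
    def load(slot):
        # Slots beyond the end of the list carry no reservation yet.
        return reserved[slot] if slot < len(reserved) else 0

    if any(load(start_slot + k) + need > capacity
           for k, need in enumerate(schedule)):
        return False

    end = start_slot + len(schedule)
    reserved.extend([0] * max(0, end - len(reserved)))
    for k, need in enumerate(schedule):
        reserved[start_slot + k] += need
    return True
```

Because every slot is checked against the full interface bandwidth, a stream is admitted only if its peak slot fits, which is what makes the guarantee deterministic rather than statistical.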
References
[1] Jill M. Boyce and Robert D. Gaglianello. Packet Loss Effects on MPEG Video Sent Over the Public Internet. In ACM Multimedia, Bristol, England, September 1998.
[2] Wu-Chi Feng and Jennifer Rexford. A Comparison of Bandwidth Smoothing Techniques for the Transmission of Prerecorded Compressed Video. In IEEE INFOCOM, pages 58-66, Los Angeles, CA, June 1997.
[3] Marcel Graf. VBR Video over ATM: Reducing Network Requirements through Endsystem Traffic Shaping. In IEEE INFOCOM, pages 48-57, Los Angeles, CA, June 1997.
[4] M. Grossglauser, S. Keshav, and D. Tse. RCBR: A Simple and Efficient Service for Multiple Time-Scale Traffic. In ACM SIGCOMM, pages 219-230, Boston, MA, August 1995.
[5] N. Kamiyama and V. Li. Renegotiated CBR Transmission in Interactive Video-on-Demand Systems. In IEEE Multimedia, pages 12-19, Ottawa, Canada, June 1997.
[6] E. W. Knightly, D. E. Wrege, J. Liebeherr, and H. Zhang. Fundamental Limits and Tradeoffs of Providing Deterministic Guarantees to VBR Video Traffic. In ACM SIGMETRICS '95. ACM, 1995.
[7] D. Makaroff, G. Neufeld, and N. Hutchinson. An Evaluation of VBR Admission Algorithms for Continuous Media File Servers. In ACM Multimedia, pages 143-154, Seattle, WA, November 1997.
[8] J. M. McManus and K. W. Ross. Video on Demand over ATM: Constant-Rate Transmission and Transport. In IEEE INFOCOM, pages 1357-1362, San Francisco, CA, October 1996.
[9] G. Neufeld, D. Makaroff, and N. Hutchinson. Design of a Variable Bit Rate Continuous Media File Server for an ATM Network. In IS&T/SPIE Multimedia Computing and Networking, pages 370-380, San Jose, CA, January 1996.
[10] G. Neufeld, D. Makaroff, and N. Hutchinson. Server-Based Flow Control in a Continuous Media File System. In 6th International Workshop on Network and Operating Systems Support for Digital Audio and Video, pages 29-35, Zushi, Japan, 1996.
[11] Ranga Ramanujan, Atiq Ahamad, and Ken Thurber. Traffic Control Mechanism to Support Video Multicast Over IP Networks. In IEEE Multimedia, pages 85-94, Ottawa, Canada, June 1997.
[12] J. D. Salehi, Z. Zhang, J. F. Kurose, and D. Towsley. Supporting Stored Video: Reducing Rate Variability and End-to-End Resource Requirements through Optimal Smoothing. In ACM SIGMETRICS, May 1996.
[13] Subhabrata Sen, Jayanta Dey, James Kurose, John Stankovic, and Don Towsley. CBR Transmission of VBR Stored Video. In SPIE Symposium on Voice, Video and Data Communications: Multimedia Networks: Security, Displays, Terminals, Gateways, Dallas, TX, November 1997.
[14] H. Zhang and E. W. Knightly. A New Approach to Support Delay-Sensitive VBR Video in Packet-Switched Networks. In 5th International Workshop on Network and Operating Systems Support for Digital Audio and Video, pages 381-397, Durham, NH, April 1995.
Design and Evaluation of Ring-Based Video Servers
Christophe Guittenit and Abdelaziz M'zoughi
Institut de Recherche en Informatique de Toulouse (IRIT), University Paul Sabatier, 31400 Toulouse, France
{guitteni,mzoughi}@irit.fr
Abstract. Video-on-demand servers (VODS) must be based on cost-effective architectures. Architectures based on clusters of PCs will therefore probably be the most suitable for building VODS: they provide the performance of a large server through the aggregation of many smaller, inexpensive nodes. In this paper, we show that the interconnection network used to build the cluster can be very expensive if it is based on a set of switches. To obtain the most cost-effective architecture, we argue that VODS should be Dual Counter-Rotating Ring-based (DCRR-based). DCRRs are very inexpensive and fulfill all the basic criteria needed for VODS architectures except scalability. To address the scalability issue, we propose to enhance the design of the DCRR by partitioning it logically, in combination with three new policies ("fast stream migration", "stream splitting" and "distributed XORing"). This design yields very cost-effective and scalable VODS able to play up to 13500 concurrent MPEG-2 streams using 252 nodes.
1
Introduction
Video-on-demand is mainly a commercial application: the less expensive it is, the more it will succeed. Building cost-effective video-on-demand servers (VODS) is a complex task, because these storage systems are subject to many constraints. A VODS must serve many continuous data streams simultaneously, store a large amount of data, be fault-tolerant, and, for some applications (e.g. News-on-Demand), offer a latency lower than one second. In order to reduce costs, the clustered video server [11] is the solution generally used to interconnect a large number of hard drives in an expandable way. It provides the equivalent performance of a large server through the combination of many smaller, relatively inexpensive nodes. The disks are grouped within storage nodes (SN). The delivery network outside the VODS is accessed via interface nodes (IN). All the nodes are distributed over the interconnection network (figure 1). To obtain the least expensive server, PCs are generally used to build the nodes, as is the case in this paper. The nodes are sufficiently autonomous that extending the VODS is as simple as adding SNs or INs. There are no centralized resources: a single resource failure cannot interrupt the service of all users.
Fig. 1. Clustered video server.
The interconnection network is the most important resource of clustered architectures. To build expandable architectures, the interconnection network must link many nodes easily, that is, both its bandwidth and its number of ports must be sufficient. To build fault-tolerant architectures, the interconnection network must allow hot swapping, in order to add or remove nodes without interrupting the VOD services. Similarly, it must have enough redundant paths to avoid service interruptions in case of a link or node failure. To obtain the most cost-effective architecture, we suggest in this paper building Dual Counter-Rotating Ring-based (DCRR-based) VODS, principally because this interconnection network is very inexpensive. At first sight, a DCRR does not include enough links to build scalable VODS. To address the scalability issue, we propose to enhance the design by logically partitioning the DCRR. However, partitioning causes load imbalance in the VODS, so we propose and evaluate two new policies to replicate videos dynamically when load imbalance occurs. These policies (fast stream migration and stream splitting) accelerate the replication process by up to 10 times. We also propose a third policy (distributed XORing) that reduces the bandwidth used to recover lost data when an SN fails in a parity RAID. These three policies allow DCRRs to be used to build scalable VODS able to play up to 13500 concurrent MPEG-2 streams (using a cluster of 252 nodes). In section 2, we quantitatively evaluate the cost of some interconnection networks and show that DCRR-based architectures are very cost-effective. Section 3 presents the DCRR, and describes and compares several types of DCRR-based interfaces.
It gives the advantages of a DCRR and also its drawbacks: its insufficient scalability and the overload caused by a rupture of links. In section 4, we introduce the concept of DCRR logical partitioning and show the trade-off inherent in this method: partitioning lessens the load on the DCRR but increases the workload imbalance. In section 5, we present the system model we use to simulate the behavior of the DCRR-based VODS. In section 6, we explore the partitioning trade-off through experimental evaluation. We show in section 7 how to balance the load in a partitioned DCRR with fast video replication, using fast stream migration and stream splitting. In section 8, we measure the overload caused by a failure in the system and propose distributed XORing to reduce this overload. We end with a summary and some additional remarks in section 9.
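The recovery step that distributed XORing targets rests on standard RAID 5 parity: a lost unit is the byte-wise XOR of the surviving units of its stripe. The sketch below shows that reconstruction, then contrasts per-link traffic for two ways of moving the surviving units along a ring segment towards the rebuilding node. The details of distributed XORing are given in section 8; the "chained" variant here, where each node folds its SU into a running partial XOR and forwards only that, is only our assumption of how partial results could be combined on the ring, and `link_traffic` is a hypothetical helper.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (RAID 5 parity)."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


def rebuild(stripe, lost_index):
    """Recover the unit at lost_index from the surviving data and parity units."""
    survivors = [unit for i, unit in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


def link_traffic(survivors, chained):
    """SUs carried by each link of a segment node0 -> node1 -> ... -> rebuilder.

    Centralized: every surviving SU travels to the rebuilding node, so link k
    carries the SUs of nodes 0..k and the last link carries all of them.
    Chained (assumed variant): each node XORs its SU into the partial result
    it received and forwards a single SU.
    """
    if chained:
        return [1] * survivors
    return [k + 1 for k in range(survivors)]
```

The model assumes all surviving SNs lie on one side of the rebuilding node; on a DCRR they could be split across both sub-rings, which would roughly halve the centralized peak.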
2
Cost evaluation of switch-based networks and ring-based networks
Interconnection networks built using standard devices are less expensive than dedicated networks such as those in MPPs. We therefore focus in this section on the following standard interfaces: ATM-based LAN networks [2,5], Fibre Channel (FC) [12], Serial Storage Architecture (SSA) [1] and the Sebring Ring [13]. Most of these interfaces can be found in two classes of topologies: switch-based and ring-based (see figure 2).
Fig. 2. A switch-based interconnection network and a ring-based interconnection network.
Switch-based networks generally include more links than ring-based networks for the same number of nodes. ATM-based LAN networks can be switched (as with ATM WANs or MANs) or ring-based (MetaRing, ATMR, CRMA-II, ...). FC and SSA interfaces are usually used to replace the SCSI bus in I/O architectures. However, these interfaces are very versatile and can be applied to interconnection networks, either switch-based or ring-based: in its ring-based variant, Fibre Channel becomes an FC Arbitrated Loop (FC-AL). The Sebring Ring is only ring-based.
In this section, we quantitatively evaluate the cost difference between switch-based ATM interfaces and ring-based ATM interfaces. We estimate the cost of the interconnection network and of the VODS as a function of the number of SNs interconnected. We chose to evaluate ATM-based interfaces because, given that the delivery network is also ATM-based, the INs can be removed by connecting the interconnection network outputs directly to ATM switches in the delivery network. Note that removing the INs has an impact on SN management: SNs must then be able to perform the operations dedicated to INs, for example the QoS protocol between the proxy and the users, service aggregation or service migration. To evaluate the switch-based ATM interconnection network, we need to build n × n switch fabrics (where n is the number of SNs). An n × n switch fabric is decomposed into smaller switches using a Clos network. Note that this network has a much larger bandwidth than a ring-based network; thus we are not comparing two interconnection networks of equivalent performance. Our objective is to show that the interconnection network can be the most expensive resource in a VODS. Figure 3 shows the results of our cost evaluation, and table 1 shows the parameters used to determine the cost of the architectures.
Table 1. Cost parameters (source: www.buymicro.com).
Device                                                     Price in US$
ATM switch, 8 ports (OC-3, 155 Mbit/s)                     6660
ATM switch, 14 ports (OC-3, 155 Mbit/s)                    9500
ATM switch, 52 ports (OC-3, 155 Mbit/s)                    89000
PCI/ATM interface card                                     500
SN (PC Intel Pentium II 400, 128 MB, 4 IDE 10 GB disks)    3000
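To see why a switch fabric built from the Table 1 models grows in cost by steps rather than linearly, the sketch below prices the cheapest fabric for a given number of nodes. It is a minimal illustration, not the authors' cost model: it assumes a two-level folded Clos in which each edge switch serves `down` nodes and has `down` uplinks (m = n middle switches, the rearrangeably non-blocking condition), each middle switch has one port per edge switch, and every switch is the cheapest Table 1 model with enough ports. Cabling and interface cards are ignored, and all function names are hypothetical.

```python
import math

# Switch prices from Table 1 (US$), keyed by number of OC-3 ports.
SWITCH_PRICES = {8: 6660, 14: 9500, 52: 89000}


def cheapest_switch(ports_needed):
    """Price of the cheapest Table 1 switch with enough ports, or None."""
    prices = [p for ports, p in SWITCH_PRICES.items() if ports >= ports_needed]
    return min(prices) if prices else None


def clos_cost(n_nodes, down):
    """Cost of a hypothetical two-level folded Clos fabric, or None.

    r edge switches each have `down` node ports and `down` uplinks; the
    `down` middle switches each have one port per edge switch.
    """
    r = math.ceil(n_nodes / down)
    if r < 2:
        return None  # a single edge switch is not a multi-stage fabric
    edge = cheapest_switch(2 * down)
    middle = cheapest_switch(r)
    if edge is None or middle is None:
        return None  # some stage needs more ports than any Table 1 model
    return r * edge + down * middle


def fabric_cost(n_nodes):
    """Cheapest option: one switch if it fits, else the best folded Clos.

    Raises ValueError when no option fits (beyond about 1350 nodes here).
    """
    options = [cheapest_switch(n_nodes)]
    options += [clos_cost(n_nodes, down) for down in range(1, n_nodes + 1)]
    return min(c for c in options if c is not None)
```

Evaluating `fabric_cost(n)` for increasing n shows jumps each time a larger switch model or an extra stage is required, whereas a ring adds a roughly fixed cost per node.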
Fig. 3. Cost comparison of a Clos network-based interconnection network and a ring-based interconnection network, as a function of the number of SNs.
Our evaluation confirms that Clos network-based architectures are much more expensive than ring-based architectures. This is due to the high cost of the switches, which account for close to 80% of the total VODS cost: the interconnection network more than doubles the cost of the VODS. Figure 3 also shows that the ring cost grows proportionally to the number of SNs, whereas the cost of switch-based networks grows in fits and starts. These jumps occur because more powerful (and therefore more expensive) switches must be used as the traffic grows. These results can be extended to FC or SSA. This evaluation shows that using an inexpensive interconnection network drastically reduces the price of a VODS: ring-based networks can lead to truly cost-effective VODS. The ring network is inexpensive because it includes few links. Our objective in the following sections is to find policies that compensate for this lack of links in order to build scalable VODS.
3
Dual Counter-Rotating Rings (DCRR)
For fault-tolerance reasons, the ring is actually made up of two unidirectional sub-rings; data transmission on one sub-ring is carried out in the opposite direction to transmission on the other sub-ring, forming a dual counter-rotating ring. This topology is equivalent to a single ring composed of bi-directional links. The DCRR is thus tolerant of a single node or link failure. When a node or a link fails, both sub-rings are broken in the same place, and the four extremities of the two broken rings are joined to form a new ring (figure 4).
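The cost of degraded mode can be quantified by counting hops. The sketch below compares shortest-path hop counts on an intact DCRR of n nodes with hop counts after one link fails. It assumes that each sender simply uses the direction that avoids the break (source rerouting); with wrapping at the break, as in figure 4, some packets may travel even farther. Links are numbered so that link i joins node i and node (i + 1) mod n; the function names are hypothetical.

```python
def hops_normal(n, a, b):
    """Shortest hop count between nodes a and b on an intact DCRR of n nodes."""
    cw = (b - a) % n
    return min(cw, n - cw)


def hops_degraded(n, a, b, broken):
    """Hop count when link `broken` (node broken <-> node broken+1) has failed.

    For a != b the broken link lies on exactly one of the two arcs between
    a and b, so only one direction remains usable.
    """
    cw = (b - a) % n
    # The clockwise path a -> b uses links a, a+1, ..., a+cw-1.
    crosses_break = (broken - a) % n < cw
    return n - cw if crosses_break else cw


def mean_hops(n, hop_fn):
    """Mean hop count over all ordered pairs of distinct nodes."""
    pairs = [(a, b) for a in range(n) for b in range(n) if a != b]
    return sum(hop_fn(a, b) for a, b in pairs) / len(pairs)
```

For 8 nodes, the mean path grows from 16/7 (about 2.29) hops to 3 hops once the ring degenerates into a line, so every surviving link carries more traffic.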
Fig. 4. A Dual Counter-Rotating Ring working in normal mode and in degraded mode.
Table 2 shows interfaces that could be used to form a DCRR-based interconnection network. A very important feature of this type of network is spatial reuse, i.e., the ability to carry out concurrent transmissions on independent links. FC-AL is the only interface that does not allow spatial reuse: the ring is viewed as a single resource, and a node must gain control of the ring before transmitting its packet (as on a bus). We will see in section 4 that spatial reuse is needed to partition the DCRR and thus to build scalable VODS.
Table 2. Comparison of ring-based interfaces. Every interface allows hot swap.
Interface      Max. throughput on one sub-ring (MB/s)   Spatial reuse
ATMR           312 (OC-48)                              Yes
SSA            20                                       Yes
FC-AL          125                                      No
Sebring Ring   532                                      Yes
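The benefit of spatial reuse can be illustrated with a fluid model of one sub-ring. In the sketch below, transfers are routed clockwise and each link i carries traffic from node i to node (i + 1) mod n. With spatial reuse, disjoint transfers proceed in parallel and the busiest link bounds the completion time from below. Without it (FC-AL style), the loop is a single shared resource and transfers are serialized. The model ignores arbitration and framing overhead, and the function names are hypothetical.

```python
def clockwise_link_loads(n, transfers):
    """Bytes carried by each link i -> (i + 1) % n of one sub-ring.

    transfers: list of (src, dst, size) tuples routed clockwise.
    """
    loads = [0] * n
    for src, dst, size in transfers:
        for k in range((dst - src) % n):
            loads[(src + k) % n] += size
    return loads


def time_with_spatial_reuse(n, transfers, bandwidth):
    # Lower bound: links work in parallel, so the busiest link dominates.
    return max(clockwise_link_loads(n, transfers)) / bandwidth


def time_without_spatial_reuse(transfers, bandwidth):
    # The whole loop is held by one transfer at a time.
    return sum(size for _, _, size in transfers) / bandwidth
```

For example, four transfers of 100 units between neighboring pairs of nodes on an 8-node ring finish in 1 time unit with spatial reuse but take 4 units when the loop must be arbitrated, assuming 100 units per time unit in both cases.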
SSA allows spatial reuse but the maximum throughput on its links is not sufficient to build large servers. We will see in section 6 that high throughputs on links reduce the load imbalance in large VODS.
Both ATMR and the Sebring Ring have enough bandwidth and concurrency to be good interconnection networks for VODS. At IRIT (Institut de Recherche en Informatique de Toulouse, Computer Science Research Institute of Toulouse), we are developing a prototype¹ of a VODS based on the Sebring Ring. We chose the Sebring Ring because of its large bandwidth and because this network is a PCI-PCI bridge. This makes building a PC-based architecture easier, and it allows PCI cards to be connected directly to the ring (i.e. the cards do not have to be in a PC). This feature is very useful for alleviating the potential bottleneck that exists in each IN. The main drawback of a DCRR is that its diameter (defined as the maximum among the lengths of the shortest paths between all possible pairs of nodes [14]) grows as O(n), where n is the number of nodes in the ring. As a result, the mean throughput on every link increases in proportion to the number of nodes (given a uniform distribution of transfer addresses over all nodes), and links become saturated when the ring integrates too many nodes. In summary, a DCRR is very expandable, because it is easy to add resources, but it is insufficiently scalable, because an added resource does not necessarily lead to a gain in performance. Moreover, the connectivity of a DCRR (defined as the minimum number of nodes or links whose removal disconnects the network into two or more components [14]) is small: it is equal to 2. Thus, if a link breaks, a large part of the load is diverted around the rupture and clutters the remaining links. These two drawbacks could dramatically limit the usefulness of a DCRR for large VODS. In the next sections, we show how to lessen them.
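The linear growth of the diameter and of the mean link load can be checked numerically. This sketch assumes, as the text does, a uniform distribution of destinations: each node sends `rate_per_node` spread evenly over the other n - 1 nodes along shortest paths, and the DCRR has 2n directed links (n per sub-ring). The function names are hypothetical.

```python
def diameter(n):
    """Longest shortest path on a bidirectional ring of n nodes."""
    return n // 2


def mean_hops(n):
    """Mean shortest-path length between distinct nodes of the ring."""
    return sum(min(d, n - d) for d in range(1, n)) / (n - 1)


def mean_link_load(n, rate_per_node):
    """Mean load per directed link under uniform shortest-path traffic.

    Total offered traffic is n * rate_per_node; each unit occupies
    mean_hops(n) links on average, spread over 2 * n directed links.
    """
    return n * rate_per_node * mean_hops(n) / (2 * n)
```

For an 8-node ring the mean link load is 8/7 of a node's sending rate; for 64 nodes it is about 8.1, roughly eight times larger, so adding nodes raises the load on every existing link.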
4
Logical partitioning of a DCRR-based VODS
Disk array partitioning has been proposed in [3,6,7] to improve fault tolerance and to make the management of RAIDs easier. Partitioning increases the performance of storage systems that include a large number of disk drives [3]. In this section, we show that partitioning is also very useful for building large clustered VODS based on a low-performance interconnection network such as a DCRR. Figure 5 illustrates our proposal for adapting partitioning to a DCRR-based VODS. Each SN is the PC described in table 1, and INs contain PCI/ATM 155 Mbit/s interface cards. Data are striped in order to distribute the workload uniformly over all SNs. Redundant information is distributed over every SN and allows data to be recovered in case of an SN failure: a RAID is created. Different SNs can therefore serve a user during the same service. On the other hand, during a service, data are always transmitted by the same IN. Indeed, transfers on the delivery network, between the INs and the users, are carried out using connection-oriented protocols with resource reservation at connection time (e.g. ATM, or IPv6 with RSVP). Changing the IN during a service would require a new connection and therefore a new resource reservation. Thus, a packet emitted by an SN can cover many links (at most half of the total number of links in one sub-ring) before reaching the right IN. In order to reduce the accumulation of throughput on links near INs (as shown in RAID A, figure 5), spatial reuse is used to partition the ring into several independent RAIDs, forming a logically partitioned DCRR. SNs are grouped around one or several INs. Every RAID contains whole videos. The partitioning degree is defined as the number of independent RAIDs that the VODS includes.
¹ This project is financed by the Conseil Régional de Midi-Pyrénées (Regional Council of Midi-Pyrénées) and the CNRS (Centre National de la Recherche Scientifique, National Center for Scientific Research).
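The accumulation of throughput near an IN, and its reduction by logical partitioning, can be illustrated with a small link-load count. The layout below is hypothetical and is not taken from figure 5. The n nodes are split into `degree` contiguous RAIDs of equal size (n is assumed divisible by `degree`), each RAID has one IN in its middle, and every SN sends one unit of throughput to its RAID's IN along the shorter direction of the DCRR.

```python
def partition_assignment(n, degree):
    """Map each node to the IN of its RAID (None for the IN itself).

    Assumes n is divisible by degree; RAIDs are contiguous blocks of
    n // degree nodes with one IN in the middle of each block.
    """
    size = n // degree
    in_of = [None] * n
    for p in range(degree):
        start = p * size
        in_node = start + size // 2
        for node in range(start, start + size):
            if node != in_node:
                in_of[node] = in_node
    return in_of


def max_link_load(n, degree, rate=1.0):
    """Largest per-link load when every SN sends `rate` to its RAID's IN."""
    cw = [0.0] * n   # cw[i]: load on link i -> i+1
    ccw = [0.0] * n  # ccw[i]: load on link i -> i-1
    for src, dst in enumerate(partition_assignment(n, degree)):
        if dst is None:
            continue
        d = (dst - src) % n
        if d <= n - d:  # clockwise is no longer than counter-clockwise
            for k in range(d):
                cw[(src + k) % n] += rate
        else:
            for k in range(n - d):
                ccw[(src - k) % n] += rate
    return max(cw + ccw)
```

With 32 nodes, the busiest link (the one entering the IN) carries 16 streams without partitioning, 4 streams with a partitioning degree of 4, and 2 streams with a degree of 8: about n / (2 × degree) in this layout.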
Fig. 5. Partitioning of a DCRR into two independent RAIDs.
DCRR partitioning improves scalability and fault tolerance. Scalability is no longer limited by the maximum throughput on a link, and fault tolerance is improved because the server can tolerate multiple failures provided that they occur in different RAIDs. Moreover, fewer users are affected by an SN failure (since the RAIDs are smaller, multiple and independent). Note that logical partitioning is very different from physical partitioning. In the latter case, the DCRR is physically divided into several DCRRs; every data transmission needed for load balancing or data rebuilding after a failure (see sections 7 and 8) must then use the delivery network and could overload it. Moreover, a physically partitioned ring offers less versatility when nodes are added or removed: it cannot be logically reorganized to take advantage of new resources. We therefore think that partitioning must be logical. However, partitioning has a drawback: the load on the SNs could become unbalanced, since videos are not striped over all SNs but only over the SNs belonging to the same RAID, so a RAID can be overloaded if it stores very popular videos. There is thus a trade-off governed by the partitioning degree. If the partitioning degree is high (i.e. the RAIDs are small), the load on the RAIDs will probably be unbalanced. If the partitioning degree is low (i.e. the RAIDs are large), the throughput on the links near INs becomes excessive and fault tolerance is poor. To study and take advantage of this trade-off, we ran simulations on a model of the VODS designed in this section.
5
System model
User streams are served in rounds. During a round, every SN generates a given number of packets that are sent to the users through INs. A packet must contain enough data to allow the user to "wait" for the end of the round. To simplify the model, we assume that the size of a packet is equal to the striping unit (SU) of the RAID. In our system, SNs are PCs able to produce 30 MB/s. Every stream has a throughput of 500 KB/s, so every SN serves 60 MPEG-2 streams simultaneously. The SU size is 50 KB; therefore every round lasts 1/10th of a second. In order to use all of the disks' bandwidth, buffers are used on the SNs to transfer data from the disks to the interconnection network (so a block read from a disk in one access is larger than an SU). Videos last 3600 seconds, and every SN can store 25 videos. Concerning data layout, we chose the Flat-Left-Symmetric RAID 5 [9]; this parity placement is particularly efficient at distributing the workload uniformly over the SNs. Link throughput is set at 250 MB/s for each sub-ring. Users' requests are modeled according to a Zipf distribution [4] with a coefficient of 0.271, because it closely models movie rental store demand.
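The round arithmetic and the request distribution of the system model can be written out directly. The stream count and round length follow from the figures above, taking 30 MB/s as 30 000 KB/s (decimal units are an assumption). For the Zipf distribution, the sketch assumes the convention common in VOD studies, where the access probability of the video of rank i is proportional to 1 / i^(1 - θ) with θ = 0.271. The text does not state which convention is used, and the function names are hypothetical.

```python
def streams_per_sn(sn_rate_kb_s=30_000, stream_rate_kb_s=500):
    """Concurrent streams one SN can feed (decimal units assumed)."""
    return sn_rate_kb_s // stream_rate_kb_s


def round_duration_s(su_kb=50, stream_rate_kb_s=500):
    """One SU per stream per round must cover one round of playback."""
    return su_kb / stream_rate_kb_s


def zipf_probabilities(n_videos, theta=0.271):
    """Access probability of each video, most popular first.

    Assumed convention: p_i proportional to 1 / i**(1 - theta), so theta = 0
    gives classic Zipf and theta = 1 gives a uniform distribution.
    """
    weights = [1.0 / i ** (1.0 - theta) for i in range(1, n_videos + 1)]
    total = sum(weights)
    return [w / total for w in weights]
```

With the default parameters, `streams_per_sn()` gives the 60 streams and `round_duration_s()` the 0.1 s round stated above.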
6
Evaluation of DCRR partitioning
A DCRR is logically partitioned into independent RAIDs in order to reduce the throughput of the most loaded link. To evaluate the effect of partitioning the DCRR, we carried out simulations on VODS with partitioning degrees of 1, 2, 4, 8, 16 and 32. We measure the number of streams that the architecture is able to play as a function of the number of SNs interconnected.
Fig. 6. Scalability of logically partitioned DCRR-based VODS (number of streams played vs. number of SNs).
Figure 6 shows that partitioning greatly improves the scalability of a DCRR-based VODS. With a small partitioning degree (1, 2 and 4), the links of the DCRR are overloaded because of the throughput accumulation that occurs on links near INs. On the other hand, for high partitioning degrees the improvement in scalability is not linear in the partitioning degree. As pointed out in section 4, some RAIDs contain more popular videos than others, and these RAIDs are then too small to handle the load. The workload becomes unbalanced, which severely diminishes the scalability of the VODS. Figure 6 clearly shows that a partitioning degree of 32 yields architectures that are less scalable than those using a partitioning degree of 8 or 16. With high partitioning degrees, the component that limits the scalability of a VODS is no longer the DCRR: it is the RAID. In order to improve scalability effectively with high partitioning degrees, it is necessary to decrease the sensitivity of the partitioned ring to imbalance in user demand. As the popularity of videos is very difficult to foresee, the load must be balanced dynamically by moving data, so that popular videos do not become concentrated in a few RAIDs. We therefore use a well-known server-array load-balancing technique: replication.
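The growing imbalance at high partitioning degrees can be reproduced qualitatively with a static placement. This is not the paper's simulation. The sketch repeats the Zipf helper so that it runs on its own, with the same assumed convention p_i ∝ 1 / i^(1 - θ), and it places videos round-robin by popularity rank, so rank i goes to RAID i mod degree. It then reports the demand on the busiest RAID relative to the mean RAID demand; with equal-sized RAIDs, the busiest one saturates first.

```python
def zipf_probabilities(n_videos, theta=0.271):
    """Access probabilities, most popular first (p_i ~ 1 / i**(1 - theta))."""
    weights = [1.0 / i ** (1.0 - theta) for i in range(1, n_videos + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def imbalance_ratio(n_videos, degree, theta=0.271):
    """Busiest-RAID demand divided by mean RAID demand.

    Videos are placed round-robin by popularity rank, which already spreads
    the most popular titles over different RAIDs.
    """
    demand = [0.0] * degree
    for rank, p in enumerate(zipf_probabilities(n_videos, theta)):
        demand[rank % degree] += p
    return max(demand) / (1.0 / degree)
```

With 800 videos (for example, 32 SNs storing 25 videos each), the ratio is 1 for a single RAID and rises with the partitioning degree even under this favorable placement. This is why the text turns to dynamic replication.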
7
Load balancing
So as to balance the load dynamically, we replicate some videos onto several RAIDs. The use of replication to balance the load in VODS has already been studied [6-8]. In this paper, however, we put the emphasis on policies designed to accelerate the replication process. Since replications are done once the load imbalance occurs, they must be completed as soon as possible in order to shorten the period of imbalance. Replications are done from overloaded RAIDs (original RAIDs) to less loaded RAIDs (target RAIDs). For the replications to be fast enough, the share of bandwidth dedicated to replication must be increased in the target RAID and, more critically, in the original RAID (which is already overloaded). In this section, we propose and evaluate two new policies that accelerate the replication process without consuming extra bandwidth in the original RAID.
First, we propose that videos be replicated dynamically by taking advantage of the streams used to serve users (service streams). We call this policy stream splitting. The packets belonging to a service stream are duplicated: the original service stream goes towards the IN (service to the user) and the duplicated load balancing stream goes towards the SNs of the target RAID. Thus a large part of each replication is done using disk accesses already performed to serve customers. To handle the data that are not replicated by the duplicated load balancing streams, dedicated load balancing streams are created. Note that there are far fewer dedicated streams than duplicated streams.
Second, in order to move services from the original RAID to the target RAID, it is not necessary to wait for the video to be entirely replicated on the target RAID. Assuming that the throughput of a dedicated load balancing stream is equal to or greater than the throughput of the service stream, a service can be moved as soon as the user needs an SU that is already on the target RAID. This policy, fast stream migration, accelerates load balancing.
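The saving from stream splitting can be estimated by counting the SUs that active service streams read anyway during the replication period. The sketch below assumes, as in the system model, that each service stream reads one SU per round. Every SU read by a service stream during a window of `window` rounds is copied to the target RAID at no extra disk cost on the original RAID, and the remaining SUs must be read by dedicated load balancing streams. The fast stream migration test simply checks whether the next SU a user needs is already on the target RAID; it relies on the stated assumption that the dedicated stream is at least as fast as the service stream. All names are hypothetical.

```python
def split_coverage(video_len, positions, window):
    """SU indices copied for free by stream splitting.

    positions: current SU index of each active service stream of the video
    window:    number of rounds the replication lasts (one SU per stream per round)
    """
    covered = set()
    for pos in positions:
        covered.update(range(pos, min(pos + window, video_len)))
    return covered


def dedicated_reads(video_len, positions, window):
    """SUs that dedicated load balancing streams must still read."""
    return video_len - len(split_coverage(video_len, positions, window))


def can_migrate(next_su, replicated):
    """Fast stream migration: move the service as soon as the next SU it
    needs is on the target RAID (assumes the replication stream is at least
    as fast as the service stream, so the user does not overtake it)."""
    return next_su in replicated
```

For a 100-SU video with two viewers at SUs 0 and 50 and a 30-round window, stream splitting supplies 60 SUs and only 40 need dedicated reads.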