Communications in Computer and Information Science
130
Mohammad S. Obaidat Joaquim Filipe (Eds.)
e-Business and Telecommunications 6th International Joint Conference, ICETE 2009 Milan, Italy, July 7-10, 2009 Revised Selected Papers
Volume Editors

Mohammad S. Obaidat
Monmouth University, West Long Branch, NJ, USA
E-mail: [email protected]

Joaquim Filipe
INSTICC and IPS, Estefanilha, Setúbal, Portugal
E-mail: [email protected]
ISSN 1865-0929; e-ISSN 1865-0937
ISBN 978-3-642-20076-2; e-ISBN 978-3-642-20077-9
DOI 10.1007/978-3-642-20077-9
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2011923681
CR Subject Classification (1998): C.2, J.1, K.4.4, K.6.5, K.4.2, D.4.6
© Springer-Verlag Berlin Heidelberg 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Preface
The present book includes extended and revised versions of a set of selected best papers from the 6th International Joint Conference on e-Business and Telecommunications (ICETE), which was held in July 2009 in Milan, Italy. This conference reflects a continuing effort to increase the dissemination of recent research results among professionals who work in the areas of e-business and telecommunications.

ICETE is a joint international conference integrating four major areas of knowledge that are divided into four corresponding conferences: ICE-B (International Conference on e-Business), SECRYPT (International Conference on Security and Cryptography), WINSYS (International Conference on Wireless Information Networks and Systems) and SIGMAP (International Conference on Signal Processing and Multimedia Applications). The program of this joint conference included several outstanding keynote lectures presented by internationally renowned distinguished researchers who are experts in the various ICETE areas. Their keynote speeches contributed to heightening the overall quality of the program and the significance of the theme of the conference.

The conference topic areas define a broad spectrum in the key areas of e-business and telecommunications. This wide view made ICETE appealing to a global audience of engineers, scientists, business practitioners, ICT managers and policy experts. The papers accepted and presented at the conference demonstrated a number of new and innovative solutions for e-business and telecommunication networks and systems, showing that the technical problems in these closely related fields are challenging and worthwhile to approach with an interdisciplinary perspective such as that promoted by ICETE.

ICETE 2009 received 300 papers in total, with contributions from 50 different countries on all continents, which shows the success and global dimension of the conference. To evaluate each submission, a double-blind paper evaluation method was used: each paper was reviewed by at least two experts from the International Program Committee, and most papers received three or more reviews. In the end, 114 papers were selected for oral presentation and publication, corresponding to a 38% acceptance ratio. Of these, only 34 were accepted as full papers (11% of submissions) and 80 as short papers. Additionally, 51 papers were accepted for poster presentation.

We hope that you will find this collection of the best ICETE 2009 papers an excellent source of inspiration as well as a helpful reference for research in the aforementioned areas.

July 2010
Joaquim Filipe Mohammad S. Obaidat
Conference Committee
Conference Co-chairs

Joaquim Filipe, Polytechnic Institute of Setúbal / INSTICC, Portugal
Mohammad S. Obaidat, Monmouth University, USA
Program Co-chairs

Pedro Assunção, Polytechnic Institute of Leiria, Portugal (SIGMAP)
Rafael Caldeirinha, Polytechnic Institute of Leiria, Portugal (WINSYS)
Sérgio Faria, Polytechnic Institute of Leiria, Portugal (SIGMAP)
Eduardo Fernández-Medina, University of Castilla-La Mancha, Spain (SECRYPT)
Javier Hernando, Technical University of Catalonia, Spain (SECRYPT)
Manu Malek, Stevens Institute of Technology, USA (SECRYPT)
David Marca, University of Phoenix, USA (ICE-B)
Mohammad S. Obaidat, Monmouth University, USA (WINSYS)
Boris Shishkov, IICREST / Delft University of Technology, The Netherlands (ICE-B)
Marten van Sinderen, University of Twente, The Netherlands (ICE-B)
Organizing Committee

Sérgio Brissos, INSTICC, Portugal
Helder Coelhas, INSTICC, Portugal
Vera Coelho, INSTICC, Portugal
Andreia Costa, INSTICC, Portugal
Bruno Encarnação, INSTICC, Portugal
Bárbara Lima, INSTICC, Portugal
Raquel Martins, INSTICC, Portugal
Elton Mendes, INSTICC, Portugal
Carla Mota, INSTICC, Portugal
Vitor Pedrosa, INSTICC, Portugal
Mónica Saramago, INSTICC, Portugal
José Varela, INSTICC, Portugal
Pedro Varela, INSTICC, Portugal
ICE-B Program Committee
Geetha Abeysinghe, UK Osman Abul, Turkey Shakil Ahktar, USA Fahim Akhter, UAE Antonia Albani, The Netherlands Panagiotes Anastasiades, Greece Anteneh Ayanso, Canada Gilbert Babin, Canada Eduard Babulak, Canada Ladjel Bellatreche, France Morad Benyoucef, Canada Jun Bi, China Ch. Bouras, Greece Andrei Broder, USA Erik Buchmann, Germany Rebecca Bulander, Germany Christer Carlsson, Finland Michelangelo Ceci, Italy Wojciech Cellary, Poland Patrick Y.K. Chau, Hong Kong Dickson Chiu, China Soon Chun, USA Jen-Yao Chung, USA Michele Colajanni, Italy Rafael Corchuelo, Spain Hepu Deng, Australia Peter Dolog, Denmark Khalil Drira, France Yanqing Duan, UK Erwin Fielt, Australia Flavius Frasincar, The Netherlands George Giaglis, Greece Claude Godart, France Paul Grefen, The Netherlands Giovanna Guerrini, Italy Mohand Saïd Hacid, France Hyoil Han, USA G. Harindranath, UK Milena Head, Canada
Birgit Hofreiter, Austria Weihong Huang, UK Christian Huemer, Austria Ela Hunt, UK Arun Iyengar, USA Nallani Chackravartula Sriman Narayana Iyengar, India Ibrahim Kushchu, UK Anton Lavrin, Slovak Republic Dahui Li, USA Yinsheng Li, China Chin Lin, Taiwan Sebastian Link, New Zealand Liping Liu, USA Hui Ma, New Zealand Zaki Malik, USA Tokuro Matsuo, Japan Gavin McArdle, Ireland Jan Mendling, Germany Brian Mennecke, USA Paolo Merialdo, Italy Adrian Mocan, Germany Ali Reza Montazemi, Canada Maurice Mulvenna, UK Mieczyslaw Muraszkiewicz, Poland Li Niu, Australia Dan O’Leary, USA Dana Petcu, Romania Krassie Petrova, New Zealand Pascal Poncelet, France Pak-Lok Poon, China Philippos Pouyioutas, Cyprus Ana Paula Rocha, Portugal Joel Rodrigues, Portugal Gustavo Rossi, Argentina David Ruiz, Spain Jarogniew Rykowski, Poland Markus Schneider, USA Timos Sellis, Greece
Quah Tong Seng, Singapore Sushil Sharma, USA Quan Z. Sheng, Australia Mario Spremic, Croatia Zhaohao Sun, Australia Thompson Teo, Singapore Ramayah Thurasamy, Malaysia Thanassis Tiropanis, UK
Laurentiu Vasiliu, Ireland Tomas Vitvar, Austria Adam Vrechopoulos, Greece Michael Weiss, Canada Jongwook Woo, USA Lai Xu, Switzerland Guangquan Zhang, Australia Constantin Zopounidis, Greece
ICE-B Auxiliary Reviewer
José María García, Spain
SECRYPT Program Committee Sudhir Aggarwal, USA Isaac Agudo, Spain Gail-joon Ahn, USA Luiz Carlos Pessoa Albini, Brazil Eduard Babulak, Canada Yun Bai, Australia Dan Bailey, USA Ken Barker, Canada Peter Bertok, Australia Carlo Blundo, Italy Indranil Bose, Hong Kong Richard R. Brooks, USA Gil Pino Caballero, Spain Roy Campbell, USA Zhenfu Cao, China Chin-Chen Chang, Taiwan Rocky K.C. Chang, Hong Kong Pascale Charpin, France Yu Chen, USA Zesheng Chen, USA Stelvio Cimato, Italy Debbie Cook, USA Nathalie Dagorn, France Paolo D’arco, Italy Anupam Datta, USA Bart De Decker, Belgium Shlomi Dolev, Israel Robin Doss, Australia Nicola Dragoni, Denmark
Falko Dressler, Germany Robert Erbacher, USA Eduardo B. Fernandez, USA Simone Fischer-Hübner, Sweden Mariagrazia Fugini, Italy Steven Furnell, UK Carlos Goulart, Brazil Lisandro Granville, Brazil Rüdiger Grimm, Germany Stefanos Gritzalis, Greece Drew Hamilton, USA Javier Hernando, Spain Amir Herzberg, Israel Alejandro Hevia, Chile Jiankun Hu, Australia Min-Shiang Hwang, Taiwan Markus Jakobsson, USA Christian Damsgaard Jensen, Denmark Hai Jiang, USA Dong Seong Kim, USA Seungjoo Kim, Korea, Republic of Michael Kounavis, USA Evangelos Kranakis, Canada Ralf Kuesters, Germany Chi-Sung Laih, Taiwan Chin-Laung Lei, Taiwan Albert Levi, Turkey Shiguo Lian, China
Antonio Lioy, Italy Luis de la Cruz Llopis, Spain Olivier Ly, France Khamish Malhotra, UK Yoshifumi Manabe, Japan Olivier Markowitch, Belgium Gianluigi Me, Italy Ali Miri, Canada Atsuko Miyaji, Japan Mohamed Mosbah, France Yi Mu, Australia Jalal Al Muhtadi, Saudi Arabia James Muir, Canada Volker Müller, Luxembourg Juan Gonzalez Nieto, Australia José Luis Oliveira, Portugal Martin Olivier, South Africa Rolf Oppliger, Switzerland Carles Padro, Spain Günther Pernul, Germany Marinella Petrocchi, Italy Raphael C.-W. Phan, UK Roberto Di Pietro, Italy Krerk Piromsopa, Thailand George Polyzos, Greece Miodrag Potkonjak, USA Douglas Reeves, USA Peter Reiher, USA Rodrigo Roman, Spain David Samyde, USA Aldri Santos, Brazil Susana Sargento, Portugal Damien Sauveron, France
Erkay Savas, Turkey Bruno Schulze, Brazil Dimitrios Serpanos, Greece Alice Silverberg, USA Haoyu Song, USA Paul Spirakis, Greece Mario Spremic, Croatia Yannis Stamatiou, Greece Aaron Striegel, USA Willy Susilo, Australia Kitt Tientanopajai, Thailand Ferucio Laurentiu Tiplea, Romania Jorge E. López de Vergara, Spain Luca Viganò, Italy Sabrina de Capitani di Vimercati, Italy Haining Wang, USA Hua Wang, Australia Lingyu Wang, Canada Xinyuan (Frank) Wang, USA Mariemma I. Yagüe, Spain Wei Yan, USA Alec Yasinsac, USA George Yee, Canada Sung-Ming Yen, Taiwan Meng Yu, USA Ting Yu, USA Moti Yung, USA Nicola Zannone, The Netherlands Fangguo Zhang, China Zhongwei Zhang, Australia Sheng Zhong, USA André Zúquete, Portugal
SECRYPT Auxiliary Reviewers
Ahmad Roshidi Amran, UK Jean-Philippe Aumasson, Switzerland Hayretdin Bahsi, Turkey Balasingham Balamohan, Canada Sonia Chiasson, Canada Prokopios Drogkaris, Greece José Luis García-Dorado, Spain Ken Grewal, USA
Divya Kolar, USA Michele Nogueira Lima, France Men Long, USA Behzad Malek, Canada Leonardo Martucci, Sweden Felipe Mata, Spain Amit Mishra, India Vincent Naessens, Belgium
Germán Retamosa, Spain Jerry Sui, Canada Giacomo Victor Mc Evoy Valenzano, Brazil
Kristof Verslype, Belgium Jie Wang, UK Ge Zhang, Sweden
SIGMAP Program Committee
Gwo Giun (Chris) Lee, Taiwan Burak Acar, Turkey Harry Agius, UK João Ascenso, Portugal Pradeep K. Atrey, Canada Eduard Babulak, Canada Azeddine Beghdadi, France Adel Belouchrani, Algeria Amel Benazza-Benyahia, Tunisia Shuvra Bhattacharyya, USA Adrian Bors, UK Abdesselam Bouzerdoum, Australia Jun Cai, Canada Wai-Kuen Cham, China Chin-Chen Chang, Taiwan Liang-Gee Chen, Taiwan Shu-Ching Chen, USA Ryszard S. Choras, Poland Paulo Lobato Correia, Portugal José Alfredo Ferreira Costa, Brazil Michel Crucianu, France Aysegul Cuhadar, Canada Rob Evans, Australia David Dagan Feng, Australia Wu-Chi Feng, USA Yun Fu, USA Mathew George, USA Zabih Ghassemlooy, UK Lorenzo Granai, UK Christos Grecos, UK Mislav Grgic, Croatia Patrick Gros, France William Grosky, USA Malka Halgamuge, Australia Omar Ait Hellal, USA Hermann Hellwagner, Austria Richang Hong, Singapore Guo Huaqun, Singapore
Jiri Jan, Czech Republic Chehdi Kacem, France Mohan Kankanhalli, Singapore Michael Kipp, Germany Yiannis Kompatsiaris, Greece Constantine Kotropoulos, Greece C.-C. Jay Kuo, USA Jeongkyu Lee, USA Jiann-Shu Lee, Taiwan Jing Li, UK Rastislav Lukac, Canada Antonio De Maio, Italy Manuel Perez Malumbres, Spain Hong Man, USA Andreas Maras, Greece Tao Mei, China Majid Mirmehdi, UK Klaus Moessner, UK Alejandro Murua, Canada Montse Pardas, Spain Raffaele Parisi, Italy Jong Hyuk Park, Korea, Republic of Andrew Perkis, Norway Béatrice Pesquet-Popescu, France Ashok Popat, USA Viktor Prasanna, USA Xiaojun Qi, USA Gang Qian, USA Maria Paula Queluz, Portugal Anthony Quinn, Ireland Rudolf Rabenstein, Germany Matthias Rauterberg, The Netherlands Stefan Robila, USA Nuno Rodrigues, Portugal Brunilde Sanso, Canada Shin'ichi Satoh, Japan Xiaowei Shao, Japan Timothy K. Shih, Taiwan
Mingli Song, China John Aa. Sorensen, Denmark Yutaka Takahashi, Japan Jinhui Tang, Singapore Dacheng Tao, Singapore Daniel Thalmann, Switzerland Abdellatif Benjelloun Touimi, UK Steve Uhlig, Germany Meng Wang, China Zhiyong Wang, Australia
Toyohide Watanabe, Japan Michael Weber, Germany Kim-hui Yap, Singapore Yuan Yuan, UK Cha Zhang, USA Tianhao Zhang, USA Zhi-Li Zhang, USA Huiyu Zhou, UK Ce Zhu, Singapore
SIGMAP Auxiliary Reviewers Sofia Benjebara, Tunisia Bo Liu, China Tanaphol Thaipanich, USA
WINSYS Program Committee
Özgür B. Akan, Turkey Vicente Alarcon-Aquino, Mexico Shawkat Ali, Australia Eduard Babulak, Canada Marinho Barcellos, Brazil Novella Bartolini, Italy Bert-Jan van Beijnum, The Netherlands Paolo Bellavista, Italy Luis Bernardo, Portugal Rajendra V. Boppana, USA Rebecca Braynard, USA Jiannong Cao, Hong Kong Qi Cheng, Australia Sheng-Tzong Cheng, Taiwan Young-June Choi, USA Iñigo Cuiñas, Spain Arindam Das, USA Val Dyadyuk, Australia Tamer Elbatt, USA Patrik Floreen, Finland Chuan Heng Foh, Singapore Shengli Fu, USA Jie Gao, USA Damianos Gavalas, Greece
Matthias Hollick, Spain Raj Jain, USA Jehn-Ruey Jiang, Taiwan Eduard Jorswieck, Germany Abdelmajid Khelil, Germany Boris Koldehofe, Germany Vinod Kulathumani, USA Thomas Kunz, Canada Wing Kwong, USA Xu Li, France Qilian Liang, USA Chin Lin, Taiwan Kathy J. Liszka, USA Hsi-pin Ma, Taiwan Aniket Mahanti, Canada Pascale Minet, France Klaus Moessner, UK Gero Muehl, Germany Jean Frederic Myoupo, France Ed Pinnes, USA Andreas Pitsillides, Cyprus Christian Prehofer, Finland Daniele Puccinelli, Switzerland Nicholas Race, UK Rabie Ramadan, Egypt
S.S. Ravi, USA Peter Reichl, Austria Daniele Riboni, Italy António Rodrigues, Portugal Michele Rossi, Italy Pierluigi Salvo Rossi, Italy Jörg Roth, Germany Christian Schindelhauer, Germany Pablo Serrano, Spain Kuei-Ping Shih, Taiwan Tor Skeie, Norway Shensheng Tang, USA
Cesar Vargas-Rosales, Mexico Enrique Vazquez, Spain Dimitrios Vergados, Greece Yu Wang, USA Muhammed Younas, UK Ming Yu, USA Hans-Jurgen Zepernick, Sweden Yu Zheng, China Hao Zhu, USA Yanmin Zhu, China Artur Ziviani, Brazil
WINSYS Auxiliary Reviewers Johnathan Ishmael, UK Hai Ngoc Pham, Norway
Aggeliki Sgora, Greece Pedro Vieira, Portugal
Invited Speakers

Blagovest Shishkov, Bulgarian Academy of Sciences, Bulgaria
Pierangela Samarati, University of Milan, Italy
David Marca, University of Phoenix, USA
Frank Leymann, University of Stuttgart, Germany
Gottfried Vossen, University of Münster, Germany
Table of Contents

Invited Papers

Stochastic Modeling and Statistical Inferences of Adaptive Antennas in Wireless Communications (Blagovest Shishkov) . . . . . 3
Protecting Information Privacy in the Electronic Society (Sabrina De Capitani di Vimercati, Sara Foresti, and Pierangela Samarati) . . . . . 20
The Three Fundamental e-Business Models (David A. Marca) . . . . . 37
Web 2.0: From a Buzzword to Mainstream Web Reality (Gottfried Vossen) . . . . . 53

Part I: e-Business

Exploring Price Elasticity to Optimize Posted Prices in e-Commerce (Burkhardt Funk) . . . . . 71
Designing Digital Marketplaces for Competitiveness of SMEs in Developing Countries (Valentina Ndou, Pasquale Del Vecchio, and Laura Schina) . . . . . 82
Strategic Planning, Environmental Dynamicity and Their Impact on Business Model Design: The Case of a Mobile Middleware Technology Provider (Antonio Ghezzi, Andrea Rangone, and Raffaello Balocco) . . . . . 94
Collaboration Strategies in Turbulent Periods: Effects of Perception of Relational Risk on Enterprise Alliances (Marco Remondino, Marco Pironti, and Paola Pisano) . . . . . 110
A Classification Schema for Mobile-Internet 2.0 Applications (Marcelo Cortimiglia, Filippo Renga, and Andrea Rangone) . . . . . 126
Plug and Play Transport Chain Management: Agent-Based Support to the Planning and Execution of Transports (Paul Davidsson, Johan Holmgren, Jan A. Persson, and Andreas Jacobsson) . . . . . 139

Part II: Security and Cryptography

Exploiting Crosstalk Effects in FPGAs for Generating True Random Numbers (Octavian Creţ, Radu Tudoran, Alin Suciu, and Tamas Györfi) . . . . . 159
Offline Peer-to-Peer Broadcast with Anonymity (Shinsaku Kiyomoto, Kazuhide Fukushima, and Keith M. Martin) . . . . . 174
Wireless Authentication and Transaction-Confirmation Token (Daniel V. Bailey, John Brainard, Sebastian Rohde, and Christof Paar) . . . . . 186
Optimizations for High-Performance IPsec Execution (Michael G. Iatrou, Artemios G. Voyiatzis, and Dimitrios N. Serpanos) . . . . . 199
An Efficient Protocol for Authenticated Group Key Agreement in Heterogeneous Networks (Mounita Saha and Dipanwita RoyChowdhury) . . . . . 212
Privacy Enhancements for Hardware-Based Security Modules (Vijayakrishnan Pasupathinathan, Josef Pieprzyk, and Huaxiong Wang) . . . . . 224
Flexible and Time-Based Anonymous Access Restrictions (Kristof Verslype and Bart De Decker) . . . . . 237

Part III: Signal Processing and Multimedia Applications

Robust Numeric Set Watermarking: Numbers Don't Lie (Gaurav Gupta, Josef Pieprzyk, and Mohan Kankanhalli) . . . . . 253
Corrupting Noise Estimation Based on Rapid Adaptation and Recursive Smoothing (François Xavier Nsabimana, Vignesh Subbaraman, and Udo Zölzer) . . . . . 266
Recommender System: A Personalized TV Guide System (Paulo Muniz de Ávila and Sérgio Donizetti Zorzo) . . . . . 278
An Enhanced Concept of a Digital Radio Incorporating a Multimodal Interface and Searchable Spoken Content (Günther Schatter and Andreas Eiselt) . . . . . 291

Part IV: Wireless Information Networks and Systems

Modulation-Mode Assignment in Iteratively Detected and SVD-Assisted Broadband MIMO Schemes (Andreas Ahrens and César Benavente-Peces) . . . . . 307
Wireless Sensor Resource Usage Optimisation Using Embedded State Predictors (David Lowe, Steve Murray, and Xiaoying Kong) . . . . . 320
A Self-configuring Middleware Solution for Context Management (Tudor Cioara, Ionut Anghel, and Ioan Salomie) . . . . . 332
Device Whispering: An Approach for Directory-Less WLAN Positioning (Karl-Heinz Krempels, Sebastian Patzak, Janno von Stülpnagel, and Christoph Terwelp) . . . . . 346

Author Index . . . . . 359
Invited Papers
Stochastic Modeling and Statistical Inferences of Adaptive Antennas in Wireless Communications Blagovest Shishkov Institute of Mathematics & Informatics, Bulgarian Academy of Sciences Acad. G. Bonchev Str., Bl.8, Sofia 1113, Bulgaria [email protected]
Abstract. Wireless ad-hoc networks can be considered as a means of linking portable user terminals that meet temporarily in locations where connection to a network infrastructure is difficult. Hence, techniques are needed that contribute to the development of high-performance receiving antennas with the capability of automatically eliminating surrounding interference. Solutions to this problem based on conventional linear antenna arrays nevertheless require complex architectures, resulting in high power dissipation. In the current paper we consider novel algorithms for the analog aerial beamforming of a reactively controlled adaptive antenna array acting as a non-linear spatial filter with variable parameters. Being based on stochastic approximation theory, such algorithms have great potential for use in mobile terminals and therefore provide important support for wireless communication networks. The resulting unconventional adaptive antennas can lead to dramatically simplified architectures, leading in turn to significantly lower power dissipation and fabrication costs.

Keywords: Adaptive beamforming, ESPAR antenna, Wireless ad-hoc community networks, Interference reduction, Stochastic approximation, Rate of convergence.
1 Introduction

We consider in the current paper novel algorithms for the analog aerial beamforming of a reactively controlled adaptive antenna array acting as a non-linear spatial filter with variable parameters, and in the remainder of the paper we will use the term 'smart antenna' for such adaptive antennas. A smart antenna in general consists of an antenna array and an adaptive processor. What is impressive about such an antenna is that by applying a simple technique like the least-mean-squares (LMS) algorithm, one can achieve automatic adjustment of the variable weights of the array's signal processor (Figure 1). For this reason, smart antenna technology is already playing a major role with regard to current wireless communication networks and systems. Furthermore, decreases in integrated circuit cost and antenna advancement have made smart antennas attractive in terms of both cost and implementation, even on small devices.
Fig. 1. Basic Array Processing Scheme
In justifying further the smart antenna solution, we will compare it with other possible alternatives. We consider in general several receiver architectures for adaptive beamforming [1], as exhibited in Figure 2:
Fig. 2. Receiver architectures with adaptive beamforming in four different stages
These architectures work on continuous iterations of a sequence to: (1) receive a signal at each element; (2) weight each signal and sum them up; (3) estimate the signal-to-interference-and-noise ratio (SINR) of the sum; (4) change the weighting factors unless the SINR meets the goal.

In the DBF architecture (Digital Beamforming), Fig. 2a, processes (2), (3), and (4) are done in digital circuits, preferably in a single chip. This sounds quite cost-effective. However, process (1) must be done prior to the digital stage. This implies an array of low-noise RF amplifiers, frequency converters, and A/D converters. Such analog circuits lead the system to high weight, high power consumption, and high fabrication cost. The problem becomes more serious as the number of elements increases.
We compare this widespread approach to beamforming with an alternative, aerial beamforming (ABF), Fig. 2d. This aims at achieving the ultimate reductions in the size, weight, power dissipation, and fabrication cost of smart antennas. ABF works upon electromagnetic coupling among array elements. The name ABF reflects that processes (1) and (2) are done in space, not in circuits. The weights are controlled by changing the equivalent element lengths and their coupling strength. To accomplish this electronically, voltage-controlled devices such as varactor diodes are employed. Since ABF requires only one RF port to feed, the RF circuit scale is drastically reduced compared with the other configurations. The ESPAR antenna [2][3][4][5] is an example of a pragmatic implementation of the ABF concept. A discussion of Fig. 2b and Fig. 2c would support the above claim but is omitted for brevity.

The Electronically Steerable Passive Array Radiator (ESPAR) antenna has been proposed for low-cost analog aerial beamforming and has shown strong potential for application in wireless communications and especially in mobile user terminals [6][7]. In [8] the direction of maximum gain is controlled by varying the load reactances of a moderate number of dipoles (resonance mode) and an optimum-seeking univariate search procedure is applied; in [9] experimental results and theory are presented for a reactively steered adaptive array in a power-inversion mode; in [10] adaptive beamforming is performed with switch-based operation and does not provide continuous steering. None of these papers meets the demand of adaptively cancelling interferences and reducing the additive noise. In [11][12] equations of voltages and currents of electromagnetic coupling among the radiators are described.

Direct wireless transmissions from device to device make it possible to communicate even where there is no telecommunication infrastructure such as the base stations of cellular mobile phone systems or the access points of wireless LANs (Local Area Networks). If the devices cooperate to relay each other's data packets they can form ad hoc wireless networks. Ad hoc wireless networks are expected to have many applications providing communications in areas where fixed infrastructure networks do not provide sufficient coverage.

The essence of the beamforming functionality of the ESPAR antenna is complex weighting in each branch of the array and adaptive optimization of the weights via adjustable reactances [11][12]. Hence, the main contribution of this reported research is a novel non-linear algorithm for the beamforming of unconventional antenna arrays, based on stochastic approximation theory.

The remainder of this paper is organized as follows: Section 2 introduces the ESPAR antenna through which our 'message' will be conveyed. Section 3 elaborates some issues on signal modelling concerning the antenna. Section 4 introduces the objective function that concerns the optimization. Section 5 introduces the stochastic approximation method to be applied and considers the relevant statistical procedure with regard to the algorithm. Section 6 reflects the simulation results and performance analysis. Section 7 presents an outline of further development of the algorithm, namely the related blind algorithm that allows for intelligent modification of the objective function. Finally, Section 8 presents the conclusions.
2 ESPAR Antenna-Configuration and Formulation The basic configuration of the ESPAR antenna is depicted in Fig.3. Learning curves of (M+1) – element ESPAR antenna with M=6 are depicted in Fig.4.
Fig. 3. A 7-element adaptive ESPAR antenna
The 0-th element is an active radiator located at the center of a circular ground plane. It is a λ/4 monopole (where λ is the wavelength) and is connected to the RF receiver in a coaxial fashion. The remaining M elements of λ/4 monopoles are passive radiators surrounding the active radiator symmetrically on a circle of radius R = λ/4. These M elements are loaded with varactors having reactance x_m (m = 1, 2, …, M). Thus adjusting the values of the reactances can change the patterns of the antenna. In practical applications, the reactances x_m may be constrained in certain ranges, e.g., from x_min Ω to x_max Ω. The vector denoted by x = [x_1, x_2, …, x_M] is called the reactance vector.

Let the ESPAR antenna be operating in transmit mode. The central radiator is excited by an RF signal source with internal voltage v_s and output impedance z_s. The voltages and currents are mutually related by electromagnetic coupling among the radiators, and the following scalar circuit equations hold:

v_0 = v_s − z_s i_0  (1)

v_m = −j x_m i_m,  m = 1, 2, …, M  (2)

where x_m is the m-th varactor's reactance. Employing the voltage and current vectors v = [v_0, v_1, …, v_M]^T and i = [i_0, i_1, …, i_M]^T, the above scalar equations are transformed into the single vector equation

v = v_s u_0 − X i  (3)

where the unit vector u_0 = [1, 0, …, 0]^T and the diagonal reactance matrix X = diag[z_s, j x_1, …, j x_M] are associated. Since the voltage and current vectors are mutually
related by electromagnetic coupling among the radiators, they must satisfy another vector circuit equation

i = Y v  (4)

where Y = [y_kl] is referred to as the admittance matrix and y_kl is the mutual admittance between the elements k and l (0 ≤ k, l ≤ M). By eliminating the voltage vector from equations (3) and (4) we obtain the current vector explicitly:

i = w_uc = v_s (I + YX)^(-1) y_0 = v_s {I − [YX] + [YX]^2 − [YX]^3 + ⋯} y_0  (5)
where y_0 is the first column of Y and I is the identity matrix. According to the theorem of reciprocity, the receive-mode radiation pattern of an antenna is equal to that of the transmit mode, and therefore the representation of Eq. (5) is also valid for the receive-mode scheme of Fig. 4. In fact the term v_s is a factor concerning the gain of the antenna. A key role is played by the RF current weight vector i [12], which does not have independent components but is an unconventional one, w_uc. Its nonlinear relationship with the reactance vector x (see Eq. (5)) has not been studied until now. Conventional techniques of determining w are useless, and adaptive beamforming of the ESPAR antenna must be considered as a nonlinear spatial filter that has variable parameters.
3 Signal Model

The following notations are used: b, b and B stand for scalar, vector and matrix, in that order. Similarly B^*, B^T, B^H and ‖B‖ represent the complex conjugate, transpose, complex conjugate transpose and norm of B, respectively. Let E(⋅) denote the statistical expectation operator. Let us consider an environment having P statistically independent signals (no multipath propagation). One of them plays the role of the desired user terminal signal (the signal of interest, SOI) and the others play the role of the undesired user terminal signals (interferences). Consider the ESPAR antenna geometry of Fig. 3. The M elements are uniformly spaced at azimuth angles
ϕ_m = 2π(m−1)/M,  m = 1, 2, …, 6, relative to a reference axis. Let θ be the angle of the direction of arrival (DOA) of the plane wave front of a signal s(t) relative to the same reference axis. The delay between the pair of the m-th element and the 0-th element is R cos(θ − ϕ_m), and the steering vector in that direction is a slight modification of that of a circular array:

a(θ_p) = [1, e^{j(ω_c/c) R cos(θ_p − ϕ_1)}, …, e^{j(ω_c/c) R cos(θ_p − ϕ_M)}]^H  (6)

where ω_c is the carrier frequency and c is the velocity of propagation. The output of the ESPAR antenna can be expressed as the model:
y(t) = w_uc^H Σ_{p=1}^{P} a(θ_p) s_p(t) + ν(t)  (7)

where s_p(t), p = 1, …, P, is the waveform of the p-th user terminal; ν(t) is complex-valued additive Gaussian noise (AGN) and the weight vector w_uc was defined in Eq. (5). Next, equation (7) can be written in the form

y(t) = Σ_p g(θ_p) s_p(t) + ν(t)  (8)

where g(θ_p) = w_uc^H a(θ_p) is the antenna response in the direction θ_p. The beampattern is generally defined as the magnitude squared of g(θ, ω). Note that each component of w_uc affects both the spatial and temporal response of the beamformer.
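To make the chain from reactances to antenna output concrete, the following is a minimal numerical sketch of Eqs. (5)–(8); the admittance matrix Y, the source values and all signals are placeholders chosen for illustration, not measured antenna data from the paper.

```python
import numpy as np

M = 6                               # number of parasitic monopoles
lam = 1.0                           # wavelength (normalized)
R = lam / 4                         # radius of the parasitic ring
k = 2 * np.pi / lam                 # wavenumber omega_c / c
z_s, v_s = 50.0, 1.0                # source impedance and internal voltage

rng = np.random.default_rng(0)
A = rng.normal(size=(M + 1, M + 1)) + 1j * rng.normal(size=(M + 1, M + 1))
Y = (A + A.T) / 2                   # symmetric placeholder admittance matrix

def current_weights(x):
    """RF current vector i = w_uc = v_s (I + Y X)^(-1) y_0 of Eq. (5)."""
    X = np.diag(np.concatenate(([z_s], 1j * x)))
    return v_s * np.linalg.solve(np.eye(M + 1) + Y @ X, Y[:, 0])

def steering(theta):
    """Steering vector a(theta) of Eq. (6) for a DOA angle in radians."""
    phi = 2 * np.pi * np.arange(M) / M          # element azimuths phi_m
    return np.concatenate(([1.0], np.exp(1j * k * R * np.cos(theta - phi))))

def antenna_output(x, thetas, signals, noise):
    """Output y(t) of Eqs. (7)-(8); `signals` holds the P waveforms as rows."""
    w = current_weights(x)
    g = np.array([np.vdot(w, steering(th)) for th in thetas])  # w^H a(theta_p)
    return g @ signals + noise
```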
4 Objective Function

As was stated already, the output of the ESPAR antenna is not linearly connected to the adjustable reactances, and spatial filtering in the adaptive process must be applied carefully. The character of this non-linearity for the ESPAR antenna has not been studied until now. That is why the model is evaluated numerically rather than by presenting an analytical solution of optimal adaptive beamforming. The performance measures used to evaluate waveform estimators such as spatial filters are summarized here. Let the error ε(t) = y(t) − d(t) be defined as the difference between the actual response of the ESPAR antenna y(t) and the desired response d(t) (an externally supplied input sometimes called the "training signal"). Let us turn to measures such as the mean squared error (MSE) or normalized MSE (NMSE) of the output waveform y(t) relative to the desired waveform d(t):
MSE(y, d) = E[ε(t) ε^*(t)] = E|y(t) − d(t)|^2  (9)

NMSE(y, d) = MSE(g y, d) / E[d(t) d^*(t)] = 1 − |ρ_yd|^2  (10)

where g is the complex scalar

g = E[y(t) d^*(t)] / E[y(t) y^*(t)]

and

ρ_yd = E[y(t) d^*(t)] / √(E[y(t) y^*(t)] E[d(t) d^*(t)])

is the correlation coefficient. A closely related measure is the signal-to-interference-and-noise ratio (SINR), which can be expressed as

SINR(y, d) = |ρ_yd|^2 / (1 − |ρ_yd|^2)  (11)
All of these measures are applicable to single time series, but are often averaged over multiple realizations or multiple data segments of length N. In this paper adaptive beamforming of the ESPAR antenna is proposed by using NMSE = 1 − |ρ_yd|^2 (see Eq. (10)) as an objective function and minimizing it via a stochastic descent technique in accordance with stochastic approximation theory. Let d(n) and y(n) be the N-dimensional vectors of discrete-time samples of the desired signal d(t) and the output signal y(t). Then the following objective function has to be minimized:

J_N(x) = NMSE(y, d) = 1 − (⟨d^H(n) y(n)⟩_N ⟨y^H(n) d(n)⟩_N) / (⟨|d(n)|^2⟩_N ⟨|y(n)|^2⟩_N)  (12)
where the symbol ⟨⋅⟩_N denotes discrete-time averaging. This objective function is a real-valued scalar function that depends (via y) nonlinearly on the reactance vector x, x ∈ R^M. In conventional beamforming, the objective function is usually a quadratic (convex) function of the weights and its derivative with respect to w is a linear function of w, thus requiring the linear filtering theory of Wiener for the optimization problem. By contrast, the objective function for the ESPAR antenna, Eq. (12), is a non-convex function of the reactances x, and its derivative with respect to x is a highly non-linear function of x. Thus we have to resort to non-linear filtering theory, which has not been completely studied and applied to adaptive beamforming. This is the basic difficulty. Second, in general, the error performance surface of the iterative procedure may have local minima in addition to the global minimum, more than one global minimum may exist, and so on. Stochastic gradient-based adaptation is used: starting from any initial (arbitrary) value of the reactance vector x, it improves recursively with an increasing number of iterations k (k = 1, 2, …, K), moving over the error-performance surface without knowledge of the surface itself.
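For reference, a minimal sample-based implementation of the objective of Eq. (12) might read as follows (a sketch assuming d and y are length-N complex arrays of desired and output samples):

```python
import numpy as np

def nmse_objective(d, y):
    """J_N of Eq. (12); the implicit 1/N averaging factors cancel out."""
    num = np.vdot(d, y) * np.vdot(y, d)   # <d^H y>_N <y^H d>_N  (up to N^2)
    den = np.vdot(d, d) * np.vdot(y, y)   # <|d|^2>_N <|y|^2>_N  (up to N^2)
    return 1.0 - (num / den).real
```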
5 The Stochastic Approximation Method

In the ESPAR antenna optimization problem, the functional J(x) is not explicitly known and the usual numerical analysis procedures cannot be used. Actually, the system can be simulated or observed and sample values of J(x), at various settings for x, noted and used for finding the optimal solution. Unfortunately, it is quite common that one cannot actually observe J(x), but rather J(x) plus error or noise. Moreover, for the ESPAR antenna the surface of the objective J(x) in (M+1)-dimensional space is so complicated that the gradient vector and Hessian matrix are not available, and we have to search for the optimal solution by using nonlinear models and derivative-free methods. Let J(x_k) denote the "large sample" average yield (N→∞) of Eq. (12) in the k-th run (iteration) when the parameter is x_k. This quantity corresponds to the statistical expectation of the objective function. The actual observed (not averaged or "small sample" averaged) yield J_N(x_k) = J(x_k) + ξ_k of that output quantity may fluctuate from run to run, owing to variations in the input processes, residues left from previous runs,
to unavoidable errors in the system, and so on. Here ξ_k = J_N(x_k) − J(x_k) is an observation noise and must not be confused with the AGN ν(n) in Eq. (7). We recast the optimization problem of the ESPAR antenna into the framework of the well-known method of stochastic approximation (SA) for obtaining or approximating the best value of the parameter x_k. If J(⋅) were known and smooth, then the basic Newton procedure could be used:
x_{k+1} = x_k − H^(-1)(x_k) g(x_k)  (13)
under suitable conditions on H(⋅), where g(x) = ∇J(x) is the gradient vector and H(x) = ∇²J(x) is the Hessian matrix of J(⋅) at x. Our objective function is not analytically tractable with respect to x, but it can be observed, and "noise"-corrupted observations J_N(⋅) can be taken at x_k. The solution is sought within the framework of the stochastic approximation method, which is based on a "noisy" finite-difference form of Eq. (13). In order to set this up, we need some additional definitions. Let {Δx_k} denote a sequence of positive finite-difference intervals of the reactances {x_i, i = 1, …, M}, tending to zero as k→∞, and let e_i denote the unit vector in the i-th coordinate direction. Also, let x_k be the k-th estimate of the optimal (minimizing) value of the parameter and J_N(x_k) be the k-th actual noise-corrupted observation of the performance. Define the finite-difference vectors g^d(x_k, Δx_k), g^dN(x_k, Δx_k) by their i-th components, and the vector observation noise ξ_k, as follows:
g_i^d(x_k, Δx_k) = [J(x_k + Δx_k e_i) − J(x_k − Δx_k e_i)] / (2Δx_k)
g_i^dN(x_k, Δx_k) = [J_N(x_k + Δx_k e_i) − J_N(x_k − Δx_k e_i)] / (2Δx_k)  (14)
ξ_k = g^dN(x_k, Δx_k) − g^d(x_k, Δx_k)

The stochastic approximation procedure is given by the algorithm

x_{k+1} = x_k − μ_k g^dN(x_k, Δx_k) = x_k − μ_k [g^d(x_k, Δx_k) + ξ_k]  (15)

Note that both gradients in (15) depend on two arguments: the reactance vector and the reactance step. We can observe only actual noise-corrupted values of the performance J_N(⋅,⋅). Eq. (16) describes the adaptive control algorithm of the ESPAR antenna:

x_{k+1} = x_k − μ_k g^dN(x_k, Δx_k) / ‖g^dN(x_k, Δx_k)‖  (16)
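Putting Eqs. (14)–(16) together, the adaptive loop can be sketched as below; here j_noisy stands for one noisy observation J_N(x) (for example, nmse_objective evaluated on fresh data), and the step-size constants are common stochastic-approximation choices assumed for illustration, not values from the paper.

```python
import numpy as np

def fd_sa_search(j_noisy, x0, K=200, mu=0.5, alpha=0.602, dx=1.0, gamma=0.101):
    """Finite-difference SA with the normalized update of Eq. (16).

    Each iteration spends 2*M noisy evaluations of J_N (Eq. (14))."""
    x = np.asarray(x0, dtype=float)
    M = x.size
    for k in range(K):
        mu_k = mu / (k + 1) ** alpha     # step schedule mu_k = mu (k+1)^-alpha
        dx_k = dx / (k + 1) ** gamma     # shrinking difference interval Dx_k
        g = np.empty(M)
        for i in range(M):               # two-sided differences, Eq. (14)
            e = np.zeros(M)
            e[i] = dx_k
            g[i] = (j_noisy(x + e) - j_noisy(x - e)) / (2 * dx_k)
        x = x - mu_k * g / np.linalg.norm(g)   # normalized update, Eq. (16)
    return x
```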
The significance of the control step parameter μ_k is of paramount importance for the performance of the adaptive control algorithm and will be discussed further. The sequence {μ_k} must consist of positive numbers, tending to zero and such that Σμ_k = ∞, in order to help "asymptotically cancel" the noise effects and to ensure convergence to the "right" point or set. Define d_k = x_k − x_opt and let g(x_opt) = 0; then the quantity u_k = (k+1)^β d_k, β ∈ (0,1), has an asymptotically normal distribution. It was proven that x_k converges to x_opt in the weak sense (in probability):
x_k → x_opt (in probability). In the SA literature one usually uses

μ_k = μ(k+1)^(-α),  Δx_k = Δx(k+1)^(-γ)

Exploring Price Elasticity to Optimize Posted Prices in e-Commerce

…10,000), whose individual prices vary only slightly between the company's and its competitors'. The technical platform is a Java-based, proprietary web application which is operated, maintained and further developed by the IT department of the company. The backend system was also individually developed for the company. In the preparatory phase (outlined in chapter 3), we decided at the beginning of the project that rather than test individual product prices we would study the optimization of the service charge collected with each order. With respect to the technical implementation we decided to use cookies to identify users participating in the price test. When a visitor arrived on the web site during the live phase it was checked whether he already had a cookie related to the price test. If not, a cookie was set containing information about which subgroup the visitor had been assigned to. If there was already a "price test cookie", it was used to show the appropriate service charge for the subgroup. In addition to the cookie, a session ID was generated that contained an ID referring to the appropriate subgroup (SGID). Since the shop system supported URL rewriting (hence URLs contained the SGID), users who bookmarked the site and deleted the cookies could still be identified as belonging to a particular subgroup. Furthermore, the SGID was used to communicate and use prices consistently throughout the shopping process of the visitor, even when cookies were not enabled by the user. The first time the visitor saw the service charge was in the electronic basket. Therefore the conversion rate for each subgroup was not determined as the plain ratio of buyers to total visitors but as the ratio of buyers (with a certain SGID) to visitors (with the same SGID) who entered the electronic basket at least once during their session and so were able to see the service charge (in the following chapter this value is referred to as CR_Basket). The first phase (including project set-up, software design and implementation as well as internal communication and preparation) took about six weeks. The first part of the live phase took about 6 days, the second part about 3 weeks. Results were presented 3 months after the beginning of the project. Alongside the technical requirements there are also organizational ones. Obviously, price tests should be managed by the department/people responsible for price management in the company, taking into account the limits with respect to strategic positioning and competitors. Furthermore, before starting the price test, employees who actively enter into contact with customers of the website (especially service staff) or who are contacted by customers need to be informed and trained in how to deal with price enquiries by telephone and email, canceling orders, and possible complaints about different prices caused by the price test. In the latter case the service staff was asked to communicate the display of different prices as a technical error and to apologize by sending out a €5 gift certificate to the complaining visitor/customer. As long as the number of gift certificates sent out is small it does not impact the price test.
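The subgroup-assignment mechanics described above can be sketched as follows; this is a hypothetical illustration (the function names, cookie key and uniform split are assumptions, not the shop's actual Java code).

```python
import random

SERVICE_CHARGES = {0: 0.00, 1: 2.50, 2: 5.00, 3: 7.50}  # EUR, one per subgroup

def assign_subgroup(request_cookies):
    """Return (sgid, cookie_to_set); reuse the 'price test cookie' if present."""
    if "price_test_sgid" in request_cookies:
        return int(request_cookies["price_test_sgid"]), None
    sgid = random.randrange(len(SERVICE_CHARGES))        # uniform random split
    return sgid, ("price_test_sgid", str(sgid))

def service_charge(sgid):
    """Service charge shown in the electronic basket for this subgroup."""
    return SERVICE_CHARGES[sgid]

def rewrite_url(url, sgid):
    """Embed the SGID in the URL, mimicking the shop's URL rewriting."""
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}sgid={sgid}"
```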
5 Results

Before the price test the service charge was €2.50 per order. At the beginning we decided to investigate the price range between €0.00 and €7.50. In order to be able to reach valid conclusions after a short period of time it was decided to test four values for the service charge (€0.00 / €2.50 / €5.00 / €7.50). Following basic statistics with respect to the necessary sample size, the required time period for the test can be estimated by:

T_Test ≈ N_Bin (CI/e)^2 CR_Basket (1 − CR_Basket) / V_B  (1)
where N_Bin is the number of subgroups (in the field study 4 subgroups were used), CI is the chosen confidence interval (1σ), e is the allowed error (=1%), CR_Basket is the ratio of buyers to visitors who have seen the electronic basket (≈40%, as an approximate value before the price test was carried out), and V_B is the number of visitors seeing the electronic basket per day (=1,500 per day; here only a fraction of all visitors to the website participated in the price test). A calculation using company data from the field study leads to a necessary time period for the study T_Test of 6.4 days. The price test was carried out on 6 consecutive days and yielded the following results.

5.1 Interpretation

Figure 2 shows the dependency of the basket conversion rate CR_Basket as a function of the service charge SC. As expected, the conversion rate decreases (statistically significantly) with the service charge, indicating the customers' decreasing willingness to pay the increasing service charge. It should be emphasized that each data point (and the corresponding samples) consists of sales with a wide variety of products and a wide range of prices. Nevertheless, allowing for standard deviation the samples are identical. At this point the exact functional dependency is not important for the argument and thus we decided not to use theoretical functions from microeconomics. For practical purposes, in the field study the empirical data were fitted using a second-order polynomial function CR_fit:

CR_fit = (−2.2 ± 0.6) · 10^(-3) SC^2 + (6.2 ± 5.1) · 10^(-3) SC + 0.385 ± 0.008  (2)
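As a quick numerical check of Eqs. (1) and (2) with the values reported above (a worked example, not code from the study):

```python
n_bin, ci, e = 4, 1.0, 0.01     # subgroups, 1-sigma interval, 1% allowed error
cr, v_b = 0.40, 1500            # basket conversion rate, basket visitors per day

t_test = n_bin * (ci / e) ** 2 * cr * (1 - cr) / v_b
print(f"required test period: {t_test:.1f} days")        # -> 6.4 days

cr_fit = lambda sc: -2.2e-3 * sc**2 + 6.2e-3 * sc + 0.385
for sc in (0.0, 2.5, 5.0, 7.5):  # central fit at the four tested service charges
    print(f"SC = {sc:.2f} EUR -> CR_fit = {cr_fit(sc):.3f}")
```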
Fig. 2. For each of the 4 subgroups there is one data point shown in the figure, covering the range from €0.00 to €7.50. The sample size is about 1,900 for each data point and so the error is about ±1.1%, slightly above the planned value of 1.0%, which is due to the fact that the sample size turned out to be a bit lower than expected. The dashed line shows the average contribution margin per visitor who has seen the electronic basket.

In order to find the optimal service charge the full economics of the company under consideration had to be taken into account. In this paper we initially make the following simplifying assumptions (more realistic assumptions would not change the main results of the study): the service charge should be set in such a way that it maximizes the average contribution margin of a visitor who has seen the electronic basket. We assume that the contribution margin of a single order is given by CM_prod + SC − CS, where CM_prod is the average contribution margin of the products sold in an average order and CS (= €3.00) is the internal order fulfillment cost (including e.g. postage, wages, machines). For our analysis CM_prod (= €18.00) is assumed to be independent of SC, even though the average revenue in a subgroup per visitor (and thus the contribution margin) might be positively correlated with SC, since it is likely that the willingness to accept a higher service charge will rise with increasing revenue. Formally, the problem is to find SC so that it maximizes

ACM_visitor = CR_fit(SC) (CM_prod − CS + SC)  (3)
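A short sketch of this maximization (the symbols are spelled out just below), using the central fit coefficients of Eq. (2) and the stated margins; the grid search is an illustrative choice, and with the rounded coefficients the optimum lands close to the €5.50 reported in the text:

```python
import numpy as np

cm_prod, cs = 18.00, 3.00                       # EUR, values stated in the text
cr_fit = lambda sc: -2.2e-3 * sc**2 + 6.2e-3 * sc + 0.385
acm = lambda sc: cr_fit(sc) * (cm_prod - cs + sc)   # ACM_visitor(SC), Eq. (3)

grid = np.linspace(0.0, 7.5, 751)               # 1-cent grid over tested range
sc_opt = grid[np.argmax(acm(grid))]
print(f"optimal SC ~ {sc_opt:.2f} EUR, ACM ~ {acm(sc_opt):.2f} EUR")
print(f"ACM at the old 2.50 EUR charge: {acm(2.50):.2f} EUR")   # ~6.77 EUR
```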
where ACM_visitor is the average margin contributed by each visitor who sees the electronic basket at least once during his visit. Applying the above values, the optimal service charge is about €5.50, compared to a service charge of €2.50 at the beginning of the field study. By setting the service charge to €5.50, the average contribution margin per visitor ACM_visitor could be increased by 7%, from €6.78 to €7.25.

5.2 Long-Term Impact

The procedure described above optimizes the contribution margin of an individual visitor with respect to his first order. As an indicator for the long-term impact on customer retention we use the repeat buyer rate (what fraction of customers comes back after their initial order?) for the respective subgroups. It should be emphasized that the repeat buyer rates were calculated only for the test period. Figure 3 shows only a slight dependency of the repeat buyer rate on the service charge. The slope of the linear function in Fig. 3 is 0.27 ± 0.31% per EUR and thus not significantly different from 0. More data would be needed to prove or exclude such a dependency. A negative correlation between the repeat rate and the service charge would indicate that some customers are willing to buy once at a higher service charge but subsequently use alternative offerings. In order to optimize long-term profit this relationship would have to be taken into account in future studies.

Fig. 3. For each subgroup the fraction of buyers that bought again during the test period is determined.

5.3 Customer Complaints

During the study there were fewer than 10 complaints from customers by email and telephone related to the observation of different prices. The majority of these complaints
came from existing customers who knew the original service charge and first noticed the supposed increase after ordering. These customers were not told the reason for the change but instead received a €5.00 gift certificate. The overall complaint rate was lower than expected at the beginning of the field study and had no influence on the outcome.
6 Conclusions and Outlook

The managerial implications of this paper are straightforward, since it deals with one of the fundamental questions in economics: what is the optimal price of products and services? The method described in this paper enables e-commerce companies to determine the price elasticity of demand, represented by the conversion rate as a function of price. The accuracy of this method is only limited by statistical means and thus by the number of visitors and buyers of a site. The field study shows that employing the method involves only limited expense and effort while significantly increasing the average profitability per visitor of a company. There is a large body of research around the topic of price dispersion in the Internet, but only a limited amount of work has gone into studying the resulting degrees of freedom for companies and how they should use them. This paper opens substantial opportunities for future studies on this topic. Research questions include for example:

– How does the demand curve change over time?
– What impact do the brand awareness of a company and the uniqueness of its products have on the demand curve?
– What kinds of reciprocal dependencies are there between price components of an order [9]?
– Can user groups be identified that demonstrate varying degrees of willingness to pay?
– Do we have to take customer lifetime value into account when optimizing long-term profitability?

For studying price optimization based on the customer lifetime value, data from the field study has been used [8]. While the average customer lifetime value per visitor could be enhanced by 13% (compared to the 7% increase of the contribution margin described in this paper), the optimal service charge is only slightly higher than the one discussed here. This paper is meant as a starting point for discussion and further research related to optimizing prices in e-commerce by determining the exact demand curve for products and services in different environments. For advancing this topic additional field studies are needed - a good reason for joining forces between practitioners and the scientific community.

Acknowledgements. The comments from the anonymous referees are gratefully acknowledged.
References

1. Baker, W.L., Lin, E., Marn, M.V., Zawada, C.C.: Getting prices right on the Web. McKinsey Quarterly (2), 54–63 (2001)
2. Baye, M.R., Morgan, J., Scholten, P.: Price Dispersion in the small and in the large: evidence from an internet price comparison site. Journal of Industrial Economics 51(4), 463–496 (2004)
3. Böhme, R., Koble, S.: Pricing Strategies in Electronic Marketplaces with Privacy-Enhancing Technologies. Wirtschaftsinformatik 49(1), 16–25 (2007)
4. Bock, G.W., Lee, S.-Y.T., Li, H.Y.: Price Comparison and Price Dispersion: Products and Retailers at Different Internet Maturity Stages. International Journal of Electronic Commerce 11(4), 101 (2007)
5. Brynjolfsson, E., Smith, M.D.: Frictionless Commerce? A Comparison of Internet and Conventional Retailers. Management Science 46(4), 563–585 (2000)
6. Chernev, A.: Reverse Pricing and Online Price Elicitation Strategies in Consumer Choice. Journal of Consumer Psychology 13(1), 51–62 (2003)
7. Daripa, A., Kapur, S.: Pricing on the Internet. Oxford Review of Economic Policy 17, 202–216 (2001)
8. Funk, B.: Optimizing price levels in e-commerce applications with respect to customer lifetime values. In: Proceedings of the 11th International Conference on Electronic Commerce, pp. 169–175. ACM, New York (2009)
9. Hamilton, R.W., Srivastava, J.: When 2 + 2 Is Not the Same as 1 + 3: Variations in Price Sensitivity Across Components of Partitioned Prices. Journal of Marketing Research 45(4), 450–461 (2008)
10. Hann, I.-H., Terwiesch, C.: Measuring the Frictional Costs of Online Transactions: The Case of a Name-Your-Own-Price Channel. Management Science 49(11), 1563–1579 (2003)
11. Kannan, P.K., Kopalle, P.: Dynamic Pricing on the Internet: Importance and Implications for Consumer Behavior. International Journal of Electronic Commerce 5(3), 63–83 (2001)
12. Knoop, M.: Effective pricing policies for E-Commerce Applications - a field study. Internal Thesis, University of Applied Sciences Lüneburg (2004)
13. Odlyzko, A.: Privacy, economics, and price discrimination on the Internet. In: Proceedings of the 5th International Conference on Electronic Commerce, pp. 355–366. ACM, New York (2003)
14. Patzer, G.L.: Experiment-Research Methodology in Marketing: Types and Applications. Quorum/Greenwood, Westport, Connecticut (1996)
15. Roth, A.E., Ockenfels, A.: Last-Minute Bidding and the Rules for Ending Second-Price Auctions: Evidence from eBay and Amazon Auctions on the Internet. The American Economic Review 92(4), 1093–1103 (2002)
16. Simon, H., Dolan, R.J.: Power Pricing: How Managing Price Transforms the Bottom Line. The Free Press, New York (1996)
17. Spann, M., Skiera, B., Schäfers, B.: Measuring individual frictional costs and willingness-to-pay via name-your-price mechanisms. Journal of Interactive Marketing 18(4), 22–36 (2004)
Designing Digital Marketplaces for Competitiveness of SMEs in Developing Countries

Valentina Ndou, Pasquale Del Vecchio, and Laura Schina

University of Salento, via Monteroni, s.n., 73100, Lecce, Italy
{valentina.ndou,pasquale.delvecchio,laura.schina}@ebms.unile.it
Abstract. We examine the importance and the role that digital marketplaces play for SMEs in developing countries. Building on the literature on digital marketplaces and business models, we argue that the features of the context shape the type of business models applicable for digital marketplaces. We suggest that before designing a digital marketplace initiative it is necessary to undertake a context assessment that ensures a better understanding of firms' preparedness to use this kind of platform in terms of technological infrastructure, human resources' capabilities and skills, and the level of integration and innovation among firms. It is essential to start with feasible initiatives and build up steadily the qualifications necessary for facing hindrances. However, starting at the right point and in the right way does not automatically guarantee success and competitive advantage for SMEs, but it can represent a way to start acknowledging the fundamental role of innovation in a highly complex environment.

Keywords: Digital marketplaces, SMEs, Business models.
1 Introduction

The widespread diffusion of e-Business and rising global competition have prompted a dramatic rethinking of the ways in which business is organized. The new internetworking technologies, which enhance collaboration and coordination of firms and foster the development of innovative business models, are increasingly important factors for firms' competitiveness. An important trend in various industries is the creation of digital marketplaces as a key enabler that allows firms to expand the potential benefits originating from linking electronically with suppliers, customers, and other business partners. The number of new digital marketplaces grew rapidly in 1999 and 2000 [1]. In sectors such as industrial metals, chemicals, energy supply, food, construction, and automotive, "e-marketplaces are becoming the new business venues for buying, selling, and supporting to engage in the customers, products, and services" [2]. Over the years digital marketplaces have produced significant benefits for firms, in terms of reductions in transaction costs, improved planning and improved audit of capability, which, if well communicated, might provide strong incentives for other organizations to adopt [3].
On the other hand, it is widely believed that digital marketplaces offer increased opportunities for firms in developing countries by eliminating geographical barriers and enabling firms to expand globally to reap profits in new markets that were once out of reach. However, it has been observed that although digital marketplaces are appearing in almost every industry and country, only a very small number of them have been able to grasp the benefits and survive over time. In 2006 just 750 active digital marketplaces were registered in the directory of e-Market Services, compared to the 2,233 digital marketplaces identified by Laseter et al. in 2001 [4]. The statistics also show that the use of digital marketplaces in developing countries is very low. The study conducted by Humphrey et al. in 2003 [5] on the use of B2B marketplaces in three developing countries shows that 77 per cent of the respondents had not registered with a digital marketplace. Of the remaining 23 per cent that had registered with one or more digital marketplaces, only seven had completed at least one sale. These statistics demonstrate the low levels of adoption across firms, the result of a number of barriers to the adoption of digital marketplaces. Humphrey et al. (2003) [5] identify as inhibiting factors for developing countries the perceived incompatibility between the use of digital marketplaces and the formation of trusted relationships; the lack of preparedness and awareness and the need for training; and overly sophisticated technologies. Marketplace operators provide standardized solutions that neither match the needs of developing countries nor allow the latter to exploit the potential of new technologies. For developing countries to grasp the advantages of digital marketplaces, it is not feasible simply to transfer technologies and processes from advanced economies. People involved in the design, implementation and management of IT-enabled projects and systems in developing countries must improve their capacity to address the specific contextual characteristics of the organization, sector, country or region within which their work is located [6]. Therefore, what can and needs to be done in these contexts is to find a digital marketplace model rooted in an assessment study that makes it possible to understand firms’ preparedness to use the digital marketplace in terms of technological infrastructure, human resources’ capabilities and skills, and the level of integration and innovation among firms.

Starting from these assertions, the aim of this paper is to provide a conceptual framework for identifying an appropriate digital marketplace business model for food firms in the Tunisian context, one that matches their specific conditions. Specifically, we undertook a study among food firms in Tunisia to assess their awareness of and actual preparedness to use a digital marketplace. Based on the outcomes of that assessment, we identified the appropriate digital marketplace model to start with, as well as an evolutionary path that firms need to follow in order to enhance their competitiveness.

The remainder of the paper is structured as follows. The next section discusses the concept of digital marketplaces and their importance for firms. Next, we present the survey study undertaken to understand the e-readiness level of Tunisian food firms in order to propose a viable digital marketplace model appropriate to the context under study. We describe the survey and sample selection. Next we discuss the business model that fits the actual readiness status of firms for
using new business models. The proposed model traces an evolutionary path along which participating firms become aware of, learn, and adapt to new business models, and progressively develop the necessary resources and capabilities (relational, technical and infrastructural) to enhance their competitiveness in the current digital economic landscape.
2 Digital Marketplaces

Digital marketplaces can be defined as web-based systems that link multiple businesses together for the purposes of trading or collaboration [3]; they are based on the notion of electronically connecting multiple actors to a central marketspace in order to facilitate exchanges of different types of resources, such as information, goods and services [7], [8], [9], [10]. Nowadays these platforms are considered an integral part of conducting business online [1], [11], [12], [13], [14], [15]. Digital marketplaces have become increasingly used across industries and sectors, providing different types and categories of functions and services. Authors categorize them according to different criteria, for example based on the functionalities they offer [16], [17], [18], or on the number of owners and their role in the marketplace [19]. With regard to ownership, three classes of marketplace are commonly identified:

• Third-party or public marketplaces are owned and operated by one or more independent third parties.
• Consortia marketplaces are formed by a collaboration of firms that also participate in the marketplace either as buyers or suppliers [20].
• Private marketplaces are electronic networks formed by a single company to trade with its customers, its suppliers or both [21].

Consortia marketplaces have been identified as the most likely to be sustainable [20], as the founders can introduce their own customers and suppliers to the marketplace, helping it establish a viable level of transactions – a ready source of buyers and suppliers not available to third-party marketplaces [1]. This category of platform seems able to guarantee a higher level of integration than, for example, the public marketplace, which is characterized by a wide range of participants in both number and variety. This classification can serve as a guide in selecting the most suitable solution according to the requirements and background of the field of interest [22].

Although a digital marketplace can support a variety of mechanisms, according to IBM et al. (2000) [23] these platforms, based on a shared Internet-based infrastructure, are mainly important for:

• Automating core commerce transactions and streamlining processes such as procurement, customer management and selling;
• Enabling a collaborative network for the management of the entire life cycle process, from product and service design to planning, optimization and fulfillment, covering the global supply chain;
• Aggregating a wide range of industry information (regarding products, services, processes, and so on) into common classifications and structures;
• Providing a shared environment where trading processes (such as supplying, negotiating and exchanging) can take place online and in real time;
• Building a global community in order to share knowledge, exchange news and information, and organize events, with the purpose of increasing the number of players involved.

Moreover, Kaplan and Sawhney (2000) [10] identify two basic trading functions through which a digital marketplace is able to create added value:

• Aggregation: a mechanism for bringing many buyers and sellers together under one roof, which facilitates “one-stop shopping” and thus reduces transaction costs.
• Matching: a mechanism that allows dynamic negotiations (for example of price) between buyers and sellers on a real-time basis.

Howard et al. (2006) [3] argue that there is significant evidence of the benefits firms can realize by using digital marketplaces, in terms of reduced transaction costs, improved supplier communication, improved planning and improved audit of capability. Rask and Kragh (2004) [24] classify the main benefits of participation in a digital marketplace into three categories: efficiency benefits (reducing process time and cost); positioning benefits (improving the company’s competitive position); and legitimacy benefits (maintaining relationships with trading partners). Eng (2004) [25] distinguishes three main benefits of digital marketplaces: reduction of unit costs, growth of global efficiency, and alignment from an operational perspective.

However, most of the implemented digital marketplaces have failed to realize their core objectives and to deliver real value for their participants. According to Bruun, Jensen and Skovgaard (2002) [26], many digital marketplaces have failed because they were founded on optimism and hope rather than on attractive value propositions and solid strategies. Evidently, the benefits that could be created via digital marketplaces have generated tremendous interest. This has led to a large number of e-marketplace initiatives rushed online without sufficient knowledge of their customers’ priorities, with no distinctive offerings, and without a clear idea of how to become profitable [27]. Another cause of failure of digital marketplace initiatives is the lack of a deep analysis of customers’ priorities, needed to satisfy specific needs and create viable paths to profitability. In fact, many digital marketplace operators provide standardized solutions that neither match the needs of particular categories of firms nor allow the latter to exploit the potential of new technologies [9]. They also ignore the fact that most industries are dominated by small and medium enterprises, which are far less likely to use new technologies as a result of resource poverty, limited IT infrastructure, and limited knowledge of and expertise with information systems.
Finally, White et al. (2007) [1] claim that developing and creating high-value-added services is challenging for digital marketplaces, as the technology is not yet in place to enable more sophisticated forms of real-time collaboration among multiple participants. Therefore, simply offering a standard marketplace platform will result in the failure of the initiative, as firms may not be prepared to use it and may not see the value proposition, and hence remain uninterested in using the platform for integration. According to Rayport and Jaworski (2002) [28], the process of convincing organizations to join a digital marketplace is both long and expensive, even when the marketplace offers its participants appropriate economic incentives. Moreover, prospective buyers and suppliers will not join the digital marketplace merely on “visionary predictions of the glorious future of B2B e-trade; they must see the benefits in it right now”, as Lennstrand et al. (2001) [29] put it. Therefore, finding a business model that provides enough value to trading partners to justify the effort and cost of participation is a substantial challenge associated with the creation of a digital marketplace [28].
3 Methodology

In order to capture the state of firms’ preparedness to adopt a new internetworking platform, a survey was conducted to collect data. The study sample consisted of Tunisian food SMEs, chosen according to the EU definition of firms with 10–250 employees. The population of firms was derived from a database of the Tunisian industry portal containing data on Tunisian food-processing firms. The sample comprised 27 firms: 13 medium-sized and 14 small-sized enterprises. Of the 27 firms surveyed, 18 returned usable questionnaires, yielding a response rate of 67%. The data were collected in March 2008, over a period of three weeks, by means of face-to-face interviews and, in some cases, e-mail surveys (when managers were not available for a face-to-face interview). The data-gathering tool was a three-page structured questionnaire with a set of indicators organized into the following modules:

- Technological networks, including indicators that measure the firms’ ICT infrastructure for networking, in particular the use of the Internet, Local Area Networks (LAN) and Virtual Private Networks (VPN) for remote access;
- e-Business activities that firms use to support and optimize internal business processes, procurement and supply chain integration, and marketing and sales activities, including the use of e-business software;
- Level of awareness and use of digital marketplaces, aimed at identifying whether firms use a digital marketplace and/or are aware of the potential and benefits it can deliver;
- Limitations and conditions to e-business, aimed at identifying the perceived factors and barriers that firms consider limitations to the adoption and use of e-business models.
The data were analyzed using descriptive statistics in the Statistical Package for the Social Sciences (SPSS), version 12.0 for Windows.
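As an illustration of the kind of descriptive analysis involved, the short sketch below shows how frequency percentages of the type reported in the next section (e.g., LAN present in 33% of firms) and the response rate can be derived from coded survey answers. The data values and the answer coding are hypothetical placeholders, not the authors’ dataset.

# Illustrative sketch (not from the paper): deriving reported frequency
# percentages and the response rate from coded survey answers.
from collections import Counter

# Hypothetical placeholder data, one record per responding firm
# (18 usable questionnaires were reported); the coding is assumed.
lan_responses = ["use"] * 6 + ["plan"] * 4 + ["no plan"] * 8  # 6/18 = 33%

n = len(lan_responses)
for answer, k in Counter(lan_responses).items():
    print(f"LAN - {answer}: {k}/{n} = {100 * k / n:.0f}%")

# Response rate: usable questionnaires over firms contacted.
print(f"Response rate: {18 / 27:.0%}")  # -> 67%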
4 Survey Results

The results of the survey regarding ICT networking are displayed in Figure 1.
Fig. 1. Technological Infrastructure
The results show a shortage of networking technological infrastructure. Generally, the surveyed firms have an Internet connection or plan to have one, but when it comes to technologies used to connect computers, such as LAN, WLAN and VPN, the firms surveyed reported very low usage rates. A LAN is present in 33% of the firms surveyed, while just 11% have a VPN, found only in the technology infrastructure of medium-sized firms. VoIP is absent in all firms and its use is not even planned. The results also confirmed the firms’ limited awareness of ICT issues and a general lack of ICT skills within the industry. In fact, although the firms surveyed have an Internet connection or are planning to have one, they lack a vision of its usage. Only 39% of the firms allow their employees external access, 11% are planning to, and the remaining 50% do not even plan for it.

e-Business activity. The results regarding the business solutions used by Tunisian agrifood firms are displayed in Figure 2. According to the results, Enterprise Resource Planning (ERP) systems are the most diffused in the agrifood industry, and many firms are planning to implement ERP. The firms that neither have ERP nor plan for it argue that their firm’s size is manageable without information systems requiring major effort and investment. This is consistent with the fact that accounting applications (Excel, packaged software, in-house solutions) are widely diffused. Even though some firms have an intranet as a means of intra-organizational communication, its level of use remains relatively low and its functionalities are underused. Only one of the medium-sized enterprises uses or plans to use an extranet, whereas the small firms have no plans to adopt an extranet as a business solution connecting them to their trading partners. E-learning applications are not widely diffused.
Fig. 2. e-Business solutions
However, the firms consider e-learning very important, and the results show that the whole sample plans to adopt it. This is consistent with government policy, which provides incentives and support for e-learning adoption. The findings concerning the diffusion of extranets are consistent with those concerning Supply Chain Management (SCM). The managers of SMEs in the agrifood sector believe that SCM exceeds their needs and is an extra expense; accordingly, a high proportion of firms do not even plan to implement SCM. The diffusion of websites also remains relatively low, as firms do not see the point of such an investment or its benefits. In fact, many managers, especially in small companies, believe that a website’s investment requirements exceed the expected returns and benefits. Our findings are consistent with e-Business W@tch (2006) [30], according to which ERP is largely adopted among agrifood firms since it allows process integration and synergies.

e-Marketplace awareness. The study also included questions on firms’ knowledge of, willingness towards, and likelihood of adopting a trading platform as a business solution. The survey data showed that 27.8% of the firms were aware of digital marketplaces and able to provide a brief description of them; among these, only one firm is small-sized. In contrast, 72.2% of the firms plan to undertake e-business activities without any intention of integrating their systems with those of their main trading partners.

Limitations and conditions to e-business. The questionnaire also aimed to identify the main obstacles and limitations that companies encounter in performing online transactions and e-business activities. In particular, we asked them to identify the most important constraints they face in attempting to use different e-business platforms or models. Nine indicators emerged as most influential: lack of human capital (HC); fear of losing privacy and confidentiality of the company’s information (PCI); lack of financial resources (FR); lack of top management support (TMS); strict government regulation (SGR); lack of regulations for online payment (ROP); partners do not use e-business (PU); benefits of using e-business are not clear (BEB); IT and software integration problems (ITSI). The results are displayed in Figure 3.
In general terms, the results show that the lack of regulations for online payment is the main inhibitor of e-business, followed by the fact that business partners do not use e-business and by the lack of resources, especially human capital. This is in line with past studies on e-business, which highlight that training and finding qualified e-business employees are among the most critical challenges adopters face.
Fig. 3. Factors inhibiting e-Business adoption
However, even though we observed some degree of e-business awareness, the volume of transactions via the Internet remains an issue. There are no online transactions, since SMEs are not linked to an agency providing secure electronic certification. Further, payment security problems persist, along with logistics and quality problems. Therefore, the only signs of electronic commerce (EC) in Tunisia are e-mail, e-catalogues and information portals and, in a few exceptional cases, the possibility to order online; the rest of the transaction is completed in the traditional way. Thus, we cannot really speak of EC in Tunisian SMEs, as it still needs time to emerge.
5 Designing a Digital Marketplace for SMEs

The research findings show that Tunisian food firms have neither made full use of e-business nor obtained tangible benefits from it. The managers showed a good level of theoretical knowledge of e-business and its benefits; however, practical cases have not yet emerged, owing to the managerial and technical inhibitors our respondents reported. Such results suggest avoiding solutions that use sophisticated technologies, require a high level of integration and collaboration among supply chain actors, or aim directly at the integration of firms. Rather, it is reasonable to opt for ‘lighter’ solutions, initially involving a limited number of operators that are particularly aware and interested, though not necessarily equipped, around a simple solution requiring lower levels of innovation and coordination capability from local firms. It is important to note that the design of a marketplace is not a given: it depends strongly on users’ ability to recognize opportunities and benefits, as well as the barriers they face. However, in most cases an intermediary actor is
needed to bring buyers and sellers together and to create awareness among them by providing the platform. The intermediary arranges and directs the activities and processes of the digital marketplace. The role of intermediary can be played by an industry confederation, industry associations or other types of representative organizations able to secure a critical mass of users for the digital marketplace. This marketplace model is known in the literature as the consortium marketplace. Different authors have identified it as the most likely to be sustainable, especially for fragmented industries and SMEs [20], as the founders can introduce their own customers and suppliers to the marketplace, helping it establish a viable level of transactions. In contrast to a private marketplace, a consortium marketplace is by definition open to a number of buyers and suppliers in the industry, if not all, increasing the likelihood of participation and use.

Thus, in this initial stage the digital marketplace will serve as an aggregator of buyers and sellers in one single market in order to enhance product promotion and commercialization. It aims to offer firms a one-stop procurement solution by matching buyers and sellers through its website. The digital marketplace in this case will serve as a context in which to initiate a change management process aimed at creating, over time, the technological and organizational prerequisites for any further intervention to develop and enhance the competitiveness of firms. Simple trading services such as e-catalogues, e-mail communication, requests for quote and auctions are suitable for firms at this stage. These services do not support new processes; they simply replicate traditional processes over the Internet in an effort to cut costs and accelerate transactions [22].

However, this is just the first step of an evolutionary learning process of creation, development, consolidation and renewal of firms’ competitiveness through e-business. This stage is a prerequisite for increasing firms’ awareness of the potential and benefits of e-business solutions. Any solution, no matter how simple, will not be automatically adopted if it is not framed within a wider awareness initiative aimed at informing the relevant stakeholders of the impacts and benefits that the solution can have for them over the short and medium-long term. To further increase SMEs’ awareness and build local capabilities, training programs focused on ICTs could be of great support to firms.

Starting to use the basic services offered by the digital marketplace in this initial stage is an indispensable phase in creating the right conditions for pursuing an evolutionary pathway towards more collaborative configurations. For example, the use of e-catalogue or auction services involves information sharing and data exchange between trading parties. Through communication with trading partners, firms start to pull into the marketplace other supply chain firms with which they trade. In this way firms move towards more collaborative settings where suppliers, customers and partners share more information and data, create strong ties, and build longer-term supplier-customer relationships.
Then, to support the new relationships created among participants and to provide more added value for them, the digital marketplace needs to evolve towards new services that reinforce the relationships among different actors, create new ones, and exploit partnerships in order to enhance the offering. More
advanced collaborative technologies could be implemented in order to connect suppliers, customers and partners in a global supply network where critical knowledge and information about demand, supply, manufacturing and other departments and processes is shared instantaneously. More value-added services could be provided at this stage, such as online orders, transactions, bid aggregation, contract management, transaction tracking, logistics and traceability, which permit supply chain actors to integrate their operations and processes. The use of services such as bid aggregation, logistics and traceability does not simply enable firms to exchange knowledge and information, but also to develop it together in order to better understand customers and market trends.

Thus, a further stage of the digital marketplace could be developed to leverage the integrated and collaborative culture of the firms to create distributed knowledge networks, composed of a set of dynamic linkages among diverse members engaged in deliberate and strategic exchanges of information, knowledge and services to design and produce integrated and/or modular products. Networking services could be implemented at this stage, such as a Virtual Project Workspace (VPW) for product development teams, e-learning services, knowledge management, and virtual communities.

The proposed approach suggests that firms need to go through sequential stages in which activities are cumulative, as illustrated in the sketch below. This means that firms in stage 2, for example, undertake the same activities as those in stage 1 – communicating with customers and suppliers via e-mail and using the web for catalogues – but in addition start collaborating and transacting online with other actors of the supply chain.
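To make the cumulative staging logic concrete, the following minimal sketch models the three stages described above as sets of services, where each stage inherits all services of the previous ones. The stage labels and service groupings paraphrase the text and are illustrative only, not a specification from the study.

# Minimal sketch of the cumulative stage model described above.
# Service names paraphrase the text; they are illustrative placeholders.
STAGE_SERVICES = [
    # Stage 1: aggregation and simple trading services
    {"e-catalogue", "e-mail communication", "request for quote", "auction"},
    # Stage 2: collaborative, transaction-oriented value-added services
    {"online orders", "transactions", "bid aggregation", "contract management",
     "transaction tracking", "logistics", "traceability"},
    # Stage 3: distributed knowledge-network services
    {"virtual project workspace", "e-learning", "knowledge management",
     "virtual communities"},
]

def services_available(stage: int) -> set:
    """Return all services a firm at the given stage performs.

    Activities are cumulative: a firm at stage n also undertakes
    all activities of stages 1 .. n-1.
    """
    return set().union(*STAGE_SERVICES[:stage])

# A stage-2 firm still offers every stage-1 service.
assert STAGE_SERVICES[0] <= services_available(2)
print(sorted(services_available(2)))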
6 Conclusions

The thesis underpinning this paper is that context features shape the type of business model suitable for digital marketplaces. We argue that, before starting a digital marketplace initiative, it is necessary to undertake a context assessment that makes it possible to understand firms’ preparedness to use the digital marketplace in terms of technological infrastructure, human resources’ capabilities and skills, and the level of integration and innovation among firms. On this basis a specific evolutionary approach has been presented. The aim is to provide firms in developing countries with solutions that match their needs and help them become aware of, learn, and adapt to new business models, as well as progressively develop the necessary resources and capabilities (relational, technical and infrastructural) to enhance their competitiveness in the current digital economic landscape.

Our approach also carries important implications. The concept of digital marketplaces is useful for firms in developing countries; however, despite their promise, they remain largely unused because available solutions are inadequate for the features of the context. We argue that the success of a digital marketplace initiative must be rooted in an assessment study of the context that reveals firms’ preparedness along these dimensions. Based on the outcomes of this assessment, it is then possible to identify a suitable business model for the digital marketplace that shows sensitivity to local realities and ensures the effective participation of firms.
It is essential to start with feasible initiatives and steadily build up the capabilities needed to overcome obstacles. However, starting at the right point and in the right way does not automatically guarantee success and competitive advantage to SMEs; it does, though, acknowledge the fundamental role of innovation in surviving a highly complex environment. In today’s business environment, firms need to continuously upgrade and develop their organizational structures, assets and capabilities, and their social and customer capital, to enhance their competitiveness. Thus firms need to adopt a co-evolutionary approach that stimulates collaboration and coordination among firms. The active role of an intermediary is crucial, especially at the earliest stages, to raise awareness, assure firms’ participation, and build and maintain wide commitment and involvement.

The ideas that we propose need to be refined in further conceptual and empirical research. First, a field analysis is needed to appraise and validate the evolutionary path proposed in this paper. Second, it will be important to monitor the process of adoption of digital marketplaces and their specific impacts on firms’ competitiveness. Third, further research could focus on how to realize digital marketplace solutions that integrate internal business systems with a common platform. Such research can be oriented toward identifying a unifying solution for SMEs, in which activities converge and integrate as part of a joint entity. This solution could generate numerous benefits, related to the opportunity to decrease errors in transactions, reduce the duplication of activities, and manage business in a simple and fast way. Further research could also focus on understanding the factors that inhibit or support the passage of firms from one stage of the evolutionary model to another.
References

1. White, A., Daniel, E., Ward, J., Wilson, H.: The adoption of consortium B2B e-marketplaces: An exploratory study. Journal of Strategic Information Systems 16(1), 71–103 (2007)
2. Raisch, W.D.: The e-Marketplace: Strategies for Success in B2B e-Commerce. McGraw-Hill, New York (2001)
3. Howard, M., Vidgen, R., Powell, P.: Automotive e-hubs: exploring motivations and barriers to collaboration and interaction. Journal of Strategic Information Systems 15(1), 51–75 (2006)
4. Laseter, T., Long, B., Capers, C.: B2B benchmark: the state of electronic exchanges. Booz Allen Hamilton (2001)
5. Humphrey, J., Mansell, R., Paré, D., Schmitz, H.: The reality of E-commerce with developing countries. A report prepared for the Department for International Development’s Globalisation & Poverty Programme jointly by the London School of Economics and the Institute of Development Studies. Sussex, London/Falmer (March 2003)
6. Avgerou, C., Walsham, G. (eds.): Information technology in context: Implementing systems in the developing world. Ashgate, Aldershot (2000)
7. Bakos, J.Y.: A strategic analysis of electronic marketplaces. MIS Quarterly 15(3), 295–310 (1991)
8. Bakos, J.Y.: The emerging role of electronic marketplaces on the Internet. Commun. ACM 41(8), 35–42 (1998)
9. Grieger, M.: Electronic marketplaces: A literature review and a call for supply chain management research. European Journal of Operational Research 144, 280–294 (2003)
10. Kaplan, S., Sawhney, M.: E-hubs: the new B2B marketplaces. Harvard Business Review 3, 97–103 (2000)
11. Soh, C., Markus, L.M.: B2B E-Marketplaces: Interconnection Effects, Strategic Positioning, and Performance. Systèmes d’Information et Management 7(1), 77–103 (2002)
12. Gengatharen, D.E., Standing, C.: A Framework to Assess the Factors Affecting Success or Failure of the Implementation of Government-Supported Regional eMarketplaces for SMEs. European Journal of Information Systems 14(4), 417–433 (2005)
13. Markus, M.L., Christiaanse, E.: Adoption and impact of collaboration electronic marketplaces. Information Systems and e-Business Management 1(2), 139–155 (2003)
14. Kambil, A., Van Heck, E.: Making Markets: How Firms Can Design and Profit from Online Auctions and Exchanges. Harvard Business School Press, Boston (2002)
15. Koch, H.: Business-to-business electronic commerce marketplaces: the alliance process. Journal of Electronic Commerce Research 3(2), 67–76 (2002)
16. Dai, Q., Kauffman, R.J.: Business Models for Internet-Based B2B Electronic Markets. International Journal of Electronic Commerce 6(4), 41–72 (2002)
17. Grieger, M.: An empirical study of business processes across Internet-based electronic marketplaces. Business Process Management Journal 10(1), 80–100 (2004)
18. Rudberg, M., Klingenberg, N., Kronhamn, K.: Collaborative supply chain planning using electronic marketplaces. Integrated Manufacturing Systems 13(8), 596–610 (2002)
19. Le, T.: Business-to-business electronic marketplaces: evolving business models and competitive landscapes. International Journal of Services Technology and Management 6(1), 40–53 (2005)
20. Devine, D.A., Dugan, C.B., Semaca, N.D., Speicher, K.J.: Building enduring consortia. McKinsey Quarterly Special Edition (2), 26–33 (2001)
21. Hoffman, W., Keedy, J., Roberts, K.: The unexpected return of B2B. McKinsey Quarterly 3, 97–105 (2002)
22. Popovic, M.: B2B e-Marketplaces (2002), http://europa.eu.int/information_society/topics/ebusiness/ecommerce/3information/keyissues/documents/doc/B2Bemarketplaces.doc
23. IBM, i2, Ariba: E-marketplaces changing the way we do business (2000), http://www.ibm-i2-ariba.com
24. Rask, M., Kragh, H.: Motives for e-marketplace participation: Differences and similarities between buyers and suppliers. Electronic Markets 14(4), 270–283 (2004)
25. Eng, T.-Y.: The role of e-marketplaces in supply chain management. Industrial Marketing Management 33, 97–105 (2004)
26. Bruun, P., Jensen, M., Skovgaard, J.: e-Marketplaces: Crafting A Winning Strategy. European Management Journal 20(3), 286–298 (2002)
27. Wise, R., Morrison, D.: Beyond the Exchange: The Future of B2B. Harvard Business Review, 86–96 (November-December 2000)
28. Rayport, J.F., Jaworski, B.J.: Introduction to e-Commerce, pp. 204–206. McGraw-Hill, New York (2002)
29. Lennstrand, B., Frey, M., Johansen, M.: Analyzing B2B eMarkets. In: ITS Asia-Indian Ocean Regional Conference, Perth, Western Australia (July 2–3, 2001)
30. e-Business W@tch: The European e-Business Report: A Portrait of e-Business in 10 Sectors of the Economy. European Commission’s Directorate General for Enterprise and Industry (2006)
Strategic Planning, Environmental Dynamicity and Their Impact on Business Model Design: The Case of a Mobile Middleware Technology Provider

Antonio Ghezzi, Andrea Rangone, and Raffaello Balocco

Politecnico di Milano, Department of Management, Economics and Industrial Engineering, Piazza Leonardo da Vinci 32, 20133 Milan, Italy
{antonio1.ghezzi,andrea.rangone,raffaello.balocco}@polimi.it
Abstract. This study addresses how the approach to strategy definition, strategic planning and external dynamicity can affect the process of business model design within a firm. Analyzing the case of a newcomer to the market for middleware platforms enabling the delivery of Mobile digital content, the paper first identifies a set of critical decisions to be made at the business model design level for a Mobile Middleware Technology Provider, and aggregates these variables within an overall reference framework; then, through a longitudinal case study, it assesses how and why the initial business model configuration changed after two years of business activity. The study allows us to argue that the business model design changes that occurred in the timeframe considered depended largely on a defective approach to business strategy definition, while environmental dynamicity mostly exerted an “amplification effect” on the mistakes made in the underlying strategic planning process. Keywords: Strategy, Strategic planning, Business model, Mobile telecommunications, Mobile content, Mobile middleware technology provider, Case study.
1 Introduction

In recent years, the Mobile Content market, i.e. the market for mobile digital content and services, has been characterized by high levels of dynamicity and uncertainty. The Mobile Network Operators’ (MNOs’) refocus on digital content – undertaken to cope with the levelling off of voice revenues and the subsequent decrease in Average Revenue per User [1][17][21] – together with the process of value system reconfiguration the Mobile Industry as a whole was going through [10][18][23][31], contributed to shaping a complex context in which each market segment experiences significantly different performance [4] and is populated by a fast-growing range of actors whose covered activities and position within the value network are not clearly defined. Among them, a relatively new actor typology is currently taking on a significant role in the competitive ecosystem: the market’s technology enabler, from
now on referred to as “Mobile Middleware Technology Provider” (MMTP). Such players are converging on the Mobile Content market from several neighboring business areas, and their moves can strongly influence the market’s development, potentially creating unexpected competitive friction between these new players and incumbents. These competitive dynamics deserve attention from both researchers and practitioners. Specifically, questions arise concerning the strategies MMTPs will elaborate to compete in the fast-changing and fragmented Mobile Content market, and the business models they will consequently design and adopt.

Through the noteworthy case of an MMTP, the purpose of the paper is to explore the evolution of a company’s business model, assessing how the internal strategic planning process on the one hand, and the external changes due to market turbulence and dynamicity on the other, can determine the reshaping of the previously adopted model. Moreover, the relation between business model design and the underlying strategy approach a firm adopts will be investigated. The research focuses on an Italy-based Mobile Middleware Technology Provider, a new entrant in the Mobile Content market, which found itself needing to develop a business model for the new business area it was about to compete in. Employing a longitudinal single case study methodology – based on 15 semi-structured interviews carried out in two distinct periods, 2006 and 2008 – the research is articulated in two main steps. In the first, it identifies the most critical choices to be made at the business model design level for an MMTP, and examines how these parameters are interrelated and can be combined to give rise to a complete business model. In the second, a comparison is carried out between the initial and the current business models adopted by the company, in order to identify any change in the prioritization of the parameters and in the approach to business model design as a whole. In conclusion, inferences will be made concerning the relation between business model design and the overall strategy definition process.
2 An Overview of Business Model Design Literature

The concept of the business model generally refers to the “architecture of a business”, or the way firms structure their activities in order to create and capture value [26], [28], [30]. As a literature stream, business model design has evolved from a piecemeal approach aimed at merely identifying typologies or taxonomies of models, to one seeking to develop a clear and unambiguous ontology – that is, a definition of the basic concepts of a theory [22] – that could be employed as a generalized tool for supporting strategy analysis of firms. In parallel, the business model has become an extensive and dynamic concept, as its focus has shifted from the single firm to the network of firms, and from the sole firm’s positioning within the network to its entire set of interrelations and hierarchies [2]. It is widely accepted in the literature that a business model shall be analyzed through a multi-category approach, as a combination of multiple design dimensions, elements or building blocks. However, the proposed dimensions are quite diverse, and the existing body of knowledge shows a lack of homogeneity.
Noteworthy attempts to provide a unified and consistent framework can be found in [2][22][26][30] – the last with a specific focus on the Mobile Telecommunications Industry. The recurrent parameters of these models can be brought back to the general concepts of Value – e.g. value proposition and financial configuration – and Control – e.g. inter-firm or value network relationships.

The literature review on business model design allowed us to identify a further gap: as the Mobile Content segment is a relatively young market, and as the “advent” of MMTPs within its boundaries is an extremely recent phenomenon, little consolidated theory exists on strategy creation and business model design in this market context and for the specific player typology under consideration. Therefore, starting from the existing literature on business model design, and taking into account the building blocks identified so far, this research attempts to identify the key business model parameters for MMTPs, and to describe the “parameters mix” actually employed by one player operating in the Mobile Content market. Moreover, through a comparative analysis of the business model solutions adopted at different moments in time by the same company, the study sheds light on the impact of a fast-changing environment on the business model design process, also assessing the implications of the strategic approach underlying the business model design choices.
3 Research Methodology

The present research is based on a case study, defined by Yin [32] as “an empirical inquiry that investigates a contemporary phenomenon within its real-life context, especially when the boundaries between phenomenon and context are not clearly evident; and in which multiple sources of evidence are used”. Qualitative research methodology was chosen as particularly suitable for reaching the research objectives, which aim at understanding the complex phenomenon of business model design within a given industry – Mobile Content – characterized by a high level of dynamicity and competitive turbulence, and with reference to a specific typology of player – the MMTP – and thus at building new theory, or extending existing theories, on it [9], [19], [20]. To address the research propositions identified above, a single in-depth longitudinal exploratory case study of an Italy-based Mobile Middleware Technology Provider was performed. (The company name will not be disclosed throughout the paper. The proper names of informants are likewise withheld to preserve their anonymity.) This company could be defined as an “MMTP” as it presented both a well-defined line of business dedicated to the commercialization of Content and Service Delivery Platform (CSDP) modules, and an offer directed to the Mobile Telecommunications market. Consistent with the research methodology employed [24], the firm was selected for the theoretical sample because it conformed to the main requirement of the study, namely that the process of interest be “transparently observable”. Specifically, at the time the first set of interviews was collected, the company was an early entrant to the Mobile Content market and was going through the process of designing a suitable business model.
A single case study methodology makes it possible to provide a thorough, extensive qualitative description and analysis of the business model definition process with the needed depth and insight, hardly replicable with a wider theoretical sample. Furthermore, the longitudinal approach enables a comparison of the company’s conditions at different moments in its history, thus providing a valuable “ongoing view” of how it developed with reference to the specific variables under scrutiny.

From May to June 2006, 10 face-to-face semi-structured interviews were held with 4 persons identified as key participants in the firm’s strategy definition and business model design processes at different levels. The population of informants included the following top and middle managers: Chief Executive Officer (CEO); Chief Information Officer (CIO); Marketing & Sales Manager (MSM); Product Managers (PM). The semi-structured nature of the interviews made it possible to start from key issues identified through the literature – e.g. the business model parameters highlighted by the existing body of knowledge – but also to let innovative issues emerge from the open discussion. The identification of core business model parameters also leveraged procedures borrowed from the “Grounded Theory” methodology [15], which helps develop new theory or gain fresh insight into old theory: after identifying the research “core category”, the related “conceptual categories” were isolated and described by applying the “open coding” technique to the interview transcriptions.

In order to assess the impact of the fast-changing environment on the business model initially adopted by the company and, potentially, on the overall strategy employed, a second wave of 5 further interviews was held more than two years after the first contacts with the firm – from June to July 2008 – with 3 key informants: this time, the Chief Executive Officer, the Chief Information Officer and the Marketing & Sales Manager were involved. By maintaining the same research structure in terms of scheme of analysis and questions, the comparability of the 2006 and 2008 results was assured. The only addition to the original scheme of analysis concerned questions about business model variation over time: in 2008, the informants were asked to identify any perceived difference between the initial and the current business configuration. This further set of interviews provided the study with the required longitudinal dimension, thus supporting a within-case analysis of changes in the firm’s strategic dynamics and business architecture over time.

The need to assess the whole business model design decision-making process, paying attention to different subunits within the company, led to the adoption of an “embedded” case study with multiple units of analysis, related to the set of “decisions” to be made at the business model design level. As the validity and reliability of case studies rest heavily on the correctness of the information provided by the interviewees and can be assured by using multiple sources or “looking at data in multiple ways” [8], [32], multiple sources of evidence and research methods were employed: interviews – the primary data source –, analysis of internal documents – both official and informal –, and study of secondary sources on the firm – research reports, websites, newsletters, white papers, databases, international conference proceedings.
This combination of sources allowed us to obtain “data triangulation” or “perceptual triangulation”, essential for assuring rigorous results in qualitative research [6].
4 The Initial Business Model Configuration

The research considered an Italian Technology Provider just entering the new business area of Mobile Content. Founded in the early 1990s to operate as a software house for telecommunications systems, in 2006 the company’s core business lay in the design and provision of multichannel customer care platforms (e.g. call centers, Interactive Voice Response, etc.). By this time, the company had matured advanced skills in content management and channel integration. Moreover, in 2006 the Management Buy-Out process started two years earlier was completed, making the company totally independent from the group to which it had previously belonged. For the top management, it was time to look for business expansion, in order to create the conditions for higher growth and revenues. As the CEO stated:

“Now the company structure is linear, and we find ourselves in an ideal situation for making strong strategic choices.”

Thanks to past cooperation with actors in the Mobile Industry, the company had come into contact with the Mobile Content segment, in which it perceived a high level of attractiveness and potential profitability, especially in the niche of video services. The main reasons for the subsequent choice to penetrate the Mobile Content market were disclosed in the initial declarations of the CEO:

“We consider the business area as particularly attractive, because of its vicinity to our core, and of the prediction that incumbent players are about to invest heavily on infrastructure platforms to enable their value added services offer. The market is going to grow dramatically in the short term: and we want to be there when that happens.”

This point was later confirmed by the Marketing & Sales Manager:

“Our solution portfolio could be easily enlarged to embrace innovative mobile video solutions both Mobile Network Operators and Mobile Content & Service Providers are going to need to deliver their rich media services.”

The development of the new platform added functionalities to the existing solution and did not represent a major technological issue for the company’s software engineers. According to the CIO:

“After having developed the platform for fixed and IP network, for us, the step of integrating the mobile channel was a piece of cake. We had the technology, we had the know how: it was just about applying it all to a new market.”

The idea of positioning the offer on the video segment presented some critical issues, which were quickly overcome thanks to the experience matured in similar projects. This clearly emerged from the words of the Product Manager:

“Making the platform capable of real-time assembly and delivery of video content was quite messy and made us sweat; but nothing we couldn’t handle after all. We had done that before.”
The market value drivers the company wanted to leverage appeared clear and recognizable to the management: video services and real-time content creation and adaptation were key to success. The MMTP was therefore positioning itself to deliver innovative, high-quality solutions, seeking product leadership in the promising video services niche. Concerning the role the company desired to play within the Mobile Content Value Network, a clear statement by the CEO synthesized it:

“We are essentially a technology provider, and we want to maintain our traditional focus.”

The Marketing & Sales Manager further explained this point, presenting the motivations for the choice:

“We want our scope to remain strictly technological. This is for three main reasons: first, we don’t want to move too much from our core business; second, we don’t want to be forced to internally develop the infrastructure and know how necessary for creating and commercializing digital content; and third, we definitely want to maintain a clear separation between our business and our customers’. This last one is a key point. The idea that we may represent a threat to their business, because of the overlapping of one or more activities, mustn’t even cross our customers’ mind.”

The previous argument allows us to infer that, in the initial configuration, the firm intended to cover the Platform Layer activities of the Value Network, without any interference with the Content & Service Layer. The “pure technology provider” positioning was reinforced by further decisions concerning platform provisioning and complementary services: in-house installation of the Content & Service Delivery Platform within Mobile Network Operators’ (MNOs’) and Mobile Content & Service Providers’ (MCSPs’) infrastructure was the only option made available; customers could also rely on the MMTP for the delivery of technical services related to the platform’s operation management (e.g. maintenance, upgrading, etc.). With reference to the revenue model, the company opted for a rigid platform sale to the customer, characterized by fixed revenues for the MMTP. The possibility of establishing a “revenue sharing” model, in which revenues from the sale of content and services published on the platform are shared between the MMTP and its customer, was strongly criticized by the Marketing & Sales Manager:

“We absolutely don’t want to set up a dirty model where our revenues and our customers’ revenues are somehow not clearly distinguished. Revenue sharing is not just a way too risky option for a technology provider: it’s simply wrong. Our positioning must be fully transparent to our customers.”
5 The Current Business Model Configuration

When the firm was contacted again in 2008, the situation looked radically different from two years before. The company’s future within the market was far from bright. Falling short of managers’ expectations, the market had failed to keep its promises of high growth and consistent revenues. Instead, it had revealed its true
nature: a context characterized by high levels of complexity, dynamicity and scarce predictability of future trends. According to the CEO, the situation the company was going through was discouraging:

“We predicted the market, especially the video segment, would grow dramatically. And when I say dramatically, I don’t mean a 15%-20% growth per year: we expected a 50% growth rate. Well, till now, this just didn’t happen. This is an area we’ve been heavily investing on for three years […], and what we found out now is that, objectively, the results we obtained are so poor they wouldn’t justify to hold the current position.”

Moreover, the international reach of the company allowed it to verify that the critical issues did not depend on conditions specific to the Italian context, but could be considered a generalized characteristic of the global market. The market’s complexity and dynamicity are well depicted by the words of the Marketing & Sales Manager, who spontaneously admitted being unable to predict the Mobile Content market’s future scenarios – the manager even asked the researcher for “hints” to support an interpretation of the competitive environment, reinforcing the impression of an absence of clear direction:

“My idea of the current market trends is at the moment so confused that, personally, I don’t deny that giving the company a clear indication of where to invest, on which segments, on which kind of services, is really a tough call.”

The causes of this change are traced back to the weak commitment of MNOs, to the absence of a real “killer application” for video services, and to the uncertainty caused by the unclear norms regulating the commercialization of mobile premium services. As the Marketing & Sales Manager and the CEO pointed out:

“The MNOs themselves don’t seem committed, they don’t want to bet on innovative video services. And, even worse, when we sit around a table to discuss about any possible cooperation, they ask us what kind of services to develop to attract their own customer base. That’s something they should know! This is not a good sign.”

Since the Operators are the “network focal” – i.e. the central firm within the network, expected to drive the whole market’s development – the absence of strategic initiative on their part determined a strong sense of disorientation, making the identification of the market’s true value drivers extremely complex. In order to cope with this unexpected situation, the company reacted by trying to reposition itself: in doing so, it departed from the initial configuration and appeared to be adopting an approach based on greater openness and third-party involvement. The management started looking outside the company’s boundaries, searching for greater dialogue and interaction with other actors in the Value Network. The CEO stated:

“At the moment, we are constantly talking to every actor in the market.”

The Marketing & Sales Manager reinforced this message, though clarifying that even this new open approach had not yet yielded many results:
“We are looking at what’s going on in the market, to get some hint that can help us reposition our offer. However, the external situation looks really confusing: all the players in the chain seem to be taking different directions.”

As a whole, all the informants perceived the urgent need to reshape the business configuration under the banner of flexibility, at all levels: from the value proposition, to the activities covered, to the financial configuration adopted. Talking about the reorientation of the solution portfolio, the CIO commented:

“We are going through a process of repositioning our platforms on more generalist content and services. We are also trying to figure out whether our video solutions may be reapplied to different contexts, like the Web.”

The shift away from a rigid vision of the products was also evidenced by the new tendency to establish joint projects with several different market players, so as to test, by “trial and error”, the commercial feasibility of initiatives without concentrating investments and the related risks. Substantial changes also affected the revenue model. Quoting the Marketing & Sales Manager:

“Our level of flexibility is getting higher and higher as time goes by, and we are willing to set up a wider range of revenue models, if it can win us customers. We are even evaluating revenue sharing models, even if, I have to admit, I don’t like them that much.”

The need to sustain the business made the company depart even from its initial negative stance towards revenue sharing, previously regarded as dangerous for its competitive implications: as will be discussed later, such a radical change can be interpreted as a symptom of the lack of a clear strategy driving the firm’s choices. Concerning the role the company wished to play within the value network, the environmental complexity led the management to strive for a more active positioning, potentially extending the original coverage of activities towards the downstream chain. As the Marketing & Sales Manager stated:

“By taking part to the call for tenders for the outsourcing of an MNO’s Mobile Portal, we understood that many operators are looking for an editorial partner, not only for a technological one. This gave us a useful indication for orienting our future positioning.”

The company was in desperate need of customers and was ready to exploit every chance the environment offered, even if this meant abandoning the “pure” technology provider role. In conclusion, the top managers declared their intention to remain and keep investing in the Mobile Content market; nevertheless, they somehow admitted that Mobile Content had never been the company’s strategic focus. In the words of the CEO:

“We moved in the Mobile Content market as a diversification maneuver of our past offer. Thank God, our main business unit is still focused on a different, consolidated market, creating 90% of our revenues. This allows us to treat the Mobile Content business area as a start-up market, following the logics of resources allocation proper of businesses portfolio management.”
The business was and remained a “question mark”, and the company was trying to face turbulence and change through a profound reassessment of the initial configuration it had shaped. In fact, this reassessment not only encompassed the business model adopted; it also dealt with the underlying strategic approach which had guided the design of such a model in 2006.
6 A Comparison between the Two Configurations

6.1 The Emerging Core Business Model Parameters

Throughout the first wave of interviews with the different managers, the main emerging theme, or recurring issue, was the search for the most suitable business architecture for competing in the newly entered market. Therefore, business model design was found to be the “core category” [15] of the research. By applying the “open coding” method proposed by the Grounded Theory approach, the main “conceptual categories” related to the core category were labeled and identified. Such categories correspond to the core business model parameters, or building blocks, for the Mobile Middleware Technology Provider under study.

[Figure 1 depicts the MMTP Business Model as three blocks of business modeling parameters: Value Proposition (platform characteristics; offer positioning; platform provisioning; additional services; resources & competencies), Value Network (vertical integration; customer ownership) and Financial Configuration (revenue model; cost model).]

Fig. 1. MMTP Business Model Parameters Reference Framework
The findings are synthesized in the “MMTP Business Model Parameters Reference Framework” provided above, which identifies three macro-dimensions, in turn divided into nine parameters:

1. Value Proposition parameters: platform characteristics; offer positioning; platform provisioning; additional services; resources & competencies.
2. Value Network parameters: vertical integration; customer ownership.
3. Financial Configuration parameters: revenue model; cost model.

As will become clear by analyzing the framework, some building blocks were present in previous models – in particular [2] – while others – not present in the
existing literature, or not made explicit – were modified or originally created to better express aspects strictly linked to MMTPs. For each parameter, a definition is provided, and the specific “parameter values”, i.e., the company’s choices concerning the initial business model configuration, are described and discussed.

Platform characteristics: As the CSDP is the core element of MMTPs’ value proposition, its characteristics are a key parameter to be modeled, as they strongly affect the firm’s positioning. In the initial business model configuration, the firm opted for developing an end-to-end solution, open only to some degree of modularity. This choice was driven by the habits matured within the firm’s traditional business – customer care platforms – where the firm was used to developing vertical solutions for its customers.

Offer positioning: Offer positioning relates to the choice between developing a CSDP devoted to the management and delivery of “mature” content – SMS, MMS, logos, wallpapers, ringtones and so on [4] – or one meant to deal with more innovative and cutting-edge services – like video services or Mobile TV. As mentioned earlier, the company’s management decided to focus on the video services niche, believed to be particularly attractive.

Platform provisioning: The CSDP provision modality is an emergent parameter – not present in the existing literature – and is particularly interesting in the case of MMTPs, as it influences the kind of relation the technology supplier creates with its business customers. In the initial configuration, the company only considered the installation of the platform in the customers’ “house”, maintaining a clear separation between customer and supplier infrastructures. Again, this was something the company was accustomed to, derived from the approach followed in the traditional business.

Additional services: Another original parameter for MMTP business model design, additional services refers to the complementary offer accompanying the CSDP sale, which can range from simple technological management of the platform’s operation – e.g., maintenance, upgrading, etc. – to, in some rare cases, commercial management of the content and services published on the platform itself. Consistent with the CEO’s declaration, in the initial business configuration the company remained strictly focused on the technology dimension.

Resources & Competencies: As the “resource-based view” and the “dynamic capabilities approach” state, a firm’s collection of path-dependent core resources and competencies (R&C) strongly influences its ways of seeking competitive advantage [3][16][27]. In 2006, the company showed a clear prevalence of technology-oriented R&C, making it better disposed towards a simple technological partnership with its potential customers.

Vertical integration: The level of vertical integration refers to the MMTP’s coverage of activities in the Mobile Content Value Network. The clear initial positioning on the Platform Layer activities denotes the firm’s choice to relegate itself to the technology
enabler role, staying out of the downstream chain that allows direct contact with the end user, and clearly separating the MMTP’s business from those of its customers.

Customer ownership: Strongly related to the choices concerning vertical integration, customer ownership deals with the nature of the relationship established between the MMTP and the end customer. In the first configuration, an intermediated customer ownership on the Technology Provider’s part was selected, implying a higher reliance on MNOs and MCSPs; the CSDP vendor was going to receive only indirect revenue streams from its business counterparts.

Revenue model: The revenue model parameter refers to the kind of revenue streams flowing from the MNO/MCSP to the MMTP, which can vary from the mere selling of the platform to a full revenue sharing agreement on the content/services delivered through the CSDP. The choices at this level are strictly related to the platform provisioning parameter, and must be considered extremely critical because of their many implications for the firm’s overall positioning and strategy. In the first business model, the company adopted a rigid “system selling” solution, granting a spot, fixed and “assured” revenue for the MMTP, and presupposing a clear distinction between its business and those of its customers. As stated before, the revenue sharing model was labeled as “way too risky” and “unfeasible”.

Cost model: The cost model refers to the nature and sharing of the investments undertaken for CSDP development. In 2006, the company decided to rely heavily on internal resources, concentrating the investment within its perimeter. This way, a “product approach” was taken: the MMTP delivers to the market a given product, with scarce or no cooperation from other players in financing the development process; the risks associated with development are not shared, but the player can benefit from greater strategic independence after the solution is created.

The combination of the parameters described above gave rise to the business model the MMTP adopted in 2006, right after its entry into the Mobile Content market. In the next section, the business model currently employed by the firm is described and analyzed, and a comparison between the two is carried out, so as to underline any changes and attempt to interpret their causes.

6.2 The Shift in Business Model Parameter Values

As emerged from the second wave of interviews, in 2008 the decisions taken two years before were called into question and, to a great extent, reconsidered. At the business model design level, the partial realization of the mistakes made led to a heavy reconfiguration of many parameters from the original to the current model. The changes are synthesized in Table 1, where the comparison is meant to underline the parameter shifts. As will emerge from the analysis carried out in the next section, the origins of such changes cannot be traced back only to exogenous factors – i.e., the market’s complexity and dynamicity – but also to endogenous ones, mainly related to the underlying strategic approach that drove the initial business model design process.
Table 1. A comparison between the original and the current business model configurations

Value Proposition
- Platform Characteristics – Original BM (2006): end-to-end solution; scarce modularity. Current BM (2008): higher platform modularity and interoperability.
- Offer Positioning – Original BM (2006): innovative video service coverage. Current BM (2008): more flexible, multi-purpose platform, open to generalist services.
- Platform Provisioning – Original BM (2006): in-house installation; business separation. Current BM (2008): shift towards a “software as a service” (ASP) approach.
- Additional Services – Original BM (2006): technology management of the platform. Current BM (2008): evaluation of the content marketing & sales option.
- Resources & Competencies – Original BM (2006): technology-oriented R&C. Current BM (2008): effort to develop content-oriented R&C (“editorial partnership”).

Value Network
- Vertical Integration – Original BM (2006): low integration; focus on Platform activities. Current BM (2008): search for higher integration of value-adding activities.
- Customer Ownership – Original BM (2006): intermediated ownership; indirect revenue streams. Current BM (2008): search for direct customer ownership.

Financial Configuration
- Revenue Model – Original BM (2006): system selling solution; spot, fixed revenues. Current BM (2008): more flexible revenue models, open to revenue sharing agreements.
- Cost Model – Original BM (2006): concentrated investments; “product approach”. Current BM (2008): joint investments; “project approach”.
7 Discussion

The longitudinal study of the business model adopted by the company from 2006 to 2008 showed significant changes in the values assumed by the core parameters identified. From a superficial analysis, one could infer that the reasons for such differences are to be traced back exclusively to exogenous factors, i.e., the complexity, dynamicity and hard predictability of the Mobile Content market: one could conclude that the relatively inexperienced new entrant, unexpectedly finding itself at the mercy of turbulent competition and uncertainty, tried to shift from a rigid to a flexible business model, applying this choice at all levels. Instead, the profound reassessment of the initial business model configuration was only the tip of the iceberg: the company was in fact undergoing a deeper redefinition and rebalancing of the underlying strategy that drove the first business model design; a strategy that proved deficient in the first two years of activity. As can be inferred by deeply analyzing the interviews and the additional sources available, in 2006, after the Management Buy-Out, the company found itself in the condition of looking for new revenues to sustain its growth. Almost accidentally getting in contact with players belonging to the Mobile industry – these firms needing an adaptation of the traditional IVR platform for new purposes, different from the ones institutionally established – the company sought to take advantage of a contingent opportunity, and started collecting information on the Mobile Content market. On the basis of such information and data – which, from an ex-post analysis, can be labeled as fragmentary and incomplete – and lacking the necessary insight, a sloppy external strategic analysis was performed, which made the top management conclude that the
market was attractive and extremely profitable. A deeper external analysis would have made it possible to identify the market’s peculiarities, as well as the threats inherent in the video segment, and to develop a business strategy accordingly. A much deeper focus was put on carrying out an internal strategic analysis aimed at identifying how the company could be adapted to fit the new business: the result was that, since the products could easily be adjusted to respond to the apparent needs of Mobile players, the top management was confident that the company could rapidly take on the role of MMTP, substantially replicating the model adopted in the traditional business. Following the corporate strategy input, the management chose to “rush into” the neighboring business area where it could pursue correlated diversification. However, insufficient effort was put into the development of a dedicated strategy at the business level, which lacked an adequate external analysis and suffered from an imbalance towards internal analysis. The CEO’s final statement concerning the relevance of the traditional business in comparison to the start-up market almost appears to be an admission that Mobile Content was never a strategic priority for the company. The “lame strategy” resulting from this excessive “inward focus” taken by the management was not suitable for driving competition in the newly entered market. It also made it difficult to identify the right business model, which represents a concretization of the overall strategy. In terms of value proposition, the company was essentially bringing to a new market a slightly modified version of its traditional product portfolio, thus showing a “technology-driven” approach implying the search for an application for a technology already available, rather than an answer to real customers’ needs – which, in the initial strategy development process, were never actually assessed. In the first two years of activity in the Mobile Content market (2006-2008), the top management gradually became aware of the mistakes made in the strategy definition process, which had resulted in a wrong approach towards business model design with reference to the choices in the parameters’ design. The negative impact of the adoption of a rigid business model on the company’s performance was also amplified by exogenous factors like market dynamicity and turbulence. In order to get back on track, the management then looked for a repositioning of its offer, which ineluctably had to pass through the reshaping of the initial business model and the further rebalancing of its underlying business strategy. In this effort to rebalance the strategy and reshape the business model straightaway, while continuing to operate within the market, the search for constant dialogue and interaction with other firms in the value network became a key element. Through this new “outward focus”, the management tried to compensate for the initial lack of insight into the market context by collecting information on the competitive environment through a wide set of inter-firm relationships, thus maturing a “learning by doing” experience. In addition, the open and collaborative approach helped the company diminish the impact of external uncertainty by sharing opportunities and risks with new partners. As the company is now changing direction, it will need to find the right alignment between business model parameters through a “two-footed strategy” balancing both the inward and the outward focus.
Of course, the rebalancing process will not be completed straightforwardly. In the short run, such a reorientation could also lead to
choices which appear strategically incoherent, like the sudden reversal of the original stance concerning revenue sharing models. Concerning the comparison between the consecutive approaches towards strategy definition and business model design, the initial configuration sees the development of a “lame strategy”, resting only on internal strategic analysis, and a weak link between strategy creation and business model definition. On the contrary, the current configuration is characterized by a stronger tie binding the “two-footed strategy” – i.e., a strategy founded on both external and internal analysis – to the business model, and by the striving to find the right alignment between internal and external focus. What emerges from the longitudinal case study is that, especially when facing convulsive change and uncertainty, the correct balance between external and internal strategic analysis – between the “inward focus” and the “outward focus” – is essential for developing an adequate business model. Had the company developed a well-balanced strategy before entering the new market, the impact of the fast-changing environment on the initial business model would have been less dramatic, and would not have determined such radical changes in the parameter values.
8 Conclusions

The research made it possible to identify the core business model design parameters for Mobile Middleware Technology Providers; moreover, it shed light on the relationship between business model design and strategy definition. Concerning the first research objective, the findings show that some key business model parameters identified by the existing literature can be applied to MMTPs’ business model design activity, while others were missing or not made explicit. With reference to the influence of the context’s dynamicity and turbulence on business model design, the research demonstrates that what really matters in determining a change in the business model adopted are not only exogenous factors – e.g., dynamicity, high competitive pressure, uncertainty, etc. – but also endogenous elements, like the nature and quality of the strategy definition process, the alignment between external and internal strategic analysis, and the ties binding strategy to business model design. Business model design is intimately related to strategy, as the latter determines the former’s adequacy and performance. While a “lame strategy”, where external and internal analysis are not correctly balanced, leads to the development of an unstable business model, potentially more influenced by external dynamics, a strategy well grounded on both an inward and an outward focus represents a solid foundation for the business architecture, making it less vulnerable to uncertainty and change. The paper’s value for researchers lies in its contribution to Value Network, business model design and strategy definition theories. Existing literature on the Value Network – with specific reference to the Mobile Content Network – was extended through the provision of a unified definition of the player typology under scrutiny and of its role in terms of activities covered; in addition, the intrinsic value of establishing and maintaining a wide set of inter-firm relationships to obtain a more central role in the network was evidenced. Business model design literature was applied to the study of a new player typology, and original design parameters have emerged. Moreover, the relation between strategy creation and
business model design was made explicit, through an in-depth analysis of how choices made at the strategy definition level affect the business model design process. The essentiality of achieving the right balance between external and internal strategic analysis, expressed by the “inward focus” and “outward focus” concepts, was also confirmed. The value for practitioners lies in the creation and provision of a “reference model” capable of supporting the decision-making process of business model design for an MMTP, as it presents strong ties between business model parameters and their strategic implications. The business model parameters presented in the reference framework are confirmed even after the change in the strategic approach: what varies is the values such parameters assume, as a consequence of the management’s new strategic orientation – passing from the “inward focus” to the “outward focus”. Moreover, the study provides a “noteworthy case” of how the interpretation of corporate strategy’s priorities can influence business strategy definition and, in turn, business model design. The research represents a significant step towards the development of business model design theory with reference to MMTPs. Future work will need to confirm the generalizability of the results, applying the reference framework to a wider sample of players, and to evaluate the impact of strategy definition on business model design in different contexts.
References

1. Little, A.D.: Key Success Factors for M-Commerce (2001), http://www.adlittle.com
2. Ballon, P.: Business modelling revisited: the configuration of control and value. Info 9(5), 6–19 (2007)
3. Barney, J.: Firm resources and sustained competitive advantage. Journal of Management 17(1), 99–129 (1991)
4. Bertelè, U., Rangone, A., Renga, F.: Mobile goes Web. Web goes Mobile. Mobile Content Observatory Research Report (2008), http://www.osservatori.net
5. Blind, K.: Interoperability of software: demand and solutions. In: Panetto, H. (ed.) Interoperability of Enterprise Software and Applications, pp. 199–210. Hermes Science, London (2005)
6. Bonoma, T.V.: Case research in marketing: opportunities, problems, and a process. Journal of Marketing Research 22, 199–208 (1985)
7. Camponovo, G., Pigneur, Y.: Business Model Analysis applied to Mobile Business. In: Proceedings of ICEIS 2003 (2003)
8. Eisenhardt, K.M.: Building theories from case study research. Academy of Management Review 14(4), 532–550 (1989)
9. Eisenhardt, K.M., Graebner, M.E.: Theory building from cases: opportunities and challenges. Academy of Management Journal 50(1), 25–32 (2007)
10. Fjeldstad, Ø.D., Becerra, M., Narayanan, S.: Strategic action in network industries: an empirical analysis of the European mobile phone industry. Scandinavian Journal of Management 20, 173–196 (2004)
11. Gersick, C.: Time and transition in work teams: Toward a new model of group development. Academy of Management Journal 31, 9–41 (1988)
12. Ghezzi, A.: Emerging Business Models and strategies for Mobile Middleware Technology Providers. In: Proceedings of the 17th European Conference on Information Systems (ECIS 2009), Verona, Italy (2009)
13. Ghezzi, A.: A Strategic Analysis Reference Model for Mobile Middleware Technology Providers. In: Proceedings of the 8th International Conference on Mobile Business (ICMB 2009), Dalian, China (2009)
14. Ghezzi, A., Renga, F., Cortimiglia, M.: Value Networks: scenarios on the Mobile Content market configurations. In: Proceedings of the 8th International Conference on Mobile Business (ICMB 2009), Dalian, China (2009)
15. Glaser, B., Strauss, A.: The discovery of grounded theory. Aldine de Gruyter, New York (1967)
16. Hamel, G., Prahalad, C.K.: Competing for the future. Harvard Business School Press, Boston (1994)
17. Kuo, Y., Yu, C.: 3G Telecommunication operators’ challenges and roles: a perspective of mobile commerce value chain. Technovation, 1347–1356 (2006)
18. Li, F., Whalley, J.: Deconstruction of the telecommunications industry: from value chain to value network. Telecommunications Policy 26, 451–472 (2002)
19. McCutcheon, D.M., Meredith, J.R.: Conducting case study research in operations management. Journal of Operations Management 11(3), 239–256 (1993)
20. Meredith, J.: Building operations management theory through case and field research. Journal of Operations Management 16, 441–454 (1998)
21. Muller-Veerse, F.: Mobile Commerce Report. Durlacher Research Ltd. (1999), http://www.durlacher.com/downloads/mcomreport.pdf
22. Osterwalder, A.: The Business Model Ontology. A proposition in a design science approach. PhD Thesis, Ecole des Hautes Etudes Commerciales de l’Université de Lausanne (2004)
23. Peppard, J., Rylander, A.: From Value Chain to Value Network: an Insight for Mobile Operators. European Management Journal 24(2) (2006)
24. Pettigrew, A.: The management of strategic change. Blackwell, Oxford (1988)
25. Porter, M.E.: Competitive advantage: Creating and sustaining superior performance. Free Press, New York (1985)
26. Rappa, M.: Business models on the Web: managing the digital enterprise. North Carolina State University (2000)
27. Teece, D.J., Pisano, G., Shuen, A.: Dynamic capabilities and strategic management. Strategic Management Journal 18(7), 509–533 (1997)
28. Timmers, P.: Business models for electronic commerce. Electronic Markets 8(2), 3–8 (1998)
29. Turban, E., King, D.: Introduction to e-commerce. Prentice-Hall, New York (2002)
30. Weill, P., Vitale, M.: Place to Space: Migrating to E-Business Models. Harvard Business Press, Boston (2001)
31. Wirtz, B.W.: Reconfiguration of Value Chains in Converging Media and Communications Markets. Long Range Planning 34, 489–506 (2001)
32. Yin, R.: Case study research: Design and methods. Sage Publishing, Thousand Oaks (2003)
Collaboration Strategies in Turbulent Periods: Effects of Perception of Relational Risk on Enterprise Alliances*

Marco Remondino, Marco Pironti, and Paola Pisano

University of Turin, e-Business L@B, Cso Svizzera 185, Turin, Italy
{remond,pironti,pisano}@di.unito.it
Abstract. A study at the enterprise level is carried out, through a simulation model, to explore the different aggregate behaviors emerging from individual collaborative strategies and the effects of innovation diffusion. In particular, turbulent periods are taken into account (e.g., an economic, financial or environmental crisis), during which the individual perception of the enterprises can be distorted both by exogenous factors and by endogenous ones. The crisis makes evident to enterprises the urgent need to revise their business model in order to adapt it to the changes in the external environment and in the competitive scenario. The analysis is carried out by means of agent-based simulation, employing a comprehensive tool (E³) developed internally and described here. The results are mostly qualitative and show that, in response to crisis, communication complexity is reduced, power and influence become centralized, and concern for efficiency increases, leading to the conservation of resources and greater behavioral rigidity in organizations.

Keywords: Crisis, Innovation diffusion, Enterprise collaboration, Formation of alliances.
1 Introduction

Researchers have posited a variety of behaviors that will occur within organizations faced with crisis [29]. The threat-rigidity effect hypothesizes that, in response to crisis, communication complexity is reduced, power and influence become centralized, and concern for efficiency increases, leading to conservation of resources and greater behavioral rigidity in organizations. [21] has also posited that “centralization is a likely outcome of organizational threats and crises, which provides a rationale for legitimately reasserting claims to centralized control”. Individuals may also underestimate the extent to which their own behavior contributes negatively to an organizational crisis, thus reducing their flexibility of response [17]. The rapid pace of technological development and the increased globalization of the marketplace are creating a new competitive environment in which competing only with one’s own resources has come to mean abandoning opportunities and resources available from
* Although the present article is the result of a joint research project, the seven sections are divided among the authors as follows: Sections 1 and 2 were jointly written and are equally divided among the three authors; Sections 4, 5 and 6 are by Marco Remondino; Sections 3 and 7 were jointly written and are equally divided between Marco Pironti and Marco Remondino.
others. As a result, the formation of strategic alliances, defined as voluntary inter-firm co-operative arrangements, has become a noteworthy trend in recent years. In search of the so-called “collaborative advantage”, many firms are finding their performance and survival increasingly dependent on their collective activities with other firms. In this context, the role played by managerial perceptions in alliance structuring is crucial. A fair number of scholars have studied strategic decision-making in alliances, typically aiming at understanding the perceptions and decision contexts that form the basis of the partners’ decisions. The essence of this approach lies in the prominent role assigned to decision-makers in the alliance-making process. Alliance decision-makers are no longer assumed to be completely rational – rather, they are believed to have limitations in reasoning capacity. The studies about innovation prove that, besides the creation of innovations, it is also crucial to study their diffusion in the system in which the firms work and cooperate, i.e., the network. At that level, it is important to clarify what an enterprise network is and why firms start to cooperate inside the network to diffuse an innovation. A collaborative network is a whole of nodes and ties, with a different configuration based on what it has to achieve. These concepts are often displayed in a social network diagram, where nodes are the points and ties are the lines. The idea of drawing a picture (called a “sociogram”) of who is connected to whom for a specific set of people is credited to [31], an early social psychologist who envisioned mapping the entire population of New York City. Cultural anthropologists independently invented the notion of social networks to provide a new way to think about social structure and the concepts of role and position [32], [30], [6], an approach that culminated in rigorous algebraic treatments of kinship systems [43]. At the same time, in mathematics, the nascent field of graph theory began to grow rapidly, providing the underpinnings for the analytical techniques of modern social network analysis. The nodes represent the different organizations that interact inside the network, and the links represent the type of collaboration between the different organizations. The organizations can be suppliers, distributors, competitors, customers, consultants, professional associations, science partners, incubators, universities, and so on. The kind of partner firms linked over a network appears to be related to the type of innovation occurring: for example, incremental innovators frequently rely on their customers as innovation partners, whereas firms that have totally new products for a given market are more likely to collaborate with suppliers and consultants. Advanced innovators and the development of radical innovations tend to require a tighter interaction with universities. This point is supported by [18] in a survey of 4,564 firms in the Lake Constance region (on the border between Austria, Germany and Switzerland). By examining the interactions among firms, customers, suppliers and universities, it emerges that firms that do not integrate their internal resources and competences with complementary external resources and knowledge show a lower capability of releasing innovations [17]. Philippen and Riccaboni [34], in their work on “radical innovation and network evolution”, focus on the importance of local link formation and the process of distant link formation.
Regarding the formation of new linkages, Gulati [22] finds that this phenomenon is heavily embedded in an actor’s existing network. This means that new ties are often formed with prior partners or with partners of prior partners, indicating network growth to be a local process. Particularly when considering inter-firm alliances, new link formation is considered “risky business”, and actors prefer alliances that are embedded in a dense clique, where norms are more likely to be
enforceable and opportunistic behavior to be punished [21], [23], [35], [26]. Distant link formation implies that new linkages are created with partners whom are not known to the existing partners of an actor. At the enterprise level, [30] shows that distant linkage that serve as bridge between dense local clique of enterprises, can provide access to new source of information and favorable strategic negotiation position, which improves the firms’ position in the network and industry. In order to analyze the complex dynamics behind link formation and innovation diffusion, as long as their relationships, an agent based model is introduced in this work, and is formally analyzed.
2 Collaboration Strategies and Innovation Diffusion

In general, collaboration among competing firms may occur in many ways. Some examples are the joint use of complex technological or marketing processes, bundling products or setting standards. Collaboration typically requires sharing information and know-how, as well as resources. In the literature, collaboration problems are usually studied with the help of methods from microeconomics and game theory. It turns out that the most important factors affecting the usefulness of collaboration are:
- Market structure. If perfect competition prevails, collaboration is of limited use. No single firm or proper subset of firms may influence market prices and/or quantities. In a monopolistic environment there obviously is no room for collaboration. Consequently, the interesting market structure is an oligopoly. Depending on the kind of products offered and the way an equilibrium is obtained, price- or quantity-setting oligopolies may be distinguished.
- Product relationship. Products offered may be substitutes or complements. In general, we would expect that the products of competing firms are substitutes. Product differentiation, however, makes it possible to vary the degree of substitution.
- Distribution of knowledge and ability. The distribution of knowledge and ability is closely related to the possibility of generating sustained competitive advantages [10]. If a firm has specific knowledge or specific abilities that competitors do not have, it may use these skills to outperform competing firms.
- Kind and degree of uncertainty faced by competing firms. Basically, we may distinguish uncertainty with respect to common or private variables. As an example, consider demand parameters. They are called common or public variables since they directly affect profits but are not firm-specific. On the other hand, variable costs are an example of private variables [25]: they are firm-specific. Of course, knowledge of a rival’s variable costs may affect a firm’s own decisions, since it may predict the rival’s behavior more precisely.
- Risk preferences of competing firms. It is assumed that decision makers are risk-averse. Hence they will not maximize expected profits, as if they were risk-neutral, but rather the expected utility of profits.

The ties representing collaborations among firms can differ – inside the network – in structure, type and number:
- type of ties: strong or weak (depending on the type of collaboration: contracted development, licensing, research partnerships, joint ventures, acquisition of an owner of a given technology);
- structure of ties: long or short (for example, industrial districts in which firms form geographic clusters, or virtual clusters); reciprocal or not (firms that exchange competences with each other, or simply give/take);
- number of ties: dense or not (depending on the number of links among the firms).

The type and the number of ties affect the network’s efficiency: for example, a network composed of relationships with partners having few ties among them would enable control by the principal partner. A network of many non-overlapping ties would provide information benefits: in [40], the authors suggest that the number of collaborative relationships a firm is involved in is positively related to innovation output, while, conversely, closed networks have been found to foster innovation more than open ones [9]. A network composed of partners with many interlocking and redundant ties would facilitate the development of trust and cooperation. The firm’s position inside the network is as important as the number and type of ties. In [6], the authors find that, rather than maximizing the number of ties, firms should strive to position themselves strategically in the gaps between different nodes, so as to become intermediaries. Contrary to this perspective, [3] proposes that the best position is one where all the firms are tied only to the focal actor. On the other hand, [4] suggests that the benefits of increasing trust, developing and improving collaboration and reducing opportunism shape network structures, creating cohesive interconnected partnerships. These studies highlight that there is no consensus about what the optimal network configuration should be: the configuration depends on the actions that the structure seeks to facilitate. Firms start to collaborate inside a network for different reasons:
- risk sharing [16];
- obtaining access to new markets and technologies [17];
- speeding products to market [1];
- pooling complementary skills [12];
- safeguarding property rights when complete or contingent contracts are not possible [22];
- acting as a key vehicle for obtaining access to external knowledge [28], [10].

The literature on network formation and networking activity therefore clearly demonstrates that, whilst firms collaborate in networks for many different reasons, the most common one is to gain access to new or complementary competencies and technologies. Firms which do not cooperate, and which do not formally or informally exchange knowledge and capabilities, limit their knowledge base in the long term and ultimately reduce their ability to enter exchange relationships. When the innovation starts to circulate, it can affect the network’s collaboration efficiency: firms can decide to cooperate inside the network by developing an external exploration behavior, meaning that a firm decides to relate to other organizations in order to exchange competences and innovations. Otherwise, if the firm considers its internal capability to create innovation a point of strength, or if the cost of external exploration is perceived as higher than that of internal research, then it may prefer to assume an internally explorative behavior, in which it tries to create new competences (and possibly innovations) inside the organization itself. During the process of innovation diffusion, the network can change in the number of actors (exit and entry), and in the number and pattern of links [2].
The network can expand, churn, strengthen or shrink. Each network change is brought about by a specific combination of changes in tie creation, tie deletion, and changes in an actor’s portfolio size (number of links) and portfolio range (number of partners) [2]. Naturally, the modification depends on the original structure of the network. The propensity to collaborate inside a network also affects innovation diffusion. When a network is a highly collaborative one, the innovation tends to diffuse more quickly if the ties are dense, non-redundant, strong and reciprocal. If the network is a collaborative one, but the ties are weak or unidirectional, the innovation spreads slowly and may not reach all the nodes in the network. To explore and analyze these complex social dynamics, an agent-based model that takes into account most network and enterprise variables is described in the following sections.
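Before moving on, the qualitative claim above can be illustrated with a minimal, self-contained sketch – our own toy illustration, not the E³ tool used in this paper – in which firms are nodes, ties carry a strength and a direction, and an innovation spreads along a tie with probability equal to the tie's strength; all parameter values are arbitrary.

```python
import random

# Toy diffusion model (illustrative only; not the E3 simulator).
# A tie (src, dst, strength) lets an innovation spread from src to dst
# with probability 'strength' per step; a reciprocal tie appears twice.
def diffuse(ties, seed_firm, steps, rng):
    adopters = {seed_firm}
    for _ in range(steps):
        newly = {dst for (src, dst, s) in ties
                 if src in adopters and dst not in adopters and rng.random() < s}
        adopters |= newly
    return adopters

rng = random.Random(42)
# Dense, strong, reciprocal ties among 10 firms: fast diffusion.
dense = [(i, j, 0.8) for i in range(10) for j in range(10) if i != j]
# Sparse, weak, one-way chain: slow diffusion that may never reach all nodes.
weak_chain = [(i, i + 1, 0.2) for i in range(9)]

print(len(diffuse(dense, 0, 5, rng)))       # typically all 10 firms adopt
print(len(diffuse(weak_chain, 0, 5, rng)))  # typically only a few adopt
```

Even this stripped-down mechanism reproduces the qualitative pattern described above: the density, strength and reciprocity of ties jointly determine how far and how fast the innovation travels.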
3 Collaboration in Turbulent Periods

During periods of crisis, be they sector-based or systemic, and especially if extended over time, enterprises tend to modify their approach to collaborative forms, even if not in univocal ways. The crisis makes evident to enterprises the urgent need to revise their business model in order to adapt it to the changes in the external environment and in the competitive scenario. The negative conjuncture, in fact, does not involve, in the majority of cases, just a reduction of demand or of the availability of the factors of production (among which financial capital becomes a very relevant resource), but also a modification – sometimes a deep one – of the variables influencing the competitiveness and profitability of the enterprise. Business management has to face the changes induced by the new scenario by acting in an effective and well-timed way, both on strategy and on business processes. One of the main factors influencing strategic change [21] is indeed the lack of results induced by the crisis. Business processes are reengineered in order to regain efficiency and effectiveness, both to reduce costs and recover financial resources, and to increase the differentiation of products and services and adapt them to the renewed needs of current and potential customers. The impact on processes may take the form of accelerated actions of progressive but continuous improvement, also by using benchmarking techniques and increasing outsourcing. If deeper modifications are needed, the management decides on a complete reengineering, in order to reach a significant improvement in the results of the enterprise [9]. The crisis, be it real or perceived as such, has an important role as an agent of change, even if it is the only motivation for starting a reengineering process [5]. According to [9], the drive is to be found in a “crisis that will not go away”; crisis thus often weighs upon collaborative processes among enterprises, undermining the certainties that are usually created during expansion periods and that can push enterprises to be excessively self-referring. The need to reduce costs, in particular fixed ones, in order to face the reduction of business volumes, is often an important stimulus to look for – or to accelerate – collaborative forms for sharing resources and material and immaterial investments (both vertical – with suppliers – and horizontal – through consortia and bilateral deals). Strategic outsourcing, consortia for cost and investment sharing, and joint ventures are all techniques and forms that, during a crisis, can be used in a pervasive way, also to face the lower financial resources available to enterprises that, having to change
in order not to succumb, reduce the diffidence that, during a positive period, would increase their resilience and rigidity towards collaborative forms. An example comes from the automotive sector, where the sharing of platforms, gearboxes and engines usually becomes more evident during periods of crisis, in order to share investments and thus reduce the individual break-even point. In order to find new market openings and to motivate buying processes, innovation becomes more and more crucial, and enterprises that try to enlarge their offering have to balance the need to invest in R&D with the few available financial resources. Collaborative forms can be an important answer here too, through forms of knowledge-sharing. These conditions confront firms with an important strategic dilemma. On the one hand, exploiting existing competencies may provide short-term success, although competence exploitation can become a hindrance to the firm’s long-term viability by stifling the exploration of new competencies and the development of new resources [7]. On the other hand, deploying resource combinations as a form of external exploration inherently mitigates the risk of failure, by making the new combination perform in ways that firms in the mainstream market already value, while deploying resource combinations as a form of internal exploration may dramatically enlarge the set of all possible deployments that are within someone’s ability and means to execute, increase the number of productive possibilities, and enable new market applications to emerge [5]. The growth and development of a firm depend, therefore, on its ability to develop new capabilities and resources and introduce them into the market [3]. The crisis would thus seem to be an incentive for more or less structured forms of collaboration among enterprises. Notwithstanding this, during periods of crisis many forms of collaboration lose intensity and effectiveness. In particular, collaborative relations based on activities featuring a higher degree of risk – e.g., those requiring higher investments or with a longer period of return on investment – are the most affected. Many enterprises, when facing the need to regain profitability and financial resources in the short term – also due to problems linked with financial constraints – significantly reduce their investments in activities, often carried out in collaboration with other enterprises or research facilities, for which they do not perceive a return on investment, or consider it too risky or too long-term. The focus on short-term effects versus a long-term strategic vision impacts managerial perceptions in collaboration structuring, changing the bias of alliance structuring based on perceptions of relational risk and performance risk. In other cases, during a period of crisis, conflicts among partners can be exacerbated, due to the subdivision of the contributions to, or of the results provided by, the collaboration. This approach could, in the medium term, be dangerous and penalize future competitiveness, since excessive risk aversion, or a strategic approach too oriented to the short term (so-called short-termism), limits the capacity to build and support any distinctive competitive advantage.
Periods of crisis, in particular, offer an enterprise the possibility to seize opportunities that will have great value during the next positive cycle; this is done also by strengthening collaborative links, allowing it to increase its knowledge capital, which is a fundamental driver of competition. The financial system, by using proper policies and tools, can play an effective and important role in preventing difficulties from inducing enterprises – in particular the most fragile ones – to scatter what they have built. Another element that could negatively influence the propensity to collaborate is the increased competition typical of
periods of negative conjuncture, during which enterprises tend to widen their competitive space, thus increasing the probability of conflicts.
4 The Effect of Decision Makers’ Perceptions on Collaboration

Our contribution focuses on the decision-makers’ perceptions regarding alliances. The perception of high risk/high trust prompts companies to prefer relational contracts – that is, strategic alliances. Managerial perceptions are considered a key ingredient in strategic decision-making. Decision-makers have been found to react to a situation based on their perceptions of a change in the environment, such as a pre- or post-crisis period. Compared to the traditional industrial organization economics approach to strategy formulation – which relies exclusively on the objective environment – the managerial perceptions view helps capture the essence of the decision process. Although a relatively low correlation between the “objective” and the “perceived” environment has been reported, theorists tend to agree that it is the “perceived” environment that is most relevant to the process of making strategic decisions. However, few studies have paid sufficient attention to the connection between the objective environment and managerial perceptions. The decisions to adopt innovations, in their widest sense, have been investigated using both international patterns and behavioral theories. The Theory of Planned Behaviour (TPB) [2] – an extension of the Theory of Reasoned Action [3] – supports, in fact, the understanding of the elements that describe an active and deliberate decisional process, set in a context characterized by constraints such as social pressures and limited resources. On closer inspection, these are central moments in the life of an enterprise, strictly connected with obtaining positions that guarantee a lasting competitive advantage, also thanks to the exploitation of new technologies [5], [33] and, today, of ICTs [38], [39]. Learning from reinforcements has received substantial attention as a mechanism for robots and other computer systems to learn tasks without external supervision. The agent typically receives a positive payoff from the environment after it achieves a particular goal or, even more simply, when a performed action gives good results. In the same way, it receives a negative (or null) payoff when the action (or set of actions) performed leads to a failure. By performing many actions over time (a trial-and-error technique), the agents can compute the expected value (EV) of each action. According to [21], this paradigm turns values into behavioral patterns; in fact, each time an action needs to be performed, its EV is considered and compared with the EVs of the other possible actions, thus determining the agent’s behavior, which is not wired into the agent itself but self-adapts to the system in which it operates. Most RL algorithms are about coordination in multi-agent systems, defined as the ability of two or more agents to jointly reach a consensus over which actions to perform in an environment. In these cases, an algorithm derived from the classic Q-Learning technique [22] can be used. The EV for an action – EV(a) – is simply updated every time the action is performed, according to the following, reported by Kapetanakis and Kudenko:

EV(a) ← EV(a) + λ × (payoff − EV(a))    (1)

where 0 < λ ≤ 1 is the learning rate.

[...]

In SWS, the bit encoded in a subset Si of y items is determined by a confidence factor c and two thresholds vtrue > vfalse. As an example, let c = 0.9, vtrue = 0.1, vfalse = 0.07. vc(Si) is the number of items greater than avg(Si) + c × δ(Si). The bit encoded in a subset Si is ‘1’ if vc(Si) > vtrue × y, ‘0’ if vc(Si) < vfalse × y, and invalid otherwise. Algorithm 1 embeds a single watermark bit b in a subset Si.
If it returns success, we insert the next bit; otherwise, we insert the same bit in the next subset. The detection algorithm works symmetrically, identifying the watermarked bit in the subsets created from Equations 3 and 4. The watermarking scheme is presented as resilient against several attacks, such as re-sorting (obviously, the actual sorting that the watermarking algorithm uses is based on a hash of the secret key and the items’ MSBs, hence re-sorting attacks do not alter the watermark detection results) and subset selection (up to 50% data cuts). The subset addition attack, however, is not discussed by the authors. The attacker inserts multiple instances of the same item into the set to distort the subsets used for watermark detection; on average, n̄/(2 × y) subsets are distorted. The watermark detection is affected based on the properties of the elements that jump from one subset to another. The effectiveness of this attack needs to be measured experimentally; however, in Section 2.1, we provide a theoretical estimate of SWS’s resilience against the subset addition attack.

2.1 Drawbacks of SWS

From our discussion above, we have identified the following drawbacks of SWS:
Algorithm 1. Single watermark bit insertion.
Input: Bit b, Subset Si
Output: bit embedded status

return success if ((b = 1 and vc(Si) > vtrue × y) or (b = 0 and vc(Si) < vfalse × y));
if b = 1 then
    while true do
        Select it1, it2 ∈ Si with it1, it2 ≤ avg(Si) + c × δ(Si);
        if it1, it2 found then
            while it1 ≤ avg(Si) + c × δ(Si) do
                it1 = it1 + incrementValue;
                it2 = it2 − incrementValue;
                return failure if DUC violated;
            end
            return success if vc(Si) > vtrue × y;
        end
    end
else
    while true do
        Select it1, it2 ∈ Si with it1, it2 > avg(Si) + c × δ(Si);
        if it1, it2 found then
            while it1 > avg(Si) + c × δ(Si) do
                it1 = it1 − incrementValue;
                it2 = it2 + incrementValue;
                return failure if DUC violated;
            end
            return success if vc(Si) < vfalse × y;
        end
    end
end
return failure;
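To make the control flow of Algorithm 1 easier to follow, a direct Python transcription is sketched below; it is our reading of the pseudocode, not code from the paper. The data usability check (DUC) and incrementValue are left unspecified by the scheme, so they appear here as a placeholder predicate and a small constant. Note how the cut avg(Si) + c × δ(Si) is recomputed as items move, which is exactly the insertion/detection mismatch criticized in drawback 1 below.

```python
import statistics

def embed_bit(subset, b, y, c=0.9, v_true=0.1, v_false=0.07,
              increment=0.01, duc_ok=lambda s: True):
    """Embed watermark bit b into subset (a mutable list of numbers).

    duc_ok is a placeholder for the paper's unspecified data usability
    check (DUC); increment stands in for incrementValue."""
    def cut():
        # The cut moves as items are modified (see drawback 1 below).
        return statistics.mean(subset) + c * statistics.pstdev(subset)

    def v_c():
        t = cut()
        return sum(1 for x in subset if x > t)

    while True:
        if b == 1 and v_c() > v_true * y:
            return True                      # bit successfully encoded
        if b == 0 and v_c() < v_false * y:
            return True
        t = cut()
        candidates = ([i for i, x in enumerate(subset) if x <= t] if b == 1
                      else [i for i, x in enumerate(subset) if x > t])
        if len(candidates) < 2:
            return False                     # no pair left to manipulate
        i1, i2 = candidates[:2]
        if b == 1:
            while subset[i1] <= cut():       # push it1 above the cut...
                subset[i1] += increment
                subset[i2] -= increment      # ...keeping the average fixed
                if not duc_ok(subset):
                    return False
        else:
            while subset[i1] > cut():        # pull it1 below the cut
                subset[i1] -= increment
                subset[i2] += increment
                if not duc_ok(subset):
                    return False
```

Detection would recompute vc(Si) on the received subsets and map it back to ‘1’, ‘0’ or invalid using the same two thresholds.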
1. We need to preserve each subset’s average during watermark insertion. If the watermark bit is 1, then we choose two items it1, it2 < avg(Si) + c × δ(Si) and increase it1 while decreasing it2 until it1 ≥ avg(Si) + c × δ(Si). This operation increases the standard deviation, so the value of avg(Si) + c × δ(Si) is different during watermark detection. This value should remain the same during insertion and detection. Hence, instead of using avg(Si) + c × δ(Si) as a bound, c × avg(Si) should be used.
2. The scheme is applicable to numeric sets that follow a normal distribution: a theoretical bell-shaped data distribution that is symmetrical around the mean and has a majority of items concentrated around the mean. This is not practical in real life, since many candidate numeric sets for watermarking might not be normally distributed. Secondly, even if we assume that the set is normally distributed, the chances of each subset following a normal distribution are even lower. Thus, a watermarking scheme should be independent of the data distribution.
3. The sorting mechanism assumes that small changes to the items do not alter the subset categorization, which is based on MSBs. However, small modifications can change an item’s MSBs when the item lies in the neighborhood of 2^x for x ∈ Z (let the set containing such items be N). For example, subtracting two from 513 (1000000001)₂ changes it to 511 (0111111111)₂, thereby modifying the MSBs. The attacker can hence select these items and add a small value to the items in the left neighborhood so that they jump to the right neighborhood, and vice-versa. SWS does not address this constraint or possible solutions.
4. The watermarking scheme actually relies on the enormity of the available bandwidth, with majority voting being used to determine the correct watermark bit. For an m-bit watermark that is embedded l times, the data set needs to have m × l × y items. As an illustrative figure, for a 32-bit watermark to be embedded just five times in subsets containing 20 items, we need 3,200 items in the set.
5. Vulnerable to addition attacks: Assume that the adversary adds n̄ instances of the same item to the original set of n items. The number of items in the new set is n′ = n + n̄. The added items are adjacent to each other in the sorted set, which is divided into y subsets, each containing n′/y items. The starting index of the added items can be 1, ..., n + 1 with equal probabilities. Let the probability of detecting the watermark correctly be P(A, i) when the starting index of the added items in the sorted set is i. Therefore, the overall detection probability is (1/(n + 1)) × Σ_{i=1}^{n+1} P(A, i). From Figure 1, the modified subsets are divided into three categories:
(a) G1: subsets containing items with index lower than that of the added items and not containing any added items.
(b) G2: subsets containing added items.
(c) G3: subsets containing items with index higher than that of the added items and not containing any added items.
Each modified subset S′i ∈ G1 contains σi = n/y − i × n̄/y items of the original subset Si and ζi = i × n̄/y items of the next original subset Si+1. At some point, either the added items are encountered, or σi becomes 0 (since gcd(n, n̄) > 1). In the second case, the modified subset S′i will contain n/y − i × n̄/y items of the next original subset Si+1 and ζi items of Si+2 (σi is 0 in this case, since the subset does not contain any of the original items). Thus, the probability of the correct watermark bit wi being detected in subsets in G1 is F(σi, n′/y), where F(a, b) is the probability of the correct watermark bit being detected in a subset with a of the original b items remaining. The probability of all |G1| watermark bits being detected correctly is given as follows:

P(d1) = ∏_{i=1}^{|G1|} F(σi, n′/y)    (5)
The second group G2 can be further divided into two categories:
(a) G21: subsets containing both original and new items from the same subset (the only possibility of this is with the first subset in G2);
(b) G22: subsets containing none of the original items.
[Figure 1 shows subset generation before and after the data addition attack: the original set of n items is divided into subsets of n/y items; the attacker inserts n̄ copies of one item at a single location of the sorted set; the resulting set of n + n̄ items, divided into subsets of (n + n̄)/y items, splits into the groups G1, G2 and G3.]

Fig. 1. Subset generation after data addition attack (multiple instances of the same item added; their location in the sorted set is shown in red)
Watermark detection probability in G21 is F(σ(|G1|+1),1, n′/y), and in G22 it is F(0, n′/y), achieving the overall watermark detection probability given below.

P(d2) = F(σ(|G1|+1),1, n′/y) × ∏_{i=1}^{|G2|−1} F(0, n′/y)    (6)
None of the subsets in G3 contain any of the original items, and therefore the probability of detecting the watermark correctly equals:

P(d3) = ∏_{i=1}^{|G3|} F(0, n′/y)    (7)
The overall probability of detecting the watermark in the new set, P(detected), is P(d_1) × P(d_2) × P(d_3). F(0, ·) is negligible since the subset contains none of the original items. It can be seen that P(detected) depends on the starting index of the added items in the modified set; if the added items are towards the front of the index-based sorted set, then the watermark is more likely to be erased.
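To illustrate the subset drift that this analysis formalizes, the following Python sketch (our illustration, not part of SWS) counts how many items of each original subset remain in the corresponding subset after n̄ adjacent duplicates are inserted at a given sorted index:

```python
def surviving_originals(n, n_bar, y, start):
    """For each of the y subsets, count how many items after the attack
    still come from the matching original subset. The n_bar duplicates
    are adjacent in the sorted set, starting at 0-based index `start`."""
    size_old, size_new = n // y, (n + n_bar) // y
    # label every position in the attacked sorted set by its origin
    origin = list(range(start)) + [-1] * n_bar + list(range(start, n))
    survivors = []
    for s in range(y):
        block = origin[s * size_new:(s + 1) * size_new]
        old_block = set(range(s * size_old, (s + 1) * size_old))
        survivors.append(sum(1 for o in block if o in old_block))
    return survivors

# e.g. 1000 items, 100 duplicates inserted near the front, 10 subsets:
# the per-subset survivor counts decay just like the sigma_i above
print(surviving_originals(1000, 100, 10, start=50))
```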
3 Proposed Scheme

We propose a watermarking scheme that inserts a single watermark bit in each of the items selected from a numeric set. During detection, we check whether an item carries a watermark bit and verify whether the bit extracted from the watermarked item matches the expected watermark bit. If the proportion of items for which the extracted bit matches the watermark bit, out of the total number of items carrying a watermark bit, is above a certain threshold, the watermark is successfully detected.
During the insertion algorithm, the watermark should ideally be spread evenly across the set and should be sparse enough that it can survive active attacks. We distribute the watermark evenly across the set by selecting items based on their MSBs. It is made sparse by embedding a watermark bit in one of every γ items on average. This can be done by checking whether γ divides λ, where λ is a one-way hash of the concatenation of MSB(f, s_i) and a secret key K, as follows:

λ = H(MSB(f, s_i)K)    (8)
We assume that there are ξ LSBs that can be modified without substantially reducing the data's utility (the value of ξ can be adjusted by the owner). The maximum distortion to the data without compromising its quality is 2^ξ. The watermark bit is λ (mod 2). The owner marks v out of n items. If the detection algorithm identifies v′ items as watermarked, out of which u′ items carry the correct bit, then watermark presence is established if u′/v′ > α. Higher values of the confidence level α imply a lower false positive probability but also lower resilience against attacks. The value α should be set to an optimal value, usually between 0.6 and 0.8. Finally, the bit location to be replaced by the watermark bit is identified. We input the maximum percentage change that can be introduced in an item, ε, and generate ξ = ⌈log_2(s_i × ε)⌉. We insert the replaced bit into the fractional part to enable reversibility. The location at which this bit is inserted in the fraction part is chosen as τ = λ (mod β), where β is the number of bits used to store the fraction part.

As discussed in Section 2.1, even a small distortion during insertion or by the attacker can modify the MSBs if the item lies in N, and therefore affect the detection process. Let one out of every ε items from S lie in N. Upon inserting a bit in an item from N, the watermarked item is ignored during detection with probability (γ − 1)/γ, which simply reduces the number of items in which watermark bits are detected. There is a 1/γ probability that the modified item is still detected as carrying a watermark bit (Algorithm 3, line 5). When this happens, there is a 50% probability that the detected bit is, in fact, the correct watermark bit (Algorithm 3, line 11). Thus the overall probability of a watermark bit being detected incorrectly is 1/(2εγ). In normal circumstances this is less than 1%, since usually ε < 10 and γ ≈ 10. In stricter conditions where even a small proportion of affected watermark bits is unacceptable, a solution is to ensure that abs(s_i − 2^x) > 2^ξ, where abs(x) returns the absolute value of x ∈ R. Thus, an item s_i is chosen to carry a watermark bit if λ (mod γ) = 0 AND abs(s_i − 2^x) > 2^ξ. From a security perspective, the attacker can ignore the n/ε items that are in N while trying to remove the watermark, but apart from that (s)he gains no benefit.

3.1 Watermarking Algorithms

The insertion and detection processes are provided in Algorithms 2 and 3 respectively. In these algorithms, lsb(x, y) refers to the y-th LSB of the value x.
Algorithm 2. Watermark insertion.

Input: Numeric set S = {s_1, ..., s_n}, change limit ε, bits used for fraction part β, secret key K, watermarking fraction γ
Output: Watermarked set S_w
for i = 1 to n do
    λ = H(MSB(f, s_i)K);
    τ = λ (mod β);
    ξ = ⌈log_2(s_i × ε)⌉;
    if λ (mod γ) = 0 then
        // 2^x is the power of 2 closest to s_i
        if abs(s_i − 2^x) > 2^ξ then
            int = ⌊s_i⌋;
            frac = s_i − int;
            b = λ (mod ξ);
            lsb(frac, τ) = lsb(int, b);
            lsb(int, b) = λ (mod 2);
        end
    end
end
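To make the procedure concrete, here is a small Python sketch of the insertion step; it is our illustrative reading of Algorithm 2, not the authors' implementation. SHA-256 stands in for the unspecified hash H, the parameter defaults (f = 8 MSBs, β = 16 fraction bits, γ = 10, ε = 0.01) are arbitrary choices, and items are assumed positive:

```python
import hashlib
import math

def _lam(s, key, f=8):
    """One-way hash of the f MSBs of s concatenated with the secret key."""
    msb = bin(int(s))[2:].zfill(f)[:f]
    return int.from_bytes(hashlib.sha256((msb + key).encode()).digest(), "big")

def insert_watermark(items, key, gamma=10, eps=0.01, beta=16, f=8):
    out = []
    for s in items:
        lam = _lam(s, key, f)
        xi = max(1, math.ceil(math.log2(max(s * eps, 2))))  # modifiable LSBs
        x = round(math.log2(s))              # 2**x closest to s (s > 0 assumed)
        if lam % gamma == 0 and abs(s - 2 ** x) > 2 ** xi:
            tau = lam % beta                 # fraction-bit slot for the old bit
            b = lam % xi                     # integer LSB carrying the mark
            ip, frac = int(s), int((s - int(s)) * (1 << beta))
            old = (ip >> b) & 1
            frac = (frac & ~(1 << tau)) | (old << tau)   # save replaced bit
            ip = (ip & ~(1 << b)) | ((lam % 2) << b)     # embed lam mod 2
            s = ip + frac / (1 << beta)
        out.append(s)
    return out
```

Note that the hash depends only on the MSBs, which the embedding leaves untouched (items near powers of two are excluded), so detection can re-derive the same λ, τ and b.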
Algorithm 3. Watermark detection.

Input: Watermarked set S_w, change limit ε, bits used for fraction part β, secret key K, watermarking fraction γ, confidence level α
Output: Watermark presence status, original set S = {s_1, ..., s_n}
1   for i = 1 to n do
2       λ = H(MSB(f, s_i)K);
3       τ = λ (mod β);
4       ξ = ⌈log_2(s_i × ε)⌉;
5       if λ (mod γ) = 0 then
6           // 2^x is the power of 2 closest to s_i
7           if abs(s_i − 2^x) > 2^ξ then
8               int = ⌊s_i⌋;
9               frac = s_i − int;
10              b = λ (mod ξ);
11              if lsb(int, b) = λ (mod 2) then
12                  match = match + 1;
13                  lsb(int, b) = lsb(frac, τ);
14              end
15              total = total + 1;
16          end
17      end
18  end
19  return true if match/total > α, otherwise false;
4 Analysis and Experimental Results

4.1 False Positive Probability

First we discuss the false positive probability of our watermarking scheme, that is, the chances of the watermark detection algorithm detecting a watermark in an unmarked set S with parameters secret key K, fraction γ and confidence level α. The number of items in a random set identified as containing a watermark bit is n′ = n/γ, and the probability that the watermark bit is detected correctly for an item is 1/2. Hence, the probability that at least a proportion α of the watermark bits is identified correctly is given in Equation 9. This false positive probability is extremely low and has been shown to be around 10^{-10} in [1].

\sum_{i=α·n/γ}^{n/γ} \binom{n/γ}{i} (1/2)^i (1/2)^{n/γ−i} = \sum_{i=α·n/γ}^{n/γ} \binom{n/γ}{i} (1/2)^{n/γ} = 2^{−n/γ} \sum_{i=α·n/γ}^{n/γ} \binom{n/γ}{i}    (9)
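Eq. (9) is just a binomial tail, so the false positive probability can be evaluated directly; a small Python check using exact integer arithmetic:

```python
from math import ceil, comb

def false_positive_probability(n, gamma, alpha):
    """Binomial tail of Eq. (9): an unmarked set yields about n/gamma
    candidate bits, each matching by chance with probability 1/2."""
    trials = n // gamma
    lo = ceil(alpha * trials)
    return sum(comb(trials, i) for i in range(lo, trials + 1)) / 2 ** trials

# vanishingly small for realistic set sizes (parameters here are ours)
print(false_positive_probability(n=10000, gamma=10, alpha=0.65))
```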
4.2 Security

The attacks and our scheme's resilience to them are described next.

1) Set Re-ordering. The re-ordering attack is ineffective against the watermarking model, since each item is individually watermarked and checked for watermark bit presence.

2) Subset Addition. Let the attacker add a subset S_1 containing n_add items to the watermarked set S_2 containing n items. n/γ out of the n/γ watermark bits will still be detected correctly in S_2. From S_1, a total of n_add/γ items will probabilistically be detected as marked, and for each item considered to be marked, the watermark bit will be detected correctly with probability 0.5. Thus, the expected number of correctly detected bits from S_1 is n_add/(2γ). The overall watermark detection ratio is (1 + n_add/(2n)) / (1 + n_add/n). For 50% (n_add/n = 1/2) and 100% (n_add/n = 1) data additions, the expected watermark detection ratio is 5/6 and 3/4 respectively. For α = 0.7, the adversary needs to add at least 150 items for every 100 items in the watermarked set to have a decent chance of destroying the watermark. For α = 0.6, the number of items that need to be added to destroy the watermark increases to 400 for every 100 items. Such levels of data addition are bound to have a derogatory effect on data usability. Figure 2 illustrates the watermark survival for increasing levels of data addition. The findings confirm our claim, with all watermarked sets surviving attacks of up to 300% data addition for α = 0.6, and 95% of the watermarked sets surviving attacks of up to 100% data addition for α = 0.7.
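The arithmetic behind these thresholds can be checked with a few lines of Python; the closed forms below simply restate the ratios derived above:

```python
def detection_ratio_after_addition(r):
    """Expected watermark detection ratio when r = n_add / n items are
    added: (1 + r/2) / (1 + r), as derived in Section 4.2, item 2."""
    return (1 + r / 2) / (1 + r)

def additions_needed(alpha):
    """Smallest r for which the expected ratio drops to alpha.
    Solve (1 + r/2) / (1 + r) = alpha  =>  r = (1 - alpha) / (alpha - 1/2)."""
    return (1 - alpha) / (alpha - 0.5)

print(detection_ratio_after_addition(0.5), detection_ratio_after_addition(1.0))
# -> 0.833... (= 5/6) and 0.75 (= 3/4)
print(additions_needed(0.7), additions_needed(0.6))   # -> 1.5 and 4.0
```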
Fig. 2. Watermark survival ratio with varying α (0.60-0.70) and data addition level n_add/n × 100 (50-300)
3) Subset Deletion. We assume that the attacker deletes n_remove items from the watermarked set containing n items, leaving n − n_remove items. The removed items have the same probability of being watermarked as the remaining items. Thus, the watermark detection ratio is ((n − n_remove)/γ) / ((n − n_remove)/γ) = 1. But this does not mean that the watermarking scheme is unconditionally secure against subtractive attacks. If the number of remaining elements is extremely low, the false positive probability becomes unacceptably high and the adversary can claim that the watermark detection was accidental. However, it is in the adversary's own interest to leave sufficient items so that the set remains useful.

4 a) LSB Distortion. We assume that the attacker has knowledge of ξ for this discussion. This is to strengthen the attack and thereby provide a worst-case security analysis of the watermarking model. The attacker chooses n_d items out of the total n items and flips all ξ bits in an attempt to erase the watermark. The watermark detection algorithm gets the watermark bits incorrectly from the n_d items and correctly from the other n − n_d items. The watermark detection ratio in this case is 1 − n_d/n. This ratio needs to be at least α for the watermark to be detected. Hence, the upper limit on the number of items that can be distorted is n_d ≤ n × (1 − α). For α = 0.7, a maximum of 30% of the items can be distorted such that the watermark is preserved. Experimental results of the LSB distortion attack are provided in Figure 3. The experiment was run on 200 numerical sets and computed the proportion of times the watermark survived when all ξ LSBs of 20% to 40% of the data items were flipped. The results show that the watermarking scheme is extremely secure against LSB bit-flipping attacks when the LSBs of up to 25% of the items are flipped. For a 35% attack, the watermark survived on average 62% of the time. For α = 0.60, the watermark survival rate drops to 46% when the attack level increases to 40%. We infer from the experimental results that the optimal value of α is around 0.65, with which the watermark has a high survival possibility and at the same time a low false positive probability.

4 b) MSB Distortion. We assume that the attacker has knowledge of f for this discussion. Again, this makes the adversary stronger and provides an estimate of the watermark's resilience against acute attacks. The attacker chooses n_d items out of the total n items and flips all f MSBs, resulting in a modified λ. Watermark detection will detect the watermark bits correctly from the other n − n_d items. For the items with distorted MSBs, there are two cases:
Fig. 3. Watermark survival ratio with varying α (0.60-0.70) and distortion level n_d/n × 100 (20-40)
1. With probability (γ − 1)/γ, λ (mod γ) ≠ 0 and the item is not considered as carrying a watermark bit.
2. With probability 1/γ, λ (mod γ) = 0 and the item is still considered as carrying a watermark bit. There is a probability of 1/2 that the (λ (mod ξ))-th LSB equals λ (mod 2).

The following is an analysis of the expected value of the watermark detection ratio. Within the distorted subset, the expected number of items considered as carrying a watermark bit is (n_d − γ + 1)/γ and the expected number of items in which the watermark bit is detected correctly is (n_d − γ + 1/2)/γ. The expected value of the watermark detection ratio in the final set is (n − γ + 1/2)/(n − γ + 1).
We can see that, on average, for sufficiently large n − γ, the expected watermark detection ratio after an MSB modification attack is very close to 1. During our experiments, the watermarks were detected at all times with all f MSBs of 20% to 40% of the items being flipped. The average watermark survival proportions under the three significant attacks of LSB distortion, MSB distortion and data addition are presented in Figure 4. It can be seen from the figure that α = 0.65 is the optimal value, where the watermark has a high chance of survival while having a low false positive probability.

5) Secondary Watermarking. The security of the watermarking scheme against secondary watermarking attacks comes from the reversibility operation (storing the original bit replaced by the watermark bit in the fraction part). If r parties, O_1, ..., O_r, watermark the same numeric set sequentially, then the objective is for the first party O_1 to be established as the original and rightful owner. It has been shown that owner identification is facilitated by watermarking schemes that provide reversibility [8]. Based on the experimental results, the current watermarking scheme provides security against secondary watermarking attacks with r ≤ 5.

The watermark-carrying capacity of the watermarking scheme is |{s_i : abs(s_i − 2^x) > 2^ξ}|/γ, where 2^x is the power of 2 closest to s_i. This is much higher than the capacity of |S|/(|S_i| × m) offered by SWS, where |S| is the size of the numeric set, |S_i| is the size of the subsets and m is the number of times each watermark bit must be inserted. We designed experiments to test the watermarking capacities of both schemes
Fig. 4. Watermark survival ratio with varying α (0.60-0.70) under MSB distortion, LSB distortion and data addition
Fig. 5. Comparison of our scheme's watermarking capacity (in percent) with SWS, versus the number of times each watermark bit is embedded for SWS
with the sets ranging from 1000 to 3000 items, each watermark bit being embedded 1 to 5 times in subsets containing 25 to 200 items for SWS. Our scheme had an average watermarking capacity of 8.28% for the 60 experiments while the overall average of SWS was 0.86%. The summary of results is presented in Figure 5.
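The two capacity formulas compared in these experiments can be sketched as follows (same assumptions and parameter defaults as in the earlier sketches, which are ours):

```python
import math

def our_capacity(items, gamma=10, eps=0.01):
    """Average number of watermark bits our scheme can carry:
    items not within 2**xi of a power of two, divided by gamma."""
    eligible = 0
    for s in items:
        xi = max(1, math.ceil(math.log2(max(s * eps, 2))))
        x = round(math.log2(s))          # power of 2 closest to s (s > 0)
        if abs(s - 2 ** x) > 2 ** xi:
            eligible += 1
    return eligible / gamma

def sws_capacity(set_size, subset_size, m):
    """SWS capacity |S| / (|S_i| * m): each bit needs m subsets of |S_i| items."""
    return set_size / (subset_size * m)

print(our_capacity(range(1000, 4000)), sws_capacity(3000, 100, 5))
```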
5 Conclusions and Future Work

The watermarking algorithms presented in this paper embed one watermark bit in every γ items of the numeric set S of size n. The watermark-carrying capacity of this scheme is approximately n/γ, allowing for the inability of certain items to carry a watermark bit. Our proposed scheme places no constraints on the distribution of items in the numeric set: whereas in [11] the unmarked numeric set is required to have a near-normal distribution, our scheme can be applied to a numeric set irrespective of its distribution. It is shown through experimental results that the watermark is resilient against data addition, deletion, distortion and re-sorting attacks as well as secondary watermarking attacks. The capacity of the watermarking scheme is significantly higher than that of the previous scheme [11].
In the future, there are two directions in which we would like to carry our research forward:
1. Exploring the possibility of attacks targeted specifically at the given model and upgrading the model to provide resilience against these attacks.
2. Embedding an extractable watermark in the numeric set whilst providing the same level of security and capacity offered by our current scheme.
References

1. Agrawal, R., Kiernan, J.: Watermarking relational databases. In: Proceedings of the 28th International Conference on Very Large Databases, VLDB (2002)
2. Atallah, M.J., Raskin, V., Crogan, M., Hempelmann, C., Kerschbaum, F., Mohamed, D., Naik, S.: Natural language watermarking: Design, analysis, and a proof-of-concept implementation. In: Moskowitz, I.S. (ed.) IH 2001. LNCS, vol. 2137, pp. 185–199. Springer, Heidelberg (2001)
3. Bolshakov, I.A.: A method of linguistic steganography based on collocationally-verified synonymy. In: Fridrich, J. (ed.) IH 2004. LNCS, vol. 3200, pp. 180–191. Springer, Heidelberg (2004)
4. Bors, A., Pitas, I.: Image watermarking using DCT domain constraints. In: Proceedings of the IEEE International Conference on Image Processing (ICIP 1996), vol. III, pp. 231–234 (September 1996)
5. Collberg, C., Thomborson, C.: Software watermarking: Models and dynamic embeddings. In: Proceedings of Principles of Programming Languages 1999, POPL 1999, pp. 311–324 (1999)
6. Cox, I.J., Kilian, J., Leighton, F.T., Shamoon, T.: A secure, robust watermark for multimedia. In: Anderson, R. (ed.) IH 1996. LNCS, vol. 1174, pp. 185–206. Springer, Heidelberg (1996)
7. Cox, I.J., Kilian, J., Leighton, T., Shamoon, T.: Secure spread spectrum watermarking for images, audio, and video. In: IEEE International Conference on Image Processing (ICIP 1996), vol. III, pp. 243–246 (1996)
8. Gupta, G., Pieprzyk, J.: Reversible and blind database watermarking using difference expansion. International Journal of Digital Crime and Forensics 1(2), 42
9. Qu, G., Potkonjak, M.: Analysis of watermarking techniques for graph coloring problem. In: Proceedings of the International Conference on Computer Aided Design, pp. 190–193 (1998)
10. Sebe, F., Domingo-Ferrer, J., Solanas, A.: Noise-robust watermarking for numerical datasets. In: Torra, V., Narukawa, Y., Miyamoto, S. (eds.) MDAI 2005. LNCS (LNAI), vol. 3558, pp. 134–143. Springer, Heidelberg (2005)
11. Sion, R., Atallah, M.J., Prabhakar, S.: On watermarking numeric sets. In: Petitcolas, F.A.P., Kim, H.-J. (eds.) IWDW 2002. LNCS, vol. 2613, pp. 130–146. Springer, Heidelberg (2003)
12. Sion, R., Atallah, M., Prabhakar, S.: Rights protection for relational data. IEEE Transactions on Knowledge and Data Engineering 16(12), 1509–1525 (2004)
13. Venkatesan, R., Vazirani, V., Sinha, S.: A graph theoretic approach to software watermarking. In: Moskowitz, I.S. (ed.) IH 2001. LNCS, vol. 2137, pp. 157–168. Springer, Heidelberg (2001)
14. Zhang, Y., Niu, X.-M., Zhao, D.: A method of protecting relational databases copyright with cloud watermark. Transactions of Engineering, Computing and Technology 3, 170–174 (2004)
Corrupting Noise Estimation Based on Rapid Adaptation and Recursive Smoothing

François Xavier Nsabimana, Vignesh Subbaraman, and Udo Zölzer

Department of Signal Processing and Communications, Helmut Schmidt University, Holstenhofweg 85, 22043 Hamburg, Germany
{fransa,udo.zoelzer}@hsu-hh.de
http://www.hsu-hh.de/ant/
Abstract. This work describes an algorithm that estimates the corrupting noise power from a speech signal degraded by stationary or highly non-stationary noise sources, for use in speech enhancement. The proposed technique combines the advantages of minimum statistics and rapid adaptation techniques to address especially low-SNR speech signals. In the first step, the algorithm starts the noise power estimation using minimum statistics principles with a very short adaptation window. This yields an overestimation of the noise power that is finally accounted for using recursive averaging techniques. To ensure minimum speech power leakage into the estimated noise power, the algorithm updates the noise power using an unbiased estimate of the noise power from the minimum statistics approach. To outline the performance of the proposed technique, objective and subjective grading tests were conducted for various noise sources at different SNRs.

Keywords: Noise estimation, Minimum statistics, Recursive smoothing, Rapid adaptation, Voice activity detection, Speech presence probability, Normalized mean square error.
1 Introduction

Since environmental or background noise is the factor that most degrades the quality and intelligibility of speech, the estimation of the corrupting noise has received a lot of attention for decades. Improving the quality and intelligibility of degraded speech is very important because it enables accurate information exchange and contributes to reducing listener fatigue in highly disturbed environments. To achieve this goal, two main activities for speech enhancement need to be carried out: noise estimation and noise reduction. This contribution presents only an algorithm for the estimation of the corrupting noise, as the first step of a speech enhancement technique. The techniques to estimate the corrupting noise can be classified into two main types of algorithms: minima tracking and recursive averaging algorithms [1]. In minima tracking algorithms, the spectral minimum is continuously updated or tracked within a finite window. The Optimal Smoothing and Minimum Statistics algorithm is an example of the minima tracking type [2,3]. In recursive averaging algorithms, the noise power in the individual bands is updated recursively whenever the probability
of speech presence is very low. Minima-controlled recursive averaging for robust speech enhancement [4] and rapid adaptation for highly non-stationary environments [5] are examples of the recursive averaging type. Several further noise estimation techniques have been proposed in the literature [2,6,7,8,9]. Continuous Spectral Minima Tracking in Subbands (SMTS), proposed by Doblinger [10], is one of the classical noise estimation techniques. It is very simple, but its performance suffers from pronounced overestimation. Optimal Smoothing and Minimum Statistics (OSMS), proposed by Martin [3], is one of the most commonly used algorithms for noise estimation in speech enhancement. The noise power estimated by this approach is very good, but the algorithm fails to track quickly a rapid increase of the noise power in the corrupted speech. Rapid Adaptation for Highly Non-Stationary Environments (RAHNSE), as proposed by Loizou [5], quickly tracks sudden changes in the noise power, but this algorithm still suffers from some overestimation, as it partially relies on the SMTS approach. The motivation for this work is to combine the advantages of some of the above-mentioned techniques to derive a noise estimator which provides minimal overestimation and a small adaptation time for suddenly increasing noise power. This work thus proposes a method to update the noise power recursively with minimal speech leakage. The adaptation time of this approach is comparable to that of RAHNSE (0.5 sec). The objective grading tests and the subjective plot and spectrogram comparisons reveal that the proposed algorithm performs better than the simulated OSMS and RAHNSE approaches. The rest of the paper is organised as follows. Section 2 presents some preliminary definitions. Section 3 discusses the proposed noise estimation algorithm. Section 4 exhibits experimental results and runs a comparative study. Section 5 concludes.
2 Preliminary Definitions

Let us consider the spectrum of a corrupted speech signal to be defined as

X(k, m) = S(k, m) + N(k, m),    (1)
where S(k, m) and N(k, m) are the short-time DFT coefficients at frequency bin k and frame number m of the clean speech and the additive noise respectively. S(k, m) and N(k, m) are assumed to be statistically independent and zero mean. Adjacent frames of the corrupted speech x(n) overlap by 75% in the time domain. The power levels of the clean speech R_s(k, m), of the additive true noise R_n(k, m) and of the corrupted speech R_x(k, m) are obtained by squaring their respective magnitude spectra. In this paper an algorithm to estimate R_n(k, m) from R_x(k, m) is proposed. The estimated noise power is denoted R_ñ(k, m).
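These power spectra follow from a standard short-time DFT; a NumPy sketch with the stated 75% frame overlap is given below (the 512-sample frame matches the setup reported in Section 4.3; the Hann window is our choice, not specified here):

```python
import numpy as np

def power_spectrum(x, frame_len=512, overlap=0.75):
    """R_x(k, m): squared magnitude of the short-time DFT of x,
    with adjacent frames overlapping by 75%."""
    hop = int(frame_len * (1 - overlap))
    win = np.hanning(frame_len)
    frames = [x[i:i + frame_len] * win
              for i in range(0, len(x) - frame_len + 1, hop)]
    X = np.fft.rfft(np.asarray(frames), axis=1)   # bins k along axis 1
    return (np.abs(X) ** 2).T                     # shape: (bins k, frames m)
```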
3 Proposed Technique

Fig. 1 presents the flow diagram of the Rapid Adaptation and Recursive Smoothing (RARS) approach, the noise estimation technique proposed in this paper.
Fig. 1. RARS approach. Power Spectrum Smoothing (PSS), Bias Correction (BC), Noise Update (NU), Smoothing Parameter (SP), Speech Presence Probability (SPP), Voice Activity Detector (VAD), Smoothed SNR (SSNR)
In the RARS approach (s. Fig. 1), the noise power is first estimated using the Optimal Smoothing and Minimum Statistics (OSMS) approach [3] with a very short window. This yields an overestimation of the estimated noise power. Based on the smoothed a posteriori SNR from the OSMS noise power, a VAD index I is derived to compute the speech presence probability p and a smoothing parameter η. This smoothing parameter is finally applied to the unbiased estimated noise power R_u from the OSMS approach to account for the overestimation. In order to improve the adaptation time of the estimated noise power, a condition BC is used to quickly track fast changes in the noise power. The proposed algorithm is not an optimal solution, yet in practice it gives very good results; optimization of the proposed approach is possible. In the following, the main steps of the RARS approach (s. Fig. 1) are described individually.

3.1 Rough Estimate with OSMS

In the first step of the RARS approach, the noise power is estimated using the OSMS approach with a very short window length (0.5 - 0.6 sec). This causes an overestimate of the noise power, since the window length is very small. The noise power estimated with OSMS using the small window and the final estimate with RARS can be seen in Fig. 2, where the green curve depicts the power spectrum of the corrupted speech, while the red and black curves represent the estimated noise power with the OSMS and RARS approaches respectively. In Fig. 2 the aforementioned overestimation is clearly observed.
Fig. 2. Rough estimate with OSMS vs. final estimate with RARS. Results for frequency bin k=8.
3.2 Speech Presence Probability

To calculate the speech presence probability, the idea proposed by Cohen [4] is used. First the a posteriori SNR is calculated using the OSMS estimated noise power as

ζ(k, m) = R_x(k, m) / R_OSMS(k, m).    (2)

Since ζ(k, m) is computed using the overestimated noise power, it cannot be used directly. To overcome this effect, the a posteriori SNR is smoothed over the neighboring frequency bins to take into account the strong correlation of speech presence across the frequency bins in the same frame [4]. The smoothed SNR is given by

ζ̃(k, m) = \sum_{i=−j}^{j} w(i) · ζ(k − i, m)    (3)

where

\sum_{i=−j}^{j} w(i) = 1    (4)

and 2j + 1 is the window length for the frequency smoothing. ζ̃(k, m) is then compared with a threshold ∆ to derive a VAD index I(k, m) as follows:

I(k, m) = { 1, if ζ̃(k, m) > ∆
          { 0, otherwise,    (5)
where ∆ is an empirically determined threshold and I(k, m) = 1 represents a speech-present bin. ∆ = 4.7 was proposed by Cohen [4]. Based on the VAD index, the speech presence probability is then given by

p(k, m) = γ · p(k, m − 1) + (1 − γ) · I(k, m),    (6)
where γ is a constant determined empirically. Values of γ ≤ 0.2 are suggested for a better estimate [4]. p(k, m) is the probability of the bin containing speech: if I(k, m) = 1, the value of p(k, m) increases; if I(k, m) = 0, it decreases. It should be pointed out that Eq. (3) implicitly takes the correlation of speech presence in adjacent bins into consideration. Note also that the threshold ∆ in Eq. (5) plays an important role in speech detection. If the threshold ∆ is low, speech presence can be detected with higher confidence, thus avoiding overestimation [4].

3.3 Smoothing Parameter

With the help of the above-derived speech presence probability, a time-frequency dependent smoothing parameter

η(k, m) = β + (1 − β) · p(k, m)    (7)
is updated, where β is a constant. Values of β ≥ 0.85 yield a better estimate of η, as proposed in [4]. If p(k, m) is high, then η(k, m) will be high; if p(k, m) is low, then η(k, m) will be low. η(k, m) takes values in the range β ≤ η(k, m) ≤ 1. It is expected that the smoothing parameter will be close to 1 in speech presence regions.

3.4 Tracking Fast Changes

An algorithm to track fast changes in the noise power is proposed here. The adaptation time of the proposed algorithm is around 0.5 sec, thus close to that of Rapid Adaptation for Highly Non-Stationary Environments (the RAHNSE approach) [5]. A simple and effective idea proposed in [8] is applied here, which ensures that the proposed approach can quickly track changes in the noise power. First, a reference noise power estimate using OSMS with a short window (0.5 sec) is computed. The corrupted speech power is smoothed with a low-valued smoothing constant; the idea is to push the noise estimate in the right direction when there is an increase in noise power. The smoothed corrupted speech power is given by

P(k, m) = α · P(k, m − 1) + (1 − α) · R_x(k, m),    (8)
where values of α ≤ 0.2 are suggested for better smoothing. From the smoothed power spectrum, P_min is found over a window length of at least 0.5 sec. Because of the small
smoothing constant, the smoothed spectrum power almost follows the corrupted speech power. To account for the biased estimate, the following condition is tested:

if B · P_min(k, m) > R_OSMS(k, m), then R_u(k, m) = B · P_min(k, m),    (9)
where B > 1 is a bias correction factor. For the RARS approach, B = 1.5 yields good bias correction. If the above condition fails, then R_u(k, m) = R_OSMS(k, m). In case of an increase in noise power, B · P_min(k, m) will be greater than R_OSMS(k, m), and the value of R_OSMS(k, m) is thus replaced by B · P_min(k, m). In this case the probability is updated to p(k, m) = 0 and the smoothing parameter for the noise update is then recomputed (s. Eq. (7)). Observations in [8] reveal that the value of B and the window length are not critical, but a window length of at least 0.5 sec is necessary for good performance.

3.5 Noise Power Update

Finally, with the frequency-dependent smoothing factor η(k, m) from Eq. (7), the spectral noise power of the RARS approach is updated using

R_ñ(k, m) = η · R_ñ(k, m − 1) + (1 − η) · R_u(k, m).    (10)
The key idea of this algorithm is that instead of using the corrupted speech power R_x(k, m) to update the noise estimate [5], the unbiased estimate R_u of the noise power from the OSMS algorithm is used. Since R_u(k, m) contains minimal speech power compared to the corrupted speech power R_x(k, m), the speech power leakage into the noise power is minimized in this approach. Whenever the speech presence probability is low, the estimated noise power follows R_u(k, m); when the speech presence probability is high, the estimated noise power follows the noise power of the previous frame. Thus, as shown in Fig. 2, the proposed algorithm (black curve) avoids the overestimated values observed in the rough OSMS estimation (red curve).
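Putting Eqs. (2)-(10) together, one frame of the RARS update can be sketched in NumPy as below. This is a minimal reading of the flow, assuming the OSMS estimate R_osms and the minimum P_min of Eq. (8) are computed elsewhere; the constants use the values suggested in the text, and the uniform smoothing weights are our choice:

```python
import numpy as np

def rars_frame(Rx, R_osms, Pmin, p, Rn_prev,
               delta=4.7, gam=0.2, beta=0.85, B=1.5, j=1):
    """One frame of the RARS noise update.
    Rx, R_osms, Pmin, p, Rn_prev are 1-D arrays over frequency bins k."""
    zeta = Rx / np.maximum(R_osms, 1e-12)                # Eq. (2)
    w = np.ones(2 * j + 1) / (2 * j + 1)                 # Eq. (4): weights sum to 1
    zeta_s = np.convolve(zeta, w, mode="same")           # Eq. (3)
    I = (zeta_s > delta).astype(float)                   # Eq. (5)
    p = gam * p + (1 - gam) * I                          # Eq. (6)
    Ru = np.where(B * Pmin > R_osms, B * Pmin, R_osms)   # Eq. (9)
    p = np.where(B * Pmin > R_osms, 0.0, p)              # rapid adaptation reset
    eta = beta + (1 - beta) * p                          # Eq. (7)
    Rn = eta * Rn_prev + (1 - eta) * Ru                  # Eq. (10)
    return Rn, p
```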
4 Experimental Results

Fig. 3 presents the comparison between the OSMS, RAHNSE and RARS approaches in terms of rapid adaptation and true minimum estimation. This simulation was run on a mixed signal where the first 500 frames consist of clean speech only and the last 500 frames consist of the same clean speech corrupted with car noise at 5 dB SNR. The estimation for both parts of the mixed signal reveals the best minimum estimate for the RARS approach, followed by OSMS. The best rapid adaptation is shown by RAHNSE, followed by the RARS approach. The adaptation time of the proposed approach is also around 0.5 to 0.6 sec, as in the RAHNSE approach. A comparison at only one specific frequency bin may not be sufficient to assess the performance of the three approaches. For the sake of completeness, Figure 4 thus presents a comparative study of the estimated noise in terms of spectrograms. The result of the RARS approach (s. Fig. 4, lower right plot) is clearly close to the true noise (s. Fig. 4, upper left plot). Some pronounced overestimation is observed in the RAHNSE approach (s. Fig. 4, lower left plot), especially in the high frequency bands. The OSMS result in Fig. 4 (upper right plot), which is closer to the RARS result than to the RAHNSE one, still reveals some over- and underestimation in some frequency bands.
Fig. 3. Comparison in terms of true noise estimate and rapid adaptation time. True car noise (green), RAHNSE (blue), OSMS (red) and RARS (black).
4.1 Objective Quality Measurement

A comparative study in terms of the Normalized Mean Square Error (NMSE), the Weighted Spectral Slope (WSS) and the Log Likelihood Ratio (LLR) [1,11] has also been conducted. As these three parameters are distance measures, the best result is the lowest one. The Normalized Mean Square Error (NMSE) is computed here as
NMSE = \frac{1}{M} \sum_{m=0}^{M−1} \frac{\sum_{k=0}^{L−1} [R_n(k, m) − R_ñ(k, m)]^2}{\sum_{k=0}^{L−1} [R_n(k, m)]^2}    (11)
where R_n(k, m) is the true noise power and R_ñ(k, m) represents the estimated noise power. Ideally the value of NMSE lies in the interval [0, 1], where 0 represents perfect estimation and 1 represents very poor estimation. In practice, however, the NMSE value can be greater than 1 due to overestimation: whenever the algorithm overestimates, R_ñ(k, m) can be more than twice R_n(k, m), and the ratio in Eq. (11) can therefore exceed 1. All the signals used for the simulations in this paper are from the Noisex-92 database, taken from the Sharon Gannot and Peter Vary web pages. Table 1 presents the NMSE results for three kinds of corrupting noise and the three compared noise estimation techniques, while Tables 2 and 3 depict respectively the WSS and LLR results for the same scenario.
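Eq. (11) translates directly into a few lines of NumPy, given the true and estimated noise-power matrices:

```python
import numpy as np

def nmse(Rn_true, Rn_est):
    """Eq. (11): frame-averaged, per-frame-normalized squared error.
    Both arguments are (L bins x M frames) arrays."""
    num = np.sum((Rn_true - Rn_est) ** 2, axis=0)
    den = np.sum(Rn_true ** 2, axis=0)
    return float(np.mean(num / den))
```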
Table 1. NMSE results for (a) car noise at 5 dB, (b) room noise at 9 dB and (c) white noise at 9 dB.

          car noise at 5 dB   room noise at 9 dB   white noise at 9 dB
OSMS      0.740               0.211                0.023
RAHNSE    0.692               0.391                0.011
RARS      0.601               0.061                0.007
Fig. 4. Subjective study of spectrograms for the estimated noise. True car noise (upper plot left), OSMS (upper plot right), RAHNSE (lower plot left) and RARS (lower plot right).
These results reveal that the RARS approach is graded best. While Tables 1 and 2 show that the RAHNSE approach is graded second for two kinds of corrupting noise, Table 3 shows that the OSMS approach remains close to the RARS approach for all three kinds of corrupting noise. Overall, the RARS approach is graded best on all three parameters.

4.2 Subjective Quality Measurement

Although objective quality measures may indicate which technique is graded best, the results should be supported by a subjective quality measure for the sake of completeness. This can be done in terms of spectrogram or plot comparisons on the one hand and informal listening tests on the other. Figs. 5 - 7 thus present the results of a subjective comparison between the true noise and the estimated noise for a speech signal corrupted by car noise at 5 dB, room noise at
Table 2. WSS results for (a) car noise at 5 dB, (b) room noise at 9 dB and (c) white noise at 9 dB.

          car noise at 5 dB   room noise at 9 dB   white noise at 9 dB
OSMS      26.66               19.80                23.05
RAHNSE    20.02               29.46                21.14
RARS      17.38               15.46                18.52
Table 3. LLR results for (a) car noise at 5 dB, (b) room noise at 9 dB and (c) white noise at 9 dB.

          car noise at 5 dB   room noise at 9 dB   white noise at 9 dB
OSMS      0.53                0.10                 0.09
RAHNSE    0.93                0.21                 0.12
RARS      0.39                0.09                 0.07
Fig. 5. Estimated noise power for speech signal corrupted by car noise at 5dB. Results for frequency index k=5.
9 dB and white noise at 9 dB. In Figs. 5 - 7, the green, red, blue and black curves represent respectively the true noise power and the noise power estimated by the OSMS, RAHNSE and RARS approaches. For simplicity, the comparison is presented for the estimated noise power at frequency index k = 5. Fig. 5 presents the plot of the true noise power and the estimated noise power for a speech signal corrupted by car noise at 5 dB. The purpose of the estimator is to find the mean value of the green curve from the corrupted speech power. It can be noticed that the red curve is below the expected mean value of the green curve, while the blue curve (noise power estimated by RAHNSE) is rather high. It clearly reveals some
Fig. 6. Estimated noise power for speech signal corrupted by room noise at 9dB. Results for frequency index k=5.
overestimation. It is evident that the black curve (noise power estimated by the RARS approach) clearly follows the expected mean of the true noise power (green curve). Fig. 6 depicts the plot of the true noise power and the estimated noise power for a speech signal corrupted by room noise at 9 dB. The green curve again represents the true noise power. The black curve (noise power from RARS) reveals some underestimation of the noise power in the region of frames 75 to 150; outside this region it follows the mean of the true noise power. The blue curve (noise power from RAHNSE) and red curve (noise power from OSMS) are quite close and seem to follow the expected mean value rather well in this case. Fig. 7 shows the plot of the true noise power and the estimated noise power for a speech signal corrupted by white noise at 9 dB. The green curve again depicts the true noise power. The red curve (noise power from OSMS) represents an underestimate. The blue curve (noise power from RAHNSE) and black curve (noise power from RARS) are quite close, but a closer inspection shows that the black curve truly follows the expected mean of the green curve. This subsection has presented a way to subjectively compare the results of the investigated techniques at a specific frequency bin k. As the estimation is generally done per frequency bin k and frame number m, this method can help to assess the performance of each technique over the entire signal, and to observe the evolution of the estimated noise power at frequencies of interest. Although the target noise power was estimated from the corrupted speech power without any knowledge of
Fig. 7. Estimated noise power for speech signal corrupted by white noise at 9dB. Results for frequency index k=5.
the true noise power, the results obtained with the RARS approach remain close to the expected mean of the true noise.

4.3 Speech Enhancement Context

As stated in Section 1, noise estimation is one of the two main activities needed to improve the quality and intelligibility of degraded speech. It is indeed the first step of a speech enhancement technique, as the computation of the gain function in the noise reduction part depends largely on the estimated noise power. For this reason the RARS approach [12] has been evaluated in the speech enhancement context as detailed in [13], using the technique proposed in [14]. To obtain a fair comparison, tests were carried out for different SNRs using additive white Gaussian noise. A window length of 512 samples with a hop size of 25% for analysis and synthesis was applied for all approaches. The spectrogram results [13] show that the RARS - IPMSWR approach preserves sibilants (s-like sounds) even for very low SNRs (5 - 10 dB). Informal listening tests conducted in [14] outlined the best performance for the IPMSWR approach, which is preserved even when using the estimate from the RARS approach.
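As a generic illustration of this dependence (not the IPMSWR gain of [14]), a simple spectral-subtraction-style gain per bin could be applied as follows:

```python
import numpy as np

def apply_gain(X, Rn_est, floor=0.1):
    """Attenuate each DFT coefficient by an SNR-driven gain computed from
    the estimated noise power (generic spectral subtraction sketch)."""
    Rx = np.abs(X) ** 2
    gain = np.sqrt(np.maximum(1.0 - Rn_est / np.maximum(Rx, 1e-12),
                              floor ** 2))
    return gain * X
```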
5 Conclusions

A robust noise estimation technique based on minimum statistics, rapid adaptation and recursive averaging has been presented. The proposed approach, which relies on the OSMS
approach with a very short window, addresses the resulting overestimation and adapts to rapid changes in the noise power faster than the OSMS approach. The objective study in terms of NMSE, WSS and LLR and the subjective study in terms of plots and spectrograms both reveal that the RARS approach performs best, especially for very low-SNR speech signals. The performance of the proposed technique in the speech enhancement context has also been discussed in this work.
References

1. Loizou, P.C.: Speech Enhancement Theory and Practice, 1st edn. Taylor and Francis Group, New York (2007)
2. Martin, R.: Spectral subtraction based on minimum statistics. In: Proc. of EUSIPCO (1994)
3. Martin, R.: Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Trans. on Speech, Audio Processing 9 (July 2001)
4. Cohen, I., Berdugo, B.: Noise estimation by minima controlled recursive averaging for robust speech enhancement. IEEE Signal Proc. Letters (January 2002)
5. Rangachari, S., Loizou, P.C.: A noise estimation algorithm for highly non stationary environments. In: Proc. of Speech Communications, vol. 48 (February 2006)
6. Cohen, I.: Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging. IEEE Trans. Speech Audio Processing 411 (2003)
7. Rangachari, S., Loizou, P.C., Hu, Y.: A noise estimation algorithm with rapid adaptation for highly nonstationary environments. In: Proc. of ICASSP, vol. 1 (May 2004)
8. Erkelens, J.S., Heusdens, R.: Fast noise tracking based on recursive smoothing of mmse noise power estimates. In: Proc. of ICASSP (February 2008)
9. Erkelens, J.S., Heusdens, R.: Tracking of nonstationary noise based on data-driven recursive noise power estimation. IEEE Trans. on Audio, Speech, and Language Processing 16, 1112–1123 (2008)
10. Doblinger, G.: Computationally efficient speech enhancement by spectral minima tracking in subbands. In: Proc. of Eurospeech, vol. 2 (September 1995)
11. Hu, Y., Loizou, P.C.: Evaluation of objective quality measures for speech enhancement, vol. 16, pp. 229–238 (January 2008)
12. Nsabimana, F.X., Subbaraman, V., Zölzer, U.: Noise power estimation using rapid adaptation and recursive smoothing principles. In: Proc. of the International Conference on Signal Processing and Multimedia Applications (SIGMAP 2009), Milan, Italy, July 7-10, pp. 13–18 (2009)
13. Nsabimana, F.X., Subbaraman, V., Zölzer, U.: A single channel speech enhancement technique exploiting human auditory masking properties. In: Journal of the 12th Conference for the International Union of Radio Science (URSI 2009), Miltenberg, Germany, September 28-October 01 (2009)
14. Nsabimana, F.X., Subbaraman, V., Zölzer, U.: A single channel speech enhancement technique using psychoacoustic principles. In: Proc. of the 17th European Signal Processing Conference (EUSIPCO 2009), Glasgow, Scotland, August 24-28, pp. 170–174 (2009)
Recommender System: A Personalized TV Guide System

Paulo Muniz de Ávila¹,² and Sérgio Donizetti Zorzo¹

¹ Federal University of Sao Carlos, Department of Computer Science, São Carlos, SP, Brazil
² Pontifical Catholic University of Minas Gerais, Department of Computer Science, Poços de Caldas, MG, Brazil
{paulo.avila,zorzo}@dc.ufscar.br
Abstract. The Electronic Program Guide helps viewers to navigate between channels, but as new channels become available, information overload inevitably occurs, making EPG systems inadequate. This situation raises the need for personalized recommendation systems. In this paper we present a recommendation system compliant with the Ginga middleware. We present the results obtained with three different mining algorithms running on a set-top box, using real data provided by IBOPE Mídia, the company of the IBOPE group responsible for communication, media, consumption and audience research.

Keywords: Personalization, Multimedia, Recommendation system, Digital TV.
1 Introduction

An essential change is currently occurring in TV in Brazil: the migration from the analog system to the digital TV system. This change has two main implications: the transmission of more channels within the same bandwidth, and the possibility of sending applications multiplexed with the audio-visual content. As new channels emerge due to the increased transmission capacity, it is necessary to create ways that allow TV viewers to search among these channels. The Electronic Program Guide (EPG) helps the viewers; however, as new channels become available, an information overload is unavoidable, making the EPG system inappropriate. In Shanghai [1], a big city in China, the TV operators provide many different services (channels, in the analog system), and this number has been increasing at a rate of 20% per year. The traditional EPG system has thus become unattractive, because it takes too long for viewers to search the hundreds of available options to find their favorite program. In face of this situation, personalized recommendation systems are necessary. Unlike EPG functions, which allow only basic search, a personalized TV system can create a profile for each viewer and recommend programs that best match this profile, avoiding the search through many EPG options to find the favorite
program. The TV viewer's profile can be built in an explicit way, where the system receives information about the preferences, or in an implicit way, where the system infers the viewer's preferences by analyzing their viewing history. In the DTV context, the implicit option is surely the best, given the limitations imposed by the remote control on data input; however, both approaches can be used. To access the benefits offered by the digital system (new channels, interactive applications), viewers with analog sets need a new piece of equipment called a set-top box (STB). The STB is a device connected to the TV that converts the digital signal received from the provider into audio/video that the analog TV can display. To enjoy the advantages offered by digital TV, the STB needs a software layer, called middleware, which connects the hardware to the interactive applications. The middleware of the Brazilian digital TV system is Ginga [2,3]. It allows the execution of declarative and procedural applications through its components Ginga-NCL [2] and Ginga-J [3]: Ginga-NCL executes declarative applications written in the Nested Context Language (NCL), while Ginga-J executes procedural applications based on Java™, known as Xlets [4]. This paper proposes an extension to the Ginga-NCL middleware through the implementation of a new module incorporated into the Ginga Common Core, called Recommender. The Recommender module is responsible for gathering, storing and processing viewing data and recommending TV programs to the viewer. To develop the Recommender module, we used the Ginga-NCL middleware developed by PUC-RIO (Pontifical Catholic University of Rio de Janeiro), implemented in C/C++ with source code available under the GPLv2 license and in accordance with the standards defined by the Brazilian digital television system [5]. The rest of this paper is organized as follows: Section 2 presents related works, Section 3 describes the service providers, Section 4 presents a general view of the Ginga-NCL middleware and the extensions proposed to support the recommendation system, Section 5 details the experiments, the simulation environment and the outcomes, and Section 6 presents the conclusion.
2 Related Works

There are several recommendation systems for DTV (Digital Television) designed to offer personalization services and to help TV viewers deal with the great quantity of TV programs. Some systems related to the current work are presented here. The AIMED system proposed in [6] presents a recommendation mechanism that considers viewer characteristics such as activities, interests, mood, TV usage history and demographic information. These data are fed into a neural network model that infers the viewers' preferences for programs. Unlike the work proposed in this paper, which uses implicit data collection, in the AIMED system the data are collected and the system is configured through questionnaires. This approach is questionable, mainly when the limitations imposed on data input in a DTV system are considered. In [7] a method to discover models of multiuser environments in intelligent houses, based on users' implicit interactions, is presented. This method stores information in logs, which can then be used by a recommendation system in order to decrease effort
and adapt the content for each TV viewer as well as for multiuser situations. By evaluating the viewing history of 20 families, it was shown that the accuracy of the proposed model was similar to that of an explicit system, demonstrating that collecting data implicitly is as efficient as the explicit approach. In that system, the user has to identify himself explicitly, using the remote control; unlike it, the proposal in this paper aims at providing recommendation services for a totally implicit multiuser environment. In [8], a program recommendation strategy for multiple TV viewers is proposed, based on the combination of viewer profiles. The research analyzed three strategies for content recommendation and allowed the strategy to be chosen based on the profile combination. The results showed that combining TV viewers' profiles can properly reflect the preferences of the majority of the members of a group. The proposal in this paper uses a similar approach for the multiuser environment; however, besides the profile combination, the time and day of the week are also considered. In [1] a personalized TV system is proposed, loaded in the STB and compatible with the Multimedia Home Platform (MHP) model of the European digital television standard. According to the authors, the system was implemented in a commercial MHP middleware solution, which required alterations and the inclusion of new modules in this middleware. Offering recommendations in this system requires two important pieces of information: the program descriptions and the viewer's visualization behavior. The description of the programs is obtained by demultiplexing and decoding the information in the EIT (Event Information Table), the table used to transport specific information about programs, such as start time, duration and program descriptions, in digital television environments. The viewing behavior is collected by monitoring the user's actions on the STB and persisting this information in the STB. The work of [1] is similar to the work proposed in this paper: the implicit collection of data, along with the inclusion of a new module in the middleware architecture, are examples of this similarity. In [9], the personalized Electronic Program Guide is considered a possible solution to the information overload problem mentioned at the beginning of this work. The authors compared the use of explicit and implicit profiles and showed that the indicators of implicit interest are similar to the indicators of explicit interest. The approach of discovering the user's profile implicitly is adopted in this work; it is an efficient mechanism in the context of a television environment, where information input is performed through the remote control, a device that was not designed for this purpose. In [10], the AVATAR recommendation system is presented, compatible with the European MHP middleware. The authors propose a new approach, where the recommendation system is distributed by broadcast service providers as an interactive application. According to the authors, this approach allows the user to choose among different recommendation systems, which is not possible when the STB ships with a factory-installed recommendation system. The AVATAR system uses the implicit approach to collect the user profile and proposes modifications to the MHP middleware to include the monitoring method.
Naïve Bayes [11] is used as the classification algorithm, one of the main reasons being its low use of STB resources.
3 Service Provider

This section presents important concepts related to the service provider: how the digital signal transmission is done, what information is provided, and the relation to the recommendation system proposed in this paper. Besides transmitting audio and video, the Brazilian digital TV system can send data to the viewer. The service providers can broadcast applications written in Java™ (known as Xlets) or NCL applications, both of which are defined in the Brazilian television system. Besides the applications, the providers send tables which transport information to the STB. This section details two tables that are important in this context: the EIT and the SDT (Service Description Table). The open digital television systems adopt the MPEG-2 Systems standard - Transport Stream [12] for the multiplexing of elementary streams. To understand what an elementary stream is, we have to understand how the digital signal is built. First, the audio captured by the microphone and the video captured by the camera are sent separately to the audio coder and to the video coder. The stream of bits generated by each coder separately is called an elementary stream. Once multiplexed into a single stream of bits, the elementary streams form a transport stream. Two kinds of data structures can be multiplexed in a transport stream: Packetized Elementary Streams and sections. Sections are structures defined to transport the tables known as SI - System Information - in the European [13], Japanese [14] and Brazilian [15] standards, and PSIP - Program and System Information Protocol - in the American standard [16]. In this paper, the focus is on the EIT and SDT tables, defined and standardized in the ABNT NBR 15603 standard. The EIT table is used to present specific information about programs, such as the name of the program, the start time and the duration. The EIT table also allows providers to supply additional information through its descriptors; examples of information transported by the descriptors are the program genre, the age classification and short or extended event descriptions. The SDT table contains information which describes the system services, such as the name of the service and the service provider. For the recommendation system proposed in this paper, the SDT table transports the name of the broadcasting station and the name of the service; the Brazilian digital TV system allows a broadcasting station to transmit more than one service (known as a channel in the analog system). The EIT table in turn carries the name of the program, its start time, its duration and complementary information in its descriptors; for example, the extended event descriptor of the EIT table allows the service provider (broadcasting station) to specify a summary of the program. Together, these tables transport the information essential for presenting the EPG, and they are very important to our recommendation system.
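For illustration only, the program data these tables carry can be modeled by a small record; the fields below mirror the ones named in the text, while the parsing of the actual MPEG-2 sections is omitted:

```python
from dataclasses import dataclass

@dataclass
class EpgEvent:
    """Program data from an EIT section plus the service name from SDT."""
    broadcaster: str      # SDT: broadcasting station name
    service: str          # SDT: service (channel) name
    program: str          # EIT: event name
    start_time: str       # EIT: start time
    duration_min: int     # EIT: duration
    genre: str = ""       # EIT descriptor: program genre
    summary: str = ""     # EIT extended event descriptor

# hypothetical entry as the Recommender would cache it
guide = [EpgEvent("TV Brasil", "Service 1", "Evening News",
                  "2009-07-07 20:00", 30, genre="News")]
```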
4 System Overview

The recommendation system proposed in this paper is based on the Ginga middleware, where procedural applications are developed in Java™ and declarative applications in NCL. As mentioned before, the version used was the open
source version of the Ginga-NCL middleware. Fig. 1 presents its architecture, consisting of three layers: the resident applications layer, responsible for presentation (frequently called the presentation layer); the Ginga Common Core, a set of modules responsible for data processing and information filtering in the transport stream, which is the core of the architecture; and the protocol stack layer, responsible for supporting communication protocols such as HTTP, RTP and TS.
Fig. 1. Ginga Middleware Architecture (adjusted with the recommendation system)
The proposed system extends the Ginga middleware functionality by including new services in the Ginga Common Core layer. The Recommender module is the main part of the recommendation system and is inserted in the Common Core layer of the Ginga-NCL architecture. The Recommender module is divided in two parts: the first comprises the components integrated into the source code of the middleware, namely the Local Agent, Scheduler Agent, Filter Agent and Data Agent; the second is a new component added to the STB, Sqlite [17], a C library which implements an embedded relational database. Fig. 2 presents the Recommender module architecture.

4.1 Implemented Modules

This subsection describes the modules added to the Ginga-NCL middleware source code and the extensions implemented to provide a better connection between the middleware and the recommendation system. The Local Agent is the module responsible for the constant monitoring of the remote control. Any interaction between the viewer and the control is detected and stored in the database. The Local Agent is essential for a recommendation system that uses the implicit approach to build the profile.
Fig. 2. Recommender Module Architecture
Scheduler Agent is the module responsible for periodically requesting the data mining (a scheduling sketch is given after the module descriptions below). Data mining is a process that demands time and processing power, making its execution impracticable every time the viewer requests a recommendation. The Scheduler Agent module guarantees a new processing run every 24 hours, preferably at night, when the STB is in standby.

Mining Agent is the module that accesses the viewer's behavior history and the programming data from the EIT and SDT tables stored in cache to carry out the data mining. In order to process the data mining, the mining module has direct access to the database and recovers the TV viewer's behavior history. From the point of view of system performance, this direct communication between the mining module and the user database is important: without it, it would be necessary to implement a new module responsible for recovering the database information and then making such data available to the mining algorithm. The second data set necessary to make the data mining possible is the program guide. The program guide is composed of information sent by the providers through the EIT and SDT tables. These tables are stored in cache and are available to be recovered and processed by the Mining module. The Ginga-NCL middleware does not implement a caching mechanism for the EIT and SDT tables; this functionality was implemented by the RecommenderTV system.

Filter Agent & Data Agent: the raw data returned by the Mining Agent module need to be filtered and later stored in the viewer's database. The Filter Agent and Data Agent modules are responsible for this function. The Filter Agent module receives the mining results provided by the Mining Agent and eliminates any information that is not important, keeping only what is relevant to the recommendation system, such as the name of the program, time, date, service provider and the name of the service. The Data Agent module receives the recommendations and stores them in the viewer's database.
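The Scheduler Agent described above can be sketched with the standard Java concurrency API; the mining task and the computation of the delay until night-time are placeholders.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulerAgent {

    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    // Schedules the mining task every 24 hours; the initial delay is chosen
    // so that the first run falls at night, when the STB is in standby.
    public void start(Runnable miningTask, long hoursUntilNight) {
        executor.scheduleAtFixedRate(miningTask, hoursUntilNight,
                                     24, TimeUnit.HOURS);
    }
}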
Table 1. Viewer Group Visualization Background

Channel  Program Name  Category of the program  Day        Time       Period of view (min)
5        P1            News                     Monday     Night      8
2        P2            News                     Tuesday    Night      20
8        P3            Kids                     Tuesday    Morning    40
2        P2            News                     Wednesday  Night      25
10       P4            Kids                     Thursday   Morning    40
5        P5            Novel                    Thursday   Afternoon  60
12       P6            Sports                   Thursday   Afternoon  30
4.2 Module Integrated to the Set-Top Box

This subsection deals with the main characteristics which allow the connection of the sqlite database to the Ginga-NCL middleware. Sqlite is a C library that implements an embedded SQL database. The sqlite library reads and writes directly to and from the database file on disk. Sqlite is recommended where simplicity of management, implementation and maintenance is more important than the great variety of resources implemented by the DBMSs aimed at complex applications. Examples of sqlite uses are embedded devices and systems and small desktop applications. The sqlite database was chosen for three reasons: 1) it is written in the C language; 2) it was designed to operate in embedded devices; 3) it allows the Weka mining module to access the information stored in the viewer's database.

4.3 Data Mining Algorithms

In order to define which mining algorithm to implement in the Mining Agent module, tests with three algorithms were performed: C4.5, Naïve Bayes and Apriori [11]. The tests were performed using the data set provided by IBOPE Mídia (http://www.ibope.com.br), the company of the IBOPE group responsible for communication, media, consumption and audience research. IBOPE Mídia is best known for its audience research, but it also operates in the advertising investment area and in quantitative research in all kinds of communication channels, whether television, radio, publishing or alternative media. The data are related to 8 families with different socioeconomic profiles. The viewing behavior was collected during 4 weeks, minute by minute, in each house. In order to choose the algorithm, two STB characteristics were considered: the quantity of memory and the processing capacity.

C4.5 is a classification algorithm based on decision trees using the divide and conquer concept. Naïve Bayes is a classifier based on statistics; it is fast and
efficient when applied to a large data set, and is similar in performance to classifiers based on decision trees. The last compared algorithm is Apriori, an association algorithm applied to discover patterns hidden in the data set. The algorithm searches for affinities among items and expresses them as rules, for example, "70% of the visualization time on Mondays between 7:00 p.m. and 8:00 p.m. is news". Another efficient algorithm is the SVM (Support Vector Machine); for the proposal of this paper, SVM was not used due to limitations imposed by the STB hardware.

Next, we present the results of the comparison of the algorithms considering processing speed and recommendation accuracy. The accuracy is calculated using the following formula:

$$\delta = \frac{R_v}{R_p} \cdot 100 \qquad (1)$$

where δ corresponds to the system efficacy and varies from 0 to 100, R_v is the number of recommendations viewed by the TV viewers and R_p is the number of recommendations performed. For instance, if 72 of 100 performed recommendations are actually viewed, δ = 72. Table 2 presents the results obtained after the analysis of the viewing history of the 8 families during 4 weeks.

Table 2. Comparison among Algorithms

Algorithm     Average Time (s)   Accuracy (biggest value obtained among the 8 houses)
C4.5          65                 71.22 %
Naïve Bayes   54                 70.90 %
Apriori       62                 72.10 %
The conclusion is that the three algorithms have similar performance; however, with the large quantity of data analyzed, around 43 thousand tuples, the Apriori algorithm performed better in processing time and accuracy. We therefore chose to include it as the mining algorithm in the Mining Agent module.
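Since the chosen algorithm is executed through the Weka mining module, its use can be sketched in a few lines of Java. The sketch below is illustrative only: the ARFF file name and the parameter values are assumptions, not the settings of the reported evaluation.

import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class MiningSketch {
    public static void main(String[] args) throws Exception {
        // Viewing history with nominal attributes such as channel, category,
        // day and time period (the layout of Table 1), exported as ARFF.
        Instances history = DataSource.read("viewer_history.arff");

        Apriori apriori = new Apriori();
        apriori.setNumRules(20);   // keep the 20 strongest rules
        apriori.setMinMetric(0.7); // minimum confidence of 70%
        apriori.buildAssociations(history);

        // Prints rules such as "day=Thursday time=Morning => category=Kids".
        System.out.println(apriori);
    }
}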
5 Experiences and Results

In order to validate the recommendation system, two environments were prepared. The first simulates the Ginga-NCL Virtual STB in accordance with the Brazilian standards. The second simulates a data carousel generator at the provider, making it possible to send a transport stream to the STB.

To provide the environment simulating the provider, a data carousel generator based on the open source project dsmcc-mhp-tools [18] was used. dsmcc-mhp-tools is a set of utilities to generate MPEG-2 elementary streams. The project provides tools to generate PMT, AIT and NPT tables and the Object Carousel. For this context, only the utilities responsible for generating the PMT (Program Map Table) were used. This table contains a list of the elementary stream identifiers which compose a service. Generally speaking, it transports the information which
allows the demultiplexer to know which packet identifier (PID) transports the audio, video and data transmitted by the provider.

The Ginga-NCL middleware offers support for the reception of a transport stream via the UDP protocol, and in order to test and validate the recommendation system the carousel generator transmitted data through this approach. It would have been possible simply to make the EIT and SDT table data available to the recommendation system by storing these tables in cache in advance. That approach would allow testing the recommendations offered to the viewer, but it would not allow validating the system in an environment close to a real one, where transmission-related problems happen frequently.

In order to simulate the providers, the programming of three broadcasting stations in Brazil was used, and the EIT and SDT tables were generated for each station, covering three days of programming. The EIT and SDT tables and the video file in the transport stream (TS) format were multiplexed and transmitted to the STB through the data carousel generator. Each simulated broadcasting station was given its own data carousel generator, creating an environment very close to a real one, where each provider is responsible for generating its transport stream. Fig. 3 shows this simulation environment, composed of three PCs, each one executing a copy of the carousel generator, and the mini PC executing the Ginga-NCL middleware with all functionalities and extensions proposed in this paper.
Fig. 3. Simulation Environment
In the Ginga-NCL middleware, the location of each provider was configured in the multicast.ini file by assigning an IP (Internet Protocol) address to each of the three providers which compose the simulation environment (a sketch is shown after this paragraph). The TV viewer can switch among the three service providers using the Channel + or Channel – buttons on the remote control, which triggers the reception of the transport stream coming from the next/previous data carousel generator defined in the multicast.ini configuration file. Every time the TV viewer presses a button on the remote control, the interaction is monitored and stored in the viewer's database and is used by the mining algorithms at the time determined by the Scheduler Agent. During the tests, the efficacy of the Recommender module linked to the Ginga-NCL middleware was verified.
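The exact syntax of the multicast.ini file is not documented here; purely as an illustration, such a configuration could look as follows (the addresses are placeholders, not the ones used in the experiment):

; hypothetical multicast.ini sketch: one multicast source per provider
provider1 = 224.0.1.1:1234
provider2 = 224.0.1.2:1234
provider3 = 224.0.1.3:1234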
As the viewer searched among service providers, the information in the transport stream was demultiplexed and decoded, and in the case of the EIT and SDT tables the information was stored in cache.

5.1 Results

In order to validate the application, the data provided by IBOPE were used. The validation adopted the accuracy formula presented in (1).
Fig. 4. Accuracy of the Recommendation System
Fig. 4 presents the outcomes obtained after 4 weeks of monitoring, considering the best value obtained among the 8 houses analyzed. It is clear that in the first weeks, as little data had been collected, the Apriori algorithm did not extract relevant information from the preferences of the group. With the growth of the visualization background in the third and fourth weeks, the algorithm obtained better results and the rate of recommendation acceptance increased.
Fig. 5. Accuracy of the Recommendation System per house
Fig. 5 presents the accuracy per house. The main characteristic of the houses is the socioeconomic difference among them. The conclusion is that the Apriori algorithm performed well regardless of the users' socioeconomic profile. Fig. 6 shows the Recommender TV system. The application used as front-end is written in NCL and allows the TV viewer to browse the recommendation list and select the desired program.
Fig. 6. Recommender TV System
6 Conclusions and Future Work

With the appearance of digital TV, a variety of new services (channels, in the analog system) will be available. This information overload requires the implementation of new mechanisms that help viewers looking for their favorite programs. These new mechanisms, which suggest programs to the viewers, are known as recommendation systems. A recommendation system compatible with the Ginga-NCL middleware is presented in this paper, implemented according to the standards of the Brazilian digital television system. With the purpose of simulating the life cycle of the Brazilian digital television system, which starts at the service provider making the audio, video and data available and finishes at the viewer's STB, a simulation environment was implemented. This environment is composed of three service providers transmitting audio, video and data to the Ginga-NCL Virtual STB, which is in accordance with the Brazilian standards. Although Brazil has been broadcasting a digital signal for more than one year, this is limited to audio
and video; data, and consequently sections and tables, are not yet a reality in the country. This was a problem for the validation of the recommendation system, because it needs access to two important tables, the EIT and the SDT, and these tables are not yet broadcast by the service providers in Brazil. In face of this limitation, an EIT and SDT table generator was implemented in accordance with the Brazilian standards. In order to validate the recommendation system, it would have been adequate to store the EIT and SDT tables in the STB in advance; but for a simulation closer to a real situation, such tables were sent in a transport stream, which allowed verifying data demultiplexing and decoding as well as testing the new modules included in the Ginga-NCL middleware to store the decoded data in cache. The implementation made clear that without the alterations proposed in this paper, the implementation of a recommendation system is impracticable. The necessity of keeping the viewer's behavior information in a database, together with the necessity of storing the information coming from the service providers, requires linking new modules to the Ginga middleware and extending others. This paper described the complete implementation of a recommendation system compatible with the Ginga middleware. The expectation for future research is to extend the functionalities implemented in the Recommender TV system, allowing interoperability with other devices through the UPnP™/DLNA [19,20] protocols in home networks.
References 1. Zhang, H., Zheng, S., Yuan, J.: A Personalized TV Guide System Compliant with MHP. IEEE Transactions on Consumer Electronics 51(2), 731–737 (2005) 2. Soares, L.F.G., Rodrigues, R.F., Moreno, M.F.: Ginga-NCL: The declarative Environment of the Brazilian Digital TV System. Journal of the Brazilian Computer Society 12(4), 37–46 (2007) 3. Souza Filho, G.L., Leite, L.E.C., Batista, C.E.C.F.: Ginga-J: The Procedural Middleware for the Brazilian Digital TV System. Journal of the Brazilian Computer Society 12(4), 47–56 (2007) 4. Sun Microsystems, Sun JavaTV: Java Technology in Digital TV (May 2009), http://java.sun.com/products/javatv/ 5. Ginga-NCL Virtual STB (May 2008), http://www.ncl.org.br/ferramentas/index_30.html 6. Hsu, S.H., Wen, M.H., Lin, H.C., Lee, C.C., Lee, C.H.: AIMED- A Personalized TV Recommendation System. In: Cesar, P., Chorianopoulos, K., Jensen, J.F. (eds.) EuroITV 2007. LNCS, vol. 4471, pp. 166–174. Springer, Heidelberg (2007) 7. Vildjiounaite, E., Kyllonen, V., Hannula, T., Alahuhta, P.: Unobtrusive Dynamic Modelling of TV Program Preferences in a Household. In: Tscheligi, M., Obrist, M., Lugmayr, A. (eds.) EuroITV 2008. LNCS, vol. 5066, pp. 82–91. Springer, Heidelberg (2008) 8. Zhiwen, Y., Xingshe, Z., Yanbin, H., Jianhua, G.: TV program recommendation for multiple viewers based on user profile merging. In: Proceedings of the User Modeling and User-Adapted Interaction, pp. 63–82. Springer, Netherlands (2006) 9. O’Sullivan, D., Smyth, B., Wilson, D.C., McDonald, K., Smeaton, A.: Interactive Television Personalization: From Guides to Programs. In: Ardissono, L., Kobsa, A., Maybury, M. (eds.) Personalized Digital Television: Targeting Programs to Individual Viewers, pp. 73–91. Kluwer Academic Publishers, Dordrecht (2004)
10. Blanco-Fernandez, Y., Pazos-Arias, J., Gil-Solla, A., Ramos-Cabrer, M., Lopes-Nores, M., Barragans-Martinez, B.: AVATAR: a Multi-agent TV Recommender System Using MHP Applications. In: IEEE International Conference on E-Technology, E-Commerce and EService (EEE 2005), pp. 660–665 (2005) 11. Wu, X., Kumar, V., Ross Quinlan, J., Ghosh, J., Yang, Q., Motoda, H., McLachlan, G.J., Ng, A., Liu, B., Yu, P.S., Zhou, Z., Steinbach, M., Hand, D.J., Steinberg, D.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1), 1–37 (2007), http://dx.doi.org/10.1007/s10115-007-0114-2 12. ISO/IEC 13818-1. Information technology – Generic coding of moving pictures and associated audio information - Part 1: Systems (2008) 13. DVB Document A038 Rev. 3. 2007. Specification for Service Information (SI) in DVB systems, DVB (2007) 14. ARIB STD-B10. 2005. Service Information for Digital Broadcasting System, ARIB (2005) 15. ABNT NBR 15603-1:2007. Televisão digital terrestre - Multiplexação e serviços de informação (SI) - Parte 1: Serviços de informação do sistema de radiodifusão (2007) 16. ATSC A/65b. Program and System Information Protocol, ATSC (2003) 17. Sqlite (May 2009), http://www.sqlite.org/ 18. dsmcc-mhp-tools (May 2009), http://www.linuxtv.org/dsmcc-mhptools.php 19. Tkachenko, D., Kornet, N., Kaplan, A.: Convergence of iDTV and Home Network Platforms. In: IEEE Consumer Communications and Networking Conference (2004) 20. Forno, F., Malnati, G., Portelli, G.: HoNeY: a MHP-based Platform for Home Network interoperability. In: Proceedings of the 20th IEEE International Conference on Advanced Information Networking and Applications (2006)
An Enhanced Concept of a Digital Radio Incorporating a Multimodal Interface and Searchable Spoken Content Günther Schatter and Andreas Eiselt Bauhaus-Universität Weimar, Faculty of Media, 99421 Weimar, Germany {guenther.schatter,andreas.eiselt}@uni-weimar.de
Abstract. The objective of this paper is to summarize relevant aspects of the concept, design, and test of a considerably improved digital radio. The system enables users to request particular information from a set of spoken content and transmitted data services received by Digital Audio Broadcasting (DAB). The core of the solution is a search engine which is able to monitor multiple audio and data services simultaneously. The usage of several information sources, the retrieval process, the concept of the bimodal interface, and the conversational dialog are described together with first results of the evaluation at the experimental stage. Hereby the DAB system turns from a classic audio-only receiver into an enhanced multimedia platform acting as a part of symbiotic embedded systems such as multifunctional cell phones and sophisticated car entertainment systems. That way, radio can overcome its restrictions in content, presentation format and time and will be able to offer more comprehensive and contemporary choices. We show that a sophisticated radio may offer, for speech content, favorable characteristics similar to those internet services offer for text.
1 Introduction

Real-time multimedia services have contributed extensively to our life experience and are expected to be among the most important applications in the future Internet. However, the diversity of information and communication technology (ICT) is often narrowed to computers and the Internet. New ICTs are considered separately from older ones, while strategies and programmes that combine them hold more promise. For instance, the older medium radio is given less attention than during the past decades. Its history is long, and its presence in many people's lives is quite normal, yet it is often overlooked and ignored. However, in the digital age radio still has much to offer, yet it is underestimated and underutilized in many contexts. There is insufficient incorporation of new ICTs with older communication technologies such as radio. This is despite the potential of radio to offer cheap and effective communication channels in sparsely populated or less developed regions. Radio should not lose its role in this audio world, where you can have your own privacy and where you can listen with simple devices to useful and entertaining information, everywhere at no extra cost. However, the development of digital radio is very variable, with some nations having a large number of digital radio services across the whole country, while others have very few services and devices. Dedicated digital radio systems have been developed
which replicate all the attributes of analogue radio in digital form, and the most advanced of these is the terrestrial Digital Audio Broadcasting (DAB) system [1]. The intention and expectation were that, while DAB would remain essentially a technology for the delivery of radio services, the data carried on the DAB transmissions could also include multimedia information, and that the radio would become a more sophisticated device, capable of presenting graphical information and able to store and replay broadcast audio. Furthermore, the digitalization of radio provided a number of new possibilities for augmenting the service and even challenging the meaning of the term radio. DAB permits a whole range of new data services besides audio broadcasts. These powerful data channels carry additional information (news, press reviews, images, videos, traffic information and control systems, data transmission to closed user groups, etc.) and all kinds of data broadcasting services, such as podcasts, in addition to audio broadcasts [2]. Hence, one should bear in mind that value-added messages or images should always be supplementary to the content and the audio effect of a radio program [3].

Further advances in human-computer interaction (HCI) research have enabled the development of user interfaces that support the integration of different communication channels between humans and computer-supported devices such as digital radio. Natural language has been increasingly incorporated as an additional modality of interactive systems, and for the acoustic medium radio this seems natural. The rationale behind this trend is the assumption that these modes facilitate the user's interaction with the generic acoustic device.

In the following sections, we summarize the prototypical development of a software research platform for DAB. In distinction from earlier approaches, the system incorporates retrieval mechanisms for speech and data as well as a bimodal interface in order to give the listener intuitive and fast access to the offered information. The solution consists of a USB receiver module and software for device control, signal processing, and conversation. Our efforts focused on two main objectives. Firstly, the implementation of a system to temporarily store speech-based audio content and data service elements, combined with the capability to search for specific content; examples are the Broadcast Website (BWS), Electronic Programme Guide (EPG), Dynamic Label (DL), Programme Associated Data (PAD), and internal data of the DAB system. Secondly, the development of a speech-based user interface that enables users to operate the entire functional scope of a Digital Radio DAB alternatively by vocal commands; this development was focused on a system that conforms to the concepts of human-centered dialogs. The results are also applicable to digital broadcasting systems other than DAB/DMB. We demonstrate that a sophisticated radio may offer, for speech content, favorable characteristics similar to those internet services offer for text.
2 Related Work and Motivation During the last decade the broadcast news transcription task has attracted a lot of attention to the development of automatic speech-to-text transcription systems. First investigations into the use of speech recognition for analysis of news stories were carried out by Schäuble and Wechsler [4]. Their approach used a phonetic engine that
transformed the spoken content of the news stories into a phoneme string. Later, Whittaker et al. evaluated interfaces to support navigation within speech documents, providing a mixed solution for global and local navigation by a graphical interface in combination with an audio output [5]. Emnett and Schmandt developed a system that searches for story boundaries in news broadcasts and enables nonlinear navigation in the audible content with the help of different interface approaches [6].

The problems of mobile devices in rough environments in relation to speech-based interaction were reported in [7]. A notification model dynamically selects the relevant presentation level for incoming messages (such as email, voice mails, news broadcasts) based on priority and user activity. Users can browse these messages using speech recognition and tactile input on a wearable audio computing platform. Recent developments for Large-Vocabulary Continuous Speech Recognition (LVCSR) systems used methods based on neural networks and cepstral coefficients [8]. A comprehensive overview of current research topics on advanced Speech Dialogue Systems (SDS) is given in [9]. The feasibility in mobile devices with the help of distributed approaches is shown in [10]. There are prototypes of web-based SDSs and audio search engines that are able to search spoken words of podcasts and radio signals for queries entered with a keyboard [11,12]. In this context, the European Telecommunications Standards Institute developed a standard for DAB/DMB-based voice applications; the VoiceXML standard can be used to abstract the dialog constructs, the user input/output, the control flow, the environment, and resources [13].
3 Fundamentals

In this section we introduce the information sources and approaches which are necessary for the development of the system. We analyze the various data transmission techniques, present a survey of the available information sources in the Digital Radio environment, and introduce methods related to the Information Retrieval (IR) process, such as Automatic Speech Recognition (ASR), Music-Speech Discrimination (MSD), i.e., speech extraction, and Text Processing (TP), as well as Text-to-Speech (TTS) procedures for the output of spoken information.

3.1 Information Sources

Two general types of information sources can be distinguished for DAB, see Fig. 1:

1. The data services (service information, additional data services) are available as text. Internet-based information is applicable as well. Primary sources of information for the receiver are service- and program-related data, comparable with the FM radio data system (RDS). Broadcast Websites (BWS) contain multifaceted news, press reviews, etc. Other sources of information are Dynamic Label Plus (DL), Intellitext, Journaline, and Electronic Programme Guides (EPG), where providers are able to transmit supplementary as well as program-independent information.

2. The audible signals are, after MSD, converted by an ASR into plain text in order to perform a content-based analysis. Internet audio and podcast files are applicable as well. Recorded information can be of any kind with regard to content, e.g. breaking news, headlines, educational and cultural items, current affairs, discussions, etc.
[Fig. 1 diagram: on the text side, service information (basic, service-related and programme-related data, announcements*, tuning aids*) and TV-Anytime-based data services (Electronic Programme Guide, broadcast websites, dynamic label, Intellitext*, slide show*, Journaline*, traffic information*); on the audio side, speech and music separated by a Music-Speech Discriminator, with ASR and information retrieval delivering topic, language and speaker information; processing stages: gathering/separation, analysis/retrieval, storage/application.]

Fig. 1. Summary of available data sources for an information and knowledge base in a digital audio broadcasting system (*not yet in use)
Compared with the audible signals, the data services are more reliable with respect to structure and content, but less detailed and not always available. It was therefore indispensable to establish a hierarchical sequence of sources depending on reliability, quality, and convenience.

3.2 Information Retrieval

A part of the system design includes the extraction of information from the spoken parts of radio programs. Therefore it is at first necessary to distinguish between speech and non-speech (music, noise) in the audio signal. The DAB standard already defines a so-called Music/Speech Flag (M/S) to indicate both types, but broadcasters hardly ever transmit this information. Another way to discriminate music from speech is to analyze the signal for typical patterns. Features abstracting this information were analyzed with a focus on the requirements of MSD in digital broadcasting in [2]. Earlier approaches to MSD were often too complex to monitor a larger number of sources in parallel, or not able to correctly classify speech in arrangements common in radio, such as background music and cross-fades. Therefore we propose a new audio feature, which is based on the fact that speech is recorded by one microphone, while music is normally recorded by at least two microphones (stereo). Related experiments have shown that the very low phase difference between the audio channels is a valid indicator for a speech signal. This feature is of very low complexity and can therefore be calculated even on a mobile device for multiple sources in parallel.

Once the speech is separated from the music, an ASR system is capable of transforming the spoken parts into machine-readable text. In order to make a subsequent search less sensitive to different forms of a word (singular/plural, conjugation, declination), each word is heuristically reduced to its stem [14]. Furthermore, we use a tf-idf (term frequency–inverse document frequency) based weighting scheme [15] to rank the available text information with respect to the user query and according to the relative importance of each word in it.
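As a minimal illustration of this weighting step (our sketch, assuming the words have already been stemmed; the class name is arbitrary):

import java.util.List;

public class TfIdf {

    // tf-idf weight of a (stemmed) term in one document of the corpus:
    // relative frequency in the document, multiplied by a penalty for
    // terms that occur in many documents.
    public static double weight(String term, List<String> doc,
                                List<List<String>> corpus) {
        long inDoc = doc.stream().filter(term::equals).count();
        double tf = (double) inDoc / doc.size();

        long docsWithTerm = corpus.stream()
                                  .filter(d -> d.contains(term))
                                  .count();
        double idf = Math.log((double) corpus.size() / (1 + docsWithTerm));

        return tf * idf;
    }
}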
3.3 Metadata for Spoken Content

The functionality of IR systems is highly dependent on the usage of appropriate metadata. Especially for spoken content, it is a difficult task to find a compromise between accuracy of description, universal adaptability, and computational overhead. It is possible to apply subsets of the TV-Anytime standard [16], offering XML structures feasible in the field of search, recommender and archive applications [17]. There are a couple of different standards and approaches for DAB defining how to transmit additional information besides the actual audio program. An early proposal for a technique of machine-interpretable DAB content annotation and receiver hardware control, involving the utilization of Dynamic Label (DL) fields in the transmitted frames, was formulated by Nathan et al. in [18]. A similar approach was chosen in [19]. In both cases the separation of information is carried out by machine-readable control characters. DL Plus uses a fixed data structure, compared with the free-form predecessor DL, but no dictionary, similar to Journaline and Intellitext. The complex EPG consistently uses TV-Anytime and MPEG-7 metadata with an XML-based data structure. The internal structure allows describing content elements hierarchically with any depth and level of granularity according to numerous descriptors. The defined catchwords and descriptors are derived from a controlled vocabulary that is capacious but of fixed extent. The TV-Anytime standard embraces a comprehensive collection of classification schemes. These schemes consist of elaborate arrangements containing pre-assigned terms that are used as catchwords to attach several categories to AV contents. However, the descriptions appear not to be chosen systematically in every respect. Due to the fact that the transmission of data is not guaranteed and currently very sporadic, we extract any transmitted data, evaluate it and assign it to the corresponding program.
4 System Design

There are two fundamental designs proposed for the system (see Fig. 2):

1. A provider-side model: all IR processes are carried out by the broadcaster (section A), who submits the gathered information to the DAB receivers using an appropriate meta description.

2. A user-side model: the navigation through the broadcast audio content is possible even without additional metadata from the broadcaster. In this case, all IR processes have to be carried out on the receiver (section B).

To match the fundamental premises of an independent and ubiquitous system, two additional features were proposed with the focus on improved usability in a mobile environment:

1. The entire controllability of the radio device by verbalized user queries,

2. A memory function which allows searching not only the current broadcast content but stored content as well.
Fig. 2. System overview of provider- and user-side model
4.1 Provider- and User-Side Model

Provider-side Model. This design approach tries to avoid the restrictions brought about by limited resources on mobile devices. The IR process is centralized and carried out by the broadcaster, who adds the gathered information as a data service. As the broadcaster has much more powerful resources, the quality of the retrieved data may increase. On the other hand, although the proposed model is much cheaper than, for example, equivalent editorial processing, many radio stations may not consistently broadcast this information for cost reasons.

User-side Model. In this model, the IR process is carried out on the mobile device itself. The MSD and the TP can be done simultaneously and in real time for more than one monitored service, whereas the ASR is much more time-consuming. Several principal system architectures for ASR on mobile devices, which consider this aspect, are discussed in [10]. Because two of them involve additional mobile devices whose availability could not be guaranteed in our case, the third, an embedded approach, is recommended. The advantage of this model is a much higher independence from the broadcast data services. On the other hand, this model requires more powerful mobile devices, which limits the area of application.

In a realistic scenario, both models have to be used. As long as the majority of broadcasters do not offer an appropriate service (provider-side model), the available information has to be enriched on the client side in order to present the user comprehensive search results.

4.2 Information Retrieval and Management

The IR process is divided into three sub-processes which are executed successively: the MSD, the ASR and the TP, see Fig. 3. In order to retrieve information from the broadcast audio, the speech-based content is processed by an ASR system converting the audio data into plain text. To enrich the searchable data, information about the current program is extracted from the available data services (such as TV-Anytime, EPG, BWS or DL) and, if available, from appropriate sources on the internet. Any available text is then split into words, which are stemmed and inserted into a global tf-idf weighted index in order to make them searchable.
As the capacity of non-volatile memory on mobile devices is limited, we implemented a strategy that deletes those content elements that are probably not relevant for the user. For that purpose the personal preferences of users are stored separately, based on the work in [20], where the preferences of users are centrally defined in a standardized format. Each element in the user's preferences defines a preference value ranging between -100 and 100 for a certain content type or element. Depending on this value, the content element is persisted for a longer or a shorter time.
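One possible mapping from a preference value to a retention time is sketched below; the linear rule and the bounds are our assumptions, since the exact function is not specified here.

public class RetentionPolicy {
    private static final int MIN_DAYS = 1;   // assumed lower bound
    private static final int MAX_DAYS = 30;  // assumed upper bound

    // Maps a preference value in [-100, 100] linearly to a retention period:
    // disliked content expires quickly, preferred content is kept longer.
    public static int retentionDays(int preference) {
        double normalized = (preference + 100) / 200.0;  // 0.0 .. 1.0
        return (int) Math.round(MIN_DAYS + normalized * (MAX_DAYS - MIN_DAYS));
    }
}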
Fig. 3. Activity diagram of the IR process
The data which were extracted during the preceding IR process are subsequently subject to the automatic generation of metadata, which is transmitted in parallel to the audio services. Three entities comprise this process:

1. The extracted data conforming to a proprietary structure,

2. The standardized metadata to transmit (EPG, TV-Anytime, MPEG-7), and

3. The converter that automatically generates the metadata on the basis of the extracted data and the metadata structure to transmit.

4.3 Interfaces

The system incorporates a multimodal user interface, which enables the user to interact in two ways:

• by a Speech-based User Interface (SUI),

• by a Graphical User Interface (GUI).

The structure of the system is based on the client-server model, see Fig. 4. On the one hand, the client incorporates all functionalities related to ASR, speech synthesis and the GUI; on the other hand, the server enables the client to access the entire functional scope of the radio. The choice of voice commands for the SUI was based on the following requirements:

• Memorability: Commands had to be easily memorable in order to enable the user to reliably utilize all commands.

• Conciseness: Users should be able to easily associate commands with functions.
• Briefness: The length of a command was kept at a target size of 1-2 syllables.

• Unambiguousness: The use of homonyms was strictly avoided.

• Tolerance: The use of synonyms was desirable to a high degree.

To make the communication with the user as natural as possible, the SDS furthermore incorporates a Text-to-Speech system that synthetically generates a speech signal according to the predefined dialog structure.
Fig. 4. Structure of the multimodal interface
5 Implementation

The system was implemented exemplarily based on the user-side model, pursuing the two design ideas:

1. The identification of speech-based content and the possibility for users to directly search for specific audio content was implemented with a graphical user interface.

2. The accumulation of text-based data services, in combination with the capability to search for desired information, was realized with a speech-based user interface.

The system consists of two main parts. The first is a monitor comprising the three main processes of the information retrieval subsystem: MSD, ASR and TP. The second part is the graphical user interface allowing the user to search for content and access the audio files related to the results found. It is important to note that both interfaces could be utilized for either case. Our solution was realized on the basis of a DAB receiver (DR Box 1, Terratec) connected by USB. The system has been implemented in Java JDK 6/MySQL 5.1 on a standard laptop (Core2Duo; 1.8 GHz; 4 GB DDR3 RAM; 500 GB HDD) and is reasonably mobile.

5.1 Information Retrieval and Management

The first main process records and processes the incoming audio data by MSD. The MSD is accomplished by first decompressing the incoming MPEG signal. The raw PCM signal is then processed by two audio feature extractors, calculating the channel difference and the strongest frequency, to classify the current content, see Fig. 5.
// Stereo channel difference: speech (single microphone) shows a near-zero
// difference between the interleaved left/right samples, music does not.
channelDiff = 0
signal = getSignalFrameArray(signalSource)
for i = 0 to signal.count()-2 step 2 do
    channelDiff += Abs(signal[i] - signal[i+1])
end for
channelDiff /= signal.count()/2

// Power of the strongest frequency component of the frame.
powerSpec = getPowerSpectrum(signal)
strongestFreq = 0
for i = 0 to powerSpec.count()-1 do
    if powerSpec[i] > strongestFreq then
        strongestFreq = powerSpec[i]
    end if
end for

// Decide speech vs. music from the two features.
class = classify(channelDiff, strongestFreq)
return class

Fig. 5. Music-Speech Discrimination in pseudo code
All audio data is recorded to a repository with separate folders for each digital broadcast service and child folders for music and speech. In order to keep the content elements in their original sequence, the audio files are labeled with the time stamp of their starting time. The second main process monitors the audio repository in parallel, see Fig. 6. This process permanently retrieves new files from the repository and converts the audio data into plain text (ASR). The prototype uses a commercial ASR engine with a large vocabulary base that was designed for speech-to-text dictation. These dictation recognizers are not designed for information retrieval tasks, but they can operate as speaker-independent systems. After the extraction, the text is processed by a stop-word removal and the stemming algorithm. The resulting set of words is written to a database, including obligatory information about the associated audio file and contextual metadata. Additional metadata are included by parsing the BWSs, EPGs, DLs, and appropriate internet services.

while monitorIsActive do
    // Fetch audio files that have not been transcribed yet.
    files = getFilesNotProcessedInAudioRepository()
    foreach files as filename do
        // Speech-to-text conversion, then stop-word removal and stemming.
        text = asr(filename)
        procText = textProcessing(text)
        // Service name and start time are encoded in the file path and label.
        service = getServicenameFromFilePath(filename)
        timestamp = getTimestamp(filename)
        // Attach the metadata (EPG, BWS, DL, ...) related to service and time.
        meta = getRelatedMeta(service, timestamp)
        saveToDB(service, timestamp, procText, meta)
    end foreach
end while

Fig. 6. Automatic Speech Recognition in pseudo code
In the case of the provider-side model it is necessary to map the proprietary data into standardized XML-based metadata. This requires a converter for each pair of proprietary data structure and metadata standard to be used for broadcasting. This aspect was realized exemplarily by a converter mapping the data structure to the EPG metadata standard, see Fig. 7. The converter automatically maps the information from the extracted data to the standardized descriptors of the metadata structure.
Fig. 7. Schematic mapping of proprietary data structure to standardized metadata
5.2 User Interface

The second part of the system is the user interface (SUI/GUI). The user is able to specify a query by voice or via a web interface. Subsequently the system parses and interprets the user's input and searches for corresponding data in the database. The results are listed in the GUI as shown in Fig. 8. The user is able to select a content element as in a web browser and to listen to the associated audio file. Although our prototype utilizes an underlying textual representation and employs text-based information retrieval techniques, this mechanism is hidden to a great extent from the user.

[Fig. 8 screenshot: a result list of broadcast items, each showing time stamp, station (e.g. Deutschlandfunk, Deutschlandradio, MDR Klassik), program title, topic, date, program slot and duration, together with the links "alles | Musik | Sprache" to play the whole item, the music only, or the speech only.]

Fig. 8. Design of a GUI and results for user queries
Over the GUI the user is able to decide for each result whether he wants to hear only the piece where the keyword occurred, the whole program in which this piece occurred, or only the speech/music of this program. A possible result set is exemplified in Fig. 8.
6 Use Case and Experiences

The user is able to search the indexed content through verbal queries or via a web-based interface with a query window, see Fig. 8. In the case of a spoken request, the system utilizes a speech recognition engine to retrieve a searchable string, which in the case of the web-based interface is entered by the user directly. The results can be presented either as a spoken response utilizing TTS technologies, see Fig. 9, or through a text interface, see Fig. 8. The results are furthermore ordered by their relevance, represented by the count of occurrences of the keywords, and by the time when the related audio content was broadcast. For each result the user can decide to hear only the piece where the keyword occurred, the whole program in which this piece occurred, or only the speech/music of this program.
Fig. 9. Example spoken dialog about traffic
Through relatively recent improvements in large-vocabulary ASR systems, recognition of broadcast news has become possible in real time. However, problems such as the use of abbreviations, elements of foreign languages, and acoustic interference complicate the recognition process. The combination of informal speech (including dialect and slang, non-speech utterances, music, noise, and environmental sounds), frequent speaker changes, and the impossibility of training the ASR system on individual speakers results in poor transcription performance on broadcast news. The result is a stream of words with fragmented units of meaning. Our experiments confirmed an older study of ASR performance on broadcast news by Whittaker [5], who observed wide variations, from a maximum of 88% of words correctly recognized to a minimum of 35%, with a mean of 67% (our results: 92%, 41%, 72%). Unfortunately most ASR programs do not expose additional information; they do not offer any measure of confidence, nor do they give any indication if they fail to recognize anything of the audio signal. When the speech recognizer makes errors, they are gaps and deletions, insertions and substitutions from the inherent word pool, rather than the kinds of non-word errors that are generated by optical character recognition. Recent proper nouns, especially names, contribute significant error because they cannot be transcribed correctly. It seems unlikely that error-free ASR will be available in the foreseeable future.
However, highest precision is not really required for our approach. The goal is not to obtain a correct transcript, but simply to gather enough semantic information to generate a characterization that the system can employ to find relevant content. The interface primarily offers the user the original audible content from recordings, because audio is doubtless a much richer medium of communication. Voice quality and intonational characteristics are lost in transcription, and intonational variation has been widely shown to change even the semantics of the simplest phrases. Hence, the presentation of texts is intentionally limited, in contrast to [5]. An advantage of our system, also with respect to the previously mentioned problem, is the low-complexity but efficient MSD. It enables us to monitor up to 30% more channels, still with good accuracy, compared to a system with an MSD of higher complexity. In any case, MSD and ASR often run into major difficulties, since modern broadcasts use background music under spoken passages. During an evaluation time of one month we were able to process up to four radio channels at the same time and integrate the obtained information automatically into our database for instant use. The monitoring of several data services, on the other hand, is possible without any problems. The throughput limitation of the ASR can be mitigated by splitting up the task into parallel processes, which reduces the lag between the recording and the end of the indexing process. The current limitations of the introduced system have to be addressed by more efficient speech recognition subsystems, sophisticated semantic retrieval algorithms, and a higher degree of parallel processing. Furthermore, prospectively a more natural communication style using a combination of speech, gesture and contextual knowledge should be possible. Therefore, a system able to interpret the semantics of speech is inevitable.
7 Conclusions and Future Work

The Digital Radio was extended with the capability to systematically search for content in DAB/DMB audio and data services; no major obstacles exist to extending the principles to HD Radio™, internet services, podcasts, etc. The functional enlargement of a digital receiver significantly adds value by promoting the evolution towards an embedded device providing innovative functionalities:

• Interactive search for content from audio and data information sources,

• Speech-based output of content,

• Conversion of highly accepted internet services into the broadcast environment.

The information extraction and retrieval process of broadcast information delivers a newspaper-like knowledge base, while web services provide an encyclopedia-like base. Even if ASR engines could supply accurate transcripts, they are to this day far away from comprehending all that speech has to offer. Although recognizers have reached a reasonable standard recently, there is other useful information which can be captured from audio recordings in the future: language identification, speaker tracking and indexing, topic detection and tracking, or non-verbal ancillary information (mood, emotion, intonational contours, accentuation) and other pertinent descriptors [21].
Furthermore, the prospective capability of devices to adapt to the preferences of specific users offers an enormous variety of augmentations to be implemented. In the case of the functionality of directly searching for content elements, as described in this paper, there is the possibility of specially selecting, or appropriately sorting, those elements that conform to the preferences of the current user. Hence, the development of radio usage from passive listening towards an interactive and individual dialog is strongly supported, and the improved functionalities render the radio an appropriate device to satisfy much more multifarious information needs than before. As a result, users are capable of selecting desired audio contents more systematically, with higher concentration and with a higher density of information, from current and past programs.
References 1. Hoeg, W., Lauterbach, T. (eds.): Digital Audio Broadcasting. Principles and Applications of DAB, DAB+ and DMB, 3rd edn. Wiley, Chichester (2009) 2. Schatter, G., Eiselt, A., Zeller, B.: A multichannel monitoring Digital Radio DAB utilizing a memory function and verbal queries to search for audio and data content. IEEE Transactions on Consumer Electronics 54(3), 1082–1090 (2008) 3. Chan, Y.: Possibilities of Value-added Digital Radio Broadcasting. In: Asia Pacific Broadcasting Union Conferences (2007) 4. Schäuble, P., Wechsler, M.: First Experiences with a System for Content Based Retrieval of Information from Speech Recordings. In: IJCAI 1995 Workshop on Intelligent Multimedia Information Retrieval (1995) 5. Whittaker, S., et al.: SCAN: Designing and Evaluating User Interfaces to Support Retrieval. In: Proceedings of ACM SIGIR 1999, pp. 26–33 (1999) 6. Emnett, K., Schmandt, C.: Synthetic News Radio. IBM Systems Journal 39(3&4), 646– 659 (2000) 7. Sawhney, N., Schmandt, C.: Nomadic radio: speech and audio interaction in nomadic environments. ACM Transactions on Computer-Human Interaction 7, 353–383 (2000) 8. Zhu, Q., et al.: Using MLP features in SRI’s conversational speech recognition system. In: Interspeech, Lisboa, September 4-8, pp. 2141–2144 (2005) 9. Minker, W., et al.: Next-Generation Human-Computer Interfaces. In: 2nd IEE International Conference on Intelligent Environments (2006) 10. Zaykovskiy, D., Schmitt, A.: Deploying DSR Technology on Today’s Mobile Phones: A Feasibility Study. In: André, E., et al. (eds.) PIT 2008. LNCS (LNAI), vol. 5078, pp. 145–155. Springer, Heidelberg (2008) 11. Červa, P., et al.: Study on Speaker Adaptation Methods in the Broadcast News Transcription Task. In: Sojka, P., et al. (eds.) TSD 2008. LNCS (LNAI), vol. 5246, pp. 277–284. Springer, Heidelberg (2008) 12. TVEyes: Podscope - The audio video search engine (2009), http://podscope.com 13. ETSI: Digital Audio Broadcasting (DAB); Voice Applications In: ETSI TS 102 632 V1.1.1 (2008) 14. Porter, M.F.: An Algorithm for Suffix Stripping. Program-Automated Library and Information Systems 14(3), 130–137 (1980) 15. Spärck Jones, K.: A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28(1), 11–21 (1972)
16. ETSI: Broadcast and Online Services: Search, select, and rightful use of content on personal storage systems TVA; Part 3. In: ETSI TS 102 822-3-1 (2004) 17. Schatter, G., Bräutigam, C., Neumann, M.: Personal Digital Audio Recording via DAB. In: 7th Workshop Digital Broadcast, Fraunhofer Erlangen, pp. 146–153 (2006) 18. Nathan, D., et al.: DAB Content Annotation and Receiver Hardware Control with XML. Computer Research Repository, CoRR (2004) 19. Schatter, G., Zeller, B.: Design and implementation of an adaptive Digital Radio DAB using content personalization. IEEE Transactions on Consumer Electronics 53, 1353–1361 (2007) 20. ETSI: Digital Audio Broadcasting (DAB); XML Specification for DAB Electronic Programme Guide (EPG). In: ETSI TS 102 818 (2005) 21. Magrin-Chagnolleau, I., Parlangeau-Vallès, N.: Audio-Indexing: what has been accomplished and the road ahead. In: Sixth International Joint Conference on Information Sciences, JCIS 2002, pp. 911–914 (2002)
Part IV
Wireless Information Networks and Systems
Modulation-Mode Assignment in Iteratively Detected and SVD-Assisted Broadband MIMO Schemes

Andreas Ahrens¹ and César Benavente-Peces²

¹ Hochschule Wismar, University of Technology, Business and Design, Department of Electrical Engineering and Computer Science, Communications Signal Processing Group, Philipp-Müller-Straße 14, 23966 Wismar, Germany
² Universidad Politécnica de Madrid, E.U.I.T de Telecomunicación, Ctra. Valencia km. 7, 28031 Madrid, Spain
[email protected], [email protected]
http://www.hs-wismar.de, http://www.upm.es
Abstract. In this contribution we jointly optimize the number of activated MIMO layers and the number of bits per symbol under the constraint of a given fixed data throughput and integrity. In analogy to bit-interleaved coded irregular modulation, we introduce a broadband MIMO-BICM scheme, where different signal constellations and mappings are used within a single codeword. Extrinsic information transfer (EXIT) charts are used for analyzing and optimizing the convergence behaviour of the iterative demapping and decoding. Our results show that in order to achieve the best bit-error rate, not necessarily all MIMO layers have to be activated.

Keywords: Multiple-Input Multiple-Output (MIMO) System, Wireless transmission, Singular-Value Decomposition (SVD), Extrinsic Information Transfer (EXIT) Charts, Bit-Interleaved Coded Modulation (BICM), Iterative Decoding, Bit-Interleaved Coded Irregular Modulation (BICIM), Spatial Division Multiplexing (SDM).
1 Introduction

Iterative demapping and decoding aided bit-interleaved coded modulation was designed for bandwidth-efficient transmission over fading channels [1,2]. The BICM philosophy has been extended by using different signal constellations and bit-to-symbol mapping arrangements within a single codeword, leading to the concept of bit-interleaved coded irregular modulation (BICIM) schemes, offering an improved link adaptation capability and an increased design freedom [3]. Since the capacity of multiple-input multiple-output (MIMO) systems increases linearly with the minimum number of antennas at both the transmitter and the receiver side, MIMO-BICM schemes have attracted substantial attention [4,5] and can be considered an essential part of increasing both the achievable capacity and the integrity of future generations of wireless systems [6,7]. However, their parameters have to be carefully optimized, especially in conjunction with adaptive modulation [8]. In general, non-frequency selective MIMO links have attracted a lot of research and have reached a state of maturity [6,9]. By
contrast, frequency selective MIMO links require substantial further research, where spatio-temporal vector coding (STVC) introduced by Raleigh seems to be an appropriate candidate for broadband MIMO transmission channels [10,11]. In general, the choice of the number of bits per symbol and the number of activated MIMO layers, combined with powerful error correcting codes, offers a certain degree of design freedom, which substantially affects the performance of MIMO systems. In addition to bit loading algorithms, in this contribution the benefits of channel coding are also investigated. The proposed iterative decoder structures employ symbol-by-symbol soft-output decoding based on the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm and are analyzed under the constraint of a fixed data throughput [12].

Against this background, the novel contribution of this paper is that we jointly optimize the number of activated MIMO layers and the number of bits per symbol, combined with powerful error correcting codes, under the constraint of a given fixed data throughput and integrity. Since the design space is large, a two-stage optimization technique is considered. Firstly, the uncoded spatial division multiplexing (SDM) broadband MIMO scheme is analyzed, investigating the allocation of both the number of bits per modulated symbol and the number of activated MIMO layers at a fixed data rate. Secondly, the optimized uncoded system is extended by incorporating bit-interleaved coded modulation using iterative detection (BICM-ID), whereby both the uncoded as well as the coded systems are required to support the same user data rate within the same bandwidth.

This contribution is organized as follows: Section 2 introduces our system model, while the proposed uncoded solutions are discussed in Section 3. In Section 4 the channel encoded MIMO system is introduced, while the computation of the extrinsic information transfer function is presented in Section 5. The associated performance results are presented and interpreted in Section 6. Finally, Section 7 provides our concluding remarks.
2 System Model

When considering a frequency-selective SDM MIMO link composed of n_T transmit and n_R receive antennas, the block-oriented system is modelled by

\[ \mathbf{u} = \mathbf{H} \cdot \mathbf{c} + \mathbf{w} \, . \tag{1} \]

In (1), c is the (N_T × 1) transmitted signal vector containing the complex input symbols transmitted over n_T transmit antennas in K consecutive time slots, i.e., N_T = K n_T. This vector can be decomposed into n_T antenna-specific signal vectors c_µ according to

\[ \mathbf{c} = \left( \mathbf{c}_1^{\mathrm{T}}, \ldots, \mathbf{c}_\mu^{\mathrm{T}}, \ldots, \mathbf{c}_{n_{\mathrm{T}}}^{\mathrm{T}} \right)^{\mathrm{T}} \, . \tag{2} \]

In (2), the (K × 1) antenna-specific signal vector c_µ transmitted by the transmit antenna µ (with µ = 1, ..., n_T) is modelled by

\[ \mathbf{c}_\mu = \left( c_{1\,\mu}, \ldots, c_{k\,\mu}, \ldots, c_{K\,\mu} \right)^{\mathrm{T}} \, . \tag{3} \]
The (N_R × 1) received signal vector u, defined in (1), can again be decomposed into n_R antenna-specific signal vectors u_ν (with ν = 1, ..., n_R) of length K + L_c, i.e., N_R = (K + L_c) n_R, and results in

\[ \mathbf{u} = \left( \mathbf{u}_1^{\mathrm{T}}, \ldots, \mathbf{u}_\nu^{\mathrm{T}}, \ldots, \mathbf{u}_{n_{\mathrm{R}}}^{\mathrm{T}} \right)^{\mathrm{T}} \, . \tag{4} \]
By taking the (L_c + 1) non-zero elements of the resulting symbol-rate sampled overall channel impulse response between the µth transmit and the νth receive antenna into account, the antenna-specific received vector u_ν has to be extended by L_c elements compared to the transmitted antenna-specific signal vector c_µ defined in (3). The ((K + L_c) × 1) signal vector u_ν received by the antenna ν (with ν = 1, ..., n_R), including the extension through the multipath propagation, can be constructed as follows:

\[ \mathbf{u}_\nu = \left( u_{1\,\nu}, u_{2\,\nu}, \ldots, u_{(K+L_c)\,\nu} \right)^{\mathrm{T}} \, . \tag{5} \]

Similarly, in (1) the (N_R × 1) noise vector w results in

\[ \mathbf{w} = \left( \mathbf{w}_1^{\mathrm{T}}, \ldots, \mathbf{w}_\nu^{\mathrm{T}}, \ldots, \mathbf{w}_{n_{\mathrm{R}}}^{\mathrm{T}} \right)^{\mathrm{T}} \, . \tag{6} \]

The vector w of the additive white Gaussian noise (AWGN) is assumed to have a variance of U_R² for both the real and imaginary parts and can likewise be decomposed into n_R antenna-specific signal vectors w_ν (with ν = 1, ..., n_R) according to

\[ \mathbf{w}_\nu = \left( w_{1\,\nu}, w_{2\,\nu}, \ldots, w_{(K+L_c)\,\nu} \right)^{\mathrm{T}} \, . \tag{7} \]
Finally, the (N_R × N_T) system matrix H of the block-oriented system model introduced in (1) results in

\[ \mathbf{H} = \begin{bmatrix} \mathbf{H}_{1\,1} & \cdots & \mathbf{H}_{1\,n_{\mathrm{T}}} \\ \vdots & \ddots & \vdots \\ \mathbf{H}_{n_{\mathrm{R}}\,1} & \cdots & \mathbf{H}_{n_{\mathrm{R}}\,n_{\mathrm{T}}} \end{bmatrix} \, , \tag{8} \]

and consists of n_R · n_T single-input single-output (SISO) channel matrices H_{ν µ} (with ν = 1, ..., n_R and µ = 1, ..., n_T). This system description, called spatio-temporal vector coding (STVC), was introduced by Raleigh [10,11]. Each of these matrices H_{ν µ}, with the dimension ((K + L_c) × K), describes the influence of the channel from transmit antenna µ to receive antenna ν, including transmit and receive filtering. The channel convolution matrix H_{ν µ} between the µth transmit and the νth receive antenna is obtained by taking the (L_c + 1) non-zero elements of the resulting symbol-rate sampled overall impulse response into account and results in:

\[ \mathbf{H}_{\nu\,\mu} = \begin{bmatrix} h_0 & 0 & \cdots & 0 \\ h_1 & h_0 & \cdots & \vdots \\ h_2 & h_1 & \ddots & 0 \\ \vdots & h_2 & \ddots & h_0 \\ h_{L_c} & \vdots & \ddots & h_1 \\ 0 & h_{L_c} & \ddots & h_2 \\ \vdots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & h_{L_c} \end{bmatrix} \, . \tag{9} \]
Throughout this paper it is assumed that the (L_c + 1) channel coefficients between the µth transmit and the νth receive antenna have the same average power and undergo a Rayleigh distribution. Furthermore, a block fading channel model is applied, i.e., the channel is assumed to be time-invariant for the duration of one SDM MIMO data vector.

The interference, which is introduced by the off-diagonal elements of the channel matrix H, requires appropriate signal processing strategies. A popular technique is based on the singular-value decomposition (SVD) [13] of the system matrix H, which can be written as H = S · V · D^H, where S and D^H are unitary matrices and V is a real-valued diagonal matrix of the positive square roots of the eigenvalues of the matrix H^H H, sorted in descending order (the transpose and conjugate transpose (Hermitian) of D are denoted by D^T and D^H, respectively). The SDM MIMO data vector c is now multiplied by the matrix D before transmission. In turn, the receiver multiplies the received vector u by the matrix S^H. Since D and S^H are unitary matrices, neither the transmit power nor the noise power is enhanced. The overall transmission relationship is defined as

\[ \mathbf{y} = \mathbf{S}^{\mathrm{H}} \left( \mathbf{H} \cdot \mathbf{D} \cdot \mathbf{c} + \mathbf{w} \right) = \mathbf{V} \cdot \mathbf{c} + \tilde{\mathbf{w}} \, . \tag{10} \]
As a consequence of the processing in (10), the channel matrix H is transformed into independent, non-interfering layers having unequal gains [14,15].
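The pre- and post-processing of (10) can be reproduced numerically in a few lines. The following sketch (Python/NumPy, with arbitrarily chosen block dimensions and noise level that are not taken from the paper) verifies that multiplying by D at the transmitter and by S^H at the receiver diagonalizes the block system matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
NT, NR = 4, 6    # illustrative block dimensions, e.g. K*nT and (K+Lc)*nR

# Random complex block matrix standing in for the system matrix H of eq. (8)
H = (rng.normal(size=(NR, NT)) + 1j * rng.normal(size=(NR, NT))) / np.sqrt(2)

# SVD of the system matrix: H = S V D^H
S, v, Dh = np.linalg.svd(H, full_matrices=False)
V, D = np.diag(v), Dh.conj().T

c = rng.choice(np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]), size=NT)  # 4-QAM block
w = 0.05 * (rng.normal(size=NR) + 1j * rng.normal(size=NR))            # AWGN

u = H @ (D @ c) + w        # transmitter sends D*c over the channel, eq. (1)
y = S.conj().T @ u         # receiver applies S^H

print(np.allclose(y, V @ c + S.conj().T @ w))  # True: y = V c + w~, eq. (10)
```

Because S and D are unitary, the transformed noise S^H w has the same statistics as w, which is the property exploited in the quality criteria of the next section.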
3 Quality Criteria

In general, the quality of data transmission can be informally assessed by using the signal-to-noise ratio (SNR) at the detector's input, defined by the half vertical eye opening and the noise power per quadrature component according to

\[ \varrho = \frac{(\text{Half vertical eye opening})^2}{\text{Noise power}} = \frac{U_{\mathrm{A}}^2}{U_{\mathrm{R}}^2} \, , \tag{11} \]
which is often used as a quality parameter [9,16]. The relationship between the signal-to-noise ratio ϱ = U_A²/U_R² and the bit-error probability evaluated for AWGN channels and M-ary Quadrature Amplitude Modulation (QAM) is given by [17]

\[ P_{\mathrm{BER}} = \frac{2}{\log_2(M)} \left( 1 - \frac{1}{\sqrt{M}} \right) \operatorname{erfc}\!\left( \sqrt{\frac{\varrho}{2}} \right) \, . \tag{12} \]

When applying the proposed system structure, the SVD-based equalization leads to different eye openings per activated MIMO layer ℓ (with ℓ = 1, 2, ..., L) at the time k (with k = 1, 2, ..., K) within the SDM MIMO signal vector according to

\[ U_{\mathrm{A}}^{(\ell,k)} = \sqrt{\xi_{\ell,k}} \cdot U_{\mathrm{s}\,\ell} \, , \tag{13} \]
where U_{s ℓ} denotes the half-level transmit amplitude assuming M_ℓ-ary QAM and √ξ_{ℓ,k} represents the corresponding positive square root of the eigenvalue of the matrix H^H H. Together with the noise power per quadrature component, the SNR per MIMO layer ℓ at the time k becomes
\[ \varrho^{(\ell,k)} = \frac{\left( U_{\mathrm{A}}^{(\ell,k)} \right)^2}{U_{\mathrm{R}}^2} = \xi_{\ell,k} \, \frac{\left( U_{\mathrm{s}\,\ell} \right)^2}{U_{\mathrm{R}}^2} \, . \tag{14} \]
Using the parallel transmission over L ≤ min(n_T, n_R) MIMO layers, the overall mean transmit power becomes P_s = Σ_{ℓ=1}^{L} P_{s ℓ}, where the number of readily separable layers is limited by min(n_T, n_R). However, it is worth noting that with the aid of powerful non-linear near Maximum Likelihood (ML) sphere decoders it is possible to separate n_R > n_T layers [18]. Considering QAM constellations, the average transmit power P_{s ℓ} per MIMO layer ℓ may be expressed as [17]

\[ P_{\mathrm{s}\,\ell} = \frac{2}{3} \, U_{\mathrm{s}\,\ell}^2 \left( M_\ell - 1 \right) \, . \tag{15} \]

Combining (14) and (15), the layer-specific SNR at the time k results in

\[ \varrho^{(\ell,k)} = \xi_{\ell,k} \, \frac{3 \, P_{\mathrm{s}\,\ell}}{2 \left( M_\ell - 1 \right) U_{\mathrm{R}}^2} \, . \tag{16} \]
In order to transmit at a fixed data rate while maintaining the best possible integrity, i.e., bit-error rate, an appropriate number of MIMO layers has to be used, which depends on the specific transmission mode, as detailed in Table 1. In general, the BER per SDM MIMO data vector is dominated by the specific transmission modes and the characteristics of the singular values, resulting in different BERs for the different QAM configurations in Table 1. An optimized adaptive scheme would now use, for each SDM MIMO data vector, the particular transmission mode that results in the lowest BER, e.g., by using bit auction procedures [19]. This would lead to different transmission modes per SDM MIMO data vector, and a high signalling overhead would result. However, in order to avoid any signalling overhead, fixed transmission modes are used in this contribution regardless of the channel quality.

The MIMO layer specific bit-error probability at the time k after SVD is given by [9]

\[ P_{\mathrm{BER}}^{(\ell,k)} = \frac{2}{\log_2(M_\ell)} \left( 1 - \frac{1}{\sqrt{M_\ell}} \right) \operatorname{erfc}\!\left( \sqrt{\frac{\varrho^{(\ell,k)}}{2}} \right) \, . \tag{17} \]

The resulting average bit-error probability at the time k, assuming different QAM constellation sizes per activated MIMO layer, is given by

\[ P_{\mathrm{BER}}^{(k)} = \frac{1}{\sum_{\nu=1}^{L} \log_2(M_\nu)} \sum_{\ell=1}^{L} \log_2(M_\ell) \, P_{\mathrm{BER}}^{(\ell,k)} \, . \tag{18} \]

Taking into account the K consecutive time slots needed to transmit the SDM MIMO data vector, the aggregate bit-error probability per SDM MIMO data vector yields

\[ P_{\mathrm{BER\,block}} = \frac{1}{K} \sum_{k=1}^{K} P_{\mathrm{BER}}^{(k)} \, . \tag{19} \]
Table 1. Investigated transmission modes

throughput    layer 1   layer 2   layer 3   layer 4
8 bit/s/Hz      256        0         0         0
8 bit/s/Hz       64        4         0         0
8 bit/s/Hz       16       16         0         0
8 bit/s/Hz       16        4         4         0
8 bit/s/Hz        4        4         4         4
Fig. 1. The channel-encoded MIMO transmitter structure
When considering time-variant channel conditions, rather than an AWGN channel, the BER can be derived by considering the different transmission block SNRs. Assuming that the transmit power is uniformly distributed over the number of activated MIMO layers, i.e., P_{s ℓ} = P_s / L, the half-level transmit amplitude U_{s ℓ} per activated MIMO layer results in

\[ U_{\mathrm{s}\,\ell} = \sqrt{\frac{3 \, P_{\mathrm{s}}}{2 \, L \left( M_\ell - 1 \right)}} \, . \tag{20} \]

Finally, the layer-specific signal-to-noise ratio at the time k, defined in (14), results with the ratio of symbol energy to noise power spectral density E_s/N_0 = P_s/(2 U_R²) and (20) in

\[ \varrho^{(\ell,k)} = \xi_{\ell,k} \, \frac{3 \, P_{\mathrm{s}}}{2 \, L \left( M_\ell - 1 \right) U_{\mathrm{R}}^2} = \xi_{\ell,k} \, \frac{3 \, E_{\mathrm{s}}}{L \left( M_\ell - 1 \right) N_0} \, . \tag{21} \]
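The uncoded quality analysis of (17), (18) and (21) is straightforward to evaluate numerically. The following sketch computes the average bit-error probability of a transmission mode for a single time slot k; the eigenvalues ξ_{ℓ,k} used here are hypothetical placeholders for one channel realization, and averaging over the K slots of a data vector, as in (19), is omitted for brevity:

```python
import numpy as np
from scipy.special import erfc

def layer_ber(xi, M, es_n0, L):
    """Eq. (21) followed by eq. (17) for one layer at one time slot."""
    rho = xi * 3.0 * es_n0 / (L * (M - 1.0))
    return 2.0 * (1.0 - 1.0 / np.sqrt(M)) / np.log2(M) * erfc(np.sqrt(rho / 2.0))

def mode_ber(xis, mode, es_n0):
    """Eq. (18): bit-weighted average over the activated layers."""
    active = [(xi, M) for xi, M in zip(xis, mode) if M > 1]
    L = len(active)
    total_bits = sum(np.log2(M) for _, M in active)
    return sum(np.log2(M) * layer_ber(xi, M, es_n0, L)
               for xi, M in active) / total_bits

xis = [2.1, 1.0, 0.4, 0.1]     # hypothetical eigenvalues of H^H H for one k
es_n0 = 10 ** (20 / 10)        # Es/N0 = 20 dB
for mode in [(256, 0, 0, 0), (16, 4, 4, 0), (4, 4, 4, 4)]:
    print(mode, f"{mode_ber(xis, mode, es_n0):.2e}")
```

Repeating this computation over many channel realizations and over the K time slots of (19) reproduces curves of the kind shown later in Figures 4 and 5.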
4 Coded MIMO System

The channel-encoded transmitter structure is depicted in Figure 1. The encoder employs a half-rate nonrecursive, non-systematic convolutional (NSC) code using the generator polynomials (7, 5) in octal notation. The uncoded information is organized in blocks of N_i bits, consisting of at least 3000 bits, depending on the specific QAM constellation used. Each data block i is encoded and results in the block b consisting of N_b = 2 N_i + 4 encoded bits, including 2 termination bits. The encoded bits are interleaved using a random interleaver and stored in the vector b̃.
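A minimal encoder matching this description is sketched below; the particular shift-register ordering is our own illustrative choice, but the taps follow the stated (7, 5) octal generators and the two tail bits give the stated block length N_b = 2 N_i + 4:

```python
def nsc_encode(info_bits):
    """Rate-1/2 non-systematic convolutional encoder with generator
    polynomials (7, 5) in octal (constraint length 3). Two zero tail
    bits terminate the trellis, giving Nb = 2*Ni + 4 coded bits."""
    s1 = s2 = 0                          # shift-register contents
    coded = []
    for b in list(info_bits) + [0, 0]:   # append the termination bits
        coded.append(b ^ s1 ^ s2)        # generator 7_8 = 111
        coded.append(b ^ s2)             # generator 5_8 = 101
        s1, s2 = b, s1                   # shift the register
    return coded

print(len(nsc_encode([1, 0, 1, 1])))     # 12 = 2*4 + 4 coded bits
```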
Fig. 2. Iterative demodulator structure
The encoded and interleaved bits are then mapped to the MIMO layers. The task of the multiplexer and buffer block of Figure 1 is to divide the vector b̃ of encoded and interleaved information bits into subvectors (b̃_{1,k}, b̃_{2,k}, ..., b̃_{L,k}), each consisting of 8 bits according to the chosen transmission mode (Table 1). The individual binary data vectors b̃_{ℓ,k} are then mapped to the QAM symbols c_{ℓ,k} according to the specific mapper used.

The iterative demodulator structure is shown in Figure 2 [20]. When using the iteration index ν, the first iteration of ν = 1 commences with the soft demapper delivering the N_b log-likelihood ratios (LLRs) L_2^{(ν=1)}(b̃) of the encoded and interleaved information bits, whose de-interleaved version L_{a,1}^{(ν=1)}(b) represents the input of the convolutional decoder as depicted in Figure 2 [12,6]. This channel decoder provides the estimates L_1^{(ν=1)}(i) of the original uncoded information bits as well as the LLRs of the N_b NSC-encoded bits in the form of

\[ L_1^{(\nu=1)}(\mathbf{b}) = L_{\mathrm{a},1}^{(\nu=1)}(\mathbf{b}) + L_{\mathrm{e},1}^{(\nu=1)}(\mathbf{b}) \, . \tag{22} \]

As seen in Figure 2 and (22), the LLRs of the NSC-encoded bits consist of the receiver's input signal itself plus the extrinsic information L_{e,1}^{(ν=1)}(b), which is generated by subtracting L_{a,1}^{(ν=1)}(b) from L_1^{(ν=1)}(b). The appropriately ordered, i.e. interleaved, extrinsic LLRs are fed back as a priori information L_{a,2}^{(ν=2)}(b̃) to the soft demapper of Figure 2 for the second iteration.
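The LLR bookkeeping of (22) can be illustrated with a deliberately simplified stand-in: the sketch below replaces the paper's QAM soft demapper and BCJR decoder with BPSK over AWGN and a length-3 repetition code, purely to show how a posteriori, a priori and extrinsic LLRs are separated:

```python
import numpy as np

rng = np.random.default_rng(1)
n_bits, reps, sigma = 10000, 3, 1.0

b = rng.integers(0, 2, n_bits)                       # info bits
c = np.repeat(b, reps)                               # toy repetition "code"
y = (1 - 2.0 * c) + sigma * rng.normal(size=c.size)  # BPSK over AWGN

L_ch = (2.0 / sigma**2) * y          # soft-demapper channel LLRs
L_ch = L_ch.reshape(n_bits, reps)    # one row of replica LLRs per info bit

L_post = L_ch.sum(axis=1, keepdims=True)  # a posteriori LLR per info bit
L_extr = L_post - L_ch                    # eq. (22): Le = L - La, per replica

b_hat = (L_post.ravel() < 0).astype(int)
print("BER:", np.mean(b_hat != b))
```

In the actual receiver, these extrinsic values would be re-interleaved and fed back to the soft demapper as a priori information for the next iteration.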
5 Extrinsic Information Transfer Function

Random variables (r.v.s) are denoted with capital letters and their corresponding realizations with lower-case letters. Sequences of random variables and realizations are indicated by boldface italic letters (as B or b). Furthermore, boldface roman letters denote vectors (as B or b). The time instant is denoted with k and the layer with ℓ. The transmitted data sequence B is multiplexed onto the different used MIMO layers ℓ and results in the MIMO layer specific sequence B_ℓ with ℓ = 1, 2, ..., L. The stationary binary input sequence B_ℓ = [B_{ℓ,1}, B_{ℓ,2}, ..., B_{ℓ,k}, ...] consists of r.v.s B_{ℓ,k}, where the corresponding realizations b_{ℓ,k} have an index length of 1 bit and are taken from a finite alphabet B = {0, 1}. The mapper output sequence C_ℓ = [C_{ℓ,1}, C_{ℓ,2}, ..., C_{ℓ,k}, ...] on the ℓ-th layer consists of r.v.s C_{ℓ,k}, where the corresponding realizations c_{ℓ,k} have an index length of log_2(M_ℓ) bits and are taken from a finite alphabet C = {0, 1, ..., M_ℓ − 1}.
Fig. 3. Transmission model analyzing the ℓ-th MIMO layer
The symbols c_{ℓ,k} are transmitted over independent channels, resulting in the received values y_{ℓ,k}. The a priori channel, as depicted in Figure 3, models the a priori information used at the soft demapper. The sequence A_ℓ = [A_{ℓ,1}, A_{ℓ,2}, ..., A_{ℓ,k}, ...] with the corresponding realizations a_{ℓ,k} contains the a priori LLR information passed to the demapper.

EXIT charts visualize the input/output characteristics of the soft demapper and the decoder in terms of a mutual information transfer between the data sequence B_ℓ and the sequence A_ℓ of the a priori LLR information at the input of the soft demapper, as well as between B_ℓ and the sequence E_ℓ of the extrinsic LLR at the output, respectively. Denoting the mutual information between two r.v.s X and Y as I(X; Y), we may define for a given sequence B_ℓ the quantities I_{ℓ,A} = I(A_ℓ; B_ℓ) as well as I_{ℓ,E} = I(E_ℓ; B_ℓ). Herein, I_{ℓ,A} represents the average a priori information and I_{ℓ,E} the average extrinsic information, respectively [20]. The transfer characteristic T of the soft demapper is given by I_{ℓ,E} = T(I_{ℓ,A}, ρ), where ρ represents the SNR of the communication channel. When analyzing the outer decoder in a serially concatenated scheme, T does not depend on ρ. An EXIT chart is now obtained by plotting the transfer characteristics T for both the demapper and the decoder within a single diagram, where the axes have to be swapped for one of the constituent decoders [21] (normally the outer one for serial concatenation).

Analyzing the layer-specific characteristics, a MIMO-layer specific parameter α^{(ℓ)} can be defined according to

\[ \alpha^{(\ell)} = \frac{\log_2(M_\ell)}{R} \, , \tag{23} \]

describing the fraction of the data sequence B that is transmitted over the ℓth layer, i.e. B_ℓ [20]. Therein, the parameter R describes the number of transmitted bits per time interval including all L MIMO layers and results in R = Σ_{ℓ=1}^{L} log_2(M_ℓ). Hence, the mutual information for a given sequence B and the extrinsic LLR E at the output is obtained by

\[ I(\mathbf{E}; \mathbf{B}) = \sum_{\ell=1}^{L} \alpha^{(\ell)} \, I(E_\ell; B_\ell) \, . \tag{24} \]

The MIMO layer specific extrinsic LLR sequences E_ℓ are multiplexed onto the sequence E, which is fed to the outer decoder [20]. Beneficial values of α^{(ℓ)} may be chosen by ensuring that there is an open EXIT tunnel between the soft demapper transfer characteristic and the decoder transfer characteristic at a given E_s/N_0 value that is close to the channel capacity bound.
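Equations (23) and (24) amount to a simple weighted combination, as the short helper below shows; running it reproduces the α^{(ℓ)} weights listed in Table 2:

```python
from math import log2

def alphas(mode):
    """Eq. (23): fraction of the data stream carried by each layer."""
    R = sum(log2(M) for M in mode if M > 1)
    return [log2(M) / R if M > 1 else 0.0 for M in mode]

def combined_mutual_info(mode, I_layers):
    """Eq. (24): I(E;B) as the alpha-weighted sum of layer-wise I(E_l;B_l)."""
    return sum(a * I for a, I in zip(alphas(mode), I_layers))

for mode in [(256, 0, 0, 0), (64, 4, 0, 0), (16, 16, 0, 0),
             (16, 4, 4, 0), (4, 4, 4, 4)]:
    print(mode, alphas(mode))   # reproduces the weights of Table 2
```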
Table 2. Transmission modes and corresponding α(ℓ)

throughput    M1, α(1)    M2, α(2)    M3, α(3)    M4, α(4)
8 bit/s/Hz    256, 1       0, 0        0, 0        0, 0
8 bit/s/Hz     64, 3/4     4, 1/4      0, 0        0, 0
8 bit/s/Hz     16, 1/2    16, 1/2      0, 0        0, 0
8 bit/s/Hz     16, 1/2     4, 1/4      4, 1/4      0, 0
8 bit/s/Hz      4, 1/4     4, 1/4      4, 1/4      4, 1/4
Fig. 4. BER when using the transmission modes introduced in Table 1 and transmitting 8 bit/s/Hz over frequency selective channels with Lc = 1 (two-path channel model)
Analyzing the transmission modes in Table 1, the resulting values of α(ℓ) are shown in Table 2.
6 Results

In this contribution fixed transmission modes are used regardless of the channel quality. Assuming predefined transmission modes, a fixed data rate can be guaranteed. The obtained uncoded BER curves are depicted in Figures 4 and 5 for the different QAM constellation sizes and MIMO configurations of Table 1 when transmitting at a bandwidth efficiency of 8 bit/s/Hz. Assuming a uniform distribution of the transmit power over the number of activated MIMO layers, it turns out that not all MIMO layers have to be activated in order to achieve the best BERs. More explicitly, our goal is to find the specific combination of the QAM mode and the number of MIMO layers that gives the best possible BER performance at a given fixed bit/s/Hz bandwidth efficiency. However, the lowest BERs can only be achieved by using bit auction procedures, leading to a high signalling overhead. Analyzing the probability of choosing specific transmission modes by using optimal bitloading as illustrated in [14], it turns out that at moderate SNR only an appropriate number of MIMO layers has to be activated, e.g., the (16, 4, 4, 0) QAM configuration.
Fig. 5. BER when using the transmission modes introduced in Table 1 and transmitting 8 bit/s/Hz over frequency selective channels with Lc = 4 (five-path channel model)
Using the half-rate, constraint-length K_cl = 3 NSC code, the BER performance is analyzed for an effective user throughput of 4 bit/s/Hz. The BER investigations using the NSC code are based on the best uncoded schemes of Table 1. The information word length is 3000 bits and a random interleaver is applied. In addition to the number of bits per symbol and the number of activated MIMO layers, the achievable performance of the iterative decoder is substantially affected by the specific mapping of the bits to both the QAM symbols and the MIMO layers. While the employment of the classic Gray mapping is appropriate in the absence of a priori information, the availability of a priori information in iterative receivers requires an exhaustive search for finding the best non-Gray (synonymously also referred to as anti-Gray) mapping scheme [2]. A mapping scheme optimized for perfect a priori information usually has a poor performance when there is no a priori information. However, when applying iterative demapping and decoding, large gains can be achieved as long as the reliability of the a priori information increases with the number of iterations.

Analyzing the number of activated MIMO layers, the soft-demapper transfer characteristic is depicted in Figure 6 using anti-Gray mapping on all activated MIMO layers. Assuming predefined QAM constellation sizes, the entire soft demapper transfer characteristic is well predictable by combining the single MIMO layer transfer characteristics using the parameter α(ℓ). Using predefined QAM constellation sizes and the corresponding α(ℓ), the resulting EXIT chart curve is depicted in Figure 7. In order to match the soft demapper transfer characteristic properly to the decoder transfer characteristic, a joint optimization of the number of activated MIMO layers as well as the number of bits per symbol has been carried out. Our results suggest that not all MIMO layers have to be activated in order to shape the soft demapper transfer characteristic properly.
Fig. 6. Layer-specific transfer characteristic when using anti-Gray mapping and the (16, 4, 4, 0) transmission mode over frequency-selective MIMO links (10 log10(Es/N0) = 2 dB, Lc = 1 (two-path channel model))
Fig. 7. EXIT chart for an effective throughput of 4 bit/s/Hz when using anti-Gray mapping on all activated MIMO layers (10 log10(Es/N0) = 2 dB and Lc = 1 (two-path channel model)) and the half-rate NSC code with the generator polynomials of (7, 5) in octal notation
The best uncoded solutions seem also to be useful in the coded scenario. The corresponding BER curves are shown in Figure 8 and confirm the EXIT chart results. In order to guarantee an efficient information exchange between the soft demapper and the decoder, i.e., an open EXIT tunnel, only an appropriate number of MIMO layers has to be activated.
Fig. 8. BER for an effective user throughput of 4 bit/s/Hz and anti-Gray mapping in combination with different transmission modes (Lc = 1 (two-path channel model)) and the half-rate NSC code with the generator polynomials of (7, 5) in octal notation
Using all MIMO layers for the data transmission, the information exchange between the soft demapper and the decoder stops relatively early, as illustrated by the EXIT chart results in Figure 7, and significant enhancements in the BER performance can no longer be achieved by increasing the number of iterations at low SNR. As demonstrated throughout this work, activating an appropriate number of MIMO layers seems to be a promising solution for minimizing the overall BER characteristic.
7 Conclusions

The choice of the number of bits per symbol and the number of MIMO layers, combined with error correcting codes, substantially affects the performance of a MIMO system. Analyzing the uncoded system, it turns out that not all MIMO layers have to be activated in order to achieve the best BERs. Considering the coded system, the choice of the mapping strategies combined with the appropriate number of activated MIMO layers and transmitted bits per symbol offers a certain degree of design freedom, which substantially affects the performance of MIMO systems. Here, using an appropriate number of MIMO layers for the data transmission seems to be a promising solution for minimizing the overall BER characteristic.
References

1. Caire, G., Taricco, G., Biglieri, E.: Bit-Interleaved Coded Modulation. IEEE Transactions on Information Theory 44(3), 927–946 (1998)
2. Chindapol, A., Ritcey, J.A.: Design, Analysis, and Performance Evaluation for BICM-ID with Square QAM Constellations in Rayleigh Fading Channels. IEEE Journal on Selected Areas in Communications 19(5), 944–957 (2001)
3. Schreckenbach, F., Bauch, G.: Bit-Interleaved Coded Irregular Modulation. European Transactions on Telecommunications 17(2), 269–282 (2006)
4. McKay, M.R., Collings, I.B.: Capacity and Performance of MIMO-BICM with Zero-Forcing Receivers. IEEE Transactions on Communications 53(1), 74–83 (2005)
5. Mueller-Weinfurtner, S.H.: Coding Approaches for Multiple Antenna Transmission in Fast Fading and OFDM. IEEE Transactions on Signal Processing 50(10), 2442–2450 (2002)
6. Kühn, V.: Wireless Communications over MIMO Channels – Applications to CDMA and Multiple Antenna Systems. Wiley, Chichester (2006)
7. Zheng, L., Tse, D.N.T.: Diversity and Multiplexing: A Fundamental Tradeoff in Multiple-Antenna Channels. IEEE Transactions on Information Theory 49(5), 1073–1096 (2003)
8. Zhou, Z., Vucetic, B., Dohler, M., Li, Y.: MIMO Systems with Adaptive Modulation. IEEE Transactions on Vehicular Technology 54(5), 1073–1096 (2005)
9. Ahrens, A., Lange, C.: Modulation-Mode and Power Assignment in SVD-equalized MIMO Systems. Facta Universitatis (Series Electronics and Energetics) 21(2), 167–181 (2008)
10. Raleigh, G.G., Cioffi, J.M.: Spatio-Temporal Coding for Wireless Communication. IEEE Transactions on Communications 46(3), 357–366 (1998)
11. Raleigh, G.G., Jones, V.K.: Multivariate Modulation and Coding for Wireless Communication. IEEE Journal on Selected Areas in Communications 17(5), 851–866 (1999)
12. Bahl, L.R., Cocke, J., Jelinek, F., Raviv, J.: Optimal Decoding of Linear Codes for Minimizing Symbol Error Rate. IEEE Transactions on Information Theory 20(3), 284–287 (1974)
13. Haykin, S.S.: Adaptive Filter Theory. Prentice Hall, New Jersey (2002)
14. Ahrens, A., Benavente-Peces, C.: Modulation-Mode and Power Assignment in SVD-assisted Broadband MIMO Systems. In: International Conference on Wireless Information Networks and Systems (WINSYS), Milan, Italy, July 6–10, pp. 83–88 (2009)
15. Ahrens, A., Benavente-Peces, C.: Modulation-Mode Assignment in SVD-assisted Broadband MIMO-BICM Schemes. In: International Conference on Wireless Information Networks and Systems (WINSYS), Milan, Italy, July 6–10, pp. 73–80 (2009)
16. Ahrens, A., Benavente-Peces, C.: Modulation-Mode and Power Assignment in Broadband MIMO Systems. Facta Universitatis (Series Electronics and Energetics) 22(3), 313–327 (2009)
17. Proakis, J.G.: Digital Communications. McGraw-Hill, Boston (2000)
18. Hanzo, L., Keller, T.: OFDM and MC-CDMA. Wiley, New York (2006)
19. Wong, C.Y., Cheng, R.S., Letaief, K.B., Murch, R.D.: Multiuser OFDM with Adaptive Subcarrier, Bit, and Power Allocation. IEEE Journal on Selected Areas in Communications 17(10), 1747–1758 (1999)
20. Ahrens, A., Ng, S.X., Kühn, V., Hanzo, L.: Modulation-Mode Assignment for SVD-Aided and BICM-Assisted Spatial Division Multiplexing. Physical Communications (PHYCOM) 1(1), 60–66 (2008)
21. ten Brink, S.: Convergence Behavior of Iteratively Decoded Parallel Concatenated Codes. IEEE Transactions on Communications 49(10), 1727–1737 (2001)
Wireless Sensor Resource Usage Optimisation Using Embedded State Predictors

David Lowe, Steve Murray, and Xiaoying Kong

Centre for Real-Time Information Networks, University of Technology Sydney, Sydney, Australia
{david.lowe,xiaoying.kong}@uts.edu.au, [email protected]
Abstract. The increasing prevalence and sophistication of wireless sensors is creating an opportunity for improving, or in many cases enabling, the real-time monitoring and control of distributed physical systems. However, whilst a major issue in the use of these sensors is their resource utilisation, there has only been limited consideration given to the interplay between the data sampling requirements of the control and monitoring systems and the design characteristics of the wireless sensors. In this paper we describe an approach to the optimization of the resources utilized by these devices based on the use of synchronized state predictors. By embedding state predictors into the sensors themselves it becomes possible for the sensors to predict their optimal sampling rate consistent with maintaining monitoring or control performance, and hence minimize the utilization of limited sensor resources such as power and bandwidth. Keywords: Wireless sensor networks, State observers, Control, Optimisation.
1 Introduction

The increasing prevalence and sophistication of cheap, small, but powerful wireless sensors is creating an opportunity for improving, or in many cases enabling, the real-time monitoring and control of distributed physical systems. This is being particularly driven by work in the area of the Sensor Web [1]. However, whilst there has been substantial research into aspects such as the design of sensor communications, network topologies, sampling rate algorithms and multirate control, there has only been limited consideration given to the interplay between the data sampling requirements of the control and monitoring systems that use the sensor data, and the design characteristics of the wireless sensors themselves. In many (indeed, probably most) applications that use wireless sensors, the sensors are capable of providing much more data than is necessary, representing a significant potential inefficiency.

We hypothesise that by embedding knowledge of the required control and/or monitoring performance into the sensors themselves, along with a state estimator, sensors could predict each subsequent data sampling point that would meet those performance requirements. The sensor would then self-trigger at that time. This new approach will allow wireless sensors to be tuned to minimise usage of sensor network resources (such as node energy and wireless communication bandwidth) and hence extend sensor lifetimes.
Fig. 1. Soil moisture state estimation and prediction of required sampling points
To illustrate this concept consider, for example, a simplified application involving the embedding of a network of wireless sensors measuring soil parameters (moisture, salinity, pH) as input to an irrigation control system for an agricultural area under irrigation. Considering Figure 1, an initial sample is taken by a particular sensor at time t1, resulting in a soil moisture level of m1. Using a state estimator we can then estimate the band within which the system state may be predicted to exist over time in the absence of further sensing data (i.e. the shaded region around the line running from S1 to S2). In the absence of further data samples this band will tend to diverge as a result of model errors, measurement errors, and disturbance inputs to the system, all of which must be modelled. An H∞ filter is well suited to this particular problem, given that it makes no assumptions about the characteristics of the measurement noise, elegantly accommodates model errors, and allows us to determine the worst-case estimation error. Being able to predict the upper and lower bounds on the state estimation then allows us to determine the earliest time at which the state may reach a critical decision point. In the above example, the estimated state could potentially reach m_on (the soil moisture level at which the irrigation needs to be turned on) as early as t2, and hence the next sample is scheduled for this time. The result of the sampling is that the state estimate is corrected to m2, and the process continues again, with the next sample then scheduled for t3.

The above scenario is typical of numerous applications involving embedded wireless sensor devices. The sensor devices are designed to be extremely low power, thereby enabling them to operate for considerable periods (often years) off a single battery cell. This low power usage is achieved through having the sensors operate on a very low duty cycle, where they spend most of the time in an extremely low-power "sleep" mode, only waking periodically to take a sensor reading (and transmit it if necessary). By incorporating a system model into a state predictor that is embedded into the sensors, it becomes possible for the sensors to predict the points at which future samples will be required in order for the predicted state of the system to remain within acceptable error bands.

In this paper we consider an architecture based on synchronized state predictors that addresses these issues and facilitates the optimization of resource usage in web-enabled sensor networks.
In the following section we describe related work, considering in particular both the growing trend to embed sensors directly into control and monitoring applications, as well as approaches to data monitoring optimization through the use of a classical technique from control theory: state observers. In section 3 we provide an overview of our proposed architecture, and describe how it addresses the design constraints. In section 4 we then outline a prototype evaluation that demonstrates the performance gains that can be achieved through our proposed approach. Finally, in section 5, we consider the implications of this approach and outline directions for future work.
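The scheduling rule of Figure 1 can be captured in a few lines. The sketch below is a deliberately simplified stand-in for the H∞ machinery discussed above: it assumes the worst-case soil-moisture drift is bounded by a known rate and schedules the next sample for the earliest time the lower edge of the prediction band could cross the actuation threshold (all numeric values are illustrative, not taken from the paper):

```python
def next_sample_time(m_now, t_now, m_on,
                     drift_max=0.8,   # assumed worst-case drying rate, %/hour
                     meas_err=0.5):   # assumed measurement uncertainty, %
    """Earliest time the worst-case moisture estimate can reach m_on."""
    lower = m_now - meas_err          # lower edge of the prediction band now
    if lower <= m_on:
        return t_now                  # threshold already reachable: sample now
    return t_now + (lower - m_on) / drift_max

# After sampling m1 = 32 % at t1 = 0 h with an m_on threshold of 25 %,
# the sensor can safely sleep until t2 = 8.125 h.
print(next_sample_time(m_now=32.0, t_now=0.0, m_on=25.0))
```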
2 Background

As was described above, technological advancements in the area of embedded processors, lightweight intelligent sensors, and wireless communications have led to an increasing availability and sophistication of wireless sensors that can be used in gathering rich real-time data from physical environments [2]. This data can be used for monitoring and control applications as diverse as environmental monitoring, traffic management, building systems control, power usage tracking, irrigation control, and transport infrastructure monitoring, amongst many others [3]. The information provided by sensors can be incredibly diverse: location, speed, vibration, temperature, humidity, light, sound, pollutants, etc. This information, in turn, enables extremely rich monitoring and control applications, many of which however only become feasible when the sensors are small and cheap, which in turn places constraints on the resources available to the sensors.

As an example, consider the following scenario: a building incorporates a network of temperature and humidity sensors to support monitoring of the building environment. To enable them to be rapidly and cheaply deployed, without requiring cabling, they are designed as a Zigbee wireless mesh network [4] with each sensor node being battery powered. A significant design objective on the sensor modules is therefore to minimize their power utilization in order to maximize their battery life. As an example, a Zigbee module might use as little as 1 µA when in deep sleep, 10 mA when operating, and 40 mA when transmitting or receiving data. A typical scenario would have each set of samples requiring the module to be awake for 5 ms and transmitting for 1 ms. If it sampled continuously, a standard high-performance Lithium "coin" battery would last approximately 50 hours. Conversely, transmitting a data sample only every 10 seconds, and sleeping the remainder of the time, would give a 1:2000 duty cycle and an operating life of over 10 years (ignoring shelf-life characteristics of the battery, which can vary enormously, from less than a year to well over 10 years, depending on the environment and the battery type). Note that other factors, such as the requirements for data routing, will moderate these extreme examples somewhat. Power minimization in turn requires the module to minimize the time in which it is operational. Similar resource constraints exist in terms of minimization of communication bandwidth, CPU cycles, and other resources.

For many applications, access to the sensor nodes after deployment is difficult or costly, and hence replacement of sensor power sources (typically batteries) is either not feasible or uneconomic. Available communication bandwidths are often limited, particularly where sensor networks are very dense.
For these reasons a major design challenge for wireless sensors is minimisation of data flow and/or energy use. In particular, there has been considerable research into algorithms for minimising energy in data gathering [5,6]. This typically involves determination of both the data routing from the distributed sensors to the data sinks (a relatively complex problem in mesh networks), as well as leveraging correlations between the data collected by each sensor in order to perform rate adaptation and minimise the total data flows. Approaches vary from relatively simplistic forms of delta encoding to sophisticated distributed neighbourhood aggregation.

From a networking perspective, there is substantial research considering the impact of network performance on the system being monitored or controlled [7], as well as how to best utilise the available network capability. For example, [8] considers the level of data loss in a network that can be tolerated before a controlled system becomes unstable. Conversely, [9,10] describe allocating limited bandwidth to different data streams in order to maximise system performance. A parallel thread of research has looked at limitations on the capabilities of the devices themselves. For example, [11] recognised that embedded devices have limited processing capability and provided a model, based on minimising a global cost function, for balancing the allocation of embedded device processor time to multiple competing tasks.

The previous research avenues have focused on optimally using the available resources to achieve the best possible performance. In many cases, however, the objective is to minimise the use of resources whilst still meeting performance requirements. A good example of this approach is work by Sun et al. [12], which considered the minimum data rates in a networked control system that are required to ensure system stability. Whilst providing a valuable design guideline, this approach still assumes a constant sampling rate, whereas previous research by the authors [13] has shown that this is an assumption that need not be maintained. The flexibility of sensors provides us with an opportunity to reduce data rates by dynamically changing sensor sampling rates depending upon prevailing real-time circumstances. It may be appropriate under relevant circumstances to slow the sampling rates for some or all sensors, or to sub-sample the sensors spatially, thereby conserving power and bandwidth. A key research question is therefore how we optimally sample to minimise the sensor resource utilisation whilst maintaining appropriate system performance.

Control theory provides a useful way forward in addressing this problem. There has been considerable research, dating back to the 1950s, into multirate control systems [14,15]. Conventional digital control theory assumes a single constant sampling rate across the digital system. Multirate theory, however, models the dynamics of digital control systems in the presence of multiple different sampling rates. Much of this theory has been driven by a recognition that different devices within a hybrid system will inherently operate at different sampling rates [16]. De la Sen [17] has considered properties such as observability and stabilisability of these systems, though not in the context of wireless sensors.
Kawka and Alleyne [18] considered control performance (particularly stability and disturbance rejection) in a wireless network through modelling data losses as a random variation to sampling rates.
Fig. 2. General form of a Luenberger observer
The above approaches all rely on a fixed sampling period, but adaptively dropping selected samples. An alternative that has shown recent promise in lowering overall sensor sampling rates is to move from periodic triggering (i.e. taking a sample every time T) to event-based triggering (i.e. only sampling the sensor when certain circumstances are met in the system) [19]. However, as acknowledged by Velasco, Fuertes and Marti [20], recognising the circumstances that initiate sampling can often require additional hardware or data. An effective compromise is to use self-triggering where, after each sample is taken, the system calculates the minimum time allowable before the next sample must be taken in order to retain stability. The work by Velasco et al. has shown that self-triggering can significantly reduce the average sample rates required by a system. Whilst [19] provides a good starting point, the trigger for each sample is the estimated error exceeding a given threshold. This approach is limited insofar as the error that can be tolerated will vary over time. We contend that an improved approach triggers the sample when the "worst case" system state reaches a given threshold. This, in turn, can be determined through the use of state estimators.

A state estimator is a model of a real system that gives us access to an estimate of the internal state of the system. As shown in Figure 2, with a Luenberger Observer [21], the observable outputs of the physical system are compared to the equivalent outputs from the state observer and used to correct any errors in the observer using a compensator. Traditionally, state observers have been used to gain access to estimates of the variables which determine the state of the system when these variables cannot be directly accessed (in Figure 2, we would use x_o(k) in our system control, rather than x(k), which cannot be directly accessed). It is, however, equally applicable to use estimates when gaining access to the actual system variables is inappropriate due to resource requirements, as may be the case with sensor networks.

We therefore propose an architecture which utilizes synchronized distributed estimators, so that each sensor includes an embedded copy of the state estimator, and can hence determine locally on the sensor if the potential estimation error of the state is growing beyond a specified performance threshold and requires correction. The sensors therefore only provide sensor samples when required to keep the control or monitoring system at a suitable performance level. More simplistic versions of this approach have been used previously.
Fig. 3. Distributed Synchronised State Observer Architecture
For example, numerous approaches have adopted variations of using constant sampling rates in the sensors, but only transmitting sensor data when the change exceeds some threshold (a form of adaptive delta modulation; see [22,23]). A state observer, however, has the potential to allow a much more intelligent variation of the transmission thresholds based on a system model.
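For reference, the send-on-delta baseline just described amounts to a few lines of logic; the threshold and readings below are illustrative values only:

```python
def send_on_delta(samples, delta=0.5):
    """Transmit a reading only when it differs from the last transmitted
    value by at least `delta` (a simple adaptive-delta-style baseline)."""
    sent, last = [], None
    for k, value in enumerate(samples):
        if last is None or abs(value - last) >= delta:
            sent.append((k, value))      # (sample index, transmitted value)
            last = value
    return sent

readings = [20.0, 20.1, 20.2, 20.9, 21.0, 21.1, 21.8]
print(send_on_delta(readings))   # -> [(0, 20.0), (3, 20.9), (6, 21.8)]
```

The observer-based approach developed next replaces the fixed threshold with a model-driven prediction of when a correction is actually needed.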
3 Predictor Architecture

3.1 System Architecture

In order to demonstrate our approach we have adopted the architecture shown in Figure 3. In this architecture, we implement as part of the sensor module a slightly modified Luenberger observer, with a quantizer included in the compensation so that small corrections to the modeled system are ignored until the model error reaches a level that requires correction. This minimizes the data flow associated with the correction to the modeled system. An identical model of the system is then incorporated into a data aggregator, which makes the data available either for monitoring (possibly through a Web portal) or control. The consequence of this is that the communication that needs to occur from the sensor nodes is reduced.

3.2 Design Modeling

The proposed architecture can be modeled as follows. The standard form for the linear relation, at time k, between the input vector u(k), the system state vector x(k) (which may not be directly measurable) and the vector of observable outputs y(k) in a discrete system is:
\[ \begin{aligned} \mathbf{x}(k+1) &= \mathbf{A}\,\mathbf{x}(k) + \mathbf{B}\,\mathbf{u}(k) \\ \mathbf{y}(k) &= \mathbf{C}\,\mathbf{x}(k) + \mathbf{D}\,\mathbf{u}(k) \end{aligned} \tag{1} \]

where A, B, C and D are matrices that define the model of the system dynamics, and are obtained through conventional control system modelling techniques. Assuming that we are able to construct a sufficiently accurate representation of this system, then for a normal Luenberger observer we have:

\[ \begin{aligned} \mathbf{x}_o(k+1) &= \mathbf{A}\,\mathbf{x}_o(k) + \mathbf{B}\,\mathbf{u}_o(k) \\ \mathbf{y}_o(k) &= \mathbf{C}\,\mathbf{x}_o(k) + \mathbf{D}\,\mathbf{u}_o(k) \end{aligned} \tag{2} \]
where x_o and y_o are the estimates of the system state and the system output, and u_o is the input to the observer. But:

\[ \mathbf{u}_o(k) = Q\left( \mathbf{u}(k) + \mathbf{y}_{\mathrm{comp}}(k) \right) = Q\left( \mathbf{u}(k) + \mathbf{L}\,\mathbf{y}_{\mathrm{err}}(k) \right) = Q\left( \mathbf{u}(k) + \mathbf{L}\left( \mathbf{y}(k) - \mathbf{y}_o(k) \right) \right) \tag{3} \]

where Q is the quantization function and L is the Luenberger compensator matrix. (Note that the derivation of these is beyond the scope of this paper, but is well covered in most control texts.) Therefore, merging equations 2 and 3 gives:

\[ \mathbf{x}_o(k+1) = \mathbf{A}\,\mathbf{x}_o(k) + Q\,\mathbf{B}\,\mathbf{u}(k) + Q\,\mathbf{B}\,\mathbf{L}\left( \mathbf{y}(k) - \mathbf{y}_o(k) \right) \tag{4} \]
For the observer to provide an accurate representation of the system state, we need the observer state error to approach zero as k → ∞, i.e.:

\[ \begin{aligned} \mathbf{e}(k) &= \mathbf{x}_o(k) - \mathbf{x}(k) \\ \mathbf{e}(k+1) &= \mathbf{x}_o(k+1) - \mathbf{x}(k+1) \\ &= \mathbf{A}\,\mathbf{x}_o(k) + Q\,\mathbf{B}\,\mathbf{u}(k) + Q\,\mathbf{B}\,\mathbf{L}\left( \mathbf{y}(k) - \mathbf{y}_o(k) \right) - \mathbf{A}\,\mathbf{x}(k) - \mathbf{B}\,\mathbf{u}(k) \\ &\approx \mathbf{A}\left( \mathbf{x}_o(k) - \mathbf{x}(k) \right) - Q\,\mathbf{B}\,\mathbf{L}\left( \mathbf{y}_o(k) - \mathbf{y}(k) \right) \\ &= \mathbf{A}\left( \mathbf{x}_o(k) - \mathbf{x}(k) \right) - Q\,\mathbf{B}\,\mathbf{L}\,\mathbf{C}\left( \mathbf{x}_o(k) - \mathbf{x}(k) \right) \\ &= \left( \mathbf{A} - Q\,\mathbf{B}\,\mathbf{L}\,\mathbf{C} \right) \mathbf{e}(k) \end{aligned} \tag{5} \]

The observer will therefore converge when the eigenvalues of A − Q B L C all have magnitudes smaller than unity. However, in the case of typical environmental monitoring, we will be sensing a system that we are not controlling. We would therefore treat u(k) as a disturbance input which we cannot directly monitor. For example, if we are designing a Web interface to a system that monitors temperatures throughout a building, then someone opening a window may lead to the entry of cold air, and hence temperature fluctuations. Given that our only information is the sensor data, we can therefore consider how rapidly our observer can track these variations. In this case:

\[ \mathbf{u}_o(k) = Q\,\mathbf{L}\,\mathbf{y}_{\mathrm{err}}(k) = Q\,\mathbf{L}\left( \mathbf{y}(k) - \mathbf{y}_o(k) \right) \tag{6} \]
and therefore:

\[ \mathbf{x}_o(k+1) = \mathbf{A}\,\mathbf{x}_o(k) + Q\,\mathbf{B}\,\mathbf{L}\left( \mathbf{y}(k) - \mathbf{y}_o(k) \right) \tag{7} \]

and hence:

\[ \mathbf{e}(k+1) \approx \left( \mathbf{A} - Q\,\mathbf{B}\,\mathbf{L}\,\mathbf{C} \right) \mathbf{e}(k) - \mathbf{B}\,\mathbf{u}(k) \tag{8} \]

The stability criteria remain the same, but we can now determine the responsiveness of the system to disturbance rejection, and hence the ability of the observer to track variations. Appropriate selection of the model parameters, as well as the observer compensator and quantizer, will therefore allow us to select the minimal data stream rate between the sensor observer and the Web client observer that achieves the desired observer accuracy. Where applications require less accuracy, we can tune the compensator and quantizer to reduce the data rates. A minimal numerical sketch of this quantized observer update is given at the end of this section.

3.3 Design Considerations

Given the baseline architecture, we can now move to consideration of the issues this raises, and how it relates to the design of monitoring systems that incorporate predictors into the sensors. In particular:

1. How accurately can we model the system being monitored, and what are the consequences (in resource utilization) of inaccuracies in the model?
2. What are the consequences for sensor and client synchronization of typical network impacts on the data stream, i.e., network delays, packet drops, etc.?
3. What additional information needs to be passed between the sensor observer and the monitoring or control system in order to ensure that synchronization is retained in the event of network delays, packet drops and other forms of disturbances?
4. What additional processing burden does the implementation of the observer place on the sensor module, and how do these additional resources compare to those saved through possible reductions in the data stream which must be communicated?

In the latter part of this paper we will focus on a consideration of the last of these questions, since an initial demonstration of the potential resource savings is a crucial first step in justifying the approach. It is only worth deeper analysis of issues such as model robustness and error correction if the approach clearly shows merit in terms of reducing resource utilization in wireless sensors (or conversely, enabling accuracy improvements for a given resource usage level). Consideration of the first three of these design considerations is ongoing and will be reported in subsequent publications.
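The quantized observer update of equations (2) and (6) is sketched below for a scalar system; all numerical values (dynamics, compensator gain, quantizer threshold, disturbance level) are illustrative assumptions rather than the paper's parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar stand-in for the quantized observer of eqs. (2) and (6)
A, B, C, L = 0.98, 1.0, 1.0, 0.6   # A - B*L*C = 0.38, well inside the unit circle
q_step = 0.05                       # dead-zone quantizer threshold

x = x_o = 20.0                      # true and modelled temperature
sent = 0
for k in range(1000):
    x = A * x + B * 0.02 * rng.normal()       # unmeasured disturbance input u(k)
    y = C * x                                  # sensor measurement
    u_o = L * (y - C * x_o)                    # compensator output, eq. (6)
    u_o = u_o if abs(u_o) >= q_step else 0.0   # quantizer Q drops small corrections
    if u_o != 0.0:
        sent += 1       # only non-zero corrections are radioed to the aggregator
    x_o = A * x_o + B * u_o                    # observer update, eq. (2)

print(f"{sent} corrections transmitted over 1000 samples")
```

Since the data aggregator runs an identical model driven by the same quantized corrections, the two state estimates remain synchronized while most sampling instants produce no radio traffic at all.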
4 Performance Evaluation

In order to evaluate the approach, and in particular the potential ability to optimize the trade-off between the accuracy of the monitoring of the estimator-enabled sensor and the resources required for this monitoring, we have implemented (in MATLAB) a simulation of a simple thermal monitoring system and associated sensor configuration.
This initial implementation (which is much simpler than that which would typically exist in a real system, but nevertheless allows evaluation of the approach) comprised a simulation of a model of a two-room house, which had a specified thermal resistance between the rooms and between each room and the outside environment. Both rooms also had substantial thermal capacitance. The system state could therefore be modelled by the following variables:

\[ \mathbf{x}(k) = \left[ T_E(k) \;\; T_1(k) \;\; \dot{T}_1(k) \;\; T_2(k) \;\; \dot{T}_2(k) \right]^{\mathrm{T}} \tag{9} \]

where T_E(k) is the external temperature, T_1(k) and T_2(k) are the temperatures in the two rooms, and Ṫ_1(k) and Ṫ_2(k) are the corresponding rates of temperature change. Only two system values are actually measured directly by sensors, the external temperature T_E(k) and the temperature in one of the rooms T_2(k), so y(k) is given by:

\[ \mathbf{y}(k) = \left[ T_E(k) \;\; T_2(k) \right]^{\mathrm{T}} \tag{10} \]
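One plausible way to assemble the model matrices A and C for this state vector is shown below; the thermal resistances, capacitances and update interval are illustrative assumptions (the paper does not report its parameter values), and a simple explicit-Euler style recursion is used:

```python
import numpy as np

dt = 60.0                            # update interval in seconds (assumed)
R1E, R2E, R12 = 5e-3, 5e-3, 2e-3     # thermal resistances in K/W (assumed)
C1, C2 = 5e6, 5e6                    # room thermal capacitances in J/K (assumed)

a1E, a12 = 1.0 / (R1E * C1), 1.0 / (R12 * C1)
a2E, a21 = 1.0 / (R2E * C2), 1.0 / (R12 * C2)

# State x = [TE, T1, T1dot, T2, T2dot]^T as in eq. (9)
A = np.array([
    [1.0,  0.0,          0.0, 0.0,           0.0],  # TE modelled as slowly varying
    [0.0,  1.0,          dt,  0.0,           0.0],  # T1 integrates its rate
    [a1E, -(a1E + a12),  0.0, a12,           0.0],  # T1dot from the heat balance
    [0.0,  0.0,          0.0, 1.0,           dt ],  # T2 integrates its rate
    [a2E,  a21,          0.0, -(a2E + a21),  0.0],  # T2dot from the heat balance
])
C = np.array([[1.0, 0.0, 0.0, 0.0, 0.0],   # measured output TE, eq. (10)
              [0.0, 0.0, 0.0, 1.0, 0.0]])  # measured output T2
```

A compensator L for this (A, C) pair can then be sought, for example by discrete pole placement, so that the eigenvalue-magnitude condition of Section 3.2 is satisfied.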
4.1 Implementation

The aim was to allow monitoring of these temperatures whilst minimising the sensor data rates. In this implementation we have constructed the client observer using Javascript embedded within a Web page. The system state output from the client observer (i.e. estimates of the temperature values and rates of temperature change) is used to support rendering of, and interaction with, the sensor data. The Javascript also uses an AJAX-like approach to query the Web server for any new quantized observer compensator data which, when available, is used as input to the Web client observer in order to correct its modelled state. Whilst this example is relatively simplistic, it does demonstrate the general approach and allows evaluation of the performance.

4.2 Data Flow Improvements

The client-side implementation allows evaluation of the improved interactivity that is enabled by including the state observer directly within the web pages (such as zooming into sensor trend data or interpolating spatially between sensor values). The more substantial benefits however are potentially achieved through reduction in the sensor data rates. In order to evaluate this, we analysed the outputs of the simulated system under varying circumstances, and in particular considered the data transmissions associated with the quantized observer compensator data that are required to correct both the sensor observer and the client observer. In our simulated system we introduced various disturbances to the system (equivalent to sudden temperature variations that were not predicted by the simple model used), and looked at the level of data that was required to be transmitted by the sensor in order to retain synchronization between the estimator built into the sensor and the Web client estimator. We also considered the implications on these data flows of inaccuracies in the observer model.
Table 1. Typical data rates associated with different configurations of simulated temperature monitoring

Configuration: Data Rate (transmits/day)

Baseline system with no observers:
  No sensor-side or client-side observer; system transmits raw sensor data from both temperature sensors (rate = 1 sample/sec): 86,400

System with implemented observers:
  No disturbances, TE(k) stable, observer completely accurate (resynchronization transmissions occur every 10 minutes): 144 (0.2%)
  No disturbances, TE(k) stable, observer has minor inaccuracies that lead to drift: 462 (0.5%)
  No disturbances, TE(k) stable, observer has major inaccuracies that lead to drift: 2,712 (3.1%)
  External temperature TE(k) sinusoidally varying by 10 degC with 24 hour period, observer has minor inaccuracies that lead to drift: 3,842 (4.4%)
  Internal temperature T2(k) varying in a square wave by 10 degC with a 2 minute period: 7,563 (8.8%)
In our scenario, the base sampling rate was 1 sample (from each sensor) per second. A typical Zigbee-based single temperature sensor that immediately transmitted each sensor value would operate on the following 1 second cycle:

– Reading sensor + housekeeping: 10 mA, 0.4 mSec
– Transmitting data: 40 mA, 1 mSec
– Asleep: 0.001 mA, remainder of time

This gives an average power usage of 0.045 mA (or approximately 92 days from a 100 mAh battery). The data rate required with the state observer implemented depended upon a number of factors, but a typical scenario would reduce the transmitted data to an average of approximately one sample every 15 seconds (though, as observed above, this was highly dependent upon the volatility of the data and the accuracy of the model). The additional processing time to implement the observer in the sensor module will depend substantially upon the particular module used, though a quick prototype on a Jennic JN5139 Zigbee module indicated that the observer could be implemented in approximately 140 µSec per cycle. This gives:

– Reading sensor + housekeeping: 10 mA, 0.54 mSec
– Transmitting data: 40 mA, 1 mSec (every 15th sample)
– Asleep: 0.001 mA, remainder of time

This gives an average power usage of 0.009 mA (or approximately 460 days on a 100 mAh battery), a very significant improvement. In cases where the sensor data is very volatile, and the observer is not able to track, and hence there is absolutely no reduction in the sensor data being transmitted, the increased power usage due to the inclusion of the observer is minimal: a 3% increase from 0.045 mA to 0.046 mA.
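The duty-cycle arithmetic above can be checked directly; the short calculation below reproduces both the 0.045 mA / 92-day and the 0.009 mA / 460-day figures from the stated per-phase currents and timings:

```python
def avg_current_ma(t_awake_ms, t_tx_ms, tx_every=1, cycle_ms=1000.0,
                   i_awake=10.0, i_tx=40.0, i_sleep=0.001):
    """Average current (mA) over one sampling cycle, transmitting
    only every `tx_every`-th sample."""
    awake = t_awake_ms * i_awake
    tx = t_tx_ms * i_tx / tx_every
    sleep = (cycle_ms - t_awake_ms - t_tx_ms / tx_every) * i_sleep
    return (awake + tx + sleep) / cycle_ms

for label, i in [("raw, transmit every sample", avg_current_ma(0.4, 1.0)),
                 ("observer, transmit every 15th", avg_current_ma(0.54, 1.0, tx_every=15))]:
    print(f"{label}: {i:.4f} mA, about {100.0 / i / 24:.0f} days on 100 mAh")
```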
5 Conclusions

In this paper we have considered an architectural model that minimises resource usage in wireless sensors through the inclusion into the sensor of a state estimator that determines when the possible state estimation error will exceed a specified threshold, and uses this determination to trigger the next sensor sampling event. Whilst still preliminary, our initial results have demonstrated that significant gains may be possible in terms of minimizing resource utilization within the sensors (by limiting the data that has to be wirelessly transmitted, potentially at significant power cost) and potentially also improving the interactivity of the client-side experience (though this needs further consideration).

Further work on the development of this approach will consider the extent to which we can construct useful models of the dynamics of the physical systems being monitored by the sensors, and the implications of these models as the sensors become more distributed. Further work will also consider how reliably the multiple observers can remain synchronized in the presence of network delays, data loss, etc. Finally, we are also constructing a more substantial physical implementation of a sensor network which can be used as a test bed environment to validate our model simulations.

Acknowledgements. The authors wish to acknowledge the Centre for Real-Time Information Networks (CRIN) at the University of Technology, Sydney, in supporting this research project.
References

1. Delin, K.A.: The sensor web: A macro-instrument for coordinated sensing. Sensors 2, 270–285 (2002)
2. Baronti, P., Pillai, P., Chook, V.W.C., Chessa, S., Gotta, A., Hu, Y.F.: Wireless sensor networks: A survey on the state of the art and the 802.15.4 and ZigBee standards. Computer Communications 30, 1655–1695 (2007)
3. Li, Y., Thai, M.T., Wu, W.: Wireless Sensor Networks and Applications. Springer, Heidelberg (2008)
4. The ZigBee Alliance (2008), http://www.zigbee.org/en/index.asp
5. Anastasi, G., Conti, M., Francesco, M.D., Passarella, A.: Energy conservation in wireless sensor networks: A survey. Ad Hoc Networks 7, 537–568 (2009)
6. Yang, Z., Liao, S., Cheng, W.: Joint power control and rate adaptation in wireless sensor networks. Ad Hoc Networks 7, 401–410 (2009)
7. Nair, G.N., Fagnani, F., Zampieri, S., Evans, R.J.: Feedback control under data rate constraints: An overview. Proceedings of the IEEE 95, 108–137 (2007)
8. Estrada, T., Antsaklis, P.: Stability of model-based networked control systems with intermittent feedback. In: Proceedings of the 15th IFAC World Congress, Seoul, Korea (2008)
9. Guo, G., Liu, X.P.: Observability and controllability of systems with limited data rate. International Journal of Systems Science 40, 327–334 (2009)
10. Hristu-Varsakelis, D., Zhang, L.: LQG control of networked control systems with access constraints and delays. International Journal of Control 81, 1266–1280
11. Eker, J., Hagander, P., Arzen, K.: A feedback scheduler for real-time controller tasks. Control Engineering Practice 8, 1369–1378 (2000)
12. Sun, Y.L., Ghantasala, S., El-Farra, N.H.: Networked control of spatially distributed processes with sensor-controller communication constraints. In: 2009 American Control Conference, St. Louis, MO, USA, pp. 2489–2494. IEEE, Los Alamitos (2009)
13. Lowe, D., Murray, S.: Wireless sensor network optimisation through control-theoretic adaptation of sample rates. In: First International Conference on Sensor Network and Applications, SNA 2009, pp. 73–78 (2009)
14. Ding, F., Chen, T.: Modeling and identification for multirate systems. Acta Automatica Sinica 31, 105–122 (2005)
15. Sezer, M.E., Siljak, D.D.: Decentralized multirate control. IEEE Transactions on Automatic Control 35, 60–65 (1990)
16. de la Sen, M.: The reachability and observability of hybrid multirate sampling linear systems. Computers and Mathematics with Applications 31, 109–122 (1996)
17. de la Sen, M.: Algebraic properties and design of sampling rates in hybrid linear systems under multirate sampling. Acta Applicandae Mathematicae 72, 199–245 (2002)
18. Kawka, P.A., Alleyne, A.G.: Stability and feedback control of wireless networked systems. In: Proceedings of the 2005 American Control Conference, Portland, OR, USA, AACC, pp. 2953–2959 (2005)
19. Tabuada, P.: Event-triggered real-time scheduling of stabilizing control tasks. IEEE Transactions on Automatic Control 52, 1680–1685 (2007)
20. Velasco, M., Fuertes, J., Marti, P.: The self triggered task model for real-time control systems. In: 24th IEEE Real-Time Systems Symposium (2003)
21. Ellis, G.: Observers in Control Systems: A Practical Guide. Academic Press, London (2002)
22. Ishwar, P., Kumar, A., Ramchandran, K.: Distributed sampling for dense sensor networks: A "bit-conservation principle". In: Proceedings of the Annual Allerton Conference on Communication, Control and Computing, vol. 41, pp. 80–89. Springer, Heidelberg (2003)
23. Li, H., Fang, J.: Distributed adaptive quantization and estimation for wireless sensor networks. IEEE Signal Processing Letters 14, 669–672 (2007)
A Self-configuring Middleware Solution for Context Management Tudor Cioara, Ionut Anghel, and Ioan Salomie Computer Science Department, Technical University of Cluj-Napoca 15 Daicoviciu Street, 400020 Cluj-Napoca, Romania {Tudor.Cioara,Ionut.Anghel,Ioan.Salomie}@cs.utcluj.ro
Abstract. This paper proposes a self-configuring middleware that uses a context management infrastructure to gather context data from various context sources and generate/update a run-time context representation. The high demand for reducing the context representation management complexity while ensuring high tolerance and robustness led us to consider the self-configuring autonomic computing paradigm for the context acquisition and representation processes. The middleware defines three main layers: the acquisition layer, which captures the context data from real world contexts, the context model layer, which represents the context data in a programmatic manner, and the context model management infrastructure layer. The middleware continuously monitors the real context to detect context variations or conditions for updating the context representation. The proposed middleware was tested and validated within the premises of our Distributed Systems Research Laboratory smart environment. Keywords: Autonomic context management, Self-configuring, Middleware, Context model.
1 Introduction and Related Work

An important challenge in developing context aware systems is the dynamic nature of their execution environment, which makes the process of context information acquisition and representation extremely difficult to manage. During the context information acquisition process, the sources of context information (e.g. sensors) can fail or new context information sources may be identified. The context acquisition and representation processes need to be reliable and fault tolerant. For example, a context aware system cannot wait indefinitely for an answer from a temporarily unavailable context resource. On the other hand, the cost of not taking newly available context resources into consideration can often be very high. To provide efficient context information management, it is necessary to introduce some degree of autonomy for the context acquisition and representation processes. Another important challenge in context aware systems development is the task of assigning the context management responsibility. Current approaches put the system developers in charge of the context management process, making system
development extremely complicated. Our vision is that a third-party context management infrastructure must deal with the processes of context information acquisition and representation. This paper offers a solution for these challenges by introducing a self-configuring middleware that uses a context management infrastructure to gather context information from various context sources and generate a run-time context representation. The context management processes are therefore transparent to context aware system developers, allowing them to concentrate on designing and implementing the desired system functionality.

The research related to autonomic context management is focused on two major directions: (i) the development of models and tools for acquiring and formally representing the system execution context and (ii) the development of models and techniques for analyzing, processing and managing the context representation without human intervention. The most important research problems related to context information acquisition are to identify the features defining the system execution context [1] and to define models for capturing context feature specific data [2]. In the domain literature [3, 4], several system execution context features are considered, such as: spatiotemporal (time and location), ambient and facility (the system devices and their capabilities), user-system interaction, system internal events, system life cycle, etc. Regarding context representation, generic models aiming at accurately describing the system execution context in a programmatic manner are proposed. In [5], the authors propose the use of key-value models to represent the set of context features and their associated values. Markup and object oriented models [6, 7] are also used to structure and represent the context information. In [8], context features are represented as ontological concepts at design time and instantiated during run-time with sensor captured values. The main drawback of these approaches is the lack of semantic information encapsulated in the context representation, which makes inferring new context related knowledge difficult. Our paper overcomes these deficiencies by using the set-oriented and ontology based RAP context model [9] to represent the context information in a programmatic manner. The set representations of the RAP context model are used by the context management middleware to detect context changes, while the ontology representation is used to infer new context related information through reasoning algorithms.

In the context management research direction, the efforts are concentrated on developing models and techniques for: (i) keeping the context representation consistent with the real context and (ii) processing and analyzing the context representation for inferring new context related knowledge and evaluating the context changes. To ensure the consistency of the context representation, models and tools that allow for the automatic discovery, installation and configuration of new context information sources are proposed. In [10], the authors describe models for capturing and updating the context information based on the information type. Fournier [11] defines reusable components for updating context specific data. These components provide stable communication channels for capturing and controlling context specific data.
In [12], the development of context guided behavioral models, allowing the detection of only those context data variations that lead to behavior modification, is discussed. The main disadvantage of these approaches is the lack of
efficiency for the context management process, which is rather static and difficult to adapt to context changes. There is a high demand for reducing the context model management complexity while ensuring higher tolerance and robustness, leading to the consideration of the self-configuring autonomic computing paradigm [13]. The specification and representation of configuration, discovery and integration requirements of resource components have been identified as main research problems [14]. In [15], a model for self-configuring newly added components based on policies is proposed. The self-configuring policies are stored in a repository, which is queried when a new component is added. In [16], the authors present an autonomic context adaptive platform based on the closed-loop control principle. The novelty of this proposal consists in defining and using the concept of application-context description to represent the context information. This description is frequently updated and used for self-configuring and for taking adaptation decisions.

For the context processing and analyzing research direction, models and techniques that aim at determining and evaluating the context changes are proposed. These models are strongly correlated with the context representation model. In [17], fuzzy Petri nets are used to describe context changing rules. Data obtained from sensors, together with user profiles and requests, represent the input data for the reasoning mechanism. Context analyzing models based on reasoning and learning on the available context information are presented in [19, 20]. Context changing rules can be described using natural language [18] or first order logic and evaluated using reasoning engines.

The main contribution of our approach is the definition of a self-configuring middleware targeting efficient and autonomic context management. The fundamental element of this middleware is our RAP context model, which uses the concepts of context Resources, Actors and Policies to formally represent specific context data. The context model management infrastructure is implemented using BDI (Belief, Desire, Intention) agents [21] that generate and administer the context model artifacts at run time. The middleware self-configuring feature is implemented by monitoring and evaluating the environment changes in order to keep the context artifacts updated. The proposed middleware was tested and validated using our Distributed Systems Research Laboratory [22] as a smart space infrastructure. The rest of the paper is organized as follows: in Section 2, the middleware architecture is presented; Section 3 details the self-configuring enhanced middleware; Section 4 shows how the middleware is used to manage the context representation of an intelligent laboratory environment, while Section 5 concludes the paper and outlines future work.
2 The Middleware Architecture The middleware architecture defines three main layers (see Fig. 1): the acquisition layer that captures the context information from real world contexts, the context model layer that represents the context information in a machine interpretable manner and the context model management infrastructure layer. In the following sections, we detail each of the three middleware architectural layers.
Fig. 1. The middleware conceptual architecture
2.1 The Context Acquisition Layer

The context acquisition layer collects information from various context sources (sensors, intelligent devices, etc.) and makes it available to the context model layer (see Fig. 2.a) through a Context Acquisition API. To make sensor information visible to the upper layers in an independent way, we have used web services technology. Each sensor has an attached web service for exposing its values. The structure of the Context Acquisition API is presented in Fig. 2.b. The communication between a sensor attached web service and the Context Acquisition API is managed by the WSClient class. It provides methods that (i) build a SOAP request, (ii) send the request to the web service and (iii) wait for the sensor value response. From the middleware perspective, the context acquisition layer defines both push and pull mechanisms for sensor information retrieval. The push mechanism uses event listeners for gathering context data from sensors, while the pull mechanism uses a query-based approach that allows the context data to be provided on demand. The pull information retrieval mechanism is implemented in the SensorTools class by defining a method that queries a specific web service to obtain the sensor value. For the push mechanism, the Observer design pattern is used (a sketch of this pattern is given after Fig. 2). A SensorWSReader instance must be created first by specifying the URL of the web service and the time interval at which the sensor data will be updated. The SensorWSReader instance also contains a list of listeners that are notified when a sensor value has changed. The listeners are created by the middleware upper layers by extending the AbstractSensorListener class. To verify the sensor value, separate threads that continuously send requests to the web service are created using the WSReaderThread class.
Fig. 2. (a) The context data retrieval flow and (b) the Context Acquisition API class diagram
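To make the push mechanism concrete, the following minimal Java sketch shows how a reader in the style of SensorWSReader could poll a sensor web service and notify listeners derived from AbstractSensorListener; all method signatures and the fetchSensorValue() helper are illustrative assumptions, not the middleware's actual API.

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical listener base class, modeled on AbstractSensorListener.
abstract class AbstractSensorListener {
    abstract void sensorValueChanged(String wsUrl, double newValue);
}

// Minimal sketch of the push mechanism: a reader thread polls the sensor's
// web service and notifies the registered listeners when the value changes.
class SensorWSReader implements Runnable {
    private final String wsUrl;      // URL of the sensor's attached web service
    private final long intervalMs;   // update interval for the sensor data
    private final List<AbstractSensorListener> listeners = new CopyOnWriteArrayList<>();
    private double lastValue = Double.NaN;

    SensorWSReader(String wsUrl, long intervalMs) {
        this.wsUrl = wsUrl;
        this.intervalMs = intervalMs;
    }

    void addListener(AbstractSensorListener l) {
        listeners.add(l);
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            double value = fetchSensorValue(wsUrl);
            if (value != lastValue) {            // push only on change
                lastValue = value;
                for (AbstractSensorListener l : listeners) {
                    l.sensorValueChanged(wsUrl, value);
                }
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    private double fetchSensorValue(String url) {
        // Placeholder for the SOAP request/response cycle handled by WSClient.
        return 0.0;
    }
}

The pull mechanism of the SensorTools class then corresponds to calling such a fetch method directly, on demand.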
2.2 The Context Model Layer

To represent the real world context in a programmatic manner, the RAP context model is used. The RAP model represents the context information as a triple C = <R, A, P>, where R is the set of context resources that capture and/or process context information, A is the set of actors which interact with context resources in order to satisfy their needs, and P is the set of real world context related policies. The set of context resources R is split into two disjoint subsets: RE, the set of environment context resources, and RA, the set of actor context resources. The accurate representation of real world contexts is achieved by defining the following artifacts (see Fig. 3a): the specific context model CS, the specific context model instance CSI and the context-actor instance CIa^t. The specific context model CS = <RS, AS, PS> maps the context model onto real contexts and populates the context model sets with context specific actors, resources and policies. A specific context model instance CSI^t = <RSI^t, ASI^t, PSI^t> contains the set of context resources with which the middleware interacts, together with their values at a specific moment of time t. The context-actor instance CIa^t = <Ra^t, Aa^t, Pa^t> contains the set of context resources with which the actor can interact, together with their values at a specific moment of time t. A context-actor instance represents the projection of the specific context model instance onto a certain actor. The RAP model also offers an ontological representation of the context model artifacts, which allows for learning and reasoning processes in order to obtain context knowledge (Fig. 3b). The relationships between the R, A and P context model elements are represented in a general purpose context ontology core. The specific context model concepts are represented as sub-trees of the core ontology. A context situation or context instance is represented by the core ontology together with the specific context model sub-trees and their individuals at a specific moment of time.
Fig. 3. The RAP context model context representation: (a) set-based and (b) ontology-based
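To make the set-based representation tangible, here is a minimal Java sketch of the C = <R, A, P> structure; the class and field names are our own illustration, not the authors' implementation.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal set-based sketch of the RAP context model C = <R, A, P>.
class RapContextModel {
    final Set<String> resources = new HashSet<>(); // R = RE and RA together
    final Set<String> actors    = new HashSet<>(); // A
    final Set<String> policies  = new HashSet<>(); // P
}

// A specific context model instance CSI^t: the resources visible at time t,
// together with their captured values.
class SpecificContextInstance {
    final long timestamp;
    final Map<String, Double> resourceValues = new HashMap<>();

    SpecificContextInstance(long timestamp) {
        this.timestamp = timestamp;
    }
}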
The set-based and ontology-based context representations are equivalent and need to be kept synchronized. The set-based context representation is used to evaluate the conditions under which the context management agents should execute self-* processes in order to enforce the autonomic properties at the middleware level. The ontology-based model uses reasoning and learning processes for generating new context knowledge.

2.3 The Context Model Management Infrastructure Layer

The context model management infrastructure layer is based on four cooperative BDI-type agents: Context Model Administering Agents, Context Interpreting Agents, Request Processing Agents and Execution and Monitoring Agents. The Context Model Administering Agent (CMA agent) is the manager of the specific context model. Its main goal is to synchronize the RAP context model artifacts with the real context. This agent is also responsible for the negotiation processes that take place when an actor or resource joins the context. The Context Interpreting Agent (CI agent) semantically evaluates the information of a context instance and identifies the context instance "meaning". The Request Processing Agent (RP agent) processes the actor requests. This agent identifies and generates the action plans that must be executed for serving an incoming request. The RP agent uses the specific context model instance to identify/generate the adequate plan to be executed by the Execution and Monitoring Agent. The Execution and Monitoring Agent (EM agent) executes the action plans received from the RP agent using the available services. After mapping the action plans onto services, a plan orchestration is obtained and executed using transactional principles. The context management infrastructure agents are implemented using the Java Agent Development Framework platform [23]. When the middleware is deployed, the CMA agent is the first running agent. It instantiates the CI, RP and EM agents and sends them the context representation.
3 Enhancing the Middleware with Self-configuring Capabilities

The middleware context acquisition and representation processes need to be reliable and fault tolerant because the context resources can fail or new resources may be identified at run-time. Consequently, the context representation constructed by the middleware needs to accurately reflect the real context. To provide efficient, fault tolerant and robust context management, the middleware is enhanced with self-configuring properties. The self-configuring property is enforced by monitoring the real world context to detect context variations or conditions for which the context artifacts must be updated. We have identified three causes that might generate context variation: (1) adding or removing context elements (resources, actors or policies) to/from the context, (2) actors' mobility within the context and (3) changes of the resource property values (mainly due to changes in the sensors' captured values). In the following sections we discuss each of the context variation sources, aiming to determine: (i) the context variation degree and (ii) the triggering condition of the self-configuring process.

3.1 Context Variation Generated by Adding or Removing Context Elements

During the context data acquisition process, the sources of context data can fail or randomly leave/join the context. These changes generate a context variation that is detected by the context acquisition layer and sent to the CMA agent, which updates the RAP specific context model according to the new real context. Next, we evaluate the context variation degree generated by: (1) context resources ΔR, (2) context policies ΔP and (3) context actors ΔA against the values of the associated thresholds TR, TP and TA.

The context resource set variation ΔR is generated by adding or removing a context resource r (sensor or actuator) to/from the real context. ΔR is calculated using the set difference operation applied at two consecutive moments of time, t and t+1, where t+1 represents the moment when the resource r becomes available. The same reasoning pattern is applied when the resource r fails or becomes unavailable:

ΔR = {RE^(t+1) ∖ RE^t} ⋃ {RE^t ∖ RE^(t+1)} (1)

In formula (1), RE^(t+1) ∖ RE^t contains the set of context resources that become available at t+1, while RE^t ∖ RE^(t+1) contains the set of context resources that become unavailable at t+1. If Card(ΔR) ≥ TR, the RAP specific context model is updated by adding or removing the context resources contained in ΔR.

The variation of the policy set ΔP is generated by adding, removing or updating a context policy. Using the same assumptions and conclusions as for context resources, the policy set variation is calculated as:

ΔP = {P^(t+1) ∖ P^t} ⋃ {P^t ∖ P^(t+1)} (2)

The variation of the actor set ΔA is generated by the actors that enter or leave the context. Each context actor has an attached context resource set during its context interactions. An actor features a large number of actor-context interaction patterns, but only two of these patterns may determine the actor set variation: (i) the actor enters the context and (ii) the actor leaves the context. The actors' context variation is:
ΔA = {A^(t+1) ∖ A^t} ⋃ {A^t ∖ A^(t+1)} ⋃ {RA^t ∖ RA^(t+1)} ⋃ {RA^(t+1) ∖ RA^t} (3)
Overall, the RAP model context variation ΔRAP is given by the union of all context elements' variations, as shown below:

ΔRAP = ΔR ⋃ ΔA ⋃ ΔP, with Card(ΔRAP) = Card(ΔR) + Card(ΔA) + Card(ΔP) (4)

The CMA agent starts the execution of the self-configuring process and updates the context model when Card(ΔRAP) ≥ TSelf-Configuring, where the self-configuring threshold is defined as:

TSelf-Configuring = min(TR, TA, TP) (5)
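Formulas (1)-(5) reduce to symmetric set differences followed by a cardinality test. The following Java sketch, with hypothetical helper names, shows how the CMA agent's triggering condition could be evaluated:

import java.util.HashSet;
import java.util.Set;

class ContextVariation {
    // Symmetric set difference between two snapshots, as in formulas (1)-(3):
    // elements that appeared at t+1 plus elements that disappeared since t.
    static <T> Set<T> delta(Set<T> before, Set<T> after) {
        Set<T> added = new HashSet<>(after);
        added.removeAll(before);              // e.g. RE^(t+1) minus RE^t
        Set<T> removed = new HashSet<>(before);
        removed.removeAll(after);             // e.g. RE^t minus RE^(t+1)
        added.addAll(removed);
        return added;
    }

    // Formulas (4)-(5): trigger the self-configuring process when the overall
    // variation reaches the minimum of the per-set thresholds.
    static boolean selfConfiguringNeeded(Set<String> dR, Set<String> dA, Set<String> dP,
                                         int tR, int tA, int tP) {
        int cardDeltaRAP = dR.size() + dA.size() + dP.size();
        return cardDeltaRAP >= Math.min(tR, Math.min(tA, tP));
    }
}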
3.2 Context Variation Generated by Actors' Mobility

Due to their mobility, model actors change their context location and, implicitly, the set of context resources with which they may interact. The CMA agent identifies this variation, generates a new context-actor instance and updates the specific context model instance. To evaluate the context variation generated by actors' mobility we use the isotropic context space concept, as defined in [9]. A context space is isotropic if and only if the set of context resources remains invariant to the actors' movement. Usually, a context space is non-isotropic, but it can be split into a set of disjoint isotropic context sub-space volumes, called Context Granules (CG). At a given moment of time, an actor can be physically located in a single CG. As a result, the space isotropy variation ΔIZ is non-zero only when an actor moves between two CGs. The isotropy variation for a context actor is calculated as:

ΔIZa = {R_CG^(t+1) ∖ R_CG^t} ⋃ {R_CG^t ∖ R_CG^(t+1)} (6)

The CMA agent continuously monitors the actors' movement in the real context and periodically evaluates the space isotropy variation. If, for an actor, the space isotropy variation is non-empty, the self-configuring process executed by the CMA agent updates the context-actor instance, which represents the projection of the specific context model instance onto that actor:

CIa^(t+1) = <Ra^(t+1), Aa^(t+1), Pa^(t+1)> | Ra^(t+1) = R_CG^(t+1) (7)

The context variation ΔCAM generated by all actors' mobility in a context is:

ΔCAM = ⋃_{a ∈ A} ΔIZa (8)
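The isotropy test of formula (6) amounts to comparing the resource sets of the context granules an actor occupies at t and t+1; it reuses the symmetric set difference sketched above. The granule lookup table is an assumption made for illustration.

import java.util.Map;
import java.util.Set;

class ActorMobility {
    // R_CG: resources attached to each context granule (assumed lookup table).
    // The result is non-empty only when the actor crossed a granule boundary,
    // in which case the context-actor instance CIa must be regenerated.
    static Set<String> isotropyVariation(Map<String, Set<String>> granuleResources,
                                         String granuleAtT, String granuleAtT1) {
        Set<String> before = granuleResources.get(granuleAtT);
        Set<String> after = granuleResources.get(granuleAtT1);
        return ContextVariation.delta(before, after);
    }
}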
3.3 Context Variation Generated by Changes of Resource Property Values

A context resource is a physical or virtual entity that generates and/or processes context data. The resource properties K(r) specify the set of relevant context data that a resource can provide. For example, the set of context properties for a Hot&Humidity sensor resource is K(Hot&Humidity) = {Temperature, Humidity}. To evaluate the context variation generated by changes in the resource property values, we define a function Kval that associates each resource property with its value:

Kval(R) = {(k1, val1), …, (kn, valn)} | k1, …, kn ∈ K(R) (9)

If the values captured by the Hot&Humidity sensor at a given moment of time are 5 degrees Celsius for temperature and 60% for humidity, then Kval(Hot&HumiditySensor) = {(Temperature, 5), (Humidity, 60%)}. The CMA agent calculates the context variation generated by changes of the resource property values (ΔRPV) as presented in formula (10). As a result, a new specific context model instance is created when Card(ΔRPV) > 0:

ΔRPV = Kval(R^(t+1)) − Kval(R^t) = {(k1, val1^(t+1) − val1^t), …, (kn, valn^(t+1) − valn^t)} (10)
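Formula (10) compares the property-value map of a resource at two consecutive instants. A sketch of this comparison follows; the numeric value type mirrors the Hot&Humidity example, and the method name is our own.

import java.util.HashMap;
import java.util.Map;

class PropertyVariation {
    // Delta RPV per formula (10): per-property value differences between t and
    // t+1. A new specific context model instance is created when the result is
    // non-empty.
    static Map<String, Double> deltaRPV(Map<String, Double> kvalAtT,
                                        Map<String, Double> kvalAtT1) {
        Map<String, Double> delta = new HashMap<>();
        for (Map.Entry<String, Double> e : kvalAtT1.entrySet()) {
            double before = kvalAtT.getOrDefault(e.getKey(), 0.0);
            double diff = e.getValue() - before;
            if (diff != 0.0) {
                delta.put(e.getKey(), diff); // keep only changed properties
            }
        }
        return delta;
    }
}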
3.4 The Self-configuring Algorithm

The CMA agent executes the self-configuring algorithm in order to keep the context model artifacts synchronized with the real context (see Fig. 4). The CMA agent periodically evaluates the context changes. When a significant context variation is determined, the context model ontology artifacts are updated using the updateOntologyModel(owlModel, CS^(t+1), CIa^(t+1), CSI^(t+1)) method.
Fig. 4. The CMA agent self-configuring algorithm
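Since the algorithm of Fig. 4 survives here only as a caption, the following hedged Java sketch reconstructs its overall control loop from the description in Sections 3.1-3.3; every method name is a placeholder for the corresponding middleware operation.

// Hedged reconstruction of the CMA agent's periodic self-configuring loop.
class CmaAgentLoop implements Runnable {
    private final long tickerMs = 1000; // 1 s evaluation interval, as in Section 4

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            snapshotRealContext();                     // read the sets at t+1
            if (structuralVariationExceedsThreshold()  // formulas (1)-(5)
                    || actorMobilityDetected()         // formulas (6)-(8)
                    || resourceValuesChanged()) {      // formulas (9)-(10)
                updateSetArtifacts();                  // CS, CSI, CIa
                updateOntologyModel();                 // keep the ontology in sync
            }
            try {
                Thread.sleep(tickerMs);
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    // Placeholders standing in for the middleware's actual operations.
    private void snapshotRealContext() {}
    private boolean structuralVariationExceedsThreshold() { return false; }
    private boolean actorMobilityDetected() { return false; }
    private boolean resourceValuesChanged() { return false; }
    private void updateSetArtifacts() {}
    private void updateOntologyModel() {}
}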
4 Case Study and Results

For the case study we have considered a real context represented by our Distributed Systems Research Laboratory (DSRL). In the laboratory the students are marked and identified using RFID tags and readers. The students interact with the smart laboratory by means of wireless-capable PDAs on which different laboratory-provided services are executed (for example: homework submission, lesson hints, printing and information retrieval services). A sensor network captures information regarding the students' location as well as ambient information such as temperature or humidity. In the laboratory, a set of policies like "the temperature should be 22 degrees Celsius" or "the loudness upper limit is 80 dB" should be respected.
Fig. 5. The DSRL infrastructure
The DSRL infrastructure contains a set of sensors through which context data is collected: two Hot&Humidity sensors that capture the air humidity and the temperature, four Orient sensors placed in the upper four corners of the laboratory that measure the orientation on a single axis, one Loud sensor that detects the sound loudness level and one Far Reach sensor that measures distances (see Fig. 5). The sensors are connected through a Wi-microSystem wireless network from Infusion Systems [24]. The middleware is deployed on an IBM Blade physical server. The IBM Blade technology was chosen because its maintenance software offers autonomic features such as self-configuring of its hardware resources. The context related data captured by sensors is collected through the Wi-microSystem, which has an I-CubeX WimicroDig analogue-to-digital encoder as its main part. The WimicroDig is a configurable hardware device that encodes up to 8 analogue sensor signals into MIDI messages, which are transmitted wirelessly in real time, over Bluetooth, to the server for analysis and/or control purposes. The Bluetooth receiver located on the server is mapped as a Virtual Serial Port (VSP). In order to read/write to/from the VSP we used two sensor manufacturer applications: (i) BlueMIDI, which converts the Bluetooth signals received on the VSP into MIDI messages, and (ii) MIDI Yoke, which creates pairs of input/output MIDI ports and associates the output MIDI port with the VSP. The MIDI message information is extracted using the Microsoft Windows multimedia API operations and published through web services (see Fig. 6).
Fig. 6. The context information data path from sensors to their attached web services
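Since the data path of Fig. 6 ends in standard MIDI messages, a receiver along the following lines could extract the sensor readings on the server. It uses the standard javax.sound.midi API; the channel-to-sensor mapping and the 7-bit value scaling are illustrative assumptions.

import javax.sound.midi.MidiMessage;
import javax.sound.midi.Receiver;
import javax.sound.midi.ShortMessage;

// Sketch: decode MIDI messages arriving from the virtual serial port and
// publish the extracted sensor values.
class MidiSensorReceiver implements Receiver {
    @Override
    public void send(MidiMessage message, long timeStamp) {
        if (message instanceof ShortMessage) {
            ShortMessage sm = (ShortMessage) message;
            int channel = sm.getChannel(); // one WimicroDig input per channel (assumed)
            int value = sm.getData2();     // 7-bit sensor value, 0..127 (assumed)
            publishToWebService(channel, value);
        }
    }

    @Override
    public void close() {}

    private void publishToWebService(int channel, int value) {
        // Placeholder: expose the value through the sensor's attached web service.
    }
}

A Transmitter obtained from the MIDI Yoke input port would then be wired to this receiver via transmitter.setReceiver(new MidiSensorReceiver()).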
The CMA agent periodically evaluates the context information changes at a predefined time interval (we use 1-second time intervals for this purpose). If significant variations are detected, the context model artifacts are created or updated using the self-configuring algorithm presented in Section 3.4. When the middleware is deployed and starts its execution (t=0), no context model artifacts are constructed yet, i.e. the R, A and P sets of the RAP context model are empty. After one second (t=1), when two students, John and Mary, enter the lab, the CMA agent receives the updated context information from the Context Acquisition Layer and calculates the context element variations ∆R, ∆P and ∆A as presented in Fig. 7a. By default the self-configuring thresholds are set to the value 1: TSelf-Conf = TR = TA = TP = 1. As a result of evaluating the context variation at t=1, the CMA agent executes the self-configuring algorithm, which adds new concepts and updates the context model artifacts ontology. The newly added concepts (see Fig. 7a) originate from the context element set variations ∆R, ∆P and ∆A. To test the middleware self-configuring capabilities we have considered that after 60 seconds the following context changes have occurred: (i) student John leaves the laboratory, (ii) OrientationSensor1 and OrientationSensor4 are disabled and (iii) LoudSensor is disabled. The CMA agent calculates the variation in the new context at t=61 (Fig. 7b), executes the self-configuring algorithm and updates the context ontology accordingly.
Fig. 7. DSRL context variation at: (a) t=1 and (b) t=61
To test the scalability of our self-configuring algorithm we have implemented an application that simulates the behavior of a large number of sensors that randomly generate context data at fixed time periods. The results show that the self-configuring algorithm implemented by the CMA agent can generate, synchronize and update the context model artifacts in a reasonable time for up to 20 sensors that change their values simultaneously (Fig. 8). However, it is possible that sensor values change much faster than the CMA agent is capable of synchronizing the context representation, thus requiring a higher ticker interval value.
Fig. 8. The self-configuring algorithm scalability results
Fig. 9. The self-configuring algorithm CPU and memory overhead with 100 sensors at (a) t2 = 2000 ms and (b) t1 = 100 ms
To assess the overhead of the proposed self-configuring algorithm, a simulation editor was developed in which complex test cases can be described by generating sets of (simulation time, sensor value) associations. We evaluated the memory and processor loading when executing the self-configuring algorithm to update the specific context model instance due to sensor value changes. Using the simulator, we tested our middleware with 100 sensors changing their values every 100 ms in the first test case and every 2000 ms in the second test case. Even though the sensor value change rate is much higher in the first test case than in the second, the memory and processor loading did not show major differences (see Fig. 9).
5 Conclusions

This paper addresses the problem of managing the context information acquisition and representation processes in a reliable and fault tolerant manner by using a self-configuring middleware. The middleware defines an agent-based context management infrastructure to gather context data from sensors and generate a RAP model context representation at run-time. The self-configuring property is enforced at the middleware level by monitoring the context in order to detect context variations or conditions for which the context model artifacts must be created/updated. The evaluation results are promising, showing that the self-configuring algorithm can manage, in a reasonable time, up to 20 sensors which change their values simultaneously at a high sampling rate. We have also shown that the memory and processor overhead induced by executing the self-configuring algorithm is negligible.
References 1. Wang, K.: Context awareness and adaptation in mobile learning. In: Proc. of the 2nd IEEE Int. Workshop on Wireless and Mobile Tech. in Education, pp. 154–158 (2004) ISBN: 0-7695-1989-X 2. Yu, Z., Zhou, X., Park, J.H.: iMuseum: A scalable context-aware intelligent museum system. Computer Communications 31(18), 4376–4382 (2008) 3. Pareschi, L.: Composition and Generalization of Context Data for Privacy Preservation. In: 6th IEEE Int. Conf. on Perv. Comp. and Comm., pp. 429–433 (2008) ISBN: 0-7695-3113-X 4. Grossniklauss, M.: Context Aware Data Management, 1st edn. VDM Verlag (2007) ISBN: 978-3-8364-2938-2 5. Anderson, K., Hansen, F.: Templates and queries in contextual hypermedia. In: Proc. of the 17th Conf. on Hypertext and Hypermedia, pp. 99–110 (2006) ISBN: 1-59593-417-0 6. Raz, D., Juhola, A.T.: Fast and Efficient Context-Aware Services. Wiley Series on Comm. Networking & Distributed Systems, pp. 5–25 (2006) ISBN-13: 978-0470016688 7. Hofer, T.: Context-awareness on mobile devices - the hydrogen approach. In: Proc. of the 36th Hawaii Int. Conf. on System Sciences, USA, p. 292 (2003) ISBN: 0-7695-1874-5 8. Cafezeiro, I., Hermann, E.: Ontology and Context. In: Proc. of the 6th Annual IEEE Int. Conf. on Pervasive Comp. and Comm., pp. 417–422 (2008) ISBN: 978-0-7695-3113-7 9. Salomie, I., Cioara, T., Anghel, I.: RAP - A Basic Context Awareness Model. In: Proc. of the 4th IEEE Int. Conf. on Intelligent Comp. Comm. and Proc., Cluj-Napoca, Romania, pp. 315–318 (2008) ISBN: 978-1-4244-2673-7
10. Bellavista, P.: Mobile Computing Middleware for Location and Context-Aware Internet Data Services. ACM Trans. on Internet Tech., 356–380 (2006) ISSN: 1533-5399 11. Fournier, D., Mokhtar, S.B.: Towards Ad hoc Contextual Services for Pervasive Computing. In: IEEE Middleware for S.O.C., pp. 36–41 (2006) ISBN: 1-59593-425-1 12. Spanoudakis, G., Mahbub, K.: A Platform for Context Aware Runtime Web Service Discovery. In: IEEE Int. Conf. on Web Services, USA, pp. 233–240 (2007) 13. Calinescu, R.: Model-Driven Autonomic Architecture. In: Proc. of the Fourth International Conference on Autonomic Computing, p. 9 (2007) ISBN: 0-7695-2779-5 14. Patouni, E., Alonistioti, N.: A Framework for the Deployment of Self-Managing and Self-Configuring Components in Autonomic Environments. In: Proc. of the Int. Symp. on a World of Wireless, Mobile and Multimedia, pp. 484–489 (2006) ISBN: 0-7695-2593-8 15. Bahati, R.: Using Policies to Drive Autonomic Management. In: Proc. of the Int. Symp. on a World of Wireless, Mobile and Multimedia, pp. 475–479 (2006) ISBN: 0-7695-2593-8 16. Cremene, M., Riveill, M.: Autonomic Adaptation based on Service-Context Adequacy Determination. In: Electronic Notes in Theoretical Comp. Sc., pp. 35–50. Elsevier, Amsterdam (2007) ISSN: 1571-0661 17. Huaifeng, Q.: Integrating Context Aware with Sensornet. In: Proc. of the 1st Int. Conf. on Semantics, Knowledge, Grid, Beijing, China (2006) ISBN: 0-7695-2534-2 18. Bernstein, A.: Querying the Semantic Web with Ginseng: A Guided Input Natural Language Search Engine. In: 15th Workshop on Inf. Tech. and Syst., pp. 112–126 (2005) 19. Sirin, E., Parsia, B.: Pellet: A practical OWL-DL reasoner. Web Semantics: Science, Services and Agents on the World Wide Web 5(2), 51–53. Elsevier, Amsterdam (2007) 20. Amoui, M., Salehie, M.: Adaptive Action Selection in Autonomic Software Using Reinforcement Learning. In: Proc. of the 4th Int. Conf. on Aut. and Autonomous Sys., pp. 175–181 (2008) ISBN: 0-7695-3093-1 21. Thangarajah, J., Padgham, L.: Representation and reasoning for goals in BDI agents. In: Proc. of the 25th Australasian Conf. on Comp. Sci., pp. 259–265 (2002) ISSN: 1445-1336 22. Distributed Systems Research Laboratory, http://dsrl.coned.utcluj.ro 23. JADE - Java Agent DEvelopment Framework, http://jade.tilab.com 24. Infusion Systems Ltd., http://www.infusionsystems.com
Device Whispering: An Approach for Directory-Less WLAN Positioning Karl-Heinz Krempels, Sebastian Patzak, Janno von Stülpnagel, and Christoph Terwelp RWTH Aachen University, Informatik 4, Intelligent Distributed Systems Group Ahornstr. 55, D-52146 Aachen, Germany {krempels,patzak,stuelpnagel,terwelp}@nets.rwth-aachen.de
Abstract. A widely used positioning system for mobile devices is GPS. It is based on the transit times of signals from satellites, so it provides accurate positioning in outdoor scenarios. In indoor scenarios, however, it is not usable because the signals are absorbed by buildings. To provide positioning services indoors, several approaches exist which use, for example, GSM (Global System for Mobile Communication) or WLAN (Wireless Local Area Network) signals. GSM-based systems not using special hardware are limited to identifying the GSM cell the mobile device is in and associating it, through a directory service, with a geographical position. The fingerprinting approach is based on WLAN, using signal strength vectors of multiple access points to approximate a position. But this method requires a huge number of measurements and only works reliably in laboratory environments. The approach discussed in this paper uses the WLAN radio of a mobile device to identify the nearest access point. The geographical position of the mobile device is then calculated from the geo tags broadcasted by the access points. Thus the approach provides at least the same accuracy as directory-based positioning systems, but does not incur the maintenance and communication costs of a directory. The evaluation shows that the accuracy of this approach is limited by the abilities of hardware and drivers on today's mobile devices.
1 Introduction

Wireless networks are present at more and more places, and the vision of ubiquitous and pervasive computing is becoming true. The current position of a mobile device is important for navigation and guiding applications as well as for the determination of a mobile user's context. Since outdoor positioning approaches are based on GPS, which does not work indoors due to the limited reception of GPS signals inside buildings, there is a need for indoor positioning systems. This paper discusses a directory-less approach for WLAN-based indoor positioning which can be used to realize indoor navigation and guidance systems at airports or railway stations, e.g. to guide a passenger to his gate or to the next restaurant. This approach does not need any additional server infrastructure or additional transmitter antennas, because it uses the already existing WLAN infrastructure.
The paper is organized as follows: In Section 2 we introduce directory-less WLAN positioning and the whispering approach and discuss their merits and flaws. Section 3 describes the experimental setup and environment, and Section 4 evaluates the measurements. Section 5 discusses a guiding application scenario for an airport. Finally, in Section 6 we summarize the results of our work and address open problems in this area of research.
2 Directory-Less Indoor WLAN Positioning

Approaches for indoor positioning based on WLAN signals are discussed in [1] [2] [3] [4] [5] [6]; accuracy comparisons are given in [7] [8] [9]. Directory-less indoor positioning based on geo-tags is discussed in [10] [11]. In this approach the geographical coordinates of the access points are provided directly by the access points themselves. A mobile device with an embedded WLAN receiver analyses the signals from the adjacent access points, combines them with the received information on their positions and calculates its own geographical position.

2.1 Service Set Identifier

Service Set Identifiers are defined by the IEEE 802.11-1999 [12] standard. For the approach discussed here, only the SSID is useful:

– The SSID indicates the name of the WLAN cell that is broadcasted in beacons. The length of the SSID information field is 0 to 32 octets.
– Extended Service Set Identifier (ESSID): Multiple APs with the same SSID are combined into a larger cell on layer 2. This is called an ESSID.
– The Basic Service Set Identifier (BSSID) is a 48-bit field of the same format as an IEEE 802.11 MAC address. It uniquely identifies a Basic Service Set (BSS). Normally, the value is set to the MAC address of the AP or a broadcast MAC address in an infrastructure BSS.

To supply the geographical coordinates of the wireless access points to the mobile device, the following two interaction modes can be used:

2.2 Pull Model

Every wireless access point (AP) broadcasts the same SSID, like 'geo'. Then, the client associates with the AP and obtains an IP address over the Dynamic Host Configuration Protocol (DHCP). Finally, the client queries a positioning service provided by the access point to retrieve the GPS coordinate of the AP.

2.3 Push Model

Every AP broadcasts a unique SSID that encodes the GPS coordinates of the AP. The client only needs to scan for specific geo SSIDs and selects the SSID with the highest signal strength. It is not necessary for the client to associate with the base station, because the client can retrieve all information from the already received SSID broadcast.
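In the push model the whole coordinate pair must fit into the 32-octet SSID field. One possible encoding, purely illustrative since the paper does not fix a format, is a prefixed decimal pair:

import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Illustrative geo-SSID codec for the push model; the "geo:" prefix and the
// decimal format are assumptions, not a format defined by this paper.
class GeoSsid {
    static String encode(double lat, double lon) {
        String ssid = String.format(Locale.US, "geo:%.5f,%.5f", lat, lon);
        if (ssid.getBytes(StandardCharsets.US_ASCII).length > 32) {
            throw new IllegalArgumentException("SSID exceeds 32 octets");
        }
        return ssid; // e.g. "geo:50.86588,7.14274" fits easily
    }

    static double[] decode(String ssid) {
        if (!ssid.startsWith("geo:")) {
            return null; // not a geo-tagged access point
        }
        String[] parts = ssid.substring(4).split(",");
        return new double[] { Double.parseDouble(parts[0]),
                              Double.parseDouble(parts[1]) };
    }
}

Five decimal places correspond to roughly one meter of resolution, which is well below the positioning errors reported later in this paper.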
Fig. 1. WLAN SSID-Positioning
2.4 SSID WLAN Positioning

The position of the mobile device could be estimated with the help of interpolation calculus, by using only the coordinates of the m strongest signals from the n signals received by the device, or by a combination of both approaches, meaning first selecting the m strongest signals and then interpolating the coordinates related to these signals. However, the result will be an area or even a space. Determining the position of the mobile device only with the help of the strength of the received signals is highly influenced by the changing environment and the changing sending power of the considered access points. Thus, we cannot assume that the strongest signal is received from the closest access point. In Figure 1 the signal received from AP4 could be stronger than the signal received from AP5.

2.5 SSID WLAN Whispering

In Fig. 1 the mobile device MD receives the signals and SSIDs from the access points AP1, AP2, ..., AP5. To select the closest geographical vicinity of the mobile device,
Fig. 2. Radio Whispering to Detect the Close Vicinity
we introduce the whispering approach. Since a mobile device is able to control its WLAN radio interface, it can also control its sending power. The characteristics of its receiving antenna are not influenced thereby, so the list of access points received by the mobile device does not change. WLAN radio whispering [11] consists in reducing the sending power of a mobile device to a minimal value (less than 1 mW) and querying a subset of the visible access points for management information (Fig. 2). Due to the reduced sending power of the mobile device, only the access points that are geographically very close to it will receive its query and answer it. Thus, the effect of whispering is a filter that is robust against signal multi-path propagation and against power oscillations or automated power adaptation of access points. An idealistic abstraction of the whispering effect is shown in Fig. 3. In the WLAN communication range of the mobile device MD the access points AP1, AP2, ..., AP5 are visible (Fig. 1). AP4 and AP5 receive the information query sent with very low power by the mobile device (Fig. 2) due to their close vicinity to it. Access point AP5 answers the query (Fig. 3) and the mobile device can extract its position from the SSID of AP5.
Fig. 3. Answer of the Close Vicinity
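A sketch of one whispering cycle on Linux follows, reconstructed from the description above and reusing the GeoSsid codec sketched earlier: it lowers the transmit power with the wireless-tools command line, probes the tagged networks, and decodes the SSID of whichever access point still answers. The shell invocation and the scan step are assumptions; as Section 3 notes, driver support for low power levels varies.

import java.io.IOException;

// Hedged sketch of one whispering cycle: lower TX power, probe, restore.
class Whisperer {
    private final String iface;

    Whisperer(String iface) {
        this.iface = iface;
    }

    double[] whisperPosition() throws IOException, InterruptedException {
        setTxPowerDbm(0); // 0 dBm = 1 mW, the minimum most drivers accept
        try {
            String ssid = probeNearestGeoSsid(); // only very close APs answer
            return ssid == null ? null : GeoSsid.decode(ssid);
        } finally {
            setTxPowerDbm(20); // restore the normal sending power
        }
    }

    private void setTxPowerDbm(int dbm) throws IOException, InterruptedException {
        // wireless-tools syntax; requires root privileges and driver support.
        new ProcessBuilder("iwconfig", iface, "txpower", Integer.toString(dbm))
                .inheritIO().start().waitFor();
    }

    private String probeNearestGeoSsid() {
        // Placeholder: send probe requests to the tagged SSIDs and return the
        // SSID of the first access point that responds at this power level.
        return null;
    }
}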
3 Experimental Setup and Environment Description

Device whispering requires control of the signal sending power at hardware level and a corresponding driver that provides this functionality to application software. Since only a few network drivers met this requirement, Linux was chosen as the development platform, providing a suitable open source driver. Fig. 4 shows the architectural diagram of the implemented software. The developed application collects the positioning information from the access points by requesting the operating system driver to scan the WLAN network. The wireless network interface is thus instructed, with the help of firmware functions invoked by the wireless interface driver, to perform the required network scan, then to reduce the sending power to 1 mW, and to send out probe request packets to access points with tagged SSIDs.

All measurements for the evaluation of the device whispering approach were taken at Cologne International Airport. The airport building consists of two terminals, Terminal 1 and Terminal 2, with three floors. In both terminals the third floor is the departure level. The arrival level is situated at floor 2 in Terminal 1 and at floor 1 in Terminal 2. In Terminal 1 the three floors have a common open side close to the elevators and the stairs. Thus, the WLAN signal strength varies very much in this area,
Fig. 4. Software Architecture

Fig. 5. Position of the tagged access points in the airport building
Fig. 6. Position of the measurement points 1-19 in the airport building
due to the high dependency of the signal quality on the position and direction of the mobile device's WLAN radio antenna. For the validation of the approach we tagged a set of 13 WLAN access points operated by the computer center of the airport with geo tags. Five access points are installed in Terminal 1 at the departure level, six access points in Terminal 2 at the departure level, and two access points in Terminal 2 at the arrival level. Figure 5 shows a sketch of the airport building and the position of the tagged access points. Figure 6 shows a sketch of the airport building and the position of the measurement points 1-19. At each of these measurement points the real position was determined manually to obtain a reference value. Then the position was measured by triangulation with the access points, once using the plain SSID approach and once using the new whispering approach. This was done twice to get an idea of the stability of the measurements.
Fig. 7. Deviation [m] of the measured positions at measurement points 1-19, for two measurement runs each: (a) without device whispering and (b) with device whispering
The deviation of the measured positions from the actual positions is shown in Figure 7(a) without whispering and in Figure 7(b) with whispering.
4 Evaluation

Looking at the figures and at the median of the measurement errors, which is 45.5 m without whispering and 32.5 m with whispering, we see positioning quality improved by about 29% ((45.5 − 32.5)/45.5 ≈ 0.29). As we took the measurements only twice, our ability to make assumptions about the stability of the positioning method is limited. But looking at the mean values of the differences between the two measurement runs, which are 23.53 m without whispering and 14.37 m with whispering, the stability seems to have improved, too.
5 Application Scenario

Many indoor navigation and guidance applications suffer from high positioning costs and low positioning accuracy. The business cases of a subset of these systems are based on low-cost or free positioning and do not require high-accuracy positioning. Thus, it seems that even with a low positioning accuracy (less than twenty-five meters) navigation and positioning applications could be deployed and used. In Fig. 8 a guidance scenario is shown that could be implemented with the help of the positioning approach discussed in this paper. The scenario is based on a planned trip consisting of a travel chain. Each element of the chain has an expected duration and a travel mode (e.g. walking, flying, traveling by bus, etc.). For most travel modes the operating vehicle (e.g. bus, train) and its route are known in advance. Thus, the positioning accuracy could be improved if a determined (rough) position is mapped to the well-known trajectories of a planned route at the respective time, e.g. corridors, stairs, etc.; a sketch of this mapping is given after Fig. 8. The scenario in Fig. 8 shows a travel chain element with the travel mode walking. A traveler is guided with the help of discrete position points, mapped to his planned route, to the right gate, e.g. to catch his plane. A prototype system for indoor navigation based on imprecise positions using our whispering approach is discussed in [13].
Fig. 8. Application Scenario
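Mapping a rough position onto the known trajectory of a planned route, as proposed above, amounts to projecting the measured point onto the nearest segment of a polyline. A generic sketch, assuming local planar coordinates in meters and our own route representation:

// Snap a rough measured position onto the nearest point of a planned route
// polyline; coordinates are {x, y} pairs in a local planar frame (meters).
class RouteMatcher {
    static double[] snapToRoute(double[] p, double[][] route) {
        double best = Double.MAX_VALUE;
        double[] snapped = route[0].clone();
        for (int i = 0; i + 1 < route.length; i++) {
            double[] q = projectOntoSegment(p, route[i], route[i + 1]);
            double d = Math.hypot(p[0] - q[0], p[1] - q[1]);
            if (d < best) {
                best = d;
                snapped = q;
            }
        }
        return snapped;
    }

    private static double[] projectOntoSegment(double[] p, double[] a, double[] b) {
        double dx = b[0] - a[0], dy = b[1] - a[1];
        double len2 = dx * dx + dy * dy;
        double t = (len2 == 0) ? 0
                : ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2;
        t = Math.max(0, Math.min(1, t)); // clamp to the segment
        return new double[] { a[0] + t * dx, a[1] + t * dy };
    }
}

Even with deviations in the 30 m range, snapping to the corridors and stairways of the planned route keeps the displayed position plausible for guidance purposes.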
Fig. 9. Wifi Channel Signal Overlapping: (a) small overlapping, many AP (channel 10) and AP eduroam (channel 11); (b) large overlapping, many AP (channel 10) and many AP (channel 11)
6 Conclusions

In this paper we presented the whispering technique to improve the positioning accuracy of directory-less indoor WLAN positioning systems. The advantage of this approach is that there is no need to establish an Internet connection, and it is applicable both indoors and outdoors. The positioning accuracy is determined by the number of access points which
can be seen by a mobile device, their radio range and how finely the sending power of the device's own WLAN radio can be adjusted. We showed at the Köln-Bonn airport that the approach gives better results than WLAN positioning without whispering. A limiting factor is the hardware's and driver's capability to reduce the sending power. Current systems are able to reduce the sending power to a minimum of 1 mW. But to improve the positioning results further, an adjustable sending power between 10 µW and 1000 µW is required. So, one future step is to modify the WLAN hardware to support these low sending power levels. Other directions are to combine this approach with other positioning systems, for example GPS, into a hybrid positioning system, and to extend the 802.11 standard to support context information for access points.
7 Outlook

Further investigations have shown that on most operating systems for mobile devices it is very difficult to control the sending signal level of the WLAN interface, since this functionality is either not supported by the used hardware or not implemented by the device driver [14]. A workaround for these limitations may be possible through the over-speaking effect between adjacent channels used in WLAN communication. In Fig. 9(a) and Fig. 9(b) several WLAN signals are shown. E.g., due to the over-speak effect the mops network sending on channel 10 is visible on channels 8 to 12, but with lower signal strength. We are analyzing this effect and will determine its usability for our approach.

Acknowledgements. This research was funded in part by the DFG Cluster of Excellence on Ultra-high Speed Information and Communication (UMIC), German Research Foundation grant DFG EXC 89.
References 1. Jan, R.H., Lee, Y.R.: An Indoor Geolocation System for Wireless LANs. In: Proceedings of the 2003 International Conference on Parallel Processing Workshops, October 6-9, pp. 29–34 (2003) 2. Wallbaum, M., Spaniol, O.: Indoor Positioning Using Wireless Local Area Networks. In: IEEE John Vincent Atanasoff 2006 International Symposium on Modern Computing, JVA 2006, pp. 17–26 (October 2006) 3. Wallbaum, M.: Tracking of Moving Wireless LAN Terminals. In: 15th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC 2004, September 5-8, vol. 2, pp. 1455–1459 (2004) 4. Yeung, W.M., Ng, J.K.: Wireless LAN Positioning based on Received Signal Strength from Mobile Device and Access Points. In: 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2007, August 21-24, pp. 131–137 (2007) 5. Kaemarungsi, K.: Distribution of WLAN received signal strength indication for indoor location determination, p. 6 (January 2006) 6. Zhao, Y., Zhou, H., Li, M., Kong, R.: Implementation of indoor positioning system based on location fingerprinting in wireless networks, pp. 1–4 (October 2008) 7. Lin, T.N., Lin, P.C.: Performance comparison of indoor positioning techniques based on location fingerprinting in wireless networks, vol. 2, pp. 1569–1574 (June 2005)
8. Wallbaum, M., Diepolder, S.: Benchmarking Wireless LAN Location Systems. In: The Second IEEE International Workshop on Mobile Commerce and Services, WMCS 2005, July 19, pp. 42–51 (2005) 9. Liu, H., Darabi, H., Banerjee, P., Liu, J.: Survey of wireless indoor positioning techniques and systems. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 37(6), 1067–1080 (2007) 10. Krempels, K.H., Krebs, M.: Directory-less WLAN Indoor Positioning. In: Proceedings of the IEEE International Symposium on Consumer Electronics 2008, Vilamoura, Portugal (2008) 11. Krempels, K.H., Krebs, M.: Improving Directory-Less WLAN Positioning by Device Whispering. In: Proceedings of the International Conference on Wireless Information Networks and Systems, Porto, Portugal (2008) 12. LAN MAN Standards Committee: ANSI/IEEE Std 802.11, 1999 Edition, Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications. IEEE Standard (1999) 13. Chowaw-Liebman, O., Krempels, K.H., von Stülpnagel, J., Terwelp, C.: Indoor navigation using approximate positions. In: [15], pp. 168–171 14. Krempels, K.H., Patzak, S., von Stülpnagel, J., Terwelp, C.: Evaluation of directory-less WLAN positioning by device whispering. In: [15], pp. 139–144 15. Obaidat, M.S., Caldeirinha, R.F.S. (eds.): Proceedings of the International Conference on Wireless Information Networks and Systems, WINSYS 2009, Milan, Italy, July 7-10. INSTICC Press (2009); WINSYS is part of ICETE - The International Joint Conference on e-Business and Telecommunications
Author Index
Ahrens, Andreas 307
Anghel, Ionut 332
Bailey, Daniel V. 186
Balocco, Raffaello 94
Benavente-Peces, César 307
Brainard, John 186
Cioara, Tudor 332
Cortimiglia, Marcelo 126
Creţ, Octavian 159
Davidsson, Paul 139
de Ávila, Paulo Muniz 278
De Capitani di Vimercati, Sabrina 20
De Decker, Bart 237
Del Vecchio, Pasquale 82
Eiselt, Andreas 291
Foresti, Sara 20
Fukushima, Kazuhide 174
Funk, Burkhardt 71
Ghezzi, Antonio 94
Gupta, Gaurav 253
Györfi, Tamas 159
Holmgren, Johan 139
Iatrou, Michael G. 199
Jacobsson, Andreas 139
Kankanhalli, Mohan 253
Kiyomoto, Shinsaku 174
Kong, Xiaoying 320
Krempels, Karl-Heinz 346
Lowe, David 320
Marca, David A. 37
Martin, Keith M. 174
Murray, Steve 320
Ndou, Valentina 82
Nsabimana, François Xavier 266
Paar, Christof 186
Pasupathinathan, Vijayakrishnan 224
Patzak, Sebastian 346
Persson, Jan A. 139
Pieprzyk, Josef 224, 253
Pironti, Marco 110
Pisano, Paola 110
Rangone, Andrea 94, 126
Remondino, Marco 110
Renga, Filippo 126
Rohde, Sebastian 186
RoyChowdhury, Dipanwita 212
Saha, Mounita 212
Salomie, Ioan 332
Samarati, Pierangela 20
Schatter, Günther 291
Schina, Laura 82
Serpanos, Dimitrios N. 199
Shishkov, Blagovest 3
Subbaraman, Vignesh 266
Suciu, Alin 159
Terwelp, Christoph 346
Tudoran, Radu 159
Verslype, Kristof 237
von Stülpnagel, Janno 346
Vossen, Gottfried 53
Voyiatzis, Artemios G. 199
Wang, Huaxiong 224
Zölzer, Udo 266
Zorzo, Sérgio Donizetti 278