Communications in Computer and Information Science
167
Hocine Cherifi Jasni Mohamad Zain Eyas El-Qawasmeh (Eds.)
Digital Information and Communication Technology and Its Applications International Conference, DICTAP 2011 Dijon, France, June 21-23, 2011 Proceedings, Part II
Volume Editors Hocine Cherifi LE2I, UMR, CNRS 5158, Faculté des Sciences Mirande 9 , avenue Alain Savary, 21078 Dijon, France E-mail: hocine.cherifi@u-bourgogne.fr Jasni Mohamad Zain Universiti Malaysia Pahang Faculty of Computer Systems and Software Engineering Lebuhraya Tun Razak, 26300 Gambang, Kuantan, Pahang, Malaysia E-mail: [email protected] Eyas El-Qawasmeh King Saud University Faculty of Computer and Information Science Information Systems Department Riyadh 11543, Saudi Arabia E-mail: [email protected]
ISSN 1865-0929 e-ISSN 1865-0937 e-ISBN 978-3-642-22027-2 ISBN 978-3-642-22026-5 DOI 10.1007/978-3-642-22027-2 Springer Heidelberg Dordrecht London New York Library of Congress Control Number: 2011930189 CR Subject Classification (1998): H, C.2, I.4, D.2
© Springer-Verlag Berlin Heidelberg 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Preface
On behalf of the Program Committee, we welcome you to the proceedings of the International Conference on Digital Information and Communication Technology and Its Applications (DICTAP 2011), held at the Université de Bourgogne. The DICTAP 2011 conference explored new advances in digital information and data communications technologies. It brought together researchers from various areas of computer and information sciences and data communications who address both theoretical and applied aspects of digital communications and wireless technology. We do hope that the discussions and exchange of ideas will contribute to advancements in the technology in the near future. The conference received 330 papers, out of which 130 were accepted, resulting in an acceptance rate of 39%. These accepted papers are authored by researchers from 34 countries and cover many significant areas of digital information and data communications. Each paper was evaluated by a minimum of two reviewers. We express our thanks to the Université de Bourgogne in Dijon, Springer, the authors and the organizers of the conference.
Proceedings Chairs

DICTAP 2011
General Chair
Hocine Cherifi                  Université de Bourgogne, France

Program Chairs
Yoshiro Imai                    Kagawa University, Japan
Renata Wachowiak-Smolikova      Nipissing University, Canada
Norozzila Sulaiman              University of Malaysia Pahang, Malaysia

Program Co-chairs
Noraziah Ahmad                  University of Malaysia Pahang, Malaysia
Jan Platos                      VSB-Technical University of Ostrava, Czech Republic
Eyas El-Qawasmeh                King Saud University, Saudi Arabia

Publicity Chairs
Ezendu Ariwa                    London Metropolitan University, UK
Maytham Safar                   Kuwait University, Kuwait
Zuqing Zhu                      University of Science and Technology of China, China
Message from the Chairs
The International Conference on Digital Information and Communication Technology and Its Applications (DICTAP 2011)—co-sponsored by Springer—was organized and hosted by the Université de Bourgogne in Dijon, France, during June 21–23, 2011 in association with the Society of Digital Information and Wireless Communications. DICTAP 2011 was planned as a major event in the computer and information sciences and served as a forum for scientists and engineers to meet and present their latest research results, ideas, and papers in the diverse areas of data communications, networks, mobile communications, and information technology. The conference included guest lectures and 128 research papers for presentation in the technical session. This meeting was a great opportunity to exchange knowledge and experience for all the participants who joined us from around the world to discuss new ideas in the areas of data communications and its applications. We are grateful to the Université de Bourgogne in Dijon for hosting this conference. We use this occasion to express our thanks to the Technical Committee and to all the external reviewers. We are grateful to Springer for co-sponsoring the event. Finally, we would like to thank all the participants and sponsors.

Hocine Cherifi
Yoshiro Imai
Renata Wachowiak-Smolikova
Norozzila Sulaiman
Table of Contents – Part II
Software Engineering

Ontology Development as a Software Engineering Procedure . . . . . . . . . . . Ladislav Burita
1
An IPv6 Test Bed for Mobility Support in Next Generation Wireless Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wan H. Hassan and Ahmed M. Al-Imam
9
Test Management Traceability Model to Support Software Testing Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Azri Azmi and Suhaimi Ibrahim
21
Software Maintenance Testing Approaches to Support Test Case Changes – A Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Othman Mohd Yusop and Suhaimi Ibrahim
33
Measuring Understandability of Aspect-Oriented Code . . . . . . . . . . . . . . . . Mathupayas Thongmak and Pornsiri Muenchaisri
43
Aspect Oriented and Component Based Model Driven Architecture . . . . Rachit Mohan Garg, Deepak Dahiya, Ankit Tyagi, Pranav Hundoo, and Raghvi Behl
55
Engineering the Development Process for User Interfaces: Toward Improving Usability of Mobile Applications . . . . . . . . . . . Reyes Juárez-Ramírez, Guillermo Licea, Itzel Barriba, Víctor Izquierdo, and Alfonso Angeles
65
Lifelong Automated Testing: A Collaborative Framework for Checking Industrial Products along Their Lifecycle . . . . . . . . . . . Thomas Tamisier and Fernand Feltz
80
A Comparative Analysis and Simulation of Semicompetitive Neural Networks in Detecting the Functional Gastrointestinal Disorder . . . . . . . . . . . Yasaman Zandi Mehran, Mona Nafari, Nazanin Zandi Mehran, and Alireza Nafari
87
Networking and Mobiles

Individuals Identification Using Tooth Structure . . . . . . . . . . . Charbel Fares, Mireille Feghali, and Emilie Mouchantaf
100
Modeling Interpolated Distance Error for Clutter Based Location Estimation of Wireless Nodes . . . . . . . . . . . Muhammad Alam, Mazliham Mohd Su’ud, Patrice Boursier, and Shahrulniza Musa
115
Mobile and Wireless Access in Video Surveillance System . . . . . . . . . . . Aleksandra Karimaa
131
A Novel Event-Driven QoS-Aware Connection Setup Management Scheme for Optical Networks . . . . . . . . . . . Wissam Fawaz, Abdelghani Sinno, Raghed Bardan, and Maurice Khabbaz
139
Semantic Data Caching Strategies for Location Dependent Data in Mobile Environments . . . . . . . . . . . N. Ilayaraja, F. Mary Magdalene Jane, I. Thomson, C. Vikram Narayan, R. Nadarajan, and Maytham Safar
151
Distributed and Parallel Processing

PDMRTS: Multiprocessor Real-Time Scheduling Considering Process Distribution in Data Stream Management System . . . . . . . . . . . Mehdi Alemi, Ali A. Safaei, Mostafa S. Haghjoo, and Fatemeh Abdi
166
PFGN : A Hybrid Multiprocessor Real-Time Scheduling Algorithm for Data Stream Management Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Ali A. Safaei, Mostafa S. Haghjoo, and Fatemeh Abdi
180
Detecting Cycles in Graphs Using Parallel Capabilities of GPU . . . . . . . . Fahad Mahdi, Maytham Safar, and Khaled Mahdi
193
Survey of MPI Implementations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mehnaz Hafeez, Sajjad Asghar, Usman Ahmad Malik, Adeel ur Rehman, and Naveed Riaz
206
Mathematical Model for Distributed Heap Memory Load Balancing . . . . Sami Serhan, Imad Salah, Heba Saadeh, and Hamed Abdel-Haq
221
Social Networks

Effectiveness of Using Integrated Algorithm in Preserving Privacy of Social Network Sites Users . . . . . . . . . . . Sanaz Kavianpour, Zuraini Ismail, and Amirhossein Mohtasebi
237
A Web Data Exchange System for Cooperative Work in CAE . . . . . . . . . . . Min-hwan Ok and Hyun-seung Jung
250
Online Social Media in a Disaster Event: Network and Public Participation . . . . . . . . . . . Shu-Fen Tseng, Wei-Chu Chen, and Chien-Liang Chi
256
Qualitative Comparison of Community Detection Algorithms . . . . . . . . . . . Günce Keziban Orman, Vincent Labatut, and Hocine Cherifi
265
Ontology

Proof-of-Concept Design of an Ontology-Based Computing Curricula Management System . . . . . . . . . . . Adelina Tang and Amanullah Abdur Rahman
280
Automatic Approaches to Ontology Engineering . . . . . . . . . . . . . . . . . . . . . Petr Do
293
The Heritage Trust Model . . . . . . . . . . . Roman Špánek and Pavel Tyl
307
Towards Semantically Filtering Web Services Repository . . . . . . . . . . . . . . Thair Khdour
321
Ontology for Home Energy Management Domain . . . . . . . . . . . . . . . . . . . . Nazaraf Shah, Kuo-Ming Chao, Tomasz Zlamaniec, and Adriana Matei
336
Adding Semantic Extension to Wikis for Enhancing Cultural Heritage Applications . . . . . . . . . . . Éric Leclercq and Marinette Savonnet
347
K-CRIO: An Ontology for Organizations Involved in Product Design . . . . . . . . . . . Yishuai Lin, Vincent Hilaire, Nicolas Gaud, and Abderrafiaa Koukam
361
Improving Web Query Processing with Integrating Intelligent Algorithm and XML for Heterogeneous Database Access . . . . . . . . . . . Mohd Kamir Yusof, Ahmad Faisal Amri Abidin, Sufian Mat Deris, and Surayati Usop
376
Semantics and Knowledge Management to Improve Construction of Customer Railway Systems . . . . . . . . . . . Diana Penciuc, Marie-Hélène Abel, Didier Van Den Abeele, and Adeline Leblanc
391
Algorithms

Blending Planes and Canal Surfaces Using Dupin Cyclides . . . . . . . . . . . Lucie Druoton, Lionel Garnier, Remi Langevin, Herve Marcellier, and Remy Besnard
406
Multimedia

Avoiding Zigzag Quality Switching in Real Content Adaptive Video Streaming . . . . . . . . . . . Wassim Ramadan, Eugen Dedu, and Julien Bourgeois
421
A Novel Attention-Based Keyframe Detection Method . . . . . . . . . . . Huang-Chia Shih
436
E-Learning

Evaluating the Effectiveness of Using the Internet for Knowledge Acquisition and Students’ Knowledge Retention . . . . . . . . . . . Zakaria Saleh, Alaa Abu Baker, and Ahmad Mashhour
448
Computer-Based Assessment of Implicit Attitudes . . . . . . . . . . . Ali Reza Rezaei
456
First Electronic Examination for Mathematics and Sciences Held in Poland - Exercises and Evaluation System . . . . . . . . . . . Jacek Stańdo
463
Interactive Environments and Emergent Technologies for eLearning

How Can ICT Effectively Support Educational Processes? Mathematical Emergency E-Services – Case Study . . . . . . . . . . . Jacek Stańdo and Krzysztof Kisiel
473
A Feasibility Study of Learning Assessment Using Student’s Notes in an On-Line Learning Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Minoru Nakayama, Kouichi Mutsuura, and Hiroh Yamamoto
483
IBS: Intrusion Block System a General Security Module for elearning Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Alessio Conti, Andrea Sterbini, and Marco Temperini
494
Interpretation of Questionnaire Survey Results in Comparison with Usage Analysis in E-Learning System for Healthcare . . . . . . . . . . . Martin Cápay, Zoltán Balogh, Mária Boledovičová, and Miroslava Mesárošová
504
Dynamic Calculation of Concept Difficulty Based on Choquet Fuzzy Integral and the Learner Model . . . . . . . . . . . Ahmad Kardan and Roya Hosseini
517
Signal Processing

Simple Block-Diagonalization Based Suboptimal Method for MIMO Multiuser Downlink Transmission . . . . . . . . . . . Tetsuki Taniguchi, Yoshio Karasawa, and Nobuo Nakajima
531
Novel Cooperation Strategies for Free-Space Optical Communication Systems in the Absence and Presence of Feedback . . . . . . . . . . . . . . . . . . . . Chadi Abou-Rjeily and Serj Haddad
543
Hybrid HMM/ANN System Using Fuzzy Clustering for Speech and Medical Pattern Recognition . . . . . . . . . . . Lilia Lazli, Abdennasser Chebira, Mohamed Tayeb Laskri, and Kurosh Madani
557
Mobile-Embedded Smart Guide for the Blind . . . . . . . . . . . Danyia AbdulRasool and Susan Sabra
571
Information and Data Management

MMET: A Migration Metadata Extraction Tool for Long-Term Preservation Systems . . . . . . . . . . . Feng Luan and Mads Nygård
579
RDF2SPIN: Mapping Semantic Graphs to SPIN Model Checker . . . . . . . . . . . Mahdi Gueffaz, Sylvain Rampacek, and Christophe Nicolle
591
C-Lash: A Cache System for Optimizing NAND Flash Memory Performance and Lifetime . . . . . . . . . . . Jalil Boukhobza and Pierre Olivier
599
Resource Discovery for Supporting Ubiquitous Collaborative Work . . . . . . . . . . . Kimberly García, Sonia Mendoza, Dominique Decouchant, José Rodríguez, and Alfredo Piero Mateos Papis
614
Comparison between Data Mining Algorithms Implementation . . . . . . . . . Yas A. Alsultanny
629
Role of ICT in Reduction of Poverty in Developing Countries: Botswana as an Evidence in SADC Region . . . . . . . . . . . Tiroyamodimo M. Mogotlhwane, Mohammad Talib, and Malebogo Mokwena
642
Personnel Selection for Manned Diving Operations . . . . . . . . . . . Tamer Ozyigit, Salih Murat Egi, Salih Aydin, and Nevzat Tunc
654
Digital Inclusion of the Elderly: An Ethnographic Pilot-Research in Romania . . . . . . . . . . . Corina Cimpoieru
663
CRAS: A Model for Information Representation in a Multidisciplinary Knowledge Sharing Environment for Its Reuse . . . . . . . . . . . . . . . . . . . . . . . Yueh Hui Kao, Alain Lepage, and Charles Robert
678
A Comparative Study on Different License Plate Recognition Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hadi Sharifi and Asadollah Shahbahrami
686
A Tag-Like, Linked Navigation Approach for Retrieval and Discovery of Desktop Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gontlafetse Mosweunyane, Leslie Carr, and Nicholas Gibbins
692
Heuristic Approach to Solve Feature Selection Problem . . . . . . . . . . . . . . . Rana Forsati, Alireza Moayedikia, and Bahareh Safarkhani
707
Context-Aware Systems: A Case Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bachir Chihani, Emmanuel Bertin, Fabrice Jeanne, and Noel Crespi
718
Handling Ambiguous Entities in Human-Robot Instructions Using Linguistic and Context Knowledge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shaidah Jusoh and Hejab M. Alfawareh
733
Cognitive Design of User Interface: Incorporating Cognitive Styles into Format, Structure and Representation Dimensions . . . . . . . . . . . . . . . . . . . Natrah Abdullah, Wan Adilah Wan Adnan, and Nor Laila Md Noor
743
Communications in Computer and Information Science: The Impact of an Online Environment Reading Comprehension: A Case of Algerian EFL Students . . . . . . . . . . . Samir Zidat and Mahieddine Djoudi
759

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
775
Ontology Development as a Software Engineering Procedure Ladislav Burita University of Defence, Faculty of Military Technology, CIS Department, Kounicova street 65, Brno, Czech Republic Tomas Bata University in Zlin, Faculty of Management and Economics, EIIS Department, Mostni street 5139, Zlin, Czech Republic [email protected]
Abstract. The article is based on the research project “Network Enabled Capability Knowledge Management of the Army Czech Republic”, which is the first project dealing with Knowledge Management in the Czech military. The theoretical basis of the project is Topic Maps. The key issue for the project solution is designing and creating a suitable ontology. The paper describes the Software Engineering procedure from the selection of an Upper Ontology through the Core Ontology design to the processing of the Domain Ontology. Ontology definitions are stated and their meaning is explained. The ontology design was evaluated. Keywords: Upper, Core, and Domain Ontology, Knowledge Management System, MENTAL, Network Enabled Capability, Software Engineering, methodology.
1 Introduction

The research project “Knowledge Management of the Army Czech Republic (ACR) Network Enabled Capability (NEC) - MENTAL” will result in a Knowledge Management System (KMS). The aims of MENTAL are [1]:
• to carry out the analysis of knowledge approaches, ontologies and ontology languages, and to assess their suitability for use in the Army of the Czech Republic (ACR);
• to evaluate the state of the security solution; to formalize the ACR NEC strategy and develop an encyclopedia of NEC terms;
• to propose a methodology for knowledge systems development in the ACR;
• to develop Command and Control Information System (C2 IS) and NEC security ontologies;
• to elaborate a knowledge system proposal in the ACR NEC administration and to implement it.
The accomplishment of the project is assured by successful cooperation of researchers from the University of Defence with the TOVEK and AION CS companies [6].
The most important activity concerning the knowledge-based system is the design and development of an appropriate ontology, which constitutes a formal framework for storing the knowledge, creating links between knowledge and ontology concepts, and establishing connections to concepts and pieces of knowledge of vital documents, which are connected with the area in focus. Ontology itself, without using the known definitions, can be considered an abstract model of a part of reality - the domain for which the knowledge-based system is created. One part of the project is the validation of the methodology for ontology creation. The first of the underlying methodological postulates for designing an ontology is a logical procedure from an Upper Ontology through a Core Ontology to a Domain Ontology.
2 Upper Ontology Selection

“In information science, the Upper Ontology (Top-Level Ontology or Foundation Ontology) is an ontology which describes very general concepts that are the same across all knowledge domains. The most important function of an Upper Ontology is to support a very broad semantic interoperability between a large number of ontologies accessible under this Upper Ontology [4, 8, 9, 12].” There are ontologies that are competing to be used as the foundation for a standard, for example the IFF Foundation Ontology [7], OpenCYC [10], and SUMO [11].
Fig. 1. The Upper Ontology for MENTAL
We have selected the Upper Ontology with the theme of Competition Intelligence for the MENTAL project. The general concepts here are PERSON, ORGANIZATION, ACTIVITY, RULE, SOURCE, THING, THEME, EVENT and PLACE; see the classes in Fig. 1. This ontology corresponds with the military area of the Ministry of Defense (MoD) and the NEC theme, for which the knowledge base is being prepared. Therefore, with respect to the Upper Ontology, the MoD Core Ontology, the NEC Domain Ontology, and other Domain Ontologies built with a similar approach can be linked in all subsequent projects.
3 Core Ontology Specification

“In philosophy, a Core Ontology is a basic and minimal ontology consisting only of the minimal concepts required to understand the other concepts. It must be based on a core glossary in a human language, so that humans can comprehend the concepts and distinctions made. Such a Core Ontology is a key pre-requisite to a more complete foundation ontology, or to a more general philosophical sense of ontology. Core Ontology is a concept that is used in information science as well [2, 4, 9, 12]”.
Fig. 2. The MoD Core Ontology
The Core Ontology has a central position in the interoperability area. It is a central ontology for systems that integrate many ideas from various points of view on the same problem. Another view of the Core Ontology corresponds with the work of representatives from various communities aiming to harmonize their knowledge perspectives [3]. The Core Ontology solution is also connected with the integration of dictionaries from many fields of the same theme, for example in medicine, in an attempt to find the core part common to all fields [5]. In the MENTAL project the Core Ontology is a general model of the military at the MoD; see the ontology classes and associations in Fig. 2. This ontology should integrate all ideas concerning knowledge management systems in the ACR.
4 Domain Ontology Development and Evaluation

“A Domain Ontology (or Domain-Specific Ontology) models a specific domain, or a part of the world. It represents particular meanings of terms as they apply to that domain. Since Domain Ontologies represent concepts in very specific and often eclectic ways, they are often incompatible. As systems that rely on Domain Ontologies expand, they often need to merge Domain Ontologies into a more general representation. This presents a challenge to the ontology designer. Different ontologies in the same domain can also arise due to different perceptions of the domain based on cultural background, education, ideology, or because a different representation language was chosen [4, 9, 12]”. In our case the compatibility of the Domain Ontologies in the Czech military area is ensured by applying the methodology steps of Upper Ontology selection and Core Ontology specification. The methodology for creating the NEC Domain Ontology next includes a preparatory stage, in which a set of documents that sufficiently describes the given domain (the document base) is collected. At this stage, the project team members were trained in the fundamentals of ontology and tried to create a working version of their own ontology. Furthermore, it is necessary to clarify the basic concepts of the subject area in focus, for instance by means of an analysis of the document base that characterizes the selected domain. Basic concepts of the domain are arranged, e.g., into a taxonomy; see Fig. 3. The taxonomy construction helps in understanding the selected domain and is the best route to a proper ontology.
Fig. 3. The NEC Taxonomy – Basic concepts
Taxonomy is a set of concepts, where concepts of higher levels can be further developed by concepts of lower levels. The depth of the hierarchical structure is set by the goals for which the ontology is created. Our research team used the TOVEK analytic products for concept selection and evaluation, especially the Tovek Tools Analyst Pack (Index Manager, Tovek Agent, InfoRating, Query Editor, and Tovek Harvester). The document base should be put into a unified form, which assumes the selection of documents by language and format; the documents are then indexed by Index Manager. The Tovek Agent is a concept-querying tool. InfoRating is a tool for context analysis and Tovek Harvester is a tool for content analysis. The Query Editor is a tool for the development of complex queries.
Fig. 4. Testing of concepts in document base
Fig. 5. The NEC Domain Ontology
Table 1. The NEC Domain Ontology associations

Number  Concept1       Relationship       Concept2
1       PERSON         is an author of    DOCUMENT
2       PERSON         is a member of     ORGANIZATION
3       PERSON         is a part of       SYSTEM
4       PERSON         is working on      PROJECT
5       DOCUMENT       is input/output    PROJECT
6       DOCUMENT       describes          RULE
7       DOCUMENT       is input/output    PROCESS
8       DOCUMENT       describes          SYSTEM
9       ORGANIZATION   is a part of       SYSTEM
10      ORGANIZATION   keeps              CAPABILITY
11      ORGANIZATION   includes           ORGANIZATION
12      ORGANIZATION   is working on      PROJECT
...     ...            ...                ...
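To make the class–association model concrete, the segment of Table 1 can be encoded as a small set of (concept, relationship, concept) triples. The sketch below is illustrative only and is not part of the MENTAL implementation; the struct name, the function and the subset of rows shown are our own assumptions.

#include <stdio.h>
#include <string.h>

/* Illustrative encoding of a Table 1 segment: each association is a
   (concept, relationship, concept) triple of the NEC Domain Ontology. */
typedef struct {
    const char *concept1;
    const char *relationship;
    const char *concept2;
} Association;

static const Association nec_associations[] = {
    { "PERSON",       "is an author of", "DOCUMENT"     },
    { "PERSON",       "is a member of",  "ORGANIZATION" },
    { "PERSON",       "is a part of",    "SYSTEM"       },
    { "PERSON",       "is working on",   "PROJECT"      },
    { "DOCUMENT",     "is input/output", "PROJECT"      },
    { "DOCUMENT",     "describes",       "RULE"         },
    { "ORGANIZATION", "is working on",   "PROJECT"      },
};

/* Print every association in which the given class takes part. */
static void print_associations(const char *concept)
{
    size_t n = sizeof(nec_associations) / sizeof(nec_associations[0]);
    for (size_t i = 0; i < n; i++) {
        const Association *a = &nec_associations[i];
        if (strcmp(a->concept1, concept) == 0 || strcmp(a->concept2, concept) == 0)
            printf("%s %s %s\n", a->concept1, a->relationship, a->concept2);
    }
}

int main(void)
{
    print_associations("DOCUMENT");
    return 0;
}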
Table 2. The NEC Domain Ontology characteristics of classes
The set of concept candidates is tested against the document base in the evaluation phase. The main goal is to verify the occurrence of the concepts in the document base; see Fig. 4. As shown there, all concepts (classes) have an appropriate frequency in the 281 documents of the document base and could be accepted. A design of an ontology scheme follows; see Fig. 5. A prerequisite to an appropriate ontology design is a good understanding of the subject area (domain) of the future knowledge-based system. This is an iterative “top-down and bottom-up” procedure which leads to continuous improvement of the original proposal. The main criterion for the quality of the ontology will be an effective and user-friendly knowledge application.
The ontology design contains a set of ontology classes (concepts): area of interest, project, process, document, person, organization, system, equipment and armament, theme, place, capability, stage, rule, procedure and event. Each class has its own definition and attributes that characterize it. The associations between classes, which are only numbered in the ontology diagram (Fig. 5), are described in Tab. 1 (segment only) and their characteristics are shown in Tab. 2 (segment only).
The last stage before using the ontology in the KMS is its implementation in the chosen ontology environment. The MENTAL KMS will be implemented in SW ATOM (Aion TOpic Map) by the company AION CS.
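The frequency check described above can, in principle, be reproduced without the commercial tools; the sketch below simply counts how many documents of a plain-text document base mention each candidate concept. It only illustrates the evaluation step — the concept list, the command-line interface and the literal string matching are assumptions, and the actual project used the Tovek tools.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole text file into a NUL-terminated buffer (NULL on failure). */
static char *read_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    fseek(f, 0, SEEK_SET);
    char *buf = malloc((size_t)len + 1);
    if (buf) { fread(buf, 1, (size_t)len, f); buf[len] = '\0'; }
    fclose(f);
    return buf;
}

int main(int argc, char **argv)
{
    /* Hypothetical candidate concepts; the real list comes from the taxonomy. */
    const char *concepts[] = { "NEC", "project", "capability", "organization" };
    size_t nc = sizeof(concepts) / sizeof(concepts[0]);
    size_t hits[sizeof(concepts) / sizeof(concepts[0])] = { 0 };

    for (int d = 1; d < argc; d++) {            /* documents given on the command line */
        char *text = read_file(argv[d]);
        if (!text) continue;
        for (size_t c = 0; c < nc; c++)
            if (strstr(text, concepts[c]))      /* concept appears in this document */
                hits[c]++;
        free(text);
    }
    for (size_t c = 0; c < nc; c++)
        printf("%-14s appears in %zu of %d documents\n", concepts[c], hits[c], argc - 1);
    return 0;
}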
5 Conclusion

The project MENTAL is still in progress. A significant part of the research task has been finished and the NEC ontology was developed. During the project, new technology and tools were used. This is the first project dealing with the Knowledge Management theme at the MoD of the Czech Republic. The paper summarized procedures and application tools for ontology design and development. A useful methodology for ontology design and development was demonstrated. The methodology steps correspond with common SW engineering procedures: abstraction, phasing, structuralization, and top-down and bottom-up approaches. The theoretical background of the document-based knowledge management system (KMS) MENTAL is Topic Maps. The key points for the development of such a system are: in-depth analysis of significant documents; proper design of the relevant ontology classes and associations that represent the surveyed domain; and then execution of several cycles of ontology evaluation using different methods and tools. The SW technology provided by the companies TOVEK and AION CS was very helpful for the research team. The aim of the ontology development is to prepare the significant part of the knowledge system MENTAL, not to overcome semantic interoperability issues.
Acknowledgement

The article was prepared as a component of the Research Project [1]. It introduces some outcomes of the solutions in the knowledge management field. Our results are part of the education process in the Information Systems Design course at the University of Defence in Brno and the Information Management course at Tomas Bata University in Zlin.
References
1. Documentation of Research Defence Project the ACR NEC Knowledge Management. MoD, Prague, 32 p. (2008-2011)
2. Dictionary and Encyclopedia Directory (2011), http://www.wordiq.com/definition/Core_ontology
3. Doerr, M., Hunter, J., Lagoze, C.: Towards a Core Ontology for Information Integration. Journal of Digital Information 4(1), 22 (2003), http://journals.tdl.org/jodi/article/viewArticle/92-Archiv
4. Gómez, A., Fernández-López, M., Corcho, O.: Ontological Engineering. Springer, Heidelberg (2004) ISBN 1-85233-551-3
5. Charlet, J.: The management of medical knowledge between non-structured documents and ontologies. Annales des télécommunications 62(7-8), 808–826 (2007)
6. Information sources of the companies TOVEK, http://www.tovek.cz, and AION CS, http://www.aion.cz (2011)
7. Kent, R.E.: IFF Foundation Ontology (2001), http://suo.ieee.org/SUO/documents/IFF-Foundation-Ontology.pdf
8. Multilingual Archive (2011), http://www.worldlingo.com/ma/enwiki/en/Upper_ontology_information_science
9. Navigli, R., Velardi, P.: Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites (2004), http://www.dsi.uniroma1.it/~velardi/CL.pdf
10. OpenCYC Ontology (2011), http://www.opencyc.org
11. Suggested Upper Merged Ontology (SUMO) (2011), http://www.ontologyportal.org/
12. Uschold, M., Gruninger, M.: Ontologies: principles, methods and applications. The Knowledge Engineering Review 11(2), 93–136 (1996)
An IPv6 Test Bed for Mobility Support in Next Generation Wireless Networks Wan H. Hassan1 and Ahmed M. Al-Imam2 1 Malaysia-Japan International Institute of Technology, Universiti Teknologi Malaysia, Kuala Lumpur, Malaysia [email protected] 2 School of Computer Technology, Sunway University, Selangor, Malaysia [email protected]
Abstract. This paper extends our work in developing network intelligence via agents for improved mobile Quality of Service in next generation wireless networks. A test bed architecture of our protocol, called AMP, has been built to facilitate and expedite location and handover management over IPv6 networks. AMP comprises a collaborative multi-agent system residing in the mobile node and access networks. The core IP network has remained untouched to simplify design and operations. AMP's performance was evaluated against the IETF's standard Mobile IPv6 protocol in support of roaming mobile nodes. Results from analyses indicate that AMP outperformed Mobile IP with lower signaling cost, latency and packet loss. Our work shows that with AMP, improved IP-based mobility support may be achieved through added intelligence without increased complexity in the core network. Furthermore, results suggest that AMP may be more suited for micro-mobility and may serve as a viable and promising alternative to Mobile IPv6 in facilitating Internet-based host mobility in next generation wireless networks. Keywords: mobility management, Mobile IP, IPv6, intelligent agents, next generation networks.
1 Introduction

Present communication networks rely heavily on the intelligence of end systems in providing services, following the Internet's principle of the end-to-end arguments by Saltzer, Reed and Clark [1]. Network architectures are typically rigid and non-adaptive in nature, offering little intelligence except for data forwarding. This approach has been sufficient in the past, where the principal role of networks has been packet routing from source to destination. However, with new application requirements and emerging trends, this simplified design may no longer be adequate and, consequently, next generation networks may need to be augmented with additional control features to meet these increasing demands.
Presently, network-layer mobility in non-cellular IP networks is facilitated by the IETF's Mobile IP protocol as defined in RFC 3775 [2] and RFC 3344 [3]. Although the Internet was not originally designed for mobility, the use of special entities or mobility agents, i.e. the Home Agent and Foreign Agent (in Mobile IPv4), enables the Internet to offer host mobility. The Internet's approach to mobility is simple, with limited intelligence or control mechanisms placed in end-systems and certain mobility agents, conforming to the Internet's principle of the end-to-end arguments. However, the drawbacks include high latencies and packet loss, which have hindered its wide-scale deployment. In contrast, cellular architectures are more complex, with intelligence integrated explicitly into the core network. This results in an efficient service with low latencies but an extensive architecture with high deployment and operational costs.
Early work by Yabusaki et al. at NTT Docomo [4] has explored mechanisms in which intelligence for mobility management may be integrated into an all-IP mobile network of the future. Their work essentially attempts to combine the best of both worlds by adopting certain core features of cellular architectures with the simplicity typically found in packet-based networks. Moreover, according to Leonard Kleinrock, the 'father' of the Internet, in an article by Kurose [5], "the future of networking would include additional components such as intelligent software agents deployed across the network to handle certain tasks in a dynamic and adaptive manner". This has been the underlying premise of our on-going research in providing novel ways of offering network services via incremental development of distributed intelligence.
Our work is based on the notion of 'intelligent network nodes', i.e. entities with the ability to communicate and adapt dynamically according to varying scenarios and conditions. Specifically, an agent-based collaborative architecture called AMP (agent-based mobility protocol) has been developed for the purpose of providing improved mobility support when a user roams and maintains an on-going session over an IP-based network. In essence, the AMP architecture is a community of agents residing in network nodes that expedite location and handover management functions for mobile users. AMP's operations are defined within the scope of the access networks, with the core IP network remaining unaltered. The motivation for this was to adhere to the Internet's principle of the end-to-end arguments whenever possible.
This paper is organized as follows: Section 2 provides an overview of related work in the areas of mobility and test bed development, while Section 3 highlights the details of our AMP architecture. In Section 4, the details of the test bed and experimental work are explained. A discussion of the results obtained is given in Section 5. The paper concludes in Section 6 with a discussion of further work.
2 Related Work

In [6], researchers at King's College London have developed a multi-level networking test bed for hierarchical Mobile IPv6 (MIPv6) with support for multiple mobile nodes. The test bed consists of a static wired hierarchical access network connected to an Internet gateway, and provides Internet reachability for wireless mobile nodes. The proposed enhancement enables mobile nodes to perform self-configuration, i.e. the nodes are able to acquire valid IPv6 addresses and establish routes to the Internet gateway through each other by utilizing a reactive mobile ad-hoc protocol. The test bed, however, only supports auto-configured ad-hoc routing protocols. Moreover, the auto-configuration functionality needs to be performed by both the mobile node and the hierarchical access network architecture, and this may incur additional computational costs and complexity.
An alternative approach to test bed development has been proposed in [7], where multiple virtual machines are connected to each other to form a virtual mobile ad-hoc network. This approach reduces the associated hardware cost for test bed development and simplifies deployment by eliminating physical configuration. It enables testing and evaluating software applications for a variety of mobile ad-hoc network scenarios. Three functionalities are offered by this test bed – transparent forwarding, local stack management, and remote test bed management. The ease of operation of the virtual test bed comes at the cost of accuracy, since virtualization may not precisely emulate parameters at the network and data link layers. As such, performance evaluation might produce unrealistic or overly optimistic results, and may hide significant constraints and issues that may only appear in actual implementation.
A test bed for mobile social networking has been proposed in [8]. The test bed studies security and privacy issues, and several context awareness policies. It provides a comprehensive environment for mobile application performance analysis but does not operate at the network layer. It is more focused on the end user and provides static router configuration, thereby limiting other routing possibilities.
3 The Agent-Based Mobility Protocol (AMP)

3.1 Features

The test bed architecture of AMP builds on the earlier work of Yabusaki et al. [4], where control mechanisms are realized in the IP network, and not merely in end systems. However, the AMP architecture utilizes agents with specific tasks for handling mobility management operations and, as such, also differs from the IETF's Mobile IP and Session Initiation Protocol, SIP (RFC 3261) [9]. Here, the term 'agent' is not used in a strict distributed-artificial-intelligence sense, but rather as a generic term for entities that communicate with each other and are autonomous, with specific goals and tasks to achieve. Among the key distinguishing features of AMP are:
• A network-layer tracking mechanism that monitors the current location of a particular mobile node as it moves from one subnet to another. This enables faster detection at the IP layer, and allows state information to be maintained by the access network while the mobile node roams.
• The absence of packet encapsulation (IP-in-IP) and tunneling in AMP – this is unlike Mobile IP. The absence of packet encapsulation results in a smaller sized datagram and, consequently, lower overheads for packet processing. Packet delivery to the current location of a mobile node is achieved through the use of database lookups and packet header replacement/switching at the mobility agents in each subnet, network or cell location, where appropriate.
• Direct mode of packet delivery in AMP. Packet re-routing or re-direction is not used, since packets are usually delivered directly to the current location of the mobile node. As such, a reduction in transport overheads is further achieved, which translates to more efficient utilization of network resources. In Mobile IP, route optimization has been proposed to reduce triangular routing; however, security considerations may not allow this to be implemented.
• Application-layer transparency is maintained since AMP operates primarily at the network layer. Although the mobile node and access networks are mobile-aware through the agency, distributed applications residing in end-systems need not be mobile-aware and thus, neither application adaptation nor any specific application-layer protocol is required, unlike in SIP.
• Mitigation of packet loss through buffering of packets to the next location in AMP. Through agent collaboration between neighboring access networks, pre-registration and buffering of packets to the next location may be done before host movement (inter-network movement). Presently, in hierarchical Mobile IPv6, packet buffering is only possible for intra-network movement.
3.2 Architecture

In the AMP architecture, mobility agents in the access network are composed of two types – registrar and tracker. A hierarchical architecture is used, where there is a single registrar agent in each subnet with a tracker for each cell (see Fig. 1). There is a peer-to-peer relationship between registrars of different subnets. The test bed consists of a correspondent node (CN), a mobile node (MN), four registrars R1, R2, R3 and R4, and three trackers T1, T2 and T3. The test bed is connected to another IPv6 test bed at Lancaster University, UK, through an IPv6-over-IPv4 VPN link. This connection is used for testing global IPv6 connectivity as well as testing application performance over links with high Internet delays.
Fig. 1. The AMP test bed deployed with a roaming mobile node (MN), a correspondent node (CN), trackers, and registrars
Each registrar is assigned a unique /48 IPv6 address prefix, and each tracker within the registrar subnet is assigned a /64 IPv6 prefix. Each subnet is modeled as a tree topology with one level of hierarchy. The prefix length may be changed to suit additional hierarchical levels in the tree. Routing tables are configured statically for each router. Each tracker runs the router advertisement daemon (radvd) in order to advertise the prefix to any mobile node. Router advertisement packets of the Neighbor Discovery Protocol (NDP) are sent by radvd to configure a /64 unicast IPv6 address on the mobile node whenever it is associated with a tracker.

3.3 Code Structure

The implementation of AMP is distributed over multiple entities – namely, the mobile agent node (ma), the registrar (r), and the tracker (t). These entities communicate autonomously with each other using signaling packets based on the ICMPv6 protocol. The source code that implements these entities is packaged into a single program that has multiple modes reflecting the desired entity type. The program is compiled as a daemon running in Linux userspace, while command line arguments are used to specify the entity type. The program is implemented in the C language and follows a modular structure. Two threads are used - the main thread and the packet_receives thread. Within the scope of each source file, there are global variables that keep track of information accessible by all functions in that file. The program file structure is as shown in Figure 2. There are six header files and five source files. Three files of each type define and implement functions that relate to the three entities, and one header file defines the generic packet format of AMP along with generic functions and macros. One source file implements these functions. In order to avoid multiple and cyclic includes, the source files are only allowed to include the includes.h file (which contains the header includes in the correct order). Bold arrows start at the included header and end at the including file; the dashed arrows start at the files defining the procedures and end at the files using them.
Fig. 2. File structure of AMP daemon
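As an illustration of the ICMPv6-based signaling described above, the sketch below opens a raw ICMPv6 socket and sends a minimal AMP-style control message. It is a sketch only: the message type value, the payload layout and the destination address are assumptions for illustration, and the actual AMP packet format is the one defined in the project's common header file.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/icmp6.h>
#include <arpa/inet.h>

#define AMP_ICMP6_TYPE 200   /* hypothetical ICMPv6 type from the private-experimentation range */

int main(void)
{
    /* Raw ICMPv6 socket: the Linux kernel fills in the ICMPv6 checksum. */
    int sock = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
    if (sock < 0) { perror("socket"); return 1; }

    struct sockaddr_in6 dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin6_family = AF_INET6;
    inet_pton(AF_INET6, "2001:db8:1::1", &dst.sin6_addr);  /* example registrar address */

    /* Minimal signaling packet: an ICMPv6 header followed by a small payload. */
    unsigned char buf[sizeof(struct icmp6_hdr) + 16];
    memset(buf, 0, sizeof(buf));
    struct icmp6_hdr *hdr = (struct icmp6_hdr *)buf;
    hdr->icmp6_type = AMP_ICMP6_TYPE;   /* e.g. a "binding update" message */
    hdr->icmp6_code = 0;
    memcpy(buf + sizeof(struct icmp6_hdr), "BINDING-UPDATE", 14);

    if (sendto(sock, buf, sizeof(buf), 0,
               (struct sockaddr *)&dst, sizeof(dst)) < 0)
        perror("sendto");

    close(sock);
    return 0;
}

Sending on a raw socket requires root privileges; the test bed daemons would perform the equivalent operation from their registrar, tracker and mobile-agent modes.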
Figure 3 shows the functions used and their internal interactions. All functions are implemented inside the main thread except for the receive function. The latter is a blocking function and has its own thread to avoid interrupting other operations.
Fig. 3. Data flow diagram between AMP internal modules
The two functions recv_amp_packet() and timer_ptr() are callbacks. The recv_amp_packet() function is implemented inside the receiving header and calls the blocking recvfrom() Linux function. The handler functions t_handler, r_handler and ma_handler are triggered by polling the receiving buffer, and only one of them is used at any given time, depending on the running entity. timer_ptr is the other callback function and is triggered by the main loop of the program. Function pointers such as timer_ptr and handler_ptr are used instead of the explicit handler and timer functions.
As mentioned previously, the AMP daemon operates in three modes – mobile agent, tracker and registrar mode. When the AMP daemon runs in the mobile agent mode, timer_ptr and handler_ptr point to ma_timer and ma_handler. The handler function will call a specific function depending on the type of signaling packet received. Similarly, when the AMP daemon runs in the registrar mode, timer_ptr and handler_ptr point to r_timer and r_handler. The handler function will call a specific function depending on the type of signaling packet received. The ma_list is used to keep track of the registered mobile agents, and to maintain their respective states. The timer function makes periodic checks to update the list, and sends corresponding signaling packets when necessary. The r_list is initialized with information on other registrars, obtained from the active registrar, for the purpose of exchanging node mobility information. When the AMP daemon runs in the tracker mode, timer_ptr and handler_ptr point to t_timer and t_handler. The t_list is used to keep track of registered tracker agents and their respective interfaces. Again, the timer function makes periodic checks to update the list, and sends corresponding signaling packets whenever necessary.
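The mode-dependent dispatch described above can be pictured with a small sketch. The handler and timer names (ma_handler, r_handler, t_handler, ma_timer, r_timer, t_timer, handler_ptr, timer_ptr) follow the text; the argument parsing, the stub bodies and the loop timing are assumptions rather than the actual AMP source.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Stub handlers/timers standing in for the real AMP entity code. */
static void ma_handler(void) { /* process signaling packets as a mobile agent */ }
static void r_handler(void)  { /* process signaling packets as a registrar   */ }
static void t_handler(void)  { /* process signaling packets as a tracker     */ }
static void ma_timer(void)   { /* refresh mobile-agent state                 */ }
static void r_timer(void)    { /* update r_list / ma_list entries            */ }
static void t_timer(void)    { /* update t_list entries                      */ }

static void (*handler_ptr)(void);
static void (*timer_ptr)(void);

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: ampd ma|r|t\n"); return 1; }

    /* The command line argument selects the entity type, as in the AMP daemon. */
    if (strcmp(argv[1], "ma") == 0)      { handler_ptr = ma_handler; timer_ptr = ma_timer; }
    else if (strcmp(argv[1], "r") == 0)  { handler_ptr = r_handler;  timer_ptr = r_timer;  }
    else if (strcmp(argv[1], "t") == 0)  { handler_ptr = t_handler;  timer_ptr = t_timer;  }
    else { fprintf(stderr, "unknown mode %s\n", argv[1]); return 1; }

    /* Main loop: poll the receive buffer via the handler and run the periodic timer. */
    for (;;) {
        handler_ptr();
        timer_ptr();
        usleep(100000);   /* illustrative 100 ms tick */
    }
    return 0;
}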
4 Experimental Scenario

4.1 Environment

The network topology used for the test bed is as previously shown in Fig. 1. There are four registrars, and a correspondent node, CN, with an on-going connection to a mobile node, MN, which roams from one subnet to another, traversing three wireless cells, each of which has a base station with a tracker.

Table 1. Implementation details

Parameter                                 Value
Packet arrival rate                       100 packets/second
Average data packet size                  100 kilobytes
Average control packet size               64 bytes
Average velocity of mobile node           5.6 km/h
Cell/subnet radius                        ≈ 30 meters
Average distance between base stations    3 meters
Wired link bandwidth                      100 Mbps
Wireless link bandwidth                   54 Mbps (IEEE 802.11g)
Control packet type                       ICMPv6 (customized for AMP)
Data packet type                          IPv6 packets
Layer-2 wireless technology               IEEE 802.11g
Benchmark                                 Mobile IPv6
Benchmark implementation                  UMIP Nautilus - IPv6 and Mobility tools, http://www.nautilus6.org/
Operating system                          Linux Ubuntu 10.4
Kernel                                    2.6.33
Probing tools                             Wireshark & iperf
Implementation type                       Userland & kernel patch
For comparative analyses, the test bed has also been configured to run the IETF's Mobile IPv6 (MIPv6), with a home agent located in place of Registrar 2 (the home subnetwork). Table 1 highlights the salient details of the implementation. Constant bit rate traffic was transmitted between the CN and MN as a real-time application session between the two nodes. The diameter of each cell is approximately 30 meters and the cells are overlapping, with the MN situated at the periphery. A handoff to the next base station will only occur if the signal strength at the new cell location is stronger than that of the previous cell, and this signal strength has been sustained for a period of no less than 4 seconds. This sustained period is necessary to ensure that the difference in signal strength is a result of the MN's proximity to the new base station and is not due to external noise or other factors. Given the speed of the MN, this means that a distance of 16.5 meters would need to be covered before a handoff is initiated, so that a stronger signal from the second base station may be verified (at 5.6 km/h ≈ 1.56 m/s, covering 16.5 meters takes about 10.6 seconds, and adding the 4-second sustain period gives roughly 14.6 seconds). In all, it will take approximately 14.6 seconds for a handoff (with the MN located at the cell periphery) to the new cell, even though the stations are only 3 meters apart.
4.2 Performance Metrics

The performance metrics for mobile Quality of Service (QoS) include signaling cost, packet latency and packet loss [10]. The total signaling cost is the cost of transmitting control packets for the purpose of creating a new address binding or mapping during handovers (i.e., a binding update), refreshing an existing address binding upon timer expiry, and packet delivery. In the test bed, the signaling cost is predominantly influenced by packet delivery and is determined from the total number of signaling or control packets that have to be sent to ensure data packet delivery to the mobile node. The packet delay was calculated by analyzing the difference in time between the CN's outgoing packets and the incoming traffic at the MN, while packet loss was calculated from the number of packets received by the MN against the number of packets sent by the CN.
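A minimal sketch of how these two metrics can be computed from logged send and receive timestamps is given below; the log format (one entry per packet, with a negative receive time marking a lost packet) and the sample values are assumptions for illustration and do not correspond to a specific tool used in the test bed.

#include <stdio.h>

/* Illustrative computation of average one-way delay and packet loss from
   per-packet send/receive timestamps (in seconds); recv < 0 marks a lost packet. */
int main(void)
{
    double send_ts[] = { 0.00, 0.01, 0.02, 0.03, 0.04 };   /* example CN send times    */
    double recv_ts[] = { 0.14, 0.15, -1.0, 0.17, 0.18 };   /* example MN receive times */
    int total = sizeof(send_ts) / sizeof(send_ts[0]);

    int received = 0;
    double delay_sum = 0.0;
    for (int i = 0; i < total; i++) {
        if (recv_ts[i] < 0.0)
            continue;                       /* packet lost */
        delay_sum += recv_ts[i] - send_ts[i];
        received++;
    }

    double avg_delay = received ? delay_sum / received : 0.0;
    double loss_pct  = 100.0 * (total - received) / total;
    printf("average delay: %.3f s, packet loss: %.2f %%\n", avg_delay, loss_pct);
    return 0;
}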
5 Results and Analyses

5.1 Total Signaling Cost

Figure 4 shows the signaling cost for AMP and MIPv6 against the call-to-mobility ratio (CMR). The call-to-mobility ratio reflects the ratio of the average number of calls or sessions to a MH within an access network per unit of time to the average number of times the MH changes cells/subnets per unit of time [11, 12]. Generally, small values of CMR mean that the rate of movement of the mobile node is much higher than the arrival rate of calls or sessions. As such, the signaling cost will be larger, as the mobile node crosses many cells/subnets, triggering registrations and handovers as it moves. As the CMR increases, the mobility rate is less than or equal to the call or session arrival rate, and this translates to fewer binding updates that need to be done, since the number of crossovers and handovers is small. Hence, the signaling cost decreases as the value of CMR increases, and the rate of decrease itself falls as CMR approaches 0.5.
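Written as a formula (the symbols below are our own shorthand, since the paper defines CMR only in words):

\[ \mathrm{CMR} = \frac{\lambda_c}{\lambda_m} \]

where \(\lambda_c\) is the average call/session arrival rate for the mobile host and \(\lambda_m\) is its average rate of cell/subnet crossings; the handover-driven part of the signaling cost scales with \(\lambda_m\), which is why the total cost falls as CMR grows.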
Fig. 4. Relationship between signaling cost and CMR for AMP and MIPv6
As depicted in Figure 4, the total signaling overhead is much larger in MIPv6 than in AMP. This is because most of the binding updates in AMP are localized, i.e. sent within the same access network to the local registrar agent, due to the hierarchical architecture of AMP. Only when the MH moves to another access network will the binding update be sent to the home and correspondent registrar agents. In contrast, MIPv6 requires binding updates to be sent to the home and correspondent agents every time the MH changes location to a new subnet (globalised update). Generally, the average performance gain for AMP in terms of lower signaling cost is approximately 70% over MIPv6.

5.2 Packet Delay

The packet delay is the average time required to transmit a packet from the correspondent node to the mobile node as it moves from one location to another over a period of time. Figure 5 shows the packet delay for AMP and MIPv6. The time represents the period in which the mobile node moves from one subnet location to another (travel time). Generally, there are three intervals reflecting packet delays within a particular cell (since there are three cells in the test bed). For AMP, the average packet delay in each interval is almost constant at approximately 0.14 second. It may be seen that the key advantage of AMP is that packet delay is not significantly affected by the mobile node's distance from its home network, and on average is constant at about 0.14 second regardless of the node's location.
In MIPv6, the average packet delay increases as time progresses and the mobile node moves further away from its home network, as shown in Figure 5. Initially, the average packet delay for the mobile node was about 0.14 second, similar to AMP. However, as the mobile node moves to the second subnet, the average packet delay increases to about 0.18 second. When the mobile node moves to the third subnet, the average packet delay continues to increase to about 0.32 second. This is due to the fact that all registration and binding updates for handovers must be done through the home network, and this incurs significant delays as the mobile node moves further away from its home network. In all cases, hard handovers are assumed between cells and this is reflected as gaps in the graph at the time periods of 10 and 20 seconds, respectively.

Fig. 5. Packet delay versus MN travel time for AMP and MIPv6

5.3 Packet Loss

Packet loss is an important mobile QoS parameter for loss-intolerant applications. In the AMP architecture, packet loss is reduced through lower latencies during movements. Figure 6 below shows packet loss for both AMP and MIPv6 as the mobile node roams to different subnets. In both architectures, packet loss will still occur as a result of hard handovers and delays. Although packets may be buffered in AMP, in this particular scenario hard handovers were performed (no buffering of packets) and, as such, packet loss occurred during the periods of 10 and 20 seconds. This is to examine the worst case scenario for both protocols. Generally, the number of packets lost in AMP is less than in MIPv6 and, when losses do occur, the duration of the packet loss is slightly shorter. As shown in Fig. 6, packet loss in AMP is marginally less than in MIPv6 – 8.41% packet loss in AMP versus 13.04% for MIPv6. This is due to the fact that in AMP, handover procedures are faster than in MIPv6, since a hierarchical architecture is used where there is a distinction between local (intra-network) and global (inter-network) movements. For local movements, the bindings in AMP are equally localized, i.e. there is no need for the home and correspondent registrars to be notified.
Fig. 6. Comparison of packet loss between AMP and MIPv6
6 Conclusion and Further Work

This paper represents our initial work in deploying the AMP architecture over an IPv6 test bed for improved mobile Quality of Service in next generation wireless networks. So far, results from the evaluation of the AMP architecture suggest that it is indeed feasible to deploy intelligence at the edges of the network via agents for improved mobility services without violating the Internet's principle of the end-to-end arguments too much. The core IP network has remained untouched, while enhancements in infrastructure have been restricted mainly to the access networks. Both application- and lower-layer independence have been assumed throughout the development of the AMP architecture.
The performance of AMP was evaluated against MIPv6 using three performance metrics, i.e. signaling cost with varying call-to-mobility ratios, packet delay and packet loss. Initial results have been optimistic and indicate that AMP outperformed MIPv6, with an average of 70% lower signaling cost, lower handover delay, and 5% less packet loss. AMP may be suited for micro-mobility (fast node movement), may give better support for real-time applications with lower latencies, and may be a viable alternative to MIPv6.
As part of this ongoing work, the next steps would be to evaluate the AMP architecture further in terms of varying traffic conditions, buffer size, and scalability. In reducing packet loss, the size of the buffers would be a significant factor for consideration, and this would also depend on the number of mobile nodes that the network would need to support. In addition, AMP may need to be evaluated against the hierarchical Mobile IPv6 (HMIPv6) protocol, since both architectures provide a distinction between local and global movements. In this respect, it is hoped that the AMP architecture may be developed further as a more comprehensive and viable solution in supporting Internet-based mobility in the near future.
Acknowledgement This work has been funded by a two-year research grant from Sunway University Malaysia. We would also like to acknowledge Ms May Al-Kulabi for her contribution as part of the linux-network programming team at the Network Research Lab, School of Computing Technology, Sunway University.
Test Management Traceability Model to Support Software Testing Documentation Azri Azmi and Suhaimi Ibrahim Advanced Informatics School (UTM AIS), Universiti Teknologi Malaysia 54100 Kuala Lumpur, Malaysia {azriazmi,suhaimiibrahim}@utm.my
Abstract. Software documentation is one of the key quality factors in software development. However, many developers still put less effort and lower priority on documentation. To them, writing documentation during project development is very tedious and time consuming. As a result, the documentation tends to be significantly out-dated, of poor quality and difficult to access, which will certainly lead to poor software maintenance. Current studies have shown that the key to this problem is software traceability. Traceability refers to the ability to trace all related software components within a software system, including requirements, test cases, test results and other artefacts. This research reveals some issues related to current software traceability and attempts to suggest a new software traceability model that focuses on software test documentation for test management. This effort leads to a new software test documentation generation process model based on software engineering standards. Keywords: Software Traceability, Software Documentation, Software Testing.
1 Introduction Nowadays software is becoming more complex. It consists of diverse components in distributed locations, with complex algorithms, on a variety of platforms, many subcontractors with different kinds of development methodologies, and rapid technology innovation. The cost and risk of a software development project become higher with this kind of complexity, as reported by Boehm [1]. It is vital to ensure the reliability and correctness of the software being developed. Such aims can be reached using documentation as a tool. Documentation is a detailed description of particular items and is used to represent information such as models and architecture, record artefacts, maintain traceability of requirements and serial decisions, log problems and help in maintaining the systems. Software developers rely on documentation to assist them in understanding the requirements, architecture design, coding, testing and details of intricate applications. Without such documentation, engineers have to depend only on source code. This consumes time and leads to mistakes [2], especially when developing large-scale systems. As reported by Huang and Tilley [3] and Sommerville [4], there are several shortcomings in current documentation, such as being out of date, inconsistency between source code and documentation, poor quality, and others.
The key solution to the above problems is software traceability. Traceability is defined as the ability to link various artefacts across software development phases, linking requirements, design, source code and testing artefacts. In the early seventies, requirements traceability was driven mainly by obligatory policies such as DoD 2167A for US military systems [5]. Later, traceability recommendations from many institutions and standards (IEEE standards, SPICE, CMM/CMMI) gathered more awareness. Today, software traceability has become one of the key attributes of software quality. Unfortunately, many organizations fail to implement effective traceability due to difficulties in creating, assessing, using and maintaining traceability links [6, 7]. Accurate traceability practices can help in maintaining software and thus improve the quality of the system as well as the software process. On the other hand, neglecting traceability can reduce the quality of the software product. The quality of the software product cannot be achieved when it is not fully tested and traced against the requirements. In this paper, we present a software traceability model that supports test management in generating software testing documentation based on software engineering standards.
2 Related Works As development becomes complex, the task of connecting requirements with various artefacts becomes tedious and complicated. The IEEE Standard Computer Dictionary [8] defines traceability as "The degree to which a relationship can be established between two or more products of the development process, especially products having a predecessor-successor or master-subordinate relationship to one another; for example, the degree to which the requirements and design of a given software component match". Meanwhile, Aizenbud-Reshef et al. [9] define traceability much more broadly, as any relationship that exists between artefacts involved in the software development life-cycle. There are many benefits of software traceability. Most commonly, it is claimed to help in change management [10-12], system verification [7, 13], performing impact analysis [12], reuse of software artefacts [14] and meeting the needs of the stakeholders [7, 15, 16]. Even with the support of sophisticated tools capable of storing and retrieving links, traceability still faces several issues [17]: (i) the process of tracing is still done manually, (ii) information to be traced is missing, and (iii) engineering issues could arise later, so the trace information may be insufficient. Traceability is referenced in many software development processes and standards; however, specific requirements or guidelines on how it should be implemented are rarely provided. The IEEE Standard Computer Dictionary [8] defines a model as "An approximation, representation, or idealization of selected aspects of the structure, behaviour, operation, or other characteristics of a real-world process, concept, or system". Meanwhile, a broader definition from the Microsoft Computer Dictionary [18] states that a model is "A mathematical or graphical representation of a real-world situation or object - for example, a mathematical model of the distribution of matter in the universe, a spreadsheet (numeric) model of business operations, or a graphical model of a molecule. Models can generally be changed or manipulated so that their creators
can see how the real version might be affected by modifications or varying conditions." In summary, a model is a hypothetical description of a complex entity or process. A software traceability model is an obligation for an organization before any development begins. It arises from the complexity of development, such as the variety of platforms, distributed locations, tools and technology used, organizational structures, different organizations, policies, standards, and development methodologies. As discussed in the previous section, traceability is crucial to verify and validate completeness and consistency with the requirements. It can provide significant benefits if it is properly implemented. Therefore, a strategy for implementing software traceability must be carefully defined. Different organizations or projects will have different traceability models depending on the business/organization, domain, project, product or technology [19]. The key points for implementing efficacious software traceability include linkage of data and artefacts, semantic content, and automation capability; traceability can be achieved when all these aspects are addressed. After performing an analysis of the existing models, a number of findings and limitations of existing software traceability models were identified. (A detailed comparative study is tabulated in Table 1.) Several software traceability models have been discussed by researchers, and the next paragraphs discuss them. Narmanli [20] presents a traceability model to support requirements-to-requirements traceability, known as the Inter-Requirements Traceability (IRT) model. The model considers three types of software requirements: use cases (GUI, test data, qualification), business rules (business-rule engine, test data, qualification test procedures) and data definitions (entities, database, test data). The traceability model proposes bidirectional traces between these types. The model tries to minimize the effect and the workload of change requests on implementation and tests, to satisfy both customers and development teams. The contribution of this model is its support for change request impact analysis. Advantages of using this model are: (i) it minimizes the effect and the workload of change requests on implementation and tests when constructing the software design, (ii) it eases make-buy-reuse analysis, and (iii) it enables effective tests, where every business rule is ready to become a test procedure. Meanwhile, the drawback of this model is that it only supports traceability between requirements. Ibrahim et al. [21] introduced a model that derives requirement traceability from system documentation. This model is called the Total Traceability (TT) model. It provides links among different artefacts, including requirements, design, test cases and source code. It uses horizontal traceability and vertical traceability to make up total traceability. The model captures relationships across different levels of artefacts before an impact analysis can be implemented. The process of tracing and capturing these artefacts is called hypothesizing traces. The horizontal relationships can occur across boundaries, as shown by the thin solid arrows, while the vertical relationships can occur at the code level and design level respectively. The thick dotted lines represent the total traceability that needs to be implemented in either top-down or bottom-up tracing. Tools such as McCabe and CodeSurfer were used to help capture the dependencies.
The tracing types are requirement–test case, test case–code, method–method and class–class. A prototype, CATIA, has been developed to demonstrate the model. A significant contribution of this model is its ability to support top-down and bottom-up tracing from a component perspective.
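The idea of combining horizontal links (requirement, design, code, test case) with vertical code-level links (class–class, method–method) into one traversable structure can be sketched as below. This is only an assumed illustration of the concept; the artefact names and link set are hypothetical and the sketch does not reproduce CATIA.

```python
from collections import defaultdict, deque

# Hypothetical traceability links, horizontal and vertical, in one graph.
links = defaultdict(set)
def add_link(a, b):                 # bidirectional so tracing works both ways
    links[a].add(b)
    links[b].add(a)

add_link("REQ-1", "TC-7")                               # requirement <-> test case
add_link("TC-7", "OrderService")                        # test case   <-> class
add_link("OrderService", "OrderService.submit()")       # class <-> method (vertical)
add_link("OrderService.submit()", "Invoice.create()")   # method <-> method (vertical)

def trace(start):
    """Return every artefact reachable from `start`, top-down or bottom-up."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in links[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen - {start}

print(trace("Invoice.create()"))    # bottom-up: which requirement is impacted?
```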
Meanwhile, Salem [22] established a traceability model that provides an intuitive and dynamic way of tracing requirements during the software development process. This model is named the Coding Phase Requirements Traceability (CPRT) model. The model is composed of a Traceability Viewer Component (TVC), a Traceability Engine Component (TEC) and a Quality Assurance Interface (QAI). The TEC is used to help developers link source code elements with the software requirements. Meanwhile, the TVC acts as a viewing medium for the links between requirements and source code; it provides software engineers with a distinctive way to scrutinize all the information that the TEC has gathered. Lastly, the QAI is the component that validates and verifies requirements. A flagging procedure is designed using Requirement Flags to provide traceability between requirements and source code. The preliminary model provides a simple interface that allows developers to seamlessly locate the correct requirements and link them to the correct source code elements. A limitation of the model is that it only traces links between requirements and source code. Asuncion et al. [23] proposed an end-to-end traceability (ETET) model. This process-oriented model achieves comprehensive traceability and supports the entire software development life cycle (SDLC), from the requirements phase to the test phase, by focusing on both requirements traceability and process traceability. It emphasizes process traceability as an important facet of effective requirements traceability. Three main goals were set for this traceability model: first, to minimize overhead in trace definition and maintenance; second, to preserve document integrity; and lastly, to support SDLC activities. A successful prototype tool has been developed to demonstrate the model. It uses bidirectional updates between documents and the artefact repository to guarantee document integrity. The tool was developed at Wonderware, a mid-sized software development company. A limitation of the model is that it only supports post-requirements traceability. The boxes at the top represent the global trace artefacts, and solid lines represent their requirements trace links. Meanwhile, the users of the system are shown at the bottom of the diagram, and all of them are consumers of trace information. Finally, the TraceabilityWeb (TW) model was introduced by Kirova et al. [19]. They examined the traceability problem and its solution in the context of multiple aspects such as business/organization, project, product, development model and technology. A tool called TraceabilityWeb has been developed to demonstrate the feasibility and the benefits of integrated tool environments that automate the creation and maintenance of traceability information. It also provides enough content to start testing in an early phase compared to the traditional approach; this includes test planning, test creation and design specification efforts. The model also establishes benefits such as effortless change impact analysis, simpler maintenance of large volumes of data, flexible levels of granularity when creating links, metrics and reporting. It integrates multiple artifact repositories, including requirements management systems, test management systems and databases. It also auto-generates a significant portion of artifact mappings and supports most functional areas, such as systems engineering, architecture, development, test, product management and test management.
The drawback of this model is that it is only applicable to small teams, such as those using agile methodologies.
Table 1. Comparative Study of Traceability Models

IRT: tracing item: business rules, use cases and data definitions; tracing type: R-R; tool support: n/a; strength: support for change request impact analysis; limitation: tracing only from requirement to requirement.

TT: tracing item: requirements, design, test cases and source code (method, class, package); tracing type: R-D-C-TC; tool support: yes, CATIA; strength: software maintenance from system documentation, change impact analysis; limitation: no tracing in the maintenance and pre-requirements phases, traces text-based artefacts only.

CPRT: tracing item: source code and requirements; tracing type: R-C; tool support: n/a; strength: improves software quality by validating and verifying requirements; limitation: tracing only from requirement to source code.

ETET: tracing item: documents and artefact repository; tracing type: MR-UC-R-TC; tool support: yes, no name given; strength: prescriptive workflow, efficient post-requirements tracking and tracing of requirements, supports traceability through the SDLC; limitation: no tracing for pre-requirements.

TW: tracing item: artefacts at different levels of granularity; tracing type: R-D-C-P-TA; tool support: yes, a combination of TMS, APXTMS and HWTMS; strength: effortless change impact analysis, simpler maintenance of large volumes of data, notification mechanisms and configurable control; limitation: applicable only to small teams (agile methodologies).

TMT (proposed): tracing item: requirements, design, code and test documents; tracing type: R-D-C-T; tool support: yes, in the implementation phase; strength: test document generation based on SE standards, test management, pre- and post-requirements traceability, supports the whole SDLC; limitation: none stated.

Legend: R = Requirement, D = Design, C = Code, T = Testing, TC = Test Cases, UC = Use Cases, MR = Marketing Requirement, P = Plan, TA = Testing Artefacts.
Documentation is written material that serves as a record of information and evidence. Software engineering documentation encompasses not only source code, but also all intermediate work products associated with the code and its validation and operation, such as contracts, design architectures and diagrams, reports, configurations, test cases, maintenance logs, design comments, and user manuals. According to Wang [24], documentation is a software engineering principle that is used to embody system design and architectures, record work products, maintain traceability of serial decisions, log problems and maintenance solutions, and enable post-mortem analysis. Forward [25], in his thesis, defined software documentation as an artefact whose purpose is to communicate information about the software system to which it belongs. From the Agile perspective, a document is any artefact external to source code whose purpose is to convey information in a persistent manner [26].
Software engineers need to rely on documentation to aid in understanding software systems. Regrettably, the artefacts available to them are usually out of date and therefore cannot be trusted [2]. Developers then need to depend solely on source code because such documentation is unavailable. Thus, the process becomes error-prone and time-consuming, especially when traceability links have to be traced manually for large-scale systems. Sulaiman et al. [27], in their survey, stated three main reasons why software documentation is not produced during software development: time constraints, commercial pressures, and not being requested by supervisors. Other reasons are that it is not requested by customers, is a tedious task, is too costly to keep updated, is a boring task, and more.
3 Evaluation of Software Traceability Models Before a new model can be proposed, the existing models need to be investigated and evaluated; therefore, the means of assessment needs to be determined. The software evolution taxonomy [28] has been used to evaluate the traceability models, together with the perspectives introduced by Asuncion and Taylor [29]. The taxonomy of software evolution is based on multiple dimensions characterizing the process of change and the factors that influence these mechanisms, while the perspectives discuss economic, technical and social aspects. The evaluation criteria are accessibility (mapping between artefacts), capturability (degree of automation), tool supportability, temporality (when to trace) and scalability. The results of the evaluation are tabulated in Table 2. The rationale for choosing these criteria is explained in detail in the next paragraphs. The accessibility criterion is evaluated to determine whether a model can map between artefacts in the software development life cycle, such as requirements specifications, software designs, source code, and test suites. From the observations made, it appears that all the models are capable of linking requirements to design, source code and test suites except IRT and CPRT: for CPRT, tracing is between requirements and source code only, while for IRT the links are between requirements only. Meanwhile, the capturability criterion concerns how changes in software are analyzed, managed, controlled, implemented or measured; the mechanisms can be automated, semi-automated or manual. The comparison shows that for all models links are formed semi-automatically and no links are made manually; for TT, ETET, TW and TMT, links can also be generated automatically. The next criterion is tool supportability, which evaluates whether the model provides tool support to accommodate links between artefacts; the purpose of the tool support is to help in visualizing and managing traceability. On this criterion, all models are supported by tools except IRT and CPRT. The next criterion is temporality, which refers to the time when a link is created or updated. TW and TMT allow links in both the development phase and the maintenance phase, whereas models such as IRT, CPRT and ETET allow them only in the development phase; in addition, the TT model is dedicated to the maintenance phase. Finally, the scalability criterion analyzes whether the model can be applied to large-scale projects. The results show that the TT and ETET models can be applied to large-scale systems, whereas the others apply only to small and medium-sized projects.
Table 2. Evaluation of Software Traceability Models

Accessibility:
 (i) Requirement: IRT, TT, CPRT, ETET, TW, TMT
 (ii) Design: TT, ETET, TW, TMT
 (iii) Source Code: TT, CPRT, ETET, TW, TMT
 (iv) Test Suites: TT, ETET, TW, TMT
Capturability:
 (i) Automatic: TT, ETET, TW, TMT
 (ii) Semi-automatic: IRT, TT, CPRT, ETET, TW, TMT
 (iii) Manual: none
Tool Supportability: TT, ETET, TW, TMT
Temporality:
 (i) Development: IRT, CPRT, ETET, TW, TMT
 (ii) Maintenance: TT, TW, TMT
Scalability: TT, ETET
4 Result This research is intended to create a traceability model that will be used to generate software testing documentation based on software engineering standards. As such, a preliminary study was conducted to find an approach that suits traceability within software testing artefacts and leads to establishing a repository. Software traceability has been used by many researchers and practitioners, and it is a key factor for improving software quality. There are numerous benefits of using traceability, such as keeping documentation updated and consistent across artefacts, enabling requirements-based testing, early revision of requirements testing, and improved management of change impact analysis. Despite these advantages, traceability is hard to implement in software development, and there are weaknesses in current approaches and models [23]. There is a lack of research on traceability for finding relationships between software testing artefacts. Several tools and research prototypes have been analyzed and compared in order to find their similarities and differences in a previous paper [30]. Out of all of them, only one prototype was closely similar to this proposed study. PROMETEU [31] is a prototype tool that was developed to introduce or improve the software testing process of a small software company. The artefacts and information elements to trace are based on IEEE 829-1998. However, PROMETEU was developed to support traceability only within artefacts such as documents and requirements. Our proposed research is to establish a traceability model that governs various artefacts such as source code, documents, testing tool files, requirements, legacy systems and stakeholders.
5 Discussion Fig. 1 shows the proposed model, called the Test Management Traceability (TMT) model, which will generate software testing artefacts based on Software Engineering
Standards. There are four main components, namely the Traceability Engine, Analyzer, Extractor/Parser and Document Generator. The proposed model illustrates that all the data are gathered and stored in a repository. Firstly, the tool will analyze the information to be stored and will create a repository of traceability links. The stored data in the repository may come from a variety of sources and formats, such as: (i) source code (Java and C++), (ii) software documents such as the Interface Requirements Specification (IRS), Software Requirements Specification (SRS), Software Design Descriptions (SDD), Software Test Result (STR) and Software Test Descriptions (STD), (iii) legacy systems, (iv) stakeholders/users, (v) output files from testing tools, (vi) requirements and (vii) experts.
Fig. 1. Test Management Traceability Model
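As a rough sketch of how the four components could be wired together, the skeleton below shows the assumed flow from heterogeneous sources to a generated test document. The class names, method names and in-memory "repositories" are hypothetical illustrations, not the actual TMT implementation.

```python
class ExtractorParser:
    """Turns one source artefact (code, document, tool output, ...) into raw XML."""
    def extract(self, source: str) -> str:
        return f"<artefact origin='{source}'>...</artefact>"   # placeholder XML

class Analyzer:
    """Hypothesizes traceability links between the raw XML artefacts."""
    def analyze(self, xml_artefacts):
        return [("REQ-1", "TC-7"), ("TC-7", "Order.java")]      # placeholder links

class TraceabilityEngine:
    """Stores links in the traceability repository."""
    def __init__(self):
        self.repository = []
    def store(self, links):
        self.repository.extend(links)

class DocumentGenerator:
    """Fills a document template (e.g. an STD skeleton) from the repository."""
    def generate(self, repository, template="STD"):
        return f"{template}: {len(repository)} traced items"

sources = ["Order.java", "SRS.docx", "test_results.xml"]
parser, analyzer = ExtractorParser(), Analyzer()
engine, generator = TraceabilityEngine(), DocumentGenerator()
engine.store(analyzer.analyze([parser.extract(s) for s in sources]))
print(generator.generate(engine.repository))
```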
Extractors/parsers act as agents that extract the desired information and convert it into eXtensible Markup Language (XML) as a raw format. The XML files will be used by analyzers to create the traceability among artefacts, and the output will be saved into a repository called the traceability repository. This repository will then be used as an input to the process of generating software testing documents. The document generator will be developed in an integrated environment with the template repository to produce software testing documentation. The next paragraphs explain the components of the proposed model in more detail. The parser's task is to analyze a continuous flow of input and break it into constituent parts. Several parsers will be used to convert multiple formats of input data into XML format. This supports the capture, summarization and linking of software artefact information: extracting information from a wide variety of source artefacts, viewing it in summarized form, and managing changes to the artefacts in different representations. Software information sources may include Microsoft Word and Excel files, modeling applications such as Rational Rose, and testing
applications such as SpiralTeam, TestSuite and Robot. It may also include email files, data from legacy systems or text files (.txt). The goal is to gather all available artefacts into one central repository to streamline the test management approach. All the data stored in the raw repository will be in XML format. XML is designed to transport and store data and is acknowledged as a highly effective format for data encoding and exchange over domains ranging from the Internet to desktop applications. The hierarchical nature of XML documents provides useful structuring mechanisms for program artefacts. It was chosen because it is easily processed by commonly available third-party tools such as editors, browsers and parsers.
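For instance, a very small parser in this spirit might scan a plain-text requirements document for requirement identifiers and emit them as raw XML. The identifier pattern and element names below are assumptions chosen only for illustration.

```python
import re
import xml.etree.ElementTree as ET

def requirements_to_xml(text: str) -> str:
    """Extract lines that look like 'REQ-<n>: description' and wrap them in XML."""
    root = ET.Element("artefacts", type="requirements")
    for req_id, desc in re.findall(r"^(REQ-\d+):\s*(.+)$", text, flags=re.M):
        ET.SubElement(root, "requirement", id=req_id).text = desc.strip()
    return ET.tostring(root, encoding="unicode")

sample = "REQ-1: The system shall log every change.\nREQ-2: Reports are generated nightly."
print(requirements_to_xml(sample))
```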
Fig. 2. Traceability link meta-model
In order to generate documentation from an artefact, an analyzer is needed to analyze the data. There are some existing approaches available for analyzing the data, such as lexical analysis, syntactic analysis, island grammars, and parse tree analysis; the most appropriate approach will be determined during the implementation phase. Meanwhile, the input data for the traceability engine will be a traceability repository, or corpus, in which relationships or links across artefacts are kept. A unique key will be given to each requirement, and several items or artefacts, such as test cases, design items (modules, classes, packages), and code, may refer or link to one requirement. In order to define this repository, a structure must first be defined. The traceability link meta-model shown in Fig. 2, inspired by Valderas [32], will be used. According to the meta-model, a traceability link is represented by an identifier and will be implemented using a relational database; MS SQL will be used as the database to store all artefacts. Before the data is stored in the database, we need to trace and capture the relationships among the artefacts. The
process of tracing and capturing these artefacts is called hypothesizing traces, which was introduced by Ibrahim [21]. Fig. 3 depicts one way of hypothesizing traces, which can be described in the following sequence: for a selected requirement, choose a test case (one requirement might have several test cases); then clarify it against other documentation, such as plans, designs and code, and observe traces with the code. From this, links can be generated. An information retrieval (IR) method will be utilized to derive the traceability links. A distinct advantage of this method is that it does not rely on a predefined vocabulary or grammar for the documentation; as a consequence, it can be applied without a large amount of pre-processing of the input. The IR method known as Latent Semantic Indexing (LSI) will be used. The traceability engine is concerned with setting traceability elements specifically designed to link with other artefacts so as to constitute traceability links in a repository. In order to create this repository, a specific structure must first be defined; this structure is used to store information relating to traceability links. The document generator produces documentation based on software engineering standards. It will generate documents such as the STD and STR based on the template repository. Existing artefacts and documents are also used as input to the data-gathering process. Generating documentation is a process of collecting data and information, analyzing it, combining this information with other resources, extrapolating new facts, and generating updated documentation. By generating documentation when it is needed, software engineers can simply acquire documentation when they need it, without worrying about cataloguing, storing or sharing a repository of documentation. The generated documents need to preserve their integrity; this is achieved using bidirectional updates between the documents and the traceability repository.
Fig. 3. Hypothesized traces
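A compact sketch of how LSI can hypothesize links between requirements and test-case descriptions is given below, using scikit-learn. The texts, the reduced dimensionality and the similarity threshold are illustrative assumptions, not values from this research.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

requirements = {"REQ-1": "user login with password authentication",
                "REQ-2": "generate monthly sales report as PDF"}
test_cases   = {"TC-1": "verify login fails with wrong password",
                "TC-2": "check that the sales report PDF is produced"}

# Build a reduced LSI space over all documents, then compare by cosine similarity.
docs = list(requirements.values()) + list(test_cases.values())
tfidf = TfidfVectorizer().fit_transform(docs)
lsi = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

req_vecs, tc_vecs = lsi[:len(requirements)], lsi[len(requirements):]
sims = cosine_similarity(req_vecs, tc_vecs)

THRESHOLD = 0.5   # assumed cut-off for hypothesizing a trace link
for i, rid in enumerate(requirements):
    for j, tid in enumerate(test_cases):
        if sims[i, j] >= THRESHOLD:
            print(f"hypothesized link: {rid} <-> {tid} (sim={sims[i, j]:.2f})")
```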
This research is expected to establish a software testing documentation process model using a traceability approach. The expected findings include: (i) documentation features defined based on software engineering standards, (ii) an evaluation of existing approaches and models, (iii) a new, comprehensible, integrated process established as the proposed solution, and (iv) a tool developed to support test management for software testing activities.
6 Conclusion Software documentation is vital for software engineers, helping them to produce good-quality system software. Without such aid, they rely solely on source code, which is error-prone and time-consuming. The key point here is to establish a workable traceability model and approach to meet the demand for software documentation. The correct use of traceability can help many activities within the development process and can improve the quality of the product as well as the software process. Traceability practices in general are far from mature, and much research still needs to be done. The new traceability model is expected to support software testing documentation that will certainly be useful in software maintenance activities.
References 1. Boehm, B.: Value-based software engineering: reinventing. ACM SIGSOFT Software Engineering Notes 28(3), 3 (2003) 2. Thomas, B., Tilley, S.: Documentation for software engineers: what is needed to aid system understanding? In: Proceedings of the 19th Annual International Conference on Computer Documentation, p. 236 (2001) 3. Huang, S., Tilley, S.: Towards a documentation maturity model. In: Proceedings of the 21st Annual International Conference on Documentation, pp. 93–99. ACM, San Francisco (2003) 4. Sommerville, I.: Software engineering:the supporting process., vol. 2. Addison-Wesley, Reading (2002) 5. Albinet, A., Boulanger, J.L., Dubois, H., Peraldi-Frati, M.A., Sorel, Y., Van, Q.D.: Modelbased methodology for requirements traceability in embedded systems. In: 3rd ECMDA Workshop on Traceability (June 2007) 6. Knethen, A., Paech, B.: A survey on tracing approaches in practice and research. IESEReport No. 095.01/E. 95 (2002) 7. Ramesh, B., Jarke, M.: Toward reference models for requirements traceability. IEEE Transactions on Software Engineering 27, 58–93 (2001) 8. Geraci, A.: IEEE standard computer dictionary: compilation of IEEE standard computer glossaries. Institute of Electrical and Electronics Engineers Inc. (1991) 9. Aizenbud-Reshef, N., Nolan, B.T., Rubin, J., Shaham-Gafni, Y.: Model traceability. IBM Systems Journal 45, 515–526 (2006) 10. Cleland-Huang, J., Chang, C.K., Sethi, G., Javvaji, K., Hu, H., Xia, J.: Automating speculative queries through event-based requirements traceability. In: Proceedings of the IEEE Joint International Requirements Engineering Conference (RE 2002), pp. 9–13 (2002) 11. Chang, C.K., Christensen, M.: Event-based traceability for managing evolutionary change. IEEE Transactions on Software Engineering 29, 796–810 (2003)
12. Cleland-Huang, J.: Requirements traceability: When and how does it deliver more than it costs? In: 14th IEEE International Conference Requirements Engineering, pp. 330–330 (2006) 13. Ramesh, B.: Factors influencing requirements traceability practice (1998) 14. Knethen, A., Paech, B., Kiedaisch, F., Houdek, F.: Systematic requirements recycling through abstraction and traceability. In: Proc. of the Int. Conf. on Requirements Engineering, pp. 273–281 (2002) 15. Antoniol, G., Canfora, G., Casazza, G., De Lucia, A., Merlo, E.: Recovering traceability links between code and documentation. IEEE Transactions on Software Engineering, 970–983 (2002) 16. Marcus, A., Maletic, J.I.: Recovering documentation-to-source-code traceability links using latent semantic indexing. In: Proceedings of the 25th International Conference on Software Engineering, pp. 125–135 (2003) 17. Egyed, A., Grünbacher, P.: Automating requirements traceability: Beyond the record & replay paradigm. In: Proceedings of the 17th IEEE International Conference on Automated Software Engineering, pp. 163–171 (2002) 18. Aiken, P., Arenson, B., Colburn, J.: Microsoft computer dictionary. Microsoft Press (2002) 19. Kirova, V., Kirby, N., Kothari, D., Childress, G.: Effective requirements traceability: Models, tools, and practices. Bell Labs Technical Journal 12, 143–158 (2008) 20. Narmanli, M.: A business rule approach to requirements traceability (2010) 21. Ibrahim, S., Idris, N.B., Uk, M.M., Deraman, A.: Implementing a document-based requirements traceability: A case study. In: IASTED International Conference on Software Engineering, pp. 124–131 (2005) 22. Salem, A.M.: Improving software Quality through requirements traceability models. In: IEEE International Conference on Computer Systems and Applications 2006, pp. 1159–1162 (2006) 23. Asuncion, H.U., François, F., Taylor, R.N.: An end-to-end industrial software traceability tool. In: Proceedings of the the 6th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, pp. 115–124. ACM, Dubrovnik (2007) 24. Wang, Y.: Software engineering foundations: A software science perspective. AUERBACH (2008) 25. Forward, A.: Software Documentation–Building and Maintaining Artefacts of Communication (2002) 26. Ambler, S.W., Jeffries, R.: Agile modeling: effective practices for extreme programming and the unified process. Wiley, New York (2002) 27. Sulaiman, S., Idris, N.B., Sahibuddin, S.: Production and maintenance of system documentation: what, why, when and how tools should support the practice (2002) 28. Buckley, J., Mens, T., Zenger, M., Rashid, A., Kniesel, G.: Towards a taxonomy of software change. Journal of Software Maintenance and Evolution 17, 309–332 (2005) 29. Asuncion, H., Taylor, R.N.: Establishing the Connection Between Software Traceability and Data Provenance (2007) 30. Azmi, A., Ibrahim, S., Mahrin, M.N.: A software traceability model to support software testing documentation. In: Proc. 10th IASTED International Conference on Software Engineering, pp. 152–159. IASTED, Innsbruck (2011) 31. da Cruz, J.L., Jino, M., Crespo, A.: PROMETEU-a tool to support documents generation and traceability in the test process (2003) 32. Valderas, P., Pelechano, V.: Introducing requirements traceability support in model-driven development of web applications. Information and Software Technology 51, 749–768 (2009)
Software Maintenance Testing Approaches to Support Test Case Changes – A Review Othman Mohd Yusop and Suhaimi Ibrahim Advanced Informatics School, Universiti Teknologi Malaysia International Campus 54100 Jalan Semarak, Kuala Lumpur, Malaysia {othmanyusop,suhaimiibrahim}@utm.my
Abstract. Software maintenance testing is essential during the software testing phase. All defects found during testing must undergo a re-test process in order to eliminate the flaws, and in doing so test cases need to evolve and change accordingly. In this paper, several maintenance testing approaches, namely the regression test suite approach, heuristic-based approach, keyword-based approach, GUI-based approach and model-based approach, are evaluated based on a software evolution taxonomy framework. Some of the discussed approaches support changes of test cases. From the review, a couple of results are postulated and highlighted, including the limitations of the existing approaches. Keywords: Maintenance Testing, Test Case, Test Suite, Software Change.
1 Introduction Maintenance testing, as defined in the ISTQB standard glossary of terms (ver. 2.0), is "testing the changes to an operational system or the impact of a changed environment to an operational system". There are two types of maintenance testing that relate to changes in artefacts during the maintenance phase: confirmation testing and regression testing. The maintenance testing phase happens after the deployment of the system. Over time, the system is often changed, updated, deleted, extended, etc. during software evolution, and the artefacts that support the system need to be updated concurrently to avoid becoming outdated with respect to the source code. [1] shows that 80% of the overall testing budget goes to retesting the software and that 50% of total software maintenance cost is consumed by retesting alone. Confirmation testing can be briefly defined as re-testing: defects found during testing are corrected, and another test execution takes place to confirm that the failure no longer exists. During the retest, the test environment, data and inputs have to be exactly identical to those used the first time. Even if the confirmation test passes, this does not guarantee that the correction has not introduced defects somewhere else; hence regression testing is required. In order to ensure that the defect does not propagate to other functionalities, regression testing has to be carried out. More specifically, the purpose of regression testing is to verify that modifications in the software or the environment have not caused unintended adverse side effects and that the system still meets its requirements.
In this paper, some maintenance testing approaches are evaluated to find the commonalities among them and to determine to what extent these approaches provide support across the entire spectrum of software development activities, specifically during maintenance testing. The paper is organized as follows: Section 2 gives an overview of the maintenance testing approaches, Section 3 presents the review results, and Section 4 concludes the paper.
2 Overview of Software Maintenance Testing Approaches In this section, several maintenance testing approaches, namely the regression test suite approach, heuristic-based framework approach, keyword-based approach, graphical user interface (GUI) regression testing approach and model-based approach, are evaluated in the respective subsections. 2.1 Regression Test Suite Approach (RTS) When changes are applied to a system, impacted artefacts, i.e. test suites, have to be changed accordingly during the maintenance phase. [2] presented a case study of how test suite maintenance can be done during system evolution caused by changes made to the system during the maintenance phase. The case study made use of reusable test environments and programs. [2] investigated issues in software maintenance through an exploratory study and a follow-up study on change strategies; this study was inspired by [3]. Four phases are involved in the case study process: environment setup, build configuration, execute unit tests and execute functional tests, as shown in Fig. 1 below:
Fig. 1. Case Study Process [2]
Several steps were involved during the test. The first step involves installation of the baseline version, followed by executing the baseline to verify that it is executable and usable. The change-propagation version then follows the same suite, i.e. the installation and execution processes. The adaptation tasks are logged and recorded, and the results undergo a comparative study with different versions of the test cases. Validation of this work is
done via a model called Padgett [4]. The results are then postulated into three classifications: reactivity, researcher bias and respondent bias. 2.2 Heuristic-Based Framework Approach (HBF) Graphical user interfaces (GUIs) are believed to make up a large portion of source code [5]. Testing a GUI is different from traditional testing, which involves the implementation level of source code/programming. This is particularly true when it comes to test case maintenance, especially during regression testing: minor changes in the GUI can cause test cases to malfunction. [5] created an approach based on a heuristic model to address GUI test case maintenance. The approach makes use of two techniques: Capture/Replay, and Elements and Actions. The Capture/Replay technique records end-user gestures, such as mouse movements and keystrokes, and plays them back. The advantage of this approach is that it does not require good programming skills; the issue arises when there is a change in the interface, consequently causing test cases to malfunction and break. Whenever this happens, manual effort is often required to repair some test cases in the test suite. The Elements and Actions approach models a GUI test case as a sequence of actions. Examples are shown in Fig. 2 and Table 1: Fig. 2 shows a sample GUI for a captured test case, whereas Table 1 shows the elements and actions captured during the capture/replay technique.
Fig. 2. Find Dialog Box [5]

Table 1. Sample of Test Case for Find Dialog Box [5]

GUI Elements            Actions
FindTextBox             setText('GUI')
CaseSensitiveCheckBox   'click'
FindButton              'click'
CancelButton            'click'
Fig. 3 shows the newly captured elements and actions, while Table 2 shows a sample heuristics table.
Fig. 3. Modified Find Dialog Box [5]

Table 2. Sample Heuristic Result [5]
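The elements-and-actions representation, together with a simple name-similarity heuristic for repairing a broken step, can be sketched as follows. The heuristic (difflib string matching) and the widget names are assumptions chosen only to illustrate the general idea of [5], not its actual algorithm.

```python
import difflib

# A GUI test case as an ordered list of (element, action) pairs, as in Table 1.
test_case = [("FindTextBox", "setText('GUI')"),
             ("CaseSensitiveCheckBox", "'click'"),
             ("FindButton", "'click'"),
             ("CancelButton", "'click'")]

# Elements present in the modified dialog (assumed: the check box was renamed).
new_elements = {"FindTextBox", "MatchCaseCheckBox", "FindButton", "CancelButton"}

def repair(steps, available):
    """Remap each step to the closest surviving element name, or drop it."""
    repaired = []
    for element, action in steps:
        if element in available:
            repaired.append((element, action))
            continue
        match = difflib.get_close_matches(element, available, n=1, cutoff=0.4)
        if match:                           # heuristic: rename to the best match
            repaired.append((match[0], action))
        # otherwise the step is unrepairable and is dropped
    return repaired

print(repair(test_case, new_elements))
```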
2.3 Keyword-Based Approach (KB) [6] used a keyword-based approach for software testing automation and maintenance. They categorised test automation into five categories, namely: test management, unit test, test data generation, performance test and functional/system/regression test, and picked test execution for test automation. The basic principle of this keyword-based approach is that test engineering tasks are separated into specific roles; the identified roles are test designer, automation engineer and test executor. As test designer, the person forms test cases using keywords and documents them in a spreadsheet. The automation engineer codes up the keyword scripts using a scripting tool and language. Finally, the test executor runs the tests directly from the spreadsheet. The approach is said to improve on the Capture/Playback technique by reducing the amount of test script. The approach is shown in Fig. 4 below, followed by the test results (Fig. 5) of the authors' research study.
Fig. 4. Keyword Based Approach [6]
Fig. 5. Result as Return of Investment [6]
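A minimal keyword-driven executor, in the spirit of this separation of roles, might look like the sketch below. The keywords, their implementations and the spreadsheet rows are hypothetical stand-ins for the automation engineer's scripts and the test designer's spreadsheet.

```python
# Keyword implementations written once by the automation engineer.
def open_app(name):           print(f"opening {name}")
def enter_text(field, value): print(f"typing '{value}' into {field}")
def click(widget):            print(f"clicking {widget}")
def verify_title(expected):   print(f"verifying window title == '{expected}'")

KEYWORDS = {"OpenApp": open_app, "EnterText": enter_text,
            "Click": click, "VerifyTitle": verify_title}

# Rows as the test designer would write them in a spreadsheet: keyword + arguments.
spreadsheet = [("OpenApp", ["Notepad"]),
               ("EnterText", ["FindTextBox", "GUI"]),
               ("Click", ["FindButton"]),
               ("VerifyTitle", ["Find"])]

def run(rows):
    """The test executor: dispatch each row to its keyword implementation."""
    for keyword, args in rows:
        KEYWORDS[keyword](*args)

run(spreadsheet)
```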
2.4 Graphical User Interface (GUI) Regression Testing Approach (GUIs-RT) [7] claims that test case maintenance through GUIs is approachable and that not much research has been done on it. This approach provides useful insight into test suite maintenance through GUI maintenance. The approach is called GUI regression testing, and it determines the usability of test suites after changes are imposed on the system's GUIs. The technique consists of two parts: a checker and a repairer. The checker is responsible for categorising test cases into usable and unusable; if a test case falls into the latter category, the repairer tries to repair it. Once done, the repaired test cases are labeled and stored as repairable test cases. Details are shown in Fig. 6 below:
Fig. 6. Regression Testers’ Components [7]
According to [7], repairing test cases is an effective way to reduce the cost of creating new test cases. The details of each component have been explained by the authors in their research studies and are not repeated in this sub-section. Results were compared through several case studies, and the execution times of the bit-vector and graph-matching checkers were measured, as shown in Table 3 and Fig. 7 respectively below:
Table 3. Time Taken at a Glance [7]
Fig. 7. Comparing Bit-Vector with Graph Matching Checker [7]
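The checker/repairer split can be illustrated as below. The GUI model is reduced to a mapping from widgets to their legal actions, which is an assumption made only to keep the sketch self-contained; it does not reproduce the bit-vector or graph-matching checkers of [7].

```python
# Simplified GUI model after the change: widget -> set of legal actions.
MODIFIED_GUI = {"FindTextBox": {"setText"}, "FindButton": {"click"},
                "CloseButton": {"click"}}            # CancelButton was renamed
RENAMINGS = {"CancelButton": "CloseButton"}          # assumed known mapping

def check(test_case, gui):
    """Checker: the test case is usable only if every step is still legal."""
    return all(w in gui and a in gui[w] for w, a in test_case)

def repair(test_case, gui):
    """Repairer: apply known renamings, then drop steps that remain illegal."""
    renamed = [(RENAMINGS.get(w, w), a) for w, a in test_case]
    return [(w, a) for w, a in renamed if w in gui and a in gui[w]]

tc = [("FindTextBox", "setText"), ("FindButton", "click"), ("CancelButton", "click")]
print(check(tc, MODIFIED_GUI))    # False -> unusable as-is
print(repair(tc, MODIFIED_GUI))   # repaired sequence, now usable
```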
2.5 Model-Based Approach (MB) [8] proposed a model-based approach for maintenance testing. Models are the main source for this approach, and tools that support models and source code are presumed to be established, i.e. they support automatic generation between models and source code. UML class and sequence diagrams are the two inputs for test case generation. While generating test cases from the models, an infrastructure composed of a test-related model and fine-grained traceability is created. The infrastructure, or the approach, is depicted in Fig. 8 below:
Fig. 8. The Approach Overview [8]
This approach is divided into two phases. The first phase is the creation of models and traceability, and the second phase utilises the models and traceability created in the first phase as well as the modified UML models. During the first phase, two steps are executed: (1) transformation of the sequence diagram into a model-based control flow graph (mbcfg), and (2) conversion of the mbcfg information into a test generation hierarchy while preserving the traceability model. Abstract test cases are produced, and during further transformation the abstract test cases are turned into concrete test script skeletons. The second phase involves four activities: (1) comparing the models to find differences, hence a differencing model, (2) converting the sequence diagram into a modified mbcfg, (3) using the mbcfg and the differencing model during a pairwise graph traversal between the original and modified mbcfg, and finally (4) classifying test cases through the identification of selected dangerous entities. This approach supports modification at the model level, i.e. classes and sequence diagrams.
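The pairwise traversal of the original and modified control-flow graphs can be approximated by the sketch below, which marks edges that differ as "dangerous" and selects the test cases whose recorded traces cross them. The graphs, traces and selection rule are illustrative assumptions, not the authors' mbcfg algorithm.

```python
def edges(graph):
    """Flatten an adjacency-list graph into a set of directed edges."""
    return {(src, dst) for src, dsts in graph.items() for dst in dsts}

original = {"A": ["B"], "B": ["C"], "C": []}
modified = {"A": ["B"], "B": ["D"], "D": ["C"], "C": []}   # behaviour changed after B

dangerous = edges(original) ^ edges(modified)   # edges added or removed

# Traceability from the first phase: which edges each test case exercised.
traces = {"TC-1": {("A", "B"), ("B", "C")},
          "TC-2": {("A", "B")}}

selected = [tc for tc, trace in traces.items() if trace & dangerous]
print(selected)   # only TC-1 needs to be re-run or revised
```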
3 Result Reviews of Maintenance Testing Approaches The evaluation of the existing maintenance testing approaches is done by benchmarking against the software evolution framework of [9]. [9] defined the framework criteria based on the characterising factors of the change mechanism and its influencing factors. The evolution framework is organised into four dimensions: temporal properties, object of change, system properties and change support. These dimensions describe aspects of software changes in terms of when, where, what and how. For this particular review, the research candidate has cross-checked the approaches against one of the dimensions, namely Object of Change (where). The other dimensions are not included in this review, because the nature of the candidate's research is based on scopes such as change of artefacts, granularity, impact and change propagation. 3.1 Result Review I [9] classified the location of changes, i.e. answering where changes happen, as the second logical dimension of the taxonomy. Within this dimension, four influencing factors are highlighted, namely artefacts, granularity, impact and change propagation. Artefacts are sub-categorised into static evolution and dynamic evolution. The level of artefact abstraction is divided into three granularity sub-categories: coarse, medium and fine. The impact of the changes indicates whether the changes have influence at the local or the global/system-wide level of abstraction. Changes made can spread out to other entities; this influencing factor is called change propagation. Based on Table 4, all maintenance testing approaches are checked against the object of change dimension. For the artefacts factor, RTS caters for a set of test cases, or a test suite. The granularity of the artefacts is still coarse, i.e. at file level, though the impact can span from local to system-wide. RTS does cover change propagation by having a change propagation cycle in its approach.
Table 4. Object of Change for Maintenance Testing Approaches – Result Review I

RTS: artefacts: test suites (static evolution); granularity: coarse; impact: locally and system-wide; level of abstraction: test cases; change propagation: yes.
HBF: artefacts: test cases (static evolution); granularity: fine; impact: locally and system-wide; level of abstraction: graphical user interface; change propagation: no.
KB: artefacts: test cases (static evolution); granularity: coarse; impact: locally; level of abstraction: testing script; change propagation: no.
GUIs-RT: artefacts: test cases (static evolution); granularity: coarse; impact: locally and system-wide; level of abstraction: test suite; change propagation: no.
MB: artefacts: classes and sequence diagrams (static evolution); granularity: medium; impact: locally; level of abstraction: model; change propagation: yes.
HBF has a similar set of artefacts to RTS, but goes into a little more detail on the level of artefacts, which is the test case instead of the test suite. Since HBF covers artefact elements based on the graphical user interface, each field inside a window is treated as a single possible test case; thus HBF provides finer granularity than RTS. The impact can be traced locally or system-wide, although no change propagation tracing is provided. KB uses keywords to build up test cases. The level of the test cases is still coarse in terms of granularity. All the keywords are typed manually into a test script and are used to assist the programming task during test automation. It has local impact and does not support change propagation. GUIs-RT is another approach that takes graphical elements as its test case artefacts. The focus is more towards maintaining the user interface; therefore all test cases are sourced from graphical elements. The level of granularity is still coarse, and the impact can be traced locally and system-wide, but with no change propagation. Finally, the MB approach makes use of models as input artefacts; specifically, class and sequence diagrams are used as the sources. These two UML elements are considered medium granularity, and the impact can be traced locally, with change propagation support. 3.2 Result Review II Besides evaluating the approaches from the taxonomy perspective of [9], the research candidate carried out a comparative study among the maintenance testing approaches. Some generic or common features, namely contribution, limitation, level of abstraction, traceability support, version control support, tool support, result of research and validity to threat, were included and tabulated in the following Table 5:
Table 5. Maintenance Testing Approaches Commonality Features – Result Review II

RTS: contribution: test suite automation; limitation: manual execution for encapsulated functions; level of abstraction: test suite; traceability support: no; version control: no; tool support: no; validity to threat: the Padgett model developed by Robson (2002) is used to validate the approach; result of research: comparative study of performance and effectiveness across system versions.

HBF: contribution: support for GUI test cases; limitation: larger GUI size causes an adverse effect on its accuracy; level of abstraction: GUI elements, i.e. buttons, text boxes; traceability support: no; version control: no; tool support: GUIAnalyzer; validity to threat: subject applications and case study; result of research: accuracy can be sought using multiple heuristic sets.

KB: contribution: support for test automation through keywords; limitation: scalability issues; level of abstraction: words; traceability support: no; version control: no; tool support: no; validity to threat: not stated; result of research: the keyword approach is better than the Capture/Playback approach, which is very costly and manually done; the role-splitting strategy is favourable over the others.

GUIs-RT: contribution: support for test suite maintenance during the GUI regression testing method; limitation: obsolete test cases can still occur; level of abstraction: test suite; traceability support: no; version control: no; tool support: GUITAR; validity to threat: evaluated using the criteria of efficiency, precision and safety (Rothermel & Harrold, 1996); result of research: an efficient and effective new GUI regression test approach.

MB: contribution: model-based test case generation and maintenance; limitation: if more modified operations are called, precision is adversely affected; level of abstraction: models, classes, source code; traceability support: yes; version control: no; tool support: RTStool & DejaVOO; validity to threat: comparative algorithms and case study; result of research: a combination of model-based test generation and regression test selection.
4 Conclusion The sources used to evaluate the approaches are accepted, published international papers. The justifications are based on the software evolution taxonomy mentioned in [9], whereas the other evaluation was based on common criteria such as limitation, contribution, etc. In this paper we presented the evaluation results for maintenance testing approaches. These inputs can be used as future references should the research candidate wish to take the research study further.
References 1. Harrold, M.J.: Reduce, reuse, recycle, recover: Techniques for improved regression testing. In: 2009 IEEE International Conference on Software Maintenance, Edmonton, AB, Canada, pp. 5–5 (2009) 2. Skoglund, M., Runeson, P.: A case study on regression test suite maintenance in system evolution. In: Proceedings of 20th IEEE International Conference on Software Maintenance 2004, Chicago, IL, USA, pp. 438–442 (2004) 3. Rajlich, Gosavi: A Case Study of Unanticipated Incremental Change. In: Proceedings of the International Conference on Software Maintenance (ICSM 2002), p. 442. IEEE Computer Society, Los Alamitos (2002) 4. Robson, C.: Real World Research: A Resource for Social Scientists and PractitionerResearchers (Regional Surveys of the World). Blackwell Publishing Limited, Malden (2002) 5. McMaster, S., Memon, A.M.: An Extensible Heuristic-Based Framework for GUI Test Case Maintenance. In: Proceedings of the IEEE International Conference on Software Testing, Verification, and Validation Workshops, pp. 251–254. IEEE Computer Society, Los Alamitos (2009) 6. Wissink, T., Amaro, C.: Successful Test Automation for Software Maintenance. In: 2006 22nd IEEE International Conference on Software Maintenance, Philadelphia, PA, USA, pp. 265–266 (2006) 7. Memon, A.M., Soffa, M.L.: Regression testing of GUIs. In: Proceedings of the 9th European software Engineering Conference Held Jointly with 11th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 118–127. ACM, Helsinki (2003) 8. Naslavsky, L., Ziv, H., Richardson, D.J.: A model-based regression test selection technique. In: 2009 IEEE International Conference on Software Maintenance, Edmonton, AB, Canada, pp. 515–518 (2009) 9. Buckley, J., Mens, T., Zenger, M., Rashid, A., Kniesel, G.: Towards a taxonomy of software change: Research Articles. J. Softw. Maint. Evol. 17, 309–332 (2005)
Measuring Understandability of Aspect-Oriented Code Mathupayas Thongmak1 and Pornsiri Muenchaisri2 1
Department of Management Information Systems, Thammasat Business School, Thammasat University, Thailand [email protected] 2 Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Thailand [email protected]
Abstract. Software maintainability is one of the important factors that developers should be concerned with, because two-thirds of a software system's lifetime cost involves maintenance. Understandability is one of the sub-characteristics that describe software maintainability. Aspect-oriented programming (AOP) is an alternative software development paradigm that aims to increase understandability, adaptability, and reusability. It focuses on crosscutting concerns by introducing a modular unit called an "aspect". Based on the definition of understandability as "the related attributes of software components that users have to put their effort on to recognizing the logical concept of the components", this paper proposes seven metrics for evaluating the understandability of aspect-oriented code using different levels of dependence graphs. The metrics are applied to two versions of an aspect-oriented program to give an illustration. Keywords: Software Metrics, Aspect-Oriented, Understandability.
1 Introduction Software quality has become essential to good software development. Quality factors include efficiency, reliability, reusability, maintainability, etc. Software maintainability is an important factor that developers should be concerned with, because two-thirds of a software system's lifetime cost involves maintenance [1]. The maintainability characteristic is composed of three sub-characteristics: Testability, Understandability, and Modifiability [2]. Regarding the understandability sub-characteristic, the factors that influence the comprehensibility of software are the internal process of humans and the internal quality of the software itself [3, 4]. Aspect-oriented programming (AOP) is an alternative software development paradigm that aims to increase comprehensibility, adaptability and reusability [5]. It focuses on crosscutting concerns by introducing a modular unit called an "aspect". Many research works have studied aspect-oriented software measurement and understandability assessment. Zhao proposes dependence graphs for aspect-oriented programs and presents cohesion metrics and structural metrics based on the graphs [6, 7, 8]. Shima et al. introduce an approach to experimental evaluation of software understandability from the internal process of humans [4]. Jindasawat et al. investigate correlations between object-oriented design metrics and two
sub-characteristics of maintainability: understandability and modifiability [9]. Sheldon et al. propose metrics for maintainability of class inheritance hierarchies [10]. The metrics are evaluated from understandability and modifiability subcharacteristics. Genero et al. study relationships between size and structural metrics of class diagrams and maintainability time [11]. This paper aims to propose metrics for understandability of AspectJ code structure. We define understandability as “The related attributes of software components that users have to put their effort on to recognizing the logical concept of the components”. Some related works are discussed in section 2. Section 3 presents the notion of Aspect-Oriented Programming and AspectJ. Section 4 shows dependency graphs adapted from [6] as a representation for aspect-oriented program structure. Metrics for understandability are introduced in section 5. The paper ends with some conclusion and future work.
2 Related Work This section summarizes the related work. Some of the works address understandability measurement, while others concern aspect-oriented software metrics. The dependence graphs used in our work are adapted from [6]. Zhao introduces dependency graphs for aspect-oriented programs and defines some metrics based on his graphs. In [8], Zhao proposes cohesion metrics to evaluate how tightly the attributes and modules of aspects cohere, and in [7] he presents complexity metrics that measure the complexity of an aspect-oriented program from different levels of dependence graphs. Both Jindasawat et al. and Genero et al. use controlled experiments to investigate correlations between object-oriented design metrics and two sub-characteristics of maintainability: understandability and modifiability [9, 11]. Jindasawat et al. find relationships between metrics from class and sequence diagrams and understandability and modifiability exam scores. Genero et al. study the correlation between metrics from class diagrams and understandability and modifiability times. Sheldon et al. propose objective metrics for measuring the maintainability of class inheritance hierarchies [10]; the metrics are composed of understandability and modifiability metrics. Shima et al. introduce an approach to experimental evaluation of software understandability [4]. They propose software overhaul as an approach to externalize the process of understanding software systems, together with probabilistic models to evaluate understandability.
3 Aspect-Oriented Programming with AspectJ The aspect-oriented programming (AOP) paradigm, also called aspect-oriented software development (AOSD), attempts to solve the code tangling and scattering problems by modularizing and encapsulating crosscutting concerns [12]. To encapsulate various types of crosscutting concerns, AOP introduces a new construct called an aspect. An aspect is a part of a program that crosscuts its core concerns, the base code (the non-aspect part of a program), by applying advice over a number of join points, called a pointcut. Join points are well-defined points in the flow of a program where the
aspect can apply along the program execution. Join points include method execution, the instantiation of an object, and the throwing of an exception. A pointcut is a set of join points. Whenever the program execution reaches one of the join points described in the pointcut, a piece of code associated with the pointcut (called advice) is executed. This allows a programmer to describe where and when additional code should be executed in addition to an already defined behavior. Advices are method-like constructs that provide a way to express crosscutting action at the join points that are captured by a pointcut [13]. There are three kinds of advice: before, after, and around. A before advice is executed prior to the execution of join point. An after advice is executed following the execution of join point. An around advice surrounds the join point’s execution. It has the ability to bypass execution, continue the original execution, or cause execution with an altered context. AspectJ is the first and the most popular aspect-oriented programming language [12]. Two ways that AspectJ interact with the base program are pointcut and advice, and inter-type declarations. Pointcut and advice are described above. The join points in AspectJ are method call, method or constructor execution, field read or write access, exception handler execution, class or object initialization, object preinitialization, and advice execution [13]. An inter-type declaration, also called introduction, is a mechanism that allows the developer to crosscut concerns in a static way [14]. Six types of possible changes through Inter-type declaration are adding members (methods, constructors, fields) to types (including other aspects), adding concrete implementation to interfaces, declaring that types extend new types or implement new interfaces, declaring aspect precedence, declaring custom compilation errors or warnings, and converting checked exceptions to unchecked.
Fig. 1. An example of AspectJ program
Figure 1 shows an AspectJ program from [13]. The program contains one aspect, MinimumBalanceRuleAspect, and three classes, Account, SavingsAccount, and InsufficientBalanceException. The aspect owns one method getCount() and one attribute throwInsufficientCount. It adds one method getAvailableBalance() and one attribute _minimumBalance to the class Account, and also introduces two advices related to the pointcuts newSavingAccount and throwInsufficientBalance respectively. We apply this example to show the idea of understandability effort measurement in the next section.
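Since the listing of Figure 1 is not reproduced in this text, the following AspectJ sketch is added here only to illustrate the constructs just described: an aspect attribute and method, two inter-type declarations, two pointcuts, and the advice attached to them. It is a simplified, hypothetical reconstruction based on the description above, not the program from [13]; in particular, the advice bodies, the pointcut expressions, and the assumed public getBalance() accessor on Account are placeholders.

```aspectj
// Illustrative sketch only; not the listing of Figure 1.
public aspect MinimumBalanceRuleAspect {

    // Aspect attribute and method described in the text
    private int throwInsufficientCount = 0;

    public int getCount() {
        return throwInsufficientCount;
    }

    // Inter-type declarations: add an attribute and a method to class Account
    private float Account._minimumBalance = 0;

    public float Account.getAvailableBalance() {
        // Assumes Account exposes a public getBalance() accessor (hypothetical)
        return getBalance() - _minimumBalance;
    }

    // Pointcut capturing the creation of a SavingsAccount object
    pointcut newSavingAccount(Account account)
        : execution(SavingsAccount.new(..)) && this(account);

    // Pointcut capturing the points where the insufficient-balance exception is raised
    pointcut throwInsufficientBalance()
        : call(InsufficientBalanceException.new(..));

    // Advice tied to newSavingAccount: enforce the minimum-balance rule
    after(Account account) : newSavingAccount(account) {
        account._minimumBalance = 25;
    }

    // Advice tied to throwInsufficientBalance: count the violations
    before() : throwInsufficientBalance() {
        throwInsufficientCount++;
    }
}
```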
4 Dependence Graphs for Aspect-Oriented Software We measure the effort required for understanding aspect-oriented software based on aspect-oriented software dependence graph (ASDG) adapted from [6, 7, 15]. There are three levels of dependency graphs representing an aspect-oriented system, i.e., module-level, class/aspect-level, and system-level. To produce the ASDG of aspectoriented program, we construct the software dependence graph (SDG) for non-aspect code from the class dependence graphs (CDGs) containing method dependence graphs (MDGs) first, then construct the aspect interprocedural dependence graphs (AIDGs) to represent aspects from advice dependence graphs (ADGs), introduction dependence graphs (IDGs), pointcut dependence graphs (PDGs) and method dependence graphs. Finally, we determine weaving points between SDG and AIDGs to form the ASDG. The rest of this section explains these dependency graphs from [6, 15] and describes the graph modification points to be more suitable for understandability effort measurements. 4.1 Module-Level Dependence Graphs In aspect-oriented systems, an aspect contains several types of module, i.e., advice, intertype declaration, pointcut, and method, and a class contains only one type of module called method [16]. In this paper, we apply three types of module-level dependence graphs: method dependence graph (MDG), advice dependence graph (ADG), and introduction dependence graph (IDG) to represent method, advice, and method introduction respectively, and introduce pointcut dependence graph (PDG) to show a pointcut. The MDG is an arc-classified digraph whose vertices represent statement or predicate expressions in the method. An MDG also includes formal parameter vertices and actual parameter vertices to model parameter passing between models. A formal-in vertex and a formal-out vertex are used to show a formal parameter of the method and a formal parameter that may be modified by the method. An actual-in vertex and an actual-out vertex are used to present an actual parameter at call site and an actual parameter that may be modified by the called method. There are two types of arcs representing dependence relationships in the graph, i.e., control dependence, data dependence, and call dependence. Control dependence represents control conditions
on which the execution of a statement or expression depends in the method. It is used to link between method and statement, statement and statement, method and formal parameter, and method and actual parameter. Data dependence represents the data flows between statements in the method. It is used to connect statement and formal parameter and to join statement and actual parameter. Call dependence represents call relationships between statements of a call method and the called method. The examples of MDG are shown as parts of CDGs in Figure 2. The ADG and IDG are constructed with the similar notations with the MDG. ADGs and IDGs are displayed as parts of Figure 4. For the alteration of module-level dependence graphs, we add parameter in signature vertices to the MDG and IDG to represent parameter-in and parameter-out defined in the signature of a method and introduce parameters in signature of method dependence arc to link the method or the method introduction with parameters in their signature. We exclude local variables in order to decrease the graphs’ complexity. We also omit control dependence arcs between methods and formal parameters/actual parameters in the situation that data dependence is linked between method’s statement and formal/actual parameters to avoid redundant metric’s count. We show these control dependence arcs only in case there is no explicit relationship between statements of a method and its parameters. Moreover, we add PDGs to represent poincuts in an aspect-oriented program. In PDGs, the pointcut vertices are added to model the poincuts in the program. Each pointcut has its own parameter in signature vertices. We also link pointcut vertices and blank vertices using crosscutting dependence arcs to show possible joinpoints that the poincuts will crosscut. 4.2 Class/Aspect-Level Dependence Graph The class dependence graph (CDG) and the aspect interprocedural dependence graph (AIDG) are used to depict a single Java class and a single aspect in the program respectively. Figure 2 shows CDGs of class Account, class SavingsAccount, and class InsufficientBalanceException. The CDG is a digraph consisting of a collection of MDGs which each represents a single method in the class. In this level, parameter dependence arcs are added to connect actual-in and formal-in, and formal-out and actual-out vertices to model parameter passing between the methods in the class. The class membership dependence arcs are added to show that each method is a member of the class. The methods or attributes of superclasses are also inherited to the subclass. For the CDG, we add attribute of class vertices to model member variables of a class. The vertices are linked to the class vertex by class membership dependence arc. We also add call dependence arcs to blank vertices to represent call relationships between vertices inside the class and vertices outside the class and add parameter dependence arcs to blank vertices to represent parameter passing between internal class’s methods and external class’s methods. We did not draw classes outside the scope of source code such as API classes.
Fig. 2. (a) A CDG for class Account, (b) A CDG for class SavingAccount, and (c) A CDG for class InsufficientBalanceException. [Figure legend, not reproduced in full: the arc types are class/aspect-membership, parameters-in-signature-of-method, control, data, parameter, call, crosscutting, and advice dependence arcs; the labels f1_in–f6_in and f1_out–f3_out denote formal-in/out parameter vertices, a1_in–a8_in actual-in parameter vertices, and p1_in–p6_in, p1_out, p2_out parameters in method signatures.]
Fig. 3. An SDG for non-aspect code
Fig. 4. An AIDG for aspect MinimumBalanceRule. [Figure legend, not reproduced in full: the same arc types and parameter vertices as in Fig. 2, plus advice dependence arcs.]
Fig. 5. An ASDG for aspect-oriented program in Figure 1
The AIDG is a digraph that consists of a number of ADGs, IDGs, PDGs, and MDGs. The structure of AIDG is similar to CDG. Figure 4 depicts an AIDG for MinimumBalanceRuleAspect of program in Figure 1. The aspect membership dependence arcs are used to show memberships in the aspect. For the modifications of AIDG, we add attribute of aspect vertices and attribute introduction vertices to represent member variables or attribute introductions of an aspect. Like CDG, we put in call dependence arcs to blank vertices to represent call relationships between inside class vertices and outside class vertices and add parameter dependence arcs to blank vertices to represent parameter passing between internal class’s methods and external class’s methods. We also add advice dependence arcs to show the relationship between ADGs and their PDG. 4.3 System-Level Dependence Graphs Graphs in this level consist of the software dependence graphs (SDG) for a Java program before weaving and the aspect-oriented system dependence graph (ASDG) for complete aspect-oriented program. An SDG combines a collection of MDGs from each class in the non-aspect code. The graph starts from the main() method, then the main method calls other methods. In this level, a SDG explicitly shows the relationships between methods in different classes, but cut off the class vertices and class membership dependence arcs to clearly show flows of calling between methods. An example of SDG is displayed in Figure 3. Figure 5 presents an ASDG for complete aspect-oriented program in Figure 1. An ASDG are assembled from an SDG for non-aspect code and AIDGs. In the ASDG, crosscutting relationships between the joinpoints in non-aspect code and the pointcuts are explicitly shown.
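Because the metrics defined in the next section are computed purely from counts of the different arc types, a dependence graph can be represented very simply for measurement purposes. The following Java sketch is an illustration added for concreteness; it is not part of the original approach, and the arc-type names merely mirror the notation used in this section.

```java
import java.util.EnumMap;
import java.util.Map;

// One possible encoding of a dependence graph as a multiset of typed arcs,
// sufficient for the arc-counting metrics of Section 5 (illustrative only).
public class DependenceGraph {

    public enum ArcType {
        CONTROL, DATA, CALL, PARAMETER, PARAMETER_IN_SIGNATURE,
        CLASS_OR_ASPECT_MEMBERSHIP, CROSSCUTTING, ADVICE
    }

    private final Map<ArcType, Integer> arcCounts = new EnumMap<>(ArcType.class);

    // Record one more arc of the given type
    public void addArc(ArcType type) {
        arcCounts.merge(type, 1, Integer::sum);
    }

    // Number of arcs of one type (NOA for that type)
    public int count(ArcType type) {
        return arcCounts.getOrDefault(type, 0);
    }

    // Total number of arcs of all types
    public int totalArcs() {
        return arcCounts.values().stream().mapToInt(Integer::intValue).sum();
    }
}
```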
5 Metrics for Understandability In this section, on the assumption that software understanding effort can be an indicator of understandability, we define some metrics for understandability based on each level of dependency graph, i.e., module-level, class/aspect-level, and system-level. First of all, we give a guideline for applying our metrics. The proposed metrics are composed from various types of dependency arcs, and each type of arc has its own level of comprehension effort. Each metric is then computed as follows: U = w1*NOA1 + w2*NOA2 + … + wn*NOAn, where U is the understandability metric value, wi is a weight value for each type of dependency, and NOAi is the number of appearances of arcs of that type. We do not discuss the weight values here; we simply assume that they are determined by experts. In each metric described afterwards, we assume that all weights are equal to 1, so we omit the weight variables from the equations.
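As an added illustration (not part of the original paper), the generic formula can be read as a weighted arc count over the DependenceGraph sketch above. With all weights equal to 1 it reduces to summing the arcs of the selected types, which is exactly how the module-, class/aspect- and system-level metrics below are evaluated.

```java
import java.util.Map;

// Sketch of U = w1*NOA1 + ... + wn*NOAn over a DependenceGraph (illustrative only).
public class UnderstandabilityMetric {

    public static double understandability(DependenceGraph g,
                                            Map<DependenceGraph.ArcType, Double> weights) {
        double u = 0.0;
        for (Map.Entry<DependenceGraph.ArcType, Double> e : weights.entrySet()) {
            u += e.getValue() * g.count(e.getKey());   // w_i * NOA_i
        }
        return u;
    }

    // Example with unit weights: the arc types counted here follow the
    // definition of U_MDG/IDG given in Section 5.1 below.
    public static double uMdg(DependenceGraph g) {
        return understandability(g, Map.of(
                DependenceGraph.ArcType.CONTROL, 1.0,
                DependenceGraph.ArcType.DATA, 1.0,
                DependenceGraph.ArcType.PARAMETER_IN_SIGNATURE, 1.0,
                DependenceGraph.ArcType.CALL, 1.0));
    }
}
```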
5.1 Module-Level Metrics Metrics in this level are defined based on the MDGs for methods, ADGs for advices, IDGs for method introductions, and PDGs for pointcuts. Attached to each definitions, we illustrate our measurement examples based on program and graphs (not included the grey highlights) in Figure 1-5. To understand each method or method introduction, we have to understand its parameters, its statements in the method/method introduction, the called methods, and the parameters of each statement (both formal and actual parameters). So we define understandability efforts for MDG and IDG as: UMDG/IDG = the number of control dependence arcs + the number of data dependence arcs + the number of parameter in signature of method dependence arcs + the number of call dependence arcs For each advice, we have to comprehend its parameters, its statements in the method/method introduction, the called methods, the parameters-in/out of each statement, and the pointcuts that the advice depends on. The effort for understanding ADG is calculated as: UADG = the number of control dependence arcs + the number of data dependence arcs + the number of parameter in signature of method dependence arcs + the number of call dependence arcs + the number of advice dependence arcs To comprehend each pointcut, we have to know its parameters and its selected join points. Hence, we have the following metric: UPDG = the number of parameter in signature of method dependence arcs + the number of crosscutting dependence arcs For example in Figure 2, UMDG:mc3 = 4, UMDG:mc5 = 8, UMDG:mc7 = 14, UMDG:mc13 = 3, UMDG:mc15 = 5, UMDG:mc18 = 5, UMDG:mc22 = 10, and UMDG:mc27 = 3. In Figure 4, UMDG:ma33 = 3, UIDG:mi34 = 4, UPDG:pe36 =.2, UPDG:pe37 = 3, UADG:ae38 = 4, and UADG:ae40 = 12. 5.2 Class/Aspect-Level Metrics We define some class/aspect level metrics grounded on CDGs for classes and AIDGs for aspects. In this level, the relationships between modules are explicitly shown, so we add counts of class membership dependencies and parameter dependencies to the measurements. Under the assumption that we use effort only once to understand repeated called methods, the redundant number of calls to the same method are omitted from the equations. Then, the metric for CDG understandability are defined as: UCDG = summation of UMDGs for all inherited methods and methods in the class + the number of class membership dependence arcs + the number of parameter dependence arcs – the number of redundant call dependence arcs count or UCDG = the number of all dependence arcs in CDG – the number of redundant call dependence arcs count in CDG
The effort for understanding the AIDG is calculated as: UAIDG = summation of UMDGs for all inherited methods and methods in the aspect + summation of UADGs for all advices in the aspect + summation of UPDGs for all pointcuts in the aspect + the number of aspect membership dependence arcs + the number of parameter dependence arcs – the number of redundant call dependence arcs, or UAIDG = the number of all dependence arcs – the number of redundant call dependence arcs. For instance, in Figure 2, UCDG:ce0 = 42, UCDG:ce17 = 60, and UCDG:ce26 = 4. In Figure 4, UAIDG:ase29 = 38. 5.3 System-Level Metrics Lastly, we propose metrics at the system level based upon the SDG for non-aspect code and the ASDG for the whole aspect-oriented system. The metric for the Java-based code is proposed as follows: USDG = summation of UCDGs for all classes in non-aspect code – summation of redundant UMDG counts in all CDGs – the number of redundant call dependence arcs not shown in CDGs, or USDG = the number of all dependence arcs + the number of class membership dependence arcs in all CDGs – the number of redundant call dependence arcs. The last metric is used to measure the total understandability effort of an aspect-oriented program. It is measured as: UASDG = USDG for the non-aspect code + summation of UAIDGs for all aspects – the number of redundant call dependence arcs not shown in the SDG and AIDGs, or UASDG = the number of all dependence arcs + the number of class membership dependence arcs in all CDGs + the number of aspect membership dependence arcs in all AIDGs – the number of redundant call dependence arcs. For the program in Figure 3, USDG = 72; in Figure 5, UASDG = 109. 5.4 An Example The measurement examples above are based on the program in Figure 1, not including the grey parts. After altering the program by adding the code in the grey highlights (adding one method debit() to class SavingAccount and modifying the set of join points in pointcut newSavingAccount), the measurement values of all metrics are as follows. The values of the metrics that have changed are shown in bold face. For module-level, the metrics in Figure 2: UMDG:mc3 = 4, UMDG:mc5 = 8, UMDG:mc7 = 14, UMDG:mc13 = 3, UMDG:mc15 = 5, UMDG:mc18 = 5, UMDG:mc20 = 8, UMDG:mc22 = 10, and
UMDG:mc27 = 3. In Figure 4: UMDG:ma33 = 3, UIDG:mi34 = 4, UPDG:pe36 = 4, UPDG:pe37 = 3, UADG:ae38 = 4, and UADG:ae40 = 12. For the class/aspect level, the metrics in Figure 2: UCDG:ce0 = 42, UCDG:ce17 = 60, and UCDG:ce26 = 4. In Figure 4: UAIDG:ase29 = 40. Finally, the system-level metrics are, in Figure 3, USDG = 78 and, in Figure 5, UASDG = 117.
6 Conclusion and Future Work This paper proposes seven objective metrics for evaluating the understandability of aspect-oriented software from the understanding effort required. These metrics are based on three levels of dependency graphs mapped from aspect-oriented program code, i.e., MDGs, ADGs, IDGs, and PDGs at the module level, CDGs and AIDGs at the class/aspect level, and the SDG and ASDG at the system level. They are composed by summing the number of each type of dependency arc in the graphs. The measures are applied to two versions of an aspect-oriented program to give an illustrative example. For further research, we plan to find more objective metrics related to understandability or to other maintainability sub-characteristics. In addition, thresholds for the proposed metrics should be explored to serve as a guideline for assessing the results.
References 1. Page-Jones, M.: The Practical Guide to Structured System Design. Yourdon Press, New York (1980) 2. Fenton, N.E., Pfleeger, S.L.: Software Metrics: A Rigorous and Practical Approach, 2nd edn. International Thomson Computer Press (1996) 3. Informatics Institute, http://www.ii.metu.edu.tr/~ion502/demo/ch1.html 4. Shima, K., Takemura, Y., Matsumoto, K.: An Approach to Experimental Evaluation of Software Understandability. In: International Symposium on Empirical Software Engineering (ISESE 2002), pp. 48–55 (2002) 5. Stein, D., Hanenberg, S., Unland, R.: A UML-based Aspect-Oriented Design Notation. In: 1st International Conference on Aspect-Oriented Software Development, pp. 106–112 (2002) 6. Zhao, J.: Dependence Analysis of Aspect-Oriented Software and Its Applications to Slicing, Testing, and Debugging. Technical-Report SE-2001-134-17, Information Processing Society of Japan, IPSJ (2001) 7. Zhao, J.: Towards a Metrics Suite for Aspect-Oriented Software. Technical-Report SE136-25, Information Processing Society of Japan, IPSJ (2002) 8. Zhao, J., Xu, B.: Measuring aspect cohesion. In: Wermelinger, M., Margaria-Steffen, T. (eds.) FASE 2004. LNCS, vol. 2984, pp. 54–68. Springer, Heidelberg (2004) 9. Jindasawat, N., Kiewkanya, M., Muenchaisri, P.: Investigating Correlation between the Object-Oriented Design Maintainability and Two Sub-Characteristics. In: 13th International Conference on Intelligent & Adaptive Systems, and Software Engineering (IASSE 2004), pp. 151–156 (2004) 10. Sheldon, F.S., Jerath, K., Chung, H.: Metrics for maintainability of class inheritance hierarchies. Journal of Software Maintenance and Evolution: Research and Practice 14, 147–160 (2002)
11. Genero, M., Piatini, M., Manso, E.: Finding "Early" Indicators of UML Class Diagrams Understandability and Modifiability. In: International Symposium on Empirical Software Engineering (ISESE 2004), pp. 207–216 (2004) 12. Gregor, K., Lamping, J., Mendhekar, A., Maeda, C., Lopes, C., Loingtier, J.M., Irwin, J.: Aspect-Oriented Programming. In: Aksit, M., Auletta, V. (eds.) ECOOP 1997. LNCS, vol. 1241, pp. 220–242. Springer, Heidelberg (1997) 13. Ramnivas, L.: AspectJ in Action: Practical Aspect-Oriented Programming. Manning Publications (2003) 14. Guyomarc’h, J.Y., Guéhéneuc, Y.G.: On the Impact of Aspect-Oriented Programming on Object-Oriented Metrics. In: 9th ECOOP workshop on Quantitative Approaches in ObjectOriented Software Engineering, pp. 42–47 (2005) 15. Zhao, J.: Applying Program Dependence Analysis to Java Software. In: Workshop on Software Engineering and Database Systems, International Computer Symposium, pp. 162–169 (1998) 16. Zhao, J.: Measuring Coupling in Aspect-Oriented Systems. In: 10th International Software Metrics Symposium (2004)
Aspect Oriented and Component Based Model Driven Architecture Rachit Mohan Garg, Deepak Dahiya, Ankit Tyagi, Pranav Hundoo, and Raghvi Behl Department of Computer Science and Engineering, Jaypee University of Information Technology, H.P, India [email protected], [email protected], [email protected], {pranavhundoo,raghvi071304cse}@gmail.com
Abstract. This paper presents a methodology for the efficient development of software for various platforms using the concepts of aspects, components and model driven architecture. In the proposed methodology, the whole software is first analyzed so as to separate aspects and identify independent components. In the design phase, the modules identified in the previous phase are modeled in UML without any dependence on the platform on which the software is to be built. In the third phase, the different platform-specific models and the code artifacts are generated from these platform-independent models. This separates all the concerns from the business logic and provides a proper understanding of the different modules to be developed well before the coding phase starts, thereby reducing the burden of artifact design on the developers. Keywords: Aspect Modeling; Model Driven Architecture; Component Based Software Development; Aspects in MDA; Software Design.
1 Introduction To survive in a cut-throat competitive world, organizations have started spending most of their time on coding, which results in software that is less reliable and less adaptable, since the actual and hidden needs of the customer are not addressed completely. The intermingling of concerns with the core business functionality is a primary area of concern in software development, as it causes the prevalent problems of code scattering and code tangling in the software. The work presented in this paper describes a design methodology that helps in creating highly reliable, adaptable software products in a timely fashion. A brief overview of the proposed methodology is as follows. It uses the concepts of model driven architecture [1], [2], [3], aspects [7], [8], [9] and components [4], [5], [6]. First the software is analyzed and broken down into functionally independent components and concerns using the process of Forward Identification [10]. In the next phase the identified components are modeled using UML; the concerns are also modeled. After this begins the third phase, which includes the platform-independent to platform-specific model conversion and the generation of code artifacts from the model.
The paper is organized as follows: Section 2 describes the related study on the literature work describing Model Driven Software Development, Component Based Software Development, Unified Modeling Language and Aspect Oriented Software Development. This section is followed by Section 3 which includes the proposed work. Subsequently, section 4 provides an implementation of the proposed methodology using a case study to depict the practical and industrial relevance of the overall methodology. Finally, section 5 states the observations and results with the paper conclusion summarized in Section 6.
2 Related Study A review of the literature of MDA, CBSD, AOSD is presented in this section. 2.1 Model Driven Software Development Model-Driven Development (MDD) is an approach in which the problem is modeled at a high level of abstraction and the implementation is derived from these high level models. In Model Driven Architecture (MDA) [1] given by OMG, business processes and applications are specified using platform-independent models (PIM’s) or the analysis model [2] that define the required features at a level of abstraction above the details of possible implementation platforms. Standardized mapping techniques transform the PIM’s into platform-specific models (PSM’s) and ultimately into implementations. The three primary goals of MDA are portability, interoperability and reusability through architectural separation of concerns which are in lieu with the MDA viewpoints [1], [3] of the system viz. structural, behavioral and informal which includes enterprise, information, technology, engineering viewpoints etc.
Fig. 1. Basic model for Model Driven Transformation. [Figure: a Platform Independent Model and a set of mapping rules are fed to a PIM-to-PSM translator, which produces a platform-specific model; a code generator then uses predefined templates to produce the basic code artifact.]
Figure 1 shows a basic structure for model driven transformation. In this a PIM is passed to the PIM-to-PSM translator which converts it to a PSM and then generates code artifact from it. 2.2 Unified Modeling Language Unified Modeling Language (UML) [11] plays a very important role in the development of a Model Driven Software. It helps in an efficient development of the Software product. UML is represented as the standard modeling language for the MDA.
The UML Profile [12] mechanism in particular is used heavily in the MDA to introduce platform-specific annotations as well as to define platform-independent language constructs. The Profile mechanism provides a lightweight extension mechanism for UML that allows the UML semantics to be refined. UML consists of many diagrams that are used to depict the information contained by the entities along with their interaction with the other system entities. The use-case diagrams are used to perform the analysis of the software. Class diagrams are used for designing and defining attributes for the different entities. Figure 2 shows the interrelation of UML and MDA. UML is required for creating both models, i.e., the platform-independent model and the platform-specific one. It can be seen that PIMs and PSMs are described through a meta-model that includes all the relevant information regarding the attributes, operations, etc., and the meta-model is expressed with the help of UML.
Fig. 2. Interrelation of MDA with UML. [Figure: both the PIM and the PSM are described with a meta-model expressed in UML; mappings connect the PIM to the PSM.]
2.3 Component Based Software Development (CBSD) Component based development is an approach that lays emphasis on the separation of concerns, in which the whole system is divided into a number of independent modules. These modules are defined in such a way as to facilitate software reuse, i.e., a component of one software product can be used in other software where such functionality is required. Separating the software product into functionally independent components helps in properly understanding the functionality of the product and in better maintaining it. 2.4 Aspect Oriented Software Development (AOSD) Concerns. The term concerns [9] does not refer to issues in the program; concerns represent the priorities and requirements related to the software product. Concerns are basically divided into two broad categories: Core Concerns. These comprise the program elements that constitute the business logic. Secondary Concerns. These comprise the program elements related to requirements other than the business logic; they include functional, organizational and policy concerns, such as those addressing authentication, logging and persistence.
Figure 3 gives a pictorial representation of the concerns that are discussed above.
Fig. 3. Various types of Concerns
Some of the secondary concerns, especially the functional ones, depend on many parts of the system rather than on a single one, i.e., their implementation cuts across many program elements of the software product. These are known as cross-cutting concerns [9], [13] because of their relation to many elements. Since these concerns relate to many elements, they cause the problems of code scattering and tangling [9], [13]. Tangling. It refers to the interdependencies between different elements within a program; when two or more concerns are implemented in the same module, the module becomes more difficult to understand, and changes to the implementation of any concern may cause unintended changes to other tangled concerns. Scattering. It refers to the distribution of similar code throughout many program modules. It creates problems if the functionality of that code is to be changed, because changes to the implementation may require finding and editing all affected code. Aspect Oriented Programming. Aspect Oriented Programming (AOP) [7], [8], [9], [15] deals with these issues and provides a solution to them. It uses the concept of aspects, by virtue of which it modularizes these cross-cutting concerns, thereby removing the problems of cross-cutting. Thus it can be said that it isolates the secondary functions from the business logic. The basic terminologies [15] associated with AOP are presented in Table 1.
Table 1. AOP Basic Terminologies
Advice: implementation regarding a particular concern.
Aspect: defines an abstraction of a cross-cutting concern; it encapsulates the definition of the pointcut and the advice associated with that concern.
Join Point: a location in the executing program where the advice related to an aspect can be applied.
Pointcut: a statement in an aspect that defines the join points where the corresponding advice should be executed.
Weaving: incorporating the advice at the specified join points in the program.
3 Proposed Methodology Now-a-days most of the organizations are using the UML diagrams only to fulfill the designing phase requirements and understand the flow of data. After that these are thrown away as if they were of no use. This was due to the fact that these models were only depicting the flow of data and control and had no relation with the coding. The underlying proposed methodology separates out the secondary requirements or in general terms the concerns from the primary requirements or the core functionality. As opposed to the traditional approaches of software development this methodology also supports the developers in coding by generating artifacts of the software for different platforms from the models created during design phase. This provides a better separation of concerns and enables the concept of software reuse. The proposed methodology consists of 3 phases which are described below: Investigation Phase. This is the first and one of the most important phase of the methodology. In this phase the software is decomposed in such a way so as to yield all the independent components along with the concerns that should be weaved at appropriate locations in the software. In this phase use case diagrams are used for the proper analysis of the software’s functionality. The outputs of this phase are the components, aspects corresponding to the given software. Modeling Phase. In this phase the components, aspects identified in the investigation phase are modeled using UML in such a way that they don’t include any platform/language specific information. Each of the components constitutes the classes, interfaces corresponding to that component. Aspects are modeled separately from the core functionality and are weaved at appropriate joinpoints. Artifact Generation Phase. This is the final phase of the methodology. In this phase the code artifacts are generated for the specific platforms by transforming the independent model developed in previous phase into platform specific models one for each of the desired language. These platform/language specific models are then further transformed to produce the code artifacts for the specific languages.
Fig. 4. A basic representation of the whole methodology. [Figure: the PIM, containing classes, interfaces and aspects, is transformed by a model translator using predefined mapping rules into a PSM, to which platform/language-related information is added if required; a code generator then uses predefined templates to produce the code artifacts for the classes, interfaces and aspects.]
Figure 4 shows the diagrammatic representation of the proposed methodology. Here first the PIM is developed after the proper analysis of the software. Then it is translated to a PSM using Model translator which uses mapping rules defined by OMG group. In PSM platform specific information is added to entities. Now from this PSM the code artifact is generated by the code generator which makes use of the standard templates defined by OMG for code generation.
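To make the translation step more concrete, the following toy Java sketch illustrates the general idea of template-based artifact generation from a platform-specific class description. It is only an added illustration under simplifying assumptions: the PsmClass and PsmAttribute records and the hard-coded output format are hypothetical stand-ins, not the OMG mapping rules or the Poseidon templates used later in the paper.

```java
import java.util.List;

// Toy template-based generator: turns a minimal PSM class description into a
// Java source skeleton (illustrative sketch, hypothetical model classes).
public class SimpleJavaGenerator {

    public record PsmAttribute(String name, String type) {}
    public record PsmClass(String name, List<PsmAttribute> attributes) {}

    public static String generate(PsmClass c) {
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(c.name()).append(" {\n");
        for (PsmAttribute a : c.attributes()) {
            src.append("    private ").append(a.type()).append(' ')
               .append(a.name()).append(";\n");
        }
        // Emit simple getters so the skeleton compiles as a code artifact
        for (PsmAttribute a : c.attributes()) {
            String cap = Character.toUpperCase(a.name().charAt(0)) + a.name().substring(1);
            src.append("    public ").append(a.type()).append(" get").append(cap)
               .append("() { return ").append(a.name()).append("; }\n");
        }
        src.append("}\n");
        return src.toString();
    }

    public static void main(String[] args) {
        PsmClass student = new PsmClass("Student",
                List.of(new PsmAttribute("matricNo", "String"),
                        new PsmAttribute("name", "String")));
        System.out.println(generate(student));
    }
}
```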
4 Methodology Implementation of the University Information Resource Scheduling (UIRS) For a better understanding of the proposed methodology, an implementation of the University Information Resource Scheduling (UIRS) system based on this approach is shown. The UIRS is a web-based system being developed for lecturers and students of a university as their online timetable. It contains three modules: an administrator module, a lecturer module and a student module. The administrator module handles all administrative tasks: the administrator registers all the students for the first time and also handles the addition, editing and deletion of classes and subjects. The lecturer module contains functions to view the timetable for a specific lecturer and the master timetable for the semester; a lecturer can inquire about class availability and book free classes. The student module contains the functionality to add and drop subjects; students can view and obtain the registration slip from the system. The system also contains a database which stores the lecturers' and students' personal details. Only the administrator can view, add and delete the data in the timetable. Poseidon for UML Professional Edition 6.0 is used for the development of the models related to the UIRS. The reason for using this tool is its code artifact generation technology [17]. To achieve this, predefined templates are used; the syntax of the resultant model depends on the corresponding template. Templates are predefined for many high-level languages such as Java, C++, PHP, XML and HTML. The phase-wise description of the methodology for the website is given below. Investigation Phase. This phase separates out the components and concerns so that they can be modeled in an effective way. The actors identified for this website are administrators, lecturers and students. The administrative section includes managing students' profiles, managing lecturers' profiles, managing usernames and passwords and changing passwords, managing the adding and dropping of subjects, managing classes, and the creation of the master timetable. The lecturers' section includes viewing and printing their own timetable, viewing and printing the master timetable for one semester, querying class availability and booking classes, and the creation of the lecturer's timetable. The students' section includes adding and dropping subjects, viewing and printing their timetable and registration slip, changing the password, and the creation of the student's timetable. The use-case diagram depicting the functions that can be performed by the various actors is shown in Figure 5. Modeling Phase. The components and aspects identified in the previous phase are modeled in this phase using various UML diagrams, mainly class, sequence and state machine diagrams.
Figure 6 shows the sequence diagram for the UIRS. It brings out the behavior within the given system and depicts the flow of control from one entity to the other in a sequential manner. Aspects are modeled and are implemented at specific joinpoints.
Fig. 5. Use-case depicting functions of actors
Fig. 6. Sequence Diagram for the UIRS
Artifact Generation Phase. Diagrams generated in the previous phase with the help of meta-model information and the mapping rules are converted to the PSM model. From this meta-model information and the corresponding predefined templates available, code artifacts for different platforms are generated. Here code artifacts for Java and PHP are developed as it is a web based application. Figure 7 represents the code artifact for the student class in Java. Figure 8 represents the PHP code artifact for the same student class.
Fig. 7. Java artifact for the student class
Fig. 8. PHP artifact for the student class
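Figures 7 and 8 themselves are not reproduced in this text. Purely as an indication of the kind of skeleton such a generator emits, the following hedged Java sketch shows what a generated student class might look like; the attribute and operation names are illustrative assumptions, not the actual contents of Figure 7.

```java
// Hypothetical sketch of a generated Java skeleton for the student class.
// Attribute and operation names are assumptions for illustration only.
public class Student {

    private String matricNo;
    private String name;
    private String password;

    public String getMatricNo() { return matricNo; }
    public void setMatricNo(String matricNo) { this.matricNo = matricNo; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getPassword() { return password; }
    public void setPassword(String password) { this.password = password; }

    // Student-module functions described in Section 4; a generated skeleton
    // typically contains only empty stubs to be filled in by the developer.
    public void addSubject(String subjectCode) { /* to be implemented */ }
    public void dropSubject(String subjectCode) { /* to be implemented */ }
    public void viewTimetable() { /* to be implemented */ }
}
```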
5 Results and Significance The implementation of the proposed methodology on the University Information Resource Scheduling system shows the importance of the methodology, as it incorporates the concepts of component-based development and aspect-oriented development into MDA for better development of the software. The significance of the proposed methodology lies in the numerous advantages gained over other prevailing approaches such as the Agile methodology. The advantages gained by the proposed methodology are as follows: Aspect Modeling. Identification and design of aspects is done as a separate process. Aspects are modeled separately from the business code at the initial level and continue to be modeled in the same manner from PIM to PSM. Ease of Inter-conversion. Since the different platform-specific models are derived from a single PIM, the PIM acts as a bridge between the different PSMs, indicating how an element in one PSM relates to the corresponding element in another. Developer Overhead Reduction. Developers can concentrate on coding the functionality and not the basic structure, as it is generated automatically from the designed model. Early Error Detection. Errors are detected and rectified as early as the design phase; otherwise they would have been caught at the testing phase and may have induced further errors during that course. Component Reconfiguration. After analysis of the accumulated reuse data, the components can be reconfigured so as to make them more robust and fitting for practical reuse.
Software Reuse. In this approach the software is partitioned into different independent components during the initial phase. These components can be reused in other similar software, thereby supporting the concept of software reuse. This is represented in Figure 9, where a component from product 1 is reused in product 2.
Fig. 9. Reuse of the components in other software. [Figure: components Comp. 1 … Comp. n of Software Product 1 are reused as components of Software Product 2.]
6 Conclusion Designing and transforming models has been the core functionality of MDA. The proposed methodology integrates the concepts of aspects and software reuse into MDA. Models are developed and later transformed to generate the code artifacts for the specific platform. The developer thus only has to write the code for the specific functionality, which removes a considerable amount of burden from the developer. The incorporation of aspects leads to a better product, as the concerns are kept from being implemented within the business code. Moreover, the division of the product into components enables software reuse. Thus the two major concepts in software development, namely aspects and software reuse, are implemented in the MDA approach. The implementation of the methodology on the University Information Resource Scheduling software, which is transformed into two language-specific platforms, i.e., Java and PHP, is also shown.
References 1. Object Management Group: MDA Guide Version 1.0.1, http://www.omg.org/mda/ (last accessed November 21, 2010) 2. Jacobson, M., Christerson, P., Övergaard, G.: Object-Oriented Engineering. ACM Press, Addison-Wesley, Wokingham, England (1992) 3. IEEE-SA Standards Board: IEEE Recommended Practice for Architectural Description of Software-Intensive Systems. IEEE Std 1471-2000, pp. i–23 (2000) 4. Pour, G.: Moving toward Component-Based Software Development Approach. IEEE J. Technology of Object-Oriented Languages and Systems (1998) 5. Wu, Y., Offutt, J.: Maintaining Evolving Component-Based Software with UML. In: Proc. of the 7th IEEE European Conference on Software Maintenance and Reengineering (2003)
6. Cai, X., et al.: Component-Based Software Engineering: Technologies, Development Frameworks, and Quality Assurance Schemes. In: Proc. of the 7th IEEE Asia-Pacific Software Engineering Conference (2000) 7. Simmonds, D., et al.: An Aspect Oriented Model Driven Framework. In: Proc. of the 9th IEEE International EDOC Enterprise Computing Conference (2005) 8. Elrad, T., et al.: Special Issue on Aspect-Oriented Programming. Communications of the ACM 44 (2001) 9. Laddad, R.: AspectJ in Action, 2nd edn. Manning Publication (2009) 10. Wang, Z., Xu, X., Zhan, D.: A Survey of Business Component Identification Methods and Related Techniques. International Journal of Information Technology 2 (2005) 11. Object Management Group. UML Resource Page (2010), http://www.omg.org/uml/ (last accessed November 21, 2010) 12. Fuentes-Fernández, L., Vallecillo-Moreno, A.: An Introduction to UML Profiles. Informatics Professional 5, 6–13 (2004) 13. Walls, C., Breidenbach, R.: Spring in Action, 2nd edn. Dreamtech Press (2004) 14. Tarr, P.L., Ossher, H., Harrison, W., S. M. S. Jr.: N Degrees of Separation: MultiDimensional Separation of Concerns. In: International Conference on Software Engineering (1999) 15. Wampler, D.: The Role of Aspect-Oriented Programming in OMG’s Model-Driven Architecture. Phd. Thesis (2003) 16. Aspect-Oriented Software Development Steering Committee.: Aspect-Oriented Software Development, http://aosd.net/ (last accessed 5, November 2010) 17. Boger, M., Graham, E., Köster, M.: Poseidon for UML, http://www.gentleware.com/fileadmin/media/pdfs/userguides/Po seidonUsersGuide.pdf (last accessed November 5, 2010)
Engineering the Development Process for User Interfaces: Toward Improving Usability of Mobile Applications Reyes Juárez-Ramírez1, Guillermo Licea1, Itzel Barriba1, Víctor Izquierdo2, and Alfonso Ángeles3 1
Universidad Autónoma de Baja California, Facultad de Ciencias Químicas e Ingeniería, Calzada Universidad 14418, Parque Industrial Internacional Tijuana, B.C., C.P. 22390, México {reyesjua,glicea,itzel.barriba}@uabc.edu.mx 2 GPPI Telecomunicaciones S. de R.L. de C.V., Blvd. Díaz Ordaz 1460 Interior A, Colonia Reynoso, La Mesa, Tijuana, Baja California C.P. 22106, México [email protected] 3 Centro de Investigación y Desarrollo de Tecnología Digital –IPN Av. del Parque No. 1310, Mesa de Otay, Tijuana, Baja California, México. C.P. 22510 [email protected]
Abstract. Mobile applications have proliferated greatly in recent years. Their usage varies from personal applications to enterprise systems. Despite this proliferation, the development of mobile applications confronts some limitations and faces particular challenges. Usability is one of the main domains to attend to in such systems. To improve this quality attribute we suggest incorporating into the software development process best practices from other disciplines, such as usability engineering and human-computer interaction. It is also important to incorporate studies that help identify the requirements of each mobile device capability in order to offer services in a usable manner. In this paper we present a proposal to apply user-oriented analysis and design, emphasizing specific practices such as user and task analysis. We also present a case study consisting of a mobile application for the iPhone device, which allows us to validate our proposal. Keywords: Usability, mobile applications, task and user analysis.
1 Introduction Since the last decade, mobile phones have been used as a medium for personal interaction through services such as voice and SMS [1]. Nowadays, these devices are getting more powerful and are being built with functionality that was, until recently, only available on laptop and desktop computers [2], [3], [4]. Taking advantage of this capacity, other usages have emerged for mobile phones [3], [5], [6]. Today they serve as utility devices incorporating personal information management capabilities such as contact agendas, calendars, etc. Additionally, a more recent usage is as a medium of entertainment, playing music, video, and games. Also, mobile phones
have incorporated the attractive utilities, such as e-mail managers and Internet browsing. Furthermore, there is a growing desire in organizations to perform more of their business functions using mobile phones [6], [7], [8]. Examples of these applications are: health care, remote sensing, remote monitoring and control. Although the proliferation of mobile technology and systems, the development of mobile applications confronts some limitations and faces particular challenges as those mentioned in [2], [3]: security, local persistence, connectivity, user administration, and integration. A special issue is related with the application’s usability, which represents a significant factor to the progress of this field. An engineering approach is required to face these challenges. Software engineering (SE) is a discipline that can be used to build these applications and to deal with those problems. In its better practices, SE includes a well structured set of phases (requirements analysis, design, implementation, and testing) which contribute to have a product that meet the final user acceptance. Especially the requirements analysis phase can contribute to consider all requirements and restrictions for a new mobile system. Even though SE requires integrating practices from other disciplines in order to meet, in a better way, special quality attributes, such as usability. Most software engineering approaches applied to user interface (UI) design have been conceived based on the experience from the development of traditional PCbased systems. However, technology advances are bringing considerable changes to electronic devices such as computers, mobile devices and displays [9]. These new devices require attention to design their interfaces [10], [11]. Nowadays, the “Lack of usability is the most critical problem facing software engineering” [12]. In such situation it’s useful to combine other disciplines, such as Usability Engineering (UE) and Human-Computer Interaction (HCI). Usability engineering is a generic practice, traditionally used to develop any kind of engineering product, and it can be applied to design software systems [13]. In the practice, UE is associated with HCI [14], [15] since it considers user and tasks analysis practices, which determine the aspects that the user interface should have. In this study we attend the challenge of designing user interfaces for mobile phone applications, making a joint of SE, UE and HCI. We focus our study on the user and tasks analysis in conjunction with the use cases technique in order to derive an efficient and robust user interface. In this combination, we emphasize on user considerations such as tasks performed, user’s professional profile, and device’s advantages and limitations. This paper is structured as follows. Section 2 introduces the background and some related works. Section 3 describes the basic concepts of usability engineering which are considered in our proposal. Section 4 describes our proposal to treat usability, emphasizing on mobile interfaces for devices with iOS. Section 5 exposes a case of study applying our proposal to a video cameras system. Finally, section 6 contains our conclusions and describes the future work.
2 Background and Related Works SE is mainly concerned with the internal functionality of the product, and it’s oriented to the software development process. HCI mainly employs the User Centered Design
(UCD) approach [8], [15], which places the user, user tasks and user goals as the main themes in order to guide the design and implementation of the product. This focus is alike UE techniques. In terms to improve usability, most efforts have been done separately in SE and HCI, however, SE practices neither HCI techniques have been focused properly to manage application’s usability [16]. Nowadays, specific usability engineering practices are not included within software engineering models; even more they are not well adopted in the industry where applications are produced. Commonly the tasks models are based on individual task reviews, and a lack of organizational context modeling generates the major usability problems [17]. Furthermore, “Most basic elements in the usability engineering model are empirical user testing and prototyping, combined with iterative design” [13]. The necessity of integrating HCI and SE techniques has been expressed in several studies such as [9], [12], [16], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28]. Some of them manage a complete view of the development process, covering a general approach, but not oriented to a specific type of applications and platforms. Next we describe some of the more related works in the context of desktop applications. This serves as basis to formulate our proposal. In [24], a review of some frameworks for usability is presented, emphasizing their weaknesses as well as how far the objective of integrating HCI methods and principles within different software engineering methods has been reached. In order to reduce this lack, this work presents a generic framework that can: (1) facilitate the integration of usability engineering methods in software development practices and, (2) foster the cross-pollination of the HCI and software engineering disciplines. A process model oriented approach is presented in [29], incorporating the task analysis into the use case driving approach for developing interactive system. This model emphasizes on the use of various UML diagrams, especially use case and sequence diagrams to express the task analysis, in order to derive a dialogue model defining communication between the user and the system, and also a structure for the user interface and its components. These components are expressed in a class diagram. Other work with similar scope is presented in [22]. This model also emphasizes on the use case artifact as a tool to express the task analysis. UML syntax also is used to orchestrate the interaction model expressing standard relationships between use cases. As we can see, several proposals have been developed for integrating HCI and usability techniques into the software development lifecycle; even though, still remain a necessity to focus usability techniques properly to develop applications [16]. Improving usability could be benefit by complementing HCI and SE practices with newer types of evaluation emerging from cognitive science and usability engineering. Most studies on usability matters are focused on stationary applications; however, the challenges faced on usability improvements also affect directly the development of mobile applications. The usability is one of the main domains in such systems. There are a lot of improvements to do in the context of mobile applications [30], [31], [32]. As we mentioned above, a problem to face in mobile application is concerned with the variable screen sizes and layouts for different handset. 
This problem is directly connected with the usability aspects of applications. In this case, the diverse display sizes (especially reduced sizes) force developers to optimize UI aspects in order to
satisfy the users' needs and ergonomic requirements. This is compounded by the requirement that the mobile UI provide maximum information and/or functionality with minimum navigation [33]. All of these aspects keep challenging developers, even if their aim is to facilitate tasks and increase the productivity of the users. An in-depth study of users and tasks can be applied in order to meet these requirements. An evident relation is that any functionality of a system should have associated user interface aspects, so a combination of different disciplines can be used to construct robust user interfaces. To develop mobile applications, and especially their interfaces, three approaches can be merged [18], [19], [34]: usability engineering, human-computer interaction, and software engineering. A practical usability engineering process can be incorporated into the software development process to guarantee the usability of interactive products. Usability activities should be present during the three main phases of a software project [13]: before, during, and after product design and implementation.
3 Essential Concepts of Usability Engineering
In this section we describe the main aspects of usability engineering which are considered in our proposal. Fig. 1 shows the traditional usability engineering lifecycle. Below we describe its three phases.
Fig. 1. The usability engineering’s life-cycle
We emphasize the specification phase, especially the user and task analysis practices. From the SE perspective, it is important to produce a thorough description of the users and tasks in order to detect useful elements for the analysis phase of the development process. The specification phase gathers the requirements for the system in terms of the characteristics of the users and the tasks performed [13], [35]. Usability specifications define how user interfaces must fit the user and task requirements. In the design phase the user interface is specified in terms of the interactions of the users, and it is materialized in prototypes [35]. After that, the design is tested through the prototypes with the participation of users in an iterative process.
In the evaluation phase, usability tests allow gathering data about the usability of the system. These tests should be carried out by a group of users performing specific tasks [35]. Heuristic evaluation aims to find the usability problems in the design [36]. It is done as a systematic inspection of the user interface design based on usability guidelines and principles, not just on personal opinions.

The user analysis
User analysis has as its objective to know the users for whom the system is built [29], [32]. This means knowing the users' characteristics in depth in order to visualize their profile. User analysis identifies the target users and their characteristics, including those shown in Table 1 (first and second columns), which were extracted from [35].

Table 1. Characteristics for the user profile integration

Demographic data:
- Age: adequacy of the theme of the application to the age.
- Gender: fondness for the theme of the application.
- Education: knowledge acquired; reasoning abilities.
- Occupation: activities performed; abilities acquired or practiced in the work domain.
- Cultural background: customs or habits related to the application content: vocabulary, signs (symbols).
- Special needs: capabilities and limitations to operate machines or devices.
- Computer training and knowledge: abilities to operate the computer or similar devices.
- Experience with similar systems/products: knowledge of the system functionality; abilities to manipulate the system.

Traits and intelligence:
- Learning styles, learning abilities, and cognitive styles.
- Affective traits: capabilities for human body gestures.
- Skill sets or capability: skills possessed for learning or knowledge acquisition.

Job or task related factors:
- Job characteristics: type of job, involved tasks, physical and cognitive exigencies.
- Knowledge of application domain and job familiarity: expertise in the application domain; amount of time in the job or job experience.
- Rate of use of the computer: frequency of computer use for the job and usage constraints.

In the third column of Table 1 we propose a description for each factor, trying to pin down aspects concerned with user interface design.
The task analysis
Task analysis is concerned with understanding what the users' goals are and what they do to achieve them [35].
• It describes behaviors at three abstraction levels: goals, tasks, and actions.
• It also includes the scenarios and conditions under which humans perform the tasks.
• The objective of task analysis is to identify opportunities to support user activities.
From the HCI point of view, task analysis also distinguishes between what computers do and what humans do. In this case, techniques from SE such as use cases [9], [37] and scenarios can be used for task analysis. In addition, existing task analysis techniques (e.g., Hierarchical Task Analysis) can be applied at this stage.
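To make the three abstraction levels concrete, the sketch below (not from the paper; the goal and its decomposition are hypothetical) captures a hierarchical task analysis as a small goal/task/action tree and flattens it into the ordered actions a use-case flow would have to support.

```python
# Minimal sketch: a hierarchical task analysis as a goal -> task -> action tree,
# with an optional scenario condition per node. The example content is invented.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TaskNode:
    name: str                          # goal, task, or atomic action
    level: str                         # "goal" | "task" | "action"
    condition: Optional[str] = None    # scenario/precondition, if any
    children: List["TaskNode"] = field(default_factory=list)

    def actions(self) -> List[str]:
        """Flatten the tree into the ordered list of atomic actions."""
        if self.level == "action":
            return [self.name]
        out: List[str] = []
        for child in self.children:
            out.extend(child.actions())
        return out

# Hypothetical example: a 'visualize streaming' goal decomposed into tasks and actions.
view_stream = TaskNode("Visualize streaming", "goal", children=[
    TaskNode("Select camera", "task", children=[
        TaskNode("Open camera list", "action"),
        TaskNode("Tap desired camera", "action"),
    ]),
    TaskNode("Watch stream", "task", condition="network available", children=[
        TaskNode("Wait for video to load", "action"),
    ]),
])

print(view_stream.actions())
```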
4 The Proposed Model
In this section we describe a proposal to merge HCI, UE and SE practices, adding some specific recommendations and principles for user interface design suggested by manufacturers, which contribute to improving the user experience in mobile applications. In Table 2 we present the main phases (column 1) of the software development process, emphasizing specific practices and work products.

Table 2. Merging practices from SE, UE and HCI

Phase: Requirements elicitation
  Practice: Requirements specification.
  Work product: Requirements definition.
Phase: Analysis
  Practices: User analysis. Task analysis. Context analysis. Data analysis.
  Work products: User profile. Task's architecture. Context specification. Data model.
Phase: Design
  Practices: Database design. User interface design. Program design. Design of interaction. Presentation design.
  Work products: Database architecture. Interface specification. Logical architecture. Dialogue model. Prototyping.
Phase: Implementation
  Practice: Formative evaluation.
  Work product: Software components.
Phase: Testing
  Practice: Summative evaluation.
  Work product: Usability test report.
The practices shown in the second column were extracted from SE, UE and HCI; they represent a merging of their processes. The work products were also extracted from the processes of the three disciplines. Most of these practices still assume GUI designs, which are mainly useful for desktop applications. However, modern devices employ technologies more sophisticated than GUI interactions. In order to address the specific issues of mobile devices, especially smart phones, we incorporate two aspects suggested for usable interfaces on specific devices: (i) the principles of user interface design from the iOS system, and (ii) the Natural User Interface (NUI) approach, which covers most concerns for usable interfaces.

4.1 The Apple iOS Principles for User Interfaces
In this section we describe some principles contained in the Apple iOS approach [38], [39]. We emphasize how these principles match the basic concepts of usability engineering and software engineering (third column in Table 3).

Table 3. Matching iOS usability principles with SE/UE concepts

Aesthetic integrity: a measure of how well the appearance of the application integrates with its function. SE/UE concepts: Functionality adequacy (SE).
Consistency: consistency in the interface allows people to transfer their knowledge and skills from one application to another; a consistent application takes advantage of the standards and paradigms people are comfortable with. SE/UE concepts: Professional experiences (UE), Device/platform constraints (SE/UE).
Direct manipulation: when people directly manipulate on-screen objects instead of using separate controls to manipulate them, they are more engaged with the task and more readily understand the results of their actions. SE/UE concepts: Intuitiveness (UE), Ergonomics (UE).
Feedback: feedback acknowledges people's actions and assures them that processing is occurring; people expect immediate feedback when they operate a control, and they appreciate status updates during lengthy operations. SE/UE concepts: Usability (UE), Ergonomics (UE).
Metaphors: when virtual objects and actions in an application are metaphors for objects and actions in the real world, users quickly grasp how to use the application. SE/UE concepts: Intuitiveness (UE).
User control: people, not applications, should initiate and control actions; although an application can suggest a course of action or warn about dangerous consequences, it is usually a mistake for the application to take decision-making away from the user. SE/UE concepts: Usability (UE).
Proprietary systems suggest principles and practices for user interface design; most of them appear as dispersed guidelines. However, as we can see in Table 3, a matching can be established between the principles published for proprietary systems and the basic concepts of SE, UE and HCI.

4.2 The NUI Approach
In order to cover the usability aspects of touch screen technologies, we adopt the NUI approach, which involves an interface that lets people use their natural behaviors to interact directly with information. NUI has four defining characteristics [39]:
chNUI_01: Direct, natural input. This characteristic involves anything that comes naturally; it could include 3D gestures, speech recognition, and facial expressions, but in practice it mostly means multi-touch. In practical terms, it allows natural, expressive gestures like pinching, stretching, twisting, and flicking.
chNUI_02: Realistic, real-time output. A richer input demands richer output. In order to harness natural responses, NUI output has to be as fast and convincing as nature itself. For example, when the user makes a natural gesture like a "pinch", the display has to respond in an animated, almost photorealistic way in real time, or else the illusion is broken.
chNUI_03: Content, not chrome. One of the most unnatural features of a computer interface is its reliance on windows, icons and menus, which are laden with visual signals and controls, or "chrome". This tends to distract users from the actual content they are trying to work with. A NUI strips most of this away and lets users focus on one thing at a time.
chNUI_04: Immediate consequences. In the real world, actions have immediate consequences. To honor this principle, NUI devices and applications start instantly and stop on a dime, and changes are saved as the user goes.
To follow this approach, we need to consider a NUI version of the use cases, which differs from a GUI use-case view; in this case we should identify the NUI gestures for each action in the flows of a use case.
5 Case Study: iSysCam, an Application for Remote Control of Video Cameras Using iOS Devices
In this section we present a case study consisting of a video-camera security system called iSysCam, which is operated from iOS devices. To develop iSysCam we generated the following artifacts: functional requirements specification (including business rules), use cases (including prototyping), class diagrams, and source code. In this section we present some details of these artifacts, emphasizing some of the main aspects suggested in the IDEAS methodology [9], [20] for the task analysis and user analysis. Although the details of the design and implementation phases are not described here, we also followed the recommendations proposed in that methodology. For the implementation phase we used Objective-C in the Xcode 3.2.3 framework, which integrates Interface Builder, a graphic tool to create user interfaces.
5.1 The Requirements Phase
We introduce the idea of identifying a first version of the primary tasks from the requirements specification phase. Table 4 shows the primary tasks related to camera manipulation. For this functionality we identified two end users: client (guard) and administrator. The client is the main user of the system at the level of streaming consultation. The administrator is a kind of user with special permissions to perform operations such as manipulation of the cameras and image adjustment.

Table 4. Main tasks for the iSysCam system, module of camera manipulation
1. Connecting to the system (pTsk_01): Client, Administrator.
2. Streaming visualizing (pTsk_02): Client, Administrator.
3. Image parameters' adjusting (pTsk_03): Client, Administrator.
4. Camera's manipulation (movements) (pTsk_04): Administrator.
In order to illustrate the task and user analysis sub-phases, we considered the functional requirements (enumerated as FR_nn in Table 5) involving user interaction, especially for the client and the administrator. In this case, for each functional requirement (22 functional requirements in total) we indicate the main objective of the user and the data entities involved. This helps us to manage the task or data model for UI design. Table 5 shows a short list of functional requirements as an example.

Table 5. Functional requirements and user actions for iSysCam
FR | Actor | Objective | Main User Action | Entity | Primary Task
FR_01 | Client, administrator | Log in the system | Enter information | User account | pTsk_01
FR_02 | Client, administrator | Recovering password | Enter information | User account | pTsk_01
FR_04 | Client, administrator | Visualize streaming, all the cameras | Select that option | Image | pTsk_02
FR_05 | Client, administrator | Visualize streaming, one camera | Select the camera | Image | pTsk_02
FR_07 | Administrator | Change rotation of a camera | Indicate the rotation (+, -) | Camera (configuration) | pTsk_04
FR_08 | Administrator | Change optic zoom of a camera | Indicate the change (+, -) | Camera (configuration) | pTsk_04
FR_12 | Administrator | Establish sharpness for image of a camera | Enter new value | Image | pTsk_03
FR_14 | Administrator | Change the name of a camera | Enter new name | Camera (configuration) | pTsk_04
FR_15 | Administrator | Establish night mode for a camera | Activate mode | Image | pTsk_04
The user analysis: user model
In order to define the profile of the user, we considered aspects such as user experience and physical and cognitive skills. To define this profile we applied a survey to 54 potential users. Table 6 shows the aspects related to user identification and experience. The second column shows the expected characteristics of the user; the third column shows the characteristics detected in the survey.

Table 6. User's profile: personal characteristics and experience
Profile | Expected | Detected
Gender | M/F | M
Age (years) | 25+ | 25+
Education (years) | 12+ | 12+
Role | Owner/Guard | Owner/Guard
Professional experience (in months) | 1-12 | 12+
Computer experience (in months), platform | 1-12 | 12+
Computer experience (in months), operating system | 1-12 | 2+
Computer experience (in months), type of use | User | User

For the physical aspects we considered the minimal skills required to operate the mobile device. We also considered the minimal cognitive skills required to operate the mobile device and the system (see Table 7).

Table 7. User's profile: physical and cognitive factors required
Profile | Expected | Detected
Physical: Hand | 1+ | 2
Physical: Finger | 2+ (thumb, index) | 5
Physical: Eyes | 1+, (20/20) | 2
Physical: Ear | 1 | 2
Cognitive: Attention | Sustained | Divided
Cognitive: Comprehension | Good | Good
Cognitive: Learning | Visual/kinesthetic | Visual/kinesthetic
Cognitive: Memory | Good | Good
Cognitive: Inference | Good | Good
Cognitive: Decision-making | High | High
Cognitive: Language processing
The information gathered allows us to establish the following aspects to initiate the design of the user interface. We can conclude that:
- The potential users have an age that gives them a sense of responsibility to take care of the business or home. This aspect can influence motivation.
- The potential users have the level of education that gives them the knowledge and skills required to handle the operation of the system.
- The potential users have the minimal physical and cognitive capabilities required to operate the system.
5.2 The Use Case Model: Identifying the NUI and GUI Elements
We considered 22 use cases, which were derived directly from the functional requirements. The use cases were studied in terms of (1) task analysis, (2) data analysis, and (3) NUI elements. All these views contributed to the design of the user interface. We generated two proposals for the user interface at the level of menus and navigability: (1) a task-oriented view and (2) a data-oriented view (see Fig. 2). The task-oriented view consists of menus and navigation components based on groups of tasks associated by their nature. In the data-oriented view the user interface consists of menus and navigation components based on the main data entities (user_account, session, camera, image, streaming). The final version of the user interface was a mix of both.
Fig. 2. The task and data oriented views for the user interface
For each use case we performed a task analysis expressed as sequences of events for the basic and alternative flows, for which we identified the GUI, dialogue and NUI elements. Due to space restrictions, we present a reduced set of use cases here. Table 8 covers the following use cases: CU_01: Log in the system;
CU_02: Recovering password; CU_04: Visualize streaming, all the cameras; CU_05: Visualize streaming, one camera; CU_07: Change rotation of a camera; CU_08: Change optic zoom of a camera; CU_12: Establish sharpness for image of a camera; CU_14: Change the name of a camera. This description allows us to identify specific GUI elements such as windows, buttons, and dialogues/messages. In the case of NUI, the natural elements such as gestures are also identified. Some GUI elements allow the user to operate the functionalities of the application and others support navigation, while NUI elements allow the user to operate most of the functionalities of the application.

Table 8. NUI elements for the use cases of the iSysCam system
Use case and flow | GUI elements: graphic objects | GUI elements: dialogue elements | NUI elements
CU_01, basic flow | Window 1: TextField, Button; Window 2: Message | Message (Success) | Tap, swipe
CU_01, alternative flow | Window 3: Message, Button | Message (Error) | Tap
CU_02, basic flow | Window 4: TextField, Button | Message (Success) | Tap, swipe
CU_02, alternative flow 1 | Window 5: Label, Button | Message (Error) | Tap, swipe
CU_04, basic flow | Window 6: UITable; Window 7: UITable; Window 8: UIWebView, Button | | Tap, swipe
CU_05, basic flow | Window 9: UIWebView, Button | | Tap
CU_05, alternative flow 1 | Window 12: UIWebView | | Tap
CU_07, 2nd basic flow | Window 12: UIWebView | | Swipe
CU_07, alternative flow | Window 11: UIWebView, Button | | Pinch out
CU_08, basic flow | Window 11: UIWebView, Button | | Tap
CU_12, basic flow | Window 11: UIWebView, Button; Window 13: UITable; Window 14: UISlider | | Tap, double/dual finger tap, swipe
CU_14, basic flow | Window 11: UIWebView, Button; Window 13: UITable; Window 16: UISwitch | | Tap, swipe
All of these components were evaluated considering the user capabilities to operate them. This analysis allows us to generate a robust user interface.
6 Conclusions and Future Work
In this paper we presented a proposal to integrate SE, UE and HCI concerns in order to improve the usability of software applications, and we showed how this approach can be adopted to construct software applications for mobile devices. We emphasized the user and task analysis practices in order to consider the capabilities and conditions of the users operating the system. We also considered practical recommendations from the manufacturers of mobile devices in terms of principles and guidelines for user interface design. Considering those recommendations allows us to combine theoretical concepts and real practices, which contributes to the construction of practical user interfaces. As future work, we are formulating a new methodology that considers not only all the concepts from SE, UE and HCI presented here, but also an in-depth user analysis emphasizing physical, cognitive, psychological and professional concerns. We are planning specific scenarios to apply this methodology in the development of mobile applications, including applications for people with physical or cognitive impairments. Furthermore, our current study is oriented toward strategies to incorporate user information and usability issues into the software architecture from the early phases.
Acknowledgment
This work was done with the support of the Mexican Government through CONACYT and the Secretary of Economy, under grant No. 140022 of the program "Convocatoria de Proyectos de Investigación, de Desarrollo o de Innovación Tecnológica 2010." We are grateful for the hard work performed by all the undergraduate students who participated in this project. Thanks also to the graduate students who contributed to the writing of this paper, as well as to the GPPI employees who participated in this project.
References 1. Nilsson, E.G.: Design guidelines for mobile applications. SINTEF Report STF90A06003 (2005) 2. Serhani, M.A., Benharref, A., Dssouli, R., Mizouni, R.: Toward an Efficient Framework for Designing, Developing, and Using Secure Mobile Applications. International Journal of Human and Social Sciences 5(4), 272–278 (2010) 3. Wirth, N.: A Brief History of Software Engineering. IEEE Annals of the History of Computing 30(3), 32–39 (2008) 4. Emmott, S., Rison, S.: Toward 2020 Science. Technical Report, Microsoft Research Cambridge (2005) 5. Petrova, K.: Mobile learning as a mobile business application. International Journal of Innovation and Learning 4(1), 1–13 (2007) 6. Delic, N., Vukasinovic, A.: Mobile payment solution: Symbiosis between banks, application service providers and mobile network operators. In: Proceedings of the Third International Conference on Information Technology: New Generations (ITNG 2006), pp. 346–350. IEEE Computer Society Press, Las Vegas (2006)
7. Gruhn, V.: University of Leipzig, Mobile Software Engineering, http://www.iasted.org/conferences/2007/innsbruck/se/ pdfs/GruhnSE2007.pdf 8. Harper, R., Rodden, T., Rogers, Y., Sellen, S.: Being Human: Human-Computer Interaction in the year 2020. Technical Report, Microsoft Research Cambridge (2008) 9. Molina, J.P., González, P., Lozano, M.D., Montero, F., López-Jaquero, V.: Bridging the gap: Developing 2D and 3D user interfaces with the IDEAS methodology. In: Jorge, J.A., Jardim Nunes, N., Falcão e Cunha, J. (eds.) DSV-IS 2003. LNCS, vol. 2844, pp. 303–315. Springer, Heidelberg (2003) 10. Paspallis, N., Papadopoulos, G.A.: An Approach for Developing Adaptive, Mobile Applications with Separation of Concerns. In: Thirtieth Annual International Computer Software and Applications Conference (COMPSAC 2006), pp. 299–306. IEEE Computer Society Press, Chicago (2006) 11. Cheng, M.-C., Yuan, S.-M.: An adaptive mobile application development framework. In: Yang, L.T., Amamiya, M., Liu, Z., Guo, M., Rammig, F.J. (eds.) EUC 2005. LNCS, vol. 3824, pp. 765–774. Springer, Heidelberg (2005) 12. Lethbridge, T. C.: Integrating HCI and Usability into Software Engineering: The Imperative and the Resistance, http://www.capchi.org/documents/capchi_lethbridge_060927.pdf 13. Nielsen, J.: The Usability Engineering Life Cycle. Computer 25(3), 12–22 (1992) 14. Knouf, N.A.: HCI for the real world. In: Proceedings of the 27th International Conference Extended Abstracts on Human Factors in Computing Systems CHI EA 2009, pp. 2255–2564. ACM Press, Boston (2009) 15. Blevis, E.: Sustainable interaction design: invention & disposal, renewal & reuse. In: Proceedings of the ACM Conference on Human Factors in Computing Systems, pp. 503–512. ACM, San Jose (2007) 16. Kushniruk, A.W., Patel, V.L.: Cognitive and usability engineering methods for the evalua-tion of clinical information systems. Journal of Biomedical Informatics 37(1), 56–76 (2004) 17. Pimenta, M.S., Barthet, M.F.: Context Modeling for an Usability Oriented Approach to Interactive Systems Requirements Engineering. In: Proc. of IEEE International Symposium and Workshop on Engineering of Computer Based Systems (ECBS 1996), pp. 315–321. IEEE Computer Society, Washington, DC, USA (1996) 18. Ferre, X.: Integration of Usability Techniques into the Software Development Process. In: Proceedings of the International Conference on Software Engineering (ICSE 2003), IFIP, Portland, Oregon, pp. 28–35 (2003) 19. Walenstein, A.: Finding Boundary Objects in SE and HCI: An Approach Through Engineering-oriented Design Theories. In: Proceedings of the International Conference on Software Engineering (ICSE 2003), IFIP, Portland, Oregon, USA, pp. 92–99 (2003) 20. Molina, J.P., González, P., Lozano, M.D., Montero, F., López-Jaquero, V.: Bridging the gap: Developing 2D and 3D user interfaces with the IDEAS methodology. In: Jorge, J.A., Jardim Nunes, N., Falcão e Cunha, J. (eds.) DSV-IS 2003. LNCS, vol. 2844, pp. 303–315. Springer, Heidelberg (2003) 21. Wania, C.E., Atwood, M.E., McCain, K.W.: How do design and evaluation interrelate in HCI research? In: Proceedings of the 6th Conference on Designing Interactive Systems, pp. 90–98. ACM, New York (2006) 22. García, J.D., Carretero, J., Pérez, J.M., García, F., Filgueira, R.: Specifying use case behavior with interaction models. Journal of Object Technology 2(2), 1–17 (2003)
23. Wang, S., Yilmaz, L.: A Strategy and Tool Support to Motivate the Study of Formal Methods in Undergraduate Software Design and Modeling Courses. Int. J. Engineering Ed. 22(2), 407–418 (2006)
24. Seffah, A., Desmarais, M., Metzker, E.: HCI, Usability and Software Engineering Integration: Present and Future. In: Human-Centered Software Engineering – Integrating Usability in the Software Development Lifecycle. Human-Computer Interaction Series, vol. 8, pp. 37–57. Springer (2005)
25. Seffah, A., Metzker, E.: The obstacles and myths of usability and software engineering. Communications of the ACM 47(12), 71–76 (2004)
26. Sousa, K., Furtado, E.: From usability tasks to usable user interfaces. In: Proceedings of the 4th International Workshop on Task Models and Diagrams, pp. 103–110. ACM, Gdansk (2005)
27. Golden, E.: Helping software architects design for usability. In: Proceedings of the 1st ACM SIGCHI Symposium on Engineering Interactive Computing Systems, EICS 2009, pp. 317–320. ACM Press, Pittsburgh (2009)
28. Reeves, S.V.: Principled formal methods in HCI research. In: IEEE Colloquium on Formal Methods in HCI, vol. III, pp. 2/1 - 2/3. IEEE Explore, London (1989)
29. Kim, S.-K., Carrington, D.: Integrating Use-Case Analysis and Task Analysis for Interactive Systems. In: Asia Pacific Software Engineering Conference (APSEC 2002), pp. 12–21. IEEE Computer Society, Washington, DC, USA (2002)
30. Baillie, L.: Motivation for Writing the Paper: Designing Quick & Dirty Applications for Mobiles: Making the Case for the Utility of HCI Principles. Journal of Computing and Information Technology - CIT 18(2), 101–102 (2010)
31. Suzuki, S., Nakao, Y., Asahi, T., Bellotti, V., Yee, N., Fukuzumi, S.: Empirical comparison of task completion time between mobile phone models with matched interaction sequences. In: Jacko, J.A. (ed.) HCI International 2009. LNCS, vol. 5612, pp. 114–122. Springer, Heidelberg (2009)
32. Bellotti, V., Fukuzumi, S., Asahi, T., Suzuki, S.: User-centered design and evaluation - the big picture. In: Proceedings of HCI International 2009, pp. 214–223. Springer, Berlin (2009)
33. Kaiwar, D.: Building Enterprise Business Mobile Applications, http://www.trivium-esolutions.com/downloads/Enterprise%20Mobile%20Applications%20-%20White%20Paper.pdf
34. Kohler, K., Paech, B.: Usability Engineering integrated with Requirements Engineering. In: Proceedings of ICSE 2003 Workshop on Bridging the Gaps Between Software Engineering and Human-Computer Interaction, IFIP, Portland, Oregon, USA, pp. 36–40 (2003)
35. Zhang, P., Carey, J., Te'eni, D., Tremaine, M.: Integrating Human-Computer Interaction Development into System Development Life Cycle: A Methodology. Communications of the Association for Information Systems 15, 512–543 (2005)
36. Nielsen, J.: Finding usability problems through heuristic evaluation. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 373–380. ACM Press, New York (1992)
37. Swain, D. E.: From Task Analysis to Use Cases, http://www.stc-carolina.org/wiki_attachments/SwainTRIDOCpresent.pdf
38. Apple Inc.: iOS Human Interface Guidelines - User Experience, http://developer.apple.com/library/ios/documentation/UserExperience/Conceptual/MobileHIG/MobileHIG.pdf
39. Cartan, J.: iPad's Natural User Interface at Work, http://blogs.oracle.com/usableapps/2010/07/ipads-natural-user-interface-a.html
Lifelong Automated Testing: A Collaborative Framework for Checking Industrial Products Along Their Lifecycle
Thomas Tamisier and Fernand Feltz
Department Informatics, Systems, Collaboration (ISC), Centre de Recherche Public - Gabriel Lippmann, 41, rue du Brill, L-4422 Belvaux, Luxembourg
[email protected]
Abstract. With more and more intelligence and cutting-edge applications embedded, the need for accurate validation of industrial products from the early stages of conception to commercial release has become crucial. We present a new approach for automating the validation of critical and complex systems, in which test sequences can first be derived both from the modeling of the system under test and from the properties to check, and then easily updated according to evolutions of the product or of the usage requirements. The approach is being proven in the validation of high-class sensor prototypes for the automotive industry, which in particular illustrates how operational issues related to the different levels of abstraction between textual specifications and effective test routines can be solved. Keywords: Model-Based Testing, Collaborative CAD, Embedded Systems.
1 Introduction
From the early stages of conception to commercial release, the testing of smart systems such as the sensors used in the automotive industry has become crucial. As a matter of fact, the industrial challenge consists not only in developing and customizing new features quickly but also in ensuring the stability and robustness of products that embed more and more intelligence and cutting-edge applications, while the complexity of their controlling software increases exponentially with each additional feature [1]. On the whole, their life cycle faces three main challenges involving test issues:
• in order to respond to the market in a timely manner with stable and robust solutions, embedded software applications have to be validated before their hardware support is available;
• all functionalities, with their exponential number of use cases, are to be tested as exhaustively as possible to ensure the safety of numerous critical applications and avoid customer dissatisfaction;
• test phases must be re-run for every new release of the product, required both by new integrations (e.g. automotive sensors successively bought by different car makers) and by its own evolution cycle.
Tests are commonly divided into validation, which ensures that functional and other usage requirements are met, and verification, which checks that every life cycle step yields the right product. Typically, validation and verification require multiple procedures related to the elementary functionalities of the product [2], and every modification of the product has to be incorporated into a set of interdependent test procedures. As a result, tests are invalid until all procedures are completed, which overburdens the testing process and delays the commercial releases of the product. Such constraints are thus particularly ill-suited to today's powerful systems, which integrate a lot of software, making changes readily available. The situation therefore calls for testing tools with the following capabilities:
• thorough processing of a whole product according to its specifications;
• automatic determination of the tests necessary to meet specified criteria;
• easy updates to accommodate frequent changes performed on the product.
In addition, developers generally test their tasks separately. By contrast, we present in the following an approach towards a centralized testing framework fulfilling these needs, where the different actors, who collaborate horizontally to develop a set of functionalities or vertically to integrate multifunctional layers into a single product, may improve their collaboration for the testing of the integrated product. In particular, the implementation and experiments are discussed to show how the approach allows integrating more straightforward and efficient test procedures into the development cycle of the product.
2 Collaborative Model Based Testing
In traditional local testing, for each task, test sequences are first defined according to the properties to check. Then, though almost entirely controlled by computer, they are executed manually and separately on the test bench made of the system under test (SUT) and all auxiliary peripherals (such as the power supplier, environmental equipment, and dedicated software) [3]. Automating the test procedures therefore consists in building a unified software framework that allows extracting test sequences from abstract specifications, executing them through interfaces with the physical test bench, and controlling and monitoring the automatic execution through a dedicated user interface. This centralized framework is also dedicated to improving the collaboration between all developers involved in the testing of the integrated product. It is today common practice in software development to begin with building a model of the targeted application, showing the architecture, functionalities and interdependencies. Out of this model, application components are straightforwardly generated, or manually implemented. Similarly, Model Driven Testing (MDT) creates models of the behavior of a SUT and uses generation tools for automatically deriving a set of computer-readable tests. These sets are further deployed to check for success or failure [4]. However, the theory of MDT is still an evolving field, with two practical unresolved issues [3]. First, test cases derived from the model are on the same high level of abstraction as the model, and cannot be directly executed on concrete test benches
essentially made of low-level communications. Second, testing a whole system is an experimental process, based on empirical heuristics that cannot be integrated into a general model or a generation tool. Consequently, there is no available MDT test environment suitable for a complex system developed on standard hardware/software interfaces such as CAN, LIN, or MOST [4]. By contrast, we develop a collaborative model based testing (CMBT) approach that resorts to abstract modeling as a reference to unify the objects and procedures involved in the test bench, but uses an intermediary layer to process the different levels of abstraction. This intermediary layer gives the freedom to parameterize and adjust test sequences. It is realized by a dedicated language, called the Test Bench Scripting Language (TBSL).
3 Test Bench Scripting Language
As an alternative to automatic test generation, TBSL constitutes the kernel of the CMBT approach and allows the parametric building of test sequences. The challenge is to keep enough genericity to interpret the high-level requirements of a wide range of hardware devices and use cases, while processing them into machine-executable instructions. Developed on an XML format, TBSL is an extended scripting language that includes 3 basic semantic categories of items: (1) definitions of low-level drivers, (2) extendable test description elements, and (3) execution instructions linking the items of the 2 preceding categories. It is implemented in 3 separate modules. The high-level interface accepts as inputs specifications and properties models written in UML and extracts the information into category 2 items. In embedded software engineering, several approaches are based on UML to describe complex systems with their properties and extract test patterns. In particular, [5] represents the SUT and test use cases in the shape of constrained state diagrams that are in turn described in UML format. Such information from UML can be compiled into TBSL items through the XMI intermediary format. The graphical user interface (GUI) allows the non-computer specialist to program test sequences from category 2 and 3 items with high-level commands. It also contains a database for archiving and retrieving test sets. Thus, some standardized packages, like the ISO UDS (Unified Diagnostic Services) test protocol for the automotive industry, are ready to be included and parameterized in a new TBSL program. This module is responsible for ensuring the coherence of the data from the different abstraction levels (model instances, checking heuristics, machine instructions). The low-level interface interacts with the concrete test bench with the help of category 1 and 3 items. It constitutes a Hardware Abstraction Layout where all communication protocols (such as CAN, LIN, USB, Ethernet…) with physical devices, masked from the user's view, are implemented. Two versions of TBSL are available: in standalone releases, protocols are addressed through standard or native APIs; a more general version is being developed on top of the Vector CANoe generic communication framework [6], and uses the Vector API to handle all low-level translation. In view of supporting the cooperation, TBSL modules are designed for deployment on a client-server architecture, allowing developers to share the view and the control
of the same physical test bench. This architecture also complies with contradictory requirements of the software: real-time hardware communications and the combinational complexity of processing generic models are handled on the server, while the resource-consuming GUI is executed locally.
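As an illustration of the three item categories, the snippet below shows what a small TBSL-like script could look like and how it might be walked programmatically. The tag names, attributes and the Python harness are hypothetical: the paper does not publish TBSL's actual syntax or API.

```python
# Hypothetical illustration only: the XML mirrors the three semantic categories
# described in the text (driver definitions, test descriptions, execution items).
import xml.etree.ElementTree as ET

SCRIPT = """
<tbsl>
  <drivers>                          <!-- category 1: low-level drivers -->
    <driver id="can0" protocol="CAN" bitrate="500000"/>
  </drivers>
  <tests>                            <!-- category 2: test description elements -->
    <test id="wakeup" description="SUT answers after power-on"/>
  </tests>
  <execution>                        <!-- category 3: links categories 1 and 2 -->
    <run test="wakeup" on="can0" timeout_ms="200"/>
  </execution>
</tbsl>
"""

root = ET.fromstring(SCRIPT)
for run in root.iter("run"):
    print("execute", run.get("test"), "via", run.get("on"),
          "timeout", run.get("timeout_ms"), "ms")
```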
Fig. 1. TBSL Architecture. The figure shows the UML models of specifications and properties, the graphical user interface for test parametrization and control, the TBSL compiler and interpreter (UML compilation, test programming, hardware control) with its generic test samples, the Hardware Abstraction Layout with API interfaces and connection modules (Ethernet, Serial, USB, CAN, SPI…), and the test bench hardware (SUT, power supply, other test devices).
Figure 1 reviews the whole architecture of the project. The hardware equipment consists of the SUT and all the equipment required for testing, such as a power supply or an oscilloscope. The hardware connection to the computer and its associated communication protocol can be of any kind (CAN, GMLAN…). Indeed, thanks to the Hardware Abstraction Layout, no matter what the protocol is, the communication is handled in the background. The abstract test cases resulting from the test selection and generation in UML format are translated into TBSL files containing all the information necessary for test execution. When the control of the hardware via the
abstraction layout is available, and when the whole test execution is formalized into scripts, the testing process of the SUT can be performed. Test parameterization and control are carried out through the GUI to monitor the whole execution.
4 Practical Experiments
The CMBT approach is being proven in partnership with the automotive supplier industry. Samples of hardware/software systems are selected to ensure the adequacy between the expected and realized specifications and to assess the results of the whole process. In particular, we have successfully run functional tests with a 3D Time-Of-Flight optical sensor, which represents about 63,000 lines of embedded code. The sensor is to be used aboard a vehicle to detect the position of the passenger, in view of improved control of the safety systems. The sensor has been specified using constrained state diagrams, from which we selected 4 use cases, dealing with power delivery, system start and reset, recognition of the passenger position… Thanks to this comprehensive benchmark, we were able to review all aspects of the TBSL development and interface platform. Each test iteration follows a 3-step procedure (a plain-code skeleton of this iteration is sketched after the list):
1. First, abstract test cases are generated through the analysis of the software requirements according to the method provided in [7] and [8]. They are first written as a UML2 class diagram, protocol state machines, and OCL constraints [9,10], then converted to XMI and finally translated into TBSL.
2. The second step consists in specifying constraints on the analysis model, resulting in a test selection model. The execution of the tests is realized in CANoe [6], after translation of the TBSL scripts into the CAN Access Programming Language (CAPL) [11] and the generation of a CANoe configuration to create the test environment in the software.
3. In a third step, a report containing the test case verdicts is created in order to keep a trace of the execution.
The testing procedure is detailed in Figure 2. These experimental trials have also illustrated how the model based approach supports collaborative work in the setting and execution of test procedures. We distinguish 4 teams in the product development: Design; Software Development; Hardware Integration; Production & Validation. Following the ISO layers classification and industry usage [12], the collaboration is called horizontal within the same team and vertical between teams. As regards horizontal collaboration, test sequences programmed in the GUI or retrieved from the associated database are used to organize teamwork efficiently in order to assign unit tests, share results, and avoid redundant checks. Vertical collaboration is facilitated by the use of a common model by all teams, which participate together in the setting of the tests and can filter the view of the model according to their abstraction level. In case of a modification, tests are re-generated and all developers have direct access to the pertinent information.
Tests & properties Models
XMI → TBSL
TBSL Interpreter
TBSL ↔ CAPL
CANoe API
XML format
Tests & Verdicts Reports Legend: Communications
Run-time components
Inputs/outputs
Fig. 2. Iterative Test Procedure
5 Conclusion
In view of the first results, the major outcomes of the approach are the easy handling of the test bench by non-expert users as well as its quick setup, even for a SUT at an early prototyping stage. Moreover, in case of new usage requirements, the test bench can be straightforwardly updated through the programming interface or recompiled from a released UML model. As TBSL is targeted at checking commercial products that are for the most part highly safety critical, it needs to guarantee 100% accuracy. The biggest challenge from now on is thus to exhibit a scientific proof of the total correctness of the design of the TBSL platform and the software it contains.
Acknowledgments. The authors wish to thank L. Federspiel and D. Wiseman, from IEE S.A. Luxembourg, for their constant and invaluable support of this work.
References 1. Lemke, K., Paar, C., Wolf, M.: Embedded Security in Cars, pp. 3-12 (2006) 2. Fujii, R., Wallace, D.: Software Verification and Validation. IEEE Computer Society Press, Los Alamitos (1996) 3. Tretmans, J.: Testing Concurrent Systems: A Formal Approach. In: Baeten, J.C.M., Mauw, S. (eds.) CONCUR 1999. LNCS, vol. 1664, p. 46. Springer, Heidelberg (1999)
4. Baker, P., et al.: Model-Driven Testing. Springer, Heidelberg (2007) 5. Guelfi, N., Ries, B.: Selection, Evaluation and Generation of Test Cases in an Industrial Setting: a Process and a Tool. In: Testing: IEEE Academic & Industrial Conference, Windsor, UK, pp. 47–51 (2008) 6. Vector CANoe, http://www.vector-worldwide.com/vi_canoe_en_html 7. Guelfi, N., Ries, B.: A Lightweight Model-Driven Test Selection Process Using a DSL for Small Size Reactive Systems. In: Proc. of MODELS Conference (2008) 8. Guelfi, N., Ries, B.: A Model-driven Test Selection Process with UML2 Class Diagrams and Protocol State Machines. In: Proc. of ISSTA Conference (2008) 9. Prenninger, W., Pretschner, A.: Abstractions for Model-Based Testing. In: Proc. of ENTCS Conference (2004) 10. Object Management Group, UML 2.0 testing profile, http://www.omg.org 11. Vector Informatik GmbH, Programming with CAPL (2004), http://www.vector-cantech.com 12. James, G.: Collaboration Models. Chip Design Magazine (2008)
A Comparative Analysis and Simulation of Semicompetitive Neural Networks in Detecting the Functional Gastrointestinal Disorder Yasaman Zandi Mehran1, Mona Nafari2, Nazanin Zandi Mehran3, and Alireza Nafari4 1
Islamic Azad University, Shahre-e-Rey Branch,Tehran, Iran [email protected] 2 Razi University of Kermanshah, Department of Electrical Engineering, Kermanshah, Iran [email protected] 3 Amir Kabir University of Technology, Department of Biomedical Engineering, Tehran, Iran [email protected] 4 Amir Kabir University of Technology, Department of Electrical and Mechanical Engineering, Tehran, Iran [email protected]
Abstract. The stomach has a complex physiology, in which physical, biological and psychological parameters take part; it is thus difficult to understand its behavior and function in normal subjects and in functional gastrointestinal disorders (FGD). In the area of competitive learning, a large number of models exist which have similar goals but differ considerably in the way they work. A common goal of these algorithms is to distribute a specified number of vectors in a high dimensional space. In this paper, several methods related to competitive learning have been examined by describing and simulating different data distributions. A qualitative comparison of these methods has been performed in the processing of the gastric electrical activity (GEA) signal and in classifying GEA types to discriminate two GEA types: normal and FGD. The GEA signals are first decomposed into components in different sub-bands using the discrete wavelet transform. Keywords: gastric electrical activity (GEA), Functional Gastrointestinal Disorder (FGD), wavelet transform, Neural Network.
1 Introduction
In recent years, researchers have developed powerful wavelet techniques for the multiscale representation and analysis of signals [1,2,3]. Wavelets localize the information in the time-frequency plane [4,5]. One of the areas where these properties have been applied is diagnosis [6]. Due to the wide variety of signals and problems encountered in biomedical engineering, there are various applications of the wavelet transform [7,8]. As in the heart, there exists a rhythmic electrical oscillation in the stomach [9]. Through the accomplishment of the whole digestive process of the stomach, from mixing,
stirring, and agitating, to propelling and emptying, a spatiotemporal pattern is formed [10]. The stomach has a complex physiology, in which physical, biological and psychological parameters take part and make it difficult to understand its behavior and function. The initial concepts of a mechanical prototype of the stomach have been presented and used to describe the mechanical functions of storing, mixing and food emptying [11,12]. The DWT decomposes a signal into signals of different coarseness. Features extracted from the wavelet coefficients are capable of effectively representing the characteristics of the original signal. In this paper we compare methods of applying competitive neural networks to the DWT coefficients of such time series [2-10]. The distinction between normal and FGD in the GEA has many applications, and interest in physically small implementations of real-time classification is growing for biomedical engineering instruments. Automated analysis of the GEA has received considerable attention during the past decades, and efficient and robust algorithms have been developed in medical applications to detect functional and structural abnormalities of the stomach [11,12]. These methods help clinicians and patients in the detection of abnormalities. As we presented previously, one of the common stomach abnormalities is the Functional Gastrointestinal Disorder [13,14,15]. For these purposes, we should extract a simple set of features which permits the classification of abnormalities. This paper presents techniques which use a competitive neural network to classify GEA types. In this approach, a preprocessor is used to extract features of the input GEA and FGD signals from their wavelet coefficients, before they are applied to the neural network classifier for further classification. Figure 1 shows a cycle of the six channels of normal and FGD GEA, respectively.
The next section of this paper describes the abilities of several neural networks to classify the GEA types for detecting abnormalities. Experimental results of training and testing a network are presented in Section 6. The models described in this paper have several common architectural properties, which are described in this section. The rest of the paper is organized as follows: analyses of different semi-competitive neural networks are introduced in Section 2; in Section 6, their algorithms are simulated and the experimental results are illustrated.
Fig. 1. a) A cycle of normal GEA of six channels; b) a cycle of FGD GEA of six channels
2 Methods
The discrete wavelet transform has been widely used in signal processing tasks. The wavelet transform, a type of time-frequency transform, is a good method to provide a reasonable compromise between temporal and frequency resolution. The major advantage of the DWT is that it provides great time and frequency localization. Moreover, the multi-scale feature of the DWT allows the decomposition of a GEA signal into different scales. A variety of different wavelet families have been proposed in the literature, and there are different types of wavelet transforms. In this work we used db4, coif4, bior3.7 and sym4 and compared the results. We analyzed the GEA signal using an induced wavelet transform method to determine the time-frequency (TF) pattern of the response. The DWT computations were performed using the MATLAB 7.0 Wavelet Toolbox.
3 Feature Extraction
For each type of GEA signal, an example of which is shown in Figures 2 and 3, the wavelet coefficients of a 3-level decomposition were extracted. We extracted 18 features from the coefficients of the three-level wavelet decomposition and from the GEA signals themselves, such as the energy, the zero crossings, the second local maximum of the autocorrelation, and the mean of some detail and approximation coefficients of the second or third level. As noted above, different wavelet families can be used; in this work we applied db4, coif4, bior3.7 and sym4, whose results are shown in Figures 2 and 3 and compared.
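The paper computes the decomposition with the MATLAB 7.0 Wavelet Toolbox; the sketch below reproduces the same kind of processing in Python with PyWavelets as an assumed stand-in. The exact composition of the 18 features is not spelled out, so the feature list here only illustrates the quantities named above (energy, zero crossings, second local maximum of the autocorrelation, per-sub-band means and energies).

```python
# Illustrative feature extraction for one GEA channel; not the authors' code.
import numpy as np
import pywt

def gea_features(x, wavelet="db4", level=3):
    x = np.asarray(x, dtype=float)
    # three-level DWT: coeffs = [cA3, cD3, cD2, cD1]
    coeffs = pywt.wavedec(x, wavelet, level=level)

    energy = float(np.sum(x ** 2))
    zero_crossings = int(np.sum(x[:-1] * x[1:] < 0))

    # value of the second local maximum of the normalized autocorrelation
    xc = x - x.mean()
    ac = np.correlate(xc, xc, mode="full")[len(x) - 1:]
    ac = ac / ac[0] if ac[0] != 0 else ac
    peaks = [i for i in range(1, len(ac) - 1) if ac[i - 1] < ac[i] > ac[i + 1]]
    second_peak = float(ac[peaks[1]]) if len(peaks) > 1 else 0.0

    features = [energy, zero_crossings, second_peak]
    for c in coeffs:                 # mean and energy of each sub-band
        features += [float(np.mean(c)), float(np.sum(c ** 2))]
    return np.array(features)
```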
Fig. 2. Normalized mean of 18 features extracted from wavelet coefficients of 10 GEA signals of each type for bior3.7
Fig. 3. normalized mean of 18 features extracted from wavelet coefficients of 10 GEA signals of each type for coif4
4 Competitive Neural Networks
4.1 Self Organizing Map
A SOM network defines a mapping from a high dimensional input data space onto a regular two-dimensional array of neurons. Every neuron $i$ of the map is associated with an $n$-dimensional reference vector $w_i = [w_{i1}, w_{i2}, \ldots, w_{in}]^T$, where $n$ denotes the dimension of the input vectors. The set of reference vectors is called a codebook. The neurons of the map are connected to adjacent neurons by a neighborhood relation, which dictates the topology, or structure, of the map. The most common topologies in use are rectangular and hexagonal. The network topology is defined by the set $N_i$ of the nearest neighbors of neuron $i$. In the basic SOM algorithm, the topology and the total number of neurons remain fixed. The total number of neurons determines the granularity of the mapping [16]. The learning process of the SOM is as follows [17]:
1- Determine the number of neurons, then assign a properly dimensioned weight vector to each one:
$$A = \{c_1, c_2, \ldots, c_N\} \qquad (1)$$
2-Select the input vector randomly, and then calculate the Euclidean distances to the weight vectors.
3- Select the winning neuron which is the closest to the input vector’s pattern:
$$s_2(\xi) = \arg\min_{c \in A \setminus \{s_1\}} \lVert \xi - w_c \rVert \qquad (2)$$
where $w_c$ is the weight vector of node $c$ and $\xi$ is the data sample presented to the network.
4- Update the weight vectors based on the following equations:
$$\Delta w_i = \epsilon(t)\, h_{rs}\, (\xi - w_c) \qquad (3)$$
$$\epsilon(t) = \epsilon_i \left(\epsilon_f / \epsilon_i\right)^{t / t_{\max}} \qquad (4)$$
where $\epsilon(t)$ is the time-dependent learning rate.
5- Increase the time parameter:
$$t = t + 1 \qquad (5)$$
6- If $t < t_{\max}$, go to step 2 and repeat the procedure.
If the neighborhood of the closest vector to the input data vector $x(t)$ at step $t$ is $N_c(t)$, then the reference vector update rule is the following:
$$m_i(t+1) = \begin{cases} m_i(t) + \alpha(t)\,[x(t) - m_i(t)], & i \in N_c(t) \\ m_i(t), & i \notin N_c(t) \end{cases} \qquad (6)$$
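A compact NumPy sketch of the on-line SOM loop described by Eqs. (2)-(5) is given below; the grid size, the exponentially decaying rates and the Gaussian form assumed for the neighborhood factor $h_{rs}$ are illustrative choices rather than the authors' settings.

```python
# Illustrative on-line SOM training loop (winner search, decaying learning rate,
# Gaussian neighborhood on the grid); parameters are arbitrary defaults.
import numpy as np

def train_som(data, grid=(6, 6), t_max=5000, eps_i=0.5, eps_f=0.01,
              sigma_i=3.0, sigma_f=0.5, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = grid
    n_features = data.shape[1]
    w = rng.random((rows * cols, n_features))                 # codebook
    # grid coordinates of every neuron, used by the neighborhood function
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], float)

    for t in range(t_max):
        xi = data[rng.integers(len(data))]                    # random sample
        winner = np.argmin(np.linalg.norm(w - xi, axis=1))    # best-matching unit
        frac = t / t_max
        eps = eps_i * (eps_f / eps_i) ** frac                 # cf. Eq. (4)
        sigma = sigma_i * (sigma_f / sigma_i) ** frac
        d2 = np.sum((coords - coords[winner]) ** 2, axis=1)   # grid distances
        h = np.exp(-d2 / (2.0 * sigma ** 2))                  # neighborhood h_rs
        w += eps * h[:, None] * (xi - w)                      # cf. Eq. (3)
    return w
```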
Fig. 4. normalized mean of 18 features extracted from wavelet coefficients of 10 GEA signals of each type for db4
Fig. 5. normalized mean of 18 features extracted from wavelet coefficients of 10 GEA signals of each type for sym4
4.2 Self Organizing Tree Algorithm
SOTA is based both on the Kohonen self-organizing maps and on the growing cell structure algorithms. The algorithm generates a mapping from a complex input space to a simpler output space. The input space is defined by the experimental input data, whereas the output space consists of a set of nodes arranged according to certain topologies, usually two-dimensional grids. One of the innovations of SOTA is that the output space is arranged following a binary tree topology. It incorporates the principles of the growing cell structure algorithms into this binary tree topology. The result is an algorithm that adapts the number of output nodes, arranged in a binary tree, to the intrinsic characteristics of the input data set. The growing of the output nodes can be stopped at the desired taxonomic level or, alternatively, they can grow until a complete classification of every sequence in the input data set is reached.
Algorithm
1- Initialize the system.
2- Present a new input.
3- Compute the distances to all external neurons (tree leaves). For aligned sequences, the distance between the input sequence $j$ and the neuron $i$ is:
$$d_{s_j c_i} = \sum_{l=1}^{L} \frac{1 - \sum_{r=1}^{A} s_j[r,l]\, c_i[r,l]}{L} \qquad (7)$$
where $s_j[r,l]$ is the value for the residue $r$ at position $l$ of the input sequence $j$, and $c_i[r,l]$ is the corresponding value for the residue $r$ of the neuron $i$.
4- Select the output neuron $i$ with minimum distance $d_{ij}$.
5- Update neuron $i$ and its neighbors. Neurons are updated as:
$$c_i(\tau+1) = c_i(\tau) + \eta_{\tau,i,j}\,\bigl(s_j - c_i(\tau)\bigr) \qquad (8)$$
where $\eta_{\tau,i,j}$ is the neighborhood function for neuron $i$.
6- If a cycle has finished, increase the network size: two new neurons are attached to the neuron with the highest resource. This neuron becomes a mother neuron and does not receive any more updates. The resource of each terminal neuron $i$ is calculated as the average of the distances between the input sequences assigned to this neuron and the neuron itself.
7- Repeat by going to Step 2 until convergence (resources are below a threshold).
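The growth cycle of steps 1-7 can be sketched for plain numeric feature vectors as below. This is a strong simplification: it updates only the winning leaf (the full algorithm also updates the sister and mother nodes) and uses the Euclidean distance instead of the sequence distance of Eq. (7).

```python
# Much-simplified SOTA-style growth cycle: leaves act as prototypes, and after each
# cycle the leaf with the largest resource (mean distance of the inputs it attracts)
# is replaced by two slightly perturbed children.
import numpy as np

def sota(data, cycles=4, epochs=20, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    leaves = [data.mean(axis=0).copy()]
    for _ in range(cycles):
        for _ in range(epochs):                                   # adaptation phase
            for x in data:
                k = int(np.argmin([np.linalg.norm(x - w) for w in leaves]))
                leaves[k] = leaves[k] + eta * (x - leaves[k])     # winner update, cf. Eq. (8)
        # resource of each leaf = mean distance of the inputs assigned to it
        assign = [int(np.argmin([np.linalg.norm(x - w) for w in leaves])) for x in data]
        res = [np.mean([np.linalg.norm(x - leaves[k])
                        for x, a in zip(data, assign) if a == k] or [0.0])
               for k in range(len(leaves))]
        mother = int(np.argmax(res))
        w = leaves.pop(mother)                                    # mother stops updating
        leaves += [w + rng.normal(0, 1e-3, w.shape), w + rng.normal(0, 1e-3, w.shape)]
    return leaves
```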
Fig. 6. Growing the feature map grid. Panel (a) shows the initial structure after the first organization stage; the boundary node with the highest error value is marked. (b) New nodes are grown into any open grid location that is an immediate neighbor of the error node. (c) After organizing the new structure with the standard self-organization process, a new error node is found. (d) Again, new nodes are grown into any open grid location that is an immediate neighbor of the error node.
4.3 Self Organizing Mapping Based on Fuzzy Similarity
Similarity measurement is a substantial part of the mapping and directly influences the mapping results. In SOM, the Euclidean distance is used to measure similarity, and precisely measuring the similarity between two objects is an essential step of SOM. Adaptive self-organized maps based on bidirectional approximate reasoning (ASOMBAR), a novel approach on a non-Euclidean space, are presented to improve the effectiveness of the competitive and cooperative processes. The matching criterion of the traditional SOM is correspondingly modified into a new fuzzy matching criterion, and the topological neighborhood is also redefined based on the new matching criterion and the new fuzzy distance [18].
Consider the realization of the bidirectional approximate reasoning method, which consists of forward reasoning and backward reasoning. Define the weight matrix w \in R^{l \times n}[0,1] as follows:

   w_j = [w_{j1}, w_{j2}, \ldots, w_{jn}]^T,   (j = 1, 2, \ldots, l)   (9)
We only describe the case of forward reasoning, since the case of backward reasoning is described similarly. When we consider the behavior of the BAR inference network, mainly the concept of forward/backward reasoning is used. The ordered weighted averaging (OWA) operator from aggregation theory is useful to combine fuzzy membership values in constraint satisfaction problems, scheduling and group decision making. In this research, BAR and the concept of OWA are introduced into our similarity distance definition. The novel fuzzy similarity distance is computed as follows:

1. For a given fuzzy input vector x = [x_1, x_2, \ldots, x_n]^T \in R^n[0,1], compute the similarity index vector Q_j = [q_{j1}, q_{j2}, \ldots, q_{jn}]^T (j = 1, 2, \ldots, l), where q_{ji} is the similarity index between the input x_i and the weight w_{ji}.
2. Compute the fuzzy similarity distance vector S = [s_1, s_2, \ldots, s_l]^T, where s_j is computed as

   s_j = \begin{cases} (2\lambda - 1)\,\max_i(q_{ji}) + \frac{2}{n}(1 - \lambda)\sum_{i=1}^{n} q_{ji} & \text{if } \lambda \ge 0.5 \\ (1 - 2\lambda)\,\min_i(q_{ji}) + \frac{2}{n}\,\lambda\sum_{i=1}^{n} q_{ji} & \text{if } \lambda < 0.5 \end{cases}   (10)
where λ is an important fuzzy parameter of the fuzzy similarity distance that decides how much the big factor and the other, smaller factors in one input vector influence the proposed fuzzy distance. λ can be understood as the parameter of the fuzzy membership function, which decides the different fuzzy impacts of all the factors (the big factor and the other factors) in the input vector. Since measuring the similarity between two objects is very important in knowledge processing, the proposed fuzzy similarity distance is more useful for knowledge processing applications than the simple Euclidean distance. A small sketch of Eq. (10) is given below.
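The sketch assumes the similarity index q_ji = 1 − |x_i − w_ji| for inputs and weights in [0, 1]; that particular choice is an assumption, since the index is not spelled out here, but the OWA-style combination follows Eq. (10) directly.

```python
import numpy as np

def fuzzy_similarity_distance(q, lam=0.7):
    """Eq. (10): q is an (l, n) array of similarity indices q_ji, lam is lambda."""
    n = q.shape[1]
    if lam >= 0.5:
        return (2 * lam - 1) * q.max(axis=1) + (2.0 / n) * (1 - lam) * q.sum(axis=1)
    return (1 - 2 * lam) * q.min(axis=1) + (2.0 / n) * lam * q.sum(axis=1)

# Example with an assumed similarity index q_ji = 1 - |x_i - w_ji|
x = np.array([0.2, 0.8, 0.5])
w = np.array([[0.1, 0.9, 0.4],
              [0.7, 0.2, 0.6]])
s = fuzzy_similarity_distance(1.0 - np.abs(x - w), lam=0.7)
```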
4.4 ASOMBAR

The main goals of both the ASOMBAR and the SOM are to transform an incoming signal pattern of an arbitrary dimension into a one- or two-dimensional discrete map, and to perform this transformation adaptively in a topologically ordered fashion.
Fig. 7. Mapping defined in SOM and ASOMBAR
The Algorithm of ASOMBAR
The ASOMBAR algorithm is summarized as follows:
1. Initialization: Choose random values for the initial weight vectors. The only restriction is that the weight vectors should be different; sometimes the weight vectors can be selected from the available set of input vectors in a random manner.
2. Similarity matching: Find the best-matching (winning) vector N at time k by using the new fuzzy matching criterion

   N = \begin{cases} \arg\max_{j=1,\ldots,l} \Big[ (2\lambda - 1)\,\min_i(x_i - w_{ji}) + \frac{2}{n}(1 - \lambda)\sum_{i=1}^{n}(x_i - w_{ji}) \Big] & \text{if } \lambda \ge 0.5 \\ \arg\max_{j=1,\ldots,l} \Big[ (1 - 2\lambda)\,\max_i(x_i - w_{ji}) + \frac{2}{n}\,\lambda\sum_{i=1}^{n}(x_i - w_{ji}) \Big] & \text{if } \lambda < 0.5 \end{cases}   (11)
3. Updating: Adjust the weight vectors of all neurons by using the update formula

   w_j(k+1) = w_j(k) + \eta(k)\, h_{j,R}(k)\, (x - w_j(k))   (12)

where \eta(k) is the learning-rate parameter and h_{j,R}(k) is the new topological neighborhood function. The learning-rate parameter \eta(k) should begin with a value close to 0.1; thereafter it should decrease gradually but remain above 0.01. Therefore we can choose the parameters as follows:

   \eta_0 = 0.1,   \tau_1 = 100 / \log \sigma_0,   \tau_2 = 200
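A compact Python sketch of the matching and update steps, transcribed directly from Eqs. (11)-(12) as reconstructed above; whether the (x_i − w_ji) terms should carry absolute values or be replaced by similarity indices is not fully clear from the source, so the code follows the formula literally, and the learning-rate and neighborhood schedules are left to the caller.

```python
import numpy as np

def asombar_winner(x, w, lam=0.7):
    """Fuzzy matching criterion of Eq. (11); returns the index of the winning node."""
    d = x[None, :] - w                      # (l, n) terms x_i - w_ji, taken literally
    n = w.shape[1]
    if lam >= 0.5:
        score = (2 * lam - 1) * d.min(axis=1) + (2.0 / n) * (1 - lam) * d.sum(axis=1)
    else:
        score = (1 - 2 * lam) * d.max(axis=1) + (2.0 / n) * lam * d.sum(axis=1)
    return int(np.argmax(score))

def asombar_update(x, w, eta, h):
    """Eq. (12): h is the length-l neighborhood vector h_{j,R}(k) around the winner."""
    return w + eta * h[:, None] * (x - w)
```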
5 Simulation as Calibration

This section demonstrates the performance of the presented networks on a number of test problems. An artificial dataset is designed to show how the algorithms learn. In order to evaluate the performance and structure of the networks mentioned above, two kinds of simulations have been performed, using MATLAB version 7. Simulations on two-dimensional data from three different clusters lead to performance evaluations and conclusions about the methods. First, 120 two-dimensional data points are clustered into three datasets. The training of the fuzzy SOM (SOMf) network is performed in 25 epochs and the training of the other networks is performed in 7 epochs. The number of neurons is 9 at the end of training. Figure 9 illustrates the comparison of dataset clustering for the different networks, and Table 1 gives the mean value of the performance evaluation error and the training error for each network. For the SOMση network simulation, the following equations are used for determining σ and η:

   \eta = \eta_i \left( \eta_f / \eta_i \right)^{t/t_{max}}

where η is the learning-rate parameter. The standard deviation σ of the Gaussian neighborhood is varied according to

   \sigma = \sigma_i \left( \sigma_f / \sigma_i \right)^{t/t_{max}}
where η_i and η_f are the initial and final values of the learning rate, t_max is the total number of adaptation steps, and σ_i and σ_f are suitable initial and final values of σ. The second example concerns 0-9 digit recognition (Figure 8 illustrates a digit for simulation with 10% noise and without noise). In this example the performance of the simulated networks is compared, and the evaluation and training errors are calculated as shown in Table 2. A digit recognition system is proposed in which each digit has a matrix used to create the input vectors; the matrix size is then changed to 10×8 and the zeros are replaced by −1. For each digit, 200 matrices are created, each being the original matrix with 10% white noise added. By reshaping each matrix to 1×80 and concatenating them, the input matrix is created. The number of neurons is 6 and the training is performed in 7 epochs. Each digit is mostly selected by one neuron, which has to be specified or introduced as the representative of that digit. Any samples of a cluster that do not have a representative are counted as errors, so the error percentage is determined over the whole cluster; the evaluation error is determined in this way. The vectors are randomized 10 times, the evaluation and training errors are calculated each time, and finally the mean of these ten values is taken. According to the results, the best SOM-based network is SOMση, because its evaluation error is considerably lower than that of the other methods. A sketch of the digit-input construction is given below.
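As a worked illustration of how the digit inputs described above can be assembled, the sketch below builds, for each digit template, 200 copies of the 10×8 matrix with zeros replaced by −1 and 10% of the entries flipped as white noise, reshaped to 1×80 rows; the digit templates themselves are assumed and not taken from the paper.

```python
import numpy as np

def make_digit_dataset(templates, copies=200, noise=0.10, seed=0):
    """templates: dict digit -> (10, 8) binary array. Returns (n_digits*copies, 80) in {-1, +1}."""
    rng = np.random.default_rng(seed)
    rows = []
    for digit in sorted(templates):
        base = np.where(templates[digit] > 0, 1.0, -1.0).ravel()  # zeros replaced by -1
        for _ in range(copies):
            sample = base.copy()
            flip = rng.random(sample.size) < noise                # 10% white noise
            sample[flip] *= -1
            rows.append(sample)
    return np.vstack(rows)
```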
Fig. 8. A digit for simulation: a) "zero" without noise; b) "zero" with 10% noise.
6 Discussion

The results for two different clusters of the GEA signal lead to performance evaluations and conclusions about the methods. First, the 18 features for 10 normal and 10 FGD signals for each of the six channels are clustered into two datasets. The training of the fuzzy SOM network is performed in 15 epochs and the training of the other networks is performed in 11 epochs.

Table 1. The mean value of performance evaluation error and training error for each network

                         SOMση    SOM      SOM fuzzy
  Training error %       0.89     4.589    5.36
  Performance error %    0.55     3.14     2.12
Fig. 9. Comparison of dataset clustering (weight vectors W(i,1) vs. W(i,2)) for different networks: a) SOM, b) SOMf, c) SOMση.
Table 2. The percentage of error for 10 different training vectors and two states of noise

  Network (noise)           1       2       3       4       5       6       7       8       9       10
  SOMf  (without noise)     9.937   11.25   9.937   6.5     15.75   16.81   9.56    4.625   12.93   9.62
  SOMf  (with noise)        10.25   10.50   12.25   7.25    13.75   15.25   10      4.25    10.75   11.75
  SOMση (without noise)     0.312   0.312   0.625   0.75    0.375   0.812   0.687   0.562   0.50    0.562
  SOMση (with noise)        1.75    1       0.25    0.50    13.75   0.50    0.75    0.75    1       1
  SOM   (without noise)     2       3.812   2.562   3.687   1.625   2.625   5.750   4       1.875   8.687
  SOM   (with noise)        1.75    4.750   2       4       0.750   1.750   0.75    3       1.750   7.500
Acknowledgment. The authors gratefully acknowledge the financial support of this research, provided by the Islamic Azad University, Shahr-e-Rey Branch, Tehran, Iran.
References [1] Chui, C.K.: Wavelets: A Tutorial in Theory and Applications. Academic Press, Boston (1992b) [2] Kaiser, G.: A Friendly Guide to Wavelets. Birkhauser, Basel (1994); ISBN 0-8176-3711-7 [3] Daubechies, I.: Orthonormal Bases of Compactly Supported Wavelets. Communications on Pure and Applied Math. 41, 909–996 (1988) [4] Daubechies, I.: Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics (1992); ISBN 0-89871-274-2 [5] Wickerhauser, M.V.: Adapted Wavelet Analysis From Theory to Software. A K Peters Ltd, Stanford (1994); ISBN 1-56881-041-5 [6] Addison, P.S.: The Illustrated Wavelet Transform Handbook, Institute of Physics (2002); ISBN 0-7503-0692-0 [7] Vaidyanathan, P.P.: Multirate Systems and Filter Banks. Prentice Hall, Englewood Cliffs (1993); ISBN 0-13-605718-7 [8] Ruskai, M.B.: Wavelets and their Applications. Jones and Bartlett, Boston (1992) [9] Mallat, S.G.: A Theory for Multiresolution Signal Decomposition: The Wavelet Representation. IEEE.Transactions on Pattern Analysis and Machine Intelligence 11, 674–693 (1989) [10] Wang, Z.S., Cheung, J.Y., Gao, S.K., Chen, J.D.Z.: Spatio-temporal Nonlinear Modeling of Gastric Electrical Activity. In: The 3rd International Workshop on Biosignal Interpretation (BSI 1999), Chicago, USA, June 12-14 (1999) [11] Wang, Z.S.: Blind separation of slow waves and spikes from gastrointestinal electrical recordings. IEEE Trans. Inform. Technol. Biomed. 5, 133–137 (2001)
[12] Wang, Z.S., He, Z.Y., Chen, J.D.: Optimized Over Complete Signal Representation and Its Applications to Adaptive Time-Frequency Analysis of EGG. Ann. Biomed. Eng. 26, 859–869 (1998) [13] Zandi, Y., Nasrabadi, M.A.M.: Neural Network Application in Strange Attractor Investigation to Detect a FGD. In: 4th International IEEE Conference "Intelligent Systems" (2008) [14] Zandi, Y., Nasrabadi, M.A.M., Jafari, A.H.: Fuzzy Neural Network for Detecting Nonlinear Determinism in Gastric Electrical Activity: Fractal Dimension Approach. In: 4th Inter-national IEEE Conference "Intelligent Systems" (2008) [15] Zandi, Y., Nasrabadi, M.A.M., Hashemi, M.R.: An Investigation on Chaotic Approach to Detect Functional Gastrointestinal Disorders. In: Proceedings of the 29th Annual International Conference of the IEEE EMBS Cité Internationale, Lyon, France, pp. 23–26 (August 2007) [16] Boinee, P., Angelis, A.D., Milotti, E.: Automatic Classification using Self-Organising Neu-ral Networks in Astrophysical Experiments. Dipartimento di Fisicadell’Universit‘a di Udine e INFN, Sez.di Trieste, Gruppo Collegato di Udine, via delle Scienze 208, 33100 Udine, Italy [17] Marsland, S., Shapirob, J., Nehmzowc, U.: A self-organizing network that grows when required. Neural Networks 15, 1041–1058 (2002) [18] Liu, Y.: Adaptive self-organized maps based on bidirectional approximate reasoning and its applications to information filtering. Knowledge-Based Systems 19, 719–729 (2006)
Individuals Identification Using Tooth Structure

Charbel Fares, Mireille Feghali, and Emilie Mouchantaf

Holy Spirit University of Kaslik, Faculty of Sciences, P.O. Box 446 Jounieh, Lebanon
[email protected], [email protected]
http://www.usek.edu.lb
Abstract. The use of automated biometrics-based personal identification systems is now an omnipresent procedure. Many technologies are no longer secure, and they have certain limitations, such as cases where bodies are decomposed or burned. Dental enamel is one of the most mineralized tissues of an organism and has post-mortem degradation resistance. In this article we describe dental biometrics, which utilizes dental radiographs for human identification. The dental radiographs provide information about teeth, including tooth contours, relative positions of neighboring teeth, and shapes of the dental work (crowns, fillings, and bridges). We then propose a new system for dental biometry that consists of three main stages: segmentation, feature extraction and matching. The feature extraction stage uses a grayscale transformation to enhance the image contrast and a mixture of morphological operations to segment the dental work. The matching stage consists of the edge and dental work comparison. Keywords: Tooth Print, Contrast Enhancement, Morphological Operations, Image Processing.
1 Introduction
Biometrics is a technology for the identification or authentication of a person that transforms a biological, morphological or behavioral characteristic into a digital value. Biometric identification techniques are used primarily for applications in the field of security, such as automated access control, counterterrorism, control of the movement of persons and the biometric passport; however, this control of individuals raises ethical questions. At present several biometric technologies are in use, but several of these techniques have many drawbacks, such as the copying or imitation of impressions and the time complexity, as for DNA. In addition, none of these technologies is capable of identifying individuals in serious disasters such as fire or tsunami, or of identifying bodies after a certain time has passed since death. This is because of the distortion or disappearance of the biometrics. Dental biometrics is one of the rare techniques that might solve this problem. In
Collaboration between the Faculty of Sciences of the Holy Spirit University of Kaslik and the School of Dentistry of the Lebanese University.
addition to the tooth structure of each individual, we can use for the purpose of identification other unique features such as number of teeth, the contour, volume, size, dental work, root, and the distance between teeth. The rest of the paper is structured as follows. In section 2 we draw up an inventory of current methods and techniques in the field of dental biometrics and we quote their benefits and drawbacks. The implementation of the proposed identification system by the tooth structure is detailed in section 3. Finally, conclusions and future works are shown in section 4.
2 Previous Work
We all search for a secure world where no one is afraid that his bank account or simply his email account is going to be used by someone else. Researchers have been working on systems to help protect the privacy of humans. Many ideas were implemented, such as DNA, iris recognition, fingerprint and keystroke. Such systems are excellent when used in normal situations, but each has disadvantages: DNA chains are complex to extract and fingerprints are easily copied. Facing those problems, researchers started looking for techniques where degradation is minimal, and they found that dental biometrics is what they need. In our situation many reasons lead us to adopt dental biometrics:
– It is not easy to imitate, and we can always modify the dental structure of a person in order to keep it unique.
– Dental biometrics is precise and correct.
– We do not need an expert to treat and compare the obtained results.
– Dental biometrics is the only technique that can be used on both corpses and living humans.
We can classify previous works on dental biometrics into three categories: Hunter-Schreger Bands (HSB), X-ray and Computed Tomography.
2.1 Hunter-Schreger Bands
For most mammals the enamel is characterized by layers of prisms with alternating directions known as Hunter-Schreger bands (HSB). These layers appear as dark and light bands under a low-power light microscope. This occurs because the enamel prisms behave like fibers when exposed to a directed light source. When observed from the outside, the dark and clear lines of the HSB in the enamel are seen like a fingerprint. Because of its resemblance to a real fingerprint, the structure of the HSB is referred to as a tooth-print. This similarity pushed researchers to create systems for identification or verification using the tooth-print [1]. This type is shown in Fig. 1.
Fig. 1. Hunter-Schreger Bands: (a)teeth in low power light; (b)microscope view; (c)contrast enhancement of the fingerprint; (d)toothprint
2.2 X-Ray
X-ray dental radiographs are extensively used for teeth restoration and human identification. However, these images provide little information that might help in understanding the tooth structure. Studies were made on dental radiographic films, bitewing and periapical, which contain a limited number of teeth and impose the need to take multiple images to construct the dental atlas of an individual. In addition, they are based on a semi-automatic identification: the user must select the center of each tooth. It should be noted that the resolution of these images is higher than that of a panoramic image, so details are highlighted [2] [3]. This type is shown in Fig. 2.
Fig. 2. X-Ray
2.3 Computed Tomography
Computed tomography (CT) is a medical imaging method using the process of tomography, where geometry processing is used to generate a three-dimensional image of the inside of an object from a series of two-dimensional X-ray images taken around a single axis of rotation. With the development of this technique, CT images provide much more information for the visualization of teeth and dental treatment. Consequently, segmentation and manipulation of the CT images of each tooth become a difficult issue [4]. This type is shown in Fig. 3.
Fig. 3. Computed Tomography
3 Proposed Solution
Our proposed solution consists of an automated identification prototype based on the tooth structure extracted from a panoramic radiographic image. This prototype, named DAIS (Dental Automatic Identification System), is seen as a collection of components as shown in Fig. 4. The model can be divided into two phases: feature extraction and identification. In the first phase, we start by entering a dental panoramic image, followed by the extraction of the Region Of Interest (ROI), and then we enhance the image contrast. After that, a segmentation of the two jaws is performed to end up with isolated teeth. The original image is saved in a database named DRI (Digital Repository of Images). After separating the teeth, we continue with the extraction of high-level features, such as the number of teeth and the dental work. If it is a registration process, we continue with the extraction of contours and the results are finally stored in the archive database. If it is an identification process, we check the similarity of the high-level features and a list of candidates is obtained. In case the list is not empty we move to the comparison of the low-level features in order to obtain a simplified list and consequently one person.
3.1 Image Acquisition
There are different types of radiographic equipment, and each hardware unit is characterized by a resolution or picture quality. In this study we are using X-ray panoramic images obtained from the "Kodak 8000 Digital Panoramic and Cephalometric System". The images have the following dimensions: 2621x1244 pixels.
3.2 ROI Selection
The objective of this study is to identify individuals from their dental structure, so ROI is represented only by teeth that will be analyzed. Two methods were proposed to select the ROI, namely a static and a dynamic method.
Fig. 4. Dental Automatic Identification System (DAIS)
Static Method: This method is based on defining a fixed point and a fixed rectangle for all images. After several trials, the point P was chosen to be (500px, 450px) and the rectangle to be of width 700px and height 1750px. Theoretically, this method can be effective, but practically it fails for some images due to variation in the position of the individual in relation to the device. Fig. 5 highlights the ROI inside a frame using the static method.
Fig. 5. Static ROI
Dynamic Method: This method is based on locating the void between the two jaws. This location is obtained through four steps [5]:
– Apply histogram equalization in order to improve the contrast of the image. Fig. 6-a shows the result of this step.
– Apply a threshold value of 0.4 (an empirically obtained value). Fig. 6-b shows the resulting black and white image.
– Horizontally integrate the result by summing the pixel intensities of each image line. As the lines containing the gap between the two jaws are the darkest, we focus on the minimum value of the sum of intensities, as shown in Fig. 6-c.
– Once the minimum is obtained, we can define the center of the rectangle in the image. The dimensions will be as stated previously (700px, 1750px). Fig. 6-d shows the selected ROI.
A small sketch of this projection step is given after the list.
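The sketch below follows the four steps just listed with OpenCV; the horizontal placement of the rectangle and the image file name are assumptions, since the text only fixes the rectangle's vertical center and its size.

```python
import cv2
import numpy as np

def dynamic_roi(panoramic, width=700, height=1750, thresh=0.4):
    """Equalize, binarize at 0.4, find the darkest row (jaw gap), crop a centred rectangle."""
    equalized = cv2.equalizeHist(panoramic)                                           # step 1
    _, binary = cv2.threshold(equalized, int(thresh * 255), 255, cv2.THRESH_BINARY)   # step 2
    row_sums = binary.sum(axis=1)                                                     # step 3
    gap_row = int(np.argmin(row_sums))                  # darkest row = gap between the jaws
    x0 = max(panoramic.shape[1] // 2 - width // 2, 0)   # horizontal centring is assumed
    y0 = max(gap_row - height // 2, 0)                  # step 4: rectangle centred on the gap
    return panoramic[y0:y0 + height, x0:x0 + width]

# roi = dynamic_roi(cv2.imread("panoramic.png", cv2.IMREAD_GRAYSCALE))
```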
3.3 Jaws Segmentation
The requested final result of the segmentation is to obtain every single tooth in one segment. This helps us to define the ROI associated with each tooth. For simplicity, we assume that there is a single row of teeth in each of the maxillary and the mandible. This hypothesis holds except in the case of young children whose dental structure is not yet finalized, as shown in Fig. 7. For the jaws segmentation we simply apply the dynamic method of the ROI selection as stated previously. Two separate images will be the result, as shown in Fig. 8. This is an easy-to-implement method but, as shown, it is still not effective. A new segmentation technique is therefore implemented, named "parts segmentation".
Fig. 6. Dynamic ROI: (a)Equalized image; (b)Binary Image; (c)Horizontal Projection; (d)Selected ROI
Fig. 7. Panoramic Image of a Child whose Dental Structure is not yet Finalized
Fig. 8. Steps of Jaws Segmentation: (a) Original Image; (b) Equalized Image; (c) Binary Image; (d) Horizontal Projection; (e) The Mandible Section; (f) The Maxillary Section
Parts Segmentation of Jaws: Let us first consider the simple case shown in Fig. 9-a. We sum the intensities of pixels along each line parallel to the horizontal axis. Teeth usually have a gray-level intensity that is higher than the jaws and other tissues in the image because of their higher density, so the gap between the maxillary and the mandible forms a valley in the horizontal projection. However, there could be many valleys in the projection, so to detect the right valley an initialization point is required. The procedure for detecting the gap valley is as follows:
Fig. 9. Parts Segmentation of Jaws
We initialize a first position y, which is the estimated position of the gap between the maxillary and the mandible, equal to half of the number of rows of the image. Let Vi, with i = 1, 2, ..., m, be the valleys detected by the projection of the histogram, with Di as the depth and Vi as the position, as shown in Fig. 10. Final results are shown in Fig. 11 and Fig. 12.
Fig. 10. Horizontal Projection used to Detect the Gap between Maxillary and Mandible
3.4 Improved Contrast of Dental Films
The histogram equalization was sufficient for the selection of the ROI, and the separation of the two jaws, but it was impossible to segment the teeth without contrast enhancement. Most of the teeth are concentrated mainly in the upper gray-scale values, while the support areas and bone appear around mid gray-scale values, and air gap areas are limited to the lower gray-scale values.
Fig. 11. Detected Gap Projected on X-Ray Images
Fig. 12. Results of Parts Segmentation Method
3.5 Morphological Operation: Hat Transform
The hat transform can be used to improve the contrast. There are two known hat transforms: the top-hat and the bottom-hat. The top-hat operation (Fig. 13-a) is the result of subtracting the opening of the image from the original image.
Fig. 13. (a) Top Hat; (b) Bottom Hat
The bottom-hat operation (Fig. 13-b), in contrast, is defined as the closing of the image minus the original image; the bottom-hat transformation can be seen as acting on the complement of the image. The top-hat operation removes the dark background and highlights the foreground objects [6]. The top-hat and bottom-hat filters can thus be used to extract bright objects (or, conversely, dark ones) on a dark (or light) background. We use both the top-hat and the bottom-hat filter on the original image, then combine the results by adding the result of the top-hat filter to the original image and subtracting the result of the bottom-hat filter. Fig. 14 shows an example where the teeth are enhanced and the bones are removed.
Fig. 14. Enhanced Image
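A minimal OpenCV sketch of this enhancement (original + top-hat − bottom-hat); the size and shape of the structuring element are assumptions, not values from the paper.

```python
import cv2

def enhance_teeth(gray, kernel_size=31):
    """Contrast enhancement: add the top-hat result and subtract the bottom-hat result."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    top_hat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, kernel)       # bright details (teeth)
    bottom_hat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)  # dark details
    return cv2.subtract(cv2.add(gray, top_hat), bottom_hat)
```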
3.6 Adaptive and Iterative Threshold
The method starts by applying iterative thresholding, followed by adaptive thresholding, to segment the teeth from both the background and the bone areas. The result of the iterative thresholding still contains fragments of bone in the teeth areas. To improve the segmentation results, we follow by applying an adaptive thresholding after masking the original image with the binary image produced by the iterative thresholding. The end result is a binary image. During the thresholding process, pixels in an image are marked as "object" if their value exceeds a certain threshold: typically, a value of "1" is given for an object and a value of "0" for the background. The key parameter in the thresholding process is the choice of the threshold value. Different methods for selecting a threshold exist: users can manually select a threshold value, or a thresholding algorithm can automatically calculate a value, which is known as automatic thresholding. A simple method would be to choose the mean or median, knowing that if the object pixels are brighter than the background, they should also be brighter than the mean. A more sophisticated approach is to create a histogram of the image pixel intensities and use the valley point as the threshold. The histogram approach assumes that there is one average value for the background and another one for the objects. Hence the iterative thresholding is used to segment the teeth from the background [7]. The iterative thresholding begins by detecting the contours of the original image with the Canny method and applying a binary morphological dilation to the result. After obtaining the dilated image, the average value of the corresponding pixels in the original image is used as a first threshold, noted T1. The second threshold is T2 = 0.66 T1, and the last one is T3 = 0.33 T2 [8]. Then we apply adaptive thresholding. Finally we filter out unwanted pixels by applying a mix of morphological operations (dilation, erosion, opening, closing, ...). Results are shown in Fig. 15.
Fig. 15. Iterative and Adaptive Threshold
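A hedged OpenCV sketch of the two stages; the Canny limits, the dilation kernel, the adaptive block size and which of the thresholds T1-T3 is finally applied are assumptions, since the text does not pin them down.

```python
import cv2
import numpy as np

def iterative_threshold(gray):
    """Canny edges, dilation, then T1 = mean intensity under the edges, T2 = 0.66*T1, T3 = 0.33*T2."""
    edges = cv2.Canny(gray, 50, 150)
    dilated = cv2.dilate(edges, np.ones((3, 3), np.uint8))
    t1 = float(gray[dilated > 0].mean()) if np.any(dilated) else float(gray.mean())
    t2 = 0.66 * t1
    t3 = 0.33 * t2
    _, binary = cv2.threshold(gray, t3, 255, cv2.THRESH_BINARY)   # applying T3 here is an assumption
    return binary

def adaptive_refine(gray, binary):
    """Mask the original image with the iterative result, then adaptive-threshold the masked image."""
    masked = cv2.bitwise_and(gray, gray, mask=binary)
    return cv2.adaptiveThreshold(masked, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                 cv2.THRESH_BINARY, 51, -5)
```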
3.7 Tooth Separation
Integration Method: The method used to isolate each tooth from its neighbors is similar to the method used to separate the upper and lower jaws [8]. A curve that defines the boundaries of each row of teeth is determined; results for the mandible are shown in Fig. 16. This curve is simply the vertical projection.
Fig. 16. The projection Technique Used for the Integration Method of the Tooth Separation
Fig. 17. Results of the Integration Method
Gaps between neighboring teeth causing valleys in the projection are represented by the minimum points of the projection marked in green in Fig. 16-b and Fig. 17. By the location of these valleys, the teeth can be segmented. A similar procedure is then used to segment the teeth of the maxillary but adapting different morphological operations since the structure of the maxillary teeth is different than the mandible (tooth size, rotation). Results of segmentation are shown in Fig. 18 and Fig. 19.
Fig. 18. Maxillary Segmentation
Fig. 19. Mandible Segmentation
3.8 Features Extraction
Three different features were extracted: the dental work performed on the individual, the number of dental screws, and the detection of missing teeth. Detection of Dental Work: For the detection of dental work, it is sufficient to isolate all the objects whose intensity exceeds 0.95; this value was found empirically. After the isolation of these objects, which can be seen as features unique to each individual, we extract the position, volume, and center of each dental work. These results are saved for future reference (Fig. 20).
Fig. 20. Dental Work Extraction
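A sketch of this step using connected components; the area is used as a stand-in for the "volume" feature, and the 0.95 threshold is applied relative to an 8-bit intensity range, which is an assumption.

```python
import cv2

def detect_dental_work(gray, rel_threshold=0.95):
    """Isolate very bright objects and report the centre and area of each one."""
    _, bright = cv2.threshold(gray, int(rel_threshold * 255), 255, cv2.THRESH_BINARY)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(bright)
    return [{"centre": tuple(centroids[i]), "area": int(stats[i, cv2.CC_STAT_AREA])}
            for i in range(1, n)]                     # label 0 is the background
```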
Detection of Dental Screws: The screw is one of the features that can be used for the purpose of identifying people. For the detection of screws we first remove all objects that have a high intensity. Then we apply a correlation with the image of the screw and analyze the correlation result, searching for local maxima. If a maximum reaches a certain threshold, then it is a screw. Note that
Fig. 21. Screw Work for the Maxillary
the screws of the maxillary are different from those of the mandible. The steps towards the identification in the maxillary are modeled in Fig. 21. It is clearly shown that the correlation result has led to only one peak above the threshold value, and consequently one screw is detected. Note that, for valid results, we must have a diversity of screw forms. Detection of Missing Teeth: The number of teeth, which is influenced mainly by torn or missing teeth, and their position are the oldest method of identifying an individual by dental biometrics. We expand it by adding the measurement of the distance between the missing teeth. Because of the variation in the size of teeth between the two jaws, it is necessary to develop two different algorithms, one for the mandible and another for the maxillary. The method consists of taking only part of the teeth in order to get rid of noise (Fig. 22). As the absence of a tooth
Fig. 22. Detection of Missing Teeth
will probably leave an empty space in its place, it is the latter that should be sought. From the vertical projection, one can determine the number of black pixels in each column; if their number reaches a certain threshold, a missing tooth is detected. Note that this threshold value varies for mandibular molars, incisors and canines, because of the difference in size and volume. For the maxillary, a single threshold is sufficient for the detection of these teeth because they have a similar size. A small sketch of this projection test is given below.
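The sketch below flags candidate missing-tooth gaps from the vertical projection of a binarized jaw image; the threshold value is left as a parameter, to be tuned per tooth group as discussed above.

```python
import numpy as np

def find_missing_teeth(jaw_binary, gap_threshold):
    """Columns whose count of dark pixels exceeds gap_threshold are merged into gap regions."""
    dark_per_column = (jaw_binary == 0).sum(axis=0)
    gap_columns = np.where(dark_per_column > gap_threshold)[0]
    regions, start, prev = [], None, None
    for c in gap_columns:
        if start is None:
            start = prev = c
        elif c == prev + 1:
            prev = c
        else:
            regions.append((start, prev))
            start = prev = c
    if start is not None:
        regions.append((start, prev))
    return regions                     # list of (first_column, last_column) gaps
```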
4 Conclusions and Future Works
In this paper we were interested in identifying individuals through the analysis of dental radiographs. First, a literature review of dental biometrics was performed. Following this study, we implemented, described and discussed an identification method using dental biometrics. This method is composed of five stages: acquisition of the panoramic image, segmentation of the teeth, extraction of high-level features (such as the dental work), extraction of low-level features, and finally the comparison. Note that the acquisition of high-resolution images makes the segmentation and feature extraction easier. The results are so far sufficient to identify individuals in a small population or for verification purposes. Dental biometrics is a new identification tool and a promising technique that is still under study, implementation and generalization. Several techniques have been proposed and implemented previously, but they were semi-automated and require several bitewing images to form the atlas. The novelty of our system is that it is based on a single panoramic X-ray image and is automated. As future work, the system has to be tested on a larger variety of populations. We also propose the superposition of the panoramic image with a color image in order to find color differences; these differences are widely used nowadays to test many vital signs. This superposition will also give us a unique specification of a person, which is the degradation of color from the teeth to the mandible or maxillary. Currently we are working on detecting the form of the teeth as well as the form and length of the roots. Acknowledgments. Part of this work was done with Alain Egho and Abdo Daccache as a final year engineering project.
References 1. Ramenzoni, L., Line, S.: Automated biometrics-based personal identification of the Hunter-Schreger bands of dental enamel, vol. 273(1590), pp. 1155–1158. The Royal Society (2006) 2. Nassar, D., Abaza, A., Li, X., Ammar, H.: Automatic Construction of Dental Charts for Postmortem Identification. IEEE Transactions on Information Forensics and Security 3(2), 234–246 (2008) 3. Tohnak, S., Mehnert, A., Mahoney, M., Crozier, S.: Synthesizing Dental Radiographs for Human Identification. Journal of Dental Research 86(11), 1057–1062 (2007)
4. Kirzioglu, Z., Karayilmaz, H., Baykal, B.: Value of Computed Tomography (CT) in Imaging the Morbidity of Submerged Molars: A Case Report. European Journal of Dentistry 1(4), 246–250 (2007) 5. Abdel-Mottaleb, M., Nomir, O., Nassar, D., Fahmy, G., Ammar, H.: Challenges of Developing an Automated Dental Identification System, vol. 262 (2004) 6. Sulehria, H., Zhang, Y., Irfan, D.: Mathematical Morphology Methodology for Extraction of Vehicle Number Plates. International journal of computers 1(3), 69–73 (2007) 7. Shafait, F., Keysers, D., Breuel, T.: Efficient Implementation of Local Adaptive Thresholding Techniques Using Integral Images. Image Understanding and Pattern Recognition Research Group (2008) 8. Said, E., Fahmy, G., Nassar, D., Ammar, H.: Dental X-ray Image Segmentation, vol. 262. Lane Department of Computer Science and Electrical Engineering West Virginia University, Morgantown (2004)
Modeling Interpolated Distance Error for Clutter Based Location Estimation of Wireless Nodes

A. Muhammad (1,2), M.S. Mazliham (3), Patrice Boursier (2,4), and M. Shahrulniza (5)

1 Institute of Research and Postgraduate Studies, UniKL (BMI), Kuala Lumpur, Malaysia, [email protected]
2 Laboratoire L3i, Université de La Rochelle, La Rochelle, France, [email protected]
3 Malaysia France Institute (UniKL MFI), Bandar Baru Bangi, Malaysia, [email protected]
4 Faculty of Computer Science & IT, Universiti Malaya, Kuala Lumpur, Malaysia
5 Malaysian Institute of Information Technology (UniKL MIIT), Kuala Lumpur, Malaysia, [email protected], [email protected]
Abstract. This research work focuses on location estimation of wireless nodes in different clutters/terrains by using interpolated distance calculation and its error. It builds on our previous findings, in which we divided the clutters/terrains into thirteen different groups. As radio waves behave differently in different terrains, we recorded data points in all of them. A C# program was used with a WiFi (IEEE 802.11) prototype to record the behavioral changes in radio signals caused by atmospheric attenuation. We took readings in all clutters at a regular interval of 10 meters (from 10 m to 150 m). In the current research, we use linear interpolation to calculate distances other than the regular intervals with a minimal error rate. First, we calculate the actual distance based on received radio signals at random positions and then compare it with our interpolated distance and calculate the error. We take a sample size of four in each terrain and divide the interpolated data into six zones for accuracy. This comparison table helps to estimate the location of a wireless node in different terrains/clutters with an error as low as 0.5 meter. Keywords: Clutters/Terrains, Location Estimation, attenuation, receive signal strength, interpolation.
1 Introduction

Over the past decade, location estimation of wireless/mobile nodes has become a very popular research area. Location estimation is no longer limited to path finding or vehicle tracking using GPS; it is also used in robot tracking [1], VANET [2], Wireless Local Area Networks (WLAN) [3] and Wireless Sensor Networks (WSN) [4]. Because of the rapid growth in the usage of handheld devices [5] and the advancement of mobile communication facilities and architecture, the number of calls originating from hand phones is also increasing rapidly. A recent study shows that almost 50% of emergency calls are originated by hand phones [6]. Therefore, error calculation in location estimation, or precise location estimation, of mobile/wireless devices is
becoming a key service for crime investigation and disaster management authorities [7] in order to reach the receiver. Location estimation techniques are basically divided into four categories [8, 9]: satellite-based techniques, statistical techniques, geometrical techniques and mapping techniques. Every technique has its advantages and disadvantages, and some researchers combine two or more techniques for error correction [10]. Terrain/clutter based location estimation is an approach that considers the radio wave attenuation under specific geographic and atmospheric conditions, and it has not been extensively considered by researchers. Because of atmospheric attenuation, radio waves behave differently in different terrains. The behavior of radio waves can be categorized as [11, 12]:
• Reflection
• Refraction
• Diffraction
• Scattering
• Absorption
The Signal to Noise Ratio (SNR) changes due to attenuation, which leads to a change in the value of the Receive Signal Strength (RSS) with respect to the Available Signal Strength (ASS) [13]. These RSS and ASS values are the primary parameters for the distance calculation between the transmitter and the receiver. The SNR introduces error in the received signal (Rx). When the transmitted signal (Tx) propagates through free space, atmospheric impairments apply to it, which change the value of Tx; this change in value is called loss. Because of the attenuation/impairment present in free space, these parameters come with error, and if we ignore the terrain/clutter impairments the calculated location will not be precise. In our previous research work [14], we proposed thirteen different terrains based on the geographical areas. These terrains are divided into four environment groups, which are [14]:
A. Urban Terrains/Clutters
• Low Dense Urban (LDU)
• Medium Dense Urban (MDU)
• High Dense Urban (HDU)
B. Rural Terrains/Clutters
• Low Dense Rural (LDR)
• Medium Dense Rural (MDR)
• High Dense Rural (HDR)
C. Plantation
• Agriculture/Field (A/F)
• Low Forest/Plantation (LF/P)
• Dense Forest (DF)
D. Highways & Water Bodies
• Highway/Motorway (H/M)
• River/Lake (R/L)
• Sea
(We propose desert as a 13th category in the terrain division, but it has not been considered in this research because of the unavailability of desert in the Malaysian peninsula.) This research work is a continuation of our previous research, in which we calculated the location of a wireless node by developing a prototype in WiFi (IEEE 802.11b/g). Previously we calculated distance based on signal strength by using a regular interval of 10 m (from 10 m to 150 m). We proposed an Error Rate Table (ERT) [14] and CERT [15], which help to improve the location by considering the terrain impairments, but they do not cater for distances that deviate from the regular interval. In this research work we use linear interpolation and combine it with the ERT to minimize the distance error in order to achieve a precise location. Finally we compare actual results with interpolated data points to check the accuracy of the mechanism. This research article is divided into seven sections. Section II explains interpolation and its types. Section III discusses the literature review of previous research work in the area. Section IV covers the discussion based on the literature review. Section V describes the problem statement. Section VI explains the experimental work, data analysis, comparison and validation of the research. Section VII discusses the conclusion and the limitations of this research.
2 Interpolation

Interpolation is a method used to define a function from two sets of values [16]. Two points in a plane define a straight line, but to read off values between the recorded readings we need interpolation. There are different types of interpolation, such as linear, cosine, cubic, Hermite, 3D linear, 3D cubic, 3D Hermite and trilinear interpolation. Linear interpolation is the simplest form and gives the intermediate values between two data points [17]. The formula used to calculate the interpolated points (the Lagrange form of the interpolating polynomial) is

   P(x) = \sum_{k} \Big( \prod_{j \ne k} \frac{x - x_j}{x_k - x_j} \Big) y_k

The following research uses linear interpolation to calculate the mid values between two data points.
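A small sketch of the signal-strength-to-distance conversion by piecewise linear interpolation between calibration points; the three HDU calibration pairs are taken from Table 1 below, and this simple two-point interpolation gives roughly 58.3 m for the 41% HDU example, close to but not identical to the 58.48 m that the authors obtain later with their four-point, zone-based scheme.

```python
def interpolate_distance(rss, table):
    """table: (signal_strength_percent, distance_m) pairs, ordered by decreasing signal strength."""
    for (s1, d1), (s2, d2) in zip(table, table[1:]):
        if s1 >= rss >= s2:                              # rss falls inside this segment
            return d1 if s1 == s2 else d1 + (d2 - d1) * (s1 - rss) / (s1 - s2)
    raise ValueError("signal strength outside the calibrated range")

hdu = [(46, 50), (40, 60), (32, 70)]                     # HDU points around 41% (Table 1)
print(interpolate_distance(41, hdu))                     # ~58.3 m
```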
3 Literature Survey

The literature survey is divided into two parts. Part (a) focuses on terrain/clutter based location estimation and terrain impairments; part (b) focuses on precise location estimation.
Terrains/Clutters Discussion. In [14, 15] the authors considered terrain error rates for location estimation. They divided the region into thirteen categories based on the geographical area. A WiFi (IEEE 802.11b/g) prototype was used to collect real-time data in different terrains after
repeating the experiment 5 times at each point of every terrain. By comparing the results they came up with three zones, which describe the relation between signal strength and distance in each terrain. Finally they proposed an Error Rate Table (ERT) [14] and CERT [15], which help to calculate the error in each terrain/clutter. They claimed that the error can be reduced to 3 m in low-attenuation terrains. In [18], the authors divide the projected geographical area into eight terrains/clutters for the deployment of a GSM network architecture in a specific Indian terrain. The available terrains were categorized as low density urban, village/low density vegetation, medium density vegetation, high density vegetation, agriculture, open/quasi open area, water bodies and river/sea. The authors introduced the concept of clutter loss, which they named CL. Based on Receive Signal Strength (RSS) measurements they proposed a tower location prediction for the region, and based on path-loss simulation results they concluded that the proposed tower prediction method would work better in the remote Indian area (Manipur). In [19], the authors considered terrain/clutter impairments for the purpose of power management of mobile nodes. They proposed a Power Management Algorithm (PMA) based on clutter based location estimation, using knowledge of the clutter/terrain impairments to decide how much power is required for the cell phone in order to maintain communication. In the first phase they proposed eight clutters; then they used the Google Earth facility for the mapping of clutters. Finally, based on the clutter knowledge, they proposed a power management algorithm aimed at using the mobile phone battery power more efficiently. The simulation showed that the PMA could save up to 40% of battery power if the clutter knowledge is used efficiently. In [20], the authors studied the forest terrain. They studied the propagation losses due to antenna height, antenna gain, depolarization, humidity and a few other factors, and discussed how the propagation losses can increase or decrease because of these factors in forest terrain. They also discussed that external effects such as rain can cause unexpected losses in the communication links. Precise Location Estimation Discussion. In resembling MDS [21], the authors proposed a method resembling multidimensional scaling for the tracing of a GSM node. They used subsequent signal strength measurements to different base stations for this purpose. As their research was focused on vehicular networks, they used the direction and velocity of the vehicle. Based on their testing they concluded that the average deviation in the location of a node is 60 m, whereas the maximum deviation in velocity is 10 m/s, and that resembling MDS is applicable as a decision support for the assignment of channels in macro and micro cells in hierarchical networks. In [22], the author focused on statistical location estimation techniques, using a propagation prediction model for precise location estimation with Receive Signal Strength (RSS) and Available Signal Strength (ASS) as signal parameters. The propagation delay was considered as a random variable, statistically dependent on the receiver location, the transmitter location and the propagation environment (terrain/clutter information). The author discussed that statistical location estimation approaches have a certain flexibility. This flexibility
approach allows the fusion of different types of measurement results, such as Receive Signal Strength and the timing advance. The author's verdict is that if the signal propagation environment differs significantly from the ideal condition, then the angle and distance measurements will be unreliable. In predicted and corrected location estimation [10], the authors combined the advantages of geometric and statistical location estimation techniques and divided their research into four phases. In phase I (estimation) they used Receive Signal Strength and Available Signal Strength as the primary parameters for distance calculation. After estimation they used the variance for filtering the calculated location points. In phase III (prediction) they used Bayesian decision theory for error correction. Finally, in the fusion phase, they combined two statistical techniques to achieve accuracy in the estimated location, using the Kalman filter's prediction and correction cycle. Based on their test results they claimed that, in an indoor environment, their approach can achieve an accuracy level of 1.5 m to 1 m.
4 Discussion and Review

Location estimation/determination is a widely discussed research area with reference to satellite communication (GPS, vehicle tracking, path finding), cellular communication, VANET, WLAN and WSN. However, the calculated location always comes with errors. These errors occur because the parameters used for location estimation carry impairments due to terrain attenuation, and these impairments are not widely discussed by researchers. As discussed in the literature survey, a few researchers consider specific terrain impairments: in [20] the authors consider the forest zone only, and in [18] the authors consider multiple zones based on the projected area they had. But wireless services are not limited to a specific zone. As the author mentioned in [22], if the signal propagation environment differs significantly from the ideal condition, then the angle and distance measurements will be unreliable. Furthermore, a mechanism is required to calculate a precise location based on the receive signal strength and the available signal strength. Therefore it is necessary to consider all possible terrains/clutters in the geographical area based on their attenuation/impairment. This research work focuses on exactly that, which will help researchers to improve accuracy in all terrains. Modeling is done by using linear interpolation, which helps to predict a nearly accurate location; accuracy can be achieved by considering the error value of the different clutters.
5 Problem Statement

In location estimation of wireless nodes, accuracy is always the key issue faced by researchers. The transmitted signal (Tx) and the received signal (Rx) are the core parameters for calculating the distance of a mobile node from the antenna, and these distances are further used to calculate the location. But these Tx and Rx values always come with error
because of the atmospheric attenuation in free space. A procedure is therefore required to calculate an accurate location by taking the terrain impairments into account.
6 Experimental Work and Data Analysis

In this section the calculated data points [14] are presented. The collection of data points is done at a regular interval of 10 meters (from 10 m to 150 m). We consider Line of Sight (LOS) as a mandatory element for data collection in all clutters/terrains, otherwise distortion may lead to unreliable results. A C# program is used to modulate the data points in Tables 1-4 to calculate the interpolated variable distance (presented in Table 5). A sample size of four is used in the linear interpolation. Furthermore, all terrains/clutters are divided into six zones in order to achieve accuracy and reduce error. In Table 5 the error (e) is calculated, which is the difference between the interpolated and the actual location distance. We repeated all experiments 5 times (t0-t4), but the error (e) can be unreliable when dealing with highly attenuated clutters or when two or more clutters are combined at certain places. Moreover, external factors like bright sunlight, heavy rain, etc. can also increase the error. Table 1 presents the collected data points in the urban clutters [14]. All experiments were executed in tropical Malaysian weather.

Table 1. Data points representation of urban clutters (including LDU, MDU and HDU)
Urban Terrains/Clutters (Signal Strength in %)
  Distance (m)    LDU    MDU    HDU
  10 (t0-t4)      100    100     96
  20               82     78     78
  30               66     64     62
  40               60     56     52
  50               52     50     46
  60               46     44     40
  70               38     36     32
  80               34     30     26
  90               30     28     20
  100              22     20     16
  110              18     16     12
  120               6      4      2
  130               2      0      0
  140               0      0      0
  150               0      0      0
The data points of the rural clutters with the regular interval of 10 m are presented in Table 2. Based on the density of population, the rural clutter is further divided into three clutters, which are LDR, MDR and HDR [14].

Table 2. Data points representation of rural clutters (including LDR, MDR and HDR)
Rural Terrains/Clutters (Signal Strength in %)
  Distance (m)    LDR    MDR    HDR
  10 (t0-t4)      100    100    100
  20               84     82     80
  30               70     68     66
  40               62     60     58
  50               56     52     52
  60               50     48     46
  70               42     40     38
  80               38     36     34
  90               34     32     30
  100              30     26     22
  110              24     22     18
  120              14     10      6
  130               6      2      0
  140               0      0      0
  150               0      0      0
Plantation clutter covers agriculture area, low forest and the dense forest (typical tropical region forest). Data points are presented in table 3 [14]. Table 3. Data points representation of plantation clutters (including A/F, LF/P and DF)
Plantation Terrains/Clutters (Signal Strength in %)
  Distance (m)    A/F    LF/P   DF
  10 (t0-t4)      100    100    92
  20               88     86    76
  30               72     64    58
  40               66     58    44
  50               58     52    40
  60               52     46    36
  70               46     40    34
  80               42     34    26
  90               38     28    14
  100              34     20     8
  110              30     10     4
  120              26      6     2
  130              20      2     0
  140              12      0     0
  150               2      0     0
We categorize the water bodies and highways/motorways in a single group because of their attenuation characteristics [14]. The data point representation is as follows.

Table 4. Data points representation of highway & water bodies (including R/L, Sea and H/M)
Highway & Water Bodies Terrains/Clutters (Signal Strength in %)
  Distance (m)    R/L    Sea    H/M
  10 (t0-t4)      100     92     96
  20               86     74     80
  30               66     58     62
  40               58     44     54
  50               50     40     48
  60               46     36     42
  70               40     32     34
  80               32     26     28
  90               28     22     26
  100              20     18     18
  110               8     12     14
  120               2      8      4
  130               0      4      0
  140               0      2      0
  150               0      0      0
The above tables represent distance based on the signal strength (Rx) at the receiver end. But the signal variation is not continuous in every clutter, therefore we cannot predict distances falling between the constant intervals. For this purpose we divide the clutters into six zones and take a sample size of four from each zone. Fig. 1 is a screenshot of the signal strength to distance converter, which uses linear interpolation.
Fig. 1. Signal strength to distance convertor. Tabs are representing each clutter type. Linear interpolation is used for the mid values.
Fig 2 is explaining zone division. The zone division is based on the impairment/attenuation of each clutter. For example in fig 2, zone 3 of MDU is from 50% to 36% (signal strength) whereas in zone 3 of HDU is from 46% to 32% (refer fig 3).
Fig. 2. Zone selection procedure in each clutter. Zone can be selected based on your device receive signal strength.
After the clutter and zone selection based on the signal strength, we enter the signal strength, from which the distance of our device from the antenna is calculated.
The calculated distance in HDU is 58.48218 m at 41% signal strength, whereas the actual distance is 57.955 m at the same signal strength. The error value is e = interpolated distance − actual distance = 0.52714 m. In this example the accuracy error is only approximately 0.5 m (towards the antenna) in the highly attenuated HDU clutter. The division of clutters into zones helps to minimize the error, as represented in Table 5.
Fig. 3. Distance calculation based on the device signal strength
Data points are not collected for the clutter type desert because of the unavailability of desert in the Malaysian peninsula. The tracking architecture for the modulated interpolation location estimation and distance error consists of four steps (Fig. 4):
Step 1: Parameter calculation (ASS and RSS).
Step 2: P2P clutter-based distance calculation (clutter-based P2P location estimation: actual distance).
Step 3: Modulation by using linear interpolation (data point manipulation: clutter-based interpolated data points).
Step 4: Error calculation, e = Interpolated Distance − Actual Distance.

Fig. 4. Tracking architecture for the modulated interpolation location estimation and error
We took a sample size of four from each clutter and measured the actual distance at specific signal strengths. We calculate the interpolated distance based on the categorization of zones. Finally we compare the actual distance with the interpolated distance to calculate the error (e); Table 5 presents the results. Negative error values indicate that the actual position is beyond the interpolated position (with reference to the antenna), whereas positive values indicate that the actual position is before the interpolated position.

Table 5. Error calculation between actual distance and the interpolated distance on the sample size of 4 (with the six zones)
Clutter/Terrain Type | Recorded Signal Strength (%) | Interpolated Distance I (m) | Actual Distance A (m) | Error e (m)
LDU | 93 | 13.7316 | 13.80 | -0.0684
LDU | 54 | 47.8571 | 46.00 | 1.8571
LDU | 39 | 68.9583 | 67.85 | 1.1083
LDU | 20 | 104.5833 | 103.95 | 0.6333
MDU | 92 | 12.8283 | 13.215 | -0.3867
MDU | 51 | 48.1845 | 47.9455 | 0.239
MDU | 37 | 68.9583 | 67.9555 | 1.0028
MDU | 17 | 107.1875 | 107.8333 | -0.6458
HDU | 90 | 13.1863 | 13.0325 | 0.1538
HDU | 47 | 48.125 | 47.8555 | 0.2695
HDU | 33 | 68.9583 | 67.5950 | 1.3633
HDU | 13 | 107.50 | 105.9580 | 1.542
LDR | 91 | 15.4375 | 15.5455 | -0.108
LDR | 57 | 48.1845 | 46.9540 | 1.2305
LDR | 43 | 68.9583 | 67.8455 | 1.1128
LDR | 25 | 108.75 | 107.7555 | 0.9945
MDR | 94 | 12.9762 | 13.1010 | -0.1248
MDR | 53 | 48.75 | 47.6855 | 1.0645
MDR | 41 | 69.4792 | 69.2520 | 0.2272
MDR | 23 | 107.25 | 107.8450 | -0.595
HDR | 96 | 11.5966 | 11.8565 | -0.2599
HDR | 55 | 44.7321 | 43.9530 | 0.7791
HDR | 39 | 68.9583 | 69.2525 | -0.2942
HDR | 19 | 107.1875 | 107.9545 | -0.767
A/F | 93 | 16.0938 | 16.4540 | -0.3602
A/F | 59 | 48.9583 | 48.6540 | 0.3043
A/F | 47 | 68.3333 | 68.1245 | 0.2088
A/F | 31 | 107.5 | 108.2450 | -0.745
LF/P | 94 | 14.6320 | 14.9550 | -0.323
LF/P | 53 | 48.3333 | 48.3545 | -0.0212
LF/P | 41 | 68.3333 | 68.1235 | 0.2098
LF/P | 11 | 109.125 | 108.8080 | 0.317
DF | 93 | 9.3403 | 9.5450 | -0.2047
DF | 44 | 44.6032 | 40.00 | 4.6032
DF | 35 | 64.5833 | 66.1255 | -1.5422
DF | 9 | 97.9167 | 96.7545 | 1.1622
R/L | 91 | 16.7122 | 17.2415 | -0.5293
R/L | 51 | 48.75 | 48.3520 | 0.398
R/L | 41 | 68.75 | 68.3450 | 0.405
R/L | 10 | 108.75 | 105.3555 | 3.3945
SEA | 95 | 8.4620 | 7.4545 | 1.0075
SEA | 41 | 47.2024 | 47.8545 | -0.6521
SEA | 33 | 67.5 | 67.9595 | -0.4595
SEA | 13 | 108.75 | 105.7545 | 2.9955
H/M | 90 | 13.8726 | 13.8540 | 0.0186
H/M | 49 | 48.1845 | 47.9555 | 0.229
H/M | 35 | 68.9583 | 67.9545 | 1.0038
H/M | 15 | 107.1875 | 107.0545 | 0.133
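As a quick arithmetic check on Table 5, the snippet below recomputes the error column and a per-clutter mean absolute error for two of the clutters (HDU and DF). The (I, A) pairs are copied from the table; everything else is our own illustrative code.

```python
# Small check of how the error column of Table 5 and the quoted average
# errors are obtained.  Only an excerpt of the table is reproduced here
# (HDU and DF rows); e = interpolated distance I minus actual distance A.
samples = {
    "HDU": [(13.1863, 13.0325), (48.125, 47.8555), (68.9583, 67.5950), (107.50, 105.9580)],
    "DF":  [(9.3403, 9.5450), (44.6032, 40.00), (64.5833, 66.1255), (97.9167, 96.7545)],
}

for clutter, rows in samples.items():
    errors = [i - a for i, a in rows]
    mean_abs = sum(abs(e) for e in errors) / len(errors)
    print(clutter, [round(e, 4) for e in errors], "mean |e| =", round(mean_abs, 3))
```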
Fig. 5 presents the comparison between the actual distance and the interpolated distance in LDU (Fig. 5a) and DF (Fig. 5b). Since DF is a highly attenuated clutter because of its vegetation, the error in a few of its zones is high. The error is similarly high in other highly attenuated clutters such as HDR, SEA, R/L and H/M (the corresponding figures are given in the appendix).
Fig. 5. Comparison between interpolated distance (I), actual distance (A) and error (e). Fig. 5a shows the comparison for LDU and Fig. 5b for DF; the highly attenuated area is marked on the plot.
DF, R/L, SEA and H/M recorded larger errors because of the high impairments in these clutters.
Fig. 6. Graphical representation of the Table 5 data points. Only the highly affected clutters show a high error rate, and even there the average error is approximately 1.5 m.
The modelling shows that the interpolated distance calculation has an accuracy of about 0.5 m when the clutter impairments are considered; in some cases the error is even less than 0.2 m (negative or positive). The following figure presents the comparison between the interpolated distances, the actual distances and the error rate. The error is high only in the highly attenuated areas such as DF, R/L, SEA and HDR.
Fig. 7. Comparison between the interpolated and the actual distance
7 Conclusion and Limitations
The terrain/clutter-based interpolated distance error helps to calculate a corrected location and to minimize the error. Although clutters with high impairments sometimes give an error of more than 3 meters, on average the error is 0.5 meter or even less, depending on the clutter being dealt with. This research helps to study and understand clutter behavior and to calculate the interpolated distance with minimal error. Limitations: only clutter-specific impairments are discussed in this research. Bright sunlight, heavy rain, humidity and other external factors may affect the signal and produce unreliable results.
Mobile and Wireless Access in Video Surveillance System Aleksandra Karimaa Turku University/TUCS student no. UTU 74474, AA 84459, TSE 16522 [email protected]
Abstract. Wireless communication and mobile technologies are already well established in modern surveillance systems. Mobile-based client applications are commonly used to provide basic access to camera video streams and other system resources. Camera site devices might connect to the system core over wireless links to overcome environmental conditions. Finally, the surveillance systems themselves can be installed in portable environments such as buses or trains, which require wireless access for infrastructure and internet services, etc. We observe the growing popularity of mobile and wireless access solutions. The technology itself is evolving rapidly, providing efficient transmission technology and feature-rich, powerful mobile and wireless devices. Users expect to have seamless access and tools whose functionality does not depend on access technologies or access devices. The same functionality is demanded from a local client application and a remote mobile browser. The aim of this paper is to explore access scenarios where mobile and wireless access methods are used to provide enhanced client functionality. We analyze these scenarios and discuss the performance factors. Keywords: mobile, wireless, surveillance.
1 Introduction The availability of internet access, the development of application technologies and the increasing processing capabilities of different access devices have a big impact on the variety of access methods in surveillance. Surveillance systems originate from CCTV (Closed Circuit TV) systems. In traditional CCTV, the access tools and methods were dependent on the user's location, e.g. the operating room or the administrator premises. The security measures aimed to provide physical security. The performance of the access application depended on the particular installed hardware. These systems were difficult to upgrade and were not easily scalable. Therefore surveillance systems have moved from traditional analogue to digital and IP-based technologies. The access applications have been made hardware-independent. The type of access became dependent only on the type of the user, not on his or her physical location. The systems were opened to public domain services such as time synchronization, email and SMS services, etc. Currently another trend is to provide seamless access to the system and to use wireless and mobile technologies. The users expect to have access tools where the
functionality does not depend on access technologies and devices. The same functionality is required from the local client application and from remote mobile browsers. Video surveillance systems accommodating wireless or mobile technologies are an area of ongoing research. The key research areas focus on the architectural considerations required to support receivers' mobility [1] [2], on security and dependability aspects, and on innovative solutions based on wireless (sensor) and mobile technologies. The researched scenarios are usually presented using small or medium size surveillance systems or other innovative solutions [3]. The subject of the transition of complex surveillance into the world of wireless, mobile and cloud technologies is relatively unexplored (excluding the security aspects of connectivity). In the case of a complex surveillance system, the transition towards mobile and wireless solutions is a continuous rather than a disruptive change of applications, architecture and system design in general. Therefore, it is worth analyzing the existing solutions in the context of the characteristics and limitations of the new technologies. The article analyses how well current access applications are suited to support mobile and wireless technologies. We describe the architecture of the reviewed client solutions. We describe the basic functionality of the client application, based on the Teleste VMX system. We analyze the impact of client functions on traffic. Based on the expected traffic characteristics, we review which access methods could be implemented to provide the expected functionality. Finally, we discuss the related performance. In this article we will not address any security related challenges; however, the security concerns should not be underestimated, since security plays an important role in mobile and wireless access to surveillance.
2 Solutions The basic access application should be at least capable of displaying the video streams from selected cameras. Additionally, the application can provide PTZ control of a camera and access to recorded content. Enhanced client applications can provide some level of (local) content manipulation, such as export or modification of material of interest, and they should allow the user to react to given system events. An administration client application should enable administrator-level access to the system, with the ability to modify the system setup, to react to all system events and to access all system resources. The application itself is expected to be device independent, which leads to the application being browser-based and not requiring an installation process. The client application is typically placed inside the surveillance network. However, mobile and wireless solutions extend access to a variety of remote locations. In modern surveillance systems the client application should enable access from both the private (surveillance system) and the public (internet or mobile network) domain. The architecture of the surveillance system should adapt to both types of scenarios. Fig. 1 presents a few available scenarios. The first client is a PC-based application located inside the surveillance system network. This client application retrieves information and accesses system resources directly from the system infrastructure nodes.
Fig. 1. Client locations
The second scenario addresses devices with limited processing and installation capabilities: this simple application is based on a web browser and requires some level of stream manipulation to be done by the system in order to avoid power-exhaustive video decoding and general application complexity. This solution also provides a good level of security through easy control of stream and resource access. The system modules responsible for stream manipulation are usually also capable of limiting the user's access to the stream or resource. The third scenario is represented by more powerful (and increasingly more common) devices. The original streams are directed from the system to the client node (laptop, iPhone, mobile, other). The device has the capability to decode streams as well as to host more complex client applications with extended functionality. For security reasons the access might be controlled by gateway-type nodes, but the security decisions are based on the system state rather than on the content of the streams; therefore the solution tends to be easily scalable. In the next section we analyze how well this scenario is supported in terms of client functionality. 3 Analysis In order to analyze how the client application functionality affects the traffic characteristics, we first have to compare the traffic characteristics of two types of client installations: a PC-installed traditional application and a mobile-hosted (browser-based) simple application. The comparison allows us to define the major differences in traffic characteristics between these two application technologies. Additionally, the analysis shows the difference between the second and the third scenario in terms of bandwidth.
Next, we analyze how the traffic characteristics change if enhanced client functions are used. This allows us to estimate the traffic behavior for a scenario where processing-capable mobile devices are used to access the surveillance resources (see the third scenario in the section above). Fig. 2 and Fig. 3 present the differences in traffic characteristics for simple client operations. Fig. 2 presents the characteristics of the traffic from the system to the browser-based client and Fig. 3 of the traffic from the system towards the PC-based client; both are examples of Teleste VMX Client applications.
Fig. 2. Traffic statistics for the simple browser client: traffic (Mbps) as a function of time
Fig. 3. Traffic statistics for the PC-hosted client application: traffic (Mbps) as a function of time
The operations performed during the time of capture were the same for both types of client applications. The sequence was as follows: logging in, connecting to the first camera, connecting to a second camera, connecting to a third camera, PTZ operations on the first camera, sequential disconnection of the three cameras, logging out. The general shape of the traffic characteristic is similar in both cases. The number of video streams being viewed has the biggest impact on the bandwidth occupied by the client application traffic. The action of the client application retrieving each of the three streams is easy to distinguish. The bandwidth used by the web browser client for the video transmission is approximately 6 Mbps per stream, whereas the PC-hosted application uses approximately 2 Mbps (which is the original size of the camera stream in our test bench). The reason for this is the fact that in our test bench the PC-hosted application retrieves the original MPEG-4 stream, whereas the browser-based application retrieves JPEG files from the gateway node; in this scenario the gateway transcodes the original video stream into a JPEG stream, which is easy to decode and display by a web browser. We have also observed that the stream traffic is more variable in the case of the browser application, which is expected due to the effects of the HTTP character of the transmission. It also indicates that the performance problems of mobile client applications are more likely to be caused by fluctuations of traffic and not by the amount of video traffic in general. Fig. 4 presents the characteristics of the traffic from the system towards the client tested with advanced operations.
Fig. 4. Traffic statistics for the PC-hosted client application (full functionality): traffic (Mbps) as a function of time
In addition to the basic functions listed earlier, we added the operations of querying recorded material, playback of recorded material, accessing and modifying the system setup, and downloading recorded material. Whereas the other functions did not have a substantial impact on the traffic characteristics (viewing recorded material appears as retrieving another video stream), the function of material download was
clearly visible. This indicates that the data transmission function might have a major impact on the performance of a mobile application. The result is consistent with our expectations: a downloaded stream is perceived by the network as a file-type data transmission and might be prioritized as such, which may in turn affect the quality of operation of the other client functions.
4 Technology Review The earlier analysis reveals that the biggest impact on the bandwidth of client connections comes from the functions of viewing live or recorded video and downloading material. Other operations have a minor impact on traffic and can be omitted from the analysis. The number of simultaneously viewed streams is directly dependent on the user interface of the application. In the case of mobile devices, where the small screen size limits the number of viewed streams, we can assume the maximum number of displays to be four (of 4CIF resolution). The bandwidth occupied by a typical stream of this resolution varies from a few Mbps for a stream of JPEGs, through 4 Mbps for an MPEG-4 compressed stream, down to 2 Mbps and less when H.264 compression is used. Moreover, we can assume that for solutions where JPEG streams are used (simple applications for devices with limited processing power) it would also be acceptable to limit the number of simultaneously viewed channels to one or two. In addition, different channel-adaptive transmission methods can be used, such as the channel-adaptive video transmission method using H.264 scalable video coding proposed by [4]. From the above we conclude that the total bandwidth required for a mobile and wireless connection would be around 20 Mbps at most for simple applications and up to 10 Mbps (typically 3 to 5 Mbps) for the remaining types of wireless and mobile client application; a rough feasibility check of these figures is sketched at the end of this section. The above estimations do not take into consideration the impact of download on the bandwidth. However, it is assumed that if download of the content is required (it is still not commonly used), the application should have the capability to fix the transmission parameters for the download operation. In this case the bandwidth requirements will grow insignificantly and we assume the typical mobile client will be able to use less than 10 Mbps. A bandwidth of 3 to 20 Mbps is available in many technologies. The popular Wireless Fidelity (Wi-Fi) technology (based on the IEEE 802.11 standard) offers WLAN-range standard interconnectivity with a channel capacity of up to 11 Mbps for basic 802.11b, up to 54 Mbps for 802.11a or 802.11g, and even above 100 Mbps for 802.11n. Assuming an effective transmission rate of not less than half of the standard channel capacity, Wi-Fi standards can provide basic wireless interconnectivity within a building, with the restriction that the simple browser application should not be used on an 802.11b infrastructure. WiMAX technology (based on the IEEE 802.16 standard) provides wireless access for wide areas (typically several km and up to 50 km) with a channel capacity of typically 54 Mbps (and up to 100 Mbps). It can be used for wide area surveillance networks and can provide access for almost the whole range of local system users. 3G mobile technologies do not necessarily guarantee the necessary bandwidth, but they are expected to provide a minimum data rate of 2 Mbps for stationary or
walking users and 384 kbps in a moving vehicle (refer to [3]). However, 3G system bandwidth might also be sufficient to handle client connections if advanced compression standards and channel-adaptation techniques are implemented (see [4]). There are many successful examples of deploying 3G-based surveillance access (see for example [1]). It is worth underlining that the access technologies provide not only the transmission bandwidth but also define the available transmission techniques for given environments and infrastructures. IP networks can introduce delays or jitter of the transmitted signal. Wireless networks introduce challenges related to mobility, e.g. signal fading, and mobile technologies can add problems with handover. Typical delays commonly accepted in surveillance are 300 ms for video, 50 ms for speech audio, and often lip-sync level synchronization for downloaded content and playback. The synchronization of playback and downloaded content is independent of the network environment and can be done by the system itself (based on the RTCP protocol). The delays in video and audio transmission greatly affect the quality of client operations and should therefore be considered when deploying wireless access over different technologies. All the proposed access technologies (Wi-Fi, WiMAX and 3G) provide basic QoS mechanisms to address the problem of delays in multimedia transmission. The 802.11e Wi-Fi standard provides traffic prioritization for different (real-time) applications by creating different classes of transmission for different types of data transmitted over the wireless link. However, delivering multimedia through Wi-Fi might meet additional challenges, as Wi-Fi technology was not originally designed as a multimedia broadband carrier technology. Article [5] discusses the problem of the absence of multicast feedback mechanisms and proposes a leader-based mechanism to overcome this problem. Article [6] addresses problems such as multicast transmission using the slowest link speed, common link adaptation mechanisms for clients, the lack of a call admission policy, and irreducible PER even in good channel conditions. The 802.16 WiMAX standard was originally designed to support reliable delivery of broadband multimedia data: it has built-in scheduled access and Quality of Service (QoS) mechanisms (refer to [7]). The topic of WiMAX access for multimedia content has been reviewed in publications such as article [8]. We observe that the popularity of WiMAX access for surveillance is growing.
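The bandwidth reasoning of this section can be summarized in a short back-of-the-envelope calculation. The per-stream rates below (about 5 Mbps assumed for a JPEG stream, 4 Mbps for MPEG-4 and 2 Mbps for H.264 at 4CIF) and the rule of thumb that the usable rate is roughly half of the nominal channel capacity follow the discussion above; applying that rule uniformly to every listed technology, and the exact figures chosen, are our own simplifications.

```python
# A back-of-the-envelope check of the bandwidth reasoning in this section.
PER_STREAM_MBPS = {"JPEG": 5.0, "MPEG-4": 4.0, "H.264": 2.0}   # assumed 4CIF stream rates
NOMINAL_MBPS = {"802.11b": 11, "802.11g/a": 54, "802.11n": 100,
                "WiMAX": 54, "3G (stationary)": 2}

def required_mbps(codec, streams=4):
    """Bandwidth needed by a client viewing `streams` simultaneous channels."""
    return PER_STREAM_MBPS[codec] * streams

def feasible_technologies(codec, streams=4):
    # simplification: treat the usable rate as half of the nominal capacity
    need = required_mbps(codec, streams)
    return [tech for tech, nominal in NOMINAL_MBPS.items() if need <= nominal / 2]

print(required_mbps("JPEG"))                  # ~20 Mbps: the simple browser client
print(required_mbps("H.264"))                 # ~8 Mbps: a decoding-capable client
print(feasible_technologies("JPEG"))          # 802.11b drops out, as noted above
print(feasible_technologies("H.264", 2))      # two H.264 streams fit even tighter links
```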
5 Conclusions and Future Prospects In this paper we provide evidence that secure and good quality access to surveillance systems, applications and data, using mobile devices and wireless networks, is attainable with current technology. The general trend is to open surveillance networks to modern tools. Despite the fact that the main concern related to mobile or wireless access, the security of the solution, still brings discussions, we observe a growing need for seamless and mobile access. It is expected that future trends will bring more openness of surveillance together with an enhancement of the applied security level.
Acknowledgments. The author gratefully acknowledges the contribution of Teleste colleagues Pete Ward and Navid Borhany. It should nevertheless be stressed that the views in this paper are the author's own and do not necessarily represent the views of Teleste.
References 1. Ruichao, L., Jing, H., Lianfeng, S.: Design and Implementation of a Video Surveillance System Based on 3G Network. In: Int. Conf. on Wireless Communication and Signal Processing WCSP 2009, pp. 1–4 (2009) 2. Bing-Fei, W., Hsin-Yuan, P., Chao-Jung, C., Yi-Huang, C.: An Encrypted Mobile Embedded Surveillance System. In: Proc. IEEE Symp. on Intelligent Vehicles, pp. 502–507 (2005) 3. International Telecommunication Union: Cellular Standards for 3G: ITU’s IMT-2000 Family. Cellular Standards for the Third Generation (2005) 4. Hye-Soo, K., Eun-Seok Ryo, C., Jayant, N.: Channel-adaptive Video Transmission Using H.264 SVC over Mobile WiMAX Network. In: Digest of Tech. Papers. Int. Conf. on Consumer Electronics ICCE 2010, pp. 441–442 (2010) 5. Dujovne, D., Turletti, T.: Multicast in 802.11 WLANs: An Experimental Study. In: Int. Symp. on Modeling analysis and simulation of wireless and mobile systems, MSWiM 2006, pp. 130–138 (2006) 6. Ferre, P., Agrafiotis, D., Chiew, T.K., Nix, A.R., Bull, D.R.: Multimedia Transmission over IEEE 802.11g WLANs: Practical Issues and Considerations. In: Digest of Tech. Papers. Int. Conf. on Consumer Electronics ICCE 2007, pp. 1–2 (2007) 7. Sayenko, A., Alanen, O., Karhula, J., Hamalainen, T.: Ensuring the QoS Requirements in 802.16 Scheduling. In: Proc. 9th ACM Int. Symp. on Modeling Analysis and Simulation of Wireless and Mobile Systems, MSWiM 2006, pp. 108–117 (2006) 8. Rui-Yen, C., Chin-Lung, L.: IP Video Surveillance Applications over WiMAX Wireless Broadband Technology. In: Proc. 5th Int. Joint Conf. on INC, IMS and IDC NCM 2009, pp. 2100–2102 (2009)
A Novel Event-Driven QoS-Aware Connection Setup Management Scheme for Optical Networks Wissam Fawaz, Abdelghani Sinno, Raghed Bardan, and Maurice Khabbaz Lebanese American University, Electrical and Computer Engineering Department, Byblos, Lebanon {wissam.fawaz,abdelghani.sinno,raghed.bardan, maurice.khabbaz}@lau.edu.lb http://soe.lau.edu.lb
Abstract. This paper proposes a QoS-Aware Optical Connection Setup Management scheme that uses the Earliest Deadline First (EDF) queueing discipline to schedule the setup of the optical connections. The benefits of this EDF-based scheme are twofold: a) it reduces the blocking probability since blocked connection requests due to resource unavailability are queued for possible future setup opportunities and b) it realizes QoS differentiation by ranking the blocked requests in the EDF queue according to their connection setup time requirements, which are viewed as deadlines during connection provisioning. As such, pending lesser delay-tolerant requests are guaranteed to experience better QoS than the ones having longer setup time requirements. Extensive simulations are performed to gauge the merits of the proposed EDF-based scheme and study its performance in the context of two network scenarios, namely, the National Science Foundation Network (NSFNET) and the European Optical Network (EON). Keywords: Optical networks, connection setup management, Earliest Deadline First (EDF) scheduling, performance evaluation.
1 Introduction
Wavelength Division Multiplexing-based (WDM) optical networks are foreseen to support the numerous ever-emerging applications having distinct Quality of Service (QoS) requirements. The evolution of such networks, along with all the related technological developments, is driven by the urgent need to cater for these requirements. QoS-Aware WDM-based optical networks come forth as a promising solution, the realization of which is, however, a major challenge. Over the last couple of years, a large body of researchers has attempted to address this challenge through various proposals (e.g. [1,2,3]). These proposals mainly aimed at enabling WDM-based optical networks to provide the so-called Predictable Quality of Transport Services (PQoTS) by monitoring a set of parameters that affect an established connection's data flow.
The Connection Setup Time (CST), defined as the maximum amount of time that elapses between the instant an optical connection is first requested and the instant the requested connection is set up, is an important parameter that may significantly improve optical connection setup management. However, this parameter has been given little attention. In fact, CST can be interpreted as a deadline prior to which a received connection request must be established, and it thus provides an opportunity for network operators to carry out QoS differentiation during connection provisioning. According to the authors in [4, 5], CST is expected to become an integral part of an optical connection's service profile and is thus foreseen as a potential service differentiator in the Service Level Agreements (SLAs) established between optical operators and their clients. Inspired by the above observation, we present a novel QoS-Aware Optical Connection Setup Management strategy that uses CST as both an indicator of the priority of a connection request and a measure of the delay tolerance associated with that request. Whenever connection blocking occurs due to the lack of optical resources, the blocked connection setup requests are queued and scheduled in an order that is consistent with their deadlines. This objective is achieved by utilizing the well known Earliest Deadline First (EDF) queueing discipline, whereby the connection request with the smallest setup time (i.e., earliest deadline) is served first. As such, what distinguishes this newly proposed strategy from others found in the open literature is the introduction of QoS differentiation among the incoming connection requests during the course of connection provisioning. The rest of this paper is structured as follows: Section 2 discusses the related work, Section 3 describes the proposed EDF-based connection setup scheme in detail, Section 4 presents the simulation framework and the obtained results, and Section 5 concludes the paper.
2 Related Work
In [6,7,8], the authors investigated the problem of dynamic bandwidth allocation for Deadline-Driven Requests (DDRs). For this purpose, they designed several algorithms that enabled transmission rate flexibility throughout the DDR provisioning procedure in WDM-based optical networks. Their approach differs from ours in that they considered the deadline to be the maximum connection holding time. Nonetheless, in order to increase the fraction of successfully provisioned DDRs, the algorithms proposed in [6, 7, 8] can easily be supplemented with the management technique developed in this paper. The authors of [9] studied the effectiveness of using an EDF-based scheme for managing the setup of point-to-point connections in optical networks with single-wavelength fiber links. The work in [10] appears as an extension of [9], where the authors considered the case of multi-wavelength fiber links. However,
Fig. 1. A sample 4-node optical network
both of the aforementioned studies lacked generality as they did not assess the performance of their corresponding EDF-based connection setup management schemes when applied to a wavelength routed optical network. Indeed, it was proven in [11, 12, 13] that the EDF-based strategies proposed in [9, 10] may not be as efficient as expected when utilized in such networks. Instead, the authors presented improved strategies which they showed to be more suitable for wavelength routed optical networks. Nevertheless, those improved strategies also suffered from two major limitations: a) they were driven only by connection departures and b) they solely served the head-of-line pending request when a departure event occurred. In this paper, we alleviate the above observed deficiencies by introducing an improved event-driven EDF-based connection setup policy that, first, accounts for both a connection’s arrival and departure, and second, ensures the setup of a wider spectrum of the pending requests upon the occurrence of a departure or an arrival event. This is achieved by having the proposed scheme target not only the head-of-line pending request but also a large portion of the other pending requests that may potentially be provisioned into the network.
3 Description of the Proposed Scheme
For illustration purposes, the sample network topology given in Figure 1 is used to explain the main idea behind the EDF-based setup scheme. The figure shows four optical cross-connects (OXCs), namely: A, B, C, and D, that are connected together by means of 3 fiber links. Each OXC serves an incoming connection request by attempting to establish an end-to-end lightpath connecting the source node of the connection request to its destination. When such an attempt fails, the connection request is said to be blocked. Historically, blocked connection requests
used to be immediately dropped. Instead, we propose to insert such connections into an EDF queue and to arrange them in ascending order of their Connection Setup Time (CST) requirements. This solution is motivated by the studies made in [4, 5], which stipulate that CST is a parameter that determines a connection request's class of service: the smaller a connection request's CST value is, the higher its associated class of service becomes. It is important to emphasize that, in the context of the proposed EDF-based connection setup policy, the priority of a connection request waiting in the EDF queue increases as that pending request approaches the deadline prior to which it must be established. If a CST expires before its corresponding connection request gets provisioned, then this request is called a dead request. The way an EDF queue treats a dead request depends on whether a work-conserving or a non-work-conserving EDF policy is implemented. In particular, the non-work-conserving variant immediately drops a dead request, whereas the work-conserving one further holds dead requests in the queue until they get served. In this paper, the more realistic non-work-conserving EDF variant is considered. Let us illustrate the main operation of the EDF-based setup scheme by considering the following scenario. Suppose that the capacity of each fiber link in Figure 1 is limited to 2 wavelengths and that 2 connections tBC and tDC are already established from B to C and from D to C respectively. For the sake of simplicity, each connection setup request is assumed to require a full wavelength of bandwidth. Furthermore, let us assume that the EDF queue associated with A is finite and contains 2 previously blocked requests tAB1 and tAB2 destined for B, with tAB1 being the head-of-line pending request. tAB1 and tAB2 have deadlines of 1 and 2 units of time associated with them, respectively. Under such circumstances, a setup request tAC addressed to C that arrives at A with a deadline of 3 would be blocked, and consequently the event-driven EDF-based setup strategy would come into play. The event-driven aspect of the EDF-based setup scheme is highlighted by the fact that it is activated on the occurrence of an arrival event. Recall that the proposed scheme is driven by both arrivals and departures. Once the setup management scheme is activated, an attempt is made to serve the pending request occupying the front of the EDF queue. In the context of the scenario under study, tAB1 will hence be provisioned into the network. Then, the setup scheme turns to the next pending request, attempting to serve it. This process continues until the setup strategy reaches a pending request that cannot be routed into the network. In this case, since the establishment of tAB2 turns out to be impossible, the setup scheme stops and inserts the blocked request tAC into the EDF queue at the appropriate position relative to tAB2. Given that tAB2's deadline is less than that of tAC, tAC ends up being queued behind tAB2. As time evolves, the degree of urgency of tAB2 and tAC increases. Ultimately, if one of the pending requests reaches its deadline, that request is immediately dropped out of the queue, in which case a deadline mismatch is said to have taken place. Subsequent connection requests whose deadlines are less than the deadline associated with tAC are placed in front of tAC in the EDF queue and as such are served prior to tAC. Note that if
the number of such connection requests is large enough, tAC may end up being pushed out of the EDF queue. Now consider what happens upon the occurrence of a connection departure event. Say the previously provisioned tAB1 connection departs from the network before tAB2 ’s deadline is violated. On the occurrence of the departure event, the EDF-based setup scheme is activated requiring A to examine its associated EDF queue one request at a time and to establish each pending request in turn into the network. This process terminates when A encounters a pending request that cannot be provisioned. In the case of the considered scenario, A would succeed in provisioning tAB2 , but would fail in serving tAC . As such, tAC becomes the sole pending request in the EDF queue and would thus be obliged to wait until the next arrival or departure event occurs before retrying to gain access into the network. In summary, the EDF-based connection setup strategy proposed in this paper is activated upon the occurrence of two types of events, namely the departure and the arrival of connections. When a connection emanating from an arbitrary source node A departs from or arrives to the network, the setup strategy proceeds as follows. It scans through the EDF queue associated with A aiming at provisioning as many pending requests as possible. This process continues until either all pending requests are provisioned or the setup strategy comes across a pending request whose setup is impossible, in which case the setup strategy stops its probing for possible connection setups. This suggests that the event-driven EDF-based setup strategy enjoys a wide setup probing scope.
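The behavior described in this section can be summarized in a short sketch: a per-node, non-work-conserving EDF queue of blocked requests that is drained front-to-back on every arrival or departure event. The class and function names are ours, `try_provision` and `release` stand in for the routing and wavelength-assignment model of the network, and each request is assumed to carry its CST in a `cst` attribute; this is an illustration of the policy, not the authors' simulator code.

```python
# Compact sketch of the event-driven, non-work-conserving EDF setup policy.
import bisect
from itertools import count

class EDFQueue:
    """Pending requests sorted by deadline; expired ('dead') requests are
    dropped rather than served (non-work-conserving EDF)."""

    def __init__(self, capacity=20):
        self.capacity = capacity
        self._seq = count()                    # tie-breaker for equal deadlines
        self.pending = []                      # sorted list of (deadline, seq, request)

    def purge_dead(self, now):
        self.pending = [e for e in self.pending if e[0] > now]

    def insert(self, request, deadline, now):
        self.purge_dead(now)
        bisect.insort(self.pending, (deadline, next(self._seq), request))
        if len(self.pending) > self.capacity:
            self.pending.pop()                 # latest-deadline request is pushed out

    def front(self):
        return self.pending[0][2] if self.pending else None

    def pop_front(self):
        return self.pending.pop(0)[2]

def drain(node, network):
    """Serve pending requests front-to-back until one cannot be provisioned."""
    while (head := node.edf_queue.front()) is not None:
        if network.try_provision(head):
            node.edf_queue.pop_front()
        else:
            break                              # wide probing scope, but stop at first failure

def on_arrival(node, network, request, now):
    node.edf_queue.purge_dead(now)
    if not network.try_provision(request):     # blocked: no lightpath available
        drain(node, network)                   # give queued requests a chance first
        node.edf_queue.insert(request, now + request.cst, now)

def on_departure(node, network, connection, now):
    network.release(connection)                # free the departing connection's resources
    node.edf_queue.purge_dead(now)
    drain(node, network)
```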
4 Simulation Study
A Java-based discrete event simulator was developed to analyze the performance of the proposed event-driven EDF-based connection setup strategy in the context of two real-life optical networks, namely: a) the National Science Foundation Network (NSFNET) and b) the European Optical Network (EON). NSFNET’s topology is shown in the bottom part of the composite Figure 2, while the one corresponding to EON is depicted in the topmost part of the Figure. NSFNET consists of 24 nodes and 43 bidirectional fiber links while EON is composed of 19 nodes together with 39 bidirectional links. The data relating to the physical topologies of both networks was taken from [14, 15]. Our simulation study is based on the following assumptions: 1. In both optical networks, it is assumed that each node has a full wavelength conversion capability. 2. Incoming connection requests are uniformly arranged into 3 service classes referred to as gold, silver, and bronze. 3. Each incoming request requires a full wavelength of bandwidth. 4. The overall arrival process is Poisson and the connection holding time is exponentially distributed with a mean normalized to unity.
Fig. 2. Network topologies used in simulation
5. Following the guideline presented in [4, 5], the parameters associated with the three service classes are as follows: – Gold connection requests arrive with an initial deadline of 6 units of time. – Silver requests have deadlines of 10 units of time associated with them. – The initial deadline of the bronze requests is set to 14 time units. 6. One EDF queue is deployed per optical node with a capacity to hold up to 20 pending connection requests.
Fig. 3. Overall rejection probability (%) as a function of the total offered load (Erlang) for the No-Queue, FIFO-based, EDF-based and Improved EDF setup schemes (NSFNET)
7. Dijkstra's algorithm is used to find the shortest path for the arriving connections, while wavelengths are assigned to the provisioned connections according to a first-fit strategy (a sketch of this provisioning step is given below). 8. The capacity of each fiber link is set to 8 wavelengths. It is important to stress that 10^6 connection requests are simulated per run of the simulator and that each reported value is the average of the outcomes of multiple simulation runs, so that a 95% confidence interval is realized. The 10^6 simulated connection requests are uniformly distributed among the nodes of the considered optical networks.
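A possible reading of the provisioning step implied by assumptions 1, 7 and 8 is sketched below: the connection follows the Dijkstra shortest path and, since every node can convert wavelengths, the first free wavelength is picked independently on each hop. The data structures and helper names are ours, and the traffic generators merely restate assumptions 3 and 4.

```python
# Sketch of the per-request provisioning step assumed in the simulation.
import random

W = 8                                           # wavelengths per fiber link (assumption 8)

def first_fit_assign(link_usage, path):
    """link_usage maps a link (u, v) to the set of wavelength indices in use.
    Returns the chosen wavelength per hop, or None if some hop is full."""
    chosen = []
    for hop in zip(path, path[1:]):
        free = [w for w in range(W) if w not in link_usage[hop]]
        if not free:
            return None                         # blocking on this hop
        chosen.append(free[0])                  # first-fit: lowest free index
    for hop, w in zip(zip(path, path[1:]), chosen):
        link_usage[hop].add(w)                  # commit the assignment
    return chosen

# Traffic model of assumptions 3-4: Poisson arrivals, exponential holding times.
def next_interarrival(arrival_rate):
    return random.expovariate(arrival_rate)

def holding_time(mean=1.0):
    return random.expovariate(1.0 / mean)       # mean normalized to unity
```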
4.1 Performance Metrics and Benchmarks
The performance metrics used to gauge the benefits of the proposed event-driven EDF-based connection setup strategy are: (i) the overall rejection probability and (ii) the rejection probabilities for gold, silver, and bronze connection requests. Note that the rejection probability is nothing else but the fraction of connection requests whose access to the network is blocked (a small tallying sketch is given after the list of benchmarks below). Blocking could occur either due to buffer overflow or due to deadline mismatch which, as mentioned earlier, happens when a request's CST expires prior to its provisioning. Three connection setup management approaches will serve as benchmarks: – A queue-free connection setup strategy, where no queues are used to store the connection requests blocked due to optical resource unavailability. This strategy will henceforth be referred to as the No-Queue strategy. – A First In First Out (FIFO) queue-based connection setup scheme, where blocked connection requests are queued and then served according to the FIFO principle. – The EDF-based connection setup mechanism studied in [11, 12, 13].
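For completeness, the rejection probability mentioned above is simply the share of requests that never gain access to the network, whether lost to buffer overflow or to a deadline mismatch. The small tally below illustrates how it can be computed overall and per class; the event-log format is our own illustration.

```python
# Computing overall and per-class rejection probabilities from a request log.
from collections import Counter

def rejection_probabilities(outcomes):
    """outcomes: iterable of (service_class, accepted) pairs, where accepted is
    False for requests lost to buffer overflow or deadline mismatch."""
    total, rejected = Counter(), Counter()
    for cls, accepted in outcomes:
        total[cls] += 1
        if not accepted:
            rejected[cls] += 1
    per_class = {cls: rejected[cls] / total[cls] for cls in total}
    overall = sum(rejected.values()) / sum(total.values())
    return overall, per_class

log = [("gold", True), ("silver", False), ("bronze", False), ("gold", True)]
print(rejection_probabilities(log))   # (0.5, {'gold': 0.0, 'silver': 1.0, 'bronze': 1.0})
```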
Fig. 4. NSFNET: Gold rejection probability for FIFO, EDF, and IEDF based setup schemes (uppermost); EON: Gold rejection probability for FIFO, EDF, and IEDF based setup schemes (bottom)
In order to distinguish our newly proposed event-driven EDF-based strategy from the one of [11, 12, 13], we will refer, in what follows, to our strategy as the Improved EDF-based (IEDF) connection setup strategy.
4.2 Numerical Results
Consider the NSFNET topology shown in the bottom part of Figure 2. In this context, Figure 3 compares the overall rejection probabilities achieved by the IEDF scheme and the three benchmark schemes as a function of the load offered to the network. IEDF clearly outperforms the other three strategies as it presents the lowest blocking probabilities. In contrast, the No-Queue scheme yields the worst performance in terms of the overall blocking probability since blocked connection requests are simply dropped immediately. Building on this observation, this scheme will not be considered in the subsequent set of results. It is worth mentioning that, by limiting their setup probing scope to only the
Fig. 5. NSFNET: Silver rejection probability for EDF and IEDF based setup schemes (uppermost); EON: Silver rejection probability for EDF and IEDF based setup schemes (bottom)
head-of-line pending request, the FIFO-based and the traditional EDF-based schemes achieved the same overall blocking probabilities, and hence their blocking probability curves overlapped with each other. The rejection probabilities for gold connection setup requests resulting from the deployment of the FIFO-based, EDF-based, and IEDF connection setup schemes are graphed in Figure 4 in the context of both the NSFNET and EON networks. Based on these reported results, smaller gold rejection probabilities are observed for the EDF-based and IEDF schemes relative to the FIFO-based strategy. This finding can be justified by the fact that, in terms of access to the network, the EDF-based and IEDF schemes privilege the connections with the smallest deadline requirements (i.e. gold connections). Furthermore, in contrast to the traditional EDF-based scheme, targeting more than one of the front pending gold connection setup requests upon the occurrence of a departure or arrival event enabled IEDF to provision a larger number of those requests. This is due mainly to the fact that gold connection setup requests are most likely to
Fig. 6. NSFNET: Bronze rejection probability for EDF and IEDF based setup schemes (uppermost); EON: Bronze rejection probability for EDF and IEDF based setup schemes (bottom)
be found towards the front of the EDF queue because of their small deadline requirements, and they thus have a higher chance of being provisioned on time under the proposed IEDF scheme. Also in the context of both NSFNET and EON, Figure 5 shows the rejection probability associated with silver connections for different values of the network's offered load. The results demonstrate that IEDF is also hard to beat when it comes to the provisioning of silver connection setup requests in comparison to the traditional EDF-based scheme. This is again due to the fact that silver connection setup requests occupy the middle of the EDF queue and thus benefit from the wider setup probing scope characterizing IEDF. This feature causes more silver setup requests to be provisioned on time and accordingly reduces the silver rejection probability. Finally, Figure 6 compares the performance of IEDF to that of the EDF-based strategy in terms of the rejection probability corresponding to bronze requests as a function of the network offered load in the context of the NSFNET and EON networks. The fact that IEDF privileges gold and silver connection
setup requests in terms of network access comes at the expense of bronze requests. This explains the slightly degraded performance that the bronze requests experience under IEDF compared to the EDF-based strategy.
5 Conclusion
This paper proposes IEDF, an event-driven EDF-based connection setup scheme that triggers the setup of pending connection requests upon the occurrence of either a connection departure or the arrival of a new connection. This reduces the likelihood of deadline mismatches. An additional characterizing feature of IEDF is its ability to provision multiple pending connection setup requests per single event occurrence, as opposed to existing EDF-based policies that consider provisioning only the head-of-line pending request. Extensive simulations were conducted to evaluate the performance of IEDF and to measure its impact on the quality of service perceived by the end clients. Throughout the simulation study, the performance of IEDF was contrasted with that of multiple other benchmark schemes, including the traditional EDF-based setup scheme. The reported simulation results showed that IEDF has the upper hand when it comes to rejection probability improvement. Moreover, these results indicate that IEDF is able to simultaneously achieve QoS differentiation and a lower blocking probability for the incoming connection requests without compromising the privilege of higher priority clients with respect to network access.
References 1. Szymanski, A., Lanson, A., Rzasa, J., Jajszczyk, A.: Performance evaluation of the Grade-of-Service routing strategies for optical networks. In: IEEE ICC 2008, pp. 5252–5257. IEEE Press, New York (2008) 2. Martinez, R., Pinart, C., Cugini, F., Andriolli, N., Valcarenghi, L., Castoldi, P., Wosinska, L., Comellas, J., Junyent, G.: Challenges and requirements for introducing impairement awareness into the management and control planes of ASON/GMPLS WDM networks. IEEE Communications Magazine 44, 76–85 (2006) 3. Jukan, A., Franzl, G.: Path selection methods with multiple constraints in serviceguaranteed WDM networks. IEEE/ACM Transactions on Networking 12, 59–72 (2004) 4. Fawaz, W., Daheb, B., Audouin, O., Berde, B., Vigoureux, M., Du-Pond, M., Pujolle, G.: Service Level Agreement and provisioning in optical networks. IEEE Communications Magazine, 36–43 (2004) 5. Sambo, N., Pinart, C., Le Rouzic, E., Cugini, F., Valcarenghi, L., Castoldi, P.: Signaling and Multi-layer Probe-based Schemes for guaranteeing QoT in GMPLS Transparent Networks. In: IEEE OFC 2009, pp. 22–26. IEEE Press, New York (2009) 6. Andrei, D., Batayneh, M., Martel, C.U., Mukherjee, B.: Provisioning of DeadlineDriven Requests with flexible transmission rates in different WDM network architectures. In: IEEE OFC 2008, pp. 1–3. IEEE Press, New York (2008)
7. Andrei, D., Batayneh, M., Martel, C.U., Mukherjee, B.: Deadline-Driven bandwidth allocation with flexible transmission rates in WDM networks. In: IEEE ICC 2008, pp. 5354–5358. IEEE Press, New York (2008) 8. Andrei, D., Tornatore, M., Batayneh, M., Martel, C.U., Mukherjee, B.: Provisioning of Deadlin-driven requests with flexible transmission rates in WDM mesh networks. IEEE/ACM Transactions on Networking, 353–366 (2010) 9. Fawaz, W., Chen, K.: A Shortest Setup Time First optical connection setup management approach with quantifiable success rate. In: IEEE Globecom 2006, pp. 1–5. IEEE Press, New York (2006) 10. Fawaz, W., Chen, K., Abou-Rjeily, C.: A novel connection setup management approach for optical WDM networks. IEEE Communications Letters, 998–1000 (2007) 11. Fawaz, W., Ouaiss, I., Chen, K., Perros, H.: Deadline-based Connection Setup in Wavelength-routed WDM Networks. Elsevier Journal of Computer Networks, 1972–1804 (2010) 12. Cavdar, C., Tornatore, M., Buzluca, F., Mukherjee, B.: Dynamic Scheduling of Survivable Connections with Delay Tolerance in WDM Networks. In: IEEE Infocom Workshops, pp. 1–6. IEEE Press, New York (2009) 13. Cavdar, C., Tornatore, M., Buzluca, F., Mukherjee, B.: Shared-Path Protection with Delay Tolerance in Optical WDM Mesh Networks. IEEE Journal of Lightwave Technology, 2068–2076 (2010) 14. Miyao, Y., Saito, H.: Optimal Design and Evaluation of Survivable WDM Transport Networks. IEEE Journal on Selected Areas in Communications, 1190–1198 (2004) 15. Fumagalli, A., Cerutti, I., Tacca, M., Masetti, F., Jagannathan, R., Alagar, S.: Survivable Networks Based on Optimal Routing and WDM Self-healing Rings. In: IEEE Infocom, pp. 726–733. IEEE Press, New York (1999)
Semantic Data Caching Strategies for Location Dependent Data in Mobile Environments
N. Ilayaraja1,3, F. Mary Magdalene Jane1,2, I Thomson1,3, C. Vikram Narayan1,3, R. Nadarajan1,3, and Maytham Safar4,5
1 Department of Mathematics and Computer Applications; 2 Dr N.G.P. Institute of Technology, India; 3 P.S.G. College of Technology, India; 4 Computer Engineering Department; 5 Kuwait University, Kuwait
Abstract. A new model for semantic caching of location dependent data in mobile environments is proposed. In the proposed model, semantic descriptions are designed to dynamically answer the nearest neighbor queries (NNQ) and range queries (RQ) from the client cache. In order to accurately answer user queries, the concept of partial objects is introduced in this paper. Both NNQ and RQ results are stored as semantic regions in the cache. We apply a cache replacement policy RAAR (Re-entry probability, Area of valid scope, Age, Rate of Access) which takes into account the spatial and temporal parameters. The cache replacement policy which was proposed for NNQ is also applied to evict the cached semantic regions. The experimental evaluations using synthetic datasets show that RAAR is effective in improving the system performance when compared to Least Recently Used (LRU) and Farthest Away Replacement (FAR) replacement policies. Keywords: Mobile Computing, Location Based Services, Data Caching, Semantic Caching.
1 Introduction Advances in wireless networking, the proliferation of portable devices and the ability to gather information about the users' immediate surroundings have contributed to the growing popularity of a new kind of information services, called Location-based services (LBS) [7, 9, 11, 12, 14, 15, 18, 21]. Schiller J.H. et al. [31] have observed that LBS are very enticing from an economic point of view and that they increase users' productivity and are perceived to be very enjoyable and useful for location dependent tasks. LBS, being wireless in nature, are plagued by mobility constraints like limited bandwidth, client power and intermittent connectivity [1, 5, 33]. Data caching at mobile clients is an effective antidote to the above cited limitations, and the importance of data caching was observed early, as can be seen in the works of Acharya S. et al. [1] and Barbara D. [4]. Many authors like Agrawal D.P. et al. [2] and
Lien C-C et al. [24] have observed that these Location-Based Services, which are available on hand-held devices, would become a field of active research. In this paper a new semantic caching model to answer location dependent nearest neighbor queries (NNQ) and range queries (RQ) in mobile environments is proposed. In a real world scenario the mobile user would request a variety of RQ and NNQ dynamically. The proposed scheme is designed to answer both kinds of queries by making use of the semantic descriptions available in the mobile cache. In order to take into account the spatial and temporal factors for cache eviction of the semantic regions, the cache replacement policy RAAR (Re-entry probability, Area of valid scope, Age, Rate of Access) is applied. The later part of the paper studies the performance of various cache replacement policies, and the experimental results show that the replacement policy RAAR performs better when compared to Least Recently Used (LRU) and Farthest Away Replacement (FAR). The fact that clients in a mobile environment can change locations opens up the possibility of answering queries that are dependent on the current position of the client. These kinds of queries are called location dependent queries. Examples of location-dependent queries are as follows: i)
“Where is the nearest gas station from here?”, called a nearest neighbor query (NNQ); ii) “List all restaurants within 2 kilometers from here!”, called a range query (RQ). The results of location-dependent queries are based on the current location of the user who issued them. The user might ask for an essential service (for example, fuel stations) within a particular distance range (range query) and might move on. After some time, the user might run out of fuel and be compelled to ask for the fuel station closest to him (nearest neighbor query). In such cases the nearest neighbor query could be answered from the user's cache if the necessary details exist. If a range query is followed by another range query, it could be answered partially or completely from the client cache depending on the level of intersection. 1.1 Data Caching Data caching is a data replication mechanism in which copies of data are brought to a mobile unit as a response to a query and retained in the cache for possible use by subsequent queries, thereby improving data accessibility and minimizing access costs. A few works, like Kambalakatta R. et al. [19], Xu J. et al. [32] and Zaslavsky A. et al. [34], have noted that wireless data transmission requires a greater amount of power, up to 10 times as much power as the reception operation. Hence the usage of the uplink and downlink channels has to be taken into serious consideration. Semantic caching schemes were studied by Li L. et al. [22], Ren Q. et al. [29] and Zheng B. et al. [35, 36] to access location dependent data in mobile computing. Semantic caching is a technique to manage the client cache by storing the query descriptions along with the query results. Semantic caching makes cache management more flexible: the cache can be managed based on temporal and spatial information. Semantic caching is also useful during disconnections, by answering users' queries locally from the cache. Most of the research works done on semantic caching are restricted to a single type of location dependent query. In this work, the semantic caching technique is applied to answer both range queries and nearest neighbor queries from the client's cache.
1.2 Semantic Caching

A semantic cache manages the client cache as a collection of semantic regions. Access information is managed and cache replacement is performed at the unit of semantic regions. Each semantic region contains the cached data, the semantic description of the data, the valid scope of the region and the last access time. The semantic description is defined according to the cache model proposed by Ren Q. and Dunham M. [28] and is stored in the client cache for each Location Dependent Data (LDD) query. An instance of a semantic description is a tuple S (SR, SP, SC, SL, ST), where SR is the source basic data item, SP is the selection condition, SC represents the actual data content of the semantic segment S, SL is the bound of the location (valid scope) and ST is the latest access time of the semantic segment. When a query is received, the client checks the semantic regions to answer the query from the semantic cache. The query can be either completely or partially answered from the cache. If it is completely answered from the semantic cache then there is no communication between client and server to fetch the data items. If it is not fully answered from the cache then the query is divided into a probe query and a remainder query. The probe query is answered by the intersecting semantic regions in the cache, and the rest of the query area (the non-intersecting part) is sent as a remainder query to the server for evaluation; the returned results are added to the cache. This section presents details of cache management for the range query (RQ) and the nearest neighbor query (NNQ). Figure 1 depicts the processing of range queries. Rectangular areas S1 to S6 represent semantic descriptions in S. Consider the case where a mobile user issues a query Q, “Find all the hospitals within 5 km from my current location”. Whenever the mobile user issues Q, the query is trimmed to find the probe query and the remainder query. The query Q is depicted as a lighter shaded region named S6. The query Q can be partially answered from the overlapping semantic regions which exist in the cache; the lighter shaded box represents the probe query and the darker shaded box represents the remainder query. The remainder query is the one which is sent to the server to fetch the objects not present in the cache. The results of an RQ depend on the degree of overlap between the semantic description area and the query area. As shown in Figure 1, the probe query area overlaps with the two semantic descriptions S4 and S5. According to the degree of overlap, processing of the query Q falls into three cases: the client can obtain all of the query results from its cache, in which case the remainder query is null; the query Q partially overlaps with descriptions such as S4 and S5, that is, the result of the probe query area is obtained from the cache, while the remainder query area is obtained after requesting the remainder query from the server; or no overlapping area exists between the query area and the existing semantic description areas, and the client has to send the entire query to the server. For answering the query Q, the probe query returns O7, O8 and O9 from the overlapping areas which are in the cache. The client then sends the remainder query to the server and receives O11. The range query Q results thus become objects O7, O8, O9 and O11. Once the query results are acquired according to their degree of overlap, they are added / merged into the semantic description table S.
When the same query is later issued by the mobile user, the number of remainder queries is reduced, and the utilization of the user’s cache is maximized.
Fig. 1. Semantic Regions
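To make the decomposition into probe and remainder queries concrete, the following Java sketch, which is our illustration rather than the authors' implementation, shows how a range query approximated by its bounding rectangle could be matched against the cached semantic descriptions. The class names Rect, SemanticRegion and QuerySplit are assumptions, and exact geometric subtraction of the already covered area is omitted for brevity.

import java.util.ArrayList;
import java.util.List;

// Axis-aligned rectangle used to approximate query areas and valid scopes.
class Rect {
    double x1, y1, x2, y2;
    Rect(double x1, double y1, double x2, double y2) {
        this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
    }
    boolean intersects(Rect o) {
        return x1 < o.x2 && o.x1 < x2 && y1 < o.y2 && o.y1 < y2;
    }
    boolean contains(Rect o) {
        return x1 <= o.x1 && y1 <= o.y1 && x2 >= o.x2 && y2 >= o.y2;
    }
}

// A cached semantic region: the parts of the description S(SR, SP, SC, SL, ST)
// needed here, namely its valid scope (SL), cached objects (SC) and access time (ST).
class SemanticRegion {
    Rect scope;              // SL: bound of the location
    List<double[]> objects;  // SC: cached objects as (id, x, y), simplified
    long lastAccess;         // ST: latest access time
}

// Result of trimming: regions answerable from cache plus the part for the server.
class QuerySplit {
    List<SemanticRegion> probe = new ArrayList<>();
    Rect remainder;          // null when the cache answers the query completely
}

class SemanticCache {
    List<SemanticRegion> regions = new ArrayList<>();

    // Splits a range query into a probe part (answered from cache) and a
    // remainder part (sent to the server). The remainder is kept as the whole
    // query rectangle unless some cached region already contains the query.
    QuerySplit trim(Rect query) {
        QuerySplit split = new QuerySplit();
        for (SemanticRegion r : regions) {
            if (r.scope.contains(query)) {   // complete overlap: cache only
                split.probe.add(r);
                split.remainder = null;
                return split;
            }
            if (r.scope.intersects(query)) { // partial overlap: probe part
                split.probe.add(r);
            }
        }
        split.remainder = query;             // evaluated at the server
        return split;
    }
}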
2 Related Works

Dar S. et al. [10] proposed a semantic model for client-side caching and compared this approach to page caching and tuple caching strategies. The client cache is treated as a collection of semantic regions; data access and replacement are performed at the unit of semantic regions. Semantic regions, like pages, provide a means for the cache manager to aggregate information about groups of tuples. Unlike pages, the size and shape of semantic regions can change dynamically. The semantic caching model proposed by Dar S. et al. [10] organizes the cache at the granularity of a query. The client maintains both the semantic descriptions and the associated results of queries in the cache. They suggested a replacement strategy based on semantic locality: for each semantic region in the cache, its distance to the most recently accessed region is calculated, and the region with the largest distance is the candidate for replacement. The semantic regions are assumed to be rectangular in shape and the distance is measured from the center of the rectangle. Jonsson B.P. et al. [6] observe that this work has motivated a line of research that focuses on caching query results at a fine granularity. Existing work on semantic caching has focused on answering range queries in mobile environments. A new range query from the user is totally or partially answered from the cache using previously cached range queries. When it is partially answered, the query is trimmed and a remainder query is sent to the server. The efficiency of any caching mechanism depends on the replacement policy which is adopted. Policies which consider both spatial and temporal factors have shown better performance for location dependent data than the traditional policies with respect to tuple caching. These policies have not been exploited for semantic caching.
Kang S.W. et al. [20] have studied the possibility of a client issuing range and nearest neighbor queries using semantic caching. In their work, answering an NNQ depends purely on whether the query point lies in a semantic description. If the query point is inside the semantic description then the nearest object in the description is treated as the result. If the query point lies outside the descriptions then the description nearest to the query point is selected. Once the nearest description is selected, the client creates a new description as the minimum rectangle including the query point and the selected description. The expanded part of the description is then requested from the server, as in the remainder query of an RQ process. After receiving the result of the expanded part, the client checks for the nearest object in the expanded description. The resultant nearest neighbour could be from the old description or the newly expanded description. According to their model, when the query point is inside the semantic region an approximate nearest neighbour is returned rather than the exact nearest neighbour, because when the query point lies very close to the edges of the semantic description its NN may exist outside the description. If the NN is not in the cache, the overhead of fetching all the remainder query objects is very high; moreover, after fetching the objects from the server, the NN might turn out to be in the old description itself. This can worsen the utilization of the cache and increase access costs. In order to overcome the above mentioned drawbacks, we propose a new semantic model which handles both NNQ and RQ by using the concept of partial objects. We also use the cache replacement policy RAAR, which takes both spatial and temporal factors into consideration for cache eviction.
3 Assumptions and Terminologies

A standard cellular mobile network consisting of mobile clients and fixed hosts is assumed, and a client can move seamlessly across the network. A two-dimensional geometric location model is assumed, wherein the location of a mobile client can be determined using any positioning technology. In this work the data items are assumed to be of fixed size and read only.

3.1 Valid Scopes

The valid scope of an item is defined as the region within which the item value is valid. A range query returns multiple objects within a certain radius which is specified by the user. The area covered by a range query is a circle whose center is the query location, and its valid scope is taken as the minimum bounding rectangle (MBR) of that circle. The scope of an NN query is also an MBR of its valid scope, as per the Approximated Circle (AC) scheme. Consider the case of a user issuing an RQ and an NNQ consecutively. The user issues an RQ for an item, and the MBR shown in Figure 2 is the geographical area covering the range query. It contains three full objects O1, O2 and O3. The valid scopes for NNQ are generated using Voronoi diagrams [26].
An instance of a full object is a tuple F (FD, FV, FC, FL, FI), where FD is the source basic data item, FV is the valid scope to which it belongs, FC represents the actual data item's content, FL is the bound of the location and FI is the invalidation information. The full objects are valid in the Voronoi cells V1, V2 and V3. If the user issues an NNQ from within the MBR, it can be answered correctly only if the locations of all the objects of the Voronoi cells intersecting the MBR are known. The distance from the user's current location to every full object is calculated and the object with the minimum distance is returned as the nearest object to that user.
Fig. 2. Semantic Cache Description with all objects in MBR
Figure 3 shows a range query covering two objects and intersecting three Voronoi cells. Since the third object O3 lies outside the MBR, an NNQ issued from the area shaded in dark grey cannot be answered from the MBR alone; the NNQ would give an erroneous result (an approximate NN). So, to answer an NN query precisely from the MBR, the location of the third object O3, which lies outside the MBR, should be known.

3.2 Valid Scope with Partial Objects

To overcome the above mentioned problem in finding the exact nearest neighbour, partial objects are introduced, of which only the location is stored in the valid scope. An instance of a partial object is a tuple P (PV, PL, PI), where PV is the valid scope to which it belongs, PL is the bound of the location and PI is the invalidation information. Partial objects are not explicitly requested by the user, but they are required to answer a nearest neighbour query accurately. Full objects are those which are accessed on demand by the user and are stored in the cache. The main advantage of using partial objects is that they help in giving precise answers while storing only the location information of the extra objects.
When a user requests a range query, the resultant full objects and the partial objects, along with the invalidation information, are sent. An important observation here is that the partial objects are related to the corresponding valid scopes and always lie outside the MBR. The partial objects contain only the location information of the objects which are not included in the MBR but are part of the set of regions intersected by the MBR. In the above example O3 will be the partial object of the valid scope. It is retrieved along with the query and stored in the cache to accurately answer any future nearest neighbour queries.
Fig. 3. Semantic Cache Description with partial objects in MBR
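A nearest neighbour query over the cached objects can then be answered by scanning both full and partial objects. The Java sketch below is our illustration, with FullObject and PartialObject mirroring the tuples F and P defined above and fetchContent() standing in for the server round trip.

import java.util.List;

class FullObject {        // F(FD, FV, FC, FL, FI): the content FC is cached
    int id; double x, y;
    Object content;
}

class PartialObject {     // P(PV, PL, PI): only the location PL is cached
    int id; double x, y;
}

class NNAnswerer {
    // Returns the id of the exact nearest neighbour for a query point inside a
    // cached MBR. Partial objects guarantee that objects whose Voronoi cells
    // intersect the MBR but lie outside it are also taken into account.
    int nearest(double qx, double qy,
                List<FullObject> fulls, List<PartialObject> partials) {
        double best = Double.MAX_VALUE;
        int bestId = -1;
        boolean bestIsPartial = false;

        for (FullObject f : fulls) {
            double d = Math.hypot(qx - f.x, qy - f.y);
            if (d < best) { best = d; bestId = f.id; bestIsPartial = false; }
        }
        for (PartialObject p : partials) {
            double d = Math.hypot(qx - p.x, qy - p.y);
            if (d < best) { best = d; bestId = p.id; bestIsPartial = true; }
        }
        if (bestIsPartial) {
            // The nearest object is only a partial object: its contents are
            // fetched from the server and it is promoted to a full object.
            fetchContent(bestId);
        }
        return bestId;
    }

    private void fetchContent(int id) { /* server request, omitted here */ }
}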
4 Proposed Policy

The proposed semantic model uses full and partial objects to respond precisely to user queries while dynamically handling RQ and NNQ. The RAAR replacement policy, which was initially designed to handle NNQ, is used to calculate the replacement scores for cache eviction of semantic regions. This section presents details of query processing for the range query (RQ) and the nearest neighbor query (NNQ).

4.1 Answering Nearest Neighbour Queries

The valid scope for a range query consists of the vertices of the MBR and the locations of the partial objects that lie within the Voronoi cells intersected by the MBR but were not requested on demand. If the user issues a query from any one of the MBRs in the cache then one of the following cases holds. According to the location of the user, the query process is divided into three cases.
Case-1: If a user issues an NNQ from within an MBR and the result is a full object, then the query is answered from the client's cache.
Case-2: If a user issues an NNQ from within an MBR and the result is a partial object, then the contents of the partial object are fetched from the server and it is converted into a full object.
Case-3: If a user issues an NNQ from outside all the existing MBRs, then the result is fetched from the server.

4.2 Answering Range Queries

Answering range queries has been studied in the literature. The results of a range query depend on the degree of overlap and on the numbers of full and partial objects that lie in the overlapping area. According to the degree of overlap, the query process is divided into the following cases.
Case-1: All results are full objects and are fetched from the cache. This is referred to as complete overlap.
Case-2: If the probe query results only in full objects, they are fetched from the cache. If the probe query contains partial objects, their contents are fetched from the server. To answer the remainder query, the full objects and partial objects are fetched from the server. This is referred to as partial overlap.
Case-3: The complete query is answered from the server and the corresponding full and partial objects are cached. This is referred to as no overlap.

4.3 Cache Replacement

The cache replacement policy has a significant impact on the performance of the cache management system. When a new data object needs to be added to the cache and there is insufficient cache space, the semantic region with the lowest replacement score is removed. Least Recently Used (LRU) [25] is the most commonly used replacement policy; it evicts the least recently used item in the cache, but it is not sufficient for location dependent data. A cache replacement policy determines the items to be evicted. Zheng B. et al. [37] and Kumar A. et al. [3] have studied the effect of replacement mechanisms and have shown that an appropriate policy improves overall system performance. Farthest Away Replacement (FAR), discussed in [28], uses the user's current location and movement direction to make cache replacement decisions. Cached objects are grouped into two sets, an out-direction set and an in-direction set. Data objects in the out-direction set are always evicted before those in the in-direction set, and objects are evicted based on replacement scores that consider the distance to the client. FAR deals only with the impact of client location and movement while neglecting the temporal properties of the client's access pattern. In this work we use RAAR [13] as the replacement policy, with the unit of replacement being a semantic region. A replacement score is associated with every semantic region in the cache. The calculation of the replacement score takes into account both temporal and spatial properties of information access. Items which are fresh, have a high access frequency, are closer to the user's current location and have a large semantic region are to be retained.
If the semantic region is small, the user would move out of it quickly. The semantic regions in the cache are ranked based on the replacement score. The cache replacement policy RAAR [13] has been extended to apply it in the semantic caching scheme: the existing RAAR policy calculated the replacement score for an individual data item in the client cache, whereas in this work RAAR is used to calculate the replacement score for a semantic region. The replacement score is calculated by the following formula:

Replacement Score = (Pi * A(VSi,j) * D(VSi,j)^k) / Ai

where k = -1 when the user is outside the semantic region and k = 1 when the user is inside the semantic region. Pi is the rate of access of the data item, calculated using the exponential aging method; A(VSi,j) is the area of the valid scope; D(VSi,j) is a measure of the entry or exit probability based on the location of the user; and Ai is the age of the data item. The semantic regions are ranked according to the score and evicted based on the incoming data size; the best fit method is employed here. The proposed valid scope using partial objects helps to answer user queries precisely while handling both range queries and nearest neighbour queries. The cache replacement policy RAAR supports better handling of the semantic regions in the cache and helps to achieve a higher cache hit ratio. In the next section we describe the experiments conducted to substantiate these claims for the proposed semantic caching strategy for location dependent queries in mobile environments.
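As an illustration of the eviction step, the regions can be ranked by the RAAR score and the lowest ranked regions dropped until the incoming data fits. The Java sketch below is ours, the class and field names are assumptions, and a simple greedy loop stands in for the best fit refinement.

import java.util.Comparator;
import java.util.List;

// One ranked entry per semantic region in the cache; field names are ours.
class RankedRegion {
    double accessRate;    // Pi: rate of access, maintained by exponential aging
    double area;          // A(VSi,j): area of the valid scope
    double entryExitProb; // D(VSi,j): entry / exit probability for the user
    double age;           // Ai: age of the cached region
    boolean userInside;   // determines the exponent k
    int sizeInTuples;     // cached data volume of this region

    double raarScore() {
        int k = userInside ? 1 : -1;
        return accessRate * area * Math.pow(entryExitProb, k) / age;
    }
}

class RaarEviction {
    // Evicts the lowest-ranked regions until 'needed' tuples of space are freed.
    void evict(List<RankedRegion> cache, int needed) {
        cache.sort(Comparator.comparingDouble(RankedRegion::raarScore));
        int freed = 0;
        while (freed < needed && !cache.isEmpty()) {
            freed += cache.remove(0).sizeInTuples;
        }
    }
}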
5 Experiments

This section describes the simulation model used to evaluate the performance of the proposed location-dependent replacement policy. The simulator was built using Java. It consists of a system execution model, a server execution model and a client execution model, as described in [13]. Experiments were carried out to check the performance of the replacement policy. The service area is represented by a rectangle with a fixed size of 4000 m * 4000 m, and a "wrapped around" model is assumed in which a client leaving one edge of the service area re-enters it from the opposite edge at the same velocity. The database contains ItemNum items or services which are requested by the clients, and an item may have different data values for different locations within the service area. The access pattern is generated using a Zipf distribution [38, 39]. The network consists of an uplink channel and a downlink channel. The uplink channel is used by clients to issue NNQ, RQ and remainder queries; the downlink channel is used to return the query results. It is assumed that the user issues a combination of NNQ and RQ. Two models for representing locations exist, as observed in the works of Lee D.L. et al. [21] and Pagonis J. and Dixon J. [27]; here a two-dimensional geometric location model is assumed, wherein the location of a mobile client can be determined using systems such as the Global Positioning System (GPS). In this paper the data items are assumed to be of fixed size and read only.
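The client model just described can be sketched as follows; this is our illustration rather than the original simulator code, and the class and field names are assumptions. It shows the wrapped-around movement over the 4000 m * 4000 m service area and a direct way of drawing Zipf-distributed item requests.

import java.util.Random;

class SimulatedClient {
    static final double AREA = 4000.0;   // side length of the service area in metres
    double x, y, vx, vy;                 // position and velocity
    final Random rng = new Random();

    // Move for dt seconds; a client leaving one edge re-enters from the
    // opposite edge at the same velocity (wrapped-around model).
    void move(double dt) {
        x = ((x + vx * dt) % AREA + AREA) % AREA;
        y = ((y + vy * dt) % AREA + AREA) % AREA;
    }

    // Zipf access pattern over itemNum items with skew parameter theta:
    // item i is requested with probability proportional to 1 / i^theta.
    int nextItem(int itemNum, double theta) {
        double norm = 0;
        for (int i = 1; i <= itemNum; i++) norm += 1.0 / Math.pow(i, theta);
        double u = rng.nextDouble() * norm, acc = 0;
        for (int i = 1; i <= itemNum; i++) {
            acc += 1.0 / Math.pow(i, theta);
            if (u <= acc) return i;
        }
        return itemNum;
    }
}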
The server is modeled by a single process that services the requests from clients. Requests are buffered at the server if necessary, and an infinite queue buffer is assumed. The First Come First Served principle is assumed in the model. The server uses point location algorithms to answer location-dependent queries; locating points with respect to a specified area is a fundamental problem in computational geometry and can be found in works such as Rourke J. [30]. Since the main concern is the cost of the wireless link, which is more expensive than the wired-link and disk I/O costs, the overheads of request processing and service scheduling at the server are assumed to be negligible in the model. The input for the simulation is taken for varied proportions of RQ and NNQ, from purely RQ to purely NNQ. The results are averaged over the outputs of all these proportions and the cache hit ratio is calculated. Table 1 summarizes the configuration parameters of the client model.

Table 1. Parameter Settings
Parameter                                Settings
QueryInterval                            Average time interval between two consecutive queries
MovingInterval                           Time duration that the client keeps moving at a constant velocity
MinSpeed                                 Minimum moving speed of the client
MaxSpeed                                 Maximum moving speed of the client
CacheSizeRatio                           Ratio of the cache size to database size
ParaSize                                 Space needed for storing each parameter for cached data
Theta                                    Skewness parameter for the Zipf access distribution
Average time interval between queries    50 sec
Cache size                               10% of the database size
Uplink Bandwidth                         15 Kbps
Downlink Bandwidth                       120 Kbps
Experiments were carried out on the proposed semantic caching mechanism. The performance of the system was studied with the commonly used LRU, FAR and RAAR replacement policies. The input data is taken across sets of user patterns by varying the Zipf factor. The average over all the inputs is taken, and the results shown below are for 1000 consecutive user queries with an average query interval. The input is also varied over various proportions of NNQ and RQ, and the average result is taken for the cache hit ratio analysis.
Fig. 4. Cache Hit Ratio Vs Cache Size for NNQ and RQ
Figure 4 compares the performance of RAAR with the existing Least Recently Used (LRU) and Farthest Away Replacement (FAR) algorithms with respect to the cache size, where the cache size is measured in terms of the number of real objects in the cache. As the cache size increases we see an increase in the overall cache hit ratio. This is because the cache can hold a larger number of data items, which increases the probability of a cache hit; moreover, replacement occurs less frequently than it does with a small cache. RAAR consistently outperforms LRU and FAR from small to large cache sizes. For the given number of input queries, the cache hit ratio tends to stabilize as the cache size increases. This result is averaged over various proportions of NNQ and RQ; the average improvement of RAAR over LRU is 30% and over FAR is 16.5%. Since this work assumes that clients can issue a combination of range and NN queries, an experiment was conducted by varying the ratio of range to NN queries. For purely NNQ and purely RQ, RAAR clearly outperforms both FAR and LRU. Figure 5 shows the effect the RQ to NNQ ratio has on the cache hit ratio. We assume a fixed cache size of 50 for this simulation. As the ratio of RQ to NNQ increases we observe an overall increase in the cache hit ratio. This is because, as the percentage of RQ increases, the number of partial objects available for answering NNQ increases and hence the overall cache hit ratio increases. We can observe that as the workload tends towards purely RQ, the overall cache hit ratio dips; this is because the number of NNQ is low and hence the probability of using partial objects to answer them is low. The maximum cache hit ratio is observed for an RQ : NNQ ratio of 75 : 25. Here also RAAR consistently outperforms both LRU and FAR for varying RQ : NNQ ratios.
Fig. 5. Cache Hits for ratio of RQ : NNQ
Fig. 6. Object Hits vs Cache Size for NNQ and RQ
Figure 7 compares the total response time of RAAR, LRU and FAR with respect to the cache size. The response time is measured in time units for a query load of 1000 queries, and the results are averaged over the proportions of RQ and NNQ. The response time is the sum of the uplink time, the server computation time and the downlink time. The graph declines because more queries are answered from
cache hits as the cache size increases. The response times are highest for cache sizes between 5 and 15 because a higher number of server requests is needed to answer user queries when the cache storage is small. The graph stabilizes after a cache size of 30. RAAR takes less time than LRU and FAR to respond to user queries, as can be clearly seen in the graph, and this is a significant achievement when answering both NNQ and RQ.
Fig. 7. Response Time vs Cache Size for NNQ and RQ
6 Conclusion

A new semantic caching model for location dependent queries has been proposed in this work. The possibility of clients issuing a combination of range and NN queries has been taken into consideration. Most semantic caching schemes reuse cached data only for similar query types; this work uses cached range query data to answer an NN query, and a range query might also use NN query data provided the valid scope of the NN query is large. The performance of the proposed model was tested with the LRU, FAR and RAAR replacement policies, using the cache hit ratio as the metric for performance evaluation. Simulation studies showed that RAAR outperformed the traditional policies. The emergence of new applications will lead to the development of new types of spatial queries. In the future, we wish to explore the possibility of extending semantic caching techniques to cover other complex queries such as spatial skyline queries.
References 1. Acharya, S., Alonso, R., Franklin, M., Zdonik, S.: Broadcast Disks: Data Management for Asymmetric Communications Environments. Proceedings of ACM SIGMOD Conference on Management of Data, 199–210 (1995) 2. Agrawal, D.P., Zeng, Q.A.: Introduction to Wireless and Mobile Systems. Thomson Brooks/Cole Inc. (2006) 3. Kumar, A., Misra, M., Sarje, A.K.: A Weighted Cache Replacement Policy for Location Dependent Data in Mobile Environments. In: SAC 2007, pp. 920–924 (2007) 4. Barbara, D.: Mobile Computing and Databases: A Survey. IEEE Transactions on Knowledge and Data Engineering 11, 108–117 (1999) 5. Barbara, D., Imielinski, T.: Sleepers and Workaholics: Caching Strategies in Mobile Environments. In: Proceedings of SIGMOD 1994, pp. 1–12 (1994) 6. Jonsson, B.P., Arinbjarnar, M., Porsson, B., Franklin, M.J., Srivastava, D.: Performance and Overhead of Semantic Cache Management. ACM Transactions on Internet Technology 6, 302–331 (2006) 7. Cheverst, K., Davies, N., Mitchell, K., Friday, A.: Experiences of Developing and Deploying a Context-Aware Tourist Guide: The GUIDE Project. In: Proceedings of the International Conference on Mobile Computing and Networking, pp. 20–31 (2000) 8. Cho, G.: Using predictive prefetching to improve location awareness of mobile information service. In: Sloot, P.M.A., Tan, C.J.K., Dongarra, J., Hoekstra, A.G. (eds.) ICCS-ComputSci 2002. LNCS, vol. 2331, pp. 1128–1136. Springer, Heidelberg (2002) 9. Dao, D., Rizos, C., Wang, J.: Location-Based Services: Technical and Business Issues. GPS Solutions 6, 169–178 (2002) 10. Dar, S., Franklin, M.J., Jonsson, B.T., Srivatava, D., Tan, M.: Semantic Data Caching and Replacement. In: Proceedings of the VLDB Conference, pp. 330–341 (1996) 11. De Montalvo, U.W., Ballon, P.: Business Models for Location Based Services. In: Proceedings of the AGILE Conference, pp. 115–121 (2003) 12. Dunham, M.H., Kumar, V.: Location Dependent Data and its Management in Mobile Databases. In: Proceedings of Ninth International Workshop on Database and Expert Systems Applications, pp. 414–419 (1998) 13. Mary Madgalene Jane, F., Ilayaraja, N., Safar, M., Nadarajan, R.: Entry and Exit Probabilities based Cache Replacement Policy for location dependent data in mobile environments. In: ACM proceedings of the 7th International Conference on Advances in Mobile Computing & Multimedia, MoMM 2009 (2009) 14. Huang, X., Jensen, C.S.: Towards A Streams-Based Framework for Defining LocationBased Queries. In: Proceedings of Spatio-Temporal Database Management Workshop, pp. 73–80 (2004) 15. Ilarri, S., Mena, E., Illarramendi, A.: Location-Dependent Query Processing: Where We Are and Where We Are Heading. ACM Computing Surveys 42(3) (September 2010) (to appear) 16. Jeganathan, C., Sengupta, T.: Utilization of Location Based Services for the Benefit of Visually Handicapped People. In: Proceedings of MapIndia (2004), http://www.gisdevelopment.net 17. Jung, I., You, Y., Lee, J., Kim, K.: Broadcasting And Caching Policies For LocationDependent Queries In Urban Areas. In: Proceedings of the International Workshop on Mobile Commerce, pp. 54–60 (2002) 18. Kalakota, R., Robinson, M.: M-Business: The Race to Mobility. McGraw-Hill Companies, New York (2001)
19. Kambalakatta, R., Kumar, M., Das, S.K.: Profile Based Caching to Enhance Data Availability in Push/Pull Mobile Environments. In: Proceedings of the International Conference on Mobile and Ubiquitous Systems: Networking and Services, pp. 74–83 (2004) 20. Kang, S.-W., Gil, J.-M., Keun, S.: Considering a User’s Mobility and Query Patterns in Location- Based Services. In: Proceedings Of the ACM Conference on Mobility 2007, pp. 386–393 (2007) 21. Lee, D.L., Lee, W.C., Xu, J., Zheng, B.: Data Management in Location-Dependent Information Services. IEEE Pervasive Computing 1, 65–72 (2002) 22. Li, L., Birgitta, K.R., Pissinou, N., Makki, K.: Strategies for Semantic Caching. In: Proceedings of the International Conference on Database and Expert Systems Applications, pp. 284–298 (2001) 23. Li, Z., He, P., Lei, M.: Research of Semantic Caching for LDQ in Mobile Network. In: Proceedings of the IEEE International Conference on e-Business Engineering (ICEBE 2005), pp. 511–517 (2005) 24. Lien, C.-C., Wang, C.-C.: An Effective Prefetching Technique for Location-Based Services with PPM. In: Proceedings of the Conference on Information Sciences, JCIS (2006) 25. O’Neil, E., O’Neil, P.: The LRU-k Page Replacement Algorithm for Database Disk Buffering. In: Proceedings of the ACM SIGMOD, pp. 296–306 (1993) 26. Okabe, A., Boots, B., Sugihara, K., Chiu, S.N.: Spatial Tessellations, Concepts and Applications of Voronoi Diagrams. John Wiley and Sons Ltd. Chichester (2000) 27. Pagonis, J., Dixon J.: Location Awareness and Location Based Services (2004), http://www.symbian.com 28. Ren, Q., Dunham, M.: Using Semantic Caching to Manage Location Dependent Data in Mobile Computing. In: Proceedings of the International Conference on Mobile Computing and Networking, pp. 210–221 (2000) 29. Ren, Q., Dunham, M.H., Kumar, V.: Semantic Caching and Query Processing. IEEE Transactions on Knowledge and Data Engineering 15, 192–210 (2003) 30. Rourke, J.: Computational Geometry in C. Cambridge University Press, Cambridge (1998) 31. Schiller, J.H., Voisard, A.: ’Location-Based Services. Morgan Kaufmann Publishers, San Francisco (2004) 32. Xu, J., Lee, D.L., Hu, Q., Lee, W.C.: Data Broadcast. In: Handbook of Wireless Networks and Mobile Computing, . ch. 11, pp. 243–265. John Wiley and Sons, Chichester (2002) 33. Yin, L., Cao, G., Cai, Y.: A generalized Target-Driven Cache Replacement Policy for Mobile Environments. In: Proceedings of the IEEE Symposium on Applications and the Internet, pp. 14–21 (2003) 34. Zaslavsky, A., Tari, S.: Mobile Computing: Overview and Current Status. Australian Computer Journal 30, 42–52 (1998) 35. Zheng, B., Lee, D.L.: Semantic Caching in Location Dependent Query Processing. In: Proceedings of the International Symposium on Spatial and Temporal Databases, pp. 97–116 (2001) 36. Zheng, B., Lee, W.-C., Lee, D.L.: On Semantic Caching and Query Scheduling for Mobile Nearest-Neighbor Search. Wireless Network 10, 653–664 (2004) 37. Zheng, B., Xu, J., Lee, D.L.: Cache Invalidation and Replacement Strategies for LocationDependent Data in Mobile Environments. IEEE Transactions on Computers 51, 1141–1153 (2002) 38. Zipf, G.K.: Selected Studies of the Principle of Relative Frequency in Language. Harvard University Press (1932) 39. Zipf, G.K.: Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge (1949)
PDMRTS: Multiprocessor Real-Time Scheduling Considering Process Distribution in Data Stream Management System

Mehdi Alemi1, Ali A. Safaei2, Mostafa S. Haghjoo1, and Fatemeh Abdi3

1
Data Stream Systems Lab, Department of Computer Engineering, Iran University of Science and Technology, Hengam Street. Tehran, Iran [email protected], [email protected] 2 Department of Computer Engineering, Azad University (Mahdishahr unit), Semnan, Iran [email protected] 3 Department of Computer Engineering, Azad University (Babol unit), Babol, Iran [email protected]
Abstract. In Data Stream Management Systems (DSMSs), continuous streams of data arrive in the system and queries execute over these input data. Given the high volume of input data, providing high processing capacity by using multiple processors is essential. Moreover, many applications of DSMSs, such as traffic control systems and health monitoring, have a real-time nature. To support these features, this paper aims at developing an efficient multiprocessor real-time DSMS. To achieve efficiency, a multiprocessor real-time scheduling algorithm based on the partitioning approach is proposed. In this algorithm, each received query is first offered to the processors using first fit assignment. If it cannot be fitted because of its utilization, the query is broken into several queries with smaller processing requirements, based on the free utilization of the processors. We conduct performance studies with real workloads. The experimental results show that the proposed algorithm outperforms the simple partitioning algorithm.

Keywords: Data Stream, DSMS, Real-Time, Multiprocessor, Partitioning.
1 Introduction

Traditional DBMSs expect all data to be managed as persistent data sets [1]. However, many modern applications such as network monitoring, financial analysis, and traffic management systems are fed by continuous, unbounded, and time-varying streams of data [2, 3]. These applications also have inherent real-time requirements, i.e., their queries should be finished within specified deadlines [4]. Just as DBMSs are appropriate for managing persistent data sets, DSMSs are appropriate for handling continuous data streams.
A high workload due to the high volume of input data may cause some queries not to be processed in time, so the processing power of the system must be increased by using multiple processors. With multiple processors, a real-time DSMS needs a more complicated scheduling algorithm. There are two approaches to multiprocessor real-time scheduling: partitioning and global scheduling [5]. In the partitioning approach, each processor has its own task waiting queue; the set of tasks is partitioned, each task is assigned to the proper processor according to the allocation algorithm, and each processor executes the tasks in its waiting queue according to its real-time scheduling policy. In the global approach, each task can be executed on any processor; in fact, a task started on one processor can migrate to any other processor to be continued. In the proposed system, a modified partitioning algorithm is used as the scheduling algorithm, which improves the performance of the system compared with the simple partitioning algorithm. In our algorithm, an attempt is made to assign each query to a processor. Assigning a query to a processor is possible if the utilization of that query is less than the remaining utilization of the processor. All queries are periodic, and the utilization of each query equals the ratio of its execution time to its period. The maximum utilization of each processor depends on the uniprocessor real-time scheduling algorithm used; because the optimal EDF scheduling algorithm is employed, the maximum utilization of each processor equals one. If a query cannot be assigned to any processor then it is broken into several queries with smaller execution requirements. These newly generated queries are an alternative for the original query and have the same query graph as the original query. The rest of this paper is organized as follows. Related work is reviewed in Section 2; although there are many studies on DSMSs, only some general purpose DSMSs and real-time ones are reviewed. The definition of our system specifications is given in Section 3. Section 4 contains the details of the proposed DSMS. Performance evaluation is presented in Section 5. Finally, conclusions are drawn in Section 6.
2 Related Work

STREAM [2], a general purpose DSMS, aims to provide DBMS functionality for continuous queries. In practice, however, employing stream-to-relation, relation-to-relation and relation-to-stream operators imposes considerable overheads on the underlying system, which is not profitable for real-time applications. A STREAM extension (known as RTSTREAM [4]) therefore supports real-time requirements. It uses the EDF real-time scheduling algorithm, which is optimal for uniprocessor systems but not for multiprocessor systems. The method proposed in this paper is based on a multiprocessor system to gain the processing power needed to satisfy deadlines. Aurora is also a general system for processing several data streams [3]. It uses multiple operator scheduling algorithms which reduce tuple latency or memory usage, and query execution precedence is determined based on QoS metrics. In Aurora, every arriving tuple is considered a task that must be scheduled and processed; in order to reduce the overheads, a batch of tuples is processed each time. Although tuple latency
is important as a QoS metric, Aurora does not provide any guarantee about it. Also, Aurora does not consider real-time requirements (e.g., deadlines) at all. In [6] eight misconceptions about real-time databases are discussed. One of the most common and important misconceptions is: "real-time computing is equivalent to fast computing." In fact, fast processing does NOT guarantee time constraints; in other words, although being fast is necessary, it is not sufficient. For a real-time system, other mechanisms (real-time scheduling, feedback control, etc.) are needed to handle and satisfy time constraints. To meet fast operation, many real-time systems are multiprocessor systems. Processing of continuous and periodic queries is considered in [9], which aims to minimize the deadline miss ratio of aperiodic queries via TV servers. A real-time DSMS named QStream, which relies on a real-time operating system, is proposed in [10]. However, being real-time is a non-functional requirement that must be considered in all abstraction layers of the system, and since the system proposed in [10] is based on static assumptions, it is not applicable in real systems; an adaptive version of QStream is therefore considered in [11]. We have presented parallel processing of continuous queries over logical machines in [7]. The scheduling method employed in [7] is dynamic but event-driven (in overload situations). Considering the continuous nature of continuous queries and data streams, compatibility with this nature and adaptivity to the time varying characteristics of data streams is very important. In [8], we introduced a dynamic continuous scheduling method (dispatching) to substitute for the event-driven one presented in [7]. Employing dispatching instead of event-driven scheduling improved system performance and reduced fluctuations [8]. In this paper, real-time processing of queries on multiprocessors is proposed, in which an improved version of the partitioning multiprocessor real-time scheduling approach is employed.
3 System Specifications

This section contains the main definitions and specifications of the proposed multiprocessor real-time data stream management system.

Real-Time Query Model
Our DSMS supports a periodic query model, i.e., each instance of a query is activated at a fixed period. Queries execute periodically on the continuously received tuples. For each query there is one queue holding the arrived tuples. Each query selects at most M tuples (its execution window) from its queue and should process all of the selected tuples within the defined deadline. If a query instance processes all of its selected tuples then its miss ratio is zero; otherwise, it misses processing some tuples. Equation 1 shows the miss ratio of an executed query instance:

miss = (M − N) / M .    (1)
In this equation M is the number of selected tuples and N is the number of processed tuples of the query instance.

Definition 1: Execution Window
The execution window, We, is the maximum number of tuples selected by each instance of a query for execution. As soon as a new instance of a query starts executing, newly arrived tuples are not selected by the current instance; they will be selected by the next instances. In this model, the utilization of each query is defined by equation 2, in which Ci is the execution requirement per tuple, M is the size of the execution window, and Pi is the period of query i:

Ui = (Ci * M) / Pi .    (2)
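The two quantities above translate directly into code; the short Java sketch below is ours, with assumed field names, and simply restates equations 1 and 2.

// Per-instance bookkeeping for equation (1).
class QueryInstanceStats {
    int selected;    // M: tuples selected by this instance (execution window)
    int processed;   // N: tuples actually processed before the deadline

    double missRatio() {                    // miss = (M - N) / M
        return (selected - processed) / (double) selected;
    }
}

// Per-query parameters for equation (2).
class PeriodicQuery {
    double perTupleCost;  // Ci: execution requirement per tuple
    int window;           // M : execution window We
    double period;        // Pi: period of the query

    double utilization() {                  // Ui = (Ci * M) / Pi
        return perTupleCost * window / period;
    }
}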
Figure 1 shows the internal structure of a query, i.e., its query graph. It consists of operators which receive their input tuples from the main queue or from predecessor operators, and each operator has its own queue buffer. Input tuples are buffered in the MainQueue and the OutWriter writes the results to the output. We is the execution window, i.e., the number of input tuples processed per instance of the query.
Fig. 1. Internal structure of a query
Real-Time Data Stream System Model
In order to clarify the system specification, this section describes the proposed data stream management system model.
• Hard, soft, or firm real-time: Due to the unpredictable conditions of the system, exact estimation of the execution time of each query is impossible, so the proposed real-time DSMS is a soft real-time system. The value returned by query instance i is based on its miss ratio and is defined by equation 3; the lower the miss ratio, the higher the value returned by query instance i:

Vi = 1 − Missi .    (3)

• Release of queries: each query is released periodically based on its defined period.
• Preemption: Preemption of query instances is allowed, i.e., a higher priority query may postpone the execution of a lower priority query.
• Priority assignment: Priority assignment of query instances is dynamic and is calculated at run time.
• Dependency among queries: It is assumed that there is no dependency between queries.
• Single processor or multiprocessor: To obtain high processing performance, multiple processors are considered and a multiprocessor real-time scheduling algorithm is employed. Because the partitioning approach is used, task migration is not allowed.
• Closed loop or open loop: Since the environment of the system is dynamic, a feedback control mechanism is required for system tuning, so this system is closed loop.
4 Proposed DSMS

We use multiple processors in our system, and the necessary mechanisms are employed in the system architecture to support real-time scheduling and deadlines. The different parts of our proposed system are depicted in Figure 2.
Fig. 2. Architecture of proposed real-time DSMS
Queries along with their characteristics are received by the Request Manager, which generates the query graph and passes the queries to the Assigner. The Assigner assigns each received query to a Query Engine (QE) based on its utilization and may produce multiple similar queries with smaller utilizations in order to fit the query into the QEs. Having assigned a query to QEs, the Assigner informs the Admission Control about this assignment. The Admission Control receives input tuples from each Input Reader and copies those tuples into the buffer of each registered query. The Input Reader receives input data from the streams and converts it into data tuples. The Deadline Monitor monitors the deadlines of queries
and may configure the Admission Control for load shedding. Finally, the Output Writer writes the output of each QE to the appropriate destination.

Query Engine (QE)
A Query Engine executes a set of queries on one processor. Each QE employs EDF scheduling to select a query for execution. Figure 3 shows the components of a QE. Queries are placed in the Query Queue and the EDF scheduler selects the active query with the nearest deadline among the eligible queries. After choosing a query, the EDF scheduler sends it to the Execution Thread for execution. A query is stopped in the Execution Thread under three conditions: 1) all of the selected tuples have been executed, 2) the deadline of the query has been missed, or 3) a query with a nearer deadline has been activated and, since queries are preemptive, the current query should be preempted.
Fig. 3. Internal structure of Query Engine
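The EDF selection inside a QE can be sketched with a priority queue ordered by absolute deadline. The Java fragment below is our illustration; the class and field names are assumptions and the preemption machinery is omitted.

import java.util.Comparator;
import java.util.PriorityQueue;

// A released query instance waiting in the QE's query queue.
class QueryInstance {
    long absoluteDeadline;  // release time + period (deadline equals period here)
    boolean eligible;       // still has unprocessed tuples in its window
}

class QueryEngineSketch {
    // EDF: among eligible instances, the one with the nearest absolute deadline
    // is sent to the Execution Thread next.
    private final PriorityQueue<QueryInstance> queryQueue =
        new PriorityQueue<>(
            Comparator.comparingLong((QueryInstance q) -> q.absoluteDeadline));

    void release(QueryInstance q) { queryQueue.add(q); }

    QueryInstance pickNext(long now) {
        QueryInstance q;
        while ((q = queryQueue.poll()) != null) {
            if (q.absoluteDeadline < now) continue; // deadline already missed
            if (q.eligible) return q;               // run this instance
        }
        return null;                                // nothing eligible to run
    }
}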
Multiprocessor Real-Time Scheduling Algorithm
Multiprocessor real-time scheduling is based on the partitioning approach. First, each received query is assigned, if possible, to a QE using first fit assignment. If the utilization of the query is higher than the free utilization of every QE then this first step fails and the second step of the algorithm starts, which assigns the query to several QEs. Assigning one query to several QEs is possible by breaking the execution requirement of the query into smaller units. Since fitting a query into a QE is based on the utilization of both the query and the QE, the breaking of the execution requirement is also based on utilization. After some definitions, the details of this algorithm are discussed.
• Definition 2: Query Breaking
Each query qi can be broken into queries qi1, qi2, …, qin with the same internal structure (query graph) as qi. The sum of the utilizations and the sum of the execution windows of these new queries are equal to the utilization and the execution window of qi, respectively. Equations 4 and 5 represent these relationships:

Σ_{j=1}^{n} uij = ui .    (4)

Σ_{j=1}^{n} Mij = Mi .    (5)
As mentioned before, the utilization of each query i equals ui = (Ci*M)/Pi. In this relation, utilization is directly related to M, the number of tuples selected for processing, so utilization can be partitioned by breaking M into smaller parts. For example, if Ci = 1, Pi = 100 and M = 20 then ui = (1*20)/100 = 0.2. Producing two queries qi1 and qi2 from qi with execution windows M1 = 5 and M2 = 15 splits the utilization of qi into (1*5)/100 = 0.05 and (1*15)/100 = 0.15. So, the execution of qi can be distributed to several queries with smaller utilizations.
• Definition 3: Utilization of QE
The utilization of QE j equals the sum of the utilizations of its assigned queries, as shown in equation 6:

U_QEj = Σ_{q ∈ QEj} uq .    (6)
• Definition 4: Free Space of QE
Since EDF scheduling is used in each QE, the utilization of a QE should not exceed one in order to guarantee deadlines. So, the free space of QEj is E_QEj = 1 − U_QEj. QEj is considered a filled QE when E_QEj = 0.
• Definition 5: Fitting a Query into a QE
Let the utilization of query qi be ui and the free space of QEj be E_QEj. If ui ≤ E_QEj then qi can be fitted into QEj; otherwise, it cannot.
For assigning a query to multiple QEs, the algorithm works as follows. First, all QEs are sorted based on E_QE in descending order. The first not-filled QE is selected for executing qi1, with the execution window given by equation 7; the remaining execution window is then mx = M − mi1. Next, another QE is chosen for assigning qi2. Depending on the condition stated in equation 8, the execution window of qi2, mi2, may differ: if the condition holds then mi2 = mx, otherwise mi2 is calculated, like mi1, based on E_QE2, and mx is updated to mx − mi2. The algorithm continues until mx reaches zero. If the algorithm finishes while mx > 0, assigning query qi to the QEs fails. Pseudo code of the assignment algorithm is shown in Listing 1.

mi1 = ⌊(Pi * E_QE1) / Ci⌋ .    (7)

(Ci * mx) / Pi ≤ E_QE2 .    (8)
Listing 1. Pseudo code of the assignment algorithm

Assignment(Query qi) {
  Assign qi to a QE based on First Fit
  If it is assigned then return success
  QEs = Sort QEs based on EQE in descending order
  mx = M, u = (Ci*M)/Pi, k = 1
  Foreach QEj in QEs {
    If (EQEj < u) {
      m = floor((Pi*EQEj)/Ci)
      mx = mx - m
      qik = Copy new query from qi with We = m
      Assign qik to QEj
      u = u - (Ci*m)/Pi
      k = k + 1
    } //end if
    Else {
      qik = Copy new query from qi with We = mx
      Assign qik to QEj
      Return success
    } //end else
  } //end for
  Return fail
}

After assigning a query to a QE, the Assigner informs the Admission Control about this assignment. For broken queries, the input tuples should be dispatched based on their execution windows: each input tuple is copied to the queue of one of the newly generated queries. For example, suppose we have two QEs and six queries, where the first five queries are fitted into the QEs by the simple first fit approach and two queries q61 and q62 are generated from q6 with execution windows M1 and M2, respectively. As shown in Figure 4, out of each M received tuples, M1 tuples are copied into q61's queue and M2 tuples are copied into q62's queue.
Fig. 4. Dispatching of input tuples between queries
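Returning to the assignment step, Listing 1 could be realized in Java roughly as follows; this is a sketch under our own class names (QE, Query, Assigner), and the floor of equation 7 is used for the execution-window split.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class Query {
    double ci;    // execution requirement per tuple
    double pi;    // period
    int window;   // execution window We (number of tuples, M)
    Query(double ci, double pi, int window) {
        this.ci = ci; this.pi = pi; this.window = window;
    }
    double utilization() { return ci * window / pi; }
}

class QE {
    double used;                          // U_QE: sum of assigned utilizations
    double free() { return 1.0 - used; }  // E_QE: EDF bound is one per processor
    void assign(Query q) { used += q.utilization(); }
}

class Assigner {
    // First fit, then query breaking across QEs sorted by free space.
    boolean assign(Query q, List<QE> qes) {
        for (QE qe : qes) {                          // plain first fit
            if (q.utilization() <= qe.free()) { qe.assign(q); return true; }
        }
        List<QE> sorted = new ArrayList<>(qes);      // breaking step
        sorted.sort(Comparator.comparingDouble(QE::free).reversed());
        int mx = q.window;                           // tuples still to place
        for (QE qe : sorted) {
            if (qe.free() <= 0) continue;
            double remaining = q.ci * mx / q.pi;     // utilization left to place
            int m = (remaining <= qe.free())
                    ? mx                                         // last piece fits
                    : (int) Math.floor(qe.free() * q.pi / q.ci); // equation (7)
            if (m <= 0) continue;
            qe.assign(new Query(q.ci, q.pi, m));     // sub-query qik with We = m
            mx -= m;
            if (mx == 0) return true;
        }
        return false;                                // assignment failed
    }
}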
Admission Control
The Admission Control receives input tuples and has two main tasks: load shedding and input load distribution. When an overload is detected, the system attempts to reduce the volume of tuple processing via load shedding. After receiving each tuple of a stream, the Admission Control copies it into the queue of every query registered for that stream. In heavy workload periods, when it is not possible to process all of the input tuples and the number of query misses increases, it is preferable to decrease the workload by dropping some tuples. Load shedding is done based on the value α, 0 ≤ α ≤ 1, which is configured by the Deadline Monitor: unprocessed tuples are dropped in the first stage of the system with uniform distribution, and the smaller the value of α, the more input tuples are dropped. The other task of the Admission Control is to put received tuples into the queues of the queries waiting for those tuples. For queries assigned by simple first fit assignment there is no problem; the problem arises when a query is broken and its input data should be
dispatched. If query qi is broken into qi1, qi2, ..., qin with execution windows Mi1, Mi2, …, Min, respectively, then out of M received tuples, Mi1 are put into qi1's input queue, Mi2 into qi2's input queue, and so on. For example, let qi be broken into qi1 and qi2 with utilizations 0.2 and 0.3, respectively. Then, out of five received tuples, two are put in the queue of qi1 and three are put in the queue of qi2. The sequence of delivering these tuples to the queues of the queries is shown in Table 1.

Table 1. Sequence of delivering the first five tuples to the broken queries qi1 and qi2 with utilizations 0.2 and 0.3, respectively

Tuples          Queue of query
First tuple     qi1
Second tuple    qi2
Third tuple     qi1
Fourth tuple    qi2
Fifth tuple     qi2
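One way to obtain such a proportional delivery is a deficit-style dispatcher that hands each incoming tuple to the sub-query currently furthest behind its share of the window; the Java sketch below is ours, and its exact interleaving may differ from Table 1 depending on tie-breaking.

import java.util.Arrays;

// Dispatches incoming tuples among the sub-queries of one broken query so
// that, over every window of M tuples, sub-query j receives Mij tuples.
class Dispatcher {
    private final int[] share;     // Mij per sub-query
    private final int[] delivered; // tuples handed out in the current window
    private final int window;      // M = sum of the shares
    private int seen;              // tuples seen in the current window

    Dispatcher(int[] share) {
        this.share = share.clone();
        this.delivered = new int[share.length];
        int m = 0;
        for (int s : share) m += s;
        this.window = m;
    }

    // Returns the index of the sub-query whose input queue gets this tuple.
    int dispatch() {
        seen++;
        int best = 0;
        double bestDeficit = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < share.length; j++) {
            double deficit = share[j] * (double) seen / window - delivered[j];
            if (deficit > bestDeficit) { bestDeficit = deficit; best = j; }
        }
        delivered[best]++;
        if (seen == window) {          // start a new window of M tuples
            seen = 0;
            Arrays.fill(delivered, 0);
        }
        return best;
    }
}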
Deadline Monitor
The Deadline Monitor periodically measures the system performance and configures the Admission Control. System performance is based on the number of misses: the fewer misses occur, the higher the performance. As mentioned before, the Deadline Monitor tunes load shedding in the Admission Control through the value α. As shown in equations 9 and 10, α changes based on two averages, the long-term miss average and the short-term miss average:

Δ = −1 * ( β1 * ( AVGL(misses) − MAXmiss ) + β2 * ( AVGS(misses) − MAXmiss ) ) .    (9)

α = α + Δ .    (10)
In the above equations AVGL(misses) and AVGS(misses) are the long-term and short-term miss averages, respectively. MAXmiss is the maximum allowable miss ratio, and β1 and β2, with β1 + β2 = 1, are parameters specifying the influence of the long-term and short-term miss averages on the calculated Δ value. When these averages are less than MAXmiss, Δ is positive and α increases. The larger α is, the fewer tuples are dropped.
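The feedback step of equations 9 and 10 is small enough to show in full. In the Java sketch below, which is ours, the weights β1 and β2 are set to 0.5 each purely as an assumption, and α is additionally clamped to its valid range [0, 1].

import java.util.Random;

// Feedback step of the Deadline Monitor: equations (9) and (10).
class DeadlineMonitorSketch {
    double alpha = 1.0;         // fraction of tuples admitted (1 = no shedding)
    final double beta1 = 0.5;   // weight of the long-term miss average (assumed)
    final double beta2 = 0.5;   // weight of the short-term miss average (assumed)
    final double maxMiss;       // maximum allowable miss ratio

    DeadlineMonitorSketch(double maxMiss) { this.maxMiss = maxMiss; }

    // Called periodically with the long- and short-term miss averages.
    void adjust(double avgLongMiss, double avgShortMiss) {
        double delta = -1.0 * (beta1 * (avgLongMiss - maxMiss)
                             + beta2 * (avgShortMiss - maxMiss));  // eq. (9)
        alpha = Math.max(0.0, Math.min(1.0, alpha + delta));       // eq. (10)
    }

    // Admission Control keeps a tuple with probability alpha (uniform shedding).
    boolean admit(Random rng) { return rng.nextDouble() < alpha; }
}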
5 Performance Evaluation

To evaluate our approach to multiprocessor real-time scheduling, we developed a prototype [12] implemented in Java with JDK 6.0 in a Linux environment on a machine with an Intel Core i7 2930 processor and 6 GB of RAM. This system has been compared with a system using simple partitioning multiprocessor real-time scheduling; the simple partitioning algorithm on two processors has previously outperformed RTSTREAM [4], a single-processor real-time DSMS [13]. In the following, the experimental setup and the experimental results are discussed.
Experimental Setup
The input data set concerns the monitoring of IP packets and comes from the Internet Traffic Archive (ITA) [14]. One of its traces, named "DEC-PKT", contains one hour's worth of all wide-area traffic between Digital Equipment Corporation and the rest of the world. This real-world data set is used in our experiments. Two types of monitored packets, TCP packets and UDP packets, are selected as input streams. Each TCP packet contains five items: source address, destination address, source port, destination port, and length; UDP packets are the same as TCP packets except for the packet length field. The number of Query Engines is three and ten queries are defined to be executed on the system. Each query is periodic with its deadline equal to its period, and the utilization of each query is 0.3. Six queries include five operators, two queries include six operators, and the remaining queries include seven operators. The supported operators are filter, join, project, union, intersection, and count. The duration of the experiment is 120000 milliseconds. The number of runs is 10 and the average results of these runs are taken as the final experimental results.

Experimental Results
As mentioned before, our scheduling algorithm is compared with the simple partitioning approach, which had better results than RTSTREAM [4]. Several parameters are chosen for comparing the performance of these algorithms: miss ratio, throughput, used memory, and tuple loss. The comparison of miss ratio is shown in Figure 5. Although the two algorithms have almost the same miss ratio until the 50th second, PDMRTS shows its superiority over the duration of the experiment. Besides having a lower miss ratio than simple partitioning, PDMRTS also exhibits smaller variance.
Fig. 5. Comparison of miss ratio
Throughput, the number of executed queries per second, is shown in Figure 6. The throughput of PDMRTS is clearly higher than that of simple partitioning. Since the number of queries may increase as a result of breaking queries, and the miss ratio decreases, it is entirely reasonable for PDMRTS to have higher throughput.
Fig. 6. Comparison of throughput
In the implemented system each query has its own memory. So, more queries result in more memory usage, and since in PDMRTS the number of queries may increase, PDMRTS uses more memory than simple partitioning. The diagram of memory usage is shown in Figure 7.
Fig. 7. Comparison of memory usage
Tuple loss, the number of tuples accepted by the admission control but dropped due to filled queues, is compared in Figure 8. The tuple loss of PDMRTS is the same as that of simple partitioning until the 106th second; after the 106th second the tuple loss increases suddenly and then decreases as the execution of queries continues. PDMRTS has smaller tuple loss throughout the experiment.
Fig. 8. Comparison of tuple loss
Fig. 9. Average of comparison’s parameters
The average tuple loss in simple partitioning is 1267, which is higher than the 1153 average tuple loss of PDMRTS. As the number of queries in PDMRTS increases, it is acceptable to have more memory usage in PDMRTS: the average used memory was 9.81 MB in PDMRTS and 9.33 MB in simple partitioning. As the results show, our approach to multiprocessor real-time scheduling, PDMRTS, clearly increases the performance of the system in deadline miss ratio, throughput, and tuple loss in comparison with the simple partitioning approach. Although the amount of used memory increases, given the high capacity of current memory technology, this difference in memory usage is tolerable.
6 Conclusion

In Data Stream Management Systems, queries execute on continuously arriving streams of data. Applications of DSMSs, such as network monitoring and financial analysis systems, have inherent real-time requirements. In this work, each query has a deadline and a period and executes periodically on the received data tuples. Due to the high bandwidth of the network, the workload of the system may increase, so high processing power through multiple processors must be considered in the system design. There are two approaches, partitioning and global, for multiprocessor real-time scheduling, which is more complex than uniprocessor scheduling. Since the partitioning approach is simple and has lower overhead, we chose it as the basis of our multiprocessor real-time scheduling. With regard to the non-optimality of the simple partitioning algorithm, we improved it by breaking some queries and executing them on multiple processors. The proposed approach improves the performance of the system in comparison with the simple partitioning algorithm: although it uses more memory than simple partitioning, it improves deadline miss ratio, tuple loss, and throughput.
References 1. Babu, S., Widom, J.: Continuous queries over data streams. ACM SIGMOD Record 30(3) (September 2001) 2. Arasu, A., Babcock, B., Babu, S., Cieslewicz, J., Datar, M., Ito, K., Motwani, R., Srivastava, U., Widom, J.: STREAM: The Stanford Stream Data Manager. In: Proc. of ACM SIGMOD, USA (2003) 3. Abadi, D., Carney, D., Çetintemel, U., Cherniack, M., Convey, C., Erwin, C., Galvez, E., Hatoun, M., Hwang, J., Maskey, A., Rasin, A., Singer, A., Stonebraker, M., Tatbul, N., Xing, Y., Yan, R., Zdonik, S.: Aurora: A Data Stream Management System. In: ACM SIGMOD Conference (2003) 4. Wei, Y., Son, S.H., Stankovic, J.A.: RTSTREAM: real-time query processing for data streams. Object and Component-Oriented Real-Time Distributed Computing (2006) 5. Carpenter, J., Funk, S., Holman, P., Srinivasan, A., Anderson, J., Baruah, S.: A categorization of real-time multiprocessor scheduling problems and algorithms. In: Handbook on Scheduling Algorithms, Methods, and Models, pp. 30.1–30.19 (2004) 6. Stankovic, J.A., Son, S., Hansson, J.: Misconceptions About Real-Time Databases. IEEE Computer 32, 29–36 (1998)
PFGN: A Hybrid Multiprocessor Real-Time Scheduling Algorithm for Data Stream Management Systems Ali A. Safaei, Mostafa S. Haghjoo, and Fatemeh Abdi Department of Computer Engineering, Iran University of Science and Technology Tehran, Iran {safaeei,haghjoom,abdi}@iust.ac.ir
Abstract. In many recent applications, data are received as infinite, continuous, rapid and time-varying data streams. Real-time processing of queries over such streams is essential in most of these applications. Single-processor systems are not capable of providing the speed required to be real-time; parallelism over multiprocessors can be used to handle this deficit. In such a system, a multiprocessor real-time scheduling algorithm must be used. Generally, multiprocessor real-time scheduling algorithms fall into two approaches: partitioning or global. The partitioning approach has acceptable overhead but cannot be optimal; the global approach can be optimal, but it has considerable overheads. In this paper, a multiprocessor real-time scheduling algorithm for a DSMS is proposed that employs a hybrid approach. It is shown that it is optimal while having minimum overheads. Also, simulation results illustrate that the proposed hybrid multiprocessor real-time scheduling algorithm outperforms algorithms that use either the partitioning approach or the global approach. Keywords: hybrid real-time scheduling, data stream, multiprocessor systems, partitioning and global scheduling approaches.
1 Introduction
A real-time system has two notions of correctness: logical and temporal. In particular, in addition to producing correct outputs (logical correctness), such a system needs to ensure that these outputs are produced at the correct time (temporal correctness). Real-Time Databases (RTDBs) also focus on data freshness (validity of data in its usage time interval) as well as timeliness (satisfying deadlines assigned to the transactions) [19]. In the real world, they are generally soft real-time [20-24] and most often are implemented as main memory databases [25-27]. In data stream systems, data is received as a continuous, infinite, rapid, bursty, unpredictable and time-varying sequence. In most data stream applications, real-time functionality is emphasized as a major characteristic of a Data Stream Management System (DSMS) [28]. Traffic control and healthcare systems are such applications. In [3], eight misconceptions about real-time databases are discussed. One of the most common and important misconceptions is: "real-time computing is equivalent to fast computing." In fact, fast processing does NOT guarantee time constraints. In other
words, although being fast is necessary, it is not sufficient. For a real-time system, there is a need for other mechanisms (real-time scheduling, feedback control, etc.) to handle and satisfy time constraints. Stonebraker et al. in [29] introduced eight requirements for real-time processing of data streams. Two key requirements are "fast operation" and "automatic and transparent distribution of processing over multiple processors and machines". These requirements arise from the fact that single-processor DSMSs are not capable of processing huge volumes of input streams and cannot execute query operators continuously over them with satisfactory speed [14][30]. Parallelism in query processing over multiple processors is a solution for this bottleneck [14][15]. Although fast operation is a necessary condition for a real-time system, it is not sufficient; selecting appropriate methods for scheduling activities is one of the most important considerations in the design of a real-time system. Due to employing multiprocessors to achieve fast operation, multiprocessor real-time scheduling algorithms should be applied. Multiprocessor real-time scheduling algorithms are significantly more complex than the uniprocessor ones, since the scheduling algorithm must not only specify an execution ordering of tasks (the scheduling problem on each processor), but also determine the specific processor on which each task must execute (the assignment problem).
Fig. 1. Multiprocessor real-time scheduling approaches: (a) the partitioning approach, (b) the global approach
Traditionally, there have been two approaches for scheduling real-time tasks on multiprocessor systems: Partitioning and Global scheduling. In the partitioning approach, each processor has its own task waiting queue. The set of tasks is partitioned and each task is assigned to the proper processor (task waiting queue) according to the allocation algorithm. Each processor executes tasks in its task waiting queue according to
its real-time scheduling policy (figure 1(a)). In the global approach, each task can be executed over all processors; in fact, a task which is started on one processor can migrate to any other processor to be continued (figure 1(b)) [1]. Generally, online real-time scheduling in multiprocessor systems is an NP-hard problem [24]. The partitioning approach may not be optimal but is suitable for a real-time DSMS because: (1) independent real-time scheduling policies can be employed for each task queue, so the multiprocessor real-time scheduling problem is simplified to single-processor real-time scheduling; (2) it has low run-time overhead, which helps performance [1]. The global approach has the ability to provide optimal scheduling due to the migration capability, but has considerable overhead. Furthermore, to obtain the optimal schedule, some preconditions must hold, which is not possible in all applications [1]. Roughly speaking, the partitioning approach is more profitable when the task set is static and predefined, whilst the global approach is proper for dynamic task systems [2]. To handle the deficits of the two existing approaches, in [16] we have proposed a hybrid multiprocessor real-time scheduling approach which outperforms the two approaches, partitioning and global. The hybrid multiprocessor real-time scheduling approach proposed in [16] is summarized in the background (section 2). In this paper, a multiprocessor real-time scheduling algorithm for processing real-time queries in a DSMS is presented that is based on the hybrid multiprocessor real-time scheduling approach we have proposed in [16]. The rest of this paper is organized as follows: a background on the hybrid multiprocessor real-time scheduling approach is given in section 2. Our proposed hybrid multiprocessor real-time scheduling algorithm for scheduling real-time data stream queries is presented in section 3. Performance evaluation and simulation results of the presented algorithm are reported in section 4. Finally, we present related work in section 5 and conclude in section 6.
2 Background
In [16], a hybrid approach for multiprocessor real-time scheduling is proposed in which the benefits and advantages of the two existing multiprocessor real-time scheduling approaches (i.e., partitioning and global) are employed. The main objectives of the proposed hybrid approach are as follows:
- Optimality: maximizing the task system's schedulability w.r.t. the processors' utilization capacity.
- Lightweightness: minimizing the scheduling overheads for the underlying system.
In order to propose a new hybrid multiprocessor real-time scheduling approach that satisfies the desired objectives, different paradigms which could be conceived via combining the two existing approaches are considered and analyzed in [16]. These paradigms can be considered from different perspectives. From one of the most important viewpoints, what matters is the precedence of employing each of the two well-known multiprocessor real-time scheduling approaches.
According to the precedence (time) of applying one approach rather than the other, the combination methods for obtaining a hybrid approach are classified as follows:
I) Serial Employment. The two existing multiprocessor real-time scheduling approaches are employed sequentially and in isolation (i.e., the second one is started whenever the first one has completed), as shown in figure 2.
Fig. 2. Serial employment of multiprocessor real-time scheduling approaches
II) Concurrent Employment. The two existing multiprocessor real-time scheduling approaches are employed concurrently over the scheduling time, as shown in figure 3.
Fig. 3. An example of concurrent employment of multiprocessor real-time scheduling approaches
Analogous to serial and concurrent scheduling of transactions, the serial method of employment is simpler and easier to implement and manage, while the concurrent method, which is more complicated, improves system performance. In the serial method, w.r.t. the existence of two approaches for multiprocessor real-time scheduling (partitioning and global), the two following cases are conceivable for the serial employment method:
a) Partitioning-First. In the partitioning-first method, tasks of the task system are first assigned and bound to the processors via the partitioning approach, until no more tasks can be assigned to any of the processors.
Change-point condition: all of the tasks (in the task system) before task k are assigned and bound to one of the processors, and the weight of task k is greater than the remained utilization capacity of all of the processors.
Formally, the change point is reached at task k when

    for all j < k: task τ_j is bound to some processor, and  w_k > max{ RU_i : 1 ≤ i ≤ M }        (I)

in which M is the number of processors, w_k indicates the weight of task k (i.e., its utilization), "τ_j is bound to P_i" means that task j is bound to processor P_i, and RU_i denotes the remained utilization capacity of processor P_i.
In such a situation, the global approach will be used and the remaining tasks can migrate among the processors to be completed.
b) Global-First. Since breaking down and migration of the tasks can go on forever (while the processors' utilization capacity is not completely full), this method is contrary to its own definition: we could use the first approach (i.e., global) forever whilst the second one is never used. Admittedly, the second approach can be used if we modify the definition of the change-point (the condition for starting the second approach). But where is the proper choice for the change-point? The later the change-point, the more the system overhead (i.e., more usage of the global approach).
Based on the discussion in [16], the partitioning-first paradigm seems to be better. Besides, theorems 1 and 2 in [16] show that the partitioning-first paradigm satisfies the objectives of the hybrid approach (i.e., optimality and lightweightness). So, it can be deduced that the desired hybrid multiprocessor real-time approach is the partitioning-first approach. A high-level description of the partitioning-first hybrid multiprocessor real-time scheduling approach is as follows:

Partitioning-First hybrid multiprocessor real-time scheduling approach
1. Partitioning approach phase: all of the tasks in the task system are assigned and bound to specific processors, and each processor schedules the tasks waiting in its waiting queue. This continues until the change-point condition (equation (I)) holds.
2. Global approach phase: after this, the remaining tasks in the task system (or even tasks that newly enter the system, in a dynamic task system) are scheduled on the set of processors whose utilization capacity is reduced, i.e., updated with their remained utilization capacity as in equation (II):

    RU_i = 1 - Σ w_j over all tasks τ_j bound to P_i ,  for 1 ≤ i ≤ M        (II)

The global approach (migration is allowed) is then employed, using a proper scheduling algorithm.
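As a concrete illustration of equations (I) and (II), the following C sketch shows one way the change-point test and the remained-utilization update could be computed. The data layout and the names (RU, weight, bound_to, and the bounds M and MAX_TASKS) are illustrative assumptions for the sketch, not taken from [16] or from an actual implementation.

#include <stddef.h>

#define M 4            /* number of processors (illustrative)        */
#define MAX_TASKS 64   /* illustrative bound on the task system size */

/* remained utilization capacity RU_i of each processor, initially 1.0 */
static double RU[M];

/* weight (utilization) w_j of each task */
static double weight[MAX_TASKS];

/* Equation (II): RU_i = 1 - sum of the weights of the tasks bound to P_i.
   bound_to[j] holds the processor index of task j, or -1 if unbound.    */
static void update_remained_utilization(const int bound_to[], size_t n_tasks)
{
    for (int i = 0; i < M; i++) {
        double used = 0.0;
        for (size_t j = 0; j < n_tasks; j++)
            if (bound_to[j] == i)
                used += weight[j];
        RU[i] = 1.0 - used;
    }
}

/* Equation (I): the change point is reached for task k when every earlier
   task is already bound and w_k exceeds the remaining capacity of every
   processor (i.e., w_k > max_i RU_i).                                    */
static int change_point_reached(const int bound_to[], size_t k)
{
    for (size_t j = 0; j < k; j++)
        if (bound_to[j] < 0)
            return 0;                 /* some earlier task is still unbound */

    double max_ru = RU[0];
    for (int i = 1; i < M; i++)
        if (RU[i] > max_ru)
            max_ru = RU[i];

    return weight[k] > max_ru;        /* no processor can accept task k */
}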
3 The Proposed Hybrid Multiprocessor Real-Time Scheduling Algorithm
As discussed in [16], among the different paradigms which are conceivable for providing a hybrid multiprocessor real-time scheduling approach via combining the two well-known approaches (i.e., partitioning and global), the Partitioning-First paradigm satisfies
the desired requirements and objectives. Accordingly, the proposed hybrid multiprocessor real-time scheduling algorithm for real-time DSMSs is based on the Partitioning-First approach, in which the best algorithms of each of the partitioning and global approaches are employed. Due to the continuous nature of the input data streams as well as of the queries in a data stream system, the continuous and dynamic behavior of a DSMS is very challenging. So, the real-time scheduling algorithm must also consider dynamic situations such as ad-hoc queries, which are not pre-defined. For pre-defined queries (either one-time or continuous queries) [28], off-line scheduling is performed over the batch of queries statically. Moreover, for ad-hoc queries that arrive later, or existing queries which leave the system, dynamic scheduling is performed via reweighting (updating the utilization capacity of the processors) and migrating queries among the processors. As stated before, roughly speaking, the partitioning approach is more profitable when the task set is static and predefined whilst the global approach is proper for dynamic task systems. So, the Partitioning-First approach on which the proposed algorithm is based will also use the partitioning approach for the statically predefined queries and the global approach for the dynamic ad-hoc data stream queries. With respect to the fact that migrating queries imposes many overheads on the underlying system, in order to minimize the overheads we should use the partitioning approach as much as we can. Therefore, the heavier queries (queries with higher weights) should be scheduled first via the partitioning approach. Also, the multiprocessor real-time scheduling algorithms employed in each of the phases of the Partitioning-First hybrid approach should be the best ones. In general, the best solution for multiprocessor real-time scheduling based on the partitioning approach is FF+EDF [12][2]: the First-Fit [2] algorithm is used for allocating the queries to the processors and EDF [2] is the real-time scheduling algorithm on each processor. Also, among the several PFair algorithms that are based on the global approach (e.g., PF [31] and PD [32]), the PD2 [7] algorithm is the efficient optimal PFair algorithm.

PFGN (query system) {
  // the partitioning approach phase
  1. Sort the pre-defined queries in the query system in descending order of their weights;
  2. Repeat {
  3.   Starting from the first query in this ordered list, allocate it to the proper processor via the First-Fit algorithm (w.r.t. the query's weight and the processors' utilization capacity);
  4.   Each single processor selects a query from its allocated-queries waiting queue to process (via the EDF policy);
  5.   Update the utilization capacity of the processors w.r.t. equation (II);
  6. } Until (the weight of the next query is greater than max{ RU_i : 1 ≤ i ≤ M })   // situation in which no more query can be assigned to a processor
  // the global approach phase
  7. For (the query t_k which could not be allocated to any of the processors, and also for ad-hoc queries that arrive later) {
  8.   Use the PD2 algorithm for scheduling it via migration among the processors;
  9.   If (there is no more query to schedule OR all of the processors are full)
  10.    Break;
  11. }
  12. }
  13. End.
According to the above considerations, the pseudo code of the proposed Hybrid MultiProcessor Real-Time Scheduling algorithm, named PFGN (Partitioning-First, Global-Next), for real-time queries in a Data Stream Management System is as listed above; its input parameter is the query system.
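To make the listing above more concrete, here is a minimal C sketch of the partitioning phase (steps 1 to 6): queries are sorted by descending weight and allocated with First-Fit until a query no longer fits on any processor. The per-processor EDF dispatching and the PD2 global phase are only indicated by comments; all type and function names are illustrative assumptions, not the authors' implementation.

#include <stdlib.h>

typedef struct {
    int    id;
    double weight;    /* utilization of the query          */
    int    processor; /* assigned processor, or -1 if none */
} query_t;

static int by_weight_desc(const void *a, const void *b)
{
    double wa = ((const query_t *)a)->weight;
    double wb = ((const query_t *)b)->weight;
    return (wa < wb) - (wa > wb);
}

/* Partitioning phase of PFGN: First-Fit allocation in descending weight
   order.  Returns the index of the first query that could not be placed
   (the change point), or n if every query was partitioned.              */
static int pfgn_partitioning_phase(query_t q[], int n, double ru[], int m)
{
    qsort(q, (size_t)n, sizeof q[0], by_weight_desc);    /* step 1 */

    for (int k = 0; k < n; k++) {                        /* steps 2-6 */
        q[k].processor = -1;
        for (int i = 0; i < m; i++) {
            if (q[k].weight <= ru[i]) {                  /* First-Fit */
                q[k].processor = i;
                ru[i] -= q[k].weight;  /* equation (II), incremental form */
                break;
            }
        }
        if (q[k].processor < 0)
            return k;  /* change point: hand q[k..n-1] to the global phase */
        /* each processor dispatches its own queue with EDF (step 4)       */
    }
    return n;          /* global phase (PD2 with migration) not needed     */
}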
4 Performance Evaluation
4.1 Experimental Setup
We implemented a real-time DSMS prototype (QRS) [12] in C in a Linux environment, on a machine with a Core i7 2930 processor and 3 GB RAM [13]. Each logical machine of the parallel query processing engine is considered as a core of the multi-core CPU. The proposed hybrid multiprocessor real-time scheduling algorithm for scheduling data stream queries (named PFGN) is evaluated and compared with the best partitioning and global multiprocessor real-time scheduling algorithms, FF+EDF [2][12] and PD2 [7], respectively. The input data is from the Internet Traffic Archive (DEC-PKT) [33]. Two data streams, one for TCP packets and the other for UDP packets, are used. The former has 5 attributes (source address, destination address, source port, destination port and packet length), whilst the latter has only the first 4 attributes. The task system consists of 15 queries, each with a utilization (weight) of 0.4. M (the number of processors) is equal to 4. The simulation duration is 1e+10 seconds, and average values over 12 different executions of this scenario are measured and compared. The deadline and period of the queries are set as the estimated query execution time (1 10). The most important evaluated parameters are:
• DMR: the deadline miss ratio according to equation (III), which is the most important parameter for a real-time system:

    DMR = (number of queries that miss their deadlines) / (total number of executed queries)        (III)

  In fact, the DMR of a system illustrates the schedulability of the employed scheduling algorithm.
• Throughput: the number of queries executed in a time unit.
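Assuming the usual definition reconstructed in equation (III) above, a minimal C helper for this metric could look as follows (the argument names are illustrative, not from the QRS implementation):

/* Deadline miss ratio as in equation (III): the fraction of executed
   query instances that missed their deadline.                         */
static double deadline_miss_ratio(unsigned long missed, unsigned long executed)
{
    return executed ? (double)missed / (double)executed : 0.0;
}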
Overheads such as communication or context switching are negligible because the employed machines are cores of a multi-core CPU.
4.2 Experimental Results
Experimental results are shown in the following figures. As expected, the global approach has better performance compared to the partitioning approach; also, as shown in figures 4 and 5, the proposed hybrid multiprocessor real-time scheduling approach has a considerable improvement in terms of deadline miss ratio and system throughput compared
with the existing approaches, partitioning and global. To have a comparison at a glance, the average values of the two parameters are computed and illustrated in figures 6 and 7. Figures 4 and 5 illustrate the comparison of deadline miss ratio and system throughput, respectively, between the proposed hybrid multiprocessor real-time scheduling approach and the two existing approaches, partitioning and global.
Fig. 4. Comparison of deadline miss ratio
Fig. 5. Comparison of system throughput
As shown in figures 4 and 5, the FF+EDF algorithm, which is based on the partitioning approach, has the worst performance in nearly all of the cases, as expected. The proposed hybrid multiprocessor real-time scheduling algorithm (i.e., PFGN) competes with the PD2 algorithm in terms of deadline miss ratio and system throughput. This competition has two properties: both of them improve their performance as time passes, and neither of them is definitely better than the other. So, the average values of these parameters are compared for the three algorithms in figures 6 and 7.
Fig. 6. Comparison of deadline miss ratio in average case
Fig. 7. Comparison of system throughput in average case
According to the results shown in figures 6 and 7, it is clear that the proposed hybrid multiprocessor real-time scheduling algorithm (PFGN) outperforms even the PD2 algorithm (which is based on the global approach) in terms of deadline miss ratio, the most important parameter for a real-time system. Also, the proposed algorithm has nearly the same throughput as the PD2 algorithm. The FF+EDF algorithm (the partitioning approach) is generally the worst and is also very weak in terms of system throughput. In total, the proposed hybrid multiprocessor real-time scheduling algorithm (PFGN) performs better than the two other algorithms, which are based on either the partitioning or the global multiprocessor real-time scheduling approach.
5 Related Work
A considerable research activity pertains to stream systems [34]. Real-time query processing is essential in most data stream applications (e.g., surveillance, healthcare or network monitoring) [34]. Although a number of DSMS prototypes have been developed, including STREAM [35] and Aurora [36], none of them satisfies real-time requirements.
In [3], eight misconceptions about real-time databases are discussed. One of the most common and important misconceptions is: "real-time computing is equivalent to fast computing." In fact, fast processing does NOT guarantee time constraints. In other words, although being fast is necessary, it is not sufficient. For a real-time system, there is a need for other mechanisms (real-time scheduling, feedback control, etc.) to handle and satisfy time constraints. To meet fast operation, many real-time systems are multiprocessor systems. On the other hand, the main contribution in a real-time system design is its real-time scheduling. The history of important events and key results in real-time scheduling is reviewed in [4]. Multiprocessor real-time scheduling, which is totally different from traditional single-processor real-time scheduling, is classified into two approaches: global and partitioning. Problems and algorithms related to these approaches are discussed in [2]. Despite the optimality of PFair scheduling algorithms (such as PF [31], PD [32] and PD2 [7]), partitioning is currently favored [8]. The reasons are: (a) PFair scheduling algorithms have excessive overhead due to frequent preemptions and migrations; (b) PFair scheduling is limited to periodic tasks; (c) though partitioning approaches are not theoretically optimal, they tend to perform well in practice [8]. The utilization bound of the EDF scheduling policy with the partitioning approach in a multiprocessor system is increased in comparison with single-processor systems. The utilization bound in these environments depends on the employed allocation algorithm and the task size. The utilization bound of EDF for multiprocessor systems with extended and complex task models (e.g., resource sharing, jitter of task release, deadlines less than period, aperiodic and non-preemptive tasks) is studied in [9]. In order to schedule soft-deadline tasks in multiprocessor systems efficiently, the PFair scheduling algorithm (known as optimal for hard real-time applications) is extended in [10]. This extension (known as EPDF PFair) considers a tardiness bound and uses the global approach as well as PFair. In [11], supertasking is proposed to improve processor utilization in multiprocessor real-time systems. In this scheme, a set of tasks, called component tasks, is assigned to a server task, called a supertask, which is then scheduled as an ordinary Pfair task. Whenever a supertask is scheduled, its processor time is allocated to its component tasks according to an internal scheduling algorithm. In [37], a scale for timestamps, known as a tick, is considered and tick-based scheduling is proposed. Its query model is limited to continuous queries and the deadline is assumed to be a tolerable upper bound on tuple latency. Generally speaking, the assumptions and mechanisms do not match the real-world considerations of a real-time DSMS. Also, in [38], data stream enrichment by adding a type of metadata (called a tick-tag) to data elements is argued. Tags are not necessarily selected from a specific and controlled vocabulary and can be added whenever needed. Processing and managing tags while providing the desired flexibility is difficult. In fact, this approach is an extension of punctuations suitable for XML data streams. It seems to be applicable to customized real-time monitoring applications.
6 Conclusion Real-time requirements are essential in most data stream applications such as traffic control, surveillance systems and health monitoring. Most often, a single processor
DSMS is not capable of processing a query's operators continuously over infinite, continuous and rapid data stream tuples with a satisfactory speed. Parallel processing of queries in a multiprocessing environment is a solution for this shortcoming. Fast operation is a necessary condition for each real-time system but is not sufficient; real-time scheduling is one of the most essential parts of a real-time system. Multiprocessor real-time scheduling algorithms are different from and more complex than the single-processor ones. Generally, there are two approaches for multiprocessor real-time scheduling: partitioning and global. Although the partitioning approach provides an acceptable overhead for the underlying system, it does not guarantee optimality. The global approach can provide this guarantee, but it needs some preconditions to hold; also, the most important deficit of the global approach is its considerable overhead. A hybrid multiprocessor real-time scheduling approach, which uses the partitioning approach as much as possible first and then uses the global approach, is shown to be optimal while having a tolerable overhead for the underlying system. In this paper, a hybrid multiprocessor real-time scheduling algorithm for scheduling real-time queries in a DSMS is proposed. The proposed algorithm is based on the hybrid approach proposed in [16]: it schedules the tasks in descending order of their weights via the partitioning approach until no more tasks can be allocated to a single processor; it then schedules the remaining queries via migration among processors (i.e., employing the global approach). The best partitioning and global multiprocessor real-time scheduling algorithms (i.e., FF+EDF and PD2, respectively) are used in these two phases. It is shown in [16] that such an algorithm, which is based on the Partitioning-First hybrid approach, is optimal while having minimum overheads for the underlying system. Also, simulation results illustrate that the proposed hybrid multiprocessor real-time scheduling algorithm (named PFGN) outperforms algorithms that use either the partitioning approach (FF+EDF) or the global approach (PD2) in terms of deadline miss ratio and system throughput.
References [1] Holman, P., Anderson, J.: Group-based Pfair Scheduling. Real-Time Systems 32(1-2), 125–168 (2006) [2] Carpenter, J., et al.: A Categorization of Real-time Multiprocessor Scheduling Problems and Algorithms. In: Handbook on Scheduling: Algorithms, Models and Performance Analysis (2004) [3] Stankovic, J.A., et al.: Misconceptions About Real-Time Databases. Journal of Computer 32(6) (June 1999) [4] Sha, L., et al.: Real Time Scheduling Theory: A Historical Perspective. Real-Time Systems 28, 101–155 (2004) [5] Baruah, N., et al.: Proportionate progress: A notion of fairness in resource allocation. Algorithmica 15, 600–625 (1996)
[6] Baruah, S., Gehrke, J., Plaxton, C.: Fast scheduling of periodic tasks on multiple resources. In: Proceedings of the 9th International Parallel Processing Symposium, pp. 280–288 (April 1995) [7] Anderson, J., Srinivasan, A.: Mixed Pfair/ERfair Scheduling of Asynchronous Periodic Tasks. Journal of Computer and System Sciences 68(1), 157–204 (2004) [8] Srinivasan, A.: Effcient and Flexible Fair Scheduling of Real-time Tasks on Multiprocessors., Ph. D. thesis, University of North Carolina at Chapel Hill (2003) [9] Lopez, J., Garcia, M., Diaz, J., Garcia, D.: Worst-case utilization bound for EDF scheduling on real-time multiprocessor systems. In: Proceedings of the 12th Euromicro Conference on Real-time Systems, pp. 25–33 (June 2000) [10] Srinivasan, A., Anderson, J.H.: Efficient Scheduling of Soft Real-time Applications on Multiprocessors. Journal of Embedded Computing 1(3) (June 2004) [11] Holman, P., Anderson, J.H.: Using Supertasks to Improve Processor Utilization in Multiprocessor Real-time Systems. In: 15th Euromicro Conference on Multiprocessor Real-Time Systems, ECRTS (2003) [12] Safaei, A., et al.: QRS: A Quick Real-Time Stream Management System. Submitted to Journal of Real-Time Systems (November 2010) [13] Alemi, M.: mplementation of a Real-Time DSMS prototype, M. Sc. Thesis, Iran University of Science and Technology (2010) [14] Safaei, A., Haghjoo, M.S.: Parallel Processing of Continuous Queries over Data Streams. Distributed and Parallel Databases 28(2-3), 93–118 (2010) [15] Safaei, A., Haghjoo, M.S.: Dispatching of Stream Operators in Parallel Execution of Continuous Queries. Submitted to the Journal of Supercomputing (July 2010) [16] Safaei, A., et al.: Hybrid Multiprocessor Real-Time Scheduling Approach. International Journal of Computer Science Issues, 8(2) (2011) [17] Ghalambor, M., Safaeei, A.A.: DSMS scheduling regarding complex QoS metrics. In: IEEE/ACS International Conference on Computer Systems and Applications (AICCSA), May10-13 (2009) [18] Safaei, A., et al.: Using Finite State Machines in Processing Continuous Queries. International Review on Computers and Software 4(5) (September 2009) [19] Ramamritham, K., Son, S.H., DiPippo, L.C.: Real-Time Databases and Data Services. The Journal of Real-Time Systems 28(2-3), 179–215 (2004) [20] Haritsa, J., et al.: Data Access Scheduling in Firm Real-Time Database Systems. The Journal of Real-Time Systems 4, 203–241 (1992) [21] Schmidt, S., et al.: Real-time Scheduling for Data Stream Management Systems. In: Proceedings of the 17th Euromicro Conference on Real-Time Systems, ECRTS 2005 (2005) [22] Graham, M. H.: Issues In Real-Time Data Management, Technical Report, Software Engineering Institute, Carnegie Mellon University Pittsburgh, Pennsylvania (July 1991) [23] Kang, K.D., Son, S., Stankovic, J.: Specifying and Managing Quality of Real-Time Data Services. IEEE TKDE, University of Virginia (2004) [24] Aldarmi, S.A.: Real-time database systems: concepts and design. Department of Computer Science, University of York (1998) [25] Garcia-Molina, H., Salem, K.: Main Memory Database Systems: An Overview. IEEE Transactions on Knowledge and Data Engineering 4(6) (December 1992) [26] Adelberg, B.S.: Strip: A Soft Real-Time Main Memory Database for Open Systems, PhD. Thesis, Stanford university (1997) [27] Gruenwald, L., Liu, S.: A performance study of concurrency control in a real-time main memory database system. ACM SIGMOD Record 22(4) (December 1993)
[28] Babcock, B., et al.: Models and issues in data stream systems. In: Proceedings of the Twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (2003) [29] Stonebraker, M., et al.: The 8 Requirements of Real-Time Stream Processing. SIGMOD Records 34(4) (December 2005) [30] Johnson, T., et al.: Query-Aware Partitioning for Monitoring Massive Network Data Streams. In: Proc. Of SIGMOD (2008) [31] Kang, K.D., Son, S., Stankovic, J.: Specifying and Managing Quality of Real-Time Data Services. IEEE TKDE, University of Virginia (2004) [32] Aldarmi, S.A.: Real-time database systems: concepts and design. Department of Computer Science, University of York (1998) [33] The internet traffic archive, http://ita.ee.lbl.gov/html/contrib/DEC-PKT.html (last accessed on January 2011) [34] Babcock, B., et al.: Models and Issues in Data Stream Systems. In: Invited paper in Proc. Of PODS (June 2002) [35] The STREAM Group. STREAM: The Stanford Stream Data Manager. IEEE Data Engineering Bulletin (March 2003) [36] Abadi, et al.: Aurora: A New Model and Architecture for Data Stream Management. VLDB Journal 2(12), 120–139 (2003) [37] Ou, Z., Yu, G., Yu, Y., Wu, S., Yang, X., Deng, Q.: Tick Scheduling: A Deadline Based Optimal Task Scheduling Approach for Real-Time Data Stream Systems. In: Fan, W., Wu, Z., Yang, J. (eds.) WAIM 2005. LNCS, vol. 3739, pp. 725–730. Springer, Heidelberg (2005) [38] Nehme, R.V., et al.: Tagging Stream Data for Rich Real-Time Services. In: Proc. of VLDB (2009)
Detecting Cycles in Graphs Using Parallel Capabilities of GPU Fahad Mahdi1, Maytham Safar1, and Khaled Mahdi2 1
Computer Engineering Department Kuwait University, Kuwait 2 Chemical Engineering Department Kuwait University, Kuwait [email protected], [email protected], [email protected]
Abstract. We present an approximation algorithm for detecting the number of cycles in an undirected graph, using CUDA (Compute Unified Device Architecture) technology from NVIDIA and utilizing the massively parallel multi-threaded processor, the GPU (Graphics Processing Unit). Although cycle detection is an NP-complete problem, this work reduces the execution time and the consumption of hardware resources with only a commodity GPU, such that the algorithm makes a substantial difference compared to its serial counterpart. The idea is to convert cycle detection from an adjacency matrix/list view of the graph with DFS (Depth First Search) to a mathematical model, so that each thread in the GPU executes simple computation procedures and a finite number of loops in polynomial time. The algorithm is composed of two phases: the first phase is to create the unique combinations of the cycle length using combinatorial mathematics; the second phase is to approximate the number of swaps (permutations) for each thread to check the possibility of a cycle. An experiment was conducted to compare the results of our algorithm with the results of another algorithm based on the Donald Johnson backtracking algorithm. Keywords: graph cycles, GPU programming, CUDA, parallel algorithms.
1 Introduction
The communication field is one of the largest fields in engineering, and it is growing faster than any other field. LANs, e-mail, chatting, peer-to-peer networks and even friendship websites: all these facilities and many others were created to facilitate communication between two or more users and to satisfy users' sociality. Sociality is one of the most unique traits of human beings. This uniqueness raises an important question that was considered centuries ago: how do people communicate? What are the rules that really control these communications? Half a century ago, it was almost impossible to provide a qualitative or quantitative assessment to help provide accurate answers to these questions. However, as the population of the world grows, the importance of answering this question increases [1][2][3].
A social network is a graph that represents each actor in the community as a vertex, and the relations between actors as edges. Social networks are how any community is modeled. The social network model helps the study of community behavior and thus puts social networks at the service of researchers' demands [1][2]; numerous applications of social network modeling are found in the literature. An essential characteristic of any network is its resilience to failures or attacks, or what is known as the robustness of a network [9]. The definition of a robust network is rather debatable. One interpretation of a robust network assumes that the social links connecting people together can experience dynamic changes, as is the case with many friendship networks such as Facebook and Hi5: individuals can delete a friend or add a new one without constraints. Other networks have rigid links that are not allowed to experience changes with time, such as in a strong family network. The entropy of a network is proven to be a quantitative measure of its robustness; therefore, the maximization of a network's entropy is equivalent to the optimization of its robustness [9]. In a social network, there are several choices that define the state of the network; one is the number of social links associated with a social actor, known as the degree. This definition is commonly used by almost all researchers [9][10][11]. Characterization of a social network through the degree leads to different non-universal forms of distribution. For instance, a random network has a Poisson distribution of the degree, a Small-World network has a generalized binomial distribution, and Scale-Free networks have a power-law distribution [10]. There is no universality class reported. Instead of the node degree, the cycle degree probability [12][13] is a universal distribution form that is applicable to all social networks, obtained by using a different definition of the state of the network. Cycles have been one of the major concerns in the social network field. In [14], it is mentioned that cycles can be considered as the major aspect that can separate the network into sub-networks or components. Other research proved that there is a strong relation between the structural balance of a social network and the cycles in the network; balancing in a social network is the set of rules that defines the normal relations between the clients. The problem of finding cycles in graphs has been of interest to computer science researchers lately due to its challenging time/space complexity. The exhaustive enumeration techniques used by the smart algorithms proposed in earlier research are restricted to small graphs, as the number of loops grows exponentially with the size of the graph. Therefore, it is believed that it is unlikely to find an exact and efficient algorithm for counting cycles; counting cycles in a graph is an NP-Complete problem. Therefore, this paper presents an approximation algorithm that counts the number of cycles. Our approximated approach is to design a parallel algorithm utilizing GPU capabilities using NVIDIA CUDA [21] technology.
2 Related Work Existing algorithms for counting the number of cycles in a given graph, are all utilizing the CPU approach. The algorithm in [15] starts by running DFS (Depth First Search) algorithm on a randomly selected vertex in the graph. During DFS, when discovering an adjacent
vertex to go deeper in the graph if this adjacent vertex is gray colored (i.e. visited before) then this means that a cycle is discovered. The edge between the current vertex (i) and the discovered gray vertex (j) is called a back edge (i,j) and stored in an array to be used later for forming the discovered cycles. When DFS is finished, the algorithm will perform a loop on the array that stores the discovered back edges to form the unique cycles. The cycles will be formed out of discovered back edges by adding to the back edge all edges that form a route from vertex (j) to vertex (i). The algorithm in [16] is an approximation algorithm that has proven its efficiency in estimating large number of cycles in polynomial time when applied to real world networks. It is based on transferring the cycle count problem into statistical mechanics model to perform the required calculation. The algorithm counts the number of cycles in random, sparse graphs as a function of their length. Although the algorithm has proven its efficiency when it comes to real world networks, the result is not guaranteed for generic graphs. The algorithm in [17] is based on backtracking with the look ahead technique. It assumes that all vertices are numbered and starts with vertex s. The algorithm finds the elementary paths, which start at s and contain vertices greater than s. The algorithm repeats this operation for all vertices in the graph. The algorithm uses a stack in order to track visited vertices. The advantage of this algorithm is that it guarantees finding an exact solution for the problem. The time complexity of this algorithm is measured as O ((V+E)(C+1)). Where, V is the number of vertices, E is the number of edges and C is the number of cycles. The time bound of the algorithm depends on the number of cycles, which grows exponentially in real life networks. The algorithm in [18] presented an algorithm based on cycle vector space methods. A vector space that contains all cycles and union of disjoint cycles is formed using the spanning of the graph. Then, vector math operations are applied to find all cycles. This algorithm is slow since it investigates all vectors and only a small portion of them could be cycles. The algorithm in [19] is DFS-XOR (exclusive OR) based on the fact that small cycles can be joined together to form bigger cycle DFS-XOR algorithm has an advantage over the algorithm in [16] in the sense that it guarantees the correctness of the results for all graph types. The advantage of the DFS-XOR approximation algorithm over the algorithm in [17] is that it is more time efficient when it comes to real life problems of counting cycles in a graph because its complexity is not depending on the factor of number of cycles.
3 Overview of GPU and CUDA GPU (Graphics Processing Unit) [22] [23] [27] is a specialized microprocessor that offloads and accelerates graphics render from the CPU. GPU can perform complex operations much faster than CPU. The design philosophy of GPU is to have massively parallel multi-threaded processor where millions of threads executes on a set of stream processors (minimum of 32 and above) and dedicated device memory (DRAM). CUDA [22] [23] [26] is an extension to C based on a few easily-learned abstractions for parallel programming, coprocessor offload, and a few corresponding
additions to C syntax. CUDA represents the coprocessor as a device that can run a large number of threads. The threads are managed by representing parallel tasks as kernels (the sequence of work to be done in each thread) mapped over a domain (the set of threads to be invoked). Kernels are scalar and represent the work to be done at a single point in the domain. The kernel is then invoked as a thread at every point in the domain. The parallel threads share memory and synchronize using barriers. Data is prepared for processing on the GPU by copying it to the graphics board's memory. Data transfer is performed using DMA and can take place concurrently with kernel processing. Once written, data on the GPU is persistent unless it is de-allocated or overwritten, remaining available for subsequent kernels
4 Thread-Based Cycle Detection Algorithm: High-Level Design
SPMD (Single Program Multiple Data) is an approach that fits well in cases where the same algorithm runs against different sets of data. The cycle count problem can be viewed as a systematic inspection of all possible paths using different sets of vertices. A typical implementation of SPMD is to develop a massively threaded application; the thread-based solution of the cycle count problem can be modeled as algorithm code (SP) plus sets of vertices (MD). The cycle count problem for small graph sizes fits well to the CUDA programming model. We shall create N threads that can check N possible combinations for a cycle in parallel, provided that no inter-thread communication is needed, in order to achieve the highest level of parallelism and avoid any kind of thread dependency that would degrade the performance of the parallel application. The main idea of the thread-based cycle detection algorithm is to convert the nature of the cycle detection problem from an adjacency matrix/list view of the graph, with DFS or other brute-force steps on sets of vertices, to a mathematical (numerical) model, so that each thread in the GPU executes simple computation procedures and a finite number of loops (bounded by |V|) in polynomial time (the thread kernel function's time complexity). The algorithm is composed of two phases. The first phase is to create the unique combinations of size C (where C is the cycle length) out of the |V| vertices using the CUDA parallel programming model. Each thread in the GPU device is assigned to one of the possible combinations; each thread creates its own set of vertices, denoted as a combination row (set), by knowing its thread ID. Then each thread examines the cycle existence of the combination row vertices, to see whether they form a cycle or not regardless of the vertex order, using a technique called the "virtual adjacency matrix" test. The second phase is to approximate the number of swaps (permutations) each thread performs on its vertices to check other possibilities of cycle occurrence. The following are the main execution steps of the thread-based cycle detection algorithm; a detailed explanation of the steps is given in the next section:
1. Filter out all vertices that cannot form a cycle.
2. Create a new set of vertex indices in an array called V'.
3. For each cycle of length C, generate the unique combinations in parallel.
4. For each generated combination, check cycle existence.
5. Compute the approximation factor (n) to be the number of swaps.
6. For each swapped combination, check for cycle existence.
7. After each completion of a cycle existence check, decrement (n).
8. Send the swapped combinations from the GPU memory to the CPU memory.
9. Repeat steps (6, 7, 8) until the permutation factor (n) for all threads equals 0.
5 Thread-Based Cycle Detection Algorithm: Detailed-Level Design
The following goes through a detailed explanation of the steps listed in section 4. It describes the components that build up the entire thread-based cycle detection algorithm.
5.1 Vertex Filtering
Each element in the V' array contains a vertex that has a degree greater than (2) (excluding self-loop edges). The major advantage of filtering ("pruning") is to minimize the number of combinations (threads) generated, i.e., combinations that will never form a cycle; since there are some vertices that do not satisfy the cycle existence condition, keeping them would create wasteful combinations. For example, if |V| = 10 and we are looking for a cycle of length C = 3, then over all vertices the number of combinations to be created without pruning is 240. If we filter the vertices by degree, and say that 3 vertices have degree less than (2), then |V'| = 7 (3 out of the 10 vertices are eliminated from being used to generate combinations), so we have to create only 35 combinations, a saving of 205 unnecessary combinations.
5.2 Vertex Re-indexing
Vertices are labeled by their index in the V' array rather than their value in V; for instance, vertex 2 in V will be represented as 0 in V'. To retrieve the original set of vertices when a combination over the V' array forms a cycle, we do a simple re-mapping procedure. For example, if the combination (0, 1, 2, 0) forms a cycle in the V' array, then it is resolved to (2, 3, 4, 2) in the V array. The main reason for vertex re-indexing is to allow each thread to generate its own set of combinations correctly, because the combination generation mechanism relies on an ordered, sequenced set of numbers.
5.3 Parallel Combination Generator
Given |V|, where V is the graph size in terms of the number of vertices (for example, if V = {1, 2, 3, 4, ..., 10}, then |V| = 10), and given |Pos|, where Pos is the index of the vertex position to be filled in the cycle: for a cycle of length C, |Pos| = C, and we have to place vertices in the order Pos(1), Pos(2), Pos(3), ..., Pos(C). For example, with V = {1, 2, 3, 4, 5, 6} and a cycle of length 4 (2-4-5-6), Pos(1) will hold 2, Pos(2) will hold 4, Pos(3) will hold 5 and Pos(4) will hold 6. Knowing the first-position vertex value of the combination allows us to determine the remaining positions: since the set of vertices that form a given combination is sorted, we can guarantee that the value at position[i] must be at least the value of position[i-1] + 1. There is a strong relationship between the row ID of the combination and the vertex value of the first position.
Consider an undirected graph of size 6 and a cycle of length 3 as an example of how the generator works. The target thread ID is 9, and we want to generate its corresponding entries. Before explaining the algorithm, here is the mathematical expression of each function that is referenced in the algorithm:

    C(m, n) = m! / (n! (m - n)!)

C(m, n) is the number of unique subsets of size n to be generated from a set of size m.

    Vmin(i) = Pos(i)

Vmin(i) is the minimum possible vertex value to be stored at position (i).

    Vmax(i) = |V| - C + Pos(i)

Vmax(i) is the maximum possible vertex value to be stored at position (i).

    Offset(k, i) = C(|V| - k, C - i)

Offset(k, i) is the number of combination rows with the given value (k) at position (i).
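The following C sketch shows one way a thread could derive its own combination row from its row (thread) ID using these quantities. It is an illustrative reconstruction with 1-based vertex values and 0-based row IDs, not the authors' kernel code; in the actual system the same logic would run inside a CUDA kernel.

/* Binomial coefficient C(n, k); small inputs only, no overflow handling. */
static unsigned long long choose(unsigned n, unsigned k)
{
    if (k > n) return 0;
    if (k > n - k) k = n - k;
    unsigned long long r = 1;
    for (unsigned i = 1; i <= k; i++)
        r = r * (n - k + i) / i;   /* division is exact at every step */
    return r;
}

/* Map a row id (thread id) in [0, C(n, c)) to the corresponding sorted
   combination of c values taken from {1, ..., n}, in lexicographic order.
   The counting step mirrors Offset(k, i) = C(n - k, c - i) of Section 5.3.
   Assumes 0 <= row < C(n, c).                                             */
static void unrank_combination(unsigned long long row, unsigned n, unsigned c,
                               unsigned comb[])
{
    unsigned k = 1;                              /* Vmin(1)                */
    for (unsigned i = 1; i <= c; i++) {
        while (row >= choose(n - k, c - i)) {    /* skip whole blocks      */
            row -= choose(n - k, c - i);
            k++;                                 /* bounded by Vmax(i)     */
        }
        comb[i - 1] = k;                         /* value at position i    */
        k++;                                     /* Vmin of next position  */
    }
}

For the worked example (|V| = 6, C = 3, thread ID 9), this mapping yields the combination (1, 5, 6).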
5.4 Virtual Adjacency Matrix
The "virtual adjacency matrix" is built by conceptually constructing a (C) by (C) matrix without allocating any device memory. Each thread builds its own matrix based on its combination of vertices of size (C). The word "virtual" comes from the fact that the matrix in the code is not a traditional two-dimensional array, but a nested loop that computes the number of edges within the combination generated by each thread. Since the virtual adjacency matrix is implemented as a nested loop, no memory is required; it gives a yes/no answer as to whether a given combination of vertices forms a cycle or not. As a small example of how the virtual adjacency matrix works, consider a sample undirected graph and the corresponding actual adjacency matrix representation. Figure 1 shows the sample graph, while figure 2 shows the adjacency matrix representation for the graph in figure 1.
Fig. 1. Sample graph used to demonstrate the virtual adjacency matrix test
Fig. 2. Adjacency matrix representation for the graph in Fig. 1
Case 1: Examine the combination set (1, 2, 3, 4) to see whether it forms a cycle of length 4 or not. Figure 3 shows the construction of the Virtual Adjacency Matrix test:
Fig. 3. Virtual Adjacency Matrix for the combination (1, 2, 3, 4)
Since the edge count Σ is 4, which is equal to the cycle length, the combination set (1, 2, 3, 4) forms a cycle (regardless of the vertex order), so the virtual adjacency matrix test will return (TRUE). Case 2: Examine the combination set (1, 2, 4, 8) to see whether it forms a cycle of length 4 or not. Figure 4 shows the construction of the Virtual Adjacency Matrix test:
Fig. 4. Virtual Adjacency Matrix for the combination (1, 2, 4, 8)
Since the edge count Σ is 2, which is not equal to the cycle length, the combination set (1, 2, 4, 8) does not form a cycle (regardless of the vertex order), so the virtual adjacency matrix test will return (FALSE).
5.5 Check for Cycle Using a Linear Scan of the Edge Count
Once a generated combination has passed the virtual adjacency test with (TRUE), a second, detailed cycle detection algorithm is applied to check whether the ordered set of vertices can generate a cycle or not, because the virtual adjacency matrix test does not tell us which exact ordered combination generates a cycle. The edge count algorithm addresses this issue. The algorithm does a simple nested linear scan over the set of vertices. It needs a temporary array, assigned to each thread independently, called the vertex cover, of size (C). Initially the vertex cover array is set to false. A scan pointer starts from the vertex indexed as current, passed from the CPU. The current vertex examines its nearest neighborhood to check for edge connectivity, provided that this neighbor is not covered yet; if there is an edge, then a counter called the "edge counter" is incremented, the flag entry for that neighbor vertex in the covered array is set to (TRUE), and the newly connected vertex becomes current. The scan then restarts from the first element in the combination array and the procedure is repeated until the end of the combination array. A cycle exists for the set of vertices if the edge counter is equal to the cycle length; otherwise no cycle is formed.
5.6 Swaps of the Vertex Set Using Quick Permutation
We have modified an existing array permutation algorithm called "Quick Permutation: reversals on the tail of a linear array without using recursion" [28]. The original algorithm works sequentially on the CPU; we have adapted it to a parallel version that can run on the GPU.
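A minimal C sketch of the two per-thread checks is given below. The first function is the edge-counting filter of Section 5.4; the second is a straightforward consecutive-adjacency test that serves the purpose of the edge-count scan of Section 5.5 for a given ordering, written here as a simpler alternative rather than a literal transcription of the described scan. Vertex indices are assumed to be zero-based and already re-mapped to the original graph.

/* Undirected graph stored as a 0/1 adjacency matrix of size n_vertices,
   indexed as adj[u * n_vertices + v].                                    */

/* Section 5.4: "virtual adjacency matrix" filter.  Counts the edges that
   exist among the c vertices of a combination (each unordered pair once)
   and accepts the combination if the count equals the cycle length.      */
static int virtual_adjacency_test(const int *adj, int n_vertices,
                                  const unsigned comb[], unsigned c)
{
    unsigned edges = 0;
    for (unsigned i = 0; i < c; i++)
        for (unsigned j = i + 1; j < c; j++)
            edges += adj[comb[i] * n_vertices + comb[j]];
    return edges == c;
}

/* Ordered check used on each generated permutation: verify that every pair
   of consecutive vertices, including the closing pair, is adjacent.        */
static int ordering_is_cycle(const int *adj, int n_vertices,
                             const unsigned comb[], unsigned c)
{
    for (unsigned i = 0; i < c; i++) {
        unsigned u = comb[i];
        unsigned v = comb[(i + 1) % c];
        if (!adj[u * n_vertices + v])
            return 0;
    }
    return 1;
}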
6 Approximation Approach Using the Permutation Factor (n)
The first stage of the thread-based cycle detection algorithm is to create the unique set of combinations of length C and then apply the check for a cycle using the virtual adjacency matrix technique, which is a yes/no decision algorithm that answers the following question: "Can this unique set of vertices (combination) form a cycle regardless of the vertex order?" We can use the following equation from [20] to compute the maximum number of cycles of length C in an undirected graph:

    MaxCycles(|V|, C) = Comb(|V|, C) × Perm(C) / (2C)        (1)

Since Perm(C) = C!, equation (1) can be expressed as:

    MaxCycles(|V|, C) = ( |V|! / (C! (|V| - C)!) ) × C! / (2C)        (2)
If we do the permutations for every unique combination that passed the virtual adjacency matrix check with the answer "yes", then we have covered all possible cycles; but executing C!/(2C) permutations per combination is impractical even for small C. For example, for C = 10 we would need C!/(2C) = 181,440 iterative steps to be executed, which may not be feasible.
Our approximation approach is to reduce C!/(2C) to a reasonable number by defining a factor (n) that is multiplied by C!/(2C). The value of (n) is obtained from the degree percentages of the vertices involved in the combination, and can be expressed as:

    n = Σ_{i=1..C} deg(v_i) / D        (3)

where deg(v_i) is the degree of vertex v_i in the combination and D is the total degree of the graph; the approximated number of swaps is then n × C!/(2C), rounded to the nearest integer.
The value of (n) becomes very small (close to zero) if the degree percentage of each vertex is low, which means that the set of vertices is poorly connected and has a small chance of constituting a cycle. For example, to permute a cycle of length 5 in a graph with a total degree of 40, where the vertices (v1, v2, v3, v4, v5) have degrees (2, 2, 3, 3, 4) respectively, equation (3) gives:

    2/40 + 2/40 + 3/40 + 3/40 + 4/40 = 0.35
If we multiply 0.35 by C!/(2C) = 12, we get n = 4, rather than 12 (the total number of permutations). The value of (n) becomes very large (close to one) if the degree percentage of each vertex is high, which means that the set of vertices is strongly connected and has a big chance of constituting a cycle. For example, for a cycle of length 5 in a graph with a total degree of 40, where the vertices (v1, v2, v3, v4, v5) have degrees (9, 8, 7, 6, 6) respectively, equation (3) gives:

    9/40 + 8/40 + 7/40 + 6/40 + 6/40 = 0.9
If we multiply 0.9 by 12, we get n = 11, rather than 12 (the total number of permutations). As a result, the connectivity behavior of the vertices, whether strongly or poorly connected, influences the value of the permutation factor (n). Retrieving the value of (n) for each possible cycle length creates an iteration control that determines when to stop permuting.
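A C sketch of this computation, consistent with the two worked examples above, is given below; the array names and the rounding are illustrative assumptions.

/* Approximation factor of Section 6: the sum of the degree percentages of
   the combination's vertices (equation (3)), multiplied by the number of
   distinct cyclic orderings C!/(2C), gives the number of swaps to try.    */
static unsigned permutation_factor(const unsigned degree[],
                                   const unsigned comb[], unsigned c,
                                   unsigned total_degree)
{
    double fraction = 0.0;
    for (unsigned i = 0; i < c; i++)
        fraction += (double)degree[comb[i]] / (double)total_degree;

    /* C!/(2C) = (C-1)!/2, e.g. 12 for C = 5 and 181,440 for C = 10 */
    double max_perms = 0.5;
    for (unsigned k = 2; k < c; k++)
        max_perms *= k;

    return (unsigned)(fraction * max_perms + 0.5);   /* rounded */
}

With the first example above (degrees 2, 2, 3, 3, 4 and total degree 40), this returns 4; with the second example (degrees 9, 8, 7, 6, 6), it returns 11.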
7 Experiments and Results The experiment in [15] to solve the problem of finding cycles in a graph with 26 nodes used 30 machines in a distributed environment. In our experiments we used only 1 commodity hardware PC, equipped with only 1 nVIDIA 9500 GS GPU device. The cost of 9500 GS GPU card is less than $ 50. The specification of this GPU card is 32 Stream Processors (Core) and 1024 MB Memory (DRAM), all in a single PCIexpress graphics card. Our experiment was conducted on X86-based PC, the CPU is Intel Pentium Dual Core with 2.80GHz. The Memory is 4 GB, equipped with nVIDIA GeForce 9500 GS,
all running under Microsoft Windows 7 Ultimate edition. The source code was written in Microsoft Visual C++ Express edition with CUDA SDK v3.2. Our experiments showed that the execution times are in seconds. The GPU time is the total execution time of the combination-generation kernel function plus the check for cycle existence using the virtual adjacency matrix. The CPU time is the total execution time for exporting the generated combinations stored in GPU memory to the CPU memory and then dumping the memory content to the secondary storage device. The main reason behind the better execution time is that CUDA devices are designed to work with massively parallel threads. The next phase of the experiment is to include the permutation factor (n) for each cycle length and to perform a check for a cycle using the edge count for each generated permutation. For example, if the permutation factor for a cycle of length 13 is n, we shall check (n × 667,794) possible permutations, where each of the (667,794) iterations is made in parallel. We used the same graph data as the experiment in [15], and then ran our experiment for a set of fixed approximation factors (n). Table 1 shows the detected cycles and the corresponding execution time for the thread-based cycle detection algorithm.
Table 1. Detected cycles and execution time for the approximation solution
Approximation Factor | Cycles Detected | Execution Time
2048 | 20,830,044 | 3.2 Hours
4096 | 39,408,671 | 7 Hours
8192 | 76,635,770 | 18 Hours
The execution time is the total time used to generate the parallel swaps (permutations) by a factor of (n) and to check cycle existence for each swap executed in the GPU context, plus the total time needed to sum up the total cycles detected in the CPU context. The experiment in [15] took around 16 days to find the cycles in a graph with 26 nodes. In our experiment we solved all the unique possible combinations of each cycle length (the first phase of the experiment) in less than 2 minutes. Even in the approximation approach, where for each possible combination that can form a cycle we created (n) swaps (the second phase of the experiment), the run did not last more than 3.2 hours when n = 2048. We can approximate the solution more closely at the cost of more time, but still in less time than the original experiment. The way the cycle definition is viewed plays an important role in the feasibility of the solution. If the assumption behind solving the problem states that any cyclic permutation of an existing cycle combination is considered one cycle (for instance, cycle 1-2-3-4-1 is the same as 1-3-2-4-1), then applying the first phase of the thread-based cycle detection algorithm (the parallel combination generator) results in a significant improvement and a clear time advantage over existing algorithms. But if the assumption states that cyclic permutations are considered individual cycles (for instance, cycle 1-2-3-4-1 is different from 1-3-2-4-1), then applying the approximation approach in the thread-based cycle detection algorithm has a time/resources advantage over other approximation algorithms.
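The per-permutation check described in this section (counting the edges of a candidate vertex sequence in the virtual adjacency matrix) could be organized as a CUDA kernel in which each thread verifies one candidate sequence. The sketch below is our own illustration under that assumption; the kernel name and data layout are hypothetical and are not taken from the paper's implementation.

// Each thread checks one candidate permutation of length L against the
// adjacency matrix: the sequence forms a cycle iff every consecutive pair
// (including the wrap-around edge) is connected, i.e. the edge count equals L.
// Example launch: checkCyclesKernel<<<(numPerms + 255) / 256, 256>>>(...);
__global__ void checkCyclesKernel(const unsigned char *adj,  // n*n adjacency matrix (0/1)
                                  const int *perms,           // numPerms * L vertex ids
                                  int n, int L, int numPerms,
                                  unsigned int *cycleCount)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= numPerms) return;

    const int *seq = &perms[p * L];
    int edges = 0;
    for (int i = 0; i < L; ++i) {
        int u = seq[i];
        int v = seq[(i + 1) % L];          // wrap around to close the cycle
        edges += adj[u * n + v];
    }
    if (edges == L)
        atomicAdd(cycleCount, 1u);         // this permutation closes a cycle
}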
7.1 Approximation Accuracy of the Thread-Based Cycle Detection Algorithm
Table 2 shows the approximation accuracy for the number of detected cycles, comparing the exact solution of the algorithm specified in [15] with our (approximated) thread-based cycle detection algorithm, using the results obtained in Table 1 for the three different approximation factors (n). For each run of the approximation factor (n) we have also included the approximated entropy calculations. The approximation accuracy is measured using the following equation:

Accuracy (%) = (cycles detected by the approximated solution / cycles detected by the exact solution) × 100    (4)
Table 2. Approximation accuracy (%) of the detected cycles and Entropy calculations for running three different approximation factors (n) of the experiment
Approximation Factor | Accuracy (%) | Entropy
2048 | 0.147 | 2.407
4096 | 0.279 | 2.391
8192 | 0.542 | 2.355
Although the approximation accuracy values are quite small, they can be increased by increasing the permutation factor (n); our main concern in the design of the thread-based cycle detection algorithm is speeding up the computations.
Fig. 5. Normalized Results for the exact and approximated solution
Figure 5 shows the normalized results for the exact solution of the detected cycles found in [15] and our approximated solution using three different values of the approximation factor (n).
8 Conclusions
The cycle counting problem is still NP-complete, and no polynomial-time solution has been found yet. We have presented a GPU/CUDA thread-based cycle detection algorithm as an alternative way to accelerate the computation of cycle detection. We have utilized the GPU/CUDA computing model since it provides a cost-effective platform for developing applications that perform data-intensive operations and need a high degree of parallelism. The thread-based cycle detection algorithm views the problem from a mathematical perspective; we did not use classical graph operations such as DFS. GPU/CUDA works well for mathematical operations, and if we can do more mathematical analysis of the problem, we can get better results.
9 Future Work
CUDA provides many techniques and best practices that can be applied to GPU applications to improve the execution time of the code. Here we have selected two techniques that are commonly used. #pragma unroll [24] is a compiler directive placed in the loop code. By default, the compiler unrolls small loops with a known trip count; the #pragma unroll directive, however, can be used to control the unrolling of any given loop. It must be placed immediately before the loop and applies only to that loop. Shared memory [24], because it is on-chip, is much faster than the local and global memory spaces. In fact, for all threads of a warp, accessing shared memory is as fast as accessing a register as long as there are no bank conflicts between the threads. Further work includes: porting the application to a 64-bit platform to support a wider range of integer numbers, since CUDA provides a 64-bit SDK edition; converting the idle threads that do not form a cycle into active threads that do, by passing such combinations to the idle threads in order to increase GPU efficiency and utilization; finding methods to perform more filtering of vertices that cannot form a cycle (in the current implementation, only a filter on vertices of degree less than 2 was used); and migrating the CPU-based per-thread cycle counting procedure to a parallel GPU-based count, since it creates a performance bottleneck, especially for a large number of cycles.
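For illustration, the following CUDA C++ fragment shows how the two techniques mentioned above (loop unrolling and shared memory) are typically applied. The kernel and its names are hypothetical examples of the general techniques, not part of the paper's code.

// Illustrative CUDA kernel using #pragma unroll and shared memory.
// Each block stages one tile of the input in on-chip shared memory, then
// thread 0 accumulates a fixed-length (unrolled) sum over that tile.
// Example launch (TILE threads per block): tileSumKernel<<<nTiles, TILE>>>(d_in, d_out, nTiles);
#define TILE 128

__global__ void tileSumKernel(const int *in, int *out, int nTiles)
{
    __shared__ int tile[TILE];              // on-chip shared memory (fast)

    int t = blockIdx.x;                     // one tile per block
    if (t >= nTiles) return;

    // Cooperative load from global memory into shared memory.
    tile[threadIdx.x] = in[t * TILE + threadIdx.x];
    __syncthreads();                        // wait until the whole tile is loaded

    if (threadIdx.x == 0) {
        int sum = 0;
        #pragma unroll                      // trip count is a compile-time constant
        for (int i = 0; i < TILE; ++i)
            sum += tile[i];
        out[t] = sum;
    }
}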
References 1. Shirazi, S.A.J.: Social networking: Orkut, facebook, and gather Blogcritics (2006) 2. Safar, M., Ghaith, H.B.: Friends network. In: IADIS International Conference WWW/Internet, Murcia, Spain (2006) 3. Fiske, A.P.: Human sociality. International Society for the Study of Personal Relationships Bulletin 14(2), 4–9 (1998)
4. Boykin, P., Roychowdhury, V.: Leveraging social networks to fight spam. Computer 38(4), 61–68 (2005) 5. Xu, J., Chen, H.: Criminal network analysis and visualization. Communications of the ACM 48(6), 100–107 (2005) 6. Bagchi, A., Bandyopadhyay, A., Mitra, K.: Design of a data model for social network applications. Journal of Database Management (2006) 7. Bhanu, C., Mitra, S., Bagchi, A., Bandyopadhyay, A.K., Teja: Pre-processing and path normalization of web graph used as a social network. Communicated to the Special Issue on Web Information Retrieval of JDIM (2006) 8. Mitra, S., Bagchi, A., Bandyopadhyay, A.: Complex queries on web graph representing a social network. In: 1st International Conference on Digital Information Management, Bangalore (2006) 9. Wang, B., Tang, H., Guo, C., Xiu, Z.: Entropy optimization of scale-free networks’ robustness to random failures. Physica A 363(2), 591–596 (2005) 10. Costa, L.d.F., Rodrigues, F.A., Travieso, G., Villas Boas, P.R.: Characterization of complex networks: A survey of measurements (2006) 11. Albert, R., Barabasi, A.-L.: Statistical mechanics of complex networks. Reviews of Modern Physics 74 (2002) 12. Mahdi, K., Safar, M., Sorkhoh, I.: Entropy of robust social networks. In: IADIS International Conference e-Society, Algarve, Portugal (2008) 13. Mahdi, K.A., Safar, M., Sorkhoh, I., Kassem, A.: Cycle-based versus degree-based classification of social networks. Journal of Digital Information Management 7(6) (2009) 14. Scott, J.: Social Network Analysis: A Handbook. Sage Publication Ltd., Thousand Oaks (2000) 15. Mahdi, K., Farahat, H., Safar, M.: Temporal Evolution of Social Networks in Paltalk. In: Proceedings of the 10th International Conference on Information Integration and Webbased Applications & Services, iiWAS (2008) 16. Marinari, E., Semerjian, G.: On the number of circuits in random graphs. Journal of Statistical Mechanics: Theory and Experiment (2006) 17. Tarjan, R.: Enumaration of the Elementary Circuits of a Directed Graph, Technical Report: TR72-145, Cornell University Ithaca, NY, USA (1972) 18. Liu, H., Wang, J.: A new way to enumerate cycles in a graph. In: International Conference on Internet and Web Applications and Services (2006) 19. Safar, M., Alenzi, K., Albehairy, S.: Counting cycles in an undirected graph using DFSXOR algorithm, Network Digital Technologies. In: First International Conference on, NDT 2009 (2009) 20. Safar, M., Mahdi, K., Sorkhoh, I.: Maximum entropy of fully connected social network. In: The International Conference on Web Communities (2008) 21. Halfhill, T.R.: Parallel Processing With Cuda Nvidia’s High-Performance Computing Platform Uses Massive Multithreading ,Microprocessors Report (January 2008), http://www.MPROnline.com 22. Che, S., Boyer, M., Meng, J., Tarjan, D., Sheaffer, J.W., Skadron, K.: A Performance Study of General-Purpose Applications on Graphics Processors Using CUDA. The Journal of Parallel and Distributed Computing 23. Harish, P., Narayanan, P.J.: Accelerating large graph algorithms on the GPU using CUDA. In: Aluru, S., Parashar, M., Badrinath, R., Prasanna, V.K. (eds.) HiPC 2007. LNCS, vol. 4873, pp. 197–208. Springer, Heidelberg (2007)
Survey of MPI Implementations Mehnaz Hafeez2, Sajjad Asghar1, Usman Ahmad Malik1, Adeel ur Rehman1, and Naveed Riaz2 1
National Centre for Physics, QAU Campus, 45320 Islamabad, Pakistan 2 Shaheed Zulfikar Ali Bhutto Institute of Science and Technology (SZABIST), H-8/4 Islamabad, Pakistan [email protected], {Sajjad.Asghar,Usman.Malik,Adeel.Rehman}@ncp.edu.pk, [email protected]
Abstract. High Performance Computing (HPC) provides support to run advanced application programs efficiently. Message Passing Interface (MPI) is a de-facto standard for providing an HPC environment in clusters connected over fast interconnects and gigabit LANs. The MPI standard itself is architecture neutral and programming language independent. C++ is a widely accepted choice for implementing the MPI specifications, as in MPICH and LAM/MPI. Apart from C++, other efforts have been carried out to implement the MPI specifications using programming languages such as Java, Python and C#. Moreover, MPI implementations for different network layouts such as Grid and peer-to-peer exist as well. With so many implementations providing a wide range of functionalities, programmers and users find it difficult to choose the best option to address a specific problem. This paper provides an in-depth survey of available MPI implementations in different languages and for a variety of network layouts. Several assessment parameters are identified to analyze the MPI implementations along with their strengths and weaknesses. Keywords: Java, C/C++, Python, C#, Grid, peer-to-peer, message passing, MPI.
1 Introduction
Nowadays, there is a persistent demand for greater computational power to process large amounts of data. High Performance Computing (HPC) makes previously unachievable calculations possible. Today, modern computer architectures rely more and more upon hardware-level parallelism. They attain computing performance through the realization of multiple execution units, pipelined instructions [1] and multiple CPU cores [2]. The largest and fastest computers use both shared and distributed memory architectures. Contemporary trends show that hybrid memory architectures will continue to prevail [3]. There are two popular message passing approaches: Parallel Virtual Machine (PVM) and MPI. MPI is the leading model used in HPC at present. It supports both
point-to-point and collective communication. MPI aims at high performance, scalability, and portability. Interoperability of objects defined in MPI facilitates mixed-language message passing programming. It supports the Single Program Multiple Data (SPMD) model of parallel computing. In the message-passing model, each task uses its own local memory during computation. Multiple tasks can reside on the same physical machine as well as across any number of machines. For data transfer, cooperative operations are required to be performed by each process; specifically, a send operation must have a matching receive operation. The communication modes offered by MPI are blocking, non-blocking, buffered, and synchronous. The MPI specifications [9] provide guidelines for bindings of different languages such as C and FORTRAN. The guidelines for C++ are included in MPI-2 [10]. Besides, MPJ [11] provides guidelines of the MPI specifications for Java. It should be noted that there are no official bindings of the MPI specifications available other than C/C++. Today, peer-to-peer and Grids are dominant models for providing computing over intranets and the Internet. There are several MPI implementations written for Grid and peer-to-peer computing models, and various shortcomings have been identified. For example, the dynamic nature of the Grid [27], with its resource discovery and management, is not compatible with the static host configuration file model of MPI. This paper presents a literature review of existing MPI implementations in C/C++, Java, Python and C#. Moreover, MPI models for peer-to-peer and Grids are also included. The remaining part of the paper is organized as follows: section 2 covers the literature review, section 3 gives a critical analysis of the MPI models, and concluding remarks are covered in section 4.
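As a brief illustration of the model described above (SPMD execution, a matched point-to-point send/receive pair, and a collective broadcast), the following minimal C program uses the standard MPI API. It is a generic example and is not tied to any particular implementation surveyed below.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, token = 0;

    MPI_Init(&argc, &argv);                  /* every process runs the same program (SPMD) */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        token = 42;
        /* blocking send: must be matched by a receive on rank 1 */
        if (size > 1)
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    /* collective communication: rank 0 broadcasts the token to all processes */
    MPI_Bcast(&token, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("process %d of %d has token %d\n", rank, size, token);

    MPI_Finalize();
    return 0;
}

With an implementation such as MPICH or OpenMPI, a program like this is typically compiled with mpicc and launched with mpiexec -n <number of processes>.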
2 Literature Review 2.1 MPI Models in C/C++ Although the MPI standard is not constraining an implementation in a certain language yet most of the MPI implementations are written in C/C++ and FORTRAN. Along with free C/C++ implementations of MPI there are several commercial implementations provided by Microsoft, HP and Intel Corporation. In this section a survey of different C/C++ implementations of MPI is presented. LAM [7] is a programming environment and a development system for message passing multicomputer based on UNIX network. The three building blocks of LAM multicomputer are physical computers, processes on these computers, and messages transmitted between processes/computers. It has a modular micro-kernel that performs many important tasks. Moreover it provides a complete API for explicit message passing. The modular structure along with explicit node based communication makes it a suitable choice for building higher-level tools and parallel programming paradigms. It also provides a message buffering system and is compliant with most of the features specified by the MPI standard. MPICH [31] is a MPI implementation integrating portability with high performance. MPICH represents a sign of adaptability with respect to its environment, thereby, implying the portability characteristic at the cost of minimum efficiency loss. MPICH is implemented as a complete implementation of MPI specifications. It offers
a range of diversified communication styles including Scalable Coherent Interface (SCI) [30], TCP, and shared memory. Several enhancements and performance improvements became part of the implementation in MPICH2 [37]. OpenMPI [47] is an open-source implementation of MPI. It enables high performance parallel computing for homogenous and heterogeneous platforms. The two components of OpenMPI, Open Run-Time Environment (OpenRTE) and the communication library provide efficient communications and transparent execution of applications. The communication protocols supported by OpenMPI include shared memory, Myrinet [28], InfiniBand [29], TCP/IP, and Portals. The architecture and network properties are determined for each participating node and most efficient communication mode is chosen to achieve maximum performance. MS MPI [34] is the Microsoft implementation of MPI that is based on MPICH2. MS MPI can be used to easily port the existing code based on MPICH2. MS MPI also provides security based on Active Directory (AD) services and binary compatibility across different types of interconnectivity options. The scheduler works on First Come First Serve (FCFS) basis with back fill capabilities. The cluster administrator can specify the resource allocation policies. Thus preventing unauthorized jobs from accessing restricted nodes. The scheduler also provides fail-safe execution. 2.2 MPI Models in Java Java requires native marshalling of primitive data types at MPI communication level. This slows down the communication speed. Nevertheless Java adaptation in HPC has become an apparent choice for new endeavors, because of its obvious features like; object-oriented, portability, memory management, built-in communication libraries, support for multi-threads, platform independent, secure, and an extensive set of Application Programming Interfaces (APIs). Therefore many attempts have been made to provide MPI implementations for Java. JavaMPI [16] ensures portability across different platforms and full compatibility with MPI specifications. It supports distributed memory model for process communication. For distributed memory it provides an automatic binding of native MPI using Java to C Interface (JCI). JavaMPI uses Java Native Interface (JNI) that is responsible for format conversion. Native binding introduces portability issues as it contradicts the Java philosophy of compile once, run everywhere. MPIJ [17] is a Java based implementation. It runs as a part of the Distributed Object Group Meta computing Architecture (DOGMA) system. MPIJ includes pointto-point communication, groups, intra-communicator operations, and user-defined reduction operations. MPIJ communication uses native marshalling for primitive Java types. MPIJ can function on applet-based nodes. This provides flexibility in creating clusters without installing any system or user software related to the message-passing environment on the client nodes. MPJava [18] is a pure Java message-passing framework. It provides communication through Java’s new I/O APIs (java.nio). It is the first implementation based on java.nio. MPJava requires bootstrap process to read a list of hostnames and do the necessary remote logins to each machine. The outcome is that each process gets connected to every other process. The nodes for point-to-point as well as collective communications use these connections. It offers scalability and high
performance communications. The performance of MPJava communication is comparable to native MPI codes. Jcluster [19] is a pure Java implementation for heterogeneous clusters. It provides both PVM like and MPI like APIs. It uses asynchronous multithreaded transmission based on UDP protocol. Jcluster has implemented dynamic load balancing to reduce the idle time of resources and to use the bandwidth in an efficient way. Jcluster achieves larger bandwidths for larger message sizes. Parallel Java (PJ) [20] is a unified shared memory and message passing parallel programming library written in Java. PJ provides it own APIs for programmers to write parallel programs for SMP machines, clusters, and hybrid SMP clusters. PJ has its own middleware for managing job queues and submitting processes on cluster machines. mpiJava [21] follows the C++ binding of MPI-2 specifications. It uses wrapper classes to invoke native MPI implementation through JNI. It supports high-speed networks including Myrinet, InfiniBand and SCI. It provides a set of APIs for programmers to write MPI like code using Java. MPJ Express [23] is a thread-safe library based on mpiJava 1.2 API specifications. Two communication devices are implemented; niodev, based on the java.nio and mxdev, based on the Myrinet eXpress (MX) library. MPJ Express provides an experimental runtime, which allows portable bootstrapping of Java Virtual Machines (JVM) across a cluster or network of computers. MPJ Express proposes pluggable architecture of communication devices. MPJ/Ibis [24] is the first available implementation of Java Grande Forum (JGF) [32] MPJ to use high-speed networks using some native code and some optimization techniques for performance. MPJ/Ibis library provides both pure Java communication and Myrinet native communication. Pure Java communication is offered through Java I/O and java.nio. MPJ/Ibis provides competitive performance with portability ranging from high-performance clusters to grids. Current Java runtime environments are heavily biased to either portability or performance; the Ibis [12] strategy achieves both goals simultaneously. Java Message Passing Interface (JMPI) [25] is a pure Java implementation of message passing library, which complies with MPJ specifications. It provides a graphical user interface and tools to setup the computing environment. It is based on two typical distributed system communication mechanisms that are sockets and RMI. Fast MPJ (F-MPJ) [26] proposes a scalable and efficient Message-Passing middleware in Java for parallel computing. F-MPJ provides non-blocking communication, based on Java I/O sockets. Java Fast Sockets (JFS) implementation in F-MPJ provides shared memory and high-speed networks support. A-JUMP framework [33] is based on MPI specifications for Java. A-JUMP can easily employ different languages, architectures and protocols for writing parallel applications. The backbone of A-JUMP is HPC bus, which provides asynchronous mode of communication. It is responsible for inter-process communication and code distribution over the cluster. In addition it provides loosely coupled relationship between communication libraries and the program implementation. A-JUMP offers dynamic management of computing resources as well. Though the multilingual support is included in the goals of A-JUMP but it is not the part of the current AJUMP implementation.
There are several other Java message passing implementations. These are CCJ [13], Java Object Passing Interface (JOPI) [14] and JavaWMPI [15]. CCJ has not followed MPI syntax and its communication is based on Java's object-oriented framework. JOPI is a pure Java implementation for parallel programming based on an object-passing model for distributed memory systems. JavaWMPI has used wrapper approach built on Windows-based implementation of MPI (WMPI). 2.3 MPI Models in Python Python is a popular tool for developing scientific applications. It is a viable option for writing MPI implementations as well. Some accepted implementations in Python are presented in this section. MYMPI [4] is a Python based implementation of MPI standard covering a subset of MPI specifications. It is a partial implementation containing only twenty five (25) most important routines of the library. It can be used with a standard Python interpreter. It mostly follows the C and Fortran MPI syntax and semantics, thus providing better integration of Python programs with these languages. MYMPI provides better control to programmers for writing parallel applications. MYMPI has been used under various operating system platforms with number of different MPI libraries including MPICH Ethernet and Myrinet communication libraries used under SuSie. pyMPI [5] is a near complete implementation of MPI specifications in Python containing almost all constructs of the MPI library. pyMPI is a module that also provides a custom version of Python interpreter. It follows the Single Program Multiple Data (SPMD) style parallelism and can be used interactively. It offers a comprehensive support to MPI features including basic broadcast and barrier, reductions, blocking and non-blocking send and recv, gather and scatter, and communicators ensuring collective as well as point-to-point communications. The customized interpreter itself runs as a parallel application. The custom Python interpreter calls the MPI_INIT when it starts taking away the control from the programmers. MPI for Python [6] is an open source, object-oriented Python package providing MPI bindings built on top of MPI-1 and MPI-2 specifications. The initial release was built on top of MPI-1 specifications with a focus on translating MPI-2 C++ bindings to Python. The latest release complies with almost complete MPI-2 specifications introducing object serialization, direct communication of memory buffers, nonblocking and persistent communications, dynamic process management, one-sided communications, extended collective operations, and parallel I/O. MPI for Python is capable of communicating any Python object exposing a memory buffer along with general Python objects. 2.4 MPI Models in C# The initial release of MPI.Net [35] has two libraries for C# and MPI.NET. The C# binding is a low-level interface following the C++ bindings of MPI-2 specifications and MPI.Net is a high level interface influenced by Object Oriented MPI (OOMPI)
C++ library. MPI.Net supports other languages as well that execute under Common Language Infrastructure (CLI) as implemented in Microsoft.Net framework. The improved implementation of MPI.Net [36] also provides a C# interface to MPI. It is built on top of C MPI interface. The design goals include abstraction for programmers as well as performance efficiency. Primitive types are sent directly using corresponding MPI types, whereas user-defined types and objects are serialized. Performance results demonstrate that the abstraction penalty is very small but serialization reduces performance considerably. The improved MPI.Net [36] provides a uniform interface with better optimizations as compared to the earlier release [35]. 2.5 MPI Models for Grid Computing MPI programming on grids may require several communication paradigms. The current grid applications are tightly coupled with their communication layer implementations. Furthermore these applications have to deal with multiple communication interfaces, low-level protocols, data encodings, data compressions and quality of service in order to achieve acceptable performance. Therefore to design MPI implementations for Grids is a complex task. P2P-MPI [22] has been designed to ease parallel programs deployment on grids. It allows an automatic discovery of resources, a message passing programming model, a fault-detection system and replication mechanisms. Performance benchmarks of P2PMPI demonstrate slower performance as compared to other MPI implementations. MPICH-G [38] is an implementation of the MPI that uses services provided by the Globus toolkit [43] to allow the use of MPI in wide area environments. MPICH-G masks details of underlying networks and computer architectures so that diverse distributed resources can appear as a single “MPI_COMM_WORLD". MPICH-G2 [39] is a redesigned implementation of MPICH-G. Unlike MPICH-G, MPICH-G2 does not use Nexus for MPI communication. Instead it uses TCP for inter-machine communication and vendor supplied MPI for inter-machine messaging. MagPIe [40] library is based on MPICH. MagPIe implements the complete set of collective operations of MPI. With MagPIe, MPI programs can use collective communication operations efficiently and transparently without changing the application code. G-JavaMPI [41] middleware supports MPI-style inter-process communication through MPICH-G2 libraries [39]. This middleware also supports security-enhanced Java process migration of Java programs in the Grid environment using Globus Security Infrastructure (GSI) [42] provided by Globus toolkit [43]. The middleware requires no modifications of underlying OS, Java Virtual Machine, and MPI. Therefore, the parallel Java processes can be migrated freely and transparently between different sites in the Grid. PACX-MPI [44] is another Grid enabled MPI implementation. It is based on MPI1.2 standard. PACX-MPI is well suited for heterogeneous clusters. Moreover it provides support for optimized collective operations, optimizations for the handling of derived data types, encryption of external communication and data compression to reduce the size of data transferred between the machines. MPICH/Madeleine III [45] is MPI implementation designed to support natively setups of multiple heterogeneous intra-clusters by interfacing it with the available
multi-protocol communication library called Madeleine. This approach makes it possible to reuse readily available software components. GridMPI [46] supports heterogeneous clusters. It is implemented on the MPI-1.2 specifications to provide a message-passing facility over the Grid. For interconnected-cluster communication it uses the Interoperable MPI (IMPI) protocol. IMPI defines message formats and protocols to enable interoperability among different MPI implementations.
3 Comparative Analysis
This section provides a comparative analysis and critical evaluation of the models discussed in the literature review, according to their classifications. The reviewed MPI implementations have their unique features, strengths, and weaknesses. To make the comparative study tangible, the following assessment parameters have been identified:
• Resource Environment (homogeneous/heterogeneous)
• Communication Type (Blocking, Non-blocking, Synchronous, Asynchronous)
• Memory Model (Shared, Distributed, Hybrid)
• High-Speed Network Support
Resource environment provides flexibility on hardware as well as software resources, which include the hardware architecture, API libraries, and the resource operating system. For C/C++ and C# implementations it is hard to achieve heterogeneity at the operating system level, whereas this is not a limitation for Java and Python based models. Communication type is an important criterion for HPC frameworks. In any case, the frameworks should provide synchronous, blocking, and non-blocking communication. Nowadays an asynchronous mode of communication in an HPC framework is desirable, as it increases message delivery reliability even if the recipient is not available at the time the message is sent. There are two major classifications of memory models: shared memory and distributed memory. The hybrid approach clearly provides scalability and performance for clusters and multi-core systems. Communication speed itself is very important for meeting the required performance over the network. Most of the MPI implementations support high-speed networks. It is observed that Java based implementations rely on native code to provide high-speed network support.
3.1 C/C++ Models
MPICH is a complete implementation of MPI and one of the most widely used and earliest. All three of LAM, MPICH, and OpenMPI support a heterogeneous resource environment at the level of hardware alone; they lack heterogeneity when it comes to a cross-platform environment. This inherited weakness comes from C/C++, which itself requires recompilation whenever it is ported to a different operating system. Both MPICH and OpenMPI support a variety of different communication protocols. LAM has a modular structure and requires a fully connected network at initialization time, whereas MPICH makes connections on demand. In OpenMPI
alone, the communication mode is chosen after determining the architectural and network properties of the nodes in order to enhance communication performance. MPICH involves heavy communication overheads for message passing, as messages have to be copied many times and the kernel is involved in the transfer of the messages. The improvements in MPICH2 enhance the performance and add some level of scalability. The summary of the comparative analysis of the C/C++ models is presented in Table 1.
Table 1. Comparison of C/C++ Models
MPI Implementation | Resource Environment | Comm. Type | Memory Model | High-Speed Network(s) Support
LAM [7] | Homogeneous | Blocking, Non-blocking | Distributed | Yes
MPICH [31] | Homogeneous | Blocking, Non-blocking | Distributed | Yes
OpenMPI [47] | Homogeneous | Blocking, Non-blocking | Shared, Distributed | Yes
MS MPI [34] | Homogeneous | Blocking, Non-blocking | Shared, Distributed | Yes
3.2 Java Models
Most of the implementations of Java message passing libraries have their own MPI-like bindings for the Java language. The Java message passing libraries are implemented using either Java RMI, Java sockets, or by wrapping an underlying native messaging library like MPI through the Java Native Interface (JNI). Java RMI ensures portability, but at the same time it is not an efficient solution for high-speed networks. The use of JNI has portability problems. Java sockets and RMI both offer efficient communication for message passing libraries, though Java sockets require considerable programming effort. For comparing Java based models two more assessment parameters are included. These are:
• Implementation Style (pure Java, Java wrapper)
• Communication Library (Native MPI, Java I/O, Java new I/O, RMI, Sockets)
Implementation style plays an important role. If an implementation is written in pure Java it assures portability. Communication libraries must be flexible and should support different implementations to assure interoperability. In addition communication layer must be independent from the underlying hardware. It is observed that JavaMPI is relating to mpiJava on the basis of its implementation style, which is Java wrapper, communication library which is native MPI, and distributed memory model. Furthermore, mpiJava uses the native code to support high-speed networks. On the basis of high-speed networks when we try to relate mpiJava with other available implementations; MPJ Express, MPJ/Ibis, JMPI and F-MPJ stand out. We observe that MPJ Express, MPJ/Ibis, JMPI and F-MPJ are relating to each other on
the basis of pure Java implementation style as well. At the same time Table 2 shows that F-MPJ uses JFS implementation for fast networks rather than using native code. Therefore, MPJ Express, MPJ/Ibis and JMPI do not adhere to the implementation style of pure Java. Moreover, F-MPJ is relating to Parallel Java and Jcluster on the basis of memory model. All these implementations support distributed as well as shared memory models. These observations make the F-MPJ and mpiJava prominent on the basis of their claimed functionalities and features. On the other hand literature review shows that MPJ Express is among the most widely adopted implementation in Java [8]. It is worth noting that over the time, the way communication libraries are either used or developed for different implementations show continuous progress towards pure Java implementation. Table 2 provides the comparison between the various implementations. JavaMPI is not a pure Java implementation like mpiJava and unlike other models presented in literature review. It is based on native communication libraries. Like MPIJ and MPJava, it does not offer support for high-speed networks. Both Jcluster and Parallel Java can support shared and distributed memory models however JavaMPI only supports distributed memory model. Communication in P2P-MPI is based on Java I/O and Java new I/O; on the other hand communication libraries for JavaMPI are native. The communication performance of both the models is comparable. MPIJ provides pure Java communication based on Java I/O whereas MPJ Express can support Java I/O and native libraries for high-speed networks. When MPIJ is compared to MPJ/Ibis both support distributed memory model but the communication libraries of MPJ/Ibis are based on Ibis framework. The modes of communication for all implementations are blocking, and non-blocking, unlike MPJava and Jcluster. Jcluster is the only model that provides asynchronous mode of communication using UDP protocol. In addition, MPJava is another pure Java implementation for homogenous resources that supports only blocking mode of communication. P2P-MPI and MPJ/Ibis are the only two models in the reviewed literature that can support Grid. JMPI has a unique support for communication layer amongst other Java based implementations as it is based on Java I/O, RMI and Sockets. F-MPJ uses JFS for high-speed network support unlike mpiJava, MPJ Express and MPJ/Ibis that use native libraries. However mpiJava and F-MPJ are the only implementations that provide the high-speed network support for Myrinet, Infiniband and SCI. A-JUMP is purely written in Java. Like Jcluster, A-JUMP also supports heterogeneous clusters and is highly scalable like MPJ/Ibis. Unlike JavaMPI and mpiJava, A-JUMP does not use any native code for different APIs. The communication layer of A-JUMP is based JMS and therefore it is independent of application code. The backbone of A-JUMP is HPC bus that makes it unique amongst other MPI implementations. HPC bus of A-JUMP provides loosely coupled relation between communication libraries and program implementation. A-JUMP also supports pure asynchronous mode of communication.
Table 2. Comparison of Java Models
MPI Implementation | Implementation Style | Resource Environment | Memory Model | Comm. Library | Comm. Type | High-Speed Network(s) Support
JavaMPI [16] | Java Wrapper | Homogeneous | Distributed | Native MPI | Blocking, Non-blocking | No
MPIJ [17] | Pure Java | Heterogeneous | Distributed | Java I/O | Blocking, Non-blocking | No
MPJava [18] | Pure Java | Homogeneous | Distributed | Java new I/O | Blocking | No
Jcluster [19] | Pure Java | Heterogeneous | Shared, Distributed | Java I/O | Asynchronous | No
Parallel Java [20] | Pure Java | Heterogeneous | Shared, Distributed | Java I/O | Blocking, Non-blocking | No
mpiJava [21] | Java Wrapper | Homogeneous | Distributed | Native MPI | Blocking, Non-blocking | Yes
P2P-MPI [22] | Pure Java | Heterogeneous | Distributed | Java I/O, Java new I/O | Blocking, Non-blocking | No
MPJ Express [23] | Pure Java | Heterogeneous | Distributed | Java new I/O | Blocking, Non-blocking | Yes
MPJ/Ibis [24] | Pure Java | Homogeneous | Distributed | Java I/O | Blocking, Non-blocking | Yes
JMPI [25] | Pure Java | Homogeneous | Distributed | Java I/O, RMI, Sockets | Blocking, Non-blocking | No
F-MPJ [26] | Pure Java | Homogeneous | Shared, Distributed, Hybrid | Java I/O, JFS | Blocking, Non-blocking | Yes
A-JUMP [33] | Pure Java | Heterogeneous | Distributed | JMS | Blocking, Non-blocking, Pure Asynchronous | No
3.3 Python Models A comparative analysis of Python models is given in Table 3. Although pyMPI is a complete implementation of MPI, it deviates from the standard MPI model and does not allow explicit calls to MPI_INIT. The Python interpreter itself runs as a parallel application and calls MPI_INIT at the start. On the other hand MYMPI and MPI for Python implement only a subset of routines in the MPI library, yet MYMPI provides freedom to programmers on calling the MPI_INIT anywhere in their programs. pyMPI provides a bulky Python module that takes time to compile and comes with a custom Python interpreter, whereas MYMPI runs with any standard Python interpreter and provides a light-weight module which compiles very fast. pyMPI has a more object oriented flavor and can be run interactively whereas MYMPI and MPI for Python follow the syntax and semantics of MPI more closely, though MPI for Python does not support Fortran libraries yet. Several real world applications have been built using MYMPI and MPI for Python.
All three implementations MYMPI, pyMPI, and MPI for Python have communication performance degradation issues, especially when communicating serialized objects. The communication performance of pyMPI can be twice as slow when compared to a C MPI implementation in some cases.
Table 3. Comparison of Python Models
MPI Implementation | Resource Environment | Memory Model | Comm. Type | High-Speed Network(s) Support
MYMPI [4] | Heterogeneous | Distributed | Blocking, Non-blocking | Yes
pyMPI [5] | Heterogeneous | Distributed | Blocking, Non-blocking | No
MPI for Python [6] | Heterogeneous | Distributed | Blocking, Non-blocking | No
3.4 C# Models
MPI.Net provides abstraction for programmers and also supports multiple programming languages that execute under the CLI within the Microsoft .Net framework. No other framework provides interoperability between different programming languages. Support for multilingual bindings for A-JUMP is in its initial development phase. MPI.Net lacks support for heterogeneous resource environments, and its communication performance is at its worst when communicating serialized objects.
3.5 Grid Models
Both MPICH-G2 and PACX-MPI support heterogeneous computing environments, although the design considerations of the MPI implementations are different: MPICH offers a flexible communication method, MPICH-G2 supports the Grid environment, and PACX-MPI provides extensibility for future computer architectures. MPICH-G2 uses Globus services, whereas A-JUMP does not depend on any third-party middleware and/or software. A-JUMP supports a loosely coupled, component-based architecture; its communication layer is independent of the application code. MagPIe and G-JavaMPI are based on MPICH and MPICH-G2, respectively, and therefore both implementations inherit the shortcomings of the MPICH and MPICH-G2 libraries. Unlike MagPIe and G-JavaMPI, A-JUMP has support for multiple vendor-provided implementations, as the HPC bus of A-JUMP is based on the JMS set of standards. PACX-MPI and MPI/Madeleine III both have scalability limitations, whereas scalability is one of the important features of A-JUMP. GridMPI is based on the MPI-1.2 specifications, while A-JUMP follows the MPJ standards. For interconnected-cluster message passing, GridMPI uses IMPI, whereas A-JUMP message passing is based on ActiveMQ, which follows the JMS 1.1 specifications. Therefore A-JUMP provides heterogeneity at the protocol as well as the operating system level, which makes it distinct amongst other MPI-like frameworks. A comparative analysis of all of the above-mentioned implementations is summarized in Table 4.
Table 4. Comparison of Grid Models
MPI Implementation | Implementation Style | Resource Environment | Comm. Type | Comm. Library
P2P-MPI [22] | Pure Java | Heterogeneous | Blocking, Non-blocking | Java I/O
MPICH-G2 [39] | C/C++ | Heterogeneous | Blocking, Non-blocking | MPI
MagPIe [40] | C/C++ | Heterogeneous | Blocking, Non-blocking | MPI
G-JavaMPI [41] | Java/C/C++ | Heterogeneous | Blocking, Non-blocking | MPI
PACX-MPI [44] | C/C++ | Heterogeneous | Blocking, Non-blocking | MPI
MPICH/Madeleine III [45] | C/C++ | Heterogeneous | Blocking, Non-blocking | MPI
GridMPI [46] | C/C++ | Heterogeneous | Blocking, Non-blocking | IMPI
A-JUMP [33] | Pure Java | Heterogeneous | Blocking, Non-blocking, Pure Asynchronous | JMS
4 Conclusion
In this paper we have studied the existing implementations of MPI for different platforms, written in different programming languages. Assessment parameters were identified to make a critical analysis and comparative study. The most popular and widely adopted implementations are written in C/C++, as they suit a wide range of scientific and research communities for enabling parallel applications; however, they lack support for heterogeneous operating systems in an integrated environment. It has been observed that most of the Java projects are in their investigational phase and are currently not in extensive use. On the other hand, all these projects show that the performance of Java based MPI implementations is improving. Though there are few MPI implementations in Python, all of them are being utilized in specific projects and have communication performance issues. For future implementations Java remains an obvious choice for developing parallel computing applications for multi-core hardware, mainly because of its diversity and features. MPI.Net is the only implementation other than A-JUMP that provides interoperability between different programming languages within the Microsoft .Net framework. The study of different grid implementations clearly shows that MPI over the Internet is a challenge because of its volume and complexity. Programming for interconnected cluster applications often requires the use of several communication paradigms, which increases the programming effort for code writing. A-JUMP is the only implementation that supports multiple protocols for message passing using JMS. The survey shows that features such as multiple protocol support, which offers transparency of code execution over different types of networks such as LAN and WAN, are scarce among the implementations. High-speed network support and an asynchronous mode of communication are also not supported by most of the implementations.
On the basis of the survey, we conclude that a message-passing framework should support data distribution for parallel applications. Moreover, portability, performance, transparency of execution, and an asynchronous mode of communication in heterogeneous environments should also be provided. To achieve the above-mentioned goals, a set of easy-to-use APIs should be part of the framework. Future message passing implementations for Java should also incorporate support for multiple communication protocols, multiple programming languages, and inter-cluster communication. This will enable the HPC community to get the benefits offered by current software and hardware technologies.
References 1. Shen, J.P., Lipasti, M.H.: Modern processor design: fundamentals of superscalar processors, 1st edn., p. 656. McGraw-Hill, New York (2005) 2. Dongarra, J., Gannon, D., Fox, G., Kennedy, K.: The Impact of Multicore on Computational Science Software. CTWatch Quarterly (2007), http://www.ctwatch.org/quarterly/articles/2007/02/ the-impact-of-multicore-on-computational-science-software 3. Protić, J., Tomašević, M., Milutinović, V.: Distributed shared memory: concepts and systems, p. 375. Wiley-IEEE Computer Society Press, University of Belgrade, Serbia (1997) 4. Kaiser, T.H., Brieger, L., Healy, S.: MYMPI – MPI Programming in Python. In: Proceedings of the International Conference on Parallel and Distributed Processing Techniques, USA (June 2006) 5. Miller, P.: pyMPI – An Introduction to parallel Python using MPI, UCRL-WEB-150152 (September 2002) 6. Dalcin, L., Paz, R., Storti, M., Elia, J.D.: MPI for Python: Performance improvements and MPI-2 extensions. J. Parallel Distrib. Comput. 68, 655–662 (2008) 7. Burns, G., Daoud, R., Vaigl, J.: LAM: An Open Cluster Environment for MPI. Ohio Supercomputer Centre, Columbus, Ohio (1990) 8. Taboada, G.L., Tourino, J., Doallo, R.: Java for High Performance Computing: Assessment of current research & practice. In: Proceedings of 7th International Conference on Principles and Practice or Programming in Java, Calgary, Canada, pp. 30–39 (2009) 9. MPI: A Message Passing Interface Standard. Message Passing Interface Forum, http://www.mpi-forum.org/docs/mpi-11-html/mpi-report.html 10. Geist, A., et al.: MPI-2: Extending the Message-Passing Interface. In: Fraigniaud, P., Mignotte, A., Bougé, L., Robert, Y. (eds.) Euro-Par 1996. LNCS, vol. 1123, Springer, Heidelberg (1996) 11. Carpenter, B.: MPJ specification (mpijava 1.2 : API Specification) homepage on HPJAVA, http://www.hpjava.org/reports/mpiJava-spec/ mpiJava-spec/mpiJava-spec.html 12. Nieuwpoort, R.V., et al.: Ibis: an Efficient Java based Grid Programming Environment. Concurrency and Computation: Practice and Experience 17(7-8), 1079–1107 (2005) 13. Nelisse, A., Maassen, J., Kielmann, T., Bal, H.: CCJ: Object-Based Message Passing and Collective Communication in Java. Concurrency and Computation: Practice & Experience 15(3-5), 341–369 (2003) 14. Al-Jaroodi, J., Mohamed, N., Jiang, H., Swanson, D.: JOPI: a Java object-passing interface: Research Articles. Concurrency and Computation: Practice & Experience 17(7-8), 775–795 (2005)
15. Martins, P., Moura Silva, L., Gabriel Silva, J.: A Java Interface for WMPI. In: Alexandrov, V.N., Dongarra, J. (eds.) PVM/MPI 1998. LNCS, vol. 1497, pp. 121–128. Springer, Heidelberg (1998) 16. Mintchev, S., Getov, V.: Towards Portable Message Passing in Java: Binding MPI. In: Bubak, M., Waśniewski, J., Dongarra, J. (eds.) PVM/MPI 1997. LNCS, vol. 1332, pp. 135–142. Springer, Heidelberg (1997) 17. Judd, G., Clement, M., Snell, Q.: DOGMA: Distributed Object Group Metacomputing Architecture. Concurrency and Computation: Practice and Experience 10(11-13), 977–983 (1998) 18. Pugh, B., Spacco, J.: MPJava: High-Performance Message Passing in Java using Java. In: Rauchwerger, L. (ed.) LCPC 2003. LNCS, vol. 2958, pp. 323–339. Springer, Heidelberg (2004) 19. Zhang, B.Y., Yang, G.W., Zheng, W.M.: Jcluster: an Efficient Java Parallel Environment on a Large-scale Heterogeneous Cluster. Concurrency and Computation: Practice and Experience 18(12), 1541–1557 (2006) 20. Kaminsky, A.: Parallel Java: A Unified API for Shared Memory and Cluster Parallel Programming in 100% Java. In: Proceedings of 9th International Workshop on Java and Components for Parallelism. Distribution and Concurrency, p. 196a (8 pages) (2007) 21. Baker, M., Carpenter, B., Fox, G., Ko, S., Lim, S.: mpi-Java: an Object-Oriented Java Interface to MPI: In 1st International Workshop on Java for Parallel and Distributed Computing, LNCS, vol. In: Rolim, J.D.P. (ed.) IPPS-WS 1999 and SPDP-WS 1999. LNCS, vol. 1586, pp. 748–762. Springer, Heidelberg (1999) 22. Genaud, S., Rattanapoka, C.: P2P-MPI: A Peer-to-Peer Framework for Robust Execution of Message Passing Parallel Programs. Journal of Grid Computing 5(1), 27–42 (2007) 23. Shafi, A., Carpenter, B., Baker, M.: Nested Parallelism for Multi-core HPC Systems using Java. Journal of Parallel and Distributed Computing 69(6), 532–545 (2009) 24. Bornemann, M., van Nieuwpoort, R.V., Kielmann, T.: MPJ/Ibis: A flexible and efficient message passing platform for java. In: Di Martino, B., Kranzlmüller, D., Dongarra, J. (eds.) EuroPVM/MPI 2005. LNCS, vol. 3666, pp. 217–224. Springer, Heidelberg (2005) 25. Bang, S., Ahn, J.: Implementation and Performance Evaluation of Socket and RMI based Java Message Passing Systems. In: Proceedings of 5th International Conference on Software Engineering Research, Management and Applications, Busan, Korea, pp. 153– 159 (2007) 26. Taboada, G.L., Tourino, J., Doallo, R.: F-MPJ: scalable Java message-passing communications on parallel systems. Journal of Supercomputing (2009) 27. Foster, I.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. In: Sakellariou, R., Keane, J.A., Gurd, J.R., Freeman, L. (eds.) Euro-Par 2001. LNCS, vol. 2150, pp. 1–4. Springer, Heidelberg (2001) 28. Myrinet webpage on MYRI, http://www.myri.com/myrinet/overview 29. Infiniband, http://www.infinibandta.org 30. Gustavson, D.B.: The Scalable Coherent Interface and Related Standards Projects. IEEE Micro. 18(12), 10–22 (1992) 31. Gropp, W., Lusk, E., Doss, N., Skjellum, A.: A high-performance, portable implementation of the MPI message passing interface standard. J. Parallel Computing 22, 789–828 (1996) 32. Java Grande Forum, http://www.javagrande.org 33. Asghar, S., Hafeez, M., Malik, U.A., Rehman, A., Riaz, N.: A-JUMP Architecture for Java Universal Message Passing. In: Proceedings of 8th International Conference on Frontiers of Information Technology, Islamabad, Pakistan (2010) 34. Microsoft MPI, http://msdn.microsoft.com/enus/library/bb524831v=vs.85.aspx
35. Willcock, J., Lumsdaine, A., Robison, A.: Using MPI with C# and the Common Language Infrastructure. Concurrency and Computation: Practice & Experience 17(7-8), 895–917 (2005) 36. Gregor, D., Lumsdaine, A.: Design and Implementation of a High-Performance MPI for C# and the Common Language Infrastructure. In: Proceedings of 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, UT, USA (2008) 37. MPICH2, http://www.mcs.anl.gov/research/projects/mpich2 38. MPICH-G, http://www3.niu.edu/mpi 39. Karonis, N., Toonen, B., Foster, I.: MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface. Journal of Parallel and Distributed Computing 63(5) (2003) 40. Kielmann, T., Hofman, R.F.H., Bal, H.E., Plaat, A., Bhoedjang, R.A.F.: MagPIe: MPI’s Collective Communication Operations for Clustered Wide Area Systems. ACM SIGPLAN Notices 34(08), 131–140 (1999) 41. Chen, L., Wang, C., Lau, F.C.M., Ma, R.K.K.A.: Grid Middleware for Distributed Java Computing with MPI Binding and Process Migration Supports. Journal of Computer Science and Technology 18(04), 505–514 (2003) 42. Welch, V., Siebenlist, F., Foster, I., Bresnahan, J., Czajkowski, K., Gawor, J., Kesselman, C., Meder, S., Pearlman, L., Tuecke, S.: Security for Grid Services. In: 12th International Symposium on High Performance Distributed Computing. IEEE Press, Washington (2003) 43. Globus Toolkit, http://www.globus.org/toolkit 44. Balkanski, D., Trams, M., Rehm, W.: Heterogeneous Computing With MPICH/Madeleine and PACX MPI: a Critical Comparison. Technische Universit, Chemnitz (2003) 45. Aumage, O.: MPI/Madeleine: Heterogeneous multi-cluster networking with the madeleine III communication library. In: 16th IEEE International Parallel and Distributed Processing Symposium, p. 85 (2002) 46. GridMPI, http://www.gridmpi.org/index.jsp 47. Graham, R.L., et al.: OpenMPI: A High-Performance, Heterogeneous MPI. In: Proceedings of IEEE International Conference on Cluster Computing, pp. 1–9 (2006)
Mathematical Model for Distributed Heap Memory Load Balancing Sami Serhan, Imad Salah, Heba Saadeh, and Hamed Abdel-Haq Computer Science Department [email protected], [email protected], [email protected], [email protected]
Abstract. In this paper, we introduce a new method to achieve a fair distribution of data among the distributed memories in a distributed system. Some processes consume a larger amount of heap space [1] in their local memory than others, so those processes may use virtual memory and may cause the thrashing problem [2]. At the same time, the remaining heap memories in the distributed system stay almost empty, without the best utilization of the heap memory of the whole distributed system. Therefore, a UDP-based process communication system is defined [3] to make use of remote heap memories. This is done by allocating dynamic data spaces for other processes on other machines. In fact, the increasing use of high-bandwidth and low-latency networks provides the possibility of using distributed memory as an alternative to disk. Keywords: Distributed Heap Memory, Interconnection Network, Processes Communication Protocol, Message Passing, Message Format, Heap Memory Management, Multicomputers, Load Balancing.
1 Introduction
Main memory is a core component of the computer system in which the program data and instructions are loaded to be ready for execution. A loaded program is called a process. When a CPU starts executing a process, it fetches its instructions and data from the main memory and starts executing them. In a distributed memory system (DMS) the memory is associated with individual processors, and a processor is only able to address its own memory. Some authors refer to this type of system as a multicomputer [4][9]. There are several benefits of this organization. 1) There is no bus or switch contention: each processor can utilize the full bandwidth to its own local memory without interference from other processors. 2) The lack of a common bus means there is no inherent limit to the number of processors; the size of the system is constrained only by the network used to connect processors to each other. 3) There are no cache coherency problems [10]: each processor is in charge of its own data, and it does not have to worry about putting copies in its own local cache and having another processor refer to them. The major drawback of the distributed memory design is that interprocessor
communication is more difficult. If a processor requires data from another processor's memory, it must exchange messages with the other processor [5]. This introduces two sources of overhead: it takes time to construct and send a message from one processor to another, and a receiving processor must be interrupted in order to deal with messages from other processors. On the other hand, distributed shared memory (DSM) machines [6][16] are more appropriate than message-based communication in distributed systems. DSM is a technique for making multicomputers easier to program by simulating shared memory on them [4], although direct communication between processors is much faster than DSM. Many programming languages support forms of DSM, such as Orca [7][9]. NUMA (Non-Uniform Memory Access) architectures represent a special type of DSM. A NUMA multiprocessor has a single virtual space that is visible to all CPUs [4]. Each CPU in the NUMA system is associated with a memory module that is directly connected to it, and the CPUs are connected by either a hierarchy of buses or a switching network [8][15]. The paper is composed of six sections. Section 2 presents some related works. Section 3 introduces the proposed technique for dealing with a distributed heap memory system. Section 4 illustrates the different message formats for the proposed process communication protocol. Section 5 presents the mathematical model of the proposed technique. Finally, conclusions are derived in Section 6.
2 Related Works
Different contributions have been made regarding DSM and process load balancing. A Genetic Software Engineering (GSE) methodology [11] was proposed to fully utilize the heterogeneous computing resources of distributed systems to meet heavy-duty computing needs without acquiring expensive supercomputers and servers. It relies on the fact that gigabit Ethernet or similar high-speed networks are almost equivalent to the system bus timings of modern desktop workstations. A new dynamic load balancing scheme called RAS [12] was introduced to deal simultaneously with three issues in multicomputer systems: the heterogeneity of the computers, appropriate handling of dynamic failures, and selecting a proper number of nodes from a node pool. RAS solves the load-balancing problem and dynamic failures by a work-stealing mechanism, and the processor selection problem by data distribution based on a reservation scheme. An algorithm has also been suggested for token-based adaptive load balancing of dynamically parallel computations on multicomputer platforms [13]. This load-balancing algorithm is initiated and performed by the idle or under-loaded processes and requires a token message circulating among the parallel processes, bearing information about the load distribution throughout the system. The following sections describe the proposed model for distributed heap memory load balancing.
3 The Proposed Model
The proposed model consists of multiple processes, each running on only one machine. The communication between processes is achieved by message passing
through an Ethernet network. The main point of this model is to exploit the heap memories of other processes in order to store some application data of processes on other machines, and also to create variables of dynamic types and perform read, write and remove operations remotely. There is no centralized coordinator that controls the behavior and communication among the distributed processes; the design works as a peer-to-peer approach [3][16], so no single point of failure exists. The proposed model works as follows:
1. Assume there are multiple processes (i.e. process_1, process_2, process_3, ..., process_n).
2. Suppose a process wants to reserve an array of m bytes on the memory of one of the other processes to store some of its application data.
3. If the available memory space >= (1/3) of the size of the local memory, the process creates the variable on its local heap memory. Otherwise, process n requests available space by sending a broadcast message to all processes in the system.
4. Each process responds to the request message by sending a unicast message to process n that contains the available space on its machine.
5. The initiator of the request (process n) chooses the most suitable process for creating its data (the one with the largest available memory space).
6. Process n sends a message to the remote process with the largest available memory, asking it to reserve a location on its local heap memory.
7. The remote process creates the variable in its local memory after reading the contents of the message sent by process n.
[Figure 1 diagram: Process 1 requests the reservation of a 1000-byte array; Processes 2, 3 and 4 respond with their available memory (e.g., 2000 and 3000 bytes). Legend: 1: broadcast (check for clients' heap space); 2: response with the process's available memory; 3: request to create data.]
Fig. 1. Process 1 asks for the most suitable space on remote process to create new data
In Figure 1, process 1 wants to reserve an array of 1000 bytes. It sends a broadcast message to the remote processes (process 2, process 3, and process 4); this is the first step. Processes 2, 3 and 4 each send a unicast message containing their available memory space; for example, process 2 sends a message to process 1 informing it that it has 900 bytes of space available on its heap memory, as in step 2. Process 1 waits for the responses from all the clients (i.e. clients 2, 3 and 4 send response messages containing 900, 2000 and 3000 bytes of available space respectively). Finally, process 1 holds a list containing the processes' IP addresses and their available memory; it chooses the remote process with the largest free memory and sends it a message requesting the reservation (an array of 1000 bytes). In the given example, process 4 has the largest free memory (3000 bytes of free space). Reading, writing and removing data remotely using this approach is straightforward; it is done only by sending requests and receiving responses. Note that shared data is not considered here (we only deal with process-private data), so the shared-memory synchronization problem is avoided. In other words, each remotely reserved data item is associated with only one process (none of the other processes tries to access or write to it). The problem of unequal heap memory allocation on different machines is thereby solved, as shown in Figure 2(a). Some processes may consume their whole heap memories while others may not, so the former may force needed pages to be replaced by new ones; as this behavior continues, CPU time is wasted in transferring memory pages instead of executing process instructions. This phenomenon is called thrashing. The fair distribution of data among the different heap memories is therefore the main goal of this paper, as shown in Figure 2(b). The proposed model relies heavily on the fact that gigabit Ethernet or similar high-speed networks are almost equivalent in speed to the system bus of modern desktop workstations [14][15].
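The request–response exchange described above can be summarized in a short sketch. The following Python fragment is only an illustrative outline of the broadcast/unicast pattern; the message layout, port number and timeout value are assumptions, not the authors' specification.

    import socket, time

    BROADCAST_ADDR = ("255.255.255.255", 50000)   # assumed port
    TIMEOUT_SECONDS = 1.0                          # assumed collection window

    def choose_target(requested_bytes):
        """Broadcast a CHECK_CREATE request and pick the peer with the most free heap."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(0.2)
        sock.sendto(("CHECK_CREATE %d" % requested_bytes).encode(), BROADCAST_ADDR)

        replies = []                               # (available_bytes, peer_address)
        deadline = time.time() + TIMEOUT_SECONDS
        while time.time() < deadline:
            try:
                data, peer = sock.recvfrom(1024)   # unicast replies: "AVAILABLE <bytes>"
            except socket.timeout:
                continue
            parts = data.decode().split()
            if parts and parts[0] == "AVAILABLE":
                replies.append((int(parts[1]), peer))

        if not replies:
            return None                            # no peer answered; allocate locally
        return max(replies)[1]                     # peer with the largest free heap

The caller would allocate locally when the function returns None or when the local free heap already satisfies the one-third threshold described in step 3.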
Fig. 2. (a) Unequal distribution of application data among distributed heap memories. (b) Fair distribution of data among them.
4 Message Format
There are two main types of messages defined in the proposed model, exchanged between remote clients in order to reserve data in one of the clients' memories. The first is a Request message; the second is a Response message. Note that the words written in italics take different values according to the message type and the information sent, whereas the capital-letter words are fixed and specify the communication protocol. 4.1 Request Message There are five types of request messages, each with a different format. 1.
Check_Create message (REQ-1): a broadcast message that requests the available space from the remote clients. The format of this message is CHECK_CREATE>
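Only the leading keyword of the REQ-1 format survives in this excerpt; the remaining message layouts are not reproduced here. Purely as an illustration, a request/response pair could be modelled as below. The field names (msg_type, size, available) are hypothetical and are not taken from the paper.

    # Hypothetical message layouts for the UDP protocol (illustration only).
    from dataclasses import dataclass

    @dataclass
    class CheckCreateRequest:          # REQ-1, broadcast
        msg_type: str = "CHECK_CREATE"
        size: int = 0                  # bytes the initiator wants to reserve

    @dataclass
    class AvailableSpaceResponse:      # RES-1, unicast reply
        msg_type: str = "AVAILABLE_SPACE"
        available: int = 0             # free heap bytes on the responding machine

    def encode(msg):
        """Serialize a message as the plain 'TYPE value' text assumed in the earlier sketch."""
        value = msg.size if isinstance(msg, CheckCreateRequest) else msg.available
        return ("%s %d" % (msg.msg_type, value)).encode()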
available_mem >= 1/3 * total_mem. If the previous condition is satisfied, the process reserves the variable in its local heap memory. Otherwise, REQ-1 is broadcast to the other machines. The RES-1 messages are collected after a specified amount of time, and the node that has the maximum available memory is chosen as the target to store the data, as in Equation 1.
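Equation 1 itself is not reproduced in this excerpt. Based on the sentence above, a plausible form of the selection rule is the following, where AM_i denotes the available memory reported by node i in its RES-1 message and R is the set of nodes that answered within the collection window; this reconstruction is an assumption, not the authors' exact formula.

    \mathrm{target} = \arg\max_{i \in R} AM_i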
6 Experimental Results
Many experiments were conducted in order to determine the best threshold value of available memory for allocating variables in the local process space. A distributed system consists of either homogeneous or heterogeneous machines; hence experiments were performed for both types of systems. The proposed model was implemented under different environmental settings: 1) different numbers of variable allocations: 625, 2500 and 3750; 2) different distributions of the variable allocations: normal and unfair; 3) different ratios/schemes of memory reserved for non-local variable allocations: 0%, 30% and 50%. The following sections summarize the performed experiments.
Fig. 3. Comparison between the 0, 30 and 50% schemes in terms of the NOLV, NORV and NOFV probabilities for 3750 variable allocations, with normal distribution, in the homogeneous system
6.1 Experimental Results for the Homogeneous System
The simulator for the distributed homogeneous system consists of 10 processes, each with 3000 bytes of memory space. The experiments examine the probabilities of the Number of Local Variables (NOLV), the Number of Remote Variables (NORV) and the Number of False Variables (NOFV), the last of which indicates the number of variables that failed to be allocated either locally or remotely. Normal and unfair distributions were examined under the three reservation schemes 0%, 30% and 50%. Figure 3 shows that the 0% reservation scheme has the highest NOLV and the lowest NORV probabilities, whereas the 50% scheme has the opposite behavior. The 10 processes in the homogeneous system with normal distribution behave the same regardless of the number of variable allocations.
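The simulation setup above can be reproduced with a very small model. The sketch below is one interpretation of the reservation schemes: a fraction of each process's 3000-byte heap is assumed to be reserved for remote (non-local) allocations, the variable size is assumed, and a variable falls back to a remote process, or is counted as a false allocation, when no space is left. The authors' actual simulator may differ.

    import random

    HEAP = 3000                         # bytes per process (homogeneous case)
    N_PROCESSES = 10
    VAR_SIZE = 8                        # assumed size of one variable

    def simulate(n_vars, reserve_ratio):
        """Return the NOLV, NORV and NOFV probabilities for one run."""
        local_free  = [HEAP * (1 - reserve_ratio)] * N_PROCESSES   # for the owner's variables
        remote_free = [HEAP * reserve_ratio] * N_PROCESSES         # reserved for other processes
        nolv = norv = nofv = 0
        for _ in range(n_vars):
            owner = random.randrange(N_PROCESSES)                  # normal (uniform) distribution
            if local_free[owner] >= VAR_SIZE:
                local_free[owner] -= VAR_SIZE; nolv += 1
            else:
                # ask the peers: pick the one with the largest remote pool
                target = max(range(N_PROCESSES), key=lambda p: remote_free[p])
                if target != owner and remote_free[target] >= VAR_SIZE:
                    remote_free[target] -= VAR_SIZE; norv += 1
                else:
                    nofv += 1
        total = float(n_vars)
        return nolv / total, norv / total, nofv / total

    print(simulate(3750, 0.30))         # probabilities for the 30% scheme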
Fig. 4. Comparison between the 0, 30 and 50% schemes in terms of the NOLV, NORV and NOFV probabilities for 2500 variable allocations, with unfair distribution, in the homogeneous system
The behavior of the three schemes for 2500 and 3750 variable allocations in terms of NOLV and NORV was similar to the behavior in Figure 3. The schemes differ, however, in the NOFV probability: since there was still available memory with 2500 variable allocations, all the variables were allocated either remotely or locally and there were no false allocations. With 625 variable allocations, after running the different schemes, all the processes could allocate their variables locally, hence NOLV = 1 and NORV = NOFV = 0. With the unfair distribution the performance of the 10 processes differs, especially for processes 9 and 10, since the number of variables allocated on these processes was larger than on the other processes; this affects the NOLV and NORV probabilities, as illustrated in Figure 4. The load balancing (LB) value was calculated for the different schemes in order to compare them; the results are shown in Figure 5. With 3750 variable allocations the LB value of the three schemes was identical (0.00067), because the large number of allocations utilizes all the memory of the different processes. As can be seen from Figure 5, the LB with the unfair distribution was better for the 30% and 50% schemes than with the normal distribution. Although the difference is tiny, the LB value of the 50% reservation scheme was smaller ("better") than that of the other schemes.
Fig. 5. Comparison between the 0, 30 and 50% schemes in terms of LB for 625 and 2500 variable allocations, with normal and unfair distribution, in the homogeneous system
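The formula used for the LB value is not reproduced in this excerpt. As an illustration only, a common way to quantify load balance is the variance of the per-process memory-utilization fractions, where a smaller value means a fairer distribution; the snippet below assumes this definition, which may differ from the authors' metric.

    def load_balance(used, capacity):
        """Assumed LB metric: variance of per-process utilization; lower is more balanced."""
        utilization = [u / float(c) for u, c in zip(used, capacity)]
        mean = sum(utilization) / len(utilization)
        return sum((x - mean) ** 2 for x in utilization) / len(utilization)

    # Example: fully utilized memories give LB = 0 (perfect balance).
    print(load_balance([3000] * 10, [3000] * 10))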
6.2 Experimental Results for the Heterogeneous System
The simulator for the distributed heterogeneous system also consists of 10 processes. Each process has a memory space between 1000 and 3000 bytes, allocated randomly; Table 1 shows the memory space allocated to each process. The experimental factors were the same as for the homogeneous system. Figure 6 shows that the behavior of the three schemes for 2500 and 3750 variable allocations in terms of the NOLV and NORV probabilities was similar: the 0% reservation scheme has the maximum NOLV and the minimum NORV probability values, whereas the 50% reservation scheme has the opposite behavior. Process 8 has the minimum NOLV and the maximum NORV because it has the smallest allocated memory space (1000 bytes).
Fig. 6. Comparison between the 0, 30 and 50% schemes in terms of NOLV, NORV and NOFV for 2500 variable allocations, with normal distribution, in the heterogeneous system
Table 1. Memory space allocated to processes
Process id   1     2     3     4     5     6     7     8     9     10
AM (bytes)   3000  1500  2000  2500  3000  1500  2500  1000  3000  2000
Fig. 7. Comparison between the 0, 30 and 50% schemes in terms of NOLV, NORV and NOFV for 3750 variable allocations, with unfair distribution, in the heterogeneous system
With the unfair distribution, the behavior of the three schemes for 625, 2500 and 3750 variable allocations in terms of NOLV and NORV was similar; they differ only in the NOFV probability. As illustrated in Figure 7, processes 9 and 10 have the largest NORV probability, since the number of allocations on these processes was higher than on the others; conversely, they have the smallest NOLV, due to the unfair distribution of allocations among the different processes. The general performance of the three schemes is alike for the normal and unfair distributions. The LB value was calculated for the different schemes in order to compare them; the results are shown in Figure 8.
Fig. 8. Comparison between the 0, 30 and 50% schemes in terms of LB for 625, 2500 and 3750 variable allocations, with normal and unfair distribution
7 Conclusion
The aim of the proposed approach is to make effective use of the whole distributed heap memory space and to apply a fair distribution of private application data, such as dynamic objects, arrays and variables, among the distributed heaps. This is achieved using message passing over the local network, so different types of messages, each with a different format, are defined. The advantage of the proposed method is that it avoids unequal heap memory allocation across machines; the thrashing problem is also eliminated. The proposed model concentrates on process-private data instead of publicly shared data, so there is no memory synchronization problem. We examined different reservation schemes for the memory reserved for non-local variable allocations (0%, 30% and 50%), in homogeneous and heterogeneous systems, with different numbers of variable allocations (625, 2500 and 3750), under normal and unfair distributions. The results show that the 0% reservation scheme yields the maximum number of locally allocated variables and the minimum number of remotely allocated variables, and that the best LB values were obtained with the normal distribution for 2500 and 3750 variable allocations.
References
[1] Lopez-Ortega, O., Lopez-Morales, V.: Cognitive communication in a multi-agent system for distributed process planning. International Journal of Computer Applications in Technology (IJCAT) 26(1/2) (2006)
[2] Katzke, U., Vogel-Heuser, B.: Design and application of an engineering model for distributed process automation. In: Proceedings of the American Control Conference 2005, vol. 4, pp. 2960–2965 (June 2005)
[3] Kruse, R.L., Ryba, A.J.: Data Structures and Program Design in C++. Prentice Hall Inc., Englewood Cliffs (1998)
[4] Wilkinson, B.: Computer Architecture: Design and Performance, 2nd edn. Prentice Hall Europe, Englewood Cliffs (1996)
[5] Peterson, L.L., Davie, B.S.: Computer Networks: A Systems Approach, 3rd edn. Morgan Kaufmann Publishers, San Francisco (2003)
[6] Tanenbaum, A.S.: Distributed Operating Systems. Prentice Hall Inc., Englewood Cliffs (1995)
[7] Geist, A., Beguelin, A., Dongarra, J., Manchek, R., Jiang, W., Sunderam, V.: PVM: A Users' Guide and Tutorial for Networked Parallel Computing. MIT Press, Boston (1994)
[8] Coulouris, G., Dollimore, J., Kindberg, T.: Distributed Systems: Concepts and Design, 2nd edn. Addison-Wesley, Harlow (1994)
[9] Bal, H.E., Kaashoek, M.F., Tanenbaum, A.S.: Orca: A Language for Parallel Programming of Distributed Systems. IEEE Trans. on Software Engineering 18, 190–205 (1992)
[10] Di Stefano, A., Santoro, C.: A Java kernel for embedded systems in distributed process control (December 2000)
[11] Concurrency, IEEE [see also IEEE Parallel & Distributed Technology] 8(4), 55–63 (October–December 2000)
[12] Sithirasenan, E., Muthukumarasamy, V.: A Model for Object-based Distributed Processing using Behaviour Trees. In: Proceeding (436) Software Engineering and Applications (2004)
[13] Kee, Y., Ha, S.: A Robust Dynamic Load-balancing Scheme for Data Parallel Application on Multicomputer Systems. In: Proceedings of International Conference on Parallel Programming Environment (1998)
[14] Borovska, P., Lazarova, M.: Token-Based Adaptive Load Balancing for Dynamically Parallel Computations on Multicomputer Platforms. In: International Conference on Computer Systems and Technologies – CompSysTech (2007)
[15] Faes, P., Christiaens, M., Stroobandt, D.: Mobility of Data in Distributed Hybrid Computing Systems. In: IEEE International Parallel and Distributed Processing Symposium, p. 386 (2007)
[16] Roohi Shabrin, S., Devi Prasad, B., Prabu, D., Pallavi, R.S., Revathi, P.: Memory Leak Detection in Distributed System. In: Proceedings of World Academy of Science, Engineering and Technology, vol. 16 (November 2006)
[17] Serhan, S., Armiti, A., Herbawi, W.: A Heuristic Distributed System Load Balancing Technique with Optimized Network Traffic. Accepted for publication in AMSE Journals, Series D (2010)
Effectiveness of Using Integrated Algorithm in Preserving Privacy of Social Network Sites Users Sanaz Kavianpour, Zuraini Ismail, and Amirhossein Mohtasebi University Technology Malaysia (UTM), Kuala Lumpur, Malaysia [email protected], [email protected], [email protected]
Abstract. Social Network Sites (SNSs) are one of the most significant topics drawing researchers' attention nowadays. Recently, many people of different areas, ages and genders have joined SNSs and share a great deal of information about various things. Spreading information can be harmful to users' privacy. In this paper, we first describe some definitions of social networks and their privacy threats. Second, after reviewing some related work on privacy, we explain how information disclosure to adversaries can be minimized by using an integrated algorithm. Finally, we show the effectiveness of the proposed algorithm in protecting users' privacy. The security enhancement obtained by anonymizing and diversifying the disclosed information supports the aim of the paper. Keywords: Online Social Network, Privacy, K-Anonymity, ℓ-Diversity, Anonymizing.
1 Introduction to Social Networks
Web-based services that allow users to make profiles, share information such as photos, interests and activities, and browse others' information and activities are called Online Social Networks or Social Networking Sites (Ellison, 2007). Based on the nature of the activities that occur in social networks, they can be categorized into various types. As social network sites encompass a huge amount of information, they are a suitable environment for attackers to mine. Although social network sites have benefits for people, they also have significant problems that threaten the security and privacy of users. Three main elements make privacy one of the main issues in online communities: the mixture of social networks and the web creates a lucrative environment for abusers; a huge amount of information is kept in one place, almost unprotected from others' access; and the nature of the web makes attackers anonymous. According to Boyd (Boyd, 2004), people trust online social network sites more than offline communities, so they reveal more information, and more honestly, about themselves. All these conditions lead to privacy threats. As these social networks become more popular every day, some major efforts have been made by researchers to protect the privacy of online users. These efforts range from adding filters that let users control their information flow to complex algorithms that make user information anonymous in the eyes of advertisers and consumers of the information. The aim of this paper is to depict how
information disclosure to adversaries by social network site users can be minimized by using an integrated algorithm. By reviewing privacy threats to SNSs, the paper identifies the vulnerabilities and presents an integrated algorithm that anonymizes users' information in an optimized way, thereby increasing the users' privacy.
2 Privacy Threats of Social Networks
The number of social networking site members has been increasing in recent years. The success of SNSs depends on the number of their members; therefore the providers attempt to improve the design and enhance the attractiveness in order to attract more users. In the development of social network sites, privacy and security have not taken precedence in the design. Consequently, there are many threats in SNSs that can compromise the user's privacy (Hasib, 2009).

Table 1. Threats of Social Network Sites
Threats | Threat Samples
1. Privacy Related Threats | Digital Dossier of Personal Information; Face Recognition; Content-based Image Retrieval; Image Tagging and Cross-profiling; Difficulty of Complete Account Deletion
2. SNS Variants of Traditional Network and Information Security Threats | Spamming; Cross Site Scripting, Viruses and Worms; SNS Aggregators
3. Identity Related Threats | Phishing; Information Leakage; Profile Squatting through Identity Theft
4. Social Threats | Stalking; Corporate Espionage
The threats can be divided into four groups as below:
1. Privacy Related Threats
2. SNS Variants of Traditional Network and Information Security Threats
3. Identity Related Threats
4. Social Threats
All these four main groups of threats can be categorized into subgroups as follows:
A. Privacy Related Threats
   1. Digital Dossier of Personal Information
   2. Face Recognition
   3. Content-based Image Retrieval
   4. Image Tagging and Cross-profiling
   5. Difficulty of Complete Account Deletion
B. SNS Variants of Traditional Network and Information Security Threats
   1. Spamming
   2. Cross Site Scripting, Viruses and Worms
   3. SNS Aggregators
C. Identity Related Threats
   1. Phishing
   2. Information Leakage
   3. Profile Squatting through Identity Theft
D. Social Threats
   1. Stalking
   2. Corporate Espionage
Based on these threats, several threat scenarios are depicted below to illustrate how users' information can pass to third parties and how this disclosure of information can threaten the users' privacy.
Fig. 1. Profiling investigation scenario
2.1 Profiling Investigation
Person A joins a social network and gains some followers. He shares ideas about religion or politics. Person C impersonates one of the followers and monitors all of Person A's ideas. The shared information enables Person C to use it later, for example for blackmailing. The blackmail can hurt Person A either by embarrassing him socially or by harming him legally. This is hard to avoid, as the blackmailer has physical proof of the information shared by Person A.
2.2 Impersonation
Person C wants to gain access to a corporation's local network. His problem is that a firewall blocks his access to that network. His trick is to traverse the profiles of the employees of that company on a social network site such as LinkedIn and impersonate Person A, who is the IT person of the organization. He sends some of them an email with a Trojan. As soon as one of the targets (Person B) opens the email and runs the Trojan, a backdoor is opened into the corporate network and Person C can gain access to the whole corporate network.
Fig. 2. Impersonation scenario
2.3 Browse and Search
Person A or Person B searches and browses different websites. They share a lot of information about their searches on social network sites and buy things from online shopping websites. However, they have not read the terms of use of those websites and do not know that the sites keep the right to sell that information to third parties. As a result, those records can be shared with third-party companies such as insurance companies or with more critical agencies such as the government.
Fig. 3. Browse and search scenario
For instance, Person A searches for sports books and posts the information on his social network account. The social network site will consider him a sports enthusiast and, based on that, send him sports news updates. His data may be sold to insurance companies, which can then decide to increase his health insurance payment. As another example, Person B buys a religious book online and shares the information with his friends on Facebook. Amazon and Facebook create a profile for him and consider him a devout religious follower. A government could use this profile and place him in a terrorism watch group.
2.4 Social Network Sites Aggregators
Figure 4 shows the social network site aggregators scenario. Social network aggregators such as Friendfeed or Spindex give users the ability to aggregate the shared information from all of their social networks in one place. In this way, when Person A shares something on Facebook or Twitter, all his followers get updates automatically and in real time. This is very helpful for Person C, an attacker, who can bypass the security measures of all the other social networks by passing only Friendfeed's security measures and becoming friends with Person A. The only thing he has to do is impersonate one of Person A's followers.
Fig. 4. Social network sites aggregators
2.5 Digital Dossier of Background Information
Person B has a profile on a social network. She fills in her profile with her personal data and shares a lot of information with her friends and other users. When she applies for a job, Person A, who is the employer of that organization, goes through her profile on the social network and forms an impression of her personality. If Person A discovers some immoral background information about Person B, she will lose the job opportunity. All in all, the lack of awareness of the importance of disclosed information and the related risks among most users, particularly teenagers and non-IT people, provides an opportunity for attackers to perform malicious actions. Wall posts, photos and friend lists pose a greater threat to privacy than users' profiles. On the other hand, privacy-setting guidelines and awareness are not adequate, automatic or user-friendly; therefore, setting and managing them is not an easy task even for users who are aware of security threats, and these settings are not complete enough to cover all threats. Along with the benefits and attractions of online social networks, security and privacy concerns also exist and need to be addressed. Since attackers and intruders are always eager to penetrate systems and networks in order to perform malicious actions and attack their targets, the best way to stop them is to remove the SNSs' vulnerabilities and improve the security and privacy settings.
Fig. 5. Digital dossier of background information scenario
3 Related Works on Privacy of Social Networks
Unfortunately, there is little comprehensive related work by researchers on increasing privacy in social networks. Most enhancements were made by vendor companies such as Facebook or MySpace on their own networks. However, these enhancements attracted many negative critiques, as they seem to make configuration more complicated and even intentionally lead users to reveal more information to the public (Agranoff, 2009). In 2009, Ford et al. for the first time migrated k-anonymity, which is normally used for microdata privacy protection, to online social networks. They categorized attributes as identifiers, quasi-identifiers and sensitive attributes. In their algorithm, they used the previous k-anonymity algorithm and extended it to p-sensitivity. Simply speaking, while records in this method are k-anonymous, they are p-sensitive as well, and as a result some sensitive attributes are better protected against re-identification than quasi-identifiers (Ford, 2009). In 2008, Adu-Oppong and Gardiner developed an application that automatically finds social circles around each user and groups his/her friends into them in order to increase privacy (Adu-Oppong & Gardiner, 2008). In 2007, Byun et al. introduced an optimized method for k-anonymity, named the k-member model, in order to rectify the huge information loss of k-anonymity (Byun, 2007). In 2006, Machanavajjhala et al. worked on attacks that can break the anonymity of data; they identified two such attacks, homogeneity of data and background knowledge, and proposed a new algorithm to rectify this problem, known as ℓ-diversity (Machanavajjhala, 2006).
In 2005, Jones and Soltren created some threat scenarios and then, applying a URL hacking technique, wrote a Python script to download users' profiles and extract the personal information in them. They saved the online information on those pages to find out how much information in those profiles is available to the public. In a nutshell, among the four universities they focused on, they could download and access 72% of the total registered users' profiles (Jones & Soltren, 2005).
4 How to Minimize Information Disclosure to Adversaries by Using Integrated Algorithm
The integrated algorithm is based on the k-anonymity and ℓ-diversity algorithms. First, using the integrated algorithm, the social network is divided into clusters in which the number of records is equal to or bigger than k and the records are most similar in their attributes. The clustering used by the integrated algorithm is based on k-anonymity, described in detail in Section 4.1. The values need to be generalized in order to be anonymized; generalization is applied to the quasi-identifiers (Section 4.1). The k-anonymized records can be compromised by two types of attacks: background knowledge and homogeneity of data. For example, suppose that in one cluster of k = 4 records all the quasi-identifiers are the same and the sensitive attribute contains two hobbies, smoking and drinking alcohol. If Alice knows that Bob is in that cluster based on the quasi-identifiers (homogeneity of data) and Alice saw Bob buying alcohol (background knowledge), she can directly obtain all the information about Bob. To cover this vulnerability of k-anonymity, the integrated algorithm diversifies the sensitive attributes so that every k similar records have ℓ diversified sensitive attribute values, using the ℓ-diversity algorithm described in Section 4.2.
4.1 K-Anonymity Algorithm
The idea of k-anonymity is backed by partitioning the social network graph (Sweeney, 2002). The goal is to divide a social network graph into clusters where the count of records (persons) with the same quasi-identifiers in a cluster is equal to or bigger than k. In this way, the technique greatly reduces the chance of matching records to pinpoint a person. The first algorithm we implement is the k-anonymity algorithm. Based on this algorithm, the social network is divided into different clusters. Users, represented as records, whose quasi-identifier attributes (the attributes that could be combined to lead an attacker to the individual) are most similar are placed in the same cluster. As a result, less generalization is required, because the data quality increases with this kind of clustering. There is no limitation on the number of clusters, but they should encompass all the records of the social network. The clustering process of this algorithm moves among all records, finds the most similar ones, and places them in the same cluster, placing at least k records in each cluster. In the case that fewer than k
records remain outside the clusters, the algorithm looks for the most similar records inside the existing clusters and places the rest in them. Each cluster can contain at least k records and at most 2k−1 records. The main action that needs to be performed in order to make users' identities anonymous in the k-anonymity algorithm is generalization (Sweeney, 2002). It means replacing the values of attributes with other related values from their domains. The goal of generalization is to decrease the accuracy of the data in order to make access to the data harder for unauthorized users. Generalization should be limited, as extreme generalization reduces the value of the data and makes it unusable (John Miller, 2008). The domain of each attribute is defined as a taxonomy tree. Each taxonomy tree commences from a root and goes down into branches at different layers. There are two types of attributes, numerical and categorical: in order to generalize numerical attributes we can consider their values as intervals, and for categorical attributes sets of related values can be defined. These taxonomy trees help us accomplish generalization more accurately, which reduces the generalization information loss. As we move towards the root of each tree, the possibility of identifying the individual decreases.
Fig. 6. Taxonomy tree of marital status
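A small sketch of taxonomy-based generalization is shown below. The intermediate levels of the marital-status hierarchy are assumed (only the attribute itself is given by Fig. 6), and the generalization step simply replaces a value with its parent; this illustrates the idea rather than the authors' exact implementation.

    # Assumed taxonomy for the marital-status attribute (only the idea is from the paper).
    MARITAL_TAXONOMY = {
        "Married":  "Once married",
        "Divorced": "Once married",
        "Widowed":  "Once married",
        "Single":   "Never married",
        "Once married":  "Any",
        "Never married": "Any",
    }

    def generalize_categorical(value, levels=1):
        """Climb the taxonomy tree; each level up identifies the individual less precisely."""
        for _ in range(levels):
            value = MARITAL_TAXONOMY.get(value, value)   # the root generalizes to itself
        return value

    def generalize_numeric(value, width=10):
        """Generalize a numeric attribute (e.g., age) to an interval of the given width."""
        low = (value // width) * width
        return (low, low + width - 1)

    print(generalize_categorical("Divorced"))   # -> 'Once married'
    print(generalize_numeric(37))               # -> (30, 39)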
The identifier attributes are removed from the data set first. Then generalization is performed on the quasi-identifiers based on the taxonomy trees. Taxonomy trees with branches are profitable and let generalization proceed with less information loss; the main problem occurs when we face a flat taxonomy tree, because in that case the only way to generalize the records is to suppress the attribute, and this suppression increases the information loss. As generalization increases, the information loss increases, because highly generalized values convey less information. Taxonomy trees with more levels can prevent this growth of information loss; thus applying taxonomy trees with deeper hierarchies improves generalization and, as a consequence, delivers the most usable result. The distances between both numerical and categorical attribute values need to be calculated in order to make it possible to compute the information loss. Since the generalization, which is performed within clusters to anonymize the records, causes
distortion, an accurate metric is required to measure it. The information loss is calculated separately for each cluster, and the sum over all clusters gives the total information loss; this amount should be less than a fixed value, which is defined by the administrator and should be bigger than zero. Although k-anonymity can be a good solution for preserving privacy, it cannot guarantee the users' privacy completely, as it is vulnerable to some attacks. The two most important attacks that can decrease the integrity of k-anonymity are homogeneity and background knowledge. In order to cover this problem, another algorithm called ℓ-diversity is applied to the result of k-anonymity.
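The distance and information-loss formulas themselves are not spelled out in this excerpt. As an illustration only, a commonly used cluster-level metric (in the spirit of the k-member clustering cited above) charges a numeric attribute by the width of its generalized interval relative to the full domain, and a categorical attribute by the size of the generalized subtree relative to the whole taxonomy; the code below assumes that definition.

    def numeric_loss(interval, domain):
        """Loss of one generalized numeric value: interval width over domain width."""
        return (interval[1] - interval[0]) / float(domain[1] - domain[0])

    def categorical_loss(n_leaves_under_value, n_leaves_total):
        """Loss of one generalized categorical value: covered leaves over all leaves."""
        return (n_leaves_under_value - 1) / float(n_leaves_total - 1)

    def cluster_information_loss(per_record_losses):
        """Total loss of a cluster: sum of the per-record, per-attribute losses."""
        return sum(per_record_losses)

    # Example: age generalized to (30, 39) over a domain of (0, 99), and marital status
    # generalized to 'Once married', which covers 3 of the 4 leaf values.
    per_record = numeric_loss((30, 39), (0, 99)) + categorical_loss(3, 4)
    print(cluster_information_loss([per_record] * 4))   # a cluster of k = 4 records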
4.2 ℓ-Diversity Algorithm
ℓ-Diversity is a method complementary to k-anonymized records; it dictates that in each k-anonymized graph, every group of k similar generalized records should have ℓ diversified sensitive attribute values (Machanavajjhala, ℓ-Diversity: Privacy Beyond k-Anonymity, 2006). The basic idea of the ℓ-diversity algorithm is that each sensitive attribute should be represented by ℓ well-represented sensitive values. In order to be able to perform the diversification, the entropy of the data set should be at least log(ℓ). The adversary needs ℓ−1 pieces of background knowledge in order to be able to eliminate ℓ−1 sensitive values and obtain the exact sensitive attribute. As the number of sensitive attributes increases, the complexity of ℓ-diversity increases: large data sets are required in order to perform the diversification, and each sensitive attribute value should be well represented by other sensitive values. In order to limit access to the sensitive attribute through background knowledge, increasing the value of ℓ plays a significant role, as more information is then required to reach the attribute directly. The second algorithm that we implement, on the result of the previous one, is ℓ-diversity. The k-anonymity results can reveal information positively or negatively with respect to the adversary's background knowledge: positive disclosure occurs when the adversary directly obtains the sensitive attribute from the k-anonymity result, and negative disclosure occurs when the adversary can easily rule out sensitive values. In general, the exposed information should not give the adversary much information beyond the background knowledge. The main area in which this algorithm plays a role is the sensitive attributes; it does not act on non-sensitive attributes, as opposed to the k-anonymity algorithm, which works on the non-sensitive attributes. Sensitive attributes are those whose values must be hidden from the adversary to prevent access to any user. The main difference between the k-anonymity and ℓ-diversity algorithms is in how the clustering is done. In the k-anonymity method, the clustering has only one phase, which is partitioning the population based on the similarity of quasi-identifiers. Obviously, the main measures of the success of this algorithm are the lowest information loss as well as keeping the data anonymous. The way the clustering works is straightforward in theory: first, it prepares the population and sorts it based on quasi-identifier similarity. When we have two sensitive attributes, for instance, diversity should be applied to both of them, which inherently increases the information-loss metric. The heuristic workaround is to combine
the two sensitive values into one combined sensitive value. An example is shown in Table 2.

Table 2. Combination of sensitive attributes
Sensitive 1   Sensitive 2   Combined Sensitive Values
Cold          Cigarettes    Cold; Cigarettes
Fever         Cigarettes    Fever; Cigarettes
Malaria       Codeine       Malaria; Codeine
HPV           Cigarettes    HPV; Cigarettes
Headache      LSD           Headache; LSD
The next phase is to group the rows based on the combined sensitive values and then select a number of rows with distinct combined sensitive values from each group so as to satisfy the ℓ-diversity condition. The clustering starts here. The values are sorted again in each group based on the similarity of the quasi-identifiers; as a result, rows next to each other are more similar than rows far from each other in each group, and lead to lower information loss if they end up in one cluster. For example, if the application needs to satisfy k = 10 anonymity and ℓ = 5 diversity, it starts from the first group and puts the first 5 available rows in one cluster to maximize the similarity of quasi-identifiers within the group; for the next 5 rows that need to be put in the cluster, it selects one (and only one) row from each of the next groups. The remaining items, whose total number is less than k, are distributed among the clusters at the end. Moreover, records that would break the diversity condition are reported to the user at the end of this phase. The integrated algorithm can be useful in protecting users from re-identification by advertisers, research agencies and governmental agencies. This algorithm could be used on online social network sites by default, so that users' information and privacy would be more secure.
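The grouping-and-filling procedure described above can be sketched as follows. This is one interpretation of the text, not the authors' code: records are assumed to be dictionaries with a 'sensitive' field holding the combined sensitive value, the quasi-identifier similarity ordering is reduced to a simple sort key, and clusters are filled so that they contain at least k records with at least l_div distinct combined sensitive values.

    from collections import defaultdict

    def integrated_clusters(records, k, l_div, quasi_key):
        """Build clusters of >= k records containing >= l_div distinct sensitive values."""
        groups = defaultdict(list)                     # combined sensitive value -> rows
        for r in sorted(records, key=quasi_key):       # keep similar quasi-identifiers adjacent
            groups[r["sensitive"]].append(r)

        clusters, current = [], []
        while any(groups.values()):
            # take one row from each of l_div different groups, largest groups first
            for value in sorted(groups, key=lambda v: -len(groups[v]))[:l_div]:
                if groups[value]:
                    current.append(groups[value].pop(0))
            if len(current) >= k and len({r["sensitive"] for r in current}) >= l_div:
                clusters.append(current)
                current = []

        if current:                                    # leftovers (fewer than k remain)
            if clusters:
                for i, r in enumerate(current):        # spread them over existing clusters
                    clusters[i % len(clusters)].append(r)
            else:
                clusters.append(current)               # degenerate case: too few records overall
        return clusters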
5 Effectiveness of Integrated Algorithm
The main goal of this paper is to anonymize data, and this is done by generalization. As generalization causes distortion in the data, there should be a metric for measuring the distortion in order to improve it; the information loss is the main metric used to calculate the amount of distortion. The amount of information loss in the k-anonymity part of the integrated algorithm shows that the result is more optimized than previous k-anonymity algorithms. When this k-anonymity is combined with ℓ-diversity to implement the integrated algorithm, the information loss increases somewhat. As there is no metric with which to compare the result of the integrated algorithm, we cannot be sure that it will not be useful, especially since adding ℓ-diversity overcomes two types of attacks on k-anonymity. The records are also more secure in the integrated algorithm, as they have ℓ diversified sensitive attribute values. As ℓ increases, the probability of reaching the exact record decreases. The following figure depicts the reduction of this probability with respect to ℓ.
[Figure 7 bar chart, "Security in K+L": the probability of reaching a record versus ℓ = 2, ..., 10; the probability falls from 50% at ℓ = 2 to 10% at ℓ = 10 (50%, 33.3%, 25%, 20%, 16.6%, 14.2%, 12.5%, 11.1%, 10%).]
Fig. 7. The security in K+L
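The plotted values correspond to a probability of 1/ℓ of guessing the right record among ℓ well-represented sensitive values; the two-line check below reproduces them. This correspondence is an observation about the figure's numbers, not a formula stated by the authors.

    # Probability of pinpointing the correct value when l equally plausible values remain.
    for l in range(2, 11):
        print(l, "%.1f%%" % (100.0 / l))   # 2 -> 50.0%, ..., 10 -> 10.0%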
6 Conclusions
In this paper we have shown the effectiveness of the proposed algorithm. In the k-anonymity part of the algorithm, the information loss becomes smaller than that of the conventional k-anonymity algorithm, thanks to a better generalization method. In the ℓ-diversity part, the two types of attack that can compromise k-anonymity are blocked by diversifying the records. The integrated algorithm is able to protect users' information from attackers' access by anonymizing it, so when the data is passed to third parties it is not in the clear and cannot be accessed directly. The algorithm also diversifies the sensitive information, and therefore secures the privacy of users.
7 Future Work
The main goal of the k-anonymity algorithm is to reduce the information loss, while the goal of ℓ-diversity is to increase security. Although generalization in ℓ-diversity is similar to that in k-anonymity, the attributes they focus on are different; thus the information loss in ℓ-diversity may increase slightly compared with k-anonymity. Finding a new method to reduce the information loss of the proposed algorithm (k + ℓ), and improving the diversification by increasing the number of sensitive attributes, are subjects of future work.
Acknowledgments. We would like to express our sincere gratitude to our parents and all the teachers and lecturers who helped us understand the importance of knowledge and showed us the
best way to gain it. We are also very thankful to all of our friends who gave us suggestions and solutions on every aspect of our paper, helping us achieve the best results.
References
1. Adu-Oppong, F., Gardiner, C.K.: Social Circles: Tackling Privacy in Social Networks. In: SOUPS, Pittsburgh (2008)
2. Agranoff, C.: Facebook's Latest Controversy – New Privacy Changes Violate Policy. Available from World Wide Web (2009), http://www.rev2.org/2009/12/17/facebooks-latest-controversy-new-privacy-changes-violate-policy/ (accessed January 18, 2010)
3. Boyd, D.: Friendster & Publicly Articulated Social Networking. In: Conference on Human Factors and Computing Systems (CHI 2004), pp. 24–29 (2004)
4. Byun, J.-W., Kamra, A., Bertino, E., Li, N.: Efficient k-Anonymization Using Clustering Techniques. CERIAS and Computer Science (2007)
5. Ellison, D.M.: Social Network Sites: Definition, History, and Scholarship. Computer-Mediated Communication (2007)
6. Emam, K.E.: A Globally Optimal k-Anonymity Method for the De-identification of Health Data. Journal of the American Medical Informatics Association, 670–682 (2009)
7. Ford, R., Truta, T.M., Campan, A.: P-Sensitive K-Anonymity for Social Networks. Northern Kentucky University (2009)
8. Gross, R.A.: Information Revelation and Privacy in Online Social Networks. In: The 2005 ACM Workshop on Privacy in the Electronic Society, pp. 71–80 (2005)
9. Hasib, A.A.: Threats of Online Social Networks. IJCSNS International Journal of Computer Science, 288–293 (2009)
10. John Miller, A.C.: Constrained k-Anonymity: Privacy with Generalization Boundaries (2008)
11. Jones, H., Soltren, J.H.: Facebook: Threats to Privacy (2005)
12. Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: ℓ-Diversity: Privacy Beyond k-Anonymity. In: Proc. 22nd Intl. Conf. Data Engineering (ICDE), p. 24 (2006)
13. Sweeney, L.: k-Anonymity: A Model for Protecting Privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 557–570 (2002)
14. Sweeney, L.: Achieving k-Anonymity Privacy Protection Using Generalization and Suppression. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 571–588 (2002)
A Web Data Exchange System for Cooperative Work in CAE Min-hwan Ok and Hyun-seung Jung Korea Railroad Research Institute, 360-1 Woulam, Uiwang, Gyeonggi, Korea 437-757 [email protected]
Abstract. The popular Social Network Service (SNS) induces an open system for collaboration. A data exchange system is opened without any access grant as a Computer-Supported Cooperative Work tool for Computer-Aided Engineering via a Web interface; the SNS plays the role of security instead. SNS is advantageous in recruiting members adequate to the cooperative work, and this open system could be a good workplace under the paradigm of convergence. Keywords: Peer-to-peer file transfer, Computer-supported Cooperative-work, Computer-Aided Engineering.
1 Introduction
The paradigm of convergence is settling into many areas of engineering today. This implies not only common development between technically related fields but also the emergence of new interdisciplinary fields. Thus there are a number of works collaborated on between workgroups, organizations, and even countries, which implies collaboration among members geographically distant from each other. The members would use computer-supported collaboration tools, although the tools could not cover the entire process of the cooperative work. These days we face another paradigm, the Open Society, with the popular Social Network Service; the paradigm of Open Society is induced by SNS, and vice versa. Due to security concerns, canonical CSCW tools are developed as closed systems: only granted users can access the contents of the closed system, and most commercial CSCW tools are in this category nowadays. However, the closed system works against the Open Society, and it might become obsolete among CSCW tools under those paradigms. In this work a data exchange system is opened without any access grant as a CSCW tool for CAE via a Web interface; the SNS plays the role of security instead. The data and accompanying information are transferred in a peer-to-peer manner, with no centralized system management. Preliminaries regarding CAE are described concisely in [1]. For a large CAE work, it is desirable to divide the object into partitions, distribute them to the workers according to each person's field of expertise, and then collect and unify them into the entire object. In this work, these distributed analysis tasks are the case of cooperative work. An object is divided into partitions and unified into the
object. Among the partitions, some may have deep relevancy to one another and thus should be analyzed while exchanging results. The SNS aids the interactions between the members of the workgroups.
2 Members for Cooperative Work and Groups in Collaboration
In the concept of an open system, anyone is able to contact the system, look up item lists, and address the desired item, but is not able to access the content of the item without proper authority. The members obtain their authorities in the workgroup after joining and taking a part, through SNS, with respective security levels in a project.
2.1 Recruiting Members through SNS
Social network services help people constitute an open society on the Internet. SNS is also used to create and maintain a social network of engineering professionals. When a project is established, the project manager can recruit members of workgroups from the social network through SNS. A professional of a workgroup, or a whole workgroup, can join the project team, and the project manager should grant respective security levels to the team workers, or the team leader takes part in the project. The content of an item, or the item itself, can be classified by ID/passwords or by exclusive listing, respectively, in the data exchange system. As the recruited members are to be trusted, they should be authenticated, evaluated and then contracted by the project manager in a conventional manner outside the Internet. This is the weakness of SNS today, and SNSs are evolving to lessen this weakness.
Workgroup for CAE. In Fig. 1, members are recruited into three workgroups for their respective projects. In CAE of railway technology, for example, the professional groups A through D consist of mechanical engineers, civil engineers, electrical engineers and industrial engineers, respectively. Some engineers of the four professional groups take part in project Y.
[Figure 1 diagram: professional groups A, B, C and D overlap with projects X, Y and Z.]
Fig. 1. Recruiting members for projects from professional groups through SNS on Internet
2.2 Collaboration in the Project Team with SNS
The procedure of a CAE project is depicted in Fig. 2, where the project manager takes a part as the head analyst. Basically, cooperative analyses proceed with unanimous passes among the related teams in a peer-to-peer manner; however, dividing the object (the entire work) and unifying it back into the entire work are processed in a top-down and bottom-up manner, respectively. Partitions of some functional blocks may need another round of analyses due to their interactions through related features. This cross-field analysis is conducted by the related teams under the co-operation of the two team leaders and the head analyst. Under criteria modified after the cross-field analysis, further analyses could be necessary in each team. In the case that the object is a car, for example, an axle shaft can be a functional block; the partitions of structural analysis and the partitions of fatigue analysis may overlap on the wheel block. Structural analysis is a process to predict the behavior of structures under loading. Most engineering artifacts such as cars, trains, aircraft, ships, buildings and bridges are subjected to external or environmental loadings and are designed to withstand them. Through structural analysis the response of the structure can be calculated based on physical laws, and the safety of the structure can be assessed. Fatigue analysis is a process to predict the life of structures under cyclic loading: even if the load is not above the level the structure can withstand, structural failure can occur through repeated loading and unloading. Through fatigue analysis, the fatigue limit or fatigue strength of a structure subjected to cyclic loading can be evaluated. The object of the analyses is divided into sizes that can be processed separately. After being processed separately, the influencing values are exchanged and the analyses are conducted again, each at its own size, until the criterion is satisfied. The cooperative analyses are described further in [2]. The directions for collaboration, including the influencing values, are exchanged with SNS.
[Figure 2 flowchart — roles: Head Analyst and Team Leaders; Team Leader; Team Leader and Workers; Respective Team Worker. Steps: dividing the object and prescribing analysis criteria; analysis work with partitions; unifying the partitions with analysis files into the object; if not satisfiable, modifying the analysis criteria for cross-field analysis on related features and repeating the analysis with a modified criterion.]
Fig. 2. Multiple teams conduct their analyses concurrently and the analyses are conducted multiple times in the cross-field analysis
3 Data Exchange via a Web Service
The data exchange system comprises a Web database and the workers' workstations. Once the leader of the analysis team posts the location of the source files for analysis
in the Web DB, the team worker retrieves the files together with the prescribed parametric conditions to be satisfied in the analysis. Likewise, the worker's analysis files are not stored in the Web DB. These source files and analysis files are shared among the team workers and the leader by forming distributed vaults with their workstations. The distributed vaults are a collection of specific zones in the local storage of the persons' workstations, constituted in a peer-to-peer manner. There is no replication, and thus no actions for file state changes such as check-in/out or release/obsolete, as required by a single central vault. Only the owner is able to create, modify and remove a file, and it is stored in the owner's local storage only. A remote worker may have a copy of the original in his local storage, but modifying the copy is useless in the system.
(a) Registration of analysis files
(b) Finding out the file location
(c) Copying the files through FTP connection
Fig. 3. Data exchange system for collaborative works
The cooperative analyses proceed by changing the states of the files associated with partitions. After the completion of an analysis for a partition, the files appear to other remote workers with the indication Ready to check. When the analysis files of a relevant partition appear as Ready to check, the cooperating workers conduct larger analyses combined with the files of the relevant partitions and make a report of each result. The result, including Pass or Fail, is messaged to the owner. If the larger analysis produces a
result that does not satisfy a prescribed criterion, the result is messaged with Fail and an additional memo describing the reason. The CAE collaboration involves registration of analysis files, finding out the file location, and copying the files through an FTP connection, as depicted in Fig. 3. Once a team worker logs in to the Web DB with the group ID and password, the worker is able to register his finished analysis files, to see whether the analysis files have been passed by the leader, or to find a registered analysis of interest and copy those files from the owner's workstation. The team leader logs in to the Web DB with the leader ID, which carries additional authority such as marking Pass or Fail on the registered files.
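The registration entries kept in the Web DB and the copy step can be pictured with a small sketch. The field names below (partition, set_no, owner_host, file_path, state) and the FTP credentials are assumptions made for illustration; the actual schema of the Web DB is not given in the paper.

    from dataclasses import dataclass
    from ftplib import FTP

    @dataclass
    class AnalysisRegistration:        # one row of the task information server (assumed fields)
        partition: str                 # functional block, e.g. "axle shaft"
        set_no: int                    # set number of the partition's input data
        owner_host: str                # workstation holding the original files
        file_path: str                 # location inside the owner's distributed vault
        state: str = "Ready to check"  # later marked "Pass" or "Fail" by the team leader

    def fetch_analysis(reg, user, password, local_name):
        """Copy the registered analysis file from the owner's workstation over FTP."""
        ftp = FTP(reg.owner_host)
        ftp.login(user, password)
        with open(local_name, "wb") as out:
            ftp.retrbinary("RETR " + reg.file_path, out.write)
        ftp.quit()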
4 Task Information Server
The collaboration process in Fig. 2 involves iterative steps, and there may be other teams for other sorts of analyses; those teams are managed by a head analyst. The head analyst can request further rounds of analyses from the analysis teams for parts that have interactions. Thus some processes, not described here, go off-line, and the whole process is more complicated than the one presented in [3]. Every worker and the leader of an analysis team cooperate via the Task Information Server. An object is divided into partitions, which are the entities to be analyzed; a partition can have various kinds of input data sets and thus carries the set number of the partition. The leader marks 'Pass' or 'Fail' in the File State. The file location is linked after finding the registration data of the analysis files in the Web DB, as shown in Fig. 4.
(a) Registration into TIS
(b) Finding out interested files
Fig. 4. The Web interface of the task information server
The characteristics of the data exchange system are inherited from its messaging facility, the SNS. The system is basically in the form of a peer-to-peer data exchange system, classified as system specific in [1]; however, at times this system takes the form of a central data exchange system, which is called system neutral [1]. As the team leader and the team workers are peers while working on their tasks, they are
peers on the SNS. As the team leader plays the leader's role, at times the leader is a center of the team on the SNS. Likewise, their workstations transfer files in a peer-to-peer manner, or the leader's workstation becomes the data center of the team, when exchanging the data.
5 Summary
Computer-supported collaboration supports users working together while geographically distant from each other. Today's SNS has functionality close to that of CSCW tools and seems to have evolved enough to take the place of the CSCW tools of yesterday. Furthermore, SNS is advantageous in recruiting members adequate to the cooperative work, which is an intrinsic strength of SNS. The concept of an open system could make a good workplace for the paradigm of convergence. A number of items are addressable on exclusive lists by anyone; the task information server shows those items to anyone interested, and that should be one way toward convergence from this open system. Although SNS is not used as a messaging facility there, a study [4] uses network analysis and mapping technology to explore the characteristics of research collaboration in the academic community; large-scale networks were constructed using empirical data representing research partnerships formed through a national research program, in practice. Cooperative work in CAE, the cooperative analyses, proceeds by exchanging related data between the workers and the leader in a team. Multiple teams can conduct their analyses concurrently under the directions of the head analyst. Directions of the team leaders and the head analyst are delivered through SNS, and the data with information are transferred in a peer-to-peer manner.
References
1. Vornholt, S., Geist, I., Li, Y.: Categorisation of Data Management Solutions for Heterogeneous Data in Collaborative Virtual Engineering. In: Int. Works. Digital Engineering, pp. 9–16. ACM, New York (2010)
2. Ok, M.-H., Kwon, T.-S.: A Conceptual Framework of the Cooperative Analyses in Computer-Aided Engineering. In: IFIP World Computer Congress, pp. 239–245. Springer, Boston (2008)
3. Lee, J.-K., Kim, H.S., Kuk, S.H., Park, S.-W.: Development of an e-Engineering Framework Based on Service-Oriented Architectures. In: Luo, Y. (ed.) CDVE 2006. LNCS, vol. 4101, pp. 251–258. Springer, Heidelberg (2006)
4. Luo, Y.-L., Hsu, C.-H.: An Empirical Study of Research Collaboration Using Social Network Analysis. In: Int. Conf. Computational Science and Engineering, pp. 921–926. IEEE, Los Alamitos (2009)
Online Social Media in a Disaster Event: Network and Public Participation
Shu-Fen Tseng 1, Wei-Chu Chen 1, and Chien-Liang Chi 2,*
1 Yuan Ze University, 135 Yuan-Tung Rd., Chung-Li City 320, Taiwan
2 National Taiwan University, 1, Sec. 4, Roosevelt Rd., Taipei City 106, Taiwan
[email protected]
Abstract. In August 2009, Typhoon Morakot struck southern Taiwan and caused the most devastating disaster of the decade. In this disaster, netizens used internet tools such as blogs, Twitter, and Plurk to transmit a great amount of timely information including emergencies, rescue actions and donations. By reviewing electronic documents and interviewing eight major micro-bloggers active during that period, two major functions of social media are identified: information dissemination and resource mobilization. In sum, three major findings are drawn from this study: (1) micro-blogging applications presented potential for public participation and engagement in crisis events; (2) the end users of blogs, Twitter, and Plurk successfully employed collective networking power and acted as vital collaborators in this disaster event; (3) the use of social media as a more efficient disaster backchannel communication mechanism demonstrates the possibility of collaboration between government and the participating public in times of disaster. Keywords: social media, micro-blogging, disaster, public participation.
1 Introduction
Recently, the use of social media in disaster management and emergency planning has been making waves in the US and worldwide. In a disaster event, information communication and resource mobilization are important functions for rescuing disaster victims from danger. Several disaster events (e.g. Hurricane Katrina in 2005, the Virginia Tech shooting and the Southern California wildfires in 2007, the Seattle-Tacoma shooting in 2009) have demonstrated the important role that individual citizens can play during a disaster crisis and, crucially, the value of social media in the whole process. In general, this research suggests social media sites provide citizens with a means to create, disseminate, and share information with a wide audience instantaneously in times of emergency [1][2][3][4]. The internet has changed the speed with which people and information can converge around disaster events, as well as the distance from which people can participate [5].
* Corresponding author.
Information communication technologies that support peer-to-peer communication, and specifically social media or web 2.0 applications like Facebook, Flickr, and messaging services like Twitter, serve as a new means for disaster survivors, curious onlookers and compassionate helpers to find information and to assist others. For example, Twitter was used in the Southern California US wildfires in 2007 to inform citizens of time-critical information about road closures, community evacuations, and shelter information. The 2009 violent shooting crisis in the Seattle-Tacoma area also demonstrated that Twitter was used as one method for citizens, news media organizations, and other types of organizations to share crisis-related information. Liu et al. [6] conducted a longitudinal qualitative study to investigate how disaster-related Flickr activity evolved for six disasters between 2004 and 2007, and concluded that the information people voluntarily capture, gather and aggregate provides useful data for disaster response and recovery. It is argued that the general public increasingly relies on peer-distributed information, often finding it to be more timely and accurate. In events that are spatially diffuse (like hurricanes and wildfires), sources of information from multiple eyes on the ground can be more helpful than official news sources because the information can provide a more local context and rapid updates for those who need to make decisions about how to act. In research during the 2007 Southern California fires, researchers concluded that social media as "backchannel" communication tools provide the opportunity for the public to actively engage in the creation of information rather than to be passive consumers. Palen et al. [7] presented a future vision of emergency management that addresses socio-technical concerns in large-scale emergency response. This view expands consideration to include not only official responders, but also members of the public. By viewing public participation as a powerful, self-organizing, and collectively intelligent force, information and communication technology can play a transformational role during disasters and emergency events. This paper uses a recent case in Taiwan, the Morakot typhoon, to explore how social media converged and functioned in this disaster event. Specifically, the study aims at exploring how micro-blogging applications facilitate public engagement and collaboration in an emergency event.
2 Typhoon Morakot
The Central Weather Bureau of Taiwan issued a warning for Typhoon Morakot on August 6, 2009. Typhoon Morakot closed in on Taiwan on August 7; it moved very slowly and made landfall just before midnight. After midnight on August 8, most of the districts in southern Taiwan recorded heavy rainfall (Fig. 1). Typhoon Morakot was one of the deadliest typhoons to impact Taiwan in history. The typhoon caused severe damage to the southern and eastern parts of Taiwan, with record-breaking rainfall of 2,900 millimeters in 3 days, leaving 645 people dead and roughly $3.3 billion USD in damages. The storm produced such enormous amounts of rainfall that the media described it as the worst flooding in 50 years in Taiwan. The extreme amount of rain triggered enormous mudslides and severe flooding throughout southern and eastern Taiwan (Fig. 2). One mudslide buried the entire town of Shiao-lin (Fig. 3) in the southern county of Kaohsiung, killing an estimated 500 people in that village alone.
Fig. 1. The Typhoon Morakot. Source: Central Weather Bureau of Taiwan
Fig. 2. Heavy rainfall caused a building collapse. Source: http://www.nownews.com/photo/photo.php?phid=3044&pid=&no=4#tphoto
Fig. 3. Shiao Lin Village after Typhoon Morakot attacked. Source: http://wangdon.pixnet.net/blog/trackback/54f14bd46d/29048064
During this period, the Central Disaster Prevention and Protection Center, organized by the National Fire Agency, the Ministry of the Interior, the Central Weather Bureau and disaster management units, was unable to provide accurate information and failed to respond to the needs of the disaster areas. Taiwan's president Ma Ying-jeou and his administration faced extreme criticism for the slow response to the disaster, lack of concern and displacement of responsibilities. Looking at the National Disaster Prevention and Protection Commission website on August 9, it only reported incidents up to August 3 and stated 'No major disaster' at the top, even though the Central Weather Bureau had issued the warning for Typhoon Morakot on August 6. The internet community responded faster than the government. Witnessing the widespread destruction caused by Morakot in southern Taiwan, bloggers and netizens decided to take action.
3 Functions of Social Media
By reviewing electronic documents (i.e. websites, blogs, tweets, bulletin boards) and interviewing major micro-bloggers (a total of 8 interviewees) active during that period, we distinguish two major functions that micro-blogging played in this disaster event. The first is information dissemination and networking; the second is supplies and resource mobilization. The major micro-bloggers in this event are classified by these two functions and introduced as follows.
3.1 Information Dissemination and Networking
(a) Morakot Disaster Information on Plurk. Plurk is a popular micro-blogging service, similar to Twitter, in South-East Asia. On August 6, 2009 many separate pieces of disaster-report information appeared on Plurk. A Plurk user, "Gene", wrote an API to collect related information on Plurk and set up an account, "floods", to re-plurk all messages for convenience. Many related unofficial websites used the information on Plurk to provide further assistance.
(b) Morakot Online Disaster Report Center. On August 8, 2009, one day after Typhoon Morakot struck Taiwan, the Association of Digital Culture Taiwan (ADCT, a non-profit organization founded in 2007 by heavy bloggers, with branches located around the island) noticed that much of the information and many of the tweets on Twitter and Plurk mixed correct, incorrect and confusing messages, and decided to build an integrated website (the Morakot Online Disaster Report Center). This website adopted a peer-distributed mechanism to aggregate detailed information, messages and all major online sources, i.e. Twitter, Plurk, the PTT bulletin board, and the media. In a very short time, the website collected more than 30,000 messages from Plurk and 2,000 messages from Twitter. Worth mentioning, with timely information aggregated and rapid responses from the internet community, on August 10 the southern branch of the ADCT, located right next to the local government, became the major information source of the local government. The original purpose of this website was to integrate information about the disaster event on the internet. Later, on August 13, 2009, the ADCT coordinated with the Central Disaster Prevention and Protection Center and two local disaster management centers, and the site became the official website of the local governments for disaster information and resource mobilization.
(c) Morakot Disaster Map. The Morakot Disaster Map (Fig. 4) was built by a doctor nicknamed BillyPan on August 8, 2009. BillyPan and a team of volunteers on Plurk started aggregating and gathering information from all over Taiwan and created a Google Maps mash-up to mark the affected areas. This map not only translated information from the internet into visual map locations; by using symbols it could also show the timely hazard status and the resources needed in the affected areas. Within 24 hours, the map received over 220,000 hits. More than 1,200 affected areas were positioned and located within 2 days. Unlike the ADCT, this group viewed the actions of the official agencies as dysfunctional and failed; members of this group considered internet grass-roots actions more important than governments.
Fig. 4. Morakot Disaster Map Source: http://gis.rchss.sinica.edu.tw/google/?p=1325
(d) http://disastertw.com. Yi-Ting Cheng (a.k.a. xdite) is the chief programmer of the PC home publication group and one of the most famous technology bloggers in Taiwan. On August 9, she built an emergency reporting / resource news exchange system for the Morakot typhoon rescue in an hour. The system was the only website able to handle heavy traffic throughout the typhoon, from beginning to end, and it was also the earliest such system online for citizens to use.
3.2 Supplies and Resources Mobilization
Although not typically defined as social media, the bulletin board system PTT played an important role in resource mobilization in this disaster event. The popularity of PTT is a very unique feature of internet culture in Taiwan. PTT is a terminal-based bulletin board system (BBS), the largest BBS in the country, with more than 1.5 million registered users, a great number of whom are college students in Taiwan. On August 9, users of PTT created an "Emergency" board for disaster reporting and emergency information, and they further formed the PTT users' disaster relief team. To play a more active role in responding to this disaster, on August 10 the PTT disaster relief
team launched a resource and supply fund-raising drive: they raised money online and purchased relief supplies online. They recruited and organized volunteers collectively; in addition, they coordinated with NPOs to manage and deliver relief supplies such as food, clean water, shoes, and sleeping bags to people living in areas hard-hit by the typhoon. In total, they collected more than 80 tons of supplies and resources in two days. The relief action continued until August 22, when the government and local NPOs took over this resource assistance function.
4 Interviews and Findings
In this paper, we interviewed the major micro-bloggers who were heavily engaged in this event. By summarizing their motivations and actions in this disaster, we first identify the reason why micro-blogging applications became the backchannel communication platform in this event. During this period, heavy rainfall caused floods and interrupted traffic and communication infrastructures. The 911-based telephone system faced two problems. One was that it failed to bear and handle an unexpectedly huge amount of information. The other was that the telephone lines were overloaded and calls could not get through. Social media applications became the major alternative for handling the emergency.
Traditional media cannot handle the huge amount of information flow in a disaster event like this one. Once we realized the dysfunction of the formal and hierarchical way of gathering and distributing information, we decided to do it by ourselves. (ADCT 4-B)
Governments didn't have an integrated system that could completely grasp and digest the huge amount of information on the internet, and they couldn't communicate with internet users efficaciously. ... Information rushes in just like a snowball rolling; unless you can process it immediately, it will blow up, and no one can handle that. (ADCT 4-B)
I went to the government website of the Central Disaster Prevention and Protection Center and found out there was nothing on the site. That's why I built an emergency reporting / resource website for rapid and large information flow. (xdite)
Secondly, the functions social media played during this event rested greatly on its capacity to quickly and timely integrate and verify information and respond to those in need. On August 9, after the typhoon had struck Taiwan for two days, the National Disaster Prevention and Protection Commission still reported 'No major disaster' on its website. Active users of social media and anxious netizens were willing, and felt obligated, to do something to help in this emergency.
The only thing we wanted from governments was a timely and accurate information platform. ... In this kind of emergent event, integrating correct information was extremely important. (PTT 4-D)
You need to provide a centralized and integrated website so that citizens don't have to search different levels of governmental websites one by one. We need a platform that provides comprehensive and needed information to the public. (Lee 27)
Blogs, websites, BBS, Facebook, and Plurk all played some role in the process; eventually we found that Twitter is particularly efficient for rapid information dissemination in an emergent situation. (ADCT 4-B)
It (social media) gathered and mobilized resources through collective power. ... The public receives information from it (social media) more quickly than from television or radio. ... Netizens voluntarily organized themselves into resource teams, medical teams, rescue teams, etc. ... This quick mobilizing power could hardly be found in the pre-internet era. (PTT 4-D)
Thirdly, the ADCT's coordination with the Central Disaster Prevention and Protection Center and two local disaster management centers demonstrated that collaboration among formal governments, NPOs and virtual groups on the internet is possible during a disastrous event. With the collective power of micro-blogging, governments can employ the strength of social media in its information integration capacity and further disseminate useful information to the public. Moreover, in this emergent event, netizens were quickly mobilized and organized to distribute resources to the areas in need without government interference. They demonstrated strong collective action and a grass-roots movement in a disastrous event.
I am more than willing to hand this site over to the government for future use. Next time, this site can play a significant role and provide timely information to those who need it. (xdite)
I probably couldn't do anything like those onsite disaster aid and relief teams, but by joining the PTT emergency board, I could contribute my specialty and play a role in this event. (PTT 4-D)
On the internet, netizens do not always have the same opinions. However, people are more likely to gather for the public good during an emergency event. We can group together without asking about personal preferences. (ADCT 4-C)
5 Discussion
In this disaster case study, we summarize three major findings. First, micro-blogging applications presented potential for public participation and engagement. In this event, we found that internet users are more willing to participate in public affairs when a disaster event happens. In addition, micro-blogging services were used more than any other online application to distribute and transfer disaster information as "backchannel" communication in a disaster event.
Secondly, this case demonstrated the collective intelligence of social media. The traditional hierarchical structure of command-and-control in the central and local governments failed to function timely and effectively in this emergency event. The internet community responded faster than the government agencies. In particular, the use of social media offers a vital tool for empowering individual citizens, who become essential nodes and catalysts in responding to threats and natural disasters. Timely information about victims in disaster areas was disseminated to relatives or friends by mobile phones, and updated situations of hazard areas were posted on the internet or called in to TV news channels to inform officials. The affordances of these diverse social media empowered people from different areas to network together and demonstrate grassroots power in responding to the disaster. Each social medium, Facebook, blogs, RSS, Plurk (Twitter), and Google Maps, had its role in this event. They catalyzed people's social participation and facilitated people's abilities for resource mobilization. This study highlights features of social interaction in a highly networked world where the convergence of people, information, and media can create new environments within which collective intelligence and action take place. Thirdly, this case study demonstrates the possibility of coordination between formal governmental organizations and grassroots public participation. A further question is how to account for the role of public participation in formal response efforts. We think there are two pathways that internet grassroots can deliver. One is that ICT-supported citizen communications can spawn information useful to the formal response effort. The availability of these data means that people can further seek and access information from each other. Such a capability can help the formal response effort in collecting and providing information useful to the public. The second is that citizen communications can create new opportunities for the creation of new temporary organizations that help with the informal response effort. ICT-supported communications add powerful means by which this kind of organization can occur. In sum, this case study demonstrates the network and mobilization power of social media. Online use of social media offers a vital tool for public participation and collaboration among different agencies in times of disaster.
References
1. Palen, L.: Online Social Media in Crisis Events. Educause Quarterly (3), 76–78 (2008)
2. Sutton, J., Palen, L., Shklovski, I.: Backchannels on the Front Lines: Emergent Uses of Social Media in the 2007 Southern California Wildfires. In: Proceedings of the 5th International ISCRAM Conference, Washington, D.C. (2008)
3. Heverin, T., Zach, L.: Microblogging for Crisis Communication: Examination of Twitter Use in Response to a 2009 Violent Crisis in the Seattle-Tacoma, Washington Area. In: Proceedings of the 7th International ISCRAM Conference, Seattle, USA (2010)
4. White, C., Plotnick, L., Kushma, J., Hiltz, S.R., Turoff, M.: An Online Social Network for Emergency Management. In: Proceedings of the 6th International ISCRAM Conference, Gothenburg, Sweden (2009)
5. Hughes, A.L., Palen, L.: Twitter Adoption and Use in Mass Convergence and Emergency Events. In: Proceedings of the 6th International ISCRAM Conference, Gothenburg, Sweden (2009)
6. Liu, S.B., Palen, L., Sutton, J., Hughes, A., Vieweg, S.: In Search of the Bigger Picture: The Emergent Role of On-Line Photo Sharing in Times of Disaster. In: Proceedings of the 5th International ISCRAM Conference, Washington, D.C. (2008)
7. Palen, L., Anderson, K.M., Mark, G., Martin, J., Sicker, D., Palmer, M., Grunwald, D.: A Vision for Technology-Mediated Support for Public Participation & Assistance in Mass Emergencies & Disasters. In: Proceedings of ACM-BCS Visions of Computer Science 2010, Edinburgh, United Kingdom (2010)
Qualitative Comparison of Community Detection Algorithms
Günce Keziban Orman 1,2, Vincent Labatut 1, and Hocine Cherifi 2
1 Galatasaray University, Computer Science Department, Istanbul, Turkey
2 University of Burgundy, LE2I UMR CNRS 5158, Dijon, France
[email protected]
Abstract. Community detection is a very active field in complex network analysis, consisting in identifying groups of nodes more densely interconnected relative to the rest of the network. The existing algorithms are usually tested and compared on real-world and artificial networks, their performance being assessed through some partition similarity measure. However, the realism of artificial networks can be questioned, and the appropriateness of those measures is not obvious. In this study, we take advantage of recent advances concerning the characterization of community structures to tackle these questions. We first generate networks with the most realistic model available to date. Their analysis reveals they display only some of the properties observed in real-world community structures. We then apply five community detection algorithms to these networks and find that the performance assessed quantitatively does not necessarily agree with a qualitative analysis of the identified communities. It therefore seems both approaches should be applied to perform a relevant comparison of the algorithms. Keywords: Complex Networks, Community Detection, Community Properties, Algorithms Comparison.
1 Introduction
The use of networks as modeling tools has spread through many application fields during the last decades: biology, sociology, physics, computer science, communication, etc. (see [1] for a very complete review of applied studies). Once a system has been modeled, the resulting network can be analyzed or visualized thanks to some of the many tools designed for graph mining. Such large real-world networks are characterized by a heterogeneous structure, leading to specific properties. In particular, a heterogeneous distribution of links often results in the presence of a so-called community structure [2]. A community roughly corresponds to a group of nodes more densely interconnected, relatively to the rest of the network [3]. The way such a structure can be interpreted is obviously dependent on the modeled system. However, independently from the nature of this system, it is clear the community structure conveys some very important information, necessary to a proper understanding [4]. Detecting communities is therefore an essential part of modern network analysis.
Many different community detection algorithms exist [2], which can diverge in two ways: first, in the process leading to an estimation of the community structure, but also in the nature of the estimated communities themselves. This raises a question regarding the comparison of these algorithms, from both a theoretical and a practical point of view. Authors traditionally test their community detection algorithms on real-world and/or artificial networks [5, 6]. The performance is assessed by comparing the estimated communities with some community structure of reference using a quality (e.g. modularity [3]) or association (e.g. Normalized Mutual Information [7]) measure. This single value is then compared with those obtained when applying preexisting algorithms to the same data. The main problem with this approach is its purely quantitative aspect: it completely ignores the nature of the considered community structures. Two algorithms can reach the exact same level of performance, but still estimate very different community structures. Besides performance evaluation, the data used to perform the tests are also subject to some limitations. For realism reasons, it is necessary to apply the algorithms to real-world networks, but these are not sufficient because 1) reference communities (i.e. ground truth) can rarely be defined objectively, and 2) the topology of the selected networks can hardly be diverse enough to represent all types of systems. Testing on artificial networks can be seen as complementary, because they overcome these limitations. Indeed, a random model allows generating as many networks as desired, while controlling some of their topological properties. The only issue is the realism of the obtained networks, which should mimic real-world networks closely in order to get relevant test results. Some properties common to most real-world networks are well identified: power-law distributed degree, small-worldness, non-zero degree correlation and relatively high transitivity [8]. Additionally, networks with a community structure are characterized by a power-law distributed community size [9]. Several generative models with increasing realism were successively designed [6] before finally meeting these constraints [6, 10, 11]. However, recent studies showed other properties besides the community size distribution can be used to characterize real-world community structures [4, 12], such as community-wise density and average distance, hub dominance, and embeddedness. They allow a more detailed description of the internal topology of the communities, and of the way they are interconnected. In our opinion, these new results have two important consequences for the problem of community detection assessment. First, the question of whether the artificial networks used as benchmarks also exhibit these properties arises naturally. But more importantly, it is now possible to perform a qualitative comparison of the communities identified by different algorithms, instead of relying only on a single performance measure. In this article, we try to answer both questions using the realistic generative model LFR [6] and a representative set of community detection algorithms. In section 2, we review in greater detail the properties used to describe community structures in complex networks. In section 3, we briefly present the LFR model and its properties, and introduce an adjustment that improves the realism of the networks it generates.
We also review the various approaches used in the literature to define the concept of community, and describe a selection of community detection algorithms from this perspective. In section 4, we first analyze the properties of the generated
networks. We then compare the community structures detected by the selected algorithms on these networks. We consider both their similarity to the community structure of interest, and how community properties differ from one algorithm to the other. Finally, we discuss our results and explain how our work could be extended.
2 Characterization of a Community Structure
Complex networks are often characterized at microscopic and macroscopic levels, i.e. by studying the characteristics of nodes taken individually and of the network considered as a whole, respectively. The microscopic approach focuses on some nodes of interest, and tries to identify which features allow distinguishing them from the rest of the network (degree, centrality, local transitivity, etc.). At the macroscopic level, one can take advantage of the multiplicity of nodes to derive statistics or distributions summarizing some of the network features (degree distribution, degree correlation, average distance, transitivity, etc.). The development of community detection corresponds to the emergence of a mesoscopic level, and highlights the need for adapted tools to characterize the community structure. In this section, we present a selection of the mesoscopic measures recently proposed and indicate how real-world networks behave with respect to them. In some cases, no general observation can be made, and one has to consider the class of the network: communication, biological, social, etc. Note that other measures exist besides those described here, such as the network community profile [12] or roles distribution [13].
The community size distribution is considered an important characteristic of the community structure. It has been largely studied in real-world networks, and seems to follow a power law [9, 14] with exponent ranging from 1 to 2 [15]. This means community sizes are heterogeneous, with many small communities and only a few very large ones.
The embeddedness measure assesses how much the direct neighbors of a node belong to its own community. It is defined as the ratio of the internal degree to the total degree of the considered node [4]:

$e = k^{int} / k$    (1)

The internal degree $k^{int}$ is the number of links the node has with other nodes of the same community, as opposed to its external degree $k^{ext}$, which corresponds to connections with nodes located in other communities. The maximal embeddedness of 1 is reached when all the neighbors of the node are in its community ($k^{int} = k$), whereas the minimal value of 0 corresponds to the case where all neighbors belong to other communities ($k^{int} = 0$). In real-world networks, a majority of nodes, usually with low degree, have a very high embeddedness. The rest are distributed depending on the considered class: communication, Internet and biological networks exhibit a peak around 0.5, whereas social and information networks have a more uniform distribution. In all cases, the whole range of $e$ is significantly represented, even small values [4].
The density of a community is defined as the ratio of the links it actually contains, noted $m_c$, to the number of links it could contain if all its nodes were connected. In the case of an undirected network, the latter is $n_c(n_c - 1)/2$, where $n_c$ is the number
of nodes in the community, and we therefore get $\rho = 2 m_c / (n_c(n_c - 1))$. When compared to the overall network density, the density allows assessing the cohesion of the community: by definition, a community is supposed to be denser than the network it belongs to. The scaled density is a variant obtained by multiplying the density by the community size [4]:

$\tilde{\rho} = \rho \, n_c = 2 m_c / (n_c - 1)$    (2)

If the considered community is a tree, it has only $n_c - 1$ links, and $\tilde{\rho} = 2$. If it is a clique (completely connected subgraph), then $m_c = n_c(n_c - 1)/2$ and we have $\tilde{\rho} = n_c$. The scaled density therefore allows characterizing the structure of the community. Some real-world networks such as the Internet or communication networks have essentially tree-like communities. On the contrary, for other classes like social and information networks, the scaled density increases with the community size. Finally, biological networks exhibit a hybrid behavior, their small communities being tree-like whereas the large ones are denser and close to cliques [4].
The distance between two nodes corresponds to the length of their shortest path. When averaged over all pairs of nodes in a community, it allows assessing the cohesion of this community. In real-world networks, small communities ($n_c < 10$) are supposedly small-world, which means the average distance $\ell$ should increase logarithmically with the community size [4]. For larger communities, the average distance still increases, but more slowly, or even stabilizes for certain classes like communication networks. A small average distance can be explained by a high density (social), the presence of hubs (communication, Internet), or both (biological, information).
From a community structure perspective, a hub is a node connected to many of the other nodes belonging to the same community. The presence of a central hub in a community can be assessed using the hub dominance measure, which corresponds to the following ratio:

$h_c = \max_{i \in c}(k_i^{int}) / (n_c - 1)$    (3)

The numerator is the maximal internal degree found in the community $c$, and the denominator is the maximal degree theoretically possible given the community size. The hub dominance therefore reaches 1 when at least one node is connected to all other nodes in the community. It can be 0 only if no nodes are connected, which is unlikely for a community. In real-world networks, the behavior of this property depends on the considered class. For communication networks, it is close to the maximum for all community sizes, meaning hubs are present in all communities. Considering their communities are sparse and tree-like, one can conclude they are star-shaped. Other classes do not have as many hubs in their large communities, which is why their hub dominance generally decreases as community size increases [4].
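To make these mesoscopic measures concrete, the following is a minimal plain-Python sketch (not part of the original study) that computes per-node embeddedness (eq. 1) and per-community scaled density (eq. 2) and hub dominance (eq. 3) from an undirected edge list; the function name community_measures and the toy graph at the end are assumptions introduced purely for illustration.

```python
from collections import defaultdict

def community_measures(edges, membership):
    """Per-node embeddedness (eq. 1), and per-community scaled density (eq. 2)
    and hub dominance (eq. 3), for an undirected graph given as an edge list
    and a dict node -> community id."""
    degree = defaultdict(int)      # total degree k
    internal = defaultdict(int)    # internal degree k_int
    comm_nodes = defaultdict(set)
    comm_links = defaultdict(int)  # m_c: number of links inside community c
    for node, comm in membership.items():
        comm_nodes[comm].add(node)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if membership[u] == membership[v]:
            internal[u] += 1
            internal[v] += 1
            comm_links[membership[u]] += 1
    embeddedness = {u: internal[u] / degree[u] for u in degree}            # eq. (1)
    scaled_density, hub_dominance = {}, {}
    for c, nodes in comm_nodes.items():
        n_c = len(nodes)
        if n_c > 1:
            scaled_density[c] = 2 * comm_links[c] / (n_c - 1)              # eq. (2)
            hub_dominance[c] = max(internal[u] for u in nodes) / (n_c - 1) # eq. (3)
    return embeddedness, scaled_density, hub_dominance

# toy graph: a triangle {0, 1, 2} in community "a" and an attached node 3 in "b"
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
membership = {0: "a", 1: "a", 2: "a", 3: "b"}
print(community_measures(edges, membership))
```

On the toy graph, the three-node community is a clique, so its scaled density equals its size and its hub dominance is 1.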
3 Methods
Our experiment has two steps: first, we generate a set of artificial networks and study the realism of their community-related topological properties; second, we apply a
selection of community detection algorithms on these networks and analyze the properties of the community structure they estimate. In this section, we first describe the LFR model we applied during the first step, which supposedly allows generating the most realistic networks in terms of overall properties (degree distribution, small-worldness, etc.) [6, 10, 11]. Then, we briefly describe the community detection algorithms we selected, and explain how they differ in the way they handle the concept of community.
3.1 Network Generation
Only a few models have been designed to generate networks possessing a community structure. Girvan and Newman seemingly defined the first one [5], which produces networks taking roughly the form of sets of small interconnected Erdős-Rényi networks [16]. Although widely used to test and compare community detection algorithms, the Girvan-Newman method is limited in terms of realism [6], mainly because the degree is not power-law distributed and the communities are small, few, and even-sized. Several variants were defined, allowing to produce larger networks and communities with heterogeneous sizes [2, 7, 17]. More recently, a different approach appeared, based on a rewiring process [6, 18]. It increased the realism level even more by generating networks with power-law distributed degree. Among these newer models, we selected the LFR model, which seems to be the most realistic and was previously used as a benchmark to compare community detection algorithms [6, 10, 19].
The LFR model was proposed by Lancichinetti et al. [6] to randomly generate undirected and unweighted networks with mutually exclusive communities. The model was subsequently extended to generate weighted and/or directed networks, with possibly overlapping communities [19]. However, in this article, we focus on undirected unweighted networks with non-overlapping communities, because the community structure-related properties we want to study have been defined and/or used only for this type of network. The model allows direct control of the following parameters: number of nodes $n$, desired average degree $\langle k \rangle$ and maximum degree $k_{max}$, exponent $\gamma$ for the degree distribution, exponent $\beta$ for the community size distribution, and mixing coefficient $\mu$. The latter represents the desired average proportion of links between a node and nodes located outside its community, called inter-community links. Consequently, the proportion of intra-community links is $1 - \mu$. A node of degree $k$ therefore has an external degree of $\mu k$ and an internal degree of $(1 - \mu)k$. The generative process first uses the configuration model (CM) [20] to generate a network with average degree $\langle k \rangle$, maximum degree $k_{max}$ and a power-law degree distribution with exponent $\gamma$. Second, virtual communities are defined so that their sizes follow a power-law distribution with exponent $\beta$. Each node is randomly assigned to a community, provided the community size is greater than or equal to the node's internal degree. Third, an iterative process takes place to rewire certain links, in order to approximate $\mu$, while preserving the degree distribution. For each node, the total degree is not modified, but the ratio of internal and external links is changed so that the resulting proportion gets close to $\mu$.
By construction, the LFR method guarantees values considered as realistic [1, 8] for several properties: size of the network, power-law distributed degrees and community sizes. Other properties are not directly controlled, but were studied
empirically [10]. It turns out LFR generates small-world networks, with relatively high transitivity and degree correlation. This is realistic [8], but holds only under certain conditions. In particular, transitivity and degree correlation are dramatically affected by changes in $\mu$, and become clearly unrealistic when it departs from 0. An adjustment was proposed to solve this issue, consisting in using a different generative model during the first step [11]. By applying Barabási and Albert's preferential attachment model (BA) [21] instead of the CM, the degree correlation and transitivity become more stable relative to changes in $\mu$. It is rather clear the mixing coefficient is complementary to the embeddedness presented in section 2 (eq. 1): $\mu = 1 - e$. Yet, it was mentioned in the same section that the embeddedness varies much from one node to the other in real-world networks, exhibiting bimodal and flat distributions. From this point of view, the LFR model is not realistic, since it produces networks whose nodes all have roughly the same mixing coefficient. To solve this problem, we implemented a small adjustment allowing the complete distribution of $\mu$ to be specified in place of a single objective value.
3.2 Community Detection
Because of their number and great diversity, it is difficult to categorize community detection algorithms. Here, we chose to characterize them not by considering the process they implement, as is usually done, but rather by the definition of the community concept they rely upon. We selected a representative set of algorithms, favoring fast ones because of the size of the analyzed networks. We give very partial descriptions in this section, so the reader might want to consult the review by Fortunato [2] to find more information concerning community detection.
A very widespread informal definition of the community concept considers it as a group of nodes densely interconnected compared to the other nodes [2, 14, 22]. In other terms, a community is a cohesive subset clearly separated from the rest of the network. Formal definitions differ in the way they translate and combine both these aspects of cohesion and separation. A direct translation of the informal definition given above consists in first specifying two distinct measures to assess separately cohesion and separation, and then processing an overall measure by considering their difference or ratio. This approach led to many variants, differing on how the measures are defined and combined. The most widespread one is certainly the modularity, a chance-corrected measure which assesses cohesion and separation through the number of intra- and inter-community links, respectively. We selected two modularity optimization algorithms, which differ in the way they perform this optimization. Fast Greedy applies a basic greedy approach [3], and Louvain includes a community aggregation step to improve processing on large networks [23].
Another family of approaches is based on node similarity measures. Such a measure allows translating the topological notions of cohesion and separation in terms of intra-community similarity and inter-community dissimilarity. In other terms: a community is viewed as a group of nodes which are similar to each other, but dissimilar from the rest of the network. Once all node-to-node similarities are known, detecting a community structure can be performed by applying a similarity-based classic cluster analysis algorithm [24].
We selected the Walktrap algorithm, which uses a similarity based on random walks and applies a hierarchical agglomerative clustering approach [17].
Some approaches based on data compression, unlike the previous definitions, do not use the cohesion and separation concepts. They consider the community structure as a set of regularities in the network topology, which can be used to represent the whole network in a more compact way. The best community structure is supposed to be the one maximizing compactness while minimizing information loss. The quality of the representation is assessed through measures derived from mutual information. Algorithms essentially differ in the way they represent the community structure and how they assess the quality of this representation. We selected the InfoMap algorithm [25], whose representation is based on coded ids assigned to nodes.
The definition of the community concept is not always explicit: procedural approaches exist, in which the notion of community is implicitly defined as the result of the processing. To illustrate this, we selected the MarkovCluster algorithm, which simulates a diffusion process in the network to detect communities [26]. This approach relies on the transfer matrix of the network, which describes the transition probabilities for a random walker evolving in this network. Two transformations are iteratively applied to this matrix until convergence. The resulting matrix can be interpreted as the adjacency matrix of a network with disconnected components, which correspond to communities in the original network.
We should also mention another family of algorithms, based on link centrality. They iteratively remove the most central links until disconnected components are obtained, which are interpreted as the network communities. The community structure largely depends on the selected centrality measure, e.g. edge-betweenness [5]. However, the computational cost of such algorithms is very high and we were not able to apply them to our data.
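To give a concrete picture of the experimental pipeline described in this section, here is a hedged sketch that generates an LFR benchmark and applies four of the selected algorithms, assuming the networkx library (for its built-in LFR generator) and python-igraph (version 0.8 or later, for Graph.from_networkx and the community_* methods) are available. The parameter values below are the small ones from the networkx documentation so the generator converges quickly; the full-scale settings of section 4.1 would replace them in practice. Note that this built-in generator supports only a single mixing coefficient, not the per-node distribution of our adjustment, and that MarkovCluster is not bundled with igraph, so both are omitted here.

```python
import networkx as nx
import igraph as ig

# LFR-style benchmark. Small, documented parameter values are used here so the
# generator converges quickly; the full-scale parameters of section 4.1
# (n = 10^4, <k> = 30, k_max = 10^3, gamma = 3, beta = 2) would replace them.
G = nx.LFR_benchmark_graph(n=250, tau1=3, tau2=1.5, mu=0.1,
                           average_degree=5, min_community=20, seed=10)

# Planted (reference) communities, encoded as a membership list.
communities = {frozenset(G.nodes[v]["community"]) for v in G}
comm_index = {c: i for i, c in enumerate(communities)}
reference = [comm_index[frozenset(G.nodes[v]["community"])] for v in G]

# Convert to igraph (vertex order follows the networkx node order) and run
# four of the selected algorithms.
g = ig.Graph.from_networkx(G)
estimated = {
    "Fast Greedy": g.community_fastgreedy().as_clustering(),
    "Louvain":     g.community_multilevel(),
    "Walktrap":    g.community_walktrap().as_clustering(),
    "InfoMap":     g.community_infomap(),
}
for name, clustering in estimated.items():
    nmi = ig.compare_communities(reference, clustering.membership, method="nmi")
    print(f"{name}: {len(clustering)} communities, NMI = {nmi:.3f}")
```

The printed NMI values play the same role as those reported in Table 1 below.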
4 Results and Discussion
4.1 Properties of the Generated Communities
Using our review of the literature, we selected realistic values for the parameters the LFR model lets us control. We used three different network sizes $n \in \{10000, 100000, 500000\}$, constant average and maximal degrees $\langle k \rangle = 30$ and $k_{max} = 1000$, and exponent $\gamma = 3$ for the degree power-law distribution. The exponent was $\beta = 2$ for the community size distribution, whose lower and upper bounds were also fixed. The mixing coefficient $\mu$ was distributed uniformly over its definition domain $[0; 1]$. We generated 5 network instances for each combination of parameter values, in order to check for consistency.
Among the community-related properties we described in section 2, two are directly controlled by the LFR model: the community size and embeddedness distributions. Our measurements confirm on all networks that the community sizes follow a power-law distribution, as expected (cf. Fig. 1). Note the range of these sizes varies much from one real-world network to the other, and it is therefore difficult to describe a typical set of values. However, we can say the communities we generated are very similar in size to those from real-world networks of comparable size. For instance, we have communities containing between 15 and 700 nodes for $n = 10000$, which is compliant with what was observed in networks of this size [4]. For the embeddedness,
we obtained a uniform distribution, as expected. It is close enough to what can be observed in social and information networks. The main difference is we do not have as many nodes with very high embeddedness as in those real-world networks. This could be easily corrected though, by specifying a more appropriate distribution when applying the modified LFR model.
We now focus our attention on the uncontrolled properties. The results are very similar independently of the size of the network. The only difference seems to be that values measured on larger networks exhibit slightly smaller dispersion. For this reason we present only results for networks of size $n = 10000$. The scaled density increases from 11 to 22 along with the community size. This means the smallest communities are clique-like ($\tilde{\rho} \approx n_c$), and no tree-like communities are generated ($\tilde{\rho} \gg 2$). These features cannot be considered realistic: as mentioned before, in real-world networks the small communities are tree-like and the large ones are either tree-like too, or much more clique-like. In other words, real-world networks exhibit two different behaviors, and the generated networks match neither of them. It seems the links are distributed too homogeneously over the generated networks, making small communities too dense and large ones too sparse. As shown in Fig. 1, the average distance increases regularly from 1.5 to 2.5 along with the community size. The main difference with real-world networks is that the latter have a much lower average distance for the smallest communities, reaching values slightly greater than 1. Consequently, we do not observe for the generated networks the fast increase of average distance which was characteristic of the real-world networks. For the rest of the communities, the observed distribution is comparable with communication networks though, with a stable average distance for medium and large communities. Moreover, the values measured for these communities are also realistic in terms of magnitude.
[Fig. 1: four panels plotted against community size — frequency, scaled density, average distance, and hub dominance.]
Fig. 1. Properties of the generated communities. Each network instance is represented with a different shape/color. Points are averages over logarithmic bins of the community size. The dotted lines in the scaled density plot represent its limits ($\tilde{\rho} = 2$ and $\tilde{\rho} = n_c$, cf. section 2).
Hub dominance is very high for the smallest communities, with values close to 1. For large communities, there is no general trend in the $n = 10000$ networks: the property varies much over the networks we generated. This dispersion decreases when the network size increases though, and the $n = 500000$ networks show a hub dominance decrease with community size increase, reaching values close to 0.3. This behavior is compatible with most classes of real-world networks, which exhibit hub dominance mainly for small communities. Moreover, the same dispersion was also observed on real-world networks [4]. This measure relies completely on the way high-degree nodes are distributed over communities, since it directly depends on the maximal internal degree found in communities. The fact that there are far fewer large communities, due to their power-law distributed sizes, can explain this dispersion. A possible solution would be to consider a measure based on the highest internal degrees of the community instead of a single one.
To summarize our observations: the generated communities exhibit some, but not all, of the properties observed on real-world networks. Their sizes are realistic, but the distribution of links is not always appropriate. The small communities are too dense and clique-like, when they should be sparser and tree-like, with a smaller average distance. In other terms, they should be star-shaped. They do possess the high hub dominance characteristic of such structures, but this is certainly due to their clique-like configuration. The fact that their average distance is much higher than in comparable real-world communities is a surprise. Indeed, one would expect such dense, hub-dominated communities to have a lower average distance. It turns out they are constituted of a clique core and a few very low degree nodes connected to this core: the latter explain the relatively high average distance. The larger communities, on the contrary, should be substantially denser and more clique-like. In some cases, their hub dominance is relatively low despite their small average distance and low density, which seems to indicate they do not contain a main central hub, but several interconnected medium ones. By definition, this feature is not reflected by the hub dominance measure, which only considers the maximal degree in the community. Another issue is the fact that the generated networks do not comply with a specific class of real-world networks, but rather have similarities with different classes depending on the considered property. Their average distance has common points with communication networks, whereas this is not the case at all for their embeddedness and hub dominance distributions, which look like those of social and biological networks. Despite these limitations, the model produces what we think to be the most realistic networks to date, which is why the generated networks constitute an appropriate benchmark to analyze community detection algorithms.
4.2 Comparison of the Estimated Communities
We applied the selected community detection algorithms to the generated networks: Louvain (LV), Fast Greedy (FG), MarkovCluster (MC), InfoMap (IM) and Walktrap (WT). For reasons of computation time, it was possible to process networks with sizes $n = 10000$ and $n = 100000$, but not $n = 500000$. We did, however, generate denser $n = 100000$ networks, with $k_{max} = 3000$ (instead of 1000), in order to study the effect of density.
Table 1 displays the performance of each algorithm expressed in terms of Normalized Mutual Information (NMI), which is a measure assessing the similarity of two partitions (in
our case: the reference and estimated community structures). It is considered to be a good performance measure for community detection, and was used in several studies [6, 7, 10]. According to the NMI, IM clearly finds the closest community structure to the reference, followed by MC, LV, and WT, while FG is far behind. This type of quantitative analysis is characteristic of existing works dealing with algorithm comparison. In the rest of this section, we complete it with a qualitative analysis based on the previously presented community properties.
We first focus on the results obtained on $n = 10000$ networks. As we can see in Fig. 2, most algorithms have found communities whose size distribution is reminiscent of the power law used during network generation. However, important differences exist between them. First, MC visibly finds many very small communities ($n_c < 5$), and the other sizes are consequently strongly under-represented. A more thorough verification showed most of these communities are even single nodes, which is particularly problematic since community identification consists in grouping them. It is important to remark that this does not appear in the NMI values, since MC has the second best score. This raises a question regarding the appropriateness of this measure for assessing community detection performance. IM also finds some small communities, but far fewer than MC, and the rest of the distribution is more similar to the reference. Compared to the reference and the other algorithms, communities detected by FG and LV have sizes distributed rather uniformly. Interestingly, these two algorithms have very different performances in terms of NMI, so despite the relatively similar sizes of their communities, their community structures are probably very different too. For WT, the size distribution is very close to the reference. Again, this fact alone is not equivalent to a high NMI value, since its performance is substantially lower than IM's.
Table 1. Algorithms performances, as measured with the Normalized Mutual Information
Algorithm        n = 10^4    n = 10^5, k_max = 10^3    n = 10^5, k_max = 3·10^3
Louvain            0.80              0.78                      0.80
Fast Greedy        0.59              0.66                      0.67
MarkovCluster      0.83              0.87                      0.80
InfoMap            0.88              0.93                      0.91
Walktrap           0.77              0.79                      0.78
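As a reference for how scores of this kind are obtained, the following is a minimal plain-Python sketch of a normalized mutual information between two partitions given as membership lists, normalizing the mutual information by the average of the two partition entropies (one common convention; the exact variant used in the paper may differ). The function name nmi and the toy example are illustrative, not part of the paper.

```python
from collections import Counter
from math import log

def nmi(part_a, part_b):
    """Normalized mutual information between two partitions, given as
    membership lists of equal length (illustrative implementation)."""
    n = len(part_a)
    assert n == len(part_b) and n > 0
    count_a = Counter(part_a)
    count_b = Counter(part_b)
    joint = Counter(zip(part_a, part_b))
    # mutual information I(A;B)
    mi = sum(c / n * log(c * n / (count_a[a] * count_b[b]))
             for (a, b), c in joint.items())
    # entropies H(A) and H(B)
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    if h_a + h_b == 0:   # both partitions consist of a single community
        return 1.0
    return 2 * mi / (h_a + h_b)

# identical partitions (up to label permutation) give NMI = 1
print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))   # -> 1.0
```

Partitions that agree only partially give values strictly between 0 and 1, which is how the rankings in Table 1 are read.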
For the embeddedness, MC and WT are clearly different from the reference, displaying a distribution with very few extreme embeddedness values. The small number of highly embedded nodes and the fact that almost half the nodes have very low embeddedness with MC seem to be linked to the community size distribution. Many of the smallest communities identified by MC are certainly grouped together in the reference, leading to a smaller number of inter-community links. Compared to the reference, WT does not contain nodes with low embeddedness, whereas it has more nodes with medium embeddedness. In this case, it cannot be related to the community sizes though, since they are comparable to those of the reference. Maybe the lack of low embeddedness nodes can be interpreted as a failure to classify interface nodes, located at the border of their community and largely connected with other communities. The
embeddedness distributions observed for FG and LV are again very similar. They also lack low embeddedness nodes, but not as much as WT. Finally, IM presents the values most similar to the reference. When considering the scaled density (Fig. 2), IM, MC and WT are very close to the reference, except that IM and MC present very low values for their smallest communities (meaning these are tree-like). For FG and LV, the scaled density is relatively stable, and does not present the slow increase which is characteristic of the reference. This can be interpreted as meaning that the communities detected by these algorithms all present the same structure, independently of their size. The average distances measured on the FG and LV communities are widely dispersed and do not follow the evolution observed for the reference. FG, in particular, has a much higher average distance than the reference and the other algorithms. This property is a good indicator of cohesion, so it seems this quality is absent from the communities identified by FG. The remaining algorithms (IM, MC, WT) are very close to the reference. IM displays two outliers though: the average distance is surprisingly high for its smallest and largest communities.
[Fig. 2: five panels plotted against community size (or embeddedness) — community size frequency, embeddedness distribution, scaled density, average distance, and hub dominance — for the reference and the five algorithms (Louvain, Fast Greedy, MarkovCluster, InfoMap, Walktrap).]
Fig. 2. Properties of the detected communities. Each shape/color corresponds to a different algorithm, whereas the reference is represented by a solid line. Points are averages over logarithmic bins of the community size. The dotted lines in the scaled density plot represent the limits of this property, as in Fig. 1.
For hub dominance, IM, MC and WT seem to follow the reference, with a positive bias. The fact that these algorithms have slightly higher scaled density and hub dominance, relative to the reference, is consistent with their slightly lower average distance. The inverse observation is valid for the smallest and largest communities detected by IM: sparse and non-centralized communities lead to a high average distance. FG and LV once again display similar behaviors, with hub dominance values clearly below the reference. When also considering their stable scaled density, this can explain their increasing average distance.
The topological analysis of the estimated community structures gives a new perspective to the quantitative performance measures. The communities detected by IM, the best algorithm in terms of NMI, are unsurprisingly very close to the reference ones. However, MC, the second algorithm and not far from IM, presents a very different community structure, characterized by many more very small communities. On the contrary, most of the properties of the communities identified by WT are very similar to the reference. It only differs clearly in terms of embeddedness distribution, which is apparently sufficient to rank it only fourth in terms of NMI, relatively far from IM. It thus seems there is no equivalence between a high NMI value and a community structure with properties close to the reference. We conclude both approaches are complementary for performing a relevant analysis of community detection results. It is worth noticing that LV and FG, both based on modularity optimization, differ comparably from the reference, which confirms the importance of considering the community definition which characterizes an algorithm.
Fig. 3. Community size distributions for size 100000 networks, with maximal degree 1000 (left) and 3000 (right). The shapes/colors meaning is the same as in Fig. 2, and points are also averages over logarithmic bins of the community size.
On size 100000 networks, FG and LV find communities much larger than the reference ones, as shown in Fig. 3. For these algorithms, the community size roughly ranges from 100 to 15000, whereas it goes from 15 to 1000 in the reference. Both are based on modularity optimization, so this might be due to the resolution limit characteristic of this measure [27], which prevents them from finding smaller communities. IM is relatively close to the reference, but not as much as it is on 10000 networks. WT is also very similar to the reference, but it departs from it by finding a very large community (around 30000). MC results are relatively similar to those obtained on the 10000 networks, i.e. it finds many very small communities. In order to separate the effects of
network size and density on the algorithms, we generated additional networks with the same size, but maximal degree 3000 (instead of 1000). This slightly reduces the overestimation of the FG and LV community sizes, whereas MC gives roughly the same results. On the contrary, the IM and WT properties are excellent: they follow the reference values almost perfectly.
5 Conclusion
In this study, we took advantage of recent advances relative to the characterization of community structures in complex networks to tackle two questions: 1) Do artificial networks used as benchmarks exhibit real-world community properties? 2) How do community detection algorithms compare in qualitative terms, as opposed to the usual quantitative measurement of their performance? We first applied a variant of the LFR model [6] to generate a set of artificial networks with realistic parameters retrieved from the literature. We studied their properties and concluded that some of them are realistic (community sizes, hub dominance), some are only partly realistic (embeddedness, average distance), and others are not realistic at all (scaled density). We then applied on these networks a representative set of five fast community detection algorithms: Fast Greedy, InfoMap, Louvain, MarkovCluster and Walktrap. It turns out that the performance assessed quantitatively through the widely used Normalized Mutual Information (NMI) measure does not necessarily agree with a qualitative analysis of the identified communities. On the one hand, MarkovCluster, ranked second in terms of NMI, actually found an extremely large number of very small communities and almost no large community. On the other hand, the properties of the community structure estimated by Walktrap are very close to the reference ones, but the algorithm comes fourth in terms of NMI, with a score relatively far from MarkovCluster's. It therefore seems both approaches should be applied to perform a relevant comparison of the algorithms. Our contributions are as follows. First, we introduced a slight modification to the LFR model, in order to make the embeddedness distribution more realistic in the generated networks. Second, we studied these generated networks in terms of community-centered properties. This complements some previous analyses focusing on network-centered properties such as transitivity or degree correlation [6, 10, 11]. Third, we applied several community detection algorithms on these networks and characterized their results relative to the same community-centered properties. Previous studies adopted a quantitative approach based on some performance measure [6, 7, 10, 11, 18]. Our work can be extended in various ways. First, it seems necessary to either increase the realism of the LFR model or to define a completely new approach able to generate more realistic networks. Second, for lack of time, we could test only a few algorithms, on a few relatively large networks. A more thorough analysis would consist in using much larger networks, with more repetitions to improve statistical significance. Moreover, applying several algorithms relying on the same definition of the community concept would allow comparing their properties and maybe associating a certain type of community structure with a certain family of algorithms. It could additionally be interesting to use other performance measures than the NMI to assess their
relevance with respect to the studied topological properties. Third, it would notably be interesting to apply classic network-wise measures to communities (transitivity, degree correlation, centrality, etc.), and to consider additional community-specific measures, such as those designed in [13], which seem complementary to the embeddedness, and the concept of community profile [12], although this one looks particularly costly from a computational point of view.
Acknowledgments. This project is supported by the Galatasaray University Research Fund.
References
1. da Fontoura Costa, L., Oliveira Jr., O.N., Travieso, G., Rodrigues, F.A., Villas Boas, P.R., Antiqueira, L., Viana, M.P., da Rocha, L.E.C.: Analyzing and Modeling Real-World Phenomena with Complex Networks: A Survey of Applications. arXiv physics.soc-ph, 0711.3199 (2008)
2. Fortunato, S.: Community Detection in Graphs. Phys. Rep. 486, 75–174 (2010)
3. Newman, M.E.J., Girvan, M.: Finding and Evaluating Community Structure in Networks. Phys. Rev. E 69, 26113 (2004)
4. Lancichinetti, A., Kivelä, M., Saramäki, J., Fortunato, S.: Characterizing the Community Structure of Complex Networks. PLoS ONE 5, e11976 (2010)
5. Girvan, M., Newman, M.E.J.: Community Structure in Social and Biological Networks. PNAS 99, 7821–7826 (2002)
6. Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark Graphs for Testing Community Detection Algorithms. Phys. Rev. E 78, 46110 (2008)
7. Danon, L., Diaz-Guilera, A., Arenas, A.: The Effect of Size Heterogeneity on Community Identification in Complex Networks. J. Stat. Mech., 11010 (2006)
8. Newman, M.E.J.: The Structure and Function of Complex Networks. SIAM Rev. 45, 167–256 (2003)
9. Guimerà, R., Danon, L., Díaz-Guilera, A., Giralt, F., Arenas, A.: Self-Similar Community Structure in a Network of Human Interactions. Phys. Rev. E 68, 65103 (2003)
10. Orman, G.K., Labatut, V.: A Comparison of Community Detection Algorithms on Artificial Networks. In: Gama, J., Costa, V.S., Jorge, A.M., Brazdil, P.B. (eds.) DS 2009. LNCS, vol. 5808, pp. 242–256. Springer, Heidelberg (2009)
11. Orman, G.K., Labatut, V.: The Effect of Network Realism on Community Detection Algorithms. In: ASONAM, Odense, DK, pp. 301–305 (2010)
12. Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Statistical Properties of Community Structure in Large Social and Information Networks. In: WWW. ACM, Beijing (2008)
13. Guimerà, R., Amaral, L.A.N.: Functional Cartography of Complex Metabolic Networks. Nature 433, 895–900 (2005)
14. Newman, M.E.J.: Detecting Community Structure in Networks. Eur. Phys. J. B 38, 321–330 (2004)
15. Palla, G., Derenyi, I., Farkas, I., Vicsek, T.: Uncovering the Overlapping Community Structure of Complex Networks in Nature and Society. Nature 435, 814–818 (2005)
16. Erdős, P., Rényi, A.: On Random Graphs. Publ. Math. 6, 290–297 (1959)
17. Pons, P., Latapy, M.: Computing Communities in Large Networks Using Random Walks. In: Yolum, P., Güngör, T., Gürgen, F., Özturan, C. (eds.) ISCIS 2005. LNCS, vol. 3733, pp. 284–293. Springer, Heidelberg (2005)
18. Bagrow, J.P.: Evaluating Local Community Methods in Networks. J. Stat. Mech. (2008)
19. Lancichinetti, A., Fortunato, S.: Community Detection Algorithms: A Comparative Analysis. Phys. Rev. E 80, 56117 (2009)
20. Molloy, M., Reed, B.: A Critical Point for Random Graphs with a Given Degree Sequence. Random Structures and Algorithms 6, 161–179 (1995)
21. Barabási, A.-L., Albert, R.: Emergence of Scaling in Random Networks. Science 286, 509 (1999)
22. Danon, L., Duch, J., Arenas, A., Díaz-Guilera, A.: Community Structure Identification. In: Large Scale Structure and Dynamics of Complex Networks: From Information Technology to Finance and Natural Science, pp. 93–113. World Scientific, Singapore (2007)
23. Blondel, V.D., Guillaume, J.-L., Lambiotte, R., Lefebvre, E.: Fast Unfolding of Communities in Large Networks. J. Stat. Mech., 10008 (2008)
24. Gan, G., Ma, C., Wu, J.: Data Clustering: Theory, Algorithms, and Applications. Society for Industrial and Applied Mathematics, Philadelphia, US-PA (2007)
25. Rosvall, M., Bergstrom, C.T.: Maps of Random Walks on Complex Networks Reveal Community Structure. PNAS 105, 1118 (2008)
26. van Dongen, S.: Graph Clustering Via a Discrete Uncoupling Process. SIAM J. Matrix Anal. Appl. 30, 121–141 (2008)
27. Fortunato, S., Barthelemy, M.: Resolution Limit in Community Detection. PNAS 104, 36–41 (2007)
Proof-of-Concept Design of an Ontology-Based Computing Curricula Management System Adelina Tang and Amanullah Abdur Rahman Sunway University, School of Computer Technology, Bandar Sunway, 46150 Petaling Jaya, Selangor Darul Ehsan, Malaysia [email protected], [email protected]
Abstract. The management of curricula development activities is time-consuming and labor-intensive. The accelerated nature of Computing technological advances exacerbates the complexity of such activities. A Computing Curricula Management System (CCMS) is proposed as a Proof-of-Concept (POC) design that utilizes an ontology as a knowledge source that interacts with a Curriculum Wiki facility through ontological agents. The POC design exploits agent-interaction models that had already been analyzed through the application of the Conceptualization and Analysis phases of the MAS-CommonKADS Agent-Oriented Methodology. Thereafter, the POC design of the CCMS is developed in the Design phase. The paper concludes with a discussion of the resulting contribution, limitation and future work. Keywords: Computing Ontology, curricula development, curriculum wiki, MAS-CommonKADS, software agents.
1 Introduction
Curriculum development, maintenance and management are time-consuming and labour-intensive. Furthermore, curriculum activities are expected to be carried out regularly within a 3-yr timeframe owing to the pace of Computing advances. As an initiative to facilitate management of such activities, it is proposed that a Computing Ontology be created as the knowledge source of existing curricula. In addition, there would be a wiki-like facility that would enable suggestions to be voiced, and a discussion forum that replaces face-to-face meetings. This Computing Curricula Management System (CCMS) would provide a live system available in real-time for all stakeholders to comment, re-think, and re-work all suggestions transparently. This is in line with the emphasis on outcomes-based academic programmes in Malaysia. The Ministry of Higher Education, through its Malaysian Qualifications Agency (MQA), has been very particular in observing such policies. It is hoped that the CCMS would be useful in aiding the document submission of new programmes and updates of enhancements to the MQA. This paper will introduce the motivation and supporting literature for this project. Then the methodology will be detailed. It will conclude with future work possibilities.
2 Literature Review 2.1 Outcome-Based Curricula An outcome-based curriculum has its roots in Outcome-Based Education (OBE). According to Acharya [1], OBE is a method of curriculum design and teaching that focuses on arming students with knowledge, skills and related professional capabilities after undergoing the particular program. The desired outcome is selected first and the corresponding curriculum and other teaching and learning aids would then be created to support the required outcome [2]. In this manner, lower level subject outcomes would collectively contribute to the top level program outcomes. Professional bodies and government agencies have moved away from subject objectives to a specification of characteristics of graduates, often in the form of program learning outcomes [3-9]. In Malaysia, the MQA introduced the Malaysian Qualifications Framework (MQF) as a point of reference to explain and clarify qualifications and academic achievement in Malaysian higher education and how these qualifications might be linked. The MQF is intended to provide a detailed description of the Malaysian education system to an international audience [10]. The MQF advises educators to shift their Teaching-Learning focus from the conventional Teacher-centered emphasis to the more progressive Student-centered emphasis. With the focus sharply on the role played by “Learning Outcomes” within Student-centered Teaching-Learning, all curricula are to be designed through the mapping of the subject and program outcomes onto the eight Learning Outcome Domains (LODs) (see Fig. 1). Of particular relevance to outcome-based curricula is Cassel et al.’s proposal [11], in which they suggested that their Computing Ontology could be used to fulfill the requirements of outcome-based curricula.
Fig. 1. (left) Shift of focus to Student-centered Teaching-Learning. (right) Learning Outcome Domains defined by the MQF.
2.2 Ontology
An ontology facilitates the sharing of knowledge. In Gruber's seminal paper, he defined it as a specification of a representational vocabulary for a shared
domain of discourse [12]. The shared domain of discourse would consist of classes, relations, functions, and similar objects of interest. Ontologies provide the basic structure around which knowledge bases can be built. Ontological engineering activities include philosophy, knowledge representation formalisms, development methodologies, knowledge sharing and reuse, information retrieval from the Internet or any online repositories, to name a few. It provides a systematic design rationale of a knowledge base according to the context of interest [13, 14]. Berners-Lee et al. in their famous paper [15] included Ontologies as the important third basic component of the Semantic Web, stating that Web Ontologies typically consist of a taxonomy that defines classes of objects and relations among them, and a set of inference rules. The human user benefits greatly from these inference rules as they can be manipulated to improve human understanding. With the emergence of cloud computing which sees information processed and stored in various locations, the combination of the taxonomy and inference rules would be invaluable in providing, as a whole, the “big” picture from several disparate information sources. Computing Ontology. Further to these perspectives, Cassel et al. proposed their Computing Ontology project that compressed the five distinct fields, Computer Engineering, Computer Science, Information Systems, Information Technology, and Software Engineering into one generic Computing field [16]. Their primary objective was to connect the comprehensive list of typical computing topics with curriculum development and subject planning activities. Thereafter, a prototype system for matching subject topics and outcomes would emerge. In an earlier paper, the authors had proposed a web-based utility to enable a subject developer to select or create outcomes as well as to select suitable topics that could achieve those outcomes [11]. A study of Cassel’s ontology revealed that their “Flash” displays were not linked with actual curricula objects in their ontology. This indicated that actual curricula management could not be carried out successfully. In addition, the authors also admitted that their ontology did not specify curriculum requirements [17]. Ontologies and Software Agents. Ontologies require operators to retrieve relevant information from them and to learn useful relationships that may exist among their objects. Medina et al. [18] utilized agents and ontologies to retrieve information from a set of federated digital libraries. Their approach was to adopt the MASCommonKADS Agent-Oriented Methodology (AOM) to model their software agents. These agents were called ontological agents as they operated with ontologies. The agent approach is attractive as it is a natural way to model systems consisting of multiple, distinct and independent components. Agents enable functionalities such as planning, learning, communication or coordination and are perceived as a natural extension of current component-based approaches [19]. A system consisting of several agents that support inter-agent and agent-environment interaction within a digital system is called a Multi-Agent System (MAS). The MAS-CommonKADS AOM is an extension of the CommonKADS methodology [20]. CommonKADS uses Object-Oriented concepts and techniques to facilitate Knowledge Acquisition, Engineering and Management under the ESPRIT IT initiative
[21]. It provides the methods to perform a detailed analysis of knowledge-intensive tasks and processes. Therefore, it is little wonder that it is the de facto European standard for knowledge analysis and knowledge-intensive system development. In response to the work by Cassel et al. and Medina et al., Tang and Lee [22] proposed a Computing Curriculum Wiki that applies the MAS-CommonKADS AOM to model the software agents and their accompanying processes. This would form the initial work required in the development of the CCMS.
3 Methodology Further to [22], this paper proposes to exploit its resulting agent-interaction models to design and develop the Computing Curricula Management System (CCMS) based on the scheme proposed by Cassel et al. [11] (see Fig. 2).
Fig. 2. CCMS scheme adopted from [11]
In Fig. 2, the “Existing Curriculum” is fixed and not editable by the larger Computing community. The “Curriculum Wiki” is an editable environment which allows suggested additions and revisions to be introduced into a copy of the existing curriculum, i.e. a working/discussion copy. These activities are made transparent to all members of the Computing community. The third component, the “Discussion Forum”, enables discussions regarding the aforementioned additions and revisions. Four typical operations were identified – two searches from keywords, one subject exemption request, and one subject syllabus change request. Agent-interaction models resulted from applying the Conceptualization and Analysis phases of the MASCommonKADS AOM to these operations. 3.1 Conceptualization and Analysis The Conceptualization phase was used to develop a preliminary Conceptualization model with Message Sequence Charts. This was followed by the Analysis phase that produced the Agent, Expertise, and Coordination models. The Task, Knowledge, and Organization models were omitted as the requisite information was already captured by the previous three. To facilitate understanding, the models of the subject exemption request are reproduced in Table 1. The reader is directed to [22] for a more detailed treatment.
Table 1. Conceptualization and Analysis phases utilized to model the CCR operation: User Request for Subject Exemption
MAS-CommonKADS AOM Agent-Interaction Models
(a) Conceptualization phase: Conceptualization model
(b) Analysis phase
Steps: Click on "Exemption" icon / System performs check / Display the list of all possible subject / All exemptions are applicable only for
c, which define the kind of the cyclide, Figures 4 (a), 4 (b) and 5. We have a CD4A when a ≥ μ ≥ c, a CD4I when μ ≥ a ≥ c and a CD4E when a ≥ c ≥ μ. For convenience, we note b = √(a² − c²). In the appropriate frame of reference, a CD4 with parameters a, c and μ has the two following equivalent implicit equations [5,8]:
(x² + y² + z² − μ² + b²)² − 4(ax − cμ)² − 4b²y² = 0   (2)
We do not consider the torus case.
Fig. 2. A Dupin cyclide of degree four and its two generating conics (an ellipse and a hyperbola)
Fig. 3. A Dupin cyclide as envelope of two families of spheres tangent to three fixed ones. (a) : The first family. (b) : The second family
Fig. 4. (a) : A ring cyclide, called CD4A. (b) : A spindle cyclide, called CD4I
Fig. 5. A horned cyclide, called CD4E
(x² + y² + z² − μ² − b²)² − 4(cx − aμ)² + 4b²z² = 0   (3)
The origin of the reference system is the common centre of the conics and the three axes are the axes of the conics (one of them is shared by the two conics). From equations (2) and (3), it is easy to see that a CD4 admits two planes of symmetry: Pz : (z = 0) and Py : (y = 0). In the same reference system, this Dupin cyclide has the parametric equation:
Γ(θ, ψ) = ( (μ(c − a cos θ cos ψ) + b² cos θ) / (a − c cos θ cos ψ),  (b sin θ (a − μ cos ψ)) / (a − c cos θ cos ψ),  (b sin ψ (c cos θ − μ)) / (a − c cos θ cos ψ) )   (4)
where (θ, ψ) ∈ [0, 2π]². In this paper, the terms cyclide and CD4 will always refer to Dupin cyclide.
Line of Curvature. The lines of curvature of a CD4 are circles [5,8]. These circles of curvature are obtained with constant values of θ or ψ in the parametric equation (4). They belong to planes of equations:
ax sin θ0 − by cos θ0 = μc sin θ0   (5)
for a constant value of θ and:
cx sin ψ0 − bz = μa sin ψ0   (6)
for a constant value of ψ. These two families of planes generate two pencils of planes. For the first family, the common line Δθ is given by the equations x = cμ/a and y = 0. For the second pencil of planes, the common line Δψ is given by the equations x = aμ/c and z = 0.
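As a quick sanity check of equations (4) and (5), the sketch below (ours, not the authors' code) evaluates the parametric equation for arbitrary parameter values and verifies that points sharing the same θ lie in the plane of equation (5); the values of a, c and μ are assumptions chosen only for illustration.

```python
# Sketch: evaluate the CD4 parametric equation (4) and check that a constant-θ
# curvature circle lies in the plane given by equation (5).
import math

def cd4_point(a, c, mu, theta, psi):
    b = math.sqrt(a * a - c * c)
    den = a - c * math.cos(theta) * math.cos(psi)
    x = (mu * (c - a * math.cos(theta) * math.cos(psi)) + b * b * math.cos(theta)) / den
    y = b * math.sin(theta) * (a - mu * math.cos(psi)) / den
    z = b * math.sin(psi) * (c * math.cos(theta) - mu) / den
    return x, y, z

a, c, mu = 7.0, 2.0, 5.0      # arbitrary ring-cyclide parameters (a >= mu >= c)
b = math.sqrt(a * a - c * c)
theta0 = 0.8                   # fixed θ: the points below lie on one curvature circle

for psi in (0.0, 1.0, 2.5, 4.0):
    x, y, z = cd4_point(a, c, mu, theta0, psi)
    # equation (5): a·x·sinθ0 − b·y·cosθ0 − μ·c·sinθ0 should vanish for all ψ
    residual = a * x * math.sin(theta0) - b * y * math.cos(theta0) - mu * c * math.sin(theta0)
    print(f"psi={psi:.1f}  point=({x:.3f}, {y:.3f}, {z:.3f})  plane residual={residual:.2e}")
```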
Fig. 6. Circles of curvature of a CD4A. (a) : with a constant value of θ . (b) : with a constant value of ψ
Fig. 7. Pencil of planes containing the circles of curvature of a CD4A. (a) : with a constant value of θ . (b) : with a constant value of ψ
Fig. 8. (a) : Principal circles of a CD4A. (b) : Principal circles of a CD4I.
Fig. 9. Principal circles of a CD4E
Symmetry Planes and Principal Circles. In the set of CD4 curvature circles, four of them (coplanar two by two) play an important part because they allow us to determine the parameters of the cyclide if we fix its kind. These circles are called principal circles of the CD4 and are contained in the two symmetry planes Py and Pz . Figures 8(a), 8(b) and 9 show the four principal circles of each kind of CD4. Determination of the Cyclide Parameters The three parameters a, c and μ of a CD4 can be found using two coplanar principal circles when the kind of the CD4 is known. We note C(O, r) the circle of centre O and radius r. Let C1 (O1 , r1 ) and C2 (O2 , r2 ) with r1 ≥ r2 , be the principal circles of a CD4 in Py , Figures 10 and 11(a), the parameters a, c and μ are given by [9]:
(a, c, μ) = ( O1O2/2, (r1 − r2)/2, (r1 + r2)/2 )   (7)
Relating to the plane Pz, we have, Figure 11(b), for a CD4A:
(a, c, μ) = ( (r1 + r2)/2, O1O2/2, (r1 − r2)/2 )   (8)
Fig. 10. Principal circles of a CD4A in Py
Fig. 11. (a) : Principal circles of a CD4I in Py . (b) : Principal circles of a CD4A or CD4I in Pz .
and for a CD4I:
(a, c, μ) = ( (r1 − r2)/2, O1O2/2, (r1 + r2)/2 )   (9)
Fig. 12. Set of tangent circles to three fixed ones
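Equations (7)-(9) translate directly into a small helper. The sketch below is our illustration of these formulas under the paper's notation (two coplanar principal circles given by centres and radii, with r1 ≥ r2); the function name and the input values are assumptions made for the example.

```python
# Sketch of equations (7)-(9): recover the CD4 parameters (a, c, mu) from two
# coplanar principal circles C1(O1, r1) and C2(O2, r2), with r1 >= r2.
import math

def cd4_parameters(O1, O2, r1, r2, plane, kind="CD4A"):
    """plane is 'Py' or 'Pz'; the kind ('CD4A' or 'CD4I') only matters in Pz."""
    d = math.dist(O1, O2)                       # distance O1O2 between the centres
    if plane == "Py":                           # equation (7), valid for CD4A and CD4I
        return d / 2, (r1 - r2) / 2, (r1 + r2) / 2
    if kind == "CD4A":                          # equation (8), plane Pz
        return (r1 + r2) / 2, d / 2, (r1 - r2) / 2
    return (r1 - r2) / 2, d / 2, (r1 + r2) / 2  # equation (9), CD4I in Pz

# toy example: two circles in Pz with centres 4 apart and radii 12 and 2
a, c, mu = cd4_parameters((2.0, 0.0), (-2.0, 0.0), 12.0, 2.0, plane="Pz", kind="CD4A")
print(a, c, mu)   # -> 7.0 2.0 5.0
```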
Determination of Principal Circles of a CD4 From Three Spheres. Consider that we know three spheres (with non-collinear centres) of the same family defining a CD4A or a CD4I. We want to build this cyclide. To find the principal circles of the CD4, we have to work in the plane containing the three centres of the spheres (these points are on one of the generating conics, Figure 2). The section of each sphere by the plane
is a circle. The principal circles of the CD4 are two circles tangent to the three initial ones. In fact, there exist eight tangent circles, i.e. four pairs, Figure 12. The centres of the circles tangent to two given ones belong to two hyperbolas, and we have to find the appropriate intersection of two of the four hyperbolas to obtain a CD4. As we consider only CD4A and CD4I and the symmetry plane Pz of the CD4, the two circles to use are the two circles Ct1 and Ct2 such that Ct1 ∩ Ct2 = ∅, Figure 11(b).
Blending Algorithms. In the literature, some algorithms blend surfaces of revolution and spheres or planes with a CD4 [13,10]. They consist in building two principal circles of the CD4 belonging to a symmetry plane of the initial primitives, that is to say building a 2D blending as in Figure 13: they build two circles Ci, i ∈ {1, 2}, tangent to the circle C in Bi and to the line Δ in Ai.
Fig. 13. Construction used in existing algorithms
Determination of the Boundaries of the CD4 for the Blending. We have to keep only a part of the CD4 for the blending, so we have to determine the boundaries of the useful part [9]. The blending is made between two spheres of one of the families along two circles of curvature of the CD4, obtained with a constant value of ψ. We can have two different kinds of blends: one called pillar blend, Figure 14(a), and the other called recipient blend, Figure 14(b).
3 Blending Algorithms between Canal Surfaces and Planes
3.1 Principle of the Methods
We give two blending algorithms between a canal surface and a plane using a CD4. From three spheres tangent to the plane and to the canal surface along a circle of curvature (the blending circle), Algorithm 1 determines the CD4 as the envelope of the family of spheres containing the first three. Then, we consider the construction plane as the plane passing through the centres of the three spheres. This plane is also a symmetry plane of
Fig. 14. A blend using a CD4 along circles of curvature with a constant value of ψ . (a) : A pillar blend. (b) : A recipient blend
the CD4 used for the blend. This algorithm is based on the second definition of a Dupin cyclide cited in this paper. The second algorithm uses only geometric deductions. It generalizes the one suggested in [10]. We determine the construction plane from the initial primitives and then we use, in this plane, an algorithm similar to the one of [10]. In the previous method, spatial constraints were imposed to construct the blend. The plane containing the blending circle is a plane of the pencil of planes described before with a constant value of ψ. In our method, we consider the line which is the intersection of the initial plane and the plane containing the blending circle. This line is orthogonal to the construction plane Py of the CD4, which passes through the centre of the blending circle. So, Py is totally determined. In the two cases, the blending circle is given by the canal surface curvature circle. Once the construction plane is determined (Pz in Algorithm 1 and Py in Algorithm 2), the two algorithms determine the principal circles of the CD4 in the considered symmetry plane and compute the parameters of the CD4. Finally, the boundaries of the useful part of the CD4, and also the possible translations and rotations to put the CD4 back in the appropriate reference system, are determined. Note that the CD4 boundary of the useful part corresponding to the plane is ψ = π/2 or ψ = −π/2. We can note that:
– In the first algorithm, the user has to choose the kind of the CD4 (CD4A or CD4I) after determining its principal circles;
– In the second algorithm, these principal circles are determined according to the choice of the CD4 kind;
– We do not consider the case where the CD4 of blending is a torus (the computations are trivial in this particular case).
Algorithm 1. Blending canal surfaces and planes using CD4
Input: A canal surface Surf0 and a plane P0
1. Choice of a circle of curvature C0 onto Surf0
2. Choice of three distinct spheres S1, S2 and S3 tangent to P0 and to Surf0 along C0
3. Determination of the symmetry plane Pz of the CD4 containing the centres of S1, S2 and S3
4. Determination of the circles C1, C2 and C3, sections of S1, S2 and S3 by Pz
5. Determination of the two principal circles Ct1 and Ct2, with Ct1 ∩ Ct2 = ∅, tangent to C1, C2 and C3, Figure 12
6. Computation of the parameters a, c and μ of the CD4 from Ct1 and Ct2 according to the kind of CD4 (CD4A or CD4I)
7. Determination of the translations and rotations
8. Computation of the boundaries of the useful part of the CD4
Output: The useful part of the CD4 blending the canal surface Surf0 and the plane P0
Algorithm 2. Blending canal surfaces and planes using CD4
Input: A canal surface Surf0 and a plane P0
1. Choice of a circle of curvature C0 onto Surf0 and determination of the plane P1 containing C0
2. Determination of the line Δ0, Δ0 = P0 ∩ P1
3. Determination of the plane Py, orthogonal to Δ0 and containing O0, the centre of C0
4. Determination of the points B1 and B2, sections of C0 by Py
5. Determination of the line Δ, Δ = P0 ∩ Py
6. Determination of the two lines Δ1 and Δ2 tangent to Surf0 in B1 and B2 in Py
7. Determination of the circle Ct1 (resp. Ct2), tangent to Δ and Δ1 (resp. Δ2) in B1 (resp. B2) according to the kind of CD4, Figures 10 and 11(a) [10]
8. Computation of the parameters a, c and μ of the CD4 from Ct1 and Ct2
9. Determination of the translations and rotations
10. Computation of the boundaries of the CD4 useful part
Output: The useful part of the CD4 blending the canal surface Surf0 and the plane P0
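To make step 7 of Algorithm 2 more concrete, the sketch below is our own 2D illustration, not the authors' implementation: given the point B1, the tangent line Δ1 at B1 and the line Δ (the trace of P0 in Py), it computes the circles tangent to Δ1 at B1 and tangent to Δ. The function names, the point/direction conventions and the toy data are assumptions.

```python
# Illustrative sketch of the 2D construction used in step 7 of Algorithm 2:
# find the circles tangent to the line D1 at point B1 and tangent to the line D.
# A line is given by a point and a unit direction; coordinates live in the plane Py.
import math

def cross2(u, v):
    return u[0] * v[1] - u[1] * v[0]

def tangent_circles(B1, d1, A, dD):
    """Return (centre, radius) pairs of circles tangent to D1 at B1 and tangent to D.
    d1: unit direction of D1 (tangent to the canal surface at B1);
    A, dD: a point of D and its unit direction (trace of the plane P0 in Py)."""
    n1 = (-d1[1], d1[0])                            # unit normal to D1: centres lie on B1 + t*n1
    c0 = cross2((B1[0] - A[0], B1[1] - A[1]), dD)   # signed distance of B1 to D
    c1 = cross2(n1, dD)
    circles = []
    for sign in (+1.0, -1.0):                       # |distance(centre, D)| = |t| gives two cases
        den = c1 - sign
        if abs(den) > 1e-12:                        # degenerate case: D1 parallel to D
            t = -c0 / den
            centre = (B1[0] + t * n1[0], B1[1] + t * n1[1])
            circles.append((centre, abs(t)))
    return circles

# toy data: B1 at the origin with horizontal tangent, D the parallel line y = -2
# (only one finite tangent circle exists in this degenerate configuration)
for centre, radius in tangent_circles(B1=(0.0, 0.0), d1=(1.0, 0.0), A=(0.0, -2.0), dD=(1.0, 0.0)):
    print(centre, radius)      # -> (0.0, -1.0) 1.0
```

In the general (non-parallel) configuration the two roots correspond to the two possible principal circles, among which the kind of CD4 (CD4A or CD4I) dictates the choice.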
4 Numerical Examples
4.1 Example 1: Blending a Tube and a Plane Using Algorithm 1
Let Tube be the map of the tube defined by its central curve Γ0, with Γ0(u) = (u, 0, cos(u)):
Tube(u, v) = ( u + r cos(v) sin(u) / √(1 + sin²(u)),  r sin(v),  cos(u) + r cos(v) / √(1 + sin²(u)) )   (10)
with r = 0.8, u ∈ [−5, 5] and v ∈ [0, 2π ]. Let P0 be the plane of equation z = −0.6 x + 10. We want to blend the tube and P0 along a circle of curvature C0 of equation Tube(5, v) with v ∈ [0, 2π ], Figure 15.
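For readers who want to reproduce this setup, the sketch below is a direct transcription of the tube map (10) with the values used in the example; it is only our checking/plotting aid, not the authors' code.

```python
# Sketch: evaluate the tube map of equation (10) for the example values and sample
# a few points of the chosen blending circle C0 = Tube(5, v).
import math

R = 0.8   # tube radius r used in the example

def tube(u, v, r=R):
    s = math.sqrt(1.0 + math.sin(u) ** 2)      # normalising factor of the normal vector
    x = u + r * math.cos(v) * math.sin(u) / s
    y = r * math.sin(v)
    z = math.cos(u) + r * math.cos(v) / s
    return x, y, z

for v in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2):
    print(tube(5.0, v))                        # points of the curvature circle C0
```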
Fig. 15. The initial tube and the initial plane
We compute the radii and the coordinates of the centres of three spheres S1, S2 and S3 tangent to P0 and to the tube along C0. Their centres are O1(5, 6.88, 0.28), O2(5, −6.88, 0.28) and O3(2.48, 3.86, 3.21) (some coordinates are approximated) and their radii are r1 ≈ 6.08, r2 ≈ 6.08 and r3 ≈ 4.66. We work in the plane Pz defined by O1, O2 and O3. We determine the circles C1, C2 and C3, sections of S1, S2 and S3 by Pz, and the principal circles of a CD4A tangent to C1, C2 and C3, Figure 16.
Fig. 16. The principal circles in Pz obtained from three spheres tangent to the initial plane and tube (two views)
Then, we compute the CD4 parameters and we obtain a ≈ 7.30, c ≈ 2.33 and μ ≈ 6.52, Figure 17(a). The blending is realized with ψ ∈ [0, π/2], Figure 17(b). We obtain the same result using Algorithm 2: the construction plane Py, orthogonal to the plane containing C0 and to P0, has equation y = 0.
Fig. 17. (a) : The CD4 used for blending the tube and the plane. (b) : The blending between the tube and the plane.
Fig. 18. The two principal circles in Py used for blending
The points, sections of C0 by Py, are B1(4.44, 0, 0.86) and B2(5.55, 0, −0.29). To obtain a CD4, the circles Ct1 and Ct2, tangent to P0 and to the tube in B1 and B2, are computed, Figure 18. The parameters of the CD4A with these principal circles are a ≈ 7.30, c ≈ 2.33 and μ ≈ 6.52. We obtain about the same blend, Figure 17.
4.2 Example 2: Blending a Coil and a Plane Using Algorithm 2
We want to blend a coil and a plane P0 along a curvature circle C0 of the coil. The coil equation is given by formula (1), with values a0 = 3, b0 = 2 and h0 = 1. The equation of the plane P0 is y = x − 10. C0 is given by Coil(u, 0), its centre is O0 = Γ0(0) and its radius is b0, Figure 19. In the plane Py, orthogonal to P0 and to the plane P1 containing C0, we determine Ct1 and Ct2 tangent to P0 and to the coil along C0 in order to obtain a CD4A, Figure 20(a), or a CD4I, Figure 20(b). In the first case, the CD4A parameters are a ≈ 14.84, c ≈ 10.84 and μ ≈ 12.94 and, in the second case, the parameters of the CD4I are a ≈ 6.49, c ≈ 4.71 and μ ≈ 8.49. We obtain the useful part with ψ ∈ [0, π/2] in the two cases, Figures 21(a) and 21(b).
Fig. 19. The initial coil and the initial plane
Fig. 20. The two principal circles used to obtain a CD4. (a) : A CD4A. (b) : A CD4I.
Fig. 21. Blend between the coil and the plane using a CD4. (a) : A CD4A. (b) : A CD4I.
5 Conclusion
In this paper, we give two new algorithms for blending canal surfaces and planes using CD4s. The user, during the first step of our algorithms, chooses a curvature circle onto
the canal surface. Then, he chooses three spheres tangent to the plane and to the canal surface along this circle, using the first algorithm, or he determines the plane containing this circle, using the second algorithm. The two methods are equivalent and lead to the same result. In the first case, we determine the symmetry plane Pz of the CD4 whereas, in the second case, we determine Py. This study is done as part of a thesis on the 3D reconstruction of shells manufactured at CEA Valduc, whose theoretical model is a particular canal surface. Our next goal is to reconstruct the real object, obtained from the theoretical one by a deformation. We hope to achieve this reconstruction using parts of different CD4s. This future work will use the algorithms described in this paper.
References
1. Allen, S., Dutta, D.: Cyclides in pure blending I. Computer Aided Geometric Design 14(1), 51–75 (1997); ISSN 0167-8396
2. Allen, S., Dutta, D.: Cyclides in pure blending II. Computer Aided Geometric Design 14(1), 77–102 (1997); ISSN 0167-8396
3. Cayley, A.: On the cyclide. Quarterly Journal of Pure and Applied Mathematics 12, 148–165 (1873)
4. Darboux, G.: Leçons sur la Théorie Générale des Surfaces, vol. 1. Gauthier-Villars (1887)
5. Darboux, G.: Principes de géométrie analytique. Gauthier-Villars (1917)
6. Dutta, D., Martin, R.R., Pratt, M.J.: Cyclides in surface and solid modeling. IEEE Computer Graphics and Applications 13(1), 53–59 (1993)
7. Dupin, C.P.: Application de Géométrie et de Méchanique, à la Marine, aux Ponts et Chaussées, etc. Bachelier, Paris (1822)
8. Forsyth, A.R.: Lecture on Differential Geometry of Curves and Surfaces. Cambridge University Press, Cambridge (1912)
9. Garnier, L.: Mathématiques pour la modélisation géométrique, la représentation 3D et la synthèse d'images. Ellipses (2007); ISBN 978-2-7298-3412-8
10. Garnier, L., Foufou, S., Neveu, M.: Blending of surfaces of revolution and planes by Dupin cyclides, Seattle (2004)
11. Martin, R.R.: Principal patches for computational geometry. PhD thesis, Engineering Department, Cambridge University (1982)
12. Pratt, M.J.: Cyclides in computer aided geometric design II. Computer Aided Geometric Design 12(2), 131–152 (1995)
13. Pratt, M.J.: Quartic supercyclides I: Basic theory. Computer Aided Geometric Design 14(7), 671–693 (1997)
Avoiding Zigzag Quality Switching in Real Content Adaptive Video Streaming
Wassim Ramadan, Eugen Dedu, and Julien Bourgeois
University of Franche-Comté, Laboratoire d'Informatique de l'Université de Franche-Comté, Montbéliard, France
Abstract. A high number of videos, encoded in several bitrates, are nowadays available on Internet. A high bitrate needs a high and stable bandwidth, so a lower bitrate encoding is usually chosen and transferred, which leads to lower quality too. A solution is to adapt dynamically the current bitrate so that it always matches the network bandwidth, like in a classical congestion control context. When the bitrate is at the upper limit of the bandwidth, the adaptation switches constantly between a lower and a higher bitrate, causing an unpleasant zigzag in quality on the user machine. This paper presents a solution to avoid the zigzag. It uses an EWMA (Exponential Weighted Moving Average) value for each bitrate, which reflects its history. The evaluation of the algorithm shows that loss rate is much smaller, bitrate is more stable, and so received video quality is better. Keywords: Real time content, Video streaming, Rate control, Congestion control.
1 Introduction
Nowadays, the number of videos encoded in several bitrates and accessible for everyone increases significantly day by day. Their contents are generally delivered to the final user using streaming services over Internet. These services, as well as the demand for high video quality (e.g. HD and 3D videos), are in constant progression. They require more and more bandwidth, hence available bandwidth variation must be taken into account to shorten buffering time at the receiver. Currently, one video bitrate is chosen at the beginning of a video streaming; the transmission is controlled at the transport layer (TCP or UDP) and the application is not involved at all. Hence, two choices exist when playing streamed video content. The first is to choose a low video bitrate, and the video is played directly, without interruption. The second is to choose a bitrate higher than the average bandwidth, buffer multimedia data at the user and play the video when the buffer has sufficient data; because the buffer empties, the user will be confronted with many play/pause cycles during the streaming. Both cases are unpleasant for the user: the first has the advantage of a fluid video but with low quality, while the latter allows a good quality but with either short or long waiting
time and with frequent interruptions. Users wishing to have instant access to multimedia content with the best possible quality can use a new kind of multimedia streaming service, multimedia content adaptation. For video streaming in a network with highly variable bandwidth, content adaptation is a way to adapt the video bitrate to the network characteristics, hence to improve the video quality perceived by the final user. A cooperative approach between the application layer and the transport layer can be used. The transport protocol handles the congestion control on the network side, while the application handles the video bitrate control on the server side. Bitrate control can be done by changing the quantisation parameters, or by changing the FPS (frames per second), etc. An undesirable effect, which appears when multiple video qualities are available and when the bandwidth changes, is that the bitrate control leads to constant switching between two qualities (bitrates); one is smaller than the bandwidth and the second is greater. For example, when a user is connected to Internet through a 2.5Mb/s link and the video is available in 2 and 3Mb/s qualities, then an adaptive streaming algorithm will constantly switch between these two qualities; obviously, the best solution would be to stay with 2Mb/s much more time before retrying the 3Mb/s quality. We call this problem zigzag quality switching. Few papers treat this issue, and they do not solve it completely. This paper presents a solution, called ZAAL (Zigzag Avoidance Algorithm), to this problem. It uses a successfulness value for each bitrate, whose average is constantly updated. This average is used to decide whether the next higher quality can be chosen or not, thus preventing the zigzag. This paper is organized as follows. Section 2 formulates the problem which we try to solve, and section 3 compares it to similar existing problems. Section 4 presents our ZAAL algorithm as a solution to the problem, and its performance is evaluated through real experiments in section 5. Finally, section 6 concludes this article and presents some perspectives.
2 Problem Formulation
The zigzag quality switching is the fact that an adaptive video streaming sender application keeps switching the video quality between two values, especially when one value is greater and the other is smaller than the bandwidth capacity. In a network with highly variable bandwidth, video adaptation is very important. It aims to adapt the video bitrate to the network characteristics and thus to improve the video quality perceived by the final user. The adaptation can be done at several OSI layers, but we take the application layer to exemplify. To adapt the video to network conditions, the sender application controls some video parameters, such as the bitrate, FPS (frames per second) or image size; for example, if more bandwidth is available, the application increases the video bitrate, and if less bandwidth is available, the application decreases it. To reach this goal, the application should constantly retrieve the bandwidth (e.g. every 100 ms, every 2 s, or at every I image for MPEG video) and adapt its bitrate accordingly.
To minimise the quality fluctuation, it is not preferable, in our opinion, to change quality every tiny period (at each sent packet for example). Sometimes the application has the choice among multiple available video qualities (e.g. many videos on Internet are encoded with several qualities). This video adaptation is known in the literature as "rate adaptive video control". Fig. 1 helps to understand how video adaptation works. The transport protocol has a congestion control which gives the rate at which the packets leave the socket buffer (and afterwards the machine, and enter the network). When the current bandwidth is smaller than the video bitrate, the socket buffer fills with packets. When the buffer becomes full, excess packets generated by the application will fail to be written. We call these packets "failed packets". The goal of an adaptive video content application is to readjust the video bitrate in this case by choosing a lower video quality. When the new bitrate does not cause failed packets, the application will retry a higher quality to match again the available bandwidth and to enhance the video quality.
Fig. 1. Video data flow on the sender side
Our previous paper VAAL (Video Adaptation at Application Layer) [12] is an example of such a video adaptation method. It uses transport protocol buffer overflow as a solution to find out the available bandwidth and to adapt the video content bitrate to the discovered bandwidth. Every 2 seconds, the server application computes the number of failed packets. This number is used to control the video bitrate afterwards. A high number means smaller bandwidth and smaller bitrate. Zero error indicates either a stable or a higher bandwidth, so the bitrate of the sent video can be increased. The experiments presented in this paper use VAAL. An adaptive application, such as the one presented above, keeps switching between two qualities: one with a bitrate higher than the bandwidth and with failed packets, and another one with a bitrate smaller than the bandwidth and with no error. We noticed this phenomenon in our experiments (see Fig. 4(a) for a clear example, where during the first minute the video bitrate is continuously toggling between 0.5 and 1 Mb/s). This is what we called the "zigzag quality switching" problem; it has two undesirable effects:
– Numerous bitrate changes, leading to unstable video quality. It is preferable to keep the video quality as stable as possible while maximising it.
– Many lost packets (when the bitrate is higher than the bandwidth), leading to lower video quality and wasted resources (CPU and network) for these lost packets. Indeed, videos are generally more affected by packet losses than by a lower bitrate.
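One way to obtain the "failed packets" signal, shown below as our own illustration (it is not the VAAL code, and the socket type, address and error handling are assumptions), is to use non-blocking writes to the transport socket and count the writes rejected because the buffer is full.

```python
# Illustrative sketch (not the VAAL implementation): count "failed packets", i.e.
# application writes rejected because the transport socket buffer is already full.
import errno
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setblocking(False)                      # a full buffer then makes the send fail immediately
destination = ("198.51.100.7", 5004)         # placeholder receiver address and port

failed_packets = 0                           # counted over one adaptation period (e.g. 2 s)

def send_video_packet(payload: bytes) -> None:
    global failed_packets
    try:
        sock.sendto(payload, destination)
    except OSError as exc:
        if exc.errno in (errno.EWOULDBLOCK, errno.EAGAIN, errno.ENOBUFS):
            failed_packets += 1              # buffer full: the adaptation should lower the bitrate
        else:
            raise
```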
Note that the problem to be solved is not to reduce the number of quality changes, for example when such changes use many resources (CPU, disk etc.). Nor is it to use the highest available quality at a certain moment (which can potentially lead to zigzag, for example when the bandwidth decreases right afterwards). The problem to be solved is the zigzag, i.e. avoiding reusing a quality which has recently been used (one or more times) and has proved unsuccessful. As such, a solution which does not take the recent history into account cannot solve this problem.
3 Positioning Comparing to Related Work
We found that our problem has similarities with other problems, and with two techniques already found in the literature.
3.1 Similar Methods
One method which can be used to solve zigzag quality switching is presented in [10]. It is proposed for multicast streaming video but can be adapted to unicast transmissions too. The sender sends several layers (base and enhancement), and the receiver subscribes dynamically to one or several of them. The receiver uses a timer for each level of subscription (layer). At the beginning the timer is short. When the timer expires and no loss was experienced, the receiver subscribes to the next level and the timer for that level is started. If, on the contrary, the level led to lost packets, the receiver goes back to the previous level and the timer for the level with losses is multiplicatively increased. In this method, there is no upper limit for the timer, which means that increasing the quality can be forbidden for a very long time, even if the bandwidth becomes greater in the meantime. Also, each time, only the timer of the current bitrate is updated, which means that good conditions for higher levels do not improve the timer of lower levels. Both these characteristics are unrealistic. On the contrary, our EWMA-based algorithm does not have these drawbacks. A metric to evaluate the jerkiness of a video, given by the number of quality changes, is given in [4]. It uses a formula to calculate the Effective Frame Rate (EFR) of a video:
EFR = (1/N) · Σ_{i=1}^{N−1} ( fps_i − P · qualitychange(max(i − W, 0), i) )   (1)
where fps_i is the number of frames delivered in the i-th second of the video, W is the window size, P is a weighting factor for variation in frame rate, and qualitychange(max(i − W, 0), i) is the number of quality changes in the range i − W to i. This formula is used to limit the number of quality changes during each window (W). As such, it counts the number of quality changes, no matter where they appear inside the window, which is however an important parameter of the
jerkiness. For example, 10 quality changes in the first 2 seconds of a video of 1 minute are visually worse than 10 quality changes dispersed through the whole video. The goal of EWMA, used in our solution, is exactly to take into account the time of the change too. Moreover, in [4] the window is static (coarse-grained moving) and fixed (same size), while in our case we use a sliding and dynamic window. Finally, our goal is not to limit the number of quality changes, but to avoid quality increases which lead to quality decreases right afterwards. For all these reasons, the formula in this paper is not suitable for avoiding the zigzag problem we want to tackle. The video adaptation method described in [3] does not cope directly with the zigzag problem but presents a way of smoothing the sent video bitrate and reducing the frequency of video quality changes. The main idea is that the application does not switch to a higher video quality (or video layer in a hierarchical video encoding) until the sender is certain that the video will continue playing even after a reduction of the congestion window. This paper presents results for AIMD and SQRT congestion controls for video streaming. To guarantee continuous video playing, the sent video quality should be the highest quality which satisfies C < R − βR^l, where C is the bitrate at which the video is encoded, R is the transmission rate, and β = 1/2 with l = 1 for AIMD-based congestion control and l = 1/2 for SQRT-based one. This formula implies that the highest video quality does not exceed half of the transmission rate for AIMD, and that it is β√R smaller than the bandwidth for SQRT. Results given in [3] show that this formula works well for an AIMD-based congestion control streaming application, but the video bitrate is most of the time very low compared to the available bandwidth. On the other hand, it has little impact on the SQRT-based algorithm, i.e. the quality changes are not significantly reduced. Finally and most importantly, this formula does not really solve the zigzag quality switching, but merely reduces the bitrate artificially.
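As an illustration of the selection rule of [3] stated above, the sketch below (ours) picks the highest encoded bitrate C satisfying C < R − βR^l; the list of encodings and the rate R are invented values used only for the example.

```python
# Sketch of the selection rule discussed above: keep the highest available bitrate C
# such that C < R - beta * R**l, with l = 1 for AIMD and l = 1/2 for SQRT.
def highest_safe_bitrate(bitrates, R, control="AIMD", beta=0.5):
    l = 1.0 if control == "AIMD" else 0.5
    safe = [c for c in sorted(bitrates) if c < R - beta * R ** l]
    return safe[-1] if safe else min(bitrates)

qualities = [0.5e6, 1e6, 2e6, 3e6]           # example encodings (b/s), made-up values
print(highest_safe_bitrate(qualities, R=2.5e6, control="AIMD"))   # -> 1000000.0
print(highest_safe_bitrate(qualities, R=2.5e6, control="SQRT"))   # -> 2000000.0
```

The two outputs reflect the remark above: the AIMD rule stays far below the transmission rate, while the SQRT rule barely constrains the choice.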
3.2 Comparison with Bandwidth Estimation Techniques
These techniques aim to estimate the available bandwidth at the time of the measurement. A well-known technique is packet pair [7]. The basic idea is that the sender sends a pair of packets of the same size to a destination (it could be a specialised server or simply an aware receiver). The destination host then responds by sending an echo for each received packet. By measuring the changes in the time spacing between the two packets, the sender can estimate the available bandwidth of the network path as follows:
bw = s / (t2 − t1)   (2)
where:
– bw is the bandwidth of the bottleneck link
– s is the packet size
– t1 and t2 are the arrival times of the first and second packet respectively.
The accuracy of this technique depends on several assumptions. The most important one is that the two packets should be enqueued one after the other at the bottleneck link, which is not guaranteed when a router has a non-FIFO queue or when another packet is inserted between the two packets. Other variants of the packet pair technique have been developed [1,9,2] to mitigate this effect. Another known technique is packet train [8]. A burst of packets is sent between source and destination. Only packets in the same train are used for measurement. A train is formed by packets for which the spacing between two consecutive received packets does not exceed some inter-packet (inter-train) gap. Like in packet pair, inter-arrival times between packets in the same train are used for estimating the available bandwidth. All these methods have several drawbacks:
1. Sending/receiving additional data packets is needed to estimate the available bandwidth. If video data packets themselves are used for this, the receiver must distinguish them from other data, for example by adding a new field in the packet header.
2. Probing packets should be specifically dispersed, i.e. they need a specific timing when they are sent. This makes the use of the video data packets themselves difficult for probing purposes.
3. If a new connection is used for probing packets, sender/receiver should send/listen to a different socket (port).
4. Not only the sender application but also the receiver application must be modified to be able to respond to those probing packets. Changing both endpoints is known to be very difficult to deploy in reality.
Finally, and most importantly, these techniques have a different goal than ours. As they do not take into account the bitrates used in the past, they cannot avoid the zigzag. The reasons cited above show that current bandwidth estimation techniques are not appropriate to solve the zigzag quality switching problem.
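For completeness, here is a toy transcription of equation (2); the packet size and timestamps are invented, and this is not how a real measurement tool is structured.

```python
# Toy transcription of equation (2): packet-pair bandwidth estimate bw = s / (t2 - t1).
def packet_pair_estimate(packet_size_bits, t1, t2):
    """packet_size_bits: size s of each probe packet; t1, t2: arrival times in seconds."""
    return packet_size_bits / (t2 - t1)

# e.g. two 1500-byte probes received 4 ms apart -> about 3 Mb/s bottleneck estimate
print(packet_pair_estimate(1500 * 8, t1=0.000, t2=0.004))   # 3000000.0
```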
3.3 Comparison with Network Congestion Control
Our problem is similar to the classical network congestion control problem, but not identical. We consider both window-based (written as TCP [11] in the following) and equation-based (written as TFRC [5] in the following) congestion controls. In fact, both TCP/TFRC and ZAAL try to solve a multi-criterion optimisation problem, but the criteria involved to achieve this goal are not really identical.
Zigzag: ZAAL aims to avoid the zigzag between two consecutive qualities, which occurs when (1) the inferior quality is lower than the bandwidth, hence no loss, hence the quality is increased, and (2) the superior quality is higher than the bandwidth, hence a few losses, hence the quality is decreased to the inferior
quality again. TCP/TFRC have a fine-grained sending rate; TCP, being a data transport protocol, does not pay attention to such a zigzag, while TFRC smooths the sending rate, without avoiding the zigzag.
Losses: TCP/TFRC aim to send at the maximum rate, even if this yields a few losses. On the contrary, ZAAL aims to avoid losses as much as it can. In reality, losses cannot be completely avoided unless a very low bitrate is used. So for ZAAL it is better to maintain a quality without or with few losses, instead of the next higher quality with a high loss rate. In fact, ITU-T G.1070 [6] recommends that the end-to-end IP packet loss rate in video streaming should be less than 10%. As an example, during a video transmission ZAAL prefers a 2Mb/s sending rate without losses to a 3Mb/s sending rate (50% more packets) with 20% lost packets.
Sending rate: TCP/TFRC aim to send at the highest possible rate. This is why they constantly increase the sending rate (using various laws, which differ from one TCP variant to another). VAAL aims at that too, but does not increase the quality if ZAAL considers, for zigzag/loss avoidance as shown above, that the next higher quality should not be tried at this time. Thus VAAL can be seen as a quality increasing/decreasing proposition and ZAAL as a quality increasing-only blocker.
This comparison shows that classical congestion control algorithms have different goals, hence they are not appropriate to solve the zigzag problem.
4 Zigzag-Avoiding Algorithm Overview
To solve the zigzag quality switching generated by a video adaptation method, an additional specific algorithm can be used, such as ZAAL. The ZAAL algorithm works by avoiding constantly using bitrates higher than the available bandwidth. For that, it maintains an average value for each bitrate, called successfulness in the following. When the adaptive algorithm considers increasing the bitrate (and only in this case), ZAAL checks if the successfulness of the higher bitrate is lower than a threshold, called β; if this is the case, the higher bitrate cannot be chosen. In other words, the application uses a higher bitrate i only if its successfulness Si > β. After this process, the average successfulness is updated. Note that ZAAL is not an adaptation method. It is used only to prevent an adaptation method from frequently switching the video quality. The only information ZAAL needs comes from the adaptation method, namely whether some bitrate causes lost packets or not.
4.1 Algorithm
The ZAAL algorithm uses the successfulness value each time an adaptation period ends; the period depends on the adaptation algorithm (e.g. 2 sec. in the case of VAAL). The average successfulness value is calculated separately for each bitrate (with different weights) and is denoted by Si; it indicates whether the bitrate of index i can be used for the next period or not. As such, this average value expresses the application's last attempts to use the corresponding bitrate. In brief, when a bitrate generates failed packets to the transport protocol buffer, the corresponding successfulness
value is greatly reduced; when a bitrate was successful, the successfulness value is greatly increased; finally, when a bitrate is used for a long time, the successfulness values corresponding to higher bitrates are slowly increased. As a general rule, the smaller the successfulness average value, the more the corresponding bitrate causes failed packets, and the application must avoid using it. The average successfulness Si (where i is the bitrate index) of each bitrate, which changes according to the history of that bitrate, is calculated using an EWMA (Exponential Weighted Moving Average) algorithm. Using EWMA allows giving greater weight to recent history compared to older history, since current bandwidth is obviously better expressed by recent bitrate usage than by older ones. Additionally, different weights are used, based on the bitrate involved. At the beginning of a video transmission, all S values are set to 1. Then they are calculated each time the application wants to adapt the video bitrate to the available bandwidth, using the following general formula:
Si = (1 − α/d) · Si + s · (α/d)   (3)
where:
– s is the successfulness at the time of measurement (the current “observation”) and can be either 0 or 1. The value 0 is used when the bitrate did not give good results (hence its successfulness average value will decrease). The value 1 is used when the bitrate gave good results, either because the bitrate did not cause failed written packets, or because the bitrate was not used recently (hence its successfulness average value will increase).
– α is the degree of weighting increase/decrease, a constant smoothing factor between 0 and 1. A higher α discounts older successfulness values faster.
– d is a division factor that speeds up or slows down the increase of the average value, depending on the bitrate involved. In our algorithm, d takes three values: 1, 2 and 4. For s = 1, the greater the d, the slower the increase of the average Si value. Its use is detailed below.
S values are real numbers between 0 and 1. They can change in three cases:
1. First, when the application increases the video bitrate, i.e. the new bitrate k is higher than the current one j. This happens when the application has not sensed lost packets for a while with the current video quality. In this case S should increase (s = 1) for all bitrates i lower than or equal to the current bitrate, and the increase must be fast (d = 1). So: Si|i≤j = (1 − α)Si + α
(4)
2. Second, when the application maintains the bitrate. This happens either when some packet losses occur but their rate is acceptable for maintaining the current quality j, or when the successfulness of the higher bitrate k (Sk) is low and the application avoids using it. The S value increases for all qualities, but with different division factor values (i.e. different speeds). So:
– d = 1 and s = 1 (big increase) for all bitrates i lower than the current one j:
Si|i<j = (1 − α)Si + α

3. A higher bitrate k is retried only when its successfulness Sk climbs back above the threshold β, set to 0.7. Using equation 7, this corresponds to 7 unsuccessful attempts by the application for the higher bitrate k, followed by an 8th successful attempt. Fig. 2 shows the evolution of this value: there are 7 unsuccessful attempts between 0.49 and a value greater than 0.7.
– S values between 0.7 and 1 are possible only when the bitrate does not cause failed packets, in two cases: it increases slowly beyond 0.7 when the quality is stable (see equation 6), or it increases quickly beyond 0.7 when higher qualities are possible (see equation 5).
4. Finally, using an exponential average rather than a linear average makes it possible to remember past values for a longer time when the average values S increase, and to give more weight to recent history than to old history.
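To make the update and the gating decision concrete, the following sketch implements the general formula (3) and the Si > β check described above. It is only an illustration: the parameter values (α, β) and the division factor used for the failure case are assumptions, not the authors' implementation.

```python
# Illustrative sketch of the ZAAL successfulness bookkeeping; alpha, beta and
# the division factor of the failure case are assumed values.
ALPHA = 0.3   # EWMA smoothing factor (assumed)
BETA = 0.7    # threshold below which a higher bitrate is not tried

class Zaal:
    def __init__(self, nb_bitrates):
        # At the beginning of a transmission all successfulness values are 1.
        self.S = [1.0] * nb_bitrates

    def _ewma(self, i, s, d):
        # General formula (3): Si = (1 - alpha/d) * Si + s * (alpha/d)
        w = ALPHA / d
        self.S[i] = (1.0 - w) * self.S[i] + s * w

    def allow_increase(self, k):
        # The adaptation method may switch to the higher bitrate k only if
        # its successfulness exceeds the threshold beta.
        return self.S[k] > BETA

    def on_bitrate_increased(self, j):
        # Case 1: fast increase (s = 1, d = 1) for all bitrates i <= j (eq. 4).
        for i in range(j + 1):
            self._ewma(i, s=1, d=1)

    def on_bitrate_failed(self, j):
        # A bitrate that produced failed packet writes is penalised strongly
        # (s = 0; d = 1 assumed here).
        self._ewma(j, s=0, d=1)
```

The second case of Section 4.1 (bitrate maintained) would call the same _ewma helper with the larger division factors d = 2 and d = 4, so that the successfulness of the current and of the higher bitrates grows more slowly.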
Fig. 2. Successfulness average increasing during the biggest period of time
4.3
ZAAL Complexity
– ZAAL is a simple and easy to implement algorithm. Indeed, ZAAL consists of 3 if clauses with a loop over all bitrates inside each clause.
– ZAAL is scalable, since it has a linear complexity in the number of bitrates (a loop over all bitrates), and also in the number of clients (ZAAL is executed independently for each of them). Moreover, it has a negligible execution time. Indeed, it needs to calculate just one value per available bitrate during each adaptation period (e.g. each 2 sec.), using a simple formula, as shown above. Hence, using ZAAL does not really affect the server processing capacity when the number of clients increases. The small additional impact of ZAAL can be easily compensated by its other positive side (i.e. when using ZAAL, the server avoids processing packets which would otherwise be lost on the network).
5
Experimental Results
Fig. 3 shows the real network used to carry out the experiments. A video streaming connection is made between a sender and a receiver, both with a wired interface. An intermediate machine (called shaping machine in the following) is added between
Fig. 3. Network topology used for experiments
the sender and the receiver. This shaping machine has two interface cards: a wired interface connected to the sender and a wireless one (54Mb/s) connected to an access point (AP). The receiver is connected to the same AP via a wired interface. The video streaming uses a real video, available in four qualities: 3Mb/s, 2Mb/s, 1Mb/s and 512kb/s. The video lasts 180 s. Two series of tests are done. For the first series (traffic shaping series), just one flow is present at any moment during the video transmission, which can thus use all the available bandwidth. On the other hand, the output bandwidth of the shaping machine is changed three times to simulate a variable bandwidth: 600kb/s for the first minute, 2300kb/s for the second minute, and traffic shaping was stopped for the last minute (hence the output bandwidth is the original wireless bandwidth). This series makes it possible to see the effect of ZAAL in a network where bandwidth changes over time. For the second series, ten flows are present during the whole transmission, which clearly exceed the available bandwidth when using the highest bitrate. This makes it possible to see what happens when multiple flows sense the available bandwidth, especially to check if this leads to a wide oscillation in performance (i.e. if some flows monopolise the entire bandwidth). All the flows use VAAL [12] as the adaptation algorithm. Additionally, either all the flows use ZAAL, or none of them. This section presents two results: ZAAL avoids zigzag quality switching, and using ZAAL leads to similar quantitative results. 5.1
Zigzag Avoidance
In this section we present the quality variation with and without ZAAL. In the following figures, the x-axis represents the time from 0 to 180s, the duration of a video transmission, and the y-axis shows the video bitrate (with or without ZAAL). One Flow in Case of Traffic Shaping. As expected, ZAAL minimises the zigzag effect: during the first minute in Fig. 4(a) (where ZAAL is not used), the video bitrate is continuously toggling between 0.5 and 1Mb/s, while when ZAAL is used, in Fig. 4(b), the application uses the 0.5Mb/s bitrate most of the time. The same conclusions can be drawn from the second minute. During the last minute, the bandwidth is wide enough to support 3Mb/s, and ZAAL finally allows this bitrate. Further analysis of Fig. 4(b) (with ZAAL) confirms the useful properties of the ZAAL algorithm. First, the application waits for at least 2 periods of time before retrying a bitrate which has recently caused losses (e.g. at the beginning, 1Mb/s did not work between 0s and 2s, hence it was retried not at 4s, but at 6s). Second, when a bitrate causes losses for many consecutive times, the application waits more
(a) without ZAAL: many zigzags occur
(b) with ZAAL: few zigzags occur
Fig. 4. Quality adaptation for one flow in case of traffic shaping, under the same network conditions
and more time to retry it (e.g. 1Mb/s at 24s, then at 32s, afterwards at 46s, and finally at 64s). Third, the maximum period during which the video quality was prevented from increasing is 14s, as shown in section 4.2, point 3 (e.g. the first unsuccessful attempt at 10s followed by the next successful attempt at 24s, the same for 50s and 64s); during that time there were no losses (the bandwidth was much higher than the bitrate), yet ZAAL correctly prevented the bitrate from increasing. More specifically, Tab. 1 presents the number of zigzags during the first and the second minute (there is no difference during the third minute). It is clear that ZAAL leads to much fewer zigzags (4 against 13 in the first minute, 1 against 10 in the second minute, so about 80% less in total). Naturally, this has a very big impact on the perceived video quality. Ten Flows. In this experiment, we noticed that all flows have the same tendency when choosing video bitrates. Fig. 5 shows the tendency of one flow out of the ten concurrent flows. We can see that the bitrate changes often between 1Mb/s and 2Mb/s, which indicates that one flow does not stay all the time at a high bitrate (e.g. 3Mb/s); if this happened, it would reduce the available bandwidth for the other flows. Also, as in the previous test, the application often adapts the video quality while avoiding bitrates causing lost packets (e.g. in Fig. 5, the 3Mb/s bitrate causes lost
Table 1. Number of zigzags with and without ZAAL
Method         First minute   Second minute   Total
Without ZAAL   13             10              24
With ZAAL      4              1               5
packets at 26s, then it is not used until 68s). At the same time, it avoids frequent changes in the video bitrate during transmission while improving the use of the bandwidth. This test also checks the fairness when using ZAAL for ten concurrent flows. Fig. 6 shows the percentage of packets sent by each flow at the application layer when using ZAAL. It shows that ZAAL maintains the fairness among concurrent flows on the server (i.e. all flows have a nearly equal percentage of sent packets). On the other hand, the fairness on the network is guaranteed by using a TCP-friendly congestion control. 5.2
Performance Comparison
Even if reducing the packet loss rate is not the main goal of ZAAL, we investigate how ZAAL affects it. We consider the number of received packets and the number of lost packets. Video quality over a network transmission is affected more by packet losses than by the video bitrate value. Table 2 presents numerical results of the same experiments. For one flow in the case of traffic shaping, it is clear that the number of received packets is lower when ZAAL is used (e.g. only 40263 received packets with ZAAL compared to 42043 without ZAAL), because ZAAL prevents high bitrates for some periods. On the other hand, when ZAAL is used the flow has 30% fewer lost packets (under the same network conditions). The average of the ten concurrent flows gives even better results for ZAAL, i.e. the number of received packets is about 5% higher in the case of ZAAL, and the loss rate is about 50% smaller too. To summarize:
– in the first experiment, it cannot be easily decided which is better in terms of the number of sent and received packets, but using ZAAL is more useful because it avoids the zigzag and leads to a more stable video quality;
– in the second experiment, ZAAL is better in terms of sent and received packets, while avoiding the zigzag at the same time.
Table 2. Number of sent and received packets (average of all flows) with and without ZAAL
               Traffic shaping                            10 concurrent flows
Method         Sent pkts   Received pkts   Lost pkts      Sent pkts   Received pkts   Lost pkts
Without ZAAL   47795       42043           5752 (12%)     41191       32307           8884 (21.6%)
With ZAAL      43913       40263           3650 (8.4%)    38105       33889           4216 (11%)
Fig. 5. Quality adaptation with ZAAL for flow number 1 in ten concurrent flows test: few zigzags occur
Fig. 6. Percentage of sent packets by each flow at the application layer using ZAAL: the percentages are nearly equal
We can conclude that ZAAL is better: even if its number of received packets is sometimes lower, it reduces the rate of lost packets while maximising the use of the bandwidth.
6
Conclusions and Perspectives
This article has presented a simple new method to avoid the undesirable constant switching in quality occurring during video streaming with content adaptation. It is a general solution, since it can be integrated into any adaptation method, no matter whether the video is encoded in multiple qualities or in multiple layers. The proposed solution uses a history of quality successfulness to infer whether a quality should be chosen at a given time. It is a non-intrusive method, i.e. it does not change the video transmission and does not need feedback from the network. Experiments confirm that with our solution the problem of zigzag quality switching appears very rarely, without much influence on the video throughput. A perspective of our work is to consider a hybrid solution, which uses a bandwidth estimation method (to decide whether or not to increase the quality) when the maximum period of non-increasing with our method is reached. However,
a very interesting future work is to analyse a more general algorithm with VAAL/ ZAAL constraints, but applicable to the whole class of network congestion control methods.
References 1. Carter, R.L., Crovella, M.E.: Measuring bottleneck link speed in packet-switched networks. Performance Evaluation 27-28, 297–318 (1996) 2. Dovrolis, C., Ramanathan, P., Moore, D.: Packet-dispersion techniques and a capacity-estimation methodology. IEEE/ACM Transactions on Networking 12, 963–977 (2004) 3. Feamster, N., Bansal, D., Balakrishnan, H.: On the interactions between layered quality adaptation and congestion control for streaming video. In: 11th International Packet Video Workshop, Kyongiu, Korea (April 2001) 4. Feng, W.-c.: On the efficacy of quality, frame rate, and buffer management for video streaming across best-effort networks. Journal of High Speed Networks 11, 199–214 (2002) 5. Floyd, S., Handley, M., Padhye, J., Widmer, J.: TCP Friendly Rate Control (TFRC): Protocol specification, RFC 5348 (September 2008) 6. ITU-T. Opinion model for video-telephony applications (April 2007) 7. Jacobson, V.: Congestion avoidance and control. SIGCOMM Computer Communication Review 18, 314–329 (1988) 8. Jain, R., Routhier, S.A.: Packet trains: Measurements and a new model for computer network traffic. IEEE Journal on Selected Areas in Communications 4, 986– 995 (1986) 9. Keshav, S.: Packet-pair flow control. IEEE/ACM Transactions on Networking (February 1995) 10. McCanne, S., Van Jacobson, Vetterli, M.: Receiver-driven layered multicast. SIGCOMM Computer Communication Review 26, 117–130 (1996) 11. Postel, J.: Transmission control protocol, RFC 793 (September 1991) 12. Ramadan, W., Dedu, E., Bourgeois, J.: VAAL, video adaptation at application layer and experiments using DCCP. In: WPMC, 13th Int. Symposium on Wireless Personal Multimedia Communications, Recife, Brazil, pp. 1–5 (October 2010)
A Novel Attention-Based Keyframe Detection Method Huang-Chia Shih Department of Electrical Engineering, Yuan Ze University, Taoyuan, Taiwan, R.O.C. [email protected]
Abstract. This paper presents a novel key-frame detection method that combines the visual saliency-based attention features with the contextual game status information for sports videos. First, it describes the approach of extracting the object-oriented visual attention map and illustrates the algorithm for determining the contextual excitement curve. Second, it presents the fusion methodology of visual and contextual attention analysis based on the characteristics of human excitement. The number of key-frames was successfully determined by applying the contextual attention score, while the key-frame selection depends on the combination of all the visual attention scores with bias. Finally, the experimental results demonstrate the efficiency and the robustness of our system by means of some baseball game videos. Keywords: key-frame detection, content analysis, contextual modeling, visual attention model, content-based video retrieval.
1 Introduction The need to access the most representative information and to reduce its transmission cost has made the development of video indexing and suitable retrieval mechanisms popular research topics [1]-[3]. One effective approach is the use of key-frames, i.e. a very small subset of frames representing the whole video. Many multimedia retrieval systems have been proposed [4], [5]. Most of them are based on key selections (e.g., key-frame and key shot) as an index for users to select and browse. From a content semantics point of view, video content can be divided into four categories based on their semantic significance, including video clip, object, action, and conclusion. A key-frame not only represents the entire video content, but also reflects the amount of human attention the video attracts. There are two clues that help us to understand the viewer’s attention: the visual cue and the contextual information. Visual information is the most intuitive feature for the human perception system. Modeling the visual attention when watching a video [6] provides a good understanding of the video content. In intelligent video applications, visual attention modeling [7] combines several representative feature models into a single saliency map which is then allocated to those regions that are of interest to the user. The saliency feature map can be used as an indication of the attention level given to the key-frame. On the other hand, in sports videos, the game status is the information that concerns the subscriber
the most. The embedded captions on the screen represent condensed key contextual information of the video content. Taking advantage of prior implicit knowledge about sports videos, Shih et al. [8] proposed an automatic system that can understand and extract context that can then be used to monitor an event. The number of key-frames in a shot is one of the most important issues in the key-frame extraction scheme. Whether it is predefined or chosen dynamically, either way it is affected by the video content. In fact, frames in a shot that undergo a strong visual and temporal uncertainty complicate the problem even more. To balance this, we let the number of key-frames depend on the level of excitement of the context. This means that when the excitement in the on-going game is high, more key-frames should be extracted from that video shot. We propose a novel attention-based key-frame detection system by integrating the object-oriented visual attention maps and the contextual on-going game outcomes. Using an object-based visual attention model combined with a contextual attention model not only precisely determines the human perceptual characteristics but also effectively determines the type of video content that attracts the attention of the user.
2 Visual Attention In human-attention-driven applications, video object (VO) extraction becomes necessary. The visual attention characteristics of every VO can be computed by using saliency-based feature map extraction. We modified the frame-based visual attention model [7] into an object-based approach, called the object-based visual attention model, which provides more accurate information to the viewer. Based on the types of features, the object-based visual attention model can be determined by four types of feature maps, i.e., spatial, temporal (local), facial, and (global) camera motion. 2.1
Spatial Feature Maps
We selected three spatial features to be extracted and used as the spatial feature map: intensity, color contrast (red-green, blue-yellow), and orientation. The intensity image I was obtained as I = (r + g + b)/3, where r, g, and b indicate the red, green, and blue channels of the input image, respectively. Four broadly-tuned color channels were obtained: R = r − (g + b)/2 for red, G = g − (r + b)/2 for green, B = b − (r + g)/2 for blue, and Y = (r + g)/2 − |r − g|/2 − b for yellow, with negative values set to zero. The local orientation feature was acquired from I using the oriented Gabor pyramids O(σ, θ), where σ represents the scale and θ ∈ {0°, 45°, 90°, 135°}. The spatial feature map FMspatial is then defined by the weighted sum of these three conspicuity maps.
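The channel definitions above translate directly into code. The following NumPy sketch (the implementation language is our assumption; the paper does not state one) computes the intensity image and the four broadly-tuned color channels; the Gabor orientation pyramid and the weighted combination into FMspatial are omitted.

```python
import numpy as np

def spatial_channels(rgb):
    """Intensity and broadly-tuned color channels of Section 2.1.
    rgb: float array of shape (H, W, 3) holding the r, g, b planes."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    I = (r + g + b) / 3.0
    R = r - (g + b) / 2.0                        # red
    G = g - (r + b) / 2.0                        # green
    B = b - (r + g) / 2.0                        # blue
    Y = (r + g) / 2.0 - np.abs(r - g) / 2.0 - b  # yellow
    # Negative responses are clipped to zero, following the saliency model [7].
    R, G, B, Y = (np.clip(c, 0.0, None) for c in (R, G, B, Y))
    return I, R, G, B, Y
```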
2.2 Temporal Feature Maps
The motion activity (MA) was computed for each frame with W×H macroblocks. The temporal feature map FMtemporal was integrated with the neighboring MAs within the object boundary which implies the moving energy of the object and reflects the information regarding the texture of the object. Consequently, the MAs in the background are ignored because they attract little attention.
2.3
Facial Feature Maps
In this study we adopted skin color instead of a face detection scheme because it requires less time. Empirically, we set the range of the skin tone in the (r, g, b) color space based on the following criteria to obtain the most satisfactory results: (1) r > 90, (2) (r−g) ∈ [10, 70], (3) (r−b) ∈ [24, 112], (4) (g−b) ∈ [0, 70]. When using the skin tone region it is sometimes not only the face, but also the field soil that is detected. However, if we employed a frame-based scheme, it would cause many unexpected noises such as a cluttered background, the faces of the audience, and leaves moving about. Thus using the skin tone region to represent the facial feature map provides satisfactory results. The facial feature map FMfacial can be described as the probability that each pixel belongs to the skin-color tone.
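The four skin-tone criteria can be evaluated as a simple per-pixel mask, for instance as in the sketch below (NumPy assumed).

```python
import numpy as np

def skin_mask(rgb):
    """Boolean mask of pixels satisfying criteria (1)-(4) of Section 2.3."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    return ((r > 90)
            & (r - g >= 10) & (r - g <= 70)
            & (r - b >= 24) & (r - b <= 112)
            & (g - b >= 0) & (g - b <= 70))
```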
2.4 Global Camera Motion
Based on cinematic models, photographers control the viewpoint, the focus, and the level of detail of various on-going events and situations. When a highlight occurs, they track the key object for a while. Usually they move the camera or change the focal length. Thus, the global camera motion is very important information for inferring excitement. In this paper we take the camera motion into consideration for computing the visual attention and replace the time-consuming 2-D calculation with two 1-D calculations by projecting the luminance values in the vertical and horizontal directions. Let us suppose a frame of size M-by-N. We use a slice-based approach to find the vertical and horizontal displacement vectors (DVV and DVH) for each pair of consecutive frames, and then simply adopt the sum of the normalized Euclidean norms of these two displacement vectors to represent the camera motion characteristics Mcamera, which can be expressed as
Mcamera(fi) = ρ (||DVV||/M + ||DVH||/N),
(1)
where ρ denotes the size of the sliding window.
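The paper does not detail the slice-based matching step, so the sketch below is one plausible reading (NumPy assumed): the luminance of each frame is projected onto the vertical and horizontal axes, one displacement is estimated per slice of length ρ by exhaustive shift matching, and equation (1) combines the two displacement vectors.

```python
import numpy as np

def projections(luma):
    # 1-D projections of the luminance values (vertical, horizontal).
    return luma.mean(axis=1), luma.mean(axis=0)

def slice_displacements(p_prev, p_cur, win, max_shift=8):
    """One displacement per slice of length `win`, chosen as the shift that
    minimises the matching error between consecutive projections."""
    disp = []
    for start in range(0, len(p_cur) - win + 1, win):
        ref = p_cur[start:start + win]
        best_s, best_e = 0, np.inf
        for s in range(-max_shift, max_shift + 1):
            lo = start + s
            if lo < 0 or lo + win > len(p_prev):
                continue
            err = np.abs(ref - p_prev[lo:lo + win]).sum()
            if err < best_e:
                best_s, best_e = s, err
        disp.append(best_s)
    return np.asarray(disp, dtype=float)

def camera_motion(luma_prev, luma_cur, rho):
    """Equation (1): rho * (||DVV||/M + ||DVH||/N) for an M-by-N frame."""
    M, N = luma_cur.shape
    v0, h0 = projections(luma_prev)
    v1, h1 = projections(luma_cur)
    DVV = slice_displacements(v0, v1, rho)
    DVH = slice_displacements(h0, h1, rho)
    return rho * (np.linalg.norm(DVV) / M + np.linalg.norm(DVH) / N)
```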
3
Contextual Attention
The contextual attention is defined as the probability of a user’s interest in the specific context vector (game status). The outcome statistics in a sports video are usually shown on the screen in a superimposed caption box (SCB). The information shown in the SCB allows us to determine the contextual excitement. 3.1
SCB Extraction and Modeling
It is obvious that the SCB is always stationary; if it is not, it is locally dynamic rather than perceivably varying. In this paper, we combine color-based local dynamics and temporal motion consistency to locate the SCB from a group of frames (GoF). It is reasonable to assume that there is high color correlation and low motion activity within the SCB region.
To generate the SCB color model, the Hue-Saturation-Intensity (HSI) color components of each pixel within the SCB mask are coded into 4 bits, 2 bits and 2 bits, respectively. The color model for the SCB in frame k can be expressed as
hk(j) = Σi δ( LHSI(Hik, Sik, Iik) − j ),  for 0 ≤ j ≤ 255,    (2)

where Hik, Sik, and Iik are the three color components of pixel i in frame k, δ(i − j) = 1 for i = j and δ(i − j) = 0 otherwise, and LHSI(.) represents the color mapping function which converts the 3-D color components (H, S, I) to a 1-D color index. The representative SCB 1-D color histogram hR is created by averaging the histograms of the segmented SCBs in the GoF as

hR(j) = (1/Ng) Σ k=1..Ng hk(j),    (3)

where Ng is the number of frames in the GoF.
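A possible realisation of the 4-2-2 bit colour coding and of equations (2)-(3) is sketched below; the bit ordering of the packed index and the [0, 1] scaling of the HSI components are assumptions.

```python
import numpy as np

def hsi_index(h, s, i):
    """L_HSI: map HSI components scaled to [0, 1] to a 1-D index (4+2+2 bits)."""
    hq = np.minimum((h * 16).astype(int), 15)   # 4 bits for hue
    sq = np.minimum((s * 4).astype(int), 3)     # 2 bits for saturation
    iq = np.minimum((i * 4).astype(int), 3)     # 2 bits for intensity
    return (hq << 4) | (sq << 2) | iq           # index in 0..255

def scb_histogram(h, s, i, scb_mask):
    """Colour histogram h_k of equation (2), computed over the SCB mask."""
    idx = hsi_index(h, s, i)[scb_mask]
    return np.bincount(idx, minlength=256).astype(float)

def reference_histogram(histograms):
    """Representative histogram h_R of equation (3): average over the GoF."""
    return np.mean(histograms, axis=0)
```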
3.2 Caption Template Construction
Due to the limited image resolution, it is a nontrivial problem to achieve a good character recognition rate. To recognize the characters, we use a simple multi-class SVM classifier method [10] in which the feature vector of the character block consists of Area, Centroid, Convex Area, Eccentricity, Equivalent Diameter, Euler Number, Orientation, and Solidity, and the size of the character block is normalized to 30×40. Once a new character is recognized, we can generate a specific caption template defined as
CTi = [Pos, text/digit, SV,CB],
(4)
where Pos indicates the relative location in the SCB, the text/digit bit indicates whether the caption is text or digit, SV indicates a set of support vectors for this character, and CB is the corresponding character block. The caption template can be used to identify the characters in the succeeding videos.

Table 1. The reachable contextual information of the SCB
Context   Annotation   Description                              #States
INNS      λ1           The current innings                      9
RUNNERS   λ2           The bases that are occupied by runners   8
RUNS      λ3           The score difference                     ns ∈ Z
OUTS      λ4           The number of outs                       3
BALLS     λ5           The number of balls                      4
STRIKES   λ6           The number of strikes                    3
3.3
Modeling the Contextual Attention
Generally speaking, contextual attention is defined as the probability of a user’s interest in a specific game status, which can be formulated with a context vector. However, different domains contain different linguistic information. Thus, it is hard to use a generic model for all types of video data, since a great deal of domain-specific features are involved. Different from visual attention, which varies frame-by-frame, contextual attention varies on a shot or event basis. Unfortunately, we were unable to obtain all of the statistical information from the video frames. Therefore, in this paper, we not only adopted the available information from the SCB, but also employed the historical statistics data. Normally, the contextual information embedded in the SCB consists of six classes (Λ={λi | i=1,2,…,6}) for the baseball game, as listed in Table 1.
3.3.1 Implicit Factors
In this paper, the contextual information is divided into three classes. These three classes are based on the relationship between the value of the context and the degree to which it excites the viewer: proportionally, specifically, or inversely. After classifying the contextual description, a group of implicit factors {Fl | l=1,2,…,l′,…,L−1,L} is used to model the human excitement characteristics. Then, each of these implicit factors can be classified as one of the three classes as follows:
Ω(Fl′) ∈ {ωp, ωs, ωi}
(5)
where Ω(.) denotes the classifier, and ωp, ωs, and ωi represent the corresponding factor as a proportional type, a specific type, or an inverse type, respectively. The model not only uses the implicit factors at the current moment (i.e., Ft), but also takes into account the implicit factors from the historic statistics (i.e., F0:t−1). Let ψct(fi) indicate the viewer’s contextual attention score of frame fi, which is contributed by the implicit factors Ft = {Fk | k=1,2,…,K}, consisting of all the probable annotations of the SCB at that moment. Also, the historical statistics are taken into consideration, in which the implicit factors from the historical statistics are represented by F0:t−1 = {Fq | q=1,2,…,Q}. Thus
ψct(fi) = Ft + F0:t−1.
(6)
Four implicit factors from the current observation are considered in determining the contextual attention score, which include
F1t: The score difference. The scoring gap between the two teams greatly attracts the attention of the viewer. When the runs scored are the same or very close, it indicates that the game is very intense. Therefore, this factor can be formulated as
F1t =exp(-α1λ3),
(7)
where α1 denotes the normalization term
F2t: The number of BALLS and STRIKES pairs. The ratio between BALLS and STRIKES can be applied to model user attention. In a broadcast baseball game, the number of balls is repeatedly updated from 0 to 3 and the numbers of strikes and outs are updated from 0 to 2. When λ5 or λ6 reaches 3 or 2, respectively, it indicates that the game is getting a high level of attention, because the current player will soon strike out or walk. Thus,
F2t = exp(λ5-3)*exp(λ6-2).
(8)
F3t: The number of the inning being played. Generally speaking, the fewer the remaining innings, the more attention the game will attract. Therefore, we can design an implicit factor from the number of the inning λ1. In general, the maximum number of innings is 9. Thus, this factor can be expressed as F3t = exp(λ1 − 9).
(9)
F4t : The number of outs Obviously, the higher the number of OUTS, the higher the amount of excitement, and thus the higher the attention that is given. This implicit factor can be written as F4t = exp(λ4-2).
(10)
Here, we are also concerned with the past statistical data. For baseball games, a lot of fans and researchers like to analyze the relationship between the game’s statistics and the probability of scoring points. According to Bill James [11] a baseball writer, historian and statistician whose achievement has been widely influential on the baseball field as well as the field of statistics, the game of baseball is one of the most statistical games in sports. In baseball, players are identified and evaluated by their corresponding hitting and pitching statistics. Jim Albert [12] collected case studies and applied statistical and probabilistic thinking to the game of baseball. He found that there is an active effort by people in the baseball community to learn more about baseball performance and strategy through the use of statistics.
Table 2. Expected runs scored in the remainder of the innings based on all combination pairs
#OUTS \ RUNNERS   None on   1st    2nd    3rd    1st, 2nd   1st, 3rd   2nd, 3rd   Bases loaded
0                 0.49      0.85   1.11   1.30   1.39       1.62       1.76       2.15
1                 0.27      0.51   0.68   0.94   0.86       1.11       1.32       1.39
2                 0.10      0.23   0.31   0.38   0.42       0.48       0.52       0.65
The on-base situation usually attracts much user interest. In addition, coaches tend to change their strategy depending on the situation so as to score as many runs as possible in the remainder of the inning. In [13], the authors conducted a thorough statistical analysis of the data from the National League for the 1987 season, the play-by-play data of which is downloadable from [15]. They archived the statistics of the expected runs scored in the remaining inning for each of the 24 possible pairs (λ2, λ4), as shown in Table 2. They then estimated the possible scoring under different base-occupied and number-of-outs scenarios using the historic game statistics. Hence, based on these statistics, the implicit factor was then adjusted by using the weighted sum of the past attention score given λ4, i.e., the pre-trained probability p(ψ0:t−1|λ4), and the expected runs scored by looking up Table 2.
F1^(0:t−1) = β1 p(ψ^(0:t−1) | λ4) × β2 LUT(λ2, λ4),    (11)
where β1 and β2 indicate the weighting factors, and LUT(.) denotes the look-up-table function which will be normalized by the maximal value for the corresponding situation of OUTS.
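The implicit factors of equations (7)-(11) and their combination into the contextual attention score of equation (6) can be sketched as follows. The weights α1, β1, β2 and the pre-trained probability p(ψ0:t−1|λ4) are assumed placeholder values, the runner-state lookup is reduced to two states for brevity, and the expected-runs entries shown are taken from Table 2.

```python
import math

# Expected runs from Table 2, indexed by (runner state, number of outs);
# only two of the eight runner states are shown here.
EXPECTED_RUNS = {
    ("none on", 0): 0.49, ("none on", 1): 0.27, ("none on", 2): 0.10,
    ("bases loaded", 0): 2.15, ("bases loaded", 1): 1.39, ("bases loaded", 2): 0.65,
}

def contextual_attention(l1, l2, l3, l4, l5, l6,
                         alpha1=1.0, beta1=0.5, beta2=0.5, p_past=1.0):
    """Contextual attention score: sum of the current implicit factors
    (equations 7-10) and the historical factor (equation 11).
    l1..l6 are the SCB annotations of Table 1."""
    f1 = math.exp(-alpha1 * l3)               # score difference, eq. (7)
    f2 = math.exp(l5 - 3) * math.exp(l6 - 2)  # balls and strikes, eq. (8)
    f3 = math.exp(l1 - 9)                     # inning number, eq. (9)
    f4 = math.exp(l4 - 2)                     # number of outs, eq. (10)
    # Equation (11): the look-up value is normalised by the maximum expected
    # runs for the same number of outs.
    lut = EXPECTED_RUNS[(l2, l4)]
    lut_max = max(v for (run, out), v in EXPECTED_RUNS.items() if out == l4)
    f_hist = beta1 * p_past * beta2 * (lut / lut_max)
    return f1 + f2 + f3 + f4 + f_hist
```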
4 Keyframe Detection In this paper, we adopted the contextual attention score to determine the optimal number of key-frames for each shot, while the key-frames themselves are determined based on the visual attention score. 4.1
Key Shot Selection
Prior to determining the key-frames, the key shots are quickly selected using the information from the SCB template. There are two different kinds of key shot: the SCB-appeared key shot and the content-changed key shot. The former is defined as the shot taken whenever the SCB appears, whereas the latter is taken when the SCB content changes. In this paper we assume that the SCB template does not change during the entire video program, so that we may use the SCB color model to identify its presence, even though the SCB may be transparent. The similarity measure between the model hR
and the potential SCB is formulated by the Mahalanobis distance as d(hiMscb, hR) = (hiMscb − hR)^T Cm^(−1) (hiMscb − hR), where hiMscb denotes the color distribution of the potential SCB region in the ith frame. If a shot is detected that may change the game status, or if the SCB appears, then that shot is selected as a key shot.
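The SCB presence test can then be sketched as below; the covariance matrix Cm and the decision threshold are not given in the paper and are assumed inputs here.

```python
import numpy as np

def scb_distance(h_candidate, h_ref, cov):
    """Mahalanobis distance between a candidate SCB colour histogram and hR."""
    diff = h_candidate - h_ref
    return float(diff @ np.linalg.inv(cov) @ diff)

def scb_present(h_candidate, h_ref, cov, threshold):
    # A small distance means the SCB colour model matches this region.
    return scb_distance(h_candidate, h_ref, cov) < threshold
```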
4.2 Key-Frame Rate Determination
As we know, the number of frames selected as key-frames within a shot depends on the importance of the shot. In addition, the contextual information about the game is normally represented in shot-based units. Therefore, it is appropriate to determine the key-frame rate based on the contextual attention score. In this paper, the contextual attention score is obtained by combining all the implicit factors, which to a degree reflect the excitement indicated by the different context combinations. Suppose that Tr is the predefined percentage of the accumulated attention score required for determining a key-frame, and that shot i contains a total of Ns frames; then shot i’s key-frame rate Ri can be computed from

Ri = arg min over R̂i { Σ j=1..R̂i ψct(fj) ≥ ( Σ k=1..Ns ψct(fk) ) × Tr% }.    (12)
4.3
Key-Frame Selection
The frame-level attention score can be quantitatively measured by means of the object-based visual attention score, which is defined as the combination of all the visual attention scores with bias. Basically, the key-frame selection is based on two rules: (1) key-frames must be visually significant, and (2) key-frames must be temporally representative. It is obvious that combining all attention feature maps can meet the first rule. In the present study we adopted the camera motion characteristics and treated them as the balancing coefficient to support the second rule. A numerical value was then derived from the visual characteristics of all segmented objects within the frame. The denominator of each part is the normalization term, which is the sum of the attention maps for all frames belonging to the same video shot. Based on the predefined number of key-frames, the Ri key-frames {Fk*} with the largest visual attention score ψtv are selected as follows:

Fk* = ∪ i=1..Ri { arg max fi [ ψtv(fi) × Mcamera(fi) ] },    (13)

where

ψtv(fi) = γ1 × FMspatial(fi) / Σ j=1..Ns FMspatial(fj) + γ2 × FMtemporal(fi) / Σ j=1..Ns FMtemporal(fj) + γ3 × FMfacial(fi) / Σ j=1..Ns FMfacial(fj),    (14)

where γ1, γ2, and γ3 denote the weighting coefficients among the visual feature maps.
5
Experimental Results
Our proposed object-based attention model is used as an effective scheme to measure the attention score. Table 3 shows the attention scores of the different modules for the six representative frames shown in Fig. 1. The values in each column indicate the attention score, and accurately reflect the attention to the content via the spatial, temporal, facial and global motion cues. Frames #650 and #1716 are globally static, so they have low temporal attention scores and low camera motion, resulting in decreased visual attention. Frame #931 zooms in for a close-up, but the object itself is stable, resulting in a high visual attention. Frame #4001 is a mid-distance view with local motion and the camera zooming. However, the face it zooms into is clear and near the center of the frame, resulting in a high attention score. Frame #3450 is a mid-distance view with middle face attention, and with the camera panning, which also increases attention. Frame #573 has high attention due to the rapid panning to capture the pitcher throwing the ball.
Table 3. The examples with different attention scores
#Frame   Integral Visual Attention   Spatial   Temporal   Facial   Camera Motion (DVV, DVH)
573      0.243                       0.053     0.688      0.107    144, 241
650      0.093                       0.051     0.110      0.039    0, 11
931      0.691                       0.506     0.672      0.562    82, 242
1716     0.051                       0.069     0.041      0.091    0, 11
3450     0.186                       0.038     0.253      0.038    24, 129
4001     0.286                       0.056     0.394      0.047    26, 41

Fig. 1. The representative frames for testing (#573, #650, #931, #1716, #3450, #4001)
Fig. 2. Sample results of the proposed key-frame detection method. Each panel corresponds to a shot, labelled with its frame range and the selected key-frame: 1-52: FK=52, 53-165: FK=165, 166-266: FK=209, 267-517: FK=517, 518-560: FK=558, 561-630: FK=580, 631-662: FK=660, 663-741: FK=741, 742-906: FK=801, 907-921: FK=910, 922-943: FK=943, 944-989: FK=989, 990-1053: FK=1040, 1054-1067: FK=1062, 1068-1102: FK=1086, 1103-1133: FK=1125, 1134-1143: FK=1141, 1144-1184: FK=1164, 1185-1275: FK=1220, 1276-1338: FK=1338, 1339-1377: FK=1377, 1378-1400: FK=1379, 1401-1445: FK=1403
Fig. 2 shows the results of key-frame detection for the test sample. It is evident that the proposed scheme is capable of extracting a suitable number of key-frames. Based on the visual attention score, the extracted key-frames are highly correlated with human visual attention. However, in the third and fourth rows of Fig. 2, two redundant key-frames are extracted. This is because the shot boundaries were incorrectly determined as a result of the fast panning or tilting of the camera. In addition, all the frames within the shot had near-identical attention scores. Furthermore, the proposed method can also be applied to determine the key-frames of slow-motion replay clips, such as the last selected key-frames in Fig. 2. We assume that from each of these shots at least one key-frame is selected based on the frames’ attention scores. That is why a few of the key-frames look like normal average play.
6
Conclusions
In this paper, we proposed a novel key-frame detection method by integrating the object-oriented visual attention model and the contextual game-status information. We have illustrated an approximate distribution of a viewer’s level of excitement through contextual annotations using semantic knowledge and visual characteristics. The number of key-frames was successfully determined by applying the contextual attention score, while the key-frame selection depends on the combination of all the visual attention scores with bias. Employing the object-based visual attention model integrated with the contextual attention model not only produces precise human perceptual characteristics, but also effectively determines the type of video content that will attract the attention of viewers. The proposed algorithm was evaluated using commercial baseball game sequences and showed promising results.
Acknowledgment This research has been supported by National Science Council (NSC) of Taiwan, under grant number NSC-99-2218-E-155 -013.
References 1. Naphade, M.R., Kozintsev, I., Huang, T.S.: A Factor Graph Framework for Semantic Video Indexing. IEEE Trans. on CAS for VT 12(1), 40–52 (2002) 2. Doulamis, A., Doulamis, D.: Optimal Multi-Content Video Decomposition for Efficient Video Transmission over Low- Bandwidth Networks. In: IEEE ICIP (2002) 3. Shih, H.C., Huang, C.L., Hwang, J.-N.: An Interactive Attention-Ranking System for Video Search. IEEE MultiMedia 16(4), 70–80 (2009) 4. Graves, A., Lalmas, M.: Video Retrieval using an MPEG-7 Based Inference Network. In: ACM SIGIR 2002, August 11-15 (2002) 5. Zhong, D., Chang, S.F.: Spatio-temporal Video Search Using the Object-based Video Representation. In: IEEE ICIP, Santa Barbara (1997)
A Novel Attention-Based Keyframe Detection Method
447
6. Tsotsos, J.K., Culhane, S.M., Wai, W.Y.K., Lai, Y.H., Davis, N., Nuflo, F.: Modeling visual-attention via selective tuning. Artifical Intelligence 78(1-2), 507–545 (1995) 7. Itti, L., Koch, C., Niebur, E.: A Model of Saliency-based Visual Attention for Rapid Scene Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 20(11), 1254–1259 (1998) 8. Shih, H.C., Huang, C.L.: Content Extraction and Interpretation of Superimposed Captions for Broadcasted Sports Videos. IEEE Trans. on Broadcasting 54(3), 333–346 (2008) 9. Kim, M.K., Kim, E., Shim, D., Jang, S.L., Kim, G.: An Efficient Global Motion Characterization Methods For Image Processing Application. IEEE Trans. Consum. Electron. 43(4), 1010–1018 (1997) 10. Crammer, K., Singer, Y.: On the Algorithmic Implementation of Multi-class Kernel-based Machines. J. of Machine Learning Research 2, 265–292 (2001) 11. James, B.: The New Bill James Historical Baseball Abstract. Simon & Schuster, New York (2003) 12. Albert, J.: Teaching Statistics Using Baseball, The Mathematical Association of America (2003) 13. Albert, J.: Using Play-by-Play Baseball Data to Develop a Batter Measure of Batting Performance. Techn. Report, Bowling Green State University (2001) 14. Retrosheet, http://www.retrosheet.org (Date of access July 2008)
Evaluating the Effectiveness of Using the Internet for Knowledge Acquisition and Students’ Knowledge Retention Zakaria Saleh, Alaa Abu Baker, and Ahmad Mashhour Yarmouk University, Department of Management Information Systems, [email protected] {alaa_mis2008,mashhour_ahmad}@yahoo.com
Abstract. The Internet is the superior means of communication and information dissemination in the new millennium. It is important to point out that searching for information on the Internet does not necessarily mean that some kind of learning process is going on. While the Internet has dramatically changed the dissemination and sharing of information, there is no general consensus among researchers on the impact of the Internet on students’ and learners’ knowledge retention. Using the Internet as a source of information, the student’s knowledge acquisition level and knowledge retention may be affected in both the short term and the long term. In light of that, this study was conducted to measure students’ retained knowledge by acquiring information from the Internet, and to find how much of that knowledge was really retained at a later time. Keywords: Internet, learning, knowledge acquisition, knowledge retention.
1 Introduction Internet usage for education is substantially increasing, and many institutions are using it for distance-learning programs and for connecting their academic staff to improve teaching and research [16]. Yet there is little literature evaluating the effectiveness of learning in such an environment compared to the traditional face-to-face classroom [19]. Some studies have analyzed how the Internet can be used as a useful tool to improve teaching, and thus improve students’ academic performance [13]. However, searching for information on the Internet does not necessarily mean that some kind of learning process is going on, especially when doing an assignment (or other university work) that can be completed by looking for information on the web and then copying and pasting the suitable parts. Consequently, it becomes a copy-and-paste activity with almost no cognitive effort [5]. For this reason, the Internet, like any other learning tool, must be used correctly by students; otherwise the results may be the opposite of what is desired [13]. This study evaluates students’ knowledge retention when using the Internet as a source of information.
2 Significance of the Study and Research Objectives There is no general consensus among researchers on the impact of the Internet on students’ knowledge retention. Kerfoot [9] indicated that on-line spaced education can
generate enhancements in learning that can be retained two years later and can boost knowledge retention, and in another study Kerfoot [10] suggests that spaced education boosts learning efficiency. Matzie et al. [12] showed that spaced education could effectively improve learner behavior through transferring the training material into real life. However, other studies oppose the positive effect in favor of a negative effect [2]. Other studies found that there is no significant impact of the Internet on knowledge retention [7][8][19]. The Internet dramatically changed the dissemination and sharing of information. As a result, the technological advancements changed the way in which information is acquired, read, interpreted, and used. Therefore, students need to analyze information deeply to get a better understanding [11]. The Internet is also transforming research, teaching, and learning [17]. It is a rich source of information for students to find ideas for projects and assignments, since the Internet can help students search, investigate, solve problems, and learn [6]. Simply put, to find this information one needs only a little patience and a decent search engine. Students with better computer and Web experience are likely to have more positive perceptions of, or attitudes toward, the Internet as a learning tool [20]. Thus, an inappropriate use of the Internet resources may affect the learning process as a whole, if no cognitive evaluation of the information acquired from the Internet has been conducted. Thus, students should learn to understand the background processes leading to the generation of knowledge, identify them, and examine various modes of dissemination [1]. The student’s knowledge level and knowledge retention may be affected in both the short term and the long term. Therefore, there is an urgent need to conduct research measuring the effectiveness of using the Internet as a source of information on the learning process. This research aims at measuring the magnitude of knowledge retention as an indicator of the Internet’s effectiveness on students’ learning. This study tries to answer the following questions:
RQ1: What is the impact of using the Internet as a source of information on students’ knowledge retention?
RQ2: Is there a difference in knowledge retention between graduate and undergraduate students after performing a task using the Internet as a source of information?
RQ3: Is there a difference in knowledge retention when testing students’ immediate knowledge retention and distant knowledge retention?
3 Literature Review The Internet is a very useful tool to support self-learning processes by supporting information access and creating more flexibility of learning in terms of time, place and learning styles, without any supervision by instructors [5]. Students use the Internet as a knowledge acquisition tool, which affects academic performance [15], and while students perceive online learning as effective, other results counter these claims [18]. The primary role of a student is to learn, which requires the ability to analyze and solve problems. Some barriers challenge these tasks, leading to an unsuccessful learning process [14]. The ability to access, evaluate, and determine proper information
is a prerequisite for lifelong learning and education. Information literacy is about finding, evaluating, using, and communicating information effectively to solve problems. It is very important to have the ability to understand and evaluate information [1]. Ciglaric and Vidmar [3] stated that people keep one quarter of what they hear, nearly one half of what they hear and see, and less than three quarters of the matter in which they actively participated. In addition, Custers [4] has suggested that, in the general educational domain, approximately two-thirds to three-fourths of knowledge will be retained after one year, with a further decrease to below 50% in the next year. Little research has been conducted so far to explore how Internet usage as a learning tool affects the student’s knowledge and knowledge retention, and according to Custers [4], retention of learning from on-line sources is often quite poor.
4 Methodology
4.1 Sample Selection A convenience sample of 115 graduate and undergraduate students studying in different areas at several universities was selected to participate in this study. The sample was selected based on students’ knowledge of the Internet and its applications, skills in using the Internet, and the extent of use of the Internet as a source of information for college work. Such attributes remove any deficiency, and thus eliminate uncertainties in the final findings.
4.2 Research Instrument A computer-based test was conducted by asking students general, simple, and easy to understand questions with hints on the URLs where the students would acquire the information. The URLs were provided to ensure that students would be able to locate the information, since the intention is to evaluate the retention of this information. Caution was taken to select questions covering common knowledge, so that if the answer was read and fully understood, the participants would have a memory of it. Besides the general questions, the test collected the demographic information of the sample (gender, age, academic level). The test was two pages long and took respondents 40-50 minutes to finish.
4.3 Data Collection This research adopted a pre-test/post-test experimental design. The pre-test was held in three major Jordanian universities. To avoid possible language deficiency, the test was done in Arabic, and the websites containing the answers were in Arabic as well. After the online test was completed, another test (paper-based test) was done where the same questions were asked; however, the answers were to be provided from the students’ memory of the answers they had obtained from the websites. And finally, to measure the spaced knowledge retention, a post-test was conducted three weeks later using a random sample from the same group of students, using a paper-based test.
4.5 Data Analysis and Research Findings After keying the data into a statistical tool (SPSS), the collected data were examined for data entry errors and excessive “copy and paste”. As a result, 45 test documents were excluded, leaving only 70 test documents for evaluation. The Chi-Square test was used to test the relationships between InternetAnswersScore and PreTestRetention, and between PreTestRetention and PostTestRetention; the Chi-Square test was also used to test the relationship between the academic level and the Pre-/Post-test knowledge retention. Descriptive statistics (such as means, standard deviations, and percentages) were used to summarize InternetAnswersScore and the Pre-/Post-test knowledge retention.

Table 1. Sample demographic characteristics
Sample     Categories                                                     Frequency   Percent
Gender     Male                                                           15          21.4
           Female                                                         55          78.6
Age        18-22 years                                                    42          60.0
           23-29 years                                                    21          30.0
           ≥ 30 years                                                     7           10.0
Usage      Yes (used Internet before as a source of information)          69          98.6
           No (didn't use Internet before as a source of information)     1           1.4
Thoughts   Yes (thought that using Internet as a source of information
           eases knowledge gaining)                                       68          97.1
           No (didn't think that using Internet as a source of information
           eases knowledge gaining)                                       2           2.9
Demographic characteristics of the overall participants are summarized in Table 1. The dominance of female participants in the sample is obvious (78.6%), and this is consistent with the students’ distribution in all three universities. Also, the data show a dominance of a certain age category (less than or equal to 22 years of age, 60.0%), which is expected because most graduate students started the graduate program right after finishing their bachelor degree. Finally, the prior usage of the Internet as a source of information, and the belief that it eases acquiring information, were very high (98.6% and 97.1%, respectively). The results of the first experiment are presented in Table 2 (the scoring was based on a scale of 25 points) and were divided into three categories (Low = 1, Moderate = 2 and High = 3). It was found that students’ scores with the aid of the Internet are in the range of moderate to high, with a dominance of high scores (68.6%). On the other hand, students’ scores in the pre-test without the Internet aid range from low to high, with a dominance of moderate scores (44.3%).

Table 2. Frequencies for InternetAnswersScore & PreTestRetention
                       Score Range & Category   Score category value   Frequency   Percent
InternetAnswersScore   1-7 (Low)                1                      0           0
                       8-14 (Moderate)          2                      22          31.4
                       15-21 (High)             3                      48          68.6
PreTestRetention       1-7 (Low)                1                      20          28.6
                       8-14 (Moderate)          2                      31          44.3
                       15-19 (High)             3                      19          27.1
The Chi-Square test, as shown in Tables 3 & 4, indicates that there is a significant relationship between InternetAnswersScore & PreTestRetention. The Chi-Square significance value is .011 (p < 0.05).
Table 5. PostTestRetention * PreTestRetention
Score Range   PreTestRetention   Total
1-7           4         4        8
8-14          3         5        8
15-21         1         1        2
Total         8         10       18

Table 6. Chi-Square Tests
                               Value    df   Sig.
Pearson Chi-Square             0.281a   2    0.869
Likelihood Ratio               0.283    2    0.868
Linear-by-Linear Association   0.053    1    0.818
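The chi-square statistics of Table 6 can be reproduced from the contingency counts of Table 5 with any standard statistics package; the sketch below uses Python with SciPy purely as an illustration (the authors used SPSS).

```python
import numpy as np
from scipy.stats import chi2_contingency

# Contingency counts of Table 5 (post-test score ranges by pre-test group).
table5 = np.array([[4, 4],    # 1-7
                   [3, 5],    # 8-14
                   [1, 1]])   # 15-21

chi2, p, dof, expected = chi2_contingency(table5)
print(round(chi2, 3), dof, round(p, 3))   # ~0.281, 2, 0.869 as in Table 6
```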
The descriptive statistics (means, standard deviations, and percentages) are shown in Table 7. The mean of InternetAnswersScore was at a high level (mean = 2.69) and the means of the pre- and post-tests were at a low to moderate level (1 < mean < 2).

Table 7. Descriptive Statistics Related to Research Variables
Variable               Min Score   Max Score   Mean   Std. Deviation
InternetAnswersScore   2           3           2.69   0.468
PreTestRetention       1           3           1.99   0.752
PostTestRetention      1           3           1.67   0.707
Based on the Chi-Square test, as shown in Tables 8, 9, 10, and 11, there is no significant relationship between the academic level and the Pre-/Post-test knowledge retention. The Chi-Square significance values are 0.114 for the pre-test and 0.557 for the post-test (p > 0.05).
Table 8. PreTestRetention
Score Range     1-8   8-14   15-21   Total
UnderGraduate   15    18     8       41
Graduate        5     13     11      29
Total           20    31     19      70

Table 9. Chi-Square Tests
                               Value    df   Sig.
Pearson Chi-Square             4.351a   2    0.114
Likelihood Ratio               4.451    2    0.108
Linear-by-Linear Association   4.287    1    0.038
Table 10. PostTestRetention
Score Range     1-7   8-14   15-21   Total
UnderGraduate   2     4      1       7
Graduate        6     4      1       11
Total           8     8      2       18

Table 11. Chi-Square Tests
                               Value    df   Sig.
Pearson Chi-Square             1.169a   2    0.557
Likelihood Ratio               1.197    2    0.55
Linear-by-Linear Association   0.883    1    0.347
5 Discussion and Conclusion The main objectives of this study were: first, to investigate the impact of using the Internet as a source of information for acquiring knowledge on students’ knowledge retention; second, to identify any major difference in knowledge retention between graduate and undergraduate students after performing a task using the Internet; and third, to identify any major difference between students’ immediate knowledge retention and spaced knowledge retention. Despite the high level of students’ usage of the Internet as a source of information (98.6%), the analysis showed that the Internet yields low to moderate knowledge retention (72.9%). This might be because of the students’ belief that the Internet eases the process of acquiring information (97.1%). In addition, the Chi-Square test supports our explanation with respect to the differences between students’ scores using the Internet and their scores in the pre-/post-tests, which should raise an alert for educators who rely on the Internet for their students’ knowledge acquisition and retention. As for the difference in knowledge retention between graduate and undergraduate students, there was no difference between the two groups, which represents a real issue since graduate students are generally professionals who are seeking higher education and knowledge. In addition, regarding any major difference between students’ immediate knowledge retention and spaced knowledge retention, there is no significant relationship between the pre- and post-test retention. However, it was found that the students’
knowledge retention decreases within a short period of time, which presents another alerting point that needs to be investigated. Overall, the findings of this research raise a question about the effectiveness of using the Internet for knowledge acquisition and knowledge retention by students, as 45 test documents (about 40% of the participants) were excluded due to excessive “copy and paste”. If those participants had been included in the spaced knowledge retention analysis, the findings of the study would have been dramatically worse. However, this could be because of the participants’ unawareness of the value of the study (even though, prior to conducting the experiments, the students were briefed on the importance of the study). Also, it is possible that the students had not critically thought about the material or understood the original ideas well enough to restate them in their own words, especially since participants were made aware of the value of the study. Therefore, a future study shall be conducted to address this point and determine whether that is the issue.
References 1. Amalahu, C., Oluwasinay, O., O., E., Laoye, O.A.: Higher Education and Information Literacy: A Case Study of Tai Solarin University of Education. Library Philosophy and Practice, e-journal (2009) 2. Bell, D., S., Harless, C., E., Higa, J., K.,. Bjork, E., L., Bjork, R., A., Bazargan, M. Mangione, C. M.: Knowledge Retention after an Online Tutorial: A Randomized Educational Experiment among Resident Physicians. Journal of General Internal Medicine, 23(8), 1164–1171 (2008) 3. Ciglaric, M., Vidmar, T.: Use of Internet Technologies for Teaching Purposes. European Journal of Engineering Education 23(4), 497–502 (1998) 4. Custers, E.: Long-term retention of basic science knowledge: a review study. Advances in Health Sciences Education, 15(1) 1, 109–128 (2010) 5. Donoso, V., Roe, K.: Are They Really Learning Online? The Impact of the Internet on Chilean Adolescents’ Learning Experiences. In: Pearson, E., Bohman, P. (eds.) Proceedings of World Conference on Educational Multimedia, Hypermedia and Telecommunications, pp. 1679–1686 (2006) 6. Furner, J., Doan Holbein, M.F., Scullion, K.: Taking an Internet Field Trip. Tech. Trends 44(6), 18–22 (2008) 7. Ibrahim, M., Al-Shara, O.: Impact of Interactive Learning on Knowledge Retention. In: Smith, M.J., Salvendy, G. (eds.) HCII 2007. LNCS, vol. 4558, pp. 347–355. Springer, Heidelberg (2007) 8. Jackson, S.: Ahead of the curve: Future shifts in higher education. Educause Review 39(1), 10–18 (2004) 9. Kerfoot, B.P.: Learning Benefits of On-Line Spaced Education Persist for 2 Years. The Journal of Urology 181(6), 2671–2673 (2009) 10. Kerfoot, B.P.: Adaptive Spaced Education Improves Learning Efficiency: A Randomized Controlled Trial. The Journal of Urology 183(2), 678–681 (2010) 11. Kumar, M., Natarajan, U., Shankar, S.: Information Literacy: A Key Competency to Students’ Learning. Malaysian Online Journal of Instructional Technology 2(2), 50–60 (2005) 12. Matzie, K., A., Kerfoot, B., P., Hafler, Janet P., H., Breen, E.,M.: Spaced education improves the feedback that surgical residents give to medical students: a randomized trial. The American Journal of Surgery, 197(2), 252-257 (2009)
Computer-Based Assessment of Implicit Attitudes
Ali Reza Rezaei
College of Education, California State University, Long Beach, 1250 Bellflower Blvd, Long Beach, CA 90840, USA
[email protected]
Abstract. Computers are increasingly being used in psychological assessment. The most popular measure of implicit attitudes is the Implicit Association Test (IAT) [2]. According to Aberson and Beeney [3], although the IAT has been applied broadly, little is known about the psychometric properties of the measure. Four different experiments were conducted with four different samples to investigate the temporal reliability of the IAT. Students' opinion of (trust in) the validity and reliability of the test was also measured. The results showed that while there are numerous reports of moderate validity of the test, its reliability as measured in this study, particularly for first-time users, is relatively low. Familiarity with similar tests, however, improves its reliability.
Keywords: Computerized Assessment, Validity, Reliability, Implicit Association Test.
1 Introduction
The goal of this paper is to evaluate the validity and reliability of an implicit association test. Fyodor Dostoyevsky once said that every man has reminiscences which he would not tell to everyone, but only to his friends; but there are other reminiscences which a man is afraid to tell even to himself [4]. In his recent book "The Hidden Brain", Vedantam [5] illustrates several situations in which people act against their intentions. To explain this phenomenon he introduces the concept of the "unconscious brain" or "the hidden brain" (the hidden forces that influence us in everyday life). He explains further that our conscious mind is like the pilot of a plane, and the hidden brain is its autopilot or co-pilot function. According to Vedantam, people regularly transfer functions back and forth between the pilot and the autopilot. The problem arises, he suggests, when we do this without our awareness, and the autopilot ends up flying the plane when we should be flying it. That is how stereotypes, prejudices, and discrimination control people's behavior without their being aware of it. He suggests that racial categorization begins at a very early age, citing an experimental study from a daycare center in Canada that found children as young as three showed negative stereotypes against black faces [6]. We usually act on our implicit attitudes rather than our socially expressed opinions. As reported by Greenwald and Banaji [1], there is considerable evidence supporting the view that social behavior often operates in an implicit or
unconscious fashion. Therefore, the Implicit Association Test (IAT) was developed to examine thoughts and feelings that exist either outside of conscious awareness or outside of conscious control. Since its inception, the IAT has been used in more than 300 published articles and cited in more than 1000 articles. According to Hofmann, Gawronski, Gschwendner, Le, and Schmitt [7], one of the most important contributions in social cognition research within the last decade was the development of implicit measures of attitudes, stereotypes, self-concept, and self-esteem. Investigating the validity and reliability of the IAT is therefore very important. The IAT has been applied in various disciplines including social and cognitive psychology, clinical psychology, developmental psychology, neuroscience, market research, and health psychology [2, p. 267]. The test examines disparate topics such as attitudes, stereotypes, self-esteem, phobias, and consumer behavior [3, p. 27]. It is being used in courts to test whether witnesses are biased [8], to evaluate applicants against university admission requirements [9], and in religious studies to evaluate subjects' religious faith [10]. Nevertheless, many educators believe that this unusual excitement among both psychologists and the public stems from the kinds of associations (popular topics) that researchers have used the test to measure rather than from the validity or reliability of the test [11].
1.1 Research Questions
As shown in the literature review, current research on the IAT has focused mostly on the validity, rather than the reliability, of the test. The goal of this project was to re-examine the reliability of the test and the factors that may improve it. The present study also intends to evaluate the test from the users' (college students') perspective. Four studies are reported here that were designed to answer the following four research questions.
a- To what extent do students trust the results of the IAT? Do they believe the test is reliable and valid?
b- Does it make a difference to train students about their hidden stereotypes and prejudices? Is there a difference in trust in the IAT between a group of students who receive such training and those who do not?
c- What is the reliability of the IAT based on the results reported (displayed) to the users? Is there a difference between a one-week test-retest reliability coefficient and an immediate test-retest reliability index?
d- Does it make a difference if the users are familiar (trained) with the test? Does such training make the test more reliable?
2 Methodology
A total of 150 participants, all of whom were college students, took part in this study. The participants' ages ranged from 20 to 55, and about 73% of them were female. Subjects received course credit points for their participation. Most of the subjects were relatively new classroom teachers. All subjects had initial familiarity with the two concepts of reliability and validity.
In an initial pilot study it was observed that some students did not take the test seriously and some did not seem to know that speed is an issue. To increase the accuracy of the measurement, all students were shown a slide show about the theoretical framework of the IAT and a brief explanation of how it works. This introduction was given to prepare them for the test and also to increase their active participation in the experiment. The content of the introduction was mainly adopted from the IAT web site. At the end of each experiment, the subjects were asked to rate the validity and reliability of the test (based on their experience with the test, the given results, and what they knew about validity and reliability). They used a Likert-type scale (1 = No validity or reliability, 2 = Low, 3 = Moderate, 4 = Strong) to do so. This was considered their perceived trust in the reliability and validity of the IAT. In experiment 1, students watched a 20-minute video clip about the unconscious mind, stereotypes and prejudices prior to taking the test. The video used was from a very informative TV documentary, "Race and Sex: What We Think But Don't Say" [12], which presents some social psychology experiments and interviews with experts and psychologists about the hidden brain, the unconscious mind, and implicit measures. Students in this group took the Gender/Science IAT, which is available online, twice. The time interval between the pretest and posttest was only 15 minutes. This test measures the degree to which users associate females and males with science and liberal arts. In the second experiment subjects did not watch the video. Subjects in this group were asked to practice with the IAT before they started the main test (not the same test). Students took the Asian/American – European/American IAT twice. The time interval between the pretest and posttest was one week. In the third experiment subjects used the Gender/Science IAT. This group also took the test twice, with a one-week time interval. Unlike Group 1, however, this group had a chance to practice with the IAT before starting the main test. Finally, in the fourth experiment subjects had a chance to watch the introductory video and to practice with the IAT before starting the main test, and they took the Asian/American IAT twice in a 15-minute interval. A summary of the 4 experiments (4 groups) is presented in Table 1.
Table 1. Experimental conditions of the 4 groups
Group     N     Male   Female   Video   Practice   Posttest   IAT Test
Group 1   32    6      26       Yes     No         1 week     Gender/Science
Group 2   34    10     24       No      Yes        1 week     Asian/Americans
Group 3   38    10     28       Yes     Yes        1 week     Gender/Science
Group 4   46    14     32       Yes     Yes        15 min     Asian/Americans
Total     150   40     110
3 Results
The first research question in this study investigated whether students consider the test to be both reliable and valid. The results as shown in Table 2 indicate that about 73% of subjects believed the test was valid to some extent. About 29% of subjects believed that IAT was highly valid, and about 27% thought the test was not valid at
all. Table 3 shows how much each group trusted the reliability of the IAT. This table shows that about 82% of subjects believed the test was reliable to some extent. About 55% of subjects believed that the IAT was highly reliable, and about 18% thought the test was not reliable at all. The second research question investigated whether watching the introductory video helped build students' trust in the validity and reliability of the IAT. The results reflected in Table 2 showed that the groups had different levels of trust in the validity of the IAT. Subjects in Group 2, who did not get a chance to see the video, mostly rated the test as having no or low validity. Comparing Group 2 and Group 4 (both evaluated the same IAT test) showed that watching the video significantly increased trust in the IAT (t=2.447, p=.017).
Table 2. Students' rating of the validity of the test
Rating of the Validity
Group      No            Low           Moderate      High
Group 1    8  (25.0%)    5  (15.6%)    8  (25.0%)    11 (34.4%)
Group 2    13 (38.2%)    8  (23.5%)    9  (26.5%)    4  (11.8%)
Group 3    10 (26.3%)    6  (15.8%)    10 (26.3%)    12 (31.6%)
Group 4    10 (21.7%)    8  (17.4%)    12 (26.1%)    16 (34.8%)
Total      41 (27.3%)    27 (18.0%)    39 (26.0%)    43 (28.7%)
Table 3. Students’ rating of the reliability of the test
Rating of the Reliability
Group      No            Low           Moderate      High
Group 1    6  (18.8%)    5  (15.6%)    5  (15.6%)    16 (50.0%)
Group 2    7  (20.6%)    8  (23.5%)    8  (23.5%)    11 (32.4%)
Group 3    6  (15.8%)    3  (7.9%)     4  (10.5%)    25 (65.8%)
Group 4    8  (17.4%)    4  (8.7%)     4  (8.7%)     30 (65.2%)
Total      27 (18.0%)    20 (13.3%)    21 (14.0%)    82 (54.7%)
Table 3 shows how much each group trusted the reliability of the IAT. Subjects in Group 2, who did not have a chance to see the video, mostly rated the test as having no or low reliability. Comparing Group 2 and Group 4 (both
evaluated the same IAT test) shows that watching the video significantly increased their trust in the IAT (t=2.039, p=.045). The third goal of this study was to re-examine the reliability of the IAT based on the results reported (displayed) to the users. In particular, it was intended to check whether the one-week test-retest reliability differed from the immediate test-retest reliability index. The reliability of the test for the control group (the group that did not practice before the test) was .32, whereas the combined reliability (computed on all 150 subjects in all 4 groups together) was .52. Comparing Groups 2 and 4 (both used the Asian/American IAT) showed that the time interval between pretest and posttest did not make a difference. Comparing Groups 2 and 3 (both used a one-week interval between pretest and posttest) also showed no difference in reliability between the Asian/American IAT and the Gender/Science IAT. The last research question was whether it makes a difference to the reliability of the IAT when users practice with other IATs before the actual test. Does such training make the test more reliable? The results showed a large difference in the reliability of the test between the first experiment and the other three experiments. The test-retest reliability indices for the four experiments were .32 (N=32), .57 (N=34), .57 (N=38), and .56 (N=46), respectively. These numbers indicate that familiarity with the test did improve its reliability. The only difference between Group 1 and Group 3 was that the first group did not practice with any IATs before the main test and so was not familiar with it. These two groups used the same IAT test (Gender/Science) and the interval between test and retest was one week for both groups.
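For readers who want to reproduce this kind of analysis, the two statistics used above (a test-retest reliability coefficient and an independent-samples t-test on the trust ratings) can be computed as in the following sketch. The arrays are placeholders rather than the study's data, and the exact preprocessing of the IAT scores displayed to users is not described here, so this is only an illustrative assumption.

```python
import numpy as np
from scipy import stats

# Placeholder IAT scores for one group: first and second administration.
pretest  = np.array([0.42, 0.10, 0.55, 0.31, 0.67, 0.05, 0.48, 0.22])
posttest = np.array([0.38, 0.21, 0.49, 0.44, 0.59, 0.15, 0.33, 0.30])

# Test-retest reliability: Pearson correlation between the two administrations.
r, p_r = stats.pearsonr(pretest, posttest)
print(f"test-retest reliability r = {r:.2f} (p = {p_r:.3f})")

# Comparing trust ratings (1-4 Likert) between two groups, e.g. video vs. no video.
trust_video    = np.array([4, 3, 4, 2, 3, 4, 4, 3])
trust_no_video = np.array([2, 1, 3, 2, 2, 3, 1, 2])
t, p_t = stats.ttest_ind(trust_video, trust_no_video)
print(f"t = {t:.3f}, p = {p_t:.3f}")
```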
4 Conclusions and Discussion
The results of this study have a practical implication. The IAT website reports that the majority of people taking the race IAT (70%) have an "automatic preference for Whites over Blacks" and 27% have a "strong automatic preference for Whites over Blacks." Similar statistics are presented about other social attitudes and stereotypes. According to Blanton and Jaccard [13], these diagnoses probably lead many individuals to infer that they possess hidden anti-black racist attitudes. The results of this study also indicate that subjects gained a high level of trust in the validity and particularly the reliability of the IAT. Given the low reliability of the test as reported in this paper, and the earlier discussion about the validity of the test, one should be cautious about its implications and interpretations, particularly in any kind of decision making. This study focused on reliability, while the validity of the IAT has also been the subject of hundreds of research studies. The lack of, or low, correlation of IAT results with parallel explicit (self-report) measures has made many researchers question the validity and reliability of the test. Reliability is a precondition for validity. As noted by Grumm and von Collani [14, p. 2215], "the problem of low test–retest-reliability must be considered as a critical limitation for a diagnostic application". Without proof of high reliability, any claims about the validity of the test should be considered with caution. Perhaps what makes this test vulnerable in any kind of assessment of its validity and reliability is the use of reaction time. Reaction time has
been used for many other assessments, particularly in recent computerized implicit measurements (e.g., of cognitive styles), which have also been found to be unreliable [15]. "For current reaction-time indices, a tenth of a second can have a consequential effect on a person's score, and such measurement sensitivity can lead to test unreliability" [13, p. 289]. Therefore, the difference between the temporal reliability reported here (r=.32) and the internal consistency index reported earlier (r=.89) may also indicate that the reliability of the IAT is affected more by the stability of users' reaction times than by the stability of their implicit attitudes. Given the massive publicity of the IAT, a higher level of reliability was expected in this study. The IAT is typically used to measure deeply seated personality traits or stereotypes rather than temporary states of mind. As observed by Steffens and Buchner [16] and reported by Azar [11], when compared to established tests of personality traits, the trans-situational component of the IAT looks quite small. Therefore, it is reasonable to consider the reliability coefficients in this study and in the other studies reported earlier to be unsatisfactory. As suggested by the authors of the IAT, scholarly criticism of the IAT increases the motivation to pursue questions about this test. The results of this study suggest that more research is needed to find out how the reliability and validity of the IAT could be improved. The theoretical framework of the test is quite powerful, and the publicity of the test has without any doubt elevated social awareness in the areas of ethnic and gender studies. As reported by Azar [11], the IAT has the potential to be a remarkably powerful tool.
References 1. Greenwald, A.G., Banaji, M.R.: Implicit social cognition: Attitudes, self-esteem, and stereotypes. Psychological Review 102(1), 4–27 (1995) 2. Nosek, B.A., Greenwald, A.G., Banaji, M.R.: The Implicit Association Test at age 7: A methodological and conceptual review. In: Bargh, J.A. (ed.) Social Psychology and the Unconscious: The Automaticity of Higher Mental Processes, pp. 265–292. Psychology Press, New York (2007) 3. Aberson, C.A., Beeney, J.E.: Does substance use impact implicit association test reliabilities? Journal of Social Psychology 147, 27–40 (2007) 4. Dostoevsky, F.: Notes from underground. Trans. Mirra Ginsburg (1974/1992). Bantam, New York (1864) 5. Vedantam, S.: The Hidden Brain: How Our Unconscious Minds Elect Presidents, Control Markets, Wage Wars, and Save Our Lives. Spiegel & Grau, New York (2010) 6. Aboud, F.E.: The formation of in-group favoritism and out-group prejudice in young children: Are they distinct attitudes? Developmental Psychology 39, 48–60 (2003) 7. Hofmann, W., Gawronski, B., Gschwendner, T., Le, H., Schmitt, M.: A meta-analysis on the correlation between the Implicit Association Test and explicit self-report measures. Personality and Social Psychology Bulletin 31(10), 1369–1385 (2005) 8. David, G.: Our Hidden Prejudices, on Trial. Chronicle of Higher Education 54(33), b12– b14 (2008) 9. Bererji, S.: Who do you think you are? Issues in Higher Education 22(15), 32–34 (2005) 10. Ventis, W.L., Ball, C.T., Viggiano, C.: A Christian humanist implicit association test: Validity and test−retest reliability. Psychology of religion and spirituality 2(3), 181–189 (2010)
11. Azar, B.: IAT: Fad or fabulous? Monitor in Psychology 39(7), 44 (2008) 12. Mastropolo, F., Varney, A.: (Producers). Race & Sex: what we think but don’t say / ABC News. Originally broadcast as a segment on the television program 20/20 (September 15, 2006) 13. Blanton, H., Jaccard, J.: Unconscious racism: A concept in pursuit of a measure. Annual Review of Sociology 34, 277–297 (2008) 14. Grumm, M., von Collani, G.: Measuring Big-Five personality dimensions with the implicit association test-Implicit personality traits or self-esteem? Personality and Individual Differences 43, 2205–2217 (2007) 15. Steffens, M.C., Buchner, A.: Implicit Association Test: Separating transsituationally stables and variable components of attitudes toward gay men. Experimental Psychology 50(1), 33–48 (2003)
First Electronic Examination for Mathematics and Sciences Held in Poland - Exercises and Evaluation System
Jacek Stańdo
Technical University of Lodz, Center of Mathematics, Łódź, Poland
[email protected]
Abstract. The development of new information technology poses a question about the future shape of education. One can now ask: will traditional course books, the traditional school, the traditional teacher and the traditional examination process not change rapidly? This work describes the first electronic mock examination in mathematics and the natural sciences carried out over the Internet in Poland. The aim of this paper is to present the typology of the tasks and the electronic evaluation system. The e-examination was conducted in real time, for the first time in Europe on such a large scale (about 3000 students from 320 schools all over Poland took part in it). The conducted trial is the first step towards changing external examinations in Poland. Keywords: electronic examination, evaluation systems
1 Introduction
The new system of external evaluation in Poland, which has been brought in gradually since 2002, makes it possible to diagnose the achievements as well as shortcomings of students' education in order to evaluate the efficiency of teaching and to compare objectively current certificates and diplomas irrespective of the place where they have been issued. The parts of the external examination system are:
• The Competence Test in the sixth class of primary school.
• The Lower Secondary School (Gymnasium) Examination conducted in the third class of lower secondary school.
• The Matura Exam for graduates of general secondary schools, specialized secondary schools, technical secondary schools, supplementary secondary schools or post-secondary schools.
• The Examination confirming Vocational Qualifications (vocational examination) for graduates of: vocational schools, technical schools and supplementary technical schools.
The rules of external evaluation are described in detail in the Ordinance of the Minister of National Education. External evaluation is conducted at the end of a particular stage of education, based on the requirements for realizing the educational assignments described in the Core Curriculum. The methods and results of implementing these assignments may vary owing to the autonomous character of each school. Accordingly, only the final effects of education and the results achieved at the end of a completed stage of education can be compared. The achievement standards signed by the Minister of National Education are the basis for conducting the tests and examinations. They have resulted from extensive consultations with teachers and the academic community, and are based on the educational objectives defined in the Core Curriculum. The establishment of uniform and clearly stated attainment standards has a direct influence on the fairness and standardization of the external evaluation. Furthermore, those standards help shift the main interest of the assessment from knowledge to the skills and abilities obtained at a particular stage of education. The Central Examination Board (www.cke.edu.pl) is a public education institution based in Warsaw. It was established on 1 January 1999 by the Minister of National Education on the strength of the Act on the System of Education, with the aim of evaluating the educational achievements of students, monitoring the quality of the educational influence of schools and making all certificates comparable regardless of the place where they have been issued. The responsibilities of the Central Examination Board are as follows:
• Preparing questions, tasks and exam sets for conducting the tests and examinations.
• Preparing and presenting syllabuses containing a description of the scope of the tests and examinations, sample questions, tasks, tests, and criteria for evaluation.
• Analyzing the results of the tests and examinations and reporting them to the Minister of Education in the form of an annual report on students' achievements at a given educational stage.
• Stimulating research and innovation in the area of assessment and examination.
The examination at the end of the Lower Secondary School is obligatory, which means that every pupil must sit it at the end of school. The examination measures mastery of the knowledge and skills specified in the current examination standards in all schools in Poland. The examination consists of three parts:
• The first covers the knowledge and skills in the humanities: the Polish language, history, social studies, visual arts and music.
• The second covers the knowledge and skills in mathematics and the natural sciences: mathematics, biology, geography, chemistry, physics and astronomy.
• The third covers the knowledge and skills in a foreign language.
Each student taking the exam receives a set of tasks on paper along with an answer sheet. The exam begins promptly at the appointed hour throughout Poland. Parts one and two each last 120 minutes, and part three lasts 90 minutes. Specially prepared and trained examiners review and score the students' examinations. The answer sheets, on which the answers to closed tasks are marked by students and the scores for open tasks are written by examiners, are finally fed into electronic readers. Since the introduction of the new examination system in Poland, extensive research related to the exams has been carried out. In the past ten years, over 20,000 students have participated in the mock exams organized. After the test, students completed a prepared questionnaire [2,3,4,5]. The results have been reported in many publications (one study indicated a link between the exam results and the student's family economic conditions [1]).
2 E-Examination
Most changes effectively begin with an idea, since a favorable atmosphere allows for innovation. The exam was carried out for the first time in Poland (and on this scale in Europe) over a year ago. It was the first examination covering mathematics and the natural sciences conducted over the Internet. I was the coordinator, originator and author of all the exercises of the examination. I created a special model system for evaluation, which computer scientists then turned into multimedia. About 3,000 students from over 320 Polish schools took part in the mock e-exam. A fixed amount of time was allotted to all the students who came to take the examination. A special computer system acted as the examiner, and each student received a score within a few hours. The tasks and the form of the test corresponded in about 95% to the examination carried out by the Central Examination Board. The project was conducted in close cooperation with the Central Examination Board and the Minister of Education. For several years, research has been carried out on the automatic grading of examinations taken online [6], [7], in which students' answers are free-form texts [8], [9]. The next section presents examples of the tasks and their evaluation system.
3 Exercise and Evaluation System
The e-examination consisted of 39 exercises, which students had to complete within 120 minutes. In this section we discuss several of them and introduce the evaluation system. Exercise 3 (fig. 1, 2) consisted of determining the volume of a piece of wood in the shape of a cylinder. The student first chose an appropriate formula and then the calculated value. The erroneous options were not chosen accidentally; they were selected as the most prevalent mistakes on the basis of a test carried out in paper form. The grading system: 1 point for indicating the correct formula, 1 point for doing the calculations. If the student chose a wrong formula but substituted the data correctly and did the calculations properly, the student received 0 points for the first step and 1 point for the calculations.
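As an illustration only, the two-step partial-credit rule described above could be encoded as a small scoring routine. The function name and boolean inputs below are hypothetical and are not taken from the actual examination system; they merely restate the rule: one point for the correct formula, one point for calculations carried out properly (awarded even when the formula was wrong, provided the data were substituted correctly).

```python
def grade_exercise_3(correct_formula: bool,
                     data_substituted_correctly: bool,
                     calculation_correct: bool) -> int:
    """Partial-credit rule for the cylinder-volume exercise (maximum 2 points)."""
    score = 0
    if correct_formula:
        score += 1  # first step: the correct model/formula was indicated
    if calculation_correct and (correct_formula or data_substituted_correctly):
        score += 1  # second step: the calculations were done properly
    return score

# A student who picked a wrong formula but substituted the data and
# calculated consistently still receives 1 point.
print(grade_exercise_3(False, True, True))   # -> 1
print(grade_exercise_3(True, True, True))    # -> 2
```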
Fig. 1. E-exam- question 3
Fig. 2. E-exam- question 3
Exercise 6 concerned determining compass directions (fig. 3, 4). This is one of the most important skills attained in the study of geography at this level of education. The presented simulation shows the sunset, and on this basis the student needs to identify the appropriate cardinal directions. The grading system: 1 point for the correct indication of the east and west directions, 1 point for the correct indication of the north and south directions.
Fig. 3. E-exam- question 6
Fig. 4. E-exam- question 6
Two forces act on the floating boat: buoyancy and gravity. Using the mouse, the student indicates the respective forces; the pupil's task is to put them in the right place (fig. 5, 6). The grading system: 1 point for applying the forces at the correct location (the center of the boat), 1 point for the correct directions of buoyancy and gravity, 1 point for balancing the forces.
Fig. 5. E-exam- question 35
Fig. 6. E-exam- question 35
One of the advantages of e-exams is the ability to use multimedia. The following two exercises show how virtual experiments can be used to check the appropriate knowledge and skills of the student. The vessel contains a substance: water from precipitation (fig. 7, 8). The exercise consisted of indicating the color that the litmus paper turns when stained. The grading system: 1 point for the correct answer.
Fig. 7. E-exam- question 30
Fig. 8. E-exam- question 30
A drop of ink falls into the water. The task for the student is to identify the process that the substance undergoes (fig. 9, 10). The grading system: 1 point for the correct answer.
Fig. 9. E-exam- question 26
Fig. 10. E-exam- question 26
The next exercise (fig. 11, 12) was based on estimating the surface area of a gate using a person's height as a reference. A virtual ruler was used for this task: the student moved the ruler with the mouse in order to estimate the required lengths. The grading system: 1 point for the correct answer.
Fig. 11. E-exam- question 17
Fig. 12. E-exam- question 17
4 Conclusions
The system of external examinations in Poland began 10 years ago, and electronic systems are still virtually absent from the exams. Next year, the Central Examination Board will conduct the first trial of e-marking. I think the next step will be the introduction of e-examinations. I hope that my research and the proposed developments will contribute to the development of the assessment and examination system in Poland
[10]. After the examination, more than 80% of the students found this form of examining better than the traditional one. During the examination there were some technical problems, and more than 2% of the students did not receive their results. It appears that new forms of examining still require a number of tests and trials, better computer equipment and considerable investment. The advantages of e-examinations include: results are received quickly (the traditional method takes a month, whereas the electronic one takes a few seconds), the possibility of using multimedia, and savings on paper. The disadvantages include server overload and data security on the Internet. Acknowledgments. The author thanks the Ministry of Education and the Central Examination Board for their help and commitment in holding the first e-exam.
References 1. Stańdo, J.: The influence of economic factors on the trial exam results. Scientifict Bulletin of Chelm, Section of mathematics and computer science (2007) 2. Stańdo, J.: The use of trial exams results for comparison of changes over 2005 and 2006, XIV Polish-Czech-Slovak Mathematical School, 2007. Prace naukowe, Matematyka XII, Akademia w Częstochowie (2007) 3. Dąbrowicz-Tlałka Stańdo, J., Wikieła, B.: Some aspects of blended-Teaching Mathematics, Teaching learning education, Innovation, new trends, research- 2009 Ruzenborek Slovenia, Editors Marti Billich, Martin Papco, Zdenko (2009) 4. Legutko M., Stańdo J.: Jakie działania powinny podjąć polskie szkoły w świetle badań PISA, Prace monograficzne z Dydaktyki matematyki, Poland (2008) 5. Stańdo J.: Zastosowanie sztucznych sieci neuronowych do wyznaczania przelicznika, Prace monograficzne z Dydaktyki matematyki (2008) 6. Thomas, P.G.: Evaluation of Electronic Marking of Examinations. In: Proc. of the 8th Annual Conference on Innovation and Technology in Comp. Scienc. Educ. Greece (2003) 7. Thomas, P.G.: Grading Diagrams Automatically, Technical Report of the Computing Department, Open University, UK, (2003) TR2004/01 8. Pete, P., Price, B., Paine, C., Richards, M.: Remote electronic examination: student experiences. British Journal of Education Technology 33(5), 537–549 (2002) 9. Burstein, J., Leacock, C., et al.: Automated Evaluation of Essays and Short Answers. In: Fifth International Computer Assisted Assessment Conference Learning & Teaching Development, pp. 41–45. Loughborough University, UK (2001) 10. Bieniecki, W., Stoliński, S., Stańdo, J.: Automatic Evaluation of Examination Tasks in the Form of Function Plot. In: Proceedings of the 6th International IEEE Conference MEMSTECH 2010, Lviv-Polyana, Ukraine, pp. 140–143 (2010)
How Can ICT Effectively Support Educational Processes? Mathematical Emergency E-Services – Case Study
Jacek Stańdo and Krzysztof Kisiel
Technical University of Lodz, Center of Mathematics, Łódź, Poland
[email protected], [email protected]
Abstract. The rapid development of Information and Communication Technologies poses many questions about the future of education. Experts have been analyzing the role of traditional course books, examinations, schools and teachers in educational processes, and are wondering about their roles on the threshold of these significant changes in education. Nowadays, the way that young people share the knowledge they have been provided with has changed considerably. This has already been confirmed by the results and findings of the Mathematical Emergency E-Services project, which has been carried out for over a year now by the Technical University of Lodz. The article presents, discusses and analyzes real-time e-learning consultations between mathematics teachers and students carried out within the project's activities. Keywords: e-services, e-learning, mathematical education.
1 Introduction
One can find a number of websites offering paid services for solving mathematics tasks. Students take advantage of this questionable form of assistance more and more frequently: they receive ready-made solutions without understanding them. The former Polish Minister of Education, prof. Zbigniew Marciniak, had the idea of introducing 'state tutoring' that ought to solve the problem of private lessons in primary schools for classes IV-VI. State tutoring includes obligatory classes in two main subjects: Polish and Maths. In other words, the government wanted to equalize educational opportunities and attenuate the common problem of private tuition. Unfortunately, many parents send their children for private lessons, since they are not certain whether school provides their children with an appropriate education; this results from the fact that school fails to live up to their increasingly high expectations. The problem of private tuition is also increasing in secondary and high schools, with regard to mathematics and foreign languages in particular. As employees of the Technical University of Lodz, we have been familiar with this problem since 2009. Since then, we have been working on the Mathematical Emergency E-services project, which is entirely financed by EU funds.
The goal of the project is to provide real-time, immediate assistance in the field of mathematics for students of secondary schools. Every week from Sunday to Thursday, from 5 PM to 10 PM, four teachers provide assistance using a special computer system. A student who has a problem with the solution of his or her mathematics homework, or does not understand some issue, may connect to a teacher using the Internet. The student will not be given a ready solution; they must actively take part in the discussion with the teacher. In Lodz, 30 schools are covered by this program, and more than 1000 of their students can get assistance. The project is to come to an end in 2011. An accurate description of the platform is included in paper [1]. The results of the students' surveys were discussed in paper [2]. In this work we present some examples of consultation exercises.
2 Case Study
Every day, teachers face many of the problems which their students come across. In this section we present some of them. They are authentic conversations conducted in real time. Because the platform also carries sound, the conversations are not fully quoted; the spoken parts are summarized as comments. Problem 1. • Identifying the problem. Tutor [19:04:21]: Specify the exercise with which you've got a problem, please. S. Marlene [19:05:07]: I've got a problem with the solution of the following exercise. Draw a cube, select the angle between the diagonal and the plane of the base. Calculate the cosine of this angle. Tutor [19:05:21]: Hang on a moment, please. S. Marlene [19:06:24]: I think it's the lack of data. •
Proposed solution.
Tutor [19:07:41]: Draw a cube and select the appropriate angle, please. {Marlene draws a cube and selects the appropriate angle, figure 1} Tutor [19:10:28]: Calculate the length of the diagonal of the cube's base. S. Marlene [19:11:13]: I remember the rule: a√2 {Marlene then calculates the diagonal of the base, figure 1} •
Effect
S. Marlene [19:14:04]: Aha, now I have the data to determine the angle cosine. S. Marlene [19:14:57]: I understand, it is not the lack of some data. •
Summary and conclusions.
Marlene asked how to solve the presented problem, saying: I think that it's missing data. The tutor solved the problem together with Marlene by using instant messaging and an electronic whiteboard. Marlene understood the mistake she had made after the first hints given by the teacher. The tutor's aim was achieved: Marlene now knows that one can have doubts about seemingly missing data, but one should always begin by working through the problem.
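For completeness, a sketch of the computation Marlene was guided toward (our reconstruction, not part of the transcript): the edge length a cancels, which is why the exercise needs no numerical data.

```latex
% d: diagonal of the base, D: space diagonal of a cube with edge a
\[
d = a\sqrt{2}, \qquad D = a\sqrt{3}, \qquad
\cos\alpha = \frac{d}{D} = \frac{a\sqrt{2}}{a\sqrt{3}}
           = \sqrt{\frac{2}{3}} = \frac{\sqrt{6}}{3} \approx 0.82 .
\]
```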
Fig. 1. Source: Mathematical Emergency E-services
•
Indications
The tutor sent a note to Marlene's teacher at school: Student Marlene should solve several tasks in which there are many variables. The teachers at the schools taking part in the project are kept informed through e-mail contact with the tutors; they eagerly passed on information about the progress and problems which the students reported. Problem 2. •
Identification of the problem.
Tutor [20:25:21]: Yes, how can I help you? Anna [20:45:16]: I’ve got a problem. It’s about a trigonometrical equation. Anna [20:45:32]: I can’t deal with the exercises. Anna [20:46:17]: 4(sin x)^2 + (sin 2x)^2=3 •
Proposed solution.
Tutor [20:46:21]: Ok., what shall we start with? Anna [20:46:58]: First I should transfer the equation to get the same angle.. Tutor [20:47:25]: Yes, of course. Anna [20:47:30]: I think you have to transfer: sin(2x)^2. Anna [20:47:35]: ok
Anna [20:47:54]: I don’t know which relation is better for me to use. Anna [20:48:10]: to do it. Tutor [20:48:15]: sin 2x = ? Ann [20:48:53]: Yes, now I remember, I guess 2sinxcosx. Anna [20:49:14]: Am I supposed to replace it with this expression. Tutor [20:49:50]: Try it out, please. Anna [20:51:30]: I Just don’t know what to do when the sine is squared. Anna [20:51:34]: How to write it down? Tutor [20:52:35]: First try it out by yourself. You will get something for sure. Anna[20:54:18]: (sin2x)^2 = (2sinxcosx)^2 ? Can I do it this way? Tutor [20:54:52]: Well done. Tutor [20:55:00]: Please, go on. Anna [20:56:32]: Uhm… Anna [20:56:52]: Shall I write it down in one more way? Anna[20:57:11]: Don’t use (sinx^2)+(cosx)^2=1 for 4sin^2x. Tutor [20:57:54]: Ok I’ll help you. Tutor [20:58:47]: (2* sinx * cos x)^2 = 4 * sin^2 x * cos^2 Tutor [20:58:50]: Is that right? Tutor [20:59:24]: (2* sinx * cos x)^2 = 4 * sin^2 x * cos^2 x Tutor [21:00:39]: a cos^2 x = 1 - sin^2 x Tutor [21:00:47]: Is that right ? Anna [21:01:01]: Ok and replace cos^2x? Tutor [21:02:13]: Sure. Anna [21:02:33]: I’ll have 4sin^4 x? Anna [21:02:42]: and put it outside the brackets? Tutor [21:03:22]: Or rather try to see the trinomial square... Tutor [21:05:32]: Did you manage to see the trinomial square Anna [21:05:40]: Instead of sinx^2 put the parameter? Tutor [21:05:59]: Yes, of course ... •
Effect.
Anna [21:06:25]: Ok :) thank you a lot Anna [21:06:33]: Now I will make it.. Tutor [21:06:55]: I’m really glad. Tutor [21:07:24]: Any other problems? Anna [21:08:03]: No thank you. •
Summary and conclusions.
The assistance given to the pupil confirms that, when teaching mathematics through problem solving, it is necessary to give small hints. The pupil would not have done the exercise without them. I think that solving a few more examples will allow the student to do the assignment on their own.
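A compact version of the derivation the tutor was steering Anna toward (our completion of the hinted steps, not part of the transcript):

```latex
\begin{align*}
4\sin^2 x + \sin^2 2x &= 3\\
4\sin^2 x + 4\sin^2 x\,(1-\sin^2 x) &= 3
   && \text{using } \sin 2x = 2\sin x\cos x,\; \cos^2 x = 1-\sin^2 x\\
8t - 4t^2 &= 3 && \text{substituting } t=\sin^2 x\\
4t^2 - 8t + 3 &= 0 \;\Longrightarrow\; t=\tfrac12 \text{ or } t=\tfrac32\ (\text{rejected, since } t\le 1)\\
\sin^2 x = \tfrac12 &\;\Longrightarrow\; x = \frac{\pi}{4} + \frac{k\pi}{2},\quad k\in\mathbb{Z}.
\end{align*}
```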
• Indications. The tutor sent a note to Anna's teacher at school: Please do simple trigonometric equations using a few identities. Problem 3. •
Identifying the problem.
Catherine [18:42:02]: Good evening Tutor [18:42:55]: Hello Tutor [18:43:51]: What is your problem then. Catherine [18:44:54]: I’d like to do a geometrical exercise and I’d like to do it with you. It’s been ages since I last did the prism Tutor [18:45:13]: First explain the problem. Catherine [18:46:02]: the base of the prism is a rhombus with the acute angle of 30 degrees. The length of the side is 12 cm. Calculate the surface area of this prism if the height is 8 cm. •
Proposed solution.
Tutor [18:46:56]: Do you remember the formula to calculate the area of the rhombus? Catherine [18:47:15]: No, I don’t. Tutor [18:49:02]: I will remind you, can you see the board and the formula (fig. 2)?
Fig. 2. Source: Mathematical Emergency E-services
Catherine [18:49:28]: I even remembered that. Tutor [18:49:34]: Could you calculate the rhombus area? Catherine [18:50:35]: 96 cm Tutor [18:51:06]: As to the units, they should be square... Catherine [18:51:25]: Of course. Tutor [18:51:19]: How about 96? Catherine [18:51:36]: square cm
Tutor [18:51:52]: cm^2 Yes, but I don’t like this quantity. Tutor [18:52:00]: Where is it from? Could you calculate the rhombus area Catherine [18:52:32]: 12*8 Tutor [18:53:03]: but h is the height of the rhombus not the prism. Catherine [18:53:21]: So,. is 72 ok? Tutor [18:53:43]: Exactly. Tutor [18:54:17]: How about the total surface area of the prism? Catherine [18:55:03]: 2Pp+Pb Tutor [18:56:12]: What is Pb? Tutor [18:56:22]: And, what is its quantity? Catherine [18:56:43]: So the Lateran surface area is about 48 but I’m not sure. Tutor [18:58:39]: But, where is the result from? Catherine [18:58:54]: 4*12 Catherine [18:58:58]: but it’s not correct Tutor [18:59:46]: How about 4? Tutor [18:59:56]: Times 12? Catherine [19:00:13]: I don’t know Catherine [19:00:57]: because 4 sides? Tutor [19:01:34]: And what is the surface of one side? Catherine [19:02:06]: 24? Tutor [19:03:24]: Why? What are the lengths of these rectangulars whose areas are unknown? Catherine [19:03:58]: 12 cm Tutor [19:04:41]: What are the lengths of the sides of the rectangulars whose areas are unknown? Catherine [19:05:06]: the side is 12 cm long Tutor [19:04:58]: I meant the sides of these rectangulars. Tutor [19:05:06]: And the other? Catherine [19:05:49]: I don’t know Catherine [19:08:25]: So 8 Catherine [19:08:39]: That is 96] Tutor [19:09:17]: so Pb= Catherine [19:10:50]: it’s Pc=2*72+96=240 Tutor [19:11:21]: But Pb=4* 96 Catherine [19:11:46]: Oh, yes •
Effect.
Catherine [19:12:37]: 2*72+4*96=528 Tutor [19:12:48]: Great! Catherine [19:13:37]: Oh, thank you very much •
Summary and conclusions.
We notice that the pupil reaches the final solution by going through all the particular stages, with considerable assistance from the tutor. The teacher played a key role here in managing the didactic process towards solving the problem. Sometimes the teacher
had to react to the student's passivity. If the teacher gave the ready answer, it would be of no advantage to the pupil: they would take the solution for granted, and that is the biggest mistake a teacher could make. Finally, the student receives praise, and it is obvious that they are happy with their particular success. •
Indications.
The tutor sent a note to Catherine’s teacher at school: The student must revise the basics of geometry. The tutor does not finish the conversation since Catherine brings up another problem. Problem 4. •
Identifying the problem.
Catherine [19:14:28]: What if the base of the right prism is a rhombus whose diagonals are 15 and 20 cm long. The height of the prism is 17 cm. Calculate the surface area. •
Proposed solution.
Tutor [19:16:35]: The exercise is very similar to the previous one. Try to solve it. Catherine [19:22:37]: but what is the side? Catherine [19:22:53]: We have the height. Catherine [19:22:55]: 17 cm Tutor [19:23:25]: Why do you need the side? Catherine [19:23:52]: to calculate the lateral surface, Tutor [19:23:53]: But you should remember about the base and height of the prism.. Catherine [19:24:57]: So I don’t need the length of the side Tutor [19:26:08]: Yes, you do Tutor [19:26:28]: But, you have the diagonals. Catherine [19:26:40]: But how to calculate? Tutor [19:27:59]: From the Pythagorean theorem (fig.3). •
Effect.
Catherine [19:29:26]: Ok. I’ll Deal with this exercise now. Tutor [19:29:33]: Why don’t you write the results? Catherine [19:29:48]: Hang on a second Catherine [19:31:54]: So Pc= 1150 cm Tutor [19:34:06]: Great! Catherine [19:35:38]: Thank you for your help Catherine [19:35:42]: Have a nice evening Tutor [19:36:07]: Thank you Tutor [19:36:14]: the same to you… •
Summary and conclusions.
In this exercise the tutor gave some clues and helped the student get on the right track. The pupil did this exercise practically on their own, as it was analogous to the previous one. The assistance given in the first exercise was far greater than in this one. •
Indications.
The tutor sent a note to Catherine’s teacher at school: The student must revise the basics of geometry.
Fig. 3. Source: Mathematical Emergency E-services
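For reference, the off-line computation behind Catherine's result in Problem 4 (our reconstruction; P_c denotes the total surface area, H the prism height, and the rhombus side a follows from the Pythagorean theorem applied to the half-diagonals, as the tutor indicated):

```latex
\[
P_{\mathrm{base}} = \frac{d_1 d_2}{2} = \frac{15\cdot 20}{2} = 150, \qquad
a = \sqrt{\left(\frac{15}{2}\right)^{2} + \left(\frac{20}{2}\right)^{2}}
  = \sqrt{56.25 + 100} = 12.5,
\]
\[
P_b = 4\,a\,H = 4\cdot 12.5\cdot 17 = 850, \qquad
P_c = 2P_{\mathrm{base}} + P_b = 300 + 850 = 1150\ \mathrm{cm}^2 .
\]
```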
The tutor invited the student to come for consultations the next day in order to solve a more demanding problem. Figures 4 and 5 present a fragment of those consultations.
Fig. 4. Source: Mathematical Emergency E-services
During the consultations, the tutors make use of the GeoGebra or Cabri programs to illustrate the problem. Examples of some useful constructions are shown in figures 6 and 7.
Fig. 5. Source: Mathematical Emergency E-services
Fig. 6. Source: Mathematical Emergency E-services
Fig. 7. Source: Mathematical Emergency E-services
Fig. 8. Newspaper: “Rzeczpospolita”, “Dziennik Lodzki“
3 Conclusions
The mathematical emergency e-service is of great interest to many students, more than a thousand in total. In the future we would like to extend it to the entire country. With the use of new information technologies in teaching, examinations are taking on a new, global dimension [3][4][5]. Teaching and examinations are beginning to leave the walls of schools and turn towards the e-school. Are we witnessing a process of change? Are we attending schools or e-schools? The projects "Mathematical Emergency E-services" and "E-exams" prove it best. The project was received with great interest in many educational environments and has been repeatedly featured in the media as well (Figure 8). Acknowledgments. The Project Director, Jacek Stańdo, thanks the Marshal Office in Lodz for the European Union grant funds used to conduct the Mathematical Emergency E-services project.
References 1. Krawczyk-Stańdo, D., Stańdo, J.: Supporting didactic processes with Mathematical Eemergency Services, Poland, Education Nr 2 (110) (2010) 2. Stańdo, J., Bieniecki, W.: Ways of application of different information technologies in education on the example of mathematical emergency e-services. In: Information Systems in Management VII, Distant Learning and Web Solutions for Education and Business, Scientific editors Piotr Jałowiecki, Arkadiusz Orłowski, Warsaw (2010) 3. Xiaolin, C.: Design and Application of a General-purpose E-learning Platform. International Journal of Business and Management 4(9) (2009) 4. Gumińska, M., Madejski, J.: Scaleable model of e-learning platform. Journal of Achievements in Materials and Manufacturing Engineering 21 (2007) 5. Ali, G., Bilotta, E., Gabriele, L., Pantano, P., Servidio, R.: An e-Learning Platform for applications of mathematics to microelectronic industry. In: Proceedings of the14th European Conference on Mathematics for Industry (ECMI). Springer, Heidelberg (2006a)
A Feasibility Study of Learning Assessment Using Student's Notes in An On-Line Learning Environment
Minoru Nakayama (1), Kouichi Mutsuura (2), and Hiroh Yamamoto (1)
(1) Human System Science, Tokyo Institute of Technology, Ookayama, Meguro, Tokyo, Japan
(2) Faculty of Economics and Graduate School of Engineering, Shinshu University, Asahi, Matsumoto, Japan
[email protected]
http://www.nk.cradle.titech.ac.jp/~nakayama
Abstract. The effectiveness and features of "note-taking" are examined in an on-line learning environment. The relationships between the assessment of the contents of participants' notes taken during class and the characteristics of the students were studied. Some factors concerning personality and the learning experience are significant and positively affect the grades given to the notes. Features of the notes taken were extracted using a text analysis technique, and these features were compared with the grades given. The good note-takers consistently recorded terms independently of the number of terms presented during the class. Conceptual mapping of the contents of the notes was conducted, and it suggests that the deviation in the features of the notes can be explained by the number of terms in a lesson. Keywords: note-taking, on-line learning, learning activity, learning evaluation, text analytics.
1 Introduction
"E-technology" supports various types of educational practices. An online learning environment can provide a flexible method for teaching university courses. Additionally, educational evaluation is facilitated by allowing the notes taken by students during a course to be used to measure and track the learning process. "Note-taking" is recognized as a popular skill, used in all types of learning situations, even in higher education [17]. The functions and effectiveness of note-taking have already been reviewed and discussed [6,15]. In particular, "note-taking" requires the summarization and understanding of the context of the notes [14]. The learning performance of note-takers has previously been confirmed at the university level [13,7,12]. Though there are a few means for students to obtain note-taking skills at Japanese universities, they usually have to gain practical experience during conventional courses and in distance or on-line learning environments. On-line learning environments are advancing and do provide paper-less settings for learning,
so that students receive fewer printed materials and also encounter fewer opportunities to take notes. This problem has been pointed out [5], though some of the influence of on-line learning environments on note-taking activities has also been confirmed in a previous survey of the learning activities of ordinary students [12]. Note-taking activities may depend on the characteristics of students, and also on the setting. The contents of the notes taken need to be quantified in order to develop support systems and to improve the teaching methodology. In this paper, the following topics were addressed in response to the problems mentioned above:
– The relationship between the assessment of the contents of notes taken and students' characteristics was measured, to extract factors involved in note-taking activities in an on-line learning environment.
– The relationship between lexical features of the contents of students' notes and the lecturer's notes provided during a class was analyzed, to extract the features of the contents of the notes.
2 Method
2.1 On-Line Learning Courses
Note-taking activity was surveyed during an information networking system course, which was a blended distance education course in a bachelor-level program at a Japanese university. All participants were bachelor students in the Faculty of Economics, and they already had some experience studying in an on-line learning environment. In this blended learning course, face-to-face sessions with students were conducted every week, and students from the course who participated in this survey gathered in a lecture room. Being a good note-taker was expected of all participants during the face-to-face sessions. Students were encouraged to take an on-line test for each session of study outside the classroom, as a function of the learning management system (LMS). To do well in an on-line test, good notes are preferable. These on-line test scores were referred to when final grades were determined, and students could take the tests repeatedly until they were satisfied with their results. The LMS recorded the final scores. This course required participants to write three essay reports and to take a final exam. In the essay reports, participants had to synthesize their own knowledge regarding three topics taught during the course. These activities are an important part of the constructivist learning paradigm and may influence comprehension during the course [16]. As the course included face-to-face sessions, the lecturer could talk freely about course topics, and students could take notes on the impressions they received there.
2.2 Note-Taking Assessment
All participants were asked to take their own notes, and to present their notebooks to the lecturer every week. The lecturer quickly reviewed and assessed
each student’s notes after the weekly sessions, then returned the notebooks to students as soon as possible. Prior to doing so, all notes were scanned by the lecturer and stored as images in a PC. The contents of the lecturer’s notes created when the course sessions were designed can be used as a standard for the notes taken in every class. Therefore, these notes can be used as the criterion for evaluating students’ lecture notes. The individual content of students’ notes was evaluated using a 5-point scale (0-4), 4:Good, 3:Fair, 2:Poor, 1:Delayed, 0:Not presented. If a student reproduced the same information in his or her notebook, the note-taking was rated as “Fair”. “Fair” note-taking is the reproduction of transmitted information given as instructions. If any information was omitted, the rating given was “Poor”. In a sense, “Poor” note-takers failed to reproduce the information transmitted. When students wrote down additional information from the lecture, the note-taking was rated as “Good”. The “Good” note-takers included those who integrated this knowledge with relevant prior knowledge [8], as some pieces of knowledge are related to each other, and some pieces of knowledge are related to relevant prior knowledge. At this point, several kinds of constructivistic learning activities were occurring. The number of valid subjects was 20 students.
2.3 Characteristics of Students
In this study, the students' characteristics were measured using three constructs: personality [3,4], information literacy [10] and learning experience [11].
1. Personality: For the first construct, the International Personality Item Pool (IPIP) inventory [4] was used. Goldberg [3] lists five personality factors, and for this construct there were five component scores: "Extraversion", "Agreeableness", "Conscientiousness", "Neuroticism" and "Openness to Experience".
2. Information literacy: Information literacy is made up of various abilities, such as operational skills used with information communication technology and knowledge about information sciences. Fujii [2] defined and developed inventories for measuring information literacy. For this construct, the survey had 32 question items, and 8 factors were extracted: interest and motivation, fundamental operation ability, information collecting ability, mathematical thinking ability, information control ability, applied operation ability, attitude, and knowledge and understanding. The overall mean of factor scores was used to indicate each student's information literacy level. This inventory was originally developed to measure information literacy among high school students in many countries. It can also be used to measure the information literacy level of university students [2].
3. Learning experience: Students' on-line learning experiences were measured using a 10-item Likert-type questionnaire. As in previous studies, three factors
were extracted: Factor 1 (F1): overall evaluation of the e-learning experience, Factor 2 (F2): learning habits, and Factor 3 (F3): learning strategies [11].
2.4 Text Analysis of the Notes Taken
Ten of the twenty subjects were randomly selected, and all images of their notes were converted into digital texts; figures were excluded. The texts of the selected notes and of the lecturer's notes were classified into noun and adjective terms using a Japanese morphological analysis tool [9]; a term-document matrix (X) and term frequency vectors were then created for each class session. Latent semantic indexing (LSI) was used to classify documents according to the degree of similarity between them [1]. The features of terms and course sessions were extracted from the term-document matrix of the notes taken in each class using singular value decomposition (SVD) [1], as follows: X = T S D'. The term feature matrix (T) contains the feature vectors of each term. The feature vector of the note document (N) for each class session (i) was calculated for the lecturer and for each student as a summation of the term feature vectors (T) weighted by the term frequencies (Fi) in class (i); therefore, Ni = T Fi.
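As a minimal, language-neutral sketch of this computation (not the authors' code), the following shows how session note features of the form Ni = T Fi can be derived from a term-document matrix with a standard SVD; the rank k, the raw-count weighting and the toy numbers are assumptions, since the paper does not specify them.

```python
# Sketch only: LSI-style term features and per-note feature vectors (Ni = T Fi).
import numpy as np

def lsi_term_features(X, k=2):
    """X: term-by-document count matrix. Returns the first k columns of T from X = T S D'."""
    T, s, Dt = np.linalg.svd(X, full_matrices=False)
    return T[:, :k]

def note_feature(T, term_freq):
    """Feature vector of one note: term features summed with term-frequency weights."""
    return T.T @ term_freq

# toy data: 5 terms x 4 session documents (counts are invented for illustration)
X = np.array([[3, 0, 1, 0],
              [2, 1, 0, 0],
              [0, 4, 1, 1],
              [0, 0, 2, 3],
              [1, 1, 0, 2]], dtype=float)
T = lsi_term_features(X, k=2)

lecturer_freq = X[:, 0]                                  # lecturer's term counts for session 1
student_freq = np.array([2, 1, 0, 0, 1], dtype=float)    # one student's counts for the same session

n_lecturer = note_feature(T, lecturer_freq)
n_student = note_feature(T, student_freq)
print(np.linalg.norm(n_lecturer - n_student))            # distance ~ dissimilarity of the two notes
```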
3 Results
3.1 Note-Taking Assessment
Grades of Notes Assessed
The assessment scores of each set of notes taken were gathered and are summarized in Figure 1. According to the figure, the percentage of "Fair" ratings is higher than that of the other ratings during almost all weeks. The percentage of note takers rated "Good" in the first five weeks of the course is relatively higher than the percentages for the remaining weeks. This suggests that students could not keep creating "Good" notes as the course progressed. Also, the percentage of "Poor" ratings is almost always the lowest.
Effectiveness of Student's Characteristics
One of the hypotheses of this paper, the influence of students' characteristics, is examined in this section. The sums of assessment scores for note-taking during the course are calculated to evaluate each student's note-taking ability. The students are then divided into two groups, consisting of high scorers and low scorers. Scores are compared between the two groups to measure the contribution of students' attributes and characteristics to note-taking activities. First, the learning performance (report scores, on-line test scores and final exam scores) of the two groups is summarized in Table 1. Though there are no significant differences between the two groups, the scores of the on-line tests and final exams for the group of note-takers with high scores are higher than those for the group with low scores, while report scores for the low-score group are higher than those for the high-score group.
Fig. 1. Grades of notes assessed across weeks. (n=20).
Table 1. Learning performance between two groups of note assessment scores
                        High (n=11)    Low (n=9)      Level of significance
Report score            0.59 (0.27)    0.67 (0.18)    n.s.
On-line test score      289.7 (85.0)   254.3 (79.7)   n.s.
Final exam score        49.5 (7.3)     44.9 (7.3)     n.s.
( ) indicates SD.
As a result, note-taking scores did not contribute to learning performance in this course. The contribution of learning experience, namely the subjects' subjective evaluations, was examined, and the three factor scores are summarized in Table 2, using the same format as Table 1. The third factor, learning strategies, is significantly higher for the low-score group than for the high-score group. This factor consists of two question items: "I have my own method and way of learning" and "I have my own strategies on how to pass a course". Therefore, the purpose of note-taking may affect the scores of note-takers. The relationship between a student's personality and note-taking performance was also measured. The note-taking scores are summarized across the 5 personality factors in Table 3. As shown in the table, the scores for the high-score group are significantly higher than those for the low-score group across all personality factors except the "Openness to Experience" factor. This result suggests that the learner's personality positively affects note-taking activity. These aspects should be considered when out-of-class assistance is provided to students to improve their note-taking skills.
Table 2. Learning experience between two groups of note assessment scores
Factors of learning experience                      High (n=11)   Low (n=9)     Level of significance
Overall evaluation of the e-learning experience     2.82 (0.58)   3.17 (0.65)   n.s.
Learning habits                                     2.09 (0.63)   2.50 (0.97)   n.s.
Learning strategies                                 3.00 (0.71)   3.67 (0.61)   p < 0.05
5-point scale, ( ) indicates SD.
Table 3. Scores of personality factors between two groups of note assessment scores
Personality factors        High (n=11)   Low (n=9)     Level of significance
Extraversion               3.29 (0.61)   2.52 (0.73)   p < 0.05
Agreeableness              3.58 (0.35)   3.10 (0.63)   p < 0.05
Conscientiousness          3.37 (0.47)   2.88 (0.51)   p < 0.05
Neuroticism                3.34 (0.60)   2.38 (0.75)   p < 0.01
Openness to Experience     3.74 (0.76)   3.23 (0.64)   n.s.
5-point scale, ( ) indicates SD.
The contribution of information literacy was also measured using the same format. In Table 4, the scores of note-takers are summarized across the 8 factors of information literacy. Additionally, the scores are summarized for the two secondary factors and a summation of information literacy. There are significant differences in the scores for "Fundamental operation ability" and "Knowledge and understanding" at the 10% level of significance. All factor scores of information literacy for the high-score group are higher than those for the low-score group, except for the factors "Interest and motivation" and "Mathematical thinking ability". Study of the contribution of information literacy to note taking should be continued using larger samples.
3.2 Features of Notes
In order to compare the number of terms in the notes of the lecturer and the students, the mean numbers of terms are summarized in Figure 2. Because the number of terms which appear in a lecture differs across course sessions, the volume of notes taken by students may depend on the lecturer's notes. In Figure 2, the horizontal axis shows the number of terms the lecturer presented in each course session, and the vertical axis shows the mean number of terms students wrote down; the diagonal line therefore indicates equal numbers of terms written by the lecturer and by the note-takers. The error bars show the standard error of the mean, and the numbers indicate the course sessions. The means were calculated for "Good" and "Fair" note takers respectively. The mean number of terms for "Good" note takers is always higher than that for "Fair" note takers, while the number for "Fair" note takers is almost the same as the number of terms written by the lecturer, since these plots lie on the diagonal line. In particular, the
Table 4. Scores of information literacy between two groups of note assessment scores
Information literacy factors      High (n=11)   Low (n=9)     Level of significance
Interest and motivation           3.84 (1.00)   4.22 (0.49)   n.s.
Fundamental operation ability     4.52 (0.49)   3.92 (0.88)   p < 0.10
Information collecting ability    3.57 (0.98)   3.11 (0.93)   n.s.
Mathematical thinking ability     2.86 (0.85)   3.17 (1.31)   n.s.
Information control ability       3.09 (1.10)   2.86 (0.79)   n.s.
Applied operation ability         3.11 (1.06)   2.89 (1.11)   n.s.
Attitude                          3.00 (0.85)   2.69 (0.69)   n.s.
Knowledge and understanding       3.68 (0.79)   2.97 (0.81)   p < 0.10
Grand total                       3.46 (0.61)   3.23 (0.65)   n.s.
5-point scale, ( ) indicates SD.
number of terms for "Good" note takers stays high even when the number of terms in the lecturer's notes is low. According to Figure 2, the "Good" note takers record more terms; the number of terms these students record is almost always higher than the number of terms provided by the lecturer. The next question is whether the terms students recorded covered the terms which the lecturer presented. To evaluate this degree of coverage, a coverage ratio was calculated as the percentage of the lecturer's terms that were recorded by students.
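A minimal sketch of this measure, assuming the coverage ratio is simply the share of the lecturer's terms that also appear in a student's notes for the same session (the paper does not give an explicit formula):

```python
# Hypothetical illustration of the per-session term coverage ratio.
def coverage_ratio(student_terms, lecturer_terms):
    """Fraction of the lecturer's terms that the student also recorded."""
    lecturer = set(lecturer_terms)
    if not lecturer:
        return 0.0
    return len(lecturer & set(student_terms)) / len(lecturer)

# toy example: the student recorded 4 of the 5 terms presented by the lecturer -> 0.8
print(coverage_ratio(["router", "packet", "protocol", "address"],
                     ["router", "packet", "protocol", "address", "topology"]))
```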
Fig. 2. Relationship between the number of terms in notes for lecturer and students
Fig. 3. Coverage ratio of terms in student’s notes compared to lecturer’s notes
The coverage ratios of terms for each session are summarized in Figure 3. As the figure shows, there are some differences in coverage rates between "Good" and "Fair" note takers. The differences are generally small, and the ratios are almost always over 80%. A possible reason why coverage rates are relatively low in the 4th and 9th sessions is that it is not easy for students to pick up additional terms while the lecturer displays figures. The above analyses suggest that all subjects record almost all terms the lecturer presents during the class; this means that all students can reproduce the conceptual contents of the lecture. Also, choosing to record the rest of the terms presented by the lecturer may affect the grades of the notes taken. To confirm this phenomenon, features of each note were calculated using the term frequencies of the session and the feature vectors of terms extracted from the LSI model mentioned above. Features of all notes are illustrated two-dimensionally in Figure 4(a), using two principal components of the feature vectors. The horizontal axis shows the first component, and the vertical axis shows the second component. The features of notes for "Good" and "Fair" note takers and for the lecturer are displayed separately, and the session numbers are indicated. The distances between the features of notes show the degree of similarity between them, and the degree of separation of the plots for the sessions represents certain aspects of note taking. As the figure shows, there are clusters for each class session. For some sessions, the features of students' notes and the features of the lecturer's notes overlap. To clarify the relationship, four typical sessions are extracted in Figure 4(b). In this figure, the plots for sessions 7 and 10 produce a small cluster, which means that all notes are similar to each other. This suggests that all students can reproduce the lecturer's notes, even though their notes are classified into two grades. On the other hand, the plots for sessions 12 and 13 are distributed in the same plane, but the plots for the students' notes are distant from the lecturer's notes.
Fig. 4. Conceptual map for students' and lecturer's notes: (a) all sessions, (b) typical sessions
According to Figure 2, the numbers of words written down for sessions 7 and 10 are the highest, while the numbers of words written down for sessions 12 and 13 are the lowest. When the number of terms is large, most notes include almost all major features, so that all plots are mapped around a common point; the rest of the terms and the structure of the student's note-taking may nevertheless affect the note-taking grade, because the lecturer still classified these notes as "Good" or "Fair". When the lecturer presents a small number of terms, the plots of "Good" note takers are distant from the lecturer's notes, since these students recorded some related terms in addition to the terms presented. The "Fair" note takers miss some terms presented by the lecturer and add appropriate or inappropriate terms to their notes, so their conceptual positions are different from those of the lecturer. To compensate for this learning condition, the design of the class structure and of student support should be considered carefully. These points will be subjects of our further study.
4 Conclusion
To improve learning progress and to develop a support program for use in an on-line learning environment, an assessment of note-taking, as an important learning activity, was conducted during a blended learning course. The relationships between note-taking activity and student characteristics were examined. According to the results of a comparison of scores between "Good" and "Fair" notes, students' characteristics, such as personality and one of the three factors regarding the learning experience, affect the contents of their notes.
Some lexical features of the notes taken were extracted, and the relationship between these features and the grades of the note takers was discussed. The "Good" note-takers constantly record terms for their own notes, independently of the terms presented by the lecturer. The features of students' notes can be presented as a conceptual map, and the relationship between the deviations in the features of these notes and the number of terms the lecturer presented was discussed. The development of supporting methodologies will also be a subject of our further study.
Acknowledgements. This research was partially supported by the Japan Society for the Promotion of Science (JSPS), Grant-in-Aid for Scientific Research (B-22300281: 2010-2012).
References
1. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by Latent Semantic Analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)
2. Fujii, Y.: Development of a Scale to Evaluate the Information Literacy Level of Young People – Comparison of Junior High School Students in Japan and Northern Europe. Japan Journal of Educational Technology 30(4), 387–395 (2007)
3. Goldberg, L.R.: A Broad-Bandwidth, Public Domain, Personality Inventory Measuring the Lower-Level Facets of Several Five-Factor Models. In: Mervielde, I., Deary, I., De Fruyt, F., Ostendorf, F. (eds.) Personality Psychology in Europe, vol. 7, pp. 7–28. Tilburg University Press (1999)
4. International Personality Item Pool: A Scientific Collaboratory for the Development of Advanced Measures of Personality Traits and Other Individual Differences, http://ipip.ori.org
5. Kiewra, K.A.: Students' Note-Taking Behaviors and the Efficacy of Providing the Instructor's Notes for Review. Contemporary Educational Psychology 10, 378–386 (1985)
6. Kiewra, K.A.: A Review of Note-Taking: The Encoding-Storage Paradigm and Beyond. Educational Psychology Review 1(2), 147–172 (1989)
7. Kiewra, K.A., Benton, S.L., Kim, S., Risch, N., Christensen, M.: Effects of Note-Taking Format and Study Technique on Recall and Relational Performance. Contemporary Educational Psychology 20, 172–187 (1995)
8. Mayer, R.E., Moreno, R., Boire, M., Vagge, S.: Maximizing Constructivist Learning from Multimedia Communications by Minimizing Cognitive Load. Journal of Educational Psychology 91(4), 638–643 (1999)
9. MeCab: Yet Another Part-of-Speech and Morphological Analyzer, http://mecab.sourceforge.net
10. Nakayama, M., Yamamoto, H., Santiago, R.: Impact of Information Literacy and Learner Characteristics on Learning Behavior of Japanese Students in Online Courses. International Journal of Case Method Research & Application XX(4), 403–415 (2008)
11. Nakayama, M., Yamamoto, H., Santiago, R.: The Impact of Learner Characteristics on Learning Performance in Hybrid Courses among Japanese Students. The Electronic Journal of e-Learning 5(3), 195–206 (2007)
12. Nakayama, M., Mutsuura, K., Yamamoto, H.: Effectiveness of Note Taking Activity in a Blended Learning Environment. In: 9th European Conference on e-Learning, pp. 387–393. Academic Publishing, Reading (2010)
13. Nye, P.A., Crooks, T.J., Powley, M., Tripp, G.: Student Note-Taking Related to University Examination Performance. Higher Education 13, 85–97 (1984)
14. Piolat, A., Olive, T., Kellogg, R.T.: Cognitive Effort during Note Taking. Applied Cognitive Psychology 19, 291–312 (2005)
15. Trafton, G.J., Trickett, S.B.: Note-Taking for Self-Explanation and Problem Solving. Human-Computer Interaction 16, 1–38 (2001)
16. Tynjälä, P.: Towards Expert Knowledge? A Comparison between a Constructivist and a Traditional Learning Environment in the University. International Journal of Educational Research 31, 357–442 (1999)
17. Weener, P.: Note Taking and Student Verbalization as Instrumental Learning Activities. Instructional Science 3, 51–74 (1974)
IBS: Intrusion Block System a General Security Module for elearning Systems
Alessio Conti1, Andrea Sterbini2, and Marco Temperini1
1 Department of Computer and System Sciences
2 Department of Computer Science
Sapienza University of Roma, Italy
[email protected], [email protected]
Abstract. The design and implementation of a security plug-in for Learning Management Systems is presented. The plug-in (called IBS) can help in protecting a Learning Management System from a varied selection of threats carried out by malicious users via the internet. Nowadays it is quite likely that the installer and/or administrator of a system is an interested teacher, rather than a skilled technician. This is not a problem from the point of view of user friendliness and ease of use of the system's functionalities; those are actually features that motivate the widespread adoption of both proprietary and open source web-based learning systems. Yet, as with any other web application, learning systems are subject to the continuous discovery and publication of security weaknesses buried in their code. Accordingly, such systems present their administrators with an apparent need for continuous system upgrades and patch installation, which may turn out to become quite a burden for teachers. The integration of IBS in a system eases the above mentioned needs and can help teachers focus their work more on pedagogical issues than on technical ones. We report on the present integration of IBS in two well-established open source Learning Management Systems (Moodle and Docebo), allowing for reasonably lasting protection from the threats comprised in five well-known classes of "attacks". Besides describing the plug-in definition and functionalities, we focus in particular on the specification of a whole protocol, devised to guide the adaptation and installation of IBS in any other PHP-based learning system, which makes the applicability of the plug-in sufficiently wide. Keywords: LMS, LMS security, SQLi, XSS, RCE, LFI, RFI, Moodle, Docebo.
1 Introduction
In recent years the construction and use of Learning Management Systems (LMSs) has surged ahead. An LMS is a web application allowing the management and delivery of e-learning courses. Modern LMSs comprise a very wide set of functionalities:
- supporting variable levels of administrative management (such as enrolment in programs and courses),
- complying with the application of standards for e-learning in the production and use of learning objects,
- helping the teacher in the definition of aims, pedagogical strategies and learning content of courses,
- supporting teachers and learners via assessment tools and accompanying the interaction between learners and teachers, recently also with the use of features for social interaction and sharing, as in the Web2.0 enhanced LMSs.
The nature of LMSs as web applications, though, implies the possibility of their being misused by malicious internet users [1,2,3]. Usually such misuse is performed by "exploiting" security weaknesses buried in the application programming code, through which the attacker can cause leaks of data from the system (user data, passwords, not to mention homework marks). With respect to security weaknesses and data leaks, an LMS is in fact in the same position as a common e-commerce web application, or a whole web server: nobody wants them, and no effort should be spared by the system administrator in order to avoid such events. On the other hand, it is quite likely that the administrator and/or installer of an LMS is not in the same professional position as their counterparts in the other mentioned applications: she may rather be a teacher willing to use new technologies for her own teaching activity, and maybe for her colleagues'. The security weaknesses of an application usually depend on errors made by the programmers, deeply buried in the code of possibly just one script in the system, or in just a very small part of the application. Details about the security exploits available on a web application are usually spread via the internet. The same happens for the alerts about exploits and for the publication of the corrections needed by the code, in the form of patches (small programs that make the very limited necessary corrections) or of recommendations about system settings and protocols. This means that the administrator of a web application is usually subject to a continuous activity of checking for published threats and applying the corresponding patches, with the occasional need to reinstall the system as a whole newer version. While this is simply "work" for somebody, it may turn out to become quite a burden for teachers responsible for LMSs. Besides the mentioned burden, we may also consider that in the case of LMSs (and of teachers who might not be constantly available to search for and install patches) there may be a dangerous latency between weakness discovery and application of the issued protection: during this span of time, the system would be unprotected and subject to well-documented and easily attempted attacks.
2 Related Work
In the general area of web applications, the design and implementation of system protections capable of acting against whole families of technically homogeneous exploits has already been attempted with success. Intrusion Detection Systems, such as ModSecurity [4] and php-ids [5], work at the level of the web server. This is done either on the basis of lists of known weaknesses and vulnerabilities, or by filtering interactions through monitors watching for anomalies, unusual behaviour, and common web application attack techniques.
In [6] a preliminary implementation of the Intrusion Block System (IBS) plug-in was presented; it was installed in Moodle [7] versions 1.8.4 and 1.9.5 and was shown to be successful in blocking and reporting threats in a family of five main classes of attacks. While the approach followed for the implementation of the first plug-in was mostly ad hoc, in this paper we generalize it, up to the definition of a protocol applicable to later versions of Moodle and to any other PHP-based open source LMS. We briefly describe the classes of attacks blocked by IBS, give details about the protocol for integrating IBS into other LMSs, and show the application of this protocol to the installation of IBS, again in Moodle, and in Docebo [8], where IBS is integrated seamlessly in the nine versions 3.0.3 – 3.0.6.4.
3 The Problem
3.1 Five Classes of Attacks
IBS is designed to stop the five most common types of attacks on web sites:
• SQL Injection (SQLI): execution of additional SQL code together with the original SQL query. The attacker could extract information from the database or change its content. This attack normally exploits bugs in the PHP application while it is handling the URL, form parameters or html headers used to build the SQL database queries. E.g., suppose that one of the LMS pages uses the SQL query SELECT name,surname FROM usertable WHERE id=X to show on the current page the user's name and surname, and that the value X is simply concatenated from unsafe input. The attacker could pass the parameter X='-1 UNION SELECT username,password FROM users WHERE id=1' to transform the initial query into the following: SELECT name,surname FROM usertable WHERE id=-1 UNION SELECT username,password FROM users WHERE id=1. The resulting query produces a null set of rows (from the first part) together with a single row containing the username and password of the user with ID=1. The resulting PHP page shows the username and password instead of the name and surname. The attacker could now decrypt the password and assume the stolen user's identity in the system.
• Local File Include (LFI): inclusion of local files in the PHP pages. This attack is normally used to expose internal files containing reserved data (e.g. the server's password file). The attacker exploits application bugs in the handling of the URL, form parameters or html headers used to build parameters for the include() or require() functions, so as to include a local file and show its content together with the application content.
E.g., suppose the system is multilingual, and that the language selected for the pages is taken from one of the language preferences exchanged between the browser and the web server. Normally the preferred language is expressed as an ISO country code (e.g. 'en', 'fr', 'it' …). Suppose the LMS loads the language strings by just including the file with name $APP_PATH/$LANG, where $LANG is the preferred, non-sanitized language string and $APP_PATH is the LMS directory. An attacker could then include (and show) the local server password file by crafting a special header containing the preferred language string "../../../../../../../../etc/passwd" instead of the "it" language name, and the PHP code will include and show the /etc/passwd file.
• Remote File Include (RFI): inclusion and execution of remote PHP files (residing on an external server) inside the normal PHP application code. This attack could execute malicious code with the web server's permissions. The attack is normally based on a wrong configuration of the PHP server, which allows inclusion and execution of remote files, together with application bugs in the handling of URL/form parameters or html headers while building the arguments for an include() or require() function call. E.g., in this case, suppose that the PHP system is mis-configured, allowing the inclusion of remote files. As in the previous example, the attacker could craft an exchanged header parameter, or one of the other possible URL and form inputs, so that it contains the string "http://www.evil.com/exploit.php"; then, as before, the unsafe usage of this string as a local language file to be included would automatically download the "exploit.php" file from the "www.evil.com" server and execute its content inside the LMS itself, with the privileges of the web-server user. This would completely expose the LMS, the database and the web server, allowing the execution of any kind of PHP command. In the worst case, if the web server runs with administration privileges, the whole server would be compromised.
• Remote Command Execution (RCE): execution of system commands from the PHP scripts. This attack could either gain complete access to the host or disclose reserved data. Again, the attack exploits bugs in the handling of form parameters, html headers or URLs used to build commands executed by the application through the system() call. E.g., suppose that the LMS creates a directory named after the user's ID when a file is uploaded; the ID is passed as a form parameter and the creation of the directory is done by means of the system call system("mkdir $ID"); without proper sanitization of the $ID parameter.
In this case an attacker could run any system command by just appending it to the $ID parameter, as follows: "89 ; cat /etc/passwd". The resulting command "mkdir 89 ; cat /etc/passwd" would first create the directory and then output the content of the system password file in the page.
• Cross Site Scripting (XSS): inclusion of malicious JavaScript in the generated web pages. The delivered malicious JavaScript code runs inside the other visitors' browsers and could steal personal data (e.g. username, password, cookies). E.g., suppose the LMS contains a wiki component, where any user can write text pages. If the edited pages are not checked against extraneous HTML tags and JavaScript code, any user could just paste pieces of JavaScript that will execute in the browsers of other LMS visitors. This could allow stealing personal information (username, cookies, passwords).
Notice that IBS is not concerned with more complicated attacks exploiting vulnerabilities at the web-server level; what we want to filter are hacking attempts at the application level.
3.2 A Common Problem: Non-sanitized Input
The first four of the five types of attacks discussed earlier share a common attack methodology:
• the attacker crafts special inputs to be served from the browser to the PHP application, to exploit flaws in their usage inside the application;
• then the web application uses these bad inputs to prepare the parameters for particularly sensitive functions: SQL queries (SQLI), include() or require() (RFI and LFI), system() calls (RCE).
The fifth attack type (XSS) can be classified under the same schema if we describe it as follows:
• the attacker crafts special input to be served from the browser to the PHP application, to exploit flaws in its usage inside other people's browsers;
• then the visiting browsers use this (indirect) bad input to run JavaScript.
The real culprit is the mismanagement of input coming from the browser. The application should never trust it, and thus all data coming from the browser should be "sanitized" before use, i.e. transformed into safe input and used safely. Unfortunately, developers of LMSs and web applications sometimes lack the security expertise to write safe PHP code. Yet, a common detection and defence strategy can be designed to block all five types of attacks, through the analysis of the input coming from the attacker's browser.
In this we use signature-based analysis of the input data to detect attacks. Fortunately, all input coming from the browser is contained in only four PHP arrays:
• the parameters of a form submission of type POST: $_POST
• the parameters of a URL request (or a GET form): $_GET
• the headers exchanged within the request: $_SERVER
• the browser's cookies for that domain: $_COOKIE
To ensure that form/url parameters, cookies and headers do not contain an attack, we look for signatures of well-known attacks in their contents.
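The IBS engine itself is written in PHP; as a language-neutral sketch of this signature-based check, the following Python fragment scans the four input collections against a few example regular expressions. The signature patterns and the returned fields are illustrative assumptions, not the actual IBS filter database.

```python
# Illustrative sketch only; not the IBS code or its real signature database.
import re

SIGNATURES = {   # a few deliberately simple example signatures per attack class
    "SQLI": [r"(?i)union\s+select", r"(?i);\s*drop\s+table"],
    "LFI":  [r"\.\./", r"(?i)/etc/passwd"],
    "RFI":  [r"(?i)=\s*(https?|ftp)://"],
    "RCE":  [r"[;|`]\s*(cat|ls|wget|rm|mkdir)\b"],
    "XSS":  [r"(?i)<\s*script", r"(?i)\bon\w+\s*="],
}

def find_attack(request_inputs):
    """request_inputs: dict of source name -> dict of values, mirroring
    PHP's $_GET, $_POST, $_COOKIE and $_SERVER arrays."""
    for source, params in request_inputs.items():
        for name, value in params.items():
            for attack, patterns in SIGNATURES.items():
                for pattern in patterns:
                    if re.search(pattern, str(value)):
                        return {"attack": attack, "source": source,
                                "parameter": name, "signature": pattern}
    return None  # input looks clean

# example: a GET parameter carrying a UNION-based SQL injection
request = {
    "GET": {"id": "-1 UNION SELECT username,password FROM users WHERE id=1"},
    "POST": {}, "COOKIE": {}, "SERVER": {"HTTP_ACCEPT_LANGUAGE": "en"},
}
print(find_attack(request))   # reports an SQLI match on the GET parameter 'id'
```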
4 Contribution
The IBS system is the result of the complete redesign of an earlier Moodle adaptation, with the aim of defining a stable API and an OOP-based implementation that allows easy adaptation to other database engines or to other LMSs. Moreover, the security engine is designed to be extensible with other detection mechanisms.
Fig. 1. IBS architecture
4.1 Architecture of IBS
The main requirements for the three IBS components are:
• the database should be updated easily with the signatures of newly discovered types of attacks,
• the security module should be fast (as it's called for every page),
• the administration interface should blend easily within the LMS' administration pages.
The IBS system is, thus, made of 4 parts (see Figure 1):
• a database of suspect signatures, represented as regular expressions
• a database logging the intrusion attempts
• a security engine called before any PHP page is requested to the application
• a web-based administration interface
4.2 Workflow of the IBS Interactions
IBS works as a filter between the browser and the LMS application, checking whether one of the four arrays containing cookies, headers and parameters matches one of the signatures associated with known attacks:
• if a match is detected, an "access blocked" page is shown and information about the attack, including the IP address, is recorded in the IBS log;
• else the input is deemed correct and the normal execution of the requested LMS page is continued.
4.3 Installing IBS
The IBS installation is now automatic: the system manager only needs to copy the IBS files into the LMS administration directory and configure the module through its install page.
Fig. 2. Workflow of IBS's interaction
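As a rough, language-neutral illustration of the control flow shown in Fig. 2 (the real engine is a PHP module included before every LMS page), the sketch below checks the request, logs a detected attempt and returns either the block page or the requested page; the log fields, function names and data structures are hypothetical.

```python
# Hypothetical front-controller hook mirroring the workflow of Fig. 2; not IBS itself.
import datetime
import re

ATTACK_LOG = []   # stand-in for the MySQL log table read by the IBS admin interface

def find_attack(request_inputs):
    """Minimal stand-in for the fuller signature check sketched in Sect. 3.2."""
    for source, params in request_inputs.items():
        for name, value in params.items():
            if re.search(r"\.\./|(?i:union\s+select)", str(value)):
                return {"source": source, "parameter": name}
    return None

def handle_request(page, request_inputs, client_ip, user_agent):
    match = find_attack(request_inputs)
    if match:
        ATTACK_LOG.append({"time": datetime.datetime.now().isoformat(),
                           "page": page, "ip": client_ip, "agent": user_agent, **match})
        return "access blocked"          # redirect to the block page
    return f"serve {page}"               # clean input: continue with the normal LMS page

print(handle_request("index.php",
                     {"GET": {"lang": "../../../../etc/passwd"}, "POST": {}, "COOKIE": {}, "SERVER": {}},
                     "10.0.0.5", "Mozilla/5.0"))
print(ATTACK_LOG)
```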
4.5 IBS Administration
The administrative interface of IBS allows two main activities:
• log analysis, showing the last attacks detected, info about the attacker, the signature matched and the input that has triggered the alert;
• management of the attack signature database, mainly to add new attack signatures as regular expressions.
5 Evaluation
5.1 Example of Detection
As an example, we show a Windows installation of Docebo without and with IBS. The attack shown is a simple Local File Inclusion of the c:\boot.ini local file, obtained through a crafted URL parameter. Without IBS, the resulting page shows the content of the file, as shown in Figure 3. When IBS is active, instead, the page shown is the one in Figure 4, listing: time of attack, method used (GET/POST), type of attack (LFI, …), page attacked, the suspect parameter matched against the attack signatures, client IP and the browser's User-agent string.
Fig. 3. Local File Include (LFI): the hacker retrieves the c:\boot.ini content
5.2 Testing the IBS
We have tested IBS installed on many different versions of the Docebo LMS (from version 3.0.3 to version 3.6.0.4) by running four types of tests:
• functionality test: to check that IBS works properly we have applied several classic attack examples like the ones discussed above (even if Docebo is not vulnerable to them);
• stress test: we have used the Wapiti [9] web-application security scanner to try hundreds of known attacks and check how many were stopped by IBS;
• known attacks test: we have applied several specific attacks known to work on an earlier, insecure, version of Docebo, to show that IBS really protects a vulnerable Docebo installation;
• self protection test: we attack the IBS administration pages, checking what happens if an attacker tries to exploit IBS security issues.
Fig. 4. LFI attack blocked by IBS
The results are very encouraging:
• IBS stops more than 400 different attacks of the Wapiti test suite. Wapiti is a "fuzzer" injection security scanner, automatically generating hundreds of attacks on all application parameters. We have used Wapiti both on the oldest and on the newest version of Docebo, both with IBS active and inactive. It is interesting to see that Wapiti does not detect any vulnerability in either Docebo version, which means that Docebo is very well written. Testing Docebo with IBS switched on, we see that 404 Wapiti attacks are detected and blocked.
• IBS protects both its own pages and the LMS.
6 Conclusions and Future Work
We have described IBS, an LMS plug-in for hardening the security of PHP-based LMSs. Its implementation has been revamped with respect to an earlier version and designed so that it is easier to add IBS to other LMSs. As an initial example, IBS has been added both to the Moodle and to the Docebo LMSs. In the near future we would like to enhance IBS along the following research lines:
• caching: as the set of attack signatures changes rarely (with respect to the LMS accesses), caching the signature database would speed up the IBS filter execution;
• blocking vulnerability scans: attackers normally try more than one attack on an LMS, scanning it for multiple types of vulnerabilities. It would be very useful to immediately blacklist (temporarily or permanently block the access
to) the IP that is currently attacking the LMS, to avoid further attempts (which might not be present in the signature database);
• automatic import of new attack signatures: at this moment updating the IBS signature database is a manual process. It would be better to design an automatic update mechanism with a centralized attack signature database, to drastically reduce the administration effort needed to protect several LMSs;
• detection of unknown attacks: IBS can stop only known attacks, because it needs to know the attacker's tricks to detect them in the input. If, instead, we implant fake sensitive data in the LMS database (e.g. a fake administration password), we could detect the presence of an attack whenever we detect such a password in the output, even if the actual attack type is unknown.
References
1. Hope, P., Walther, B.: Web Security Testing Cookbook: Systematic Techniques to Find Problems Fast. O'Reilly, Sebastopol (2008)
2. Kurose, J.F., Ross, K.W.: Computer Networking: A Top-Down Approach. Addison-Wesley, Reading (2009)
3. OWASP: A Guide to Building Secure Web Applications and Web Services, http://www.owasp.org/index.php/Category:OWASP_Guide_Project
4. ModSecurity apache module, main reference, http://www.modsecurity.org
5. php-ids apache module, main reference, http://php-ids.org
6. Braga, G., Sterbini, A., Temperini, M.: A Threats Blocking Plug-in for Open Source Learning Management Systems. In: Lytras, M.D., Ordonez De Pablos, P., Avison, D., Sipior, J., Jin, Q., Leal, W., Uden, L., Thomas, M., Cervai, S., Horner, D. (eds.) TECH-EDUCATION 2010. Communications in Computer and Information Science, vol. 73, pp. 551–557. Springer, Heidelberg (2010)
7. Moodle Learning Management System, main reference, http://www.moodle.org
8. Docebo Learning Management System, main reference, http://www.docebo.org
9. Wapiti web security scanner, http://wapiti.sourceforge.net
Interpretation of Questionnaire Survey Results in Comparison with Usage Analysis in E-Learning System for Healthcare
Martin Cápay1, Zoltán Balogh1, Mária Boledovičová2, and Miroslava Mesárošová1
1 Department of Informatics, Faculty of Natural Sciences, Tr. A. Hlinku 1
2 Department of Nursing, Faculty of Social Sciences and Health Care, Kraskova 1
Constantine the Philosopher University in Nitra, 949 74 Nitra, Slovakia
{mcapay,zbalogh,mboledovicova,mmesarosova}@ukf.sk
Abstract. Organization of the distance form of study is not an easy task. It can be quite complicated, especially regarding communication, or rather the immediacy and promptness of giving feedback to the students. One useful method of increasing the effectiveness of the learning process and the quality of students' results is integrating on-line content and a learning management system. In this paper we describe the applicability of different types of resources and activity modules in e-learning courses and the worthiness of their usage. The presented ideas are supported by the outcomes of the questionnaire research realized within the e-learning study, as well as by the usage analysis of the particular e-course "Role of a nurse in community care", which was one of the outcomes of the international project E-learning in Community Care. We also compare the outcomes of the data analyses mentioned above and try to find the reasons for the differences between them. Keywords: E-learning, Life Long Learning, E-course, Usage Analysis, Questionnaire.
1 Introduction
The increasing economic demands of providing health care in highly developed countries have brought about changes in health care focus and goals. Several governments promote the development of community health care aimed at health support and sustainability, as well as at the prevention of diseases. All these factors require the modernization of education, especially education in the field of nursing. New facts are discovered daily and the amount of knowledge constantly grows, and this must be reflected in the curricula, i.e. the study programs must be regularly innovated [1]. However, not only the content needs to be modernized; the form also has to be adapted to current trends and the needs of society. The above mentioned points result in the necessity of cooperation between universities and companies in the field of
e-learning methodology [2][3][4]. An educational system supported by e-learning offers a modern form of study which is highly flexible from the point of view of time requirements and material resources. It provides very good accessibility and easy, direct communication with the tutor [2]. Internet learning environments are considered individual and learner-centered learning environments, as they contain multiple and rich resources and have an autonomous character which offers a flexible learning environment [5]. In general, e-learning is the delivery of education and training courses over the Internet and/or an Intranet. It can be defined as a mixture of content (on-line courses or courseware) and communication (reaching online, emails, discussion forums) [6]. Advanced Internet technology in health care learning – including e-learning, web-based learning, online computer-based educational training, Internet-based learning, and distance learning – has been widely adopted in many developed countries [7]. Having entered the European Union, the new member countries gained opportunities for international cooperation. The Czech Republic, Slovakia and Poland joined forces to create a study program for post-gradual specialization study and life-long learning of nurses aimed at community care. These countries were given the opportunity thanks to the international Leonardo da Vinci project named E-learning in Community Care, supported by the European Commission in Brussels.
1.1 Lifelong Learning Program
Due to the project, a supranational partnership of 5 institutions in the V4 countries and France was established. The project was coordinated by the National Centre of Nursing and Other Health Professions in Brno, Czech Republic. The partners were Constantine the Philosopher University in Nitra, Slovakia and the Medical University of Silesia in Katowice, Poland. The aim of this two-year project was to create a new specialized module-based educational program for the distance form of study and to prepare the modules for e-learning study, so that the nurses-to-be could learn effectively. Web-based learning provides them with a new environment that allows them to develop professional skills and knowledge through self-initiated learning [8]. An analysis of community care was carried out by the participants of the project already in 2005. It showed that the Czech Republic, Slovakia and Poland were, more or less, at the same level. Educational programs which would effectively prepare employees in this area for their future tasks did not exist in these countries at that time. However, this sphere of health care has been well known in the developed countries for many years, and serious attention has been paid to it thanks to its high social importance [9].
2 Structure of Study Program and the Methods of Educational Organization and Management
The study program contains a basic specialist module that is compulsory for all students, provided separately for midwives and for nurses. Furthermore, there are four elective modules for nurses as well as for midwives (Fig. 1).
Fig. 1. List of the courses of international study program
Since most of the content authors were not sufficiently digitally skilled, it was necessary to find a partner who would be responsible for the technical part of the courses. This role was undertaken by the Department of Informatics, Faculty of Natural Sciences, Constantine the Philosopher University in Nitra. Their task was not only to create the resources and activities in the e-course and fill them with the materials, but also to find a quick and easy way of collecting the materials and other important information necessary for course creation from the authors. The use of a structured MS Word template proved to be an applicable and also very convenient solution. The document contained all the information necessary for the course design, i.e. the study materials, glossary entries, quiz questions, written assignments, etc., all of them in a logical and comprehensible structure. The text of the study materials was written linearly according to the agreed conventions. Some of the materials were very extensive; the largest of them would have about 620 pages in a printed version. National and also international versions were created for all the modules. They were implemented in the LMS Moodle environment. The e-learning portal communicates with an LDAP server, which is interconnected with the academic information system (students' profiles) and SAP/SOFIA (employees' profiles). Thanks to this interconnection, when the users are students at our university, it is not necessary to register them manually [10]. The structure of an online course needs to be easy to follow, relevant, and learner-centered [11]. The elaborated e-courses have a unified structure which was approved in advance by the representatives of each partner organization [12]. We use the same structure also in our Department template, and we try to divide each course into introductory information, lessons and the final part [13]. The main study materials (a textbook) are complemented by automatically interlinked glossaries of terms and by video recordings. The glossary was set up in the mode of so-called automatic linking of key words in all study materials (except for the closing quiz, where this linking was turned off) [14]. The following resource modules were used in the courses: web page (study material), label, and glossary (help). Activity modules were also used, such as forum, assignment (upload a single file) and quiz.
3 Research Outputs
In the final phase of the project, the chosen courses were tested on a selected group of participants. In Slovakia, it was the course "The Role of a Nurse in the Community
Care", which was tested by 30 students of the external form of study at the Department of Nursing. The main objectives of the research were to get relevant feedback on the quality of a chosen module from the created study programme, especially information on the suitability of the study material structure, quiz structure and content, as well as quantitative information about the frequency of usage of particular e-course modules. The tests used in the course served only as feedback on the quality of theoretical knowledge before the state exams. Furthermore, the outcome of the course was not expressed as a mark but in the form of credits that were practically included in the professional development of the participants. The only requirement was submitting the obligatory assignments. Therefore, we could not evaluate success according to the final mark. That was the main reason why we decided to use different research methods, specifically a questionnaire and log file analysis.
3.1 Methodology of Research
Our aim was to evaluate the course quality using various approaches. We used the following research methods:
• non-anonymous entrance questionnaire – it served to get information about the statistical sample from the point of view of age, employment and practice;
• final (output) questionnaire – it provided feedback about the quality of content, tests, used means, process of study, method and effectiveness of study, as well as the attitude of the participants toward e-learning;
• usage analysis – analysis of the log file recorded during the participants' study served to formulate association rules of participants' behaviour in the e-course, as well as the sequence and frequency of accesses to electronic resources.
Participants of the course filled in the final questionnaire, the outcomes of which we present in the following paragraphs. The outputs of a non-anonymous questionnaire investigation usually tend to be very subjective, which was the reason why we decided not to rely only on the participants' responses but also to find another, more objective view of the course modules. We were interested in how the users studied in the course, what their navigation was like (the transitions between the modules), which materials they accessed (and the number of accesses), etc. We needed to create a model of user behaviour in the particular course. Very interesting and useful course usage information was also gained from the log file analysis; it can help us better understand the behaviour of the student in the e-learning environment. During the data preparation we took into account recommendations resulting from a series of experiments examining the impact of individual steps of data preprocessing on the quantity and quality of extracted rules [15][16][17]. Our research question was formulated as follows: Is there a difference between the responses in the questionnaire investigation and the real logs in the e-course? This question was stated after the first differences were found. By comparing the responses and the system-generated statistics we can reconstruct and analyse the real process of study of the participants. Furthermore, we can try to interpret the difference between the responses and the participants' real behaviour.
3.2 Basic Characteristics of the Research Participants
All participants of the tested course had finished a bachelor's degree. Table 1 illustrates the age categories of the participants. As we can see, most of them were from 31 to 45 years old.

Table 1. Age categories of participants
< 30 year-old       26 %
31 - 45 year-old    62 %
46 - 60 year-old    12 %
All participants were experienced people from the practice, mainly employed in a hospital. More than 10 years of practice was stated by 79% of participants (Table 2). Table 2. Number of years of practice 20 47 %
3.3 Description of Behaviour of the Users in the e-Course
The data gained by the analysis of the log file were visualised in Figure 2 and Figure 4 [9]. The interaction plot (Fig. 2) visualises the frequency of accesses to the basic system modules (assignment, upload, mainpage, quiz, etc.) according to the phase of study, Category x Term.
Fig. 2. Interaction Plot - Category x Term (Source: own research)
The graph shows that in the first phase the students most often accessed the quizzes. They completed 9 self-tests (altogether 720 attempts) and one closing test. The second most common activity was submitting the assignments (more than 287 assignments were uploaded into the course). In the second phase, the number of accesses to both types of activities decreased rapidly (Fig. 2). However, they still belong among the most visited course modules (whether resources or activities). Similar outcomes would be found if we analysed the particular chapters of the course separately [9][12]. The integrated tests were motivating for the students' learning [2]. The results may be significantly different if the online program is perceived as relevant and applicable to the employee's professional role in the organization [11]. The fact that the self-test questions were drawn from the oral state exam syllabus was also a strong stimulus for the students' activity. There is a medium dependency between the number of accesses to individual categories of parts of the course and the time periods of study [9], i.e. the number of accesses to individual parts of the course (Category) depends on the period of study (Term).
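The paper reports a "medium dependency" without naming the statistic used; one conventional way to quantify such a dependency between Category and Term is a chi-square test of independence on the table of access counts, with Cramér's V as the effect size. The sketch below is a hypothetical illustration on invented counts, not the authors' analysis.

```python
# Hypothetical illustration: Category x Term dependency on invented access counts.
import numpy as np
from scipy.stats import chi2_contingency

# rows: course-part categories; columns: study periods (terms); cells: access counts
counts = np.array([[120, 40, 15],    # quiz
                   [ 90, 35, 10],    # assignment upload
                   [ 30, 25, 20],    # forum
                   [ 25, 10,  5]])   # study material

chi2, p, dof, expected = chi2_contingency(counts)
n = counts.sum()
cramers_v = np.sqrt(chi2 / (n * (min(counts.shape) - 1)))  # 0 = independent, 1 = fully dependent
print(f"chi2={chi2:.1f}, p={p:.4f}, Cramer's V={cramers_v:.2f}")
```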
Fig. 3. Frequencies of Access for student, teacher and creator role (Source: own research)
The graph (Fig. 3) visualizes the frequencies of activities (Frequencies of access x Month) during 13 months. The periods of study (Month) are on the x axis and the observed frequencies on the y axis, with one polygon drawn for each role (student, teacher and creator of the course). In the graph, we can see the great difference in the number of access logs of the students in comparison with those of the tutors and the course creators (in the course there were several tutors as well as creators). In the first months after enrolling in the course there was almost no, or only very low, activity by the tutors. On the other hand, the course creators made the last modifications and settings in the courses (their activity was much higher in the preceding period). The greatest activity of the students can be seen shortly before the deadlines of particular activities, and the tutors were very active at that time, too.
Fig. 4. Web graph – visualization of found rules (Source: own research)
In the following part we are going to describe the results of the association rules analysis, which represents a non-sequential approach to the data being analysed. We shall not analyse sequences but transactions, i.e. we shall not include the time variable in the analysis. In our case a transaction represents the set of course categories visited by one user. Regarding our data, we shall consider one transaction to be the categories of parts of the course visited by one user over the observed period of time. The size of a node in the web graph represents the support of an element (the frequency of category accesses), the line width represents the support of the rule (the frequency of a pair of consecutively accessed categories), and the brightness of the line represents the lift of the rule (which specifies how many times more frequently the visited categories occurred jointly than they would if they were statistically independent) [9]. As we can see in the web graph that visualizes the association rules, the most visited categories of course components were: main page, quiz (self-test), forum, practice assignment, report, and entrance/output feedback, as well as combinations of pairs of these categories [9]. The least visited, on the other hand, were: study material, help and literature. Different outcomes also emerged when we analysed the questionnaires dealing with communication in the forums. According to the questionnaire responses we can assume that only a small number of students participated in the discussions: only one third of the students say they were active in the forums (Fig. 5), the rest only occasionally. But this contradicts the association rules in the web graph (Fig. 4).
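To make the support and lift measures used in the web graph concrete, the following hypothetical sketch computes them for pairs of visited categories from per-user transactions; the transactions are invented, and the paper's actual mining tool and thresholds are not specified.

```python
# Hypothetical illustration of support and lift for pairs of visited course categories.
from itertools import combinations

# one transaction per user: the set of course-part categories that the user visited
transactions = [
    {"main page", "quiz", "forum"},
    {"main page", "quiz", "assignment"},
    {"main page", "forum"},
    {"quiz", "assignment"},
    {"main page", "quiz", "forum", "assignment"},
]
n = len(transactions)

def support(itemset):
    """Fraction of users whose transaction contains every item of the itemset."""
    return sum(itemset <= t for t in transactions) / n

categories = sorted(set().union(*transactions))
for a, b in combinations(categories, 2):
    s_ab = support({a, b})
    lift = s_ab / (support({a}) * support({b})) if s_ab else 0.0
    print(f"{a} & {b}: support={s_ab:.2f}, lift={lift:.2f}")
```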
directly from the computer screen. However, the log file analysis proved that the study materials were among the three least visited types of resources and/or activities. As many as 90% of the students stated that the opportunity to study at home was very comfortable for them – this was also confirmed by the analysis of the log file, where most of the course access logs came from computers outside the university. Each chapter also contained a list of additional resources. The outcomes of the questionnaires showed that 73% of the students used them during their study. The web graph, on the other hand, shows that literature was the least visited category. We can suppose that the additional resources were seen by a great number of students, but most of them visited them only once.

Fig. 5. Questionnaire responses on the activity in the discussions in the forums (Source: own research)

3.4 E-Learning
The next part of the questionnaire was focused on the forms of distance learning. E-learning as a form of education interested quite a lot of participants (Tab. 3). The responses also showed that only 13 per cent of the participants had previous experience with an e-learning method of instruction.

Table 3. The preferred type of distance learning
E-learning via internet: 63 %
Traditional, printed material, CD: 23 %
Blended learning: 14 %
Pictures and videos were the preferred form of study material, in comparison to text, animation and audio (Fig. 6).
Fig. 6. The preferred type of study material (Source: own research)
The chart (Fig. 7) shows the number of responses to the following questions:
Q1: What was your study supported by e-learning like? Answer: Very easy (1) … Very difficult (5)
Q2: How much have you learnt during the study? Answer: Very much (1) … Very little (5)
Fig. 7. Responses on question Q1 and Q2 (Source: own research)
Only 3 % of the participants stated that such a form of study did not suit them; on the contrary, as many as 80 % stated that they would like to participate in a course like that again. Since the participants were at the same time students of Constantine the Philosopher University in Nitra, we were interested in how they perceive the opportunity to study externally and learn via electronic means. In Table 4 we present the questionnaire responses to the question on the recommended type of study in which e-learning should be used. The outcomes show that they recommend this method mainly for lifelong learning.
Table 4. Recommended type of study for the e-learning method
Specialisational: 27 %
Qualificational: 13 %
Life-long: 60 %
Do not recommend: 0 %
The e-learning form of study is available and accessible 24 hours a day. Each student has his/her own routines and is used to studying at different times according to individual needs. The participants' responses regarding the time of day most suitable for e-learning are presented in Table 5. We can see that evening and night study was preferred most.

Table 5. Time of studying via e-learning
Morning: 7 %
Afternoon: 17 %
Evening: 50 %
Overnight: 20 %
Any time: 7 %
Fig. 8. Log on entries categorized by the time (Source: own research)
On the other hand, if we take a look at the log file outcomes, we see that the greatest number of accesses was in the afternoon; the analysis showed that most access entries fell between 4 pm and 5 pm. Except for three hours at night, the students' activity was distributed across the whole day (Fig. 8). However, the evening activity was still much higher in comparison with the activity during the day.
4 Discussion
The obtained data, both from the questionnaire and mainly from the adjusted log files of the e-courses, were used to find certain rules of behaviour of the course participants by means of usage analysis. It has been shown that the course was considered too textual and that the students missed pictures, animations and other multimedia applications, which was also confirmed by the analysis of the visit rate of the categories of course components [9]. The course was mainly used for communication among the participants and as a tool for distributing and collecting the assignments. Students were also motivated to study actively by the allotment of 20 credits for successfully passing this course (the assignments were compulsory for obtaining these credits, which were necessary for their career advancement). Among the most frequent moves from the main page we can name displaying the list of assignments (Assignment view all), displaying the list of users (User view all) and displaying the list of tests (Quiz view all). Study materials were used much less than we expected. However, that does not mean that the students did not read them; it is possible that they printed them the first time they displayed them. On the other hand, students used the tests quite a lot, as these seem to be good preparation for the state exams (77 % of them stated that the self-tests were the most helpful activity). The quiz reports show that a lot of students repeated the quiz attempts several times until they reached 100%. The outcomes of the self-tests were displayed for some time to all the students so that they could compare their results, which increased the competition among them.
5 Conclusion
At the beginning we posed the question whether there is a difference between the responses of the questionnaire investigation and the real log entries in the e-course. Thanks to the data obtained we were able to formulate several association rules of the users' behaviour in the course, which were described in this paper, especially in the discussion. In some cases, differences were found between how the students responded in the questionnaire and what the analysis of the log file proved. A possible reason for this may be that they knew the questionnaire was not anonymous, whereas in the case of the study itself they felt more independent. We can assume that the questionnaire method is not sufficient for gaining information about the process of study in the e-course, because a user can embellish the responses, or is sometimes not even able to say how often he/she accessed the other parts of the course. One reason for this can be that some processes become trivial and the user does not see them as important. Through analyses like the one described above we will be able to make the e-courses more effective and attractive for the students and thus improve the effectiveness of e-learning study. It would be interesting to compare the questionnaire responses and the usage analysis separately for particular participants – that may be the objective of further research. The dissemination of the outcomes was carried out at MF&Partners Consulting in Lyon [18], an expert in supranational management that also took part in the valorization of the project. All the outcomes of the project were presented there, i.e. the e-courses, books, and proceedings, and the coordinators of the project as well as the partners expressed high satisfaction with all the presented materials. Although the project has finished, the cooperation among the Slovak partners continues in another project, this time at the level of faculties. The project Virtual Faculty – Distance Learning at the Faculty of Social Sciences and Nursing of Constantine the Philosopher University in Nitra started in 2010. Its aim is to create a virtual faculty based on e-learning courses available to the students 24 hours a day, 7 days a week. Its effectiveness requires the involvement of pedagogues trained for e-learning education who are able not only to manage the education but also to prepare appropriate materials and activities for this type of education. The processes and outcomes defined and verified at the level of one faculty can, in the case of successful realization, also be applied to the other faculties, which at the moment function rather individually.

Acknowledgments. This publication is published thanks to the financial support of the project KEGA 368-043UKF-4/2010 named: Implementation of elements of interactivity in the contentual transformation of professional informatics subjects, and the project ESF 26110230026: A-CENTRE of the Faculty of Natural Sciences, Constantine the Philosopher University in Nitra, Centre of Innovative Education.
References 1. Hvorecký, J., Drlík, M.: Enhancing Quality of E-Learning. In: International Conference ICL 2008, pp. 54–65. Kassel University Press (2008) 2. Reime, M.H., Harris, A., Aksnes, J., Mikkelsen, J.: The most successful method in teaching nursing students infection control – E-learning or lecture? Nurse Education Today 28(7), 798–806 (2008) 3. Pfefferle, P.I., Van den Stock, E., Nauerth, A.: The LEONARDO-DA-VINCI pilot project “e-learning -assistant” – Situation-based learning in nursing education. Nurse Education Today 30(5), 411–419 (2010) 4. Chang, W., Sheen, S.H., Chang, P., Lee, P.: Developing an e-learning education programme for staff nurses: Processes and outcomes. Nurse Education Today 28(7), 822–828 (2008) 5. Chu, R.J., Tsai, C.C.: Self-directed learning readiness, Internet self-efficacy, and preferences for constructivist Internet-based learning environments among higher aged adults. Journal of Computer Assisted Learning 25, 489–501 (2009) 6. Shen, Z., Miao, C., Gay, R., Low, C.P.: Personalized e-Learning – a Goal Oriented Approach. In: 7th WSEAS International Conference on Distance Learning and Web Engineering (DIWEB 2007), Beijing, China, pp. 304–309 (2007) 7. Anderson, E.T., Mercer, Z.B.: Impact of community health content on nurse practitioner practice: a comparison of classroom and web-based teaching. Nursing Education Perspectives 25(4), 171–175 (2004) 8. Liang, J.C., Wu, S.H.: Nurses’ motivations for web-based learning and the role of Internet self-efficacy. Innovations in Education and Teaching International 47(1), 25–37 (2010) 9. Balogh, Z., Munk, M.: Turčáni. M., Cápay, M.: Usage Analysis in e-Learning System for Healthcare. In: The 4th International Conference on Application of Information and Communication Technologies AICT 2010, pp. 131–136. IEEE Press, Tashkent (2010) 10. Drlík, M., Švec, P., Skalka, J., Kapusta, J.: E-learning portal integration to the information system of Constantine the Philosopher University in Nitra, Slovakia. In: EUNIS 2008 Vision IT: visions for IT in higher education, University of Aarhus, Aarhus (2008) 11. Gabriel, M., Longman, S.: Staff Perceptions of E-Learning in a Community Health Care Organization. Journal of Distance Learning Administration VII(III) (2004), http://www.westga.edu/~distance/ojdla/fall73/gabriel73.html
12. Balogh, Z., Turčáni, M., Burianová, M.: Modelling web-based educational activities within the combined forms of education with the support of applied informatics with an elearning support. In: Proceeding of the 7th International Conference Efficiency and Responsibility in Education 2010, pp. 14–24. Czech University of Life Sciences Press, Praha (2010) 13. Cápay, M., Tomanová, J.: Enhancing the Quality of Administration, Teaching and Testing of Computer Science Using Learning Management System. WSEAS Transactions on Information Science & Applications, 1126–1136 (2010) 14. Cápay, M., Balogh, Z., Burianová, M.: Using of e-learning in teaching of non-medical health personnel. In: Innovation Process in E-learning, pp. 1–5 (2009) 15. Munk, M., Kapusta, J., Švec, P.: Data preprocessing dependency for web usage mining based on sequence rule analysis. In: IADIS European Conference on Data Mining 2009 , pp. 179–181, Algarve (2009) 16. Munk, M., Kapusta, J., Švec, P., Turčáni, M.: Data Advance Preparation Factors Affecting Results of Sequence Rule Analysis in Web Log Mining. E+M Economics and Management 13(4), 143–160 (2010a) 17. Munk, M., Kapusta, J., Švec, P.: Data Preprocessing Evaluation for Web Log Mining: Reconstruction of Activities of a Web Visitor. Procedia Computer Science 1(1), 2267–2274 (2010b) 18. E-learning in community care European meeting, http://www.mfpartnersconsulting.com/cariboost_files/ microsoft_20word_20-_20press_20release_20elearning_20october_202009.pdf
Dynamic Calculation of Concept Difficulty Based on Choquet Fuzzy Integral and the Learner Model Ahmad Kardan and Roya Hosseini Department of Computer Engineering and IT, Amirkabir University of Technology 424 Hafez Ave, Tehran, Iran, 15875-4413 {aakardan,hosseini.ec}@aut.ac.ir
Abstract. Adaptation and personalization of information in E-learning systems plays a significant role in supporting the learner during the learning process. Most personalized systems consider learner preferences, interest, and browsing patterns for providing adaptive presentation and adaptive navigation support. However, these systems usually neglect to consider the dependence among the learning concept difficulty and the learner model. Generally, a learning concept has varied difficulty for learners with different levels of knowledge. Hence, to provide a more personalized and efficient learning path with learning concepts difficulty that are highly matched to the learner’s knowledge, this paper presents a novel method for dynamic calculation of difficulty level for concepts of the certain knowledge domain based on Choquet Fuzzy Integral and the learner knowledge and behavioral model. Finally, a numerical analysis is provided to illustrate the proposed method. Keywords: E-learning, Difficulty Level, Choquet Fuzzy Integral, Learner Model, Personalization, Learning Concept.
1 Introduction
The rapid progress of the Internet, as well as the high adoption of Information Technology in the educational context, has positively influenced educational processes. Moreover, the significant role of Information and Communication Technologies (ICTs) in the improvement of learning has greatly changed the traditional approaches to education. In this regard, educational tools and methods have changed greatly to support learning, i.e. gaining knowledge and experience through instruction or study. Computer-Based Training (CBT) is among the staple tools in the learning process. Based on the literature, E-learning is a new form of CBT with the potential to increase the efficiency and quality of learning [1]. However, as numerous Web-Based tutoring systems have been developed, the great quantity of hypermedia in courseware has created information and cognitive overload and disorientation, such that learners are unable to learn very efficiently [2]. To overcome this problem, Adaptive Educational Hypermedia Systems (AEHS) were introduced in the late 1990s with the aim of increasing hypermedia efficiency by personalization. AEHS adapt the presentation of information to the learner model by
tracking the user performance during his/her navigation and then updating the information in the learner model accordingly [3]. Personalized education not only supports learners in learning better by providing different strategies to create various learning experiences, but also considers teachers' tutorial needs in designing instructional packages [4]. Web-Based education addresses a wide range of learners with differences in knowledge level, age, experience, and cultural background. Hence, personalization in this context is of high significance [5]. According to [2], adaptation is quintessential in Web-Based education for mainly two reasons. First, most Web-Based applications have a much wider variety of users, with different interests, than standalone applications. Second, the user is usually alone while working with a Web-Based tutoring system. So far, different mechanisms have been introduced for the personalization and adaptation of the learning process, such as Adaptive Presentation, Adaptive Navigation Support, Curriculum Sequencing, Intelligent Analysis of Student Solutions, and Problem Solving Support Technologies [6]. ActiveMath [7], Personal Reader [8], iClass [9], PIMS [10], PELS [11], and ELM-ART [12] are examples of systems which use these techniques to provide an adaptive learning environment.
Motivation and problem definition: Currently, most adaptive tutoring systems consider learner preferences, interests, and browsing patterns when investigating learner behavior for personalized services. One important fact that is mostly ignored when providing personalized services is the dependence between the difficulty level of learning concepts and the learner's knowledge. To illustrate, a learning concept may have different difficulty for learners with different knowledge levels. In the current studies, the difficulty level of concepts is determined statically by instructors, independently of the learner model. Obviously, dynamic problem difficulty is better matched to the learner's knowledge level and hence can ease the comprehension of learning concepts. The related work in [13] aims to calculate problem difficulty with regard to the learner model. The results of [13] prove that dynamically computed problem difficulty performs well for a wide range of learners, whereas static problem complexity performs well for students of intermediate ability, but rather badly for beginners and advanced learners. The main drawback of the method proposed in [13] is the use of constraint-based modeling for the calculation of problem difficulty. According to [14], this model requires a huge set of constraints to be defined, and this adds to its complexity. Moreover, it is not always possible to define constraints for some knowledge domains.
Findings: The most significant finding of this research is in presenting a method for the calculation of concept difficulty with the following strengths:
• A constraint-independent learner model, which consists of the learner knowledge and behavioral models and is described in section 3.1.
• A constraint-independent knowledge domain using OWL, which is described in section 3.2.
• Dynamic calculation of learning concept difficulty using the Choquet fuzzy integral, which is described in section 3.3.
The novelty of this research is in using a fuzzy integral, specifically the Choquet Fuzzy Integral, for the calculation of learning concept difficulty. The learner model is used for retrieving some of the necessary parameters. OWL (Ontology Web Language) is also used for the classification of the concepts of the chosen knowledge domain and the retrieval of other complementary parameters. The organization of this paper is as follows: the following section presents the architecture of the system for the proposed method. The method itself is addressed in section 3. Section 4 provides a numerical analysis of the proposed method. Finally, Section 5 concludes this work.
2 System Architecture This section presents the architecture of the system that dynamically calculates the learning concept difficulty based on Choquet Fuzzy Integral and the learner model. As it is shown in Fig. 1, this system consists of six major units as well as five major databases. The six major units are: Ontology Modeling Unit, Learner Interface Unit, Dynamic Concept Difficulty Calculation (DCDC) Unit, Feedback Unit, Content Recommendation Unit, and Learning Content Repository. The five major databases are: Teacher Account Database, Knowledge Domain Ontology Database, Learner Account Database, Concept Difficulty Database, and Learner Model Database. The Ontology Modeling Unit builds the subject ontology by using the concepts and their relations that are provided by the teacher. The processes in this unit are performed in an Off-line manner. The Learner Interface Unit aims to provide a flexible learning interface for learners to interact with the Feedback Unit and the Content Recommendation Unit. DCDC Unit dynamically calculates concept difficulty based on the learner model and the knowledge domain ontology. The Feedback Unit aims to collect learner explicit feedback information from the Learner Interface Unit and stores it in the Learner Model Database. Content Recommendation Unit is in charge of recommending suitable contents to learner based on the learner model and difficulty of learning concepts. The numbers on the arrows in Fig. 1 show the steps related to data flow within the system. These steps are as follows: Step 1. The Teacher logs in the system by providing his/her account information. Step 2. The teacher login information is compared to the information in the Teacher Account Database. If the information is valid, then teacher models the ontology for the knowledge domain. Ontology Modeling Process: The teacher models the knowledge domain by providing a set of concepts with their relations. Step 3. The output of the Ontology Modeling Process, namely the Knowledge Domain Ontology, is stored in the Knowledge Domain Ontology Database. Step 4. The relations among the Knowledge Domain Ontology and the Concepts Hierarchy are determined by the teacher. Each node in the Knowledge Domain Ontology is related to at least one concept in the Concepts Hierarchy. Step 5. The learner logs in the system by providing his/her user account information. Step 6. The user login information is compared to the information in the Learner Account Database.
Fig. 1. The System Architecture (block diagram of the Ontology Modeling Unit, Learner Interface Unit, DCDC Unit, Feedback Unit, Content Recommendation Unit and Learning Content Repository, the five databases, the Concepts Hierarchy, and the numbered data flows corresponding to Steps 1–19)
Step 7. If the information is valid, then the user selects the concept to be learned from the Concepts Hierarchy and step 8 is followed. Otherwise, the learner remains on Step 5. Step 8. To calculate the concept difficulty for the concepts that are related to the user selected concept, knowledge domain ontology is used as one of the inputs of the DCDC Unit. Step 9. The learner information is the other input that is used by the DCDC unit. DCDC Process: This process uses the inputs provided by Step 8 and Step 9 and then calculates the concept difficulty by using Choquet Fuzzy Integral. Step 10. Concept difficulty is temporarily stored in the Concept Difficulty Database. Step 11. To recommend suitable contents, the information of the Concept Difficulty Database serves as one of the inputs of the Content Recommendation Unit.
Step 12. The information in the learner model is the other input that is used by the Content Recommendation Unit. Step 13. The information required for content recommendation is also related to the contents in the Learning Content Repository. This information serves as the third input of the Content Recommendation Unit. Content Recommendation Process: Using the information provided by Step 11, Step 12, and Step 13, this unit selects suitable learning contents for learner. Step 14. The list of recommended contents is transferred to the Learner Interface Unit. Step 15. The Learner Interface Unit presents the recommended contents to learner. Step 16. The Learner Interface Unit tracks the learner navigation during the learning process. Also, in the case of the test, this unit receives the learner’s response. Step 17. The Leaner Interface Unit sends the received Feedback to the Feedback Unit for updating the information in the learner model and recommending suitable contents. Step 18. In this step, the Feedback Unit updates the information in the learner model based on the received feedback. Step 19. To recommend the learning content based on the learner feedback, concept difficulty level is calculated again. Steps 7 to Step 19 are repeated until the learner learns the selected concept from the Concepts Hierarchy.
3 Methodology This section presents the proposed method for dynamic calculation of learning concept difficulty by using Choquet Fuzzy Integral and the learner model. In this regard, it is quintessential to first determine the influential parameters for calculation of the concept difficulty. Second, the knowledge domain should be modeled. Therefore, the following subsections first describe each of these steps and finally describe the proposed method. 3.1 Parameters Dynamic calculation of difficulty level for a concept relies on the information related to the learner and the knowledge domain. Therefore, both the learner and the domain model are two influential elements in determining the learning concept difficulty. The learner model consists of two parts, namely the learner knowledge model and the learner behavioral model. The former models the learner’s knowledge in concepts of the knowledge domain and the latter models the learner behavior and more specifically the number of time the learner studies each concept of the knowledge domain. The details of these parameters are as follows: Domain Model: The domain model consists of the concepts as well as the relations between them. In this paper, the concept that the difficulty level is calculated for it is called the Main Concept. Obviously, the increase in the number of the relations between the Main Concept and other concepts has direct impact on the Main Concept difficulty. Semantic relationship between the Main concept and other concepts of the
certain knowledge domain can be described in two forms: 1- Prerequisite Concepts that are necessary to perceive the Main Concept, and 2- Concepts that are related to the main concept and are part of the same Sub-domain. These concepts are called Related Concepts. Learner Behavioral Model: This model stores the information about the learner’s activity, i.e. the number of times the certain concept is studied. Hence, the learner activity in the Main Concept, its Prerequisite Concepts, and its Related Concepts is important in determining the difficulty level of the Main Concept. The learner study activity has direct impact on the concept difficulty. As the difficulty of the Main concept increases, it is more probable for learner to study it again. Leaner Knowledge Model: This model reflects the learner’s knowledge in the concepts of the knowledge domain. This model is used to determine the learner’s knowledge in the Main concept, its Prerequisite Concepts, and its Related Concepts. The lower the learner’s knowledge in these concepts, the more the difficulty level of the Main Concept will be. All the identified parameters in the Domain Model and the Learner Behavioral Model have direct relationship with the concept difficulty. However, the learner Knowledge level has inverse relationship with the Main Concept difficulty. Hence, to ease the calculation of the Main Concept difficulty, the learner’s wrong answer to the tests related to the Main concept, its Prerequisite Concepts, and its Related Concepts is considered as the input parameter in the proposed method. Hence, all the parameters will have direct relationship with the Main Concept. So, the influential parameters for determining the concept difficulty can be summarized as follows: • Number of the relations between the Main Concept, its Prerequisite Concepts, and its Related Concepts. • Learner’s wrong answers to the tests: related to Main Concept, its Prerequisite Concepts, and its Related Concepts. • Learner’s study activity: in the Main Concept, its Prerequisite Concepts, and its Related Concepts. In this research, an Overlay Model is used for modeling the learner’s knowledge. To this end, the knowledge domain needs to be modeled. Ontology and Concept Map are two common tools that are widely deployed for this purpose. Since the Ontology models the hierarchical structure among the concepts, it is used to model the knowledge domain in this research. The following section presents the steps required to model the knowledge domain by using OWL which is a standard ontology language. 3.2 Knowledge Domain Modeling Using OWL The proposed method uses ontology to model the knowledge domain. Ontology describes the concepts in the domain and also the relationships that hold between those concepts. Different ontology languages provide different facilities. The most recent development in standard ontology languages is OWL (Ontology Web Language) from the World Wide Web Consortium (W3C) [15]. In this paper, OWL is used for modeling the knowledge domain. The modeling process consists of the following steps:
Step 1. An expert determines the subject, i.e. the domain of knowledge.
Step 2. The concepts in the knowledge domain are defined by the expert.
Step 3. The relationships between the concepts of the knowledge domain are determined.
Step 4. OWL is used to model the knowledge domain. The output of this step is the knowledge domain ontology.
After the preparation of the knowledge domain ontology, the difficulty of the knowledge domain concepts can be calculated by using the parameters introduced in section 3.1. The following section presents the proposed method.
3.3 Concept Difficulty Calculation Using Choquet Fuzzy Integral
After determining the values of the parameters that influence the difficulty of a concept, it is necessary to aggregate these values into a single value that represents the concept difficulty. The Weighted Arithmetic Mean and Regression methods are among the most common aggregation operators. However, none of these operators is able to model in an understandable way an interaction between the input parameters [16]. Hence, these operators are not suitable for the calculation of concept difficulty. The Choquet Fuzzy Integral is an operator that is used for the aggregation of interdependent parameters based on a fuzzy measure. According to [17], the suitability of this integral has been proved for real-time applications. Therefore, the Choquet Fuzzy Integral can improve the response time of an E-learning system [18]. In this paper, the Choquet Fuzzy Integral is used to aggregate the parameters that influence concept difficulty. Definition: The Choquet Fuzzy Integral is an integral that uses a fuzzy measure to aggregate a set of input parameters. According to [19] it is defined as Eq. 1:
E_g(h) = \int_X h(\cdot)\,\mathrm{d}g(\cdot) = \sum_{i=1}^{n} \left[ h(x_i) - h(x_{i-1}) \right] g(A_i)    (1)

where h(x_1) ≤ h(x_2) ≤ … ≤ h(x_n) and h(x_0) = 0. The definition of each variable used in Eq. 1 is provided herein.

n: the number of input parameters. In this research, n = 3.
X = {x_1, x_2, …, x_n}: the input parameters of the Choquet Fuzzy Integral.
x_1: represents the number of relations between the Main Concept and its Prerequisite and Related Concepts.
x_2: represents the learner's wrong answers to the tests related to the Main Concept, its Prerequisite Concepts, and its Related Concepts.
x_3: represents the learner's activities during his/her course of study regarding the Main Concept, any Prerequisite of the Main Concept, and the Related Concepts. Hence, the parameter x_3 is modeled by counting the number of times the Main Concept, its Prerequisite Concepts, and its Related Concepts are studied by the learner.
h: the function that determines the value of the input variables; for example, h(x_i) is the value of x_i.
g: the λ-fuzzy measure, defined as g: P(X) → [0, 1] such that

g(\varnothing) = 0, \quad g(X) = 1    (2)
\text{if } A, B \in P(X) \text{ and } A \subset B, \text{ then } g(A) \le g(B)    (3)
\text{if } A, B \subset X \text{ and } A \cap B = \varnothing, \text{ then } g(A \cup B) = g(A) + g(B) + \lambda\, g(A)\, g(B)    (4)

for some fixed λ > −1. The value of λ is found from the equation g(X) = 1, which is equivalent to solving Eq. 5:

g_\lambda(X) = \frac{1}{\lambda} \left( \prod_{i=1}^{n} (1 + \lambda g_i) - 1 \right), \quad \lambda \ne 0    (5)

A_i: the set of input parameters of the form A_i = {x_i, x_{i+1}, …, x_n}.
g(A_i): this value is recursively computed by Eq. 6 and Eq. 7:

g(A_n) = g(\{x_n\}) = g_n    (6)
g(A_i) = g_i + g(A_{i+1}) + \lambda\, g_i\, g(A_{i+1}), \quad 1 \le i < n    (7)
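The discrete form of Eq. 1 together with the recursions of Eqs. 5–7 is straightforward to implement. The following Python fragment is only an illustrative sketch of these formulas under the stated assumptions (n input parameters and a λ-fuzzy measure built from the densities g_i); the function names and the root-selection strategy in solve_lambda are my own choices and are not prescribed by the paper.

```python
# Illustrative sketch of Eqs. 1 and 5-7 (not the authors' implementation).
import numpy as np


def solve_lambda(g):
    """Solve prod_i(1 + lam*g_i) = 1 + lam, i.e. g_lambda(X) = 1 in Eq. 5."""
    poly = np.poly1d([1.0])
    for gi in g:
        poly = poly * np.poly1d([gi, 1.0])      # multiply by (g_i*lam + 1)
    poly = poly - np.poly1d([1.0, 1.0])         # subtract (lam + 1)
    roots = [r.real for r in poly.roots
             if abs(r.imag) < 1e-9 and r.real > -1.0 and abs(r.real) > 1e-9]
    # Degenerate data can leave no admissible root strictly inside (-1, inf);
    # in that case the caller has to supply lambda itself (the paper's worked
    # example rounds it to -0.99).
    return min(roots, key=abs) if roots else None


def choquet(h, g, lam):
    """Discrete Choquet integral E_g(h) of Eq. 1 for values h and densities g."""
    order = np.argsort(h)                       # ascending h(x_1) <= ... <= h(x_n)
    h_s = np.asarray(h, dtype=float)[order]
    g_s = np.asarray(g, dtype=float)[order]
    n = len(h_s)
    g_A = np.zeros(n)                           # g(A_i) with A_i = {x_i, ..., x_n}
    g_A[-1] = g_s[-1]                           # Eq. 6
    for i in range(n - 2, -1, -1):              # Eq. 7
        g_A[i] = g_s[i] + g_A[i + 1] + lam * g_s[i] * g_A[i + 1]
    prev = np.concatenate(([0.0], h_s[:-1]))    # h(x_0) = 0
    return float(np.sum((h_s - prev) * g_A))
```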
This research uses the Choquet Fuzzy Integral to calculate the difficulty level of the concepts in the knowledge domain through the following steps:
Step 1. Having logged in to the system, the learner selects the concept which he/she wants to learn from the Concepts Hierarchy.
Step 2. The system searches for and selects the set of concepts that are related to the learner's selected concept.
Step 3. Regarding the learner knowledge model, the concepts that the learner has already learned are removed from the concepts obtained in Step 2. The result is the set of Main Concepts whose difficulty needs to be calculated by the Choquet Fuzzy Integral. For each of the Main Concepts, Step 4 through Step 13 should be followed.
Step 4. The first input parameter x1 is determined by counting the number of Prerequisite and Related Concepts of the Main Concept in the Knowledge Domain Ontology.
Step 5. The second input parameter x2 is determined by retrieving the learner's wrong answers to the tests related to the Main Concept, its Prerequisite Concepts, and its Related Concepts from the learner knowledge model.
Step 6. The third input parameter x3 is determined by retrieving the learner's study activity in the Main Concept, its Prerequisite Concepts, and its Related Concepts from the learner behavioral model.
Step 7. The values obtained in Step 4, Step 5, and Step 6 are aggregated by using the Choquet Fuzzy Integral as given in Eq. 1: h(x1) is the number of Prerequisite and Related Concepts of the Main Concept; h(x2) is the learner's wrong answers to the tests related to the Main Concept, its Prerequisite Concepts, and its Related Concepts; and h(x3) is the learner's study activity in the Main Concept, its Prerequisite Concepts, and its Related Concepts.
Step 8. According to the assumption of Eq. 1, the values of h(xi) are sorted ascendingly such that h(x1) ≤ h(x2) ≤ … ≤ h(xn).
Step 9. The value of the fuzzy measure is calculated for each of the three input parameters. This research uses the function introduced in [20], which calculates the fuzzy measure values as Eq. 8:

g_i = \frac{1}{1 + d(h_i, h_0)}, \quad i = 1, 2, 3    (8)

Here g_i represents the fuzzy measure value of x_i, and d(h_i, h_0) is the Euclidean distance between h(x_i) and h_0, where h_0 is an arbitrary reference value from which the distance of all h(x_i) is measured. In this research it is assumed that h_0 = 0. Since the values of h(x_i) are always positive, the range of g_i is between zero and one.
Step 10. λ is calculated by using Eq. 5. For the three input parameters, Eq. 5 is equivalent to solving Eq. 9:

(g_1 g_2 g_3)\,\lambda^2 + (g_1 g_2 + g_1 g_3 + g_2 g_3)\,\lambda + (g_1 + g_2 + g_3 - 1) = 0    (9)

Step 11. g(A_1), g(A_2), and g(A_3) are calculated according to the value of λ using Eq. 6 and Eq. 7.
Step 12. To calculate the difficulty of the Main Concept, the values of h(x_i) and g are used in Eq. 1. For three input parameters, Eq. 1 is equivalent to solving Eq. 10:

E_g(h) = h(x_1)\,g(A_1) + (h(x_2) - h(x_1))\,g(A_2) + (h(x_3) - h(x_2))\,g(A_3)    (10)

Step 13. The value of E_g(h) is the difficulty of the Main Concept for the learner.
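As a reading aid, Steps 4–13 for a single Main Concept can be condensed into one small function. The sketch below follows Eqs. 8–10 under the assumption h_0 = 0; the quadratic root selection and the fallback value of λ are assumptions made for illustration (the worked example in the paper rounds λ to −0.99), so the resulting numbers may differ slightly from the tables in Section 4.

```python
# Hedged sketch of Steps 4-13 (Eqs. 8-10) for one Main Concept; not the
# authors' code. The handling of lambda is an assumption for illustration.
import numpy as np


def concept_difficulty(x1, x2, x3, lam=None):
    """x1: relation count, x2: wrong answers, x3: study activity (Steps 4-6)."""
    h = np.sort(np.array([x1, x2, x3], dtype=float))   # Step 8: ascending order
    g = 1.0 / (1.0 + np.abs(h))                        # Step 9 / Eq. 8 with h0 = 0
    if lam is None:                                    # Step 10 / Eq. 9
        a = g[0] * g[1] * g[2]
        b = g[0] * g[1] + g[0] * g[2] + g[1] * g[2]
        c = g[0] + g[1] + g[2] - 1.0
        real = [r.real for r in np.roots([a, b, c])
                if abs(r.imag) < 1e-9 and r.real > -1.0]
        lam = min(real, key=abs) if real else -0.99    # fallback: paper's rounded value
    g_A3 = g[2]                                        # Step 11 / Eqs. 6-7
    g_A2 = g[1] + g_A3 + lam * g[1] * g_A3
    g_A1 = g[0] + g_A2 + lam * g[0] * g_A2
    return h[0] * g_A1 + (h[1] - h[0]) * g_A2 + (h[2] - h[1]) * g_A3   # Step 12 / Eq. 10


# Example: two prerequisite/related concepts, no wrong answers, no repeated
# study, lambda as rounded in the paper -> about 0.667 (0.66 when the g_i are
# first rounded to two decimals, as in the text).
print(concept_difficulty(2, 0, 0, lam=-0.99))
```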
4 Numerical Analysis
This section provides a numerical analysis of the proposed method. To this end, the Java Programming Language is selected as the subject of the knowledge domain. The Java Curriculum for AP™ Computer Science is used for determining the knowledge domain ontology [21]. The modeling of this ontology is done using the Protégé software [22]. The ontology consists of 187 concepts, which are categorized into 33 Sub-domains. The Concepts Hierarchy is designed according to the 33 Sub-domains. Fig. 2 shows part of the output of the knowledge domain modeling process using Protégé. This figure depicts the relationships among the concepts of the “Iterations” Sub-domain. The yellow and red arrows show the Prerequisite and Related Concepts, respectively. Each node in Fig. 2 represents the id of a concept in the “Iterations” Sub-domain. The “Iterations” Sub-domain consists of the 8 concepts that are shown in Table 1. The following steps illustrate the proposed method for a specific learner:
Step 1. The learner selects the “Iterations” Sub-domain from the Concepts Hierarchy.
Step 2. The Related Concepts of “Iterations” are selected from the knowledge domain ontology.

Table 1. Related Concepts of “Iterations” in the knowledge domain ontology
1  While Loop
2  Loop Boundaries
3  Conditional Loop Strategies
4  For Loop
5  Nested Loops
6  Do-While Loop
7  Choosing a Loop Control Structure
8  Loop Invariants
Fig. 2. Relationships between the concepts in the “Iterations” Sub-domain
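To make Step 4 of the modeling process more tangible, the fragment below shows one possible way to encode a few “Iterations” concepts and the two relation types in OWL using the Python rdflib package. The namespace and the property names (isPrerequisiteOf, isRelatedTo) are illustrative assumptions of mine; the authors used Protégé, and the concrete relations of Fig. 2 are not reproduced here.

```python
# Illustrative OWL encoding of a few "Iterations" concepts; the namespace and
# property names are assumptions, not taken from the paper's ontology.
from rdflib import Graph, Namespace, RDF
from rdflib.namespace import OWL

JAVA = Namespace("http://example.org/java-curriculum#")   # hypothetical IRI
g = Graph()
g.bind("java", JAVA)

for prop in (JAVA.isPrerequisiteOf, JAVA.isRelatedTo):     # the two relation types
    g.add((prop, RDF.type, OWL.ObjectProperty))

for name in ("WhileLoop", "LoopBoundaries", "ForLoop", "LoopInvariants"):
    g.add((JAVA[name], RDF.type, OWL.Class))               # concepts as classes

# Example relations, chosen only for illustration
g.add((JAVA.WhileLoop, JAVA.isPrerequisiteOf, JAVA.ForLoop))
g.add((JAVA.LoopBoundaries, JAVA.isRelatedTo, JAVA.LoopInvariants))

print(g.serialize(format="turtle"))
```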
Step 3. Since the learner has not learned any concept before, all the concepts obtained in Step 2 constitute the set of the Main Concepts. For each Main Concept, Step 4 through Step 13 are followed. For example, for the “Loop Invariants” concept these steps are as follows:
Step 4. According to Table 2, which shows the values of the input parameters for each of the Main Concepts, x1 = 2.
Step 5. According to Table 2, x2 = 0.
Step 6. According to Table 2, x3 = 0.
Step 7. According to Step 4 through Step 6, h(x1) = 2, h(x2) = 0, h(x3) = 0.
Step 8. According to the assumption of Eq. 1, the values of h(xi) are sorted ascendingly such that (h′(x1) = 0) ≤ (h′(x2) = 0) ≤ (h′(x3) = 2).
Step 9. The values of g_i for “Loop Invariants” are calculated by Eq. 8:

g_1 = 1/(1 + h′(1)) = 1.00, \quad g_2 = 1/(1 + h′(2)) = 1.00, \quad g_3 = 1/(1 + h′(3)) = 0.33

Step 10. The value of λ is calculated using Eq. 9 and is λ = −0.99.
Step 11. The value of g(A_i) for the “Loop Invariants” concept is calculated by using Eq. 6 and Eq. 7 recursively. The final result is:

g(A_1) = g_1 + g_2 + g_3 + \lambda g_1 g_2 + \lambda g_1 g_3 + \lambda g_2 g_3 + \lambda^2 g_1 g_2 g_3 = 1.01
g(A_2) = g_2 + g_3 + \lambda g_2 g_3 = 1.00
g(A_3) = g_3 = 0.33

Step 12. The values of h(x_i) and g are used in Eq. 10 to calculate the difficulty of the “Loop Invariants” concept for the learner:

E_g(h) = 0 × 1.01 + (0 − 0) × 1.00 + (2 − 0) × 0.33 = 0.66

Step 13. The difficulty level of “Loop Invariants” for the learner is 0.66.
Table 3 and Table 4 show the values of g and the difficulty level for all of the Main Concepts in the “Iterations” Sub-domain, respectively. The values are the same for all Main Concepts that have the same values of the input parameters shown in Table 2.
Table 2. The value of input parameters for the concepts in the “Iterations” Sub-domain

Concept                                 x1   x2   x3
1  While Loop                            1    1    0
2  Loop Boundaries                       1    0    0
3  Conditional Loop Strategies           2    0    0
4  For Loop                              1    1    1
5  Nested Loops                          1    0    0
6  Do-While Loop                         1    1    1
7  Choosing a Loop Control Structure     4    1    0
8  Loop Invariants                       2    0    0
Table 3. The values of fuzzy measure for the concepts in the “Iterations” Sub-domain

Concept                                 g1     g2     g3     g(A1)  g(A2)  g(A3)
1  While Loop                           1.00   0.50   0.50   1.01   0.75   0.50
2  Loop Boundaries                      1.00   1.00   0.50   1.01   1.00   0.50
3  Conditional Loop Strategies          1.00   1.00   0.33   1.01   1.00   0.33
4  For Loop                             0.50   0.50   0.50   0.88   0.75   0.50
5  Nested Loops                         1.00   1.00   0.50   1.01   1.00   0.50
6  Do-While Loop                        0.50   0.50   0.50   0.88   0.75   0.50
7  Choosing a Loop Control Structure    1.00   0.50   0.20   1.01   0.60   0.20
8  Loop Invariants                      1.00   1.00   0.33   1.01   1.00   0.33
Table 4. Difficulty level for the concepts in the “Iterations” Sub-domain

Concept                                 Difficulty Level
1  While Loop                           0.75
2  Loop Boundaries                      0.50
3  Conditional Loop Strategies          0.66
4  For Loop                             0.88
5  Nested Loops                         0.50
6  Do-While Loop                        0.88
7  Choosing a Loop Control Structure    1.20
8  Loop Invariants                      0.66
The highlighted concept in row 7 of Table 2 has the greatest values of x1, x2, and x3 among all the concepts. Hence, this concept is expected to be the most difficult concept in the “Iterations” Sub-domain for the learner. The highlighted row in Table 4 confirms this and shows the greatest difficulty level for this concept.
5 Conclusion
In this research a method is proposed to calculate the difficulty level of the concepts of a given knowledge domain. To this end, the information in the learner model and the Choquet Fuzzy Integral are used. This method provides the means to calculate the difficulty level of the concepts which reside in the contents presented to the learner. The content sequencing and presentation can then be adapted to the learner's knowledge, which in turn improves the efficiency of the learning process. The Choquet Fuzzy Integral considers the dependency among the input parameters and simultaneously combines the information in the learner model with the relationships between the concepts of the knowledge domain. Therefore, the proposed method is more precise than static methods, which assign equal concept difficulty to all learners regardless of their levels of knowledge. The results of the numerical analysis show that this method is quite promising in determining the difficulty level of the knowledge domain concepts. Compared to the work done in [13], the proposed method has less complexity, since it uses constraint-independent models for both the knowledge domain and the learner model, and hence can easily be applied to different knowledge domains. As shown in this research, the results obtained by this method can be used by an adaptive learning system for the recommendation of contents that are matched to the learner's knowledge. Our future work aims to implement the system proposed in this paper and evaluate the obtained results. Besides, a method will be proposed for providing adaptive content presentation based on the dynamic calculation of the difficulty level of learning concepts. Moreover, the impact of adding or removing the influential parameters can be evaluated and the precision of the method improved accordingly.
Acknowledgement. This paper is provided as a part of research performed in the Advanced E-learning Technologies Laboratory at Amirkabir University of Technology and financially supported by the Iran Telecommunication Research Center (ITRC). The authors would therefore like to express their thanks to the ITRC for its significant support.
References 1. Alexander, S.: E-learning Developments and Experiences. Education +Training 43, 240– 248 (2001) 2. Brusilovsky, P.: Adaptive Hypermedia. User Modeling and User-Adapted Interaction 11, 87–110 (2001) 3. Zhu, F., Yao, N.: Ontology-Based Learning Activity Sequencing in Personalized Education System. In: 2009 International Conference on Information Technology and Computer Science, itcs, vol. 1, pp. 285–288 (2009) 4. Baylari, A., Montazer, G.A.: Design a Personalized E-learning System Based on Item Response Theory and Artificial Neural Network Approach. Expert Systems with Applications 36, 8013–8021 (2009) 5. Chen, C.-M., Liu, C.-Y., Chang, M.-H.: Personalized Curriculum Sequencing Utilizing Modified Item Response Theory for Web-Based Instruction. Expert Systems with Applications 30, 378–396 (2006)
6. Brusilovsky, P.: Adaptive and Intelligent Technologies for Web-Based Education. Künstliche Intelligenz 4, 19–25 (1999) 7. Melis, E., Andrès, E., Büdenbender, J., Frischauf, A., Goguadze, G., Libbrecht, P., Pollet, M., Ullrich, C.: ActiveMath: A Generic and Adaptive Web-Based Learning Environment. International Journal of Artificial Intelligence in Education 12, 385–407 (2001) 8. Dolog, P., Henze, N., Nejdl, W., Sintek, M.: The Personal Reader: Personalizing and Enriching Learning Resources Using Semantic Web Technologies. In: De Bra, P.M.E., Nejdl, W. (eds.) AH 2004. LNCS, vol. 3137, pp. 85–94. Springer, Heidelberg (2004) 9. Brady, A., O’Keeffe, I., Conlan, O., Wade, V.: Just-in-time Generation of Pedagogically Sound, Context Sensitive Personalized Learning Experiences. International Journal on Elearning 5, 113–127 (2006) 10. Chen, C.-M., Hsu, S.-H.: Personalized Intelligent Mobile Learning System for Supporting Effective English Learning. Educational Technology & Society 11, 153–180 (2008) 11. Chen, C.-M.: Intelligent Web-Based Learning System with Personalized Learning Path Guidance. Computers & Education 51, 787–814 (2008) 12. Weber, G., Brusilovsky, P.: ELM-ART: An Adaptive Versatile System for Web-Based Instruction. International Journal of Artificial Intelligence in Education 12, 351–384 (2001) 13. Mitrović, A., Martin, B.: Evaluating Adaptive Problem Selection. In: De Bra, P.M.E., Nejdl, W. (eds.) AH 2004. LNCS, vol. 3137, pp. 185–194. Springer, Heidelberg (2004) 14. Wei, T.: An Intelligent Tutoring System for Thai Writing Using Constraint Based Modeling. Master of Science Thesis, School of Computing, National University of Singapore (2005) 15. Harmelen, F., McGuiness, D.: OWL Web Ontology Language Overview, http://www.w3.org/TR/2004/REC-owl-features-20040210 16. Detyniecki, M.: Fundamentals on Aggregation Operators. Manuscript, Computer Science Division. University of California, Berkeley (2001) 17. Shieh, J.-I., Wu, H.-H., Liu, H.-C.: Applying a Complexity-Based Choquet Integral to Evaluate Students’ Performance. Expert Systems with Applications 36, 5100–5106 (2009) 18. Narukawa, Y., Torra, V.: Fuzzy Measures and Integrals in Evaluation of Strategies. Information Sciences 177, 4686–4695 (2007) 19. Chiang, J.-H.: Aggregating Membership Values by a Choquet-Fuzzy-Integral Based Operator. Fuzzy Sets and Systems 114, 367–375 (2000) 20. Medasani, S., Kim, J., Krishnapuram, R.: An Overview of Membership Function Generation Techniques for Pattern Recognition. International Journal of Approximate Reasoning 19, 391–417 (1999) 21. Advanced Placement Program. Introduction of Java in (2003-2004) The College Board, http://www.collegeboard.org/ap/computer-science 22. Protégé. Developed by Stanford Medical Informatics at the Stanford University School of Medicine, http://protege.stanford.edu
Simple Block-Diagonalization Based Suboptimal Method for MIMO Multiuser Downlink Transmission

T. Taniguchi, Y. Karasawa, and N. Nakajima

Department of Communication Engineering and Informatics / Department of Informatics, Advanced Wireless Communication Research Center (AWCC), The University of Electro-Communications, Tokyo, Japan
{taniguch,karasawa}@ee.uec.ac.jp, [email protected]
Abstract. For MIMO (multiple input multiple output) multiuser downlink processing, the block diagonalization (BD) technique is widely used because of its simplicity and high efficiency in eliminating the interuser interferences. In this study, we present a very simple weight design approach based on BD, in which the order of the design steps is swapped: the receiver weights are calculated first, and then the transmission weights are derived using a zero forcing (ZF) procedure. Computer simulations show that the proposed approach achieves better performance than the conventional BD under certain conditions. In addition, the condition on the degrees of freedom is relaxed, hence it can be utilized for transmitters with a small number of antennas to which BD cannot be applied.
Keywords: MIMO, multiuser, downlink, eigenanalysis, array signal processing.
1 Introduction
After a decade of active research on MIMO (multiple input multiple output) communication systems [1], [2], this technique utilizing antenna arrays at both the transmitter and receiver sides has entered the phase of practical applications in a wide range of areas [3]. Among application-oriented studies of MIMO, the design of multiuser systems is still under lively discussion and new schemes are being developed, because it can be utilized, for example, in LAN (local area network) [4], cognitive radio [5], and recently base station cooperation in CoMP (Coordinated Multi-Point) cellular communications [6]. The multiuser system is classified into uplink and downlink designs: roughly speaking, the uplink scheme can be considered in the context of conventional array signal processing [7] if spatial processing is assumed, but the downlink is somewhat outside the line of well-known receiver beamforming, and a better choice is sought by many researchers.
A variety of design methods have been proposed for the MIMO multiuser downlink system [8]: one famous way is a nonlinear technique known as DPC (dirty paper coding) [9], but it has the problem of complicated power control, hence linear processing approaches using transmit and receive weights are mainly used. To derive (near-)optimal weights, some iterative search algorithms have been presented [10], [11], and some of them achieve the resource allocation as well, but they are not easily acceptable to most communication engineers familiar with the conventional MIMO system, because of their implementation cost and mathematical complexity. Many applications put importance on ease of use (for example, in the LTE (long term evolution) system, weights are simply chosen from a codebook). From such a standpoint, the block diagonalization (BD) approach [8] is widely used, since it is an extension of the concept of the conventional SVD (singular value decomposition) and combines simplicity and good performance. Some modified versions of BD have also been proposed: a design approach considering imperfect channel knowledge is given in [12], and [13] deals with the simultaneous use of antenna selection methods in the receiver (in [14], multi-stream selection is described instead). Reference [15] describes an imperfect BD whose procedure is complicated compared to the original BD, but brings a nice performance improvement. In this study, we present simple MIMO downlink algorithms based on the concept of BD and the ZF (zero forcing) scheme, which achieve better performance than BD under certain conditions. The conventional BD first carries out the ZF procedure toward the nontarget users, and then designs the weight to derive as large a gain as possible toward the target user; but the operation of steering zeros to all the antennas of the unintended users might waste the degrees of freedom, and the gain of the target link might become weak. In our method, these two steps are swapped, considering first the derivation of a strong link between the transmitter and the receivers. What should be remarked is that not only is a performance improvement anticipated when this idea works well, but also the condition on the degrees of freedom is relaxed, since only one degree is required for one stream of one user (each of which could be regarded as a user terminal equipped with a single antenna); hence this method is suitable also for transmission using a small-size array. The characteristics of the new methods are evaluated through computer simulations, and their advantages are verified together with their natures. The organization of the rest of this paper is as follows: in section 2, the model of the multiuser MIMO system considered in this study is briefly described; then in section 3, design methods for the system are proposed based on the concept of BD. After computer simulations to verify the effectiveness and characteristics of the proposed method in section 4, conclusions and future works are given in section 5.
2 Multiuser MIMO System
In this section, the multiuser MIMO system model considered in this study, a popular scenario of MIMO downlink transmission, is briefly described.
Fig. 1. Multiuser MIMO downlink system (transmit processing over Nt antennas, channels H0, …, HM−1, and M user terminals, where user m has Nr,m receive antennas and produces the stream estimates ŝm,0(t), …, ŝm,Lm−1(t))
Let us consider a multiuser MIMO system as given in Fig. 1, consisting of a transmitter and M receivers, each of which corresponds to a user. The transmitter and the receiver of the m-th user (m = 0, …, M − 1) are equipped with N_t and N_{r,m} antenna elements, respectively. The channel between the transmitter and the m-th receiver is denoted by a matrix H_m ∈ C^{N_{r,m} × N_t} whose (n_{r,m}, n_t)-th entry is the channel response between the n_t-th element of the transmitter and the n_{r,m}-th element of the receiver. The total channel H = [H_0^T, …, H_{M−1}^T]^T ∈ C^{N_r × N_t} (N_r is the total number of receiver antennas, namely N_r = Σ_m N_{r,m}) is defined by stacking H_m for all users. Assume that L_m data streams are transmitted toward the m-th user. For the transmission of the ℓ-th stream data {s_{m,ℓ}(t)} (t is the time index) of the m-th user, we use the transmission weight w_{t,m,ℓ} ∈ C^{N_t}. The signals of all the streams of all the users are summed up and then radiated from the transmitter, and reach the m-th user through the channel H_m. The received signal at the m-th receiver is expressed by

y_m(t) = H_m \sum_{n} \sum_{\ell=0}^{L_n - 1} \sqrt{P_{n,\ell}}\, w_{t,n,\ell}\, s_{n,\ell}(t) + n_m(t)    (1)

where P_{m,ℓ} is the transmission power allocated to s_{m,ℓ}(t) (hence here we assume P_{s,m} = E[|s_{m,ℓ}(t)|²] = 1), and n_m(t) is the additive white Gaussian noise (AWGN) generated at the m-th receiver with power P_{N,m}, namely E[n_m(t) n_m^H(t)] = P_{N,m} I_{N_{r,m}} (I_M is the M-dimensional identity matrix). The output signal {ŝ_{m,ℓ}(t)} of the ℓ-th stream of the m-th user, which is the estimate of {s_{m,ℓ}(t)}, is produced from the received signal y_m(t) of the m-th receiver by multiplying it by the weight w_{r,m,ℓ}. Our aim here is to design the set of weight vectors {w_{t,m,ℓ}, w_{r,m,ℓ}}. Design procedures for the production of output signals with good quality are described in the next section.
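For concreteness, the downlink model of Eq. (1) can be simulated directly. The NumPy sketch below generates random complex Gaussian channels, unit-norm transmit weights and unit-power QPSK symbols; the specific sizes, the power allocation and the random weights are illustrative assumptions only, not the designs proposed in the next section.

```python
# Sketch of the downlink model of Eq. (1): one transmitter with Nt antennas,
# M users with Nr_m antennas and L_m streams. Sizes, powers and the random
# weights are illustrative choices, not the proposed design.
import numpy as np

rng = np.random.default_rng(0)
Nt, Nr, L = 4, [2, 2], [1, 1]
M, Ltot = len(Nr), sum(L)

def crandn(*shape):                          # unit-variance complex Gaussian
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = [crandn(Nr[m], Nt) for m in range(M)]                     # channels H_m
wt = [[crandn(Nt) for _ in range(L[m])] for m in range(M)]    # transmit weights
wt = [[w / np.linalg.norm(w) for w in u] for u in wt]         # unit norm
P = [[1.0 / Ltot] * L[m] for m in range(M)]                   # simple power split
PN = 0.01                                                     # receiver noise power

s = [[(rng.choice([1, -1]) + 1j * rng.choice([1, -1])) / np.sqrt(2)
      for _ in range(L[m])] for m in range(M)]                # E[|s|^2] = 1 (QPSK)

x = sum(np.sqrt(P[n][l]) * wt[n][l] * s[n][l]
        for n in range(M) for l in range(L[n]))               # transmitted vector
y = [H[m] @ x + np.sqrt(PN) * crandn(Nr[m]) for m in range(M)]  # Eq. (1)
```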
3 Design Method of Multiuser MIMO Downlink System
This section describes BD-based design methods for the weight vectors of the multiuser MIMO downlink system given in section 2. To investigate a further possibility of improvement, a semi-BD-based method adopting the MSINR (maximum signal to interference plus noise ratio) criterion in the transmitter is also described.
3.1 Weight Design Problem
Before describing the actual design procedure, general aspects of our problem are discussed in this subsection. The output signal of the m-th receiver consists of the desired signal mixed with the interferences originating from all the streams of all the users (except the signal of the target stream of the target user) and the noise signal. Therefore, an adequate evaluation criterion for the output signal of the ℓ-th stream of the m-th user is the SINR, denoted by Γ_{m,ℓ} and given by the following equation:

\Gamma_{m,\ell} = \frac{|w_{r,m,\ell}^H H_m w_{t,m,\ell}|^2 P_{m,\ell}}{w_{r,m,\ell}^H R_{i,t,m,\ell}\, w_{r,m,\ell} + \|w_{r,m,\ell}\|^2 P_{N,m}}, \qquad
R_{i,t,m,\ell} = \sum_{(n,k) \in I_{m,\ell}} H_m w_{t,n,k} w_{t,n,k}^H H_m^H P_{n,k}    (2)

where I_{m,ℓ} = {(n, k) ≠ (m, ℓ)} for the target user m and stream ℓ. A straightforward way of weight design is to simultaneously maximize the set {Γ_{m,ℓ}} with reference to a certain adequate criterion (e.g., the sum capacity, which is approximated by C = Σ_{m,ℓ} log(1 + Γ_{m,ℓ})), but for this aim the solution of a much more complicated optimization is required. Some efficient algorithms have been proposed, but they still need a heavy implementation load, and we cannot say that they are engineer-friendly because of their mathematical complexity. Hence, we consider a simple method based on the idea of BD in the next section.
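Since all of the designs below are compared through Γ_{m,ℓ}, it may help to see the evaluation written out. The sketch below computes Eq. (2) and the approximate sum capacity for given channels and weights; the data layout (nested lists indexed by user and stream) and the base-2 logarithm are assumptions for illustration.

```python
# Sketch of the evaluation criterion: per-stream SINR of Eq. (2) and the
# approximate sum capacity C = sum log(1 + Gamma). Assumed data layout:
# H[m] is Nr_m x Nt, wt[m][l] and wr[m][l] are weight vectors, P[m][l] powers.
import numpy as np


def stream_sinr(H, wt, wr, P, PN, m, l):
    w = wr[m][l]
    signal = P[m][l] * abs(np.vdot(w, H[m] @ wt[m][l])) ** 2
    interference = sum(P[n][k] * abs(np.vdot(w, H[m] @ wt[n][k])) ** 2
                       for n in range(len(H)) for k in range(len(wt[n]))
                       if (n, k) != (m, l))
    noise = PN[m] * np.linalg.norm(w) ** 2
    return signal / (interference + noise)


def sum_capacity(H, wt, wr, P, PN):
    return sum(np.log2(1.0 + stream_sinr(H, wt, wr, P, PN, m, l))
               for m in range(len(H)) for l in range(len(wt[m])))
```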
optimization is required. Some efficient algorithms have been proposed, but they still need heavy load for the implementation, and we cannot say that they are engineer-familiar because of their mathematical complexity. Hence, we consider a simple method based on the idea of BD in the next section. 3.2
ZF Based Design
This section describes the proposed method utilizing ZF in the transmitter side. In conventional BD, first, the transmitter attempts ZF to all the antennas of nontarget users. Namely, the transmission weight vector is chosen from the
Simple BD Based Suboptimal Method
535
kernel of H̃_m = [H_0^T, ..., H_{m-1}^T, H_{m+1}^T, ..., H_{M-1}^T]^T. But this approach consumes Σ_{n≠m} N_{r,n} degrees of freedom, which means not only that a large number of antennas is required even if the number of streams is small for each user, but also that the connection with the target receiver might be weakened. Hence, here we consider designing the receiver weight vectors first: though it is difficult to find the optimal vector before the transmission weight is determined, one reasonable choice is to use the SVD based design [16], which is optimal in the case of conventional single user MIMO. The weight vector w_{r,m,ℓ} is derived as the left singular vector of H_m corresponding to the ℓ-th largest singular value λ_{m,ℓ} (the notation λ_{m,ℓ} denotes the ℓ-th largest eigenvalue of the covariance matrix of H_m). After this operation, the total MIMO channel H is converted to H_r ∈ C^{L×N_t} (here, L is the total number of streams, that is, L = Σ_m L_m), which is given by H_r = [H_{r,0}^T, ..., H_{r,M-1}^T]^T, where H_{r,m} = W_{r,m}^H H_m and the ℓ-th column of W_{r,m} ∈ C^{N_{r,m}×L_m} is w_{r,m,ℓ}. Applying the ZF technique to H_r, the transmitter weight vectors are derived: define a matrix

H̃_{r,m,ℓ} = [H_{r,0}^T, ..., H_{r,m-1}^T, H̄_{r,m,ℓ}^T, H_{r,m+1}^T, ..., H_{r,M-1}^T]^T

where H̄_{r,m,ℓ} = W̄_{r,m,ℓ}^H H_m and W̄_{r,m,ℓ} = [w_{r,m,0}, ..., w_{r,m,ℓ-1}, w_{r,m,ℓ+1}, ..., w_{r,m,L_m-1}]. The vector w_{t,m,ℓ} is derived as V_{m,ℓ} v_{m,ℓ}, where v_{m,ℓ} is the right singular vector of H_m V_{m,ℓ} corresponding to its largest singular value, and the columns of the matrix V_{m,ℓ} ∈ C^{N_t×(N_t−L+1)} are the (N_t − L + 1) right singular vectors of H̃_{r,m,ℓ} corresponding to the null singular values.
In the first step, the cancellation of interferences from other users is not taken into account, but they can be suppressed in the second step by the ZF operation. This method is anticipated to have the following two advantages if the condition on N_t is satisfied: (1) A performance improvement through the establishment of a strong connection between the transmitter and (the target stream of) the target receiver can be expected if this idea works in a positive direction. The detailed behavior is verified through computer simulations in the next section. (2) The number of degrees of freedom consumed for user m is L_m, that is, it is less than or equal to that of the conventional BD, which always steers zeros toward Σ_{n≠m} N_{r,n} antennas. Hence, even if the number of transmit antennas is restricted to a small number, at least one stream can be transmitted if N_t ≥ M. In spite of these good points, the above method remains very simple in both the computational and theoretical aspects.
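A minimal NumPy sketch of the receiver-first design described above is given below (our own illustration under our own naming, assuming every user has the same effective channel dimensions, at least two streams in total, and N_t ≥ L; it is not the authors' implementation).

```python
import numpy as np

def proposed_zf_design(H, L):
    """Receiver-first BD-like design (illustrative sketch).

    H : list of M channel matrices H_m (Nr_m x Nt)
    L : list of stream counts L_m per user
    Returns receive weights w_r[m][l] and transmit weights w_t[m][l].
    """
    M, Nt = len(H), H[0].shape[1]
    # Step 1: receive weights = dominant left singular vectors of H_m
    w_r = []
    for m in range(M):
        U, _, _ = np.linalg.svd(H[m])
        w_r.append([U[:, l] for l in range(L[m])])
    # Effective row channels w_{r,m,l}^H H_m (each of length Nt)
    rows = [[w_r[m][l].conj() @ H[m] for l in range(L[m])] for m in range(M)]
    # Step 2: ZF transmit weights within the null space of the other rows
    w_t = [[None] * L[m] for m in range(M)]
    for m in range(M):
        for l in range(L[m]):
            others = np.array([rows[n][k] for n in range(M)
                               for k in range(L[n]) if (n, k) != (m, l)])
            _, s, Vh = np.linalg.svd(others)
            rank = int(np.sum(s > 1e-10))
            V = Vh[rank:].conj().T            # basis of the null space (Nt x (Nt-rank))
            # Match the target channel inside that null space (top right singular vector)
            _, _, Vh2 = np.linalg.svd(H[m] @ V)
            wt = V @ Vh2[0].conj()
            w_t[m][l] = wt / np.linalg.norm(wt)
    return w_r, w_t
```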
3.3 MSINR Based Design
Following the ZF based approach in 3.2, in this subsection we show another approach that utilizes the MSINR criterion on the transmitter side and is developed based on the concept of the above method. The idea of ZF, which eliminates the interferences to all the nontarget users, is easy to understand, and that is the reason this method is widely accepted. But here, we return to the starting point of decreasing the interference and noise, and consider using an MSINR design instead of ZF in the transmitter. We consider maximizing the following criterion:

Γ_{t,m,ℓ} = |w_{r,m,ℓ}^H H_m w_{t,m,ℓ}|^2 P_{S,m} / ( w_{t,m,ℓ}^H R_{i,r,m,ℓ} w_{t,m,ℓ} + ||w_{r,m,ℓ}||^2 P_{N,m} )   (3)

R_{i,r,m,ℓ} = Σ_{(n,k)∈I_{m,ℓ}} H_n^H w_{r,n,k} w_{r,n,k}^H H_n P_{n,k}

Though the transmit weights do not have the ability to control the receiver noise, if the noise term is not contained, this method reduces to ZF. Strictly, the transmit powers of all users should be calculated simultaneously, but here they are kept equal to one in the process of weight design, and the power control is left to the water filling [17] described in the next section. Manipulating equation (3) under the unit norm constraint, we obtain the optimality condition as an eigenproblem given by

R_S w_{t,m,ℓ} = Γ_{t,m,ℓ} ( R_{i,r,m,ℓ} + Σ_n ||w_{r,n,ℓ}||^2 P_{N,n} I_{N_t} ) w_{t,m,ℓ}   (4)

R_S = H_m^H w_{r,m,ℓ} w_{r,m,ℓ}^H H_m

and the transmit weights are derived by solving this problem.
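Because R_S has rank one, the dominant generalized eigenvector of (4) can be obtained, up to normalization, as R_I^{-1} H_m^H w_{r,m,ℓ}, where R_I denotes the matrix in parentheses; the sketch below (our own illustration, with hypothetical argument names) uses this shortcut.

```python
import numpy as np

def msinr_transmit_weight(H, w_r, P, P_N, m, l):
    """Transmit weight of stream l of user m from the eigenproblem (4)."""
    Nt = H[m].shape[1]
    # Interference part of R_I (leakage toward the other users' receive weights)
    R_I = np.zeros((Nt, Nt), dtype=complex)
    for n in range(len(H)):
        for k in range(len(w_r[n])):
            if (n, k) != (m, l):
                v = H[n].conj().T @ w_r[n][k]          # H_n^H w_{r,n,k}
                R_I += P[n][k] * np.outer(v, v.conj())
    # Noise part: sum_n ||w_{r,n,l}||^2 P_{N,n} I
    noise = sum(np.linalg.norm(w_r[n][l]) ** 2 * P_N[n]
                for n in range(len(H)) if l < len(w_r[n]))
    R_I += noise * np.eye(Nt)
    # Rank-one R_S: dominant generalized eigenvector is R_I^{-1} H_m^H w_{r,m,l}
    w_t = np.linalg.solve(R_I, H[m].conj().T @ w_r[m][l])
    return w_t / np.linalg.norm(w_t)
```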
4 Simulations
In this section, computer simulations are carried out for the evaluation of the multiuser MIMO design methods described in Section 3, and their effectiveness and features are investigated.

4.1 Simulation Conditions
In this subsection, the default simulation conditions are described. The evaluation of the proposed method is carried out using the SINR defined by equation (2) in Section 3.1. Since the transmit and receive weight vectors have unit norm and E[|s_{m,ℓ}(t)|^2] = 1 for all the signals, the allocation of the transmit power is determined by the water filling theorem [17] so that the total power of each user becomes one, namely, P_m = Σ_ℓ P_{m,ℓ} = 1. The SNR (signal to noise ratio) of the m-th user is defined by SNR = P_m / P_{N,m}, that is, the ratio of the total signal power to the noise power of user m. Here we consider M = 3, and the numbers of antennas are N_t = 12 and N_{r,m} = 4 for all m; this means that the degrees of freedom are sufficient also for the conventional BD. The default number of streams is L_m = 2, but if it cannot be realized because of the channel condition, the maximum number of streams less than L_m is chosen instead. As a modulation scheme, BPSK (binary phase shift keying) is adopted. The channels are assumed to be under i.i.d. (independent and identically distributed) Rayleigh fading with unit variance, and 500 samples of channels are used for the SINR calculation. The default simulation conditions are summarized in Table 1.

Table 1. Simulation Conditions
Number of Users: M = 3
Number of Transmit Antennas: Nt = 12
Number of Receive Antennas: Nr,m = 4
Number of Streams per User: Lm = 2
Energy Constraint: Pm = 1
SNR: 20 dB
Channel Statistics: i.i.d. Rayleigh fading with unit variance
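For concreteness, a small water-filling helper (our own sketch; the function name and interface are hypothetical) that distributes a unit power budget over the effective stream gains could look as follows.

```python
import numpy as np

def water_filling(gains, total_power=1.0):
    """Maximize sum log(1 + g_l * p_l) subject to sum p_l = total_power, p_l >= 0."""
    g = np.asarray(gains, dtype=float)
    order = np.argsort(g)[::-1]                      # strongest stream first
    p = np.zeros_like(g)
    for n_active in range(len(g), 0, -1):
        active = order[:n_active]
        mu = (total_power + np.sum(1.0 / g[active])) / n_active   # water level
        p_active = mu - 1.0 / g[active]
        if np.all(p_active >= 0):                    # feasible active set found
            p[active] = p_active
            break
    return p

# Example: two streams of one user with unequal effective gains
print(water_filling([8.0, 2.0]))                     # more power to the stronger stream
```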
4.2 Results and Discussions
This subsection presents simulation results for the evaluation and discussions related to them. Figure 2 depicts the empirical distribution functions of the output SINR of the multiuser MIMO downlink methods for Nt = 12 and Nr,m = 4, where the number of antennas is enough for transmitting a maximum of Lm = 4 streams. From (a) to (d), the number of streams moves from Lm = 1 to Lm = 4 (Lm is the same for all users). In this figure, only the curves of the proposed method based on ZF are shown, since they show only a slight, indistinguishable difference from those of the MSINR based approach under this situation. From the four subplots in Fig. 2, it can be observed that the output SINR characteristics of the proposed method are superior to those of the conventional BD under the condition of Lm = 1, 2. But as the number of streams increases, the conventional BD overtakes the proposed approach. This is because the conventional BD consumes the same degrees of freedom regardless of the number of streams, whereas the proposed method consumes a smaller number of them when the streams are fewer than the maximum, so the transmit weight for the target user can be chosen from a larger-dimensional space. But as the stream number increases, this difference becomes smaller, and the optimal choice of the weight (in the sense of achieving ZF) of the conventional BD gradually brings out its advantage (the influence of changing the order of the transmit and receive weight design can be observed at this point).
Fig. 2. Distribution functions of output SINR for Nt = 12, Nr,m = 4, and SNR = 20 dB. Panels: (a) 1 stream, (b) 2 streams, (c) 3 streams, (d) 4 streams; each panel plots the cumulative probability versus the output SINR [dB] of the conventional and proposed methods for every stream index l.
Fig. 3. SNR versus output SINR characteristics: (a) (Nt, Nr,m) = (12, 4), Lm = 2, conventional and proposed methods for l = 0, 1; (b) (Nt, Nr,m) = (6, 4), Lm = 2, ZF and MSINR based designs for l = 0, 1.
Fig. 4. Number of users versus output SINR for (Nt, Nr,m) = (12, 2) and SNR = 20 dB (conventional and proposed methods for L = 1 and L = 2, l = 0, 1).

Fig. 5. Number of receive antennas versus output SINR for M = 3, Nt = 14 and SNR = 20 dB (conventional and proposed methods for L = 1 and L = 2, l = 0, 1).
Fig. 6. Output SINR in case of L0 = 3, L1 = 2, and L2 = 1 for (a) SNR = 10 dB and (b) SNR = 30 dB. Numbers on the horizontal axis denote the user index m, grouped by stream index l.
The relation between the SNR defined in 4.1 and the output SINR is drawn in Fig. 3. This figure shows that the output SINR improves almost in proportion to the SNR. Figure 3(b) deals with the case where the conventional BD cannot be used because of the lack of degrees of freedom, hence its curves are not plotted there. The MSINR based method outperforms the ZF based one in the low SNR region, but as the SNR becomes higher, their characteristics approach each other and are almost the same at SNR = 30 dB (the improvement of the ZF approach at high SNR is a natural result [8]). Figure 4 plots the number of users versus the output SINR for Nt = 12, Nr,m = 2, and Lm = 1 or Lm = 2. What can be seen from this figure is that, regardless of whether the proposed method is superior (Lm = 1) or not (Lm = 2), the
difference between the two methods is enlarged as the number of users increases. Hence the choice of the algorithm becomes important in a system assuming a relatively large number of users. The curves of the number of receive antennas versus the output SINR for M = 3, Nt = 14 and Lm = 1, 2 are drawn in Fig. 5. We should remark that the number of streams is not always the same as that of the receive antennas because of the limitation of the degrees of freedom (e.g., the maximum number is Lm = 2 even under Nr,m = 6). The result of this figure is that the performance difference between the conventional and the proposed approaches becomes larger as the number of receive antennas increases. Next, let us consider the case where different numbers of streams are used among users 0-2: examples of the SINRs for L0 = 3, L1 = 2, and L2 = 1 are shown in Fig. 6 for SNR = 10 dB and SNR = 30 dB. In stream l = 0, the SINR of user 0, which uses the largest number of streams (L0 = 3), is the lowest, and users with fewer streams achieve a higher SINR. On the contrary, in stream l = 1, this order is reversed (SINR_0 > SINR_1) for SNR = 10 dB, and the SINRs are almost equal for SNR = 30 dB. As shown here, a simple rule is not found, but through investigation of other cases (e.g., Nr = 12, though the result is not shown here), we have verified that if Nr is sufficiently large (so that the conventional BD is possible), the SINR of the l-th stream becomes higher in descending order of the stream number (generally, Lm ≥ Ln results in Γ_{m,l} ≤ Γ_{n,l}). From those results, we can conclude the following: (1) The proposed method enables multistream transmission using a small number of antennas to which the conventional BD could not be applied. This is advantageous, for example, in a home-use access point which has a restriction on its physical size. (2) Even in the case where the condition on the degrees of freedom for the conventional BD is satisfied, the proposed method shows a better performance if the number of streams is small compared with the maximum possible stream number. A MIMO system does not always utilize all the streams, since that requires solving a complicated scheduling problem, and a simple adaptive modulation rule that uses at most one or two streams is preferred in some applications. So our recommendation is to switch between the conventional and the proposed methods depending on the situation (number of users and streams) in which the multiuser system is actually used. Beyond the above, investigations changing some parameters can be considered: if some users are under worse conditions (e.g., a smaller number of receive antennas or a smaller channel variance), the SINR characteristics are degraded, but this is a phenomenon common to most multiuser transmission schemes and hence is not described here.
5 Conclusions
In this study, we have presented a very simple design approach for multiuser MIMO downlink transmission based on BD. In this method, the order of the design
steps is swapped with respect to the conventional BD; namely, the receiver weights are calculated first, and then the transmission weights are designed using a ZF procedure to establish a strong connection between the transmitter and the target receiver. By this operation, the condition on the degrees of freedom on the transmitter side is also relaxed. Through computer simulations, it has been shown that the proposed approach achieves better performance than the conventional BD under certain conditions, and in that case, the SVD based receiver weight choice is reasonable. However, the authors believe that the choice of the receiver weight vectors here is simple but not the best because of the existence of the interferences. Hence, future work includes the improvement of this method based on a more reasonable choice of the receiver weights that achieves a better performance. Application of the proposed method to CoMP transmission and extension to the broadband case [18] are also important themes of study.
Acknowledgment

The authors would like to thank Dr. Nordin Bin Ramli (Malaysian Institute of Microelectronic Systems (MIMOS) Berhad) for his fruitful discussions with us. This work was performed under a research contract on the development of a cooperative base station system with the Ministry of Internal Affairs and Communications (MIC) of Japan.
References

1. Bölcskei, H., Gesbert, D., Papadias, C.B., van der Veen, A.-J. (eds.): Space-Time Wireless Systems: From Array Processing to MIMO. Cambridge University Press, Cambridge (2006)
2. Bessai, H.: MIMO Signals and Systems. Springer, New York (2005)
3. Li, Q., Li, G., Lee, W., Lee, M., Mazzarese, D., Clerckx, B., Li, Z.: MIMO Techniques in WiMAX and LTE: A Feature Overview. IEEE Communication Magazine 48, 86–92 (2010)
4. Jin, H., Jung, B., Hwang, H., Sung, D.: Performance Comparison of Uplink WLANs with Single-User and Multi-User MIMO Schemes. In: IEEE Wireless Communications and Networking Conference 2008, WCNC 2008 (2008)
5. Dhillon, H.S., Buehrer, R.M.: Cognitive MIMO Radio: Incorporating Dynamic Spectrum Access in Multiuser MIMO Network. In: IEEE Global Telecommunications Conference 2009, GLOBECOM 2009 (2009)
6. Taniguchi, T., Karasawa, Y., Nakajima, N.: Impact of Base Station Cooperation in Multi-Antenna Cellular System. In: European Microwave Week 2009 (EuMW 2009) (2009)
7. Hudson, J.E.: Adaptive Array Principles. Institution of Engineering and Technology, London (1981)
8. Spencer, Q.H., Peel, C.B., Swindlehurst, A.L., Haardt, M.: An Introduction to the Multi-User MIMO Downlink. IEEE Communication Magazine 42, 60–67 (2004)
9. Costa, M.: Writing on dirty paper. IEEE Transactions on Information Theory 29, 439–441 (1983)
10. Tolli, A., Codreanu, M., Juntti, M.: Linear Multiuser MIMO Transceiver Design with Quality of Service. IEEE Transactions on Signal Processing 56, 3049–3055 (2008)
11. Shi, S., Schubert, M., Boche, H.: Computational Efficient Transceiver Optimization for Multiuser MIMO Systems: Power Minimization with User-MMSE Requirements. In: 40th Asilomar Conference on Signals, Systems and Computers, ACSSC 2006 (2006)
12. Ravindran, N., Jindal, N.: Limited Feedback-Based Block Diagonalization for the MIMO Broadcast Channel. IEEE Journal on Selected Areas in Communications 26, 1473–1482 (2009)
13. Wang, F., Bialkowski, M.E.: Performance of Multiuser MIMO System Employing Block Diagonalization with Antenna Selection at Mobile Stations. In: 2010 2nd International Conference on Signal Processing Systems, ICSPS 2010 (2010)
14. Lee, D., Seo, Y., Kim, K.: Joint Diversity with Multi-Stream Selection for Multi-User Spatial Multiplexing Systems with Block Diagonalization. In: 15th Asia-Pacific Conference on Communications 2009, APCC 2009 (2009)
15. Nishimoto, H., Kato, S., Ogawa, T., Ohgane, T., Nishimura, T.: Imperfect Block Diagonalization for Multiuser MIMO Downlink. In: The 18th Annual IEEE International Symposium on Personal Indoor and Mobile Radio Communications, PIMRC 2008 (2008)
16. Andersen, J.B.: Array Gain and Capacity for Known Random Channels with Multiple Element Arrays at Both Ends. IEEE Journal on Selected Areas in Communications 18, 2172–2178 (2000)
17. Proakis, J.G.: Digital Communications, 3rd edn. McGraw-Hill, New York (2000)
18. Huang, Y., Benesty, J., Chen, J.: Acoustic MIMO Signal Processing. Springer, Heidelberg (2006)
Novel Cooperation Strategies for Free-Space Optical Communication Systems in the Absence and Presence of Feedback Chadi Abou-Rjeily and Serj Haddad Department of Electrical and Computer Engineering Lebanese American University, Lebanon {chadi.abourjeily,serj.haddad}@lau.edu.lb
Abstract. In this paper, we investigate cooperative diversity as a fading mitigation technique for Free-Space Optical (FSO) communications with intensity modulation and direct detection (IM/DD). In particular, we propose two novel one-relay cooperation strategies. The first scheme does not require any feedback and is based on selective relaying where the relay forwards only the symbols that it detected with a certain level of certainty that we quantify in both cases of absence or presence of background radiation. This technique results in additional performance enhancements and energy savings compared to the existing FSO cooperative techniques. The second scheme can be applied in situations where a feedback link is available. This scheme that requires only one bit of feedback results in significant performance gains over the entire range of the received signal level. Keywords: Free-Space Optics (FSO), Cooperative Systems, Rayleigh Fading, Pulse Position Modulation (PPM), Diversity.
1 Introduction
Recently, Free-Space Optical (FSO) communications attracted significant attention as a promising solution for the “last mile” problem [1]. A major impairment that severely degrades the link performance is fading (or scintillation) that results from the variations of the index of refraction due to inhomogeneities in temperature and pressure changes [2]. In order to combat fading and maintain acceptable performance levels over FSO links, fading-mitigation techniques that were extensively investigated in the context of wireless radio-frequency (RF) communications were recently tailored to FSO systems. These techniques can be classified in two broad categories. (i): Localized diversity techniques where multiple apertures are deployed at the transmitter and/or receiver sides. In the wide literature of RF systems, this is referred to as the Multiple-Input-Multiple-Output (MIMO) techniques that can result in significant multiplexing and diversity gains over wireless links. (ii): Distributed diversity techniques where neighboring nodes in a wireless network cooperate with each other to form a “virtual” antenna array and profit from the underlying spatial diversity in a distributed manner. These cooperative
techniques are becoming more popular in situations where a limited number of apertures can be deployed at the transceivers [3, 4]. In the context of FSO communications, localized diversity techniques include aperture-averaging receiver diversity [5], spatial repetition codes [6], unipolar versions of the orthogonal space-time codes [7] and transmit laser selection [8]. However, these techniques suffer mainly from the channel correlation that is particularly pronounced in FSO systems. In fact, for RF systems, the wide beamwidth of the antennas and the rich scattering environment that is often present between the transmitter and the receiver both ensure that the signal reaches the receiver via a large number of independent paths. Consequently, the assumption of spatially uncorrelated channels is often valid for these systems. On the other hand, for FSO links, the laser's beamwidth is very narrow and these links are much more directive, thus rendering the assumption of uncorrelated channels practically not valid for these systems. For example, the presence of a small cloud might induce large fades on all source-detector sub-channels simultaneously [6]. Consequently, the high performance gains promised by MIMO-FSO systems might not be achieved in practice and “alternative means of operation in such environments must be considered” [6]. On the other hand, while the literature on cooperation in RF networks dates back about a decade [3,4], it was only recently that this diversity technique was considered in the context of FSO communications [9]. In [9], a simple decode-and-forward strategy was proposed in scenarios where one neighboring node (relay) is willing to cooperate with the source node. Based on the strategy proposed in [9], the relay simply decodes and retransmits all symbols it receives, independently of the quality of signal reception at this relay. This strategy was analyzed in the absence and presence of background radiation in situations where the channel state information (CSI) is available neither at the transmitter nor at the receiver side. [9] highlighted the utility of cooperation despite the non-broadcast nature of FSO communications where the message transmitted from the source to the destination cannot be overheard by the neighboring relay. In this work, we present an enhancement to the decode-and-forward strategy proposed in [9]. This enhancement will be referred to as selective relaying. Instead of decoding all symbols at the relay and automatically forwarding these symbols to the destination, the relay will now back off if the fidelity with which the symbol is recovered is judged to be low. In this context, we propose two novel rules based on which the relay forwards the corresponding symbol or not. One of these rules is suitable for the no-background radiation case while the other rule is applied in the presence of background radiation. In the absence of background radiation, we will prove that selective relaying results in exactly the same performance as the decode-and-forward scheme proposed in [9], while in the presence of background radiation selective relaying can result in significant performance gains compared to [9]. However, in both cases, the proposed relaying scheme results in significant energy savings since the relay stops its transmission (and thus saves energy) when it judges that a certain symbol will most probably be decoded erroneously whether it cooperates or not. Another
contribution of the paper is that we propose a novel cooperation strategy that can be applied when the CSI is available at the receiver and when one feedback bit is available between the receiver and the transmitter. Under this scenario, significant error rate improvements can be observed, whether in the absence or presence of background radiation.
2 System Model
Consider the example of a FSO Metropolitan Area Network as shown in Fig. 1. Consider three neighboring buildings (1), (2) and (3) and assume that a FSO connection is available between each building and its two neighboring buildings. In FSO networks, each one of these connections is established via FSO-based wireless units each consisting of an optical transceiver with a transmitter and a receiver to provide full-duplex capability. Given the high directivity and nonbroadcast nature of FSO transmissions, one separate transceiver is entirely dedicated for the communication with each neighboring building. We assume that the transceivers on building (2) are available for cooperation to enhance the communication reliability between buildings (1) and (3). By abuse of notations, buildings (1), (2) and (3) will be denoted by source S, relay R and destination D, respectively. It is worth noting that the transceivers at R are not deployed with the objective of assisting S. In fact, these transceivers are deployed for R to communicate with S and D; if R is willing to share its existing resources (and R has no information to transmit), then it can act as a relay for assisting S in its communication with D.
Fig. 1. An example of a mesh FSO network. Cooperation is proposed among the transceivers on buildings (1), (2) and (3) where the transceivers on building 2 can help in transmitting an information message from (1) to (3). Note how, given the non-broadcast nature of FSO transmissions, one couple of FSO transceiver units is dedicated for each link.
Denote by a_0, a_1 and a_2 the random path gains between S-D, S-R and R-D, respectively. In this work, we adopt the Rayleigh turbulence-induced fading channel model [6] where the probability density function (pdf) of the path gain (a > 0) is given by f_A(a) = 2a e^{-a^2}. In the same way, denote by P_0, P_1 and P_2 the fractions of the total power dedicated to the S-D, S-R and R-D links, respectively. These parameters must satisfy the relation P_0 + P_1 + P_2 = 1 so that the cooperative system transmits the same total energy as non-cooperative systems. We consider Q-ary pulse position modulation (PPM) with intensity modulation and direct detection (IM/DD) where each receiver corresponds to a simple photoelectron counter. Denote by λ_s and λ_n the average numbers of photoelectrons per slot resulting from the incident light signal and from background radiation (and/or dark currents), respectively. These parameters are given by [6]:

λ_s = η E_s / (hf) = η (P_r T_s / Q) / (hf) ;  λ_n = η (P_b T_s / Q) / (hf)   (1)
where:
– η is the detector's quantum efficiency, assumed to be equal to 1 in what follows.
– h = 6.6 × 10^{-34} is Planck's constant.
– f is the optical center frequency, taken to be 1.94 × 10^{14} Hz (corresponding to a wavelength of 1550 nm).
– T_s is the symbol duration.
– P_r (resp. P_b) stands for the optical signal (resp. noise) power that is incident on the receiver.

Finally, in eq. (1), E_s = P_r T_s / Q corresponds to the received optical energy per symbol over the direct link S-D.
As a first step in the cooperation strategy, a PPM symbol s ∈ {1, . . . , Q} is transmitted along both the S-D and S-R links. Denote by Z^{(0)} = [Z_1^{(0)}, . . . , Z_Q^{(0)}] and Z^{(1)} = [Z_1^{(1)}, . . . , Z_Q^{(1)}] the Q-dimensional vectors corresponding to the photoelectron counts (in the Q slots) at D and R, respectively. In other words, Z_q^{(i)} corresponds to the number of photoelectrons detected in the q-th slot along the S-D link for i = 0 and along the S-R link for i = 1, for q = 1, . . . , Q.
For q ≠ s, no light signal is transmitted in the q-th slot and the only source of photoelectrons in this slot is background radiation. In this case, Z_q^{(i)} can be modeled as a Poisson random variable (r.v.) with parameter [6]:

E[Z_q^{(i)}] = λ_n ;  q ≠ s ;  i = 0, 1   (2)

where E[.] stands for the averaging operator. For q = s, Z_s^{(i)} can be modeled as a Poisson random variable with parameter:

E[Z_s^{(i)}] = β_i P_i a_i^2 λ_s + λ_n ;  i = 0, 1   (3)

where β_0 ≜ 1 and β_1 is a gain factor that follows from the fact that S might be closer to R than it is to D. In other words, the received optical energy at R
corresponding to the energy β_0 E_s = E_s at D is β_1 E_s. Performing a typical link budget analysis [6] shows that β_1 = (d_SD/d_SR)^2, where d_SD and d_SR stand for the distances from S to D and from S to R, respectively.
Based on the decision vector Z^{(1)} available at R, the maximum-likelihood (ML) detector corresponds to deciding in favor of the symbol ŝ given by:

ŝ = arg max_{q=1,...,Q} Z_q^{(1)}   (4)

Now, the symbol ŝ (rather than the symbol s) is transmitted along the R-D link. In this case, the corresponding decision vector can be written as Z^{(2)} = [Z_1^{(2)}, . . . , Z_Q^{(2)}] where Z_q^{(2)} is a Poisson r.v. whose parameter is given by:

E[Z_q^{(2)}] = β_2 P_2 a_2^2 λ_s + λ_n, q = ŝ ;  λ_n, q ≠ ŝ   (5)

where β_2 = (d_SD/d_RD)^2 with d_RD corresponding to the distance between R and D.
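The photon-counting model of equations (1)–(5) and the ML rule (4) can be illustrated with the short Python sketch below (our own code; the numerical values of Q, λs, λn, β and P are arbitrary example values, not taken from the paper).

```python
import numpy as np

rng = np.random.default_rng(0)

Q = 4                     # PPM order (example value)
lam_s = 20.0              # mean signal photoelectrons per slot (lambda_s)
lam_n = 0.5               # mean background photoelectrons per slot (lambda_n)
beta, P = 1.0, 1.0 / 3    # link gain factor and power fraction (example values)

def ppm_counts(s, a):
    """Photoelectron counts Z_q of one PPM symbol over a link with path gain a."""
    mean = np.full(Q, lam_n)                # background only in empty slots, eq. (2)
    mean[s] += beta * P * a**2 * lam_s      # signal slot, eq. (3)
    return rng.poisson(mean)

a = np.sqrt(rng.exponential())              # Rayleigh gain: a^2 ~ Exp(1)
s = int(rng.integers(Q))
Z = ppm_counts(s, a)
s_hat = int(np.argmax(Z))                   # ML detection, eq. (4); ties broken by index here
print(s, Z, s_hat)
```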
3 Cooperation in the Absence of Background Radiation
In this section, we assume that λ_n = 0.

3.1 Cooperation Strategies in the Absence of Feedback
Next, we propose two cooperation strategies that will be referred to as scheme 1 and scheme 2, respectively. For both schemes, a sequence of symbols is first transmitted simultaneously to D and R. Given that λ_n = 0, if symbol s was transmitted, then the Q − 1 decision variables Z_q^{(0)} for q ≠ s (at D) and the Q − 1 decision variables Z_q^{(1)} for q ≠ s (at R) will all be equal to zero. Consequently, two scenarios are possible at R. (i): Z_s^{(1)} > 0; in this case, R decides in favor of ŝ = s and the decision it makes is correct. In fact, in the absence of background radiation the only source of this nonzero count in slot s is the presence of a light signal in this slot. (ii): Z_s^{(1)} = 0, implying that zero photoelectron counts are observed in all Q slots. In this case, the best that the relay can do is to break the tie randomly and decide in favor of one of the slots ŝ, resulting in a correct guess with probability 1/Q.
For scheme 1, the relay forwards the decoded symbol ŝ automatically to D independently of whether Z_s^{(1)} > 0 or Z_s^{(1)} = 0. For scheme 2, the relay forwards the decoded symbol ŝ only if Z_s^{(1)} > 0. In fact, if all counts are equal to zero (Z_s^{(1)} = 0), then most probably the relay will make an erroneous decision (with probability (Q−1)/Q). In order to avoid confusing D by forwarding a wrong estimate of the symbol, the relay backs off and stops its retransmission during the corresponding symbol duration.
For both schemes, the decision at D will be based on the vectors Z^{(0)} and Z^{(2)}. If one slot of Z^{(0)} has a nonzero photoelectron count, then D decides in favor
of this slot (and its decision will be correct). On the other hand, if all components of Z^{(0)} are equal to zero, then the decision at D will be based on Z^{(2)}. If one component of Z^{(2)} is different from zero, then D decides in favor of this component. For scheme 1, this decision is correct with probability 1 − p_e, where p_e stands for the probability of error at R; for scheme 2, this decision will be correct with probability 1 since R forwards the message if and only if it decoded the transmitted symbol correctly. Finally, if all components of Z^{(2)} are equal to zero, then D decides randomly in favor of one of the Q slots. Note that for scheme 1 this case occurs only because of fading and shot noise along the R-D link, while for scheme 2 this case might also occur because the relay backed off (in addition to fading and shot noise along the R-D link). To summarize, for both schemes, D decides in favor of the symbol s̃ according to the following rule:

s̃ = arg_{q=1,...,Q}[Z_q^{(0)} ≠ 0],  if Z^{(0)} ≠ 0_Q;
    arg_{q=1,...,Q}[Z_q^{(2)} ≠ 0],  if Z^{(0)} = 0_Q and Z^{(2)} ≠ 0_Q;
    rand(1, . . . , Q),              if Z^{(0)} = Z^{(2)} = 0_Q.   (6)

where 0_Q corresponds to the Q-dimensional all-zero vector and the function rand(1, . . . , Q) corresponds to choosing randomly one integer in the set {1, . . . , Q}.
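A direct transcription of rule (6) into Python (our own illustrative helper; in the absence of background radiation at most one slot count can be nonzero, so argmax returns exactly that slot) is given below.

```python
import numpy as np

def decide_at_destination(Z0, Z2, rng=np.random.default_rng()):
    """Decision rule (6) at D when lambda_n = 0.

    Z0, Z2 : photoelectron count vectors of the S-D and R-D links.
    """
    Z0, Z2 = np.asarray(Z0), np.asarray(Z2)
    if np.any(Z0 > 0):                  # the direct link identifies the symbol
        return int(np.argmax(Z0))
    if np.any(Z2 > 0):                  # otherwise rely on the relayed copy
        return int(np.argmax(Z2))
    return int(rng.integers(len(Z0)))   # all-zero counts: random tie break
```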
3.2 Performance Analysis
Proposition: Schemes 1 and 2 achieve the same conditional symbol error probability (SEP) given by:

P_{e|A} = (Q−1)/Q · e^{−P_0 a_0^2 λ_s} ( e^{−β_1 P_1 a_1^2 λ_s} + e^{−β_2 P_2 a_2^2 λ_s} − e^{−β_1 P_1 a_1^2 λ_s} e^{−β_2 P_2 a_2^2 λ_s} )   (7)

where the channel state is defined by the vector A ≜ [a_0, a_1, a_2].
Proof: the proof is provided in the appendix.
Averaging P_{e|A} over the Rayleigh distributions of a_0, a_1 and a_2 results in the following expression of the SEP:

P_e = (Q−1)/Q · 1/(1 + P_0 λ_s) · [ 1/(1 + β_1 P_1 λ_s) + 1/(1 + β_2 P_2 λ_s) − 1/((1 + β_1 P_1 λ_s)(1 + β_2 P_2 λ_s)) ]   (8)

Note that for non-cooperative systems the SEP is given by P_e = (Q−1)/Q · 1/(1 + λ_s), which scales asymptotically as λ_s^{−1} (for λ_s ≫ 1). On the other hand, eq. (8) scales asymptotically as λ_s^{−2}, showing the enhanced diversity order (of two) achieved by the proposed schemes.
Despite the fact that scheme 1 and scheme 2 result in exactly the same error performance, scheme 2 presents the additional advantage of reducing the overall transmitted energy since the relay backs off when the decision vector Z^{(1)} it
observes does not ensure a correct decision. In other words, the relay stops its transmission when Z^{(1)} = 0_Q, thus saving energy. This saved energy can be further invested in the transmission of other symbols (for which Z^{(1)} ≠ 0_Q) along the R-D link (refer to Section 5 for more details). Finally, eq. (6) shows that the proposed cooperation strategies can be implemented without requiring any channel state information (CSI) at the receiver side (non-coherent detection).

3.3 Cooperation in the Presence of Feedback
In the case where there is no feedback from the receiver to the transmitter, the CSI is not available at the transmitter side and no preference can be made among the available links. In this case, the best choice is to distribute the transmit power evenly among the three links S-D, S-R and R-D by setting P_0 = P_1 = P_2 = 1/3. This will be referred to as the no-feedback case in what follows.
When one feedback bit is available, the transmitter can choose to transmit the entire optical power either along the direct S-D link (P_0 = 1 and P_1 = P_2 = 0) or along the indirect S-R-D link (P_0 = 0, P_1 ≠ 0 and P_2 ≠ 0). This transmission strategy in the presence of feedback is motivated by the fact that the conditional SEP in eq. (7) is equal to the product of two functions; one of them depends only on the direct S-D link (via the channel coefficient a_0) and the other depends only on the indirect S-R-D link (via the channel coefficients a_1 and a_2). In what follows, we fix P_1 = P_2 = 1/2 if the information is to be transmitted along the indirect link. Note that the values of P_1 and P_2 along the indirect link can be further optimized to minimize P_{e|A}; however, we observed that this optimization results only in a marginal decrease in the SEP. Moreover, this approach requires much more than one bit of feedback to inform S and R about the fractions of the power that they need to allocate to the S-R and R-D links.
If the direct link S-D is chosen (with P_0 = 1 and P_1 = P_2 = 0), the resulting conditional SEP will be given by:

P_{e|A}^{(d)} = (Q−1)/Q · e^{−a_0^2 λ_s}   (9)

If the indirect link S-R-D is chosen (with P_0 = 0 and P_1 = P_2 = 1/2), the resulting conditional SEP will be given by:

P_{e|A}^{(in)} = (Q−1)/Q ( e^{−β_1 a_1^2 λ_s/2} + e^{−β_2 a_2^2 λ_s/2} − e^{−β_1 a_1^2 λ_s/2} e^{−β_2 a_2^2 λ_s/2} )   (10)

Given that the third term in the above expression is two orders of magnitude smaller than the first two terms, P_{e|A}^{(in)} can be approximated by:

P_{e|A}^{(in)} ≈ (Q−1)/Q ( e^{−β_1 a_1^2 λ_s/2} + e^{−β_2 a_2^2 λ_s/2} ) ≈ (Q−1)/Q · 2 e^{−min(β_1 a_1^2, β_2 a_2^2) λ_s/2}   (11)
where the second approximation follows from the strong monotonic behavior of the exponential function. Finally, based on eq. (9) and eq. (11), the proposed cooperation strategy in the presence of one feedback bit corresponds to choosing (P_0, P_1, P_2) according to:

(P_0, P_1, P_2) = (1, 0, 0),      if a_0^2 ≥ (1/2) min(β_1 a_1^2, β_2 a_2^2) − ln 2 / λ_s;
                  (0, 1/2, 1/2),  otherwise.   (12)

Note that the inequality a_0^2 ≥ (1/2) min(β_1 a_1^2, β_2 a_2^2) − ln 2 / λ_s is easier to satisfy for smaller values of λ_s. In this case, the direct link is preferred over the indirect link since the performance of the indirect link would be severely degraded by errors occurring at the relay for these small values of λ_s.
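The one-bit feedback rule (12) amounts to the following small helper (our own sketch; the function name and interface are ours).

```python
import numpy as np

def select_link(a0, a1, a2, beta1, beta2, lam_s):
    """One-bit feedback rule (12): returns the power split (P0, P1, P2)."""
    threshold = 0.5 * min(beta1 * a1**2, beta2 * a2**2) - np.log(2.0) / lam_s
    if a0**2 >= threshold:
        return (1.0, 0.0, 0.0)      # direct S-D link
    return (0.0, 0.5, 0.5)          # indirect S-R-D link
```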
4 Cooperation in the Presence of Background Radiation

In this section, we assume that λ_n ≠ 0. In this case, the background radiation results in nonzero photoelectron counts even in empty slots. As in the case of no background radiation, we consider two cooperation schemes: scheme 1, where the relay decodes and forwards all symbols it receives without applying any kind of selection strategy, and scheme 2, where the relay backs off over some symbol durations in order not to confuse the destination with noisy replicas of the symbols it received. Unlike the case of no background radiation, where the relay can be 100% sure that the symbol it detected is correct (when there is one nonempty slot in Z^{(1)}), the presence of photoelectrons in empty slots due to background radiation imposes a certain level of uncertainty on the decision made at the relay.
For both schemes, the decision at D will be based on the vector Z = Z^{(0)} + Z^{(2)} = [Z_1, . . . , Z_Q]. If the decoded symbol at R (ŝ) is equal to the transmitted symbol (s), then the parameters of the components of Z that follow the Poisson distribution are as follows:

E[Z_q] = P_0 a_0^2 λ_s + β_2 P_2 a_2^2 λ_s + 2λ_n, q = s ;  2λ_n, q ≠ s   (ŝ = s)   (13)

On the other hand, if ŝ ≠ s:

E[Z_q] = P_0 a_0^2 λ_s + 2λ_n, q = s ;  β_2 P_2 a_2^2 λ_s + 2λ_n, q = ŝ ;  2λ_n, q ≠ s, q ≠ ŝ   (ŝ ≠ s)   (14)

For scheme 1, R always relays the symbol ŝ defined in eq. (4); consequently, equations (13) and (14) correspond to the only two cases that might arise at D. For scheme 2, R might back off, resulting in Z^{(2)} = 0_Q; consequently, in addition to equations (13) and (14), the parameters of the components of Z might be as follows:

E[Z_q] = P_0 a_0^2 λ_s + 2λ_n, q = s ;  2λ_n, q ≠ s   (R is backing off)   (15)
We next propose a metric based on which R decides whether to forward the message or not. We define the probability p(q) as the probability that the symbol was transmitted in slot q along the S-R link. Following from equations (2) and (3), this probability can be written as:

p(q) = [ e^{−(β_1 P_1 a_1^2 λ_s + λ_n)} (β_1 P_1 a_1^2 λ_s + λ_n)^{Z_q^{(1)}} / Z_q^{(1)}! ] ∏_{q'=1, q'≠q}^{Q} [ e^{−λ_n} λ_n^{Z_{q'}^{(1)}} / Z_{q'}^{(1)}! ] ;  q = 1, . . . , Q   (16)

The highest probability in the set {p(q)}_{q=1}^{Q} is p(ŝ), since the maximum-likelihood decision at R corresponds to selecting ŝ = arg max_{q=1,...,Q}[p(q)] ≡ arg max_{q=1,...,Q}[Z_q^{(1)}]. Denote by ŝ' the symbol associated with the next highest probability in {p(q)}_{q=1}^{Q}:

ŝ' = arg max [p(1), . . . , p(ŝ − 1), p(ŝ + 1), . . . , p(Q)] ≡ arg max [Z_1^{(1)}, . . . , Z_{ŝ−1}^{(1)}, Z_{ŝ+1}^{(1)}, . . . , Z_Q^{(1)}]   (17)

In other words, the probabilities in the set {p(q)}_{q=1}^{Q} satisfy the relation p(q)|_{q≠ŝ,ŝ'} ≤ p(ŝ') ≤ p(ŝ). Based on what preceded, we define the index that quantifies the accuracy of the decision at the relay as the logarithm of the ratio of the two most probable events: I = log( p(ŝ)/p(ŝ') ). Simplifying the common terms in p(ŝ) and p(ŝ') results in:

I = ( Z_ŝ^{(1)} − Z_{ŝ'}^{(1)} ) log( 1 + β_1 P_1 a_1^2 λ_s / λ_n )   (18)
Now, the relay participates in the cooperation effort if the index I is greater than a certain threshold level I_th. In other words, if I ≥ I_th, R forwards ŝ and the parameters of the components of the decision vector at D are as given in equations (13) and (14). Note that the relation I ≥ I_th does not imply that ŝ = s; however, it imposes some kind of selectivity on the symbols to be forwarded. On the other hand, if I < I_th, R does not forward ŝ and the parameters of the components of the decision vector at D are as given in eq. (15). Finally, as in the case of no background radiation, scheme 2 results in an additional energy saving compared to scheme 1. Similarly, in the absence of feedback, we set P_0 = P_1 = P_2 = 1/3.
Note that the conditional SEP does not lend itself to a simple analytical evaluation in the presence of background radiation. Therefore, we adopt the strategy given in eq. (12), which was proposed in the no-background radiation case, for selecting either the direct S-D path or the indirect S-R-D path in the presence of feedback. Even though this approach is not optimal, it is simple and it is expected to result in additional performance gains compared to the no-feedback case. These expectations are confirmed in the next section.
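The selective-relaying test based on eq. (18) can be sketched as follows (our own illustrative code; it assumes λn > 0, as in this section).

```python
import numpy as np

def relay_forwards(Z1, beta1, P1, a1, lam_s, lam_n, I_th):
    """Selective relaying test of eq. (18): forward the ML symbol only if I >= I_th."""
    Z1 = np.asarray(Z1)
    order = np.argsort(Z1)[::-1]
    s_hat, s_second = int(order[0]), int(order[1])        # two most likely slots
    I = (Z1[s_hat] - Z1[s_second]) * np.log(1.0 + beta1 * P1 * a1**2 * lam_s / lam_n)
    return s_hat, bool(I >= I_th)
```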
Fig. 2. Performance of 4-PPM in the absence of background radiation with β1 = β2 = 1. The curves show Pe versus Es (dBJ) for no cooperation, no feedback (scheme 1 or scheme 2 with no energy reuse), no feedback (scheme 2 with energy reuse), and the case with feedback.
Fig. 3. Performance of 4-PPM in the absence of background radiation with β1 = β2 = 4. The same four curves as in Fig. 2 are shown.
5 Numerical Results
Based on section 3, two variants of scheme 2 are possible. (i) Scheme 2 with no energy reuse: in this case the energy saved when the relay backs off is not used to transmit future symbols. This strategy achieves the same performance level as scheme 1 and the additional advantage over the latter can be quantified by a parameter Es,s that captures the amount of energy saved per symbol. (ii) Scheme 2 with energy reuse: in this case the energy saved when the relay backs off is used to transmit future symbols. This energy reuse allows this strategy to achieve a better performance than scheme 1; evidently, Es,s = 0 in this case. Fig. 2 shows the performance of 4-PPM in the absence of background radiation. In this figure, we assume that dSD = dSR = dRD resulting in β1 = β2 = 1. The slopes of the SEP curves indicate that cooperation results in an increased diversity order even in this extreme case where S is as far from R as it is from D. This figure also shows the impact of feedback on enhancing the performance. While cooperation with no feedback outperforms non-cooperative systems at values of Es exceeding -176 dB approximately, cooperation with feedback outperforms non-cooperative systems for practically all values of Es . This figure also shows that scheme 2 with energy reuse results in additional gains especially for small values of Es . The energy reuse also decreases the value of Es starting from which cooperation becomes useful (with no feedback) by about 1 dB. Similar results are obtained in Fig. 3 for β1 = β2 = 4. Fig. 4 plots the variation of Es,s as a function of Es for scheme 2 with no energy reuse. Two cases are considered: β1 = β2 = 1 and β1 = β2 = 4. This figure shows that significant energy savings are possible in both cases. Note that for β1 = 4, the relay is two times closer to the source (compared to the case β1 = 1).
Fig. 4. Energy saved per symbol Es,s (dBJ) versus the transmitted energy per symbol Es (dBJ) when scheme 2 is applied (with no energy reuse) in the absence of background radiation, for β1 = β2 = 1 and β1 = β2 = 4.
In this case, the signal level at the relay is large, implying that the relay backs off less often, which results in smaller energy savings. Fig. 5 shows the performance of 4-PPM in the presence of background radiation where we fix Pb Ts /Q = −185 dBJ. In this figure, we assume that dSD = 2dSR = 2dRD, resulting in β1 = β2 = 4. This figure compares the performance of non-cooperative systems with scheme 1 and scheme 2. The results pertaining to scheme 2 are plotted in the case where the energy saved from certain symbols (for which I < Ith) is not reused for the transmission of subsequent
Fig. 5. Performance of 4-PPM in the presence of background radiation with Pb Ts /Q = −185 dBJ and β1 = β2 = 4. Scheme 2 is applied with no energy reuse; curves are shown for no cooperation, scheme 1 without feedback, scheme 2 without feedback for Ith = 1, . . . , 5, and the case with feedback.
Fig. 6. Energy saved per symbol Es,s (dBJ) versus the transmitted energy per symbol Es (dBJ) when scheme 2 is applied (with no energy reuse) with Pb Ts /Q = −185 dBJ and β1 = β2 = 4, for Ith = 1, . . . , 5.
symbols (no energy reuse). This figure shows that scheme 2 outperforms scheme 1 and that the performance gains increase with Ith . For example, as Ith increases from 1 to 2, the performance gain of scheme 2 compared to scheme 1 increases from 5 dB to about 8 dB. Moreover, these performance gains are achieved with additional energy savings that are shown in Fig. 6. Evidently, these savings are more significant for larger values of Ith since the relay will be backing off more often in this case.
6 Conclusion
We proposed novel cooperation strategies that are adapted to FSO systems. In the absence of feedback, selecting the cooperation intervals in an adequate manner can result in significant performance gains as well as energy savings that can be reinvested to further improve the error rate. In the presence of one feedback bit, a simple strategy based on selecting one link among the available direct or indirect links turned out to be very useful, especially for low signal levels.
References

1. Kedar, D., Arnon, S.: Urban Optical Wireless Communications Networks: The Main Challenges and Possible Solutions. IEEE Commun. Mag. 42, 2–7 (2003)
2. Zhu, X., Kahn, J.M.: Free-Space Optical Communication through Atmospheric Turbulence Channels. IEEE Trans. Commun. 50, 1293–1300 (2002)
3. Laneman, J., Wornell, G.: Distributed Space-Time Coded Protocols for Exploiting Cooperative Diversity in Wireless Networks. IEEE Trans. Inf. Theory 49, 2415–2425 (2003)
4. Laneman, J., Tse, D., Wornell, G.: Cooperative Diversity in Wireless Networks: Efficient Protocols and Outage Behavior. IEEE Trans. Inf. Theory 50, 3062–3080 (2004)
5. Khalighi, M.-A., Schwartz, N., Aitamer, N., Bourennane, S.: Fading Reduction by Aperture Averaging and Spatial Diversity in Optical Wireless Systems. IEEE Journal of Optical Commun. and Networking 1, 580–593 (2009)
6. Wilson, S.G., Brandt-Pearce, M., Cao, Q., Leveque, J.H.: Free-Space Optical MIMO Transmission with Q-ary PPM. IEEE Trans. Commun. 53, 1402–1412 (2005)
7. Simon, M.K., Vilnrotter, V.A.: Alamouti-Type Space-Time Coding for Free-Space Optical Communication with Direct Detection. IEEE Trans. Wireless Commun. 4, 35–39 (2005)
8. Garcia-Zambrana, A., Castillo-Vazquez, C., Castillo-Vazquez, B., Hiniesta-Gomez, A.: Selection Transmit Diversity for FSO Links Over Strong Atmospheric Turbulence Channels. IEEE Photon. Technol. Lett. 21, 1017–1019 (2009)
9. Abou-Rjeily, C., Slim, A.: Cooperative Diversity for Free-Space Optical Communications: Transceiver Design and Performance Analysis. IEEE Trans. Commun. (accepted for publication)
Appendix: Performance in the Absence of Background Radiation with No Feedback

Assume that the symbol s ∈ {1, . . . , Q} was transmitted. We recall that the decoded symbol at R is denoted by ŝ, while the final decision at D is denoted by s̃.
Scheme 1: If Z_s^{(0)} > 0, then a correct decision will be made at D. Consequently, P_{e|A} can be written as:

P_{e|A} = Pr(Z_s^{(0)} = 0) [ Pr(Z_ŝ^{(2)} = 0) p_1 + Pr(Z_ŝ^{(2)} > 0) p_2 ]   (19)

where p_1 = (Q−1)/Q, since the case Z_s^{(0)} = Z_ŝ^{(2)} = 0 implies that Z^{(0)} = Z^{(2)} = 0_Q, resulting in a random decision at D. On the other hand, p_2 = p_e (which is the probability of error at R). In fact, when Z_s^{(0)} = 0 and Z_ŝ^{(2)} > 0, D decides in favor of s̃ = ŝ, resulting in an erroneous decision (s̃ ≠ s) when an erroneous decision is made at the relay (ŝ ≠ s), which happens with probability p_e. Given that p_e can be written as p_e = (Q−1)/Q · Pr(Z_s^{(1)} = 0), eq. (19) reduces to:

P_{e|A} = (Q−1)/Q · Pr(Z_s^{(0)} = 0) [ Pr(Z_ŝ^{(2)} = 0) + Pr(Z_ŝ^{(2)} > 0) Pr(Z_s^{(1)} = 0) ]   (20)

From eq. (3), Pr(Z_s^{(0)} = 0) = e^{−P_0 a_0^2 λ_s} and Pr(Z_s^{(1)} = 0) = e^{−β_1 P_1 a_1^2 λ_s}, and from eq. (5), Pr(Z_ŝ^{(2)} = 0) = 1 − Pr(Z_ŝ^{(2)} > 0) = e^{−β_2 P_2 a_2^2 λ_s}. Replacing these probabilities in eq. (20) results in eq. (7).
For scheme 2, an error occurs with probability (Q−1)/Q (tie breaking) only when Z^{(0)} = Z^{(2)} = 0_Q. Consequently, P_{e|A} can be written as:

P_{e|A} = (Q−1)/Q · Pr(Z^{(0)} = 0_Q) Pr(Z^{(2)} = 0_Q) = (Q−1)/Q · Pr(Z_s^{(0)} = 0) [ 1 − Pr(Z^{(2)} ≠ 0_Q) ]   (21)

Now Z^{(2)} ≠ 0_Q if and only if R forwards the message (a nonzero photoelectron count was observed along the S-R link) and a nonzero photoelectron count was observed along the R-D link. In other words, Z^{(2)} ≠ 0_Q if and only if Z_s^{(1)} > 0 and Z_s^{(2)} > 0 (where in this case ŝ = s as well). Consequently:

P_{e|A} = (Q−1)/Q · Pr(Z_s^{(0)} = 0) [ 1 − Pr(Z_s^{(1)} > 0) Pr(Z_s^{(2)} > 0) ] = (Q−1)/Q · e^{−P_0 a_0^2 λ_s} [ 1 − (1 − e^{−β_1 P_1 a_1^2 λ_s})(1 − e^{−β_2 P_2 a_2^2 λ_s}) ]   (22)

which reduces to eq. (7).
Hybrid HMM/ANN System Using Fuzzy Clustering for Speech and Medical Pattern Recognition Lilia Lazli1, Abdennasser Chebira2, Mohamed Tayeb Laskri1, and Kurosh Madani2 1
Laboratory of research in Computer Science (LRI/GRIA), Badji Mokhar University, B.P.12 Sidi Amar 23000 Annaba – Algeria [email protected], [email protected] http://www.univ-annaba.org 2 Images, Signals and Intelligent Systems Laboratory (LISSI / EA 3956) PARIS XII University, Senart-Fontainebleau Institute of Technology, Bat.A, Av. Pierre Point, F-77127 Lieusaint, France {achebira,kmadani}@univ-paris12.fr, http://www.univ-paris12.fr
Abstract. The main goal of this paper is to compare the performance which can be achieved by three different approaches analyzing their applications’ potentiality on real world paradigms. We compare the performance obtained with (1) Discrete Hidden Markov Models (HMM) (2) Hybrid HMM/MLP system using a Multi Layer-Perceptron (MLP) to estimate the HMM emission probabilities and using the K-means algorithm for pattern clustering (3) Hybrid HMM-MLP system using the Fuzzy C-Means (FCM) algorithm for fuzzy pattern clustering. Experimental results on Arabic speech vocabulary and biomedical signals show significant decreases in error rates for the hybrid HMM/MLP system based fuzzy clustering (application of FCM algorithm) in comparison to a baseline system. Keywords: Arabic speech recognition, biomedical diagnosis, fuzzy clustering, hidden Markov models, artificial Neural Network.
1 Introduction

In many target (or pattern) classification problems the availability of multiple looks at an object can substantially improve robustness and reliability in decision making. The use of several aspects is motivated by the difficulty in distinguishing between different classes from a single view at an object [9]. It occurs frequently that returns from two different objects at certain orientations are so similar that they may easily be confused. Consequently, a more reliable decision about the presence and type of an object can be made based upon observations of the received signals or patterns at multiple aspect angles. This allows for more information to accumulate about the size, shape, composition and orientation of the objects, which in turn yields more accurate discrimination. Moreover, when the feature space undergoes changes, owing to different
operating and environmental conditions, multi-aspect classification is almost a necessity in order to maintain the performance of the pattern recognition system. In this paper we present Hidden Markov Models (HMM) and apply them to complex pattern recognition problems. We attempt to illustrate some applications of the theory of HMM to real problems, such as complex pattern problems related to biomedical diagnosis or linked to social behavior modeling. We introduce the theory and the foundations of HMM. In the pattern recognition domain, and particularly in speech recognition, HMM techniques hold an important place [7]. There are two reasons for the popularity of HMM. First, the models are very rich in mathematical structure and hence can form the theoretical basis for a wide range of applications. Second, the models, when applied properly, work very well in practice for several important applications. However, standard HMM require the assumption that adjacent feature vectors are statistically independent and identically distributed. These assumptions can be relaxed by introducing Neural Networks (NN) into the HMM framework. Significant advances have been made in recent years in the area of speaker independent speech recognition. Over the last few years, connectionist models, and the Multi-Layer Perceptron (MLP) in particular, have been widely studied as potentially powerful approaches to speech recognition. These neural networks estimate the posterior probabilities used by the HMM. Among these, the hybrid approach using the MLP to estimate HMM emission probabilities has recently been shown to be particularly efficient, for example for French speech [1] and American English speech [14]. We therefore propose a hybrid HMM/MLP model for speech recognition and biomedical diagnosis which makes it possible to combine the discriminating capacity and noise robustness of the MLP with the flexibility of HMMs in order to obtain better performance than traditional HMM. We also develop a method based on concepts of fuzzy logic for the clustering and classification of vectors, the Fuzzy C-Means (FCM) algorithm, and demonstrate its effectiveness with regard to the traditional K-Means (KM) algorithm, owing to the fact that the KM algorithm provides a hard (non-probabilistic) assignment of observations to the HMM states.
2 HMM Advantages and Drawbacks

Standard HMM procedures, as defined above, have been very useful for pattern recognition; HMM can deal efficiently with the temporal aspect of patterns (including temporal distortion or time warping) as well as with frequency distortion. There are powerful training and decoding algorithms that permit efficient training on very large databases, and, for example, recognition of isolated words as well as continuous speech. Given their flexible topology, HMM can easily be extended to include phonological rules (e.g., building word models from phone models) or syntactic rules.
For training, only a lexical transcription is necessary (assuming a dictionary of phonological models); explicit segmentation of the training material is not required. An example of a simple HMM is given in Fig. 1; this could be the model of a word assumed to be composed of three stationary states.
Fig. 1. Example of a three-state Hidden Markov Model, where 1) {q1, q2, q3} are the HMM states; 2) p(qi\qj) is the transition probability from state qi to state qj (i, j = 1..3); 3) p(xi\qj) is the emission probability of observation xi from state qj (i = 1..number of frames, j = 1..3).
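As a reminder of how such a model is evaluated, the sketch below (our own illustration; the transition and emission probabilities are made-up example values, not taken from the paper) computes the likelihood of a short discrete observation sequence for a three-state left-to-right HMM like the one in Fig. 1 using the forward recursion.

```python
import numpy as np

# Hypothetical three-state left-to-right HMM over a discrete emission alphabet
A = np.array([[0.6, 0.4, 0.0],        # transition probabilities between states
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.7, 0.2, 0.1],        # emission probabilities p(x \ q_i)
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
pi = np.array([1.0, 0.0, 0.0])        # start in the first state

def forward_likelihood(obs):
    """p(observation sequence \ model) via the forward algorithm."""
    alpha = pi * B[:, obs[0]]
    for x in obs[1:]:
        alpha = (alpha @ A) * B[:, x]
    return alpha.sum()

print(forward_likelihood([0, 1, 1, 2]))
```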
However, the assumptions that permit HMM optimization and improve their efficiency also, in practice, limit their generality. As a consequence, although the theory of HMM can accommodate significant extensions (e.g., correlation of acoustic vectors, discriminative training, ...), practical considerations such as the number of parameters and trainability limit their implementations to simple systems usually suffering from several drawbacks [2], including:
- Poor discrimination due to training algorithms that maximize likelihoods instead of a posteriori probabilities (i.e., the HMM associated with each pattern unit is trained independently of the other models). Discriminative learning algorithms do exist for HMM, but in general they have not scaled well to large problems.
- A priori choice of model topology and statistical distributions, e.g., assuming that the Probability Density Functions (PDF) associated with the states are multivariate Gaussian densities or mixtures of multivariate Gaussian densities, each with a diagonal-only covariance matrix (i.e., possible correlation between the components of the acoustic vectors is disregarded).
- Assumption that the state sequences are first-order Markov chains.
- Typically, very limited acoustical context is used, so that possible correlation between successive acoustic vectors is not modeled very well.
Much Artificial Neural Network (ANN) based automatic pattern recognition research has been motivated by these problems.
3 Estimating HMM Likelihoods with ANN

ANN can be used to classify patterns into classes such as speech units. For statistical recognition systems, the role of the local estimator is to approximate probabilities or PDF. Practically, given the basic HMM equations, we would like to estimate something like p(xn\qk), the value of the probability density function of the observed data vector given the hypothesized HMM state. The ANN, in particular the MLP, can be trained to produce the posterior probability p(qk\xn) of the HMM state given the acoustic data (Fig. 2). This can be converted to emission PDF values using Bayes' rule. Several authors [1-4, 7-10] have shown for speech recognition that ANN can be trained to estimate a posteriori probabilities of output classes conditioned on the input pattern. Recently, this property has been successfully used in HMM systems, referred to as hybrid HMM/ANN systems, in which ANN are trained to estimate local probabilities p(qk\xn) of HMM states given the acoustic data. Since the network outputs approximate Bayesian probabilities, gk(xn,Θ) is an estimate of:
p(q_k \mid x_n) = \frac{p(x_n \mid q_k)\, p(q_k)}{p(x_n)}    (1)
which implicitly contains the a priori class probability p(qk). It is thus possible to vary the class priors during classification without retraining, since these probabilities occur only as multiplicative terms in producing the network outputs. As a result, class probabilities can be adjusted during use of a classifier to compensate for training data with class probabilities that are not representative of actual use or test conditions [18], [19]. Thus, scaled likelihoods p(xn|qk) for use as emission probabilities in standard HMM can be obtained by dividing the network outputs gk(xn) by the class priors estimated on the training set, which gives us an estimate of:
\frac{p(x_n \mid q_k)}{p(x_n)}    (2)
During recognition, the scaling factor p(xn) is a constant for all classes and will not change the classification. It could be argued that, when dividing by the priors, we are using a scaled likelihood, which is no longer a discriminative criterion. However, this need not be true, since the discriminative training has affected the parametric optimization of the system that is used during recognition. Thus, this permits use of the standard HMM formalism, while taking advantage of ANN characteristics.
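To make the conversion concrete, the following sketch (an illustration only, not the authors' implementation; the array values and the prior estimates are assumed) divides the MLP output posteriors by class priors estimated from the training alignment to obtain the scaled likelihoods used as HMM emission scores:

```python
import numpy as np

def scaled_likelihoods(mlp_posteriors, state_priors, floor=1e-10):
    """Convert MLP posteriors p(q_k|x_n) into scaled likelihoods
    p(x_n|q_k)/p(x_n) by dividing by the class priors p(q_k)."""
    return mlp_posteriors / np.maximum(state_priors, floor)

# Example: 4 frames of (hypothetical) MLP outputs over 3 HMM states.
posteriors = np.array([[0.7, 0.2, 0.1],
                       [0.5, 0.4, 0.1],
                       [0.2, 0.6, 0.2],
                       [0.1, 0.3, 0.6]])
# Priors estimated as relative state frequencies in the training alignment.
priors = np.array([0.5, 0.3, 0.2])
print(scaled_likelihoods(posteriors, priors))
```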
Fig. 2. The MLP that estimates local observation probabilities
4 Motivations for ANN

ANN have several advantages that make them particularly attractive for Pattern Recognition (PR) [4, 9, 10, 14], e.g.:
- They can provide discriminative learning between pattern units, i.e., the HMM states that are represented by ANN output classes. That is, when trained for classification (using common cost functions such as Mean Squared Error (MSE) or relative entropy), the parameters of the ANN output classes are trained to maximize the discrimination between the correct output class and the rival ones. In other words, ANN do not only train and optimize the parameters of each class on the data belonging to that class, but also attempt to reject data belonging to the other (rival) classes. This is in contrast to the likelihood criterion, which does not lead to minimization of the error rate.
- Because ANN can incorporate multiple constraints for classification, features do not need to be assumed independent. More generally, there is no need for strong assumptions about the statistical distributions of the input features (as is usually required in standard HMM).
- They have a very flexible architecture, which easily accommodates contextual inputs and feedback, and both binary and continuous inputs.
- ANN are highly parallel and regular structures, which makes them especially amenable to high-performance architectures and hardware implementations.
5 Clustering Procedure

Clustering is a method for dividing a scattered set of data into several groups. It is commonly viewed as an instance of unsupervised learning. The grouping of the patterns is then accomplished through clustering by defining and quantifying similarities
between the individual data points or patterns. The patterns that are similar to the highest extent are assigned to the same cluster. Clustering analysis is based on partitioning a collection of data points into a number of subgroups, where the objects inside a cluster show a certain degree of closeness or similarity. In our work, two clustering methods are used: the K-Means (KM) and the Fuzzy C-Means (FCM) algorithms.

5.1 K-Means Algorithm

This is the most heavily used clustering algorithm, because it is used as an initial process for many other algorithms. As can be seen in Fig. 3, where the pseudo-code is presented [9], the KM algorithm is provided somehow with an initial partition of the database, and the centroids of these initial clusters are calculated. Then, the instances of the database are relocated to the cluster represented by the nearest centroid in an attempt to reduce the square-error. This relocation of the instances is done following the instance order. If an instance in the relocation step (Step 3) changes its cluster membership, then the centroids of the clusters Cs and Ct and the square-error should be recomputed. This process is repeated until convergence, that is, until the square-error cannot be further reduced, which means no instance changes its cluster membership.
Step 1. Select somehow an initial partition of the database in k clusters {C1, …, Ck}.
Step 2. Calculate the cluster centroids
    w_i = \frac{1}{k_i} \sum_{j=1}^{k_i} w_{ij}, \quad i = 1, \ldots, k
Step 3. FOR every w_i in the database and following the instance order DO
    Step 3.1. Reassign instance w_i to its closest cluster centroid; w_i ∈ C_s is moved from C_s to C_t if
        \| w_i − w_t \| \le \| w_i − w_j \|  for all j = 1, …, k, j ≠ s.
    Step 3.2. Recalculate the centroids of clusters C_s and C_t.
Step 4. IF cluster memberships stabilized THEN stop ELSE go to Step 3.

Fig. 3. The pseudo-code of the K-Means algorithm [9]
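A minimal runnable sketch of the relocation scheme of Fig. 3 is given below (illustrative only; the random initial partition, the data and the empty-cluster handling are our own assumptions):

```python
import numpy as np

def k_means(data, k, max_iter=100, seed=0):
    """Plain K-Means: relocate every instance to its nearest centroid and
    recompute centroids until memberships stabilize (cf. Fig. 3)."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    labels = np.full(len(data), -1)
    for _ in range(max_iter):
        # Step 3: reassign every instance to its closest cluster centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):      # Step 4: memberships stabilized
            break
        labels = new_labels
        # Step 3.2: recalculate the centroid of every non-empty cluster.
        for i in range(k):
            if np.any(labels == i):
                centroids[i] = data[labels == i].mean(axis=0)
    return centroids, labels

# Tiny illustrative run on random 2-D data.
X = np.random.default_rng(1).normal(size=(200, 2))
centers, memberships = k_means(X, k=3)
```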
5.2 Drawbacks of the K-Means Algorithm

Despite being used in a wide array of applications, the KM algorithm is not exempt from drawbacks. Some of these drawbacks have been extensively reported in the literature. The most important are listed below:
- As with many clustering methods, the KM algorithm assumes that the number of clusters k in the database is known beforehand, which, obviously, is not necessarily true in real-world applications.
- As an iterative technique, the KM algorithm is especially sensitive to initial starting conditions (initial clusters and instance order).
- The KM algorithm converges finitely to a local minimum. The run of the algorithm defines a deterministic mapping from the initial solution to the final one.
- Vector quantization, and in particular the KM algorithm, provides a hard, fixed and non-probabilistic decision, which does not transmit enough information on the real observations.
Many clustering algorithms based on fuzzy logic concepts have been motivated by this last problem.

5.3 Proposed Clustering Algorithm

In general, and for example in the speech context, a purely acoustic segmentation cannot suitably detect the basic units of the vocal signal. One of the causes is that the borders between these units are not acoustically defined. For this reason, we were interested in using automatic classification methods based on fuzzy logic in order to segment the data vectors. Among the suitable algorithms, we have chosen the FCM algorithm, which has already been used successfully in various fields and especially in image processing [13], [15]. The FCM algorithm is a clustering method which allows one piece of data to belong to two or more clusters. The measurement data are used in order to take account of the pattern data considered in the spectral domain only. The method is applied to search for some general regularity in the collocation of patterns, focused on finding a certain class of geometrical shapes favored by the particular objective function [5]. The FCM algorithm is based on the minimization of the following objective function, with respect to U, a fuzzy c-partition of the data set, and to V, a set of K prototypes [5]:

J_m(U, V) = \sum_{j=1}^{n} \sum_{i=1}^{c} u_{ij}^{m} \, \| X_j − V_i \|^{2}, \qquad 1 \le m < \infty    (3)

where m is any real number greater than 1, u_ij is the degree of membership of x_j in cluster i, x_j is the j-th d-dimensional measured data point, V_i is the d-dimensional center of cluster i, and \|\cdot\| is any norm expressing the similarity between any measured data and the center. The fuzzy partition is carried out through an iterative optimization of (3), with the update of the memberships u_ij and the cluster centers V_i given by [5]:
u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{d_{ij}}{d_{ik}} \right)^{\frac{2}{m-1}}}    (4)
V_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} X_j}{\sum_{j=1}^{n} u_{ij}^{m}}    (5)
The iteration stops when \max_{ij} | u_{ij} − \hat{u}_{ij} | < \varepsilon, where ε is a termination criterion between 0 and 1.
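The FCM iteration of Eqs. (3)-(5) can be sketched as follows (an illustration under assumed data, with m = 2 and a small termination threshold; it is not the exact codebook-training code of our system):

```python
import numpy as np

def fuzzy_c_means(data, c, m=2.0, eps=1e-4, max_iter=100, seed=0):
    """Fuzzy C-Means: alternate the membership update (4) and the center
    update (5) until max_ij |u_ij - u_ij_old| < eps."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(data), c))
    u /= u.sum(axis=1, keepdims=True)               # fuzzy c-partition constraint
    centers = None
    for _ in range(max_iter):
        um = u ** m
        centers = (um.T @ data) / um.sum(axis=0)[:, None]              # Eq. (5)
        dist = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)
        u_new = 1.0 / ((dist[:, :, None] / dist[:, None, :])
                       ** (2.0 / (m - 1))).sum(axis=2)                 # Eq. (4)
        if np.abs(u_new - u).max() < eps:           # termination criterion
            u = u_new
            break
        u = u_new
    return centers, u

# Example: cluster 150 random 12-dimensional vectors into c = 8 fuzzy classes.
X = np.random.default_rng(2).normal(size=(150, 12))
centers, memberships = fuzzy_c_means(X, c=8)
```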
6 Validation on Speech and Biomedical Signal Classification Paradigm

Assume further that for each class in the vocabulary we have a training set of k occurrences (instances) of each class, where each instance of the categories constitutes an observation sequence. In order to build our tool, we perform the following operations:
1. For each class v in the vocabulary, we estimate the model parameters λv (A, B, π) that optimize the likelihood of the training set for the vth category.
2. For each unknown category to be recognized, the processing of Fig. 4 is carried out: measurement of the observation sequence O = {o1, o2, …, oT} via a feature analysis of the signal corresponding to the class; computation of the model likelihoods for all possible models, P(O|λv), 1 ≤ v ≤ V; and, at the end, selection of the category with the highest likelihood.
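The decision step amounts to an argmax over the per-model likelihoods, as in the following sketch (the scoring function is a placeholder; in our hybrid system it would be a Viterbi pass using the scaled MLP likelihoods of Section 3):

```python
def recognize(observation_seq, models, score_fn):
    """Return the class whose model maximizes P(O | lambda_v),
    together with all model scores."""
    scores = {v: score_fn(observation_seq, lam) for v, lam in models.items()}
    best_class = max(scores, key=scores.get)
    return best_class, scores

# Hypothetical usage: `models` maps class labels to trained HMM parameters
# and `viterbi_log_likelihood` computes log P(O | lambda_v).
# best, scores = recognize(O, models, viterbi_log_likelihood)
```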
Fig. 4. Block diagram of a speech and biomedical database HMM recognizer
6.1 Speech Database

The speech database used in this work contains about 50 speakers saying their last name, first name, city of birth and city of residence in the Arabic language. Each word was pronounced 10 times. We have chosen the vocabulary in an artificial way in order to avoid repetitions in the words of the vocabulary. The training set used in the following experiments consists of 2000 sounds (1500 sounds for training and 500 for cross-validation) used to adapt the learning rate of the MLP [1]. The test set was said by 8 speakers (4 men and 4 women), each pronouncing the sequence 5 times. The training and test data were recorded in an emphasized environment using a microphone. Speech recordings were sampled at 11 kHz. After preemphasis (factor 0.95) and application of Hamming windows, we use the log RASTA-PLP (RelAtive SpecTrAl processing – Perceptual Linear Predictive) features [6]. These parameters were computed every 10 ms on analysis windows of 30 ms. Each frame is represented by 12 components plus energy (log RASTA-PLP + E). The values of the 13 coefficients are standardized by their standard deviation measured on the training frames. The feature set for our hybrid HMM/MLP system was based on 26-dimensional vectors composed of the log RASTA-PLP cepstral parameters, the first time derivative of the cepstral vectors, the ∆ energy and the ∆∆ energy. Nine frames of contextual information were used at the input of the MLP (9 frames of context is known to usually yield the best recognition performance) [4]. The acoustic features were quantized into independent codebooks according to the KM and FCM algorithms, respectively:
- 128 clusters for the log RASTA-PLP vectors.
- 128 clusters for the first time derivative of the cepstral vectors.
- 32 clusters for the first time derivative of the energy.
- 32 clusters for the second time derivative of the energy.
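The sketch below illustrates how the four quantized streams can be turned into the sparse binary MLP input used later in Section 6.3 (one bit per codebook field, 9 frames of context); the codebook contents and frame indices are assumed for illustration:

```python
import numpy as np

CODEBOOK_SIZES = (128, 128, 32, 32)   # log RASTA-PLP, delta-cepstra, delta-E, delta-delta-E

def quantize(frame_features, codebooks):
    """Hard (KM) quantization: nearest-codeword index for each feature stream."""
    return tuple(int(np.linalg.norm(cb - f, axis=1).argmin())
                 for f, cb in zip(frame_features, codebooks))

def one_hot_input(index_seq):
    """Concatenate one-hot codes of the 4 streams over a 9-frame context
    window, giving the 2880-dimensional binary MLP input with 36 bits on."""
    fields = []
    for indices in index_seq:                       # 9 consecutive frames
        for idx, size in zip(indices, CODEBOOK_SIZES):
            v = np.zeros(size)
            v[idx] = 1.0                            # one bit "on" per field
            fields.append(v)
    return np.concatenate(fields)

frames = [(3, 17, 5, 1)] * 9                        # assumed codebook indices
x = one_hot_input(frames)
assert x.size == 2880 and int(x.sum()) == 36
```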
6.2 Biomedical Database

For the task of biomedical database recognition using the HMM/MLP model, the object of this study is the classification of an electric signal coming from a medical test [16]; the experimental device is described in Fig. 5. The signals used are called Potentials Evoked Auditory (PEA); examples of PEA signals are illustrated in Fig. 6. Indeed, functional otoneurological exploration possesses a technique permitting the objective study of the nervous conduction along the auditory pathways. The classification of the PEA is a first step in the development of a diagnosis-support tool. The main difficulty of this classification resides in the resemblance of signals corresponding to different pathologies, but also in the disparity of the signals within the same class. The results of the medical test can indeed differ between two measures for the same patient. The PEA signals obtained from the examination and their associated pathologies are stored in a database containing the files of 11185 patients. We chose 3 categories of patients (3 classes) according to the type of their trouble. The categories of patients are:
1) Normal (N): the patients of this category have normal audition (normal class). 2) Endocochlear (E): these patients suffer from disorders that affect the part of the ear situated before the cochlea (endocochlear class). 3) Retrocochlear (R): these patients suffer from disorders that affect the part of the ear situated at the level of the cochlea or after the cochlea (retrocochlear class). We selected 213 signals (corresponding to 213 patients), such that every process (signal) contains 128 parameters. Among the 213 signals, 92 belong to the class N, 83 to the class E and 38 to the class R. To construct our training base, we chose the signals corresponding to pathologies indicated as certain by the physician. All PEA signals come from the same experimental system. In order to evaluate the realized HMM/MLP system and for performance comparison, we applied the same conditions already respected in the work of the group described in [17], which uses a heterogeneous multi-network structure based on RBF and LVQ networks, and in the work described in [11], [12], which uses a discrete HMM. For this reason, the training base contains 24 signals, of which 11 correspond to the class R, 6 to the class E and 7 to the class N. After the training phase, when the non-learned signals are presented to the HMM, the corresponding class must be designated. For the case in which we wish to use an HMM with a discrete observation symbol density, rather than the continuous vectors above, a vector quantizer (VQ) is required to map each continuous observation vector into a discrete codebook index. Once the codebook of vectors has been obtained, the mapping between continuous vectors and codebook indices becomes a simple nearest-neighbor computation, i.e., the continuous vector is assigned the index of the nearest codebook vector. Thus the major issue in VQ is the design of an appropriate codebook for quantization.
Fig. 5. PEA generation experiment
Fig. 6. (A) PEA signal for normal patient, (B) PEA signal for endocochlear pathology
A codebook size of M = 64 vectors has been used in the biomedical database recognition experiments using the HMM/MLP model.
6.3 Training and Recognition Model

Strictly left-to-right discrete HMM with 10 and 5 states are trained for the speech and PEA signals, respectively. For the speech signals, the emission probabilities are computed from an MLP with 9 frames of quantized acoustic vectors at the input, i.e., the current acoustic vector preceded by the 4 acoustic vectors of the left context and followed by the 4 acoustic vectors of the right context. Each acoustic vector was represented by a binary vector in the case of KM clustering, composed of 4 fields (representing the 4 acoustic features) respectively containing 128, 128, 32 and 32 bits. In each field, only one bit was "on" to represent the current associated cluster. Since 9 frames of acoustic vectors were used at the input of the MLP, this resulted in a (very sparse) binary MLP input layer of dimension 2880 with only 36 bits "on". In the case of FCM clustering, we represented each cepstral parameter (log RASTA-PLP, ∆ log RASTA-PLP, ∆E, ∆∆E) by a real vector whose components define the membership degrees of the parameter to the various classes of the codebook. The number of hidden neurons was chosen empirically. The output layer is made up of as many neurons as there are HMM states. Thus an MLP with only one hidden layer, including 2880 neurons at the input, 30 neurons in the hidden layer and 10 output neurons, was trained for the speech signals. For the PEA signals, an MLP with 64 neurons at the input, 18 neurons in the hidden layer and 5 output neurons was trained.

6.4 Results and Discussion

The recognition rates in Figs. 7 and 8 show that the hybrid HMM/MLP system using the FCM clustering gave the best results for the speech and biomedical data classification. However, we can draw a preliminary conclusion from the results reported in Figs. 7 and 8: for the PEA signals diagnosis, by comparing its rate with the classification rate of the system using the multi-network structure (RBF/LVQ) described in [17], the rate of the discrete HMM is better. The hybrid discrete HMM/MLP
Fig. 7. Recognition rates for the speech data base
Fig. 8. Recognition rates for the biomedical data base
approach always outperforms the standard discrete HMM, and the hybrid discrete HMM/MLP system using the FCM clustering outperforms the hybrid discrete HMM/MLP system using the KM clustering.
7 Conclusion and Future Work

In this paper, we presented a discriminative training algorithm for a hybrid HMM/MLP system based on FCM clustering. Our results on isolated speech and biomedical signal recognition tasks show an increase in the estimates of the posterior probabilities of the correct class after training, and significant decreases in error rates in comparison to three systems: 1) discrete HMM, 2) the discrete HMM/MLP approach with KM clustering, and 3) the multi-network RBF/LVQ structure for the PEA signals diagnosis. These preliminary experiments have set a baseline performance for our hybrid FCM/HMM/MLP system. Better recognition rates were observed. From the effectiveness point of view of the models, it seems obvious that the hybrid models are more powerful than the discrete HMM or the multi-network RBF/LVQ structure for the PEA signals diagnosis. We thus envisage improving the performance of the suggested system through the following points:
- It would be important to use other techniques of speech parameter extraction and to compare the recognition rate of the system with that using the log RASTA-PLP analysis. We think of using LDA (Linear Discriminant Analysis) and CMS (Cepstral Mean Subtraction), owing to the fact that these representations are currently considered among the most powerful in ASR.
- It appears also interesting to use continuous HMM with a multi-Gaussian distribution and to compare the performance of the system with that of the discrete HMM.
- In addition, for an extended speech vocabulary, it is interesting to use phoneme models instead of word models, which facilitates training with relatively small bases.
- For the PEA signals recognition, the main idea is to define a fusion scheme: cooperation of the HMM with the multi-network RBF/LVQ structure in order to arrive at a hybrid model, and to compare its performance with the HMM/MLP/FCM model proposed in this paper.
References 1. Deroo, O., Riis, C., Malfrere, F., Leich, H., Dupont, S., Fontaine, V., BoÎte, J.M.: Hybrid HMM/ANN system for speaker independent continuous speech recognition in French. Thesis, Faculté polytechnique de Mons – TCTS, BELGIUM (1997) 2. Bourlard, H., Dupont, S.: Sub-band-based speech recognition. In: Proc. IEEE International Conf. Acoustic, Speech and Signal Process, Munich, pp. 1251–1254 (1997) 3. Berthommier, F., Glotin, H.: A new SNR-feature mapping for robust multi-stream speech recognition. In: Proceeding of International Congress on Phonetic Sciences (ICPhS), Sanfrancisco, vol. XIV, pp. 711–715. University of California, Berkeley (1999) 4. Boite, J.-M., Bourlard, H., D’Hoore, B., Accaino, S., Vantieghem, J.: Task independent and dependent training: performance comparison of HMM and hybrid HMM/MLP approaches, vol. I, pp. 617–620. IEEE, Los Alamitos (1994) 5. Bezdek, J.C., Keller, J., Krishnapwam, R., Pal, N.R.: Fuzzy models and algorithms for pattern recognition and image processing. Kluwer, Boston (1999) 6. Hermansky, H., Morgan, N.: RASTA Processing of speech. IEEE Trans. on Speech and Audio Processing 2(4), 578–589 (1994) 7. Hagen, A., Morris, A.: Comparison of HMM experts with MLP experts in the full combination multi-band approach to robust ASR. To appear in International Conference on Spoken Language Processing, Beijing (2000) 8. Hagen, A., Morris, A.: From multi-band full combination to multi-stream full combination processing in robust ASR. To appear in ISCA Tutorial Research Workshop ASR2000, Paris, France (2000) 9. Lazli, L., Sellami, M.: Hybrid HMM-MLP system based on fuzzy logic for arabic speech recognition. In: PRIS2003, The Third International Workshop on Pattern Recognition in Information Systems, Angers, France, April 22-23, pp. 150–155 (2003) 10. Lazli, L., Sellami, M.: Connectionist Probability Estimators in HMM Speech Recognition using Fuzzy Logic. In: Perner, P., Rosenfeld, A. (eds.) MLDM 2003. LNCS, vol. 2734, Springer, Heidelberg (2003) 11. Lazli, L., Chebira, A.-N., Madani, K.: Hidden Markov Models for Complex Pattern Classification. In: Ninth International Conference on Pattern Recognition and Information Processing, PRIP 2007, Minsk, Belarus, May 22-24 (2007), http://uiip.basnet.by/conf/prip2007/prip2007.php-id=200.htm 12. Lazli, L., Chebira, A.-N., Laskri, M.-T., Madani, K.: Using hidden Markov models for classification of potentials evoked auditory. In: Conférence maghrébine sur les technologies de l’information, MCSEAI 2008, Ustmb, Oran, Algeria, April 28-30, pp. 477–480 (2008) 13. Pham, D.-L., Prince, J.-L.: An Adaptive Fuzzy C-means algorithm for Image Segmentation in the presence of Intensity In homogeneities. Pattern Recognition Letters 20(1), 57– 68 (1999)
14. Riis, S.-K., Krogh, A.: Hidden Neural Networks: A framework for HMM-NN hybrids. In: IEEE 1997, to appear in Proc. ICASSP 1997, Munich, Germany, April 21-24 (1997) 15. Timm, H.: Fuzzy Cluster Analysis of Classified Data. IFSA/Nafips, Vancouver (2001) 16. Motsh, J.-F.: La dynamique temporelle du trons cérébral: Recueil, extraction et analyse optimale des potentiels évoqués auditifs du tronc cérébral Thesis, University of Créteil Paris XII (1987) 17. Dujardin, A.-S.: Pertinence d’une approche hybride multi-neuronale dans la résolution de problèmes liés au diagnostic industrièle ou médical. Internal report, I2S laboratory, IUT of "Sénart Fontainebleau, University of Paris XII, Avenue Pierre Point, 77127 Lieusaint, France (2006) 18. Morris, A., Hagen, A., Glotin, H., Bourlard, H.: Multi-stream adaptative evidence combination for noise robust ASR. Accepted for publication in Speech Communication (2000) 19. Morris, A., Hagen, A., Bourlard, H.: MAP combination of multi-stream HMM or HMM/ANN experts. Accepted for publication in Euro-speech 2001, Special Event Noise Robust Recognition, Aalborg, Denmark (2001)
Mobile-Embedded Smart Guide for the Blind
Danyia AbdulRasool1 and Susan Sabra2
1 Kuwait, Kuwait, [email protected]
2 Manama, Bahrain, [email protected]
Abstract. This paper presents a new device to help the blind find his/her way around obstacles. Some people use different aids to cope with their visual impairment, such as glasses, Braille, seeing eye dogs, canes, and adaptive computer technology. Our device, called "Smart Guide" (SG), helps the blind find his/her way by using a mobile phone. SG integrates different technologies, packed in a small compartment, where it consists of a Bluetooth antenna, a sensor, a central processing unit, a memory and speakers. The device has a sensor which sends a signal to the mobile application when detecting any solid object. After that, the mobile will notify the blind user and give him/her a warning alarm to change direction and get on the right way. The warning alarm can work in three modes: voice alarm, beeping alert, and vibration. The goal of SG is to convert signals of sensed objects into an audio output, to help visually impaired people move without colliding with surrounding objects or people. SG is light to carry, easy to use, affordable and efficient. Interviews and surveys with doctors and a group of blind individuals in an organization for the disabled in Kuwait helped obtain very positive feedback on the success of using the device. The device is patented today under Kuwaiti law.

Keywords: Pervasive Computing, Blind, Bluetooth, Mobile Application, Audio Processing.
1 Introduction

Today, people embrace technology and incorporate it in their daily life activities so that it becomes a necessity rather than a luxury. For this reason, we have to think of new ways to help those who cannot benefit from the technology in a conventional way. Transforming technology to conform to human needs requires a lot of research, not least to serve the needs of the disabled. Current research investigates the use of dynamically updated verbal descriptions, messages whose content changes with respect to the movement through the environment [3]. Blind navigators are a group of the disabled who have their share of new devices using state-of-the-art technology to help them overcome movement difficulties. Automatic way-finding verbal aids for blind pedestrians in simple and structured urban areas [2] have been developed; such devices include Angel Wing, Wand, Touch & Go, and UltraCane. As a strongly ongoing area of research, developing such devices using different technologies has become a priority and a necessity in developed and developing countries.
The Ministry of Social Services in Kuwait is dedicated to serving the disabled and overseeing their needs through numerous local non-profit organizations and institutions. As a social worker, one of our responsibilities was to visit these organizations and assess the requirements to satisfy the needs of those disabled people. Our focus was on the blind, where it was noticed that they need better devices to help them find their way by overcoming the difficulties and shortcomings of the currently used devices, such as high cost, bulky size, heavy weight, inaccurate detection, etc. Hence, the idea of "Smart Guide" (SG) was born to overcome the mishaps of the other devices combined and to help the blind be more independent and feel more comfortable when in public. SG is distinguished from the other devices because:
• It is not expensive and suitable for any standard of living or income level.
• It is high in value and quality.
• It is easy to use.
• It has a nicely designed shape that could also be used as an accessory.
• Most importantly, it can detect all objects, even large windows, walls and doors that are made of glass.
2 Smart Guide (SG) Smart Guide belongs to more than one domain of technology due to its integration of characteristics of multiple technologies at once. Unobtrusive pervasive computing has a number of applications within the personal and professional worlds of the mobile device user. Conceptual drivers involved are [1]:
Intelligent, mobile and portable devices; Intelligent agent applications triggered by sensing the environment; Distributed networking environments mediated by wireless telecommunications; Convenient, unobtrusive access to relevant information anywhere and anytime; Richness of location-based information and applications.
Unobtrusive pervasive computing aims to naturalize the computing environment within the environment of the user or other people so they will not realize its presence. This is exactly what blind people need to overcome their fear of being noticed in public. Many people depend today on appliances and devices such as mobile phones for the convenience and the variety of services they provide. This is why we thought our device would be conveniently carried when embedded in a mobile phone. SG is now designed to work with a mobile application via Bluetooth. (see Figure 1).
Fig. 1. Smart Guide Use Case Diagram
Fig. 2. Smart Guide Components
2.1 Smart Guide Components
As shown in Figure 2, SG consists of a combination of hardware and software. Most of the hardware parts were manufactured by Parallax, a private company that designs and manufactures microcontroller development tools and small single-board computers.

2.1.1 Hardware

Some of the hardware components used in SG are:
- Ultrasonic sensor
- BASIC Stamp model 2
- Board of Education - USB
- Easy Bluetooth
- 7V Power Adapter
- Power Converter (Transistor): the converter helps to convert the battery power to 5V.
- Mobile phone with the Symbian operating system, Bluetooth-enabled.

Ultrasonic Sensor

A sensor is a device which responds to an input quantity by generating a functionally related output, usually in the form of an electrical or optical signal. Sensors are used to measure basic physical phenomena such as shock and vibration, humidity, flow rate, magnetic fields, pressure, proximity, sound, temperature, and velocity. Our focus is on proximity sensors that measure distance using technologies like Infrared, Radio Waves (RADAR), Ultrasonic Waves (SONAR), etc. An ultrasonic sensor utilizes the reflection of high-frequency (20 kHz) sound waves to detect parts or distances to the parts [4]. Parallax's PING ultrasonic sensor provides a very low-cost and easy method of distance measurement. This sensor is perfect for any number of applications that require performing measurements between moving or stationary objects. The Ping sensor measures distance using sonar; an ultrasonic (well above human hearing) pulse is transmitted from the unit, and the distance to the target is determined by measuring the time required for the echo to return. The output from the PING sensor is a variable-width pulse that corresponds to the distance to the target. This sensor can read from a 2 cm up to 3 m range using simple pulse in/pulse out communication with 20 mA power consumption. It operates and endures temperatures from +32 to +158 °F (0 to +70 °C).

BASIC Stamp Model

The BASIC Stamp requests a measurement from the Ping sensor by sending it a brief pulse, which causes it to emit a 40 kHz chirp. Then, the Ping listens for an echo of that chirp. It reports the echo by sending a pulse back to the BASIC Stamp that is equal to the time it took for the Ping sensor to receive the echo. To calculate the distance based on the echo time measurement, the speed of sound must be converted into units that are convenient for the BASIC Stamp. This involves converting meters per second to centimeters per pulse in measurement units. The main features of the BASIC Stamp include a processor with 20 MHz speed, program execution at a speed of ~4,000 PBASIC instructions/sec, and a RAM size of 32 Bytes (6 I/O, 26 Variable).

Board of Education - USB

The Board of Education "Full Kit" used (as shown in Figure 3) contains a Board of Education carrier board, a BASIC Stamp 2 module, pluggable wires, and a USB cable. No power supply is included [6].
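As a rough illustration of this conversion (not Parallax's firmware; the echo time and the 343 m/s speed of sound at room temperature are assumptions), the object distance follows from halving the round-trip echo time:

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343    # ~343 m/s at room temperature

def echo_to_distance_cm(echo_time_us):
    """The PING echo pulse covers the round trip, so halve it before
    converting the time-of-flight into a distance."""
    return (echo_time_us / 2.0) * SPEED_OF_SOUND_CM_PER_US

print(echo_to_distance_cm(11600))    # ~199 cm, i.e. an object about 2 m away
```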
Fig. 3. Board of Education (source: http://www.parallax.com [4])
1. Battery place or clip: the Board of Education needs a 9V battery.
3. Barrel Jack: it is a power supply, but we cannot use the battery clip and the barrel jack at the same time.
4. Voltage regulator: it supplies 5V to the socket and the Vdd pins, which give power to the circuits built on the board area.
5. Power Indicator LED: the light signal that shows when power is supplied to the board.
6. Servo headers (x4, x5) and power select jumper: there are two headers; each of them has servo or 3-pin device connectors that bring the power.
7. Power header (x3): the socket where Vdd connects to +5 VDC, Vin connects to the power supplied to the board by the battery clip, and Vss connects to 0V.
8. Breadboard: the white plastic board with strips; it helps to connect the sensor and cables.
9. I/O pin access header (x2): it has 16 pins, from 0 to 15, connected to this header, and controls the I/O on the breadboard.
10. AppMod header (x1): provides power from the supplier to all board devices.
11. Reset button: restarts the BASIC Stamp.
12. Position power switch: On/Off switch; provides Vin to the regulator.
13. Socket for BASIC Stamp: connects the BASIC Stamp to the programming connector.
14. USB programming connector: the connection between the computer and the Board of Education.

Parallax's Easy Bluetooth

The Easy Bluetooth Serial Module is an effective and low-cost solution to free hardware applications from wires. The module is small in size and, with its SIP header design, it can fit on any 0.1" spacing breadboard for rapid prototyping. The Easy Bluetooth Module is compatible with all the Parallax microcontrollers. The module has two parts, the RBT-001 module and the SIP with voltage regulator PCB. With the on-board regulator,
the module can be connected to voltages higher than 3.3 VDC (such as the Parallax Board of Education 5 VDC regulated supply) without worry of damaging the unit, while the RX and TX can utilize serial communication at CMOS and TTL levels. The Bluetooth module has easy serial communication and low power consumption.

2.1.2 Software

Two programming languages have been used in SG: PBASIC and Java (J2ME); the former was used to write the BS2 program, while the latter was used to write the mobile application. Micro-engineering Labs' PICBASIC is a DOS command-line compiler that now ships with the Code Designer Lite IDE for Windows; Code Designer Lite allows writing code in a PicBasic-friendly environment. A true compiler provides faster program execution and longer programs than BASIC interpreters. Java 2 Micro Edition (J2ME) is designed for small devices with limited processor power and small memory size. Mobile phones, personal digital assistants (PDAs), consumer electronics, and embedded devices are common examples of J2ME-capable devices. J2ME configurations define the minimum set of Java virtual machine features and Java class libraries available for a particular category of devices. A configuration typically represents a group of devices with similar processing power and amounts of available memory. A profile is a specification that defines sets of application programming interfaces (APIs) and features, and utilizes the underlying configuration to provide a complete run-time environment for a specific kind of device.
Fig. 4. Smart Guide
Symbian is a mobile operating system (OS) targeted at mobile phones that offers a high level of integration with communication and personal information management (PIM) functionality. Symbian OS combines middleware with wireless communications through an integrated mailbox and the integration of Java and PIM functionality (agenda and contacts). The Symbian OS is open for third-party development by independent software vendors, enterprise IT departments, network operators and Symbian OS licensees. Symbian is the operating system used on the Nokia 5800, on all of Nokia's smartphones, and on smartphones from many other manufacturers too, such as Samsung, Sony Ericsson and LG. S60, also known as Series 60, is a software platform and interface that sits on top of Symbian.
2.2 Smart Guide in a Mobile Environment

Figure 4 shows the complete setup of the various components into a device that communicates with a mobile phone via Bluetooth. The Ping sensor sends a brief chirp with its ultrasonic speaker and makes it easy for the BASIC Stamp to measure the time it takes the echo to return to its ultrasonic microphone. The BASIC Stamp starts by sending the Ping sensor a pulse to start the measurement. Then the Ping sensor waits long enough for the BASIC Stamp program to start a PULSIN command. When the Ping sensor chirps, it sends a high signal to the BASIC Stamp. When the Ping sensor detects the echo with its ultrasonic microphone, it changes that high signal back to low. The BASIC Stamp's PULSIN command stores how long the high signal from the Ping sensor lasted in a variable. The time measurement is how long it took for the sound to travel to the object and back. With this measurement, our program can then use the speed of sound in air to calculate the object's distance in centimeters, inches, feet, etc., and then transform it into an audio output via Easy Bluetooth. After the mobile phone receives signals from Easy Bluetooth, the Java application for the mobile Smart Guide will output an alarm if the object is closer than 3 meters. This Java application allows the user to choose the type of warning he/she wants to receive in a form that makes him/her comfortable. There are three types of warnings:
- Voice warning option: allows the user to know how far he/she is from the object by a speech output giving the number of steps to reach it. For this option we chose the Arabic language to be set by default.
- Vibrate warning option: allows the user to sense the mobile vibrating if the object is close.
- Beeps warning option: the user will hear the mobile beeping whenever an object is close.
The voice and beep warnings have been recorded using a movie maker program and a speech program that converts text or numbers into Arabic speech recorded with Real Player.
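The warning decision can be pictured with the following sketch (a simplified illustration of the logic, not the J2ME code; the 3 m threshold comes from the text, while the step length and function names are assumptions):

```python
THRESHOLD_CM = 300        # alert only when the obstacle is closer than 3 m
STEP_LENGTH_CM = 60       # assumed average step length for the voice message

def warn(distance_cm, mode="voice"):
    """Return the warning to present for an obstacle at distance_cm."""
    if distance_cm >= THRESHOLD_CM:
        return None                                   # nothing in range
    if mode == "voice":
        steps = max(1, round(distance_cm / STEP_LENGTH_CM))
        return f"obstacle in about {steps} steps"     # spoken in Arabic in SG
    if mode == "vibrate":
        return "vibrate"
    return "beep"

print(warn(150))          # obstacle roughly 2-3 steps ahead
print(warn(450))          # None: nothing within 3 m
```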
3 Conclusions

Blind people can use our device to help detect objects across or in front of their upper body level, preventing incidents and the embarrassment of stumbling in public, which is the main fear of the visually impaired. Smart Guide can help blind people overcome their disabilities and change society's perception towards them. SG gives them a chance to become involved in the community without fear and offers them the atmosphere to interact and communicate with normal people. With Smart Guide, blind people will depend on themselves and will no longer be prevented from finding their way on their own. They will become more
independent and feel more comfortable when using such an unobtrusive computing device. Our device has been tested by blind people at the Kuwait Blind Association and the Department of Special Need Children Schools to prove its usefulness and complete success. We developed an evaluation form to collect users' feedback as well as that of the involved caregivers. The evaluation form was printed in Braille for the blind users. The results were overwhelmingly satisfactory: 90% of the users and caregivers were extremely satisfied with the new device. In comparison with other devices, Smart Guide was rated the highest in terms of functionality, practicality, cost and reliability.
Acknowledgements We would like to thank Mr. Abdulla ALEnize, founder of the Department of Special Need Children Schools in Kuwait, for sponsoring this project. Also, we would like to thank all the participants from the aforementioned department for giving us the opportunity to test our product.
References [1] Elliot, G., Phillips, N.: Mobile Commerce and Wireless Computing Systems. Pearson Addison-Wesley, London (2004) [2] Gaunet, F.: Verbal guidance rules for a localized way finding aid intended for blindpedestrians in urban areas. Universal access in the information society 4, 338–353 (2008) [3] Giudice, N.A., Bakdash, J.Z., Legge, G.E.: Wayfinding with words: spatial learning and navigation using dynamically updated verbal descriptions. Psychological Research 71, 347–358 (2006), doi:10.1007/s00426-006[4] http://www.engineershandbook.com/Components/sensors.htm (accessed on September 2010) [5] Parallax Company Website, http://www.parallax.com (accessed on April 2010) [6] http://pcworld.idg.com.au/article/190318/motherbards/?pp=1#w hatisamotherboard (accessed on December 2010) [7] http://www.webopedia.com/TERM/S/Symbian.html (accessed on April 2010) [8] http://www.zeroonezero.com/glossary/mobile-phoneapplications.html (accessed on October 2010)
MMET: A Migration Metadata Extraction Tool for Long-Term Preservation Systems
Feng Luan and Mads Nygård
Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
{luan,mads}@idi.ntnu.no
Abstract. Migration is the most often used preservation approach in long-term preservation systems. To design a migration plan, custodians need to know the technical infrastructure of a preservation system, the characteristics and provenance of digital materials, the restrictions on preservation activities, and the policies about retention rules. However, current tools cannot provide all this information: they can only output information about formats and characteristics for several given formats. Hence, in this paper, we design a migration metadata extraction tool. This tool uses the stored metadata to retrieve the above information for the custodians. The test results show that, due to limitations of the stored metadata, our solution still cannot obtain all the required information. However, it outputs more migration metadata and has better performance than current tools.

Keywords: Migration, Metadata Extraction, Long-Term Preservation System.
1 Introduction

Our society is becoming an e-society where computing technology is indispensable in people's lives. For example, government departments use e-government systems to create digital government documents, education institutions use e-learning systems to provide and digitize teaching resources, and libraries use e-library systems to store and publish digitized books, magazines, pictures, etc. When so much information is digitized or is born in a digital form, preservation becomes an important issue for information management science. Several preservation approaches, such as migration, emulation, universal virtual computer, encapsulation, and technique preservation, have been proposed and analyzed in [1-7]. Amongst them, migration is the most often used approach. It is also deemed the most promising preservation approach. In practice, when doing a migration, custodians of a preservation system must prepare a plan, test the plan, and deploy the plan. For example, [8,9] introduce methodologies to detect format obsolescence, and [10,11] introduce methodologies to select migration solutions. In these methodologies, one of the prerequisites is to obtain necessary and sufficient information about the technical infrastructure of the preservation system, the characteristics
and provenance about every type of digital materials, restrictions about preservation activities, and policies about retention. Hence, we choose our research question on how to get this information. We find that several tools have been designed for this purpose. They can scan a file system and extract metadata from files in the file system. However, these tools have some drawbacks. For instance, 1) they take much time to do the extraction; 2) the extracted metadata may not be accurate; and 3) the custodians just get format information and characteristics for several given formats. In order to overcome these drawbacks, we design a new migration metadata extraction tool (MMET), which uses a set of administrative metadata to obtain necessary information for migration. The structure of this paper is summarized as follows: We firstly in Section 2 introduce several related works and our research motivation. Secondly, our previous works on migration data requirements are shown in Section 3. The requirements would be used in MMET to specify what metadata should be retrieved. Thirdly, we summarize the design of MMET in Section 4. Fourthly, we test MMET and evaluate it with JHOVE in Section 5. Finally, a further discussion about our solution and current solutions is shown in Section 6.
2 Related Work and Motivation Current solutions to retrieve migration metadata are based on digital materials. There are three sets of such solutions. The first set focuses on the extraction of content characteristics. For example, the eXtensible Characterization Language (XCL) [12] can extract characteristics of a digital material, and can further use XCL-ontology to compare the characteristics before and after migration; ExifTool [13] can read, write and modify metadata that are embedded with the digital materials; Tika [14] can extract metadata and structured text content from various types of digital materials. The second set is to extract metadata about format. The format extension may be a clue for judging a format. However, since the file extension is modifiable, this kind of the judgment may not be trustworthy. The custodians have to use other mechanisms. For instance, in Linux files are assigned a unique identifier of a given format, so that the FILE command can use this identifier to judge a format rather than the file extension. The second example is DROID [15] that use internal and external signatures to identify the format of a digital material. These signatures are stored in a file downloaded from the format register PRONOM [16]. Using the DROID signature, the custodians can query a given format in PRONOM, and then can view the technical context for this format. Fido [17] converts the signature downloaded from PRONOM into regular expression for obtaining a good performance on the format identification task. The last set combines the functions of the above two classes. JHOVE [18] is such example. It is designed to identify a format, validate a format, extract format metadata, and audit a preservation system. JHOVE is able to support 12 formats, e.g., AIFF, ASCII, BYTESTREAM, GIF, HTML, JPEG, JPEG-2000, PDF, TIFF, UTF-8, WAVE, and XML. In addition, JHOVE provides an interface by which developers can design modules for other formats. Some projects have integrated JHOVE into their solutions. For example, AIHT [19] is a preservation assessment project. In their
assessment procedure, JHOVE is used to identify formats in a preservation system and calculate the number of files for each format. PreScan [20] is another implementation of JHOVE. Using PreScan, preservation metadata can be automatically and manually created and maintained. Another example of the last set is FITS (Format Information Tool Set) [21]. It contains a variety of third-party open source tools, such as ExifTool, JHOVE, DROID, and the FILE command. The above solutions output metadata mainly about format and characteristics. Obviously, it is not sufficient in terms of our previous research work on migration data requirements (see Section 3). Lacking sufficient metadata, it might cause some problems when designing a migration procedure. For instance, 1) the migration procedure may fail, as digital materials are encrypted; and 2) the custodians over estimate migration time, so that they may choose a fast but expensive solution. In addition, the extraction of characteristics is time-consuming. As mentioned in [20], it takes PreScan about 10 hours to extract characteristics metadata from 100 thousand files. Hence, we try to find a solution that should be more efficient and get more metadata than current solutions. When surveying preservation systems, we find that the most systems have stored many metadata together with digital materials. These metadata provide description information, structural information, and administrative information. Hence, we decide to use these stored metadata to retrieve necessary information for migration. In the following sections, we will summarize the design of this tool.
3 Quality Data Requirements on Migration

The preserved metadata may use various types of metadata schemas. In order to help the custodians identify which metadata elements are necessary to be extracted, we first designed 24 migration data requirements (see Figure 1) in our previous work [22]. Secondly, we did a survey to validate the necessity and sufficiency of the requirements. The details about the requirements and the survey comments are summarized as follows:
─ Storage: Storage metadata provide background about components in the storage system, such as its storage medium (R1), its storage driver (R2), and its storage software (R7)1. Using those metadata, the custodians can find compatible storage solutions or replacements. There are two conditions under which metadata of this category are necessary: 1) Preserved digital materials are offline data. The related storage system may be seldom accessed, so that people in the future may not know the components of the storage system. 2) The storage system depends on special storage media, storage drivers and storage software, so that the custodians must have the sufficient components to read this storage medium.
─ Hardware: Hardware metadata specify what components are necessary to build a computer system with which the custodians can read the old storage system and can run the old applications. For example, a microprocessor (R3), memory (R4), motherboard (R5), and peripherals (R6) are needed to create a basic computer system. However, the survey respondents commented that these metadata
1 R7 was in the Application category before doing the survey. However, the survey comments on R7 are often related to R1 and R2. Therefore, we moved it to the Storage category.
Fig. 1. Migration Data Requirements (✝. Necessary; *. Conditional)
should be preserved when the components of the preservation system are dependent on each other. In addition, some of them deemed that just having the name of a given computer generation is enough.
─ Application: Interpretation software applications (R8) can interpret a technical specification. Most respondents agree that the metadata on R8 must be preserved, because any interpretation software application is the key to view and manipulate the preserved digital materials.
─ Specification: Specification metadata describe techniques used for preserved digital materials. Currently, there are five kinds of techniques that may be used by digital materials, namely format (R9), identifier (R10), hyperlink (R11), encryption (R12), and fixity (R13). Most respondents believe that these requirements are necessary, because the developers of the migration plan must use them to develop a migration solution and compare various migration solutions.
─ Characteristics: Characteristics metadata define essential facets of a digital material, e.g., content (R14), appearance (R15), behavior (R16) and reference (R17). Using these facets, preservation systems may evaluate the migration results. The respondents argued that whether the preservation system should store characteristics is determined by the existence of two kinds of software applications: an application that can extract these characteristics and an application that can utilize these characteristics.
─ Provenance: Provenance metadata describe previous activities on digital materials. It includes the documentation of those activity events (R18) and all changed parts of the preserved digital materials (R19) during the migration. These data are necessary and helpful to improve the trustworthiness of the digital materials.
─ Modification Rights: Metadata on modification rights specify what kind of migration activity can be carried out. These rights may be intellectual property rights (IPRs, R20) or government law (R21). Hence, in order to keep the migration legal, the custodians must comply with those pre-specified modification rights.
─ Retention Rights: Retention rights specify a set of preservation rules, which let the custodians use the same criteria as before. The possible rules could be the preservation level (R22), important factors on characteristics (R23), and assessment methods for migration results (R24). As the theories on R23 and R24 are not mature, the survey respondents deem that these two data requirements are not necessary. However, the respondents believe that R22 is necessary to store.
4 MMET - Migration Metadata Extraction Tool

MMET is implemented in Java and depends on a structural metadata schema called METS [23], which organizes a preservation package including several digital materials. Each METS document contains 7 sub-parts: metsHdr, describing this METS file; dmdSec, describing files within this preservation package; admSec, providing administrative information about these files; fileSec, providing the location of these files; structMap, providing the organizational structure of these files; structLink, defining hyperlinks between these files; and behaviorSec, defining software behaviors necessary for viewing or interacting. Figure 2 illustrates the abstract architecture of MMET. In the architecture, there are an execution part and a specification part. In the execution part, there are 7 tasks that are allocated to four MMET components, namely MMETManager, MMETScanner, MMETExtractor and MMETSummary (see Figure 3). In the specification part, there is an external task in which migration specialists should define a set of mapping rules between the preserved metadata and the necessary metadata for migration. Because the specification part has just one task, the following description is based on the components of the execution part. The task of the specification part will be mentioned when we introduce MMETExtractor.

MMETManager
MMETManager provides a graphic interface to the custodians. Using this interface, the custodians can select the file folder under which files are going to be migrated (i.e., Task 1), and view the file situation of this folder and explore the stored metadata for each preserved digital material in an XML form (i.e., Task 7).

MMETScanner
MMETScanner carries out Task 2 and Task 3. In Task 2, a set of files, including subfolders, is retrieved from a given folder. Then, MMETScanner determines what type each file belongs to. If the type is directory, MMETScanner will go into this subfolder and do the same task as Task 2 again. If the type is file, MMETScanner will judge whether it is a METS file or not. Only METS files are sent to Task 3. Task 2 will be recursively executed until all files have been analyzed. In Task 3, the METS file is loaded into memory for analysis. Firstly, a Java library2 is used to parse METS and extract the METS sub-parts. Secondly, a set of operations is performed to extract the migration metadata. We found that among the METS sub-parts, admSec and fileSec are useful for MMET. In admSec, there are techMD,
2 From the Australian National University, http://sourceforge.net/projects/mets-api/.
Fig. 2. Abstract Architecture of MMET

Fig. 3. Components of the Execution Part
rightsMD, digiprovMD, and sourceMD. Each of them contains a wrapper (named mdWrap) or a reference (named mdRef) linking to an XML file. Both the wrapper and the file contain a set of administrative metadata that can provide the migration metadata. In fileSec, a set of files is listed, which are the preserved digital materials. In addition, these files have links that connect to techMD, rightsMD, digiprovMD, and sourceMD. Figure 4 illustrates the relation between fileSec and admSec. Hence, MMET retrieves files from fileSec. Following the links of these files, MMET can go to techMD, rightsMD, digiprovMD, or sourceMD for retrieving the possible wrapper.
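A minimal sketch of this retrieval step is shown below (illustrative only, and written in Python rather than the Java of MMET; the folder layout and the use of ElementTree are assumptions). It walks a folder, keeps the METS documents, and follows each file's ADMID links from fileSec into the administrative sections:

```python
import os
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"

def find_mets_files(root_folder):
    """Task 2: recursively scan the folder and keep only METS documents."""
    for dirpath, _dirs, files in os.walk(root_folder):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                if ET.parse(path).getroot().tag == METS + "mets":
                    yield path
            except ET.ParseError:
                continue              # not an XML (hence not a METS) file

def admsec_wrappers(mets_path):
    """Task 3: for every file listed in fileSec, follow its ADMID links into
    techMD/rightsMD/digiprovMD/sourceMD and return the linked sections."""
    root = ET.parse(mets_path).getroot()
    adm = {md.get("ID"): md for md in root.iter()
           if md.tag in (METS + "techMD", METS + "rightsMD",
                         METS + "digiprovMD", METS + "sourceMD")}
    wrappers = {}
    for f in root.iter(METS + "file"):
        admids = (f.get("ADMID") or "").split()
        wrappers[f.get("ID")] = [adm[a] for a in admids if a in adm]
    return wrappers
```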
Fig. 4. Relationship between FileSec and AdmSec

MMETExtractor
MMETExtractor contains Task 4, which extracts the migration metadata from the XML wrapper generated in Task 3. However, before doing Task 4, MMET requires an external task in the specification part, i.e., a migration specialist designs a set of mapping rules between the wrapper schema and the migration data requirements. For instance, in our test dataset, the wrapper uses PREMIS-v1.0 [24] and MIX-v1.0 [25]. PREMIS-v1.0 includes many metadata on the archive package, the single material, rights, events, and agents. Hence, there are many mapping rules for PREMIS-v1.0 (see Table 1). As for MIX-v1.0, it mainly provides the characteristics metadata for digital images. This information is just related to R14, so the mapping table has one entry, i.e., MIX-v1.0 -> R14. Using the mapping rules, Task 4 is able to query the migration metadata. As the test environment has some constraints, we cannot use any database. Hence, the Java interface mechanism is used. The interface defines several abstract query operations, whilst the Java class for a given wrapper schema provides the implementation of those abstract operations. For instance, in our implementation, the PREMIS-v1.0 Java class uses the XML Path Language (XPath) [26] to retrieve the migration metadata. Using this interface mechanism, it is easy for MMET to support any wrapper schema. Finally, Task 4 will transfer the migration metadata to MMETSummary.

MMETSummary
MMETSummary
MMETSummary does Task 5 and Task 6. In Task 5, MMETSummary receives the migration metadata of a given digital material from MMETScanner and stores this migration metadata together with other metadata. Like the mapping rules, we have to give up databases and use an in-memory XML data structure, i.e., the Document Object Model (DOM), to store the summary information. In addition, MMET saves the migration metadata to an XML file, because the custodians may need to check a single digital material's migration metadata.
Table 1. Mapping Table for PREMIS-v1 and our Migration Data Requirements
Category / Req. / Elements in PREMIS-v1.0
Storage
  R1: Storage.storageMedium
  R2: n/a
  R7: n/a
  R3: Hardware Environment.hardware.{hwName, hwType, hwOtherInformation}
  R4: Hardware Environment.hardware.{hwName, hwType, hwOtherInformation}
  R5: Hardware Environment.hardware.{hwName, hwType, hwOtherInformation}
  R6: Hardware Environment.hardware.{hwName, hwType, hwOtherInformation}
  R8: Application Environment.Software.{swName, swVersion, swType, swOtherInformation, swDependency}; CreatingApplication.{creatingApplicationName, creatingApplicationVersion, dateCreatedByApplication, creatingApplicationExtension}
Specification
  R9: objectCharacteristics.format.formatDesignation.{formatName, formatVersion}
  R10: objectIdentifier.objectIdentifierType
  R11: relationship.relatedObjectIdentification.relatedObjectIdentifierType; relationship.relatedEventIdentification.relatedEventIdentifierType; linkingEventIdentifier.relatedEventIdentifierType; linkingIntellectualEntityIdentifier.linkingIntellectualEntityIdentifierType; linkingPermisionStatementIdentifier.linkingPermissionStatementIdentifierType
  R12: objectCharacteristics.inhibitors.{inhibitorType, inhibitorTarget}
  R13: objectCharacteristics.Fixity.{messageDigestAlgorithm, messageDigestOriginator}
Characteristics
  R14: objectCharacteristics.significantProperties
  R15: objectCharacteristics.significantProperties
  R16: objectCharacteristics.significantProperties
  R17: objectCharacteristics.significantProperties
Provenance
  R18: eventType; eventDateTime; linkingAgentIdentifier.{linkingAgentIdentifierType, linkingAgentIdentifierValue, linkingAgentRole}
  R19: eventOutcomeInformation.{eventOutcome, eventOutcomeDetail}
Modification rights
  R20: permissionStatement.*
  R21: permissionStatement.*
Retention rights
  R22: preservationLevel
  R23: n/a
  R24: n/a
*. All sub-elements of a given element should be provided.
When all files in the user-specified folder have been analyzed, Task 2 invokes Task 6 in MMETSummary to store a report about the overview of this folder in an XML file. In the XML file, the first level contains the categories of the migration data requirements. The second level contains instances of the requirements; all the instances are organized in terms of the classification of the requirements. The third and last level contains the identifiers of the preserved digital materials. Every instance of the requirements lists all identifiers of its digital materials. In addition, every identifier has an attribute giving the location of an XML file in which the migration metadata is stored.
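A minimal sketch of such a three-level report, built with the standard DOM API, is shown below (our own illustration; the element and attribute names and the example values are invented, not MMET's actual vocabulary):

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Sketch of the report layout: category -> requirement instance -> material identifier.
public class ReportSketch {
    public static Document buildReport() throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
                                             .newDocumentBuilder()
                                             .newDocument();
        Element report = doc.createElement("report");
        doc.appendChild(report);

        // First level: a category of the migration data requirements.
        Element category = doc.createElement("category");
        category.setAttribute("name", "Specification");
        report.appendChild(category);

        // Second level: an instance of a requirement, e.g. a concrete format.
        Element instance = doc.createElement("instance");
        instance.setAttribute("requirement", "R9");
        instance.setAttribute("value", "JPEG2000");
        category.appendChild(instance);

        // Third level: the identifier of a preserved digital material, with the
        // location of the XML file holding its migration metadata.
        Element material = doc.createElement("material");
        material.setAttribute("id", "urn:example:book-0001/page-0001");
        material.setAttribute("location", "metadata/book-0001-page-0001.xml");
        instance.appendChild(material);

        return doc;
    }
}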
5 Experiment Results and Evaluation
In the MMET experiment, we use a number of METS files from the National Library of Norway. The test results show that MMET successfully retrieves much information from the preserved metadata. However, some information still cannot be retrieved, as the preservation metadata schema does not have related elements, or the data are not stored in the preservation system at all. As for the speed aspect, Table 2 summarizes the average times of MMET. The results show that the overall time grows linearly. When scanning 1 million METS files, MMET takes nearly 7.8 hours. Hence, we stopped the test at the scale of one million, as it would take more than 3 days for MMET to scan 10 million METS files.
Table 2. Performance of MMET (in sec)
Files   Task 3    Task 4      Task 5    Task 6    Other   Overall
≈10²    0.72      3.37        0.35      0.31      0.05    4.81
≈10³    2.91      22.46       3.81      1.63      0.09    30.91
≈10⁴    23.11     204.78      38.28     14.30     0.52    280.98
≈10⁵    250.65    2044.84     445.36    147.54    5.22    2893.61
≈10⁶    2448.72 (≈40.8 min)   20337.21 (≈5.7 hr)   3898.90 (≈1.1 hr)   1451.18 (≈24.2 min)   66.39 (≈1.1 min)   28202.39 (≈7.8 hr)
Table 3. Performances of MMET, JHOVE and JHOVE Audit* (in hr)
Dataset      MMET    JHOVE Audit   JHOVE
303.3 GB     ≈1.3    ≈1.1          ≈52.0
606.6 GB     ≈2.6    ≈2.3          n/a
909.9 GB     ≈3.9    ≈3.3          n/a
1213.2 GB    ≈5.4    ≈4.5          n/a
*. JHOVE means JHOVE does the characteristics extraction function, whilst JHOVE Audit means JHOVE just does the audit function.
We further evaluate MMET against JHOVE. We chose JHOVE because it is often used in preservation systems. The evaluation focuses on the efficiency and on the quality and quantity of migration metadata. We use digital books as our test dataset. Every page of a digital book is stored in the JPEG-2000 format and the JPEG format, respectively. In addition, the content of the book is extracted by an OCR machine and is stored in an XML file. The associated metadata files are METS files and the output of JHOVE. We find that it is time-consuming for JHOVE to extract characteristics for each digital material. For instance, JHOVE spends nearly 52 hours on a 303.3 GB dataset, whereas MMET only needs 78 minutes for the same dataset. As the characteristics extraction function of JHOVE spends too much time, we also use the audit function of JHOVE (denoted JHOVE Audit), which validates file formats and creates an inventory of the file system. We tested JHOVE Audit and MMET; Table 3 illustrates the evaluation results. JHOVE Audit and MMET have similar speeds, but JHOVE is very slow.
As for the quality and the quantity of the retrieved metadata, JHOVE Audit creates few metadata. It just reports the validity status, the format types in the MIME classification, and the number of files for a given format and folder. For instance, for the 303.3 GB dataset, JHOVE Audit reports that all files are valid and that there are 4 kinds of formats, i.e., image/jp2, image/jpg, text/plain with the US-ASCII charset, and text/plain with the UTF-8 charset3. Compared against the real situation, we found this information is not accurate: JHOVE Audit recognizes most of the XML files that use UTF-8 as US-ASCII. JHOVE creates more metadata than JHOVE Audit. For each file, JHOVE shows not only the validity and the MIME format type, but it also retrieves the metadata embedded in the file and generates characteristics metadata based on the content. For instance, JHOVE uses MIX-v1.0 to store the characteristics of images. MMET provides more metadata than JHOVE Audit and JHOVE. In the MMET report, there are many metadata about storage, software, format, identifier, reference, fixity, preservation level, the schema for wrapping provenance, and the schema for wrapping characteristics. As for the format metadata, MMET reports JPEG2000, JPEG-1.01, and XML-1.0, which matches the real situation. Therefore, for the quality and the quantity of the output metadata, MMET is the best in our evaluation.
6 Further Discussion
There are two methods to obtain information for a migration plan design. The first method, called the file-based solution, directly analyzes the digital materials, like JHOVE. The second method, named the metadata-based solution, retrieves information from the preserved metadata, like MMET. At different points in time, these two methods play different roles. For instance, when a digital material is inserted into the preservation system, there are few metadata. Hence, the metadata-based solution will not work at all and the file-based solution should be used. However, in the preservation period, the metadata-based solution works better than the file-based solution. The file-based solution can only be used for some simple functions, such as identifying formats. This is because 1) the file-based solution is slow when it realizes a complex function, e.g., characteristics extraction; 2) the extracted metadata may not reflect the real situation; and 3) many redundant files, which were once used but are no longer important, may be involved in the computation of the file-based solution. The metadata-based solution performs well in the preservation period. It can retrieve many metadata, and the retrieved metadata are more accurate than those of the file-based solution. Moreover, the metadata-based solution does not need to access the preserved digital materials when the custodians design a migration plan. This advantage helps to increase the security of the preservation system, and makes it possible for the preservation system to outsource the migration plan design job. For instance, a third-party institution can assess risks in the preservation system and design corresponding solutions. However, the metadata-based solution has some limitations: 1) the quality and the quantity of the preserved metadata affect the migration metadata; and 2) a manual intervention is involved, e.g., defining mapping rules.
3 Text/plain with the US-ASCII or UTF-8 charset refers to an XML format. Since our test environment has no Internet access, the XML module of JHOVE cannot be used.
However, speed is a big challenge for both the metadata-based solution and the file-based solution. In our test, the 1213.2 GB dataset contains 1280 digitized books with 57380 pages in total. MMET needs around 5.4 hours to retrieve the metadata, and JHOVE Audit needs 4.5 hours. However, large preservation systems, such as national libraries or national archives, hold hundreds of thousands of books. When all these books are digitized, it may take many days or months to retrieve the metadata. In this situation, neither the metadata-based solution nor the file-based solution is adequate. The possible solutions are 1) using parallel computing techniques in the metadata-based and file-based solutions, and 2) transferring the management task for the metadata from the application level to the system level. For example, there exists a preservation-aware storage in [27], into which some metadata can be added.
7 Conclusion
Migration is a time-consuming and expensive task for preservation systems. When the custodians design a migration plan, they need to obtain necessary and sufficient metadata. In terms of our previous study on the migration data requirements, we find that the current solutions provide only a part of the necessary metadata. Hence, MMET is designed to analyze the preserved metadata and retrieve related metadata from them. In the experiment, MMET outputs many metadata for migration. However, in terms of the migration data requirements, some metadata still cannot be retrieved, as the preservation system does not store them at all. For the performance aspect, in almost the same time, MMET can obtain both the overview of the file system and the metadata of every digital material, whereas JHOVE Audit just generates the overview.
Acknowledgements. Research in this paper is funded by the Norwegian Research Council and our industry partners under the LongRec project. We would also like to thank our partners in LongRec, especially the National Library of Norway, for providing the experiment environment and technical support.
References 1. The Consultative committee for Space Data Systems: The Reference Model for an Open Archival Information System, OAIS (2002) 2. Lee, K.H., Slattery, O., Lu, R., Tang, X., McCrary, V.: The State of the Art and Practice in Digital Preservation. Journal of Research of the National Institute of Standards and Technology 107(1), 93–106 (2002) 3. Thibodeau, K.: Overview of Technological Approaches to Digital Preservation and Challenges in Coming Years. CLIR Reports, Conference Proceedings of The State of Digital Preservation: An International Perspective (2002) 4. Wheatley, P.: Migration–a CAMiLEON discussion paper. Ariadne 29(2) (2001) 5. Granger, S.: Emulation as a Digital Preservation Strategy. D-Lib Magazine 6(10) (2000) 6. Lorie, R.A.: A methodology and system for preserving digital data. In: Proceedings of the 2nd ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 312–319. ACM, New York (2002)
7. Borghoff, U., Rödig, P., Schmitz, L., Scheffczyk, J.: Migration: Current Research and Development. In: Long-Term Preservation of Digital Documents, pp. 171–206. Springer, Heidelberg (2006) 8. Stanescu, A.: Assessing the durability of formats in a digital preservation environment The INFORM methodology. OCLC Systems & Services: International Digital Library Perspectives 21(1), 61–81 (2005) 9. Li, C., Zheng, X.H., Meng, X., Wang, L., Xing, C.X.: A methodology for measuring the preservation durability of digital formats. Journal of Zhejiang University - Science C 11(11), 872–881 (2010) 10. Strodl, S., Becker, C., Neumayer, R., Rauber, A.: How to choose a digital preservation strategy: evaluating a preservation planning procedure. In: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 29–38. ACM, New York (2007) 11. Becker, C., Kulovits, H., Guttenbrunner, M., Strodl, S., Rauber, A., Hofman, H.: Systematic planning for digital preservation: evaluating potential strategies and building preservation plans. International Journal on Digital Libraries 10(4), 133–157 (2009) 12. Thaller, M., Heydegger, V., Schnasse, J., Beyl, S., Chudobkaite, E.: Significant Characteristics to Abstract Content: Long Term Preservation of Information. In: ChristensenDalsgaard, B., Castelli, D., Ammitzbøll Jurik, B., Lippincott, J. (eds.) ECDL 2008. LNCS, vol. 5173, pp. 41–49. Springer, Heidelberg (2008) 13. ExifTool, http://www.sno.phy.queensu.ca/~phil/exiftool/ 14. Tika, http://tika.apache.org/ 15. DROID, http://sourceforge.net/apps/mediawiki/droid/ index.php?title=Main_Page 16. PRONOM, http://www.nationalarchives.gov.uk/pronom/ 17. Fido, https://github.com/openplanets/fido 18. Abrams, S.L.: The role of format in digital preservation. Vine 34, 49–55 (2004) 19. Anderson, R., Frost, H., Hoebelheinrich, N., Johnson, K.: The AIHT at Stanford University: Automated preservation assessment of heterogeneous digital collections. D-Lib magazine 11, 12 (2005) 20. Marketakis, Y., Tzanakis, M., Tzitzikas, Y.: PreScan: towards automating the preservation of digital objects. In: Proceedings of the International Conference on Management of Emergent Digital EcoSystems 2009, vol. 60; 411, ACM, New York (2009) 21. File Information Tool Set (FITS), http://code.google.com/p/fits/ 22. Luan, F., Mestl, T., Nygård, M.: Quality Requirements of Migration Metadata in LongTerm Digital Preservation Systems. In: Sánchez-Alonso, S., Athanasiadis, I.N. (eds.) Metadata and Semantic Research, vol. 108, pp. 172–182. Springer, Heidelberg (2010) 23. McDonough, J.: METS: standardized encoding for digital library objects. International journal on digital libraries 6(2), 148–158 (2006) 24. PREMIS Data Dictionary for Preservation Metadata 1.0. In. The PREMIS Editorial Committee (2005) 25. ANSI/NISO Z39.87 - Data Dictionary - Technical Metadata for Digital Still Images. ANSI/NISO (2006) 26. Clark, J., DeRose, S.: XML Path Language (XPath) version 1.0 w3c recommendation.Technical Report REC-xpath-19991116. World Wide Web Consortium (1999) 27. Factor, M., Naor, D., Rabinovici-Cohen, S., Ramati, L., Reshef, P., Satran, J., Giaretta, D.L.: Preservation DataStores: Architecture for Preservation Aware Storage. In: 24th IEEE Conference on Mass Storage Systems and Technologies, MSST 2007, pp. 3–15 (2007)
RDF2SPIN: Mapping Semantic Graphs to SPIN Model Checker Mahdi Gueffaz, Sylvain Rampacek, and Christophe Nicolle LE2I, UMR CNRS 5158 University of Bourgogne, 21000 Dijon, France {Mahdi.Gueffaz,Sylvain.Rampacek, Christophe.Nicolle}@u-bourgogne.fr
Abstract. The most frequently used language to represent semantic graphs is RDF (the W3C standard for meta-modeling). The construction of semantic graphs is a source of numerous errors of interpretation, and the processing of large semantic graphs is a limit to the use of semantics in current information systems. The work presented in this paper is part of a new line of research at the border between two areas: the semantic web and model checking. For this, we developed a tool, RDF2SPIN, which converts RDF graphs into the SPIN language. This conversion aims at checking semantic graphs with the model checker SPIN in order to verify the consistency of the data. To illustrate our proposal we used RDF graphs derived from IFC files. These files represent digital 3D building models. Our final goal is to check the consistency of IFC files that are built from the cooperation of heterogeneous information sources. Keywords: Semantic graph, RDF, Model-Checking, Temporal logic, SPIN, IFC, BIM.
1 Introduction
The increasing development of networks, and especially the Internet, has greatly widened the heterogeneity gap between information systems. Glancing over the studies about the interoperability of heterogeneous information systems, we discover that all works tend towards the resolution of semantic heterogeneity problems. The W3C1 now suggests standards to represent semantics by means of ontologies. Ontology is becoming an inescapable support for information systems interoperability, particularly in the Semantic Web. The literature now generally agrees on Gruber's terms to define an ontology: an explicit specification of a shared conceptualization of a domain [1]. The physical structure of an ontology is a combination of concepts, properties and relationships. This combination is also called a semantic graph. Several languages have been developed in the context of the Semantic Web and most of these languages use XML2 as syntax [2]. OWL3 [3] and RDF4 [4] are the most
1 World Wide Web Consortium.
2 eXtensible Markup Language.
3 Web Ontology Language.
4 Resource Description Framework.
important languages of the Semantic Web; they are based on XML. OWL allows representing an ontology, and it offers machines a greater capacity for interpreting Web content. RDF enhances the ease of automatic processing of Web resources. RDF (Resource Description Framework) is the first W3C standard for enriching resources on the Web with detailed descriptions. The descriptions may be characteristics of resources, such as the author or the content of a website. These descriptions are metadata. Enriching the Web with metadata allows the development of the so-called Semantic Web [5]. RDF is also used to represent semantic graphs corresponding to a specific knowledge modeling. For example, in AEC5 projects, some papers used RDF to model knowledge from heterogeneous sources (electricians, plumbers, architects...). In this domain, some models have been developed providing a common syntax to represent building objects. The most recent is the IFC6 [6] model developed by the International Alliance for Interoperability. The IFC model is a new type of BIM7 and requires tools to check the consistency of the heterogeneous data and the impact of the addition of new objects into the building. As IFC graphs have a large size, their checking, handling and inspection are very delicate tasks. In [7] we presented a conversion from IFC to RDF. In this paper, we propose a new approach using formal verification, which consists in transforming semantic graphs into a model and verifying it with a model checker. We developed a tool called "RDF2SPIN" that transforms semantic graphs into a model represented in the SPIN [8] language. After this transformation, SPIN verifies the correctness of the model, written in the PROMELA8 language, against temporal logic properties in order to verify the consistency of the data described in the model of the huge semantic graphs. The rest of this paper is organized as follows. In Section 2 we present an overview of semantic graphs, especially the structure of RDF graphs, and of model checking. Then, in Section 3, we describe the mapping of semantic graphs into models, and our approach is defined in Section 4. Finally, we end with the conclusion.
2 An Overview of Semantic Graphs and Model Checking
RDF is also used to represent semantic graphs corresponding to a specific knowledge modeling. It is a language developed by the W3C to bring a semantic layer to the Web [9]. It allows the connection of Web resources using directed labeled edges. The structure of RDF documents is a complex directed labeled graph. An RDF document is a set of triples, as shown in Figure 1. The predicate (also called property) connects the subject (resource) to the object (value). Thus, the subject and the object are nodes of the graph connected by an edge directed from the subject towards the object. The nodes and the edges belong to the "resource" type. A resource is identified by a URI9 [10, 11].
5 Architecture Engineering Construction.
6 Industrial Foundation Classes.
7 Building Information Model.
8 Process Meta Language.
9 Uniform Resource Identifier.
Fig. 1. RDF triplet
The declarations can also be represented as a graph, with the nodes as resources and values, and the arcs as properties. The resources are represented in the graph by circles, the properties by directed arcs, and the values by boxes (rectangles). Values can be resources if they are described by additional properties. For example, when a value is a resource in another triplet, the value is represented by a circle.
Fig. 2. Example of a partial RDF graph
The RDF graph in Figure 2 defines a node "University of Bourgogne" located at "Dijon", having as country "France" and as department "Cote d'Or". RDF documents can be written in various syntaxes, e.g., N3 [12], N-Triples [13], and RDF/XML. The RDF/XML document corresponding to Figure 2 can be obtained as shown below.
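As a minimal sketch (our own illustration; the Apache Jena API is an assumption and is not mentioned in the paper), the Figure 2 triples can be built programmatically and serialized to RDF/XML:

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Property;
import org.apache.jena.rdf.model.Resource;

// Builds the Figure 2 graph and prints its RDF/XML serialization.
public class Figure2Example {
    public static void main(String[] args) {
        String ns = "http://example.org/";
        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("ex", ns);

        Resource university = model.createResource(ns + "University_of_Bourgogne");
        Property location   = model.createProperty(ns, "Location");
        Property country    = model.createProperty(ns, "Country");
        Property department = model.createProperty(ns, "Department");

        university.addProperty(location,   model.createResource(ns + "Dijon"));
        university.addProperty(country,    model.createResource(ns + "France"));
        university.addProperty(department, model.createResource(ns + "Cote_d'or"));

        model.write(System.out, "RDF/XML");   // the serialization of the Figure 2 graph
    }
}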
Model checking [14], described in Figure 3, is a verification technique that explores all possible system states in a brute-force manner. Similar to a computer chess program that checks all possible moves, a model checker, the software tool that performs the model checking, examines all possible system scenarios in a systematic manner. In this way, it can be shown that a given system model truly satisfies a certain property. Even subtle errors that remain undiscovered using emulation, testing and simulation can potentially be revealed using model checking.
To make a rigorous verification possible, properties should be described in a precise, unambiguous way. Temporal logic is used to express these properties. Temporal logic is a form of modal logic that is appropriate for specifying relevant properties of systems; it is basically an extension of traditional propositional logic with operators that refer to the behavior of systems over time.
Fig. 3. Model Checking approach
The following algorithm explains the way model checking works. First, we put all the properties expressed in temporal logic on a stack. They are verified one by one against the model; if a property is not satisfied by the model, either the model or the property must be refined. In case of a memory overflow, the model must be reduced. Whereas formal verification techniques such as simulation and model checking are based on a model description from which all possible system states can be generated, testing, another type of verification technique, is applicable even in cases where it is hard or even impossible to obtain a system model.
Algorithm: Model-checking
Begin
  while stack ≠ nil do
    p := top(stack);
    if ¬satisfied(p) then
      refine the model or the property;
    else if satisfied(p) then
      p := top(stack);
    else // out of memory
      try to reduce the model;
  end
End
3 The Mapping
This section describes our approach, which consists in transforming semantic graphs into models in order to verify them with the model checker. For this, we developed the "RDF2SPIN" tool, which transforms a semantic graph into the PROMELA [8] language for the model checker SPIN. The RDF graphs considered here are represented as verbose XML files, in which the information is not stored hierarchically (the so-called graph point of view). On the one hand, these RDF graphs are not necessarily connected, meaning they may have no root vertex from which all the other vertices are reachable. On the other hand, the models manipulated by the verification tools always have a root vertex, which corresponds to the initial state of the system whose behavior is represented. The transformation of an RDF graph is articulated in three steps: exploring the RDF graph, determining a root vertex and, as a final step, generating the model of the RDF graph. The third step is divided into three sub-steps: the first two consist in generating two tables (the table of triplets and the table of resources and values), and the last one consists in producing the PROMELA language.
Table of triplets - Going through the RDF graph with graph traversal algorithms, we create a table consisting of resources, properties and values. In our RDF graph, the resource is a vertex, the property represents the edge, and the value is the successor vertex corresponding to the edge of the vertex. The table of triplets of the RDF graph is used in the next step to create the table of resources and values.
Table of resources and values - Browsing the table of triplets built in the previous step, we assign to each resource and to each value a unique function. These functions are of proctype type. We combine all these functions in a table called the table of resources and values, as one can see in the example in Section 3.4.
PROMELA language - In this last step, we write the PROMELA file corresponding to the RDF graph that we want to check. We start by writing the function of the main root of the graph and, for each property of the root, we call the function of the corresponding value. We do the same for all "resource" functions defined in the table of resources and values. For the "value" functions, we just display their contents [15].
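A small sketch of this last sub-step is given below (our own illustration, not the RDF2SPIN source; the class, method and identifier-mangling choices are assumptions). Each node of the triples table becomes a proctype; a resource proctype runs the proctype of each of its values, and leaf proctypes only print their contents:

import java.util.*;

// Sketch: turn a triples table into PROMELA proctypes (assumes an acyclic
// graph rooted at the chosen vertex, so every proctype is declared before it is run).
public class Rdf2SpinSketch {

    // Turns a URI or label into a legal PROMELA identifier.
    static String ident(String node) {
        return "n_" + node.replaceAll("[^A-Za-z0-9]", "_");
    }

    static String toPromela(List<String[]> triples, String root) {
        Map<String, List<String[]>> outgoing = new LinkedHashMap<>();
        LinkedHashSet<String> nodes = new LinkedHashSet<>();
        nodes.add(root);
        for (String[] t : triples) {               // t = {resource, property, value}
            outgoing.computeIfAbsent(t[0], k -> new ArrayList<>()).add(t);
            nodes.add(t[0]);
            nodes.add(t[2]);
        }
        List<String> ordered = new ArrayList<>(nodes);
        Collections.reverse(ordered);              // emit leaves first, root last

        StringBuilder pml = new StringBuilder();
        for (String node : ordered) {
            pml.append("proctype ").append(ident(node)).append("() {\n")
               .append("  printf(\"").append(node).append("\\n\");\n");
            for (String[] t : outgoing.getOrDefault(node, List.of())) {
                pml.append("  run ").append(ident(t[2]))
                   .append("(); /* property: ").append(t[1]).append(" */\n");
            }
            pml.append("}\n");
        }
        // The init process corresponds to the root vertex chosen in step 2.
        pml.append("init { run ").append(ident(root)).append("() }\n");
        return pml.toString();
    }

    public static void main(String[] args) {
        List<String[]> triples = List.of(
                new String[]{"b1", "contains", "floor"},
                new String[]{"floor", "contains", "wall"});
        System.out.println(toPromela(triples, "b1"));
    }
}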
4 The Verification with the Model Checker
As we saw in Section 2, the model checker needs properties in order to check the model of the semantic graphs. These properties are expressed in temporal logic. The concepts of temporal logic, used for the first time by Pnueli [16] in the specification of formal properties, are fairly easy to use, and the operators are very close to natural language. The formalization in temporal logic is simple enough, although this apparent simplicity still requires significant expertise. Temporal logic allows representing and reasoning about certain properties of the system, so it is well suited for system verification. There are two main temporal logics, namely linear time and
branching time. In linear time temporal logic, each execution of the system is analysed independently. In this case, a system satisfies a formula f if f holds along every execution. Branching time combines all possible executions of the system into a single tree; each path in the tree is a possible representation of the system execution. This section details our approach, which consists in transforming semantic graphs into models so that they can be verified by the model checker. For this, we have developed a tool called "RDF2SPIN" that transforms semantic graphs into the SPIN language.
Fig. 4. Our architecture
The architecture in Figure 4 is divided into two phases. The first phase concerns the transformation of the semantic graph into a model using our tool "RDF2SPIN", as described in Section 3. The second phase concerns the verification of the properties, expressed in temporal logic, on the model using the model checker SPIN. To illustrate our approach, we take the RDF graph represented in Figure 5 and the temporal logic formula expressed in Table 1 to verify whether the BIM "b1" contains a floor.
Fig. 5. Example of partial RDF graph
Table 1. Temporal logic formula
Temporal logic: Eventually (b1 → Next Next floor)
Meaning: Is there a floor after two states starting from the state b1?
Result: True
We tested several RDF graphs representing buildings (as shown in Figure 6) with our tool "RDF2SPIN", using a machine with a 2.4 GHz processor and 4 GB of RAM, and measured the conversion time as shown in Figure 7. Note that the RDF2SPIN tool is fast at converting semantic graphs: it takes almost 12 seconds for a graph of 53 MB. The conversion time follows a polynomial curve.
Fig. 6. The 3D view of an IFC file
Fig. 7. Conversion time of semantic graphs
5 Conclusion
This paper presents how to transform a semantic graph into a model for verification using a powerful formal method, namely model checking. Since the model checker does not understand semantic graphs, we developed the tool RDF2SPIN to convert them into the SPIN language so that they can be verified against temporal logic properties. This transformation is made for the purpose of classifying large semantic graphs in order to verify the consistency of IFC files representing 3D buildings.
References 1. Gruber, T.R.: Toward principles for the design of ontologies used for knowledge sharing. In: Presented at the Padua workshop on Formal Ontology, vol. 43(4-5), pp. 907–928 (November 1995); later published in International Journal of Human-Computer Studies (March 1993) 2. Bray, T., Paoli, J., Sperberg-McQueen, C. M., Maler, E., Yergeau, F., Cowan, J.: Extensible Markup Language (XML) 1.1 (second edition) W3C recommendation (2006), http://www.w3.org/TR/2006/REC-xml11-20060816/ 3. Bechhofer, S., van Harmelen, F., Hendler J., Horrocks, I., McGuinness, D., PatelSchneijder, P., Andrea Stein, L.: OWL Web Ontology Language Reference, World Wide Web Consortium (W3C) (2004), http://www.w3.org/TR/owl-ref/ 4. Becket, D., McBride, B.: RDF/ XML Syntax Specification (Revised). W3C recommendation (2004), http://www.w3.org/TR/2004/REC-rdf-syntax-grammar20040210/ 5. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American, 34–43 (2001) 6. IFC Model, Industrial Foundation classes, International Alliance for interoperability (2008), http://www.buildingsmart.com/ 7. Vanland, R., Nicolle, C., Cruz, C.: IFC and Buildings Lifecycle Management. Journal of Automation in Construction (2008) 8. Ben-Ari, M.: Principles of the SPIN Model Checker. Springer, Heidelberg (2008); ISBN: 978-1-84628-769-5 9. Klyne, J.J.C.G.: Resource Description Framework (rdf): Concepts and abstract syntax. Tech. rep., W3C (2004) 10. Bönström, V., Hinze, A., Schweppe, H.: Storing RDF as a graph. In: Latin American WWW conference, Santiago, Chile. (2003) 11. Berners-Lee, T. W3C recommandation (2007), http://www.w3.org/DesignIssues/HTTP-URI 12. Berners-Lee, T., Connolly, D.: Notation3 (N3): A readable RDF syntax. W3C recommendation (2008), http://www.w3.org/TeamSubmission/n3/ 13. Becket, D., McBride, B.: RDF test cases. W3C Working draft (2004), http://www.w3.org/TR/rdf-testcases/ 14. Katoen, J.P.: The principal of Model Checking. University of Twente (2002) 15. Gueffaz, M., Rampacek, S., Nicolle, C.: SCALESEM: Evaluation of Semantic graph based on Model Checking. The 7th International Conference on Web Information Systems and Technologies, WEBIST. Noordwijkerhout, Hollande (May 2011) 16. Pnueli, A.: The temporal logic of programs. In: Proc. 18th IEEE Symp. Foundations of Computer Science (FOCS 1977), Providence, RI, USA, pp. 46–57 (1977)
C-Lash: A Cache System for Optimizing NAND Flash Memory Performance and Lifetime Jalil Boukhobza and Pierre Olivier Université Européenne de Bretagne, France Université de Brest ; CNRS, UMR 3192 Lab-STICC, 20 avenue Le Gorgeu 29285 Brest cedex 3, France {jalil.boukhobza,pierre.olivier}@univ-brest.fr
Abstract. NAND flash memories are the most important storage media in mobile computing and tend to be less and less confined to this area. Nevertheless, the technology is not yet mature enough to allow widespread use, owing to the poor performance of write operations caused by its internal intricacies. The major constraint of such a technology is the limited number of erase operations, which restricts its lifetime. To cope with this issue, state-of-the-art solutions try to level the wear-out of the memory to increase its lifetime. These policies, integrated into the Flash Translation Layer (FTL), contribute to decreasing write performance. In this paper, we propose to improve performance and reduce the number of erasures by absorbing them through a dual cache system which replaces the traditional FTL wear leveling and garbage collection services. C-lash enhances state-of-the-art FTL performance by more than an order of magnitude for some real and synthetic workloads. Keywords: NAND Flash memory, cache, FTL, wear leveling, performance, storage system, I/O workload.
1 Introduction
NAND flash memories are more and more used as main storage systems. We can find them in MP3 players, smart phones, laptop computers, and a huge set of electronic appliances. NAND flash memory is based on semiconductor chips, giving it some very interesting characteristics: it is small, lightweight, shock resistant, and very power efficient. For all these reasons, flash memories are considered as a promising technology for large mass storage systems [9][10][11]. Flash memory storage is, however, very expensive; it is one order of magnitude more expensive (~5$/GB in early 2009 [10]) than hard disk storage (~0.3$/GB in 2008 [9]). However, the costs are falling drastically due to the mastering of the fabrication process and exponential market growth [12]. This continuous fall in the price per byte encourages both large scale storage system vendors and users to test flash memories for future evolutions. When doing so, enterprises are faced with a major problem related to the poor performance of flash [2]: some disks still outperform flash memories for both sequential and random
write intensive workloads. Our current research focuses on the enhancement of sequential and some random write-intensive workloads on NAND flash media using caching mechanisms.
The poor lifetime of flash memories is due to the limited number of erase operations one can perform on a given block (see the background section). In addition to this limitation, flash memories present poor write performance because writing can be performed in only one direction (from 1 to 0), so a block must be erased before being modified. The FTL is a hardware/software layer implemented inside the flash-based device. One of its main functionalities is the mapping of logical addresses to low-level physical addresses. Through this mapping, wear leveling is performed, which consists in spreading the write/modify operations over the flash memory surface to increase the average block lifetime. In order to modify data, the block must be erased and rewritten or completely copied to another location. The use of such a technique implies taking into account the validity of data, as many versions can be present in different blocks. Garbage collection mechanisms are used to recover free space.
Wear leveling in the FTL relies on an on-flash SRAM-based buffer where the logical-to-physical mapping tables are stored. The use of such SRAM being expensive, FTL designers have tried to minimize the size of such metadata. In fact, one has to find a trade-off between increasing the performance of the flash memory by allowing a page mapping algorithm, which consumes a large amount of SRAM but reduces the number of block erase operations, and reducing the SRAM usage by increasing the granularity of the mapping, which increases the number of erase operations (see the background section). State-of-the-art solutions are located between those two extremes. We propose, in this paper, to replace the existing wear leveling techniques with a small, flexible dual cache that takes into account the structure and the constraints of flash memory. The C-lash system is mainly used to delay and to order write operations.
The main research contributions of this study are: 1) a novel caching technique to replace the existing wear leveling (and garbage collection) solutions; 2) the implementation of a simulator used for performance evaluation, based on FlashSim [13][2], which is built by extending DiskSim 3.0 [14]; 3) the validation with a large number of real enterprise workloads, such as different OLTP (On-Line Transaction Processing) applications, and synthetic workloads to demonstrate the performance impact for different workload patterns.
The paper is organized as follows: a description of flash memories is given in the background section and some related works are discussed. In the third section, the C-lash architecture and policies are depicted. The following section gives the performance evaluation methodology, while the fifth section discusses the simulation results; finally, we conclude and give some perspectives in the last section.
2 Background and Related Work
Flash memories are nonvolatile EEPROMs. They are mainly of two types: 1) NOR flash and 2) NAND flash. NOR flash memories support byte random access and have a lower density and a higher cost as compared to NAND flash memories. NOR memories are more suitable for storing code [15]. NAND flash memories are, by contrast, block addressed, but offer a higher bit density and a lower cost and provide
good performance for large operations. Those properties make them more suitable for storing data [15]. This study only considers NAND memories. Flash memory is structured as follows: it is composed of one or more chips; each chip is divided into multiple planes. A plane is composed of a fixed number of blocks, each of which encloses a fixed number of pages (a multiple of 32). Current versions have blocks of 128 to 1024 KB with pages of 2-4 KB. A page actually consists of a user data space and a small metadata area [2][4]. Three key operations are possible on flash memories: read, write and erase. Read and write operations are performed on pages, whilst erase operations are performed on blocks. NAND flash does not support in-place data modification. In order to modify a page, it must be erased and rewritten in the same location or completely copied to another page to avoid the additional latency, and its corresponding logical-to-physical translation map entry is updated. One fundamental constraint of flash memories is the limited number of write/erase cycles (from 10⁴ to 10⁵). Once this number is reached, a given memory cell can no longer retain data. Some spare cells exist on the chip to cope with this issue. Due to data locality (temporal and/or spatial) causing the write operations to be performed on the same blocks, some of those memory blocks tend to wear out more quickly. This very central problem pushed researchers to design wear leveling techniques, implemented through the FTL, to even out the wear over the whole memory area, even though these techniques dramatically reduce performance.
2.1 Basic FTL Schemes
FTL systems can be classified into 3 main classes, depending on the way the logical-to-physical mapping is performed: page, block, and hybrid mappings.
Page-mapping. It is a flexible scheme in which each logical page is independently mapped onto a physical one. When a write request addressing a page (already containing data) is issued, the physical page is invalidated, and the logical page is mapped to another free one, avoiding a block erase operation. Page mapping shows high performance in terms of response time and number of erase operations. The main drawback of the page-mapping scheme is the big size of the mapping table [1].
Block-mapping. It considers block rather than page granularity for mapping. The mapping table contains the translation between logical and physical blocks, while the offset inside one block remains invariable. As compared with page-mapping, the size of the mapping table is drastically decreased. With block mapping, problems occur when a write request addresses a physical page which already contains data: in this case, the system must copy the entire logical block to another physical one. These operations are extremely costly.
Hybrid-mapping. It mixes both preceding techniques. Flash memory is divided into two parts [6]: data and log blocks. The first are mapped with block granularity while the second are mapped by page. As the page-mapping method requires a large space for metadata, the number of log blocks is limited. They maintain the most frequently accessed data. When the number of free log blocks becomes too small, part of them are merged and moved to data blocks. This operation is extremely costly.
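To make the difference in granularity concrete, the following sketch (our own simplification, not an actual FTL implementation) shows the block-mapping translation, where only the block number is remapped and the page offset inside the block stays unchanged:

// Sketch of block-mapping address translation: the logical block number is
// remapped, the page offset inside the block is left unchanged.
public class BlockMapFtl {
    private final int pagesPerBlock;
    private final int[] blockMap;          // logical block -> physical block

    public BlockMapFtl(int logicalBlocks, int pagesPerBlock) {
        this.pagesPerBlock = pagesPerBlock;
        this.blockMap = new int[logicalBlocks];
        for (int i = 0; i < logicalBlocks; i++) {
            blockMap[i] = i;               // identity mapping until blocks are moved
        }
    }

    // Translates a logical page number into a physical page number.
    public int translate(int logicalPage) {
        int logicalBlock = logicalPage / pagesPerBlock;
        int offset = logicalPage % pagesPerBlock;     // invariable offset
        return blockMap[logicalBlock] * pagesPerBlock + offset;
    }

    // Called when a logical block is copied to a new physical block.
    public void remap(int logicalBlock, int physicalBlock) {
        blockMap[logicalBlock] = physicalBlock;
    }
}

With 2 KB pages and 128 KB blocks, as simulated later in this paper, pagesPerBlock would be 64, and the table holds one entry per block instead of one per page, which is what keeps the block-mapping table small.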
2.2 Advanced FTL Schemes
We compare the cache system proposed in this paper with two efficient state-of-the-art FTL systems, named Fully Associative Sector Translation (FAST) and Demand-based Flash Translation Layer (DFTL). FAST [6] is a hybrid-based FTL. It uses a block-based mapping table and the log/data block partitioning previously described. FAST separates the log blocks (0.07% of the total space) into two spaces: a sequential and a random write log block space. DFTL [2] is a pure page-mapping FTL. It resolves the mapping table size issue by placing part of it in the media itself. To improve response times, the most accessed pages are stored in SRAM, thereby taking temporal locality into consideration. DFTL divides the flash space into 2 parts: the translation space (0.2% of the total space), which contains the flash metadata, and the data space, which contains the user data. Many other FTL systems have been developed, each with its pros and cons: a convertible (page/block) flash translation layer (CFTL) [17], a state transition fast based FTL (STAFF) [16], Mitsubishi [20], SSL [21], NFTL [18], other log block schemes such as BAST [6], BFTL (a B-Tree FTL) [19], etc.
2.3 Buffering Systems for Flash Storage Media
Even though the designed FTL techniques are more and more efficient, the performance of write operations is still very poor. Buffering systems, using an amount of RAM upstream of the FTL, have been designed to cope with this issue. They reorganize non-sequential request streams before sending them to the FTL. There are many examples of such systems in the literature: CFLRU [7], FAB [3], BPLRU [5], BPAC [8], CLC [4], etc. These systems are very different from the C-lash approach because they do not operate at the same level. They use large RAM buffers, sometimes larger than 100 MB, and are used in addition to the FTL. C-lash fits in less than 1 MB of RAM and is designed to replace some FTL services.
3 C-Lash System Architecture and Design
3.1 C-Lash Architecture
In the C-lash system, the cache area is partitioned into two distinct spaces, a page space (p-space) and a block space (b-space). P-space consists of a set of pages that can come from different blocks in the flash memory, while b-space is composed of blocks that can be directly mapped onto those of the flash media, as one can see in Fig. 1-a. This dual cache system makes it possible to represent, at the cache level, both granularities on which operations are performed on the flash memory: reads and writes on pages, and erases on blocks. The pages in the p-space and the blocks of the b-space have, respectively, the same size as the pages and blocks of the underlying flash memory. C-lash is also hierarchical; it has two levels of eviction policies: one which evicts pages from the p-space to the b-space (G in Fig. 1-a) and another in which blocks from the b-space are evicted into the flash (I in Fig. 1-a). With this scheme, we ensure that blocks, rather than pages, are always flushed to the media, which causes fewer erase operations.
Fig. 1. a) At the left hand side: structure of the C-lash cache for flash system. b) At the right hand side: examples describing different scenarios of the eviction policy for both spaces.
P-space and b-space always contain either valid or free pages and blocks, respectively. Consequently, when a read request arrives, the requested data are searched for in both spaces. If we get a cache hit, data are read from the cache (B or E in Fig. 1-a); otherwise, they are read from the flash memory (A in Fig. 1-a). In any case, a read miss does not generate a copy from the flash memory to the cache. When a write operation is issued, if the written data are present in the p-space or in the b-space, they are overwritten (respectively C or D in Fig. 1-a) with no impact on the flash media, which avoids costly erase and write operations (and a merge if the data have been modified). If the data are not in the cache, they can only be written in the first cache level, which is the p-space (C in Fig. 1-a). If enough pages are available, we use them to write the data. If not, we choose some pages to flush from the p-space to the b-space (G in Fig. 1-a) and copy the new data into the freed space (see the next section for details).
3.2 Write Cache Eviction Policies
Two eviction policies are implemented in C-lash, one for each of the p-space and the b-space.
P-space Eviction Policy. P-space contains written pages that come from different blocks in the flash memory. When a write request is issued and the considered page is neither present in the p-space nor in the b-space, a new page is allocated in the cache. If a free page is available, the new page is written and the data in the corresponding
location in the flash are invalidated. If no space is available, the system chooses one or more pages to evict into the b-space (not into the flash media). The choice of the pages to evict is achieved in two steps. First, the system searches the p-space for the largest set of pages belonging to the same block. Then, we have two different cases: 1) a free block is available in the b-space, and the subset of pages found in the first step is copied into one free block; 2) no free block is available, and then the set of victim pages is compared to the number of valid pages contained in each block of the b-space area: if there is a block containing fewer valid pages than the previously found subset, a switch operation in the SRAM is performed (F and G in Fig. 1). This means that the pages of the victim block are moved to the p-space while the subset of victim pages is moved into the freed block. This induces no impact on the flash memory. The second possibility arises when all the blocks in the b-space contain more valid pages than the subset of victim pages to evict from the p-space. In this case the b-space eviction policy is executed to flush a block into the flash media (see the next section). In the upper illustration of Fig. 1-b, we have an example of a p-space eviction; the chosen pages are those belonging to block B21, which contains the biggest number of pages. One block in the b-space contains 2 valid pages, which is less than the 3 pages to evict. Thus, C-lash switches block B4 with the 3-page subset. The system, therefore, frees 1 page without flushing data into the flash.
B-space Eviction Policy. The b-space eviction algorithm is called whenever the system needs to write a subset of pages from the p-space to the b-space while all blocks are full. Therefore, C-lash writes an entire victim block from the b-space into the flash memory media in order to free one block in the cache. An LRU algorithm is used for this eviction. Through this algorithm, the system takes into consideration, in the b-space, the temporal locality exhibited by many workloads. When a block eviction is performed, the whole corresponding block in the flash memory is erased before being replaced by the one in the cache. In case the flash media still contains some valid pages of the block in the cache to evict, a merge operation (J in Fig. 1-a) is achieved. This consists in reading the still-valid pages in the flash memory and copying them into the cache before flushing (erase then write) the whole block. This read operation can be done either during a p-space eviction, which we call early merge, or just before flushing the block to the flash media, which we call late merge. Early merge is more advantageous if the workload is read intensive and shows temporal and/or spatial locality, because we have more chances to benefit from the cached data. If the workload is write-intensive, we gain no benefit from doing the early merge, and we prefer to postpone the merge operation using late merge. By doing so, we ensure two main optimizations: 1) we read a minimum number of pages, since between the moment the pages are evicted into the b-space and the moment the block is flushed to the flash, many pages can be written and thus invalidated on the flash (no need to read them); 2) since it is possible for a block in the b-space to be moved to the p-space during a p-space eviction, it may not be worth doing the merge operation too early, as this can cause extra flash read operations.
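The two decision steps of the p-space eviction can be sketched as follows (our own illustration of the selection logic only, not the authors' implementation; page and block numbers stand in for the real cache entries):

import java.util.*;

// Sketch of the p-space victim selection: find the largest subset of cached
// pages belonging to the same flash block, then decide between a switch with a
// b-space block and a b-space eviction.
public class PspaceEviction {

    // Returns the block number owning the largest group of cached pages.
    static int selectVictimBlock(Collection<Integer> cachedPages, int pagesPerBlock) {
        Map<Integer, Integer> pagesPerOwner = new HashMap<>();
        for (int page : cachedPages) {
            pagesPerOwner.merge(page / pagesPerBlock, 1, Integer::sum);
        }
        return Collections.max(pagesPerOwner.entrySet(),
                               Map.Entry.comparingByValue()).getKey();
    }

    // Second step when no free b-space block exists: if some b-space block holds
    // fewer valid pages than the victim subset, the two are switched inside the
    // SRAM; otherwise a b-space block must first be flushed to the flash media.
    static boolean switchInsteadOfFlush(int victimSubsetSize,
                                        Collection<Integer> validPagesPerBspaceBlock) {
        return Collections.min(validPagesPerBspaceBlock) < victimSubsetSize;
    }
}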
We restrict the scope of the presented study to the use of a late merge scheme. An example of block eviction is shown in Fig. 1-b. In the lower illustration, we describe a p-space eviction leading to a b-space flush to the flash media. In the p-space eviction phase, 2 pages of block B21 are chosen to be evicted. The blocks in
the b-space contain more valid pages than the number of pages to be evicted from the p-space. So, the system needs to evict a block into the flash beforehand. After that, the system copies both pages of B21 into the freed block. In this specific example, no merge operation occurs because the flushed block is full of valid pages.
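Keeping only the late merge scheme, the flush of a b-space block can be sketched as follows (our own illustration; the LinkedHashMap-based LRU ordering and the Flash interface are assumptions, not the C-lash implementation):

import java.util.*;

// Sketch of the b-space eviction with late merge: the least recently used cached
// block is flushed; the pages still valid only on the flash are gathered at this
// point (late merge), then the flash block is erased and rewritten as a whole.
public class BspaceEviction {

    // Blocks kept in access order, so the first key is the LRU block number.
    private final LinkedHashMap<Integer, Set<Integer>> cachedBlocks =
            new LinkedHashMap<>(16, 0.75f, true);

    int evictOne(Flash flash) {
        int victim = cachedBlocks.keySet().iterator().next();   // LRU block
        Set<Integer> blockImage = cachedBlocks.remove(victim);

        // Late merge: pages valid only on the flash would be read back here and
        // added to the image that will be rewritten.
        blockImage.addAll(flash.validPages(victim));

        flash.erase(victim);                    // one erase operation
        flash.writeBlock(victim, blockImage);   // rewrite the whole block
        return victim;
    }

    // Minimal flash interface assumed by this sketch.
    interface Flash {
        Set<Integer> validPages(int block);
        void erase(int block);
        void writeBlock(int block, Set<Integer> pages);
    }
}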
4 Performance Evaluation
We compare the performance of the C-lash system with state-of-the-art FTLs because it has been built in order to replace part of those FTL services.
Fig. 2. The simulated global storage system architecture
4.1 Simulation Framework and Methodology
FlashSim is a simulator based on DiskSim [14], which is the most popular disk drive storage system simulator in both academia and industry. DiskSim is an event-driven simulator. It simulates the whole storage system, going from the device driver down to the detailed disk movement. It also integrates a detailed synthetic I/O workload generator used in part of our experiments. DiskSim does not natively include flash memory support. FlashSim integrates modules specific to flash memories into DiskSim. It is able to simulate the basic flash device infrastructure: read, write and erase operations, in addition to logical-to-physical address translation mechanisms and garbage collection policies. FlashSim implements FAST, DFTL and an idealized page map FTL. We have extended the functionality of FlashSim to allow the simulation of a dual cache subsystem placed on top of the flash media (see Fig. 2). The cache is configurable and many cache policies can be simulated (FIFO, LRU, LFU).
4.2 Storage System and Performance Metrics
We rely on two main performance metrics: the average request response time and the number of performed erase operations. The response time is captured from the I/O driver point of view (see Fig. 2), including all the intermediate delays: caches, controllers, I/O queues, etc. We tried to minimize the impact of the intermediate elements to focus on the behavior of the flash memory subsystem. The second metric we capture is the number of performed erase operations, which indicates the wear-out of the memory.
4.3 Simulated I/O Workloads
We used different sets of real and synthetic workloads to study the impact of the C-lash system as compared to the chosen FTLs. We focused our research on write
intensive applications for both sequential and random workloads, but we also show some results based on read-dominant workloads. The simulated synthetic traces have been produced with the DiskSim I/O workload generator, which is parameterized by the sequentiality rate, the read/write operation rates, the spatial locality, etc. We explored the performance of the C-lash subsystem as compared to the other FTLs. Synthetic workloads are in fact more flexible for exploring the limitations and strengths of a given solution (see Table 1).
Table 1. Synthetic workloads tested with C-lash, FAST, DFTL and page map
Parameter             Default value   Other values
Seq. rate             80%             50%, 65%
Spatial locality      40%             30%, 10%
Write rate            80%             50%, 65%
Address space size    1 GB            0.5 GB, 2 GB
We performed more extensive simulations by varying two important parameters, the sequential rate and the spatial locality rate, from 20% to 80% in steps of 20% (see Table 2), showing that C-lash also performs well for random I/O workloads. Request sizes follow a normal distribution from 1 to 64 KB and the inter-arrival times follow an exponential distribution from 0 to 200 ms.
Table 2. Synthetic workloads with a focus on sequentiality and spatial locality
Seq. rate: 20%; 40%; 60%; 80%
Spatial locality: 20%; 40%; 60%; 80%
Write rate: 80%
Address space: 2 GB
Inter-arrival times: exp(0, 200 ms)
Request sizes: nor(1, 64 KB)
Table 3. OLTP workloads tested with C-lash, FAST, DFTL and page map
Workload      Write rate   Seq. write rate   Mean req. size (KB)
Financial 1   99%          83%               4
Financial 1   81%          81%               4
Financial 1   18%          13%               11.5
Financial 2   95%          30%               44
Financial 2   24%          1%                2.5
Financial 2   0%           -                 7.5
For the real workload performance tests, we used different enterprise I/O traces from OLTP applications running in financial institutions [22], available thanks to the Storage Performance Council (SPC) [23]. Those traces are extracted from storage subsystems of 5 to 20 disks. We isolated the I/O trace of each disk and applied each of them to our single-flash SSD. In this paper, we did not take all the disks into consideration; we chose 3 disks from each trace, showing different characteristics (see Table 3).
4.4 Performed Tests
We simulated a NAND flash memory with a page size of 2 KB and a block size of 128 KB (+4 KB for metadata). The request address space size varied between
100 MB and 2 GB. The three operations have the following values: a page read takes 130.9 µs, a page write 405.9 µs, and a block erase 2 ms. This configuration is based on a Micron NAND flash memory [24]; its values are representative of the read/write/erase performance ratio. In the performed set of tests, we compare one C-lash configuration with different FTLs: FAST, DFTL (a very efficient FTL), and the idealized page map. As seen earlier, the page map FTL consumes a huge amount of memory but gives ideal performance as compared to the other FTLs; as in [2], we use it as a baseline. The chosen C-lash configuration has 2 blocks (128 KB each) and 128 pages, which constitutes a small total size of 512 KB. The (block) mapping table used with C-lash is very small as compared to the cache size; for a 512 KB cache, its size is about 320 bytes. Simulations are performed with synthetic and real workloads (see Tables 1, 2 and 3). All the performed simulations began with the same initial state: the flash media was assumed to be completely dirty, so each new flash write in a given block generated an erase operation.
5 Results and Discussion
Synthetic I/O Workload Comparison. Fig. 3 shows the behavior of both the mean response time and the number of erase operations when varying the write rate and the sequential rate of the I/O workload. We can see, from the left-hand-side illustrations, that the more we increase the write rate, the more the C-lash system outperforms the other FTLs and the closer we get to the baseline page map performance. For an 80% write rate, we enhance the DFTL mean response time by more than 43% and reduce the number of erase operations by more than 47%. In this experiment, we always outperform both FAST and DFTL and closely approach the page map performance. Another observation one can draw is that the FAST FTL gives very low performance for the tested sequential write-intensive workloads. The second tuned workload parameter is the sequential rate, as one can see in the right-hand-side graphics in Fig. 3. For the mean response time, C-lash outperforms both FAST and DFTL for sequential rates greater than 65% (up to a 48% improvement in response time and 54% in the number of erase operations) but performs poorly when sequential rates are less than 60%. In these simulations, request sizes were fixed to 2 KB (which is very small and so negatively impacts C-lash performance) and the spatial locality to 40%. We show later, in more detail, the implications of the sequential rate and spatial locality. We can also observe that for a sequential rate of 65%, C-lash gives a better mean response time while generating more erase operations than DFTL. The decreasing sequential rate generates more flush operations on the flash memory, especially for very small request sizes, which increases the number of erase operations, but we still benefit from low response times because writes are absorbed by the cache. Fig. 4 shows the variation of both performance metrics when changing the spatial locality and the flash size. We can observe that for spatial localities greater than 30%, C-lash outperforms the other FTLs (more than 40% enhancement), while it gives poor performance for a 10% rate. Spatial locality is different from sequentiality from the simulator point of view: sequentiality means strictly contiguous request addresses, while spatial locality means requests with neighboring addresses.
Fig. 3. I/O driver mean response time (in milliseconds) and number of erases for the synthetic workload configuration of Table 1: variation of the write and the sequential rate
Graphics on the right hand side of Fig. 4 depict the variation of performance according to the simulated size of the request address space. We can notice that the mean response time of the C-lash system is always better than the other FTLs', with a gain ranging from 35% to 63%. C-lash always generates the same number of erase operations independently of the flash size, while some other FTLs are less stable. For instance, DFTL outperforms all the other FTLs by generating no erase operations for a flash size of 500MB; we could not explain this result. More tests were performed (not shown for space reasons) with larger sizes (10GB, 20GB, etc.); C-lash showed better performance for all tested sizes. We noticed that for large address spaces, FAST performed better, but it still lags behind DFTL and C-lash.
We conclude that for the synthetic workloads described in Table 1, the C-lash system gives better performance than the tested FTLs for workloads showing a sequential rate of more than 60% and at least 20% spatial locality for small request sizes. We can obtain an impressive performance increase of up to 65% on response times and number of erase operations as compared to the second best performing FTL (DFTL). Table 4 shows a more thorough study of the sequential rate and spatial locality variation (see Table 2 for the configurations). We observe that for those configurations, whether the workloads are random or sequential, C-lash always outperforms the other FTLs. It is better than DFTL by at least 59%, while it outperforms the page mapping FTL by at least 13%. Table 4 shows a summary of the obtained results; results in bold represent cases for which C-lash outperforms the page map idealized FTL.
Fig. 4. I/O driver mean response times (in milliseconds) and number of erase operations for the synthetic workload configuration of Table 1: variation of the spatial locality and the flash size

Table 4. Synthetic workload results' summary for spatial locality and sequential rate variation

Spatial locality (%)  Sequentiality (%)  DFTL (Resp. t. % / Erase %)  FAST (Resp. t. % / Erase %)  P. map (Resp. t. % / Erase %)
20                    20                 59 / 57                      91 / 91                      13 / 11
20                    40                 60 / 59                      91 / 91                      17 / 18
20                    60                 61 / 62                      90 / 92                      27 / 31
20                    80                 63 / 67                      87 / 91                      23 / 28
40                    20                 62 / 62                      91 / 92                      25 / 26
40                    40                 64 / 66                      91 / 93                      33 / 38
40                    60                 67 / 72                      91 / 95                      43 / 52
40                    80                 63 / 67                      87 / 91                      23 / 28
60                    20                 65 / 68                      91 / 94                      41 / 47
60                    40                 70 / 78                      91 / 96                      55 / 68
60                    60                 67 / 72                      91 / 95                      43 / 52
60                    80                 63 / 67                      87 / 91                      23 / 28
80                    20                 69 / 84                      90 / 97                      62 / 79
80                    40                 70 / 78                      91 / 96                      55 / 68
80                    60                 67 / 72                      91 / 95                      43 / 52
80                    80                 63 / 67                      87 / 91                      22 / 28
The results of Table 4 do not contradict those of Fig. 5. In fact, C-lash does not perform well for very random workloads coupled with very small request sizes (< 4KB), but even then the performance does not fall dramatically. We are working to improve the C-lash performance for those kinds of workloads.
Real OLTP Performance Evaluation. We depict, in this part, the results of the comparison with the real enterprise workloads described in Table 3. Fig. 5 shows that for all the tested disks, C-lash performance is better than that of the other FTLs. It even achieves better results than the page mapping FTL for the first two disks. From the mean response time point of view, C-lash enhances the FAST
Fig. 5. I/O driver mean response time (in milliseconds) and number of erase operations for the OLTP real workload Financial 1 and Financial 2 configurations of Table 3
FTL performance by more than 95%, while improving on DFTL by 53%, 51% and 13% respectively for the three simulated disks. For the erase operation metric, C-lash generates fewer erase operations than all the other FTLs for the first disk, and it reduces the DFTL number of erasures by 57% for the second disk. For the third one, despite the fact that C-lash gives better response times, it produces more erase operations. These results confirm that C-lash outperforms the tested FTLs for highly sequential workloads (disk 0: 83% and disk 19: 81%).
The performance increase of C-lash on the Financial 2 workload, as shown in Fig. 5, is less impressive, even if it surpasses all the tested FTLs for the last two disks. We observed an improvement of less than 10% on response times as compared to DFTL. The main reason for this small gain is that the write operation sequential rate, spatial locality and request sizes (except for the first disk) are small in this case. A summary of the simulation results is shown in Table 5. All gray cells are those for which C-lash shows better performance than DFTL and FAST. All the results in bold represent cases where C-lash outperforms the page map idealized FTL. Italic results are traces for which C-lash does not perform better for one or both performance metrics; most of those values appear for the page map idealized FTL. The cells containing a (-) sign mean that the FTL does not show any erase operations while C-lash exhibits a small number. The 0% cells mean that neither C-lash nor the FTLs exhibit erase operations.
Table 5. Real OLTP simulation results' summary.
DFTL FAST P.map
Disks Resp. times Erase ops. Resp. times Erase ops. Resp. times Erase ops.
Financial 1 Financial 2 1 2 3 1 2 53% 51% 13% 9% 4% 57% 59% -12% 97% 95% 93% 86% 80% 97% 97% 92% 95% 87% 18% 17% -31% -34% 4% -79% 26% 23%
3 2% 0% 2% 0% 2% -
Other Performance and Cost Considerations
− In the experimental evaluation, the simulator does not consider the whole address translation and garbage collection latencies for the different FTLs. This parameter can negatively impact their performance as compared to the simple direct block mapping used in C-lash.
− The main disadvantage of reducing the number of erase operations by absorbing them in the cache is the data loss that can be induced by a sudden power failure. This issue is very important to consider, and different hardware solutions may be proposed. A small battery may delay the effective shutdown until all the valid data are flushed [5]; in the worst case and with the cache configuration tested in this paper (2 blocks with just one valid page and 128 pages from different blocks), this operation takes only about 4 seconds (a back-of-the-envelope check is sketched after this list). In the C-lash system, we took this problem into account and implemented the LRU algorithm on the b-space (small number of blocks) and not on the p-space, which would have provoked more flush operations on the media. This ensures more regular data flush operations, and so less data loss, at the expense of performance (an increase in response time).
− From a cost point of view, the C-lash solution is very attractive; it does not use a lot of cache memory and it simplifies the FTL implementation by removing some of its services. We can still reduce the C-lash size, since the pages of the blocks in the b-space are not always entirely used. One solution consists in virtualizing the page addressing of the b-space region to ensure a good memory usage ratio. This is particularly true when dealing with weakly sequential workloads.
− For space considerations, we did not show the distributions of the erase operations over the flash area for each simulation. In addition to a decrease in the total number of erase operations, the performed tests showed a good distribution, because the repetitive erase operations (on the same blocks) were absorbed by the cache. There are still some cases for which the cache could not absorb such erases, but this rarely happens.
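The roughly 4-second worst-case flush time quoted above can be checked with a back-of-the-envelope calculation. The accounting below is our assumption: each of the 130 cached victims (2 b-space blocks plus 128 p-space pages mapping to 128 distinct flash blocks) triggers an erase of its target block followed by re-programming a full 64-page block.

# Back-of-the-envelope check of the ~4s worst-case flush time quoted above.
PAGE_WRITE_MS = 0.4059
BLOCK_ERASE_MS = 2.0
PAGES_PER_BLOCK = 64

victims = 2 + 128                                              # b-space blocks + p-space pages
per_block_ms = BLOCK_ERASE_MS + PAGES_PER_BLOCK * PAGE_WRITE_MS  # ~28 ms per target block
total_s = victims * per_block_ms / 1000.0
print(round(total_s, 1))   # ~3.6 s, on the order of the quoted 4 seconds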
6 Conclusion and Future Work
We proposed, in this study, a paradigm shift by introducing C-lash (Cache for Flash), a cache system for managing flash media that is intended to replace the wear leveling and garbage collection parts of FTLs. We showed that the C-lash system drastically improves the performance for a large spectrum of real enterprise and synthetic workloads in terms of response times and number of erase operations.
Beyond the C-lash solution, we think that it is increasingly interesting to begin considering the migration from FTL wear leveling and garbage collection solutions towards cache-based solutions (without the preceding services) for a large set of applications (there are still exceptions). The growing reliability and efficiency of flash memories reinforces this position. C-lash is flash-aware because it takes into account the cost of the different flash memory operations (read, write and erase) in terms of access time and lifetime, by flushing onto the media only blocks containing the maximum number of valid pages. This tends to severely reduce the number of erase operations and so improves performance. C-lash also takes into account the temporal and spatial locality of data via its b-space LRU eviction policy.
C-lash performance was tested on a large set of workloads, either synthetically generated or extracted from well-established enterprise trace repositories. The performed experiments demonstrated the relevance and efficiency of C-lash for a large set of workloads. In fact, we compared our system with an efficient FTL scheme named DFTL, and we improved the performance on the tested workloads by more than 65% in many cases. We also frequently performed better than the idealized page mapping strategy, which consumes more than an order of magnitude additional memory. The C-lash system performs poorly on very random workloads with very small request sizes if the cache size is not big enough. This is mainly caused by the poor use of the b-space in that case. We will carry on testing on random workloads to precisely identify the parameter windows in which C-lash does not perform well. We propose, then, to study the possibility of virtualizing the access to b-space pages in order to better exploit the whole cache size when facing random workloads. Another solution to investigate consists in dynamically reconfiguring/adapting the two cache space sizes according to the applied workload; the latter solution consumes less metadata (mapping tables). We expect to extend our simulation framework to multi-SSD systems to explore in more detail the interactions of the different disks in larger enterprise storage systems. A patch to the Disksim simulator including C-lash support will be made available online together with the set of performed tests.
References
1. Chung, T., Park, D., Park, S., Lee, D., Lee, S., Song, H.: A Survey of Flash Translation Layer. J. Syst. Archit. 55, 332–343 (2009)
2. Gupta, A., Kim, Y., Urgaonkar, B.: DFTL: A Flash Translation Layer Employing Demand-based Selective Caching of Page-level Address Mappings. In: Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, New York, pp. 229–240 (2009)
3. Jo, H., Kang, J., Park, S., Kim, J., Lee, J.: FAB: Flash-Aware Buffer Management Policy for Portable Media Players. IEEE Trans. on Consumer Electronics 52, 485–493 (2006)
4. Kang, S., Park, S., Jung, H., Shim, H., Cha, J.: Performance Trade-Offs in Using NVRAM Write Buffer for Flash Memory-Based Storage Devices. IEEE Transactions on Computers 58(6), 744–758 (2009)
5. Kim, H., Ahn, S.: BPLRU: A Buffer Management Scheme for Improving Random Writes in Flash Storage. In: Baker, M., Riedel, E. (eds.) Proceedings of the 6th USENIX Conference on File and Storage Technologies, CA, pp. 1–14 (2008)
6. Lee, S., Park, D., Chung, T., Lee, D., Park, S., Song, H.: A Log Buffer-based Flash Translation Layer Using Fully-associative Sector Translation. ACM Trans. Embed. Comput. Syst. 6, 3 (2007)
7. Park, S., Jung, D., Kang, J., Kim, J., Lee, J.: CFLRU: A Replacement Algorithm for Flash Memory. In: CASES 2006: Proceedings of the 2006 International Conference on Compilers, Architecture and Synthesis for Embedded Systems, Seoul, Korea, pp. 234–241 (2006)
8. Wu, G., Eckart, B., He, X.: BPAC: An Adaptive Write Buffer Management Scheme for Flash-Based Solid State Drives. In: Proceedings of the IEEE 26th Symposium on Mass Storage Systems and Technologies (2010)
9. Myers, D.: On the Use of NAND Flash Memory in High-Performance Relational Databases. Master of Sc. Tech. Report, Massachusetts Institute of Technology (2008)
10. Caulfield, A.M., Grupp, L.M., Swanson, S.: Gordon: Using Flash Memory to Build Fast, Power-efficient Clusters for Data-intensive Applications. In: ACM Architectural Support for Programming Languages and Operating Systems, Washington, DC (2009)
11. Stoica, R., Athanassoulis, M., Johnson, R.: Evaluating and Repairing Write Performance on Flash Devices. In: Fifth International Workshop on Data Management on New Hardware (DaMoN 2009), Providence, Rhode Island (2009)
12. Leventhal, A.: Flash Storage. ACM Queue (2008)
13. Kim, Y., Tauras, B., Gupta, A., Nistor, D.M., Urgaonkar, B.: FlashSim: A Simulator for NAND Flash-based Solid-State Drives. Tech. Report CSE-09-008, Pennsylvania (2009)
14. Ganger, G.R., Worthington, B.L., Patt, Y.N.: The Disksim Simulation Environment Version 3.0 Reference Manual. Tech. Report CMU-CS-03-102, Pittsburgh (2003)
15. Forni, G., Ong, C., Rice, C., McKee, K., Bauer, R.J.: Flash Memory Applications. In: Brewer, J.E., Gill, M. (eds.) Nonvolatile Memory Technologies with Emphasis on Flash. Series on Microelectronic Systems. IEEE Press, USA (2007)
16. Chung, T.S., Park, H.S.: STAFF: A Flash Driver Algorithm Minimizing Block Erasures. Journal of Systems Architecture 53(12), 889–901 (2007)
17. Park, D., Debnath, B., Du, D.: CFTL: A Convertible Flash Translation Layer with Consideration of Data Access Pattern. Tech. Report, University of Minnesota (2009)
18. M-Systems: Flash Memory Translation Layer for NAND Flash, NFTL (1998)
19. Wu, C.H., Chang, L.P., Kuo, T.W.: An Efficient B-Tree Layer for Flash-Memory Storage Systems. ACM Trans. on Embedded Computing 6(3) (2007)
20. Shinohara, T.: Flash Memory Card with Block Memory Address. United States Patent No. 5,905,993 (1999)
21. Kim, B.S., Lee, G.Y.: Method of Driving Remapping in Flash Memory and Flash Memory Architecture Suitable Therefor. United States Patent No. 6,381,176 (2002)
22. OLTP Traces, UMass Trace Repository, http://traces.cs.umass.edu/index.php/Storage/Storage
23. Storage Performance Council, http://www.storageperformance.org
24. Micron: Small Block vs. Large Block NAND Flash Devices. Micron Technical Report TN-29-07 (2007), http://download.micron.com/pdf/technotes/nand/tn2907.pdf
Resource Discovery for Supporting Ubiquitous Collaborative Work

Kimberly García1, Sonia Mendoza1, Dominique Decouchant2,3, José Rodríguez1, and Alfredo Piero Mateos Papis2

1 Departamento de Computación, CINVESTAV-IPN, D.F., Mexico
2 Depto. de Tecnologías de la Información, UAM-Cuajimalpa, D.F., Mexico
3 C.N.R.S. - Laboratoire LIG de Grenoble, France
[email protected], [email protected], [email protected], [email protected], [email protected]
Abstract. The majority of the solutions proposed in the domain of service discovery protocols mainly focus on developing single-user applications. Consequently, these applications are unaware of third-party interventions, supposing that nobody interferes or observes. This paper describes a system for discovering sharable resources in ubiquitous collaborative environments. The proposed system is based on the publish/subscribe model, which makes it possible for collaborators to: 1) publish resources to share them with their colleagues, and 2) subscribe themselves to get information about resources they are interested in. Dynamic information is gathered from different sources, such as user applications, a resource locator and a human face recognizer, in order to find the best available resource for a specific request. Resource availability is determined according to several parameters: technical characteristics, roles, usage restrictions and dependencies with other resources in terms of ownership, presence, location and even availability.
Keywords: resource discovery, ubiquitous collaborative work, resource sharing and availability, publish/subscribe architecture.
1 Introduction
This paper falls within the Computer Supported Cooperative Work (CSCW) and Ubiquitous Computing (UC) fields. CSCW focuses on: 1) sociological aspects of both individual and collective activities, and 2) technological aspects of information and communication to facilitate effective collaboration among people. The UC field aims to develop intelligent systems capable of: 1) integrating themselves into the user's physical environment, 2) adapting themselves to the user, and 3) being intuitively used by everyone at every moment and everywhere. Some service discovery protocols [5] allow the creation of ubiquitous environments in order to reduce users' tasks when trying to use services (e.g., printing) available in specific places. In order to illustrate what can be achieved with these protocols,
let us analyze a scenario inspired by Bettstetter and Renner [2]: A journalist reports from a sports event, so she carries her laptop in order to write and print out articles, send mails and get information from the Web. When she enters the press room, her laptop automatically joins the wireless network, giving her access to the Internet. At some point, she hits the print button, so the service location protocol would find all printers in the environment and let her choose a printer (e.g., a color one). After selecting it, the protocol would perform the whole configuration process to provide her with the service; therefore, she would not have to do that task herself. Some features can be highlighted from this scenario: 1) these service location protocols do not handle personalized access to services; instead, all the users have the same accessibility level, and 2) the accessible services can be reached by anyone. However, in an organization with expensive services that need protection, these protocols may not be functional.
Users performing individual activities (like in the previous scenario) are not the only ones who can profit from ubiquitous environments. Previous works in the CSCW field involve ubiquity by highlighting the working group members' need to share relevant information at any time and anywhere. To illustrate this need, Markarian et al. [11] present the following scenario: "the members of a group physically meet together to discuss about some particular subject. During the meeting, one of the collaborators remembers that he owns relevant information, which can be shared with his colleagues, but it is stored in his PC. Consequently, the non-mobile character of his PC forces him to go to his office either to print out this information or to make a copy on a USB flash drive". From this scenario, we conclude that this way of sharing information during a face-to-face meeting unavoidably breaks the interaction flow among collaborators.
The problems identified from these scenarios, concerning service discovery protocols mainly intended to support mono-user activities and information statically stuck on computers, motivate us to deploy collaborative applications on mobile devices to give collaborators access to relevant resources when required. The context of our proposal is an organization whose human resources are potential collaborators and whose physical resources are heterogeneous devices (e.g., projectors, laptops, printers) distributed over private (e.g., offices) and public places (e.g., meeting rooms). Supporting resource sharing and human collaboration in ubiquitous environments requires managing dynamic information about resource availability. Some of the developed sources that provide our system with such information are: 1) user applications that allow collaborators to modify their own state and that of their resources, and 2) a face recognition system capable of identifying and locating collaborators in their physical environment. Our system keeps collaborators informed of the availability of resources and people by allowing them: 1) to publish resources intended to be shared with others, and 2) to subscribe to allowed resources depending on their interest.
This paper is organized as follows. After presenting related work (Section 2), we describe a scenario and the architectural components of the proposed system. Then, we explain the main mechanisms to determine resource availability depending on technical characteristics, roles, collaborator-defined usage
restrictions and dependency relationships with other resources in terms of ownership, presence, location and even availability (Section 3). In this way, our system can permit or deny access to context-aware information. Finally, we conclude this paper and give some ideas for future extensions (Section 4).
2 Related Work
Service discovery systems should provide support for ubiquitous environments in order to deal with: 1) the environment dynamism, and 2) the diversity of sharable resources that provide and request services. Some remarkable solutions rely on network protocols, whereas others constitute building blocks of frameworks intended for the development of services. However, their usage suitability depends on the way these solutions have been developed and supplied. Looking at the academic field, some systems that can be mentioned are: Intentional Naming System (MIT) [1] and Ninja Service Discovery Service (University of California Berkeley) [7]. On the other hand, some software companies have integrated service discovery systems into their operating systems, e.g., Jini Network Technology (Sun Microsystems) [12] and Universal Plug and Play (Microsoft) [10]. By contrast, the Salutation Consortium (Salutation [9]) and the Bluetooth Special Interest Group (Bluetooth SDP [4]) propose service discovery systems which are operating system-independent. Without exception, these systems rely on the comparison of typical attributes (e.g., service type) and communication interface attributes (e.g., IP address and port of the service host) to check the availability of the services required by a user. Regarding service description, some systems are language-independent, e.g., Service Location Protocol (SLP), Salutation, and Universal Plug and Play (UPnP), whereas others are totally language-dependent, e.g., Jini relies on Java. As for service search, UPnP and Salutation use multicast communication, whereas Jini registers services in a Jini Lookup Service (JLS) directory. Although these systems analyze different aspects of service discovery in distributed environments, they provide limited and non-systematically updated availability information about sharable services in ubiquitous collaborative environments. Each proposal is focused on solving a particular problem (e.g., service identification or access control) identified in the domains of domestic applications (e.g., finding multimedia services for a video call) and business applications (e.g., tracking down printing services). Consequently, these systems do not satisfy specific needs of ubiquitous collaborative environments such as: 1) providing good technological support for collaborators wanting to share their resources in a controlled way, or 2) helping collaborators to find the best available resource for their request by assessing dynamic conditions of the working environment.
3 Resource Availability Management System
The Resource Availability Management System (RAMS), proposed in this paper, aims at providing: 1) state information (e.g., presence, location and availability)
about collaborators, physical resources (e.g., printers, whiteboards, and clusters) and virtual resources (e.g., multimedia and software), and 2) user functions to publish relevant resources that collaborators want to share with authorized colleagues, who can subscribe to some of these resources in order to use them or to access their contents depending on temporal and spatial restrictions.
3.1 RAMS Use Scenario
Let us suppose that within his physical working environment Mr. Brown owns, among other devices, a high-quality color plotter (see Fig. 1). At some moment, he decides to share it with his colleagues. Using the RAMS system functions, Mr. Brown can publish his plotter by specifying its main technical characteristics (e.g., resolution, speed, and paper size), roles (e.g., my PhD students) and usage restrictions (e.g., x printed pages per month allowed for my PhD students). Moreover, as Mr. Brown's plotter is located in his office, he defines an estimated access schedule to offer the printing service only during specified time slots. Thus, he prevents authorized users of this plotter from disturbing him.
Fig. 1. Mr. Brown’s, Mr. Smith’s and Miss White’s Physical Working Environment
On the other hand, Mr. Smith (one of Mr. Brown's collaborators) needs to print a VLSI circuit design, so he uses the RAMS system functions to describe his resource requirements. In response, the system uses the following information to determine the best resource: 1) Mr. Smith's requirements, 2) the technical characteristics provided by each collaborator that has published a plotter, 3) the roles attributed to Mr. Smith by each plotter publisher, and 4) the location of each published plotter and Mr. Smith's current one. After the matchmaking process has been performed, the RAMS system informs Mr. Smith that Mr. Brown's plotter can satisfy his requirements and provides him with the plotter location, availability, and access schedule. Although the current time slot authorizes Mr. Smith to use the plotter, the RAMS system notifies him that it is temporarily unavailable because Mr. Brown has not yet arrived at his office!
After a while, one of Mr. Brown's students, Miss White, needs to test her application on her advisor's smartboard. She has no access restrictions to her advisor's office whenever he is absent. When Miss White comes into Mr. Brown's office, the RAMS system recognizes her and infers her new location. She intends to stay for two hours in her advisor's office and then declares herself available, which means that other persons can disturb her anytime. Consequently, although Mr. Brown remains absent, the RAMS system infers and then informs Mr. Smith that Mr. Brown's plotter is now available. Notified of these important changes, Mr. Smith sends his VLSI circuit design to the plotter. After printing completion, he goes to Mr. Brown's office in order to recover his printed sheet. We study the following three cases to show the RAMS system functionality:
Case 1: Mr. Smith is detected close to the secretary's office. If important information has to be transmitted to Mr. Smith while he is taking tea in the cafeteria, the RAMS system can send it to the closest common printer, located in the secretary's office.
Case 2: Mr. Smith is detected in Mr. Brown's office. A technical drawing has to be urgently diffused to Mr. Smith while he is in Mr. Brown's office. After being notified by the RAMS system through his PDA, Mr. Smith asks Mr. Brown if he agrees to receive this drawing on his high resolution screen. If Mr. Brown agrees, Mr. Smith's drawing is automatically displayed on Mr. Brown's screen. Moreover, Misters Smith and Brown can establish a technical discussion based on the complementary use of the screen (controlled by Mr. Brown) and the PDA (controlled by Mr. Smith).
Case 3: Misters Smith and Brown establish a cooperative working session with their colleagues. If Misters Smith and Brown need to establish a videoconference session with other colleagues to analyze such a technical drawing, the RAMS system can lock meeting room A (as meeting room B is already reserved). Then, Misters Smith and Brown join the working session supported by an interactive whiteboard, Mr. Smith's PDA and Mr. Brown's laptop.
These scenarios require the management of different kinds of resources: the building map where Miss White and Misters Brown and Smith are located, physical and virtual resources, and finally the collaborators themselves. As described in the next section, for each kind of resource, the RAMS system manages characteristics, usage restrictions, roles, presence, location, and availability.
3.2 RAMS System Architecture
The RAMS system relies on the publish/subscribe asynchronous model [6] to allow collaborators: 1) to publish resources sharable with others, and 2) to subscribe to some of these resources to access them. Using specific applications, collaborators play the role of publishers and subscribers of state information (e.g., presence, location and availability) about human, physical and virtual resources. State information is transmitted via inter-application events (see Fig. 2).
An event is an information unit automatically produced each time an agent (i.e., a user application instance) performs actions on resources. The RAMS system manages two different types of clients: 1) producer agents, which generate events and transmit them to RAMS for diffusion, and 2) consumer agents, which subscribe to RAMS for event reception. In a collaborative context, agents need to be identified in order to support resource sharing and its corresponding tasks (e.g., activity coordination, contribution identification and change notification). To be able to send and receive events, an agent has to register itself with the RAMS system, which returns to it a unique identifier. By means of this identifier, an agent can define a filter whose goal is to extend or restrict the diffusion scope of its events (see Fig. 2). The agent definition includes information allowing to identify not only the active entity that executes actions on resources (i.e., user), but also the source of these actions (i.e., site and application). This definition allows to: 1) control actions applied on shared resources, and 2) define sets of roles by means of which agents can act. The role definition is essential to express the social organization of the group work and constitutes the basis to protect resources from unauthorized actions. In this way, we adapt information filtering and notification functions of the publish/subscribe model to the requirements of collaborative systems. Particularly, these functions have to take into account: 1) agent’s identification, and 2) agent’s roles on shared resources. The RAMS publish/subscribe system is composed of three main parts: a Broker, a Matchmaker, and a set of Dynamic Information Subsystems (see Fig. 2). Broker Modules and Matchmaker The Broker consists of the Publication and Subscription Modules, the Topic and Content Filters, and the Event-based Notification Module.
Fig. 2. Architecture of the RAMS System
The Publication Module allows producer agents to describe resources (see Fig. 2 ref. #1) in terms of their technical characteristics. Also, producer agents can define usage restrictions and attribute roles to consumer agents on their sharable resources. Likewise, the Subscription Module allows consumer agents to describe relevant technical characteristics of the required resources (see Fig. 2 ref. #2). Both the Publication and Subscription Modules rely on an ontology (see Section 3.3). Typically, consumer agents do not receive all the published events but a subset of them. In fact, events (see Fig. 2 ref. #3) pass through the Filter before being forwarded to consumer agents. The Filter organizes producer and consumer agents’ information respectively by topic and content (see Section 3.3). For each consumer agent, the Filter provides the Matchmaker with a potential set of shared resources (see Fig. 2 ref. #4), which is selected from predefined information, e.g., consumer agent’s roles, resource usage restrictions and compatibility of the technical characteristics provided by both the producer and consumer agents. In order to select the most suitable resources from this potential set and to automatically infer new events, the Matchmaker also relies on the set of dynamic information subsystems (see Fig. 2 ref. #5) described below. Based on both static and dynamic information, the Matchmaker establishes dependencies among resources from which the set of most suitable resources is inferred (see Fig. 2 ref. #6). For instance, a collaborator’s printer located in his office is available for others if: 1) the owner is in his office, 2) he is available, and 3) the access schedule authorizes the printer usage. Finally, the Event-based Notification Module is in charge of delivering (see Fig. 2 ref. #7) to consumer agents not only static state information provided by producer agents whenever they publish their resources, but also dynamic information about the most suitable resources. Resource Location and Face Recognition Subsystems The first subsystem is a Resource Locator, which is responsible for determining the closest physical resource (relatively to the requester’s current location) from the set of technically suitable, available and accessible resources. The detailed description of this subsystem is out of the scope of this paper. The second subsystem is a Human Face Recognizer, which is in charge of identifying collaborators, whose faces are captured by cameras located in specific places. This subsystem allows not only to inform about a collaborator’s presence and current location in a place, but also it is the basis to manage context-aware information. To cope with privacy and intrusion problems, the RAMS system also allows users to handle their appearance within the collaborative environment by declaring themselves invisible for some colleagues. The RAMS system is focused on locating users to make information closer to them and thus allowing them to easily locate each other within the ubiquitous collaborative environment. The goal is to follow every collaborator rather than to locate his mobile device, from which he may be regularly separated. Through this proposal, we do not negate the interest in locating mobile devices by other approaches, e.g., triangulation of Wi-Fi signals [11] that constitutes a
complementary way. Rather, we aim to highlight that an efficient face recognition system may provide support for collaborative mobile work. The computer vision-based face recognition system needs a learning phase before the effective real-time face recognition, called the testing phase. The learning phase is performed only once, whereas the testing phase takes place each time a camera captures a human face. Several techniques are combined to create the recognizer. The learning phase uses an algorithm that differentiates between a human face and any other object. This algorithm is used to create a picture database of the collaborators to be identified. This database contains several pictures of each collaborator's face in different positions, e.g., full-face portrait or profile, and customized with accessories, e.g., glasses or hat (see Fig. 3 #1).
Fig. 3. Learning and Testing Phases of the Face Recognition Subsystem
Once the database is completed, it is analyzed using the Eigenface method [8]. The information obtained from this analysis is then used to create a classification model (see Fig. 3 #2) that includes information to distinguish between one person's face and another. Such a model is implemented by means of LIBSVM (Support Vector Machine library) [3]. After the learning phase is finished, the real-time face recognition phase can start. Every time a human face is detected, a picture is captured (see Fig. 3 #3) and then analyzed according to the information produced by the Eigenface method (see Fig. 3 #4). Finally, this information is assessed by the classification model (see Fig. 3 #2), which is in charge of identifying a collaborator (see Fig. 3 #5) by establishing a correspondence between the registered persons and the recently captured picture.
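The pipeline described above (face pictures, Eigenface projection, SVM classification model) can be sketched with off-the-shelf components. The paper uses LIBSVM directly; the sketch below substitutes scikit-learn's PCA and SVC wrappers for brevity, and assumes the face pictures are already detected, cropped and flattened, so it should be read as an illustration rather than the authors' implementation.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def train_recognizer(faces: np.ndarray, labels: np.ndarray, n_components=50):
    """Learning phase. faces: (n_samples, n_pixels) array of flattened,
    cropped face pictures (n_samples must be >= n_components).
    Returns the Eigenface projector and the classification model."""
    eigenfaces = PCA(n_components=n_components, whiten=True).fit(faces)
    model = SVC(kernel="rbf", C=10.0).fit(eigenfaces.transform(faces), labels)
    return eigenfaces, model

def predict_collaborator(eigenfaces, model, face: np.ndarray):
    """Testing phase: project a newly captured face and predict who it is."""
    return model.predict(eigenfaces.transform(face.reshape(1, -1)))[0]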
3.3 Resource Description
When producer agents want to share their resources with their colleagues, they use the publish module, which is based on an ontology that covers the special needs of description for our case of study. The main elements of this ontology are (see Fig. 4): 1) individuals (purple diamonds) that represent the objects of our domain of study, 2) classes (yellow circles) that group individuals belonging to a specific category, 3) object properties (blue rectangles) that link two individuals,
and 4) data properties (green rectangles) that describe individuals by relating an individual to an attribute. The description varies from one resource type to another. A human resource (e.g., collaborator) description defines: 1) his social information (see Fig. 4a) using data properties such as name, degree and affiliation, 2) his position (see Fig. 4b), determined by the hasPosition object property, 3) his default location (e.g., office), employing the hasFixedLocation object property, and 4) his office schedule, specified with the hasSchedule object property (see Fig. 4c). In contrast to physical resources, a collaborator's location can change regularly as he moves from one place to another within the same organization.
Fig. 4. Human Resource Description
A physical resource (e.g., printer) description defines: 1) its technical capabilities using data properties such as resolution, double vs single side, color vs monochromatic (see Fig. 5a), 2) its owner determined by the isPropertyOf object property (see Fig. 5b), 3) its default location employing the isLocatedAt object property, and 4) its access schedule specified by the isAssociatedTo object property to link an individual of the Restriction class (see Fig. 5c) to an individual of the PhysicalResource class and then to associate the individual of the Restriction class to an individual of the HumanResource class by the hasToSatisfy object property. Thus, the restriction would be assigned to somebody. We do not present the description of virtual resources (e.g., multimedia and software) because it is similar to the one of physical resources. Resource Usage Restrictions We distinguish two types of physical resources: the public and private ones. Public resources are generally owned by a non-human resource, e.g., an organization department. Some public resources might have usage restrictions defined in terms of time, e.g., a group of collaborators has to search a free slot in order to use a meeting room. In the case of a printer, it might be irrelevant to define such a restriction as the usage time per person is relatively short.
On the contrary, private physical resources belong to a collaborator or a group. Thus, the RAMS system allows a producer agent to define usage restrictions based on different criteria. For instance, they can be expressed in terms of time (e.g., cluster usage schedule) and/or in terms of results (e.g., maximal number of printed sheets per month). Like roles, usage restrictions can also vary according to the collaborator or group to which they are associated.
Fig. 5. Physical Resource Description
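To make the ontology elements concrete, the fragment below sketches how a published physical resource and one of its usage restrictions could be encoded. The property names (isPropertyOf, isLocatedAt, isAssociatedTo, hasToSatisfy) come from the description above, while the Python dictionaries, the concrete attribute values and the room and group names are only illustrative stand-ins for the actual OWL encoding.

# Illustrative stand-in for the ontology individuals and properties described
# above; a real deployment would encode these as OWL individuals/properties.
plotter = {
    "class": "PhysicalResource",
    # data properties (technical characteristics)
    "resolution_dpi": 1200, "color": True, "paper_size": "A0",
    # object properties
    "isPropertyOf": "MrBrown",       # ownership
    "isLocatedAt": "Office_B105",    # default location (hypothetical room name)
}

phd_print_quota = {
    "class": "Restriction",
    "maxPagesPerMonth": 100,                  # hypothetical usage limit
    "accessSchedule": ("09:00", "17:00"),     # estimated access schedule
    "isAssociatedTo": "plotter",              # the resource this restriction covers
    "hasToSatisfy": "PhD_students",           # the collaborator group it applies to
}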
Subscriber’s Roles on Shared Resources The RAMS system also allows producer agents to attribute roles to consumer agents on their resources. Like resource usage restrictions, roles can be attributed to a specific collaborator or group (a collaborator can be part of several groups). The set of roles can vary from a resource type to another. For instance, a collaborator might consult, review or modify a Web document, whereas he might remotely use, configure or download a software. State Information Filtering Resource state information (e.g., presence, location and availability) is notified by means of events. Thus, filtering is the process of selecting events for processing and transmission. Two forms of filtering are implemented: – By topic: events are published on logically designated channels. Thus, consumer agents obtain all the events published on the channels to which they are subscribed. Producer agents are responsible for the definition of event classes, which correspond to channels. For instance, let us consider the following topics: Printers, Displays, Scanners and Videos. If a consumer agent
is subscribed to the Printers topic, he receives information about all published printers (e.g., high and low resolution as well as monochromatic and color printers), even though he is looking for a high resolution color printer. – By content: events are notified to a specific consumer agent only if their attributes or contents fit his defined requirements. Consumer agents are responsible for event classification. For instance, if a subscriber specifies some attributes of the required printer (e.g., high resolution, color and output device), he receives information concerning all published high resolution color printers, but also high resolution color displays, PC screens, etc. Whereas producer agents publish events by topic, consumer agents subscribe by content to one or more topics. We selected this filtering approach because it combines the advantages of both forms, i.e., consumer agents are relieved of information classification (by topic) and filtering is finer-grained (by content). For instance, if a consumer agent is subscribed to the Printers topic and also defines some attributes, such as high resolution and color, he only receives events about the published high resolution color printers.
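The hybrid filtering just described (publish by topic, subscribe by content within a topic) can be sketched as a simple matching predicate; the event and attribute names below are illustrative and do not reflect the actual RAMS event schema.

def matches(subscription: dict, event: dict) -> bool:
    """Hybrid filtering: the event must be published on the subscribed topic
    (filtering by topic) and satisfy every attribute the consumer specified
    (filtering by content)."""
    return (event["topic"] == subscription["topic"] and
            all(event.get(attr) == value
                for attr, value in subscription["attributes"].items()))

# A consumer subscribed to the Printers topic with the attributes "high
# resolution" and "color" only receives events about such printers.
sub = {"topic": "Printers", "attributes": {"resolution": "high", "color": True}}
evt = {"topic": "Printers", "resolution": "high", "color": True, "owner": "MrBrown"}
assert matches(sub, evt)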
3.4 Resource Dependencies
Defining resource dependencies is based on three relationships (see Fig. 6):
– The ownership relationship establishes an m-to-n association between producer agents (owners) and non-human resources (accessed entities).
– The location relationship creates: 1) an m-to-n association between human resources and places, 2) an m-to-n association between virtual resources and storage devices, and 3) an m-to-1 association between physical resources and places. A physical or virtual resource location is provided/modified by the owner. In contrast, as a collaborator's location can change over time, it is regularly computed by the Face Recognition Subsystem.
– The collaboration relationship establishes an m-to-n association between human resources and groups. An office schedule is associated with each person. Collaborators (producer agents) can attribute roles to individuals (consumer agents) or groups. These roles help to assign permissions for performing operations on the producer agents' resources, which also have associated usage restrictions in order to better control them.
State Information for Physical Resources
A physical resource published by the RAMS system is perceptible (see Fig. 7 state #2) by a consumer agent if the following conditions are true: 1) he is authorized to use it according to his role (C1), and 2) its technical characteristics satisfy his requirements (C2). In this case, the RAMS system relies on network functions (e.g., for a network printer) or application events (e.g., for a meeting room) to inform him whether the resource is present or not. Otherwise, the resource is imperceptible (see Fig. 7 state #1) to the consumer agent.
Fig. 6. Class Diagram of the RAMS System Key Entities
Fig. 7. State Diagram of Physical Resource Awareness Information
A physical resource is considered either: 1) present (see Fig. 7 state #3) whenever it is reachable, e.g., an online printer or a usable meeting room, or 2) absent otherwise (see Fig. 7 state #4), e.g., the printer is out of order or the meeting room is closed for maintenance. When a resource is present, the RAMS system is able to inform the consumer agent whether it is available or not. As some physical resources (e.g., a cluster) can be remotely exploited, resource availability mainly depends on the usage restrictions defined by the producer agent. However, when a physical resource is located in a restricted area (e.g., an office), the RAMS system relies on the Matchmaker to determine its availability. As shown in Fig. 2, the Matchmaker uses: 1) application events and 2) caught events to infer new knowledge. By means of the ownership and location relationships (see Fig. 6), the Matchmaker is able to determine whether a producer agent and a given resource are currently co-located. Moreover, several users (e.g., administrators) might be authorized to provide access to the same room. In fact,
the location of a room owner is required only if this room needs his presence to be open. This characteristic has to be specified when declaring the resource. Nevertheless, when a resource is located in a restricted area, the resource availability is not only inferred from: 1) the usage restrictions defined by the producer agent, and 2) the accordance between the resource and the owner's current location, but it also depends on the owner's availability. Thus, the RAMS system notifies the consumer agent that the resource is available (see Fig. 7 state #5) if the following conditions are true: 1) the current context (e.g., time) is in accordance with the resource usage restrictions (C3), e.g., the current time is within the time slots during which the printer can be used or the meeting room is free, 2) the owner is currently located in the room where the resource is located (C4), 3) the owner is available (C5), 4) the resource is functional, i.e., the printer is online and ready to print (C6), and 5) the resource is free (C7). Conditions C3, C4 and C7 are implicitly captured by the Resource Location and Face Recognition Subsystems, while C5 and C6 are explicitly captured by the users. Thus, the RAMS system allows the producer agent: 1) to declare himself either available or unavailable when he is working in a specific place (e.g., his office), and 2) to declare his resource unavailable when it is not functional (e.g., the printer has no toner). The Matchmaker also implements a more sophisticated approach to infer resource availability, which relies on the collaboration relationship among users. For instance, according to our scenario (cf. Section 3.1), if Mr. Brown is absent but Miss White declares herself available while working in his office, Mr. Brown's colleagues may access his devices located there. If one of these five conditions is false, the RAMS system notifies the consumer agent that the resource is unavailable (see Fig. 7 state #6). For instance, let us suppose that Mr. Brown owns a printer, which is located in his office. He declares that he is working in his office, but the Face Recognition Subsystem detects him within the secretary's office. Consequently, the Matchmaker will notify other collaborators that Mr. Brown's printer is temporarily unavailable.
State Information for Collaborators
Like physical resources, a user (producer or consumer agent) of the RAMS system is perceptible (see Fig. 8 state #2) by another if one of the following conditions is true: 1) they have a collaboration relationship (C1), e.g., they are members of a group, or 2) the collaborator (who observes) attributed roles to the perceived collaborator in order to authorize him to use his resources (C2). In this case, the RAMS system provides the latter with information about the presence or absence of the former. Otherwise, the collaborator is imperceptible (see Fig. 8 state #1). The RAMS system notifies two types of presence: 1) physical presence (see Fig. 8 state #3) indicates that the collaborator is currently present in the building, and 2) virtual presence (see Fig. 8 state #4) means that the collaborator is currently absent from the building, but he is logged into the RAMS system. Thus, at any moment his colleagues can initialize a communication/collaboration session with him via some tools.
Fig. 8. State Diagram of Collaborator Awareness Information
When the collaborator is present in the building, the RAMS system allows him to declare himself either: 1) unavailable (see Fig. 8 state #6) by explicitly activating the "do not disturb" mode (C3), or 2) available (see Fig. 8 state #5) by deactivating this mode (C4). Moreover, the collaborator might be available to some collaborators (e.g., boss) and unavailable to others (e.g., colleagues). The RAMS system notifies that the collaborator became absent, passing from state #6 to state #7 (see Fig. 8), if one of the following conditions is true: 1) the Face Recognition Subsystem detected that he left the building (C5), or 2) the collaborator activated the "invisible" mode (C6). The possible states of virtual resources are not detailed here, as they are similar to those of physical resources.
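The availability inference described above amounts to a conjunction of boolean conditions (C3 to C7 for a physical resource in a restricted area, combined with the state diagrams of Figs. 7 and 8). A simplified stand-in for the Matchmaker rule is sketched below; the data classes and field names are ours, and the sketch deliberately ignores the delegation case (Miss White acting for an absent owner) and the collaborator-side perceptibility conditions.

from dataclasses import dataclass

@dataclass
class Resource:
    location: str
    functional: bool
    in_use: bool
    open_hours: tuple  # (start_hour, end_hour), stand-in for usage restrictions

@dataclass
class Owner:
    current_location: str
    available: bool    # False when "do not disturb" is active

def physical_resource_available(res: Resource, owner: Owner, hour: int) -> bool:
    """Simplified Matchmaker rule for a resource located in a restricted area:
    C3 usage restriction satisfied, C4 owner co-located with the resource,
    C5 owner available, C6 resource functional, C7 resource free."""
    c3 = res.open_hours[0] <= hour < res.open_hours[1]
    c4 = owner.current_location == res.location
    c5 = owner.available
    c6 = res.functional
    c7 = not res.in_use
    return c3 and c4 and c5 and c6 and c7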
4 Conclusion and Future Work
In this paper, we have proposed some principles and mechanisms that manage and provide awareness information about physical and virtual resources in order to support distributed, nomadic and ubiquitous collaborative work. In this context, resources are part of an organizational environment where their state (e.g., presence and availability) evolves depending on their relations with other resources (e.g., ownership, location and collaboration). Moreover, collaborators are considered as nomadic human resources with whom their colleagues can establish working sessions that use physical and/or virtual resources. Such principles and mechanisms have been concretized in the design and implementation of the RAMS system, which allows collaborators to share different resources with others and, in a symmetric way, to discover and use some of these resources. Using a suitable Matchmaker, the RAMS system processes resource descriptions (in terms of technical characteristics, roles, usage restrictions) and inter-resource relations to provide collaborators with current and updated state information. Resource descriptions and inter-resource relations are based on an ontological approach in order to homogenize publications and subscriptions.
In order to offer such facilities, the RAMS system relies on a publish/subscribe architecture extended with two subsystems that manage dynamic information. The Resource Locator determines the set of the closest resources that are suitable, available and accessible for a given requester. The second subsystem stems from an important remark: examining related projects, we can see that the nomadic user remains a confusing and implicit notion, which is centered on the detection of his mobile devices. This approach is not enough to provide efficient resource management functions, since mobile devices cannot be considered as stuck to collaborators. Thus, the Human Face Recognizer can identify and locate collaborators even when they are separated from their devices. Improvements of this tool are still in development, especially to reduce confusing predictions that, for instance, come from someone's change in appearance (e.g., hat or moustache). We also plan to develop a third subsystem able to detect mobile devices in order to provide a more complete location function. In fact, locating collaborators implies privacy and intrusion problems, which can be treated by considering the users as resources.
References
1. Adjie, W., Schwartz, E., Balakrishnan, H., Lilley, J.: The Design and Implementation of an Intentional Naming System. In: 17th ACM Symposium on Operating System Principles, pp. 186–201. ACM Press, Charleston (1999)
2. Bettstetter, C., Renner, C.: A Comparison of Service Discovery Protocols and Implementation of the Service Location Protocol. In: 6th EUNICE Open European Summer School, Twente, pp. 1–8 (2000)
3. Chang, C., Lin, C.: LIBSVM: A Library for Support Vector Machines. National Taiwan University (2011)
4. Chang, C., Sahoo, P.K., Lee, S.: A Location-Aware Routing Protocol for the Bluetooth Scatternet. Wireless Personal Communications 40(1), 117–135 (2007)
5. Edwards, W.K.: Discovery Systems in Ubiquitous Computing. IEEE Pervasive Computing 5(2), 70–77 (2006)
6. Eugster, P., Felber, P., Guerraoui, R., Kermarrec, A.: The Many Faces of Publish/Subscribe. ACM Computing Surveys 35(2), 114–131 (2003)
7. Herborn, S., Lopez, Y., Seneviratne, A.: A Distributed Scheme for Autonomous Service Composition. In: 1st ACM International Workshop on Multimedia Service Composition, pp. 21–30. ACM Press, Singapore (2005)
8. Hernandez, B., Olague, G., Hammoud, R., Trujillo, L., Romero, E.: Visual Learning of Texture Descriptors for Facial Expression Recognition in Thermal Imagery. Computer Vision and Image Understanding 106(2), 258–269 (2007)
9. Jamalipour, A.: The Wireless Mobile Internet: Architectures, Protocols and Services. John Wiley & Sons, New York (2003)
10. Jeronimo, M., Weast, J.: UPnP Design by Example: A Software Developer's Guide to Universal Plug and Play. Intel Press (2003)
11. Markarian, A., Favela, J., Tentori, M., Castro, L.A.: Seamless Interaction Among Heterogeneous Devices in Support for Co-located Collaboration. In: Dimitriadis, Y.A., Zigurs, I., Gómez-Sánchez, E. (eds.) CRIWG 2006. LNCS, vol. 4154, pp. 389–404. Springer, Heidelberg (2006)
12. Newmarch, J.: Foundation of Jini 2 Programming. Apress Inc., New York (2006)
Comparison between Data Mining Algorithms Implementation

Yas A. Alsultanny
College of Graduate Studies-Arabian Gulf University
Kingdom of Bahrain
[email protected]
Abstract. Data Mining (DM) is the science of extracting useful information from huge amounts of data. Data mining methods such as Naïve Bayes, Nearest Neighbor and Decision Tree are tested. The implementation of the three algorithms showed that the Naïve Bayes algorithm is effective when the data attributes are categorized, and it can be used successfully in machine learning. The Nearest Neighbor is most suitable when the data attributes are continuous or categorized. The last algorithm tested is the Decision Tree, a simple predictive algorithm implemented by using simple rule methods in data classification. Each of the three algorithms can be implemented successfully and efficiently after studying the nature of the database according to its size, attributes, continuity and repetition. The success of a data mining implementation depends on the completeness of the database, represented by the data warehouse, which must be organized according to the important characteristics of a data warehouse.
Keywords: Data mining, knowledge discovery, Nearest Neighbour, Naïve Bayes, Decision Tree.
1 Introduction
Data mining is one of the important methods used in decision making by transforming data from different resources, such as a data warehouse, into extracted knowledge. In this section an overview of data mining is introduced to show its basic principles. Data mining is a process that is used to identify hidden, unexpected patterns or relationships in large quantities of data. Historically, the notion of finding useful patterns in data has been given a variety of names, including data mining, knowledge extraction, information discovery, information harvesting, data archaeology, and data pattern processing. The term data mining has mostly been used by statisticians, data analysts and the Management Information Systems (MIS) communities. It has also gained popularity in the database field. The phrase knowledge discovery in databases was coined at the first KDD workshop to emphasize that knowledge is the end product of a data-driven discovery. It has been popularized in the artificial intelligence and machine-learning fields. Fig. 1 shows an overview of the data mining and KDD process [1].
Data mining predicts future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. It moves beyond the analysis of past events provided by the retrospective tools typical of decision support systems, answering questions that traditionally were too time-consuming to resolve. Data mining scours databases for hidden patterns, finding predictive information that experts might overlook because it falls outside their expectations.
Fig. 1. An overview of the data mining and KDD process
The two high-level primary goals of data mining in practice tend to be prediction and description. Prediction involves using some variables or fields in the database to predict unknown or future values of other variables of interest, while description focuses on finding human-interpretable patterns describing the data. Although the boundaries between prediction and description are not sharp (some of the predictive models can be descriptive, to the degree that they are understandable, and vice versa), the distinction is useful for understanding the overall discovery goal. The goals of prediction and description can be achieved using a variety of particular data-mining methods:
• Classification
• Regression
• Clustering
• Summarization
• Dependency modeling
2 Data Warehouse

A data warehouse is a repository of subject-oriented historical data that is organized to be accessible in a form readily acceptable for analytical processing activities (such as data mining, decision support querying, and other applications) [2]. The major benefits of a data warehouse are:
• The ability to reach data quickly, since they are located in one place.
• The ability to reach data easily and frequently by end users with Web browsers.
The characteristics of data warehousing are:
• Organization. Data are organized by subject.
• Consistency. In the warehouse, data are coded in a consistent manner.
• Time variant. The data are kept for many years so they can be used for trends, forecasting, and comparisons over time.
• Non-volatile. Once entered into the warehouse, data are not updated.
• Relational. Typically the data warehouse uses a relational structure.
• Client/server. The data warehouse uses the client/server architecture mainly to provide the end user easy access to its data.
• Web-based. Data warehouses are designed to provide an efficient computing environment for Web-based applications.
• Integration. Data from various sources are integrated.
• Real time. Most applications of data warehousing are not in real time, but it is possible to arrange for real-time capabilities.
It is important to note that if the data warehouse is organized with the above characteristics, the implementation of data mining using different methods will have a high degree of fidelity and can be used in decision making.
3 Data Mining Technology

Data mining is an important information technology used to identify significant data from vast amounts of records. In other words, it is the process of exposing important hidden patterns in a set of data [3]. The large volume of data and the complexity of problem solving inspire research in data mining and modern heuristics. Data mining (i.e., knowledge discovery) is the process of automating information discovery. It is the process of analyzing data from different perspectives, summarizing it into useful information, and finding different patterns (e.g., classification, regression, and clustering). Many problems are difficult to solve analytically in a feasible time. Therefore, researchers try to find search techniques or heuristics that obtain a good enough or satisfactory solution in a reasonable time [4], [5].
4 Database Preparation

The database of transactions can be stored in a variety of formats, depending on the nature of the data. It can be a relational database or a flat file. We call this a dataset. A typical dataset can be stored in a bitmap format; the data shown in Fig. 2 can be mapped into a bitmap, and this data pre-processing step reduces the memory needed to store the data in a data warehouse. The original data needs 37 bytes, while the bitmap format needs only 11.2 bytes in this case, so the compressed size is 11.2/37 = 30.3% of the original. Representing the data as a bitmap therefore gives faster and easier access while reducing the memory required to store the database.
id   items          item   1  2  3  4  5  6  7  8  9  10
1    a,b,e,g,h      a      1  0  1  1  1  1  0  1  1  0
2    c,e            b      1  0  1  1  0  1  1  0  0  0
3    a,b,d,f        c      0  1  0  0  1  1  0  0  1  0
4    a,b,e,f,h      d      0  0  1  0  1  0  0  1  1  1
5    a,c,d,g        e      1  1  0  1  0  0  0  1  0  0
6    a,b,c,f,h      f      0  0  1  1  0  1  0  0  0  0
7    b,h,x          g      1  0  0  0  1  0  0  0  0  0
8    a,d,e          h      1  0  0  1  0  1  1  0  0  1
9    a,c,d,x        x      0  0  0  0  0  0  1  0  1  0
10   d,h

Horizontal = 37 bytes                Bitmap = (9 × 10)/8 = 11.2 bytes

Fig. 2. Dataset mapping in bitmap format

a. Original dataset (uncompressed memory = 4 × 11 = 44)

Student ID   Name    Course code   Course ID
10100        Suha    CS            90011
10101        Suha    CS            90011
10102        Muna    IS            90012
10103        Muna    IS            90012
10104        Salwa   CN            90013
10105        Salwa   CN            90013
10106        Mery    IT            90014
10107        Mery    IT            90014
10108        Dana    DB            90015
10109        Dana    DB            90015

b. Arrays of data compression (compressed memory = 15 + 5 + 11 = 31)

Prefix:  Name    Course code   Course ID
         Suha    CS            90011
         Muna    IS            90012
         Salwa   CN            90013
         Mery    IT            90014
         Dana    DB            90015

Index:   1  3  5  7  10

Suffix (positions 0–10):  10100  10101  10102  10103  10104  10105  10106  10107  10108  10109  10110

Fig. 3. Data compression of the three arrays to store the itemsets
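As an illustration of this mapping, the following minimal sketch (our own, not the paper's implementation) rebuilds the Fig. 2 bitmap in Python. The byte counts assume, as the figure does, one byte per stored item in the horizontal layout and one bit per bitmap cell.

```python
# Map the Fig. 2 transaction dataset to a bitmap: one 0/1 flag per (item, transaction).

transactions = {
    1: "a,b,e,g,h", 2: "c,e", 3: "a,b,d,f", 4: "a,b,e,f,h", 5: "a,c,d,g",
    6: "a,b,c,f,h", 7: "b,h,x", 8: "a,d,e", 9: "a,c,d,x", 10: "d,h",
}

items = sorted({i for t in transactions.values() for i in t.split(",")})

# One row per item, one column per transaction id (insertion order 1..10).
bitmap = {
    item: [1 if item in t.split(",") else 0 for t in transactions.values()]
    for item in items
}

horizontal_bytes = sum(len(t.split(",")) for t in transactions.values())  # 37
bitmap_bytes = len(items) * len(transactions) / 8                         # 11.25

print(horizontal_bytes, round(bitmap_bytes, 1))   # 37 11.2
print(bitmap["a"])                                # [1, 0, 1, 1, 1, 1, 0, 1, 1, 0]
```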
5 Compression by Data Iteration

The compressed data structure is based on three arrays, as shown in Fig. 3. At each iteration k, the first array (prefix) stores the distinct prefixes of length (k−1). In the third array (suffix) all the length-1 suffixes are stored. Finally, for each element i, the second array (index) stores the position in the suffix array of the section of suffixes that share the same prefix. Therefore, when the itemsets in the collection have to be enumerated, we first access the prefix array. Then, from the corresponding entry in the index array, we get the section of suffixes stored in the suffix array that is needed to complete the itemsets. The data in Fig. 3 consist of four columns. In these data there is more than one instance that occurs more than once. The only column that does not have repeated data is column one; therefore, this is the column excluded from compression, because it is usually the key attribute of the database. Thus, our principle is to compress data that have frequently repeated instances. The initial step is to identify the columns within the dataset that have repeated entries, and once these columns are identified the compression process starts. The compressed size is 31/44 = 70.5% of the original.
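The array construction described above can be sketched as follows (our illustration, not the paper's code). It assumes the index array records the end position of each prefix's section in the suffix array; with the ten rows recoverable from Fig. 3a the last index is 9, whereas Fig. 3b lists 10 because its suffix array holds an extra entry (10110).

```python
# Prefix/index/suffix compression of the Fig. 3 dataset. The key column
# (Student ID) is left uncompressed and stored in the suffix array.

records = [
    (10100, ("Suha", "CS", "90011")), (10101, ("Suha", "CS", "90011")),
    (10102, ("Muna", "IS", "90012")), (10103, ("Muna", "IS", "90012")),
    (10104, ("Salwa", "CN", "90013")), (10105, ("Salwa", "CN", "90013")),
    (10106, ("Mery", "IT", "90014")), (10107, ("Mery", "IT", "90014")),
    (10108, ("Dana", "DB", "90015")), (10109, ("Dana", "DB", "90015")),
]

prefix, index, suffix = [], [], []
for student_id, repeated_part in records:
    suffix.append(student_id)                # key column, stored in full
    if not prefix or prefix[-1] != repeated_part:
        prefix.append(repeated_part)         # store each distinct prefix once
        index.append(len(suffix) - 1)        # start a new suffix section
    else:
        index[-1] = len(suffix) - 1          # extend the section's end position

print(prefix)   # [('Suha', 'CS', '90011'), ('Muna', 'IS', '90012'), ...]
print(index)    # [1, 3, 5, 7, 9] for these ten records
print(suffix)   # [10100, 10101, ..., 10109]
```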
6 Data Mining

Data mining involves the use of sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. These tools can include statistical models, mathematical algorithms, and machine learning methods (algorithms that improve their performance automatically through experience, such as neural networks or decision trees). Consequently, data mining consists of more than collecting and managing data; it also includes analysis and prediction. Data mining is the process of sorting through large amounts of data and picking out relevant information. It is usually used by business intelligence organizations and financial analysts, but it is increasingly used in the sciences to extract information from the enormous data sets generated by modern experimental and observational methods. It has been described as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" and "the science of extracting useful information from large data sets or databases" [6], [7], [8], [9].
7 Database Classifications

Three methods of data mining classification will be tested and compared:
1. Naïve Bayes, which can be used when all the attributes are categorical [11].
2. Nearest Neighbor, which can be used when all the attributes are continuous.
3. Decision Tree, a predictive model that can be viewed as a tree.
7.1 Naïve Bayes Classifiers

The data used in the comparison are bus arrivals. The probability of an event, e.g. that the 6.30 p.m. bus from the city centre to your local station arrives on
time, is a number from 0 to 1 inclusive, with 0 indicating 'impossible' and 1 indicating 'certain'. A probability of 0.7 implies that if we conducted a long series of trials, e.g. recording the arrival of the 6.30 p.m. bus day by day for N days, the bus would be expected to be on time on 0.7 × N days. The longer the series of trials, the more reliable this estimate is likely to be.

Example. For the bus example, suppose there are four mutually exclusive and exhaustive events:
E1 – Bus switched.
E2 – Bus ten minutes or more late.
E3 – Bus less than ten minutes late.
E4 – Bus on time or early.

The probability of an event is usually indicated by a capital letter P, so we might have
P(E1) = 0.04
P(E2) = 0.17
P(E3) = 0.21
P(E4) = 0.58
Each of these probabilities is between 0 and 1 inclusive, as it has to be to qualify as a probability. They also satisfy a second important condition: the sum of the four probabilities has to be 1, because precisely one of the events must always occur. In this case
P(E1) + P(E2) + P(E3) + P(E4) = 1                                        (1)
Generally we are not in a position to know the true probability of an event occurring. To do so for the bus example we would have to record the bus's arrival time for all possible days on which it is scheduled to run, then count the number of times events E1, E2, E3 and E4 occur and divide by the total number of days to give the probabilities of the four events. In practice this is often prohibitively difficult or impossible to do, especially if the trials may potentially go on forever. Instead we keep records for a sample of, say, 365 days, count the number of times E1, E2, E3 and E4 occur, divide by 365 (the number of days) to give the relative frequency of the four events, and use these as estimates of the four probabilities. The outcome of each trial is recorded in one row of a table. Each row must have one and only one classification. For classification tasks, the usual terminology is to call a table (dataset) such as Table 1 a training set. Each row of the training set is called an instance. An instance comprises the values of a number of attributes and the corresponding classification. The training set constitutes the results of a sample of trials that we can use to predict the classification of other (unclassified) instances. Suppose that our training set consists of 24 instances, each recording the value of four attributes as well as the classification. We will use the classifications switched, very late, late and on time to correspond to the events E1, E2, E3 and E4 described previously.
Table 1. The buses dataset

No.  Day       Season  Wind    Rain    Class
1    Weekday   Spring  None    Low     On Time
2    Weekday   Autumn  None    Medium  On Time
3    Weekday   Winter  None    High    Late
4    Thursday  Summer  None    Low     On Time
5    Weekday   Winter  High    High    Very Late
6    Friday    Summer  High    Low     Late
7    Holiday   Winter  Normal  High    On Time
8    Weekday   Autumn  High    Low     On Time
9    Weekday   Winter  None    High    Late
10   Thursday  Spring  Normal  Low     On Time
11   Holiday   Autumn  Normal  Low     Switched
12   Weekday   Winter  High    High    Very Late
13   Friday    Spring  Normal  Low     On Time
14   Weekday   Winter  None    High    Late
15   Weekday   Autumn  High    High    Very Late
16   Weekday   Spring  None    Low     On Time
17   Friday    Summer  None    Low     On Time
18   Weekday   Spring  None    Medium  On Time
19   Thursday  Summer  None    Low     On Time
20   Weekday   Summer  Normal  Low     On Time
21   Weekday   Autumn  None    Low     On Time
22   Thursday  Spring  Normal  High    Very Late
23   Weekday   Summer  None    Low     On Time
24   Weekday   Autumn  High    Low     Late
Table 2. Conditional and prior probabilities

                   class=On Time   class=Late    class=Very Late   class=Switched
day=Weekday        8/14 = 0.57     4/5 = 0.8     3/4 = 0.75        0/1 = 0
day=Friday         2/14 = 0.14     1/5 = 0.2     0/4 = 0           0/1 = 0
day=Holiday        1/14 = 0.07     0/5 = 0       0/4 = 0           1/1 = 1
day=Thursday       3/14 = 0.21     0/5 = 0       1/4 = 0.25        0/1 = 0
season=Spring      5/14 = 0.36     1/5 = 0.2     1/4 = 0.25        0/1 = 0
season=Winter      1/14 = 0.07     2/5 = 0.4     2/4 = 0.5         0/1 = 0
season=Summer      5/14 = 0.36     0/5 = 0       0/4 = 0           0/1 = 0
season=Autumn      3/14 = 0.21     1/5 = 0.2     1/4 = 0.25        1/1 = 1
wind=None          9/14 = 0.64     0/5 = 0       0/4 = 0           0/1 = 0
wind=High          1/14 = 0.07     3/5 = 0.6     3/4 = 0.75        0/1 = 0
wind=Normal        4/14 = 0.28     1/5 = 0.2     1/4 = 0.25        1/1 = 1
rain=Low           11/14 = 0.78    1/5 = 0.2     0/4 = 0           1/1 = 1
rain=Medium        2/14 = 0.14     0/5 = 0       0/4 = 0           0/1 = 0
rain=High          1/14 = 0.07     3/5 = 0.6     4/4 = 1           0/1 = 0
Prior probability  14/24 = 0.58    5/24 = 0.21   4/24 = 0.17       1/24 = 0.04
For the buses data we can tabulate all the conditional and prior probabilities as shown in Table 2. For example, the conditional probability P(day = weekday | class = on time) is the number of instances in the buses dataset for which day = weekday and class = on time, divided by the total number of instances for which class = on time. These numbers can be counted from Table 1 as 8 and 14, respectively, so the conditional probability is 8/14 = 0.57. The prior probability of class = very late is the number of instances in Table 1 for which class = very late divided by the total number of instances, i.e. 4/24 = 0.17. We can now use these values to calculate the probabilities of real interest to us. These are the posterior probabilities of each possible class occurring for a specified instance, i.e. for known values of all the attributes. We can calculate these posterior probabilities using Naïve Bayes classification. When using the Naïve Bayes method to classify a series of unseen instances, the most efficient way to start is by calculating all the prior probabilities and also all the conditional probabilities involving one attribute, though not all of them may be required for classifying any particular instance. Using the values in each of the columns of Table 2 in turn, we obtain the following probabilities for each possible classification for the unseen instance:

Day      Season  Wind  Rain  Class
Weekday  Winter  High  High  Very Late (derived below)
Class = on time: 0.58 × 0.57 × 0.07 × 0.07 × 0.07 = 0.0001
Class = late: 0.21 × 0.8 × 0.4 × 0.6 × 0.6 = 0.0240
Class = very late: 0.17 × 0.75 × 0.5 × 0.75 × 0.75 = 0.0360
Class = switched: 0.04 × 0 × 0 × 0 × 0 = 0.0000

The largest value is for class = Very Late, so this is the classification selected for the unseen instance. Note that the four values calculated are not themselves probabilities, as they do not sum to 1; they are only proportional to the posterior probabilities. Each value can be normalized to a valid posterior probability simply by dividing it by the sum of all four values. In practice, we are interested only in finding the largest value, so the normalization step is not necessary. The Naïve Bayes approach is a very popular one, which often works well. However, it has a number of potential problems, the most obvious being that it relies on all attributes being categorical. In practice, many datasets have a combination of categorical and continuous attributes, or even only continuous attributes. This problem can be overcome by converting the continuous attributes to categorical ones. A second problem is that estimating probabilities by relative frequencies can give a poor estimate if the number of instances with a given attribute/value combination is small. In the extreme case where it is zero, the posterior score will inevitably be calculated as zero, as happened here for class = switched.
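The scoring step above is easy to mechanise. The following is a minimal sketch (our illustration, not the paper's implementation), using the prior probabilities and the conditional probabilities from the worked calculation:

```python
# Naïve Bayes scoring for the unseen instance Weekday/Winter/High/High.
priors = {"On Time": 0.58, "Late": 0.21, "Very Late": 0.17, "Switched": 0.04}

# Conditional probabilities taken from the worked calculation above
# (only the entries needed for this unseen instance are shown).
cond = {
    "On Time":   {"day=Weekday": 0.57, "season=Winter": 0.07, "wind=High": 0.07, "rain=High": 0.07},
    "Late":      {"day=Weekday": 0.80, "season=Winter": 0.40, "wind=High": 0.60, "rain=High": 0.60},
    "Very Late": {"day=Weekday": 0.75, "season=Winter": 0.50, "wind=High": 0.75, "rain=High": 0.75},
    "Switched":  {"day=Weekday": 0.00, "season=Winter": 0.00, "wind=High": 0.00, "rain=High": 0.00},
}

unseen = ["day=Weekday", "season=Winter", "wind=High", "rain=High"]

scores = {}
for cls, prior in priors.items():
    score = prior
    for attribute_value in unseen:
        score *= cond[cls][attribute_value]   # multiply prior by each conditional
    scores[cls] = score

print(max(scores, key=scores.get))            # 'Very Late'
```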
7.2 Nearest Neighbor Classification

Nearest Neighbour classification is mainly used when all attribute values are continuous, although it can be modified to deal with categorical attributes. The idea is to estimate the classification of an unseen instance using the classification of the instance or instances that are closest to it, in some sense that we need to define. Suppose we have a training set with just two instances, such as that shown in Table 3. There are six attribute values, followed by a classification (positive or negative). We are then given a third instance.

Table 3. Training set for two instances

a    b    c    d   e    f     Class
yes  no   no   11  100  low   negative
yes  yes  yes  25  200  high  positive

yes  no   no   22  180  high  ???
What should its classification be? Even without knowing what the six attributes represent, it seems intuitively obvious that the unseen instance is nearer to the second instance than to the first. In practice there are likely to be many more instances in the training set, but the same principle applies. It is usual to base the classification on those of the k nearest neighbours (where k is a small integer such as 3 or 5), not just the nearest one. The method is then known as k-Nearest Neighbour or just k-NN classification, as follows.

Basic k-Nearest Neighbour Classification Algorithm
• Find the k training instances that are closest to the unseen instance.
• Take the most commonly occurring classification for these k instances.
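A minimal sketch of this basic k-NN procedure, using Euclidean distance for two-attribute instances, is given below (our illustration, not the paper's code; the function name and the choice of distance measure are assumptions):

```python
from collections import Counter
import math

def knn_classify(training, unseen, k=3):
    """training: list of ((x, y), label) pairs; unseen: an (x, y) point."""
    # Sort training instances by Euclidean distance to the unseen instance.
    by_distance = sorted(training, key=lambda inst: math.dist(inst[0], unseen))
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote among the k nearest neighbours.
    return Counter(nearest_labels).most_common(1)[0][0]

# Illustrative call for the unseen instance (28, 300) discussed below;
# training_set would hold the 25 (Mat. Order, Mat. Value) points of Table 4.
# knn_classify(training_set, (28, 300), k=3)
```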
We can illustrate k-NN classification diagrammatically when the dimension (i.e. the number of attributes) is small. The following example illustrates the case where the dimension is just 2. In real-world data mining applications it can of course be considerably larger. Table 4 shows a training set with 25 instances, each giving the values of two attributes, Material Order and Material Value, and an associated classification. How can we estimate the classification for an 'unseen' instance where the first and second attributes are 28 and 300, respectively? For this small number of attributes we can represent the training set as 25 points on a two-dimensional graph, with values of the first and second attributes measured along the horizontal and vertical axes, respectively. Each point is labeled with a + or − symbol to indicate that the classification is positive or negative, respectively. The third attribute (the class, negative or positive) is determined from the nearest neighbours' classifications, taking the one with the highest number of occurrences. On this basis the class turns out to be positive, as illustrated in Fig. 4.
Table 4. Training set for material data

No.  Attribute (1) for Mat. Order  Attribute (2) for Mat. Value  Class
1    1                             773                           −
2    2                             200                           +
3    3                             100                           −
4    4                             200                           −
5    6                             100                           −
6    7                             200                           −
7    8                             100                           −
8    9                             100                           −
9    10                            110                           −
10   11                            773                           +
11   12                            120                           −
12   13                            125                           −
13   14                            115                           −
14   15                            130                           −
15   16                            100                           −
16   17                            95                            −
17   21                            80                            +
18   22                            80                            +
19   23                            80                            +
20   24                            80                            +
21   25                            80                            +
22   26                            240                           +
23   27                            240                           +
24   28                            240                           +
25   29                            800                           +
Fig. 4. Two-dimensional representation of training data in Table 4
A circle has been added to enclose the three nearest neighbors of the unseen instance, which is shown as a small circle close to the centre of the larger one.

7.3 Decision Trees for Classification

A decision tree is a predictive model that, as its name implies, can be viewed as a tree. Specifically, each branch of the tree is a classification question and the leaves of the tree are partitions of the dataset with their classification. A decision tree is a classifier expressed as a recursive partition of the instance space. A decision tree consists of nodes that form a rooted tree, meaning it is a directed tree with a node called the root that has no incoming edges. All other nodes have exactly one incoming edge. A node with outgoing edges is called an internal or test node. All other nodes are called leaves (also known as terminal or decision nodes). In a decision tree, each internal node splits the instance space into two or more subspaces according to a certain discrete function of the input attribute values. In the simplest and most frequent case each test considers a single attribute, so that the instance space is partitioned according to the attribute's value. In the case of numeric attributes the condition refers to a range.

Table 5. Training set for the IT Dept/TM Dept example

No.  Title       Married  National  Position    Class
1    Employee    Yes      No        Engineer    IT Dept.
2    Contractor  Yes      Yes       Engineer    IT Dept.
3    Employee    No       Yes       Engineer    TM Dept.
4    Employee    No       Yes       Engineer    IT Dept.
5    Employee    No       Yes       Technician  IT Dept.
6    Contractor  Yes      Yes       Technician  IT Dept.
7    Employee    Yes      No        Engineer    IT Dept.
8    Employee    Yes      Yes       Technician  TM Dept.
9    Contractor  No       Yes       Technician  IT Dept.
10   Contractor  No       Yes       Technician  IT Dept.
11   Contractor  No       Yes       Engineer    IT Dept.
12   Employee    No       Yes       Technician  TM Dept.
13   Employee    Yes      Yes       Technician  TM Dept.
14   Employee    Yes      No        Engineer    IT Dept.
15   Employee    Yes      No        Engineer    IT Dept.
16   Employee    No       Yes       Technician  TM Dept.
17   Contractor  No       Yes       Technician  IT Dept.
18   Contractor  No       Yes       Technician  IT Dept.
19   Employee    No       Yes       Engineer    TM Dept.
20   Employee    Yes      Yes       Technician  TM Dept.
Fig. 5. IT Dept and TM Dept example: decision tree
Each leaf is assigned to one class representing the most appropriate target value. Usually the most appropriate target value is the class with the greatest representation, because selecting this value minimizes the zero-one loss. However, if a different loss function is used then a different class may be selected in order to minimize the loss function. Alternatively, the leaf may hold a probability vector indicating the probability of the target value having a certain value. A decision tree is created by a process known as splitting on the value of attributes (or just splitting on attributes), i.e. testing the value of an attribute such as CTRYNAME and then creating a branch for each of its possible values. In the case of continuous attributes the test is normally whether the value is 'less than or equal to' or 'greater than' a given value known as the split value. The splitting process continues until each branch can be labeled with just one classification. Decision trees have two different functions: data compression and prediction. Table 5 gives a training set of data collected about 20 employees, tabulating four items of data about each one (title, marital status, nationality and position) against the department joined. What determines who joins which department? It is possible to generate many different trees from this data using the Top Down Induction of Decision Trees (TDIDT) algorithm [11]. One possible decision tree is shown in Fig. 5. This is a remarkable result. All the contractors work in the IT Department. For the direct employees, the critical factor is position: if they are engineers, they all work in the IT Department; if they are technicians, they work in the TM Department. All non-nationals are direct employees and work in the IT Department.
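The tree described above can be read off as a small set of rules. The following sketch is our illustration (not the paper's code) of one such reading; it reproduces the department assignments stated in the text rather than every individual row of Table 5:

```python
# Rules read off the Fig. 5 decision tree as described in the text.
def classify(title, position):
    if title == "Contractor":
        return "IT Dept."          # all contractors are in the IT Department
    if position == "Engineer":     # direct employees split on position
        return "IT Dept."
    return "TM Dept."              # direct-employee technicians are in TM

print(classify("Contractor", "Technician"))  # IT Dept.
print(classify("Employee", "Engineer"))      # IT Dept.
print(classify("Employee", "Technician"))    # TM Dept.
```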
8 Conclusions

Classification is one of the most common data mining tasks. Different data mining methods were applied to the data prepared for processing; data compression was used as a pre-processing method. The three techniques, Naïve Bayes, nearest neighbor and
decision tree, are implemented in this paper with different resources of data. The Naïve Bayes algorithm calculated the most likely classification for the buses dataset and the probability for an unknown classification. This algorithm can be an effective learning algorithm for machine learning, similar to the example introduced in this paper, where bus arrival is predicted from classified historical data. The Nearest Neighbor technique is a prediction technique quite similar to clustering; it compares the instance with its neighbours to determine an unknown classification, taking the classification with the highest number of occurrences. This algorithm is most suitable when all the attribute values are continuous or categorical; the predicted value is the one taken from its neighbours. Finally, the decision tree technique has two different functions: data compression and prediction. It is popular for classification and prediction; it classifies data by rules that can be expressed in English, so that they can be understood easily. The decision tree can give high accuracy in classifying the data at hand and establishing a pattern that is helpful in future predictions. The three different methods have been used with different sets of data; each one has its own dataset. The results show that the three methods are appropriate for determining the target value, but selecting the suitable method is very important to obtain correct results, and this selection depends mainly on the nature of the data types and their completeness, by knowing exactly what the data warehouse holds.
References

1. Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P.: From Data Mining to Knowledge Discovery: An Overview. In: Advances in Knowledge Discovery and Data Mining, pp. 1–30. AAAI Press, Menlo Park (1996)
2. Turban, E., Leidner, D., McLean, E., Wetherbe, J.: Information Technology for Management, 7th edn. John Wiley and Sons, Chichester (2010)
3. Yu, S.C., Chen, R.S.: Developing an XML Framework for an Electronic Document Delivery System. The Electronic Library 19(2), 102–108 (2001)
4. Sörensen, K., Janssens, G.: Data Mining with Genetic Algorithms on Binary Trees. European Journal of Operational Research 151, 253–264 (2003)
5. Ghosh, A., Nath, B.: Multi-Objective Rule Mining using Genetic Algorithms. Information Sciences: An International Journal 163(1-3), 123–133 (2004)
6. Frawley, W., Piatetsky-Shapiro, G., Matheus, C.: Knowledge Discovery in Databases: An Overview. AI Magazine, 213–228 (1992); ISSN 0738-4602
7. Hand, D., Mannila, H., Smyth, P.: Principles of Data Mining. MIT Press, Cambridge (2001)
8. Olson, D., Delen, D.: Advanced Data Mining Techniques. Springer, Heidelberg (2008)
9. Sharma, S., Osei-Bryson, K.-M.: Framework for Formal Implementation of the Business Understanding Phase of Data Mining Projects. Expert Systems with Applications 36(2), 4114–4124 (2009)
10. Bhargavi, P., Jyothi, S.: Applying Naive Bayes Data Mining Technique for Classification of Agricultural Land Soils. International Journal of Computer Science and Network Security 9(8), 117–122 (2009)
11. Seyed Mousavi, R., Broda, K.: Impact of Binary Coding on Multiway-split TDIDT Algorithms. International Journal of Electrical, Computer, and Systems Engineering 2(3) (2008), http://www.waset.org
Role of ICT in Reduction of Poverty in Developing Countries: Botswana as an Evidence in SADC Region

Tiroyamodimo M. Mogotlhwane 1, Mohammad Talib 1, and Malebogo Mokwena 2

1 Department of Computer Science, University of Botswana
2 Barclays Bank of Botswana
[email protected], [email protected], [email protected]
Abstract. Information and communication technologies (ICTs) have penetrated even some of the poorest developing countries. These include the sudden increase in mobile phone use, the advent of the internet with its globalised social networking sites, information and communications technology (ICT) services, and the saturation of computerised content. Scholars and observers worldwide have sought to debate the roles of ICTs. Previous research focused on how ICTs impact the economic, social and cultural aspects of life. There is limited research on ICT services and content that focus on the poor, particularly those that encourage entrepreneurship as a means to achieve poverty reduction in developing countries. This paper uses secondary data and document analysis from Botswana, a member country of the Southern African Development Community (SADC), to find out how ICTs can be used in poverty reduction in developing countries.

Keywords: Information and Communication Technology (ICT); Poverty reduction; Internet; Media; Communication; SADC (Southern African Development Community).
1 Introduction

1.1 Definition of Information and Communication Technologies and Poverty

Information and Communication Technologies (ICTs)
Information and communication technologies have been advancing very fast. Today ICTs are generally regarded as the driving force behind the economy in every country. ICTs include all communication devices or applications and electronic network services, including hardware applied through network services [1], [2]. These include mobile phones, the internet, software systems, hardware, computing information services, multimedia, telephone, fax, and electronic news. Businesses, organisations and the commercial sectors depend heavily on this technology [3]. Some of the benefits of using technology in communications include faster and enhanced communication and the provision of efficient services. In many cities, ICTs are a source of employment for citizens (ICT professionals), providing faster
and easier working patterns, helping to define the role of organisations, simplifying the nature of work, and also helping to grow the country's economy through ICT adoption by SMEs [4], [5]. Public service, commercial and industrial organisations are using ICTs for purchasing, marketing, operations, customer profiles, supplier profiles, information exchanges, client contact, supplier contact and customer contact [6]. Today, many company directors and managers have taken advantage of e-commerce and e-business as new concepts in their day-to-day service delivery, to ensure that their services and business decisions are promoted to attract many customers efficiently [7]. Governments have also resorted to e-Governance to better offer their services to clients. Others have also adopted e-Democracy, especially in democratic states where governments want to allow participatory majorities through dialogues and chats as a means of communication at government and public forums [8]. At schools, ICTs are fused into education to enhance research and expand the quality of learning [9]. One of the major concerns has been the importance of ICT as a means to reduce poverty. As Duff has put it, "History shows how ICTs have developed over years: from the agricultural society, through the industrialised society and now to the information society" [10 p.354]. Many places now use ICTs. Today, the benefits of ICTs even extend to the economic status of each country, which is made up of a wider society. According to Duff, it is important to look at such areas where ICT has been developing, to find out whether society has been informed and whether it has benefited as a result of new technological developments [10]. Botswana, like many other countries worldwide, is not left behind in ICTs. In local cities, towns, villages and settlements there are some forms of ICT being utilised. The country has also introduced policies for expanding and enhancing ICT use. These policies are part of the liberalization of the national telecommunication plans to bring the government and the whole country into the global information age [11]. One significant effect of the policies is that ICT use in the country is growing fast due to the level of access rollout across the country.

Poverty
The debate on the definition of poverty and its measures has been on-going since the first half of the 20th century. Poverty has no generally accepted definition. For many people, poverty is a state of vulnerability which makes people susceptible to abuse and exploitation by those who have a better life. The old-school definition of poverty was premised upon Charles Booth's (cited in Davies 1978) invention of the "poverty line" [12]. According to this definition, a person whose income falls below the poverty datum line was regarded as poor. Booth's original work also showed that poverty is a social condition. By contrast, recent studies argue that poverty cannot be understood based on figures only. In other words, poverty is regarded as a multifaceted issue which cuts across all sectors of the economy. It has social, economic, political and cultural dimensions, which makes it a priority for policy makers worldwide [13]. According to the World Bank, poverty is defined as "the inability to attain a minimal standard of living" [14]. This definition was adopted by many countries over the years.
An overview of the economy with regard to poverty shows that there are some disparities between urban and rural centres across the world. Rural areas are hard hit by inequalities and high poverty levels. Most studies perceived a “poor” person to be
somebody who is unskilled, unqualified and has little power to make demands [15], [13]. However, Vandenberg's view is that, although some of the perceived causes of poverty can be related to intelligence, it is erroneous to equate poverty with low ability and character defects [16]. Vandenberg is of the view that people in the economically poor category would not fit such a description. Due to the levels of poverty in Botswana, the country is classified among the third-world and developing nations. Many such countries worldwide have since looked for means to fight poverty [17]. Some developmental goals have been set to raise the standard of living of citizens. These include the Millennium Development Goals, especially the provision of a good life to citizens. National leaders seem convinced that ICTs can be used in this area to counter the crises of poverty, especially the complex economic, educational and political situations, and even the other challenges facing the poor [2].

1.2 Research Questions

In order to investigate the role of ICT in reducing poverty, the following research questions were formulated to guide the study. The escalation of ICT use worldwide has been rapid; what has been the case with regard to ICT readiness and use in Botswana? What is the relevance of ICTs to society, and what is their impact in Botswana on indices of poverty reduction? Does ICT reduce socio-economic divisions between the rich and the poor?
2 Methodology

The study used a case study approach involving the use of multiple sources of evidence to understand a phenomenon [18]. Document and textual analysis, involving the use of data from reading and analysing reports, news and information, was sourced from the Ministry of Communications Science and Technology of the Botswana government, the Botswana Telecommunications Authority and local ICT service providers. News on the efforts by the state government and other local ICT service providers to enrich the citizenry regarding ICT access and policies was studied from online newspapers and other internet sources for evidence. Telephone interviews were conducted with senior personnel at Mascom Wireless and Orange Botswana (mobile phone and internet providers) and Botsnet (an internet service provider), who were asked to define their roles in rolling out ICTs in Botswana to help reduce poverty among affected members of society. The study was undertaken in March 2010.
3 Literature Review

There is limited literature on the role of ICTs in reducing poverty and promoting socio-economic development in developing nations [19], [20], [4]. Clarke and Englebright have attempted to define ICT as a basic skill, which includes computing technologies, domestic and commercial systems and equipment [4]. This paper supports the literature in holding that ICT covers the use of technology to handle information and aid communications, and that its main characteristic is that it keeps changing and improving for the better, with newer versions released from time to time.
Kelles-Viitanen concurs with the UNDP report that "using ICT in pursuit of developmental goals allows countries to achieve a wide diffusion of benefits from ICT, which, in the end, will benefit broad-based economic growth, too" [21 p.85], [22], [23]. In her report, Kelles-Viitanen mentions that ICTs can create employment opportunities for the poor, citing examples such as the Grameen Bank in Bangladesh, and other countries such as Malaysia and Taipei. In a report by the World Bank, information communication technologies are reported to have played an important role in the growth of the economy across many countries [24]. In their study, the researchers considered 'trade and the reduced transaction costs of business' and 'capital accumulation' as significant factors around ICTs and economic theory. Trade and the reduced transaction costs of business as a result of ICTs refer to the increased level of business, the increased variety of service-related activities, and efficient supply chains across borders. These factors, according to Grace et al., "have created new opportunities for large and small firms from developing countries to increase their sales range and tap into the global market for goods and services" [24 p.7]. Capital accumulation through the use of ICTs refers to the situation where finance networks become digital and expand. An example cited here is 'AutoBank E', a fully automated savings system which minimizes paperwork and transaction costs. This system has been developed and is intended for use by the poorest depositors in South Africa [25]. This simply increases the ability of the poor to access financial services. A study by Spence and Smith has also revealed that there has indeed been a boom in ICTs in many countries irrespective of their economic status, and their use is known to facilitate the expansion of markets, social businesses and public services [26]. Examples cited by Spence and Smith include the explosion of mobile phone use, internet communication and networking services, which enable banking systems and financial transactions, marketing and distribution, employment creation, and personal and public services [26]. While some of these can be equated to major economic impacts, the expectation is that they improve the personal well-being of individuals, thereby reducing and preventing poverty. Mobile phone service providers employ many people to serve as ICT shop managers, back-office staff, networking specialists, cashiers, and marketing and advertising agents, thereby adding to their wealth and improving their well-being. Mobile phone users are able to save money by utilising their cell phones instead of going to the banks for financial transactions, and their personal security is improved [27]. Another benefit cited by Spence and Smith is the communication and networking enabled by ICTs, as these have the potential to transform the economy of a country, even the poorer ones [26]. When connectivity is expanded to the poor through ICT services, they can get employment and be served better, faster and more efficiently through these networked services. It is now clear in many countries that information communication technologies are being utilised as instruments of government policy. ICTs have been used to create information-intensive activities to serve national goals and also serve as developmental opportunities for information-intensive industries [28].
Examples cited here include the impressive economic success of Singapore, Korea, Hong Kong and Taiwan. Many countries like India and Indonesia have used mass media technologies for nation-building purposes. In India, the SITE (Satellite Instructional Television Experiment) project, a satellite, was used to reach and educate remote communities, while in Indonesia satellite communications were used to
reach many people in the country's many islands (Morison cited in [28]). Other countries like Mauritius have developed several cycles of e-strategies as part of their broader national development programs, and others are already looking into the potential role of ICTs in developmental efforts to help reduce poverty among citizens [29]. Many developing countries have faced challenges in fighting health-related issues, including the HIV/AIDS scourge. In India, for example, the development of health-care databases, telemedicine, web-based initiatives, and health information systems are some ICT initiatives that have been adopted by the health system [30], [31]. Examples elaborated in this research include "the management of HIV programmes which requires data from various sources such as the mother, child and HIV-specific programmes" [30 p.268]. While the Indian health sector went through challenges at its initial stages, the results also proved that as ICT in India developed during those years, signs of serious rewards were emerging. Another example of ICT use is ICT as an enabler of education for Africa. With the call for education for all, governments have been committed to meeting the growing demand for the delivery of education services to their populations. ICTs have been placed at the centre of educational developments, especially in Africa [32].
4 Presentation of Secondary Data

4.1 ICT Infrastructure and Access in Botswana

Many of the Sub-Saharan countries fall in the low-income category. Botswana is counted among the countries regarded as middle-income due to its higher levels of per-capita telecommunications infrastructure, personal computers, internet hosts, telephone main lines, and mobile phones [29]. Compared to other countries in this spectrum, the economic performance of Botswana has a direct bearing on the state of education, infrastructure, health and services through the availability and affordability of ICTs for public, business and private use. The government of Botswana, through the Ministry of Communications Science and Technology, has established ICT tele-centres nationally, equipped with the necessary infrastructure for ICT-related businesses. At these centres, citizens, especially the youth, are provided with internet facilities, telephone, fax and other secretarial services on a daily basis. These tele-centres are under the care of district youth officers. As confirmed by Saboo in an email, at these centres the government wants to develop human resources, especially among the youth, that support the deployment and rehabilitation of modern ICT infrastructure [33]. Commercial developments, especially in rural areas, are also supported through tele-centres, and there is computer training, giving desktop skills to unemployed youth who could later get employment elsewhere. Also at these centres there are job advertisements and application forms for the national identity card (Omang) and passport, and one can obtain funding and school registration, etc. One area which has not been effectively impacted by ICT in Botswana is health services. Recently, there have been reports of introducing ICTs at health centres of hospitals and clinics to provide fast and modernised health services to citizens. The
internet nowadays is loaded with popular sites offering health services to online audience members. Specific health agencies like NACA (National AIDS Coordinating Agency), BOTUSA (BOTswana-USA), BOCAIP (BOtswana Christian AIDS Intervention Programme), BOFWA (BOtswana Family Welfare Association) and even the Ministry of Health provide all members of the public with information and advice on health issues through their websites. Other ICT services found in Botswana's health sector include free direct telephone services, and new hospitals like Bokamoso (http://www.bokamosohospital.com) have websites where patients from all walks of life can contact their medical doctors from time to time. At some local private clinics, medical records are kept in databases, and this is beneficial to all people since doctors can easily deal with patients by understanding their medical histories. Recently, in a survey report by BOPA (BOtswana Press Agency) in the Daily News, Mr Nick Ndaba (columnist) indicated that the newly introduced telemedicine in Botswana would help reduce the shortage of health professionals and extend health care resources in Botswana [34]. Mr Ndaba also mentioned that Botswana intends to implement a national telemedicine centre, and this will be carried out in stages. Local post offices are equipped with relevant ICT infrastructure to ease service delivery. Electronic mail, fax, electronic money transfer and internet services are provided. Significant efforts are being made by the government to make sure that citizens in rural areas utilise ICT through these post offices. Vehicle registration and licence fees can be paid at most of the post offices in the country. The government of Botswana has recently launched an e-governance service to ensure that its citizens are provided with information, which calls for public participation in national developments. Internet shopping, which is a new development in Botswana, is likely to widen the choice of goods, lower costs, offer better selection than physical shopping, and increase convenience as it expands [35]. An area where ICT use has the potential for significant impact in Botswana's environment is e-commuting. This is because urban areas are facing an acute shortage of accommodation, forcing property prices beyond the reach of many. A significant number of employees in Botswana commute long distances, and yet the transport sector is well behind in the application of ICT to improve e-commuting as practiced in developed countries, where a single ticket can be used on buses, trains and airplanes, a feature that serves the tourism sector well.

4.2 Level of ICT Use in Botswana

While ICT infrastructure in Botswana is among the best in Africa, not many Batswana (Botswana nationals), especially in rural areas, benefit directly from it. Several factors, including lack of skill to utilise ICT equipment and services and levels of acquisition relative to the economic standing of each household, are a hindrance to the use of ICTs in Botswana. Research has been made easy through the internet, and new teaching and learning methods, including the use of WebCT at the University of Botswana, have been a welcome development. In an interview, Mr Grant Son, General Manager of Botsnet, said his organisation has realised that not many Batswana have access to internet services; therefore they have decided to reduce the pricing for internet service provision.
They have embarked on a wide marketing campaign for this service, and other additional benefits to make sure
it is accessible to customers, including laptop vouchers as part of the package. Other mobile phone company managers interviewed (Orange and Mascom Botswana) mentioned that reducing prices for internet service at Botsnet has made it affordable to many Batswana, and other internet service providers are compelled by these low market prices to also reduce theirs. As researchers have put it, information communication technology provides the necessary method to pay for services, and through this, new job opportunities are created and labour-intensive duties become easier [36]. Civil servants and officers at private institutions and other organisations are utilising the computers, phones and fax machines provided in their offices to their benefit and to the benefit of society. With the use of the internet, through online and virtual communications, poor communities are assisted easily and quickly. Even during national disasters, citizens are given information faster through television, radio and telephones, and this helps them understand national issues. The advent of mobile telephony has transformed the communication landscape and added value to personal communication. This new form of communication has a number of advantages over fixed lines. Many cell phone models now have digital video and still cameras, radio, and internet capabilities, thus making the device a digital media hub. Users are not limited to any location as long as network coverage is provided by their service providers. As a consequence, users can make calls, send text and multimedia messages, chat, send email and voice mail, and play games, music and videos. The cell phone has rapidly transformed the lives of many individuals [37]. The device provides a sense of nearness in communication, and through its use it has positively impacted the social bonds between families and friends [38]. Mobile phone service providers support local artists by signing contracts to provide ringtones, caller tunes, etc. [39]. Through these services, even the less popular artists benefit through advertising, since they are able to reach over 1 million Mascom subscribers and sell their music to them through this platform. Through online and telephone banking services provided by some of the local banks, artists can now sell their products to customers easily. Studies have shown that many bank customers have resorted to these services, citing that they save both time and costs [40]. Although this sector may pose threats due to challenges of security and full access, such developments have been welcomed by many, who now see them as an important way to cut costs. There is no need to pay transportation fees to travel to banks, and the cost of using the banks' ICT services from anywhere is lower than that of getting the service directly at the banks. Also at the banks, "computers and communication systems provide instant information on the state of accounts and provide fast transfer of transactions between branches of the same bank and between different banks" [41 p.120]. In a survey sponsored by two of the commercial banks in Botswana (Barclays and Standard Chartered Bank), it was noted that cell phone-based remittance and banking services may be one way to extend the reach of financial services to the poor [42]. Online services requiring e-commerce are found on almost every commercial webpage. As the number of products sold on the web keeps increasing, the web becomes more heavily populated and internet commerce will rise [43].
The incentive for both users and hosts is that the services cater for everyone and are provided at cheaper prices.
Through e-governance, the government of Botswana has managed to push service delivery for the betterment of its citizens. The idea in many countries is to make sure that all government services are available electronically [44]. Most ministries and departments in Botswana now provide services through ICT infrastructure. E-passports are now provided at the Ministry of Home Affairs, with the aim of catching up with developed-world standards and also checking fraud. Such services are also extended to short message services, where clients of the ministry are sent messages to alert them that their passports are ready. The service is good for all, ensuring that customers do not have to keep coming to the ministry (losing a lot of money on transportation) to check if their passports are ready. ICTs are also available in Botswana for recreational purposes. This includes online radio broadcasts, computer games, webcasts, DVDs, and social networking through sites like http://www.facebook.com, http://www.twitter.com and http://www.myspace.com, which are essential for the youth as well as academics in Botswana. In fact, ICT-based entertainment is expanding. While studies have shown that youth from poor communities are vulnerable to criminal acts, recreational activities through ICTs will engage them and keep them away from illegal acts. Although low bandwidth currently limits advanced internet-based ICT applications, this will soon not be the case as the connection to the East Africa Submarine System (EASSy) begins to yield results [45]. This will improve further once Botswana has connected to the West Africa Cable System (WACS), which will further increase internet speed in Botswana [45].

4.3 Level of Readiness

A level of readiness refers to the degree to which the user is prepared and willing to make use of ICT. As Gasco-Hernandez, Equiza-Lopez and Acevedo-Ruz have put it, "Often the true value of ICT for poor people will reside in how their intermediaries – local government, public-service institutions like schools or clinics, nongovernmental organisations, community radio stations, and so forth – can use ICT to better address their individual needs" [46 p.xi]. One of the reasons why the poor cannot access ICTs fully is their limited access to technologies [2]. Internet use in Botswana is estimated at about 6%, a very low figure in comparison with European countries; however, this is a gradual increase from the figures of previous years [47]. The level of readiness is gradually increasing, since many young people are showing interest and the Ministry of Communications Science and Technology is busy making sure that citizens are given access through the rollout to post offices, youth centres and tele-centres. In many countries, especially in Europe, such tele-centres have been developed in rural areas to assist local groups to collect, manage, and disseminate information that other citizens need to live independently [48]. As part of the Botswana government's service to its people, incorporating ICTs must come second to a broader reform agenda considered on its own merits. In the process of introducing and implementing ICTs, acceptance by all key stakeholders is necessary; there should be identification of areas for reform, of system requirements, and of the need for ICTs. While the government of Botswana makes such efforts to roll out ICT
services through tele-centres and post offices, there is a need to monitor these projects and ensure that every member of society is guaranteed access. With these efforts in place to push for access for all, maximum impact is guaranteed, especially through service delivery [49]. Access should precede service rollout, and prioritization should be given to members of the rural community, marginalised groups and the poor.
5 Conclusions and Recommendations

As in many other developing states, the emergence of ICT in Botswana is proof that ICTs are vital to the lives of citizens. The picture also shows that, with the rate at which ICT equipment and services are booming, ICTs will soon reach many societies in Botswana. This will improve the economy by helping in the reduction of poverty and ensuring a stable political culture. However, the economic liberalization which swept through Africa in the late 1980s had the effect of stimulating the ICT use, ownership, acquisition and benefit among Botswana citizens that we see in the country today. Efforts by developed countries to globalize the rollout of ICT to developing countries have also helped introduce equal standards of ICT practice and a level of professionalism comparable to international standards. While ICTs have proven to be significant for the development of Botswana citizens, it has increasingly become important to realise that this cannot happen unless there is full access to the technologies. Some ICT gadgets that contribute to both the individual and society are too expensive for the poor to acquire. These include internet and telephone services. Some may need other resources like electricity, and also proper maintenance, for them to function efficiently. The lasting solution here may be that governments in developed countries should sell these infrastructures to developing countries at lower prices. The governments in developing countries should do their best to roll out this equipment and these services even to poor and rural areas for use by everyone. There should be a maximum level of access to ensure greater impact for everyone. It is an identified problem that most often the poor are marginalised in matters of education and learning. Efforts should be made to make sure that access to infrastructure is matched with training to ensure full and proper participation in, and utilisation of, the ICT infrastructure. Problems of unsavoury and illegal content distribution are very common with the use of ICTs. These may include pornography and gambling on the internet, especially by the poor and lower middle class, and attempts to control these have proven futile since providers can safely operate in offshore locations. In the case of Botswana such issues can be avoided by creating policies regulating the use of ICTs and models of acquisition. There is an urgent need to develop a Botswana cyber law to address the illegal activities that come with the increased use of internet-based ICT applications. This is because Botswana does not have a cyber law, and this limits the application of appropriate legislation by the courts when addressing cyber crime. Other issues pertinent to e-commerce and other financial services through ICTs should be monitored, and there should be security measures and adequate training (regarding use) by service providers to ensure that members of society are not vulnerable to misuse and fraud.
References
1. Adeya, N.: Information and communication technologies in development. In: Global Equity Conference, Rethinking ICTs in Africa, Asia, and Latin America, Heerlen/Maastricht, Netherlands (2002)
2. Lekoko, R., Morolong, B.: Poverty reduction through community-compatible ICTs: Examples from Botswana and other African countries. In: Gasco-Hernandez, M., Equiza-Lopez, F., Acevedo-Ruz, M. (eds.) Information Communication technologies and human development: Opportunities and challenges, pp. 116–137. Idea Group Publishing, Hershey (2007)
3. Gehris, D.O., Szul, F.L.: Communications Technologies. Prentice Hall, Upper Saddle River (2002)
4. Clarke, A., Englebright, L.I.: The new basic skill. Niace, Leicester (2003)
5. Ongori, H.: The role of information and communications technology adoption in SMEs: Evidence from Botswana. Research Journal of Information Technology 1(2), 79–85 (2009)
6. Doyle, S.: Information and Communication Technology: Vocational A level. Nelson Thornes Ltd., Cheltenham (2001)
7. Ray, R.: Technology solutions for growing businesses. Amacom, New York (2003)
8. Kalvet, T.: Management of Technology: The case of e-Voting in Estonia. International Conference on Computer Technology and Development (2009), http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=05360200 (December 10, 2010)
9. Russell, T.: Teaching and using ICT in secondary schools. David Fulton Publishers, London (2001)
10. Duff, A.: Impact of ICT on society. In: Lawson, J. (ed.) Information and communication technology: options. Pearson Education Limited, Essex (2002)
11. McDaniel, T.: Information technology policy and the digital divide: Lessons for developing countries. In: Kagami, M., Tsuji, M., Giovannetti, E. (eds.) Edward Elgar, Northampton (2004)
12. Davies, W.K.D.: Charles Booth and the Measurement of Urban Social Character. Area 10(4), 290–296 (1978)
13. Graaf, J.: Poverty and Development. Oxford University Press, Cape Town (2003)
14. World Bank: Poverty. World Development Report 1990. Oxford University Press (1990)
15. MacGregor, S.: The politics of poverty. Longman, London (1981)
16. Vandenberg, S.G.: Genetic factors in poverty. In: Allen, V.L. (ed.) Psychological Factors in Poverty. Markham Publishing Company, Chicago (1970)
17. Chrisanthi, A., Madon, S.: Development, self determination and information. In: Beardon, C., Whitehouse, D. (eds.) Computers and Society, Intellect, Oxford, pp. 120–137 (1993)
18. Wimmer, R., Dominick, J.: Mass media research: An introduction. Wadsworth Cengage Learning, Boston (2006)
19. Kagami, M., Tsuji, M., Giovannetti, E.: Information technology policy and the digital divide: Lessons for developing countries. Edward Elgar, Northampton (2004)
20. Munasinghe, M. (ed.): Computers and informatics in developing countries. Butterworths, London (1989)
21. Kelles-Viitanen, A.: The role of ICT in poverty reduction, pp. 82–94 (2003), http://www.etla.fi/files/895_FES_03_1_role_of_ict.pdf (March 20, 2010)
22. UNDP: Final Report of the Digital Opportunity Initiative (2001), http://www.markle.org/sites/default/files/doifinalreport.pdf (February 23, 2011)
23. Mathijsen, P.S.R.F.: Technology and industrial change: the role of the European community regional development policy. In: Knaap, B.V.D., Wever, E. (eds.) New technology and regional development, pp. 108–118. Croom Helm, London (1987)
24. Grace, J., Kenny, C., Qiang, C.Z.W., Liu, J., Reynolds, T.: Information communication technology and broad based development: A partial review of the evidence. The World Bank, Washington (2004)
25. Mali, L.: AutoBank E: e-banking for rural South Africans. ICT Update, CTA, Issue 13 (2003)
26. Spence, R., Smith, M.: ICTs, human development, growth and poverty reduction: a background paper. A dialogue on ICTs, human development and poverty reduction. Harvard Forum, Boston (2009)
27. BBC: M-Pesa: Kenya's mobile wallet revolution. BBC News Online (November 22, 2010), http://www.bbc.co.uk/news/business-11793290 (January 25, 2011)
28. Avgerou, C., Madon, S.: Development, self-determination and information. In: Beardon, C., Whitehouse, D. (eds.) Computers and Society, Intellect, Oxford, pp. 120–137 (1993)
29. Rezaian, B.: Integrating ICTs in African Development: Challenges and Opportunities in Sub-Saharan Africa. In: Hernandez, M.G., Lopez, F.E., Acevedo-Ruiz, M. (eds.) Information Communication Technologies and Human Development: Opportunities and Challenges. IDEA Group Publishing, London (2007)
30. Ranjini, C.R., Sahay, S.: Computer-based health information systems: Projects for computerization or health management? Empirical experiences from India. In: Gasco-Hernandez, M., Equiza-Lopez, F., Acevedo-Ruz, M. (eds.) Information Communication technologies and human development: Opportunities and challenges, pp. 116–137. Idea Group Publishing, Hershey (2007)
31. Bodvala, R.: ICT Applications in Public Healthcare System in India: A Review. ASCI Journal of Management 31(1&2), 56–66 (2002)
32. Isaacs, S.: ICT-Enabled education in Africa: A sober reflection on the development challenges. In: Gasco-Hernandez, M., Equiza-Lopez, F., Acevedo-Ruz, M. (eds.) Information Communication technologies and human development: Opportunities and challenges, pp. 210–234. Idea Group Publishing, Hershey (2007)
33. Saboo, A.: Re: [Telecentres] FYI: Rural Botswana goes online. Email (September 19, 2005), http://www.mail-achive.com/telecentreswsis-cs.org/msg00436.html (March 27, 2010)
34. BOPA: Telemedicine to reduce shortage of health professional. Daily News (May 16, 2006), http://www.gov.bw (March 31, 2010)
35. Morley, D.: Computers and technology in a changing society. Thomson, Boston (2006)
36. Webster, F., Robins, K.: Information technology: A luddite analysis. Ablex Publishing Corporation, Norwood (1986)
37. Hjorth, L., Kim, H.: Being Real in the mobile reel (2005), http://www.cct.go.kr (March 25, 2007)
38. Ling, R.: The Mobile Connection: The cell phone's impact on society. Morgan Kaufmann, San Francisco (2004)
39. Gaotlhobogwe, M.: Mascom signs 50 local artists as they exploit ringtones, caller tunes. Mmegi (March 31, 2010), http://www.mmegi.bw (March 31, 2010)
40. Mobarek, A.: E-banking practices and customer satisfaction – a case study in Botswana. In: 20th Australasian Finance & Banking Conference (2007), http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1011112 (March 23, 2010)
41. Petrie, H.: Technology in and outside the home: its effects on the provision of personal information for living. In: Meadows, J. (ed.) Information Technology and the Individual, p. 120. Pinter Publishers, London (1991)
42. Wanetsha, M.: Older banks to follow FNB's lead in mobile banking. Mmegi (August 14, 2009), http://www.mmegi.bw (March 31, 2010)
43. Hamdani, D.: The use of Internet and Electronic Commerce in the Canadian banking and insurance industry. In: Mothe, J., Paquet, G. (eds.) Information, Innovation and Impacts. Kluwer Academic Publishers, Norwell (2000)
44. Luftman, J.N., Bullen, C.V., Liao, D., Nash, E., Neumann, C.: Managing the information technology resource: Leadership in the information age. Pearson Education, Upper Saddle River (2004)
45. Africa the Good News: Internet and phone charges slashed in Botswana. MediaClubSouthAfrica.com (2011), http://www.africagoodnews.com/infrastructure/ict (February 28, 2011)
46. Gasco-Hernandez, M., Equiza-Lopez, F., Acevedo-Ruz, M.: Information Communication technologies and human development: Opportunities and challenges. Idea Group Publishing, Hershey (2007)
47. World Economic Forum: The Global Information Technology Report 2009-2010. Country profile: Botswana (2010)
48. Qvortrup, L.: Electronic village halls – IT and IT-assisted services for rural village communities. In: Glastonbury, B., LaMendola, W., Toole, S. (eds.) Information Technology and the Human Services, pp. 265–270. John Wiley & Sons, Chichester (1988)
49. Conradie, P.: Using information and communication technologies for development at centres in rural communities: lessons learned. In: Nulens, G., Hafkin, N., Audenhove, L., Cammaerts, B. (eds.) The digital divide in developing countries: Towards an information society in Africa. Vubpress, Brussels (2001)
Personnel Selection for Manned Diving Operations
Tamer Ozyigit 1, S. Murat Egi 1, Salih Aydin 2, and Nevzat Tunc 3
1 Galatasaray University, Computer Engineering Department, Ciragan Cad. 36 Ortakoy, 34357, Besiktas, Istanbul, Turkey
2 Istanbul University, Underwater Medicine Department, 3409, Capa, Istanbul, Turkey
3 Bogazici Underwater Research Center, Yavuzturk Sok. 32/1, 34716, Kadikoy, Istanbul, Turkey
Abstract. The selection of personnel for diving operations is a challenging task requiring a detailed assessment of the candidates. In this study, we use computer-aided multi-criteria decision support tools in order to evaluate the divers according to their work experience and physical fitness condition. The importance weights of the sub-criteria are determined using the Analytic Hierarchy Process (AHP) based on expert opinions. Two rankings of six divers for seven job specializations are obtained from their scores for work experience and physical fitness. The divers' scores according to these two main criteria are then used in Data Envelopment Analysis (DEA) to reach an aggregate ranking of the divers. This methodology enabled us to determine a ranking of the candidates for seven underwater project types, considering all factors in an objective and systematic way and reducing the conflicts and confusions that might result from subjective decisions.
Keywords: Personnel selection, occupational diving, AHP, DEA.
1 Introduction
Occupational diving can be defined as underwater work which is undertaken for profit or reward [1]. It is a developing industry involving all underwater diving work other than recreational or recreational technical diving [2] and covering various areas such as onshore and offshore diving, aquaculture, media diving, underwater scientific and archeological projects, police and military operations, etc. In general, occupational underwater projects require big budgets because of difficult environmental conditions and the expensive specialized equipment needed to assure effectiveness and security. Another indispensable element of these kinds of projects is qualified and experienced personnel. Occupational (or commercial) divers have many specific qualifications, obtained through challenging training and actual work experience, that allow them to work in various projects requiring different specializations. There is an increasing need for well trained and experienced divers as the industry develops. The diving teams for manned underwater operations are in general built from temporarily contracted professionals. Whether for military or for civil projects, recruitment of a diving team or selecting the appropriate diver from the team for a specific project
is a complex task for which a comprehensive evaluation of candidates is required. Referring to the opinions of professionals, we determined seven underwater project types, which are: a) Survey and inspection works including non-destructive testing, b) Underwater cutting, c) Underwater welding, d) Underwater explosives, e) Lifting and salvage projects, f) Onshore construction and g) Offshore construction. The general skills that can be required for any kind of underwater project are the sub-criteria for job experience. These are: 1) Experience on the current project type, 2) Hand tools, 3) Hydraulic tools, 4) Pneumatic tools, 5) LP air jet and water lift/dredge, 6) Wet bell diving and 7) Paramedics training. The divers who are more experienced on these sub-criteria are preferred in the recruitment procedure. The importance (or weights) of these sub-criteria can differ according to the project. The sub-criteria for physical fitness were determined by three underwater medicine specialists. These sub-criteria are: 1) Age, 2) VO2 Max value, 3) Flicker fusion frequency, 4) Psychomotor performance, 5) Visual acuity and 6) Hearing acuity. The procedure for determining the weights will be explained in detail in the Method section. Selection of diving teams is one of the duties of supervisors, who are overloaded with various responsibilities and paperwork. In this study, we implemented two multi-criteria decision making tools to support the selection of personnel for the seven underwater project types mentioned above. The evaluation is based on the work experience and physical fitness conditions of the occupational divers. Multi-criteria decision methods are widely used for personnel selection in different work areas [3-6]. However, in the occupational diving field, the number of applications is limited. The existing studies on diving personnel selection are predominantly based on the divers' physical assessment, subjective evaluations made by employers and army regulations [7,8]. In the next section of this paper, the methods for weighting the sub-criteria and ranking the candidates by their scores on the two main criteria are presented. In the application section, 6 occupational divers are evaluated for the seven types of underwater projects. The paper ends with the conclusion section.
2 Method
In the presence of more than one criterion for evaluating the decision making units (DMUs), the main problem of the decision maker (DM) is to determine the relative importance (weights) of these criteria. The Analytic Hierarchy Process (AHP), developed by Saaty, is a method to support multi-criteria decision making (MCDM), especially in determining these weights using expert opinions [9]. The AHP decomposes a complex MCDM problem into a hierarchy. The goal of the decision procedure (in our case, selecting the best diving personnel for the project) is at the top level of the hierarchy, the decision criteria (and the sub-criteria, if any) are in the middle levels, and the alternatives are at the bottom. This structure, specific to our problem, is shown in Figure 1.
Fig. 1. Hierarchy for diving personnel selection MCDM problem
In the hierarchy shown in Figure 1, there are 2 main criteria, 6 sub-criteria for physical fitness, 7 sub-criteria for job experience and 6 alternatives (divers) to be ranked according to them. In order to determine the weights of the sub-criteria, a pairwise comparison matrix $A$ is built, whose elements $a_{ij}$ are assigned by asking the field experts which criterion is more important than the other with regard to the decision and by what scale (1-9), as shown in Table 1 [9].
Table 1. The 1-9 scales for pairwise comparisons in the AHP
Importance intensity   Definition
1                      Criterion i and j are of equal importance
3                      Criterion i is weakly more important than j
5                      Criterion i is strongly more important than j
7                      Criterion i is very strongly more important than j
9                      Criterion i is absolutely more important than j
2, 4, 6, 8             Intermediate values
The matrix is completed by setting $a_{ii} = 1$ and $a_{ji} = 1/a_{ij}$. The weight vector $W$ is determined by solving the following equation:

$$A W = \lambda_{\max} W \qquad (1)$$

where $\lambda_{\max}$ is the maximum eigenvalue of $A$. The vector $W = (w_1, \ldots, w_n)$ satisfies the normalization condition $\sum_{i=1}^{n} w_i = 1$ with $w_i > 0$ for $i = 1, \ldots, n$.
After having the weights for all the selection criteria, the DM's work in scoring the alternatives is a simple task. The observed values of the alternatives for each criterion are multiplied by the criteria weights. In this way, the DM obtains an overall weighted score for the alternatives.
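As a rough illustration of how the weights in Eq. (1) can be derived, the minimal sketch below computes them as the principal eigenvector of a pairwise comparison matrix. It is written in Python with NumPy; the comparison values are made up for the example and are not the paper's data.

import numpy as np

def ahp_weights(A):
    """Derive criteria weights from a pairwise comparison matrix A (Saaty's AHP).

    A is a square reciprocal matrix: A[i, j] is the 1-9 importance of criterion i
    over j, with A[j, i] = 1 / A[i, j] and A[i, i] = 1. The weight vector is the
    principal eigenvector of A, normalized so that the weights sum to 1.
    """
    A = np.asarray(A, dtype=float)
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)          # index of lambda_max
    w = np.abs(eigvecs[:, k].real)       # principal eigenvector (sign-corrected)
    return w / w.sum()                   # normalization: sum(w) = 1

# Illustrative 3-criterion comparison matrix (hypothetical expert judgments)
A = [[1,   3,   5],
     [1/3, 1,   2],
     [1/5, 1/2, 1]]
print(ahp_weights(A))   # approx. [0.65, 0.23, 0.12]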
The job experience data of the divers are the application hours for the related underwater skills. The paramedics training sub-criterion is verbal data, and expert opinions are used to assign a score to each level of paramedical certificate. The physical fitness data consist of the clinical test results of the divers. As the criteria values are on different scales and in different units, they should be normalized before being multiplied by the weights. Let $X = (x_1, \ldots, x_n)$ be the vector of values of a criterion, having $x_{\max}$ and $x_{\min}$ as maximum and minimum respectively. The normalized vector $\hat{X} = (\hat{x}_1, \ldots, \hat{x}_n)$ is obtained as follows:

$$\hat{x}_i = \frac{x_i - x_{\min}}{x_{\max} - x_{\min}} \qquad (2)$$
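A small sketch of this normalization and the subsequent weighted-scoring step is given below (Python with NumPy). The working-hour figures and weights are hypothetical, used only to show the mechanics of Eq. (2) and the scoring.

import numpy as np

def min_max_normalize(x):
    """Scale a criterion's raw values (e.g. working hours) to [0, 1] as in Eq. (2)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

# Hypothetical working hours of 4 divers on two sub-criteria, and AHP weights
hours = np.array([[120, 40],    # diver 1
                  [300, 10],    # diver 2
                  [  0, 80],    # diver 3
                  [210,  0]])   # diver 4
weights = np.array([0.7, 0.3])

normalized = np.apply_along_axis(min_max_normalize, 0, hours)  # per sub-criterion column
scores = normalized @ weights    # overall weighted score per diver
print(scores)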
As a result, we obtained two scores (and consequently two rankings) for every diver according to the two main criteria. In order to combine these two rankings, the AHP is not a very suitable method, as a pairwise importance comparison between job experience and physical fitness is not meaningful. Data Envelopment Analysis (DEA), developed by Charnes et al. [10], is a linear programming method for measuring the efficiency of DMUs. The efficiency is simply defined by the ratio of output(s) to input(s). DEA enables every DMU to select its most favorable weights while requiring the resulting ratios of aggregated outputs to aggregated inputs of all DMUs to be less than or equal to 1 [11], and it does not require a prior weighting of the criteria. In our problem, the scores of the divers on the two main criteria are the outputs. In order to implement the DEA in our problem, we created a dummy input which is 1 for every diver. As the input value is the same for every DMU, only the scores of the two main criteria (output values) have an effect on the efficiencies calculated by the DEA. The DEA formulation is given below:

$$\max \; h_0 = \sum_{r=1}^{t} u_r y_{r0} \qquad (3)$$

subject to:

$$\sum_{i=1}^{m} v_i x_{i0} = 1,$$
$$\sum_{r=1}^{t} u_r y_{rj} - \sum_{i=1}^{m} v_i x_{ij} \leq 0, \quad j = 1, \ldots, n,$$
$$u_r \geq 0, \quad r = 1, \ldots, t, \qquad v_i \geq 0, \quad i = 1, \ldots, m,$$

where $h_0$ is the efficiency of DMU$_0$, $y_{rj}$ is the quantity of output $r$ produced by DMU$_j$, $x_{ij}$ is the quantity of input $i$ used by DMU$_j$, $u_r$ is the weight of output $r$, $v_i$ is the weight of input $i$, $n$ is the number of DMUs, $t$ is the number of outputs and $m$ is the number of inputs.
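For readers who wish to experiment with model (3), the sketch below solves the same multiplier-form linear program with SciPy's linprog, using a dummy input of 1 for every DMU as described above. The output values in the example are illustrative, not the divers' actual scores.

import numpy as np
from scipy.optimize import linprog

def dea_ccr_scores(outputs, inputs=None):
    """CCR multiplier-form DEA via linear programming (a sketch of model (3)).

    outputs: (n_dmus, t) array; inputs: (n_dmus, m) array.
    If inputs is None, a dummy input of 1 is used for every DMU, so only the
    output scores drive the efficiencies, as in the setting of this paper.
    """
    Y = np.asarray(outputs, dtype=float)
    n, t = Y.shape
    X = np.ones((n, 1)) if inputs is None else np.asarray(inputs, dtype=float)
    m = X.shape[1]
    scores = []
    for o in range(n):
        # decision variables: [u_1..u_t, v_1..v_m]; maximize u.y_o -> minimize -u.y_o
        c = np.concatenate([-Y[o], np.zeros(m)])
        # ratio constraints: u.y_j - v.x_j <= 0 for every DMU j
        A_ub = np.hstack([Y, -X])
        b_ub = np.zeros(n)
        # normalization constraint: v.x_o = 1
        A_eq = np.concatenate([np.zeros(t), X[o]]).reshape(1, -1)
        b_eq = [1.0]
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * (t + m), method="highs")
        scores.append(-res.fun)
    return np.array(scores)

# Hypothetical divers with (job experience score, physical fitness score) as outputs
outputs = [[0.08, 0.16], [0.81, 0.49], [0.32, 0.66], [0.21, 0.89]]
print(dea_ccr_scores(outputs))   # a score of 1.0 marks a diver on the efficiency frontier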
3 Application
The procedure presented in Section 2 is applied to 6 divers, for the 7 different types of project listed in Section 1. In Table 2, the weights of the job experience sub-criteria for the seven job specializations, calculated from the pairwise comparison matrices, are given:
Table 2. Sub-criteria weights for job specifications

                        Survey &    Cutting   Welding   Explosives  Lifting &   Onshore   Offshore
                        Inspection                                  Salvage     Cons.     Cons.
Exp. on project         0.2676      0.3078    0.3457    0.4592      0.3904      0.1479    0.2245
Hand tools              0.0607      0.0288    0.1034    0.0509      0.0468      0.0311    0.0319
Hydraulic tools         0.1594      0.0682    0.1915    0.0854      0.1400      0.1209    0.0757
Pneumatic tools         0.0189      0.0697    0.0282    0.1632      0.0578      0.0437    0.0173
LP air jet/lift/dredge  0.1457      0.1899    0.1440    0.0948      0.1484      0.0977    0.1420
Wet bell diving         0.1619      0.1612    0.0983    0.0499      0.0383      0.2671    0.2543
Paramedics Training     0.1857      0.1744    0.0890    0.0967      0.1784      0.2916    0.2543
The normalized values of the job experience sub-criteria are given in Table 3:

Table 3. Normalized values of working hours

                         DIVER1   DIVER2   DIVER3   DIVER4   DIVER5   DIVER6
Survey & Inspection      0.0000   0.7179   0.0256   1.0000   0.0513   0.4615
Underwater cutting       0.0000   0.5385   1.0000   0.0769   0.0769   0.2308
Underwater welding       0.0000   0.1538   1.0000   0.0769   0.0000   0.1538
Underwater explosives    0.0000   0.0000   0.5000   0.0000   0.0000   1.0000
Lifting and salvage      0.2857   1.0000   0.2857   0.8571   0.0000   0.7143
Onshore construction     0.0208   0.4521   0.0000   1.0000   0.0000   0.2083
Offshore construction    0.0000   1.0000   0.0000   0.0000   0.5714   0.4286
Hand tools               0.0000   0.4615   1.0000   0.5385   0.3846   0.3077
Hydraulic tools          0.5000   0.5000   0.0000   0.5000   0.7000   1.0000
Pneumatic tools          0.0000   0.7143   0.7143   0.7143   1.0000   0.4286
LP air jet/lift/dredge   0.0000   1.0000   0.8750   0.0000   0.0000   0.6250
Wet bell diving          0.0000   1.0000   0.4000   0.0000   0.0000   0.6000
Paramedics Training      0.0000   1.0000   0.2500   0.0000   0.2500   0.5000
The divers' scores for job experience according to the seven specializations are given in Table 4:

Table 4. Scores of the divers according to job specializations

                         DIVER1   DIVER2   DIVER3   DIVER4   DIVER5   DIVER6
Survey & Inspection      0.0797   0.8067   0.3198   0.3935   0.2140   0.5908
Underwater cutting       0.0341   0.7884   0.6607   0.1231   0.1958   0.4806
Underwater welding       0.0957   0.5480   0.6568   0.1982   0.2243   0.4820
Underwater explosives    0.0427   0.4241   0.5241   0.1867   0.2667   0.7677
Lifting and salvage      0.1815   0.8883   0.3893   0.4711   0.2184   0.6629
Onshore construction     0.0635   0.8293   0.3275   0.2563   0.2132   0.5471
Offshore construction    0.0378   0.9400   0.3338   0.0674   0.2744   0.5576
The physical fitness of the divers is the second main criterion. In Table 5, the weights and the normalized values for the physical fitness sub-criteria can be found. In the last row of the table, the scores of the divers are given.

Table 5. Weights, normalized values of physical fitness criteria and the divers' scores

                   WEIGHT    DIVER1   DIVER2   DIVER3   DIVER4   DIVER5   DIVER6
Age                0.13634   0.0000   0.1000   0.3000   1.0000   0.5000   0.6000
Flicker Test       0.07733   0.0000   0.7692   0.6154   1.0000   0.7692   0.4615
Psychomotor Test   0.11776   0.5455   0.0909   1.0000   1.0000   0.8182   0.0000
Visual Acuity      0.06411   1.0000   1.0000   1.0000   0.7500   1.0000   1.0000
Hearing Acuity     0.0352    1.0000   0.8687   0.9034   0.0000   0.8037   0.5283
VO2 Max            0.56926   0.0000   0.5389   0.6228   0.4311   1.0000   0.4132
DIVERS' SCORES               0.1635   0.4853   0.6567   0.6249   0.8857   0.4354
As mentioned before, a pairwise comparison between job experience and physical fitness for evaluating the divers is not meaningful; neither of these two main criteria should have superiority over the other. The DEA does not require any expert opinion and assigns the most favorable weights for the evaluated DMU's criteria. The constraint on this assignment is that no other DMU shall have an efficiency score greater than 1 with the same weights. When the linear program is solved, the 100% efficient DMUs form an efficiency frontier, and the non-efficient DMUs' scores are calculated by their distance to this frontier. A graphical presentation of DEA with the survey & inspection experience and physical fitness main criteria is shown in Figure 2.
Fig. 2. Graphical presentation of DEA with survey & inspection experience and physical fitness criteria
This method is applied with the other 6 job specializations together with physical fitness as well. As a result, we obtained 7 efficiency scores for the 7 job specializations for the 6 divers. The results are shown in Table 6.

Table 6. The divers' scores according to the 7 underwater project types

                         DIVER1   DIVER2   DIVER3   DIVER4   DIVER5   DIVER6
Survey & Inspection      0.2110   1.0000   0.8471   0.8646   1.0000   0.8100
Underwater cutting       0.1847   1.0000   1.0000   0.7056   1.0000   0.7000
Underwater welding       0.2133   0.8344   1.0000   0.7267   1.0000   0.7339
Underwater explosives    0.1847   0.7685   1.0000   0.7056   1.0000   1.0000
Lifting and salvage      0.2677   1.0000   0.8752   0.8920   1.0000   0.8183
Onshore construction     0.2000   1.0000   0.8490   0.7728   1.0000   0.7723
Offshore construction    0.1847   1.0000   0.8160   0.7056   1.0000   0.7336
AVERAGE                  0.2066   0.9433   0.9125   0.7676   1.0000   0.7954
Diver 5 is 100% efficient for all job specializations. The reason is that he has the highest score on the physical fitness criterion. Diver 1 is clearly the least efficient diver. In order to have an overall assessment of the divers, the average efficiency score over all the job specializations is given in the last row. According to the average score, we can rank the divers as: 1. Diver 5, 2. Diver 2, 3. Diver 3, 4. Diver 6, 5. Diver 4 and 6. Diver 1.
4 Conclusion
The AHP is an effective and widely used tool for decision making in personnel selection. In this paper the AHP is used to determine the sub-criteria weights of the work experience and physical fitness of the commercial divers. Using these weights, the divers are ranked by their job experience for 7 different types of occupational diving specializations and by their physical fitness. In order to combine these two main criteria scores, a non-parametric mathematical programming based method, the DEA, is used. As a result, an overall score for the different project types and an average score for the general underwater competency of the divers are obtained. These rankings can be very helpful for supervisors or other decision makers in selecting the appropriate diver for underwater projects in an objective and systematic way.
The use of multi-criteria decision making methods has two major benefits, especially in the personnel selection process. First of all, the decision maker's job gets considerably easier and the selection process gets faster. The other benefit of multi-criteria decision making methods is that the objective evaluation of the candidates reduces the conflicts and confusions resulting from subjective decisions. In addition, using these methods, the decision maker can easily give an account of his or her decisions to superiors.
Acknowledgement This project has been financed by Galatasaray University, Scientific Research Project Commission - Project No.09.401.004.
References
[1] Occupational Diving Careers, Information, http://www.adas.org.au
[2] Workplace Health and Safety Queensland, D.o.J.a.A.-G., 2005, Occupational Diving Work Code of Practice, p. 1 (2005)
[3] Albayrak, E., Erensal, Y.C.: Using Analytic Hierarchy Process (AHP) to Improve Human Performance: An Application of Multiple Criteria Decision Making Problem. J. Intell. Manuf. 15, 491–503 (2004)
[4] Jessop, A.: Minimally Biased Weight Determination in Personnel Selection. Eur. J. Oper. Res. 153, 43–444 (2004)
[5] Gungor, Z., Serhadlioglu, G., Kesen, S.E.: A Fuzzy AHP Approach to Personnel Selection Problem. Appl. Soft Comput. 9, 641–646 (2009)
[6] Karsak, E.E.: A Fuzzy Multiple Objective Programming Approach for Personnel Selection. In: SMC 2000 Conference Proceedings: 2000 IEEE International Conference on Systems, Man & Cybernetics, vol. 1-5, pp. 2007–2012 (2000)
[7] Brooke, S., Blewett, P.: Dive Supervisors' Selection Criteria for Deep Operational Diving. In: Conference Subtech 1989, Aberdeen, UK, November 7-9 (1989)
[8] Management of Army Divers, Personnel Selection and Classification, Army Regulation 611-75, Headquarters Department of the Army, Washington, DC (2007)
[9] Saaty, T.L.: Analytic Hierarchy Process. McGraw-Hill, New York (1978)
[10] Charnes, A., Cooper, W., Rhodes, E.: Measuring the Efficiency of Decision Making Units. Eur. J. Oper. Res. 2, 429–444 (1978)
[11] Ozyigit, T.: Evaluating the Efficiencies of Energy Resource Alternatives for Electricity Generation in Turkey. Istanbul Technical University Journal Engineering Series 7, 26–33 (2008)
Digital Inclusion of the Elderly: An Ethnographic Pilot-Research in Romania
Corina Cimpoieru*
Abstract. This study raises attention to the interaction between the elderly and new digital technologies. Based on an ethnographic research at one of the first Biblionet Internet Centers opened in Romania through the Global Libraries Project, the research draws its originality from the emic perspective it has embarked on, offering valuable insights into the elderly's attitudes toward and direct experiences with computer and Internet use.
Keywords: technology and the elderly, information society, grey digital divide.
1 Introduction
According to a report of the European Commission [1], the information society in Romania finds itself on the "poor" side of the "digital divide", with one of the lowest rates of Internet usage in the European Union. Another report [2] calls attention to what has been designated as "the third transition" - namely the wide-spread phenomenon of aging in Eastern Europe - which adds to the challenges that the diffusion of new technological innovations [3] has to face in post-communist transition societies. Providing access, as well as e-literacy, to the elderly is a challenging task, which can benefit both this segment of the population and society as a whole. Trying to address the questions of how and why older adults started to use digital technologies and what benefits and barriers they encountered, the immediate aim of the present research is to provide an informed description and analysis of a context-related digital interaction between the elderly and computer technology. Given the exploratory and topic-groundbreaking nature of the enquiry in Romania, the larger aim is to provide reliable information amenable to further research.
2 Theoretical Framework
There has been a growing body of research about the development and influence of the new ICTs (Information and Communication Technologies) on their users. However, very few of these studies specifically concern the elderly [4]. In this part I aim to briefly review some of the existing literature regarding the interaction between older adults and the digital environment.
Beneficiary of the project “Doctoral scholarships supporting research: Competitiveness, quality, and cooperation in the European Higher Education Area”, co-funded by the European Union through the European Social Fund, Sectorial Operational Programme Human Resources Development 2007-2013.
It has been shown in several studies that older adults tend to be among the losers of the so-called "informational technology revolution" [5]. For instance, Norris [6] has pointed to the "generational difference" which makes the access and use of the Internet slower among the elderly when compared with younger people's uptake of the same technologies. In the same line, Milward [7] coined the term "grey digital divide" to describe seniors' unequal access to the digital world, while Hagittai [8] indicated the existence of a "second level digital divide", arguing that age considerably lowers the ability to engage online. Furthermore, research has demonstrated that simply providing access to technologies does not imply immediate or successful uptake. In general, and particularly in the case of the elderly, access is only part of the problem, "not sufficient to grey the Web", as Brabazon metaphorically pointed out [9]. In an attempt to research older adults' motivation for learning and using the ICTs, some of the literature has concentrated on the perceived benefits affecting older people's engagement with digital technologies. For instance, research by Melenhorts and Rogers [10] demonstrated the importance of perceived benefits of digital technology as a motivational factor for using this medium, while Morrel et al. [11] found that a lack of awareness regarding the benefits results in its rejection. The main perceived benefits reported by previous research include: communicating with remote family and friends [12][13], finding information on their hobbies and interests [14], seeking health information [15], remaining independent [16], keeping mentally healthy [17] and socially active [18]. Furthermore, several studies have pointed to gendered perceived benefits, which indicate that women are motivated by keeping in touch with their family, while men seem to be more inclined to learn and use the Internet for finding personal information [19]. Previous research has also pointed to the barriers that decrease the chances of older adults' usage of computers and the Internet. These are usually related to lack of awareness [11], lack of IT skills and experience [14], age-related physical limits such as disabilities, loss of memory and slower speed of performance [20], and computer anxiety translated into a general fear of the new technologies [21]. However, it has also been shown that barriers can be overcome by a positive experience with computers. For instance, Cody et al. [21] have suggested that computer training creates a positive attitude toward computers and improves Internet efficacy.
2 The Technological Context of the Research
Putting forward the case study of one of the Internet pilot centers opened in Romania as part of the Biblionet Project [22], the present research draws its originality from a bottom-up approach to the "grey digital divide". The case study is all the more important as it is a unique initiative in Romania of bridging this "technological gap" among the elderly population. The context of the research is related to the larger Global Libraries initiative of the Bill and Melinda Gates Foundation [23], which aims to help public libraries connect people with the world of digital information and opportunities. In Romania, the Global Libraries Project is developed into Biblionet, "a five-year program that helps Romanian libraries better serve their communities through training and technology".
Picture. Pensioners at the Internet Center from Club of the Retired in Zalau
The first pilot center in the country opened through the Biblionet project on the 8th of May 2008, within the Club of the Retired [8] in Zalau. The County Library already had a Library Section at the Club and considered it a good opportunity to develop its services there. The initiative of opening the Internet Center was highly welcomed by the Club, which could not have afforded, either financially or logistically, such a service for its members. Offering free access to the Internet and training was considered a "helping hand" for this special category, usually neglected by other cultural and educational organizations, be they governmental or non-governmental. In addition, it was quite a challenge for both the librarians and the older users, being an experiment in mediating the interaction between the elderly and the new ICT. As the director of the Club mentioned, the impact, at the beginning at least, was quite impressive: "For most of the people coming here, the Internet was a positive shock, as they couldn't imagine before to be able to talk to their children in this way. In addition to that, they now search the Internet for everything, they themselves, looking for curiosities and new things about the world. This is a very good thing for me. It's surprising and pleasant at the same time."
3 Methodological Approach Given the fact that the present research was designed as a pilot-study, exploratory in nature, the ethnographic approach it has embarked on was considered as the best suited method for getting some insights into the social context in which the interaction between older people and technology takes place. I will try to further describe the main features of the research.
3.1 The Ethnographic Setting
Following the visits to the three Biblionet pilot centers in Zalau, the first goal of the research was to make contact with the elderly users, the main empirical focus of the study. Accordingly, the Internet Center from the Club of the Retired was purposively selected as a particular context for approaching the subject of the digital inclusion of the elderly as a result of the new services offered by the local library with the support of the Biblionet project. After initially contacting the director of the Club, I paid a rather informal visit to the Club. To my surprise and total amazement, when I arrived there the atmosphere was very vivid, with all computers being intensively used by the pensioners, both male and female. I was quite impressed by the variety of work being carried out: writing e-mails, reading online newspapers, searching for gardening information, watching and listening with headphones to modern or traditional music video clips, or chatting with friends. That was certainly "anthropologically strange enough", not only because one would not expect to see the pensioners so "technologically engaged", but also because of the perfect setting that seemed to be "too good to be true". Leaving aside any possible prejudices concerning the elderly's usually-considered decreased abilities, and also trying to detach from the quite tempting suspicion that everything was "staged", I immediately began to question myself: Can the elderly really so expertly handle computer operations and the Internet? How did they learn to use it? What motivated them to learn to do so? Starting from this initial "anthropological curiosity" I decided it was worth finding some answers. So, after the director kindly introduced me to the trainers and the active users present there, he informed them that I would be spending several days there and would also be conducting interviews with them.
3.2 Observation and Interviews
The fieldwork took place at the Club over a three-week period in May 2010, during which time interviews, as well as many informal talks and participant observation, were carried out both during the "Internet morning program" and the afternoon male activities of the Club. My initial presence on the site created considerable attention and obvious alterations in their behavior for a couple of days, till I was fully accepted and relationships began to develop. Gradually I became a "club member", prone to friendly conversations, computer advice-giving, or even a welcomed playing partner in chess and backgammon games. All these immersions provided me with valuable insights concerning their attitudes towards and motivation for using the new technologies. Semi-structured interviews were carried out with 15 persons: two trainers and 13 users - pensioners (8 males, 5 females, 59 to 84 years old). All participants were interviewed at the Internet Center, as this was already their technologically friendly environment. Each interview was followed by an informal practical computer request: the participants were asked to demonstrate some of the operations they usually do at the computer. This helped check the extent to which their responses were "behaviorally grounded". It surprisingly proved to be a very good strategy, because many times my interviewees were more comfortable practically displaying their computer skills than talking about them.
The issues touched on by the interviews were related to the older users' motivation for learning and using the computer and Internet and the benefits and barriers they encountered. The topics were open to any details and new information occurring. For instance, other topics were systematically included, such as their previous experience with computers and the Internet, their attitude toward them before and after the training, or their perceived usefulness of online information. These further encouraged the expression of personal experience and interpretations.
3.3 Data Analysis
The interviews ranged from 20 to 80 minutes and were digitally audio and video recorded, being afterwards transcribed verbatim. This data was complemented with other information from local newspapers, official library reports and photo materials, which all gave a more nuanced portrait of the "technological context" in which the research took place. The present study puts a particular emphasis on the interviewees' personal expression of attitudes and opinions about their direct learning and usage experience of the technological environment. Accordingly, the following presentation of the findings gives direct voice to the interviewees' responses, organized into common themes relevant to the research. A gender and age identity marker follows each quote. With respect to the structure of the findings, the data is grouped so as to provide the two main sets of information relevant to the study: on the one hand, the findings present the general, mainly descriptive, information about the training and the trainers' views in relation to their direct experience as mediators of e-literacy to the older users; on the other hand, significant data is made available reflecting the older users' perspective on their direct experience with computers and the Internet and the benefits or barriers involved. Despite the small number of interviews, due mainly to the short period of time available for the fieldwork, the gathered data proved to be extremely useful for providing inside, firsthand information on the older users' attitudes and experiences with technology in the local context of the research. In addition, and all the more important, it will hopefully open the door more widely for further research in the domain.
4 Findings and Discussion
4.1 The Training as the Key Point of Digital Inclusion of the Elderly
"If you want to be successful show people that you care about them" (trainer)
"Vision + Resources + A Lot of Patience = Online Success" (the training device)
One of the most important elements of the Biblionet program, according to all of the interviewees, was the opportunity provided by the free training. Therefore the first part of the findings will discuss not only the training process but also the teaching and
learning experiences of the trainers and the reactions they received from the older users. Initially the trainers could hardly cope with the large number of pensioners interested in taking the course. One of the trainers explicitly talked about her total surprise when people have “busted in” to learn the computer and the Internet: “We didn’t expect so many persons… There were at some point 60 persons who wanted to take the training and we had to put a part of them on a waiting list. We started with four series of six people, given the number of the computer available. The training lasted for three weeks. Everybody was very eager to learn at the beginning.” (trainer, female, 45 years old)
This was remembered also by several of the pensioners who stated that they “had to queue as in the old times” in order to get a place in the training. However, some users did admit that their initial motivation for enrolling into the training was to give an example, “so as the other pensioners will come as well.” This is especially the case of the more active members of the Club. However with time “the passion for computers” diminished and so did the number of people who desired training. New strategies had to be adopted, as one trainer explains: “Of course, at some point the most active members got fewer and fewer and we had to promote the training on all possible local channels, even put flyers at the Club’ pay office, where they would only come for paying their fees, not knowing about the services offered by the Internet Center. I even stayed in front of the hairdresser from the Club, trying to convince people to come to the Center.” (trainer, female, 45 years old)
At the time of the research, 176 people had taken the training courses. While this is an impressive number of users trained, the Club's membership is about 9,000. This raises the question of how local institutions like the Club and the library could have been more effective in promoting these training sessions to the elderly. However, it is not clear how many pensioners used the Club's computers without taking part in this training. The frequency of visits to the Internet Center to use the computers has also decreased, as many started to use the computer at home. Several of the pensioners mentioned that they either had a computer at home that they previously had not used or purchased one after the training ended, once they were more familiar with computers and the Internet. The training now continues at the Club, but on an individual, not a group, basis.
4.1.1 Structure of the Training – Identifying the Needs
In order to facilitate the training, a survey was initiated by the trainer to collect basic information about the pensioners' background (age, education, former professional experience) and experience in using computers and searching the Internet. This survey tried to identify their needs and future expectations about the course. The survey was completed by 58 persons, mainly active members of the Club, and the results showed that, overall, the participants strongly welcomed the initiative, as they had either no previous experience or only very basic computer skills. Interestingly, 36 persons mentioned that they had a computer at home that they didn't know how to use or that they intended to buy one in the near future. Two of the persons mentioned that they were interested not only in an initiation in computer skills but also in further specialization.
The training cycle lasted for three weeks, four days per week, and comprised 24 participants, divided into four groups. At the beginning pensioners were introduced to what this new ICT could do for them. Presented as a new service of the library, the training consisted of teaching step-by-step basic computer skills (know-how) and knowledge-based searching for information (know-what). Each session began with computer skills practice, followed by a topic-related information search on the Internet. As the training progressed, other digital devices were introduced, such as webcams, CDs and DVDs, and software for downloading pictures and music from the Internet. In addition, in order to diversify the activities, "so that people would not get bored", other operations were introduced as well, targeting topic-related information of interest to the users, such as communication with authorities, health and medicine, online newspapers, online shopping, etc. In order to encourage pensioners to take part in these trainings and use the Club's Internet Center, the "best surfers" were awarded personalized diplomas according to their performances: "The first opened website", "The most active participant", "The first email sent", and the list continued with diplomas for "The best trips online", "The best online news searcher" or "The most emotional e-mail received".
4.1.2 Working with the Elderly – A Negotiated Methodology
Neither of the trainers had previous experience working with the elderly, nor had the Biblionet courses they had taken been specifically targeted at dealing with this special category of users. Consequently the training course, as one of the trainers acknowledged, was often "a negotiation between explaining meanings, listening to life-stories and giving support". The gentle and patient introduction to using computers and the Internet was therefore a learning experience for the trainers as well, as one of them clearly stated: "The basic idea was that you have to listen to what older people have to say, to empathetically understand them. You talk less…and you do have to collaborate in different domains of interest to them, for instance, from religious icons on glass to oil extraction from Norway, from political comments to recipes of all kinds and even notices about matrimonies. I am not good at everything, but I must listen and sometimes have a lot to learn as well. That's this generation…" (trainer, male, 40 years old)
4.1.3 Barriers Encountered during the Training – Trainers' Perspective
Gathered data showed that barriers encountered during the training were usually circumscribed to problems dealing with memorability of the information learned; physical barriers in performing computer tasks; language barriers of the computer programs; and a general fear of committing errors. In order to help the older users overcome these barriers, the trainers adopted different strategies. For instance, one of the trainers commented: "I remember the oldest pensioner who took the training at 82 years old. Now he is 84 years old and is communicating by himself with his children abroad. He always forgot what to press. After three weeks of training he would still ask me: "Mam', where shall I press now?". The question would come back several times. Each time we tried to refresh their knowledge, being aware and making them aware that only by using it each day would they learn it." (trainer, female, 45 years old)
Another problem reported was the constant forgetting of email passwords. If the email ID was easy to remember, being connected to something familiar to the user, like their family name, the password would not come to mind when needed. The following anecdote was recounted to me about this problem: "I had to do something: I wrote down in my notebook all their passwords. I do remember a funny moment when one of the gentlemen, who usually forgot his password, asked for it and I tried to whisper it into his ear. Having bad hearing as well, he repeated it loudly and then all his colleagues would know it and next time helped him with it." (trainer, female, 45 years old)
Age-related health problems were also reported as having an influence on the older users' abilities to learn and perform the computer tasks. Loss of eyesight and eyestrain, difficulties in staying in front of the desktop for more than an hour at a time, and mild paresis of the fingers were among the physical barriers mentioned by the trainer. Consequently the training had to be tailored to make up for these deficiencies: "We also had cases dealing with mild paresis, which affects the movement of their fingers on the keyboard. In this case we tried to take it slowly, taking into account also the fact that for them touching something they didn't touch before was amplifying their gestural vulnerability (manuality)." (trainer, male, 40 years old)
Another challenge that the trainers had to face was introducing this new group of users to the technical jargon, which was mainly in English. Terms had to be not only translated, but many times simplified or even re-invented in order to come in handy when operating the computer and Internet. One trainer remembered the "technical language engineering" she had to adopt in order to cope with this problem: "We had to adapt all this language, to use words as simple and intelligible as possible, so that they could easily understand. We also let them call things in their own manner, otherwise they wouldn't remember them next time. For instance, the desktop was called the TV screen (ecran de televizor). Other examples were the sign commands such as "the glove" (manusa) or "the hand" (manuta). I also tried to make them understand that when we begin to write we have to be sure that the line-cursor is "blinking" (clipoceste). This way of expression was very useful in order for them to understand. This was in no way considered an underestimation of their intellectual or professional qualities, but rather a useful and necessary strategy of adapting to the computer. There was no complaint about that… When we got to the Internet correspondence programs, such as Yahoo or Gmail, a few basic English words like send, inbox, attach also had to be explained. None of them knew English and that was quite an impediment for them." (trainer, female, 45 years old)
Finally, trainers had to cope with the older users' general fear of committing errors when performing computer tasks. Constant assurances that nothing would break, that everything was under control, had to be provided to this class of users in order to overcome their lack of trust when using these new information technologies, as one trainer explained: "At the beginning their fears were quite big. The fear of the unknown was the biggest. They were afraid to put their hand on the mouse, to touch the keyboard.
671
I had to assure them that nothing wrong will happen, to convince them to press the buttons with thrust that nothing will break. At the beginning they had this huge fear of breaking it down, but after a few sessions they got used to it..”
(trainer, male, 40 years old) Overall, the accounts of the trainers about the difficulties that they had to deal with when training, demonstrated that small adaptations and tailored approaches, a good deal of reassurance, a lot of patience, as well as a good sense of humor on both sides were helpful ways to overcoming the so-called “computer anxiety” of this novice class of users. In addition to that, interviews with users further uncovered that perceived benefits from using the computer contributed considerably to their uptaking of the new technologies. 4.2 Perceived Benefits of the Computer and Internet Usage – Older Users’perspective The second part of the findings and discussion is dedicated to the reported benefits of the older users. They are organized into five general areas: facilitating communication with relatives; enforcing sociability; meeting personal information needs; enabling community engagement; positive learning experience. Facilitating communication with relatives One of the top-rated reported motivation and as well benefit for learning and using the computer and Internet was the possibility to stay in touch with their relatives from abroad. The main incentives were the free, direct and immediate nature of the communication. This highly appreciated communicative function of the Internet gave them the chance not only to keep the family link, but also to build bridges over the generation gap communication. Some of the respondents have stated that they now talk more to their children abroad that they used to, as one of the persons admitted: “I have two children, one in Bucharest and one in Toronto, Canada. They were the main reason for which I started to learn the computer. I found so much joy in being able to stay in contact with them through the email. The first thing that I did was to write to them about this center and the beautiful things that are happening here. Then we started to see each other on the webcam and I was so moved to see my nice and we started to talk much more than we use to do on the telephone.” (female, 60 years old)
Similat thoughts were echoed by several other users: “I enrolled to the Internet Center, immediately after they opened it. I took the training classes in order to learn how to communicate through email and messenger with my children. I have seven children, four in France, one in Spain, one in Bucharest and one in Zalau. I missed them a lot so I had to do something. So in less than two week I learnt how to send and receive messages from my children. Now we are permanently in touch and the fact that the telephone bill has lowered considerably is also an extraordinary thing.” (female, 62 years old) “The fact that we can now permanently communicate with our children, with my son in Canada and my daughter in Hungary and that we are now able to see how our nephews grow up almost everyday, for us are incredible things. We
672
C. Cimpoieru come almost daily to the internet center to read our messages. Unfortunately, at home we can’t afford the internet monthly fee and neither a computer.” ( female, 65 years old)
In addition, their children were quite impressed and further encouraged their old parents' new technological habits. It can be affirmed that the new technologies not only influenced the ways and content of communication across the generations, but also enabled the creation of a technological "diffuse household" [24] which, by bridging time and space distances, mediated communication exchanges with the family members from abroad. It has also been observed across the interviews that it is women who usually take up this task of kin networking.
Enforcing sociability
In addition to mediating communication with their families, the Internet offered new opportunities for expanding sociability for older adults. For instance, one 62-year-old woman talked about a relationship she started online with an American man, after the death of her husband, four years ago. She particularly emphasized the role of the Internet in helping her overcome depression and loneliness by having the chance to make new friends, as the following quotes indicate: "After the death of my husband, I felt lonely and depressed and I was looking for something to keep my mind occupied with. Not only did the Internet open the door for me, but it also helped me create a new life. It is now 4 years since I started to navigate on the Internet, after the death of my husband. In my house the Internet is closing now in the evening at 9 or even later if the gentleman from America tells me he is signing in. I've met him on the Neogen network where I have my profile… There you can talk, have conversations, and it can also happen that somebody will fall in love with you and change your life. As it happened to me." (female, 62 years old) Another 60-year-old woman expressed her contentment at being able to create a kind of online community by sharing things that could benefit older adults: "The internet has so many beautiful things to offer to the pensioners. For instance, I was so impressed by the PPS that I received that I sent it around to the people I knew and now we are a group of friends sending each other these things. I think it's important that all the pensioners be in contact in order to receive these things which make you feel better, bring you peace and quiet. That's what the old people need." (female, 60 years old) In a similar tone, another participant was pleased to find online friends that she had lost contact with for many years: "There is a site iviv.org where all your friends can join. There I have a friend and she put me on the IVIV… everybody can join, your friends from all over the world. Now, as you know, nations are everywhere. Sometimes I was so surprised to find out about people whom I haven't seen for 20 years…It's so beautiful when things like this happen." (female, 67 years old)
Meeting information needs
Overall, the results showed that the benefits of computer and Internet use were mostly associated with their effectiveness in meeting the information needs of the older adults. Whether managing household needs or solving juridical problems, whether
looking for health information or just entertainment, most of the older adults responded that they were able to find the information they needed on the Internet. However, it can be observed that this selective principle allows only for searching problem-solving information, limiting their further potential interests and learning patterns. Several interviewees expressed their contentment with the basic computer knowledge for what they need to search on the internet as the following quotes illustrate: Managing household needs “For me the food recipes and the gardening are the most important. For instance, I checked information about not so common plants such as chick-pea and lentil and than I went and bought the seeds. I also looked for information regarding the conditions it needs and how to take care of it. For the recipes as well, you can find whatever you like. I just type chicken jam –role and I got it.” (female, 65 years old) “I do all kinds of files, starting with excels for the evidence of the water meter, being the Chief of the block, to simple texts of information.” (male, 67 years old) Health, diseases and medicine information “I looked through the Internet for a doctor who operated me a long time ago in the Mures County and to my great surprise I found it. I told him that I am a former patient of him and that I needed some advises, as some of the medical problems that I’ve had reappeared. He responded and recommended certain treatments and some of the products, namely creams, I found and ordered through the Internet. I am now actually waiting for the order.” “There is some medicine that I take and I check them on the Internet to see their effects on other people as well, if they worked or not and with what sideeffects, if the case.” (B.I, male, 67 years old) “Each day I discover something new. I discovered and I strongly believe now that we can defend our health by ourselves. For instance, in my husband case, I search on the internet. There is a program with a human body there and all you have to do is to indicate which part of the body is unhalthy and you receive some advices, what to do and what treatments to follow..I find it very good for older people who usually deal a lot with health problems…Another example: I started now to look for information in order to make a request to the “Foundation “At home” for my husband to be able to follow a recovery treatment, as indicated by the doctor specialist.” (L.C, female, 60 years old) “I only search for what I am interested in. For instance, I have a prescription and I want to know something about the medicines prescribed, about my disease. I come here and check it.” (female 67 years old) Being connected to up-to-date information “I am more interested in the news. So if I want to find out more or to make something clear I search it on the internet. I am also interested in the comments people post, to see how other think as well.” (male, 69 years old) Entertainment “I use the computer only for entertainment: I read the press, the news from the country, play music, look for pension information, things like that that interests us.” (male, 67 years old)
Internet as integrated, routine activity “For me, as an old aged person it means very much. And I will tell you why. I am by myself. So as single as you see me I have all my family around me. I am by my own all day. I have nobody. I sleep, cook and I am all day by myself. So the internet is very useful for me and I will tell you why. I wake up in the morning and, you know, I have my own defects. I wake up, but not without the coffee. Then I shave and check the internet. If I want to see a newspaper I search it and this is something new for me. If a want a beautiful song that I like and now you know we old people like soft songs, I look for it and I find it. And I stay for one or one and a half hours. And After that I go out and see for my household things.” (male, 85 years old) Travel information “For me it means a lot. For instance now I’ve booked a ticket for a treatment resort and than I checked everything about it on the internet: how to get there by train, where to sleep, how it looks and what can I do there. Now I know everything, I’m prepared to go and that gives me confidence. Otherwise where from should I find all that information? It helps a lot, not only the young people, but also the old people, who especially need to be prepared.” (male, 65 years old) Solving juridical problems “Now I am very interested what are the steps that I have to take in order to solve an injustice, an abuse, that was done to me before 1989 by a director from the education institution where I used to work as a secretary. I call it abuse, because I was set up a fault in the stock list in order to be penalized and kicked out, according to the Law no.5/1978, art. 64 lit T, law repudiated by the Final Report of the Presidential Commission regarding the condemnation of communism. I started to document on the internet, to search for articles and information, in order to start a juridical action.. Now I am corresponding with the editorial board of the “22 Newspaper”, who has distributed my message to all editors interested in such problems. I also wait the response from my attorney who will deal with my juridical rehabilitation.” (L.C., female, 60 years old)
Enabling community engagement
Access to the Internet does not only meet personal needs, but also enables community engagement. As one of the trainers rightly observed, "the civic spirit at this age functions perfectly, it just needs a little support." The use of the computer and the Internet gave older adults the possibility to be part of the community by staying informed about what is happening and even performing civic roles, as the following quote indicates:
"I am so proud of myself for having the courage, for the first time in my life, to make a claim at the town hall. I will tell you why. On Christmas day I paid a visit to my mom to bring her some things. On my way back home I had to wait for almost 2 hours for a bus to come. Eventually I had to take a taxi. The second day, each bus station had the holiday schedule posted. I did that so things like that would not happen to other people as well." (female, 60 years old)
Positive learning experience
Another important finding was the positive learning experience that computer and Internet usage enables. The results confirmed and extended the now well-known hypothesis that "technology improves the quality of life". In line with this belief, several respondents stated that their life changed considerably since they started to use the computer:
"My life changed a lot after using the computer. I see things differently now. I myself feel better trying to know more about the present time. Especially now that I don't work anymore, it gives me a chance to stay informed. Now I like to stay up-to-date with everything… I come here almost every day. It makes me feel good. I also come for the afternoon activities, backgammon, chess, cards, but for the computer I have a special sympathy." (male, 67 years old)
"Since I use the internet I am more confident. I feel that I have someone backing me all the time, I have a support, something that is helping me. When I need something, I know that I should find it on the internet and this fact is giving me confidence and makes me happy as well. I usually come to the internet with an impulse, let's see what I will find, and then I find it and I'm relaxed that I found it." (female, 65 years old)
Interestingly, the gathered data showed that social and familial networks do not function as a learning support for computer and Internet use. It is only the external training, or, in some cases, previous work experience, that was reported as helping the elderly acquire IT skills. It is, however, widely acknowledged in their testimonies that the social and familial networks are the primary impulse for their motivation to learn. Across all users interviewed, the training was reported as being the key to their up-taking of computers and the Internet. The question of whether the librarians manage to bridge the digital divide for the elderly finds its answer precisely in the delivery of training in both computer literacy and information knowledge:
"The free Internet access and training activity that we have here at the Club is as if somebody would help us to make a leap into the technology of the third millennium in order to be able to keep the pace with our children, to be able to communicate with them in their language." (female, 60 years old)
The positive learning experience also improved the capacity of the elderly to act as intermediaries of knowledge for others. One of the participants mentioned that he took great pride in being able to help others with the information he found on the internet, as the following quote indicates:
"I also have some neighbors, pensioners as well, not necessarily older than me, but for which the computer is not a . And they ask me to look for information regarding their disease, the medicine they take or about a certain law. So I look it up on the Internet, note it down and then give them the right information. I can honestly say that it brings me a great joy to be able to help the others with their problems." (male, 67 years old)
5 Conclusions
The findings from the present research contradict the general stereotype that older people are not able to engage with new information technology. Although there is evidence that this segment of the population is the big loser of "digital inclusion", this study demonstrates that initiatives like Biblionet could be replicated with positive outcomes for the elderly's engagement with the digital world. However, providing technology is not sufficient for giving access to information. It is the training which embeds the use of technology and makes it meaningful for the older users' needs. Furthermore, in order to access technology the older users need a stimulus, which can be provided by improved, tailor-made training that clearly states the benefits involved. The present study was a pilot research, limited by the time available for conducting fieldwork. However, the "microhistory" from the Club of the Retired brought to the surface valuable information concerning the digital inclusion of the elderly that is open to further investigation.
Acknowledgements. The findings reported in this article are part of a larger research project which examines the implementation of the Biblionet Project in Romania and which is supported by the Bill and Melinda Gates Foundation in Romania through the IREX Foundation. The research was coordinated by James Nyce, Gail Bader and Alexandru Balasescu. The ethnographic research was done by the author with grant support from Ball State University. The author is grateful to all the informants who collaborated in this research, with particular thanks to Maria Demble, the trainer coordinator, and Paul Ivascau, the director of the Club of the Retired, for their significant support during the research.
References 1. Europe’s Digital Competitiveness Report, Volume 2: i2010 — ICT Country Profiles, omission of The European Communities, Brussels (2009) 2. From Red to Grey. The Third Transition of Aging Population in Eastern Europe and The Former Soviet Union., World Bank Report (2007) 3. Rogers, E.M.: Diffusion of innovations. Free Press, New York (1962) 4. Blaschke, C.M., Freddolino, P.P., Mullen, E.E.: Ageing and Technology: A Review of the Research Literature. British Journal of Social Work 39(4), 641–656 (2009) 5. Castells, M.: The Information Age: End of the Millennium, vol. 3. Blackwell, Oxford (1998) 6. Norris, P.: Digital Divide: Civic Engagement, Information Poverty and the Internet in Democratic Societies. Cambridge University Press, Cambridge (2001) 7. Millward, P.: The "grey digital divide": Perception, exclusion and barrier of access to the Internet for Older People. First Monday 8(7) (July 2003), http://firstmonday.org/issues/issue8_7/millward/index.html 8. Hargittai, E.: Second-Level Digital Divide: Differences in People’s Online Skills. First Monda 7(4) (April), http://www.firstmonday.org/issues/issue7_4/ hargittai/index.html
9. Brabazon, T.: From Eleanor Rigby to Nannanet: The greying of the World Wide Web. First Monday 10(12) (2005), http://firstmonday.org/issues/ issue10_12/brabazon/index.html 10. Melenhorst, A., Rogers, W.A.: The Use of communicaton and Technologies by Older Adults: Exploring the Benefits From the User’s Perspective. In: Human Factors and Ergonomics Society Annual Meeting Proceedings, Aging, pp. 221–225 11. Morrell, R.W., Mayhorn, C.B., Bennett, J.: A survey of World Wide Web Use in Middle Aged and Older Adults. Human Factors and Ergonomics Society 42, 175–182 (2000) 12. Fox, S.: Wired seniors: a fervent few, inspired by family ties., Washington, D.C.: Pew Internet&American Life Project (2001), http://www.pewinternet.org/reports/toc.asp?Report=40 13. Saunder, E.J.: Maximizing computer use among the elderly in rural senior centers. Educational Gerontology 30, 573–585 (2004) 14. Morris, A., Goodman, J., Branding, H.: Internet use and non-use: views of older users. Universal Access in the Information Society 6(1) (2007) 15. Cambel, R.J., Wabby, J.: The Elderly and the Internet. The Internet Journal of Health 3(1) (2003) 16. Williams, A., Guendouzi, J.: Adjusting to “The Home”: Dialectical dilem- mas and personal relationships in a retirement community. Journal of Communication 50, 65–82 (2000) 17. McConatha, D.: Aging online: Toward a theory of e-quality. In: Morrell RW, editor. Older adults, health information, and the World Wide Web, pp. 21–39. Erlbaum, Mahwah (2002) 18. White, H., McConnell, E., Clipp, E., Bynum, L., Teague, C., Navas, L., Craven, S., Halbrecht, H.: Surfing the net in later life: a review of the literature and pilot study of computer use and quality of life. Journal of Applied Gerontology 18, 358–378 (1999) 19. Age Concern, Internet turns on men and women in different ways – new survey reveals (2002), http://www.icmresearch.co.uk/specialist_areas/it-news.asp 20. Morris A., Branding, H.:E-literacy and the grey digital divide: a review with recommendations, http://jil.lboro.ac.uk/ojs/index.php/JIL/ article/view/RA-V1-13-2007-2 21. Harrington, K.V., McElroy, J.C., Morrow, P.C.: Computer anxiety and compuer-based training: A laboratory experiment. Journal of Educational Computing Research 6(3), 343– 358 (2000) 22. Cody, J.M., Dunn, D., Hoppin, S., Wendt, P.: Silver surfers: Training and evaluating Internet use among older adult learner. Communication Education 48, 269–286 (1999) 23. Biblionet Project in Romania, http://www.irex.org/project/ biblionet-global-libraries-romania and also at, http://www.biblionet.ro/show/index/lang/en 24. Bill & Melinda Gates Foundation. Global Library Project, http://www.gatesfoundation.org/libraries/Pages/ global-libraries-projects-update.aspx 25. Mihăilescu, V.: La maisnie diffuse, du communisme au capitalisme: Questions et hypotheses. Balkanaulogie IV(2) (2000)
CRAS: A Model for Information Representation in a Multidisciplinary Knowledge Sharing Environment for Its Reuse Yueh Hui Kao1, Alain Lepage1, and Charles Robert2 1
Université de Technologie de Compiègne, Laboratoire COSTECH, Centre Pierre Guillaumat, BP 60319, 60203 Compiègne, France [email protected], [email protected] 2 Department of Computer Science, Faculty of Science, University of Ibadan, Ibadan, Oyo Road, Nigeria [email protected]
Abstract. The objective of this work was to propose a model for accessing information in information-bearing objects (documents) in a multidisciplinary and collaborative environment for its reuse. The proposed model is associated with information system development for product innovation. Forging this specification of information access is expected to provide a basis for a programmable methodology for accessing a wide range of information in a consistent manner. For example, it was conceived to provide a basis for addressing an information space with specific parameters. It was also meant to provide a programming methodology, such as events, methods and properties, for such a space. The conception was meant not only to create an associated programming methodology but also storage of shared knowledge emanating from multidisciplinary collaborative environments. The initial concern was to provide a way to share expertise information for transport product innovation in a region of France. The work identified four parameters of information-bearing objects: Content, Reference, Annotation and Support. These parameters were used to propose a methodology of information representation associated with the electronic domain. The parameters were explained in detail with examples. Keywords: Information space, information context, information source, reuse, sharing, JSON.
1 Introduction
Most definitions of information assume that information is "processed data". Another view of information is that it is a possession of a decision maker that makes it possible for him to make guided decisions. The question that is often neglected is when, where and how we can say that a data item has been processed, bearing in mind the language of presentation, the object in reference, other circumstances surrounding the presented information and the peculiarities of the information audience. The attempt in this work is to characterize information in line with the environment of
the information and the place of the information user. It was assumed that information should be defined integrally from the perspective of the user (usage) and from the perspective of the analyst (information generation), all with respect to the object referenced.
2 Related Work
In a related work [4], attention is given to the organization of online information space to assist in navigation. It justified the necessity of spatial information organization by comparing it to the importance of the spatial organization proposed by architects. In another development, Normore and Bendig [9] proposed a systematic classification system hinged on a multidimensional information space in which words/concepts are arrayed. The essence of this was to create a space in which to place information items in order to establish relationships among them. The reference model for an Open Archival Information System (OAIS) [2] is no doubt a detailed information model. The problem with it is its complexity and the requirements involved in adapting it to specific use (particularly personal use). In the case of Dublin Core, detailed in [6], and other works [3][4][7][8], the attention was more on document creation. The content of information was not given the expected attention. They also emphasized information creation without necessary reference to personal interpretations from the users. Dublin Core proposed fifteen different approaches to viewing a document (i.e. Contributor, Coverage, Creator, Date, Description, Format, Identifier, Language, Publisher, Relation (Related Resources), Rights, Source (the resource that gave birth to the source), Subject, Title and Type). Though these are a relatively good starting point for understanding document composition, reference was not addressed, nor was the context of its relative perception by users. The work of Jamil and Modica [7] looked at the possibility of extending XML to include object-oriented features with inheritance in documents. The work demonstrated the use of XML with dynamic inheritance to assist in better document design, decreased management overheads and support for increased autonomy. Applying this adaptation to categorizing information sources based on content, medium, user and reference will enhance object-oriented information representation. The objective in this work is not the dynamism of objects but the partitioning of the entire information "object" for contextual access.
3 CRAS Model
The work defined an information space as an environment or an object containing accessible and comprehensible (contextually) information. It may be described as:
Sx → F(CRAS)
The type of information, the quality of information, and the factors surrounding information comprehension are not a subject of this work. An information space Sx (an information-bearing source) can be referenced in time and space with events and actors. It was divided into four layers: (a) Content layer, (b) Reference layer, (c) Annotation/user
layer, (d) Support layer. Each layer has its specific properties, accessing method and associated method of interaction. The associated method of interaction in each layer is dependent on the reference layer; the reference layer itself is self-dependent.

F(CRAS) →
  Content (title, descriptors, author, date created, date accessed, domain of reference)
  Reference (network, source, focus, parameters of reference)
  Annotation (user, objective, object, context, type)
  Support (information type, media support type, coding language, lifespan)
Fig. 1. An overview of the CRAS Model
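As a rough illustration only, the four layers above can be captured in a simple record structure. The sketch below uses Python dataclasses; the field names merely mirror the parameters listed for F(CRAS) and are assumptions rather than a prescribed schema.

# Illustrative sketch: one CRAS record per information space.
# Field names mirror the F(CRAS) parameters above and are assumptions,
# not a fixed vocabulary.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Content:
    title: str = ""
    descriptors: List[str] = field(default_factory=list)
    author: str = ""
    date_created: str = ""
    date_accessed: str = ""
    domain_of_reference: str = ""

@dataclass
class Reference:
    network: str = ""
    source: str = ""
    focus: str = ""
    parameters_of_reference: str = ""

@dataclass
class Annotation:
    user: str = ""
    objective: str = ""
    annotated_object: str = ""   # "object" shadows a Python builtin, hence the name
    context: str = ""
    annotation_type: str = ""

@dataclass
class Support:
    information_type: str = ""
    media_support_type: str = ""
    coding_language: str = ""
    lifespan: str = ""

@dataclass
class CRASRecord:
    content: Content
    reference: Reference
    annotations: List[Annotation]   # several users may annotate the same source
    support: Support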
Contextually, an information space was represented with selected domains of interest. Each domain itself was characterized with CRAS. What is meant by domain is nothing but a discipline of interest that reflects the perspective of users. A domain of interest is a sum of several information spaces:
∑_{i=1}^{n} CRAS_i
An information space is a container of information (content). It must transmit information through a channel (support). It should be possible to reference it by the method of access and its relationship to other information spaces (reference). When information is accessed, it is primarily for use. The use may be immediate or for the future, and it can be personal or public. When an information space is accessed, the user directly or indirectly classifies the accessed information based on his experiences. This classification is just one feature of annotation.
3.1 Content Layer
It was believed that an information-bearing object (document) is primarily a container of information. A document is expected to transmit information. The transmission and reception of messages in a document demand different considerations. The assumption
is that most objects that are referred to as documents have integrated constituents. A multimedia document is defined as any document having one or more of the following constituents, in isolation or in integrated form: an information source can contain text, sound or image. Examples of these forms are given in Table 1. The idea was that all multimedia information sources can be identified using one of the categories provided in Table 1. The word multimedia means "more than one", but in this case information sources with typically just one of the three characteristics in the table were included. This is because it was assumed that the contextual terms text, image and sound need further clarification that is beyond the considerations here. The definition of image, for instance, is subjective: an artistic writing may be called text, image, or even text and image, and a Japanese calligraphy is an image to an average French reader. The definition of image, text or sound is a contextual task. Furthermore, the frontiers of separation between the attributes of these types of documents remain unclear. From a layman's point of view, documents are classified using the seven classes in Table 1. A combination of sound and image (class no. 3) is a video. Combining text and sound gives an advertisement; it simply means that images are set to zero and the summation of texts and sounds is evaluated. An attempt was not made to include animated images, because animated images (videos) would involve the use of mathematics to express the geometric coordinates of the object(s) involved. In other words, this work considered an animated document as a summation of aggregated separate images with a sound overlay. This is to say that any information space may be deemed to have existed in a three-dimensional plane (x, y, z) at any time. Animated videos are reconstitutions of an information space with respect to two or more documents for possibly successive or separated times.
3.2 Reference Layer
The term "reference" was used in a specific way to mean what is required to access the content of information. A reference to information is the pathway for accessing a specific path of elements in a document. The specific elements may be the dominant character or another identifiable character. Identification of the dominant character was done using the structural constituents of documents. Referencing an information space is done using two methods: access reference and relative reference.
Access reference: (network parameters, technology, resources)
Relative reference: (location, time, context)
Access. Network parameters are parameters that can be used to relate the sets of information spaces in an entire multimedia document. A space in a multimedia document must have some specific access parameters. In access reference, the questions that must be answered include: What is the place of an information space in the entire document? For example, a set of information spaces may be introducing other information spaces, or it may be summarizing another set of information spaces. Another question is: what relative role is a set of information spaces (possibly a scene) playing? What is the implication of removing or altering the set of information spaces? Technology and resources are needed most of the time to access a series of information spaces. The technological and physical resources needed to access a set of information spaces are too complex to be considered in detail. It is important to note that these accessing methods are dynamic.
Table 1. Table of types of document sources

Class No | Text | Sound | Image | Representation      | Example
1        |      |       | X     | T0S0 ∑ Ij           | Paints
2        |      | X     |       | T0I0 ∑ Sj           | Music
3        |      | X     | X     | T0 ∑ Sj * ∑ Ik      | Video
4        | X    |       |       | S0I0 ∑ Tj           | Book
5        | X    |       | X     | S0 ∑ Tj * ∑ Ij      | Commented image
6        | X    | X     |       | I0 ∑ Tj * ∑ Sj      | Advertisement
7        | X    | X     | X     | ∑ Tj * ∑ Sj * ∑ Ik  | Commented Video
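Purely as an illustration of how Table 1 can be read, the small helper below maps the presence or absence of text, sound and image in a source to the corresponding class number and example; it is not part of the CRAS model itself.

# Illustrative only: look up the Table 1 class of a source from its
# (text, sound, image) flags.
CLASS_TABLE = {
    (0, 0, 1): (1, "Paints"),
    (0, 1, 0): (2, "Music"),
    (0, 1, 1): (3, "Video"),
    (1, 0, 0): (4, "Book"),
    (1, 0, 1): (5, "Commented image"),
    (1, 1, 0): (6, "Advertisement"),
    (1, 1, 1): (7, "Commented Video"),
}

def document_class(has_text: bool, has_sound: bool, has_image: bool):
    key = (int(has_text), int(has_sound), int(has_image))
    if key == (0, 0, 0):
        raise ValueError("a source with no text, sound or image is outside Table 1")
    return CLASS_TABLE[key]

# e.g. document_class(True, True, True) -> (7, "Commented Video")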
Relative. Multimedia documents may not be considered in isolation; they can be considered in relation to other documents in history and time. Since documents are supposed to contain information, the information must be located relative to the media containing them.
3.3 Adaptive Annotated Layer
There cannot be information in a document without a personal interpretation (view) of it. Whenever there is information conveyed by the author of a document, the view of the author is not necessarily the view of the user of that information. Unconsciously, every user is directly adding a layer of information on top of the existing information. This layer is essentially adapted to his own personality, independent of the original author. This work calls this layer of interpretation by a user "an annotation". An annotation may be skewed by external compromises and dictates (as in indexation) or it may be left to the discretion of the document user. An annotation may be made available to other users of the same document or kept to the privacy of the document user. The concern here is its availability to the general public for sharing. This work is of the view that annotation takes care of anything that has to do with the view and perception of users. Whenever an information source is accessed, the user is bound to classify and interpret the information. The classification and interpretation are dynamic, based on social factors and the individualism of each user. The classification itself is an annotation on the information source that can depend on users and context. It is interesting to note that most work on annotation does not consider the time of consultation; this work considers time a paramount factor in annotation. The subject of annotation is seen from two perspectives and at two levels. Annotation is an action as well as an object. The level of annotation goes beyond the simple classification of an information space to its use for summarizing, interpreting, questioning, and making remarks on an information space. It is also a reflection of individualism. It is assumed that:
An annotation may not be created by two persons on an information-bearing source. Two or more persons may create different annotations on the same information source. A document may have more than one annotation, each independent of other annotations on other sources that refer to the same information source.
An annotation is represented as a function of the domain of reference, the host document, its creator, context and time:
Δ ∈ ƒ(domain, user, information source, context, time)
An annotation on a document may share two or more domains in common with another annotation on another document. It was assumed that an annotation may belong to several domains depending on the point of reference; not more than three domains are of importance for any point of reference. A creator of an annotation is expected to make his annotations based on his experiences and a domain of reference. The concern is with three types of experiences. The author of a document may not influence the annotations made on his documents.
3.4 Support Layer
The Open Archival Information System (OAIS) was an attempt to provide a model for an archive consisting of an organization of systems and people involved in preserving information and disseminating it (CCSDS, 2002) (Cirocchi et al., 2000). The work gave a description of the repository of an information archive. Information spaces, from the perspective of their support (container), can be viewed from two broad realms: information may be hosted on a visible object or an intangible object, and it may be hosted in an analogue device or in a digital device. The concern of this work is not to detail the specific characteristics of the object hosting information but to give a general description. It suffices to describe an information medium as:
{media}
  name → {Xk}
  description → {Cx}
  peculiarity → {Ix}
{media}
Each of the parameters belongs to a set of narratives. These narratives have been referenced from context and fields of learning. They were also measured with specific parameters explained in cybernetics discussions.
4 Application Using XML and JSON
The CRAS model was applied to three types of information-bearing sources: (a) multimedia objects (video, audio and images), (b) web sites and (c) written text libraries. To do this, the following mathematical equation was used to summarize each document:
η_i = ∑ C(t_x i_y s_z) • R(α, r) • Δ(δ_k u_k Χ_k Ωt_k τ_k) • S(N, D, P)
where
C(t_x i_y s_z) is the content in a specific information space, typically in terms of text (0,1), image (0,1) and sound (0,1);
R(α, r) is the reference to the information space (access α and path r);
Δ(δ_k u_k Χ_k Ωt_k τ_k) is the parameter related to the user's adaptation: the user's domain δ_k, the user u_k, the context Χ_k, the source reference Ωt_k and the time of access τ_k, respectively;
S(N, D, P) is the support for the media of the information space (nature N, device D and property P).
Summations in this equation simply mean that there are several instances within the document. An XML-style representation of this model was created with four sections and details of each section. An example of such XML is given below:
Religious http://www.isfahan.org.uk/
Architecture Tourism
UNESCO, Cultural Heritage, World Class
Isfahan has been designated by UNESCO as a world heritage. It contains a wide range of Islamic Architectural styles ranging from the 11th century (C.E.) to the 19th. Isfahan Web Server Ardalan, Nader and Bakhtiar, Laleh 2007 January 25, 2009
http://www.isfahan.org.uk/biblio/biblio.html
Text
Charles Robert Yueh-Hui Kao Alain Lepage
Review One key issue for metadata management systems is the way to manage semantic interoperability in architectural related field
electronic English,Arabic unknown
Applying the model, information spaces were evaluated and represented to correspond to sections of a JSON-compatible document; an example is the listing above. One of the sharing possibilities that was attempted was to pass the crasXML generated by the model to a JSON parser. The proposition in crasXML was not "open ended", since parsing in JSON was accomplished on a character-by-character basis. JSON was preferred for a start because it provides a data format for the exchange of information between a web browser and a document/object server. It was also a preferred platform for testing these propositions because of the availability of libraries, notably for web information management. In order to provide greater flexibility for exchange, the crasXML files were directed to JSON libraries ported for Java, VB.NET, Tcl, C++ and native JSON JavaScript.
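As a rough sketch of this kind of exchange, the Isfahan example above might be serialized to JSON along the following lines. The key names below are assumptions derived from the four CRAS layers, not the actual crasXML vocabulary, and the values are taken from the listing.

# Illustrative only: serializing a CRAS-style record to JSON for exchange with
# a browser-side parser. Key names are assumed, not the crasXML schema.
import json

record = {
    "content": {
        "title": "Isfahan Web Server",
        "descriptors": ["Religious", "Architecture", "Tourism",
                        "UNESCO", "Cultural Heritage", "World Class"],
        "author": "Ardalan, Nader and Bakhtiar, Laleh",
        "date_created": "2007",
        "date_accessed": "January 25, 2009",
    },
    "reference": {
        "source": "http://www.isfahan.org.uk/",
        "parameters_of_reference": "http://www.isfahan.org.uk/biblio/biblio.html",
    },
    "annotation": {
        "users": ["Charles Robert", "Yueh-Hui Kao", "Alain Lepage"],
        "type": "Review",
        "context": "One key issue for metadata management systems is the way to "
                   "manage semantic interoperability in architectural related field",
    },
    "support": {
        "information_type": "Text",
        "media_support_type": "electronic",
        "coding_language": "English, Arabic",
        "lifespan": "unknown",
    },
}

print(json.dumps(record, indent=2))   # ready to hand to any JSON parser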
5 Conclusions and Perspectives
This work represented information space using four selected criteria: content, reference, annotation and support. The essence was to look at an information space from the perspective of its significance to, and access by, a user. The last part of this work was the creation of an XML-style standard for information space representation, called crasXML. The standard was meant to facilitate the portability of information representations across media and devices. It made room for information reuse, particularly using a JSON parser. This was done with the objective of providing access to specific sections of information.
References [1] Castells, F.M., Vallet, D.: An Adaptation of the Vector-Space Model for Ontology-Based Information Retrieval. IEEE Transactions on Knowledge and Data Engineering 19(2), 261–272 (2007) [2] CCSDS, Reference Model for an Open Archival Information System (OAIS), Consultative committee for space data systems, (2002) (November 8, 2007), http:// public.ccsds.org/publications/archive/650x0b1.pdf (consulted February 05, 2011) [3] Cirocchi, G., Gatta, S., Panciera, L., Seta, E.: Metadata, quality information and preservation of digital resources, Associazione italiana biblioteche. BollettinoAIB 2000 3, 328–329 (2000) [4] Dourish, P., Chalmers, M.: Running Out of Space: Models of Information Navigation. In: Short paper presented at HCI 1994, Glasgow, UK (1994) [5] Hariharan, P.C.: “Media,” UCLA Information Studies Seminar on Preserving Authentic Records in Electronic Systems, the US-INTERPARES Project (1999), http://is.gseis.ucla.edu/us-interpares/Mediareport.pdf (consulted March 03, 2011) [6] Hillmann, D.: Using Dublin Core (2005), http://dublincore.org/documents/ usageguide/ (consulted November 8, 2007) [7] Jamil, H.M., Modica, G.A.: An Object Oriented extension of XML for autonomous Web applications. In: CIKM 2002, McLean, Virginia, USA, November 4–9 (2002) [8] Luciana, D.: Preserving Authentic Electronic Art Over the Long-term: The InterPARES Procject. In: Paper Presented at the Proceedings of the American Institute of Conservation/Electronic Media Group Conference, Portland, Oregon, June 13-14 (2004) [9] Normore, L.F., Bendig, M.: Using a Classification-Based Information Space. In: IFLA Satellite Meeting on Subject Retrieval in a Networked Environment, Dublin, OH (2001)
A Comparative Study on Different License Plate Recognition Algorithms Hadi Sharifi1 and Asadollah Shahbahrami2 1
Department of Information Technology University of Guilan, Rasht, Iran [email protected] 2 Department of Computer Engineering University of Guilan, Rasht, Iran [email protected]
Abstract. In the last decades, vehicle license plate recognition systems have become a central part of many traffic management and security systems, such as automatic speed control, tracking of stolen cars, automatic toll management, and access control to limited areas. This paper discusses common techniques for license plate recognition and compares them based on performance and accuracy. This evaluation gives developers and end-users a basis for choosing the most appropriate technique for their applications. The study shows that the dynamic programming algorithm is the fastest and the Gabor transform is the most accurate algorithm. Keywords: License Plate Recognition, Hough Transform, Gabor Transform, Dynamic Programming, Morphology.
1 Introduction
License Plate Recognition (LPR) is an important subject in today's life. Streets and roads are full of various motor vehicles, and it is important to identify them for many applications such as speed control and security management. Since LPR is used in real-time systems, it should provide both accuracy and acceptable response time [1]. Some LPR systems are based on image processing techniques and character recognition systems. Each LPR system consists of three basic sections, namely image acquisition, License Plate Detection (LPD), and Optical Character Reader (OCR). The image acquisition section receives a signal from a motion sensor and captures an image using a camera. In order to reduce motion blur it should use a high-speed shutter. The LPD section of the system analyzes the captured image to find the plate location or alphanumeric characters. Some algorithms are based on finding the license plate by using image features such as shape, color, or height-to-width ratio. The performance of these algorithms is very sensitive to changes in environmental conditions, such as light or weather conditions, that affect the quality of image features. The third part segments the characters and uses an OCR module to read the segmented characters that appear on the plate [2]. Object recognition systems include two functions: detecting the object in a scene and recognizing that object [3]. Most image
processing techniques for LPD are based on neural networks, the Gabor transform or the Hough transform, and Ada-Boost models. This paper studies and evaluates some common LPD algorithms. The paper is organized as follows: Section 2 describes four LPD algorithms and Section 3 compares them from a performance point of view. Finally, conclusions are drawn in Section 4.
2 The LPD Algorithms
We discuss dynamic programming-based, Hough transform-based, Gabor transform-based, and morphology-based algorithms in this section.
2.1 Dynamic Programming-Based Method
In the Dynamic Programming-based (DP) algorithm [4], developers never need to find the license plate location in the image, because it segments the alphanumeric characters directly on the license plate. It also does not require any image features of the license plate such as edges, colors, or lines, which are always affected by intensity variations. To implement this algorithm, a wide range of threshold values is considered to detect blobs containing the license plate number. Each blob has some key specifications such as the height, the width, the coordinates of the center position, and the threshold value used for extracting it. An energy-minimizing framework is used to extract the correct blobs: the vertical and horizontal distance between the center positions of two neighboring characters must be minimized. If two neighboring blobs with center distance d, as shown in Fig. 1, are within the permitted range, then they can be considered as correct candidates for part of the numeric characters.
Fig. 1. Geometric specification between two neighboring numerical character blobs. Horizontal distance between two blobs is shown with (d) [4]
The blob extraction module uses most of the computational time, because the threshold values are changed repetitively and the image labeling algorithm must be executed once for each threshold value. The DP algorithm has a low computing time overall, so it is known as a fast algorithm.
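A rough sketch of this multi-threshold blob extraction and the neighbor-distance check might look as follows; it assumes OpenCV for connected-component labelling, and the size and distance limits are placeholder values, not those used in [4].

# Illustrative sketch only: label blobs at many thresholds and keep
# neighbouring pairs whose centre distance d is plausible. The size and
# distance limits are placeholders, not the values used in [4].
import cv2

def extract_blobs(gray, thresholds=range(60, 200, 10)):
    blobs = []
    for t in thresholds:
        _, binary = cv2.threshold(gray, t, 255, cv2.THRESH_BINARY_INV)
        n, labels, stats, centroids = cv2.connectedComponentsWithStats(
            binary, connectivity=8)
        for i in range(1, n):                      # label 0 is the background
            x, y, w, h, area = stats[i]
            if 5 <= w <= 60 and 10 <= h <= 80:     # keep character-sized blobs
                blobs.append({"center": tuple(centroids[i]),
                              "w": int(w), "h": int(h), "threshold": t})
    return blobs

def neighbouring_pairs(blobs, d_min=8, d_max=40, dy_max=5):
    """Pairs whose horizontal centre distance d lies in the permitted range."""
    pairs = []
    for a in blobs:
        for b in blobs:
            if a is b:
                continue
            dx = abs(a["center"][0] - b["center"][0])
            dy = abs(a["center"][1] - b["center"][1])
            if d_min <= dx <= d_max and dy <= dy_max:
                pairs.append((a, b))
    return pairs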
2.2 Hough Transform
The second algorithm is based on a combination of the Hough transform and a contour algorithm [1]. The Hough transform is one of the most efficient algorithms for detecting lines in binary images. It looks for regions containing two parallel lines, which are considered as plate candidates. Execution time is a disadvantage of the Hough transform: it requires too much computation when applied to a binary image with high resolution; in other words, the computational time for high-resolution images is very high. Although an image-thinning pre-processing step can improve the algorithm's speed, the Hough transform computational time is still high and it is difficult to use it in real-time traffic management systems. In order to improve the performance, the Hough transform is combined with a contour algorithm. From the extracted edge image, the contour algorithm is used to detect closed boundaries of objects. These contour lines are transformed to Hough coordinates to find two interacting pairs of parallel lines (one pair of parallel lines crosses the other pair and establishes a parallelogram-shaped object) that are considered as a plate candidate. Since there are quite few (black) pixels in the contour lines, the transformation of these points to Hough coordinates requires much less computation. Hence, the speed of the algorithm is improved significantly without loss of accuracy, as shown in Fig. 2.
Fig. 2. Combination of the contour algorithm and the Hough transform for the LPD system [1] (left: 1 plate candidate; right: 3 plate candidates)
However, this technique may falsely detect the headlights or the windscreen as license plate candidates, because they also have a parallelogram shape. The license plate candidates can therefore be evaluated by a module that rejects incorrect ones so that the true one remains. From the two horizontal lines of a candidate, the method calculates exactly how inclined the lines are with respect to the horizontal coordinate, and then applies a rotation transformation to straighten them. After this processing, the straightened binary plate-candidate regions are passed to a number of heuristics and algorithms for evaluation. The evaluation of the license plate candidates is based on two main steps (a sketch of both checks is given after the list):
• In this evaluation stage, the ratio of width to height is checked to be within a permitted range; if we denote the width as W, the height as H, and the permitted range as (r_min, r_max), then:
r_min ≤ W/H ≤ r_max                  (1)
• Evaluate the candidates by counting the objects cut by horizontal crosscuts. For each desired license plate type there is a predefined range of objects that should be cut by a horizontal line across the plate; checking this property is useful for evaluating the candidates.
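The sketch below illustrates these two checks; the ratio bounds, crosscut rows and expected object counts are assumed placeholder values rather than those used in [1].

# Illustrative only: the two candidate checks above, with placeholder limits
# (ratio bounds, crosscut rows and expected object counts) rather than the
# values used in [1].
import numpy as np

def ratio_ok(w, h, r_min=2.0, r_max=6.0):
    return h > 0 and r_min <= w / h <= r_max            # Eq. (1)

def crosscut_count(binary_plate, row):
    """Objects cut by one horizontal crosscut = runs of foreground pixels."""
    line = binary_plate[row] > 0
    starts = line[1:] & ~line[:-1]                      # False -> True transitions
    return int(np.count_nonzero(starts)) + int(line[0])

def crosscuts_ok(binary_plate, expected=(4, 10)):
    h = binary_plate.shape[0]
    rows = (h // 3, h // 2, 2 * h // 3)                 # a few crosscut positions
    return all(expected[0] <= crosscut_count(binary_plate, r) <= expected[1]
               for r in rows)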
2.3 Gabor Transform
The Gabor transform has also been used for LPR [2], in a computer vision system that detects license plates and segments them into characters by using the Gabor transform for detection and vector quantization for segmentation. The Gabor filter is one of the major tools for texture analysis; the benefit of this technique is that it analyzes texture in all directions and scales. The filter responses that result from the convolution with Gabor filters are used directly as a license plate detector. Three different scales (9, 11, and 15 pixels) and four directions (0°, 45°, 90°, and 135°) are used, resulting in 12 Gabor filters; a sketch of this filter bank is given after Fig. 3. Fig. 3 shows an intensity image and its Gabor filter response. High values in the image indicate the probable plate regions. In order to segment these regions, first a threshold algorithm is applied and a binary image is produced. Then the morphological dilation operator is applied to the binary image in order to merge neighboring regions. Finally, the license plate regions are simply extracted. Fig. 4 shows the results of the detection by the given method.
2.4 Morphology-Based Algorithm
Morphology is an image processing tool which is based on shapes [5, 6]. Each shape has a Structural Element (SE), and morphological operators use this SE to analyze digital images. License plates have a rectangular shape, so morphology is suitable for LPR. Using this method, many candidates may be detected. To evaluate the candidates, reject false ones and find the license plate location, features such as shape, aspect ratio, and width-to-height ratio are checked. The LPR procedure based on morphology operators is described as follows (a sketch of this pipeline is given after the list):
• Preprocessing stage: convert the RGB image to a gray-scale image. If the original image is captured in gray scale, the time needed for this step can be saved.
• Apply the Sobel edge detection algorithm to the gray-scale image.
• Edge dilation operator (morphology)
• Close operator (morphology)
• Noise cleaning
• Evaluation of candidates based on the desired features
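A rough sketch of such a pipeline, assuming OpenCV 4 and placeholder kernel sizes and ratio bounds (not the values used in [5, 6]), is given below.

# Illustrative sketch of the morphology-based pipeline above: Sobel edges,
# dilation, closing, noise cleaning, then aspect-ratio filtering.
# Kernel sizes and the accepted ratio range are assumptions, not from [5, 6].
import cv2
import numpy as np

def plate_candidates(bgr):
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)                # preprocessing
    sobel = cv2.Sobel(gray, cv2.CV_8U, 1, 0, ksize=3)           # vertical edges
    _, binary = cv2.threshold(sobel, 0, 255,
                              cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    binary = cv2.dilate(binary, np.ones((3, 3), np.uint8))      # edge dilation
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE,
                              np.ones((5, 21), np.uint8))       # close operator
    closed = cv2.medianBlur(closed, 5)                          # noise cleaning
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL,   # OpenCV 4 API
                                   cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if h > 0 and 2.0 <= w / h <= 6.0:                       # rectangle-like plates
            candidates.append((x, y, w, h))
    return candidates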
Fig. 3. Left: original images; right: normalized Gabor filter responses [2]
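As an illustration of the filter bank described in Sect. 2.3, the 12 Gabor responses (three scales by four orientations) could be generated roughly as follows; the sigma and wavelength settings are assumptions, not the parameters used in [2].

# Illustrative only: a 12-filter Gabor bank (3 scales x 4 directions) and a
# combined response map. Sigma/lambda choices are assumptions, not from [2].
import cv2
import numpy as np

def gabor_plate_response(gray):
    img = np.float32(gray) / 255.0
    response = np.zeros_like(img)
    for ksize in (9, 11, 15):                   # the three scales (pixels)
        for theta in (0, 45, 90, 135):          # the four directions (degrees)
            kernel = cv2.getGaborKernel((ksize, ksize), sigma=ksize / 4.0,
                                        theta=np.deg2rad(theta),
                                        lambd=ksize / 2.0, gamma=0.5, psi=0)
            filtered = cv2.filter2D(img, cv2.CV_32F, kernel)
            response = np.maximum(response, np.abs(filtered))
    # high values indicate probable plate regions; threshold and dilate to merge
    norm = np.uint8(255 * response / (response.max() + 1e-6))
    _, mask = cv2.threshold(norm, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    mask = cv2.dilate(mask, np.ones((5, 5), np.uint8))
    return response, mask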
3 Discussions
The dynamic programming algorithm does not need any edge detection algorithm, and environmental conditions have almost no effect on the features it uses; in other words, its performance is assured day and night and in all weather conditions. The DP method does not consume processor cycles on converting a gray-scale image to binary, nor on edge detection. The Hough transform algorithm is a time-consuming method; in order to improve its performance, a thinning algorithm must be executed in the pre-processing stage. The Gabor method has high efficiency in both parts of the LPR, detecting the license plate and segmentation, but its disadvantage is a high execution time owing to its computational complexity. The morphology-based algorithm has the lowest accuracy and execution speed, but it has a simple implementation. Table 1 compares the discussed methods based on four basic factors.
Table 1. Comparing LPR algorithms based on four basic factors

Method               | Implementation complexity | Sensitive to environmental conditions | Edge detecting | Computational time
Dynamic Programming  | High                      | Low                                   | No             | Low
Hough + Contour      | Medium                    | Low                                   | Yes            | High
Gabor transform      | Low                       | High                                  | Yes            | High
Morphology           | Low                       | High                                  | Yes            | High
4 Conclusions
Four LPR techniques, namely dynamic programming, Hough transform, Gabor transform, and morphology-based detection, have been discussed in this paper. Their performance and accuracy have been compared to each other. The dynamic programming algorithm is the fastest and the Gabor transform is the most accurate algorithm. In future work we will consider more LPR techniques for evaluation.
References
1. Duan, T.D., Hong Du, T.L., Hoang, T.V.: Building an Automatic Vehicle License Plate Recognition System. In: International Conference on Computer Science, Research, Innovation, and Vision for the Future (2005)
2. Kahraman, F., Kurt, B., Gokmen, M.: License Plate Character Segmentation Based on the Gabor Transform and Vector Quantization. In: International Symposium on Computer and Information Sciences (2003)
3. Dlagnekov, L.: License Plate Detection Using AdaBoost. Computer Science Engineering Department, University of California, San Diego, La Jolla (March 2004)
4. Kang, D.J.: Dynamic Programming-based Method for Extraction of License Plate Numbers of Speeding Vehicles on the Highway. International Journal of Automotive Technology 10(2), 205–210 (2009)
5. Kasaei, S.H., Kasaei, S.M., Kasaei, S.A.: New Morphology-Based Method for Robust Iranian Car Plate Detection and Recognition. International Journal of Computer Theory and Engineering 2(2) (April 2010)
6. Martin, F., Garcia, M., Alba, J.L.: New Methods for Automatic Reading of VLP's (Vehicle License Plates). In: International Conference on Signal Processing Pattern Recognition and Applications (2002)
7. Zhang, H., Jia, W., He, X., Wu, Q.: A Fast Algorithm for License Plate Detection in Various Conditions. In: IEEE International Conference on Systems, Man, and Cybernetics (October 2006)
8. Arth, A., Limberger, F., Bischof, H.: Real-Time License Plate Recognition on an Embedded DSP-Platform. In: International Conference on Computer Vision and Pattern Recognition (2007)
9. Anagnostopoulos, C.N., Anagnostopoulos, I., Loumos, V., Kayafas, E.: A License Plate Recognition Algorithm for Intelligent Transportation System Applications. IEEE Transactions on Intelligent Transportation Systems 3, 377–392 (2006)
10. Broumandnia, A., Fathy, M.: Application of Pattern Recognition for Farsi License Plate Recognition. International Journal of Graphics Vision and Image Processing 5, 25–31 (2005)
A Tag-Like, Linked Navigation Approach for Retrieval and Discovery of Desktop Documents Gontlafetse Mosweunyane1, Leslie Carr2, and Nicholas Gibbins2 1
Department of Computer Science, University of Botswana, P/Bag 0074, Gaborone, Botswana, [email protected] 2 School of Electronics and Computer Science, University of Southampton, Highfield, Southampton SO 17 1BJ, United Kingdom, {lac,nmg}@ecs.soton.ac.uk
Abstract. Computer systems provide users with abilities to create, organize, store and access information. Most of this information is in the form of documents in files organized in the hierarchical folder structures provided by operating systems. Operating system-provided access is mainly through structure-guided navigation, and through keyword search. An investigation with regard to access and utilization of these documents revealed a need to reconsider these navigation methods. An improved method of access to these documents is proposed based on previous effective metadata use in search system-retrieval and annotation systems. The underlying organization is based on a model for navigation whereby documents are represented using index terms and associations between them exposed to create a linked, similarity-based navigation structure. Evaluation of an interface instantiating this approach suggests it can reduce the user’s cognitive load and enable efficient and effective retrieval while also providing cues for discovery and recognition of associations between documents. Keywords: Linked navigation, Tags, Metadata, personal information management, retrieval.
1 Background
1.1 Introduction
Information overload, especially with information in digital form, is a widely recognized and experienced phenomenon [1], as well as being a well-discussed issue in research. Users of computer systems have access to and create large amounts of information. Storage has become more affordable [2], resulting in an increase in the storage capacity of devices and allowing individuals to store more data in what has been termed "personal archives" [3]. Most of this data is in the form of documents in files organized in hierarchies on the user's computer system. Gemmell et al. [4] predicted that by now terabyte hard drives, with the capacity to store 2900 1MB documents per day for a year, would be
common and inexpensive. This is now the case with even capacity up to 2TB reported, although desktop hard drives are rarely above 500MB1. This paper proposes an automatic generation of tag-like keywords to aid linked navigation and discovery of documents stored in the hierarchical folder structure provided by operating systems. This method was hypothesized to improve accessibility of documents through exposure of the tag-like keywords from document properties to provide shorter and precise paths to and connections between documents. 1.2 Current Methods of Browsing Documents and Problems Traditional operating systems employ the use of the desktop metaphor for organizing information, where the digital desktop is managed as the physical one, making desktops behave in a more physically realistic manner. The monitor of a computer represents the user's desktop upon which documents, and folders containing documents, can be placed. A document can be opened into a window, which represents a paper copy of the document. Files can also be spatially arranged on the desktop individually or in groups in different sections of the screen. A large part of managing documents involves organizing them, and this is done mostly by using hierarchical structures provided by operating systems [16]. Systems based on the desktop metaphor allow for information to be stored in documents in files that can be named and placed in folders that can be nested to form hierarchical structures. This storage model is used on the most pervasive computer systems such as Microsoft Windows and the Apple Macintosh and is the only work environment known to many users and designers [17]. The hierarchical file system model matches the underlying data storage on devices and was initially used to provide efficient access to files on disk [7]. Operating system-provided access to information stored using this model is mainly through navigation guided by the structure which is based on the location of the files, and through keyword search. The operating systems also help users to create personalized views of the search. Tags or keywords attached to files help the user and search systems locate more relevant items. For example, in Windows Vista search results can be organized based on file properties like filenames, file types, author, or descriptive keywords (“tags”) that the user added to the files. The files can also be arranged by type, for example, documents, spreadsheets, or presentations. Newer operating systems such as Windows 7 (Microsoft Corporation, 2006) and Apple Macintosh OS X (Apple Inc) have incorporated advanced search technologies in their systems that allow users to sort or group results according to their needs. Both operating systems provide powerful search mechanisms based on indexed file metadata and contents, and methods to dynamically organize the results according to the file attributes. The Finder and Spotlight in Mac OS X and the Windows 7 Start Menu search box provide instant access to both documents and applications. These system-provided methods have limited abilities to organize files spatially, temporally and logically [18]. The hierarchical file system method of organization 1
http://en.wikipedia.org/wiki/Hard_disk_drive#cite_note-2TB-15
provides simple and intuitive navigation of the whole file system [17], but it has also proved to be mainly static, and presents problems in categorizing, finding items later and reminding users of what items they have [19] among other problems. 1.3 Document Properties as the Solution Metadata has been recognized as an important part of document organization systems and search systems, providing further information on documents for organization purposes or being utilized to enhance search results. Metadata about documents, in particular, has been used to help users understand more aspects of the documents, by search systems to improve search results and by classification systems to categorize documents. For desktop documents metadata comprises file attributes that are supported either as part of the filesystem, known as built-in file properties, or as an additional feature that allows users to define and associate files with metadata outside the filesystem (these are known as extended file attributes). In Windows systems the built-in file properties include user-defined values like filename, author, keywords and comments, and system-controlled values such as creation date, last save time and number of pages. The built-in file properties in Windows Systems are incomplete, with some fields empty, but this metadata, however minimal, can provide useful index terms which are useful for retrieval and discovery of documents as will be presented later in this paper. By saving documents in folder hierarchies users are explicitly categorizing and linking documents. This view of folders as conceptual categories and document attributes has been expressed in literature before. By creating the hierarchy users are already specifying tags or annotations for the documents which can be utilized as document index terms to provide a shorter and more precise path to the documents. These, together with other file attributes, have been seen to play an important role in helping users view documents in their hierarchies. File attributes which include the folder hierarchy (paths) already define their context and form a comprehensive framework to the hierarchy which can be exploited to provide exploration lines for reminding and helping users discover information in their personal document archives.
2 Methodology – A Model for Desktop Documents Retrieval and Association

2.1 Metadata Harvesting and Specification

We extract all properties for commonly used document types on the desktop using Windows Management Instrumentation (WMI) commands. The commands require technologies such as the Component Object Model (COM) to enable a program to interact with the Windows operating system through languages like Microsoft Visual Basic, C++ and Windows scripting languages that can handle Microsoft ActiveX objects, such as VBScript. These are used to crawl the hard drive within a given root, or a given top-level directory, recursively exploring the sub-folders and files contained
therein in turn. For each file, the built-in properties are extracted in the form of property-value pairs. The commonly used document types are Microsoft Office documents (Word, Excel and PowerPoint), html, htm and Portable Document Format (pdf). Visual Basic Scripting Edition (VBScript) is used as a lightweight approach to drive the WMI classes on Windows systems, with the advantage that it can use the Windows Scripting Host to run directly on a user's computer. An example of some of the commands used for crawling a folder is given below.

strFolderName = CreateObject("Scripting.FileSystemObject").GetAbsolutePathName(".")
Set colSubfolders = objWMIService.ExecQuery _
    ("Associators of {Win32_Directory.Name='" & strFolderName & "'} " _
    & "Where AssocClass = Win32_Subdirectory " _
    & "ResultRole = PartComponent")
arrFolderPath = Split(strFolderName, "\")
strNewPath = ""
For i = 1 To UBound(arrFolderPath)
    strNewPath = strNewPath & "\\" & arrFolderPath(i)
Next
strPath = strNewPath & "\\"

Similar Perl commands are used on Macintosh and Linux systems. Although the Perl script could also be used on Windows, this would require a Perl interpreter to be installed, which is not ideal when dealing with test users; Perl comes packaged with Linux and Mac systems, while VBScript is included in Windows systems, hence the decision to use both. To make it easy to identify the files, properties and values extracted and to relate them, the metadata is structured in Semantic Web form (using the Resource Description Framework, RDF) and based on an ontology. The ontology is designed to define the domain of desktop files in order to represent metadata about files on the desktop in a concise and identifiable way. Some terms are reused from standardized ontologies like Dublin Core and languages such as RDF Schema (RDFS) for defining the metadata, and the ontology specifies the additional terms that could not be found in the commonly existing schemas. The ontology was defined using SWOOP version 2.2.1 [20], a tool for creating and editing Web Ontology Language (OWL) ontologies. Uniform Resource Identifiers (URIs) for files were defined following the specification RFC 1738 [21], which describes Uniform Resource Locators (URLs), strings that allow for the location and access of resources via the Internet. A URL is of the form
file://<host>/<path>
where host is the fully qualified domain name of the system on which the path is accessible, and path is a hierarchical directory path. The URI provides a direct link from the metadata, and therefore from the application implemented, to the document. The overview of the ontology is given in Figure 1 and an example RDF description for a file is given in Figure 2.
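As an illustration of this URI scheme, the short Python sketch below maps a local path to a file URI. It is our own example rather than the paper's tooling (which is VBScript and Perl), and note that Path.as_uri() produces a host-less form into which a fully qualified host name could be spliced if needed.

# Illustrative sketch (not from the paper): mapping a local file path to an
# RFC 1738-style file:// URI so each document can be identified and linked.
from pathlib import Path

def file_uri(path: str) -> str:
    # as_uri() yields a host-less form such as file:///home/user/report.doc
    return Path(path).resolve().as_uri()

print(file_uri("./papers/report.doc"))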
Fig. 1. Overview of the File Ontology

2.2 Index Derivation and Clustering

A lightweight model that utilizes the documents' attributes is proposed to facilitate the retrieval and discovery of documents from file system hierarchies. The model is based on the metadata derived from the file system as above, and on the determination of similarity between documents. In hypertext and hyperindexing, indexes serve as aids that facilitate the location of objects [8]; if ordered, index entries can facilitate the quick location of relevant entries. The semantic metadata is processed to build a forward index for each document by deriving terms from the metadata values, which then serve as index terms or keywords in document recognition, retrieval and clustering. The terms are in fact any text fragments recognized by the algorithm as single words; they might only make sense to their creator, as they may be "codes" used, for example, to describe files and folders. The algorithm for building the indexes for each document or file is as follows.
First, the metadata is queried for all property values relevant to a given document. The extracted property values are all treated the same, irrespective of which property they come from. During extraction, some pre-processing is carried out to select useful terms from the multi-valued attribute values. This is done by removing stopwords (for example words like "and", "the", "is"), removing punctuation (replaced with spaces to separate terms) and converting all values to lower case to facilitate easy comparison. Stemming, which is usually done in indexing to eliminate the variation caused by different grammatical forms of the same word (for example, "research" and "researched"), is not carried out, to avoid interfering with the user's intended meaning of concepts. This is mainly because the terms are also presented for viewing by the user.
Fig. 2. RDF Description for an example Word document
The string value is then divided into terms according to spaces. The derived terms are stored as a document index set: Doc URI → {t1, t2, ..., tn}. All the documents' forward indexes are then used to compile a unified metadata index for the whole document set after parsing and term recognition. The indexes are further enhanced by the lookup, selection and addition of synonyms from WordNet, a
large-scale English lexical database developed at Princeton University [9]. This is done by matching then grouping semantically related words that are already in the metadata using WordNet synsets, thereby implicitly clustering documents with similar keywords. The process is shown in Figure 3.
[Figure 3 shows the index derivation process as a pipeline: attributes gathering → semantic metadata generation (using the file ontology) → semantic metadata parsing (RDF parser) → metadata database → querying of document metadata values → term recognition and cleanup → unified metadata index generation (using the WordNet database) → document term indexes and the unified metadata index.]

Fig. 3. Index Derivation Process
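The following is a minimal Python sketch of the term extraction and index building described above. The function names, the tiny stopword list and the data layout are ours rather than the authors', and the WordNet synonym grouping is only indicated by a comment to keep the example self-contained.

# Illustrative sketch of the index derivation pipeline (names are ours).
import re

STOPWORDS = {"and", "the", "is", "of", "a", "to", "in"}  # small example list

def extract_terms(property_values):
    """Split metadata values into index terms: strip punctuation,
    lower-case everything and drop stopwords (no stemming)."""
    terms = set()
    for value in property_values:
        cleaned = re.sub(r"[^\w\s]", " ", str(value)).lower()
        terms.update(t for t in cleaned.split() if t and t not in STOPWORDS)
    return terms

def build_indexes(documents):
    """documents: dict mapping a file URI to its list of property values.
    Returns per-document forward indexes and a unified term -> documents index."""
    forward, unified = {}, {}
    for uri, values in documents.items():
        forward[uri] = extract_terms(values)
        for term in forward[uri]:
            unified.setdefault(term, set()).add(uri)
            # Terms could additionally be grouped with their WordNet synonyms
            # (synsets) here, implicitly clustering documents with related keywords.
    return forward, unified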
2.3 Term-Based Documents' Similarity

Retrieval and hypermedia systems employ some kind of similarity measurement to determine, and present or integrate, related items together. Document similarity has most often been measured using the vector space model, a method that makes use of weighted sets of terms for documents to compute their similarity. Because the terms we extract include those from the hierarchy structure, where the folder names form a structure showing relations by location between the documents, distance-based semantic relatedness measures [30] may be applicable, even if only as part of the whole solution. These are based on counting the number of edges between concepts (edges can also be assigned a weight based on the depth or density of a node, or the type or strength of a link), with a shorter distance signifying more relatedness. These methods were not used in this research since they are already covered by considering the commonality of terms in the above approach (documents in the same
directory are more likely to share more terms), and the fact that the mental model behind the assignment of documents to folders has not been studied well enough, and is not clearly enough defined, to warrant assigning weights to the terms themselves. A document-document similarity matrix is then built after the index is created, based on pair-wise comparisons of terms in the documents' forward indexes. Each row (document) is extracted and ordered by cell value on a similarity request (selecting a document, thereby requesting more details) in the interface during browsing. Given Dx as the set of terms for document x, and Dy as those for document y, S(Dx, Dy) represents the similarity between document x and document y. The whole matrix is initialized to zero before the comparison starts.

Dx = {Tx1, Tx2, ..., Txn}
Dy = {Ty1, Ty2, ..., Tym}
For all i, j: if Txi is like Tyj then S(Dx, Dy) = S(Dx, Dy) + 1

That is,

S(Dx, Dy) = Σ_{Txi ∈ Dx} Σ_{Tyj ∈ Dy} [Txi like Tyj]        (1)
The comparison is described as likeness, or an approximation of similarity (rather than equality), between terms, since partial matching based on stemming and synonymy is taken into account. The partial matching is also a result of the data cleanup done during term extraction (removal of punctuation and stopwords, and conversion to lower case). With this approach, document Di is more similar to document Dj than to document Dk if S(Di, Dj) > S(Di, Dk), that is, if there are more common terms between Di and Dj than between Di and Dk.
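A minimal sketch of the pairwise comparison behind Eq. (1) is given below. Plain term equality stands in for the paper's "likeness" test (which also accepts stems and synonyms), so this is a simplification, and all names are ours.

# Illustrative sketch of the document-document similarity matrix (names are ours).
def similarity(dx: set, dy: set) -> int:
    """Number of term pairs (one from each document) that match."""
    return sum(1 for tx in dx for ty in dy if tx == ty)

def similarity_matrix(forward_index: dict) -> dict:
    """forward_index: file URI -> set of terms. Returns S[(di, dj)] counts."""
    uris = list(forward_index)
    return {(a, b): similarity(forward_index[a], forward_index[b])
            for a in uris for b in uris if a != b}

# On a similarity request for a document d, its row S[(d, .)] is sorted in
# descending order so that the most related documents are listed first.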
3 The Desktop File Browser

The model is then used as an underlying organization to develop an interface supporting the retrieval and discovery of documents. The interface implemented, the Desktop File Browser (DeFiBro), offers a browsing interface for the retrieval of desktop documents. The following is a description of important aspects of the interface shown in Figure 4.
• Overview - Provision of some sort of overview of a document collection is essential, and facets have already been seen to provide good overviews [10, 15]. The interface implemented provides two kinds of overviews: an alphabetic and numeric index of document names, and overlapping facets in the form of authors, organisations and keywords (terms) derived from all the metadata values.
• Context - Initially the context of documents in the filesystem was the hierarchy structure itself. In our solution the folder structure has been "flattened": the layout and positioning of documents in the structure have been relinquished in favor of a flatter and more visible, linked keyword structure. Within the facets provided, the user can select items of interest to view clusters of documents based on the facet values. Context for a selected document is provided by related details consisting of linked metadata terms (to get documents to select one another based on similarity of terms) and documents similar (associated) to the selection. In addition to providing context, the linked metadata and similar documents add additional dimensions for navigation within the facet-based overview.
• Details - File thumbnail representations have been found to be effective for locating and organizing documents in user interface studies dealing with documents [11, 12, 13, 14]. At a glance the user can see whether a document is a picture, a table or plain text, and can quickly decide whether there is a need to investigate further by opening the document or whether to check other possible links. Thumbnails can be especially helpful if the contents are clearly visible and the text is readable, as provided in the implemented interface. A preview of the first page of the document, in the form of an enlarged thumbnail, is given on hovering the mouse pointer over a listed document thumbnail.
• Extract and Transfer - A 'workspace' panel is provided to enable the selection and extraction of desired documents with their metadata. This feature is implemented as a solution to user needs involving the difficulty of working with documents across several folders, and to address the need to create desired groupings for immediate work purposes that might not be satisfied by the similarity clustering implemented.
Fig. 4. Selecting a document shows linked keywords, a preview and related documents
The user selects a facet to browse (Author, Keyword, Organisation) and the values of the selected facet are shown in the top-most left-hand panel in Figure 4. When an item in this panel is selected, a list of documents matching the criteria satisfied by the selected value is provided in the Documents panel (second column from the left).
Browsing by filename presents a submenu where the first character of the file name can be selected and the corresponding filenames are shown in the Documents panel. Pointing at a document name or its associated icon reveals its preview in the preview panel; selecting it reveals the keywords extracted from its metadata in the keywords panel, and related documents are shown in the panel below that. These keywords are linked, that is, one can select a keyword to display the documents whose criteria it satisfies. While browsing, the user can collect documents of interest into a collection "basket", the workspace panel. Documents in this panel can be previewed and selected just like documents in the Documents and Related documents panels. Documents can also be opened in their respective applications by simply double-clicking on the document, as on the desktop.
4 Evaluation – Hierarchy Navigation vs. Linked-Based Navigation

The implemented system endeavoured to demonstrate the importance of metadata and of flattening hierarchies in assisting users to locate, associate and discover documents while browsing their personal file hierarchies. To verify that users can indeed benefit from these, we performed an experiment involving hands-on sessions in which users carried out document-locating tasks. To demonstrate the need to apply these concepts to improve operating system-assisted browsing of document hierarchies created by users on the desktop, we based these tasks on two environments: the Windows visualization method and our implementation, DeFiBro. With the Windows visualization method there are two options, Windows Explorer or the simple zoomable visualization interface that involves clicking on folders to expand and view the files and folders contained therein; both have been found to have similar performance in locating tasks involving either familiar or unfamiliar hierarchies [22]. This was done in order to compare the two and to clarify the benefits of our approach through our implementation, while at the same time advocating for the improvement of document browsing on the desktop. At the end of the tasks, users were asked to answer a few questions based on a five-point Likert scale, as well as qualitative questions about how the two systems perform in comparison to each other. The research is mainly concerned with the browsing or navigation of personal information structures, in particular desktop document hierarchies, during knowledge tasks for the retrieval of information sources (documents). Kelly [27] recommends using naturalistic approaches that allow people to perform Personal Information Management (PIM) behaviors in familiar environments with familiar tools and collections. The document browser implemented would therefore have to be evaluated within the context of personal document hierarchies, in settings that match real user settings and tasks as closely as possible. Owing to the difficulty of using the users' own hierarchies (confidentiality issues and the difficulty of devising a task from personal information), a test file hierarchy (one of the authors' own) was used. Users were allowed to familiarize themselves with the hierarchy before the commencement of the tasks. The tasks involved locating documents in the given test hierarchy using DeFiBro and the Windows visualization alternately, given either the file name or a description of the document(s). The documents were located at levels 2, 3 and 4, the most popular levels at which documents are stored, as established by a previous user study.
The times the test users took to complete each task in the two environments were recorded in seconds and the averages computed. A summary of the results is shown in Figure 5. Users reported the positive aspects of DeFiBro as being user-friendly, offering browsing by association and "searching" without providing search terms, as well as helping out in a "badly organized" situation and providing features that enable more effective browsing, like previews. Suggestions for improvement by test users were dominated by the need to move quickly within the results (documents found by browsing, keywords, organizations, authors) through automatic selection or shortcuts based on keyboard characters. Other suggestions were for the cosmetic appearance of the interface to be improved by converting it to a more graphical one, showing attribute-value pairs instead of only attributes, and providing other attributes to locate files, such as date. About half of the users indicated that they would have liked the system to be somehow connected to the file hierarchy, to provide a way to get to the original location of the file. One user commented "...because in the first place I was trying to locate the file", emphasizing the need for such a system to be intertwined with the original way the files were organized, and possibly the sense of attachment users have to their "organized" file structure.
Fig. 5. Average Times for Windows Visualization and DeFiBro for all tasks
5 Other Similar Works – Semantic Information Management Tools

5.1 Introduction

Similar tools using semantic information have been developed for integrating information items on the desktop, including connections to reach documents and the ability for the user to manage and explore the connections between the items. The main aim of these integration tools is to solve the problem caused by the fragmentation of information across different applications and devices by bringing it together in one interface.
5.2 SEMEX

SEMEX (SEMantic EXplorer) is a personal information system offering search-by-association [23, 24]. Information browsing is provided through an underlying ontology which can be personalized by users. Users can browse their personal information by semantically meaningful associations created previously to allow for easier later integration by the user. SEMEX provides a generic domain model of classes and associations between them and uses this to organize data. This model can also be extended by users, for example through their browsing pattern. A database of objects and the associations between them is represented as RDF, and this is stored and retrieved using Jena. Lucene is used to index object instances by the text in their attribute values. The database supports "on-the-fly" integration of personal and public data by keeping the associations and previous activities that the user performed. Users can then browse association links or perform keyword search, selection-query search or association-query search. When executing a query, SEMEX also tries to deduce other objects that are related to the matches found but not necessarily specified in the query. Heterogeneous data is managed, and many different references to the same real-world object are reconciled.

5.3 Haystack

Haystack [25, 26] is a Java-based, open-source system that aims to cater for different users' preferences and needs by giving them control and flexibility over storage and retrieval. It caters for the storage and manipulation of task-oriented information sources like email, calendar and contacts. Users can define and view connections between their personal data. A uniform resource identifier (URI) is used to name all individual information objects of interest to the user; these can then be annotated, linked to other objects, viewed and retrieved [26]. RDF is used to represent the data and to record the relationships between the objects. The data is extracted from applications and stored in an in-memory database. Capabilities are provided for users to browse their personal information in one location, such that information from different applications, such as email, address book, document hierarchies and the web, is brought together in a single view. The user can also add properties to capture any attributes of, or relationships between, the information. These properties can be used as query arguments, for metadata-based browsing, or as relational links to support associative browsing as on the World Wide Web. A search facility is also offered as an alternative to the task-specific starting points provided. Multiple views of the same object are offered to allow the user to use an appropriate view based on their task. Views in the system can also be customised using view prescriptions, which are collections of RDF statements describing how a display region should be divided up and which constants and related objects should be shown in the subdivisions. Items can be grouped into collections, and views like the calendar view and menu view are provided especially for these. In addition, the lens view is provided to allow customization of the presentation of objects, for example to show certain properties. The user can view email messages and select people to view data related to them.
5.4 Gnowsis

The Gnowsis system [5, 6] is a semantic desktop prototype which aims to integrate desktop applications and the data managed on desktop computers using Semantic Web technology. Desktop resources are treated as Semantic Web resources. A data integration framework is employed to extract information on the fly from common applications. The data and the relationships between resources are then represented as RDF. Semantic Web interfaces are added to common desktop applications, allowing users to browse their desktop like a small personal Semantic Web. To relate information to the user's personal view of the world, the Personal Information Model (PIMO) approach is used. The PIMO framework is made up of six components. PIMO Basic defines the basic language constructs and the superclass "Thing" of the other classes. A domain-independent ontology containing subclasses of Thing is defined in PIMO Upper, while PIMO Mid integrates various domain ontologies and provides classes for Person, Project, Company, etc. The domain model component describes a concrete domain of interest of the user. The user can also extend the above-mentioned models for personal use in PIMO User. Gnowsis now tries to incorporate Web 2.0 features into the desktop by having users import their tags from tagging websites such as del.icio.us and Flickr and integrate them into their PIMO ontology [5].

5.5 Differences with this Research

These systems differ from our approach in that they focus on integrating documents as part of a bigger goal of integrating resources such as email, documents, web resources and desktop applications, and as such they only help when one is looking for associated information items. Our idea, on the other hand, is to integrate documents and improve the navigation of desktop documents. They are, however, similar in that they utilize attribute values to build indexes and use them for browsing associations.
6 Conclusions and Future Work

The implementation utilizes terms that represent document identity and context and that might be useful for recognizing and retrieving documents. The interlinked nature of this information is presented to end users to utilize in browsing documents. The browsing structure provided is expected to integrate data (documents) in a different manner from what users are used to, and to provide an interface which exposes data that would otherwise have been "hidden" in a person's data collection. The navigation structure provided through the interface was evaluated against the system-provided navigation of the file system on Microsoft Windows, using locating and associating tasks. The efficiency of the two environments is measured by their performance in relation to the users' speed and the effort expended in locating the documents in the given tasks; the time taken to locate documents is used as a measure of these attributes. The results show that DeFiBro performs reasonably better than Windows file browsing for tasks involving locating files at deeper levels (levels 3 and 4 in the hierarchy) than at shallow levels (levels 1 and 2). The approach adopted for the evaluation has mainly been concerned with whether and
how the system meets the user's needs, which is the approach usually adopted in information-seeking evaluations [28]. Although time was used as the basis for evaluation, it may not accurately reflect the actual benefit of the exploratory interface, as it has been observed before that a longer time might indicate more beneficial browsing in terms of discovering relevant material [29]. An extended user evaluation with the users' own hierarchies and follow-up over a longer time might therefore be more appropriate. In addition, other evaluation factors, such as the simplicity and efficiency of the Windows system and of the implemented system, have to be considered to make a fully informed decision on which system is better. Further work includes the use of indexes already created by operating systems and their applications, the use of extended attributes, and the use of the indexes to refine Google results.
References 1. Edmunds, A., Morris, A.: The Problem of Information Overload in Business Organizations: A review of the Literature. Intl. J. of Info. Mangt. 20(1), 17–28 (2000) 2. Dong, X., Halevy, A.: A platform for Personal Information Management and Integration. In: Conference on Innovative Data Systems, January 4-7, pp. 119–130 (2005) 3. Kelly, L.: Context and Linking in Retrieval from Personal Digital Archives. In: ACM SIGIR Conference on Research and Development in IR, pp. 899–899 (2008) 4. Gemmell, J., Bell, G., Lueder, R., Drucker, S., Wong, C.: MyLifeBits: Fulfilling the Memex Vision. In: ACM Multimedia 2002, Juan Lens Pins, France, December 1-6 (2002) 5. Sauermann, L., Grimmes, G.A., Kiesel, M., Fluit, C., Maus, H., Heim, D., Nadeem, D., Horak, B., Dengel, A.: Semantic desktop 2.0: The Gnowsis Experience. In: 5th International Semantic Web Conference, Athens, GA, USA (2006) 6. Sauermann, L., Schwarz, S.: Introducing the Gnowsis Semantic Desktop. In: International Semantic Web Conference (2004) 7. Henderson, S.: How do People Organize their Desktops? In: CHI 2004 Extended Abstracts on Human Factors in Computing Systems, pp. 1047–1048 (2004) 8. Bruza, P., van der Weide, T.: Two Level Hypermedia: An Improved Architecture for Hypertext. In: Tjoa, A., Wagner, R. (eds.) Proc. Database and Expert System Applications Conference, pp. 76–83. Springer, Heidelberg (1990) 9. Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.J.: Intro to WordNet: An On-line Lexical Database. Intl. J. of Lexicography 3(4), 235–244 (1990) 10. Kobilarov, G., Dickinson, I.: Humboldt: Exploring Linked Data. In: Workshop (Linked Data on the Web 2008) at WWW (April 2008) 11. Faichney, J., Gonzalez, R.: Goldleaf Hierarchical Document Browser. In: User Interface Conference Proceedings, Second Australasian, pp. 13–20 (2001) 12. Lucas, P., Schneider, L.: Workscape: A Scriptable Document Management Environment. In: ACM CH1 1994 Conference Companion (1994) 13. Mander, R., Salomon, G., Wong, Y.Y.: A ‘Pile’ Metaphor for Supporting Casual Organization of Information. In: ACM CHI 1992, pp. 627–634 (1992) 14. Robertson, G., Czerwinski, M., Larson, K., Robbins, D.C., Thiel, D., van Dantzich, M.: Data Mountain: Using Spatial Memory for Document Management. In: ACM UIST 1998, pp. 153–162 (1998)
15. Henderson, S.: Genre, Task, Topic and Time: Facets of Personal Digital Document Management. In: Proceedings of the 6th ACM SIGCHI New Zealand Chapter’s International Conference on Computer-Human Interaction: Making CHI Natural CHINZ 2005, Auckland, New Zealand, pp. 75–82 (2005) 16. Ravasio, P., Schar, G., Krueger, H.: In Pursuit of Desktop Evolution: User Problems and Practices with Modern Desktop Systems. ACM Transactions on Computer-Human Interaction 11(2), 156–180 (2004) 17. Introduction: The Desktop Metaphor and New Uses of Technology. In: Kaptelinin, V., Czerwinski, M. (eds.) Beyond the Desktop Metaphor: Designing Integrated Digital Work Environments, pp. 19–48. MIT Press, Cambridge (2007) 18. Henderson, S.: How Do People Organize their Desktops? In: CHI 2004 Extended Abstracts on Human Factors in Computing Systems, pp. 1047–1048 (2004) 19. Beyond Lifestreams: The Inevitable Demise of the Desktop Metaphor. In: Freeman, E., Gelernter, D. (eds.) Beyond the Desktop Metaphor: Designing Integrated Digital Work Environments, pp. 19–48. MIT Press, Cambridge (2007) 20. MINDSWAP Research Group, SWOOP - A Hypermedia-Based Featherweight OWL Ontology Editor, http://www.mindswap.org/2004/SWOOP/ 21. Berners-Lee T., Masinter L., McCahill M., (eds).: Uniform Resource Locators, http://www.ietf.org/rfc/rfc1738.txt 22. Golemati, M., Katifori, A., Giannopoulou, E.G., Daradimos, I., Vassilakis, C.: Evaluating the Significance of the Windows Explorer Visualization in Personal Information Management Browsing Tasks. In: 11th International Conference Information Visualization, pp. 93–100 (2007) 23. Dong X., Halevy A. Y., Nemes E., Sigundsson S.B., Domingos P. SEMEX: Toward Onthe-Fly Personal Information Integration. In: Workshop on Information Integration on the Web (2004) 24. Cai, Y., Dong, X.L., Halevy, A., Liu, J.M., Madhavan, J.: Personal Information Management with SEMEX. In: SIGMOD 2005, pp. 921–923 (2005) 25. Karger, D.R., Jones, W.: Data Unification in Personal Information Management. Communications of the ACM 49(1), 77–82 (2006) 26. Karger, D.R., Bakshi, K., Huynh, D., Quan, D., Sinha, V.: Haystack: A Customisable General-Purpose Information Management Tool for End Users of Semistructured Data. In: Proceedings of the 2003 CIDR Conference (2003) 27. Kelly, D.: Evaluating Personal Information Management Behaviors and Tools. Communications of the ACM 49(1), 84–86 (2006) 28. Kules, W., Wilson, M.L., Schraefel, M.C., Shneiderman, B.: From Keyword Search to Exploration: How Result Visualization Aids Discovery on the Web. Technical Report 1516920080208, School of Electronics and Computer Science, University of Southampton (2008) 29. Capra, R., Marchionini Oh, G.J., Stutzman, F., Zhang, Y.: Effects of Structure and Interaction Style on Distinct Search Tasks. In: Proceedings of the 7th ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 442–451 (2007) 30. Cross, V.: Tversky’s Parameterized Similarity Ratio Model: A Basis for Semantic Relatedness. In: Annual Meeting of the North American Fuzzy Information Processing Society, NAFIPS 2006, pp. 541–546 (2006)
Heuristic Approach to Solve Feature Selection Problem Rana Forsati1, Alireza Moayedikia1, and Bahareh Safarkhani2 1 Department of Computer Engineering, Islamic Azad University, Karaj Branch, Karaj, Iran 2 Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran [email protected], [email protected], [email protected]
Abstract. One of the successful methods in classification problems is feature selection. Feature selection algorithms try to classify an instance using a lower-dimensional representation, instead of the huge number of original features, with high and acceptable accuracy. In fact, an instance may contain useless features which might lead to misclassification; an appropriate feature selection method tries to increase the effect of significant features while ignoring the insignificant subset of features. In this work feature selection is formulated as an optimization problem and a novel feature selection procedure is proposed in order to achieve better classification results. Experiments on a standard benchmark demonstrate that applying harmony search in the context of feature selection is a feasible approach and improves the classification results.

Keywords: Feature Selection, Meta-heuristic Optimization, Harmony Search.
1 Introduction

The process of selecting the best subset of d features from a set of D features so as to maximize the classification accuracy is known as feature selection. Features which are useless, redundant, or of the least possible use are likely to be removed. It is well known that, for a problem of nontrivial size, finding the optimal solution is computationally intractable due to the resulting exponential search space and, hence, all of the available algorithms mostly produce suboptimal solutions [1]. Feature selection is necessary in applications like data mining, machine learning, pattern recognition and signal processing, where datasets have a vast number of features. Feature selection was previously defined as the process of choosing a subset of features from the original set of features forming the patterns in a given dataset [2]. In data mining applications, feature selection plays a vital role as part of the preprocessing step, where the learning of knowledge or patterns is done from a suitable set of extracted features. Feature selection is called a filter approach if the features are selected regardless of the learning algorithm; interested readers can refer to [3], which combines the filter and genetic approaches. If a learning algorithm is used, the method is called a wrapper approach. The problem with the filter approach is that the optimal set of features may not be independent of
learning algorithm or classifier. The wrapper approach provides a better solution, however it is computationally expensive as each candidate feature subset has to be assessed by executing a learning algorithm on that subset. When used with GAs, the wrapper approaches become even more prohibitively expensive [3, 4]. Evolutionary algorithms, which are stochastic methods based on a search model, define a global function and try to optimize its value by traversing the search space. A common factor shared by the evolutionary algorithms is that they combine rules and randomness to imitate some natural phenomena [13]. Evolutionary algorithms such as PSO-based approaches (PSO) [12, 2] and Genetic Algorithms (GA) [1] are general high-level procedures that coordinate simple heuristics and rules to find good approximate solutions for computationally difficult combinatorial optimization problems. These methods have been previously employed to solve the problems of Feature Selection and results showed that these methods are suitable for achieving comparable accuracies [1-4]. In [1] a new feature selection approach based on Genetic algorithms was presented, which a local improvement procedure has been used, in order to improve the accuracy. Another feature selection algorithm was proposed in [5], which demonstrates a feature subset selection method based on the wrapper approach using neural networks (NNs). One of the most important aspects of this algorithm is the automatic determination of NN architectures during the FS process, in other words, the architecture of Neural Network classifier is not user-specified network architecture. Also a constructive methodology is used in this algorithm which consisting of correlation information in selecting features and identifying NN architectures. We can consider this algorithm as constructive approach for FS (CAFS). To encourage the search strategy for selecting more distinctive features correlation information in CAFS is utilized, that increases the accuracy of NNs. Such an encouragement will decline redundancy of information resulting in compact NN architectures. SAGA is another feature selection algorithm was presented in [6]. One of the common problems in feature selection problems, is entrapment in a local optimum that SAGA has solved the this by combining the ability to avoid being trapped in a local minimum of simulated annealing with the very high rate of convergence of the crossover operator of genetic algorithms, the strong local search ability of greedy algorithms and the high computational efficiency of generalized regression neural networks. In their work, algorithm consistently generates better feature subsets compared to existing search algorithms within a predefined time limit and keeps improving the quality of selected subsets as the algorithm runs. (SAGA) considers GRNN as classifiers to evaluate candidate feature subset solutions. In SAGA features are normalized by mapping them between 0 and 1. In fact SAGA is a hybrid of a number of wrapper methods—a SA, a GA, a GRNN and a greedy search algorithm. According to [7] the best two independent features do not have to be the two best. The sequential forward selection method (SFS), Forward selection adds in turn, the most significant attribute from the candidate set one at a time, until the selected set is a reduct. Backward elimination is the reverse, starting with the full attribute set and removing attributes incrementally [8]. 
A variety of validation methods was used in [9] to measure the accuracy of the algorithms. Three versions of an SS-based heuristic were presented: two sequential Scatter Search heuristics, one with a greedy combination strategy (SSS-GC) and one with a reduced greedy combination strategy (SSS-RGC), which differ in terms of their solution combination strategy, and a third, parallel SS method (PSS). The authors developed a Parallel Scatter Search meta-heuristic for solving the feature subset selection problem in classification. Given a set of instances characterized by several features, the classification problem consists of assigning a class to each instance; the feature subset selection problem selects a relevant subset of features from the initial set in order to classify future instances. In other words, two methods were proposed for combining solutions in the Scatter Search meta-heuristic. These methods provide two sequential algorithms that are compared with a recent genetic algorithm and with a parallelization of the Scatter Search, obtained by running the two combination methods simultaneously. The Parallel Scatter Search presents better performance than the sequential algorithms.

Harmony Search (HS) [10], a derivative-free algorithm, is a meta-heuristic optimization method imitating the music improvisation process, where musicians improvise the pitches of their instruments searching for a perfect state of harmony. Since its inception, HS has been vigorously applied to a wide variety of practical optimization problems [15-17]. Several advantages of HS with respect to traditional optimization techniques have been presented in [10]. In optimization problems we want to search the solution space, and with harmony search this can be done more efficiently. Since stochastic optimization approaches are suitable for avoiding convergence to a locally optimal solution, they can be used to find a globally optimal solution, although they typically take a large amount of time to converge. By modeling feature selection as an optimization problem, we investigate the harmony search algorithm for this problem. Accordingly, in this paper a novel framework for using HS in the feature selection problem is presented. To demonstrate its effectiveness, we have applied the proposed algorithm to standard data sets and compared the results with those of other algorithms. The rest of this paper is organized as follows. Section 2 provides an explanation of the harmony search algorithm. Section 3 provides a detailed description of the proposed algorithm. Section 4 presents a performance evaluation of the proposed algorithm and a comparison with other algorithms. Finally, Section 5 summarizes the main conclusions of this work.
2 The Basic Harmony Search Algorithm

Harmony search mimics the music improvisation process [10] and is popular as a meta-heuristic optimization method. In music improvisation, musicians improvise the pitches of their instruments searching for a perfect state of harmony. HS works as follows:

Step 1: Start and initialization of the problem and the HS parameters
Step 2: Harmony memory (HM) initialization
Step 3: New harmony vector (NHV) creation
Step 4: Update of the harmony memory
Step 5: Check of the stopping criterion

These steps are described in the next subsections.
2.1 Initialization of Problem and Algorithm Parameters
In Step 1, the optimization problem is specified as follows:

Minimize f(x)
subject to:
    g_i(x) ≥ 0,        i = 1, 2, ..., M
    h_j(x) = 0,        j = 1, 2, ..., P
    LB_k ≤ x_k ≤ UB_k,  k = 1, 2, ..., N
where f(x) is the objective function, M is the number of inequality constraints and P is the number of equality constraints. In addition, the parameters of the HS are initialized in this step. These parameters include the harmony memory size (HMS), or the number of solution vectors in the harmony memory; the harmony memory considering rate (HMCR); the pitch adjusting rate (PAR); and the number of improvisations (NI), or stopping criterion. The harmony memory (HM) is a memory location where all the solution vectors (sets of decision variables) are stored. This HM is similar to the genetic pool in the GA. The HMCR, which varies between 0 and 1, is the rate of choosing one value from the historical values stored in the HM, while (1 − HMCR) is the probability of randomly selecting one value from the possible range of values.

2.2 Initialization of Harmony Memory

In Step 2, the HM matrix is filled with as many randomly generated solution vectors as the HMS:
HM = | x_1^1        x_2^1        ...  x_{N-1}^1        x_N^1        |
     | x_1^2        x_2^2        ...  x_{N-1}^2        x_N^2        |
     | ...          ...          ...  ...              ...          |
     | x_1^(HMS-1)  x_2^(HMS-1)  ...  x_{N-1}^(HMS-1)  x_N^(HMS-1)  |
     | x_1^(HMS)    x_2^(HMS)    ...  x_{N-1}^(HMS)    x_N^(HMS)    |
The initial harmony memory is generated from a uniform distribution in the ranges [LBi , UBi], where 1 ≤ i ≤ N. This is done as follows:
x_i^j = LB_i + r × (UB_i − LB_i),    i = 1, 2, ..., N,  j = 1, 2, ..., HMS

where r ~ U(0, 1) is drawn from a uniform random number generator.

2.3 Improvising a New Harmony

'Improvisation' refers to generating a new harmony, the so-called new harmony vector (NHV). A new harmony vector x' = (x'_1, x'_2, ..., x'_N) is generated based on three rules: (1) memory consideration, (2) pitch adjustment, and (3) random selection.
In the memory consideration, the value for a decision variable is randomly chosen from the historical values stored in the HM with probability HMCR. Every component obtained by memory consideration is then examined to determine whether it should be pitch-adjusted; this operation uses the PAR parameter, which is the probability of pitch adjustment. Variables which are not selected for memory consideration are randomly chosen from the entire possible range, with a probability equal to (1 − HMCR). If the new harmony vector x' = (x'_1, x'_2, ..., x'_N) has a better fitness value than the worst harmony in the HM, the new harmony is included in the HM and the existing worst harmony is excluded from it.

2.4 Checking the Stopping Criterion

HS is terminated when the stopping criterion (e.g. the maximum number of improvisations) has been met; otherwise, Steps 3 and 4 are repeated. Recently, the superior performance of the HS algorithm has been demonstrated by applying it to different problems. The main reason for this success is the explorative power of HS, which expresses its capability to explore the search space. The evolution of the expected population variance over generations provides a measure of the explorative power of the algorithm; for a theoretical analysis of the exploratory power of HS, the interested reader can refer to [13]. In recent years, some researchers have improved the original harmony search algorithm. In [14] the HS algorithm is improved by using varying parameters. The HMCR and PAR parameters of HS help the method in searching for globally and locally improved solutions, respectively. PAR and bw have a profound effect on the performance of HS, so fine tuning of these two parameters is very important. Between the two, bw is more difficult to tune because it can take any value in (0, ∞). To address these shortcomings, a variant of HS called the Improved Harmony Search (IHS) was proposed [11]. IHS dynamically updates PAR according to the following equation:
PAR(t) = PAR_min + ((PAR_max − PAR_min) / NI) × t        (1)
Where, t is the generation number, PAR(t) is the pitch adjusting rate for generation t, PARmin and PARmax are the minimum adjusting rate and the maximum adjusting rate, respectively. In addition, bw is dynamically updated as follows:
bw(t) = bw_max × exp((ln(bw_min / bw_max) / NI) × t)        (2)
Where, bw(t) is the bandwidth for generation t, bwmin is the minimum bandwidth and bwmax is the maximum bandwidth.
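For illustration, a minimal Python sketch of one HS improvisation step for continuous variables is given below, together with the dynamic PAR and bw updates of Eqs. (1) and (2). The function names and overall structure are ours and are not taken from [10], [11] or [14]; this is a sketch of the technique, not the authors' implementation.

# Illustrative sketch of basic HS improvisation with IHS-style parameter updates.
import math
import random

def par_t(t, NI, par_min, par_max):
    return par_min + (par_max - par_min) * t / NI                  # Eq. (1)

def bw_t(t, NI, bw_min, bw_max):
    return bw_max * math.exp(math.log(bw_min / bw_max) * t / NI)   # Eq. (2)

def improvise(hm, lb, ub, hmcr, par, bw):
    """Build one new harmony vector from the harmony memory `hm` (list of lists)."""
    new = []
    for k in range(len(lb)):
        if random.random() < hmcr:              # memory consideration
            value = random.choice(hm)[k]
            if random.random() < par:           # pitch adjustment
                value += random.uniform(-bw, bw)
        else:                                   # random selection
            value = random.uniform(lb[k], ub[k])
        new.append(min(max(value, lb[k]), ub[k]))
    return new

# In the full loop, the new vector replaces the worst harmony in the HM whenever
# its objective value is better, until NI improvisations have been made.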
3 Proposed Algorithm

In this section we propose our harmony search based feature selection algorithm, namely HSF select, which allows us to formalize feature selection as an optimization problem. HSF select uses random selection and refines the solutions at each iteration. We must first model feature selection as an optimization problem that extracts the optimal features. When a general-purpose optimization meta-heuristic is used for the feature selection problem, a number of important design decisions have to be made; predominantly, these are the problem representation and the objective function. Both of these can have an important effect on optimization performance and thus on the quality of the extracted solutions.

3.1 Solution Encoding

In the feature selection problem, a harmony is represented as a string of D binary digits, in which the values 1 and 0 mean selected and unselected, respectively. For instance, consider Y = {1, 4, 2, 7, 8, 0, 5} as the set of available features; then the harmony 0010100 means that the third and fifth features are selected, i.e. the subset of selected features is X = {2, 8}. The solution space, stored as a solution table (the harmony memory), is represented as follows:
[Figure 1 shows an example solution table: a binary matrix with one row per harmony (candidate feature subset) and one column per feature.]

Fig. 1. Solution representation with harmony memory size of 5
3.2 Initial Population

Initially the HM (the solution space) is randomly initialized and the fitness of each solution is computed. The generation of the random initial population is straightforward, as shown below. The function Rand() generates random double-precision numbers in the interval [0, 1]. If the generated number is lower than a pre-specified threshold, the current component of the solution is assigned 0, and 1 otherwise.
Random initial population procedure:

For (int i = 0; i < Row; i++)
    For (int j = 0; j < Col; j++)
        If (rand.NextDouble() > 0.5)
            HM[i, j] = 1;
        Else
            HM[i, j] = 0;

Here HM[i, j] (harmony memory) represents the jth component of the ith solution, Row is the total number of solutions in the harmony memory (HMS), which in our case is equal to 5, and Col is the total number of available features; its size depends on the number of features in the dataset.
[Figure 2 shows an example of a randomly initialized binary harmony memory produced by this procedure.]

Fig. 2. Initial Memory
3.3 Fitness Evaluation, Replacement and Stopping Condition

The evaluation is straightforward, since a solution represents a selected feature subset X and the evaluation function is clear. The fitness of a solution S is defined as its classification accuracy: Accuracy = (correctly classified samples / total samples) × 100%. To produce a new harmony vector (NHV), first a probability value, NHV_Prob, is randomly created for each component and compared to HMCR, which is a constant. If NHV_Prob is bigger than HMCR, the procedure counts the number of ones and zeros in the ith column of the HM and fills the ith component of the NHV with the most frequent binary value; if NHV_Prob is less than HMCR, 0 or 1 is chosen at random. In addition, PAR is compared against a random probability value in the range [0, 1]; if the randomly generated number is bigger than the PAR value, the component's binary value is inverted. If the generated NHV is superior to the solution with the minimum fitness in the HM, it replaces that solution; if the NHV has the minimum fitness among all of the solutions, it is discarded. The algorithm stops when the number of generations reaches a pre-set maximum T.
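As an illustration of this fitness function, the sketch below computes the accuracy of a 1-NN classifier restricted to the selected features. The use of scikit-learn and NumPy is our assumption rather than necessarily the authors' tooling, and the variable names are illustrative.

# Illustrative sketch: fitness of a binary harmony as 1-NN accuracy (in %).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def fitness(harmony, X_train, y_train, X_test, y_test):
    """harmony: binary vector of length D (1 = feature selected)."""
    mask = np.asarray(harmony, dtype=bool)
    if not mask.any():                  # an empty subset cannot be evaluated
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=1)
    clf.fit(X_train[:, mask], y_train)
    return clf.score(X_test[:, mask], y_test) * 100.0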
Fig. 3. Introduce NHV
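The sketch below illustrates, under the same caveats, how the NHV construction just described can be implemented; the helper names are ours, and the replacement of the worst harmony is shown only briefly.

# Illustrative sketch of NHV creation for the binary feature selection encoding.
import random

def improvise_nhv(hm, hmcr, par):
    """hm: list of binary lists (rows = harmonies, columns = features)."""
    cols = len(hm[0])
    nhv = []
    for j in range(cols):
        if random.random() > hmcr:
            ones = sum(row[j] for row in hm)
            bit = 1 if ones >= len(hm) - ones else 0   # most frequent value in column j
        else:
            bit = random.randint(0, 1)                 # random 0/1
        if random.random() > par:                      # inversion step
            bit = 1 - bit
        nhv.append(bit)
    return nhv

def maybe_replace(hm, fitnesses, nhv, nhv_fitness):
    """Replace the worst harmony if the NHV beats it; otherwise discard the NHV."""
    worst = min(range(len(hm)), key=lambda i: fitnesses[i])
    if nhv_fitness > fitnesses[worst]:
        hm[worst], fitnesses[worst] = nhv, nhv_fitness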
4 Results and Discussion

The data sets in this study were obtained from the UCI Machine Learning Repository. Table 1 gives an overview of the four datasets. If the number of features is between 8 and 19, the sample groups can be considered small; these datasets include Wine, Pima Indian diabetes (PM) and Breast. If the number of features is between 20 and 49, the test groups are medium-scale problems; this includes Ionosphere. Selected feature subsets were classified with the 1-NN method. Table 2 shows that our proposed algorithm outperformed some previous related works. The dynamic parameters were found by fine-tuning, which simply means that the values were changed continuously until the accuracy reached its highest amount at certain values.

Table 1. A brief overview of the datasets
Dataset            Size (test/train)   Number of classes   Number of features   Classifier method
Wine               178 (60/118)        3                   13                   1-NN
Pima Indian (PM)   768 (268/500)       2                   8                    1-NN
Ionosphere         351 (234/117)       2                   34                   1-NN
Breast             699 (181/518)       2                   9                    1-NN
Table 3 shows that the proposed algorithm outperformed the results reported in [12] on some datasets. In the table, k denotes the number of selected features, and below each k the corresponding fitness of the related dataset in [12] is given; the accuracy obtained by RHS and the number of features that produced it are shown in the last two columns of Table 3.
Table 2. Experimental results of the random version [units in %]

Dataset            Par_min   Par_max   HMCR   Fitness   No. features
Breast             0.44      0.99      0.56   98.89     5
Ionosphere         0.41      0.99      0.51   91.45     19
Pima Indian (PM)   0.6       0.99      0.43   75.37     4
Wine               0.01      0.99      0.43   96.66     5
Table 3. A comparison study between the PSO-based algorithm proposed in [12] and RHS on the Breast and Ionosphere datasets [units in %]

             PSO                                                                        RHS
Dataset      k=3           k=4           k=5           k=6           k=7               Fitness   Features
Breast       0.959±0.003   0.962±0.006   0.960±0.004   0.960±0.007   0.960±0.007       98.89     5
Ionosphere   0.861±0.009   0.862±0.008   0.862±0.004   0.850±0.006   0.878±0.014       91.45     19
As is clear from Table 3, the highest accuracy of RHS exceeds the highest accuracy in [12]; on the Breast dataset 5 features were selected, and the same holds for the Ionosphere dataset, where the number of selected features is 19. Table 4 also shows that the random harmony search was able to outperform [1] on the Wine dataset: with RHS we achieved 96.66% accuracy with 5 features, while the other algorithm reached, at best, 95.51% with 5 and 8 selected features, respectively.

Table 4. A comparison study between RHS and the HGA proposed in [1] on the Wine dataset [units in %]
          Il-Seok Oh et al. [1]                    RHS
          Selected features   Optimum              No. features   Fitness
Wine      3                   93.82                5              96.66
          5                   95.51
          8                   95.51
          10                  92.70
Table 5 gives a brief comparison between the SS-based methods proposed in [9] and the RHS algorithm; the number of features for the SS-based methods is reported as an average, hence as a non-integer value.
Table 5. Comparison of the SS-based methods and the proposed RHS algorithm on the Pima Indian diabetes (PM) dataset

           RHS       SSS-GC        SSS-RGC       PSS
Accuracy   75.37     0.679±0.024   0.677±0.024   0.681±0.024
Features   4         4.1±0.99      4.0±0.94      4.2±1.14
5 Conclusion

In this paper we proposed a harmony search method to solve the feature selection problem. Our algorithm is based on the random generation of numbers between 0 and 1. As the results show, this procedure was able to outperform some of the related works on some specific datasets. Chaotic number generation can be considered in future work as an alternative to random number generation between 0 and 1.
References [1] Il-S. Oh, J.S., Lee, B.R.: Moon,: Hybrid Genetic Algorithms for Feature Selection. IEEE Trans. On Pattern Analysis and Machine Intelligence 26(11) (November 2004) [2] Wang, X., Teng, X., Xia, W., Jensen, R.: Feature Selection Based on Rough Sets and Particle Swarm Optimization. Pattern Recognition Letters 24, 459–471 (2007) [3] Lanzi, P.: Fast Feature Selection with Genetic Algorithms,: A Filter Approach. In: Proceeding of IEEE International Conference on Evolutionary Computation, pp. 537–540 (1997) [4] Sikora, R., Piramuthu, S.: Framework for Efficient Feature Selection in Genetic Algorithm-based Data Mining. European Journal of Operational research 180, 723–737 (2007) [5] Kabir, M., Islam, M., Murase, K.: A New Wrapper Feature Selection Approach using Neural Network. Neurocomputing 73, 3273–3283 (2010) [6] Gheyas, I.A., et al.: Feature Subset Selection in Large Dimensionality Domains. Pattern Recognition 43, 5–13 (2010) [7] Cover, T.M.: The Best two Independent Measurements are not the two Best. IEEE Trans. Systems Man Cybern. 4(2), 116–117 (1974) [8] Zhang, H., Sun, G.: Feature Selection using Tabu Search Method. Pattern Recognition 35, 701–711 (2002) [9] Garcia, F.C., Garcia Torres, M., Batista, B.M., Moreno, J.A., Marcos, P.J.: Solving Feature Subset Selection Problem by A Parallel Scatter Search. European Journal of Operational Research 169(2), 477–489 (2006) [10] Lee, K., Geem, Z.: A New Meta-heuristic Algorithm for Continuous Engineering Optimization. Harmony Search Theory and Practice, Computer Methods in Applied Mechanics and Engineering 194, 3902–3933 (2005) [11] Omra, M.G.H., Mahdavi, M.: Global-best Harmony Search. Applied mathematics and computing 198, 643–656 (2008) [12] Unler, A., Murat, A.: A Discrete Particle Swarm Optimization Method for Feature Selection in Binary Classification Problems. European Journal of Operation Research 206, 528–539 (2010)
[13] Das, S., Mukhopadhyay, A., Roy, A., Abraham, A., Panigrahi, B.K.: Exploratory Power of the Harmony Search Algorithm. Analysis and Improvements for Global Numerical Optimization. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 41(1), 89–106 (2011) [14] Mahdavi, M., et al.: An Improved Harmony Search Algorithm for Solving Optimization Problems. Applied Mathematics and Computation 188, 1567–1579 (2007) [15] Forsati, R., Mahdavi, M.: Web Text Mining Using Harmony Search, Recent Advances. In: Harmony Search Algorithm, pp. 51–64 (2010) [16] Forsati, R., Meybodi, M.R., Mahdavi, M., Ghari Neiat, A.: Hybridization of K-Means and Harmony Search Methods for Web Page Clustering. In: Web Intelligence, pp. 329– 335 (2008) [17] Forsati, R., Mahdavi, M., Haghighat, A.T., Ghariniyat, A.: An Efficient Algorithm for Bandwidth-delay Constrained Least Cost Multicast Routing. In: Canadian Conference Electrical and Computer Engineering, CCECE 2008, pp. 1641–1646 (2008)
Context-Aware Systems: A Case Study Bachir Chihani1,2, Emmanuel Bertin1, Fabrice Jeanne1, and Noel Crespi2 1
Orange Labs 42, rue des Coutures, 14066 Caen, France {bachir.chihani,emmanuel.bertin, fabrice.jeanne}@orange-ftgroup.com 2 Institut Telecom, Telecom SudParis, CNRS 5157 9 rue Charles Fourier, 91011 Evry, France [email protected]
Abstract. Context-aware systems are a promising approach to facilitate daily-life activities. Concerning communication services, business users may sometimes be so overloaded with work that they become temporarily unable to handle incoming communications. After having surveyed the challenges in building context-aware systems, we introduce here HEP, a system that recommends communication services to the caller based on the callee's context. HEP's main context sources are the usage history of the different communication services and the users' calendars. It has been prototyped and tested at Orange Labs.

Keywords: Ubiquitous Computing, Context-Awareness, Recommendation Systems, Communication.
1 Introduction

Nowadays ubiquity is everywhere: environments are full of smart devices embedding the intelligence to process various kinds of data. In such an environment, interacting with and managing all the devices that a user may hold becomes a tough task. Context-aware systems are an emerging solution to alleviate such tasks; they are in charge of supervising the way users interact with the ubiquitous environment in order to automate users' repetitive actions. For example, a context-aware system can detect that a user never responds to phone calls while driving, and thus propose to transfer all incoming calls automatically to his/her voice box whenever he/she is driving. Many definitions have been proposed to define context and context-awareness clearly. However, most researchers agree with A. Dey [Dey00] when he describes context as: "Any information that can be used to characterize the situation of entities (i.e., whether a person, place or object) that are considered relevant to the interaction between a user and an application, including the user and the application themselves." Context information may be classified according to the described entity. The context of a user may be a combination of various elements such as his/her identity, activity,
location, mood, etc.; his/her social context may be the nature of his/her relationship with other persons (e.g. family member, colleague, friend); his/her physical context might include, for instance, the lighting level of the location where he/she is standing. The context of a network may be its QoS (Quality of Service) parameters, like RTT (Round-Trip Time). The context of a device may be its capabilities, display features or battery level. Due to the computational complexity of managing all possible pieces of context information, context-aware systems should choose the subset of this context information that is relevant to the application. For example, a context-aware system aiming to choose the suitable network access (e.g. EDGE, Wi-Fi) based on the user's context should take as fundamental context information the user identity, location, time, the available access networks and their QoS parameters. Context-awareness is defined as follows in [Dey01]: "A system is context-aware if it uses context to provide relevant information and/or services to the user, where relevancy depends on the user's task". Different approaches have been proposed for the development of context management systems (CMSs). From the conceptual viewpoint, CMSs are mainly based on the Producer-Consumer design pattern, where context sources (e.g., sensors) play the role of Producers and context-aware applications play the role of Consumers. From the implementation viewpoint, CMSs can be classified into centralized and distributed architectures. In the centralized architecture, a central point often named a broker [Che03b] is introduced between the producers and consumers. All context requests are handled by the broker, which forwards them to the right component; producers and consumers are then decoupled. In the distributed architecture, the different components have to know each other (e.g., by regularly sending multicast or broadcast messages to announce themselves), as in the middleware-based CMS of [Gu04b]. Many challenges face the field of context-awareness, ranging from the collection of contextual information with sensors (e.g. calendar, light, battery charge, etc.), to the modeling of context that can be anything (e.g. a GPS location or a street address, time, etc.), to reasoning about it to produce an adaptive behavior (e.g. automated call transferring, the proposal of a meeting session, etc.). Starting from a real use case, our work shows how these challenges can be addressed when only a relevant subset of the user's context is chosen. In this paper, we first present current research in context-aware systems, analyzing the ongoing work and the key issues in designing such systems. We then introduce the major application fields of context-awareness. We finally present a case study on HEP, a centralized context-aware recommendation system designed, prototyped and experimented with at Orange Labs, which takes the usage of communication services as context information.
2 Designing Context-Aware Systems 2.1 General Overview From the functional viewpoint, context-aware systems can be represented as a layered framework [Bal07] [Lok06] (figure 1) composed, from bottom to top, of the following layers: sensors, raw
data retrieval, preprocessing, storage/management, and application. The Context Management System (CMS) is responsible for retrieving raw data from sensors, abstracting and combining the sensed data into high-level context, and then making it available to context-aware applications. The first layer (Sensors) is a collection of sensors responsible for retrieving raw data from the user environment (e.g. the user device, a social network, or the access network in use). Context sensors can be classified into:
• Physical sensors, or hardware sensors, that capture physical measurements like light, audio, location and temperature;
• Virtual sensors that sense data from software applications or services (e.g. sensing calendar entries);
• Logical sensors that aggregate information from different sources (combining physical and virtual sensors with additional sources like databases) to perform complex tasks.
The second layer (Raw data retrieval) makes use of specific APIs or protocols to request data from the sensor layer. These queries should, as far as possible, be implemented in a generic way, making it possible to replace sensors (e.g. replacing an RFID system with a GPS one).
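As an illustration of how the raw data retrieval layer can stay generic, the following C sketch (our own, with hypothetical names such as RawSensor and gps_read that do not come from any of the surveyed systems) hides every sensor, whether physical, virtual or logical, behind the same read() interface, so that an RFID source can be swapped for a GPS one without changing the upper layers.

#include <stdio.h>

/* A hypothetical generic sensor interface: every sensor, whether physical,
 * virtual or logical, exposes the same read() operation returning a raw
 * textual measurement. Upper layers depend only on this interface. */
typedef struct {
    const char *name;                       /* e.g. "GPS", "RFID", "calendar" */
    int (*read)(char *buf, int len);        /* fills buf with a raw reading   */
} RawSensor;

/* Two interchangeable physical location sensors (dummy readings). */
static int gps_read(char *buf, int len)  { return snprintf(buf, len, "48.8566,2.3522"); }
static int rfid_read(char *buf, int len) { return snprintf(buf, len, "tag:room-528"); }

/* A virtual sensor reading a calendar entry. */
static int calendar_read(char *buf, int len) { return snprintf(buf, len, "meeting 09:00-11:00"); }

int main(void) {
    /* Replacing RFID by GPS only changes this table, not the retrieval code. */
    RawSensor sensors[] = {
        { "GPS",      gps_read      },
        { "calendar", calendar_read },
    };
    char buf[64];
    for (unsigned i = 0; i < sizeof sensors / sizeof sensors[0]; i++) {
        sensors[i].read(buf, sizeof buf);
        printf("%s -> %s\n", sensors[i].name, buf);
    }
    (void)rfid_read;   /* kept to show the drop-in alternative */
    return 0;
}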
Fig. 1. Layered framework for context-aware systems
The third layer (Preprocessing) is responsible for reasoning about and interpreting contextual information. It transforms the information returned by the underlying layer to a higher abstraction level (e.g. it transforms a GPS position into a symbolic position such as at home or at work). Not only sensed or deduced data have to be modeled, but also the meta-data describing them (e.g. accuracy and recall, or life-cycle information). The fourth layer (Storage and Management) organizes the gathered data and makes them available to third-party applications in a synchronous or asynchronous way. In the first mode, the third-party applications use remote method calls to poll the server for changes. In the second mode, they subscribe to specific events of interest and are notified when the event occurs (for example through a callback).
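A minimal C sketch of the two access modes of the Storage and Management layer follows; the names (ContextStore, store_subscribe, store_update) are assumptions of ours, not an API of any cited system. In synchronous mode the application polls the store; in asynchronous mode it registers a callback that is invoked when the subscribed event occurs.

#include <stdio.h>
#include <string.h>

#define MAX_SUBSCRIBERS 8

typedef void (*ContextCallback)(const char *event, const char *value);

/* A toy context store: holds one context value and a list of subscribers. */
typedef struct {
    char value[64];
    ContextCallback subscribers[MAX_SUBSCRIBERS];
    int n_subscribers;
} ContextStore;

/* Synchronous mode: the third-party application polls for the current value. */
const char *store_poll(const ContextStore *s) { return s->value; }

/* Asynchronous mode: the application subscribes and is notified on change. */
void store_subscribe(ContextStore *s, ContextCallback cb) {
    if (s->n_subscribers < MAX_SUBSCRIBERS)
        s->subscribers[s->n_subscribers++] = cb;
}

void store_update(ContextStore *s, const char *event, const char *value) {
    strncpy(s->value, value, sizeof s->value - 1);
    s->value[sizeof s->value - 1] = '\0';
    for (int i = 0; i < s->n_subscribers; i++)      /* call back every subscriber */
        s->subscribers[i](event, value);
}

static void on_location_change(const char *event, const char *value) {
    printf("notified: %s = %s\n", event, value);
}

int main(void) {
    ContextStore store = { "", {0}, 0 };
    store_subscribe(&store, on_location_change);
    store_update(&store, "location", "room 528");   /* triggers the callback  */
    printf("polled:   location = %s\n", store_poll(&store));
    return 0;
}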
The fifth layer (Application) is where the reactions to context changes are implemented (e.g. displaying text in a higher color contrast if illumination turns bad). We will now highlight some key aspects of CMSs. 2.2 Context Modeling Related work on the development of context-aware systems has tried different approaches for modeling the context information used [Tru09]. Most of these works distinguish between context information modeling and implementation technologies. For example, UML may be used to model the context information while XML is used to describe data instances. The Key-Value data structure (where the key is the context dimension and the value is the corresponding sensed information) is the simplest representation. However, it lacks richness and does not support interoperability or the representation of relations among context information. Key-Value structures have been used in some early works, for instance by Dey [Dey01b] and Yamabe [Yam05]. XML-based languages are interesting candidates for modeling context because they rely on a widely used standard that makes it possible to represent context hierarchically and to abstract it from low to high level. Much work has been done to propose a generic XML-based language for both context modeling and implementation. In [Ven10] and [Kna10], the authors propose ContextML, an XML-based language for representing user context that has been developed as part of the C-CAST European Project (http://www.ict-ccast.eu/) [Zaf10]. In [Fer06], the authors propose an XML-based language named PPDL (Pervasive Profile Description Language) to describe the profiles of mobile peers and to support their interaction in a pervasive environment. The profile is enriched dynamically at runtime based on changes occurring in the environment conditions. MDA (Model Driven Architecture) is another interesting approach for the development of CMSs [She05]. It enables the creation of high-level UML models, or strictly speaking MOF (Meta-Object Facility) compliant models, of the system; based on these models, the implementation stubs are then generated automatically, which greatly alleviates the work of developers. UML-based modeling languages offer the full power of object orientation (encapsulation, reusability, inheritance) and also design flexibility, by separating the modeling of context and context-awareness from the service components. ContextUML, as well as the SENSEI European Project [She05], are examples of such context modeling languages. The context knowledge base can then be represented in a relational database (e.g. MySQL), as in [Che04]. RDF [Rdf04] (Resource Description Framework) is a language for describing tagged oriented graphs. It is based on triplets (subject, predicate, object). The subject is the described resource, the predicate represents a property type that can be applied to this resource, and the object represents data or another resource. Each triplet corresponds to an oriented arc tagged with the predicate, where the source node is the subject and the destination node is the object. RDF schemas have been extensively used for context modeling [Tru08b]. Some vocabularies have been standardized on top of RDF to define context profiles, like CC/PP [CCPP] (Composite Capability/Preference Profile) and UAProf [UAProf] (User Agent Profile).
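To contrast the simplest representation with a relation-aware one, the sketch below (a toy illustration in C; the field names and example facts are ours) places a flat key-value entry next to RDF-style triples, whose subject-predicate-object links allow a low-level fact to be related to a higher-level one.

#include <stdio.h>

/* Key-value representation: simple but flat, no relations between entries. */
typedef struct {
    const char *key;     /* e.g. "location" */
    const char *value;   /* e.g. "room 528" */
} KeyValue;

/* RDF-style triple: subject - predicate - object, which can link entries
 * together and support abstraction from low- to high-level context. */
typedef struct {
    const char *subject;
    const char *predicate;
    const char *object;
} Triple;

int main(void) {
    KeyValue kv = { "location", "room 528" };

    Triple graph[] = {
        { "user:alice", "isLocatedIn", "room:528"      },
        { "room:528",   "hosts",       "event:meeting" },
        /* a higher-level fact that a reasoner could derive from the two above */
        { "user:alice", "isAttending", "event:meeting" },
    };

    printf("%s = %s\n", kv.key, kv.value);
    for (unsigned i = 0; i < sizeof graph / sizeof graph[0]; i++)
        printf("(%s, %s, %s)\n", graph[i].subject, graph[i].predicate, graph[i].object);
    return 0;
}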
These vocabularies have been combined with other modeling languages like FOAF (Friend of a Friend, http://www.foaf-project.org/) for modeling persons, organizations, groups, documents and projects; vCard (http://www.w3.org/Submission/vcard-rdf/) for modeling addresses and personal data; Basic Geo for modeling geo-spatial context; vCal (http://www.imc.org/pdi/) for modeling events; ResumeRDF (http://rdfs.org/resume-rdf/) for modeling the skills and expertise of team members; and the Time ontology for modeling temporal context. However, the RDF language suffers from some limitations regarding reasoning, and current work relies more and more on ontologies. An ontology is a formal and explicit description of the concepts of a particular domain and of the relationships between these concepts. It provides a vocabulary for representing domain knowledge and for describing specific situations in this domain. Using an ontology for context modeling allows a semantic description of context. It makes it possible to share a common understanding of the context structure among users, devices and services. It also allows formal analysis of domain knowledge, i.e. reasoning using first-order logic. OWL (Web Ontology Language, http://www.w3.org/TR/owl-features/) is an ontology language based on an RDF (Resource Description Framework) schema. It enables rich vocabularies to be defined and complex ontologies to be described. OWL ontologies have been extensively used for context modeling, for instance in [Alm06], [Gu04], [Ha07], [Che09], or in the CoBrA ontology (COBRA-ONT) [Che03]. XML-, RDF- and OWL-based approaches are open and interoperable. In particular, RDF and OWL offer the reuse of common vocabularies, while for XML there is no standard way of exchanging vocabularies. Associating an RDF schema with an OWL ontology can increase the expressiveness of the context description by drawing the relationship between a low-level piece of context information (e.g. the user is present in room 528) and a high-level one (e.g. the user is attending a meeting). As for UML, it is not directly compatible with XML/RDF/OWL, but it presents the advantage of being seamlessly integrated with MDE (Model Driven Engineering). This point is especially interesting when the whole CMS is designed using the MDE software approach. It is also interesting to store historical context data, because it can be used to establish trends and predict future context values. Relational databases are usually used for context storage, as plenty of available libraries allow the serialization of XML, RDF or OWL data. 2.3 Quality of Context Context information can be retrieved from different kinds of sensors having different levels of reliability. Moreover, noise or sensor failures can introduce imperfections into the sensed context. Other types of imperfection are ambiguity, imprecision and error. The notion of QoC (Quality of Context) aims to measure the imperfection of sensed information. A good example of QoC modeling is [McK09]. The authors propose an extendable UML-based model for context quality. They define three context levels (sensor, abstracted context and situation), and for each level a set of quality parameters is defined. At the Sensor level: Precision (indicates the maximum deviation of a
measurement from the correct value), Accuracy (indicates the error rate or correctness frequency of a piece of sensed information) and Frequency of sensor readings. At the Abstracted context level: Fuzzy membership (quantifies the imperfection of vague context), used when fuzzy context filters are applied during the abstraction process; for context filters with clear boundaries (e.g. location), a Precision membership derived from the Precision readings; and Reliability (the error rate associated with a context event). A context Confidence is derived for each context event from a combination of its quality parameters. At the Situation level, there is a Confidence quality parameter to assess the truthfulness of the corresponding situation. Its value is calculated based on the reasoning scheme used (neural networks, Dempster-Shafer, voting). 2.4 Context Reasoning Inferring new knowledge (e.g. the means of transportation) from raw sensed data (e.g. a GPS position) is important for context-awareness and for adaptation to the user's context changes. But before any new knowledge can be inferred, some processing has to be done. Context processing can be divided into aggregation and interpretation. The former refers to the composition of raw context information, either to gather all context data concerning a specific entity or to build higher-level context information. The latter refers to the abstraction of context data into human-readable information. The inference can be done with the help of sophisticated reasoning techniques that rely mainly on the context representation. For example, SPARQL-based semantic reasoning can easily be performed if the context representation is based on OWL. Ontology learning techniques can be used to derive new facts given a knowledge base of specific facts and an ontology describing concepts and the relations among them. Machine learning techniques (e.g. Bayesian networks, fuzzy logic) can be used to construct higher-level context attributes from sensed context. Combining both kinds of reasoning techniques can be interesting, as demonstrated in [Van06]. Expert systems (e.g. JESS, CLIPS) or rule inference engines can also be used in context reasoning. Such reasoning systems inherit from forward-chaining inference the power of inferring knowledge (i.e. logical consequences) from sensed data (i.e. facts), and from backward-chaining inference the power of recognizing relevant context (i.e. facts). Knowledge might also be deduced using the Jena framework, which provides an ontology inference facility, together with Jess (Java Expert System Shell) to implement forward-chaining inference. Jess is used when it is not possible to reason about context information with ontology axioms alone, as described in [Ram09]. Reasoning techniques are not widely supported. Moreover, OWL representations are hard to manage (implementation/integration), and reasoning on XML data or UML class diagrams is not very developed. Reasoning with logical expressions, as in expert systems, allows a rich description of situations, actions and knowledge derivation thanks to the use of logical connectives (and, or, not), implications, and universal and existential quantifiers.
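The sketch below gives a flavour of rule-based context reasoning in C. It is our own toy illustration, not the JESS/CLIPS or Jena machinery cited above: each rule inspects sensed facts and, when it fires, asserts a higher-level context value such as the means of transportation or the user's availability.

#include <stdio.h>
#include <string.h>

/* Sensed (low-level) facts about the user. */
typedef struct {
    double speed_kmh;      /* from a GPS sensor          */
    int    in_meeting;     /* from the calendar (0 or 1) */
} SensedContext;

/* Derived (high-level) context. */
typedef struct {
    char transport[16];    /* "walking", "driving", ...  */
    char availability[16]; /* "available", "busy", ...   */
} DerivedContext;

/* A rule is just a condition plus an action that asserts new knowledge. */
typedef void (*Rule)(const SensedContext *in, DerivedContext *out);

static void rule_transport(const SensedContext *in, DerivedContext *out) {
    if (in->speed_kmh > 20.0)      strcpy(out->transport, "driving");
    else if (in->speed_kmh > 2.0)  strcpy(out->transport, "walking");
    else                           strcpy(out->transport, "static");
}

/* This rule uses the fact derived by rule_transport, so the rules fire in order. */
static void rule_availability(const SensedContext *in, DerivedContext *out) {
    if (in->in_meeting || strcmp(out->transport, "driving") == 0)
        strcpy(out->availability, "busy");
    else
        strcpy(out->availability, "available");
}

int main(void) {
    SensedContext sensed = { 45.0, 0 };
    DerivedContext derived = { "", "" };
    Rule rules[] = { rule_transport, rule_availability };

    for (unsigned i = 0; i < sizeof rules / sizeof rules[0]; i++)
        rules[i](&sensed, &derived);

    printf("transport=%s availability=%s\n", derived.transport, derived.availability);
    return 0;
}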
3 Context-Aware Applications Context-awareness will affect our daily life in all its dimensions (at home, at work, in public spaces, etc). It provides indeed a way to adapt the behavior of applications in
order to meet user expectations, for instance by specifying the actions that an application should apply in a given situation. This service adaptation principle may apply in very different fields, such as: service selection [Tru08a], task adaptation [Tru08b], security and privacy control adaptation to apply an access control policy in a given situation, communication adaptation [Her08] to select a communication protocol and optimize the communication, or content adaptation [Zim07] to adapt the content resulting from a request and return it in a suitable form. In this section, we focus on some commonly envisaged context-aware applications: Location-Based Services (LBS), Context-Aware Communication (CAC), context-aware buildings and Context-Aware Recommendation Systems (CARS). LBS are well-developed context-aware systems that are mainly based on location as a fundamental context dimension. According to E. Kaasinen [Kaa03], "Location aware services or systems are defined as context-aware services that utilize the location of the user to adapt the service accordingly". Plenty of commercial LBS for mobile devices have been developed, like Nulaz [Pan07], Foursquare (http://foursquare.com), Gowalla (http://gowalla.com/) and Loopt (http://www.loopt.com/). These services are based on both outdoor location (mainly GPS) and social networks, and sometimes on augmented reality technologies (e.g. Layar, http://www.layar.com/). The main idea behind them is to help people locate their friends and find interesting places to visit or where to meet with friends. Another example of LBS is location-based messaging services [Num07], like Socialight (http://socialight.com/), InfoRadar [Ran04] and Herecast [Pac05]. Context-aware communication (CAC) applications apply knowledge of people's context to reduce communication barriers [Sch02]. Many scenarios of context-aware communication can be imagined, like those presented in [Num07b], based on non-verbal and electronic communication services (e.g. SMS, MMS, chats, e-mail, electronic message boards and mailing lists):
• Seeing whether a previously sent message, especially an urgent one, has already been delivered to the recipient and whether the recipient has already read it;
• Restricting what context information about you other persons are allowed to see in different situations;
• Leaving messages at certain places for anyone who arrives at the same place to read, which can be compared to an electronic bulletin board;
• Notifying the user about the reception of a message in an appropriate situation, for example notifying a user only when he is on a coffee break about a message left by a friend asking him to go skiing at the week-end.
Context-aware buildings are another promising field of study. In [Mey03], the authors present a futuristic image of a house fully embedded with sensors and intelligent devices in order to support a healthier everyday life for its users. For example,
phones will ring only in the room where the callee is located, to avoid disturbing everyone in the house; lights and sound will be automatically adjusted to the user who is present in the room; family members will be able to communicate as if they were in front of each other, even if they are in different rooms; the assistance of older people will be enhanced and their health conditions will be continuously assessed. According to [Fuj07], the context-aware home will change society's conventional lifestyle, especially in health management, by changing the purpose of medicine from treatment to prevention, the location of healthcare from the hospital to the home, and the method of obtaining information on diseases from periodic to real-time examination. Context-Aware Recommendation Systems (CARS) aim to recommend a service or a product to a user based on his context. A lot of work has been conducted especially on recommending movies [Bog10], motivated by competitions with large awards like the Netflix Prize (http://www.netflixprize.com/). For the recommendation to be relevant, a CARS needs to collect and process a great amount of data (product ratings, user preferences, historical data, etc.) to predict the most relevant product or service for a user. In this paper, we have applied CARS concepts to communication services, by processing data retrieved from the Microsoft communication suite.
4 Case Study: HEP 4.1 Usage Scenario Current advances in ICT (Information and Communication Technologies), especially in professional environments, are enhancing communication between co-workers. At the same time, these technologies add a certain amount of stress to workers, because they are losing control over the way they can be reached and at what time. Also, the diversity of the communication tools used (e.g. e-mail, IM, video conferencing) amplifies the number of notifications or interruptions (e.g. when an e-mail is received) these tools cause to workers. This may degrade the worker's performance on his current activity or influence the choice of the next ones [Hud02]. Hence, it is important to control, on behalf of the user, when interruptions occur, so as not to affect his performance. One possible solution is to delegate the control of interruptions to the user's contacts by sending them his contextual information. This information helps the user's contacts to evaluate the importance, at that time, of the interruption they will cause. With this aim, we have developed HEP, a context-aware system for recommending communication means for enterprise employees. The system publishes real-time information describing their status, emotions, activities and workload. The published information is the result of processing diverse input streams concerning the usage of communication services (phone, IM, e-mail, calendar). The nature of the input information as part of the user context, how it is retrieved and how it is processed make a CMS the suitable management system, and a context-aware system the suitable kind of application.
Fig. 2. HEP statuses
Figure 2 presents the different statuses of a user: "Very available" corresponds to the state where the user is highly available for receiving communication requests (e.g. a phone call or an IM request); "Available" corresponds to the state where the user can receive call requests; "Busy" corresponds to the state where the user can only weakly respond to a call request; "Do not disturb" corresponds to the state where the user cannot respond and will potentially refuse incoming communication requests. A status corresponds to the level of availability of a user on a given communication service (e.g. agenda, e-mail, instant messaging, phone). This information is used by the caller to decide whether he can interrupt the callee, and whether it is better to use one communication service (e.g. e-mail) rather than another (e.g. phone) in order to reach the callee. For instance, let us suppose that Alice wants to call Bob on an urgent matter. Bob is at this moment in a conference call, but he is still reading his e-mails and answering them. With HEP, Alice will see that Bob is busy on the phone but available by e-mail. She thus decides to send him an e-mail instead of calling him, although her demand is urgent. 4.2 Service Design Our system (figure 3) is developed in .Net and is based on OCS 2007 (Office Communication Server). The different elements composing the architecture are a PC client, an Outlook plug-in, and a broker. The PC client implements the first two functions of a CMS. It is responsible for:
• retrieving raw data from virtual sensors placed on the Microsoft communication suite (e-mail, calendar, instant messaging, fixed telephony),
• computing the user status for each communication means,
• interacting with the broker.
The Outlook plug-in provides the user interface. It enables the user to set his preferences (e.g. the status that should correspond to a given load level) and, above all, it enables the user to see the statuses of each of his Outlook contacts. The CMS storage and management functions are implemented in the broker, which offers a directory service. PC clients subscribe and publish their status, and Outlook plug-ins request the status of other users. An administration interface is available to set global rules for status computation. The computation of the user status is based on information about the history of usage of communication services (e.g. e-mail) and desktop applications (e.g. Word, Excel, PowerPoint). The frequency of computation and the freshness of the status depend on the related communication service, but can be fixed by users when they specify their preferences.
Fig. 3. HEP architecture
4.3 Context Modeling We gather the different pieces of contextual information (both sensed and deduced) in a UML data model, as illustrated in Figure 4. The BaseObject class is the root class of the model and is common to every type of context information. The InteractiveCom class gathers the shared attributes of interactive communication tools, while information specific to a communication tool is gathered in a dedicated class (e.g. Instant Messaging, email, phone). A specific class is dedicated to calendar information.
Fig. 4. HEP Data Model
The description of the different attributes is as follows (a rough sketch of this model as a data structure is given after the list):
• Available: a sensed value that represents whether or not the user is available at a given instant in a communication tool (e.g. for the calendar it can be interpreted as the user currently not being in a meeting);
• Load: a deduced value that represents the work load of the user for a communication tool (e.g. for the calendar, the load is the ratio of the total amount of meeting time to the work time); it corresponds directly to the user status (figure 2);
• Timestamp: represents for how long the sensed information remains valid; it depends on the communication tool (e.g. 5 min for Mail, 15 min for Calendar);
• Missed: the ratio of missed communication requests (e.g. missed phone calls, IM requests or unread mails) to the received ones;
• Engaged: the ratio of engaged communications to the received ones;
• Availability: the ratio between free time and the total amount of meeting time;
• unreadVoiceBoxMsg: the ratio of unread messages in the user's voicemail to the stored ones.
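The transcription below renders this data model as C structures; only the attribute names come from the model in Fig. 4, while the types, the nesting and the example values are our own assumptions.

#include <stdio.h>

typedef struct {
    int    available;     /* sensed: is the user currently reachable on this tool?   */
    double load;          /* deduced: work load, mapped directly to the user status  */
    long   timestamp;     /* how long the sensed information remains valid (seconds) */
} BaseObject;

typedef struct {
    BaseObject base;
    double missed;        /* missed requests / received requests */
    double engaged;       /* engaged communications / received   */
} InteractiveCom;         /* shared by IM, e-mail and phone      */

typedef struct { InteractiveCom com; } InstantMessaging;
typedef struct { InteractiveCom com; } Email;
typedef struct { InteractiveCom com; double unreadVoiceBoxMsg; } Phone;

typedef struct {
    BaseObject base;
    double availability;  /* free time / total amount of meeting time */
} Calendar;

int main(void) {
    Calendar cal = { { 1, 0.5, 15 * 60 }, 0.5 };   /* user free, 50% calendar load */
    printf("calendar load: %.0f%%\n", 100.0 * cal.base.load);
    return 0;
}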
4.4 Context Reasoning The sensed information is used to compute the work load of a user on a given communication tool, in order to determine the user status and whether or not he can accept incoming requests on this communication tool (figure 5). We defined rules for calculating the work load level for each communication means (IM, mail, phone, calendar).
Fig. 5. User status based on his work load level
In the case of the calendar, if the user is currently in a meeting then we set his calendar work load to 100%. As the start time of the next meeting gets closer, the calendar work load gets higher (e.g. 5 minutes before a meeting, the work load reaches 75% and the user status becomes 'busy'). If the user is not in a meeting, the calendar work load is the ratio of the meeting duration in the rest of the day to the remaining work time. The following is an example of the calculation of the calendar work load of a user at different times of the day. We consider that the work day starts at 8:00 and finishes at 18:00, and that the user has a first meeting from 9:00 to 11:00 (2 h duration) and a second one from 15:00 to 18:00 (3 h duration). Thus, at 8:00 the work load is (2+3)/10 = 50%, from 9:00 to 11:00 it is 100% (user in a meeting), at 12:00 it is 3/6 = 50%, at 14:00 it is 3/4 = 75%, and between 15:00 and 18:00 it is 100%. We add a layer of abstraction by introducing the user's global status, which reflects the global workload. It is computed by combining the statuses related to the different communication means, with predefined weightings that can be modified by the end-user.
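The calendar rule and its weighted combination can be written down directly. The C sketch below is our own rendering of that rule; it reproduces the worked example above (meetings from 9:00 to 11:00 and from 15:00 to 18:00 in an 8:00-18:00 work day), while the function names and the example weights of the global combination are assumptions, since HEP's actual weightings are predefined and user-configurable.

#include <stdio.h>

/* Calendar work load at time 'now' (hours, decimal), following the rule in the
 * text: 100% while in a meeting, otherwise remaining meeting time divided by
 * remaining work time. The pre-meeting ramp (e.g. 75% five minutes before) is
 * omitted for brevity. */
typedef struct { double start, end; } Meeting;

double calendar_load(double now, double day_end, const Meeting *m, int n) {
    double remaining_meeting = 0.0;
    for (int i = 0; i < n; i++) {
        if (now >= m[i].start && now < m[i].end) return 1.0;   /* in a meeting */
        if (m[i].start >= now) remaining_meeting += m[i].end - m[i].start;
    }
    double remaining_work = day_end - now;
    return remaining_work > 0.0 ? remaining_meeting / remaining_work : 1.0;
}

/* Global work load as a weighted combination of the per-tool loads.
 * The weights here are arbitrary placeholders. */
double global_load(double calendar, double mail, double im, double phone) {
    return 0.4 * calendar + 0.2 * mail + 0.2 * im + 0.2 * phone;
}

int main(void) {
    Meeting day[] = { { 9.0, 11.0 }, { 15.0, 18.0 } };
    double times[] = { 8.0, 10.0, 12.0, 14.0, 16.0 };
    for (unsigned i = 0; i < sizeof times / sizeof times[0]; i++)
        printf("%05.2f -> calendar load %.0f%%\n",
               times[i], 100.0 * calendar_load(times[i], 18.0, day, 2));
    printf("global load example: %.0f%%\n",
           100.0 * global_load(0.5, 0.2, 0.1, 0.0));
    return 0;
}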
4.5 Future Work HEP has been deployed on the workstations of our co-workers at Orange Labs (Caen, France), and we received very positive feedback. The integration with everyday working tools (e.g. Outlook) was especially appreciated. From the implementation viewpoint, several lessons can be drawn. The current reasoning technique is built with a set of IF-THEN clauses implemented in a C# class. We believe this is enough for a proof-of-concept solution, and we plan to use more sophisticated techniques like those provided by rule engines. For the context modeling language, we used a UML data model to benefit from encapsulation and inheritance. The current modeling approach does not include metadata, especially QoC parameters; we plan to include them in our future work. We found that these parameters are as important as the sensed data themselves, especially for managing the very common situations where software crashes. In our current solution, context processing, including reasoning, is performed on the client side. Such a solution makes the deployment of new reasoning techniques for newly included context data (e.g. about the usage of other office tools) more complicated. To overcome this issue, we are planning to transfer a part of the preprocessing layer (figure 1), namely the reasoning part, to the broker side.
5 Conclusion In this paper we surveyed previous work in the field of context-aware systems from different viewpoints, such as system design, context modeling and reasoning. After highlighting the challenges to be faced when building context-aware systems and the major application fields for such systems, we proposed a context-aware system for recommending communication means. Our system helps users to choose the appropriate communication means for contacting a person based on the context of the latter. The aim behind the developed prototype is to build a context-aware system with the existing approaches to sensing, modeling and reasoning, to use it as a solution for a real problem, and to experiment with it with users in a real environment. Besides the enhancements introduced in the previous section (lessons learned), we plan to expand our system with the ability to transform, in a transparent way, the format and the delivery time of a message based on the user's context. A message sent as an SMS at time t could be received, for instance, at time t + t' as an e-mail, given that the user is unreachable at t but may be reached at t + t' by e-mail only, because he/she is still on the phone. We believe that this could lead to a new and seamless way to use our daily communication means.
References [Alm06] de Almeida, D.R., Baptista, C.S., da Silva, E.R., Campelo, C.E.C., de Figueirêdo, H.F., Lacerda, Y.A.: A Context-aware System Based on Service-Oriented Architecture. In: Proceedings of 20th International Conference on Advanced Information Networking and Applications, AINA 2006, Vienna University of Technology, Vienna (2006)
[Bal07] Baldauf, M., Dustdar, S., Rosenberg, F.: A Survey on Context-Aware Systems. Int. J. Ad Hoc and Ubiquitous Computing, 2(4), 263–277 (2007) [Bog10] Bogers, T.: Movie Recommendation using Random Walks over the Contextual Graph. In: 2nd Workshop on Context-Aware Recommender Systems (CARS 2010), Barcelona, Spain (2010) [CCPP] Kiss, C.: Composite Capability/Preference Profiles (CC/PP): Structure and Vocabularies 2.0. W3C Draft (2007) [Che03] Chen, H., Finin, T., Joshi, A.: An Ontology for Context-Aware Pervasive Computing Environments. Journal The Knowledge Engineering Review 18(3) (2003) [Che03b] Harry, C., Finin, T., Joshi, A.: An Intelligent Broker for Context-Aware Systems. In: Adjunct Proceedings of Ubicomp 2003, Seattle, Washington, USA, October 12-15 (2003) [Che04] Chen, H., Perich, F., Chakraborty, D., Finin, T., Joshi, A.: Intelligent Agents Meet Semantic Web in a Smart Meeting Room. In: Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2004), vol. 2, IEEE Computer Society, Washington, DC, USA (2004) [Che09] Cheng, P.L., Kakousis, K., Paspallis, N., Papadopoulos, G.A., Lorenzo, J., Soladana, E.: White Paper: MUSIC & Android. IST-MUSIC Deliverable, D13.11 (2009) [Dey00] Dey, A.K., Abowd, G.D.: Towards a better understanding of context and contextawareness. In: Proceedings of the Workshop on the What, Who, Where, When and How of Context-Awareness, ACM Press, New York (2000) [Dey01] Dey, A.K.: Understanding and using context. Personal and Ubiquitous Computing 5 (2001) [Dey01b] Dey, A.K., Salber, D., Abowd, G.D.: A Conceptual Framework and a Toolkit for Supporting the Rapid Prototyping of Context-Aware Applications. Anchor article of a special issue on context-aware computing in the Human-Computer Interaction (HCI) Journal 16(2-4), 97–166 (2001) [Fer06] Goel, D., Kher, E., Joag, S., Mujumdar, V., Griss, M., Dey, A.K.: Context-aware authentication framework. In: Phan, T., Montanari, R., Zerfos, P. (eds.) MobiCASE 2009. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 35, pp. 26–41. Springer, Heidelberg (2010) [Fuj07] Fujii, A.: Trends and issues in research on context awareness technologies for a ubiquitous network society. Science and Technology Trends (2007) [Gu04] Gu, T., Pung, H.K., Zhang, D.Q.: Toward an OSGi-based infrastructure for contextaware applications. IEEE Pervasive Computing 3(4), 66–74 (2004) [Gu04b] Gu, T., Pung, H.K., Zhang, D.Q.: A middleware for building context-aware mobile services. In: IEEE 59th Vehicular Technology Conference, VTC 2004-Spring, Milan, Italy (2004) [Ha07] Hamadache, K., Bertin, E., Bouchacourt, A., Ben Yahia, I.: Context-Aware Communication Services: an Ontology Based Approach. In: International Workshop on Context Modeling and Management for Smart Environments (CMMSE 2007), In Conjunction with ICDIM 2007, Lyon, France (2007) [Her08] Herborn, S., Petander, H., Ott, M.: Predictive Context Aware Mobility Handling. In: International Conference on Telecommunications, St. Petersburg, Russie (2008) [Hud02] Hudson, J.M., Christensen, J., Kellogg, W.A., Erickson, T.: I’d be overwhelmed, but it’s just one more thing to do: availability and interruption in research management. In: Proceedings of the SIGCHI conference on Human factors in computing systems: Changing our world, changing ourselves (CHI 2002), Minneapolis, Minnesota, USA (2002) [Kaa03] Kaasinen, E.: User needs for location-aware mobile services. 
Personal and Ubiquitous Computing 7(1), 70–79 (2003)
[Kna10] Knappmeyer, M., Kiani, S.L., Fra, C., Moltchanov, B., Baker, N.: ContextML: A light-weight context representation and context management schema. In: 5th IEEE International Symposium on Wireless Pervasive Computing (ISWPC), Modena, Italy (2010) [Lok06] Loke, S.: Context-Aware Pervasive Systems: Architectures for a New Breed of Applications. Auerbach Publications (2006) ISBN 0-8493-7255-0 [McK09] McKeever, S., Ye, J., Coyle, L., Dobson, S.: A Context Quality Model to Support Transparent Reasoning with Uncertain Context. In: 1st International Workshop on Quality of Context (QuaCon), Stuttgart, Germany (2009) [Mey03] Meyer, S., Rakotonirainy, A.: A Survey of Research on Context-Aware Homes. In: Proceedings of the Conference on Research and Practice in Information Technology, vol. 21 (2003) [Num07] Nummiaho, A., Laakko, T.: A Framework for Mobile Context-Based Messaging Applications. In: Proceedings of the 4th International Conference on Mobile Technology, Applications, and Systems and the 1st International Symposium on Computer Human Interaction in Mobile Technology, MC 2007 (Mobility 2007), New York, NY, USA (2007) [Num07b] Nummiaho, A.: User survey on context-aware communication scenarios. In: Proceedings of the 4th Intl. Conf. on Mobile Technology, Applications and Systems (Mobility Conference), Singapore, pp. 478–481 (2007) [Pac05] Paciga, M., Lutfiyya, H.: Herecast: an open infrastructure for location-based services using WiFi. In: IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob 2005), Montreal, Quebec (2005) [Pan07] Pannevis, M.: I'm bored! Where is Everybody? Location Based Systems for Mobile Phones. MSc Thesis, University of Amsterdam (2007) [Ram09] Ramparany, F., Benazzouz, Y., Martin, M.B.: Agenda Driven Home Automation - Towards High Level Context Aware Systems. In: Symposia and Workshops on Ubiquitous, Autonomic and Trusted Computing (UIC-ATC 2009), Brisbane, QLD (2009) [Ran04] Rantanen, M., Oulasvirta, A., Blom, J., Tiitta, S., Mäntylä, M.: InfoRadar: group and public messaging in the mobile context. In: Proceedings of the Third Nordic Conference on Human-Computer Interaction, NordiCHI 2004, pp. 131–140 (2004) [Rdf04] RDF Working Group: Resource Description Framework (RDF). World Wide Web Consortium (W3C), http://www.w3.org/RDF/ (2004) [Sch02] Schilit, B., Hilbert, D.M., Trevor, J.: Context-aware Communication. IEEE Wireless Communications 9(5), 46–54 (2002) [She05] Sheng, Q.Z., Benatallah, B.: ContextUML: A UML-based Modeling Language for Model-driven Development of Context-aware Web Services. In: Proceedings of the International Conference on Mobile Business, ICMB 2005 (2005) [Tru08a] Truong, H.-L., Juszczyk, L., Bashir, S., Manzoor, A., Dustdar, S.: Vimoware - a Toolkit for Mobile Web Services and Collaborative Computing. In: Special Session on Software Architecture for Pervasive Systems, 34th EUROMICRO Conference on Software Engineering and Advanced Applications, Parma, Italy, September 3-5 (2008) [Tru08b] Truong, H.L., Dustdar, S., Corlosquet, S., Dorn, C., Giuliani, G., Peray, S., Polleres, A., Reiff-Marganiec, S., Schall, D., Tilly, M.: inContext: A Pervasive and Collaborative Working Environment for Emerging Team Forms. In: International Symposium on Applications and the Internet, SAINT 2008, Turku, Finland (2008) [Tru09] Truong, H.L., Dustdar, S.: A survey on context-aware web service systems. International Journal of Web Information Systems 5(1), 5–31 (2009) [UAProf] Wireless Application Protocol WAP-248-UAPROF-20011020-a. WAP Forum (2001)
[Van06] van Sinderen, M.J., van Halteren, A.T., Wegdam, M., Meeuwissen, H.B., Eertink, E.H.: Supporting context-aware mobile applications: an infrastructure approach. IEEE Communications Magazine (2006) [Ven10] Venezia, C., Lamorte, L.: Pervasive ICT Social Aware Services enablers. In: 14th Int. Conf. Intelligence in Next Generation Networks (ICIN), Berlin, Germany (2010) [Yam05] Yamabe, T., Takagi, A., Nakajima, T.: Citron: A Context Information Acquisition Framework for Personal Devices. In: Proceedings of the 11th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications, RTCSA 2005 (2005) [Zaf10] Zafar, M.: Specification of Context Casting Service En-ablers, Context Management & Context Brokering. C-CAST IST WP3 Deliverable (2010) [Zim07] Zimmermann, A.: Context Management and Personalization. PhD Thesis. University of Aachen (2007)
Handling Ambiguous Entities in Human-Robot Instructions Using Linguistic and Context Knowledge Shaidah Jusoh and Hejab M. Alfawareh Faculty of Science & Information Technology, Zarqa University, Zarqa, Jordan {shaidah,hejab}@zpu.edu.jo
Abstract. Behaviour-based control has been considered the best approach to controlling autonomous robots. Such robots are expected to assist people in human living environments such as houses or offices. These situations require natural interaction between people and the robots. One way to facilitate this is by deploying a natural language interface (NLI) for human-robot interaction. The major obstacle in developing an NLI is that natural language is always ambiguous. The ambiguity problem occurs when a word in a human instruction may have more than one meaning. To date, there is no existing NLI processor which can resolve the problem well. This paper presents a framework and an approach for resolving the problem. The approach is developed by utilizing fuzzy sets and possibility theory on linguistic and context knowledge. Keywords: natural language interface, ambiguity, fuzzy approach, lexical knowledge, context knowledge.
1 Introduction In previous years, robots were the most suitable tools for dangerous workplaces like a nuclear plant or a manufacturing factory. In recent years, however, robotics research has focused on developing autonomous robots which can provide services to people. These robots are expected to assist ordinary people and perform their tasks in human living areas such as offices, houses, schools, hospitals and so on. Examples of this type of robot include a teaching assistant robot [1], an office secretary robot [2], a purchaser robot that buys a cup of coffee [3] and a few others. In parallel with that, behaviour-based control is seen as the best approach to controlling autonomous robots [4, 5]. The architecture of the behaviour-based approach is divided into three levels: the highest, the middle, and the lowest. The highest level concerns task-oriented behaviour, the middle level obstacle-avoidance behaviour, and the lowest level emergency behaviour. As autonomous robots are expected to assist human users, the users need to instruct the robot to perform its duties. Thus an instruction from a human user is defined as a user-task behaviour. Consequently, a human-robot interface is required to convert a user-task behaviour into a task-oriented behaviour. As robot systems have started to become 'friends' in human life, the human-robot interface should be more natural and flexible. The need for naturalness and
flexibility has become a challenging issue for human-robot interfacing technology [6]. The most promising technology for facilitating naturalness is natural language interface (NLI) technology. However, an NLI for an autonomous robot is more complex than one for a regular computing system, because the robot perceives its world and acts according to a sequence of user-task behaviours. A problem occurs because human user instructions are ambiguous, due to the ambiguity of natural language. In human-to-human interaction, people resolve this ambiguity by using their interaction context. Robots are machines, so an intelligent NLI is required; the NLI should be able to resolve the ambiguity problem by taking the context of interaction into account. This paper proposes a new approach for developing an NLI for an autonomous robot. The proposed approach utilizes the interaction context and linguistic knowledge to resolve the ambiguity problem. The approach has been implemented as an intelligent NLI processor. Experiments on a test case have been carried out to convert an ambiguous user-task behaviour into an unambiguous task-oriented behaviour. The unambiguous task-oriented behaviour is represented in predicate calculus. The obtained results indicate that the proposed approach is successful. The paper is organized as follows: section 2 presents the basics of natural language processing, section 3 discusses the proposed approach in detail, implementation and results are presented in section 4, and the summary of the paper is presented in section 5.
2 Natural Language Processing (NLP) NLP research focuses on analyzing human languages so that computers can understand them as a human being does. The ultimate goal of the NLP community is to develop a software program that enables a computer to understand and generate natural language. The work in this field is moving rapidly and much research has been conducted in the last 10 years. Although the goal of NLP remains far from being achieved, significant positive outcomes have been shown in some studies [7, 8]. NLP is a technology that concerns natural language generation (NLG) and natural language understanding (NLU). An NLU system computes the meaning representation of natural language sentences. NLU research is independent of speech recognition. However, the combination of the two may produce a powerful human-computer interaction system. When combined with NLU, speech recognition transcribes an acoustic signal into a text; the text is then interpreted by an understanding component to extract its meaning. The ambiguity problem in natural language can be classified into four types: lexical ambiguity, structural ambiguity, semantic ambiguity, and pragmatic ambiguity. Not all ambiguities can be easily identified, and some of them require a deep linguistic analysis. The major processes in NLU include syntactic processing and semantic processing. Syntactic processing is the process of assigning a parse tree to a sentence. The purpose of this process is to determine the actual structure of a sentence and the part-of-speech (POS) of each word in the sentence. Normally, syntactic processing requires a grammar. The grammar is a formal specification of the structures allowable in the language, and it is usually represented as G = (VN, VT, P, s).
The symbols are explained below.
• VN: a set of non-terminal symbols that do not appear in the input strings but are defined in the grammar. Examples of non-terminal symbols are: sentence (S), imperative sentence (IS), Noun Complement List (NCL), Noun Complement (NC), verb phrase (VP), noun phrase (NP), and prepositional phrase (PP).
• VT: a set of terminal symbols that are primitives or classes of primitive symbols in the input strings. Examples of terminal symbols are: Noun, Verb, Preposition, Adjective, Determinant (Det), and Adverb.
• P: a set of production rules, each of the form σ → β, where σ is a non-terminal symbol and β is a string of symbols over VN ∪ VT.
• s: the start symbol.
The process of syntactic processing is explained as follows. Let σ1 be a string in (VT ∪ VN)* and σ → β be a rule in P. If σ occurs in σ1, we can obtain a new string from σ1 by substituting β for σ. The new string is also a string in (VT ∪ VN)*. (The symbol * indicates the free monoid X* over the set X.) Now, let σ2 denote the new string; σ2 is said to be derivable from σ1 in G, and the derivation can be expressed as σ1 → σ2. Let σ1, σ2, σ3, ..., σm be strings in (VT ∪ VN)* (m ≥ 2). If there are derivations σ1 → σ2, ..., σm-1 → σm, then σm is said to be derivable from σ1 in G. The sequence of derivations σ1 → σ2, ..., σm-1 → σm is referred to as a derivation chain from σ1 to σm. A grammar G defines a language L(G): a string is a valid sentence in L(G) if and only if it is derivable from the start symbol s in G. Semantic processing is the process of converting a parse tree into a semantic representation, that is, a precise and unambiguous representation of the meaning expressed by the sentence. Semantic interpretation can be conducted in two ways: context-independent interpretation and context-dependent interpretation. Context-independent interpretation concerns the meanings of the words in a sentence and how these meanings are combined to form the sentence's meaning. Context-dependent interpretation concerns how the context affects the interpretation of the words in a sentence. In this paper, the location where the robot is positioned is used as the context knowledge.
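To make the derivation notation concrete, consider the imperative instruction "clean the dish" under a small assumed grammar fragment (these productions are purely illustrative; the 77 rules actually used in Section 4 are not listed in the paper): IS → VP NP, VP → Verb, NP → Det Noun, Verb → clean, Det → the, Noun → dish. A possible derivation chain is

IS → VP NP → Verb NP → Verb Det Noun → clean Det Noun → clean the Noun → clean the dish

so "clean the dish" is a valid sentence in L(G), and the chain also determines the parse tree and the POS of each word.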
3 The Proposed Approach The framework of the proposed approach is illustrated in Fig. 1. There are five major components: the NL processor, History Knowledge, Model Knowledge, World Knowledge and Task Oriented Behavior. This paper discusses only the NL processor and the Model knowledge. The aim of the approach is to convert NL instructions (which represent user-task behaviors) into unambiguous task-oriented behaviors. The concern of this paper is resolving ambiguous entities in a user instruction. A word which is categorized as a Noun POS and is used to describe an object or a thing is considered an entity. For example, pen, dish, table, chair, spoon and so on are entities.
Fig. 1. Framework of the intelligent natural language interface for an autonomous robot
3.1 User-Task Behavior
The three scenarios below demonstrate how user instructions can be used to represent user-task behaviors.
Scenario A: The owner of a house requires a housekeeping robot (an autonomous robot) to arrange things in a room which is used as an office in the house. Instruction: "Keep the pen in the box". The robot moves to collect the pen and put it in the box.
Scenario B: The owner of a house needs a housekeeping robot to tidy up a baby's room. Instruction 1: "Resemble the pen". Instruction 2: "Store the pen in the closet". The robot moves to resemble the play pen and store it in the closet.
Scenario C: The owner of a house requires a housekeeping robot to clean dishes in a kitchen. Instruction 1: "Clean the dish". Instruction 2: "Store it in the cabinet".
As we can observe from the user instructions above, words such as pen, cabinet, and dish are ambiguous. The word 'pen' takes a different meaning in a different context. The robot must know the correct meaning of each word before it can perform
any action. Incorrect understanding of the meanings may cause the robot to perform a task which is not required by the human user. 3.2 Model Knowledge
Two pieces of knowledge are stored in the Model knowledge: lexical knowledge and context knowledge. Lexical knowledge is represented as a two-dimensional array that stores a lexical item and its semantics. Table 1 shows an example for the lexical item "pen" with its semantics. According to the WordNet database, the word pen has 5 possible senses. WordNet is a large lexical database of English, developed by Princeton University. The database categorizes words into nouns, verbs, adjectives and adverbs, each expressing a distinct concept. Nouns, verbs, adjectives and adverbs are grouped into sets of synsets. Synsets are interlinked by means of conceptual-semantic and lexical relations. WordNet is also freely and publicly available on the Internet for download. WordNet's structure makes it a useful tool for computational linguistics and NLP. In this work, WordNet version 3.0 is used as a reference in determining the semantics of a word.

Table 1. Five possible meanings/semantics for the word pen
  Word   Semantic
  pen    a writing tool
  pen    a livestock's enclosure
  pen    a portable enclosure for a baby
  pen    a correctional institution
  pen    a female swan
In this work, we defined a possible context for each sense based on the word's meanings, which are obtained from the WordNet database. Table 2 shows the meanings of the word 'pen' with their contexts. Context knowledge is a piece of knowledge which is obtained from the World Knowledge. In this work, it is assumed that the housekeeping robot is equipped with a sensor that can capture an image of its environment, i.e. the location where it is positioned. The image is then translated into a text. For example, an image of an office is translated into `office room' and an image of a baby room is translated into `baby room', as shown in Table 2.

Table 2. Five possible contexts for the word pen
  Word   Semantic                           Context
  pen    a writing tool                     office room
  pen    a livestock's enclosure            barn
  pen    a portable enclosure for a baby    baby room
  pen    a correctional institution         building
  pen    a female swan                      lake
3.3 NL Processor
The NL processor component contains a technique to resolve the ambiguity problem. The technique is developed by applying fuzzy sets and possibility theory to the lexical and context knowledge. In this technique, a membership function value is given to each meaning of the entity. Possibility theory is then used to select the most possible meaning of the entity based on the given context. The NL processor conducts two major processes: parsing the instructions and resolving ambiguous entities. Parsing the Instructions
Parsing is the process of searching for a syntactic structure for the NL instructions. Parsing the instructions requires the NL processor to fire grammar rules and use a lexicon to determine the POS of each word in the instructions. In this work, the lexicon is represented as a two-dimensional array which holds words and their POS. A word that is categorized as a Noun POS is considered an entity. The proposed approach then decides whether or not it is ambiguous: if the word is stored with one meaning, the word is not ambiguous, and if the word is stored with multiple meanings, the word is considered ambiguous. A parsing process is considered successful if the POS of every word in the NL instruction can be recognized. Resolving Ambiguous Entity
In this section, the proposed theory of how to resolve ambiguous entities using possibility theory and fuzzy sets is presented in detail. Let Ω denote the set of possible semantics of a lexical item, and let FC denote a fuzzy set over Ω subject to the context C. A lexical variable x may be restricted by the fuzzy set FC; we denote such a restriction as Π(x, FC) and call FC the restricting fuzzy set of x. Π(x, FC) associates a possibility distribution with x. The possibility distribution function ΠFC(u) denotes the possibility for x to take the value u under the restriction of FC. Numerically, the distribution function of x under the restriction FC is defined to be equal to the membership function of FC, that is

ΠFC(u) = μFC(u),  for all u ∈ Ω                                  (1)
Now let us consider the lexical semantics of the word 'pen', stored in the lexical knowledge store. Taking 'pen' as a lexical item x, the lexical semantics of x can be formalized as

x = (mi, mi+1, ..., mj)                                          (2)

where mi is the first semantic and mj is the last one. Its membership function, derived from Eq. (1), is

Π(x, FC) = (vi, vi+1, ..., vj)                                   (3)
where v is a plausibility value, which is context-dependent. When x appears in a different context, it may take a different value. In this work, v is assigned automatically and randomly by the NL processor. Fig. 2 illustrates how the fuzzy values are assigned to the semantics of a lexical item.
Fig. 2. Examples of a fuzzy value assignment based on the context of interaction
The most plausible value (ρ) of x is obtained by using the maximum (max) operator of a fuzzy set. Thus

ρ = max(vi, vi+1, ..., vj)                                       (4)
Once the ρ value has been calculated, the most possible semantic can be attached to x. In this way, the lexical ambiguity is resolved and, consequently, the system is able to give the most accurate meaning or semantic of a given word. To apply possibility theory in the technique, a fuzzy semantics database is created. The fuzzy semantic database is a table T which contains three fields: lexical (x), semantic (sem) and semantic value (v). Conceptually, the table T can be formalized as

T = {(semi, vi), (semi+1, vi+1), ..., (semj, vj)}                (5)
where semi denotes a meaning or semantic of the word x and vi is the possibility value attached to it. The value v lies in the range (0, 1], based on the subject context. The values are generated and stored in the database manually, using human common sense; however, the process of assigning a value to a lexical item x is conducted dynamically and randomly. Fig. 2 illustrates how values are assigned to semantics randomly. The result of the value assignment is a fuzzy semantic database such as the one presented in Table 3.
Table 3. The possibility value assigned to each semantic for each defined context. The highest value is assigned to the interaction context.
  Lexical   Semantic                           Context (C)    Value (v)
  pen       a writing tool                     office room    0.9
  pen       a livestock's enclosure            barn           0.3
  pen       a portable enclosure for a baby    baby room      0.8
  pen       a correctional institution         building       0.5
  pen       a female swan                      lake           0.2
Let us take an office room as the interaction context between a human user and a robot. In this context, the grade value for the word pen to have the semantic 'a writing tool' is 0.9, while for 'a female swan' it is 0.2; it is very unlikely for the word pen to denote a female swan, thus its possibility value is only 0.2. To resolve the ambiguous word pen, the most possible value ρ is calculated using the max fuzzy set operator. Using the example given in Table 3, ρ is calculated with Equation (4): each v in (4) is replaced by the corresponding plausibility value from the database, which can be represented as in Equation (6).
ρ = max(0.9, 0.3, 0.8, 0.5, 0.2)                                 (6)
The NL processor takes the maximum of all the plausibility values in table T, so 0.9 is taken as the most possible value for the word pen in the context of the office room. The semantic attached to the plausibility value 0.9 is "a writing tool"; consequently, this semantic is taken as the unique semantic of the word. When the most possible semantic has been identified, semantic attachment is conducted. Once the semantic attachment process is completed, the semantics of the ambiguous entities have been resolved, and at this stage all recognized entities are no longer ambiguous. In the following step, the instruction is converted into a task-oriented behavior.
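A compact C sketch of this resolution step is shown below. It hard-codes the fuzzy semantic table for 'pen' and assigns the highest grade to the sense whose typical context matches the interaction context; this deterministic assignment, and all the identifiers, are our own simplification of the dynamic value assignment described above.

#include <stdio.h>
#include <string.h>

/* One candidate sense of a lexical item: its semantic, the context in which
 * that sense is typical, and the plausibility value assigned for the current
 * interaction context (filled in by assign_values below). */
typedef struct {
    const char *semantic;
    const char *typical_context;
    double      value;
} Sense;

/* Candidate senses of "pen", taken from the WordNet-derived table above. */
static Sense pen_senses[] = {
    { "a writing tool",                  "office room", 0.0 },
    { "a livestock's enclosure",         "barn",        0.0 },
    { "a portable enclosure for a baby", "baby room",   0.0 },
    { "a correctional institution",      "building",    0.0 },
    { "a female swan",                   "lake",        0.0 },
};

/* Assign a plausibility value to every sense given the interaction context;
 * here the matching context simply gets the highest grade. */
static void assign_values(Sense *s, int n, const char *interaction_context) {
    for (int i = 0; i < n; i++)
        s[i].value = strcmp(s[i].typical_context, interaction_context) == 0 ? 0.9 : 0.3;
}

/* rho = max(v_i, ..., v_j): pick the sense with the maximum plausibility. */
static const Sense *resolve(const Sense *s, int n) {
    const Sense *best = &s[0];
    for (int i = 1; i < n; i++)
        if (s[i].value > best->value) best = &s[i];
    return best;
}

int main(void) {
    int n = sizeof pen_senses / sizeof pen_senses[0];
    assign_values(pen_senses, n, "office room");
    const Sense *best = resolve(pen_senses, n);
    printf("pen in 'office room' -> %s (value %.1f)\n", best->semantic, best->value);
    return 0;
}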
4 Implementation and Results The proposed approach has been implemented and evaluated experimentally. A chart parser has been developed in the C language using a dynamic programming approach, and 77 grammar rules have been constructed. Each grammar rule consists of a left symbol and several right symbols; the right symbols can be terminal or non-terminal symbols. A structure for grammar rules is created and defined as the rule data type. It consists of three members, each stored as an integer data type. The data structure of the rule is illustrated below.
typedef struct {
    int leftsymbol;                         /* left-hand-side (non-terminal) symbol */
    int NOfRightSymbol;                     /* number of right-hand-side symbols */
    int rightsymbol[MaxNofRightSymbol];     /* right-hand-side terminal/non-terminal symbols */
} Rule;
In parsing an instruction, the parser scans the existing grammar rules and a lexicon, from which an applicable grammar rule is fired. The lexicon is represented as a two-dimensional array of integers. A row element is used to store a terminal symbol such as verb, noun, adjective, adverb or preposition, and a column element is used to store a word. If a word belongs to two kinds of terminal symbols, the word is stored twice; for example, the word 'place' is stored both as a verb and as a noun. In this work, 110 human-robot instructions, as illustrated in Section 3.1, have been created as a test case. For simplicity in parsing, each instruction is limited to a VP NP structure and contains between two and five words. The parser is used to recognize the POS of each word in the user instruction. A parsing process is successful if the POS of every word in the user instruction is recognized; consequently, a parse tree is generated. An example of a generated parse tree is shown in Fig. 3. An ambiguous entity is then tagged.
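As a hedged illustration of this lexicon layout (the symbol codes, array sizes and the sample entries are assumptions for demonstration, not the authors' actual tables), a two-dimensional integer array can hold one POS category per row and one word code per column, with an ambiguous word such as 'place' stored under two rows:

#include <stdio.h>

/* Terminal (POS) symbols encoded as integers, one per lexicon row. */
enum { VERB, NOUN, ADJECTIVE, ADVERB, PREPOSITION, N_POS };

#define MAX_WORDS 16
#define NO_WORD   (-1)

/* lexicon[pos][i] holds the integer code of the i-th word listed under that POS. */
static int lexicon[N_POS][MAX_WORDS];

/* Hypothetical word codes. */
enum { W_CLEAN = 1, W_PLACE = 2, W_DISH = 3, W_PEN = 4 };

int main(void)
{
    for (int p = 0; p < N_POS; p++)
        for (int i = 0; i < MAX_WORDS; i++)
            lexicon[p][i] = NO_WORD;

    /* 'place' is ambiguous between verb and noun, so it is stored twice. */
    lexicon[VERB][0] = W_CLEAN;
    lexicon[VERB][1] = W_PLACE;
    lexicon[NOUN][0] = W_DISH;
    lexicon[NOUN][1] = W_PLACE;
    lexicon[NOUN][2] = W_PEN;

    printf("words stored under NOUN: %d %d %d\n",
           lexicon[NOUN][0], lexicon[NOUN][1], lexicon[NOUN][2]);
    return 0;
}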
Fig. 3. An example of a parsing output. The word 'dish' is recognized as a noun.
A parse tree representation is then converted into predicate calculus. For example, the parse tree shown in Fig. 3 is translated into the predicate clean(dish), in which dish has been tagged as ambiguous. Using the proposed technique described in Section 3, the word 'dish' is resolved into 'food container'; consequently, the unambiguous task-oriented behavior is clean(food_container). Table 4 presents a sample of the experimental results from the test case.

Table 4. A sample of experimental results from the test case

Instruction                 | Context     | Task-oriented behavior
keep the pen in drawer      | office room | keep(writing_tool, drawer)
clean the dish              | kitchen     | clean(food_container)
store it the cabinet        | kitchen     | store(food_container, storage)
resemble the pen            | baby room   | resemble(play_pen)
store the pen in the closet | baby room   | store(play_pen, small_room)
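The conversion from a tagged VP NP parse to a task-oriented behavior can be sketched as below. This is a hypothetical illustration only: the function names and string handling are assumptions, and the real system operates on its integer parse structures rather than strings. It combines the recognized verb with the disambiguated noun semantic to build the predicate form shown in Table 4.

#include <stdio.h>
#include <string.h>

/* Illustrative stand-in for the disambiguation step of Section 3: returns the
 * resolved semantic of a noun in a given interaction context. */
static const char *resolve_in_context(const char *noun, const char *context)
{
    if (strcmp(noun, "dish") == 0 && strcmp(context, "kitchen") == 0)
        return "food_container";
    if (strcmp(noun, "pen") == 0 && strcmp(context, "office room") == 0)
        return "writing_tool";
    return noun;   /* fall back to the literal word if no entry matches */
}

/* Build a task-oriented behavior such as clean(food_container) from a VP NP
 * instruction whose verb and (possibly ambiguous) noun have been recognized. */
static void to_behavior(const char *verb, const char *noun,
                        const char *context, char *out, size_t outlen)
{
    snprintf(out, outlen, "%s(%s)", verb, resolve_in_context(noun, context));
}

int main(void)
{
    char behavior[64];
    to_behavior("clean", "dish", "kitchen", behavior, sizeof behavior);
    printf("%s\n", behavior);   /* prints clean(food_container) */
    return 0;
}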
5 Summary

This paper presents a new framework and approach for handling the ambiguity problem in human-robot instructions. The approach applies fuzzy sets and possibility theory to lexical knowledge and interaction contexts. In this work, the lexical knowledge is obtained from the WordNet database and the interaction contexts are simulated. The proposed approach has been implemented and evaluated experimentally. This work focuses on lexical ambiguity only, where the major concern is ambiguous entities. The obtained results show that the proposed approach is viable.
Cognitive Design of User Interface: Incorporating Cognitive Styles into Format, Structure and Representation Dimensions

Natrah Abdullah, Wan Adilah Wan Adnan, and Nor Laila Md Noor

Faculty of Computer and Mathematical Sciences, Universiti Teknologi Mara, Shah Alam, Malaysia
{natrah,adilah,norlaila}@tmsk.uitm.edu.my
Abstract. In line with the introduction of the online museum, the user interface has become a new medium through which museum collections can be exhibited and promoted more effectively. For promotional purposes, this paper stresses the establishment of a structure for the digital collection and the improvement of the interface design for users. It is critical to pay attention to cognitive design and to ensure that the interface is usable in order to help users understand the displayed information. The user interface design of online museums has commanded significant attention from designers and researchers, but the cognitive design perspective is lacking. In the effort to formalize the design, this report extends initial work on a user interface design framework for understanding cognitive-based user interface design. The individual-differences approach is adopted to explore possible user interface design elements. This study improves the framework by conducting an empirical investigation to test the hypothesized links between cognitive styles and user interface dimensions. The research method uses Field Dependent and Field Independent users as the case study and a web-based survey of online museum visitors. The result of the analysis suggests that cognitive styles do influence user interface dimensions. These design elements carry implications for user interface design in the online environment, for the cognitive design of user interface development, and for the identification of cognitive user interface design for cultural websites that supports users while browsing online museum collections. The effort may contribute towards increasing the usability level of the website.

Keywords: cognitive styles, user interface design.
1 Introduction

In the Internet era, a new type of museum has been introduced to allow more collections to be placed online. With the extensive usage of the Internet nowadays, people no longer face physical space constraints and can get more information about museums from different websites via search engines, or directly visit online museums [1]. The role of museums goes beyond preserving a society's cultural heritage collections [2]. Besser [3] noted that the online museum increases the public's awareness of and access to traditional
physical collections, and serves a promotional function. Online museums become a center for promoting these collections and later persuading users to visit the physical museum [4]. The user interface (UI) has been a new medium for exhibiting and promoting museum collections ever since online museums were introduced. For promotional purposes, Fan et al. [5] stress the establishment of a structure for the digital collection system and the improvement of the interface design for users. The goal of presenting and viewing museum collections effectively for exhibition and promotion should thus be emphasized; otherwise, the promotion of museum collections intended to attract more visitors to museums will suffer. Users easily recognize information that matches their needs [6] and fits their information-processing capability [7]. A study by Fikkert et al. [8] shows that user differences influence UI design. They concluded that failing to address computer users' needs and their capability to process information may result in misinterpretation of the displayed information. Therefore, it is critical to pay attention to user differences and to ensure that the interface is usable in order to help users understand the displayed information [9]. Thus, a study of user differences is significant. By integrating an appropriate user interface into a museum website, museum collections may be presented and viewed effectively. Among the various user differences are cognitive styles, which describe the user's preferred and habitual approach to organizing and representing information [10]. Cognitive styles have been proven to have a significant effect on comprehension during searching and browsing activities [11]. Cognitive styles are also associated with certain personality characteristics that may have important implications for instructional and learning processes [12]. As the major concern of UI design is how to organize and present information to users, cognitive styles become even more significant. In addition, cognitive styles are the most applicable difference because they are independent of intelligence [13], personality and gender [14]. Moreover, cognitive styles are consistent across domains and stable over time [15]. However, cognitive styles are often overlooked when designing a user interface for a museum website. To fill the gap, this study incorporates cognitive styles into the interface elements. The theoretical framework presents the relationship between cognitive styles and user interface dimensions. In this paper, user differences significant to the UI design of cultural websites are first presented and discussed before the design elements of the UI are proposed. A section on the research method explains the sample selection, data collection process and statistical analysis. We then provide and discuss the results of the statistical analysis, followed by the conclusion. Finally, this paper offers some directions for future research.
2 Background Study

From the user perspective, the success of a UI in presenting information via an online system is evaluated through performance and usability testing. The results suggest that UI design can influence user performance and behavior. Thus, a successful UI is one that achieves better performance through the best possible presentation. Through an understanding of user differences, the potential benefits of UI design can be realized and can contribute towards better performance and usability. The aspects of user differences that influence user comprehension can be categorized into three:
Differences in attention: Most users are capable of using either visual or verbal content, but they will prefer one over the other [10]. Imagery and verbal users are grouped according to their preference for content. The imagery style is attracted to ideas, concepts, data and other information associated with images. Imagery users prefer image or visual content; visual content reflects the way in which an imagery user represents knowledge in their mental effort. Conversely, verbal users prefer sound and word content [10]; sound and word content reflects the way in which a verbalizer represents knowledge in their mental effort. Thus, textual and visual formats influence imagery and verbalist users. In addition, Wickens [15] notes that imagery users have no difficulty with the visual format, while verbalizers are satisfied with the textual format.

Differences in accessing information: Holist and serialist users are identified in the discussion on helping users find information [10]. According to their capability to explore their environment, a holist typically adopts a thematic approach and will often focus on several aspects of a topic at the same time. A holist has the capability to view information as a whole but finds it difficult to identify the detailed components of the information. A serialist, however, adopts a step-by-step approach built on clearly identified chunks of information, which are used to link concepts and parts of the topic. In addition, a serialist concentrates on detail and procedure and often conceptualizes information in a linear structure. These approaches lead to the conclusion that a user may access information using either a holist or a serialist strategy. The researchers suggest that serialist users access information using a structured technique and holist users using an unstructured technique. Thus, by focusing on the structure dimension as a way to organize information, the researchers hope to allow users to move freely within the UI.

Differences in action: This group is classified according to the task handling a user engages in with the UI. Active users tend to apply a dynamic strategy. A dynamic strategy encourages users to act and react, and is generally a more reliable indicator of user intent [16]. Inactive users may be attentive and less obtrusive and tend to apply a passive strategy. A passive strategy describes a situation where the user does not interact with the UI and infrequently applies manipulation tasks. Passive strategies are common for public displays such as a cultural website, where users can look and listen while viewing cultural collections. These characteristics of passive and active users are then used to create the representation elements.

By identifying the cognitive background based on the cognitive differences reported in previous research, this study uses these user differences as guidance to identify design elements as part of the localization process of the UI. This is done through the contextualization of the 'Format', 'Structure' and 'Representation' dimensions of the UI, which aims to assist users in effectively viewing museum collections on the museum website. In addition, this paper proposes the contextualization of the 'Content' dimension of the UI, which will not only aid the user in information searching but also prevent the user from experiencing information overload, which occurs when the user deals with too much information.
As part of the main contribution of this research, which is to maximize the user's browsing task strategy, the contextualization of the 'Structure'
dimension of the UI may reinforce user positioning and orientation while searching and browsing for information [19]. Furthermore, the localization of the 'Representation' dimension of the UI is also imposed. Practical design implications can be seen in the next section, where the overall potential effects of cognitive styles on the 'Format', 'Structure' and 'Representation' dimensions of the UI are observed.
3 Related Work

3.1 Impact of Cognitive Overload on Performance

Cognitive Load Theory (CLT), as shown in Figure 1, relates mental load and mental effort to their impact on user performance [17]. While mental load relates to the UI design, mental effort refers to the cognitive capacity allocated to accommodate UI demands. Cognitive overload occurs when the user can no longer process information in the quantities or at the speed at which it is presented. Thus, to minimize cognitive overload, Feinberg and Murphy [18] used a consistent page layout, organized information and added audio/visual elements. In addition, they found that CLT is consistent with general web design principles for the effective design of web-based instruction.
Fig. 1. Cognitive Load Theory, adapted from Sweller [17]: mental load (task-based dimension) and mental effort (learner-based dimension) determine cognitive load, which affects performance
Another theoretical account relates cognitive overload to understanding. The relationship between cognitive overload and understanding is explained by Nikunj [19], who states that when cognitive overload ensues, understanding degrades rapidly. In addition, a theoretical explanation of user understanding by Thuring et al. [20] stresses the limitations of human information processing. According to them, the lower the resource capacity available for understanding, the less likely a person is to understand the information well. They consider mental models a major factor related to user understanding. Therefore, mental models have implications for users' understanding while browsing and searching a website. Performance is determined by how well the UI supports the formation of quick and clear mental models that enable understanding. Thus, it can be concluded that performance provides a sign of cognitive overload. The impact of cognitive overload on performance has been shown through field and experimental studies, both of which show reduced performance [21] [22]. The performance is measured using
various dependent variables. In this study, performance is measured by the number of correct answers and the time taken to complete tasks. This research provides a new study of the effects of cognitive load on the performance of a UI for searching and browsing museum collections. Through an experimental study, this research provides more empirical data on the relationship between cognitive overload and performance. The results of the experiment will provide strong additional evidence of cognitive overload effects on UI design, which may explain why UI is so poorly understood in practice.

3.2 Research on User Browsing Patterns

A review of empirical studies on browsing patterns shows that the majority of works concentrated on log file data [23] [24]. Relatively few studies examined user navigation using eye-tracking data. Of those that did consider this factor, many chose to focus on reading style and learning style. A review of empirical studies on user browsing patterns suggests that most work concentrated on browsing using direct manipulation tasks. Effective display designs must provide all the necessary data in a proper representation to explore a web site. This paper is interested, in particular, in the identification of browsing styles. The researchers propose an approach to perceiving the user's activity on a museum website in order to identify the user's styles from observable indicators related to their browsing path and interactions. Their interest lies in detecting users' browsing styles through the automatic analysis of behaviors, by collecting and interpreting information on users' activities using eye-tracking data.

3.3 Research on Individual Characteristics and UI Design Use

In UI design, it is likely that redesigning the interface will be an effective method of dealing with individual differences. Martin et al. [25] discussed the three approaches of Egan and Gomez to dealing with individual differences. First, it is necessary to assess the area of the differences; this involves considering what to determine and how to determine it. Once differences have been observed, the essential differences have to be isolated from confounding factors. Thus, there is a need to consider the features of the UI, the features of the users, and the stability of those features. When the important features have been identified, it is then necessary to accommodate them. The researchers' experimental work with cognitive user interfaces embodies these steps: they assess, identify and isolate individual differences that have a significant impact on human-computer interaction. Some of these differences can be accommodated through the format, structure and representation dimensions of the UI to improve the UI design. Studies on UI design in HCI show a wide range of variation in purpose. While the cognitive design of Curl et al. [26] provides proper presentation for database tasks, Clark and Mayer [27] have reinforced important principles for e-learning. Both studies included individual differences in their investigations and discussed the impact of their designs on performance. As a result, Clark and Mayer proposed basic design principles: meaningful groupings of items with labels
suitable to the user's knowledge, consistent sequences of groups, and orderly formats, all of which support professional users. This study, however, can be extended to museum users for promoting museum collections, whereby understanding a museum collection helps attract users to later visit the physical museum. In addition to systematic data collection, user behavior is monitored, with the aim of identifying any relationship between user behavior and the UI elements used. The UI of the National Museum of Malaysia's website is used, designed according to the identified elements, with the goal of persuading users to browse museum collections.
4 Cognitive-Based User Interface Elements

This paper integrates and extends the above findings by emphasizing the UI design aspects. Instead of focusing solely on the relevant dimensions of the UI, this paper takes a wider approach by focusing on the elements of the UI dimensions that can provide users with better insight and allow greater use of supportive UI design. The major interest is the development of UI dimensions that display information designed primarily for its cognitive impact on the user.
Fig. 2. Cognitive Framework of Understanding UI Dimensions (adapted from Natrah et al. [28]): the Multiple Resource Theory dimensions (code of processing, visual channels, perceptual modality) are linked through aiding concepts to the Format, Structure and Representation UI dimensions, which correspond to the human differences of holist/serialist, imagery/verbalist and active/passive users
A theoretical framework of UI design was proposed in a previous study [28]. The framework has three dimensions: format, structure and representation. It is shown in Figure 2. The dimensions are based on Multiple Resource Theory. The Format Dimension is defined as the mode of presentation of content in the UI and is used to gain users' attention. Users tend to remember the content of a UI if the format is informative and able to draw their attention. The Structure Dimension communicates to users how to proceed through the UI. The objective of the structure dimension is to make information on websites easy to find and to avoid common browsing
errors, such as getting lost while looking for information. The Representation Dimension is intended to deliver experience and increase usefulness while interacting with the UI. To extend the format, structure and representation dimensions, cognitive styles are incorporated into the dimensions. Riding and Rayner [10] provide descriptions of groups of cognitive styles and discuss the assessment carried out through a series of empirical studies [10]. They concluded that imagery-verbal, serialist-holist and active-passive user differences have significant implications for designing a UI. Users browsing and searching in a web environment should be able to form quick and clear mental models, which can be achieved through UI design elements that support the process of mental-model formation. The design dimensions and elements of a cognitive-based UI for the web environment are outlined below:

Textual or Visual: The study extends the Format Dimension by considering visual and textual formats of content. Textual and visual formats influence imagery and verbalist users. The Format Dimension is designed to attract users' attention in order to encourage them to browse museum collections and museum information presented in both textual and visual formats.

Organized, not unorganized: The structure of information content is related to the arrangement of that content [20]. This study extends the Structure Dimension by considering two elements, structured and unstructured. Structure addresses the interrelationships and enables users to browse through the UI. The researchers have adopted a navigation technique consisting of two basics: traveling, which is the control or motion of the user's viewpoint in the environment, and way finding, where the user determines the path based on knowledge of the virtual environment [9]. These issues are combined into the structure and require a good understanding when designing an effective information organization. Traveling, however, is typically a basic task that enables more extensive interaction with the virtual environment. To reduce users' cognitive overhead on structure, Thuring et al. [20] and Storey et al. [29] suggest providing an overview, table of contents, summaries or headings for holist users to assist them with information searching.

Interactive, not static: The Representation Dimension is defined as the dimension of the UI for task handling [28]. In a web environment, task handling involves interaction between user and interface [30]. There are two types of interaction, based on passive and active users. Active users enjoy the element of interactivity because they are flexible in manipulating the UI, while passive users are rarely involved in the interaction. This study extends the Representation Dimension by proposing two elements, namely passive viewing interaction and active viewing interaction, to facilitate a flexible view of the UI. Direct manipulation elements, in which the machine is viewed as a simple tool and as a passive collection of objects waiting to be manipulated, are suitable for active users. Passive viewing interaction is where the user does not interact with the display and manipulates it infrequently during use. Passive viewing applications are common for public displays and small displays. Icons are effective for quickly
communicating information on a small display. This corresponds to the requirement for a small space for presenting and manipulating cultural collections on the cultural website. In addition, a small display may reduce processing activity and thus reduce the need for a high-performance computer processor.
5 Experimental Method

The experimental research approach is adopted in this study. The experiment is carried out in a controlled environment where the independent variable is manipulated. The objective of the study is to measure user performance on cognitive-based UI dimensions; more specifically, to examine the influence of UI organization, UI format and UI representation on completion time and accuracy for searching and browsing tasks. Cognitive style and gender are used as dimensions of individual characteristics. Data are collected using a single-factor design, in which one independent variable is manipulated at a time to test which factor contributes to better performance. A within-subjects design is used, in which each subject is assigned to all treatment conditions: in the evaluation of the UI design, the same subjects used the system under six different treatment conditions. In an ideal experiment only the independent variable should vary from condition to condition. In reality, other factors vary along with the treatment differences. These unwanted factors are called confounding variables, and they usually pose serious problems if they influence the behavior under study, since it becomes hard to distinguish between the effects of the manipulated variable and the effects due to confounding variables. As indicated by [32], one way to control a potential source of confounding is to hold it constant so that it has the same influence on each of the treatment conditions; in this study, the testing environment is held constant. As part of the methodology, the experimental design, participants, tasks, interfaces, and experimental tools are discussed in the next few sections.

5.1 Experimental Design

To measure performance issues concerning structure, format and representation, an experiment was conducted on a group of participants. They examined different types of interfaces and filled out a detailed questionnaire for each type of interface. Six different types of interface were created based on input from the UI framework. The within-subject experimental design, summarized in Table 1, applies a reductionist approach to determine the number of conditions to be tested; with this method, six conditions were tested. We compared the textual and visual UI formats, the structured and unstructured UI organizations, and UI representation with and without interaction. Thus, the independent variables were structured, unstructured, textual, visual, three-dimensional without interaction (3DL) and three-dimensional with interaction (3DH). The dependent variables were the completion time and the number of correct answers, recorded in a log file. Fig. 3 shows the experimental design adopted for this study.
Fig. 3. Experimental design: the Format (visual/textual), Structure (organized/unorganized) and Representation (interactive/static) dimensions, user characteristics (cognitive style) and task (navigating, way finding) are related to performance, measured as completion time
This study used the design illustrated in Fig. 3, which is adapted from Cognitive Fit Theory. User characteristics are measured to see the influence of user differences on the UI dimensions. There were five independent variables: Structure, Format, Representation, User Characteristics, and Task. The dependent variable was completion time, as shown in Fig. 3; completion time is the time taken to complete a series of tasks.

5.2 Participants

This experiment was conducted with website users who have experience of visiting museums. Thirty museum visitors agreed to participate in this laboratory study. The participants were volunteers, with roughly equal numbers of Field Dependent (FD) and Field Independent (FI) users. They were familiar with Web browsing.

5.3 Experimental Hypotheses

These hypotheses cover the effects of the experimental variables. The research hypotheses tested in this experimental study are listed below. Performance is used to test whether the UI dimensions are beneficial during browsing and searching activities. The benefit can be observed by comparing task performance in structured and non-structured conditions, in visual and textual format settings, and with and without interaction features. Thus, the following hypotheses are constructed and tested:

• UI Organization and Performance
H10: There is no significant difference in Performance across UI organizations.
H1a: There is a significant difference in Performance across UI organizations.
• UI Format and Performance
H20: There is no significant difference in Performance across UI Formats.
H2a: There is a significant difference in Performance across UI Formats.

• UI Representation and Performance
H30: There is no significant difference in Performance across UI Representations.
H3a: There is a significant difference in Performance across UI Representations.

5.4 Tasks and Procedure

This study seeks to determine whether certain features of an interface design are beneficial to certain types of users. To determine this, the study tests and compares the performance of six interface designs and investigates the interaction effect of the Format, Organization and Representation dimensions. The questions in the computer program are related to its content, which concerns cultural tourism information. Participants were asked about museum information, directions to the museum and the collections available in the museum, as well as some demographic information about themselves. Each participant was tested in two sessions. In the first session, a ten-minute introduction was given to the participants prior to their tasks. Each participant was then asked to complete the Group Embedded Figures Test (GEFT) [31] to determine their cognitive style; administration of the GEFT lasted twenty minutes. Participants were also asked to provide demographic information (age, gender, etc.) for research purposes only. Answers were recorded via a paper-based answer script, and each participant's unique ID number was used to code their answers and selections. The first session lasted about forty-five minutes. In the second session, each participant performed a series of tasks on the six interfaces. For each task, subjects were assigned to the interfaces in random order. All participants were required to finish the tasks in one hour. After working with the interfaces, participants were given the task questions for each interface design type. The task was to search for the correct answers, as accurately as possible, and to complete the task as fast as they could; they were required to mark the correct answer on the computer. Each task was allocated six minutes. After completing the series of tasks for each interface design type, participants were required to answer a few questions. We also asked participants a few questions at the end of the experiment.

5.5 The User Interfaces

In the experiment, participants used six different interface designs (1, 2, 3, 4, 5, and 6) to perform a series of tasks. A web-based application on the Malaysia Museum Directory was used in this experiment. The interfaces are shown in Fig. 4.
Fig. 4. User interfaces for the experiment: a) Interface 1 (structured), b) Interface 2 (unstructured), c) Interface 3 (textual), d) Interface 4 (visual), e) Interface 5 (2D), f) Interface 6 (3D)
5.6 Analysis

A mixed between-within subjects analysis of variance was conducted to compare subjects' completion times with the unstructured and structured UIs. The within-subject effect was measured. There was a significant effect for completion time with a large effect size [Wilks' Lambda = .588, F(1, 26) = 18.20, p < .05, multivariate partial eta squared = .32], while the change in completion time with the structured UI in the FD group was not significantly different from the change in the FI group. Specifically, there was a significant rise in completion time in both the FI and FD groups. These findings indicate a large effect size when participants were encouraged to use the structured UI, showing that using the structured UI accelerates the expected influence on completion time. A between-subjects analysis of variance was conducted to explore the impact of cognitive style on completion time, with subjects divided into two groups according to their cognitive styles (Group 1: Field Dependent; Group 2: Field Independent). There was no significant difference in completion time between the two groups [F(1, 26) = 1.31, p > .05, multivariate partial eta squared = .05], and the between-subjects effect size for completion time was moderate. These findings indicate that FD participants were faster than FI participants with both the structured and the unstructured UI.

A mixed between-within subjects analysis of variance was conducted to compare subjects' completion times with the textual and visual UIs. There was no significant effect for completion time [Wilks' Lambda = .946, F(1, 26) = 1.45, p > .05, multivariate partial eta squared = .054]. The results show that there was no change in completion time across the two UIs: when the format at which the UI was measured is ignored, user completion time with the visual UI was not significantly different from that with the textual UI. A between-subjects analysis of variance was conducted to explore the main effect of cognitive style. There was a significant difference in completion time between the two groups [F(1, 26) = 4.72, p < .05, multivariate partial eta squared = .002], and the between-subjects effect size for completion time was small. These findings indicate that FI participants were faster than FD participants with the 2D or 3D UI.

5.7 Discussion

This study shows the possibility of a cognitively adapted UI by connecting cognitive processes and UI components. While past studies [11][32] have shown connections between cognitive styles and interface design mostly related to format, accessibility, structure, interaction flow and menu structure, the results of this study indicate that the visual, structured and 3D dimensions are beneficial to certain types of users. Therefore, these features play an important role in designing the user interface. There is a difference of approach between the two groups toward format. Participants with an FD categorization style performed faster in the visual format; therefore, for FD users, who are known to be more dependent on cues, it may be better to work in the visual format. For example, the system can provide additional features for such users while they are viewing information. Participants with an FI categorization style performed better with and preferred the textual UI; therefore, for FI users it may be better to include textual information to help them view visual information. There is no difference of approach between the two groups toward the structured and 3D interfaces: participants performed the tasks better when the structured and 3D interfaces were available.
Therefore, the 3D and structured dimensions are needed to help design a usable UI. However, the experiment had limitations. The subject groups were mostly museum visitors in their early twenties, who adjust to change more easily, so it might be difficult to find clear differences between individuals. We will therefore need a sufficiently large sample spanning diverse generations to ensure the validity of the data. Future research will also combine the requirements of FD and FI users toward interactivity in an interface and will further experiment on the usability of the user interface design.
6 Conclusion

Designing the UI for museum websites is crucial to make users understand and appreciate the cultural collections in museums. This paper aims to understand and establish the relationship between cognitive styles and the existing framework
of understanding UI. The goal is to understand what effects cognitive perspectives may have on the UI, so that the understanding and the theoretical propositions highlighted can bring valuable knowledge from a known domain into the UI domain. An integrated framework combining these perspectives is presented in Figure 2 as part of the theory-building process of the UI design framework. This framework is formed using an inductive reasoning research method, performed by conducting a literature analysis of related work on UI design and cognitive design elements. There are several important implications of this research for future research and practice. First, the researchers use existing concepts of cognitive styles to understand UI design for the online environment. Second, they integrate user-difference perspectives related to cognitive styles that later offer views for UI design development. This is done by using the propositions suggested by cognitive style groups to understand the different cognitive backgrounds; this understanding is then used as part of the understanding framework of cognitive UI design. In addition, among the implications and contributions of this research is the identification of cognitive UI design for cultural websites that supports users when browsing museum collections. The effort may contribute towards increasing the usability level of the website. The next study will consider the design elements mentioned in order to evaluate user performance and usability for the UI of a cultural website. Evaluation will take place using an experimental approach. The researchers will apply a reductionist approach in which users will be asked to use seven UIs with different elements, and the researchers will monitor their performance separately. This approach may be useful to support much-needed research oriented towards engaging digital content and a better understanding of various website users.
References

1. Buhalis, D., O'Connor, P.: Information Communication Technology Revolutionizing Tourism. Tourism Recreation Research 30(3), 7–16 (2005)
2. Kateli, B., Nevile, L.: Interpretation and Personalisation: Enriching individual experience by annotating on-line materials. In: Trant, J., Bearman, D. (eds.) Museums and the Web 2005: Proceedings. Archives & Museum Informatics, Toronto, March 31 (2005), http://www.archimuse.com/mw2005/papers/kateli/kateli.html
3. Besser, H.: The Transformation of the museum and the way it's perceived. In: Jones-Garmil, K. (ed.) The wired museum, pp. 153–169 (1997)
4. Watkins, J., Russo, A.: Digital Cultural Communication: Designing Co-Creative New Media Environments. In: Candy, L. (ed.) Creativity & Cognition: Proceedings 2005, ACM SIGCHI, pp. 144–149 (2005)
5. Cheng-Wei, F., Jun-Fu, H., Rong-Jyue, F.: Development of a learning system for a museum collection of print-related artifacts. Electronic Library 26(2), 172–187 (2008)
6. White, M.D., Ivonen, M.: Questions as a factor in web search strategy. Information Processing and Management 37, 721–740 (2001)
7. Ren, F.: Affective Information Processing and Recognizing Human Emotion. Electron. Notes Theor. Comput. Sci. 225, 39–50 (2009)
8. Kim, J.H., Lee, K.-p., You, I.K.: Correlation between cognitive style and structure and flow in mobile phone interface: Comparing performance and preference of Korean and Dutch users. In: Aykin, N. (ed.) HCII 2007. LNCS, vol. 4559, pp. 531–540. Springer, Heidelberg (2007)
9. Fikkert, W., D'Ambros, M., Bierz, T., Jankun-Kelly, T.J.: Interacting with visualizations. In: Kerren, A., Ebert, A., Meyer, J. (eds.) GI-Dagstuhl Research Seminar 2007. LNCS, vol. 4417, pp. 77–162. Springer, Heidelberg (2007)
10. Riding, R., Rayner, S.G.: Cognitive styles and learning strategies. David Fulton, London (1998)
11. Chen, S.Y., Magoulas, G.D., Macredie, R.D.: Cognitive styles and users' responses to structured information representation. International Journal Digital Library 4, 93–107 (2004)
12. Sternberg, R.J., Grigorenko, E.L.: Are cognitive styles still in style? American Psychologist 52(7), 700–712 (1997)
13. Atkinson, S.: Cognitive Styles and Computer Aided Learning (CAL): Exploring Designer and User Perspectives. In: PATT-11 Conference, Haarlem, The Netherlands, pp. 2–14 (2001)
14. Ali Reza, R., Katz, L.: Evaluation of the reliability and validity of the cognitive styles analysis. Personality and Individual Differences 36, 1317–1327 (2004)
15. Wickens, C.D.: Multiple Resources and Mental Workload. Human Factors 50(3), 449–455 (2008)
16. Oviatt, S.: Human-Centered Design Meets Cognitive Load Theory: Designing Interfaces that Help People Think. ACM Press, New York (2006)
17. Sweller, J.: Cognitive load theory, learning difficulty, and instructional design. Learning and Instruction 4, 295–312 (1994)
18. Feinberg, S., Murphy, M.: Applying cognitive load theory to the design of web-based instruction. In: Proceedings of SIGDOC, pp. 353–360 (2000)
19. Nikunj, D., Quible, Z., Wyatt, K.: Cognitive design of home pages: an experimental study of comprehension on the World Wide Web. Information Processing and Management 36(4), 607–621 (2000)
20. Thuring, M., Hannemann, J., Haake, J.: Hypermedia and cognition: designing for comprehension. Communications of the ACM 38(8), 57–66 (1995)
21. Lewis, D., Barron, A.: Animated demonstrations: Evidence of improved performance efficiency and the worked example effect. In: Proceedings of HCI International, pp. 247–255 (2009)
22. Moody, D.L.: Cognitive load effects on end user understanding of conceptual models: An experimental analysis. In: Benczúr, A.A., Demetrovics, J., Gottlob, G. (eds.) ADBIS 2004. LNCS, vol. 3255, pp. 129–143. Springer, Heidelberg (2004)
23. Chen, S.Y., Magoulas, G.D., Dimakopoulos, D.: A flexible interface design for web directories to accommodate different cognitive styles. JASIST 56(1), 70–83 (2005)
24. Kim, K.-S.: Information seeking on the Web: effects of user and task variables. Library and Information Science Research 23(3), 233–255 (2001)
25. Egan, D.E.: Individual Differences in Human-Computer Interaction. In: Helander, M. (ed.) Handbook of Human-Computer Interaction. Elsevier Science Publishers B.V., New York (1988)
26. Curl, S.S., Olfman, L., Satzinger, J.W.: An investigation of the roles of individual differences and user interface on database usability. Database for Advances in Information Systems 29(1), 50–65 (1998)
27. Clark, R.C., Mayer, R.E.: e-Learning and the Science of Instruction. John Wiley, Hoboken (2002)
28. Natrah, A., Wan Adilah, W.A., Nor Laila, M.N.: Towards A Cognitive-Based User Interface Design Framework Development. Springer, Heidelberg (2011)
29. Storey, M.A.D., Fracchiat, F.D., Muller, H.A.: Cognitive Design Elements to Support the Construction of a Mental Model during Software Visualization. The Journal of Systems and Software 44(3), 171–185 (1999)
30. Shneiderman, B., Plaisant, C.: Designing the User Interface: Strategies for Effective Human-Computer Interaction, 4th edn. Addison-Wesley, Reading (2004)
31. Witkin, H.A., Oltman, P.K., Raskin, E., Karp, S.A.: A Manual for the Group Embedded Figures Test. Consulting Psychologists Press, Palo Alto (1971)
32. Kim, J.H., Lee, K.-p., You, I.K.: Correlation between cognitive style and structure and flow in mobile phone interface: Comparing performance and preference of Korean and Dutch users. In: Aykin, N. (ed.) HCII 2007. LNCS, vol. 4559, pp. 531–540. Springer, Heidelberg (2007)
The Impact of an Online Environment on Reading Comprehension: A Case of Algerian EFL Students

Samir Zidat and Mahieddine Djoudi
Computer Science Department of Batna, UHL University, Algeria
[email protected]
Laboratoire SIC et Equipe IRMA, UFR Sciences SPMI, Université de Poitiers, Téléport 2, Boulevard Marie et Pierre Curie, BP 30179, 86962 Futuroscope Chasseneuil Cedex, France
[email protected]
Abstract. In this study, the effects of an online reading comprehension environment were investigated using a statistical analysis based on a sub-sample of participants enrolled in the fifth year of the Computer Science Department of Batna University (Algeria). The students' native language was Arabic, and they were learning English as a second foreign language. The two research questions of this study are: 1. Are there any differences in students' reading between the paper-and-pencil and web modes? 2. Are there any differences in students' reading between the individual and collaborative modes? The paper shows that working with our online learning environment significantly improved the students' motivation and positively affected higher-level knowledge and skills.

Keywords: reading skills, reading comprehension, evaluation, e-learning.
1 Introduction

This paper describes the initial findings of a case study that investigates the effects of online learning on English as a Foreign Language (EFL) university students' English reading comprehension. The online learning environment, called ReadFluent (RF), was developed for the Computer Science Department students' use of English at Batna University. The current English language teaching situation at the university level for Computer Science students does not lead to good results [1]. Graduate students are required to read and understand written documents in English related to their different fields of study. The use of a language is based on four skills. Two of these skills belong to the comprehension domain: oral and written comprehension. The other two concern oral and written expression. All four macro English language skills are greatly needed, but reading skills are considered the most important.
The difficulties encountered by Batna Computer Science Department students lie in the non-existent courses and the inadequacy of the teaching materials and techniques. Only one and a half hours per week during four academic years are assigned to the English course. The amount of student-teacher interaction is minimal, and individualized instruction is irrelevant. This has a bad effect on the teacher-student relationship; students lose interest and their motivation diminishes accordingly. If the question of time is beyond the teacher's control, the teacher must look into other ways to supplement what he or she does in the classroom. These factors, put together, led to a shift to the communicative approach to foreign language education, namely web-assisted language learning [2]. Therefore, an online reading environment specially designed and geared to students' needs is recommended as an urgent need for Computer Science Department students at Batna University. In addressing reading difficulties, the selection of an appropriate Web-based application for reading comprehension is one of the most essential aspects [3]. The development of this application involves a serious commitment to understanding the different features of this medium and the ways it can be used most advantageously to impart learning. Such an environment should include the functionalities necessary to facilitate and improve reading. It is very important to know how to read the different types of material we encounter. One pertinent method students use when they need to understand the details and maximize retention is the active method: it involves highlighting important points and even making short notes in the margins. This environment should also be able to assist the teacher in tutoring and administration. Such a tool could be very helpful for poor readers in regaining their self-confidence, and it incites teachers to be active agents of educational change. This paper is organized as follows. In Section 2, we start with the presentation of online reading comprehension. The last section is devoted to the description and analysis of an experimental study that we conducted in our Computer Science Department with students preparing for the engineering degree. Reading comprehension was measured by performance on a set of comprehension skills. The findings of the experiment, analyzed using ANOVA, reveal that, from the practical side, a true integration of the Web into English language pedagogy can help poor students become fluent and independent and incite teachers to be active agents of educational change. The paper concludes with some suggestions and recommendations necessary for the implementation of online reading comprehension in Algerian universities.
2 Related Work

There has been much work on online reading focusing on new interaction techniques [4] and [5] to support practices observed in fluent readers [6] and [7], such as annotation, clipping, skimming, fast navigation, and obtaining overviews. Some work has studied the effect of presentation changes, such as hypertext appearance [8], on reading speed and comprehension.
The findings support what other studies have found in terms of the positive influence of online environments on students' performance [9], [10], [11] and [12], though an online environment cannot be a substitute for traditional instruction. The characteristics of an online environment can increase students' motivation, create highly interactive learning environments, provide a variety of learning activities, offer independence to users in the learning process, improve learners' self-confidence, and encourage learners to learn in a better way with technology-based tools. Nowadays, most universities are asked to enhance their personnel's skills in order to use new technologies in their teaching activities efficiently [13]. One of these modern technologies is the online learning environment, a software application used to integrate technological and pedagogical features into a well-developed virtual learning environment [14], [15] and [16]. Students have easy access to course materials, take online tests and collaborate with classmates and the teacher.
3 Online Reading Comprehension

Although comprehension is consistently acknowledged as the final objective of reading, comprehension problems among students continue [17]. Learning does not involve some kind of obscure transfer of knowledge from one source to another, but rather consists of the active role taken by the learner in dealing with the information to be used [18]. The central tenet of learning theories is that learning occurs inside a person [19]. Good readers are most often strategic readers; that is, they use a number of comprehension strategies to get meaning from text. Comprehension strategies are conscious plans or procedures under the control of a reader, who makes decisions about which strategies to use and when to use them [20]. One of the most difficult problems facing Batna University teachers today is that many students come from secondary schools without the indispensable knowledge, skills, or disposition to read and comprehend print or electronic text. Reading comprehension can be defined as "the ability to obtain meaning for some purpose" [21]. Teachers need to develop skills that encourage reading comprehension, guarantee content learning through reading, and deal with the differences in comprehension skills that their students present. Each student will have different strengths to build on and different weaknesses to overcome. To improve learners' reading ability, effective reading strategies and supportive tools are being widely considered [22]. How to take advantage of online resources to facilitate language learning is a question considered by many researchers. While teachers need to use the capabilities of new technology to facilitate learning processes, students are encouraged to improve their learning through computer and online activities [23]. It is no longer a question of whether to take advantage of these electronic technologies in foreign language instruction, but of how to harness them and guide our students in their use [24]. According to Pei-Lin Liu, for instance, a computer-assisted concept-mapping reading strategy improved poor readers' reading ability and narrowed the reading proficiency gap between good and poor readers [25].
An online reading comprehension environment should offer an opportunity to help students become active participants in the learning process. They actively interpret and organize the information they are given, fitting it into prior knowledge. Teachers are no longer the only source of information; the teacher's role changes from information giver to facilitator, counselor, resource and technology manager, and mediator for the students [26]. The question then is: how can an online reading environment help poor readers? We suggest that the answer may lie in giving students a good view of the mental activities that good readers engage in to achieve comprehension. Successful reading comprehension requires the efficient coordination and integration of a number of underlying cognitive processes, which include decoding abilities and the use of linguistic and relevant previous knowledge [27] and [28]. An important goal is to learn how to become self-regulated, active readers who have a variety of strategies to help them comprehend. Common online comprehension activities for practice are exercises, including drills and comprehension skill-based quizzes. While doing these activities, students become immersed in the comprehension situation and react much as they would in real life, which leads to learning by doing. One of the benefits of online reading is that students can access e-learning anywhere and at any time of day; its "just in time, any time" approach makes the learning process ubiquitous. Online reading gives each learner instant and more interactive feedback than traditional media, and provides an environment where more autonomous self-study reading comprehension can be realized much more efficiently and effectively. The degree to which online reading is useful for comprehension depends fundamentally on how well the materials match the needs of the students and their ability level. The online reading environment should give the student control over material selection and the pace of progress; the student is willing to, and capable of, taking charge of his or her own learning.
4 Software Architecture

When the teacher's daily tasks are observed closely, we find that he or she carries out a number of different activities: helping students acquire specific reading skills such as identifying main ideas, factual recall, vocabulary, sequencing, making inferences, and drawing conclusions; deciding which kinds of knowledge and strategy can be applied to understand a given text; supporting students in the process of reading comprehension by providing suggestions and corrections; and, finally, analyzing students' comprehension and explaining the different types of errors. ReadFluent is designed to give teachers "flexibility" and "generality" in order to meet individual students' needs and abilities. Students can access the environment both at the university and outside it.
Fig. 1. ReadFluent architecture
We were not interested only in students' satisfaction and the social aspects of their interaction; teachers and administrators were also included. Meeting the social, intellectual, and technical interactions of all users of a learning tool should be a basic guideline for designers. The satisfaction of users of computer-based information environments is very important for the developers and administrators of these environments [29], because the success of computer-based environments is generally associated with user satisfaction [30]. Satisfaction is one of the factors that affect the usability of the system, which in turn directly affects users' performance. When we developed ReadFluent, we took into consideration three major satisfaction metrics:
1. System quality: easy to use, user friendly, stable, and fast.
2. Information quality: complete, well organized, clearly written, and up to date.
3. Service quality: prompt, knowledgeable, and available.
ReadFluent is a web-based application with server-side processing of intensive requests. The application provides the three principal users (teacher, learner, and administrator) with a tool whose primary functionality is the availability of, and remote access to, pedagogical content for reading comprehension, personalized learning, and distance tutoring. ReadFluent not only allows the resources made available online to be downloaded (using a standard browser); it also provides a real environment where all users can meet and satisfy their personal learning needs.
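ReadFluent itself is implemented in PHP (see Sect. 5 of this architecture description); purely as an illustration of the role-based, server-side access pattern described above, the following Python/Flask sketch shows how content access could be gated by user role and how each learner action could be logged for later review by the teacher. All routes, tokens, and data structures below are hypothetical and are not taken from the paper.

# Hypothetical sketch of role-gated, server-side access to reading material.
from datetime import datetime
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

MATERIALS = {1: {"title": "Computer", "text": "..."}}   # pedagogical content (illustrative)
ACTIVITY_LOG = []                                       # stands in for the database action log

def role_of(token):
    # Toy lookup mapping an access token to one of the three principal roles.
    return {"t-1": "teacher", "l-1": "learner", "a-1": "administrator"}.get(token, "")

@app.route("/materials/<int:mid>")
def read_material(mid):
    role = role_of(request.args.get("token", ""))
    if role not in ("teacher", "learner", "administrator"):
        abort(403)                                      # unknown users get no access
    if mid not in MATERIALS:
        abort(404)
    # Every access is logged so performance and task timing can be reviewed later.
    ACTIVITY_LOG.append({"role": role, "material": mid,
                         "at": datetime.utcnow().isoformat()})
    return jsonify(MATERIALS[mid])

@app.route("/materials/<int:mid>", methods=["PUT"])
def update_material(mid):
    if role_of(request.args.get("token", "")) != "teacher":
        abort(403)                                      # only teachers may edit content
    MATERIALS[mid] = request.get_json()
    return jsonify({"status": "saved"})

if __name__ == "__main__":
    app.run()

In ReadFluent itself, the equivalent logic sits behind PHP scripts and a MySQL database, as described below; the sketch only makes the role separation and the action logging explicit.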
Fig. 2. Gap-filling sample
Conceptually, the relationship among the three principal components of the environment can be viewed as in Fig. 1. To use ReadFluent, learners should be able to choose or type basic sentences. As shown in Fig. 2 and Fig. 3, learners are guided step by step through the learning process, with personalized tips and suggestions provided throughout. Students may save their work at any time and resume it later, and teachers can access it easily. The learning environment developed for this study is a web application written in the PHP 5.2.0 scripting language and served by an Apache 2.2.3 web server. A MySQL 5.2.27 database connected to the learning environment contains the learning materials and records student actions such as performance and task timing. ReadFluent works on two operating systems (Windows, Linux) and supports many popular web browsers; the recommended browsers are Firefox 2.0+ and Internet Explorer 6 or 7. Following the teacher's instructions, ReadFluent offers students two scenarios (a sketch of the underlying rule follows the list):
1. Tasks accompanied by the text (students can view the supporting text as many times as they wish).
2. Tasks without the accompanying text (students can access the supporting text only once, at the beginning).
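The difference between the two scenarios amounts to a visibility rule for the supporting text. A minimal, hypothetical sketch of such a rule follows; the class and field names are ours, not the paper's.

# Hypothetical sketch: deciding whether the supporting text may be (re)displayed.
from dataclasses import dataclass

@dataclass
class ReadingSession:
    scenario: str          # "with_text" or "text_once", chosen by the teacher
    text_views: int = 0    # how many times the learner has opened the text

    def may_show_text(self) -> bool:
        if self.scenario == "with_text":
            return True                      # scenario 1: always available
        return self.text_views == 0          # scenario 2: only the first time

    def show_text(self, text: str) -> str:
        if not self.may_show_text():
            raise PermissionError("The supporting text was already shown once.")
        self.text_views += 1
        return text

session = ReadingSession(scenario="text_once")
print(session.show_text("Supporting passage ..."))   # allowed once
print(session.may_show_text())                       # False afterwards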
The tutor can formulate several forms of questions (a scoring sketch follows the list):
1. Multiple-choice questions (MCQ),
2. Reordering the given sentences,
3. Gap filling, and
4. Searching for the relevant information.
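Purely as an illustration, the four question forms can be represented as small records with a per-form scoring rule. Everything below (field names, partial-credit choices) is an assumption; the paper does not describe ReadFluent's internal representation.

# Hypothetical representation of the four question forms and their automatic scoring.
def score(question: dict, answer) -> float:
    kind = question["kind"]
    if kind == "mcq":                       # multiple choice: exact option match
        return float(answer == question["correct"])
    if kind == "reorder":                   # sentence reordering: exact order match
        return float(list(answer) == question["correct_order"])
    if kind == "gap_fill":                  # gap filling: fraction of gaps filled correctly
        expected = question["gaps"]
        hits = sum(a.strip().lower() == e.lower() for a, e in zip(answer, expected))
        return hits / len(expected)
    if kind == "info_search":               # information search: required keywords present
        found = sum(k.lower() in answer.lower() for k in question["keywords"])
        return found / len(question["keywords"])
    raise ValueError(f"unknown question kind: {kind}")

mcq = {"kind": "mcq", "correct": "b"}
gap = {"kind": "gap_fill", "gaps": ["network", "router"]}
print(score(mcq, "b"))                      # 1.0
print(score(gap, ["Network", "switch"]))    # 0.5

Partial-credit rules of this kind could then be scaled to the 20-point marking scheme used in the experiment described below.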
5 Experimentation

Today, Algerian universities are expected to provide students with access to online learning to support traditional teaching. No previous studies have examined the effectiveness of online reading or provided support for its use in English as a second foreign language (EFL) classroom settings in Algeria. This lack of studies partly reflects the difficulty of administering experimental studies at Algerian universities: in past years, connectivity could not guarantee successful distance learning, and the Internet was still a novel presence in society at large. e-Learning applications, designed to offer students an alternative learning or training arrangement through the Internet, can be important contributions to existing educational programs; it is important that the consequences of their implementation be thoroughly reviewed before the systems are deployed. In the Computer Science Department of Batna, one way to gauge a student's reading development is through reading aloud, which can provide a teacher with information about the reading cues students are using as well as their levels of fluency and accuracy. However, "Not all competent readers can read aloud well or at all, and, for some, skill at reading aloud is not very representative of reading comprehension" [31]. Another way to evaluate students' reading ability is to give them reading comprehension question-answering tests. These tests typically consist of a short text followed by questions, and they are presumably designed so that the reader must understand important aspects of the text to answer the questions correctly. In this paper, we study our ReadFluent environment, developed as a support system for students, with emphasis on the extent to which the system is suitable as a supplement to more traditional educational programs. For effective online comprehension environments to be developed and implemented, teachers need to become active and critical online users and develop their own skills and strategies for creating and/or selecting online materials, although [32] notes that "Yet even with these modern and powerful tools, authoring web-based questions and quizzes remains a difficult and time-consuming activity!" The main hypothesis of the present research study is as follows: the ongoing integration and use of online reading comprehension skills will, first, enhance learners' affect, exemplified by high motivation and self-confidence; when learners are motivated, they learn better and acquire more knowledge. Second, it will empower English teachers' roles and responsibilities as active agents of pedagogical and technological change. The main objectives of the current work are, first, to investigate the validity of a web-based application for reading comprehension and, second, to draw both teachers' and learners' attention to the crucial
relevance of urgently integrating the web into English language learning at Algerian universities. Through ReadFluent, we have tried to exploit the advantages of online web techniques, offering students a system that they can use at any time and from any place. The idea of ReadFluent is not to replace existing teaching methods, but to work together with them.

5.1 Materials

The reading materials were two passages in English, one text concerning "Computer" and the other "Network". A set of closed-ended comprehension questions was prepared for each text. Multiple-choice questions, sentence reordering, gap filling, and searching for relevant information are common means of assessing learners' reading comprehension because the tasks are familiar to students and easy for researchers to score [33]. The closed-ended format was therefore chosen, despite much criticism of this method of testing, after a preliminary study with open-ended questions yielded too many divergent responses [34].
Fig. 3. Student interface for a sample question

5.2 Research Questions

In this article, using ANOVA on a sub-sample of participants enrolled in the fifth year of the Computer Science Department of Batna University (Algeria), we investigated the effects of the web-based application. The students' native language was Arabic, and they were learning English as a second foreign language.
Hence, for most of these students, reading in a language that is not their first language is a source of considerable difficulty. Their EFL vocabulary and grammar knowledge, as well as their efficiency in processing EFL words and sentences, are at a low level relative to their L1 (first language) skills. Another hypothesis was also investigated: that the benefit summarized as "good motivation through collaborative work gives good results" is strategic in nature and can be transferred to EFL reading comprehension, despite relatively poor EFL vocabulary and grammar knowledge and relatively slow processing of EFL words and sentences. The two research questions of this study are the following:
1. Are there any differences in students' reading between the paper-and-pencil and web modes?
2. Are there any differences in students' reading between the individual and collaborative modes?

5.3 Procedure

Before the tests and questionnaire were administered, students read a consent form that explained the purpose of the study, and they agreed to participate. All students read both passages and completed all measurements for both passages on two different days separated by two weeks. The students were divided into four groups (see Table 1). In the first session the "Computer" text was used; in the second session the "Network" text was used. In each session, the students of two groups received the "web" condition and the two other groups the "paper-and-pencil" condition. Thus, after two weeks, every student had been exposed to both contents (Computer and Network), each under one of the two processing conditions ("paper-and-pencil" and "web").

Table 1. The groups of work

Group 1: 5 students working separately on the web.
Group 2: 5 students working separately on paper and pencil.
Group 3: 2 groups of students (of 2 and 3) working in collaborative reading on the web.
Group 4: 2 groups of students working in collaborative reading on paper and pencil.
In the first session, two groups (1 and 2) used paper and pencil as the work support (reading and answering) for the "Computer" text; each student of the first group worked separately, while the second group was also divided into two small groups of students working collaboratively (reading the text and answering the set of questions together). The other two groups (3 and 4) used the web as the work support for the "Network" text; each student of the third group worked separately, and the fourth group was also divided into two small groups working collaboratively.
In the second session, the conditions were swapped: the students who had received the "paper-and-pencil" condition in the first session received the "web" condition, and those who had received the "web" condition received the "paper-and-pencil" condition; each group also worked on the text it had not seen in the first session. The information concerning each session is summarized in Table 2. The test condition involved the following instructions: "Read the following text. You have fifty minutes for this task." The conditions were explained to students who asked for clarification. The set of questions was distributed to the students together with the text on their desks, or made accessible directly via the web. After 50 minutes, all the materials were collected or saved.

Table 2. The type of text per group and session (Session I / Session II)

Group 1: Computer text / Network text
Group 2: Computer text / Network text
Group 3: Network text / Computer text
Group 4: Network text / Computer text
The text to read and the tasks (the set of questions) were prepared by a specialist teacher at the Department of English of Batna University. The proposed questions are marked out of 20 points. After completing the two activities, the students filled in a post-activity questionnaire in which they could agree or disagree with different statements intended to measure their perceptions of the effectiveness of the task and of the technology they used to accomplish it.

5.4 Statistical Study and Results

Using SAS (originally Statistical Analysis System), an integrated system of software products provided by the SAS Institute, we selected the statistical procedure Analysis of Variance (ANOVA) because we wanted to investigate the relationship between independent and dependent variables and were concerned with the variation between and within groups [35]. As shown in Fig. 4, the web factor is significant (p = 0.026): the students who used the web mode (G1 and G3) obtained higher performance than those who used the paper-and-pencil mode (G2 and G4). The use of the web as work support in reading comprehension was more effective than the traditional use of paper and pencil for this population. The collaborative factor is not significant (p = 0.0856): the students who used the collaborative mode (G3 and G4) obtained performance equivalent to those who worked individually (G1 and G2). The method factor (collaborative vs. individual) does not seem to have an influence here; only the mode (web or paper-and-pencil) seems to play an important role. The contrast G1+G3 vs. G2+G4 is significant (p = 0.0026), but the contrast G3+G4 vs. G1+G2 is not.
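The analysis itself was carried out with SAS. Purely as an illustration, and with entirely hypothetical scores, an equivalent two-factor ANOVA (mode: web vs. paper-and-pencil; method: individual vs. collaborative, roughly mirroring groups G1-G4 of five students each) could be reproduced in Python as follows.

# Illustrative two-factor ANOVA on hypothetical scores (the study used SAS).
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

data = pd.DataFrame({
    "score":  [14, 15, 13, 16, 15,   # G1: web, individual (hypothetical marks /20)
               10, 11, 9, 12, 10,    # G2: paper, individual
               16, 15, 17, 14, 16,   # G3: web, collaborative
               11, 12, 10, 11, 13],  # G4: paper, collaborative
    "mode":   ["web"] * 5 + ["paper"] * 5 + ["web"] * 5 + ["paper"] * 5,
    "method": ["individual"] * 10 + ["collaborative"] * 10,
})

# Two-way ANOVA: score explained by mode, method, and their interaction.
model = ols("score ~ C(mode) * C(method)", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))

The resulting table reports an F statistic and p-value for each factor and for their interaction, analogous to the p-values quoted above.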
Fig. 4. Statistical result
The results of the present study indicate that one important factor interacting with EFL reading comprehension is the use of the web. To evaluate the students' subjective attitudes toward online reading, we administered a questionnaire at the end of the tests. The questionnaire included a range of questions focused on the value of online reading, its features, and aspects of its usage.

5.5 Limitations of the Study

The study was conducted within a relatively short period of time, which may have negatively skewed the results obtained. The fact that the study was done with only one class of students in the Computer Science Department at Batna, Algeria, may also limit the extent to which the results can be generalized to other populations. In the near future, we plan to carry out a thorough investigation of the same issue on a national scale, and perhaps even a Maghreb-wide one (Tunisia, Morocco, and Algeria).

5.6 Discussion

Access to the online reading environment helps teachers discover ways to bridge the gap between students' needs and classroom instruction. Teachers may motivate their students by selecting appropriate materials, especially for those at the early stages of learning. Students with comprehension disabilities must find meaningful ways to complete the task of gaining understanding from written text, and reading comprehension strategies offer possibilities for improving reading comprehension. This experimentation allows us to collect information on the actual activities of the users. We can thus validate or question certain technical choices and determine with more precision the adaptations that have to be made to the online tools.
If we define comprehension only as reading a passage and answering a few multiple-choice questions about it, the reader cannot realize the need to develop a deeper and more precise conceptualization of the construct. Measures of comprehension that are referenced to the characteristics of the text are necessary, that is, a way of relating an assessment of comprehension to the difficulty of the text. Common comprehension assessments focus on standard tasks such as reading for immediate recall, reading for the general idea, and reading to deduce word meaning, but these procedures alone are not enough to evaluate learners' capacities. It is valuable to know how learners modify old or build new knowledge structures, evaluate texts against particular criteria, or use information obtained while reading for problem solving. The two important outcomes of reading with comprehension are knowledge and application, and assessments that capture both are crucial to the success of reading with comprehension. However, such a discussion would be incomplete without addressing the disadvantages or obstacles related to the use of the online reading environment. While the online reading environment offers a great deal to students' English reading comprehension, it is not without its problems. When connections are not sufficiently fast because of poor service quality, it may take time to access information, and technical problems themselves can lead to frustration. We find the results of the experiment most encouraging. However, the simple availability of a good tool, of known benefit to the students, is not enough to guarantee a large educational impact. Supplementary measures are necessary to enable teachers to develop positive attitudes toward the new tools, to adopt them [36], and to engage students in using them. The factors that can help or hinder teachers in effectively adopting and using ReadFluent include the presence or lack of appropriate training, administrative support, and the constraints of traditional pedagogical beliefs and resistance to change. According to [37], a known way to encourage the use of educationally beneficial tools is to convert them from student-driven to assessment-driven tools. We hope that the developed environment will be used as a medium that can foster a close relationship between students and teachers, and that it will be used more broadly, enabling us to collect more data and perform deeper analyses of its influence on different categories of students. In particular, we intend to explore it with new categories of students, such as secondary school students.
6 Conclusion

This paper explores a specific kind of educational activity in reading comprehension: web and paper-and-pencil sessions. In each session, the student is given a fragment of a text and asked to give correct responses to a set of questions. Our study indicates that the developed online reading environment (ReadFluent), used in an out-of-class reading mode, is an effective learning tool. As shown by the ANOVA analysis, working with it significantly improved the students' performance; the questionnaire results also suggest gains in motivation and in higher-level knowledge and skills. The students themselves commended the system highly as a helpful and easy-to-use learning tool.
Students believe that using the ReadFluent environment will increase their learning performance, efficiency, and effectiveness. We think it has more advantages than traditional paper-and-pencil assessment because it helps address the problems of increased teacher workload and class size. With the development of online reading environments and continuing improvements in Internet access, the online reading of English text has become a promising trend for future development. Therefore, the establishment of design guidelines for optimized online reading environments is of great significance. An online reading environment can be reliable if it is designed in a simple format for ease of accessibility and interactivity. For further research, several recommendations are offered. Online reading should be used in conjunction with classroom teaching and should be further developed to address a broader range of student abilities and to motivate student learning. Further studies are also recommended to look at the impact of the use of online activities on students' language development and, as a result, to improve our understanding and knowledge of ways of using online reading in the EFL classroom.
References

1. Ounis, S.: For specific purposes: A case study of the 1st year students at the department of Agronomy. Master's thesis, English Department, Batna University (2005)
2. Meena, S.: The Internet and Foreign Language Education: Benefits and Challenges. The Internet TESL Journal III(6) (June 1997), http://iteslj.org/Articles/Singhal-Internet.html (accessed January 4, 2009)
3. Gerda van, W., Arno, L.: Technology-Assisted Reading for Improving Reading Skills for young South African Learners. Electronic Journal of e-Learning 6(3), 245–254 (2008)
4. Graham, J.: The reader's helper: a personalized document reading environment. In: Proceedings of CHI 1999, pp. 481–488 (1999)
5. Schilit, B.N., Golovchinsky, G., Price, M.N.: Beyond paper: supporting active reading with free form digital ink annotations. In: Proceedings of CHI 1998, pp. 249–256 (1998)
6. Duggan, G.B., Payne, S.J.: How much do we understand when skim reading? In: Proceedings of CHI 2006, pp. 730–735 (2006)
7. O'Hara, K., Sellen, A.: A comparison of reading paper and on-line documents. In: Proceedings of CHI 1997, pp. 335–342 (1997)
8. Cook, D.: A new kind of reading and writing space: the online course site. The Reading Matrix 2(3) (September 2002)
9. Fernandez, V., Simoa, P., Sallana, J.: Podcasting: A new technological tool to facilitate good practice in higher education. Computers & Education 53(2), 385–392 (2009)
10. Tsou, W., Wang, W., Li, H.: How computers facilitate English foreign language learners acquire English abstract words. Computers & Education 39, 415–428 (2002)
11. Chen, Y.L.: A mixed-method study of EFL teachers' Internet use in language instruction. Teaching and Teacher Education 24, 1015–1028 (2008)
12. Rahimi, M., Yadollahia, S.: Foreign language learning attitude as a predictor of attitudes towards computer-assisted language learning. In: Procedia Computer Science, World Conference on Information Technology, vol. 3, pp. 167–174 (2011)
13. Turan: Student readiness for technology enhanced history education in Turkish high schools. Cypriot Journal of Educational Sciences 5(2) (2010), http://www.worldeducation-center.org/index.php/cjes/article/view/75 (retrieved November 15, 2010)
14. Zidat, S., Talhi, S., Djoudi, M.: Système de compréhension à distance du français écrit pour un public arabophone. In: Colloque Euro Méditerranéen et Africain d'Approfondissement sur la FORmation A Distance (CEMAFORAD 4), April 9-11, Strasbourg, France (2008)
15. Zidat, S., Djoudi, M.: Online evaluation of Ibn Sina elearning environment. Information Technology Journal 5(3), 409–415 (2006), ISSN 1812-5638
16. Zidat, S., Djoudi, M.: Task collaborative resolution tool for elearning environment. Journal of Computer Science 2(7), 558–564 (2006), ISSN 1549-3636
17. Cain, K., Oakhill, J.: Reading comprehension difficulties: Correlates, causes, and consequences. In: Cain, K., Oakhill, J. (eds.) Children's Comprehension Problems in Oral and Written Language: A Cognitive Perspective, pp. 41–75. Guilford, New York (2007)
18. Barnard, Y.: Didactical and pedagogical aspects of e-learning tools. In: Pardillo, J.P. (ed.) Proceedings of the Conference on European Guidelines for the Application of New Technologies for Driver Training and Education, Madrid (2006)
19. Siemens, G.: Connectivism: A learning theory for the digital age (2004), http://www.elearnspace.org/Articles/connectivism.htm (accessed January 01, 2009)
20. Comprehension Instruction, online edition. Texas Reading Initiative (2002), http://www.netxv.net/pm_attach/67/TRIComprehension_Instr.pdf (accessed January 05, 2009)
21. Vellutino, F.R.: Individual differences as sources of variability in reading comprehension in elementary school children. In: Sweet, A.P., Snow, C.E. (eds.) Rethinking Reading Comprehension, pp. 51–81. Guilford Press, New York (2003)
22. Erin, P.J., Justin, P., Haya, S.: Variability in reading ability gains as a function of computer assisted instruction. Computers & Education 54(2), 436–445 (2010); available online January 15, 2010
23. Hossein, M., Harm, B., Martin, M.: Determining factors of the use of e-learning environments by university teachers. Computers & Education 51, 142–154 (2008)
24. Paulsen, P.: New era trends and technologies in foreign language learning: An annotated bibliography. Interactive Multimedia Electronic Journal of Computer-Enhanced Learning (2001)
25. Pei-Lin, L., Chiu-Jung, C., Yu-Ju, C.: Effects of a computer-assisted concept mapping learning strategy on EFL college students' English reading comprehension. Computers & Education 54, 436–445 (2010)
26. Mojgan, A., Kamariah, A., Wong, S., Bahaman, A., Foo, S.: Understanding the role of teachers in realizing the potential of ICT in education. In: International Conference on Teaching and Learning (ICTL 2007), Malaysia (2007)
27. Kintsch, W.: The use of knowledge in discourse processing: A construction integration model. Psychological Review 95, 163–182 (1988)
28. Perfetti, C.: Reading Ability. Oxford University Press, New York (1985)
29. Pena, A., Domínguez, R., Medel, J.: Educational data mining: a sample of review and study case. World Journal on Educational Technology 1(2) (2010) (retrieved November 15, 2010)
30. Ives, B., Olson, M.H., Baroudi, J.J.: The measurement of user information satisfaction. Communications of the ACM 26(10), 785–793 (1983)
31. Price, P., et al.: Assessment of emerging reading skills in young native speakers and language learners. Speech Communication 51, 968–984 (2009)
32. Peter, B., Sergey, S.: Individualized exercises for self-assessment of programming knowledge: An evaluation of QuizPACK. ACM Journal of Educational Resources in Computing 5(3), Article 6 (September 2005)
33. Wolf, D.: A comparison of assessment tasks used to measure FL reading comprehension. Modern Language Journal 77, 473–489 (1993)
34. Brantmeier, C.: Does gender make a difference? Passage content and comprehension in second language reading. Reading in a Foreign Language 15(1) (April 2003)
35. Brantmeier, C.: Statistical procedures for research on L2 reading comprehension: An examination of ANOVA and regression models. Reading in a Foreign Language 16(2) (October 2004)
36. Albirini, A.: Teachers' attitudes toward information and communication technologies: The case of Syrian EFL teachers. Computers and Education 47(4), 373–398 (2006)
37. Peter, B., Sergey, S.: Engaging students to work with self-assessment questions: A study of two approaches. ACM SIGCSE Bulletin 37(3) (2005)
Author Index
Abbasy, Mohammad Reza I-508 Abdel-Haq, Hamed II-221 Abdesselam, Abdelhamid I-219 Abdi, Fatemeh II-166, II-180 Abdullah, Natrah II-743 Abdul Manaf, Azizah I-431 AbdulRasool, Danyia II-571 Abdur Rahman, Amanullah II-280 Abel, Marie-H`el`ene II-391 Abou-Rjeily, Chadi II-543 Aboutajdine, Driss I-121, I-131 Abu Baker, Alaa II-448 Ademoglu, Ahmet I-277 Ahmad Malik, Usman I-741, II-206 Ait Abdelouahad, Abdelkaher I-131 Alam, Muhammad II-115 Alaya Cheikh, Faouzi I-315 Alboaie, Lenuta I-455 Alemi, Mehdi II-166 Alfawareh, Hejab M. II-733 Al-Imam, Ahmed M. II-9 Aliouat, Makhlouf I-603 Al-Mously, Salah I. I-106 Alsultanny, Yas A. II-629 Alzeidi, Nasser I-593 Amri Abidin, Ahmad Faisal II-376 ´ Angeles, Alfonso II-65 Arafeh, Bassel I-593 Arya, K.V. I-675 Asghar, Sajjad I-741, II-206 Aydin, Salih I-277, II-654 Azmi, Azri II-21 Babaie, Shahram I-685 Balestra, Costantino I-277 Balogh, Zolt´ an II-504 Bardan, Raghed II-139 Barriba, Itzel II-65 Bayat, M. I-535 Beheshti-Atashgah, M. I-535 Behl, Raghvi II-55 Belhadj-Aissa, Aichouche I-254 Bendiab, Esma I-199 Benmohammed, Mohamed I-704
Bensefia, Hassina I-470 Ben Youssef, Nihel I-493 Berkani, Daoud I-753 Bertin, Emmanuel II-718 Besnard, Remy II-406 Bestak, Robert I-13 Bilami, Azeddine I-704 Boledoviˇcov´ a, M´ aria II-504 Bouakaz, Saida I-327 Boughareb, Djalila I-33 Bouhoula, Adel I-493 Boukhobza, Jalil II-599 Bourgeois, Julien II-421 Boursier, Patrice II-115 Boutiche, Yamina I-173 Bravo, Antonio I-287 Burita, Ladislav II-1 Cangea, Otilia I-521 Cannavo, Flavio I-231 C´ apay, Martin II-504 Carr, Leslie II-692 Chaihirunkarn, Chalalai I-83 Challita, Khalil I-485 Chao, Kuo-Ming II-336 Che, Dunren I-714 Chebira, Abdennasser II-557 Chen, Hsien-Chang I-93 Chen, Wei-Chu II-256 Cherifi, Chantal I-45 Cherifi, Hocine I-131, II-265 Chi, Chien-Liang II-256 Chihani, Bachir II-718 Ching-Han, Chen I-267 Cimpoieru, Corina II-663 Conti, Alessio II-494 Crespi, Noel II-718 Dahiya, Deepak II-55 Daud, Salwani Mohd I-431 Day, Khaled I-593 Decouchant, Dominique I-380, II-614 Dedu, Eugen II-421 Den Abeele, Didier Van II-391
Djoudi, Mahieddine II-759 Do, Petr II-293 Drlik, Martin I-60 Druoton, Lucie II-406 Duran Castells, Jaume I-339 Egi, Salih Murat I-277, II-654 El Hassouni, Mohammed I-131 El Khattabi, Hasnaa I-121 Farah, Nadir I-33 Faraoun, Kamel Mohamed I-762 Fares, Charbel II-100 Farhat, Hikmat I-485 Fawaz, Wissam II-139 Feghali, Mireille II-100 Feltz, Fernand II-80 Fenu, Gianni I-662 Fern´ andez-Ard`evol, Mireia I-395 Fezza, Sid Ahmed I-762 Fonseca, David I-345, I-355, I-407 Forsati, Rana II-707 Furukawa, Hiroshi I-577, I-619 Garc´ıa, Kimberly II-614 Gardeshi, M. I-535 Garg, Rachit Mohan II-55 Garnier, Lionel II-406 Garreau, Mireille I-287 Gaud, Nicolas II-361 Germonpre, Peter I-277 Ghalebandi, Seyedeh Ghazal I-445 Gholipour, Morteza I-161 Ghoualmi, Nacira I-470 Gibbins, Nicholas II-692 Giordano, Daniela I-209, I-231 Gong, Li I-577 Goumeidane, Aicha Baya I-184 Gueffaz, Mahdi II-591 Gueroui, Mourad I-603 Gui, Vasile I-417 Gupta, Vaibhav I-675 Haddad, Serj II-543 Hafeez, Mehnaz I-741, II-206 Haghjoo, Mostafa S. II-166, II-180 Hamrioui, Sofiane I-634 Hamrouni, Kamel I-146 Hassan, Wan H. II-9 Heikalabad, Saeed Rasouli I-685, I-693
Hermassi, Marwa I-146 Hilaire, Vincent II-361 Hori, Yukio I-728 Hosseini, Roya II-517 Hou, Wen-Chi I-714 Hui Kao, Yueh II-678 Hundoo, Pranav II-55 Hussain, Mureed I-741 Ibrahim, Suhaimi II-21, II-33 Ilayaraja, N. II-151 Imai, Yoshiro I-728 Ismail, Zuraini II-237 Ivanov, Georgi I-368 Ivanova, Malinka I-368 Izquierdo, V´ıctor II-65 Jacob, Ricky I-24 Jahromi, Mansour Nejati I-787 Jaichoom, Apichaya I-83 Jane, F. Mary Magdalene II-151 Jeanne, Fabrice II-718 Jelassi, Hejer I-146 Jin, Guangri I-577 Ju´ arez-Ram´ırez, Reyes II-65 Jung, Hyun-seung II-250 Jusoh, Shaidah II-733 Kamano, Hiroshi I-728 Kamir Yusof, Mohd II-376 Karasawa, Yoshio II-531 Kardan, Ahmad II-517 Karimaa, Aleksandra II-131 Kavasidis, Isaak I-209 Kavianpour, Sanaz II-237 Khabbaz, Maurice II-139 Khamadja, Mohammed I-184 Khdour, Thair II-321 Khedam, Radja I-254 Kholladi, Mohamed Kheirreddine Kisiel, Krzysztof II-473 Koukam, Abderrafiaa II-361 Kung, Hsu-Yung I-93
Kholladi, Mohamed Kheirreddine I-199
Labatut, Vincent I-45, II-265 Labraoui, Nabila I-603 Lai, Wei-Kuang I-93 Lalam, Mustapha I-634 Langevin, Remi II-406 Lashkari, Arash Habibi I-431, I-445
Author Index Laskri, Mohamed Tayeb II-557 Lazli, Lilia II-557 Le, Phu Hung I-649 Leblanc, Adeline II-391 ´ Leclercq, Eric II-347 Lepage, Alain II-678 Licea, Guillermo II-65 Lin, Mei-Hsien I-93 Lin, Yishuai II-361 Luan, Feng II-579 Maamri, Ramdane I-704 Madani, Kurosh II-557 Mahdi, Fahad II-193 Mahdi, Khaled II-193 Marcellier, Herve II-406 Marroni, Alessandro I-277 Mashhour, Ahmad II-448 Masrom, Maslin I-431 Mat Deris, Sufian II-376 Matei, Adriana II-336 Mateos Papis, Alfredo Piero I-380, II-614 Mazaheri, Samaneh I-302 Md Noor, Nor Laila II-743 Medina, Rub´en I-287 Mehmandoust, Saeed I-242 Mekhalfa, Faiza I-753 Mendoza, Sonia I-380, II-614 Mes´ aroˇsov´ a, Miroslava II-504 Miao-Chun, Yan I-267 Miyazaki, Eiichi I-728 Moayedikia, Alireza II-707 Mogotlhwane, Tiroyamodimo M. II-642 Mohamadi, Shahriar I-551 Mohamed, Ehab Mahmoud I-619 Mohammad, Sarmad I-75 Mohd Su’ud, Mazliham II-115 Mohtasebi, Amirhossein II-237 Moise, Gabriela I-521 Mokwena, Malebogo II-642 Mooney, Peter I-24 Moosavi Tayebi, Rohollah I-302 Mori, Tomomi I-728 Mosweunyane, Gontlafetse II-692 Mouchantaf, Emilie II-100 Mousavi, Hamid I-508 Muenchaisri, Pornsiri II-43 Munk, Michal I-60
Musa, Shahrulniza II-115 Muta, Osamu I-619 Mutsuura, Kouichi II-483 Nacereddine, Nafaa I-184 Nadali, Ahmad I-563 Nadarajan, R. II-151 Nafari, Alireza I-770, II-87 Nafari, Mona I-770, I-787, II-87 Nakajima, Nobuo II-531 Nakayama, Minoru II-483 Narayan, C. Vikram II-151 Navarro, Isidro I-355, I-407 Nejadeh, Mohamad I-551 Najaf Torkaman, Mohammad Reza I-508 Nematy, Farhad I-693 Nicolle, Christophe II-591 Nitti, Marco I-662 Nosratabadi, Hamid Eslami I-563 Nunnari, Silvia I-231 Nyg˚ ard, Mads II-579 Ok, Min-hwan II-250 Olivier, Pierre II-599 Ondryhal, Vojtech I-13 Ordi, Ali I-508 Orman, G¨ unce Keziban II-265 Otesteanu, Marius I-417 Ozyigit, Tamer II-654 Parlak, Ismail Burak I-277 Paul, Sushil Kumar I-327 Penciuc, Diana II-391 Pifarr´e, Marc I-345, I-407 Popa, Daniel I-417 Pourdarab, Sanaz I-563 Pujolle, Guy I-649 Rahmani, Naeim I-693 Ramadan, Wassim II-421 Rampacek, Sylvain II-591 Rasouli, Hosein I-693 Redondo, Ernest I-355, I-407 Rezaei, Ali Reza II-456 Rezaie, Ali Ranjideh I-685 Reza Moradhaseli, Mohammad Riaz, Naveed II-206 Robert, Charles II-678 Rodr´ıguez, Jos´e I-380, II-614
Reza Moradhaseli, Mohammad I-445
Rubio da Costa, Fatima I-231 Rudakova, Victoria I-315 Saadeh, Heba II-221 Sabra, Susan II-571 Sadeghi Bigham, Bahram I-302 Safaei, Ali A. II-166, II-180 Safar, Maytham II-151, II-193 Safarkhani, Bahareh II-707 Saha, Sajib Kumar I-315 Salah, Imad II-221 Saleh, Zakaria II-448 S´ anchez, Albert I-355 S´ anchez, Gabriela I-380 Santucci, Jean-Fran¸cois I-45 Savonnet, Marinette II-347 Sedrati, Maamar I-704 Serhan, Sami II-221 Shah, Nazaraf II-336 Shahbahrami, Asadollah I-242, II-686 Shalaik, Bashir I-24 Shanmugam, Bharanidharan I-508 Sharifi, Hadi II-686 Sheisi, Gholam Hossein I-787 Shih, Huang-Chia II-436 Shorif Uddin, Mohammad I-327 Sinno, Abdelghani II-139 Spampinato, Concetto I-209, I-231 ˇ anek, Roman II-307 Sp´ Sta´ ndo, Jacek II-463, II-473 Sterbini, Andrea II-494 Takai, Tadayoshi I-728 Talib, Mohammad II-642 Tamisier, Thomas II-80 Tamtaoui, Ahmed I-121
Tang, Adelina II-280 Taniguchi, Tetsuki II-531 Temperini, Marco II-494 Terec, Radu I-455 Thawornchak, Apichaya I-83 Thomson, I. II-151 Thongmak, Mathupayas II-43 Touzene, Abderezak I-593 Tsai, Ching-Ping I-93 Tseng, Shu-Fen II-256 Tunc, Nevzat II-654 Tyagi, Ankit II-55 Tyl, Pavel II-307 ur Rehman, Adeel II-206 Usop, Surayati II-376 Vaida, Mircea-Florin I-455 Vashistha, Prerna I-675 Vera, Miguel I-287 Villagrasa Falip, Sergi I-339 Villegas, Eva I-345, I-407 Vranova, Zuzana I-13 Wan Adnan, Wan Adilah II-743 Weeks, Michael I-1 Winstanley, Adam C. I-24 Yamamoto, Hiroh II-483 Yimwadsana, Boonsit I-83 Yusop, Othman Mohd II-33 Zalaket, Joseph I-485 Zandi Mehran, Nazanin I-770, II-87 Zandi Mehran, Yasaman I-770, II-87 Zidat, Samir II-759 Zlamaniec, Tomasz II-336