Grid and Cooperative Computing: Second International Workshop, GCC 2003, Shanghai, China, December 7-10, 2003, Revised Papers, Part II [1 ed.] 9783540219934, 3540219935

English Pages 1078 [1116] Year 2004

Table of contents :
Front Matter....Pages -
Synthetic Implementations of Performance Data Collection in Massively Parallel Systems....Pages 1-9
GMA+ – A GMA-Based Monitoring and Management Infrastructure for Grid....Pages 10-17
A Parallel Branch–and–Bound Algorithm for Computing Optimal Task Graph Schedules....Pages 18-25
Selection and Advanced Reservation of Backup Resources for High Availability Service in Computational Grid....Pages 26-33
An Online Scheduling Algorithm for Grid Computing Systems....Pages 34-39
A Dynamic Job Scheduling Algorithm for Computational Grid....Pages 40-47
An Integrated Management and Scheduling Scheme for Computational Grid....Pages 48-56
Multisite Task Scheduling on Distributed Computing Grid....Pages 57-64
Adaptive Job Scheduling for a Service Grid Using a Genetic Algorithm....Pages 65-72
Resource Scheduling Algorithms for Grid Computing and Its Modeling and Analysis Using Petri Net....Pages 73-80
Architecture of Grid Resource Allocation Management Based on QoS....Pages 81-88
An Improved Ganglia-Like Clusters Monitoring System....Pages 89-96
Effective OpenMP Extensions for Irregular Applications on Cluster Environments....Pages 97-104
A Scheduling Approach with Respect to Overlap of Computing and Data Transferring in Grid Computing....Pages 105-112
A Deadline and Budget Constrained Cost-Time Optimization Algorithm for Scheduling Dependent Tasks in Grid Computing....Pages 113-120
A Load Balancing Algorithm for Web Based Server Grids....Pages 121-128
Flexible Intermediate Library for MPI-2 Support on an SCore Cluster System....Pages 129-136
Resource Management and Scheduling in Manufacturing Grid....Pages 137-140
A New Task Scheduling Algorithm in Distributed Computing Environments....Pages 141-144
GridFerret: Grid Monitoring System Based on Mobile Agent....Pages 145-148
Grid-Based Resource Management of Naval Weapon Systems....Pages 149-152
A Static Task Scheduling Algorithm in Grid Computing....Pages 153-156
A New Agent-Based Distributed Model of Grid Service Advertisement and Discovery....Pages 157-160
IMCAG: Infrastructure for Managing and Controlling Agent Grid....Pages 161-165
A Resource Allocation Method in the Neural Computation Platform....Pages 166-169
An Efficient Clustering Method for Retrieval of Large Image Databases....Pages 170-173
Research on Adaptable Replication Protocol....Pages 174-178
Co-operative Monitor Web Page Based on MD5....Pages 179-182
Collaboration-Based Architecture of Flexible Software Configuration Management System....Pages 183-186
The Research of Mobile Agent Security....Pages 187-190
Research of Information Resources Integration and Shared in Digital Basin....Pages 191-194
Scheduling Model in Global Real-Time High Performance Computing with Network Calculus....Pages 195-198
CPU Schedule in Programmable Routers: Virtual Service Queuing with Feedback Algorithm....Pages 199-202
Research on Information Platform of Virtual Enterprise Based on Web Services Technology....Pages 203-206
A Reliable Grid Messaging Service Based on JMS....Pages 207-210
A Feedback and Investigation Based Resources Discovery and Management Model on Computational Grid....Pages 211-214
Moment Based Transfer Function Design for Volume Rendering....Pages 215-218
Grid Monitoring and Data Visualization....Pages 219-222
An Economy Driven Resource Management Architecture Based on Mobile Agent....Pages 223-226
Decentralized Computational Market Model for Grid Resource Management....Pages 227-230
A Formal Data Model and Algebra for Resource Sharing in Grid....Pages 231-235
An Efficient Load Balance Algorithm in Cluster-Based Peer-to-Peer System....Pages 236-239
Resource Information Management of Spatial Information Grid....Pages 240-243
An Overview of CORBA-Based Load Balancing....Pages 244-249
Intelligence Balancing for Communication Data Management in Grid Computing....Pages 250-253
On Mapping and Scheduling Tasks with Synchronization on Clusters of Machines....Pages 254-258
An Efficient Load Balancing Algorithm on Distributed Networks....Pages 259-262
Optimal Methods for Object Placement in En-Route Web Caching for Tree Networks and Autonomous Systems....Pages 263-270
A Framework of Tool Integration for Internet-Based E-commerce....Pages 271-278
Scalable Filtering of Well-Structured XML Message Stream....Pages 279-286
Break a New Ground on Programming in Web Client Side....Pages 287-293
An Adaptive Mixing Audio Gateway in Heterogeneous Networks for ADMIRE System....Pages 294-302
Kernel Content-Aware QoS for Web Clusters....Pages 303-310
A Collaborative Multimedia Authoring System....Pages 311-318
Research of Satisfying Atomic and Anonymous Electronic Commerce Protocol....Pages 319-326
Network Self-Organizing Information Exploitation Model Based on GCA....Pages 327-334
Admire – A Prototype of Large Scale E-collaboration Platform....Pages 335-343
A Most Popular Approach of Predictive Prefetching on a WAN to Efficiently Improve WWW Response Times....Pages 344-351
Applications of Server Performance Control with Simple Network Management Protocol....Pages 352-359
Appcast – A Low Stress and High Stretch Overlay Protocol....Pages 360-371
Communication Networks: States of the Arts....Pages 372-379
DHCS: A Case of Knowledge Share in Cooperative Computing Environment....Pages 380-387
Improving the Performance of Equalization in Communication Systems....Pages 388-395
Moving Communicational Supervisor Control System Based on Component Technology....Pages 396-399
A Procedure Search Mechanism in OGSA-Based GridRPC Systems....Pages 400-403
An Improved Network Broadcasting Method Based on Gnutella Network....Pages 404-407
Some Conclusions on Cayley Digraphs and Their Applications to Interconnection Networks....Pages 408-412
Multifractal Characteristic Quantities of Network Traffic Models....Pages 413-417
Network Behavior Analysis Based on a Computer Network Model....Pages 418-421
Cutting Down Routing Overhead in Mobile Ad Hoc Networks....Pages 422-425
Improving Topology-Aware Routing Efficiency in Chord....Pages 426-429
Two Extensions to NetSolve System....Pages 430-433
A Route-Based Composition Language for Service Cooperation....Pages 434-437
To Manage Grid Using Dynamically Constructed Network Management Concept: An Early Thought....Pages 438-441
Design of VDSL Networks for the High Speed Internet Services....Pages 442-445
The Closest Vector Problem on Some Lattices....Pages 446-449
Proposing a New Architecture for Adaptive Active Network Control and Management System....Pages 450-454
A Path Based Internet Cache Design for GRID Application....Pages 455-458
On the Application of Computational Intelligence Methods on Active Networking Technology....Pages 459-463
Grid Computing for the Masses: An Overview....Pages 464-473
A Multiple-Neighborhoods-Based Simulated Annealing Algorithm for Timetable Problem....Pages 474-481
Lattice Framework to Implement OGSA: Its Constructs and Composition Scenario....Pages 482-489
Moving Grid Systems into the IPv6 Era....Pages 490-499
MG-QoS: QoS-Based Resource Discovery in Manufacturing Grid....Pages 500-506
An Extension of Grid Service: Grid Mobile Service....Pages 507-512
Supplying Instantaneous Video-on-Demand Services Based on Grid Computing....Pages 513-520
A Grid Service Lifecycle Management Scheme....Pages 521-528
An OGSA-Based Quality of Service Framework....Pages 529-540
A Service Management Scheme for Grid Systems....Pages 541-548
A QoS Model for Grid Computing Based on DiffServ Protocol....Pages 549-556
Design and Implementaion of a Single Sign-On Library Supporting SAML (Security Assertion Markup Language) for Grid and Web Services Security....Pages 557-564
Performance Improvement of Information Service Using Priority Driven Method....Pages 565-572
HH-MDS: A QoS-Aware Domain Divided Information Service....Pages 573-580
Grid Service Semigroup and Its Workflow Model....Pages 581-589
A Design of Distributed Simulation Based on GT3 Core....Pages 590-596
A Policy-Based Service-Oriented Grid Architecture....Pages 597-603
Adaptable QOS Management in OSGi-Based Cooperative Gateway Middleware....Pages 604-607
Design of an Artificial-Neural-Network-Based Extended Metacomputing Directory Service....Pages 608-611
Gridmarket: A Practical, Efficient Market Balancing Resource for Grid and P2P Computing....Pages 612-619
A Distributed Approach for Resource Pricing in Grid Environments....Pages 620-627
Application Modelling Based on Typed Resources....Pages 628-635
A General Merging Algorithm Based on Object Marking....Pages 636-643
Charging and Accounting for Grid Computing System....Pages 644-651
Integrating New Cost Model into HMA-Based Grid Resource Scheduling....Pages 652-659
CoAuto: A Formal Model for Cooperative Processes....Pages 660-668
A Resource Model for Large-Scale Non-hierarchy Grid System....Pages 669-676
A Virtual Organization Based Mobile Agent Computation Model....Pages 677-682
Modeling Distributed Algorithm Using B....Pages 683-689
Multiple Viewpoints Based Ontology Integration....Pages 690-693
Automated Detection of Design Patterns....Pages 694-697
Research on the Financial Information Grid....Pages 698-701
RCACM: Role-Based Context-Awareness Coordination Model for Mobile Agent Applications....Pages 702-705
A Model for Locating Services in Grid Environment....Pages 706-709
A Grid Service Based Model of Virtual Experiment....Pages 710-714
Accounting in the Environment of Grid Society....Pages 715-718
A Heuristic Algorithm for Minimum Connected Dominating Set with Maximal Weight in Ad Hoc Networks....Pages 719-722
Slice-Based Information Flow Graph....Pages 723-726
Semantic Rule Service Model: Enabling Intelligence on Grid Architecture....Pages 727-735
CSCW in Design on the Semantic Web....Pages 736-743
SIMON: A Multi-strategy Classification Approach Resolving Ontology Heterogeneity – The P2P Meets the Semantic Web....Pages 744-751
SkyEyes: A Semantic Browser for the KB-Grid....Pages 752-759
Toward the Composition of Semantic Web Services....Pages 760-767
A Viewpoint of Semantic Description Framework for Service....Pages 768-777
A Novel Approach to Semantics-Based Exception Handling for Service Grid Applications....Pages 778-786
A Semantic-Based Web Service Integration Approach and Tool....Pages 787-794
A Computing Model for Semantic Link Network....Pages 795-802
A Semantic Web Enabled Mediator for Web Service Invocation....Pages 803-806
A Data Mining Algorithm Based on Grid....Pages 807-810
Prototype a Knowledge Discovery Infrastructure by Implementing Relational Grid Monitoring Architecture (R-GMA) on European Data Grid (EDG)....Pages 811-814
The Consistency Mechanism of Meta-data Management in Distributed Storage System....Pages 815-821
Link-Contention-Aware Genetic Scheduling Using Task Duplication in Grid Environments....Pages 822-829
An Adaptive Meta-scheduler for Data-Intensive Applications....Pages 830-837
Dynamic Data Grid Replication Strategy Based on Internet Hierarchy....Pages 838-846
Preserving Data Consistency in Grid Databases with Multiple Transactions....Pages 847-854
Dart: A Framework for Grid-Based Database Resource Access and Discovery....Pages 855-862
An Optimal Task Scheduling for Cluster Systems Using Task Duplication....Pages 863-870
Towards an Interactive Architecture for Web-Based Databases....Pages 871-878
Network Storage Management in Data Grid Environment....Pages 879-886
Study on Data Access Technology in Information Grid....Pages 887-890
GridTP Services for Grid Transaction Processing....Pages 891-894
FTPGrid: A New Paradigm for Distributed FTP System....Pages 895-898
Using Data Cube for Mining of Hybrid-Dimensional Association Rules....Pages 899-902
Knowledge Sharing by Grid Technology....Pages 903-906
A Security Access Control Mechanism for a Multi-layer Heterogeneous Storage Structure....Pages 907-912
Investigating the Role of Handheld Devices in the Accomplishment of Grid-Enabled Analysis Environment....Pages 913-917
A TMO-Based Object Group Model to Structuring Replicated Real-Time Objects for Distributed Real-Time Applications....Pages 918-926
Fuzzy Synthesis Evaluation Improved Task Distribution in WfMS....Pages 927-934
A Simulation Study of Job Workflow Execution Models over the Grid....Pages 935-943
An Approach to Distributed Collaboration Problem with Conflictive Tasks....Pages 944-953
Temporal Problems in Service-Based Workflows....Pages 954-961
iCell: Integration Unit in Enterprise Cooperative Environment....Pages 962-969
The Availability Semantics of Predicate Data Flow Diagram....Pages 970-977
Virtual Workflow Management System in Grid Environment....Pages 978-985
Research of Online Expandability of Service Grid....Pages 986-993
Modelling Cooperative Multi-agent Systems....Pages 994-1001
GHIRS: Integration of Hotel Management Systems by Web Services....Pages 1002-1009
Cooperative Ants Approach for a 2D Navigational Map of 3D Virtual Scene....Pages 1010-1017
Workflow Interoperability – Enabling Online Approval in E-government....Pages 1018-1021
A Multicast Routing Algorithm for CSCW....Pages 1022-1025
A Multi-agent System Based on ECA Rule....Pages 1026-1029
A Hybrid Algorithm of n-OPT and GA to Solve Dynamic TSP....Pages 1030-1033
The Application Research of Role-Based Access Control Model in Workflow Management System....Pages 1034-1037
Research and Design of Remote Education System Based on CSCW....Pages 1038-1041
Data and Interaction Oriented Workflow Execution....Pages 1042-1045
XCS System: A New Architecture for Web-Based Applications....Pages 1046-1050
A PKI-Based Scalable Security Infrastructure for Scalable Grid....Pages 1051-1054
A Layered Grid User Expression Model in Grid User Management....Pages 1055-1058
A QoS-Based Multicast Algorithm for CSCW in IP/DWDM Optical Internet....Pages 1059-1062
An Evolutionary Constraint Satisfaction Solution for over the Cell Channel Routing....Pages 1063-1066
Back Matter....Pages -

Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, New York University, NY, USA
Doug Tygar, University of California, Berkeley, CA, USA
Moshe Y. Vardi, Rice University, Houston, TX, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany

3033

Springer Berlin Heidelberg New York Hong Kong London Milan Paris Tokyo

Minglu Li, Xian-He Sun, Qianni Deng, Jun Ni (Eds.)

Grid and Cooperative Computing
Second International Workshop, GCC 2003
Shanghai, China, December 7-10, 2003
Revised Papers, Part II

Springer

eBook ISBN: 3-540-24680-0
Print ISBN: 3-540-21993-5

©2005 Springer Science + Business Media, Inc.

Print ©2004 Springer-Verlag Berlin Heidelberg All rights reserved

No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher

Created in the United States of America

Visit Springer's eBookstore at: http://ebooks.springerlink.com
and the Springer Global Website Online at: http://www.springeronline.com

Preface

Grid and cooperative computing has emerged as a new frontier of information technology. It aims to share and coordinate distributed and heterogeneous network resources to achieve performance and functionality that cannot otherwise be achieved. This volume contains the papers presented at the 2nd International Workshop on Grid and Cooperative Computing, GCC 2003, which was held in Shanghai, P.R. China, during December 7–10, 2003. GCC is designed to serve as a forum for presenting current and future work and for exchanging research ideas among researchers, developers, practitioners, and users in Grid computing, Web services, and cooperative computing, including theory and applications.

For this workshop, we received over 550 paper submissions from 22 countries and regions. All the papers were peer-reviewed in depth and qualitatively graded on their relevance, originality, significance, presentation, and overall appropriateness for acceptance. Any concerns raised were discussed by the program committee. The organizing committee selected 176 papers for conference presentation (full papers) and 173 submissions for poster presentation (short papers). The papers included herein represent the forefront of research from China, USA, UK, Canada, Switzerland, Japan, Australia, India, Korea, Singapore, Brazil, Norway, Greece, Iran, Turkey, Oman, Pakistan and other countries. More than 600 attendees participated in the technical sessions and the exhibition of the workshop.

The success of GCC 2003 was made possible by the collective efforts of many people and organizations. We would like to express our special thanks to the Ministry of Education of P.R. China and the municipal government of Shanghai. We also thank IBM, Intel, Platform, HP, Dawning and Lenovo for their generous support. Without the extensive support from many communities, we would not have been able to hold this successful workshop. Moreover, our thanks go to Springer-Verlag for its assistance in putting the proceedings together.

We would like to take this opportunity to thank all the authors, many of whom traveled great distances to participate in this workshop and make their valuable contributions. We would also like to express our gratitude to the program committee members and all the other reviewers for the time and work they put into the thorough review of the large number of papers submitted. Last, but not least, our thanks also go to all the workshop staff for the great job they did in making the local arrangements and organizing an attractive social program.

December 2003

Minglu Li, Xian-He Sun, Qianni Deng, Jun Ni

Conference Committees

Honorary Chair
Qinping Zhao (MOE, China)

Steering Committee
Guojie Li (CCF, China)
Weiping Shen (Shanghai Jiao Tong University, China)
Huanye Sheng (Shanghai Jiao Tong University, China)
Zhiwei Xu (IEEE Beijing Section, China)
Liang-Jie Zhang (IEEE Computer Society, USA)
Xiaodong Zhang (NSF, USA)

General Co-chairs
Minglu Li (Shanghai Jiao Tong University, China)
Xian-He Sun (Illinois Institute of Technology, USA)

Program Co-chairs
Qianni Deng (Shanghai Jiao Tong University, China)
Jun Ni (University of Iowa, USA)

Panel Chair
Hai Jin (Huazhong University of Science and Technology, China)


Program Committee Members
Yaodong Bi (University of Scranton, USA)
Wentong Cai (Nanyang Technological University, Singapore)
Jian Cao (Shanghai Jiao Tong University, China)
Jiannong Cao (Hong Kong Polytechnic University, China)
Guo-Liang Chen (University of Science and Technology of China, China)
Jian Chen (South Australia University, Australia)
Xuebin Chi (Computer Network Information Center, CAS, China)
Qianni Deng (Shanghai Jiao Tong University, China)
Xiaoshe Dong (Xi’an Jiao Tong University, China)
Joseph Fong (City University of Hong Kong)
Yuxi Fu (Shanghai Jiao Tong University, China)
Guangrong Gao (University of Delaware, Newark, USA)
Yadong Gui (Shanghai Supercomputing Center, China)
Minyi Guo (University of Aizu, Japan)
Jun Han (Swinburne University of Technology, Australia)
Yanbo Han (Institute of Computing Technology, CAS, China)
Jinpeng Huai (Beihang University, China)
Weijia Jia (City University of Hong Kong)
ChangJun Jiang (Tongji University, China)
Hai Jin (Huazhong University of Science and Technology, China)
Francis Lau (University of Hong Kong)
Keqin Li (State University of New York, USA)
Minglu Li (Shanghai Jiao Tong University, China)
Qing Li (City University of Hong Kong)
Xiaoming Li (Peking University, China)
Xinda Lu (Shanghai Jiao Tong University, China)
Junzhou Luo (Southeast University, China)
Fanyuan Ma (Shanghai Jiao Tong University, China)
Dan Meng (Institute of Computing Technology, CAS, China)
Xiangxu Meng (Shandong University, China)
Jun Ni (University of Iowa, USA)
Lionel M. Ni (Hong Kong University of Science & Technology)
Yi Pan (Georgia State University, USA)
Depei Qian (Xi’an Jiao Tong University, China)
Yuzhong Qu (Southeast University, China)
Hong Shen (Advanced Institute of Science & Technology, Japan)
Xian-He Sun (Illinois Institute of Technology, USA)
Huaglory Tianfield (Glasgow Caledonian University, UK)
Weiqin Tong (Shanghai University, China)
Cho-Li Wang (University of Hong Kong)
Frank Wang (London Metropolitan University, UK)
Jie Wang (Stanford University, USA)
Shaowen Wang (University of Iowa, USA)
Xingwei Wang (Northeastern University, China)
Jie Wu (Florida Atlantic University, USA)
Zhaohui Wu (Zhejiang University, China)
Nong Xiao (National University of Defense Technology, China)
Xianghui Xie (Jiangnan Institute of Computing Technology, China)
Chengzhong Xu (Wayne State University, USA)
Zhiwei Xu (Institute of Computing Technology, CAS, China)
Guangwen Yang (Tsinghua University, China)
Laurence Tianruo Yang (St. Francis Xavier University, Canada)
Qiang Yang (Hong Kong University of Science & Technology)
Jinyuan You (Shanghai Jiao Tong University, China)
Haibiao Zeng (Sun Yat-Sen University, China)
Ling Zhang (South China University of Technology, China)
Xiaodong Zhang (NSF, USA and College of William and Mary, USA)
Wu Zhang (Shanghai University, China)
Weimin Zheng (Tsinghua University, China)
Aoying Zhou (Fudan University, China)
Wanlei Zhou (Deakin University, Australia)
Jianping Zhu (University of Akron, USA)
Hai Zhuge (Institute of Computing Technology, CAS, China)

Organization Committee
Xinda Lu (Chair) (Shanghai Jiao Tong University, China)
Jian Cao (Shanghai Jiao Tong University, China)
Ruonan Rao (Shanghai Jiao Tong University, China)
Meiju Chen (Shanghai Jiao Tong University, China)
An Yang (Shanghai Jiao Tong University, China)
Zhihua Su (Shanghai Jiao Tong University, China)
Feilong Tang (Shanghai Jiao Tong University, China)
Jiadi Yu (Shanghai Jiao Tong University, China)


Will Globus dominate Grid computing as Windows dominated in PCs? If not, what will the next Grid toolkits look like?

Panel Chair
Hai Jin, Huazhong University of Science and Technology, China

Panelists
Wolfgang Gentzsch, Sun Microsystems, Inc., USA
Satoshi Matsuoka, Tokyo Institute of Technology, Japan
Carl Kesselman, University of Southern California, USA
Andrew A. Chien, University of California at San Diego, USA, achien@ucsd.edu
Xian-He Sun, Illinois Institute of Technology, USA
Richard Wirt, Intel Corporation, USA
Zhiwei Xu, Institute of Computing Technology, CAS, China
Francis Lau, University of Hong Kong
Huaglory Tianfield, Glasgow Caledonian University, UK

Table of Contents, Part II

Session 6: Advanced Resource Management, Scheduling, and Monitoring

Synthetic Implementations of Performance Data Collection in Massively Parallel Systems Chu J. Jong, Arthur B. Maccabe

1

GMA+ – A GMA-Based Monitoring and Management Infrastructure for Grid Chuan He, Zhihui Du, San-li Li

10

A Parallel Branch–and–Bound Algorithm for Computing Optimal Task Graph Schedules Udo Hönig, Wolfram Schiffmann

18

Selection and Advanced Reservation of Backup Resources for High Availability Service in Computational Grid Chunjiang Li, Nong Xiao, Xuejun Yang

26

An Online Scheduling Algorithm for Grid Computing Systems Hak Du Kim, Jin Suk Kim

34

A Dynamic Job Scheduling Algorithm for Computational Grid Jian Zhang, Xinda Lu

40

An Integrated Management and Scheduling Scheme for Computational Grid Ran Zheng, Hai Jin

48

Multisite Task Scheduling on Distributed Computing Grid Weizhe Zhang, Hongli Zhang, Hui He, Mingzeng Hu

57

Adaptive Job Scheduling for a Service Grid Using a Genetic Algorithm Yang Gao, Hongqiang Rang, Frank Tong, Zongwei Luo, Joshua Huang

65

Resource Scheduling Algorithms for Grid Computing and Its Modeling and Analysis Using Petri Net Yaojun Han, Changjun Jiang, You Fu, Xuemei Luo

73

Architecture of Grid Resource Allocation Management Based on QoS Xiaozhi Wang, Junzhou Luo

81

An Improved Ganglia-Like Clusters Monitoring System Wenguo Wei, Shoubin Dong, Ling Zhang, Zhengyou Liang

89

Effective OpenMP Extensions for Irregular Applications on Cluster Environments Minyi Guo, Jiannong Cao, Weng-Long Chang, Li Li, Chengfei Liu

97

A Scheduling Approach with Respect to Overlap of Computing and Data Transferring in Grid Computing Changqin Huang, Yao Zheng, Deren Chen

105

A Deadline and Budget Constrained Cost-Time Optimization Algorithm for Scheduling Dependent Tasks in Grid Computing Haolin Feng, Guanghua Song, Yao Zheng, Jun Xia

113

A Load Balancing Algorithm for Web Based Server Grids Shui Yu, John Casey, Wanlei Zhou

121

Flexible Intermediate Library for MPI-2 Support on an SCore Cluster System Yuichi Tsujita

129

Resource Management and Scheduling in Manufacturing Grid Lilan Liu, Tao Yu, Zhanbei Shi, Minglun Fang

137

A New Task Scheduling Algorithm in Distributed Computing Environments Jian-Jun Han, Qing-Hua Li

141

GridFerret: Grid Monitoring System Based on Mobile Agent Juan Fang, Shu-Jie Zhang, Rui-Hua Di, He Huang

145

Grid-Based Resource Management of Naval Weapon Systems Bin Zeng, Tao Hu, ZiTang Li

149

A Static Task Scheduling Algorithm in Grid Computing Dan Ma, Wei Zhang

153

A New Agent-Based Distributed Model of Grid Service Advertisement and Discovery Dan Ma, Wei Zhang, Hong-jun Zhang

157

IMCAG: Infrastructure for Managing and Controlling Agent Grid Jun Hu, Ji Gao

161

A Resource Allocation Method in the Neural Computation Platform Zhuo Lai, Jiangang Yang, Hongwei Shan

166

An Efficient Clustering Method for Retrieval of Large Image Databases Yu-Xiang Xie, Xi-Dao Luan, Ling-Da Wu, Song-Yang Lao, Lun-Guo Xie

170

Research on Adaptable Replication Protocol Dong Zhao, Ya-wei Li, Ming-Tian Zhou

174

Co-operative Monitor Web Page Based on MD5 Guohun Zhu, YuQing Miao

179

Collaboration-Based Architecture of Flexible Software Configuration Management System Ying Ding, Weishi Zhang, Lei Xu

183

The Research of Mobile Agent Security Xiaobin Li, Aijuan Zhang, Jinfei Sun, Zhaolin Yin

187

Research of Information Resources Integration and Shared in Digital Basin Xiaofeng Zhou, Zhijian Wang, Ping Ai

191

Scheduling Model in Global Real-Time High Performance Computing with Network Calculus Yafei Hou, Shi Yong Zhang, YiPing Zhong

195

CPU Schedule in Programmable Routers: Virtual Service Queuing with Feedback Algorithm Tieying Zhu

199

Research on Information Platform of Virtual Enterprise Based on Web Services Technology Chao Young, Jiajin Le

203

A Reliable Grid Messaging Service Based on JMS Ruonan Rao, Xu Cai, Ping Hao, Jinyuan You

207

A Feedback and Investigation Based Resources Discovery and Management Model on Computational Grid Peng Ji, Junzhou Luo

211

Moment Based Transfer Function Design for Volume Rendering Huawei Hou, Jizhou Sun, Jiawan Zhang

215

Grid Monitoring and Data Visualization Yi Chi, Shoubao Yang, Zheng Feng

219

An Economy Driven Resource Management Architecture Based on Mobile Agent Peng Wan, Wei-Yong Zhang, Tian Chen

223

Decentralized Computational Market Model for Grid Resource Management Qianfei Fu, Shoubao Yang, Maosheng Li, Junmao Zhun

227

A Formal Data Model and Algebra for Resource Sharing in Grid Qiujian Sheng, Zhongzhi Shi

231

An Efficient Load Balance Algorithm in Cluster-Based Peer-to-Peer System Ming-Hong Shi, Yong-Jun Luo, Ying-Cai Bai

236

Resource Information Management of Spatial Information Grid Deke Guo, Honghui Chen, Xueshan Luo

240

An Overview of CORBA-Based Load Balancing Jian Shu, Linlan Liu, Shaowen Song

244

Intelligence Balancing for Communication Data Management in Grid Computing Jong Sik Lee

250

On Mapping and Scheduling Tasks with Synchronization on Clusters of Machines Bassel R. Arafeh

254

An Efficient Load Balancing Algorithm on Distributed Networks Okbin Lee, Sangho Lee, Ilyong Chung

259

Session 7: Network Communication and Information Retrieval

Optimal Methods for Object Placement in En-Route Web Caching for Tree Networks and Autonomous Systems Keqiu Li, Hong Shen

263

A Framework of Tool Integration for Internet-Based E-commerce Jianming Yong, Yun Yang

271

Scalable Filtering of Well-Structured XML Message Stream Weixiong Rao, Yingjian Chen, Xinquan Zhang, Fanyuan Ma

279

Break a New Ground on Programming in Web Client Side Jianjun Zhang, Mingquan Zhou

287

An Adaptive Mixing Audio Gateway in Heterogeneous Networks for ADMIRE System Tao Huang, Xiangning Yu

294

Kernel Content-Aware QoS for Web Clusters Zeng-Kai Du, Jiu-bin Ju

303

A Collaborative Multimedia Authoring System Mee Young Sung, Do Hyung Lee

311

Research of Satisfying Atomic and Anonymous Electronic Commerce Protocol Jie Tang, Juan-Zi Li, Ke-Hong Wang, Yue-Ru Cai

319

Network Self-Organizing Information Exploitation Model Based on GCA Yujun Liu, Dianxun Shuai, Weili Han

327

Admire – A Prototype of Large Scale E-collaboration Platform Tian Jin, Jian Lu, XiangZhi Sheng

335

A Most Popular Approach of Predictive Prefetching on a WAN to Efficiently Improve WWW Response Times Christos Bouras, Agisilaos Konidaris, Dionysios Kostoulas

344

Applications of Server Performance Control with Simple Network Management Protocol Yijiao Yu, Qin Liu, Liansheng Tan

352

Appcast – A Low Stress and High Stretch Overlay Protocol V. Radha, Ved P Gulati, Arun K Pujari

360

Communication Networks: States of the Arts Xiaolu Zuo

372

DHCS: A Case of Knowledge Share in Cooperative Computing Environment Shui Yu, Le Yun Pan, Futai Zou, Fan Yuan Ma

380

Improving the Performance of Equalization in Communication Systems Wanlei Zhou, Hua Ye, Lin Ye

388

Moving Communicational Supervisor Control System Based on Component Technology Song Yu, Yan-Rong Jie

396

A Procedure Search Mechanism in OGSA-Based GridRPC Systems Yue-zhuo Zhang, Yong-zhong Huang, Xin Chen

400

An Improved Network Broadcasting Method Based on Gnutella Network Zupeng Li, Xiubin Zhao, Daoyin Huang, Jianhua Huang

404

Some Conclusions on Cayley Digraphs and Their Applications to Interconnection Networks Wenjun Xiao, Behrooz Parhami

408

Multifractal Characteristic Quantities of Network Traffic Models Donglin Liu, Dianxun Shuai

413

Network Behavior Analysis Based on a Computer Network Model Weili Han, Dianxun Shuai, Yujun Liu

418

Cutting Down Routing Overhead in Mobile Ad Hoc Networks Jidong Zhong, Shangteng Huang

422

Improving Topology-Aware Routing Efficiency in Chord Dongfeng Chen, Shoubao Yang

426

Two Extensions to NetSolve System Jianhua Chen, Wu Zhang, Weimin Shao

430

A Route-Based Composition Language for Service Cooperation Jianguo Xing

434

To Manage Grid Using Dynamically Constructed Network Management Concept: An Early Thought Zhongzhi Luan, Depei Qian, Weiguo Wu, Tao Liu

438

Design of VDSL Networks for the High Speed Internet Services Hyun Yoe, Jaejin Lee

442

The Closest Vector Problem on Some Lattices Haibin Kan, Hong Shen, Hong Zhu

446

Proposing a New Architecture for Adaptive Active Network Control and Management System Mahdi Jalili-Kharaajoo, Alireza Dehestani, Hassan Motallebpour

450

A Path Based Internet Cache Design for GRID Application Hyuk Soo Jang, Kyong Hoon Min, Won Seok Jou, Yeonseung Ryu, Chung Ki Lee, Seok Won Hong

455

On the Application of Computational Intelligence Methods on Active Networking Technology Mahdi Jalili-Kharaajoo

459

Session 8: Grid QoS

Grid Computing for the Masses: An Overview Kaizar Amin, Gregor von Laszewski, Armin R. Mikler

464

A Multiple-Neighborhoods-Based Simulated Annealing Algorithm for Timetable Problem He Yan, Song-Nian Yu

474

Lattice Framework to Implement OGSA: Its Constructs and Composition Scenario Hui Liu, Minglu Li, Jiadi Yu, Lei Cao, Ying Li, Wei Jin, Qi Qian

482

Moving Grid Systems into the IPv6 Era Sheng Jiang, Piers O’Hanlon, Peter Kirstein

490

MG-QoS: QoS-Based Resource Discovery in Manufacturing Grid Zhanbei Shi, Tao Yu, Lilan Liu

500

An Extension of Grid Service: Grid Mobile Service Wei Zhang, Jun Zhang, Dan Ma, Benli Wang, Yun Tao Chen

507

Supplying Instantaneous Video-on-Demand Services Based on Grid Computing Xiao-jian He, Xin-huai Tang, Jinyuan You

513

A Grid Service Lifecycle Management Scheme Jie Qiu, Haiyan Yu, Shuoying Chen, Li Cha, Wei Li, Zhiwei Xu

521

An OGSA-Based Quality of Service Framework Rashid Al-Ali, Kaizar Amin, Gregor von Laszewski, Omer Rana, David Walker

529

A Service Management Scheme for Grid Systems Wei Li, Zhiwei Xu, Li Cha, Haiyan Yu, Jie Qiu, Yanzhe Zhang

541

A QoS Model for Grid Computing Based on DiffServ Protocol Wandan Zeng, Guiran Chang, Xingwei Wang, Shoubin Wang, Guangjie Han, Xubo Zhou

549

Design and Implementation of a Single Sign-On Library Supporting SAML (Security Assertion Markup Language) for Grid and Web Services Security Dongkyoo Shin, Jongil Jeong, Dongil Shin

557

Performance Improvement of Information Service Using Priority Driven Method Minji Lee, Wonil Kim, Jai-Hoon Kim

565

HH-MDS: A QoS-Aware Domain Divided Information Service Deqing Zou, Hai Jin, Xingchang Dong, Weizhong Qiang, Xuanhua Shi

573

Grid Service Semigroup and Its Workflow Model Yu Tang, Haifang Zhou, Kaitao He, Luo Chen, Ning Jing

581

A Design of Distributed Simulation Based on GT3 Core Tong Zhang, Chuanfu Zhang, Yunsheng Liu, Yabing Zha

590

A Policy-Based Service-Oriented Grid Architecture Xiangli Qu, Xuejun Yang, Chunmei Gui, Weiwei Fan

597

Adaptable QOS Management in OSGi-Based Cooperative Gateway Middleware Wei Liu, Zhang-long Chen, Shi-liang Tu, Wei Du

604

Design of an Artificial-Neural-Network-Based Extended Metacomputing Directory Service Haopeng Chen, Baowen Zhang

608

Session 9: Algorithm, Economic Model, Theoretical Model of the Grid

Gridmarket: A Practical, Efficient Market Balancing Resource for Grid and P2P Computing Ming Chen, Guangwen Yang, Xuezheng Liu

612

A Distributed Approach for Resource Pricing in Grid Environments Chuliang Weng, Xinda Lu, Qianni Deng

620

Application Modelling Based on Typed Resources Cheng Fu, Jinyuan You

628

A General Merging Algorithm Based on Object Marking Jinlei Jiang, Meilin Shi

636

Charging and Accounting for Grid Computing System Zhengyou Liang, Ling Zhang, Shoubin Dong, Wenguo Wei

644

Integrating New Cost Model into HMA-Based Grid Resource Scheduling Jun-yan Zhang, Fan Min, Guo-wei Yang

652

CoAuto: A Formal Model for Cooperative Processes Jinlei Jiang, Meilin Shi

660

A Resource Model for Large-Scale Non-hierarchy Grid System Qianni Deng, Xinda Lu, Li Chen, Minglu Li

669

A Virtual Organization Based Mobile Agent Computation Model Yong Liu, Cong-fu Xu, Zhaohui Wu, Wei-dong Chen, Yun-he Pan

677

Modeling Distributed Algorithm Using B Shengrong Zou

683

Multiple Viewpoints Based Ontology Integration Kai Zhang, Yunfa Hu, Yu Wang

690

Automated Detection of Design Patterns Zhixiang Zhang, Qing-Hua Li

694

Research on the Financial Information Grid Jiyue Wen, Guiran Chang

698

RCACM: Role-Based Context-Awareness Coordination Model for Mobile Agent Applications Xin-huai Tang, Yaying Zhang, Jinyuan You

702

A Model for Locating Services in Grid Environment Erfan Shang, Zhihui Du, Mei Chen

706

A Grid Service Based Model of Virtual Experiment Liping Shen, Yonggang Fu, Ruimin Shen, Minglu Li

710

Accounting in the Environment of Grid Society Jiulong Shan, Huaping Chen, GuoLiang Chen, Haitao Tian, Xin Chen

715

A Heuristic Algorithm for Minimum Connected Dominating Set with Maximal Weight in Ad Hoc Networks Xinfang Yan, Yugeng Sun, Yanlin Wang

719

Slice-Based Information Flow Graph Wan-Kyoo Choi, Il-Yong Chung

723

Session 10: Semantic Grid and Knowledge Grid

Semantic Rule Service Model: Enabling Intelligence on Grid Architecture Qi Gao, HuaJun Chen, Zhaohui Wu, WeiMing Lin

727

CSCW in Design on the Semantic Web Dazhou Kang, Baowen Xu, Jianjiang Lu, Yingzhou Zhang

736

SIMON: A Multi-strategy Classification Approach Resolving Ontology Heterogeneity – The P2P Meets the Semantic Web Le Yun Pan, Liang Zhang, Fanyuan Ma

744

SkyEyes: A Semantic Browser for the KB-Grid Yuxin Mao, Zhaohui Wu, HuaJun Chen

752

Toward the Composition of Semantic Web Services Jinghai Rao, Xiaomeng Su

760

A Viewpoint of Semantic Description Framework for Service Yuzhong Qu

768

A Novel Approach to Semantics-Based Exception Handling for Service Grid Applications Donglai Li, Yanbo Han, Haitao Hu, Jun Fang, Xue Wang

778

A Semantic-Based Web Service Integration Approach and Tool Hai Zhuge, Jie Liu, Lianhong Ding, Xue Chen

787

A Computing Model for Semantic Link Network Hai Zhuge, Yunchuan Sun, Jie Liu, Xiang Li

795

A Semantic Web Enabled Mediator for Web Service Invocation Lejun Zhu, Peng Ding, Huanye Sheng

803

A Data Mining Algorithm Based on Grid Xue-bai Zang, Xiong-fei Li, Kun Zhao, Xin Guan

807

Prototype a Knowledge Discovery Infrastructure by Implementing Relational Grid Monitoring Architecture (R-GMA) on European Data Grid (EDG) Frank Wang, Na Helian, Yike Guo, Steve Thompson, John Gordon

811

Session 11: Data Remote Access, Storage, and Sharing

The Consistency Mechanism of Meta-data Management in Distributed Storage System Zhaofu Wang, Wensong Zhang, Kun Deng

815

Link-Contention-Aware Genetic Scheduling Using Task Duplication in Grid Environments Wensheng Yao, Xiao Xie, Jinyuan You

822

An Adaptive Meta-scheduler for Data-Intensive Applications Xuanhua Shi, Hai Jin, Weizhong Qiang, Deqing Zou

830

Dynamic Data Grid Replication Strategy Based on Internet Hierarchy Sang-Min Park, Jai-Hoon Kim, Young-Bae Ko, Won-Sik Yoon

838

Preserving Data Consistency in Grid Databases with Multiple Transactions Sushant Goel, Hema Sharda, David Taniar

847

Dart: A Framework for Grid-Based Database Resource Access and Discovery Chang Huang, Zhaohui Wu, Guozhou Zheng, Xiaojun Wu

855

An Optimal Task Scheduling for Cluster Systems Using Task Duplication Xiao Xie, Wensheng Yao, Jinyuan You

863

Towards an Interactive Architecture for Web-Based Databases Changgui Chen, Wanlei Zhou

871

Network Storage Management in Data Grid Environment Shaofeng Yang, Zeyad Ali, Houssain Kettani, Vinti Verma, Qutaibah Malluhi

879

Study on Data Access Technology in Information Grid YouQun Shi, ChunGang Yan, Feng Yue, Changjun Jiang

887

GridTP Services for Grid Transaction Processing Zhengwei Qi, Jinyuan You, Ying Jin, Feilong Tang

891

FTPGrid: A New Paradigm for Distributed FTP System Liutong Xu, Bo Ai

895

Using Data Cube for Mining of Hybrid-Dimensional Association Rules Zhi-jie Li, Fei-xue Huang, Dong-qing Zhou, Peng Zhang

899

Knowledge Sharing by Grid Technology Bangyong Liang, Juan-Zi Li, Ke-Hong Wang

903

A Security Access Control Mechanism for a Multi-layer Heterogeneous Storage Structure Shiguang Ju, Héctor J. Hernández, Lan Zhang

907

Investigating the Role of Handheld Devices in the Accomplishment of Grid-Enabled Analysis Environment Ashiq Anjum, Arshad Ali, Tahir Azim, Ahsan Ikram, Julian J. Bunn, Harvey B. Newman, Conrad Steenberg, Michael Thomas

913

Session 12: Computer-Supported Cooperative Work and Cooperative Middleware

A TMO-Based Object Group Model to Structuring Replicated Real-Time Objects for Distributed Real-Time Applications Chang-Sun Shin, Su-Chong Joo, Young-Sik Jeong

918

Fuzzy Synthesis Evaluation Improved Task Distribution in WfMS Xiao-Guang Zhang, Jian Cao, Shensheng Zhang

927

A Simulation Study of Job Workflow Execution Models over the Grid Yuhong Feng, Wentong Cai, Jiannong Cao

935

An Approach to Distributed Collaboration Problem with Conflictive Tasks Jingping Bi, Qi Wu, Zhongcheng Li

944

Temporal Problems in Service-Based Workflows Zhen Yu, Zhaohui Wu, ShuiGuang Deng, Qi Gao

954

iCell: Integration Unit in Enterprise Cooperative Environment Ruey-Shyang Wu, Shyan-Ming Yuan, Anderson Liang, Daphne Chyan

962

The Availability Semantics of Predicate Data Flow Diagram Xiaolei Gao, Huaikou Miao, Shaoying Liu, Ling Liu

970

Virtual Workflow Management System in Grid Environment ShuiGuang Deng, Zhaohui Wu, Qi Gao, Zhen Yu

978

Research of Online Expandability of Service Grid Yuan Wang, Zhiwei Xu, Yuzhong Sun

986

Modelling Cooperative Multi-agent Systems Lijun Shan, Hong Zhu

994

GHIRS: Integration of Hotel Management Systems by Web Services Yang Xiang, Wanlei Zhou, Morshed Chowdhury

1002

Cooperative Ants Approach for a 2D Navigational Map of 3D Virtual Scene Jiangchun Wang, Shensheng Zhang

1010

Workflow Interoperability – Enabling Online Approval in E-government Hua Xin, Fu-ren Xue

1018

A Multicast Routing Algorithm for CSCW Xiong-fei Li, Dandan Huan, Yuanfang Dong, Xin Zhou

1022

A Multi-agent System Based on ECA Rule Xiaojun Zhou, Jian Cao, Shensheng Zhang

1026

A Hybrid Algorithm of n-OPT and GA to Solve Dynamic TSP Zhao Liu, Lishan Kang

1030

The Application Research of Role-Based Access Control Model in Workflow Management System Baoyi Wang, Shaomin Zhang, Xiaodong Xia

1034

Research and Design of Remote Education System Based on CSCW Chunzhi Wang, Miao Shao, Jing Xia, Huachao Chen

1038

Data and Interaction Oriented Workflow Execution Wan-Chun Dou, Juan Sun, Da-Gang Yang, Shi-Jie Cai

1042

XCS System: A New Architecture for Web-Based Applications Yijian Wu, Wenyun Zhao

1046

A PKI-Based Scalable Security Infrastructure for Scalable Grid Lican Huang, Zhaohui Wu

1051

A Layered Grid User Expression Model in Grid User Management Limin Liu, Zhiwei Xu, Wei Li

1055

A QoS-Based Multicast Algorithm for CSCW in IP/DWDM Optical Internet Xingwei Wang, Hui Cheng, Jia Li, Min Huang, Ludi Zheng

1059

An Evolutionary Constraint Satisfaction Solution for over the Cell Channel Routing Ahmet Ünveren, Adnan Acan

1063

Author Index

1067

Table of Contents, Part I

Vega Grid: A Computer Systems Approach to Grid Research Zhiwei Xu

1

Problems of and Mechanisms for Instantiating Virtual Organizations Carl Kesselman

2

Grid Computing: The Next Stage of the Internet Irving Wladawsky-Berger

3

Making Grid Computing Real for High Performance and Enterprise Computing Richard Wirt

4

Grid Computing for Enterprise and Beyond Songnian Zhou

5

Semantic Grid: Scientific Issues, Methodology, and Practice in China Hai Zhuge

6

Grid Computing, Vision, Strategy, and Technology Wolfgang Gentzsch

7

Towards a Petascale Research Grid Infrastructure Satoshi Matsuoka

8

The Microgrid: Enabling Scientific Study of Dynamic Grid Behavior Andrew A. Chien

9

On-Demand Business Collaboration Enablement with Services Computing Liang-Jie Zhang

10

Session 1: Grid Application

Multidisciplinary Design Optimization of Aero-craft Shapes by Using Grid Based High Performance Computational Framework Hong Liu, Xi-li Sun, Qianni Deng, Xinda Lu

11

A Research on the Framework of Grid Manufacturing Li Chen, Hong Deng, Qianni Deng, Zhenyu Wu

19

Large-Scale Biological Sequence Assembly and Alignment by Using Computing Grid Wei Shi, Wanlei Zhou

26

Implementation of Grid-Enabled Medical Simulation Applications Using Workflow Techniques Junwei Cao, Jochen Fingberg, Guntram Berti, Jens Georg Schmidt

34

A New Overlay Network Based on CAN and Chord Wenyuan Cai, Shuigeng Zhou, Linhao Xu, Weining Qian, Aoying Zhou

42

An Engineering Computation Oriented Visual Grid Framework Guiyi Wei, Yao Zheng, Jifa Zhang, Guanghua Song

51

Interaction Compatibility: An Essential Ingredient for Service Composition Jun Han

59

A Distributed Media Service System Based on Globus Data-Management Technologies Xiang Yu, Shoubao Yang, Yu Hong

67

Load Balancing between Heterogeneous Computing Clusters Siu-Cheung Chau, Ada Wai-Chee Fu

75

“Gridifying” Aerodynamic Design Problem Using GridRPC Quoc-Thuan Ho, Yew-Soon Ong, Wentong Cai

83

A WEB-GIS Based Urgent Medical Rescue CSCW System for SARS Disease Prevention Xiaolin Lu

91

MASON: A Model for Adapting Service-Oriented Grid Applications Gang Li, Jianwu Wang, Jing Wang, Yanbo Han, Zhuofeng Zhao, Roland M. Wagner, Haitao Hu

99

Coordinating Business Transaction for Grid Service Feilong Tang, Minglu Li, Jian Cao, Qianni Deng

108

Conceptual Framework for Recommendation System Based on Distributed User Ratings Hyun-Jun Kim, Jason J. Jung, Geun-Sik Jo

115

Grid Service-Based Parallel Finite Element Analysis Guiyi Wei, Yao Zheng, Jifa Zhang

123

The Design and Implementation of the GridLab Information Service Giovanni Aloisio, Massimo Cafaro, Italo Epicoco, Daniele Lezzi, Maria Mirto, Silvia Mocavero

131

Comparison Shopping Systems Based on Semantic Web – A Case Study of Purchasing Cameras Ho-Kyoung Lee, Young-Hoon Yu, Supratip Ghose, Geun-Sik Jo

139

A New Navigation Method for Web Users Jie Yang, Guoqing Wu, Luis Zhu

147

Application Availability Measurement in Computational Grid Chunjiang Li, Nong Xiao, Xuejun Yang

151

Research and Application of Distributed Fusion System Based on Grid Computing Yu Su, Hai Zhao, Wei-ji Su, Gang Wang, Xiao-dan Zhang

155

An Efficient and Self-Configurable Publish-Subscribe System Tao Xue, Boqin Feng

159

The Implementation of the Genetic Optimized Algorithm of Air Craft Geometry Designing Based on Grid Computing Xi-li Sun, Xinda Lu, Qianni Deng

164

Distributed Information Management System for Grid Computing Liping Niu, Xiaojie Yuan, Wentong Cai

168

The Design of Adaptive Platform for Visual-Intensive Applications over the Grid Hui Xiang, Bin Gong, Xiangxu Meng, Xianglong Kong

172

Maintaining Packet Order for the Parallel Switch Yuguo Dong, Binqiang Wang, Yunfei Guo, Jiangxing Wu

176

Grid-Based Process Simulation Technique and Support System Hui Gao, Li Zhang

180

Some Grid Automata for Grid Computing Hao Shen, Yongqiang Sun

184

The Cooperation of Virtual Enterprise Supported by the Open Agent System Zhaolin Yin, Aijuan Zhang, Xiaobin Li, Jinfei Sun

188

The Granularity Analysis of MPI Parallel Programs Wei-guang Qiao, Guosun Zeng

192

NGG: A Service-Oriented Application Grid Architecture for National Geological Survey Yu Tang, Kaitao He, Zhen Xiang, Yongbo Zhang, Ning Jing

196

Integration of the Distributed Simulation into the OGSA Model Chuanfu Zhang, Yunsheng Liu, Tong Zhang, Yabing Zha

200

An Extendable Grid Simulation Environment Based on GridSim Efeng Lu, Zhihong Xu, Jizhou Sun

205

The Architecture of Traffic Information Grid Zhaohui Zhang, Qing Zhi, Guosun Zeng, Changjun Jiang

209

Construction Scheme of Meteorological Application Grid (MAG) Xuesheng Yang, Weiming Zhang, Dehui Chen

213

OGSA Based E-learning System: An Approach to Build Next Generation of Online Education Hui Wang, Xueli Yu, Li Wang, Xu Liu

217

Multimedia Delivery Grid: A Novel Multimedia Delivery Scheme ZhiHui Lv, Jian Yang, ShiYong Zhang, YiPing Zhong

221

The System for Computing of Molecule Structure on the Computational Grid Environment Yongmei Lei, Weimin Xu, Bingqiang Wang

225

An Efficient Parallel Crawler in Grid Environment Shoubin Dong, Xiaofeng Lu, Ling Zhang, Kejing He

229

The Development and Application of Numerical Packages Based on NetSolve Haiying Cheng, Wu Zhang, Yunfu Shen, Anping Song

233

Grid-Based Biological Computation Service Environment Jing Zhu, Guangwen Yang, Weimin Zheng, Tao Zhu, Meiming Shen, Li’an Qiao, Xiangjun Liu

237

CIMES: A Collaborative Image Editing System for Pattern Design Xianghua Xu, Jiajun Bu, Chun Chen, Yong Li

242

Campus Grid and Its Application Zhiqun Deng, Guanzhong Dai

247

The Realization Methods of PC Cluster Experimental Platform in Linux Jiang-ling Zhang, Shi-jue Zheng, Yang Qing

251

Coarse-Grained Distributed Parallel Programming Interface for Grid Computing Yongwei Wu, Qing Wang, Guangwen Yang, Weiming Zheng

255

User Guided Parallel Programming Platform Yong Liu, Xinda Lu, Qianni Deng

259

A High-Performance Intelligent Integrated Data Services System in Data Grid Bin Huang, Xiaoning Peng, Nong Xiao, Bo Liu

262

Architecting CORBA-Based Distributed Applications Min Cao, Jiannong Cao, Geng-Feng Wu, Yan-Yan Wang

266

Design of NGIS: The Next Generation Internet Server for Future E-society Chong-Won Park, Myung-Joon Kim, Jin-Won Park

269

Video-on-Demand System Using Multicast and Web-Caching Techniques SeokHoon Kang

273

Session 2: Peer to Peer Computing

PeerBus: A Middleware Framework towards Interoperability among P2P Data Sharing Systems Linhao Xu, Shuigeng Zhou, Keping Zhao, Weining Qian, Aoying Zhou

277

Ptops Index Server for Advanced Search Performance of P2P System with a Simple Discovery Server Boon-Hee Kim, Young-Chan Kim

285

Improvement of Routing Structure in P2P Overlay Networks Jinfeng Hu, Yinghui Wu, Ming Li, Weimin Zheng

292

Overlay Topology Matching in P2P Systems Yunhao Liu, Xiao Li, Lionel M. Ni, Yunhuai Liu

300

Effect of Links on DHT Routing Algorithms Futai Zou, Liang Zhang, Yin Li, Fanyuan Ma

308

A Peer-to-Peer Approach to Task Scheduling in Computation Grid Jiannong Cao, Oscar M.K. Kwong, Xianbing Wang, Wentong Cai

316

Efficient Search in Gnutella-Like “Small-World” Peer-to-Peer Systems Dongsheng Li, Xicheng Lu, Yijie Wang, Nong Xiao

324

Dominating-Set-Based Searching in Peer-to-Peer Networks Chunlin Yang, Jie Wu

332

GFS-Btree: A Scalable Peer-to-Peer Overlay Network for Lookup Service Qinghu Li, Jianmin Wang, Jiaguang Sun

340

An Approach to Content-Based Approximate Query Processing in Peer-to-Peer Data Systems Chaokun Wang, Jianzhong Li, Shengfei Shi

348

A Hint-Based Locating and Routing Mechanism in Peer-to-Peer File Sharing Systems Hairong Jin, Shanping Li, Tianchi Ma, Liang Qian

356

Content Location Using Interest-Based Subnet in Peer-to-Peer System Guangtao Xue, Jinyuan You, Xiaojian He

363

Trust and Cooperation in Peer-to-Peer Systems Junjie Jiang, Haihuan Bai, Weinong Wang

371

A Scalable Peer-to-Peer Lookup Model Haitao Chen, Chuanfu Xu, Zunguo Huang, Huaping Hu, Zhenghu Gong

379

Characterizing Peer-to-Peer Traffic across Internet Yunfei Zhang, Lianhong Lei, Changjia Chen

388

Improving the Objects Set Availability in the P2P Environment by Multiple Groups Kang Chen, Shuming Shi, Guangwen Yang, Meiming Shen, Weimin Zheng

396

PBiz: An E-business Model Based on Peer-to-Peer Network Shudong Chen, Zengde Wu, Wei Zhang, Fanyuan Ma

404

P2P Overlay Networks of Constant Degree Guihai Chen, Chengzhong Xu, Haiying Shen, Daoxu Chen

412

An Efficient Contents Discovery Mechanism in Pure P2P Environments In-suk Kim, Yong-hyeog Kang, Young Ik Eom

420

Distributed Computation for Diffusion Problem in a P2P-Enhanced Computing System Jun Ni, Lili Huang, Tao He, Yongxiang Zhang, Shaowen Wang, Boyd M. Knosp, Chinglong Lin

428

Applications of Peer to Peer Technology in CERNET Chang-ji Wang, Jian-Ping Wu

436

PSMI: A JXTA 2.0-Based Infrastructure for P2P Service Management Using Web Service Registries Feng Yang, Shouyi Zhan, Fuxiang Shen

440

CIPS-P2P: A Stable Coordinates-Based Integrated-Paid-Service Peer-to-Peer Infrastructure Yunfei Zhang, Shaolong Li, Changjia Chen, Shu Zhang

446

A Multicast Routing Algorithm for P2P Networks Tingyao Jiang, Aling Zhong

452

Leveraging Duplicates to Improve File Availability of P2P Storage Systems Min Qu, Yafei Dai, Mingzhong Xiao

456

Distributing the Keys into P2P Network Shijie Zhou, Zhiguang Qin, Jinde Liu

460

SemanticPeer: An Ontology-Based P2P Lookup Service Jing Tian, Yafei Dai, Xiaoming Li

464

Authentication and Access Control in P2P Network Yuqing Zhang, Dehua Zhang

468

Methodology Discussion of Grid Peer-Peer Computing Weifen Qu, Qingchun Meng, Chengbing Wei

471

PipeSeeU: A Scalable Peer-to-Peer Multipoint Video Conference System Bo Xie, Yin Liu, Ruimin Shen, Wenyin Liu, Changjun Jiang

475

Session 3: Grid Architectures

Vega Grid: A Computer Systems Approach to Grid Research Zhiwei Xu, Wei Li

480

RB-GACA: A RBAC Based Grid Access Control Architecture Weizhong Qiang, Hai Jin, Xuanhua Shi, Deqing Zou, Hao Zhang

487

GriDE: A Grid-Enabled Development Environment Simon See, Jie Song, Liang Peng, Appie Stoelwinder, Hoon Kang Neo

495

Information Grid Toolkit: Infrastructure of Shanghai Information Grid Xinhua Lin, Qianni Deng, Xinda Lu

503

On-Demand Services Composition and Infrastructure Management Jun Peng, Jie Wang

511

GridDaen: A Data Grid Engine Nong Xiao, Dongsheng Li, Wei Fu, Bin Huang, Xicheng Lu

519

Research on Security Architecture and Protocols of Grid Computing System Xiangming Fang, Shoubao Yang, Leitao Guo, Lei Zhang

529

A Multi-agent System Architecture for End-User Level Grid Monitoring Using Geographic Information Systems (MAGGIS): Architecture and Implementation Shaowen Wang, Anand Padmanabhan, Yan Liu, Ransom Briggs, Jun Ni, Tao He, Boyd M. Knosp, Yasar Onel

536

An Architecture of Game Grid Based on Resource Router Yu Wang, Enhua Tan, Wei Li, Zhiwei Xu

544

Scalable Resource Management and Load Assignment for Grid and Peer-to-Peer Services Xuezheng Liu, Ming Chen, Guangwen Yang, Dingxing Wang

552

Research on the Application of Multi-agent Technology to Spatial Information Grid Yan Ren, Cheng Fang, Honghui Chen, Xueshan Luo

560

An Optimal Method of Diffusion Algorithm for Computational Grid Rong Chen, Yadong Gui, Ji Gao

568

A Reconfigurable High Availability Infrastructure in Cluster for Grid Wen Gao, Xinyu Liu, Lei Wang, Takashi Nanya

576

An Adaptive Information Grid Architecture for Recommendation System M. Lan, W. Zhou

584

Research on Construction of EAI-Oriented Web Service Architecture Xin Peng, Wenyun Zhao, En Ye

592

GridBR: The Challenge of Grid Computing S.R.R. Costa, L.G. Neves, F. Ayres, C.E. Mendonça, R.S.N. de Bragança, F. Gandour, L.V. Ferreira, M.C.A. Costa, N.F.F. Ebecken

601

Autonomous Distributed Service System: Basic Concepts and Evaluation H. Farooq Ahmad, Kashif Iqbal, Hiroki Suguri, Arshad Ali

608

ShanghaiGrid in Action: The First Stage Projects towards Digital City and City Grid Minglu Li, Hui Liu, Changjun Jiang, Weiqin Tong, Aoying Zhou, Yadong Gui, Hao Zhu, Shui Jiang, Ruonan Rao, Jian Cao, Qianni Deng, Qi Qian, Wei Jin

616

Spatial Information Grid – An Agent Framework Yingwei Luo, Xiaolin Wang, Zhuoqun Xu

624

Agent-Based Framework for Grid Computing Zhihuan Zhang, Shuqing Wang

629

A Hierarchical Grid Architecture Based on Computation/Application Metadata Wan-Chun Dou, Juan Sun, Da-Gang Yang, Shi-Jie Cai

633

A Transparent-to-Outside Resource Management Framework for Computational Grid Ye Zhu, Junzhou Luo

637

A Service-Based Hierarchical Architecture for Parallel Computing in Grid Environment Weiqin Tong, Jingbo Ding, Jianquan Tang, Bo Wang, Lizhi Cai

641

A Grid Computing Framework for Large Scale Molecular Dynamics Simulations WenRui Wang, GuoLiang Chen, HuaPing Chen, Shoubao Yang

645

Principle and Framework of Information Grid Evaluation Hui Li, Xiaolin Li, Zhiwei Xu, Ning Yang

649

Manufacturing Grid: Needs, Concept, and Architecture Yushun Fan, Dazhe Zhao, Liqin Zhang, Shuangxi Huang, Bo Liu

653

Developing a Framework to Implement Security in Web Services Fawaz Amin Alvi, Shakeel A. Khoja, Zohra Jabeen

657

Session 4: Grid Middleware and Toolkits

Computing Pool: A Simplified and Practical Computational Grid Model Peng Liu, Yao Shi, San-li Li

661

Formalizing Service Publication and Discovery in Grid Computing Systems Chuliang Weng, Xinda Lu, Qianni Deng

669

An Improved Solution to I/O Support Problems in Wide Area Grid Computing Environments Bin Wang, Ping Chen, Zhuoqun Xu

677

Agora: Grid Community in Vega Grid Hao Wang, Zhiwei Xu, Yili Gong, Wei Li

685

Sophisticated Interaction – A Paradigm on the Grid Xingwu Liu, Zhiwei Xu

692

A Composite-Event-Based Message-Oriented Middleware Pingpeng Yuan, Hai Jin

700

An Integration Architecture for Grid Resources Minglu Li, Feilong Tang, Jian Cao

708

Component-Based Middleware Platform for Grid Computing Jianmin Zhu, Rong Chen, Guangnan Ni, Yuan Liu

716

Grid Gateway: Message-Passing between Separated Cluster Interconnects Wei Cui, Jie Ma, Zhigang Huo

724

A Model for User Management in Grid Computing Environments Bo Chen, Xuebin Chi, Hong Wu

732

GSPD: A Middleware That Supports Publication and Discovery of Grid Services Feilong Tang, Minglu Li, Jian Cao, Qianni Deng, Jiadi Yu, Zhengwei Qi

738

Partially Evaluating Grid Services by DJmix Hongyan Mao, Linpeng Huang, Yongqiang Sun

746

Integrated Binding Service Model for Supporting Both Naming/Trading and Location Services in Inter/Intra-net Environments Chang-Won Jeong, Su-Chong Joo, Sung-Kook Han

754

Personal Grid Running at the Edge of Internet Bingchen Li, Wei Li, Zhiwei Xu

762

Grid Workflow Based on Performance Evaluation Shao-hua Zhang, Yu-jin Wu, Ning Gu, Wei Wang

770

Research on User Programming Environment in Grid Ge He, Donghua Liu, Zhiwei Xu, Lin Li, Shengliang Xu

778

The Delivery and Accounting Middleware in the ShanghaiGrid Ruonan Rao, Baiyan Li, Minglu Li, Jinyuan You

786

Applying Agent into Web Testing and Evolution Baowen Xu, Lei Xu, Jixiang Jiang

794

Experiences on Computational Program Reuse with Service Mechanism Ping Chen, Bin Wang, Guoshi Xu, Zhuoqun Xu

799

Research and Implementation of the Real-Time Middleware in Open System Jian Peng, Jinde Liu, Tao Yang

803

An Object-Oriented Petri Nets Based Integrated Development Environment for Grid-Based Applications Hongyi Shi, Aihua Ren

809

Some Views on Building Computational Grids Infrastructure Bo Dai, Guiran Chang, Wandan Zeng, Jiyue Wen, Qiang Guo

813

Research on Computing Grid Software Architecture Changyun Li, Gansheng Li, Yin Li

817

Research on Integrating Service in Grid Portal Zheng Feng, Shoubao Yang, Shanjiu Long, Dongfeng Chen, Leitao Guo

821

GSRP: An Application-Level Protocol for Grid Environments Zhiqiang Hou, Donghua Liu, Zhiwei Xu, Wei Li

825

Towards a Mobile Service Mechanism in a Grid Environment Weiqin Tong, Jianquan Tang, Liang Jin, Bo Wang, Yuwei Zong

829

Mobile Middleware Based on Distributed Object Song Chen, Shan Wang, Ming-Tian Zhou

833

Session 5: Web Security and Web Services

On the Malicious Participants Problem in Computational Grid Wenguang Chen, Weimin Zheng, Guangwen Yang

839

Certificate Validation Scheme of Open Grid Service Usage XKMS Namje Park, Kiyoung Moon, Sungwon Sohn, Cheehang Park

849

Distributed IDS Tracing Back to Attacking Sources Wu Liu, Hai-Xin Duan, Jian-Ping Wu, Ping Ren, Li-Hua Lu

859

The Study on Mobile Phone-Oriented Application Integration Technology of Web Services Luqun Li, Minglu Li, Xianguo Cui

867

Group Rekeying Algorithm Using Pseudo-random Functions and Modular Reduction Josep Pegueroles, Wang Bin, Miguel Soriano, Francisco Rico-Novella

875

Semantics and Formalizations of Mission-Aware Behavior Trust Model for Grids Minglu Li, Hui Liu, Lei Cao, Jiadi Yu, Ying Li, Qi Qian, Wei Jin

883

Study on a Secure Access Model for the Grid Catalogue Bing Xie, Xiao-Lin Gui, Qing-Jiang Wang

891

Modeling Trust Management System for Grids Baiyan Li, Wensheng Yao, Jinyuan You

899

Avoid Powerful Tampering by Malicious Host Fangyong Hou, Zhiying Wang, Zhen Liu, Yun Liu

907

Secure Grid-Based Mobile Agent Platform by Instance-Oriented Delegation Tianchi Ma, Shanping Li

916

Authenticated Key Exchange Protocol Secure against Offline Dictionary Attack and Server Compromise Seung Bae Park, Moon Seol Kang, Sang Jun Lee

924

StarOTS: An Efficient Distributed Transaction Recovery Mechanism in the CORBA Component Runtime Environment Yi Ren, Jianbo Guan, Yan Jia, Weihong Han, Quanyuan Wu

932

Web Services Testing, the Methodology, and the Implementation of the Automation-Testing Tool Ying Li, Minglu Li, Jiadi Yu

940

Composing Web Services Based on Agent and Workflow Jian Cao, Minglu Li, Shensheng Zhang, Qianni Deng

948

Structured Object-Z Software Specification Language Xiaolei Gao, Huaikou Miao, Yihai Chen

956

Ontology-Based Intelligent Sensing Action in Golog for Web Service Composition Zheng Dong, Cong Qi, Xiao-fei Xu

964

The Design of an Efficient Kerberos Authentication Mechanism Associated with Directory Systems Cheolhyun Kim, Yeijin Lee, Ilyong Chung

972

A Multi-agent Based Architecture for Network Attack Resistant System Jian Li, Guo-yin Zhang, Guo-chang Gu

980

Design and Implementation of Data Mapping Engine Based on Multi-XML Documents Yu Wang, Liping Yu, Feng Jin, Yunfa Hu

984

Research on the Methods of Search and Elimination in Covert Channels Chang-da Wang, Shiguang Ju, Dianchun Guo, Zhen Yang, Wen-yi Zheng

988

Design and Performance of Firewall System Based on Embedded Computing Yuan-ni Guo, Ren-fa Li

992

OGSA Security Authentication Services Hongxia Xie, Fanrong Meng

996

Detecting Identification of a Remote Web Server via Its Behavioral Characteristics Ke-xin Yang, Jiu-bin Ju

1000

Access Control Architecture for Web Services Shijin Yuan, Yunfa Hu

1004

Formalizing Web Service and Modeling Web Service-Based System Based on Object Oriented Petri Net Xiaofeng Tao, Changjun Jiang

1008

Report about Middleware Beibei Fan, Shisheng Zhu, Peijun Lin

1012

Grid Security Gateway on RADIUS and Packet Filter Jing Cao, BingLiang Lou

1017

A Security Policy Implementation Model in Computational GRID Feng Li, Junzhou Luo

1021

An Approach of Building LinuxCluster-Based Grid Services Yu Ce, Xiao Jian, Sun Jizhou

1026

Dynamic E-commerce Security Based on the Web Services Gongxuan Zhang, Guowei Zuo

1030

Standardization of Page Service Using XSLT Based on Grid System Wanjun Zhang, Yi Zeng, Wei Dong, Guoqing Li, Dingsheng Liu

1034

Secure Super-distribution Protocol for Digital Rights Management in Unauthentic Network Environment Zhaofeng Ma, Boqin Feng

1039

X-NIndex: A High Performance Stable and Large XML Document Query Approach and Experience in TOP500 List Data Shaomei Wu, Xuan Li, Zhihui Du

1043

The Analysis of Authorization Mechanisms in the Grid Shiguang Ju, Zhen Yang, Chang-da Wang, Dianchun Guo

1047

Constructing Secure Web Service Based on XML Shaomin Zhang, Baoyi Wang, Lihua Zhou

1051

ECC Based Intrusion Tolerance for Web Security Xianfeng Zhang, Feng Zhang, Zhiguang Qin, Jinde Liu

1055

Design for Reliable Service Aggregation in an Architectural Environment Xiaoli Zhi, Weiqin Tong

1059

The Anatomy of Web Services Hongbing Wang, Yuzhong Qu, Junyuan Xie

1063

XXXVIII

Table of Contents, Part I

Automated Vulnerability Management through Web Services H. T. Tian, L.S. Huang, J.L. Shan, G.L. Chen

1067

Optimizing Java Based Web Services by Partial Evaluation Lin Lin, Linpeng Huang, Yongqiang Sun

1071

An XML Based General Configuration Language: XGCL Huaifeng Qin, Xingshe Zhou

1075

Modification on Kerberos Authentication Protocol in Grid Computing Environment Rong Chen, Yadong Gui, Ji Gao

1079

A Distributed Honeypot System for Grid Security Geng Yang, Chunming Rong, Yunping Dai

1083

Web Security Using Distributed Role Hierarchy Gunhee Lee, Hongjin Yeh, Wonil Kim, Dong-Kyoo Kim

1087

User Authentication Protocol Based on Human Memorable Password and Using ECC Seung Bae Park, Moon Seol Kang, Sang Jun Lee New Authentication Systems Seung Bae Park, Moon Seol Kang, Sang Jun Lee

1091 1095

Web Proxy Caching Mechanism to Evenly Distribute Transmission Channel in VOD System Backhyun Kim, Iksoo Kim, SeokHoon Kang

1099

Author Index

1103

Synthetic Implementations of Performance Data Collection in Massively Parallel Systems

Chu J. Jong¹ and Arthur B. Maccabe²

¹ School of Information Technology, Illinois State University, IL, USA
² Department of Computer Science, University of New Mexico, NM, USA

Abstract. Most performance tools that run on Massively Parallel (MP) systems do not scale as the number of nodes increases. We studied the scalability problem of MP system performance tools and proposed a solution: replacing the two-level data collection structure with a hierarchical one. To demonstrate that a hierarchical data collection structure solves the scalability problem, we built a synthetic implementation model of performance data collection in MP systems. This paper presents the results of that synthetic implementation.

Keywords: Data collection, response time, split node, performance knee.

1 Introduction

Complex applications, such as genetic analysis, material simulation, and climate modeling [8], require high performance and large computation resources to generate results. MP systems, built with thousands of processors and huge amounts of memory, meet these applications' needs¹. How effectively the computation power of an MP system is used depends mainly on the user's knowledge and sometimes on the compilers or system performance tools. In past years, developers have made significant efforts on parallel performance tools. Recent work has improved many aspects, such as reducing overheads [3], improving accuracy [10], minimizing perturbation [9], and increasing convenience [5], but not the tools' scalability. We use the response time, the time between issuing a user command and receiving a reply, to measure the performance of a tool. The results of performance tools are based on performance data collection and processing. Two main factors contribute to the response time: effective processor utilization and the balance point [6].

¹ ASCI project: ASCI Red had 9,298 Pentium Xeon processors; Blue-Pacific comprised 1,464 RS/6000 nodes with 4 CPUs each; Blue-Mountain had 256 MIPS R10000 processors; and BlueGene/L will have 130,000 advanced microprocessors to achieve 367 teraflops.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 1–9, 2004. © Springer-Verlag Berlin Heidelberg 2004

C.J. Jong and A.B. Maccabe

In MP systems, when the workload incurred by the workers exceeds the server's balance point, the server starts thrashing, which causes processor utilization to drop and response time to increase exponentially. Since most performance tools use a two-level structure (one server and its workers) to collect data, the workload of the server is proportional to the number of workers. When the number of workers reaches the server's limit, thrashing occurs. A multi-level hierarchical data collection structure (one server, workers, and two or more worker-servers) lets the worker-servers share the workload and thus prevents server thrashing, which significantly increases the total number of workers the system can support. In theory, the maximum number of workers that a single server can handle without thrashing is … in a two-level data collection structure and, similarly, … in a multi-level hierarchical data collection structure, where …, …, and … are the server-to-worker ratios of network throughput, memory size, and processing power, respectively. Although some systems provide large ratios [4] to accommodate more workers, these ratios are bounded by the system capacities.

2 Previous Work – Response Time Simulation

We built a time-driven event simulator to simulate the response time behavior of both data collection structures. From our simulation results, we concluded that a hierarchical data collection structure eliminates the response time knee (the point beyond which response time increases exponentially) that two-level data collection structures cannot avoid [7].

3 Response Time Implementation

We used a small cluster of 8 Pentium III 500-MHz PCs to construct our implementation model. We wrote the MPI implementation programs in C; they run on Linux with customized kernel modules. We used a node splitting mechanism to produce a larger virtual-node cluster. This mechanism is a virtualization technique that splits a physical node into a number of virtual nodes. Its basic principle is to divide each resource of a physical node into equal slices and to group one slice of each divided resource to form a virtual node (processor). Although the number of virtual nodes per physical node can theoretically be chosen arbitrarily, in practice the more virtual processors there are, the higher the system overhead and the heavier the channel workload. We applied the node splitting mechanism to seven PCs; the resulting virtual processors serve as workers or worker-servers, and the un-split PC serves as the server. The resulting virtual processor cluster is equivalent to a one-order-of-magnitude scaled-down version of a cluster of 56 single-CPU 500-MHz workers and a quad-CPU 1-GHz server connected by a 1-Gigabit network.
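A minimal sketch of the slicing principle just described follows. It assumes eight slices per PC (inferred from 56 virtual processors on seven PCs) and a hypothetical 256 MB of memory per PC; the `VirtualNode` type and `split_node` function are illustrative names, not part of the authors' kernel modules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualNode:
    host: str          # physical PC that hosts this virtual node
    index: int         # slice number on that PC
    cpu_share: float   # fraction of the host CPU cycles
    memory_mb: float   # slice of the host memory
    net_share: float   # fraction of the host network bandwidth


def split_node(host: str, memory_mb: float, slices: int) -> list:
    """Divide every resource of one physical node into `slices` equal slices and
    group one slice of each resource into a virtual node."""
    if slices < 1:
        raise ValueError("slices must be at least 1")
    share = 1.0 / slices
    return [VirtualNode(host, i, share, memory_mb / slices, share)
            for i in range(slices)]


# Seven PCs are split (eight virtual nodes each, 256 MB assumed per PC);
# the eighth PC stays whole and acts as the server.
cluster = [vn for pc in range(1, 8) for vn in split_node(f"pc{pc}", 256.0, 8)]
print(len(cluster))           # 56
print(cluster[0].memory_mb)   # 32.0
```

A real implementation enforces these shares in the kernel (see the MMU and PMU below). The sketch only records the bookkeeping.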

Synthetic Implementations of Performance Data Collection

3.1 System Modules

Our customized kernel modules give us better control over the system resources. Our pseudo application, pseudo data generating/collection, and pseudo data processing allow us to emulate a system without side effects. Our modules are described below.

System Structure. All PCs are connected by 100-megabit Ethernet. The server processor is the un-split PC; the worker and worker-server processors are virtual processors. Workers and worker-servers are evenly distributed among the seven split PCs; no two worker-servers reside on the same PC, and no two children of the same parent reside on the same PC.

System Control Units.
Memory Management Unit (MMU) - Locks a block of memory for the processors. All memory requests go through it, and it uses paging to handle page faults.
Processor Management Unit (PMU) - Controls the CPU cycles consumed by processors. It suspends a process when that process reaches its CPU cycle limit.
Perturbation Control Unit (PCU) - Controls the application perturbation rate.

Implementation Software.
Processors - One server processor and many worker and/or worker-server processors, with an 8:1 computation ratio between the server and each worker (or worker-server) processor and an 8:4:1 memory ratio among server, worker-servers, and workers.
Server. Runs on the server processor. Its major functions are to interact with the user, set the system topology, monitor process behavior, and fulfill the user's requests. The server collects the response time and calculates global statistics, such as data gathering time, data processing time, data processing rate, and channel time.
Worker. Runs on a worker processor. The worker calculates local statistics, such as response time, data collection time, data processing time, data processing rate, and channel time.
Worker-Server. Runs on a worker-server processor. The worker-server calculates non-local statistics, such as worker-server response time, data gathering time, data processing time, data processing rate, and channel time.
Application Process - Runs on all processors and emulates the computation and I/O behavior of different applications.
Data Processing Process - Invoked by the data generating/collection process after data gathering is completed. It processes data by reading and writing every byte at least once. It provides a hook that allows different data access methods to be used, such as sequential, offset, indexed, linked list, or scattered among pages or segments. The size of each data access ranges from a single byte to several hundred bytes.
Data Generating/Collection Process - Invoked by the performance tool process. It performs three tasks: data generation, data gathering, and data forwarding. It generates performance data and puts them in the data buffer together with the data gathered from its children. After data processing is completed, it sends the result to its parent.
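The placement rules in System Structure can be satisfied with a simple rotation. The sketch below is one hypothetical assignment for the hierarchical configuration (seven worker-servers, each with seven workers). It is not necessarily the assignment the authors used. The rules cannot all hold for the two-level structure, where 56 children share one parent on seven PCs.

```python
from collections import Counter


def place_hierarchy(num_pcs: int = 7) -> dict:
    """Map each worker-server 'ws<i>' and worker 'w<i>.<j>' to a PC index.

    Worker-server i goes to PC i; its j-th child goes to PC (i + j + 1) mod num_pcs,
    so the num_pcs children of one parent land on num_pcs different PCs.
    """
    placement = {}
    for i in range(num_pcs):
        placement[f"ws{i}"] = i
        for j in range(num_pcs):
            placement[f"w{i}.{j}"] = (i + j + 1) % num_pcs
    return placement


def satisfies_constraints(placement: dict, num_pcs: int = 7) -> bool:
    """Check: one worker-server per PC, siblings on distinct PCs, even load."""
    ws_pcs = {placement[f"ws{i}"] for i in range(num_pcs)}
    siblings_ok = all(
        len({placement[f"w{i}.{j}"] for j in range(num_pcs)}) == num_pcs
        for i in range(num_pcs))
    per_pc = Counter(placement.values())
    evenly = len(per_pc) == num_pcs and len(set(per_pc.values())) == 1
    return len(ws_pcs) == num_pcs and siblings_ok and evenly


p = place_hierarchy()
print(satisfies_constraints(p))   # True
print(Counter(p.values())[0])     # 8 virtual nodes on PC 0
```

Each split PC ends up hosting one worker-server and seven workers, which matches eight virtual nodes per PC.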

3.2 The Implementations

The two-level structure has two kinds of processes, server and worker; the server can have up to 56 child workers. The hierarchical structure has three kinds of processes, server, worker-server, and worker; the server has seven child worker-servers, and each worker-server can have up to seven child workers. Communication is restricted to processes in a direct parent-child relation. The logical actions of each kind of process are as follows:
Processor with no parent (server): get a user request, parse the request, send the data collection command, allocate data processing buffers, wait for all data and then process it, and present the results to the user.
Processor with no children (worker): wait for the data collection command, allocate data buffers, store data in the buffer, and send the data to its parent.
Processor with both a parent and children (worker-server): wait for the data collection command, allocate a data buffer, send the data collection command, allocate data processing buffers, store data in the buffer, wait for all data and then process it, and send the results to its parent.
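The three roles above share one recursive shape: store local data, wait for all children, process, and hand the result upward. The sketch below is sequential Python, not MPI, and it assumes a hypothetical `Summary` (count, total, peak) as the "processed result". The paper does not specify what processing produces.

```python
from dataclasses import dataclass, field


@dataclass
class Summary:
    count: int = 0
    total: float = 0.0
    peak: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.peak = max(self.peak, value)

    def merge(self, other: "Summary") -> None:
        self.count += other.count
        self.total += other.total
        self.peak = max(self.peak, other.peak)


@dataclass
class Node:
    name: str
    samples: list = field(default_factory=list)    # locally generated performance data
    children: list = field(default_factory=list)   # direct children only

    def collect(self) -> Summary:
        """Worker: process its own data. Worker-server or server: also wait for
        every child's result before returning (i.e. sending to the parent)."""
        result = Summary()
        for value in self.samples:
            result.add(value)
        for child in self.children:
            result.merge(child.collect())
        return result


def make_workers(i: int) -> list:
    return [Node(f"w{i}.{j}", [1.0, 2.0]) for j in range(7)]


server = Node("server", children=[Node(f"ws{i}", [0.5], make_workers(i)) for i in range(7)])
s = server.collect()
print(s.count, s.total, s.peak)   # 105 150.5 2.0
```

The server only merges seven partial results instead of 49 raw streams. This is the effect the hierarchy relies on.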

4 Implementation Results

Each implementation run starts with one server and 7 workers (or worker-servers) and adds one worker per physical node at a time (to eliminate the impact of unbalanced communication delays) until the total number of workers reaches 56.

Fig. 1. Server Response Time of Two-Level Structure


Fig. 2. Server Data Processing Rate of Two-Level Structure

4.1 Server of Two-Level Structure

Figure 1 shows the response time of the two-level data collection structure. The horizontal axis is the number of workers and the vertical axis is the response time in milliseconds. Labels indicate the data size, in kilobytes, collected by each worker. The total amount of data gathered by the server equals the data size multiplied by the number of workers. Figure 2 shows the data processing rates. The 256K rate drops from 2183 bytes/ms at 48 workers to 355 bytes/ms at 50 workers, and the 384K rate drops from 2187 bytes/ms at 32 workers to 406 bytes/ms at 33 workers. These sharp drops indicate thrashing.
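A knee like this can be located mechanically. The sketch below flags the first step where the processing rate falls below a fraction of the previous value. The `drop_ratio` of 0.5 is an assumed threshold, not one the paper uses. The input pairs are the rates quoted above.

```python
def find_knee(points, drop_ratio=0.5):
    """points: (workers, rate in bytes/ms) pairs in increasing worker order.

    Returns the worker count just before the first step whose rate falls below
    drop_ratio times the previous rate, or None if the curve never drops that sharply.
    """
    for (w_prev, r_prev), (_, r_next) in zip(points, points[1:]):
        if r_next < drop_ratio * r_prev:
            return w_prev
    return None


print(find_knee([(48, 2183), (50, 355)]))   # 48
print(find_knee([(32, 2187), (33, 406)]))   # 32
```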


Fig. 3. Average Worker Server Response Time

4.2 Worker-Server of Hierarchical Structure

Figure 3 shows the average worker-server response times of the hierarchical data collection structure; each is the sum of the worker-server data gathering time and the worker-server data processing time. The worker-server data gathering time is the longest time between a worker-server issuing a data collection command and receiving all data packets. The average worker-server data processing times are shown in Table 1. Except for one value, 0.76 ms for the 8K data size, most of them lie between 0.30 ms/KByte and 0.42 ms/KByte. If either the gathering time or the processing time starts to grow non-linearly, the response time becomes non-linear as well. Table 1 indicates that none of the worker-servers were thrashing; the larger values were caused by delays in Linux process management. The thrashing that did occur was in the virtual-node subsystem, caused by TCP/IP stack and buffer contention.
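The response-time decomposition above can be written out directly. In this sketch, the child arrival times, the 64-KB data size, and the linear per-KB processing cost are assumptions for illustration. The 0.30 to 0.42 ms/KByte band is the range reported for Table 1.

```python
import math


def worker_server_response_ms(child_arrival_ms, own_kb, child_kb, ms_per_kb):
    """Response time = data gathering time + data processing time.

    Gathering time: the latest arrival among all children (ms after the command).
    Processing time: modelled as linear in the total data volume handled.
    """
    gathering = max(child_arrival_ms, default=0.0)
    total_kb = own_kb + child_kb * len(child_arrival_ms)
    return gathering + total_kb * ms_per_kb


def processing_in_linear_range(ms_per_kb, low=0.30, high=0.42):
    """A per-KB cost inside the band observed in Table 1 suggests no thrashing."""
    return low <= ms_per_kb <= high


# Seven children answering after 12..40 ms (assumed), 64 KB from every node.
arrivals = [12.0, 15.5, 18.0, 22.0, 30.0, 35.0, 40.0]
t = worker_server_response_ms(arrivals, own_kb=64, child_kb=64, ms_per_kb=0.36)
print(round(t, 2))   # 224.32
```

The model shows why a stable per-KB processing cost combined with a growing response time points at the gathering term, which is the argument made in the Discussion.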

4.3 Server of Hierarchical Structure

Figure 4 shows the normalized response times of the hierarchical data collection structure (outliers were filtered out, and values affected by the virtual-node subsystem were replaced by values generated from algorithms based on the data sizes). The curves correspond to data sizes of 4, 8, 16, 32, 64, 128, 256, 384, and 512 kilobytes. The overall data collected by all processors equals the data size multiplied by the total number of processors. Figure 5 shows the corresponding data processing rates. All rates lie between 2700 bytes/ms and 3300 bytes/ms. They show no evidence of server saturation, and no thrashing occurred.


Fig. 4. Normalized Server Response Time

Fig. 5. Server Data Processing Rate

5 Discussion

In Figure 1, the 256K and 384K curves start to increase non-linearly when the total data collected by the server reaches 12 megabytes. The same curves in Figure 2 also show a non-linear decrease at a total data size of 12 megabytes. According to the theory of thrashing [1, 2], once all main memory is in use, swapping and waiting occur. The 12-megabyte main memory was the upper limit we set for the server.
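A back-of-the-envelope check shows why the knees fall where they do. It assumes, as a simplification, that the two-level server must hold every worker's data at once and ignores processing buffers and OS overhead.

```python
SERVER_MEMORY_KB = 12 * 1024    # the 12-MB limit configured for the server


def max_workers_before_thrashing(per_worker_kb: int, memory_kb: int = SERVER_MEMORY_KB) -> int:
    """Two-level structure: the server buffers per_worker_kb * workers kilobytes,
    so the largest worker count that still fits in memory is a floor division."""
    return memory_kb // per_worker_kb


for size_kb in (256, 384):
    print(size_kb, max_workers_before_thrashing(size_kb))
# 256 48
# 384 32
```

Both predictions match the last pre-drop points reported in Section 4.1: 48 workers for 256K and 32 workers for 384K.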


In Figure 3, some response times start to increase non-linearly beyond 50 processors, regardless of the data size. However, Table 1 shows that all data processing times (except the 8K one) lie between 0.30 ms/KByte and 0.42 ms/KByte, which means that none of the worker-servers reached its thrashing point. The large response time delays must therefore be caused by the data gathering times. Although applying the node splitting mechanism creates enough virtual nodes, it causes the virtual-node subsystem to thrash once the increasing workload reaches the system's balance point. The only way to prevent this subsystem thrashing is not to increase the number of virtual nodes per PC but to add physical nodes (PCs) to the system. The normalized hierarchical structure curves in Figure 4 show that the response times increase linearly with the total number of processors. The overall data size does not affect the response time, even after it exceeds the memory limit. Figure 5 shows flat server data processing rates across all data sizes, which means that the server never reaches its balance point and thrashing has been eliminated entirely. We also noticed that larger data sizes have slightly higher data processing rates: the constant per-packet overhead favors larger data collection sizes.

6 Conclusion and Future Work

The results of our implementations show that the hierarchical data collection structure eliminates the response time knee that a two-level data collection structure cannot avoid. Scalability improves by orders of magnitude at the cost of a minor network delay per level. Both structures collect the same overall amount of data; however, the amounts of data gathered by the two servers differ. In fact, the server in the hierarchical data collection structure receives data already processed by the worker-servers, that is, quality rather than quantity. We are currently porting our implementation model to a larger cluster. We are also working on instrumentation and performance data presentation. We plan to enhance our virtualization mechanisms to provide a better development environment. Our long-term goal is to produce a system that helps developers build better MP system performance tools.

References

1. A. Alderson, W. C. Lynch, and B. Randell. Thrashing in a multiprogrammed paging system. In Operating Systems Techniques, Hoare and Perrott (eds.), pages 152–167, 1972.
2. P. J. Denning. The working set model for program behavior. Communications of the ACM, 11(5):323–333, May 1968.
3. G. Eisenhauer, B. Schroeder, K. Schwan, V. Martin, and J. Vetter. DataExchange: High Performance Communication in Distributed Laboratories. Ninth International Conference on Parallel and Distributed Computing and Systems, October 1997.
4. D. A. Reed et al. Scalable Performance Analysis: The Pablo Performance Analysis Environment. Scalable Parallel Libraries Conference, IEEE Service Center, Piscataway, N.J., 1993.
5. W. Gu, G. Eisenhauer, and K. Schwan. Falcon: On-line Monitoring and Steering of Parallel Programs. Concurrency: Practice and Experience, 1995.
6. P. B. Hansen. Operating System Principles. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1973.
7. C. J. Jong and A. B. Maccabe. A simulator for performance tools in massively parallel system. In Postceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, June 2003.
8. T. Kindler, K. Schwan, D. Silva, M. Trauner, and F. Alyea. A Parallel Spectral Model for Atmospheric Transport Processes. Concurrency: Practice and Experience, 8:639–666, November 1996.
9. C. Liao, M. Martonosi, and D. W. Clark. Performance Monitoring in a Myrinet-Connected Shrimp Cluster. In Symposium on Parallel and Distributed Tools, pages 21–29, New York, N.Y., August 1998. The Association for Computing Machinery.
10. R. L. Ribler, J. S. Vetter, H. Simitci, and D. A. Reed. Autopilot: Adaptive Control of Distributed Applications. Proceedings of the 7th IEEE Symposium on High-Performance Distributed Computing, Chicago, IL, July 1998.

GMA+ – A GMA-Based Monitoring and Management Infrastructure for Grid

Chuan He, Zhihui Du, and Sanli Li

Grid Research Group, Department of Computer Science and Technology, Tsinghua University, Beijing, China

Abstract. In this paper, we investigate many monitoring and management tools for Grid and other distributed systems. After examining the advantages and disadvantages of GMA (Grid Monitoring Architecture), we propose GMA+, a new monitoring and management scheme for Grid. GMA+ adds a closed-loop feedback structure to GMA, provides interfaces that follow Grid Service standards for all of its components, and defines metadata for monitoring events. It is a novel infrastructure for Grid monitoring and management with high modularity, usability, and scalability.

1 Introduction

Nowadays, a single PC, or even an HPC system, cannot satisfy the explosive demand for high performance computing. Solving some huge problems requires connecting heterogeneous, geographically distributed resources through high-speed networks. Grid [1] is an extension of traditional distributed computing technology: it weaves together resources in LANs or WANs and constructs dynamic VOs (Virtual Organizations) to implement secure and coordinated resource sharing among persons, organizations, and resources. The goal of Grid monitoring and management is to monitor Grid resources for fault detection, performance analysis, performance tuning, load balancing, and scheduling. Compared with traditional distributed systems, Grid is more complicated in architecture, larger in scale, and more widely distributed. It is therefore all the more urgent to build a monitoring and management system with high performance, high scalability, and high stability to perform automatic management in the Grid environment.

2 Related Work

There are already many monitoring tools for traditional distributed systems. However, these tools cannot fully meet the needs of Grid monitoring and management.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 10–17, 2004. © Springer-Verlag Berlin Heidelberg 2004

GMA+ – A GMA-Based Monitoring and Management Infrastructure for Grid


NetLogger [2], Paradyn [3], AIMS [4], Gloperf [5], and SPI [6] collect data from distributed systems and then analyze the data using special tools. Each of these tools generates monitoring data in its own format, which limits their cooperation with other systems. Globus HBM [7] periodically sends "heartbeats" to a centralized collection system in order to detect faults in a distributed environment. HBM's centralized design makes it difficult to extend; moreover, HBM relies on local information, so it cannot perform forecasting or management from a global point of view. NWS [8] and DSRT [9] monitor only specific resources (network bandwidth, CPU) and cannot adapt to new resources. RMON [10] and Autopilot [11] implement intelligent and dynamic control using techniques such as reactive circuits. JAMM [12] is a similar system, but it can only be used with Java programs. The shortcoming of these systems is that they are bound to particular programming languages, so they cannot cooperate with other systems. SNMP-based tools are not suitable for Grid monitoring and management, which runs over WANs and requires high performance. In addition, these toolkits do not provide application-level monitoring.

3 GMA

GMA [13] is an architecture proposed and supported by the GGF. It aims to establish standards for Grid monitoring and to make existing systems interoperate.

3.1 Architecture

First, GMA adopts a producer-consumer model for Grid monitoring. All monitoring information takes the form of time-stamped events, which are the unit of storage and transfer.

Fig. 1. Grid Monitoring Architecture


C. He, Z. Du, and S. Li

Fig. 1 shows the architecture of GMA, which consists of four key components:
1. Sensors. A sensor is any program that can generate time-stamped monitoring events. For example, we have sensors that monitor CPU usage, memory usage, and network usage.
2. Producers. Producers are the sources of performance events. Producers send events to consumers through their interfaces, and one producer can provide several independent interfaces for sending different events.
3. Consumers. Consumers are applications that receive and consume monitoring events. A consumer can run on the same computer as the producer or on a remote computer.
4. Directory Service. The directory service in Grid acts like a registry. Producers and consumers publish their location information in the directory service. Once a producer and a consumer have found each other through the directory service, monitoring data flows directly between them; the directory service is no longer involved.

All monitoring information in GMA consists of events. Another important contribution of GMA is that it abstracts three event-passing patterns between producers and consumers: publish/subscribe, query/response, and notification. GMA does not define the protocol between producers and consumers; it can be implemented in many ways, for example using SOAP/HTTP, LDAP, or XML/BXXP. Moreover, one protocol can be used for control and a different protocol for data transmission, depending on the situation.
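A minimal in-process sketch of the roles and the three patterns follows. The class and method names are illustrative assumptions. GMA defines roles and patterns, not this API, and a real deployment would put a wire protocol such as SOAP/HTTP between the parties.

```python
from collections import defaultdict


class Directory:
    """Registry only: consumers look producers up here, then talk to them directly."""

    def __init__(self):
        self._producers = defaultdict(list)        # event name -> producers

    def register(self, event_name, producer):
        self._producers[event_name].append(producer)

    def lookup(self, event_name):
        return list(self._producers[event_name])


class Producer:
    def __init__(self):
        self._latest = {}                          # event name -> newest event
        self._subscribers = defaultdict(list)      # event name -> consumer callbacks

    def record(self, name, value, timestamp):
        """Called by a sensor with a time-stamped measurement."""
        event = {"name": name, "value": value, "timestamp": timestamp}
        self._latest[name] = event
        for deliver in self._subscribers[name]:    # publish/subscribe: push each event
            deliver(event)

    def subscribe(self, name, deliver):
        self._subscribers[name].append(deliver)

    def query(self, name):                         # query/response: consumer asks once
        return self._latest.get(name)

    def notify(self, name, deliver):               # notification: producer pushes once
        if name in self._latest:
            deliver(self._latest[name])


directory = Directory()
cpu_producer = Producer()
directory.register("cpu.load", cpu_producer)

stream = []
producer = directory.lookup("cpu.load")[0]        # the only step that uses the directory
producer.subscribe("cpu.load", stream.append)
cpu_producer.record("cpu.load", 0.42, timestamp=1.0)
cpu_producer.record("cpu.load", 0.57, timestamp=2.0)
print([e["value"] for e in stream])                # [0.42, 0.57]
print(producer.query("cpu.load")["timestamp"])     # 2.0
```

The consumer touches the directory once, during lookup. All later traffic goes straight to the producer, as the Directory Service item describes.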

3.2 Current Status of GMA and Its Limitations

R-GMA [14] uses a relational database instead of LDAP to provide the directory service, but it is not suitable for very large data transactions, especially under bursts of information. GridMon [15] is the first Grid monitoring prototype in China. Although it partially applies GMA ideas, it is not entirely based on the GMA model: GridMon uses LDAP as its directory service and implements only the query/response model. It defines a scalable format for monitoring events, yet that format is too simple to serve a system as complex as the Grid.

GMA presents a highly scalable and flexible architecture for Grid monitoring. However, GMA still has some limitations that should be addressed in the following ways:
1. The directory service in GMA should be general enough to support LDAP, relational databases, and P2P distributed directory storage rather than being restricted to LDAP.
2. GMA does not define a metadata model for monitoring information. Using XML Schema is a good way to define one and makes the system more suitable for monitoring various kinds of new resources.
3. Producers and consumers in GMA need to provide standard external interfaces. Moreover, since the Grid is a wide-area system, these interfaces should be encapsulated as Web Services or as the Grid Services proposed in OGSA [16].


4. GMA only offers monitoring functions and does not integrate monitoring with management. In a practical system, GMA should be embedded in a closed-loop feedback structure in order to combine monitoring and management.

4 GMA+

4.1 Architecture

To address the shortcomings of GMA identified above, we propose GMA+, an improved Grid monitoring system based on GMA. The architecture of GMA+ is shown in Fig. 2. We classify all events in GMA+ into two types: M-Events for monitoring and C-Events for control. We also add two new components: the controller and the actuator.

Fig. 2. The architecture of GMA+

The controller consists of a consumer, a producer, and its control logic. It analyzes the incoming events, generates new M-Events and C-Events, and finally sends the corresponding C-Events to the actuator. The actuator receives these events and dynamically adjusts the status of Grid resources. Typical controller functions include validating monitoring events, classifying events, and decision making. In GMA+, multiple controllers can be connected in a chain, so that sensor -> producer -> controller -> ... -> controller -> actuator forms a local closed loop. Fig. 3 shows three different types of controllers: controller1 filters M-Events; controller2 analyzes the received C-Events and M-Events and generates C-Events for control; and controller3 is a more complex controller for decision making.

4.2 Components in GMA+

The interfaces of all GMA+ components follow the Grid Service Specification in OGSA. Grid Services are more suitable than Web Services for a Grid environment because Web Services are mostly persistent, while services in a Grid are mainly volatile. Another important reason is that Grid Services support soft-state lifetime management, which releases unused resources automatically, whereas Web Services cannot easily solve this problem. In short, adopting Grid Services for Grid monitoring and management brings many benefits, such as standardization, ease of distribution, ease of cooperation, and independence from platforms and programming languages.

Fig. 3. Different Controllers in GMA+

4.2.1 Sensors

We implement four types of sensors in GMA+: host sensors, network sensors, process sensors, and application sensors. A real Grid may need more kinds of sensors than we have defined. For extensibility, we have defined the metadata model for all events in GMA+, encapsulated sensors with Grid Service interfaces, and defined a programming model for developing new sensors. Together, these ensure that new sensors can be added to GMA+ easily.

4.2.2 Consumers

GMA+ offers many kinds of query and analysis tools as consumers. They mainly include:
1. Storage archives. Although monitoring events in the Grid are mainly useful for a short period of time, a storage archive is still needed to keep those events for later use. We implement a distributed storage system in GMA+ that archives monitoring events using LDAP and RDBMS.
2. Real-time monitoring tools. These tools are used to monitor real-time data.
3. Analysis tools. These tools are used to analyze the data kept in the storage archives.
4. Query and analysis portal. GMA+ also provides a web portal for query and analysis. In GMA+, we define several types of users with different privileges for monitoring and management.


4.2.3 Controller and Actuator

In GMA+, we implement only two simple controllers, C-Controller and A-Controller, to demonstrate the use of controllers and actuators. C-Controller classifies monitoring events. A-Controller analyzes the monitoring events of computers (the computing resources in the Grid): if the CPU load or temperature exceeds a threshold, A-Controller notifies the corresponding actuator to report an alarm. Fig. 4 shows the collaboration sequence of sensors, controllers, and actuators.
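The closed loop from Section 4.1 and the two demonstration controllers can be sketched as plain functions over event lists. The threshold values, the "computer" category label, and the function names below are assumptions for illustration. The paper gives neither concrete thresholds nor an API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MEvent:                      # monitoring event
    resource: str
    metric: str
    value: float
    category: str = ""             # set by the classifying controller


@dataclass(frozen=True)
class CEvent:                      # control event
    resource: str
    action: str
    reason: str


# Hypothetical thresholds: the paper does not state concrete values.
THRESHOLDS = {"cpu.load": 0.9, "cpu.temperature": 75.0}


def c_controller(events):
    """Classify M-Events: CPU metrics are tagged as 'computer' events."""
    classified = []
    for e in events:
        if isinstance(e, MEvent):
            kind = "computer" if e.metric.startswith("cpu.") else "other"
            e = replace(e, category=kind)
        classified.append(e)
    return classified


def a_controller(events):
    """Pass events through and add an alarm C-Event for each threshold violation."""
    out = list(events)
    for e in events:
        if isinstance(e, MEvent) and e.category == "computer":
            limit = THRESHOLDS.get(e.metric)
            if limit is not None and e.value > limit:
                out.append(CEvent(e.resource, "alarm", f"{e.metric}={e.value} > {limit}"))
    return out


def run_loop(events, controllers, actuator):
    """sensor -> producer -> controller -> ... -> controller -> actuator"""
    for controller in controllers:
        events = controller(events)
    for e in events:
        if isinstance(e, CEvent):
            actuator(e)


alarms = []
run_loop([MEvent("node1", "cpu.load", 0.95),
          MEvent("node2", "cpu.temperature", 60.0),
          MEvent("node3", "disk.usage", 0.99)],
         [c_controller, a_controller],
         alarms.append)
print(alarms)
```

Because each controller takes and returns a list of events, controllers compose in any order. This mirrors the chaining described in Section 4.1.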

Fig. 4. Sequence diagram of sensors, controllers and actuators

4.3 Directory Service

Traditional directory systems include LDAP, DNS, DEC's GNS [17], the Intentional Naming System [18], and Active Names [19]. However, they share similar problems when used directly in the Grid:
1. These directory services cannot store large amounts of data; as a result, they cannot keep pace with the growth of the Grid.
2. These services are not suitable for Grid monitoring data, which must be updated frequently; they are designed for frequent queries.
3. These services cannot support queries that involve multiple objects, such as the SQL JOIN operation.

Recently, further methods have been proposed to solve these directory service problems, such as using an RDBMS [20], but no mature solution exists yet. Because all components in GMA+ are Grid Services, the directory service in GMA+ is used to index Grid Services. We implement the directory with WSIL (Web Services Inspection Language). WSIL has a lightweight, distributed model: the XML documents that describe Web Services can be stored at any location, and independent documents can be linked to one another through URLs. Every component in GMA+ (producer, consumer, controller, sensor, and actuator) contains three parts: Grid Service interfaces, a service entity, and a service description in WSIL. The distributed WSIL documents in the network together constitute a logical directory service. Fig. 5 shows the architecture of the directory service in GMA+.
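The idea of a logical directory built from linked documents can be sketched as a graph walk. Real WSIL documents are XML with service and link elements. Here each document is reduced to a plain dict, and the URLs, service names, and `fetch` callback are hypothetical.

```python
from collections import deque

# Hypothetical inspection documents keyed by URL, each reduced to the services it
# describes and the links it holds (real WSIL documents are XML).
DOCUMENTS = {
    "http://site-a.example/inspection.wsil": {
        "services": ["producer-host-a", "sensor-cpu-a"],
        "links": ["http://site-b.example/inspection.wsil"]},
    "http://site-b.example/inspection.wsil": {
        "services": ["controller-a", "actuator-b"],
        "links": ["http://site-a.example/inspection.wsil",   # cycles are allowed
                  "http://site-c.example/inspection.wsil"]},
    "http://site-c.example/inspection.wsil": {
        "services": ["archive-consumer"],
        "links": []},
}


def discover(start_url, fetch):
    """Breadth-first walk over linked inspection documents; returns every service
    reachable from start_url, visiting each document once."""
    seen, queue, services = {start_url}, deque([start_url]), []
    while queue:
        doc = fetch(queue.popleft())
        services.extend(doc["services"])
        for link in doc["links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return services


print(discover("http://site-a.example/inspection.wsil", DOCUMENTS.__getitem__))
```

Passing `fetch` as a parameter keeps the walk independent of transport. An HTTP client and an XML parser could replace the dictionary lookup without changing the traversal.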


Fig. 5. Directory Service using WSIL

4.4 Metadata and Protocol

Monitoring and management events are used throughout GMA+, so it is important to define event metadata in a highly scalable way. We use XML Schema to define the metadata for the various events. The event metadata does not depend on the transfer protocol. In GMA+, we use SOAP as the transfer protocol. SOAP is a lightweight XML-based protocol that is commonly carried over HTTP; it supports RPC over HTTP and offers benefits such as high scalability, ease of understanding, and ease of deployment.
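To make protocol-independent event metadata concrete, the sketch below serializes an M-Event to XML and parses it back with Python's standard `xml.etree.ElementTree`. The element and attribute names (`MEvent`, `resource`, `metric`, `value`, `unit`, `timestamp`) are assumptions. The actual GMA+ XML Schema is not reproduced in the paper.

```python
import xml.etree.ElementTree as ET


def m_event_to_xml(resource: str, metric: str, value: float, unit: str, timestamp: str) -> str:
    """Serialize one monitoring event; the element names are hypothetical."""
    event = ET.Element("MEvent")
    ET.SubElement(event, "resource").text = resource
    ET.SubElement(event, "metric").text = metric
    ET.SubElement(event, "value", unit=unit).text = str(value)
    ET.SubElement(event, "timestamp").text = timestamp
    return ET.tostring(event, encoding="unicode")


def xml_to_m_event(text: str) -> dict:
    """Parse the XML produced above back into a plain dictionary."""
    root = ET.fromstring(text)
    value = root.find("value")
    return {"resource": root.findtext("resource"),
            "metric": root.findtext("metric"),
            "value": float(value.text),
            "unit": value.get("unit"),
            "timestamp": root.findtext("timestamp")}


doc = m_event_to_xml("node07", "cpu.load", 0.85, "fraction", "2003-12-07T10:00:00Z")
print(doc)
print(xml_to_m_event(doc)["value"])   # 0.85
```

The resulting string could travel as a SOAP body or be stored in an archive unchanged. This is the sense in which the metadata is independent of the transfer protocol.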

5 Future Work

In the future, we plan to add more adapters and sensors to GMA+ so that it can monitor and manage more Grid resources. We also plan to implement more sophisticated controllers and actuators with powerful algorithms for intelligent control and management in the Grid.

References

1. I. Foster and C. Kesselman, "The Grid: Blueprint for a New Computing Infrastructure", Morgan Kaufmann Publishers, San Francisco, CA, 1999.
2. Brian Tierney, William Johnston, Brian Crowley, Gary Hoo, Chris Brooks, and Dan Gunter, "The NetLogger Methodology for High Performance Distributed Systems Performance Analysis", Proc. of IEEE High Performance Distributed Computing Conference (HPDC7), July 1998.
3. Barton P. Miller, Jonathan M. Cargille, R. Bruce Irvin, Krishna Kunchithapadam, Mark D. Callaghan, Jeffrey K. Hollingsworth, Karen L. Karavanic, and Tia Newhall, "The Paradyn Parallel Performance Measurement Tool", IEEE Computer, 28(11), November 1995, pp. 37–46.
4. Jerry C. Yan, "Performance Tuning with AIMS: An Automated Instrumentation and Monitoring System for Multicomputers", Proc. of the Twenty-Seventh Hawaii Int. Conf. on System Sciences, Hawaii, January 1994.
5. Craig A. Lee, Rich Wolski, Ian Foster, Carl Kesselman, and James Stepanek, "A Network Performance Tool for Grid Environments", Proc. of SC'99, Portland, Oregon, Nov. 13–19, 1999.
6. Devesh Bhatt, Rakesh Jha, Todd Steeves, Rashmi Bhatt, and David Wills, "SPI: An Instrumentation Development Environment for Parallel/Distributed Systems", Proc. of Int. Parallel Processing Symposium, April 1995.
7. http://www.globus.org/hbm/heartbeat_spec.html
8. R. Wolski et al., "The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing", Journal of Future Generation Systems, 1998.
9. H. Chu and K. Nahrstedt, "CPU Service Classes for Multimedia Applications", Proc. of IEEE Multimedia Computing and Applications, Florence, Italy, June 1999.
10. Clifford W. Mercer and Ragunathan Rajkumar, "Interactive Interface and RT-Mach Support for Monitoring and Controlling Resource Management", Proceedings of Real-Time Technology and Applications Symposium, Chicago, Illinois, May 15–17, 1995, pp. 134–139.
11. J.S. Vetter and D.A. Reed, "Real-time Monitoring, Adaptive Control and Interactive Steering of Computational Grids", The International Journal of High Performance Computing Applications, 2000.
12. Chris Brooks, Brian Tierney, and William Johnston, "Java Agents for Distributed System Management", LBNL Technical Report, Dec. 1997.
13. B. Tierney, R. Aydt, D. Gunter et al., "A Grid Monitoring Architecture (2002)", GGF Performance Working Group, http://www-didc.lbl.gov/GGF-PERF/GMA-WG/papers/GWD-GP-16-1.pdf
14. S. Fisher, "Relational Grid Monitoring Architecture Package", http://hepunx.rl.ac.uk/grid/wp3/releases.html
15. Li Cha, Zhiwei Xu, and Guozhang Lin, "A Grid monitoring system using LDAP", Computer Science and Technology Department, Beijing Institute of Technology, Journal of Computer Research and Development, August 2002.
16. I. Foster, C. Kesselman, J. Nick, and S. Tuecke, "The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration", Open Grid Service Infrastructure WG, Global Grid Forum, June 22, 2002.
17. B. W. Lampson, "Designing a global name service", In 4th ACM Symposium on Principles of Distributed Computing, August 1986.
18. W. Adjie-Winoto, E. Schwartz, H. Balakrishnan, and J. Lilley, "The design and implementation of an intentional naming system", In Proceedings of the 17th ACM Symposium on Operating System Principles, December 1999.
19. A. Vahdat, M. Dahlin, T. Anderson, and A. Aggarwal, "Active names: flexible location and transport of wide-area resources", In USENIX Symposium on Internet Technology and Systems, October 1999.
20. E. F. Codd, "A relational model of data for large shared data banks", CACM, 13(6), June 1970.

A Parallel Branch–and–Bound Algorithm for Computing Optimal Task Graph Schedules

Udo Hönig and Wolfram Schiffmann

FernUniversität Hagen, Lehrgebiet Technische Informatik I, 58084 Hagen, Germany
{Udo.Hoenig|Wolfram.Schiffmann}@FernUni-Hagen.de
http://www.informatik.ti1.fernuni-hagen.de/

Abstract. In order to harness the power of parallel computing, we must first find appropriate algorithms that consist of a collection of (sub)tasks and then schedule these tasks to processing elements that communicate data with each other by means of a network. In this paper, we consider task graphs that take into account both computation and communication costs. For a homogeneous computing system with a fixed number of processing elements, we compute all the schedules with minimum schedule length. Our main contribution consists of parallelizing an informed search algorithm for calculating optimal schedules based on a Branch–and–Bound approach. While most recently proposed heuristics use task duplication, our parallel algorithm finds all optimal solutions under the assumption that each task is assigned to only one processing element. Compared to exhaustive search algorithms, this parallel informed search can compute optimal schedules for more complex task graphs. In the paper, the influence of parameters on the efficiency of the parallel implementation is discussed, and the optimal schedule lengths for 1700 randomly generated task graphs are compared to the solutions of a widely used heuristic.

1 Introduction

A task graph is a directed acyclic graph (DAG) that describes the dependencies between the parts of a parallel program [8]. In order to execute it on a cluster or grid computer, its tasks must be assigned to the available processing elements. Most often, the objective of solving this task graph scheduling problem is to minimize the overall computing time. The time that a task needs to compute its output from the results of preceding tasks corresponds to the working load of the processing element to which that task is assigned; it is denoted by the weight of the task node. The cost of communication between two tasks is specified by the weight of the edge that connects them. If both tasks are assigned to the same processor, the communication cost is zero. Task graph scheduling comprises two subproblems: one is to assign the tasks to the processors, the other is to find the optimal sequence of the tasks. In this paper, we assume a homogeneous computing environment, e.g., a cluster computer. But even if we assume identical processing elements, the problem of determining an optimal schedule has been proven to be NP-complete, apart from some restricted cases [8].

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 18–25, 2004. © Springer-Verlag Berlin Heidelberg 2004

Thus, most researchers use heuristic approaches to solve the problem for reasonable sizes of the task graph. Three categories can be distinguished: list-based, clustering-based and duplication-based heuristics. List-based heuristics assign priority levels to the tasks and map the highest-priority task to the best-fitting processing element [10]. Clustering-based heuristics group heavily communicating tasks and assign them to the same processing element in order to reduce the overall communication overhead [1]. Duplication-based heuristics also reduce the amount of communication, at the price of additional (redundant) computation; duplication has been combined with both list-based [2] and cluster-based approaches [9]. In order to evaluate the quality of all these heuristics in a unified manner, it would be desirable to compare their resulting schedule lengths with the optimal values. The parallel Branch–and–Bound algorithm proposed in this paper can be used to create a benchmark suite for this purpose. Of course, this is only feasible for task graphs of moderate size (e.g., with fewer than 30 tasks).

Usually, there are multiple optimal schedules that form the solution set of the task graph scheduling problem. All of these schedules are characterized by the same minimal schedule length. To compute them, we have to search all possible assignments and sequences of the tasks. The simplest algorithm for computing the set of optimal schedules enumerates all possible solutions and stores only the best ones. But even for a small number of tasks, the number of solutions is enormous. If we want to obtain the set of optimal schedules in an acceptable period of time and with manageable memory requirements, we have to devise a more sophisticated algorithm.
The basic idea for reducing the effort of an exhaustive search is to perform a structured, or informed, search that reduces the state space. Thus, the number of schedules that have to be investigated is much smaller than the number of possible processor assignments multiplied by the number of possible sequences. In this way, an informed search can handle more complex task graphs than exhaustive search strategies. The informed search is often based on an A* algorithm ([3], [5]). In this paper, we present a Branch–and–Bound approach and its implementation on a parallel virtual machine [4]. The paper is organized as follows. In the next section, the concepts of the Branch–and–Bound algorithm are explained. The third section is concerned with the parallelization of that algorithm; it describes how the workload is partitioned and how load balancing is achieved. In the fourth section, we present results and discuss the influence of various parameters on the efficiency of the parallel implementation.
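The cost model from the beginning of this section can be made concrete with a small evaluator. The following is a minimal Python sketch, not the authors' code: it assumes a task graph stored as two dictionaries (node weights, and edge weights keyed by `(source, target)` pairs) and a schedule given as one task order per processor; the function name `schedule_length` is hypothetical. Each task starts as soon as its processor is free and all its input data have arrived, and communication is charged only between tasks on different processors.

```python
from typing import Dict, List, Tuple


def schedule_length(
    weights: Dict[str, int],
    comm: Dict[Tuple[str, str], int],
    order_per_proc: List[List[str]],
) -> int:
    """Return the makespan of a schedule on homogeneous processors.

    weights: node weight (computation time) of every task.
    comm: edge weight (communication cost) of every edge (src, dst).
    order_per_proc: execution order of the tasks on each processor.
    """
    proc_of = {t: p for p, tasks in enumerate(order_per_proc) for t in tasks}
    preds: Dict[str, List[Tuple[str, int]]] = {t: [] for t in weights}
    for (src, dst), cost in comm.items():
        preds[dst].append((src, cost))

    finish: Dict[str, int] = {}
    next_idx = [0] * len(order_per_proc)   # next task to place on each processor
    proc_free = [0] * len(order_per_proc)  # time at which each processor is idle
    while len(finish) < len(weights):
        progress = False
        for p, tasks in enumerate(order_per_proc):
            if next_idx[p] == len(tasks):
                continue
            t = tasks[next_idx[p]]
            if any(src not in finish for src, _ in preds[t]):
                continue  # wait until every predecessor has been placed
            # Communication is free when producer and consumer share a processor.
            data_ready = max(
                (finish[src] + (0 if proc_of[src] == p else cost) for src, cost in preds[t]),
                default=0,
            )
            start = max(proc_free[p], data_ready)
            finish[t] = proc_free[p] = start + weights[t]
            next_idx[p] += 1
            progress = True
        if not progress:
            raise ValueError("processor orders violate the precedence constraints")
    return max(finish.values())
```

For a four-task "diamond" graph with weights a=2, b=3, c=3, d=2 and unit communication costs, placing `[["a", "b"], ["c", "d"]]` on two processors yields a length of 8, whereas running everything on one processor yields 10.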

2 Branch–and–Bound Algorithm

If we want to shorten the time needed to compute the set of optimal solutions, we cannot afford to consider every valid schedule (an assignment of the tasks to the processing elements plus a determination of the tasks' starting times). Instead, we have to divide the space of possible schedules into subspaces that contain partially similar solutions. These solutions have in common that a certain number of tasks is already scheduled in the same way. The corresponding subspace contains all the schedules that descend from this partial schedule but differ in the scheduling of the remaining tasks. Each partial schedule can be represented by a node in a decision tree. Starting from the root, which represents the empty schedule, all possible schedules are constructed. At any point of this construction process, we can identify a set of tasks that are ready to run and a set of idle processing elements to which those tasks can be assigned. Each conceivable combination produces a new node in the decision tree (Branch).

Provided we have an estimate of the total schedule length, we can exclude most of the nodes that are created in the Branch part of the algorithm. This estimate can be initialized by any heuristic; here, we used the heuristic of Kasahara and Narita [6]. After the creation of a new partial schedule (node of the decision tree), we can estimate a lower bound of its runtime from its current schedule length and the static b-level values of the remaining (yet unscheduled) tasks: the lower bound is the partial schedule length plus the maximum of these static b-level values. If this lower bound is greater than the current estimate of the schedule length, we can exclude the newly created node from further investigation. By deleting a node in this way (Bound), we avoid evaluating all the schedules that depend on the corresponding partial schedule (a subspace of the search space), which accelerates the computation of the solution set. As long as a node's lower bound is lower than or equal to the current estimate, we continue to expand this node in a depth-first manner. When all the tasks of the graph are scheduled, a leaf of the decision tree is reached.
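The bound described above can be written compactly as follows. The notation is an assumption made for this illustration: $w(n_i)$ is the weight of task $n_i$, $\mathrm{succ}(n_i)$ its set of successors, $S$ a partial schedule with length $\mathrm{SL}(S)$, $U(S)$ the set of tasks not yet scheduled in $S$, and $\mathrm{SL}_{\text{best}}$ the current estimate. The static b-level is taken in its usual sense, i.e. the length of the longest computation-only path from a task to an exit task, with communication costs ignored.

```latex
\begin{aligned}
\mathrm{sbl}(n_i) &= w(n_i) + \max_{n_j \in \mathrm{succ}(n_i)} \mathrm{sbl}(n_j)
  \quad (\text{the maximum is } 0 \text{ for exit tasks}),\\
\mathrm{LB}(S) &= \mathrm{SL}(S) + \max_{n_i \in U(S)} \mathrm{sbl}(n_i),\\
S \text{ is pruned} &\iff \mathrm{LB}(S) > \mathrm{SL}_{\text{best}}.
\end{aligned}
```

Because a node is pruned only when its bound is strictly greater than the current estimate, partial schedules that could still tie the best known length survive. This is what allows the search to collect every optimal schedule rather than just one.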
At a leaf, if the length of the complete schedule equals the current estimate, we add the corresponding schedule to the set of best schedules. If it is smaller, we clear the set of best schedules, store the new (complete) schedule in the set of best schedules, and set the estimate to the new schedule length. Then we continue with the next partial schedule. The pruning scheme above is further enhanced by a selection heuristic that controls the order in which new nodes are created. By means of this priority-controlled breadth-first search, we try to lower the pruning threshold as early as possible. Like the Bound phase, this procedure further reduces the total number of evaluations. By proceeding as described above, all possible schedules are checked, either explicitly or implicitly through pruning. At the end of the search procedure, the current set of best schedules contains the optimal schedules for the task graph problem.
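The sequential search can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The graph is stored as dictionaries (node weights, and communication costs keyed by `(source, target)`). Branching places any ready task at the end of any processor's queue. Duplicate schedules reached through different branching orders are merged by storing each complete schedule as sorted `(task, processor, start)` triples. In this branching, a remaining task may start before the current partial schedule length on another processor. The sketch therefore uses a more conservative bound than the one described above: the larger of the current makespan and, for each unscheduled task, the finish time of its scheduled predecessors plus its static b-level. The priority-controlled node ordering is omitted, and the name `all_optimal_schedules` is hypothetical.

```python
from typing import Dict, List


def all_optimal_schedules(weights, comm, num_procs, initial_bound=float("inf")):
    """Return (minimum schedule length, set of all optimal schedules)."""
    preds: Dict[str, Dict[str, int]] = {t: {} for t in weights}
    succs: Dict[str, List[str]] = {t: [] for t in weights}
    for (src, dst), cost in comm.items():
        preds[dst][src] = cost
        succs[src].append(dst)

    sbl: Dict[str, int] = {}  # static b-level: longest computation-only path

    def static_b_level(t):
        if t not in sbl:
            sbl[t] = weights[t] + max((static_b_level(s) for s in succs[t]), default=0)
        return sbl[t]

    for t in weights:
        static_b_level(t)

    best_len = initial_bound  # e.g. the length of a heuristic schedule
    best_set = set()
    placed = {}               # task -> (processor, start, finish)
    proc_free = [0] * num_procs

    def branch():
        nonlocal best_len
        makespan = max(proc_free)
        remaining = [t for t in weights if t not in placed]
        if not remaining:  # leaf: a complete schedule
            if makespan < best_len:
                best_len = makespan
                best_set.clear()
            if makespan == best_len:
                best_set.add(tuple(sorted((t, p, s) for t, (p, s, _) in placed.items())))
            return
        # Bound (conservative): a remaining task cannot finish before its
        # scheduled predecessors are done plus its static b-level.
        lb = makespan
        for t in remaining:
            ready = max((placed[p][2] for p in preds[t] if p in placed), default=0)
            lb = max(lb, ready + sbl[t])
        if lb > best_len:
            return
        # Branch: every ready task on every processor, explored depth-first.
        for t in remaining:
            if any(p not in placed for p in preds[t]):
                continue
            for proc in range(num_procs):
                data_ready = max(
                    (placed[p][2] + (0 if placed[p][0] == proc else c)
                     for p, c in preds[t].items()),
                    default=0,
                )
                start = max(proc_free[proc], data_ready)
                saved = proc_free[proc]
                placed[t] = (proc, start, start + weights[t])
                proc_free[proc] = start + weights[t]
                branch()
                proc_free[proc] = saved
                del placed[t]

    branch()
    return best_len, best_set
```

For the diamond graph with weights a=2, b=3, c=3, d=2 and unit edge costs on two processors, this search reports a minimum length of 8 with four optimal schedules. Two of them are mirror images obtained by swapping the identical processors. A production implementation would likely break such symmetries to save work.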

3 The Parallel Algorithm

The parallelisation of the sequential Branch–and–Bound algorithm requires a further subdivision of the search space into disjoint subspaces, which can be assigned to the processing units.


As already described in Section 2, every inner node of the decision tree represents a partial schedule and every leaf node corresponds to a complete schedule. The branching rule used by the algorithm guarantees that the children of a node represent different partial schedules. Since the schedules are generated along the timeline, the subtrees rooted at these children can never merge again later. Therefore, two subtrees of the decision tree always represent disjoint subspaces of the search space if neither root is an ancestor of the other. It also follows that every part of the search space can be identified unambiguously by its root node.

In order to achieve a balanced assignment of the computation to the available processing units, the algorithm generates a workpool containing a certain number of subtree roots. This workpool is managed by a master process, which controls the distribution of the subtasks to the slave processes. The workpool is created by a breadth-first search that terminates when a user-defined number of elements has been collected in the workpool. The nodes are numbered by the ordinal numbers of their permutations. These ordinal numbers identify the nodes unambiguously, and the corresponding subtree can be rebuilt from a root's ordinal number. For that reason, the only information that needs to be stored in the workpool is the ordinal numbers of the subtree roots, which keeps the memory requirements small.

The parallel algorithm can be partitioned into three phases, called Initialisation, Computation and Finalisation. During the Initialisation-Phase, the master launches the slave processes and splits the whole task into a number of smaller subtasks. The number of subtasks depends on the size of the workpool, which is specified by the user. The Computation-Phase begins as soon as the master has assigned a subtask to every slave¹.
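The text does not spell out how a node's permutation is turned into an ordinal number. One common choice, assumed here, is the lexicographic rank of a permutation in the factorial number system. It maps a permutation to a single integer and back, so a compact integer can stand in for a workpool entry. The helper names `perm_rank` and `perm_unrank` are hypothetical.

```python
from math import factorial
from typing import List, Sequence


def perm_rank(perm: Sequence[int]) -> int:
    """Lexicographic rank (0-based) of a permutation of distinct items."""
    remaining = sorted(perm)
    rank = 0
    for i, item in enumerate(perm):
        idx = remaining.index(item)  # number of smaller items still unused
        rank += idx * factorial(len(perm) - 1 - i)
        remaining.pop(idx)
    return rank


def perm_unrank(rank: int, items: Sequence[int]) -> List[int]:
    """Inverse of perm_rank: rebuild the permutation with the given rank."""
    remaining = sorted(items)
    result = []
    for size in range(len(remaining), 0, -1):
        idx, rank = divmod(rank, factorial(size - 1))
        result.append(remaining.pop(idx))
    return result
```

For example, `perm_rank([1, 2, 0])` is 3 and `perm_unrank(3, [0, 1, 2])` returns `[1, 2, 0]`. Storing only such integers keeps the workpool small, at the cost of recomputing the node when a slave starts on it.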
Once the Computation-Phase has started, the master waits until it receives a message from one of the slaves indicating either that the slave has completely analysed its subtree or that it has found an improved schedule. In the latter case, the master only stores the broadcast schedule length and the process ID of the sending slave. In the former case, it sends a new subtask to the slave if the workpool is not empty; otherwise, the slave stays idle.

As soon as all subtasks have been processed and all slaves are idle, the Finalisation-Phase takes place. The master informs all slaves about the end of the computation. This step is necessary because messages are sent asynchronously, so there is no guarantee that every message announcing a new best solution has already been sent and received. Every slave that receives the finalisation message compares the global best solution to its own recent results and deletes its own suboptimal results, if any. If the slave recognizes that its own current solution is better than the global best solution, it broadcasts this solution to the master and to all other slaves. Then the slave sends the master an acknowledgement to indicate that it has finished the adjustment. The master waits until it has received an acknowledgement from every slave. It then requests the complete schedule from the last slave that reported having found the best solution so far, and additionally requests some bookkeeping information from all slaves. The slaves terminate after sending their replies. Finally, the master creates the output file and terminates as well.

¹ It is required that the workpool contains more subtasks than there are slaves.
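The master/slave protocol can be illustrated with a small thread-based simulation. This is a sketch under loud assumptions, not the PVM implementation. Threads stand in for slave processes. A lock-protected shared value stands in for the "new best length" broadcast. To keep the example short, each subtree search is a toy problem (ordering jobs on a single machine to minimize the sum of completion times) rather than task graph scheduling. All names are hypothetical. The executor hands the next queued subtask to whichever worker becomes free, which mirrors the master refilling an idle slave from the workpool. The final filtering step plays the role of the Finalisation-Phase.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List, Tuple


class SharedBound:
    """Stand-in for broadcasting a new best length (PVM messages in the paper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._value = float("inf")

    def offer(self, length: float) -> None:
        with self._lock:
            self._value = min(self._value, length)

    def get(self) -> float:
        with self._lock:
            return self._value


def slave(root: str, times: Dict[str, int], bound: SharedBound) -> Tuple[float, List[tuple]]:
    """Search the subtree whose root fixes `root` as the first job (toy problem)."""
    best, found = float("inf"), []

    def dfs(order: List[str], finish: int, total: int) -> None:
        nonlocal best, found
        if total > min(best, bound.get()):
            return  # Bound: the sum of completion times only grows
        if len(order) == len(times):
            if total < best:
                best, found = total, []
            found.append(tuple(order))
            bound.offer(total)  # "broadcast" a possibly improved length
            return
        for job, t in times.items():
            if job not in order:
                order.append(job)
                dfs(order, finish + t, total + finish + t)
                order.pop()

    dfs([root], times[root], times[root])
    return best, found


def master(times: Dict[str, int], num_slaves: int) -> Tuple[float, List[tuple]]:
    workpool = list(times)  # hypothetical: one subtree root per possible first job
    if len(workpool) <= num_slaves:
        raise ValueError("the workpool must contain more subtasks than slaves")
    bound = SharedBound()
    results = []
    with ThreadPoolExecutor(max_workers=num_slaves) as pool:
        futures = [pool.submit(slave, root, times, bound) for root in workpool]
        for fut in as_completed(futures):  # a finished subtree frees its worker
            results.append(fut.result())
    # Finalisation: discard every local result worse than the global best.
    best = min(length for length, _ in results)
    optimal = sorted(s for length, scheds in results if length == best for s in scheds)
    return best, optimal
```

Because of Python's global interpreter lock, this simulation shows the message flow, not a speedup. A real implementation would use separate processes, as PVM does.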

4 Results

To achieve an efficient informed search algorithm, some parameters should be analysed before starting the computation of larger problems. We found that some of the most important parameters influencing the search speed are independent of the given task graph. They are properties of the algorithm itself, such as the size of the workpool and the number of processing elements involved in the search process. Additionally, we demonstrate the suitability of our approach for evaluating scheduling heuristics. For this purpose, we analyse the heuristic of Kasahara and Narita [6] using a test bench of approximately 1700 task graphs.

4.1 Size of the Workpool

After its creation, the workpool contains the complete search space, subdivided into a user-defined number of subspaces. The size of a subspace is determined by the number of schedules it contains, so increasing the number of subspaces reduces the size of each one. In this way, the workpool size determines the granularity of the search space and the number of schedules that one slave has to analyse. Figure 1 shows how the runtime for different task graph problems depends on the size of the workpool. The left side shows how increasing the workpool size can reduce the runtime on a parallel computing system with 30 processing elements; in this case, a workpool size of approximately 300 to 1200 elements (partial schedules) is clearly useful. In contrast, the right side of Figure 1 shows that for lightweight scheduling problems the parallel implementation makes matters worse. Here, the minimal runtime is reached with the sequential implementation and no subtasks at all. The relative slowdown increases with the workpool size and can clearly exceed 100%. Since it is difficult or even impossible to estimate the computational complexity of a scheduling problem in advance, the workpool size has to be chosen carefully in order to minimize the overall runtime.
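How a user-defined workpool size translates into a set of subtree roots can be sketched with a breadth-first expansion that stops once enough open nodes have been collected. This is an assumption-laden illustration. The decision tree is replaced by a hypothetical tree of permutation prefixes, and the names `build_workpool` and `prefix_children` are invented for the example. The returned nodes root disjoint subtrees that together cover the whole search space, so the target size directly sets the granularity discussed above.

```python
from collections import deque
from typing import Callable, List


def build_workpool(root, children: Callable, target_size: int) -> List:
    """Expand the tree breadth-first until at least `target_size` open nodes exist.

    Leaves cannot be split further; if only leaves remain, expansion stops early.
    """
    pool = deque([root])
    stalled = 0  # consecutive leaves seen since the last successful expansion
    while len(pool) < target_size and stalled < len(pool):
        node = pool.popleft()
        kids = children(node)
        if kids:
            pool.extend(kids)
            stalled = 0
        else:
            pool.append(node)  # keep the leaf as its own (trivial) subtree
            stalled += 1
    return list(pool)


def prefix_children(n: int) -> Callable:
    """Hypothetical decision tree: a node is a prefix of a permutation of range(n)."""
    return lambda prefix: [prefix + (x,) for x in range(n) if x not in prefix]
```

With `build_workpool((), prefix_children(4), 10)`, the pool holds ten roots of mixed depth: one one-element prefix with six leaves below it and nine two-element prefixes with two leaves each, covering all 24 permutations exactly once. A larger target yields more, smaller subtasks.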

4.2 Number of Processing Elements

Usually, the maximum speedup of a parallel program is equal to the number of available processing elements. To evaluate the scalability of the parallel implementation, we used three task graphs with sequential runtimes of approximately one minute each. The workpool size was set to 6000.


Fig. 1. Influence of the workpool size on the runtime. Sequential runtimes: 50 to 250 s in the left diagram; 0.01 to 1.2 s in the right diagram.

Figure 2 shows the relation between the speedup factor and the number of processing elements used. For smaller numbers of processing elements, the speedup factor increases almost linearly (for one example, a slight superlinear speedup can even be observed). When the number of processing elements exceeds 8, the speedup deviates from linear and approaches a saturation limit of approximately 16.

Fig. 2. Influence of the number of processing elements on the speedup-factor
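For reference, the quantities discussed with Fig. 2 follow the usual definitions. The symbols are assumptions for this illustration: $T_1$ is the sequential runtime and $T_p$ the runtime on $p$ processing elements.

```latex
S(p) = \frac{T_1}{T_p}, \qquad
E(p) = \frac{S(p)}{p}, \qquad
E(p) \le 1 \text{ unless the speedup is superlinear.}
```

Once the speedup saturates near 16, the efficiency falls roughly as $16/p$. On a hypothetical 32 processing elements, for example, it would be about 0.5.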

4.3 Analysing Scheduling Heuristics

The computation of optimal schedules is a rather time-consuming process that is only feasible for small to medium-sized task graphs. Although most of the proposed scheduling heuristics aim at large task graphs, this subsection shows that their efficiency can also be analysed on smaller graphs whose optimal schedule lengths can be computed by the proposed algorithm. A test with approximately 1700 computed optimal schedules was carried out to evaluate a heuristic's efficiency. The graphs were generated randomly with many different settings of the DAG properties, e.g., the connection density between the nodes. Using such a wide variation of task graph properties, we can be sure that the results are independent of the chosen task graph set. In this way, our approach enables scientists to evaluate and compare the results of their heuristics more objectively. To demonstrate this new opportunity, we use the well-known heuristic of Kasahara and Narita, described in [6].
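The paper states only that the graphs were generated randomly with varying properties such as connection density. A minimal generator in that spirit might look like the following sketch. Treating density as an independent edge probability between task pairs, numbering tasks in a topological order, and the weight ranges are all assumptions, not the authors' generator. The function name `random_task_graph` is hypothetical.

```python
import random
from typing import Dict, Optional, Tuple


def random_task_graph(
    num_tasks: int,
    density: float,
    max_weight: int = 10,
    max_comm: int = 10,
    seed: Optional[int] = None,
) -> Tuple[Dict[int, int], Dict[Tuple[int, int], int]]:
    """Return (node weights, edge weights) of a random DAG.

    Each pair i < j receives an edge i -> j with probability `density`. Edges
    therefore always point from a lower to a higher index, so no cycle can form.
    """
    rng = random.Random(seed)  # seeded for reproducible benchmark graphs
    weights = {i: rng.randint(1, max_weight) for i in range(num_tasks)}
    comm = {}
    for i in range(num_tasks):
        for j in range(i + 1, num_tasks):
            if rng.random() < density:
                comm[(i, j)] = rng.randint(1, max_comm)
    return weights, comm
```

Sweeping `density` (and, for instance, the ratio of communication to computation weights) over a grid of values is one simple way to obtain a test bench that covers a wide range of graph properties.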

Table 1 shows the deviation of this heuristic's results from the optimal schedule lengths. The heuristic finds a solution with the optimal schedule length for 57.67% of the investigated task graphs. For the remaining task graphs, the observed deviation from the optimal schedule length is rather low (< 10%) in 83.41% of the cases. Only 2.86% of all solutions deviate by more than 25%. The performance of heuristics is usually evaluated by comparing their solutions with those of another (well-known) heuristic. We argue that it would be more meaningful to use the deviations from the optimal solutions introduced above. For this purpose, we will soon release a benchmark suite that provides the optimal solutions for 36,000 randomly created task graph problems covering a wide range of graph properties.
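The percentages above can be computed from pairs of heuristic and optimal schedule lengths. The sketch below assumes that the relative deviation is $(SL_{heur} - SL_{opt}) / SL_{opt}$. It also assumes the reading that the "< 10%" figure refers to the non-optimal cases only. The paper does not give the formula explicitly, and the function name `deviation_summary` is hypothetical.

```python
from typing import Dict, Iterable, Tuple


def deviation_summary(pairs: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Summarize heuristic schedule lengths against the optimal ones.

    pairs: (heuristic length, optimal length) for each task graph.
    Returns percentages analogous to the statistics quoted in the text.
    """
    devs = [100.0 * (h - o) / o for h, o in pairs]  # relative deviation in %
    non_optimal = [d for d in devs if d > 0]
    below_10 = sum(d < 10 for d in non_optimal)
    return {
        "optimal": 100.0 * (len(devs) - len(non_optimal)) / len(devs),
        "below_10_of_rest": 100.0 * below_10 / len(non_optimal) if non_optimal else 0.0,
        "above_25": 100.0 * sum(d > 25 for d in devs) / len(devs),
    }
```

For four graphs with (heuristic, optimal) lengths (10, 10), (105, 100), (13, 10) and (20, 10), one result is optimal (25%). One of the three remaining results deviates by less than 10%, and two of the four deviate by more than 25% (50%).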

5 Conclusion

In this paper, we presented a parallel implementation of a Branch–and–Bound algorithm for computing optimal task graph schedules. Parallelization accelerates the optimization process, so a large number of test cases can be investigated within a reasonable period of time. The runtime needed to compute an optimal schedule depends strongly on the workpool size and on the number of processing elements available for the computation. In order to reduce the runtime, the workpool size has to be chosen carefully. A nearly linear speedup can be achieved, provided that an appropriate workpool size is used.


By means of the parallel Branch–and–Bound algorithm, the optimal schedules for a benchmark suite comprising 1700 task graphs were computed. This allows a more objective evaluation of scheduling heuristics than comparisons between heuristics. We evaluated the solutions of the heuristic of Kasahara and Narita [6] by comparing the corresponding schedule lengths with the optimal schedule lengths of all 1700 test cases. The authors' future work will include the release of a test bench that will provide a collection of 36,000 task graph problems together with their optimal schedule lengths. This benchmark suite will enable researchers to compare the performance of their heuristics with the actual optimal solutions.

Acknowledgement. The authors would like to thank Mrs. Sigrid Preuss, who contributed some of the presented results from her diploma thesis.

References
1. Aguilar, J., Gelenbe, E.: Task Assignment and Transaction Clustering Heuristics for Distributed Systems. Information Sciences, Vol. 97, No. 1 & 2, pp. 199–219, 1997
2. Bansal, S., Kumar, P., Singh, K.: An improved duplication strategy for scheduling precedence constrained graphs in multiprocessor systems. IEEE Transactions on Parallel and Distributed Systems, Vol. 14, No. 6, June 2003
3. Dogan, A., Özgüner, F.: Optimal and Suboptimal reliable scheduling of precedence-constrained tasks in heterogeneous distributed computing. International Workshop on Parallel Processing, p. 429, Toronto, August 21–24, 2000
4. Geist, A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, R., Sunderam, V.: PVM 3 Users Guide and Reference Manual. Oak Ridge National Laboratory, Tennessee, 1993
5. Kafil, M., Ahmad, I.: Optimal Task assignment in heterogeneous distributed computing systems. IEEE Concurrency: Parallel, Distributed and Mobile Computing, pp. 42–51, July 1998
6. Kasahara, H., Narita, S.: Practical Multiprocessor Scheduling Algorithms for Efficient Parallel Processing. IEEE Transactions on Computers, Vol. C-33, No. 11, pp. 1023–1029, Nov. 1984
7. Kohler, W.H., Steiglitz, K.: Enumerative and Iterative Computational Approaches. In: Coffman, E.G. (ed.): Computer and Job-Shop Scheduling Theory. John Wiley & Sons, New York, 1976
8. Kwok, Y.-K., Ahmad, I.: Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Computing Surveys, Vol. 31, No. 4, pp. 406–471, 1999
9. Park, C.-I., Choe, T.Y.: An optimal scheduling algorithm based on task duplication. IEEE Transactions on Computers, Vol. 51, No. 4, April 2002
10. Radulescu, A., van Gemund, A.J.C.: Low-Cost Task Scheduling for Distributed-Memory Machines. IEEE Transactions on Parallel and Distributed Systems, Vol. 13, No. 6, June 2002

Selection and Advanced Reservation of Backup Resources for High Availability Service in Computational Grid*

Chunjiang Li, Nong Xiao, and Xuejun Yang

School of Computer, National University of Defense Technology, Changsha, 410073 China
+86 731 4575984

Abstract. Resource redundancy is a primary way to improve availability for applications in a computational grid. How to select backup resources for an application during the resource allocation phase, and how to make advanced reservations for backup resources, are challenging issues for a grid environment that provides high availability service. In this paper, we propose a backup resource selection algorithm based on resource clustering and then devise several policies for the advanced reservation of backup resources. With this algorithm and these policies, the backup resource management module in grid middleware can provide resource backup service more efficiently and cost-effectively. The algorithm and policies can be implemented on the GARA system, making the QoS architecture more powerful and practical for the computational grid.

1 Introduction

Grid computing [1] is a kind of distributed supercomputing in which geographically distributed computational and data resources are coordinated to solve large-scale problems. The resources in a grid environment are distributed over wide areas, heterogeneous in nature, and owned by different individuals or organizations, which makes the grid a more variable and unreliable computing environment. The most common failures are machine faults, in which hosts go down, and network faults, in which links go down. When some resources fail, the applications using them have to stall, waiting for the recovery of the failed resources or for migration to other resources. Reducing this stall time is critical for grid middleware that aims to provide high availability service. Resource backup is one such method: in the resource allocation phase, redundant resources are allocated as backup resources for the application. When some resources fail, the application's tasks running on the failed resources can then migrate to the backup resources without asking the global resource manager to reallocate resources, which is time-consuming in the grid.

* This work is supported by the National Science Foundation of China under Grant No. 60203016 and No. 69933030; the National High Technology Development 863 Program of China under Grant No. 2002AA131010.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 26–33, 2004. © Springer-Verlag Berlin Heidelberg 2004

An application in a computational grid usually uses a large number of resources, and these resources usually differ from each other in type, hardware/software architecture, and performance. Selecting backup resources for such an application is therefore a complex task. The backup resources allocated to the application may not be used at all while the application runs. If advanced reservations are made for all of these backup resources during this phase, resources are likely to be wasted, which makes the application less cost-effective. It is therefore necessary to design multiple policies for the advanced reservation of backup resources.

In this paper, we first propose a backup resource selection algorithm based on resource clustering. This algorithm simplifies the selection process by dividing the resources used by the application into subsets; the resources in each subset are similar to some degree and can share backup resources. Then we devise several policies for the advanced reservation of backup resources. These policies can be implemented with the GARA system [2]. This paper is organized as follows. The resource backup process is described in Section 2. In Section 3, the backup resource selection algorithm based on resource clustering is introduced. In Section 4, the policies for advanced reservation of backup resources are presented, and the architecture of the policy engine for resource backup is briefly discussed. Section 5 concludes the paper.

2 Resource Backup

Resource backup [3] in a single computer system, such as a cluster or a server, is often achieved by hardware component redundancy. In a computational grid, however, the resources belong to different administrative domains, so the grid middleware cannot rearrange hardware components for redundancy. The only way to back up resources is to allocate redundant resources to applications that need high availability. We call the resources allocated to an application, on which the application runs, primary resources, and the redundant resources backup resources.

The resource allocation process in the grid includes two phases [4]: resource discovery and reservation. For example, when a user submits an application whose resource requirements are described in RSL [5] to the computational grid, the global scheduler first analyzes the resource requirements, performs resource discovery using the grid information system, and computes a primary resource list; it then tries to obtain reservations for these resources. If the application needs high availability service, the scheduler can allocate redundant resources in these phases, keeping advanced reservations of some resources as backups. Then, if resources on which some of the application's tasks are running fail during execution, these tasks can migrate to the backup resources and continue to run without a reallocation process. This reduces the stall time of the application and increases its availability.

Usually, an application in the grid does not explicitly declare which resources should be allocated as backup resources when it is submitted. The backup resource management module must select backup resources according to the resource requirements and availability requirements of the application, so a backup resource selection algorithm is absolutely necessary. Furthermore, since the backup resources may not be used while the application runs, flexible advanced reservation policies are also necessary to avoid wasting resources. In this paper, we first present a backup resource selection algorithm based on resource clustering and then design several policies for the advanced reservation of backup resources. They are very valuable for providing high availability services in a computational grid.

3 Selection Algorithm

3.1 Definitions

Before performing resource clustering, we give the following definitions of the relations between resources.

Definition 1 (Substitution relation). If a task running on resource $r_i$ can also run on resource $r_j$, and $r_j$ can satisfy the performance requirement of the task, then we say that $r_i$ is substitutable by $r_j$, denoted $\mathrm{Sub}(r_i, r_j)$.

Definition 2 (Similarity relation). If two resources $r_i$ and $r_j$ can substitute each other, i.e. $\mathrm{Sub}(r_i, r_j)$ and $\mathrm{Sub}(r_j, r_i)$, then we call $r_i$ and $r_j$ similar resources, denoted $\mathrm{Sim}(r_i, r_j)$.

Definition 3 (Complete resource set). A resource set $R$ is called a complete resource set if $\mathrm{Sim}(r_i, r_j)$ holds for all $r_i, r_j \in R$.

Definition 4 (Similarity relation of resource sets). Two complete resource sets $R_m$ and $R_n$ are called similar resource sets, denoted $\mathrm{Sim}(R_m, R_n)$, if $\mathrm{Sub}(r_i, r_j)$ and $\mathrm{Sub}(r_j, r_i)$ hold for all $r_i \in R_m$ and $r_j \in R_n$.
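The four definitions translate directly into set checks. In this minimal sketch, the substitution relation is assumed to be given as a set of ordered pairs, where `(a, b)` means $\mathrm{Sub}(a, b)$, i.e. the task on `a` can be moved to `b`. The helper names are hypothetical.

```python
from itertools import combinations
from typing import Hashable, Iterable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def similar(a: Hashable, b: Hashable, sub: Set[Pair]) -> bool:
    """Sim(a, b): each resource is substitutable by the other (Definition 2)."""
    return (a, b) in sub and (b, a) in sub


def is_complete(resources: Iterable[Hashable], sub: Set[Pair]) -> bool:
    """Complete resource set: every pair of distinct members is similar (Definition 3)."""
    return all(similar(a, b, sub) for a, b in combinations(list(resources), 2))


def sets_similar(r_m: Iterable[Hashable], r_n: Iterable[Hashable], sub: Set[Pair]) -> bool:
    """Sim(R_m, R_n) for two disjoint complete sets (Definition 4)."""
    r_n = list(r_n)
    return all(similar(a, b, sub) for a in r_m for b in r_n)
```

Note that similarity is not necessarily transitive. If `w1` is similar to `w2` and `w2` to `w3`, the set `{w1, w2, w3}` is still complete only if `w1` and `w3` are similar as well, which is why Definition 3 checks every pair.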

3.2 Resource Clustering

Clearly, the resources in a complete resource set can share backup resources. The primary resources allocated to an application can be clustered into several complete resource sets based on the similarity relation, and backup resources can then be chosen for each set. The clustering includes two steps: inner-domain clustering and inter-domain clustering. First, we examine how the similarity relation between two resources is determined.

Determination of Similarity Relation. The similarity relation can be determined at two levels: the physical level and the logical level. At the physical level, homogeneous resources allocated to an application are obviously similar. At the logical level, two resources $r_i$ and $r_j$ allocated to an application are similar if the tasks running on $r_i$ can also run on $r_j$ and vice versa while the application's performance requirements are still satisfied. For example, consider a grid application GA whose tasks are programmed in Java, and two workstations that can run two of these tasks, one a Linux workstation and the other a Windows workstation. If the tasks running on these two resources can be exchanged without degrading performance, the two workstations are similar computing resources for the application, although they are heterogeneous.


Inner Domain Clustering. Quite often, many resources in the primary set belong to one administrative domain, for example a cluster or a mainframe. Usually, the resources in a domain are homogeneous, and we assume this holds for all administrative domains, so inner-domain clustering is easy to perform. The primary resource set can be denoted as $R = R_1 \cup R_2 \cup \dots \cup R_m$, where the resources come from $m$ domains, $R_i$ is the set of resources from the $i$-th domain, and each $R_i$ is a complete resource set. If the resources coming from a domain are heterogeneous, we can divide the domain into sub-domains such that the resources in each sub-domain are homogeneous.

Inter Domain Clustering. To cluster the primary resources of the application into complete resource sets, clustering must also be performed at the application level, across domains. Inter-domain clustering is based on the similarity between domains, which can be determined with the following rule: for $r_k \in R_i$ and $r_l \in R_j$, if $\mathrm{Sim}(r_k, r_l)$, then $\mathrm{Sim}(R_i, R_j)$. Because each domain is homogeneous, one similar pair of representatives suffices. We use an $m \times m$ matrix $M$ to describe the similarity relation between domains: if $\mathrm{Sim}(R_i, R_j)$, then $M_{ij} = 1$; otherwise, $M_{ij} = 0$. The algorithm for inter-domain clustering is given in Fig. 1.

Fig. 1. Inter Domain Resources Clustering
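The algorithm of Fig. 1 is not reproduced in this text. As a stand-in, the following greedy sketch, which is an assumption and not the authors' algorithm, groups domains using the similarity matrix. Each domain joins the first cluster whose members are all similar to it; otherwise it starts a new cluster. Requiring similarity to every member matters because the union of several complete sets is itself complete only if every pair of them is similar. The function name `cluster_domains` is hypothetical.

```python
from typing import List, Sequence


def cluster_domains(sim: Sequence[Sequence[int]]) -> List[List[int]]:
    """Greedily group domain indices so that every cluster is pairwise similar.

    sim[i][j] == 1 means Sim(R_i, R_j). The result is valid but not necessarily
    the smallest possible number of clusters.
    """
    clusters: List[List[int]] = []
    for d in range(len(sim)):
        for cluster in clusters:
            if all(sim[d][e] == 1 for e in cluster):
                cluster.append(d)
                break
        else:  # no existing cluster accepts d
            clusters.append([d])
    return clusters
```

For a matrix in which domain 0 is similar to domain 1, domain 1 to domain 3, and no other pairs are similar, the greedy pass yields `[[0, 1], [2], [3]]`. Domain 3 cannot join `[0, 1]` because it is not similar to domain 0. An exact minimum grouping would require solving a clique-cover problem, which is NP-hard in general.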

3.3 Backup Resource Selection for a Resource Set

Availability Measurement. For selecting backup resources, it is necessary to devise a measurement method for the availability of a resource set. In this paper, for simplicity, we use an availability measurement model based on probabilistic model [6]. Suppose a primary resource set of an application is only when all the resources in it are available, the application can run smoothly; otherwise, the application will stall. Suppose A(R)is the availability of this resource set, and the failure rate of is then,

C. Li, N. Xiao, and X. Yang

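Suppose the availability demand of this application is $A_d$. If $A(R) < A_d$, backup resources need to be allocated for it. Suppose we have selected $k$ backup resources for this resource set, forming the expanded resource set $R'$. Its availability can be defined as $A(R') = P\{\text{at least } n \text{ resources in } R' \text{ are available}\}$. If a single backup resource $r_b$ is added to R, denoted as $R' = R \cup \{r_b\}$, then $R'$ remains available either when all primary resources are up, or when exactly one primary resource $r_i$ (failure rate $f_i$) is down and $r_b$ (failure rate $f_b$) is up:

$$A(R') = A(R) + (1 - f_b) \sum_{i=1}^{n} f_i \prod_{j \neq i} (1 - f_j)$$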
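To make these availability definitions concrete, the sketch below computes A(R) as a product and the availability of an expanded set R' (probability that at least n of its resources are up) with a Poisson-binomial dynamic program. It assumes independent failures, treats each failure rate as the probability that a resource is unavailable, and assumes any backup can stand in for any primary of the complete resource set; the function names are hypothetical. For a single backup, the general computation should agree with the closed form given above.

```python
from math import prod
from typing import Sequence


def availability(fail: Sequence[float]) -> float:
    """A(R): every resource must be up (independent failures assumed)."""
    return prod(1.0 - f for f in fail)


def at_least_available(fail: Sequence[float], need: int) -> float:
    """P{at least `need` of the resources are up}, via a Poisson-binomial DP."""
    dist = [1.0]  # dist[u] = P(exactly u of the resources seen so far are up)
    for f in fail:
        nxt = [0.0] * (len(dist) + 1)
        for u, p in enumerate(dist):
            nxt[u] += p * f              # this resource is down
            nxt[u + 1] += p * (1.0 - f)  # this resource is up
        dist = nxt
    return sum(dist[need:])


def one_backup_availability(fail: Sequence[float], f_b: float) -> float:
    """A(R') for R' = R plus one backup: all primaries up, or exactly one
    primary down while the backup is up."""
    one_down = sum(
        f_i * prod(1.0 - f_j for j, f_j in enumerate(fail) if j != i)
        for i, f_i in enumerate(fail)
    )
    return availability(fail) + (1.0 - f_b) * one_down


if __name__ == "__main__":
    primaries = [0.01, 0.02, 0.05]
    print(availability(primaries))
    print(at_least_available(primaries + [0.05], len(primaries)))
    print(one_backup_availability(primaries, 0.05))
```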

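If more than one backup resource is added, computing $A(R')$ exactly is very difficult, so for simplicity we use an approximate method. In this method, we suppose the failure rate of each resource in R equals the highest one, denoted as $f_{max} = \max_i f_i$. Then the availability of resource set R becomes $(1 - f_{max})^n$. When selecting backup resources for this resource set, the failure rate of each backup resource should not exceed $f_{max}$. Suppose we have chosen $k$ backup resources; then

$$A(R') \approx \sum_{i=n}^{n+k} \binom{n+k}{i} (1 - f_{max})^{i} f_{max}^{\,n+k-i}$$

Because every actual failure rate is at most $f_{max}$, this estimate is conservative.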
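Under the equal-failure-rate approximation just described, the availability of the expanded set reduces to a binomial tail. A minimal Python sketch, with a hypothetical function name:

```python
from math import comb


def approx_availability(n: int, k: int, f_max: float) -> float:
    """Approximate A(R') for n primaries plus k backups, all treated as
    failing with probability f_max: P{at least n of the n + k are up}."""
    total = n + k
    q = 1.0 - f_max
    return sum(comb(total, i) * q**i * f_max**(total - i)
               for i in range(n, total + 1))


if __name__ == "__main__":
    # Three primaries with f_max = 0.1 and 0..3 backups.
    for k in range(4):
        print(k, round(approx_availability(3, k, 0.1), 5))
```

With k = 0 the sum has a single term and reduces to (1 - f_max)^n, matching the approximate A(R).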

Backup Resource Selection Algorithm. Based on the discussion above, we propose the algorithm for selecting backup resources for a resource set R, shown in Fig. 2. In this algorithm, each iteration chooses only one backup resource for the complete resource set, which determines the minimum number of backup resources for it. In practice, more than one resource can be chosen in each iteration to improve the efficiency of the algorithm.

Fig. 2. Backup Resources Selection Algorithm for a Complete Resource Set
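Fig. 2 is not reproduced in this text. The loop below is a hedged sketch of the one-backup-per-iteration idea: it uses the equal-failure-rate approximation, admits only candidates whose failure rate does not exceed f_max, and stops as soon as the availability demand is met, so the returned list is the minimum number of backups under that approximation. Trying the most reliable candidates first, the (name, failure rate) candidate format, and the function names are assumptions.

```python
from math import comb
from typing import List, Sequence, Tuple


def approx_availability(n: int, k: int, f_max: float) -> float:
    """P{at least n of n + k resources are up}, each failing with prob. f_max."""
    total, q = n + k, 1.0 - f_max
    return sum(comb(total, i) * q**i * f_max**(total - i)
               for i in range(n, total + 1))


def select_backups(primary_fail: Sequence[float],
                   candidates: Sequence[Tuple[str, float]],
                   demand: float) -> List[str]:
    """Add one backup per iteration until the approximate availability of
    the expanded set reaches `demand` (hypothetical sketch)."""
    n = len(primary_fail)
    f_max = max(primary_fail)
    # A backup must not fail more often than the worst primary resource.
    eligible = sorted((c for c in candidates if c[1] <= f_max), key=lambda c: c[1])
    chosen: List[str] = []
    for name, _ in eligible:
        if approx_availability(n, len(chosen), f_max) >= demand:
            break
        chosen.append(name)
    if approx_availability(n, len(chosen), f_max) < demand:
        raise ValueError("not enough eligible backups to meet the availability demand")
    return chosen


if __name__ == "__main__":
    cands = [("a", 0.2), ("b", 0.08), ("c", 0.1), ("d", 0.02)]
    print(select_backups([0.05, 0.1, 0.1], cands, 0.99))  # ['d', 'b']
```

In the example, one backup raises the estimate from 0.729 to about 0.948 and two backups raise it to about 0.991, so two backups are selected; candidate "a" is never eligible because its failure rate exceeds f_max.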

3.4 Backup Resource Selection Algorithm Based on Resource Clustering

Now we can describe the whole backup resource selection algorithm based on resource clustering, as shown in Fig. 3.

Fig. 3. Backup Resource Selection Algorithm based on Resource Clustering

4 Policies for Advanced Reservation of Backup Resources

The allocation of backup resources is fulfilled by advanced reservation, in which a reservation is requested before the resource is needed. The GARA system, built on the well-known grid middleware Globus Toolkit [7,8], can process advanced reservations for multiple kinds of resources, such as CPU slots, network bandwidth and storage capacity. However, GARA lacks flexible policies for advanced reservation. To reduce the waste of resources, we designed several policies for advanced reservation of backup resources. These policies can be implemented on GARA, providing a flexible resource backup mechanism. In the following, we briefly introduce the idea of each policy. Here, we use $L_{GA}$ to denote the list of backup resources for application GA, whose resources are selected by the selection algorithm proposed above.

4.1 Policies

Totally Advanced Reserved, TAR. When allocating backup resources, make advanced reservations for all of the resources in the backup list $L_{GA}$ of application GA; that is, the application can begin to run only when all the backup resources have been reserved in advance.

Partially Advanced Reserved, PAR. In the advanced reservation stage, make advanced reservations for only some of the resources in $L_{GA}$, which form the reserved backup resource set. During execution, if the reserved resources are not enough for failure recovery, make advanced reservations for other backup resources in $L_{GA}$.

Compensate for Risk, CR. To reduce the number of backup resources in the whole grid environment, different applications can share their backup resources. In this policy, the backup resources reserved in advance are assembled into a resource pool, similar to an insurance mechanism in society. For this policy, however, time management for advanced reservation is critical, because backup resources for different applications often have different reservation periods.

Delayed Advanced Reservation, DAR. In this policy, none of the resources in $L_{GA}$ is reserved before the application starts. Only when a resource failure occurs does the resource reservation agent begin to reserve the backup resources in the backup resource list of this application. The number of resources reserved can be determined according to the failure condition and the availability demand of the application.

Fig. 4. The Architecture of Policy Engine

No-backup, NB. For completeness, we also treat the method that does not prepare any backup resources for the application as a backup policy; that is, resources used for backup are never reserved in advance. When a failure occurs, the job management module in the grid reallocates resources for the failed tasks.
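The policies differ mainly in when reservations are made. The hypothetical Python sketch below captures that difference for TAR, PAR, DAR and NB; CR is omitted because it depends on pool-wide time management. The PAR parameter par_count, the helper names, and the rule of reserving unreserved backups in list order are illustrative assumptions, not GARA calls.

```python
from enum import Enum
from typing import List


class Policy(Enum):
    TAR = "totally advanced reserved"
    PAR = "partially advanced reserved"
    DAR = "delayed advanced reservation"
    NB = "no backup"


def reserve_at_start(policy: Policy, backups: List[str], par_count: int = 1) -> List[str]:
    """Backups reserved before the application starts (hypothetical helper)."""
    if policy is Policy.TAR:
        return list(backups)              # the whole backup list first
    if policy is Policy.PAR:
        return list(backups[:par_count])  # only part of the list up front
    return []                             # DAR and NB reserve nothing up front


def reserve_on_failure(policy: Policy, backups: List[str],
                       reserved: List[str], failures: int) -> List[str]:
    """Extra backups to reserve when `failures` primaries fail (hypothetical)."""
    if policy is Policy.NB:
        return []  # job management reallocates resources instead
    shortfall = failures - len(reserved)
    if shortfall <= 0:
        return []  # existing reservations already cover the failures
    unreserved = [b for b in backups if b not in reserved]
    return unreserved[:shortfall]


if __name__ == "__main__":
    backups = ["b1", "b2", "b3"]
    for policy in Policy:
        held = reserve_at_start(policy, backups)
        print(policy.name, held, reserve_on_failure(policy, backups, held, 2))
```

Under TAR nothing new is ever reserved at failure time, whereas DAR defers every reservation until a failure is reported.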

4.2 Policy Engine

Each policy has benefits and drawbacks, so the backup resource management module in a computational grid should not use a single policy to serve all applications; a policy engine is needed for the resource backup service. Fig. 4 shows the architecture of the policy engine for the resource backup service of a computational grid. The policy engine has three critical modules: MLBR, which Manages the List of Backup Resources for applications; MARR, which Manages the Advanced Reserved Resources in the whole environment; and TMARR, the Time Management module for Advanced Reserved Resources. All these modules can call the GARA API to control the advanced reservation of resources. With these modules, the policy engine can provide a flexible resource backup service for applications.
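For the CR policy, one quantity a module like TMARR would plausibly need is how many pooled backup resources are required so that the overlapping reservation windows of different applications can all be honored. The sketch below is an assumption rather than the authors' design: it treats each reservation as a half-open time window needing one resource and computes the peak overlap with a sweep over start and end events. The function name and times are hypothetical.

```python
from typing import List, Tuple


def min_shared_pool_size(windows: List[Tuple[float, float]]) -> int:
    """Peak number of simultaneously active reservation windows [start, end)."""
    events = []
    for start, end in windows:
        events.append((start, 1))   # a reservation begins
        events.append((end, -1))    # a reservation ends
    # Sorting (time, delta) puts -1 before +1 at equal times, so back-to-back
    # windows [a, b) and [b, c) can reuse the same backup resource.
    events.sort()
    active = peak = 0
    for _, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


if __name__ == "__main__":
    # Reservation windows of three applications' backups (hypothetical times).
    print(min_shared_pool_size([(0, 10), (5, 15), (10, 20)]))  # 2
```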

5 Conclusion

Backup resource management for providing high availability service in a computational grid faces two critical issues: backup resource discovery and advanced reservation. In this paper, we proposed a backup resource selection algorithm and devised several policies for advanced reservation. The algorithm and policies can provide a flexible resource backup service for applications in a computational grid.

References
1. Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers (1999)
2. Simoni, A.: End-to-End Quality of Service for High-End Applications. PhD thesis, the University of Chicago (2001)
3. Hwang, K., Xu, Z.: Scalable Parallel Computing. McGraw-Hill Companies Inc. (1998)
4. Czajkowski, K., Foster, I., Karonis, N., Kesselman, C., Martin, S., Smith, W., Tuecke, S.: A Resource Management Architecture for Metacomputing Systems. In: Proceedings of the IPPS/SPDP'98 Workshop on Job Scheduling Strategies for Parallel Processing (1998)
5. WWW: The Globus Resource Specification Language RSL v1.0. (http://www.globus.org/gram/rsl_spec1.htm)
6. Sathaye, A., Ramani, S., Trivedi, K.: Availability Models in Practice. In: Proceedings of the Int. Workshop on Fault-Tolerant Control and Computing (FTCC-1) (2000)
7. Foster, I., Kesselman, C.: Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputing Applications 11 (1997) 115–128
8. WWW: (http://www.globus.org)

An Online Scheduling Algorithm for Grid Computing Systems
Hak Du Kim1 and Jin Suk Kim2*
1 Electronics and Telecommunications Research Institute, Taejon, Korea
2 School of Computer Science, University of Seoul, Seoul, Korea

Abstract. Since the problem of scheduling independent jobs on heterogeneous computational resources is known to be NP-complete [4], an approximation or heuristic algorithm is highly desirable. Grid is an example of a heterogeneous parallel computing system, and many researchers have proposed heuristic scheduling algorithms for Grid [1], [8], [9], [10]. In this paper, we propose a new on-line heuristic scheduling algorithm and show through extensive simulation that it performs better than previous scheduling algorithms.

1 Introduction
A Grid computing system is a system with various machines that execute a set of tasks. High performance Grid computing systems are needed in natural science and engineering for large-scale simulation. In this paper, we propose a scheduling algorithm that assigns tasks to machines in a heterogeneous Grid computing system and determines the execution order of the tasks assigned to the machines. Since the problem of allocating independent jobs on heterogeneous computational resources is known to be NP-complete [4], an approximation or heuristic algorithm is highly desirable. In our scheduling setting, tasks arrive at the system randomly. We assume that the scheduling algorithms are nonpreemptive, i.e., tasks must run to completion once they start, and that the tasks have no deadlines. All tasks are independent, i.e., there is no synchronization or communication among them. In the on-line mode, a task is assigned to a machine as soon as it arrives at the scheduling system.

2 Related Works
The scheduling problem has already been investigated by several researchers [1]. MET (Minimum Execution Time) is designed to minimize execution time, i.e., it assigns a task to the machine with the least execution time for that task. Since this algorithm does not consider the ready times of the machines, it causes load imbalance among machines. It only computes the minimum among the m machine execution times and assigns the task to the selected machine, so finding the machine with the minimum execution time takes O(m). MCT (Minimum Completion Time) assigns a task to the machine with the minimum completion time: it computes the completion time of the task on each machine by adding the begin time and the execution time, takes the minimum among the m completion times, and assigns the task to the selected machine. Therefore, finding the machine with the minimum completion time takes O(m). KPB (K-Percent Best) first finds the (k/100)*m best machines based on the execution time of the task, and then assigns the task to the machine with the minimum completion time among the selected machines. Getting the subset of machines takes O(m log m) because the m execution times must be sorted, and determining the machine with the minimum completion time takes O(m). The overall time complexity of KPB is therefore O(m log m).

* Corresponding author: Jin Suk Kim
M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 34–39, 2004. © Springer-Verlag Berlin Heidelberg 2004
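The three baseline heuristics can be sketched as per-task machine selectors. In this Python sketch, exec_times[j] is the execution time of the current task on machine j and ready[j] is that machine's begin time. How KPB rounds (k/100)*m is not stated in the text, so rounding down with a minimum of one machine is an assumption, as are the function names.

```python
import math
from typing import Sequence


def met(exec_times: Sequence[float], ready: Sequence[float]) -> int:
    """MET: smallest execution time; machine ready times are ignored. O(m)."""
    return min(range(len(exec_times)), key=lambda j: exec_times[j])


def mct(exec_times: Sequence[float], ready: Sequence[float]) -> int:
    """MCT: smallest completion time ready[j] + exec_times[j]. O(m)."""
    return min(range(len(exec_times)), key=lambda j: ready[j] + exec_times[j])


def kpb(exec_times: Sequence[float], ready: Sequence[float], k_percent: float) -> int:
    """KPB: MCT restricted to the (k/100)*m machines with the best execution times."""
    m = len(exec_times)
    size = max(1, math.floor(k_percent / 100 * m))  # rounding rule is an assumption
    best = sorted(range(m), key=lambda j: exec_times[j])[:size]  # O(m log m)
    return min(best, key=lambda j: ready[j] + exec_times[j])     # O(m)


if __name__ == "__main__":
    e = [4.0, 2.0, 6.0, 3.0]      # execution times of one task on four machines
    ready = [0.0, 5.0, 0.0, 1.0]  # begin (ready) time of each machine
    print(met(e, ready), mct(e, ready), kpb(e, ready, 50))  # 1 0 3
```

The example shows the trade-off described above: MET picks the fastest machine even though it is busy until time 5, MCT picks the earliest finisher, and KPB with k = 50 picks the earliest finisher among the two fastest machines.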

3 A Scheduling Algorithm
The execution time $e_{ij}$ denotes the amount of time taken to execute task $t_i$ on machine $m_j$ [1]. The completion time $c_{ij}$ denotes the time at which machine $m_j$ completes task $t_i$. Let $b_j$ be the begin time of $t_i$ on $m_j$. From these definitions, $c_{ij} = b_j + e_{ij}$. $T = \{t_1, \dots, t_n\}$ is the set of tasks, where n is the number of tasks and m is the number of machines. In this paper, we use the makespan as the performance metric for scheduling algorithms; the makespan is the maximum completion time when all tasks are scheduled. Figure 1 shows the proposed scheduling algorithm MECT (Minimum Execution Completion Time). The inputs of the algorithm are a task $t_i$ and the set of execution times $\{e_{i1}, \dots, e_{im}\}$ of $t_i$ on the machines $M = \{m_1, \dots, m_m\}$. In Step I, MECT finds the maximum begin time among the begin times of all machines. In Step II, it finds the subset M' of machines that satisfy the selection condition of Fig. 1. In Step III, if M' is not empty, MECT selects the machine in M' with the minimum execution time for $t_i$; otherwise, it selects the machine in M with the minimum completion time for $t_i$. Finally, MECT returns the index k of the selected machine. As for the time complexity of MECT, finding the maximum begin time takes O(m), finding M' also takes O(m), and k can be obtained in O(m) in Step III. Therefore, the overall time complexity of MECT is O(m).

Fig. 1. A Scheduling Algorithm MECT
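Since Fig. 1 is not reproduced, the exact condition used in Step II is not available in this text. The sketch below is therefore only an illustration: it assumes M' contains the machines whose completion time c_ij = b_j + e_ij would not exceed the maximum begin time found in Step I, and it ignores task arrival times when computing the makespan. Both that condition and the helper run_online are hypothetical.

```python
from typing import Callable, List, Sequence


def mect(exec_times: Sequence[float], ready: Sequence[float]) -> int:
    """Pick a machine for one task (sketch; the Step II condition is ASSUMED)."""
    m = len(exec_times)
    b_max = max(ready)  # Step I: maximum begin time over all machines, O(m)
    # Step II (assumed): machines that would finish the task by b_max.
    subset = [j for j in range(m) if ready[j] + exec_times[j] <= b_max]
    if subset:  # Step III: minimum execution time inside M'
        return min(subset, key=lambda j: exec_times[j])
    # Otherwise: minimum completion time over all machines (as in MCT).
    return min(range(m), key=lambda j: ready[j] + exec_times[j])


def run_online(etc: Sequence[Sequence[float]],
               pick: Callable[[Sequence[float], Sequence[float]], int]) -> float:
    """Assign tasks in arrival order (row i = task t_i); return the makespan.
    Arrival times are ignored, so a task begins when its machine is ready."""
    ready: List[float] = [0.0] * len(etc[0])
    for row in etc:
        k = pick(row, ready)
        ready[k] += row[k]  # the completion time becomes the new begin time
    return max(ready)


if __name__ == "__main__":
    etc = [[3, 5], [3, 5], [2, 9], [4, 1]]
    print(run_online(etc, mect))  # 6.0
```

Every step is a single pass over the m machines, consistent with the O(m) bound stated for MECT.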

4 Simulation Results
In this section, we evaluate the algorithms with a simulation program built on SimJava, a package for discrete-event simulation [6]. In this simulation, we assume that the execution time of each task on each machine is known prior to execution; this assumption is commonly used when studying scheduling algorithms for heterogeneous computing systems [11]. We use a task-machine matrix that holds the execution times. Figure 2 shows an example of a task-machine matrix; for example, its 3rd row gives the execution times of $t_3$ on each machine, i.e., $e_{31}, \dots, e_{35}$. To simulate the scheduling algorithms in various situations, many studies use the task-machine matrix consistency model [11]. A task-machine matrix is consistent if, whenever machine $m_a$ executes some task faster than machine $m_b$, $m_a$ executes all tasks faster than $m_b$. A matrix is inconsistent if $m_a$ executes some tasks faster than $m_b$ while $m_b$ executes other tasks faster than $m_a$. A matrix is semi-consistent if some columns are consistent and the other columns are inconsistent. Figure 2 shows an inconsistent task-machine matrix with 10 tasks and 5 machines, in which a machine executes one task faster than another machine but executes a different task slower. In our simulation, the task-machine matrices have 1,000 tasks and 20 machines, and the task arrival rate is 100. Figure 3 shows the average makespans of the scheduling algorithms in the inconsistent model, where the machine heterogeneity is varied from 10 to 120 and the task heterogeneity is 3000. We ran 50 tests for each case. MECT outperforms the previous scheduling algorithms.
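The text does not describe how its task-machine matrices were generated. The sketch below assumes a range-based method in the style of Braun et al. [11]: a per-task baseline drawn uniformly between 1 and the task heterogeneity is multiplied by a per-machine factor drawn uniformly between 1 and the machine heterogeneity, which yields an inconsistent matrix. Sorting every row makes it consistent, and sorting only the even-indexed columns of each row gives one semi-consistent variant. All function names are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]


def range_based_etc(n_tasks: int, n_machines: int, task_het: float,
                    mach_het: float, rng: random.Random) -> Matrix:
    """Inconsistent matrix: e_ij = tau_i * U(1, mach_het), tau_i = U(1, task_het)."""
    etc = []
    for _ in range(n_tasks):
        tau = rng.uniform(1, task_het)
        etc.append([tau * rng.uniform(1, mach_het) for _ in range(n_machines)])
    return etc


def make_consistent(etc: Matrix) -> Matrix:
    """Sort each row so machine 0 is fastest on every task, machine 1 next, ..."""
    return [sorted(row) for row in etc]


def make_semi_consistent(etc: Matrix) -> Matrix:
    """Sort only the even-indexed columns of each row; odd columns stay random."""
    out = []
    for row in etc:
        row = list(row)
        row[0::2] = sorted(row[0::2])
        out.append(row)
    return out


def is_consistent(etc: Matrix) -> bool:
    """True if, for every machine pair, neither beats the other on some task
    while losing on another."""
    m = len(etc[0])
    for a in range(m):
        for b in range(a + 1, m):
            a_faster = any(row[a] < row[b] for row in etc)
            b_faster = any(row[b] < row[a] for row in etc)
            if a_faster and b_faster:
                return False
    return True


if __name__ == "__main__":
    rng = random.Random(1)
    etc = range_based_etc(1000, 20, 3000, 100, rng)
    print(is_consistent(etc), is_consistent(make_consistent(etc)))
```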

Fig. 2. A 10×5 Task-Machine Matrix

Fig. 3. The makespans for scheduling algorithms in the inconsistent model

Fig. 4. The makespans for scheduling algorithms in semi-consistent model

Fig. 5. The makespans for scheduling algorithms in consistent model

Figures 4 and 5 show the simulation results in the semi-consistent model and the consistent model, respectively. In these figures, MECT is competitive with MCT, while the performance of MET is lower than that of the other three algorithms. In the last simulation, the machine heterogeneity is 20 and the task heterogeneity is varied from 500 to 3000. Figure 6 compares the scheduling algorithms based on makespan: MECT outperforms the previous scheduling algorithms when the task heterogeneity is high.

Fig. 6. The makespans for scheduling algorithms in consistent model

5 Conclusion
In this paper, we propose a new scheduling algorithm, MECT, for heterogeneous Grid computing systems. MECT is an on-line scheduling algorithm. We show that MECT performs better than the traditional scheduling algorithms, especially when the heterogeneity is high.

References
[1] M. Maheswaran, S. Ali, H. J. Siegel, D. Hensgen, and R. F. Freund, “Dynamic Matching and Scheduling of a Class of Independent Tasks onto Heterogeneous Computing Systems,” Proc. of the 8th Heterogeneous Computing Workshop, pp. 30-44, April, 1999.
[2] I. Foster, C. Kesselman, and S. Tuecke, “The Anatomy of the Grid: Enabling Scalable Virtual Organizations,” Journal of High-Performance Computing Applications, vol. 15, no. 3, pp. 200-222, 2001.
[3] T. D. Braun, H. J. Siegel, and Noah Beck, “A Comparison of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto Heterogeneous Distributed Computing Systems,” Journal of Parallel and Distributed Computing, vol. 61, pp. 810-837, 2001.
[4] O. H. Ibarra and C. E. Kim, “Heuristic Algorithm for Scheduling Independent Tasks on Nonidentical Processors,” Journal of the ACM, vol. 24, no. 2, pp. 280-289, April, 1977.
[5] M. Pinedo, Scheduling: Theory, Algorithms, and Systems, Prentice Hall, NJ, 1995.
[6] F. Howell and R. McNab, “SimJava: A Discrete Event Simulation Package For Java With Applications In Computer Systems Modelling,” Proc. of the 1st International Conference on Web-based Modelling and Simulation, January, 1998.
[7] A. A. Khokhar, V. K. Prasanna, M. E. Shaaban, and C. L. Wang, “Heterogeneous Computing: Challenges and Opportunities,” IEEE Computer, vol. 26, pp. 18-27, June, 1993.
[8] R. Buyya, J. Giddy, and D. Abramson, “An Evaluation of Economy-based Resource Trading and Scheduling on Computational Power Grids for Parameter Sweep Applications,” Proc. of the 2nd International Workshop on Active Middleware Services, August, 2000.
[9] H. Barada, S. M. Sait, and N. Baig, “Task Matching and Scheduling in Heterogeneous Systems using Simulated Evolution,” Proc. of the 15th Parallel and Distributed Processing Symposium, pp. 875-882, 2001.
[10] B. Hamidzadeh, Lau Ying Kit, and D.J. Lilja, “Dynamic Task Scheduling using Online Optimization,” Journal of Parallel and Distributed Systems, vol. 11, pp. 1151-1163, 2000.
[11] T. D. Braun, H. J. Siegel, N. Beck, L. L. Boloni, M. Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, B. Yao, D. Hensgen, and R. F. Freund, “A Comparison Study Mapping Heuristics for a Class of Meta-tasks on Heterogeneous Computing Systems,” 8th IEEE Heterogeneous Computing Workshop, pp. 15-29, 1999.

A Dynamic Job Scheduling Algorithm for Computational Grid*
Jian Zhang and Xinda Lu
Department of Computer Science and Eng., Shanghai Jiaotong Univ., Shanghai 200030, China
{zhangjian, lu-xd}@cs.sjtu.edu.cn

Abstract. In this paper, a dynamic job-scheduling algorithm is proposed for a computational grid of autonomous nodes. The algorithm uses information about the practical system to allocate jobs more evenly. The communication time between the nodes and the scheduler is overlapped with the computation time of the nodes, so the communication overhead is small. Jobs are scheduled according to the desirability of each node, and the scheduler does not allocate a new job to a node that is already fully utilized, which increases the execution efficiency of the system. An implementation framework for the algorithm is also introduced.

1 Introduction
The Grid concept has recently emerged as a vision for future network-based computing. A computational grid is a large-scale, heterogeneous collection of autonomous systems, geographically distributed and interconnected by low-latency, high-bandwidth networks. Networks of workstations (NOWs) represent particular forms of grids. Like an electrical power grid, the Grid aims to provide a steady, reliable source of computing power. The most difficult problems in a computational grid are the management and control of resources, dependability and security. Researchers have proposed many techniques to allocate jobs dynamically in a parallel system to improve its performance, but most of these algorithms ignore practical issues such as the speed of the nodes, the processing capability of the nodes, and the different sizes of the jobs. Such schedulers might not perform satisfactorily in practice. In this paper, a dynamic job-scheduling algorithm is proposed in which the scheduler allocates a job according to its knowledge of the nodes. The algorithm is general, can adapt to many situations to improve job fairness, and is also an effective method to prevent the saturation of nodes. This paper is organized as follows. Section 2 analyzes the general job scheduling policies. Section 3 introduces and explains the job-scheduling algorithm in detail. Section 4 gives the mobile agent based implementation framework, and Section 5 summarizes the paper.

* This work was supported by the National Science Foundation of China (No. 60173031)

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 40–47, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Analysis of the General Job Scheduling Policies
There are three general job-scheduling policies [1]:
Static scheduling: In this policy, jobs are assigned to the nodes at compile time and are never reassigned.
Dynamic scheduling: The system makes all scheduling decisions at run time, using a central work queue from which all idle nodes take work to execute.
Affinity scheduling: In this policy, the scheduler creates one local work queue for each node. Each node is statically assigned some work, as if static scheduling were used. If load imbalance actually occurs, idle nodes search the work queues of other nodes to find work to do.
In the static scheduling policy, because the scheduler need not communicate with the nodes before assigning jobs, and jobs are never reassigned, the synchronization and communication overhead is very small. The drawback is that it may lead to underutilization of nodes. In the dynamic scheduling policy, the initiative lies with the nodes: when a node is idle, it sends a message to the scheduler to ask for a job, and the job at the head of the job queue is assigned to it. This policy reduces load imbalance to a minimum, but it may increase the communication overhead, because an idle node has to spend time communicating with the scheduler before it gets a job. The affinity scheduling policy attempts to strike a balance between the static and dynamic schedulers. On the one hand, it reduces load imbalance: no node stays idle while a job is waiting to be executed. On the other hand, the communication overhead between the nodes and the scheduler is small, because each node has its own job queue. However, in this policy jobs may migrate among nodes, which increases the overhead between nodes. In some conditions, the overhead related to migration is higher than the overhead related to load imbalance, so the performance of affinity scheduling may be worse than that of static scheduling. If no migration were needed in affinity scheduling, the performance of the system would be improved notably, and other practical information makes this possible. The size of jobs is useful practical information, but it is hard to obtain in advance. Other information can still be obtained from the system: we can estimate the speed of each node, we can know the maximum number of jobs that can be executed concurrently on each node, and we can also know how many jobs are currently running on each node. The algorithm proposed in this paper is based on the use of this information.

3 The Dynamic Job Scheduling Algorithm
3.1 The Architecture of the Algorithm
Intuitively, the best load-balancing status occurs when all nodes are at the point of full utilization and each node's workload is proportional to its capacity. We try to increase the throughput of the system. We should not allocate more jobs to a fully utilized node, because this causes imbalance without improving the overall throughput [2]. We therefore need to know whether a new job can be allocated to a particular node. An important characteristic of the algorithm is that it estimates the desirability of executing a new job on each node, using a scheduling function to reflect this desirability. When a node is saturated, the scheduler does not allocate new jobs to it. The architecture of this algorithm is shown in Fig. 1.

Fig. 1. Architecture of the algorithm

A characteristic of this architecture is that there is no job queue for each node. Instead, there is a message queue for the scheduler: all messages sent to the scheduler are appended to this queue, and the scheduler extracts messages from it for processing. Take Fig. 1 as an example, and assume a node is fully utilized when three jobs are running on it; then the scheduler does not dispatch the next job to that node until one of its running jobs completes. Because the short jobs are all on one node, a job on that node is more likely to complete first, so the next long job has a higher chance of being allocated to that node. In this way, we reduce the chance of overloading the nodes and tend to distribute long jobs to the nodes more evenly.

3.2 The Algorithm
In order to explain this algorithm in detail, we first introduce some notation:
P: the set of all nodes
$p_i$: a node in the set P
$S_i$: the initial speed of node $p_i$
$B_i$: the bounding factor of node $p_i$
$L_i$: the run queue length of node $p_i$
$E_i$: the effective speed of node $p_i$
$F_i$: the scheduling function of node $p_i$
$Q_J$: the central job queue
$Q_M$: the message queue

In a computational grid, all the nodes constitute the set of nodes P, and $p_i$ denotes a node in this set. $B_i$ is the bounding factor of $p_i$; it limits the number of schedulable jobs on that node. The bounding factor is generally set such that $p_i$ is fully utilized when $B_i$ jobs are scheduled to it. The value of the bounding factor affects the response time of the system: if we set a high $B_i$ for node $p_i$, the number of jobs that can execute concurrently on it is high, and the response time will be long. So in a system where response time is important, the bounding factor should be set to a low value. $L_i$ is the run queue length of $p_i$, reflecting the number of active jobs being executed on this node. Moreover, $L_i$ should not be larger than $B_i$.

Each node has its own speed. When a node is idle, its speed is its initial speed, denoted $S_i$; this is the highest speed of the node. When jobs are being executed on the node, its speed decreases, and the current speed is its effective speed, denoted $E_i$. The effective speed of a node decreases as the number of jobs on it grows: the more jobs there are, the lower the effective speed. When the node is idle, its effective speed equals its initial speed. We may use the formula below to calculate the effective speed of a node:

$F_i$ is the scheduling function, reflecting the desirability of executing a new job on node $p_i$. The larger $F_i$ is, the more desirable it is to send a job to $p_i$. If $F_i$ equals zero, the number of jobs on this node has reached $B_i$, meaning the node is already fully utilized. The scheduler tries to find the node with the highest $F_i$ among the nodes whose $F_i$ is larger than zero.

If $F_i$ equals zero for all nodes, all the nodes in the system are saturated. The scheduler then does not dispatch additional jobs to overload the already saturated system, but waits for some running jobs to complete. The scheduling function may be expressed as follows:

From this formula, we see that the larger $L_i$ is, the smaller $F_i$ is. When $L_i$ is less than $B_i$, $F_i$ is larger than zero; when $L_i$ equals $B_i$, $F_i$ is zero. If $F_i$ is less than zero, the node is overloaded with local jobs.
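The formulas for the effective speed E_i and the scheduling function F_i are not reproduced in this text. As one hypothetical choice that satisfies the properties stated above (E_i equals the initial speed S_i when the node is idle and falls as the run queue length L_i grows; F_i is positive below the bounding factor B_i, zero at B_i, negative above it, and decreasing in L_i), the sketch uses E_i = S_i / (L_i + 1) and F_i = E_i (B_i - L_i). These forms and the Node class are assumptions, not the authors' formulas.

```python
from dataclasses import dataclass


@dataclass
class Node:
    initial_speed: float  # S_i
    bound: int            # B_i, bounding factor
    run_queue: int = 0    # L_i, number of active jobs

    def effective_speed(self) -> float:
        # ASSUMED form of E_i: equals S_i when idle and decreases as L_i grows.
        return self.initial_speed / (self.run_queue + 1)

    def desirability(self) -> float:
        # ASSUMED form of F_i: > 0 below B_i, 0 at B_i, < 0 when overloaded.
        return self.effective_speed() * (self.bound - self.run_queue)


if __name__ == "__main__":
    p = Node(initial_speed=10.0, bound=3)
    for jobs in range(5):
        p.run_queue = jobs
        print(jobs, p.effective_speed(), p.desirability())
```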

The scheduler receives three kinds of messages:
Job_done: sent from a node to the scheduler when a job on that node completes.
New_job: sent to the scheduler when a new job arrives; the scheduler appends the job to the end of the central job queue.
Current_load: reports the current run queue length of a node.
The algorithm is shown below:

In the initial phase, the algorithm obtains the bounding factor $B_i$ and the initial speed $S_i$ of each node $p_i$, and sets $L_i$ of each node to zero. It then calculates the effective speed $E_i$ and the value of the scheduling function $F_i$ for each node. Next, the algorithm processes incoming messages in the main loop. If the message is “New_job”, it appends the new job to the end of the central job queue $Q_J$. If the message is “Job_done”, the $L_i$ of that node is decreased by 1, and $E_i$ and $F_i$ are recalculated. In the next step, the algorithm finds the node $p_k$ with the highest $F_k$ in the set P. If this highest $F_k$ is zero, it does not extract a job from $Q_J$, but goes on to process the next message from the message queue $Q_M$. If the highest $F_k$ is larger than zero, it extracts a job from $Q_J$ and executes it on $p_k$; $L_k$ is increased by 1, and $E_k$ and $F_k$ are recalculated. Compared with the dynamic scheduling policy, in our algorithm a node does not need to spend dedicated time communicating with the scheduler, because communication and computation overlap: when a node completes a job, it sends a “Job_done” message to the scheduler and continues executing the other jobs assigned to it instead of waiting for a new job. This reduces the overhead between the node and the scheduler and alleviates the drawback of the dynamic scheduling policy. Compared with the affinity scheduling policy, when the scheduler wants to allocate a job, the algorithm does not select a node blindly: it selects among the nodes whose scheduling functions, calculated from the processing capability and the effective speed of each node, are larger than zero. Therefore, it can distribute the workload evenly among the nodes, and no migration is needed. Another important characteristic of this algorithm is that it does not allocate more jobs to a node whose desirability is zero, i.e., to a node that is already fully utilized.
Most other scheduling algorithms submit jobs to the selected node without checking whether the node has become saturated between selection and dispatch. These algorithms aim to increase the throughput of the system, but on the contrary, they decrease its execution efficiency.
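The main loop described above can be simulated with a message queue and a central job queue. This Python sketch reuses hypothetical forms for the effective speed and scheduling function (E_i = S_i / (L_i + 1) and F_i = E_i (B_i - L_i), where S_i is the initial speed, B_i the bounding factor and L_i the run queue length), dispatches at most one job per processed message as the text describes, and represents messages as (kind, payload) tuples. The message encoding, the Node class and run_scheduler are assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, Dict, List, Tuple


@dataclass
class Node:
    initial_speed: float  # S_i
    bound: int            # B_i
    run_queue: int = 0    # L_i

    def desirability(self) -> float:
        effective = self.initial_speed / (self.run_queue + 1)  # assumed E_i
        return effective * (self.bound - self.run_queue)        # assumed F_i


def run_scheduler(nodes: Dict[str, Node],
                  messages: List[Tuple[str, Any]]) -> List[Tuple[str, str]]:
    """Process messages in order; return the (job, node) dispatches."""
    q_m: Deque[Tuple[str, Any]] = deque(messages)  # message queue
    q_j: Deque[str] = deque()                       # central job queue
    dispatched: List[Tuple[str, str]] = []
    while q_m:
        kind, payload = q_m.popleft()
        if kind == "New_job":
            q_j.append(payload)
        elif kind == "Job_done":
            nodes[payload].run_queue -= 1
        elif kind == "Current_load":
            name, length = payload
            nodes[name].run_queue = length
        if not q_j:
            continue
        # Node with the highest scheduling function; dispatch only if it is > 0.
        best = max(nodes, key=lambda n: nodes[n].desirability())
        if nodes[best].desirability() > 0:
            job = q_j.popleft()
            nodes[best].run_queue += 1
            dispatched.append((job, best))
    return dispatched


if __name__ == "__main__":
    cluster = {"p1": Node(10.0, 2), "p2": Node(5.0, 2)}
    msgs = [("New_job", f"j{i}") for i in range(1, 6)] + [("Job_done", "p1")]
    for job, node in run_scheduler(cluster, msgs):
        print(job, "->", node)
```

In the example, the fifth job waits in the central queue because both nodes reach a scheduling function of zero, and it is dispatched to p1 only after p1 reports Job_done.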

4 Mobile Agent Based Implementation of the Algorithm
4.1 Mobile Agent
The mobile agent is an emerging paradigm that is gaining momentum in several fields of application [3]. A mobile agent is a small program that can migrate to a remote machine, execute some function or collect relevant data there, and then migrate to other machines to accomplish another task. The basic idea of this paradigm is to distribute processing through the network, that is, to send the code to the data instead of bringing the data to the code. The applications most appropriate for mobile agent technology include at least one of the following features: data collection, searching and filtering, distributed monitoring, information dissemination, negotiating, and parallel processing [4]. When an application is computationally intensive or requires access to distributed sources of data, the parallel execution of mobile agents on different machines of the network seems to be an effective solution. At the same time, the dynamic and asynchronous execution of mobile agents fits well in changing environments where some notion of adaptive parallelism must be exploited.

J. Zhang and X. Lu

4.2 Implementation Framework
The implementation of the algorithm is based on IBM's Aglet system, one of the best-known mobile agent systems. It models mobile agents closely on the Java applet model, with the following characteristics: object passing, autonomous execution, local interaction, asynchronous and disconnected operation, parallel execution, etc. Aglets are Java objects that can move from one host to another; their fundamental operations include creation, cloning, dispatching, retraction, deactivation, activation, disposal, and messaging [6]. An aglet can halt execution, dispatch itself to a remote host, and resume execution there by presenting its credentials and obtaining access to local services and data. Aglet provides a uniform paradigm for distributed object computing [7], and using it can ease the development of distributed computing systems. Building on our research on mobile agent based parallel computing, we propose an implementation framework for the algorithm. It is composed of two parts, Console and Monitor, which are both aglets that communicate through messaging. Console is responsible for initialization, decision making, and job dispatching; Monitor is responsible for load monitoring. Console resides on the central node, waiting for new jobs and for messages from the monitor on each node. When a new job arrives, Console finds an appropriate node to execute it according to the algorithm. Meanwhile, it handles messages from the monitors by updating the run queue length and scheduling function of each node. Monitors check the run queue length of their node and send it to Console whenever it changes.
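The Console/Monitor interaction described above can be sketched as a small single-process simulation. This is not the Aglet API: messages are plain tuples, and `scheduling_function` is a hypothetical stand-in (bounding factor minus run queue length) for the paper's scheduling function, chosen only because it is positive below the bound and negative above it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Message = Tuple[str, int]  # (node name, run queue length)


def scheduling_function(run_queue_len: int, bounding_factor: int) -> int:
    # Hypothetical stand-in: positive below the bounding factor, negative above it.
    return bounding_factor - run_queue_len


@dataclass
class Monitor:
    """Per-node monitor that reports the run queue length only when it changes."""
    node: str
    last_reported: Optional[int] = None

    def poll(self, run_queue_len: int) -> Optional[Message]:
        if run_queue_len != self.last_reported:
            self.last_reported = run_queue_len
            return (self.node, run_queue_len)
        return None  # unchanged: no message is sent


@dataclass
class Console:
    """Central console: keeps the latest queue lengths and places new jobs."""
    bounding_factor: int
    queue_len: Dict[str, int] = field(default_factory=dict)

    def handle_message(self, msg: Message) -> None:
        node, length = msg
        self.queue_len[node] = length

    def dispatch(self, job: str) -> Optional[str]:
        scores = {n: scheduling_function(q, self.bounding_factor)
                  for n, q in self.queue_len.items()}
        best = max(scores, key=scores.get, default=None)
        if best is None or scores[best] <= 0:
            return None  # every node is saturated; the job waits
        self.queue_len[best] += 1  # local estimate until the monitor reports again
        return best
```

Reporting only on change keeps message traffic low, which matches the text's description of when monitors send updates.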

4.3 Experimental Results
The experiments were conducted on a network consisting of a Sun server and five Sun workstations. Tasks are generated on the server and then scheduled to the other machines. Two conditions were tested: without and with background load. The background loads are long-running computing agents. Tasks are generated at intervals of 4 and 6 seconds, respectively, until 100 tasks have been generated. For comparison with the dynamic scheduling algorithm (DS), the round robin (RR) scheduling algorithm was run on the same tests. The experimental results are shown in Table 1.

Table 1 shows that DS outperforms RR: the average speedups for the two conditions are 1.24 and 1.27, respectively. As tasks are generated and scheduled, the run queue length of some machines may exceed the bounding factor, making their scheduling function negative. DS does not assign new tasks to these nodes until their scheduling function becomes positive again, whereas RR keeps scheduling new tasks to them, which degrades its overall result.
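The difference between the two policies can be illustrated with a minimal static sketch (no task completions are modeled). As before, the scheduling function is replaced by a hypothetical stand-in, bound minus run queue length; the function names `rr_assign` and `ds_assign` are illustrative only.

```python
from itertools import cycle
from typing import List, Optional, Tuple


def rr_assign(n_tasks: int, queues: List[int]) -> Tuple[List[int], List[int]]:
    """Round robin: cycles through the nodes regardless of their load."""
    queues = list(queues)
    order = cycle(range(len(queues)))
    placed = []
    for _ in range(n_tasks):
        i = next(order)
        queues[i] += 1
        placed.append(i)
    return placed, queues


def ds_assign(n_tasks: int, queues: List[int],
              bound: int) -> Tuple[List[Optional[int]], List[int]]:
    """DS-style: place each task on the node with the largest positive
    (hypothetical) scheduling function bound - queue length; hold it otherwise."""
    queues = list(queues)
    placed: List[Optional[int]] = []
    for _ in range(n_tasks):
        i = max(range(len(queues)), key=lambda k: bound - queues[k])
        if bound - queues[i] <= 0:
            placed.append(None)  # no node qualifies; the task waits
            continue
        queues[i] += 1
        placed.append(i)
    return placed, queues
```

With node 0 already carrying a background queue of 3 and a bound of 2, `rr_assign(4, [3, 0, 0])` still sends two tasks to node 0 (queues `[5, 1, 1]`), while `ds_assign(4, [3, 0, 0], 2)` leaves node 0 alone (queues `[3, 2, 2]`).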

5 Conclusion
In this paper, we introduce a dynamic job scheduling algorithm. Its basic principle is to use the information of each node in a computational grid to allocate jobs among the nodes. A scheduling function determines how desirable each node is for accepting a new job; if a node is already fully utilized, no new job is allocated to it. The algorithm aims to improve the throughput, response time, and fairness of the system. In addition, we propose a mobile agent based implementation framework.

A Dynamic Job Scheduling Algorithm for Computational Grid


References
1. Markatos E.P.: How architecture evolution influences the scheduling discipline used in shared-memory multinodes. Joubert G.R., Proceedings of ParCo 93. Amsterdam: Elsevier (1993) 524-528
2. Hui C.C., Chanson S.T.: Improved strategies for dynamic load balancing. IEEE Concurrency, 3 (1999) 58-67
3. Pham V., Karmouch A.: Mobile Software Agents: An Overview. IEEE Communications Magazine, 7 (1998) 26-37
4. Venners B.: Solve Real Problems with Aglets, a Type of Mobile Agent. JavaWorld Magazine, 5 (1997)
5. Perdikeas M.K., Chatzipapadopoulos F.G., Venieris I.S.: An Evaluation Study of Mobile Agent Technology: Standardization, Implementation and Evolution. IEEE International Conference on Multimedia Computing and Systems, 2 (1999) 287-291
6. Lange D.B., Oshima M.: Mobile Agents with Java: The Aglet API. World Wide Web Journal, 3 (1998) 111-121
7. Lange D.B., Oshima M.: Programming and Deploying Mobile Agents with Java. Addison-Wesley, MA (1998)

An Integrated Management and Scheduling Scheme for Computational Grid*
Ran Zheng and Hai Jin
Cluster and Grid Computing Lab
Huazhong University of Science and Technology, Wuhan, 430074, China
{zhraner, hjin}@hust.edu.cn

Abstract. Computational grids have become attractive and promising platforms for solving large-scale, high-performance applications of multi-institutional interest. However, managing resources and computational tasks is a critical and complex undertaking, because the resources are geographically distributed, heterogeneous, owned by different individuals or organizations with their own policies and access rules, and subject to dynamically varying loads and availability. In this paper, we propose an integrated management and scheduling scheme for computational grids. It addresses key problems such as resource heterogeneity and the dynamic nature of resource information. It provides transparent support for high-level software and grid applications, enhancing the performance, extensibility, and usability of computers, and provides an integrated environment and information service. The scheme is general enough for computational grids and aims to make every grid resource work efficiently.

1 Introduction
Computational grids [1][2] are becoming increasingly attractive and promising platforms for solving large-scale, compute-intensive problems. In this environment, various geographically distributed resources are logically coupled together and presented as a single integrated resource. Resource management and scheduling are the key technologies of a grid: managing resources efficiently is the pivotal issue that determines whether a grid is usable at all. At the same time, it is a complex undertaking, because the resources are geographically distributed, heterogeneous, owned by different individuals or organizations, and subject to dynamically varying loads and availability. Some existing resource management technologies for parallel and distributed systems do not fit these characteristics of computational grids well. This paper presents an integrated resource management and scheduling scheme for computational grids. In Section 2, we analyze three resource management models and argue that the hierarchical structure is suitable for grids. We put forward an integrated management and scheduling scheme in Section 3 and also describe its prototype. In Section 4, task dispatching and selection algorithms for this architecture are introduced. Section 5 focuses on the performance evaluation of the scheme. Finally, we draw conclusions and outline future work in Section 6.

* This paper is supported by National Science Foundation under grants 60125208 and 60273076.
M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 48–56, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Anatomy of Resource Management Architectures
There are three main scheduling models.

Centralized management model. This model can be used to manage single or multiple resources and suits cluster (or batch queuing) systems such as Condor [3], [4], LSF [5], and Codine [6]. Its advantages include a simple structure, convenient maintenance, and guaranteed consistency and integrity. However, in a distributed system the central scheduler becomes a bottleneck, so the model is not suitable for a large grid.

Decentralized management model. In this model, resources are partitioned into different virtual domains, each with its own domain scheduler. The model is highly scalable, but because the status of remote domains is not available, optimal scheduling cannot be guaranteed. Moreover, communication traffic is heavy and data are decentralized, which hinders data consistency and scheduling across multiple domains.

Hierarchical management model. This model is a hybrid of the centralized and decentralized models. It not only avoids the shortcomings of both, but also addresses several challenging problems: site autonomy, heterogeneous environments, and policy extensibility. It has been adopted in Globus [7][8], Legion [9][10], Ninf [11], and NetSolve [12][13].

Fig. 1. Hierarchical Architecture Model

Our grid resource management architecture follows this hierarchical model, as shown in Figure 1. Like a tree, it consists of a super-scheduler, several local schedulers, and published resources: the super-scheduler is the root, the local schedulers are the internal (non-leaf) nodes, and the resources are the leaves, which are divided into several virtual domains.
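The tree described above can be sketched as a small data structure. The placement policy here (the root delegates to the domain whose least-loaded leaf has the lowest load) is a hypothetical illustration of two-level delegation, not the paper's algorithm, and the `load` field is an assumed metric.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Resource:  # leaf
    name: str
    load: float  # hypothetical load metric in [0, 1]


@dataclass
class LocalScheduler:  # internal node, one per virtual domain
    domain: str
    resources: List[Resource] = field(default_factory=list)

    def least_loaded(self) -> Optional[Resource]:
        return min(self.resources, key=lambda r: r.load, default=None)


@dataclass
class SuperScheduler:  # root
    domains: List[LocalScheduler] = field(default_factory=list)

    def leaves(self) -> List[Resource]:
        return [r for d in self.domains for r in d.resources]

    def place(self, job: str) -> Optional[Tuple[str, str]]:
        # Root compares each domain's best candidate, then delegates to that domain.
        candidates = [(d, d.least_loaded()) for d in self.domains]
        candidates = [(d, r) for d, r in candidates if r is not None]
        if not candidates:
            return None
        domain, res = min(candidates, key=lambda c: c[1].load)
        return domain.domain, res.name
```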


3 Integrated Management and Scheduling Scheme

3.1 Integrated Resource Management and Scheduling Scheme
Resource management is highly important in a grid; it is similar to, but more complex than, resource management in a distributed system. It must not only support multi-scheduling, but also suit a complex environment and provide the necessary QoS. The integrated management and scheduling scheme is shown in Figure 2. Its key components are the Grid Resource Scheduler, the Grid Information Server, and the Grid Nodes.

Fig. 2. Integrated Resource Management and Scheduling Structure

The Grid Resource Scheduler is the decision-making unit: it accepts user requests, applies optimal scheduling algorithms, and handles management issues seamlessly. Furthermore, the scheduler must be able to deal with exceptional cases; for example, after a resource fails, the scheduler can reschedule its tasks to other idle resources. Grid Nodes include devices and software. Node managers handle all requests from the upper-level scheduler and coordinate the actions of devices and active processes. The dispatcher makes local scheduling decisions based on information from the upper level or on resource status collected by the monitor. Examples of local managers include cluster systems such as MOSIX and queuing systems such as Condor, Codine, and LSF. The Grid Information Server acts as a database describing items of interest to the resource management system, such as resources, jobs, and schedulers.

3.2 Grid Scheduler Infrastructure
The grid scheduler acts as a mediator between users (applications) and grid resources and is responsible for grid management. Its structure is shown in Figure 3; it has two levels: a decomposing level and a scheduling level.


The job receiver module accepts user requests and returns results. The task decomposing module decomposes a job into several parallel, mutually exclusive, or synchronous atomic tasks. The decomposition rules are stored in a database and form the foundation of rule-based inference. The resource-task matching module matches atomic tasks to resources, identifying which resources can substitute for or complement one another. The scheduling module queries the information server and saves or modifies scheduling context. The task scheduling module analyzes scheduling strategies under different principles. The rule-based inference module assigns resources to atomic tasks. The scheduler optimizing module optimizes scheduling online. The job receiver module is in charge of schedule generation, creation of atomic tasks, and maintenance of job status. The resource matching module decomposes jobs, queries the grid information server, and generates allocation schemes. The scheduling inference module analyzes the generated candidate schemes and determines the best one. Finally, the scheduling module allocates tasks to the selected resources according to the result.

Fig. 3. Grid Scheduling Structure

Rescheduling is triggered by outer or inner abnormalities. An outer abnormality is caused by the arrival of urgent jobs or by cancellations; an inner abnormality is caused by factors such as resource failure. When such an interruption occurs, the job scheduler sends a signal to the task scheduling module, which then re-dispatches and reschedules the tasks.

4 Grid Scheduling Algorithm

4.1 Dynamic Scheduling Mechanism
There are two common dynamic scheduling methods: event-based and time-based. In the time-based method, scheduling is periodic: after a fixed interval a new period begins, all finished tasks are eliminated, and new tasks are inserted for rescheduling.

The event-based method, in contrast, continuously inspects the rescheduling status and decides whether to issue a rescheduling event. The structure of dynamic scheduling is shown in Figure 4. Each job has its own scheduling priority. If the urgency of a newly submitted job is high enough to exceed the threshold value, it immediately triggers scheduling at the selected local scheduler; otherwise it is scheduled together with other tasks when the next scheduling slot comes. Resource failure or task cancellation also leads to rescheduling.
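The triggering rules described above can be sketched as follows. The `EventScheduler` class, its urgency scale, and the event labels are hypothetical; the sketch only records which event caused each scheduling round rather than performing actual resource matching.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EventScheduler:
    urgency_threshold: float
    pending: List[str] = field(default_factory=list)  # jobs waiting for the next slot
    log: List[Tuple[str, Tuple[str, ...]]] = field(default_factory=list)

    def submit(self, job: str, urgency: float) -> None:
        if urgency > self.urgency_threshold:
            self._schedule("urgent-submit", [job])  # triggers scheduling at once
        else:
            self.pending.append(job)                # batched until the slot

    def slot(self) -> None:
        # Periodic scheduling slot: schedule everything that has accumulated.
        if self.pending:
            self._schedule("slot", self.pending)
            self.pending = []

    def failure_or_cancel(self, affected: List[str]) -> None:
        # Resource failure or task cancellation forces rescheduling.
        self._schedule("reschedule", affected)

    def _schedule(self, event: str, jobs: List[str]) -> None:
        self.log.append((event, tuple(jobs)))
```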

Fig. 4. The Structure for Dynamic Scheduling

4.2 Integrated Scheduling Algorithms
All tasks in the grid are classified as either real-time or best-effort. Tasks are dispatched across the whole grid (space-based dispatching) and then selected over time within each virtual domain (time-based selection). The basic rule is that real-time tasks are processed before best-effort tasks.

4.2.1 Grid Task Dispatch Algorithm of Grid Scheduler
We propose a mixed scheduling algorithm, called LMLB, whose goals are the least number of missed real-time deadlines and load balance across grid resources. The algorithm is invoked at every arrival of a real-time or best-effort task.

For every real-time task, resource-task matching is done first. If no resource satisfies the request, the task's scheduling counter (initially 0) is incremented by 1. If the counter exceeds the scheduling threshold, the task is regarded as disabled; otherwise it is put into the renewal window to be rescheduled next time. In either case, the operation is then repeated. If some resources are suitable for the request, one is selected from the matching unit with the goal of the least number of missed deadlines. First compute

$$\Delta_i = D_i - S_i \qquad (1)$$

where $D_i$ is the deadline specification and $S_i$ is the scheduling time. Then estimate the processing time of the task on each available resource $r_j$:

$$E_{ij} = RTT_j + \frac{d_i}{B_j} + p_j W_j + \frac{C_i}{P_j} \qquad (2)$$

where $RTT_j$ denotes the network round-trip time between resource $r_j$ and the user, $B_j$ is the bandwidth, $d_i$ is the transmitted data, $W_j$ is the waiting time on $r_j$, $p_j$ is a parameter of waiting probability, $C_i$ is the logical computational "cost" of the task (in some arbitrary units), and $P_j$ is the resource performance (in units per second). Select a suitable resource $r_j$ whose $E_{ij}$ does not exceed and is nearest to $\Delta_i$:

$$j^{*} = \arg\min_{j} \left( \Delta_i - E_{ij} \right) \quad \text{subject to} \quad E_{ij} \le \Delta_i, \quad \frac{L_j}{M_j} \le \theta \qquad (3)$$

where $M_j$ denotes the logical computation of resource $r_j$, $L_j$ is the real-time summation of the logical computation "costs" of all tasks on $r_j$, and $\theta$ is the boundary that decides whether a resource is overloaded. If no resource satisfies (3), there is no suitable resource for the task at this moment, and its scheduling counter is incremented by 1 for rescheduling.

For every best-effort task, the initial operation is the same as for a real-time task, but the resource is selected from the matching unit with the goal of load balance. Estimate the current load of each available resource $r_j$:

$$\lambda_j = \frac{L_j}{M_j} \qquad (4)$$

where $L_j$ is the real-time summation of the logical computation "costs" of all tasks on $r_j$ and $M_j$ denotes the logical computation of resource $r_j$. Then select the resource whose load is the smallest:

$$j^{*} = \arg\min_{j} \lambda_j \qquad (5)$$
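A minimal Python sketch of the LMLB selection rules follows, under stated assumptions: the estimated processing time is taken to be the sum of round-trip time, transfer time (data over bandwidth), waiting time weighted by the waiting-probability parameter, and compute time (cost over performance); load is the summed cost of tasks already on a resource divided by an assumed capacity field; the overload test uses the current load. All field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Resource:
    name: str
    rtt: float        # round-trip time to the user (s)
    bandwidth: float  # data units per second
    wait: float       # current waiting time (s)
    p_wait: float     # waiting-probability parameter
    perf: float       # performance (cost units per second)
    capacity: float   # logical computation capacity (assumed field)
    load_cost: float  # summed logical cost of tasks already on the resource


def est_time(r: Resource, data: float, cost: float) -> float:
    # Assumed additive model: network + transfer + expected wait + compute.
    return r.rtt + data / r.bandwidth + r.p_wait * r.wait + cost / r.perf


def load(r: Resource) -> float:
    return r.load_cost / r.capacity


def pick_realtime(rs: List[Resource], data: float, cost: float,
                  deadline: float, now: float, overload: float) -> Optional[Resource]:
    slack = deadline - now
    ok = [r for r in rs
          if est_time(r, data, cost) <= slack and load(r) <= overload]
    # Tightest fit: the estimate that does not exceed the slack and is nearest to it.
    return max(ok, key=lambda r: est_time(r, data, cost), default=None)


def pick_best_effort(rs: List[Resource]) -> Optional[Resource]:
    return min(rs, key=load, default=None)  # least-loaded resource


def on_no_resource(counter: int, threshold: int) -> Tuple[int, str]:
    counter += 1  # scheduling counter starts at 0
    return counter, ("disable" if counter > threshold else "retry")
```

Choosing the tightest feasible fit, rather than the fastest resource, leaves faster resources free for later tasks with shorter slack.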

4.2.2 Tasks Selection Algorithm in Virtual Organization
A mixed algorithm is proposed in which best-effort tasks are handled after real-time tasks, so that most deadlines can be met and resources are utilized efficiently. The grid scheduler should maximize the benefit to users, namely minimize the number of missed tasks; therefore, real-time tasks are scheduled before best-effort tasks, but best-effort tasks should also be processed as soon as possible. The real-time task selection algorithm should consider the QoS of all tasks. The Weighted Priority Schedule Algorithm (WPSA) is adopted for real-time scheduling. In WPSA, the priorities of tasks in a virtual organization vary dynamically with their importance, so tasks with different priorities receive different QoS. The Highest Responsibility First Algorithm (HRFA) is used for best-effort tasks. The Response Rate is the ratio of a task's response time to its executing time:

$$\text{Response Rate} = \frac{\text{Response Time}}{\text{Executing Time}} \qquad (6)$$

Here Response Time is the sum of the time the task has waited since joining the system and its estimated executing time. Therefore, Eq. 6 can be written as:

$$\text{Response Rate} = \frac{\text{Waiting Time} + \text{Executing Time}}{\text{Executing Time}} = 1 + \frac{\text{Waiting Time}}{\text{Executing Time}} \qquad (7)$$

5 Performance Evaluation of the Scheduling Scheme
In this section, we evaluate the grid scheduling scheme from the viewpoint of the QoS of grid applications. Matching, scheduling, and execution are performed so as to maximize some metric of the aggregate QoS delivered to the requester [14]. The LMLB algorithm is based on the greedy scheduling algorithm MCT (Minimum Completion Time) described in [15]. MCT is an aggressive approach that does not consider deadlines, and it can keep increasing the load of a resource until the resource is overloaded. LMLB overcomes this shortcoming: the scheduler uses load measurements and searches all resources to find an optimal one. Figure 5 compares the failure rates of MCT and LMLB, where the failure rate is defined as the percentage of requests that missed their deadlines. Results are shown for different thresholds of workload conditions. Although LMLB reduces the overloading of grid resources, some load imbalance among them is inevitable. To fully utilize idle resources, the least-loaded resources are assigned to best-effort tasks, which have no deadline constraints. This not only balances the load but also yields earlier finishing times.
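The failure-rate metric and the deadline-blindness of MCT can be illustrated with a toy simulation. The `"tight"` policy below is a simplified deadline-aware rule in the spirit of LMLB (latest finish that still meets the deadline, with no load bound and no rescheduling); it is an illustrative assumption, not the authors' implementation, and it does not reproduce Figure 5.

```python
from typing import List, Tuple

Task = Tuple[float, float]  # (logical cost, absolute deadline); all arrive at t = 0


def simulate(tasks: List[Task], speeds: List[float], policy: str) -> float:
    """Return the failure rate: % of tasks finishing after their deadline."""
    ready = [0.0] * len(speeds)  # time at which each resource becomes free
    missed = 0
    for cost, deadline in tasks:
        finish = [ready[j] + cost / speeds[j] for j in range(len(speeds))]
        if policy == "mct":
            # Minimum Completion Time: ignores the deadline entirely.
            j = min(range(len(speeds)), key=finish.__getitem__)
        else:
            # "tight": latest finish that still meets the deadline, else MCT.
            fits = [k for k in range(len(speeds)) if finish[k] <= deadline]
            j = (max(fits, key=finish.__getitem__) if fits
                 else min(range(len(speeds)), key=finish.__getitem__))
        ready[j] = finish[j]
        missed += finish[j] > deadline
    return 100.0 * missed / len(tasks)
```

With a fast and a slow machine (`speeds=[2.0, 1.0]`) and tasks `[(10, 12), (10, 6)]`, MCT puts the loose-deadline task on the fast machine and then misses the tight one (50% failure rate), while the tight-fit rule meets both (0%).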

Fig. 5. Failure Rate for Greedy and the LMLB Algorithm

Fig. 6. Waiting Time of Tasks with Different Priorities

WPSA is adopted for real-time scheduling. Different deadlines or degrees of importance are distinguished by different weights, so WPSA gives higher priority to more urgent tasks, which are processed first. Suppose tasks are divided into two classes with higher and lower priorities. Figure 6 compares WT1, WT2, and WT, which denote the waiting times of the higher-priority tasks, the lower-priority tasks, and all tasks, respectively. Tasks with higher priorities get better QoS. HRFA is used for best-effort tasks; we compare it with the First Come First Served (FCFS) and Shortest Executing Time First (SETF) algorithms. FCFS considers only Waiting Time and ignores Executing Time, whereas SETF, conversely, considers only Executing Time. HRFA is a compromise between them: it favors short tasks but does not let long tasks wait too long. From Eq. 7, tasks with shorter executing times obtain higher response rates, so HRFA gives special treatment to short tasks. However, a task's response rate grows with its waiting time, so a task that has waited long enough will eventually have the highest response rate and be scheduled immediately.
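The three best-effort policies compared above can be sketched as selection rules over (name, waiting time, estimated executing time) triples. The response rate is computed as (waiting + executing) / executing, as the text defines it; this is the same quantity used by the classic Highest Response Ratio Next policy. The `pick` function and policy labels are illustrative.

```python
from typing import List, Tuple

Job = Tuple[str, float, float]  # (name, waiting time so far, estimated executing time)


def response_rate(wait: float, exec_time: float) -> float:
    # (waiting + executing) / executing = 1 + waiting / executing
    return (wait + exec_time) / exec_time


def pick(jobs: List[Job], policy: str) -> str:
    if policy == "fcfs":    # longest-waiting job first
        key = lambda j: -j[1]
    elif policy == "setf":  # shortest estimated executing time first
        key = lambda j: j[2]
    elif policy == "hrfa":  # highest response rate first
        key = lambda j: -response_rate(j[1], j[2])
    else:
        raise ValueError(f"unknown policy: {policy}")
    return min(jobs, key=key)[0]
```

For example, HRFA prefers a short job over a long one that has waited only 10 s (`pick([("long", 10.0, 60.0), ("short", 1.0, 2.0)], "hrfa")` gives `"short"`), but once the long job has waited 120 s its response rate of 3 beats the short job's 1.5, so it is scheduled.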

6 Conclusions and Future Work
In this paper, we proposed an integrated resource management and scheduling scheme for computational grids. Integrated algorithms aimed at high QoS for grid applications are used, and the corresponding evaluations are presented. The results suggest that future grid applications will compete for resources. Based on this architecture, many factors still need careful study, such as fault tolerance, high utilization of grid resources, and the trade-off between failure rate and cost.

References
1. I. Foster and C. Kesselman (eds.), The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann Publishers, 1998.
2. M. Baker, R. Buyya, and D. Laforenza, "The Grid: International Efforts in Global Computing", Proc. of International Conference on Advances in Infrastructure for Electronic Business, Science, and Education on the Internet, Rome, Italy, 2000.
3. M. Litzkow, M. Livny, and M. W. Mutka, "Condor – A Hunter of Idle Workstations", Proc. of the 8th International Conference of Distributed Computing Systems, June 1988.
4. J. Basney and M. Livny, "Deploying a High Throughput Computing Cluster", High Performance Cluster Computing, Vol. 1, Chapter 5, 1999.
5. Q. Zhao and J. Suzuki, "Efficient quantization of LSF by utilizing dynamic interpolation", Proc. of 1997 IEEE International Symposium on Circuits and Systems, Hong Kong, 1997.
6. F. Ferstl, Job and resource management systems: Architectures and Systems, Vol. 1, pp. 499–518, 1999.
7. I. Foster and C. Kesselman, "Globus: A Metacomputing Infrastructure Toolkit", International Journal of Supercomputer Applications, Vol. 11, No. 2, pp. 115–128, 1997.
8. K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith, and S. Tuecke, "A Resource Management Architecture for Metacomputing Systems", Proc. of the 4th Workshop on Job Scheduling Strategies for Parallel Processing, 1998.
9. A. Grimshaw and W. Wulf, "The Legion Vision of a Worldwide Virtual Computer", Communications of the ACM, Vol. 40, No. 1, 1997.
10. S. Chapin, J. Karpovich, and A. Grimshaw, "The Legion Resource Management System", Proc. of the 5th Workshop on Job Scheduling Strategies for Parallel Processing, 1999.
11. H. Nakada, M. Sato, and S. Sekiguchi, "Design and Implementations of Ninf: towards a Global Computing Infrastructure", Future Generation Computing Systems, Metacomputing Special Issue, 1999.
12. H. Casanova and J. Dongarra, "NetSolve: A Network Server for Solving Computational Science Problems", International Journal of Supercomputing Applications and High Performance Computing, Vol. 11, No. 3, 1997.
13. H. Casanova, M. Kim, J. Plank, and J. Dongarra, "Adaptive Scheduling for Task Farming with Grid Middleware", International Journal of Supercomputer Applications and High Performance Computing, 1999.
14. M. Maheswaran, "Quality of Service Driven Resource Management Algorithms for Network Computing", Proc. of International Conference on Parallel and Distributed Processing Technologies and Applications, 1999.
15. M. Maheswaran, S. Ali, H. Siegel, D. Hensgen, and R. Freund, "Dynamic Mapping of a Class of Independent Tasks onto Heterogeneous Computing Systems", Journal of Parallel and Distributed Computing, Vol. 59, pp. 107–131, 1999.

Multisite Task Scheduling on Distributed Computing Grid
Weizhe Zhang¹, Hongli Zhang¹, Hui He², and Mingzeng Hu¹

¹ School of Computer Science and Technology, Harbin Institute of Technology, P.R. China
{zwz, zhl, mzh}@pact518.hit.edu.cn
http://pact518.hit.edu.cn/index.html

² Network Information Center, Harbin Institute of Technology, P.R. China

Abstract. Multisite task scheduling plays an increasingly important role in grid computing as WANs become faster. By developing a three-level architecture for the distributed computing grid model and a grid scheduling model, we put forward a scalable environment for multisite task scheduling. A multisite Distributed Scheduling Server is then designed and its prototype implemented. A heuristic strategy, the Clustering-based Grid Resource Selection algorithm, is described. Experiments indicate that the scheduler and the algorithm are effective.

1 Introduction
Grid computing refers to the coordinated and secure sharing of computing resources across different administrative domains, aiming to solve large-scale challenging problems such as fluid dynamics, weather modeling, nuclear simulation, and molecular modeling. Currently, computational grids can be classified into distributed computing grids and high-throughput computing grids [1]. A distributed supercomputing grid executes an application in parallel on multiple machines to reduce the completion time of a job, whereas a high-throughput grid increases the completion rate of a stream of jobs. Task scheduling is necessary to achieve shorter running times and higher throughput. Traditionally, task scheduling is defined as the assignment of start and end times to a set of tasks on certain resources, subject to certain constraints. However, a computing grid involves so many resources across multiple administrative domains that resources must be selected carefully to provide the best QoS. Thus, the traditional scheduling model based on static resources cannot satisfy the large-scale, dynamic resource requirements of grid computing. In this paper, a new scheduling model oriented to the distributed computing grid is put forward, in which the resource selection phase plays an important role. Resource selection algorithms can normally be classified into single-site and multisite algorithms. Currently, most scheduler systems adopt single-site resource selection, for example the Matchmaker/ClassAd system of the University of Wisconsin-Madison [2], the Nimrod/G scheduler of Monash University [3], the Silver Grid scheduler of the Supercluster Organization [4], and the Metascheduler of the Poznan Supercomputing and Networking Center [5]. Only the GrADS [6] project matches sets of resources, rather than a single resource, to applications. Multisite resource selection algorithms are scarce because many users fear a significant adverse effect on computation time due to limited bandwidth and high latency in wide-area networks. As WANs become faster, however, the communication overhead may decrease over time. In fact, [7] showed that multisite execution can significantly improve running time, yielding a smaller average response time with about 25% communication overhead. In this paper, we propose an enhanced multisite resource selection algorithm, the CGRS algorithm, based on the distributed computing grid model and the grid scheduling model. The CGRS algorithm integrates a new density-based Internet clustering algorithm into the decoupled scheduling approach of GrADS and reduces its time complexity. The rest of this paper is organized as follows. Our scheduling model is discussed in the next section. Section 3 presents the design of our scheduler. The resource selection algorithm and the experiments are presented and discussed in Sections 4 and 5, respectively. The paper ends with a brief conclusion.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 57–64, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Models

2.1 Distributed Computing Grid Model
The Distributed Computing Grid Model (DCGM) adopts a three-level architecture, as shown in Fig. 1: the top level consists of Grid Information Servers (GIS) and Grid Meta Scheduling Servers (GMSS); the second level has several domains, each with a Grid Distributed Scheduling Server (GDSS); and the third level comprises all kinds of Grid Computing Resources (GCR) and Grid User Groups (GUG).

Grid Information Servers (GIS) are an essential part of any grid software infrastructure, providing fundamental mechanisms for discovery and monitoring [8]. Each domain is controlled by at least one GIS, which dynamically collects information about the resources registered to it and spreads this information to other GISs. A GIS receives grid information requests sent by GMSSs, GDSSs, and GUGs and returns the satisfying resource aggregate to the requester.

Grid Meta Scheduling Servers (GMSS) focus on harmonizing the scheduling of different GDSSs. The goal of the GMSS is to prevent a GDSS from assuming that the others are absent when two or more applications are submitted simultaneously. Every GMSS accepts meta-scheduling requests from the GDSSs and cooperates with other GMSSs to increase system throughput. Much work has been done on GMSS policies; at the current stage, we focus only on GDSS scheduling policies.

Grid Distributed Scheduling Servers (GDSS) are the key component of the architecture. They administer the efficient use of registered resources and map grand-challenge applications onto the selected resource aggregate. When a Grid User Group (GUG) submits a job to a GDSS, the GDSS contacts the GIS to gather information about the grid. The GDSS then uses its decision module for scheduling and dispatches the meta-scheduling request to the GMSSs.

Grid Computing Resources (GCR) are non-dedicated workstations or personal computers, which may be homogeneous or heterogeneous. Every GCR registered to a GIS can be the target of task mapping performed by a GDSS or GMSS.

Grid User Groups (GUG) bring challenge problems such as fluid dynamics, weather modeling, and nuclear simulation. A GUG interacts with the grid environment through the Grid Portal together with the GDSS.

Fig. 1. Distributed computing grid model

2.2 Scheduling Model
A distributed computing grid mainly targets specific grand-challenge applications that run for hours, days, weeks, or months, whereas a high-throughput grid targets a stream of tasks. Thus the scheduling goal of our DCGM is not to maximize system utilization but to reduce turnaround time. The GDSSs make best-effort static scheduling decisions using predictable performance models of specific applications and submit the job to the selected resources. The scheduling model is formally defined as a seven-tuple

$$\Sigma = (R, A, T, P, M, \Phi, \Psi)$$

where the components are as follows:
1. $R$ is a finite, nonempty set of non-dedicated, heterogeneous GCRs.
2. $A = (I, N, H)$ represents the application requirement model, where $I$ is a finite, nonempty set of application information, and $N$ and $H$ are finite sets of the GUG's minimal network and host requirements, respectively, which provide the basic QoS guarantee.
3. $T$ is a finite, nonempty set of arbitrarily divisible grand-challenge tasks.
4. $P$ is a finite, nonempty set of performance models determined by the types of tasks in $T$.
5. $M$ is a nonempty set of mapping strategies that assign tasks in $T$ to resources in $R$ with start times in $S$, where $S$ denotes the set of task start times.
6. $\Phi$ is a function that filters out the resources that do not meet the GUG's minimal job requirements, reducing the resource set for the GDSS.
7. $\Psi$ is a function that determines the best-fit resource (or set of best resources) to which a job is submitted; it is the core process of the GDSS.
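The filtering and best-fit selection functions of the model can be sketched in Python. The `GCR` and `Requirement` fields (CPU speed, memory, bandwidth) are hypothetical stand-ins for the host and network requirement sets, and `predict` stands in for an application-specific performance model; for brevity the selection returns a single resource rather than a resource set.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class GCR:
    name: str
    cpu_mhz: float
    mem_mb: float
    bandwidth_mbps: float  # to the submitting GUG


@dataclass
class Requirement:  # hypothetical minimal host/network requirements
    min_cpu_mhz: float
    min_mem_mb: float
    min_bandwidth_mbps: float


def filter_resources(rs: List[GCR], req: Requirement) -> List[GCR]:
    """Filtering function: drop GCRs that violate the minimal requirements."""
    return [r for r in rs
            if r.cpu_mhz >= req.min_cpu_mhz
            and r.mem_mb >= req.min_mem_mb
            and r.bandwidth_mbps >= req.min_bandwidth_mbps]


def best_fit(rs: List[GCR], predict: Callable[[GCR], float]) -> Optional[GCR]:
    """Selection function: the GCR with the smallest predicted run time."""
    return min(rs, key=predict, default=None)
```

Filtering first keeps the more expensive performance-model evaluation confined to resources that can run the job at all.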

3 The GDSS Design
Combining the scheduling model of the DCG with the general architecture presented in [8], we begin with a framework for a single GDSS node. The framework, shown in Figure 2, gives a broad overview of the work required to build a generic unit.

Fig. 2. The framework of the Grid Distributed Scheduling Server

Scheduling on a GDSS proceeds in three main phases. Phase One is resource filtering, which involves the distributed computing grid portal, an XML parser, a job priority queue, and a resource filter. To enable resource filtering, users specify a task description and a minimal set of job requirements through the Web portal, which creates an XML document that is parsed by our DOM parser. The job priority queue then determines the priority of the job, and the resource filter removes unsuitable resources using information from the GIS. At the end of Phase One, a list of potential resources is generated. Phase Two maps tasks and selects the best resource set. The predictable information collector gathers detailed information from the GIS; our system adopts information providers based on Globus and NWS to support dynamic information collection. The job scheduling decision module, the key component of the GDSS, determines the best-fit resource (or resource set) as a meta request. Its efficiency is directly determined by the best-fit resource selection algorithm; our resource selection algorithm based on grid clustering is explained in the next section. The meta request is then sent to the GMSSs, and the cooperating resource set is fed back after the GMSSs negotiate a compromise when different GUGs request the same GCRs at the same time. At the end of Phase Two, a set of cooperating resources is generated. In Phase Three the job is executed; this phase includes a file dispatcher and a result retriever. We use the GridFTP and GRAM services of Globus to implement remote job submission and remote compilation. Finally, the result is retrieved and displayed on the Web portal using the Virtual Reality Modeling Language (VRML).
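The first steps of Phase One (parsing the portal's XML job description with a DOM parser and queuing jobs by priority) can be sketched with Python's standard `xml.dom.minidom` and `heapq` modules. The XML element and attribute names are invented for illustration and are not the system's actual schema; treating a lower number as higher priority is also an assumption.

```python
import heapq
from xml.dom import minidom

# Hypothetical job description as a Web portal might emit it.
JOB_XML = """<job id="cfd-01" priority="2">
  <requirement cpu_mhz="1500" mem_mb="512" bandwidth_mbps="20"/>
</job>"""


def parse_job(xml_text: str) -> dict:
    doc = minidom.parseString(xml_text)
    job = doc.documentElement
    req = job.getElementsByTagName("requirement")[0]
    return {
        "id": job.getAttribute("id"),
        "priority": int(job.getAttribute("priority")),
        "requirement": {k: float(v) for k, v in req.attributes.items()},
    }


class JobPriorityQueue:
    """Lower number = higher priority (assumed); FIFO among equal priorities."""

    def __init__(self) -> None:
        self._heap = []
        self._seq = 0  # tie-breaker so job dicts are never compared

    def push(self, job: dict) -> None:
        heapq.heappush(self._heap, (job["priority"], self._seq, job))
        self._seq += 1

    def pop(self) -> dict:
        return heapq.heappop(self._heap)[2]
```

The parsed `requirement` dictionary is what a resource filter would then compare against the GIS information.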

4 A Resource Selection Algorithm
The resource selection algorithm is at the core of the job scheduling decision module and must be able to integrate multisite computation power. Our Clustering-based Grid Resource Selection (CGRS) algorithm clusters the set of available resources, generates candidate schedules for all the subsets in each cluster, and evaluates the candidate schedules to select a final schedule. Pseudo-code for our multisite search procedure is given in Figure 3. The first method called by the procedure is GCRClustering(), which clusters the available GCRs into disjoint subsets such that the network delays within each subset are lower than the network delays between subsets. Because GCR clustering is the basis of the CGRS algorithm, we designed a sophisticated clustering algorithm based on a data mining [10] method, which is described in the following section. The other core method is MapAndPredict(), which uses the performance model and mapping strategy of a specific application to predict its execution time. Because the predicted execution time directly determines whether the best schedule is chosen correctly, the performance model and mapping strategy play an important role.

Fig. 3. Clustering-based Grid Resource Selection (CGRS) algorithm
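The search procedure described above can be sketched in Python as follows. It takes clusters that a clustering step (such as GCRClustering) has already produced, builds candidate resource subsets inside each cluster, scores each with a stand-in for MapAndPredict, and keeps the subset with the lowest predicted time. The names `cgrs`, `score` and `predict` are hypothetical. As a further assumption, the candidates are the prefixes of each cluster ranked by a desirability score, as in GrADS-style searches, which keeps the number of candidates linear in the number of GCRs; enumerating literally every subset would be exponential.

```python
def cgrs(clusters, score, predict):
    """Clustering-based resource selection (illustrative sketch).

    clusters: list of lists of resource ids, e.g. from a clustering step.
    score: hypothetical per-resource desirability (higher is better).
    predict: hypothetical stand-in for MapAndPredict; maps a tuple of
             resources to a predicted execution time.
    Returns (best_subset, predicted_time).
    """
    best, best_t = None, float("inf")
    for cluster in clusters:
        ranked = sorted(cluster, key=score, reverse=True)
        # Candidate schedules: the k most desirable resources, k = 1..|cluster|.
        for k in range(1, len(ranked) + 1):
            candidate = tuple(ranked[:k])
            t = predict(candidate)
            if t < best_t:
                best, best_t = candidate, t
    return best, best_t


# Usage with made-up data: work shrinks with aggregate speed, while each
# extra resource adds a fixed coordination overhead.
speed = {"a": 4, "b": 3, "c": 1, "d": 5, "e": 1}
clusters = [["a", "b", "c"], ["d", "e"]]
best, t = cgrs(clusters, score=speed.get,
               predict=lambda s: 100 / sum(speed[r] for r in s) + 2 * len(s))
```

With the clustering step run separately, the loop calls `predict` once per resource, which matches the linear behaviour claimed below apart from the per-cluster sort and the cost of `predict` itself.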


W. Zhang et al.

Our CGRS methodology is similar to the GrADS approach in that it decouples the performance model and uses a multi-site selection algorithm to improve application performance. However, the CGRS algorithm outperforms the schedule search procedure of GrADS in two respects. First, the CGRS algorithm uses a more sophisticated clustering algorithm than the method based on the Internet Domain Name Service (DNS) adopted by GrADS. Clustering by DNS domain is imprecise, because hosts in the same domain are not necessarily close to each other in terms of network latency. The CGRS algorithm instead adopts a new clustering method based on data mining [9], which gives resource selection a firmer theoretical foundation. Second, if the GCRClustering method is decoupled and implemented as a separate module that runs periodically, the CGRS algorithm runs in O(n) time, where n is the number of available GCRs. The time complexity of GrADS depends on both n and s, where n and s represent the number of available GCRs and the number of clusters, respectively.

4.1 Density-Based GCR Clustering (DGC) Algorithm

The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering. GCR clustering serves as a pivotal preprocessing step for the CGRS algorithm: its purpose is to identify clusters with low intra-cluster network latency so as to enable coarse-grain parallelism. We propose a density-based GCR clustering technique in place of the traditional DNS-based clustering method, which leads to more static clusters. The DGC algorithm relies on the following definitions. The neighborhood within a given radius of an edge is called the neighborhood of that edge, where the radius is measured in network latency or bandwidth. If the neighborhood of an edge contains at least a minimum number, MinPts, of edges, the edge is called a core edge. Given a set of edges D, an edge p is directly density-reachable from an edge q if p is within the neighborhood of q. The DGC algorithm then proceeds as follows:
1. All directly density-reachable edges of every edge in the AvailableGCRPool are found and stored in an adjacency list.
2. The DGC algorithm determines the core edges among the low-latency (or high-bandwidth) edges.
3. Finally, it iteratively collects the directly density-reachable edges from these core edges, which may involve merging a few density-reachable clusters. The process terminates when no new edge can be added to any cluster.
The time complexity of the DGC algorithm is a function of e, the number of edges.
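The three DGC steps can be sketched in Python as a DBSCAN-style procedure over edges. This is an illustration, not the authors' implementation. It assumes, following the experimental setting in Section 5, that an edge is low-latency when its latency does not exceed a threshold (playing the role of the parameter D) and that two low-latency edges are neighbours when they share an endpoint; the function name `dgc` and its parameters are hypothetical.

```python
from collections import defaultdict, deque


def dgc(edges, latency, threshold, min_pts=1):
    """DBSCAN-style clustering of network edges (illustrative sketch).

    edges: list of (u, v) node pairs; latency: dict edge -> latency.
    Assumption: an edge is low-latency if latency <= threshold, and two
    low-latency edges are neighbours if they share an endpoint.
    """
    low = [e for e in edges if latency[e] <= threshold]
    # Step 1: adjacency list of directly density-reachable edges.
    by_node = defaultdict(list)
    for e in low:
        for node in e:
            by_node[node].append(e)
    neigh = {e: {f for node in e for f in by_node[node] if f != e} for e in low}
    # Step 2: core edges have at least min_pts neighbours.
    core = {e for e in low if len(neigh[e]) >= min_pts}
    # Step 3: grow clusters from core edges until nothing can be added.
    label, clusters = {}, []
    for seed in low:
        if seed not in core or seed in label:
            continue
        cid = len(clusters)
        clusters.append([])
        label[seed] = cid
        queue = deque([seed])
        while queue:
            e = queue.popleft()
            clusters[cid].append(e)
            if e not in core:
                continue  # border edge: joins the cluster but is not expanded
            for f in neigh[e]:
                if f not in label:
                    label[f] = cid
                    queue.append(f)
    return clusters


# Usage: AS-level low-latency edges from Section 5 plus two hypothetical
# high-latency links joining the groups.
low_edges = [(1, 2), (2, 3), (3, 4), (4, 1), (5, 6), (6, 7), (7, 8), (8, 5),
             (9, 10), (9, 11), (9, 12)]
high_edges = [(4, 5), (8, 9)]
lat = {e: 1.0 for e in low_edges}
lat.update({e: 10.0 for e in high_edges})
clusters = dgc(low_edges + high_edges, lat, threshold=5.0)
```

On these edges the sketch returns the same three AS-level clusters reported in Section 5. Because clusters are grown breadth-first from core edges, density-connected groups end up in one cluster directly, so no separate merge pass is needed.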

Multisite Task Scheduling on Distributed Computing Grid


5 Experiments

The efficiency of the decoupled multi-site scheduling methodology itself has been demonstrated in [6]. In this section, we present validation results for the Density-based GCR Clustering (DGC) algorithm developed in Section 4.1. We have implemented the DGC algorithm on the tree network topology, the graph network topology and the AS-level network topology shown in Figure 4. In each topology, the edges of the graph are classified into two categories, low-latency and high-latency, according to a user-defined parameter D. Low-latency edges are the edges between two black nodes; all other edges are high-latency edges. In the experiment, we set both MinPts and the radius to 1, which means that a core edge is a low-latency edge that is directly connected to at least one other low-latency edge. In the tree network topology, the DGC algorithm produces three clusters, {(1,2)(2,3)(2,9)(3,4)(4,5)}, {(11,12)(11,13)(11,15)(13,14)} and {(18,19)(19,20)(19,21)}, and it yields 9 fewer resource selection combinations than the DNS-based method. In the graph network topology, only one cluster, {(3,6)(3,7)(5,8)(6,8)(7,12)}, is obtained, and 7 resource selection combinations are eliminated. In the AS-level network topology, the low-latency edges are {(1,2)(2,3)(3,4)(4,1)(5,6)(6,7)(7,8)(8,5)(9,10)(9,11)(9,12)}, and clustering yields three clusters: {(1,2)(2,3)(3,4)(4,1)}, {(5,6)(6,7)(7,8)(8,5)} and {(9,10)(9,11)(9,12)}.

Fig. 4. The tree, graph network and AS-level network topology

The above results indicate that our clustering strategy is correct and effective for clustering grid computing resources. It avoids the negative impact of high-latency edges, and it reduces both the number of members in each resource aggregate and the number of resource combinations that the CGRS algorithm must evaluate.

6 Conclusion

In this paper, we propose a three-level architecture, the Distributed Computing Grid Model (DCGM), which serves as the infrastructure of the multisite scheduling environment. The design of the key component of this environment, the Distributed Scheduling Server (GDSS), is discussed in detail.


We also focus on the multisite resource selection algorithm of GDSS and describe a heuristic strategy, the Clustering-based Grid Resource Selection (CGRS) algorithm. Within CGRS, we mainly introduce the Density-based GCR Clustering (DGC) algorithm, which clusters the resources in the distributed computing grid, and combine it with the decoupled scheduling approach. The next step in this research is to quantify the benefit of the CGRS algorithm precisely with real Internet applications in a real distributed computing grid environment. We also plan to bring more classes of parallel applications, such as loosely synchronous and embarrassingly parallel applications, into our scheduling scheme.

References
1. I. Foster and C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, San Francisco, CA, 1999.
2. R. Raman, M. Livny and M. Solomon, Matchmaking: distributed resource management for high throughput computing, Proceedings of the Seventh International Symposium on High Performance Distributed Computing, 28-31 July 1998, pp. 140-146.
3. R. Buyya, D. Abramson and J. Giddy, Nimrod/G: an architecture for a resource management and scheduling system in a global computational grid, Proceedings of the Fourth International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region, 14-17 May 2000, Vol. 1, pp. 283-289.
4. Silver Design Overview, http://supercluster.org/projects/silver/designoverview.html
5. Krzysztof Kurowski, Jarek Nabrzyski and Juliusz Pukacki, User Preference Driven Multiobjective Resource Management in Grid Environments, Proceedings of CCGrid 2001, May 2001.
6. H. Dail, F. Berman and H. Casanova, A Decoupled Scheduling Approach for the GrADS Program Development Environment, to appear in Journal of Parallel and Distributed Computing (JPDC), 2003.
7. Carsten Ernemann, Volker Hamscher, Uwe Schwiegelshohn, Ramin Yahyapour and Achim Streit, On Advantages of Grid Computing for Parallel Job Scheduling, 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid'02), May 21-24, 2002, Berlin, Germany.
8. J. Schopf, A General Architecture for Scheduling on the Grid, submitted to special issue of JPDC on Grid Computing, 2002.
9. Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.

Adaptive Job Scheduling for a Service Grid Using a Genetic Algorithm

Yang Gao1, Hongqiang Rong2, Frank Tong2, Zongwei Luo2, and Joshua Huang2

1 National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China

2 E-Business Technology Institute, The University of Hong Kong, Hong Kong, China
{hrong,ftong,zwluo,jhuang}@eti.hku.hk

Abstract. This paper presents a new approach to scheduling jobs on a service Grid using a genetic algorithm (GA). A fitness function is defined to minimize the average execution time of N jobs scheduled onto the machines of the Grid. Two models are proposed to predict the execution time of a single job or of multiple jobs on each machine under varying system load. The single service type model schedules jobs of a single service to a machine, while the multiple service types model schedules jobs of multiple services to a machine. The execution times predicted by these models are used as input to the genetic algorithm, which schedules N jobs to M machines on the Grid. Experiments on a small Grid of four machines show that the new job scheduling approach significantly reduces the average execution time.

1 Introduction

One of the challenges for a service Grid is to efficiently process large numbers of user requests to Grid services. This is essentially a problem of optimally allocating Grid resources, through effective job scheduling, so that the requested services complete within a given time slot. Given N jobs submitted at a given time to a service Grid that has M machines to execute these jobs in parallel, the optimal job scheduling strategy minimizes the execution time of these jobs subject to a given cost. Job scheduling problems on a service Grid can be divided into three levels. System-level scheduling deals with the problem of assigning a single job to the one of M machines on the Grid that can finish it in the shortest time [1][6][4]. Application-level scheduling deals with the problem of scheduling N various jobs

1 The work was conducted while the author was visiting the E-Business Technology Institute of The University of Hong Kong, with the support of the IBM China Scholar Visitorship Program, the Natural Science Foundation of P.R. China (No. 60103012) and the National Grand Fundamental Research 973 Program of China (No. 2002CB312002).

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 65–72, 2004. © Springer-Verlag Berlin Heidelberg 2004


Y. Gao et al.

that are submitted to the Grid in the same time slot to the M machines of the Grid [3]. Grid-level scheduling deals with the case in which N jobs are submitted to the Grid in the same time slot but the Grid lacks the resources to complete them within that slot. In this situation, the Grid-level scheduling system needs to find other Grids to execute some of the jobs. Unlike job scheduling in parallel and cluster computing, which usually uses a static load model or a performance model estimated from experience data [8], Grid job scheduling uses a dynamic model to predict job execution time, because of the heterogeneity of computing resources and the dynamics of machine load [7]. The traditional approach to job scheduling is to first model the available computing resources, then determine the system load, and finally estimate the jobs' execution times. Applying this approach directly to Grid job scheduling often results in poor performance because of the special characteristics of the Grid environment [2]. In addition, job scheduling in parallel and cluster computing emphasizes the performance and load balance of the whole system, while in Grid computing different scheduling policies and algorithms are required for different kinds of tasks [5]. In this paper, we present an approach that uses a genetic algorithm (GA) to minimize the average execution time of N jobs scheduled on the heterogeneous machines of a service Grid. To solve this job scheduling problem, we first develop a model to predict the execution time of a single service (job) on different machines under varying system load. We then extend the single service model to a multiple service types model, which handles situations in which jobs of different types arrive at each machine in sequence.
With the multiple service types model, we define an objective function to evaluate the scheduling of N services of different types to M machines with different loads, and we use a genetic algorithm to find a solution that minimizes the objective function. We conducted simulated experiments on a small-scale Grid of four machines with different capacities and operating systems. The experimental results show that our approach significantly reduces the average execution time compared with the random allocation and average allocation methods.

2 Adaptive Models for Predicting Job Execution Times

2.1 Single Service Type Model

We first develop a model to predict the execution time of a single service, or job, on each machine of the Grid. The major factors that affect the execution time include the machine's capacity, the complexity of the algorithm behind the service, and the size of the data involved. A machine's capacity changes dynamically with the system load. Because of these load dynamics and many other unknowns, it is impossible to predict the precise execution time of a job on a machine. We can only estimate the likely execution time of a job from historical experience and use this prediction to schedule the job. In a service Grid,


the computational performance of each service is likely to have been tested on each machine before the service is published. Therefore, the computational performance of each service, and its ability to process data of different sizes, can be known in advance or learned incrementally from the historical data of using the service. The difficult part is dealing with the dynamic system load, which affects the execution performance of the service to be submitted. To simplify this problem, we first assume that only one service is submitted to the Grid in a time slot. Our objective is to optimize the performance of the whole system regardless of whether the system load is light or heavy. To increase the system's throughput, the scheduling system must maximize the number of jobs completed within a time slot. To this end, we use the following model to predict the execution time of a single job on a machine.

Here, the model gives the predicted execution time of a new job on a machine in terms of the number of times that jobs of the same service have been executed on that machine, the actual execution time of the same service on the same machine, and a learning rate. The historical values of these quantities are stored on the machine, and the remaining parameter can be obtained by experiments. If jobs of the same service are already running on the machine, the actual execution time of the new job is affected by the resulting system load. In this case we use the following model to predict the execution time of the job.

Here, the left-hand side is the predicted execution time of a job on a machine that still has jobs of the same service running; the two terms on the right-hand side are the most recent historical values of the expected execution time and the actual execution time, respectively, and the remaining parameter is the learning rate. Because of the dynamics of the system load, the actual execution time under a given load is difficult to obtain. However, ratios of these execution times can be estimated. Using these ratios, the execution time of the job on the machine can be predicted by

where

and, as more runs are observed, these quantities approach constants if the Grid system is stable. Figure 1 illustrates the process of calculating the predicted execution time of a job on a machine. When the first job is submitted to the Grid, its predicted


Fig. 1. Predicting the execution time of a job on a machine when earlier jobs are still running

execution time on the machine is calculated by (1). If the second job arrives while the first job is still running on the same machine, the predicted execution time of the second job on that machine is calculated by (3). Similarly, we can calculate the predicted execution time of any later job on the machine while the previously submitted jobs are still running on it.
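The following Python sketch shows one plausible reading of the single service type model, not the paper's exact formulas. It assumes (a) that the idle-machine estimate is updated by incremental averaging with a learning rate, and (b) that the prediction for a job arriving while k jobs of the same service are running is the idle estimate multiplied by a chain of estimated load ratios, one per running job. The class `SingleServicePredictor` and all of its parameters are hypothetical.

```python
class SingleServicePredictor:
    """Per-(service, machine) execution-time prediction (illustrative sketch).

    Assumptions, not the paper's exact equations:
      * the idle-machine estimate is updated by incremental averaging;
      * ratios[k] ~ (time with k jobs already running) / (time with k - 1),
        so the prediction under load chains these ratios.
    """

    def __init__(self, initial_time, alpha=None):
        self.base = initial_time  # predicted time on an idle machine
        self.n = 0                # completed idle runs observed so far
        self.alpha = alpha        # None -> rate 1/n, i.e. a running mean
        self.ratios = {}          # k -> estimated load ratio

    def observe_idle_run(self, actual):
        """Move the idle estimate toward an observed execution time."""
        self.n += 1
        rate = self.alpha if self.alpha is not None else 1.0 / self.n
        self.base += rate * (actual - self.base)

    def observe_ratio(self, k, ratio, rate=0.5):
        """Smooth the estimated ratio for k already-running jobs."""
        old = self.ratios.get(k, ratio)
        self.ratios[k] = old + rate * (ratio - old)

    def predict(self, running):
        """Predicted time when `running` jobs of this service are active."""
        t = self.base
        for k in range(1, running + 1):
            t *= self.ratios.get(k, 1.0)  # unknown ratio: assume no slowdown
        return t


p = SingleServicePredictor(initial_time=10.0)
p.observe_idle_run(12.0)   # first observation replaces the initial guess
p.observe_idle_run(8.0)    # running mean of 12 and 8 gives 10
p.observe_ratio(1, 1.5)    # one job already running: 1.5x slower
p.observe_ratio(2, 1.2)    # two jobs already running: a further 1.2x
idle, loaded = p.predict(0), p.predict(2)
```

Storing ratios rather than absolute times is what lets the estimate adapt when the idle-machine time drifts, since the load factors are reused on top of the updated base.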

2.2 Multiple Service Types Model

To predict the execution time of jobs of multiple service types, we use the following model.

where

The superscript indicates the service type. The difference between (6) and (3) is that two consecutive jobs can belong to different services, as shown in (7) and (8). Figure 2 shows how the predicted execution time of a job is calculated. Because the predicted execution times of all preceding jobs are known, the scheduling path is known for a particular sequence of jobs, e.g., the light line in Figure 2. However, unlike in the single service type model, the predicted execution time depends on the scheduling path: different scheduling paths give different execution times because the services along each path differ. Therefore, in our implementation we store ratios on each machine rather than individual execution times. We can observe from Figure 2 that if there are S services of different types on a machine and the maximal number of concurrent jobs is N, the total number of stored connection weights is a function of S and N. Storing these weights in the scheduling system is feasible because the number of services on a machine is limited. To implement this scheduling algorithm, the Grid initializes all model parameters from experiments. When a new job arrives, the Grid predicts the execution time on each machine, sorts these predictions, and selects


a machine with the minimal predicted execution time to execute the job. When the job ends, the scheduling system records its actual execution time and adjusts the prediction values. When another job arrives, the scheduling system schedules it based on these updated prediction values. The computational complexity of this system-level scheduling algorithm is O(M), where M is the number of machines.
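The system-level scheduling loop can be sketched in Python as below. The load model `base[j] * slowdown[j] ** r` is a hypothetical stand-in for the learned ratios, and, to keep the example short, jobs are assumed not to finish within the batch. Selecting the minimum with one linear scan is what gives O(M) per job; fully sorting the M predictions would cost O(M log M).

```python
def dispatch(n_jobs, base, slowdown):
    """Assign jobs one at a time to the machine with the smallest prediction.

    Hypothetical load model: a job on machine j with r jobs already running
    is predicted to take base[j] * slowdown[j] ** r seconds.
    Returns a list of (job, machine, predicted_time).
    """
    running = [0] * len(base)
    plan = []
    for job in range(n_jobs):
        preds = [b * s ** r for b, s, r in zip(base, slowdown, running)]
        j = min(range(len(preds)), key=preds.__getitem__)  # one O(M) scan
        running[j] += 1  # simplification: no job finishes within the batch
        plan.append((job, j, preds[j]))
    return plan


plan = dispatch(6, base=[4.0, 6.0], slowdown=[2.0, 1.1])
```

Here the slower but less load-sensitive machine 1 absorbs four of the six jobs before its prediction overtakes machine 0.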

3 A GA Approach to Application-Level Scheduling

Application-level scheduling deals with the problem of scheduling N jobs of different service types to the M machines of the Grid. To solve this problem, we use the execution time prediction models to calculate the predicted execution time of these jobs on each machine, and we find an optimal allocation of the jobs to the M machines by minimizing

subject to

where the terms denote the average execution time of the jobs on each machine, N, the total number of jobs submitted to the Grid in a given time slot, the maximal number of jobs allowed on each machine, and the current, private load on each machine. The execution time of each machine for different numbers of jobs can be obtained from experiments or from historical results of running different numbers of jobs on that machine; Figure 3 shows three examples. Essentially, each curve is the accumulative distribution function of (3), and interpolation and extrapolation can be used to obtain a particular value. Other per-machine parameters are also determined by experiments. (9) can be optimized using a genetic algorithm, and Table 1 shows the pseudo code of the algorithm. First, the Grid gatekeeper sends requests to all machines, asking whether they can process the new jobs. Each machine checks its load against its current performance and returns this information to the gatekeeper. Each machine that agrees to receive the new jobs returns its current load status, such as its current number of jobs. The Grid scheduling system then produces an initial population of job assignments, evaluates each individual's fitness, copies the individuals with higher fitness, performs crossover and mutation, obtains the new population, and evaluates the fitness of each individual in the new population. Finally,


Fig. 2. Predicting the execution time of a job when jobs of different service types are running on the machine.

Fig. 3. The performance curves of these machines running different numbers of jobs.

the Grid obtains a near-optimal job assignment and assigns the jobs to the machines. Each machine then receives its new jobs, which are scheduled by its operating system. In this algorithm, the population is indexed by the iteration step.
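The GA loop described above can be sketched in Python under simplifying assumptions: an individual assigns each job to a machine, the cost is the average predicted execution time over all jobs, and assignments that exceed a machine's job limit get infinite cost. The per-machine curves `curves[j](n)` are hypothetical linear stand-ins for the measured performance curves of Figure 3, and truncation selection, one-point crossover and per-gene mutation are one common choice of operators, not necessarily the authors'.

```python
import random


def avg_time(assign, curves):
    """Average predicted time per job; curves[j](n) = time per job when
    n jobs share machine j (hypothetical stand-in for measured curves)."""
    counts = [0] * len(curves)
    for j in assign:
        counts[j] += 1
    return sum(n * curves[j](n) for j, n in enumerate(counts) if n) / len(assign)


def feasible(assign, caps):
    counts = [0] * len(caps)
    for j in assign:
        counts[j] += 1
    return all(c <= cap for c, cap in zip(counts, caps))


def cost(assign, curves, caps):
    # Infeasible assignments (a machine over its job limit) cost infinity.
    return avg_time(assign, curves) if feasible(assign, caps) else float("inf")


def ga_schedule(n_jobs, curves, caps, pop_size=30, generations=50,
                p_mut=0.05, seed=0):
    rng = random.Random(seed)
    m = len(curves)
    # Initial population: random feasible assignments drawn from job slots.
    slots = [j for j in range(m) for _ in range(caps[j])]
    pop = [rng.sample(slots, n_jobs) for _ in range(pop_size)]
    best = min(pop, key=lambda a: cost(a, curves, caps))
    history = [cost(best, curves, caps)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda a: cost(a, curves, caps))
        parents = ranked[: pop_size // 2]        # truncation selection
        children = [ranked[0][:]]                # elitism: keep the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_jobs)       # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n_jobs):              # per-gene mutation
                if rng.random() < p_mut:
                    child[i] = rng.randrange(m)
            children.append(child)
        pop = children
        cand = min(pop, key=lambda a: cost(a, curves, caps))
        if cost(cand, curves, caps) < history[-1]:
            best = cand
        history.append(cost(best, curves, caps))
    return best, history


curves = [lambda n: 2.0 + 0.5 * n, lambda n: 3.0 + 0.2 * n,
          lambda n: 1.5 + 1.0 * n, lambda n: 4.0 + 0.1 * n]
caps = [10, 10, 10, 10]
best, hist = ga_schedule(20, curves, caps)
```

Because the best individual is carried over unchanged (elitism), the recorded best cost never increases from one generation to the next.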

4 Experimental Results and Analysis

We conducted two experiments to test our models on a small-scale Grid of 4 machines. In these experiments, only one service, which calculates the Discrete Fourier Transform (DFT), was used. The scheduling time slot was set to one second. In the first experiment, a request to the DFT service to transform 100 points was sent to the Grid in each time slot, and 1000 requests for the same service were sent consecutively. The Grid system-level scheduling model was used to schedule these 1000 jobs. Figure 4 shows the results of scheduling the jobs randomly and by using the multiple service types model. One can clearly see


the reduction of the average execution time achieved by the prediction model. For example, within 1000 one-second time slots, the Grid finished more than 800 jobs scheduled by the prediction model, but could only complete 400 jobs with random scheduling. Figure 5 shows the distribution of the difference between the predicted and the actual execution times of the 1000 jobs. Although a systematic error of -3.8 s is present, the model predicts the execution time with a small variance (4.95 s). This prediction error is acceptable in many applications, even for interactive analysis.
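The two statistics quoted above can be computed as in the following sketch, assuming the difference is taken as predicted minus actual time, so that a negative mean means the model tends to under-predict. The function name and the sample numbers are illustrative only and are not the experimental data.

```python
from statistics import mean, pvariance


def prediction_error_stats(predicted, actual):
    """Return (systematic error, population variance) of predicted - actual."""
    errors = [p - a for p, a in zip(predicted, actual)]
    return mean(errors), pvariance(errors)


bias, var = prediction_error_stats([10.0, 12.0, 9.0, 11.0],
                                   [13.0, 15.0, 13.0, 15.0])
```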

Fig. 4. Comparison of random scheduling and model scheduling.

Fig. 5. Distribution of the difference between the predicted execution time and the actual execution time.

Fig. 6. Comparison of GA scheduling with Random scheduling and average scheduling.

Fig. 7. Performance of the GA scheduling algorithm.

In the second experiment, we conducted 20 tests, changing the number of jobs submitted to the Grid in each time slot from test to test; the number of jobs increased from 20 to 100. Figure 6 shows the average execution time of jobs scheduled by the GA approach, by average scheduling and by random


scheduling. The GA approach performed best and random scheduling performed worst. The performance of average scheduling is close to that of the GA approach when the number of jobs is small, but the GA approach shows a clear advantage as the number of jobs grows. For example, with 80 jobs the average execution time under average scheduling was 20 seconds longer than under the GA approach. Figure 7 shows the predicted execution time against the number of iterations of the genetic algorithm. Our experiments showed that a near-optimal schedule can be found within 50 iterations.

5 Conclusions

In this paper, we have presented two models that predict the execution time of a job on a given machine under varying system load. Based on these prediction models, we have developed a genetic algorithm approach to scheduling N jobs of different services to M machines on a Grid. Our experimental results show that the GA approach reduces the average execution time of N jobs on the Grid compared with some naive scheduling methods. Our experiments also demonstrate that the prediction models can predict the execution time accurately.

References
1. The Globus grid project. http://www.globus.org.
2. Miguel L. Bote-Lorenzo, Yannis A. Dimitriadis, and Eduardo Gomez-Sanchez. Grid characteristics and uses: a grid definition. In Proceedings of the First European Across Grids Conference, February 2003.
3. Henri Casanova, MyungHo Kim, James S. Plank, and Jack J. Dongarra. Adaptive scheduling for task farming with grid middleware. The International Journal of High Performance Computing Applications, 13(3):231–240, Fall 1999.
4. Steve J. Chapin, D. Katramatos, J. Karpovich, and A. Grimshaw. Resource management in Legion. Future Generation Computer Systems, 15(5–6):583–594, 1999.
5. Klaus Krauter, Rajkumar Buyya, and Muthucumaru Maheswaran. A taxonomy and survey of grid resource management systems for distributed computing. Software: Practice and Experience, 32(2):135–164, 2002.
6. Rajesh Raman, Miron Livny, and Marvin Solomon. Matchmaking: distributed resource management for high throughput computing. In Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, Chicago, IL, July 1998.
7. D. Thain, T. Tannenbaum, and Miron Livny. Condor and the grid. In Anthony J.G. Hey, Fran Berman, and Geoffrey C. Fox, editors, Grid Computing: Making the Global Infrastructure a Reality, chapter 11, pages 299–335. Wiley, West Sussex, England, 2003.
8. Y. Zhang, H. Franke, J. E. Moreira, and A. Sivasubramaniam. An integrated approach to parallel scheduling using gang-scheduling, backfilling, and migration. In D.G. Feitelson and L. Rudolph, editors, JSSPP 2001, Lecture Notes in Computer Science, pages 133–158. Springer-Verlag, Berlin Heidelberg.

Resource Scheduling Algorithms for Grid Computing and Its Modeling and Analysis Using Petri Net*

Yaojun Han1,2,3, Changjun Jiang1, You Fu1,2, and Xuemei Luo2,3

1 Department of Computer Science & Engineering, Tongji University, Shanghai, 200092, China
2 Department of Computer Science, Shandong University of Science & Technology, Qingdao, 266510, China
3 Lab. Computer Science, ISCAS, Beijing, 100080, China

Abstract. This paper proposes a resource scheduling algorithm called XMin-min. When calculating the expected completion time, XMin-min considers not only the expected execution time of tasks but also their expected communication time. The execution cost of tasks and the budget of the application are selected as QoS parameters, and an XMin-min algorithm with QoS is also proposed. The paper further presents an extended high-level timed Petri net (EHLTPN) model suited to resource scheduling in grid computing. In the EHLTPN, the firing times assigned to transitions are functions of the tokens in the input places. We construct a simple model of resource scheduling in grid computing using EHLTPN, and we define a Reachable Scheduling Graph (RSG) of the EHLTPN to analyze the timing properties of resource scheduling. The two algorithms can be used to mitigate the "state explosion" problem when constructing the RSG of an EHLTPN.

1 Introduction

In a grid computing environment, the scheduling problem becomes complex, because resources are geographically distributed, heterogeneous in nature, and owned by different individuals or organizations [1]. It is well known that choosing the best pairs of tasks and resources is an NP-complete problem. Several simple heuristics for the dynamic scheduling of a class of independent tasks onto a heterogeneous computing system have been presented [2], and the Min-min heuristic has become a benchmark for this kind of task/host scheduling problem. However, the Min-min algorithm cannot balance the load well, since it usually maps small tasks

* This work is partially supported by the National Preeminent Youth Science Foundation (No. 60125205), the National 863 Plan (2002AA4Z3430, 2002AA1Z2102A), the Foundation for University Key Teacher by the Ministry of Education, the Shanghai Science & Technology Research Plan (02DJ14064, 03JC14071), and an Open Project of the Laboratory of Computer Science, ISCAS (SYSKF0304).

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 73–80, 2004. © Springer-Verlag Berlin Heidelberg 2004


Y. Han et al.

first, and it does not deal with Quality of Service (QoS). A novel QoS-guided task-scheduling algorithm for grid computing was introduced in [3]. However, few algorithms consider the communication time of tasks during scheduling. This paper proposes an extension of the Min-min algorithm called XMin-min. When calculating the expected completion time, XMin-min considers not only the expected execution time of tasks but also their expected communication time. The execution cost of tasks and the budget of the application are selected as QoS parameters. By embedding this QoS information into XMin-min, we obtain a second scheduling algorithm, XMin-min with QoS, to improve the efficiency and utilization of a grid system. Resource scheduling in a grid computing environment calls for powerful graphical and analytical tools such as Petri nets. Petri nets are increasingly widely applied because of their ability to model asynchronous events, parallelism, connection, and synchronization [4]. To describe real processes well, many extensions of Petri nets, such as colored Petri nets [5] and timed Petri nets [6], have been proposed, and Petri net models for scheduling were given in [7,8]. However, these models and their analysis techniques are not suited to resource scheduling in a grid computing environment, and so far there are few Petri net models for grid computing. In [9] we gave an extended colored time Petri net (ECTPN) model for describing and analyzing resource scheduling in a grid computing environment. Here we modify the ECTPN into an extended high-level timed Petri net (EHLTPN) model that better suits resource scheduling in grid computing. In the EHLTPN, the firing times assigned to transitions are functions of the tokens in the input places.
This paper also defines a reachable scheduling graph (RSG) to analyze the timing properties of resource scheduling in a grid computing environment. The XMin-min and XMin-min with QoS algorithms can also be used to mitigate the "state explosion" problem when constructing the RSG of an EHLTPN. The rest of this paper is organized as follows. Section 2 proposes the two algorithms. Section 3 constructs an EHLTPN model for resource scheduling in a grid computing environment. Section 4 gives the definition of the RSG. Section 5 gives an example and experimental results, and Section 6 concludes the paper.

2 Algorithms for Resource Scheduling

We assume that there are m computing resources, accessible to the user via m distinct network links, and n tasks to be mapped onto these m heterogeneous machines. The tasks are assumed to be independent. The expected execution time of a task on a machine is the time the machine takes to execute the task, given that the machine has no load when the task is assigned. The expected transmit time is the time taken to transmit the task from the user site to the machine. The ready time of a machine is the expected time at which the machine becomes ready to execute a task after finishing all tasks assigned to it at that point in time. The expected completion time of a task on a machine is the wall-clock time at which the machine completes the task. In our paper, the expected completion time accounts for the expected transmit time as well as the ready time and the expected execution time.


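We also use the expected execution cost of each task on each machine, the budget of each task, and B, the total budget of all tasks.

Algorithm 1. XMin-min algorithm
(1) For all tasks
(2) For all machines
(3) compute the expected completion time of the task on the machine
(4) Do until all unmapped tasks are mapped
(5) For each unmapped task, find the earliest completion time and the machine that obtains it
(6) Find the task with the minimum earliest completion time within budget B
(7) Assign this task to the machine that gives the earliest completion time within budget B
(8) Delete the task from the set of unmapped tasks; update the ready time of the chosen machine; update the expected completion times for all i
(9) Enddo

Algorithm 2. XMin-min Algorithm with QoS
(1) For all tasks
(2) For all machines
(3) compute the expected completion time of the task on the machine
(4) Do until all unmapped tasks are mapped
(5) For each unmapped task, find the earliest completion time within the task's budget (i.e., on a machine whose execution cost does not exceed the task's budget) and the machine that obtains it
(6) Find the task with the minimum earliest completion time within budget B
(7) Assign this task to the machine that gives the earliest completion time within budget B
(8) Delete the task from the set of unmapped tasks; update the ready time of the chosen machine; update the expected completion times for all i
(9) Enddo

Clearly, both algorithms have the same complexity as the Min-min algorithm.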
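The following Python sketch illustrates Algorithms 1 and 2 under an explicit assumption: the expected completion time of a task on a machine is taken as the machine's ready time plus the transmit time plus the execution time. Picking the globally smallest (task, machine) completion time is equivalent to the Min-min rule of first finding each task's best machine and then the task whose best is smallest. Passing `cost` and `budget` enables the per-task budget check of Algorithm 2, and `total_budget` enforces B by tracking cumulative cost; the function name and argument layout are hypothetical.

```python
def xmin_min(exec_t, trans_t, cost=None, budget=None, total_budget=None):
    """Sketch of XMin-min; with cost/budget it follows XMin-min with QoS.

    exec_t[i][j]: expected execution time of task i on machine j
    trans_t[i][j]: expected transmit time of task i to machine j
    cost[i][j]: expected execution cost; budget[i]: budget of task i
    total_budget: overall budget B, checked against cumulative cost
    Assumption: completion time = ready time + transmit time + execution time.
    Returns a list of (task, machine, completion_time) in mapping order.
    """
    m = len(exec_t[0])
    ready = [0.0] * m
    unmapped = set(range(len(exec_t)))
    spent = 0.0
    schedule = []
    while unmapped:
        best = None  # (completion time, task, machine)
        for i in sorted(unmapped):
            for j in range(m):
                if budget is not None and cost[i][j] > budget[i]:
                    continue  # machine j is too expensive for task i
                if total_budget is not None and spent + cost[i][j] > total_budget:
                    continue  # would exceed the overall budget B
                ct = ready[j] + trans_t[i][j] + exec_t[i][j]
                if best is None or ct < best[0]:
                    best = (ct, i, j)
        if best is None:
            raise ValueError("no machine satisfies the budget constraints")
        ct, i, j = best
        schedule.append((i, j, ct))
        ready[j] = ct  # machine j is busy until this task completes
        if cost is not None:
            spent += cost[i][j]
        unmapped.remove(i)
    return schedule


exec_t = [[3, 5], [2, 4], [6, 1]]
trans_t = [[1, 1], [1, 2], [1, 1]]
plain = xmin_min(exec_t, trans_t)
qos = xmin_min(exec_t, trans_t, cost=[[1, 5], [1, 5], [1, 5]], budget=[10, 10, 2])
```

In the QoS run, task 2 can no longer use machine 1 (cost 5 exceeds its budget of 2), so the makespan grows from 7 to 10, which shows the time/cost trade-off the QoS variant makes.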

3 Petri Net Model for Resource Scheduling

The basic concepts and properties of Petri nets have been introduced in [4,5,6], so we do not review them here.

Definition 1. An extended high-level timed Petri net (EHLTPN) is an eight-tuple, where: P is a finite nonempty set of places; T is a finite nonempty set of transitions; the arc set is a finite set of directed arcs from P to T and from T to P; C is a color function whose values are drawn from the power set of the color set; the negative and positive functions are defined on P×T; there is an initial marking; and D is a set of firing durations, where each firing duration is a function of the tokens in the input places.

Definition 2. An EHLTPN model for resource scheduling in a grid computing environment, RSPN, is an eight-tuple, where the place set is a finite set of places in which one place represents all unmapped tasks, one represents the task selected for scheduling by some algorithm,


two further places represent the task data used for rescheduling, and a place for each machine represents the tasks mapped to that machine. The transition set is finite: one transition selects a task to schedule from the unmapped tasks, one transition for each machine represents the execution of a task on that machine, and further transitions modify the data of all unmapped tasks. The arc set is finite, and the color set is defined in terms of u, the number of unmapped tasks, and the current marking M; the tokens to be scheduled are selected from all unmapped tasks according to some algorithm, and the firing duration of an execution transition is the running time of the task on the machine. The graphical representation of RSPN is shown in Figure 1.

Fig. 1. RSPN model

4 Reachable Scheduling Graph

Definition 3. Let RSPN be an EHLTPN model for resource scheduling. The reachable scheduling graph (RSG) of RSPN, RSG(RSPN) = (V, E1, E2), is a directed graph with labeled nodes and labeled directed edges, divided into m vertical sections corresponding to the m machines and into a number of levels. Proposition 1. RSG(RSPN) = (V, E1, E2) is constructed by the following algorithm.
(1) Initialize the graph, and for j = 1 to m set lv[j] = 0.
(2) Place the initial marking in level 0 and tag it “new”.

Resource Scheduling Algorithms for Grid Computing and Its Modeling


(3) If there is no “new” node in V, the algorithm ends; otherwise go to (4).
(4) Select a “new” marking M and do the following:
(4.1) While there are enabled transitions at M, do the following for each transition t enabled at M:
(4.11) Obtain the marking M′ that results from firing t at M by calculating the arc functions.
(4.12) If
(4.121)
(4.122) If then lv[j] = lv[j] + 1, M′ is placed in level lv[j] of section j and tagged “new”, where j is the index of and
(4.123) If then M′ is placed in level lv[j] of section j and tagged “new”, and where i and j are the indexes of and respectively, and
(4.124) If then M′ is placed in level lv[j] of section j and tagged “new”, where j is the index of and
(4.13) with
(4.14) If then tag where
(4.2) If there is no transition t such that M[t>, tag M as a “dead node”.
(4.3) Remove the “new” tag from M and go to (3).

The correctness of the algorithm follows readily from the definitions of RSPN and RSG and the firing rule of Petri nets. To reduce the size of the RSG, the two algorithms above can be used to generate successor nodes while calculating the arc functions. Proposition 2. Let RSG(RSPN) = (V, E1, E2) be the reachable scheduling graph of RSPN. The sequence read from the beginning of level 1 along the E2 edges in section j represents the scheduling sequence of tasks on the j-th machine. Proposition 3. Let RSG(RSPN) = (V, E1, E2) be the reachable scheduling graph of RSPN, and for each section j consider the node in the last level of section j and the tag attached to it. Then the makespan of the complete schedule equals the maximum of these tags over all sections. Example: Suppose that there are 4 independent tasks and two machines at time t. Table 1 gives the expected execution time, transmitting time, execution cost and budget data. The total budget required to run all tasks is 60.

The graphical representation of the RSPN for the example is similar to Fig. 1. Using Proposition 1, we construct the RSGs shown in Figs. 2, 3 and 4, applying the Min-min, XMin-min and XMin-min with QoS algorithms, respectively, when calculating the arc functions of the RSPN.


Fig. 2. The RSG constructed using Min-min

Fig. 3. The RSG constructed using XMin-min

Fig. 4. The RSG constructed using XMin-min with QoS.

From Figs. 2, 3 and 4, the Min-min algorithm yields a schedule with a makespan of 18; the XMin-min algorithm yields schedule sequences on the two machines with a makespan of 19, of which the execution time is 11; and the XMin-min with QoS algorithm yields schedule sequences on the two machines with a makespan of 16, of which the execution time is 11. This shows that, in this example, the XMin-min with QoS algorithm outperforms XMin-min without QoS, and the XMin-min algorithm outperforms the Min-min algorithm.
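The figures themselves are not recoverable here, but the bookkeeping behind Propositions 1 to 3 can be illustrated. The sketch below is a deliberate simplification, not the full RSPN reachability construction: it assumes the execution transitions fire in a known order (for example, the order produced by one of the Min-min-style algorithms above), that the tag attached to a node is the cumulative completion time on that machine, and it ignores transmitting time, idle time and the E1 edges between sections. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int, float]   # (level, task, tag)


def build_sections(firings: List[Tuple[int, int]],
                   run_time: Dict[Tuple[int, int], float],
                   n_machines: int) -> Dict[int, List[Node]]:
    """Place each execution firing in its machine's section at the next level."""
    sections: Dict[int, List[Node]] = {j: [] for j in range(n_machines)}
    lv = [0] * n_machines                      # per-section level counter, like lv[j]
    for task, machine in firings:
        lv[machine] += 1
        prev_tag = sections[machine][-1][2] if sections[machine] else 0.0
        tag = prev_tag + run_time[(task, machine)]   # completion time on this machine
        sections[machine].append((lv[machine], task, tag))
    return sections


def task_sequence(sections: Dict[int, List[Node]], j: int) -> List[int]:
    """Proposition 2: read section j from level 1 onward."""
    return [task for _, task, _ in sections[j]]


def makespan(sections: Dict[int, List[Node]]) -> float:
    """Proposition 3 (as read here): largest tag in the last level of any section."""
    return max((nodes[-1][2] for nodes in sections.values() if nodes), default=0.0)
```

For instance, firing task 1 then task 0 on machine 0 and task 2 on machine 1 gives the sequence [1, 0] in section 0, and the makespan is the larger of the two last-level tags.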

5 Experimental Results and Discussion

In our experiments, the system consists of a DAWNING 3000 cluster with 4 nodes and 12 PCs. We designed a program running on this system to evaluate the newly proposed scheduling algorithms. The expected execution time, expected transmitting time and expected execution cost of tasks on machines were generated randomly by the program. The budget of each task is set to 85% of the maximum of its costs. The algorithms were evaluated for n = {50, 100, 150, 200} tasks, and the makespan and cost were averaged over 100 runs. Table 2 and Fig. 5 compare the makespans and costs.
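The random instance generation described above might look like the following sketch. Only the per-task budget rule (85% of the task's maximum cost, which is how we read the garbled sentence in the source) comes from the text; the value ranges, the use of Python's `random` module and the treatment of the 4 cluster nodes and 12 PCs as 16 machines are assumptions for illustration.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def random_instance(n_tasks: int, n_machines: int, seed: int = 0,
                    budget_ratio: float = 0.85) -> Tuple[Matrix, Matrix, Matrix, List[float]]:
    """Random execution/transmit/cost matrices plus per-task budgets.

    The value ranges are placeholders; only the 85% budget rule comes from the text.
    """
    rng = random.Random(seed)

    def matrix(low: float, high: float) -> Matrix:
        return [[rng.uniform(low, high) for _ in range(n_machines)]
                for _ in range(n_tasks)]

    exec_time = matrix(1.0, 100.0)     # hypothetical range
    trans_time = matrix(0.0, 10.0)     # hypothetical range
    cost = matrix(1.0, 50.0)           # hypothetical range
    budget = [budget_ratio * max(row) for row in cost]   # 85% of the task's maximum cost
    return exec_time, trans_time, cost, budget
```

Calling `random_instance(100, 16, seed=run)` for `run` in `range(100)` would give 100 reproducible instances for n = 100, over which makespan and cost could then be averaged.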

Fig. 5. (a) Makespans for three algorithms

(b) Costs for three algorithms

To compare the three algorithms on an equal footing, we subtract the transmitting time from the makespans of XMin-min and XMin-min with QoS in Fig. 5. Fig. 5(a) shows that the makespans of the three algorithms differ little, whereas Fig. 5(b) shows that their costs differ considerably. XMin-min with QoS has the lowest cost, and the larger the number of tasks, the larger the cost difference.


6 Conclusions

In this paper, two resource scheduling algorithms that account for both execution time and transmitting time were presented. The example and the experimental results show that the XMin-min with QoS algorithm outperforms XMin-min without QoS. An EHLTPN model, a powerful graphical and analytical tool, was presented, and a simple model of resource scheduling in a grid computing environment was constructed with it. The reachable scheduling graph was defined and used to analyze the timing properties and the scheduling sequence efficiently and intuitively. Because the transmitting times of tasks vary over time, in future work we will consider dynamic transmitting times when calculating completion times in every iteration of the XMin-min and XMin-min with QoS algorithms.

References
1. Buyya, R., Giddy, J., Abramson, D.: An Evaluation of Economy-based Resource Trading and Scheduling on Computational Power Grids for Parameter Sweep Applications. Proceedings of the Int. Workshop on Active Middleware Services (AMS 2000), Kluwer Academic Press, USA (2000).
2. Maheswaran, M., Ali, S., et al.: Dynamic Mapping of a Class of Independent Tasks onto Heterogeneous Computing Systems. IEEE Heterogeneous Computing Workshop (HCW'99), San Juan, Puerto Rico (1999).
3. He, X.-S., Sun, X.-H., von Laszewski, G.: A QoS Guided Scheduling Algorithm for Grid Computing. Proc. of the Int. Workshop on Grid and Cooperative Computing (GCC2002), Sanya, China (2002) 745-758.
4. Murata, T.: Petri Nets: Properties, Analysis and Applications. Proceedings of the IEEE, Vol. 77, No. 4 (1989) 541-580.
5. Jensen, K.: Coloured Petri Nets: Basic Concepts, Analysis Methods and Practical Use, Vol. 1, Basic Concepts. Monographs in Theoretical Computer Science. Springer-Verlag, Berlin, Heidelberg, New York, 2nd corrected printing (1997).
6. Zuberek, W.M.: Timed Petri Nets: Definitions, Properties and Applications. Microelectron. Reliab., Vol. 31, No. 4 (1991) 627-644.
7. Prashant Reddy, J., Kumanan, S., Krishnaiah Chetty, O.V.: Application of Petri Nets and a Genetic Algorithm to Multi-Mode Multi-Resource Constrained Project Scheduling. AMT 17 (2001) 305-314.
8. Huang, B., Zhang, B.: A New Scheduling Model Based on Extended Petri Net - TREM Net. Proc. of the Int. Conference on Robotics and Automation, Vol. 1 (ICRA94-1), San Diego, CA, USA (1994) 495-500.
9. Han, Y.-J., Jiang, C.-J.: Extended Colored Time Petri Net-Based Resource Scheduling in Grid Computing. Proc. of the Int. Workshop on Grid and Cooperative Computing (GCC2002), Sanya, China (2002) 7345-353.

Architecture of Grid Resource Allocation Management Based on QoS*
Xiaozhi Wang and Junzhou Luo
Department of Computer Science and Engineering, Southeast University
Nanjing, 210096, P. R. China

Abstract. Quality of service (QoS) and resource management are key technologies in the grid. By analyzing the characteristics of Grid QoS, this paper establishes a layered structure of Grid QoS. Based on an analysis of what QoS-based grid resource allocation management (GRAM) involves, it proposes an architecture for QoS-based GRAM. By mapping, converting and negotiating QoS parameters, the architecture embeds the user's QoS requirements into the resource allocation management process and ties Grid QoS closely to GRAM. Together, these provide a reasonable reference model for QoS and resource allocation management in the grid.

1 Introduction

The overall goal of the grid is to give users the ability to harness large numbers of heterogeneous resources (computational resources, storage resources, devices, useful information, etc.) that are distributed over a wide area and belong to different organizations. In the Open Grid Services Architecture (OGSA [1]), all resources are organized rationally into virtual organizations, which are dynamic and extensible. A virtual organization allows many logical resource instances to be mapped onto the same physical resource, and resource management can be conducted within the virtual organization on top of the basic resource organization. Resource management therefore faces new challenges. On the one hand, in OGSA grid resources are presented transparently to grid users as logical resources, yet they are physically distributed and have their own management policies. How to allocate and schedule these resources within a virtual organization and improve their utilization are important problems for grid resource management. On the other hand, different grid services place different QoS requirements on the resources participating in them, and, once service cost is taken into account, different users may have different QoS needs. In OGSA, the QoS characteristics of a physical resource cannot represent those of a logical resource.

*This work is supported by National 973 Fundamental Research Program of China (G1998030402) and National Natural Science Foundation of China (90204009).

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 81–88, 2004. © Springer-Verlag Berlin Heidelberg 2004

X. Wang and J. Luo

How to convert a user's QoS request into concrete QoS parameters in the grid has therefore become an urgent problem. Moreover, resource allocation in a grid environment is closely related to Grid QoS, so the two should be considered together in resource management. To address these issues, we analyze the characteristics of Grid QoS and establish a layered structure of Grid QoS. Building on an analysis of what QoS-based GRAM involves, we then propose an architecture for QoS-based grid resource allocation management (GRAM-QoS). Through mapping between the layers of Grid QoS, it converts the user's QoS demands into concrete QoS parameters of resources and incorporates this QoS mapping and conversion into the resource selection process. By taking the characteristics of Grid QoS fully into account, GRAM-QoS provides a reasonable reference model for QoS-based resource allocation management. The rest of the paper is organized as follows. Section 2 reviews related work on grid resource allocation (GRA) and Grid QoS. In Section 3, we analyze the characteristics of Grid QoS and set up its layered structure. In Section 4, we describe the content of QoS-based GRAM and introduce the architecture and working flow of GRAM-QoS. Finally, Section 5 concludes and points to future directions.

2 Related Work and Limitations

In [2], the authors point out that resource allocation in the grid has two phases, external and internal, and describe the characteristics of resource allocation in the grid environment. In [3], the authors introduce an economic method in which the service price of resources is determined by market supply and demand, proposing a new approach to resource allocation based on service price. However, these works do not consider users' QoS requirements comprehensively. In [4][5][6][7], the authors mainly study architectures, reservation strategies and scheduling techniques based on resource co-reservation in grid computing systems. Other researchers have studied grid resource allocation from different angles, e.g. [8][9][10], improving the performance of GRA in different respects. Although these works can offer QoS guarantees to some extent, they do not analyze the characteristics of Grid QoS comprehensively, so up to now there is no systematic solution to grid resource allocation based on Grid QoS.

3 The Characteristics and Layered Structure of Grid QoS

3.1 The Characteristics of Grid QoS

Quality of Service (QoS) is a comprehensive measure of how satisfactory a service is. It describes certain characteristics of a service and is expressed as a group of parameters in a language users can understand. In [11], the authors divide the QoS of multimedia network systems into network QoS and device QoS, but this division does not cover QoS issues in the grid. In the grid, QoS covers not only network QoS and device QoS but also resource QoS. Grid resources include not only hardware resources but also software resources in the form of information, data, software, etc. These resources must respond to the demands of grid jobs as well as those of local jobs, and local jobs generally have higher priority than grid jobs. Consequently, the state of a resource, such as its load factor and the arrival rate of local jobs, strongly affects the quality of the resource as seen by grid users. In addition, the performance of the device on which a resource resides also strongly affects the QoS delivered to grid users. To evaluate the quality of grid resources accurately, we separate device QoS from resource QoS. Device QoS parameters express the performance of specific devices, such as response time and throughput, whereas resource QoS parameters express the performance of resources in the grid environment, such as load capacity and interruption rate; the two kinds of parameters are entirely different. Furthermore, because all components in OGSA are virtual objects, grid users are interested only in logical resources within virtual organizations, and the characteristics of physical resources should be transparent to them. As argued above, the QoS characteristics of a physical resource cannot represent those of a logical resource, so the grid must also distinguish logical resources from physical resources. The Grid QoS system should provide strategies and functions that convert and map the QoS parameters of logical resources to those of physical resources.

3.2 The Layered Structure of Grid QoS

QoS is described differently for different objects. For example, the QoS demands of an end user may be only simple descriptions such as bad, fair, better or best. The QoS demands that grid service QoS places on resources are demands on logical resources and the system, for instance: resources are excellent, system response time is 180 ms, system transmission speed is 2 Mb/s, etc. The final QoS parameters are a group of specific numerical values. Extending [11], Grid QoS can be divided into four layers, as Fig. 1 shows. The top layer is the user layer, representing the QoS demands that users put forward when applying for a grid service. The second is the grid service layer, which describes the QoS demands of the grid service, for instance the response time of the service, the transmission rate of the system and the quality of the resources taking part in the service. The third is the system and logical resource layer, which satisfies the QoS parameters of the grid service layer. System QoS mainly refers to network and device QoS demands; logical resource QoS mainly refers to QoS demands on resources in virtual organizations. The bottom is the network, device and physical resource layer. Network QoS describes the performance of the network, such as load capacity, internal time, delay and bandwidth. Device QoS describes the demands on devices, such as their response time and throughput. Physical resource QoS describes the demands on the physical resources that actually take part in the service, such as their load capacity, usable time and the interruption rate of grid jobs. Based on the analysis above, Grid QoS can be divided into three parts: network, devices and physical resource. The QoS parameters of each layer are converted correspondingly, top-down or bottom-up, by the specific grid system.

Fig. 1. Layered structure of Grid QoS
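A minimal sketch of top-down mapping between the layers of Fig. 1 follows. It is illustrative only: the qualitative levels, all numbers other than the 180 ms and 2 Mb/s example quoted in Section 3.2, the 40/60 split of the response time between network and device, and the load-factor limits are assumptions, and the system and logical resource layer is collapsed into a direct mapping onto network, device and physical-resource constraints.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ServiceQoS:                 # grid service layer
    response_ms: float
    rate_mbps: float
    resource_grade: str


@dataclass(frozen=True)
class BottomQoS:                  # network / device / physical resource layer
    min_bandwidth_mbps: float
    max_network_delay_ms: float
    max_device_response_ms: float
    max_load_factor: float


# Hypothetical vocabulary and numbers; 180 ms and 2 Mb/s echo the example in the text.
USER_TO_SERVICE: Dict[str, ServiceQoS] = {
    "bad": ServiceQoS(2000.0, 0.25, "poor"),
    "fair": ServiceQoS(800.0, 0.5, "fair"),
    "better": ServiceQoS(180.0, 2.0, "excellent"),
    "best": ServiceQoS(60.0, 8.0, "excellent"),
}
GRADE_TO_MAX_LOAD = {"poor": 0.9, "fair": 0.8, "excellent": 0.5}


def map_down(user_level: str, network_share: float = 0.4) -> BottomQoS:
    """User layer -> service layer -> bottom-layer constraints (top-down mapping)."""
    s = USER_TO_SERVICE[user_level]
    return BottomQoS(
        min_bandwidth_mbps=s.rate_mbps,
        # split the end-to-end response budget between network and device
        max_network_delay_ms=s.response_ms * network_share,
        max_device_response_ms=s.response_ms * (1.0 - network_share),
        max_load_factor=GRADE_TO_MAX_LOAD[s.resource_grade],
    )
```

A bottom-up conversion would run the other way, aggregating measured network, device and resource values into service-level figures that can be compared with the user's demand.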

4 GRAM-QoS: GRAM Based on QoS

4.1 The Content of GRAM Based on QoS

QoS-based GRAM is active throughout service application, service execution and service close. At the service application stage, the resource allocation management system determines the user's resource demands according to the specific grid service, converts the user's QoS demands into concrete grid QoS parameters, and then uses these parameters as constraints when searching the grid for available resources that satisfy the requirements. During the search, the system may negotiate with the user and arrive at one of three results: unable to supply, able to supply, or able to supply with reduced QoS demands. If the negotiation finally succeeds, the service provider can provide resources meeting the final QoS demands, and the system performs admission control according to a specific strategy. It then reserves the resources and sends the grid job to the resource waiting pool, where it waits to be scheduled and executed. If the negotiation fails, the system informs the user and terminates the application. At the service execution stage, the resource allocation management system monitors the reserved resources and updates their QoS information. If a reserved resource can no longer satisfy the user's demands, the system should renegotiate QoS or choose a substitute resource so that the user's QoS demands are still met. Reserved resources are scheduled according to the strategies offered by the provider. Because local jobs generally have higher priority than grid jobs, the arrival of a local job interrupts a grid job. To ensure that grid jobs finish before their deadlines, the priority of a grid job must be adjusted dynamically: the closer a grid job is to its deadline, the higher its priority should become. This scheme can also improve resource utilization. At the service close stage, the system releases the resources and updates their statistical information. Because using grid resources incurs charges, the system should also record information such as the cost of the resources used and the user's account information.

4.2 Logical Structure of GRAM Based on QoS (GRAM-QoS)

Based on the above analysis of the content of QoS-based GRAM and of the layered structure of Grid QoS, we propose the logical structure of QoS-based GRAM (GRAM-QoS) shown in Fig. 2. The main modules are as follows.
1. Grid Services Market. On the one hand, the Grid Services Market lets grid users inquire about grid services; on the other hand, it lets service providers register and publish grid services. When registering and publishing, providers must supply proof of identity and a description of the service, such as its resource demands and its QoS demands, expressed as concrete QoS parameters in the different layers.
2. Grid Middleware Services. This module is mainly responsible for sign-on, security control, managing user information and accounting for the resources used.
3. Grid Resource Broker. The Resource Information Service Center module is the information center for available resources in the grid. It provides information about the quality and QoS parameters of logical resources; this information is supplied by the Resource Information Provider Service module in the Grid Resource Node. The QoS Mapping & Converting module maps the user's QoS demands to concrete QoS parameters in the different layers. The QoS Negotiation module in the Grid Resource Broker judges whether system QoS and logical resource QoS can satisfy the user's demands, while the QoS Negotiation module in the Grid Resource Node judges whether physical resource QoS, network QoS and device QoS can satisfy them. When the available resources cannot satisfy the user's demands, the two QoS Negotiation modules interact with the relevant modules and ask whether the user can reduce the QoS demands. The Resource Monitor module monitors the reserved resources.
If the QoS parameters of the reserved resources cannot satisfy the user's demands, the Resource Monitor module contacts the QoS Negotiation module to renegotiate QoS or choose a substitute resource. The information the Resource Monitor needs is supplied by the Resource Information Provider Service module. The Error Process module handles errors reported by the QoS Negotiation module when resources cannot satisfy the user's QoS demands: it terminates execution of the grid service and notifies the user.
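The interplay of the two QoS Negotiation modules can be sketched as follows. Everything here is a hypothetical model rather than the paper's interface: QoS offers and demands are plain name-to-value dictionaries in which larger values are assumed to be better (a parameter such as delay would have to be negated first), each module checks only the parameters it is responsible for, and the user's willingness to reduce QoS is modeled as a list of progressively relaxed demands.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple

QoS = Dict[str, float]   # parameter name -> value; assumed "larger is better"


class Outcome(Enum):
    SUPPLY = "able to supply"
    SUPPLY_REDUCED = "able to supply with reduced QoS demands"
    CANNOT_SUPPLY = "not able to supply"


def module_accepts(available: QoS, demand: QoS) -> bool:
    """One negotiation module checks only the parameters it is responsible for."""
    return all(available[k] >= demand[k] for k in available if k in demand)


def negotiate(demands: List[QoS], broker_view: QoS,
              node_view: QoS) -> Tuple[Outcome, Optional[QoS]]:
    """demands[0] is the original request; later entries are reductions the user accepts."""
    for step, demand in enumerate(demands):
        # broker: system / logical resource QoS; node: physical, network, device QoS
        if module_accepts(broker_view, demand) and module_accepts(node_view, demand):
            return (Outcome.SUPPLY if step == 0 else Outcome.SUPPLY_REDUCED), demand
    return Outcome.CANNOT_SUPPLY, None   # would be reported via the Error Process module
```

Returning the accepted demand alongside the outcome lets later stages, such as admission control and reservation, work with the QoS level the user actually agreed to.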


Fig. 2. Logical structure of GRAM based on QoS

4. Grid Resource Node. The Resource Information Provider Service module resides in the Grid Resource Node and monitors the QoS information of physical resources in the grid. It obtains the latest resource information through the QoS Control module and supplies updated information to the Resource Information Service Center module and the Resource Monitor module. If the QoS negotiation concludes that resources satisfying the user's demands can be provided, the QoS Admission Control module performs tasks such as resource co-allocation, conflict detection, deadlock detection and load balancing, and finally completes the confirmation requested by the user. The Trade Server module determines the usage price and records information such as the total cost of the resources used and the user's account information. The Resource Reservation module sets the resource reservation flag and sends the grid job to the Waiting-job Pool to wait for scheduling. In addition, the Waiting-job Pool is responsible for adjusting the priority of grid jobs dynamically. The Scheduler schedules the jobs in the Waiting-job Pool according to a particular strategy. In general, local jobs have higher priority than grid jobs, but a grid job may be given higher priority when it is very close to its deadline. The QoS Control module controls all dynamic QoS parameters. It adjusts QoS parameters such as bandwidth and buffer size according to the result of QoS negotiation, responds to inquiries from the Resource Information Provider Service module, and updates its state information.
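The deadline-driven priority rule shared by the Waiting-job Pool and the Scheduler can be sketched as below. The slack-based formula, the urgency window and the priority constants are hypothetical; the text states only that local jobs normally outrank grid jobs and that a grid job may be promoted when it is very close to its deadline.

```python
from dataclasses import dataclass
from typing import List, Optional

LOCAL_PRIORITY = 1.0    # hypothetical constant for every local job
URGENT_PRIORITY = 2.0   # hypothetical: outranks local jobs


@dataclass
class Job:
    name: str
    is_local: bool
    deadline: float = float("inf")   # used only for grid jobs
    remaining: float = 0.0           # estimated remaining run time


def priority(job: Job, now: float, urgency_window: float) -> float:
    """Higher value is scheduled first; assumes urgency_window >= 0."""
    if job.is_local:
        return LOCAL_PRIORITY
    slack = job.deadline - now - job.remaining
    if slack <= urgency_window:
        return URGENT_PRIORITY           # grid job very close to its deadline
    return 1.0 / (1.0 + slack)           # below LOCAL_PRIORITY, rising as slack shrinks


def pick_next(pool: List[Job], now: float,
              urgency_window: float = 5.0) -> Optional[Job]:
    """Scheduler step: choose the highest-priority job in the Waiting-job Pool."""
    if not pool:
        return None
    return max(pool, key=lambda job: priority(job, now, urgency_window))
```

With a grid job due at t = 100 that still needs 10 time units, a local job wins at t = 0, but once the slack falls inside the urgency window (for example at t = 88) the grid job is scheduled first.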


4.3 Working Flow

Setting aside service registration and publication and starting from the grid service application, the working flow of GRAM-QoS is as follows:
1) The grid user signs on to the grid system through the Grid Middleware Services module;
2) The grid user inquires about and applies for a grid service through the Grid Services Market;
3) The system confirms the user's resource demands according to the specific grid service;
4) The user puts forward QoS demands;
5) The QoS Mapping & Converting module maps the user's QoS demands to concrete QoS parameters in the different layers;
6) The system chooses the logical resources needed by the grid service from the Resource Information Service Center. Through QoS negotiation, mapping and conversion, it ensures that the selected logical resources can satisfy the user's QoS demands. If no resource satisfying the demands is found, the Error Process module informs the user and terminates the application;
7) Based on the result of step 6), the system queries the information of physical resources according to the mapping between logical and physical resources;
8) Through the QoS Negotiation module in the Grid Resource Node, the system judges whether physical resource QoS, network QoS and device QoS can satisfy the user's demands. If the available resources cannot fully satisfy them, the two QoS Negotiation modules interact with the relevant modules and ask whether the user can reduce the QoS demands, and the flow returns to step 6);
9) The QoS Admission Control module then performs the confirmation requested by the grid job;
10) The Trade Server module determines the usage price of the resources and records information about the resources used;
11) The Resource Reservation module sets the resource reservation flag and records the QoS demands;
12) The grid job enters the Waiting-job Pool, which dynamically adjusts the priority of the grid jobs it holds;
13) The Resource Monitor module monitors the state of the reserved resources;
14) The Scheduler schedules local jobs and the grid jobs in the Waiting-job Pool according to a particular strategy;
15) The QoS Control module adjusts QoS parameters according to the result of QoS negotiation, responds to requests from the Resource Information Provider Service module, and updates its resource state information.

5 Conclusions and Future Work

Grid QoS and resource management are key technologies in the grid, and in OGSA the relationship between them is especially close. By analyzing the characteristics of QoS in the grid, we proposed and established a layered structure of Grid QoS, which provides a sound basis for mapping and converting QoS parameters in the grid. Based on an analysis of the content of QoS-based GRAM, we put forward the architecture of QoS-based GRAM (GRAM-QoS), which provides a reasonable reference model for QoS and resource allocation management in the grid. In future work, we plan to design a simulation platform based on Grid QoS to test the usability of GRAM-QoS and improve its performance. We also aim to implement GRAM-QoS on the Globus Toolkit. Another interesting direction is the dynamic adjustment of grid job priorities, for which we may draw on relevant solutions from CORBA [12]. Furthermore, we plan to formalize an XML-based description language for Grid QoS parameters.

References
1. I. Foster, C. Kesselman, J. M. Nick, S. Tuecke. The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. http://www.gridforum.org/ogsi-wg/drafts/ogsa_draft2.9_2002-06-22.pdf
2. H. Chen, M. Maheswaran. Distributed Dynamic Scheduling of Composite Tasks on Grid Computing Systems. Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2002), pp. 88–97.
3. R. Buyya, D. Abramson, J. Giddy. A Case for Economy Grid Architecture for Service Oriented Grid Computing. Proceedings of the 15th International Parallel and Distributed Processing Symposium, Apr. 2001, pp. 776–790.
4. I. Foster, C. Kesselman, C. Lee, R. Lindell, K. Nahrstedt, A. Roy. A Distributed Resource Management Architecture that Supports Advance Reservations and Co-Allocation. International Workshop on Quality of Service, 1999.
5. K. Czajkowski, I. Foster, C. Kesselman. Resource Co-Allocation in Computational Grids. Proceedings of the Eighth IEEE International Symposium on High Performance Distributed Computing (HPDC-8), 1999, pp. 219–228.
6. W. Smith, I. Foster, V. Taylor. Scheduling with Advanced Reservations. Proceedings of the 14th International Parallel and Distributed Processing Symposium (IPDPS 2000), 1–5 May 2000, pp. 127–132.
7. L. Wang, W. Cai, B.-S. Lee, S. See, W. Jie. Resource Co-Allocation for Parallel Tasks in Computational Grids. Proceedings of the International Workshop on Challenges of Large Applications in Distributed Environments, 21 June 2003, pp. 88–95.
8. O. F. Rana, M. Winikoff, L. Padgham, J. Harland. Applying Conflict Management Strategies in BDI Agents for Resource Management in Computational Grids. IEEE Computer Society Press, 2002, pp. 205–214.
9. J. Park. A Scalable Protocol for Deadlock and Livelock Free Co-Allocation of Resources in Internet Computing. Proceedings of the 2003 Symposium on Applications and the Internet, 27–31 Jan. 2003, pp. 66–73.
10. D. Grosu, A. T. Chronopoulos. Algorithmic Mechanism Design for Load Balancing in Distributed Systems. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, accepted for future publication, 2003, pp. 1–8.
11. K. Nahrstedt, R. Steinmetz. Resource Management in Networked Multimedia Systems. Computer, Vol. 28, No. 5, May 1995, pp. 52–63.
12. Y. Wang, F. Brasileiro, E. Anceaume, F. Greve, M. Hurfin. Avoiding Priority Inversion on the Processing of Requests by Active Replicated Servers. Proceedings of the International Conference on Dependable Systems and Networks (DSN 2001), 2001, pp. 97–106.

An Improved Ganglia-Like Clusters Monitoring System*
Wenguo Wei (1,2), Shoubin Dong (1), Ling Zhang (1), and Zhengyou Liang (1)
(1) Guangdong Key Laboratory of Computer Network, South China University of Technology, Guangzhou, 510641, P.R. China
{wgwei, sbdong, ling, zhyliang}@scut.edu.cn
(2) Department of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510665, P.R. China

Abstract. Ganglia [1] is a scalable distributed monitoring system for high performance computing systems such as clusters and Grids. We propose an improved Ganglia-like cluster monitoring system that is more reliable in the face of federation node and associated link failures, restricts access to some monitoring data by permission, adds control functions such as restarting or shutting down abnormal processes, sends e-mail or pager messages to the cluster administrator when important events occur, and can optionally forward selected data to the federation node according to user policy to speed up WAN access. We have implemented a prototype system.

1 Introduction

There has recently been an enormous shift in high performance computing from systems composed of small numbers of computationally massive devices to systems composed of large numbers of commodity components. This architectural shift from the few to the many is causing designers of high performance systems to revisit numerous design issues such as scale, reliability, heterogeneity, manageability, and system evolution over time. With clusters now the de facto building block for high performance systems, scale and reliability have become key issues, as many independently failing and unreliable components need to be continuously accounted for and managed over time. Heterogeneity, previously a non-issue when running a single vector supercomputer or an MPP, must now be designed for from the beginning, since systems that grow over time are unlikely to scale with the same hardware and software base. Manageability also becomes of paramount importance, since clusters today commonly consist of hundreds or even thousands of nodes. Finally, as systems evolve to accommodate growth, system configurations inevitably need to adapt. In summary, high performance systems today have sharply diverged from the monolithic machines of the past and now face the same set of challenges as large-scale distributed systems.

* This research was supported by Guangdong Key Laboratory of Computer Network under grant 2002B60113.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 89–96, 2004. © Springer-Verlag Berlin Heidelberg 2004


W. Wei et al.

This paper presents the design and implementation of an improved Ganglia-like distributed monitoring system. It is organized as follows. In Section 2, we describe the key challenges in building a distributed monitoring system and why we chose to improve Ganglia, then analyze the architecture of Ganglia and point out its main issues. In Section 3, we describe the design and implementation of our Ganglia-like system, which improves on Ganglia in several areas. In Section 4, we present a theoretical performance analysis of our system, and Section 5 concludes the paper.

2 The Presupposition and Foundation of Our Work

2.1 Monitoring System Design Challenges

The key design challenges for distributed monitoring systems include:

Scalability: The system should scale gracefully with the number of nodes in the system. Clusters commonly consist of hundreds of nodes, and Grid computing efforts such as TeraGrid [2] will eventually push these numbers out even further.

Robustness: The system should be robust to node and network failures of various types. As systems scale in the number of nodes, failures become inevitable. The system should localize such failures so that it continues to operate and deliver useful service in the presence of failures.

Extensibility: The system should be extensible in the types of data that are monitored. It is impossible to know a priori everything that might ever need to be monitored, so the system should allow new data to be collected and monitored.

Manageability: The system should incur management overheads that scale slowly with the number of nodes. For example, managing the system should not require a linear increase in system administrator time as the number of nodes increases. Manual configuration should also be avoided as much as possible.

Portability: The system should be portable to a variety of operating systems and CPU architectures. Despite the recent trend towards Linux on x86, there is still wide variation in the hardware and software used for HPC, and systems such as Globus [3] further facilitate the use of such heterogeneous systems.

2.2 Why Ganglia?

There are a number of research efforts centered on monitoring of clusters, but only a handful focus on scale. Supermon [4] is a hierarchical cluster monitoring system that uses a statically configured hierarchy of point-to-point connections to gather and aggregate cluster data collected by custom kernel modules running on each cluster node. CARD [5] is a hierarchical cluster monitoring system that uses a statically configured hierarchy of relational databases to gather, aggregate, and index cluster data. Compared to these systems, Ganglia differs in four key respects. First, Ganglia uses a hybrid approach to monitoring which inherits the desirable properties of listen/announce protocols, including automatic discovery of cluster membership and no manual configuration, while still permitting federation in a hierarchical manner. Second, Ganglia makes extensive use of widely used, self-contained technologies such as XML and XDR, which facilitate reuse and have rich sets of tools built on them. Third, Ganglia follows simple design principles and sound engineering to achieve high levels of robustness, ease of management, and portability. Finally, Ganglia has demonstrated operation at scale.

An Improved Ganglia-Like Clusters Monitoring System

2.3 Ganglia Architecture

Ganglia is based on a hierarchical design targeted at federations of clusters (Figure 1). It relies on a multicast-based listen/announce protocol [6,7] to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. Within each cluster, Ganglia uses heartbeat messages on a well-known multicast address as the basis for a membership protocol. Membership is maintained by using the reception of a heartbeat as a sign that a node is available and the non-reception of a heartbeat over a small multiple of a periodic announcement interval as a sign that a node is unavailable.

Fig. 1. Ganglia architecture
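The heartbeat-based membership rule described in Section 2.3 can be sketched as follows. This is a minimal illustration, not gmond's code: the class name `HeartbeatMembership`, the 20-second interval and the factor of 4 are assumptions, since the text only says the timeout is "a small multiple" of the announcement interval.

```python
from typing import Dict, Set


class HeartbeatMembership:
    """Illustrative membership tracker driven by periodic heartbeats (hypothetical)."""

    def __init__(self, interval: float = 20.0, multiple: int = 4) -> None:
        # Assumed values: a node is unavailable after `multiple` silent intervals.
        self.timeout = interval * multiple
        self.last_seen: Dict[str, float] = {}

    def on_heartbeat(self, node: str, now: float) -> None:
        # Hearing a heartbeat marks the node available; unknown senders are
        # added automatically, so no manual membership configuration is needed.
        self.last_seen[node] = now

    def alive(self, now: float) -> Set[str]:
        return {n for n, t in self.last_seen.items() if now - t <= self.timeout}


m = HeartbeatMembership(interval=20.0, multiple=4)
m.on_heartbeat("node1", now=0.0)
m.on_heartbeat("node2", now=0.0)
m.on_heartbeat("node1", now=70.0)
print(sorted(m.alive(now=90.0)))  # ['node1']  (node2 silent for 90 s > 80 s)
```

Because membership is derived only from received heartbeats, a restarted listener rebuilds the same view after a few intervals, which matches the crash-recovery property described in the text.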

Each node (daemon “gmond”) monitors its local resources and sends multicast packets containing monitoring data on a well-known multicast address whenever significant updates occur. All nodes in the same cluster therefore always have an approximate view of the entire cluster’s state, and this state is easily reconstructed after a crash. Ganglia (daemon “gmetad”) federates multiple clusters together using a tree of point-to-point connections. Each leaf node specifies a node in a specific cluster being federated, while nodes higher up in the tree specify aggregation points. Each leaf node logically represents a distinct cluster, while each non-leaf node logically represents a set of clusters. (Multiple cluster nodes can be specified for each leaf to handle failures.) Aggregation at each point in the tree is done by polling child nodes at periodic intervals. Monitoring data from both leaf nodes and aggregation points is then exported.
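A minimal sketch of this polling tree, assuming a simplified model in which each cluster reports only its host count. `Leaf`, `Aggregator`, `collect` and the `poll` callback are hypothetical names, not gmetad's API, and the addresses are made up (8649 is gmond's default port). It shows the two properties just described: a leaf falls back to another listed node when one is unreachable, and an aggregation point merges its children's results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union


@dataclass
class Leaf:
    """One federated cluster, reachable through any of several cluster nodes."""
    name: str
    spokesmen: List[str]  # redundant "host:port" addresses (hypothetical)


@dataclass
class Aggregator:
    """A non-leaf aggregation point representing a set of clusters."""
    name: str
    children: List[Union["Leaf", "Aggregator"]]


# poll(address) returns the host count reported there, or None if unreachable;
# it stands in for fetching and parsing the cluster's XML dump.
Poller = Callable[[str], Optional[int]]


def collect(node: Union[Leaf, Aggregator], poll: Poller) -> Dict[str, int]:
    """Return {cluster: host_count} for every reachable cluster under node."""
    if isinstance(node, Leaf):
        for addr in node.spokesmen:  # try the listed nodes in order
            count = poll(addr)
            if count is not None:
                return {node.name: count}
        return {}  # no listed node answered: the cluster is lost
    result: Dict[str, int] = {}
    for child in node.children:  # one polling round at this aggregation point
        result.update(collect(child, poll))
    return result


up = {"10.0.1.2:8649": 32, "10.0.2.1:8649": 64}  # reachable addresses only
tree = Aggregator("root", [
    Leaf("clusterA", ["10.0.1.1:8649", "10.0.1.2:8649"]),  # first node is down
    Leaf("clusterB", ["10.0.2.1:8649"]),
])
print(collect(tree, up.get))  # {'clusterA': 32, 'clusterB': 64}
```

Note that redundancy exists only at the leaves: if an `Aggregator` itself could not be reached, everything below it would be lost, which is the weakness discussed in Section 2.4.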

2.4 Ganglia’s Main Issues

Ganglia (version 2.x) has several issues:

If an aggregation node or its associated links fail, none of the data from its leaf nodes (a set of clusters) can be collected; that is, Ganglia provides no redundancy for non-leaf nodes and their links.

Ganglia is a distributed monitoring system that may span multiple clusters. If those clusters belong to different organizations, some cluster owners may not want everyone to see all the information Ganglia reports, so a mechanism is needed to grant access to monitoring data by permission.

There is little or no control mechanism. Sometimes people want not only to monitor passively but also to restart or kill some processes, or want Ganglia to send email or a page to the cluster administrator when an important event occurs.

Ganglia sends clients all the data that gmond reports, even when there are many clusters, each composed of many nodes, and the network is slow. To speed up network access, it would be preferable to send only part of the monitoring data (e.g. dynamic information) when the network is busy, while still letting users see all data if they want.

Infrastructure limitation. Ganglia has a flat namespace; that is, it assumes that all measurements on hosts can be easily represented by a simple key/value pair. This may hold for some metrics, such as the number of CPUs, but fails miserably for something like the %CPU user for process 1289.

3 Improvements and Implementation

3.1 Improved Reliability of the Whole System

As systems scale in the number of nodes, failures become inevitable. If an aggregation node or its associated links fail, none of the data from its leaf nodes (a set of clusters) can be collected, because Ganglia provides no redundancy for non-leaf nodes and their links. We therefore assume that any two aggregation nodes are connected by at least two different paths (not necessarily direct links) in the monitored system; that is, the structure of the monitored system is not a tree but contains rings. Under this assumption, we propose a mechanism to deal with the failure of aggregation nodes. Data collection in gmetad is done by periodically polling a collection of child data sources, which are specified in a configuration file. Each data source is identified by a unique tag and has multiple IP address/TCP port pairs associated with it, each of which is equally capable of providing data for the given data source. Gmetad uses configuration files to specify the structure of the federation tree, both for simplicity and because computational Grids, while consisting of many nodes, typically consist of only a small number of distinct sites.

Fig. 2. Example of monitored system logic connection relationship

Configuration File’s Structure. Consider the monitored system shown in Fig. 2. Our aggregation node’s configuration file extends Ganglia’s, as explained below (Fig. 3). There are two types of data source: gmond (the state of one cluster) and gmetad (the state of a set of clusters). For a gmond data source, the spokesman is any node of the cluster, identified by an IP address and TCP port pair; generally, multiple nodes can be specified for redundancy. The parent field names the aggregation node, at the upper level or the same level, that collects the source’s monitoring data. There are two types of parent node: the primary parent is the main aggregation node, and the secondary parent is a backup node that begins to work when the primary is unavailable. A gmetad data source has only the parent field, which is likewise used for data collection. An example configuration file for a lowest-level aggregation node is shown in Fig. 3.

Fig. 3. Example of lowest aggregation node’s configuration file
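The fields just described (source type, tag, spokesmen, primary and secondary parent) can be modeled as below. The one-line-per-source syntax is invented purely for illustration and is not the format shown in Fig. 3; `DataSource`, `parse_line` and `active_parent` are hypothetical names. The node names follow Fig. 2 as described in the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class DataSource:
    tag: str                  # unique data-source tag
    kind: str                 # "gmond" (one cluster) or "gmetad" (set of clusters)
    spokesmen: List[str]      # redundant cluster nodes; gmond sources only
    primary: str              # main aggregation node
    secondary: Optional[str]  # backup aggregation node


def parse_line(line: str) -> DataSource:
    """Parse one line of an invented format:
    gmond  <tag> <host:port,...> primary=<agg> [secondary=<agg>]
    gmetad <tag> primary=<agg> [secondary=<agg>]
    """
    fields = line.split()
    kind, tag, rest = fields[0], fields[1], fields[2:]
    spokesmen: List[str] = []
    if kind == "gmond":
        spokesmen, rest = rest[0].split(","), rest[1:]
    elif kind != "gmetad":
        raise ValueError(f"unknown data source type: {kind}")
    opts = dict(f.split("=", 1) for f in rest)
    return DataSource(tag, kind, spokesmen, opts["primary"], opts.get("secondary"))


def active_parent(src: DataSource, alive: Set[str]) -> Optional[str]:
    """The secondary parent takes over only when the primary is unavailable."""
    if src.primary in alive:
        return src.primary
    if src.secondary is not None and src.secondary in alive:
        return src.secondary
    return None  # both parents down: this source's data cannot be collected


src = parse_line("gmond DataSource1 10.0.1.1:8649,10.0.1.2:8649 primary=Agg1 secondary=Agg2")
print(active_parent(src, alive={"Agg11", "Agg2"}))  # Agg2 (primary Agg1 is down)
```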

Procedure of Monitoring Data Collection. If all aggregation nodes work normally, data is collected just as in Ganglia, by polling child nodes at periodic intervals.


When an aggregation node fails (for example, Agg1 in Fig. 2), its upper aggregation node (the primary parent node, Agg11) sends a message instructing the failed node’s lower nodes (Data Source1 and Data Source2) to send their monitoring data to their secondary parent node (Agg2).

Reliability Analysis. This solution is more reliable than Ganglia. If the failed node is a cluster spokesman, the engine (the listening thread) can collect data from any other spokesman (at least two spokesmen are specified, and more may be added). If the failed node is an aggregation node that is the parent of a data source, that data source’s monitoring data can be collected by its other parent node. According to graph connectivity theory [8], the connectivity of the monitored system’s topological graph is at least 2, because any two aggregation nodes are joined by at least two different paths (provided these paths are internally disjoint, this follows from Menger’s theorem). When both parent nodes fail simultaneously, some data may not be collected because the graph becomes disconnected. Our solution therefore tolerates the failure of any number of spokesmen (as long as at least one is running) and the failure of one of the two parent nodes, which significantly improves reliability. In general, the higher the connectivity of the graph, the higher the reliability.
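The connectivity argument can be checked mechanically on a concrete topology. The sketch below tests whether a graph stays connected after removing any single node, which for graphs with at least three nodes is the same as vertex connectivity of at least 2. The topology is a hypothetical one built from the node names mentioned for Fig. 2 (the figure's exact links are not known here), and the helper names are illustrative.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]


def undirected(edges: List[Tuple[str, str]]) -> Graph:
    """Build an adjacency list from undirected links."""
    g: Graph = {}
    for a, b in edges:
        g.setdefault(a, []).append(b)
        g.setdefault(b, []).append(a)
    return g


def connected(g: Graph, removed: Set[str]) -> bool:
    """True if the nodes not in `removed` form a single connected component."""
    nodes = [n for n in g if n not in removed]
    if not nodes:
        return True
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:  # iterative depth-first search
        for nb in g[stack.pop()]:
            if nb not in removed and nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(nodes)


def tolerates_any_single_failure(g: Graph) -> bool:
    """Vertex connectivity >= 2 (valid for graphs with at least three nodes)."""
    return connected(g, set()) and all(connected(g, {v}) for v in g)


# Hypothetical topology: a plain tree versus the tree plus secondary-parent links.
tree = [("Agg11", "Agg1"), ("Agg11", "Agg2"), ("Agg1", "DS1"), ("Agg1", "DS2")]
ring = tree + [("Agg2", "DS1"), ("Agg2", "DS2")]
print(tolerates_any_single_failure(undirected(tree)))  # False: losing Agg1 cuts off DS1, DS2
print(tolerates_any_single_failure(undirected(ring)))  # True
```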

3.2 Accessing Monitoring Data by Permission

Ganglia is a distributed monitoring system that may span multiple clusters. If those clusters belong to different organizations, some cluster owners may not want everyone to see all the information Ganglia reports (for example, the running processes). We therefore propose a mechanism for accessing monitoring data by permission. By default, we add a one-byte “security” field to some metrics; anyone whose permission level is higher than this field can see the metric’s data, so the granularity of permission is a single metric. The cluster administrator can customize this by adding more metrics to security control or removing some metrics from it, and the scheme can be further extended to more levels of security control.
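A minimal sketch of the per-metric permission check, following the text's rule that a viewer sees a metric only if their permission is higher than its security byte. The table contents, the metric name `proc_list` and the function `visible_metrics` are hypothetical (`cpu_num` and `mem_free` are used as Ganglia-style metric names).

```python
from typing import Dict, Optional

# Hypothetical per-metric security bytes (0-255). Metrics absent from this
# table are not under security control and are visible to everyone.
DEFAULT_SECURITY: Dict[str, int] = {"mem_free": 1, "proc_list": 2}


def visible_metrics(metrics: Dict[str, object], permission: int,
                    security: Optional[Dict[str, int]] = None) -> Dict[str, object]:
    """Return only the metrics whose security byte is below the viewer's permission."""
    levels = DEFAULT_SECURITY if security is None else security
    return {name: value for name, value in metrics.items()
            if name not in levels or permission > levels[name]}


host = {"cpu_num": 4, "mem_free": 1024, "proc_list": ["sshd", "mpirun"]}
print(visible_metrics(host, permission=2))  # {'cpu_num': 4, 'mem_free': 1024}
```

An administrator customizes the policy by adding entries to, or removing them from, the table passed as `security`.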

3.3 More Control Functions and Aggressive Behaviors

Ganglia has little or no control mechanism. Sometimes people want not only to monitor passively but also to clear dead processes or restart misbehaving ones (only with the appropriate permission), or expect Ganglia to send email or a page to the cluster administrator when an important event occurs. We implement process control through root authorization. The cluster administrator is notified when important events occur, such as a disk becoming more than 95% full, an unacceptably high CPU load average, the death of an important process, a failure to connect to a specific IP address, or a service going down. We modified the Ganglia engine to implement this function.
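The event notification can be pictured as a table of rules evaluated against each host's latest metrics. In this sketch the 95% disk threshold comes from the text; the load limit of 8.0, the metric names `disk_used_pct` and `sshd_running`, and the `notify` callback (standing in for email or a pager) are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Snapshot = Dict[str, float]

# (description, predicate) pairs; thresholds other than 95% are assumed.
RULES: List[Tuple[str, Callable[[Snapshot], bool]]] = [
    ("disk over 95% full", lambda s: s["disk_used_pct"] > 95.0),
    ("load average too high", lambda s: s["load_one"] > 8.0),
    ("important process died", lambda s: s["sshd_running"] == 0),
]


def check(host: str, snap: Snapshot, notify: Callable[[str], None]) -> int:
    """Evaluate every rule and notify the administrator for each one that fires."""
    fired = 0
    for text, pred in RULES:
        if pred(snap):
            notify(f"{host}: {text}")
            fired += 1
    return fired


messages: List[str] = []
n = check("node7", {"disk_used_pct": 97.5, "load_one": 1.2, "sshd_running": 1},
          messages.append)
print(n, messages)  # 1 ['node7: disk over 95% full']
```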


3.4 Aggregating Monitoring Data by User Policy

Ganglia sends clients all the data that gmond (the Ganglia monitoring daemon) reports, even when the number and size of clusters are large and the network is slow. To speed up network access, it is necessary to send only part of the monitoring data, for example only dynamic information when the network is busy. We implement a policy for sending data selectively with four approaches. The first selects basic static data, such as the number of CPUs and the operating system (name, version, architecture). The second selects dynamic data, such as %CPU (user, nice, system, idle), load (1-, 5-, and 15-minute averages), memory (free, shared), processes (running, total), and free swap. The third selects by data granularity, i.e. it limits how deeply the XML data is recursively displayed, with three levels: cluster, node, and metric. The fourth is a user-defined combination of the above three. Users can still see all the data if they want to and have permission to see it.
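The selection approaches can be sketched as a filter over a nested cluster/host/metric tree. This is an illustrative model only: the dictionary stands in for Ganglia's XML, `select` is a hypothetical function, and the membership of the `STATIC` set is an assumption based on the examples above. The fourth, combined approach corresponds to setting both `depth` and `kind`.

```python
from typing import Dict, Optional, Set

Data = Dict[str, Dict[str, Dict[str, object]]]  # cluster -> host -> metric -> value

# Assumed split of Ganglia-style metric names into static and dynamic data.
STATIC: Set[str] = {"cpu_num", "os_name", "os_release", "machine_type"}


def select(data: Data, depth: str = "metric", kind: Optional[str] = None) -> dict:
    """Return a reduced view of the monitoring tree.

    depth: "cluster", "node" or "metric" (how far the tree is expanded).
    kind:  "static", "dynamic" or None (which metrics to keep at metric depth).
    """
    if depth == "cluster":
        return {cluster: {} for cluster in data}
    if depth == "node":
        return {c: {h: {} for h in hosts} for c, hosts in data.items()}
    if depth != "metric":
        raise ValueError(f"unknown depth: {depth}")

    def keep(metric: str) -> bool:
        return kind is None or (metric in STATIC) == (kind == "static")

    return {c: {h: {m: v for m, v in ms.items() if keep(m)} for h, ms in hosts.items()}
            for c, hosts in data.items()}


data = {"clusterA": {"n1": {"cpu_num": 2, "load_one": 0.5, "mem_free": 512}}}
print(select(data, depth="metric", kind="dynamic"))
# {'clusterA': {'n1': {'load_one': 0.5, 'mem_free': 512}}}
```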

3.5 Hierarchical Namespace for Monitoring Data

Ganglia has a flat namespace; that is, it assumes that all measurements on hosts can be easily represented by a simple key/value pair. This may hold for some metrics, such as the number of CPUs, but fails miserably for something like the %CPU user for process 1289, which needs at least three fields to represent it. We implement a hierarchical namespace of arbitrary depth (limited only by the maximum stack size), which can cope with varied data structures.
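A minimal sketch of such a namespace, where a measurement is addressed by a path of any length instead of a single key. The `Namespace` class and the value 37.5 are hypothetical; unlike the recursive, stack-limited implementation mentioned in the text, this sketch walks paths iteratively.

```python
from typing import Any, Dict, Sequence


class Namespace:
    """Metric tree addressed by paths of arbitrary depth (illustrative sketch)."""

    def __init__(self) -> None:
        self.root: Dict[str, Any] = {}

    def put(self, path: Sequence[str], value: Any) -> None:
        node = self.root
        for part in path[:-1]:  # create intermediate levels on demand
            node = node.setdefault(part, {})
        node[path[-1]] = value

    def get(self, path: Sequence[str]) -> Any:
        node: Any = self.root
        for part in path:
            node = node[part]  # raises KeyError if the path does not exist
        return node


ns = Namespace()
ns.put(["node3", "cpu_num"], 2)                         # a flat key still fits
ns.put(["node3", "process", "1289", "cpu_user"], 37.5)  # needs several levels
print(ns.get(["node3", "process", "1289", "cpu_user"]))  # 37.5
```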

4 Performance Analysis

We improve the data storage structure by adding a field to some metrics to handle permissions and by changing the flat namespace to a hierarchical one, and we modify the Ganglia engine (for example, the listening thread) to handle aggregation node failures and to send notifications when important events occur. These improvements add little operational overhead: our theoretical analysis indicates that a small sacrifice in performance buys greater reliability and flexibility. A quantitative comparison of Ganglia and our system, based on real-world deployments on distributed systems, is under way.

5 Summary

Our system improves Ganglia in five respects: greater overall reliability in the face of aggregation node failures, permission-based access to monitoring data, control functions, more flexible collection of monitoring data, and a hierarchical namespace for data storage. This system offers good performance and reliability for managing medium and large multi-cluster environments. Further enhancements and optimizations of this model are currently under investigation. Ganglia is already able to monitor PlanetLab [9], which currently consists of 102 nodes distributed across 42 sites spanning three continents: North America, Europe, and Australia. We believe our system will perform even better on CERNET or ChinaGrid in the future.

References

1. M. L. Massie, B. N. Chun, and D. E. Culler. The Ganglia Distributed Monitoring System: Design, Implementation, and Experience. Submitted for publication, February 2003.
2. The TeraGrid Project. TeraGrid project web page (http://www.teragrid.org), 2001.
3. I. Foster and C. Kesselman. Globus: A metacomputing infrastructure toolkit. International Journal of Supercomputer Applications, 11(2):115–128, 1997.
4. M. Sottile and R. Minnich. Supermon: A high speed cluster monitoring system. In Proceedings of Cluster 2002, September 2002.
5. E. Anderson and D. Patterson. Extensible, scalable monitoring for clusters of computers. In Proceedings of the 11th Systems Administration Conference, October 1997.
6. E. Amir, S. McCanne, and R. H. Katz. An active service framework and its application to real-time multimedia transcoding. In Proceedings of the ACM SIGCOMM ’98 Conference on Communications Architectures and Protocols, pp. 178–189, 1998.
7. B. N. Chun and D. E. Culler. Rexec: A decentralized, secure remote execution environment for clusters. In Proceedings of the 4th Workshop on Communication, Architecture and Applications for Network-Based Parallel Computing, January 2000.
8. F. Harary. Graph Theory. Addison-Wesley, Reading, Mass., 1969.
9. L. Peterson, D. Culler, T. Anderson, and T. Roscoe. A blueprint for introducing disruptive technology into the Internet. In Proceedings of the 1st Workshop on Hot Topics in Networks (HotNets-I), October 2002.

Effective OpenMP Extensions for Irregular Applications on Cluster Environments

Minyi Guo¹, Jiannong Cao², Weng-Long Chang³, Li Li¹, and Chengfei Liu⁴

¹ Department of Computer Software, The University of Aizu, Aizu-Wakamatsu City, Fukushima 965-8580, Japan, minyi@u-aizu.ac.jp
² Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
³ Department of Information Management, Southern Taiwan University of Technology, Tainan County, Taiwan
⁴ School of Computer and Information Science, University of South Australia, Mawson Lakes, South Australia 5095, Australia

Abstract. Sparse and unstructured computations are widely used in scientific and engineering applications. The problem inherent in such computations is called the irregular problem. In this paper, we propose some extensions to OpenMP directives aimed at executing irregular OpenMP codes efficiently in parallel. These directives cover scheduling for irregular loops, inspector/executor parallelization of irregular reductions, and the elimination of ordered loops. We also introduce implementation strategies for these extensions.

1 Introduction

Many codes in scientific and engineering computing involve sparse and unstructured problems in which array accesses are made through a level of indirection or through nonlinear array subscript expressions. This means that the data arrays are indexed either through the values in other arrays, called indirection arrays or index arrays, or through non-affine subscripts. The use of indirect or nonlinear indexing makes the data access patterns, i.e. the indices of the data arrays being accessed, highly irregular. Such a problem is called an irregular problem. Exploiting parallelism in irregular problems is very difficult because of their irregular data access patterns. A typical example is shown in Fig. 1. In the loop, elements are moved across the columns of a 2D array based on the information provided in the indirection arrays prev_elem and next_elem. The elements of array cell are shuffled and stored in array new_cell. If this loop is split across OpenMP threads, with different threads handling different values of the loop index, prev_elem and next_elem may have the same values in different threads at the same time. This may cause a problem when updating the value of new_cell. There are some simple solutions to this problem, such as making all the updates atomic, or having each thread compute temporary results which are then combined across threads. However, in the extremely common situation of sparse array access, neither of these approaches is efficient.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 97–104, 2004. © Springer-Verlag Berlin Heidelberg 2004

Fig. 1. A typical irregular loop

2 Requirements of Irregular Loop Scheduling

When parallelizing a loop in OpenMP, we may use the schedule clause to select among scheduling policies that determine how loop iterations are mapped onto threads. Four scheduling policies are available in OpenMP: static, dynamic, guided, and runtime scheduling. To achieve load balance for irregular loops, it is better to select dynamic or guided scheduling. In the dynamic and guided scheduling schemes, the chunk partition follows the owner computes rule. This rule specifies that, on a single-statement loop, each iteration is executed by the processor that owns the left-hand-side array reference of the assignment for that iteration. However, if irregular loops are partitioned using dynamic scheduling, the total execution time may not improve even if load balance is achieved, because the communication overhead among threads may be considerable, especially in cluster environments or software distributed shared memory (DSM) systems. Consider the following irregular loop.

Without loss of generality, assume that the array elements referenced in a given iteration are distributed onto different threads. Under the owner computes rule, the parts of that iteration executing S1, S2, and S3 would then be partitioned to the threads owning the respective left-hand sides. The following table shows the owner of each assignment and the communications and synchronizations required for the example loop.

We can conclude that the owner computes rule is often not the best choice for irregular codes. Another situation is shown in Fig. 1, where the indirection indices may have the same values on different threads. When such a loop is parallelized in a cluster environment, there are often inter-thread dependences in which two different iterations of the loop modify the same data. This prevents the loop from being processed in parallel and thus serializes its execution. OpenMP provides two mechanisms that help in the parallelization of such loops: the atomic and critical directives. These mechanisms, however, are excessively expensive in codes where not all accesses need to be done in mutual exclusion, and in two other situations: irregular reductions and loops in which only some iterations need to be executed in sequential order. We therefore propose a new OpenMP directive, irregular, for loops that have irregular data access patterns. The directive provides an efficient approach to load balancing, iteration partitioning, and irregular data access.
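To make the owner computes rule concrete, the sketch below assigns each iteration of a hypothetical irregular loop to the owner of its left-hand-side element and counts how many right-hand-side reads then become remote. Assumptions: every array has the same length n and is BLOCK-distributed over p threads in the same way, so index k lives on thread k // ceil(n/p) in any array; the index values are made up, and `block_owner` and `owner_computes` are illustrative names.

```python
from typing import List, Tuple


def block_owner(k: int, n: int, p: int) -> int:
    """Owner thread of element k when n elements are BLOCK-distributed over p threads."""
    return k // -(-n // p)  # block size is ceil(n / p)


def owner_computes(lhs: List[int], rhs: List[List[int]], n: int, p: int
                   ) -> Tuple[List[int], int]:
    """Assign iteration i to the owner of its left-hand-side element lhs[i].

    rhs[i] lists the array indices read by iteration i. Returns the thread of
    each iteration and the total number of remote (non-local) reads.
    """
    assign: List[int] = []
    remote = 0
    for i, target in enumerate(lhs):
        t = block_owner(target, n, p)
        assign.append(t)
        remote += sum(1 for k in rhs[i] if block_owner(k, n, p) != t)
    return assign, remote


# Hypothetical indirection: iteration i writes lhs[i] and reads rhs[i].
print(owner_computes([7, 0, 5, 2], [[0, 1], [6, 7], [2, 3], [4, 5]], n=8, p=2))
# ([1, 0, 1, 0], 8)
```

Here every write is local, but all eight reads are remote: the kind of communication overhead that makes the owner computes rule a poor fit for irregular loops.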

3 OpenMP Directive Extensions for Irregular Loops

This section introduces our extension to OpenMP, the irregular directive. The irregular directive can be applied to the parallel do directive in the following situations:

When the parallel region is recognized as an irregular loop: the compiler invokes a runtime library routine that partitions the irregular loop according to a special computes rule.

When an ordered clause is recognized in a parallel region whose loop is irregular: the compiler treats the loop as partially ordered; that is, some iterations are executed sequentially while others may be executed in parallel.

When a reduction clause is recognized in a parallel region whose loop is irregular: the compiler invokes inspector/executor routines to perform the irregular reduction in parallel.


The irregular directive in the extended OpenMP may take one of two forms. In the first form, its arguments irarray1, ..., irarrayN are the possible indirection arrays, and the directive tells the compiler that it will encounter an irregular loop. In the second form, its arguments expr1, ..., exprN are expressions such as loop index variables, and the directive tells the compiler that it will encounter a special irregular reduction or irregular ordered loop.

3.1 Irregular Loop Scheduling

Irregular loops are frequently found in the core of scientific and engineering applications. The following loop is a more complicated irregular loop, a simplified version extracted from the ZEUS-2D code:

For the above loop, the compiler takes communication overhead into account when partitioning the iterations to threads. Although the number of elements to be communicated is 6, the same as under the owner computes rule, the number of communication steps is reduced (three times). This improvement is important when the outer sequential time-step loop has many iterations. This illustrates that the owner computes rule is not always the optimal scheme for guiding the partitioning of loop iterations.

3.2 Partially Ordered Loops

A special case of the example in Fig. 1 occurs when a shared update must be performed not only in a mutually exclusive manner but also in order. Using the irregular clause in this case tells the compiler that iterations which may update the same data in different threads must be executed in order; other iterations are unaffected. An example of code using the irregular clause in this manner is shown as follows:

3.3 Irregular Reduction

Some scientific applications need to perform reduction operations that are not directly parallelizable because the index of the updated element is not the induction variable of the loop but a function of it, or a value read from another array. The following code shows an example of such a case.

The example shows that the computation is irregular because accesses to array y are determined by the index arrays idx1 and idx2, which prevents the compiler from analyzing the accesses exactly at compile time. The only way to parallelize the code using standard OpenMP is to protect the update of array y with either atomic or critical directives. These solutions, however, are excessively expensive in codes where not all accesses need to be done in mutual exclusion. Including the irregular clause in a parallel do directive that has a reduction clause tells the compiler that the reduction performed in the parallel loop has an irregular data access pattern but that some parts of it can be executed in parallel, enabling the compiler to generate code that handles this situation. The implementation of this clause is described in the next section.
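For comparison, the simple alternative mentioned in the introduction, where each thread accumulates into private temporary results that are combined afterwards, can be sketched as follows. The loop body `y[idx1[i]] += w[i]; y[idx2[i]] -= w[i]` is an assumed example, since the original listing is not available, and threads are simulated by a sequential loop over chunks; the function names are hypothetical.

```python
from typing import List


def reduce_sequential(y_len: int, idx1: List[int], idx2: List[int],
                      w: List[float]) -> List[float]:
    """Reference version of the assumed irregular reduction."""
    y = [0.0] * y_len
    for i in range(len(w)):
        y[idx1[i]] += w[i]
        y[idx2[i]] -= w[i]
    return y


def reduce_privatized(y_len: int, idx1: List[int], idx2: List[int],
                      w: List[float], nthreads: int) -> List[float]:
    """Each simulated thread updates a full private copy of y; copies are then summed."""
    chunk = -(-len(w) // nthreads)  # ceiling division: iterations per thread
    partials = []
    for t in range(nthreads):
        private = [0.0] * y_len  # memory cost: one whole copy of y per thread
        for i in range(t * chunk, min((t + 1) * chunk, len(w))):
            private[idx1[i]] += w[i]
            private[idx2[i]] -= w[i]
        partials.append(private)
    return [sum(col) for col in zip(*partials)]  # combine step


print(reduce_privatized(4, [0, 1, 2, 0], [1, 2, 3, 3], [1.0, 2.0, 3.0, 4.0], nthreads=2))
# [5.0, 1.0, 1.0, -7.0]
```

Every thread pays for a full copy of y and a combine pass over all of it, even when it touches only a few elements, which is why this approach is inefficient for sparse access patterns.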

4 Implementation

Our extended irregular directives for OpenMP can be implemented by adding several library routines to the compiler. The implementation strategies and algorithms are outlined in this section.

4.1 Implementation of Irregular Scheduling

We partition loop iterations for irregular codes by following the least communication computes rule [4]. Unlike the owner computes rule, this rule processes the whole body of the loop to be parallelized. Suppose that all arrays, including data arrays and index arrays, are initially distributed as BLOCK. The communication pattern of a partitioned loop iteration on a processor can be represented as a directed graph G = (V, E), called the communication pattern graph (CPG). The following algorithm describes how the iterations of a loop are partitioned to threads. For each thread and iteration, it uses the set of threads that have to send data to that thread before the iteration is executed and the set of threads that have to receive data from it after the iteration is executed; the numbers of processors in these two sets and the degrees of the two sets guide the partition.
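The sketch below is not the algorithm of [4]: it is a much simplified greedy stand-in for the least communication computes rule. For each candidate thread, it counts how many of the elements referenced anywhere in an iteration's loop body would be remote, and it picks the cheapest thread. BLOCK distribution is assumed as in the text; remote reads and writes are weighted equally, the number of communication steps is ignored, and load balance, which a practical partitioner must also consider, is not handled. Function names and index values are hypothetical.

```python
from typing import List, Tuple


def block_owner(k: int, n: int, p: int) -> int:
    """Owner thread of element k under a BLOCK distribution of n elements."""
    return k // -(-n // p)  # block size is ceil(n / p)


def least_communication(refs: List[List[int]], n: int, p: int) -> Tuple[List[int], int]:
    """Greedily place each iteration on the thread with the fewest remote references.

    refs[i] lists every array index referenced in the body of iteration i
    (left- and right-hand sides alike). Ties go to the lowest-numbered
    thread. Returns the assignment and the total number of remote accesses.
    """
    assign: List[int] = []
    total = 0
    for touched in refs:
        cost = [sum(1 for k in touched if block_owner(k, n, p) != t) for t in range(p)]
        best = cost.index(min(cost))
        assign.append(best)
        total += cost[best]
    return assign, total


# Hypothetical accesses: iteration i writes refs[i][0] and reads the rest.
print(least_communication([[7, 0, 1], [0, 6, 7], [5, 2, 3], [2, 4, 5]], n=8, p=2))
# ([0, 1, 0, 1], 4)
```

With these accesses, placing each iteration on the owner of its written element would make all eight reads remote, whereas the greedy placement needs only four remote accesses (each one a remote write).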

4.2 Implementation of Partially Ordered Loops

To parallelize partially ordered loops, the key technique is data dependence detection. However, testing most of the data dependences of irregular codes at runtime is very expensive. We proposed a symbolic analysis method [3], similar to the Range Test [2], which detects as many irregular data dependences as possible at compile time. In our symbolic analysis, symbolic solutions of a set of symbolic expressions are obtained under certain restrictions. We introduced symbolic analysis algorithms that obtain the solutions in terms of a set of equalities and inequalities.

Fig. 2. Performance of X2INTZC program on 8 processors SUN Cluster with three different scheduling strategies of OpenMP.

Fig. 3. Performance of IRRCFD program on 8 processors SUN Cluster with the optimization of irregular reduction and partial ordered loop of OpenMP.
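What the dependence analysis of Section 4.2 has to establish for a partially ordered loop (Section 3.2) can be illustrated with a simple runtime check: only iterations whose target element is also updated by an iteration on a different thread need ordered execution. This is a hypothetical illustration, since the text performs the analysis largely at compile time; `split_ordered` and its per-iteration `targets`/`threads` inputs are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def split_ordered(targets: List[int], threads: List[int]) -> Tuple[List[int], List[int]]:
    """Separate iterations that may run freely from those needing ordered execution.

    targets[i] is the element updated by iteration i and threads[i] the thread
    it is assigned to. An iteration must be ordered only if its target is
    also updated by an iteration on a different thread.
    """
    writers: Dict[int, Set[int]] = defaultdict(set)
    for target, thread in zip(targets, threads):
        writers[target].add(thread)
    free: List[int] = []
    ordered: List[int] = []
    for i, target in enumerate(targets):
        (ordered if len(writers[target]) > 1 else free).append(i)
    return free, ordered


# Six iterations, BLOCK-assigned to two threads; element 5 is shared across threads.
print(split_ordered([3, 5, 3, 7, 5, 5], [0, 0, 0, 1, 1, 1]))  # ([0, 2, 3], [1, 4, 5])
```

Iterations 0 and 2 both write element 3, but they run on the same thread, so their order is already preserved and they need no cross-thread synchronization.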

4.3 Implementation of Irregular Reduction

We use the GatherScatter approach to implement irregular reductions in our compiler. GatherScatter generates explicit messages between threads for distributed memory systems. Reductions with regular accesses can be converted directly into collective communication. Irregular reductions can be parallelized by generating an inspector that identifies the nonlocal data needed by each processor. The inspector also generates a communication schedule and performs address translation, modifying the indices of nonlocal data to use local buffers. Inspectors are expensive, but their cost can be amortized over many time steps. On each time step, an executor gathers nonlocal data using the communication schedule, performs the computation using local buffers, and scatters nonlocal results to the appropriate processors.
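A minimal sketch of the inspector/executor pattern for an assumed irregular reduction `y[idx[i]] += w[i]`. Assumptions: `y` and the iterations are BLOCK-distributed over two processors; distributed memory is simulated by one Python list, so "scatter" is an in-place update of the owner's elements rather than a real message; and because the loop only accumulates into `y`, the executor's gather phase is trivial (the nonlocal buffer starts at zero) and is omitted. The function names are hypothetical, not the authors' library routines.

```python
from typing import Dict, List, Tuple


def inspector(indices: List[int], lo: int, hi: int) -> Tuple[List[int], Dict[int, int]]:
    """Run once per processor owning y[lo:hi]: translate global indices.

    Local elements map to offsets 0..hi-lo-1; each distinct nonlocal element
    gets a buffer slot after the local section. Returns the translated
    indices and the schedule {global index: buffer slot}.
    """
    schedule: Dict[int, int] = {}
    local_idx: List[int] = []
    for g in indices:
        if lo <= g < hi:
            local_idx.append(g - lo)
        else:
            if g not in schedule:
                schedule[g] = (hi - lo) + len(schedule)
            local_idx.append(schedule[g])
    return local_idx, schedule


def executor(local_idx: List[int], w: List[float], schedule: Dict[int, int],
             y: List[float], lo: int, hi: int) -> None:
    """One time step: compute in local storage, then scatter nonlocal results."""
    work = [0.0] * ((hi - lo) + len(schedule))  # local section + nonlocal buffer
    for k, val in zip(local_idx, w):
        work[k] += val
    for off in range(hi - lo):  # results for locally owned elements
        y[lo + off] += work[off]
    for g, slot in schedule.items():  # scatter to the owning processors
        y[g] += work[slot]


idx, w = [0, 3, 1, 3, 2, 0], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
parts = [(0, 2, slice(0, 3)), (2, 4, slice(3, 6))]  # (lo, hi, iterations) per processor
plans = [inspector(idx[s], lo, hi) for lo, hi, s in parts]  # inspectors run once
y = [0.0] * 4
for _ in range(2):  # inspector cost amortized over two time steps
    for (lo, hi, s), (li, sched) in zip(parts, plans):
        executor(li, w[s], sched, y, lo, hi)
print(y)  # [14.0, 6.0, 10.0, 12.0]
```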

5 Experiments, Simulations, and Performance Results

We are constructing the library routines for the OpenMP implementation. We evaluated our extensions on two platforms: an SGI Origin2000 with 16 nodes, and a SUN workstation cluster with 8 × 400 MHz CPUs connected by 100 Mbps Ethernet. Since irregular scheduling and reduction are not part of the current OpenMP implementation, we implemented them by hand, translating this part of the OpenMP program into pthread routines on the SGI Origin2000 and into MPI routines on the cluster. The OpenMP compilers are the MIPSpro Fortran 77 compiler for the SGI Origin2000 and the SUN Forte Developer 6 Update 2 compiler for the SUN cluster, both of which support OpenMP. Due to limited space, we show only the results on the cluster. We selected X2INTZC, an irregular kernel of the fluid dynamics code ZEUS-2D, for our irregular scheduling study; X2INTZC includes some loops similar to Example 2. Another application, IRRCFD, is used to evaluate the irregular reduction and partially ordered loop optimizations. Figure 2 shows the 8-CPU speedup on the cluster for the three scheduling strategies. We observed that irregular scheduling improves performance because it reduces communication cost, an effect that is more significant when the number of time steps is large. Figure 3 presents the optimized results of IRRCFD on the cluster. In both cases, performance improves after the optimization of irregular reductions and partially ordered loops. The speedup on the cluster is lower than that on the SGI Origin2000 because the inspector/executor approach costs more there.

6 Conclusion

The performance of irregular scientific computing codes under current OpenMP implementations has not been well investigated. In its current version, OpenMP can execute such loops in irregular codes only sequentially, by using the atomic and ordered directives. In this paper, we proposed new directives to improve the decomposition of irregular loops, to parallelize irregular reductions, and to reduce the use of atomic and ordered loops. These directives are especially useful for distributed memory multicomputers such as cluster platforms, although implementing them in OpenMP compilers may cost extra computation at runtime. The experiments and simulations validate these proposals. Our proposed method would enable straightforward and efficient automatic parallelization of a wide range of scientific applications.

References

1. R. Asenjo, E. Gutierrez, Y. Lin, D. Padua, B. Pottenger, and E. L. Zapata. On the automatic parallelization of sparse and irregular Fortran codes. Technical Report 1512, CSRD, University of Illinois at Urbana-Champaign, December 1996.
2. W. Blume and R. Eigenmann. Nonlinear and symbolic data dependence testing. IEEE Transactions on Parallel and Distributed Systems, 9(12):1180–1194, December 1998.
3. M. Guo, Y. Pan, and C. Liu. Symbolic communication set generation for irregular parallel applications. The Journal of Supercomputing, 25(3):197–214, 2003.
4. M. Guo. Efficient loop partitioning for parallel codes of irregular scientific computations. IEICE Transactions on Information and Systems, E86-D(9):442–451, 2003.
5. E. Gutierrez, R. Asenjo, O. Plata, and E. L. Zapata. Automatic parallelization of irregular applications. Parallel Computing, 26:1709–1738, 2000.
6. Y. Hu, A. Cox, and W. Zwaenepoel. Improving fine-grained irregular shared-memory benchmarks by data reordering. In Proceedings of SC’00, Dallas, TX, November 2000.
7. D. S. Nikolopoulos, T. S. Papatheodorou, C. D. Polychronopoulos, J. Labarta, and E. Ayguadé. Is data distribution necessary in OpenMP? In Proceedings of SC 2000, 2000.
8. R. Ponnusamy, J. Saltz, A. Choudhary, S. Hwang, and G. Fox. Runtime support and compilation methods for user-specified data distributions. IEEE Transactions on Parallel and Distributed Systems, 6(8):815–831, 1995.

A Scheduling Approach with Respect to Overlap of Computing and Data Transferring in Grid Computing

Changqin Huang¹,², Yao Zheng¹,², and Deren Chen¹

¹ College of Computer Science, Zhejiang University, Hangzhou, 310027, P. R. China
² Center for Engineering and Scientific Computation, Zhejiang University, Hangzhou, 310027, P. R. China

Abstract. In this paper, we present a two-level distributed scheduling model and propose a scheduling approach that overlaps computing and data transfer. Based on network status, node load, and the relation between task execution and task data access, data transfer and computing can proceed concurrently in the following three cases: a) a task is executed on part of its dataset while the rest of its dataset is being replicated; b) the dataset of a scheduled task is replicated to a node on which another task is running; c) data exchange happens while dependent subtasks are running on different nodes. Theoretical analysis and experimental results demonstrate that the scheduling approach improves execution performance and resource utilization.
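The benefit of case (a) can be estimated with a simple two-stage pipeline model, which is our assumption rather than the paper's analysis: the dataset is split into chunks that are replicated one after another, and computing on a chunk may start once that chunk has arrived and the previous chunk has been processed. The function name is hypothetical, and network and load variations are ignored.

```python
from typing import List, Tuple


def completion_times(transfer: List[float], compute: List[float]) -> Tuple[float, float]:
    """Finish time of a task whose dataset is split into chunks.

    Returns (serial, overlapped). Serial: all chunks are replicated before
    computing starts. Overlapped: computing on chunk k starts once chunk k
    has arrived and chunk k-1 has been processed.
    """
    serial = sum(transfer) + sum(compute)
    arrived = done = 0.0
    for t, c in zip(transfer, compute):
        arrived += t                   # chunks arrive back to back
        done = max(done, arrived) + c  # wait for the data and for the CPU
    return serial, done


print(completion_times([2.0] * 4, [3.0] * 4))  # (20.0, 14.0)
```

For k equal chunks with transfer time a and compute time b, the overlapped time is a + b + (k - 1)·max(a, b); here 2 + 3 + 3·3 = 14, compared with k(a + b) = 20 without overlap.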

1 Introduction

A computational grid is an emerging computing infrastructure that enables effective access to distributed and heterogeneous computing resources in order to serve the needs of a Virtual Organization (VO) [1]. The performance it can deliver varies dynamically with resource competition, network status, task type, and so on. Therefore, resource management and scheduling are key and difficult issues. In data management, replicating data from primary repositories to other locations at an appropriate moment can be an important optimization step [2,3]. In the present paper, we focus on scheduling approaches suitable for large-scale data-intensive applications, or applications that are both data-intensive and computing-intensive, which are widespread in engineering and scientific computation. We adopt a distributed scheduling model with two levels of schedulers. The scheduler schedules task execution on the basis of a variety of metrics and constraints, and at the same time it tries to reduce task execution time, and so improve performance, by overlapping computing with data transferring.

This paper is organized as follows: Section 2 reviews related work on grid scheduling. Section 3 describes our approach and the proposed scheduling model. Section 4 presents the algorithm and its analysis. Section 5 presents case studies with experimental results, and Section 6 concludes the paper.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 105–112, 2004. © Springer-Verlag Berlin Heidelberg 2004


2 Related Work

For the development and deployment of applications on computational grids, there are a number of approaches to scheduling. Vadhiyar et al. [4] present a metascheduler with a 2D chart and metascheduler types. Berman et al. [5] adopt performance evaluation techniques [6] and use the NWS [7] resource monitoring service for application-level scheduling. Abraham et al. [8] use a parametric engine and heuristic algorithms. Zomaya et al. [9] apply a genetic algorithm. Beaumont et al. [10] target independent, equal-sized tasks. Dogan et al. [11] consider the problem of scheduling independent tasks with multiple QoS requirements. The above schedulers [4,5,8-11] either deal with independent tasks or ignore the issue of efficient replication. Casanova et al. [12] use an adaptive scheduling algorithm for parameter sweep applications that takes data storage into account; the essential difference between their work and ours is that our heuristic actively replicates datasets. Thain et al. [13] describe a system that links jobs and data by binding execution and storage sites into I/O communities, but they do not address policy issues. Ranganathan et al. [14] focus on data-intensive applications, in which data movement can either be tightly bound to job scheduling decisions or be decoupled from them. They do not consider cases in which task computing and data transferring proceed in parallel on a node.

3 Scheduling Strategy and Scheduling Model

To provide the context for this scheduling strategy and system model, we first describe the scheduling scenario in detail. Each site (LAN) comprises a number of nodes (such as PCs, clusters, and supercomputers), and each node has a limited amount of storage. A dataset is initially placed on the node where the user's task is submitted, or it is distributed over the nodes of this site according to a certain distribution. The target computational grid consists of heterogeneous nodes connected by LANs and/or WANs, so the whole grid is hierarchical: node, LAN, and WAN. Scheduling within a single node is not considered here.

The scheduling is divided into two levels: the Global Scheduler (GS), corresponding to a WAN, and the Local Scheduler (LS), corresponding to a LAN. First, tasks are submitted to a site/node, where the associated LS is activated to schedule them. If this scheduler fails to schedule them, the requests are passed to the associated GSs. A GS is responsible for determining to which site(s) in its domain these tasks are sent. Finally, the corresponding LS produces a complete schedule using the local scheduling algorithm, cancels these requests at the other schedulers, has the tasks executed, and has the results returned.

As far as algorithms are concerned, the "best" schedule takes into account information such as CPU speed, network status between hosts, and task properties. This information is retrieved from resource information providers such as the Network Weather Service (NWS) and the Metacomputing Directory Service (MDS). Our approach requires an application-specific performance model and a scheduler. The scheduler schedules tasks and decides on data transfers. Its goal is to produce a schedule that minimizes makespan and maximizes the utilization rate. Each scheduler has two components and two queues, as shown in Fig. 1; their functionalities and relations are described as follows.
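The two-level flow described above (the LS tries first, the request escalates to a GS, and the chosen site's LS produces the final schedule) can be sketched in Python as follows. This is a minimal sketch under our own assumptions: the class names, the `try_schedule` rule (pick the fastest node whose predicted run time meets the deadline), and the order in which the GS examines sites are hypothetical and not taken from the authors' system; request cancellation and result return are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Task:
    name: str
    predicted_time: Dict[str, float]  # node -> predicted run time (assumed available)
    deadline: float


@dataclass
class LocalScheduler:
    """Hypothetical LS for one site (LAN)."""
    site: str
    nodes: List[str]

    def try_schedule(self, task: Task) -> Optional[str]:
        # Keep only the nodes of this LAN on which the task meets its deadline.
        feasible = [n for n in self.nodes
                    if task.predicted_time.get(n, float("inf")) <= task.deadline]
        # Assumed rule: choose the node with the smallest predicted run time.
        return min(feasible, key=lambda n: task.predicted_time[n]) if feasible else None


@dataclass
class GlobalScheduler:
    """Hypothetical GS for a WAN domain that contains several sites."""
    sites: List[LocalScheduler]

    def schedule(self, task: Task, origin: LocalScheduler) -> Optional[Tuple[str, str]]:
        # Level 1: the LS of the submitting site tries first.
        node = origin.try_schedule(task)
        if node is not None:
            return origin.site, node
        # Level 2: the request is escalated; the GS picks another site in its
        # domain, and that site's LS produces the final schedule.
        for ls in self.sites:
            if ls is origin:
                continue
            node = ls.try_schedule(task)
            if node is not None:
                return ls.site, node
        return None  # corresponds to "failure"
```

For example, a task submitted at a site whose only node is too slow for its deadline is placed by the GS on the fastest feasible node of another site; if no site can meet the deadline, `None` plays the role of "failure".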

Fig. 1. Scheduling model and interaction in a scheduler or among schedulers

Task Scheduling Component (TSC): The TSC makes scheduling decisions on the basis of information about resources and tasks and, if computing and data transferring can overlap, passes data transferring messages to the Data Transferring Component (DTC) (details are discussed in Section 4). As long as there are tasks in the Arrived Task Queue (ATQ), the TSC remains active: it produces a schedule for all tasks in the associated ATQ, puts the scheduled tasks into the associated Scheduled Task Queue (STQ), and directs the tasks to be executed on the selected resources. If the TSC cannot schedule a task within a limited period of time, it delivers the task request to the ATQ of the associated GS, which schedules the task in a similar way; if the task cannot be scheduled there either, "failure" is returned.

Data Transferring Component (DTC): The DTC keeps track of the popularity of each locally available dataset. It works in the following two ways: a) Only when the DTC receives the corresponding messages from the TSC does it decide how to replicate the datasets needed by tasks in the STQ or by tasks being executed, provided that the CPU is busy but the connected network is idle. b) It decides how to exchange the data needed by dependent subtasks being executed. Finally, it directs the nodes to transfer the datasets, so that computing and data transferring are performed concurrently.

Arrived Task Queue (ATQ) and Scheduled Task Queue (STQ): An ATQ stores all tasks delivered to its scheduler. A task is put into an ATQ when its request arrives and is taken out when it has been scheduled. The STQ stores the tasks scheduled by the local TSC, and a task is taken out of it when the task starts execution.
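The interplay of the two queues with the TSC and the DTC can be sketched as below. This is a minimal Python sketch assuming FCFS order in the TSC, tasks identified by name strings, and a caller that reports whether the CPU is busy and the network is idle; the class and method names are hypothetical. The DTC here only records which replications it would order (way a); data exchange between dependent subtasks (way b) is left out.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Set, Tuple


class SchedulerQueues:
    """Hypothetical ATQ/STQ pair with simplified TSC and DTC steps."""

    def __init__(self) -> None:
        self.atq: Deque[str] = deque()              # Arrived Task Queue
        self.stq: Deque[Tuple[str, str]] = deque()  # Scheduled Task Queue: (task, node)
        self.replicated: Set[str] = set()           # tasks whose replication was ordered

    def arrive(self, task: str) -> None:
        self.atq.append(task)  # a task enters the ATQ when its request arrives

    def tsc_step(self, pick_node: Callable[[str], str]) -> None:
        # The TSC stays active while the ATQ is non-empty (FCFS order assumed).
        while self.atq:
            task = self.atq.popleft()
            self.stq.append((task, pick_node(task)))

    def dtc_step(self, cpu_busy: bool, network_idle: bool) -> List[Tuple[str, str]]:
        # Way a): replicate data for STQ tasks only while the CPU is busy and
        # the connected network is idle, so that transfer overlaps computing.
        if not (cpu_busy and network_idle):
            return []
        orders = [(t, n) for t, n in self.stq if t not in self.replicated]
        self.replicated.update(t for t, _ in orders)
        return orders

    def start_next(self) -> Optional[Tuple[str, str]]:
        # A task leaves the STQ when it comes into execution.
        return self.stq.popleft() if self.stq else None
```

Keeping a `replicated` set is a design choice of this sketch: it stops the DTC from ordering the same replication twice when it is invoked repeatedly while the network stays idle.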

4 Scheduling Algorithm and Theoretical Analysis

4.1 Assumptions

Based on the above scheduling model, and to limit network traffic, we restrict the execution of a task to a single LAN. To simplify the scheduling, we make the following assumptions: a) Each task/subtask (subtasks exist when a task is divided into parallel subtasks) is assigned to a specific node at which it can meet its deadline. b) The time spent on it can be predicted by related techniques (e.g., PACE [15]). c) Before execution, each task/subtask can obtain information about the relation between its computing and its dataset (e.g., the computing may proceed on part of the dataset). d) Tasks/subtasks can be pre-scheduled on the basis of task status and grid information. The core algorithm by which the scheduler schedules task execution is not fixed and can be selected by users (e.g., the FCFS algorithm or a genetic algorithm).

4.2 Scheduling Algorithm

Both the GS and the LS adopt the same approach, although their core algorithms may differ. A distributed task starts to run only after it has obtained its dataset or the required subset.
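Because the core algorithm is pluggable and both levels use the same approach, a scheduler can simply take the core as a parameter. The Python sketch below shows this with FCFS, which the paper names, and with earliest-deadline-first as a hypothetical stand-in for another user-selected core (the paper's other example, a genetic algorithm, is not implemented here). The task fields and function names are assumptions for illustration.

```python
from typing import Callable, List, NamedTuple


class Task(NamedTuple):
    name: str
    arrival: float   # arrival time at the scheduler's ATQ
    deadline: float


Core = Callable[[List[Task]], List[Task]]


def fcfs(tasks: List[Task]) -> List[Task]:
    """First-Come-First-Served: order tasks by arrival time."""
    return sorted(tasks, key=lambda t: t.arrival)


def edf(tasks: List[Task]) -> List[Task]:
    """Hypothetical alternative core: earliest deadline first."""
    return sorted(tasks, key=lambda t: t.deadline)


def schedule_order(tasks: List[Task], core: Core) -> List[str]:
    # The same wrapper serves both the LS and the GS; only `core` differs.
    return [t.name for t in core(tasks)]
```

With a video conversion task arriving at time 0 (deadline 100) and a Monte Carlo task arriving at time 1 (deadline 20), FCFS keeps arrival order while EDF puts the Monte Carlo task first.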


4.3 Theoretical Analysis

The metrics used in our analysis are the makespan and the average resource utilization rate. We analyze only the efficiency gained from our data transferring strategy. For comparison, we assume a generic algorithm without our approach, in which a scheduled task/subtask must obtain all of its dataset by replication before it starts execution; this is the opposite of our approach. To simplify the analysis, concurrent computing and data exchange between dependent subtasks being executed are not considered here. We consider the above algorithm under the condition that the dataset of a task/subtask is divisible. Let p denote this task or subtask and m its data size, and let the dataset be divided into n equal blocks. Let x denote the percentage decrease in CPU performance when data are transferred over the network concurrently; the remaining quantities are the speed of transferring data over the network, the speed at which a CPU processes data, the makespan of the generic scheduling algorithm, and the makespan of our algorithm. Computing and data transferring are performed concurrently, except while the first block of data is being transferred, so we have the following equation:

If

then

conversely,

Obviously, under the first condition, the makespan of our algorithm is smaller than that of the generic algorithm. In general, when a multi-storage system exists and m is very large, the same holds under the second condition. Overall, our algorithm reduces the makespan considerably in general. We now compare the average resource (CPU) utilization rate of the generic scheduling algorithm with that of our algorithm over a given period of time:


Hence, if x is small and m is large, the average resource utilization rate of our algorithm increases considerably. If the dataset of a task/subtask cannot be divided, computing and data transferring can still overlap between the task/subtask being executed and a scheduled task/subtask in the STQ. A similar analysis leads to a similar conclusion: only if x is small and m is large are the makespan reduced and the average resource utilization rate increased considerably.
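The makespan comparison above can be illustrated numerically. The Python sketch below uses one simple pipeline model of our own, not necessarily the authors' formulas: blocks are transferred back to back at a constant network speed `v_net`; the CPU processes data at `v_cpu`, slowed to `v_cpu * (1 - x)` while any transfer is still in progress; and block i can be processed only after it has arrived. The function names and the utilization measure (full-speed computing time divided by makespan) are hypothetical.

```python
def generic_makespan(m: float, v_net: float, v_cpu: float) -> float:
    """Generic algorithm: replicate the whole dataset first, then compute."""
    return m / v_net + m / v_cpu


def overlapped_makespan(m: float, n: int, v_net: float, v_cpu: float, x: float) -> float:
    """Overlap computing with transferring, block by block (assumed model)."""
    assert n >= 1 and 0.0 <= x < 1.0
    block = m / n
    t_end = n * block / v_net      # time at which the last block has arrived
    slow = v_cpu * (1.0 - x)       # CPU speed while a transfer is in progress
    finish = 0.0                   # completion time of the previous block
    for i in range(1, n + 1):
        arrival = i * block / v_net
        start = max(arrival, finish)
        if start >= t_end:         # all data present: full CPU speed
            finish = start + block / v_cpu
            continue
        work_slow = (t_end - start) * slow  # data processed before transfers end
        if block <= work_slow:
            finish = start + block / slow
        else:
            finish = t_end + (block - work_slow) / v_cpu
    return finish


def cpu_utilization(m: float, v_cpu: float, makespan: float) -> float:
    """Assumed measure: full-speed computing time divided by the makespan."""
    return (m / v_cpu) / makespan
```

With m = 100, v_net = v_cpu = 10 and n = 10 blocks, the generic makespan is 20, while the overlapped makespan is 11 for x = 0 and 15.5 for x = 0.5; the utilization rises from 0.5 to about 0.91 for x = 0. The gain shrinks as x grows, in line with the conclusion that x should be small.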

5 Experiments

We have developed an engineering-computation-oriented visual grid prototype system, named VGRID, in which tasks are scheduled automatically in a visual fashion and the task scheduling core algorithm can be selected. In this environment, three pairs of experiments were designed using the above scheduling approach. The tasks consist of iterations of two application examples: Monte Carlo integration in high dimensions, which involves transferring a small dataset, and a video conversion application, which involves replicating and compressing a large dataset. All nodes are PCs with 2.0 GHz Intel Pentium 4 processors, 512 MB of memory, 100 Mbit/s Ethernet, and 80 GB 7200 rpm hard disks. Two approaches are used in the experiments: Approach A adopts the FCFS algorithm, whereas Approach B adopts the FCFS algorithm together with our scheduling approach. The experiments are as follows.
Case 1: One task (video conversion) on one node.
Case 2: Four tasks (Monte Carlo simulation, video conversion, Monte Carlo simulation, and video conversion, in sequence) on one node.
Case 3: The same four tasks in the same sequence on three nodes.
Experimental results are illustrated in Figs. 2 and 3. As shown in these figures, different task types, scheduled task sequences, and grid resources lead to different performance scenarios. In all experiments, the average resource utilization rate increases by more than 15% when our algorithm is adopted. In Case 1, however, the makespan decreases very little while the average resource utilization rate increases by 18%. This means that overlapping computing with data transferring brings no benefit here but adds a little workload. Therefore, our algorithm is not well suited to tasks of this type, because the performance decrease percentage x is large for them; this occurs when grid resources are contended.

Fig. 2. Variation of the makespan

Fig. 3. Variation of the average resource utilization rate

6 Conclusions

A scheduling model and an associated algorithm were proposed in the present work. The approach tries to reduce task execution time, and so improve performance, by overlapping computing with data transferring. We have analyzed the algorithm theoretically and evaluated it with three tests based on an FCFS core algorithm in VGRID under different conditions. Our results show, first, that the approach clearly improves system performance, and second, that the relation between task execution and its dataset, as well as the data size, has a significant impact on system performance. Although these results are promising, in interpreting their significance we must bear in mind that they are based on simplified grid scenarios. The case in which dependent subtasks exchange data has not yet been studied in detail.

Acknowledgements. The authors wish to thank the National Natural Science Foundation of China for the National Science Fund for Distinguished Young Scholars under grant Number 60225009. We would like to thank the Center for Engineering and Scientific Computation, Zhejiang University, for the computational resources with which the research project has been carried out.

References

1. I. Foster, C. Kesselman et al.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of High Performance Computing Applications, 2001, 15(3): 200–222
2. J. Subhlok and G. Vondran: Optimal Use of Mixed Task and Data Parallelism for Pipelined Computations. Journal of Parallel and Distributed Computing, 2000, 60: 297–319
3. O. Beaumont, A. Legrand et al.: Scheduling Strategies for Mixed Data and Task Parallelism on Heterogeneous Clusters and Grids. Proc. of the 11th Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2003
4. S. S. Vadhiyar and J. J. Dongarra: A Metascheduler for the Grid. Proc. of the 11th IEEE International Symposium on High Performance Distributed Computing, 2002
5. F. Berman et al.: Adaptive Computing on the Grid Using AppLeS. IEEE Transactions on Parallel and Distributed Systems, 2003, 14(4): 369–382
6. W. Smith, I. Foster, and V. Taylor: Predicting Application Run Times Using Historical Information. Proc. of the IPPS/SPDP Workshop on Job Scheduling Strategies for Parallel Processing, 1998
7. R. Wolski et al.: The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing. Future Generation Computer Systems, 1999, (5-6): 757–768
8. A. Abraham, R. Buyya et al.: Nature’s Heuristics for Scheduling Jobs on Computational Grids. Proc. of the 8th International Conference on Advanced Computing and Communications, Cochin, India, 2000
9. A. Y. Zomaya et al.: Observations on Using Genetic Algorithms for Dynamic Load Balancing. IEEE Transactions on Parallel and Distributed Systems, 2001, 9: 899–911
10. O. Beaumont and L. Carter: Bandwidth-Centric Allocation of Independent Tasks on Heterogeneous Platforms. Proc. of the International Parallel and Distributed Processing Symposium, 2002
11. A. Dogan and F. Özgüner: Scheduling Independent Tasks with QoS Requirements in Grid Computing with Time-Varying Resource Prices. Proc. of Grid Computing-GRID 2002, 2002, 58–69
12. H. Casanova, G. Obertelli et al.: The AppLeS Parameter Sweep Template: User-Level Middleware for the Grid. Proc. of Supercomputing 2000, Denver, 2000
13. D. Thain, J. Bent et al.: Gathering at the Well: Creating Communities for Grid I/O. Proc. of Supercomputing 2000, Denver, 2000
14. K. Ranganathan and I. Foster: Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications. Proc. of the 11th International Symposium on High Performance Distributed Computing, 2002
15. G. R. Nudd et al.: PACE – A Toolset for the Performance Prediction of Parallel and Distributed Systems. Journal of High Performance Computing Applications, 2000, 3: 228–251

A Deadline and Budget Constrained Cost-Time Optimization Algorithm for Scheduling Dependent Tasks in Grid Computing

Haolin Feng1,3, Guanghua Song2,3, Yao Zheng2,3, and Jun Xia2,3

1 Chu Kechen Honors College, Zhejiang University, Hangzhou, 310027, P. R. China
2 College of Computer Science, Zhejiang University, Hangzhou, 310027, P. R. China
3 Center for Engineering and Scientific Computation, Zhejiang University, Hangzhou, 310027, P. R. China

Abstract. The computational grid has a promising future in large-scale computing because it enables the sharing of widely distributed computing resources. Good resource management with efficient scheduling algorithms is needed to take full advantage of it. Many scheduling algorithms in grid computing are designed for independent tasks. However, communication is very common in scientific computing programs. In this paper, we propose an easily implemented algorithm for scheduling tasks with some communication. Our algorithm, which is based on Binary Integer Programming, is suitable for a large proportion of scientific computing programs. It meets the users’ quality of service (QoS) requirements and minimizes a combination of the cost and the time consumed by the users’ programs. We give an example of scheduling a typical scientific computing task to show the power of our algorithm. In our experiment, the grid resources consist of an SGI Onyx 3900 supercomputer, four SGI Octane workstations, four Intel P4 2.0 GHz PCs, and four Intel P4 1.8 GHz PCs.

1 Introduction

Computational grids [1] are becoming more and more popular in large-scale computing because they enable the sharing of computing resources distributed all over the world. These resources are owned by many different organizations, so good resource management systems are essential to take full advantage of grids. The published literature provides various management systems with different policies and principles [2, 3, 4]. The most important part of a good management system is an efficient algorithm for scheduling tasks: it is the management system that performs resource discovery [8], selects machines, and schedules the tasks for the users. Numerous scheduling algorithms are now available [5, 6, 7]. Most of them assume that the tasks are independent. Under this assumption, the existing algorithms still work for many scientific and engineering computing problems. However, the majority of problems in scientific and engineering computing require communication among tasks, e.g., computations in solid mechanics and fluid dynamics. Algorithms that ignore the communication among tasks are clearly limited and cannot take full advantage of powerful computational grids. On the other hand, scheduling general dependent tasks is very challenging, and so far no satisfactory solution exists, because of factors such as the heterogeneous architectures of machines in the same grid and the limited network bandwidth. Thus, as a first step, we add some constraints on communication in order to obtain an improved scheduling algorithm.

In this paper, we present a model that can schedule both independent tasks and dependent tasks with a special communication pattern. In this model, we can balance the cost against the job completion time for different clients; that is, we provide an optimal solution for an objective function that combines the cost and the job completion time. Moreover, the algorithm meets both the deadline and the budget set by the clients. We select a set of machines for a batch of tasks, either independent or dependent under a constraint condition, and we show that the solution can be obtained by solving a classical problem, Binary Integer Programming [11].

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 113–120, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Problem Modeling

Suppose a user has submitted a program consisting of N dependent tasks, and M machines are available in the grid at the moment (M ≥ N). The machines differ in architecture, computing power, and price of CPU time, as well as in their distances from the management system, which result in different data transfer speeds. Our goal is to assign each task to a specific machine (processor) so as to minimize the total "cost", comprising the money and the time needed to complete all N tasks. Note that the tasks sometimes communicate with each other, which makes the problem even more complicated.

2.1 Assumptions

1) For each task, the time spent on it when it is assigned to a specific processor in the grid can be known. Many techniques are available to achieve this [9, 10].
2) Tasks are dependent and communicate with each other. Communication happens whenever every task has completed a certain percentage of its work. For example, communication happens when 10% of the work of every task has been done, happens again when another 6% of the work of every task has been done, and so on. We need not know these percentages before execution, but we must know how many communications will happen as well as the scale of each communication.
Remark. Although this is a restriction, most problems in scientific computing satisfy this assumption; for example, computations in computational structural analysis, computational fluid dynamics, and DNA sequencing all satisfy it. Moreover, a program with independent tasks can be regarded as a program whose tasks communicate after 100% of their work has been completed. Therefore, our algorithm works with both independent and dependent tasks.
3) In most accounting systems, the charge per unit of time is proportional to the resources used by the user at that moment. Hence, if the resources used on a specific processor are close to zero over a period of time, the charge is also close to zero. It is therefore reasonable to assume that no money is consumed while waiting for communication.

2.2 Algorithm Description

2.2.1 Definition of the Objective Function

We now define a proper function to measure the "cost". Defining it as a weighted sum of the money and the job completion time is a good idea [5]. People value money and time differently, so our clients must be able to specify the weights for money and for time. For example, if a user considers one unit of time as valuable as one hundred units of money, the weights of money and time are set accordingly. The users can also set a deadline and the maximum cost they can afford (the budget). That is why we call it "a deadline and budget constrained cost-time optimization algorithm". Based on this logic, we define the objective function as the weighted sum of the total monetary cost and the job completion time. In it, the time spent on each task when it is assigned to a given processor is, as we have assumed, known for every task-processor pair; each processor has a cost per unit of CPU time; and a binary variable equals 1 if the task is assigned to that processor and 0 otherwise. T stands for the duration from the beginning of the first task to the end of the last task.

2.2.2 Constraints of the Problem

Here q stands for the number of communications that happen during the process.


The symbols taken from equation (1) have the same meaning here. Inequalities (3) and (4) express the deadline and budget constraints. Equation (5) means that each task is assigned to one and only one processor. Inequality (6) means that each processor processes at most one of the tasks, because communication cannot happen until every task has finished the same percentage of its work, as assumed in Subsection 2.1. Equation (7) gives the duration from the end of the (k-1)th communication to the end of the kth communication, in terms of the percentage of work completed between these two communications and, for each k, the time of the kth communication of a task assigned to a given machine (the binary assignment variable determines whether or not the task is assigned to that machine). Equation (7) expresses that another fraction of every task is finished and then the kth communication happens. We use the first "max" because the communication cannot happen until all the tasks are ready for it. The second term is the time spent on the kth communication; here we use "max" because network quality varies from place to place, and the slowest network determines the communication time. Equation (8) gives the job completion time of the N tasks; its last term accounts for the fraction of each task that is left after the last communication.
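For reference, one way to write the full model in LaTeX, consistent with the descriptions of the objective and of constraints (3) to (8), is sketched below. The symbol names are our own, hypothetical choices: t_{ij} is the time of task i on processor j, c_j the price of processor j per unit CPU time, x_{ij} the binary assignment variable, w_m and w_t the weights of money and time, D and B the deadline and budget, p_k the fraction of work done before the kth communication, and \tau_{ijk} the time of the kth communication of task i on processor j.

```latex
\begin{aligned}
\min\; Z &= w_m \sum_{i=1}^{N}\sum_{j=1}^{M} c_j\, t_{ij}\, x_{ij} + w_t\, T \\
\text{s.t.}\quad T &\le D && \text{(deadline)}\\
\sum_{i=1}^{N}\sum_{j=1}^{M} c_j\, t_{ij}\, x_{ij} &\le B && \text{(budget)}\\
\sum_{j=1}^{M} x_{ij} &= 1, \quad i = 1,\dots,N && \text{(one processor per task)}\\
\sum_{i=1}^{N} x_{ij} &\le 1, \quad j = 1,\dots,M && \text{(at most one task per processor)}\\
T_k &= \max_{i}\sum_{j} p_k\, t_{ij}\, x_{ij} + \max_{i}\sum_{j} \tau_{ijk}\, x_{ij}, \quad k = 1,\dots,q\\
T &= \sum_{k=1}^{q} T_k + \max_{i}\sum_{j}\Bigl(1 - \sum_{k=1}^{q} p_k\Bigr) t_{ij}\, x_{ij}\\
x_{ij} &\in \{0, 1\}
\end{aligned}
```

Under this notation, setting every \tau_{ijk} to zero makes the last two lines collapse to T = \max_i \sum_j t_{ij} x_{ij}, because the fractions p_k and the remainder 1 - \sum_k p_k sum to one; since T is minimized, this can in turn be relaxed to the linear constraints T \ge \sum_j t_{ij} x_{ij} for every i.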

3 Model Modifications and Simplification

Combining Equations (7) and (8), we obtain the following equation:

From now on, we use Equation (9) instead of Equations (7) and (8). A large proportion of scientific computing problems spend most of their time on computing, and the time spent transferring data is relatively negligible. (This does not mean that communication can be ignored, because much time is usually spent waiting before communications.) When the time for transferring data is very small, we can ignore the differences among the networks and simplify the communication-time term accordingly. Equation (9) is then replaced with the following equation:

We now claim that Equation (10) can be replaced with the following inequality:

We state that this replacement does not change the solution; the proof is omitted here due to space limitations. In this special case, our model can be presented as follows:
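The effect of this simplification can be checked numerically. The Python sketch below evaluates the job completion time in the form described for Equations (7) and (8) for one fixed assignment. Its inputs are hypothetical: `times[i]` is the computing time of task i on its assigned processor, `fractions[k]` the share of work done before communication k, and `comm[k][i]` the time of communication k for task i. When all communication times are zero, the result reduces to the largest single computing time.

```python
from typing import Sequence


def completion_time(times: Sequence[float], fractions: Sequence[float],
                    comm: Sequence[Sequence[float]]) -> float:
    """Job completion time for one fixed assignment (assumed form of Eqs. (7)-(8))."""
    assert len(comm) == len(fractions) and sum(fractions) <= 1.0
    total = 0.0
    for p_k, comm_k in zip(fractions, comm):
        # Stage k: wait for the slowest task to finish its share p_k,
        # then for the slowest link to finish the kth communication.
        total += max(p_k * t for t in times) + max(comm_k)
    remaining = 1.0 - sum(fractions)
    # After the last communication, each task still has `remaining` of its work.
    return total + max(remaining * t for t in times)
```

With the 10% and 6% stages used as an example in Subsection 2.1 and no communication time, four tasks of 100, 80, 120 and 90 time units finish at 120, the largest computing time; adding communication only contributes the slowest link's time in each stage (for instance 4 + 0.5, giving 124.5).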


Apart from the requirement that the assignment variables be binary, all the constraints in our model are linear. The model is therefore straightforward and reduces to a classical Binary Integer Programming problem, for which many solution methods are available. In practice, the user may have other requirements for some tasks, for example reliability, so not all processors in the grid are suitable for those tasks. If a certain machine is not suitable for a certain task, we set the corresponding time term to a value large enough to rule out that assignment, so that our algorithm avoids assigning that task to that machine. Therefore, our algorithm can schedule tasks with QoS requirements.
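For a small instance, the simplified model can be solved exactly by enumerating every assignment of distinct processors. This is a simple substitute for a Binary Integer Programming solver, not the method the authors used. The Python sketch below assumes communication time is negligible, so the completion time is the largest computing time among the assigned tasks; `t`, `c`, and the weight, deadline, and budget arguments are hypothetical inputs.

```python
from itertools import permutations
from typing import Optional, Sequence, Tuple


def solve_by_enumeration(t: Sequence[Sequence[float]], c: Sequence[float],
                         w_money: float, w_time: float,
                         deadline: float, budget: float
                         ) -> Optional[Tuple[float, Tuple[int, ...]]]:
    """Return (Z, processors) minimising Z, or None if no assignment is feasible.

    t[i][j]: computing time of task i on processor j (hypothetical input)
    c[j]:    price of processor j per unit CPU time (hypothetical input)
    """
    n_tasks, n_procs = len(t), len(c)
    best: Optional[Tuple[float, Tuple[int, ...]]] = None
    # Distinct processors per task: each task gets exactly one processor and
    # each processor at most one task (the assignment constraints).
    for procs in permutations(range(n_procs), n_tasks):
        money = sum(c[j] * t[i][j] for i, j in enumerate(procs))
        makespan = max(t[i][j] for i, j in enumerate(procs))  # communication ignored
        if makespan > deadline or money > budget:
            continue  # violates the deadline or budget constraint
        z = w_money * money + w_time * makespan
        if best is None or z < best[0]:
            best = (z, procs)
    return best
```

A machine that fails a QoS requirement can be excluded, as described above, by giving it a very large time entry for that task (e.g. `float("inf")`), which then violates the deadline constraint. Enumeration grows as M!/(M-N)!, so for realistic grids a BIP solver would be used instead.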

4 Experiments and Evaluation

In this section, we use a numerical Computational Fluid Dynamics (CFD) program as an example to test our scheduling algorithm. The CFD program simulates the vortex streets downstream near the nozzle of a plane jet. The computational domain is divided into two types of sections: the physical domain, and a PML buffer zone at each end of the physical domain. Four processors compute the final results in their respective subsections, which have the same height (Fig. 1).

Fig. 1. The steady vortex streets downstream near the jet nozzle

Because this program is typical of scientific computation, it is a suitable example to show the value of our scheduling algorithm. The program consists of four tasks, which are scheduled by our algorithm. Several types of machines are available for these tasks: an SGI Onyx 3900 supercomputer with 64 processors, four SGI Octane workstations, and eight personal computers (four of which are faster than the rest). According to the costs of these machines, we have assumed a price for each processor (Table 1). We discuss two cases: in the first, the workload of each task is nearly the same; in the second, the workloads differ. We will see how this difference affects the outcome of the schedule. As there are four tasks, we need no more than four processors of the supercomputer. We denote these processors as SC1, SC2, SC3, and SC4. Similarly, WS1, WS2, WS3, and WS4 denote the processors of the four SGI workstations; P4-2-1, P4-2-2, P4-2-3, and P4-2-4 denote the four P4 2.0 GHz processors; and P4-1.8-1, P4-1.8-2, P4-1.8-3, and P4-1.8-4 denote the four P4 1.8 GHz processors.

To estimate the time for computing and communication, we can run the program and record the actual time. The recorded data show that, although communications are frequent, the amount of data transferred in each communication is very small. Thus, compared with the time for computing and waiting, the total time for transferring data is very small. As discussed in Section 3, we can simplify our model in this case.

Case 1: The workload of each task is nearly the same, so for each type of processor we only need to list the estimated CPU time needed to complete one task (Table 2). Let the deadline be 10,000 (units of time) and the budget 400,000 (units of money). We set different values of the weight ratio to see how the weights of time and money affect the scheduling (Table 3).
Remark: The data in Table 3 are produced by our algorithm. Because the situation is the same for each of the four tasks, only the CPU time for one task is listed for each scheme, whereas the monetary consumption and the value of Z refer to all four tasks in each scheme. When the program actually runs on the supercomputer, the time for waiting and communication is too short to be measured accurately, so the walltime above should be 1984 minutes; the difference is small enough to be ignored.
Analysis: When the weight ratio is less than 0.0655, all tasks are processed by the supercomputer, because it saves a lot of time, which the user values highly. If the ratio is between 0.0655 and 0.3122, all tasks are processed by the four P4 2.0 GHz processors. When the ratio is larger than 0.3122, the four P4 1.8 GHz processors are preferable because of their low charge (Fig. 2). The result shows that the workstations are not used in this case. However, if the user has other requirements, for example reliability, both the workstations and the supercomputer may be preferable. How quality of service requirements are added has been discussed in Section 3.

Fig. 2. Changes of the weight ratio result in changes of the scheme

Case 2: The workload of each task is different. We give a brief discussion of this case. Suppose there are two tasks with a larger workload, each twice that of each of the smaller ones. If the weight ratio is 0.01:0.99, then, according to our algorithm, the larger tasks should be assigned to the supercomputer and the smaller tasks to the P4 2.0 GHz processors. This helps to minimize the waiting time before communications. When the weight ratio changes, however, the scheme changes greatly in order to meet the requirements of different clients and to minimize the "cost" they define. Of course, the budget of the program must also be taken into consideration, so as to meet our goal of a budget- and time-constrained algorithm.

5 Conclusions

We have presented an algorithm for scheduling programs with dependent tasks. Unlike other algorithms, it takes the communication among tasks into consideration. Although we impose a constraint on the communication, the algorithm is suitable for scheduling a large proportion of scientific computing programs. Moreover, we reduce the problem to a classical programming problem, Binary Integer Programming, which can be solved by existing methods. Our algorithm can meet the users’ quality of service requirements, such as deadline, budget, security, and reliability. By scheduling a typical scientific computing application, our experiment shows how the algorithm meets the requirements of different users and how communication affects the scheme, and thus demonstrates the validity of the algorithm.

Acknowledgements. The authors wish to thank the National Natural Science Foundation of China for the National Science Fund for Distinguished Young Scholars under grant Number 60225009. We would like to thank the Center for Engineering and Scientific Computation, Zhejiang University, for its computational resources, with which the numerical experiments have been carried out.

References

1. I. Foster and C. Kesselman (eds.): The Grid: Blueprint for a Future Computing Infrastructure, Morgan Kaufmann Publishers, USA, 1999.
2. GRAM: Grid Resource Allocation & Management, Argonne National Laboratory, and USC Information Sciences Institute.
3. C. Youn: Resource Management and Scheduling in Grid (Concepts and Trends), 2002.
4. R. Buyya, D. Abramson, J. Giddy, and H. Stockinger: Economic Models for Resource Management and Scheduling in Grid Computing, Special Issue on Grid Computing Environments, the Journal of Concurrency and Computation: Practice and Experience (CCPE), 14(13-15), 2002.
5. A. Dogan and F. Özgüner: Scheduling Independent Tasks with QoS Requirements in Grid Computing with Time-Varying Resource Prices, Proceedings of Grid Computing - GRID 2002, 58-69, 2002.
6. A. K. Amoura, E. Bampis, C. Kenyon, and Y. Manoussakis: Scheduling Independent Multiprocessor Tasks, Algorithmica, 32: 247–261, 2002.
7. R. Buyya, M. Murshed, and D. Abramson: A Deadline and Budget Constrained Cost-Time Optimization Algorithm for Scheduling Task Farming Applications on Global Grids. (www.cs.mu.oz.au/~raj/, current September 14, 2003)
8. J. Yu, S. Venugopal, and R. Buyya: A Market-Oriented Grid Directory Service for Publication and Discovery of Grid Service Providers and their Service. (www.cs.mu.oz.au/~raj/, current September 14, 2003)
9. M. A. Iverson, F. Özgüner, and C. Lee: Potter: Statistical Prediction of Task Execution Times through Analytic Benchmarking for Scheduling in a Heterogeneous Environment, IEEE Trans. Computers, 48(12): 1374-1379, 1999.
10. B. Reistad and D. K. Gifford: Static Dependent Costs for Estimating Execution Time, Proc. of the 1994 ACM Conference on LISP and Functional Programming, 65–78, 1994.
11. F. S. Hillier and G. J. Lieberman: Introduction to Operations Research, 7th ed., McGraw-Hill Higher Education, 2001.

A Load Balancing Algorithm for Web Based Server Grids

Shui Yu, John Casey, and Wanlei Zhou

School of Information Technology, Deakin University
221 Burwood HWY, Burwood, VIC 3125, Australia
{syu, jacasey, wanlei}@deakin.edu.au

Abstract. Load balance is a critical issue in distributed systems such as server grids. In this paper, we propose a Balanced Load Queue (BLQ) model, which combines queuing theory and hydro-dynamic theory, to model load balance in server grids. Based on the BLQ model, we claim that if the system is in the state of global fairness, then the whole system achieves its best performance. We propose a load balancing algorithm based on the model: the algorithm tries to keep the system in the global fairness state using job deviation. We present three strategies for job deviation: best node, best neighbour, and random selection. A number of experiments were conducted to compare the three strategies, and the results show that the best neighbour strategy is the best of the three. Furthermore, the proposed algorithm with the best neighbour strategy outperforms the traditional round robin algorithm in terms of processing delay; it also needs very limited system information and is robust.

1 Introduction

Server grids are an important and efficient architecture for Internet based applications. Server grids based on a distributed architecture can improve performance, which is a critical issue for Internet based applications. Server grids are now widely used in the Internet environment, for example in distributed web based databases, clustered web servers, mirrored servers, anycast servers, and peer-to-peer computers. One issue for Internet based server grids is load balance among the distributed servers. Most existing load balancing algorithms [2], [3], [10] assume a static environment, but the environment of Internet based server grids is no longer static because of unstable Internet traffic, congestion, user requests, and so on. Graph theory is one method of analyzing the load balance issue [3], and statistics is a useful tool for the load balance problem as well [1], [2], [10]. Hui and Chanson [5], [6] applied a hydro-dynamic approach to model network traffic. The main advantage of this method is its power in describing dynamic load balancing activities. However, the hydro system describes a continuous world, while computer network systems belong to a discrete environment; therefore certain transformation methods have to be employed. On the other hand, because of its discrete nature, queuing theory has been used for modeling computer networks for decades. However, modeling dynamic load balancing activities using queuing theory is difficult. In this paper, we try to combine the discrete nature of queuing theory with the power of the hydro-dynamic approach in describing dynamic activities, in order to model the load balance activities of Internet based server grids.

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 proposes the Balanced Load Queue model. A novel algorithm based on the Balanced Load Queue model is proposed in Section 4. The performance evaluation is discussed in Section 5. Finally, Section 6 summarizes the paper and presents future work.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 121–128, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Related Work

Mitzenmacher [9] presented the supermarket model to describe load balancing for a group of servers: customers arrive as a Poisson stream of rate $\lambda n$ ($\lambda < 1$) at a collection of n servers. Each customer chooses some constant number d of servers independently and uniformly at random from the n servers, and waits for service at the one with the fewest customers. The service time for a customer is exponentially distributed with mean 1, and the service protocol is first-in first-out. Furthermore, the paper pointed out that the supermarket model is difficult to analyze because of dependencies: knowing the length of one queue affects the distribution of the other queues. The author therefore first developed a limiting, deterministic model representing the behavior as $n \to \infty$, and then translated the results from that model into results for large, but finite, values of n. The balls and bins model is used for load balancing research in [2], [10]. The problem is described as follows: suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random; then the largest number of balls in any bin is approximately log n / log log n with high probability. Azar et al. [1] proposed an approach to online load balancing based on the balls and bins model. They showed that if each user samples the load of two resources and sends the request to the less loaded one, the total overhead is small, and the load on the n resources varies by only an O(log log n) factor. Hui and Chanson [5], [6] introduced a hydro-dynamic approach to the dynamic load balancing issue on a network of heterogeneous computers. The authors modeled a computer as a cylinder: the diameter represents the computing capability of the computer, and the liquid in the cylinder denotes the workload on the computer. They concluded that when the system achieves global fairness, namely when the heights of all the cylinders are the same, the system is load balanced and, at the same time, the potential energy of the system is minimized.
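The balls-and-bins behaviour behind [1], [2], and [10] is easy to observe empirically. The following Python sketch compares one random choice with the best of d = 2 choices. It is an illustration only: the function name `max_load` and its parameters are ours, and the d choices are sampled independently, with replacement.

```python
import random


def max_load(n, d, seed=0):
    """Throw n balls into n bins. Each ball samples d bins independently and
    uniformly at random and joins the least loaded one. Return the maximum load."""
    rng = random.Random(seed)
    bins = [0] * n
    for _ in range(n):
        choices = [rng.randrange(n) for _ in range(d)]
        target = min(choices, key=lambda b: bins[b])  # least loaded sampled bin
        bins[target] += 1
    return max(bins)


if __name__ == "__main__":
    for d in (1, 2):
        print(d, max_load(100_000, d))
```

For n = 100,000 the single-choice maximum is typically noticeably larger than the two-choice maximum. This is the gap the cited analyses quantify.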
Anycast is a new type of network service that always tries to find the "best" server in the anycast group [7], [11], [13]. The anycast mechanism provides an automatic load balancing capability within an anycast group. Our previous research [14] provides a practical and efficient method for anycast routing, which uses both the network delay and the server performance as the criterion for finding the "best" server.


3 Balanced Load Queue (BLQ) Model

The hydro-dynamic approach is very effective for analyzing the dynamic load balancing activities of distributed systems. However, the liquid system used in the hydro-dynamic approach is continuous, while computer systems are discrete. On the other hand, queuing theory is a powerful tool for modeling computer systems, but it is less effective in modeling dynamic systems. In this section we combine the two distinct theories to model Internet based server grids.

Fig. 1. Server Grid using the Queuing Model

Fig. 2. A queue and the related concepts

Here we model each server in a server grid as a queue, and the queues are connected by networks. Figure 1 shows an example of a server grid with four servers and five connections. In Figure 1, the four queues represent servers, and a network connects them together to form a server grid. The width of each queue denotes its computing capability: the wider, the more powerful. To simplify the explanation, we introduce some concepts here that are used in the rest of this paper. In Figure 2, the parameter $\mu_m$ indicates the moving speed of the requests in queue m; in fact, $\mu_m$ is the service rate of the related computer. Each request in the queue has a service time in that queue. Based on the definitions in Figure 2, one relation holds at any time point t during the processing of request i, and another holds when the processing has finished.

Definition 1. Global Fairness (GF). In a server grid with n servers, let $t_{ij}$ denote the service time of request j in queue i, where i identifies a server and j is the sequence number of a request in its queue. If the sums of service times $\sum_j t_{ij}$ are equal for all queues, then we say that the system is in a state of global fairness. The definition can be expressed by the following equation:

$$\sum_{j} t_{1j} = \sum_{j} t_{2j} = \cdots = \sum_{j} t_{nj} \qquad (1)$$
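The global fairness condition can be checked directly from per-queue service times. In this minimal sketch, each queue is a list of the service times $t_{ij}$ of its outstanding requests (our notation), and equality is tested with a floating-point tolerance. The helper names are hypothetical.

```python
import math


def queue_work(service_times):
    """Total outstanding service time in one queue: the sum over j of t_ij."""
    return sum(service_times)


def is_globally_fair(queues, rel_tol=1e-9):
    """True if every queue holds the same total service time (global fairness)."""
    totals = [queue_work(q) for q in queues]
    if not totals:
        return True
    return all(math.isclose(t, totals[0], rel_tol=rel_tol) for t in totals)
```

For example, queues holding [1.0, 2.0], [3.0], and [0.5, 0.5, 2.0] are globally fair, because each would drain after 3.0 time units. This matches the observation that all current requests then finish at the same time.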

If a server grid is in the state of global fairness, then the current requests in all the queues will be finished at the same time; furthermore, every server is an equally good choice for a new incoming request. Assertion 1. If the workload of a server grid with n servers is balanced, then over a given period [0, T] (T sufficiently large), the system must be in the state of global fairness; that is, equation (1) holds.


Proof: There are three cases, listed below; any other situation is a combination of them. Let $\lambda_i$ and $\mu_i$ denote the arrival rate and the service rate of queue i. Case 1. There are no requests in the queues for the period [0, T], i.e., $\lambda_i = 0$ for every queue i. Obviously the equations hold. Case 2. $\lambda_i > \mu_i$, i = 1, 2, ..., n, for the period [0, T]. This means that every arrival rate is larger than the corresponding service rate, namely, all the servers are busy for the whole period, so the assertion holds. Case 3. Without loss of generality, suppose that at a given time point one queue holds no requests while one or more other queues hold requests. For the sake of load balance, a new incoming request will be dispatched to the empty queue by the overloaded queue(s), and this situation may happen from time to time. Therefore, if T is sufficiently large, the assertion holds. The assertion holds in all three cases and therefore in any combination of them; as a result, it holds in any situation.

Assertion 2. When a server grid is in the state of global fairness, the whole system achieves its best performance.

Proof: Assume that there are n servers with service rates $\mu_1, \mu_2, \ldots, \mu_n$. If the system is not idle and is in the state of global fairness, the total service rate is

$$\mu = \sum_{i=1}^{n} \mu_i .$$

If the system is not in the state of global fairness, then after a period of time T there will be at least one computer with no jobs to do, and the total service rate is

$$\mu' = \sum_{i=1}^{n} \mu_i - \sum_{j \in S} \mu_j ,$$

where S is the set of servers that have no jobs to do. Since S is nonempty, $\mu' < \mu$, and therefore Assertion 2 holds.

Assertion 3. In a server grid with n servers, if the system is in the state of balance, i.e., the workloads of the n servers are balanced, then over a given period [0, T] (T sufficiently large), the ratio of arrival rate to service rate is the same for every server.

Proof: If the system is in the state of balance, then equation (1) holds. Over [0, T], queue i receives about $\lambda_i T$ requests, each with mean service time $1/\mu_i$, so $\sum_j t_{ij} \approx \lambda_i T / \mu_i$. Ignoring the process switching time, in the long run we obtain

$$\frac{\lambda_1}{\mu_1} = \frac{\lambda_2}{\mu_2} = \cdots = \frac{\lambda_n}{\mu_n} = k \qquad (2)$$

where k is a constant that denotes this common ratio. This assertion implies that the relationship between the arrival rate and the service rate is fixed when the load of the system is balanced. Furthermore, k reflects the average waiting time of users when the whole system is fully loaded: the larger k is, the longer the average waiting time.
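Assertions 2 and 3 reduce to simple arithmetic on the rates. This sketch uses our own labels $\lambda_i$ and $\mu_i$ for arrival and service rates, and hypothetical function names. It splits a total arrival rate so that every server has the same ratio k, and computes the aggregate service rate when some servers sit idle.

```python
def balanced_arrival_rates(total_rate, service_rates):
    """Split total_rate so that lambda_i / mu_i equals the same k for every server."""
    capacity = sum(service_rates)
    k = total_rate / capacity              # the common ratio k of equation (2)
    return k, [k * mu for mu in service_rates]


def aggregate_service_rate(service_rates, idle=()):
    """Total service rate when the servers indexed in `idle` have no work (Assertion 2)."""
    return sum(mu for i, mu in enumerate(service_rates) if i not in idle)
```

With service rates 2, 4, and 6 and a total arrival rate of 6, the common ratio is k = 0.5 and the per-server arrival rates are 1, 2, and 3. If server 0 runs dry, the aggregate service rate drops from 12 to 10, which is the $\mu' < \mu$ step in the proof of Assertion 2.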


Assertion 4. If the workloads of n servers are balanced, then over a given period [0, T] (T sufficiently large), the mean time a request spends in the system is reciprocally related to the arrival rate. Proof: Assume that n = 2. Based on the equations of queuing theory, the mean time a request spends in the system can be expressed in terms of the arrival rate and the service rate. From equation (1), we can then obtain the corresponding relation. When n > 2, the proof is the same; in general, the mean time a request spends in the system is reciprocally related to the arrival rate.

This assertion characterizes the relationship between the arrival rate and the mean time a request spends in the system when the load of the system is balanced.
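The reciprocal relation can be checked numerically if each server is modeled as an M/M/1 queue. That modeling choice is our assumption, since the text only refers to "the equations of queuing theory". Under this assumption the mean time in the system is $W_i = 1/(\mu_i - \lambda_i)$. Substituting $\lambda_i = k\mu_i$ from equation (2) gives $W_i = k / ((1-k)\lambda_i)$, which is inversely proportional to $\lambda_i$. The function names are hypothetical.

```python
def mm1_time_in_system(arrival_rate, service_rate):
    """Mean time in system of an M/M/1 queue: W = 1 / (mu - lambda)."""
    if not 0 <= arrival_rate < service_rate:
        raise ValueError("a stable M/M/1 queue needs 0 <= arrival_rate < service_rate")
    return 1.0 / (service_rate - arrival_rate)


def balanced_time_in_system(arrival_rate, k):
    """The same quantity after substituting lambda = k * mu: W = k / ((1 - k) * lambda)."""
    return k / ((1.0 - k) * arrival_rate)


def w_times_lambda(k, service_rates):
    """For a balanced grid, W_i * lambda_i should equal k / (1 - k) on every server."""
    return [mm1_time_in_system(k * mu, mu) * (k * mu) for mu in service_rates]
```

With k = 0.5 and service rates 2, 4, and 8, the product $W_i\lambda_i$ equals 1 on every server. Doubling a server's arrival rate therefore halves its mean time in the system.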

4 A Load Balancing Algorithm Based on the BLQ Model

The balanced load queue model is good for describing load balance issues in server grids, but applying it directly is expensive because we need to know the states of all the queues. By Assertion 3, if the system is balanced, then the ratio of the arrival rate to the service rate of each server is fixed. The service rate $\mu_i$ of a server is constant. For a given value k, if $\lambda_i \le k\mu_i$, the server is approaching the state of balance. If $\lambda_i > k\mu_i$, the server is overloaded, and incoming requests should be dispatched to other servers to bring the system back to the balanced state. The main advantage of this idea is that we only need to set a reasonably small k when the system is initiated. Each server can then judge whether it is necessary to deviate an incoming request without any information about the network or the other servers. We assume that the users are satisfied with the overall performance of the system, which means that k in equation (2) is fixed. We then obtain an upper bound $k\mu_i$ on the arrival rate of each server i. When a new request arrives at server i, the server calculates its own $\lambda_i$. If $\lambda_i \le k\mu_i$, it does nothing; otherwise, it deviates the incoming request to one of the other peer servers. How to choose the destination of a deviated request is an interesting issue. We design three strategies for deviation:

1) Random Selection Strategy. Choose one server randomly from the other servers of the server grid;
2) Best Node Strategy. Choose the best one from all the servers of the grid using a global probe;
3) Best Neighbour Strategy. Choose the better of the current server's two nearest neighbours (neighbouring servers).

The details of the algorithm are shown below.
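The admission test and the three deviation strategies can be sketched as follows. This is not the authors' algorithm listing but an illustration that rests on several assumptions:

- each server already knows an estimate of its own arrival rate;
- "best" means the lowest utilisation $\lambda_j / \mu_j$;
- the "two nearest neighbours" are the adjacent servers on a ring ordering of the grid.

The names `Server` and `pick_target` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    service_rate: float   # mu_i
    arrival_rate: float   # current estimate of lambda_i

    def overloaded(self, k):
        # Local test derived from equation (2): keep lambda_i <= k * mu_i
        return self.arrival_rate > k * self.service_rate

    def utilisation(self):
        return self.arrival_rate / self.service_rate


def pick_target(servers, i, k, strategy, rng=None):
    """Return the index of the server that should process a request arriving at server i."""
    n = len(servers)
    if n < 2 or not servers[i].overloaded(k):
        return i                                    # process the request locally
    if strategy == "random":                        # no quality control
        rng = rng or random.Random()
        return rng.choice([j for j in range(n) if j != i])
    if strategy == "best_node":                     # global probe of every other server
        candidates = [j for j in range(n) if j != i]
    elif strategy == "best_neighbour":              # only the two ring neighbours
        candidates = [(i - 1) % n, (i + 1) % n]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return min(candidates, key=lambda j: servers[j].utilisation())
```

The best node strategy must probe every server, while the best neighbour strategy looks at only two. This mirrors the cost trade-off discussed in the performance analysis. Because no hop count is tracked, a deviated request can bounce between servers, which is the deviation loop risk noted in the text.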

We must point out that the best neighbour strategy and the random selection strategy carry a potential danger of a deviation loop. The probability of a deviation loop is high when the number of servers is small.

5 Performance Analysis

We conducted experiments on the Internet to demonstrate the proposed algorithm and to compare the performance of the three job deviation strategies. Moreover, we use a centrally controlled algorithm with a round robin strategy [4], [8], [12] as a benchmark to evaluate our algorithm. In our scenario, requests are generated everywhere in the Internet and are sent to a randomly chosen server of the server grid. We know the estimated processing time of each job on a given server. Because deviation takes time, processing is delayed compared with the estimated processing time; we call this delay the processing delay. We use more than ten servers, distributed over two campuses, to act as the server grid. In the rest of this section, we present and compare several factors that affect the performance of the whole system. Figure 3 shows that as the number of nodes (servers) in a server grid increases, the processing delay of the best neighbour strategy remains almost constant and lower than that of the other two proposed strategies. Generally, only the best neighbour strategy of the proposed algorithm is better than the central controller algorithm. The reason is that the best node strategy is expensive, while the random selection strategy has no quality control. If the arrival rates are stable, then the number of requests reflects the general performance in terms of time. From Figure 4, we observe that the average processing delays of the three strategies and the central controller algorithm each stay close to a constant value. In terms of general performance, best neighbour is better than best node, and much better than random selection. Both strategies with quality control are better than the central controller algorithm in terms of processing delay.

Fig. 3. No. of Nodes vs Processing Delay

Fig. 4. No. of Requests vs Processing Delay

Fig. 5. Network Delay vs Processing Delay

Fig. 6. Arrival Rate vs Processing Delay

Figure 5 compares the impact of network delay on the processing delay. It shows that the best neighbour strategy performs best among the three proposed strategies and the benchmark algorithm. The arrival rate is a parameter that reflects the concentration of Internet traffic. The relationship between the processing delay and the arrival rate is shown in Figure 6. Based on this result, we conclude that the best neighbour strategy performs best of the four strategies.

6 Summary and Future Work

In this paper, we proposed the balanced load queue model, which combines the advantages of queuing theory and the hydro-dynamic approach, to model Internet based server grids. We proposed a load balancing algorithm based on this model, which tries to keep the system in the global fairness state using job deviation strategies. We presented three strategies for job deviation: best node, best neighbour, and random selection. The algorithm uses a predefined threshold for each server (queue) in the server grid, which depends on a delay that users find reasonable. If the load of a queue exceeds its predefined threshold, a job deviation strategy is employed.


Our experiments show that the best neighbour strategy outperforms the other two strategies and the centrally controlled strategy in several respects: number of servers (nodes), number of requests, network delay, and arrival rate. The proposed algorithm works with very limited system information; moreover, it works independently of network traffic, link breaches, and so on. Further research is needed. For example, dynamic adjustment of the threshold in a server grid is an important issue for overall system performance. Furthermore, the deviation loop is a critical and interesting topic for further research.

References

1. Yossi Azar, Andrei Z. Broder, Anna R. Karlin, and Eli Upfal, "Balanced Allocations," SIAM J. Comput., Vol. 29, No. 1, pp. 180-200, 1999.
2. Eleni Drinea, Alan Frieze, and Michael Mitzenmacher, "Balls and Bins Models with Feedback," Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 308-315, 2002.
3. Bharat S. Joshi, Seyed Hosseini, and K. Vairavan, "On a load balancing algorithm based on edge coloring," Proceedings of the Southeastern Symposium on System Theory, pp. 174-178, 1997.
4. Jing Liu, Hung Chun Kit, Mounir Hamdi, and Chi Ying Tsui, "Stable Round-Robin Scheduling Algorithms for High-Performance Input Queued Switches," Proceedings of the Symposium on High Performance Interconnects (Hot Interconnects), 2002.
5. Chi-Chung Hui and Samuel T. Chanson, "A hydro-dynamic approach to heterogeneous dynamic load balance in a network of computers," Proceedings of the 1996 International Conference on Parallel Processing, pp. III-140-147, 1996.
6. Chi-Chung Hui and Samuel T. Chanson, "Efficient load balancing in interconnected LANs using group communication," Proceedings of the International Conference on Distributed Computing Systems, pp. 141-148, 1997.
7. W. Jia, W. Zhou, and J. Kaiser, "Efficient Algorithm for Mobile Multicast Using Anycast Group," IEE Proc.-Commun., Vol. 148, No. 1, February 2001.
8. Tamas Marostis, Sandor Molnar, and Janos Sztrik, "CAC Algorithm Based on Advantage Round Robin Method for QoS Networks," Proceedings of the Sixth IEEE Symposium on Computers and Communications, 2001.
9. M. Mitzenmacher, "Load balancing and dependent jump Markov processes," Proceedings of the Annual Symposium on Foundations of Computer Science, pp. 213-222, 1996.
10. Michael Mitzenmacher, "The Power of Two Choices in Randomized Load Balancing," Ph.D. thesis, 1997.
11. C. Partridge, T. Mendez, and W. Milliken, "Host Anycasting Service," RFC 1546, November 1993.
12. Jie Wang and Yonatan Levy, "Managing Performance Using Weighted Round-Robin," Proceedings of the Fifth IEEE Symposium on Computers & Communications, 2000.
13. Dong Xuan, Weijia Jia, Wei Zhao, and Hongwen Zhu, "A Routing Protocol for Anycast Message," IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 6, June 2000.
14. Shui Yu, Wanlei Zhou, Fuchun Huang, and Mingjun Lan, "An Efficient Algorithm for Application-Layer Anycasting," The Fourth International Conference on Distributed Communities on Web (DCW2002), Sydney, April 2002.

Flexible Intermediate Library for MPI-2 Support on an SCore Cluster System

Yuichi Tsujita

Department of Electronic Engineering and Computer Science, Faculty of Engineering, Kinki University
1 Umenobe, Takaya, Higashi-Hiroshima, Hiroshima 739-2116, Japan

Abstract. A flexible intermediate library named Stampi, which provides MPI-2 support in a heterogeneous computing environment, has been implemented on an SCore cluster system. With the help of the library's flexible communication mechanism, users can execute MPI functions without being aware of the underlying communication mechanism. For message transfer among MPI processes, Stampi selectively uses a vendor-supplied MPI library or TCP sockets. Its own router process mechanism hides a complex network configuration in inter-machine data transfer. In addition, the MPI-2 extensions dynamic process creation and MPI-I/O are available. We have evaluated the primitive functions of Stampi: sufficient performance was achieved, and the effectiveness of our flexible implementation was confirmed.

1 Introduction

The low cost and scalability of PC clusters have made them the most popular platform today. However, users need to pay attention to each PC node, because each node is operated independently, and they must also deal with heterogeneity in the PC cluster. To provide a seamless computing environment, the SCore cluster system (SCore system) [1] was developed. As MPI [2,3] is the de facto standard in parallel computation, almost all computer vendors have implemented their own MPI libraries. The built-in MPI library of the SCore system, MPICH-SCore [4], is a version of the MPICH library [5]. This library is available inside a PC cluster (intra-machine MPI communications). However, it does not support dynamic process creation or MPI communications among different platforms (inter-machine MPI communications). To realize these mechanisms, Stampi [6] has been implemented on an SCore system [7]. Recent applications in parallel computation handle huge amounts of data, and almost all data-intensive applications tend to access noncontiguous rather than contiguous data. MPI-I/O was proposed in the MPI-2 standard [3] as a parallel-I/O interface to support such I/O patterns. However, no vendor-supplied MPI library supports MPI-I/O operations among computers. To realize this mechanism, we have developed a flexible MPI-I/O library, named Stampi-I/O [8], as a part of the Stampi library. Users can call MPI-I/O functions in both local and remote I/O operations using a vendor-supplied MPI-I/O library. When such a library is not available, UNIX I/O functions are used instead (pseudo MPI-I/O method): an MPI-I/O function call is translated into a combination of UNIX I/O operations and data manipulations inside the Stampi library.

Primitive MPI functions of the Stampi library have been evaluated on interconnected Linux machines: a Linux cluster with an SCore system (SCore cluster) and a Linux workstation. The network connection between them was established on a Gigabit Ethernet based LAN. This paper describes the outline, architecture, and preliminary performance results of Stampi on an SCore system.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 129–136, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Implementation of Stampi on an SCore System

Stampi has been developed to hide complex network configurations and heterogeneity, enabling flexible MPI communication among interconnected supercomputers. The features of Stampi are summarized as follows:

1. a flexible communication mechanism among computers,
2. dynamic process creation and remote I/O operation mechanisms based on the MPI-2 standard,
3. a flexible mechanism for both local and remote I/O operations, and
4. support for the external32 data format among multiple platforms.

To exploit the high availability and flexibility of an SCore system in a heterogeneous computing environment, Stampi has been implemented on the system. The rest of this section describes the details of Stampi on the SCore system.

2.1 Architecture of Stampi on an SCore System

The architecture of Stampi in a heterogeneous computing environment including an SCore cluster is depicted in Fig. 1. When user processes execute MPI functions, the Stampi library is called first. High performance intra-machine MPI communication is available through a well-tuned underlying communication library named PM2 [9] via the MPICH-SCore library. ROMIO [10] is available in MPICH-SCore, but the ROMIO of MPICH-1.2.0 in SCore 5.0.1 does not support error handling. Stampi therefore uses the pseudo MPI-I/O method for local MPI-I/O operations. In inter-machine MPI communications, the communication path is switched to TCP socket connections inside the Stampi library. When computation nodes cannot communicate with the outside, a router process is invoked on the server node to relay messages from/to user processes on the computation nodes. The spawn functions of the MPI-2 standard have been implemented in Stampi with the help of a remote shell command (rsh, ssh, etc.) to use computational resources effectively.
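The path selection described above can be summarized as a small decision function. This is a sketch of the described behaviour, not Stampi's internal code. The enum, the function name, and its three inputs are hypothetical.

```python
from enum import Enum


class Path(Enum):
    VENDOR_MPI = "vendor-supplied MPI (MPICH-SCore over PM2)"
    TCP_DIRECT = "TCP socket connection, direct"
    TCP_VIA_ROUTER = "TCP socket connection relayed by a router process"


def choose_path(src_machine, dst_machine, node_reaches_outside):
    """Pick the transport for one message between two MPI processes."""
    if src_machine == dst_machine:
        return Path.VENDOR_MPI          # intra-machine: use the tuned native library
    if node_reaches_outside:
        return Path.TCP_DIRECT          # inter-machine, node is IP-reachable
    return Path.TCP_VIA_ROUTER          # inter-machine, relay via the server node
```

The point of the design is that the caller sees only MPI functions, while the library makes this choice per communication.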


Fig. 1. Architecture of Stampi on a heterogeneous computing environment.

Remote MPI-I/O operations are carried out with the help of an MPI-I/O process that is invoked on a remote computer. The I/O interfaces for users are also based on the MPI-I/O APIs. I/O requests from user processes are translated into message data, which is transferred to the MPI-I/O process via a communication path switched to TCP socket connections. The MPI-I/O process performs parallel-I/O operations according to the I/O requests. When a computer does not have its own MPI-I/O library, the pseudo MPI-I/O library in Stampi is used. In the pseudo MPI-I/O method, each MPI-I/O function is translated into a combination of UNIX I/O functions, such as write(), and data manipulations.
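As an illustration of the pseudo MPI-I/O idea, an explicit-offset write can be emulated with a seek followed by write() calls. The sketch below uses Python's `os.lseek` and `os.write`, which wrap the corresponding UNIX calls. The function names and the even, rank-ordered layout of chunks in the collective variant are assumptions for illustration, not Stampi's actual translation.

```python
import os
import tempfile


def pseudo_write_at(fd, offset, data):
    """Emulate an explicit-offset MPI-I/O write with plain UNIX calls:
    seek to the byte offset, then write() until every byte is written."""
    os.lseek(fd, offset, os.SEEK_SET)
    view = memoryview(data)
    written = 0
    while written < len(view):
        written += os.write(fd, view[written:])
    return written


def pseudo_write_at_all(path, chunks):
    """Collective variant: rank r's chunk lands right after the chunks of ranks 0..r-1."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        offset = 0
        for chunk in chunks:            # one chunk per (hypothetical) user process
            pseudo_write_at(fd, offset, chunk)
            offset += len(chunk)
    finally:
        os.close(fd)


def demo():
    """Write three chunks collectively into a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.dat")
        pseudo_write_at_all(path, [b"abc", b"def", b"gh"])
        with open(path, "rb") as f:
            return f.read()
```

Calling `demo()` reassembles the three chunks into one contiguous file.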

2.2 Execution Mechanism of Stampi on an SCore System

Stampi supports both interactive and batch modes. Fig. 2 is used here to explain how child user processes and MPI-I/O processes are created from an SCore cluster on a remote computer with a batch system. Firstly, a Stampi start-up command (starter) initiates an SCore start-up process (scout) and a router process. Then the scout process initiates user processes. When those user processes call MPI_Comm_spawn() or MPI_File_open(), the router process kicks off a starter process on the remote computer with the help of a remote shell command. A script file is then generated and submitted to the batch queue system according to a queue class specified in an info object. Secondly, the starter written in the script file kicks off user processes in the case of MPI_Comm_spawn(), or MPI-I/O processes in the case of MPI_File_open(). In addition, a router process is invoked on an IP-reachable node if required. Finally, inter-machine MPI communication becomes available via a communication path established between the two computers. Remote I/O operations are carried out by the MPI-I/O processes. When the user processes on the SCore cluster call MPI_File_close(), the MPI-I/O processes are terminated.

Fig. 2. Execution mechanism of dynamic process creation and remote I/O operation from an SCore cluster to a remote computer.

Next, the mechanism of remote I/O operations is explained. As an example, the mechanism of MPI_File_write_at_all() in remote I/O operations is illustrated in Fig. 3. When user processes call this function, several parameters are packed into a user buffer using MPI_Pack(). Then the buffer is transferred to the MPI-I/O process using MPI_Send() and MPI_Recv() of the Stampi library. In these functions, Stampi-supplied underlying communication functions such as JMPI_Isend(), JMPI_Irecv(), and JMPI_Wait() are called for non-blocking TCP socket communications. After the message data is transferred, the I/O operation is carried out by the MPI-I/O process, and the returned values are sent to the user processes. Other MPI-I/O functions of Stampi use a similar mechanism.
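The packing step can be pictured with a byte-level analogue. Stampi itself uses MPI_Pack(). This Python sketch instead uses the standard `struct` module, and its request header layout (opcode, file handle id, byte offset, payload length) and opcode value are entirely hypothetical. Network byte order is used so that both ends agree regardless of host endianness.

```python
import struct

# Hypothetical request header: opcode, file id, byte offset, payload length.
# "!" selects network (big-endian) byte order with no alignment padding.
HEADER = struct.Struct("!iiqq")
OP_WRITE_AT_ALL = 1  # hypothetical opcode


def pack_request(op, file_id, offset, payload):
    """Sender side: parameters first, then the bulk data."""
    return HEADER.pack(op, file_id, offset, len(payload)) + payload


def unpack_request(message):
    """Receiver (MPI-I/O process) side: recover parameters and payload."""
    op, file_id, offset, length = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + length]
    return op, file_id, offset, payload
```

The 24-byte header plays the role of the "few bytes" of parameters whose transfer cost is later approximated by the latency.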

3 Performance Measurement

Performance of MPI communications and MPI-I/O operations was measured on interconnected Linux machines: an SCore cluster and a Linux workstation. Their specifications are summarized in Table 1. The computation nodes of the SCore cluster ran a Linux kernel modified from the original Linux kernel for the SCore system. The SCore cluster consisted of one server node and eight computation nodes. The computation nodes were interconnected with Gigabit Ethernet (1 Gbps, full duplex mode) through a Gigabit Ethernet switch (Extreme Alpine 3804), while the server node was connected to the switch at 100 Mbps. The computation nodes and the Linux workstation were connected at 1 Gbps over a Gigabit Ethernet based LAN, via that switch and two Gigabit Ethernet switches (NetGear GS524Ts). Remote I/O operations of Stampi were carried out with inter-machine MPI communications and local I/O operations on a disk attached to the Linux workstation through an Ultra160 SCSI connection. In this test, a router process was not used because each computation node could communicate with the outside directly. Data size denotes the whole message data size to be transferred.

Fig. 3. Mechanism of MPI_File_write_at_all() in remote I/O operations. MPI functions in rectangles are MPI interfaces of Stampi. Internally Stampi-supplied functions such as JMPI_Isend() are called.

Message data was split evenly among user processes and transferred to other user processes or to an MPI-I/O process. The transfer rate of inter-machine MPI communications was calculated as (message data size)/(RTT/2), where RTT is the round trip time for ping-pong communication between user processes. In addition, we defined the latency as RTT/2 for 0-byte message data. In remote I/O operations, latency was measured as the operation time for 0-byte message data.
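The two ping-pong definitions translate directly into code. This sketch assumes decimal megabytes (1 MB = 10^6 bytes), consistent with the text's 125 MB/s for 1 Gbps. The RTT values in the example are made up for illustration and are not measurements from Table 2.

```python
def transfer_rate_mb_s(message_bytes, rtt_seconds):
    """Transfer rate as defined in the text: message size / (RTT / 2), in MB/s."""
    return message_bytes / (rtt_seconds / 2) / 1e6


def latency_seconds(rtt_zero_byte_seconds):
    """Latency as defined in the text: RTT / 2 measured with a 0-byte message."""
    return rtt_zero_byte_seconds / 2
```

For example, a hypothetical 35 MB message with a 2-second round trip gives 35.0 MB/s. Halving the RTT counts only the one-way trip of a ping-pong exchange.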

3.1 Performance of Inter-machine MPI Communications

Performance of inter-machine MPI communications between a computation node of the SCore cluster and the Linux workstation was measured using ping-pong data transfer with MPI_Send() and MPI_Recv(). In addition, the TCP_NODELAY flag of the TCP sockets was activated in the Stampi start-up command to gain higher performance. Performance results are summarized in Table 2. We achieved up to 28% (35.0/125 × 100 for 256 MByte message data) of the theoretical bandwidth. The performance of inter-machine data transfer using raw TCP sockets was also measured, and similar performance was observed. Thus, the inter-machine MPI communication mechanism showed no significant performance degradation compared with raw TCP sockets.
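The 28% figure relies on a theoretical Gigabit Ethernet bandwidth of 125 MB/s, which follows from dividing 1 Gbps by 8 bits per byte (decimal megabytes assumed, as the 125 in the text implies). A short sketch of that conversion and of the efficiency ratio:

```python
def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert a link rate in Gbit/s to MByte/s using decimal units:
    1 Gbit/s = 10**9 bit/s = 1.25 * 10**8 byte/s = 125 MB/s."""
    return gbps * 1e9 / 8 / 1e6


def percent_of_peak(measured_mb_per_s: float, peak_mb_per_s: float) -> float:
    """Measured throughput as a percentage of the theoretical peak."""
    return measured_mb_per_s / peak_mb_per_s * 100.0


peak = gbps_to_mb_per_s(1.0)              # 125.0 MB/s for Gigabit Ethernet
efficiency = percent_of_peak(35.0, peak)  # about 28 %
```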

3.2 Performance of Remote I/O Operations

Performance of remote I/O operations from the SCore cluster to the Linux workstation was measured using the collective MPI-I/O functions MPI_File_write_at_all() and MPI_File_read_at_all(), with the TCP_NODELAY flag set in the Stampi start-up command. An MPI-I/O process invoked on the Linux workstation used the pseudo MPI-I/O method. Performance values are summarized in Table 3. For both functions, the performance with a single user process and with multiple user processes is almost the same, which suggests that inter-machine data transfer between the SCore cluster and the Linux workstation is the bottleneck in remote I/O operations. To examine these values, the performance of local I/O operations on the Linux workstation was measured using Stampi; the results are summarized in Table 4. Up to 68.3% (≈ 109.3/160 × 100) and 93.9% (≈ 150.3/160 × 100) of the theoretical Ultra160 SCSI bandwidth were achieved for write and read operations, respectively. Using these values, the performance of remote I/O operations was roughly estimated. For MPI_File_write_at_all(), the total operation time is estimated as the sum of the times for transferring the parameters and the bulk data, for local I/O on the remote computer, and for transferring the returned values. In this estimation, the time to transfer the parameters is assumed to equal the latency (57 μs from Table 2), because the message is only a few Bytes long. The times to transfer and to write 1 MByte of data were 29.5 ms (≈ (1 MB)/(33.9 MB/s)) and 9.84 ms (≈ (1 MB)/(101.6 MB/s)), respectively. Because the returned values are also only a few Bytes long, the time for that transfer is likewise assumed to be 57 μs. Thus, the total time was estimated as 39.5 ms (≈ 57 μs + 29.5 ms + 9.84 ms + 57 μs) in the single-user-process case, while the measured time was 42.7 ms (≈ (1 MByte)/(23.4 MB/s)). Note that this estimate ignores small additional processing times, such as data manipulation and context switches inside the user and MPI-I/O processes.
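The estimate above is a simple additive model. Written as a function with the values quoted in the text (57 μs latency, 33.9 MB/s network rate, 101.6 MB/s write rate), it reproduces the 39.5 ms figure. The function name and argument order are illustrative only.

```python
def estimate_remote_write_ms(size_mb: float, latency_us: float,
                             net_rate_mb_s: float, disk_rate_mb_s: float) -> float:
    """Rough time (ms) of one remote MPI_File_write_at_all() call, modeled as
    parameter transfer + bulk transfer + local write + reply transfer."""
    parameters_ms = latency_us / 1000.0             # a few Bytes: about one latency
    transfer_ms = size_mb / net_rate_mb_s * 1000.0  # bulk data over the network
    write_ms = size_mb / disk_rate_mb_s * 1000.0    # local write on the workstation
    reply_ms = latency_us / 1000.0                  # returned values: about one latency
    return parameters_ms + transfer_ms + write_ms + reply_ms


estimated = estimate_remote_write_ms(1.0, 57.0, 33.9, 101.6)  # about 39.5 ms
measured = 1.0 / 23.4 * 1000.0                                # about 42.7 ms
```

The two latency terms contribute only about 0.1 ms, so the estimate is dominated by the network transfer; the remaining gap of roughly 3 ms to the measured value is consistent with the processing times the model leaves out.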

4 Summary

In this paper, the outline, architecture, and preliminary performance results of Stampi on an SCore system are reported. Stampi on an SCore system realizes intra-machine and inter-machine MPI communications using the high-performance MPICH-SCore library and TCP sockets, respectively. Dynamic process creation based on the MPI-2 standard is also supported among computers. In addition, Stampi supports both local and remote MPI-I/O operations using a vendor-supplied MPI-I/O library; if that library is not available, a pseudo MPI-I/O library using UNIX I/O functions is used. In remote I/O operations, Stampi achieved sufficient performance considering the performance of inter-machine MPI communications and local I/O operations. The bottleneck in remote I/O operations was considered to be the inter-machine MPI communication mechanism. Despite this bottleneck, the transfer rates remained almost the same for up to four user processes.


Acknowledgments. The author would like to thank Prof. Genki Yagawa, University of Tokyo and director of Center for Promotion of Computational Science and Engineering (CCSE), Japan Atomic Energy Research Institute (JAERI), for his continuous encouragement. The author would like to thank the staff at CCSE, JAERI, especially Toshio Hirayama, Norihiro Nakajima, Kenji Higuchi, and Nobuhiro Yamagishi for providing a Stampi library and giving useful information. This research was partially supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Grant-in-Aid for Young Scientists (B), 15700079, 2003.

References 1. PC Cluster Consortium: http://www.pccluster.org/. 2. Message Passing Interface Forum: MPI: A Message-Passing Interface Standard, June 1995. 3. Message Passing Interface Forum: MPI-2: Extensions to the Message-Passing Interface Standard, July 1997. 4. M. Matsuda, T. Kudoh, and Y. Ishikawa: Evaluation of MPI Implementations on Grid-connected Clusters using an Emulated WAN Environment. In Proceedings of the IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003, pp. 10–17. 5. W. Gropp, E. Lusk, N. Doss, and A. Skjellum: A high-performance, portable implementation of the MPI Message-Passing Interface standard. Parallel Computing, 22(6), 1996, pp. 789–828. 6. T. Imamura, Y. Tsujita, H. Koide, and H. Takemiya: Architecture of Stampi: MPI Library on a Cluster of Parallel Computers. Recent Advances in Parallel Virtual Machine and Message Passing Interface, LNCS 1908, Springer, 2000, pp. 200–207. 7. Y. Tsujita, T. Imamura, N. Yamagishi, and H. Takemiya: MPI-2 Support for Heterogeneous Computing Environment Using an SCore Cluster System. Parallel and Distributed Processing and Applications, LNCS 2745, Springer, 2003, pp. 139–144. 8. Y. Tsujita, T. Imamura, N. Yamagishi, and H. Takemiya: Stampi-I/O: Flexible Distributed Parallel-I/O Library for Heterogeneous Computing Environment. Recent Advances in Parallel Virtual Machine and Message Passing Interface, LNCS 2474, Springer, 2002, pp. 288–295. 9. T. Takahashi, S. Sumimoto, A. Hori, H. Harada, and Y. Ishikawa: PM2: High Performance Communication Middleware for Heterogeneous Network Environments. In SC2000: High Performance Networking and Computing Conference, IEEE, November 2000. 10. R. Thakur, W. Gropp, and E. Lusk: On Implementing MPI-IO Portably and with High Performance. In Proceedings of the Workshop on I/O in Parallel and Distributed Systems, May 1999, pp. 23–32.

Resource Management and Scheduling in Manufacturing Grid
Lilan Liu 1, Tao Yu 1, Zhanbei Shi 2, and Minglun Fang 1
1 CIMS & Robot Center of Shanghai University, Shanghai, China, 200072
2 Computer Science of Shanghai University, Shanghai, China, 200072

Abstract. To address the resource management and scheduling problem in the Manufacturing Grid (MG), an application of Grid technology, we develop a resource management and scheduling system based on the interaction of the Manufacturing Grid Information Service (MGIS) and the Manufacturing Grid Resource Scheduler (MGRS). The former, MGIS, provides fundamental mechanisms for remote resource encapsulation, registration, and monitoring; the latter, MGRS, performs the scheduling roles of Global Process Planning (GPP) analysis, resource discovery, resource selection, and resource mapping.

1 Introduction
Manufacturing resources, ranging from software such as Computer Aided Design (CAD), Computer Aided Process Planning (CAPP), and Computer Aided Manufacturing (CAM) to various kinds of machine tools such as Computerized Numerical Control (CNC) and Rapid Prototype Manufacturing (RPM), etc., are quite distinct from computing or data resources. This particularity increases the complexity of resource management and scheduling in the Manufacturing Grid (MG), which was proposed in our previous research [1, 2, 3]. Therefore, this article proposes a Manufacturing Grid Information Service (MGIS) and a Manufacturing Grid Resource Scheduler (MGRS) to construct the resource management and scheduling system in MG. MGIS provides functions for remote resource encapsulation, registration, and monitoring, and MGRS performs the scheduling roles of Global Process Planning (GPP) analysis, resource discovery, resource selection, and resource mapping.

2 Resource Management and Scheduling in MG
2.1 Resource Management and Scheduling System
Because of the characteristics of manufacturing resources, we investigate a resource management and scheduling system, which includes MGIS and MGRS, as shown in Fig. 1. With this system, the Manufacturing Grid enables large-scale sharing of resources and collaborative work among formal or informal enterprises and/or institutions, known as Virtual Organizations (VO) or Virtual Enterprises (VE).

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 137–140, 2004. © Springer-Verlag Berlin Heidelberg 2004

L. Liu et al.

Fig. 1. Resource Management and Scheduling System

Fig. 2. Manufacturing Grid Information Service

The following sections mainly discuss the MGIS and the MGRS.

2.2 Manufacturing Grid Information Service
In this research, we constructed the MGIS of the MG system with the help of the Index Service provided by GT3 [4, 5], as shown in Fig. 2. The functions of its two important components are as follows.
Resource Templates. In manufacturing, resources differ greatly from one another in their nature (physical characteristics such as location and functionality), the demands placed on them (time, quality, cost, or service), and the ways in which they are employed (e.g., discovery, brokering, monitoring, diagnosis, adaptation). Nevertheless, in each case we see a similar structure: resources belonging to the same category have similar characteristics and demands. We therefore design templates for heterogeneous resources, each describing the attributes, demands, and interfaces of one kind of resource. New resource templates can be added as the MG system expands.
Index Service. Providing collective-level indexing and searching functions, the Index Service is used to obtain information from multiple resources and acts as an organization-wide information server for a multi-site collaboration.
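A minimal in-memory sketch of the two components, assuming a template is just a category name, a tuple of required attributes, and a tuple of interfaces. The class and method names are hypothetical; this is not the GT3 Index Service API, only an illustration of registration checked against templates plus attribute-based search.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResourceTemplate:
    """Hypothetical template for one category of manufacturing resource."""
    category: str                     # e.g. "CNC", "CAD"
    attributes: tuple[str, ...]       # attributes every resource of this kind must publish
    interfaces: tuple[str, ...] = ()  # operations the resource exposes


@dataclass
class Resource:
    name: str
    category: str
    values: dict[str, object] = field(default_factory=dict)


class IndexService:
    """Toy organization-wide index: template-checked registration plus search."""

    def __init__(self) -> None:
        self._templates: dict[str, ResourceTemplate] = {}
        self._resources: dict[str, Resource] = {}

    def add_template(self, template: ResourceTemplate) -> None:
        # New templates can be added as the system grows.
        self._templates[template.category] = template

    def register(self, resource: Resource) -> None:
        template = self._templates.get(resource.category)
        if template is None:
            raise KeyError(f"no template for category {resource.category!r}")
        missing = [a for a in template.attributes if a not in resource.values]
        if missing:
            raise ValueError(f"{resource.name}: missing attributes {missing}")
        self._resources[resource.name] = resource

    def unregister(self, name: str) -> None:
        self._resources.pop(name, None)

    def search(self, category: str, **constraints: object) -> list[str]:
        # Names of registered resources of `category` matching all constraints.
        return sorted(
            r.name for r in self._resources.values()
            if r.category == category
            and all(r.values.get(k) == v for k, v in constraints.items())
        )
```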

2.3 Manufacturing Grid Resource Scheduler
The use of the Manufacturing Grid differs from data Grid or computing Grid applications in two ways. Tasks: the tasks submitted to MG are not formulas or data but products. Requirements: consumers' requirements usually include user satisfaction, product quality and service, time-to-market, and cost, which are normally summarized as TQCS (Time, Quality, Cost, and Service) [6, 7].


We therefore develop a TQCS-based Manufacturing Grid Resource Scheduler (MGRS) to perform scheduling functions in MG, as shown in Fig. 3.

Fig. 3. Manufacturing Grid Resource Scheduler (MGRS)

The functions of the four main components of MGRS are as follows.
GPP Analyzing. Based on information from the GPP knowledge database, Global Process Planning (GPP) analyzes the submitted task and decomposes it into a few serial or parallel basic manufacturing subtasks.
Resource Discovery. The goal of this step is to identify, by interacting with MGIS, a list of authorized resources that are available to the consumer. Where possible, a preliminary filtering of resources can also be performed in this step.
Resource Selection. Once the list of possible resources is known, MGRS selects the resources that meet the basic constraints imposed by the user.
Resource Mapping. In this stage, the optimal solution is chosen to map the subtasks onto resources. Choosing the best task-resource pairs is a multiobjective decision-making problem in manufacturing, and the optimization criteria are often an arbitrary combination of Time, Quality, Cost, and Service (TQCS).
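The selection and mapping steps can be sketched as a constraint filter followed by a per-subtask choice. The paper does not say how the TQCS criteria are combined; this sketch assumes a simple weighted sum over normalized scores (one common way to scalarize a multiobjective problem), treats subtasks independently, and uses hypothetical `Offer` records in which time and cost are better when lower and quality and service lie on a 0 to 1 scale where higher is better.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Weights = Tuple[float, float, float, float]  # (time, quality, cost, service)


@dataclass(frozen=True)
class Offer:
    resource: str
    time: float     # e.g. hours to complete; lower is better
    quality: float  # 0..1; higher is better
    cost: float     # currency units; lower is better
    service: float  # 0..1; higher is better


def select(offers: List[Offer], max_time: float, max_cost: float) -> List[Offer]:
    """Resource selection: keep offers that satisfy the user's hard limits."""
    return [o for o in offers if o.time <= max_time and o.cost <= max_cost]


def tqcs_score(o: Offer, w: Weights, worst_time: float, worst_cost: float) -> float:
    """Weighted sum; time and cost are normalized so that smaller values score higher."""
    w_t, w_q, w_c, w_s = w
    return (w_t * (1.0 - o.time / worst_time) + w_q * o.quality
            + w_c * (1.0 - o.cost / worst_cost) + w_s * o.service)


def map_subtasks(candidates: Dict[str, List[Offer]], w: Weights,
                 max_time: float, max_cost: float) -> Dict[str, str]:
    """Resource mapping: for each subtask, choose the feasible offer with the best score."""
    mapping: Dict[str, str] = {}
    for subtask, offers in candidates.items():
        feasible = select(offers, max_time, max_cost)
        if not feasible:
            raise ValueError(f"no feasible resource for subtask {subtask!r}")
        worst_time = max(o.time for o in feasible) or 1.0  # avoid division by zero
        worst_cost = max(o.cost for o in feasible) or 1.0
        best = max(feasible, key=lambda o: tqcs_score(o, w, worst_time, worst_cost))
        mapping[subtask] = best.resource
    return mapping
```

A weighted sum is only one option; Pareto filtering or a lexicographic ordering of the criteria would fit the same place in `map_subtasks`, and serial subtasks whose times add up would need a joint rather than per-subtask choice.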

3 Conclusions
To solve the resource management and scheduling problem of handling dynamic changes in the availability of manufacturing resources and in user requirements in the Manufacturing Grid (MG), we develop a resource management and scheduling system based on the interaction of the Manufacturing Grid Information Service (MGIS) and the Manufacturing Grid Resource Scheduler (MGRS). The former, MGIS, provides fundamental mechanisms for remote resource encapsulation, registration, discovery, and monitoring; the latter, MGRS, performs the scheduling roles of Global Process Planning (GPP) analysis, resource discovery, resource selection, and resource mapping.


References 1. Liu Lilan, Yu Tao, Shi Zhanbei, et al.: Self-organization Manufacturing Grid and Its Task Scheduling Algorithm. Computer Integrated Manufacturing Systems (2003). 2. Liu Lilan, Yu Tao, Shi Zhanbei, et al.: Research on Rapid Manufacturing Grid and Its Service Nodes. Machine Design and Research (2003). 3. Shi Zhanbei, Yu Tao, Liu Lilan: Service Registry and Discovery in Rapid Manufacturing Grid. Computer Applications (2003). 4. GT3 Index Service User's Guide. http://www.globus.org/ogsa/releases/final/docs/infosvcs/indexsvc_ug.html. 5. Thomas Sandholm, Jarek Gawor: Globus Toolkit 3 Core – A Grid Service Container Framework. http://www-unix.globus.org/core/. 6. Kavitha Ranganathan, Ian Foster: Computation and Data Scheduling for Large-Scale Distributed Computing. http://www.globus.org/research/papers.html. 7. S. H. Wu, J. Y. H. Fuh, A. Y. C. Nee: Concurrent Process Planning and Scheduling in Distributed Virtual Manufacturing. IIE Transactions (2002).

A New Task Scheduling Algorithm in Distributed Computing Environments
Jian-Jun Han and Qing-Hua Li
Department of Computer Science and Technology, Huazhong University of Science & Technology, Wuhan 430074, China
han_j_j@163.com

Abstract. Distributed computing environments are well suited to meeting the computational demands of diverse groups of tasks. At present, the most popular model for characterizing task precedence is the DAG (directed acyclic graph). In [2], a novel model called TTIG (Temporal Task Interaction Graph), which is more realistic and universal than the DAG, is presented together with a corresponding algorithm called MATE. This paper extends the TTIG model and proposes a new static scheduling algorithm called GBHA (group-based hybrid algorithm) that eliminates cycles when traversing the TTIG, so that global information can be captured. Simulation results show that our algorithm significantly outperforms MATE in homogeneous systems.

1 Introduction
Efficient scheduling of application tasks is a key issue for achieving high performance in parallel and distributed systems. Since the general DAG scheduling problem is NP-complete [1], much research effort has been devoted to this field [3-5]. Among these scheduling algorithms, list scheduling has been shown to offer good cost-performance trade-offs, and static scheduling outperforms dynamic scheduling in most cases when the precedence, computation, and communication volumes of tasks are known a priori. However, most of these algorithms are based on the DAG, in which each task communicates with other tasks only at its beginning and at its end, so the DAG is not well suited to modeling iterative parallel programs that repeatedly alternate computation phases with communication phases involving other tasks. Hence, [2] proposes a new model, called TTIG, to characterize dependencies between tasks and overcome this drawback of the DAG, together with a corresponding scheduling algorithm called MATE. In this work, we extend TTIG and propose a new algorithm called GBHA, which ranks the nodes upward based on groups and maps the nodes onto processors according to their earliest completion time or earliest start time on each processor.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 141–144, 2004. © Springer-Verlag Berlin Heidelberg 2004


J.-J. Han and Q.-H. Li

2 Modified TTIG
Unlike TTIG, the modified TTIG is derived directly from the TFG without taking the time of concurrency into account. First, some new concepts are defined as follows.
Definition 1. Normal Node (NN). An NN is the same as a node in a DAG: it communicates only at its beginning or at its end.
Definition 2. Composite Node (CN). A CN is derived from the TFG and exchanges information with other nodes repeatedly within nested loops in the program code.
Definition 3. Component of Composite Node (COCN). A COCN is a component of a CN. All components of a CN should be assigned to the same processor. In Fig. 1, T1 consists of four components.
Definition 4. Direct Precedence Set of NN (DPSNN) where and
Definition 5. Direct Successor Set of NN (DSSNN) where or

and

Definition 6. Direct Precedence Set of CN (DPSCN) where or

and

Definition 7 .Direct Successor Set of CN (DSSCN) or where

and

Definition 8. Group (GP). The group is the core of our algorithm; its main goal is to prevent cycles. To prevent cycles from forming in the ranking procedure, the nodes on cycle paths are gathered into groups; [6] gives more details. GP(m) refers to the node set of group m. QG: each group has a two-dimensional array called the queue of the group, each row of which represents a path in the graph; QG(m)(n) refers to the nth queue of group m. NQ(m) refers to the number of queues in group m.
DPSG(m) denotes the set of nodes from which communication data are transferred directly to GP(m), where
DSSG(m) denotes the set of nodes to which communication data are transferred directly from GP(m), where
Next, the construction procedure of groups is described as follows. Initially, g = 0 and QG(m) = null for each m.


Step 1. If


where

and

then

update accordingly.

Step 2. For each where

where

if there exist paths from

to

(a group is treated as one node when traversing the modified TTIG graph in depth-first search: when a node belonging to group n is traversed, all nodes in this group are traversed, and the next nodes to be traversed are those in DSSG(n)), then find all paths from to For each node in the paths, if any GP(n) where

then GP(m)=GP(m)

update DPSG(m), DSSG(m), QG(m), and NQ(m) accordingly. Otherwise, if where and then merge group n into group m, update the parameters of group m accordingly, and delete group n. Note that each path yields one queue in a group.
Step 3. For each where and is a CN, if there exist cycle paths from

to

then find out all paths from

to

g’= g’+1, and append

to the new group.

Following the same strategy as above, append the nodes in the paths to the new group and update its parameters accordingly. Note that each path yields one queue in a group.
Theorem 1. Suppose each group m is treated as a unit in the graph, with the nodes in DPSG(m) and DSSG(m) as this unit's predecessors and successors, respectively. Then no cycle can be found when traversing the graph.
Proof. Assume that a cycle can be found when traversing the graph. Then there must be a CN on this cycle path; otherwise, the cycle would contradict the fact that a DAG contains no cycles. Without loss of generality, let node i be a composite node on this cycle path. If node i belongs to a group n, this contradicts the assumption that each group is treated as a unit. If node i does not belong to any group, two cases occur: 1) if all other nodes on this path are NNs, all nodes on this path are merged into a group according to Step 3; 2) if there is another composite node j on the path, all nodes on this path are merged into a group according to Step 1 or Step 3 when j does not belong to any group, and according to Step 2 when j is a group node. Both cases contradict the assumption that each group m is treated as a unit.
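Theorem 1 states that once each group is collapsed into a single unit, traversal finds no cycle. The same guarantee can be obtained for any directed graph by collapsing its strongly connected components; the sketch below does this with Tarjan's algorithm and checks the result with Kahn's topological sort. This is a standard substitute technique used to illustrate the property, not the group construction of Steps 1 to 3, which builds groups from cycle paths through composite nodes and also records their queues. Node names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Dict[str, List[str]]


def strongly_connected_components(graph: Graph) -> List[FrozenSet[str]]:
    """Tarjan's algorithm; `graph` maps each node to its successors."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    components: List[FrozenSet[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)  # discovery order
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            components.append(frozenset(component))

    nodes = set(graph) | {w for succs in graph.values() for w in succs}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return components


def condense(graph: Graph) -> Tuple[Dict[str, int], Dict[int, Set[int]]]:
    """Collapse each component into one unit; return node->unit and the unit graph."""
    components = strongly_connected_components(graph)
    unit_of = {v: i for i, comp in enumerate(components) for v in comp}
    units: Dict[int, Set[int]] = {i: set() for i in range(len(components))}
    for v, succs in graph.items():
        for w in succs:
            if unit_of[v] != unit_of[w]:
                units[unit_of[v]].add(unit_of[w])
    return unit_of, units


def is_acyclic(graph) -> bool:
    """Kahn's algorithm: acyclic iff every node can be removed in topological order."""
    nodes = set(graph) | {w for succs in graph.values() for w in succs}
    indegree = {v: 0 for v in nodes}
    for succs in graph.values():
        for w in succs:
            indegree[w] += 1
    ready = [v for v in nodes if indegree[v] == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for w in graph.get(v, ()):
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    return removed == len(nodes)
```

For instance, two composite nodes that exchange data in both directions form a two-node cycle; condensation merges them into one unit, and the resulting unit graph passes the acyclicity check.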

3 GBHA Algorithm and Simulation Experiment
The procedure of GBHA1 for homogeneous systems is presented as follows:


Step 1. Search for and construct the groups in the modified TTIG.
Step 2. Sort the queues of each group in non-increasing order of length.
Step 3. Rank the nodes in the graph.
Step 4. Enqueue the start node into the sorted list.
Step 5. If there is an unscheduled task in the sorted list, select the first task in the sorted list, and: 1) if and then assign it to the processor on which task i can start execution the earliest (only two processors are considered here, as mentioned above); 2) if and then assign it to the processor on which the first component of task i can start execution the earliest; 3) if then map all unscheduled nodes in group g to processor in non-increasing order of the length of the queue to which Ti belongs. If the method is the same as 1); if , the method is the same as 2). After Ti is scheduled, its ready successors are added to the FIFO list or the sorted list, and Ti is dequeued from the sorted list. Repeat Step 5.
Since groups eliminate cycles in the TTIG, many mature heuristics can be used in GBHA. Because GBHA captures the global behavior of the TTIG, it significantly outperforms MATE. Details of the algorithm and the simulation results are given in [6].
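Steps 3 to 5 follow the usual list-scheduling pattern: rank the nodes upward, then take them in rank order and place each on the processor where it can start earliest. The sketch below shows that pattern for a plain DAG on homogeneous processors. It assumes positive computation costs (so that decreasing upward rank is a valid topological order), zero communication cost between tasks on the same processor, and no insertion into idle slots. It does not handle groups or composite nodes, so it is a generic upward-rank list scheduler, not GBHA itself; task names and costs are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def upward_ranks(comp: Dict[str, float], comm: Dict[Edge, float],
                 succ: Dict[str, List[str]]) -> Dict[str, float]:
    """rank(i) = comp[i] + max over successors j of (comm[(i, j)] + rank(j))."""
    ranks: Dict[str, float] = {}

    def rank(i: str) -> float:
        if i not in ranks:
            ranks[i] = comp[i] + max(
                (comm[(i, j)] + rank(j) for j in succ.get(i, [])), default=0.0)
        return ranks[i]

    for i in comp:
        rank(i)
    return ranks


def list_schedule(comp: Dict[str, float], comm: Dict[Edge, float],
                  succ: Dict[str, List[str]],
                  num_procs: int) -> Dict[str, Tuple[int, float, float]]:
    """Schedule tasks in non-increasing upward rank on the processor with the
    earliest start time. Returns {task: (processor, start, finish)}."""
    pred: Dict[str, List[str]] = {i: [] for i in comp}
    for i, js in succ.items():
        for j in js:
            pred[j].append(i)
    ranks = upward_ranks(comp, comm, succ)
    order = sorted(comp, key=lambda i: -ranks[i])  # predecessors first if comp > 0
    proc_free = [0.0] * num_procs
    placed: Dict[str, Tuple[int, float, float]] = {}
    for i in order:
        best_proc, best_start = 0, float("inf")
        for p in range(num_procs):
            # Data from a predecessor on another processor pays the communication cost.
            ready = max((placed[k][2] + (0.0 if placed[k][0] == p else comm[(k, i)])
                         for k in pred[i]), default=0.0)
            start = max(ready, proc_free[p])
            if start < best_start:
                best_proc, best_start = p, start
        finish = best_start + comp[i]
        placed[i] = (best_proc, best_start, finish)
        proc_free[best_proc] = finish
    return placed
```

On homogeneous processors a task's execution time is the same everywhere, so choosing the earliest start time and choosing the earliest completion time select the same processor.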

4 Conclusion
In this paper, we further extend the TTIG model and propose a new group-based method called GBHA, which significantly outperforms MATE and is comparable to efficient DAG-based multiprocessor scheduling algorithms, but with significantly lower time complexity.

References 1. M. R. Garey and D. S. Johnson: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Co., 1979. 2. C. Roig, A. Ripoll, M. Senar, F. Guirado, and E. Luque: A new model for static mapping of parallel applications with task and data parallelism. IEEE Proceedings of the International Parallel and Distributed Processing Symposium, 2002, pp. 78–85. 3. M. Tan, H. J. Siegel, J. K. Antonio, and Y. A. Li: Minimizing the application execution time through scheduling of subtasks and communication traffic in a heterogeneous computing system. IEEE Transactions on Parallel and Distributed Systems, 8(8), Aug. 1997, pp. 857–871. 4. H. Topcuoglu, S. Hariri, and M.-Y. Wu: Task scheduling algorithms for heterogeneous processors. In Proc. Heterogeneous Computing Workshops, 1999. 5. A. Radulescu and A. J. C. van Gemund: Low-cost Task Scheduling for Distributed-Memory Machines. IEEE Transactions on Parallel and Distributed Systems, 13(6), 2002, pp. 648–658. 6. A New Static Task Scheduling Algorithm in Homogeneous Computing Environments. Mini-Micro Systems, to appear.

GridFerret: Grid Monitoring System Based on Mobile Agent
Juan Fang, Shu-Jie Zhang, Rui-Hua Di, and He Huang
College of Computer Science, Beijing University of Technology, Beijing 100022, China

Abstract. GridFerret is a grid resource discovery and monitoring system based on mobile agents, which applies mobile agent technology to the grid environment. GridFerret can provide the resources that exist in the grid environment, the status information of grid computing nodes, and optimization information about the current grid environment. The introduction of mobile agent technology effectively reduces network traffic during grid resource discovery and monitoring.

1 Introduction
The concept of the grid should include three characteristics: a grid coordinates resources that are not subject to centralized control, uses standard, open, general-purpose protocols and interfaces, and delivers nontrivial qualities of service [1]. This paper describes the scope of grid resource monitoring and methods for addressing it, and constructs a new grid resource discovery and monitoring model based on mobile agent technology.

2 Grid Monitoring and Related Technology
2.1 MDS Architecture of Globus Toolkit 2.4
In the context of the Globus Toolkit, information services have the following requirements: a basis for configuration and adaptation in heterogeneous, dynamic environments; access to static and dynamic information regarding system components; scalable, efficient access to dynamic data; uniform, flexible access to information; and decentralized maintenance. MDS can aggregate information from multiple systems at a physical site as well as from multiple sites within the project.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 145–148, 2004. © Springer-Verlag Berlin Heidelberg 2004


J. Fang et al.

2.2 Aglet System Model
We use mobile agents to design the GridFerret system, exploiting their characteristics to effectively reduce network traffic during grid resource discovery and monitoring. GridFerret is developed on the Java-based Aglet platform, which provides a simple and general mobile agent programming model as well as a dynamic and efficient communication mechanism.

3 GridFerret System
3.1 The Design of GridFerret Directory Information Tree
GridFerret adopts a distributed directory structure to describe the structural characteristics of the grid. It uses OpenLDAP, and each server customizes its schema in accordance with the relevant RFCs. The global LDAP server is responsible for storing grid resource names and location information; it does not store concrete information about resource usage status. The local LDAP server of each node stores concrete information about the local file system, storage, network, processor, jobs, OS, etc. GridFerret defines its own gridferret.schema in accordance with the RFCs. For instance, GF-Group-Name and the object class that uses it are defined as follows:

    ATTRIBUTETYPE ( 1.3.18.0.2.4.712 NAME 'GF-Group-Name'
        DESC 'Ferret Group Name'
        EQUALITY caseIgnoreMatch
        ORDERING caseIgnoreOrderingMatch
        SUBSTR caseIgnoreSubstringsMatch
        SYNTAX 1.3.6.1.4.1.1466.115.121.1.15 )

    OBJECTCLASS ( 1.3.18.0.2.6.158 SUP organization
        MUST GF-Group-Name
        MAY ( GF-validfrom $ GF-validto $ GF-keepto ) )

The attributes GF-validfrom, GF-validto, and GF-keepto define the effective time window of node updates; the other attributes are defined similarly.


3.2 GridFerret Architecture
The primary agents of the GridFerret system and their functions are as follows.
SensorAgent: monitors the status information of local resources.
RegisterAgent: lets VO members register and unregister; if the static information of a node changes, it can be updated through RegisterAgent.
LDAPAgent: performs addition, deletion, update, and query operations on the LDAP directory server through the API defined by Java JNDI.
UpdateAgent: calls SensorAgent to update the status of nodes and then obtains all the real-time data and status information. It also acts as a static agent that stays in the agent runtime environment and runs updates in turn.
QueryAgent: searches for the corresponding resource information in the grid computing environment according to query criteria.
A grid node can join a virtual organization through RegisterAgent.
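The interplay of UpdateAgent and SensorAgent can be simulated in a single process. This sketch assumes that UpdateAgent dispatches a collector that visits the nodes in turn and reads each node's SensorAgent locally; the class names, the itinerary model, and the dict standing in for the LDAP directory are hypothetical, and nothing here uses the Aglet API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class GridNode:
    name: str
    sensor: Callable[[], Dict[str, float]]  # stands in for the node's local SensorAgent


@dataclass
class CollectorAgent:
    """Hypothetical agent dispatched by UpdateAgent: it migrates along an
    itinerary, reads each node's sensor locally, and carries the results home."""
    itinerary: List[GridNode]
    collected: Dict[str, Dict[str, float]] = field(default_factory=dict)
    hops: int = 0

    def run(self) -> Dict[str, Dict[str, float]]:
        for node in self.itinerary:
            self.hops += 1                             # migrate to the next node
            self.collected[node.name] = node.sensor()  # local call, no remote query
        self.hops += 1                                 # return to the dispatching host
        return self.collected


def update_directory(directory: Dict[str, Dict[str, float]], agent: CollectorAgent) -> None:
    """UpdateAgent side: write the collected status into the directory
    (a plain dict standing in for the LDAP server reached through LDAPAgent)."""
    directory.update(agent.run())
```

In this model one agent round trip over n nodes costs n + 1 migrations, each carrying the agent and its accumulated results.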

Fig. 1. Registry and unregistry of grid node

4 Implementation of GridFerret
At present, the system has been implemented on a LAN. We used five servers in the experiment: a Dell 2400 server as the global LDAP server; three nodes, namely two Dell 2400 servers and one IBM 5000; and an IBM 4400 as a backup of the global LDAP server, so that the system keeps working if the global server fails. The GridFerret system and the LDAP servers run on Linux, the client side runs on Windows, and the LDAP directory server is OpenLDAP. Each node has an Aglet runtime environment, so registration, unregistration, and updates of system status information can be performed.


5 Conclusion
The mobile-agent-based grid monitoring system takes the characteristics of the grid computing environment into account and provides corresponding mechanisms and strategies. Most mobile agent platforms are based on Java, so they have very good extensibility. In addition, compared with RPC-based distributed applications, a migrating mobile agent does not need a long-lived, stable network connection, which can greatly alleviate network load. When monitoring grid resources distributed over a wide area, mobile agents avoid transferring large amounts of data over the network, which improves the efficiency and reliability of the system.

References 1. Ian Foster: What is the Grid? A Three Point Checklist. http://www.gridtoday.com/02/0722/100136.html, 2002. 2. http://www.globus.org/ogsa/TechResources/MDS.html

Grid-Based Resource Management of Naval Weapon Systems
Bin Zeng 1,2, Tao Hu 2, and ZiTang Li 1
1 School of Computer Science, Huazhong University of Science and Technology, Wuhan 430074, China
2 University of Naval Engineering, Wuhan, 430030, China

Abstract. The continuous transformation of the Chinese navy into an integrated and network-centric capability requires a cooperative and distributed weapon resource management system. As one step toward a naval information grid, our objective is to develop generalized principles of grid computing that can be applied to this specific domain. To address this problem, we adopt OGSA (Open Grid Services Architecture) technology to rebuild the legacy weapon manager. This paper proposes a generic coordinated WRM (Weapon Resource Management) architecture based on the integration of grid capabilities. The architecture can effectively transform a stovepiped, self-contained system into a service community with open interfaces, thus enabling the command to make fast, high-quality weapon allocation decisions across widely distributed platforms.

1 Introduction
In the past, the Navy acquired numerous weapon systems, each of which alone can be considered a complex system. However, the current reality is that these systems cannot be viewed as operating in isolation. Grid technology is an excellent choice for building an open system and a system of systems. In the idealized vision, a network-centric weapon resource management system is a “publish and subscribe”, “plug and play” network, in which any application can be “plugged” into the network anywhere, at any time, to help achieve warfighting objectives. To apply the advantages of the Grid to widely distributed naval weapon systems, the navy has embarked on a roadmap for developing new systems and integrating legacy systems based on Grid architecture.

2 Technical Foundation of the WRM Service-Based Architecture
The service-based WRM architecture uses OGSA technology as its foundation. Specifically, WRM uses the OGSA discovery, look-up, and lifetime services [1]. Figure 1 provides a conceptual view of the WRM architecture and of the external entities that have a direct impact on WRM operations. Weapon resources are shown at the bottom of the diagram. These resources are the hardware, software, data, and communications media that support the weapon operations mission.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 149–152, 2004. © Springer-Verlag Berlin Heidelberg 2004

B. Zeng, T. Hu, and Z. Li

Fig. 1. WRM Architecture Typically, Weapon resources are in constant operation and the various properties that reflect status are in a constant state of flux. The key element for a cooperative management system is to monitor these internal property transitions. This is accomplished through the use of the event source layer. This layer reflects the fact that WRM employs the grid standard event reporting structure. WRM manager receives events from weapon management wrapper, which functions as Web Based Enterprise Management (WBEM) CIM Object Manager (CIMOM) and the event producers. WRM managers are both producers and consumers of resource status information. An application that is interested in the type of resource information can find the service providers that support its information needs through directory service. Once discovered, the application and the managers enter into a service contract(s) through a negotiation process. The contract is completed and the client sends specific processing instructions to the service provider through the command services at the right of the event sources. This discovery and contract process facilitates a decoupling of the managers and their clients. Manager Services comprises the first true WRM layer. This layer consists of the various managers implemented within the WRM environment. These managers take the form of Grid services made available to various clients through the look-up and discovery process. The Manager Services also contains a small administrative user interface to support the configuration, deployment, and troubleshooting of individual managers. Directly above the Manager Services layer is the Client Application Layer. This layer contains the clients of Manager services. These clients use Globus Directory service[2] to find appropriate managers. The clients enter into service contracts with the managers and manage their performance through a renewal process. 
Clients also employ “business logic” drawn from the knowledge base to transform manager events into useful consumer information.
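The event source layer and the client-side "business logic" described above can be pictured with a minimal publish/subscribe sketch. It assumes that a wrapper reports a property only when its value actually changes and that a client turns each raw event into consumer information with a rule; the class names, the `business_logic` rule and identifiers such as `radar_1` are hypothetical illustrations, not WRM or WBEM APIs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    resource: str   # weapon resource identifier (hypothetical)
    prop: str       # the status property that changed
    value: object   # its new value


class EventSource:
    """Toy event source layer: reports only real property transitions."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # resource -> list of callbacks
        self.state = {}                       # (resource, prop) -> last value seen

    def subscribe(self, resource, callback):
        self.subscribers[resource].append(callback)

    def update(self, resource, prop, value):
        # Repeated readings of the same value are not transitions, so no event.
        if self.state.get((resource, prop)) == value:
            return
        self.state[(resource, prop)] = value
        event = Event(resource, prop, value)
        for callback in self.subscribers[resource]:
            callback(event)


def business_logic(event):
    # Hypothetical client rule turning a raw event into consumer information.
    if event.prop == "power" and event.value == "off":
        return f"{event.resource}: degraded"
    return f"{event.resource}: ok"
```

Decoupling comes from the subscription: the source never needs to know which clients exist or what rules they apply.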

Grid-Based Resource Management of Naval Weapon Systems


Business Logic Managers share the characteristics of both client applications and manager services. These aggregate managers use MDS (Metacomputing Directory Service) to discover, and enter into service contracts with, other managers, using the same methods as other clients. They transform manager events into process- or organization-specific status by applying “business logic”, and then make this information available in the same manner as other services. The top layer comprises the visualizations provided to the various WRM user groups and the interfaces to external information presentation and dissemination systems such as the Naval Distributed Command and Control Environment.

The knowledge base provides a central repository of information relating specific weapon resources to naval operational processes and organizational elements. It also captures the business logic used to transform data into useful information. It is this knowledge base that distinguishes WRM from commercially available monitoring and management systems. The WRM knowledge base is itself a grid service that provides knowledge capabilities to WRM consumers. It offers a mapping service that links command and control tasks to the specific resources that support them; this mapping allows WRM users to configure WRM clients to discover appropriate WRM manager services knowing only the tasks concerned. The knowledge base extends this capability by also mapping specific resource problems to their operational impact. Finally, it provides clients with indicators for these identified problems and, on demand, the specific service contract to be used to determine the relevant resource status.
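The two mappings held by the knowledge base (task to supporting resources, and resource problem to operational impact) can be sketched as plain dictionaries. This is a minimal illustration under that assumption; the class, its methods and names such as `air_defense` or `fc_link` are hypothetical, and the paper does not specify how the real knowledge base is stored.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Toy WRM knowledge base holding the two mappings described above."""
    # command and control task -> set of resources that support it
    task_resources: dict = field(default_factory=dict)
    # (resource, problem) -> operational impact
    problem_impact: dict = field(default_factory=dict)

    def resources_for(self, task):
        # A client that knows only the task learns which resources (and hence
        # which manager services) it should discover.
        return self.task_resources.get(task, set())

    def affected_tasks(self, resource):
        # Reverse lookup: tasks that depend on a given resource.
        return sorted(t for t, rs in self.task_resources.items() if resource in rs)

    def impact_of(self, resource, problem):
        return self.problem_impact.get((resource, problem), "unknown")
```

For example, a link failure shared by two tasks is reported against both, which is the kind of operational context a generic monitoring tool lacks.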

3 Grid Services in Weapon Resource Management

WRM is a Grid service community in which the individual managers are Grid services. Globus’s ability to support WRM rests on four key concepts [3]:

Discovery/Registry: Discovery is the process of finding services on the network and determining how to use them. Manager services use registry services to join the service directory. WRM uses the discovery/registry methods provided by Globus to help service consumers find appropriate service providers.

Fault Monitoring: The Globus HBM (HeartBeat Monitor) service provides simple mechanisms for monitoring the health and status of managers. Fault recovery mechanisms, such as automatic restart of crashed daemon processes, will be implemented later to improve WRM’s reliability.

Events: Remote events are the paradigm Globus uses to let services notify each other of changes in their state. Because each manager is itself a service, it can use remote events to notify interested parties when the set of services available to a community has changed.


B. Zeng, T. Hu, and Z. Li

Security: The single sign-on mechanisms that GSI provides for all Grid resources will be adapted to military standards, which involve different security solutions, mechanisms and policies (such as one-time passwords).

We now walk through WRM’s service delivery process. At initialization, WRM managers announce their availability and register with the Grid Information Service. Each manager uses a “well-defined” set of information (service data) to advertise its availability to support user requirements. When a client queries the Grid Information Service, it receives a set of references to the managers that can potentially satisfy its information requirements. The client application can then use these references to query the managers and determine the best set of information sources, and it uses the selected reference(s) to communicate directly with the manager(s). As a result of the negotiation process, the client and the manager service(s) enter into a service performance contract. The contract specifies the client’s performance model, including the set of resources, the capabilities of those resources, the problem parameters and the duration of the contract. The duration is managed in the form of a lease, which the client is responsible for renewing in order to maintain the service. The manager acts as a performance monitor that decides whether the contract has been violated.
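The register, query, contract and renew cycle can be illustrated with a minimal lease-based sketch. It assumes that advertisements are plain capability sets and that a contract reduces to a lease with an expiry time. The performance model is left out, and `InformationService`, `Manager` and the capability strings are hypothetical names, not the Globus MDS or GSI interfaces.

```python
import time
from dataclasses import dataclass


@dataclass
class Lease:
    manager: str
    client: str
    expires_at: float


class InformationService:
    """Toy stand-in for the Grid Information Service."""

    def __init__(self):
        self.adverts = {}  # manager name -> advertised capabilities (service data)

    def register(self, manager, capabilities):
        self.adverts[manager] = set(capabilities)

    def query(self, needed):
        # References to every manager whose advertisement covers the requirement.
        return sorted(m for m, caps in self.adverts.items() if set(needed) <= caps)


class Manager:
    def __init__(self, name, lease_seconds, clock=time.monotonic):
        self.name = name
        self.lease_seconds = lease_seconds
        self.clock = clock   # injectable so leases can be tested without sleeping
        self.leases = {}     # client -> Lease

    def contract(self, client):
        lease = Lease(self.name, client, self.clock() + self.lease_seconds)
        self.leases[client] = lease
        return lease

    def active(self, client):
        lease = self.leases.get(client)
        return lease is not None and self.clock() < lease.expires_at

    def renew(self, client):
        # Renewal only succeeds while the lease is valid; an expired
        # contract would have to be renegotiated.
        if not self.active(client):
            return False
        self.leases[client].expires_at = self.clock() + self.lease_seconds
        return True
```

The lease puts the burden of keeping a contract alive on the client. A manager therefore never has to detect departed clients explicitly: their contracts simply expire.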

4 Conclusion and Future Works

WRM represents one of the first steps toward a service-based architecture for command and control. It also provides a framework for developing future capabilities in support of Network Centric Warfare. This paper has presented architectural descriptions that are useful both for directing co-evolution and for understanding and controlling the collection of naval weapon systems. With the core information infrastructure complete, the near-term focus of research and subsequent development will be on three main areas: expansion of the WRM manager, automated information discovery, and Quality of Service issues.

References
1. Foster, I., Geisler, J., Nickless, W., Smith, W., Tuecke, S.: Software Infrastructure for the I-WAY High Performance Distributed Computing Experiment. In: 5th IEEE Symp. on High Performance Distributed Computing (1997) 562–571
2. Foster, I., Kesselman, C.: Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications 11 (1997) 115–128
3. Krill, R., Jerry, A.: Some Fundamental Principles for Engineering New Capabilities into a Battle Force System of Systems. Panel Presentation at 11th INCOSE (2001)
4. Tierney, B., Aydt, R., Gunter, D., Smith, W., Swany, M., Wolski, R.: A Grid Monitoring Architecture. The Global Grid Forum (2002). http://forge.gridforum.org/projects/ggfeditor/document/GFD-I.7/en/1/GFD-I.7.pdf

A Static Task Scheduling Algorithm in Grid Computing

Dan Ma (1) and Wei Zhang (2)

(1) School of Computer Science in HuaZhong University of Science and Technology, WuHan, 430074, China

(2) WuHan Ordnance N.C.O. Academy of PLA, WuHan, 430075, China

Abstract. Task scheduling in heterogeneous computing environments such as grids is a critical and challenging problem. Building on traditional list scheduling, we present LBP (Level and Branch Priority), a static task scheduling algorithm adapted to the heterogeneous hosts of a grid. The main contribution of LBP is a new method for determining task priority in the task ready list. Compared with influential algorithms for heterogeneous environments such as HEFT and CPOP, LBP achieves better scheduling performance with the same time complexity.

1 Introduction

Task scheduling remains one of the most challenging problems, in grid computing as well as in traditional distributed and parallel computing. For homogeneous environments, researchers have explored many list-based heuristic scheduling algorithms, which fall into two classes. The first is BNP (Bounded Number of Processors) scheduling, which assumes that the processors are fully connected and that their number is limited; ISH [1], MCP [2] and ETF [2] belong to this class. The second is APN (Arbitrary Processor Network) scheduling, which assumes that the processors are arbitrarily connected and that their number is also limited; because the network is not fully connected, these algorithms must take communication contention into account. MH [3] and DLS [2] belong to this class. Both classes target homogeneous environments. In heterogeneous systems such as grids, task scheduling is more complex because more factors are involved, such as differing processor capacities, matching of code written in different languages, and the overhead of communication contention. So far, scheduling algorithms for heterogeneous systems are rarely seen in the literature; the most influential are HEFT [4] and CPOP [4], presented by H. Topcuoglu et al. This paper presents LBP (Level and Branch Priority), a static task scheduling algorithm for grid environments. Under the same conditions, LBP can achieve better performance (i.e., shorter schedule lengths) than HEFT and CPOP without increasing time or space complexity.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 153–156, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 The Basic Definition and Model

The most common task scheduling model is the Directed Acyclic Graph (DAG), which can express a parallel program well. In a DAG, the parallel parts of an application are partitioned into tasks, some of which exchange data. Task scheduling in a grid environment can logically be viewed as two independent stages. The first is the task mapping (assignment) stage, in which each task is assigned to a host: $M: t_i \rightarrow h_j$ ($1 \le i \le n$, $1 \le j \le k$), where $t_i$ denotes an arbitrary task, $h_j$ an arbitrary host, $n$ the number of tasks and $k$ the number of hosts. The second is the task scheduling (ordering) stage, in which the start time of every task already assigned to a host is decided: $S: t_i \rightarrow ST(t_i)$, where $ST(t_i)$ denotes the start time of task $t_i$. A DAG is denoted by $G = (V, E)$, where $V = \{t_1, \dots, t_n\}$ is the set of $n$ weighted tasks and $E$ is the set of weighted communication edges $e_{ij}$. Because the hosts are heterogeneous, the computation workload of a task differs from host to host: $w_{ij}$ denotes the computation time of task $t_i$ on host $h_j$, and $c_{ij}$ denotes the message size between tasks $t_i$ and $t_j$.

Definition 1. A task node without any parent is called an entrance node; a task node without any child is called an exit node. If there is more than one exit node, only one of them is designated as the exit node, and each of the others is treated as an ordinary node joined to the designated exit node by a dummy edge with zero communication.

Definition 2. The idle hosts available for scheduling in the grid system are fully interconnected, and their number is limited. Every idle host can carry out computation and message passing concurrently. The message size between tasks scheduled on the same host is zero.
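The model and Definition 1 can be made concrete with a small data structure. In this sketch, computation times are stored as `w[t][h]` (one list entry per host) and message sizes as `c[(ti, tj)]`; that layout and the `TaskGraph` name are assumptions for illustration only. `normalize_exit` keeps the first exit node and joins every other exit node to it with a zero-cost dummy edge, as Definition 1 prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class TaskGraph:
    """DAG G = (V, E) with heterogeneous costs (hypothetical layout)."""
    # w[t][h]: computation time of task t on host h
    w: dict
    # c[(ti, tj)]: message size on edge ti -> tj
    c: dict = field(default_factory=dict)

    def children(self, t):
        return [j for (i, j) in self.c if i == t]

    def exit_nodes(self):
        return [t for t in self.w if not self.children(t)]

    def entrance_nodes(self):
        return [t for t in self.w if all(j != t for (_, j) in self.c)]

    def normalize_exit(self):
        """Definition 1: keep one exit node; join the others to it by zero-cost dummy edges."""
        exits = self.exit_nodes()
        keep = exits[0]
        for t in exits[1:]:
            # `keep` has no out-edges, so this dummy edge cannot create a cycle.
            self.c[(t, keep)] = 0.0
        return keep
```

Because the dummy edges carry zero communication, they change only the graph's shape (one exit for path-based priorities), not any schedule cost.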

3 The Proposed Algorithm and Analysis on Its Performance

Most static heuristic task scheduling algorithms are based on the classic list scheduling idea, which consists of two independent steps.
Step 1. All tasks in the task graph are sorted by some priority order to form a task ready list.
Step 2. Tasks are taken from the head of the ready list one by one, and each is scheduled to a processor according to a particular strategy.

HEFT is a typical static list scheduling algorithm. Analyzing the basic idea of list scheduling, we consider the key factor in Step 1 to be how the priority of a task node is determined. Two attributes are commonly used: the T-LEVEL of a task is the length of the longest path from the entrance node to the task node, and the B-LEVEL of a task is the length of the longest path from the task node to the exit node. The T-LEVEL of a task is related to its earliest start time, while the B-LEVEL is related to the critical path of the task graph. HEFT uses the B-LEVEL as its priority attribute; unlike in the homogeneous case, the execution time of a task used when computing the B-LEVEL is its mean execution time over all hosts. For Step 2, many algorithms, including HEFT, adopt a greedy strategy. HEFT additionally allows the current task to be inserted into an idle gap between two already scheduled tasks, which inevitably increases the overhead of the algorithm.

LBP mainly improves the choice of priority attribute in Step 1. In homogeneous environments, the T-LEVEL and B-LEVEL are the most important priority attributes; the B-LEVEL in particular ensures that tasks on the critical path are scheduled as early as possible. In heterogeneous environments, however, a task's execution time differs across hosts, and computing the B-LEVEL from mean execution times is not ideal, because the mean B-LEVEL does not truly reflect the relationship between a task and the critical path. In view of this heterogeneity, we present a new way of computing task priority, called Level-Branch Priority, determined as follows.

First, the Level value of each task is computed. There are two methods: from the entrance to the exit (not described here owing to space limitations), or from the exit to the entrance. When there is more than one entrance node, the Level value of every task node is computed in order from the exit to the entrance.
The value $L(t_i)$ is the sum of the edges on the longest path from the exit node to task node $t_i$. If there are $j$ paths between the exit and $t_i$, and $l_m$ is the value of the $m$-th path, then

$$L(t_i) = \max_{1 \le m \le j} l_m .$$

Next, the Branch value $B(t_i)$ of each task is computed as the sum of the weights of all out-edges of $t_i$:

$$B(t_i) = \sum_{t_j \in \mathrm{succ}(t_i)} c_{ij}, \qquad |\mathrm{succ}(t_i)| = d_i ,$$

where $d_i$ is the out-degree of $t_i$. Finally, the priority of each task is determined from $L(t_i)$ and $B(t_i)$. Task $t_i$ has higher priority than $t_j$ if $L(t_i) > L(t_j)$, regardless of whether $B(t_i) > B(t_j)$ or $B(t_i) < B(t_j)$. If $L(t_i) = L(t_j)$, then $B(t_i)$ and $B(t_j)$ are compared: if $B(t_i) > B(t_j)$, $t_i$ has higher priority than $t_j$; otherwise it has lower priority.

The whole LBP algorithm can be described informally as follows:

    Input the DAG; determine the priority of every task t_i from L(t_i) and B(t_i)
    Put every task into the task ready list in decreasing order of priority
    While the task ready list is not empty do
        Take the head task t_i out of the ready list
        For each host h_j in the idle host set do
            Compute the earliest finish time of t_i on h_j, without inserting t_i
            into the idle gap between any two already scheduled tasks
        Endfor
        Schedule t_i on the host that gives it the earliest finish time
    Endwhile
    Output the task scheduling Gantt chart

Time complexity: The time complexity of HEFT and CPOP is O(e × q), where e is the number of edges in the DAG and q is the number of idle hosts. LBP uses the same greedy strategy for selecting an idle host as HEFT and CPOP. The difference is that LBP schedules the current task only after the last task already scheduled on a host, whereas HEFT also considers insertion. The time complexity of LBP is therefore no greater than that of HEFT, i.e., it is also O(e × q).

Scheduling performance: Simulation experiments were conducted on small-scale random DAGs with between ten and a hundred task nodes and CCR (Communication to Computation Ratio) values from 0.1 to 10. Two indices were examined: the mean run time and the mean speedup. The results show that the mean run time of LBP is slightly lower than that of HEFT and CPOP when the number of task nodes is large, and that the mean speedup of LBP is higher than that of HEFT and CPOP when the number of task nodes is small. As the number of task nodes grows, the mean speedup of LBP converges to that of HEFT and CPOP.

4 Conclusion

Static task scheduling algorithms aimed at heterogeneous environments are rarely seen; HEFT and CPOP are two influential ones. Building on HEFT and CPOP, this paper presents LBP, a new algorithm for determining task priority and scheduling tasks. Compared with HEFT and CPOP, LBP can obtain better scheduling performance without increasing time or space complexity.

References
1. H. El-Rewini, T.G. Lewis, H.H. Ali. Task Scheduling in Parallel and Distributed Systems. Englewood Cliffs, New Jersey: Prentice Hall, 1994.
2. R. Buyya. High Performance Cluster Computing: Architectures and Systems, Vol. 1, pp. 402–406.
3. H. El-Rewini, T.G. Lewis. Scheduling Parallel Programs onto Arbitrary Target Machines. Journal of Parallel and Distributed Computing, vol. 9, no. 2, pp. 138–153, June 1990.
4. H. Topcuoglu, S. Hariri, M.-Y. Wu. Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing. IEEE Transactions on Parallel and Distributed Systems, vol. 13, no. 3, March 2002.

A New Agent-Based Distributed Model of Grid Service Advertisement and Discovery

Dan Ma, Wei Zhang, and Hong-jun Zhang

School of Computer Science in HuaZhong University of Science and Technology, WuHan 430074, China

Abstract. Grid computing is becoming a research focus in distributed and parallel systems, and researchers accept the idea of the grid service as the kernel of the whole grid architecture. Grid services are highly scalable and dynamic. By analyzing resource management in heterogeneous environments and the agent-based hierarchy model, this paper presents an agent-based distributed model of grid service advertisement and discovery. The model not only satisfies the scalability and dynamism of grid services but can also reduce system overhead compared with centralized management of service advertisement.

1 Introduction

A geographically wide-area network, together with many large-scale distributed high-end resources managed by various organizations or individuals, constitutes a new cooperative computing mode. This heterogeneous and distributed mode is called grid computing, or the grid. In the grid computing architecture, the resources provided to grid users are abstracted as grid services. According to the Open Grid Services Architecture (OGSA) [1], an important standard proposal presented by the Global Grid Forum, the concept of a grid service is far-reaching: computational resources, storage resources, interconnection networks, application programs, databases and so on are all grid services. The concept of grid services helps eliminate the differences among the various heterogeneous resources in a grid system. Managing grid services is not easy, however, because grid services in a real grid system are highly dynamic and scalable. This often makes it hard to find a grid service that satisfies the performance needs of application users, so an effective grid service advertisement and discovery mechanism is necessary. Moreover, the mechanism itself should be simple and consume as little system overhead as possible. Software agents are a powerful high-level tool for modeling complex software systems and are therefore well suited to implementing grid service advertisement and discovery. So far, several typical distributed and parallel systems have implemented different resource management models, each with its own characteristics.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 157–160, 2004. © Springer-Verlag Berlin Heidelberg 2004



Condor [2]: The user agent negotiates with the resource agent through the matchmaker. The user agent asks the matchmaker for resources, and the resource agent provides resource information to the matchmaker, which is responsible for matching resource providers with resource requesters. The matchmaker can therefore become a system bottleneck, which causes additional trouble when the system is frequently extended.

Globus [3]: Globus uses the Metacomputing Directory Service (MDS) to manage the static and dynamic information of all resources. This approach can satisfy the scalability and dynamism requirements of system resources. However, all LDAP servers must be notified whenever the state of any resource changes or a new resource is added to the system, which increases system and network overhead.

Agent-based hierarchy model [4]: In this model, the agents responsible for service advertisement and discovery are organized as a hierarchy. When a new service is added to an agent, the agent must distribute the new service information to its next upper-level and lower-level agents. When a node in the hierarchy requests a service, its agent queries the next upper-level agent, continuing up to the highest level if necessary. In highly dynamic scenarios, especially when services must be distributed frequently, the overhead increases greatly and the nodes at higher levels may become bottlenecks.

2 A New Agent-Based Distributed Model

The basic idea of the new agent-based distributed model is that a grid service agent exists at every node of the grid system. Unlike in the hierarchy model, the agents have no superior/subordinate relationships: every agent has the same position in the grid system, i.e., all agents are peers. Each agent maintains only its own service information, together with a remote service address list that records the addresses of its neighbor nodes. When an agent needs a service that does not exist at the local node, it interacts only with its neighbor nodes, using some service discovery algorithm. This organization resembles the P2P mode, so the model is better suited to highly distributed grid systems.

Software agents are an ideal high-level tool for modeling a grid system. All the agents that manage grid services form a multi-agent system that provides a coordination platform for service requesters and service suppliers; each agent is both a service supplier and a service requester. The grid service agent is the kernel component of service advertisement (registration) and discovery management, and it consists of a series of functional modules. Besides general-purpose modules such as the communication module, the modules mainly responsible for service registration and discovery are the service register and discovery interface, the local service register module, the remote service address list module, and the service discovery optimizing strategy module. Their basic functions are described below (the schematic of the agent-based model is omitted).

Service register and discovery interface: This interface is the I/O of the whole agent structure. It receives local or remote service requests and sets up the corresponding service register/discovery instance. It first uses a standard service description language, such as the Web Services Description Language (WSDL), to describe the requested or registered service, and then passes the request or registration parameters to the local service register module. If the requested service is found in the local register module, the interface returns the service address to the local or remote node; otherwise it returns a failure signal.

Local service register module: This module is itself a grid service and is mainly responsible for registering local services. When the interface passes it local registration parameters, the module registers the service. When it receives a service request from a local or remote node, the module calls the service discovery method encapsulated in the register service to handle the request. If the requested service is already registered, the module returns the local address to the interface; if no registered service matches, it passes the request to the service fast discovery cache.

Remote service address list module: This module maintains the address list of all neighbor nodes. On receiving a service request from the service fast discovery cache, it selects an appropriate algorithm from the service discovery algorithm set and a neighbor node address from the list, and passes these parameters to the interface. The interface then searches the remote node for the requested service by building a service discovery instance.

Service discovery optimizing strategy module: This module consists of optimizing strategy components such as the service discovery algorithm set and the service fast discovery cache. The algorithm set contains service search algorithms such as Depth-First Search (DFS) and Breadth-First Search (BFS).
The service fast discovery cache stores the most recently accessed remote service addresses, together with the addresses of frequently accessed basic services. When the local service register module cannot satisfy a service request, the request is forwarded to the cache, which tries to match it against the services whose addresses it holds. If a match is found, the cache returns the service address; otherwise it passes the request to the remote service address list module.
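One plausible reading of the cache, sketched minimally: recently used remote addresses are kept in least-recently-used (LRU) order, and frequently accessed basic services are pinned so they are never evicted. The LRU policy, the fixed capacity and the names used here are assumptions; the paper does not state an eviction rule.

```python
from collections import OrderedDict


class FastDiscoveryCache:
    """Recently used remote addresses (LRU) plus pinned basic-service addresses."""

    def __init__(self, capacity, pinned=None):
        self.capacity = capacity
        self.pinned = dict(pinned or {})   # service -> address, never evicted
        self.recent = OrderedDict()        # service -> address, oldest first

    def lookup(self, service):
        if service in self.pinned:
            return self.pinned[service]
        if service in self.recent:
            self.recent.move_to_end(service)   # mark as most recently used
            return self.recent[service]
        return None   # miss: hand the request to the remote address list module

    def remember(self, service, address):
        if service in self.pinned:
            return
        self.recent[service] = address
        self.recent.move_to_end(service)
        if len(self.recent) > self.capacity:
            self.recent.popitem(last=False)    # evict the least recently used entry
```

A `None` result corresponds to the "otherwise pass the request on" branch described above.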

3 Service Advertisement (Register) and Discovery Mechanism

Service register: Every grid service is registered only at its local node. This avoids the problem of a registration server becoming a system bottleneck, which often arises in traditional centralized registration schemes such as those of Condor or Globus. Because the data of each grid service are kept only at its local node, they do not occupy much storage space, and because local services need not be registered at any remote node, network bandwidth is saved. The grid service registration procedure works as follows. When a local resource wants to join the grid system, it submits a service registration request to its local agent. The service register and discovery interface of the local agent first describes the received registration information in a standard service description language and creates a service register instance. It then calls the register method of the instance, with the relevant data, to register the service in the local service register module.

Service discovery: There are two cases, local service discovery and remote service discovery. Local service discovery works as follows. The local node submits a service request. The interface describes the request in a standard description language, creates a service discovery instance, and calls the instance's discovery method, with the relevant data, to query the local service register module. If a match is found, the local address is returned; otherwise the request enters the service fast discovery cache to look for a cached service. If a match is found there, the remote service address is returned; otherwise the remote service discovery procedure starts.

Remote service discovery works as follows. A local service request has activated a service discovery instance in the interface, and no matching service was found at the local node or in the cache. An appropriate search algorithm is selected from the algorithm set, and a starting address is selected from the remote address list. The remote service discovery method of the service discovery instance is then called to query remote nodes one by one. If a match is found within the lifetime defined in the service discovery instance, the remote node's address is returned; otherwise a failure signal is returned and service discovery stops.

4 Conclusion

The kernel of resource management in a service-based grid system is how to advertise and discover grid services. Having analyzed existing resource management mechanisms, we have presented a new agent-based distributed model of service registration and discovery. Unlike general centralized management schemes, this model is better suited to highly distributed grids, and it can reduce system overhead and save network resources compared with centralized management.

References
1. I. Foster, C. Kesselman, J.M. Nick, S. Tuecke. Grid Services for Distributed System Integration. Computer, vol. 35, no. 6, pp. 37–46, June 2002.
2. R. Raman, M. Livny, M. Solomon. Matchmaking: Distributed Resource Management for High Throughput Computing. In Proceedings of the IEEE International Symposium on High Performance Distributed Computing, Chicago, Illinois, July 1998.
3. K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith, S. Tuecke. A Resource Management Architecture for Metacomputing Systems. In Proceedings of the IPPS/SPDP'98 Workshop on Job Scheduling Strategies for Parallel Processing, 1998.
4. J. Cao, D.J. Kerbyson, G.R. Nudd. Use of Agent-Based Service Discovery for Resource Management in Metacomputing Environment. In Proceedings of the 7th International Euro-Par Conference, Manchester, UK, Lecture Notes in Computer Science 2150, Springer-Verlag, pp. 882–886, August 2001.

IMCAG: Infrastructure for Managing and Controlling Agent Grid

Jun Hu and Ji Gao

School of Computing, Zhejiang University, Hangzhou 310027, Zhejiang, China

Abstract. This paper presents IMCAG, an Infrastructure for Managing and Controlling Agent Grid. The goal of IMCAG is to realize distributed service integration on the Internet. Treating service provision as its central task, IMCAG provides transparent integration that makes every kind of service easy to use in heterogeneous, open and dynamic network environments. This paper elaborates on IMCAG from three basic aspects: the framework for information communication in the Agent grid, the core control mechanism of the Agent grid, and individualized adjustment of Agent services. It also discusses an application of IMCAG through a test, and concludes that IMCAG will become more mature and complete as related techniques and grid applications develop.

Keywords: Agent grid, web services, Agent federation

1 Introduction

How to use the various computing resources of the Internet, and how to build virtual organizations on the Internet for cooperative work, are becoming research hotspots. This research focuses on two aspects. (1) Taking Web services as the basic elements of the new-generation Web to realize distributed service integration and cooperative work. However, the control granularity of Web services is too small: they cannot support the systematic construction of systems, or knowledge-level policies for adjusting and controlling their behavior. (2) Work represented by the ABC (Agent-Based Computing) and CoABS [1] (Control of Agent Based Systems) projects, which imports the Agent grid as a means of controlling Web services and has developed the DAML-S [2] language to describe the Web services that Agents can provide. However, most of this research is limited to particular aspects or partial problems of managing and controlling the Agent grid, and it lacks a complete theory and methodology to guide the systematic development of the underlying infrastructure. To address this, this paper presents IMCAG, an Infrastructure for Managing and Controlling Agent Grid. By applying the Agent grid as an upper-level construction over Web services using Agent and MA (Multi-Agent) techniques, IMCAG makes it easy to establish individualized cooperative work systems and knowledge-level policies restricting Agent behavior, and to effectively control the quantity and performance of Agent services. Our study focuses on three fields: the semantics of the information context, the core managing and controlling mechanism of the Agent grid, and transparent managing and controlling of Agent services. Together these form a systematic study of the theory and methodology of the Agent grid in the Web environment.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 161–165, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 IMCAG Architecture

Figure 1 shows the Agent grid architecture. The whole Agent grid is composed of nested AFs (Agent Federations) and ACs (Agent Cooperative groups) [3][4]. An AF consists of one MA (Manage Agent), a number of member Agents and acquaintance Agents. To complete a particular service, the MA and acquaintance Agents, or other AFs, can form an AC through negotiation.

Fig. 1. Agent grid architecture

From the angle of managing and controlling the social behavior of Agents, IMCAG is described as a three-element set: IMCAG = (CE, MM, SI), with MM = (FM, AS, NR). CE----the semantics of the information context: through an ontology-based modeling and expression mechanism, it gives the information exchanged between Agents a clear semantic meaning, and it is the foundation for managing and controlling the Agent grid. MM----the core mechanism for managing and controlling the Agent grid, with three parts: FM--Agent federation management, which manages the whole cooperative process of Agents; AS--the Agent assistant system, which establishes a support system for Agent sociality so that an Agent can conveniently obtain the services it needs anytime and anywhere; NR--Agent rational negotiation, which integrates the expression of negotiation content and reasoning into the negotiation process, so that an Agent can advance a negotiation rationally and flexibly on the basis of Agent social knowledge, negotiation protocols, and a negotiation reasoning and decision-making model, and thereby negotiate more intelligently. SI----the transparent management and control of Agents: by using an interface Agent as intelligent middleware for human-machine interaction, it offers the customer a convenient means to adjust and control Agent services.

IMCAG: Infrastructure for Managing and Controlling Agent Grid


Next, this paper elaborates the logical structure and the key elements of IMCAG at the three levels mentioned above.

3 The Semantics of the Information Context IMCAG describes the semantics of the information context as a five-element set: CE = (OKRL, OML, Mapping, ICMT, OAFM), with OKRL = (WSL, CDL, CPL). OKRL----Ontology Based Knowledge Representation Language, used inside Agents; OML----Ontology Based Markup Language, used as the communication language among Agents; Mapping----the mapping mechanism between OKRL and OML; ICMT----the set of modeling tools; OAFM----the mechanism for automatically forming ontologies.

Fig. 2. IMCAG modeling frame

The modeling frame is shown in Figure 2. OKRL represents the knowledge an Agent needs when it engages in social activity based on Web services, from three aspects: descriptions of the Web services required and offered (WSL); ontologies of the application domain (CDL); and definitions of the policies that restrict Agent behavior (CPL). OML is designed as a restricted form of XML whose descriptive power covers that of OKRL.

4 The Core Mechanism of Managing and Controlling Agent Grid

Agent federation management. Taking activity-sharing-oriented joint intention as the main line [5], the AF manages the whole process of Agent cooperation. The joint intention decomposes an activity into sub-activities and dispatches them to the corresponding Agents. The MA of the AF controls the joint intention through a Recipe, and centrally manages and schedules activity sharing.

Agent assistant system. It is described as a four-element set: AS = (QWS, MSPC, MACM, MAS). QWS----querying Web services; MSPC----Mid-service public center; MACM----Middle Agent cooperation mechanism; MAS----Middle Agents, where each Middle Agent is MA = (WSAR, CMM, MSM): WSAR--Web services advertisement warehouse; CMM--compatible matching mechanism between QWS and WSAR; MSM--mid-service mechanism. The Agent assistant system assists Agents at two levels. The first level is the MSPC, which manages the Middle Agents and recommends mid-service providers; the second level is the MAS, which recommends Web service providers. This two-level assistance is realized by the MACM.

Agent rational negotiation. It is described as a five-element set: NR = (NP, NE, RNC, MMA, IE). NP----negotiation protocols accepted by both parties; NE----the negotiation engine, which advances the negotiation according to the negotiation protocols; RNC----Representation of Negotiation Content, in the description format defined by CDL; MMA----Mental Model of Agent, describing the Agent's social beliefs, domain knowledge, negotiation state information, and reasoning knowledge; IE----Inference Engine, organized in three levels (evaluation, strategy, and tactics), which decides the Agent's negotiation behavior and content. The mental state model lets Agents rationally decide which negotiation behavior and content to adopt; the ontology-based description of negotiation content gives the participating Agents a common semantics for that content; and the negotiation protocols establish the communication rules that participating Agents must obey.
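The paper does not spell out how CMM decides that a query (QWS) is compatible with an advertisement in WSAR. The sketch below is illustrative only and is not the authors' mechanism. It assumes a common capability-matching rule: an advertisement is compatible when it produces every output the query asks for and requires no input the requester cannot supply. All class names, field names, and data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advertisement:
    """A Web service advertisement as it might be kept in WSAR (hypothetical fields)."""
    provider: str
    inputs: frozenset
    outputs: frozenset


@dataclass(frozen=True)
class Query:
    """A QWS request: what the requester can supply and what it needs back."""
    available_inputs: frozenset
    wanted_outputs: frozenset


def compatible(query, ad):
    # Assumed rule: every wanted output is offered and every required input is available.
    return query.wanted_outputs <= ad.outputs and ad.inputs <= query.available_inputs


def match(query, warehouse):
    """Providers with compatible advertisements, fewest unrequested outputs first."""
    hits = [ad for ad in warehouse if compatible(query, ad)]
    hits.sort(key=lambda ad: len(ad.outputs - query.wanted_outputs))
    return [ad.provider for ad in hits]


# Hypothetical warehouse for a conference-arrangement scenario.
warehouse = [
    Advertisement("hotel-agent", frozenset({"city", "date"}), frozenset({"booking"})),
    Advertisement("travel-agent", frozenset({"city", "date"}), frozenset({"booking", "ticket"})),
    Advertisement("visa-agent", frozenset({"passport"}), frozenset({"visa"})),
]
print(match(Query(frozenset({"city", "date"}), frozenset({"booking"})), warehouse))
```

Ranking by the number of unrequested outputs is one simple way to prefer the tightest match. A real MAS could substitute ontology-based subsumption from CDL for the plain set comparison.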

5 The Transparent Managing and Controlling of Agents Service IMCAG expresses the user's intent to adjust Agent services as restricting policies, called customer policies. They are established by the user, defined with the Concept-Definitions of CDL, and obeyed by the Agent federation when it offers Agent services. The customer controls Agent behavior indirectly by establishing customer policies. The adjustment of Agent services is described as a five-element set: SI = (IA, TS, PS, IR, PT). IA----interaction Agent; TS----the set of computing tasks started by the customer; PS----the set of policies established by the customer; IR----the Agent service control mechanism; PT----the mechanism for tracking the Agent service offering process. Through the IA, the customer starts the desired Agent services, specifies the restricting policies the Agent federation must obey when offering them, and tracks the service offering process. In turn, the IA starts tasks that need the customer's cooperation, asks the customer for instructions on difficult problems, and forwards important messages to the customer.

6 Conclusion We have validated IMCAG with a test instance: a small-scale conference arrangement. The test shows that IMCAG establishes an infrastructure for an Agent social grid and provides a solution for cooperative work among virtual organizations on the Internet. IMCAG presents a complete and integrated treatment of the theory and methodology of managing and controlling an Agent grid, and it is intended to become an integration mechanism for various Agent social grid techniques.


References
1. Dylan Schmorrow. Control of Agent-Based Systems (CoABS). http://www.darpa.mil/ipto/programs/coabs/index.htm.
2. DAML Services Coalition. DAML-S: Web Service Description for the Semantic Web. In The First International Semantic Web Conference (ISWC), June 2002.
3. Gao Ji, Lin Donghao. ASOJI: An Agents Based Controlling Integration Method. Pattern Recognition & Artificial Intelligence, 2000, 13(2): 151-158.
4. Gao Ji, Wang Jin. ABFSC: An Agents-Based Framework for Software Composition. Journal of Computers, 1999, 21(10): 1050-1058.
5. Gao Ji, Lin Donghao. Agent Cooperation Based Control Integration by Activity-Sharing and Joint Intention. JCST, 2002, 17(3): 331-340.

A Resource Allocation Method in the Neural Computation Platform

Zhuo Lai, Jiangang Yang, and Hongwei Shan
Department of Computer Science, Zhejiang University, Hangzhou, China 310027

Abstract. A resource management framework was designed for a neural computation platform based on Grid technology. The Metacomputing Directory Service (MDS) in the Globus toolkit was employed to locate resources in the Grid. A semi-structured data model was adopted to encapsulate the schema, data, and queries of tasks and resources. Tasks were sorted beforehand by their position in the task hierarchy and their waiting time, and then chose compatible computing nodes in that order.

1 Introduction Many scientists have proposed neural networks as a solution to various problems. NCP (Neural Computation Platform) was developed to relieve the burden of implementing every kind of neural network model from scratch. Because training a particular neural network involves a huge amount of data, we used the idea of Grid Computing [1] to build a distributed system. The purpose of this paper is to present the autonomous resource allocation method used in NCP.

Figure 1 shows an overview of the resource management system in NCP. The Metacomputing Directory Service (MDS) [3,4] provides information services in the Globus project [2], and the platform status can be listed by querying MDS. GRIS and GIIS are both MDS components of the Globus toolkit. The GIIS combines multiple GRIS services and presents a consistent image of the Grid resources, which simplifies queries from Grid applications. The GIIS keeps a service cache: resources can register themselves either through a GRIS or directly with the GIIS, and platform users send resource requests to the GIIS. If the cache has expired, the GIIS acquires the latest information from the GRIS.

The Resource Information Collecting procedure (Figure 1) periodically acquires the latest resource information from MDS and stores it in the Resource Information database; the Task Status Collecting procedure (Figure 1) likewise stores task status in the Task Status database. Two queues are maintained: map-ready-queue and map-urgent-queue. A task activation program checks the tasks, and once a task's required data has been acquired, the task is placed into map-ready-queue, where it waits for resource allocation. When matching starts, the weight of each task in the waiting queue is calculated first, and resource allocation follows that weight. If no proper computing node is found, the task stays in the queue but its weight is incremented; once its weight exceeds a limit, the task is placed into map-urgent-queue.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 166–169, 2004. © Springer-Verlag Berlin Heidelberg 2004

Fig. 1. Resource management system in the computation

2 Sorting and Mapping Algorithms 2.1 Data Model NCP adopts a semi-structured data model called Classified Advertisement (Classad) [5]. Classad is flexible and extensible, and it encapsulates resource queries in the data model itself. Figure 2 shows the formal representation of Classad in our platform [5]. A Classad may contain the following items: attributes, a constraint, and a rank. Constraints apply in both directions between computing nodes and tasks: each side can set limits on the other. The rank expresses a task's definition of QoS, so users can define different QoS rules for different circumstances. The most outstanding characteristic of Classad is that it allows computing nodes to define their own policies; a computing node rejects any task that conflicts with its policies.

Fig. 2. Formalized description of Classad

2.2 Pre-sorting Tasks NCP is a distributed system in which several sub-tasks of a particular task may run on different computing nodes. These sub-tasks are data-dependent, and their relations can be depicted by a Directed Acyclic Graph (DAG) [6]; the sub-tasks are executed in the order given by the DAG. A later sub-task gets a chance to execute only after the earlier sub-tasks it depends on have finished, so earlier sub-tasks should get more chances to choose proper computing nodes. NCP therefore defines DAGP, which represents the priority of each sub-task in the DAG. An exit node set (ens) contains the lowest-level nodes in the DAG. For each node si not at the lowest level, an immediate successor set (iss) contains the nodes immediately below si that depend on its data. DAGP is then calculated for each si as follows:
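The following is a reconstruction that agrees with the description of DAGP as the distance from a sub-task to the exit. It is offered as an assumption, not as the authors' exact expression. Exit nodes are taken to score 0, and the distance is measured along the longest path to an exit, as in the usual bottom-level priority:

```latex
\mathrm{DAGP}(s_i) =
\begin{cases}
0, & s_i \in \mathit{ens},\\[4pt]
1 + \max\limits_{s_j \in \mathit{iss}(s_i)} \mathrm{DAGP}(s_j), & \text{otherwise.}
\end{cases}
```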

In effect, the DAGP value of a sub-task is the distance from its position in the DAG to the exit. NCP also defines a Waiting Time Priority (WTP): if task T fails in a match, it stays in the queue to wait for its next chance, and its WTP value is shifted left by one bit.

The t in expressions 3 and 4 above is the number of matches that have been attempted for task si. Because WTP grows with every failed match, a task cannot be starved of resources merely because its priority is low. WTP also complements Classad: Classad gives computing nodes the power to reject tasks, and that power could otherwise prevent a task from ever finding a proper node. Therefore, in NCP, once WTP exceeds a predefined value, the task is placed on the node it chooses.
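Both priorities can be sketched in a few lines of Python. This is a minimal illustration, not the NCP code. It assumes that DAGP is the longest hop distance to an exit node (exit nodes score 0) and that WTP starts at 1. The function names and the example DAG are hypothetical.

```python
from functools import lru_cache


def dagp_values(successors):
    """DAGP of every sub-task in a DAG.

    successors: {sub-task: immediate successor set (iss)}; sub-tasks with
    no successors form the exit node set (ens).
    Assumed: exit nodes score 0 and DAGP is the longest hop distance to an exit.
    """
    nodes = set(successors)
    for iss in successors.values():
        nodes.update(iss)

    @lru_cache(maxsize=None)
    def dagp(task):
        iss = successors.get(task, ())
        if not iss:                      # task is in ens
            return 0
        return 1 + max(dagp(s) for s in iss)

    return {task: dagp(task) for task in nodes}


def wtp(failed_matches, initial=1):
    """WTP after t failed matches: shifted left one bit per failure (initial value assumed)."""
    return initial << failed_matches


dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}   # hypothetical task graph
print(sorted(dagp_values(dag).items()))  # [('A', 2), ('B', 1), ('C', 1), ('D', 0)]
print(wtp(3))                            # 8
```

Because the shift doubles WTP on each failure, even an exit sub-task with DAGP 0 overtakes the deepest sub-task after a handful of failed matches. This is the anti-starvation effect described above.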

2.3 Mapping Algorithm Since both a task and a computing node own a Classad, the first match tests the compatibility of the two Classads: a task can run only on a node whose Classad does not conflict with its own. In a Classad, self.attribute refers to an attribute of the Classad's own entity, while other.attribute refers to an attribute of the other party. For example, in a task's Classad, other.Memory refers to the Memory attribute of the computing node against which compatibility is being tested, and self.Datasize refers to the Datasize attribute of the task itself. Two Classads are compatible if and only if both constraints evaluate to true. The priority of task si is the sum of its DAGP and WTP, and tasks with higher priority choose nodes earlier.
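A sketch of the symmetric Classad test and the priority-ordered mapping follows. Real Classads are expressions evaluated by a matchmaker [5]. As a simplifying assumption, each constraint and rank here is a plain Python function of (self attributes, other attributes), and the class and function names are hypothetical. Tasks are taken in descending DAGP + WTP order, and each one takes the free compatible node that its rank prefers. A task with no compatible node stays queued.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Attrs = Dict[str, float]


@dataclass
class ClassAd:
    attrs: Attrs
    constraint: Callable[[Attrs, Attrs], bool]                        # (self, other) -> bool
    rank: Callable[[Attrs, Attrs], float] = lambda mine, other: 0.0  # higher is better


def compatible(a: ClassAd, b: ClassAd) -> bool:
    """Compatible iff both constraints hold, each evaluated with the other ad as `other`."""
    return a.constraint(a.attrs, b.attrs) and b.constraint(b.attrs, a.attrs)


def map_tasks(tasks: Dict[str, ClassAd], priority: Dict[str, float],
              nodes: Dict[str, ClassAd]) -> Dict[str, Optional[str]]:
    """Greedy mapping: higher-priority tasks pick their best-ranked compatible free node first."""
    free = dict(nodes)
    placement: Dict[str, Optional[str]] = {}
    for name in sorted(tasks, key=lambda t: priority[t], reverse=True):
        task = tasks[name]
        fits = [n for n, ad in free.items() if compatible(task, ad)]
        if fits:
            best = max(fits, key=lambda n: task.rank(task.attrs, free[n].attrs))
            placement[name] = best
            del free[best]
        else:
            placement[name] = None  # stays in the queue; its WTP grows
    return placement


tasks = {
    # t1 needs memory of at least twice its data size and prefers more memory.
    "t1": ClassAd({"Datasize": 512}, lambda me, o: o["Memory"] >= 2 * me["Datasize"],
                  lambda me, o: o["Memory"]),
    "t2": ClassAd({"Datasize": 100}, lambda me, o: o["Memory"] >= me["Datasize"]),
}
nodes = {
    "n1": ClassAd({"Memory": 512}, lambda me, o: True),
    # n2's own policy: reject tasks whose data exceeds 1024.
    "n2": ClassAd({"Memory": 2048}, lambda me, o: o["Datasize"] <= 1024),
}
print(map_tasks(tasks, {"t1": 3, "t2": 1}, nodes))  # {'t1': 'n2', 't2': 'n1'}
```

The node-side constraint on n2 shows the node policy described in Section 2.1. A task larger than 1024 would be refused by n2 no matter how much n2's memory appealed to it.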

If p tasks are waiting for resources and q nodes are available, Ti (i …

… 1. D(root,m) > D(p,m): the distance between the root and the new member is greater than the distance between the proxy and the new member.
2. Among all proxies that satisfy condition 1 (there can be more than one), choose the ones nearer to the new member.
3. Path(Parent(p),m) contains Path(p,m): the path between the proxy and the member is a prefix of the path between the proxy's parent and the member. This path need not be the shortest path.
In Fig. 2.a, Fig. 2.b and Fig. 2.c, S is the source, R1, R2, R3, ... are routers, and P1, P2, ... are proxies joining the overlay in the order 1, 2, .... A joining node can cause the following cases.
1. Simple direct join: the new proxy Pn finds a proxy P that is very near to it, and introducing Pn into the overlay does not affect the rest of the tree structure. Fig. 2.a depicts this scenario: the algorithm selects S as the parent of P1, and when P2 also selects S as its parent, the overlay relation between S and P1 is unchanged.
2. The new proxy selects a parent P but itself lies on the path between P's parent and P. In Fig. 2.b, the algorithm first selects P1 as the parent of P3 as per condition 1, but to meet condition 3 it has to reorder the parent-child relationship, i.e., P3 becomes the parent of P1 and the child of R.
3. The new proxy becomes the parent of some of the children proxies of P. In Fig. 2.c, P4 selects S as its parent, and since P4 is a child of S and lies on the path between S and P3, P4 takes over as the parent of P3.
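Conditions 1 to 3 can be checked mechanically once routes are known. The sketch below assumes that the router-level path from every proxy (root included) to the new member m is already available, for example from the root's view of the topology. It reads "Path(Parent(p),m) contains Path(p,m)" as "the parent's path to m ends with p's own path to m". All names are hypothetical.

```python
def hops(path):
    """Hop count of a path given as a list of nodes."""
    return len(path) - 1


def contains(outer, inner):
    """Assumed reading of condition 3: `outer` ends with `inner`, i.e. the
    parent's path to m runs through p and then follows p's own path to m."""
    return len(inner) <= len(outer) and outer[len(outer) - len(inner):] == inner


def candidate_parents(root, parent_of, path_to_m):
    """Existing proxies eligible as the new member m's parent, nearest first.

    parent_of: overlay parent of every non-root proxy.
    path_to_m: node list from each proxy (root included) to m.
    """
    d_root = hops(path_to_m[root])
    eligible = [p for p in parent_of
                if hops(path_to_m[p]) < d_root                          # condition 1
                and contains(path_to_m[parent_of[p]], path_to_m[p])]    # condition 3
    return sorted(eligible, key=lambda p: hops(path_to_m[p]))           # condition 2


# Hypothetical line S - R1 - P1 - R2 - m, with P2 hanging off R1.
paths = {"S": ["S", "R1", "P1", "R2", "m"],
         "P1": ["P1", "R2", "m"],
         "P2": ["P2", "R1", "P1", "R2", "m"]}
print(candidate_parents("S", {"P1": "S", "P2": "S"}, paths))  # ['P1']
```

An empty result would leave the root as the parent. That fallback is an assumption, because the conditions as stated do not cover it.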

Fig. 2.a

Fig. 2.b

Fig. 2.c

Appcast – A Low Stress and High Stretch Overlay Protocol

Fig. 2.a1

Fig. 2.b1


Fig. 2.c1

2.1.2 Appcast Topology Creation Algorithms
Any new proxy joining the group sends a join message to the root. The root invokes the function "FindNearestProxy", which returns the proxy closest to the new node, and then calls "FindRelations" to fix the relationships between the new proxy and the others in the overlay tree. The well-known Dijkstra's algorithm finds the shortest paths from a source to any or all destinations (vertices) in a graph. It keeps two sets of vertices: 1. the vertices whose shortest distance from the source is already settled, and 2. the remaining vertices. The algorithm terminates once the required destination enters the first set, when it must find the path between a source and one destination, or once the second set becomes empty, when it must find paths from the source to all destinations. To find the nearest proxy, we keep one more set of vertices, the set of all proxies, and modify the algorithm so that it terminates as soon as it settles any vertex that belongs to this set. The complexity of Dijkstra's algorithm is based on two operations: 1. find-minimum, with complexity O(N), and 2. label update, with complexity O(m). In addition, our algorithm must check whether the selected minimum vertex belongs to the set of proxies, which has complexity O(k). These three steps are performed N times, so the total complexity is O(N(N + m + k)). Dijkstra's algorithm has been implemented in many ways, for example with binary heaps, to reduce this complexity and achieve better performance; the same techniques apply to this modified algorithm.
2.1.3 Appcast Optimization
In Appcast, a proxy joining the multicast group selects the very first proxy that it comes across while finding the path from itself to the root.
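The first-proxy rule corresponds to the modified Dijkstra search of Section 2.1.2 (FindNearestProxy): starting from the joining node, the search stops at the first proxy it settles. The sketch below uses a binary heap from Python's standard `heapq` module. The graph representation, the function name, and the example network are assumptions.

```python
import heapq


def find_nearest_proxy(graph, start, proxies):
    """Dijkstra from the joining node `start`, stopping at the first proxy settled.

    graph: {vertex: {neighbour: link_cost}}.
    Returns (proxy, distance), or (None, inf) if no proxy is reachable.
    """
    dist = {start: 0}
    settled = set()
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue                      # stale heap entry
        settled.add(u)
        if u in proxies and u != start:
            return u, d                   # first proxy settled is the nearest one
        for v, cost in graph.get(u, {}).items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None, float("inf")


# Hypothetical network: joining node M, routers R1/R2, proxies P1/P2.
net = {"M": {"R1": 1, "R2": 4}, "R1": {"M": 1, "R2": 1, "P1": 5},
       "R2": {"M": 4, "R1": 1, "P2": 2}, "P1": {"R1": 5}, "P2": {"R2": 2}}
print(find_nearest_proxy(net, "M", {"P1", "P2"}))  # ('P2', 4)
```

The heap replaces the O(N) linear find-minimum, and a set lookup makes the proxy-membership test O(1) instead of O(k).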
This first-proxy rule ensures that the path length from the joining proxy to its parent proxy is less than the path length from the joining proxy to the root. However, the actual path length from the root to the new proxy along the overlay, through the intermediate proxies, can be higher. The performance results clearly showed that Appcast uses very few links (hops) overall, but also that it has the longest application-level path lengths and the highest stretch. To keep both stretch and stress at an optimum level, the Appcast_opt algorithm is proposed, in which a joining proxy can specify how many children it can accept (stress) and how much stretch (delay) it can bear.

V. Radha, V.P Gulati, and A.K Pujari

2.1.4 Control and Data Paths
The root keeps information about all proxies. Every proxy keeps information about its parent and its children, and also tracks its children and its parent through heartbeats. If a proxy goes down, its children immediately contact the root and try to attach to the parent of the failed proxy; the root tells the children who their new parent is, keeping all constraints satisfied. Likewise, whenever a new proxy joins or leaves, the root informs a few other proxies to change their relationships so that the constraints remain satisfied. In Appcast, data can flow both bottom-up and top-down across the topology tree. To avoid loops, a proxy that receives a packet checks whom it received it from and forwards it selectively. If the packet came from its parent, it forwards the packet to all its children. If it came from one of its children, it forwards the packet to its own parent and to all its children except the one it came from, as in Algorithm 5 – MulticastForward given in Table 1.
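The forwarding rule can be sketched as follows. This illustrates the rule stated in the text and is not a transcription of Algorithm 5. The function names and the dictionary-based tree representation are assumptions. `deliver` floods one packet from any proxy, and the example shows that each proxy receives it exactly once.

```python
def multicast_forward(node, sender, parent_of, children_of):
    """Next hops for a packet that `node` received from `sender`."""
    parent = parent_of.get(node)              # None at the root
    children = children_of.get(node, [])
    if sender == parent:                      # came from the parent: send down only
        return list(children)
    # Came from a child (or originated here): send up, and down to the other children.
    targets = [c for c in children if c != sender]
    if parent is not None:
        targets.append(parent)
    return targets


def deliver(origin, parent_of, children_of):
    """Flood one packet from `origin`; returns proxies in the order they receive it."""
    received = [origin]
    pending = [(n, origin) for n in multicast_forward(origin, origin, parent_of, children_of)]
    while pending:
        node, sender = pending.pop(0)
        received.append(node)
        pending += [(n, node) for n in multicast_forward(node, sender, parent_of, children_of)]
    return received


# Hypothetical tree: S -> P1, P2 and P1 -> P3; P3 originates a packet.
parent_of = {"P1": "S", "P2": "S", "P3": "P1"}
children_of = {"S": ["P1", "P2"], "P1": ["P3"]}
print(deliver("P3", parent_of, children_of))  # ['P3', 'P1', 'S', 'P2']
```

Because a tree has no cycles and a packet never goes back to the neighbour it came from, no extra duplicate suppression is needed.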

3 Other Application Layer Multicast Protocols

A topology is created for two general purposes: 1. to distribute data packets and 2. to send control information for managing the topology. Some protocols use the same topology for both purposes, while others use separate topologies such as a tree and a mesh. ESM [2], YOID [18], Scattercast [3] and Overcast [4] create both mesh and tree topologies, with the mesh used for control and the tree for data distribution. HMTP [12] and TAG create a tree for both control and data distribution. NICE arranges the hosts into a hierarchy of clusters. All these protocols take proximity metrics such as RTT (round-trip time), shortest path, or maximum common path overlap into account when creating the topology. We compare Appcast with HMTP, TAG and NICE, and therefore describe these three protocols in this section.

3.1 Host Multicast – HMTP

HMTP [12] creates a group-specific tree as the multicast overlay topology. In HMTP, each multicast group requires a Host Multicast Rendezvous Point that acts as the contact point for new members joining the group. HMTP clusters nearby members together: each member chooses a parent close to it using the following procedure.
1. The new member sets the root as its potential parent (PP) and contacts PP.
2. It queries PP to discover all of PP's children, and measures its distance to PP and to PP's children.
3. It finds the nearest member among PP and PP's children, excluding those marked invalid. If all of them are marked invalid, it pops the top element from the stack, sets it as PP, and returns to step 2.
4. If the nearest member is not the current PP, it pushes the current PP onto the stack, sets the nearest member as the new PP, and returns to step 2.
5. Otherwise, it sends a join request to PP. If PP rejects it (PP may refuse a child for reasons such as its out-degree limit), it marks PP as invalid and returns to step 3; if PP accepts, PP becomes its parent and a unicast path is established.
HMTP also proposes algorithms for member leave, link failure, and tree improvement. In HMTP, every member keeps track of every member on the path between itself and the root, and the average control overhead of HMTP is O(max degree), i.e., the maximum number of children a node has.
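The HMTP join procedure above can be sketched as follows. The sketch assumes that the distance from the new member to every existing member is available through a callable `distance`, and that `accepts(pp)` reports whether a potential parent admits another child. Both are hypothetical stand-ins for real measurements and join messages.

```python
def hmtp_join(root, children_of, distance, accepts):
    """Parent search for a new member, following steps 1-5 of HMTP.

    children_of: overlay tree as {member: [children]}.
    distance(x): measured distance from the new member to member x (assumed given).
    accepts(x): whether x admits one more child (e.g. out-degree limit).
    Returns the chosen parent, or None if every member refused.
    """
    stack, invalid = [], set()
    pp = root                                         # step 1
    while True:
        candidates = [pp] + children_of.get(pp, [])   # step 2: PP and its children
        valid = [c for c in candidates if c not in invalid]
        if not valid:                                 # step 3: all invalid, back-track
            if not stack:
                return None
            pp = stack.pop()
            continue
        nearest = min(valid, key=distance)            # step 3: nearest valid member
        if nearest != pp:                             # step 4: move down the tree
            stack.append(pp)
            pp = nearest
            continue
        if accepts(pp):                               # step 5: join request accepted
            return pp
        invalid.add(pp)                               # rejected: mark invalid and retry


tree = {"S": ["A", "B"], "A": ["C"]}                  # hypothetical overlay
dist = {"S": 10, "A": 4, "B": 6, "C": 2}
print(hmtp_join("S", tree, dist.get, lambda p: p != "C"))  # A (C is full)
```

After a rejection the loop re-queries PP's children before reapplying step 3. This gives the same candidates, so it matches the "return to step 3" rule.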

3.2 NICE NICE [13,15,16] claims a relatively small control overhead. Its motivation actually comes from key distribution in secure group communication. NICE arranges a set of end hosts into a hierarchy, and the hierarchy implicitly defines the data path. Each member maintains soft-state information about members that are near it in the hierarchy and has only limited knowledge about other members. In NICE, all members belong to layer L0. Members are grouped into clusters of size between K and 3K-1, where K is a constant. In each cluster, one member acts as the leader and joins the next higher layer, so a member belongs to layer Li only if it is a leader in every layer below Li. The cluster leader is the member with the minimum maximum distance to all other members of its cluster. A host belongs to only a single cluster at any layer, and if a host is not present in layer Li it cannot be present in any layer Lj with j > i. For a group of size N and cluster size K, there can be at most log_K N layers. Each member maintains information about every other member of its own cluster in each of its layers. NICE first constructs an overlay tree based on the underlying network topology. Next, it uses a clustering protocol that traverses the overlay tree bottom-up to group the members into clusters of size K to 3K-1 and to arrange the clusters into a hierarchy. This clustering reduces the depth of the tree and keeps the control overhead constant. As the cluster size increases, unicast within a cluster may increase. NICE does not let a joining member choose its leader. Because NICE keeps the cluster size bounded by a constant, its control overhead is constant, and it can deliver data to the members in at most O(log N) application hops.
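The leader rule (minimum maximum distance, i.e. the center of the cluster) is short to state in code. This is a minimal sketch that assumes pairwise distances are available through a callable `dist`. The function name and the example hosts are illustrative.

```python
def cluster_leader(members, dist):
    """NICE-style leader: the member whose largest distance to the rest is smallest."""
    def eccentricity(a):
        return max((dist(a, b) for b in members if b != a), default=0)
    return min(members, key=eccentricity)


# Hypothetical cluster of four hosts on a line; distance = |position difference|.
pos = {"a": 0, "b": 1, "c": 2, "d": 5}
print(cluster_leader(list(pos), lambda x, y: abs(pos[x] - pos[y])))  # c
```

Host c is at most 3 away from any other member, while b, a and d have worst-case distances of 4, 5 and 5. Choosing the center keeps the leader's worst-case unicast within the cluster as short as possible.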

3.3 TAG – Topology Aware Group Communication

TAG uses information about path overlap among group members to construct the overlay tree. In TAG, each new member of a multicast group determines the path from the root to itself and finds its parent and children by partially traversing the overlay tree. TAG proposes a complete path matching algorithm, in which a new node selects as its parent the node that shares the longest common path with it. Each TAG node maintains a Family Table with information about its parent and children. The path-matching algorithm traverses the overlay tree from the root down through the children, matching the path from the root to the new node against the path from the root to each TAG node. It considers three mutually exclusive cases. Let N be a new member wishing to join and C the node being examined:
1. There exists a child A of C whose path is a prefix of the path of N, with path lengths N > A > C. In this case N chooses node A and traverses the sub-tree rooted at A.
2. There exist children of C whose paths have the path of N as a prefix. In this case N becomes a child of C, with all such children as its own children.
3. If no child of C satisfies case 1 or 2, N becomes a child of C.
As an optimization, TAG proposes a partial path matching algorithm, in which only a predefined number of elements of a new member's path are matched instead of the complete path. This helps reduce the depth of the tree.
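A sketch of complete path matching follows. It assumes that each node's router-level path from the root is known as a list and that the overlay is given as a children map. The names are hypothetical, and the partial-matching optimization is omitted.

```python
def is_prefix(a, b):
    """True if list `a` is a prefix of list `b`."""
    return len(a) <= len(b) and b[:len(a)] == a


def tag_join(new, path, children_of, root):
    """Complete path matching: returns (parent, adopted_children) for `new`.

    path[x]: router-level path from the root to x; children_of: overlay tree.
    """
    c = root
    while True:
        kids = children_of.get(c, [])
        # Case 1: a child whose path is a proper prefix of new's path -> descend.
        nxt = next((a for a in kids
                    if is_prefix(path[a], path[new]) and len(path[new]) > len(path[a])),
                   None)
        if nxt is not None:
            c = nxt
            continue
        # Case 2: children whose paths extend new's path are adopted by new.
        adopted = [a for a in kids if is_prefix(path[new], path[a])]
        # Case 3 (nothing adopted): new simply becomes a child of c.
        return c, adopted


# Hypothetical overlay S -> A -> B, where N's path ends one router before B's.
path = {"S": [], "A": ["R1"], "B": ["R1", "R2", "R3"], "N": ["R1", "R2"]}
print(tag_join("N", path, {"S": ["A"], "A": ["B"]}, "S"))  # ('A', ['B'])
```

In the example, N descends to A by case 1 and then adopts B by case 2, which inserts N between A and B.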

4 Comparative Study

The evaluation criteria for multicast protocols are defined in terms of stretch, stress, and control overhead. Stretch is defined per member as the ratio of the path length from the source to the member along the overlay to the length of the direct unicast path. Stress is defined per link or node as the number of identical packets the protocol sends over that link or node. Control overhead is the extra computation required to maintain the topology. A native multicast protocol achieves unit stress and unit stretch. Application level multicast protocols cannot achieve this, so they try to balance the two metrics. Reducing stress balances the load at nodes but may increase stretch, and reducing stretch may increase stress, so the two metrics trade off against each other. Protocols with no knowledge of the underlying topology (CAN, Bayeux, DT protocol, etc.) perform poorly and mainly help share and distribute the source's load across the members. Mesh-based protocols such as ESM and Yoid suffer from control overhead and are not suitable for large groups. Tree and hierarchical topologies such as HMTP, TAG and NICE keep the control overhead contained while still performing well. The following table gives an intuitive comparison of these metrics.
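The stretch and stress definitions above can be computed directly once overlay edges are mapped onto unicast routes. This is a minimal sketch that assumes hop-count (BFS) routing over an adjacency map and an overlay given as a child-to-parent map. All names are illustrative. The example compares a chain overlay with a star overlay on the same small network to show the trade-off between the two metrics.

```python
from collections import Counter, deque


def route(graph, u, v):
    """Hop-count shortest path from u to v (BFS) as a list of nodes."""
    prev = {u: None}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            break
        for y in graph[x]:
            if y not in prev:
                prev[y] = x
                queue.append(y)
    path = [v]
    while path[-1] != u:
        path.append(prev[path[-1]])
    return path[::-1]


def link_stress(graph, parent_of):
    """Identical packets carried by each physical link when every overlay
    edge (parent -> child) is sent as a separate unicast."""
    stress = Counter()
    for child, parent in parent_of.items():
        path = route(graph, parent, child)
        for a, b in zip(path, path[1:]):
            stress[frozenset((a, b))] += 1  # links are undirected
    return stress


def member_stretch(graph, member, source, parent_of):
    """Hops along the overlay from source to member / direct unicast hops."""
    overlay_hops, node = 0, member
    while node != source:
        overlay_hops += len(route(graph, parent_of[node], node)) - 1
        node = parent_of[node]
    return overlay_hops / (len(route(graph, source, member)) - 1)


# Small network: source S and members A, B all hang off router R.
net = {"S": ["R"], "R": ["S", "A", "B"], "A": ["R"], "B": ["R"]}
chain = {"A": "S", "B": "A"}  # B receives via A
star = {"A": "S", "B": "S"}   # S unicasts to both
print(link_stress(net, chain)[frozenset(("S", "R"))], member_stretch(net, "B", "S", chain))  # 1 2.0
print(link_stress(net, star)[frozenset(("S", "R"))], member_stretch(net, "B", "S", star))    # 2 1.0
```

The chain relieves the source's access link (stress 1 instead of 2) at the cost of doubling B's stretch. The star does the opposite.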

4.1 Simulation and Results For comparison we considered only TAG, NICE and HMTP, since our future work will be based on tree topologies. Figures 3b, 3c and 3d show the overlay topologies these protocols create for the network shown in Figure 3a, with routers R1, R2, R3, ..., R10 and nodes S, A1, A2, ..., A5, joining in the order A3, A4, A5, A1, A2. Because TAG chooses its parent by the longest match over the shortest path from node to root, A3 selects A2 as its parent even though A1 is nearer to it. Join order matters a lot for HMTP's performance: since A1 joined last, it simply took the member nearest to it, A3, as its parent, without checking whether A1 itself lies on the path between S and A3. NICE groups nearby members into clusters and arranges these clusters into a hierarchy. We used Boston University's network topology generator BRITE for our experiments. BRITE generates different kinds of network topologies based on flat router-level models (Router Waxman, Router Barabási-Albert), flat AS-level models (AS Waxman, AS Barabási-Albert), and hierarchical models (Transit-stub, Tiers). First, we generated 100 nodes with an AS model and assigned 20 hosts to these nodes. In this experiment, HMTP showed higher stress and lower stretch, TAG showed even higher stress and little or no stretch, and NICE, with the cluster size fixed to 3, showed results almost identical to HMTP. Similar experiments were conducted on a network topology with 1000 nodes and varying group memberships; Figures 4a, 4b, 4c and 4d show the results. HMTP used fewer hops overall, while TAG used almost as many hops as unicast. This is because a TAG node selects the parent whose shortest path overlaps most with its own; in other words, TAG does not consider alternative paths. As a result, almost all nodes select the source itself as their parent, and very few get a parent other than the source. For the same reason, TAG's application-level hops are almost the same as unicast. NICE used fewer hops overall than TAG, but more application-level hops than TAG and HMTP, because within clusters NICE uses plain unicast among the cluster members and cluster leaders; as the group size increases, NICE's application-level hops increase sharply. Appcast used the fewest hops overall, but the most application-level hops, because it has no mechanism to control the tree depth. For this reason an optimized version of Appcast has been proposed, in which each joining host can specify a stretch parameter, the ratio between unicast hops and application-level hops. With respect to stretch, TAG showed low stretch and Appcast showed high stretch; with respect to stress, NICE showed low stress and TAG showed high stress.

Fig. 3a.   Fig. 3b. TAG   Fig. 3c. NICE   Fig. 3d. HMTP   Fig. 3e. Appcast

Fig. 4a.   Fig. 4b.   Fig. 4c.   Fig. 4d.

5 Conclusions and Future Work

The proposed application level multicast protocols differ mainly in how they create the overlay topology and distribute data over it. Studying the existing protocols, we found that mesh-based systems are complex to maintain, while tree-based systems give good performance with less control overhead. In both tree-based systems, TAG and HMTP, a new joining node traverses the tree from the root down through the children. TAG relies on the shortest path, whereas HMTP relies on the shortest distance, and these features may sometimes lead to overlapping links. We proposed a new method that allows a joining node to select a parent that is on its way to the source. Building on this new topology-building algorithm, we plan to use SOAP as an application level transport mechanism and to implement selected applications.

References
[1] Deering, S. and Cheriton, D. Multicast Routing in Datagram Internetworks and Extended LANs. ACM Transactions on Computer Systems 8, 2 (May 1990).
[2] Chu, Y.-H., Rao, S., and Zhang, H. A Case for End System Multicast. In Proceedings of ACM SIGMETRICS ’00 (Santa Clara, CA, June 2000).
[3] Chawathe, Y. Scattercast: An Architecture for Internet Broadcast Distribution as an Infrastructure Service. PhD thesis, University of California, Berkeley, Dec. 2000.
[4] Jannotti, J., Gifford, D.K., and Johnson, K.L. Overcast: Reliable Multicasting with an Overlay Network. In Proceedings of the 4th Symposium on Operating System Design and Implementation (OSDI) (San Diego, CA, Oct. 2000), USENIX.
[5] Raghavan, P. Beyond Web Search Services. IEEE Internet Computing, Mar.-Apr. 2001.
[6] Yianilos, P.N. and Sobti, S. The Evolving Field of Distributed Storage. IEEE Internet Computing, Sept.-Oct. 2001.
[7] Cheriton, D. and Deering, S. Host Groups: A Multicast Extension for Datagram Internetworks. Data Commun. Symp., Sept. 1985, pp. 172-179.
[8] Shaikh, A., Goyal, M., Greenberg, A., Rajan, R., and Ramakrishnan, K.K. An OSPF Topology Server: Design and Evaluation, 2001. http://www.cis.ohio-state.edu/mukul/research.html.
[9] Pendarakis, D., et al. ALMI: An Application Level Multicast Infrastructure. 3rd USENIX Symp. Internet Tech. and Sys., Mar. 2001.
[10] Liebeherr, J., Nahas, M., and Si, W. Application-Layer Multicast with Delaunay Triangulations. IEEE GLOBECOM ’01; also tech. rep. CS-2001-26, Nov. 2001.
[11] Zhuang, S., et al. Bayeux: An Architecture for Scalable and Fault-Tolerant Wide-Area Data Dissemination. 11th Int’l. Wksp. Net. and Op. Sys. Support for Digital Audio and Video, June 2001.
[12] Zhang, B., Jamin, S., and Zhang, L. Host Multicast: A Framework for Delivering Multicast to End Users. IEEE INFOCOM ’02, New York, NY, June 2002.
[13] Banerjee, S., Bhattacharjee, B., and Kommareddy, C. Scalable Application Layer Multicast. ACM SIGCOMM ’02, Pittsburgh, PA, Aug. 2002.
[14] Castro, M., et al. Scribe: A Large-Scale and Decentralized Application-Level Multicast Infrastructure. IEEE JSAC, 2002.
[15] Banerjee, S. and Bhattacharjee, B. Analysis of the NICE Application Layer Multicast Protocol. Technical report UMIACS-TR 2002-60 and CS-TR 4380, Department of Computer Science, University of Maryland, College Park, June 2002.
[16] Banerjee, S. and Bhattacharjee, B. Scalable Secure Group Communication over IP Multicast. In Proceedings of International Conference on Network Protocols, Nov. 2001.
[17] Ratnasamy, S., Handley, M., Karp, R., and Shenker, S. Application-Level Multicast Using Content-Addressable Networks. In Proceedings of 3rd International Workshop on Networked Group Communication, Nov. 2001.
[18] Francis, P. Yoid: Extending the Multicast Internet Architecture, 1999. White paper, http://www.aciri.org/yoid/.
[19] Kwon, M. and Fahmy, S. Topology Aware Group Communication. In NOSSDAV ’02, May 12-14, 2002, Miami, Florida, USA.

Communication Networks: States of the Arts
Xiaolu Zuo
Splendidsky Networkings FreeResearch, United Kingdom

Abstract. This paper presents the states of the arts of communication networks. First, the fundamentals of communication systems are presented, particularly those of data/computer communications and ISDN networks. Then, the latest developments in communication networks are highlighted, including active/programmable networks, networking for ubiquitous computing, ad hoc networking, and autonomic computing for network infrastructures. Finally, outlooks are summarized.

1 Introduction
New paradigms of networking have been constantly emerging, e.g., intelligent networks [1] [2], next generation networks [3] [4], and active networks [5] [6]. They require that network management enable the integrated management of highly complex, heterogeneous network infrastructures (wired, wireless, ad hoc, edge, GRID) and realize ubiquitous user connectivity, i.e., anywhere, anytime, on any device and for any service content.

2 Network Fundamentals
Communication is the exchange of information between individuals or machines over distance. A basic model of communication is depicted in Fig. 1.

Fig. 1. Generic model of communication systems

In terms of information, there are voice communication, image communication, data communication, and multimedia (voice, text, characters, symbols, images, graphics, data, etc.) communication. In terms of signals, there are analogue communication and digital communication. In terms of services, there are telegram, telephone, telefax, data communication, broadcasting, TV, navigation, tele-sensing, telemetering, remote control, video conference, etc. In terms of transmission media, there are tethered communication (twisted pairs, shielded twisted pairs, coaxial cables, fiber optic cables) and wireless communication (through the air (microwave, mobile, pager), through space (satellite), and sometimes through water). In terms of bandwidth, there are narrow-band communication and broadband communication. The bandwidth of a telecommunication system determines the type and amount of information that can be transmitted.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 372–379, 2004. © Springer-Verlag Berlin Heidelberg 2004

2.1 Data Communication Networks
Data communication networks are usually classified by their size and complexity, e.g., Local Area Networks (LANs), Metropolitan Area Networks (MANs), and Wide Area Networks (WANs). Compound topologies are widely used, as depicted in Fig. 2.

Fig. 2. Network segments are interconnected via telecommunication network

The Internet is a huge collection of computer networks at local, national and international levels: a combination of LANs, telecommunications trunks, switching facilities and public dial-up facilities. It runs over telephone lines, cables, fibre optics and satellites, provides low-cost telecommunication, and carries its signals in real time.

2.2 Wireless Communication Networks
Wireless telecommunication facilitates access to information anywhere and anytime. Examples of wireless technologies are two-way radios, mobile telephones, cellular telephones, and satellites. Fig. 3 illustrates their mobility and data rates. Wireless LANs (WLANs) provide something wired ones cannot: mobility. This mobility, and the flexibility it gives the computer user, is what makes wireless computing environments so attractive, e.g., on campuses and in libraries. WLANs use access points to receive and transmit radio signals to and from the user’s computer or other device; the user’s device has a special card that contains a small radio transmitter and receiver. The access point is hard-wired to the LAN and, via the LAN, to the Internet [7]. Emerging satellite technology opens a new perspective on universal access to the broadband infrastructure, potentially alleviating the prohibitive cost of serving every user through terrestrial digital networks. Terrestrial network infrastructure plus satellite communications could together form the global information infrastructure [8], as illustrated in Fig. 4.

Fig. 3. Network types of wireless communication

2.3 Integrated Services Digital Network (ISDN)
ISDN is a high-speed, high-capacity, high-quality multimedia communication network. ISDN is designed to provide a greater number of digital services to telephone customers, such as digital audio, interactive information services, fax, e-mail, and digital video, as depicted in Fig. 5.

Fig. 4. Global information infrastructure


Fig. 5. Integrated Services Digital Network (ISDN)

There are two types of ISDN. The original version of ISDN, now called Narrowband ISDN (N-ISDN), employs baseband transmission. The other version, called Broadband ISDN (B-ISDN), uses broadband transmission, which supports higher transmission rates. Asynchronous transfer mode (ATM) can handle data transmission in both connection-oriented and packet schemes. B-ISDN is an ATM-based multi-service digital network, which not only supports high transmission rates but also allows different applications or multimedia streams to be transmitted simultaneously in an integrated manner. The main characteristics of B-ISDN include the capability to provide many types of services (so far offered on different networks) and the high multimedia content of those services.
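ATM carries every kind of traffic in fixed-size 53-byte cells (a 5-byte header plus a 48-byte payload), which is what lets B-ISDN multiplex voice, video and data on a single network. The sketch below segments a byte stream into such cells and reassembles it. The header here is a hypothetical simplified layout (a 1-byte VPI, a 2-byte VCI and 2 reserved bytes), not the real ATM header bit layout, and it omits the adaptation-layer details.

```python
import struct

CELL_SIZE = 53                          # bytes per ATM cell
HEADER_SIZE = 5                         # bytes of header per cell
PAYLOAD_SIZE = CELL_SIZE - HEADER_SIZE  # 48 bytes of payload per cell

def make_header(vpi: int, vci: int) -> bytes:
    """Hypothetical 5-byte header: 1-byte VPI, 2-byte VCI, 2 reserved bytes.
    Illustrative only; it does not follow the real ATM header bit layout."""
    return struct.pack("!BHH", vpi, vci, 0)

def segment(data: bytes, vpi: int, vci: int) -> list[bytes]:
    """Split data into fixed-size cells, zero-padding the last payload."""
    header = make_header(vpi, vci)
    cells = []
    for start in range(0, len(data), PAYLOAD_SIZE):
        chunk = data[start:start + PAYLOAD_SIZE].ljust(PAYLOAD_SIZE, b"\x00")
        cells.append(header + chunk)
    return cells

def reassemble(cells: list[bytes], length: int) -> bytes:
    """Concatenate the payloads and strip padding using the known length."""
    return b"".join(cell[HEADER_SIZE:] for cell in cells)[:length]

message = b"voice, video and data share one cell format" * 3
cells = segment(message, vpi=1, vci=42)
print(len(cells), {len(c) for c in cells})
```

Here the original length is passed to `reassemble` directly for brevity; in practice an ATM adaptation layer such as AAL5 carries the length in a trailer.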

3 Some Latest Developments in Communication Networks
3.1 Active/Programmable Networks
Active/programmable networks allow their users to add customized programs into the nodes of the network. For instance, packets could be replaced with program fragments that are executed at each network router/switch they traverse [9]. Active/programmable architectures, as depicted in Fig. 6, permit a massive increase in the sophistication of the computation performed within the network. They will enable new applications, especially those based on application-specific multicast, information fusion, and other services that leverage network-based computation and storage. Furthermore, they will accelerate the pace of innovation by decoupling network services from the underlying hardware and allowing new services to be loaded into the infrastructure on demand. An active/programmable network enables users to customize live networks by supplying programs together with data [5] [6]. It will be fully self-customizable, will facilitate operator system integration by replacing and dynamically upgrading proprietary router software, and will facilitate end-to-end service creation. As a result, users can directly program switches and other devices within the network to meet their requirements. At the same time, the network should remain robust and resilient to technology faults, human error and malicious attacks. Next generation networks will, at a minimum, be active/programmable.

Fig. 6. Architecture for active/programmable networks
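To make the idea of packets carrying program fragments concrete, the toy simulation below lets each packet ("capsule") carry a small function that every router runs before forwarding it. The `Capsule` and `Router` types and the busiest-hop program are hypothetical illustrations, not an API from [9] or from any real active-network platform.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Router:
    name: str
    load: float  # current utilisation of the router, 0.0 to 1.0

@dataclass
class Capsule:
    payload: str
    # Program fragment executed at every router the capsule traverses.
    program: Callable[["Capsule", Router], None]
    state: dict = field(default_factory=dict)

def record_busiest_hop(capsule: Capsule, router: Router) -> None:
    """Example in-network computation: remember the most loaded hop."""
    busiest = capsule.state.get("busiest")
    if busiest is None or router.load > busiest[1]:
        capsule.state["busiest"] = (router.name, router.load)
    capsule.state["hops"] = capsule.state.get("hops", 0) + 1

def forward(capsule: Capsule, path: list[Router]) -> Capsule:
    """Deliver the capsule along a path, executing its program at each node."""
    for router in path:
        capsule.program(capsule, router)
    return capsule

path = [Router("r1", 0.2), Router("r2", 0.9), Router("r3", 0.4)]
result = forward(Capsule("hello", record_busiest_hop), path)
print(result.state)
```

A real active node would also have to sandbox such code and bound its resource use, which is exactly the robustness concern raised above.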

3.2 Networks for Ubiquitous Computing
Networks have to support applications in ubiquitous computing environments, such as home appliances, building access control, hand-held mobile access, personal working environments, and car, office and home connectivity. With ubiquitous computing, people can work with full access to communication, data, and computing from any location at any time. Two scenarios are depicted in Figs. 7 and 8.

Fig. 7. Built-in and external access networking systems for home


Fig. 8. On-board and external access networking systems for car

In ubiquitous computing environments, computers will be embedded in our natural movements and interactions with our environments, both physical and social. Ubiquitous computing will help organize and mediate social interactions wherever and whenever these situations might occur. The idea of such an environment emerged more than a decade ago in Weiser’s seminal article, and its evolution has recently been accelerated by improved wireless telecommunication capabilities, open networks, continued increases in computing power, improved battery technology, and the emergence of flexible software architectures [10] [11] [12] [13].

3.3 Ad Hoc Wireless Networking
Ad hoc wireless networking supports the rapid, temporary yet reliable formation and configuration of networks for collaborative computing, collaborative work, meetings, etc. An ad hoc wireless network is created on demand to enable communication between mobile hosts equipped with wireless devices. Before the ad hoc wireless network is created, each mobile host has no information about other hosts or links. The network has no centralised manager, and its topology changes dynamically as the mobile hosts move.
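Because hosts start with no knowledge of the topology and links appear and disappear as hosts move, routes have to be found on demand. The sketch below assumes a simple unit-disk radio model (two hosts are linked when they are within a fixed, hypothetical radio range) and finds a minimum-hop route by breadth-first search over a snapshot of positions. Real ad hoc routing protocols such as AODV or DSR discover routes with distributed route requests rather than a global view like this.

```python
import math
from collections import deque

def neighbours(positions: dict[str, tuple[float, float]], radio_range: float) -> dict[str, set[str]]:
    """Unit-disk model: hosts within radio_range of each other share a link."""
    links = {host: set() for host in positions}
    hosts = list(positions)
    for i, a in enumerate(hosts):
        for b in hosts[i + 1:]:
            if math.dist(positions[a], positions[b]) <= radio_range:
                links[a].add(b)
                links[b].add(a)
    return links

def find_route(links, src, dst):
    """Breadth-first search for a minimum-hop route; None if disconnected."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            route = []
            while node is not None:
                route.append(node)
                node = previous[node]
            return route[::-1]
        for nxt in sorted(links[node]):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

positions = {"A": (0, 0), "B": (8, 0), "C": (16, 0), "D": (30, 0)}
print(find_route(neighbours(positions, 10.0), "A", "C"))  # ['A', 'B', 'C']
positions["D"] = (20, 0)  # host D moves into range of C
print(find_route(neighbours(positions, 10.0), "A", "D"))  # ['A', 'B', 'C', 'D']
```

Recomputing the neighbour sets after every move is what "topology changes dynamically" means in practice: the same query can succeed or fail depending on where the hosts currently are.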

3.4 Autonomic Computing for Network Infrastructures
Autonomic computing systems are self-managing systems that perform management activities based on the situations they observe or sense in the IT environment. Such systems can manage themselves and dynamically adapt to changes in accordance with business policies and objectives [14] [15]. In an autonomic environment, system components, from hardware such as desktop computers and mainframes to software such as operating systems and business applications, are self-configuring, self-healing, self-optimizing and self-protecting.


Self-configuring means adapting automatically to dynamically changing environments. Self-healing means discovering, diagnosing and reacting to disruptions. Self-optimizing means monitoring and tuning resources automatically. Self-protecting means anticipating, detecting, identifying and protecting against attacks from anywhere. Network infrastructures will become increasingly huge in scale, heterogeneous in composition, active in their physical components, and programmable in service provision. All of this makes manual management of IT infrastructure impossible; the infrastructure must be able to manage itself.
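These self-* properties are commonly realised as a closed control loop; IBM's autonomic computing architecture describes it as monitor, analyse, plan and execute over shared knowledge. The sketch below runs such a loop over a few hypothetical network links, covering self-healing (restart a failed link) and self-optimizing (shift load off an overloaded one). The threshold, the link records and both actions are illustrative assumptions, not taken from [14] or [15], and the shared-knowledge component is omitted.

```python
from dataclasses import dataclass

OVERLOAD = 0.8  # hypothetical utilisation threshold for self-optimizing

@dataclass
class Link:
    name: str
    up: bool
    utilisation: float  # fraction of capacity in use

def monitor(links):
    """Monitor: snapshot the state of every managed link."""
    return {link.name: (link.up, link.utilisation) for link in links}

def analyse(snapshot):
    """Analyse: turn raw state into symptoms (failed and overloaded links)."""
    failed = [name for name, (up, _) in snapshot.items() if not up]
    overloaded = [name for name, (up, load) in snapshot.items() if up and load > OVERLOAD]
    return failed, overloaded

def plan(failed, overloaded):
    """Plan: map each symptom to a corrective action."""
    return [("restart", n) for n in failed] + [("reroute", n) for n in overloaded]

def execute(actions, links):
    """Execute: apply the actions (simulated by editing the link records)."""
    by_name = {link.name: link for link in links}
    for action, name in actions:
        if action == "restart":
            by_name[name].up = True          # self-healing
        elif action == "reroute":
            by_name[name].utilisation /= 2   # self-optimizing: move half the load elsewhere
    return actions

def autonomic_cycle(links):
    """One pass of the monitor-analyse-plan-execute loop."""
    failed, overloaded = analyse(monitor(links))
    return execute(plan(failed, overloaded), links)

links = [Link("core-1", False, 0.0), Link("core-2", True, 0.95), Link("edge-1", True, 0.3)]
print(autonomic_cycle(links))  # [('restart', 'core-1'), ('reroute', 'core-2')]
print(autonomic_cycle(links))  # [] : nothing left to fix
```

Run periodically, the loop drives the infrastructure back towards its policy without an operator in the path, which is the point the text makes about manual management becoming impossible.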

4 Outlooks
The 21st century may become the “Broadband Age” or, even better, the “Service Convergence Age”. Today, broadband sources such as fiber optics, satellite and cable modems provide very high speed access to information and media of all types via the Internet, creating an “always-on” environment. The result is a widespread convergence of entertainment, telephony and computerized information (data, voice and video) delivered to a rapidly evolving array of Internet appliances, Personal Digital Assistants, wireless devices and desktop computers. Ubiquitous access to information, anywhere and anytime, will characterize whole new kinds of information systems in the 21st century. These are being enabled by rapidly emerging wireless communication systems. The needed expertise encompasses, e.g., network management, integration of wireless and wireline networks, system support for mobility, computing system architectures for wireless nodes/base stations/servers, user interfaces appropriate for small handheld portable devices, and new applications that can exploit mobility and location information. In the future, the host may see the network as a message-passing system, or as memory. At the same time, the network may use classic packets, wavelength division, or space division switching. Future network protocols will need to provide applications with secure connections that are independent of the underlying networks.

References
1. R. Brennan, B. Jennings, C. McArdle, and T. Curran: Evolutionary trends in intelligent networks. IEEE Communications Magazine (2000) 86–93
2. M. Finkelstein, J. Garrahan, D. Shrader, and G. Weber: The future of the intelligent networks. IEEE Communications Magazine (2000) 86–93
3. A. R. Modarressi and S. Mohan: Control and management in next-generation networks: challenges and opportunities. IEEE Communications Magazine (2000) 94–102
4. A. Leon-Garcia and L. G. Mason: Virtual network resource management for next-generation networks. IEEE Communications Magazine (2003) 102–109
5. K. L. Calvert, S. Bhattacharjee, E. Zegura, and J. Sterbenz: Directions in active networks. IEEE Communications Magazine (1998) 72–78
6. A. T. Campbell, H. G. De Meer, M. E. Kounavis, K. Miki, J. B. Vicente, and D. Villela: A survey of programmable networks. ACM SIGCOMM Computer Communication Review, 29 (1999) 7–23
7. K. Asatani and Y. Maeda: Access network architectural issues for future telecommunication networks. IEEE Communications Magazine (1998) 110–114
8. C.-K. Toh and V. O. K. Li: Satellite ATM network architectures: an overview. IEEE Network (1998) 61–71
9. D. L. Tennenhouse and D. J. Wetherall: Towards an active network architecture. http://www.tns.lcs.mit.edu (1997)
10. K. Lyytinen and Y. Yoo: Issues and challenges in ubiquitous computing. Communications of the ACM, 45 (2002) 63–65
11. G. E. Burnett and J. M. Porter: Ubiquitous computing within cars: designing controls for non-visual use. International Journal of Human-Computer Studies, 55 (2001) 521–531
12. G. B. Davis: Anytime/anyplace computing and the future of knowledge work. Communications of the ACM, 45 (2002) 67–73
13. W. Drew Jr: Wireless networks: new meaning to ubiquitous computing. The Journal of Academic Librarianship, 29 (2003) 102–106
14. H. Tianfield: Multi-agent based autonomic architecture for network management. Proceedings of the IEEE International Conference on Industrial Informatics (INDIN’03), Banff, Alberta, Canada, 21–24 August 2003
15. A. G. Ganek and T. A. Corbi: The dawning of the autonomic computing era. IBM Systems Journal, 42 (2003) 5–18

DHCS: A Case of Knowledge Share in Cooperative Computing Environment
Shui Yu, LeYun Pan, FuTai Zou, and Fan Yuan Ma
Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030, China
{merlin,fyma}@sjtu.edu.cn

Abstract. Large-scale hypertext categorization has become one of the key techniques in web-based information acquisition. How to implement efficient hypertext categorization is still an open research issue. This paper introduces the Distributed Hypertext Categorization System (DHCS), in which Directed Acyclic Graph Support Vector Machines (DAGSVM) for learning multi-class hypertext classifiers are incorporated into a cooperative computing environment. Knowledge share among the local learning machines is achieved by exploiting both the special features of the DAG learning architecture and the advantages of support vector machines. The key problems encountered in the design and implementation of DHCS are also described, together with solutions to these problems.

1 Introduction
Over the years, computer scientists have primarily studied the knowledge discovery process as a single-user activity. For example, research on automatic text categorization (ATC) has provided sophisticated techniques for supporting the information filtering process, but mostly in the context of a single, isolated user’s interaction with an information base. Recently, a number of case studies have examined the cooperative nature of information search activities. The case studies reported in [1] [2] provide insight into the forms of cooperation that can take place during a search process. However, most researchers have studied in depth only the kinds of collaboration that can occur in either the physical or the digital library [2] [3] [4]. With the rapid change of the World Wide Web, cooperative web mining plays a crucial role in web information acquisition. As a typical application of web information retrieval, hypertext categorization is challenged by the large scale of the unlabeled web page base. Since building text classifiers by hand is difficult and time-consuming, it is desirable to learn classifiers from examples. It is therefore necessary to extend state-of-the-art machine learning techniques to a cooperative learning environment so as to solve the problem of distributed web information retrieval. Another motivation for this extension is that the local learning machines within one community can benefit from sharing knowledge.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 380–387, 2004. © Springer-Verlag Berlin Heidelberg 2004


The main aim of this paper is to discuss some important issues in implementing efficient hypertext categorization in a distributed and cooperative learning environment. The rest of the paper is organized as follows. Section 2 introduces state-of-the-art machine learning techniques and hypertext categorization. Section 3 explains knowledge share and cooperative learning in DHCS. Section 4 discusses some implementation issues. Section 5 describes some experimental results. Section 6 presents the conclusions and gives some ideas for future work.

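2 Machine Learning and Hypertext Categorization
2.1 Support Vector Machines
Kernel-based learning methods (KMs) are a state-of-the-art class of learning algorithms, whose best-known example is Support Vector Machines (SVMs). The SVM method was introduced into automated text categorization (ATC) by Joachims [5] [6] and has since been used extensively by many other researchers in the information retrieval community. It has been shown to yield good generalization performance in both text classification and hypertext categorization tasks, and SVMs have so far been regarded as the best choice for constructing hypertext categorization systems [5] [7]. The original primal optimization problem describes the principle of SVMs:
$$\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}} \ \frac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{N}\xi_i \tag{1}$$
$$\text{s.t.}\quad y_i\left(\mathbf{w}\cdot\phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i, \quad i = 1,\dots,N \tag{2}$$
$$\xi_i \ge 0, \quad i = 1,\dots,N \tag{3}$$
where $\mathbf{x}_i$ are the $N$ training samples with labels $y_i \in \{-1, +1\}$, $\phi$ maps samples into the feature space, $\xi_i$ are slack variables, and $C > 0$ is the penalty hyper-parameter.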

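Instead of solving the above optimization problem directly, one can derive the following dual program:
$$\max_{\boldsymbol{\alpha}} \ \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j K(\mathbf{x}_i,\mathbf{x}_j) \tag{4}$$
$$\text{s.t.}\quad \sum_{i=1}^{N}\alpha_i y_i = 0 \tag{5}$$
$$0 \le \alpha_i \le C, \quad i = 1,\dots,N \tag{6}$$
where $\alpha_i$ are the Lagrange multipliers and $K(\mathbf{x}_i,\mathbf{x}_j) = \phi(\mathbf{x}_i)\cdot\phi(\mathbf{x}_j)$ is the kernel function.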

Clearly, the SVM learning algorithm needs to solve a numerical quadratic programming problem. With decomposition techniques such as SMO [8] [9], one can solve the SVM problem iteratively, and the computation time can scale as $\sim N^{1.7}$ (where $N$ is the total number of training samples) in the best case. However, training remains expensive when dealing with large-scale multi-class categorization problems.
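SMO's key idea is that the dual can be optimised two multipliers at a time, and each two-variable subproblem has a closed-form solution. The function below is a minimal sketch of that analytic step in the spirit of Platt [8], with errors $E_i = f(\mathbf{x}_i) - y_i$ and box bounds $L, H$, written for a decision function of the form $f(\mathbf{x}) = \sum_k \alpha_k y_k K(\mathbf{x}_k, \mathbf{x}) + b$. It omits the working-set heuristics and the improved threshold handling of Keerthi et al. [9], so it is not the modified SMO used in DHCS.

```python
import numpy as np

def linear_kernel(a, b):
    return float(np.dot(a, b))

def smo_step(i, j, alpha, b, X, y, C, kernel=linear_kernel):
    """One analytic SMO update of the pair (alpha_i, alpha_j).

    Returns the updated (alpha, b); the input alpha array is not modified.
    """
    def f(x):
        # Current decision value: sum_k alpha_k y_k K(x_k, x) + b
        return sum(alpha[k] * y[k] * kernel(X[k], x) for k in range(len(y))) + b

    E_i, E_j = f(X[i]) - y[i], f(X[j]) - y[j]
    # Feasible segment for alpha_j implied by 0 <= alpha <= C and sum(alpha * y) = 0
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    K_ii, K_jj, K_ij = kernel(X[i], X[i]), kernel(X[j], X[j]), kernel(X[i], X[j])
    eta = K_ii + K_jj - 2.0 * K_ij   # curvature of the objective along the constraint line
    if L == H or eta <= 0:
        return alpha, b              # this sketch simply skips degenerate pairs
    new = alpha.copy()
    new[j] = np.clip(alpha[j] + y[j] * (E_i - E_j) / eta, L, H)
    new[i] = alpha[i] + y[i] * y[j] * (alpha[j] - new[j])
    # Threshold update: b1 (b2) makes f(x_i) = y_i (f(x_j) = y_j) after the step
    b1 = b - E_i - y[i] * (new[i] - alpha[i]) * K_ii - y[j] * (new[j] - alpha[j]) * K_ij
    b2 = b - E_j - y[i] * (new[i] - alpha[i]) * K_ij - y[j] * (new[j] - alpha[j]) * K_jj
    if 0 < new[i] < C:
        b = b1
    elif 0 < new[j] < C:
        b = b2
    else:
        b = (b1 + b2) / 2.0
    return new, b

# Toy problem: one positive sample at x = 1 and one negative sample at x = -1.
X = np.array([[1.0], [-1.0]])
y = np.array([1.0, -1.0])
alpha, b = smo_step(0, 1, np.zeros(2), 0.0, X, y, C=10.0)
w = (alpha * y) @ X
print(alpha, b, w)
```

On this two-point toy problem a single step already reaches the maximum-margin solution ($\alpha = (0.5, 0.5)$, $w = 1$, $b = 0$); a full solver repeats such steps over heuristically chosen pairs until the KKT conditions hold within a tolerance.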


2.2 Multi-class SVM and DDAG Learning Architecture
SVM was originally designed for binary classification. How to effectively extend it to multi-class classification is still an open research issue. Several methods have been proposed in which binary SVMs are combined to construct multi-class SVMs. There are three main methods: one-against-one, one-against-all and DAGSVM. It has been pointed out in [10] that DAGSVM is very suitable for practical use. Previous experiments have shown that DAGSVM yields accuracy and memory usage comparable to the other two methods, but yields substantial improvements in both training and evaluation time [10] [11].
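Part of DAGSVM's speed at evaluation time comes from how the decision DAG is traversed: starting from the list of all $K$ classes, each binary node "$i$ vs $j$" compares the first and last remaining classes and eliminates one of them, so only $K-1$ of the $K(K-1)/2$ pairwise classifiers are consulted per sample [11]. A minimal sketch, assuming each trained binary classifier is available as a function that returns a positive value for its first class (the toy distance-based classifiers below are hypothetical stand-ins for trained SVMs):

```python
from typing import Callable, Sequence

def ddag_predict(x, classes: Sequence[int],
                 pairwise: dict[tuple[int, int], Callable]) -> tuple[int, int]:
    """Evaluate a decision DAG; returns (predicted class, number of binary tests).

    pairwise[(i, j)] returns > 0 to vote for class i and <= 0 to vote for class j.
    """
    remaining = list(classes)
    tests = 0
    while len(remaining) > 1:
        i, j = remaining[0], remaining[-1]   # each DAG node compares first vs last
        tests += 1
        if pairwise[(i, j)](x) > 0:
            remaining.pop()                  # "not j": drop the last class
        else:
            remaining.pop(0)                 # "not i": drop the first class
    return remaining[0], tests

# Hypothetical 1-D toy: class c is centred at c; each binary node prefers the closer class.
K = 4
pairwise = {(i, j): (lambda x, i=i, j=j: abs(x - j) - abs(x - i))
            for i in range(K) for j in range(i + 1, K)}
print(ddag_predict(2.2, range(K), pairwise))  # (2, 3)
```

With $K = 12$ categories, as in DHCS, this means 11 binary evaluations per page instead of all 66.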

2.3 Challenges of Large-Scale Hypertext Categorization Tasks
So far, everything we have discussed assumes that the whole computation is executed on one computer. Does DAGSVM still work well when handling real large-scale hypertext categorization tasks? How can we handle thousands, or even millions, of HTML files? How can we handle a large number of categories? How can we update the decision rules efficiently? Unfortunately, most practical categorization systems are isolated and unable to deal with these problems. Driven by the idea of implementing hypertext categorization in a cooperative computing environment, we propose the distributed system DHCS as an interesting and practical case of knowledge share in the cooperative computing context.

2.4 Distributed Hypertext Categorization System
We find that the binary SVM nodes in DAGSVM are very similar to the real nodes in computer networks. Thus we may divide the training workload of the whole DAGSVM into several separate groups (each group contains one or more binary SVM nodes). Fig. 1 describes the basic idea:

Fig. 1. Allocate the DAGSVM nodes to physical computer nodes


In the distributed hypertext categorization system, there are several key problems to be solved: How should the DAGSVM nodes be divided and allocated to computer nodes? How do the computer nodes communicate with each other? How can categorization knowledge be shared among the computer nodes? We discuss these problems in the following sections.

3 Knowledge Share and Cooperative Learning
3.1 Information and Knowledge in DHCS
Based on the structure and the learning algorithm of DAGSVM, we can explain the concepts of information and knowledge in DHCS. First, the divided DAGSVM node groups need their training samples. When one computer node has labeled some samples that do not belong to its categories, it should send them to other computers. Note that this exchange of labeled samples among the computer nodes is very similar to the regular information exchange in a traditional peer-to-peer computer network. More significantly, however, we can implement knowledge share in DHCS. Since each computer node has acquired some “knowledge” after it finishes training its own DAGSVM nodes, other computer nodes can share its learning results. Once every computer node has obtained enough “knowledge” from the others in the cooperative environment, each can assemble the whole DAGSVM on its own. This is difficult in other cooperative learning systems but considerably easier in DHCS, thanks to the special structure and features of DAGSVM. We discuss this in detail in the following sections.
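A binary node "$i$ vs $j$" trains only on samples labelled $i$ or $j$, so a computer needs only those labelled samples whose category occurs in one of its own nodes. Under the simple broadcast scheme described in Section 4.1, each receiving computer can therefore filter incoming samples as sketched below. The node assignment in the example and the function names are hypothetical.

```python
def needed_labels(assigned_nodes):
    """Categories that occur in any binary node (i, j) held by one computer."""
    return {label for pair in assigned_nodes for label in pair}

def accept(label, assigned_nodes):
    """Receiver-side filter for a broadcast sample with the given label."""
    return label in needed_labels(assigned_nodes)

def interested_computers(label, allocation):
    """Computers that would keep a broadcast sample with this label."""
    return sorted(pc for pc, nodes in allocation.items() if accept(label, nodes))

# Hypothetical 4-category example: 6 binary nodes spread over 3 computers.
allocation = {
    "pc1": [(0, 1), (0, 2)],
    "pc2": [(1, 2)],
    "pc3": [(0, 3), (1, 3), (2, 3)],
}
print(interested_computers(3, allocation))  # ['pc3']
print(interested_computers(0, allocation))  # ['pc1', 'pc3']
```

The same sets could instead drive targeted sending rather than broadcast; on a LAN with a handful of machines, broadcasting and filtering on receipt is the simpler option.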

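3.2 Knowledge Share in DHCS
Before we go any further, it is necessary to take a look at the decision rules in SVM. After we solve the optimization problems in (1) and (4), we obtain the optimal multipliers $\alpha_i^*$, and then we have:
$$\mathbf{w}^* = \sum_{i=1}^{N}\alpha_i^* y_i\,\phi(\mathbf{x}_i) \tag{7}$$
$$b^* = y_j - \sum_{i=1}^{N}\alpha_i^* y_i K(\mathbf{x}_i,\mathbf{x}_j) \quad \text{for any } j \text{ with } 0 < \alpha_j^* < C \tag{8}$$
According to the Karush-Kuhn-Tucker (KKT) conditions, we have:
$$\alpha_i^*\left[y_i\left(\mathbf{w}^*\cdot\phi(\mathbf{x}_i) + b^*\right) - 1 + \xi_i^*\right] = 0, \qquad (C - \alpha_i^*)\,\xi_i^* = 0 \tag{9}$$
In (4), $\mathbf{x}_i$ is a support vector if the corresponding $\alpha_i^* > 0$.
And the decision rule is:
$$f(\mathbf{x}) = \operatorname{sgn}\left(\sum_{i=1}^{N}\alpha_i^* y_i K(\mathbf{x}_i,\mathbf{x}) + b^*\right) \tag{10}$$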


Combining (10) with (7) and (8), we see that only the support vectors can affect the decision function. Researchers have found that the proportion of support vectors in the training set can be very small (usually 2%–5% in text categorization tasks [8] [12]) with a proper choice of SVM hyper-parameters. In fact, other computer nodes can restore the categorization rules from the support vectors and their coefficients alone. Thus, a computer node can transfer its categorization knowledge to other computer nodes in the form of support vectors and the corresponding coefficients. Fig. 2 shows the knowledge share in the context of DAGSVM:

Fig. 2. Knowledge share in DHCS
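Because samples with $\alpha_i^* = 0$ contribute nothing to the SVM decision rule, a trained binary node can be shipped as just its support vectors, their coefficients $\alpha_i^* y_i$ and the threshold $b^*$. A minimal sketch with a linear kernel; the packet layout and function names are hypothetical. The multipliers in the example are those of the maximum-margin solution for this four-point toy set ($w = 1$, $b = 0$), chosen by hand rather than produced by DHCS.

```python
import numpy as np

def export_knowledge(X, y, alpha, b, tol=1e-8):
    """Keep only support vectors (alpha_i > tol) and their coefficients alpha_i * y_i."""
    sv = alpha > tol
    return {"sv": X[sv], "coef": alpha[sv] * y[sv], "b": b}

def decide(packet, x):
    """Linear-kernel decision rule rebuilt from a packet (a score of 0 maps to +1)."""
    score = packet["coef"] @ (packet["sv"] @ x) + packet["b"]
    return 1 if score >= 0 else -1

X = np.array([[1.0], [-1.0], [3.0], [-4.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])
alpha = np.array([0.5, 0.5, 0.0, 0.0])   # only the first two samples are support vectors
packet = export_knowledge(X, y, alpha, b=0.0)
print(len(packet["sv"]), decide(packet, np.array([2.0])), decide(packet, np.array([-0.5])))
```

A receiving computer that stores such packets for all 66 binary nodes has everything it needs to evaluate the whole DAG locally.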

In vector space model (VSM) style hypertext categorization, sparse matrix techniques can be applied so that communication cost is no longer an important issue. For example, Joachims [12] has pointed out that an average document in the WebKB collection is 277 words long and contains 130 distinct terms, while the whole collection leads to 38,359 features. Combining VSM with sparse matrix techniques can therefore benefit DHCS greatly: the local categorization knowledge can easily be transferred to other computer nodes so that all nodes obtain the global knowledge of the hypertext categorization.
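To put numbers on this: with a 38,359-term vocabulary but only about 130 distinct terms per page, a vector stored as (term index, weight) pairs is two orders of magnitude smaller than a dense array. The sketch below assumes a hypothetical wire encoding of a 4-byte count followed by 4-byte indices and 4-byte float weights; it is illustrative, not the encoding used in DHCS.

```python
import struct

VOCABULARY = 38_359  # WebKB feature count quoted above

def encode_sparse(vec: dict[int, float]) -> bytes:
    """Pack a sparse term-weight vector as a count plus (uint32 index, float32 weight) pairs."""
    out = bytearray(struct.pack("!I", len(vec)))
    for index in sorted(vec):
        out += struct.pack("!If", index, vec[index])
    return bytes(out)

def decode_sparse(data: bytes) -> dict[int, float]:
    """Inverse of encode_sparse."""
    (count,) = struct.unpack_from("!I", data, 0)
    return {idx: w for idx, w in struct.iter_unpack("!If", data[4:4 + 8 * count])}

def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Linear-kernel inner product that only touches the smaller vector's terms."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(i, 0.0) for i, w in a.items())

page = {7 * i: 0.5 for i in range(130)}   # hypothetical page with 130 distinct terms
packet = encode_sparse(page)
print(len(packet), 4 * VOCABULARY)       # 1044 153436
```

Because the kernel only needs shared non-zero terms, the receiving node can also evaluate decision functions directly on the sparse form without ever expanding it.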

4 Some Implementation Issues of DHCS
4.1 Allocation of Computation Load to Computer Nodes
DHCS is implemented in a LAN environment, so we can ignore the cost of communication. All computers in DHCS communicate with each other via simple broadcast, and we have found that this simple strategy works very well in DHCS. The next step is mapping DAGSVM nodes to the computer nodes. We focus on classifying web pages in CERNET (China Education & Research Net). We define twelve top categories, which requires constructing 12(12-1)/2 = 66 DAGSVM nodes. In our DHCS, the four computers are assigned 15, 17, 17 and 17 DAGSVM nodes respectively. (DHCS can easily be extended to the P2P computing environment.)
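The node count follows from one-against-one pairing: $K$ categories give $K(K-1)/2$ binary nodes, here $12 \cdot 11 / 2 = 66$. The sketch below enumerates the pairs and splits them into consecutive groups of the sizes used in DHCS (15, 17, 17, 17). Splitting by enumeration order is an assumption; the paper does not say which pairs went to which computer.

```python
from itertools import combinations

def dag_nodes(num_classes: int) -> list[tuple[int, int]]:
    """All one-against-one binary nodes (i, j) with i < j."""
    return list(combinations(range(num_classes), 2))

def allocate(nodes, group_sizes):
    """Split the nodes into consecutive groups of the given sizes."""
    if sum(group_sizes) != len(nodes):
        raise ValueError("group sizes must cover every node exactly once")
    groups, start = [], 0
    for size in group_sizes:
        groups.append(nodes[start:start + size])
        start += size
    return groups

nodes = dag_nodes(12)
groups = allocate(nodes, [15, 17, 17, 17])
print(len(nodes), [len(g) for g in groups])  # 66 [15, 17, 17, 17]
```

Other policies, such as balancing groups by expected training time per pair, would fit the same interface.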

4.2 Information Exchange in DHCS
To implement dynamic and incremental learning in DHCS, the four computers need to exchange information periodically. Here we speak of information rather than knowledge, because the computers periodically broadcast labeled samples. Users can label hypertext files independently while surfing the Internet. In our demonstration DHCS, the system runs on four personal computers, and the users belong to one research group; that is, they share the same research background and trust each other’s information and knowledge.

4.3 SVM Training Algorithm
One key issue of SVM is the training algorithm. Iterative training algorithms are very suitable for solving the SVM optimization problem. We implement the modified SMO algorithm [9] in DHCS, which has proved efficient. Meanwhile, the optimal hyper-parameters can be obtained by minimizing the generalization error. Based on leave-one-out cross-validation estimation, one can derive performance estimators for finding the best hyper-parameters. In DHCS, we developed an efficient algorithm for tuning the hyper-parameters of DAGSVM [13].
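One classic leave-one-out estimator of this kind (not necessarily the one developed in [13]) follows from the fact that removing a non-support vector leaves the SVM solution unchanged, and that point is still classified correctly. At most $n_{SV}$ of the $N$ leave-one-out runs can therefore err, so the leave-one-out error rate is at most $n_{SV}/N$. A sketch of hyper-parameter selection with this bound, where `train` is a hypothetical callback that trains once with a given setting and returns the number of support vectors:

```python
from typing import Callable, Iterable

def loo_bound(n_support_vectors: int, n_samples: int) -> float:
    """Upper bound on the leave-one-out error rate: n_SV / N."""
    return n_support_vectors / n_samples

def select_hyperparameter(candidates: Iterable[float],
                          train: Callable[[float], int],
                          n_samples: int) -> tuple[float, float]:
    """Return (best candidate, its bound), minimising n_SV / N over the candidates."""
    scored = [(loo_bound(train(c), n_samples), c) for c in candidates]
    bound, best = min(scored)
    return best, bound
```

The appeal is cost: one training run per candidate instead of $N$. The price is that it is only an upper bound and can be loose.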

Fig. 3. Experimental results of DHCS


5 Experimental Results
To evaluate the performance of DHCS, we ran the system on four computers in daily use in our laboratory. Considering the vast and dynamic web, we stop a computer node when it reaches a set point (which we call the “satisfying point”), that is, when its accuracy exceeds 50%. (With enough time and effort, higher accuracy is certainly reachable.) Each user’s judgment of the web pages they navigate serves as expert validation of the other computers’ decision rules. Fig. 3 shows how our DHCS ran over 12 days. Although this is a simple experiment, it shows that DHCS is stable and effective.
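The stopping rule can be sketched as a running tally of users' judgments on a node's predictions. The 50% threshold comes from the text. The minimum number of judged pages before a node may stop is an added, hypothetical guard against stopping after a lucky first few pages.

```python
class SatisfyingPoint:
    """Track user validation of a node's predictions and signal when to stop."""

    def __init__(self, threshold=0.5, min_judgements=20):
        self.threshold = threshold            # accuracy that counts as "satisfying"
        self.min_judgements = min_judgements  # hypothetical guard, not from the paper
        self.correct = 0
        self.total = 0

    def record(self, predicted, user_label):
        """Count one page whose predicted category a user has judged."""
        self.total += 1
        self.correct += int(predicted == user_label)

    @property
    def accuracy(self):
        return self.correct / self.total if self.total else 0.0

    def reached(self):
        """True once enough pages are judged and accuracy is strictly above threshold."""
        return self.total >= self.min_judgements and self.accuracy > self.threshold

tracker = SatisfyingPoint(threshold=0.5, min_judgements=4)
for predicted, judged in [("news", "news"), ("sports", "science"), ("news", "news"), ("arts", "arts")]:
    tracker.record(predicted, judged)
print(tracker.accuracy, tracker.reached())  # 0.75 True
```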

6 Conclusions and Future Work
Hypertext categorization plays an increasingly important role in web information acquisition systems. It provides fundamental application interfaces for web information retrieval and web mining, and also benefits other research fields such as e-mail filtering and web users’ relevance feedback. To avoid expensive manual labeling, cooperative learning is a must for distributed web page categorization systems. In this paper, we have introduced a distributed hypertext classification system that implements DAGSVM in a cooperative learning environment. Knowledge share is achieved at little communication cost. Experimental results have shown that the proposed DHCS works well in a laboratory LAN. Nevertheless, some problems remain to be explored. For example, how would DHCS run in P2P networks? In a heterogeneous P2P environment, is it still acceptable to ignore the communication cost in DHCS? How does DHCS handle hierarchical categorization tasks? We will explore these aspects in future work.

Acknowledgements. This work was supported by the Science & Technology Committee of Shanghai Municipality Key Research Project Grant 02DJ14045 and Key Technologies R&D Project Grant 03DZ15027.

References
1. O’Day, V., Jeffries, R.: Orienteering in an Information Landscape: How Information Seekers Get From Here to There. In: Proc. INTERCHI 93, 1993, pp. 438-445.
2. Twidale, M. B., Nichols, D. M., Smith, G., Trevor, J.: Supporting Collaborative Learning During Information Searching. In: Proceedings of CSCL95, 1995, pp. 367-374.
3. Hertzum, M., Pejtersen, A. M.: The Information-seeking Practices of Engineers: Searching for Documents as well as for People. In: Information Processing & Management, 36, 2000, 761-778.
4. Fidel, R., Bruce, H., Pejtersen, A., Dumais, S., Grudin, J., Poltrock, S.: Collaborative Information Retrieval. In: L. Höglund, ed. The New Review of Information Behavior Research: Studies of Information Seeking in Context. London & Los Angeles: Graham Taylor.
5. Joachims, T.: Text Categorization With Support Vector Machines: Learning With Many Relevant Features. In: Proceedings of ECML-98. Berlin: Springer. 1998. 137-142.
6. Joachims, T.: Transductive Inference For Text Classification Using Support Vector Machines. In: Proceedings of ICML-99. US: Morgan Kaufmann Publishers. 1999. 200-209.
7. Dumais, S. T., Platt, J., Heckerman, D., Sahami, M.: Inductive Learning Algorithms and Representations for Text Categorization. In: Proceedings of CIKM98, 1998, pp. 148-155.
8. Platt, J.: Fast Training of Support Vector Machines Using Sequential Minimal Optimization. In: Advances in Kernel Methods – Support Vector Learning. Cambridge, MA: MIT Press. 1998. 185-208.
9. Keerthi, S. S., Shevade, S. K., Bhattacharyya, C., Murthy, K. R. K.: Improvements to Platt’s SMO Algorithm for SVM Classifier Design. In: Neural Computation, 2001, Vol. 13, pp. 637-649.
10. Hsu, C.-W., Lin, C.-J.: A Comparison of Methods for Multiclass Support Vector Machines. In: IEEE Transactions on Neural Networks. 2002, Vol. 13, No. 2, 415-425.
11. Platt, J. C., Cristianini, N., Shawe-Taylor, J.: Large Margin DAGs for Multiclass Classification. In: NIPS 2000. Cambridge, MA: MIT Press. 2000. 547-553.
12. Joachims, T.: Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms. Norwell, MA, USA: Kluwer Academic Publishers. 2002.
13. Yu, S., Zhang, L., Ma, F.: Design and Implementation of a Large-scale Multiclass Text Classifier. Submitted to Journal of Harbin Institute of Technology, 2003.

Improving the Performance of Equalization in Communication Systems
Wanlei Zhou¹, Hua Ye¹, and Lin Ye²
¹ School of Information Technology, Deakin University, 221 Burwood HWY, Burwood, VIC 3125, Australia
{wanlei, hye}@deakin.edu.au
² School of Adults Education, Harbin Institute of Technology, Harbin City, P.R. China

Abstract. In this paper, research on exploring the potential of several popular equalization techniques while overcoming their disadvantages has been conducted. First, an extensive literature survey on equalization is conducted. The focus is placed on several popular linear equalization algorithms, such as the conventional least-mean-square (LMS) algorithm, the recursive least-squares (RLS) algorithm, the filtered-X LMS algorithm and their developments. To analyse the performance of the filtered-X LMS algorithm, a heuristic method based on linear time-invariant operator theory is provided to analyse the robust performance of the filtered-X structure. It indicates that the extra filter could enhance the stability margin of the corresponding non-filtered-X structure. To overcome the slow convergence problem while keeping the simplicity of the LMS-based algorithms, an optimal initialization is proposed.

1 Introduction
Least-mean-square (LMS) based adaptive algorithms have been successfully applied in many communication equalization practices. The importance of the LMS algorithm is largely due to two unique attributes [1]:
1. Simplicity of implementation
2. Model independence and therefore robust performance
The main limitation of the LMS algorithm is its relatively slow rate of convergence. Two principal factors affect the convergence behaviour of the LMS algorithm: the step-size parameter and the eigenvalues of the correlation matrix R of the tap-input vector. The recursive least-squares (RLS) algorithm is derived as a natural extension of the method of least squares. The derivation is based on a lemma in matrix algebra known as the matrix inversion lemma [2] [3].
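The simplicity of LMS is visible in its update, written here in real-valued form: $\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\, e(n)\, \mathbf{u}(n)$ with $e(n) = d(n) - \mathbf{w}^T(n)\mathbf{u}(n)$. The sketch below applies it to a noise-free system-identification toy problem. The channel taps, step size and signal length are hypothetical choices; in an equalizer the same update runs with $d(n)$ taken from a known training sequence.

```python
import numpy as np

def lms(u, d, num_taps, mu):
    """Real-valued LMS adaptive FIR filter; returns final weights and error history."""
    w = np.zeros(num_taps)
    errors = np.zeros(len(u))
    for n in range(num_taps - 1, len(u)):
        x = u[n - num_taps + 1:n + 1][::-1]   # tap-input vector [u(n), ..., u(n-M+1)]
        e = d[n] - w @ x                      # a priori estimation error
        w = w + mu * e * x                    # stochastic-gradient step
        errors[n] = e
    return w, errors

rng = np.random.default_rng(0)
h = np.array([0.5, -0.3, 0.1])              # hypothetical unknown FIR response
u = rng.standard_normal(2000)               # white input, so R is close to the identity
d = np.convolve(u, h)[:len(u)]              # desired signal d(n) = sum_k h_k u(n-k)
w, errors = lms(u, d, num_taps=3, mu=0.05)
print(np.round(w, 3))
```

Each step costs only O(M) operations for M taps. The step size trades speed for stability: a common rule of thumb keeps $\mu$ below 2 divided by the total tap-input power. With coloured input, the spread of the eigenvalues of R makes some modes converge far more slowly than others, which is the limitation noted above.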

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 388–395, 2004. © Springer-Verlag Berlin Heidelberg 2004


The fundamental difference between the RLS algorithm and the LMS algorithm can be stated as follows: the step-size parameter $\mu$ in the LMS algorithm is replaced in the RLS algorithm by $\Phi^{-1}(n)$, that is, the inverse of the correlation matrix of the input vector U(n). This modification has a profound impact on the convergence behavior of the RLS algorithm in a stationary environment, as summarized here [4]-[10]:
1. The rate of convergence of the RLS algorithm is typically an order of magnitude faster than that of the LMS algorithm.
2. The rate of convergence of the RLS algorithm is invariant to the eigenvalue spread (i.e., condition number) of the ensemble-averaged correlation matrix R of the input vector U(n).
3. The excess mean-squared error of the RLS algorithm converges to zero as the number of iterations, n, approaches infinity.
The computational load of the conventional RLS algorithm is prohibitive in real-time applications. The RLS algorithm is characterized by a fast rate of convergence that is relatively insensitive to the eigenvalue spread of the underlying correlation matrix of the input data, and by negligible misadjustment, although its computational complexity is increased [11]-[14].
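For comparison, the sketch below implements the standard exponentially weighted RLS recursion, with forgetting factor $\lambda$ and initialisation $\mathbf{P}(0) = \delta^{-1}\mathbf{I}$, on a toy problem of the same kind as the LMS one. The matrix $\mathbf{P}(n)$ plays the role of $\Phi^{-1}(n)$ and is updated directly through the matrix inversion lemma, so no matrix is ever inverted explicitly. The values of $\lambda$, $\delta$, the channel and the signal length are hypothetical choices.

```python
import numpy as np

def rls(u, d, num_taps, lam=0.99, delta=0.01):
    """Exponentially weighted RLS; P tracks the inverse of the input correlation matrix."""
    w = np.zeros(num_taps)
    P = np.eye(num_taps) / delta
    for n in range(num_taps - 1, len(u)):
        x = u[n - num_taps + 1:n + 1][::-1]   # tap-input vector [u(n), ..., u(n-M+1)]
        pi = P @ x
        k = pi / (lam + x @ pi)               # gain vector
        e = d[n] - w @ x                      # a priori estimation error
        w = w + k * e
        P = (P - np.outer(k, pi)) / lam       # matrix-inversion-lemma update of P
    return w

rng = np.random.default_rng(1)
h = np.array([0.5, -0.3, 0.1])              # hypothetical unknown FIR response
u = rng.standard_normal(200)
d = np.convolve(u, h)[:len(u)]
print(np.round(rls(u, d, num_taps=3), 3))
```

Each iteration costs O(M²) operations for M taps, against O(M) for LMS. That is the computational price paid for convergence that does not depend on the eigenvalue spread.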

2 Experiment and Results
We present experimental results for three adaptive equalization algorithms: the least-mean-square (LMS) algorithm, the discrete cosine transform LMS (DCT-LMS) algorithm, and the recursive least-squares (RLS) algorithm. The experiments show the following. LMS converges slowly. RLS converges much faster, but at a high computational price. DCT-LMS lies between the two on both counts, but is still not good enough. We will therefore propose a new algorithm in a forthcoming paper to solve these problems. It is well known that inter-symbol interference (ISI) limits high-data-rate transmission through dispersive communication channels. Equalization is an effective way to reduce the effects of ISI by cancelling the channel distortion. However, the dynamic, random, and time-varying characteristics of communication channels make this task very challenging. High-speed data transmission demands a low computational burden; hence, simplicity and robust performance play a crucial role in equalizer design. Owing to their robust performance and computational simplicity, LMS-based algorithms have received wide attention and have been adopted in most applications [2]. One major disadvantage of the LMS algorithm, however, is its very slow convergence rate, especially when the condition number is high. To solve this problem, a variety of improved algorithms have been proposed in the literature. Although their implementations and properties differ, the underlying principle remains the same: orthogonalize the input autocorrelation matrix as much as possible and follow a steepest-descent path on the transformed error function. Alternatively, the least-squares method can be extended to a recursive algorithm for the design of an adaptive transversal filter. An important feature of the RLS algorithm is that

W. Zhou, H. Ye, and L. Ye

it utilizes information contained in the input data, extending back to the instant of time when the algorithm is initiated. The resulting rate of convergence is therefore typically an order of magnitude faster than that of the simple LMS algorithm. This improvement in performance, however, comes at the expense of a large increase in computational complexity. The RLS algorithm recursively implements an exact least-squares solution [10]. At each time step, RLS uses all past data to estimate two quantities: the autocorrelation matrix of the inputs, and the cross-correlation between the inputs and the desired outputs. It then updates the weight vector using the so-called matrix inversion lemma.
The DFT-LMS and DCT-LMS algorithms are composed of three simple stages [5]. First, a discrete Fourier or cosine transform preprocesses the tap-delayed inputs. The transformed signals are then normalized by the square root of their power. The resulting equal-power signals are input to an adaptive linear combiner whose weights are adjusted using the LMS algorithm. In these two algorithms the orthogonalizing step is data independent; only the power normalization step is data dependent. Because their components are simple, these algorithms retain the robustness and low computational cost of LMS while improving its convergence speed.
The structure of the filtered-X LMS adaptive equalization scheme differs a little from that of the basic LMS scheme, but the adaptation process is the same: the FIR model of the equalizer is adjusted to minimize the mean square error. Therefore, the optimal solution is actually the limit of the best solution for the filtered-X LMS adaptive equalization algorithm. Like the basic LMS scheme, the filtered-X LMS scheme cannot, in general, achieve this limit. However, its solution is still expected to be close to that point if the adaptive step size is small.
Therefore, the proposed optimal initialization method still applies here. Why not simply use the optimal model matching filter as the final equalizer? Because of model uncertainty and other unexpected disturbances, a solution obtained by offline computation may not be optimal once the filter is implemented in a real-world system. For the optimal initialization, a poorly identified system model may give rise to a low-quality model matching solution. However, owing to the robustness of the filtered-X LMS adaptive equalization scheme, this solution may still lie well within the convergence region. Extensive simulations and experiments show that the method proposed here can also cope with a wide eigenvalue spread of the input without the Discrete Cosine Transform (DCT) that was conventionally required. This is an advantage in real-time operating environments, where the computational burden is a critical factor.
The focus of this work is on improving the equalization performance of the powerful LMS and RLS adaptive algorithms while minimizing the increase in computational complexity. Since these algorithms are very popular in real-world applications, this effort is significant. Channel equalization is an effective signal processing technique that compensates for channel-induced signal impairment and the resulting inter-symbol interference (ISI) in communication systems. Many sophisticated techniques have
been proposed for equalization. However, most successful real-world applications are still dominated by techniques related to a few popular algorithms, such as the adaptive LMS algorithm, the filtered-X LMS algorithm, and the RLS algorithm. For high-speed commercial communication systems, the critical criteria for a good equalizer are simplicity, robustness, and a fast convergence rate. The adaptive LMS algorithm, the filtered-X LMS algorithm, and the RLS algorithm each meet some of these criteria; unfortunately, none of them alone satisfies all of them. Research on exploiting the potential of these techniques while overcoming their disadvantages is therefore important and necessary. It is the subject of this paper.
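The three-stage DFT/DCT-LMS structure described earlier in this section (fixed transform, power normalization, LMS) can be sketched as follows for the DCT case. The following are our illustrative assumptions rather than details taken from [5]:
- the orthonormal DCT-II;
- the exponential power estimator with forgetting factor `beta`;
- the initial power of 1.0;
- the AR(1) test input.

```python
import math
import random

def dct_matrix(M):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    rows = []
    for k in range(M):
        scale = math.sqrt(1.0 / M) if k == 0 else math.sqrt(2.0 / M)
        rows.append([scale * math.cos(math.pi * (2 * m + 1) * k / (2 * M)) for m in range(M)])
    return rows

def dct_lms(x, d, num_taps, mu=0.01, beta=0.999, eps=1e-8):
    M = num_taps
    C = dct_matrix(M)
    w = [0.0] * M
    power = [1.0] * M                # running power estimate of each transformed signal
    errors = []
    for n in range(len(x)):
        u = [x[n - k] if n - k >= 0 else 0.0 for k in range(M)]
        # Stage 1: fixed (data-independent) orthogonalizing transform.
        v = [sum(C[k][m] * u[m] for m in range(M)) for k in range(M)]
        # Stage 2: data-dependent normalization by the square root of the power.
        power = [beta * p + (1.0 - beta) * vk * vk for p, vk in zip(power, v)]
        z = [vk / math.sqrt(p + eps) for vk, p in zip(v, power)]
        # Stage 3: plain LMS on the (roughly) equal-power signals.
        e = d[n] - sum(wk * zk for wk, zk in zip(w, z))
        w = [wk + mu * e * zk for wk, zk in zip(w, z)]
        errors.append(e)
    return w, errors

def ar1(n, rho, seed):
    """Colored AR(1) input x(n) = rho*x(n-1) + white noise; rho near 1 gives a large eigenvalue spread."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n + 500):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x[500:]                   # drop the start-up transient

if __name__ == "__main__":
    h = [0.5, 0.4, 0.3, 0.2]         # hypothetical unknown system
    x = ar1(20000, 0.9, seed=3)
    d = [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0) for n in range(len(x))]
    _, e = dct_lms(x, d, num_taps=4)
    print(sum(v * v for v in e[:200]) / 200, sum(v * v for v in e[-2000:]) / 2000)
```

Only stage 2 depends on the data, which matches the description above; the DCT matrix is computed once.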

3 A Fast Start-Up Technique
The LMS algorithm does not actually converge to the least-mean-square solution that the optimal model matching solution achieves, but the two are very close if the adaptive step size is small enough. Interestingly, not much effort is needed to find the filter. The filtered-X LMS algorithm still converges as long as the estimate of the channel P(z) has less than 90° phase shift, even with unlimited amplitude distortion. Hassibi et al. analyzed the robust performance of the LMS algorithm. Their analysis reveals that the sum of the squared errors is always upper bounded by the combined effects of the initial weight uncertainty and the noise. This evidence strongly supports the view that the optimal initialization presented in this paper can confine the error to a low level right from the beginning and hence improve the convergence rate dramatically. A major benefit of this approach is that, given a reasonable initialization, the adaptive process becomes essentially a fine-tuning process. This avoids a possibly long adaptation period before fine-tuning begins. The advantage is most clearly illustrated by a case with high eigenvalue spread. Extensive simulation experiments have shown that, in many cases, the adaptive process starts from an acceptable level of performance. It needs no remedy such as the Discrete Cosine Transform (DCT) or the Discrete Fourier Transform (DFT), even when the input signal eigenvalue spread is very high. In such cases the conventional LMS algorithm may fail, and a DCT or DFT remedy is traditionally required. The conventional filtered-X LMS is modified and introduced for the purpose of equalization. The filtered-X structure, the LMS algorithm, the RLS algorithm, and optimal initialization are integrated generically. The aim is to meet the paramount criteria of simplicity, robustness, and fast convergence for the equalization of high-speed, distorted communication channels. Finally, the techniques proposed in this paper are tested on a popular communication channel example under both slightly and severely non-stationary conditions. They are compared with other conventional methods.

Monte Carlo simulations show significant performance improvement, verifying the effectiveness of the methods proposed in this paper.
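The error bound attributed above to Hassibi et al. can be checked numerically. The version used here is our own short derivation, stated under these assumptions:
- the data are $d(i) = \mathbf{u}^T(i)\mathbf{w}^o + v(i)$;
- LMS is started from $\mathbf{w}(-1)$;
- $\mu\|\mathbf{u}(i)\|^2 \le 1$ at every step.

Expanding the weight-error energy after one LMS step gives $\|\tilde{\mathbf{w}}(i)\|^2 \le \|\tilde{\mathbf{w}}(i-1)\|^2 - \mu e_a^2(i) + \mu v^2(i)$. Here $\tilde{\mathbf{w}} = \mathbf{w}^o - \mathbf{w}$ and $e_a(i) = \mathbf{u}^T(i)\tilde{\mathbf{w}}(i-1)$. Summing over $i$ yields $\sum_i e_a^2(i) \le \mu^{-1}\|\mathbf{w}^o - \mathbf{w}(-1)\|^2 + \sum_i v^2(i)$.

The sketch below checks this bound. It also shows that a better initial guess shrinks the right-hand side, which is the intuition behind optimal initialization. The function name and test values are hypothetical.

```python
import random

def lms_energy_bound(w_true, w_init, mu, n_steps, noise_std, seed=0):
    """Run LMS on d(i) = u(i).w_true + v(i) and return
    (sum of squared a priori errors, mu^-1 * ||w_true - w_init||^2 + sum of v(i)^2)."""
    rng = random.Random(seed)
    w = list(w_init)
    sum_ea2 = 0.0     # sum of e_a(i)^2, with e_a(i) = u(i).(w_true - w(i-1)), noise excluded
    sum_v2 = 0.0      # disturbance (noise) energy
    for _ in range(n_steps):
        u = [rng.uniform(-1.0, 1.0) for _ in w_true]
        # The bound assumes mu * ||u(i)||^2 <= 1 at every step.
        assert mu * sum(uk * uk for uk in u) <= 1.0
        v = rng.gauss(0.0, noise_std)
        e_a = sum((wt - wk) * uk for wt, wk, uk in zip(w_true, w, u))
        e = e_a + v                           # observed error d(i) - u(i).w(i-1)
        w = [wk + mu * e * uk for wk, uk in zip(w, u)]
        sum_ea2 += e_a * e_a
        sum_v2 += v * v
    init_energy = sum((wt - wi) ** 2 for wt, wi in zip(w_true, w_init))
    return sum_ea2, init_energy / mu + sum_v2

if __name__ == "__main__":
    w_true = [1.0, -0.5, 0.25, 0.125]         # hypothetical optimal weights
    for w_init in ([0.0] * 4, [0.9, -0.45, 0.2, 0.1]):
        lhs, rhs = lms_energy_bound(w_true, w_init, mu=0.25, n_steps=2000, noise_std=0.05, seed=7)
        print(w_init, round(lhs, 4), "<=", round(rhs, 4))
```

With four taps drawn from [-1, 1], `mu = 0.25` guarantees the step-size condition.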

Fig. 1. Learning curves of the various adaptive algorithms when the channel impulse response abruptly increases by 35%

This experiment verified the well-known fact that the conventional adaptive LMS algorithm can track slightly non-stationary environments, such as slowly varying parameters. Next, a more severe non-stationary situation is tested by abruptly increasing the channel impulse response coefficients by 35% of their nominal values. Fig. 1 shows the simulation result. The conventional adaptive LMS algorithm begins to diverge, while the filtered-X LMS algorithm, with or without optimal initialization, still maintains good robust performance. The conventional RLS algorithm still performs acceptably. This matches the observation that the RLS algorithm has a tracking advantage over the LMS algorithm when the time variation of the channel is not small. The filtered-X RLS algorithm shows even better robust performance. Introducing the filtered-X structure enhances robustness clearly and significantly. From a computational point of view, optimal initialization requires additional effort to solve an optimal model matching or filtering problem. Since this is a non-iterative procedure that can be done offline, it does not increase the computational burden during online operation. The only extra online computation comes from the additional filter involved in every adaptive step. It adds just one simple convolution per step and poses no serious computational problem.
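The following minimal sketch of the textbook filtered-X LMS update illustrates why the filtered-X structure costs only one extra convolution per adaptive step. It uses the active-noise-control form (see, e.g., [6]). How the paper maps this structure onto the equalizer is not spelled out, so the signal arrangement is an assumption. The adaptive filter output passes through a path `plant` before the error is formed. The reference is filtered through an imperfect estimate `plant_est`, the role the text assigns to the estimate of P(z). All names and coefficients are hypothetical.

```python
import random

def fir(h, hist):
    """One FIR output sample; hist[0] is the newest input sample."""
    return sum(hk * xk for hk, xk in zip(h, hist))

def fxlms(x, d, num_taps, plant, plant_est, mu):
    w = [0.0] * num_taps
    x_hist = [0.0] * max(num_taps, len(plant_est))   # reference history, newest first
    xf_hist = [0.0] * num_taps                        # filtered-reference history
    y_hist = [0.0] * len(plant)                       # adaptive-filter output history
    errors = []
    for n in range(len(x)):
        x_hist = [x[n]] + x_hist[:-1]
        y_hist = [fir(w, x_hist)] + y_hist[:-1]
        e = d[n] - fir(plant, y_hist)                 # error after the output passes the plant
        # The one extra online operation: filter the reference through the plant estimate.
        xf_hist = [fir(plant_est, x_hist)] + xf_hist[:-1]
        w = [wk + mu * e * xfk for wk, xfk in zip(w, xf_hist)]
        errors.append(e)
    return w, errors

if __name__ == "__main__":
    rng = random.Random(0)
    plant, plant_est = [1.0, 0.5], [0.9, 0.55]       # true path and imperfect estimate
    target = [0.6, 0.0, -0.05, 0.05]                  # = plant convolved with [0.6, -0.3, 0.1]
    x = [rng.gauss(0.0, 1.0) for _ in range(8000)]
    d = [sum(t * x[n - k] for k, t in enumerate(target) if n - k >= 0) for n in range(len(x))]
    w, _ = fxlms(x, d, 3, plant, plant_est, mu=0.01)
    print([round(wk, 3) for wk in w])                 # approaches [0.6, -0.3, 0.1]
```

The mismatch between `plant` and `plant_est` is small in phase, so the update still converges to the exact solution in this noise-free example.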

4 Conclusions
(1). The practical importance of the LMS algorithm is largely due to its simplicity of implementation and its robust performance; its main limitation is a relatively slow rate of convergence. The RLS algorithm has a fast rate of convergence that is relatively insensitive to the eigenvalue spread of the underlying correlation matrix of the input data, and its misadjustment is negligible. Its computational complexity, however, is high.
(2). The conventional filtered-X LMS is modified and introduced for the purpose of equalization. The well-known filtered-X LMS algorithm has found very successful applications in active noise and vibration control. It inherits the elegant simplicity of the conventional LMS algorithm and is very robust. A heuristic method based on linear time-invariant operator theory has been provided to analyze the robust performance of the filtered-X structure. It indicates that the extra filter could enhance the stability margin of the corresponding non-filtered-X structure. In this paper, the filtered-X structure, the LMS algorithm, the RLS algorithm, and optimal initialization have been integrated generically. The aim is to meet the paramount criteria of simplicity, robustness, and fast convergence for the equalization of high-speed communication channels.
(3). An optimal initialization is proposed to overcome the slow convergence problem while keeping the simplicity of LMS-based algorithms. The LMS algorithm does not actually converge to the least-mean-square solution that the optimal model matching solution achieves, but the two are very close if the adaptive step size is small enough. Interestingly, not much effort is needed to find the filter. The filtered-X LMS algorithm still converges as long as the estimate of the channel P(z) has less than 90° phase shift, even with unlimited amplitude distortion [21].
Hassibi et al.'s robust performance analysis of the LMS algorithm reveals that the sum of the squared errors is always upper bounded by the combined effects of the initial weight uncertainty and the noise. This evidence strongly supports the view that the optimal initialization presented in this paper can confine the error to a low level right from the beginning and hence improve the convergence rate dramatically. A big benefit of this approach is that, given a reasonable initialization, the adaptive process becomes essentially a fine-tuning process. This avoids a possibly long adaptation period before fine-tuning begins. The advantage is most clearly illustrated by a case with high eigenvalue spread. It is well known that, when the input signal eigenvalue spread is high, the conventional LMS algorithm converges very slowly, or even fails to converge, no matter how small the adaptive step size. The optimal model matching solution is independent of the input signal eigenvalue spread and hence avoids this trouble. Moreover, this idea can be combined with other speed-up techniques such as the Discrete Cosine Transform (DCT) and the Discrete Fourier Transform (DFT), as well as with various adaptive algorithms. Extensive
simulation experiments have shown that, in many cases, the adaptive process starts from an acceptable level of performance, even when the input signal eigenvalue spread is very high. Another advantage of the approach proposed here is that it generally does not require detailed knowledge of the external signal, which is a great advantage in practice. Many powerful tools exist for solving the filtering problem, including explicit solutions, so the method proposed in this paper is very promising.
(4). A popular communication channel example is used to test the proposed techniques under both slightly and severely non-stationary conditions. The level of channel distortion is deliberately raised much higher than in any published result, with a system condition number as high as nearly 390. Furthermore, each tap weight of the channel is assumed to undergo an independent stationary stochastic process. Each parameter fluctuates around its nominal value with a uniform probability distribution over the interval, and white noise of variance 0.001 is added at the channel output. A Monte Carlo simulation of 1000 independent trials is conducted to obtain an ensemble-averaged learning curve. All adaptive algorithms show good robust performance against the time-varying, random Gaussian impulse response coefficient fluctuations specified above. The filtered-X LMS with optimal initialization shows the fastest convergence rate and the best performance.
(5). A more severe non-stationary situation was tested by abruptly increasing the channel impulse response coefficients by 35% of their nominal values. The conventional adaptive LMS algorithm begins to diverge, while the filtered-X LMS algorithm, with or without optimal initialization, still maintains good robust performance.
The conventional RLS algorithm performs acceptably. This matches the observation that the RLS algorithm has a tracking advantage over the LMS algorithm when the time variation of the channel is not small [1]. The filtered-X RLS algorithm shows even better robust performance. The proposed techniques improve performance significantly, which verifies the effectiveness of the new method. The contributions of this paper are threefold:
- we compare LMS with DCT-LMS and RLS for adaptive equalization;
- we investigate how to speed up the convergence rate of LMS-based algorithms while keeping the increase in online computational burden as low as possible, overcoming the slow convergence problem while retaining the simplicity of LMS-based algorithms;
- we apply optimal initialization to adaptive equalizers for communication systems.

Many open problems remain. For instance, the stability-margin analysis of the filtered-X LMS was conducted heuristically. Can it be extended to more general cases, such as the discrete-time MIMO case? What about the filtered-X RLS algorithm? Can the ideas be applied to other adaptive equalization techniques, such as decision-feedback equalization? What happens if we use the optimal initialization instead of the optimal initialization? Another very active area of equalization is wireless communication, where fast multipath fading (Rayleigh fading) is very challenging. As indicated in the
simulation, rapid and not so small channel variations can cause the conventional LMS algorithm to diverge. It will be interesting and challenging, therefore, to apply the new techniques presented here to those areas in the future.
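The Monte Carlo procedure in item (4), combined with the idea of optimal initialization, can be sketched as follows. We assume that the offline "optimal model matching" step can be approximated by the Wiener (MMSE) equalizer computed from an imperfect channel model. The channel taps, model error, equalizer length, decision delay, trial count, and helper names are all hypothetical. The noise variance 0.001 matches the value quoted in item (4). Far fewer than 1000 trials are used to keep the sketch fast, and plain LMS stands in for the paper's filtered-X variants.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense A)."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x

def wiener_equalizer(channel, num_taps, delay, noise_var):
    """Offline MMSE equalizer for a channel model, assuming i.i.d. unit-power symbols."""
    L = len(channel)
    def acf(lag):
        return sum(channel[k] * channel[k + lag] for k in range(L - lag)) if lag < L else 0.0
    R = [[acf(abs(i - j)) + (noise_var if i == j else 0.0) for j in range(num_taps)]
         for i in range(num_taps)]
    p = [channel[delay - i] if 0 <= delay - i < L else 0.0 for i in range(num_taps)]
    return solve(R, p)

def learning_curve(channel, w0, mu, n_steps, n_trials, delay, noise_std, seed=0):
    """Ensemble-averaged squared error of a trained LMS equalizer started from w0."""
    rng = random.Random(seed)
    M, L = len(w0), len(channel)
    mse = [0.0] * n_steps
    for _ in range(n_trials):
        w = list(w0)
        s_hist = [0.0] * max(L, delay + 1)    # transmitted +/-1 symbols, newest first
        r_hist = [0.0] * M                    # received samples, newest first
        for n in range(n_steps):
            s_hist = [rng.choice((-1.0, 1.0))] + s_hist[:-1]
            r = sum(c * s for c, s in zip(channel, s_hist)) + rng.gauss(0.0, noise_std)
            r_hist = [r] + r_hist[:-1]
            e = s_hist[delay] - sum(wk * rk for wk, rk in zip(w, r_hist))   # training error
            w = [wk + mu * e * rk for wk, rk in zip(w, r_hist)]
            mse[n] += e * e / n_trials
    return mse

if __name__ == "__main__":
    true_channel = [0.3, 1.0, 0.3]            # hypothetical channel
    model = [0.27, 1.0, 0.33]                 # imperfectly identified model
    taps, delay, noise_var = 11, 6, 0.001
    w_opt = wiener_equalizer(model, taps, delay, noise_var)
    args = dict(mu=0.01, n_steps=200, n_trials=100, delay=delay, noise_std=noise_var ** 0.5)
    zero = learning_curve(true_channel, [0.0] * taps, **args)
    opt = learning_curve(true_channel, w_opt, **args)
    for n in (20, 100, 199):
        print(n, round(zero[n], 4), round(opt[n], 4))
```

Both curves use the same seed, so they see identical symbol and noise sequences. The difference in the early samples therefore comes from the initialization alone.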

References
1. R.D. Gitlin, J.F. Hayes, and S.B. Weinstein, Data Communication Principles, Plenum Press, New York, 1992.
2. E.A. Lee and D.G. Messerschmitt, Digital Communication, Second Edition, Kluwer Academic Publishers, 1994.
3. S. Haykin, Adaptive Filter Theory, Third Edition, Prentice Hall Information and System Sciences Series, 1996.
4. D.S. Bayard, "LTI representation of adaptive systems with tap delay-line regressors under sinusoidal excitation," and "Necessary and sufficient conditions for LTI representations of adaptive systems with sinusoidal regressors," Proceedings of the American Control Conference, Albuquerque, New Mexico, June 1997, pp. 1647-1651 and pp. 1642-1646.
5. S.L. Gay, "A fast converging, low complexity adaptive filtering algorithm," Proceedings of the 1993 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 1993, pp. 4-7.
6. S. Elliott and P. Nelson, "Active noise control," IEEE Signal Processing Magazine, Oct. 1993.
7. M. Rupp and A.H. Sayed, "Robust FXLMS algorithms with improved convergence performance," IEEE Trans. on Speech and Audio Processing, vol. 6, no. 1, Jan. 1998, pp. 78-85.
8. E.A. Wan, "Adjoint LMS: an efficient alternative to the filtered-X LMS and multiple error LMS algorithms," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-96), 1996, vol. 3, pp. 1842-1845.
9. M. Rupp, "Saving complexity of modified filtered-X LMS and delayed update LMS algorithm," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 44, no. 1, Jan. 1997, pp. 57-60.
10. S.L. Gay, "A fast converging, low complexity adaptive filtering algorithm," Proceedings of the 1993 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 1993, pp. 4-7.
11. J.M. Cioffi and T. Kailath, "Fast recursive least-squares transversal filters for adaptive filtering," IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-32, April 1984, pp. 304-338.
12. F.T.M. Slock, "Reconciling fast RLS lattice and QR algorithms," 1990 International Conference on Acoustics, Speech and Signal Processing, vol. 3, New York, USA, 1990, pp. 1591-1594.
13. M. Bouchard and S. Quednau, "Multichannel RLS algorithms and fast-transversal-filter algorithms for active noise control and sound reproduction systems," IEEE Trans. on Speech and Audio Processing, vol. 8, no. 5, Sep. 2000.
14. M. Bouchard and S. Quednau, "Multichannel recursive-least-squares algorithms and fast-transversal-filter algorithms for active noise control and sound reproduction systems," IEEE Trans. on Speech and Audio Processing, vol. 8, no. 5, Sep. 2000.

Moving Communicational Supervisor Control System Based on Component Technology
Song Yu and Yan-Rong Jie
School of Computer, North China Electric Power University, Baoding 071003, China

Abstract. This paper introduces a moving communicational supervisor control system (MCSCS) based on component technology and described in the XYZ/E language. The authors present the system architecture and briefly give its XYZ/E description. They discuss separately the case in which the central supervision center acts as a server and the case in which it acts as both client and server. They briefly give the XYZ/E description of the data transmission program and present the implementation of the system's configuration environment.
Keywords: Component; supervisor control system; XYZ/E language; client; server

1 Introduction
A component is a program body that works alone or in cooperation with other components. Once defined, it is independent of its concrete implementation language [1]. The existence of components relies to a certain extent on architecture techniques [2]: only within a suitable architecture can software be abstracted, isolated, and ultimately turned into components. Components are the minimum units of software reuse. Ideally, the whole system is composed of several components connected to each other through interface definitions. CBD (Component-Based Development) treats the software architecture as an assembly blueprint [3] and reusable software components as prefabricated building blocks. By supporting assembly-based software reuse, CBD is one of the effective ways to enhance software productivity and quality, reduce the impact of developers leaving, and shorten product delivery times. As the software industry and software engineering techniques develop, software reuse [4] receives more and more attention.
During development, the first step is to define the specification (static semantics) of the component according to the requirements. We then select the right architectural style, create the subcomponents and connectors, and write the specification of every subcomponent. After constructing the whole component's structure, we create the corresponding procedure in XYZ/E (dynamic semantics). We then decompose the abstract component at the next level, until all static semantics have been converted into dynamic semantics and an executable program. CBD techniques have become a prominent issue. This paper describes in XYZ/E the component dynamic semantics of the MCSCS development process. This method has noticeably enhanced software productivity and reliability.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 396–399,2004. © Springer-Verlag Berlin Heidelberg 2004

2 XYZ System and Temporal Logic Language XYZ/E
The XYZ system [5] is a software engineering tool system founded on the temporal logic language XYZ/E. It combines temporal logic with software engineering organically, with the goal of enhancing software productivity and reliability. XYZ/E is a wide-spectrum language in which each sublanguage represents a different programming mode or paradigm. There are three forms of control structure in XYZ: basic XYZ/E, which represents state transitions directly; structured XYZ/E; and production-rule XYZ/E. There are two kinds of temporal logic operators in XYZ/E: the future temporal logic operators $O, $U and $W, and the past temporal logic operators, including $S and $B. The basic commands in XYZ/E are called conditional elements (CE); they take the following form:

The basic component of an XYZ/E program is called a unit; it takes the form: WHERE
The structured CEs (statements) include the conditional, loop, case, wait, continue, select, and parallel statements. These features of XYZ/E benefit abstract description, stepwise refinement, and procedure synchronization [6,7]. XYZ/E can express the real world flexibly and work out the implementation of a procedure, so it is very natural and meaningful to apply XYZ/E to formalizing architecture.

3 System Design
MCSCS has a three-layer structure: the CSC (Central Supervision Center) part, the LSC (Local Supervision Center) part, and the SU (Supervision Unit) part. CSC and LSC consist of software, while SU is mainly hardware. CSC mainly runs background software, which includes parameter setting and data processing, a configuration module, and base-station module management; it also includes an image terminal. LSC includes the data transmission program, the protocol conversion service program, the historical data processor, and so on. As a whole, MCSCS can be viewed as following the client/server (C/S) model. From the viewpoint of TCP/IP processing, LSC acts as the server, while the terminals and their lower-level supervision units act as clients. CSC responds to client requests and supervises their actions; the parties send communication requests to each other and transfer data over the TCP/IP protocol. From the viewpoint of the system's logical structure, LSC acts as the server and the supervision terminals (ST) act as clients: ST issue requests, and LSC acts on each specific request. Between LSC and SU the roles are reversed: SU is the server and LSC is the client. LSC issues data requests or sends command parameters to SU, and SU responds.
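The double role of LSC described above is server toward the supervision terminals and client toward the SU. It can be sketched in Python, with plain method calls standing in for TCP/IP messages. This is not the authors' XYZ/E description. The role names LscDouble, TerminalClient, and SuServer follow the names used in this section. The request format, the `readings`/`params` fields, and the keys are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Request = Tuple[str, str, Any]     # (kind, key, value); kind is "read" or "set" (hypothetical)

@dataclass
class SuServer:
    """Supervision unit in its server role (stand-in for the hardware)."""
    readings: Dict[str, Any] = field(default_factory=dict)
    params: Dict[str, Any] = field(default_factory=dict)

    def handle(self, request: Request) -> Any:
        kind, key, value = request
        if kind == "read":
            return self.readings.get(key)
        if kind == "set":
            self.params[key] = value
            return "ok"
        return "error: unknown request"

@dataclass
class LscDouble:
    """LSC in its double role: server toward terminals, client toward the SU."""
    su: SuServer
    history: List[Tuple[Request, Any]] = field(default_factory=list)

    def handle(self, request: Request) -> Any:
        reply = self.su.handle(request)            # acting as client of SuServer
        self.history.append((request, reply))      # e.g. input for historical data processing
        return reply                               # acting as server for TerminalClient

@dataclass
class TerminalClient:
    server: LscDouble

    def query(self, key: str) -> Any:
        return self.server.handle(("read", key, None))

    def set_param(self, key: str, value: Any) -> Any:
        return self.server.handle(("set", key, value))
```

For example, `TerminalClient(LscDouble(SuServer(readings={"temperature": 24.5}))).query("temperature")` returns 24.5. The terminal never talks to the SU directly.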

When LSC acts as a server, the role LscServer represents it. There are two client roles: TerminalClient represents the terminal client role, and SuClient represents the lower-level client role. The C/S mode between LscServer and TerminalClient is:

TerminalClient communicates with LscServer over the network according to the TCP/IP protocol. When LSC acts as both client and server, it plays a double-level role, represented by LscDouble. Terminal clients are represented by TerminalClient, and the server role of SU is represented by SuServer. The C/S mode between LscDouble and TerminalClient is:

TerminalClient communicates with LscDouble over the network according to the TCP/IP protocol by means of messages. The data transmission program is situated in the foreground LSC server. Its upper side is connected to the service program, and its lower side is connected to the serial interfaces, achieving bidirectional communication. Because data are transmitted over the network as messages, errors caused by external interference during transmission are hard to avoid. MCSCS requires data to be reliable and real-time. Both the command parameters received from terminals and the data sent upward from lower levels must therefore be correct. A data transmission program is needed to parse, check, and repack the data. The data transmission program acts as a filter, similar to an instance of the pipe-and-filter style [8]. The filter handles two kinds of messages, corresponding to upgoing and downgoing data: upgoing data are sent from SU to LSC, and downgoing data are sent from LSC to SU. The filter takes its input from a buffer. After being read from the buffer, the data are divided into packets, verified, and repacked, and finally placed in a queue to await transmission.
The configuration environment is an auxiliary tool attached to the supervisor control software. It lets users implement special functions without professional programming. Because the system must display the configuration environment graphically, it uses configuration tools for editing graphical interfaces. After the configuration environment is built, we find suitable components to insert into the architecture based on definite requirements. If a component exactly satisfies the requirements, we only need to write linking code. If a component falls somewhat short of the requirements, we modify it appropriately so that it satisfies them. In either case, we write the corresponding code according to the requirements so as to satisfy the interface demands of the function requirements and of the architecture.
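The data transmission filter reads from the buffer, divides the data into packets, verifies them, repacks them, and queues them. This maps naturally onto a chain of generator stages in the pipe-and-filter style. The sketch below assumes a hypothetical frame format: a start byte 0x7E, a length byte, the payload, and a one-byte additive checksum. The actual MCSCS message format is not given in the text.

```python
from collections import deque

START = 0x7E   # hypothetical frame delimiter

def checksum(payload: bytes) -> int:
    """One-byte additive checksum (hypothetical)."""
    return sum(payload) & 0xFF

def split_frames(buffer: bytes):
    """Divide the raw byte stream into (payload, checksum) pairs; skip bytes before a START."""
    i = 0
    while i < len(buffer):
        if buffer[i] != START:
            i += 1
            continue
        if i + 1 >= len(buffer):
            break                              # length byte not yet received
        length = buffer[i + 1]
        end = i + 2 + length + 1               # START, length, payload, checksum
        if end > len(buffer):
            break                              # incomplete frame: wait for more data
        yield buffer[i + 2:i + 2 + length], buffer[end - 1]
        i = end

def verify(frames):
    """Drop frames whose checksum does not match, e.g. after external interference."""
    for payload, chk in frames:
        if checksum(payload) == chk:
            yield payload

def repack(payloads, direction: str):
    """Re-wrap each valid payload for the next hop, tagged 'up' (SU->LSC) or 'down' (LSC->SU)."""
    for p in payloads:
        yield direction, bytes([START, len(p)]) + p + bytes([checksum(p)])

def transmission_filter(buffer: bytes, direction: str, out_queue: deque) -> None:
    """Pipe the stages together and queue the results for transmission."""
    for item in repack(verify(split_frames(buffer)), direction):
        out_queue.append(item)
```

Because each stage is a generator, a frame moves through split, verify, and repack before the next one is read. This mirrors the incremental behaviour of a filter in a pipe.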

4 Conclusion
This paper applies the component- and architecture-based development approach to a real supervisor control system. During development, we combine black-box and white-box reuse to build the system. If a component can be used directly, we only develop interface code. If it cannot, we modify it appropriately, which constitutes white-box reuse. The MCSCS program in XYZ/E has been transformed into a C++ program with the corresponding transformation tools and runs correctly.

References
1. Pat Hall, Educational Case Study: What is the model of an ideal component? Must it be an object? Third International Workshop on Component-Based Software Engineering: Reflection on Practice. Papers: 2000 International Workshop on Component-Based Software Engineering, pp. 59-62.
2. Ralph E. Johnson, Components, Frameworks, Patterns, ACM Software Engineering Notes, 1997, 22(3): 10-18.
3. MEI Hong, Software Component Composition based on ADL and Middleware, Science in China (Series F), 2001, 44(2): 136-151.
4. Premkumar T. Devanbu, Next Generation Software Reuse, IEEE Transactions on Software Engineering, 2000, 26(5): 423-424.
5. Tang Zhi Song, Temporal Logic Programming and Software Engineering, Beijing: Scientific Publishing House, 2002(5): 40-66.
6. Tang Zhi Song, Object, meaning and application of XYZ system, Journal of Software, 1999, 10(4): 337-341.
7. Zhang Guang Quan, Software architecture concepts, styles and its descriptive language, Journal of Chong Qing Teacher-training Institute, 2000, 17(3): 1-5.
8. Mary Shaw, Software Architecture: Perspectives on an Emerging Discipline, Prentice Hall, 1996.

A Procedure Search Mechanism in OGSA-Based GridRPC Systems
Yue-zhuo Zhang, Yong-zhong Huang, and Xin Chen
Department of Computer Science & Technology, Information Engineering University of PLA, Zhengzhou, Henan 450002, China

Abstract. This paper presents a way of searching for remote procedures in OGSA-based GridRPC systems. GGF recommends a grid-enabled remote procedure call mechanism (GridRPC) to lower the barrier to grid adoption. GridRPC provides a well-known and established programming model that allows full use of grid resources while hiding the tremendous amount of infrastructure necessary to make grids work. In this paper, we define a kind of Grid service in OGSA called the Procedure Search Service. Using it, we present a procedure search mechanism for discovering remote procedures in grid computing implementations based on OGSA and GridRPC.

1 Introduction
Although Grid computing is regarded as a viable next-generation computing infrastructure, several factors still hinder the widespread adoption of grids [1]. One of them is that it is very difficult for an application programmer to program directly on Globus I/O. To lower the barrier to grid use, GGF will produce a recommendation for a grid-enabled remote procedure call mechanism (GridRPC) [1]. This proposed GGF recommendation will consist primarily of an Application Programming Interface (API) and an associated programming model that will enable simple, RPC-based use of grid computing resources. GridRPC will provide a well-known and established programming model that allows full use of grid resources while hiding the tremendous amount of infrastructure necessary to make grids work. A draft programming model and API already exist [1]. The current GridRPC model and API presented by GGF are a first step towards a general GridRPC capability. There are certainly a number of outstanding issues regarding widespread deployment and use, one of which is simply discovery. Currently, a remote procedure is discovered by explicitly asking a well-known server for a well-known function through a name string lookup. Establishing this function-to-server mapping is all that the user cares about, and hence the GridRPC model does not define how discovery is done. This paper discusses how to find remote procedures for an application that runs in a grid computing environment based on OGSA [2] and uses the GridRPC mechanism.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 400–403, 2004. © Springer-Verlag Berlin Heidelberg 2004

A Procedure Search Mechanism in OGSA-Based GridRPC Systems


2 Related Works
Currently, GridRPC can be implemented with NetSolve or Ninf-G [1]. The current GridRPC model has four prototype implementations and has not been built on OGSA. OGSA is regarded as the next-generation grid architecture and is expected to be widely used in building grid computing environments. In grid computing systems based on OGSA and GridRPC, remote procedures can be treated as a kind of Grid service, so they can be registered and searched in the way OGSA provides. In the rest of this paper, we define a remote procedure as a kind of Grid service called a Remote Procedure Service. Discovering a remote procedure then becomes discovering such a Grid service instance by querying the registries. We also define a kind of Grid service called a Procedure Search Service, which is used specifically to search for Remote Procedure Services. A client can find the procedures it wants simply by sending its requirements to the Procedure Search Service, which then returns a number of procedures, each of which meets the client's requirements. If a procedure it selects turns out not to be valid, the client can select another one from this set without querying the registry service repeatedly.

3 Remote Procedures Search Mechanism
3.1 Remote Procedure Service Definition
We first discuss how to define remote procedures as a kind of Grid service, which we call a Remote Procedure Service. A Remote Procedure Service consists of Grid service data and Grid service interfaces. The Grid service data include all the information required to specify a remote procedure, such as its name, parameters and function. The Grid service interfaces include at least three portTypes [2]: GridService, Registration and Factory; other portTypes can also be included in the service definition if required.

3.2 Procedure Search Service Definition
We define the Procedure Search Service to search for remote procedures in OGSA-based GridRPC systems. An application that wants to find a procedure sends a query to a Procedure Search Service, which returns the GSHs of a group of remote procedures. The Procedure Search Service's interface includes five core OGSA portTypes (GridService, Factory, Registry, NotificationSource and NotificationSink) and one user-defined portType, Compare.


Y.-z. Zhang, Y.-z. Huang, and X. Chen

The Procedure Search Service implements a user-defined portType called Compare. This portType helps the service compare the WS-Inspection document [3] returned by a registry with the specification document sent by a client, which specifies what kind of procedure the client wants to find. If the former matches the latter, the Procedure Search Service returns the corresponding GSH to the client as an answer.
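The matching performed by the Compare portType can be illustrated with a minimal Python sketch. The paper compares WS-Inspection documents; here, as an assumption, both the registered procedure description and the client's requirement are reduced to a few hypothetical fields (procedure name, parameter types and keywords), and any field the client leaves unspecified matches anything. `ProcedureDescription`, `ProcedureRequirement` and `compare` are illustrative names, not part of OGSA or GridRPC.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class ProcedureDescription:
    """Hypothetical service data of a Remote Procedure Service."""
    gsh: str                           # Grid Service Handle of the service instance
    name: str
    param_types: Tuple[str, ...]
    keywords: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class ProcedureRequirement:
    """Hypothetical client specification; None means 'no constraint'."""
    name: Optional[str] = None
    param_types: Optional[Tuple[str, ...]] = None
    keywords: FrozenSet[str] = frozenset()


def compare(desc: ProcedureDescription, req: ProcedureRequirement) -> bool:
    """Return True if the registered description satisfies the requirement."""
    if req.name is not None and desc.name != req.name:
        return False
    if req.param_types is not None and desc.param_types != req.param_types:
        return False
    # Every keyword the client asks for must appear in the description.
    return req.keywords <= desc.keywords
```

A Procedure Search Service built along these lines would keep the `gsh` of every description for which `compare` returns True.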

3.3 Procedure Search Course in Details
There are three roles in the discovery process: an application, which we call the client, a Procedure Search Service, and the registries. The main idea of the discovery mechanism is that a client first sends its requirements for the remote procedure to a Procedure Search Service; the service then subscribes to the information on remote procedures registered in a registry, and if the requirements match the information returned by the registry, the procedure described by that information is an answer. The discovery process consists of the following steps:
1. A client sends its procedure requirements to a Procedure Search Service.
2. The Procedure Search Service subscribes, as a notification sink, to the information registered in a local registry, and the local registry periodically sends the information registered in it to the Procedure Search Service.
3. The Procedure Search Service compares the information returned by the registry with the client's requirements; if they match, it registers the corresponding Grid service's GSH in itself. The information returned by the local registry may refer to another registry, in which case the Procedure Search Service also registers the GSH of that registry. Having registered other registries, the Procedure Search Service can search them if it cannot find valid information in the local registry. In this case we call the local registry the first registry and the other registries the second, the third, and so on.
4. Initialize a set with the registries registered in the Procedure Search Service, take the second registry out and search it following steps 2 and 3, then the third, the fourth, and so on. Each search may add new registries to the set; search each registry in turn and delete it from the set once it has been searched. When the set is empty, stop searching.
5. The Procedure Search Service can search the registries in depth-first or breadth-first order.
6. The search continues until a given number of procedures has been found, or stops when the search time reaches a predefined threshold. The Procedure Search Service then unregisters all the registries registered in it, so that what remains registered in it are the GSHs of the Remote Procedure Services that meet the client's needs.
7. The Procedure Search Service returns the GSHs registered in it to the client. If it cannot find a remote procedure that meets the client's needs, it returns a fault message and asks whether the client wants to search again; if so, the steps above are repeated.
Figure 1 illustrates the procedure search process.

Fig. 1. Sequence diagram of procedure search course

4 Discussion and Conclusions
We have discussed a procedure discovery mechanism for OGSA-based GridRPC systems. We define remote procedures as a kind of Grid service called a Remote Procedure Service, and design a Procedure Search Service that acts as an agent for discovering remote procedures, instead of discovering them by directly querying the registry service. The mechanism presented above obtains a group of procedures that meet the client's requirements, from which the client can find a valid procedure without querying the registry service repeatedly.

References
1. Hidemoto Nakada, Satoshi Matsuoka, Keith Seymour, Jack Dongarra: GridRPC: A Remote Procedure Call API for Grid Computing (2002). http://graal.ens-lyon.fr/GridRPC/pub/APM_GridRPC_0702.pdf
2. S. Tuecke, K. Czajkowski, I. Foster, J. Frey, S. Graham, C. Kesselman, T. Maguire, T. Sandholm, P. Vanderbilt, D. Snelling: Open Grid Services Infrastructure (OGSI) Version 1.0. Global Grid Forum Draft Recommendation (6/27/2003). http://www.globus.org/research/papers/Final_OGSI_Specification_V1.0.pdf

An Improved Network Broadcasting Method Based on Gnutella Network*

Zupeng Li1,2, Xiubin Zhao2,3, Daoyin Huang1, and Jianhua Huang1

1 National Digital Switching System Engineering & Technological R&D Center
2 Telecommunication Engineering Institute, Airforce Engineering University
3 Northwestern Polytechnical University
No. 783 P.O.Box 1001, Zhengzhou, 450002, P.R.China
Tel: 86-371-3532770; Fax: 86-371-3941700

Abstract. Peer-to-peer networking is a hot buzzword that has been sweeping through the computing industry over the past year or so. Gnutella, as one of the first operational pure P2P systems, is considered an important case study for P2P networking. By analyzing Gnutella network topology data, we observe both the small diameter and the clustering properties characteristic of “small-world” networks. Based on this, the paper analyzes and predicts the performance of the current broadcasting algorithm and proposes an improved broadcasting method for Gnutella. By avoiding unnecessary message forwarding in the network, the new algorithm considerably reduces network traffic.

1 Introduction
Peer-to-peer (P2P) networking is an emerging technology in the network research domain [1,2]. In this domain, Gnutella [3] is considered the first completely decentralized peer-to-peer protocol. The first client was written largely as an experiment by developers at Nullsoft, a subsidiary of AOL. Upon launch, Gnutella was swiftly labeled an “unauthorized freelance project” by AOL and removed from the Nullsoft website. The open-source community soon continued its development, and there now exist billions of clients operating the protocol. In this paper, the “small-world” property of the Gnutella network is discussed first. In the following section, we analyze the problems of the Gnutella network. Exploiting the “small-world” property of the underlying network topology, we propose an improved network broadcasting method, the Intelligent Network Broadcasting Method (INB), to avoid unnecessary message forwarding in the Gnutella network. Finally, conclusions are given in the last part.

* This research is supported by the National High Technology Development 863 Program of China under Grant No. 2001-AA-11-1-141.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 404–407, 2004. © Springer-Verlag Berlin Heidelberg 2004



2 “Small-World” Properties
The term “small-world” originated with a famous social experiment conducted by Stanley Milgram in the late 1960s. By analyzing the Gnutella network topology data obtained by …, we can observe both the small diameter and the clustering properties characteristic of “small-world” networks.

The values for the Gnutella topology graphs are benchmarked in Tables 1 and 2 against two widely used “small-world” models, the Watts-Strogatz model [4] and the Barabási-Albert model [5], as well as against a random graph and a 2-D torus. As can be seen, all of the Gnutella topology snapshots demonstrate the “small-world” phenomenon: the characteristic path length is comparable to that of a random graph, while the clustering coefficient is an order of magnitude higher. These results clearly indicate strong “small-world” properties of the Gnutella network topology.
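The two quantities compared in Tables 1 and 2 are standard graph metrics. The following Python sketch computes them for an undirected graph stored as adjacency sets, using the average local clustering coefficient of Watts and Strogatz and the mean shortest-path length over reachable node pairs as the characteristic path length. These are common definitions, not necessarily the exact ones used to produce the tables, and the toy graphs are illustrative rather than Gnutella data.

```python
from collections import deque
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def make_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected graph as a dict of adjacency sets."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def clustering_coefficient(g: Graph) -> float:
    """Average local clustering coefficient (nodes with degree < 2 count as 0)."""
    total = 0.0
    for nbrs in g.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the node's neighbours.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in g[a])
        total += links / (k * (k - 1) / 2)
    return total / len(g)


def characteristic_path_length(g: Graph) -> float:
    """Mean shortest-path length over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for src in g:
        dist = {src: 0}
        queue = deque([src])
        while queue:                      # breadth-first search from src
            u = queue.popleft()
            for w in g[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())       # dist[src] == 0 adds nothing
        pairs += len(dist) - 1
    return total / pairs
```

For a triangle both metrics equal 1; for a three-node path the clustering coefficient is 0 and the characteristic path length is 4/3.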

3 Intelligent Network Broadcasting Method
3.1 Problems of Gnutella Network
Gnutella, as one of the first operational pure P2P systems, is considered an important case study for P2P networking. It consists of many peers, all of which are similar in functionality. There are no specialized directory servers, so peers must use the network they belong to in order to locate other peers. As outlined in [6], the main problem with Gnutella is its use of broadcasts for searching (and discovering) the network. As the network grows in size, not only does the rate of messaging increase, but the traffic potentially generated by each message increases too.



3.2 Description of INB Algorithm
According to the “small-world” property, the network tends to form highly connected clusters of nearby machines. The important fact is that if nodes A and B are connected, and B and C are connected, then A is more likely to be connected to C than if the network were connected purely at random. The more clustering there is, the more apparent this effect becomes, and we can exploit it to remove much of the redundancy in broadcasting. First, the algorithm assumes that each node knows which nodes each of its immediate neighbors is connected to. This knowledge can easily be passed to neighboring nodes, either by regular refreshes or by notifications of changes. With this information, some unnecessary message forwarding can be avoided, as shown in Fig. 1(a).

Fig. 1. Example I of intelligent network broadcasting
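A minimal Python sketch makes the two forwarding rules of this subsection concrete: a node does not forward to a neighbor that is also connected to the node it received the message from, and when two neighbors of the sender share a further neighbor, only the one that comes first in the node ordering forwards to it. Assumptions: every node knows its neighbors' neighbor sets (as the algorithm requires), the ordering is alphabetical as in the example, messages are processed in breadth-first order, and Fig. 1(b) is read as the edges A-B, A-C, B-D, C-D. All names are illustrative, and the sketch only checks the rules on the two topologies of Fig. 1; it does not prove that every node is reached in arbitrary topologies.

```python
from collections import deque
from typing import Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def make_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def forward_targets(g: Graph, node: str, sender: Optional[str], intelligent: bool) -> List[str]:
    """Neighbors that `node` forwards a broadcast to after receiving it from `sender`."""
    targets = []
    for t in sorted(g[node]):
        if t == sender:
            continue                          # never send back to where it came from
        if intelligent and sender is not None:
            if t in g[sender]:
                continue                      # rule 1: the sender reached t directly
            others = (g[sender] & g[t]) - {node}
            if any(x < node for x in others):
                continue                      # rule 2: a lower-ordered node covers t
        targets.append(t)
    return targets


def broadcast(g: Graph, origin: str, intelligent: bool) -> Tuple[int, Set[str]]:
    """Simulate one broadcast; return (messages sent, nodes reached)."""
    reached = {origin}
    queue = deque([(origin, None)])
    messages = 0
    while queue:
        node, sender = queue.popleft()
        for t in forward_targets(g, node, sender, intelligent):
            messages += 1
            if t not in reached:              # duplicates are counted but not re-forwarded
                reached.add(t)
                queue.append((t, node))
    return messages, reached
```

On topology (a), Gnutella-style flooding sends 4 messages and the sketch sends 2; on topology (b), flooding sends 5 and the sketch sends 3, with all nodes reached in both cases.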

In this small example, node A receives a broadcast message and forwards it to B and C. With Gnutella, B and C would then forward the message to each other (and both would promptly ignore the repeat), resulting in two unnecessary messages. With this scheme, however, B knows that C is connected to A and so will already have received the broadcast. Hence B does not forward the message to C, and similarly C does not forward it to B. We can do slightly better by making the algorithm a little more complex, as shown in Fig. 1(b). Implemented Gnutella-style (assuming nodes do not forward the message back to the node it was received from), this topology would again result in two wasted messages. In this scheme, C uses the knowledge that A is connected to B and B is connected to D to infer that D will already have received the broadcast from A. Of course, B could infer the same from the knowledge that C is connected to both A and D, in which case B would not forward the message to D either. To prevent this, some ordering of nodes is needed, so that in these situations the nodes know which one must forward the message. Using standard alphabetical ordering in the example above, B must forward the message as B …

… 256MB), and a powerful CPU. The new set-top box (STB) will be equipped with a large-capacity disk. An entire video title may be replicated (cached) on a user host after that host has received it through a video-on-demand service. Mini video server software can be downloaded from a higher-level video server into this user host. When a video-on-demand request is admitted, this mini video server software runs to deliver the video title to other nodes. Obtaining the software and providing VOD services rely on the Grid Security Infrastructure (GSI) protocol implemented in this Grid-type architecture. An Only-Viewer node is a simple video-receiving and decoding device that serves only as a terminal for playback.
In this case, its proxy node assists the relevant video servers in supplying video services, and an individual static channel is allocated to this end node. An Only-Viewer node is purely a consumer with low efficiency and does not help increase system capacity.

2.2 Integration of Grid Computing and P2P Models
Currently, peer-to-peer models (e.g. Napster) are becoming increasingly popular for sharing information and data through direct exchange, and computational Grids have emerged as popular platforms for deploying large-scale, resource-intensive applications. The Grid is defined as coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations, and this concept of a virtual organization (VO) is central to Grid computing. Peer-to-peer models offer two important advantages: decentralization, by distributing storage capacity and load across a network of peers, and scalability, by enabling direct, real-time communication [4] [5]. By integrating features of the Grid and P2P models, this hybrid Grid-type architecture not only enables scalable system capacity but also provides a secure and efficient organization.


X.-j. He, X.-h. Tang, and J.-y. You

When a user host serves as a mini server, it needs sufficient credit to be allocated reserved network resources in a controllable network environment, and sufficient trust may be needed for it to provide services to other nodes. All of this can be carried out under the security policy and global resource collaboration framework of the Grid. A node may request or provide video services, so a user host joins different autonomous groups according to its interests. We consider two hosts to have similar interests if each is able to provide video services in response to the other's requests. Hosts learn about the interests of their peers by monitoring the replies they receive to their requests, and therefore decide whom to connect to, and when to add or drop a connection, based on this local information. Hosts with a high degree of similar interests are considered good peers [5] and form an autonomous group. The higher-level video servers maintain dynamic index information for their lower-level autonomous groups. Under this Grid-type architecture, a request for a specific video title is first processed within the autonomous group in which it originates. If this attempt fails, the request is delivered to the higher-level server, and so on, until it reaches the top dedicated servers. When the request is admitted according to the security and resource control policy, the relevant peers provide video services without delay and with global efficiency.
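The upward escalation of a request can be sketched as a walk from the requester's autonomous group toward the top dedicated servers. In this sketch each level is reduced to the set of titles its peers or servers hold, and the security and resource-control admission is a caller-supplied predicate; `Level`, `locate` and `admit` are hypothetical names, and the dynamic index maintenance and good-peer selection described above are not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class Level:
    """One autonomous group (or dedicated server) in the hierarchy."""
    name: str
    titles: Set[str] = field(default_factory=set)   # titles cached by its peers/servers
    parent: Optional["Level"] = None                # higher-level video server; None at the top


def locate(start: Level, title: str,
           admit: Callable[[Level, str], bool] = lambda level, title: True
           ) -> Tuple[Optional[str], List[str]]:
    """Escalate a request upward until some level holds the title and admits it.

    Returns the serving level's name (None if no level can serve) and the levels tried.
    """
    tried: List[str] = []
    level: Optional[Level] = start
    while level is not None:
        tried.append(level.name)
        if title in level.titles and admit(level, title):
            return level.name, tried
        level = level.parent          # this attempt failed: hand over to the higher level
    return None, tried
```

A request that no level can serve comes back as `(None, tried)`, which a caller could report as a rejected request.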

3 Video Multicast Delivery Based on Dynamic Buffering
Statistical analysis shows that most requests concentrate on a few popular video titles, and overall system availability mainly relies on network and disk bandwidth utilization. When various users request the same video title, multicast delivery allows these users to share one channel, so that more concurrent on-demand users can be admitted [6]. In a controllable high-speed network, QoS can be guaranteed with a core-stateless proportional adaptive fair bandwidth allocation mechanism: each video stream can be allocated one dynamic channel based on DiffServ, which guarantees the corresponding network bandwidth [7]. Moreover, network and storage capacity are projected to double every 9 and 12 months, respectively, while disk I/O bandwidth grows more slowly than either. In current VOD systems, disk I/O bandwidth is easily exhausted, and disk performance becomes the bottleneck of the system.

3.1 Video Multicast with Stream-Merging Algorithm
To satisfy the first request for a video title, an individual channel is allocated to transmit the entire video. When a new request for the same title is admitted some seconds later, the rest of the video can be multicast to the new user through the shared channel. The new on-demand user host then processes at least two video streams: the multicast segment of the video, and the initial seconds it missed, delivered through another channel allocated at that time. These streams are merged in client-side storage for playback. The principle of this improved multicast policy is shown in Fig. 2. In contrast to a typical batching scheme, this multicast policy can supply instantaneous VOD services without delaying new on-demand requests.

Supplying Instantaneous Video-on-Demand Services Based on Grid Computing

Fig. 2. Video multicast with stream merging
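The timing of stream merging can be worked out in a few lines. Assuming the multicast and the patch channel both deliver at the playback rate, a client arriving `delay` seconds after the first stream started needs a patch covering the first `delay` seconds, and while it plays the patch it must store the multicast data arriving in parallel, at most `min(delay, video_seconds - delay)` seconds of video. `MergePlan` and `plan_request` are hypothetical names; the paper does not state a merging threshold, so the sketch only refuses merging once the multicast has finished.

```python
from dataclasses import dataclass


@dataclass
class MergePlan:
    join_multicast: bool   # can the new client share the running multicast stream?
    patch_seconds: float   # missed prefix sent on a separately allocated channel
    buffer_seconds: float  # peak amount of multicast data the client must buffer


def plan_request(video_seconds: float, first_start: float, request_time: float) -> MergePlan:
    """Plan delivery for a request arriving while a multicast of the same title runs."""
    delay = request_time - first_start
    if delay < 0:
        raise ValueError("request cannot precede the first stream")
    if delay >= video_seconds:
        # The multicast has finished; a new individual channel must carry the whole video.
        return MergePlan(False, 0.0, 0.0)
    # While playing the missed prefix, the client stores multicast data; it can never
    # store more than what remains of the video after its join point.
    return MergePlan(True, delay, min(delay, video_seconds - delay))
```

For a one-hour title, a request arriving two minutes late gets a 120-second patch and buffers 120 seconds of multicast data.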

3.2 Improve Disk I/O Bandwidth Utilization with Dynamic Buffering
In the conventional buffering mechanism, the video server maintains an equal-sized buffer for each video presented in real time, and reading and transmitting video data are controlled by this buffering mechanism so as to prevent overflow or underflow at the clients. Prior research has shown that requests arrive as a Poisson stream; this user pattern implies that the average disk I/O workload is less than the maximum disk I/O capability [8]. To improve disk I/O bandwidth utilization, one approach is to read video data at full speed, which is feasible given sufficient network bandwidth and clients able to buffer an entire video title. With the dynamic buffering scheme, video data are transmitted to clients on a best-effort basis. Because the time needed to serve a single on-demand title is reduced, more resources become available for later on-demand requests, and the total number of successfully served on-demand users increases over the long run.



In a video server based on this buffering scheme, one dynamic buffer is created for each video stream. The buffer size is adjusted according to the rate at which the stream's video data are read and the bandwidth of its channel. The dynamic buffer is structured as a linked list of memory pages. When a video server maintains multiple video streams, the disk operates at full speed, and each stream is allocated a portion of the disk I/O bandwidth according to its requirements for video presentation and transmission.
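A minimal sketch of the per-stream dynamic buffer: pages are appended as the disk reads at full speed, reading pauses for any stream whose buffer has reached its upper limit, and network transmission drains pages. The fixed page limits and the round-robin sharing of disk reads among unpaused streams are assumptions made for illustration (the paper sizes buffers from read rate and channel bandwidth and shares disk bandwidth by stream requirements); the class and function names are hypothetical, and a Python deque stands in for the linked list of pages.

```python
from collections import deque
from typing import Deque, List

PAGE = b"\0" * 4096   # stand-in for one page of video data read from disk


class DynamicBuffer:
    """Per-stream buffer kept as a linked sequence of memory pages."""

    def __init__(self, upper_limit_pages: int) -> None:
        self.pages: Deque[bytes] = deque()
        self.upper_limit = upper_limit_pages

    @property
    def paused(self) -> bool:
        # Disk reads for this stream pause once the buffer reaches its upper limit.
        return len(self.pages) >= self.upper_limit

    def drain(self, n_pages: int) -> int:
        """Send up to n_pages to the network; return how many were sent."""
        sent = 0
        while self.pages and sent < n_pages:
            self.pages.popleft()
            sent += 1
        return sent


def disk_round(buffers: List[DynamicBuffer], pages_per_round: int) -> int:
    """Spend one round of full-speed disk reads on the streams that are not paused.

    Pages are handed out round-robin, which is an assumed sharing policy.
    Returns the number of pages actually read.
    """
    reads = 0
    while reads < pages_per_round:
        active = [b for b in buffers if not b.paused]
        if not active:
            break                      # every buffer is full: the disk idles
        for b in active:
            if reads == pages_per_round:
                break
            b.pages.append(PAGE)
            reads += 1
    return reads
```

Once a paused stream drains some pages to its client, it becomes eligible for disk reads again in the next round.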

4 Adaptive Video Delivery Scheme in This Grid-Type VOD System
In this Grid-type VOD system, the mini-server runs as a foreign application on the user host. As a non-dedicated video server, the mini-server has lower priority than local applications on the user host. Based on the capacity of each video server and on network availability, the system can provide optimal VOD services. When a new on-demand request arrives, some good peers are selected to provide VOD services. If the request is for a video title that these good peers are already delivering, it joins the multicast delivery of that title; otherwise, a video server starts an individual delivery for its first presentation. Table 1 summarizes the symbols used in this paper.

For any new on-demand request, the video server must determine whether to admit the request itself or to ask other servers for help. Suppose that a video server in a stable state serves k video streams. Essential memory space must be reserved in the video server for each video stream; that is, an essential buffer size is reserved to guarantee real-time presentation of video stream i. When a dynamic buffer reaches its upper limit, disk reads for that stream are paused. The upper limit for video stream i is … When a new on-demand request arrives at this video server, the number of video streams may become k+1. Based on the following conditions, the video server can determine whether to admit this request.



If all of these conditions can be satisfied, this request can be admitted immediately. Otherwise, another video server will be selected to continue.

5 Conclusion
In this paper, a hybrid VOD system based on Grid computing is implemented. By taking advantage of the large storage and powerful processing capability of client-side devices, a user host serves both as a client and as a mini video server. With the contribution of user hosts, we achieve scalable system capacity. All user hosts are assigned to different autonomous groups, and the workload can be apportioned fairly among the groups. Based on the improved multicast policy and the dynamic buffering algorithm, this adaptive video delivery scheme further improves resource utilization. Because user hosts act as non-dedicated servers, the relevant servers need to cooperate with each other to supply video services without delay. In this VOD system, all video data are stored on the dedicated servers, and parts of the data are cached on user hosts; requests for popular videos benefit from this policy. Because of the limited storage capacity of user hosts, a video data distribution policy based on this caching method must be designed in future work. With a large caching space across the cooperating user hosts, more and more video services may be supplied locally.

References
1. S.-H. Gary Chan, Fouad Tobagi: Distributed Servers Architecture for Networked Video Services. IEEE/ACM Transactions on Networking, Vol. 9, No. 2, April 2001, 125–136
2. Sridhar Ramesh, Injong Rhee, Katherine Guo: Multicast with Cache (Mcache): An Adaptive Zero-Delay Video-on-Demand Service. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 3, March 2001, 440–456
3. J.Y.B. Lee, R.W.T. Leung: Study of a Server-less Architecture for Video-on-Demand Applications. Proceedings of the IEEE International Conference on Multimedia and Expo 2002, Lausanne, Switzerland, Aug 2002
4. I. Foster, C. Kesselman, S. Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications, Vol. 15, No. 3, 2001



5. Murali Krishna Ramanathan, Vana Kalogeraki, Jim Pruyne: Finding Good Peers in Peer-to-Peer Networks. Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 02), 2002, 232–239
6. L. Gao, D. Towsley: Supplying Instantaneous Video-on-Demand Services Using Controlled Multicast. Proc. IEEE International Conference on Multimedia Computing and Systems, 1999
7. LI FangMin, LI RenFa, YE ChengQing: A Core-stateless Proportional Adaptive Fair Bandwidth Allocation Mechanism. Journal of Computer Research and Development, Vol. 39, No. 3, March 2002, 269–274
8. He Xiaojian, Li Fangmin, You Jinyuan: A Video-on-Demand Delivery Policy for Improving the Disk Access Performance. Computer Science, Vol. 30, No. 4, April 2003, 76–78

A Grid Service Lifecycle Management Scheme

Jie Qiu1, Haiyan Yu2, Shuoying Chen1, Li Cha2, Wei Li2, and Zhiwei Xu2

1 Computer Science and Engineering, Beijing Institute of Technology, Beijing, 100081, China
2 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
{yuhaiyan,char,liwei,zxu}@ict.ac.cn

Abstract. Grid technologies enable large-scale sharing of computing resources and open them to formal or informal consortia of individuals, who use these resources by consuming Grid Services. In this context, this paper puts forward a more mature Grid Service management mechanism that covers not only the creation and destruction of Grid Service instances but also the whole lifecycle of Grid Services. Hosting environments that incorporate this mechanism support Grid Service provisioning more flexibly and powerfully, for example by cloning overloaded services and automatically recovering failed services. A Grid Service naming convention, security considerations for the mechanism, and our implementation of the mechanism are also presented in this paper.

1 Introduction
In the OGSI [1] specification, distributed and potentially heterogeneous computing resources are abstracted as Grid Services [2] to provide a standard means of accessing them. With the support of OGSI-compliant hosting environments, physical computing resources can easily be encapsulated into Grid Services. Although abstracting resources as services conceals their heterogeneity, several challenging questions remain when creating a Grid Service-based application. For example, the Grid Services integrated into an application might fail without notification, requiring the system to take recovery measures such as cloning a new, identical service to replace the failed one. Alternatively, the Grid Services might become overloaded and have intolerable response times; in that case, new service instances must be created on other hosting environments on the fly to satisfy the application's requests. Furthermore, a Grid Service naming scheme is needed to help applications identify a Grid Service uniquely and find the right one to invoke. To address these challenges, we first propose a Grid Service management mechanism based on a formal description of a Grid Service Lifecycle and a Grid Service Naming convention to identify Grid Services. The Grid Service management mechanism will be integrated into hosting environments and will provide infrastructure to support cloning and dynamic loading of Grid Services. Next, we discuss security considerations and the implementation of the mechanism. Lastly, we conclude with a discussion of related work and future directions.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 521–528, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Grid Service Lifecycle
First, consider the following two scenarios:
1. To satisfy a QoS policy, a Grid Service that becomes overloaded might be cloned automatically to another appropriate hosting environment in order to serve more consumers. In this paper, a consumer can be a user client or even another Grid Service.
2. Under some circumstances, consumers need a suitable hosting environment in which to deploy their own Grid Services at run time, and this deployment must not affect other running Grid Services in the same environment, i.e. it must not require restarting the environment.
To realize these scenarios, we must explicitly define the lifecycle of Grid Services themselves, as distinct from the Grid Service Instance Lifecycle [1] clearly demarcated by the OGSI specification. Our Grid Service Lifecycle consists of five phases: Loading, Installation, Initialization, Serving and Revoking.
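The five phases can be encoded as a small state machine that rejects out-of-order transitions. The transition set below (forward progression, plus revoking from any earlier phase, e.g. when a consumer-defined loader is revoked right after loading) is an assumption for illustration rather than the paper's formal relation, and `Phase`, `NEXT` and `ServiceLifecycle` are hypothetical names.

```python
from enum import Enum
from typing import Dict, List, Set


class Phase(Enum):
    LOADING = 1
    INSTALLATION = 2
    INITIALIZATION = 3
    SERVING = 4
    REVOKING = 5


# Assumed transition relation: move forward one phase, or revoke at any point.
NEXT: Dict[Phase, Set[Phase]] = {
    Phase.LOADING: {Phase.INSTALLATION, Phase.REVOKING},
    Phase.INSTALLATION: {Phase.INITIALIZATION, Phase.REVOKING},
    Phase.INITIALIZATION: {Phase.SERVING, Phase.REVOKING},
    Phase.SERVING: {Phase.REVOKING},
    Phase.REVOKING: set(),
}


class ServiceLifecycle:
    """Tracks the lifecycle phase of one Grid Service (not of an instance)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.phase = Phase.LOADING
        self.history: List[Phase] = [Phase.LOADING]

    def advance(self, to: Phase) -> None:
        if to not in NEXT[self.phase]:
            raise ValueError(f"{self.name}: illegal transition "
                             f"{self.phase.name} -> {to.name}")
        self.phase = to
        self.history.append(to)
```

Keeping the history makes it easy for a hosting environment to audit how a service reached its current phase.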

2.1 Grid Service Loader
In the loading phase, all components of a consumer-owned Grid Service are replicated (potentially from a remote location) to a target hosting environment. The components are assembled into a package in a format that can be deployed directly in the target hosting environment, referred to in this paper as a ready-to-deploy package; examples are gar packages in the Globus Toolkit and war packages in Service Domain [3]. Before describing the loading process, we define a representation of a Grid Service. We represent a Grid Service using the notation $\langle N, L \rangle$, where N denotes the name of the Grid Service and L denotes its loader. For N, we propose a Grid Service Naming convention in a later section. A Grid Service Loader (GSL) is a special Grid Service that loads ready-to-deploy packages from remote servers or the local file system, or generates ready-to-deploy packages in the target hosting environment under the control of consumers. The purpose of the Loader is to support dynamic loading of Grid Services in hosting environments. There are two types of GSL: consumer-defined GSLs and Systematical GSLs supplied by hosting environments. Systematical GSLs play two basic roles: checking the existence of a specified Grid Service, and loading a ready-to-deploy Grid Service from a local or remote location into their own hosting environment. Consumer-defined GSLs are written and uploaded by consumers, also in the form of ready-to-deploy packages, and give consumers complete control of the dynamic loading of their Grid Services.



A consumer-defined GSL can, for example, recompile the source code of a consumer-owned Grid Service, or adjust its configuration files according to information gathered from the target hosting environment, such as the MPI binary directory, whether PBS is supported, or OS information; finally, the GSL assembles all the parts of the consumer-owned Grid Service into a ready-to-deploy package. A consumer-defined GSL itself is represented by …, and all ready-to-deploy services are represented by …, where … denotes the Systematical GSL.
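The two kinds of loader can be sketched in Python, representing a Grid Service by its name N and loader L. The Systematical GSL's two roles (existence check and loading) are plain functions; a consumer-defined GSL is modelled as a name plus a function that builds a package from environment information, is itself loaded by the Systematical GSL, and is revoked once it has done its work. The `.gar` package naming, the `SYSTEM_GSL` constant and all function names are hypothetical, and real loading would copy files into the hosting environment rather than update dictionaries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

SYSTEM_GSL = "systematical-gsl"   # hypothetical name of the environment's built-in loader


@dataclass(frozen=True)
class GridService:
    name: str     # N: the service name
    loader: str   # L: the GSL that loaded it


@dataclass
class HostingEnvironment:
    info: Dict[str, str]                                      # e.g. {"os": "linux"}
    loaded: Dict[str, GridService] = field(default_factory=dict)
    packages: Dict[str, str] = field(default_factory=dict)    # name -> ready-to-deploy package


def system_exists(env: HostingEnvironment, name: str) -> bool:
    """First role of the Systematical GSL: check whether a service already exists."""
    return name in env.loaded


def system_load(env: HostingEnvironment, name: str, package: str) -> GridService:
    """Second role: load a ready-to-deploy package into this environment."""
    env.packages[name] = package
    env.loaded[name] = GridService(name, SYSTEM_GSL)
    return env.loaded[name]


# A consumer-defined GSL: its name plus a function that builds a package from env info.
ConsumerGSL = Tuple[str, Callable[[Dict[str, str]], str]]


def load_service(env: HostingEnvironment, name: str,
                 consumer_gsl: Optional[ConsumerGSL] = None) -> GridService:
    if system_exists(env, name):
        return env.loaded[name]                     # reuse instead of loading twice
    if consumer_gsl is None:
        return system_load(env, name, f"{name}.gar")
    gsl_name, build = consumer_gsl
    system_load(env, gsl_name, f"{gsl_name}.gar")   # Systematical GSL loads the consumer GSL
    env.packages[name] = build(env.info)            # consumer GSL builds the real package
    env.loaded[name] = GridService(name, gsl_name)
    del env.loaded[gsl_name]                        # revoke the consumer GSL once done
    del env.packages[gsl_name]
    return env.loaded[name]
```

In this model a service built by a consumer-defined GSL records that GSL as its loader, while services loaded directly record the Systematical GSL.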

2.2 Formal Description of Grid Service Lifecycle Now we give 3 definitions in the below, and also illustrate how to design a model of Grid Service Lifecycle using them, which will also direct our implementation of Grid Service Lifecycle. Definition 1. K is a set of phases. where ki is one phase of Grid Service Lifecycle. Definition 2. is a set of input and output. where null and is input or output of a phase. Definition 3. f is a relation, i.e. According to the definition of our proposed Grid Service Lifecycle, Let K= {Loading, Installation, Initialization, Serving, Revoking}: Loading: I. A Consumer finds out an appropriate hosting environment that can host his own service by Information Services [4] or Matchmaking mechanism [5], and gathers environment information, such as version of axis, implementation of OGSI, and OS. II. By utilizing the Systematical GSL, the Consumer could check existence of the service to be deployed in the found hosting environment by our proposed Grid Service Naming mechanism, retrieve some characteristics of the service and make their decision if it already exists, such as version of the service and load of the service. If the service doesn’t exist, go to next step. III. If the consumer-owned service doesn’t contain a consumer-defined GSL, the consumer only employs the Systematical GSL to load the service from consumerdesignated location. Otherwise, the consumer should first utilize Systematical GSL to load the consumer-defined Loader and then employ the consumer-defined Loader to load the fact Grid Service. IV. That the ready-to-deploy package of the fact Grid Service has been generated and Loaders have correctly loaded it to right directory required by the target hosting environment indicates the completion of loading process. After the above works, a consumer-defined Loader, if exists, should be revoked, which belongs to revoking operation. Installation: Installation is comparable with deployment of Grid Services. 
Installation involves verifying ready-to-deploy packages and preparing run-time resources, and may trigger the loading of related services if necessary.

J. Qiu et al.

The representation of a Grid Service, its ready-to-deploy package, is verified to ensure that its binary representation is structurally valid and complies with the target hosting environment. Verification may cause additional Grid Services used by the installed services to be loaded. The ready-to-deploy packages must satisfy the static or structural constraints imposed by the target hosting environment, and the hosting environment must publish the relevant information about these constraints so that consumers know them. Preparation involves unpacking the ready-to-deploy packages from the repository of the target hosting environment, copying the files to the correct directories, revising some files if needed, and registering some static information with the target hosting environment. For example, most hosting environments have configuration files to manage their Grid Services, such as server-config.xml in Axis. What indicates the completion of this phase? If we restart the target hosting environment and the newly installed Grid Service correctly serves consumers after startup, we declare the installation of the Grid Service complete. However, we would like to use a Grid Service directly after its installation, without restarting the hosting environment and without affecting other services that are currently serving. We therefore explicitly define another phase, initialization, to load Grid Services into the run-time environment dynamically.
Initialization: Some hosting environments, e.g. the Globus Toolkit, support automatic service activation and deactivation for more efficient and scalable memory management; this feature is also supported by both CORBA and EJB. A Grid Service in the Globus hosting environment is activated on its first call or at hosting environment startup, and then cycles between activation and deactivation according to the policies of the target hosting environment.
However, this mechanism does not take effect when the consumer-owned services have just finished the preceding phase, installation, and the hosting environment has not been restarted: the run-time structures in the memory of the target hosting environment contain no information about the newly installed services, even though installation has updated all static configuration files. Initialization of a Grid Service therefore consists of registering its data structures in the run-time structures in memory of the target hosting environment and loading specified components of the Grid Service into memory. After initialization, a Grid Service can correctly serve its consumers.
Serving: This phase covers the whole lifecycle of a Grid Service instance as defined in OGSI, and also includes service activation and deactivation. For more information, please refer to OGSI [1].
Revoking: A consumer may request the undeployment of a Grid Service either through an explicit Revoking Service provided by the target hosting environment or through a soft-state approach (as motivated and described in [2]), in which a consumer registers interest in the Grid Service for a specific period of time; if that timeout expires without any consumer reaffirming interest in the service to extend the timeout, the service may be undeployed automatically. Periodic reaffirmation can extend the lifetime of a Grid Service for as long as necessary. Of course, a Grid Service can also be hosted in the target hosting environment perpetually, in which case consumers cannot revoke it.

A Grid Service Lifecycle Management Scheme

Before actually revoking a Grid Service, the target hosting environment should notify all of its instances, for example by calling a specific operation, so that they can clean up resources or send results back to their consumers; it then terminates all the instances. Revoking involves unregistering and unloading from memory all data structures of the Grid Service that were created in the initialization phase, and withdrawing all of its files and registration information generated in the installation phase. Revoking may also remove the ready-to-deploy package of the Grid Service from the repository directory of the target hosting environment. Based on the set K, we give the input/output set (Definition 2) and the relation f (Definition 3) for three Grid Service loading modes: loading a local Grid Service in the hosting environment, loading an immigrant Grid Service with a systematical Loader, and loading an immigrant Grid Service with a consumer-defined Loader.
Local Loading: ready-to-deploy packages outside the repository directory of the target hosting environment; ready-to-deploy packages in the repository of the target hosting environment, possibly with all files and static configuration information to be deployed in the target hosting environment; all data structures to be registered in memory; a SOAP request from Grid Service consumers; a SOAP response to the Grid Service consumers. Loading local Grid Services, in the form of ready-to-deploy packages, into the repository; installing Grid Services from the repository; registering data structures in memory; making Grid Services ready for serving; Grid Services serving.
Systematical Loading: Based on the above definitions of the input/output set and f, we define only the additional elements: the Systematical GSL in the serving state; a Grid Service being revoked; remains of the revoked Grid Service kept for results and logging. The Systematical GSL loads a consumer-owned Grid Service; the hosting environment requests revoking a Grid Service because its time has expired; a Grid Service is revoked for a reason other than time expiry.
User-defined Loading: Based on the above two definitions of the input/output set and f, we define only the additional elements: a consumer-defined GSL; loading a consumer-owned Grid Service that its Loader has properly reconfigured;
the hosting environment requests revoking the consumer-defined GSL; revoking a consumer-defined GSL.

Based on the above definitions of K, the input/output set and f, we must additionally provide, on existing hosting environments, a Systematical GSL, a mechanism for the dynamic installation and initialization of Grid Services, and management of consumer-owned Grid Services, including their loading and revoking.
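The five phases in K, together with the soft-state revoking described under Revoking, can be pictured as a small state machine. The following Python sketch is only an illustration under our own assumptions. The transition table advances through the phases in order and allows revoking from any earlier phase. A lease of `None` models a perpetually hosted service. The names `Phase`, `ManagedService`, `advance`, `reaffirm` and `check_lease` are hypothetical and are not part of the authors' Globus Toolkit 3 implementation.

```python
import time
from enum import Enum, auto


class Phase(Enum):
    """The five lifecycle phases in K."""
    LOADING = auto()
    INSTALLATION = auto()
    INITIALIZATION = auto()
    SERVING = auto()
    REVOKING = auto()


# Assumed transition relation: phases advance in order, and revoking may be
# requested from any earlier phase (e.g. to clean up after a failed load).
TRANSITIONS = {
    Phase.LOADING: {Phase.INSTALLATION, Phase.REVOKING},
    Phase.INSTALLATION: {Phase.INITIALIZATION, Phase.REVOKING},
    Phase.INITIALIZATION: {Phase.SERVING, Phase.REVOKING},
    Phase.SERVING: {Phase.REVOKING},
    Phase.REVOKING: set(),
}


class ManagedService:
    """Tracks one Grid Service through its lifecycle (hypothetical helper)."""

    def __init__(self, name, lease_seconds=None, clock=time.monotonic):
        self.name = name
        self.phase = Phase.LOADING
        self.clock = clock                  # injectable so tests can fake time
        self.lease_seconds = lease_seconds  # None = hosted perpetually, no soft state
        self.expires_at = None

    def advance(self, target):
        """Move to `target`, rejecting transitions not listed in TRANSITIONS."""
        if target not in TRANSITIONS[self.phase]:
            raise ValueError(f"illegal transition {self.phase.name} -> {target.name}")
        self.phase = target
        if target is Phase.SERVING:
            self.reaffirm()                 # the lease starts when serving begins

    def reaffirm(self):
        """A consumer reaffirms interest, extending the soft-state lease."""
        if self.phase is Phase.SERVING and self.lease_seconds is not None:
            self.expires_at = self.clock() + self.lease_seconds

    def check_lease(self):
        """Start revoking if the lease expired without reaffirmation."""
        if (self.phase is Phase.SERVING and self.expires_at is not None
                and self.clock() >= self.expires_at):
            self.advance(Phase.REVOKING)
            return True
        return False
```

A hosting environment would call `check_lease()` periodically. Each consumer call to `reaffirm()` pushes the expiry time forward, which matches the periodic reaffirmation described above.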

3 Grid Service Naming

In the foregoing section, we represent a Grid Service using , where N denotes the name of the Grid Service. Before a Grid Service is actually loaded into the target hosting environment, the Grid Service Naming (GSN) convention and the definition of equivalence of Grid Services proposed in this paper can distinguish it from other Grid Services.
Definition 4. GSN := P + SV.
P: the set consisting of the QName of the portType, all operations in the portType, and all type definitions used by those operations. All three parts are described in the Grid Service WSDL. In practice, we represent P by hash codes; for example, we use MD5 to hash each part of the portType file.
SV: the Semantic Version, a set of semantic strings. Each semantic string represents the semantics of one operation of a Grid Service portType. In CORBA and COM, a component is identified by its version and IDL file, but neither the version nor the IDL carries semantics. To extend this model, a Grid Service provider can define a Semantic Version for the service: the semantic string of an operation not only defines a version of the operation but also represents its semantics. Suppose a provider offers two different Grid Services with identical WSDL descriptions but different functions; the provider should then give the two services different SVs. We do not restrict the content of SV; however, the provider must define a mechanism to interpret SV and to distinguish two Grid Services by their SVs.
Definition 5. $S_1 = S_2$ iff $P_1 = P_2$ and $SV_1 = SV_2$, where $S_1$ and $S_2$ are two Grid Services.
Definition 6. $S_1 \neq S_2$ if $P_1 \neq P_2$, i.e. if the portTypes of two Grid Services are different, the Grid Services are different.
Definition 7. If two Grid Services $S_1$ and $S_2$ have identical WSDL descriptions, but one or more of their operations have different semantic strings while the remaining operations have identical semantic strings, we say that $S_1$ is partly equal to $S_2$.
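Definitions 4 to 7 can be made concrete with a small Python sketch. It rests on several assumptions of our own:

- The portType QName, each operation and each type definition are available as WSDL text fragments.
- SV is a mapping from operation name to a provider-chosen semantic string.
- Plain string equality stands in for the provider-defined mechanism that interprets SV.

As in the text, MD5 is used only as a compact identifier, not for security. The names `GSN`, `make_gsn` and `compare` are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def md5_hex(text):
    """MD5 digest of one WSDL part, used only as a compact identifier."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class GSN:
    """GSN := P + SV (Definition 4), in a hypothetical encoding."""
    p: tuple   # hashes of the portType QName, its operations and its type definitions
    sv: tuple  # sorted (operation, semantic string) pairs


def make_gsn(porttype_qname, operations, type_defs, semantic_strings):
    """Build a GSN; operations and type definitions are order-insensitive."""
    p = (md5_hex(porttype_qname),
         tuple(sorted(md5_hex(op) for op in operations)),
         tuple(sorted(md5_hex(t) for t in type_defs)))
    sv = tuple(sorted(semantic_strings.items()))
    return GSN(p, sv)


def compare(a, b):
    """Classify two Grid Services according to Definitions 5 to 7."""
    if a.p != b.p:
        return "different"     # Definition 6: different portTypes
    if a.sv == b.sv:
        return "equal"         # Definition 5: same P and same SV
    return "partly equal"      # Definition 7: same WSDL, some semantic strings differ
```

Two services whose WSDL lists the same operations in a different order compare as `"equal"`. Changing the semantic string of a single operation yields `"partly equal"`.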

4 Security Considerations and Implementation

We adopt the security mechanism of the Grid Security Infrastructure (GSI). The Grid Service Loader (GSL) is implemented as a secure Grid Service with an authorization policy predefined by the target hosting environment, and consumers must use a valid certificate to access the secure GSL. The GSL authenticates consumers and authorizes them according to the authorization policy. In addition, the hosting environment must limit what a consumer-defined GSL can do and what information it can obtain. Furthermore, loading a Grid Service should not affect other Grid Services currently serving in the same hosting environment. Our implementation strategy is therefore to provide a sub hosting environment. This is an abridged version of the hosting environment that runs separately from it, possibly as a dedicated system user different from the one who starts the hosting environment, and provides a run-time environment for Grid Services. Under this design, the consumer-defined GSL runs in the sub hosting environment; once the Grid Service has been loaded successfully, the consumer-defined GSL is revoked and the sub hosting environment is shut down. Fig. 1 shows the implementation model of this scheme and Fig. 2 shows the detailed framework of Grid Service Lifecycle Management.

Fig. 1. Architecture of hosting environment (left) and sub hosting environment (right)

Fig. 2. Components in Grid Service Lifecycle Management

5 Related Works

We have referred to a range of related works in the body of the paper. Here we add comments on the use of the Globus Toolkit and on other work on component technologies and the JVM. We use Globus Toolkit 3 as our infrastructure, especially its OGSI implementation. However, the Globus Toolkit does not support management of the whole
Grid Service lifecycle. We drew on the lifecycle of a class as specified for the JVM to define the Grid Service lifecycle, and the GSN model was inspired by the version definitions and IDL of component technologies.

6 Conclusions and Future Work

The Grid Service lifecycle defined in this paper is complementary to the lifecycle of Grid Service instances specified in OGSI. It covers the whole life cycle of a Grid Service in five phases: loading, installation, initialization, serving, and revoking. Based on this definition, we have proposed a powerful mechanism for dynamically loading Grid Services into a hosting environment. The notion of Grid Service Loaders presented in this paper changes the general pattern of Grid Service usage. Until now, a Grid Service had to be fully deployed before consumers could access it, and consumers had to follow the fixed procedures prescribed by service providers, which greatly limited the interaction between consumers and providers. In the future, computing should be consumer-centered rather than provider-centered, as it mostly is today. Consumers will then be able to upload their own services and carry out their whole business by employing other services provided by Grid Service providers. However, this facility inherently threatens the security of hosting environments. We must therefore give further consideration to security violations arising from this mechanism and to the protection of hosting environments. The Grid Service naming convention proposed in this paper is an experimental solution to a key question in grid computing: how to identify a Grid Service. We have suggested a definition of equivalence of two Grid Services, and we have, for the first time, proposed the semantic string as part of the identification of a Grid Service. Its implementation mechanism, however, needs further research.

References

1. S. Tuecke, K. Czajkowski, I. Foster, J. Frey, S. Graham, C. Kesselman, T. Maguire, T. Sandholm, D. Snelling, P. Vanderbilt: Open Grid Services Infrastructure (OGSI) Version 1.0. Global Grid Forum.
2. I. Foster, C. Kesselman, J. Nick, S. Tuecke: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Globus Project, 2002.
3. Y.-S. Tan, B. Topol, V. Vellanki, J. Xing: Business Service Grid. IBM developerWorks, 2003.
4. K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman: Grid Information Services for Distributed Resource Sharing. Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, August 2001.
5. R. Raman, M. Livny, M. Solomon: Matchmaking: Distributed Resource Management for High Throughput Computing. In Proc. IEEE Symposium on High Performance Distributed Computing, IEEE Computer Society Press, 1998.

An OGSA-Based Quality of Service Framework

Rashid Al-Ali (1,2), Kaizar Amin (1,3), Gregor von Laszewski (1), Omer Rana (2), and David Walker (2)

(1) Argonne National Laboratory, Argonne, IL, U.S.A.
(2) Cardiff University, UK
(3) University of North Texas, U.S.A.

Abstract. Grid computing provides a robust paradigm to aggregate disparate resources in a secure and controlled environment. Grid architectures require an underpinning Quality of Service (QoS) support in order to manage complex data and computation intensive applications. However, QoS guarantees in the Grid context have not been given the attention they merit. In order to enhance the functionality offered by computational Grids, we overlay the Grid framework with an advanced QoS architecture, called G-QoSM. The G-QoSM framework provides a new service-oriented QoS management model that leverages the Open Grid Service Architecture (OGSA) and has a number of interesting features: (1) Grid service discovery based on QoS attributes, (2) policy-based admission control for advance reservation support, and (3) Grid service execution with QoS constraints. This paper discusses the different components of the G-QoSM framework, in the context of OGSA architectures.

1 Introduction

Grid computing [1,2] has traditionally focused on large-scale sharing of distributed resources, sophisticated applications, and the achievement of high performance. The Grid architecture integrates diverse network environments with widely varying resource and security characteristics into virtual organizations (VOs). Computational Grids offer a high-end environment that can be exploited by advanced scientific and commercial applications. Grid environments provide soft Quality of Service (QoS) assurances by virtue of how they are established. Grid services are hosted on specialized “high-end” resources, including scientific instruments, clusters, and data storage systems. High connectivity is maintained between resources via dedicated high-speed networks. Well-established resource administration facilitates constant resource connectivity, resource monitoring, and fault tolerance. Hence, some preliminary level of QoS is provided by the committed members of the VO, based on their pre-agreed Grid policy and their dedication to the overall collaboration. Nevertheless, the complexity of several critical Grid applications makes it imperative to provide hard, guaranteed QoS assurances beyond those provided by the basic Grid infrastructure. Considering the increasing sophistication of Grid applications and new hardware under development [3], such provisions become an inherent requirement of the Grid architecture. This implies a need for a QoS management entity that provides a negotiation mechanism through which clients can select appropriate resources with QoS constraints that suit their needs.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 529–540, 2004. © Springer-Verlag Berlin Heidelberg 2004

R. Al-Ali et al.

Motivated by this need, we propose a QoS management framework called G-QoSM that overlays an advanced QoS framework on existing Grid architectures, allowing them to support complex QoS requirements. In support of the recent standardization efforts of the Global Grid Forum [4], the G-QoSM framework is compatible with the latest Open Grid Services Architecture (OGSA) specification. The G-QoSM framework presented in this paper has a number of important features: (1) a ‘QoS brokering service’, (2) a ‘policy service’, and (3) a generic resource ‘reservation manager’. The reservation manager supports advance and immediate reservation, single and collective resource reservations (co-reservation), and arbitrary resource types (for example, compute, network and disk). It also provides scalability and flexibility through an object-oriented design that uses underlying resource characteristics at run-time. The paper is structured as follows. In Section 2 we provide an overview of related research in the area of resource reservation to support QoS needs. In Section 3.1 we outline the general requirements of the Grid QoS model, and we then present the OGSA-based G-QoSM framework with reservation support. In Section 7.1 we define the reservation, and we present a reservation admission control mechanism and reservation features. We conclude the paper with a summary.

2 Related Work

Immediate and advance reservation has been considered in a wide variety of systems, mostly in networking, communication, and distributed applications, including distributed multimedia (DMM) applications, and is therefore of considerable interest to the Grid community. In the context of Grid computing, GARA [5] is a QoS framework that provides programmers with convenient access to end-to-end QoS. It provides advance reservations with uniform treatment of various types of resources, such as network, compute, and disk. A GARA reservation is a promise that the client/application that initiated the reservation will receive a specific level of service quality from the resource manager. GARA also provides a reservation application programming interface (API) to manipulate reservation requests, with operations such as create, modify, bind and cancel. NAFUR [6] describes the design and implementation of a QoS negotiation system with advance reservation support in the context of DMM applications. NAFUR aims to compute the QoS that can be supported at the time the service request is made, and at certain carefully chosen later times. For example, if the requested multimedia service with the desired QoS cannot be supported at the time the service request is made, the approach allows computation of the earliest time at which the user can start the multimedia service with the desired QoS. In [7], a resource broker (RB) model is proposed in the context of middleware for DMM applications. The proposed RB has the following design goals: 1) advance and immediate reservation, 2) a new admission control scheme based on a timely adaptive state tree (TAST), and 3) processing of brokerage requests for reservation, modification, allocation and release.

In [8], advance reservation is formalized in the context of networking systems, and the fundamental problem of admission control associated with resource reservation is introduced. Based on their literature review, the authors conclude that none of the previous approaches is flexible enough to cover all potential needs of all users. Their proposed solution to this fundamental problem is to separate the issue into a technical part and a policy part. These parts are supported by a generic reservation service description and a corresponding policy layer. This combination improves the flexibility of advance resource reservation compared with the other approaches. None of these research efforts addresses advance reservation in the context of a service-oriented architecture, as our approach does. In general, resource reservation has not been widely explored in service-oriented Grids. Nevertheless, the GGF Grid Resource Agreement and Allocation Protocol (GRAAP) Working Group has produced a ‘state of the art’ document, which lays down properties for resource reservation in Grids [9]. We envision that our reservation model can be used to support the reservation properties outlined by the GRAAP-WG. Several features distinguish our work from existing QoS management approaches:
- The generic QoS management service is not coupled to any specific resource type, or even limited to resource quantity.
- The object-oriented design and the abstraction approach allow the proposed service to integrate with any brokerage system that supports Web service interaction.
- Dynamic information gathering and management, for example of resource characteristics and policy information, improves scalability.
- Usage policy frameworks for resource providers/administrators and users enable fine-grained request specification.
In addition to the projects mentioned above, [10] introduces a general negotiation model called the Service Negotiation and Acquisition Protocol (SNAP), which proposes a resource management model for negotiating resources in distributed systems. SNAP defines three types of SLAs that coordinate management across a desired resource set and that together can describe a complex service requirement in a distributed system environment: the task SLA (TSLA), the resource SLA (RSLA) and the bind SLA (BSLA). The TSLA describes the task, and the RSLA describes the resources needed to accomplish the task in the TSLA. The BSLA associates the resources from the RSLA with the application ‘task’ in the TSLA. The SNAP protocol requires a resource management entity that can make promises on resource capability, i.e. an RSLA. Our reservation model can therefore encapsulate such a requirement and implement the RSLA negotiation.

3 The Proposed QoS Framework

In this section we introduce the proposed Grid QoS Management framework. We outline general requirements for the framework and then discuss QoS management and the proposed system.

3.1 Requirements

The proposed framework must adhere to certain important requirements:
Service Discovery. The system should be able to discover services based on QoS attributes, which may be a) quantitative or b) qualitative. For example, quantitative attributes include computation, networking and storage requirements, while qualitative attributes include the degree of service reputation and the service licensing cost. To support service discovery based on these attributes, a discovery mechanism needs to be employed within the proposed framework.
Resource Advance Reservation. The system should support mechanisms for advance, immediate, or ‘on demand’ resource reservation. Advance reservation is particularly important when dealing with scarce resources, as is often the case with high-performance and high-end scientific applications in Grids.
Reservation Policy. The system should support a mechanism that lets Grid resource owners enforce their policies governing when, how, and by whom their resources can be used, while decoupling the reservation and policy entities in order to improve reservation flexibility [8].
Agreement Protocol. The system should assure clients of their advance reservation status and of the resource quality they can expect during the service session. Such assurance can be contained in an agreement protocol, such as Service Level Agreements (SLAs).
Security. The system should prevent malicious users from penetrating or altering the data repositories that hold information about reservations, policies and agreement protocols. A proper security infrastructure, such as a Public Key Infrastructure (PKI), is required.
Simplicity. The system should have a simple design that requires minimal overhead in terms of computation, infrastructure, storage, and message complexity.
Scalability. The system should scale to large numbers of entities, as the Grid is a global-scale infrastructure.

3.2 Grid Quality of Service Management

Grid Quality of Service Management (G-QoSM) is a new approach to supporting QoS management in computational Grids, in the context of the Open Grid Services Architecture (OGSA). QoS management covers a range of activities, from resource selection and allocation to resource release, all applied in the course of a QoS session. A QoS session includes three main phases: i) the establishment phase, ii) the active phase, and iii) the clearing phase [11]. In QoS-oriented architectures, during the ‘establishment phase’, a client’s application states the desired service and QoS specification. The QoS broker then undertakes service discovery based on the specified QoS properties and negotiates an agreement offer for the client’s application. During the ‘active phase’, additional activities, including QoS monitoring, adaptation, accounting and possibly re-negotiation, may take place. The ‘clearing phase’ is responsible for terminating the QoS session, whether because the resource reservation has expired, the agreement has been violated or the service has completed, and resources are freed for use by other clients.

Quality of service management has been explored in a number of contexts, particularly computer networks [12], multimedia applications [13] and Grid computing [5]. Regardless of the context, a QoS management system should address the following needs: specifying QoS requirements; mapping QoS requirements to resource capabilities; negotiating QoS with resource owners where a requirement cannot be met exactly; establishing service level agreements (SLAs) with clients; reserving and allocating resources; monitoring parameters associated with a QoS session; adapting to varying resource quality characteristics; and terminating QoS sessions. The G-QoSM framework [14] is designed to operate in service-oriented architectures. It provides three main functions: (1) support for resource and service discovery based on QoS properties, (2) support for providing QoS guarantees at the middleware and network levels, and establishing SLAs to enforce these guarantees, and (3) QoS adaptation for the allocated resources. G-QoSM delivers three QoS levels: Guaranteed, Controlled Load and Best Effort. At the ‘guaranteed’ level, constraints related to the client’s QoS parameters must exactly match the service provision. ‘Controlled load’ is similar to the ‘guaranteed’ level, except that less stringent parameter constraints are defined, and range-based QoS attributes are used along with range-based SLAs. At the ‘best effort’ level, the resource manager has full control over the choice of QoS level without constraints; this corresponds to the default case when no QoS requirements are specified. G-QoSM is an ongoing project, previously investigated and implemented in the context of Globus Toolkit (GT) 2.0 [14, 15], using the GARA framework to provide QoS support for ‘compute’ resources.
However, with the emergence of service-oriented Grids and the Open Grid Services Architecture (OGSA) [16], it is necessary to add new features to G-QoSM to make it OGSA-enabled and GT3-compliant. The new G-QoSM architecture does not use GARA. It replaces GARA with a new reservation manager, a policy service, an allocation manager and a newly developed Java API for a Dynamic Soft Real Time (DSRT) scheduler [17]. The new features of the OGSA-enabled G-QoSM are: (1) a QoS brokering service implemented as a Grid service; (2) a generic resource reservation manager; (3) a policy service implemented as a Grid service; and (4) a framework that is OGSA-enabled and can be instantiated in the context of GT3. Figure 1 shows the new OGSA-enabled G-QoSM architecture.
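The three QoS levels differ mainly in how strictly a client's constraints are matched against what a service can provide. The following Python sketch illustrates that difference under simplifying assumptions of our own:

- QoS attributes are plain numeric key/value pairs.
- A ‘guaranteed’ request lists exact values.
- A ‘controlled load’ request lists inclusive ranges.
- A ‘best effort’ request carries no constraints.

`QoSLevel`, `ServiceRequest` and `satisfies` are hypothetical names and do not come from G-QoSM.

```python
from dataclasses import dataclass, field
from enum import Enum


class QoSLevel(Enum):
    GUARANTEED = "guaranteed"
    CONTROLLED_LOAD = "controlled load"
    BEST_EFFORT = "best effort"


@dataclass
class ServiceRequest:
    level: QoSLevel
    exact: dict = field(default_factory=dict)   # attribute -> required value (guaranteed)
    ranges: dict = field(default_factory=dict)  # attribute -> (low, high) (controlled load)


def satisfies(request, offer):
    """Does a provider's offer (attribute -> value) meet the request's QoS level?"""
    if request.level is QoSLevel.BEST_EFFORT:
        return True  # no constraints: the resource manager chooses freely
    if request.level is QoSLevel.GUARANTEED:
        # Every requested attribute must be provided exactly.
        return all(k in offer and offer[k] == v for k, v in request.exact.items())
    # Controlled load: every attribute must fall inside its requested range.
    return all(k in offer and lo <= offer[k] <= hi
               for k, (lo, hi) in request.ranges.items())
```

For an offer of 40% CPU, a guaranteed request for exactly 40% succeeds. A controlled-load request for 30 to 50% also succeeds, while a guaranteed request for 50% fails.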

4 QoS Grid Service

The QoS Grid Service (QGS) is the focal point of this architecture and exists in every Grid node. The QGS interacts with the client’s application, the QoS selection service, the reservation manager, and the policy Grid service, as follows:

Fig. 1. Framework Architecture.

Interaction with Client’s Application. Interaction with the client’s application is needed primarily to capture the service request with its QoS constraints and to negotiate a QoS agreement (SLA). This negotiation can be summarized as an attempt to find the ‘best match’ service based on given properties and priority levels. For example, a client might request that cost have a higher priority than service reliability, and the matching process should comply with such a requirement. Once the best service match is found and the corresponding resources are reserved, an agreement offer is proposed to the client’s application. If the client approves the proposed agreement, it becomes a commitment, and the QGS regards it as a fixed guarantee; otherwise, the resources are released and no agreement takes place.
Interaction with the QoS Selection Service. To support basic concept queries, the QoS selection service is supplied with QoS constraints similar to those supplied by the client’s application. Its main function is to provide information for selecting the best service. Normally, the selection service replies with a list of service matches, so the QGS must select one of the returned services. To enable the best selection, we adapted a selection algorithm based on a Weighted Average (WA) concept. Rather than treating every attribute equally, it weights the value of each QoS attribute by the importance level supplied by the user in the ‘service request’. The ‘importance level’ associates a level of importance or priority, such as High (H), Medium (M) or Low (L), with each QoS attribute, and this importance level is mapped
to a numerical value (a real number). The algorithm computes the WA for every returned service and selects the service with the highest WA.
Interaction with Reservation Manager. After a Grid service is selected, the functional requirements needed to support the reservation are extracted and formulated as resource specifications. These resource specifications are then submitted to the reservation manager, which returns a reservation ‘handle’ if the reservation succeeds. This handle can later be used to claim or manipulate the reservation.
Interaction with Policy Grid Service. Interaction with the policy Grid service enables the QGS to capture the policy information needed to validate the service request. For example, it can discover whether there is any limitation on resource utilization per service or on the class of service requested. The QGS validates the service request by applying the rules obtained from the Policy Grid Service.
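The weighted-average selection described under ‘Interaction with the QoS Selection Service’ can be sketched in Python as follows. The text says only that the importance levels map to real numbers, so the values 3, 2 and 1 for H, M and L are assumptions. The sketch also assumes each attribute has already been normalized to a score in [0, 1] where higher is better, so a cheaper service gets a higher ‘cost’ score. `IMPORTANCE`, `weighted_average` and `select_best` are hypothetical names.

```python
# Assumed numeric values for the importance levels; the text only says that
# H, M and L are mapped to real numbers.
IMPORTANCE = {"H": 3.0, "M": 2.0, "L": 1.0}


def weighted_average(scores, importance):
    """Weighted average of one service's attribute scores.

    scores:     attribute -> normalised score in [0, 1], higher is better
    importance: attribute -> 'H', 'M' or 'L', taken from the service request
    Attributes missing from `scores` count as 0.
    """
    weights = {attr: IMPORTANCE[level] for attr, level in importance.items()}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(w * scores.get(attr, 0.0) for attr, w in weights.items()) / total


def select_best(candidates, importance):
    """Return the name of the returned service with the highest WA."""
    return max(candidates, key=lambda name: weighted_average(candidates[name], importance))
```

Suppose `cost` is H and `reliability` is L. A cheap but less reliable service with scores 0.9 and 0.2 then gets (3 * 0.9 + 1 * 0.2) / 4 = 0.725. It is preferred over a service with scores 0.3 and 1.0, which gets (3 * 0.3 + 1 * 1.0) / 4 = 0.475.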

5 QoS Allocation Manager

The Allocation Manager’s primary role is to interact with the underlying resource managers for resource allocation and de-allocation, and to query the status of resources. It has interfaces to the various resource managers employed in this framework, namely the Dynamic Soft Real Time (DSRT) scheduler [17] and a Network Resource Manager (NRM). It associates the execution of Grid services with previously negotiated SLAs; the process of associating Grid services with SLAs is beyond the scope of this paper. The Allocation Manager also interacts with adaptive services to enforce adaptation strategies; more details on adaptation can be found in [15]. DSRT [17] is a user-level soft real-time scheduler based on the changing-priority mechanism supported by Unix and Linux operating systems. The highest fixed priority is reserved for DSRT, and a real-time process admitted by DSRT then runs under the DSRT scheduling mechanism. Such a process can thus be scheduled to use a specific percentage of the CPU. The compute QoS supported by DSRT can therefore be specified in terms of CPU percentage; for example, a real-time process might request 40% of the CPU. The Network Resource Manager (NRM) is conceptually a Differentiated Services (Diffserv) Bandwidth Broker (BB) (a concept described in [18]) and manages network QoS parameters within a given domain based on agreed SLAs. The NRM is also responsible for inter-domain communication with NRMs in neighboring domains, to coordinate SLAs across domain boundaries. The NRM may communicate with local monitoring tools to determine the state of the network and its current configuration.

6 QoS Policy Service

The Policy Service is a Grid service that provides dynamic information about the characteristics of domain-specific resources and about the domain's policy on when, what and who is authorized to use resources. This policy service relies heavily on the existence of a policy repository, such as the 'policy controller' in our framework. Resource owners


R. Al-Ali et al.

include domain-specific rules in the policy repository; for example, the resource capacity that may be utilized, depending on user authentication, time of day and class of service. These rules are used by the policy service manager to provide information on resource characteristics and domain policies. Having a separate policy manager as a Grid service offers the following advantages: resource owners can update their policy repository without interfering with other broker services; a resource owner may delegate a remote 'super' policy service to act as the policy controller of their resources, and, similarly, a policy service might control more than a single administrative domain; and decoupling the policy service from other broker services makes it possible to change resource usage policy dynamically and improves system scalability.
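A minimal sketch of how rules of this kind might be evaluated follows. The rule fields (resource type, class of service, hour-of-day window, capacity limit, authentication flag) are hypothetical and simply mirror the example rules above; denying any request that no rule explicitly permits is likewise an assumption of the sketch, not stated behavior of the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    resource_type: str          # e.g. "compute" or "network"
    service_class: str          # class of service the rule applies to
    start_hour: int             # window start, inclusive (0-23)
    end_hour: int               # window end, exclusive (1-24)
    max_capacity: float         # capacity the owner allows in this window
    requires_auth: bool = True  # whether the user must be authenticated


@dataclass(frozen=True)
class Request:
    resource_type: str
    service_class: str
    hour: int
    capacity: float
    authenticated: bool


def validate(request: Request, rules: list[PolicyRule]) -> bool:
    """Return True if at least one matching rule permits the request (default deny)."""
    for rule in rules:
        matches = (rule.resource_type == request.resource_type
                   and rule.service_class == request.service_class
                   and rule.start_hour <= request.hour < rule.end_hour)
        if not matches:
            continue
        if rule.requires_auth and not request.authenticated:
            continue
        if request.capacity <= rule.max_capacity:
            return True
    return False


# Hypothetical owner policy: less guaranteed compute during office hours.
rules = [PolicyRule("compute", "guaranteed", 8, 18, 40.0),
         PolicyRule("compute", "guaranteed", 18, 24, 80.0)]
```

Because the rules are plain data, an owner can replace `rules` at run-time without touching the code that evaluates them, which is the decoupling argued for above.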

7 QoS Reservation Manager

Reservation support plays a major role in QoS-oriented architectures. In a shared resource environment such as a Grid, QoS brokers can promise to deliver a certain resource quality to their clients if, and only if, a reservation mechanism exists. A reservation can be viewed as a promise from the resource broker to its clients on expected quality. Advance resource reservation is defined as a possibly limited or restricted delegation of a particular resource capability over a defined time interval, obtained by the requester from the resource owner through a negotiation process [9]. As pointed out earlier, resource reservation can be categorized into (a) advance reservation and (b) immediate or 'on demand' reservation, and it can be for a specified duration or indefinite. The proposed reservation manager supports advance and immediate reservation for a specified duration. Indefinite reservation is undesirable because it introduces blockages, which may leave resources reserved but unused. An important feature of this reservation approach is support for the co-reservation of various resources in service Grids. In this section we give a formal definition of reservation, describe admission control, and outline the reservation features.

7.1 Reservation Definition

We define a reservation model for collective Grid resources with as few restrictions as possible, to increase the flexibility of admission control. The fundamental problem with advance reservation, as discussed in the literature [8], concerns the interval between the submission of a granted advance reservation and its start time, called the 'hold-back time': utilizing resources, or granting further reservations, during the hold-back time is a complex problem. The problem arises when clients request immediate reservations for an indefinite period, which may overrun a previously granted advance reservation. A number of solutions have been proposed. In one, all reservations, including immediate reservations, must be specified within a time frame (i.e., indefinite reservation is not supported); another partitions resources between immediate reservations and advance reservations with specified durations. In

An OGSA-Based Quality of Service Framework


this model we opt for the first proposal: all reservations must be accompanied by duration specifications. We consider this a valid assumption because we deal with high-performance resources, and in application domains such as scientific experiments or simulations the need for such resources is known in advance, so there are no ad-hoc requests for simple resources. We formally define a reservation R in terms of the following five parameters: $t_s$, the reservation start time; $t_e$, the reservation end time; $cl$, the reservation class of service; $r$, the resource type (each resource has a type, such as "compute", "network" or "disk"); and $c(r, t)$, a function that returns the capacity of resource $r$ at time $t$. With this notation, a reservation request can be expressed as a co-reservation of resources $r_1, \dots, r_n$ with start time $t_s$, end time $t_e$ and QoS class $cl$, with the associated capacities $c(r_1, t), \dots, c(r_n, t)$ for $t_s \le t \le t_e$, as follows:

$$R = \left(t_s,\ t_e,\ cl,\ \{\langle r_1, c(r_1, t)\rangle, \dots, \langle r_n, c(r_n, t)\rangle\}\right)$$

We also introduce in this definition the concept of pre-emption priority, which has been explored in the context of networking and communication services [8]. Under pre-emption priority, when the reservation is not in effect, either before or after the reservation period, the job or service that uses the reserved resource is not rejected or terminated; instead it is assigned a low priority value, which switches its status from a 'guaranteed' to a 'best effort' type of service. In practice, to support this concept the underlying resource manager should be a priority-based system, such as the Dynamic Soft Real Time (DSRT) scheduler [17]. This feature is very useful for protecting applications when their reservations expire.
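The pre-emption priority rule can be sketched as a function that maps a reservation window and the current time to a service type and a priority. The numeric priority values and the half-open window [t_s, t_e) are assumptions of this sketch; in the framework the demotion itself would be carried out by a priority-based scheduler such as the DSRT.

```python
from dataclasses import dataclass

GUARANTEED = "guaranteed"
BEST_EFFORT = "best effort"
LOW_PRIORITY = 0  # hypothetical priority used for best-effort execution


@dataclass(frozen=True)
class Reservation:
    t_s: float     # reservation start time
    t_e: float     # reservation end time (exclusive)
    priority: int  # priority granted while the reservation is in effect


def effective_priority(res: Reservation, now: float) -> tuple[str, int]:
    """Service type and priority for a job that uses `res` at time `now`.

    Outside [t_s, t_e) the job keeps running but is demoted to best effort,
    instead of being rejected or terminated.
    """
    if res.t_s <= now < res.t_e:
        return GUARANTEED, res.priority
    return BEST_EFFORT, LOW_PRIORITY
```

For example, a job holding `Reservation(10, 20, 5)` runs as guaranteed with priority 5 at time 15, and continues as best effort with priority 0 at time 25.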

7.2 Admission Control

Admission control is the process of granting or denying reservation requests based on a number of factors, such as the actual load of the specified resource and the policy that governs who may be granted reservations for resource usage, how and when. To perform admission control, an admission control mechanism must be employed. We formally describe our admission control mechanism as a Boolean function that returns true or false for a reservation request R at time $t$: true means the reservation can be granted for the given time with the given resource specifications, and false means otherwise. To define the admission control algorithm, we first define the notion of the load $L$ of resource type $r$ at time $t$:

$$L(r, t) = \sum_{j=1}^{N(t)} c_j(r, t)$$

where $N(t)$ is the number of granted reservations for time $t$ and $c_j(r, t)$ is the amount of capacity reserved on resource type $r$ at time $t$ by the $j$-th granted reservation.


We also need to define the total capacity of a resource as the maximum capacity that the underlying resource can provide; formally, $C(r)$ is the maximum capacity that resource $r$ can provide. With these basic primitives, we can now define the algorithm for the admission control function.
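One straightforward way to build the admission control function from $L(r, t)$ and $C(r)$ is sketched below in Python. It is an illustration under stated assumptions, not the paper's exact algorithm: time is divided into integer slots, a reservation covers slots $t_s \le t < t_e$, each requested capacity is constant over that interval, and policy checks are omitted. A request is admitted only if, for every co-reserved resource and every slot, the existing load plus the requested capacity stays within the total capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reservation:
    t_s: int                      # start slot (inclusive)
    t_e: int                      # end slot (exclusive)
    cl: str                       # class of service
    capacities: dict[str, float]  # resource type r -> c(r, t), constant over the interval


def load(granted: list[Reservation], r: str, t: int) -> float:
    """L(r, t): capacity of resource type r reserved by granted reservations at slot t."""
    return sum(g.capacities.get(r, 0) for g in granted if g.t_s <= t < g.t_e)


def admit(request: Reservation, granted: list[Reservation],
          total_capacity: dict[str, float]) -> bool:
    """Boolean admission function: True iff the whole co-reservation fits C(r)."""
    if request.t_e <= request.t_s:
        return False  # indefinite or empty reservations are not accepted
    for r, c in request.capacities.items():
        for t in range(request.t_s, request.t_e):
            if load(granted, r, t) + c > total_capacity.get(r, 0):
                return False
    return True
```

Checking every resource of the request before granting anything gives all-or-nothing co-reservation semantics: if one resource cannot be reserved, none is.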

7.3 Reservation Features

As the reservation manager presented in this work operates in an Open Grid Services Infrastructure (OGSI) environment, the service provides a number of 'operations' that can be used by other components. These operations are implemented as an API with a set of primitives, briefly described as follows.
reserve: is invoked with a reservation tuple R; it replies with a 'reject reservation' if the reservation cannot be granted, and otherwise returns a reservation 'handle', a reference to the newly made reservation.
isAvailable: checks the status of a resource before the actual reservation is placed, and returns a Boolean result accordingly.
nextAvailable: supports a 'counter-proposal' brokering service when the user's reservation request cannot be granted. Rather than replying with a plain yes/no answer, as most reservation systems do, the operation can reply with a 'no' together with a counter-proposal for the next availability.
extend: modifies a reservation by extending it for a specified duration.
find: finds a reservation and replies with all of its details.
cancel: cancels a reservation.
With this set of operations, a higher-level brokering service, or agent, can use the reservation manager to make immediate reservations and reservations in advance, and to manipulate these reservations.
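The six operations can be illustrated with a hypothetical in-memory Python class. The method names are snake_case renderings of reserve, isAvailable, nextAvailable, extend, find and cancel; the integer time slots, the "res-N" handle format, the slot-by-slot search used for counter-proposals, and returning `None` to signal 'reject reservation' are all assumptions of this sketch, not the OGSI interface of the actual service.

```python
import itertools
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Reservation:
    t_s: int                      # first reserved time slot
    t_e: int                      # end slot (exclusive)
    cl: str                       # class of service
    capacities: dict[str, float]  # resource type -> requested capacity


class ReservationManager:
    """Hypothetical in-memory manager exposing the six reservation operations."""

    def __init__(self, total_capacity: dict[str, float]) -> None:
        self._capacity = dict(total_capacity)
        self._granted: dict[str, Reservation] = {}
        self._ids = itertools.count(1)

    def _fits(self, req: Reservation, ignore: Optional[str] = None) -> bool:
        # Admission test: existing load plus the request must fit in every slot.
        if req.t_e <= req.t_s:
            return False
        for r, c in req.capacities.items():
            for t in range(req.t_s, req.t_e):
                used = sum(g.capacities.get(r, 0)
                           for h, g in self._granted.items()
                           if h != ignore and g.t_s <= t < g.t_e)
                if used + c > self._capacity.get(r, 0):
                    return False
        return True

    def reserve(self, req: Reservation) -> Optional[str]:
        """Return a handle, or None to signal 'reject reservation'."""
        if not self._fits(req):
            return None
        handle = f"res-{next(self._ids)}"
        self._granted[handle] = req
        return handle

    def is_available(self, req: Reservation) -> bool:
        return self._fits(req)

    def next_available(self, req: Reservation, horizon: int) -> Optional[Reservation]:
        """Counter-proposal: same request shifted to the earliest start <= horizon that fits."""
        duration = req.t_e - req.t_s
        for start in range(req.t_s, horizon + 1):
            proposal = replace(req, t_s=start, t_e=start + duration)
            if self._fits(proposal):
                return proposal
        return None

    def extend(self, handle: str, extra_slots: int) -> bool:
        current = self._granted.get(handle)
        if current is None or extra_slots <= 0:
            return False
        longer = replace(current, t_e=current.t_e + extra_slots)
        if not self._fits(longer, ignore=handle):  # do not count the old copy twice
            return False
        self._granted[handle] = longer
        return True

    def find(self, handle: str) -> Optional[Reservation]:
        return self._granted.get(handle)

    def cancel(self, handle: str) -> bool:
        return self._granted.pop(handle, None) is not None
```

For example, once 70 units of compute are granted for slots 0 to 10 on a 100-unit resource, a request for 50 units in slots 5 to 8 is rejected, and `next_available` proposes slots 10 to 13 instead of a bare 'no'.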


8 Conclusion

In this paper, we propose a QoS service model for service-oriented Grids, comprising a brokering service and a number of supporting modules, including a policy service, a reservation manager, an allocation manager and a QoS selection service. We describe the individual components of our framework and outline their patterns of interaction. We also discuss an OGSA-compliant prototype implementation of our G-QoSM architecture. The important features of our approach are as follows: the QoS manager is a Grid service that dynamically interacts with the reservation and policy service modules, which makes it possible for resource owners to update or modify their policies at run-time; and the reservation is abstracted as a generic service with co-reservation support, which makes it well suited to distributed computing environments such as Grids. This abstraction allows the reservation service to operate with any underlying resources without prior knowledge of their characteristics; resource characteristics are associated at run-time by querying the policy service. This novel feature supports scalability, which is highly desirable in Grid infrastructure.

Acknowledgment. This work was supported by the Mathematical, Information, and Computational Science Division subprogram of the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract W-31-109-Eng-38. DARPA, DOE, and NSF support Globus Project research and development. The Java CoG Kit Project is supported by DOE SciDAC and NSF Alliance.

References

1. G. von Laszewski and P. Wagstrom, “Gestalt of the Grid,” in Performance Evaluation and Characterization of Parallel and Distributed Computing Tools, ser. Series on Parallel and Distributed Computing. Wiley, 2003, (to be published). http://www.mcs.anl.gov/~gregor/papers/vonLaszewski--gestalt.pdf
2. I. Foster, C. Kesselman, and S. Tuecke, “The Anatomy of the Grid: Enabling Scalable Virtual Organizations,” International Journal of Supercomputing Applications, vol. 15, no. 3, 2002. http://www.globus.org/research/papers/anatomy.pdf
3. “TeraGrid,” Web Page, 2001. http://www.teragrid.org/
4. “The Global Grid Forum Web Page,” Web Page, http://www.gridforum.org
5. I. Foster, C. Kesselman, C. Lee, R. Lindell, K. Nahrstedt, and A. Roy, “A distributed resource management architecture that supports advance reservation and co-allocation,” in Proceedings of the International Workshop on Quality of Service, vol. 13, no. 5, 1999, pp. 27–36.
6. A. Hafid, G. Bochmann, and R. Dssouli, “A quality of service negotiation approach with future reservation (nafur): A detailed study,” Computer Networks and ISDN, vol. 30, no. 8, 1998.
7. K. Kim and K. Nahrstedt, “A resource broker model with integrated reservation scheme,” in IEEE International Conference on Multimedia and Expo (ICME2000), 2000.
8. M. Karsten, N. Berier, L. Wolf, and R. Steinmetz, “A policy-based service specification for resource reservation in advance,” in International Conference on Computer Communications (ICCC’99), 1999.


9. J. MacLaren, “Advance reservations: State of the Art,” GGF GRAAP-WG, See Web Site at: http://www.fz-juelich.de/zam/RD/coop/ggf/graap/graap-wg.html, Last visited: August 2003.
10. K. Czajkowski, I. Foster, C. Kesselman, V. Sander, and S. Tuecke, “SNAP: A Protocol for Negotiating Service Level Agreements and Coordinating Resource Management in Distributed Systems,” in Proceedings of the 8th Workshop on Job Scheduling Strategies for Parallel Processing, 2002.
11. A. Hafid and G. Bochmann, “Quality of service adaptation in distributed multimedia applications,” ACM Springer-Verlag Multimedia Systems Journal, vol. 6, no. 5, pp. 299–315, 1998.
12. A. Oguz et al., “The mobiware toolkit: Programmable support for adaptive mobile networking,” IEEE Personal Communications Magazine, Special Issue on Adapting to Network and Client Variability, vol. 5, no. 4, 1998.
13. G. Bochmann and A. Hafid, “Some principles for quality of service management,” Universite de Montreal, Tech. Rep., 1996.
14. R. Al-Ali, O. Rana, D. Walker, S. Jha, and S. Sohail, “G-QoSM: Grid Service Discovery using QoS Properties,” Computing and Informatics Journal, Special Issue on Grid Computing, vol. 21, no. 4, pp. 363–382, 2002.
15. R. Al-Ali, A. Hafid, O. Rana, and D. Walker, “QoS adaptation in service-oriented grids,” in Proceedings of the 1st International Workshop on Middleware for Grid Computing (MGC2003) at ACM/IFIP/USENIX Middleware 2003, Rio de Janeiro, Brazil, 2003.
16. I. Foster, C. Kesselman, et al., “The physiology of the grid: an open grid services architecture for distributed systems integration,” Argonne National Laboratory, Chicago, Tech. Rep., January 2002.
17. H. Chu and K. Nahrstedt, “CPU service classes for multimedia applications,” in IEEE Multimedia Systems ’99, 1999.
18. B. Teitelbaum, S. Hares, L. Dunn, R. Neilson, R. Narayan, and F. Reichmeyer, “Internet2 qbone: Building a testbed for differentiated services,” IEEE Networks, 1999.

A Service Management Scheme for Grid Systems

Wei Li, Zhiwei Xu, Li Cha, Haiyan Yu, Jie Qiu, and Yanzhe Zhang

Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
{liwei, zxu, char, yuhaiyan, zhangyanzhe}@ict.ac.cn, [email protected]

Abstract. In this paper, we propose a service management scheme named the Grid Service Space (GSS) model, which provides application developers with a high-level logical view of grid services and a set of primitives to control the full lifecycle of a grid service. The GSS model provides a novel approach to meeting the management requirements of large-scale service-oriented grids, including location-independent service naming, transparent service access, fault tolerance, and controllable service lifecycle.

1 Introduction

Physical resource independence, including location transparency and access transparency, is a general design principle for resource management in distributed systems. With the emergence of grid computing, distributed resources are abstracted as Grid Services [3], which aim to hide the heterogeneity of various resources and focus on standardizing interface descriptions, access semantics and information representations. Strictly speaking, the current definition of a grid service does not endow distributed resources with fully virtual properties, because of its location-dependent naming mechanism (e.g., the OGSA [3] framework uses a URL-based naming scheme that indicates a service instance's physical location) and the lack of transparent service access mechanisms. Under such circumstances, developers must spend extra effort on general-purpose resource management work, such as service discovery, scheduling and error recovery. Another problem is that a developer has to modify his or her applications whenever the URL-based name of a service changes. How to achieve complete physical resource independence remains a challenge for grid resource management. From traditional operating system design, we know that virtualization technologies, such as virtual memory [1] and the virtual file system [6], are common ways to obtain physical resource independence. Virtual memory fulfills the requirements of dynamic storage allocation, i.e., the desire for program modularity, machine independence, dynamic data structures, elimination of manual overlays, etc. The virtual file system allows multiple file system implementations, which may include local, remote or even non-UNIX file systems, to be accommodated within a single operating system kernel.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 541–548, 2004. © Springer-Verlag Berlin Heidelberg 2004


W. Li et al.

To obtain full physical resource independence, we adopt a service management scheme called the Grid Service Space (GSS) model, which draws on the experience of virtual memory and virtual file systems. With this model, a programmer can refer to a service by a location-transparent name without knowing the service's location, status, capabilities, etc. The runtime system can therefore obtain several benefits, such as load balancing (by choosing lightly loaded services), fault tolerance (by switching to a new service when a service fails) and locality of service access (by locating a nearer service). The paper is organized as follows. Section 2 analyzes the requirements for grid service management. Section 3 presents a detailed description of the GSS model. Section 4 introduces the implementation, and Section 5 concludes the paper.

2 Requirements for Service Management

In current grid research, the main effort has been put into standardizing physical resources as grid services. Just as a traditional operating system harnesses the use of hardware, a grid operating system (GOS) is a natural solution for managing the use of grid resources. More precisely, a GOS is a runtime system that can manage heterogeneous, distributed and dynamic resources efficiently. To realize such a GOS, it is necessary to analyze carefully the lifecycle of a grid application, which can be divided into a programming phase and a runtime phase. In the programming phase, a programmer needs to integrate various services to solve a problem. In most cases, programmers do not care about the location of services (i.e., where the task is executed). From the programmer's point of view, services should be physical resource independent, and a programmer should be able to refer to a service just by a unique name and descriptions of the desired attributes. In the runtime phase, a running program often encounters problems such as resource scheduling, error recovery and task migration. A GOS should provide transparent service access mechanisms, including service discovery, error recovery and lifecycle control, to reduce the burden of handling these issues. From this analysis, we can summarize the main requirements of a service management system as physical resource independent naming, transparent service discovery and scheduling, service lifecycle control, and fault tolerance. In addition, the GOS should also consider implementation issues such as resource topology, programming language support, performance and reliability.

3 The GSS Model

The GSS model is proposed to abstract and define the key concepts of a service management system. In this model, the basic elements are virtual services and physical services, which constitute the Virtual Service Space (VSS) and the Physical Service Space (PSS), respectively. The difference is that a virtual service is a logical representation of


a computational resource, while a physical service is a computational entity with a network access interface. Functional equivalence of multiple services is indicated by a coessential identifier, which means that these services have the same processing functions (though they may have different capabilities and attributes). A mapping between two coessential services is called a coessential mapping. A coessential mapping from a virtual service to a physical service is called a scheduling mapping, and a coessential mapping from a virtual service to a virtual service is called a translating mapping. For a given virtual service with a certain coessential identifier, all physical services that have the same coessential identifier form the discoverable set of this virtual service. All physical services in a discoverable set are candidates for this virtual service to bind to. In addition to these basic definitions, we introduce the Virtual Service Block (VSB), a subset of a VSS that groups related services together. We also provide a set of primitives for service lifecycle control. Several service states are defined to indicate the different phases of a service lifecycle, and a service switches between states via the lifecycle control primitives.

3.1 Formal Definitions

Definition 1. A Service Space is a set denoted by $S = \{s_1, s_2, \dots, s_n\}$, where $s_i$ is the name of a service.

Definition 2. The services in a service space S can be divided into two types, denoted by the set L = {vs, ps}. vs represents a virtual service, whose name is location independent, and ps represents a physical service, which has a locatable address. We denote the type of a service $s$ as $type(s)$, where $type(s) \in L$. A service space $V$ is called a Virtual Service Space of S if $V \subseteq S$ and $type(s) = vs$ for each service $s \in V$. A service space $P$ is called a Physical Service Space of S if $P \subseteq S$ and $type(s) = ps$ for each service $s \in P$.

Definition 3. If $s_i, s_j \in S$ have the same function, we say that $s_i$ and $s_j$ are coessential services. The function is expressed by a coessential identifier e. All coessential identifiers constitute a set $E$, which is called the coessential identifier set of service space S. Each $s \in S$ has one and only one coessential identifier; that is, there is a mapping $f: S \to E$ with $f(s) = e$. If two services $s_i$ and $s_j$ are coessential services, then $f(s_i) = f(s_j)$.

Definition 4. The set $S_e = \{\, s \mid s \in S,\ f(s) = e \,\}$, where $e \in E$, is called the coessential service set of $e$.

Definition 5. For two service spaces $S_1, S_2 \subseteq S$, a mapping $\varphi: S_1 \to S_2$ such that $f(\varphi(s)) = f(s)$ for every $s \in S_1$, i.e., a mapping that sends every service into the coessential service set of its own coessential identifier, is called a coessential mapping from $S_1$ to $S_2$. In particular, the coessential mapping $\varphi$ from a VSS V to a PSS P is called a scheduling mapping; for each virtual service $s \in V$, the set $P_s = \{\, p \in P \mid f(p) = f(s) \,\}$ is called the discoverable set of s. In addition, the coessential mapping $\varphi$ from one VSS V to another VSS V' is called a translating mapping; for each virtual service $s \in V$, the set $V'_s = \{\, v \in V' \mid f(v) = f(s) \,\}$ is called the translatable set of s.
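The definitions can be made concrete with a small Python model in which each service carries its type and its coessential identifier. The `Service` record and the example names are hypothetical; the functions compute the coessential service set $S_e$, the discoverable and translatable sets, and check whether a given mapping preserves coessential identifiers, i.e., whether it is a coessential mapping.

```python
from dataclasses import dataclass

VS, PS = "vs", "ps"  # the type set L = {vs, ps}


@dataclass(frozen=True)
class Service:
    name: str         # s
    kind: str         # type(s), either VS or PS
    coessential: str  # f(s) = e


def virtual_space(space: set[Service]) -> set[Service]:
    return {s for s in space if s.kind == VS}


def physical_space(space: set[Service]) -> set[Service]:
    return {s for s in space if s.kind == PS}


def coessential_set(space: set[Service], e: str) -> set[Service]:
    """S_e = {s in S | f(s) = e}."""
    return {s for s in space if s.coessential == e}


def discoverable_set(s: Service, P: set[Service]) -> set[Service]:
    """Physical services in P that are coessential with virtual service s."""
    return {p for p in P if p.coessential == s.coessential}


def translatable_set(s: Service, V2: set[Service]) -> set[Service]:
    """Virtual services in another VSS that are coessential with s."""
    return {v for v in V2 if v.coessential == s.coessential}


def is_coessential_mapping(phi: dict[Service, Service]) -> bool:
    """Check that phi preserves coessential identifiers: f(phi(s)) = f(s)."""
    return all(target.coessential == source.coessential for source, target in phi.items())
```

Binding a virtual service to any member of its discoverable set yields a valid scheduling mapping by construction, which is why the discoverable set is a safe search space for the scheduler.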

3.2 Semantics of GSS Management

Service naming mechanisms. The definitions of virtual service and physical service do not give the semantics of a service name explicitly. For a virtual service, the only requirement on its name is that it distinguishes the service from the other virtual services in the same VSS. Location independence means that the name contains no physical resource information and must be translated to a locatable resource before a program can access the virtual service. In our model, a virtual service can use a code-based or string-based name, which can be user-friendly or even semantics-based. With the location-independent naming mechanism of a VSS, programmers can develop applications at a virtual resource layer. For a physical service, the GSS model does not restrict the naming mechanism, as long as the service name indicates a locatable address; for example, an IP address and a TCP port can be used to identify a service. The URL-based naming mechanism in the OGSA framework guarantees the global uniqueness of grid services and gives each resource a locatable address. Virtual Service Block. Application developers normally need to organize a group of related services together and to refer to this group by a name. As in virtual memory design, a service management scheme should fulfill objectives such as program modularity, module protection and code sharing. The Virtual Service Block (VSB) achieves these objectives. In the GSS model, a VSS is composed of a set of named VSBs, and each block is an array of service names. A service name in a blocked VSS consists of two components (b, s), where b is the name of a VSB and s is a unique name within b. The first objective is achieved by allocating each module to one VSB: other programs can then easily share the module just by changing the block name, while the names within the block remain unchanged.
The second objective is achieved by adding extra information and a checking mechanism: each VSB can record information such as the block owner and access rights to implement space protection. For the third objective, by mapping one VSB into multiple VSSs under different block names, multiple programs can share the code of modules in other programs. Virtual-Physical Service Space Mapping. Unlike memory mapping, virtual-physical service space mapping in the GSS model is complex. Although a VSS is similar to a virtual memory space, a PSS differs greatly from a memory space because of its autonomous control and huge


size. These two features make efficient service space mapping difficult. The first makes it hard to deploy a physical service to a specific address (except for service owners). The second may further degrade the performance of service location because of the huge search space of the PSS. In the GSS model, we use the coessential mapping mechanism formally described in Definition 5, parallel pre-mapping technology and discoverable sets to address these problems. When mapping a virtual service to a physical service, two important issues must be considered: correctness and performance. Correctness can be guaranteed by following the definitions of the GSS model. Performance can be improved by better organization of physical services, efficient service location, and scheduling policies. Several research efforts [2] [4] [5] have concentrated on these issues. To improve the performance of service space mapping, we exploit parallel pre-mapping together with VSBs to hide the service location time. The idea is to locate and map physical services for a group of given virtual services (such as all virtual services in a VSB) in parallel, before a running program actually refers to these virtual services. In addition, for each coessential identifier, we use the discoverable set defined in Definition 5 to build a small search space for service mapping, which further reduces the search time.

Fig. 1. Using Parallel Pre-mapping technology together with VSB to hide the searching time.

Fig. 2. Using discoverable set to reduce search space.

Figure 1 illustrates the parallel pre-mapping technology used in service space mapping. When loading a program, the GOS first maps several virtual spaces in parallel. When the program starts to run, it can directly access the mapped physical


services. Meanwhile, the GOS continues to map the virtual services of subsequent VSBs in parallel. Figure 2 illustrates the use of a discoverable set to reduce the search space for a virtual service. The GOS builds a discoverable set for each coessential identifier before loading programs. When a program is loaded, search operations can then be performed within a relatively small discoverable set. The parallel pre-mapping and discoverable set technologies can be used together to improve the overall performance of service space mapping.
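Both techniques can be sketched in Python: a discoverable-set index built once per coessential identifier, and parallel mapping of all virtual services in a VSB, with later VSBs mapped in the background while the program runs. The tuple layout of physical-service records, the numeric load values and the least-loaded selection policy are assumptions for illustration; the GSS model itself does not prescribe a scheduling policy.

```python
from collections import defaultdict
from concurrent.futures import Future, ThreadPoolExecutor


def build_discoverable_index(physical_services):
    """Group (address, coessential_id, load) records by coessential identifier.

    Built once, before programs are loaded, so each lookup searches only
    the discoverable set of one identifier instead of the whole PSS.
    """
    index = defaultdict(list)
    for address, e, load in physical_services:
        index[e].append((address, load))
    return dict(index)


def map_one(name, e, index):
    """Scheduling mapping for one virtual service (hypothetical least-loaded policy)."""
    candidates = index.get(e)
    if not candidates:
        raise LookupError(f"no physical service for {name!r} (identifier {e!r})")
    address, _ = min(candidates, key=lambda c: c[1])
    return name, address


def premap_block(block, index, workers=4):
    """Map every virtual service of one VSB in parallel; block maps name -> identifier."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(map_one, name, e, index) for name, e in block.items()]
        return dict(f.result() for f in futures)


def premap_program(blocks, index, pool):
    """Map the first VSB before the program starts; later VSBs in the background."""
    first = premap_block(blocks[0], index)
    rest: list[Future] = [pool.submit(premap_block, b, index) for b in blocks[1:]]
    return first, rest
```

The program can start as soon as `first` is available; it only waits on a future in `rest` if it reaches a later VSB before that block has been mapped, which is how the location time is hidden.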

3.3 Service Lifecycle Control

Compared to physical memory access, the lifecycle of service access is more complex. When a user accesses a service, there may be a single send/receive operation or multiple send/receive operations, whereas in virtual memory systems accessing a memory cell takes a fixed time and the access mode is determined. For services to behave correctly and deterministically, lifecycle control is needed. In our GSS model, the different capabilities and properties of virtual services and physical services imply that they have different lifecycle patterns, so different control primitives are needed to manage the state transitions of virtual services and physical services respectively. Here we mainly introduce the lifecycle control of virtual services. When programmers refer to a virtual service, they need not only a location-independent name but also control over the full process of service access. In this section, we provide a set of primitives to control the activities of a virtual service. To describe the lifecycle control of a virtual service properly, we use a more concrete entity called mService to represent a virtual service. An mService can be defined as a tuple $(n, e, i, \varphi, V, p, st)$, where n is a unique service name in a VSS, e is the coessential identifier, i is the session identifier, $\varphi$ is the coessential mapping, V is a VSS, p is a physical service name and st is a service state indicator, which is an element of the set ST = {Created, Binded, Running, Waiting, Terminated}. The lifecycle of a virtual service includes several related operations, such as virtual service creation, service discovery and scheduling, and session control. The lifecycle control primitives for virtual services are summarized as follows:
create (n), performed when we create a new virtual service and start a new session with it. After this operation, the state of the mService is st = Created and a session identifier i is returned.
open (n), performed when we reopen an existing virtual service that is out of session. After this operation, the state of the virtual service remains unchanged and a session identifier i is returned.
delete (n), performed when we remove a virtual service from a VSS. After this operation, the virtual service with name n is deleted from this VSS.
bind, performed when we map a physical service p to a virtual service n. After this operation, n is mapped to p by the coessential mapping, i.e., $\varphi(n) = p$, and st = Binded. In addition, the virtual service n is added to VSS V, the coessential identifier e is added to


invoke (i), performed when we call a method of a virtual service. After this operation, st = Running.
sleep (i), performed by the program or the GOS kernel. After this operation, st = Waiting.
interrupt (n, i), performed when an external event occurs. After this operation, st = Running.
close (i), performed when we cut off the current session with a virtual service. After this operation, the virtual service is out of session, and users cannot interact with it until the open primitive is used to create a new session.
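The primitives above can be read as a small state machine. The Python sketch below applies the listed state changes to a hypothetical in-memory VSS. Its assumptions, none of which are specified in the text, are: bind takes the coessential identifier and the physical service name as arguments, invoke is rejected for an unbound service, and delete sets st = Terminated before removing the service.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass
class MService:
    """Hypothetical mService record; field names follow the tuple in the text."""
    n: str                   # unique service name in the VSS
    e: Optional[str] = None  # coessential identifier (set by bind in this sketch)
    p: Optional[str] = None  # bound physical service, i.e. phi(n) = p
    st: str = "Created"      # one of Created, Binded, Running, Waiting, Terminated


class VirtualServiceSpace:
    """Minimal VSS that applies the state change listed for each primitive."""

    def __init__(self) -> None:
        self.services: dict[str, MService] = {}
        self.sessions: dict[int, str] = {}  # session identifier i -> service name n
        self._next_id = itertools.count(1)

    def _new_session(self, n: str) -> int:
        i = next(self._next_id)
        self.sessions[i] = n
        return i

    def _by_session(self, i: int) -> MService:
        # Raises KeyError if the session was closed or never opened.
        return self.services[self.sessions[i]]

    def create(self, n: str) -> int:
        self.services[n] = MService(n)
        return self._new_session(n)

    def open(self, n: str) -> int:
        if n not in self.services:
            raise KeyError(n)
        return self._new_session(n)  # the state is left unchanged

    def delete(self, n: str) -> None:
        svc = self.services.pop(n)
        svc.st = "Terminated"  # assumption: deletion ends the lifecycle
        self.sessions = {i: m for i, m in self.sessions.items() if m != n}

    def bind(self, n: str, e: str, p: str) -> None:
        svc = self.services[n]
        svc.e, svc.p, svc.st = e, p, "Binded"

    def invoke(self, i: int) -> None:
        svc = self._by_session(i)
        if svc.p is None:
            raise RuntimeError(f"{svc.n} is not bound to a physical service")
        svc.st = "Running"

    def sleep(self, i: int) -> None:
        self._by_session(i).st = "Waiting"

    def interrupt(self, n: str, i: int) -> None:
        svc = self._by_session(i)
        if svc.n != n:
            raise ValueError("session does not belong to this service")
        svc.st = "Running"

    def close(self, i: int) -> None:
        del self.sessions[i]  # the service is now out of this session
```

Keeping sessions separate from services mirrors the distinction between close (the service survives, out of session) and delete (the service leaves the VSS).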

4 Implementations

The GSS model is a key feature of the Vega GOS in the Vega Grid project [8] [9], which aims to study the fundamental properties of grid computing and to develop key techniques that are essential for building grid systems and applications. The Vega GOS is also used to build a testbed called China National Grid (CNGrid), which is sponsored by the 863 program and aims to integrate high-performance computers across China to provide a virtual supercomputing environment.

Fig. 3. The layered architecture of Vega Grid.

The architecture of Vega Grid conforms to the OGSA framework; Figure 3 shows its layered architecture. At the resource layer, resources are encapsulated as grid services or web services. The GOS layer aggregates these services and provides a virtual view for developers, who can use the APIs, utilities and development environments provided by Vega GOS to build VSS-based applications. At this layer, the most important work is deploying and publishing a physical service to the upper layers. In our implementation, each physical service should have exactly one coessential identifier. After generating the coessential identifier for a physical service, we register the service with a resource router, together with the coessential identifier and other necessary information. According to the algorithm in [7], this physical service is then published to all resource routers, so that every GOS knows of its existence. At the GOS layer, the resource router plays an important role in locating resources. The current implementation of Vega GOS is developed as grid services as specified in [3]. In addition, Vega GOS implements the virtual service lifecycle management defined in Section 3.3. As a fully functional integrated system, the Vega GOS also


considers implementation issues such as security, user management and communication, which are not covered in this paper. At the application layer, programmers can use the GOS APIs to build custom applications. We also provide a GUI tool called GOS Client, based on the GOS APIs, to help users utilize the services in Vega Grid.
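The publication step relies on the routing-transferring algorithm of [7]. As a much simpler stand-in, the Python sketch below uses breadth-first flooding to show the end result described above: a registration keyed by coessential identifier reaches every resource router, after which any router can answer lookups locally. The router names, the table layout and the neighbor graph are hypothetical.

```python
from collections import deque


def publish(tables, neighbors, origin, e, address):
    """Flood a registration (e, address) from router `origin` to all reachable routers.

    tables:    router -> {coessential identifier -> [physical service addresses]}
    neighbors: router -> list of adjacent routers
    Returns the number of routers that received the registration.
    """
    seen = {origin}
    queue = deque([origin])
    while queue:
        router = queue.popleft()
        tables[router].setdefault(e, []).append(address)
        for nb in neighbors.get(router, []):
            if nb not in seen:  # each router is updated exactly once
                seen.add(nb)
                queue.append(nb)
    return len(seen)


def lookup(tables, router, e):
    """Locate physical services for identifier e using one router's local table."""
    return tables[router].get(e, [])
```

Flooding costs one table update per router for every registration; a routing-based scheme such as the one in [7] is intended to reduce this kind of overhead, so the sketch illustrates only the outcome, not the algorithm.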

5 Conclusions and Future Work

We have discussed the issues of grid service management. To overcome the obstacles in grid application development and system management, the GSS model is proposed to provide developers with location-independent naming, transparent service access and service lifecycle control. As a fundamental component of our service management scheme, the GSS model also supports other research work, such as a grid-based programming model. We are currently implementing the Vega GOS and the GSS model on the CNGrid testbed, and we hope that the practical operation of Vega GOS and its applications will verify the basic concepts and technologies of the GSS model.

References
[1] P. J. Denning, “Virtual Memory”, ACM Computing Surveys, vol. 2:3, pp. 153-189, 1970.
[2] S. Fitzgerald et al., “A Directory Service for Configuring High-Performance Distributed Computations”, Proc. 6th IEEE Symposium on High Performance Distributed Computing, pp. 365-375, 1997.
[3] I. Foster et al., “Grid Services for Distributed Systems Integration”, Computer, pp. 37-46, 2002.
[4] A. Grimshaw et al., “Wide-Area Computing: Resource Sharing on a Large Scale”, Computer, pp. 29-37, 1999.
[5] A. Iamnitchi et al., “On Fully Decentralized Resource Discovery in Grid Environments”, International Workshop on Grid Computing, 2001.
[6] S. R. Kleiman, “Vnodes: An Architecture for Multiple File System Types in Sun UNIX”, In USENIX Association Summer Conference Proceedings, pp. 238-247, 1986.
[7] W. Li et al., “Grid Resource Discovery Based on a Routing-Transferring Model”, 3rd International Workshop on Grid Computing (Grid 2002), LNCS 2536, pp. 145-156, 2002.
[8] Z. Xu et al., “Mathematics Education over Internet Based on Vega Grid Technology”, Journal of Distance Education Technologies, vol. 1:3, pp. 1-13, 2003.
[9] Z. Xu et al., “A Model of Grid Address Space with Applications”, Journal of Computer Research and Development, 2003.

A QoS Model for Grid Computing Based on DiffServ Protocol*

Wandan Zeng¹, Guiran Chang¹, Xingwei Wang¹, Shoubin Wang², Guangjie Han¹, and Xubo Zhou³

¹ School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110004, China
² Beijing Institute of Remote Sensing, Beijing 100085, P.R. China
³ China National Computer Software & Technology Service Corporation, Beijing 100083, P.R. China

Abstract. Because a Grid comprises many kinds of dynamic and heterogeneous resources and provides users with transparent services, achieving quality of service (QoS) in Grid Computing faces additional challenges. This paper proposes a QoS model for Grid Computing based on the DiffServ protocol and introduces its implementation method. A mathematical model for scheduling on Grid nodes is established, and the model is evaluated in a simulated Grid environment. The simulation results show that the proposed DiffServ-based Grid QoS model can improve the performance of Grid services for a variety of Grid applications. Making full use of the model and optimizing it further are therefore significant both for the use of Grid resources and for obtaining high-quality Grid services.

1 Introduction
Grid Computing integrates different kinds of resources, such as supercomputers, large-scale storage systems, personal computers, and other equipment, into a unified framework. When the resources needed by a computing project exceed local processing capability, the Grid allows the project to use the CPU and storage resources of remote machines [1]. The traditional Internet achieved the connection of computer hardware, the Web achieved the connection of web pages [2], and the Grid integrates Internet resources to enable the sharing of all kinds of resources. Grids use distributed, heterogeneous, and dynamic resources to provide users with transparent, unified, and standard services [3]. These characteristics make quality of service in Grids challenging. Many Grid applications have high QoS requirements, such as tele-immersion, distributed

* This work was supported by the National Natural Science Foundation of China under Grant No. 60003006 (jointly supported by Bell Labs) and No. 70101006, and by the National High-Tech Research and Development Plan of China under Grant No. 2001AA121064.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 549–556, 2004. © Springer-Verlag Berlin Heidelberg 2004


W. Zeng et al.

real-time computing, multimedia applications, and so on. OGSA (Open Grid Services Architecture) calls for seamless quality of service across integrated Grid resources. The Chinese Academy of Sciences has proposed a set of consumer-satisfaction and QoS evaluation standards, such as service level agreements [4]. How to use distributed and dynamic Grid resources efficiently to provide high quality of service for Grid applications has become an indispensable part of Grid Computing research.

2 QoS for Grid Computing
2.1 The Basic QoS Framework for Grid Computing
IP QoS provides a set of mature QoS protocols. They can provide higher-benefit services, increase effective bandwidth, and improve the quality of end-to-end services. Current leading Grid Computing operating systems such as Globus are based on TCP/IP, so IP QoS protocols are important technologies for Grid Computing QoS. The fundamental framework for Grid Computing IP QoS is shown in Fig. 1.

Fig. 1. Fundamental Framework for Grid Computing IP QoS

2.2 The Grid Computing QoS Strategy
Current Grid Computing QoS work pays more attention to resource management than to task scheduling [5]. RSVP is the most commonly used protocol in Grid Computing QoS [6]. RSVP is an out-of-band signaling protocol: it transmits soft-state signaling through channels separate from the data streams [7]. RSVP achieves high performance but makes transmission more complicated, so it is suitable for small access networks [8]. However, a Grid provides a very large number of services, and the properties of Grid resources are complex and dynamic. Out-of-band signaling would therefore impose a heavy additional load on network transmission and would be very hard to control.


The Grid Computing QoS model in this paper uses the DiffServ protocol. DiffServ uses in-band signaling: the signaling is carried together with the data in the data stream. It supports eight priority levels and uses DSCP (Differentiated Services Code Point) markings. The core principle of DiffServ is “classify at the edge, forward inside”. It adds no signaling load to the network and scales well, which makes it better suited to the characteristics of Grids.

3 The Implementation Method of Grid Computing QoS
The techniques used to implement Grid Computing QoS include service classification, rate limiting, queue scheduling, congestion control, traffic engineering, and so on [9]. This paper uses the DiffServ protocol: tasks are classified by the Grid operating system, and each Grid node schedules tasks through queues and is responsible for their congestion control.

3.1 The Implementation of Grid Computing DiffServ
Grid Computing applications send their task requirements, including the required answering (response) time and other requirement information, to the Grid operating system. The Grid operating system then uses the DiffServ protocol to classify and mark the tasks, aggregating them into groups with different priorities.

Fig. 2. DiffServ QoS Model for Grid Computing

In this model, tasks are classified by their answering time. Some tasks, such as multimedia display and video conferencing, demand quick answers with short delays. The priorities of these kinds of


services are high. Others, such as ordinary transaction processing and e-mail, are not sensitive to answering time, and their priorities are comparatively low. To remain compatible with the IP Precedence subfield of the IPv4 ToS byte, there are at most eight service priorities. We therefore define eight types of services differentiated by their required answering time, each priority representing one degree of task urgency. The DS field is defined as follows:

Fig. 3. The Assignment of DS Field in Grid Computing DiffServ
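A minimal sketch of the marking a Grid operating system might apply, and the parsing a Grid node might perform, under the DS-field assignment of Fig. 3 (priority in bits 0-2, bits 3-5 set to 0, bits 6-7 reserved). Bit 0 is taken as the most significant bit, as in RFC 2474, so the priority occupies the top three bits of the byte, the same position as IP Precedence. The answering-time boundaries in `THRESHOLDS` are hypothetical; the paper does not give numeric values.

```python
from bisect import bisect_left

# Hypothetical answering-time boundaries in seconds (not from the paper).
THRESHOLDS = [0.01, 0.05, 0.1, 0.5, 1.0, 5.0, 30.0]


def classify(answer_time_s):
    """Map a required answering time to a priority, 0 (lowest) .. 7 (highest).

    Tighter answering-time requirements receive higher priority.
    """
    return 7 - bisect_left(THRESHOLDS, answer_time_s)


def mark_ds(priority):
    """Build the DS byte: priority in bits 0-2 (most significant),
    bits 3-5 set to 0, reserved bits 6-7 left at 0."""
    if not 0 <= priority <= 7:
        raise ValueError("priority must be in 0..7")
    return priority << 5


def parse_ds(ds_byte):
    """Recover the priority a Grid node uses to choose the task's queue."""
    return (ds_byte >> 5) & 0b111
```

For example, `classify(0.005)` gives 7, `classify(100)` gives 0, and `mark_ds(7)` gives 224 (binary 11100000). Keeping the priority in the top three bits means routers that only understand IP Precedence still see a consistent ordering.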

Bits 0-2 are set to 000, 001, 010, 011, 100, 101, 110, or 111, representing priorities from lowest to highest. Bits 3-5 are all set to 0, and bits 6 and 7 are reserved. The Grid operating system sends the task data packets carrying the DS mark to Grid nodes through the network. When a Grid node receives a packet, it parses the DS field and assigns the task to the queue for its priority. Within the same queue, tasks are processed in turn, one time slice each.
3.2 The Implementation Model of Grid Computing DiffServ on Grid Node
The arrivals of each class of tasks at a Grid node follow a Poisson distribution [10], is the average arrival rate of tasks with priority i. The service time of every class of task follows a negative exponential distribution and the average service time is represent the average waiting and lingering time of the task of priority i. Now:

If represents the lingering leaving time of tasks with priority 1 and 2 when there are both priority 1 and 2 tasks in the system. Then:


Fig. 4. Task Scheduling on Grid Nodes

can be worked out from the M/M/1 queueing model with an arrival rate of

Similarly, we can get

We can conclude that,
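The argument above, which obtains per-class times from the aggregate M/M/1 behaviour of classes 1 and 2 and then extends it, can be checked numerically. This sketch assumes that priority 1 is the most urgent class, that service is preemptive-resume, and that all classes share the exponential service rate μ; T_k denotes the mean time in system (waiting plus service) of class k. Under these assumptions classes 1..k together behave like an M/M/1 queue with arrival rate Λ_k = λ_1 + ... + λ_k, so λ_1T_1 + ... + λ_kT_k = Λ_k/(μ − Λ_k), which gives T_1 = 1/(μ − λ_1) and T_k = [Λ_k/(μ − Λ_k) − Λ_{k−1}/(μ − Λ_{k−1})]/λ_k. The function name and its interface are illustrative, and the notation may differ from the authors' own formulas.

```python
def priority_sojourn_times(arrival_rates, mu):
    """Mean time in system per class for a preemptive-resume priority M/M/1 queue.

    arrival_rates[0] is the most urgent class (priority 1); every class is
    served at the same exponential rate mu.
    """
    if any(lam <= 0 for lam in arrival_rates):
        raise ValueError("arrival rates must be positive")
    if sum(arrival_rates) >= mu:
        raise ValueError("unstable queue: total arrival rate >= mu")
    times = []
    prev_total = 0.0     # Lambda_{k-1}
    prev_weighted = 0.0  # Lambda_{k-1} / (mu - Lambda_{k-1})
    for lam in arrival_rates:
        total = prev_total + lam
        # Classes 1..k form an M/M/1 queue: sum_j lambda_j T_j = total / (mu - total).
        weighted = total / (mu - total)
        times.append((weighted - prev_weighted) / lam)
        prev_total, prev_weighted = total, weighted
    return times
```

With μ = 10 and arrival rates [2, 3], the function returns [0.125, 0.25]: the second class spends twice as long in the node, and the arrival-weighted average 0.2 equals 1/(μ − λ_1 − λ_2), as the aggregate M/M/1 argument requires.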


4 The Results of Simulation
The simulation was carried out with NS2 on Globus 2.2, using CBQ/WRR to simulate the eight queues of different priorities. The queue length was set to 21 packets and the simulation time to fifty seconds. For each of the 8 priorities, from low to high, 21 packets are considered. Because of the page limit, we choose four columns as representatives, with priorities 1, 4, 6, and 7, respectively. (Times are in seconds; “d” in the table denotes a dropped packet.)


Fig. 5. The Comparison of Grid Packets Processing Time

WRR uses a round-robin polling strategy in which every queue receives a time slice of the same length. When congestion occurs, tasks with lower priority are dropped. The table shows that the packet loss ratio increases gradually. The processing time of the same packet differs across queues and becomes longer as the priority becomes lower. The model can therefore provide different service quality to different Grid applications, make full use of Grid resources, and improve the performance of Grid Computing services for a variety of Grid applications.
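The queueing behaviour described above can be sketched with a small weighted round-robin scheduler: one queue per priority (index 0 lowest), each holding at most 21 packets as in the simulation, with tail drop when a queue is full. The per-queue weights are an assumption for illustration. The text says every queue receives the same time slice, which corresponds to all weights equal to 1; unequal weights show how lower-priority queues overflow first under congestion. This is not the NS2 CBQ/WRR configuration used in the simulation.

```python
from collections import deque


class WRRScheduler:
    """Weighted round robin over priority queues with per-queue tail drop.

    Index 0 is the lowest priority; weights[i] is the number of packets
    queue i may send per round (its "time slice").
    """

    def __init__(self, weights, capacity=21):
        self.weights = list(weights)
        self.capacity = capacity
        self.queues = [deque() for _ in self.weights]
        self.dropped = [0] * len(self.weights)

    def enqueue(self, priority, packet):
        """Admit a packet, or drop it if its queue is already full."""
        queue = self.queues[priority]
        if len(queue) >= self.capacity:
            self.dropped[priority] += 1
            return False
        queue.append(packet)
        return True

    def next_round(self):
        """Serve one round, visiting queues from highest to lowest priority."""
        sent = []
        for priority in reversed(range(len(self.queues))):
            queue = self.queues[priority]
            for _ in range(self.weights[priority]):
                if not queue:
                    break
                sent.append(queue.popleft())
        return sent
```

For equal slices, use `WRRScheduler([1] * 8)`. With two queues weighted [1, 2], capacity 3, and two arrivals per queue per round, the low-priority queue grows by one packet each round and starts dropping in the third round, while the high-priority queue never drops.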

5 Conclusion
The characteristics of Grid services are the most important source of complexity in Grid Computing quality of service. The simulation results show that the Grid Computing QoS model discussed in this article is effective and can improve performance for a variety of Grid Computing applications. However, the model has its own limitations. Because of the lack of end-to-end signaling, DiffServ QoS cannot guarantee end-to-end Grid QoS, and it offers only 8 grades. Further research on Grid QoS, such as further optimizing the model and taking economic concepts into account, is the main goal of our future work.

References
1. Xiao, N., Ren, H., Gong, S.S.: Design and Implementation of Resource Directory in National High Performance Computing Environment. Journal of Computer Research and Development, 2002, 902 (8)
2. Foster, I., Kesselman, C.: Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications and High Performance Computing, 1997, 11(2): 115-128
3. Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure. San Francisco: Morgan Kaufmann Publishers Inc., 1998, 279-309
4. Xu, Z.W., Li, W.: Architectural Study of the Vega Grid. Journal of Computer Research and Development (in Chinese), 2002, 39(8): 923-929
5. He, X.S., Sun, X.H., von Laszewski, G.: QoS Guided Min-Min Heuristic for Grid Task Scheduling. Workshop on Grid and Cooperative Computing. Journal of Computer Science and Technology, 2003, 18(4): 442-451
6. Foster, I., Roy, A., Winkler, L.: A Quality of Service Architecture that Combines Resource Reservation and Application Adaptation. In: Proceedings of the 8th International Workshop on Quality of Service (IWQoS 2000), 2000, 181-188
7. Xiao, X.P.: Internet QoS: A Big Picture. IEEE Network, March/April 1999: 8-18
8. Armitage, G.: Quality of Service in IP Networks: Foundations for a Multi-Service Internet. Beijing: China Machine Press, 2001
9. Subrat, K.: Issues and Architectures for Better Quality of Service (QoS). Proceedings of the 6th National Conference on Communications (NCC 2000), New Delhi, India: 181-187
10. Lu, Z.Y., Wang, S.M.: The Information Content Theory of Communication. Beijing: Publishing House of Electronics Industry, 1997, 42-45

Design and Implementation of a Single Sign-On Library Supporting SAML (Security Assertion Markup Language) for Grid and Web Services Security
Dongkyoo Shin, Jongil Jeong, and Dongil Shin
Department of Computer Science and Engineering, Sejong University
98 Kunja-Dong, Kwangjin-Ku, Seoul 143-747, Korea
{shindk, jijeong, dshin}@gce.sejong.ac.kr

Abstract. In recent years, the focus of Grid development has been shifting from resources to services. A Grid Service is defined as a Web Service that provides a set of well-defined interfaces and follows specific conventions. SAML is an XML-based single sign-on (SSO) standard for Web Services that enables the exchange of authentication, authorization, and profile information between different entities, providing interoperability between different security services in distributed environments. In this paper, we designed and implemented Java-based SAML APIs to build an SSO library.

1 Introduction
Grid computing denotes a distributed computing infrastructure for advanced science and engineering. It supports coordinated resource sharing and problem solving across dynamic and geographically dispersed organizations. Moreover, the sharing concerns not only file exchange but also direct access to computers, software, data, and other resources [1]. In recent years, the focus of Grid development has been shifting from resources to services. A Grid Service is defined as a Web Service that provides a set of well-defined interfaces and follows specific conventions [2]. The interfaces address discovery, dynamic service creation, lifetime management, notification, and manageability. Web Services [3], proposed by the World Wide Web Consortium (W3C), are designed to support interoperable machine-to-machine interaction over a network. A Web service has an interface described in a machine-processable format (specifically WSDL, the Web Services Description Language). Other systems interact with the Web service in the manner prescribed by its description, using SOAP (Simple Object Access Protocol) messages, typically conveyed over HTTP with an XML serialization. Single sign-on (SSO) is a security feature that allows a user to log into many different services offered by distributed systems such as Grid [4] and Web Services while authenticating only once, or at least always in the same way [5, 6]. Various SSO solutions have been proposed that depend on public key

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 557–564, 2004. © Springer-Verlag Berlin Heidelberg 2004

D. Shin, J. Jeong, and D. Shin

infrastructure (PKI), Kerberos, or password stores, which require additional infrastructure on the client side and new administrative steps [7]. Recently, a new standard for exchanging security-related information in XML, the Security Assertion Markup Language (SAML) [8,9], has been recommended by the Organization for the Advancement of Structured Information Standards (OASIS). SAML enables the exchange of authentication, authorization, and profile information between different entities to provide interoperability between different security services in distributed environments such as Grid and Web Services. The security information described by SAML is expressed as XML assertions, which can be authentication assertions, attribute assertions, or authorization decision assertions. Assertions are issued by SAML authorities, which can be security service providers or business organizations. Assertions provide a means of avoiding redundant authentication and access control checks, thereby providing single sign-on functionality across multiple target environments. SAML also defines the protocol by which the service consumer issues a SAML request and the SAML authority returns a SAML response containing assertions. Although SAML is an authentication standard for Web Services, it has also been proposed as the message format for requesting and expressing authorization assertions and decisions from an OGSA (Open Grid Services Architecture) authorization service [10]. In this paper, we designed and implemented a Java-based SSO library made up of SAML Application Programming Interfaces (APIs).

2 Background
The basic idea of single sign-on (SSO) is to shift the complexity of the security architecture to the SSO service and release other parts of the system from certain security obligations. In the SSO architecture, all security algorithms are found in the single SSO service, which acts as the one and only authentication point for a defined domain. The SSO service acts as a wrapper around the existing security infrastructure that exports various security features such as authentication and authorization [5,6]. For SSO implementation, classical three-party authentication protocols that exploit key exchanges, such as Kerberos and Needham-Schroeder, are used. Since these protocols start with a key-exchange or key-confirmation phase, the client application uses the new or confirmed key for encryption and authentication [11]. A different approach to SSO implementation uses token-based protocols such as cookies or SAML [11]. Unlike key-exchange protocols, these send an authentication token over an independently established secure channel. In other words, a secure channel is established without an authenticated client key, usually with the Secure Sockets Layer (SSL) [12] in browsers, and an authentication token is then sent over this channel without conveying a key. The main advantage of token-based protocols is that most service providers already have SSL server certificates, and a suitable cryptographic implementation is available on all client machines via their browsers. In addition, several unrelated authentication tokens can be used to provide information about the user in the same secure channel with the service provider [11].
SAML is a standard suitable for facilitating site access among trusted security domains after a single authentication. Artifacts, which play the role of tokens, are created within a security domain and sent to other security domains for user authentication. Since the artifacts sent to the other domains are returned to the original security domain and removed after user authentication, this approach addresses the problems of revealed session keys and stolen tokens in the browser. In addition, the artifact's destination is fully controlled, since the artifact identification is attached to the Uniform Resource Locator (URL) that redirects the message to the destination [8].

2.1 SAML (Security Assertion Markup Language)
OASIS has recently completed SAML, a standard for exchanging authentication and authorization information between domains. SAML is designed to offer single sign-on for both automatic and manual interactions between systems. It lets users log into another domain with all of their permissions defined, or it manages automated message exchanges between two parties. SAML is a set of specification documents that define its components:
• Assertions and request/response protocols
• Bindings (the SOAP-over-HTTP method of transporting SAML requests and responses)
• Profiles (for embedding and extracting SAML assertions in a framework or protocol)
• Security considerations while using SAML
• Conformance guidelines and a test suite
• Use cases and requirements

Fig. 1. Structure of Assertion Schema

Fig. 2. Assertion with Authentication Statement


SAML enables the exchange of authentication and authorization information about users, devices, or any identifiable entity, called subjects. Using a subset of XML, SAML defines the request-response protocol by which systems accept or reject subjects based on assertions [8,9]. An assertion is a declaration of a certain fact about a subject. SAML defines three types of assertions:
• Authentication: indicating that a subject was previously authenticated by some means (such as a password, hardware token, or X.509 public key).
• Authorization: indicating that a subject should be granted or denied access to a resource.
• Attribution: indicating that the subject is associated with certain attributes.
Figure 1 shows the assertion schema, and Figure 2 shows an assertion containing an authentication statement issued by a SAML authority. SAML does not specify how much confidence should be placed in an assertion. Local systems decide whether the security levels and policies of a given application are sufficient to protect the organization if damage results from an authorization decision based on an inaccurate assertion. This characteristic of SAML is likely to spur trust relationships and operational agreements among Web-based businesses, in which each agrees to adhere to a baseline level of verification before accepting an assertion. SAML can be bound to multiple communication and transport protocols; for example, it can be carried by the Simple Object Access Protocol (SOAP) over HTTP [8,9].

Fig. 3. Browser/Artifact

Fig. 4. Browser/Post

SAML operates without cookies in one of two profiles: browser/artifact and browser/post. In browser/artifact, a SAML artifact, which is a pointer to an assertion, is carried as part of a URL query string, as shown in Figure 3. The steps in Figure 3 are as follows.
(1) The user of an authenticated browser on Server A requests access to a database on Server B. Server A generates a URL redirect to Server B that contains a SAML artifact.
(2) The browser redirects the user to Server B, which receives an artifact pointing to the assertion on Server A.
(3) Server B sends the artifact to Server A and gets the full assertion.
(4) Server B checks the assertion and either validates or rejects the user's request for access to the database.
In browser/post, SAML assertions are uploaded to the browser within an HTML form and conveyed to the destination site as part of an HTTP POST payload, as shown in Figure 4. The steps in Figure 4 are as follows.


Fig. 5. Java Packages of SAML APIs

(1) The user of an authenticated browser on Server A requests access to a database on Server B.
(2) Server A generates an HTML form containing the SAML assertion and returns it to the user.
(3) The browser posts the form to Server B.
(4) Server B checks the assertion and either allows or denies the user's request for access to the database.
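The browser/artifact steps can be simulated in-process. In this sketch Server A keeps issued assertions keyed by artifact, and Server B dereferences an artifact once, after which it is removed, mirroring the one-time use of artifacts discussed in Section 2. The artifact layout (2-byte type code 0x0001, 20-byte SourceID, 20-byte assertion handle, base64-encoded) and the `SAMLart`/`TARGET` query parameters are modeled on the SAML 1.x browser/artifact profile; the class and function names are hypothetical, and in a real deployment step (3) would be a SOAP request from Server B to Server A rather than a direct method call.

```python
import base64
import hashlib
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


class AssertionStore:
    """Server A side: issues one-time artifacts that point to stored assertions."""

    TYPE_CODE = b"\x00\x01"  # SAML 1.x type-0x0001 artifact

    def __init__(self, site_id):
        # 20-byte SourceID derived by hashing the issuing site's identifier.
        self.source_id = hashlib.sha1(site_id.encode("utf-8")).digest()
        self._pending = {}

    def issue_artifact(self, assertion_xml):
        handle = secrets.token_bytes(20)  # random 20-byte assertion handle
        raw = self.TYPE_CODE + self.source_id + handle
        artifact = base64.b64encode(raw).decode("ascii")
        self._pending[artifact] = assertion_xml
        return artifact

    def redirect_url(self, destination, target, assertion_xml):
        """Step (1): build the redirect to Server B carrying the artifact."""
        artifact = self.issue_artifact(assertion_xml)
        return destination + "?" + urlencode({"TARGET": target, "SAMLart": artifact})

    def resolve(self, artifact):
        """Step (3): return the assertion once, then forget the artifact."""
        return self._pending.pop(artifact, None)


def server_b_accepts(url, store):
    """Steps (2)-(4) on Server B: extract the artifact, dereference it, decide."""
    artifacts = parse_qs(urlparse(url).query).get("SAMLart", [])
    if not artifacts:
        return False
    assertion = store.resolve(artifacts[0])
    # A real Server B would also validate issuer, signature, and conditions.
    return assertion is not None
```

Replaying the same URL fails on the second attempt because the artifact has already been consumed, which is what keeps a leaked browser URL from being reused.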

3 Design and Implementation of Java-Based SAML APIs
We designed and implemented SAML APIs as Java packages, as shown in Figure 5. The classification of packages is based on the specification “Assertions and Protocol for the OASIS Security Assertion Markup Language (SAML)” [8]. We designed three basic packages: the assertion, protocol, and messaging packages. To support the messaging function, we also designed generator, utilities, and security packages. The implemented SAML APIs are grouped into these packages, whose functions are as follows.
• Assertion package: deals with authentication, authorization, and attribution information.
• Protocol package: deals with SAML request/response message pairs for processing assertions.
• Messaging package: includes messaging frameworks that transmit assertions.
• Security package: applies digital signatures and encryption to the assertions.
• Utilities package: generates UUIDs, UTC date formats, artifacts, and so on.
• Generator package: generates SAML request/response messages.
The structures of the major packages are shown in the following figures. The structure of the assertion package is shown in Figure 6, and that of the protocol package in Figure 7. A protocol defines an agreed way of asking for and receiving an assertion [13]. The structure of the messaging package is shown in Figure 8. The messaging package transforms a document into a SOAP message and defines how SAML messages are communicated over standard transport and messaging protocols [13]. We verified the messages against the SAML specifications. To generate SAML request messages, as shown in Figure 9, we used the RequestGenerator class in the generator package (refer to step 1 of Figures 3 and 4).
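As an illustration of the roles the generator, utilities, and messaging packages play (not of their Java classes), the sketch below builds a SAML 1.0-style `samlp:Request` that asks the issuing site for the assertion behind an artifact, stamps it with a unique ID and a UTC timestamp, and wraps it in a SOAP 1.1 envelope. Namespace URIs follow SAML 1.x and SOAP 1.1; the function names are hypothetical, and no signature is applied.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SAMLP_NS = "urn:oasis:names:tc:SAML:1.0:protocol"
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("samlp", SAMLP_NS)
ET.register_namespace("soap", SOAP_NS)


def artifact_request(artifact):
    """Generator role: a samlp:Request asking for the assertion behind an artifact."""
    request = ET.Element(f"{{{SAMLP_NS}}}Request", {
        "MajorVersion": "1",
        "MinorVersion": "0",
        # Utilities role: unique identifier and UTC timestamp.
        "RequestID": "_" + uuid.uuid4().hex,
        "IssueInstant": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    })
    ET.SubElement(request, f"{{{SAMLP_NS}}}AssertionArtifact").text = artifact
    return request


def soap_wrap(element):
    """Messaging role: place a SAML message inside a SOAP 1.1 Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(element)
    return ET.tostring(envelope, encoding="unicode")
```

Keeping message construction and transport wrapping in separate functions mirrors the paper's split between generator and messaging packages: the same request element could be signed by a security layer before `soap_wrap` is called.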


Fig. 6. Structure of ac.sejong.saml.assertion package

Fig. 7. Structure of ac.sejong.saml.protocol package

Fig. 8. Structure of ac.sejong.saml.messaging

Fig. 9. Generation of SAML Request Message

This SAML request message is signed, as shown in Figure 10, using the signature class in the security.sign package. The signing process of the signature class follows the XML Signature standard in enveloped form. Figure 11 shows the generation of SAML response messages, which uses the ResponseGenerator class in the generator package (refer to step 4 of Figures 3 and 4). The SAML response message is also signed using the signature class in the security.sign package.


Fig. 10. SAML Request Message Signed in Enveloped Form

Fig. 11. Generation of SAML Response Message

4 Conclusion
We designed and implemented an SSO library supporting the SAML standard. The implemented SAML APIs have the following features.
• Since SAML messages are transmitted through SOAP, XML-based message structures are fully preserved. This enables valid bindings.
• Integrity and non-repudiation are guaranteed by signatures on transmitted messages.
• Confidentiality is guaranteed by encryption of transmitted messages. Since XML Encryption is applied, each element can be encrypted efficiently.


Although RSA digital signatures on SAML messages are the default and XML Signature is optional, we fully implemented both APIs in the security package. The SAML specification does not mention specific encryption methods for SAML messaging, and XML Encryption is a suitable candidate for encrypting SAML messages, so we also implemented APIs for XML Encryption [14]. Recently, Grid systems have adopted Web Services standards proposed by the W3C, and SAML is an XML-based SSO standard for Web Services. SAML will become widely used in Grid architectures as distributed applications based on Web Services become popular. We will further apply this SAML library to real-world systems such as Electronic Document Management Systems (EDMS) and groupware systems, and will continue research on user authorization.

References
1. Foster, I., Kesselman, C.: The Globus Project: A Status Report. Future Generation Computer Systems, Volume 15, (1999) 607-621
2. Foster, I., Kesselman, C., Nick, J.M., Tuecke, S.: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration, http://www.globus.org/research/papers/ogsa.pdf
3. Web Services Activity, http://www.w3.org/2002/ws
4. Foster, I., Kesselman, C., Tsudik, G., Tuecke, S.: A Security Architecture for Computational Grids. Proc. 5th ACM Conference on Computer and Communications Security, (1998) 83-92
5. Volchkov, A.: Revisiting Single Sign-On: A Pragmatic Approach in a New Context. IT Professional, Volume 3, Issue 1, Jan/Feb (2001) 39-45
6. Parker, T.A.: Single Sign-On Systems: The Technologies and the Products. European Convention on Security and Detection, 16-18 May (1995) 151-155
7. Pfitzmann, B.: Privacy in Enterprise Identity Federation: Policies for Liberty Single Signon. 3rd Workshop on Privacy Enhancing Technologies (PET 2003), Dresden, March (2003)
8. Assertions and Protocol for the OASIS Security Assertion Markup Language (SAML) V1.0: http://www.oasis-open.org/committees/security
9. Bindings and Profiles for the OASIS Security Assertion Markup Language (SAML) V1.1: http://www.oasis-open.org/committees/security
10. Global Grid Forum OGSA Security Working Group: Use of SAML for OGSA Authorization, http://www.globus.org/ogsa/Security
11. Pfitzmann, B., Waidner, M.: Token-based Web Single Signon with Enabled Clients. IBM Research Report RZ 3458 (#93844), November (2002)
12. Freier, A., Karlton, P., Kocher, P.: The SSL 3.0 Protocol. Netscape Communications Corporation, Nov 18, (1996)
13. Galbraith, B., Trivedi, R., Whitney, D., Prasad, D.V., Janakiraman, M., Hiotis, A., Hankison, W.: Professional Web Services Security, Wrox Press, (2002)
14. XML Encryption WG, http://www.w3.org/Encryption/2001/

Performance Improvement of Information Service Using Priority Driven Method
Minji Lee¹, Wonil Kim²*, and Jai-Hoon Kim¹
¹ Ajou University, Suwon 442-749, Republic of Korea
{mm2 23j,jaikim}@ajou.ac.kr
² Sejong University, Gwangjin-Gu, Seoul, 143-747 Republic of Korea

Abstract. Grid is developed to accomplish large and complex computations by gathering distributed computing resources. Grid employs an information service to manage and provide these collected resources, and accurate information about resource providers is essential to stable service. In the Grid information service, GRIS and GIIS are used to gather and maintain resource information. These two directory servers store resource information in a cache only for a fixed period to achieve fast response and accurate access. Since resource information search is performed in the cache, system performance depends on the search time in the cache. In this paper, we propose a novel cache management system based on resource priority. The priority of a resource is determined by the frequency of resource usage and by the number of available resources in GRIS. In simulation, the priority-driven schemes provide more accurate information and faster response to clients than the existing cache update scheme of the information service.

1 Introduction
In Grid, a directory service [1] is provided to keep and maintain resource information components. Various types of connections are opened in the directory service to update and search resource status. Most connections between the directory server and the resource providers are used to modify the status of resources. The information transmitted by a resource provider is stored in a cache of the directory server for a fixed time interval. Current directory servers perform query processing on the cache: after receiving a query from a client, they search their caches for the requested information. If there is no suitable resource information in the cache, the query is forwarded to other directory servers. If there are not many resource providers, cache size is not an important problem. However, as the number of resource providers increases, cache size becomes a critical issue because it is a major factor in fast response and accurate access. If there is too much information in a cache, it takes a long time to respond to a query, although the correctness of the information increases.

* Author for correspondence, +82-2-3408-3795
M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 565–572, 2004. © Springer-Verlag Berlin Heidelberg 2004

M. Lee, W. Kim, and J.-H. Kim

Cache update is usually performed in FIFO (First In, First Out) order. To provide accurate information, old information that has stayed in a cache longer than a fixed time is also deleted. As the number of resource providers and requests increases, resource status changes quickly, so an effective cache update mechanism that guarantees the accuracy of information is needed. The best strategy for efficient cache management would be to predict the next information a client will request and store it in the cache in advance. However, it is impossible to predict every user's requests and store the corresponding information. Instead of predicting every user's request, this paper proposes two schemes to decide the importance of given information. One method assigns priority on the basis of resource access frequency; to track access frequency, GRIS logs the number of requests whenever it receives a resource request message from a client. The other method gives priority to resource providers that have more available resources than the others.
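A minimal sketch of the access-frequency scheme, assuming a fixed-capacity cache in which entries still expire after a fixed time (as in the existing scheme) but, when the cache is full, the entry with the fewest logged requests is evicted instead of the oldest one. The class name, the `ttl` parameter, and the injectable `clock` are illustrative assumptions; the second scheme (priority by number of available resources) would change only the eviction key.

```python
import time
from collections import Counter


class PriorityCache:
    """Directory-server cache that evicts the least-requested entry when full.

    Entries older than `ttl` seconds are discarded, as in the fixed-period scheme;
    ties in request count fall back to insertion order.
    """

    def __init__(self, capacity, ttl, clock=time.monotonic):
        self.capacity = capacity
        self.ttl = ttl
        self.clock = clock
        self.entries = {}         # resource -> (info, insert_time)
        self.access = Counter()   # logged client requests per resource

    def _expire(self):
        now = self.clock()
        stale = [k for k, (_, t) in self.entries.items() if now - t > self.ttl]
        for key in stale:
            del self.entries[key]

    def get(self, resource):
        """Answer a client query from the cache, logging the request."""
        self.access[resource] += 1
        self._expire()
        hit = self.entries.get(resource)
        return None if hit is None else hit[0]

    def put(self, resource, info):
        """Store information reported by a resource provider."""
        self._expire()
        if resource not in self.entries and len(self.entries) >= self.capacity:
            victim = min(self.entries, key=lambda k: self.access[k])
            del self.entries[victim]
        self.entries[resource] = (info, self.clock())
```

With capacity 2, after `cpu` is requested twice and `disk` once, inserting `net` evicts `disk`; a FIFO cache would have evicted `cpu`, the first entry inserted, even though it is the most requested.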

2 Information Service in Grid
Grid is an important field of distributed systems. A Grid can provide a huge amount of resources to a requestor, and to manage those resources it maintains a systematically organized architecture.

2.1 Grid Information Service Protocol
To provide proper resources to a requestor, it is important to manage resources efficiently and effectively, and to provide correct information about resource status. The Grid Information Service (GIS) supports such a service [2]: it provides the initial discovery and ongoing monitoring of the existence and characteristics of resources, services, computations, and other entities that are part of a Grid system [2]. To support these functions, GIS assumes that resources are subject to failure, that the total number of resource providers is large, and that resource types are diverse. The GIS architecture comprises highly distributed resource providers and specialized aggregate directory services. A resource provider furnishes dynamic information according to the prototype defined by a VO-neutral infrastructure. GIS uses two basic protocols: the Grid Resource Information Protocol (GRIP) and the Grid Resource Registration Protocol (GRRP). GRIP is used to access information, and GRRP is used to notify aggregate directory services of the availability of information. Concretely, GRIP adopts the standard Lightweight Directory Access Protocol (LDAP) [5], which defines a data model, a query language, and a wire protocol.

Performance Improvement of Information Service Using Priority Driven Method

2.2 Current Information Service in Grid
The Monitoring and Discovery Service (MDS) is an implementation of the information service in a Grid [4]. The service is provided by the Grid Resource Information Service (GRIS) and the Grid Index Information Service (GIIS). A GRIS gathers the status information of resources and reports it to a GIIS [4]. GIIS forms a hierarchical structure organized with several GIISs and GRISs [4]. A GRIS is located at the bottom of the hierarchy and is connected to several GIISs. A GIIS receives data from many GRISs and stores the data in a cache. Fig. 1 shows a prototype of the processing of a query generated by a client. Initially, the query accesses the highest-level GIIS. If the data requested by the client are in its cache, they are returned to the client; if not, the query is sent to other GIISs to find the data. This process repeats until the query finds the requested data. In other words, if the requested information is not in the cache, the query has to be transmitted to many GIISs, and the requestor has to wait until the search sequence finishes. Therefore, the performance of a query to a GIIS depends on the performance of the resource information servers that the query accesses.
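A minimal sketch of the query path just described, assuming each directory server checks its own cache and, on a miss, forwards the query to its child servers one at a time (depth-first) until some server answers. The class `DirectoryNode` and the visit counter are hypothetical illustrations, not part of MDS; counting visited servers makes the cost of a top-level miss visible.

```python
class DirectoryNode:
    """A GIIS (inner node) or GRIS (leaf) in a hypothetical MDS-like hierarchy."""

    def __init__(self, name, cache=None, children=None):
        self.name = name
        self.cache = dict(cache or {})        # resource type -> resource information
        self.children = list(children or [])  # lower-level GIISs or GRISs

    def query(self, key):
        """Return (info, servers_visited); info is None if no server holds the key.

        Depth-first forwarding is an assumption made for illustration.
        """
        visited = 1
        if key in self.cache:
            return self.cache[key], visited
        for child in self.children:
            info, n = child.query(key)
            visited += n
            if info is not None:
                return info, visited
        return None, visited
```

A hit in the top-level cache costs one server visit, while a miss can touch every server on the search path before an answer (or a final failure) comes back.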

Fig. 1. Query processing in MDS. A query arrives at the highest-level GIIS, which searches its cache. If the requested information is there, it is transmitted to the requestor; if not, the query is sent to other GIISs or GRISs until the requested information is found.

Warren Smith et al. [3] compared search times for two cases: data located in the top-level cache and data located in a bottom-level cache. Receiving the data takes 10 seconds in the first case and 60 seconds in the second. This result shows the importance of keeping data in higher-level caches.
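A back-of-the-envelope reading of these numbers, under the simplifying assumption that every query is answered either at the top level (10 s) or at the bottom level (60 s) and that intermediate levels can be ignored: if h is the fraction of queries answered at the top level, the expected response time is

```latex
E[T] = 10\,h + 60\,(1 - h) = 60 - 50\,h \quad \text{seconds}
```

so, under this two-case model, each additional 10 percentage points of top-level hits saves about 5 s per query on average.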

3 The Proposed Information Service System
3.1 Proposed Directory Service
Because the current directory service does not consider the importance of resources, all resource information has the same priority, and cache update is determined only by the registration time of the resource information. Such a scheme cannot provide fast and accurate resource information to clients. If each resource is assigned a priority according to some rules and that priority is applied in the cache update method, the cache contents will differ from those produced by the current scheme. Two methods are proposed to decide priority among resources. The first determines resource priority by the frequency of resource usage; this scheme assumes that frequently used resources are more likely to be requested by a client than resources used only occasionally. The second determines resource priority by the number of available resources that a GRIS has.

Fig. 2. Directory Service with the New Scheme Added, as Used in the Simulation

Fig. 3. Cache Update of Each Algorithm when a Resource Information Message Arrives

Fig. 2 shows how resource access frequency is recorded and used in the simulated directory service. Whenever a GRIS receives a resource request message from a client, it records the type of the requested resource, regardless of whether that resource is available.

3.2 Data Processing of Proposed Methods
The cache update procedures of the two proposed methods are shown in Fig. 3. Both methods start processing when a message arrives and then check whether the cache has empty space for the new data. If it does, the new data are inserted into the cache. If not, each method uses data priority to select the cached data that would be discarded: method 1 ranks data by resource access frequency, and method 2 by the number of available resources. The selected cached data are then compared with the data in the message. If the message has higher priority than the selected cached data, the message replaces them in the cache; otherwise, the message is discarded. Fig. 3 gives an example. In method 1, the access frequency of resource X in the update message is 6 per day, whereas the access frequency of resource G in the cache is 1 per day; thus resource G is discarded from the cache and resource X is inserted. In method 2, the update message reports 6 available resources of type X, whereas cached entry D has only 1 available resource; thus D is discarded and X is inserted.
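The update rule of Fig. 3 can be sketched as below. The `priority` value passed to `offer` stands for either criterion: the access frequency logged by the GRIS (method 1) or the number of available resources reported in the message (method 2). All names are hypothetical, and the way the TTL is combined (expired entries are purged before the priority comparison, ties broken by age) is an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    resource: str    # resource identifier, e.g. "X"
    priority: float  # method 1: access frequency; method 2: available resources
    arrived: float   # time the information entered the cache


class PriorityCache:
    def __init__(self, capacity, ttl):
        self.capacity = capacity
        self.ttl = ttl
        self.entries = {}  # resource -> Entry

    def _expire(self, now):
        # Stale information is still removed after the fixed time-to-live.
        for key in [k for k, e in self.entries.items() if now - e.arrived >= self.ttl]:
            del self.entries[key]

    def offer(self, resource, priority, now):
        """Try to cache an update message; return True if it was stored."""
        self._expire(now)
        if resource in self.entries or len(self.entries) < self.capacity:
            self.entries[resource] = Entry(resource, priority, now)
            return True
        # Cache full: the lowest-priority entry (oldest on ties) is the eviction candidate.
        victim = min(self.entries.values(), key=lambda e: (e.priority, e.arrived))
        if priority > victim.priority:
            del self.entries[victim.resource]
            self.entries[resource] = Entry(resource, priority, now)
            return True
        return False  # the message is discarded
```

With the numbers of the method 1 example, a full cache holding G (1 request per day) that receives X (6 requests per day) evicts G and stores X, while a message whose priority does not exceed that of any cached entry is dropped.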

4 Simulation
Simulations are performed with three types of directory service, distinguished by their cache update methods. The first applies the general (FIFO) cache update scheme. The second applies the new cache update scheme based on resource access frequency and TTL (time to live). The third applies the new scheme based on the number of available resources and TTL.

Fig. 4. Information Service Organization for Simulation. The GIISs form five levels with 1, 2, 3, 7, and 16 nodes, respectively. There are 160 GRIS servers, each controlling 20 resources.

4.1 Construction of Directory Service
There are 160 GRISs in the simulation. Each GRIS controls 20 resources, all of the same type. Five resource types are used, making up 50%, 20%, 10%, 10%, and 10% of the resources, respectively, and the resource request ratio is identical to this composition ratio. Each GIIS controls two GIISs except at level 5, where each of the 16 GIISs controls 10 GRISs. Table 1 shows the cache size at each level; cache size increases with the GIIS level. Simulations are performed while varying the cache size at each level.
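The workload parameters above can be checked with a few lines of arithmetic. The sketch below uses hypothetical type labels T1 to T5 (the paper does not name the types), derives how many GRISs and resources each type accounts for, and draws a request stream that follows the same ratio.

```python
import random

NUM_GRIS = 160
RESOURCES_PER_GRIS = 20
LEVEL5_GIIS, GRIS_PER_LEVEL5_GIIS = 16, 10
# Percentage share of each resource type; the labels T1..T5 are hypothetical.
TYPE_SHARE = {"T1": 50, "T2": 20, "T3": 10, "T4": 10, "T5": 10}

# The 16 lowest-level GIISs with 10 GRISs each account for all 160 GRISs.
assert LEVEL5_GIIS * GRIS_PER_LEVEL5_GIIS == NUM_GRIS


def gris_per_type():
    # Each GRIS holds 20 resources of a single type, so GRIS counts follow the shares.
    return {t: NUM_GRIS * pct // 100 for t, pct in TYPE_SHARE.items()}


def resources_per_type():
    return {t: n * RESOURCES_PER_GRIS for t, n in gris_per_type().items()}


def request_stream(n, seed=0):
    # Requests are drawn with the same ratio as the resource composition.
    rng = random.Random(seed)
    types = list(TYPE_SHARE)
    return rng.choices(types, weights=[TYPE_SHARE[t] for t in types], k=n)
```

Because each GRIS serves one type, the 50/20/10/10/10 split gives 80, 32, 16, 16, and 16 GRISs, that is, 1600, 640, 320, 320, and 320 of the 3200 resources.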


Fig. 5. Cache Hit Ratio at Each GIIS Level When the Cache Size of the Highest-Level GIIS is 5. To show the performance improvement of the new cache update schemes, cache sizes are fixed as in simulation 1 of Table 1.

4.2 Cache Hit Ratio of Each Algorithm
Fig. 5 shows the cache-hit ratio at the GRIS level. A high cache-hit ratio at the GRIS level indicates poor performance, because the GRIS is the last directory in which information search is performed. When the highest-level cache size is 10, the cache-hit ratio at the GRIS level is 25% for the previous scheme but 10% for the proposed schemes. When the cache size is 15, the cache-hit ratio of all schemes is about 10%. An increase of the cache-hit ratio at the GRIS level means an increase in query response time. As Fig. 5 shows, the performance of the previous algorithm decreases as the cache size is reduced, whereas the priority-driven schemes perform almost the same regardless of cache size. The proposed schemes thus perform better than the previous scheme.

Fig. 6. Cache-Hit Ratio When Cache Size Changes. There are nine simulations, each distinguished by cache size and cache update scheme.

Fig. 7. Accuracy of cache information, measured by the success rate of resource requests

Figs. 5 and 6 show the cache-hit ratio as cache size changes. If the cache is large enough to store the information transmitted by lower-level GIISs or GRISs, all three algorithms find information in the cache at the same rate. As the cache size decreases, their cache-hit ratios diverge: at the highest-level GIIS, the cache-hit ratio of the previous scheme falls to about 50% of that of the new cache update schemes. Furthermore, the accuracy of cache hits for the proposed schemes reaches up to 92%.

4.3 Accuracy of Cache Information
Fig. 7 shows the accuracy of returned information. With the previous update scheme, information accuracy decreases as cache size decreases. In contrast, the accuracy of the two proposed schemes is unchanged as cache size changes, and it is higher than that of the previous scheme. For example, the Usage-Priority scheme achieves almost 90% accuracy, and this rate remains stable as the cache size changes.


4.4 Performance Evaluation
As Figs. 5, 6, and 7 show, the proposed algorithms are fast and accurate. With the previous cache update algorithm, the cache-hit ratio of the highest-level directory server decreases as the server's cache size decreases. This is expected: a small cache holds less information than a large one and can therefore answer fewer queries. To increase the cache-hit ratio, a new cache update scheme is needed. The basic concept of the proposed algorithms is priority-driven information. When a cache is full, one piece of information in it must be discarded. The previous algorithm considers only how long information has stayed in the cache, whereas the proposed algorithms consider both staying time and the priority of the information.

5 Conclusion
The current cache update method is FIFO, based on the order in which information enters the cache. In this paper we proposed a novel method that assigns priority to resource information on the basis of resource usage frequency and the number of available resources. The proposed cache update method increased the accuracy of cached information: even as the cache size decreases, the cache-hit ratio and the accuracy of query responses remain unchanged. As resources and users increase, the performance of the previous information service may degrade. Information accuracy of the proposed schemes reaches up to 90%, whereas the previous scheme achieves only 70%. Furthermore, the cache-hit ratio at the highest-level directory server doubles compared with the previous scheme. The simulation shows that the proposed cache update schemes provide fast and accurate information.

References
1. Steven Fitzgerald, Ian Foster, Carl Kesselman, and Gregor von Laszewski, “A Directory Service for Configuring High-Performance Distributed Computations,” posted at http://www.globus.org.
2. Ian Foster, Carl Kesselman, and Steven Tuecke, “The Anatomy of the Grid,” http://www.globus.org.
3. Warren Smith, Abdul Waheed, David Meyers, and Jerry Yan, “An Evaluation of Alternative Designs for Grid Information Service,” 9th IEEE International Symposium on High Performance Distributed Computing.
4. “MDS 2.2: Creating a Hierarchical GIIS” and “MDS 2.2: GRIS Specification Document: Creating New Resource Providers,” http://www.globus.org/mds/NewFeatures.html, 2002.
5. Gregor von Laszewski and Ian Foster, “Usage of LDAP in Globus,” http://www.globus.org/mds/NewFeatures.html, 2002.

HH-MDS: A QoS-Aware Domain Divided Information Service*
Deqing Zou, Hai Jin, Xingchang Dong, Weizhong Qiang, and Xuanhua Shi
Huazhong University of Science and Technology, Wuhan, 430074, China

Abstract. Grid computing has emerged as an effective technology for coupling geographically distributed resources and solving large-scale problems in wide area networks. Resource Monitoring and Information Service (RMIS) is a significant and complex issue in grid platforms. This paper introduces HH-MDS, a QoS-aware domain divided information service that is an important component of our service grid platform. HH-MDS effectively addresses the system scalability issue. In addition, several QoS strategies are provided to improve the efficiency of resource monitoring and information searching. Based on these strategies, service-oriented definitions and an SLA specification are proposed to describe serving capability and related QoS issues.

1 Introduction
Grid technologies [1][13] enable large-scale sharing of resources within all kinds of consortia of individuals and/or institutions. In these environments, the discovery, characterization, and monitoring of resources and computations are challenging. An RMIS needs to record the identity and essential characteristics of services available to community members and to maintain service validity. A notification framework is proposed as a basic means of determining the existence and properties of an entity in a wide area network. Each framework message is propagated with a timestamp. Based on a soft-state model, a robust notification mechanism coupled with graceful degradation of stale information is provided. Interoperation between different systems is also challenging in grid platforms. Web services [2] are Internet-based applications that communicate with other applications to offer service data or functional services automatically. A service level agreement (SLA) is an agreement regarding the guarantees of a web service: a service provider contracts with a client to provide some measurable capability or to perform a task by a service [3]. Because resources in a particular grid system are clearly distributed geographically, it is suitable to divide the whole grid into several parts and manage them separately.

The rest of the paper is organized as follows. Section 2 discusses related work on information services. Section 3 proposes the HH-MDS framework, which adopts a domain divided architecture, discusses system scalability within this framework, and describes strategies for QoS guarantees. Section 4 analyzes the performance of HH-MDS, and Section 5 concludes the paper.

* This paper is supported by National Science Foundation under grant 60125208 and 60273076.
M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 573–580, 2004. © Springer-Verlag Berlin Heidelberg 2004
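The soft-state idea can be illustrated with a small registry in which every registration must be refreshed by a timestamped message or it silently expires. This is a hypothetical sketch; the class name, the `lifetime` parameter, and the rule that older (out-of-order) messages are ignored are assumptions, not details given in the paper.

```python
class SoftStateRegistry:
    """Hypothetical soft-state registry: registrations vanish unless refreshed."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self._last_seen = {}  # service name -> timestamp of its newest message

    def refresh(self, service, timestamp):
        # Messages carry timestamps, so a delayed older message cannot roll state back.
        if timestamp > self._last_seen.get(service, float("-inf")):
            self._last_seen[service] = timestamp

    def live_services(self, now):
        # A provider that failed silently simply stops refreshing and ages out.
        self._last_seen = {
            s: t for s, t in self._last_seen.items() if now - t < self.lifetime
        }
        return sorted(self._last_seen)
```

No explicit deregistration message is needed: a crashed provider disappears after one lifetime, which is what makes the soft-state model robust to failures that are never announced.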


D. Zou et al.

2 Related Works
The peer-to-peer [4] paradigm dictates a fully distributed, cooperative network design in which nodes collectively form a system without any supervision. Its advantages include robustness to failures, extensive resource sharing, self-organization, load balancing, data persistence, and anonymity. Current search algorithms for unstructured P2P networks [5] can be categorized as either blind or informed. In a blind search, nodes hold no information about document locations, while in informed methods a centralized or distributed directory service assists in the search for the requested objects. Informed methods are more suitable than blind ones for global information searching and performance prediction. The Globus [14] platform can be viewed as a representative peer-to-peer system: there is no centralized supervision over all kinds of grid resources, and each task is submitted and controlled by a grid resource. Globus adopts an informed method, the Monitoring and Discovery Service (MDS2) [6][7], for resource searching. MDS2 provides an effective management strategy for both static and dynamic resource information, using a hierarchical structure to collect and aggregate service information, and it is one of the most popular RMISs at present. However, if the information service of a grid system is organized as a tree, the load on high-level information servers increases sharply and information searching efficiency cannot be guaranteed. The Index Service in Globus Toolkit 3 is built on web services, but it lacks an effective name space for organizing services across the whole grid; it depends on domain names or IP addresses to locate the corresponding services. The Relational Grid Monitoring Architecture (R-GMA) [8][12] monitoring system is developed based on the relational data model and Java Servlet technologies. Hawkeye is a tool developed by the Condor group [9]; its main use is to offer monitoring information to anyone interested and to execute actions in response to conditions. Neither of these systems can be extended to a large scale.

Based on the domain divided principle, we propose a peer-to-peer architecture for high-level information servers to manage domain services and provide global service information. One domain includes many organizations, and one organization includes one or several service providers. A domain information server is responsible for validity management and general capability provision of domain services. Besides validity management, an organizational information server is also responsible for providing the current capability of services within its organization. Strategies such as performance statistics, prediction, and a dynamic notification cycle are provided to evaluate the current serving capability of a service provider.

3 Architecture of HH-MDS Framework
Having studied the above popular RMISs, we hold that a service-oriented RMIS should conform to five principles: 1) service usability; 2) information consistency; 3) query performance; 4) system scalability; and 5) distributed architecture. The HH-MDS framework is designed on these principles. As depicted in Figure 1, the whole framework is divided into several domains with peer-to-peer relationships. Within each domain, HH-MDS information services are classified into three levels: (1) Domain Information Server (DIS), (2) Information Server (IS), and (3) Service Provider (SP). To prevent a DIS from becoming a single point of failure, two DISs are deployed in each domain.

Fig. 1. The HH-MDS Architecture

An SP registers its local services with an IS within an organization. It monitors the local services, evaluates and predicts their performance, and reports their current serving capability to the IS. An IS registers the services within a domain with a DIS. Services registered in the DIS are described by their functions and general serving capability. The descriptions of registered services in the IS differ from those in the DIS: besides the descriptions held in the DIS, the IS also records the current serving capability of services, described by an SLA specification. In the HH-MDS framework, we propose four types of protocols to achieve global resource sharing across the whole grid: (1) the Inter-domain GRid Registration Protocol (Inter-domain GRRP), used when a new domain is added to the current grid system, for which a new DIS first requires a legal certificate signed by the Domain CA; (2) the Inter-domain GRid Information Protocol (Inter-domain GRIP), used by a DIS to query service information from other DISs; (3) the Intra-domain GRid Registration Protocol (Intra-domain GRRP), used by an SP or an IS when it registers services with a higher-level IS or DIS; and (4) the Intra-domain GRid Information Protocol (Intra-domain GRIP), used by users to query service information from a DIS or an IS. A hierarchical naming method described in XML Schema is proposed to organize registered elements. The elements “HHMds-Domain-name”, “HHMds-Organization-name = XXX”, and “HHMds-Host-name” describe a host; together with the element “HHMds-Service-name”, they describe a service.
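The hierarchical names above could be serialized as nested XML elements, as in the sketch below. Only the element names come from the text; the nesting, the `value` attribute, and the sample values are hypothetical, since the paper does not give the schema itself.

```python
import xml.etree.ElementTree as ET

# Element names follow the paper; the nesting and "value" attribute are assumptions.
LEVELS = (
    "HHMds-Domain-name",
    "HHMds-Organization-name",
    "HHMds-Host-name",
    "HHMds-Service-name",
)


def hierarchical_name(domain, organization, host, service=None):
    """Build a nested XML name: three levels identify a host, four identify a service."""
    values = [domain, organization, host] + ([service] if service is not None else [])
    root = parent = None
    for tag, value in zip(LEVELS, values):
        elem = ET.Element(tag, value=value)
        if parent is None:
            root = elem
        else:
            parent.append(elem)
        parent = elem
    return ET.tostring(root, encoding="unicode")
```

Nesting mirrors the registration path (domain, organization, host, service), so a name can be resolved level by level: a DIS needs only the outer elements, while an IS also inspects the host and service levels.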


Fig. 2. QoS Framework of HH-MDS

3.1 HH-MDS QoS Criteria
As depicted in Figure 2, the HH-MDS QoS framework [10][15] is divided into three levels: DIS level QoS, IS level QoS, and SP level QoS. DIS level QoS guarantees service usability within a domain and provides global service information. The DIS uses a notification mechanism to send service information to users who subscribe to the corresponding services whenever a notification event occurs. IS level QoS guarantees service validity and provides the current serving capability of services within an organization. SP level QoS monitors local services: based on the user request rate, available bandwidth, and locally available resources, it determines the current serving capability of a service and predicts its future serving capability.

3.1.1 DIS Level QoS
DIS level QoS includes three parts: service SLA specification, subscription and notification, and service validity management. With the subscription and notification mechanism, a user can subscribe to services of interest and receive a timely notification message whenever such a service registers with the DIS or becomes invalid. A grid service is a WSDL-defined service that conforms to a set of conventions relating to its interface definitions and behaviors, and it includes many kinds of service data describing its functions. The service SLA specification describes general serving capability and policy. Its definition part consists of many SLA parameters, each assigned a metric that defines how its value is measured or computed; these parameters include total response time, total throughput ratio, performance curve, and usage policy. The service SLA specification also provides the service lifetime and service obligations, which define the QoS guarantees that the service provider offers to the service consumer. These guarantees are promises about the state of SLA parameters and promises to perform an action. A service has three kinds of status: normal, failure, and revised. When the status is revised, the status report is accompanied by the revised service description.
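A minimal sketch of DIS-side subscription and notification, assuming subscribers register a callback per service and are told about every status report. The class and method names are hypothetical; only the three status values and the rule that a revised report carries the revised description come from the text.

```python
from enum import Enum


class Status(Enum):
    NORMAL = "normal"
    FAILURE = "failure"
    REVISED = "revised"


class DomainInformationServer:
    """Hypothetical DIS subscription table (illustrative names only)."""

    def __init__(self):
        self.subscribers = {}   # service name -> list of notification callbacks
        self.descriptions = {}  # service name -> latest service description

    def subscribe(self, service, callback):
        self.subscribers.setdefault(service, []).append(callback)

    def report(self, service, status, description=None):
        # A "revised" status report must carry the revised service description.
        if status is Status.REVISED and description is None:
            raise ValueError("a revised status report needs the revised description")
        if status is Status.FAILURE:
            self.descriptions.pop(service, None)  # the service is no longer valid
        elif description is not None:
            self.descriptions[service] = description
        # Notify every user who subscribed to this service.
        for notify in self.subscribers.get(service, []):
            notify(service, status, description)
```

Pushing reports to subscribers spares users from polling the DIS, which matters because the DIS serves a whole domain.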


3.1.2 IS Level QoS
IS level QoS covers three aspects: dynamic SLA management of services (such as current response time and current throughput ratio), cache management, and subscription and notification. Subscription and notification are realized at two levels in the IS: service level and node level. At the service level, a user interacts with a service directly and subscribes to information about its current capability; once the service's current capability meets the requirement of the user's request, the service notifies the user. The IS container sets up a queue for each service that records the users who subscribe to it. At the node level, the IS container sets up a common queue for user subscriptions, through which a user can subscribe to services of interest and receive notifications. Dynamic service information is obtained from the SP and cached in the IS memory. Cache management is introduced to speed up information searching, but it reduces information accuracy. It is inadvisable to set a fixed notification interval for a service when it registers with the IS, because its current serving capability is dynamic and undetermined. When the current serving capability of a service changes slowly, the notification interval is long, and vice versa. This alterable notification improves information accuracy and lightens system load.

3.1.3 SP Level QoS
SP level QoS includes two parts: statistical decision-making and change detection. Statistical decision-making is used for performance evaluation and alterable notification; change detection is used to detect immediate changes of local services. Because each service type has its own resource requirements, we take the data-intensive computing service type as an example to describe the statistical decision-making process. P denotes the available node resources, including CPU, memory, and disk, and P(t) denotes the changing rate of available resources, covering both available amount and current access speed. N denotes the current available bandwidth, and N(t) denotes the changing rate of available network bandwidth. F denotes the serving capability of a service, and F(t) denotes the changing rate of serving capability. If memory and disk space are large enough, F depends only on CPU and network bandwidth, and we denote it by F1(CPU, N). If the network bandwidth is equal to a constant n, we conclude:

The statistical decision-making flow is described as follows: for each n in

I denotes the bandwidth interval, and K denotes the CPU available rate interval. According to Eq. 1 and the linear interpolation formula, we have:


We get the following performance evaluation aggregation:

Based on Eq.3, we get current serving capability of a service as:
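As an illustration of the linear-interpolation step, the sketch below assumes that serving capability was measured on a grid of bandwidth samples (drawn from I) and CPU-available-rate samples (drawn from K), and estimates F1 at an unmeasured operating point by bilinear interpolation. The function names, the table layout, and the clamping of out-of-range inputs are assumptions, not the paper's Eqs. 1 to 3.

```python
from bisect import bisect_right


def _cell(grid, x):
    """Indices (i, i + 1) of the grid cell containing x; grid is sorted, len >= 2."""
    i = bisect_right(grid, x) - 1
    i = max(0, min(i, len(grid) - 2))
    return i, i + 1


def interpolate_capability(bandwidths, cpu_rates, table, n, k):
    """Estimate F1 at bandwidth n and CPU available rate k.

    table[a][b] is the capability measured at bandwidths[a] and cpu_rates[b]
    (hypothetical measurement layout).
    """
    # Clamp the query point into the measured ranges (no extrapolation).
    n = min(max(n, bandwidths[0]), bandwidths[-1])
    k = min(max(k, cpu_rates[0]), cpu_rates[-1])
    a0, a1 = _cell(bandwidths, n)
    b0, b1 = _cell(cpu_rates, k)
    tn = (n - bandwidths[a0]) / (bandwidths[a1] - bandwidths[a0])
    tk = (k - cpu_rates[b0]) / (cpu_rates[b1] - cpu_rates[b0])
    # Interpolate along the CPU axis at both bandwidth samples, then along bandwidth.
    low = table[a0][b0] + tk * (table[a0][b1] - table[a0][b0])
    high = table[a1][b0] + tk * (table[a1][b1] - table[a1][b0])
    return low + tn * (high - low)
```

With P(0) and N(0) as inputs, such a lookup gives a current capability estimate without re-running the service; clamping keeps estimates within the measured range rather than trusting extrapolation.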

P(0) and N(0) are obtained from the operating system and the network detector, respectively. If … (… denotes the last notification time), the SP notifies the IS of F1(P(0), N(0)). BasicValue is a threshold specified as a performance parameter of the service. Some burst events may occur at the SP and cause service performance to change sharply. Four situations exist: (1) the number of user requests changes sharply, (2) the system load changes sharply, (3) resources are reserved, and (4) resources are released. In these situations, F should be recalculated and reported to the IS at once.
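A sketch of the SP's reporting decision under an assumed rule: report when F1(P(0), N(0)) differs from the last reported value by more than BasicValue, and report at once when any of the four burst events has occurred. The drift test, the function name, and the enum are hypothetical.

```python
from enum import Enum, auto


class BurstEvent(Enum):
    """The four situations listed in the text that force an immediate report."""
    REQUEST_SURGE = auto()  # number of user requests changes sharply
    LOAD_SURGE = auto()     # system load changes sharply
    RESERVATION = auto()    # resources are reserved
    RELEASE = auto()        # resources are released


def should_notify(current, last_reported, basic_value, events=()):
    """Decide whether the SP reports its current capability to the IS.

    Assumed rule: report on any burst event, otherwise only when the
    capability has drifted from the last reported value by more than BasicValue.
    """
    if events:
        return True
    return abs(current - last_reported) > basic_value
```

Since a slowly changing capability rarely crosses the threshold, such a rule also lengthens the effective interval between reports when change is slow, in line with the alterable notification described for the IS level.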

4 Performance Analysis
A user obtains global service information from a DIS. Information becomes inconsistent when a service fails between status reports, and the longer the reporting interval, the more apparent the inconsistency; the interval should therefore be adjustable according to network status. A user queries the current serving capability of services from the IS directly. Our service grid system is built on Globus Toolkit 3, whose information server scalability with respect to both users and service providers, as well as its query performance, has been studied in [11]; this section therefore focuses on the information service within an organization. Three parameters are studied: serving capability, lookup error rate, and communication load. The experiments were run on two sites. The server-side services are provided at the Internet and Cluster Computing Center (ICCC) at Huazhong University of Science and Technology (HUST) on a 16-node cluster, each node with a 1 GHz Xeon, a 40 GB hard disk, and 512 MB memory. The IS runs on one node, and the other nodes serve as SPs. The client is a personal computer with a 200 MHz Power604E, a 2 GB hard disk, and 128 MB memory at the High Performance Computing Center (HPCC), HUST. The bandwidth between ICCC and HPCC is around 50 Mb/s, and 100 Mbps bandwidth is provided within the cluster. A data conversion service is provided on one of the SPs. The service has four parameters: 1) input data size: 1.2 Gb; 2) output data size: 800 Mb; 3) application code size: 2 MB; and 4) execution time on a base machine (1 GHz P4 CPU, 512 MB memory): 40 s. The response time of this service is depicted in Figure 3. When the output bandwidth at the edge router is limited to a fixed value, response time decreases as the CPU available rate increases, but once the CPU available rate reaches a certain value, response time is basically stable. While the service is running, the CPU available rate and output bandwidth are monitored periodically, and the current response time of the service is obtained from the response-time curves in Fig. 3.


Fig. 3. Response Time Evaluation

We periodically send queries for the service to the IS from the client side. The lookup error rate is obtained by comparing the service information at the two sites at the same moment. BasicValue determines when the SP reports service information to the IS, and its value is set to a certain percentage of the average response time. Figure 4 depicts the lookup error rate of the service under different BasicValue settings, and Figure 5 depicts the corresponding communication load. The SP computes the response time every second and decides whether to send information to the IS. Communication load statistics are collected at one-minute intervals. In general, lookup error rate and communication load are closely related to the service type.
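The trade-off measured in Figs. 4 and 5 can be imitated with a small replay: feed a per-second series of response times through a threshold rule with BasicValue set to a fraction of the average response time, count the messages sent per minute, and count the seconds in which the IS's cached value is off by more than a relative tolerance. The error criterion, the `tolerance` parameter, and the drift rule are assumptions; the paper does not define them precisely.

```python
def replay(response_times, fraction, tolerance):
    """Replay a per-second series of SP response times (hypothetical harness).

    fraction:  BasicValue as a fraction of the average response time
    tolerance: relative gap above which an IS lookup counts as wrong (assumption)
    Returns (lookup_error_rate, messages_per_minute).
    """
    basic_value = fraction * sum(response_times) / len(response_times)
    reported = response_times[0]  # the initial registration is always accurate
    messages = 1
    errors = 0
    for actual in response_times[1:]:
        # The SP reports only when the value has drifted by more than BasicValue.
        if abs(actual - reported) > basic_value:
            reported = actual
            messages += 1
        # A lookup at the IS this second returns the last reported value.
        if abs(actual - reported) > tolerance * actual:
            errors += 1
    minutes = len(response_times) / 60
    return errors / len(response_times), messages / minutes
```

With a step from 40 s to 60 s halfway through a two-minute trace, BasicValue at 10% of the mean triggers one extra report and no errors, whereas BasicValue at 100% of the mean sends no update and leaves the IS wrong for half of the trace.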

Fig. 4. Looking-up Error Rate

Fig. 5. Communication Load

5 Conclusions and Future Works
In this paper, we propose HH-MDS, a QoS-aware domain divided information service, as an important component of our service grid platform. We propose four types of protocols to construct HH-MDS, which provides distributed management and distributed information searching. A three-level QoS framework for HH-MDS is discussed to guarantee service usability and information consistency and to improve query performance. In future work, we will study the characteristics of all kinds of services in depth and propose a more complete statistical model of the serving capability of grid services.


References
[1] I. Foster, C. Kesselman, and S. Tuecke, “The Anatomy of the Grid”, Intl. Journal of Supercomputer Applications, 2001.
[2] The Web Services Industry Portal, http://www.webservices.org/.
[3] H. Ludwig, A. Keller, A. Dan, and R. King, “A Service Level Agreement Language for Dynamic Electronic Services”, Proceedings of the 4th IEEE Int’l Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems, IEEE, 2002.
[4] R. Schollmeier, “A Definition of Peer-to-Peer Networking for the Classification of Peer-to-Peer Architectures and Applications”, Proceedings of the First International Conference on Peer-to-Peer Computing (P2P’01), IEEE, 2002.
[5] M. Kelaskar, V. Matossian, P. Mehra, D. Paul, and M. Prashar, “A Study of Discovery Mechanisms for Peer-to-Peer Applications”, Proceedings of CCGrid’02, pp. 414-415.
[6] G. Aloisio, M. Cafaro, I. Epicoco, and S. Fiore, “Analysis of the Globus Toolkit Grid Information Service”, Technical report GridLab-10-D.1-0001-GIS_Analysis, GridLab project, http://www.gridlab.org/Resources/Deliverables/D10.1.pdf.
[7] K. Czajkowski, S. Fitzgerald, I. Foster, and C. Kesselman, “Grid Information Services for Distributed Resource Sharing”, Proceedings of 10th IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), 2001.
[8] S. Fisher, “Relational Model for Information and Monitoring”, Technical Report GWD-Perf-7-1, GGF, 2001.
[9] M. Litzkow, M. Livny, and M. Mutka, “Condor – A Hunter of Idle Workstations”, Proceedings of the 8th International Conference of Distributed Computing Systems, pp. 104-111, June 1988.
[10] J. Al-Ali, F. Rana, W. Walker, S. Jha, and S. Sohail, “G-QoSM: Grid Service Discovery Using QoS Properties”, Computing and Informatics Journal, Special Issue on Grid Computing, Institute of Informatics, Slovak Academy of Sciences, Slovakia, 21(4), pp. 363-382, 2002.
[11] X. Zhang, L. Freschl, and M. Schopf, “A Performance Study of Monitoring and Information Services for Distributed Systems”, Proceedings of HPDC’03, 2003.
[12] DataGrid, DataGrid Information and Monitoring Services Architecture: Design, Requirements and Evaluation Criteria, Technical Report, 2002.
[13] W. E. Johnston, D. Gannon, and B. Nitzberg, “Grids as Production Computing Environments: The Engineering Aspects of NASA’s Information Power Grid”, Proceedings of 8th IEEE Symposium on High Performance Distributed Computing, 1999.
[14] I. Foster and C. Kesselman, “Globus: A Metacomputing Infrastructure Toolkit”, International Journal of Supercomputer Applications, Vol. 11, No. 2, pp. 115-128, 1997.
[15] C. Li, G. Peng, K. Gopalan, and T. Chiueh, “Performance Guarantees for Cluster-Based Internet Services”, Proceedings of CCGrid’03, IEEE, 2003.

Grid Service Semigroup and Its Workflow Model

Yu Tang¹, Haifang Zhou², Kaitao He³, Luo Chen¹, and Ning Jing¹

¹ School of Electronic Science and Engineering, National University of Defense Technology, Changsha, Hunan, P.R.China
{luochen, ningjing}@nudt.edu.cn
² School of Computer, National University of Defense Technology, Changsha, Hunan, P.R.China
³ China Geological Survey, Beijing, P.R.China

Abstract. OGSA defines a Grid service as a web service that provides a set of well-defined interfaces and follows specific conventions. To classify different Grid services and describe their relations, we present the Grid Service SemiGroup (GSSG), based on group theory. Moreover, we propose a novel concept, the meta-service, based on the definition of the generating element of a cyclic monoid. To meet the integration and collaboration demands of distributed and heterogeneous Grid services, we introduce additional elements such as time and resource taxonomy to extend the basic Petri net for workflow modeling. We propose a new workflow model for GSSGs, named Grid Service/Resource Net (GSRN). To analyze and evaluate GSRN, we introduce new analysis methods based on graph theory that complement the traditional analysis methods of basic Petri nets. An application project demonstrates the practicability and effectiveness of GSRN.

1 Introduction

The Open Grid Services Architecture (OGSA) defines the Grid service as a novel technology for implementing resource sharing and cooperation, and it has become a focus of both research and web-based applications. A Grid service is a web service that provides a set of well-defined interfaces and follows specific conventions. The interfaces address discovery, dynamic service creation, lifetime management, notification, and manageability; the conventions address naming and upgradeability [1,2]. Grid services belong to different organizations according to their purposes, developers and service types, and they should be classified into different Grid service sets. Since relations exist among these Grid service sets, a mechanism is needed to describe them. To the best of our knowledge, this problem has not received appropriate attention. We therefore propose a novel concept and framework called the Grid service semigroup (GSSG) to classify different Grid service sets and describe their relations. In addition, we present the meta-service for the first time, based on the definition of the generating element of a cyclic monoid [3].

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 581–589, 2004. © Springer-Verlag Berlin Heidelberg 2004


On the other hand, the integration and cooperation of different Grid services are indispensable for most web-based applications. Grid service cooperation forms a service chain, which can be described by a workflow model. There are many ways to define and describe workflow models, such as the WfMC definition language [4], RAD graphs and EPCM models. Among these, the Petri net has been a focus of research and application. However, the basic Petri net cannot model dynamic, timed and conditional workflows [5-7]. We therefore introduce additional elements, such as time, condition and resource taxonomy, to extend the basic Petri net, and we propose a new workflow model named Grid Service/Resource Net (GSRN). Because GSRN is an extended Petri-net-based workflow model for GSSGs, we introduce new algorithms and methods based on graph theory [3, 8] that complement traditional Petri net methods for analyzing and evaluating GSRN. The remainder of this paper is organized as follows. Section 2 introduces the definitions and concepts of GSSG. Section 3 is devoted to the definitions and related rules of GSRN, the extended Petri-net-based workflow model for GSSGs. Section 4 discusses new methods for analyzing and evaluating GSRN in detail. Section 5 illustrates GSRN by means of an application example. Finally, Section 6 provides some concluding remarks.

2 Grid Service Semigroup

A Grid service can connect to and invoke other Grid services through standard interfaces to implement sharing and integration. The connecting and invoking relations among Grid services can be regarded as a binary operation, which we call Join. Grid service sets under the Join operation turn out to be similar to semigroups [3]. We therefore propose the Grid Service SemiGroup (GSSG) and derive some useful definitions and theorems for determining the structural similarity of GSSGs.
Definition 1: A semigroup $(S,*)$ is a nonempty set $S$ with a binary operation $*$ such that $\forall a,b,c \in S$: $a*b \in S$ and $(a*b)*c = a*(b*c)$. Generally, the symbol $*$ can be omitted, i.e. $a*b = ab$.
Definition 2: A semigroup $(S,*)$ is a monoid if $\exists e \in S$ such that $\forall a \in S$, $e*a = a*e = a$. A monoid is denoted by $(S,*,e)$, and such an $e$ is called the identity.
Definition 3: A monoid $(S,*,e)$ is a cyclic monoid if $\exists h \in S$ such that $\forall a \in S$, $a = h^n$ for some integer $n \ge 0$ (with $h^0 = e$). Such an $h$ is called a generating element.
Definition 4: Join, denoted by $+$, is a binary operation that describes the connecting and invoking relations between any two Grid services.
Definition 5: A Grid service semigroup (GSSG) is a semigroup $(GS,+)$ in which $GS$ is a set of Grid services, i.e. $\forall gs_1, gs_2, gs_3 \in GS$: $gs_1 + gs_2 \in GS$ and $(gs_1+gs_2)+gs_3 = gs_1+(gs_2+gs_3)$.
Definition 6: An empty service is a service that has no operation and no function.
Definition 7: A Grid service monoid is a GSSG $(GS,+)$ that contains an empty service $e$ such that $\forall gs \in GS$, $e + gs = gs + e = gs$.


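Definition 8: A meta-service $ms$ is a basic Grid service such that $GS = \{ms^n \mid n \ge 0\}$ (with $ms^0 = e$, the empty service) and $(GS,+,e)$ is a Grid service monoid; that is, $ms$ is a generating element of a cyclic Grid service monoid.
Definition 9: Let $(GS,+)$ be a GSSG and $T \subseteq GS$. If $(T,+)$ is a GSSG, then $T$ is a subGSSG of $(GS,+)$.

Definition 10: Given two GSSGs $(GS_1,+_1)$ and $(GS_2,+_2)$, a map $\varphi: GS_1 \to GS_2$ is a homomorphism if $\forall a,b \in GS_1$, $\varphi(a +_1 b) = \varphi(a) +_2 \varphi(b)$. Monomorphisms, epimorphisms and isomorphisms between GSSGs are defined as injective, surjective and bijective homomorphisms, respectively. Different GSSGs may share elements or have a similar structure, so homomorphisms are necessary and useful for determining the structural similarity of GSSGs. The related theorems build on Definition 10 and will be discussed in our subsequent papers.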
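On a carrier set small enough to enumerate, Definitions 1, 2, 3 and 10 become finite, checkable properties. The sketch below is a hypothetical illustration and not part of the GSSG framework. It brute-forces closure and associativity, the identity, generating elements and the homomorphism condition over a finite operation. Because real Grid service sets are not enumerated this way, it uses integers modulo 4 under addition as a stand-in carrier for Join. All function names are our own.

```python
from itertools import product
from typing import Callable, Hashable, Iterable, Optional, Set

Op = Callable[[Hashable, Hashable], Hashable]


def is_semigroup(elems: Iterable[Hashable], op: Op) -> bool:
    """Definition 1: nonempty, closed under op, and op is associative."""
    s = set(elems)
    if not s:
        return False
    if any(op(a, b) not in s for a, b in product(s, repeat=2)):
        return False
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(s, repeat=3))


def identity(elems: Iterable[Hashable], op: Op) -> Optional[Hashable]:
    """Definition 2: return e with e*a == a*e == a for every a, else None."""
    s = set(elems)
    for e in s:
        if all(op(e, a) == a and op(a, e) == a for a in s):
            return e
    return None


def generating_elements(elems: Iterable[Hashable], op: Op) -> Set[Hashable]:
    """Definition 3: elements h whose powers e, h, h*h, ... cover the monoid."""
    s = set(elems)
    e = identity(s, op)
    if e is None:
        return set()  # not a monoid, so there is no generating element
    gens = set()
    for h in s:
        powers, x = {e}, e
        while True:
            x = op(x, h)      # next power: h^(n+1) = h^n * h
            if x in powers:   # the power sequence has started to repeat
                break
            powers.add(x)
        if powers == s:
            gens.add(h)
    return gens


def is_homomorphism(phi: Callable[[Hashable], Hashable],
                    elems1: Iterable[Hashable], op1: Op, op2: Op) -> bool:
    """Definition 10: phi(a op1 b) == phi(a) op2 phi(b) for all a, b."""
    s = set(elems1)
    return all(phi(op1(a, b)) == op2(phi(a), phi(b))
               for a, b in product(s, repeat=2))


add4 = lambda a, b: (a + b) % 4   # stand-in for Join on a 4-element carrier
add2 = lambda a, b: (a + b) % 2
print(is_semigroup(range(4), add4), identity(range(4), add4))  # True 0
print(generating_elements(range(4), add4))                      # {1, 3}
print(is_homomorphism(lambda x: x % 2, range(4), add4, add2))   # True
```

Here reduction modulo 2 is a homomorphism from the 4-element monoid onto the 2-element one (an epimorphism in the sense of Definition 10). Reduction modulo 3 fails the check.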

3 Grid Service Semigroup Workflow Model: GSRN

We extend the basic Petri net to describe and model the workflows of GSSGs. The basic definitions and concepts of Petri nets are given in [5-7].
Definition 11: A Grid Service/Resource Net (GSRN) is an extended Petri net, i.e. a tuple with the following components:
- P is a finite set of places, consisting of a set of resources (data, information, etc.) and a set of Grid services.
- T is a finite set of transitions representing the activities.
- F is a set of flow relations.
- K is a place capacity function.
- CLR is a resource taxonomy function.
- CLS is a service taxonomy function, under which each class of Grid services is a GSSG.
- AC is a flow relation markup function.
- CN is a condition restriction function on F.
- TM is a time function on T.
- W is a weight function.
- M is a marking function: M denotes the system marking of the whole net, and $M_0$ is the initial marking.
GSRNs can be classified as fixed time-delay nets and unfixed time-delay nets. In a fixed time-delay net, each transition has a fixed time value. In an unfixed time-delay net, each transition is assigned a range of values, i.e. each transition has a scheduled execution time that is decided by the practical flow. If a transition is scheduled to execute at time b, its actual execution time t lies within the range assigned to that transition.


Definition 12: The pre-set of a GSRN place or transition $x$ is the set ${}^{\bullet}x = \{y \mid (y,x) \in F\}$, and the post-set of $x$ is the set $x^{\bullet} = \{y \mid (x,y) \in F\}$.
A GSRN is a directed graph composed of places, transitions and arcs. Tokens (black dots) mark the distribution of resources and reside in places. Arcs express the flow relation between places and transitions. Among the extended elements, time elements are marked on transitions, and conditions are attached to the corresponding transitions or places. A GSRN runs by firing transitions. A transition can fire only if its input places hold the corresponding tokens (markings). When a transition fires, the number of tokens in its pre-set decreases and the number of tokens in its post-set increases accordingly.
Definition 13: A transition $t \in T$ is enabled under $M$, in symbols $M[t\rangle$, iff $\forall p \in {}^{\bullet}t: M(p) \ge W(p,t)$ and $\forall p \in t^{\bullet}: M(p) + W(t,p) \le K(p)$. If $M[t\rangle$ holds, the transition $t$ may occur (or fire), resulting in a new marking $M'$, in symbols $M[t\rangle M'$, with:
$$M'(p) = \begin{cases} M(p) - W(p,t), & p \in {}^{\bullet}t \setminus t^{\bullet} \\ M(p) + W(t,p), & p \in t^{\bullet} \setminus {}^{\bullet}t \\ M(p) - W(p,t) + W(t,p), & p \in {}^{\bullet}t \cap t^{\bullet} \\ M(p), & \text{otherwise.} \end{cases}$$
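A few lines of code can capture the enabling and firing rule of Definition 13. The class below is a hypothetical, untimed illustration. It stores the weights $W(p,t)$ and $W(t,p)$ as per-transition dictionaries and the capacities $K(p)$ as a dictionary. It follows the standard place/transition rule with capacities from [5-7]: every input place must hold at least $W(p,t)$ tokens, and every output place must be able to receive $W(t,p)$ more tokens without exceeding $K(p)$. It ignores the GSRN extensions CN and TM, and the place and transition names are made up.

```python
import math
from typing import Dict

Marking = Dict[str, int]
Arcs = Dict[str, Dict[str, int]]   # transition -> {place: weight}


class PTNet:
    """Untimed place/transition net with arc weights and place capacities."""

    def __init__(self, pre: Arcs, post: Arcs, capacity: Dict[str, int]) -> None:
        self.pre = pre            # pre[t][p]  = W(p, t): tokens consumed from p
        self.post = post          # post[t][p] = W(t, p): tokens produced into p
        self.capacity = capacity  # K(p); places absent here are unbounded

    def enabled(self, m: Marking, t: str) -> bool:
        # Every input place must hold at least W(p, t) tokens ...
        if any(m.get(p, 0) < w for p, w in self.pre.get(t, {}).items()):
            return False
        # ... and every output place must have room for W(t, p) more tokens.
        return all(m.get(p, 0) + w <= self.capacity.get(p, math.inf)
                   for p, w in self.post.get(t, {}).items())

    def fire(self, m: Marking, t: str) -> Marking:
        """Return the successor marking M' with M[t>M'; M itself is unchanged."""
        if not self.enabled(m, t):
            raise ValueError(f"transition {t!r} is not enabled")
        m2 = dict(m)
        for p, w in self.pre.get(t, {}).items():
            m2[p] = m2.get(p, 0) - w
        for p, w in self.post.get(t, {}).items():
            m2[p] = m2.get(p, 0) + w
        return m2


net = PTNet(pre={"provide_map": {"request": 1}},
            post={"provide_map": {"area_map": 1}},
            capacity={"area_map": 1})
m0 = {"request": 2}
m1 = net.fire(m0, "provide_map")
print(m1)                              # {'request': 1, 'area_map': 1}
print(net.enabled(m1, "provide_map"))  # False: area_map is already full
```

Some formulations apply the capacity check only after input tokens are removed, i.e. $M(p) - W(p,t) + W(t,p) \le K(p)$ for places in both sets. The stricter check used here is one common choice.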

The transition structures describe the dependency relations among different resources and services. The basic transition structures of GSRN fall into six types (shown in Fig. 1), and workflow models can be composed from these six basic structures. The basic place structures are similar to the basic transition structures.

Fig. 1. Basic transition structures of GSRN

4 Analysis and Evaluation Methods of GSRN

The characteristics of GSRN are very important for analyzing and evaluating it. The main characteristics of GSRN include boundedness, reachability and liveness, and their definitions are the same as those for the basic Petri net (see [5-7]). Because GSRN is an extended Petri-net-based workflow model, traditional Petri net methods should be combined with new algorithms based on graph theory to form a new analysis and evaluation system for GSRN. Two new analysis and evaluation methods follow.

4.1 Resource Matching Algorithms of GSRN

In GSRN, different sub-flows may request the same resources and Grid services, but a resource cannot meet all demands at the same time. The conflict between resources and requests therefore reduces to a resource matching problem [3, 8]. Based on graph theory, we focus on the maximal matching algorithm for bipartite graphs in GSRN. Algorithms for optimal bipartite matching [3, 9] will be discussed in another paper.
Definition 14: Given a graph $G$ and a subset $M$ of its edges, if no two edges in $M$ share a vertex, then $M$ is a matching of $G$. The vertices incident to edges of $M$ are called saturated points, and the other vertices are called non-saturated points.
Definition 15: Given a matching $M$ of graph $G = (V, E)$, if $|M'| \le |M|$ for every matching $M'$ of $G$ (where $|M|$ is the number of edges in $M$), then $M$ is a maximal matching of $G$.
Definition 16: Given a matching $M$ of graph $G = (V, E)$, an interleaved (alternating) path is a path whose edges alternately belong to $M$ and do not belong to $M$.
Definition 17: Given an interleaved path $P$ of a matching $M$ of $G$, if both end vertices of $P$ are non-saturated points, then $P$ is called an augmenting path.
Theorem 1: $M$ is a maximal matching of $G$ iff $G$ contains no augmenting path with respect to $M$.
Proof: See [3].
Theorem 1 is the foundation of the bipartite maximal matching algorithms, and we use the Hungarian algorithm [3] to obtain the maximal matching of GSRN.
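The augmenting-path search behind Theorem 1 can be sketched in a few lines. This is a minimal, assumption-laden illustration. Left vertices stand for sub-flow requests and right vertices for resources or services, and an edge means a request can use that resource. All vertex names are hypothetical. The code follows the usual unweighted augmenting-path form of the algorithm (often attributed to Kuhn), not the weighted assignment variant, and returns a matching of maximum size in the sense of Definition 15.

```python
from typing import Dict, Hashable, List, Set


def max_bipartite_matching(adj: Dict[Hashable, List[Hashable]]) -> Dict[Hashable, Hashable]:
    """Return a maximum matching as {right_vertex: left_vertex}.

    adj maps each left vertex (a request) to the right vertices (resources)
    it may be matched with.
    """
    match_of: Dict[Hashable, Hashable] = {}  # right vertex -> matched left vertex

    def augment(u: Hashable, visited: Set[Hashable]) -> bool:
        # Look for an augmenting path starting at the unsaturated left vertex u.
        for v in adj.get(u, []):
            if v in visited:
                continue
            visited.add(v)
            # If v is unsaturated, u-v completes an augmenting path; otherwise
            # try to re-route v's current partner along an alternating path.
            if v not in match_of or augment(match_of[v], visited):
                match_of[v] = u  # flip matched/unmatched edges along the path
                return True
        return False

    for u in adj:
        augment(u, set())
    return match_of


requests = {
    "subflow_A": ["map_service", "traffic_db"],
    "subflow_B": ["map_service"],
    "subflow_C": ["traffic_db", "geo_db"],
}
print(max_bipartite_matching(requests))
# {'map_service': 'subflow_B', 'traffic_db': 'subflow_A', 'geo_db': 'subflow_C'}
```

When subflow_B arrives, map_service is already taken by subflow_A. The search therefore re-routes subflow_A to traffic_db along an alternating path, so all three requests are served.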

4.2 Linear Temporal Inference Rules

As defined in Section 3, time is an important element in evaluating GSRN, so we derive some linear temporal inference rules as an evaluation method [10, 11]. Before giving the rules, we define symbols for the transitions of GSRN, their scheduled times, and their actual execution times. Based on the transforming structures shown in Fig. 2 and the definition of the time element, we propose the following linear temporal inference rules [11].
1. Rule 1 (sequence): Based on the sequence structure, we obtain the following equations.


Fig. 2. Transforming structures of linear temporal inference

By (1), (2) and (3), we then obtain

Rule 1:

The derivations of the other rules are similar to that of Rule 1 and are omitted in this paper.
2. Rule 2 (parallel):
3. Rule 3 (free choice):
4. Rule 4 (conditional choice):
5. Rule 5 (loop): (k is the number of loop iterations).

These rules cannot be applied to all GSRN models. Their applicability conditions are the same as those discussed in [11].
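This sketch does not reproduce the rules of [11]. It is a hypothetical illustration of how temporal bounds compose across the five structure types. Following the unfixed time-delay net of Section 3, it assumes that each transition's time is an interval [lo, hi], and it applies common reduction conventions, all of them assumptions here: sequential parts add, a parallel join waits for its slowest branch, a choice spans the fastest to the slowest branch, and a loop run k times scales its body by k.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Time range [lo, hi] of a transition or reduced sub-net (assumed model)."""
    lo: float
    hi: float


def sequence(*parts: Interval) -> Interval:
    # Assumption: parts run one after another, so both bounds add.
    return Interval(sum(p.lo for p in parts), sum(p.hi for p in parts))


def parallel(*branches: Interval) -> Interval:
    # Assumption: the join waits for the slowest branch.
    return Interval(max(b.lo for b in branches), max(b.hi for b in branches))


def choice(*branches: Interval) -> Interval:
    # Assumption: exactly one branch runs, so the range spans all branches.
    return Interval(min(b.lo for b in branches), max(b.hi for b in branches))


def loop(body: Interval, k: int) -> Interval:
    # Assumption: the loop body executes exactly k times.
    if k < 0:
        raise ValueError("k must be non-negative")
    return Interval(k * body.lo, k * body.hi)


total = sequence(Interval(1, 2),
                 parallel(Interval(2, 3), Interval(1, 5)),
                 loop(Interval(0.5, 1), 2))
print(total)  # Interval(lo=4.0, hi=9)
```

This sketch does not distinguish free choice from conditional choice. It also treats every structure as already reduced to a single interval, which matches the nested reduction suggested by the transforming structures in Fig. 2.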

5 A GSRN Example

In our research project, we use GSRN to model the workflow of layout planning for the area near a bridge. The application process is as follows:
1. The urban planning bureau submits an application request, and the workflow begins.
2. The mapping bureau provides the area map.
3. The traffic bureau provides the related traffic data, the geological bureau provides the related geological data, and business enterprises provide the related business data.
4. The corresponding services process and integrate the map and the various data.
5. Finally, the results are returned to the urban planning bureau, and the workflow finishes.
Fig. 3 shows the corresponding GSRN model, and the following tables explain the meaning of the resource and service taxonomy elements in this model. Based on the GSRN model in Fig. 3, Fig. 4 demonstrates the application flow.
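The dependency order of the five steps above can be checked mechanically. The sketch below is not the GSRN model of Fig. 3; it is a plain dependency graph with hypothetical task names. Going by the order of the list, it assumes that the three data providers in step 3 all wait for the area map and run in parallel with one another. It uses Python's standard `graphlib` module (Python 3.9+) to group the tasks into stages whose members could run concurrently.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical task names; each task maps to the tasks it waits for.
workflow: Dict[str, Set[str]] = {
    "planning_request": set(),                          # step 1
    "area_map": {"planning_request"},                   # step 2
    "traffic_data": {"area_map"},                       # step 3
    "geological_data": {"area_map"},                    # step 3
    "business_data": {"area_map"},                      # step 3
    "integrate": {"area_map", "traffic_data",
                  "geological_data", "business_data"},  # step 4
    "return_result": {"integrate"},                     # step 5
}


def stages(graph: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into stages; tasks within a stage do not depend on each other."""
    ts = TopologicalSorter(graph)
    ts.prepare()                        # raises CycleError if the graph is cyclic
    result = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # all tasks whose predecessors are done
        result.append(ready)
        ts.done(*ready)
    return result


for i, stage in enumerate(stages(workflow), start=1):
    print(i, stage)
# 1 ['planning_request']
# 2 ['area_map']
# 3 ['business_data', 'geological_data', 'traffic_data']
# 4 ['integrate']
# 5 ['return_result']
```

Stage 3 corresponds to a parallel transition structure, and the integration step is the join that waits for all three branches.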


Fig. 3. GSRN model for layout planning

As the experimental results show, GSRN is effective and practical for modeling Grid service workflows. Based on GSRN, we can combine and aggregate distributed Grid services that belong to different GSSGs to fulfill large tasks.

Fig. 4. Application flow of GSRN example

6 Conclusion

In response to application demands, we propose GSSG and its related theorems based on group theory, and we present a new concept, the meta-service. To describe and model the workflows of Grid services in different GSSGs, we propose a novel extended Petri-net-based workflow model (GSRN) and discuss it in detail. Moreover, we introduce new algorithms and methods based on graph theory to analyze and evaluate GSRN, and an application example verifies the practicability of GSRN. GSSG and GSRN are novel concepts and technologies, and we will refine and extend their definitions, theorems and algorithms in the future. We will also focus on further key technologies, such as additional theorems of GSSG, rules for GSRN model simplification, new theories and methods for analyzing and evaluating GSRN, and optimal resource matching algorithms for general graphs.

Acknowledgements. This work is supported in part by the National High Technology Research and Development 863 Program of China (Grant Nos. 2002AA104220, 2002AA131010, 2002AA134010).

References
1. I. Foster, C. Kesselman et al. The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. June 2002. See http://www.gridforum.org/ogsiwg/drafts/ogsa_draft2.9_2002-06-22.pdf.
2. S. Tuecke, K. Czajkowski et al. Grid Service Specification. Open Grid Service Infrastructure WG, Global Grid Forum, Draft 2, July 2002. See http://www.globus.org.
3. Y. Q. Dai, G. Z. Hu, and W. Chen. Graph Theory and Algebra Structure (in Chinese). Tsinghua University Press, Beijing, China, 1999.
4. D. Hollingsworth. Workflow Management Coalition: The Workflow Reference Model. Document Number WFMC-TC00-1003, Brussels, 1994.
5. T. Murata. Petri Nets: Properties, Analysis and Applications. Proceedings of the IEEE, 77(4), pages 541-580, April 1989.
6. J. Peterson. Petri Net Theory and the Modeling of Systems. Prentice Hall, Englewood Cliffs, New Jersey, 1981.
7. C. Y. Yuan. Petri Net Theory (in Chinese). Publishing House of Electronics Industry, Beijing, China, 1998.
8. R. Johnsonbaugh. Discrete Mathematics, 4th Edition. Prentice Hall, Englewood Cliffs, New Jersey, 1997.
9. J. Edmonds. Paths, trees, and flowers. Canadian J. Math., 17:449-467, 1965.
10. M. Silva, E. Teruel, and J. M. Colom. Linear algebraic and linear programming techniques for the analysis of place/transition net systems. In Lectures on Petri Nets I: Basic Models, W. Reisig and G. Rozenberg, Eds. Vol. 1491, Lecture Notes in Computer Science, pages 309–373, Springer-Verlag, 1998.
11. T. Liu, C. Lin, and W. D. Liu. Linear Temporal Inference of Workflow Management System Based on Timed Petri Net Models (in Chinese), ACTA ELECTRONICA SINICA, 30(2):245-248, Feb 2002.

A Design of Distributed Simulation Based on GT3 Core

Tong Zhang, Chuanfu Zhang, Yunsheng Liu, and Yabing Zha

College of Mechatronics Engineering and Automation, National University of Defense Technology, Changsha 410073

Abstract. OGSA is aimed at coordinated resource sharing in distributed, heterogeneous, dynamic environments, and it effectively supports resource management for distributed simulation. GT3 Core provides a service container structure, on which we have designed a new mode of distributed simulation system. The new mode separates simulation resources from simulation applications and provides a simulation server responsible for organizing the simulation resources. The server provides a service index for higher-level simulation applications and enables interaction among them. Under this new simulation mode, we developed a combat simulation application as a prototype. It achieves good reusability and portability of simulation resources and supports heterogeneous, cross-platform application development.

1 Introduction

Building on technologies from the Grid [1, 2] and Web services [3], OGSA [4] has emerged as the most important Grid architecture. It defines a uniform exposed service semantics, the Grid service, and provides well-defined interfaces for the components of the Globus Toolkit (GT) [5]. GT3 is built on a new core infrastructure compliant with OGSA and is an open source implementation of OGSI [6]. GT3 Core [7] offers a runtime environment that hosts Grid services and mediates between the application and the underlying network and transport protocol engines. Distributed simulation consists of geographically distributed simulators interconnected via LAN or WAN, and it aims at interoperability and reusability. In current systems based on the High Level Architecture (HLA) [8], reuse and cooperation are conditional and lack wide applicability, so these systems can hardly satisfy increasing simulation requirements. They focus more on the operations in applications than on resource management. OGSA provides a new method for building and managing distributed systems. Its service-oriented mechanism allows resources to be encapsulated in a more standard and effective way. OGSA can therefore serve as middleware between simulation resources and applications and give the system strong support.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 590–596, 2004. © Springer-Verlag Berlin Heidelberg 2004


2 Background

2.1 The Framework of Web Services

Web Services are one of the foundations of the OGSA architecture. A Web service describes a collection of operations that are network-accessible through standardized XML messaging. Web Services are intended to facilitate communication between computer programs and build on standards such as HTTP, XML, SOAP, WSDL and UDDI. They define techniques for describing software components, methods for accessing them, and discovery methods that enable the identification of service providers. OGSA takes great advantage of Web Services. First, dynamic discovery and composition of services in a heterogeneous environment require mechanisms for registering and discovering interface definitions and endpoint implementation descriptions, and for dynamically generating proxies based on bindings for specific interfaces. WSDL supports this requirement by providing a standard mechanism for defining interfaces separately from their embodiment within a particular binding. Second, because Web services mechanisms are widely adopted, a framework based on Web services can exploit numerous tools and extant services [4].

2.2 GT3 Core – A Grid Service Container [7]

The model of GT3 Core is based on the notion of a container that hosts various logic components. These components can be deployed into the container with varying quality of service (QoS) and behaviors. The container must be flexible enough to be deployed into a wide range of heterogeneous hosting environments. Compared with conventional Web services toolkits, it provides three major functions. First, it supports lightweight service introspection and discovery, with information flowing in both pull and push modes. Second, it provides dynamic deployment and soft-state management of stateful service instances that can be globally referenced using an extensible resolution scheme. Third, it has a transport-independent Grid Security Infrastructure (GSI) supporting credential delegation, message signing, encryption, and authorization.

3 Key Points in GT3 Core

3.1 Service Data

Service data refers to descriptive information about Grid service instances and supports service discovery, introspection, and monitoring. It is a structured collection of information: each instance has a set of service data elements (SDEs) of different types. OGSI defines extensible operations for querying and updating SDEs and for subscribing to notifications of changes in them. The findServiceData operation of the GridService interface is used for service discovery. The essence of service discovery is to obtain the GSH (Grid Service Handle) of a desired service. A Grid service that supports service discovery is called


4 Distributed Simulation Based on GT3 Core

4.1 Design of the Framework

We consider a combat simulation scenario in a two-dimensional world, where a missile is launched from the ground to fire at a plane. The combat is over when the missile hits the plane. The simulation has three members: the plane member, the missile member and the manager member. The manager controls the process of the whole combat. The members are distributed on different computers. Based on the members' requirements, the resources of the entity model and the manager model are abstracted as services. The entity model calculates the state of an entity in the combat and holds all the necessary information about it. The manager model takes charge of simulation time to support the manager member.

Fig. 2. Logical Structure of the Application

Figure 2 shows the logical structure of our application, which is based on a client-server model. The server is realized as a registry server holding simulation services, namely the simulation service container. The model resources in the simulation, including the entity model and the manager model, are encapsulated as Grid services. On the client side, three members are involved: plane, missile and manager. They complete their tasks using the underlying services and interact with one another through GT3 Core. The key point here is the separation of simulation resources and applications, which shows the advantage of a Grid-based system. This separation greatly reduces the coupling between systems and resources and facilitates the reuse of resources. Further, OGSA specifies interactions between services independently of any hosting environment, so the services are portable to heterogeneous platforms.

4.2 Design of the Server – Simulation Service Container

The server provides the clients with an index of all the services related to the simulation application. Through a registry service, it gathers all simulation services together logically, so that a client can look up the one it needs. Physically, the services remain distributed and are implemented in various local containers.


Simulation services registered in the server are factory services, including the entity factory and the manager factory. Each client has to create its own service instance from the proper factory. That instance then serves as one member in the simulation and performs its assigned task. Here, the concept of an instance is similar to that of a federate in an HLA federation [8]. Fig. 3 shows the relationship among the service container, the service provider and the client, and how the server provides the registered services to the clients. The content of the registry list can be defined as service data of the registry service, and clients can subscribe to notifications of changes in it. GT3 Core supports all these operations.
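The registry and factory interaction described above can be sketched without GT3. The following is a hypothetical in-memory stand-in, not the GT3 Core or OGSI API. `SimulationRegistry`, `MemberInstance`, their methods and the `gsh://` handle format are all invented for illustration. Providers register factories, a subscribed client is notified whenever the registry list changes, and each client creates its own instance with a unique handle. This mirrors how each instance plays the role of one simulation member.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class MemberInstance:
    handle: str  # stands in for the instance's GSH
    kind: str


class SimulationRegistry:
    """Hypothetical stand-in for the simulation service container (not GT3 API)."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[str], Any]] = {}
        self._subscribers: List[Callable[[List[str]], None]] = []
        self._counter = itertools.count(1)

    def register(self, name: str, factory: Callable[[str], Any]) -> None:
        """A service provider registers a factory; subscribers get the new list."""
        self._factories[name] = factory
        for notify in self._subscribers:
            notify(self.registry_list())

    def registry_list(self) -> List[str]:
        # Plays the role of the registry service's service data.
        return sorted(self._factories)

    def subscribe(self, callback: Callable[[List[str]], None]) -> None:
        self._subscribers.append(callback)

    def create_instance(self, name: str) -> Any:
        """A client looks up a factory and creates its own, uniquely named instance."""
        if name not in self._factories:
            raise KeyError(f"no factory registered under {name!r}")
        handle = f"gsh://simulation/{name}/{next(self._counter)}"
        return self._factories[name](handle)


registry = SimulationRegistry()
updates: List[List[str]] = []
registry.subscribe(updates.append)
registry.register("EntityFactory", lambda h: MemberInstance(h, "entity"))
registry.register("ManagerFactory", lambda h: MemberInstance(h, "manager"))
plane = registry.create_instance("EntityFactory")
missile = registry.create_instance("EntityFactory")
print(plane.handle, missile.handle)
print(updates)  # [['EntityFactory'], ['EntityFactory', 'ManagerFactory']]
```

The plane and missile members share one entity factory but receive distinct handles, which is what lets them act as separate members, much like two federates.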

Fig. 3. Structure of the simulation server

4.3 Design of the Whole Process

Based on the structure of the above framework, the development of a simulation application can be summarized in the following steps:
Step 1: Define the concept model of the simulation application and specify the distributed tasks. Then abstract the required simulation services from the requirements.
Step 2: Design the server. Following the common structure of the simulation server, develop the different simulation services, deploy them in their local containers, and register them with the simulation service container. Design and implement the service interface and its service data under the mechanism of GT3 Core.
Step 3: Design the client. The client programs enable the use and interoperation of the simulation services. The simulation members execute these client programs to complete the whole simulation task.


5 Realization of the Application

5.1 Simulation Services in the Server

The manager service manages the distributed simulation, especially the advancement and management of simulation time, which is defined as its service data. The service interface provides related operations such as setting and getting the value of the time, and advancing the time as the simulation proceeds. This service must be able to send notifications to ensure synchronization in the simulation. The entity service describes the model of an entity participating in the simulation. Its service data is the state information, including the entity's position, time and entity ID. The position is a two-dimensional coordinate, and the entity ID is a GSH, which is a globally unique name. The interface defines operations that calculate the entity's position at the next moment from its current position and a calculation formula. To enable interoperation between different combat members, this service must also be able to send notifications.

5.2 Simulation Client

The realization of the three members in the combat simulation shows how a distributed task is executed through the use of, and interaction among, service instances. The manager client serves as a command center: it orders the start of the simulation and the advancement of simulation time. It collects information from all the members in the combat. Once it has confirmed that all the members have finished their tasks for the current moment, it advances the time to the next moment and sends notifications. The manager subscribes to notification messages from more than one member. Those messages arrive at unpredictable times and all invoke the same callback, so it is important to identify the source of each message and to guarantee that every message is received exactly once. Therefore, the entity ID identifies the notification source, and a boolean flag is kept for each member. The flag is set to true when that member's message has been sent, and false otherwise. Only when all the flags are true can the simulation time be advanced. The other two clients are quite similar in function. They first set their initial position and velocity and subscribe to the simulation time. When a notification arrives, they calculate the entity position for the next moment and send a position-change notification to the manager.
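The manager's time-advance logic described above keeps one flag per member, uses entity IDs to identify the source, and advances only when every flag is set. The sketch below illustrates it in a single process and is hypothetical throughout. The class, method and ID names are our own, and in the real application these calls would arrive through GT3 notification callbacks rather than as direct method calls.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Position = Tuple[float, float]


class ManagerClient:
    """Hypothetical sketch of the manager's time-advance logic."""

    def __init__(self, entity_ids: Iterable[str], step: float = 1.0) -> None:
        self.sim_time = 0.0
        self.step = step
        # One flag per member, keyed by entity ID (the notification source).
        self.flags: Dict[str, bool] = {eid: False for eid in entity_ids}
        self.positions: Dict[str, Position] = {}
        self.time_listeners: List[Callable[[float], None]] = []

    def on_notification(self, entity_id: str, position: Position) -> bool:
        """Shared callback for every member's position-change notification.

        Returns True if this notification completed the step and time advanced.
        """
        if entity_id not in self.flags:
            raise KeyError(f"unknown notification source {entity_id!r}")
        self.positions[entity_id] = position
        self.flags[entity_id] = True      # a duplicate simply leaves it True
        if not all(self.flags.values()):
            return False
        # Every member has reported: advance time, reset flags, notify members.
        self.sim_time += self.step
        for eid in self.flags:
            self.flags[eid] = False
        for notify in self.time_listeners:
            notify(self.sim_time)
        return True


manager = ManagerClient(["plane-gsh", "missile-gsh"], step=0.5)
ticks: List[float] = []
manager.time_listeners.append(ticks.append)
manager.on_notification("plane-gsh", (0.0, 10.0))
manager.on_notification("plane-gsh", (0.0, 10.0))   # duplicate: no advance
manager.on_notification("missile-gsh", (5.0, 0.0))  # last flag set: advance
print(manager.sim_time, ticks)  # 0.5 [0.5]
```

Because flags are keyed by entity ID rather than counted, a repeated notification from the same member cannot trigger a premature time advance.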

6 Conclusion

This paper uses Grid technology to build a distributed simulation environment and develops a simple combat simulation application. The new system architecture supports the reuse and standardization of simulation resources better than before, and it supports heterogeneity and portability across platforms. As Grid technology develops, research on combining the Grid with distributed simulation is expected to advance considerably, and such systems will become more powerful.


References
[1] I. Foster, C. Kesselman.: The Grid: Blueprint for a Future Computing Infrastructure. Morgan Kaufmann Publishers, San Francisco (1999)
[2] I. Foster, C. Kesselman, S. Tuecke.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications, Vol. 15. (2001) 200–222
[3] S. Graham et al.: Building Web Services with Java: Making Sense of XML, SOAP, WSDL, and UDDI. Sams Technical Publishing, Indianapolis, Ind. (2001)
[4] I. Foster, C. Kesselman, J. Nick, S. Tuecke.: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Globus Project, http://www.Globus.org/research/papers/ogsa.pdf (2002)
[5] I. Foster, C. Kesselman.: Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications, Vol. 11. (1997) 115–128
[6] S. Tuecke, K. Czajkowski, I. Foster, J. Frey, S. Graham, C. Kesselman, T. Maguire, T. Sandholm, D. Snelling, P. Vanderbilt.: Open Grid Services Infrastructure (OGSI) Version 1.0. http://www.ggf.org/ogsi-wg (2003)
[7] Thomas Sandholm, Jarek Gawor.: Globus Toolkit 3 Core – A Grid Service Container Framework. http://www-unix.globus.org/toolkit/3.0/ogsa/docs/gt3 core.pdf (2003)
[8] IEEE Standard for Modeling and Simulation (M&S) High Level Architecture (HLA) – Framework and Rules. IEEE Std 1516-2000. (2000)

A Policy-Based Service-Oriented Grid Architecture*

Xiangli Qu, Xuejun Yang, Chunmei Gui, and Weiwei Fan

School of Computer Science, National University of Defence Technology, Changsha, China, 410073

Abstract. Recently, a promising trend towards powerful and flexible Grid execution environments has been the adoption of a service-oriented infrastructure. Meanwhile, driven by requirements such as QoS, load balancing, security and scalability, the network paradigm is shifting from the current hardware-based, manually configured infrastructure to a programmable, automated, policy-based one. Based on these two observations, this paper proposes a policy-based service-oriented Grid architecture and outlines its basic model, primary components and corresponding functionalities.

Keywords: Grid, policy-based, service-oriented, small world

1 Introduction

The Grid concept was first introduced to enable resource sharing within disparate, geographically distant scientific collaborations [4],[5],[6]. In [3], Grid technologies and infrastructures are defined as supporting the sharing and coordinated use of diverse resources in dynamic, distributed “virtual organizations” (VOs). With the boom in Web Services, component-based programming and middleware technologies, the Grid is increasingly viewed as an extensible set of Grid services. Both e-commerce and e-science need to integrate services from distributed, heterogeneous, dynamic VOs [2]. Therefore, many service-oriented Grid infrastructures and solutions have been presented, and OGSA is a typical instance. OGSA “defines standard mechanisms for creating, naming, and discovering transient Grid service instances; provides location transparency and multiple protocol bindings for service instances; and supports integration with underlying native platform facilities” [2]. Some specific implementations of this architecture already exist, such as ICENI (Imperial College e-Science Networked Infrastructure) [9]. From the service-oriented point of view, network-enabled entities such as computational resources, storage resources, networks, programs, databases and media can all be classified as kinds of Grid services. These entities are diverse in nature, users have various requirements, the environment is dynamic, and QoS must ultimately be guaranteed. Finding the “best” service capable of meeting the needs of a user, or a community of users, is therefore inherently complex and challenging, and no single policy can handle every situation. It is thus of great importance to introduce dynamic, adaptive multi-policies into the whole infrastructure, for several purposes: to enable a transparent, organic and efficient composition of services, to realize the blueprint of a Semantic Service Grid, and to turn a loosely coupled system into a tightly coupled one, with security, scalability, fault tolerance, interoperability, etc. in mind. Meanwhile, today's networks are moving beyond simple, insecure, best-effort data communications towards policy-based infrastructures that enable advanced features such as dynamic traffic engineering, guaranteed bandwidth and secure traffic tunneling [1]. Driven by these two changes, we propose this policy-based service-oriented Grid architecture. The rest of the paper is organized as follows: Section 2 lists the targets of this architecture; Section 3 outlines the basic model; Section 4 details the components and corresponding functionalities; and Sections 5 and 6 conclude this paper with a brief summary and an outlook on future work, respectively.

* This paper is sponsored by Chinese 863 OS New Technologies Research 2003AA1Z2060.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 597–603, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Targets

The targets of the policy-based service-oriented grid architecture outlined in this paper are:
- High performance: the organization should be efficient, flexible and succinct.
- Context sensitivity: policies adjust dynamically according to network status, workload distribution and requirement variations.
- QoS capability: satisfying the needs of both service requesters and service providers, and providing differentiated services.
- Multi-protocol interoperability: enabling seamless cooperation between incompatible domains running different protocols.
- Scalability: allowing the infrastructure to scale flexibly.
- Security.
- Fault tolerance.
- Mechanism, not policy.

3 Basic Model

According to analyses of network behavior, network interaction patterns exhibit “small world” features [10],[11], with two characteristics: high clustering, and a small average shortest path between two random nodes (sometimes called the diameter of the network) that scales logarithmically with the number of nodes.

A Policy-Based Service-Oriented Grid Architecture


Taking this into account, we adopt a two-level hierarchical structure in this architecture, consisting of an inter-domain policy manager, a backup policy manager, and one edge policy manager per domain. The whole Grid infrastructure is divided into a number of relatively independent domains, each in the charge of a corresponding edge policy manager. The inter-domain policy manager is responsible for coordinating the domains, making system-wide policies and managing the edge policy managers. The backup policy manager, as its name implies, mainly serves as a backup for the inter-domain policy manager. The whole infrastructure is depicted in Fig. 1.

Fig. 1. Basic Model
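The two small-world characteristics cited in the basic model, high clustering and a short average path, can be measured directly on an interaction graph. The following Python sketch is illustrative only: the ring lattice, its size and the single added shortcut are hypothetical inputs, not data from this architecture. It computes the average clustering coefficient and the mean breadth-first-search hop distance, assuming an undirected, connected graph stored as adjacency sets.

```python
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Undirected ring of n nodes, each linked to its k nearest neighbours (k even)."""
    graph = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            u = (v + step) % n
            graph[v].add(u)
            graph[u].add(v)
    return graph


def clustering_coefficient(graph):
    """Average over nodes of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for v, nbrs in graph.items():
        if len(nbrs) < 2:
            continue  # nodes with fewer than two neighbours contribute 0
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        total += links / (len(nbrs) * (len(nbrs) - 1) / 2)
    return total / len(graph)


def average_path_length(graph):
    """Mean shortest-path hop count over all ordered node pairs (graph assumed connected)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            v = queue.popleft()
            for u in graph[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    lattice = ring_lattice(20, 4)
    print(clustering_coefficient(lattice))           # 0.5
    print(round(average_path_length(lattice), 3))    # 2.895
    shortcut = ring_lattice(20, 4)
    shortcut[0].add(10)
    shortcut[10].add(0)                              # one hypothetical long-range link
    print(round(average_path_length(shortcut), 3))
```

Adding the single long-range link strictly shortens the average path (the pair 0 and 10 drops from 5 hops to 1) while changing the clustering coefficient of only its two endpoints.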

4 Components and Corresponding Functionalities

Having outlined the basic model, we now describe the components and their corresponding functionalities.

4.1 Inter-domain Policy Manager

Generally speaking, a policy takes the form “If <condition(s)>, then <action(s)>”. A policy-based system thus consists of two parts: a set of conditions under which the policy applies, including application types, protocol bindings, QoS priorities, workload distributions and so on; and a set of actions that apply as a consequence of satisfying (or not satisfying) the conditions, including service matching, protocol selection, channel allocation, data migration and so on. Accordingly, the inter-domain policy manager is made up of a bundle of active and passive components: a policy maker, a policy base, a service repository, a multi-protocol interactor, a monitoring server, an auditing server and an event logger, as illustrated in Fig. 2.

Fig. 2. Inter-domain Policy Manager
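The “If <condition(s)>, then <action(s)>” form can be sketched as data plus a small evaluator. In this hypothetical Python example, conditions are predicates over a context dictionary, and when several policies match, the one with the highest priority wins. That conflict-resolution rule, the context keys (cpu_load, qos_class) and the action strings are assumptions for illustration; they are not part of the architecture or of the Policy Description Language.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Context = Dict[str, object]  # observed conditions, e.g. workload or QoS class


@dataclass
class Policy:
    name: str
    conditions: List[Callable[[Context], bool]]  # all must hold for the policy to apply
    actions: List[str]                           # what to do when it applies
    priority: int = 0                            # assumed tie-breaker between matching policies


def select_actions(policies: List[Policy], context: Context) -> List[str]:
    """Return the actions of the highest-priority policy whose conditions all hold."""
    matching = [p for p in policies if all(cond(context) for cond in p.conditions)]
    if not matching:
        return []
    return max(matching, key=lambda p: p.priority).actions


# Hypothetical policy base entries.
POLICIES = [
    Policy("overload", [lambda c: c["cpu_load"] > 0.8],
           ["migrate_data", "match_alternative_service"], priority=2),
    Policy("gold-qos", [lambda c: c["qos_class"] == "gold"],
           ["allocate_reserved_channel"], priority=1),
    Policy("default", [], ["match_nearest_service"]),  # no conditions: always applies
]

if __name__ == "__main__":
    print(select_actions(POLICIES, {"cpu_load": 0.9, "qos_class": "gold"}))
    # ['migrate_data', 'match_alternative_service']
    print(select_actions(POLICIES, {"cpu_load": 0.3, "qos_class": "bronze"}))
    # ['match_nearest_service']
```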

The functionalities of the components are as follows.

Auditing Server is responsible for security concerns such as access control and user identification. A service request first enters this component and does not get through unless it is qualified.

Service Repository, as its name indicates, is a service collector in charge of service registry, service discovery, service aggregation, service caching and service labeling. As the inter-domain service repository, it mainly interacts with the edge service repositories to obtain service information.

Policy Base holds all kinds of policies, such as security policies, load balancing policies, protocol selection policies and service matching policies, to cope with different situations. Administrators can input policies in the Policy Description Language [7]. It also accepts feedback from service transactions to adjust some policies dynamically, which gives it some self-learning capability.


Policy Maker is, in a way, the critical component. The final decisions on service matching, protocol selection, channel allocation and data migration are made here according to specific policies in the policy base, taking external conditions such as workload distribution, network traffic, network topology and QoS requirements into consideration. The final result is logged to the event logger for fault tolerance and service feedback.

Multi-protocol Interactor mainly bridges domains running different protocols; it can be implemented with the Proteus Multiprotocol Message Library [8].

Monitoring Server observes external conditions, including workload distribution, network traffic, network topology and service availability. The information collected is saved in an info base to enable workload balancing, dynamic depiction of the network topology, and traffic shaping, providing stronger support for real-time policing.

Event Logger is responsible for logging service transactions. Each time a service interaction is initiated, every participant is logged. As soon as the transaction succeeds, a “success” signal is sent to the logger; in this way, servicing information can also be offered to the service repository for service labeling. If the signal times out or a “failure” signal is received, the transaction is rolled back and the policy maker is notified to make a new policy. By this means, faults are tolerated to a certain degree.

Synchronizing Server is responsible for synchronizing the service repository, event log, info base and policy base with the backup policy manager, and for synchronizing with the edge policy managers for dynamic service refreshment. For efficiency, synchronization data can be transmitted in a wormhole manner.
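The event logger's success, failure and timeout behaviour amounts to a small state machine per transaction. The Python sketch below is hypothetical: the class and method names, the explicit `now` timestamps and the callback used to notify the policy maker are assumptions, and the actual undo work of a rollback is left to that callback.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Transaction:
    participants: List[str]
    started_at: float
    status: str = "pending"  # pending -> success | rolled_back


class EventLogger:
    """Hypothetical per-transaction log with success, failure and timeout handling."""

    def __init__(self, timeout: float, on_rollback: Callable[[str], None]):
        self.timeout = timeout
        self.on_rollback = on_rollback  # e.g. notify the policy maker to re-plan
        self.log: Dict[str, Transaction] = {}

    def begin(self, tx_id: str, participants: List[str], now: float) -> None:
        """Log every participant when a service interaction is initiated."""
        self.log[tx_id] = Transaction(list(participants), now)

    def signal(self, tx_id: str, outcome: str) -> None:
        """Record a "success" or "failure" signal for a pending transaction."""
        tx = self.log[tx_id]
        if tx.status != "pending":
            return  # late signals are ignored
        if outcome == "success":
            tx.status = "success"
        else:
            self._roll_back(tx_id)

    def check_timeouts(self, now: float) -> None:
        """Roll back pending transactions whose signal has not arrived in time."""
        for tx_id, tx in list(self.log.items()):
            if tx.status == "pending" and now - tx.started_at > self.timeout:
                self._roll_back(tx_id)

    def _roll_back(self, tx_id: str) -> None:
        self.log[tx_id].status = "rolled_back"
        self.on_rollback(tx_id)  # the actual undo work is left to the callback

    def success_count(self, participant: str) -> int:
        """Successful transactions per participant, usable for service labeling."""
        return sum(1 for tx in self.log.values()
                   if tx.status == "success" and participant in tx.participants)


if __name__ == "__main__":
    replans = []
    logger = EventLogger(timeout=5.0, on_rollback=replans.append)
    logger.begin("t1", ["svcA", "svcB"], now=0.0)
    logger.begin("t2", ["svcC"], now=0.0)
    logger.signal("t1", "success")
    logger.check_timeouts(now=10.0)  # t2 never signalled
    print(replans)                   # ['t2']
```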

4.2 Edge Policy Manager

An edge policy manager is responsible for local domain policing, and its components are quite similar to those of the inter-domain policy manager. Since each domain, in a way, constitutes a small world running almost the same protocol, the multi-protocol interactor can be removed. Within a domain, service transactions occur much more frequently, so a channel selector is configured for proper channel assignment. For services that cannot be fulfilled within the domain, an outgoing interactor relays the service requests to the inter-domain policy manager, and the final policy from the inter-domain policy manager is passed down through this component. It is also responsible for periodically sending an “alive” signal to the backup policy manager. This infrastructure is shown in Fig. 3.

Fig. 3. Edge Policy Manager

4.3 Backup Policy Manager

The backup policy manager acts as a backup for the inter-domain policy manager and is configured with almost the same components. As long as the inter-domain policy manager is alive, however, its monitoring server plays an important part in monitoring the aliveness of all the other policy managers instead of other status. Otherwise, the backup policy manager takes the place of the inter-domain policy manager. If an edge policy manager is observed to be offline, the backup policy manager chooses another node from the affected domain to act as the edge policy manager.
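A minimal sketch of the backup policy manager's liveness checks, assuming that every manager's periodic “alive” signal is recorded with a timestamp and that a manager counts as offline once no signal has arrived within a fixed timeout. The class name, the timeout rule, the candidate-node lists and the choice of the first remaining node as the replacement edge policy manager are assumptions for illustration; the text does not specify how the replacement node is chosen.

```python
from typing import Dict, List


class BackupMonitor:
    """Hypothetical liveness monitor run by the backup policy manager."""

    INTER_DOMAIN = "inter-domain"

    def __init__(self, timeout: float, domains: Dict[str, List[str]]):
        self.timeout = timeout
        # Candidate nodes per domain; the first one starts as edge policy manager.
        self.candidates = {d: list(nodes) for d, nodes in domains.items()}
        self.edge_manager = {d: nodes[0] for d, nodes in self.candidates.items()}
        self.last_seen: Dict[str, float] = {}
        self.acting_inter_domain = False

    def alive_signal(self, manager: str, now: float) -> None:
        """Record a periodic "alive" signal."""
        self.last_seen[manager] = now

    def _alive(self, manager: str, now: float) -> bool:
        return now - self.last_seen.get(manager, float("-inf")) <= self.timeout

    def check(self, now: float) -> None:
        if not self.acting_inter_domain and not self._alive(self.INTER_DOMAIN, now):
            self.acting_inter_domain = True  # take the place of the inter-domain manager
        for domain, manager in list(self.edge_manager.items()):
            if self._alive(manager, now):
                continue
            remaining = [n for n in self.candidates[domain] if n != manager]
            self.candidates[domain] = remaining  # drop the offline node
            if remaining:
                replacement = remaining[0]
                self.edge_manager[domain] = replacement
                self.last_seen[replacement] = now  # grace period for the new manager


if __name__ == "__main__":
    mon = BackupMonitor(timeout=3.0, domains={"d1": ["n1", "n2"], "d2": ["m1", "m2"]})
    for m in ("inter-domain", "n1", "m1"):
        mon.alive_signal(m, now=0.0)
    mon.alive_signal("inter-domain", now=4.0)
    mon.alive_signal("m1", now=4.0)
    mon.check(now=5.0)
    print(mon.edge_manager)  # {'d1': 'n2', 'd2': 'm1'}
```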

5 Summary

Our policy-based service-oriented grid architecture is put forward based on two observations: the grid is evolving towards service orientation, and policy-based networks are springing up. From the description of its primary components and their functionalities, we conclude that this architecture is dynamic, context-sensitive, QoS-capable, workload-balancing, secure, scalable, multi-protocol interoperable and fault-tolerant.

6 Future Work

So far, this architecture remains a blueprint; we will try to implement a prototype in the future. Given the limited fault-tolerance capabilities of this architecture and the frequent volatility of the network, stronger and more efficient fault-tolerance measures, such as service dependence analysis, will be adopted later. Since the policy maker plays a critical role in this architecture, parallelism will be exploited to avoid a bottleneck.

References
1. David Durham: A New Paradigm for Policy-Based Network Control. Intel Developer Update Magazine, November 2001
2. Ian Foster, Carl Kesselman, Jeffrey Nick, and Steve Tuecke: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. http://www.globus.org/ogsa/
3. Foster, I., Kesselman, C., and Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of High Performance Computing Applications, 15 (3), (2001) 200-222
4. Catlett, C.: In Search of Gigabit Applications. IEEE Communications Magazine, (April 1992) 42-51
5. Catlett, C. and Smarr, L.: Metacomputing. Communications of the ACM, 35 (6), (1992) 44-52
6. Foster, I.: The Grid: A New Infrastructure for 21st Century Science. Physics Today, 55 (2), (2002) 42-47
7. Jorge Lobo, Randeep Bhatia, and Shamim Naqvi: A Policy Description Language. Proceedings of AAAI, (1999) 291-298
8. Kenneth Chiu, Madhusudhan Govindaraju, and Dennis Gannon: The Proteus Multiprotocol Message Library. Proceedings of the IEEE/ACM SC2002 Conference, November 16-22, 2002, Baltimore, Maryland, p. 30
9. Nathalie Furmento, William Lee, Anthony Mayer, Steven Newhouse, and John Darlington: ICENI: An Open Grid Service Architecture Implemented with Jini. Proceedings of the IEEE/ACM SC2002 Conference, November 16-22, 2002, Baltimore, Maryland, p. 37
10. Jörn Davidsen, Holger Ebel, and Stefan Bornholdt: Emergence of a Small World from Local Interactions: Modeling Acquaintance Networks. Physical Review Letters, 2002
11. D. J. Watts: Small Worlds: The Dynamics of Networks between Order and Randomness. Princeton University Press, 1998

Adaptable QOS Management in OSGi-Based Cooperative Gateway Middleware

Wei Liu¹, Zhang-long Chen¹, Shi-liang Tu¹, and Wei Du²

¹ Department of Computer Science and Engineering, Fudan University, Shanghai 200433
{wliu, chenzl, sltu}@fudan.edu.cn
² College of Management, University of Shanghai for Science and Technology, Shanghai 200093

Abstract. The Open Services Gateway Initiative (OSGi) Specification defines a service-oriented cooperative framework between the home and the outside world. It uses OSGi gateways to deliver products and services, such as home security control and intelligent home appliances, to end users. This paper studies the QOS problem and other limitations of OSGi technology, and integrates the Real-Time Specification for Java (RTSJ) and dynamic adaptable QOS management into the OSGi framework to solve the QOS problem.

1 Introduction

Internet connections for private users are becoming much cheaper and faster. As embedded and telecommunication equipment becomes smaller and more powerful, an embedded server inserted into the network is needed to connect the external Internet to internal clients. The Open Services Gateway Initiative (OSGi) is making developers and enterprises realize the potential of the consumer equipment market, such as virtual intelligent homes and intelligent home health care. However, providing reliable quality of service (QOS) management in OSGi-based open middleware is a pressing problem. The central component of the OSGi specification is the service gateway, which acts as the platform for many communication-based services. The service gateway can enable, consolidate and manage voice, data, Internet and multimedia communications from the home, office and other locations.

2 Adaptable QOS Management of OSGi-Based Cooperative Middleware

2.1 Limitation of QOS in OSGi Framework

The current OSGi specification is Release 3.0. It does not provide a reasonable QOS solution at the middleware layer or in the framework. However, OSGi-based applications such as virtual intelligent homes and intelligent home health care may have requirements for real-time ability and predictability. Increasingly, applications in these domains perform more demanding functions over highly networked environments, which in turn places more stringent requirements on the underlying computing and network systems. Therefore, OSGi-based middleware requires a broad range of features, such as service guarantees and adaptive resource management, to support widening performance demands, secure operation and predictability.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 604–607, 2004. © Springer-Verlag Berlin Heidelberg 2004

2.2 Software Solution: Adaptable QOS Management

To meet these research challenges, it is necessary to preserve and extend the benefits of existing middleware while defining new middleware services, protocols and interfaces in OSGi-based middleware. This paper proposes integrating the OSGi-based specification with a real-time Java specification, namely RTSJ.

OSGi-based Middleware by RTSJ. To develop a standard for real-time Java, IBM, Sun and other organizations from industry and academia formed the Real-Time for Java Expert Group, which proposed the Real-Time Specification for Java (RTSJ). RTSJ is the definitive reference for the semantics, extensions and modifications to the Java programming language that enable the Java platform to meet the requirements and constraints of real-time system performance, predictability and capabilities. The specification lets programmers model applications and program logic that require predictable execution meeting hard and soft real-time constraints. However, most vendors of real-time operating systems have been slow to develop RTSJ-compliant Java Virtual Machines. We therefore decided to design a real-time extension library that satisfies the basic requirements of developing real-time programs.

Dynamic Adaptable QOS Management. Providing reliable QOS in OSGi-based middleware under the RTSJ specification requires methods such as QOS monitoring. QOS violations are reported to diagnosis functions that identify the causes of poor QOS. Allocation analysis identifies possible reallocation actions to improve the QOS and selects the best of these actions. This section illustrates the use of the system model for QOS monitoring, QOS forecasting and allocation analysis.

QOS Forecasting. Monitoring of real-time QOS involves collecting time-stamped events sent from applications and synthesizing these events into path-level QOS metrics.
Forecasting of the real-time QOS allows early prediction of QOS overload or underload violations. Such conditions may occur when an unanticipated increase in tactical data causes the resource utilization to exceed the appropriate threshold levels. For forecasting QOS violations, the system model must be flexible enough to adapt to dynamic changes in resource utilization.
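The text does not name a forecasting method. As one possibility, the following Python sketch assumes Holt's linear (double) exponential smoothing over a utilization series and flags a forecast above or below hypothetical thresholds as an overload or underload violation; the smoothing constants, horizon and thresholds are illustrative, not values from the paper.

```python
from typing import Sequence


def holt_forecast(samples: Sequence[float], alpha: float, beta: float, horizon: int) -> float:
    """Forecast `horizon` steps ahead with Holt's linear exponential smoothing."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    level, trend = samples[0], samples[1] - samples[0]
    for y in samples[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)         # smoothed level
        trend = beta * (level - prev_level) + (1 - beta) * trend  # smoothed slope
    return level + horizon * trend


def classify(forecast: float, low: float, high: float) -> str:
    """Map a forecast utilization onto an assumed violation category."""
    if forecast > high:
        return "overload"
    if forecast < low:
        return "underload"
    return "ok"


if __name__ == "__main__":
    cpu = [0.50, 0.60, 0.70, 0.80]  # hypothetical utilization samples
    f = holt_forecast(cpu, alpha=0.5, beta=0.5, horizon=2)
    print(round(f, 2), classify(f, low=0.2, high=0.9))  # 1.0 overload
```

Because the smoothed trend is carried forward, a steadily rising series is flagged before it actually crosses the threshold, which is the point of early prediction.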


QOS Adaptation and Allocation Mechanisms Analysis. To perform monitoring, the QOS requirements specified in application-level terminology need to be translated into transport-level terminology, for example from video frames or audio packets to transport protocol data units. A further level of translation is also needed: a mapping between the types of service the transport protocol offers and the traffic classes the network offers. This rescaling of QOS parameters is called QOS parameter mapping.

In this section we illustrate the use of the system model and the load indices for selecting the best node for allocation purposes. Since a variety of load index functions may be used, in describing the best-node selection algorithm we use the notation $LI(h_i, t)$ and $LI(L_i, t)$ to denote the load index of a host $h_i$ and of a LAN $L_i$ at time $t$, respectively. The best-node algorithm determines the best node on which to restart or scale a candidate application. The best host is determined using a fitness function that simultaneously considers both host and LAN load indices. The algorithm first computes the trend values of the load indices of hosts and LANs over a moving set of samples. The trend values are determined as the slope of a simple linear regression line that plots the load index values as a function of time.

QOS Levels. The application set has four applications, each having between four and nine levels with associated benefit and CPU usage numbers. Although these applications and levels do not correspond exactly to any real applications, their ranges of CPU usage and benefit values vary at least as much as one would find in most actual applications, and therefore exercise the QOS level model. For the next set of experiments, the application period is fixed at 1/10 of a second for all QOS levels of all applications.
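The trend computation described for the best-node algorithm, the least-squares slope $\beta = \sum_j (t_j - \bar t)(y_j - \bar y) / \sum_j (t_j - \bar t)^2$ of the load-index samples against time, is shown below in Python together with one possible best-node rule. The fitness function used here (a weighted sum of each host's load and its LAN's load, each projected one sample ahead by adding the trend to the latest value, lower being better) and the equal weights are assumptions; the paper states only that host and LAN indices are considered simultaneously.

```python
from typing import Dict, Sequence


def trend(values: Sequence[float]) -> float:
    """Slope of the least-squares line through (t, value), t = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def projected(values: Sequence[float]) -> float:
    """Assumed one-step-ahead estimate: latest sample plus the trend."""
    return values[-1] + trend(values)


def best_node(host_loads: Dict[str, Sequence[float]],
              host_lan: Dict[str, str],
              lan_loads: Dict[str, Sequence[float]],
              w_host: float = 0.5, w_lan: float = 0.5) -> str:
    """Return the host with the lowest (best) assumed fitness value."""
    def fitness(host: str) -> float:
        return (w_host * projected(host_loads[host])
                + w_lan * projected(lan_loads[host_lan[host]]))
    return min(host_loads, key=fitness)


if __name__ == "__main__":
    hosts = {"h1": [0.2, 0.3, 0.4], "h2": [0.6, 0.5, 0.45], "h3": [0.3, 0.3, 0.3]}
    lan_of = {"h1": "lanA", "h2": "lanA", "h3": "lanB"}
    lans = {"lanA": [0.3, 0.3, 0.3], "lanB": [0.5, 0.7, 0.9]}
    print(best_node(hosts, lan_of, lans))  # h2
```

In this invented example, h3 has the lowest host load but sits on a LAN whose load is rising fast, and h1 is currently lighter than h2 but trending upward, so the trend-aware rule picks h2.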

3 Experiment and Related Analysis

The test results reported in this section were obtained on a 1.7 GHz Intel Pentium with 256 MB DDR RAM, running Red Hat Linux 7.3 with the TimeSys Linux/RT 3.0 GPL5 kernel. The Java platform used to test the RTSJ features is described below.

TimeSys RTSJ Reference Implementation (RI). TimeSys has developed the official RTSJ Reference Implementation (RI), a fully compliant implementation of Java that implements all the mandatory features of the RTSJ. The RI is based on a Java 2 Micro Edition (J2ME) Java Virtual Machine (JVM) and supports an interpreted execution mode, i.e., there is no just-in-time (JIT) compilation. Run-time performance was intentionally not optimized, since the main goals of the RI were predictable real-time behavior and RTSJ compliance.

The results show the QOS levels at which the four applications run with a skip value of 0. The QOS levels change quickly at the beginning because the system starts in a state of CPU overload: the combined QOS requirement of the complete set of applications running at the highest level (level 1) is about 200% of the CPU. By the 10th sample, the applications have stabilized at levels that can operate within the available CPU resources. There is an additional level adjustment of application 3 at the 38th sample, due to an additional missed deadline, probably resulting from transient CPU load generated by some non-QOS applications.

The test results also show the requested CPU allocation for the applications in the same experiments. The total requested CPU allocation starts at approximately twice the available CPU and then drops to about 100% as the applications are adjusted to stable levels. Note also the adjustment at sample 38, which lowers the total requested CPU allocation to approximately 80%.

The results further show that the path latencies are best when the CPU run queue length, or the load average (which is mainly based on the CPU run queue length), is used as the load index. This indicates that, compared with the other load indices considered, the resource manager component of the middleware made the best allocation decisions using Li and Xt.

The most important part of the project is the design and development of open OSGi-based middleware, which will allow services to be remotely deployed and administered onto home network gateways such as set-top boxes and DSL modems.
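The level changes in this experiment can be illustrated with a simple greedy heuristic. The Python sketch below is hypothetical and is not the authors' algorithm: every application starts at its highest level (index 0, i.e. level 1), and while the total requested CPU exceeds capacity, the application whose next-lower level gives up the least benefit per unit of CPU freed is stepped down. The application table is invented, chosen only so that the initial demand is 200% of the CPU, as in the experiment.

```python
from typing import List, Tuple

Level = Tuple[float, float]  # (benefit, cpu_fraction); index 0 is the highest QOS level


def adapt_levels(apps: List[List[Level]], capacity: float = 1.0) -> List[int]:
    """Greedily lower QOS levels until the total requested CPU fits the capacity."""
    current = [0] * len(apps)  # every application starts at its highest level

    def total_cpu() -> float:
        return sum(levels[i][1] for levels, i in zip(apps, current))

    while total_cpu() > capacity:
        best_app, best_cost = None, float("inf")
        for a, levels in enumerate(apps):
            i = current[a]
            if i + 1 >= len(levels):
                continue  # already at the lowest level
            benefit_lost = levels[i][0] - levels[i + 1][0]
            cpu_freed = levels[i][1] - levels[i + 1][1]
            if cpu_freed <= 0:
                continue
            cost = benefit_lost / cpu_freed  # benefit given up per unit of CPU freed
            if cost < best_cost:
                best_app, best_cost = a, cost
        if best_app is None:
            break  # nothing left to lower; demand still exceeds capacity
        current[best_app] += 1
    return current


if __name__ == "__main__":
    # Hypothetical table: initial demand 0.5 + 0.6 + 0.4 + 0.5 = 200% of one CPU.
    apps = [
        [(10, 0.5), (8, 0.3), (5, 0.1)],
        [(6, 0.6), (5, 0.2), (2, 0.1)],
        [(4, 0.4), (3.5, 0.3), (1, 0.1)],
        [(9, 0.5), (7, 0.25), (3, 0.1)],
    ]
    print(adapt_levels(apps))  # [1, 1, 2, 1]
```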

4 Conclusions and Future Work

This paper proposes integrating RTSJ and adaptable QOS management into OSGi-based cooperative middleware to solve the QOS problem in OSGi. Moreover, for highly dynamic systems, adaptive QOS-driven resource management is necessary to utilize system resources efficiently and to provide appropriate end-to-end application-level QOS support. Future work includes adapting the transport to wireless environments, designing a scalable feedback scheme for multicast, and deriving equations for exact QOS mapping.


Design of an Artificial-Neural-Network-Based Extended Metacomputing Directory Service*

Haopeng Chen and Baowen Zhang

Distributed Computing Technique Centre, Shanghai Jiao Tong University, 200030 Shanghai, P.R. China
{Chen-hp, Zhang-bw}@cs.sjtu.edu.cn
http://www.cs.sjtu.edu.cn

Abstract. This paper analyzes a serious limitation of the existing metacomputing directory service (MDS) of the Globus project, namely that it does not support application-oriented queries, and then designs an artificial-neural-network-based GRC (grid resources classifier) to eliminate this limitation. The classifier extends the metacomputing directory service by classifying grid resources into application-oriented categories, and its classification precision can be continuously improved by self-learning. The extended metacomputing directory service remains compatible with the existing one, so the practicability of the metacomputing directory service is improved.

1 Introduction

Globus is the most influential of the current grid computing projects. In Globus, MDS (metacomputing directory service) provides functions for users to discover, register, query and modify information about the grid computing environment, and reflects the real-time state of that environment [1]. Users can locate grid resources and obtain their attributes by invoking MDS [2]. However, the functions provided by the existing MDS are incomplete because it does not support application-oriented queries; for example, it cannot answer a query about which resource is suitable for massive data analysis. For most users, application-oriented queries are more useful, so the functions of MDS need to be extended. This paper addresses this limitation of the existing MDS and puts forward an ANN (artificial neural network) based solution that extends it to support application-oriented queries, thereby improving the practicality of MDS.

* This paper is supported by the Shanghai Science and Technology Development Foundation under Grant No. 03DZ15027. M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 608–611, 2004. © Springer-Verlag Berlin Heidelberg 2004


2 The ANN Topology of the GRC

To extend the existing MDS, we designed a GRC (grid resources classifier) that performs application-oriented classification based on the information about grid resources. We chose an ANN to implement the GRC because the input attributes of grid resource instances are their information, which is stored in the LDAP server as attribute-value pairs, and the classification result is represented by a vector in which each element is the probability that the instance belongs to the category that element represents. ANN learning is therefore suitable for the GRC [3].

We employ sigmoid units as the basic units of the GRC's ANN. The ANN should be a two-layer network, i.e., one hidden layer plus one output layer, with three sigmoid units in the hidden layer. The main reason for this design is that, in practice, such a network can handle most functions, whereas adding more layers or more hidden sigmoid units does not markedly improve precision but greatly prolongs training time.

The input vector of the ANN should include all static and dynamic information of the specified grid resource. A linear function can be used to scale the numerical information up or down into a suitable range. The output layer contains as many sigmoid units as there are application-oriented categories of grid resources, and each output element represents the probability that the instance belongs to the corresponding category. The resulting topology of the GRC's ANN is illustrated in Figure 1.

Fig. 1. The topology of ANN of GRC
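The linear scaling of numerical attributes into a suitable range can be done with min-max normalization. In this Python sketch the attribute names (cpu_mhz, free_mem_mb, load1) are hypothetical and are not actual MDS/LDAP attribute names; values outside the observed range are clipped to [0, 1], which is a design choice rather than something the paper specifies.

```python
from typing import Dict, List, Sequence, Tuple

Bounds = Dict[str, Tuple[float, float]]


def fit_bounds(records: Sequence[Dict[str, float]], keys: Sequence[str]) -> Bounds:
    """Record the observed minimum and maximum of each numeric attribute."""
    return {k: (min(r[k] for r in records), max(r[k] for r in records)) for k in keys}


def scale(record: Dict[str, float], bounds: Bounds) -> List[float]:
    """Linearly map each attribute into [0, 1], in the key order of `bounds`."""
    vector = []
    for key, (lo, hi) in bounds.items():
        x = (record[key] - lo) / (hi - lo) if hi > lo else 0.0
        vector.append(min(max(x, 0.0), 1.0))  # clip values outside the observed range
    return vector


if __name__ == "__main__":
    keys = ["cpu_mhz", "free_mem_mb", "load1"]  # hypothetical attribute names
    observed = [{"cpu_mhz": 1000, "free_mem_mb": 256, "load1": 0.5},
                {"cpu_mhz": 2000, "free_mem_mb": 1024, "load1": 2.5}]
    bounds = fit_bounds(observed, keys)
    print(scale({"cpu_mhz": 1500, "free_mem_mb": 640, "load1": 1.5}, bounds))  # [0.5, 0.5, 0.5]
```

The resulting vector has one element per input unit, so its length is the network's input dimension.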

3 The Employed ANN Learning Algorithm

The BP (back propagation) algorithm is the most common ANN learning algorithm [4]. To avoid getting stuck in local minima of the error surface, and to gradually increase the search step size in regions where the gradient is unchanging, we employ the BP algorithm with a momentum term. The algorithm is described as follows.

Backpropagation_for_GRC(training_examples, $\eta$, $n_{in}$, $n_{out}$, $n_{hidden}$, $\alpha$)

Statements of symbols:
- training_examples represents the training instances of grid resources. Each of them is a pair of the form $\langle \vec{x}, \vec{t} \rangle$, where $\vec{x}$ is the vector of network input values and $\vec{t}$ is the vector of target network output values.
- $\eta$ is the learning rate, a constant with a very small value. Its value can be specified according to the particular characteristics of the grid.
- $n_{in}$ is the dimension of the network input vector.
- $n_{out}$ is the dimension of the network output vector; it equals the number of units in the output layer.
- $n_{hidden}$ is the number of units in the hidden layer; we set it to 3.
- The input from unit $i$ to unit $j$ is denoted $x_{ji}$, and the weight from unit $i$ to unit $j$ is denoted $w_{ji}$.
- $\alpha$ is a momentum constant with a very small value.

The process of this algorithm is as follows:
Create a feed-forward network with $n_{in}$ inputs, $n_{hidden}$ hidden units, and $n_{out}$ output units.
Initialize all network weights to small random numbers.
Until the termination condition is met, do:
  For each $\langle \vec{x}, \vec{t} \rangle$ in training_examples, do:
  1. Input the instance $\vec{x}$ to the network and compute the output $o_u$ of every unit $u$ in the network.
  2. For each network output unit $k$, calculate its error term
  $$\delta_k \leftarrow o_k (1 - o_k)(t_k - o_k)$$
  3. For each hidden unit $h$, calculate its error term
  $$\delta_h \leftarrow o_h (1 - o_h) \sum_{k \in \mathit{outputs}} w_{kh} \, \delta_k$$
  4. Update each network weight $w_{ji}$:
  $$w_{ji} \leftarrow w_{ji} + \Delta w_{ji}(n)$$
  where
  $$\Delta w_{ji}(n) = \eta \, \delta_j \, x_{ji} + \alpha \, \Delta w_{ji}(n-1)$$
  and $\Delta w_{ji}(n)$ is the weight update performed during the $n$-th iteration.

In this algorithm, the GRC always uses the currently learned function to classify grid resources in real time, and modifies that function according to user feedback to obtain a new one. The algorithm therefore never stops; we simply keep using the most recently learned function. The values of $\eta$ and $\alpha$ should be specified according to the characteristics of each grid; it is neither necessary nor possible to give a set of values that applies to every grid.

Fig. 2. The architecture of the extended MDS which has a GRC
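The following Python sketch implements the BP-with-momentum update rules for the GRC topology: sigmoid units, one hidden layer of three units, and incremental weight updates that add $\alpha$ times the previous weight change. The bias weights (an extra input fixed at 1), the initialization range, and the values eta = 0.3 and alpha = 0.3 are illustrative assumptions; as the text notes, suitable values depend on the grid. Because `update` processes one pair at a time, it can be called for each new piece of user feedback, matching the never-terminating learning described for the GRC.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class GRCNetwork:
    """Sigmoid network with one hidden layer, trained by incremental BP with momentum."""

    def __init__(self, n_in: int, n_out: int, n_hidden: int = 3,
                 eta: float = 0.3, alpha: float = 0.3, seed: int = 0):
        rng = random.Random(seed)
        # Row j holds the weights into unit j; the last entry is a bias weight (input fixed at 1).
        self.w_hid = [[rng.uniform(-0.05, 0.05) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.05, 0.05) for _ in range(n_hidden + 1)] for _ in range(n_out)]
        # Previous weight changes, needed for the momentum term alpha * dw(n-1).
        self.dw_hid = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        self.dw_out = [[0.0] * (n_hidden + 1) for _ in range(n_out)]
        self.eta, self.alpha = eta, alpha

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], List[float]]:
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w_hid]
        hb = hidden + [1.0]
        out = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w_out]
        return hidden, out

    def classify(self, x: Sequence[float]) -> List[float]:
        """Per-category outputs of the currently learned function."""
        return self._forward(x)[1]

    def _apply(self, weights, prev, deltas, inputs) -> None:
        for j, d in enumerate(deltas):
            for i, v in enumerate(inputs):
                prev[j][i] = self.eta * d * v + self.alpha * prev[j][i]  # dw(n)
                weights[j][i] += prev[j][i]

    def update(self, x: Sequence[float], t: Sequence[float]) -> None:
        """One backpropagation step for a single <x, t> pair."""
        hidden, out = self._forward(x)
        d_out = [o * (1 - o) * (tk - o) for o, tk in zip(out, t)]
        # Hidden error terms use the output weights before they are changed.
        d_hid = [h * (1 - h) * sum(self.w_out[k][j] * d_out[k] for k in range(len(d_out)))
                 for j, h in enumerate(hidden)]
        self._apply(self.w_out, self.dw_out, d_out, hidden + [1.0])
        self._apply(self.w_hid, self.dw_hid, d_hid, list(x) + [1.0])


def squared_error(net: GRCNetwork, examples) -> float:
    return sum(0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, net.classify(x)))
               for x, t in examples)


if __name__ == "__main__":
    # Hypothetical 2-category task on two already-scaled attributes.
    data = [([0, 0], [1, 0]), ([0, 1], [0, 1]), ([1, 0], [0, 1]), ([1, 1], [0, 1])]
    net = GRCNetwork(n_in=2, n_out=2, seed=1)
    print(round(squared_error(net, data), 3))
    for _ in range(2000):
        for x, t in data:
            net.update(x, t)
    print(round(squared_error(net, data), 3))
```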

4 The Architecture of the Extended MDS Which Has a GRC

The architecture of the extended MDS with a GRC is shown in Figure 2. Users B and C access MDS in the original way. User A accesses MDS through the GRC: the GRC obtains the static and dynamic information of the currently available resources, filters the information with the learned function, and returns the information of suitable resources to user A. User A then sends feedback to the GRC according to his final choice, and the GRC learns from this feedback to modify its classification function and improve the precision of classification.
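The query-and-feedback loop of the extended MDS can be sketched independently of the network itself by passing in the classifier and the learning step as functions (for instance, a trained network's forward pass and its backpropagation step). In this hypothetical Python sketch, resources whose probability for the requested category reaches a threshold are returned best first, and the user's final choice is turned into a training target by setting the requested category's target to 1 while keeping the current outputs elsewhere. The threshold, the stand-in classifier and this target construction are assumptions, not taken from the paper.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def query(resources: Dict[str, Vector], category: int,
          classify: Callable[[Vector], Vector],
          threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return (resource, probability) pairs for the category, best first."""
    scored = [(name, classify(vec)[category]) for name, vec in resources.items()]
    hits = [(name, p) for name, p in scored if p >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


def feedback(resources: Dict[str, Vector], chosen: str, category: int,
             classify: Callable[[Vector], Vector],
             learn: Callable[[Vector, Vector], None]) -> None:
    """Turn the user's final choice into one training pair (assumed target scheme)."""
    x = resources[chosen]
    target = list(classify(x))
    target[category] = 1.0  # the chosen resource suited the requested category
    learn(x, target)


if __name__ == "__main__":
    # Stand-in classifier over one scaled attribute; a trained network would replace it.
    stub = lambda v: [v[0], 1.0 - v[0]]
    pool = {"r1": [0.9], "r2": [0.2], "r3": [0.6]}
    print(query(pool, 0, stub))  # [('r1', 0.9), ('r3', 0.6)]
    feedback(pool, "r3", 0, stub, lambda x, t: print("learn", x, t))
```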

5 Conclusion

This paper designs an artificial-neural-network-based GRC that extends the metacomputing directory service by classifying grid resources into application-oriented categories. However, several aspects of the GRC presented here still require further research, such as the time complexity of the training process, the space complexity of the instance space, the training algorithm, and simulation.

References
1. Dou Zhi-hui, Chen Yu, and Liu Peng: Grid Computing. Internal materials (2002) 87-96
2. The Globus Toolkit 2.2 MDS Technology Brief, Draft 4, January 30, 2003. http://www.globus.org/mds/ mdstechnologybrief draft4.pdf
3. Tom M. Mitchell: Machine Learning. McGraw-Hill Companies, Inc. (1997) 70-74
4. Martin T. Hagan, Howard B. Demuth, and Mark H. Beale: Neural Network Design. PWS Publishing Company (1996) 197-207

Gridmarket: A Practical, Efficient Market Balancing Resource for Grid and P2P Computing*

Ming Chen, Guangwen Yang, and Xuezheng Liu

Dept. of Computer Science and Technology, Tsinghua University
{cm01,ygw,liuxuezheng00}@mails.tsinghua.edu.cn

Abstract. The emergence of computational Grid and Peer-to-Peer (P2P) computing systems is promising. It challenges us to build a system that maximizes collective utility given the presumed rational behavior of participants. Although economic theories sound reasonable, many existing or proposed solutions based on them face feasibility problems in practice. This paper proposes Gridmarket: an infrastructure relying on resource standardization, continuous double auction, and straightforward pricing algorithms based on the price elasticity inherent in consumers and suppliers. Gridmarket efficiently equates resource demand with supply through continuous double auction and a price tracing mechanism within the required price ranges. Software agents employing Gridmarket's scheduling are easy to write. To demonstrate its efficacy and efficiency, we designed and built a simulation prototype, and the experimental results are promising.

1 Introduction

The emergence of computational Grid and P2P computing systems provides promising solutions for cooperatively solving large-scale computing problems. Such systems consist of organizationally and economically independent entities. Human rationality derives from individual self-interest, and the contribution of resources depends only on the fickle concept of goodwill. Lacking a mechanism to balance suppliers and consumers, the system tends to become unbalanced and eventually collapse. A good incentive mechanism can allocate resources efficiently and boost the system's prosperity. Mechanisms built on economic models are better than scheduling solutions that consider only system parameters: in geographically distributed systems spanning multiple independent organizations and entities, they provide a clear and familiar model for users. Several approaches [2–11] in this direction have been proposed to balance demand and supply in these systems. However, they are impractical in real settings because they lack feasibility, a price formation mechanism [2], complete support for the price ranges required by consumers and suppliers, sufficient scheduling scope [3][4][5], or transaction efficiency [10][11]. All of these are necessary preconditions for a productive market.

* Supported by National Natural Science Foundation of China (60373004, 60373005). M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 612–619, 2004. © Springer-Verlag Berlin Heidelberg 2004

In this paper we present Gridmarket, a practical infrastructure aiming to balance demand and supply in Grid and P2P systems. Gridmarket is composed of the following components: traded resource standardization, continuous double auction, and intuitive, straightforward pricing algorithms based on the price elasticity set by consumers and suppliers. The pricing algorithms greatly lighten the burden on consumers and suppliers: once they set three simple parameters, consumers' software agents automatically bid for resources to execute tasks, while suppliers' software agents sell the suppliers' idle resources in the continuous double auction resource market. Gridmarket features maneuverability, simplicity, efficacy, and efficiency.
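A continuous double auction keeps standing buy and sell orders and executes a trade as soon as a bid price meets or exceeds an ask price. The Python sketch below is a minimal order book for one standardized resource category; executing at the resting order's price, price-then-arrival priority, and the resource label are common CDA conventions assumed here for illustration, not details of Gridmarket's pricing algorithms. Every trade is appended to a public list, mirroring the publication of transaction prices to market participants.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trade:
    resource: str
    buyer: str
    seller: str
    price: float
    quantity: int


class ContinuousDoubleAuction:
    """Order book for one standardized resource category (illustrative sketch)."""

    def __init__(self, resource: str):
        self.resource = resource
        self._bids: List[Tuple[float, int, str, int]] = []  # (-price, seq, agent, qty): best bid on top
        self._asks: List[Tuple[float, int, str, int]] = []  # (price, seq, agent, qty): best ask on top
        self._seq = itertools.count()  # arrival order breaks price ties
        self.trades: List[Trade] = []  # published transaction record

    def bid(self, agent: str, price: float, qty: int) -> None:
        self._submit(True, agent, price, qty)

    def ask(self, agent: str, price: float, qty: int) -> None:
        self._submit(False, agent, price, qty)

    def _submit(self, is_bid: bool, agent: str, price: float, qty: int) -> None:
        book = self._asks if is_bid else self._bids
        while qty > 0 and book:
            key, seq, other, other_qty = book[0]
            best = key if is_bid else -key  # best opposite price
            if (is_bid and price < best) or (not is_bid and price > best):
                break  # prices do not cross
            heapq.heappop(book)
            traded = min(qty, other_qty)
            buyer, seller = (agent, other) if is_bid else (other, agent)
            self.trades.append(Trade(self.resource, buyer, seller, best, traded))
            qty -= traded
            if other_qty > traded:  # a partially filled resting order keeps its priority
                heapq.heappush(book, (key, seq, other, other_qty - traded))
        if qty > 0:  # the unfilled remainder rests in the book
            own = self._bids if is_bid else self._asks
            heapq.heappush(own, (-price if is_bid else price, next(self._seq), agent, qty))


if __name__ == "__main__":
    market = ContinuousDoubleAuction("cpu-hour")  # hypothetical resource identifier
    market.ask("s1", 5.0, 2)
    market.ask("s2", 4.0, 1)
    market.bid("c1", 4.5, 2)  # buys 1 unit from s2 at 4.0; 1 unit rests at 4.5
    market.ask("s3", 4.2, 3)  # sells 1 unit to c1 at 4.5; 2 units rest at 4.2
    for trade in market.trades:
        print(trade)
```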

2 Market Model

Drawing on real-world economic phenomena, we reason that Grid and P2P computing need a fully competitive resource market in which consumers and suppliers trade standard resources. The resource market provides the basic exchange function for resource consumers and suppliers. Resources traded in the market are consumable immediately after a transaction. Because the market is perfectly competitive, resource suppliers cannot manipulate prices to keep them high and fleece consumers unless they conspire together. Given the potentially prohibitive legal penalties and the difficulty of colluding among a large number of independent suppliers, the possibility of collusion is very low. Resource suppliers have to sell their resources at market prices to earn a return on and of their sunk investments; this is their only available choice. In such a perfectly competitive market, a supplier's marginal revenue equals the market price of the resource, so a supplier maximizes its profit by providing as many resources as it can whose cost is less than or equal to the market price. On the other side, consumers cannot manipulatively depress the market price to extort suppliers either. A lower price stimulates demand and restrains supply, while a higher price chokes off demand and fuels supply, so the market tends toward equilibrium. The invisible hand [1] of the market guarantees the full employment of resources.

To increase the liquidity of the resource market, all items traded in it must be standardized: resources are classified into predefined standard categories with unique identifiers, and consumers and suppliers can bid for or offer only standard resources. Although this design may limit the flexibility of expressing resources, it provides standardization and reduces the complexity of communication and of the matching process for both programs and human beings. Backed by the history of human standardization, we envision that, as P2P and Grid systems evolve, traded resources will gradually be standardized too. Every transaction price in the resource market is published to market participants, which makes the market transparent, fair, and efficient. Consumers and suppliers can place orders according to the published transaction prices and their own pricing strategies. Orders are sent directly to the resource market for matching.

M. Chen, G. Yang, and X. Liu

3 Market Components

In this section, we describe the match process and the pricing algorithms in detail.

3.1 Match Process

The resource market periodically uses a price-driven continuous double auction to match consumers' bid orders with suppliers' ask orders. The double auction is one of the most common exchange institutions in the marketplace; most stock markets (e.g., NASDAQ, the Shanghai Stock Exchange, and the Shenzhen Stock Exchange) use it to trade equities, bonds, and derivatives. In the double auction model, bid (buy) orders and ask (sell) orders can be submitted at any time during the trading period. At the end of a match period, if there are open bids and asks that match, i.e., are compatible in terms of price and requirements (e.g., quantity of goods or shares), a trade is executed. Bids are ranked from highest to lowest bid price, and asks from lowest to highest ask price; the match process starts from the head of both ranked lists. If prices are equal, match priority follows the principle of time first, then quantity first: earlier orders take precedence over later ones, and among orders arriving at the same time, those with larger quantities precede those with smaller quantities. Some sophisticated algorithms [12][13] have been developed to automate bidding in double auctions for stock trading.
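The ranking and priority rules above can be sketched in Python. This is a minimal single-resource illustration, not Gridmarket's implementation: the `Order` record and the `match` function are hypothetical names, and because the text does not say at what price a matched pair trades, the sketch assumes the midpoint of the bid and ask prices.

```python
from dataclasses import dataclass


@dataclass
class Order:
    owner: str
    price: float   # bid price for buy orders, ask price for sell orders
    quantity: int  # units of one standard resource category
    time: int      # submission time; a smaller value means an earlier order


def match(bids, asks):
    """Match one period's open orders for a single standard resource.

    Bids are ranked by price (high to low), asks by price (low to high);
    equal prices are ordered by earlier time, then by larger quantity.
    ASSUMPTION: each trade executes at the midpoint of bid and ask.
    Returns a list of (buyer, seller, quantity, price) tuples.
    """
    bids = sorted(bids, key=lambda o: (-o.price, o.time, -o.quantity))
    asks = sorted(asks, key=lambda o: (o.price, o.time, -o.quantity))
    bid_left = [o.quantity for o in bids]  # unfilled quantity per bid
    ask_left = [o.quantity for o in asks]  # unfilled quantity per ask
    trades = []
    i = j = 0
    # Walk both ranked lists while the best open bid still covers the best open ask.
    while i < len(bids) and j < len(asks) and bids[i].price >= asks[j].price:
        qty = min(bid_left[i], ask_left[j])
        price = (bids[i].price + asks[j].price) / 2
        trades.append((bids[i].owner, asks[j].owner, qty, price))
        bid_left[i] -= qty
        ask_left[j] -= qty
        if bid_left[i] == 0:
            i += 1
        if ask_left[j] == 0:
            j += 1
    return trades


bids = [Order("c1", 10.0, 3, 1), Order("c2", 12.0, 2, 2)]
asks = [Order("s1", 9.0, 4, 1), Order("s2", 11.0, 5, 1)]
print(match(bids, asks))  # [('c2', 's1', 2, 10.5), ('c1', 's1', 2, 9.5)]
```

After these two trades, c1 still wants one unit, but its bid of 10.0 is below s2's ask of 11.0, so the remainder stays open for the next match period.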

3.2 Pricing Algorithms

We propose two pricing algorithms: a consumer pricing algorithm (Figure 1) and a supplier pricing algorithm (Figure 2). The consumer pricing function depends on the time parameter t and on two consumer-specific coefficients, one denoting the base price and the other expressing price elasticity. This function is intuitive and straightforward: as time elapses, a consumer who cannot buy the needed resource bids a higher and higher price. The supplier pricing function likewise depends on t and on two supplier-specific coefficients, a base price and a price elasticity. It is equally easy to understand: as time elapses, a supplier who cannot sell its idle resource offers it at a lower and lower price. Together, the two functions automatically shrink the gap between bid and ask prices over time, so that the market clears. Consumers can set ceiling prices and suppliers can set floor prices. Raising the ceiling price of a bid strengthens a consumer's demand power, and lowering the floor price of an ask builds up a supplier's competitiveness.
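The exact pricing functions are not reproduced in this text, so the following Python sketch assumes, purely for illustration, linear forms: a consumer's bid rises from its base price at a rate given by its elasticity and is capped by its ceiling price, while a supplier's ask falls from its base price and is bounded by its floor price. The function names and the integer time steps are hypothetical. The sketch only shows how the two curves approach each other until a bid covers an ask, and why they never meet if the ceiling lies below the floor.

```python
def consumer_price(t, base, elasticity, ceiling):
    """Hypothetical bid curve: rises linearly with time, capped at the ceiling."""
    return min(base + elasticity * t, ceiling)


def supplier_price(t, base, elasticity, floor):
    """Hypothetical ask curve: falls linearly with time, bounded by the floor."""
    return max(base - elasticity * t, floor)


def first_crossing(consumer, supplier, max_steps=1000):
    """Return (t, bid, ask) at the first step where bid >= ask, else None.

    consumer and supplier are (base, elasticity, limit) tuples, i.e. the
    three parameters each side sets.
    """
    for t in range(max_steps + 1):
        bid = consumer_price(t, *consumer)
        ask = supplier_price(t, *supplier)
        if bid >= ask:
            return t, bid, ask
    return None


print(first_crossing((5, 1, 20), (15, 1, 8)))  # (5, 10, 10)
print(first_crossing((5, 2, 20), (15, 1, 8)))  # (4, 13, 11)
print(first_crossing((5, 1, 9), (15, 1, 10)))  # None: ceiling below floor
```

In the second call, a larger consumer elasticity shortens the time to a trade, which is consistent with the observation in Section 4.2 that the bargaining burden is negatively related to the elasticity coefficients.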

Fig. 1. Consumer Pricing Algorithm

Fig. 2. Supplier Pricing Algorithm

4 Analysis and Experiments

4.1 Analysis

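The system is modelled as an M/M/N queuing network [17]. The task streams of all consumers are merged into a single task stream that serves as the system input stream. We employ the following standard equations [17] to analyze theoretically the resource utilization rate and the response time of our system:

$$\rho = \frac{\lambda}{N\mu}, \qquad \lambda = \sum_{i=1}^{M} \lambda_i \qquad (1)$$

where $\lambda_i$ and $\mu$ denote the task arrival rate of consumer $i$ and the service rate of each supplier, respectively; $M$ and $N$ are the numbers of consumers and suppliers; and $\rho$ is the system resource usage rate.

The system response time is:

$$T = \frac{1}{\mu} + \frac{C(N, \lambda/\mu)}{N\mu - \lambda} \qquad (2)$$

$$C(N, a) = \frac{\dfrac{a^N}{N!\,(1-\rho)}}{\displaystyle\sum_{k=0}^{N-1} \frac{a^k}{k!} + \frac{a^N}{N!\,(1-\rho)}}$$

where $C(N, a)$ is the Erlang C probability that an arriving task has to wait.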
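The standard M/M/N quantities can be evaluated numerically with a short Python sketch. The names `erlang_c` and `mmn_metrics` are illustrative; the sketch assumes Poisson arrivals per consumer, exponential service at rate μ per supplier, and a stable system (ρ < 1).

```python
import math


def erlang_c(n, a):
    """Erlang C: probability that an arriving task must wait in M/M/n.

    n is the number of servers (suppliers) and a = lambda / mu is the
    offered load; the queue is stable only when a < n.
    """
    if a >= n:
        raise ValueError("unstable: offered load must be below n")
    rho = a / n
    tail = a ** n / (math.factorial(n) * (1 - rho))
    head = sum(a ** k / math.factorial(k) for k in range(n))
    return tail / (head + tail)


def mmn_metrics(arrival_rates, mu, n):
    """Return (usage rate rho, mean response time T) for M/M/n.

    arrival_rates holds the Poisson rate of each consumer; the merged
    stream is Poisson with the summed rate.
    """
    lam = sum(arrival_rates)
    rho = lam / (n * mu)
    wait = erlang_c(n, lam / mu) / (n * mu - lam)  # mean time spent queueing
    return rho, wait + 1 / mu                     # queueing plus service


print(mmn_metrics([0.5], mu=1.0, n=1))  # (0.5, 2.0), the M/M/1 value 1/(mu - lambda)
```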

4.2 Experiments

We use an event-driven prototype to evaluate our algorithms. Measurements are sampled until the 3000th task arrives.
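A much simpler stand-in for such a prototype can be sketched in Python to cross-check the queuing model when bargaining is ignored. This is a hypothetical sketch, not the authors' prototype: it replays Poisson arrivals in order, hands each task to whichever of the n suppliers becomes free first (first-come, first-served), and records response times up to the 3000th task, mirroring the sampling window described above.

```python
import heapq
import random


def simulate_fcfs(lam, mu, n, num_tasks=3000, seed=1):
    """Response times of the first num_tasks tasks in an FCFS M/M/n queue.

    Bargaining is not modelled; each task is served as soon as the
    earliest-released supplier is free.
    """
    rng = random.Random(seed)
    free_at = [0.0] * n  # supplier release times; a list of zeros is a valid min-heap
    now = 0.0
    responses = []
    for _ in range(num_tasks):
        now += rng.expovariate(lam)                # next Poisson arrival
        start = max(now, heapq.heappop(free_at))   # wait for the first free supplier
        finish = start + rng.expovariate(mu)       # exponential service time
        heapq.heappush(free_at, finish)
        responses.append(finish - now)
    return responses


samples = simulate_fcfs(lam=0.5, mu=1.0, n=1)
# Compare the sample mean with the M/M/1 value 1 / (mu - lam) = 2.0.
print(sum(samples) / len(samples))
```

Adding a bargaining delay before service would be the natural next step for studying the overhead discussed below, but how the prototype models that delay is not stated here, so it is left out.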


Fig. 3. Transaction mean price and transactions with different ceiling prices and floor prices (2 consumers vs. 2 suppliers):

Synthetic experiment. In this experiment, two consumers with different ceiling prices and two suppliers with varying floor prices play a bargaining game. The results show that, within the constraints of cost and wealth, comparatively lower floor prices and relatively higher ceiling prices are good choices for suppliers and consumers, respectively. There is no absolute panacea for consumers and suppliers; as expected, game theory dominates.

Schedule efficiency. In this section, we explore the scheduling efficiency of the algorithm in terms of the task response-time penalty and the resource utilization rate for varying elasticity coefficients (Figure 4 and Figure 5). First, the figures show that our scheduling algorithm is highly efficient: when the system load is not high, the experimental curves closely approximate the theoretical curves (plotted according to Equation 2 and Equation 1, respectively). Second, the time burden due to bargaining between consumers and suppliers increases sharply as the system approaches saturation, and the degree of this increase is negatively related to the elasticity coefficients. The reason is straightforward: bargaining time costs are negligible relative to 'long' arrival intervals when the system load is light, but they do matter under high load. These costs reduce resource utilization rates and increase response times.

Fig. 4. Responsive Time (1 consumer vs. 1 supplier)

Fig. 5. Usage Rates (1 consumer vs. 1 supplier)

5 Related Work

Work in this area can be classified into four categories: the commodity-market model, the auction model, the credit-based model, and theoretical analysis. We outline them by category.

5.1 Commodity-Market Model

Nimrod-G [2], Mungi [3], and Enhanced MOSIX [4] fall into this category. Nimrod-G claimed to support multiple economic models, but its implementation focused on the commodity-market model. Nimrod-G assumed that exogenous, predefined, and static prices exist for resources and that the run time of a program can be estimated accurately, which may be unrealistic in practice. In Mungi [3], a single-address-space operating system, applications obtain bank accounts from which rent is collected for the storage occupied by their objects. Rent automatically increases as available storage runs low, forcing users to release unneeded storage; Mungi's main concern is garbage collection. Enhanced MOSIX [4], deployed in cluster environments, uses an opportunity-cost method that converts the usage of several heterogeneous resources in a machine into a single homogeneous cost. It does not take into account the prices that consumers can afford.

5.2 Auction Model

This class includes Spawn [5], Rexec/Anemone [7], and JaWS [8]. Spawn employs the Vickrey auction [6], a second-price sealed-bid auction, to allocate resources among bidders. Bidders receive periodic funding and use their fund balances to bid for hierarchical resources. A task-farming master program spawns and withdraws subtasks depending on its balance relative to its counterparts. Spawn does not consider heterogeneous resources and is mainly targeted at Monte Carlo simulation applications. Rexec/Anemone [7] implements proportional resource sharing in clusters: users assign utility values to their applications, and the system allocates resources proportionally. Cost requirements are not considered. In JaWS (Java Webcomputing System) [8], machines are assigned to applications via an auction in which the highest bidder wins. None of these solutions uses a continuous double auction.
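For readers unfamiliar with the Vickrey auction used by Spawn, a minimal Python sketch of a single-item second-price sealed-bid auction follows. The function name and the optional reserve price are illustrative assumptions; Spawn's actual funding and bidding logic is not shown.

```python
def vickrey(bids, reserve=0.0):
    """Second-price sealed-bid auction for one item.

    bids maps bidder -> sealed bid. The highest bidder wins but pays the
    second-highest bid, or the reserve price if it is the only eligible
    bidder. Ties go to the bidder that appears first in `bids`.
    """
    eligible = [(b, v) for b, v in bids.items() if v >= reserve]
    if not eligible:
        return None
    eligible.sort(key=lambda bv: -bv[1])  # stable sort keeps tie order
    winner = eligible[0][0]
    price = eligible[1][1] if len(eligible) > 1 else reserve
    return winner, price


print(vickrey({"a": 5.0, "b": 8.0, "c": 7.0}))  # ('b', 7.0)
```

Because the winner pays the runner-up's bid rather than its own, bidding one's true value is a dominant strategy, a property that makes this auction attractive for allocating resources among self-interested bidders.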

5.3 Credit-Based Model

Mojo-Nation [10] and Samsara [11] belong to this category. In both, storage contributors earn credits or claims by providing storage space and spend them when needed; this is essentially a bartering scheme.

5.4 Theoretical Analysis

Das et al. [14] explored the interaction between human subjects and software bidding agents using strategies based on extensions of the Gjerstad-Dickhaut [12] and Zero-Intelligence-Plus [13] algorithms in a continuous double auction. Their main concerns are the gains of human subjects and software agents and the trading equilibrium. Wolski et al. [15] measured the efficiency of resource allocation under two different market conditions, commodity markets and auctions, in terms of price stability, market equilibrium, consumer efficiency, and producer efficiency, using a hypothetical mathematical model.

6 Conclusion

Using economic models to schedule tasks in a worldwide, geographically distributed environment is an effective approach. In this paper, we present Gridmarket, a practical, simple, yet efficient scheduling infrastructure. Gridmarket is built on resource standardization, a continuous double auction, and intuitive, straightforward pricing algorithms based on the price elasticity inherent in consumers and suppliers. Through Gridmarket, software agents for consumers and suppliers can automatically bid for resources to execute tasks and sell idle resources, respectively. Gridmarket efficiently equates resource demand with supply through the continuous double auction and a price-tracing mechanism within a reasonable price range. Preliminary simulation results demonstrate its efficacy in terms of resource allocation and its efficiency in terms of resource utilization.


References
1. Adam Smith, An Inquiry into the Nature and Causes of the Wealth of Nations, 1776.
2. R. Buyya, D. Abramson, J. Giddy, and H. Stockinger, Economic Models for Resource Management and Scheduling in Grid Computing, Special Issue on Grid Computing Environments, The Journal of Concurrency and Computation: Practice and Experience (CCPE), Wiley Press, May 2002.
3. G. Heiser, F. Lam, and S. Russell, Resource Management in the Mungi Single-Address-Space Operating System, Proceedings of Australasian Computer Science Conference, February 4-6, 1998, Perth, Australia, Springer-Verlag, Singapore, 1998.
4. Y. Amir, B. Awerbuch, A. Barak, S. Borgstrom, and A. Keren, An Opportunity Cost Approach for Job Assignment in a Scalable Computing Cluster, IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 7, pp. 760-768, IEEE CS Press, USA, July 2000.
5. C. Waldspurger, T. Hogg, B. Huberman, J. Kephart, and W. Stornetta, Spawn: A Distributed Computational Economy, IEEE Transactions on Software Engineering, Vol. 18, No. 2, pp. 103-117, IEEE CS Press, USA, February 1992.
6. W. Vickrey, Counter-speculation, Auctions, and Competitive Sealed Tenders, Journal of Finance, Vol. 16, No. 1, pp. 9-37, March 1961.
7. B. Chun and D. Culler, Market-based Proportional Resource Sharing for Clusters, Technical Report CSD-1092, University of California, Berkeley, USA, January 2000.
8. S. Lalis and A. Karipidis, An Open Market-Based Framework for Distributed Computing over the Internet, Proceedings of the First IEEE/ACM International Workshop on Grid Computing (GRID 2000), Dec. 17, 2000, Bangalore, India, Springer Verlag Press, Germany, 2000.
9. K. Reynolds, The Double Auction, Agorics, Inc., 1996, http://www.agorics.com/Library/Auctions/auction6.html.
10. Mojo Nation, http://www.mojonation.net/, October 2003.
11. L. P. Cox and B. D. Noble, Samsara: Honor Among Thieves in Peer-to-Peer Storage, Proceedings of the 19th ACM Symposium on Operating Systems Principles, October 2003.
12. S. Gjerstad and J. Dickhaut, Price Formation in Double Auctions, Games and Economic Behavior, Vol. 22, No. 1, pp. 1-29, 1998.
13. D. Cliff and J. Bruten, Minimal-Intelligence Agents for Bargaining Behaviors in Market-Based Environments, Technical Report HPL-97-91, Hewlett Packard Labs, 1997.
14. R. Das, J. Hanson, J. Kephart, and G. Tesauro, Agent-Human Interactions in the Continuous Double Auction, Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI), August 4-10, 2001, Seattle, Washington, USA.
15. R. Wolski, J. S. Plank, J. Brevik, and T. Bryan, Analyzing Market-Based Resource Allocation Strategies for the Computational Grid, The International Journal of High Performance Computing Applications, Sage Science Press, Vol. 15, No. 3, Fall 2001, pp. 258-281.
16. R. Raman, M. Livny, and M. Solomon, Matchmaking: Distributed Resource Management for High Throughput Computing, Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, July 28-31, 1998, Chicago, IL.
17. N. C. Hock, Queuing Modelling Fundamentals, John Wiley & Sons Ltd., 1997.

A Distributed Approach for Resource Pricing in Grid Environments

Chuliang Weng, Xinda Lu, and Qianni Deng

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, 200030, People’s Republic of China
{weng-cl, lu-xd, deng-qn}@cs.sjtu.edu.cn

Abstract. A distributed group-pricing algorithm is presented for market-based resource management in the grid context; it combines the quick convergence of centralized algorithms with the scalability of distributed algorithms. In the new algorithm, resources in the system are divided into multiple resource groups according to the degree of price correlation among resources. When the demand and supply of resources in the system change, each auctioneer in the defined system structure adjusts the prices of one resource group, all auctioneers working simultaneously, until the excess demand of all resources becomes zero. We test the distributed group-pricing algorithm against an existing distributed algorithm and analyze its properties. Experimental results indicate that the distributed group-pricing algorithm reaches an equilibrium more quickly than the existing distributed algorithm.

1 Introduction

As a new infrastructure for next-generation computing, grid systems enable the sharing, selection, and aggregation of geographically distributed heterogeneous resources for solving large-scale problems in science, engineering, and commerce [1]. Many studies have focused on providing middleware and software programming layers to facilitate grid computing. A number of projects, such as Globus [2] and Legion [3], deal with a variety of problems, such as resource specification, information services, and security, in grid computing environments involving different administrative domains. Grid resources are geographically distributed across multiple administrative domains and owned by different organizations. These characteristics lead to the following difficulties: there is no uniform strategy for resource management, because resources belong to different organizations that have their own local resource management strategies; the dynamic nature of resources should be made transparent to grid users by appropriate methods; and resources are heterogeneous, differing in many aspects. Dynamism and heterogeneity are inherent characteristics of grid computing systems, and autonomy is a further characteristic specific to the grid, since resources are distributed across multiple administrative domains.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 620–627, 2004. © Springer-Verlag Berlin Heidelberg 2004

The market mechanism is well suited to solving the problem of resource management in the grid context: the market mechanism in economics is based on distributed self-determination, which also suits resource management in the grid; the variation of prices reflects the supply and demand of resources; and market theory in economics provides a precise characterization of the efficiency of resource allocation. Market-based resource allocation can be divided into two sub-problems: how to determine the price of resources, and how to allocate resources to achieve highly effective resource utilization in response to current resource prices. In this paper, we focus on the first problem, i.e., how to determine the general equilibrium price. According to general equilibrium theory, the tâtonnement process [4] varies the prices of resources until an equilibrium is reached. Generally, there are two kinds of pricing methods. The distributed independent pricing method adjusts the price of each individual resource, in a distributed manner, according to the balance of supply and demand for that resource alone. The centralized simultaneous pricing method varies the prices of all resources simultaneously, in a centralized manner, according to the balance of supply and demand for all resources.

In Section 2, we give a brief overview of current research on resource pricing for grid computing. A system structure for resource pricing is described in Section 3, and a distributed group-pricing algorithm is presented in Section 4. In Section 5, we test the performance of the presented algorithm. Finally, we conclude the paper in Section 6.

2 Related Works

Research efforts on resource management for grid computing based on economic principles include [5,6,7,8]: the distributed pricing method is studied in [5], and the centralized pricing method in [6,7]. In GRACE [8], prices were assigned to resources artificially in economics-based resource scheduling experiments, leaving no room for optimizing resource allocation. The distributed WALRAS pricing algorithm is presented in [9], together with a discussion of its properties. Distributed independent pricing methods and centralized simultaneous pricing methods are compared in [10]. The distributed independent pricing method is suitable for large-scale distributed systems, and its complexity is relatively low; however, it reaches equilibrium more slowly because it does not consider the correlation among different resource prices. In contrast, the centralized simultaneous pricing method converges quickly because it considers price correlation; however, its centralized manner is not suitable for large-scale grid computing systems, and its complexity increases quickly as the number of resources increases [11].
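To make the distributed independent pricing method concrete, the following Python sketch runs a basic tâtonnement in which each resource price moves in the direction of its own excess demand, without regard to the other resources. The Cobb-Douglas exchange economy, the step size, and all function names are illustrative assumptions; this is neither the WALRAS algorithm [9] nor the algorithm of Section 4.

```python
def excess_demand(prices, endowments, shares):
    """Total excess demand in a Cobb-Douglas exchange economy.

    Domain i owns endowments[i][j] units of resource j and spends the
    fraction shares[i][j] of its wealth (endowment value) on resource j.
    """
    n = len(prices)
    z = [-sum(e[j] for e in endowments) for j in range(n)]  # minus total supply
    for e, a in zip(endowments, shares):
        wealth = sum(p * x for p, x in zip(prices, e))
        for j in range(n):
            z[j] += a[j] * wealth / prices[j]              # plus demand
    return z


def tatonnement(prices, endowments, shares, step=0.5, tol=1e-9, max_iter=10000):
    """Independent price adjustment: p_j += step * z_j for every resource j."""
    p = list(prices)
    for it in range(max_iter):
        z = excess_demand(p, endowments, shares)
        if sum(v * v for v in z) ** 0.5 < tol:
            return p, it
        p = [max(pj + step * zj, 1e-9) for pj, zj in zip(p, z)]  # keep prices positive
    return p, max_iter


endowments = [[1.0, 0.0], [0.0, 1.0]]
shares = [[0.5, 0.5], [0.5, 0.5]]
p, iters = tatonnement([1.0, 2.0], endowments, shares)
print(p[0] / p[1], iters)  # the relative price approaches 1.0, this economy's equilibrium
```

Each price here reacts only to its own market, which is what makes the method easy to distribute but slow when prices are strongly correlated.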


C. Weng, X. Lu, and Q. Deng

3 System Structure

Resources in the grid are organized as resource domains, which are individual and autonomous administrative domains, and multiple resource domains are integrated into a seamless grid by grid middleware such as Globus and Legion. A grid consists of multiple distributed resource domains in which resources are used by self-determination, and each resource domain contains different kinds of resources. The supply and demand of resources in a resource domain vary over time, and resource prices should reflect this variation across the system. A system structure for pricing resources in the grid context, based on the Globus Toolkit, is illustrated in Fig. 1.

Fig. 1. System Structure

In Fig. 1, one kind of agent, denoted R-Agent (resource domain agent), manages a local resource domain based on GRAM in the Globus Toolkit. An R-Agent is responsible for collecting information on the supply and demand of resources within its resource domain in response to given resource prices, calculating the excess demand of resources in the domain, and submitting this excess demand information to the auctioneers. The other kind of agent, denoted R-Auctioneer (resource group auctioneer), is responsible for pricing resource groups to achieve an equilibrium. Located in the WAN, each R-Auctioneer is in charge of pricing one resource group in the grid system and communicates with the R-Agents, through middleware modules provided by the Globus Toolkit, to collect information on the supply and demand of resources. The pricing system thus consists of two kinds of agents. Usually, the pricing process needs more than one iteration when the supply and demand of resources change, so the WAN communication between R-Agents and R-Auctioneers needed to adjust prices to an equilibrium should be minimized; the pricing approach presented next is designed to meet this requirement.


4 The Pricing Algorithm

In this section, we present a distributed group-pricing algorithm. First, resources in the grid are divided into multiple resource groups according to the degree of price correlation. Then, when a change in the supply and demand of resources triggers a tâtonnement process, the prices of each resource group are adjusted independently of the other groups, while the prices of resources within the same group are adjusted simultaneously according to the equilibrium of that group. This procedure is repeated until the global equilibrium of all resources is reached. The algorithm is described formally as follows. The total number of resource domains in the grid system is denoted by M, and N denotes the total number of resources. According to the degree of price correlation, resources are divided into G groups; correspondingly, there are G R-Auctioneers. Assuming that the number of resources in resource group k is $N_k$ ($k = 1, 2, \ldots, G$), we have:

$$\sum_{k=1}^{G} N_k = N \qquad (1)$$
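The text does not specify how the degree of price correlation is measured. One hypothetical way, sketched below in Python, is to compute the Pearson correlation of historical price series and put any two resources whose absolute correlation reaches a threshold into the same group, taking connected components with union-find. The threshold, the sample data, and the function names are all assumptions. Because every resource lands in exactly one group, the group sizes always add up to the total number of resources N.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def group_by_correlation(history, threshold=0.8):
    """Partition resources into groups of strongly price-correlated resources.

    history[j] is the price series of resource j. Resources whose absolute
    correlation is >= threshold are linked; groups are connected components.
    """
    parent = list(range(len(history)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(history)):
        for j in range(i + 1, len(history)):
            if abs(pearson(history[i], history[j])) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for j in range(len(history)):
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())


history = [
    [1.0, 1.2, 1.4, 1.3, 1.6],  # resource 0
    [2.0, 2.4, 2.8, 2.6, 3.2],  # resource 1: moves with resource 0
    [5.0, 4.0, 5.0, 4.0, 5.0],  # resource 2: unrelated pattern
]
print(group_by_correlation(history))  # [[0, 1], [2]]
```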

The prices of resource group k are adjusted by R-Auctioneer k and are denoted by the price vector $P^k = (p^k_1, p^k_2, \ldots, p^k_{N_k})$. The algorithm for R-Auctioneer k is as follows:

1. Initialize the price vector $P^k_0$ of the resources with the previous equilibrium price.

2. Receive the excess demand function $Z^k_i(P^k)$ from R-Agent i, i = 1, 2, …, M, and form the total excess demand of group k:
$$Z^k(P^k) = \sum_{i=1}^{M} Z^k_i(P^k)$$

3. Calculate the new price vector. Using the Taylor series for multivariable functions, we have the approximation
$$Z^k(P^k) \approx Z^k(P^k_0) + Z^{k\prime}(P^k_0)\,(P^k - P^k_0) \qquad (2)$$
where $Z^{k\prime}$ is the first derivative of the vector function $Z^k$, whose elements are
$$\frac{\partial Z^k_j}{\partial p^k_l}, \qquad j, l = 1, 2, \ldots, N_k,$$
so that, written as a matrix,
$$Z^{k\prime}(P^k) = \left[\frac{\partial Z^k_j}{\partial p^k_l}\right]_{N_k \times N_k}.$$
Setting the right-hand side of (2) to zero gives
$$P^k = P^k_0 - \left[Z^{k\prime}(P^k_0)\right]^{-1} Z^k(P^k_0). \qquad (3)$$
A new price vector is obtained from equation (3) with the initial $P^k_0$; we then substitute the new price vector for $P^k_0$ in equation (3) and evaluate equation (3) repeatedly until $\lVert Z^k(P^k) \rVert < \varepsilon$. We denote this price vector by $\hat{P}^k$, where $\varepsilon$ is a given equilibrium threshold.

4. Determine the amplitude of the price variation of group k, i.e., how far $\hat{P}^k$ has moved from $P^k_0$. The flag for price variation is then determined by comparing this amplitude with the given price threshold.

5. Send the new price vector and the flag for price variation to all R-Agents. Each R-Agent obtains the new price vector P and the flag vector by combining the individual price vectors and flags received from the R-Auctioneers. If P satisfies $Z(P) \approx 0$, i.e., the total excess demand approximately equals zero, then P is a new equilibrium price vector, denoted $P^*$. Otherwise, the R-Agent needs to calculate the new excess demand function of its resource domain with the algorithm shown in Fig. 2.

Fig. 2. The algorithm for R-Agents that is used to calculate the new excess demand function

When the supply and demand of resources in the grid system change, the R-Auctioneers and R-Agents repeatedly carry out the above algorithms in turn until $Z(P) \approx 0$, which means that a new equilibrium has been reached.
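The R-Auctioneer's Newton-type update and the outer loop can be sketched in Python. This is an illustrative reconstruction under stated assumptions, not the authors' code: a Cobb-Douglas exchange economy stands in for the excess demand reported by the R-Agents, the Jacobian is estimated by finite differences, and the groups are processed one after another (the asynchronous variant discussed in Section 5) rather than strictly in parallel, because with every price free to move, strictly simultaneous block updates can oscillate in this toy economy. All function names are hypothetical.

```python
def excess_demand(prices, endowments, shares):
    """Cobb-Douglas total excess demand (stand-in for the R-Agents' reports)."""
    n = len(prices)
    z = [-sum(e[j] for e in endowments) for j in range(n)]
    for e, a in zip(endowments, shares):
        wealth = sum(p * x for p, x in zip(prices, e))
        for j in range(n):
            z[j] += a[j] * wealth / prices[j]
    return z


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def newton_group(prices, group, z_fn, eps=1e-10, h=1e-7, max_iter=50):
    """One R-Auctioneer: drive the group's excess demand to ~0 with Newton steps.

    Prices outside `group` stay fixed; each step solves J * step = -Z_group,
    with J the finite-difference Jacobian restricted to the group.
    """
    p = list(prices)
    for _ in range(max_iter):
        z = z_fn(p)
        zg = [z[j] for j in group]
        if sum(v * v for v in zg) ** 0.5 < eps:
            break
        jac = [[0.0] * len(group) for _ in group]
        for col, l in enumerate(group):
            q = list(p)
            q[l] += h
            zq = z_fn(q)
            for row, j in enumerate(group):
                jac[row][col] = (zq[j] - z[j]) / h
        step = solve(jac, [-v for v in zg])
        for col, l in enumerate(group):
            p[l] = max(p[l] + step[col], 1e-9)  # keep prices positive
    return p


def group_pricing(prices, groups, z_fn, tol=1e-8, max_rounds=100):
    """Repeat group updates until the global excess-demand norm falls below tol."""
    p = list(prices)
    for rounds in range(1, max_rounds + 1):
        for group in groups:
            p = newton_group(p, group, z_fn)
        if sum(v * v for v in z_fn(p)) ** 0.5 < tol:
            return p, rounds
    return p, max_rounds


endowments = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shares = [[0.2, 0.3, 0.5], [0.5, 0.2, 0.3], [0.3, 0.5, 0.2]]


def z_fn(p):
    return excess_demand(p, endowments, shares)


p, rounds = group_pricing([1.0, 2.0, 1.5], [[0, 1], [2]], z_fn)
print([x / p[2] for x in p], rounds)  # relative prices approach 1.0 in this economy
```

The stopping test uses the square root of the sum of squared excess demands, the same quantity the experiments below use as their convergence metric.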

5 Experiments and Discussion

In this section, we test the performance of the distributed group-pricing algorithm against that of the WALRAS algorithm [9] by simulation experiments, and analyze the properties of the presented algorithm. The CES (constant elasticity of substitution) utility function [12] is chosen to allow a valid comparison between the two algorithms; the utility of each resource domain is a CES function of the resources it holds.


We choose the following performance metrics: (1) the number of iterations in which the R-Auctioneers and R-Agents execute their algorithms in turn until an equilibrium is achieved; and (2) the square root of the sum of the squared excess demands of all resources, which reflects the overall degree of convergence during the pricing process. In the experiments, the termination condition for iteration is We set =2, and randomly generate the coefficients from a uniform distribution over [0.1, 200]. There are 50 resource domains, and their endowments are uniformly distributed in the range [2000, 3000]. In the first situation, there are 10 resources. Correspondingly, there are 10 R-Auctioneers in the WALRAS algorithm, each responsible for adjusting the price of one resource. For the distributed group-pricing algorithm, the resources are divided into 3 resource groups according to the degree of price correlation, so there are 3 R-Auctioneers, one per group. The result of the price adjustment process is shown in Fig. 3(a). In the second situation, there are 20 resources, divided into 3 resource groups for the distributed group-pricing algorithm; the result is shown in Fig. 3(b). The third situation is the same as the second except that, in the distributed group-pricing algorithm, the resources are divided into 6 resource groups; the result is shown in Fig. 3(c). According to Fig. 3(a) and Fig. 3(b), the number of iterations needed to achieve equilibrium increases for both pricing algorithms as the number of resources in the system increases, because the price of each resource is influenced by more kinds of other resources.
We also find that, for a fixed number of resources, the more groups the resources are divided into, the more iterations the distributed group-pricing algorithm needs to achieve equilibrium, as illustrated by Fig. 3(b) and Fig. 3(c). This can be explained by the fact that more groups entail more price interaction among groups. Experimental results indicate that the distributed group-pricing algorithm reaches an equilibrium more quickly than the WALRAS algorithm. The rationale behind distributed group pricing is that it not only divides all resources into groups for scalability, an idea borrowed from the distributed WALRAS algorithm, but also adjusts prices simultaneously within each resource group for quick convergence, an idea borrowed from the traditional centralized pricing method. Another important issue is that the WALRAS algorithm is suitable for asynchronous pricing [9]. Since pricing between resource groups resembles pricing between individual resources in the WALRAS algorithm, the distributed group-pricing algorithm is also suitable for asynchronous pricing; that is, each R-Auctioneer can adjust the prices of its resource group asynchronously, and in the long run this process will also lead to a global general equilibrium.


Fig. 3. Price adjusting. (a) The first situation; (b) The second situation; (c) The third situation

6 Conclusions

How to determine the price of resources is one of the key issues for market-based resource management in grid computing systems. In this paper, a distributed group-pricing algorithm is presented for determining prices according to general equilibrium theory. Based on group pricing with good scalability, the algorithm is suitable for grid computing systems; its performance is tested in different situations, and its properties are analyzed. Experiments show that the algorithm can quickly adjust prices to an equilibrium when the supply and demand of resources change.

Acknowledgements. This research was supported by the National Natural Science Foundation of China, No. 60173031.


References
1. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. The International Journal of Supercomputer Applications, Vol. 15, No. 3 (2001) 200-222
2. Foster, I., Kesselman, C.: The Globus Project: A Status Report. Future Generation Computer Systems, Vol. 15, No. 5 (1999) 607-621
3. Natrajan, A., Humphrey, M.A., Grimshaw, A.S.: The Legion Support for Advanced Parameter-space Studies on a Grid. Future Generation Computer Systems, Vol. 18, No. 8 (2002) 1033-1052
4. Varian, H.: Microeconomic Analysis. 3rd edn, W. W. Norton & Company, Inc., New York (1992)
5. Cao, H., Xiao, N., Lu, X., Liu, Y.: A Market-based Approach to Allocate Resources for Computational Grids. Computer Research and Development (Chinese), Vol. 39, No. 8 (2002) 913-916
6. Wolski, R., Plank, J., Brevik, J., Bryan, T.: Analyzing Market-based Resource Allocation Strategies for the Computational Grid. The International Journal of High Performance Computing Applications, Vol. 15, No. 3 (2001) 258-281
7. Subramoniam, K., Maheswaran, M., Toulouse, M.: Towards a Micro-Economic Model for Resource Allocation in Grid Computing System. In: Proceedings of the 2002 IEEE Canadian Conference on Electrical & Computer Engineering (2002) 782-785
8. Buyya, R.: Economic-based Distributed Resource Management and Scheduling for Grid Computing [Ph.D. Dissertation]. School of Computer Science and Software Engineering, Monash University, Australia (2002) 61-79
9. Cheng, J., Wellman, M.: The WALRAS Algorithm: A Convergent Distributed Implementation of General Equilibrium Outcomes. Computational Economics, Vol. 12, No. 1 (1998) 1-24
10. Ygge, F.: Market-Oriented Programming and Its Application to Power Load Management [Ph.D. Dissertation]. Department of Computer Science, Lund University, Sweden (1998) 65-78
11. Zhang, J.: Economic Cybernetics (Chinese). Tsinghua University Press, Beijing (1989)
12. Zhang, J.: Mathematical Economics – Theory and Application (Chinese). Tsinghua University Press, Beijing (1998)

Application Modelling Based on Typed Resources*

Cheng Fu and Jinyuan You

Department of CS, Shanghai Jiao Tong Univ., China
{fucheng, you-jy}@cs.sjtu.edu.cn

Abstract. We have developed a type system for the calculus of Safe Mobile Resources (SR), a variant of Mobile Resources (MR). In this paper, we show the expressive power of the calculus. Several examples illustrate how to use the features of SR to model common distributed applications in a mobile or cooperative environment.

1 SR Review

The calculus of Safe Mobile Resources (SR) is a variant of the calculus of Mobile Resources (MR) with enhanced capabilities to enforce full coactions. Its three essential behaviors are listed below:

The first one shows the operation of resource consumption. In SR, a resource can be embedded in an ambient at any level and remains accessible to any outer process, as long as that process has the specific relative path name of the resource. In (1), for example, is the relative path name. The second reduction shows how a resource (a process) moves between two places. Three processes take part in this reduction: a resource is moved passively, with the permission (coactions) of both a sender process and a receiver process. The last formula shows the deletion of an ambient; semantically, ambient deletion does not support path-name access. Let be a countable set of names ranged over by a, b, ..., n, m. The set of all processes is denoted by (ranged over by p, q, ...) and the set of capabilities (ranged over by In the typed version, we use a set of restricted names where and to represent the typed names in an abbreviated form. Capabilities and simple capabilities are defined as in SA. We write for the free names of the process p, and for those of stands for one-step reduction. The definition of the structural congruence relation is standard; Table 2 lists a number of structural congruence rules for commuting the positions of parallel components and for stretching the scope of restrictions. Contexts and path contexts are defined as in MR and The grammar and syntax of SR are shown in Table 1, the structural congruence relation in Table 2, and the reduction rules in Table 3.

* This paper is supported by the Shanghai Science and Technology Development Foundation project (No. 03DZ15027) and the State Natural Science Foundation project (No.60173033).

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 628–635, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Type System

The SR type system is designed to capture the mobility, threadness and resource-type attributes of capabilities, ambients and processes. The type grammars for SR are shown in Table 4. The notation used in our type system is inspired by SA and ETS-MT [3]. For instance, an immobile single-threaded process can be typed as Proc where stands for an immobile process and the superscript gives the total number of threads in the process, whereas a process typed as is a mobile multi-threaded process. We use X to range over the ambient type set Z over the capability type set and T over the inner process type set To identify the mobility [1,5] of a process, we observe whether it remains in the same place during reduction. In SR, only and satisfy this property. We have two mobile types: (mobile) and (premobile). A premobile process has one more feature than a mobile one: it can cause the ambient enclosing it to become mobile, while a mobile process cannot. There are also two immobile types: and (accompanyable). An accompanyable process can cause another process to emerge in parallel with it, while an immobile one cannot.

We also need some basic types to denote the types of the resources over which the mobile resource types are built. We use as the collection of all basic resource types, to indicate any subset of the basic types used in the current process, and to indicate any basic type in The intuitive meanings of the grammar productions are as follows. Amb[T, is the type of an ambient that can contain a process typed as Proc[T, Cap[T, K] is the type of a capability. If K is Shh, the capability neither consumes nor is consumed as a resource; otherwise, the target resource or capability to be consumed must coincide with the specified resource type. Proc[T, is the type of a process that has inner type T and can cause resource consumption of the types in The typing rules are shown in Table 5. The commutative operators ·, | on on and on are defined as follows:

We then define a reflexive, transitive subtyping relation [6,7] on process types, summarized below. We use on on and on

We allow subtyping only on processes: without the reduction behavior of processes, such a relation would make no sense for ambients and capabilities, and subtyping on processes suffices to show the relation between the processes in our calculus.

Theorem 1. If and and then with

Proof. The proof is by induction on the derivation of

Example 2. Consider the following process

By the assumption that and with and we can easily derive Using a process of similar form, we can model an immobile server [3] that provides different services within a named boundary.

Example 3. Consider the following process

Although most processes like the one in the previous example must be typed as immobile, by (SRT Cap) and (SRT Amb 2) the resource ambient res here can only be typed as premobile. This form therefore lets us model mobile resources. However, for a mobile ambient to be actually moved during reduction, it must be placed inside a container ambient; outer processes can then fetch and use it.

3 Applications

3.1 An Auto Delivery Cola Machine

This model is an enhanced version of the vending machine in MR. We divide the model into two parts: the machine and the consumer. We use ambient to denote a slot for the credit card, to denote a can that contains cola, and pck to denote the pocket; the other names are self-explanatory. When consumption occurs, card must be a private name so that no other process can access the resources inside card.

We now show how the interaction between the machine and the consumer is performed.

In step (4), ambient card is taken from pck to the slot; in (5), resource ecash is consumed; in (6), the cola in the can is consumed; next, the card is fetched back; finally, the can is removed (into the litter bin). We then apply the SR type system to this model. By assuming ecash : cola : we deduce the following typing results for the other names and processes:

We omit the formal deduction steps. The results show that two resource consumptions occur in the process consuming. The machine and the consumer remain immobile. The can ambient is mobile because it can be deleted; the slot is mobile because it can contain a mobile card; card is mobile because it can be taken and given; and pck is mobile for the same reason as the slot.

3.2 Digital Signature Card

The digital signature card is one of the main issues discussed in [2]. Here we provide a modified version in which every movement is controlled by all participants. We then apply the type system to the model and show that it is well typed.


By assuming

we have the following result:

The deduction steps are omitted, but intuitively the results can be explained as follows. Since there is no resource consumption in this model, the resource-type property is empty. is a secure mobile place that can hold classified data to be sent through a network. reg is an internal place where encryption and decryption take place; an internal place is immobile. in and out act as buffers for incoming and outgoing data. and are the local places that allow the and processes to perform their tasks. and are message boxes, one for sending and the other for receiving. msg acts as an envelope holding the message data and the signature, and network is the physical place where data transmission takes place. The processes and provide their functionality within a boundary and are thus immobile. The two processes are modelled as immobile services offering unlimited encryption and decryption operations, so they are multi-threaded. is a sender process and is a receiver process; both stay immobile and contain only one thread to perform their operations. is the whole model process. From the outside, it is an autonomous immobile system that actually runs two threads.

3.3 Resource Pooling

In large cooperative environments, resource pooling is one of the main issues. Here we give a model of a Servicing Resource Pool written in SR. In the following example, the process is the k-parameterized process in which all wanted resources gather in parallel. Each resource process, together with its carrier ambient, represents a single servicing resource. All servicing resources are located in ambient svr. The pooling process is denoted by where denotes how many resources there are in the pool. represents a process that requests a resource from the pool. is the process of the whole model, and parameter indicates how many jobs are requesting resources.
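The Python sketch below is only an operational analogy of the pooling behavior just described. It is not an encoding of SR and ignores types and coactions. k resources wait in a pool that stands in for ambient svr, and n jobs each fetch one resource, use it for a few rounds, and return it. The function run_pool, the round-by-round scheduling and the steps_per_job parameter are hypothetical choices made for illustration.

```python
from collections import deque


def run_pool(k, n, steps_per_job=2):
    """Simulate n jobs sharing a pool of k resources (hypothetical analogy).

    Returns the order in which jobs finished and the resources left in the
    pool at the end; every resource should have been returned.
    """
    if steps_per_job < 1:
        raise ValueError("steps_per_job must be at least 1")
    if k < 1 and n > 0:
        raise ValueError("a pool with no resources cannot serve any job")
    pool = deque(f"r{i}" for i in range(k))  # free resources, playing the role of svr
    waiting = deque(range(n))                # jobs still requesting a resource
    active = {}                              # job -> [resource, steps left]
    finished = []
    while waiting or active:
        # Hand free resources to waiting jobs, first come, first served.
        while waiting and pool:
            active[waiting.popleft()] = [pool.popleft(), steps_per_job]
        # Every job holding a resource uses it for one round.
        for job in list(active):
            active[job][1] -= 1
            if active[job][1] == 0:
                resource, _ = active.pop(job)
                pool.append(resource)        # return the resource to the pool
                finished.append(job)
    return finished, sorted(pool)


print(run_pool(2, 5))  # ([0, 1, 2, 3, 4], ['r0', 'r1'])
```

With multiple pools and jobs that hold several resources at once, this simple scheme could deadlock, which is the concern the paper raises for larger models.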


The process in each job process can access the resource process inside ambient reg. Moreover, must contain the capability mem reg pool to return the resource to the pool once the job no longer needs it. We can further build a larger model with multiple pools for different resources, in which jobs fetch different kinds of resources from different pools. In such a model, concurrency problems such as deadlock and starvation need more attention. To type the above model, we assume and Proc where and we obtain the following results:

4 Conclusions and Related Works

In this paper, we briefly introduced a type system for SR and examined several examples showing how SR can be used to model common applications. Applications modelled in SR are usually somewhat more complex than their MR counterparts, but the security risks are greatly reduced, because every mobile action must be agreed upon by all participants. In addition, the type system defined on the calculus captures the mobility, threadness and resources of SR processes. To fully eliminate grave interferences [4] in SR, a more complex type system is under research. Furthermore, bisimulation congruences for the typed calculi need to be developed. The expressive power of SR also still appears insufficient, and much work remains, such as encoding ambients.

References
[1] L. Cardelli, G. Ghelli, and A. D. Gordon. Mobility types for mobile ambients. Technical Report MSR-TR-99-32, Microsoft Research, 1999.
[2] J. C. Godskesen, T. Hildebrandt, and V. Sassone. A calculus of mobile resources. In Proc. CONCUR ’02, volume 2412 of Lecture Notes in Computer Science, 2002.
[3] X. Guan, Y. Yang, and J. You. Typing evolving ambients. Information Processing Letters, 80(5):265–270, 2001.
[4] F. Levi and D. Sangiorgi. Controlling interference in ambients. Short version appeared in Proc. 27th POPL, ACM Press, 2000.
[5] M. Coppo, M. Dezani-Ciancaglini, E. Giovannetti, and I. Salvo. M3: Mobility types for mobile processes in mobile ambients. In Proc. CATS ’03, volume 78 of Electronic Notes in Theoretical Computer Science, 2003.
[6] B. Pierce and D. Sangiorgi. Typing and subtyping for mobile processes. Journal of Mathematical Structures in Computer Science, 6(5):409–454, 1996. An extended abstract in Proc. LICS 93, IEEE Computer Society Press.
[7] P. Zimmer. Subtyping and typing algorithms for mobile ambients. In Proc. FoSSaCS ’00, volume 1784 of Lecture Notes in Computer Science, pages 375–390, 2000.

A General Merging Algorithm Based on Object Marking*
Jinlei Jiang and Meilin Shi
Department of Computer Science and Technology, Tsinghua University, Beijing, P. R. China, 100084
{jjlei, shi}@csnet4.cs.tsinghua.edu.cn

Abstract. Cooperative applications commonly need to merge different versions of an object into a common state. Though many approaches exist, they are either too complex to implement or not flexible enough to meet various high-level requirements. To solve this problem, we develop a general merging algorithm based on object marking, i.e., the contents of an object are marked with appropriate labels. The paper details the algorithm and uses an example to show how to recover the operation context and how to detect and resolve operation conflicts. The algorithm is efficient and flexible enough to let users specify various merging policies, so it can be implemented as a common service for cooperative applications.

1 Introduction

Cooperative applications commonly need to merge different versions of an object into a common state [4]. For example, while collaboratively producing a document or some other artifact, collaborators often find that they have created two versions, each containing revisions that they wish to combine in a single version. The task is then to take the set of revisions from one version and re-apply them to the other. Another scenario requiring merging is mobile computing, where users replicate objects to their local machines while online, then disconnect from the server and manipulate the objects offline as they move. Eventually, the different copies of an object are gathered and merged into a single one. To merge the contents, we must first identify the differences, which can be done with differencing tools; we can then re-apply the changes made to one copy to the other to obtain a new version of the object. Done by hand, this procedure is error-prone and time-consuming, so a tool that performs the merge automatically would be highly useful.

Existing merging tools fall into two categories: text-oriented and object-oriented. In text-oriented merging tools, the contents being operated on are plain text documents; examples include rcsmerge [7], semantic diff [3] and flexible diff [5]. In object-oriented merging tools, the contents are objects, which may have sophisticated structure; examples include GINA [1], transformation-based concurrency control [2, 6] and the flexible object merging framework [4]. Objects within cooperative applications usually have structures more complex than text documents and computer programs, which limits the applicability of text-oriented merging algorithms. Although the object-oriented paradigm solves the data representation problem, deficiencies remain. For example, GINA does not support automatic merging, transformation-based concurrency control algorithms are hard to implement, and the flexible object merging framework cannot handle whole-object operations, while its merge matrix entries explode for complex objects.

To address these issues, this paper develops a general merging algorithm based on object marking. It defines a set of symbols (called labels) to denote the changes made to an object. Based on the semantics of the labels, one can easily recover the operation context and detect and resolve operation conflicts.

The rest of the paper is organized as follows. Section 2 explains the basic idea behind our scheme. Section 3 describes the proposed algorithm in detail. Section 4 gives an example illustrating the algorithm, and Section 5 concludes the paper and outlines future work.

* This work is co-supported by the National Natural Science Foundation of China under Grant No.60073011, the National High Technology Research and Development 863 Program under Grant No.2001AA113150 and the 985 Project of Tsinghua University.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 636–643, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Object Marking

We use the Cova Object Description Language (CODL) [8] to represent the artifacts being operated on. CODL provides a basis for general structured objects and allows us to take an application-independent approach to merging objects. The idea behind object marking is simple: if the changes made by concurrent operations are marked by adding labels to the shared object, the difficulties caused by users' different original contexts can be overcome.

2.1 Labels and Context

A label is a symbol identifying the changes made to the contents of an object. The differences between a revised object and its base version fall into only three cases: new contents are inserted, existing contents are removed, or existing contents are updated. Therefore, three labels are defined to mark the object:
Insert label, denoted by I, marks inserted contents.
Delete label, denoted by D, marks deleted contents.
Update label, denoted by U, marks updated contents.
For changed contents, we also need to specify their scope. To do so, we borrow the idea of tagged data items from HTML/XML documents: a start tag and an end tag delimit the inserted or deleted contents. For the update label, the format looks like “X/Y”, where X and Y are the new and the old value respectively and “/” separates them. Note that only changed data items are labeled.

Users may join a session at different times, so the original object copies of different users differ. Without careful treatment, this affects the differencing procedure and yields a wrong merged result. To solve this problem, a version number is introduced to track the original context of each participant. In our algorithm, a version is represented by a natural number, and the original version of an object is always 0 (other values are also feasible). Combining version numbers with labels gives the full object-marking scheme: the marking labels can be uniformly represented as “...”, where Version is the context identifier and X is the label key.

2.2 Content Retrieval and Context Recovery

Content retrieval removes the labels (the contents sent to users contain no labels). Given the semantics of the labels, this process is straightforward: contents without labels or labeled with I are copied directly, contents labeled with D are omitted, and for updated contents the new values are copied. Version numbers are ignored during this process. The procedure runs when a user opens an object; meanwhile, the server records the user's context, which is used to resolve conflicts after the operation results are submitted. Once the object is opened, different users can work independently on their own copies.

Context recovery rebuilds the original object context for differencing and merging. Unlike content retrieval, it must treat labels carefully, because the object may have been modified in the meantime. Suppose the original context of a user is N. Contents labeled with a version number no greater than N need not be reversed, since the user has perceived them through the retrieval procedure. A version number greater than N, however, indicates contents the user has not perceived, so those labels must be reversed: inserted contents and the new values of updated contents are omitted, while deleted contents are restored.
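Both procedures can be sketched in Python under simplifying assumptions. Labels are flat, not nested. A marked object is modelled as a list of segments that each carry at most one label. A reversed update is assumed to show its old value. The Segment class and both function names are hypothetical and are not the paper's data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """A run of content with at most one marking label (hypothetical layout)."""
    text: str                    # content; for 'U' this is the new value
    label: Optional[str] = None  # None, 'I' (insert), 'D' (delete) or 'U' (update)
    version: int = 0             # version number attached to the label
    old: str = ""                # old value, used only when label == 'U'


def retrieve_content(segments):
    """Content retrieval: drop all labels, ignoring version numbers."""
    return "".join(s.text for s in segments if s.label != "D")


def recover_context(segments, n):
    """Context recovery for a user whose original context is version n."""
    out = []
    for s in segments:
        if s.label is None or s.version <= n:
            # Change already perceived by the user: read it as in retrieval.
            if s.label != "D":
                out.append(s.text)
        elif s.label == "D":
            out.append(s.text)   # deletion not yet perceived: restore it
        elif s.label == "U":
            out.append(s.old)    # update not yet perceived: use the old value
        # An 'I' newer than n was not perceived, so it is omitted.
    return "".join(out)


doc = [Segment("A"), Segment("1", "I", 1), Segment("B"),
       Segment("x", "D", 2), Segment("C", "U", 3, old="c")]
print(retrieve_content(doc))    # A1BC
print(recover_context(doc, 0))  # ABxc
print(recover_context(doc, 2))  # A1Bc
```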

2.3 Marking Criteria

The following rules must be obeyed during marking.
Rule 1 (scope maximization). A label should mark as much content as possible at a time. This rule reduces the number of labels present and thus saves space. According to this rule, labels “ab” should be changed into “ab”.
Rule 2 (well structured). By well structured we mean that 1) each label must have an end label, and 2) no label starts within another label and ends outside of it. For example, labels “.........” are not well structured because the label starts in between and and ends outside of them. However, labels “abcdef” are well structured.
Rule 3 (label nesting). 1) Nested labels must be well structured, and 2) the versions of nested labels must be greater than those of the labels nesting them. This rule is specifically designed to accelerate the context recovery process.
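Rule 1 can be illustrated with a short Python sketch that fuses adjacent segments carrying the same label and version. The flat Segment record and the function name maximize_scope are hypothetical. The sketch does not check Rules 2 and 3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    text: str                     # content; new value for 'U'
    label: Optional[str] = None   # None, 'I', 'D' or 'U'
    version: int = 0
    old: str = ""                 # old value for 'U' segments


def maximize_scope(segments):
    """Rule 1: fuse adjacent segments that carry the same label and version."""
    merged = []
    for s in segments:
        prev = merged[-1] if merged else None
        if prev is not None and (prev.label, prev.version) == (s.label, s.version):
            # One wider label replaces two narrow ones.
            merged[-1] = Segment(prev.text + s.text, s.label, s.version,
                                 prev.old + s.old)
        else:
            merged.append(s)
    return merged


marked = [Segment("a", "I", 1), Segment("b", "I", 1), Segment("c"), Segment("d", "I", 2)]
print([(s.text, s.label, s.version) for s in maximize_scope(marked)])
# [('ab', 'I', 1), ('c', None, 0), ('d', 'I', 2)]
```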

3 Merge Algorithm

This section looks into the merge algorithm.

3.1 Conflicts and Merge Policies

A merge policy is a set of rules that determine which revisions will be included in the merged object. We borrow the basic idea of the merge matrix from the flexible object merging framework, i.e., a merge policy is defined for each level of a structured object. However, we make some fundamental modifications. First, the merge matrix in our scheme is specified only on the three primitive operations (Insert, Delete and Update, as mentioned previously), and a complex operation is viewed as a combination of primitive operations; this avoids the entry explosion issue. Second, merge policies are predefined at the object level for the 8 primitive types and the 5 collection types (list, array, set, bag and dictionary). When no user-defined policies are specified, the default ones are used, which eases the burden of specifying merge policies. Third, our scheme can handle whole-object operations, which the flexible object merging framework cannot.

As in other object-oriented languages, a class cannot be restructured once it is instantiated. Therefore, a conflict arises only when two concurrent operations update the same object element with different values. The merge policy for primitive types is very simple, as defined in Table 1, where “–” means the corresponding case never occurs. Three choices are provided: user (denoted by F), meaning that a user or a program decides which value to keep (as in the flexible object merging framework, this is either a function that presents users with the alternative changes and asks them to select one, or a function that accepts the changes and returns the choice); both (B for short), meaning that the system keeps both values; and overwrite (O for short), meaning that the old value is replaced by the one submitted most recently (called the newest value hereafter). The default policy for primitive types is O. Policy B is used when users want to keep the revision history; although multiple values are recorded under B, only the newest value is used.
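The three primitive-type choices can be sketched in Python as a resolver for one Update-Update conflict. The chooser callback stands in for the F function. Its name and signature, like resolve_primitive itself, are hypothetical.

```python
def resolve_primitive(current, newest, policy="O", chooser=None):
    """Resolve an Update-Update conflict on a primitive value.

    current: value already stored at the server
    newest:  conflicting value submitted most recently
    Returns (values recorded, value in use).
    """
    if current == newest:
        return [current], current          # same value: no conflict
    if policy == "O":                      # overwrite: the newest value wins
        return [newest], newest
    if policy == "B":                      # both: keep history, use the newest
        return [current, newest], newest
    if policy == "F":                      # a user or program decides
        if chooser is None:
            raise ValueError("policy F needs a chooser function")
        chosen = chooser(current, newest)
        return [chosen], chosen
    raise ValueError(f"unknown policy {policy!r}")


print(resolve_primitive(3, 7))                    # ([7], 7)
print(resolve_primitive(3, 7, "B"))               # ([3, 7], 7)
print(resolve_primitive(3, 7, "F", chooser=min))  # ([3], 3)
```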

Merge policies for collection types fall into three categories: list and array, set and bag, and dictionary. We explain them in detail below.
Conflicts for list and array are as follows.
Insert-Insert conflict. Users insert different elements at the same place concurrently.
Delete-Update conflict. Some users alter an element while others concurrently delete it.
Update-Update conflict. Two participants concurrently update the same element with different values.
There is no Delete-Delete conflict, since either the targets are different or the users' intentions are the same. Nor is there an Insert-Delete conflict, because, as the work procedure requires, the results of concurrent operations cannot be perceived by each other; the same holds for Insert-Update. The merge policy for list and array is given in Table 2, where a dedicated symbol marks pairs of operations that are never compared or are compatible. The default policy is again O. For a Delete-Update conflict, O means that the element is deleted if the last submitted operation is Delete and kept otherwise. With policy B, both values are kept and used for an Insert-Insert conflict, whereas only the newest one is used for an Update-Update conflict.
Set and bag have no index defined on them, and their conflicts are as follows.
Delete-Update conflict. One user deletes an element while another user alters its value.
Update-Update conflict. Two users update the same element.
The merge policy for set and bag is shown in Table 3. Note that elements in a set must be unique. The default policy is B. For a Delete-Update conflict, B means that the updated result is kept. For an Update-Update conflict, O means that the result submitted most recently (the newest value) is kept. In addition, if two users concurrently insert the same value into a bag object, only one copy is kept, because their intentions are consistent in this case.
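Two of these rules can be sketched in Python: the Delete-Update resolution and the handling of concurrent inserts into a bag. Reading "only one value is kept" as a multiset union, where counts take the maximum and are not summed, is an assumption. The function names are hypothetical.

```python
from collections import Counter


def resolve_delete_update(last_op, updated_value, policy="O"):
    """Delete-Update conflict on one element.

    last_op is 'D' or 'U', whichever of the two operations was submitted last.
    Returns the value to keep, or None when the element is deleted.
    """
    if policy == "O":                      # list/array default
        return None if last_op == "D" else updated_value
    if policy == "B":                      # set/bag default: keep the update
        return updated_value
    raise ValueError(f"unsupported policy {policy!r}")


def merge_bag_inserts(base, inserted_a, inserted_b):
    """Merge two users' concurrent inserts into a bag.

    A value inserted by both users is kept only once: Counter union (|)
    takes the larger count instead of adding the counts.
    """
    return Counter(base) + (Counter(inserted_a) | Counter(inserted_b))


print(resolve_delete_update("D", 42))  # None
print(resolve_delete_update("U", 42))  # 42
print(sorted(merge_bag_inserts(["a"], ["x"], ["x", "y"]).elements()))
# ['a', 'x', 'y']
```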

Dictionaries are indexed by keys, which distinguishes them from lists and arrays. Their conflicts are as follows.
Insert-Insert conflict. Two users insert the same key with different values.
Update-Update conflict. Two concurrent users update the value of the same key.
Delete-Update conflict. One user deletes a key while others concurrently alter the corresponding value.
The merge policy for the dictionary type is described in Table 4. For an Insert-Insert conflict, both users' intentions are kept by default (note that one key must be altered in this case). The default policy for the other conflicting cases is O, as explained previously.

Merge policies for user-defined objects are specified recursively on their attributes: 1) if an attribute is of primitive or collection type, the policies discussed above apply, and 2) if an attribute is itself of a user-defined type, the merge policies for its attributes are used instead. When no policy is specified, the default ones are used. Merge policies are supplied as a policy profile, which is loaded each time the merge procedure is invoked and can be modified over time. Our scheme also allows an object to be treated as an atomic unit, by marking certain attributes of object types with the word “atomic” in the policy profile. Merging changes inside an atomic object makes no sense, so the merge policy for atomic objects is the same as that for primitive types. Finally, note that the policies for collection types hold only when the elements are of a primitive type or an atomic object type.
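The default dictionary rule for an Insert-Insert conflict keeps both values, so one key has to be altered. The Python sketch below shows this rule. The paper does not specify how the key is altered, so the "#n" suffix is a hypothetical renaming scheme, and dict_insert is a hypothetical name.

```python
def dict_insert(target, key, value, policy="B"):
    """Apply a concurrent insert of key -> value to the dictionary target.

    Returns the key under which value ended up being stored.
    """
    if key not in target or target[key] == value:
        target[key] = value                 # no conflict
        return key
    if policy == "O":                       # the newest value overwrites
        target[key] = value
        return key
    if policy == "B":                       # keep both users' intentions
        # Hypothetical renaming scheme: key#2, key#3, ...
        n = 2
        while f"{key}#{n}" in target:
            n += 1
        target[f"{key}#{n}"] = value
        return f"{key}#{n}"
    raise ValueError(f"unsupported policy {policy!r}")


d = {"color": "red"}
print(dict_insert(d, "color", "blue"))  # color#2
print(d)                                # {'color': 'red', 'color#2': 'blue'}
```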

3.2 Merge Procedure

The merge algorithm, denoted by MergeObject(oid, NC, ov), is illustrated as follows, where oid is the ID of the object to merge, NC denotes the operation results, and ov represents the operation context, i.e., the original version number.

The algorithm first loads the merge policies and then obtains one final common object by differencing and merging its attributes one by one. The function CompareObject first recovers the context to ov and then performs the comparison. The algorithm is invoked each time a user submits results; afterwards, the server refreshes the labels marking the object and increments the context version it maintains.
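Attribute-by-attribute merging can be sketched in Python as a recursive three-way merge. The sketch rests on several assumptions. Objects are modelled as nested dicts. Context recovery has already produced base, the object as seen at ov. Only policy O and "atomic" entries in the profile are handled. The function merge_object is hypothetical and is not the paper's MergeObject listing.

```python
def merge_object(current, base, submitted, profile, prefix=""):
    """Attribute-by-attribute three-way merge of one submission.

    current:   object state at the server (nested dicts model user-defined types)
    base:      state recovered for the submitter's original context ov
    submitted: the submitter's result (NC)
    profile:   attribute path -> policy; only 'O' and 'atomic' are handled here
    """
    merged = {}
    for attr, cur in current.items():           # classes are never restructured,
        old, new = base[attr], submitted[attr]  # so all three share attributes
        path = prefix + attr
        policy = profile.get(path, "O")         # default policy when none is given
        if policy not in ("O", "atomic"):
            raise ValueError(f"policy {policy!r} is not covered by this sketch")
        if isinstance(cur, dict) and policy != "atomic":
            merged[attr] = merge_object(cur, old, new, profile, path + ".")
        elif new == old:
            merged[attr] = cur                  # submitter left it unchanged
        else:
            merged[attr] = new                  # sole change, or O: newest wins
    return merged


current = {"title": "T2", "body": {"a": 1, "b": 2}}
base = {"title": "T", "body": {"a": 1, "b": 1}}
submitted = {"title": "T", "body": {"a": 5, "b": 1}}
print(merge_object(current, base, submitted, {}))
# {'title': 'T2', 'body': {'a': 5, 'b': 2}}
print(merge_object(current, base, submitted, {"body": "atomic"}))
# {'title': 'T2', 'body': {'a': 5, 'b': 1}}
```

Treating body as atomic replaces it as a whole, which discards the concurrent change to b.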

4 A Case Study

In this section, we illustrate the algorithm with a string object, which is defined as follows with most operations omitted.

The merge process with default policies is shown in Fig. 1, where the vertical axis represents time and the horizontal axis represents operation-related information such as participants, actions and operation results.

Fig. 1. Merging Process for a String Object

The numbers in the Ver. column record changes of the object version: the number in brackets is the original context of the corresponding user, and the number outside the brackets is the latest version maintained at the server. Users C1, C2 and C3 form a session and share the same original context. When the result from C1 is submitted, character “1” is identified as inserted and is therefore marked with an Insert label. When C4 joins the session, the content retrieval procedure (i.e., RetrieveContent) returns “A1B2CDEF”. After the results from C3 are merged, the object becomes “A1B2C3DEF”. Since labels with a version no greater than 2 are no longer needed, the label refreshing procedure removes them. As a result, we obtain “A1B2C3DEF ”. Operations from C3 and C4 conflict; according to the merge policy, the result from C4 is kept. Finally, the final object (“A2B3C4DE4”) is obtained by removing all the labels.
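Label refreshing can be sketched in Python on a flat segment model. We assume the threshold v is the smallest original context among the participants still in the session. Every label at or below v is then folded into plain content. The Segment record and refresh_labels are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    text: str                    # content; new value for 'U'
    label: Optional[str] = None  # None, 'I', 'D' or 'U'
    version: int = 0
    old: str = ""                # old value for 'U'


def refresh_labels(segments, v):
    """Fold every label with version <= v into plain content."""
    out = []
    for s in segments:
        if s.label is None or s.version > v:
            out.append(s)                # unlabeled, or still needed for recovery
        elif s.label != "D":
            out.append(Segment(s.text))  # 'I' or 'U': keep the (new) content
        # A 'D' at or below v: the deleted contents are dropped for good.
    return out


doc = [Segment("A"), Segment("1", "I", 1), Segment("B"),
       Segment("x", "D", 2), Segment("C", "U", 3, old="c")]
print([(s.text, s.label) for s in refresh_labels(doc, 2)])
# [('A', None), ('1', None), ('B', None), ('C', 'U')]
```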

5 Conclusions and Future Work

In cooperative applications, the need to merge different versions of an object into a common state arises for several reasons, including optimistic concurrency control, asynchronous coupling and the absence of access control. To meet this need, we have developed a general merging algorithm based on object marking. The algorithm has the following characteristics, which allow it to be implemented as a common service. It is based on general objects and thus covers a wide spectrum of applications. Automatic conflict resolution eases the burden on users. Flexible merge policies make it possible to meet various high-level requirements. It is efficient and easy to implement.

Consistency guarantee is an important issue in CSCW. Although the proposed algorithm meets the consistency requirement, it assumes that participants work unaware of each other. Future computing environments will contain both online and offline users; under these circumstances, our next goal is to keep data consistent without losing efficiency or work awareness.

References
1. Berlage T. and Genau A.: A Framework for Shared Applications with Replicated Architecture. In: Proc of ACM Symposium on UIST (1993) 249–257
2. Ellis C. A. and Gibbs S. J.: Concurrency Control in Groupware Systems. In: Proc of ACM Conf on Management of Data (1989) 399–407
3. Horwitz S., Prins J. and Reps T.: Integrating Noninterfering Versions of Programs. ACM Transactions on Programming Languages and Systems, 3(1989) 345–387
4. Munson J. and Dewan P.: A Flexible Object Merging Framework. In: Proc of ACM Conf on CSCW (1994) 231–242
5. Neuwirth C. M., Chandhok R. et al: Flexible Diff-ing in a Collaborative Writing System. In: Proc of ACM Conf on CSCW (1992) 147–154
6. Suleiman M., Cart M. and Ferrie J.: Serialization of Concurrent Operations in a Distributed Collaborative Environment. In: Proc of ACM Conf on Supporting Group Work (1997) 435–445
7. Tichy W. F.: RCS – A System for Version Control. Software – Practice and Experience, 7(1985) 637–654
8. Yang G. X. and Shi M. L.: Cova: An Object-oriented Programming Language for Cooperative Applications. Science in China (Series F), 1(2001) 73–80

Charging and Accounting for Grid Computing System
Zhengyou Liang1,2, Ling Zhang1, Shoubin Dong1, and Wenguo Wei1
1 GuangDong Key Laboratory of Computer Network, South China University of Technology, GuangZhou, 510641, P.R.China
{zhyliang, ling, sbdong, wgwei}@scut.edu.cn
2 College of Computer and Information Engineering, GuangXi University, NanNing, 530004, P.R.China

Abstract. Grid computing is a key technology of the next-generation Internet. Current grid research mostly focuses on communication, security, resource management and information management. Charging and accounting is a basic activity in an economic society, so it should become part of a grid computing system in a computational economy environment. In this paper, we introduce the charging and accounting items in a grid computing system and propose a method for calculating the cost of grid usage. The method shows how to calculate the standardization technology cost of a job's usage and how to translate this standardization technology cost into a currency cost. Furthermore, we analyse the requirements of a charging and accounting system in a computational-economy-based grid, and we design an architecture for a charging and accounting system and its support system.

1 Introduction
With the spread of the Internet and the availability of powerful computers and high-speed networks, these low-cost commodity components are changing the way we use computers. It has become possible to use a computer network as a single, unified computing resource, connecting geographically distributed computing resources of all kinds and aggregating them into one uniform resource. This form of resource is usually called a "computational grid"[1]. Solutions to 21st-century scientific problems will increasingly be built on such heterogeneous and complex grids. Grid-based applications include security mechanisms, web browsing, remote collaborative engineering, distributed analysis of petabyte-scale data, and control of live instruments[1,2,3]. Grid computing is a key technology of the next-generation Internet. The key concept of the grid is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations[2]. Worldwide, many ambitious projects are applying grid computing to challenging problems, such as distributed analysis of experimental physics data, shared access to earthquake engineering facilities, "science portals" that give thin clients access to all kinds of remote data and systems, and transaction processing over very large data sets[1,4].
In a grid system, the resource provider and the resource consumer are usually not the same person and often do not belong to the same organization. So when one uses grid resources, an economic activity takes place between the resource provider and the resource consumer. From an economic point of view, the resource provider should obtain a suitable benefit from providing the service, and the resource consumer should pay for it; paying for services is what keeps the grid system maintained. Today, grid research mainly focuses on communication, security mechanisms, resource management and information management, and it lacks the basic services that support economic activity in a grid system. It is therefore necessary to develop a charging and accounting service for the grid, which manages the cost of grid usage and supports economic activity according to the computational economy. In this paper, we introduce the items to be charged and accounted in a grid computing system and propose a method for calculating the cost of grid usage. Furthermore, we propose a charging and accounting system for the grid.
M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 644–651, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Related Work
Charging and accounting for the grid have been considered by several grid projects and researchers. [5] discusses which kinds of resources should be charged and accounted in the DataGrid project and proposes a cost calculation method that translates cost into credits. It also presents a working scheme for its accounting model, which monitors and controls job execution in real time; this may be too complicated in many cases. [6] identifies which resources need to be accounted and charged in a computational-economy-based grid and discusses which elements affect a resource's value, but it does not address how to implement accounting and charging. [7] proposes an architecture for charging the distributed services of a computational grid. It views a computational grid as four layers: a communication service layer, a computation service layer, an information service layer and a knowledge layer. The first three layers provide QoS services using the CPS (Cumulus Pricing Scheme)[8], which was originally designed to handle congestion for QoS in TCP/IP-based networks. A charging system for the grid is designed in [7], but it does not discuss how to handle "computing power", the main concern of a computational grid. In this paper, we propose a cost calculation method that translates the standardization technology cost into a currency cost, and a scheme for a grid charging and accounting system that handles the metering and cost calculation of "computing power".

3 Charging, Accounting, and Calculating Cost
3.1 What Should Be Charged and Accounted in Computational Grid
It is necessary to decide which kinds of resource elements should be paid for in a grid system. User applications require different resources, depending on the computations performed and the algorithms used to solve the problem. Some applications are CPU intensive, others are I/O intensive, and some are a combination of both. Therefore, the consumption of the following resources may be accounted and charged[6]:
- CPU: user time (consumed by the user application) and system time (consumed while serving the user application)
- Memory: maximum resident set size, page size, amount of memory used, page faults
- Storage used
- Network bandwidth consumption
- Signals received, context switches
- Software and libraries accessed
Not every resource will bill every chargeable element: for a specific application, only the resources it actually uses are billed.

3.2 Cost Calculation
It is important to determine the value of every resource element and, consequently, its price; the price together with the amount of usage of the resource determines the cost. Note that the value and the price of a resource are conceptually different. The value of a resource should essentially quantify the real capabilities of the resource itself[5]. [5] assumes that the price of a resource should be related to its value and that two resources with the same value should have comparable prices. In practice, however, the price differs under different economic models, and the same resource may have different prices under different conditions. In this paper, the charging and accounting system calculates the standardization technology cost of an application, which is then transformed into a currency cost according to a pricing policy. The pricing policy is determined by the economic model and is beyond the scope of this paper. The cost calculation proceeds as follows. The computing usage is defined as the product p·u, where p is a performance factor and u is the amount of usage of that resource element[5]. For example, if p refers to the CPU power, u is the amount of CPU time used by the job. We call this product the technology cost; ideally, it is nearly constant for a given job executed on processors of differing CPU power. Furthermore, we define the standardization technology cost as the product k·(p·u), where p·u is the technology cost and k is a normalization coefficient that reflects the value of the specific resource. The standardization technology cost of the whole job is obtained from the standardization technology costs of its resource components by computing:

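$$P = \sum_{i} k_i \,(p_i \cdot u_i) \qquad (1)$$
where k_i, p_i and u_i are the normalization coefficient, performance factor and amount of usage of resource element i, as defined above. The index i runs over the resource elements (i.e. CPU, RAM, disk, etc.).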
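The standardization technology cost P, the sum of k_i·(p_i·u_i) over a job's resource elements, can be illustrated with a minimal Python sketch. The class and function names (`ResourceUsage`, `standardization_technology_cost`) and the sample numbers are hypothetical; k, p and u are simply taken as given inputs, since how a provider measures p and chooses k is not specified here.

```python
from dataclasses import dataclass


@dataclass
class ResourceUsage:
    """Usage of one chargeable resource element by a job (hypothetical record)."""
    name: str
    k: float  # normalization coefficient reflecting the value of this resource
    p: float  # performance factor, e.g. relative CPU power
    u: float  # amount of usage, e.g. CPU seconds consumed

    def technology_cost(self) -> float:
        # Technology cost p*u: ideally nearly constant for a job across CPUs of different power.
        return self.p * self.u

    def standardized_cost(self) -> float:
        # Standardization technology cost k*(p*u) for this resource element.
        return self.k * self.technology_cost()


def standardization_technology_cost(usages) -> float:
    """P = sum_i k_i * (p_i * u_i) over the job's resource elements."""
    return sum(r.standardized_cost() for r in usages)
```

For instance, a job needing 7200 reference CPU-seconds has the same technology cost whether it runs for 3600 s on a processor with p = 2.0 or for 1800 s on one with p = 4.0, which is the near-constancy property described above.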


The price of a standardization technology cost unit is given by a pricing algorithm, which is determined by the economic model used by the grid system. According to economic principles, price is usually related to the user's demand, the user's preferences, the relation between demand and supply in the market, the provider's current load and the provider's preferences. The price is therefore a function of these elements, which we express as:
$$Price = f(D, PU, R, L, PP) \qquad (2)$$
where Price is the price of a standardization technology cost unit; D is the demand for resource usage, usually an assessed value; PU is the user's pricing policy, which reflects the user's preferences; R is the relation of demand and supply in the market; L is the current load of the provider; and PP is the provider's pricing policy, which reflects the provider's preferences. The currency cost of the whole job is computed by:
$$C = P \times Price \qquad (3)$$
where P is the standardization technology cost calculated by (1), Price is calculated by (2), and C is the currency cost of the job.
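The paper leaves the pricing function f to the economic model, so the sketch below is only a hypothetical stand-in, not the authors' algorithm: it ignores D, takes R as the ratio of market demand to market supply, scales a base price by R and by (1 + L), applies the provider's floor price as PP and the consumer's ceiling price as PU, and multiplies the resulting unit price by P to obtain the currency cost C. All parameter names and the formula itself are assumptions for illustration.

```python
from typing import Optional


def unit_price(market_demand: float, market_supply: float, load: float,
               user_max_price: float, provider_min_price: float,
               base_price: float = 1.0) -> Optional[float]:
    """Hypothetical stand-in for Price = f(D, PU, R, L, PP); D is ignored in this sketch."""
    if market_supply <= 0:
        raise ValueError("market_supply must be positive")
    r = market_demand / market_supply          # R: above 1 means demand exceeds supply
    price = base_price * r * (1.0 + load)      # L in [0, 1]: a busier provider asks more
    price = max(price, provider_min_price)     # PP: provider's floor price
    if price > user_max_price:                 # PU: consumer's ceiling price
        return None                            # no deal without further negotiation
    return price


def currency_cost(P: float, price: float) -> float:
    """Currency cost of a job: C = P * Price."""
    return P * price
```

With R = 1.5 and L = 0.5 this sketch quotes 2.25 per unit, so a job with P = 7456 would cost 16776 currency units.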

4 A Charging and Accounting System for Grid
In this section, we describe a trading scenario in a computational-economy-based grid system and derive from it the requirements for charging and related items. We then design the charging and accounting system for the grid, together with its supporting modules.

Fig. 1. Process of an application in a computational-economy-based grid


4.1 The Trade in a Grid and Its Demands
To understand the operating scenario of the charging and accounting system, we describe how a consumer submits an application to the grid and how the grid handles the job in a computational-economy-based grid system. Fig. 1 shows the process:
(1) The consumer (application) submits the job with some parameters to the Grid Resource Broker. The parameters include the consumer's pricing policy and deadline.
(2) The Grid Resource Broker asks the Information Service which Grid Service Providers are available.
(3) The Information Service returns a set of available Grid Service Providers to the Grid Resource Broker.
(4) If no Grid Service Provider is available, the Grid Resource Broker notifies the consumer that no grid resource is available, and the process ends. Otherwise, the Grid Resource Broker submits the job to one of the Grid Service Providers.
(5) The Grid Service Provider evaluates the job's technology cost and quotes a price for the job to the Grid Resource Broker according to the technology cost, the state of demand and supply in the market, the provider's pricing policy and its current load. The Grid Resource Broker decides whether to accept that price according to the consumer's pricing policy. If it accepts, the process continues to step (6). Otherwise, it negotiates the price with the provider; if they agree on a price acceptable to both, the process continues to step (6), and if not, the Grid Resource Broker chooses another available Grid Service Provider and returns to step (4).
(6) The Grid Service Provider signs a contract with the broker, allocates resources for the job and executes it. A component named "Data gather" meters and samples the job's resource usage (see Fig. 2) and sends the data to the Charging and Accounting System (CAS). After the job finishes, the CAS calculates the currency cost and sends it to the Billing System. The provider then returns the job results to the broker.
(7) The broker returns the results to the consumer.
All interactions between the broker and the provider are supported by the grid middleware, which we do not discuss in detail, to keep the description of the job-handling process simple. From the description above, we derive the requirements on the provider. In a computational economy environment, a provider should meet the following requirements:
(1) Evaluate a job's technology cost.
(2) Evaluate supply and demand in the computing-power market.
(3) Quote a price for the job based on the evaluated technology cost, the evaluated supply and demand state, the provider's pricing policy and its current load.
(4) Meter and sample the usage of a job.
(5) Calculate the currency cost of a job.
(6) Provide a billing system.
Besides these, the provider must support the usual functions, such as resource reservation, resource allocation and a trade server. We do not discuss them here because they have been widely discussed in the literature on grid resource management.
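The broker's side of steps (4) to (6) can be sketched as a loop over the available providers. This is a hypothetical Python outline, not the authors' protocol: `quote` and `counter_offer` stand in for the provider interactions carried by the grid middleware, the consumer's pricing policy is reduced to a single maximum unit price, and negotiation is limited to one counter-offer round.

```python
from typing import Callable, Optional, Sequence, Tuple


def broker_select(providers: Sequence[str],
                  quote: Callable[[str], Optional[float]],
                  counter_offer: Callable[[str, float], Optional[float]],
                  user_max_price: float) -> Optional[Tuple[str, float]]:
    """Return (provider, agreed unit price), or None if no provider can be contracted."""
    for provider in providers:                    # step (4): submit the job to one provider
        price = quote(provider)                   # step (5): provider quotes a price
        if price is None:
            continue                              # provider declined; try the next one
        if price <= user_max_price:               # acceptable under the consumer's policy
            return provider, price                # step (6): sign the contract
        agreed = counter_offer(provider, user_max_price)  # one negotiation round (assumed)
        if agreed is not None and agreed <= user_max_price:
            return provider, agreed
    return None                                   # no provider available: notify the consumer
```

Keeping the provider interactions as callbacks separates the broker's decision logic from whatever middleware actually transports the quotes.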


Fig. 2. Grid Service Provider and Charging & Accounting System

4.2 The Architecture of Grid Service Provider and Its Charging and Accounting System
We map the requirements into our design, as shown in Fig. 2. The components most relevant to cost calculation are introduced below.
The Supply and demand assess module evaluates the state of supply and demand in the computing-power market. According to economic theory, the relationship between supply and demand is an important factor in deciding the price of goods: the price is low when supply exceeds demand, and high when supply falls short of demand.
The Consumer's job assess module evaluates how much technology cost a job will consume on the provider's machines. The broker needs to know the machine performance, which depends not only on the machine's hardware but also on the algorithms the job uses to solve the problem. So if different providers use different algorithms, the same job may consume different technology costs even on the same type of machine and, consequently, incur different currency costs. A broker needs to know how much technology cost the job will consume before it accepts a price, so that it can compare costs among providers and choose a suitable one. The provider also needs the evaluated technology cost when calculating a price, in order to quote a competitive price to the consumer.
The Data gather module provides general ways to meter and sample a job's usage of grid resources, such as CPU, memory and storage. When a job is dispatched to grid resources, it is metered and sampled by the Data gather module. After the job finishes, the data on its usage of grid resources are sent to the Accounting module, the Pricing algorithms module and the Consumer's job assess module.
The Pricing algorithms module calculates optimal prices given the current load, the evaluated technology cost of the consumer's job, the evaluated supply and demand state, and the provider's pricing policy. These prices are sent to the consumer to negotiate a price acceptable to both provider and consumer. If they agree on a price, it is sent to the Charge calculation module. A Pricing algorithms module may contain more than one algorithm.
The Accounting module collects the data on the task or bulk usage of each customer provided by the Data gather module.
The Charge calculation module receives the price sent from the Pricing algorithms module and the data from the Accounting module. It calculates the charges for the finished computing task, and its output in turn becomes the input to the provider's billing mechanism.
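The flow from the Data gather module through the Accounting module to the Charge calculation module can be sketched as follows. This is an illustrative Python outline under assumed data shapes: each metering sample is a hypothetical `UsageSample` record, the Accounting module sums samples per customer, job and resource, and the Charge calculation module computes P as the sum of k·p·u for one finished job and multiplies it by the negotiated price.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageSample:
    """One metering sample reported by the Data gather module (hypothetical record)."""
    customer: str
    job_id: str
    resource: str   # e.g. "CPU", "RAM", "Disk"
    amount: float   # usage observed in this sampling interval


def account(samples):
    """Accounting module: total usage per (customer, job, resource)."""
    totals = defaultdict(float)
    for s in samples:
        totals[(s.customer, s.job_id, s.resource)] += s.amount
    return dict(totals)


def charge(totals, customer, job_id, k, p, price):
    """Charge calculation module: price * sum of k * p * u over one finished job's resources."""
    P = sum(k[res] * p[res] * u
            for (cust, job, res), u in totals.items()
            if cust == customer and job == job_id)
    return P * price
```

The returned charge is the value that would be handed on to the provider's billing mechanism.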

5 Conclusion
In this paper, we have introduced which kinds of resources are charged and accounted in the grid, proposed a cost calculation method that transforms the standardization technology cost of a job into a currency cost, and proposed an architecture for charging and accounting together with its supporting modules. The scheme discussed in this paper will be applied to our campus grid. Future research will address the pricing scheme and the economic model suitable for our grid.
Acknowledgements. This work has been supported by GuangDong Key Laboratory of Computer Network (Grant No. 2002B60113) and GuangXi University Science Research Foundation (Grant No. CC1407).

References
1. A. Natrajan, M. A. Humphrey, A. S. Grimshaw: Grids: Harnessing Geographically-Separated Resources in a Multi-Organisational Context. In: Proceedings of the 15th Annual Symposium on High Performance Computing Systems and Applications (HPCS 2001), Ontario, Canada, June 18-20, 2001. Available: http://legion.virginia.edu/papers/HPCS01.pdf
2. I. Foster, C. Kesselman, S. Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications, 15(3), 2001. Available: http://www.globus.org/research/papers/anatomy.pdf
3. Mark Baker, Rajkumar Buyya and Domenico Laforenza: The Grid: International Efforts in Global Computing. International Conference on Advances in Infrastructure for Electronic Business, Science, and Education on the Internet (SSGRR'2000), L'Aquila, Rome, Italy, July 31 - August 6. Available: http://www.cs.mu.oz.au/~raj/papers/TheGrid.pdf
4. A. Iamnitchi, I. Foster: On Fully Decentralized Resource Discovery in Grid Environments. International Workshop on Grid Computing, Denver, Colorado, November 2001. Available: http://citeseer.nj.nec.com/cache/papers/cs/25088/http:zSzzSzpeople.cs.uchicago.eduzSz~andazSzpaperszSzGC2001.pdf/iamnitchi01fully.pdf
5. C. Anglano, S. Barale, L. Gaido, A. Guarise, S. Lusso, A. Werbrouck: An accounting system for the DataGrid project - Preliminary proposal. Draft in discussion at Global Grid Forum 3, Frascati, Italy, October 2001. Available: http://server11.infn.it/workload-grid/docs/DataGrid-01-TED-0115-3_0.pdf
6. Rajkumar Buyya: Economic-based Distributed Resource Management and Scheduling for Grid Computing. PhD thesis, School of Computer Science and Software Engineering, Monash University, Melbourne, Australia, April 12, 2002.
7. B. Stiller, J. Gerke, P. Flury, P. Reichl: Charging distributed services of a computational grid architecture. In: Proceedings of the First IEEE/ACM International Symposium on Cluster Computing and the Grid, 15-18 May 2001, pp. 596-601.
8. B. Stiller, J. Gerke, P. Reichl, P. Flury: The Cumulus Pricing Scheme and its Integration into a Generic and Modular Internet Charging and Accounting System for Differentiated Services. TIK Report No. 96, Computer Engineering and Networks Laboratory TIK, ETH Zürich, Switzerland, September 2000. Available: http://anaisoft.unige.ch/public-documents/deliverables/TIK-Report96.pdf

Integrating New Cost Model into HMA-Based Grid Resource Scheduling
Jun-yan Zhang, Fan Min, and Guo-wei Yang
College of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610051, China
{jyzhang, minfan, gwyang}@uestc.edu.cn

Abstract. Grid systems can provide a virtual framework for the management and scheduling of resources across different domains. This paper proposes an HMA-based grid resource scheduling system to implement resource finding and scheduling. A new cost model is also given that, beyond the traditional model, considers the resource finding cost and the resource deciding cost. The new cost model is then integrated into the HMA-based grid resource scheduling system. Our experiment shows that the optimal solution under the traditional cost model is no longer optimal under our model.
Keywords. Grid, resource scheduling, Agent, cost model

1 Introduction
In traditional distributed computing environments (DCEs), resource management systems (RMSs) were primarily responsible for allocating resources to tasks [3]. They also performed functions such as resource discovery and monitoring to support their primary role. With large numbers of distributed resources and users, grid systems can provide a virtual framework for managing and scheduling resources across different domains, and they have been the focus of much research in recent years. A computational grid is an emerging computing infrastructure that enables effective access to high-performance computing resources [5]. Resource management and scheduling are key grid services, and using grid resources reasonably by minimizing total cost is a common concern for most grid infrastructure and scheduling algorithm developers. Resource management and scheduling in grid systems [1] [2] are challenging due to: (a) the geographical distribution of resources; (b) resource heterogeneity; (c) autonomously administered grid domains with their own resource policies and practices; and (d) grid domains using different access and cost models. In this paper, we adopt a Hierarchical Multi-Agent-based (HMA-based) methodology for grid resource scheduling and integrate a cost model into the HMA-based grid resource scheduling system to implement minimal-cost resource scheduling.
M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 652–659, 2004. © Springer-Verlag Berlin Heidelberg 2004


The paper is organized as follows. Section 2 introduces the traditional cost model. Section 3 describes the HMA-based grid resource scheduling system. In Section 4, we integrate the new cost model into our HMA-based grid resource scheduling system. Comparative experimental results are given in Section 5, and Section 6 concludes the paper.

2 Traditional Cost Model
The traditional cost model of the Internet forms the basis of most call admission, routing and reservation algorithms today. The model combines the cost of classifying, switching, queuing and scheduling at a node with the cost of transmission over the next link into one abstract figure associated with each node-link pair (see Figure 1)[6]. Based on this model, the total cost of a system can be defined as:

$$C = C_s + C_q + C_l \qquad (1)$$
where C denotes the total cost of the system, C_s the cost of switching and scheduling, C_q the cost of queuing, and C_l the cost of the link.

Fig. 1. Traditional Cost Model

Although simple, this model has proven effective and robust in designing many popular network protocols. Shortest Path Routing, for example, uses this cost model to find the route between a pair of origin-destination nodes by minimizing the sum of per-hop costs.
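Shortest Path Routing under this model can be illustrated with Dijkstra's algorithm, one standard way to minimize a sum of non-negative per-hop costs. In this hypothetical Python sketch, each node-link pair carries the single figure Cs + Cq + Cl; the graph representation (`graph[u]` as a list of `(v, cost)` pairs) and the function names are assumptions.

```python
import heapq


def hop_cost(cs, cq, cl):
    """Traditional model: one figure per node-link pair (switching + queuing + link)."""
    return cs + cq + cl


def shortest_path(graph, src, dst):
    """Dijkstra over per-hop costs; graph[u] = [(v, cost), ...]. Returns (cost, path)."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue                      # stale heap entry
        visited.add(u)
        if u == dst:
            break                         # cheapest cost to dst is now final
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return float("inf"), []           # dst unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]
```

Because every node-link pair is reduced to one non-negative number, the routing decision ignores any cost of finding or deciding on resources, which is the limitation the new model addresses.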


3 HMA-Based Grid Resource Scheduling System
In our resource scheduling system, we consider the whole grid system as a global grid made up of n Grid Domains. Let GD_i (i = 1, 2, ..., n) denote the ith Grid Domain (see Figure 2). GD_i is an autonomous, administrative and interactive entity consisting of a set of resources, services and users managed by a single Management Agent, MAgent_i. In our system, we divide each GD into two subdomains: (a) a resource domain (RD), which signifies the resources within the GD; and (b) a user domain (UD), which signifies the users within the GD.

Fig. 2. HMA-Based Grid Resource Scheduling System
First, we define R = (r_ij) as the system resource matrix, where r_ij specifies how many type-j resources GD_i possesses. We also define P = (p_ij) as the system processing power matrix, where p_ij specifies the processing power of GD_i for the jth type of resource; clearly … for any legal i, j. In this system, the Resource Deciding Agent (RDAgent) maintains an n×n table D that records the distances between GDs, where d_ij denotes the distance between GD_i and GD_j. It also holds, for each GD, a quadruple provided by the corresponding MAgent that describes the GD in detail: (a) the serial number of the GD; (b) its local resources; (c) its processing power; and (d) its users. At the beginning, RDAgent initializes the distance table D and the quadruples according to the information submitted by all existing MAgents. When RDAgent polls the MAgents on its own initiative, each MAgent refreshes the table and its quadruple if an update has happened within its GD; otherwise, the MAgent need not submit the update and keeps the update information locally.
Fig. 3. A New Cost Model
Let TR denote a task requirement originating from a GD; it includes the explicit user IP address and its originating GD. It also includes the resource requirement information: the required amount of each type of resource, denoted tr_1, tr_2, ..., tr_k respectively. After a TR is recognized and analyzed by the Resource Scheduling Agent (RSAgent), RSAgent orders the Resource Finding Agent (RFAgent) to find the locations of resources that can meet TR. RFAgent first queries RDAgent for resource information. Because RDAgent holds the distance table D and the quadruples, after RFAgent finishes the finding process it obtains a resource deployment matrix A = (a_ij), whose columns correspond to resource types and whose rows correspond to Grid Domains. If resource type j is located at GD_i and meets the requirement, we let a_ij = 1; otherwise we let a_ij = 0. RFAgent then examines this matrix: if it finds a column whose elements are all 0, it asks RDAgent to update table D and the quadruples. If, after the update, matrix A still does not satisfy TR, the next update starts after a period of time (which we set to 5 minutes). If no appropriate resource combination can be found after a number of attempts (which we set to 3), RFAgent concludes that TR cannot be met and the task cannot be executed, and it sends this message to RSAgent, which cancels the task and notifies the user. If an appropriate resource combination is found, RFAgent sends this resource combination information to RSAgent, which schedules resources accordingly and returns the result to the user; this procedure is given in the next section.
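RFAgent's resource-finding loop can be sketched in Python as follows. The sketch assumes that a deployment-matrix entry a[i][j] is 1 when GD_i holds at least the required amount tr[j] of resource type j, and that `refresh` is a hypothetical callback returning the current resource matrix from RDAgent; the retry limit (3) and the waiting period (5 minutes) follow the text, and the returned update count is the quantity the text later uses as the resource finding cost.

```python
import time


def deployment_matrix(r, tr):
    """a[i][j] = 1 if GD_i holds at least tr[j] units of resource type j (assumed rule), else 0."""
    return [[1 if tr[j] > 0 and r[i][j] >= tr[j] else 0 for j in range(len(tr))]
            for i in range(len(r))]


def unsatisfied_types(a, tr):
    """Required resource types whose column in a is all zeros."""
    return [j for j in range(len(tr)) if tr[j] > 0 and not any(row[j] for row in a)]


def find_resources(refresh, tr, max_updates=3, wait_seconds=300, sleep=time.sleep):
    """Return (a, count), or (None, count) if TR cannot be met after max_updates updates.

    refresh() is a hypothetical callback that asks RDAgent for the current resource matrix r.
    """
    count = 0
    a = deployment_matrix(refresh(), tr)        # initial query to RDAgent
    while unsatisfied_types(a, tr):
        count += 1                              # one more update of table D and the quadruples
        if count > max_updates:
            return None, count                  # RSAgent will cancel the task
        if count > 1:
            sleep(wait_seconds)                 # later updates wait about 5 minutes
        a = deployment_matrix(refresh(), tr)
    return a, count
```

Passing `sleep` as a parameter keeps the 5-minute wait out of tests and simulations.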

4 Integrating New Cost Model into HMA-Based Grid Resource Scheduling System
Generally speaking, resource schedulers and resource managers tend to choose the nearest resource because they make decisions based on the traditional cost model [4]. We now present a new cost model for grid systems, in which the cost of resource finding and the cost of resource deciding also significantly affect the total cost.


Thus, the total system cost C consists of three parts: the cost of resource finding C_f, the cost of resource deciding C_d, and the cost of resource scheduling C_s, as illustrated in Figure 3. As Figure 3 shows, C_s can be further divided into the processing cost C_p and the transmission cost C_t. The total cost of resource scheduling C can then be defined as:
$$C = C_f + C_d + C_s = C_f + C_d + C_p + C_t \qquad (2)$$

We can now integrate the new cost model into the HMA-based grid resource scheduling system to achieve reasonable scheduling with minimal cost. On receiving a TR, RSAgent forwards the resource and processing power requirements to RFAgent. RFAgent examines the resource deployment matrix; if it finds a column whose elements are all 0, it asks RDAgent to update table D and the quadruples. We use an integer variable count to record the number of updates. If, after refreshing table D and the quadruples, the matrix still does not satisfy TR, the next update starts after a period of time (about 5 minutes). The cost of finding resources can therefore be measured by count and denoted C_f; that is, C_f = count. If count > M (M is a positive constant; in this paper, M = 3), RFAgent considers that TR cannot be met and reports this to RSAgent, which cancels the task and notifies the user. If each column contains only one nonzero element, only one combination of resources and Grid Domains can satisfy TR, so RSAgent has no choice but to select this combination as the only solution, regardless of the cost C. If more than one resource combination can satisfy TR, RDAgent lists all possible resource combinations and decides which one to choose. The combinations can be written as follows:
$$CB = (RA_1, RA_2, \ldots, RA_k)$$

where RA_i represents the GD that offers the ith resource, with 0 meaning that the respective resource is not required. The requirement for each type of resource can be met by some GD, and each type of resource is selected independently, so at most n candidates need to be compared for each type. Accordingly, the time complexity is O(n×k) instead of O(n^k). If some CB that satisfies TR includes the originating GD, then some kinds of resources can be provided locally while the remaining kinds are located at other GDs. We can define C_d as:

The resource scheduling cost C_s is made up of the processing cost C_p and the transmission cost C_t. The stronger the processing power of a GD, the smaller its C_p. Since p_ij represents the computation power of GD_i for resource j,
$$\sum_{j:\, RA_j \neq 0} p_{RA_j,\, j}$$
is the total computational power when the respective combination is chosen.


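Accordingly, we use the reciprocal of this total processing power to signify the processing cost, where p_ij is the computation power of GD_i for resource j:
$$C_p = \frac{1}{\sum_{j:\, RA_j \neq 0} p_{RA_j,\, j}}$$
Because we assume that the task to be processed cannot be divided further, the processing GD is unique. The farther apart two GDs are, the larger the transmission cost C_t; accordingly, we use the logarithm of the distance d_ij between GD_i and GD_j to signify C_t.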


Fig. 4. Performance comparison

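Based on equation (2), the total system resource scheduling cost of a combination CB is:
$$C = C_f + C_d + C_p + C_t$$
where C_f, C_d, C_p and C_t are the resource finding, resource deciding, processing and transmission costs defined above.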

RDAgent chooses the CB with the minimal C and recommends it to RSAgent, which performs scheduling according to this CB and returns the results to the corresponding user. In this paper, we do not compare the costs of two different tasks, so the absolute amount of resources a task needs (such as communication traffic and computing overhead) does not matter when we calculate the total cost C. In other words, the costs we compute are relative rather than absolute.
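The decision step can be sketched by enumerating the candidate combinations from the deployment matrix and keeping the one with the lowest cost. This is a hypothetical Python illustration, not the authors' formula set: C_p is the reciprocal of the total processing power as described above, but the transmission term uses log(1 + d) from the originating GD to each chosen GD (an assumption that avoids log 0 for local resources), and C_f and C_d are supplied by the caller as constants. Exhaustive enumeration is used for clarity; it costs O(n^k) rather than the per-type O(n×k) selection described in the text.

```python
import math
from itertools import product


def candidate_combinations(a, tr):
    """All CB = (RA_1, ..., RA_k): RA_j is a 1-based GD number, or 0 if type j is not required."""
    n, k = len(a), len(tr)
    options = []
    for j in range(k):
        if tr[j] == 0:
            options.append([0])                   # resource type j not required
        else:
            options.append([i + 1 for i in range(n) if a[i][j] == 1])
    return list(product(*options))                # empty if some required type has no GD


def scheduling_cost(cb, p, d, origin):
    """C_p + C_t for one combination; origin is the 0-based index of the originating GD."""
    power = sum(p[ra - 1][j] for j, ra in enumerate(cb) if ra != 0)
    if power <= 0:
        raise ValueError("combination has no processing power")
    c_p = 1.0 / power                              # reciprocal of total processing power
    # Assumed transmission term: log(1 + distance) from the originating GD to each chosen GD.
    c_t = sum(math.log1p(d[origin][ra - 1]) for ra in cb if ra != 0)
    return c_p + c_t


def choose_cb(a, tr, p, d, origin, c_f=0.0, c_d=0.0):
    """Return (best CB, its total cost C = C_f + C_d + C_p + C_t)."""
    cbs = candidate_combinations(a, tr)
    if not cbs:
        raise ValueError("TR cannot be met")
    best = min(cbs, key=lambda cb: scheduling_cost(cb, p, d, origin))
    return best, c_f + c_d + scheduling_cost(best, p, d, origin)
```

With small example matrices, a more powerful remote GD can lose to a local one once its distance is counted, yet win when all distances are zero, which mirrors the paper's point that the traditional optimum need not be optimal under the new model.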

5 Comparing Experiment
We simulate a system containing 10 GDs that provide 4 types of resources. The distances between these GDs are listed in Table 1, the initial information about all GDs in Table 2, and the task requirements in Table 3. Figure 4 shows the average cost of tasks under two conditions: (1) the optimal solution under the traditional cost model, and (2) the optimal solution under the new cost model. The results show that the optimal solution under the traditional cost model may not be optimal under the new cost model.

6 Conclusion
With the growing popularity of middleware dedicated to building so-called grids of processing and storage resources, network-based computing will soon offer users a dramatic increase in available aggregate processing power. We propose an HMA-based grid resource scheduling system and a new cost model; in particular, we formalize resource information and task requirements in detail. The comparative experiment shows that the optimal solution under the traditional cost model is no longer optimal under our model.

References
[1] I. Foster, C. Kesselman, and S. Tuecke, "The anatomy of the Grid: Enabling scalable virtual organizations," Int'l Journal on Supercomputer Applications, 2001.
[2] K. Krauter, R. Buyya, and M. Maheswaran, "A taxonomy and survey of Grid resource management systems," Software Practice and Experience, Vol. 32, No. 2, Feb 2002, pp. 135–164.
[3] F. Berman, R. Wolski, S. Figueira, J. Schopf, and G. Shao, "Application-level scheduling on distributed heterogeneous networks," in Proc. 1996 Supercomputing, Pittsburgh, PA, USA, 1996.
[4] B. Davie, S. Casner, C. Iturralde, D. Oran, J. Wroclawski, "Integrated Services in the presence of Compressible Flows," Internet Draft, February 1999.
[5] I. Foster and C. Kesselman, "The GRID: Blueprint for a New Computing Infrastructure," Morgan-Kaufmann, 1998.
[6] Kazem Najafi and Alberto Leon-Garcia, "A Novel Cost Model for Active Networks," Communication Technology Proceedings 2000, vol. 2, pp. 1073–1080.

CoAuto: A Formal Model for Cooperative Processes*
Jinlei Jiang and Meilin Shi
Department of Computer Science and Technology, Tsinghua University, Beijing, P. R. China, 100084
{jjlei, shi}@csnet4.cs.tsinghua.edu.cn
http://cscw.cs.tsinghua.edu.cn

Abstract. A formal model called CoAuto (Cooperative Automaton) is proposed to describe and analyze cooperative processes. A basic CoAuto abstracts the behavior of a single active entity. It separates data from control states and can thus describe various cooperation scenarios (e.g. synchronous and asynchronous) by composition in a uniform yet flexible way. Composition can be done at two levels (data sharing and action/control sharing), so more complex cooperative processes can be depicted. The paper details the structural elements of CoAuto, shows how to model real-world cooperative processes, and shows how to analyze some basic properties (e.g. safety and liveness).

1 Introduction Computer Supported Cooperative Work (CSCW) is concerned with understanding how people work together and ways in which technology can assist. Interests in it have intensified in the last few years. As a result, numerous groupware systems have emerged such as MMConf[4], GroupKit[9], workflow systems, to name a few. Though these products have different emphases and are designed for supporting cooperation of certain type, further studies on them have shown that they present some regularities. For example, each system should support communication among the computational entities. This makes it possible to develop a general-purpose platform that can provide some services common to groupware development. To achieve this purpose, we believe a formal model will help us better understand the essential concepts and some interesting properties of cooperative processes. Indeed, people have done a lot towards the purpose above. Examples are various coordination models and languages[8], and team automaton[5]. Coordination models and languages are exploited to describe concurrent and distributed computations. Here we take IWIM[1] as an example, which is a control-driven model. In IWIM, different computing entities are interconnected by streams and communicate with each other through input/output ports. The formal semantics of a kernel of * This work is co-supported by the National Natural Science Foundation of China under Grant No.60073011, the National High Technology Research and Development 863 Program under Grant No.2001 AA113150 and 985 Project of Tsinghua University. M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 660–668, 2004. © Springer-Verlag Berlin Heidelberg 2004

CoAuto: A Formal Model for Cooperative Processes


MANIFOLD, a language implementing IWIM, are presented in [2] based on a two-level transition system: the first level specifies the ideal behavior of each single component, whereas the second level captures their interactions. Although the approach is interesting, it is specially designed for MANIFOLD and lacks generality; in particular, it is not suitable for describing synchronous activities. Team automata [5] form a framework and mathematical model for describing and analyzing groupware systems. They are concerned with how to build groupware systems and how groupware systems work, rather than with analyzing cooperative processes.

In this paper, a formal model called CoAuto is proposed to describe cooperative processes in a uniform yet flexible way. A CoAuto is a two-level transition system. At the first level, there is a set of transition systems, each defining the behavior of a single participant. Different systems are combined via data dependency to model synchronous activities in a cooperative process. At the second level, there is a single transition system that defines the interactions between the first-level transition systems via control dependency. With this model, we can then analyze the properties of cooperative processes.

The rest of the paper is organized as follows. In Section 2, we revisit cooperative procedures and identify some key characteristics, which form the foundation of our work. CoAuto and its composition rules are formally defined in Section 3. Section 4 presents a simple example to illustrate the model. Afterwards, we discuss related issues such as safety and liveness in Section 5. Finally, Section 6 draws conclusions and outlines future work.

2 Cooperation Revisiting

A cooperative procedure can be studied from three aspects: task decomposition, dependency and cooperation mode. We detail each below.

Generally speaking, a cooperative task has a large goal and involves multiple participants. To accomplish it, the task is usually divided into sub-tasks, which are then assigned to different people. Decomposition is recursive and stops only when the resulting sub-tasks cannot be divided any further or some stopping conditions are met. In this paper, we use the term atomic task to denote such final sub-tasks. By atomic we mean that only one participant engages in a single, simple objective according to some predefined rules. A cooperative task is said to be modeled if all its atomic tasks and their relations have been identified and the composition rules are properly given.

There are two typical interdependencies between two atomic tasks: data dependency and control dependency. A data dependency exists among atomic tasks if they access the same data object; such tasks are usually called peer tasks. In this case, any operation executed by one atomic task that modifies the data object should be multicast to the other tasks. This often occurs in synchronous groupware, and a crucial related issue is maintaining the consistency of the shared data object. Control dependency means that there are causal relations between two tasks; for example, task B cannot start until task A completes. It is very common in workflow


J. Jiang and M. Shi

management systems. In addition, we should point out that more complex dependencies can be expressed using logic connectives.

Real-world cooperation is far more complex and cannot simply be treated as a single asynchronous or synchronous process. For example, a report or an article can be coauthored synchronously by multiple authors, and the resulting document may then be transferred asynchronously to others for review or approval. To facilitate further discussion, tasks are divided into three classes, as shown in Fig. 1.

Fig. 1. Cooperation Modes

The above three aspects are related to each other. The rules of decomposition depend on the type of dependency. For simplicity, we assume that a complex task is decomposed into a collection of atomic tasks such that there is at most one type of dependency between any two of them. With the atomic tasks and their dependencies identified, a cooperative task of any type can be described by combining them. In more detail, (1) a single-user task is an atomic task; (2) a synchronous task can be modeled as a set of atomic tasks with data dependencies; (3) an asynchronous task can be modeled as a set of atomic tasks with control dependencies; and (4) for an integrated task, atomic tasks with data dependencies are composed first, and the results are then combined with the other tasks according to their control dependencies.
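The four composition rules above amount to a classification over the dependency types present among the atomic tasks. The Python sketch below illustrates this under our own assumptions: atomic tasks are plain strings, each dependency is a triple (task, task, "data" or "control"), and the helper name `cooperation_mode` is hypothetical rather than part of CoAuto.

```python
from typing import Dict, FrozenSet, Iterable, Literal, Set, Tuple

Kind = Literal["data", "control"]


def cooperation_mode(tasks: Set[str], deps: Iterable[Tuple[str, str, Kind]]) -> str:
    """Classify a decomposed cooperative task by the dependencies among its atomic tasks.

    Hypothetical helper: each dependency is (task_a, task_b, kind).
    """
    kinds_by_pair: Dict[FrozenSet[str], str] = {}
    for a, b, kind in deps:
        if a not in tasks or b not in tasks:
            raise ValueError(f"dependency ({a}, {b}) refers to an unknown task")
        pair = frozenset((a, b))
        # Section 2 assumes at most one type of dependency between two tasks.
        if kinds_by_pair.setdefault(pair, kind) != kind:
            raise ValueError(f"tasks {a} and {b} have both data and control dependencies")
    kinds = set(kinds_by_pair.values())
    if len(tasks) == 1 and not kinds:
        return "single-user"    # rule (1): one atomic task
    if kinds == {"data"}:
        return "synchronous"    # rule (2): peers sharing a data object
    if kinds == {"control"}:
        return "asynchronous"   # rule (3): causal ordering only
    if kinds == {"data", "control"}:
        return "integrated"     # rule (4): compose peers first, then by control
    return "unrelated"          # several tasks with no dependency at all


# Coauthoring followed by review, as in the example in the text (task names are made up).
print(cooperation_mode(
    {"author1", "author2", "review"},
    [("author1", "author2", "data"), ("author2", "review", "control")],
))
```

The coauthoring example mixes both dependency types, so it falls under rule (4), which is why Definition 5 later composes peers first and then applies control composition.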

3 Formal Definitions

CoAuto is derived from the I/O automaton [7], which is known to be able to specify synchronous or asynchronous, blocking or nonblocking systems [5]. Before giving the formal definitions, we introduce some notation: ∅ denotes the empty set, × denotes the Cartesian product, and subscripts such as i denote indices.

Definition 1 (Cooperative Automaton). A cooperative automaton is a tuple $C = (D, S, A_{in}, A_{int}, A_{out}, I, F)$, where
D is a set of data variables representing the data object under manipulation. Each variable should have a value; we denote by V the values assigned to D at a certain time.
S is a nonempty set of states.
$A_{in}$ is a set of input actions, which are generated by the environment and computed by the automaton.
$A_{int}$ is a set of internal actions, which are generated and computed by the automaton.
$A_{out}$ is a set of output actions, which are generated by the automaton and computed externally by the environment.
For these three sets, $A_{in} \cap A_{int} = A_{in} \cap A_{out} = A_{int} \cap A_{out} = \emptyset$ holds. Each action in them has the form $\langle e, c, o \rangle$, where e is an event, c is the precondition and o is the operation. An operation o can be executed if and only if event e occurs and condition c is met. We use A (called the action signature) to denote all the actions of a CoAuto, i.e. $A = A_{in} \cup A_{int} \cup A_{out}$.
I is a nonempty set of initial states, each of the form $(s, V_s)$, where $s \in S$ and $V_s$ is the set of values assigned to D at state s.
F is a set of transition rules, which take two forms:
Normal: $(s, a, s')$, where $s, s' \in S$ and $a \in A$. This rule applies when the corresponding atomic task has no data dependency with the others.
Abnormal: $(s, T(a), s')$, where $s, s' \in S$, $a \in A$, and T is a transformation function. This rule applies when data dependency occurs. The purpose of T is to convert the defined actions into actual internal actions in order to keep the shared data consistent among distributed sites. An example of T is the transformation functions in dOPT [6].

A CoAuto as defined above (called a basic CoAuto) abstracts the behavior of a single actor in a cooperative process. Though these actors can perform operations freely as long as they are permitted, they eventually reach a common group decision. They can therefore be regarded as a single coherent cognitive unit (i.e. an activity in a cooperative process) that performs externally observable behaviors and interacts with its environment. This is the philosophy behind CoAuto. With the basic CoAuto defined, we can describe various cooperation modes uniformly by composition.

Definition 2 (single-user or atomic task). A single-user or atomic task is a CoAuto whose transition rules are all of the normal form, since no data or control dependency is present.

Definition 3 (synchronous task). Given a collection of CoAutos $C_1, \ldots, C_n$ that share the same data and whose internal actions are not disjoint (called peer CoAutos), formally $C_i.D = C_j.D = D$ and $C_i.A_{int} \cap C_j.A_{int} \neq \emptyset$ for all $i \neq j$, a CoAuto C describing a synchronous task can be obtained according to the following rules.
$C.D = D$

F contains all admissible transitions described below. we have and If we have and Definition 4 (asynchronous task). Given a collection of CoAutos sharing no data and internal actions between any two of them, or formally If


A CoAuto C describing an asynchronous task can be obtained according to the following rules.

$C.D = C_1.D \times C_2.D \times \cdots \times C_M.D$,

where M is the number of component CoAutos. This rule implies that the data of each component automaton are treated as a whole.

Transition rules in this case are similar to those in Definition 3, except that an input action changes only the states of the component automata that can recognize it, whereas the other component automata remain dormant during this cycle.

Definition 5 (cooperative process). Given a collection of CoAutos in which some are peer CoAutos (Definition 3) and the rest share no data or internal actions (Definition 4), a CoAuto describing the cooperative process can be obtained via the following two steps:
1. Compose the peer automata according to the rules given in Definition 3.
2. Compose the remaining automata and the results obtained in step 1 according to the rules given in Definition 4.
Note that this can also be done hierarchically.

Though derived from the I/O automaton, CoAuto differs from its parent in three ways: (1) CoAuto shares not only joint actions but also common data, which makes it convenient to specify requirements on data consistency; (2) in CoAuto, two component automata are allowed to have the same output actions; and (3) the composition of CoAutos can be done at different levels according to the dependencies present.
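To make Definition 1 more concrete, here is a minimal Python sketch of a basic CoAuto under assumptions of our own: the data values V are a dictionary, an action ⟨e, c, o⟩ is an event name with a precondition predicate and an operation returning new values, each transition rule is flagged as normal or abnormal, and the transformation function is any callable applied to the action before it runs (the identity, or something in the spirit of dOPT). The class names, the `fire` method and the `make_deal` encoding of the Deal activity from Section 4 are hypothetical; composition (Definitions 3 to 5) is not covered.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Set, Tuple

Values = Dict[str, object]


@dataclass(frozen=True)
class Action:
    """An action <e, c, o>: event, precondition on the data, operation on the data."""
    event: str
    condition: Callable[[Values], bool]
    operation: Callable[[Values], Values]


@dataclass
class CoAuto:
    data: Values                                    # current values V of D
    state: str                                      # current control state in S
    inputs: Set[str]                                # A_in (event names)
    internals: Set[str]                             # A_int
    outputs: Set[str]                               # A_out
    actions: Dict[str, Action]                      # event name -> action
    rules: Dict[Tuple[str, str], Tuple[str, bool]]  # (s, event) -> (s', abnormal?)
    transform: Optional[Callable[[Action], Action]] = None  # T, used by abnormal rules

    def __post_init__(self) -> None:
        # Definition 1 requires A_in, A_int and A_out to be pairwise disjoint.
        if (self.inputs & self.internals) or (self.inputs & self.outputs) \
                or (self.internals & self.outputs):
            raise ValueError("A_in, A_int and A_out must be pairwise disjoint")

    def fire(self, event: str) -> bool:
        """Take the transition for `event` if one exists and its precondition holds."""
        rule = self.rules.get((self.state, event))
        if rule is None or event not in self.actions:
            return False
        next_state, abnormal = rule
        action = self.actions[event]
        if abnormal and self.transform is not None:
            action = self.transform(action)       # abnormal rule: execute T(a)
        if not action.condition(self.data):       # o runs only if c holds
            return False
        self.data = action.operation(dict(self.data))
        self.state = next_state
        return True


def make_deal() -> CoAuto:
    """Hypothetical encoding of the Deal activity (br, hr, cs) from Section 4."""
    return CoAuto(
        data={"dealt": False},
        state="Ready",
        inputs={"br"}, internals={"hr"}, outputs={"cs"},
        actions={
            "br": Action("br", lambda v: True, lambda v: v),
            "hr": Action("hr", lambda v: not v["dealt"], lambda v: {**v, "dealt": True}),
            "cs": Action("cs", lambda v: bool(v["dealt"]), lambda v: v),
        },
        rules={
            ("Ready", "br"): ("Running", False),
            ("Running", "hr"): ("Running", False),
            ("Running", "cs"): ("Completed", False),
        },
    )
```

With this encoding, firing cs before hr fails because its precondition on dealt does not hold, which mirrors the rule that an operation runs only when both its event occurs and its condition is met.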

4 A Case Study

In this section we illustrate the model defined above. The example is an abstraction of a B2C e-commerce process in which buyers interact with dealers to purchase goods. The process is shown in Fig. 2.

Fig. 2. An E-Commerce Process

The process contains two activities: Deal and Consult. The Deal activity has one participant (the seller), an input action br (Buyer Request), an internal action hr (Handle Request) and an output action cs (Consult Scheduling). The Consult activity is a synchronous activity involving two participants, the seller and the buyer. Both the seller and the buyer can suggest a delivery time (dt). Once the suggestion is accepted, the activity ends, producing an output action dr (Deliver Request); otherwise, dt must be set again. Thus, this activity has three input actions: cs from the Deal activity, and SetDT and SetAgree from the seller and the buyer. Let S = {Ready, Running, Completed, Aborted}. The component automata and the composed result for this process are as follows.

Deal: here dealt is a variable indicating whether the request has been handled; other variables are omitted for simplicity.

Consult: The component automata for the seller and the buyer are identical. In this case, one side simply accepts results from the other, so the transformation function is the identity, i.e. T(a) = a for any action a. The variable agree indicates whether the negotiation has completed.

According to Definition 5, the composed automaton for the process can be obtained in the following two steps.

(1) First compose the seller and buyer automata of Consult. The result is as follows.

(2) Then compose the Deal automaton and the result of step (1). The result, which is what we want, is as follows.

F contains all admissible transitions. The state transition diagram is shown in Fig. 3, which marks the initial state and the final state. One of its states is compound: it actually contains many states with different dt values. The diagram also distinguishes input actions from output actions. Output actions are important for identifying the relations between CoAutos; however, when analyzing a single CoAuto, they are usually removed. In addition, actions that cause no state transition can also be removed from the diagram.
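The two simplifications described above (dropping output actions, and dropping actions that cause no state transition) are mechanical. The Python sketch below applies them to a diagram given as (source, action, target) triples. The function name and the state labels are invented for illustration and do not reproduce Fig. 3.

```python
from typing import List, Optional, Set, Tuple

# (source state, action, target state); a target of None marks an action
# attached to a single state only, such as dr in Fig. 3.
Edge = Tuple[str, str, Optional[str]]


def simplify_diagram(edges: List[Edge], outputs: Set[str]) -> List[Tuple[str, str, str]]:
    """Drop output actions and actions that cause no state transition."""
    kept: List[Tuple[str, str, str]] = []
    for src, action, dst in edges:
        if action in outputs:
            continue                  # only needed to relate CoAutos to each other
        if dst is None or dst == src:
            continue                  # no state transition
        kept.append((src, action, dst))
    return kept


# Made-up labels loosely following the e-commerce example; not the paper's Fig. 3.
diagram: List[Edge] = [
    ("Ready", "br", "Handling"),
    ("Handling", "hr", "Handling"),
    ("Handling", "cs", "Negotiating"),
    ("Negotiating", "SetDT", "Negotiating"),
    ("Negotiating", "SetAgree", "Completed"),
    ("Completed", "dr", None),
]
print(simplify_diagram(diagram, outputs={"dr"}))
```

On the sample diagram only the br, cs and SetAgree transitions survive: dr is removed as an output action, while hr and SetDT are removed because they leave the control state unchanged.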

Fig. 3. State Transition of Composed Automaton

5 Process Analysis

Safety and liveness are two important properties when proving the correctness of cooperative processes. Roughly speaking, a liveness property specifies that certain desirable events will eventually occur, while a safety property specifies that undesirable events will never occur. With the CoAuto model, we can define them formally based on the normal state transition diagram.

Definition 6 (normal state transition diagram). A state transition diagram of a CoAuto is normal if each action in the diagram connects two (same or different) states. This definition is used to remove disturbing actions and thus guarantee the correctness of analysis results. The diagram given in Fig. 3 is not normal, since dr connects only one state; however, if we remove dr from the diagram, it becomes normal.

Definition 7 (live process). A cooperative process is live iff, in the normal state transition diagram of the CoAuto corresponding to this process, for each state s reachable from the initial state $s_0$ there exists a transition sequence leading to a final state $s_f$ in the set of final states $S_F$. Formally,
$$\forall s \in S:\; s_0 \xrightarrow{*} s \;\Rightarrow\; \exists s_f \in S_F:\; s \xrightarrow{*} s_f.$$
Here $\xrightarrow{*}$ denotes reachability: a state s' is reachable from s iff there is a transition sequence leading from s to s'. That is,
$$s \xrightarrow{*} s' \iff \exists\, n \ge 0,\ t_0, \ldots, t_n \in S,\ a_1, \ldots, a_n \in A:\; t_0 = s,\ t_n = s',\ (t_{k-1}, a_k, t_k) \in F \text{ for } 1 \le k \le n.$$

Definition 8 (safe process). A cooperative process is safe iff, in the normal state transition diagram of the CoAuto corresponding to this process, each non-final state has at least one output action. Formally,
$$\forall s \in S \setminus S_F:\; \mathit{Out}(s) \neq \emptyset.$$
Here $S_F$ is the set of final states and $\mathit{Out}(s)$ represents all admissible output actions at state s. This definition shows why it is necessary to remove disturbing actions: otherwise we may draw wrong conclusions.


Definition 9 (sound process). A cooperative process is sound iff it is live and safe.

Finally, we point out that although these properties are defined on the state transition diagram, other formulations are also viable. Indeed, our formalism supports a variety of verification techniques such as simulation methods, compositional reasoning [3] and temporal logic methods [10].
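Definitions 7 to 9 reduce to graph reachability on the normal diagram. The Python sketch below checks them with breadth-first search. It assumes the diagram is a list of (source, action, target) triples whose state names are made up, and it reads "output action at state s" in Definition 8 as any outgoing transition of s; both are our assumptions rather than the paper's.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, action, target) of a normal diagram


def successors(edges: Iterable[Edge]) -> Dict[str, Set[str]]:
    succ: Dict[str, Set[str]] = {}
    for src, _action, dst in edges:
        succ.setdefault(src, set()).add(dst)
        succ.setdefault(dst, set())       # make sure sink states are listed too
    return succ


def reachable(start: str, succ: Dict[str, Set[str]]) -> Set[str]:
    """All states s' with start ->* s' (start itself included)."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in succ.get(queue.popleft(), set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_live(edges: Iterable[Edge], initial: str, finals: Set[str]) -> bool:
    """Definition 7: every state reachable from the initial state can reach a final state."""
    succ = successors(edges)
    return all(reachable(s, succ) & finals for s in reachable(initial, succ))


def is_safe(edges: Iterable[Edge], initial: str, finals: Set[str]) -> bool:
    """Definition 8, reading 'output action' as an outgoing transition."""
    succ = successors(edges)
    states = set(succ) | {initial}
    return all(succ.get(s) for s in states - finals)


def is_sound(edges: Iterable[Edge], initial: str, finals: Set[str]) -> bool:
    """Definition 9: live and safe."""
    edge_list: List[Edge] = list(edges)   # iterated twice below
    return is_live(edge_list, initial, finals) and is_safe(edge_list, initial, finals)


good = [("s0", "br", "s1"), ("s1", "cs", "s2"), ("s2", "SetDT", "s3"),
        ("s3", "SetDT", "s2"), ("s3", "SetAgree", "sf")]
bad = good + [("s1", "abort", "dead")]
print(is_sound(good, "s0", {"sf"}), is_sound(bad, "s0", {"sf"}))  # prints: True False
```

In the second diagram the hypothetical abort transition leads to a state from which no final state is reachable and which has no outgoing transition, so it violates both liveness and safety, while the negotiation loop between s2 and s3 in the first diagram is harmless because a final state remains reachable from it.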

6 Conclusions and Future Work

This paper has established a mathematical foundation, called CoAuto, for understanding cooperation that involves human activities of different types. The model is based on the observation that a complex cooperative process can be divided into a set of atomic computation entities, which cooperate with each other to achieve a common goal. The cooperation among them differs in interdependency (data or control) and cooperation mode (synchronous or asynchronous). The outstanding features of CoAuto are its uniformity and flexibility, which result from the separation between data and control dependencies. Moreover, thanks to the formal automata setting, results and methodologies from automata theory are applicable.

Our work is far from complete and many problems remain open. For example, are there redundant states or actions in the state transition diagram? How long will a process last? We will study these two problems further in the future. In addition, we will investigate efficient algorithms for verifying properties of cooperative processes.

Acknowledgements. Some of this work was done at Bell Labs Research China. We thank Yan Wu and Guangxin Yang for their significant contributions to the ideas presented in this paper.

References

1. Arbab F.: The IWIM Model for Coordination of Concurrent Activities. In: LNCS 1061, Springer-Verlag (1996) 34–56
2. Bonsangue M. M., Arbab F.: A Transition System Semantics for the Control-Driven Coordination Language Manifold. Report SEN-R9829, CWI, Amsterdam, The Netherlands (1998)
3. Cheung S. C., Giannakopoulou D., Kramer J.: Verification of Liveness Properties Using Compositional Reachability Analysis. ACM SIGSOFT Software Engineering Notes, 6 (1997) 227–243
4. Crowley T., Milazzo P., et al.: MMConf: An Infrastructure for Building Shared Multimedia Applications. In: Proc. of ACM Conf. on CSCW (1990) 329–342
5. Ellis C. A.: A Framework and Mathematical Model for Collaboration Technology. In: LNCS 1364, Springer-Verlag (1998) 121–144
6. Ellis C. A., Gibbs S. J.: Concurrency Control in Groupware Systems. In: Proc. of ACM Conf. on Management of Data (1989) 399–407
7. Lynch N. A., Tuttle M. R.: An Introduction to Input/Output Automata. CWI Quarterly, 3 (1989) 219–246
8. Papadopoulos G. A., Arbab F.: Coordination Models and Languages. Report SEN-R9834, CWI, Amsterdam, The Netherlands (1998)
9. Roseman M., Greenberg S.: GroupKit: A Groupware Toolkit for Building Real-time Conferencing Applications. In: Proc. of ACM Conf. on CSCW (1992) 43–50
10. Sistla A. P.: On Characterization of Safety and Liveness Properties in Temporal Logic. In: Proc. of ACM Symposium on Principles of Distributed Computing (1985) 39–48

A Resource Model for Large-Scale Non-hierarchy Grid System*

Qianni Deng¹, Xinda Lu¹, Li Chen², and Minglu Li¹

¹ Dept. of Computer Science & Eng., Shanghai JiaoTong Univ., Shanghai 200030, P.R. China
{deng_qn, lu_xd, li_ml}@cs.sjtu.edu.cn
² Dept. of Mechanical Eng., Shanghai JiaoTong Univ., Shanghai 200030, P.R. China

Abstract. Computational Grids and peer-to-peer (P2P) computing systems are progressively overlapping with each other. This paper proposes a resource model for future large-scale, non-hierarchical grid systems. With this model we can represent sharing relationships among heterogeneous resources. Based on the assumption of a power-law degree distribution and the results of Adamic et al. [20], we find that the unstructured locating algorithms used in P2P systems do not suit this resource model. Finally, we suggest that it is necessary to classify Grid systems, observe their network topologies and build classified resource models for different types of Grid systems.

1 Introduction

Early Computational Grids [1,2] and peer-to-peer (P2P) computing systems [3] are two different types of distributed systems, but they are now progressively overlapping with each other. Future large-scale resource-sharing Grid systems will have the following features:
1. Large scale and lack of centralized control.
2. Dynamically changing membership. Some Grid resources are stable and reliable, while others may join or leave the system dynamically.
3. Resource diversity. Diversity means (a) multiple types of resources, e.g. computational resources, data, services, instruments and storage, where resources of the same type are heterogeneous (different operating systems, different numbers and speeds of CPUs, different data sizes, and different provided services); and (b) multiple characteristics: some resources provide stable service, for example to all users from 6:00 AM to 6:00 PM, while others are unstable, for example shared only when they are idle.

To manage and locate diverse resources in this large-scale, dynamic and heterogeneous environment, one needs an effective resource organization model. This model must differ from the hierarchical resource model of early computational Grids.

* This work is supported by National Natural Science Fund of China (No. 60173031).
M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 669–676, 2004. © Springer-Verlag Berlin Heidelberg 2004

In


Q. Deng et al.

the following sections of this paper, we first analyze the network topology of large-scale resource-sharing Grid systems, then put forward a non-hierarchical resource model and, based on it, compare two resource locating mechanisms through theoretical analysis. Finally, we point out possible research hotspots for Grid resource models.

2 Network Topology of Large-Scale Resource Sharing System

The Internet is the basic infrastructure of large-scale distributed resource-sharing Grid systems. The network topology of the Internet is unpredictable and changes dynamically, and it is one of the key factors affecting the efficiency of resource management and location in a Grid system.

Random graphs are a basic mathematical tool for studying large-scale complex networks. The classical model [4,5,6,7] of random graph theory assumes that there are N labeled nodes in the graph and that any two randomly chosen nodes are connected with probability p; hence the graph contains pN(N − 1)/2 links on average. Each node has links with several neighbor nodes, but not all nodes have the same number of links. The spread in the number of links of the nodes, i.e. the node degree, is characterized by a distribution function p(k), which gives the probability that a randomly selected node has exactly k edges. Since the edges of a random graph are placed randomly, most nodes have approximately the same degree, close to the average degree ⟨k⟩ = p(N − 1) ≈ pN of the network. For large N, the degree distribution of a random graph is approximately a Poisson distribution:
$$p(k) = e^{-\langle k \rangle} \frac{\langle k \rangle^{k}}{k!}.$$
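The Poisson shape is easy to check empirically. The following Python sketch samples one classical random graph with N = 1000 nodes and p = 0.005 (values chosen only for illustration) and prints, for each degree k, the observed fraction of nodes next to the Poisson probability with ⟨k⟩ = p(N − 1). The function names are ours.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Tuple


def random_graph_degrees(n: int, p: float, seed: int = 0) -> List[int]:
    """Sample a classical random graph (each pair linked with probability p); return node degrees."""
    rng = random.Random(seed)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree


def poisson_pmf(k: int, mean: float) -> float:
    return math.exp(-mean) * mean ** k / math.factorial(k)


def degree_table(n: int = 1000, p: float = 0.005, seed: int = 0) -> Dict[int, Tuple[float, float]]:
    """Map k -> (observed fraction of nodes with degree k, Poisson prediction)."""
    degree = random_graph_degrees(n, p, seed)
    counts = Counter(degree)
    mean = p * (n - 1)  # <k> = p(N - 1)
    return {k: (counts[k] / n, poisson_pmf(k, mean)) for k in range(max(degree) + 1)}


for k, (observed, predicted) in degree_table().items():
    print(f"k={k:2d}  observed={observed:.3f}  Poisson={predicted:.3f}")
```

Swapping in a degree sequence measured on a real overlay makes the deviation discussed next visible: a heavy tail shows up as far more high-degree nodes than the Poisson column allows.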

However, it has been discovered that for most large networks the degree distribution deviates significantly from a Poisson distribution. In particular, for a large number of networks [8,9,10,11,12,14], including the World Wide Web and the Internet, the degree distribution has a power-law tail,
$$p(k) \sim k^{-\tau},$$
where the exponent τ is approximately constant; that is, p(k) is not affected by the scale of the network, and such networks are called scale-free. Most studies have observed that the Internet graph follows a power law at three levels: the router topology, the inter-domain topology and the World Wide Web, with power-law exponents all between 2 and 3. The related exponents are shown in Table 1.
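For a power law truncated at a maximum degree m, with p_k = c·k^(−τ) for 1 ≤ k ≤ m, the normalization constant c, the average degree z = Σ k·p_k, and the mean number of further edges at a vertex reached along a random edge, Σ k(k − 1)·p_k / z (the quantity G₁′(1) that appears in the search-cost analysis of Section 4), can be computed directly. The Python sketch below does so; the function name is hypothetical. For 2 < τ < 3, z settles to a finite value as m grows, while the last quantity keeps growing, roughly like m^(3−τ).

```python
from typing import Tuple


def power_law_moments(tau: float, m: int) -> Tuple[float, float, float]:
    """For p_k = c * k**(-tau), 1 <= k <= m, return (c, z, g1).

    c:  normalization constant, chosen so that sum_k p_k = 1
    z:  average degree, sum_k k * p_k
    g1: mean number of further edges at a vertex reached by following a
        random edge, sum_k k * (k - 1) * p_k / z
    """
    ks = range(1, m + 1)
    c = 1.0 / sum(k ** -tau for k in ks)
    z = c * sum(k ** (1.0 - tau) for k in ks)
    g1 = c * sum(k * (k - 1) * k ** -tau for k in ks) / z
    return c, z, g1


for tau in (2.1, 2.5, 2.9):
    for m in (100, 1_000, 10_000):
        c, z, g1 = power_law_moments(tau, m)
        print(f"tau={tau}  m={m:>6}  z={z:.3f}  G1'(1)={g1:.1f}")
```

The printed table shows the practical consequence of a heavy tail: the mean degree barely moves with m, but a walk that follows edges meets vertices whose remaining degree grows with the cutoff.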

3 A Non-hierarchy Resource Discovery Model

Wei Li et al. [17] have proposed a hierarchical Grid resource model. This paper mainly focuses on building a resource model for the P2P Grid environment proposed by Iamnitchi et al. [13]. In the context of decentralized resource discovery in large-scale, distributed, heterogeneous systems, we assume that every participant in the virtual organization has one or more servers that store and provide access to resource information. We call these server nodes, or peers. A node may provide information about a small set of resources (e.g. locally stored files or the node's computing power, as in a traditional P2P scenario) or a large set of resources (e.g. all resources shared by an institution, as in a typical Grid scenario). From the perspective of resource discovery, the grid is thus a collection of geographically distributed nodes that may join and leave at any time and without notice. We now formalize a resource discovery model based on these assumptions.

Definition 1. We model resource discovery as an undirected graph G. The nodes of G and the connections between them change dynamically, and the number of nodes is at most N. All nodes in G are peers with the same function: providing information about (local or community) resources. If two nodes can access each other and exchange information mutually, a link exists between them. When a request reaches a node, the resource information in that node must be checked; we assume that the checking time at each node is constant.

Definition 2. Because of resource diversity, some high-performance nodes are accessed with higher probability than common nodes; we call these capable nodes. For example, a high-performance server can provide high-quality service and information about a large set of resources. Capable nodes therefore have more links than common nodes.
Based on the observed Internet topology, we assume that G is a scale-free network: the degree distribution of G is a power law $p(k) \propto k^{-\tau}$, the exponent τ of the power-law tail is between 2 and 3, and the maximum degree in G is m.

Definition 3. We denote any resource by r and any node by $n_i$. The relationship between $n_i$ and r is denoted by $f(n_i, r)$: if node $n_i$ holds a record of the information of resource r, then $f(n_i, r) = 1$; otherwise $f(n_i, r) = 0$.

Definition 4. Locating resource r can be regarded as a request travelling from the initiating node $n_0$ until it reaches an information provider node $n_k$ with $f(n_k, r) = 1$. We call this process a search path, $n_0 \to n_1 \to \cdots \to n_k$.

4 Analysis of Resource Discovery Algorithm

4.1 Random Walk

Description. This algorithm considers neither node heterogeneity nor the unequal degree distribution. Searching for a requested resource can be viewed as a random walk: starting from the initiating node, randomly choose a neighbor of the current node as the next node to check, and repeat until the requested resource is found.

Analysis. In the worst case, resource r is found only after all information nodes have been visited. Therefore the search cost of the random walk, denoted s, is the length of a walk that scans all nodes in the whole graph. We use the generating function formalism introduced by Newman et al. [19] and Adamic et al. [20] for graphs with arbitrary degree distributions to characterize the search cost in power-law graphs analytically. Suppose that we have an undirected graph of N vertices, with N large. We define the generating function for the distribution of the vertex degrees k as
$$G_0(x) = \sum_{k=0}^{\infty} p_k x^k,$$

where $p_k$ is the probability that a randomly chosen vertex on the graph has degree k. For a graph with a power-law distribution with exponent τ, minimum degree k = 1 and an abrupt cutoff at m, the generating function is given by
$$G_0(x) = c \sum_{k=1}^{m} k^{-\tau} x^k,$$
where c is a normalization constant.

The distribution is assumed to be correctly normalized, so that
$$G_0(1) = c \sum_{k=1}^{m} k^{-\tau} = 1.$$


The average degree z of a vertex in this case is given by
$$z = G_0'(1) = c \sum_{k=1}^{m} k^{1-\tau}.$$

Another very important quantity is the distribution of the degree of the vertices that we arrive at by following a randomly chosen edge. Such an edge arrives at a vertex with probability proportional to the degree of that vertex, and the vertex therefore has a probability distribution of degree proportional to $k p_k$. The correctly normalized distribution is generated by
$$\frac{\sum_k k p_k x^k}{\sum_k k p_k} = \frac{x G_0'(x)}{G_0'(1)}.$$

If we consider only the number of outgoing edges from the vertex we arrived at, not including the edge we just came from, we need to divide by one power of x. Hence the number of new neighbors encountered on each step of a random walk is given by the generating function
$$G_1(x) = \frac{G_0'(x)}{G_0'(1)} = \frac{G_0'(x)}{z}.$$

Therefore the average number of newly visited neighbors in each step of a random walk is
$$G_1'(1) = \frac{G_0''(1)}{G_0'(1)} = \frac{c}{z} \sum_{k=1}^{m} k(k-1)\,k^{-\tau}.$$

Supposed that

we impose the result of Lada A. Adamic[20] then give


The cost for scanning the whole graph is

so

4.2 High Degree Neighbor Routing

Description. This algorithm considers node heterogeneity and the unequal degree distribution. During resource location, if a node fails to find the requested resource information in its local store, it passes the request message to the neighbor with the most neighbors. Each step of locating resource r is: choose the highest-degree neighbor (denoted A) of the current node as the next node to check, check whether A has information about r, and repeat until the requested resource r is found.

Searching Cost. In the worst case, resource r is found only after all information nodes have been visited. Therefore the search cost, denoted s, is the length of a walk that scans all nodes in the whole graph.

Note. In fact, because resources are dynamic, a node holding an information record stating that resource r is located at some node does not guarantee that r can be found there without further checking; the above algorithm ignores the cost of this situation.

Consider the degree of the last node that must be visited in order to scan a certain fraction of the graph. The number of first neighbors scanned is then given by

where we assume that this degree has not dropped too much by the time we have scanned that fraction of the graph. That is to say, to scan a fraction of the graph we need to walk a number of steps given by

The cost for scanning the whole graph is

Hence, under certain conditions, in the worst case the searching cost of algorithm 2 is higher than that of algorithm 1.
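The two locating strategies can be simulated on a small overlay. The Python sketch below returns the search path n0 → n1 → … → nk for each strategy, with `holders` playing the role of the nodes where f(n, r) = 1. The graph, names and parameters are hypothetical. For high-degree routing we additionally skip neighbours that have already been visited and fall back to a random neighbour when none is left; this refinement is our assumption, since without it two adjacent hubs can bounce a request back and forth.

```python
import random
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]   # undirected adjacency sets


def random_walk_search(g: Graph, start: str, holders: Set[str],
                       rng: random.Random, max_steps: int = 100_000) -> List[str]:
    """Section 4.1: move to a uniformly random neighbour until f(n, r) = 1."""
    path = [start]
    while path[-1] not in holders:
        if len(path) > max_steps:
            raise RuntimeError("resource not found within max_steps")
        path.append(rng.choice(sorted(g[path[-1]])))   # sorted: reproducible with a seed
    return path


def high_degree_search(g: Graph, start: str, holders: Set[str],
                       rng: random.Random, max_steps: int = 100_000) -> List[str]:
    """Section 4.2: forward the request to the neighbour with the most neighbours.

    Assumption (not in the text): already-visited neighbours are skipped, and a
    random neighbour is used when all of them have been visited.
    """
    path, visited = [start], {start}
    while path[-1] not in holders:
        if len(path) > max_steps:
            raise RuntimeError("resource not found within max_steps")
        neighbours = sorted(g[path[-1]])
        fresh = [n for n in neighbours if n not in visited]
        if fresh:
            nxt = max(fresh, key=lambda n: (len(g[n]), n))  # highest degree, ties by name
        else:
            nxt = rng.choice(neighbours)
        visited.add(nxt)
        path.append(nxt)
    return path


# A small hypothetical overlay: "hub" is a capable node; only "e" knows about r.
overlay: Graph = {
    "a": {"hub"}, "b": {"hub"}, "hub": {"a", "b", "c"},
    "c": {"hub", "d"}, "d": {"c", "e"}, "e": {"d"},
}
rng = random.Random(1)
print(high_degree_search(overlay, "a", {"e"}, rng))
print(len(random_walk_search(overlay, "a", {"e"}, rng)) - 1, "hops by random walk")
```

Averaging `len(path) - 1` over many start nodes and resource placements on a larger generated power-law graph would give an empirical counterpart to the worst-case costs analysed above; the sketch itself makes no claim about which strategy wins in general.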


5 Conclusion and Future Work

From the definition of the proposed resource model we know that it is a generalized model, with which we can represent sharing relationships among heterogeneous resources. Because this model does not distinguish types of resources, it is hard to find a uniform structured method to represent and locate resources. Based on the assumption of a power-law degree distribution and the results of Adamic et al. [20], Section 4 analyzes the scaling of two unstructured locating algorithms, and we find that the locating algorithm that exploits the power-law distribution of the network topology does not suit the resource model. We believe that the resource model can be improved as follows.
1. Construct different resource models for different types of Grid, for example Computational Grids, Data Grids and Service Grids, so that a uniform resource representation can be found within each type of grid. Besides the unstructured methods analyzed here, we hope to find more effective structured methods to improve resource management and discovery in large-scale, non-hierarchical grid systems.
2. Observe the realistic network topologies of computational grids, data grids and service grids. After these observations we want to answer the following questions: What are the topologies of the different types of grid? Are they all scale-free networks? Are their power-law exponents the same as the Internet's, between 2 and 3?
3. Design corresponding resource management and locating methods according to the network topologies of the different types of grid.

References

1. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. IJSA, 2001. http://www.globus.org
2. What is the Grid? A Three Point Checklist. Opinion pieces by Ian Foster, Grid Today, July 20, 2002. http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf
3. Andy Oram (ed.), Peer-to-Peer: Harnessing the Power of Disruptive Technologies, O'Reilly & Associates, 2001.
4. B. Bollobás, Random Graphs, Academic Press, New York (1985).
5. P. Erdős and A. Rényi, "On random graphs," Publicationes Mathematicae 6, 290–297 (1959).
6. P. Erdős and A. Rényi, "On the evolution of random graphs," Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 17–61 (1960).
7. P. Erdős and A. Rényi, "On the strength of connectedness of a random graph," Acta Mathematica Scientia Hungary 12, 261–267 (1961).
8. R. Albert and A.-L. Barabási, "Statistical mechanics of complex networks," Rev. Mod. Phys. 74 (January 2002).
9. B. A. Huberman and L. A. Adamic, "Growth dynamics of the world-wide web," Nature 401, 131 (1999).
10. J. M. Kleinberg, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, "The web as a graph: Measurements, models, and methods," in Lecture Notes in Computer Science, No. 1627, T. Asano, H. Imai, D. T. Lee, S.-I. Nakano, and T. Tokuyama (eds.), Springer-Verlag, Berlin (1999).
11. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, "Graph structure in the web," Computer Networks 33, 309–320 (2000).
12. M. Faloutsos, P. Faloutsos, and C. Faloutsos, "On power-law relationships of the Internet topology," Comp. Comm. Rev. 29, 251–262 (1999).
13. A. Iamnitchi, I. Foster, and D. C. Nurmi, "A peer-to-peer approach to resource location in grid environments," in Proc. 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), 2002, p. 419.
14. G. Pandurangan, P. Raghavan, and E. Upfal, "Building low-diameter P2P networks," in Proc. 42nd IEEE Symposium on Foundations of Computer Science, 2001, pp. 492–499.
15. R. Albert, H. Jeong, and A.-L. Barabási, "Diameter of the world-wide web," Nature 401, 130–131 (1999).
16. R. Govindan and H. Tangmunarunkit, Proceedings of IEEE INFOCOM 2000, Tel Aviv, Israel (IEEE, Piscataway, N.J.), 3, 1371 (2000).
17. Wei Li, Zhiwei Xu, Fangpeng Dong, and Jun Zhang, "Grid resource discovery based on a routing-transferring model," 3rd International Workshop on Grid Computing (Grid 2002).
18. Lada A. Adamic, Rajan M. Lukose, Amit R. Puniyani, and Bernardo A. Huberman, "Search in power-law networks," Physical Review E 64, 046135 (2001).
19. M. E. J. Newman, S. H. Strogatz, and D. J. Watts, "Random graphs with arbitrary degree distributions and their applications."
20. Lada A. Adamic, Rajan M. Lukose, Amit R. Puniyani, and Bernardo A. Huberman, "Search in power-law networks," Physical Review E 64, 046135 (2001).

A Virtual Organization Based Mobile Agent Computation Model

Yong Liu, Cong-fu Xu, Zhao-hui Wu, Wei-dong Chen, and Yun-he Pan

College of Computer Science, Zhejiang University, Hangzhou 310027, P.R. China
[email protected], {xucongfu,wzh,chenwd,yhpan}@zju.edu.cn

Abstract. Mobile agents have been developed for a decade and are widely used in distributed computation. However, traditional mobile agents, both strong-migration and weak-migration, still have weaknesses. Built on the fabric of the Grid virtual organization architecture, mobile agents gain significant advantages over older mobile agent systems. In this paper, a novel formalized mobile agent computation model based on the virtual organization is presented. In this model, all actions of a mobile agent are treated as states, and the agent's workflow is controlled by a finite-state machine. This makes every action of a mobile agent atomic and so avoids communication mismatch. The model takes full advantage of strong-migration agents, such as robustness and intelligence, while overcoming their serious weakness: the large amount of data that must be transmitted.

1 Introduction

Mobile agents are programs that can migrate and execute between different network hosts. They locate appropriate computation, information and network resources and combine these resources on a host to accomplish computing tasks [1]. By migration ability, mobile agents fall into two types: strong-migration agents and weak-migration agents. Ordinary mobile agent systems such as AgentTCL [6], the Voyager system and the Aglet system [7] all belong to one of these two types. AgentTCL uses a strong-migration policy, under which the agent carries not only its executable code and the data used during execution but also the state of the executing process. Voyager and Aglet use a weak-migration policy, under which the agent carries only its code and the state of its data. Under strong migration, the system must record all states and related data at every position of the agent, which costs a great deal of time and space in transport and leads to low efficiency. Under weak migration, the amount of transported data drops greatly, but so does the agent's ability to adapt to complicated network topologies. Designing a reliable and efficient work pattern for mobile agents is therefore the key problem.

In fact, grid technology [3] provides a powerful platform for mobile agents. A series of grid protocols [2,3], such as GNSP, GGMP and QDP, make a new work pattern possible. In this paper, we introduce a mobile agent computation model based on a finite state machine, whose migration ability lies between strong migration and weak migration. In this model, the executing process is divided into several states controlled by the finite state machine, and only the agent body and the communication data need to be transported. In this way, we increase both transport efficiency and the adaptability of the mobile agent.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 677–682, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 VO Based Fabric Architecture of Computation Model

The VO based architecture used in this paper is a structure similar to the fabric layer in [2]. We first give some definitions.

Definition A. Node: the smallest device in the network that can load and execute mobile agents. A special kind of node, called a Key Node, handles remote communication.

Definition B. Group: a set of one or more nodes. Groups identify the nodes in a VO: every node other than a key node belongs to exactly one group. A group is a comparatively stable organization, but a node can dynamically leave its group and join another one. Nodes log in and log out using GGMP (Grid Group Management Protocol) [2], which is similar to IGMP.

Definition C. Node Distance: the minimum number of routing hops between two nodes i and j.

With these definitions, we can define the VO itself.

Definition D. Virtual Organization: a VO is a fabric structure composed of nodes and established by a series of protocols. Normally, a group contains similar and adjacent nodes. Each group has one Key Node, whose function is similar to that of a gateway in a LAN: it communicates with nodes outside the group. The key node is determined by a protocol called GNSP (Gateway Node Selective Protocol) [3]. The key nodes of all groups constitute a virtual group called the Kernel Group. It is the most important part of the VO topology: it serves the other nodes and handles communication, lookup and similar tasks between nodes.
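The VO fabric can be pictured as a small data structure. The sketch below is only an illustration and assumes an undirected link graph between nodes. The class `VirtualOrganization` and its methods are hypothetical. Group membership and key-node choice stand in for the outcomes of GGMP and GNSP. Node distance is computed as a breadth-first hop count.

```python
from collections import deque

class VirtualOrganization:
    """Hypothetical VO fabric: links between nodes, groups and key nodes."""

    def __init__(self):
        self.links = {}      # node -> set of directly connected nodes
        self.group_of = {}   # ordinary node -> its (single) group
        self.key_node = {}   # group -> key node chosen for that group

    def add_link(self, a, b):
        self.links.setdefault(a, set()).add(b)
        self.links.setdefault(b, set()).add(a)

    def join_group(self, node, group):
        # Stand-in for a GGMP login; re-joining moves the node to the new group.
        self.group_of[node] = group

    def set_key_node(self, group, node):
        # Stand-in for the outcome of GNSP key-node selection.
        self.key_node[group] = node

    def kernel_group(self):
        # The key nodes of all groups form the virtual Kernel Group.
        return set(self.key_node.values())

    def node_distance(self, i, j):
        """Minimum hop count between nodes i and j (BFS); None if unreachable."""
        if i == j:
            return 0
        seen = {i}
        frontier = deque([(i, 0)])
        while frontier:
            node, d = frontier.popleft()
            for nxt in self.links.get(node, ()):
                if nxt == j:
                    return d + 1
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, d + 1))
        return None

vo = VirtualOrganization()
for a, b in [("n1", "k1"), ("n2", "k1"), ("k1", "k2"), ("k2", "n3")]:
    vo.add_link(a, b)
for node, group in [("n1", "g1"), ("n2", "g1"), ("n3", "g2")]:
    vo.join_group(node, group)
vo.set_key_node("g1", "k1")
vo.set_key_node("g2", "k2")
```

In this toy VO, traffic between the two groups passes through the key nodes k1 and k2, mirroring their gateway role.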

3 VO Based Mobile Agent Computation Model

The first requirement for implementing the VO based computation model is that mobile agents can execute on different nodes. We therefore introduce the finite-state mobile agent, a model in which data and resources are distinguished. The definitions are as follows.

3.1 Finite-State Mobile Agent

Definition E. A finite-state mobile agent is a resource-driven mobile agent. A mobile agent whose migration ability lies between strong and weak migration can be seen as a finite-state machine that runs autonomously, driven by resources and data.

Definition F. Data: all local data that affect the state changes of the mobile agent.

Definition G. Resource: a general term for all data, devices and software on VO nodes. In our model, all runtime parameters and devices on remote nodes are called resources, as distinct from the data that drive the agent's state changes.

Consequently, the influence of the network topology on agent execution can be ignored, and only the agent's state changes during execution and migration need attention. In other words, the system does not need to track which node an agent moves from or to. The agent's current state is the only parameter that must be recorded in the agent, migrated with it, and kept up to date.

A finite-state mobile agent consists of several parts: an identity, a finite state set, exterior input symbols (data or resources) and a transition relation. The identity is retained throughout the agent's runtime, and the VO architecture uses it to locate the right agent. In practice, a service such as a mobile agent for sharing virtual experiment devices can adopt a universal finite state set and a common transition relation, with the identity serving as the instance handle. The finite state set comprises the request state, the suspend state, the block state and the service state. The input symbols Ur are of two kinds: the resource input symbol for the service state, and the service time. Fr is the transition relation.
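To make state-only migration concrete, here is a minimal sketch of a finite-state mobile agent. The four state names come from the definition above. The input symbols (`resource_ok`, `resource_busy`, `resource_miss`, `time_up`) and the transition table standing in for Fr are assumptions made for illustration. The sketch also assumes the agent body (code) is handled separately and is not modelled here.

```python
from dataclasses import dataclass

# Hypothetical transition table standing in for Fr: (state, input symbol) -> next state.
TRANSITIONS = {
    ("request", "resource_ok"): "service",
    ("request", "resource_busy"): "block",
    ("block", "resource_ok"): "service",
    ("service", "resource_miss"): "suspend",  # the agent must migrate to continue
    ("suspend", "resource_ok"): "service",    # service resumes on the target node
    ("service", "time_up"): "request",        # the service time has elapsed
}

@dataclass
class FiniteStateAgent:
    agent_id: str            # identity, kept for the agent's whole lifetime
    state: str = "request"   # current state: the only runtime data carried along

    def step(self, symbol: str) -> str:
        """Apply one input symbol and return the new state."""
        key = (self.state, symbol)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {self.state!r} on {symbol!r}")
        self.state = TRANSITIONS[key]
        return self.state

    def migration_payload(self) -> dict:
        # Identity plus current state; the agent body itself is not modelled here.
        return {"id": self.agent_id, "state": self.state}

agent = FiniteStateAgent("ma-1")
agent.step("resource_ok")       # request -> service
agent.step("resource_miss")     # service -> suspend
payload = agent.migration_payload()
resumed = FiniteStateAgent(payload["id"], payload["state"])  # rebuilt on the target node
resumed.step("resource_ok")     # suspend -> service
```

The payload that crosses the network is a tiny dictionary. In the model's terms, this is the saving over strong migration, which would also ship the full execution context.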

3.2 VO Based Mobile Agent Computation Model (MACM)

Having defined the finite-state mobile agent, we can now give the computation model for VO based mobile agents.

Definition H. The VO based mobile agent computation model (MACM) is a six-tuple with the following components.

- R is the node set.
- S is the finite state set of the mobile agent. S does not include the migration state or the null state. The migration state means that the agent starts moving to another node to execute a new state. The null state means that the agent performs no action (neither execution nor migration).
- The third component is the set of all message-operation states of the agent, consisting of the send-message state and the receive-message state.
- v is the initial node where the agent is created. The agent's service starts at node v and then cycles, driven by the finite states.
- The fifth component is the set of final nodes. Only at a final node can the agent be destroyed and its service end.
- The sixth component is the transition relation. It is a finite set of transitions given by four rules, (1)–(4). Each rule states, for all elements of a given kind that satisfy its condition, what the next transition state is.

In this computation model, migration is realized through communication between nodes in the VO. By the definition of the model, migration, communication (message passing) and remote execution are all treated as states.

Traditional message-passing mobile agent systems commonly suffer from a communication invalidation problem [4]. It is caused by the asynchrony between an agent's migration and its messages: after an agent sends a message, it moves to another node, and the reply may then be unable to find the agent. Similarly, a broadcast-based mode [5] for locating and communicating with mobile agents has its own problems: large volumes of transmitted data, a tendency to block under limited bandwidth, and low reliability.

In our VO based computation model, communication is treated as states, and a series of rules ensures that every send operation is matched by a receive operation. The communication process and the migration process therefore become atomic operations. Communication and migration occur in logical sequence, and the communication invalidation problem cannot arise.
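The following sketch illustrates why treating send and receive as states removes the communication invalidation problem. It is hypothetical: the class, state names and message ids are not from the paper. A migration request is refused while a sent message still awaits its reply, so the reply always finds the agent on the node it was sent from.

```python
class MessagingAgent:
    """Hypothetical agent in which sending and receiving are explicit states."""

    def __init__(self, agent_id, node):
        self.agent_id = agent_id
        self.node = node
        self.state = "service"
        self.awaiting = None  # id of the message whose reply is still outstanding

    def send(self, msg_id):
        if self.state != "service":
            raise RuntimeError("sending is only allowed from the service state")
        self.state = "send"
        self.awaiting = msg_id

    def receive(self, reply_to):
        if self.state != "send" or reply_to != self.awaiting:
            raise RuntimeError("reply does not match the outstanding send")
        # The send/receive pair completes before anything else can happen.
        self.state = "service"
        self.awaiting = None

    def migrate(self, target):
        # Migration is refused while a send is unmatched, so a reply can never be
        # addressed to a node the agent has already left.
        if self.state != "service":
            raise RuntimeError(f"cannot migrate in state {self.state!r}")
        self.node = target

agent = MessagingAgent("ma-1", "n1")
agent.send("m1")
try:
    agent.migrate("n2")
    migrated_early = True
except RuntimeError:
    migrated_early = False    # blocked: the reply to m1 has not arrived yet
agent.receive("m1")
agent.migrate("n2")           # allowed now that the send/receive pair is complete
```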

3.3 Transition Forecast Algorithm in MACM

In the VO based MACM, data and resources are unified, so a minimal-distance transition policy can be implemented, which can greatly reduce migration cost. The transition problem is: given a node set, when a mobile agent arrives at a node, which node should be its next migration target? The algorithm is as follows.

Algorithm. Minimal distance transition forecast algorithm

Step 1. The mobile agent serving at the current node encounters a resource miss: the node lacks resources needed for the service to continue.
Step 2. The current node broadcasts a request for the resource. Only nodes that hold the resource the agent needs reply to this broadcast.
Step 3. From the reply messages, the current node forms the possible-transition node set, all of whose members hold the needed resource. It computes the node distance to each member; the node with the minimal node distance is the next migration node.
Step 4. The agent moves to that node, and the algorithm ends.
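The four steps can be sketched as follows. The sketch assumes the VO topology is an undirected graph given as an adjacency map, and that node distance is the hop count from Definition C. The broadcast of Step 2 is simulated by scanning a hypothetical `resources` table. Ties in distance are broken by node name, a choice the algorithm itself leaves open.

```python
from collections import deque

def hop_distances(links, source):
    """BFS hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def forecast_next_node(links, resources, current, needed):
    """Minimal-distance transition forecast (sketch).

    links:     node -> iterable of neighbouring nodes (hypothetical topology)
    resources: node -> set of resources held at that node
    current:   node where the agent hit a resource miss (Step 1)
    needed:    the missing resource
    """
    # Step 2: simulated broadcast; only nodes holding the resource reply.
    replies = [n for n, held in resources.items() if needed in held and n != current]
    # Step 3: among reachable repliers, pick the minimal node distance.
    dist = hop_distances(links, current)
    candidates = [n for n in replies if n in dist]
    if not candidates:
        return None
    # Step 4: the agent migrates to the returned node.
    return min(candidates, key=lambda n: (dist[n], n))

links = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b", "e"], "d": ["b"], "e": ["c"]}
resources = {"a": {"cpu"}, "c": {"scope"}, "d": set(), "e": {"scope"}}
target = forecast_next_node(links, resources, "a", "scope")
```

Here both c (2 hops) and e (3 hops) reply, so the agent is sent to c.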


4 Analysis

In 1999 the Ministry of Education of China began a resource-sharing project among Chinese universities. The aim of this CSCR (Computer Support Cooperation Research) project is to make full use of the devices and data distributed across the universities. The main difficulty is building a smart, efficient, stable and reliable CSCR platform. We implemented MACM in this platform. To evaluate the performance of MACM, we propose the concept of Service Availability (SrvAvl). It is the ratio of the agent's service time on nodes to its total time, that is, the service time plus the migration time:

$$\mathrm{SrvAvl} = \frac{T_s}{T_s + T_m}$$

Here, $T_s$ is the service time on nodes and $T_m$ is the migration time of the mobile agent. The equation shows that, when the service time is held constant, reducing the migration time effectively increases the service availability. In MACM, the migration time is calculated as

$$T_m = \sum_{q=1}^{k} \frac{L}{v_q}$$

where $L$ is the size of the mobile agent, $k$ is the number of migration sections, and $v_q$ is the transfer velocity between the two nodes of section $q$. Commonly, we use an average transfer velocity $B$ between nodes, so that $T_m = kL/B$. Fig. 1 shows the relation between the agent's size, the number of migrations and the service availability.

Fig. 1. Relation between agent size, number of migrations and service availability. The red curve is for the smallest agent (size M), the blue curve for a medium-sized agent (2M), and the black curve for the largest agent (3M).

As Fig. 1 shows, agents of different sizes have different service availabilities: a smaller agent achieves higher service availability than a larger one. Reducing the size of the mobile agent can therefore greatly improve system performance.
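The service-availability argument can be checked numerically. The sketch assumes SrvAvl = T_s / (T_s + T_m). It also assumes a migration time that sums agent size over per-section transfer velocity, T_m = Σ_q L / v_q. This reduces to k·L/B when each of the k sections uses the average velocity B. The parameter values are hypothetical and are not the measurements behind Fig. 1.

```python
def migration_time(agent_size, velocities):
    """T_m = sum over migration sections q of L / v_q."""
    return sum(agent_size / v for v in velocities)

def service_availability(service_time, agent_size, migrations, bandwidth):
    """SrvAvl = T_s / (T_s + T_m), with every section at the average velocity B."""
    t_m = migration_time(agent_size, [bandwidth] * migrations)
    return service_time / (service_time + t_m)

# Hypothetical parameters: T_s = 10 s, B = 1 MB/s, agent sizes M, 2M, 3M with M = 1 MB.
curves = {
    size: [service_availability(10.0, size, k, 1.0) for k in range(6)]
    for size in (1, 2, 3)
}
```

With five migrations, agents of 1, 2 and 3 MB reach about 0.667, 0.5 and 0.4 respectively. This is the same ordering as the three curves: for a fixed number of migrations, a larger agent always has lower availability.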


5 Conclusion

In this paper, we propose a virtual organization based mobile agent computation model to solve problems in traditional mobile agent systems. The model integrates the advantages of strong-migration and weak-migration agents. It provides a more intelligent and robust mobile agent computation model that avoids frequent data transmission. With the aid of the virtual organization architecture, the model effectively avoids the communication invalidation problem. However, it requires high-performance key nodes, and the average bandwidth of the kernel group strongly affects system capability.

Acknowledgement. This paper is supported by the Zhejiang Provincial Natural Science Foundation of China (No. 602045 and No. 601110), and by an advanced research project sponsored by the China Defense Ministry & Education Ministry.

References
1. Tao, Xian-ping, Liu, Jian, et al. Mobile agent: a kind of future distributed computation model. Computer Science, 1999, 26(2): 1-6.
2. Huang Lican, Wu Zhaohui, Pan Yunhe. Virtual and Dynamic Hierarchical Architecture for E-Science Grid. International Journal of High Performance Computing Applications, Volume 17, Issue 3, August 2003.
3. I. Foster, C. Kesselman, S. Tuecke. The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications, 15(3), 2001.
4. Tao, Xian-ping, Feng, Xin-yu, Li, Xin, et al. Communication mechanism in Mogent system. Journal of Software, 2000, 11(8): 1060-1065.
5. Murphy, A., and Picco, G.P. Reliable communication for highly mobile agents. In: Proceedings of Agent Systems and Architectures/Mobile Agents (ASA/MA)’99, CA, USA, 1999, pp. 141-150.
6. Gray, R.S. Agent TCL: a transportable agent system. Proceedings of International Conference on Information and Knowledge Management (CIKM’95), Workshop on Intelligent Information Agents, Dec. 1995.
7. Lange, D., and Oshima, M. Programming Mobile Agents in Java - with the Java Aglet API. http://www.cis.upenn.edu/~bcpierce/courses/629/papers/AgletsBook-index.html

Modeling Distributed Algorithm Using B

Shengrong Zou

Department of Computer Science and Technology, Yangzhou University, Yangzhou 225009, China
[email protected]

Abstract. Although there have been several attempts to create grid systems, there is no clear definition of grids. In this paper, a formal approach is presented for defining the elementary functionalities of distributed systems. We illustrate the use of a formal technique for developing distributed algorithms. This technique uses a so-called “event driven” approach together with the B-Method. The resulting general machines for distributed systems can serve as a framework for defining new systems or analyzing existing ones.

1 Introduction

The design and operation of large systems is becoming increasingly complex. The interaction of cooperation and competition relationships leads to subtle and even paradoxical behaviours. Formal methods are therefore increasingly required in engineering practice. This is particularly true for performance evaluation, a natural starting point for the design and construction of large and complex systems.

B [1], Z [2] and VDM [3] are formal methods based on the construction of models, as opposed to those based on an algebraic approach like LARCH, OBJ and ASM [4]. B is the most recent of the three notations. It is also a development method covering all steps from specification to code, and it is based on an explicit axiomatization of typed set theory. B contains a structuring mechanism (composition/decomposition), the abstract machine, which is transformed during development into refinements and then into an implementation. The development method rests on fully stated mathematical theories: the theory of generalised substitutions, the theory of refinement, and the theory of layered software architecture. System dynamics are not defined by pre- and postconditions but by generalising the notion of substitution, based on the theory of predicate transformers.

The critical advantages of B, which enable its effective uptake and use within industry, are as follows. First, the notation used to specify state transformations (generalised substitutions) is relatively simple and familiar. Its uniform use from specification to code reduces both the cost of learning the notation and the possibility of semantic errors introduced by translation. Second, B provides constructs that support modularity in specification and implementation, allowing the task of verification and specification to be decomposed into more feasible subtasks. The unusual nature of these constructs may initially trouble those familiar with other specification languages, but they represent no greater learning difficulty than the structuring facilities of Ada or C++. Third, robust tool support (the B-Toolkit) exists for all stages of the software development lifecycle, including animation and document production; this collection of facilities is not currently offered for any other formal method. Finally, the method and language have been applied successfully to large industrial systems in a range of technical areas: real-time, simulation, information processing and engineering.

It is commonly accepted that, with the advent of high-speed network technology, high-performance and unconventional applications emerge by sharing geographically distributed resources in a well-controlled, secure and mutually fair way. Such a coordinated, large-scale virtual pool of resources requires an infrastructure called a grid. Although the motivations and goals for grids are obvious, there is no clear definition of a grid system.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 683–689, 2004. © Springer-Verlag Berlin Heidelberg 2004
The grid has been described in several ways:

- as a framework for “flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources” [5];
- as “a single seamless computational environment in which cycles, communication, and data are shared, and in which the workstation across the continent is no less than one down the hall” [6];
- as “a wide area environment that transparently consists of workstations, personal computers, graphic rendering engines, supercomputers and non-traditional devices: e.g., TVs, toasters, etc.” [7];
- as “a collection of geographically separated resources (people, computers, instruments, databases) connected by a high speed network a software layer, often called middleware, which transforms a collection of independent resources into a single, coherent, virtual machine”.

With varying degrees of precision, these definitions describe the central notion of a grid system, yet they cannot clearly distinguish grid systems from conventional distributed systems. In this paper we attempt to give a model of a conventional distributed system using B.

2 One Model of Distributed System

Here we introduce a distributed system with a possibly large (but finite) number of agents. These agents are placed on different sites that communicate with each other through unidirectional channels forming a ring [8][9]. Each agent can thus send messages to its “right” neighbour and receive messages from its “left” neighbour. Messages are not assumed to be transmitted immediately from one node to the next. In fact, we assume that they can be “buffered” between the two, and also reordered or duplicated. Moreover, each agent executes the same piece of code. The distributed execution of all these programs should result in a unique agent being “selected as the leader”. This decision, based on certain local criteria, should be made by the winning agent itself. Of course, it must be proved that no other agent can reach the same conclusion. Determining such a privileged agent can be useful when the ring is started or re-initiated.

Since every agent executes the same code, the problem seems unsolvable: what distinction between agents could introduce a difference in their otherwise homogeneous behaviour? Their position in the ring is certainly not such a distinction, since the shape of the ring gives no position any special status (no first, no last, no middle position, etc.). In fact, the only attribute that makes one agent different from the others is its name: the agents are assumed to be named, and named differently. But by itself, this difference in names is still a homogeneous property: there is, a priori, no “more” distinction than the distinction itself.
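One well-known way to break this symmetry using only distinct, ordered names is the Chang–Roberts algorithm. The sketch below simulates it as an illustration; it is not the B development that follows. Each agent sends its name to the right. A received name larger than the agent's own is forwarded, and a smaller one is dropped. An agent that receives its own name back declares itself leader. Buffers are unordered and messages are occasionally duplicated, as the ring model allows. The function name and parameters are hypothetical.

```python
import random

def elect_leader(names, seed=0, dup_rate=0.1):
    """Simulate Chang-Roberts leader election on a unidirectional ring.

    names[i] is agent i's unique, totally ordered name; agent i sends to (i + 1) % n.
    Each channel is an unordered buffer, so delivery order is arbitrary, and a
    forwarded message is duplicated with probability dup_rate.
    Returns the set of indices of agents that declared themselves leader.
    """
    n = len(names)
    rng = random.Random(seed)
    inbox = [[] for _ in range(n)]          # inbox[i]: messages in transit to agent i
    for i, name in enumerate(names):
        inbox[(i + 1) % n].append(name)     # every agent starts by sending its own name
    leaders = set()
    while any(inbox):
        i = rng.choice([k for k in range(n) if inbox[k]])
        msg = inbox[i].pop(rng.randrange(len(inbox[i])))   # arbitrary delivery order
        if msg > names[i]:
            inbox[(i + 1) % n].append(msg)                 # forward a larger name
            if rng.random() < dup_rate:
                inbox[(i + 1) % n].append(msg)             # occasional duplicate
        elif msg == names[i]:
            leaders.add(i)                                 # own name came back round
        # a smaller name is simply dropped
    return leaders

ring = [17, 4, 42, 8, 23]
winner = elect_leader(ring)
```

Whatever the seed, only the agent with the largest name is elected. Any other name is dropped at the first agent with a larger name before it can return to its owner. This is the "no other agent can reach the same conclusion" obligation in miniature.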

3 Modeling Distributed System Using B

We now give a set of machines for a distributed system. The model presented here is a distributed multi-agent system [10][11][12] in which agents are processes. The Self function, represented here as p, allows an agent to identify itself among other agents; it is interpreted differently by different agents. The following machines constitute a module, i.e. a single-agent program executed by each agent.

Machine 1: Map. The working cycle of a distributed system is based on the notion of a pool of computational nodes. Every process must therefore first be mapped to a node chosen from the pool. Other machines cannot work until the process is mapped.


Note the declarative style of the description: it does not specify how the appropriate node is selected, and any node for which the conditions are true can be chosen. The selection may be made by the user, prescribed in the program text, or left to a scheduler or load-balancing layer; at this level of abstraction it is irrelevant. The conditions listed here (login access and the presence of the binary code) are the absolute minimum. A real application may impose others concerning the performance of the node, its actual load, the user's priority and so on.

Machine 2: Resource grant. Once a process has been mapped and there are pending requests for resources, they can be satisfied if the requested resource is on the same node as the process. If the process requires a specific type of resource, it is the responsibility of the programmer or user to find a mapping where the resource is local to the process. Furthermore, if a user can log in to a node, she is authorized to use all resources belonging to or attached to the node, i.e. every resource r with BelongsTo(r, n) = true. At this level of abstraction it is therefore realistically assumed that resources are available or will become available within a limited time period. The model does not record whether a resource is shared or exclusive.
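Machines 1 and 2 can be mimicked with guarded Python functions. This is a sketch under assumptions. The `Node` and `Process` fields are hypothetical stand-ins for the login, binary-code and BelongsTo conditions. The first-match loop is just one of the selection policies the declarative model leaves open.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    logins: set      # users who may log in to this node
    binaries: set    # task binaries present on this node
    resources: set   # every resource r with BelongsTo(r, node) = true

@dataclass
class Process:
    pid: str
    user: str
    task: str
    node: Optional[Node] = None
    requests: set = field(default_factory=set)   # pending resource requests
    granted: set = field(default_factory=set)

def map_process(p, pool):
    """Machine 1 (Map): bind an unmapped process to any node meeting the minimal conditions."""
    if p.node is not None:
        return False
    for n in pool:  # first match; a scheduler or load balancer could choose differently
        if p.user in n.logins and p.task in n.binaries:
            p.node = n
            return True
    return False

def grant_resources(p):
    """Machine 2 (Resource grant): satisfy pending requests for resources local to p's node."""
    if p.node is None:
        return set()
    local = {r for r in p.requests if r in p.node.resources}
    p.requests -= local
    p.granted |= local
    return local

pool = [
    Node("n1", logins={"bob"}, binaries={"sim"}, resources={"disk"}),
    Node("n2", logins={"alice"}, binaries={"sim"}, resources={"disk", "gpu"}),
]
p = Process("p1", user="alice", task="sim", requests={"gpu", "tape"})
mapped = map_process(p, pool)      # only n2 lets alice log in
granted = grant_resources(p)       # 'tape' is not local, so it stays pending
```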

Machine 3: State transition. If all the resource requests have been satisfied and there is no pending communication, the process can enter the running state.


The running state means that the process is performing the activities prescribed by its task. The model aims to formalize distributed execution, not the semantics of a given application.

Machine 4: Resource request. During execution of the task, events can occur, represented by the external event function. The event in this machine represents the case where the process needs additional resources during its work. In this case the process enters the waiting state, and the request relation is raised for every resource in reslist.
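Machines 3 and 4 reduce to two guarded transitions on a process record. In this hypothetical sketch, the state names `ready`, `running` and `waiting` and the field names are assumptions; only the guards follow the text.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Proc:
    state: str = "ready"                        # hypothetical state after mapping
    requests: set = field(default_factory=set)  # pending resource requests
    expecting: Optional[str] = None             # pending communication partner, if any

def enter_running(p):
    """Machine 3: run only when no resource request and no communication is pending."""
    if p.state in ("ready", "waiting") and not p.requests and p.expecting is None:
        p.state = "running"
        return True
    return False

def need_resources(p, reslist):
    """Machine 4: the external 'needs more resources' event moves a running process to waiting."""
    if p.state != "running":
        return False
    p.state = "waiting"
    p.requests |= set(reslist)   # raise the request relation for every resource in reslist
    return True

p = Proc()
first = enter_running(p)                    # ready -> running
moved = need_resources(p, ["gpu", "disk"])  # running -> waiting
blocked = enter_running(p)                  # refused: requests still pending
p.requests.clear()                          # e.g. Machine 2 granted them
resumed = enter_running(p)                  # waiting -> running
```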

Machine 5: Send (communication). Processes of a distributed application interact with each other via message passing. Modern programming environments offer higher-level constructs, virtual object spaces and so on, and sophisticated message-passing libraries like MPI provide a rich set of communication patterns for virtually any kind of data. At the low level, however, they are all based on some form of send and receive primitives. This model restricts its scope to (blocking and nonblocking versions of) message passing. In the following, code fragments for blocking versions are bracketed and supplement the nonblocking code. When a send instruction is encountered during execution of the task, a new message is created with the appropriate sender and receiver information. If the send is blocking and the communication partner p is not waiting for this message, the process goes to the waiting state and waits for p to receive it.

Machine 6: Receive (communication). Receive procedures normally specify the source process of expected messages explicitly. However, message-passing systems must be able to handle indeterminacy, i.e. situations where there is no way to specify the order in which messages are accepted. If the task reaches the receive instruction and there exists a message that can be accepted, the message is removed from the universe MESSAGE and the process resumes its work.


MESSAGE(msg) := false means that msg is no longer part of the MESSAGE universe. It is assumed that the recipient now possesses the content of the message. A message acts like a container: the information held by the sender is transformed into a message, and the message exists until the receiver extracts the information. The actual handling of the message (queued, buffered or transmitted) is left to lower levels of abstraction. If the expected message does not exist and the operation is a blocking call, the process enters the receive-waiting state and updates the expecting function.
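The MESSAGE universe and the blocking variants of Machines 5 and 6 can be sketched as below. The class and method names are hypothetical. A `waiting` map plays the role of the waiting state and the expecting function, and `source=None` models a receive that accepts any sender. When several messages are acceptable, this sketch takes the oldest, which is one of the choices the indeterminate model permits.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Message:
    sender: str
    receiver: str
    payload: object

class MessageUniverse:
    """MESSAGE universe: a message exists from send until its receiver extracts it."""

    def __init__(self):
        self.messages = []   # the current elements of MESSAGE, oldest first
        self.waiting = {}    # process -> ("send", partner) or ("receive", source)

    def _expects(self, receiver, sender):
        w = self.waiting.get(receiver)
        return w is not None and w[0] == "receive" and w[1] in (None, sender)

    def send(self, sender, receiver, payload, blocking=False):
        msg = Message(sender, receiver, payload)
        self.messages.append(msg)                        # a new message is created
        if blocking and not self._expects(receiver, sender):
            self.waiting[sender] = ("send", receiver)    # wait for the partner to receive
        return msg

    def receive(self, receiver, source=None, blocking=False):
        """Extract an acceptable message; source=None accepts any sender."""
        for msg in self.messages:
            if msg.receiver == receiver and source in (None, msg.sender):
                self.messages.remove(msg)                # MESSAGE(msg) := false
                self.waiting.pop(receiver, None)         # the receiver resumes work
                if self.waiting.get(msg.sender) == ("send", receiver):
                    del self.waiting[msg.sender]         # release a blocked sender
                return msg
        if blocking:
            self.waiting[receiver] = ("receive", source) # update the expecting function
        return None

u = MessageUniverse()
u.send("p", "q", "hello", blocking=True)   # q is not waiting, so p blocks
got = u.receive("q", source="p")           # q extracts the message and p is released
```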

Machine 7: Termination. This machine represents the event of termination. PROCESS(p) := false means that process p is removed from the universe PROCESS: it no longer exists.

4 Conclusions

The outcome of our analysis is a highly abstract declarative model. It is declarative in the sense that it does not specify how to realize or decompose a given functionality, but rather what the system must provide. Without any restriction on the actual implementation, if a distributed environment conforms to the definition, i.e. it provides the necessary functionalities, it can be termed a distributed system. In this paper the most elementary and indispensable services are defined. They form a minimal set: without them no application can be executed under the assumptions made for a distributed system, although many applications may also require additional services. Our model adopts the point of view of an architect or system developer.

The resulting formal model can be applied in several ways. First, it enables existing systems to be checked or compared to determine whether they provide the necessary functionalities. Furthermore, it can serve as a basis for the high-level specification of a new system or component, or for the modification of an existing one. Finally, the model is also useful in reasoning about the properties of grids [13].


References
1. Kevin Lano: The B Language and Method. Springer (1996)
2. Wang Yunfeng, Li Bixin, Pang Jun, Zha Ming, Zheng Guoliang: A Formal Software Development Approach Based on COOZ and Refinement Calculus. 31st International Conference on Technology of Object-Oriented Language and Systems. IEEE Press (1999)
3. Satpathy, M., Snook, C., Harrison, R., Butler, M., Krause, P.: A Comparative Study of Formal and Informal Specification through an Industrial Case Study. Proc. IEEE/IFIP Workshop on Formal Specification of Computer Based Systems (2001)
4. Egon Börger: High Level System Design and Analysis using Abstract State Machines (ASM). In: Hutter, D. (eds.): Current Trends in Applied Formal Methods (FM-Trends 98). Lecture Notes in Computer Science, Vol. 1641. Springer (1999) 1-43
5. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid. International Journal of Supercomputer Applications (2001)
6. Grimshaw, A.S., Wulf, W.A., French, J.C., Weaver, A.C., Reynolds, P.F.: Legion: The Next Logical Step Toward a Nationwide Virtual Computer. Technical Report No. CS-9421 (1994)
7. Grimshaw, A.S., Wulf, W.A.: Legion - A View From 50,000 Feet. Proceedings of the Fifth IEEE International Symposium on High Performance Distributed Computing. IEEE Computer Society Press, Los Alamitos, California (1996)
8. Abrial, J.R.: Extending B Without Changing it for Developing Distributed Systems. In: Habrias (eds.): Conference on the B-Method (1996)
9. Abrial, J.R., Mussat, L.: Introducing Dynamic Constraints in B. In: Bert, D. (eds.): B’98: Recent Advances in the Development and Uses of the B-Method. LNCS Vol. 1393 (1998)
10. Czajkowski, K., Fitzgerald, S., Foster, I., Kesselman, C.: Grid Information Services for Distributed Resource Sharing. Proc. 10th IEEE International Symposium on High-Performance Distributed Computing (HPDC-10). IEEE Press (2001)
11. Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers (1999)
12. Geist, A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, B., Sunderam, V.: PVM: Parallel Virtual Machine - A User’s Guide and Tutorial for Network Parallel Computing. MIT Press, Cambridge (1994)
13. Zsolt Németh, Vaidy Sunderam: A Formal Framework for Defining Grid Systems. 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (2002)

Multiple Viewpoints Based Ontology Integration

Kai Zhang, Yunfa Hu, and Yu Wang

Department of Computing and Information Technology, Fudan University, Shanghai, 200433, P.R.China {011021381, yfhu, 011021395}@fudan.edu.cn

Abstract. Ontology integration is a focus of the ontology application field. An ontology can be viewed as a kind of software product, and ontology integration needs to be directed by a methodology. In many applications, we need to integrate existing ontologies into a unified ontology that meets the application requirements. Each ontology to be integrated can be viewed as a viewpoint of the unified ontology. Drawing on multiple viewpoints theory from requirements engineering, we introduce a multiple-viewpoints-based ontology integration approach. We define the ontology viewpoint by the characteristics of ontologies and use conceptual graphs to represent the semantics of an ontology. We discuss inconsistency checking within an ontology viewpoint and among ontology viewpoints. Finally, we use a concept lattice to construct the concept hierarchy.

1 Introduction

Recently, ontologies have been widely used in many kinds of data integration applications as a powerful tool for knowledge sharing. Defined as “an explicit specification of a conceptualization” [1], an ontology unifies all the concepts and relations of a domain. In many practical applications, we need to integrate some existing ontologies into a unified ontology. The target ontology (the integrated, unified ontology, hereafter called the “target ontology”) is commonly constructed for a specific ontology application, so the application requirements should be considered during the integration procedure. There is as yet no explicit formal definition of ontology. In this paper, we define an ontology as a tuple O = (E, R, F, A, I), where:

- E is the set of classes, usually organized in taxonomies;
- R is the set of relations, which represent types of interaction between concepts of the domain;
- F is the set of functions;
- A is the set of axioms, which model sentences that are always true;
- I is the set of instances, which represent specific elements.

Denoting by separate symbols the ontologies to be integrated, the target ontology, the knowledge K used in ontology integration, and the ontology integration system, the ontology integration procedure can be formalized as below:

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 690–693, 2004. © Springer-Verlag Berlin Heidelberg 2004

Multiple Viewpoints Based Ontology Integration


In this paper, we introduce the multiple viewpoints theory of requirements engineering into ontology integration, with the aim of studying ontology integration from a methodological angle. Our work focuses on inconsistency checking and on the construction of the concept hierarchy during ontology integration, and we use conceptual graphs as the knowledge representation tool for ontologies. The remainder of the paper is organized as follows. Section 2 discusses in detail how to integrate ontologies based on the multiple viewpoints theory, with emphasis on inconsistency checking, management strategies among ontologies, and concept hierarchy construction. Section 3 compares related work and draws conclusions.

2 Ontology Integration Procedure
2.1 Ontology Viewpoint
Every ontology to be integrated can be viewed as one viewpoint of the target ontology. To distinguish it from the notion of viewpoint in requirements engineering, we call a viewpoint corresponding to an ontology an ontology viewpoint, and formalize it as follows.
Definition 1 (ontology viewpoint). An ontology viewpoint is a tuple with the following components. P is the name of the ontology viewpoint. C is the set of concepts in P. R is the set of relations in P; each element of R is a triple consisting of two concepts and the relation r between them. A first set of first-order predicate formulas represents the constraints in P. A second set of first-order predicate formulas represents the relations between P and other viewpoints. M is a set of first-order predicate formulas representing the relations between concepts in P and concepts in other viewpoints.
In Definition 1, C and R form the conceptual graphs in P. The constraint set corresponds to the axioms of the ontology and is often used for inconsistency checking within the viewpoint.

2.2 Inconsistency Checking and Management
Different ontologies are created by different organizations, so semantic inconsistency among them is inevitable. As in the multiple viewpoints theory of requirements engineering, checking and managing inconsistency within a viewpoint and between viewpoints is important for our research. For an ontology viewpoint, the axioms of the ontology constrain its concepts and relations, so they can be used to check inconsistency within the viewpoint. We define this as follows.
Definition 2 (inconsistency in an ontology viewpoint). For an ontology viewpoint P, if …, then we say that inconsistency exists in P.
Inconsistency among ontology viewpoints is mainly structural inconsistency, defined below.
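A viewpoint and a check in the spirit of Definition 2 can be sketched as follows. Because the formal condition of Definition 2 is not reproduced here, the sketch assumes that a viewpoint is inconsistent when at least one of its constraints (axioms) evaluates to false on its conceptual graph. The class `Viewpoint`, the function names, and the sample constraint are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

Triple = tuple[str, str, str]  # (concept, relation r, concept)


@dataclass
class Viewpoint:
    """Sketch of an ontology viewpoint (Definition 1); field names are assumptions."""
    name: str                 # P
    concepts: set[str]        # C
    relations: set[Triple]    # R: together with C, the conceptual graph
    constraints: dict[str, Callable[["Viewpoint"], bool]] = field(default_factory=dict)
    inter_view: list[str] = field(default_factory=list)  # relations to other viewpoints (kept opaque)
    mappings: list[str] = field(default_factory=list)    # M: concept-level links (kept opaque)


def violated_constraints(vp: Viewpoint) -> list[str]:
    """Assumed reading of Definition 2: names of the constraints that fail on vp."""
    return sorted(name for name, check in vp.constraints.items() if not check(vp))


def endpoints_declared(vp: Viewpoint) -> bool:
    # Example constraint: every relation must connect concepts declared in C.
    return all(a in vp.concepts and b in vp.concepts for a, _, b in vp.relations)


vp = Viewpoint("sales", {"Customer", "Order"},
               {("Customer", "places", "Order"), ("Order", "billed_to", "Account")})
vp.constraints["endpoints_declared"] = endpoints_declared
print(violated_constraints(vp))  # ['endpoints_declared'], so P is inconsistent
```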


K. Zhang, Y. Hu, and Y. Wang

Definition 3 (structure inconsistency). If an inconsistency exists when the structures of the ontology viewpoints are compared with one another, it is called structure inconsistency.
To resolve and manage inconsistency, we adjust the relations between concepts in the ontology viewpoints. The adjustment is based on the following two operations:
1) delete(r): delete the inconsistent relation r.
2) rewrite(r): delete the inconsistent relation r and, at the same time, create another relation to keep the viewpoints consistent.
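The two adjustment operations are easy to express over a set of relation triples. In the Python sketch below, the criterion used to detect structure inconsistency (two viewpoints asserting is_a in opposite directions between the same pair of concepts) is only an illustrative assumption, as are the function names; the replacement relation passed to `rewrite` has to be chosen by whoever resolves the conflict.

```python
Triple = tuple[str, str, str]  # (concept, relation r, concept)


def structural_conflicts(r1: set[Triple], r2: set[Triple]) -> set[tuple[Triple, Triple]]:
    """Pairs of is_a relations that point in opposite directions across two viewpoints."""
    return {(e, (e[2], "is_a", e[0])) for e in r1
            if e[1] == "is_a" and (e[2], "is_a", e[0]) in r2}


def delete(relations: set[Triple], r: Triple) -> None:
    # delete(r): remove the inconsistent relation r.
    relations.discard(r)


def rewrite(relations: set[Triple], r: Triple, replacement: Triple) -> None:
    # rewrite(r): remove r and add another relation that keeps the viewpoints consistent.
    relations.discard(r)
    relations.add(replacement)


v1 = {("Car", "is_a", "Vehicle")}
v2 = {("Vehicle", "is_a", "Car"), ("Truck", "is_a", "Vehicle")}
print(structural_conflicts(v1, v2))  # {(('Car', 'is_a', 'Vehicle'), ('Vehicle', 'is_a', 'Car'))}
rewrite(v2, ("Vehicle", "is_a", "Car"), ("Car", "is_a", "Vehicle"))
print(structural_conflicts(v1, v2))  # set()
```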

2.3 Unite Conceptual Graphs by Concept Lattice
We need to derive the target ontology from all the ontology viewpoints. The basis of the target ontology is the conceptual graphs in each ontology viewpoint, but the target ontology is not obtained by uniting these conceptual graphs arbitrarily: its conceptual graphs must be generated from the concept hierarchy across all the conceptual graphs. We therefore treat the set of conceptual graphs in all the ontology viewpoints as a formal context, from which we obtain a concept lattice [2] representing the concept hierarchy. In this formal context, the classes in the ontology viewpoints play the role of objects (instances), and the attributes of these classes play the role of attributes. The concept lattice is then used to rewrite the conceptual graphs; Fig. 1 shows an example of such rewriting.
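In formal concept analysis [2], a formal concept is a pair (extent, intent) in which the extent is exactly the set of objects having all attributes of the intent, and the intent is exactly the set of attributes shared by all objects of the extent; ordered by extent inclusion, these pairs form the concept lattice. The brute-force Python sketch below enumerates the concepts of a small context in which, as in the text, ontology classes play the role of objects. It is exponential in the number of attributes and meant only for toy inputs; the function name and the dictionary encoding of the context are assumptions.

```python
from itertools import combinations


def formal_concepts(context: dict[str, set[str]]) -> list[tuple[frozenset, frozenset]]:
    """Enumerate all formal concepts (extent, intent) of a small formal context.

    `context` maps each object (here, an ontology class) to its set of attributes.
    """
    attrs = set().union(*context.values())

    def extent(b: set[str]) -> frozenset:
        # Objects that have every attribute in b.
        return frozenset(o for o, a in context.items() if b <= a)

    def intent(x: frozenset) -> frozenset:
        # Attributes shared by every object in x (all attributes if x is empty).
        if not x:
            return frozenset(attrs)
        return frozenset(set.intersection(*(context[o] for o in x)))

    concepts = set()
    # Every intent is the closure of some attribute subset, so trying all subsets finds all concepts.
    for k in range(len(attrs) + 1):
        for b in combinations(sorted(attrs), k):
            ext = extent(set(b))
            concepts.add((ext, intent(ext)))
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


ctx = {"A": {"price", "color"}, "B": {"price", "weight"}}
for ext, itt in formal_concepts(ctx):
    print(sorted(ext), sorted(itt))
# [] ['color', 'price', 'weight']
# ['A'] ['color', 'price']
# ['B'] ['price', 'weight']
# ['A', 'B'] ['price']
```

The last concept, shared by A and B, is the kind of common super-concept used in the rewriting step.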

Fig.1. Conceptual graph rewriting

In Fig. 1, concept A in graph 1 and concept B in graph 2 are both sub-concepts of C in the concept lattice. What A and B have in common is that, in their respective conceptual graphs, they are associated with the same concepts and relations. We replace A and B in graphs 1 and 2 with C (the least upper bound of A and B in the concept lattice) and, at the same time, create the relations between A, B, and C. The rewritten conceptual graphs remove redundancy and clarify the concept hierarchy. Although we now have the conceptual graphs of the target ontology, ontology integration is not yet complete: the conceptual graphs still need to be evaluated and modified by human experts (adding validated association assertions). When this step is complete, ontology integration is complete.
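In lattice terms, the least upper bound of the object concepts of A and B has as its intent the attributes that A and B share, and as its extent every class carrying all of those attributes. The Python sketch below computes that join and then performs a simplified rewriting step: every occurrence of A in its graph is replaced by the new concept and an is_a link from A to it is added. Naming the new concept (here "C") is left to the caller, and replacing A in every triple, the helper names, and the is_a label are assumptions rather than the paper's notation.

```python
Triple = tuple[str, str, str]  # (concept, relation, concept)


def join(context: dict[str, set[str]], a: str, b: str) -> tuple[frozenset, frozenset]:
    """Least upper bound of the object concepts of a and b in the concept lattice."""
    shared = context[a] & context[b]                       # intent of the join
    ext = frozenset(o for o, attrs in context.items() if shared <= attrs)
    return ext, frozenset(shared)


def rewrite_graph(graph: set[Triple], old: str, new: str) -> set[Triple]:
    """Replace concept `old` by `new` in every triple and record `old is_a new`."""
    out = {(new if s == old else s, r, new if t == old else t) for s, r, t in graph}
    out.add((old, "is_a", new))
    return out


ctx = {"A": {"price", "color"}, "B": {"price", "weight"}}
print(join(ctx, "A", "B"))           # extent {A, B}, intent {price}
graph1 = {("A", "has", "Price")}
print(sorted(rewrite_graph(graph1, "A", "C")))
```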


3 Related Work and Conclusions
Several researchers have worked on ontology integration; representative work can be classified into two classes: syntactic-level integration and concept-level integration. Omelayenko [3] uses syntactic heuristic rules for ontology integration in e-business. Stumme and Maedche [4] introduce FCA-MERGE, a bottom-up approach that merges two ontologies by formal concept analysis. These approaches mainly address the problem from a technological angle and seldom consider the methodology of ontology integration. In this paper, we view each ontology to be integrated as a viewpoint of the target ontology and apply the multiple viewpoints theory of requirements engineering to ontology integration, with the aim of contributing to its methodology. Our practice indicates that this research is valuable.

References
1. Gruber, T. R.: A translation approach to portable ontology specifications. Knowledge Acquisition, Vol. 5 (1993)
2. Wille, R.: Concept lattices and conceptual knowledge systems. Computers and Mathematics with Applications 23(6-9) (1992) 493-515
3. Omelayenko, B.: Syntactic-Level Ontology Integration Rules for E-commerce. In: Proceedings of the 14th International FLAIRS Conference (FLAIRS-2001), May 2001
4. Stumme, G., Maedche, A.: FCA-MERGE: Bottom-Up Merging of Ontologies. In: IJCAI (2001) 225-234

Automated Detection of Design Patterns
Zhixiang Zhang and Qinghua Li
Department of Computer Science, Huazhong University of Science and Technology, Wuhan, Hubei Province, China 430033

Abstract. Detecting instances of design patterns is useful for software maintenance. This paper proposes a new framework for the automated detection of design pattern instances. The framework uses a reengineering tool to analyze C++ source code and uses Prolog to infer instances of design patterns; elemental design patterns serve as intermediate results toward the final target (design patterns). A two-phase query makes the discovery process more efficient.

1 Introduction
A pattern provides knowledge about the role of each class within the pattern and the reasons for certain relationships among the pattern constituents and/or the remaining parts of a system. Consequently, in maintenance, identifying design pattern instances provides insight into the structure of software artifacts and reveals places where changes, reuse, or extensions are expected. Design patterns can also give managers some indication of the quality of the overall system. Design patterns are a relatively young field, and few works in program understanding and reverse engineering have addressed design pattern detection [1][2][3][4][5]. Most of them use only structural information about the system, or can recognize only a few patterns. This paper uses a new representation of the structural and behavioral features of design patterns, designs a system for the automated detection of design pattern instances, and discusses the related techniques.

2 System Framework
The automated system for detecting design pattern instances adopts three techniques:
1. Prolog is used to infer instances of design patterns. Prolog facts represent the structural information between elements in a design pattern, and a design pattern is represented by one or several Prolog rules.
2. Elemental design patterns (EDPs) [6] are used as intermediate results toward the final target (design patterns). EDPs capture the elemental components of object-oriented languages and the salient relationships used in the vast majority of software engineering. Because they include the calling information among classes and class methods, EDPs are considered the most complete method for formalizing design patterns. Most patterns are composed of several EDPs.
3. A reengineering tool, Columbus Schema for C++, is used to convert C++ source code into an intermediate representation of classes and the high-level structural relationships between them.
Fig. 1 shows the framework of the automated system for detecting design pattern instances. The output of Columbus Schema for C++ is converted into design description files. Automated detection of design patterns consists of three steps:
1. Design Prolog rules based on the structural features of the design patterns;
2. Convert the specific source code into structural information files, and convert these files into Prolog facts;
3. Find instances of design patterns by Prolog queries, and present the results to users.
M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 694–697, 2004. © Springer-Verlag Berlin Heidelberg 2004
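Step 3 relies on Prolog answering conjunctive queries over facts. To show what such a query does without a Prolog system, the Python sketch below implements a tiny backtracking matcher over ground facts, using the Prolog convention that names starting with an uppercase letter are variables. It is a stand-in for Prolog for illustration only; the fact shapes (inherits, calls) and all names are hypothetical.

```python
from typing import Iterator, Optional

Fact = tuple[str, ...]


def is_var(term: str) -> bool:
    # Prolog convention: a name starting with an uppercase letter is a variable.
    return term[:1].isupper()


def match(goal: Fact, fact: Fact, binding: dict[str, str]) -> Optional[dict[str, str]]:
    """Extend `binding` so that `goal` equals `fact`, or return None if impossible."""
    if len(goal) != len(fact):
        return None
    b = dict(binding)
    for g, f in zip(goal, fact):
        if is_var(g):
            if b.setdefault(g, f) != f:   # variable already bound to something else
                return None
        elif g != f:
            return None
    return b


def query(goals: list[Fact], facts: set[Fact],
          binding: Optional[dict[str, str]] = None) -> Iterator[dict[str, str]]:
    """Yield every binding satisfying all goals, solving them left to right with backtracking."""
    binding = {} if binding is None else binding
    if not goals:
        yield binding
        return
    for fact in facts:
        b = match(goals[0], fact, binding)
        if b is not None:
            yield from query(goals[1:], facts, b)


facts = {("inherits", "circle", "shape"), ("inherits", "square", "shape"),
         ("calls", "canvas", "shape")}
# Which subclasses belong to a class that some caller uses?
answers = query([("inherits", "X", "Y"), ("calls", "Z", "Y")], facts)
print(sorted(a["X"] for a in answers))  # ['circle', 'square']
```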

Fig. 1. The pattern detection framework

3 Implementation
3.1 Elemental Design Pattern
As the number of Prolog facts describing the structure of a large software system grows, searching may become less efficient, so a certain level of abstraction over the structure of design patterns is needed. Here we use EDPs as intermediate search results. For example, Fig. 2 shows that the Decorator pattern is composed of the Object Recursion and ExtendMethod EDPs: if we can identify an Object Recursion EDP instance and a related ExtendMethod EDP instance, we can obtain a Decorator pattern instance.


3.2 Facts and Rules
Structural relationships can be mapped to Prolog facts. Because Prolog identifiers cannot contain “.”, the representation of a method in a class must be converted; for example, “Decorator.Draw” is converted to “Decorator_Draw”.
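The conversion can be sketched as a small fact generator. The Python code below is a hypothetical helper, not part of Columbus or of the authors' tool: it applies the “.” to “_” renaming described above and, as an extra assumption, also maps the C++ scope operator “::”. It quotes every argument so that capitalized names such as Decorator_Draw remain Prolog atoms, since an unquoted name beginning with an uppercase letter would be read as a variable. The predicate names class, inherits, and method are illustrative.

```python
def prolog_name(qualified: str) -> str:
    """Rename a qualified member name, e.g. Decorator.Draw -> Decorator_Draw."""
    return qualified.replace("::", "_").replace(".", "_")


def prolog_fact(predicate: str, *args: str) -> str:
    # Quote each argument so capitalised names stay atoms rather than Prolog variables.
    quoted = ", ".join(f"'{prolog_name(a)}'" for a in args)
    return f"{predicate}({quoted})."


def class_facts(name: str, bases: list[str], methods: list[str]) -> list[str]:
    """Facts for one class taken from a (hypothetical) structural description."""
    facts = [prolog_fact("class", name)]
    facts += [prolog_fact("inherits", name, base) for base in bases]
    facts += [prolog_fact("method", name, f"{name}.{m}") for m in methods]
    return facts


for line in class_facts("Decorator", ["Component"], ["Draw"]):
    print(line)
# class('Decorator').
# inherits('Decorator', 'Component').
# method('Decorator', 'Decorator_Draw').
```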

Fig. 2. Decorator pattern composed of Object Recursion and ExtendMethod EDPs

Instances of design patterns and elemental design patterns are represented by a unique identifier and a set of participants. For example, pattern(FactoryMethod, Creator, Product, CCreator, CProduct, FacMethod) represents a FactoryMethod pattern consisting of Creator, Product, CCreator, and CProduct, where FacMethod is the factory method. The rule corresponding to the ObjectRecursion EDP is: edpPattern(objectRecursion, Handler, Recurser, Terminator, Initiator) :-
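Because the body of the edpPattern rule is not reproduced here, the Python sketch below only illustrates the kind of check such a rule performs, under an assumed reading of the ObjectRecursion EDP from [6]: Recurser and Terminator both inherit from Handler, Recurser holds a Handler reference and forwards a Handler method through it while Terminator does not, and an unrelated Initiator calls that method on Handler. The fact shapes (inherits, has_ref, calls) and the function name are hypothetical.

```python
from itertools import permutations

Fact = tuple[str, ...]


def object_recursion(facts: set[Fact]) -> list[tuple[str, str, str, str]]:
    """Return (Handler, Recurser, Terminator, Initiator) tuples found in `facts`.

    Assumed fact shapes: ("inherits", Sub, Super), ("has_ref", Owner, Type),
    ("calls", Caller, TargetType, Method).
    """
    children: dict[str, set[str]] = {}
    for f in facts:
        if f[0] == "inherits":
            children.setdefault(f[2], set()).add(f[1])
    calls = {f[1:] for f in facts if f[0] == "calls"}   # (caller, target, method)

    found = set()
    for handler, subs in children.items():
        for rec, term in permutations(sorted(subs), 2):
            # The Recurser keeps a Handler reference; the Terminator does not.
            if ("has_ref", rec, handler) not in facts or ("has_ref", term, handler) in facts:
                continue
            for caller, target, method in calls:
                forwards = (rec, handler, method) in calls   # Recurser re-invokes the method
                if target == handler and forwards and caller not in subs and caller != handler:
                    found.add((handler, rec, term, caller))
    return sorted(found)


facts = {("inherits", "leaf", "node"), ("inherits", "link", "node"),
         ("has_ref", "link", "node"),
         ("calls", "link", "node", "handle"), ("calls", "client", "node", "handle")}
print(object_recursion(facts))  # [('node', 'link', 'leaf', 'client')]
```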


According to the relations among ObjectRecursion, ExtendMethod, and Decorator, we can redefine the query rule for the Decorator pattern in terms of the ObjectRecursion and ExtendMethod EDPs.
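In the second phase, detection only joins small tables of EDP instances instead of searching the raw facts again. The Python sketch below shows one plausible join for Decorator. The role mapping (the ObjectRecursion Recurser plays Decorator, its Terminator plays ConcreteComponent, and an ExtendMethod instance whose base class is that Recurser supplies ConcreteDecorator) and the tuple layouts are assumptions, not the authors' rule.

```python
def decorators(object_recursions: list[tuple[str, str, str, str]],
               extend_methods: list[tuple[str, str, str]]) -> list[tuple[str, str, str, str, str]]:
    """Combine phase-one EDP results into Decorator candidates.

    object_recursions: (handler, recurser, terminator, initiator)
    extend_methods:    (base, subclass, method): subclass overrides method and calls base's version
    Result tuples:     (Component, ConcreteComponent, Decorator, ConcreteDecorator, operation)
    """
    return sorted(
        (handler, term, rec, sub, method)
        for handler, rec, term, _initiator in object_recursions
        for base, sub, method in extend_methods
        if base == rec                    # the extended class must be the recursing decorator
    )


print(decorators([("node", "link", "leaf", "client")],
                 [("link", "bold_link", "handle"), ("other", "x", "y")]))
# [('node', 'leaf', 'link', 'bold_link', 'handle')]
```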

4 Conclusion
This paper implements an automated system for detecting design pattern instances. The Prolog rules for each pattern and EDP gather the features required to diagnose a pattern instance. As described above, the process consists of two phases. The benefits of two-phase recognition are:
1. EDPs themselves are the core primitives underlying the construction of patterns in general, so they are useful for understanding the code;
2. As intermediate results (facts), EDPs can greatly reduce the cost of Prolog queries, which significantly improves efficiency.
Future work could study formalization methods that describe design patterns precisely: the more precisely the design patterns are described, the higher the precision and recall that can be achieved.

References
1. Christian Kramer, Lutz Prechelt: “Design Recovery by Automated Search for Structural Design Patterns in Object-Oriented Software”. In: International Workshop on Program Comprehension, pp. 208-215
2. Rudolf K. Keller, Reinhard Schauer, Sébastien Robitaille, Patrick Pagé: “Pattern-Based Reverse-Engineering of Design Components”. In: Proceedings of the International Conference on Software Engineering (ICSE’99), Los Angeles, USA, May 1999
3. G. Antoniol, G. Casazza: “Object-oriented design patterns recovery”. The Journal of Systems and Software 59 (2001) 181-196
4. Jochen Seemann, Jürgen Wolff von Gudenberg: “Pattern-based design recovery of Java software”. ACM SIGSOFT Software Engineering Notes 23(6), Nov. 1998, pp. 10-16
5. G. Antoniol, G. Casazza, M. Di Penta, R. Fiutem: “Object-Oriented Design Patterns Recovery”. Journal of Systems and Software 59 (2001) 181-196
6. Jason McC. Smith, David Stotts: “Elemental Design Patterns: A Logical Inference System and Theorem Prover Support for Flexible Discovery of Design Patterns”. Technical Report TR02-038, Department of Computer Science, University of North Carolina at Chapel Hill, Sep. 2002

Research on the Financial Information Grid
Jiyue Wen (1, 2) and Guiran Chang (1)
(1) College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110004, China
(2) No. 208 Yanan Three Road, Qingdao, Shandong 266071, China

Abstract. An information grid uses grid technologies to share and manage information resources and to provide services based on them. In this article, the financial industry's demand for grid technology is analyzed, and an architecture for a financial information grid is proposed.

1 Introduction
The financial industry can use grid technology to integrate financial information, operate through multiple channels, strengthen supervision and management, guard against and resolve financial risks, and improve the efficiency and quality of financial services. Section 2 of this article analyzes the financial industry's demand for grid technology. Section 3 discusses the architecture of a financial information grid, taking into account the heterogeneity of the financial industry. Section 4 presents an implementation proposal and plan based on the current conditions of the financial industry.

2 The Demand for Grid Technology by the Financial Industry
The information needs of the financial industry are both internal and external. Internal information needs are closely related to the daily operations and management of financial businesses; they include credit and loan information, deposit information, settlement information, savings information, international business information, trust business information, and leasing business information. External information needs are related to the integrated management and decision-making of financial businesses; they include macroeconomic information, financial policy information, industrial policy information, information on domestic and foreign financial institutions, financial risk prevention information, and other financial computerization information. To meet the demand for, and the management of, these two kinds of financial information, banks have constructed their own intranets. However, these networks run separately on different architectures and operating systems, and there is no interconnection between the central bank and the commercial banks. The central bank receives data reports from the commercial banks passively, so the off-site data are not timely, authenticated, or reliable. As a result, the central bank can supervise, and the commercial banks can monitor, only after the fact. Thus the traditional model of financial statistics and auditing, with on-site and off-site inspection, cannot meet the requirements for monitoring the operational legality and risk prevention of commercial banks. The central bank needs a new collateral supervision system that allows it to log on to the computer systems of the commercial banks and actively collect the latest original operational data. In this way, the central bank can learn about the general operations of the various banks in a timely and accurate manner, readily detect operational lapses at those banks, and make objective and feasible financial policies that guide the financial industry to operate legally. Grid technology offers one solution.
M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 698–701, 2004. © Springer-Verlag Berlin Heidelberg 2004

3 Design Idea of the Architecture of a Financial Information Grid
Globus is a platform for constructing grid infrastructure; it serves as a grid operating system and administers grid resources. Owing to the complexity of commercial bank systems, customizing grid functions is exceptionally difficult and complicated. To solve this problem, a Financial Grid Middleware (FGM) can be built between Globus and the application programs, providing a distributed heterogeneous computing environment that helps port programs between commercial banks. Like Cactus, FGM is a middleware connecting the application programs and the grid infrastructure. FGM is not built on top of Globus; instead, it uses Globus as a branch of the main body of the financial network system, so that application programs using MPI (Message Passing Interface) can run on the grid unmodified. This is shown in Fig. 1. The first layer of this architecture is the fabric layer, whose basic function is to control local resources and to provide upper layers with interfaces for accessing them; these broadly include computing resources, memory resources, network resources, and data and information resources. A fabric-layer resource can be a host computer, a preprocessing computer, or the whole computer cluster of a commercial bank. The second layer is the Globus layer. Its grid services act as a grid operating system, addressing resource discovery, validation, and reservation and the management of memory, communication, and security. It allocates resources through DUROC (Dynamically Updated Request Online Co-allocator) and provides a parallel programming interface for heterogeneous grid environments through MPICH-G2. The third layer is FGM, consisting of a financial network trunk and branches.
The trunk is the basis of FGM: it provides a set of APIs through which branches are linked dynamically in a plug-and-play manner, and it coordinates and controls data transfer and program execution between branches. A branch here is a set of software modules or subprograms, written in C, C++, Fortran, or Java, for abstract virtual computers, without regard to the complicated grid environment. It can easily be ported into the existing application programs of the financial institutions. Branches are of two types: application branches and grid branches. Application branches can be used for account checking, information indexing, statistics, supervision criteria, and similar functions. Grid branches provide functions for parallel computation, I/O, Web interfaces, high-performance communication, and data mapping, in support of the applications. The fourth layer is the application layer, on which users operate directly. Users can be counter clerks, post monitors, statisticians, policy decision-makers, or on-site and off-site auditors. They can obtain the results of inquiries, checks, or calculations simply by telling the terminal what they want to do, such as the data for statistics, the items to check, the supervision index to calculate, or the required precision and range of the answers. They do not need to consider the heterogeneous environment, the dynamic coordination of resources, or the optimization algorithms used.

Fig. 1. The architecture of a financial information grid. FA: Financial Application, AP: Application Program, L-DC: Long-Distance Control, WI: Web Interface, I/O PK: I/O Package, FH-C: Financial Host-Computer.
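The trunk's plug-and-play role can be illustrated with a small registry. The Python sketch below is a toy model under our own assumptions: branches are plain callables registered by name with one of the two kinds named in the text (application or grid), and applications reach them only through the trunk. Real FGM branches would be C, C++, Fortran, or Java modules, and the class and method names here are hypothetical.

```python
from typing import Any, Callable


class Trunk:
    """Toy model of the FGM trunk: branches plug in at run time and are invoked by name."""

    KINDS = ("application", "grid")

    def __init__(self) -> None:
        self._branches: dict[str, tuple[str, Callable[..., Any]]] = {}

    def plug(self, name: str, kind: str, handler: Callable[..., Any]) -> None:
        if kind not in self.KINDS:
            raise ValueError(f"unknown branch kind {kind!r}")
        self._branches[name] = (kind, handler)

    def unplug(self, name: str) -> None:
        self._branches.pop(name, None)

    def call(self, name: str, *args: Any, **kwargs: Any) -> Any:
        # Every call is mediated by the trunk, so applications never address a branch directly.
        _kind, handler = self._branches[name]
        return handler(*args, **kwargs)

    def branches(self, kind: str) -> list[str]:
        return sorted(n for n, (k, _h) in self._branches.items() if k == kind)


trunk = Trunk()
trunk.plug("account_check", "application", lambda ledger, report: ledger == report)
trunk.plug("parallel_sum", "grid", sum)   # stand-in for a parallel-computation branch
print(trunk.call("account_check", 1200, 1200), trunk.call("parallel_sum", [1, 2, 3]))  # True 6
```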

4 Suggestions for the Implementation of a Financial Information Grid
Considering the complexity of financial institutions, the heterogeneity of their network systems, the information needs of the financial industry, the incomplete support of current technology, and the future goal, the application of grid technology can start with auditing, reporting, and supervision. Development should not affect the normal operation of the commercial banks and should be carried out step by step. First, under present networking conditions, the central bank can set up a financial grid data processing center to optimize system performance, improve the application environment, administer the equipment and information resources of all the banks, and provide timely and accurate financial information for statistics, monitoring, and auditing. The collateral supervision system makes it possible for specialized banks to monitor the operational flow of their front offices, eliminate operational loopholes, and avoid man-made risks. Second, broadband networks should be set up. Broadband network systems are essential for the grid environment to provide high-performance communication. High-quality broadband networks support “connect and play” of computing capacity and information gathering, and provide users with low-latency, highly reliable communication services. The grid may take advantage of the existing automatic banking systems of the various banks and the satellite-terrestrial communication system and transfer system of the central bank, increasing their bandwidth, improving their communication capability, and ensuring the interoperability of grid resources. Third, a financial certification authority should be set up. The security of grid applications depends on digital identity certification. The grid issues an X.509 certificate to each statistician, supervisor, and auditor, whose digital signature is required whenever he or she logs on; once logged on, the applicant can access all authorized resources. The People’s Bank of China could take the lead in organizing the commercial banks to establish a national financial certification authority, the China Financial Certification Authority (CFCA), responsible for digital identification on the financial grid. The ITU X.509 v3 standard is used, and the international specification for grid data signatures is followed. Grid management software is the key element of the financial information grid service.
The core techniques include an integrated information platform (single system image), the Semantic Web, intelligent agents, and ontologies. The grid operating system can be applied to the preprocessing computers running heterogeneous financial information reporting systems, and the application programs can be upgraded to grid programs. In this way, financial information statistics, supervision, and real-time sharing can be realized.


RCACM: Role-Based Context-Awareness Coordination Model for Mobile Agent Applications*
Xinhuai Tang, Yaying Zhang, and Jinyuan You
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200030, P.R.China
{tang-xh, zhang-yy, you-jy}@cs.sjtu.edu.cn

Abstract. In this paper, we present RCACM, a coordination model for mobile computing applications based on mobile agents. The key idea of RCACM is a role-based, context-aware, hierarchical coordination model. In this model, a local programmable reactive tuple space is introduced to address context-aware coordination problems, and hierarchically distributed tuples let agents dynamically acquire information about resource location and availability according to their permissions; a role mechanism is adopted for access control to prevent unauthorized access.

1 Introduction
In mobile computing systems, an application may be composed of several mobile agents that cooperatively perform a task [1]. Multiple mobile agents need to coordinate their activities with each other and to access resources in their hosting execution environments. Furthermore, when an agent moves to a new environment, the interaction information it accesses and the outside world it perceives may have changed, so its execution result on one site may differ from that on another site. Agent migration therefore introduces context-aware coordination issues [2]. Generally, coordination technologies are concerned with enabling interaction among agents and helping them cooperate [3]. However, access control should also be considered to constrain interaction and ensure data privacy and integrity, especially when agent mobility is introduced. At present, combining coordination and access control remains an open problem in the design and implementation of mobile agent applications. This paper presents a role-based context-aware coordination model (RCACM) for mobile agent applications. We focus on context-aware secure coordination, that is, coordinating activities with awareness of the context changes caused by agent mobility while ensuring data integrity.

* This paper is supported by the Shanghai Science and Technology Development Foundation under Grant No. 03DZ15027.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 702–705, 2004. © Springer-Verlag Berlin Heidelberg 2004


2 Role-Based Context-Aware Coordination
2.1 Overview
To overcome the weaknesses of the Linda tuple space while still exploiting its advantages, we rebuild the tuple space from passive data storage into a reactive, programmable coordination medium that embeds computing capabilities within the tuple space. On this basis, we propose the role-based context-aware coordination model (RCACM) for mobile agent applications, which extends the classical Linda model. There is no global tuple space in the mobile agent system; instead, multiple distributed tuple spaces are used for agent coordination. The model turns globally coupled interactions into locally uncoupled interactions: agent interactions take place in the local execution environment at the destination site of a migrating agent. The local space can be accessed asynchronously and anonymously, as in Linda, so the interaction tuple space can act as a gateway through which agents access resources in the network. RCACM supports a coordination paradigm in which agents migrate from one computing environment to another to interact with each other. The architecture of role-based context-dependent coordination is shown in Fig. 1.
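A reactive, programmable tuple space differs from a plain Linda space in that the medium itself runs programmed reactions when operations occur. The Python sketch below is a minimal, single-threaded, non-blocking illustration: `None` in a template acts as a wildcard, `take` stands for Linda's `in` (a Python keyword), and reactions run after an operation succeeds on a matching tuple. It is not RCACM's Java implementation; all names are assumptions, and a reaction that writes a tuple matching its own trigger would recurse indefinitely.

```python
from typing import Callable, Optional

Reaction = Callable[["ReactiveTupleSpace", tuple], None]


def matches(template: tuple, tup: tuple) -> bool:
    # None in the template is a wildcard (a "formal" field in Linda terms).
    return len(template) == len(tup) and all(t is None or t == v for t, v in zip(template, tup))


class ReactiveTupleSpace:
    """Local Linda-style space whose behaviour is programmed with reactions."""

    def __init__(self) -> None:
        self.tuples: list[tuple] = []
        self.reactions: list[tuple[str, tuple, Reaction]] = []

    def on(self, op: str, template: tuple, reaction: Reaction) -> None:
        # Program the medium: run `reaction` after `op` succeeds on a tuple matching `template`.
        self.reactions.append((op, template, reaction))

    def _fire(self, op: str, tup: tuple) -> None:
        for r_op, template, reaction in list(self.reactions):
            if r_op == op and matches(template, tup):
                reaction(self, tup)

    def _find(self, template: tuple) -> Optional[tuple]:
        return next((t for t in self.tuples if matches(template, t)), None)

    def out(self, tup: tuple) -> None:
        self.tuples.append(tup)
        self._fire("out", tup)

    def rd(self, template: tuple) -> Optional[tuple]:
        tup = self._find(template)
        if tup is not None:
            self._fire("rd", tup)
        return tup

    def take(self, template: tuple) -> Optional[tuple]:
        tup = self._find(template)
        if tup is not None:
            self.tuples.remove(tup)
            self._fire("take", tup)
        return tup


space = ReactiveTupleSpace()
# Reaction: whenever a job tuple is taken, the space itself records its completion.
space.on("take", ("job", None), lambda s, t: s.out(("done", t[1])))
space.out(("job", 7))
print(space.take(("job", None)), space.rd(("done", None)))  # ('job', 7) ('done', 7)
```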

Fig. 1. Architecture of role-based context-awareness coordination

In RCACM, instead of a global shared tuple space, multiple tuple spaces are distributed over the network to support the coordination task. The reactive programmable tuple space is responsible for handling coordination activities, including environment-aware and application-aware coordination.

2.2 Role Mechanism and Reactions in Tuple Space
Coordination and access control are two closely related topics in open distributed applications. One can easily imagine malicious agents attempting to access private information or modify private data. A server that accepts interactions from external agents therefore needs to impose requirements ensuring that the security concerns of the host server are not violated. Similarly, the external agent needs to ensure that its execution at the host server site will not compromise its integrity or security. A role can be defined as the behavior and the set of capabilities expected of an agent that plays the role. An agent may hold multiple roles, and every role has its corresponding capabilities. We use {read, take, write, execution, new,...} to denote the set of agent operation abilities; each element denotes the capability to execute the corresponding operation. Assigning roles to an agent means associating with it capabilities that describe all the operations the agent intends to perform, regardless of the specific location where it will execute. In RCACM, the tuple space is programmable and reactive. The site manager at every site in the mobile agent system can implement and enforce application-specific and local-environment-specific policies by programming the behavior of the tuple space; operation behaviors can be associated with specific events.
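The role mechanism amounts to a mapping from roles to capability sets that the host checks before performing an operation. In the Python sketch below, the role names and their capability sets are invented for illustration; only the capability vocabulary (read, take, write, ...) comes from the text, and the function names are hypothetical.

```python
from typing import Any, Callable

ROLES: dict[str, set[str]] = {           # hypothetical role table kept by a site manager
    "buyer": {"read", "write"},
    "auditor": {"read"},
    "broker": {"read", "take", "write"},
}


class AccessDenied(Exception):
    """Raised when none of an agent's roles grants the requested operation."""


def capabilities(agent_roles: set[str]) -> set[str]:
    # An agent holding several roles gets the union of their capabilities.
    return set().union(*(ROLES.get(role, set()) for role in agent_roles))


def permitted(agent_roles: set[str], operation: str) -> bool:
    return operation in capabilities(agent_roles)


def guarded(agent_roles: set[str], operation: str, action: Callable[[], Any]) -> Any:
    """Run `action` only if the operation is permitted; the check ignores where the agent came from."""
    if not permitted(agent_roles, operation):
        raise AccessDenied(f"roles {sorted(agent_roles)} may not {operation}")
    return action()


print(sorted(capabilities({"buyer", "auditor"})))  # ['read', 'write']
print(permitted({"auditor"}, "take"))               # False
```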

2.3 Environment-Aware and Application-Aware Coordination
In mobile agent applications, there are always both application-specific policies related to agents and environment-specific policies. When a site in the network opens itself to external agents for execution, it must prevent malicious agents from damaging its data and resources. On the other hand, when an agent migrates to a new site, it cannot predict the policy of that site [4]. If a mobile agent had to handle every unexpected policy-related problem itself, it would become much more complex and the system would be hard to scale. With a programmable reactive tuple space, environment-awareness policies such as security policies can instead be integrated into the behavior of the tuple space, so mobile agents are transparently subject to the restrictions of the new site. In RCACM, when an agent arrives at a site, it is bound to the tuple space of the destination site and can use it to coordinate with other agents and to access local resources. The local tuple space, and every tuple in it, is implemented as a Java object. To define the environment infrastructure, an administrator chooses the roles the environment supports and defines the application and environment policies that coordinate activities via the tuple space.
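The binding of a migrating agent to each site's local space, with the site's policies enforced by the space rather than by the agent, can be sketched as follows. In this Python illustration, policies are plain predicates over (agent, operation, tuple), only the write operation is modelled, and the classes `Site` and `Agent` and the sample policy are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Policy = Callable[[str, str, tuple], bool]   # (agent, operation, tuple) -> allowed?


@dataclass
class Site:
    name: str
    policies: list[Policy] = field(default_factory=list)   # environment-specific, set by the site manager
    space: list[tuple] = field(default_factory=list)       # local tuple space; there is no global one

    def write(self, agent: str, tup: tuple) -> bool:
        # The medium enforces the policies, so agents need no site-specific code.
        if all(policy(agent, "write", tup) for policy in self.policies):
            self.space.append(tup)
            return True
        return False


@dataclass
class Agent:
    name: str
    site: Optional[Site] = None

    def migrate(self, destination: Site) -> None:
        # On arrival the agent is bound to the destination's local tuple space.
        self.site = destination

    def write(self, tup: tuple) -> bool:
        if self.site is None:
            raise RuntimeError("agent is not bound to any site")
        return self.site.write(self.name, tup)


bank = Site("bank", policies=[lambda agent, op, t: t[0] != "account"])  # forbids writing account tuples
shop = Site("shop")
agent = Agent("a1", shop)
print(agent.write(("account", 42)))   # True: the shop has no such policy
agent.migrate(bank)
print(agent.write(("account", 42)))   # False: rejected transparently by the bank's space
```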

3 Conclusion
Mobile agent systems have demonstrated great potential for designing and implementing complex distributed and concurrent software systems, and have been used in many applications such as e-commerce, remote information retrieval, remote diagnostic clinics, and military war simulation. We propose a role-based context-awareness coordination model suitable for interactions between agents, and between agents and their environment, in mobile agent systems. The model consists of three parts: (1) a role mechanism that addresses security concerns; (2) locally uncoupled interaction spaces that replace the globally coupled interaction space and facilitate information access for mobile agents; and (3) a programmable reactive tuple space that solves the problems introduced by context-aware coordination, into which environment-awareness and application-awareness coordination policies can be integrated.

References
1. Gian Pietro Picco: Mobile Agents: An Introduction. Journal of Microprocessors and Microsystems 25(2) (2001) 65-74
2. Giacomo Cabri, Letizia Leonardi: Engineering Mobile Agent Applications via Context-dependent Coordination. IEEE Transactions on Software Engineering 28(11) (2002) 1040-1056
3. M. Cremonini, A. Omicini, F. Zambonelli: Coordination and Access Control in Open Distributed Agent Systems: The TuCSoN Approach. In: Proceedings of the 4th International Conference on Coordination Languages and Models (COORDINATION 2000), LNCS 1906, Springer, Limassol, Cyprus (2000) 99-114
4. Davide Rossi, Giacomo Cabri, Enrico Denti: Tuple-based technologies for coordination. In: A. Omicini, F. Zambonelli, M. Klusch, R. Tolksdorf (Eds.): Coordination of Internet Agents, Springer (2001) 83-109

A Model for Locating Services in Grid Environment
Erfan Shang, Zhihui Du, and Mei Chen
Department of Computer Science and Technology, Tsinghua University, Beijing, 100084
[email protected]

Abstract. This paper describes a model for locating services in a grid. The model builds on the Open Grid Services Architecture (OGSA) [1] and uses the Virtual Organization (VO) [2] concept to divide logical grid services into organizations according to each organization's purpose and its requirements for resource sharing and service provision. A CARP hash-based information caching mechanism and a hierarchical message dissemination algorithm are presented, and the performance of the algorithm is analyzed theoretically.

1 Introduction
We present a hierarchical message dissemination algorithm and integrate a web cache sharing technique with our service location mechanism. Each virtual organization is established for a purpose and has its own requirements for resource sharing and service provision. In general, a VO domain is a collection of services with logically close localities and similar attributes, and we characterize these similarities by describing the VO's properties. The relation among the many virtual organizations in a grid system is flat. We establish a distributed grid service information model and analyze the performance of the information propagation algorithm theoretically. Our purpose is to combine the CARP [3] protocol with a service location mechanism. ICP [4] may be compared with the Gossip protocol [5], a typical multicast group communication protocol; the difference is that ICP is a Gossip-like protocol confined to proxies, disseminating messages only among them. CARP is a hash-based routing mechanism that yields a deterministic location for every cached item. This mechanism is similar to Plaxton-based [6] distributed systems such as Tapestry [7] and Pastry [8].
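CARP's key property, that every node can compute the same cache location for an item without asking anyone, can be illustrated with highest-random-weight (rendezvous) hashing: each proxy gets a combined score for the key and the highest score wins. This is a simplified sketch, not the exact CARP hash function or its load-factor weighting from the Microsoft white paper [3]; SHA-256 stands in for CARP's own hash, and the GSID strings and server names are hypothetical.

```python
from __future__ import annotations

import hashlib


def _combined_score(key: str, server: str) -> int:
    """Pseudo-random 64-bit score for the (server, key) pair."""
    digest = hashlib.sha256(f"{server}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def carp_route(gsid: str, servers: list[str]) -> str:
    """Return the proxy responsible for caching `gsid`: the highest combined score wins."""
    if not servers:
        raise ValueError("at least one proxy server is required")
    return max(servers, key=lambda server: _combined_score(gsid, server))
```

Because each proxy's score for a key does not depend on the other proxies, removing a proxy that does not own a GSID leaves that GSID's owner unchanged, which is what keeps the location deterministic as the proxy array changes.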

2 Grid Service Locating Model
The model uses Virtual Organizations to divide grid services logically into different organizations. Each node knows part of the service information in its own virtual organization and a little service information from other organizations. Every kind of service is assigned a Grid

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 706–709, 2004. © Springer-Verlag Berlin Heidelberg 2004


Service Identification (GSID). The ID is categorized according to a standard service taxonomy, so that services with similar attributes map to close service taxonomy code numbers. There are two types of nodes in a VO: general nodes and VO servers. Servers are expected to run stably and reliably over long periods, and every node knows the locations of its own VO's servers. The model adopts a two-level hierarchical framework: the first level consists of proxy servers based on the CARP protocol, and the second level is a hierarchical Gossip mechanism. Servers therefore act as CARP proxy servers: each collects the information of its own organization to maintain the characteristics of its VO and exchanges VO information with other VO servers. A VO server is transparent to the general nodes of other VOs. A VO server maintains two kinds of information: 1) service locations and descriptions cached by means of CARP, and 2) other VOs' properties and the locations of other VO servers. There are three types of information operations in our message protocol:
1) General nodes proactively send local GSIDs and request counts to their VO servers to maintain domain properties, and servers synchronize data among themselves periodically.
2) Information publication: general nodes propagate the latest service information to other general nodes via the Gossip protocol and to servers via CARP.
3) Information request: described in Sect. 3.1.

3 System Architecture
3.1 Message Propagation Algorithm
The model is a two-level hierarchical framework. The core of the algorithm consists of CARP routing and a hierarchical Gossip message protocol (dissemination both between VOs and within a VO). A request message propagates as follows:
1) A service request containing a GSID is sent to the service's default VO server.
2) If qualified service information is found in the server's cache, go to 6).
3) If there is no qualified cached information, the server searches its local information and forwards the request to other VOs' servers via the Gossip mechanism.
4) The request is disseminated among organizations with a TTL (Time-to-Live). If the query hits qualified information, go to 6).
5) If the TTL expires with no hit, go to 6).
6) If qualified information has been found, the server returns the reservation handle to the client; otherwise, a "no qualified information is found" message is returned to the requester. The propagation process then stops.
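The six steps above can be sketched as a round-based simulation. This is a minimal sketch under assumptions the paper does not state: each VO server's CARP cache and local information are modeled together as one dictionary from GSID to a reservation handle, the TTL counts gossip rounds, and in every round each server forwards the request to at most `fanout` randomly chosen peers that have not yet been contacted. `VOServer`, `locate` and the handle strings are hypothetical names.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class VOServer:
    name: str
    cache: dict[str, str] = field(default_factory=dict)  # GSID -> reservation handle
    peers: list[VOServer] = field(default_factory=list)  # other VO servers it knows


def locate(gsid: str, default: VOServer, ttl: int, fanout: int,
           rng: random.Random) -> str | None:
    """Return a reservation handle for `gsid`, or None if nothing is found within the TTL."""
    # Steps 1-2: the request first goes to the service's default VO server.
    if gsid in default.cache:
        return default.cache[gsid]
    # Steps 3-5: gossip the request to other VO servers, one round per TTL unit.
    frontier, seen = [default], {default.name}
    while ttl > 0 and frontier:
        next_frontier = []
        for server in frontier:
            candidates = [p for p in server.peers if p.name not in seen]
            for peer in rng.sample(candidates, min(fanout, len(candidates))):
                seen.add(peer.name)
                if gsid in peer.cache:
                    return peer.cache[gsid]  # Step 6: qualified information found
                next_frontier.append(peer)
        frontier, ttl = next_frontier, ttl - 1
    return None  # Step 6: "no qualified information is found"
```

Passing a seeded `random.Random` makes runs reproducible, which is convenient when comparing TTL and fanout settings in an emulated grid.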

3.2 VO Server Information Architecture
A VO server acts as the local representative of its VO and provides caching storage, so its information architecture is important for efficient routing and service location. It contains three layers. The bottom layer, called the local cache, maintains a service information table with GSID, URL and service description entries, populated according to CARP routing. The second layer is the global cache: servers share cache contents and coordinate replacement so that, to users, they appear as one unified cache with global LRU replacement [9]. The top layer holds information about neighbor VOs: the rate of service requests served by the local VO and the GSID range to which this rate applies.
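The bottom layer's service table, with LRU replacement, can be sketched as follows. This is a minimal single-server sketch under assumptions: the capacity, `LocalCache`, `ServiceEntry` and the example URLs are hypothetical, and neither the coordinated global LRU that servers share in the second layer [9] nor the neighbor-VO layer is modeled.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import NamedTuple, Optional


class ServiceEntry(NamedTuple):
    url: str
    description: str


class LocalCache:
    """Bottom-layer service table (GSID -> URL, description) with LRU replacement."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._table: OrderedDict[str, ServiceEntry] = OrderedDict()

    def __len__(self) -> int:
        return len(self._table)

    def get(self, gsid: str) -> Optional[ServiceEntry]:
        entry = self._table.get(gsid)
        if entry is not None:
            self._table.move_to_end(gsid)  # mark as most recently used
        return entry

    def put(self, gsid: str, entry: ServiceEntry) -> None:
        self._table[gsid] = entry
        self._table.move_to_end(gsid)
        if len(self._table) > self.capacity:
            self._table.popitem(last=False)  # evict the least recently used GSID
```

`OrderedDict` keeps entries in recency order, so both lookups and evictions stay constant-time.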


4 The Theoretic Analysis of the Model
To simplify the model, we make the following assumptions. N denotes the number of grid nodes, and every virtual organization has the same number of nodes, M. All nodes share the same number of services and have the same service frequency, and distinct user requests are randomly distributed. According to [10], if each node gossips exactly $k = \ln N + C$ messages (where N is the grid size, C is a constant and $\ln$ is the natural logarithm), then the probability that every node receives the message converges to $e^{-e^{-C}}$. For a grid size of $N = 10^7$ and C = 0, every node multicasts $\ln N \approx 16.118$ messages to its neighbors, which gives a probability of $e^{-1} \approx 36.8\%$ that the message reaches every node. Table 1 presents the message dissemination load per node. The fanout parameter EX is the expected number of messages each node disseminates under domain management and the hierarchical Gossip routing framework. The approximate expression is

It can be seen that the expected load per node is about 62% or less of that of the normal Gossip mechanism when the VO size varies from 1000 to 20000. The larger the VO size M, the heavier the load on each grid node.

The cache sharing mechanism is integrated as follows. Suppose there is only one server per VO and the average cache hit ratio is H. According to recent research [9], the hit ratio H is about 30% in distributed systems, which means that the expected number of messages disseminated per node will reduce to

We analyze the average number of hops per request in our model. The hierarchical Gossip algorithm divides participant nodes into domains so as to better control the number of hops per request. To some degree, the CARP protocol resolves qualified requests in one hop with high probability, which decreases the number of hops to (1 - hit ratio) × (hop number).
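The figures quoted in this section can be reproduced with a few lines of Python. This sketch only evaluates the formulas stated in the text (k = ln N + C, the limiting probability exp(-exp(-C)), and hop counts scaled by (1 - hit ratio)); it does not reconstruct the missing expression for EX, and the function names and the 4-hop baseline below are hypothetical.

```python
import math


def gossip_fanout(n: int, c: float = 0.0) -> float:
    """Messages each node gossips, k = ln N + C (result quoted from [10])."""
    return math.log(n) + c


def full_coverage_probability(c: float) -> float:
    """Asymptotic probability that every node receives the message, exp(-exp(-C))."""
    return math.exp(-math.exp(-c))


def expected_hops(hops: float, hit_ratio: float) -> float:
    """Hop count reduced by the cache hit ratio: (1 - hit ratio) * (hop number)."""
    return (1.0 - hit_ratio) * hops
```

For N = 10^7 and C = 0 this gives about 16.118 messages per node and a full-coverage probability of about 0.368; raising C to 3 lifts that probability above 0.95. With the 30% hit ratio quoted above, a hypothetical 4-hop lookup drops to 2.8 hops on average.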

5 Related Work, Conclusion, and Future Work
Universal Description, Discovery and Integration (UDDI) [12] is a specification for distributed, Web-based registries of Web service information. UDDI is actually a


central registry. The information synchronization pattern of our VO servers will draw on UDDI. Existing peer-to-peer substrates such as Tapestry [7] and Pastry [8] demonstrate distributed hash tables, and CARP is a similar hash-based routing mechanism. The difference is that in Tapestry or Pastry all nodes in the system participate in information storage, whereas in CARP only the proxies maintain the information and the other nodes only provide services. The Monitoring and Discovery Service (MDS) [13] is a prominent grid information service that treats the virtual organization as the basic logical unit of information management; its drawback is that MDS is also a central registry. We propose integrating a web cache sharing technique with domain-based service management (virtual organizations) to improve the efficiency of service location. More attention should be paid to standardizing service description and identification in grid environments, so that services with similar attributes correspond to close service taxonomy numbers and service providers have a common reference. We will build an emulated grid to evaluate whether our model and mechanism are appropriate in terms of response time, response quality and scalability, and we will add more parameters to the experimental environment, such as the logical distribution of services, distinct user request patterns, the number of nodes and the number of services.

References
1. Foster, I., Kesselman, C., Nick, J. M., Tuecke, S.: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Technical report, Argonne National Laboratory (2002)
2. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications (2001)
3. White Paper: Cache Array Routing Protocol and Microsoft Proxy Server 2.0 (1997)
4. ICP Working Group. National Lab for Applied Network Research.
5. Kempe, D., Kleinberg, J., Demers, A.: Spatial Gossip and Resource Location Protocols. Proc. 33rd ACM Symp. on Theory of Computing (2001) 163-172
6. Plaxton, C. G., Rajaraman, R., Richa, A. W.: Accessing Nearby Copies of Replicated Objects in a Distributed Environment. In: Proceedings of ACM SPAA, ACM, June (1997)
7. Zhao, B. Y., Kubiatowicz, J. D., Joseph, A. D.: Tapestry: An Infrastructure for Fault-Resilient Wide-Area Location and Routing. Technical Report UCB//CSD-01-1141, U. C. Berkeley, April (2001)
8. Druschel, P., Rowstron, A.: Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems. Submission to ACM SIGCOMM (2001)
9. Fan, L., Cao, P., Almeida, J., et al.: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol. Tech. Rep. 1361, February (1998)
10. Kermarrec, A., Massoulie, L., Ganesh, A.: Reliable Probabilistic Communication in Large-Scale Information Dissemination Systems. MSR-TR-2000-105
11. Iamnitchi, A., Ripeanu, M., Foster, I.: Locating Data in (Small-World?) Peer-to-Peer Scientific Collaborations. In: 1st International Workshop on Peer-to-Peer Systems (2002)
12. UDDI project. http://www.uddi.org
13. Fitzgerald, K., Foster, I., Kesselman, C.: Grid Information Services for Distributed Resource Sharing. On High Performance Distributed Computing, IEEE Press (2001) 181-184

A Grid Service Based Model of Virtual Experiment*
Liping Shen, Yonggang Fu, Ruimin Shen, and Minglu Li
Department of Computer Science & Engineering, Shanghai Jiaotong Univ., HuaShan Rd. 1954#, Shanghai, 200030, China
{lpshen,fyg,rmshen,[email protected]}

Abstract. There is increasing recognition of the need for laboratory experience, because it is through such experience that students deepen their understanding of conceptual material, especially in science and engineering courses. A Virtual Experiment has advantages over a physical laboratory in many respects. Today, virtual experiments are mostly stand-alone applications without standard interfaces, which makes them difficult to reuse. In this paper, we propose a virtual experiment model based on novel grid service technology. We employ two-layered virtual experiment services to provide a cheap and efficient distributed virtual experiment solution. The model can reuse not only virtual instruments but also composite virtual experiments.

1 Introduction
Virtual Experiment (VE) is a powerful application software system that can give students a highly immersive and rich experience. It has many advantages over a physical laboratory. It is a cost-effective alternative to acquiring expensive equipment and maintaining a physical laboratory, and it provides concurrent on-line instruction, visualization, repeated practice and feedback, breaking geographical, lab-space and time constraints. It can also provide experiments that cannot actually be done in a physical lab, e.g., the simulation of a nuclear power plant. Finally, it enables convenient and economical access to, and reuse of, expensive and specialized instruments through remote control, and it enables cooperative experiments and research. Early VE systems, including the Virtual Physics Laboratory [5] at the University of Oregon, Control the Nuclear Power Plant [6] at Linköping University in Sweden and The Interactive Frog Dissection [7] at the University of Virginia, share common drawbacks: their components are difficult to reuse, and the technology they use is typically beyond the average educator. There is therefore an urgent need for an intelligent mechanism that lets teachers design a VE without unnecessary effort. The outline of this paper is as follows. Section 2 sets forth the layered structure of the VE Services, which are based on grid services and Globus Toolkit 3. The model of the VE grid employing VE services is introduced in Section 3, and Section 4 concludes the paper.

* This paper is supported by the 973 Project (No. 2002CB312002).
M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 710–714, 2004. © Springer-Verlag Berlin Heidelberg 2004


2 Virtual Experiment Services
Our proposed VE architecture is based on widely acknowledged middleware, the newly released Globus Toolkit 3 (GT3) [2]. Fig. 1 shows the layered architecture of the VE Services, which are organized in two hierarchical levels: the core VE Services layer and the high-level VE Services layer.

2.1 Core VE Services Layer
This layer uses basic grid services to provide data and resource management. For the VE Services, data are the input and output data of the VE, while resources include Virtual Instruments (VIs), analysis and visualization tools, constraint computing tools and stored VE processes, in addition to generic grid resources such as CPU, memory and databases. VIs are the main components of a VE, while constraints denote the VE's principles, i.e., the expressions that tie the VIs together. The Core VE Services layer comprises three main services.
VE Directory Service (VEDS). VEDS extends the basic Globus Monitoring and Discovery Service and is responsible for maintaining a description of all resources used in the VE grid and for responding to queries about available resources. The metadata is represented as XML documents and stored in a VE Metadata Repository (VEMR). Another important repository is the VE Knowledge Repository (VEKR), which stores and provides access to the VE processes performed within the VE Grid. It warehouses each VE's process information (past experience) and allows this knowledge to be reused: once users have constructed successful VE processes that they wish to be reused, they can publish them as new services. To enable this function, a uniform description of a VE is needed, including the organization of the resources, the resource metadata descriptions, the steps of the process and the experiment principles (constraints).

Fig. 1. Layered Architecture of VE Services


Resource Allocation Service (RAS). This service finds the best mapping between a VE design and the available resources, with the goal of satisfying the application requirements (network bandwidth, latency, computing power and storage) and the grid constraints. RAS is based directly on the Globus Resource Allocation Manager services. The location where each service of a VE is executed may strongly affect the overall performance of the VE. When dealing with very large data sets, it is more efficient to keep as much of the computation as close to the data as possible [4]. To create a reasonably responsive virtual experiment, a compromise has to be made between the requirements of communication and computation [3]. A simple allocation algorithm that reflects these considerations determines the "best" resources as follows:
1. CompuTime = Typical Execution Time stored in VEMR;
2. CommuTime = inLatency + inData/inBandwidth + outLatency + outData/outBandwidth;
3. coex = 1.2;
4. ExecTime = CompuTime + CommuTime * coex;
5. Rank = 1/ExecTime;
Line 1 estimates the computation time as the Typical Execution Time stored in VEMR. Line 2 computes the time needed to transfer the input and output data, where inLatency/outLatency is the network latency of the input/output channel, inData/outData is the amount of input/output data in bits, and inBandwidth/outBandwidth is the bandwidth of the input/output channel. Lines 3 and 4 compute ExecTime, giving more weight to CommuTime because communication time tends to grow under congestion. Finally, Rank is the reciprocal of ExecTime and is the basis for selection: the resource with the highest Rank is chosen.
Data Management Services (DMS). DMS is responsible for the search, collection, extraction, transformation and delivery of the data required or produced by the VIs, the analysis and visualization tools, and the constraint computing tools.
Data produced by a remote service may be stored on the host where the service executed, collected in a central database, or transferred directly to the next service; DMS manages this information. DMS is based on the Globus GridFTP and Replica Location services. The goal of DMS is to integrate the individual warehouses into a single, large, virtual warehouse of VE data; in effect, it deploys a data grid for a VE.
2.2 High-Level VE Services Layer

This layer provides the programming interfaces for VE application developers. Its main services are as follows. VI Access Services are responsible for the search, selection and deployment of distributed VIs, using the services provided by VEDS and RAS. The VIs may be simulation software or remotely controlled physical instruments, and they may be implemented as Java applets downloaded to the client side, as web services run on the server side, or as grid services executed in a Virtual Organization [1]. Tool Access Services are responsible for the


search, selection and deployment of distributed VE tools, which may provide services for data analysis and management, VE constraint computing and data visualization. Result Presentation Services perform a significant step in the VE process: they help students interpret VE results. This service specifies how to generate, present and visualize the data produced by the VIs and analysis tools, and the results can be recorded and stored either in XML format or in a visualization format.
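The five-line resource ranking used by the Resource Allocation Service (Sect. 2.1) can be sketched in Python as follows. It is a minimal illustration under assumptions: latencies are in seconds, bandwidths in bits per second and data sizes in bits; output data is divided by the output-channel bandwidth, as the variable definitions require; and the `Candidate` fields and the resource figures in the test are hypothetical rather than taken from VEMR.

```python
from __future__ import annotations

from dataclasses import dataclass

COEX = 1.2  # line 3: extra weight on communication time (congestion-prone)


@dataclass
class Candidate:
    name: str
    typical_exec_time: float  # seconds, the Typical Execution Time stored in VEMR
    in_latency: float         # seconds
    out_latency: float        # seconds
    in_bandwidth: float       # bits per second
    out_bandwidth: float      # bits per second


def rank(c: Candidate, in_data_bits: float, out_data_bits: float) -> float:
    compu_time = c.typical_exec_time                                    # line 1
    commu_time = (c.in_latency + in_data_bits / c.in_bandwidth
                  + c.out_latency + out_data_bits / c.out_bandwidth)    # line 2
    exec_time = compu_time + commu_time * COEX                          # line 4
    return 1.0 / exec_time                                              # line 5


def best_resource(candidates: list[Candidate], in_data_bits: float,
                  out_data_bits: float) -> Candidate:
    """Select the candidate with the highest Rank."""
    return max(candidates, key=lambda c: rank(c, in_data_bits, out_data_bits))
```

With a large input, a slower host on a fast link to the data can outrank a faster but poorly connected host, while a tiny input reverses the choice; this is the compute-versus-communication compromise the text describes.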

3 Model of Virtual Experiment Grid
After the general description of the VE Services, we now describe how they are exploited to model the VE grid. Fig. 2 shows the components of the VE grid. In this model, teachers and students on the client side access the back-end resources transparently through VE services.

Fig. 2. Model of the Grid Service Based VE

The clients are environments for authoring and executing VEs and for accessing VE Services. A VE Authoring & Executing Tool (VEAET) is offered on the client side. VEAET lets teachers design VE plans easily and lets students execute them. A VE plan is represented by a graph describing the resource composition: a node in the plan graph denotes access to one of the distributed resources, such as a VI or a tool, and an edge between nodes describes the interaction and data flow between the services and tools. With this visual tool, a teacher can design a VE plan directly by selecting and dragging. A VE plan can be recorded and stored in XML format locally or published remotely. When a VE plan is loaded and started, it is first initialized by VEAET. Acting on the user's behalf, VEAET contacts a VE registry maintained by the relevant Virtual Organization to identify VE service providers. The registry returns handles identifying the VE Services that meet the user's requirements. VEAET then issues requests to the VE services factory, specifying details such as the VE operation to be


performed and the initial lifetime of the new VE service instance. Assuming that this negotiation proceeds satisfactorily, a new VE service instance is created with the appropriate initial state, resources and lifetime. The VE service then, acting as a client on the user's behalf, issues queries against the appropriate remote VIs, tools and constraint computing resources. Appropriate factories for the relevant resources are selected and returned by the VE service to the client VEAET. VEAET is responsible for activating execution on the selected resources as instructed by the scheduler and then binds the new service instances to the VE plan. A successful outcome of this process is that the VE plan is transformed into an executable VE. During execution, VEAET periodically updates the status of the VE and records the VE process in XML format. Teachers and students can publish a successfully executed VE process through VEDS for further reuse.
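The VE plan graph described in this section (nodes for VIs and tools, edges for data flows, stored as XML) implies an activation order: a resource can only run once its inputs exist. The sketch below assumes a hypothetical XML vocabulary (`vePlan`, `node`, `edge`), since the paper does not give the schema, and uses Python's standard `graphlib` (3.9+) to derive an order that respects every edge. It does not model registries, factories or service lifetimes.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter

# Hypothetical VE plan: nodes are resources (VIs, tools); edges are data flows.
PLAN_XML = """
<vePlan name="pendulum">
  <node id="vi1" type="VI" resource="pendulum-simulator"/>
  <node id="t1" type="tool" resource="curve-fitting"/>
  <node id="v1" type="tool" resource="plotter"/>
  <edge from="vi1" to="t1"/>
  <edge from="t1" to="v1"/>
  <edge from="vi1" to="v1"/>
</vePlan>
"""


def execution_order(plan_xml: str) -> list[tuple[str, str]]:
    """Return (node id, resource) pairs in an order that respects every data flow."""
    root = ET.fromstring(plan_xml.strip())
    resources = {n.get("id"): n.get("resource") for n in root.iter("node")}
    # Each node depends on the nodes whose output flows into it.
    deps: dict[str, set[str]] = {node_id: set() for node_id in resources}
    for edge in root.iter("edge"):
        deps[edge.get("to")].add(edge.get("from"))
    order = TopologicalSorter(deps).static_order()  # raises CycleError on cyclic plans
    return [(node_id, resources[node_id]) for node_id in order]
```

A cyclic plan raises `graphlib.CycleError`, which an authoring tool could surface to the teacher before the plan is ever started.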

4 Conclusion and Future Work
The Grid Services infrastructure is growing very quickly and will become increasingly complete and complex, both in the number of tools and in the variety of supported applications. In this paper, we propose a VE model based on novel grid service technology. The model puts forward two-layered VE Services to provide a cheap and efficient distributed VE solution, and it can reuse not only VIs but also composite VEs. Moreover, we provide a visual VE authoring tool that lets teachers design an experiment with little effort. To enable comprehensive communication between the VE environment and VIs, future work will focus on standardizing the virtual instrument interfaces and VE workbench APIs.

References
1. Foster et al.: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Tech. report, Open Grid Service Infrastructure WG, Global Grid Forum (2002)
2. Thomas Sandholm, Jarek Gawor: Globus Toolkit 3 Core: A Grid Service Container Framework. http://www-unix.globus.org/toolkit/3.0/ogsa/docs/gt3_core.pdf (2003)
3. Chuang Liu, Lingyun Yang, Ian Foster, Dave Angulo: Design and Evaluation of a Resource Selection Framework for Grid Applications. Proceedings of the 11th IEEE Symposium on High-Performance Distributed Computing (2002)
4. Vasa Curcin, Moustafa Ghanem et al.: Discovery Net: Towards a Grid of Knowledge Discovery. Knowledge Discovery and Data Mining Conference (2002), ACM 1-58113-567-X/02/0007
5. Virtual Physics Laboratory. http://jersey.uoregon.edu/vlab/
6. Control The Nuclear Power Plant. http://www.ida.liu.se/~her/npp/demo.html
7. The Interactive Frog Dissection. http://curry.edschool.virginia.edu/go/frog/

Accounting in the Environment of Grid Society*
Jiulong Shan, Huaping Chen, Guoliang Chen, Haitao Tian, and Xin Chen
Department of Computer Science and Technology, University of Science and Technology of China, 230027 Hefei, Anhui, China
{jlshan, tht, chxin}@mail.ustc.edu.cn
{glchen, hpchen}@ustc.edu.cn

Abstract. Grid and P2P are both emerging technologies that aim at efficient resource sharing [1, 2]. Reference [3] named the coexisting environment of Grid and P2P "Grid Society". In Grid Society, as in Human Society, there is a large pool of users as well as resource providers, and whether they can all be managed efficiently greatly affects the performance of the whole system [4]. On the basis that Human Society and Grid Society are Similarity Systems, we use the method of migration to explore the problem of accounting management and propose a Society based Accounting Management (SAM) model for Grid Society.

1 Introduction
Grid and P2P are both emerging technologies that aim at efficient resource sharing, and more and more researchers are inclined to combine the research in these two fields. Following this idea, we put forward a system model that merges Grid and P2P, which we call "Grid Society"; it inherits from both the Grid environment and the P2P environment. Based on a comparative study of Human Society and Grid Society, reference [3] concluded that Grid Society and Human Society are Similarity Systems, so issues in Grid Society can be solved using the corresponding solutions to similar issues in Human Society. In Grid Society there is a large pool of users as well as resource providers, and they belong to multiple domains. As in Human Society, whether they can all be managed efficiently greatly affects the performance of the whole system. The accounting problem in Grid Society includes the following goals:
Goal 1: Managing users' behavior and preventing rule-violating operations.
Goal 2: Recording each user's usage accurately and charging for it.
Goal 3: Enhancing resource sharing between consumers and providers.

This work was supported by the National ‘863’ High-Tech Programme of China under the grant No. 2002AA104560, National Science Foundation of China under the grant No. 60273041, and SRF for ROCS, SEM.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 715–718, 2004. © Springer-Verlag Berlin Heidelberg 2004


Several works have already addressed the accounting problem. First, the Globus Toolkit [5], the de facto standard software for Grid Computing, uses GSI to manage accounts. In [6], a Virtual Account System is proposed to simplify user management. In the schema of United Devices' Grid MP platform [7], a layered structure is used for dispatching jobs and maintaining users' account information. In the remainder of this paper, we mainly discuss the accounting problem in Grid Society. Section 2 compares Human Society and Grid Society, Section 3 describes a new society-based accounting model, and Section 4 presents a summary and future work.

2 Accounting in Human Society and Grid Society
According to the conclusion drawn in [3], we can explore the similar problem of accounting in Human Society and migrate its solution into Grid Society. The fundamental elements of Human Society are People and Natural Resources. People acquire various resources to meet their own requirements and offer human labor to serve Human Society. People then form various organizations, on the basis of which various kinds of social affairs come into being. Therefore, in Human Society the account objects to be managed can be categorized into two classes: individual users and group users. From another point of view, they can also be identified as Consumers, Providers and Agencies. Grid Society likewise consists of two kinds of fundamental elements: Computers and Grid Resources. A Computer element here means a single processor with some auxiliary devices, capable of computing, storage, routing, etc., and Computer elements can be composed into Machine Teams of various sizes and abilities. As in Human Society, resource sharing in Grid Society can also be regulated by economic mechanisms. In Grid Society the object of accounting management is the user of a Computer or a Machine Team, and the management requirements described in Section 1 are consistent with those of Human Society. Therefore, we can migrate the experience of Human Society into Grid Society to improve accounting management.

3 Society Based Accounting Management Model
In our SAM Model, the accounts in Grid Society are classified into three kinds:
Consumer: one who submits job requests to Grid Society.
Provider: one who shares its own resources with others in Grid Society and earns rewards by helping others fulfill their tasks.
Agency: one who schedules the interaction between Consumers and Providers. An Agency holds the service information from Providers as well as the user requirements from Consumers, and by applying a certain scheduling policy it works toward efficient resource sharing in Grid Society.


3.1 General Components Interaction
The general interaction among Consumer, Provider and Agency is described in Fig. 1. Through the Consumer component, an end-user submits a job to an Agency selected from a list it maintains. After checking the user's identity and analyzing the job request, the Agency can react in two ways: it can return a list of available services for the end-user to choose from, or it can use a scheduling algorithm (authorized by the end-user) to search for the most appropriate Service Provider and submit the job to it. When the Provider completes a job, it returns the result to the Consumer through the Agency, and the Agency settles the fees among the three parties according to the trilateral agreements. The Finance Module used in the settlement can be viewed as a financial service Provider.

Fig. 1. General Components Interaction in SAM

The Agency is responsible for recording the contract information and charging for it. In particular, as in Human Society, every contract is signed by all three parties involved. Each party can make its own decisions, which creates a free market.
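The interaction in Sect. 3.1 (an Agency either returning matching services or scheduling one on the user's behalf, then settling fees among the three parties) can be sketched as follows. The paper does not specify the scheduling policy or how fees are split, so the cheapest-offer policy, the percentage commission and the integer credit ledger are hypothetical assumptions; the Finance Module is reduced to a dictionary of balance changes, and contract signing is not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Offer:
    provider: str
    service: str
    price: int  # hypothetical price in integer credits


@dataclass
class Agency:
    name: str
    commission_pct: int                                   # hypothetical agency fee, % of price
    offers: list[Offer] = field(default_factory=list)     # the Agency's service list
    ledger: dict[str, int] = field(default_factory=dict)  # account -> balance change

    def matching(self, service: str) -> list[Offer]:
        """Mode 1: return the available offers for the end-user to choose from."""
        return [o for o in self.offers if o.service == service]

    def schedule(self, service: str) -> Offer | None:
        """Mode 2: pick an offer on the user's behalf (here: the cheapest one)."""
        candidates = self.matching(service)
        return min(candidates, key=lambda o: o.price) if candidates else None

    def settle(self, consumer: str, offer: Offer) -> None:
        """After the job completes, move credits between the three parties."""
        fee = offer.price * self.commission_pct // 100
        self._credit(consumer, -(offer.price + fee))
        self._credit(offer.provider, offer.price)
        self._credit(self.name, fee)

    def _credit(self, account: str, amount: int) -> None:
        self.ledger[account] = self.ledger.get(account, 0) + amount
```

Because every debit is matched by credits, the ledger always sums to zero, which is one simple invariant a real settlement service would also need to maintain.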

3.2 Complex Components Interaction
A single component may have multiple identities, so more complex interactions also exist among these entities, as shown in Fig. 2. First, one entity can be Consumer, Provider and Agency at the same time, as shown in Fig. 2(a); this kind of interaction achieves Work Flow management on the Provider side.

Fig. 2. Complex Components Interaction in SAM

Second, an Agency can map a service request from an end-user onto multiple Providers, as described in Fig. 2(b); that is, the Agency can decompose the request according to its details, and Work Flow management is achieved on the Agency side. Third, when an Agency finds no match for an end-user's request in its Service List, it can forward the request to another Friend Agency with similar functions, as Fig. 2(c) shows.
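The third case, forwarding an unmatched request to a Friend Agency, needs some guard against requests circulating indefinitely between agencies that list each other as friends. The sketch below assumes a hop limit and a visited set, neither of which the paper specifies; `AgencyNode`, `resolve` and `max_hops` are hypothetical names. Because it searches depth-first with a shared visited set, it may miss a match that is reachable within the hop limit by a different route, which is acceptable for a sketch but worth revisiting in a real design.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AgencyNode:
    name: str
    service_list: dict[str, str] = field(default_factory=dict)  # service -> provider
    friends: list[AgencyNode] = field(default_factory=list)     # similarly functioned Friend Agencies


def resolve(agency: AgencyNode, service: str, max_hops: int = 3,
            visited: set[str] | None = None) -> tuple[str, str] | None:
    """Return (agency, provider) for `service`, forwarding to friends when there is no local match."""
    visited = set() if visited is None else visited
    visited.add(agency.name)
    if service in agency.service_list:
        return agency.name, agency.service_list[service]
    if max_hops == 0:
        return None
    for friend in agency.friends:
        if friend.name not in visited:  # avoid forwarding loops between friends
            found = resolve(friend, service, max_hops - 1, visited)
            if found is not None:
                return found
    return None
```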


With the SAM model described above, we can achieve all three goals listed in Section 1. First, the society-based design eliminates the gulf between Provider and Consumer and matches them efficiently. Second, the architecture keeps the components independent, so each component can set up its own management policy without affecting the whole architecture. Third, the layered structure makes decentralized user management possible in the Grid Society environment: only the Agency needs a certificate on the destination Provider, and the end-user only needs to communicate with the selected Agency, which lightens the burden on every Service Provider. The Provider component may also use the Account Template method for further improvement [8]. Finally, different service selection policies can be used in the Agency component, e.g., an economic, market-based auction mechanism, which can improve the economic efficiency of resource sharing.

4 Conclusion and Future Works
This article is based on the principle that methods can be migrated between Similar Systems, and we migrate the basic accounting management policies of Human Society to the management of accounting in Grid Society. Our future work will focus on adding user-facing Agencies that provide more available services while protecting users' rights and interests. We also plan to study the organization of the Agencies, as well as their influence on the resource usage efficiency of the whole Grid Society.

References
1. I. Foster, C. Kesselman, et al.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications, 2001.
2. Stefan Saroiu, P. Krishna Gummadi, et al.: Exploring the Design Space of Distributed and Peer-to-Peer Systems: Comparing the Web, TRIAD, and Chord/CFS. 1st International Workshop on Peer-to-Peer Systems, 2002.
3. Jiulong Shan, Guoliang Chen, et al.: Grid Society: A System View of the Grid and P2P Environment. International Workshop on Grid and Cooperative Computing, 2002.
4. Tom Hacker, Bill Thigpen: Distributed Accounting Working Group Charter. http://www.gridforum.org/5_ARCH/ACCT.htm, 2000.6.
5. The Globus Toolkit: http://www.globus.org
6. Norbert Meyer, Pawel Wolniewicz, Miroslaw Kupczyk: Simplifying Administration and Management Processes in the Polish National Cluster. CUG SUMMIT, 2000.
7. United Devices – Grid Computing Solution: http://www.ud.com
8. Thomas J. Hacker, Brian D. Athey: A Methodology for Account Management in Grid Computing Environments. The 2nd International Workshop on Grid Computing, 2001.

A Heuristic Algorithm for Minimum Connected Dominating Set with Maximal Weight in Ad Hoc Networks

Xinfang Yan (1,2), Yugeng Sun (1), and Yanlin Wang (1)

(1) School of Electrical Engineering & Automation, Tianjin University, Tianjin 300072
(2) College of Information Engineering, Zhengzhou University, Zhengzhou 450052

Abstract. Routing based on a minimum connected dominating set (MCDS) is a promising approach in mobile ad hoc networks, where the search space for a route is reduced to the nodes in the set (also called gateway nodes). This paper introduces MWMCDS, a simple and efficient heuristic algorithm for computing an MCDS with maximal weight. Choosing gateway nodes by maximal weight ensures that the most suitable nodes take the role of gateway nodes, so that they can properly coordinate all the other nodes. As a result, the method keeps the MCDS stable and provides an efficient communication base for broadcast and routing operations in the whole network.

1 Introduction

A mobile ad hoc network (MANET) is a temporary, autonomous multihop system consisting of hosts with wireless transceivers, in which each host acts as a router for its neighbors and relays packets toward their final destinations. A MANET has no fixed infrastructure or centralized administration, and every host can move in any direction, at any speed, at any time. This induces a dynamic topology, which poses special challenges for routing protocol design, because finding a new route after the topology changes consumes considerable resources (bandwidth, CPU, battery, etc.). A routing algorithm should therefore converge quickly. Recently, a hierarchical routing approach based on an MCDS has been proposed [1,8,9,10]. The gateway hosts in the MCDS form a high-level virtual backbone network, and each gateway host acts as the control center of its own cluster. The efficiency of this approach depends largely on how the MCDS is found and maintained and on the size of the corresponding subnetwork. Unfortunately, computing an MCDS in a unit graph is NP-hard [3], so approximation algorithms for MCDS are needed in practical applications.

Wu [8] compared several major existing algorithms, all of which treat every host in the network as having the same characteristics. In an actual MANET, however, hosts may be quite heterogeneous: they can be computers, PDAs, or various mobile telephones. Moreover, a host's remaining power or online time plays an important role in the choice of gateway nodes. With this in mind, this paper introduces MWMCDS. To reflect the influence of a host's power or online time, every node v is assigned a weight (a real number). Furthermore, when the MCDS is formed, the total weight of its nodes is made as large as possible. Because only gateway nodes keep routing tables and each node needs to reach only its gateway, a source node can find a route promptly, which reduces communication delay. As a result, the method can minimize the storage needed for communication information and the amount of data exchanged to maintain routing and control information in a mobile environment. If a route is congested, another gateway in the node's neighbor set can be used. In addition, when the topology changes (e.g., hosts switch on, switch off, or move), only local topology information needs to be updated. The algorithm is therefore well distributed and scalable.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 719–722, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Preliminaries

We model a MANET by an undirected graph G = (V, E), in which V is the set of mobile hosts and E is the set of edges. There is an edge $(u, v) \in E$ if and only if u and v can receive each other's transmissions (this implies that all links between hosts are bidirectional); that is, connections between hosts are determined by their geographic distances. We assume each host is equipped with an omnidirectional antenna. Thus the transmission range of a host is a disk, and the corresponding graph is called a unit disk graph [3], or simply a unit graph. A dominating set (DS) D of G is a subset of V such that every node not in D has at least one neighbor in D. If the subgraph G[D] induced by D is connected, then D is a connected dominating set (CDS). Among all CDSs of G, one with minimum cardinality is called a minimum connected dominating set (MCDS). Vertices in a DS are called gateway nodes (or dominators), while vertices outside a DS are called non-gateway nodes (or dominatees). We assume that a given MANET instance contains n hosts (nodes) and that every node v is assigned a unique identifier (ID). We consider weighted networks: each node carries a weight that stands for the host's power or its online time. The procedures use the following notation:
hop_count(u, v): the number of edges on the shortest path from u to v.
The one-hop (open) neighbor set of u: all nodes v with hop_count(u, v) = 1.
The two-hop (open) neighbor set of u: all nodes v ≠ u with hop_count(u, v) ≤ 2.
m(v): a marker whose value 0, 1, or 2 means that v's role is undecided, non-gateway, or gateway, respectively. In the initial state, v's role is undecided and m(v) = 0; in the dominatee state, v is a non-gateway node and m(v) = 1; in the dominator state, v is a gateway node and m(v) = 2.
G(v): a message used by node v to make its neighbors aware that it is a gateway node; G(v) = t denotes that node t is the gateway node of node v.
join(v, t): a message announcing that node v is a non-gateway node whose gateway is node t.
$\{v \in V : m(v) = 2\}$: the set of all gateway nodes.
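The definitions of DS, CDS, and induced connectivity translate directly into checks. The following Python sketch assumes the graph is given as a dictionary mapping each node ID to the set of its neighbors (a representation chosen here only for illustration); it tests whether a candidate set D is a connected dominating set and computes the total weight that MWMCDS tries to keep large.

```python
from collections import deque


def is_dominating_set(adj, d):
    """True if every node outside d has at least one neighbor in d."""
    return all(v in d or bool(adj[v] & d) for v in adj)


def is_connected_subset(adj, d):
    """True if the induced subgraph G[d] is connected (BFS restricted to d)."""
    if not d:
        return False
    start = next(iter(d))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u] & d:  # only follow edges that stay inside d
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen == d


def is_connected_dominating_set(adj, d):
    return is_dominating_set(adj, d) and is_connected_subset(adj, d)


def total_weight(weight, d):
    """Sum of the node weights in d."""
    return sum(weight[v] for v in d)


# Path 0-1-2-3 (undirected, so each edge appears in both neighbor sets).
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
weight = {0: 1.0, 1: 5.0, 2: 3.0, 3: 2.0}
print(is_connected_dominating_set(adj, {1, 2}))  # → True
print(is_connected_dominating_set(adj, {1, 3}))  # → False (1 and 3 are not adjacent)
print(total_weight(weight, {1, 2}))              # → 8.0
```

Both {1, 2} and {1, 3} dominate the path, but only {1, 2} induces a connected subgraph, which is exactly the extra condition that separates a CDS from a DS.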


3 Algorithm

Initially, every node v in the network is randomly assigned a unique identifier (ID) and m(v) = 0; that is, all nodes are in the initial state. Each node v exchanges its open neighbor set with all its neighbors, so that v learns its two-hop neighbor set, that is, its neighbors' neighbor information. Then a node v that has exactly one neighbor first decides its own role by rule (a), and the role of that neighbor is then decided by rule (b). After that, each node v with m(v) = 0 computes its new state from its two-hop neighbor information. The rules are as follows:
(a) If node u has exactly one neighbor t, then u sets m(u) = 1, enters the dominatee state, and broadcasts the message join(u, t). Turn to the next node.
(b) On receiving a join(u, t) message, t enters the dominator state and broadcasts a G(t) message to its neighbors, stating that it will be a gateway node. Turn to the next node.
(c) If v has received role messages from all of its neighbors z with larger weight than v, it computes its own role by rule (d); otherwise, processing turns to the undecided neighbor of v with the largest weight (this ensures that nodes with larger weights take priority).
(d) If v has two unconnected neighbors x and y, it checks whether the gateway nodes among its neighbors are connected (an optimization for reducing the size of the CDS). If they are not connected, v sets m(v) = 2, enters the dominator state, broadcasts the message G(v), and processing turns to the next node; otherwise, v computes its own role by rule (e).
(e) v enters the dominatee state: it selects the gateway node t with the largest weight among all its neighbors. If no such t exists yet, v waits until one appears; it then sets m(v) = 1 and broadcasts the message join(v, t). Turn to the next node.
We will show that all nodes terminate the algorithm as either gateway nodes or non-gateway nodes, and that the set of gateway nodes in G is indeed an MWMCDS.
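The rules can be approximated by a centralized Python sketch. It is a simplification under stated assumptions, not the authors' distributed protocol: nodes are handled in a single pass rather than by message exchange, the connectivity optimization in rule (d) is omitted (so a node becomes a gateway whenever it has two non-adjacent neighbors, as in the plain marking process), and the fallback for graphs in which rule (d) marks no node, such as complete graphs, is an added assumption. Without the optimization, the weight-ordered pass of rule (c) does not change the result, so weights here only influence rule (e) and the fallback. The graph is assumed connected, with integer node IDs.

```python
def has_unconnected_neighbors(adj, v):
    """Rule (d) test: does v have two neighbors x and y that are not adjacent?"""
    nbrs = sorted(adj[v])
    return any(y not in adj[x] for i, x in enumerate(nbrs) for y in nbrs[i + 1:])


def mwmcds_sketch(adj, weight):
    """Simplified, centralized pass over rules (a)-(e) for a connected graph.

    Returns (role, gateway_of): role[v] is 1 (non-gateway) or 2 (gateway), and
    gateway_of maps each non-gateway node to the gateway it joins.
    """
    role = {v: 0 for v in adj}  # initial state: every role undecided

    # Rules (a)/(b): the only neighbor t of a degree-1 node becomes a gateway.
    for u in adj:
        if len(adj[u]) == 1:
            (t,) = adj[u]
            role[t] = 2

    # Rule (c) order (heavier nodes first, ties by ID) feeding rule (d).
    for v in sorted(adj, key=lambda n: (-weight[n], n)):
        if role[v] == 0 and has_unconnected_neighbors(adj, v):
            role[v] = 2

    # Assumed fallback, not in the source: rule (d) marks nothing in a
    # complete graph, so promote the heaviest node.
    if 2 not in role.values():
        role[max(adj, key=lambda n: (weight[n], -n))] = 2

    # Rule (e): every other node joins its heaviest gateway neighbor.
    gateway_of = {}
    for v in adj:
        if role[v] != 2:
            role[v] = 1
            gateway_of[v] = max((g for g in adj[v] if role[g] == 2),
                                key=lambda g: (weight[g], -g))
    return role, gateway_of


path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(mwmcds_sketch(path, {0: 1.0, 1: 5.0, 2: 3.0, 3: 2.0}))
# → ({0: 1, 1: 2, 2: 2, 3: 1}, {0: 1, 3: 2})
```

On a complete graph the fallback makes the heaviest node the single gateway, and every other node joins it through rule (e).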

Fig. 1. When N = 20

Fig. 2. When R = 30

4 Simulation

We measure the performance of the proposed MWMCDS algorithm by computer simulation. N hosts are scattered randomly over a 250 × 250 square-unit 2-D simulation area, and only connected graphs are considered. We vary N and R (the transmission radius) to analyze how network size and connectivity affect performance. We run the algorithm 80 times on different sets of parameters N and R and finally average the ratios over all cases. Two of the averaged results are reported in Fig. 1 and Fig. 2, where the "gate of all" curve represents the ratio of the number of gateways to the number of hosts in the network, and the "gatev of high" curve represents the ratio of high-weight gateways to all gateways in the network. The curves show that the ratio of gateways to hosts decreases as the number of nodes in the topology graph increases. At the same time, the ratio of high-weight gateways to all gateways stays high (over 80%); that is, when gateways are chosen, nodes with high weight are given priority.
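The simulation setup can be sketched in Python as follows. The sketch assumes uniform random placement in the 250 × 250 area, a link whenever two hosts are within distance R, rejection of disconnected topologies, and averaging of the "gate of all" ratio. As a stand-in for MWMCDS it uses the plain marking rule (a node with two non-adjacent neighbors becomes a gateway), so its numbers are not the paper's results; the "gatev of high" ratio is not computed because the source does not define what counts as a high weight. All function names are hypothetical.

```python
import math
import random
from collections import deque


def unit_disk_graph(n, radius, side, rng):
    """Scatter n hosts uniformly in a side x side square; link hosts within radius."""
    pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pts[i], pts[j]) <= radius:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def is_connected(adj):
    start = next(iter(adj))
    seen, queue = {start}, deque([start])
    while queue:
        for w in adj[queue.popleft()]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)


def marked_gateways(adj):
    """Stand-in gateway rule: a node with two non-adjacent neighbors is a gateway."""
    return {v for v in adj
            if any(y not in adj[x] for x in adj[v] for y in adj[v] if x != y)}


def average_gateway_ratio(n, radius, runs=80, side=250.0, seed=0, max_tries=10_000):
    """Average of (#gateways / #hosts) over `runs` connected random topologies."""
    rng = random.Random(seed)
    ratios, tries = [], 0
    while len(ratios) < runs:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("too few connected topologies; increase radius or n")
        adj = unit_disk_graph(n, radius, side, rng)
        if is_connected(adj):  # only connected graphs are considered
            ratios.append(len(marked_gateways(adj)) / n)
    return sum(ratios) / len(ratios)


print(average_gateway_ratio(n=20, radius=100.0))
```

The retry limit guards against parameter choices (small N with small R) for which connected topologies are too rare to sample.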

5 Conclusions

Simulation results show that the proposed MWMCDS algorithm keeps the total weight of the CDS large while keeping the CDS small. The scheme can therefore potentially be used in designing efficient routing algorithms based on an MCDS.

References
[1] K. M. Alzoubi, P.-J. Wan, and O. Frieder: New Distributed Algorithm for Connected Dominating Set in Wireless Ad Hoc Networks. Proc. 35th Hawaii Int'l Conf. System Sciences (2002) 3881-3887
[2] S. Basagni: Finding a Maximal Weighted Independent Set in Wireless Networks. Telecommunication Systems 18(1-3) (2001) 155-168
[3] B. N. Clark, C. J. Colbourn, and D. S. Johnson: Unit Disk Graphs. Discrete Mathematics 86 (1990) 165-177
[4] S. Guha and S. Khuller: Approximation Algorithms for Connected Dominating Sets. Algorithmica 20(4) (1998) 374-387
[5] H. Lim and C. Kim: Flooding in Wireless Ad Hoc Networks. Computer Communications 24 (2001) 353-363
[6] E. M. Royer and C.-K. Toh: A Review of Current Routing Protocols for Ad Hoc Mobile Wireless Networks. IEEE Personal Communications 6(2) (1999) 46-55
[7] P. Sinha, R. Sivakumar, and V. Bharghavan: Enhancing Ad Hoc Routing with Dynamic Virtual Infrastructures. Proc. IEEE INFOCOM 3 (2001) 1763-1772
[8] J. Wu and H. Li: A Dominating-Set-Based Routing Scheme in Ad Hoc Wireless Networks. Telecommunication Systems, Special Issue on Wireless Networks 18(1-3) (2001) 13-36
[9] J. Wu: Extended Dominating-Set-Based Routing in Ad Hoc Wireless Networks with Unidirectional Links. IEEE Trans. on Parallel and Distributed Systems 13(9) (2002) 866-881
[10] Peng Wei and Lu Xi-Cheng: A Novel Distributed Approximation Algorithm for Minimum Connected Dominating Set. Chinese J. Computers 24(3) (2001) 254-258

Slice-Based Information Flow Graph*

Wan-Kyoo Choi and Il-Yong Chung

School of Computer Engineering, Chosun University, Kwangju, 501-759, Korea

Abstract. In this paper, we represent the information flow of a program on the basis of slices, using a structure we call the slice-based information flow graph (SIFG). SIFG captures the information flow among data tokens. Using SIFG, we can find the elementary characteristics of a program's information flow and improve program understanding.

1 Introduction

The nature of information flow, which is related to the deep structure of a program, must be considered in order to understand a program. When attempting to understand programs, programmers tend to group statements based on relationships other than sequential ones. In general, the criteria used for these groupings are related to data and control flow [2]. This information is explicit in a program slice [1]. A slice is an abstraction of the set of statements that influence the value of a variable at a particular program location [2]. Slicing is a method of program reduction; slices were proposed as potential debugging tools and program understanding aids. Slices as originally defined capture the "use" relationship of traditional flow analysis [5]. In particular, data slices [4,5] represent a slice abstraction of a program. Data slices modify the concept of metric slices [3] to use data tokens rather than statements as the basic unit. Like metric slices, data slices are computed on the set of variables that are outputs of a module. Data tokens are the variables, constant definitions, and references defined in statements. Using data tokens ensures that every elementary change to a variable of interest causes a change in at least one slice of the program [4]. If programmers use slices to understand a program, program understanding can be regarded as a search for the information flow on slices. Data slices, however, do not consider the information flow among data tokens. Therefore, if we can capture the information flow among data tokens, we can find the elementary characteristics of a program's information flow and improve our understanding of it. In this paper, we propose the slice-based information flow graph (SIFG), which represents the information flow of a program by capturing and modeling the information flow of data tokens on data slices.
Since SIFG shows the information flow on data slices explicitly, it can represent the elementary changes of interest in a program's information flow and can enhance our understanding of a program.

* This study was partially supported by research funds from Chosun University, 2003.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 723–726, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Slice-Based Information Flow Graph

We developed the slice-based information flow graph, which models the information flow on a data slice using data tokens as the basic unit.

Definition 1. The information flow graph on a data slice S, denoted SIFG(S), is a directed graph $\mathrm{SIFG}(S) = \langle N, E \rangle$, where N is the set of data tokens defined in the statements of S and E is a set of edges. An edge of SIFG represents information flow among data tokens. Information flow edges are defined in two ways: among the data tokens contained in one statement, and among the data tokens related to one variable. An information flow edge among data tokens contained in a statement represents a use/used-by relation between the tokens. Let $s_i$ be a statement of S, let $T(s_i)$ be the set of data tokens used or contained in $s_i$, and let $t_j, t_k \in T(s_i)$. SIFG contains a direct information flow edge from $t_j$ to $t_k$ (that is, $t_k$ directly uses $t_j$) if one of the following holds:
1. The result of a computation using $t_j$ changes $t_k$.
2. $t_k$ is a data token related to an array variable and $t_j$ is its index. In this case, a change of $t_j$ leads to a change of an array element.
SIFG contains an indirect information flow edge from $t_j$ to $t_k$ (that is, $t_k$ indirectly uses $t_j$) if the following holds:
1. $s_i$ contains a logical operator, $t_k$ is on its left-hand side, and $t_j$ is on its right-hand side.

An information flow edge among data tokens related to a variable represents the flow of a data value between those tokens. Let $V_x$ be the set of all data tokens of a variable $x$ used in S, and let $t_a, t_b \in V_x$. SIFG contains a direct information flow edge from $t_a$ to $t_b$ if all of the following hold:
1. There is a control flow path from $t_a$ to $t_b$ in the standard control flow graph for S.
2. No statement on a control flow path from $t_a$ to $t_b$ contains another token $t_c \in V_x$.
3. There is no $t_c \in V_x$ such that information flows from $t_a$ to $t_c$ and from $t_c$ to $t_b$.
The slice-based information flow graph for a procedure P, denoted SIFG(P), is defined as the combination of every SIFG(S) related to P.


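Definition 2. Let $S_1, \dots, S_n$ be the slices for a procedure P, and let $N_i$ and $E_i$ be the sets of nodes and edges of $\mathrm{SIFG}(S_i)$, respectively. Then SIFG(P) for the procedure P is

$$\mathrm{SIFG}(P) = \Big\langle \bigcup_{i=1}^{n} N_i,\ \bigcup_{i=1}^{n} E_i \Big\rangle.$$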
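Definitions 1 and 2 can be sketched in Python for the simplest case, straight-line code. Under that assumption there is a single control flow path, so conditions 1 to 3 for variable edges reduce to linking consecutive tokens of the same variable. The statement encoding (stmt_id, lhs, uses), the token identity (stmt_id, name, position), and the treatment of integer literals as constant tokens are assumptions of this sketch; array-index edges and the indirect edges for logical operators are omitted. Keying tokens by statement makes tokens shared by several slices merge when SIFG(P) is formed as a union.

```python
from collections import defaultdict


def build_sifg(statements):
    """Sketch of SIFG(S) for straight-line code.

    statements: (stmt_id, lhs, uses) triples in program order, e.g.
    ("s2", "sumX", ["sumX", "x"]) for `sumX = sumX + x`.
    A data token is (stmt_id, name, pos): pos 0 is the left-hand side and
    1, 2, ... are the used tokens. Integer literals count as constant tokens.
    """
    nodes, edges = set(), set()
    occurrences = defaultdict(list)  # variable -> its tokens in program order

    for stmt_id, lhs, uses in statements:
        used = [(stmt_id, name, pos) for pos, name in enumerate(uses, start=1)]
        defined = (stmt_id, lhs, 0)
        # The right-hand side is read before the left-hand side is written.
        for tok in used + [defined]:
            nodes.add(tok)
            if not tok[1].isdigit():  # constants take no part in value flow
                occurrences[tok[1]].append(tok)
        for tok in used:
            edges.add((tok, defined))  # intra-statement edge: `defined` uses `tok`

    # Value-flow edges: on a single control path, consecutive tokens of a
    # variable have no other token of that variable between them.
    for toks in occurrences.values():
        for a, b in zip(toks, toks[1:]):
            edges.add((a, b))
    return nodes, edges


def sifg_procedure(slice_graphs):
    """Definition 2: union of the node and edge sets of every SIFG(S_i)."""
    nodes, edges = set(), set()
    for n, e in slice_graphs:
        nodes |= n
        edges |= e
    return nodes, edges


# Hypothetical slice for sumX: x = 5; sumX = 0; sumX = sumX + x
slice_a = [("s0", "x", ["5"]), ("s1", "sumX", ["0"]), ("s2", "sumX", ["sumX", "x"])]
nodes, edges = build_sifg(slice_a)
print(len(nodes), len(edges))  # → 7 6
```

For this hypothetical slice, the edges include the value-flow edge from the x defined in s0 to the x used in s2, and the edge from that used x to the sumX defined in s2.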

For example, S(sumX) and S(sumSqrX) are the slices for the outputs sumX and sumSqrX, respectively, of procedure Sum1 in Fig. 1. In Fig. 1, labels mark the data tokens of each variable and the statements of the procedure. The data slices for sumX and sumSqrX are the sequences of data tokens used in S(sumX) and S(sumSqrX). Fig. 2 shows the SIFG for procedure Sum1.

Fig. 1. Procedure Sum1

Fig. 2. SIFG(Sum1)

3 Related Work

Representative methods for representing a program's information flow are the control flow graph (CFG) [6], the data flow graph (DFG) [7], and the program dependence graph (PDG) [1]. A CFG, in which the nodes represent statements and the edges represent transfer of control between statements, encodes control flow information. A DFG, in which the nodes represent statements and variables and the edges represent data flow between statements and variables, encodes data flow information and the points of input and output. A PDG, in which the nodes represent statements or regions of code and the edges represent control and data dependencies, encodes both control and data dependence information.


Using CFG, DFG, and PDG, we can understand control flow, data flow, control dependency, and data dependency between statements, and between statements and variables, in a program. SIFG uses variables represented by data tokens, whereas CFG, DFG, and PDG use statements as the basis for analyzing a program. Thus, while CFG, DFG, and PDG cannot represent the data flow structure between variables or the elementary characteristics of a program's information flow, SIFG can represent this information.

4 Conclusion

In this paper, we proposed the slice-based information flow graph (SIFG). Existing representations of a program's information flow are based on statements, whereas SIFG is based on variables represented by data tokens. In particular, it captures the characteristics of information flow between variables on slices. Thus SIFG can represent the elementary changes of variables of interest and the data flow structure between variables. It can show the nature of a program's information flow more clearly and enhance our understanding of a program. Because of this characteristic, SIFG can also be used as a tool for partial analysis and partial debugging of a program.

References
1. Karl J. Ottenstein, Linda M. Ottenstein: The Program Dependence Graph in a Software Development Environment. Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, ACM SIGPLAN Notices 19 (1984) 177-184.
2. M. Weiser: Programmers Use Slices When Debugging. Communications of the ACM 25 (1982) 446-452.
3. Linda M. Ott, Jeffrey J. Thuss: Slice Based Metrics for Estimating Cohesion. Proc. IEEE-CS International Software Metrics Symposium (1993) 71-81.
4. J. M. Bieman, L. M. Ott: Measuring Functional Cohesion. IEEE Transactions on Software Engineering 20 (1994) 111-124.
5. J. M. Bieman, B.-K. Kang: Measuring Design-Level Cohesion. IEEE Transactions on Software Engineering 24 (1998) 111-124.
6. Linda M. Ott: Using Slice Profiles and Metrics during Software Maintenance. Proc. 10th Annual Software Reliability Symposium (1992) 16-23.
7. A. Aho, R. Sethi, J. Ullman: Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA (1986).

Semantic Rule Service Model: Enabling Intelligence on Grid Architecture

Qi Gao, HuaJun Chen, ZhaoHui Wu, and WeiMing Lin

Grid Computing Lab, College of Computer Science, Zhejiang University, Hangzhou, 310027, P.R. China
{hyperion, huajunsir, wzh}@zju.edu.cn

Abstract. Based on Semantic Web technology and the OGSA architecture, we propose a Semantic Rule Service Model to enable intelligence on the grid. In this model, we regard rules and inference engines as resources and encapsulate them in rule base services and inference services that perform inference on ontology knowledge. With the support of the OGSA architecture, we organize these services as grid services in order to support dynamic discovery and invocation of the rules and inference engines. The purpose of this model is to provide intelligent support to other grid services and software agents. In addition, we illustrate the model with a Traditional Chinese Medicine (TCM) application.

1 Background and Introduction

The Grid [1] is an integrated infrastructure for coordinated resource sharing and problem solving in distributed environments. In the OGSA model [2], various resources, including information and knowledge, are encapsulated in Grid services. The Grid infrastructure is a sound base for dynamic, large-scale web applications. On the other hand, with the goal of "making the web machine understandable", the Semantic Web [3] research community focuses on the semantic integration of the web. Several ontology languages, such as RDF [4], DAML+OIL [5], and OWL [6], have been developed to represent data and information semantically by defining terms with explicit semantics and stating the relations between them clearly. In the future, the Internet will be integrated both physically, through Grid architecture, and semantically, through Semantic Web technology. Research on the Knowledge Base Grid (KB-Grid) [7] aims to use these two technologies together to enable knowledge sharing and grid intelligence. The Semantic Rule Service, a sub-project of KB-Grid, aims to bring reasoning support to web applications. In our view, not only the descriptive knowledge in ontologies but also rules and inference engines are resources to be published and shared. In the Semantic Rule Service model, we construct a suite of services that enable various organizations to publish their rules and inference engines on the web, so that other web applications and software agents can use these resources to solve specific problems. Unlike many traditional rule-based systems, this model is a grid-based rule service model for ontology knowledge on the web. Because the services are built as Grid services, we can use technologies provided by OGSA to support dynamic service registration, discovery, and access.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 727–735, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Related Work

With the rapid development of the Semantic Web, many research organizations have worked on the representation of logic. RuleML [8] aims to build a standard rule description language for the Semantic Web. Based on research on situated courteous logic programs (SCLP), RuleML supports prioritized conflict handling as well as sensors and effectors. TRIPLE [9], based on Horn logic, is designed especially for querying and transforming RDF models. DAMLJessKB [10] translates DAML descriptions into Jess [11] rules in order to reason with DAML knowledge. In addition, the DAML research community has initiated DAML-Rule [12], based on research on Description Logic Programs (DLP) [13], which aims to combine Description Logic (DL) reasoning with Logic Program (LP) inference. In our view, rules should be used only to represent heuristic knowledge derived from experience, and descriptive knowledge (facts) should be left to other representations, e.g., ontologies. Rules should be used together with descriptive knowledge expressed in an ontology language. In this paper, we use the word "knowledge" to refer solely to descriptive knowledge. Here, we focus on the ability of rules to process knowledge for other web applications and to make decisions for software agents. In fact, decision making is similar to knowledge processing, since related descriptive knowledge must be processed to reach a conclusion. From this perspective, rules differ from descriptive knowledge: a set of rules processes a certain kind of descriptive knowledge in a specific way. We now discuss some differences between Description Logic (DL) reasoners, e.g., FaCT [14] and RACER [15], and rule inference engines. Since the ontology languages of the Semantic Web are based on description logics, DL reasoners are especially suitable for computing subsumption relations between classes and discovering inconsistencies in an ontology model.
This kind of reasoning is essential for building and maintaining large ontologies. Rule inference engines, on the other hand, are more flexible. People can design rules straightforwardly to make inferences on certain knowledge. A rule inference engine can focus only on the relevant pieces of knowledge and can tolerate some inconsistency in the knowledge model. Therefore, rule inference engines are suitable for domain-specific applications, which need flexible, domain-related rules.


3 Semantic Rule Service Model

Figure 1 depicts the layers of the Semantic Rule Service Model. The service layer has three related Grid services, each of which encapsulates one kind of resource. The rule base service and the inference service are discussed in detail in this section. The ontology KB service is the central part of KB-Grid [7], providing interfaces for knowledge sharing, query, and management. Here, the ontology KB service mainly acts as a knowledge source, and we do not discuss it in detail. Directory services are on the index layer, playing an important role in service registration and dynamic discovery. The uppermost layer is the application layer, on which other grid services, software agents, and semantic browsers can use the services on the lower layers. The rule editor is a supporting tool that enables users to edit rules visually.

Fig. 1. Semantic Rule Service Model

3.1 Rule Base Service

The rule base service provides interfaces for web users to share their rules. On the one hand, rule publishers can register their rules in the rule base. On the other hand, all web users can access the registered rules and use them to process knowledge. Rules in the rule base are organized by function into RuleSets. A RuleSet is a group of rules that are written in a certain rule representation language, based on a certain kind of knowledge, and applied to process that knowledge in a certain way. Rules in a RuleSet should be closely related and cooperate with each other to implement an inference. Note that the rule base does not define or mandate a rule language for RuleSets. In other words, a RuleSet can be written in any rule language, such as RuleML, TRIPLE, etc.


To organize the various RuleSets, the rule base must maintain their meta-information. Each RuleSet has its own meta-information, so that any web user can find a RuleSet that meets their requirements. The basic meta-information includes the URI, the rule language, the knowledge type, the publisher, the version, the last-update time, and a description of the function. For example, the meta-information of a RuleSet in the TCM application is as follows:

3.2 Inference Service

The inference service processes knowledge according to given RuleSets. Generally, an inference service encapsulates an underlying inference engine. With a shared basic interface, various inference services can be provided by different organizations. Some can be built on traditional rule-based systems, such as Jess [11], with an outer layer that translates knowledge between classical representations and Semantic Web standards. Others may be built on newly designed inference engines that support Semantic Web languages. In our prototype, we designed a new inference engine based on RDF and WRIL, which is discussed in detail in Section 4. Although the implementation is transparent to users, web users need to know which rule languages and knowledge languages an inference service supports. Therefore, every inference service should have its own meta-information, which may also be used for service discovery and location. The meta-information consists of the address of the service, the rule language and knowledge language supported by the engine, the publisher, the version, the last-update time, etc. Here is an example:


3.3 Rule Editor

The rule editor is a visual tool for editing rules. Although some traditional rule-based systems have their own ways of editing rules, few of them can edit rules based on ontologies. Since we regard rules as a description of the business logic for processing knowledge, rules should support ontologies and be closely related to knowledge representation languages such as RDF and OWL. Our rule editor can incorporate the vocabularies of basic ontologies and of user-defined ontologies, so that users can choose terms defined in those ontologies when designing their rules.

3.4 Directory Service

The directory service on the index layer handles the registration, discovery, and location of rule base services and inference services. Service providers register the meta-information of RuleSets and inference services with the directory. Users or user applications can then query for the services they require and obtain their addresses.

The directory service maintains meta-information with a limited life cycle: if an item is not re-registered within a certain period, it is removed from the directory as obsolete. A query is answered by a matching function that compares the meta-information in the request with the meta-information in the directory and returns the matched items, including the meta-information and the address of the service. Distributed directory services communicate as follows. Every directory records a group of neighboring directories. When a new item is registered with one directory, that directory registers it with its neighbors. When a directory receives a registration from another directory, it subscribes to the item from the source and updates its local copy according to the source.
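The life-cycle, matching, and neighbor-registration behavior of the directory can be sketched in Python as below. The class and method names, the exact-match query, the injectable clock, and the one-hop forwarding (a neighbor stores the copy but does not forward it again, in place of the subscribe-and-update step described above) are assumptions made for this illustration. Meta-information is kept as a plain dictionary, so the RuleSet fields (URI, rule language, knowledge type, publisher, version, last-update time, description) and the inference-service fields can share one structure.

```python
import time
from dataclasses import dataclass


@dataclass
class Entry:
    meta: dict         # e.g. rule language, knowledge type, publisher, version
    address: str       # where the described service can be reached
    expires_at: float  # the entry is dropped unless re-registered before this


class Directory:
    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl          # life cycle of an entry, in clock units
        self.clock = clock      # injectable so that expiry can be tested
        self.entries = {}       # key -> Entry
        self.neighbors = []     # neighboring Directory objects

    def register(self, key, meta, address, forward=True):
        """Register or re-register an item, restarting its life cycle."""
        self.entries[key] = Entry(dict(meta), address, self.clock() + self.ttl)
        if forward:
            # One-hop propagation: neighbors store the item but do not forward
            # it again, which keeps registrations from circulating forever.
            for peer in self.neighbors:
                peer.register(key, meta, address, forward=False)

    def purge(self):
        """Remove entries whose life cycle has expired."""
        now = self.clock()
        self.entries = {k: e for k, e in self.entries.items() if e.expires_at > now}

    def query(self, **wanted):
        """Return (meta, address) pairs whose meta matches every requested field."""
        self.purge()
        return [(e.meta, e.address) for e in self.entries.values()
                if all(e.meta.get(k) == v for k, v in wanted.items())]


now = [0.0]  # fake clock for the example
a = Directory(ttl=60, clock=lambda: now[0])
b = Directory(ttl=60, clock=lambda: now[0])
a.neighbors.append(b)
a.register("tcm-ruleset", {"kind": "RuleSet", "rule_language": "WRIL"},
           "http://example.org/rulebase")
print(b.query(rule_language="WRIL"))  # found at b through propagation
now[0] = 61.0
print(b.query(rule_language="WRIL"))  # → [] once the entry has expired
```

Purging lazily inside query keeps the sketch free of background timers; a deployed directory would more likely expire entries on a schedule.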

3.5 Application Layer

In the simplest case, the semantic browser can serve as a client tool for this model. The fundamental function of the semantic browser is to display RDF knowledge in a graphical view. Here, the semantic browser can access the rule services visually. It focuses on personal use of the services, enabling rule publishers to publish RuleSets, query the rule base, and access inference services to process knowledge. Software agents can also benefit from the rule services. Many software agents rely on built-in rules and inference engines to analyze their environment and behave intelligently. However, because the environment is complex and continuously changing, the built-in rules must be updated frequently, and the work of the software agents may be interrupted. To solve this problem, we can store the rules in a rule base and keep them up to date. Software agents can then access the rule base service and obtain the appropriate RuleSets according to the external environment. In the most common case, rule services are used to help users or other web applications process knowledge. To solve an application problem, domain experts or organizations design RuleSets for the problem with the rule editor. The rules can be represented in any rule language, provided that corresponding inference services are available on the Web. In Section 4, we discuss this with a real-world application.

4 A Case Study of the Application in Traditional Chinese Medicine In this section, we discuss the rule service model with a web application of Traditional Chinese Medicine (TCM). Traditional Chinese Medicine (TCM) is a knowledge intensive domain. In previous work, a Unified TCM Language System [16] has been built as an abstract upper-level class in TCM Ontology. TCM ontology contains many special concepts which are represented as classes and properties in RDF. Now we have finished the building of the whole class definition of TCM Ontology and edited about 100,000 records of TCM ontology instances. As a sub-project of TCM KB-Grid, we have developed a computer-aided prescription analyzing system, which is based on rule service model and ontology KBs. The function of the system is to analyze prescriptions and provide suggestions and warnings according to TCM knowledge and analyzing rules. To represent these rules, we design a language, which especially aims at processing RDF knowledge, called Web

Semantic Rule Service Model: Enabling Intelligence on Grid Architecture


Rule Inference Language (WRIL). We designed a rule ontology and use RDF to represent rules. WRIL is defined as follows.

A Rule is defined as a 3-tuple ⟨ASet, CSet, f⟩, where ASet is the set of antecedents of the rule, consisting of BodyUnits; CSet is the set of consequents of the rule, consisting of HeadUnits; and f is the number of times the rule can be fired.

A RuleStmt is defined as a 3-tuple ⟨S, P, O⟩, where S, P, and O are the subject, predicate, and object of the rule statement, respectively. Any of them can be a variable or a constant.

A BodyUnit is defined as a 2-tuple ⟨RS, T⟩, where RS is a RuleStmt and T is the test condition on RS. The test condition has 2 types: Existence and Nonexistence.

A HeadUnit is defined as a 2-tuple ⟨RS, A⟩, where RS is a RuleStmt and A is the action on RS. The action has 4 types: AddStmt, RemoveStmt, ChangeNum, and ChangeStr.

A Variable is defined to represent indefinite parts of a RuleStmt. There are 3 types of variable: ResVariable, StrVariable, and NumVariable.

The RDF Schema can be found on our web site: http://grid.zju.edu.cn. A simplified example rule for detecting contraindications in a TCM prescription is provided next:

The meaning of this rule is: if any prescription, represented by the variable “V_P”, contains two herbs, represented by the variables “V_Herb_X” and “V_Herb_Y”, and these two herbs are contraindicated with each other, then a contraindication warning is generated for the prescription. We have developed an inference engine to execute WRIL rules; it employs the Jena API [17] to build RDF models and access RDF knowledge.
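To make the WRIL definitions concrete, the sketch below models Rule, RuleStmt, BodyUnit, and HeadUnit as Python records and fires a contraindication rule like the one just described over a handful of plain triples. It is an illustration under stated assumptions, not the authors' Jena-based engine: variables are strings starting with `?`, only AddStmt and RemoveStmt are executed, Nonexistence is treated as negation as failure, and the predicates `hasHerb`, `contraindicatedWith`, and `hasWarning` and the herb names are hypothetical.

```python
from __future__ import annotations

from collections.abc import Iterator
from dataclasses import dataclass
from enum import Enum

Triple = tuple[str, str, str]
Condition = Enum("Condition", ["Existence", "Nonexistence"])  # test T of a BodyUnit
Action = Enum("Action", ["AddStmt", "RemoveStmt", "ChangeNum", "ChangeStr"])  # action A


@dataclass(frozen=True)
class RuleStmt:  # <S, P, O>; a term starting with "?" is a variable
    s: str
    p: str
    o: str

    def terms(self) -> Triple:
        return (self.s, self.p, self.o)


@dataclass(frozen=True)
class BodyUnit:  # <RS, T>
    rs: RuleStmt
    t: Condition


@dataclass(frozen=True)
class HeadUnit:  # <RS, A>
    rs: RuleStmt
    a: Action


@dataclass
class Rule:  # <ASet, CSet, f>
    aset: list[BodyUnit]
    cset: list[HeadUnit]
    f: int  # remaining number of times the rule may fire


def unify(stmt: RuleStmt, triple: Triple, binding: dict[str, str]) -> dict[str, str] | None:
    """Extend `binding` so that `stmt` matches `triple`; None on conflict."""
    new = dict(binding)
    for term, value in zip(stmt.terms(), triple):
        if term.startswith("?"):
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def matches(body: list[BodyUnit], facts: set[Triple], binding: dict[str, str]) -> Iterator[dict[str, str]]:
    """Yield every binding that satisfies all BodyUnits."""
    if not body:
        yield binding
        return
    unit, rest = body[0], body[1:]
    if unit.t is Condition.Existence:
        for triple in facts:
            extended = unify(unit.rs, triple, binding)
            if extended is not None:
                yield from matches(rest, facts, extended)
    elif all(unify(unit.rs, triple, binding) is None for triple in facts):
        yield from matches(rest, facts, binding)  # Nonexistence holds; binds nothing


def fire(rule: Rule, facts: set[Triple]) -> set[Triple]:
    """Apply HeadUnit actions once per match, at most `rule.f` times.
    Only AddStmt and RemoveStmt are handled in this sketch."""
    result = set(facts)
    for binding in list(matches(rule.aset, facts, {})):
        if rule.f <= 0:
            break
        rule.f -= 1
        for head in rule.cset:
            triple = tuple(binding.get(t, t) for t in head.rs.terms())
            if head.a is Action.AddStmt:
                result.add(triple)
            elif head.a is Action.RemoveStmt:
                result.discard(triple)
    return result


# A contraindication rule in the spirit of the one described above (hypothetical predicates).
rule = Rule(
    aset=[BodyUnit(RuleStmt("?V_P", "hasHerb", "?V_Herb_X"), Condition.Existence),
          BodyUnit(RuleStmt("?V_P", "hasHerb", "?V_Herb_Y"), Condition.Existence),
          BodyUnit(RuleStmt("?V_Herb_X", "contraindicatedWith", "?V_Herb_Y"), Condition.Existence)],
    cset=[HeadUnit(RuleStmt("?V_P", "hasWarning", "Contraindication"), Action.AddStmt)],
    f=10,
)
facts = {("rx1", "hasHerb", "herbX"), ("rx1", "hasHerb", "herbY"),
         ("rx2", "hasHerb", "herbZ"), ("herbX", "contraindicatedWith", "herbY")}
inferred = fire(rule, facts)  # adds ("rx1", "hasWarning", "Contraindication")
```

Only prescription `rx1` contains a contraindicated pair, so exactly one binding fires and the rule's remaining count `f` drops from 10 to 9.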


Q. Gao et al.

Fig. 2. Prescription analyzing system on the Semantic Rule Service Model

The computer-aided prescription analyzing system is published as a web service. A request includes patient information, a diagnosis, and a prescription, all represented in RDF according to the TCM ontology. The system locates the analyzing RuleSet and the corresponding inference service by querying the directory service, then accesses the rule base service to obtain the RuleSet and sends it to the inference service. The inference service processes the patient case and the rules with the support of the TCM ontology KB and returns the results to the analyzing system, which reorganizes them and replies to the user. Because the TCM ontology KB contains a large amount of descriptive knowledge, the system achieves high performance with a relatively small set of general rules.
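The request flow above (directory lookup, RuleSet retrieval, remote inference, reorganization of the reply) can be sketched as plain function calls. Everything here is hypothetical: the directory, rule base, and inference services are stand-in dictionaries and callables rather than the system's web-service interfaces, and `pair_engine` is a toy engine whose "rules" are simply contraindicated herb pairs, so that the control flow is the focus.

```python
from dataclasses import dataclass
from typing import Callable

Triple = tuple[str, str, str]
Pair = tuple[str, str]
# A (hypothetical) inference service takes a RuleSet and a patient case and returns findings.
InferenceService = Callable[[list[Pair], set[Triple]], list[str]]


@dataclass
class DirectoryEntry:
    ruleset_id: str    # key of the analyzing RuleSet in the rule base
    inference_id: str  # key of an inference service able to execute it


def analyze_prescription(case: set[Triple], task: str,
                         directory: dict[str, DirectoryEntry],
                         rule_base: dict[str, list[Pair]],
                         engines: dict[str, InferenceService]) -> dict:
    entry = directory[task]                                # 1. query the directory service
    ruleset = rule_base[entry.ruleset_id]                  # 2. fetch the RuleSet
    findings = engines[entry.inference_id](ruleset, case)  # 3. call the inference service
    return {                                               # 4. reorganize the reply
        "task": task,
        "warnings": sorted(findings),
        "rules_used": len(ruleset),
    }


def pair_engine(ruleset: list[Pair], case: set[Triple]) -> list[str]:
    """Toy inference service: each rule names a contraindicated herb pair."""
    herbs = {o for (_, p, o) in case if p == "hasHerb"}
    return [f"contraindication: {a} + {b}" for a, b in ruleset if a in herbs and b in herbs]


directory = {"prescription-analysis": DirectoryEntry("rs:contra", "engine:pairs")}
rule_base = {"rs:contra": [("herbX", "herbY"), ("herbA", "herbB")]}
engines = {"engine:pairs": pair_engine}
case = {("rx1", "hasHerb", "herbX"), ("rx1", "hasHerb", "herbY")}
reply = analyze_prescription(case, "prescription-analysis", directory, rule_base, engines)
print(reply)
# {'task': 'prescription-analysis', 'warnings': ['contraindication: herbX + herbY'], 'rules_used': 2}
```

Keeping the RuleSet and the engine behind lookups means either can be replaced without changing the analyzing system itself, which is the point of treating rules and inference engines as shared resources.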

5 Summary and Future Work

In this paper, we describe the semantic rule service model to address the problem of processing ontology knowledge against the background of the Grid and the Semantic Web. In this model, rules and inference engines are treated as resources shared by Grid services, and rule inference is employed to enable intelligent grid-based applications and software agents. In the future, we will explore other uses of rule inference in Grid architecture, e.g., rule-based service integration and process model validation. We may also combine rule inference with description logic reasoning to make better use of ontology knowledge. The rule service model may evolve as the research progresses.

References

[1] Ian Foster, Carl Kesselman, Steven Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. Intl. J. Supercomputer Applications, 2001.
[2] I. Foster, C. Kesselman, J. Nick, S. Tuecke: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Open Grid Service Infrastructure WG, Global Grid Forum, June 22, 2002.
[3] Tim Berners-Lee, James Hendler, Ora Lassila: The Semantic Web. Scientific American, May 2001.
[4] Resource Description Framework (RDF). http://www.w3.org/RDF/
[5] Deborah L. McGuinness, Richard Fikes, James Hendler, Lynn Andrea Stein: DAML+OIL: An Ontology Language for the Semantic Web. IEEE Intelligent Systems, Sep.-Oct. 2002.
[6] Web Ontology Language (OWL). http://www.w3.org/2001/sw/WebOnt/
[7] Huajun Chen, Zhaohui Wu: OKSA: An Open Knowledge Service Architecture for Building Large-Scale Knowledge Systems in Semantic Web. In Proc. of the IEEE Conference on Systems, Man and Cybernetics, 2003.
[8] Rule Markup Language Initiative. http://www.dfki.uni-kl.de/ruleml/
[9] Michael Sintek, Stefan Decker: TRIPLE: A Query, Inference, and Transformation Language for the Semantic Web. International Semantic Web Conference (ISWC), Sardinia, June 2002.
[10] Joseph B. Kopena, William C. Regli: DAMLJessKB: A Tool for Reasoning with the Semantic Web. 2nd International Semantic Web Conference (ISWC 2003), Sanibel Island, Florida, USA, October 20-23, 2003.
[11] Java Expert System Shell (Jess). http://herzberg.ca.sandia.gov/jess/
[12] DAML-Rule. http://www.daml.org/rules/
[13] Benjamin N. Grosof, Ian Horrocks, Raphael Volz, Stefan Decker: Description Logic Programs: Combining Logic Programs with Description Logic. In Proc. of the Twelfth International World Wide Web Conference, 20-24 May 2003.
[14] I. Horrocks: FaCT and iFaCT. In Proc. of the International Workshop on Description Logics (DL'99).
[15] V. Haarslev, R. Moller: RACER System Description. In Proc. of IJCAR-01, number 2083 of LNAI, Springer-Verlag, 2001.
[16] Xuezhong Zhou, Zhaohui Wu: UTMLS: An Ontology-based Unified Medical Language System for Semantic Web. 2002.
[17] Jena 2 Development. http://www.hpl.hp.com/semweb/jena2.htm

CSCW in Design on the Semantic Web*

Dazhou Kang¹, Baowen Xu¹,², Jianjiang Lu¹,²,³, and Yingzhou Zhang¹

¹ Department of Computer Sci. & Eng., Southeast University, Nanjing 210096, China
² Jiangsu Institute of Software Quality, Nanjing 210096, China
³ PLA University of Science and Technology, Nanjing, 210007, China

Abstract. Computer-Supported Cooperative Work (CSCW) in Design explores the potential of computer technologies to support cooperative design. It requires more efficient technologies for communication and for reusing knowledge in the design process. This paper looks at CSCW in Design on the Semantic Web and shows how Semantic Web technologies may improve the current design process. It describes how Semantic Web technologies can represent design knowledge in a unified, formal form that both people and machines can understand, and shows how this improves all kinds of communication processes in cooperative design. We study the considerable advantages of sharing and reusing design knowledge on the Semantic Web, which are very helpful in the design process and may completely change the current way of designing.

1 Introduction

In the contemporary world, the design of complex new artifacts, including physical artifacts such as airplanes as well as informational artifacts such as software, increasingly requires expertise in a wide range of areas. Concurrent engineering is needed to manage increasing product diversity in order to satisfy customer demands, while accelerating the design process to deal with the competitive realities of a global market and decreasing product life cycles [1]. Complex designs may involve many, sometimes thousands of, participants working on different elements of the design. The cooperative design process usually has strong interdependencies between design decisions, which makes it difficult to converge on a single design that satisfies these dependencies and is acceptable to all participants [2]. Current cooperative design processes are typically expensive and time-consuming, poorly incorporate some important design concerns, and reduce creativity [3].

* This work was supported in part by the Young Scientist’s Fund of NSFC (60303024), National Natural Science Foundation of China (NSFC) (60073012), National Grand Fundamental Research 973 Program of China (2002CB312000), National Research Foundation for the Doctoral Program of Higher Education of China, Natural Science Foundation of Jiangsu Province, China (BK2001004), Opening Foundation of State Key Laboratory of Software Engineering in Wuhan University, and Opening Foundation of Jiangsu Key Laboratory of Computer Information Processing Technology in Soochow University.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 736–743, 2004. © Springer-Verlag Berlin Heidelberg 2004


CSCW explores the potential of computer technologies to help people work together on two fronts: managing task interdependencies and managing common information spaces [1]. Researchers in both technical and social domains have published many successful theories and applications, such as email management systems, workflow systems, and groupware. The coordination and integration of myriad interdependent, yet distributed and concurrent, design activities becomes enormously complex, so CSCW technologies may well be indispensable if cooperative design is to succeed. When applying CSCW approaches to cooperative design, researchers address the issue of supporting cooperation among designers and other actors over distance by means of a series of shared display facilities. In addition, they explore different approaches to capturing “design rationale” and supporting “organizational memory” [4]. CSCW in Design requires research in several domains:
1. new computer technologies to increase the efficiency of communication;
2. reuse of research and design results from previous projects;
3. taking into account results and criticisms arising during the entire life cycle of designed products [5].
As the Web plays an increasingly important role in people’s work, including design work, it presents both an opportunity and a challenge for designers. The Semantic Web enables computers and people to work in cooperation and can greatly help people share and reuse knowledge on the Web. It can help CSCW in Design both in the communication processes of design and in sharing and reusing design knowledge. This paper studies how Semantic Web technologies can improve the current design process. It is organized as follows: Section 2 presents the basic technologies and ideas of the Semantic Web. Section 3 describes the representation of design on the Semantic Web. Section 4 shows the communication processes of design on the Semantic Web. Section 5 studies sharing and reusing design knowledge on the Semantic Web. Finally, Section 6 concludes.

2 The Semantic Web

The Semantic Web is a new form of Web content in which information is intended not only for human readers but also for processing by machines. The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation [6]. The most important technologies for developing the Semantic Web are the Extensible Markup Language (XML), the Resource Description Framework (RDF), and ontologies. Together they can express the semantics of information on the Web. XML data is formally structured and can be processed by machines. RDF expresses the meaning of data, so that machines know how to process the data automatically. Human language thrives on homonyms and synonyms, but these usually confuse machines. An ontology formally defines the relations among concepts and provides a set of inference rules; it can therefore deal with the problem of homonyms and synonyms. Ontologies are the basis for sharing and reusing knowledge on the Web. The results returned by current search engines are often full of useless and off-topic results. On the Semantic Web, people can retrieve pages that refer to a


D. Kang et al.

precise concept. Machines can even relate the information on a page to the associated knowledge structures and inference rules. Software agents can help people perform many complex tasks. They can exchange data with one another and find and use Web services automatically via ontologies. Both agents and services can describe themselves to others. Everyone can express new concepts and ideas that they invent with minimal effort. Knowledge is expressed in a unifying logical language that is meaningful to both software agents and people. Semantic Web technologies can be used to represent design in a unified and formal form that can be understood by both people and machines. These representations provide structures to capture and retrieve design knowledge, which will greatly help the communication processes in design projects. Designers may be most interested in the considerable advantages of sharing and reusing knowledge.
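The homonym and synonym problem discussed in this section can be illustrated with a toy lookup. This is a minimal sketch assuming a hypothetical lexicon that maps a (term, domain) pair to a concept URI, and pages annotated with concept URIs rather than raw keywords; real ontologies express such equivalences with formal axioms, and the `http://example.org/` URIs are placeholders.

```python
from __future__ import annotations

# Hypothetical lexicon: (term, domain) -> concept URI.
LEXICON = {
    ("car", "vehicles"): "http://example.org/onto#Automobile",
    ("automobile", "vehicles"): "http://example.org/onto#Automobile",  # synonym
    ("jaguar", "vehicles"): "http://example.org/onto#JaguarCar",       # homonym, sense 1
    ("jaguar", "animals"): "http://example.org/onto#JaguarCat",        # homonym, sense 2
}

# Pages annotated with concepts instead of raw keywords.
PAGES = {
    "page1": {"http://example.org/onto#Automobile"},
    "page2": {"http://example.org/onto#JaguarCat"},
    "page3": {"http://example.org/onto#JaguarCar", "http://example.org/onto#Automobile"},
}


def concept_of(term: str, domain: str) -> str | None:
    return LEXICON.get((term.lower(), domain))


def search(term: str, domain: str) -> list[str]:
    """Retrieve pages by concept, so synonyms match and homonyms do not collide."""
    concept = concept_of(term, domain)
    if concept is None:
        return []
    return sorted(page for page, concepts in PAGES.items() if concept in concepts)


print(search("car", "vehicles"))         # ['page1', 'page3']
print(search("automobile", "vehicles"))  # ['page1', 'page3']
print(search("jaguar", "animals"))       # ['page2']
```

A keyword engine would miss `page3` for "automobile" if the page only said "car", and would return the cat page for a car query; matching on concept URIs avoids both errors.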

3 Representations of Design

3.1 Current Representation of Design

Currently, design information is mainly in visual or natural-language form for human reading. This makes capturing, indexing, and reusing design knowledge heavy, time-consuming work for designers. Designs need to be represented formally in order to share and reuse them and to improve the design process. Different design projects often use different tools and systems, so knowledge is represented in different forms. Current techniques for representing design knowledge are often based on special vocabularies and forms, which makes communication, as well as sharing and reusing design results, very difficult. Another problem is the lack of a formal representation of requirements and functionality: different people may understand the same document differently. The same problem arises when describing designers’ intentions. The design process itself, including documents and discussions among designers, also needs to be represented. Currently, these data are often stored as largely unstructured natural-language text, pictures, audio, or video, and are hard to retrieve. We need to represent design, including its requirements, functionality, and process, in a unified and formal way that machines can process automatically.

3.2 Representation on the Semantic Web

Semantic Web techniques can be used to represent designs in a unified and formal way, and ontology is a capable tool for this. Designers can use a general ontology to capture design knowledge, no matter which specific CAD systems they are using. The ontology can describe the relations among the terms used by different designers, so designers can easily come to a common understanding via the ontology. Knowledge in older, informal forms can also be processed automatically by using RDF. Objects included in this general ontology may be parts, features, requirements, and constraints [7]. A part is a component of the artifact being designed. The structure of a


part is defined in terms of the hierarchy of its component parts. These parts describe the model of the artifact. Different kinds of features are associated with a part, e.g., geometrical, functional, assembly, mating, and physical features [8]. Relationships between parts, such as joints, constraints, and behaviors, also need to be represented. Parts, features, and other parameters and constraints can describe the artifact being designed. However, it is difficult to describe designers’ intentions, called design rationale [9, 10], especially the functionality of a design, which is an important part of design rationale. A conceptual framework is needed that enables systematic description of functional knowledge, so that designers can share conceptual engineering knowledge about functionality. The framework should consist of a categorization of functional knowledge and layered ontologies [11]. Requirements of a design include physical, structural, performance, functional, and cost requirements, among others [7]. Clients’ requirements are decomposed into requirements for the various sub-systems; analysis and design are driven by the decomposed requirements; finally, designers integrate the sub-systems to meet the customer requirements. A requirement ontology can provide an unambiguous and precise terminology that each designer can understand and use in describing requirements, and it can describe dependencies and relationships among the requirements to help with decomposition and integration. The most powerful ability of a requirement ontology is to check whether the functionality of a design meets the requirements. Clients and designers can reach a shared understanding by exchanging ontologies. Usually, requirement analysts build the requirement ontology on behalf of clients, because it would be discourteous to ask clients to state their requirements in a formal way.
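The decomposition-and-check idea can be sketched as a requirement tree. This is a hypothetical illustration, not the requirement ontology of [7]: each leaf requirement names the function that satisfies it, a composite requirement is met when all of its sub-requirements are met, and the requirement IDs and function names are invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Requirement:
    rid: str
    kind: str                            # e.g. "functional", "cost", "performance"
    needs_function: str | None = None    # leaf: the function that satisfies it
    parts: list[Requirement] = field(default_factory=list)  # decomposition


def unmet(req: Requirement, provided: set[str]) -> list[str]:
    """Return the IDs of leaf requirements that the design's functions do not satisfy."""
    if not req.parts:
        return [] if req.needs_function in provided else [req.rid]
    missing: list[str] = []
    for sub in req.parts:
        missing.extend(unmet(sub, provided))
    return missing


client_need = Requirement("R0", "functional", parts=[
    Requirement("R1", "functional", needs_function="lift_load"),
    Requirement("R2", "functional", parts=[
        Requirement("R2.1", "functional", needs_function="rotate_arm"),
        Requirement("R2.2", "functional", needs_function="brake_arm"),
    ]),
])

design_functions = {"lift_load", "rotate_arm"}
print(unmet(client_need, design_functions))  # ['R2.2']
```

Reporting the unmet leaves, rather than a single yes/no, points designers at exactly which decomposed requirement still needs work before the sub-systems are integrated.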

4 Communication

Cooperative design mainly includes three kinds of communication and coordination processes: between designers and clients, within design teams, and between designers and design environments.

4.1 Between Designers and Clients

Designers and clients need shared knowledge and artifacts for mutual understanding. Many clients do not know what they really need and are unfamiliar with the artifacts they want, so they cannot express their ideas accurately. The first step in design is therefore to have experts analyze the client’s current situation as precisely as possible and determine what the client really needs, through both informal and formal interviews. This leads to a common understanding of the real requirements. The experts then use a requirement ontology to represent the requirements in a form that other designers can understand. These requirements should be visualized or translated back into natural language and shown to the clients. They can also be processed by machines to find and reuse past design results that meet the requirements. When designers have completed a design, it is important to know whether the design meets the clients’ needs. Clients often require externalizations for mutual understanding rather than formal representations. They prefer to see visualized results


and trial prototypes of artifacts. Visual representations of information can be based on ontological classifications of that information [12]. Designers use the ontology to provide design information, which can then be visualized for clients. Experts are still needed to provide more intuitive presentations, such as VR spaces and rapid prototypes. These experts do not need to know the details of the design; they can obtain precise information via the ontology provided by the designers. Clients may then provide criticisms, and designers improve or redesign the product accordingly.

4.2 Within Design Teams

Most real tasks are done not by individuals but by groups of people, and members of such teams may have very different interests. This kind of communication mainly concerns managing task interdependencies, to deal with the strong interdependencies between design decisions. Communication within teams is usually informal and unstructured. There is no need to formalize it, because it is intended for people to understand. Many CSCW technologies increase the efficiency of communication. One main objective is to let people in different places communicate as if they were face to face, e.g., through virtual conferences, video telephones, blackboard systems, and VR organizations. The other objective is to help designers communicate across time, e.g., through email and workflow systems. Both need to make communication visual and lifelike using video, audio, and VR technologies. Semantic Web technologies can help communicators share knowledge more easily and precisely. Another important advantage is the ability to annotate the content of communication. Currently, video and audio records are difficult to organize and retrieve. On the Semantic Web, the information contained in these records can be captured by the communicators themselves or by machines analyzing speech, images, and text; XML and RDF can then be used to annotate the records and describe their main topics and synopses. We can then index the records and store them in a structured space for sharing and reuse.
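The annotate-then-index step can be sketched with plain (subject, predicate, object) triples standing in for an RDF store. This is an assumption-laden illustration: it borrows the Dublin Core term `dc:subject` for topics, uses a hypothetical `ex:synopsis` predicate for summaries, and the record URIs are placeholders.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def annotate(record_uri: str, topics: list[str], synopsis: str) -> list[Triple]:
    """Describe one meeting recording with topic and synopsis statements."""
    triples = [(record_uri, "dc:subject", topic) for topic in topics]
    triples.append((record_uri, "ex:synopsis", synopsis))  # ex:synopsis is hypothetical
    return triples


def build_topic_index(triples: list[Triple]) -> dict[str, list[str]]:
    """Map each topic to the sorted list of records annotated with it."""
    index = defaultdict(set)
    for s, p, o in triples:
        if p == "dc:subject":
            index[o].add(s)
    return {topic: sorted(records) for topic, records in index.items()}


store: list[Triple] = []
store += annotate("ex:meeting-01.mp4", ["gearbox", "cost"], "Agreed to switch gear supplier.")
store += annotate("ex:call-07.wav", ["gearbox"], "Reviewed tolerance of the output shaft.")

index = build_topic_index(store)
print(index["gearbox"])  # ['ex:call-07.wav', 'ex:meeting-01.mp4']
```

Once recordings carry topic statements, a question such as "which discussions touched the gearbox?" is answered from the index without replaying any audio or video.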

4.3 Between Designers and Environments

These records, as well as the documents and results of the design project, are stored in common spaces. They also serve as memories of design artifacts that can support indirect, long-term communication. They show the results of the designers’ distributed activities: designers can see their own contributions, the contributions of others, and their interactions, and they can share results and ideas in these information spaces. Designers should be able to access the spaces and add or retrieve information, while private information should be protected. Many technologies already manage the communication between designers and common spaces, such as telnet and FTP. There are two main problems in managing shared information spaces: one is indexation, that is, providing means that allow an individual to assign a publicly visible and permanent ‘pointer’ to each item so that other individuals can locate items relatively easily and reliably [1]; the other is the requirement of privacy and safety.


If the results and communication records have been recorded and structured with semantic information, designers can easily find the information they want and learn more about previous design processes to help their current design work. When these spaces are linked to the Web and share their information, they can become a treasury for designers all over the world. On the Semantic Web, the documents and records of previous design projects are well structured and meaningful to machines. Designers can search and process them easily and precisely with the help of agents. They no longer have to find a piece of information in tons of paper documents, or sift through the many useless results returned by current search engines. Privacy requirements can also be attached to the original data, and digital signature and encryption technologies can be used to establish trust and protect the data. Design environments also include the design tools used by designers. These tools, such as CAD systems or CSCW systems, are developed by different companies and may not be compatible; they have different user interfaces and data formats, and designers may need a long time to become familiar with a new design tool. The HCI (Human-Computer Interaction) problem is one of the central challenges for CSCW.

5 Sharing and Reusing Knowledge

An efficient way to accelerate the design process and reduce workload is to reuse the results of previous design projects. Knowledge sharing is a dream of all designers, who have long suffered from the difficulty of sharing conceptual knowledge about designs because of the lack of a rich common vocabulary. It is extremely difficult to share knowledge across different domains and representations. Sharing and reusing knowledge on the Web may greatly help design work: designers can share design experiences and methods, and share and reuse design results on the Web. Sharing and reusing knowledge poses six challenges: acquiring, modeling, retrieving, publishing, reusing, and maintaining knowledge [13]. Design experiences and methods are currently written mainly in natural language. The documents and records of design are also in different forms and representations, sometimes in multimedia form, which makes needed information difficult to find. RDF can be used to describe all resources on the Web. It identifies each resource, such as a text, a document, or a media file, by a URI, and then makes statements describing the attributes of the resources identified by those URIs and the relations among them. These statements help us search and manage the information. Designers also need domain-specific knowledge related to the current design work. Such knowledge is currently represented mainly using the specific terms of each domain, and most of it is hidden in natural-language text. A newcomer may spend much time searching for the information they need and may have trouble understanding what it means. The Semantic Web makes information retrieval on the Web easier. Domain knowledge is represented using domain ontologies, and the relations between the terms of two domains can be well defined. Users can share and reuse knowledge quickly and precisely via ontologies on the Semantic Web.
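The point that URIs plus statements make resources searchable can be illustrated with a tiny triple-pattern matcher in which `None` acts as a wildcard. This is a simplified stand-in for an RDF query language such as SPARQL, not an RDF library; the resource URIs and predicates are hypothetical.

```python
from __future__ import annotations

Triple = tuple[str, str, str]

# Hypothetical statements about design resources identified by URIs.
STATEMENTS: list[Triple] = [
    ("ex:doc/gear-spec.pdf", "ex:type", "ex:Specification"),
    ("ex:doc/gear-spec.pdf", "ex:describes", "ex:part/gear-42"),
    ("ex:video/review.mp4", "ex:type", "ex:MeetingRecord"),
    ("ex:video/review.mp4", "ex:discusses", "ex:part/gear-42"),
    ("ex:part/gear-42", "ex:partOf", "ex:product/hoist"),
]


def query(s: str | None = None, p: str | None = None, o: str | None = None,
          data: list[Triple] = STATEMENTS) -> list[Triple]:
    """Return statements matching the pattern; None matches anything."""
    return [
        t for t in data
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


def resources_about(target: str) -> list[str]:
    """Every resource that has some statement pointing at `target`."""
    return sorted({subj for subj, _, _ in query(o=target)})


print(resources_about("ex:part/gear-42"))
# ['ex:doc/gear-spec.pdf', 'ex:video/review.mp4']
print(query(p="ex:partOf"))
# [('ex:part/gear-42', 'ex:partOf', 'ex:product/hoist')]
```

A specification PDF and a meeting video are found together because both carry statements about the same part URI, even though they share no text.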
Everyone can publish results, ideas and concepts of design on the Semantic Web with minimal effort. Its unifying logical language will enable the knowledge to be progressively linked into a universal Web. The designs can be represented formally,


and computers can find and reuse them automatically. This will open up the knowledge to meaningful analysis by software agents and people in different domains, and will greatly increase the volume of knowledge on the Web available to all designers. How can reusable design results be found? Which parts should be reused? If many parts are suitable, which one is best to reuse? How should they be reused? These questions are currently answered mainly by designers themselves, which costs much time and manpower. Designers need to be skilled in reusing design knowledge and in preparing their own design solutions to facilitate reuse [14]. The results of previous designs are growing very quickly, which makes finding and reusing suitable results increasingly difficult. The general design ontology can represent requirements and design results in a formal way, so machines can determine whether a design meets its requirements. When designers start a new design project with some given requirements, they may ask agents to find design results from previous projects that meet one or more of these requirements. The search need not be limited to the projects of the same organization; it can cover design results from all over the world, if they are represented on the Web. Adding a suitable previous design to the current design artifact is a difficult process, and there may be many compatibility problems. Today, many industry standards and software methods help with reuse; for example, the COM mechanism in Windows helps programmers reuse software modules. On the Semantic Web, an artifact can even describe how to reuse itself by using a unified language, and designers can easily reuse it according to this information. There is no central maintenance system on the Web. Documents can be self-describing; homonymous resources and different versions of data can be easily managed on the Semantic Web, while most of the heavy maintenance work can be done by machines.
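The requirement-driven search for reusable designs can be sketched as ranking archived designs by how many of the new project's requirements each one satisfies. The archive entries and requirement IDs below are hypothetical, and a real agent would decide satisfaction by reasoning over the design ontology rather than by set membership as done here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivedDesign:
    uri: str
    satisfies: frozenset[str]  # requirement IDs this design is known to meet


def find_reusable(archive: list[ArchivedDesign], required: set[str],
                  min_hits: int = 1) -> list[tuple[str, list[str]]]:
    """Return (design URI, matched requirements) pairs, best matches first."""
    ranked = []
    for design in archive:
        hits = required & design.satisfies
        if len(hits) >= min_hits:
            ranked.append((design.uri, sorted(hits)))
    # More requirements met first; ties broken by URI for a stable order.
    ranked.sort(key=lambda item: (-len(item[1]), item[0]))
    return ranked


archive = [
    ArchivedDesign("orgA:hoist-v1", frozenset({"lift_500kg", "manual_brake"})),
    ArchivedDesign("orgB:crane-arm", frozenset({"lift_500kg", "rotate_360", "auto_brake"})),
    ArchivedDesign("orgC:winch", frozenset({"lift_100kg"})),
]
new_project = {"lift_500kg", "rotate_360"}
for uri, hits in find_reusable(archive, new_project):
    print(uri, hits)
# orgB:crane-arm ['lift_500kg', 'rotate_360']
# orgA:hoist-v1 ['lift_500kg']
```

The archive may span several organizations (`orgA`, `orgB`, ...), matching the idea that the search is not limited to one organization's own projects; `min_hits` lets designers ask for designs that meet "one or some" of the requirements.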
Sharing and reusing knowledge on the Web may greatly change the current way of design. Everyone may be both designer and client on the Web, and the design works are done by people all over the world.

6 Conclusion

Design of complex artifacts requires the cooperation of experts working in different domains. CSCW in Design explores the potential of computer technologies to support cooperative design, which is challenging and complex. The Web greatly extends the range of cooperation and provides growing knowledge to designers, but it requires new technologies to increase the efficiency of communication and knowledge sharing. The Semantic Web provides technologies such as XML, RDF, and ontologies to represent design in formal, structured, and unified forms. These cover not only the artifacts of design but also design rationale, including functions and requirements. The Semantic Web can also improve the efficiency of all communication and coordination processes in cooperative design, and it offers designers a more efficient way to share and reuse knowledge across the Web. It will greatly change the current way of design.


References

1. Schmidt, K.: Cooperative Design: Prospects for CSCW in Design. Design Sciences and Technology, Vol. 6, No. 2, 1998, pp. 5-18.
2. Klein, M., Sayama, H., Faratin, P., Bar-Yam, Y.: The Dynamics of Collaborative Design: Insights From Complex Systems and Negotiation Research. Journal of Concurrent Engineering: Research and Applications, 2003, in press.
3. Klein, M., Sayama, H., Faratin, P., Bar-Yam, Y.: A Complex Systems Perspective on Computer-Supported Collaborative Design Technology. Communications of the ACM, Vol. 45, No. 11, 2002, pp. 27-31.
4. Carstensen, P.H., Schmidt, K.: Computer Supported Cooperative Work: New Challenges to Systems Design. Handbook of Human Factors, Tokyo, 2002.
5. Chan, S., Ng, V., Lin, Z.: Guest Editors’ Introduction: Recent Developments in Computer Supported Cooperative Work in Design. International Journal of Computer Applications in Technology, Vol. 16, Nos. 2/3, 2002.
6. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American, Vol. 284, No. 5, 2001, pp. 34-43.
7. Lin, J., Fox, M.S., Bilgic, T.: A Requirement Ontology for Engineering Design. Concurrent Engineering: Research and Applications, Vol. 4, No. 4, 1996, pp. 279-291.
8. Dixon, J.R., Cunningham, J.J., Simmons, M.K.: Research in Designing with Features. Workshop on Intelligent CAD, Elsevier, 1987, pp. 137-148.
9. Kopena, J.: Assembly Representations for Design Repositories and the Semantic Web. Report, Geometric and Intelligent Computing Laboratory, Computer Science Department, Drexel University, 2002.
10. Hu, X., Pang, J., Pang, Y., Sun, W., Atwood, M., Regli, W.: Design Rationale: A Background Study. Report, Geometric and Intelligent Computing Laboratory, Computer Science Department, Drexel University, 2002.
11. Kitamura, Y., Mizoguchi, R.: An Ontological Schema for Sharing Conceptual Engineering Knowledge. International Workshop on Semantic Web Foundations and Application Technologies, 2003, in press.
12. Harmelen, F.V., Broekstra, J., Fluit, C., Horst, H., Kampman, A., Meer, J., Sabou, M.: Ontology-based Information Visualization. Proceedings of the 15th International Conference on Information Visualization, London, 2001, pp. 546-554.
13. Troxler, P.: Knowledge Technologies in Engineering Design. Proceedings of the 7th International Design Conference, Dubrovnik, 2002, pp. 429-434.
14. Zdrahal, Z., Mulholland, P., Domingue, J., Hatala, M.: Sharing Engineering Design Knowledge in a Distributed Environment. Journal of Behaviour and Information Technology, Vol. 19, No. 3, 2000, pp. 189-200.

SIMON: A Multi-strategy Classification Approach Resolving Ontology Heterogeneity – The P2P Meets the Semantic Web*

Leyun Pan, Liang Zhang, and Fanyuan Ma

Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200030 Shanghai, China
{pan-ly, zhangliang}@cs.sjtu.edu.cn

Abstract. Semantic Web technology is seen as key to realizing peer-to-peer resource discovery and service combination in ubiquitous communication environments. However, in a peer-to-peer environment, we must face the situation where individual peers maintain their own view of the domain in terms of the organization of their local information sources. Ontology heterogeneity among individual peers is becoming an increasingly important issue. In this paper, we propose a multi-strategy learning approach to resolve this problem. We describe the SIMON (Semantic Interoperation by Matching between ONtologies) system, which applies multiple classification methods to learn the matching between ontologies. We use a general statistical classification method to discover category features in data instances and use the first-order learning algorithm FOIL to exploit the semantic relations among data instances. The system then combines the predictions of the individual methods using our matching committee rule, called the Best Outstanding Champion. Experiments show that SIMON achieves high accuracy on a real-world domain.

1 Introduction

Today’s P2P solutions support only limited update, search, and retrieval functionality, which makes current P2P systems unsuitable for knowledge sharing. Metadata plays a central role in providing search techniques that go beyond string matching. Ontology-based metadata facilitates access to domain knowledge and also enables the construction of semantic queries [1]. Existing approaches to ontology-based information access almost always assume a setting in which information providers share an ontology that is used to access the information. However, we instead face a situation where individual peers maintain their own view of the domain in terms of the organization of their local file systems and other information sources. Enforcing the use of a global ontology in such an environment would mean giving up the benefits of the P2P approach.

* Research described in this paper is supported by The Science & Technology Committee of Shanghai Municipality Key Project Grant 02DJ14045 and by The Science & Technology Committee of Shanghai Municipality Key Technologies R&D Project Grant 03dz15027.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 744–751, 2004. © Springer-Verlag Berlin Heidelberg 2004

SIMON: A Multi-strategy Classification Approach


We can consider the process of addressing semantic heterogeneity as a process of ontology matching (ontology mapping) [2]. Matching processes typically involve analyzing the data instances associated with ontologies and comparing them to determine the correspondences among concepts. Given two ontologies in the same domain, we can find, for each concept node in one ontology, the most similar concept node in the other. At Internet scale, however, finding such mappings manually is tedious, error-prone, and clearly infeasible, and it cannot satisfy the need for online ontology exchange between two peers that do not agree on an ontology. Hence, we need approaches that assist a (semi-)automatic ontology matching process. In this paper, we discuss the use of the data instances associated with an ontology for addressing semantic heterogeneity. We propose the SIMON (Semantic Interoperation by Matching between ONtologies) system, which applies multiple classification methods to learn the matching between a pair of ontologies that are homogeneous and whose elements overlap significantly. Given a source ontology B and a target ontology A, SIMON finds, for each concept node in the target ontology A, the most similar concept node in the source ontology B. SIMON treats ontology A and its data instances as the learning resource: all concept nodes in A are the classification categories, and the data instances of each concept are the labeled learning samples of a classification process. The data instances of the concept nodes in ontology B are unseen samples, which SIMON classifies into the categories of ontology A using the classifiers trained on A. SIMON uses multiple learning strategies, namely multiple classifiers, each of which exploits a different type of information, either in the data instances themselves or in the semantic relations among them. With an appropriate matching committee method, the combination yields better results than any single classifier.
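The classification-based matching scheme just described can be sketched in Python. This is a minimal illustration under our own assumptions, not SIMON's implementation: instances are represented as sets of words, a toy word-overlap classifier (the hypothetical `make_overlap_classifier`) stands in for the learned classifiers trained on the target ontology A, and each node of the source ontology B is matched to the target category that receives most of its instances.

```python
from collections import Counter

def make_overlap_classifier(target_instances):
    """Hypothetical stand-in classifier: picks the target category whose
    training vocabulary shares the most words with an instance."""
    vocab = {cat: set().union(*insts) for cat, insts in target_instances.items()}

    def classify(words):
        return max(vocab, key=lambda cat: len(words & vocab[cat]))

    return classify

def match_ontologies(source_instances, classify):
    """Classify every instance of each source node with a classifier trained on
    the target ontology; match the node to the most frequent target category."""
    matches = {}
    for node, instances in source_instances.items():
        distribution = Counter(classify(words) for words in instances)
        matches[node] = distribution.most_common(1)[0][0]
    return matches

# Toy usage with hypothetical instances (target A = IMDB-like, source B = Allmovie-like).
target = {"MainMovieInfo": [{"minute", "release"}], "MoviePerson": [{"birth", "actor"}]}
source = {"Movie": [{"minute"}, {"release", "drama"}, {"birth"}],
          "Person": [{"birth"}, {"actor", "birth"}]}
print(match_ontologies(source, make_overlap_classifier(target)))
```

The key point is that the unit of matching is a whole category of source instances, so a few misclassified instances (here the third "Movie" instance) do not change the match.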
This paper is organized as follows. Section 2 gives an overview of the ontology matching system. Section 3 discusses multi-strategy classification for ontology matching. Section 4 presents the experimental results obtained with the SIMON system. Section 5 reviews related work, and Section 6 gives conclusions and future work.

2 Overview of the Ontology Matching System

The ontology matching system is trained to compare two ontologies and to find the correspondences among their concept nodes. An example of such a task is illustrated in Fig. 1 and Fig. 2, which show two ontologies of movie databases. When a software agent wants to collect information about movies, it accesses a P2P movie system. The movie information on individual peers is marked up using an ontology such as the one in Fig. 1 or Fig. 2. Here the data are organized into a hierarchical structure that includes movie, person, company, awards and so on. Movies have attributes such as title, language, cast & crew, production company, genre and so on. Some classes are linked to each other by attributes, shown in italics in the figures. However, because each peer may use a different ontology, it is difficult for an agent that masters only one ontology to integrate all the data completely. For example, the agent may assume that “Movie” in Allmovie is equivalent to “Movie” in IMDB. In fact, however, “Movie” in IMDB is just an empty ontology node, and “MainMovieInfo” in IMDB is the node most similar to “Movie” in Allmovie. Similar mismatches may occur between “MoviePerson” and “Person”, “GenreInstance” and “Genre”, and “Awards and Nominations” and “Awards”.

L. Pan, L. Zhang, and F. Ma

Fig. 1. Ontology of movie database IMDB

Fig. 2. Ontology of movie database Allmovie

SIMON uses multi-strategy learning methods, including both statistical and first-order learning techniques. Each base learner exploits a particular type of information in the training instances to build matching hypotheses. We use a statistical bag-of-words approach to classify pure-text instances. In addition, the relations among concepts can help to learn the classifier. Given the predictions of the individual methods, the system combines their outcomes using our matching committee rule, called the Best Outstanding Champion, which is a weighted voting committee. In this way, we can achieve higher matching accuracy than with any single base classifier alone.

3 Multi-strategy Learning for Ontology Matching

3.1 Statistical Text Classification

One of the methods that we use for text classification is naive Bayes, a probabilistic model that ignores word order and naively assumes that the presence of each word in a document is conditionally independent of all other words in the document, given its class. Naive Bayes for text classification can be formulated as follows. Given a set of classes $C$ and a document consisting of $k$ words $w_1, \dots, w_k$, we classify the document as a member of the class $c^*$ that is most probable, given the words in the document:

$$c^* = \arg\max_{c \in C} P(c \mid w_1, \dots, w_k) \tag{1}$$

$P(c \mid w_1, \dots, w_k)$ can be transformed into a computable expression by applying Bayes' rule (Eq. 2); rewriting the expression using the product rule and dropping the denominator, since this term is constant across all classes (Eq. 3); and assuming that words are independent of each other (Eq. 4):

$$c^* = \arg\max_{c \in C} \frac{P(c)\, P(w_1, \dots, w_k \mid c)}{P(w_1, \dots, w_k)} \tag{2}$$

$$c^* = \arg\max_{c \in C} P(c) \prod_{i=1}^{k} P(w_i \mid c, w_1, \dots, w_{i-1}) \tag{3}$$

$$c^* \approx \arg\max_{c \in C} P(c) \prod_{i=1}^{k} P(w_i \mid c) \tag{4}$$

$P(c)$ is estimated as the portion of training instances that belong to $c$. So a key step in implementing naive Bayes is estimating the word probabilities $P(w_i \mid c)$. We use Witten-Bell smoothing [3], which depends on the relationship between the number of unique words and the total number of word occurrences in the training data for the class: if most of the word occurrences are unique words, the prior is stronger; if words are often repeated, the prior is weaker.
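As an illustration, the Python sketch below implements multinomial naive Bayes following Eq. (4), scoring in log space to avoid numerical underflow. The paper does not give its exact smoothing formula, so we assume one common unigram form of Witten-Bell smoothing: with $N_c$ word occurrences and $T_c$ distinct words seen in class $c$, and $V$ words in the training vocabulary, a seen word gets $\text{count}(w, c)/(N_c + T_c)$, and the remaining mass $T_c/(N_c + T_c)$ is shared evenly by the $V - T_c$ vocabulary words unseen in $c$. This reproduces the behaviour described above: the more unique words a class has, the more mass goes to unseen words. The class and method names are our own.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes (Eq. 4) with Witten-Bell smoothed word probabilities."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # used for P(c)
        self.n_docs = len(labels)
        self.word_counts = defaultdict(Counter)      # class -> word -> count
        for words, c in zip(docs, labels):
            self.word_counts[c].update(words)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def word_prob(self, w, c):
        """Witten-Bell (assumed unigram form) for a vocabulary word w: seen words get
        count / (N_c + T_c); the mass T_c / (N_c + T_c) is shared evenly by the
        V - T_c vocabulary words never seen in class c."""
        counts = self.word_counts[c]
        n_c = sum(counts.values())   # word occurrences in class c
        t_c = len(counts)            # distinct words seen in class c
        if counts[w] > 0:
            return counts[w] / (n_c + t_c)
        return t_c / ((n_c + t_c) * (len(self.vocab) - t_c))

    def predict(self, words):
        """argmax_c log P(c) + sum_i log P(w_i | c); out-of-vocabulary words are skipped."""
        def score(c):
            s = math.log(self.class_counts[c] / self.n_docs)
            for w in words:
                if w in self.vocab:
                    s += math.log(self.word_prob(w, c))
            return s
        return max(self.class_counts, key=score)
```

For example, after training on a few "Movie" instances containing minute and release and a few "Person" instances containing born, `predict(["release", "minute"])` favours "Movie", because the unseen-word probability of those words in "Person" is much smaller than their smoothed counts in "Movie".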

3.2 First-Order Text Classification

As mentioned above, the data instances under an ontology form richly structured datasets, best described by a graph whose nodes are objects and whose edges are links or relations between objects. The classification methods discussed in the previous section consider only the words in a single node of the graph. They cannot learn models that take into account features such as the pattern of connectivity around a given instance or the words occurring in the instances of neighboring nodes. For example, we can learn a rule such as “A data instance belongs to movie if it contains the words minute and release and is linked to an instance that contains the word birth.” Clearly, rules of this type, which are able to represent general characteristics of a graph, can be exploited to improve the predictive accuracy of the learned models. Such rules can be represented concisely in a first-order representation, so we can classify text instances using a learner that is able to induce first-order rules. The learning algorithm used in our system is Quinlan’s FOIL algorithm [4]. FOIL is a greedy covering algorithm for learning function-free Horn clause definitions of a relation in terms of itself and other relations. FOIL induces each Horn clause by beginning with an empty tail and using hill-climbing search to add literals to the tail until the clause covers only positive instances. When FOIL is used as a classification method, the input file for learning a category consists of the following relations:
1. category(instance): the target relation, which is learned from the other, background relations. Each learned target relation represents a classification rule for a category.
2. has_word(instance): this set of relations indicates which words occur in which instances. There is one such relation for each word, and its examples are the instances in which that word occurs.
3. linkto(instance, instance): this relation represents the semantic relations between two data instances.


We apply FOIL to learn a separate set of clauses for every concept node in the ontology. When classifying the data instances of the other ontology, if an instance does not match any clause of any category, we assign it to an “other” category.
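The relational setting above can be illustrated with a deliberately simplified FOIL-style learner in Python. It is a sketch under strong assumptions, not Quinlan's FOIL: instead of searching over variables and arbitrary relations, it considers only two literal templates, has_word(X, w) and linkto(X, Y) ∧ has_word(Y, w), which suffice to express rules like the movie example above. As in FOIL, it covers the positive instances greedily, adds the literal with the highest FOIL information gain until a clause covers no negative instances, and then removes the covered positives. An instance that satisfies no clause of any category goes to the "other" category; when several categories match, the sketch simply takes the first, which is our assumption. All function names and the toy data are hypothetical.

```python
import math

def holds(lit, inst, words, links):
    """("has", w): has_word(X, w); ("link_has", w): linkto(X, Y) and has_word(Y, w) for some Y."""
    kind, w = lit
    if kind == "has":
        return w in words[inst]
    return any(w in words[y] for y in links.get(inst, ()))

def foil_gain(lit, pos, neg, words, links):
    """FOIL information gain; with one-variable clauses the bindings are just instances."""
    p1 = sum(holds(lit, i, words, links) for i in pos)
    n1 = sum(holds(lit, i, words, links) for i in neg)
    if p1 == 0:
        return -math.inf
    p0, n0 = len(pos), len(neg)
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

def learn_rules(pos, neg, words, links, max_literals=3):
    """Greedy covering: grow one clause at a time until it excludes all negatives."""
    vocab = sorted({w for ws in words.values() for w in ws})
    candidates = [("has", w) for w in vocab] + [("link_has", w) for w in vocab]
    rules, uncovered = [], set(pos)
    while uncovered:
        clause, p, n = [], set(uncovered), set(neg)
        while n and len(clause) < max_literals:
            best = max(candidates, key=lambda lit: foil_gain(lit, p, n, words, links))
            if foil_gain(best, p, n, words, links) <= 0:
                break
            clause.append(best)
            p = {i for i in p if holds(best, i, words, links)}
            n = {i for i in n if holds(best, i, words, links)}
        if not clause or n:   # no clause free of negatives could be built
            break
        rules.append(clause)
        uncovered -= p
    return rules

def classify(inst, rules_by_category, words, links):
    """First category with a satisfied clause wins; otherwise "other"."""
    for category, rules in rules_by_category.items():
        if any(all(holds(l, inst, words, links) for l in clause) for clause in rules):
            return category
    return "other"
```

On toy data where movie instances link to person instances containing birth, the learner finds the single-literal clause `[("link_has", "birth")]` for the movie category, i.e. a rule that depends on a neighbouring instance rather than on the instance's own words.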

3.3 Evaluation of Classifiers for Matching and Matching Committees

The method of committees (a.k.a. ensembles) is based on the idea that, given a task that requires expert knowledge to perform, k experts may be better than one if their individual judgments are appropriately combined [7]. To obtain the matching result, there are two different matching committee methods, depending on whether a classifier committee is used:
microcommittees: The system first uses a classifier committee, which negotiates the category of each unseen data instance. The system then makes the matching decision on the basis of this single classification result.
macrocommittees: The system does not use a classifier committee. Each classifier individually decides the category of each unseen data instance, and the system then negotiates the matching on the basis of the multiple classification results.
To optimize the combined result, we would generally like to give each committee member a weight reflecting its expected relative effectiveness. There are, however, some differences between the evaluation of text classification and that of ontology matching. In text classification, the initial corpus can easily be split into two sets: a training (and validation) set and a test set. In ontology matching, however, the boundaries among the training set, the test set and the set of unseen data instances are not obvious. First, there is no test set in the ontology matching process: the instances of the target ontology are regarded as the training set, and the instances of the source ontology as unseen samples. Second, the unseen data instances are not completely ‘unseen’, because the instances of the source ontology all have labels; we just do not know what each label means. Because there is no test set, it is difficult to evaluate the classifiers in microcommittees. Microcommittees can only rely on prior experience and evaluate the classifier weights manually, as was done in [2].
We adopt macrocommittees in our ontology matching system. Note that the instances of the source ontology are only relatively “unseen”. When these instances are classified, the unit is not a single instance but a whole category, so we can observe how each category of instances is distributed over the categories of the target ontology. Each classifier finds a champion, the target category that gains the maximal similarity degree. Some of these champions have an obvious predominance, while others lead the other nodes only slightly. Generally, the more outstanding a champion is, the more we trust it. Thus we can adopt the champion's degree of outstandingness as the measure of each classifier's effectiveness. The degree of outstandingness can be observed directly from the classification results and need not be adjusted or optimized on a validation set. We propose a matching committee rule called the Best Outstanding Champion: the system chooses as the final champion the candidate with the maximal accumulated degree of outstandingness among the champion candidates. The method can be regarded as a weighted voting committee. Each classifier casts one vote for the most similar node according to its own judgment, but each vote carries a different weight, measured by the degree of outstandingness of that champion. We define the degree of outstandingness as the ratio of the champion's similarity degree to that of the runner-up node.
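The Best Outstanding Champion rule can be sketched in a few lines of Python. We assume that each classifier's output for one source category is a mapping from target categories to the number of source instances assigned to them, used here as the similarity degree; the function names, the `EPS` guard for a runner-up with zero instances, and the counts in the usage lines are hypothetical, not data from Table 1.

```python
from collections import defaultdict

EPS = 1e-9  # assumed guard for a runner-up that received no instances

def outstandingness(distribution):
    """Return (champion, degree of outstandingness) for one classifier's output,
    where the degree is the champion's score divided by the runner-up's score."""
    ranked = sorted(distribution.items(), key=lambda kv: kv[1], reverse=True)
    champion, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return champion, top / max(runner_up, EPS)

def best_outstanding_champion(distributions):
    """Weighted vote: each classifier votes for its champion with a weight equal to
    its degree of outstandingness; the candidate with the largest total wins."""
    votes = defaultdict(float)
    for distribution in distributions:
        champion, weight = outstandingness(distribution)
        votes[champion] += weight
    return max(votes, key=votes.get)

# Hypothetical counts for one source category:
statistical = {"Actor": 30, "Director": 33, "Movie": 12}
first_order = {"Actor": 50, "Director": 20, "Movie": 5}
print(best_outstanding_champion([statistical, first_order]))  # Actor
```

Here the statistical classifier's champion (Director) leads by a ratio of only 1.1, while the first-order classifier's champion (Actor) leads by 2.5, so the committee follows the more outstanding vote. No validation set is needed, because the weights come directly from the classification results.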


4 Experiments

We take movies as our experimental domain. As experimental objects we chose the three movie websites ranked highest in the Google Directory category Arts > Movies > Databases: IMDB, AllMovie and Rotten Tomatoes. We manually matched the three ontologies to each other in order to measure matching accuracy, defined as the percentage of the manual mappings that the system predicted correctly. We first collected about 150 movies from each website. We then exchanged the keywords and collected 300 more movies, so each ontology holds data instances for about 400 movies after duplicates are removed. We use a three-fold cross-matching methodology to evaluate our algorithms: we conducted three runs, and in each run we performed two experiments that map a pair of ontologies to each other. In each experiment, we train the classifiers on the data instances of the target ontology and classify the data instances of the source ontology to find the matching pairs from the source ontology to the target ontology.
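The evaluation protocol can be made concrete with a short Python sketch: matching accuracy as the percentage of manual mappings predicted correctly, and three-fold cross-matching as all ordered (source, target) pairs of the three sites. The helper names and the sample mappings in the usage lines are hypothetical illustrations built from the node names mentioned in Section 2 and Table 1, not the paper's actual mapping set.

```python
from itertools import permutations

SITES = ["IMDB", "AllMovie", "RottenTomatoes"]

def matching_accuracy(predicted, manual):
    """Percentage of the manual mappings (source node -> target node) predicted correctly."""
    correct = sum(1 for src, tgt in manual.items() if predicted.get(src) == tgt)
    return 100.0 * correct / len(manual)

def experiment_plan(sites):
    """Three runs, one per pair of sites; each run matches the pair in both directions."""
    return list(permutations(sites, 2))  # ordered (source, target) pairs

# Hypothetical AllMovie -> IMDB mappings:
manual = {"Player": "Actor", "Movie": "MainMovieInfo",
          "Person": "MoviePerson", "Genre": "GenreInstance"}
predicted = {"Player": "Director", "Movie": "MainMovieInfo", "Person": "MoviePerson"}
print(matching_accuracy(predicted, manual))  # 50.0
print(len(experiment_plan(SITES)))           # 6
```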

Table 1 shows the classification result matrices for some of the categories in the Allmovie-IMDB experiment, for the statistical classifier and the First-Order classifier (the numbers in parentheses are the results of the First-Order classifier). Each column of the matrix represents one category of the source ontology Allmovie and shows how the instances of this category are classified into the categories of the target ontology IMDB. Boldface indicates the leading candidate in each column. These matrices illustrate several interesting results. First, for most classes, the coverage of the champion is high enough for a matching judgment. For example, 63% of the Movie column in the statistical classifier and 56% of the Player column in the First-Order classifier are correctly classified. Second, there are notable exceptions to this trend: Player and Director in the statistical classifier, and Movie and Person in the First-Order classifier. Based on the Player column of the statistical classifier, the system would make a wrong matching decision, matching Player in AllMovie to Director rather than to Actor in IMDB. In the other exceptional columns, the first and second candidates are so close that we cannot fully trust matching results based on these classification results. The low coverage of the champion for Player and Director is explained by the nature of these categories: both lack distinctive feature properties.


For this reason, many instances of these two categories are classified into various other categories. However, our First-Order classifier can repair this shortcoming. By mining the information in neighboring instances (awards and nominations), it learns rules for the two categories and classifies most of their instances into the proper categories, because a Player often wins best-actor awards, and vice versa. However, neighboring instances do not always provide correct evidence for classification. The Movie and Person columns in Table 1 illustrate this situation: because many data instances of these two categories link to each other, the effectiveness of the learned rules decreases. Fortunately, the statistical classifier produces good classification results for these two categories. By using our matching committee rule, we can easily integrate the better classification results of both classifiers. After calculating and comparing the degrees of outstandingness, we place more trust in the statistical classifier's matching results for Movie and Person, and in the First-Order classifier's results for Player and Director.

Fig. 3. Ontology matching accuracy

Fig. 3 shows the results of the three runs, i.e., six experiments. In each run we match two ontologies to each other, and the two results of a run differ only slightly. The three bars for each experiment represent the matching accuracy produced by (1) the statistical learner alone, (2) the First-Order learner alone, and (3) the matching committee combining the two learners.

5 Related Works

Several works on ontology matching using data instances are related to our system. In [2], several learners classify the data instances, and a further component, the Relaxation Labeler, searches for the mapping configuration that best satisfies the given domain constraints and heuristic knowledge. In contrast, automated text classification is the core of our system. We focus on fully mining the data instances for automated classification and ontology matching. By constructing the classification samples according to the feature property set and exploiting the classification features within and among data instances, we can make the fullest use of text classification methods.


Furthermore, regarding the combination of multiple learning strategies, [2] uses microcommittees and evaluates the classifier weights manually, whereas our system adopts the degree of outstandingness, which can be computed from the classification results, as the weights of the classifiers. Without using any domain constraints or heuristic knowledge, our system automatically achieves matching accuracy similar to that in [2]. [5] also compares ontologies using similarity measures, but computes the similarity between lexical entries. [6] describes the use of the FOIL algorithm for classification and extraction in constructing knowledge bases from the web.

6 Conclusions

The completely distributed nature and high degree of autonomy of individual peers in a P2P system bring new challenges for the use of semantic descriptions. We propose a multi-strategy learning approach for resolving ontology heterogeneity in P2P systems. In this paper, we introduced the SIMON system and described its key techniques. We take movies as our experimental domain and extract the ontologies and data instances from three different movie database websites. We use a general statistical classification method to discover category features in data instances and the first-order learning algorithm FOIL to exploit the semantic relations among data instances. The system combines their outcomes using our matching committee rule, the Best Outstanding Champion. A series of experimental results shows that our approach achieves higher accuracy on a real-world domain than either base classifier alone.

References

1. J. Broekstra, M. Ehrig, P. Haase. A Metadata Model for Semantics-Based Peer-to-Peer Systems. In Proceedings of SemPGRID ’03, 1st Workshop on Semantics in Peer-to-Peer and Grid Computing.
2. A. Doan, J. Madhavan, P. Domingos, A. Halevy. Learning to Map between Ontologies on the Semantic Web. In Proceedings of the World Wide Web Conference (WWW-2002).
3. I. H. Witten, T. C. Bell. The zero-frequency problem: Estimating the probabilities of novel events in text compression. IEEE Transactions on Information Theory, 37(4), July 1991.
4. J. R. Quinlan, R. M. Cameron-Jones. FOIL: A midterm report. In Proceedings of the European Conference on Machine Learning, pages 3-20, Vienna, Austria, 1993.
5. A. Maedche, S. Staab. Comparing Ontologies - Similarity Measures and a Comparison Study. Internal Report No. 408, Institute AIFB, University of Karlsruhe, March 2001.
6. M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell. Learning to Construct Knowledge Bases from the World Wide Web. Artificial Intelligence, Elsevier, 1999.
7. F. Sebastiani. Machine Learning in Automated Text Categorization. ACM Computing Surveys, Vol. 34, No. 1, March 2002.

SkyEyes: A Semantic Browser for the KB-Grid

Yuxin Mao, Zhaohui Wu, and Huajun Chen

Grid Computing Lab, College of Computer Science, Zhejiang University, Hangzhou 310027, China
{maoyx, wzh, huajunsir}@zju.edu.cn

Abstract. KB-Grid was introduced for publishing, sharing and utilizing an enormous amount of knowledge base resources on the Semantic Web. This paper proposes a generic architecture of Semantic Browser for KB-Grid. Semantic Browser is a widely adaptable and expandable client to the Semantic Web that provides users with a series of functions, including Semantic Browse, Semantic Query and so on. We introduce the key techniques used to implement a prototype Semantic Browser, called SkyEyes. An application of SkyEyes on Traditional Chinese Medicine (TCM) is also described in detail.

1 Introduction

The emergence of the Semantic Web [1] will result in an enormous number of knowledge base (KB) resources distributed across the web. In such a setting, we must face the challenges of sharing, utilizing and managing knowledge on a huge scale, and traditional web architecture seems insufficient to meet these requirements. Since Grid technologies can integrate and coordinate resources among users securely and without conflict, we propose a generic model of Semantic Browser based on the basic ideas of the Grid and the Semantic Web, and implement a prototype Semantic Browser called SkyEyes. The following two subsections introduce the background and some related work. We then propose a generic architecture of Semantic Browser for KB-Grid and discuss the key techniques used to implement the prototype, SkyEyes. Next, an application of SkyEyes on Traditional Chinese Medicine (TCM) is described in detail. Finally, we briefly summarize our work and outline future work.

1.1 Background

The Internet has grown at a startling rate and provides us with a large amount of information. To address the problems of publishing, sharing and utilizing web information, the Knowledge Base Grid (KB-Grid) [2] was introduced. KB-Grid is a project being developed by the Grid Computing Lab of Zhejiang University. It suggests a paradigm that emphasizes how to organize, publish, discover, utilize, and manage web KB resources. In KB-Grid, distributed knowledge is represented in lightweight ontology languages such as RDF(S) [3].

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 752–759, 2004. © Springer-Verlag Berlin Heidelberg 2004


In such a setting, traditional web browsers are of little use, and a dedicated browser for KB-Grid needs to be developed. SkyEyes is such a new type of Semantic Browser, aimed at browsing, querying, managing and updating knowledge from distributed KBs in KB-Grid.

1.2 Related Work

A variety of research and applications are related to our work. Ideagraph [4] is an easy-to-use personal knowledge management tool for creating, managing, editing and browsing a personal KB. However, it is a local tool for personal use rather than a distributed application. IsaViz [5] is a visual environment for creating and browsing RDF models. It uses the GraphViz library to display an RDF model as a bitmap, which lacks proper layout, so its interaction and visual effect are unsatisfactory. OntoRama [6] is a prototype ontology browser that takes RDF/XML as its input format and models an ontology as a configurable graph. However, OntoRama is confined to browsing and display and does not yet support querying. There is still no Semantic Browser in the real sense. Most of these tools still aim at visualizing or browsing ontologies or knowledge and lack more intelligent functions such as Semantic Query and reasoning. Besides, many of them are only for local use rather than for a distributed environment. The application background of our work is the TCM Information Grid, and our goal is to build an Open Knowledge Service Architecture that provides a wide range of knowledge services. The idea of SkyEyes is to build a widely adaptable and expandable intelligent client to the Semantic Web that provides more intelligent functions. The immediate application of SkyEyes is browsing and querying the TCM ontology.

2 Overview

Semantic Browser works as an intelligent client to the Semantic Web and serves as the interface between KB-Grid and its users. Any KB-Grid user can publish, browse, query, manage and utilize knowledge via this browser. Here we propose a generic architecture of Semantic Browser for KB-Grid, shown in Fig. 1.

Fig. 1. A generic architecture of Semantic Browser


2.1 Knowledge Server

KB-Grid consists of many decentralized KB-Grid nodes, each of which may include several KBs. KB-Grid nodes exchange knowledge and deliver knowledge services through Grid Service interfaces. These nodes work collectively as a super knowledge server, and the inner structure of KB-Grid is transparent to clients, which interact with the knowledge server through Grid Service. A meta-information register center is set up to coordinate KB resources. A shared ontology described in RDF(S) is stored in the register center and serves as an index for the distributed KBs.

2.2 Semantic Browser Plugins

Semantic Browser remotely accesses knowledge services through Grid Service. For each type of knowledge service, we develop a corresponding plugin, which is an independent module in Semantic Browser.

Service Discovery Plugin. The knowledge server dynamically delivers various knowledge services, and the Service Discovery plugin accesses Service Discovery services to obtain meta-information about them.

Semantic Browse Plugin. Semantic Browse visualizes explicitly described concepts, their instances and the relationships among them as semantic graphs, and assists users in browsing semantic information along semantic links. The Semantic Browse plugin accesses Semantic Browse services to carry out Semantic Browse.

Semantic Query Plugin. When KB-Grid becomes very large, or when users are not familiar with the structure of the knowledge, they had better query the knowledge instead of browsing it. Semantic Query queries semantic information or knowledge along semantic links. The Semantic Query plugin accesses Semantic Query services to query semantic information from distributed KBs and optimizes the query results.

Knowledge Management Plugin. The Knowledge Management plugin accesses Knowledge Management services to manage both local and remote knowledge.

Reasoning Plugin. The Reasoning plugin accesses reasoning services to perform reasoning based on domain ontology to solve practical problems. Users can dynamically choose a rule set and a case base according to the specific problem.

In addition, SkyEyes reserves slots for extended plugins to access knowledge services that the knowledge server may deliver in the future.

2.3 Intelligent Controller

Since each plugin implements only a single function, Semantic Browser needs an intelligent controller to combine the plugins into a whole. The intelligent controller is the kernel of Semantic Browser: it coordinates and schedules the various plugins so that each plugin accesses the appropriate services of KB-Grid.


2.4 SGL-Parser and SG-Factory

Semantic Browser should display a uniform semantic graph for semantic information in different formats, such as RDF(S), XML and OWL. Moreover, making semantic graphs look clear without loss of semantics requires taking more into account than the raw semantic information. Since it is hard to draw semantic graphs directly from RDF or other existing languages, we developed the Semantic Graph Language (SGL) for displaying semantic graphs. The semantic information acquired from the server is translated uniformly into SGL by the Semantic Browser plugins. The SGL-Parser reads and parses SGL, and the SG-Factory produces a uniform, standard semantic graph regardless of the original format of the semantic information.

3 Implementation and Key Techniques

Following the generic architecture of Semantic Browser, we have developed a prototype Semantic Browser, called SkyEyes, and implemented two major functions of Semantic Browser: Semantic Browse and Semantic Query. As Fig. 2 shows, the user interface is similar to that of traditional web browsers, so ordinary users can operate it easily. However, to solve problems with SkyEyes, users need sufficient domain knowledge. SkyEyes is implemented in Java, so it is portable and can be used in different environments. Several key techniques are needed to implement SkyEyes.

Fig. 2. SkyEyes: a Semantic Browser

3.1 Expandable Plugin Mechanism

SkyEyes is a lightweight client that calls special plugins to access remote knowledge services to solve problems. As the scale of knowledge increases or users' requirements change, the knowledge services of KB-Grid may be changed and delivered dynamically. The expandable plugin mechanism allows SkyEyes to extend its functions simply by adding or updating plugins, without modifying its code or structure. In this way, users can even customize their own browsers by subscribing to or unsubscribing from services as they wish.
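SkyEyes itself is written in Java, and the paper does not give its plugin interface. Purely as an illustration of the mechanism described in Sections 2.2, 2.3 and 3.1, the Python sketch below models the intelligent controller as a registry that maps a service type to a plugin callable. Subscribing or unsubscribing a service changes the browser's functions without any change to the controller's code. The class, method and service names are hypothetical.

```python
from typing import Callable, Dict, List

Plugin = Callable[[dict], dict]  # hypothetical plugin signature: request -> semantic information

class IntelligentController:
    """Hypothetical sketch: routes each request to the plugin subscribed for its service type."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def subscribe(self, service_type: str, plugin: Plugin) -> None:
        """Add or update the plugin for a service type (no controller code changes needed)."""
        self._plugins[service_type] = plugin

    def unsubscribe(self, service_type: str) -> None:
        self._plugins.pop(service_type, None)

    def services(self) -> List[str]:
        return sorted(self._plugins)

    def handle(self, service_type: str, request: dict) -> dict:
        """Dispatch a user operation to the matching plugin."""
        plugin = self._plugins.get(service_type)
        if plugin is None:
            raise LookupError(f"no plugin subscribed for {service_type!r}")
        return plugin(request)

# Usage: a stub Semantic Browse plugin that would normally call a remote Grid Service.
controller = IntelligentController()
controller.subscribe("semantic-browse", lambda req: {"uri": req["uri"], "graph": []})
print(controller.handle("semantic-browse", {"uri": "ex:Medicine"}))
```

Keeping the dispatch table outside the plugins is what lets new knowledge services be supported by dropping in a new plugin rather than rebuilding the browser.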


3.2 Operable Vectographic Components

Each user operation in SkyEyes results in a semantic graph composed of vectographic components. Vectographic components can be scaled and dragged freely without reducing the quality of the graph, so semantic graphs display better and are more acceptable to users. A vectographic component itself does not contain or store any data; it serves as a proxy or view for semantic information. In a semantic graph, each vectographic component provides users not only with a view of semantic information but also with a series of functions. If a great deal of semantic information is returned from the server, the corresponding semantic graph becomes so complex that many nodes and arrows overlap in one graph. Many visualization tools suffer from this problem, which is inconvenient and sometimes intolerable for users. To solve it, SkyEyes adopts a radial layout algorithm [6] to arrange the global layout of a semantic graph so as to avoid overlap.
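The paper does not detail the radial layout algorithm of [6]. As a hedged illustration, the Python sketch below implements one common variant. The semantic graph is first reduced to a breadth-first spanning tree from the focus node, which is placed at the centre. Nodes at depth d are placed on a circle of radius d × `ring`, and each subtree receives an angular wedge proportional to its number of leaves, so sibling subtrees occupy disjoint wedges. The function names, the ring spacing and the example nodes are hypothetical.

```python
import math
from collections import deque

def bfs_tree(edges, root):
    """Spanning tree (parent -> children) of the undirected view of the graph."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    tree, seen, queue = {}, {root}, deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                tree.setdefault(node, []).append(nxt)
                queue.append(nxt)
    return tree

def leaves(tree, node):
    children = tree.get(node, [])
    return 1 if not children else sum(leaves(tree, c) for c in children)

def radial_layout(tree, root, ring=100.0):
    """Root at the origin; each depth on its own circle; each subtree gets an
    angular wedge proportional to its leaf count."""
    pos = {root: (0.0, 0.0)}

    def place(node, depth, start, end):
        children = tree.get(node, [])
        total = sum(leaves(tree, c) for c in children)
        angle = start
        for child in children:
            span = (end - start) * leaves(tree, child) / total
            mid = angle + span / 2
            radius = ring * (depth + 1)
            pos[child] = (radius * math.cos(mid), radius * math.sin(mid))
            place(child, depth + 1, angle, angle + span)
            angle += span

    place(root, 0, 0.0, 2 * math.pi)
    return pos

# Hypothetical semantic graph around a medicine node:
edges = [("Medicine", "CompA"), ("Medicine", "CompB"), ("Medicine", "Effect"), ("CompA", "HerbX")]
positions = radial_layout(bfs_tree(edges, "Medicine"), "Medicine")
```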

3.3 SGL: Semantic Graph Language

Unlike general graph languages, SGL incorporates semantics and treats them as part of the graph elements. Graph elements described in SGL are related to each other by semantic links rather than by hyperlinks or graphic links. SGL can describe both the semantics and the appearance of a semantic graph. SGL is an XML-based language, and here is part of a brief BNF definition of SGL.

The structure of a semantic graph is clear in such an SGL document, so SkyEyes can draw a standard semantic graph from it. A simple example is shown in Fig. 3.


Fig. 3. A simple example of SGL
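The SGL grammar itself is not reproduced in this text, so the sketch below does not implement SGL. Under our own assumptions, it only illustrates the idea behind the SGL-Parser and SG-Factory pipeline. Semantic information, whatever its source format, is first reduced to (subject, predicate, object) triples and then serialized into a single XML graph document, in which nodes carry URIs and edges carry the semantic link (predicate) that relates them. The element and attribute names (`graph`, `node`, `edge`, `uri`, `link`) are hypothetical, not SGL's.

```python
import xml.etree.ElementTree as ET

def triples_to_graph_xml(triples):
    """Serialize triples into a hypothetical SGL-like XML graph (NOT the real SGL schema):
    one <node> per distinct resource, one <edge> per triple labelled with its predicate."""
    graph = ET.Element("graph")
    nodes = []
    for s, _, o in triples:
        for resource in (s, o):
            if resource not in nodes:
                nodes.append(resource)
    index = {}
    for i, uri in enumerate(nodes):
        index[uri] = f"n{i}"
        ET.SubElement(graph, "node", id=index[uri], uri=uri)
    for s, p, o in triples:
        # the semantic link, not a hyperlink, relates the two graph elements
        ET.SubElement(graph, "edge", source=index[s], target=index[o], link=p)
    return ET.tostring(graph, encoding="unicode")

# Hypothetical TCM triples (prefixed names for brevity):
doc = triples_to_graph_xml([("ex:Ginseng", "ex:effect", "ex:Tonify"),
                            ("ex:Ginseng", "rdf:type", "ex:Herb")])
print(doc)
```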

4 Application: Semantic Browse on Traditional Chinese Medicine

A practical application domain of SkyEyes is Traditional Chinese Medicine (TCM). We have been building an ontology for the Unified TCM Language System, which includes TCM concepts, objects, and their relationships. We chose Protégé 2000 [7] as the ontology editor to build the ontology in RDF(S). The TCM ontology is distributed across the web on more than twenty nodes throughout China. The nodes share a common ontology in the meta-information register center and are related by semantic links (URIs). We have now completed the whole concept definition of the TCM ontology and edited more than 100,000 instance records. Users of the TCM ontology can download SkyEyes from our server, install it, and use it to acquire the information they need from the TCM ontology. TCM experts can solve practical problems with the help of SkyEyes, or take the results returned by SkyEyes as a reference. For example, a doctor who is not sure about the use of a medicine can turn to SkyEyes.

4.1 Semantic Browse

To start browsing, the doctor enters the URL of the TCM ontology into the address field of SkyEyes and begins Semantic Browse, which proceeds in several steps. First, given a URL, SkyEyes connects to the knowledge server and retrieves the RDF(S) files. It calls the Jena [8] API to parse the RDF(S) files into a data model that the client can understand and process, then extracts meta-information about the RDF(S) and builds a class hierarchy tree to finish initialization. Next, the doctor can expand a tree node to browse its subclasses, or click it to list its direct instances in the instance list area. The relationships between the class and its properties are displayed as a semantic graph in the semantic graph area. Then, if the doctor finds the instance of the medicine in the instance list, he can click it, and all its properties and property values, including compositions, effect and so on, are displayed around the node standing for the medicine, which makes the use of that medicine clear. If the doctor wants to know more about one composition of that medicine, he can click the node standing for that composition, and a detailed semantic graph about it is displayed.
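In SkyEyes, parsing is done by Jena. The Python sketch below assumes the RDF(S) data has already been parsed into (subject, predicate, object) triples. It shows how the class hierarchy tree and a class's direct instances can be derived from `rdfs:subClassOf` and `rdf:type` statements. Predicates are written as prefixed names for brevity; the `ex:` resources and the function names are hypothetical.

```python
RDF_TYPE = "rdf:type"            # prefixed names used instead of full URIs
SUBCLASS_OF = "rdfs:subClassOf"

def class_hierarchy(triples):
    """Return (root classes, mapping class -> direct subclasses) from rdfs:subClassOf triples."""
    children, has_parent = {}, set()
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            children.setdefault(o, []).append(s)
            has_parent.add(s)
    roots = sorted(c for c in children if c not in has_parent)
    return roots, children

def direct_instances(triples, cls):
    """Instances typed directly with cls (what clicking a tree node would list)."""
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o == cls)

def render_tree(children, node, depth=0):
    """Indented text rendering of the class hierarchy below node."""
    lines = ["  " * depth + node]
    for child in children.get(node, []):
        lines.extend(render_tree(children, child, depth + 1))
    return lines

triples = [("ex:Herb", SUBCLASS_OF, "ex:Medicine"),
           ("ex:Formula", SUBCLASS_OF, "ex:Medicine"),
           ("ex:Ginseng", RDF_TYPE, "ex:Herb")]
roots, children = class_hierarchy(triples)
print("\n".join(render_tree(children, roots[0])))
```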

Y. Mao, Z. Wu, and H. Chen

During Semantic Browse, each user operation sends a URI to the server and acquires further related semantic information through this URI. SkyEyes then calls the JGraph [9] API to draw a semantic graph from this information. If the doctor cannot find the information this way, he can turn to Semantic Query.

4.2 Semantic Query

SkyEyes cannot perform queries by itself; querying is done by accessing query services. SkyEyes provides an easy-to-use interface for Semantic Query and visualizes the query results. Query results are returned from the server as semantic information, which the Semantic Query plugin optimizes and displays as semantic graphs. Currently, SkyEyes provides four kinds of Semantic Query, and displays a corresponding type of semantic graph for each. Class-class query returns semantic information about the specified class, its up-classes and its sub-classes. Class-instance query returns semantic information about the specified class and its direct instances. Instance-property query returns semantic information about the specified instance and its properties. Correlative query returns semantic information about the specified class, its up-classes, its sub-classes, and their instances; it suits the case where the queried class has no direct instances but its sub-classes or up-classes do. The doctor can enter restrictions on the medicine to perform Semantic Query over the TCM ontology. The result is not a set of documents that merely contain the keyword, but a semantic graph centered on the medicine, like the one displayed when browsing. The doctor can also set the depth of each query and control the semantic graph by configuring SkyEyes parameters. A greater depth returns and displays more semantic information.
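A rough sketch of the four query kinds and the depth parameter, assuming a tiny in-memory ontology in which all names are hypothetical. In SkyEyes the queries are answered by remote query services, so the functions below only mimic the shape of their results.

```python
from collections import defaultdict

# Hypothetical ontology fragment (illustrative names only).
SUPER = {"Herb": "Medicine", "RootHerb": "Herb", "LeafHerb": "Herb"}  # class -> up-class
TYPE = {"herb_a": "RootHerb", "herb_b": "LeafHerb"}                   # instance -> class
PROPS = {"herb_a": [("effect", "effect_x"), ("composition", "compound_y")]}

SUB = defaultdict(list)                                               # class -> sub-classes
for cls, sup in SUPER.items():
    SUB[sup].append(cls)

def up_classes(cls, depth):
    """Follow the up-class chain for at most `depth` steps."""
    out = []
    while depth > 0 and cls in SUPER:
        cls = SUPER[cls]
        out.append(cls)
        depth -= 1
    return out

def sub_classes(cls, depth):
    """Collect sub-classes down to `depth` levels, in pre-order."""
    if depth <= 0:
        return []
    out = []
    for child in SUB.get(cls, []):
        out.append(child)
        out.extend(sub_classes(child, depth - 1))
    return out

def instances_of(cls):
    return [i for i, c in TYPE.items() if c == cls]

def class_class(cls, depth=1):
    return {"up": up_classes(cls, depth), "sub": sub_classes(cls, depth)}

def class_instance(cls):
    return {"instances": instances_of(cls)}

def instance_property(inst):
    return {"properties": PROPS.get(inst, [])}

def correlative(cls, depth=1):
    """Instances of the class, its up-classes and its sub-classes (non-empty only)."""
    related = [cls] + up_classes(cls, depth) + sub_classes(cls, depth)
    found = {c: instances_of(c) for c in related}
    return {c: insts for c, insts in found.items() if insts}
```

Here `correlative("Herb")` returns the instances of the sub-classes even though `Herb` itself has none, which is the situation the correlative query is meant for; raising `depth` in `class_class` likewise returns more of the hierarchy.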

4.3 Reasoning

If the problem goes beyond looking up a medicine, for example when treating a patient, the doctor can use the reasoning function to perform more complex work. First, he describes the patient's symptoms in a particular format. Next, he chooses a case base that may contain a similar case, together with a rule set for diagnosis and treatment. The reasoning plugin of SkyEyes accesses reasoning services to perform the reasoning, and the results are returned as semantic information. Finally, SkyEyes displays the results as semantic graphs, which the doctor can use as a reference when treating the patient.
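The text does not say how a similar case is found in the case base. Purely to illustrate one common choice, the sketch below ranks cases by Jaccard similarity between symptom sets; the technique, the case names and the symptom labels are all assumptions, not the behaviour of the SkyEyes reasoning services.

```python
from typing import Dict, Optional, Set, Tuple

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar_case(symptoms: Set[str],
                      case_base: Dict[str, Set[str]]) -> Optional[Tuple[str, float]]:
    """Return (case name, similarity) of the best-matching case, or None if empty."""
    best: Optional[Tuple[str, float]] = None
    for name, case_symptoms in case_base.items():
        score = jaccard(symptoms, case_symptoms)
        if best is None or score > best[1]:
            best = (name, score)
    return best

# Hypothetical case base; symptom labels are placeholders.
CASES = {"case_1": {"s1", "s2", "s3"}, "case_2": {"s4"}}
```

The retrieved case would then be handed, with its rule set, to the reasoning step; in the paper that step runs on remote services.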

5 Summary

SkyEyes is a prototype Semantic Browser implemented according to a generic Semantic Browser architecture for the KB-Grid. As an intelligent client to the Semantic Web, it provides users with the major functions of a Semantic Browser and a friendly user interface. Several important features distinguish SkyEyes from traditional web browsers:
Open. SkyEyes is based on Grid Services and works as part of the KB-Grid, rather than being bound to the traditional client/server (C/S) structure.
Exact. Browse and query use semantic links to locate and return more exact and useful information.
Intelligent. SkyEyes accesses knowledge services to understand and solve more complex, practical problems that previously required domain experts.
Expandable. A plugin mechanism allows functions to be added dynamically according to the services delivered by the knowledge server.
Universal. SkyEyes converts various formats of semantic information into uniform semantic graphs based on SGL.
Convenient and operable. Vector-graphic components give users a clear view of semantic information and a series of interactive functions.
Our future work is to build a knowledge sharing and knowledge management platform for the Semantic Web. As part of this platform, SkyEyes still lacks functions, so a series of knowledge services, especially reasoning and knowledge management services, will be developed based on Grid Services, and more Semantic Browser plugins will be added to SkyEyes to extend its functions. In addition, we will continue work on the TCM Information Grid for TCM research.

References
[1] Berners-Lee, T., Hendler, J., Lassila, O. The Semantic Web. Scientific American, May 2001.
[2] Wu, Z., Chen, H., Xu, J. Knowledge Base Grid: A Generic Grid Architecture for Semantic Web. JCST, Vol. 18, No. 4, July 2003.
[3] Resource Description Framework (RDF) Model and Syntax Specification. http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/.
[4] Ideagraph, an Idea Development Tool for the Semantic Web. http://ideagraph.net/.
[5] IsaViz: A Visual Authoring Tool for RDF. http://www.w3.org/2001/11/IsaViz/.
[6] Eklund, P., Roberts, N., Green, S. OntoRama: Browsing RDF Ontologies using a Hyperbolic-style Browser. The First International Symposium on CyberWorlds (CW2002): Theory and Practices, pp. 405-411, IEEE Press, 2002.
[7] Protégé 2000. http://protege.stanford.edu/.
[8] Jena Semantic Web Toolkit. http://www.hpl.hp.com/semweb/jena.htm.
[9] JGraph. http://jgraph.sourceforge.net/.

Toward the Composition of Semantic Web Services

Jinghai Rao and Xiaomeng Su
Department of Computer and Information Science, Norwegian University of Science and Technology, N-7491, Trondheim, Norway
{jinghai, xiaomeng}@idi.ntnu.no

Abstract. This paper introduces a method for automatic composition of semantic web services using linear logic theorem proving. The method uses a semantic web service language (DAML-S) for the external presentation of web services; internally, services are presented as extralogical axioms and proofs in linear logic. Linear logic (LL) [2], as a resource-conscious logic, enables us to define the attributes of web services formally (in particular, qualitative and quantitative values of non-functional attributes). The subtyping rules used for semantic reasoning are presented as linear logic implications. We propose a system architecture in which the DAML-S parser, the linear logic theorem prover and the semantic reasoner work together. This architecture has been implemented in Java.

1 Introduction

The Grid is a promising computing platform that integrates resources from different organizations in a shared, coordinated and collaborative manner to solve large-scale science and engineering problems. Grid development has adopted a service-oriented architecture and, as a result, Grid technologies are evolving towards the Open Grid Services Architecture (OGSA). The convergence of Web services with Grid computing will accelerate the adoption of Grid technologies. [1] defines a Grid service as a Web service that provides a set of well-defined interfaces and follows specific conventions. As such, Grid services inherently share many of the problems and technical challenges of Web services in general. The ability to select and integrate inter-organizational services on the web efficiently and effectively at runtime is a critical step towards the development of the online economy. In particular, if no single web service can satisfy the functionality required by the user, it should be possible to combine existing services to fulfill the request rapidly. However, web service composition is a complex task. First, web services can be created and updated on the fly, and it may be beyond human capabilities to analyze the required services and compose them manually. Second, web services are developed by different organizations that use different semantic models to describe the features of services. These differing semantic models mean that the matching and composition of web services must take semantic information into account.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 760–767, 2004. © Springer-Verlag Berlin Heidelberg 2004

In this paper, we propose a candidate solution that we believe contributes to solving these two challenges. We describe a method for automated web service composition based on proof search in the (propositional) Multiplicative Intuitionistic fragment of Linear Logic (MILL [3]). Given a set of existing web services and a set of functional and non-functional requirements, the method finds a composition of atomic services that satisfies the user's requirements. Because Linear Logic is resource conscious, it can be used to reason about both qualitative and quantitative non-functional attributes of web services. Because the logic fragment is sound, the correctness of composite services is guaranteed with respect to the initial specification. Further, the completeness of the logic ensures that all composable solutions can be found. The rest of this paper is organized as follows: Section 2 presents a system architecture for the composition of semantic web services. Section 3 presents methods for transforming DAML-S documents into Linear Logic axioms. Section 4 discusses the use of a type system to enable semantic composition. Section 5 discusses related work and concludes the paper.

Fig. 1. Architecture for Service Composition

2 The Service Composition Architecture

Figure 1 depicts the general architecture of the proposed web service composition process, which runs as follows. First, the descriptions of existing web services (in the form of DAML-S Profiles) are translated into Linear Logic axioms, and the requirements for the composite service are specified as a Linear Logic sequent to be proven. Second, the Linear Logic theorem prover determines whether the requirements can be fulfilled by a composition of existing atomic services. On reading each propositional variable, the theorem prover asks the semantic reasoner for possible subtyping inferences, which are inserted into the theorem prover as logical implications. If one or more proofs are generated, the last step is the construction of flow models (written in DAML-S Process). The process is controlled by the coordinator, especially when components are distributed across locations. During the process, the user can interact with the system through a GUI. In this paper, we pay special attention to the DAML-S Parser and the Semantic Reasoner. The theorem-proving part has already been described in detail in [7]; readers familiar with Linear Logic or theorem proving can follow this paper without referring to that publication.

3 Transforming from DAML-S to Linear Logic Axioms

In our system, web services are specified externally by DAML-S profiles and presented internally as LL axioms. The DAML-S Parser performs this transformation automatically. A detailed presentation of DAML-S can be found in [4]; here we focus on the presentation of LL axioms. In general, a requirement for a composite web service, including its functionality and non-functional attributes, can be expressed by the following LL formula:

$$\Gamma;\ \Delta_1 \vdash (I \multimap O) \otimes \Delta_2$$

where $\Gamma$ is a set of logical axioms representing the available atomic web services, $\Delta_1$ is a conjunction of non-functional constraints and $\Delta_2$ is a conjunction of non-functional results (we distinguish these two concepts later). $I \multimap O$ is a functionality description of the required composite service. Both $I$ and $O$ are conjunctions of literals: $I$ represents the set of input parameters of the service and $O$ represents the output parameters produced by the service. Intuitively, the formula can be read as follows: given a set of available atomic services and the non-functional attributes, find a combination of services that computes $O$ from $I$. Every element of $\Gamma$ has the form $\Delta_1 \otimes I \multimap O \otimes \Delta_2$, where the meanings of $I$ and $O$ are the same as above. Next, we describe in detail how a DAML-S document is transformed into a linear logic expression, first for functionalities and then for non-functional attributes, and then illustrate the whole process with an example.
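To make the shape of these axioms concrete, the following sketch renders a service as a linear-logic string of the form Δ₁ ⊗ I ⊸ O ⊗ Δ₂, keeping multiplicities so that repeated resources (such as units of cost) are not collapsed. The class, the `A^n` abbreviation for n copies of A, and the literal names are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List

def tensor(literals: Counter) -> str:
    """Join literals with ⊗; n copies of A are abbreviated as A^n."""
    parts = [lit if n == 1 else f"{lit}^{n}" for lit, n in literals.items()]
    return " ⊗ ".join(parts) if parts else "1"   # 1 is the unit of ⊗

@dataclass
class ServiceAxiom:
    name: str
    inputs: List[str]                                      # I
    outputs: List[str]                                     # O
    constraints: Counter = field(default_factory=Counter)  # Delta_1 (left-hand side)
    results: Counter = field(default_factory=Counter)      # Delta_2 (right-hand side)

    def formula(self) -> str:
        # Constraints are consumed together with the inputs;
        # results are produced together with the outputs.
        lhs = self.constraints + Counter(self.inputs)
        rhs = Counter(self.outputs) + self.results
        return f"{tensor(lhs)} ⊸ {tensor(rhs)}"
```

For example, `ServiceAxiom("name2code", ["City"], ["ZipCode"], Counter({"NOK": 10})).formula()` gives `"NOK^10 ⊗ City ⊸ ZipCode"`. A Counter (a multiset) is used instead of a set precisely because LL is resource conscious: ten units of NOK are not the same resource as one.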

3.1 Transforming on Functionalities

The functionality attributes connect atomic services by means of their inputs and outputs: composition is possible only if the output of one service can be passed to another service as input. Externally, a web service is presented by a DAML-S profile. The functionality attributes of the “ServiceProfile” specify the computational aspect of the service, namely its inputs, outputs, preconditions and postconditions. Below is an example of the functionalities for a temperature report service:

From the computational point of view, this service requires an input of type “&zo;#ZipCode” and produces an output of type “&zo;#CelsTemp”, the temperature measured in Celsius. Here, we use entity types as shorthand for URIs. For example, &zo;#ZipCode refers to the URI of the definition of the zip code parameter: http://www.daml.org/2001/10/html/zipcode-ont#ZipCode. When translating to a Linear Logic formula, we translate the “restrictedTo” field (the variable type) instead of the parameter name, because we regard a parameter's type as its specification. The following propositional linear logic formula expresses the above DAML-S document:

$$\mathit{ZipCode} \multimap \mathit{CelsTemp}$$
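The translation rule just described (use the restrictedTo type, not the parameter name) can be sketched as follows. It assumes that `&zo;` expands to the zip-code ontology URI quoted above for both types, and that the URI fragment is an acceptable propositional variable name; the helper names and that naming choice are illustrative, not the DAML-S Parser's actual behaviour.

```python
from typing import Iterable
from urllib.parse import urlparse

# Assumed expansion of the &zo; entity, taken from the URI quoted in the text.
ZO = "http://www.daml.org/2001/10/html/zipcode-ont"

def atom(type_uri: str) -> str:
    """Use the fragment after '#' (e.g. ZipCode) as the propositional variable."""
    return urlparse(type_uri).fragment or type_uri

def functionality(inputs: Iterable[str], outputs: Iterable[str]) -> str:
    """Render restrictedTo types as I ⊸ O, joining several parameters with ⊗."""
    lhs = " ⊗ ".join(atom(u) for u in inputs)
    rhs = " ⊗ ".join(atom(u) for u in outputs)
    return f"{lhs} ⊸ {rhs}"
```

Dropping the namespace keeps formulas readable, as the paper does in its example, but two ontologies that both define `ZipCode` would then collide; a real translator would likely keep the full URI internally.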

3.2 Non-functional Attributes

Non-functional attributes are useful for evaluating and selecting services when many services have the same functionality. In the service presentation, non-functional attributes are specified as facts and constraints. We classify the attributes into four categories:
Consumable Quantitative Attributes: These attributes limit the amount of resources that can be consumed by the composite service. The total amount of a resource is the sum over all atomic services that make up the composite service.
Non-consumable Quantitative Attributes: These attributes limit a quantitative attribute of each single atomic service. They can express either an amount or a scale.
Qualitative Constraints: Attributes that cannot be expressed by quantities are called qualitative attributes. Qualitative constraints are those qualitative attributes that specify the requirements for executing a web service.
Qualitative Facts: Another kind of qualitative attribute, such as service type, service provider or geographical location, states facts about the service's environment. These attributes can be regarded as goals in LL.
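The four categories differ mainly in how their values combine across a composition. The sketch below uses hypothetical dictionary records whose values mirror the temperature example in Section 3.3: consumable attributes are summed, non-consumable ones are bounded per service (equivalently, by their maximum), qualitative constraints must be covered by the consumer's credentials, and qualitative facts must fall inside the accepted set. The record keys and function names are assumptions.

```python
from typing import Dict, Iterable, List

def aggregate(services: List[Dict]) -> Dict:
    """Combine per-service attributes according to their category."""
    return {
        # consumable quantitative: the composite consumes the sum
        "cost": sum(s.get("cost", 0) for s in services),
        # non-consumable quantitative: each service is bounded, so the maximum decides
        "quality": max((s.get("quality", 0) for s in services), default=0),
        # qualitative constraints: everything any service requires
        "requires": set().union(*(s.get("requires", set()) for s in services)),
        # qualitative facts: the locations the services report
        "locations": {s["location"] for s in services if "location" in s},
    }

def acceptable(services: List[Dict], budget: int, max_quality: int,
               certificates: Iterable[str], allowed_locations: Iterable[str]) -> bool:
    agg = aggregate(services)
    return (agg["cost"] <= budget
            and agg["quality"] <= max_quality
            and agg["requires"] <= set(certificates)
            and agg["locations"] <= set(allowed_locations))

# Hypothetical records mirroring name2code, temp and trans.
CHAIN = [
    {"cost": 10},
    {"requires": {"MICROSOFT"}, "location": "Norway"},
    {"cost": 5, "quality": 2},
]
```

With a 20 NOK budget, quality bound 3, a Microsoft certificate and Norway as the accepted location, `acceptable(CHAIN, 20, 3, {"MICROSOFT"}, {"Norway"})` holds, while a 12 NOK budget fails because the summed cost is 15.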

The different categories of non-functional attributes are presented differently in logical axioms: each is described either as a constraint or as a result. The constraints and results of a service can be presented as follows:
The constraints on the service: $\Delta_1 = \text{Consumable} \otimes \text{Non-consumable} \otimes \text{Qualitative constraints}$
The results produced by the service: $\Delta_2 = \text{Non-consumable} \otimes \text{Qualitative facts}$

3.3 Example

Here we illustrate the LL presentation of the temperature report example, taking both functional and non-functional attributes into consideration. The complete DAML-S description of this example can be found at http://bromstad.idi.ntnu.no/services/TempService.daml. For readability, we omit namespaces from parameter names. The available atomic web services in the example are specified as follows:

The formula presents three atomic services. name2code outputs the zip code of a given city. temp reports the Celsius temperature of a city, given its zip code. trans transforms a Celsius temperature into a Fahrenheit temperature. The cost term on the left-hand side of the name2code service denotes that 10 Norwegian Krones (NOK) are consumed by executing the service. The service trans costs 5 NOK and has quality level 2. Because the quality level is not a consumable value, it appears on both the left- and right-hand sides. The specification also states that the temperature reporting service temp is located in Norway and only responds to execution requests certified by Microsoft. Attributes not specified in a service specification are not considered. The required composite service takes a city name as input and outputs the Fahrenheit temperature in that city. It is specified in LL as follows:

The non-functional attributes for the composite service are:

This means that we would like to spend no more than 20 NOK on the composite service, and that the quality level of every selected service should be no higher than 3. The consumer of the composite service has a certificate from Microsoft (!CA_MICROSOFT) and requires that all location-aware services be located within Norway (!LOC_NORWAY). The ! symbol indicates that an unbounded number of atomic services is allowed in the composite service. For the location attribute, the service uses LOC_NORWAY to determine its value, and the set of requirements determines whether a service meets the requirement. So far, we have discussed how a DAML-S specification is translated into LL extralogical axioms. The next step is to derive the process model from the specification of the required composite service: if the specification can be proven, the process model is extracted from the proof. We have described the proving process in a separate publication [7] and therefore do not go into detail here. The resulting dataflow of the selected atomic services is presented through a graphical user interface; a screenshot is shown in Fig. 2. The interface of the service required by the user is presented in the ServiceProfile panel (upper right), and the dataflow of the component atomic services is presented in the ServiceModel panel (lower right).
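The search for a composition in this example can be illustrated, very loosely, by breadth-first forward chaining over the types available so far. This is a swapped-in technique for illustration, not the linear-logic theorem prover of [7]: it does not build proofs or a process model, it only checks each service's non-functional attributes and accumulates cost. Proposition names such as `TEMP_C` are hypothetical; the costs, quality level, certificate and location follow the example above.

```python
from collections import deque
from dataclasses import dataclass
from typing import FrozenSet, Optional

@dataclass(frozen=True)
class Service:
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]
    cost: int = 0                           # consumable (NOK), summed
    quality: int = 0                        # non-consumable, checked per service
    requires: FrozenSet[str] = frozenset()  # qualitative constraints
    location: Optional[str] = None          # qualitative fact

def admissible(s, max_quality, certs, locations):
    """Per-service checks: quality bound, certificates, location."""
    return (s.quality <= max_quality
            and s.requires <= certs
            and (s.location is None or s.location in locations))

def compose(services, given, goal, budget, max_quality, certs, locations):
    """Breadth-first forward chaining over the set of available types.

    Returns (chain, total_cost) for the first (shortest) chain that reaches
    `goal` within `budget`, or None. It is not guaranteed to be the cheapest.
    """
    usable = [s for s in services if admissible(s, max_quality, certs, locations)]
    start = frozenset(given)
    queue = deque([(start, (), 0)])
    seen = {start}
    while queue:
        have, chain, cost = queue.popleft()
        if goal <= have:
            return list(chain), cost
        for s in usable:
            new_cost = cost + s.cost
            if s.inputs <= have and not s.outputs <= have and new_cost <= budget:
                nxt = have | s.outputs
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, chain + (s.name,), new_cost))
    return None

SERVICES = [
    Service("name2code", frozenset({"CITY"}), frozenset({"ZIP"}), cost=10),
    Service("temp", frozenset({"ZIP"}), frozenset({"TEMP_C"}),
            requires=frozenset({"MICROSOFT"}), location="Norway"),
    Service("trans", frozenset({"TEMP_C"}), frozenset({"TEMP_F"}), cost=5, quality=2),
]
```

Called as `compose(SERVICES, {"CITY"}, frozenset({"TEMP_F"}), 20, 3, frozenset({"MICROSOFT"}), {"Norway"})`, this returns `(["name2code", "temp", "trans"], 15)`; lowering the budget to 12 NOK, or dropping the Microsoft certificate, makes it return `None`.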

Fig. 2. The Screen Shot

4 Composition Using Semantic Description

So far, we have considered only exact matches of parameters in composition. In reality, however, two services can be connected even if the output parameters of one service do not exactly match the input parameters of the other. In general, if the type assigned to an output parameter of service A is a subtype of the type assigned to an input parameter of service B, it is safe to transfer data from the output to the input. If we consider resources and goals in LL, subtyping is used in two cases: 1) a goal of type T can safely be replaced by a goal of type S, as long as T is a subtype of S; 2) conversely, a resource of type S can safely be replaced by a resource of type T, as long as T is a subtype of S. In the following, we extend the subsumption rule to both resources and goals. Note that these rules are not an extension of LL: subtyping can be explained by inference figures of LL. We use a separate symbol for subtyping in the following to emphasize that these inference rules serve typing purposes, not the sequencing of methods, when constructing programs. First of all, the subtype relation is transitive.

In addition, subsumption rules state the substitution between types.
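The rule figures themselves are not reproduced here. One way to write transitivity and the two subsumption rules in standard sequent notation, assuming $T \le S$ denotes "T is a subtype of S" (our notation, not necessarily the original's) and that a subtyping is used as the implication $T \multimap S$, is:

$$
\frac{T \le S \qquad S \le U}{T \le U}\;(\text{trans})
\qquad
\frac{\Gamma \vdash T \qquad T \le S}{\Gamma \vdash S}\;(\text{goal})
\qquad
\frac{\Gamma, S \vdash G \qquad T \le S}{\Gamma, T \vdash G}\;(\text{resource})
$$

The goal rule corresponds to case 1 above and the resource rule to case 2; both follow from a cut with $T \multimap S$, which is why they do not extend LL.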

Such subtyping rules can be applied to either functionality (parameters) or non-functional attributes. We use two examples to illustrate the basic idea. First, assume that the output of the temperature reporting service is air temperature measured on the Celsius scale, while the input of the temperature translation service is any Celsius temperature. Because the latter is more general than the former, it is safe to transfer the more specific output to the more general input. The second example concerns qualitative facts. If an atomic service is located in Norway, we regard Norway as a goal in LL. Because Norway is a country in Europe, it is safe to replace Norway with Europe: intuitively, if the user requires a service located within Europe, a service located within Norway meets that requirement. In this paper, we assume that the ontologies used by the service requester and the service provider are interoperable; otherwise, ontology integration becomes a separate issue, beyond the scope of this paper.
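Both examples reduce to checking the reflexive-transitive subtype relation. The sketch below uses hypothetical edge data (`AirCelsTemp`, `Temperature` and so on) and assumes the ontologies are interoperable, as the text does; it is not the paper's semantic reasoner, which feeds such subtypings to the prover as implications.

```python
from typing import Dict, Set

# Hypothetical direct subtype edges: type -> its declared supertypes.
SUPERTYPES: Dict[str, Set[str]] = {
    "AirCelsTemp": {"CelsTemp"},
    "CelsTemp": {"Temperature"},
    "Norway": {"Europe"},
}

def is_subtype(t: str, s: str, edges: Dict[str, Set[str]] = SUPERTYPES) -> bool:
    """Reflexive-transitive closure: T <= S if S is reachable from T."""
    stack, seen = [t], set()
    while stack:
        cur = stack.pop()
        if cur == s:
            return True
        if cur not in seen:
            seen.add(cur)
            stack.extend(edges.get(cur, ()))
    return False

def can_feed(output_type: str, input_type: str) -> bool:
    """Resource case: a more specific output may be passed to a more general input."""
    return is_subtype(output_type, input_type)

def meets_goal(fact: str, required: str) -> bool:
    """Goal case: a fact of type T also establishes a goal of any supertype S."""
    return is_subtype(fact, required)
```

Here `can_feed("AirCelsTemp", "CelsTemp")` and `meets_goal("Norway", "Europe")` hold, while the reverse directions do not; transitivity gives `is_subtype("AirCelsTemp", "Temperature")`.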

5 Conclusion

This paper addresses the important issue of automatic semantic web service composition. It argues that Linear Logic, combined with semantic reasoning for relaxing service matching (selection), offers a potentially more efficient and flexible approach to composing web services. To that end, an architecture for automatic semantic web service composition is introduced. The functional settings of the system are discussed, and techniques for DAML-S presentation, Linear Logic presentation and semantic relaxation are presented. A prototype implementation of the approach is proposed to represent, compose and handle the services. This paper concentrates on the automatic translation and the semantic relaxation, while the theorem-proving part is described elsewhere [7]. Some work has been done on planning based on semantic descriptions of web services. In [5], the authors adapt and extend the Golog language for the automatic construction of web services, addressing the composition problem through high-level generic procedures and customizing constraints. SWORD [6] is a developer toolkit for building composite web services. SWORD uses an ER model to specify the inputs and outputs of web services, so reasoning is based on the entity and attribute information provided by the ER model. [8] presents a semi-automatic method for web service composition, in which candidate services are chosen by functionality and filtered by non-functional attributes. The main difference between our method and these methods is that we consider non-functional attributes during planning. Using Linear Logic as the planning language allows us to define the non-functional characteristics of web services formally, in particular quantitative attributes. In addition, we distinguish between constraints and facts among qualitative attributes, and the planner treats them differently in the logic formulas. Also, as more and more organizations and companies embrace web service interfaces as a cornerstone of future Grid computing architectures, we hope that revealing and discussing semantics-related issues will inform Grid researchers of the intricate problem of service composition, which may well arise in Grid service research. Our current work is directed at adding the disjunction connective to the logical specification of service outputs. This is useful for handling exceptions or optional outputs of atomic services. With disjunction, the planner can also generate control constructs such as choice and loop. Although introducing disjunction is easy in the logical presentation, it slows proving down significantly, so mechanisms to improve the efficiency of proving are also under consideration.

References
1. I. Foster, C. Kesselman, J. Nick, and S. Tuecke. The physiology of the grid. Online: http://www.gridforum.org/ogsi-wg/drafts/ogsa_draft2.9_2002-06-22.pdf, January 2002.
2. J.-Y. Girard. Linear logic. Theoretical Computer Science, 50:1–102, 1987.
3. Patrick Lincoln. Deciding provability of linear logic formulas. In London Mathematical Society Lecture Note Series, volume 222. Cambridge University Press, 1995.
4. David Martin et al. DAML-S (and OWL-S) 0.9 draft release. Online: http://www.daml.org/services/daml-s/0.9/, May 2003.
5. Sheila McIlraith and Tran Cao Son. Adapting Golog for composition of semantic web services. In Proceedings of the Eighth International Conference on Knowledge Representation and Reasoning (KR2002), Toulouse, France, April 2002.
6. Shankar R. Ponnekanti and Armando Fox. SWORD: A developer toolkit for web service composition. In The Eleventh World Wide Web Conference, Honolulu, HI, USA, 2002.
7. Jinghai Rao, Peep Kungas, and Mihhail Matskin. Application of linear logic to web service composition. In The First International Conference on Web Services, Las Vegas, USA, June 2003. CSREA Press.
8. Evren Sirin, James Hendler, and Bijan Parsia. Semi-automatic composition of web services using semantic descriptions. In Web Services: Modeling, Architecture and Infrastructure workshop in conjunction with ICEIS 2003, 2003.

A Viewpoint of Semantic Description Framework for Service*

Yuzhong Qu
Dept. of Computer Science and Engineering, Southeast University, Nanjing 210096, P. R. China

Abstract. The evolution of Semantic Web Service technology is synergistic with the development of the Semantic Grid, reinforced by the adoption of a service-oriented approach in the Grid through the OGSI. However, Semantic Web Service technology is still far from mature enough to realize the vision of semantic services. This paper illustrates the semantic description framework of DAML-S using the RDF graph model, offers some thoughts on improving DAML-S and on designing “semantic” service description languages, presents a novel semantic description framework for services, and illustrates its usage with two examples describing a web service and a grid service.

1 Introduction

Building on both Grid and Web services technologies, the Open Grid Services Infrastructure (OGSI) [1,2] defines mechanisms for creating, managing, and exchanging information among entities called Grid services. The main motivation of OGSI is the need for open standards that define interactions and encourage interoperability between components supplied by different sources. Web/Grid services are represented and described using WSDL, which uses XML to describe services as a set of endpoints operating on messages. From such descriptions alone, it is usually impossible for software agents to determine the precise meaning of the service identifiers and the functionality the service provides. This lack of semantics in service capabilities makes it difficult for machines to discover and use services at the right time. To bring semantics to web services [3], Semantic Web technologies such as RDF Schema, DAML+OIL and OWL have been used to provide more explicit and expressive service descriptions. DAML-S [4,5] is a key component of the Semantic Web Services vision: it can characterize the service portfolio offered by a web service more expressively than WSDL, thereby opening up the possibility of automatic service discovery and use. Recently, some applications of Semantic Web technologies in Grid applications [6] have been well aligned with research and development in the Semantic Web and Grid communities, most notably in areas with established use of ontologies. However, the application of Semantic Web technologies inside the Grid infrastructure is less well developed. As pointed out by David De Roure [7], the emerging work on Semantic Web Services is synergistic with the development of the Semantic Grid, reinforced by the adoption of a service-oriented approach in the Grid through the OGSI. We see the importance of semantic description for Grid services in the vision of the Semantic Grid [8], and we anticipate a future convergence of Web services and Grid services. We also note that Semantic Web Service technology (e.g., DAML-S) is still far from mature enough to realize the vision of semantic services. Against this background, this paper focuses on a semantic description framework for web services and grid services. It illustrates the semantic description framework of DAML-S using the RDF graph model, discusses some considerations for improving DAML-S and for designing other “semantic” service description languages (SSDLs for short), presents a novel semantic description framework for services, and illustrates its usage with two examples.

* This paper is jointly supported by NSFC with project no. 60173036 and JSNSF with project no. BK2003001.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 768–777, 2004. © Springer-Verlag Berlin Heidelberg 2004

2 Semantic Description Framework of DAML-S

DAML-S provides a set of basic classes and properties for declaring and describing services, as well as an ontology structuring mechanism inherited from DAML+OIL. The basic classes and properties are defined in DAML-S Service (an upper ontology) and three subparts: DAML-S Profile, DAML-S Process and DAML-S Grounding. The following subsections discuss an overview of the DAML-S ontology and the DAML-S Process Model, respectively, using the RDF graph model. The conventions used in the RDF graph model are as follows:

The above notation means that the object labeled “p” is a property whose domain and range are class A and class B, respectively. In addition, the label “m:n”, if present, is the multiplicity of class A with respect to the property “p”.

The above notation means that the object labeled “C” and the object labeled “D” are in the binary relationship denoted by the property labeled “p”; in other words, object C has object D as a value of property p. A class or property is usually assumed to be in a namespace. We use prefixes such as “xsd”, “rdf”, “rdfs” and “owl” to denote the commonly used namespaces, and prefixes such as “service”, “process” and “profile” to denote the namespaces defined in DAML-S. The namespace prefix of a name may be omitted when there is no ambiguity.

2.1 An Overview of the DAML-S Ontology

An RDF graph model of the DAML-S ontology is roughly depicted in Fig. 1. Each instance of Service “presents” zero or more instances of (a descendant class of) ServiceProfile, and may be “describedBy” at most one instance of (a descendant class of) ServiceModel. ProcessModel is a subclass of ServiceModel. Each instance of ProcessModel can have at most one process through the “hasprocess” property. DAML-S adopts the processes-as-classes approach, i.e., any application-specific process should be defined as a subclass (or descendant class) of Process. ProcessPowerSet is defined as the class of all subclasses of Process. In addition, each instance of ProcessModel can have at most one instance of ProcessControlModel, although the necessity of the process control model is debated. Further, processes have IOPE properties (inputs, outputs, preconditions, effects) that describe their functionality; these are discussed further in Section 2.2. Profile is a subclass of ServiceProfile, and each instance of Profile can point to at most one process through the functional property “has_process”. Each instance of Profile also has IOPEs that present its functional description, but these IOPEs merely refer to the corresponding IOPEs in the process; e.g., an input in a profile can only refer to a sub-property of input in the process model.

Fig. 1. RDF Graph Model of the DAML-S Ontology

In the Congo example of DAML-S 0.9 [4], ExpressCongoBuyService is defined as an instance of Service. This service presents Profile_Congo_BookBuying_Service (an instance of Profile) and is described by ExpressCongoBuyProcessModel (an instance of ProcessModel). ExpressCongoBuyProcessModel has a user-defined process, ExpressCongoBuy, which is defined as a subclass of AtomicProcess and can therefore be seen as a subclass of Process.
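The multiplicity statements of Fig. 1 and the Congo instances can be checked mechanically. The following is a minimal closed-world sketch covering only `presents` and `describedBy`; the schema encoding and function names are assumptions. DAML+OIL itself makes no unique-name assumption, so a DAML+OIL reasoner would not necessarily treat two `describedBy` values as an error the way this checker does.

```python
from collections import Counter

# Direct subclass links used in the Congo example (child -> parent).
SUBCLASS_OF = {"Profile": "ServiceProfile", "ProcessModel": "ServiceModel"}

# property -> (domain, range, max values per subject; None means unbounded)
SCHEMA = {
    "presents": ("Service", "ServiceProfile", None),
    "describedBy": ("Service", "ServiceModel", 1),
}

def is_a(cls, target):
    """True if cls equals target or is a descendant class of it."""
    while cls is not None:
        if cls == target:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False

def validate(types, triples):
    """Check domain, range and maximum cardinality; return a list of problems."""
    problems = []
    for s, p, o in triples:
        domain, rng, _ = SCHEMA[p]
        if not is_a(types.get(s), domain):
            problems.append(f"{s}: not in domain of {p}")
        if not is_a(types.get(o), rng):
            problems.append(f"{o}: not in range of {p}")
    for (s, p), n in Counter((s, p) for s, p, _ in triples).items():
        max_card = SCHEMA[p][2]
        if max_card is not None and n > max_card:
            problems.append(f"{s}: more than {max_card} value(s) for {p}")
    return problems

TYPES = {
    "ExpressCongoBuyService": "Service",
    "Profile_Congo_BookBuying_Service": "Profile",
    "ExpressCongoBuyProcessModel": "ProcessModel",
}
TRIPLES = [
    ("ExpressCongoBuyService", "presents", "Profile_Congo_BookBuying_Service"),
    ("ExpressCongoBuyService", "describedBy", "ExpressCongoBuyProcessModel"),
]
```

`validate(TYPES, TRIPLES)` returns an empty list; adding a second `describedBy` value for the same service produces one cardinality problem.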

2.2 Framework of the DAML-S Process Ontology

A DAML-S Process Model has two chief components: the Process Ontology and the Process Control Ontology. The latter is beyond the scope of this paper; the former describes a service in terms of its IOPEs and, where appropriate, its component sub-processes.

Fig. 2. RDF Graph Model of the DAML-S Process Ontology

An RDF graph model of the DAML-S process ontology is roughly depicted in Fig. 2. Processes can have IOPEs and participants; among these, input, output and participant are sub-properties of parameter in the process model. The ranges of precondition, effect and output are specified to be Condition, ConditionalEffect and ConditionalOutput, respectively. The representation of preconditions and effects, as well as of coCondition, ceCondition and ceEffect, depends on the representation of rules in the DAML language, but no proposal for specifying rules in DAML has yet been put forward. For this reason, they are currently mapped to anything possible. In addition to its action-related properties, a process has a number of bookkeeping properties such as name (rdfs:Literal) and address (URI). There are three types of processes: atomic, simple and composite. A composite process can be described as atomic processes chained together in a process model. A composite process must have exactly one composedOf property, which indicates its control structure. Control constructs such as if-then-else allow the ordering and conditional execution of the subprocesses (or control constructs) in the composition.

Y. Qu

Again, take the atomic process ExpressCongoBuy [4] as an example. The ExpressCongoBuy process has two properties (congoBuyBookISBN and congoBuySignInInfo) as its inputs, two properties (congoOrderShippedOutput and congoOutOfStockOutput) as its outputs, two properties as its preconditions (congoBuyAcctExistsPrecondition and congoBuyCreditExistsPrecondition), and one property as its effect (congoOrderShippedEffect). These concrete IOPEs of the ExpressCongoBuy process are specified to be sub-properties of input, output, precondition and effect, with some constraints on their domains and ranges (expressed through anonymous subclasses defined with “Restriction”).
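The sub-property relationships just described can be traversed to recover which IOPE category each concrete property belongs to. The sketch below is a hypothetical illustration: the rdfs:subPropertyOf table uses the property names from the text, but gives each property a single parent for simplicity (RDF allows several), and the function names are invented.

```python
# Hypothetical rdfs:subPropertyOf table for the ExpressCongoBuy IOPEs named in
# the text; each property is given a single parent for simplicity.
SUBPROPERTY_OF = {
    "congoBuyBookISBN": "input",
    "congoBuySignInInfo": "input",
    "congoOrderShippedOutput": "output",
    "congoOutOfStockOutput": "output",
    "congoBuyAcctExistsPrecondition": "precondition",
    "congoBuyCreditExistsPrecondition": "precondition",
    "congoOrderShippedEffect": "effect",
    "input": "parameter",   # input and output are sub-properties of parameter
    "output": "parameter",
}

IOPE_ROOTS = ("input", "output", "precondition", "effect")


def classify_iope(prop):
    """Climb the sub-property chain until an IOPE root is reached (None if never)."""
    while prop is not None and prop not in IOPE_ROOTS:
        prop = SUBPROPERTY_OF.get(prop)
    return prop


def group_iopes(props):
    """Group concrete process properties under input/output/precondition/effect."""
    groups = {root: [] for root in IOPE_ROOTS}
    for prop in props:
        root = classify_iope(prop)
        if root is not None:
            groups[root].append(prop)
    return groups


if __name__ == "__main__":
    concrete = [p for p in SUBPROPERTY_OF if p.startswith("congo")]
    for root, members in group_iopes(concrete).items():
        print(root, members)
```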

3 Some Design Considerations

DAML-S can be seen as a “semantic” service description language (in short, SSDL). We use the term SSDL to mean an ontology for describing services plus an ontology structuring mechanism inherited from the ontology definition language (the base language). In the case of DAML-S, the base language is DAML+OIL, and the ontology provided is the one discussed in the previous section. The base language of DAML-S will be changed to OWL in the next release. DAML-S is a major driving force toward Semantic Web Services. However, the evolution of DAML-S also offers some lessons, which are useful both for designing an SSDL and for improving DAML-S. Our corresponding considerations are as follows.

(1) The Use of Meta-modeling Facilities of RDF Schema

DAML-S 0.9 contains some misuses of the meta-modeling facilities of RDF Schema, e.g. the rdfs:range of the “has_process” property. Ontology developers should take care when they define classes of classes or attach properties to classes. OWL provides three increasingly expressive sub-languages designed for use by specific communities of implementers and users. OWL Full can be viewed as an extension of RDF, while OWL Lite and OWL DL can be viewed as extensions of a restricted view of RDF. Reasoning support is less predictable for OWL Full than for OWL DL, since complete OWL Full implementations do not currently exist. Our position is as follows: the designer of an SSDL such as DAML-S may use some meta-modeling facilities of RDF Schema, but should do so carefully, while users of the SSDL should not use any meta-modeling facilities in their applications. If OWL is adopted, the designer of an SSDL may carefully use some constructs from OWL Full, but users of the SSDL should use only the ontology and constructs from the SSDL and OWL DL to describe their web services and/or grid services.
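The user-side half of this position could be partly enforced by a lint over service descriptions. The sketch below is hypothetical and deliberately simplified: it checks only one symptom of meta-modeling, namely a resource declared as a class that is then used as an ordinary property value or given a user-defined type (a class of classes). Real OWL DL conformance involves many more conditions, and the property lists here are assumptions.

```python
# Properties whose values are legitimately classes (a simplified, assumed list).
CLASS_VALUED = {
    "rdf:type", "rdfs:subClassOf", "rdfs:domain", "rdfs:range",
    "owl:equivalentClass", "owl:disjointWith",
}
CLASS_METATYPES = {"owl:Class", "rdfs:Class"}


def metamodeling_uses(triples):
    """Flag triples that treat a declared class as an individual."""
    classes = {s for s, p, o in triples
               if p == "rdf:type" and o in CLASS_METATYPES}
    flagged = []
    for s, p, o in triples:
        if o in classes and p not in CLASS_VALUED:
            # a class is the value of an ordinary property
            flagged.append((s, p, o))
        elif s in classes and p == "rdf:type" and o not in CLASS_METATYPES:
            # a class is typed by a user-defined class (class of classes)
            flagged.append((s, p, o))
    return flagged


if __name__ == "__main__":
    user_triples = [
        ("ExpressCongoBuy", "rdf:type", "owl:Class"),
        ("ExpressCongoBuy", "rdfs:subClassOf", "AtomicProcess"),
        ("myProfile", "has_process", "ExpressCongoBuy"),  # class used as a value
    ]
    print(metamodeling_uses(user_triples))  # [('myProfile', 'has_process', 'ExpressCongoBuy')]
```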

(2) The Functional Description of Services

In the DAML-S Process Model, as described in section 2.1, any application-specific process should be defined as a subclass (or descendant class) of Process. In other words, the class Process is the superclass of every user-defined class representing an application-specific process. This processes-as-classes approach causes problems for users of DAML-S; the misuse of the rdfs:range of “has_process” noted in (1) is one example. Fortunately, DAML-S 1.0 will adopt the processes-as-instances approach.


Secondly, an instance of Service can be described by at most one instance of ProcessModel, and each instance of ProcessModel can have at most one subclass of Process through the “hasProcess” property. This means that each service can be associated with at most one process, so DAML-S cannot describe a service type that offers multiple functions. Of course, there is a trade-off between fine and coarse granularity; we think a good description framework should be flexible enough to cover coarse granularity as well. Thirdly, the DAML-S Profile also provides a functional description of the service through its IOPEs. However, these IOPEs merely refer to the corresponding IOPEs in the process and add little value in principle. We note that in myGrid [6], additional properties such as performs_task and uses_resource are used with profiles to specify which task a service performs and which resources it uses. We think that the task being performed is an important aspect of a functional description (retrieving, for example, is a generic task), and the structure of the DAML-S Profile could be improved accordingly. Finally, the representation of preconditions and effects, as well as the condition of a conditional output, depends on the representation of rules in the DAML language. A decision was made to align DAML Rules more closely with the Rule Markup Language (RuleML [9]). Currently, RuleML supports user-level roles, URI grounding, and order-sortedness [10]. Rules will certainly play an important role in representing functional behavior; however, the issue of variables and their scope must be resolved when integrating rules into DAML-S or other SSDLs. This issue is discussed in more detail in section 4.

(3) The Representation of Stateful Services

There is some debate over “Service and State: Separation or Integration?” Purely stateless services are rare and usually uninteresting. The question is through which mechanism the state should be exposed: through the service itself, a specific operation, a session, or contextualization? Within OGSI, the state of a grid service is exposed directly through the service itself. We note a trend of convergence between Web Services and Grid Services, so a good description framework should be flexible enough to cover the requirements of grid services directly, including stateful grid services.

4 A Semantic Description Framework for Service

Based on the previous considerations, we propose a semantic description framework for service in this section. Our framework has the following characteristics:
(1) Services are organized by service types, using a “services implement service types” mechanism;
(2) Operations (or processes) are associated with service types;
(3) Each operation is represented as an instance;
(4) The multiple inputs/outputs of an operation are aggregated into a message with part names, so each operation has exactly one input message and exactly one output message;


(5) Service state can be exposed directly at the service type level;
(6) Rules are used to prescribe the behavior of services at both the service type and the operation level, and three pseudo variables, “hostService”, “input” and “output”, are introduced for use within rules.

An RDF graph model of our description framework for service is roughly depicted in Fig. 3. Note that the classes and properties within our description framework are assumed to be in a fictitious namespace “mySSDL” to avoid name collisions, and the namespace prefix is omitted when there is no ambiguity.

Fig. 3. A Semantic Description Framework for Service

A service has service profiles and implements zero or more service types. A service type is an instance of the class ServiceType. ServiceType is defined to be a subclass of owl:Class; the intended meaning of ServiceType is the class of all service types, just as owl:Class is the class of all OWL classes. Each service type may have service data templates, service rules and operations. A service data template describes the name and type of an exposed service data element. A service rule prescribes the behavior of a service type, such as consistency constraints and the dependency and concurrency between operations. An operation is an instance of the class Operation, just as a property is an instance of the class rdf:Property. An operation has codomain and corange properties, just as a property can have rdfs:domain and/or rdfs:range properties. However, an operation must have exactly one codomain and exactly one corange, and the value of each must be an instance of MessageType. In addition, an operation can have rules that describe its functional behavior. The class MessageType is defined to be a subclass of owl:Class; the intended meaning of MessageType is the class of all message types. Any message type (an instance of MessageType) can have zero or more part templates. A data template must have exactly one data name and one data type. The data type of a data template can be one of the built-in OWL datatypes, including many of the built-in XML Schema datatypes, or any other OWL class, including application-specific data types


and even user-defined service types. We define a specific message type, myssdl:Void, as an instance of MessageType, with the constraint that no data template is associated with Void. Rules are used to specify the behavior of a service as well as the functional behavior of an operation. As to the issue of variables and scope, we propose the following. The pseudo variable “hostService”, together with the service data names and operation names of a service type, can be used within the service rules of that service type. The service data names of the containing service type, the three pseudo variables (“hostService”, “input” and “output”), and the data names of the codomain and corange message types can be used within the rules of an operation. The pseudo variable “hostService” denotes the service that implements the corresponding service type. Within the rules of an operation, “hostService” denotes the service requested to execute the operation, while the pseudo variables “input” and “output” denote the input message and the output message of the operation, respectively. Further research is needed to integrate RuleML and logical reasoning into our description framework, although some theoretical results have already been obtained [10, 11].
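To make the variable-scope proposal concrete, the sketch below renders the main classes of the framework as Python dataclasses and computes which names a service rule or an operation rule may refer to. It is a hypothetical illustration: the class and function names are invented, the framework itself is defined in RDF/OWL, and treating codomain as the input message type and corange as the output message type is an assumption based on the rdfs:domain/rdfs:range analogy.

```python
from dataclasses import dataclass

# Hypothetical Python rendering of the mySSDL classes; it is not an RDF encoding.


@dataclass(frozen=True)
class DataTemplate:
    data_name: str   # exactly one data name
    data_type: str   # exactly one data type, e.g. "xsd:string" or an OWL class


@dataclass(frozen=True)
class MessageType:
    name: str
    parts: tuple = ()   # zero or more part templates (DataTemplate)


VOID = MessageType("myssdl:Void")   # the constraint: Void has no data templates


@dataclass(frozen=True)
class Operation:
    name: str
    codomain: MessageType   # assumed to be the input message type
    corange: MessageType    # assumed to be the output message type
    rules: tuple = ()


@dataclass(frozen=True)
class ServiceType:
    name: str
    service_data: tuple = ()   # service data templates
    operations: tuple = ()
    rules: tuple = ()


def service_rule_scope(st):
    """Names visible inside the service rules of service type st."""
    return ({"hostService"}
            | {d.data_name for d in st.service_data}
            | {op.name for op in st.operations})


def operation_rule_scope(st, op):
    """Names visible inside the rules of operation op of service type st."""
    return ({"hostService", "input", "output"}
            | {d.data_name for d in st.service_data}
            | {p.data_name for p in op.codomain.parts}
            | {p.data_name for p in op.corange.parts})


if __name__ == "__main__":
    reply = MessageType("Reply", (DataTemplate("newValue", "xsd:int"),))
    inc = Operation("increment", VOID, reply)
    counter = ServiceType("Counter", (DataTemplate("count", "xsd:int"),), (inc,))
    print(sorted(service_rule_scope(counter)))         # ['count', 'hostService', 'increment']
    print(sorted(operation_rule_scope(counter, inc)))  # ['count', 'hostService', 'input', 'newValue', 'output']
```

Note that operation names are visible in service rules but not in operation rules, mirroring the proposal above.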

4.1 Examples

This subsection illustrates the use of our framework with two examples: the first describes a web service, and the second a grid service.

(1) The Congo Example
We define a service type BuyBook as an instance of ServiceType, and a service congoBuyBook as an instance of Service. The service congoBuyBook implements the service type BuyBook, which has two or more operations: expressBuyBook, fullBuyBook and other possible exposed operations (e.g. createAcct, createProfile, locateBook). These operations are defined to be instances of Operation, with corresponding input/output message types and operation rules. For an operation that uses other exposed operations, a service rule can specify the control flow of the composition. Take the expressBuyBook operation as an example. Its input message type has two parts: a String value named buyBookISBN and an instance of SignInData named buyBookSignInInfo. Its output message type has one part: a String value named replyMessage. The operation has an operation rule prescribing that if the book is in stock, the reply message indicates that the order has been shipped, and if the book is out of stock, the reply message says so. If the quantity of available books is exposed as service data, the rule could say more about the precondition and effect of the operation.

(2) The GridService portType
As we know from OGSI [2], the GridService portType MUST be implemented by all Grid services and thus serves as the base interface definition in OGSI. This portType


is analogous to the base Object class within object-oriented programming languages such as Smalltalk or Java, in that it encapsulates the root behavior of the component model. The behavior encapsulated by the GridService portType is that of querying and updating the serviceData set of the Grid service instance and managing the termination of the instance. Now let us define a service type, also named “GridService”, to reflect this idea. First, we define GridService as an instance of ServiceType. GridService is defined to have many service data templates, including those with the following data names: interface, serviceDataName, gridServiceHandle, gridServiceReference, findServiceDataExtensibility, setServiceDataExtensibility, factoryLocator and terminationTime. The first six have list types as their data types, and the last two have ogsi:LocatorType and ogsi:TerminationTimeType, respectively. The mutability, modifiable and nillable attributes of these service data elements can be described by service rules, and rules can also describe the multiplicity constraints on the first six service data elements (e.g. a grid service has at least one interface and two setServiceDataExtensibility elements), as well as other consistency constraints (e.g. the initial values of service data). Second, we define the following operations for GridService: findServiceData, setServiceData, requestTerminationAfter, requestTerminationBefore and destroy, with their corresponding input/output message types. In this way, every service that implements the service type “GridService” can be seen as a grid service, which is exactly what OGSI requires: every grid service must have these service data and behaviors. For domain-specific grid applications, the application domain community can design a set of service types for reuse.
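The operation rule of the Congo example can be read as a predicate over the three pseudo variables. The following minimal sketch assumes that BuyBook exposes a hypothetical service data element quantityInStock (a map from ISBN to available copies, following the remark that the quantity could be exposed as service data) and that the reply strings are "order shipped" and "out of stock"; neither is defined by the framework. The Python parameter names stand in for hostService, input and output, because input is a Python built-in.

```python
# Message types of expressBuyBook as (part name, data type) pairs, following the text.
EXPRESS_BUY_BOOK_INPUT = (("buyBookISBN", "xsd:string"),
                          ("buyBookSignInInfo", "SignInData"))
EXPRESS_BUY_BOOK_OUTPUT = (("replyMessage", "xsd:string"),)


def conforms(message, message_type):
    """True if a message (dict) has exactly the declared part names."""
    return set(message) == {name for name, _ in message_type}


def express_buy_book_rule(host_service, input_msg, output_msg):
    """Operation rule: the reply must reflect whether the book is in stock.

    host_service, input_msg and output_msg play the roles of the pseudo
    variables hostService, input and output; quantityInStock is a
    hypothetical exposed service data element.
    """
    if not (conforms(input_msg, EXPRESS_BUY_BOOK_INPUT)
            and conforms(output_msg, EXPRESS_BUY_BOOK_OUTPUT)):
        return False
    copies = host_service["quantityInStock"].get(input_msg["buyBookISBN"], 0)
    expected = "order shipped" if copies > 0 else "out of stock"
    return output_msg["replyMessage"] == expected


if __name__ == "__main__":
    congo = {"quantityInStock": {"isbn-1": 3}}   # hypothetical service state
    request = {"buyBookISBN": "isbn-1", "buyBookSignInInfo": {"user": "alice"}}
    print(express_buy_book_rule(congo, request, {"replyMessage": "order shipped"}))  # True
    print(express_buy_book_rule(congo, request, {"replyMessage": "out of stock"}))   # False
```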

5 Conclusion

In this paper, the semantic description framework of DAML-S is illustrated using RDF graph models. We discussed several issues to consider when improving DAML-S and/or designing another SSDL, and presented a novel description framework for service. The six characteristics of our framework were outlined and illustrated with two examples. Several research efforts are related to this paper. For example, the Web Service Modeling Framework (WSMF) is proposed in [12] to enable fully flexible and scalable e-commerce based on web services, and IRS-II (Internet Reasoning Service) [13] is a framework and implemented infrastructure whose main goal is to support the publication, location, composition and execution of heterogeneous web services, augmented with semantic descriptions of their functionalities. Compared with these related works, this paper does not focus on system infrastructure or service composition; our main concerns are the semantic description framework for service and the design issues of the SSDL. We hope that the six characteristics of our description framework given in section 4 and our thinking on SSDL design given in section 3 will help to improve DAML-S and inform the design of other SSDLs. To make the vision of Semantic Grid Services and Semantic Web Services a reality, a number of research challenges need to be addressed. Our further research work includes the integration of rules into SSDLs as well as the trust and provenance of Semantic Services. We believe that, with the evolution of SSDLs and the emergence of system infrastructures and frameworks, Semantic Grid/Web Services will become a reality.

References

1. I. Foster, C. Kesselman, J. Nick, S. Tuecke. Grid Services for Distributed System Integration. IEEE Computer, 35(6): 37-46, 2002.
2. S. Tuecke, K. Czajkowski, I. Foster, J. Frey, S. Graham, C. Kesselman et al. Open Grid Services Infrastructure (OGSI), 2003. Available online at http://www-unix.globus.org/toolkit/draft-ggf-ogsi-gridservice-33_2003-06-27.pdf.
3. Sheila A. McIlraith, David L. Martin. Bringing Semantics to Web Services. IEEE Intelligent Systems, 18(1): 90-93, 2003.
4. DAML Services Coalition. DAML-S 0.9 Draft Release, 2003. Available online at http://www.daml.org/services/daml-s/0.9/.
5. DAML-S Coalition. DAML-S: Web Service Description for the Semantic Web. In First International Semantic Web Conference (ISWC) Proceedings, Sardinia (Italy), June 2002, pp. 348-363.
6. C. Wroe, R. Stevens, C. Goble, A. Roberts, M. Greenwood. A Suite of DAML+OIL Ontologies to Describe Bioinformatics Web Services and Data. International Journal of Cooperative Information Systems, 12(2): 197-224, 2003.
7. David De Roure. Semantic Grid and Pervasive Computing. Available online at http://www.semanticgrid.org/GGF/ggf9/gpc/.
8. D. De Roure, N. Jennings, N. Shadbolt. Research Agenda for the Future Semantic Grid: A Future e-Science Infrastructure, 2001. Available online at http://www.semanticgrid.org/v1.9/semgrid.pdf.
9. Harold Boley, Said Tabet, Gerd Wagner. Design Rationale of RuleML: A Markup Language for Semantic Web Rules. In Proc. Semantic Web Working Symposium (SWWS’01), pp. 381-401, Stanford University, July/August 2001.
10. Harold Boley. Object-Oriented RuleML: User-Level Roles, URI-Grounded Clauses, and Order-Sorted Terms. In Workshop on Rules and Rule Markup Languages for the Semantic Web (RuleML-2003), Sanibel Island, Florida, USA, 20 October 2003.
11. Benjamin N. Grosof, Ian Horrocks, Raphael Volz, Stefan Decker. Description Logic Programs: Combining Logic Programs with Description Logic. In Proc. 12th Intl. Conf. on the World Wide Web (WWW-2003), Budapest, Hungary, May 2003.
12. D. Fensel, C. Bussler. The Web Service Modeling Framework WSMF. In 1st Meeting of the Semantic Web enabled Web Services workgroup, 2002. Available at http://informatik.uibk.ac.at/users/c70385/wese/wsmf.bis2002.pdf.
13. Enrico Motta, John Domingue, Liliana Cabral, Mauro Gaspari. IRS-II: A Framework and Infrastructure for Semantic Web Services. In Proceedings of the 2nd International Semantic Web Conference (ISWC 2003), 20-23 October 2003, Sundial Resort, Sanibel Island, Florida, USA.

A Novel Approach to Semantics-Based Exception Handling for Service Grid Applications*

Donglai Li, Yanbo Han, Haitao Hu, Jun Fang, and Xue Wang

Software Division, Institute of Computing Technology, Chinese Academy of Sciences, 100080, Beijing, China
Graduate School of Chinese Academy of Sciences, 100039, Beijing, China
{ldl, yhan, huhaitao, fangjun, wxue}@software.ict.ac.cn

Abstract. Whenever the characteristics of a service grid environment are addressed, issues of openness and dynamism come to the fore. These issues affect the definition and handling of application exceptions, and traditional approaches to exception handling lack proper mechanisms for capturing exception semantics and handling exceptions. In this paper, after analyzing the new problems of exception handling that arise in a service grid environment, we focus on exceptions caused by runtime mismatches between users’ requests and the underlying services, and propose a semantics-based approach to handling this kind of exception. The approach was first developed within the FLAME2008 project, where some promising results have been achieved.

1 Introduction

In a service grid environment, services evolve autonomously, their coupling is very loose, and system boundaries are no longer clearly under control. Exceptions [8] [10] may occur more frequently when applications are built in such an open and dynamic environment, especially when user requirements change frequently. Exception handling has always been an important topic, and previous efforts have led to remarkable achievements [3][9][11]. In a service grid environment, however, new challenges appear. Applications may use any network-based service, and most users cannot describe all the potential runtime mismatches that lead to exceptions; we therefore intend to detect such mismatches automatically. The handling process should be determined dynamically, mostly by the system rather than by users. Often, lacking background knowledge and knowledge of potential services, users cannot describe the details of most mismatch exceptions. A mechanism is needed to judge what a mismatch exception is and whether it has occurred, and another is needed to determine how to handle it. Moreover, the aspects of the exception handling process are “tightly” connected in current exception handling technology. To solve these problems, semantics is naturally introduced. In the information systems context, semantics can be viewed as a mapping between an object modeled, represented and/or stored in an information

* This paper is supported by the National Natural Science Foundation of China under Grant No. 60173018 and the Young Scientist Fund of ICT, CAS under Grant No. 20026180-22.

M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 778–786, 2004. © Springer-Verlag Berlin Heidelberg 2004

A Novel Approach to Semantics-Based Exception Handling

779

system and the real-world object(s) it represents [1]. Finding the relationship between the semantics of a user’s request and the semantics of the services may help to solve mismatch exception handling problems. Based on semantics, we propose an approach named ASEED (a novel approach to semantics-based exception handling for service grid applications). We developed this approach within FLAME2008 [12], a project that adopts the service grid paradigm and develops a service integration platform targeted at a real-world application scenario: an information service provision center for the Olympic Games Beijing 2008. The paper is organized as follows: section 2 discusses the prerequisites of the approach; section 3 describes the approach; section 4 demonstrates and evaluates it; section 5 compares it with related work; and the last section lists some future directions.

2 Prerequisites of ASEED

Although services are encouraged to be encapsulated in a unified form (for example, Web Services follow open standards and a service is described in WSDL), there is no global system for publishing services in a way that anyone else can easily process. The problem is that, in some contexts, it is difficult to use services in the ways their designers intended [4]. The unclear meaning of users’ requests and services makes the situation more complicated. The meaning of users’ requests and services, their so-called semantics, therefore needs to be exposed. Several prerequisites are needed in order to handle mismatch exceptions.

2.1 Semantics Infrastructure

We take ontology-based semantics as the infrastructure of our approach. An ontology is a formal, explicit specification of a shared conceptualization, which acts as a mediator for sharing common concepts among different parties. An ontology also captures the relationships between concepts. For example, a single ontology may include the concepts mammal, human and woman: woman is a subclass of human and human is a subclass of mammal, so woman inherits all properties of human, and human inherits all properties of mammal. Semantics therefore come in different granularities. To illustrate such layered ontologies, we use a graph in which nodes stand for specific semantics and edges show the relations among them. Many research projects have produced large ontologies such as WordNet, which provides a thesaurus of over 100,000 terms explained in natural language. In a grid environment, resource sharing is done with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions defined by such sharing rules forms what is called a virtual organization (VO) [7]. Besides their domain knowledge, experts use such rules as guidance and reference when constructing ontologies.
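The mammal, human, woman example can be sketched as a small subclass graph in which properties are inherited along the edges. The properties attached to each concept below are hypothetical, chosen only to make inheritance visible; the sketch also assumes single inheritance and an acyclic graph.

```python
# Single-parent subclass links from the example (child -> parent); assumed acyclic.
SUBCLASS_OF = {"woman": "human", "human": "mammal"}

# Hypothetical properties attached directly to each concept, for illustration only.
OWN_PROPERTIES = {
    "mammal": {"bodyTemperature"},
    "human": {"name"},
    "woman": set(),
}


def ancestors(concept):
    """Concepts above `concept`, nearest first (coarser granularity going up)."""
    chain = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def inherited_properties(concept):
    """Own properties plus everything inherited from all ancestors."""
    props = set(OWN_PROPERTIES.get(concept, set()))
    for ancestor in ancestors(concept):
        props |= OWN_PROPERTIES.get(ancestor, set())
    return props


if __name__ == "__main__":
    print(ancestors("woman"))                     # ['human', 'mammal']
    print(sorted(inherited_properties("woman")))  # ['bodyTemperature', 'name']
```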


D. Li et al.

2.2 Semantics of Services and User’s Requests

Usually a service has two kinds of properties: functional and non-functional. Functional properties describe what the service can do, while non-functional properties provide other information such as QoS. Each has its own use:
- Semantics of functional properties: these express the service’s capabilities. Although languages like WSDL can describe them syntactically, they do not make them machine-understandable. Services published with accessible semantics for their functional properties can be searched for and invoked automatically.
- Semantics of non-functional properties: services with the same functional properties may differ in many other respects, such as QoS. Semantics of non-functional properties describe the constraints of the services so that the services can be understood more precisely.
In a user’s request, each unit, called an activity, expresses part of the user’s goal. Like a service, an activity usually has two aspects of semantics: semantics of functional desire and semantics of non-functional desire. With the semantics infrastructure, users can describe their requests, and service suppliers can publish services, at the semantic level. Mismatches between them can then be checked automatically by the system, because the semantics are machine-understandable.

3 The Semantics-Based Exception Handling Approach

To ensure smooth execution of service grid applications, mismatches between users’ requests and services should be detected and handled, as mentioned before. Fig. 1 illustrates the central idea of our approach:

Fig. 1. Reference model of ASEED

When an exception is signaled, the semantics of the user’s request and of the services are derived from the Exception Context, and the Mismatch Analysis component analyzes their compatibility to determine whether the exception is a mismatch exception. Different handling processes share common factors, such as the exception context and the exception handling strategy. The Handling Pattern component contains patterns that abstract the common information of exception handling processes; these patterns are consulted when a similar exception occurs.


When a mismatch exception occurs, the Mismatch Analysis informs the Strategy Selection, which then determines a Specified Handling Strategy by consulting the Exception Context and the Handling Pattern. The semantics infrastructure is the basis connecting the aspects of exception handling: based on it, mismatch exceptions between users’ requests and services can be detected and a specific handling strategy produced. The main components of the approach are described in the following sections.

3.1 Mismatch Exception Analysis

During execution, the semantics of the selected services should match the semantics of the user’s request in both the functional and the non-functional aspect. When an exception is signaled, the compatibility between the semantics of the services and those of the user’s request is calculated. The algorithm is:

The compatibility of both functional and non-functional semantics is checked, because either may lead to mismatch exceptions. With the semantics infrastructure, such mismatch exceptions can be identified automatically by the system when exceptions occur, and users do not have to specify them when describing their requests.
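The paper’s own compatibility algorithm is not reproduced here. As a hedged illustration of checking both aspects, the sketch below assumes a simple rule: a service is compatible with an activity on an aspect if the service’s concept equals the requested concept or is a descendant of it in the semantics graph. The concept hierarchy, the dictionary layout and the function names are hypothetical.

```python
# Hypothetical fragment of the semantics graph (child -> parent).
PARENT = {
    "sports.ticketbooking.fencing": "sports.ticketbooking",
    "sports.ticketbooking.swimming": "sports.ticketbooking",
}

ASPECTS = ("functional", "non-functional")


def is_same_or_narrower(concept, requested, parent=PARENT):
    """True if `concept` equals `requested` or lies below it in the graph."""
    while concept is not None:
        if concept == requested:
            return True
        concept = parent.get(concept)
    return False


def mismatched_aspects(activity, service):
    """Aspects on which the service fails the activity; an empty list means no mismatch."""
    return [a for a in ASPECTS if not is_same_or_narrower(service[a], activity[a])]


if __name__ == "__main__":
    service = {"functional": "sports.ticketbooking",
               "non-functional": "sports.ticketbooking.swimming"}
    old_request = dict(service)
    new_request = {"functional": "sports.ticketbooking",
                   "non-functional": "sports.ticketbooking.fencing"}
    print(mismatched_aspects(old_request, service))  # []
    print(mismatched_aspects(new_request, service))  # ['non-functional']
```

Returning the failing aspects, rather than a single boolean, leaves room for later steps to choose a handling strategy per aspect.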

3.2 Handling Pattern

Different handling processes share common factors, and a pattern [2] is used to describe them. A pattern may consist of many parts; we use the following four as a minimum: name, problem, context and solutions.
Name: each pattern has a specific name that identifies it.
Problem: describes the kind of mismatch exception the pattern tries to solve.
Context: a mismatch exception resides in a specific context, which consists of the structure of the user’s request and the semantics of the user’s request or of the services; the context affects the exact meaning of an exception.
Solutions: contain the actions necessary to handle the exception, possibly in one or more steps that guide the handling of a specific exception.
Instantiations of patterns, which we call cases, hold concrete information about exception handling processes. For the context of a case, an expression is used to describe the structure: ax stands for a single activity; stands for Sequence; stands for Concurrent;

stands for Choice; power stands for Loop:


Fig. 2. Illustration of a process’s structure

The semantics of the user’s request or of the services in a case are expressed as a set of value pairs, and the Solutions contain a set of semantic notations describing the handling process. Cases are populated during exception handling processes.
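A structure expression can be represented as a nested tree and compared with activity names abstracted away. In the sketch below, the operator names seq, par, choice and loop are hypothetical spellings standing in for the paper’s Sequence, Concurrent, Choice and Loop symbols, and the activity names are invented; the example encodes the travel scenario of section 4.1 (book a flight, then reserve a hotel and rent a car in parallel, then book a ticket).

```python
# A process expression is either an activity name (str) or a tuple
# (operator, operand, ...) with operator in {"seq", "par", "choice", "loop"}.
OPERATORS = {"seq", "par", "choice", "loop"}


def skeleton(expr):
    """Canonical structure string with every activity replaced by 'a'."""
    if isinstance(expr, str):
        return "a"
    op, *operands = expr
    if op not in OPERATORS:
        raise ValueError(f"unknown operator: {op}")
    return op + "(" + ",".join(skeleton(e) for e in operands) + ")"


def same_structure(e1, e2):
    """Structure matching as exact expression matching (operand order matters)."""
    return skeleton(e1) == skeleton(e2)


if __name__ == "__main__":
    travel = ("seq", "bookFlight", ("par", "reserveHotel", "rentCar"), "bookTicket")
    print(skeleton(travel))  # seq(a,par(a,a),a)
```

This exact comparison treats differently nested but equivalent sequences as different structures; a fuller implementation might normalize the tree first.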

3.3 Strategy Selection

The handling strategy is the Solutions property of a case. To handle a mismatch exception, information from the exception context is consulted to retrieve a suitable case. Case matching consists of two parts: structure matching and semantics matching.
- Structure matching: the structure of both the case and the user’s request can be described by an expression, as mentioned above, so structure matching can be treated as expression matching. Only cases that have the same structure as the user’s request are considered matches.
- Semantics matching: each activity in the case or in the user’s request maps to a (set of) specific semantics, and a single case or user’s request usually consists of many activities. Semantics matching results are defined by a least-squares distance function, which assumes that the best fit is the one with the minimal sum of squared deviations from a given set of semantics, as shown below:

In this function, each activity of the user’s request is paired with the corresponding activity of the case; each pair is weighted by a coefficient that reflects the relative importance of the case activity; and a matching function calculates the semantics matching degree of the two corresponding activities from the user’s request and the case:


The smaller SM is, the closer the semantic compatibility. Together, structure similarity and semantics similarity allow the Strategy Selection to retrieve a suitable case, whose handling strategy is then used to guide exception handling.
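The retrieval step can be sketched as follows. Because the paper’s formulas are not recoverable here, the sketch assumes SM = sum over i of w_i * d(r_i, c_i)^2, where r_i and c_i are corresponding activity concepts of the request and the case, w_i is the case’s weight for activity i, and d is a hypothetical semantic distance: the number of edges between two concepts through their lowest common ancestor. Cases whose structure differs from the request are discarded first, and a default case is returned when none remain. The concept names, case names, solutions and the distance metric are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical concept hierarchy (child -> parent).
PARENT = {
    "fencingTicket": "sportsTicket",
    "swimmingTicket": "sportsTicket",
    "sportsTicket": "ticket",
    "museumTicket": "ticket",
}


def path_to_root(concept):
    path = [concept]
    while concept in PARENT:
        concept = PARENT[concept]
        path.append(concept)
    return path


def distance(c1, c2):
    """Edges between c1 and c2 via their lowest common ancestor (assumed metric)."""
    p1, p2 = path_to_root(c1), path_to_root(c2)
    for i, node in enumerate(p1):
        if node in p2:
            return i + p2.index(node)
    return len(p1) + len(p2)   # unrelated concepts: treated as far apart (assumption)


@dataclass
class Case:
    name: str
    structure: str      # skeleton of the structure expression
    semantics: list     # one concept per activity, in expression order
    weights: list       # relative importance of each activity in the case
    solution: str


def sm(request_semantics, case):
    """Weighted sum of squared semantic deviations; smaller means closer."""
    return sum(w * distance(r, c) ** 2
               for r, c, w in zip(request_semantics, case.semantics, case.weights))


def select_case(structure, semantics, cases, default):
    """Structure matching first, then the case with minimal SM."""
    candidates = [c for c in cases if c.structure == structure]
    if not candidates:
        return default
    return min(candidates, key=lambda c: sm(semantics, c))


if __name__ == "__main__":
    cases = [
        Case("reselectSports", "seq(a,a)", ["flight", "swimmingTicket"], [1, 1], "ServiceReSelect"),
        Case("reselectMuseum", "seq(a,a)", ["flight", "museumTicket"], [1, 1], "AskUser"),
    ]
    default = Case("default", "", [], [], "Abort")
    best = select_case("seq(a,a)", ["flight", "fencingTicket"], cases, default)
    print(best.name, sm(["flight", "fencingTicket"], best))  # reselectSports 4
```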

4 Implementation in FLAME2008

4.1 Case Study

Consider a common travel scenario from FLAME2008: first book a flight, then reserve a hotel room and rent a car in parallel; when both are done, book a ticket for a swimming event at the 2008 Olympic Games.

Fig. 3. Demonstration of travel scenario

Fig. 3 shows the scenario, illustrating the functional semantics mappings (the two dotted broken lines) and the non-functional semantics mapping (dotted line) for GTB. Before GTB starts, the user decides to watch a fencing event instead of swimming, so he modifies GTB by changing a non-functional semantics (light-colored dotted line). However, the subsequent execution of the application stops because of a wrong return result, which signals an exception. The mismatch exception is handled as follows:
(1) Mismatch Analysis: for GTB, the semantics of the user’s new request (fencing ticket booking), the semantics of the user’s old request (swimming ticket booking) and the semantics of the selected service have the same functional semantics, “http://flame/KgB/travel.daml#sports.ticketbooking”, but their non-functional semantics differ:


D. Li et al.

Activity name      Non-functional semantics
GTB (new)          http://flame/KgB/travel.daml#sports.ticketbooking.fencing
GTB (old)          http://flame/KgB/travel.daml#sports.ticketbooking.swimming
Selected Service   http://flame/KgB/travel.daml#sports.ticketbooking.swimming

After the modification, the functional semantics of the service still satisfies the user's request, but its non-functional semantics is no longer compatible (the two concepts are siblings in the semantics graph). Thus a mismatch exception is observed.

(2) Strategy Selection: to obtain the handling strategy, structure similarity and semantics similarity are calculated to retrieve a suitable case:
- Structure matching: the structure of the exception context is the expression of the scenario shown in Fig. 3; by comparing expressions, the cases with the same structure are selected.
- Semantics matching: semantics similarity is calculated for the cases selected by structure matching. Using the algorithm described above, a suitable case is selected:

Name: Service ReSelect
Problems: A new activity replaces the old one to satisfy a different goal.
Context:
  Structure: (same as the exception context in Fig. 3)
  Semantics: (semantics of non-functional properties are omitted here)
    <http://flame/KgB/travel.daml#traffic.plane-ticketbooking>
    <http://flame/KgB/travel.daml#accomodation.hotelreserving>
    <http://flame/KgB/travel.daml#traffic.carrenting>
    <http://flame/KgB/travel.daml#sightseeing.ticketbooking>
Solutions: http://flame/KgB/task/Execute.daml#ServiceReSelect

The case's solution, “http://flame/KgB/task/Execute.daml#ServiceReSelect”, guides the execution of the application by allowing it to select another service. If no suitable case is available, a default case is consulted.
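The mismatch analysis in step (1) can be sketched as a check on the concept hierarchy encoded in the semantics URIs. This Python sketch assumes, as the example URIs suggest but the text does not state, that the dotted fragment after '#' encodes a path through the semantics graph, and that a service concept is compatible with a request concept when it is the same concept or a more specific descendant. `detect_mismatch` and its return strings are hypothetical.

```python
from typing import List, Optional, Tuple


def split_uri(uri: str) -> Tuple[str, List[str]]:
    """Return (namespace, dotted concept path) for a semantics URI."""
    namespace, _, fragment = uri.partition("#")
    return namespace, fragment.split(".")


def is_same_or_narrower(service_uri: str, request_uri: str) -> bool:
    # Assumed rule: the same concept, or a descendant of the requested concept.
    s_ns, s_path = split_uri(service_uri)
    r_ns, r_path = split_uri(request_uri)
    return s_ns == r_ns and s_path[:len(r_path)] == r_path


def detect_mismatch(req_func: str, req_nonfunc: str,
                    svc_func: str, svc_nonfunc: str) -> Optional[str]:
    """Return the kind of mismatch between a request and its selected service, if any."""
    if not is_same_or_narrower(svc_func, req_func):
        return "functional mismatch"
    if not is_same_or_narrower(svc_nonfunc, req_nonfunc):
        return "non-functional mismatch"  # e.g. sibling concepts such as fencing/swimming
    return None
```

With the GTB values above, the functional check passes while the non-functional check fails, because `swimming` is a sibling of `fencing` rather than a descendant of it.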

4.2 Evaluation of ASEED

Based on semantics, we proposed an approach to handling runtime mismatch exceptions in service grid applications. Semantics helps the system understand what a runtime mismatch exception is, and it helps to detect these exceptions automatically and handle them dynamically. In a service grid environment, handling exceptions with our approach has some promising effects:
- Accuracy in catching mismatch exceptions: explicit semantics of aspects of the application, such as the semantics of the services, are used, so mismatches between users' requests and services can be detected precisely.
- Flexibility in handling exceptions: as mentioned, exceptions may have different meanings and should be handled in different ways. Our approach tries to find a suitable way each time an application encounters a mismatch exception, and the handling strategies are located dynamically during the handling process.
Still, some factors affect the handling process:
- Granularity of the semantic descriptions: the precision of catching mismatch exceptions depends largely on how finely the semantics are described. If the granularity is coarse, exceptions are caught less precisely.


- Similarity matching of cases: structure similarity is easy to compare, but semantics similarity may need to take more factors into account for more precise matching.
- Handling patterns: to handle all kinds of mismatch exceptions, the handling patterns and their instances need to be enriched and better managed.

5 Related Works

In a service grid environment, exception handling is still in its infancy, and some research groups have begun to pay attention to this issue. IBM's BPEL4WS [6] supports flexible control of reversal by letting users define exception handling and compensation in an application-specific manner. However, it mostly deals with local exceptions and with exceptions (and handlers) predefined by users themselves; unexpected exceptions go unnoticed by the system. Globus [5] provides a range of basic services that enable the construction of application-specific fault recovery mechanisms. It focuses on providing a basic, flexible service from which a range of application-specific fault behaviors can be built, but it is difficult to build such services for common use. There is also related work in other domains, such as workflow. [11] provides a model with a rule base consisting of a set of rules for handling exceptions, but the rule base is a separate component, functionally disjoint from the exception database. Because the two bases are disconnected, the rules sometimes cannot describe a scenario even when an approach has already been adopted to resolve many exceptions. METEOR [13] addresses conflict resolution in cross-organizational workflows; its approach bundles knowledge sharing, coordinated exception handling, and intelligent problem solving. It focuses on conflicts between the handling participants rather than the mismatches discussed here, although its case matching algorithm is a useful reference. In most existing work, mismatch exceptions that are not defined by users cannot be detected or handled, yet it is quite difficult for users to describe all mismatch exceptions in a service grid environment because of its open and dynamic nature.
These systems also lack means of automatically detecting and handling mismatch exceptions. Moreover, most work focuses on problems within a bounded environment, either intra-organizational or inter-organizational, where exceptions and exception handling processes are relatively stable.

6 Conclusions

In this paper, we analyzed the emerging problem of mismatch exceptions between users' requests and the underlying services in a service grid environment. To build reliable applications, these exceptions need to be detected and handled effectively. We proposed a novel approach named ASEED, in which semantics plays the dominant role, providing a basis throughout the exception handling process. Semantics makes mismatch exceptions and handling processes machine-understandable. With this approach, these exceptions need not be described by users; mostly, they are detected and handled by the system itself. The approach has been implemented in the FLAME2008 project, and some promising results have been achieved. Some problems remain to be solved to perfect the approach. Our future research will address the following: the spectrum of exception context needs to be broadened and its details studied more thoroughly; the elements of a pattern will be considered thoroughly, and a thorough classification of the patterns is needed; and effective management of the cases is needed to offer better support.

References

1. A. Sheth, Data Semantics: What, Where, and How?, in R. Meersman and L. Mark (Eds.), Data Semantics (IFIP Transactions), Chapman and Hall, UK (1996) 601-610.
2. E. Gamma, R. Helm, R. Johnson, J. Vlissides, Design Patterns: Elements of Reusable Object-Oriented Software, Addison-Wesley, ISBN 0-201-63361-2 (1995).
3. F. Casati, S. Ceri, S. Paraboschi, G. Pozzi, Specification and Implementation of Exceptions in Workflow Management Systems, TODS, Vol. 24, No. 3 (1999) 405-451.
4. http://infomesh.net/2001/swintro/
5. http://www.globus.org/details/fault_det.html
6. http://www-900.ibm.com/developerWorks/cn/webservices/ws-bpel_spec/index_eng.shtml
7. I. Foster, C. Kesselman, S. Tuecke, The Anatomy of the Grid: Enabling Scalable Virtual Organisations, International Journal of Supercomputer Applications, Vol. 15, No. 3 (2001).
8. J.B. Goodenough, Exception Handling: Issues and a Proposed Notation, Communications of the ACM, Vol. 18, No. 12 (1975) 683-696.
9. J. Eder, W. Liebhart, Contributions to Exception Handling in Workflow Systems, EDBT Workshop on Workflow Management Systems, Spain (1998).
10. J.L. Knudsen, Better Exception-Handling in Block-Structured Systems, IEEE Software, Vol. 17, No. 2 (1987) 40-49.
11. S.Y. Hwang, S.F. Ho, J. Tang, Mining Exception Instances to Facilitate Workflow Exception Handling, Proc. of the Sixth International Conference on Database Systems for Advanced Applications, Taiwan (1999) 45-52.
12. Y. Han, H. Geng, H. Li, J. Xiong et al., VINCA – A Visual and Personalized Business-level Composition Language for Chaining Web-based Services, Proc. of International Conference on Service Oriented Computing, Italy (2003).
13. Z. Luo, A. Sheth, K. Kochut, B. Arpinar, Exception Handling for Conflict Resolution in Cross-Organizational Workflows, Distributed and Parallel Databases Journal, Vol. 11 (2003).

A Semantic-Based Web Service Integration Approach and Tool*

Hai Zhuge, Jie Liu, Lianhong Ding, and Xue Chen

Knowledge Grid Research Group, Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100080, Beijing, China

Abstract. Integration of Web Services for large-scale applications involves complex processes. Component technology is an important way to decompose a complex process and improve efficiency and quality. This paper proposes a service integration approach that considers both the integration of service flows and the integration of data flows. The approach includes a component-based Web Service process definition tool, a mechanism for retrieving services from a well-organized service space and UDDI repositories, algorithms for heterogeneous data flow integration, and rules for service verification. The proposed approach has been integrated into an experimental service platform and used in an online book sale application. Comparisons show the features of the proposed approach.

1 Introduction

Integration of Web Services for large-scale applications is challenging because the efficiency and quality of the complex service processes involved are hard to manage. A second issue is how to retrieve services conveniently, accurately, and efficiently from rapidly expanding, large-scale service repositories. Data returned from multiple services may be heterogeneous in semantics, structure, and value [1, 4], so the third issue is how to integrate the heterogeneous data flows returned from different services to provide users with a unified view. Previous research on Web Service integration mainly concerns approaches for automatically integrating relevant services using semantic markup [8], Petri-net-based and ontology-based approaches for service description, simulation, verification, and composition [3, 10], and languages for describing the behavioral aspects of service flows [6, 11]. However, these works seldom address the three issues above. Current UDDI registry nodes provide only keyword-based service retrieval [9], so users who are unfamiliar with the predefined service categories often cannot obtain satisfactory retrieval results. Applications show that current UDDI repositories cannot meet the efficiency and accuracy needs of business processes.

* The research work was supported by the National Science Foundation of China (NSFC).
M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 787–794, 2004. © Springer-Verlag Berlin Heidelberg 2004

This paper addresses the construction of complex service processes using component technology, an important way to decompose a complex service process. We have implemented a component-based service process definition tool that helps users transform a business process into a service process and then specify the requirements for the related service components, which are integrated by service flow and data flow. Interactions between the components in the service process are based on XML, SOAP, and WSDL. We improve the accuracy and efficiency of service retrieval with the service space model, which organizes Web Services in a normalized, multi-dimensional service space so that services can be retrieved efficiently. We address heterogeneous data flow integration by establishing mappings between the global schema and the source schemas, considering semantic, structural, and data value heterogeneity.

2 General Architecture

The general architecture of the proposed Web Service integration approach, illustrated in Fig. 1, mainly consists of the following modules: Process Definition, Definition Verification, Requirement Description, Web Service Retrieval, Integration, Integration Verification, and Registration.

Fig. 1. General architecture of the proposed approach


We have developed a component-based Web Service process definition tool to help users transform a business process into a service process. A process is defined by drawing nodes and arcs on the interface with the help of the operation buttons. After definition, the completeness and time constraints of the defined process components are verified. If errors are found, the definition must be modified; otherwise, users can specify the requirements for the related service components with the definition tool. A service space is a multi-dimensional space with a set of uniform service operations (http://kg.ict.ac.cn). A reference model for the service space can be expressed as Service-Space = (Classification-Type, Category, Registry-Node). To retrieve the required services effectively and efficiently, multi-valued specialization relationships and similarity degrees between services are constructed [13]. Besides the GUI, the service space lets applications retrieve services through SOAP messages. If no matching services are found, the service space automatically communicates with the UDDI repositories through SOAP messages to obtain the required services. The components in the service process are integrated through service flow and data flow: the service flow reflects the control dependence among the service components, while the data flow denotes their data dependence. The Integration Verification module checks the accessibility, deadlock, and execution state of the service process. If no error occurs, the new service is registered in the service space and the UDDI repositories; otherwise, modification of the process definition is triggered.
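The reachability and loop checks performed by Integration Verification can be illustrated with a small graph sketch. It assumes the service process is available as a hypothetical adjacency dictionary from each component to its successors. Treating any cycle as a potential loop or deadlock is a simplification made here; real deadlock analysis also depends on join semantics.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]


def all_nodes(graph: Graph) -> Set[str]:
    return set(graph) | {m for succs in graph.values() for m in succs}


def unreachable(graph: Graph, start: str) -> Set[str]:
    """Components that cannot be reached from the start node (iterative DFS)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for succ in graph.get(node, []):
            if succ not in seen:
                seen.add(succ)
                stack.append(succ)
    return all_nodes(graph) - seen


def has_cycle(graph: Graph) -> bool:
    """Detect a loop with three-colour DFS: reaching a grey node means a back edge."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour: Dict[str, int] = {}

    def visit(node: str) -> bool:
        colour[node] = GREY
        for succ in graph.get(node, []):
            state = colour.get(succ, WHITE)
            if state == GREY:
                return True
            if state == WHITE and visit(succ):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n)
               for n in sorted(all_nodes(graph)))
```

For example, a process `{"start": ["search"], "search": ["end"], "orphan": ["end"]}` reports `{"orphan"}` as unreachable and contains no cycle.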

3 Semantic Heterogeneous Data Flow Integration

To form a single semantic image of the heterogeneous data returned by the service components [15], we use a triple DIS = <G, S, M> to represent a data integration system, where G is the global schema (the XML schema defined by application developers), S is a set of source schemas (the XML schemas of the data returned by the service components), and M is the mapping from G into S. Heterogeneous data integration consists of the following four steps.

The first step is global schema definition. Application developers define the basic information, the data dependence relationships (i.e., the semantic constraints), and the structure of the global schema according to the requirements. The basic information is represented by the structure GNode (GnodeID, Gnode, Gtype), where GnodeID is the node identifier, Gnode is the node name, and Gtype is the node type. The data dependence relationships are represented by a set of pairs, each consisting of a key and the nodes that depend on it, where the key plays the same role as a key in a relational database system. Paths in the global schema are expressed by the structure GSchema (GpathID, Gpath, GpathLID, Gtype), where GpathID is the path identifier, Gpath is the label path (a sequence of slash-separated labels from the root to the current node), GpathLID is the identifier path (a sequence of slash-separated node identifiers from the root to the current node), and Gtype is the terminal node type.
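The GSchema path records can be derived mechanically from a schema tree. This sketch is an illustration under assumptions: it treats a sample XML document's element tree as the schema, numbers nodes in preorder to build the identifier path, reads node types from a hypothetical `type` attribute (defaulting to `string`), and records paths only for leaf elements. The same preorder traversal, tagged with a source identifier, would also yield SSchema-style records for source schema extraction.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class PathRecord(NamedTuple):
    path_id: int     # GpathID
    label_path: str  # Gpath, e.g. "Book/Title"
    id_path: str     # GpathLID, e.g. "1/3"
    node_type: str   # Gtype of the terminal node


def extract_paths(xml_text: str) -> List[PathRecord]:
    root = ET.fromstring(xml_text)
    records: List[PathRecord] = []
    next_id = 0

    def walk(elem: ET.Element, labels: List[str], ids: List[str]) -> None:
        nonlocal next_id
        next_id += 1  # preorder numbering: a parent is numbered before its children
        labels = labels + [elem.tag]
        ids = ids + [str(next_id)]
        children = list(elem)
        if not children:  # leaf node: record its label path and identifier path
            records.append(PathRecord(len(records) + 1, "/".join(labels),
                                      "/".join(ids), elem.get("type", "string")))
        for child in children:
            walk(child, labels, ids)

    walk(root, [], [])
    return records
```

Recording the identifier path alongside the label path distinguishes sibling nodes that share the same label path.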


The second step is source schema extraction, which loads each data flow of the component services, traverses the source schema recursively in preorder, and extracts the node name and the label path of each leaf node (or attribute node) from each source. The node information is kept in the structure SNode (SourceID, SnodeID, Snode, Stype), and the label path information is recorded in the structure SSchema (SourceID, SpathID, Spath, SpathLID, Stype), where SourceID is the source identifier, SpathID is the path identifier, Spath is the label path, SpathLID is the identifier path, and Stype is the terminal node type.

The third step is mapping construction between the global schema and the source schemas, which resolves the semantic, structural, and data value conflicts among the involved service components. To resolve semantic conflicts, such as synonymous node names, each node in the global schema is associated with a semantic attribute set (a set of semantically related terms) generated with WordNet, whose terms can be added, modified, and deleted on demand. Structural conflicts are resolved through node mapping, path mapping, and tree mapping between the global schema and the source schemas. Node mapping maps nodes in the global schema to nodes in the source schemas according to the semantic attribute sets; human intervention is needed to designate mapping nodes that are not included in the semantic attribute set. Path mapping maps the label paths in the global schema to paths in the source schemas. Tree mapping maps the global schema, as a tree, into the source schemas. The tree structure sequence derived from the global schema is denoted as a sequence of (path identifier, label path) pairs, where each label path runs from the root to a leaf node.
The tree structure of the source schemas is denoted as a set of (source identifier, path identifier, label path) triples, where each label path likewise runs from the root to a leaf node.

The fourth step is data integration. To integrate the heterogeneous data flows that satisfy users' queries, global query sequences containing all possible sub-queries over the global schema are established. The global query sequences are denoted as a set of triples (sub-query identifier, path expression of the sub-query, condition to be satisfied). Each user query corresponds to a set of non-contiguous branches in the global query sequence, which in turn correspond to non-contiguous branches in each source tree and are executed at each source. Data returned from the service components that satisfies the sub-query branches under Boolean conditions is integrated. To resolve data value conflicts, the involved sources are ranked by reliability, data accuracy, and data quality; data returned from a higher-ranked source has higher priority.
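The rank-based resolution of data value conflicts in the fourth step can be sketched as a priority merge. This sketch assumes each source's result has already been mapped onto global schema node names and keyed by the global key (ISBN, as in the book example), and that ranks are precomputed numbers; how reliability, accuracy, and quality combine into a rank is not specified here. `merge_by_rank` is a hypothetical helper.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def merge_by_rank(results: Dict[str, Dict[str, Record]],
                  rank: Dict[str, float]) -> Dict[str, Record]:
    """Merge per-source results; on conflicting values the higher-ranked source wins.

    results maps source id -> {global key -> record}; missing (None) values
    never overwrite values supplied by another source.
    """
    merged: Dict[str, Record] = {}
    # Apply lower-ranked sources first so that higher-ranked ones overwrite them.
    for source in sorted(results, key=lambda s: rank[s]):
        for key, record in results[source].items():
            target = merged.setdefault(key, {})
            target.update({f: v for f, v in record.items() if v is not None})
    return merged
```

Fields supplied only by a lower-ranked source (such as a publisher missing from the preferred source) survive the merge, so the integrated view is as complete as the sources allow.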

4 Verification

The Definition Verification module validates the completeness and time constraints of the process definition. First, a component should be independent: it should reflect an independent business function and be able to execute independently. Second, a component should be encapsulated and interact with the other components in the process through SOAP messages. Third, the start node and the end node should be unique, and the internal process completeness should be satisfied as discussed in [14]. Considering the time factor and the logical relationships among services, the following rules are used for verification:
- The start time of a single node must not be earlier than its predecessor's end time.
- The end time of a single node must not be later than its successor's start time.
- The start time of a node with “And-join” predecessors must not be earlier than the end time of any of its predecessors.
- The start time of a node with “Or-join” predecessors must not be earlier than the end time of at least one of its predecessors.
- The end time of a node with “And-split” successors must not be later than the start time of any of its successors.
- The end time of a node with “Or-split” successors must not be later than the start time of at least one of its successors.
- The start time of an arc must not be earlier than its predecessor's end time.
- The end time of an arc must not be later than its successor's start time.

Integration Verification focuses on the following aspects. First, the components in the service process should be reachable from the start node, and deadlocks and loops should be detected and eliminated. Second, the components to be retrieved should be found in the service space and UDDI repositories. Third, the execution conditions of the components should be satisfied during execution, and the data returned in the data flows should satisfy the requirements.
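The time-constraint rules can be checked mechanically. This Python sketch reads the rules as follows: an And-join node starts after the latest predecessor end, an Or-join node after the earliest; an And-split node ends before the earliest successor start, an Or-split node before the latest. It models only node times and treats arcs as instantaneous; the `Node` fields and error strings are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Node:
    start: float
    end: float
    join: str = "and"   # how predecessors combine: "and" or "or"
    split: str = "and"  # how successors branch: "and" or "or"


def check_times(nodes: Dict[str, Node], edges: List[Tuple[str, str]]) -> List[str]:
    preds: Dict[str, List[str]] = defaultdict(list)
    succs: Dict[str, List[str]] = defaultdict(list)
    for a, b in edges:
        succs[a].append(b)
        preds[b].append(a)
    errors: List[str] = []
    for name, node in nodes.items():
        if preds[name]:
            ends = [nodes[p].end for p in preds[name]]
            # And-join waits for every predecessor; Or-join needs only one.
            bound = max(ends) if node.join == "and" else min(ends)
            if node.start < bound:
                errors.append(f"{name}: starts before its predecessors allow")
        if succs[name]:
            starts = [nodes[s].start for s in succs[name]]
            # And-split must end before every successor starts; Or-split before one.
            bound = min(starts) if node.split == "and" else max(starts)
            if node.end > bound:
                errors.append(f"{name}: ends after its successors allow")
    return errors
```

For a single predecessor or successor both readings coincide, so the first two rules are covered by the same comparisons.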

5 Application in Online Book Sale

This application demonstrates the integration of book information from multiple booksellers. Following the business process of book sale, users can use the definition tool to define a service process and specify requirements for service components, as shown in Fig. 2: the background window shows the top-level service process, and the middle window shows the service component requirement specification. Clicking the “Search” button in the middle window triggers the search process. Users are asked to select services from a name list in the front window, and information about the service components is then returned automatically. The global schema defined by the application developers is shown in Fig. 3. The basic information of a book is Book = {ISBN, Title, Author, Publisher, Year, Abstract, Vendor, Price, Stock}. In the semantic constraints, ISBN is the key of Title, Author, Publisher, Year, and Abstract, and ISBN and VendorID together form the key of Price and Stock. The semantic attribute set is {ISBN (Book No, Book ID), Title (Name), Author (Writer), Publisher (Bookman), Abstract (Outline, Abstraction), Price (Cost), Stock (Inventory, Amount, Quantity)}.
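Node mapping with the semantic attribute set above can be sketched as a synonym lookup. This sketch assumes exact matching after ignoring case, spaces, and punctuation (the approach itself derives the sets with WordNet and allows manual edits); `map_nodes` and the source node names in the usage are hypothetical. Global nodes left unmapped (`None`) are the ones that would need human intervention.

```python
from typing import Dict, Iterable, Optional, Set

SEMANTIC_ATTRIBUTES: Dict[str, Set[str]] = {
    "ISBN": {"Book No", "Book ID"},
    "Title": {"Name"},
    "Author": {"Writer"},
    "Publisher": {"Bookman"},
    "Abstract": {"Outline", "Abstraction"},
    "Price": {"Cost"},
    "Stock": {"Inventory", "Amount", "Quantity"},
}


def normalize(label: str) -> str:
    """Ignore case, spaces and punctuation when comparing node names."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def map_nodes(global_nodes: Iterable[str], source_nodes: Iterable[str],
              attributes: Dict[str, Set[str]]) -> Dict[str, Optional[str]]:
    sources = list(source_nodes)
    mapping: Dict[str, Optional[str]] = {}
    for g in global_nodes:
        terms = {normalize(g)} | {normalize(t) for t in attributes.get(g, set())}
        # First source node whose name is the node itself or one of its synonyms.
        mapping[g] = next((s for s in sources if normalize(s) in terms), None)
    return mapping
```

For a hypothetical source exposing `BookID`, `Name`, `Writer`, `Cost`, and `Quantity`, ISBN maps to `BookID` and Stock to `Quantity`, while Publisher stays unmapped.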

Fig. 2. An interface of the component-based service process definition tool

Fig. 3. The global schema of application in online book sale

After the source schemas of the involved service components are extracted, the node mapping, path mapping, and tree mapping between the global schema and the source schemas can be constructed automatically, and the global query sequence is established over the global schema. Each user query corresponds to non-contiguous branches in the global query sequence, which in turn correspond to branches of the source schemas. We use the query “To retrieve books about Web Services under $60” to illustrate the query matching process. This user query corresponds to two sub-query branches, which in turn correspond to sub-queries at the three involved service components: {(1, 4, Amazon/Book/Title, “Web Services”) AND (1, 11, Amazon/Book/Price, “